RPM deployment for Continuous Delivery @ImmobilienScout24

worked on by: Maximilien Riehl

Due date

I finally got the thesis registration to the examination office (Prüfungsbüro) and updated the due date accordingly. I will deliver on September 20th, 2012.

Outline

The idea behind Continuous Delivery is that releasing software towards production should not fill operations, developers and managers with fear but be an easy, transparent, reproducible and safe step.

Since automation is the only reliable way to achieve faster, more reliable and easier releases, the goal of this thesis is to describe the migration towards deployment automation in a case study in which the software (a very complex, loosely coupled web application) is delivered automatically as RPM packages. The software is currently delivered using a semi-automatic, network-share-based deployment stack with a cycle time of one week.

The deployment migration boils down to two tasks: setting up and actually using an infrastructure that pushes the software through the staging process as RPM packages, and implementing a deployment pipeline that takes source code as input and produces deployed and tested stages.

Thesis Requirements

  • Deploy at least one module of the web application to production in an automated fashion
  • Elaborate on rollout strategy (risk management, transition, coordination amongst service providers)
  • Elaborate on (mostly automated) testing, which is a crucial part of continuous delivery. This also includes testing the deployment process and the configuration!
  • Elaborate on liabilities and consequences of using automated RPM deployment (configuration, deployment changes, software service coordination)
  • Build a working deployment chain and provide ways to host the binaries (repositories) and mark them with state (propagation through the CLD pipeline stages)
  • Elaborate on security, since a bastion architecture is in place and naive automation would break the security
  • Document decisions
  • Open-source any open-sourceable components (see this blog post)

Milestones and Planning

A milestone is a scheduled event signifying the completion of a major deliverable or a set of related deliverables. A milestone has zero duration and no effort -- there is no work associated with a milestone. It is a flag in the workplan to signify some other work has completed. Usually a milestone is used as a project checkpoint to validate how the project is progressing and revalidate work. (Source: http://www.mariosalexandrou.com/definition/milestone.asp)

| Milestone no. | Past days | Progress | Goals | Target accomplished |
| 1 (DONE) | 25 | 100% | Deploy feedservice to dev (automatic test environment) | DONE |
| 2 (DONE) | 27 | 100% | Deploy feedservice to tuv50 (automatic production-like test environment) | DONE |
| 3 (DONE) | 12 | 100% | Deploy feedservice to tuv (manual production-like test environment) | DONE |
| 4 (DONE) | ? | 100% | Deploy RPMized feedservice to production one machine at a time | DONE |
| 5 (DONE) | ? | 100% | Open source yum-repo-server and yum-repo-client | DONE |
| 6 (DONE) | ? | 100% | Complete schedule and coordination with external contractors like computacenter | DONE |
| 7 (DONE) | ? | 100% | Take part in a ZDD (ZeroDowntimeDeployment) with the old manual deployment stack to evaluate advantages / liabilities of the new stack | DONE |
| 8 (DONE) | ? | 100% | Migration of other selected function groups (Rest-api, XMLRPCAPI) | DONE |
| 9 (DONE) | ? | 100% | Finish writing thesis | DONE |
...

Weekly Status (Most recent at the top, so you don't need to scroll forever). Usually updated on Mondays.

Week 13 (CW 38) (LAST)

DONE Thesis printed and delivered

Next steps: prepare the final presentation

Week 12 (CW 37)

Activities

  • Last additions to the implementation (Durchführung) part (DONE)
  • Last additions to the appendix (DONE)
  • Rewriting dubious or difficult-to-understand sections (DONE?)
  • Work on the conclusion (results) and outlook (Ausblick) (DONE)

Results

  • 44 pages (not counting the appendix). The images take up a lot of space, so I think this is OK (still, I'd like to get down to ~30)

Next Steps

  • Release/print next week on Wednesday, deliver the thesis on Thursday

Week 11 (CW 36)

Activities

  • Improved the exception handling of the yadt-config-rpm-maker, which was rewritten in Python (not by me)
  • Wrote a bit for the thesis text (Ausarbeitung)
  • Refactored the thesis text A LOT.
  • Meeting (20 min.) with Prof. Dr. Prechelt

Results

  • Thesis went down from 54 pages to 36, currently. I am obviously not including the appendix pages in this number.
  • Prof. Prechelt had good insights about how I can estimate the amount of work I did without writing about every detail.
  • Banishing any information that is not essential to the appendix is the way to go
  • I arranged for a second examiner (Zweitgutachter): Prof. Dr. Adrian Paschke (suggested by Prof. Prechelt) agreed to take the role

Next Steps

  • I want to get the thesis under 30 pages, but as it's currently 36 pages and there are still some sections to \include{}, it seems out of reach. Maybe aggressive refactoring will help.
  • Try to get as many people as possible to read the thesis text (3 currently, of whom only one has good computer science knowledge). Goal: identify weaknesses (and plain old language errors), inconsistencies and difficult-to-follow sections. Those will be banished to the appendix.
  • Meeting with a team leader who knows how other businesses deploy (using chef & co). Incidentally this person also has a focus on persistence/recovery (chaosmonkey) so this will be VERY enlightening

Week 10 (CW 35)

Activities

  • Comparing yadt with puppet and chef (deployment orchestration tools)
  • Reading through psychology papers about fear of change in order to underpin some sections with citations
  • Made the build chain more transparent and reliable by ensuring that expected RPMs are actually produced and uploaded to yum-repo-server, failing fast if RPMs are missing
  • Restructured the beginning of the thesis so that alternative solutions have more context
  • Removed many TODO sections from the thesis
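
The fail-fast check on expected RPMs mentioned in the activities above could look roughly like the following. This is a minimal sketch and not the actual build chain code: the function names `find_missing_rpms` and `assert_rpms_built`, as well as the idea of comparing plain file names in a build output directory, are assumptions made for illustration.

```python
import os


def find_missing_rpms(expected_names, directory):
    """Return the expected RPM file names that are absent from `directory`."""
    present = set(os.listdir(directory))
    return sorted(name for name in expected_names if name not in present)


def assert_rpms_built(expected_names, directory):
    """Fail fast: raise immediately if the build output is incomplete."""
    missing = find_missing_rpms(expected_names, directory)
    if missing:
        # Stopping here keeps an incomplete set of packages from being uploaded.
        raise RuntimeError("missing RPMs: " + ", ".join(missing))
```

A build step would call `assert_rpms_built(...)` right before the upload to the repository, so that a missing package stops the chain at the point where the cause is easiest to find.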

Next steps

  • Migration of the XMLRPC function group

Week 9 (CW 34)

Activities

  • Writing the thesis text (Ausarbeitung)

Problems

  • Moved out of my apartment into another one, which cost me a lot of time (and pain)
  • The department print shop ("Fachbereichsdruckerei"), which I wanted to entrust with printing and hardcover binding of my thesis, is absolutely ridiculous: I sent a short request for information a month ago AND STILL NO ANSWER!!! I have decided to go with an online service. It's not going to be cheap, but at least I can rely on it.

Next steps

  • Writing more

Week 8 (CW 33)

Activities

  • Writing the thesis text (Ausarbeitung)
  • More internal documentation
  • Read the "Foundation of enterprise software deployment" paper
  • Read the "Reducing Complexity of Software Deployment with Delta Configuration" paper
  • Investigating deployment issues as they happen, because both deployment stacks are now being used in parallel

Results

  • Expanded thesis part about deployment with two stacks in parallel
  • Expanded thesis basics
  • Learned that when RPM does not seem able to deliver something, you are probably (but not certainly) looking at it the wrong way and the solution is in fact dead simple (this happened while delivering a JRE certificate using RPM)

Problems

  • Was a bit sick during the weekend

Next steps

  • Maybe have a look at yadtshell-receiver/controller and/or twisted if I have the chance
  • Continue writing, have both technical and non-technical readers review the thesis
  • Cut down thesis size (>30 pages is way too much) - this has to wait until I am done writing, only then will I be able to shorten it effectively

Week 7 (CW 32)

Activities

  • Writing the thesis text (Ausarbeitung)
  • Writing internal documentation
  • Appointment with Prof. Dr. Prechelt, everything is fine

Results

  • Thesis is approx. 30 pages now

Problems

  • I am not convinced that it is possible to translate "deployment" into German without driving the reader crazy. I have replaced every occurrence with a LaTeX macro so that I can swap between translations within seconds

Next steps

  • More focus on writing the thesis

Week 6 (CW 31)

Activities

  • Writing the thesis text (Ausarbeitung)
  • Delivery chain artifact cleanup (-> yum-repo-server extensions and improvements)

Results

  • Thesis is approx. 20 pages (with lots of WIP sections still)
  • Cleanup jobs run on a scheduled basis
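
As an illustration of what such an artifact cleanup job might do, here is a minimal sketch. The retention policy (keep only the most recently modified packages in a directory) and the function names are assumptions made for this example; the actual cleanup rules implemented in yum-repo-server are not described here.

```python
import os


def select_stale(paths_with_mtime, keep):
    """Return the paths to delete, keeping the `keep` most recently modified ones."""
    newest_first = sorted(paths_with_mtime, key=lambda item: item[1], reverse=True)
    return [path for path, _ in newest_first[keep:]]


def remove_stale_rpms(directory, keep):
    """Delete all but the `keep` newest .rpm files in `directory`; return the removed paths."""
    entries = []
    for name in os.listdir(directory):
        if name.endswith(".rpm"):
            path = os.path.join(directory, name)
            entries.append((path, os.path.getmtime(path)))
    stale = select_stale(entries, keep)
    for path in stale:
        os.remove(path)
    return stale
```

Separating the selection (`select_stale`) from the deletion keeps the policy easy to test without touching the file system.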

Problems

  • None

Next steps

  • More practical work, since I seem to be more productive writing in the evenings and on weekends

Week 5 (CW 30)

Activities

  • Discussed the old deployment stack with a system developer who gave me loads of information and lessons learned that come in handy (they answer the question "why is the migration necessary?")
  • Worked on the main part of the thesis text (Ausarbeitung)

Results

  • My mentor had a good suggestion for relevant literature (the IEEE digital library). I found interesting and related papers, but they came at a price; nothing is free, I guess. I now have a solid bibliography.
  • My appointment with Prof. Dr. Prechelt was postponed by two weeks without any notice. Fortunately, I was able to discuss my thesis structure with my mentor, so this is no longer blocking me.
  • The rest of the team is almost done with PDF; I'll get back in the loop for WEB, API or APP because it will be both interesting and challenging

Problems

  • The way LaTeX decides to place figures (embedded floating images) in an "optimized" fashion is annoying me

Next steps

  • I have reduced the practical work I do on a daily basis and instead focus on writing and wiring the parts together.

Week 4 (CW 29)

Estimate

  • I am on track according to my plan. I currently have one entire spare week as a buffer for possible plan changes (if everything goes according to plan, that is; Murphy respectfully disagrees)

Activities

  • Setting up a delivery pipeline for the thesis: all files (except build products) are kept in version control, and a post-commit hook builds the PDF and emails it to me along with a changelog built from the svn commit log
  • RPM deployment for the TAT module of the legacy webapp -> RPMization, configurations, deployment staging
  • Discussed ZDD (milestone #7) with Ops; I will take part next week (25.07.2012)
  • Work on the open source yum-repo-server (refactorings, improvements to out-of-the-box experience)
  • Add monitoring to yum-repo-server
  • Let the delivery chain use the yadtshell-controller (-> huge improvement to delivery chain security) for DEV staging; parallelization is also an option (not per-chunk but per-host parallelization, of course)
  • Thoughts on alternative rollout strategies

Results

  • Wrote nagios checks for the yum-repo-server with a very friendly application manager (who actually did most of the work). This is very interesting from a risk management perspective (what happens when risk management fails)
  • The upcoming ZDD will deliver material for my outlook (Ausblick): what the limits of RPM deployment are and why we can't automate everything
  • I have a plan for the main part of my thesis text (Ausarbeitung) and it looks good to me. I will discuss it with my mentor and with Prof. Dr. Prechelt next week (week 5)
  • I finally got the thesis registration to the examination office (Prüfungsbüro) and updated the due date accordingly. I will deliver on September 20th, 2012.
  • I developed an algorithm that selects the next module to RPMize based on risk and complexity minimization. It relies on very basic digraph theory.
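
The selection algorithm itself is not spelled out on this page. Purely as an illustration of how a choice based on a dependency digraph could work, here is a minimal sketch under loudly stated assumptions: "complexity" is approximated by the number of not-yet-migrated modules a candidate transitively depends on, "risk" by the number of modules that transitively depend on it, and ties are broken alphabetically. The function names and the example graph are hypothetical and not taken from the thesis.

```python
def reachable(graph, start):
    """Return all nodes reachable from `start` by following edges in `graph`."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def reverse(graph):
    """Return the digraph with every edge flipped (dependency -> dependents)."""
    flipped = {node: set() for node in graph}
    for node, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, set()).add(node)
    return flipped


def next_module(depends_on, migrated):
    """Pick the unmigrated module with the lowest assumed risk + complexity score."""
    dependents_of = reverse(depends_on)
    modules = set(depends_on) | set(dependents_of)
    candidates = modules - set(migrated)
    if not candidates:
        return None

    def score(module):
        complexity = len(reachable(depends_on, module) - set(migrated))
        risk = len(reachable(dependents_of, module))
        return (risk + complexity, module)  # module name breaks ties

    return min(candidates, key=score)


# Hypothetical example graph: b depends on a, c depends on a and d.
example = {"a": set(), "b": {"a"}, "c": {"a", "d"}, "d": set()}
```

With this example graph and nothing migrated yet, `next_module(example, set())` selects `"b"`: it has one unmigrated dependency and no dependents, which ties with `"d"` and wins alphabetically.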

Next Steps

  • Write the main part of the thesis (blocked by: discussion with mentor / Prof. Dr. Prechelt). I have planned 2 weeks for this.

Problems

  • Read a book that was very interesting but had nothing to do with my thesis whatsoever. At least I am even more motivated now!
  • What is the current scientific "state of the art"? Does my thesis corroborate or complement existing work? I have found nothing even closely related yet, but maybe I'm looking in the wrong places. Maybe Prof. Dr. Prechelt can shed some light on this next week? The fact that ImmobilienScout24 decided to create its own solutions because there is nothing comparable on the market indicates that doing CLD with RPMs is definitely not very common
  • The deployment controller we are using had a minor security flaw (with major consequences, unfortunately). Finding it is good news (safer product) but also somewhat unfortunate, as we will have to wait for a new version.

Week 3 (CW 28)

Activities

  • Pushing yum-repo-server through the management approval process for open-sourcing it
  • Refactoring business logic out of yum-repo-server in order to open source it
  • Basic authentication mechanism for yum-repo-server
  • We built a command line client (repoclient) that interacts with yum-repo-server to ease manual management tasks
  • I wrote a really smart bash autocompletion mechanism for the repoclient that makes working with it a breeze
  • I graphed the service dependencies with an operations colleague

Results

  • Open sourcing of yum repo server approved
  • Almost ready to open source yum-repo-server and repoclient (just a few more refactorings are needed!)
  • Discussions on repository deletion safety -- do we trust people inside the company network, is an automation allowed to delete repositories?
  • Deployed feedservice to TUV (one server with the old stack, one server with the new stack, then both with the new stack)
  • Got RPMized feedservice approved by QA
  • Deployed feedservice to production (took down feedservices deployed by the old stack and deployed them with the new stack)
  • Services dependency tree done

Next Steps

  • Proceed with next legacy module (probably TAT?) as feedservice is done
  • Maintain yum-repo-server on github
  • Finish thesis structure, schedule appointment with Prof. Dr. Prechelt to discuss it

Problems

  • The practical part is advancing very well, but I am not happy with my current thesis structure.
  • I am writing the thesis piece by piece and will wire those pieces up by using includes when I'm ready to "ship". I think this is a good methodology but I will have to be careful to enforce DRY

Week 2 (CW 27)

Activities

  • Delivery chain workshop on Wednesday. Objectives: who does what, who knows what, apportioning of tasks and responsibilities, detection of blockers and issues
  • Fixing dbautomation with help from a system developer
  • More work on the yum repo scheduler (refactorings, improved job logging, integration fixes, command line client that wraps our API)
  • Thoughts on security since TUV is production like

Results

  • Thanks to application manager support, we are almost ready to roll out the feedservice on TUV
  • Deployment/Build chain is working again from DEV to TUV50. I've drawn a huge but complete diagram of it.
  • Team SD built a shiny new YADT broadcaster for us, which solves some security issues such as having to distribute the deployer SSH keys to each machine (because YADT connects to the target hosts via SSH)
  • Finished the book by Humble. Some good insights, but overall I expected a bit more, and I find most of the "real world" examples given hard to believe.

Next Steps

  • Will team up with an operations colleague to draw a dependency map (per server cell and per function group). This is the core step for building a rollout strategy for production.
  • Roll out one feedservice on TUV (app + db + corba-nameservice) with the "old" deployment stack (SSC/DeployDaemon) and roll out one feedservice with the new stack in the same environment. Monitoring, analysis, testing, comparison.

Problems

  • Issues with dbautomation (automatic DB deployment) which is still quite error prone due to the complexity of the process
  • I am working on the outline (Gliederung) of the thesis and the sheer amount of information and insights I have is overwhelming, and I'm only starting. I'm not happy with my structure yet because it's difficult to focus on the big picture.
  • The first attempt at securing our API did not work out well with YADT because it tries to parse the password
  • How can I give credit for internal documentation to its rightful owner without violating company policies?

Week 1 (CW 26)

Activities

  • Register the thesis (signature missing as of Monday, will take care of that on Thursday)
  • Searching for relevant literature, currently reading Continuous Delivery (Humble)
  • Implementing a YUM repository service (using TDD) with the taskforce
  • Prepared build chain configurations for automated deployment in a development and automated testing environment
  • Configured target hosts

Results

  • Repository service working and usable, but not yet production ready (performance? security? additional features?)
  • Implemented a scheduler with apscheduler (Python) to periodically generate metadata for the configured repositories
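
A periodic metadata job of this kind could be sketched as follows. This is not the yum-repo-server code: the function names, the repository paths and the use of the `createrepo` command line tool are assumptions, and the sketch uses the `add_job(..., "interval", ...)` API of current APScheduler 3.x releases, which differs from the API of the version available in 2012.

```python
import subprocess


def generate_metadata(repo_dir):
    """Regenerate yum metadata for one repository (assumes createrepo is installed)."""
    subprocess.check_call(["createrepo", "--update", repo_dir])


def register_jobs(scheduler, repo_dirs, interval_seconds, job=generate_metadata):
    """Register one interval job per repository and return the job ids."""
    job_ids = []
    for repo_dir in repo_dirs:
        job_id = "metadata:" + repo_dir
        scheduler.add_job(job, "interval", seconds=interval_seconds,
                          args=[repo_dir], id=job_id)
        job_ids.append(job_id)
    return job_ids


def main():
    # Imported here so the helpers above can be used without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    scheduler = BlockingScheduler()
    # Hypothetical repository paths, for illustration only.
    register_jobs(scheduler, ["/srv/repos/dev", "/srv/repos/tuv"], interval_seconds=60)
    scheduler.start()  # blocks and runs the jobs until interrupted
```

Passing the scheduler and the job function in as parameters makes the registration logic testable with a stand-in scheduler, which fits the acceptance tests planned in the next steps.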

Next Steps

  • Write more acceptance tests for the scheduler.
  • Migrate away from the current repository solution and towards the newly implemented repository service
  • Run the deployment chain that sets up the automated testing environment with the last known good from the development environment

Problems

  • Didn't find a lot of literature. This is not critical since I'm doing a case study, but annoying nevertheless.
  • Lots, lots, LOTS of blog posts about continuous delivery.
  • A very documentation friendly colleague is leaving the task force (developers are in the task force for 5 weeks).
Topic revision: r34 - 20 Sep 2012, MaximilienRiehl
 