In today’s software industry, it all comes down to agility. Developing lean artifacts rapidly, in short, frequent cycles, is a must to survive the impatient market. Managing this kind of operation isn’t easy, and therefore the community is motivated to keep pushing this field forward to be more efficient and flexible than ever. Marrying modern DevOps culture with state-of-the-art Continuous Delivery pipelines is widely practiced in both startups and large enterprises. There’s a lot of knowledge out there to get you started on that path, but if (and when) you do, make sure to address the following four catches.
Balance Thorough Testing Against Time-to-Market
Once you implement a Continuous Delivery pipeline to support your entire workflow, you’ll start noticing an issue with how workflows are processed: Each job within the pipeline runs for a long time.
Usually, this is because we want to make sure all tests pass and no new errors are introduced into the system. It’s an important goal, but realistically it means that once a developer has finished working on a feature/fix, they’re effectively out of the game while waiting for the pipeline to finish running (I’ve even seen pipelines that take several hours to run). This waiting period is not agile. There is a tradeoff between quality and time-to-market that must be made here. One textbook solution is to use a TDD methodology to cut down the testing time within the Continuous Delivery pipeline. It’s an excellent solution and one that will shave off most of the wait. But it comes with a downside. Shifting the critical mass of your tests from functional to unit (as in traditional TDD) has implications for the way you mitigate and manage risk. In short, some bugs might slip through unit tests and be discovered only in functional tests. If this is happening because of the nature of the tests, you can use a canary/dark deployment approach and rely on a small group of users to catch bugs before triggering a rollback (blue/green swap). If bugs are slipping through due to imperfect implementation, you will need to enforce Continuous Testing to see the big picture regarding quality improvement in each Continuous Delivery pipeline cycle.
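To make the canary idea a bit more concrete, here is a minimal sketch of the rollback decision itself. It assumes you can already count requests and errors for the canary group and for the stable baseline; the `Cohort` type, the function name, and all thresholds are hypothetical placeholders rather than part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Request and error counts for one group of users (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when a cohort has seen no traffic yet.
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(
    canary: Cohort,
    baseline: Cohort,
    min_requests: int = 500,   # assumed: minimum canary traffic before judging
    max_ratio: float = 2.0,    # assumed: tolerated multiple of the baseline rate
    floor: float = 0.001,      # assumed: error rate always considered acceptable
) -> bool:
    """Return True when the canary looks clearly worse than the baseline."""
    if canary.requests < min_requests:
        # Too little traffic to tell a real regression from noise.
        return False
    allowed = max(baseline.error_rate * max_ratio, floor)
    return canary.error_rate > allowed


if __name__ == "__main__":
    baseline = Cohort(requests=100_000, errors=1_000)
    canary = Cohort(requests=1_000, errors=50)
    print(should_roll_back(canary, baseline))
```

In a real pipeline, a check like this would run periodically after the canary release, and a `True` result would trigger the blue/green swap back to the previous version. The right thresholds depend on your traffic and risk tolerance.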
A Single Point of Failure
We all know the following scenario: you’re going about your daily routine, teams push code to the Continuous Delivery pipeline, and services are continuously deployed to environments. Then the Continuous Delivery master server fails, and everything freezes in place. Recovering the service and reviving all lost jobs is a time-consuming headache.
When a Continuous Delivery (e.g. Jenkins) master fails (due to software or hardware crashes), the result is extended downtime for the entire product team. Administrators detect these failures manually or through homegrown scripts. Once a failure is noticed, administrators scramble to get the master back up as quickly as possible. This is often done manually and can easily take a few hours or more. On larger projects, the downtime experienced from a failure can be the equivalent of several days of lost project time.
So, when you decide to set up your awesome Continuous Delivery pipeline that will support your entire workload, plan ahead for this eventuality. Design for high availability and consider the same scaling problems you would for any other service/bottleneck. P.S. here is how to set up Jenkins properly with high availability.
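As an illustration of the kind of homegrown detection script mentioned above, here is a minimal sketch of a watchdog that probes the master and flags a failure only after several consecutive misses, so a single slow response doesn’t cause a false alarm. The URL, the threshold, and the `Watchdog` class are assumptions made for the example; what happens on failure (paging someone, promoting a standby) is left out on purpose.

```python
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


class Watchdog:
    """Counts consecutive failed probes and reports when a threshold is hit."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3) -> None:
        self.probe = probe
        self.threshold = threshold
        self.failures = 0

    def check(self) -> bool:
        """Run one probe; return True when failover should be triggered."""
        if self.probe():
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        return self.failures >= self.threshold


if __name__ == "__main__":
    # Hypothetical address; replace with your own master's health endpoint.
    master_url = "http://ci.example.internal/login"
    watchdog = Watchdog(lambda: http_probe(master_url), threshold=3)
    if watchdog.check():
        print("master looks down: trigger failover")
```

Passing the probe in as a function keeps the counting logic easy to test without a network, and lets you swap the HTTP check for something stricter later.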
There Is No One Architectural Truth
Continuous Delivery pipeline designs are like snowflakes: from afar they all look alike, but no two are the same. Since Continuous Delivery services want to support every style/approach/tech (essentially everything), they have to have a flexible architecture that can be constructed and configured in a million and one ways. It’s safe to say that the number of Continuous Delivery configs I’ve seen equals the number of brains employed to set them up (have you ever seen two identical Jenkins configs?). The point is, the field gives up standardization to gain extra flexibility, and that places more responsibility on the architect.
There is a lot of knowledge out there. Use it and learn from others’ pain and experience. Adopt best practices that fit your architecture + tech stack, and be consistent. Try to minimize proprietary hacks and anti-patterns, stay updated, and be clean and tidy. If you don’t, nothing will enforce it for you, and your spaghetti pipeline will be unmaintainable in no time.
Distribution of Responsibilities
This ideal state also introduces a problem. Let’s assume the following:
- Architecture is completely SOA.
- Many services exist and live independently from one another.
- Deployment is decoupled. Each team/service can roll out whenever.
- Teams have a good work culture and are full owners of their service. They’re an autonomous micro-company responsible for everything from product design to service scale when live.
We began to realize that under these circumstances, developers within teams hold keys that can enrich the company’s live products rapidly and efficiently, but that can also cause those products to misbehave, break or, even worse, completely stop. So what do you do? Well, you’ll have to take a leap of faith. It’s a matter of proper DevOps education for everyone in the company, where employees are educated to be owners (as opposed to the old-fashioned high-tech monarchies we all know; see asshole-driven-development). A proper education will emphasize the responsibilities that come with the power of ownership and will encourage teams to interact, cooperate, help others, innovate and grow towards a mutual goal: build awesome stuff. That’s without even mentioning the ‘sense of making an impact’, which is what almost every employee is after.
Oh, and this is scalable. It doesn’t matter if you have 20 employees or 1,000.
“any employee in the Toyota Production System has the authority to stop production to signal a quality issue, emphasizing that quality takes precedence (Jidoka). The way the Toyota bureaucratic system is implemented to allow for continuous improvement (kaizen) from the people affected by that system so that any employee may aid in the growth and improvement of the company.”
The Toyota Way, Wikipedia
Unfortunately, I can’t promise all smooth sailing from here on out. Problems that will halt your production line are inevitable. Bugs, like human errors, are here to stay. There is a need for Continuous Testing frameworks that fit today’s challenges and technologies and can join the Continuous Delivery pipeline game to continuously improve your product, its building blocks, and overall quality.