Mistakes to Avoid When Setting Up a CI Pipeline

Written August 19, 2022.

Continuous Integration (CI) is one of the basic building blocks of modern software engineering. It offers many advantages and is an indispensable tool in any DevOps endeavor. However, there are a few mistakes that look innocent at first but can become a burden in the long run. In this post I’ll tell you what to look for and which pitfalls to avoid.

Top Five Mistakes to Avoid

There are numerous reasons why your team should use a CI pipeline. To name just a few:

Automation reduces maintenance and update cost.
It increases reliability and fault traceability through continuous feedback and tests.
It simplifies rollbacks once bugs are identified.

Last but not least, extending a CI pipeline to Continuous Delivery (CD) makes release management a choice rather than a chore. Artifacts are automatically created, and we can choose to release a build artifact anytime.

Below I list a few common misconceptions I have seen that conflict with best practices and that you should avoid.

Mistake 1: Rely Exclusively on the UI

Many modern CI systems (like TeamCity or Jenkins) offer a powerful and versatile UI. You can declare build steps, dependencies and triggers, which makes it very easy to get results fast.

There is a downside, however: if you put build steps into scripts that are declared in the UI, those steps are no longer part of your source code base. You lose version control, and the build recipes can drift out of sync with your code. As a result, you may no longer be able to build older versions of your software. In the worst case you’ll even lose the build scripts entirely if the CI system is not backed up.

Developers may also want to reproduce problems that occur on the build agents. If they first have to look up the exact build steps the agent executed (or worse, request permission to see them), productivity takes a huge hit.

The solution is simple: use the UI to get started, then move the exact steps into scripts and the necessary build configuration into a configuration file in your source code base. The configuration set up in the UI can even be exported, see e.g. this post for TeamCity. That way the build configuration is part of the code base and under version control, is accessible to every developer with source code access, and does not live in some other odd place.

The CI system only needs to check out/clone the source code. It should not host any complex tasks, scripts or configurations.
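To illustrate, here is a minimal sketch of such a build script in Python. The file name ci/build.py, the function run_steps and the two example steps are my own placeholders, not a convention of any CI system; substitute your project’s real commands. A real script would end with the usual if __name__ == "__main__": guard calling run_steps(BUILD_STEPS); it is left out here so the sketch can be imported without side effects.

```python
# Hypothetical ci/build.py: the single entry point that both developers
# and the CI agent run, so the build steps are versioned with the code.
import subprocess
import sys

# Each build step is an explicit command line. These two steps are
# placeholders (byte-compile "src", run unit tests in "tests").
BUILD_STEPS = [
    [sys.executable, "-m", "compileall", "-q", "src"],
    [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
]


def run_steps(steps, dry_run=False):
    """Run each step in order and return the list of steps handled."""
    executed = []
    for cmd in steps:
        # Echo the exact command so a failing CI log is easy to reproduce locally.
        print("+", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the first failing step aborts the whole build.
            subprocess.run(cmd, check=True)
        executed.append(cmd)
    return executed
```

The CI job then needs exactly one command, and it is the same command a developer runs locally to reproduce a failure.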

Some CI systems, like GitHub Actions or Travis CI, read their configuration directly from the source code by default.

Mistake 2: Put Your Secrets in a Not-So-Secret Place

Unfortunately, I have come across secrets (like the username and password of a technical user, or API tokens) committed to the source code base of some projects. This is particularly detrimental in public projects, see e.g. this post.

Committing your secrets directly does have advantages, which makes it tempting. Developers have full access as soon as they have checked out the source code; there is nothing else to set up. The CI pipeline can deploy directly without any additional steps. That sounds useful, doesn’t it?

Yes, it is convenient. But it comes at a huge price: if your source code leaks (a laptop gets stolen or hacked, or your version control system gets compromised), then not only is your development environment compromised, but your entire deployment, too! In the worst case even your customer data may be accessible, which might threaten the very existence of your organization! If the government doesn’t get you, your bad reputation almost certainly will!

So even if it’s tempting, don’t fall for it — take the extra steps to protect your deployment! This is the one major exception to the advice above in Mistake 1: build scripts and configurations must be put under source control, secrets must not!

Secrets MUST be protected and MUST NOT, UNDER ANY CIRCUMSTANCES BE HARD-CODED IN YOUR SOURCES!

Possible solutions: in simple cases, credentials can be stored in the local environment of each developer, for example as environment variables. The build agent feeds them to the build job the same way, taking them from the CI system’s own secret storage. More sophisticated options include AWS Secrets Manager and HashiCorp Vault.
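Here is a minimal sketch of the environment-variable approach in Python. The variable name DEPLOY_API_TOKEN and the helper require_secret are hypothetical; the point is that the code only knows the name of the secret, never its value.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name, environ=None):
    """Return the secret stored in the environment variable `name`.

    The value never appears in the repository: developers set it in their
    shell, and the CI system injects it into the build job's environment.
    """
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Fail fast with a message that names the variable but never echoes a value.
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Usage (hypothetical variable name):
# token = require_secret("DEPLOY_API_TOKEN")
```

Passing a dictionary as environ makes the helper easy to test without touching the real environment.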

Mistake 3: Use Plugins Whenever Possible

Big and popular CI systems like Jenkins have huge plugin repositories with thousands of plugins. They let you solve a problem right away without having to figure out the corresponding command line.

This is a blessing and a curse. You get started quickly, but what if the command line changes? If the plugin is well maintained and updated in time, you don’t need to change anything. Otherwise, you have to figure out the new command line yourself.

Local developers, however, have to do this “figuring out” anyway, since they don’t run the plugin. There is no way around it, and if the plugin is not updated in time, you have to figure out the entire command line for the CI as well.

All in all, relying on plugins as a shortcut to figuring out the command line is riskier than it is worth. This is essentially a corollary to Mistake 1: do not store build knowledge in the CI system; keep it in your source code base.

Try to minimize use of plugins; if you can do without them, then do it.
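As an illustration, suppose a (hypothetical) archiving plugin is used only to zip the build output. The same step can live in the repository as a few lines of Python using the standard library’s shutil.make_archive. The helper package_artifact and the directory names are assumptions made for this example.

```python
import shutil
from pathlib import Path


def package_artifact(build_dir, dist_dir, name):
    """Zip the contents of build_dir into dist_dir/<name>.zip and return its path.

    Replaces a (hypothetical) archiving plugin with plain standard-library
    code that lives in the repository and behaves the same on every machine.
    """
    dist = Path(dist_dir)
    dist.mkdir(parents=True, exist_ok=True)
    # make_archive appends the ".zip" suffix itself and returns the archive's file name.
    archive = shutil.make_archive(str(dist / name), "zip", root_dir=str(build_dir))
    return Path(archive)


# Usage (hypothetical paths):
# package_artifact("build", "dist", "myapp-1.0")
```

Developers can call the same function locally, so there is no plugin configuration to reverse-engineer when the CI output looks wrong.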

Mistake 4: Tolerate Fragile Tests

This is a very common scenario: a project with decent test coverage runs smoothly, and the code works locally before the changes are pushed. Then a test fails on the build agent. Maybe a problem with some server on the internet? A rare coincidence in the random number generator? Cosmic rays?

Let’s just rerun the job … and … yep, the tests are green.

Yes, it happens. If you are not guilty of having rerun a job with a “random test failure” you are probably unemployed. And it’s not a problem if it happens once in the lifetime of the project.

The problem arises when this happens regularly. The path from a single, one-time retry to chronic rerunitis before merging is dangerously smooth.

The solution is both simple and painful: find the culprit and fix it! It may be one particular test that is fragile. Or it may be a certain sequence of tests in which an earlier test didn’t clean up the environment properly.
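Two of the causes mentioned above, randomness and leftover state, can often be designed out of a test. Here is a minimal sketch using Python’s unittest; shuffle_deck is a hypothetical function standing in for your code under test.

```python
import random
import shutil
import tempfile
import unittest
from pathlib import Path


def shuffle_deck(rng):
    """Hypothetical code under test: return a shuffled deck of 52 card indices."""
    deck = list(range(52))
    rng.shuffle(deck)
    return deck


class DeckTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own scratch directory, so no state leaks into the next test.
        self.workdir = Path(tempfile.mkdtemp())
        # Cleanups registered with addCleanup run whether the test passes or fails.
        self.addCleanup(shutil.rmtree, self.workdir)

    def test_shuffle_is_permutation(self):
        # A seeded generator makes any failure reproducible on every run.
        deck = shuffle_deck(random.Random(1234))
        self.assertEqual(sorted(deck), list(range(52)))

    def test_same_seed_same_order(self):
        self.assertEqual(shuffle_deck(random.Random(7)), shuffle_deck(random.Random(7)))

    def test_writes_only_to_own_directory(self):
        out = self.workdir / "deck.txt"
        out.write_text(",".join(map(str, shuffle_deck(random.Random(1)))))
        self.assertTrue(out.exists())
```

Injecting the random generator instead of using the global one is the design choice that makes the seed controllable from the test.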

The last resort is to disable or remove a failing test without fixing it. Yes, it is heresy and I shall burn in hell for suggesting this! But caught between the devil and the deep blue sea, disabling the test is the lesser evil. You don’t fix your bug, but you salvage the credibility of the rest of your tests. Frequently failing tests left in a project defeat the very purpose of automated tests: increasing code quality. Removing a failing test does not improve code quality, but it does not reduce it, either. If the test suite as a whole ceases to be a reliable indicator of the state of your code base, it is almost guaranteed to drag down the rest of your code. Still, keep this option as a last resort for when you cannot fix the root cause of the failure.
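If you do disable a test, make it visible rather than deleting it silently. Here is a minimal sketch with Python’s unittest.skip; the test names and the ticket ID QA-123 are hypothetical. The skip is counted in the test summary and its reason appears in verbose output, so the disabled test is not forgotten.

```python
import unittest


class PaymentGatewayTest(unittest.TestCase):
    # Last resort: disabled because it fails intermittently on the build agents.
    # The reason string points at a (hypothetical) ticket tracking the root cause.
    @unittest.skip("flaky on CI agents, see ticket QA-123")
    def test_refund_roundtrip(self):
        # Body omitted in this sketch; it would call an external sandbox server.
        raise NotImplementedError

    def test_amount_formatting(self):
        self.assertEqual(f"{1234.5:.2f}", "1234.50")
```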

Make sure your tests are a reliable indicator for your code base. High coverage is desirable, but not at the cost of fragility. Fight rerunitis!

Mistake 5: Rely on Automation For All Your Tests

After having replaced humans with robots on your entire factory floor, close your HR department. Well, not quite, at least not on the software factory floor.

There is no question that CI improves your workflow and is an integral part of modern software development. But it is not a panacea for every QA problem. Some things cannot be tested automatically (like Bluetooth on mobile phones), at least not without major hassle. Good QA engineers will test your app not only by entering 10,000-byte-long Unicode Klingon and Elven names into the username field, but will even try regular English names (something no software engineer thinks of when writing tests). They will notice if scrolling feels sluggish once your list of friends grows beyond half the screen.

A project lead in charge of a team of hundreds of developers once told me that before every release he installed the app on his cellphone and played with it for just a few minutes to make sure there were no major blunders. If he hadn’t, the customers would have found them, and customers are neither patient nor forgiving. If he can do that, so can others.

CI does not replace all manual testing. It replaces mundane, repetitive tasks, but not the real job of a QA engineer.

Summary

There you have it: five mistakes to avoid to make sure you get the most out of your CI pipeline. Strive to keep your build logic in your repository, with secrets being the one major exception. Don’t use fancy plugins if you can avoid them. Do not tolerate fragile tests; either fix or remove them. And don’t replace all manual testing with automation.