Integration tests are slow and hard to maintain: they touch significantly more of the system than their unit test counterparts and, as a result, change more frequently. These more sophisticated tests serve a purpose that unit tests cannot, so there is no way to skip them and focus exclusively on unit tests. Because they are complex, their failures are painful, so we decided to look at the most common integration testing mistakes (as defined by the community) that set you up for failure, and at how to avoid them.
How to Do Smoke Integration Testing
Stack Overflow: 1,146 results
Smoke testing is exactly what it sounds like: turn it on and see if any smoke comes out. If there’s smoke, there’s really no point in continuing to test. This is your most basic quality gate, the one that ensures the critical functionality of your application is working. The result of a failed smoke test should always be the same: dead on delivery. That outcome is also the best way to decide whether a test should be categorized as a smoke test or as a performance, regression, or other kind of test.
So why is it so important to categorize smoke tests correctly? Because smoke tests are the most basic of quality gates, they need to run consistently and continuously. Every run takes time, and higher-level testing has to wait for it, which makes the smoke suite a bottleneck. Therefore, with smoke testing, less is more. If you don’t stick to tests of your main-path functionality:
- Your smoke tests will take too long to execute, blocking you from failing fast.
- There’ll be so much smoke you won’t be able to see the fire.
A good example of a smoke test is “check if a user is able to log in.” Despite the simplicity of the statement, the test may actually cover various levels of your application: is the load balancer working correctly? Does the page render? Is communication with the database functional? Can we find relevant records?
Smoke tests are not about pinpointing and solving problems; they’re about discovering showstoppers that, ideally, need to be blocked from production and/or fixed immediately. Performance testing and regression testing should have their own place in your testing suite, but neither should be mistaken for smoke testing. The best rule of thumb here is KISS (keep it simple, stupid).
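To make the "less is more" idea concrete, here is a minimal sketch of a fail-fast smoke runner. Everything in it is an assumption made for illustration: `FakeApp`, `run_smoke_tests`, and the check names are hypothetical, and a real suite would run its checks against a deployed environment rather than an in-memory stand-in.

```python
from typing import Callable, List, Tuple


class FakeApp:
    """Hypothetical stand-in for a deployed application."""

    def __init__(self) -> None:
        self._users = {"alice": "s3cret"}

    def is_up(self) -> bool:
        return True

    def login(self, username: str, password: str) -> bool:
        return self._users.get(username) == password


def run_smoke_tests(
    checks: List[Tuple[str, Callable[[], bool]]]
) -> Tuple[bool, List[str]]:
    """Run checks in order and stop at the first failure (fail fast).

    Returns whether all checks passed and the names of the checks
    that were actually executed.
    """
    executed: List[str] = []
    for name, check in checks:
        executed.append(name)
        if not check():
            return False, executed
    return True, executed


app = FakeApp()
smoke_checks = [
    ("app responds", app.is_up),
    ("user can log in", lambda: app.login("alice", "s3cret")),
]
passed, executed = run_smoke_tests(smoke_checks)
print("smoke passed" if passed else f"smoke failed at: {executed[-1]}")
```

The runner only reports that a showstopper exists and where it stopped. Diagnosing the cause is left to lower-level tests, which keeps the smoke suite short.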
Your Test Suite Design Kinda Sucks
The debate around which types of tests can replace each other (see here, here, here, and here) is actually quite interesting. The answer, though, is that if you’re asking yourself whether tests can replace or substitute for each other, you should consider that there may be a problem with your test suite design.
Unit tests are the cheapest tests in terms of development time, duration, and maintenance. The higher in the stack a test runs, the higher the cost of maintenance and execution. End-to-end tests require a full stack to operate, so they’re more expensive; their startup alone might take several minutes. Further, because unit tests work at the unit level, debugging them is much simpler than debugging an end-to-end test. All levels of testing provide value, but if you can assert the same condition at two different levels, opt for the lower and cheaper one.
That said, we did come across something interesting in our own test data when examining this discussion. To provide some background: first, we work in TDD, so we always develop unit tests. Second, using the SeaLights platform, we analyze not only unit test code coverage but also code coverage at every test level, as well as aggregate code coverage. This means we can see the actual implications of the discussion above.
We were surprised to find that, despite working in TDD and actively aiming to minimize test overlaps (there are cases, such as negative tests, where overlaps are a good thing), our unit and integration tests were asserting the same conditions. While this wasn’t causing our integration tests to fail, it meant our test suite was neither effective nor efficient. Even working in TDD with a “correct” test suite design doesn’t ensure that you are maximizing your effectiveness or that you know how to do integration testing. As a result, we now use our coverage metrics to plan our test optimization and development.
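As a rough illustration of the kind of overlap analysis described above, the sketch below compares the lines covered at two test levels. It is not how the SeaLights platform works; the `coverage_overlap` helper, the file names, and the sample data are all hypothetical. Line-level overlap is also only a hint: two tests can cover the same line while asserting different conditions.

```python
from typing import Dict, Set, Tuple

# A covered line is identified by (file name, line number).
Line = Tuple[str, int]


def coverage_overlap(unit: Set[Line], integration: Set[Line]) -> Dict[str, float]:
    """Summarize how much two test levels cover the same lines."""
    aggregate = unit | integration  # covered by at least one level
    both = unit & integration       # covered by both levels
    return {
        "aggregate_lines": len(aggregate),
        "overlap_lines": len(both),
        "overlap_ratio": len(both) / len(aggregate) if aggregate else 0.0,
    }


# Made-up coverage data for illustration only.
unit_cov = {("cart.py", 1), ("cart.py", 2), ("cart.py", 3)}
integration_cov = {("cart.py", 2), ("cart.py", 3), ("checkout.py", 10)}

report = coverage_overlap(unit_cov, integration_cov)
integration_only = sorted(integration_cov - unit_cov)
print(report)
print("covered only by integration tests:", integration_only)
```

Lines covered only by integration tests are candidates for cheaper unit tests, while a high overlap ratio suggests checking whether the higher-level tests repeat assertions already made lower in the stack.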
Bad Implementation, Good Methodologies
- Automate It All: Automated testing is pretty much an accepted practice at this point. However, if you’re not writing tests first, it is very easy to over-engineer your automation. Unit tests especially require careful design (or even redesign) of the code. Some components are less testable than others, but if they can be decoupled, the resulting smaller units should be easier to test.
- Manual Runs of Sophisticated Tests: What is the benefit of writing sophisticated, time-consuming, expensive, and repetitive tests if they are run manually? Not only do automated tests take less time, they also let you focus on the problems themselves: analyzing test results and debugging. You spend your time responding to automated test failures rather than remembering to run all the necessary tests, checking the output, and comparing it with the previous run.
- Diving Deep with Databases: This applies to databases in particular but is relevant to most external resources: keep testing dependencies to a minimum. If an application can run with SQLite during testing, stick with that. Setting up and connecting to a real database or service may cost time while providing little value. That doesn’t mean integration with the external resource should not be checked. It should, but separately from the integration of your own components.
- Stability (aka flakiness): If tests fail intermittently again and again, it’s only a matter of time before people stop using them. While keeping all tests passing 100% of the time may not be feasible, not running tests at all is not the solution. Fixing them is. If fixing a test is too expensive an endeavor, it might be better to disable that test than to let it generate noise.
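The database point above can be sketched with Python’s standard `sqlite3` module and an in-memory database. The schema and the `create_schema`, `add_user`, and `find_user` helpers are hypothetical, chosen only to show component code being tested without a real database server. The approach assumes your queries are portable to SQLite.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )


def add_user(conn: sqlite3.Connection, name: str) -> int:
    """Insert a user and return the generated id."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def find_user(conn: sqlite3.Connection, name: str):
    """Return (id, name) for the user, or None if no such user exists."""
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchone()


def test_add_and_find_user() -> None:
    # ":memory:" creates a throwaway database that disappears on close,
    # so every test starts from a clean state with no external setup.
    conn = sqlite3.connect(":memory:")
    try:
        create_schema(conn)
        user_id = add_user(conn, "alice")
        assert find_user(conn, "alice") == (user_id, "alice")
        assert find_user(conn, "bob") is None
    finally:
        conn.close()


test_add_and_find_user()
```

Integration with the production database still deserves its own, separate tests. This sketch only covers the interaction between your own components.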
Lastly, don’t overdo it. All testing needs to serve a business purpose. Testing just for the sake of testing is not productive. Code coverage is a useful metric, but checking if “return 4 returns 4” is not a good investment of anyone’s time.
Sometimes your tests may not be poorly designed but nevertheless perform poorly on the testing infrastructure. Other times, you may not have taken into account the specifics of an automated environment. Either way, the infrastructure that tests run on is as important as the tests themselves. There needs to be very clear communication between business units so that no time is wasted on parallel work. While computer time costs less than manual labor, this should not be a reason for writing poorly performing code.
Invest in your infrastructure as you would in hiring new employees: make sure the requirements are met; in this case, the testing requirements. Running e2e tests may require a lot of memory and processing power. Testing on virtual machines may be even more demanding, and more so still when a GUI is involved. Don’t let your infrastructure become a bottleneck in the release process. When you see tests waiting in an execution queue or taking too long to complete, just add more machines. Spinning up new AWS instances, for example, doesn’t take long, and developers’ time is worth more.