You have probably stumbled upon the concept of the Testing Pyramid at some point. It’s a visual metaphor for what a product’s test coverage should look like across the different testing layers. It is a somewhat simplistic look at the topic, and people often argue about the naming convention for the different layers. However, it is not meant to be a detailed representation but rather a general idea and an aspirational goal.
The main point of it still remains valid, especially for teams doing Agile Development. It highlights the value of testing, but even more so the importance of knowing where and how to test. We want our tests to cost less and the results to be fast. How should we accomplish this? Let’s find out together!
A Pyramid Blueprint
To find out how to best select test cases, let’s deconstruct the testing pyramid. When people talk about it they usually refer to something like this:
The concept tells us we should have very few tests at the top, because the higher you go up the pyramid, the slower and more expensive the tests become to run. As indicated by the image above: have fewer heavy turtles and more light bunnies. It takes longer to receive feedback from upper-layer tests, and they often require more effort to write and maintain. Therefore we want to write most of our tests on the lower layers, where the return on investment is much higher.
We can have multiple layers depending on the system we are working with. In general the idea itself is presented with 3 layers:
- The top layer is end-to-end tests executed directly via the user interface.
- The middle layer is testing through a service layer of the application, such as API testing.
- The bottom layer is made of tests in the smallest possible form, such as unit tests.
The writing and maintenance costs per layer depend on the project state and the development approach, but still, the test runs are always faster the further down you go. Does this mean we should only write tests on the bottom layer? If we do so, we would no longer have a pyramid. While our coverage should lean toward the lowest layer, we can and should have tests on all layers. Why? Let’s find out through a look at all layers from the perspective of a simple login form.
What’s in a login form? At least a username, a password, and a button. Even with just this we already have multiple test cases. We can input a username and a password and click login. We can verify we are redirected to the correct next page after successful authentication. But wait: has the username been filled in? Has the password been filled in? Does the username exist? Is the password correct?
The more questions we come up with, the more testing potential we uncover. With each question we have to ask ourselves which layer that test should go on.
The top layer
Here we are mimicking a player’s behavior. Everything they do, we do as well: typing into the input fields (or sending keys to them), clicking the login button, and verifying we reach the correct next stage. Just as when a real user performs these steps, even a single test takes at least a few seconds to run. The duration of a whole suite of such tests, or even multiple suites, grows very quickly with each added test. In addition to the duration, there is a high maintenance effort, since any change in the UI is likely to require changes to the tests.
Rough summary of the test behavior:
- Integration of different components.
- Actual user behavior.
- Slow feedback.
- Fragile.
We want the positive outcomes but not the drawbacks, so we do minimal coverage here.
The middle layer
For most of the test cases we mentioned, we don’t really need the user perspective. For example, we can send the login form’s request directly, with our data set up beforehand. We can follow the redirect to verify that we reach the desired next page. This way we avoid spending time typing characters and clicking buttons while testing the different scenarios related to the authentication of the credentials. There is no UI slowing us down and the tests are not as fragile. Unless the services themselves change, you have little maintenance to do.
Rough summary of the test behavior:
- Integration of different components
- Not actual user behavior
- Fast feedback
- Less fragile
We do lose the user view but we gain quite a few improvements in speed and lower maintenance effort so we do more tests here.
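As a rough illustration, the login scenarios described for this layer could be sketched in Python as follows. Everything here is hypothetical and not taken from a real system: `handle_login`, the `Response` class, the `USERS` test data, and the `/home` redirect target are all assumptions made for the example. A real project would send the request to its own service or API instead of calling a local function.

```python
from dataclasses import dataclass, field

# Hypothetical test data, set up beforehand (assumption for this sketch).
USERS = {"alice": "correct-horse"}


@dataclass
class Response:
    """Minimal stand-in for a service response."""
    status: int
    headers: dict = field(default_factory=dict)


def handle_login(form: dict) -> Response:
    """Hypothetical service-layer login handler, not a real framework API."""
    username = form.get("username", "")
    password = form.get("password", "")
    if not username or not password:
        return Response(400)  # a required field is missing
    if USERS.get(username) != password:
        return Response(401)  # unknown user or wrong password
    return Response(302, {"Location": "/home"})  # redirect after success


def test_valid_credentials_redirect_to_next_page():
    response = handle_login({"username": "alice", "password": "correct-horse"})
    assert response.status == 302
    assert response.headers["Location"] == "/home"


def test_wrong_password_is_rejected():
    assert handle_login({"username": "alice", "password": "nope"}).status == 401


def test_unknown_user_is_rejected():
    assert handle_login({"username": "bob", "password": "x"}).status == 401


def test_missing_password_is_rejected():
    assert handle_login({"username": "alice"}).status == 400
```

Each question from the login form (is the field filled in, does the user exist, is the password correct) becomes one small test, and none of them needs a browser.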
The bottom layer
This layer is essentially small pieces of test code verifying different functions. For example, we can verify that the input for the username or password cannot exceed a certain length, that special characters are handled correctly, etc. There is basically no interaction between the various components here, which in turn improves the speed of the tests even more. Maintenance is rare and easy, as it is only needed when the component at hand changes. Comparing once again to the previous layers, we observe the following:
- No integration
- Not actual user behavior
- Very fast feedback
- Not fragile
Even further gains in terms of speed and maintenance effort can be observed here, but at the same time we lose integration coverage.
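A minimal sketch of such bottom-layer tests, written in Python, might look like this. The function `validate_username`, the limit of 32 characters, and the allowed character set are assumptions chosen purely for illustration; the actual rules depend on the system being tested.

```python
import re

MAX_LENGTH = 32  # assumed limit, for illustration only
ALLOWED = re.compile(r"[A-Za-z0-9_.-]+")  # assumed character policy


def validate_username(username: str) -> list:
    """Hypothetical validator: returns a list of problems, empty if valid."""
    errors = []
    if not username:
        errors.append("empty")
    elif not ALLOWED.fullmatch(username):
        errors.append("invalid characters")
    if len(username) > MAX_LENGTH:
        errors.append("too long")
    return errors


def test_valid_username_has_no_errors():
    assert validate_username("alice_01") == []


def test_empty_username_is_reported():
    assert validate_username("") == ["empty"]


def test_length_limit_is_enforced_at_the_boundary():
    assert validate_username("a" * MAX_LENGTH) == []
    assert validate_username("a" * (MAX_LENGTH + 1)) == ["too long"]


def test_special_characters_are_reported():
    assert validate_username("al ice<script>") == ["invalid characters"]
```

Tests like these touch a single function, need no running services, and typically only change when the validation rules themselves change.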
In the end we do want to put all 3 layers to use. We want to gain the benefits from all while avoiding the heavy impact of their drawbacks.
Building the Pyramid
How do we turn this approach into practice? We ask ourselves which test cases we should focus on and then we look where these pieces fit within our pyramid.
What cases should be tested?
Simply put: all the different ones. All that differ in functionality or expected outcome. Let’s use as an example something many games have — quests. On the bottom test layer, you can check that the conditions are set up as desired. On the middle test layer, you could check that the conditions can be met with the right user actions. On the top layer, you would check that everything is displayed correctly in the game itself.
Does it make sense to have a test for every single quest? Absolutely not. It would make sense to cover the different quest conditions, the different ways to complete the conditions, and the different interactions the user can have with them. For example, in a strategy city builder you may have a condition to gather a certain amount of an in-game currency.
This condition could be met in multiple ways:
- Collecting a production
- Picking up a reward
- Purchasing the currency
- Etc.
In addition quests might allow certain actions toward them:
- Skipping/aborting them
- Collecting their reward
- Etc.
It makes sense to have tests for all of these items, because they’re different test cases. It would not make sense to have tests for all game quests with this condition because the functionality and expected result would be the same.
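To make the idea concrete, here is a small Python sketch of one test per way of meeting the condition, rather than one test per quest. The `GatherCurrencyQuest` and `Player` classes and their method names are hypothetical, invented for this example; they only model the currency condition described above.

```python
class GatherCurrencyQuest:
    """Hypothetical quest whose condition is gathering a target amount."""

    def __init__(self, target: int):
        self.target = target
        self.gathered = 0

    def on_currency_gained(self, amount: int) -> None:
        self.gathered += amount

    @property
    def completed(self) -> bool:
        return self.gathered >= self.target


class Player:
    """Hypothetical player with several ways of gaining currency."""

    def __init__(self, quest: GatherCurrencyQuest):
        self.currency = 0
        self.quest = quest

    def _gain(self, amount: int) -> None:
        self.currency += amount
        self.quest.on_currency_gained(amount)  # every gain counts for the quest

    def collect_production(self, amount: int) -> None:
        self._gain(amount)

    def pick_up_reward(self, amount: int) -> None:
        self._gain(amount)

    def purchase_currency(self, amount: int) -> None:
        self._gain(amount)


# One case per distinct way of meeting the condition, not one per quest.
WAYS = ["collect_production", "pick_up_reward", "purchase_currency"]


def test_every_way_of_gaining_currency_completes_the_quest():
    for way in WAYS:
        quest = GatherCurrencyQuest(target=100)
        player = Player(quest)
        getattr(player, way)(100)
        assert quest.completed, way


def test_partial_progress_does_not_complete_the_quest():
    quest = GatherCurrencyQuest(target=100)
    Player(quest).collect_production(99)
    assert not quest.completed
```

Adding a second quest with the same condition but a different target would not add a new test case here, because the functionality and expected result stay the same.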
On what layer should the coverage be?
One way to answer this is by asking yourself what it is you want to test, and whether this functionality can be checked on a lower test layer. Just keep in mind you cannot push everything down a layer. If you want to see the UI correctly displaying a green check mark when the quest has been completed, you may not be able to do that on any layer but the top one. If the server only sends the information that the quest is completed and the client is meant to react to it, you need to verify this in the UI.
But if you simply want to check how the quest responds to a certain action, you can fit this into a lower test layer.
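As a hedged example of such a lower-layer check, the following Python sketch tests how a quest responds to being aborted or having its reward collected. The `Quest` class, its states, and the rule that a reward can only be collected once are assumptions made for the example, not a description of any real game.

```python
from enum import Enum


class QuestState(Enum):
    ACTIVE = "active"
    COMPLETED = "completed"
    ABORTED = "aborted"
    REWARDED = "rewarded"


class Quest:
    """Hypothetical quest with a small set of allowed actions."""

    def __init__(self, reward: int):
        self.reward = reward
        self.state = QuestState.ACTIVE

    def complete(self) -> None:
        if self.state is QuestState.ACTIVE:
            self.state = QuestState.COMPLETED

    def abort(self) -> None:
        if self.state is not QuestState.ACTIVE:
            raise ValueError("only an active quest can be aborted")
        self.state = QuestState.ABORTED

    def collect_reward(self) -> int:
        if self.state is not QuestState.COMPLETED:
            raise ValueError("reward is only available after completion")
        self.state = QuestState.REWARDED
        return self.reward


def test_completed_quest_pays_out_its_reward():
    quest = Quest(reward=50)
    quest.complete()
    assert quest.collect_reward() == 50
    assert quest.state is QuestState.REWARDED


def test_reward_cannot_be_collected_twice():
    quest = Quest(reward=50)
    quest.complete()
    quest.collect_reward()
    try:
        quest.collect_reward()
    except ValueError:
        pass  # expected: the reward was already collected
    else:
        raise AssertionError("second collection should fail")


def test_aborted_quest_gives_no_reward():
    quest = Quest(reward=50)
    quest.abort()
    assert quest.state is QuestState.ABORTED
    try:
        quest.collect_reward()
    except ValueError:
        pass  # expected: aborted quests have no reward
    else:
        raise AssertionError("aborted quest should not pay out")
```

None of these checks needs the game client. Only the question of whether the green check mark is actually displayed would remain for the top layer.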
Another way to try to answer this question is to consider ‘when’ you can do a test. In general, the following applies to the timing of the different test layers:
- The tests on the bottom layer can be done already during the development. They can even be done before development and used to verify the system behavior as it’s being built.
- The tests on the second layer require different components to be already in place. Therefore these become a possibility a bit further down the road.
- The tests on the top layer need most things ready and running. These come even later than the other layers and often closer to the end of development.
The timeline could help you figure out what’s the most suitable layer for your tests.
In case you need further ideas, you can always dissect your test case into the simplest possible steps. Then check if you want to cover those individually on a lower layer, integrated on a higher layer, or if you want to go for coverage for both.
Balancing the Pyramid
“I have tests, what now?” Keep writing. Teach others how to do it. Tests are especially helpful for the long-term stability and quality of your system. Make sure you’re writing tests which provide value, and push them as far down the pyramid as you can. But don’t simply exclude any test layer; they all serve their purpose. Find the right balance, one which does resemble a pyramid.
Discuss the coverage among the team whenever possible. Share the ownership and responsibility. Take input from others and let everyone contribute because they might have cases in mind you haven’t thought of.
Avoid test duplication. If a test case exists on one layer, it doesn’t need to be covered on other layers. If your system already suffers from duplicate tests, add coverage of additional functionality in your tests, to make them unique. Rather than throwing away your work, re-purpose it.
Lastly, don’t put aside tests because they require effort. All good things take time. You’re investing in the future, because you will be validating correct behavior every time you touch the system again. Whether it’s a running project or a new one, there is always room for, and benefit from, writing new tests.
Summary
We can have tests for the same system on the different test layers. In fact, not only can we do that, but we should do that.
Bottom layer tests are super fast and cheap and they give us direct feedback as we’re working on something. We should have many of these and push as many as we can toward that layer.
What the bottom layer tests are lacking is integration of the different components, and this is where we introduce the second test layer. The tests here should validate the connectivity of the various aspects we’re working with.
We should still have some top-layer end-to-end tests. With these we verify not only that the functionality and services are working as expected, but also that the display behavior meets our expectations. We should, however, try not to start with them but consider them a last line of defense.
Doing all of this is an investment in the long-term quality and health of a project. It will verify the system is being built in the right way, create coverage for potential regression issues, and bring fast and precise feedback about where and how things break. Most importantly, it will free up time for the team and give them the opportunity to build even more quality into the system.