In the world of test design we like to focus heavily on following a specified pattern by which we create our tests. In a perfect world we would be able to follow the rules we have set. However, in real-life working environments you will occasionally find that there is no good solution that fits. This blog considers 8 ‘golden rules’ and looks at if, why and when we break our own rules.
Step Definitions should not use “back doors” to insert or otherwise manipulate data
Use existing, established mechanisms (API, import capabilities, services, etc.) rather than, for example, direct database updates.
Otherwise you are a) bypassing business logic (which may change in the future) and b) risking that subsequent tests only pass because of the (incorrect) way you entered data via the back door.
So now that we have established the first rule, let’s think about when it is acceptable to break it. The guide we adhere to is as follows:
1. Check if an API call, import capability or service within the system allows the change you’re attempting to make.
2. Check if a database stored procedure exists that handles the change you’re trying to make.
3. If neither exists, then consider making the change with a direct database update.
This order of priority ensures that we avoid bypassing business logic unless absolutely necessary. Let’s look at one of our own examples where this was the case.
We want to be able to replay old data through an interface to observe the results with some new code. The task relies on a boolean value in the database to determine which records should be reprocessed. Because there is no API call or other system method to replay this old data, we do a direct database update to this value so that only the records we want to check are included. Doing this allows us to accurately target our tests; it also removes the risk of including any clutter inherited from the database dump, while keeping our tests fully repeatable.
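As a rough illustration, a step binding for that kind of targeted update might look like the sketch below. The table, column and configuration names are hypothetical, and the data-access details would depend on your own framework; the point is simply that the SQL is confined to one clearly named step.

```csharp
using System;
using System.Data.SqlClient;   // or Microsoft.Data.SqlClient on newer stacks
using TechTalk.SpecFlow;

[Binding]
public class ReplayDataSteps
{
    // Hypothetical helper that resolves the connection string for the current environment.
    private readonly string _connectionString = TestConfig.GetConnectionString();

    [Given(@"only trades received before ""(.*)"" are flagged for replay")]
    public void GivenOnlyTradesReceivedBeforeAreFlaggedForReplay(string cutOffDate)
    {
        // Deliberate rule-break: no API or stored procedure exists for this,
        // so the replay flag is set with a direct database update.
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(
            "UPDATE dbo.Trade " +
            "SET RequiresReplay = CASE WHEN ReceivedDate < @cutOff THEN 1 ELSE 0 END",
            connection))
        {
            command.Parameters.AddWithValue("@cutOff", DateTime.Parse(cutOffDate));
            connection.Open();
            command.ExecuteNonQuery();
        }
    }
}
```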
Separate what the tests do… from the environment in which they do it.
If a test starts up the system it wants to test, you limit the ability to run the tests against other environments.
For example, running tests against a UAT environment (as part of environment proving), or against a pre-prod environment (as part of load testing), etc.
Our setup is designed to work for all environments and does not need a specific configuration per environment. All setup steps are decoupled from the environment type and can therefore be run anywhere. This is achieved by making our tests aware of which environment they have connected to: they interrogate the system session and grab the information they need.
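A minimal sketch of that idea is below; the SystemSession and TestConfig helpers are hypothetical stand-ins for however your own suite connects to the system under test.

```csharp
using TechTalk.SpecFlow;

[Binding]
public class EnvironmentHooks
{
    // Populated once per test run and read by every step definition,
    // instead of carrying environment-specific configuration around.
    public static EnvironmentInfo Environment { get; private set; }

    [BeforeTestRun]
    public static void DiscoverEnvironment()
    {
        // Connect to whatever the run was pointed at and ask the system
        // what it is, rather than hard-coding per-environment behaviour.
        var session = SystemSession.Connect(TestConfig.BaseUrl);   // hypothetical helpers

        Environment = new EnvironmentInfo
        {
            Name         = session.EnvironmentName,   // e.g. "UAT", "PreProd"
            ApiBaseUrl   = session.ApiBaseUrl,
            DatabaseName = session.DatabaseName
        };
    }
}

// Plain data holder describing the environment the tests found themselves in.
public class EnvironmentInfo
{
    public string Name { get; set; }
    public string ApiBaseUrl { get; set; }
    public string DatabaseName { get; set; }
}
```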
Tests should be self-describing
“This test is so outside the box, I can't – I mean WON'T – even tell you what you are looking for. You will know it when you find it.”
To avoid tests that are difficult to understand, it is important to remember the following; otherwise, as the quote suggests, you won’t know what the test’s goal is or what it is achieving.
You shouldn’t need to be a developer to work out what is going on
If you need to “look at the code” to understand what’s going on, it’s not right.
Don’t hide “control” data in spreadsheets
Some comments for “context” are ok, but too many comments in a feature file are symptomatic of a poorly expressed step language.
The main instance where we break this is when we are running direct SQL, as seen in the previous rule. We mitigate this by giving the scenario a name that clearly describes what the SQL will achieve, so the test remains as readable as possible.
Prefer tables in SpecFlow to external spreadsheets
It’s hard to read a test if you have to keep referring to a spreadsheet to figure out what it is doing
Spreadsheets are hard to merge/compare
It’s hard to describe a change in behaviour for an enhancement (“cell AC4 should now be …”)
As you can imagine, common sense applies here. If we are importing a small set, say 10-20 rows, into the underlying system, it is probably best to do it within a SpecFlow feature file. However, if you are trying to upload 40,000 rows of price data, you probably don’t want a feature file that is largely illegible and filled with all those rows, so it might be better to put the data in an external document that is referenced from your scenario.
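For the small-data case, something like the following sketch keeps the rows visible in the feature file itself; the Price type and PriceUploadService are hypothetical, standing in for whatever real import mechanism your system exposes.

```csharp
using System.Linq;
using TechTalk.SpecFlow;

[Binding]
public class PriceUploadSteps
{
    [Given(@"the following prices have been uploaded")]
    public void GivenTheFollowingPricesHaveBeenUploaded(Table table)
    {
        // The data lives in the feature file, e.g.:
        //   | Instrument | Value  | Currency |
        //   | VOD.L      | 215.40 | GBP      |
        //   | BP.L       | 478.10 | GBP      |
        var prices = table.Rows.Select(row => new Price
        {
            Instrument = row["Instrument"],
            Value      = decimal.Parse(row["Value"]),
            Currency   = row["Currency"]
        }).ToList();

        // Hypothetical wrapper around the system's real import mechanism.
        PriceUploadService.Upload(prices);
    }
}

// Simple data holder used by the sketch above.
public class Price
{
    public string Instrument { get; set; }
    public decimal Value { get; set; }
    public string Currency { get; set; }
}
```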
A series of small steps is better than one “big” step
Suppose I have an operation that consists of a sequence of actions A, B, C, D, E.
If I have one step that does all of this in one go then:
When the step fails, it makes it hard to see which action actually failed (see #8 below)
It is hard to re-use the underlying actions in a different sequence/scenario
This should always be done, if only for this last point. If every operation has its own bespoke test step, that might look neat, but you’re going to be repeating your work over and over. Your life will be much easier - and your automated test tool much more easily expanded and long-lived - if you are able to reuse existing steps for new scenarios. Imagine your business has several processes involving varying types of price upload - perhaps verifying prices in the system, a valuation process and exposure reporting. You could create the following steps:
When I verify prices
When I value the portfolio with latest prices
When I report all exposures
All of which have a price upload ‘under the hood’, which you then have to build into each and every new test you create. If you split this out and built an “And I upload the following prices” step, you would have done the work once and used it three times already.
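Roughly, that shared step would be bound once and then composed with the other, thinner steps in any scenario that needs prices in place first. The sketch below reuses the hypothetical Price type and upload service from the earlier example.

```csharp
using TechTalk.SpecFlow;
using TechTalk.SpecFlow.Assist;

[Binding]
public class PortfolioSteps
{
    // Small, reusable steps that each scenario composes as needed, e.g.
    //   Given I upload the following prices
    //     | Instrument | Value  | Currency |
    //     | VOD.L      | 215.40 | GBP      |
    //   When I value the portfolio with latest prices

    [Given(@"I upload the following prices")]
    public void GivenIUploadTheFollowingPrices(Table prices)
    {
        // One shared upload, used by the verification, valuation and exposure scenarios.
        PriceUploadService.Upload(prices.CreateSet<Price>());   // hypothetical service, Price as before
    }

    [When(@"I value the portfolio with latest prices")]
    public void WhenIValueThePortfolioWithLatestPrices()
    {
        // No hidden upload in here; this step only triggers the valuation itself.
        ValuationService.RunValuation();                        // hypothetical service
    }
}
```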
No business logic in step definitions
Step definitions should call existing code (logic), not introduce their own.
They should not “simulate” the logic that we “know” is performed in the system.
If the logic is in the system… call it
If you can’t call it (because the system is not architected to be tested in this way… e.g. business logic in a GUI), then lobby for it to be changed.
The greatest struggle here is this last point. You’re working to a project deadline and there’s an operation in a new system feature you need to test. You can call system logic to carry out steps one to three, but step four relies upon the GUI to be initiated. You ask for it to be changed, but the development resource can’t be spared. You could stay true to your ruleset, raise test tech debt to be covered at a later date and leave the feature untested. However, let’s be pragmatic for just a moment: even development tech debt is a hard sell when weighed against another genuine feature for the user (a more stable foundation to build upon vs a sexy new button?). Furthermore, a user may perceive testing tech debt to be an inefficiency of the testing rather than a worthwhile architecture improvement, which means your new item will be prioritised right after the entropic heat death of the universe.
In this instance, it might be worth adding that little bit of logic to cover as much of that operation as possible - the important thing is that it is a conscious decision made by both tester and user that this leaves a potential blind spot.
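In code terms, the distinction looks roughly like the sketch below: the step hands off to whatever the system already exposes (a hypothetical TransportCostApi here) rather than re-deriving the answer inside the binding.

```csharp
using System;
using NUnit.Framework;
using TechTalk.SpecFlow;

[Binding]
public class TransportCostSteps
{
    private decimal _actualCost;

    [When(@"the transport cost is calculated for delivery on ""(.*)""")]
    public void WhenTheTransportCostIsCalculatedForDeliveryOn(string deliveryDate)
    {
        // Call the system's own logic (a hypothetical API/service here)...
        _actualCost = TransportCostApi.Calculate(DateTime.Parse(deliveryDate));

        // ...rather than re-implementing the pricing rules in the binding, e.g.
        //   _actualCost = baseRate * transitDays * fuelSurcharge;   // "simulated" logic - avoid
    }

    [Then(@"the transport cost should be (.*)")]
    public void ThenTheTransportCostShouldBe(decimal expectedCost)
    {
        Assert.AreEqual(expectedCost, _actualCost);
    }
}
```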
Try to turn off asynchronous or time-triggered functionality
Asynchronous components make tests fragile … you need to “wait” or “poll” to determine if something has happened.
Where possible, control the triggering in the test.
E.g. If a job runs every X minutes, then disable the scheduling and introduce a step definition that allows you to explicitly run it when you want.
Relying on services like the above couples your test to another system in a way that generates nasty dependencies - did your test fail, did the service fail, or is there a bug in the way they communicate? The above, however, makes a dangerous assumption: that explicitly triggering the service under test is functionally identical to letting it run by itself. Let’s say your system allows you or your user to run an operation that performs some action and updates something accordingly - perhaps logistical information where the cost of transporting goods is calculated based on user-entered dates. The user decides they also want it to run automatically whenever they make a change.
So your dev takes a look at the existing code and decides it can be improved, or it’s a fragile part of the system, etc. - and builds a separate automatic version that uses 80% of the same code. Is triggering it the old way a real test of the service, if they aren’t identical? Perhaps the only way to truly test the automatic part is to let it trigger itself. See the system architecture note above…
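Where the scheduled and explicit paths genuinely do share the same code, a pair of steps along the following lines keeps the test in control of timing; the SchedulerAdminApi and job name are hypothetical.

```csharp
using TechTalk.SpecFlow;

[Binding]
public class ScheduledJobSteps
{
    [BeforeScenario("manualTrigger")]
    public void DisableScheduler()
    {
        // Stop the timed job firing mid-test (hypothetical admin endpoint).
        SchedulerAdminApi.Disable("RecalculateTransportCosts");
    }

    [When(@"the transport cost recalculation job runs")]
    public void WhenTheTransportCostRecalculationJobRuns()
    {
        // Explicitly invoke the same job the scheduler would have run,
        // so the test decides exactly when it happens.
        SchedulerAdminApi.RunNow("RecalculateTransportCosts");
    }
}
```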
When a test fails, it should be obvious why it has failed
Tests that do lots of things in one go are great when things are working (one test checks multiple things – aren’t we clever), but when the test fails you don’t really know why.
The real value of a test is when it catches a subtle error 5 years from now, when none of the original team is still around. Will that future team be able to work out what has actually failed?
At the risk of spamming you with another (short-ish) blog entry… see Who are you writing tests for?
There’s never really a reason to diverge from this. It might be easier to write the test at the beginning by combining multiple elements in one, or by not naming the test scenarios or data in an intuitive way - but when you look at that test again when it picks up a regression in a week, a month, or 3 years when you’ve left and come back again, you’ll wonder what the hell you were thinking the first time. The main ‘drawback’, or rather barrier to entry, of automated testing is the upfront cost versus just testing it manually. If a test makes it obvious what has failed a year after it was written, to someone who didn’t even write it - that’s where the real value lies. A well written test ages like a fine wine.
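One small habit that supports this: keep each Then step focused on a single check, and make the assertion message explain the failure in business terms. A hedged sketch, with a hypothetical report accessor and properties:

```csharp
using NUnit.Framework;
using TechTalk.SpecFlow;

[Binding]
public class ExposureReportSteps
{
    [Then(@"the exposure report should contain (\d+) positions")]
    public void ThenTheExposureReportShouldContainPositions(int expectedCount)
    {
        var report = ExposureReportApi.GetLatest();   // hypothetical accessor

        // One focused check, with a message a future reader can act on
        // without having to open the step definition code.
        Assert.AreEqual(expectedCount, report.Positions.Count,
            $"Expected the latest exposure report to contain {expectedCount} positions " +
            $"but found {report.Positions.Count} (report generated at {report.GeneratedAt}).");
    }
}
```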