This is the first in a series of postings about the challenges of test automation with Openlink's Endur/Findur platform.

Before we look at solutions, we should step back and see what problem we are trying to solve.

A good place to start is to reflect on where defects have historically arisen, and then formulate a plan to improve things next time around. Here, then, are the results of a review of more than 500 defects raised during an Endur European Power implementation... and the lessons we learned.

In a future post we'll look at how to apply those lessons.

Four categories, starting with the least common.

#4 Core Code Defects

Least common, but, boy, do they hurt.

Endur/Findur Core code defects were the least frequent bugs, although they felt painful at the time; not least because you can feel helpless waiting for the Openlink machine to slowly crank out a fix.

You can't prevent 'em, so the secret is to find 'em early. "Early" gives you time to find an alternative solution. "Early" gives you time to wait for the fix. So "spike" in areas of high risk, or areas where the solution is particularly constrained. There are lots of ways to do deal entry (and lots of ways to get yourself out of trouble) so that's not high risk... but if product control demand a particular option model then get some examples running early; if it doesn't work, you're staring at a big delay while you wait for the fix.

Ask around. Before you start. You're almost certainly not the first to be doing this, and if you're not sure which are the high risk areas, speak to someone who is.

#3 Simple Programming Errors

The least common class of defect of our own making was the logic error: some futures cascade logic that calculated the net position incorrectly, an invoice generation script that couldn't handle a clock change, and so on.

No excuses. Nothing wrong with the core code. The developer just screwed up. It happens.

And yet in this project (with woefully inadequate unit testing, as it happens) simple programming errors accounted for less than 20% of all defects.

All of which is a shame because this is the area of code that we know best how to control. The software development community has a large bag of tricks to address this; not least comprehensive unit testing.

So unit test by all means. In fact, do add unit tests. Just be aware that 80% of defects in an Endur implementation probably lie elsewhere...
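As an illustration of the kind of logic error a plain unit test does catch, here is a minimal Python sketch of the clock-change case mentioned above. Everything in it is hypothetical: the function names, the flat-profile invoice, and the assumption that the market follows the EU summer-time rule (clocks change on the last Sunday of March and of October). It is not taken from the project's code.

```python
from datetime import date, timedelta


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    first_of_next = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    last_day = first_of_next - timedelta(days=1)
    # weekday(): Monday is 0 and Sunday is 6, so this steps back to a Sunday.
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def hours_in_delivery_day(day: date) -> int:
    """Local hours in a delivery day, assuming the EU summer-time rule."""
    if day == last_sunday(day.year, 3):
        return 23  # clocks go forward: one hour is skipped
    if day == last_sunday(day.year, 10):
        return 25  # clocks go back: one hour is repeated
    return 24


def invoice_amount(day: date, volume_mw: float, price_per_mwh: float) -> float:
    """Hypothetical flat-profile invoice: MW x hours x price."""
    return volume_mw * hours_in_delivery_day(day) * price_per_mwh


def test_invoice_handles_clock_changes() -> None:
    assert invoice_amount(date(2023, 3, 26), 10.0, 50.0) == 11_500.0  # 23 hours
    assert invoice_amount(date(2023, 10, 29), 10.0, 50.0) == 12_500.0  # 25 hours
    assert invoice_amount(date(2023, 6, 15), 10.0, 50.0) == 12_000.0  # 24 hours


test_invoice_handles_clock_changes()
```

A script that hard-codes 24 hours per day would fail the first two assertions, and it would do so without Endur being involved at all. That is what makes this class of defect the easy one.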

#2 Endur Usage Errors

The logic was sound, but you got the value from the wrong field, the wrong leg, the wrong result....

The majority of coding errors were, in fact, caused by incorrect usage of Endur, whether of the API or of database access. The logic was fine, and you can't blame Openlink if the developer used the API incorrectly.

How is this different to simple programming errors?

Simple difference; but crucial.

You can't catch these defects through conventional unit tests. A conventional unit test would mock the interaction with Endur, so the test could be run in isolation. But the majority of the defects were caused precisely because of this interaction; mock that away and you mock the problem away.
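To make the point concrete, here is a small Python sketch of a mocked test that passes despite a "wrong leg" error. The `get_notional` method and the `FakeDeal` class are invented for illustration; they are not part of the Endur API, and the real interaction would look different.

```python
from unittest.mock import Mock


def floating_leg_notional(deal):
    """Intended to return the notional of the floating leg (leg 1 here).

    Deliberate usage error for this example: it reads leg 0 instead.
    """
    return deal.get_notional(leg=0)


def mocked_unit_test_passes() -> bool:
    """A conventional unit test: the deal is mocked, so any leg 'works'."""
    deal = Mock()
    deal.get_notional.return_value = 5_000.0  # same answer whatever the leg
    return floating_leg_notional(deal) == 5_000.0


class FakeDeal:
    """Hypothetical stand-in for a real deal, with a different value per leg."""

    def __init__(self, notionals_by_leg):
        self._notionals_by_leg = notionals_by_leg

    def get_notional(self, leg):
        return self._notionals_by_leg[leg]


# The mocked test is green even though the function reads the wrong leg.
print(mocked_unit_test_passes())

# Data that distinguishes the legs exposes the error straight away.
deal = FakeDeal({0: 1_000.0, 1: 5_000.0})
print(floating_leg_notional(deal) == 5_000.0)
```

The mock answers the same way for every leg, so the test cannot tell the right call from the wrong one. Only a test that runs against something with real per-leg data, ideally Endur itself, can do that.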

And the winner is ...

#1 Configuration Errors

Configuration defects were the root cause of almost half of all defects found, either resulting directly in a failure or causing the subsequent failure of some code (a failure that could be attributed to the configuration change).

This should not be surprising for two reasons:

A large proportion of the project is delivered through configuration; we should expect a similarly large proportion of the defects.
Endur configuration is very flexible. Too flexible. It's all too easy to enter the "wrong answer" or break something that was working with no immediate feedback.

And ask yourself the following questions:

You have automated tests for your code. What do you have for configuration?
You have version control for your code. What do you have for the configuration?
The people that change code are used to configuration management. What about the people that change configuration?
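One possible answer to the version control question, sketched in Python under a big assumption: that the configuration can be exported into a flat key-value snapshot. The setting names and values below are made up, and how you would extract such a snapshot from Endur is not covered here.

```python
import json


def canonical(snapshot: dict) -> str:
    """Serialise a snapshot deterministically so it diffs cleanly in version control."""
    return json.dumps(snapshot, sort_keys=True, indent=2)


def diff_config(baseline: dict, current: dict) -> dict:
    """Return {setting: (old, new)} for every setting that differs."""
    keys = sorted(baseline.keys() | current.keys())
    return {
        key: (baseline.get(key), current.get(key))
        for key in keys
        if baseline.get(key) != current.get(key)
    }


# Hypothetical snapshots: the baseline is what was last reviewed and committed.
baseline = {"settlement_calendar": "CAL_A", "price_index": "POWER_BASE"}
current = {"settlement_calendar": "CAL_A", "price_index": "POWER_PEAK", "new_flag": "Y"}

for setting, (old, new) in diff_config(baseline, current).items():
    print(f"{setting}: {old!r} -> {new!r}")
```

Committing the canonical text after each reviewed change gives configuration a history and a reviewable diff, which is most of what version control gives code.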

Lessons Learned

Core Code

Aggressively explore high risk areas at the start of the project. Don't passively wait for them to come and find you.

Unit testing

Do it. But what's your plan for the remaining 80%?

Automated Endur testing

The bulk of the defects will lie in your interaction with Endur. Don't mock the problem away; embrace it with automated tests that cover the full interaction with Endur.

Automated Configuration Testing

Configuration is at least 50% of the solution and deserves 50% of your attention. Your automated tests should include the configuration on which your success depends.
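What might an automated configuration test look like? Here is a minimal Python sketch, assuming the configuration has been exported into a dictionary. The snapshot layout, the `deal_templates` and `indexes` keys, and the rule being checked are all hypothetical; the point is only that configuration rules can be written down and run like any other test.

```python
def check_config(snapshot: dict) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    known_indexes = set(snapshot["indexes"])
    for name, template in sorted(snapshot["deal_templates"].items()):
        # Rule: every deal template must price off an index that actually exists.
        if template["index"] not in known_indexes:
            problems.append(
                f"template {name!r} references unknown index {template['index']!r}"
            )
    return problems


# Hypothetical snapshot with one broken reference.
snapshot = {
    "indexes": ["POWER_BASE", "POWER_PEAK"],
    "deal_templates": {
        "base_forward": {"index": "POWER_BASE"},
        "peak_forward": {"index": "POWER_PEEK"},  # typo left in on purpose
    },
}

for problem in check_config(snapshot):
    print(problem)
```

Run after every configuration change, a check like this supplies the immediate feedback that Endur itself does not give when someone enters the "wrong answer".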

Treat Configuration With The Same Discipline As Software

It's as important as the software and needs to be treated with the equivalent discipline. If your software control is better than your config control, it's time to up your game with config.

Help is at hand...

We'll return with some potential solutions and how to apply these lessons in part 2.