2015
03.19

Basic expectations for acceptance test framework

In a previous post I described the story of working with acceptance tests and how I ran into problems that motivated me to create the LightBDD framework. In another post I wrote about different types of tests, describing in particular the nature of acceptance and end-to-end tests. Now I would like to focus on my observations regarding the requirements for a framework that allows developers to work on behavioral tests effectively.

Basic requirements

While working in different companies I have realized that expectations for acceptance tests and testing frameworks depend on the company size and its culture. The first team in which we started looking into improving our testing tools was part of a small company with a very informal culture. The Product Owner and Quality Assurance were dedicated to our team and paired with us to formulate scenarios that fulfilled their expectations but also fit the system architecture. They were interested in how our acceptance and end-to-end tests looked. Both kinds of tests had only one purpose at that time – to ensure that our software works fine.

That was the time when we realized that tests written in SpecFlow were too difficult to maintain (I have described the reasons previously). We started asking ourselves what we really need from a testing framework.

Clear tests

The first set of questions was related to the fact that we were receiving requirements from PO/QA in the form of business scenarios. We wanted to be able to quickly respond to PO/QA questions like:

Is this scenario already covered by tests?

What is this test checking exactly?

We thought that the best option would be to model our tests so that they preserve the nice given-when-then structure that PO/QA were preparing for us. If our tests reflected the provided scenarios as closely as possible, they would be easy to present to PO/QA but also easy for developers to read and understand.

Maintainability

Knowing the maintenance problems related to tests written in frameworks like SpecFlow or FitNesse, we realized that maintainability was a crucial requirement for a testing framework. At that point we also knew that it is a tricky problem, because maintainability issues reveal themselves only after a longer period, when the project grows a bit. It is safe to say that a project consisting of one scenario written in any testing framework looks easy to maintain, but would it be the same with 30 different scenarios? What if there are even more? All projects evolve (unless they are dead), and so do the tests. Some scenarios are no longer applicable and get removed, some are added, while others are extended or shrunk by a few steps. Finally, some scenarios may become more precise or more general, so their steps are just altered.

All of those changes brought up the following questions that we started considering in our design decisions:

How easy would it be to add a new scenario?

How easy would it be to add or remove steps in any given scenario?

How easy would it be to rename scenarios or steps?

If scenarios are removed, how easy would it be to clean up methods that are no longer used by any scenario?

How easy would it be to restructure and reorganize the test suite?

If a project has 5, 30 or 100 scenarios, how long would it take to apply those changes to all of them?

By "how easy" we mean:

  • how many manual steps have to be taken by a developer / PO / QA in order to apply a change?
  • do all of those steps have to be applied in one place / project / location / repository, or do they have to be made in different places?
  • how long would it take to apply such a change?

Clean code

Maintainability does not refer only to changing code. It is also about:

  • understanding of existing tests by new people in the team,
  • investigating why they are failing,
  • checking which scenarios are still valid after requirements change.

It brings the following questions to be answered:

How easy would it be to understand how a given scenario works?

Is it possible to analyze the scenario flow without debugging it?

How easy would it be to debug a given scenario?

We wanted to have a framework that:

  • does not require using literals with regular expressions everywhere,
  • does not generate a bunch of files with unreadable code,
  • does not use loose binding between scenarios and underlying methods,
  • does not require usage of static contexts or any complex constructs to pass state between scenario methods,
  • has an intuitive behavior,
  • is easy to navigate with Visual Studio.

Traceability

Previously, I have mentioned that acceptance tests cover a much wider scope than unit tests. While investigating failed acceptance or end-to-end tests we have often been asking questions like:

At what stage of the test did the scenario fail?

Which operation performed on the GUI failed the scenario?

Which component on the end-to-end journey behaved incorrectly?

We wanted a framework that would allow those questions to be easily answered at first glance, without spending minutes on analyzing logs and stack traces.

Execution progress monitoring

Acceptance tests are slow. End-to-end tests are even slower. All of us have spent so much time staring at TeamCity, waiting for tests to finish in order to close a ticket, release the project to production or finally go home leaving the board green. So many times it occurred that some of those tests were broken, causing the whole build to fail. Those failing builds took much more time to execute than the 'normal' builds, making the waiting even worse (I have described the reasons for this behavior in the Test characteristics section of my earlier post)… If only we had known what was happening with those tests, we could have immediately detected the issue, stopped the tests, fixed it, rerun them and gone home… Of course, while fixing we were adding more Console.WriteLine() or _log.Debug() statements to the test methods to detect those problems faster next time, but there were always some places where such logging was missing. Also, the practice itself was not good, because it made the whole test code less clear to read and required additional typing.

So, what we really wanted was a framework which would allow us to answer the following questions without any additional developer intervention:

What is the progress of tests that are currently being executed on CI?

Why does the current execution take 2 minutes longer than usual?

What are the currently executed tests doing right now?

Are those tests just slower but still passing, or is something horrible happening with them?

Simple solution is the best one

All of the requirements that I have just described could give the impression that we wanted a very complex, sophisticated framework and that it would take at least a year to build – it was exactly the opposite! The first version of a testing framework that fulfilled all of those requirements consisted of one class with one public method in total. It was quite difficult to even call it a framework…

Within a week, after a few design meetings, we came up with the idea to use the standard NUnit framework with a few conventions to write our acceptance tests:

  • reflect the Given-When-Then scenario name as a test method name,
  • represent each scenario step as a method call in the test,
  • name each step method the same as the corresponding step in the scenario (replacing spaces with underscores),
  • wrap all steps with a RunScenario method, so that step methods can be passed as delegates, which allows omitting brackets and makes it possible to display execution progress,
  • separate all test implementation details from the test by using partial classes.

An example scenario taken from the Wikipedia page:
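Assuming the classic "returned items go back to stock" example from that article, the scenario reads:

```
Scenario: Refunded items should be returned to stock

Given that a customer previously bought a black sweater from me
And I have three black sweaters in stock
When they return the black sweater for a refund
Then I should have four black sweaters in stock
```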

would look as follows:
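A minimal sketch of the resulting test (the feature class and step names below are illustrative, reconstructed from the conventions above rather than copied from the original code):

```csharp
using NUnit.Framework;

[TestFixture]
public partial class Returns_go_to_stock
{
    [Test]
    public void Refunded_items_should_be_returned_to_stock()
    {
        // each step is a method passed as a delegate, in the order defined by the scenario
        BDDRunner.RunScenario(
            Given_that_a_customer_previously_bought_a_black_sweater_from_me,
            And_I_have_three_black_sweaters_in_stock,
            When_they_return_the_black_sweater_for_a_refund,
            Then_I_should_have_four_black_sweaters_in_stock);
    }
}
```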

with the example implementation as follows:
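Again a sketch only, with the step logic deliberately trivialized to a single counter so the example stays self-contained:

```csharp
// Implementation part of the partial class - all the details stay out of the test file.
public partial class Returns_go_to_stock
{
    private int _blackSweatersInStock;

    private void Given_that_a_customer_previously_bought_a_black_sweater_from_me()
    {
        // in a real test this would arrange the sales history of the system under test
    }

    private void And_I_have_three_black_sweaters_in_stock()
    {
        _blackSweatersInStock = 3;
    }

    private void When_they_return_the_black_sweater_for_a_refund()
    {
        _blackSweatersInStock++;
    }

    private void Then_I_should_have_four_black_sweaters_in_stock()
    {
        Assert.That(_blackSweatersInStock, Is.EqualTo(4));
    }
}
```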

The BDDRunner.RunScenario() method was responsible for doing only two things (see the sketch below):

  • executing the step delegates in the provided order,
  • printing each step name before its execution.
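A minimal sketch of such a runner, reconstructed from the description above (the actual first version is preserved in the LightBDD repository history, so treat this as an approximation):

```csharp
using System;

public static class BDDRunner
{
    // Runs the scenario steps in the order they were passed,
    // printing each step name (underscores turned back into spaces) before executing it.
    public static void RunScenario(params Action[] steps)
    {
        foreach (var step in steps)
        {
            Console.WriteLine("STEP: " + step.Method.Name.Replace('_', ' '));
            step();
        }
    }
}
```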

That’s it!
So, how were all the requirements fulfilled? Let's see:

Requirement Solution
Clear tests The conventions we used allowed PO/QA to easily understand the tests, even though they were written purely in code. We were still able to pair and work together on them. We were also able to quickly browse our existing tests to check whether a given scenario was already in place or not.
Maintainability We decided to place all our tests directly in code, representing all feature elements (features, scenarios, steps) with corresponding code constructs like classes and methods. This allowed us to use all the standard developer tools (IDE, ReSharper) and techniques (refactoring, static analysis, running tests from the IDE) to maintain our test code effectively.
Clean code Instead of reinventing the wheel, we decided to use existing tools for the things they already do well. Everybody knew the NUnit framework, how to write tests with it and what behavior to expect from it, so we went with this well known test structure. The convention that we used for structuring our tests gave us better clarity on what a given test is doing. Explicit step execution allowed us to analyze tests quickly and effectively (after all, it is only a matter of navigating to the step method implementation).
Traceability Representing each step as a method with a self-describing name and printing the step name before its execution allowed us to locate and understand scenario failures quicker, by analyzing the exception/assertion stack trace or checking the execution console output, both on CI and in Visual Studio.
Execution progress monitoring Again, because each step name was printed before its execution, we got execution progress monitoring for free. It finally allowed us to track on CI what the current stage of the executed tests was and quickly determine that some of the steps were running longer or failing. Also, because TeamCity added time stamps to the console logs, we could analyze which steps were running longer and focus on their optimization.

LightBDD

I noticed that the small BDDRunner class became very helpful for our team in developing both acceptance and end-to-end tests, so I decided to create an open source project and share it with others. The class that I have described above became the first version of LightBDD – there is a first commit showing how it looked then.

Thank you.

PS. In the upcoming post, I will describe how the requirements changed when I joined a larger company with a corporation-like environment, and how LightBDD evolved into its current form.

2015
03.04

Beyond unit tests

All the projects I have worked on were covered by various test types to ensure that the developed code functions as expected. It is interesting, however, that almost every project had a slightly different combination of test types. I have also noticed that each company named and structured those tests in a different way. Because all of those definitions are a bit blurred, I thought it would be a good idea to take a closer look and describe how the tests were constructed, what their purpose was and what the working experience with them was from the developer's perspective.

Different test types

Below, I have enumerated the most memorable types I have seen:

Type Description
unit tests Definitely the most common and well known test type, well defined by Martin Fowler in his UnitTest bliki article. We used them to test pure business logic in isolation from external dependencies like file system access, database, network etc. They are the fastest tests, as their scope is very small and all external dependencies are mocked.
integration tests I should probably say: application-internal integration tests, as we used them to test all classes responsible for communication with external dependencies like the database, file system etc. within a developed service or application. They have the same small scope as unit tests, but they are much slower.
automated GUI acceptance tests We used those to test the GUI of desktop or web applications with automation tools like Selenium or QTP. In one project they were used to verify business scenarios of a desktop application deployed and configured in a testing environment, so the tests were heavy and slow, as their scope covered the whole application. In another project the tests covered only the thin presentation layer of a web application, where other parts such as back-end services were isolated.
service acceptance tests We used those tests in various projects and companies to verify the behavior of services (HTTP or message based). They were user-scenario specific, usually defined by the Product Owner and/or Quality Assurance. Their scope was a single service or a few services forming a logically autonomous component.
end to end tests Those tests were used in projects focusing on bigger systems, especially with a SOA architecture, and we used them to ensure that all services or components were working together properly. Like acceptance tests, they were scenario based, but the scope was basically the whole system.
manual user acceptance tests Those tests were manually executed by QA to ensure that the application works as expected. Depending on the nature of the software, they were similar to one of the previous three test types.

In fact the list of those was much longer (regression, smoke, load and performance tests, etc.), but I have decided to omit them, as their structure, scope and way of working are similar to the tests mentioned above, and the only difference is the reason why they are created, their function or just a different name.

Beyond unit tests

Apparently unit tests are the most obvious and well known test type that can be spotted in various projects. During my work history those tests were always present (maybe except for the very first projects I worked on). The presence of other test types was not as obvious. I would say that in projects started before the Agile methodology era, most of the tests were manual. The tests were defined in a very loose manner (like: check that the application starts and it is possible to do X), or they were structured as a list of steps to execute and expectations for the execution results. I will skip the manual tests in further parts of this post, as they were usually done by a separate team of testers and had nothing to do with programming itself. I would just like to mention two things about them:

  • manual user acceptance tests were a base for writing their automated versions (later referred to as acceptance tests),
  • the formalized version of manual tests was written in the form of steps and expectations, so in reality it was a precursor of BDD-like tests.

The integration tests were definitely not unit tests, because they were testing integration with external dependencies such as databases. If present, we used them to test all classes interacting directly with externals, like the ones following the repository pattern. In order to run them, we had to have a real database to connect to (if possible, we tried to use an in-memory / file-based database like SQLite to make the tests easier to run) or sample input files to play with. Besides that, those tests were not much different from unit tests. Because of this strong similarity, I will omit them from now on.

Now, if we take a look at GUI tests, it is easy to spot that they are really similar to service tests. The only difference is that GUI tests use the GUI as an interface, while service tests use HTTP or messages as an interface to communicate with the tested service/application. We used those tests to check the application behavior. Usually we followed an approach where the tested application was installed and run in the same way as it would be after the final release, so successfully executed tests gave us proof that the application would behave the same way when installed in production. The assumption that the tested component has to be the same as in production means that during testing we were not altering any program code with mocks. We also tried to use only public and official APIs to run the tests (e.g. GUI, HTTP interfaces, messages, input files, etc.), avoiding direct alterations of internal component state like manually changing data in the database. Of course there were cases where we decided to violate this rule, but usually it was dictated by poorly defined interfaces when the tests were being created for existing software, or by a significant test speedup. It is worth mentioning that tests written in this form are more high-level and much slower than unit tests. Also, they are more behavior specific, focusing on the result of an action, not the way it is achieved.

The last kind, end-to-end tests, were used in projects consisting of multiple autonomous components. Similarly to acceptance tests, we deployed all components in the form in which they would be deployed in production. Obviously, those tests were the slowest ones, because all the tested components had to perform a specific action in order for the whole test to succeed – nothing was mocked there.

I have found interesting the way Martin Fowler has identified those tests by their function:

  • Acceptance tests, covering a list of scenarios that define the behavior of a specific feature (like login, shop basket, etc.),
  • User journey tests, covering all actions that have to be taken from the user perspective in order to achieve a specific goal,

and their scope:

Test characteristics

In comparison to unit tests, which are low-level, focused on a small part of code and fast, those tests:

  • are high-level, business scenario / behavior based,
  • refer to a wide part of the code, covering one or multiple components, hence
  • they are usually much slower to execute.

There are a few interesting consequences of these characteristics. First of all, those are high-level tests, focusing on the behavior of the tested component or the whole system. They implement scenarios, often written in BDD form:

  1. Given an opened login window,
  2. when user enters valid credentials,
  3. and user clicks the login button,
  4. then the login window should close,
  5. and user should successfully log in to the application,
  6. and user account details should be displayed on the screen.

or

  1. Given a sample wav file present in input folder,
  2. when an EncodeFileMessage is sent to Encoding Service with sample file path and MP3 output format specified,
  3. then the Encoding Service should publish a FileEncodedEvent,
  4. and that published FileEncodedEvent should have a path to encoded file in MP3 format.

Those scenarios focus on what is happening in the system, not how it is done, so they usually use the public API of the application to trigger an action and later query / validate its outcome. The scenarios refer to a business feature or a whole user journey, which means that the scope of those tests is much wider than in unit tests, covering a part of a component, a whole one, a few components or even a whole system.

The test scope has a big influence on how those components are tested. If a test refers to only one component, the component:

  • may be started directly from a process that performs a test, or
  • may be deployed into a dedicated testing environment and accessed remotely by the test.

If the scope corresponds to multiple components, it usually means that all of them have to be deployed into a testing environment and configured to communicate with each other. If a component has to be deployed before testing, a dedicated test environment has to be present in order to run such tests. It also implies a time overhead related to component installation, configuration and start-up.

With the most common testing approach, the tested component is executed in a separate process from the test, so the test code communicates with it in an asynchronous manner.

The test scope and asynchronous communication have a big impact on test execution time. Those tests are slow; of course, the execution speed depends on the project, the type of tests and their structure. It may vary from less than a second for a service test to more than a few minutes for an end-to-end test.

A huge factor in execution speed is the way assertions are defined in such tests. They are based on the component's public API, which means that they usually check things like:

  • requested information has been displayed on the screen,
  • a message X has been received,
  • a resource Y became available over HTTP, or
  • a file appeared on an FTP server.

Those assertions are time based. They have to repeatedly check the specified condition up to a defined timeout, because the tests are asynchronous and components require time to process requests in order to fulfill those criteria. When something goes wrong, this type of assertion consumes the full timeout before it fails. It can lead to situations where successful tests take a few minutes to execute, while tests executed on a faulty system could take a few hours until they all fail (this is a real example). It is worth mentioning that the biggest time killers are the assertions checking that a specific condition did not happen, as they always use the whole timeout to succeed. As that kind of test is never good enough (it is always possible that the tested condition happens just after the assertion finishes), we have always tried to eliminate or limit them where possible – usually the same scenarios could be easily covered with unit tests.
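As an illustration, such a time-based assertion usually boils down to a polling helper along these lines (a sketch only; the names and the polling interval are arbitrary):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Wait
{
    // Re-evaluates the condition until it becomes true or the timeout elapses.
    // A passing check returns as soon as the condition is met;
    // a failing one consumes the whole timeout before throwing.
    public static void Until(Func<bool> condition, TimeSpan timeout, string description)
    {
        var watch = Stopwatch.StartNew();
        while (watch.Elapsed < timeout)
        {
            if (condition())
                return;
            Thread.Sleep(250);
        }
        throw new TimeoutException(
            string.Format("Condition '{0}' was not met within {1}", description, timeout));
    }
}

// Example usage (fileStore is a hypothetical client of the system under test):
// Wait.Until(() => fileStore.Exists("output.mp3"), TimeSpan.FromSeconds(30), "encoded file appeared");
```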

To summarize, the nature of acceptance and end-to-end tests makes them significantly distinct from unit tests. In the next post, I will describe how we came up with expectations for a testing framework allowing us to write acceptance tests in an easy manner.
