Wednesday, September 16, 2009

Evolution of unit testing

We take unit testing seriously, and make it a cornerstone of our development process.  However, as I alluded to in earlier post, our approach stretches the traditional meaning of unit testing. With a system that was not designed for testability, often times we have to write a lot of unrelated logic and integrate with external databases and services just to test a ‘unit’.  Further, while the desire would be to practice TDD, what we often practice is closer to Test Proximate Development.  Rather than start with the test first we tend to create tests at approximately the same time or in parallel with our development. Nevertheless, we have evolved and continue to increase the quantity and quality of our tests.  The path to our current methodology may be a familiar one.

Once you start down the dark path, forever will it dominate your destiny

We started out with a code-base that wasn’t necessarily written with testability in mind.  Nonetheless, the system was large enough, complex enough and mission critical enough to require not only testing our changes but that those tests be automated.  If we were to have any hope of introducing the amount of change demanded by the business, at the pace demanded and without introducing excessive amounts of bugs or letting a catastrophic blunder out into the wild, we had to begin building a suite of automated tests.

We made the most obvious choice and began using nUnit to write tests. We’ve used a variety of developer tools to run these tests throughout development, tools like TestDriven.NET, and later ReSharper.  We also set up CruiseControl.NET, which we were already using to automate our builds, to run these tests as part of the continuous integration process.

The biggest challenge, of course, was that there was no clear separation between business logic and data access code.  Therefore, right from the get go, our ‘unit tests’ weren’t unit tests in the purist sense.  They exercised the code, but also required interaction with the backing store.  Further, the majority of tests required a lot of setup data to either already exist in the database or be created prior to the test run in order for the the ‘unit’ of functionality to be testable (e.g. in order to test an Order Adjustment, a Customer, an Order, an Item, and other transactional records all had to exist to create and test an Order Adjustment).  In the beginning this meant that the majority of tests randomly retrieved whatever data was in the test system to test with or otherwise assumed that requisite data would be present allowing the tests to succeed.

That is why you fail

There are a few glaring problems with this approach that quickly exposed themselves. 

  • Tests would fail and succeed erratically. One test might alter test data used by a subsequent test in a way that would make it fail; order and timing mattered.  This left us chasing ghosts, often troubleshooting tests rather than application code.
  • The test database grew indefinitely as test data was dumped into the database on each build but never cleaned.  And builds were frequent.
  • The test database, originally set up on an old un-used PC, saw its performance degrade as the number of tests we wrote increased.  It got to the point where a failed test might take minutes to fix, but an hour to run the test suite to verify the fix.  Often times, after waiting an hour or more, we’d find out another failure had occurred. Fix-and-wait turnaround time became prohibitive.

We tackled these issues as they became productivity drains in no particular order and with no particular strategy.  At first we addressed our exponential data growth and performance problems with solutions barely adequate to get us by, to keep us writing and running tests.

Of course we threw hardware at the problem, incrementally (more memory here, an extra drive there) as problems arose. Eventually we upgraded to a beefy server, but that was much later.  The bulk of our first phase attempts were concentrated on creating a pruning script to be triggered at the beginning of each test run.

Train yourself to let go of everything you fear to lose

The pruning script attempted to clean out all the data created by the prior run.  This script is rather long and complex, recursively traversing from parent tables to child tables to delete in reverse order (all manually written).  You might ask, why not just wipe the database clean and fill it with known data prior to each run?  This was considered but ruled out based on what can be boiled down to:  DELETE statements work regardless of the columns in a table, INSERT statements don’t, which makes pruning a little more resilient to schema changes.  It seemed to me like a dummy data creation script would be much harder to maintain, but others may question that assumption.

Attachment leads to jealousy. The shadow of greed, that is

Co-dependent tests came next.  We began to refactor our tests (as they became problems) to be properly isolated and autonomous.  These test were re-written to create their own setup data, as they should have in the first place.

Having autonomously run-able tests, and more hardware resources, while continuously tweaking our pruning script, allowed us to grow our test suite to more than 2000 tests.  These tests ran in less than 20 minutes.  But of course these solutions were band-aids and living on borrowed time.

At an end your rule is... and not short enough it was

Working toward the elimination of the need for a pruning script, we began requiring that each test not only create its own data but also clean up after itself by removing that same data.  My initial solution was for each class to implement a Purge() method which would recursively call the Purge() methods of its children.  Thereby, each unit test could be wrapped in a try-finally, and within the finally all data created within the test would be purged. 

We wrote a considerable amount of these Purge methods, which encountered some of the same order of execution/referential integrity issues experienced by the pruning script, but they worked more or less.  A good percentage of tests were now cleaning up after themselves.  But I had a bad feeling about the Purge pattern, every time I wrote a Purge method it was as if millions of voices suddenly cried out in terror, and were suddenly silenced.  Writing unit testing code directly into application code classes can do that to you.  The purge code, in retrospect, was nothing more than hand coded compensating transactions.  Purge methods weren’t an elegant solution.  Off and on we toyed with the idea of using transactions and Enterprise Services to perform rollbacks, but each time it came up I could have sworn I had a good reason why it wouldn’t work but I can’t recall one now.  Eventual epiphany caused me to conclude that my Purge endeavor was ill-conceived, and a more elegant solution would likely be found in the use of transactions.

Mind what you have learned. Save you, it can

I recently went back to the drawing board on our cleanup approach, and decided to look at TransactionScope for a simpler solution.  The idea was an obvious one, wrap our tests in a transaction which always rolls back, thereby superannuating the need for Purge methods.  After a few quick proofs, I found TransactionScope not only worked, was cleaner, but also performed better than the manual Purge methods. I then encapsulated the transactional behavior in a base class from which all our test classes could inherit.

using System;
using System.Transactions;
using NUnit.Framework;

namespace Foo.Test.Common
{
[TestFixture]
public abstract class TransactionalTestBase : IDisposable
{
#region Setup/Teardown

[SetUp]
public virtual void Setup()
{
trx = new TransactionScope();
}

[TearDown]
public virtual void Teardown()
{
Dispose();
}

#endregion

private TransactionScope trx;
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}

protected virtual void Dispose(bool disposing)
{
if (disposing)
{
if (trx != null)
{
trx.Dispose();
trx = null;
}
}
}
}
}



Always there are two. A master and apprentice.



We’ve only recently begun to replace our current Purge methods with the transactional approach, and I think it holds promise for defeating Darth DataCleaner.  But I fear Darth DataSetupious is still out there forcing us inexorably toward repositories and mocking frameworks. Although, in my mind, the need to create and destroy data for testing purposes will always remain, there may be a new hope for bringing balance to our tests.



No comments:

Post a Comment

Post a Comment