Monday, December 7, 2009

The Agony of EF

 

We’ve been working with EF v1 (.NET 3.5) for a little while now, using it sparingly when creating new code that requires CRUD functionality and when refactoring older code with similar characteristics. You may recall from an earlier post that our approach was to use EF to create a data layer that replaces the DataSets, DataTables and DataReaders we currently use with rich, type-safe entity objects. It was not, and is not, our intention to replace our existing business objects with EF. To this end, we employed a repository pattern that serves up ‘detached’ EF objects, which are then used to hydrate and persist our business objects.
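To make the shape of that arrangement concrete, here is a minimal sketch. All of the names (`CustomerEntity`, `ICustomerRepository`, `InMemoryCustomerRepository`, `Customer`) are hypothetical, a dictionary stands in for the EF `ObjectContext`, and handing out copies stands in for detaching. The only point it tries to make is that the business object is hydrated from, and persisted through, entities that nothing is tracking.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical data-layer entity, standing in for an EF-generated class.
public class CustomerEntity
{
    public int CustomerId { get; set; }
    public string Name { get; set; }

    // Copy handed across the repository boundary in place of a detached entity.
    public CustomerEntity Copy()
    {
        return new CustomerEntity { CustomerId = CustomerId, Name = Name };
    }
}

// Hypothetical repository contract: entities go in and come out detached.
public interface ICustomerRepository
{
    CustomerEntity GetById(int id);
    void Save(CustomerEntity entity);
}

// In-memory stand-in for an EF-backed repository; the dictionary plays the
// role of the context and callers never receive the instances it holds.
public class InMemoryCustomerRepository : ICustomerRepository
{
    private readonly Dictionary<int, CustomerEntity> store = new Dictionary<int, CustomerEntity>();

    public CustomerEntity GetById(int id)
    {
        CustomerEntity entity;
        return store.TryGetValue(id, out entity) ? entity.Copy() : null;
    }

    public void Save(CustomerEntity entity)
    {
        store[entity.CustomerId] = entity.Copy();
    }
}

// Existing business object: hydrated from and persisted through the repository.
public class Customer
{
    private readonly ICustomerRepository repository;

    public Customer(ICustomerRepository repository)
    {
        this.repository = repository;
    }

    public int Id { get; set; }
    public string Name { get; set; }

    public void Load(int id)
    {
        CustomerEntity entity = repository.GetById(id);
        if (entity == null)
            throw new InvalidOperationException("Customer " + id + " not found.");
        Id = entity.CustomerId;
        Name = entity.Name;
    }

    public void Save()
    {
        repository.Save(new CustomerEntity { CustomerId = Id, Name = Name });
    }
}
```

Because every entity crossing the boundary is a copy, changing one after `GetById` has no effect on the stored data until it is explicitly handed back through `Save`.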

All too easy

Initially, the results were very positive.  With some of the simpler cases we started with (mapping tables with few or no relationships) an immediate benefit was obvious.  We could replace the SELECT stored procedures with equivalent Linq-to-Entities statements, and remove entirely the INSERT, UPDATE and DELETE stored procedures in favor of EF persistence.  With simple use of the designer to import entities from the database into our model, and the implementation of some boilerplate repository code, we could eliminate at least four stored procedures per entity in our database.

A dream to some, a nightmare to others

Slowly, as we attempted to work with slightly more sophisticated cases, issues began to emerge: issues that are probably well documented across the web, perhaps most notably in the Entity Framework ‘vote of no confidence’. The biggest issue we encountered was the general lack of tiering/layering support in EF. Entity objects could be detached, but the consequences were painful.

  • Relationships were not preserved. Parent and child objects could not be traversed. Worse, since the relationships came back as null entity references rather than key columns, even the related identifiers could not be accessed, making it difficult to query for and manually load related objects. Lazy-loading disconnected relations isn’t possible and doesn’t even make sense.
  • Re-attaching detached objects was not straightforward. Any non-trivial situation resulted in object graphs where some objects were attached while others were detached, a mixture EF couldn’t handle.

These two alone were enough to make me doubt EF’s usefulness beyond trivial usage (or building a one-tier application), and we hadn’t even gotten to performance, or model merging issues.

Making the simple simpler; the complex more complicated

Nevertheless, I was determined to understand the limitations fully before discarding EF v1 (abandoning O/RM altogether, waiting for the promise of EF4, or switching O/RMs). While our EF repository pattern wasn’t mainstream usage, the issues we were facing with our data layer are the same issues that would be faced by anyone using EF across layers and tiers. The most likely parallel would be anyone attempting to use EF with WCF to create services. In this camp we were not alone: there are a considerable number of posts lamenting the difficulties, and a fair number of proposed solutions and attempted workarounds, some clever, some less so.

After investigating dozens of these solutions, and implementing one other candidate with only partial success, I eventually came across Reattaching Entity Graphs with the Entity Framework. This solution uses serialization to ‘detach’ the object graphs, thereby preserving the relationships. It also contains some very clever logic for reattaching the object graph, including a method for instructing the attachment process on how far to traverse down the various graph pathways when performing persistence.
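The core idea, serializing a graph to get a detached copy with its relationships intact, can be sketched with `DataContractSerializer`. This is not the linked article’s code and it skips the reattachment logic entirely; `Order`, `OrderLine` and `GraphDetacher` are hypothetical stand-ins for EF entities. Marking the contracts with `IsReference = true` makes the serializer preserve shared and circular references, rather than failing on the child-to-parent back-reference.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical entities. IsReference = true makes the serializer write
// id/ref markers, so the OrderLine -> Order back-reference survives.
[DataContract(IsReference = true)]
public class Order
{
    public Order() { Lines = new List<OrderLine>(); }

    [DataMember] public int OrderId { get; set; }
    [DataMember] public List<OrderLine> Lines { get; set; }
}

[DataContract(IsReference = true)]
public class OrderLine
{
    [DataMember] public int LineId { get; set; }
    [DataMember] public Order Order { get; set; }   // back-reference to the parent
}

public static class GraphDetacher
{
    // Round-trips the graph through the serializer. The result shares no
    // instances with the original (so nothing tracking the original sees it),
    // but parent/child links inside the copy are rebuilt.
    public static T DetachGraph<T>(T root)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, root);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

In the copy, each line’s `Order` property points back at the copied order, not the original, which is exactly the relationship information that a plain detach was losing for us.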

After about a day or two I was able to integrate this solution into our code and solve our detachment and relationship issues.  Now I can look forward to facing the performance issues (this solution does require eager-loading the related entities using Include), the source control branch merging issues inevitable with one large edmx file, and the many others lying in wait for me.

A game of trade-offs

This experience reinforces one of the lingering doubts I have about O/RMs, and about tools, frameworks and patterns in general. It often feels like a zero-sum game, where we’re just moving the complexity around rather than reducing it. Instead of struggling with writing stored procedures or dynamic SQL and mapping the datasets and readers to object properties and back again (which I never found to be that onerous to begin with), we’re struggling with mapping objects to tables via XML and with persistence idiosyncrasies. It doesn’t necessarily feel like an easier or better way, just a different way with a different set of challenges. Maybe that’s just familiarity with the old method and lack of familiarity with the new one, but maybe it’s not.

Thursday, December 3, 2009

Dealing with Design Debt Part 2½: The Smell of Fear

It’s probably time for an update on the Dealing with Design Debt series. I promised Part III would be the conclusion, and it will be, but we’re not quite there yet. Nevertheless, there’s some story to tell now that will bridge the gap.

Along with the business case made for the debt reduction project, there were other business decisions that had a direct impact on, or were directly impacted by, the project. Most significant were several scheduled projects that overlapped directly with key structures under construction as part of the debt reduction. Scheduling these projects first would mean doing them twice, but not doing them first would mean delaying or missing revenue opportunities.

Measure twice, cut once.

From a purely technical perspective it made sense to complete the debt reduction project first, and tackle the dependent projects after.  This would allow us to design the solution for the new structure and implement once, rather than designing for the old structure then later retrofitting it for the new while also implementing twice.  However, the business case ran counter to the technical case.  These projects were estimated to have significant revenue impact, therefore delaying them until the completion of a long and risky debt reduction project was judged to represent a significant loss in revenue.

“It is a riddle, wrapped in a mystery, inside an enigma” - Churchill

Based on these projections, it was decided that the high revenue impact projects would be done prior to the debt reduction project, acknowledging the implied cost of implementing twice, and the ultimately less clean implementation.  Essentially, we had decided to take on debt during our debt reduction exercise.

Unfortunately, it’s not entirely clear whether that was the right decision. The high revenue impact projects were completed, but they took longer than expected, pushed back the debt reduction project and made it a bit more complex. Further, the actual revenue impact of those projects isn’t entirely clear, nor is the impact of the additional debt and complexity particularly measurable.

And there upon the rainbow is the answer to our neverending story

I guess the moral to this story is that, at least in this case, debt accumulation was an explicit rather than implicit trade-off decision.


Friday, November 13, 2009

Deep thoughts

 

I’ve recently come across some blog posts that resonate with some of my own ideas. While these particular quotes are not necessarily related to each other, they are consistent with a larger theme that I’m trying to develop. I’ve included the salient excerpts (providing my own titles).

 

Negative consequence of [design] patterns

One of the biggest challenges with patterns is that people expect them to be a recipe, when in reality they are just a vague formalization of a concept…In my view a pattern should only be used if its positive consequences outweigh its negative consequences. Many patterns, oddly enough, require extra code and/or configuration over what you’d normally write – which is a negative consequence.

– Rocky Lhotka

Why on earth are we doing so many projects that deliver such marginal value?

To understand control’s real role, you need to distinguish between two drastically different kinds of projects:

  • Project A will eventually cost about a million dollars and produce value of around $1.1 million.
  • Project B will eventually cost about a million dollars and produce value of more than $50 million.


What’s immediately apparent is that control is really important for Project A but almost not at all important for Project B. This leads us to the odd conclusion that strict control is something that matters a lot on relatively useless projects and much less on useful projects. It suggests that the more you focus on control, the more likely you’re working on a project that’s striving to deliver something of relatively minor value. To my mind, the question that’s much more important than how to control a software project is, why on earth are we doing so many projects that deliver such marginal value?

– Tom DeMarco

 

The dirty little secret about simple: It’s actually hard to do

The dirty little secret about simple: It’s actually hard to do. That’s why most people make complex stuff. Simple requires deep thought, discipline, and patience – things that many companies lack. That leaves room for you. Do something simpler than your competitors and you’ll win over a lot of people…You can try to win a features arms race by offering everything under the sun. Or you can just focus on a couple of things and do ‘em really well and get people who really love those things to love your product. For little guys, that’s a smarter route.

When you choose that path, you get clarity. Everything is simpler. It’s simpler to explain your product. It’s simpler for people to understand. It’s simpler to change it. It’s simpler to maintain it. It’s simpler to start using it. The ingredients are simpler. The packaging is simpler. Supporting it is simpler. The manual is simpler. Figuring out your message is simpler. And most importantly, succeeding is simpler.

– Matt (37Signals)

Wednesday, September 23, 2009

Loopback and kiss myself

A consequence of using transactions to roll back, and thus clean up after, database-dependent tests is that some code which would not otherwise run within a transaction context doesn’t work when it does. One such situation is the case of a loopback server, which I’ve encountered several times over the years.

when people stop being polite... and start getting real... The Real World

A real-world production environment might consist of several databases spread across several machines. And sometimes, as distasteful as it may sound, those databases are connected via linked servers. That is exactly the situation that we presently find ourselves in. We have quite a few views and procedures that make use of these linked servers, and those views and procedures invariably get called from within unit tests. That in and of itself isn’t an issue for transactional unit tests. The critical factor is that our test environments, and more importantly our development environments, aren’t spread across multiple machines, but instead host several databases on one local SQL Server instance.

There can be only one

In order for code that utilizes linked servers to be executable in development environments, we create linked servers that actually point back to the same SQL server instance, creating a loopback server.  Presently, loopback servers don’t support distributed transactions.  So what to do with transactional unit tests that call loopback code?

A few options come to mind, but are impractical for us:

  1. Turn off transactions on those tests, and write manual test data clean up code.
  2. Use aliasing so that the code doesn’t actually interact with the linked servers directly, and then simply don’t use linked servers in development environments, instead have the alias point directly to the tables, etc.
  3. Use virtualization to mimic physical production configuration.

These seem viable, and #2 actually seems like a pretty good idea, but we have a mixed SQL Server 2000 and 2005 environment and aliasing is only available on 2005, so we’ve never even tried it. Option #3, although it would more closely resemble the production configuration, is more practical for a test environment than a development environment. So, while it may solve the former, we still need to solve the latter without an overly complicated, complete and self-contained virtual environment for each developer. Option #1 is just a step backwards that we’d like to avoid.

For instance keep your distance

There is a simpler option, and it is the path we recently chose after implementing transactional unit tests and finding numerous tests that immediately failed due to the loopback problem. We simply set up a second instance of SQL Server on the development machines, and then configured the linked servers to point not back to the same instance but across the two different instances. For transactions to work, it turns out that the linked servers don’t have to be on two physically separate machines, just on two different instances of SQL Server. This solution may have limited applicability for environments with multiple servers (and multiple links), or for those that don’t use linked servers at all. But for us, with essentially only two databases that utilize the linked servers, it solved our issue without forcing us to change any code. It’s only slightly more complicated, in that we have to run two instances of SQL Server. Eventually we may move to an aliasing solution, which will require code changes and an upgrade, but for now we’ve sidestepped the loopback.

Wednesday, September 16, 2009

Evolution of unit testing

We take unit testing seriously and make it a cornerstone of our development process. However, as I alluded to in an earlier post, our approach stretches the traditional meaning of unit testing. With a system that was not designed for testability, we often have to write a lot of unrelated logic and integrate with external databases and services just to test a ‘unit’. Further, while we would like to practice TDD, what we often practice is closer to Test Proximate Development: rather than writing the test first, we tend to create tests at approximately the same time as, or in parallel with, our development. Nevertheless, we have evolved, and we continue to increase the quantity and quality of our tests. The path to our current methodology may be a familiar one.

Once you start down the dark path, forever will it dominate your destiny

We started out with a code-base that wasn’t necessarily written with testability in mind.  Nonetheless, the system was large enough, complex enough and mission critical enough to require not only testing our changes but that those tests be automated.  If we were to have any hope of introducing the amount of change demanded by the business, at the pace demanded and without introducing excessive amounts of bugs or letting a catastrophic blunder out into the wild, we had to begin building a suite of automated tests.

We made the most obvious choice and began using nUnit to write tests. We’ve used a variety of developer tools to run these tests throughout development, tools like TestDriven.NET, and later ReSharper.  We also set up CruiseControl.NET, which we were already using to automate our builds, to run these tests as part of the continuous integration process.

The biggest challenge, of course, was that there was no clear separation between business logic and data access code. Therefore, right from the get-go, our ‘unit tests’ weren’t unit tests in the purist sense. They exercised the code, but also required interaction with the backing store. Further, the majority of tests required a lot of setup data to either already exist in the database or be created prior to the test run in order for the ‘unit’ of functionality to be testable (e.g. in order to test an Order Adjustment, a Customer, an Order, an Item, and other transactional records all had to exist first). In the beginning this meant that the majority of tests randomly retrieved whatever data was in the test system, or otherwise assumed that the requisite data would be present, allowing the tests to succeed.

That is why you fail

There are a few glaring problems with this approach that quickly exposed themselves. 

  • Tests would fail and succeed erratically. One test might alter test data used by a subsequent test in a way that would make it fail; order and timing mattered.  This left us chasing ghosts, often troubleshooting tests rather than application code.
  • The test database grew indefinitely as test data was dumped into the database on each build but never cleaned.  And builds were frequent.
  • The test database, originally set up on an old un-used PC, saw its performance degrade as the number of tests we wrote increased.  It got to the point where a failed test might take minutes to fix, but an hour to run the test suite to verify the fix.  Often times, after waiting an hour or more, we’d find out another failure had occurred. Fix-and-wait turnaround time became prohibitive.

We tackled these issues as they became productivity drains in no particular order and with no particular strategy.  At first we addressed our exponential data growth and performance problems with solutions barely adequate to get us by, to keep us writing and running tests.

Of course we threw hardware at the problem, incrementally (more memory here, an extra drive there) as problems arose. Eventually we upgraded to a beefy server, but that was much later.  The bulk of our first phase attempts were concentrated on creating a pruning script to be triggered at the beginning of each test run.

Train yourself to let go of everything you fear to lose

The pruning script attempted to clean out all the data created by the prior run.  This script is rather long and complex, recursively traversing from parent tables to child tables to delete in reverse order (all manually written).  You might ask, why not just wipe the database clean and fill it with known data prior to each run?  This was considered but ruled out based on what can be boiled down to:  DELETE statements work regardless of the columns in a table, INSERT statements don’t, which makes pruning a little more resilient to schema changes.  It seemed to me like a dummy data creation script would be much harder to maintain, but others may question that assumption.
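Our script was written by hand, but the ordering rule it follows (delete from the deepest child tables first, then work back up to the parents) amounts to a topological sort of the foreign-key graph. Here is a sketch of that rule, assuming a hypothetical map from each table to the tables its foreign keys reference; it throws on cycles, including self-referencing tables, which would need special handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PruneOrder
{
    // childToParents maps each table to the tables its foreign keys reference.
    // Returns the tables ordered so that every child precedes its parents,
    // which lets DELETE statements run in list order without FK violations.
    public static List<string> DeleteOrder(IDictionary<string, string[]> childToParents)
    {
        // Invert the map so the walk can go from each parent down to its children.
        var children = new Dictionary<string, List<string>>();
        foreach (var pair in childToParents)
        {
            if (!children.ContainsKey(pair.Key)) children[pair.Key] = new List<string>();
            foreach (string parent in pair.Value)
            {
                if (!children.ContainsKey(parent)) children[parent] = new List<string>();
                children[parent].Add(pair.Key);
            }
        }

        var order = new List<string>();
        var done = new HashSet<string>();
        var inProgress = new HashSet<string>();
        foreach (string table in children.Keys.OrderBy(t => t, StringComparer.Ordinal))
            Visit(table, children, done, inProgress, order);
        return order;
    }

    // Depth-first, post-order: a table is appended only after all of its children.
    private static void Visit(string table, Dictionary<string, List<string>> children,
        HashSet<string> done, HashSet<string> inProgress, List<string> order)
    {
        if (done.Contains(table)) return;
        if (!inProgress.Add(table))
            throw new InvalidOperationException("Cycle detected at table " + table + ".");
        foreach (string child in children[table])
            Visit(child, children, done, inProgress, order);
        inProgress.Remove(table);
        done.Add(table);
        order.Add(table);
    }
}
```

For example, with `OrderLine` referencing `Order` and `Item`, and `Order` referencing `Customer`, each table appears after every table that references it, so `OrderLine` is emptied before `Order`, and `Order` before `Customer`. The trade-off noted above still applies: the ordering has to be kept in sync with the schema, whether by hand or by reading the foreign keys.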

Attachment leads to jealousy. The shadow of greed, that is

Co-dependent tests came next. We began to refactor our tests (as they became problems) to be properly isolated and autonomous. These tests were rewritten to create their own setup data, as they should have been in the first place.

Having autonomously runnable tests and more hardware resources, while continuously tweaking our pruning script, allowed us to grow our test suite to more than 2000 tests, which ran in less than 20 minutes. But of course these solutions were band-aids living on borrowed time.

At an end your rule is... and not short enough it was

Working toward eliminating the need for a pruning script, we began requiring that each test not only create its own data but also clean up after itself by removing that same data. My initial solution was for each class to implement a Purge() method which would recursively call the Purge() methods of its children. That way, each unit test could be wrapped in a try-finally, and within the finally block all data created within the test would be purged.
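Roughly, the Purge pattern looked something like this. `SalesOrder`, `OrderAdjustment` and the `FakeDb` row set are hypothetical stand-ins (the real methods issued DELETE statements against the test database), but the shape is the same: each class purges its children before itself, and each test purges in a `finally` block.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the test database: a set of row keys.
public static class FakeDb
{
    public static readonly HashSet<string> Rows = new HashSet<string>();
}

public class OrderAdjustment
{
    public int Id { get; set; }
    public void Save() { FakeDb.Rows.Add("OrderAdjustment:" + Id); }
    public void Purge() { FakeDb.Rows.Remove("OrderAdjustment:" + Id); }
}

public class SalesOrder
{
    private readonly List<OrderAdjustment> adjustments = new List<OrderAdjustment>();

    public int Id { get; set; }
    public List<OrderAdjustment> Adjustments { get { return adjustments; } }

    public void Save()
    {
        FakeDb.Rows.Add("SalesOrder:" + Id);
        foreach (OrderAdjustment adjustment in adjustments) adjustment.Save();
    }

    // Children are purged before the parent row, mirroring the order that
    // referential integrity forces on the real DELETE statements.
    public void Purge()
    {
        foreach (OrderAdjustment adjustment in adjustments) adjustment.Purge();
        FakeDb.Rows.Remove("SalesOrder:" + Id);
    }
}

public static class PurgeStyleTests
{
    // A test under the Purge pattern: create its own setup data, exercise it,
    // and purge in a finally block whether or not the assertions pass.
    // Returns the row count seen mid-test so the effect can be checked.
    public static int OrderAdjustmentTest()
    {
        var order = new SalesOrder { Id = 1 };
        order.Adjustments.Add(new OrderAdjustment { Id = 10 });
        try
        {
            order.Save();
            // ... exercise the unit and assert against the saved data here ...
            return FakeDb.Rows.Count;
        }
        finally
        {
            order.Purge();
        }
    }
}
```

Note that `Purge` lives on the application classes themselves, which is test-only code leaking into production types.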

We wrote a considerable number of these Purge methods. They encountered some of the same order-of-execution and referential integrity issues experienced by the pruning script, but they more or less worked, and a good percentage of tests were now cleaning up after themselves. But I had a bad feeling about the Purge pattern: every time I wrote a Purge method it was as if millions of voices suddenly cried out in terror, and were suddenly silenced. Writing unit testing code directly into application code classes can do that to you. The purge code, in retrospect, was nothing more than hand-coded compensating transactions, and not an elegant solution. Off and on we toyed with the idea of using transactions and Enterprise Services to perform rollbacks, but each time it came up I could have sworn I had a good reason why it wouldn’t work, though I can’t recall one now. Eventually I concluded that my Purge endeavor was ill-conceived, and that a more elegant solution would likely be found in the use of transactions.

Mind what you have learned. Save you, it can

I recently went back to the drawing board on our cleanup approach and decided to look at TransactionScope for a simpler solution. The idea was an obvious one: wrap our tests in a transaction that always rolls back, thereby eliminating the need for Purge methods. After a few quick proofs of concept, I found that TransactionScope not only worked and was cleaner, but also performed better than the manual Purge methods. I then encapsulated the transactional behavior in a base class from which all our test classes could inherit.

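using System;
using System.Transactions;
using NUnit.Framework;

namespace Foo.Test.Common
{
    // Base class for database-dependent tests: every test runs inside a
    // TransactionScope that is never completed, so its changes are rolled back.
    [TestFixture]
    public abstract class TransactionalTestBase : IDisposable
    {
        #region Setup/Teardown

        [SetUp]
        public virtual void Setup()
        {
            // Connections opened after this point enlist in the ambient transaction.
            trx = new TransactionScope();
        }

        [TearDown]
        public virtual void Teardown()
        {
            // Complete() is deliberately never called, so disposing the
            // scope rolls back everything the test wrote.
            Dispose();
        }

        #endregion

        private TransactionScope trx;

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
            {
                if (trx != null)
                {
                    trx.Dispose();
                    trx = null;
                }
            }
        }
    }
}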
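Nothing in the base class ever commits: `Complete()` is never called, so disposing the scope rolls back. To watch that happen without a database, here is a sketch using a hypothetical `InMemoryTable` that enlists in the ambient transaction through `Transaction.EnlistVolatile` and keeps an undo log. It stands in for a SQL Server connection, which enlists in the ambient transaction automatically when opened inside the scope; `RollbackHarness` is likewise a hypothetical helper that mirrors the SetUp/TearDown pair.

```csharp
using System;
using System.Collections.Generic;
using System.Transactions;

// Hypothetical in-memory resource standing in for a SQL Server connection.
// Writes are applied immediately (visible inside the test, as with SQL) and
// recorded in an undo log that is replayed if the ambient transaction aborts.
public class InMemoryTable : IEnlistmentNotification
{
    private readonly List<string> rows = new List<string>();
    private List<string> undo;

    public List<string> Rows { get { return rows; } }

    public void Insert(string row)
    {
        rows.Add(row);
        Transaction current = Transaction.Current;
        if (current == null) return;   // no ambient transaction: the write is permanent
        if (undo == null)
        {
            undo = new List<string>();
            current.EnlistVolatile(this, EnlistmentOptions.None);
        }
        undo.Add(row);
    }

    public void Prepare(PreparingEnlistment preparingEnlistment)
    {
        preparingEnlistment.Prepared();
    }

    public void Commit(Enlistment enlistment)
    {
        undo = null;                   // keep the writes
        enlistment.Done();
    }

    public void Rollback(Enlistment enlistment)
    {
        if (undo != null)
            foreach (string row in undo) rows.Remove(row);
        undo = null;
        enlistment.Done();
    }

    public void InDoubt(Enlistment enlistment)
    {
        undo = null;
        enlistment.Done();
    }
}

public static class RollbackHarness
{
    // Same idea as TransactionalTestBase: open a scope, run the test body,
    // and dispose the scope without calling Complete(), which rolls back.
    public static void Run(Action testBody)
    {
        using (new TransactionScope())
        {
            testBody();
        }
    }
}
```

Inside `Run`, the test sees its own inserted rows; once the scope is disposed they are gone, which is the behavior the base class relies on in place of Purge methods.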



Always there are two. A master and apprentice.



We’ve only recently begun to replace our current Purge methods with the transactional approach, and I think it holds promise for defeating Darth DataCleaner.  But I fear Darth DataSetupious is still out there forcing us inexorably toward repositories and mocking frameworks. Although, in my mind, the need to create and destroy data for testing purposes will always remain, there may be a new hope for bringing balance to our tests.



Wednesday, September 9, 2009

KISS my YAGNI abstraction

I’ve recently been observing what appears to me to be a growing contradiction within the software development community. On the one hand, the popularity and adoption of the various flavors of Agile and its affiliated principles are growing, while at the same time tools, technologies, patterns and frameworks are being pumped out that seek higher levels of abstraction, looser coupling, and greater flexibility. Agile principles encourage simplicity, less waste and less up-front design, the more familiar of those principles being KISS (Keep It Simple, Stupid) and YAGNI (You Aren’t Gonna Need It).

But are the tools, technologies, patterns and frameworks simpler and necessary?  Does the fact that those recommended tools, technologies, patterns and frameworks continue to change so rapidly undermine any claims that they are simpler or necessary?

Who are you calling stupid?

In contrast to the doctrine of simplicity and just-in-time design, the latest technologies, tools, patterns and frameworks seem to be trending towards ultimate flexibility at the expense of simplicity. Certainly, SOA, N-tier, DDD, MVP, MVC, MVVM, IoC, POCO (and the list goes on) are anything but simple. Not only aren’t they simple, but they are also likely to fall into the “not gonna need it” category. If one were to blindly follow best practices recommendations, then every application would be a Service Oriented N-tier Domain Driven multi-layered highly abstracted masterpiece, and would be rewritten every few months. But that hardly seems agile, lean, simple or less is more. In some ways, Agile principles almost demand architecting after the fact. On the other hand, if you wait until you need Service Orientation, Inversion of Control or a Domain model, it’s very difficult to add later.

“If Woody had gone straight to the police this would never have happened”

How many successful companies succeed using systems that don’t subscribe to any of these concepts, but instead run their businesses using Cobol, Foxpro, Access, VB 6.0, Classic ASP (or any equivalent ‘old school’ technology)? And the corollary: how many failed businesses can attribute their failure to a fatal flaw in their LOB application design? How many companies have said, “if we had only decoupled our inventory system from our purchasing system using a service layer and utilized a POCO capable O/RM tool we’d still be in business”? My guess would be very few, and of that few they’d likely be software companies or SaaS providers where the technology is their product. But for the vast majority of companies out there, where technology is the enabler rather than the product, are we being encouraged to over-engineer by the loudest 1% of developers?

The devil made me do it

In some ways developers are snobs. I think we spend a lot of energy looking for ways to separate the men from the boys. The classic ranking of developers as professionals, amateurs, hobbyists and hacks gets played out over and over. Just recalling the C++ vs. VB developer comparisons reveals parallels with each new generation. C++ developers are professional developers while VB developers are hobbyists and amateurs. C# developers are professionals while VB.NET developers are hobbyists and amateurs. ASP.NET MVC developers are professionals while Webforms developers are hobbyists and amateurs. Professionals use an O/RM, IoC, SOA and Mocking frameworks; if you don’t, you’re an amateur.

I don’t want to suggest that, when used to solve a particular problem, any one of these technologies, tools, patterns or frameworks can’t in fact simplify a solution or make it more elegant or flexible, because they can and often do. Nor do I want to suggest that I’m not a participant in this snobbery, which I invariably am. What interests me is when the desire to produce a ‘sophisticated’ or ‘professional’ solution means stuffing it full of the latest technologies, tools, patterns or frameworks and calling it simpler or more elegant. While this often makes for interesting learning opportunities for developers and architects, and one more feather in our caps to differentiate ourselves from the outdated riff-raff, it hardly seems lean. If, when your only tool is a hammer, every problem looks like a nail, then it can also be said that when you have a lot of (cool) tools, every problem seems to require them all.

“It depends on what the meaning of the word 'is' is”

This topic is further obscured by the fact that there is seldom a widely accepted ‘right’ way to do anything in software development. Almost any approach has its share of debate.

For instance, there’s lots of debate about which O/RM tool is the best, or purest. Even if you decide, as I recently did, that yes, there is a general consensus in the community that some form of O/RM is the preferred persistence/data access strategy, and you wade through the debates and pick a tool, inevitably you’ll discover another perspective that throws the decision back into question. A recent post by Tony Davis, The ORM Brouhaha, did just that for me.

“The IT industry is increasingly coming to suspect that the performance and scalability issues that come from use of ORMs outweigh the benefits of ease and the speed of development”

Benchmarks posted on ORMbattle.NET purport to demonstrate a staggering performance difference between O/RMs and standard SqlClient. I mention this just as one example of how there are few right answers, just an endless series of trade-offs.

Which leads me back to my original premise. I don’t think we are necessarily keeping it simple or waiting until we need it, and we are certainly being pulled in two directions.

Monday, August 10, 2009

Prequel to Dealing with Design Debt

Two of my recent posts described how we are currently paying down our design debt in one big chunk. As I alluded to in those posts, this sort of balloon payment is atypical, and it has led us to deviate from our normal development practices. So what are our typical development practices as they regard technical debt and scar tissue?

Shock the world

I don’t mean to blow your mind, but we refactor as we go, leaving the park a little cleaner than we found it.  In its simplest form, that means for every bug, feature or enhancement worked on we strive for refactoring equivalency.  In other words, we aim to refactor code proportional to the amount of change necessitated by the bug fix or to meet the feature or enhancement requirements.  If we have to add 10 new lines of code for a new feature, then we also need to find 10 lines of code worth of refactoring to do (generally in that same area of the code).

Beyond that simple precept are some practical considerations: we need some guidelines or goals around what kind of refactoring to do, some QA capability to provide a safety net that encourages refactoring, some form of code review process to keep us honest and consistent, and hopefully some tools to help identify and automate refactoring opportunities.

In the beginning God created the code-behind

We have guidelines regarding naming conventions, casing and standard exception handling, to name a few; we mostly follow the Microsoft guidelines. A substantial amount of code was written before these guidelines were agreed upon, and we change them over time as we learn more and new techniques become available, so there’s always some refactoring work to be done in this regard.

The main goal, however, has been to address the abuse of the code-behind. This is probably familiar to many: as our codebase has moved from classic ASP through the various versions of .NET, it has carried with it the primitive code-behind and Page_Load-centric model. Naturally, the bulk and focus of our refactoring efforts goes into moving data access code and business logic out of pages into the business layer, increasing coherence by reorganizing and consolidating business logic into the correct business objects, and converting programmatic presentation logic into its declarative equivalent. Essentially, we want to make the code-behind as lean as possible and the business objects as coherent as possible. Not a lofty goal, and it’s a long way from MVC, MVP, DDD and so on, but it’s an essential first step before we can even consider a more well defined and established pattern or framework.

"So you wanna be a high roller"?

There’s a certain amount of risk in making any changes to the system and we’re essentially doubling the amount of change with each request by adding the additional refactorings.  There’s a long term payoff to these refactorings in terms of readability and maintainability (at the very least), but in the short term we have to mitigate the risk of “unnecessary” changes.  The way we do that is multifaceted; short iterations, unit tests, continuous integration, code reviews, and productivity tools.

  • What we mean by short iterations might not necessarily be what you expect.  We do a full system release every two weeks on a known schedule, completely independent of project or task schedules.  Therefore, all work targets a release and is fit into the two week cycle.  In a two week cycle that could mean one developer completes 10 bug fixes or one quarter of a 2 month project, but either way, whatever is complete and releasable is released into the live system.  This keeps the volume of change smaller and more manageable so that when things do go wrong the source of the issue can be more quickly pinpointed.
  • Unit tests are required for unit testable changes (and changes that may not be testable are often refactored to be unit testable).  The pre-existing as well as the new unit tests provide the safety net we need to safely refactor.  As long as the changes don’t break the unit tests, developers can feel reasonably confident about the changes they’ve made and be more aggressive about refactoring than they might be otherwise.  As the volume and completeness of unit tests increases we can be more and more aggressive.
  • We use CruiseControl.NET as our continuous integration server to monitor our Subversion repository and build and run unit tests whenever changes are committed.  This gives us nearly immediate feedback when changes do break something and as early as possible in the cycle.
  • We have an on-line code review process utilizing integration between Fogbugz and Subversion.  All system changes require a Fogbugz case # (large projects may be broken into numerous cases).  Each case includes change notes as well as the versions of the modified code attached.  Cases are then routed from the developer to a reviewer who can view the diffs directly from the case where they can approve the changes or comment and send them back for additional work before allowing them into the release.
  • In addition to the already mentioned tools, Subversion, CruiseControl.NET, Fogbugz, nUnit, our big gun is the widely used ReSharper which identifies refactoring opportunities, provides shortcuts to those refactorings and just does what ReSharper does.  We run a few other tools for informational and trending purposes like fxCop, StyleCop, CloneDetective, StatSvn, etc.  These tools don’t necessarily provide us any productivity gains at this point but some are incorporated into our build processes as indicators.

Let the healing begin

This is just a brief overview of how we approach incremental improvement. It’s working for us, albeit with a few glaring caveats. First, it’s undoubtedly a slow process; some portions of the code base almost never get touched and thus are unlikely to be refactored. Second, we have a hard time measuring our progress. Code coverage, code duplication rate and lines of code are our best metrics. To give you some idea of those metrics: over approximately two years our code coverage has risen from 10% to close to 50%, our duplication rate has dropped approximately 30%, and our line count has begun trending downward even as we add features and grow the system. Those are all good indicators that we’re heading in the right direction, even if they don’t tell us how far we’ve come or how far we have left to go.