Tuesday, July 7, 2009

Dealing with Design Debt: Part I

There are two related terms that always come to mind when I’m asked to evaluate or implement any non-trivial change or addition to the line-of-business application systems that I work on; Technical debt and scar tissue.

First suggested by Ward Cunningham the idea of technical debt:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite.... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

And scar tissue as described by Alan Cooper:

As any program is constructed, the programmer makes false starts and changes as she goes. Consequently, the program is filled with the scar tissue of changed code. Every program has vestigial functions and stubbed-out facilities. Every program has features and tools whose need was discovered sometime after construction began grafted onto it as afterthoughts. Each one of these scars is like a small deviation in the stack of bricks. Moving a button from one side of a dialog box to the other is like joggling the 998th brick, but changing the code that draws all button-like objects is like joggling the 5th brick.

…As the programmers proceed into the meat of the construction, they invariably discover mistakes in their planning and flaws in their assumptions. They are then faced with Hobson's choice of whether to spend the time and effort to back up and fix things from the start, or to patch over the problem wherever they are and accept the burden of the new scar tissue—the deviation. Backing up is always very expensive, but that scar tissue ultimately limits the size of the program—the height of the bricks.

Each time a program is modified for a new revision to fix bugs or to add features, scar tissue is added. This is why software must be thrown out and completely rewritten every couple of decades. After a while, the scar tissue becomes too thick to work well anymore.

I will continue with these two analogies, the financial and the biological, as they both suggest dire consequences for applications that grow without accounting for the cumulative effects of all the design decisions and trade-offs made throughout the life of the application, yet so many systems grow organically in just that way. So what do you do with an application that’s already in debt and badly scarred?

The first stage is denial.

Whether its the other members of the your technology dept., technology leadership, or business owners; someone will likely need, not only to be convinced of the debt (unlike financial debt which is more obvious), but also understand the magnitude of the problem and why it ever need be “paid off”. This is precisely the quandary we found ourselves in several months ago.

For the better part of a year, or maybe longer, several seemingly reasonable large scale projects were assessed as being too costly (in terms of time and effort) and postponed, only to be periodically re-proposed with a slightly different twist or by another business owner. Over the same time frame, there were also many projects that either turned out to take longer, be more complicated than anticipated or have their planning padded due to the acknowledgement that they’d likely tread on some of the more heavily scarred tissue within our system. Recognizing this pattern, we were able to identify and raise the visibility of a fundamental design flaw with in our application systems. This particular design flaw was hurting our ability to add new functionality, collect accurate business intelligence data and do general application maintenance and enhancements. The module in question (and its structure) is fundamental, critical and core to our business and subsequently how the business logic has been written; the equivalent of a major organ.

That there was a problem within this module was easy enough to see for developers who had to tread lightly, create a complicated work-around, and/or hard-code a band-aid in order to bridge gaps between the current design and all those newly requested features that were grafted on over the years. Similarly, the problem manifested itself for those writing reports who felt like they were pushing the limits of SQL just to answer basic questions. However, what was less clear was determining the correct, or more correct, design for this module (more on that in later posts).

Moving to Acceptance

Having identified debt, which we were continuing to pay interest on, in the form of added complexity to many smaller projects, and its prohibitive complexity to larger projects, the case for paying the principal gained momentum. The trick was to identify how big the debt was, how much we were paying in interest and how much of the principal we needed to pay to get out from under.

We estimated that we were paying 10% interest on every project (for just this one design issue). This was a swag arrived at by looking at a sample time period and seeing how many of our projects dealt with, either directly or indirectly, the scarred module in question. This module, being so fundamental to our business, was estimated to be substantially involved in roughly 40% of our projects over the sample period. This estimate may seem high but another factor, tight coupling, made ripples inevitable. The second part of the estimate was equally as rough, in that we estimated paying a 25% complexity cost on those projects. Therefore, our rudimentary debt analysis yielded a cost of 25% on 40% of projects, which for simplicity’s sake became a flat 10% of all projects. Following this to its logical conclusion, we could see that not only were we living with substantial debt already and implicitly paying interest (without even taking in to account the projects that were simply not done because of the perceived complexity). Further, by continuing to build upon the unsure footing of a flawed module we’d be taking on more debt and could easily be seeing 30%-40% complexity interest on 50%-60% of our projects in the next year or two.

Putting the scarred module in those terms; projects that couldn’t be done, data that couldn’t be mined, and costs incurred on every project that weren’t even addressing the issue, was the basis for justifying the undertaking of a project to correct the design flaw in our core table structures. We were going to try to pay off the debt.

Next up: How do you perform a heart transplant on a runner while they’re running a marathon?

Further reading:

No comments:

Post a Comment