Monday, July 13, 2009

Dealing with Design Debt: Part II

In my previous post I talked about technical debt and scar tissue and how we used those concepts to identify and justify a project to rework a core part of our mission critical business applications. In this post I want to talk about our approach to tackling such a high risk project which is largely viewed more as a necessary evil than an exciting new feature or capability.

Race to Baghdad

There are many ways that systems like ours grow and I hazard a guess that few are architected up front with carefully drawn schematics and blueprints. In our case, it would be more accurate to describe how we got here as a Race to Baghdad approach. The original mission was to get the business off the ground and keep it growing by erecting applications and tying them together into a larger whole always meeting the immediate needs. In that sense we won the war but not the peace. The business is thriving but the system is struggling with the aftermath of being stretched too thin with a lot of terrain left to go back and fortify.

In the context of our current debt reduction project, the terrain we’re fortifying is not an isolated ‘module’ (a term that I want to use loosely) core but insulated. Instead it is a hub in a spider web of dependencies. This means that hundreds of stored procedures, views, queries, reports and all the application logic built on top on them could be and often are directly connected to each other and used by other modules. Further, it means that radically altering the structure of essentially three entities, will have a ripple effect across the majority of our enterprise systems. Naturally, our first instinct was to find a stable wormhole, go back in time, build a loosely coupled modular system, then slingshot around the sun back to the present where we’d then begin our debt reduction project. After a few Google searches failed to yield any results we switched to Plan B.

An iterative heart transplant?

We generally like to follow an agile-ish approach and develop in short iterations, releasing frequently. However, this project felt like it wasn’t well suited for that approach. Many, including those on the team, may question the wisdom of that decision now or in retrospect. The decision to perform a few large iterations rather than dozens of smaller ones was not made lightly. The presiding sentiment, at the onset of the project, was that making a small modification to each of a number of entities one at a time and then sweeping through the entire code-base following the ripples would lead to too many iterations, and more importantly too much repetitive work (sweeping through the code-base over and over again) and a prohibitive duration. A project like this, which won’t have visible tangible business benefits until final completion, coupled with the prospect of an extremely long duration lead us to deviate from our norm. We decided to attempt to shorten the duration by completing fewer sweeps through the code, acknowledging both the longer than normal iteration cycles and the danger inherent in massive changes released all-at-once. We foresaw three major iterations, each bigger than its predecessor.

“This shit's somethin. Makes Cambodia look like Kansas”

1) Reduce the surface area. Borrowing a term from the security world, we decided that since we didn’t have modularity on our side, at least we could, in the first iteration, eliminate dependencies and hack away the strands of the spider web. This made sense from several perspectives:

  • It allowed us to get a fuller picture of what we were dealing with by going through the entire code-base once, without being overcommitted. If our initial estimates of scope and cost were off, we could adjust them and/or allow the stakeholders to pull the plug on the project.
  • Removing unused and seldom used code, combining and reducing duplication, and refactoring to circumvent easily avoidable dependencies in of itself was a debt reductive activity (or perhaps the analogy or removing scar tissue is more appropriate here). If the plug were to be pulled on the project due to technical, resource, financial or any of a number business re-prioritization reasons, completing this activity would be a significant step towards modularity and simplification. There was immediate ROI.
  • It gave the stakeholders a legitimate evaluation point at which to decide to continue or not without “throwing away” work.

2) Neuter the entity. One critical entity was radically transforming both in terms of its attributes and its relationship to other entities. At this step we aimed to move the attributes to their new homes on other entities or eliminate them entirely, while at the same time preserving the relational structure. This meant that some code would have to change, but a significant number of queries, reports, and the like could remain only marginally effected as the JOINs would stay viable. It would also mean that the eventual change to the entity’s relationships would be slightly less impactful because most of its attributes had been removed leaving it more or less a relational placeholder. At this point we’d also write the data migration scripts.

3) Break the chains. The last step would be to sever the relationships and reorganize the entities, effectively breaking all code that had previously relied on those relationships. Once this was done, there was no going back, no partially effected queries or seemingly unaffected business logic. We struggled with a way to do this without one massive all-at-once ripple, but couldn’t find one (without going back to the “long duration” approach).

“Plans are useless: planning is invaluable” (Eisenhower)

Currently we’re working on breaking the chains. We successfully reduced the surface area, reaping more benefit (getting rid of more scar tissue than anticipated) in a shorter time than expected. However, neutering the entity proved elusive, perhaps out of necessity but equally likely due to a lack of resolve on our part to stay committed to the plan. Nevertheless, some of the work done there wasn’t sufficiently insulated enough to release. Therefore, what could not be released safely into the production system has slipped into the current iteration making it bigger than we’d hoped. The lessons continue to be learned.

The trilogy will conclude with: The Final Iteration

No comments:

Post a Comment