Tuesday, July 28, 2009

Linq-to-Sql, Entity Framework, NHibernate and what O/RM means to me.

We’ve recently completed an upgrade of our enterprise application suite, or home-grown ‘mini-ERP’ for lack of a better term. We upgraded from .NET 2.0 to .NET 3.5 SP1, a fairly painless exercise as far as complete system upgrades go. We didn’t really have to change much code to get there, but with .NET 3.5 we get a whole host of new features and capabilities. There’s WCF, WF, WPF, Silverlight, ASP.NET MVC, Linq and the Entity Framework, just to name a few. Naturally, the question is which of these new capabilities do we try to take advantage of, and in what order?

Using any of them represents a considerable learning curve, a significant change in our architecture and, in some cases, a paradigm shift in how we think about solving certain kinds of problems. As the title implies, in this post I intend to discuss persistence patterns and technologies.

Being persistent

Our current persistence pattern is likely a familiar one to most. We have a collection of business objects that serve the dual purpose of both business logic layer and data access layer. That is to say, the objects contain both data and behavior, as well as the logic for how to hydrate and persist themselves. These objects loosely follow an Active Record pattern, mapping roughly one-to-one with the underlying data structure entities. The persistence logic consists of internal calls to stored procedures which return DataSets/DataReaders that are then manually mapped to the object properties (or private members); in turn, the properties are mapped back to stored procedure parameters for persistence. This works adequately and has the advantage of being already written, proven and time-tested code. Why would I consider changing existing working code to utilize an O/RM?
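
For illustration, here’s a minimal sketch of that pattern. The Customer class, stored procedure names and columns are hypothetical stand-ins, not code from our actual system, but the shape is the same: data, behavior and persistence all live together, with hand-written mapping in both directions.

```csharp
using System.Data;
using System.Data.SqlClient;

public class Customer
{
    public int CustomerId { get; private set; }
    public string Name { get; set; }
    public decimal CreditLimit { get; set; }

    // Business logic lives alongside the data...
    public bool CanPlaceOrder(decimal orderTotal)
    {
        return orderTotal <= CreditLimit;
    }

    // ...as does the persistence logic.
    public void Load(int customerId, string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.Customer_Get", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.AddWithValue("@CustomerId", customerId);
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                if (reader.Read())
                {
                    // Manual DataReader-to-property mapping.
                    CustomerId = (int)reader["CustomerId"];
                    Name = (string)reader["Name"];
                    CreditLimit = (decimal)reader["CreditLimit"];
                }
            }
        }
    }

    public void Save(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("dbo.Customer_Save", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            // Properties mapped back to stored procedure parameters.
            cmd.Parameters.AddWithValue("@CustomerId", CustomerId);
            cmd.Parameters.AddWithValue("@Name", Name);
            cmd.Parameters.AddWithValue("@CreditLimit", CreditLimit);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```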

SwwwORM

There’s a multitude of O/RM tools that have been around for years; they are widely used, feature rich and have achieved a level of maturity, and none of them depend on .NET 3.5. The most well known and probably most widely used of these is NHibernate. Admittedly, my only experience with this tool was purging its remnants from a codebase I inherited a number of years back, not because of any deficiency in NHibernate, but because it was haphazardly implemented by a developer who didn’t understand it and then left the company. At that time we had no experience with NHibernate, and we were inundated with performance, locking and transactional problems in any code that used it. Since it was a relatively small percentage of the codebase that utilized NHibernate, we decided that rather than ramp up on NHibernate and fix the implementation, we’d instead favor consistency by replacing the NHibernate code with our tried and true DataSet pattern. We weren’t ready then, but I knew at some point we’d need to revisit O/RM, whether it be NHibernate or another of the multitude out there. What does .NET 3.5 have to do with revisiting O/RM?

Linq is the word, it’s got groove, it’s got meaning

For me, Linq is the catalyst to challenging our data access approach. From the get-go, Linq-to-Objects and Linq-to-Xml were no-brainers. To be able to query collections, xml documents and DataSets with Sql-like syntax was an obvious and giant leap over nested foreach loops and similar conventions (see the sketch below). Linq-to-Sql and later Linq-to-Entities, however, are intriguing but much less obvious choices. Considering that we already have hundreds of business objects, the rapid development features of generating code directly from the database schema and coding away aren’t compelling. These two technologies, if we are to use them, would have to be shoehorned into our existing architecture. We aren’t about to replace all our business objects with Linq-to-Sql’s ‘dumb’ data objects, throw away all our optimized stored procedures and hope for the best from the generated Sql. Similarly, we aren’t going to convert all our business objects to EntityObjects and likewise hope for the best from the Linq-to-Entities Sql generator. In either case, we’d also be forced to find a place for all the other code and business logic contained in our business objects that wouldn’t play nice with the code generators, whether that be by using partial classes, inheritance or whatever. Add to this the fact that L2S is supposedly dead and EF is only at v1, and I can’t make a strong case to jump on either of these as a business object replacement. But do I really need L2S or EF to replace my business objects, or even want them to if they could?
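
As a trivial illustration of that leap, here is the same query written both ways; the Order and OrderLine types are made up purely for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, for illustration only.
public class OrderLine { public decimal Amount { get; set; } }
public class Order { public List<OrderLine> Lines { get; set; } }

public static class LinqToObjectsExample
{
    // The old way: nested foreach loops and a temporary list.
    public static List<Order> BigOrdersOldStyle(List<Order> orders)
    {
        List<Order> bigOrders = new List<Order>();
        foreach (Order order in orders)
        {
            foreach (OrderLine line in order.Lines)
            {
                if (line.Amount > 1000m)
                {
                    bigOrders.Add(order);
                    break;
                }
            }
        }
        return bigOrders;
    }

    // The same query with Linq-to-Objects: declarative and Sql-like.
    public static List<Order> BigOrdersWithLinq(List<Order> orders)
    {
        return (from o in orders
                where o.Lines.Any(l => l.Amount > 1000m)
                select o).ToList();
    }
}
```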

Rise of the Data Layer

Code generation, tight coupling between database schema and objects, and Sql generation all become much more compelling at the data layer than at the business layer. If I view L2S objects, EF Entities, NHibernate generated objects, or any other O/RM objects as data transfer objects (DTOs) rather than business objects, then things get a little more interesting. Using EF (I actually started out with L2S) to do some prototyping, I used a Repository pattern in conjunction with Linq to query and persist data-centric entities in a ‘detached’ manner. What this approach allows me to do is replace my DataSet/DataReader mapping code with Entity mapping code, giving me the following benefits (a rough sketch follows the list):

  • I only need to change the internals of my business objects; their structure, inheritance tree and interfaces are all unchanged.
  • My business objects are now mapped to type-safe Entities, rather than DataSets and DataReaders, and with fewer lines of code.
  • Persistence is two-way, handled by mapping back to those same Entities, rather than passing a litany of parameters back to a stored procedure.
  • With very few lines of .NET code and no stored procedures I can handle SELECT, INSERT, UPDATE and DELETE, allowing me to drop all the boilerplate stored procedures that perform CRUD now, and not write new ones.
  • This same approach could just as easily have utilized L2S, NHibernate, LLBLGen, etc., giving me flexibility to choose or change tools without significant impact on the business objects.
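
Here is a rough sketch of the shape this takes. The entity, repository interface and Customer business object are hypothetical stand-ins (the actual code is in the sample download linked below); the point is that the mapping moves from DataReaders to detached entities, behind the business object’s existing interface.

```csharp
// Stand-in for a generated entity (EF, L2S, NHibernate, ...) treated
// purely as a data transfer object.
public class CustomerEntity
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public decimal CreditLimit { get; set; }
}

// The business layer only sees entities coming and going; which O/RM
// fulfills this contract is an implementation detail. A concrete
// implementation would wrap EF's ObjectContext (or an L2S DataContext,
// an NHibernate ISession, etc.) and use Linq to fetch the entities.
public interface ICustomerRepository
{
    CustomerEntity GetById(int customerId);
    void Save(CustomerEntity customer);
}

// The hand-written business object keeps its structure, inheritance tree
// and interface; only its internals change, from DataReader mapping to
// entity mapping.
public class Customer
{
    private readonly ICustomerRepository _repository;

    public int CustomerId { get; private set; }
    public string Name { get; set; }
    public decimal CreditLimit { get; set; }

    public Customer(ICustomerRepository repository)
    {
        _repository = repository;
    }

    public void Load(int customerId)
    {
        // Map the detached entity onto the business object.
        CustomerEntity entity = _repository.GetById(customerId);
        CustomerId = entity.CustomerId;
        Name = entity.Name;
        CreditLimit = entity.CreditLimit;
    }

    public void Save()
    {
        // Map back to the entity and let the repository persist it,
        // replacing the old litany of stored procedure parameters.
        _repository.Save(new CustomerEntity
        {
            CustomerId = CustomerId,
            Name = Name,
            CreditLimit = CreditLimit
        });
    }
}
```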

This may not be POCO paradise or necessarily what the O/RM vendors intended, but for me it provides a compelling use for the technology while not forcing me to lock in and conform to the particulars of one tool.

Sometimes code is worth a thousand words, so I’ve provided sample code. Download

Why use the Entity Framework?

Monday, July 13, 2009

Dealing with Design Debt: Part II

In my previous post I talked about technical debt and scar tissue and how we used those concepts to identify and justify a project to rework a core part of our mission-critical business applications. In this post I want to talk about our approach to tackling such a high risk project, one which is largely viewed as a necessary evil rather than an exciting new feature or capability.

Race to Baghdad

There are many ways that systems like ours grow, and I’d hazard a guess that few are architected up front with carefully drawn schematics and blueprints. In our case, it would be more accurate to describe how we got here as a Race to Baghdad approach. The original mission was to get the business off the ground and keep it growing by erecting applications and tying them together into a larger whole, always meeting the immediate needs. In that sense we won the war but not the peace. The business is thriving, but the system is struggling with the aftermath of being stretched too thin, with a lot of terrain left to go back and fortify.

In the context of our current debt reduction project, the terrain we’re fortifying is not an isolated ‘module’ (a term I use loosely) that is core but insulated. Instead, it is a hub in a spider web of dependencies. This means that hundreds of stored procedures, views, queries, reports and all the application logic built on top of them could be, and often are, directly connected to each other and used by other modules. Further, it means that radically altering the structure of essentially three entities will have a ripple effect across the majority of our enterprise systems. Naturally, our first instinct was to find a stable wormhole, go back in time, build a loosely coupled modular system, then slingshot around the sun back to the present where we’d then begin our debt reduction project. After a few Google searches failed to yield any results, we switched to Plan B.

An iterative heart transplant?

We generally like to follow an agile-ish approach and develop in short iterations, releasing frequently. However, this project felt like it wasn’t well suited to that approach. Many, including those on the team, may question the wisdom of that decision now or in retrospect. The decision to perform a few large iterations rather than dozens of smaller ones was not made lightly. The prevailing sentiment at the onset of the project was that making a small modification to each of a number of entities, one at a time, and then sweeping through the entire code-base following the ripples would lead to too many iterations, and more importantly too much repetitive work (sweeping through the code-base over and over again) and a prohibitive duration. A project like this, which won’t have visible, tangible business benefits until final completion, coupled with the prospect of an extremely long duration, led us to deviate from our norm. We decided to attempt to shorten the duration by completing fewer sweeps through the code, acknowledging both the longer than normal iteration cycles and the danger inherent in massive changes released all-at-once. We foresaw three major iterations, each bigger than its predecessor.

“This shit's somethin. Makes Cambodia look like Kansas”

1) Reduce the surface area. Borrowing a term from the security world, we decided that since we didn’t have modularity on our side, at least we could, in the first iteration, eliminate dependencies and hack away the strands of the spider web. This made sense from several perspectives:

  • It allowed us to get a fuller picture of what we were dealing with by going through the entire code-base once, without being overcommitted. If our initial estimates of scope and cost were off, we could adjust them and/or allow the stakeholders to pull the plug on the project.
  • Removing unused and seldom-used code, combining and reducing duplication, and refactoring to circumvent easily avoidable dependencies was in and of itself a debt-reducing activity (or perhaps the analogy of removing scar tissue is more appropriate here). If the plug were to be pulled on the project due to technical, resource, financial or any of a number of business re-prioritization reasons, completing this activity would still be a significant step towards modularity and simplification. There was immediate ROI.
  • It gave the stakeholders a legitimate evaluation point at which to decide to continue or not without “throwing away” work.

2) Neuter the entity. One critical entity was being radically transformed, both in terms of its attributes and its relationships to other entities. At this step we aimed to move the attributes to their new homes on other entities, or eliminate them entirely, while at the same time preserving the relational structure. This meant that some code would have to change, but a significant number of queries, reports, and the like could remain only marginally affected as the JOINs would stay viable. It would also mean that the eventual change to the entity’s relationships would be slightly less impactful, because most of its attributes would already have been removed, leaving it more or less a relational placeholder. At this point we’d also write the data migration scripts.

3) Break the chains. The last step would be to sever the relationships and reorganize the entities, effectively breaking all code that had previously relied on those relationships. Once this was done, there would be no going back, no partially affected queries or seemingly unaffected business logic. We struggled to find a way to do this without one massive all-at-once ripple, but couldn’t (without going back to the “long duration” approach).

“Plans are useless, but planning is indispensable” (Eisenhower)

Currently we’re working on breaking the chains. We successfully reduced the surface area, reaping more benefit (getting rid of more scar tissue than anticipated) in a shorter time than expected. Neutering the entity, however, proved elusive, perhaps out of necessity but equally likely due to a lack of resolve on our part to stay committed to the plan. Some of the work done there wasn’t sufficiently insulated to release. Therefore, what could not be released safely into the production system has slipped into the current iteration, making it bigger than we’d hoped. The lessons continue to be learned.

The trilogy will conclude with: The Final Iteration

Tuesday, July 7, 2009

Dealing with Design Debt: Part I

There are two related terms that always come to mind when I’m asked to evaluate or implement any non-trivial change or addition to the line-of-business application systems that I work on: technical debt and scar tissue.

First suggested by Ward Cunningham, the idea of technical debt:

Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite.... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

And scar tissue as described by Alan Cooper:

As any program is constructed, the programmer makes false starts and changes as she goes. Consequently, the program is filled with the scar tissue of changed code. Every program has vestigial functions and stubbed-out facilities. Every program has features and tools whose need was discovered sometime after construction began grafted onto it as afterthoughts. Each one of these scars is like a small deviation in the stack of bricks. Moving a button from one side of a dialog box to the other is like joggling the 998th brick, but changing the code that draws all button-like objects is like joggling the 5th brick.

…As the programmers proceed into the meat of the construction, they invariably discover mistakes in their planning and flaws in their assumptions. They are then faced with Hobson's choice of whether to spend the time and effort to back up and fix things from the start, or to patch over the problem wherever they are and accept the burden of the new scar tissue—the deviation. Backing up is always very expensive, but that scar tissue ultimately limits the size of the program—the height of the bricks.

Each time a program is modified for a new revision to fix bugs or to add features, scar tissue is added. This is why software must be thrown out and completely rewritten every couple of decades. After a while, the scar tissue becomes too thick to work well anymore.

I will continue with these two analogies, the financial and the biological, as they both suggest dire consequences for applications that grow without accounting for the cumulative effects of all the design decisions and trade-offs made throughout the life of the application. Yet so many systems grow organically in just that way. So what do you do with an application that’s already in debt and badly scarred?

The first stage is denial.

Whether it’s the other members of your technology dept., technology leadership, or business owners, someone will likely need not only to be convinced that the debt exists (unlike financial debt, which is more obvious), but also to understand the magnitude of the problem and why it ever needs to be “paid off”. This is precisely the quandary we found ourselves in several months ago.

For the better part of a year, or maybe longer, several seemingly reasonable large-scale projects were assessed as being too costly (in terms of time and effort) and postponed, only to be periodically re-proposed with a slightly different twist or by another business owner. Over the same time frame, there were also many projects that either turned out to take longer or be more complicated than anticipated, or had their planning padded in acknowledgement that they’d likely tread on some of the more heavily scarred tissue within our system. Recognizing this pattern, we were able to identify and raise the visibility of a fundamental design flaw within our application systems. This particular design flaw was hurting our ability to add new functionality, collect accurate business intelligence data and do general application maintenance and enhancements. The module in question (and its structure) is fundamental, critical and core to our business, and consequently to how the business logic has been written; the equivalent of a major organ.

That there was a problem within this module was easy enough to see for the developers who had to tread lightly, create complicated work-arounds, and/or hard-code band-aids in order to bridge the gaps between the current design and all those newly requested features that were grafted on over the years. Similarly, the problem manifested itself for those writing reports, who felt like they were pushing the limits of SQL just to answer basic questions. What was less clear, however, was determining the correct, or more correct, design for this module (more on that in later posts).

Moving to Acceptance

Having identified the debt, on which we were continuing to pay interest in the form of added complexity on many smaller projects and prohibitive complexity on larger ones, the case for paying down the principal gained momentum. The trick was to identify how big the debt was, how much we were paying in interest and how much of the principal we needed to pay to get out from under it.

We estimated that we were paying 10% interest on every project (for just this one design issue). This was a swag, arrived at by looking at a sample time period and seeing how many of our projects dealt, either directly or indirectly, with the scarred module in question. This module, being so fundamental to our business, was estimated to be substantially involved in roughly 40% of our projects over the sample period. That estimate may seem high, but another factor, tight coupling, made ripples inevitable. The second part of the estimate was equally rough, in that we estimated paying a 25% complexity cost on those projects. Therefore, our rudimentary debt analysis yielded a cost of 25% on 40% of projects, which for simplicity’s sake became a flat 10% on all projects. Following this to its logical conclusion, we could see that we were already living with substantial debt and implicitly paying interest (without even taking into account the projects that were simply not done because of the perceived complexity). Further, by continuing to build upon the unsure footing of a flawed module we’d be taking on more debt, and could easily be seeing 30%-40% complexity interest on 50%-60% of our projects in the next year or two.
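
The back-of-the-envelope math, for anyone who wants to check it (the percentages below are the same rough guesses described above, nothing more):

```csharp
public static class DebtInterestEstimate
{
    public static void Main()
    {
        // Today: ~25% extra complexity cost on the ~40% of projects that
        // touch the scarred module.
        double shareOfProjects = 0.40;
        double complexityCost = 0.25;
        double currentInterest = shareOfProjects * complexityCost;  // 0.10, i.e. a flat ~10% on every project

        // Projected a year or two out: 30%-40% complexity on 50%-60% of projects.
        double projectedLow = 0.50 * 0.30;   // ~15%
        double projectedHigh = 0.60 * 0.40;  // ~24%

        System.Console.WriteLine("Current interest: " + currentInterest);
        System.Console.WriteLine("Projected interest: " + projectedLow + " to " + projectedHigh);
    }
}
```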

Putting the scarred module in those terms (projects that couldn’t be done, data that couldn’t be mined, and costs incurred even on projects that weren’t addressing the issue) was the basis for justifying the undertaking of a project to correct the design flaw in our core table structures. We were going to try to pay off the debt.

Next up: How do you perform a heart transplant on a runner while they’re running a marathon?

Further reading: