Friday, March 30, 2012

Cuckoo for POCO Puffs: Prequel

What exactly am I hoping to accomplish by moving to EF code-first or NHibernate?

The short answer is that I'm looking to gain features that I'd otherwise have to implement while at the same time simplifying and reducing the amount of code we already have. To get my own thoughts in order, I want to articulate a slightly longer answer that inventories the features and identifies the potential areas of simplification and reduction.

And that is what grieves me the most

We have a substantial codebase which utilizes our own class-in-charge patterned persistence ‘framework’, the bulk of which still works by using the Enterprise Library and manually mapping ADO.NET structures (datarows, datareaders, commands) back and forth to the database. This approach first started in the .NET 1.1 era. The framework mainly consists of a few base classes that provide more of a template (for consistency) than anything else. This approach has lasted a long time and continues to work. Nevertheless, there’s a certain amount of tedium and ceremony with our existing approach. For every domain object there are potentially the four classic (CRUD) stored procedures that must also be created, plus any other variations to provide alternative means of retrieval or persistence, and of course all the mapping logic. This can mean an explosion of stored procedures to meet a variety of query alternatives, or one procedure with a multitude of conditional switches executing different query versions based on parameter combinations. Invariably, to keep the number of procedures manageable, the same stored procedure is used to serve multiple purposes at the expense of efficiency.

Ah. It must’ve been when I was younger

Our active record pattern-ish approach leads to additional difficulties: it's not well suited to treating complex graphs as aggregates, we often find ourselves in eager vs. lazy load conundrums, validation is ad hoc, concurrency is seldom handled, and, as I've lamented, caching is non-trivial. One of the more obvious cases where our framework struggles is the oft-encountered situation where one use-case implies that domain objects would be more efficiently eager-loaded as an entire graph, while those same domain objects are more efficiently lazy-loaded within another use-case in a different module of the system. The same is true of caching.

It’s possible to evolve our framework, use code generation, and steal bits and pieces from other frameworks, and I’ve done a little of that. But that’s an awful lot of wheel reinventing and plumbing.


My children, I have watched you for a thousand years.

For a long time now there have been frameworks that provide a much richer set of services, both at the domain layer and data layer (and some debate about whether these frameworks could or should serve both). The frameworks that I'm most aware of are Entity Framework, NHibernate, CSLA.NET and SharpArchitecture. Therefore, these are the focus of my re-evaluation. The features of each that I'm interested in are:

Entity Framework 4.3

  • SQL generation
  • LINQ support
  • Change Tracking
  • Validation facilities
  • Concurrency support

NHibernate

  • SQL generation
  • LINQ support
  • Change Tracking
  • Validation facilities
  • Concurrency support
  • Level 2 cache
  • Logging/tracing


SharpArchitecture

  • Implementation of DDD
  • Loose coupling

CSLA.NET

  • Validation
  • Security
  • Transaction manager
  • Business Rules engine
  • Rich databinding support
  • N-level undo
  • Aggregate roots
  • Mobile objects/portability

Infidel Defilers. They shall all drown in lakes of blood

Our first foray into the frameworks was EF v1, and that did not deliver on the promise of code reduction or simplicity. At best it was a wash, making simple things simpler and difficult things more difficult, mostly just pushing complexity around. That's likely partly the fault of our approach and partly the agony of EF v1.

I wish to speak to you now

With EF 4.3 code-first and the Fluent API we've taken a fresh look, and with its apparent similarities to Fluent NHibernate it made sense to look at both. I still like the idea of deploying changes without having to deploy code and SQL scripts simultaneously. Of course, the downside is that we can't change them independently. Oftentimes we’ve found it advantageous to be able to optimize SQL in a stored procedure and change it without having to recompile and redeploy an application. Some might suggest that we could get the best of both worlds by combining O/RMs with stored procedures (because they support this), but I feel like this is the worst of both worlds. By the way, I'm sorry, but O/RMs don't generate more efficient SQL than I could write. Nevertheless, generated SQL provides the opportunity to query the same objects differently in different parts of the application and get SQL that is more appropriate to each. This is even more true with column-level change tracking. Having code that is smart enough to update only what was changed rather than sending all values back and forth, and that knows when no update is necessary at all, would be a meaningful improvement.
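To make the column-level change tracking idea concrete, here is a minimal sketch. It is written in Python purely to keep it dependency-free, and it is not how EF or NHibernate implement change tracking; the TrackedRow class, the Address table and the Id key column are all hypothetical names chosen for illustration. The idea is simply to snapshot the values at load time and build an UPDATE from only the columns that differ.

```python
class TrackedRow:
    """Hypothetical entity wrapper that remembers the values it was loaded with."""

    def __init__(self, table, key, values):
        self.table = table
        self.key = key
        self._original = dict(values)  # snapshot taken at load time
        self.values = dict(values)     # working copy the application edits

    def changed_columns(self):
        # Only columns whose current value differs from the snapshot.
        return {col: val for col, val in self.values.items()
                if self._original.get(col) != val}

    def build_update(self):
        """Return (sql, parameters), or None when nothing changed."""
        changes = self.changed_columns()
        if not changes:
            return None  # no round trip to the database is needed
        assignments = ", ".join(f"{col} = ?" for col in changes)
        sql = f"UPDATE {self.table} SET {assignments} WHERE Id = ?"
        return sql, list(changes.values()) + [self.key]


address = TrackedRow("Address", 7,
                     {"Street": "1 Main St", "City": "Springfield", "StateId": 12})
print(address.build_update())  # None, nothing was modified

address.values["StateId"] = 30
print(address.build_update())
# ('UPDATE Address SET StateId = ? WHERE Id = ?', [30, 7])
```

Contrast that with a hand-written update stored procedure, which typically takes every column as a parameter and rewrites the whole row whether or not anything changed.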

This flame will burn away the darkness, burn you the way to paradise

Code that can easily be toggled between eager and lazy loading by use-case is certainly desirable. And having all the complexity of caching handled for me would be a dream come true. That’s one of the more exciting differentiating features of NHibernate. But it’s not quite clear to me how well that works and whether we’d be able to reach a point where enough of the data access is managed by NHibernate to take advantage of Level 2 caching.

That is strength, boy! That is power!

Since 2005 I've been fascinated with CSLA.NET (although admittedly not fascinated enough to actually build something with it), and more so with each successive version. It's not an O/RM, but it has some overlap in terms of the facilities it provides, like validation and change tracking. Additionally, it provides one specific advantage that I find particularly intriguing: the concept of mobile objects, and with it a measure of portability. The ability to use the same business logic on both sides of an RIA application, whether that be Silverlight, WPF, WP7, or even MonoTouch and MonoDroid, is especially compelling. With the explosion of devices and platforms this is sounding more and more powerful.

Contemplate this on the tree of woe.

These frameworks are not mutually exclusive. We could use EF or NHibernate as both our data access and domain layer, or we could use CSLA with EF or NHibernate, and SharpArchitecture already leverages NHibernate. All this must be weighed against the time, effort, approach, and risk of switching. If the application(s) were to be built from scratch today, we'd clearly be using one or more of these, but alas I'm architecting after the fact. I have to deal with the knowledge that a migration is fraught with peril.

Tuesday, March 20, 2012

Cuckoo for POCO Puffs: Week 1

I completed converting my moderately complex sub-model (24 entities) from the database-first style to code-first style this week.  The results are pretty encouraging.

I find your lack of faith disturbing

The use of foreign key associations, POCOs and the Fluent API vastly simplified the resulting code. Not that I didn’t encounter my share of learning curve struggles. The Fluent API syntax for declaring relationships is a bit clumsy, and the difference between declaring properties as virtual or not, and its resulting impact on lazy load behavior, confused me on at least one occasion. Notwithstanding those, the bulk of the effort was simply the brute typing involved in re-implementing all the classes, repositories and relationships, as well as refactoring tests to use the new code. The EF 4.x POCO Entity Generator gave me a really good starting point, although it didn’t allow me to pick which tables, instead generating the whole database.

All too easy

What I found most compelling with POCOs is that I no longer needed to jump through all the hoops to detach and re-attach object graphs. You may recall we used this approach, Reattaching Entity Graphs with the Entity Framework, to handle this problem in EF v1. Apparently, and it kind of makes sense with POCOs, detaching and re-attaching is less of an issue. My code-first repositories are so much simpler that I’m suspicious that either I’ve missed something or my chosen sub-model wasn’t complex enough to surface the still lurking issues of detaching objects. Then again, one thing I haven’t attempted yet is persisting a complex object graph as one unit, relationships and all.

Asteroids do not concern me

Once the object leaves the context it’s essentially detached. One odd side-effect of this and lazy loading is that accessing a lazy property (even to check if it’s null) outside of the context results in an exception. There doesn’t appear to be a good way to check whether properties are loaded outside of the context. To handle this I disabled lazy loading, which might seem extreme, except that I was already being explicit using Include(), and having EF attempt to lazy load detached objects doesn’t serve much purpose.
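The behavior described above can be modelled in a few lines. This is a toy sketch in Python, not EF's proxy mechanism: Context, Address, include_state() and ContextClosedError are hypothetical stand-ins for a DbContext, a POCO, an explicit Include() and whatever exception the real framework throws. It only shows why a lazy property blows up once its context is gone, and why an explicitly loaded one does not.

```python
class ContextClosedError(Exception):
    """Hypothetical stand-in for the exception raised after the context is gone."""


class Context:
    def __init__(self, states):
        self._states = states  # stand-in for a State table keyed by id
        self.is_open = True

    def load_state(self, state_id):
        if not self.is_open:
            raise ContextClosedError("context is no longer available")
        return self._states[state_id]

    def close(self):
        self.is_open = False


class Address:
    def __init__(self, context, state_id, lazy):
        self._context = context
        self.state_id = state_id
        self._lazy = lazy
        self._state = None
        self._state_loaded = False

    def include_state(self):
        # Explicit eager load, performed while the context is still open.
        self._state = self._context.load_state(self.state_id)
        self._state_loaded = True
        return self

    @property
    def state(self):
        # With lazy loading on, merely reading the property hits the context.
        if self._lazy and not self._state_loaded:
            self._state = self._context.load_state(self.state_id)
            self._state_loaded = True
        return self._state


ctx = Context({12: "Ohio"})
lazy_address = Address(ctx, 12, lazy=True)
included_address = Address(ctx, 12, lazy=False).include_state()
bare_address = Address(ctx, 12, lazy=False)
ctx.close()

print(included_address.state)  # Ohio
print(bare_address.state)      # None
try:
    lazy_address.state         # even a null check would trigger the load
except ContextClosedError as error:
    print("lazy access failed:", error)
```

With lazy loading off, a property is either loaded or null, with no surprise trip to the database, which is the trade I made by disabling it.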

Another oddity I haven’t quite grokked is that foreign key associations don’t appear to stay in sync with their ID properties even after a SaveChanges() (e.g. if I update the StateId on an Address, the Address.State association doesn’t seem to have the correct State after the save). It’s not a feature I’ve relied on thus far, so it’s not an issue in this case, but it’s something I’ll want to understand better if I go forward.

Impressive, most impressive

Overall, I’m very impressed with the combination of code-first, POCOs and the Fluent API in EF 4.3. I don’t see any reason not to convert our existing EF code to this approach, other than the time, effort and risk. It may not be flawless, but it’s a considerable improvement over the contorted v1-style approach currently employed. There are two additional points of inspiration gleaned from this effort:

  1. The POCOs appear as if they could just as easily work with NHibernate as with EF. They have no dependency on or direct relationship with EF (thanks to the Fluent API) and all the properties are virtual.
  2. There is a striking similarity between the POCO objects and our Domain objects in many cases. In some cases, the POCO object is nearly identical to the Domain object that it’s hydrating. This suggests to me that we may be able to collapse the two layers.

Perhaps I can find new ways to motivate them

Next up I’m going to try to implement this same sub-model using the same POCO objects, but this time using Fluent NHibernate.  NHibernate is more mature, supports more databases, and has Level 2 cache support baked in.  All else being equal, NHibernate might be the better choice, so I want to see how equal all else is.

At some point I should probably explain what I’m trying to accomplish with all of this.

Friday, March 16, 2012

RIF Notes #15

“If you don’t know why it works, it probably doesn’t. You just don’t know it yet.” – Steve McConnell

“I need serenity
In a place where I can hide”

Wednesday, March 14, 2012

Cuckoo for POCO Puffs: Day 1

Now that we’re firmly on .NET 4.0 and EF 4.x has arrived, I’ve begun revisiting our EF v1 style repository pattern to see if the new code-first and POCO support will help mitigate some of the frustration we’ve encountered with Entity Framework. The pain we’ve experienced over the past couple of years primarily falls into two buckets:

  1. The models (EDMX and designer files) can become large, unwieldy and idiosyncratic. Refreshing them with underlying changes to the database can often result in cryptic errors, and source control merges of these files can be painful.
  2. Independent Association relationships are tricky and complicated when working in a disconnected mode. Properly navigating these relationships often involves loading entities from two or more contexts, associating them together, and then getting them to properly persist as a complete graph when reattached to a new context.

Models, Inc.

Over the years we’ve mitigated some of the giant model problems (our primary database contains just shy of 400 tables) by creating sub-models. We’ve created these sub-models by clustering groups of related tables in their own model. This has helped quite a bit, but has also led to a bit of code duplication in the form of repetitive boilerplate code. It’s likely that could also be mitigated with some T4 code generation, but we never got there. Additionally, some tables belong to more than one model, and querying across models isn’t possible. Lastly, we had to employ some tricks to make sure we don’t need a connection string for each sub-model.

Relationship advice

We don’t have a good answer for the contortions we have to go through to manage disconnected relationships. Up to this point we’ve treated EF entities as DTOs, following Rocky Lhotka’s advice, rather than using EF entities to replace our domain objects. On the other hand, Ayende disagrees with this approach, at least in the case of NHibernate. His recent posts about limiting abstractions seem like they’re directed right at us and our repository-over-EF solution.

If you are writing an application and you find yourself writing abstractions on top of CUD operations, stop, you are doing it wrong.

But I digress…

We have a considerable amount of code invested in this approach, domain objects mapped to EF entities (as well as straight up ADO.NET), so before we throw away both our domain and data access layers in one fell swoop, I figured the first step was to see whether code-first POCO objects could assuage our model problem and whether Foreign Key associations could soothe our relationship pains. In the process I hope to get a better sense of whether EF or another O/RM might eventually serve in the domain.

Keeping it real

Naturally, I chose our simplest case to start with, a sub-model with only two entities.  I began to convert our database-first sub-model to code-first, picking the Fluent API over Data Annotations for a few reasons:

  1. I’m not fond of polluting the objects with persistence attributes if I can avoid it.
  2. The Fluent API is a superset of Data Annotations. I’m unlikely to be able to do everything with Data Annotations alone, and rather than have configuration information in two places, I decided to centralize it in EntityTypeConfiguration classes.
  3. If I was going to consider EF as a replacement for my domain objects, the Fluent API offered me the best opportunity to do that with the least amount of change to the domain objects. Keeping the POCOs pure might even make it easier to attempt a similar approach with NHibernate (which I have next to no experience with, so this may be a pipe dream).

Trivial pursuit

One of the first things I discovered is that I couldn’t mix code-first in the same assembly with my existing database-first models. After one day, I can’t say that this is necessarily impossible, but I found some indications that it wouldn’t work. Rather than fight it, I created a separate assembly for the code-first data access and began building my two POCO entities, my new DbContext (with which I found it much easier to leverage existing connection strings), and a repository with the same signature as the existing database-first repository.

By the end of the day I had swapped out my unit tests and domain classes to use the new code-first repository instead of the database-first and had all my unit tests passing.  I then deleted the model, designer, and the rest of my database-first artifacts, just to be sure.  The code-first version definitely feels cleaner, fewer files with less code.  One feature of database-first that is both a benefit and a curse is its ability to refresh itself from the database.  I don’t see an obvious way to mimic that with code-first, but then again maybe I won’t need to.  Overall, day 1 went pretty smoothly, albeit with a trivial example. 

Real World/Road Rules Challenge

Next I’m going to pick a more complex sub-model, one with 20+ tables with complex relationships, and begin the conversion. I expect this will take more than a day, but will give me a much better idea of the advantages, feasibility and effort of doing a full conversion to code-first.