What went wrong with the distributed cache?

November 14, 2011

The basic purpose of the distributed cache was to address the following conditions:

We moved from one webserver to two webservers, with the intention of having the flexibility to move to N webservers. The single webserver was using the ASP.NET Cache (in-memory) for heavily used read-only objects (Category, ProductClass, Product). Moving to two webservers meant a doubling of database queries for cached objects, because each webserver has its own copy of the ASP.NET cache that it needs to load.
The ASP.NET cache competes for memory with the application itself, as well as with the output cache. An increase in memory pressure caused by any one of them triggers cache trimming (items being evicted from the cache). This results in more database traffic to reload the evicted items.
Application restarts (application pool recycles, etc.) cause the cache to be flushed and reloaded.
The database is the most likely bottleneck and is the most difficult to scale. We can add more webservers, but we cannot easily add more database servers. Thus using caching as efficiently as possible is the best way to offload database traffic to the web servers.
The theoretical ability for backend systems to affect and/or participate in the distributed cache (e.g. backend systems could update or expire a product in the eCommerce cache when a price changes).

NCache’s distributed cache addresses these conditions by providing:

One copy of the cache replicated across the webserver nodes.
Its own dedicated process and memory space that could be configured independently and would not compete with the ASP.NET application or the outputcache for memory.
A durable cache that would survive application recycles and even the reboot of one of the nodes.
Purportedly fast throughput: 30,000 cache reads per second.

No small matter

The first major challenge to enabling distributed caching was our object structure and distributed caching’s reliance on serialization. Our object graphs are deeply intertwined and utilize lazy-loading heavily. These two facts were challenges for distributed caching. The object graphs were large and duplicative, and they needed to be fully loaded prior to serialization rather than lazy-loaded on demand. The same object might be attached to different graphs repeatedly (e.g. the same manufacturer object might be attached to hundreds of product classes).
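To make the duplication problem concrete, here is a minimal sketch in Python (not the site's actual C# code, and not the NCache API). The `Manufacturer` and `ProductClass` shapes, the data, and the `serialize` helper are all hypothetical, and JSON stands in for whatever serializer a cache client really uses. It shows that one shared in-memory object becomes one serialized copy per cache entry.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Manufacturer:
    id: int
    name: str
    description: str


@dataclass
class ProductClass:
    id: int
    name: str
    manufacturer: Manufacturer  # full object reference, as in an intertwined graph


def serialize(obj) -> bytes:
    """Hypothetical stand-in for the cache client's serializer."""
    return json.dumps(asdict(obj)).encode("utf-8")


# One manufacturer attached to 100 product classes (made-up data).
acme = Manufacturer(1, "Acme", "x" * 2000)
classes = [ProductClass(i, f"class-{i}", acme) for i in range(100)]

# In process, every product class points at the same instance.
in_process_copies = len({id(pc.manufacturer) for pc in classes})

# One cache entry per product class: each entry embeds its own manufacturer copy.
entries = {f"ProductClass:{pc.id}": serialize(pc) for pc in classes}
total_bytes = sum(len(v) for v in entries.values())
manufacturer_bytes = len(serialize(acme))
```

In this sketch `total_bytes` is more than 100 times `manufacturer_bytes`, even though the process only ever held one manufacturer instance.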

I spent considerable time creating boundaries, reducing duplication, and eager-loading the graphs before objects were placed in the cache. With this distributed-cache-friendly refactoring done, I was ready to enable distributed caching and do some rudimentary load testing in our test environment.
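The kind of refactoring described above can be sketched roughly as follows. This is an illustrative Python sketch under assumed names, not the actual code: `Lazy`, `ProductClass`, `to_cache_entry`, and the key format are hypothetical. The boundary is drawn by storing a manufacturer id instead of the manufacturer object, and the lazy field is forced before serialization so the cached entry is complete.

```python
import json


class Lazy:
    """Loads a value on first access, like a lazy-loaded property."""

    def __init__(self, loader):
        self._loader = loader
        self._loaded = False
        self._value = None

    def get(self):
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value


class ProductClass:
    def __init__(self, id, name, manufacturer_id, load_product_ids):
        self.id = id
        self.name = name
        self.manufacturer_id = manufacturer_id  # boundary: an id, not the object
        self._product_ids = Lazy(load_product_ids)

    @property
    def product_ids(self):
        return self._product_ids.get()

    def to_cache_entry(self) -> bytes:
        # Reading product_ids here forces the lazy load before serialization.
        return json.dumps({
            "id": self.id,
            "name": self.name,
            "manufacturer_id": self.manufacturer_id,
            "product_ids": self.product_ids,
        }).encode("utf-8")


db_calls = []  # records each simulated database query


def load_product_ids_for(class_id):
    def loader():
        db_calls.append(class_id)  # stands in for a database round trip
        return [class_id * 10 + n for n in range(3)]
    return loader


cache = {}  # a plain dict standing in for the distributed cache
cache["Manufacturer:1"] = json.dumps({"id": 1, "name": "Acme"}).encode("utf-8")
for i in range(1, 4):
    pc = ProductClass(i, f"class-{i}", 1, load_product_ids_for(i))
    cache[f"ProductClass:{i}"] = pc.to_cache_entry()

entry = json.loads(cache["ProductClass:2"])
```

The manufacturer is stored once under its own key, and each product class entry carries only `manufacturer_id`. The trade-off is an extra cache read whenever the manufacturer itself is needed.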

Not so fast

What I found was that NCache easily became the top resource consumer on the webservers under any kind of load. Performance with distributed caching on, as compared against the same two nodes with separate ASP.NET caches, was measurably worse. In limited load testing, the overhead of distributed caching appeared to far exceed any performance gain from maintaining one synchronized out-of-process copy. Far from achieving 30,000 reads/sec, under about 2,000 reads/sec I could see NCache causing thread locking and reads taking as long as 200 ms.

There’s a rather significant caveat to these findings: my load testing was in no way indicative of true load. It consisted of essentially clicking through the same 4 pages repeatedly in extremely rapid succession using a load test tool simulating 25 users. It’s entirely possible that under a truer load the overhead of distributed cache access would be more balanced with other processing activities and that the synchronization of the cache would prove to be more beneficial. Nevertheless, it’s more likely that the overhead found during load testing would also exist in production and result in overall performance degradation.

A distributed cache is more like a database

Reading from a distributed cache incurs overhead. The conclusion I draw from this is that a distributed cache is more like a database than it is like the in-memory, in-process caching provided by the ASP.NET Cache. With the ASP.NET cache, reading and writing to the cache are essentially free. We’re basically reading and writing memory pointers from a Dictionary. Reading Category objects out of the cache hundreds of times in the course of one page request has negligible performance implications. However, with a distributed cache, even a super fast one, those same hundreds of cache reads can add up quickly. The distributed cache may be local (depending on your topology) and store everything in memory, but you still need to serialize objects in and out of it over a socket connection, and unless you’re judicious in its use, that can get expensive more quickly than you might expect.
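The difference between the two kinds of read can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `InProcessCache` plays the role of a dictionary-backed cache, `SerializingCache` imitates an out-of-process cache by deserializing on every read (the socket hop is left out entirely, so the real cost would be higher), and `RequestScopedCache` is one assumed way of being judicious, by memoizing reads for the duration of a single page request. None of this is the NCache or ASP.NET API.

```python
import json


class InProcessCache:
    """Dictionary cache: a read hands back a reference to the stored object."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items.get(key)


class SerializingCache:
    """Imitates an out-of-process cache: every read deserializes a fresh copy."""

    def __init__(self):
        self._items = {}
        self.reads = 0

    def put(self, key, value):
        self._items[key] = json.dumps(value).encode("utf-8")

    def get(self, key):
        self.reads += 1
        raw = self._items.get(key)
        return None if raw is None else json.loads(raw)


class RequestScopedCache:
    """Memoizes reads from a backing cache for the lifetime of one request."""

    def __init__(self, backing):
        self._backing = backing
        self._seen = {}

    def get(self, key):
        if key not in self._seen:
            self._seen[key] = self._backing.get(key)
        return self._seen[key]


category = {"id": 7, "name": "Widgets"}
local = InProcessCache()
local.put("Category:7", category)
remote = SerializingCache()
remote.put("Category:7", category)

# In-process read: the very same object, no copying.
same_object = local.get("Category:7") is category

# 300 reads in one "page request", each paying for deserialization.
for _ in range(300):
    remote.get("Category:7")
reads_without_memo = remote.reads

# The same 300 reads behind a request-scoped memo hit the backing cache once.
remote.reads = 0
request_cache = RequestScopedCache(remote)
for _ in range(300):
    request_cache.get("Category:7")
reads_with_memo = remote.reads
```

In this sketch the memoized request touches the serializing cache once instead of 300 times. Whether a similar wrapper would help in a real deployment depends on how long cached objects can safely be held within a request.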

Does it or doesn’t it add up?

The obvious question is what other NCache customers are doing differently, or how large sites make use of distributed caching (Facebook uses memcached, Stack Overflow uses Redis), given that even in our small environment with meager load we find that it can easily hurt performance. Is it a matter of scale? Do you need to be using 10 webservers before the benefits of a centralized cache outweigh the overhead? Or are they just smarter about their cache access? Maybe NCache is the wrong product, we have the wrong version, or there’s still something ‘funny’ about the performance and configuration of our web farm servers.

At some point after the holidays, I intend to enable NCache and capture some performance data with Dynatrace to gauge it under true load and see if any new insights are revealed.

Nappi Sight