Wearing Off the Elbow Patches

Alistair Atkinson's Blog

Month: January, 2011

Yield and harvest as a way of thinking about fault-tolerance

I was put onto an excellent article by Coda Hale titled ‘You Can’t Sacrifice Partition Tolerance‘ which expands on some of the ideas laid out in Dr Eric Brewer’s CAP Theorem.

The CAP Theorem is the distributed systems version of the classic engineering maxim of “Good, fast, cheap: pick two”. Briefly stated, it declares that any distributed system may only have two of the following properties: consistency; availability; partition tolerance. It makes perfect sense when you think about it, as it’s logically impossible to remain both consistent and available in the case of the system becoming partitioned due to node failure/disconnections etc.

The case raised by Hale is that, for any distributed system, partition tolerance is not something that can simply be chosen. That is, disconnections and node failures etc are guaranteed to happen at some point in time. They are simply an annoyingly constant fact of life for anyone involved in the field. Therefore, any real-world distributed system must support some kind of fault-tolerance, otherwise it’s useless.

This means that the distributed systems designer is left to decide the following: given that system failures will occur, and your system may become partitioned, is your system going to degrade by sacrificing consistency, or availability? It depends on the system requirements as to which should be chosen, however it’s the unfortunate face that you simply can’t preserve both.

Given this, Hale suggests that a better way to think about handling failure is by using yield and harvest. Yield is defined as the impact of unavailability on handling incoming requests (related to but different to uptime), and harvest is a measure of how much of the required data is available to fulfil a request (ie. if a request needs to data from two servers, one of which is unavailable, the harvest will be only 50%).

Hale’s point is that you can choose which of these will be affected in the case of system failures and partitions. Is it better for some requests to be dropped in the interests of preserving accuracy (harvest), or can we handle a lower harvest in the interests of maintaining responsiveness? Again, the system requirements determine the answer.

I find this a very useful way to think about how a system is going to degrade when faults occur. After all, if you design something to be scalable (increase throughput as nodes are added) it makes perfect sense to consider how a system will degrade/scale down as nodes are lost. It can only be beneficial to be completely clear on the choices and tradeoffs involved, and that, ultimately, you can’t have your cake and eat it too when it comes to consistency and availability.

My 2010 in review

A quick summary of the highlights of my year:

  • Started out living in Napier, NZ, finished up in Brisbane, Aus. Miss some things about Napier, but no regrets. Brisbane is a fantastic place to live, and I’m not moving again in a hurry.
  • Started out working as a lecturer at EIT, finished up as an engineer at Merge Gaming. Found that I was happy with the career change. No regrets at all.
  • Also found that (thankfully) I still had my coding chops, and that I enjoy building software.
  • Started learning about all the neat stuff that’s been happening in the industry that I missed due to a myopic focus on my PhD.
  • Oh yes, officially (finally) graduated in April with a doctorate.
  • Presented a paper at ACSW2010 in Brisbane in January.
  • Started learning about web design and development, something I’d always done my best to maintain a militant ignorance about. Remedial education in HTML, CSS, Javascript. Always wanted to learn Python, so getting into Django. Turns out it’s all actually pretty fun.
  • Started a little web project that I’d like to get finished over the summer.
  • Haven’t got out mountain biking as much as I’d have liked, but discovered some nice trails around Mt Coot-tha. They’re not a jot on what I had on my doorstep this time last year though *sigh*.

So overall, it’s been a great year with a lot of change, all of which has thankfully turned out to be positive. Hopefully 2011 will be slightly less hectic but just as productive.

Follow

Get every new post delivered to your Inbox.