The Fallacy of vHadoop
I really hate writing this kind of post, especially when there is the remote possibility that it will look like I’m going after someone I have the utmost respect for, Richard McDougall. The guy is a legend and borders on deity status in my books. So keep that in mind when you’re reading this… I’m not attacking anyone, just providing an ‘other side of the coin’ kind of view on what VMware marketing would have you believe. Maybe I should re-adopt my old moniker of “cutting a swath of destruction through marketing bullshit” (those are my old vinternals.com novelty business cards :)
VMware recently announced “Project Serengeti”, a tool designed to rapidly deploy Hadoop clusters on top of vSphere 5, with an accompanying whitepaper “Virtualizing Apache Hadoop”. A whitepaper that is unfortunately not without some glaring omissions / understatements / contradictions.
Let’s take a look at the “Use Cases and advantages of virtualizing Hadoop” section of that whitepaper.
Serengeti uses (the awesome) Chef under the hood for provisioning. Now obviously spinning up VMs is infinitely faster than deploying physical hardware, but when you’re scaling out a Hadoop cluster you’re adding TaskTracker / DataNode nodes. As anyone who has run Hadoop in production knows, this kind of node is not your little 2 vCPU / 8GB RAM webserver - it’s a hungry beast that will generally consume as much CPU, memory and IO as you can throw at it (the mix obviously varies with the task at hand). The typical resource configuration for this kind of node is dual socket / multicore / 24GB+ of RAM. Could you *really* rapidly deploy a meaningful number of VMs with that spec in your environment? Unless you are using Joyent or Amazon, neither of which Serengeti supports, I am betting the answer is no.
And once you have provisioned the virtual machine, the rest of the rapid scale-out configuration is done within the guest OS, which applies equally to physical hosts.
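To illustrate how little of that is virtualisation-specific, here’s a rough sketch of the kind of in-guest setup a tool like Chef drives once the OS is up - this is not Serengeti’s actual recipes, and every hostname, path and version in it is made up:

```sh
# Illustrative only - not Serengeti's recipes; hostnames, paths and versions are made up.
# The same steps work unchanged on a physical box.
useradd -r hadoop
tar -xzf hadoop-1.0.3.tar.gz -C /opt && ln -s /opt/hadoop-1.0.3 /opt/hadoop
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-openjdk' >> /opt/hadoop/conf/hadoop-env.sh

# Point the new worker at the existing master nodes
cat > /opt/hadoop/conf/core-site.xml <<'EOF'
<configuration>
  <property><name>fs.default.name</name><value>hdfs://namenode.prod:8020</value></property>
</configuration>
EOF
cat > /opt/hadoop/conf/mapred-site.xml <<'EOF'
<configuration>
  <property><name>mapred.job.tracker</name><value>jobtracker.prod:8021</value></property>
</configuration>
EOF

# Start the worker daemons; they register themselves with the NameNode / JobTracker
/opt/hadoop/bin/hadoop-daemon.sh start datanode
/opt/hadoop/bin/hadoop-daemon.sh start tasktracker
```

Nothing in there cares whether the machine underneath is virtual or physical.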
High Availability / Fault Tolerance
Again, anyone who has run Hadoop in production is painfully aware of the single points of failure that Hadoop 1.0 has. But VMware doesn’t have any technology that can provide better protection than the standard Secondary NameNode configuration offers. VMware HA may be able to restart the NameNode in the event of an underlying host failure, but when it comes back up it still needs to read in the fsimage and rebuild the in-memory HDFS block-to-node mapping from block reports sent by all the DataNodes - exactly what you’d do if you had to promote the Secondary NameNode to primary. So if you’ve configured your physical environment correctly (DNS aliases etc) the recovery time will be roughly equivalent for either scenario.
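To make the DNS alias point concrete, here’s roughly what I mean - a core-site.xml fragment (Hadoop 1.x property name, hostname made up) where every node and client addresses the NameNode through an alias rather than a physical hostname, so promoting the Secondary is largely a matter of repointing the CNAME:

```xml
<!-- core-site.xml on every node and client; the alias is illustrative -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.prod.example.com:8020</value>
</property>
```

Either way, VMware HA restart or a promoted Secondary behind the same alias, the incoming NameNode still has to load the fsimage, replay the edits and wait for block reports before it leaves safe mode, and that’s where the bulk of the recovery time goes.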
Mentioning VMware Fault Tolerance as a solution for NameNode / JobTracker availability is just straight-up misleading. For those of you not familiar with VMware Fault Tolerance, it requires the VM to have a maximum of 1 vCPU. Are you really going to run either of those roles on a single vCPU?
There’s a nice little graphic in this section of the whitepaper depicting islands of applications and hardware that could be run much more efficiently across a common virtualisation layer. But let’s have a look at the reality of that with a Hadoop deployment.
For starters, let’s look at memory overheads. Say you were running each data node with a modest 6 vCPUs and 24GB RAM. With vSphere 4.1, this could incur around 1GB of memory overhead per VM. Funnily enough, VMware doesn’t provide the same table for vSphere 5 anymore, instead showing a contrived sample of the memory overhead required just to power on a VM (that’s almost a topic for another blog post).
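Back-of-the-envelope, using that ~1GB-per-VM figure: across a 40-node cluster of those data node VMs you’ve given up roughly 40GB of RAM purely to virtualisation bookkeeping (more than an entire extra data node’s worth of memory), before the hypervisor’s own per-host footprint is even counted.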
Second, the vast, vast majority of VMware deployments today are running on SAN or NFS. Further, in just about every shop that I know of (and I do know of quite a few) this has meant moving to small form factor servers like blades - form factors that don’t allow for much local storage. In the “myths” section of this very same whitepaper, VMware actually recommends running Hadoop data nodes against _local storage_. But hardly anyone deploys VMware Infrastructure this way. So now what, I need to buy a specific type of hardware in order to run my virtual Hadoop data nodes, thereby entirely eliminating the ability to run my nodes anywhere in this “big pool of common, virtualised compute”?
If I have to buy specific hardware for something that I know will max out the resources available, I’ll just run it physical, thanks.
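For clarity on what “local storage” means on the Hadoop side: it’s the DataNode writing straight to the host’s own disks as plain JBOD, roughly the hdfs-site.xml fragment below (Hadoop 1.x property name, mount points made up), and exactly the thing a blade-plus-SAN footprint can’t give you:

```xml
<!-- hdfs-site.xml: one entry per local disk, no RAID, no SAN -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs,/data/2/dfs,/data/3/dfs,/data/4/dfs</value>
</property>
```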
Easy maintenance and movement of environment
When you want to move something Hadoop related from staging to production, you are moving at most two things: jobs and a dataset. There is absolutely no need to pick up an entire OS + Hadoop install, let alone an entire cluster, and move it anywhere. For crappy legacy applications that are tightly coupled to the OS or have un-automatable installation processes, I can understand how this capability would be of benefit. But Hadoop is neither tightly coupled to the OS nor un-automatable - quite the opposite.
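In practice that “movement” looks something like the two commands below (cluster names, paths and the job jar are made up), and neither of them involves touching a guest OS image:

```sh
# Copy the dataset from the staging cluster's HDFS to production
hadoop distcp hdfs://nn.staging:8020/data/clickstream hdfs://nn.prod:8020/data/clickstream

# Ship the job jar and run it against the production cluster
hadoop jar analytics-job.jar com.example.ClickstreamReport /data/clickstream /reports/latest
```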
I could go through the other points in that whitepaper, but I’m going to wrap it up there. In closing, I’ll point out that nowhere have I talked about performance overhead (the memory overhead I mentioned is not a performance penalty - it’s just a virtualisation cost). That’s because in my experience, the performance overhead of a modern hypervisor on modern hardware is negligible in the vast majority of cases. And I believe VMware when they say that Hadoop is no exception to this.
What I really take exception to is this mantra of “just because you can virtualise something, you should.” Like anything, there are good use cases for virtualisation and not so good ones. There are even a few bad ones.
Hadoop is optimised for scale out on commodity hardware, with data and compute located as close together as possible. It will chew up every resource you throw at it, unless you do something really suboptimal with your jobs. Given all that, I don’t think virtualisation is a good idea at all for production Hadoop deployments. Virtualisation is fucking awesome for dev / functional test though, and to that end Serengeti looks very cool for VMware shops.