Is AWS the patient that needs constant medication?
I was just reading High Scalability about Why Swiftype moved off Amazon EC2 to Softlayer and saw great wins!
We’ve all heard by now how awesome the cloud is. Spinup infrastructure instantly. Just add water! No up front costs! Autoscale to meet seasonal application demands!
But less well known or even understood by most engineering teams are the seasonal weather patterns of the cloud environment itself!
Sure there are firms like Netflix, who have turned the fickle cloud into one of virtues & reliability. But most of the firms I work with everyday, have moved to Amazon as though it’s regular bare-metal. And encountered some real problems in the process.
1. Everyday hardware outages
Many of the firms I’ve seen hosted on AWS don’t realize the servers fail so often. Amazon actually choosing cheap commodity components as a cost-savings measure. The assumption is, resilience should be built into your infrastructure using devops practices & automation tools like Chef & Puppet. The sad reality is most firms provision the usual way, through the dashboard, with no safety net.
We’ve by now heard of firms such as Netflix building their Chaos Monkey to actively knock out servers, in effort to test their ability to self-healing infrastructure.
From what I’m seeing at startups, most have a bit of devops in place, a bit of automation, such as autoscaling around the webservers. But little in terms of cross-region deployments. What’s more their database tier is protected only by multi-az or just a read-replica or two. These are fine for what they are, but will require real intervention when (not if) the server fails.
I recommend building a browse-only mode for your application, to eliminate downtime in these cases.
4. Provisioning isn’t your only problem
But the cloud gives me instant infrastructure. I can spinup servers & configure components through an API! Yes this is a major benefit of the cloud, compared to 1-2 hours in traditional environments like Softlayer or Rackspace. But you can also compare that with an outage every couple of years! Amazon’s hardware may fail a couple times a hear, more if you’re unlucky.
Meanwhile you’re going to deal with season weather problems *INSIDE* your datacenter. Think of these as swarms of customers invading your servers, like a DDOS attack, but self-inflicted. Amazon is like a weak immune system attacking itself all the time, requiring constant medication to keep the host alive!