By Ivan Pepelnjak
Tuesday, October 06, 2015
During my recent SDN workshop one of the attendees asked me “How do you build carrier-grade (5 nines) cloud infrastructure [with VMware NSX]?”
Before delving into details (aka disclaimer)
This is not an NSX-related blog post. It just happened that the attendee tried to accomplish the Mission Impossible with NSX. He could have chosen Juniper Contrail or Nuage VSP or anything else while facing the same pointless task.
I’ve encountered two compute infrastructure products that were probably close to what people call carrier-grade in my days – IBM mainframes and Tandem minicomputers. Both were incredibly complex and expensive, and ran short user-written transactions on top of fully redundant software and hardware infrastructure.
It’s impossible to reproduce the same feat in an Infrastructure-as-a-Service cloud environment because the workload isn’t composed of short ACID transactions but of servers of unknown quality. You might be able to build a cloud infrastructure with 5-nine reliability, but it would be a totally wasted effort if the workload running on top of it crashes (or is brought down for patching). See also High Availability Fallacies for more details.
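A quick back-of-the-envelope sketch shows why the effort is wasted. It assumes the infrastructure and the workload fail independently, so the availability of the whole stack is the product of the layers. The 99.9% workload figure is a made-up illustration, not a measurement.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def serial_availability(*layers: float) -> float:
    """Availability of a stack in which every layer must be up.

    Assumes the layers fail independently of each other.
    """
    result = 1.0
    for availability in layers:
        result *= availability
    return result


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per (non-leap) year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


infrastructure = 0.99999  # the five-nines target from the question
workload = 0.999          # hypothetical server that crashes or gets patched

combined = serial_availability(infrastructure, workload)

print(f"infrastructure alone: {downtime_minutes_per_year(infrastructure):7.1f} min/year")
print(f"workload alone:       {downtime_minutes_per_year(workload):7.1f} min/year")
print(f"combined stack:       {downtime_minutes_per_year(combined):7.1f} min/year")
```

Under these assumptions the combined figure is always slightly worse than the weakest layer, so the minutes saved by a five-nines infrastructure disappear in the hours lost to the workload.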
The only way to build a solution with more than 99.9% availability is (according to James Hamilton) to build an application-layer solution running in multiple availability zones, and once you do that, you don’t care that much about the availability of individual zones as long as it’s reasonably high.
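The arithmetic behind the multi-zone argument can be sketched in a few lines. This is an idealized model, not a design tool: it assumes zones fail independently and that the application fails over instantly and flawlessly, neither of which is guaranteed in real life. The per-zone availability values are hypothetical.

```python
def parallel_availability(zone_availability: float, zones: int) -> float:
    """Availability of an application that is up while at least one zone is up.

    Assumes independent zone failures and perfect application-level failover.
    """
    if zones < 1:
        raise ValueError("need at least one zone")
    return 1.0 - (1.0 - zone_availability) ** zones


def zones_needed(zone_availability: float, target: float) -> int:
    """Smallest number of zones that reaches the target under the same model."""
    if not 0.0 < zone_availability < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("availabilities must be strictly between 0 and 1")
    zones = 1
    while parallel_availability(zone_availability, zones) < target:
        zones += 1
    return zones


# Hypothetical zones that are merely "reasonably" available.
for per_zone in (0.99, 0.999):
    count = zones_needed(per_zone, target=0.99999)
    achieved = parallel_availability(per_zone, count)
    print(f"zones at {per_zone}: {count} zones give {achieved:.7f}")
```

In this model, a handful of modestly available zones reaches five nines, which is why the availability of any single zone matters far less than the application's ability to survive losing one.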
Building Carrier-Grade Infrastructure
Twenty-five years ago we had simple routers and switches, and we knew how to build resilient networks with redundant boxes and routing protocols. Then the traditional service providers learned how to spell IP and wanted to implement their existing operational practices in this brave new world… prompting the networking vendors to build increasingly complex infrastructure products like redundant supervisors, non-stop forwarding, and in-service software upgrade.
Guess what – complex products tend to be expensive to build and operate. The carriers complaining about the high cost of networking gear and lustfully looking at what Google, Facebook, Amazon and Azure are doing should stop yammering and admit that they got what they asked for.
Obviously some people never learn, and now that the carriers turn their attention toward the new fad – Network Function Virtualization – they want to repeat the same mistake, and want cloud architects to build carrier-grade infrastructure on which they’ll run unreliable workloads.
The Way Forward
Insanity: doing the same thing over and over again and expecting different results.
Definitely not Einstein
The more I look at what various organizations are doing (and succeeding or failing along the way), the more I’m convinced that there’s only one way to reduce the overall costs of running your IT infrastructure:
- Set realistic goals based on actual business needs;
- Build good enough infrastructure that is easy to operate at reasonable costs;
- Build the few applications that actually need very high availability (not everything needs five nines) using modern design-for-failure architectural principles. See also Cloud Native Applications for Dummies.
Numerous large-scale companies have proven that this approach works, but of course it requires a major change in the way your company develops and deploys applications.
You could also decide to ignore this trend and continue building ever more complex infrastructure, and get the results you deserve.