Understanding the Implications of Amazon’s EC2 Crash

Editor’s Note: Datapipe is a global provider of mission-critical IT solutions. The company’s offerings include Datapipe Managed Cloud for Amazon Web Services, a service that helps clients manage their AWS environments.

The effects of the recent outages at Amazon Web Services (AWS) have reverberated through the cloud community and the IT community at large. For those who experienced a major disruption, there is much soul searching going on.

For those who have been slow to adopt cloud technology, it was a perfect “I told you so!” moment. The reality, however, is that as with any IT strategy, being prepared is usually the best course of action and can help you avoid headaches and sleepless nights.

You did sign a contract …

Well, at least you digitally signed, or agreed to, one. That contract typically includes a service-level agreement, or SLA. People tend not to read the fine print, but if you suffered a major disruption because of Amazon’s outage two weeks ago and you are looking for someone to blame, hold up that SLA and look in the mirror.

Many cloud services’ SLAs cover only a portion of the services you consume and are written in the provider’s favor. That leaves you with few remedies: service credits instead of refunds, and potentially long outage windows that generate no credits at all.

What to do?

Read that SLA carefully and design your infrastructure to it. In the case of AWS, there are multiple availability zones and regions to build redundancy across, all under a general availability SLA of 99.95 percent. If you need 100 percent effective uptime, you should ensure resources are provisioned in multiple zones and regions, and perhaps employ off-cloud backups to safeguard mission-critical data as well.
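For illustration only, here is a rough sketch of what spreading instances across zones might look like using the boto3 Python SDK (my choice of tooling, not something prescribed above); the AMI ID, instance type, and zone names are placeholders:

```python
# Sketch: provision one instance per availability zone so that a
# single-zone failure does not take the whole application down.
# All identifiers below are placeholders, not real values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

for zone in ["us-east-1a", "us-east-1b"]:
    ec2.run_instances(
        ImageId="ami-12345678",      # placeholder AMI
        InstanceType="m1.small",     # placeholder instance type
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
```

The same idea extends to multiple regions, and to the off-cloud backups mentioned above.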

We’ve all heard the stories of companies running their lifeblood, whether it be a killer Web app, a patient information system, or simply the brand website, on cloud infrastructure with no regard for the SLA or sound architectural concepts.

After you’ve fully understood the SLA, and the variety of service offerings available to you, you can develop a sound deployment strategy that will minimize the impact of almost any outage scenario.

Test early, and often

If you were setting up a brand new datacenter with racks of servers and the latest networking, power, and cooling gear, would you test it out? Would you turn off that switch, pull that plug, or raise the alarm to see what the reaction of your assets and staff was, with the goal of 100% service availability?

Of course you would.

The buzzword nature of “cloud” and the mixture of different service classes (infrastructure, platform, software, private, public, hybrid, etc.) has confused the IT buying community into thinking that “cloud = easy.” Running a multi-million dollar startup or an enterprise software network is not the same as cutting and pasting junior’s face into the family photo album, folks. If you are using infrastructure as a service (IaaS), treat it that way. Terminate instances, simulate outages, destroy volumes, and kill security groups. Is your application still available? Win. Did it fail unexpectedly or in a way you can’t recover from? Start over. Even then, you still aren’t safe from Murphy’s Law.
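As one example of what such a drill can look like, here is a sketch using the boto3 Python SDK; the tag used to pick the sacrificial instances is a hypothetical convention of my own, not an AWS feature:

```python
# Sketch of a failure drill: terminate a set of instances on purpose,
# then check whether the application stays available.
# The "chaos-drill" tag is a hypothetical naming convention.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

reservations = ec2.describe_instances(
    Filters=[{"Name": "tag:chaos-drill", "Values": ["true"]}]
)["Reservations"]

victims = [
    instance["InstanceId"]
    for reservation in reservations
    for instance in reservation["Instances"]
]

if victims:
    ec2.terminate_instances(InstanceIds=victims)
    # Now hit your health checks and user-facing endpoints: if the
    # application is still serving traffic, you pass; if not, start over.
```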

An easy test to run on AWS is to examine how your computing load is spread within any given region. If you query from the command line, or load up your AWS dashboard, and see everything sitting in “us-east-1a,” you are prone to failure. So how do you correct it? Look to the guy or gal with the glasses and the pocket protector, just like you used to do.
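If you would rather script that check than eyeball the dashboard, a minimal sketch with the boto3 Python SDK (again, my assumption of tooling) looks like this:

```python
# Sketch: count running instances per availability zone in one region.
# If the counter shows everything in a single zone, you are exposed to
# a single-zone outage.
from collections import Counter
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
zones = Counter()

response = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        zones[instance["Placement"]["AvailabilityZone"]] += 1

print(zones)  # e.g. Counter({'us-east-1a': 12}) means a single point of failure
```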

Cloudpocalypse

I know reading SLAs, designing for failure, and creating test batteries sounds something like, dare I say it, engineering, but one of the many positive things to come out of the recent cloud outages is the realization that sound engineering cannot be replaced by hype. Yes, cloud technology is revolutionary. Yes, you can do much more, with less upfront investment and near-infinite scalability. Do you get all of this for the low, low price of $50 per year or $0.12 per hour? Not likely.

Engineering excellence is required to design sound solutions to real problems, using what is available to produce something of higher value than its constituent parts. In the interest of full disclosure, a lot of what we do at Datapipe is exactly that: we take technology (hardware, software, utility) and design sound solutions for our customers.

You can do this in your own shop as well, if you have the right mix of staff, time, and training. But to move forward with your IT strategy without consideration for engineering is to leave yourself vulnerable to failure. Skipping it may be cheaper initially, but that approach carries its own cost, and as IT departments shrink, I think recent events highlight the need to treat proper engineering as a competitive differentiator rather than a simple exercise in IT cost control.

The approach I describe above can, at the very least, help start the conversation with your customers, business units, staff, and vendors. The AWS outage, like the many outages of the major technologies we interact with every day, reveals the need to consider your own expectations and take ownership of your decisions when embarking on new projects and strategies.

The great news is that this event happened at the right time: with enough time for people who were considering cloud technology to take a more informed approach, but not so early as to stunt the cloud’s growth. AWS responded with a thoughtful, open, and honest post-mortem, in addition to providing more service credits than it was contractually obligated to. I see a great opportunity here for the Amazon platform, and all cloud platforms, to evolve, driven by informed consumers prepared with a clear understanding of the strengths of the cloud.

Ed Laczynski is VP of Cloud Strategy & Architecture at Datapipe and is responsible for driving the company’s cloud computing strategy and the architecture of new cloud platforms and integrations. Prior to Datapipe, Ed was the founder and CTO of LTech, a pioneer in enterprise cloud computing products and services. At LTech, Ed led the business and product development vision, achieving significant sales and customer growth. A frequent speaker and author, Ed is recognized throughout the industry for his thought leadership, vision, and experience helping organizations succeed with cloud computing. Ed has also led innovation in digital media and finance in roles at McCann-Erickson Worldwide and Credit Suisse. Ed is a graduate of New York University.