The Cloud has Crashed but ...

At some point cloud services will be as reliable as the electricity in our homes and offices. Now, I am not an old man by any stretch, but I remember as a child in the UK, expecting to have power cuts in the winter. We took candles and matches or flashlights to bed. Wood was stockpiled to burn to heat the house. Now, that is unheard of.

Until that day arrives for cloud services, it is interesting to reflect on the response to Amazon’s EC2 cloud server crash in April. Just when cloud computing was being touted as the only way that anyone would ever access data and applications, the outage seems to have knocked confidence. Or, more correctly, it has helped people to look beyond the hype of the cloud vendors and bloggers.

As a Technorati blogger said, “This latest hiccup further highlights the complexities and potential pitfalls of the rush to cloud computing … this kind of outage may give many would-be cloud adopters a reason to think twice before putting all of their eggs in one cloudy basket.”

But what does the outage mean to the CIO devising long-term strategy, to the business manager and to the leadership team?

First, some background on how the cloud has changed the world of the independent software vendor (ISV). The cloud makes it easier to build and provide a global service. Providers like Amazon, Microsoft Azure, Rackspace and others mean that expensive servers do not need to be bought, configured and maintained. The compute power can be purchased on demand for both development and production. This is brilliant for the cash-strapped start-up. And if you pay more, you get more resilience.

There is often more than one level of service that can be bought. Naturally, no vendor buys the level of service that they really should, because they would rather put funds into sales and marketing than into resilience, which is in effect insurance. And insurance is all about probability and risk.

What does this mean in terms of the Amazon cloud failure?

The outage didn’t hurt everyone. It was restricted to the east coast of the U.S. and about 250 companies. Nevertheless, some non-strategic but high-profile apps went down, including FourSquare, Reddit and Hootsuite. If it had been a retailer’s point of sale systems, a credit card fraud detection system, a help desk case management system or a project team collaboration site, then it would have been more critical.

Strategy considerations

So one outage shouldn’t have CIOs tearing up their cloud-based IT strategies. In many cases, a mature cloud vendor’s infrastructure is more resilient than many companies’ own data centers. However, there are many more “components” between the end user and the cloud app than there are between the end user and a corporate data center hard-wired to the company LAN. These “components” are probably all provided by different organizations, many of which are never visible because they are hidden behind the agreement with the cloud application provider.
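
To see why the number of “components” matters, here is a rough sketch in Python. It assumes that every component in the chain must be up for the user to reach the app, and that components fail independently of each other; under those assumptions the end-to-end availability is simply the product of the individual availabilities. The component names and percentages are made up for illustration and are not taken from any vendor’s figures.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def chain_availability(availabilities):
    """End-to-end availability when every component must be up.

    Assumes independent failures, which is a simplification.
    """
    return prod(availabilities)


def expected_downtime_hours(availability):
    """Expected hours of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical figures, for illustration only.
components = {
    "office network": 0.999,
    "internet service provider": 0.995,
    "cloud application vendor": 0.999,
    "underlying infrastructure provider": 0.9995,
}

end_to_end = chain_availability(components.values())
print(f"End-to-end availability: {end_to_end:.4%}")
print(f"Expected downtime: {expected_downtime_hours(end_to_end):.1f} hours per year")
```

With these invented numbers, the chain as a whole is less available than its weakest single link, which is the point: each extra organization between the user and the app takes a little more off the total.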

This means that when CIOs plan for the cloud, they need to evaluate the risks of implementing a cloud service, and the cost of mitigating those risks, and balance them against the opportunities the cloud offers. A knee-jerk reaction against the cloud would be as bad as ignoring the risks a business takes on when it depends on cloud-based applications and data.
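
One simple way to frame that balance is as an expected-loss calculation, sketched below in Python. It is a back-of-the-envelope model, not a full risk assessment: it assumes you can estimate how often outages happen, how long they last and what an hour of downtime costs the business. The function names and all the figures are hypothetical.

```python
def expected_annual_loss(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly cost of outages: frequency x duration x hourly cost."""
    return outages_per_year * hours_per_outage * cost_per_hour


def mitigation_is_worthwhile(loss_without, loss_with, mitigation_cost):
    """True if the reduction in expected loss exceeds the cost of mitigating."""
    return (loss_without - loss_with) > mitigation_cost


# Hypothetical figures, for illustration only.
loss_basic = expected_annual_loss(outages_per_year=2, hours_per_outage=8, cost_per_hour=5000)
loss_resilient = expected_annual_loss(outages_per_year=2, hours_per_outage=1, cost_per_hour=5000)
extra_cost_of_resilient_tier = 30000

print(f"Expected loss, basic tier: {loss_basic}")
print(f"Expected loss, resilient tier: {loss_resilient}")
print("Pay for resilience?",
      mitigation_is_worthwhile(loss_basic, loss_resilient, extra_cost_of_resilient_tier))
```

The same sum applied to a non-strategic app, where an hour of downtime costs very little, will often say the cheaper tier is fine. That is the insurance decision in miniature.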

Granted, it is far more difficult to evaluate the risks of a cloud vendor. At a minimum, there are more questions to ask, and the answers need to be closely examined. Which is why I co-authored a book, Thinking of… Buying a Cloud Solution? Ask the Smart Questions, which contains over 90 pages of questions that need to be answered. It is not a fun read but it might save your career.