You’re the CEO of HotDotCom.com. It’s three in the morning and your VP of Sales for Asia texts to say that the site is loading noticeably more slowly than a competitor’s site. So you get on the Batphone and call your VP of Ops, explaining that your sales VP is trying to close a deal but the site looks like garbage because it loads so slowly compared to HotterDotCom.com.
So why is the site loading so slowly? Your bleary ops guy putters around in the control panel, does a little bit of analytics and says, “Boss, I have no idea. Everything seems to be running well in our application stack. It must be old hardware at the data centre.” But we’re in the cloud, you say. Hardware doesn’t matter anymore, right? Well, not really.
In fact, hardware matters more than ever. Last week at the Surge Conference, I talked to dozens of VPs of Operations and CIOs of major companies. To a man (and woman), they complained that their hardware was killing them. That is ironic, because most of them had long since outsourced their hardware to cloud providers and didn’t own much hardware at all. But set foot in almost any data center and you’ll be confronted by rack after rack of three-, four-, and even five-year-old servers chugging along in the slow lane.
So why has this become such an issue now? Let’s count the ways. First, many consumers of cloud computing have made the leap from using the cloud as a test environment, or as a place to run high-latency (read: low-priority) production tasks, to using the cloud as their soup-to-nuts production environment. This has exposed them more acutely to the vagaries and vulnerabilities of the cloud (which is why cloud outages, which have always been relatively common, now assume crisis proportions within minutes).
Second, the cloud industry is relatively new. So new that I would wager many cloud service providers are still feeling out what an acceptable hardware refresh cycle looks like, one that balances their need to keep capital expenditures down against the need to maintain sufficient performance for customers.
This is a steep learning curve because the switch from test to production use of the cloud exposes end users of cloud apps to the far more volatile environment that is inherent in wireless data networks. In the world of wireless, page load times experienced by end users can vary widely, and it’s not uncommon for those load times to exceed 10 seconds – a point at which roughly 50% of visitors will abandon a page.
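If you want to put a number on that for your own site, here is a minimal Python sketch. It assumes you are happy to time only the download of the HTML document, which understates what a real visitor on a wireless network experiences, since it ignores images, scripts, and rendering. The names `time_fetch` and `share_over_threshold` are made up for illustration, and the 10-second cutoff is simply the figure quoted above.

```python
import time
import urllib.request

# Abandonment threshold in seconds, taken from the figure quoted in the article.
ABANDON_THRESHOLD_S = 10.0


def time_fetch(url, timeout=30.0):
    """Return the seconds taken to download the document at `url`.

    This covers only the HTML fetch, not a full browser page load.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def share_over_threshold(samples, threshold=ABANDON_THRESHOLD_S):
    """Return the fraction of load-time samples that exceed `threshold`."""
    if not samples:
        return 0.0
    slow = sum(1 for seconds in samples if seconds > threshold)
    return slow / len(samples)
```

Run `time_fetch` a few dozen times against your own site and a competitor’s, from the region your sales VP is sitting in, and compare the two results of `share_over_threshold`. A single measurement tells you very little, because load times vary so widely.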
At the very core of this conflict over hardware between users and cloud providers is simple economics. Cloud providers maximise profits by minimising hardware costs (i.e., the cost of a bit). Customers maximise profits through minimising their bit/second service delivery cost. Yes, it’s more complicated, but that’s the core economics. So there will always be tension between what customers want (the latest screaming Intel server chipset) and what cloud providers want (very low capex).
For cloud customers, however, it now behooves you to ask really hard questions about the hardware that underpins your cloud environment. Ideally, push hard to get the specific model numbers and chipsets of the servers that you will be running on. Because cloud providers move your data around, this isn’t always 100% possible (since you are no longer tethered to a specific physical server).
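While you wait for your provider to answer, you can also ask the machine itself. The Python sketch below assumes a Linux guest that exposes `/proc/cpuinfo` with `model name` lines, as x86 Linux systems typically do; the function names are hypothetical, and a hypervisor may mask or generalise the CPU model it reports, so treat the output as a hint rather than proof.

```python
import platform
from pathlib import Path


def parse_cpu_models(cpuinfo_text):
    """Return the distinct 'model name' values in /proc/cpuinfo-style text."""
    models = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            model = value.strip()
            if model and model not in models:
                models.append(model)
    return models


def visible_cpu_models(cpuinfo_path="/proc/cpuinfo"):
    """Return the CPU models this instance reports, best effort.

    Assumes a Linux guest; falls back to the platform module elsewhere.
    """
    path = Path(cpuinfo_path)
    if path.is_file():
        models = parse_cpu_models(path.read_text())
        if models:
            return models
    fallback = platform.processor() or platform.machine()
    return [fallback] if fallback else []


if __name__ == "__main__":
    for model in visible_cpu_models():
        print(model)
```

Look up the reported model to get its launch year and you have a rough idea of the vintage you are running on. Run it again after a restart or a migration and you may get a different answer, which is the point: it only describes the host you happen to be on at that moment.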
But getting a general inventory of what vintage servers are in the racks at a data center is a pretty good idea. Then, make sure to ask about refresh cycles for the servers. If you really want to know, go to the data center and see for yourself. At the end of the day, whether the server running your cloud is one or three years old may be the difference between a fast-loading and a slow-loading page or cloud app – and the difference between a customer win and a Batphone call in the middle of the night asking why, why, why.