A recent outage at the hosting company OVH meant that many of their clients were unable to access their servers for more than 12 hours – which is a long time in the fast-paced business world, and almost an eternity for e-commerce companies. With 50,000 websites down due to the failure – caused by a leak in the water cooling system and compounded by a fault in the alerting system – many will be asking if there’s a way to avoid such problems in the future.
The short answer is that there is no way to avoid it entirely – no system is ever completely foolproof. A large and complex cloud environment, such as the one at OVH, can be harder to fix and restart if anything does happen. As OVH found, the restart sequence can be halted by the detection of additional faults, which slows down the process.
Does this mean that bigger isn’t better when it comes to cloud? Not entirely. Certainly, the complexity of a larger provider means it can take longer to restart your system. But the biggest providers, such as Amazon’s AWS or Microsoft, have deep enough pockets to provide a lot of redundancy in their systems, and a very broad geographical distribution. That means if there is a fault, whether a hardware problem, flooding or a power outage, the workload can be redistributed across the rest of the network.
So more redundancy means less downtime and more reliability? Again, not entirely. A recent incident saw many Google Docs users locked out of their work due to what was eventually identified as a bug in the system. In updating the service to protect users from malware and malicious content, Google inadvertently accused some of its users of creating such content and locked them out of their work. For many who had got used to the convenience of such cloud-based tools, it was a reminder that while the cloud giants may provide reliability and easy access, they hold all the cards. This shutdown was accidental, but it demonstrated how users are essentially at the mercy of Google.
So what’s the answer? Too small a provider and there’s not enough redundancy; too large and, as well as the complexity of restarting, you may be left with less control. The best approach, as with so many things in business, is to keep it simple. You can store your own hardware in our data centre, which means it has the benefit of our expert management in a specially designed environment, but you have complete control. You can access your kit whenever you need to, 24/7, and you’re entirely independent. That means if anything does go wrong, your server will reboot itself and be back online in minutes, because it’s not linked to anything else.
What we’re talking about is like the difference between starting a car and launching a rocket into space – the more complex the environment, the longer it will take to restart. Complex environments restart in sequence, which means there could be multiple points of failure that halt the process – and that’s if you can get in to initiate a restart. OVH users found themselves locked out of the control panel because so many users were trying to get in at the same time. In contrast, a local data centre can offer a team on hand to help, and access to get in and fix any issues yourself, in person if you prefer.
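To put rough numbers on the car-versus-rocket comparison, here is a minimal sketch of how the chance of a clean restart falls as more steps have to succeed one after another. The 99% per-step success rate and the 20-step sequence are assumptions chosen purely for illustration, not figures from OVH or any other provider, and the function name is hypothetical. It also assumes steps fail independently of each other, which is a simplification.

```python
import math


def clean_restart_probability(step_success_rates):
    """Chance that every step in a sequential restart succeeds first time.

    Assumes the steps fail independently, which is a simplification.
    """
    return math.prod(step_success_rates)


# Illustrative figures only: not measurements from any real provider.
standalone = clean_restart_probability([0.99])       # one independent server
sequenced = clean_restart_probability([0.99] * 20)   # 20 dependent steps

print(f"standalone server: {standalone:.1%}")
print(f"20-step sequence:  {sequenced:.1%}")
```

Under these assumptions, the single server restarts cleanly 99% of the time, while the 20-step sequence does so only about 82% of the time, even though each individual step is just as reliable.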
Whether it’s disaster recovery provision, redundancy or backup generators, we spend a lot of time trying to prevent issues, but we understand that even with the best possible protection, an outage due to a hardware fault could still happen. In those cases, the important thing is to get back up and running as quickly as possible. Keeping it simple and colocating your servers in a data centre could be the best way to deliver that.