Can a company truly offer 100% network uptime?
The short answer is no. You’re probably thinking, “Why not? Some companies guarantee 100% network uptime.” Before answering that question, let’s take a look at a few cloud outages that made headlines in the past year.
“Our best estimate is that it affected roughly 5,000 projects, 5,000 comments, and 700 new user accounts. Code repositories or wikis hosted on GitLab.com were unavailable during the outage, but were not affected by the data loss.”
“Customers of IBM’s global cloud infrastructure services could not access the user interface portal [due to] emergency maintenance.”
“Fortunately for users, the outage didn’t result in any data loss–just a delay in being able to view and read the content in their archives.”
“The issue appeared to affect users all over the globe, with thousands of users reporting problems according to the Down Detector website, which logs reports of online service failures.”
What does this mean?
Most people consider Microsoft and Google to be among the big players in the cloud industry, and they compete fiercely on service levels to make sure their uptime is the best. Other businesses are also working to reduce their downtime.
But what do these recent outages tell us? No matter how much planning there is, or how many capable minds are working on building these systems, outages still happen. For regular consumers who use Gmail, Hotmail, Yahoo Mail, or other free cloud-based services, it is worth remembering that the service is free, and you get what you pay for. Even so, these services are integral to these companies’ brands and represent products that maintain a high stickiness factor with end users.
For example, how much do you rely on your email every day? If it went down, chances are it would be an inconvenience, but one you could work around for a couple of days.
But what happens when you are a business that relies on paid products that utilize the cloud? Even further, what happens when you are a billion-dollar business running on the cloud?
What are companies doing?
Companies continue to show that cloud services routinely offer availability above 99%, which is higher than that usually provided by in-house IT departments. For companies where a percentage point can make a difference of a million dollars, this is significant. Cloud providers today are continually working to eliminate all single points of failure and prevent outages, but there still needs to be a backup plan.
As a web hosting company, we are continually building redundancy across our entire service offering: redundant switches, redundant power, redundant bandwidth, redundant network cards, and redundant servers. You get the idea. We are continually working to minimize single points of failure, particularly for our server and SaaS customers.
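To illustrate why redundancy helps, here is a minimal sketch of the standard availability formula for redundant components. It assumes that failures are independent, which is optimistic because real outages are often correlated, and the 99% figure for a single component is a hypothetical number chosen for illustration, not a measurement of any real hardware.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent: the service is down only
    when every copy is down at the same time.
    """
    return 1 - (1 - single) ** copies


# Hypothetical component that is available 99% of the time.
for copies in (1, 2, 3):
    availability = combined_availability(0.99, copies)
    print(f"{copies} component(s): {availability:.4%} available")
```

Under these assumptions, each extra copy multiplies the chance of a total failure by the failure rate of a single component, which is why removing single points of failure has such a large effect.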
So, what’s a good SLA (Service Level Agreement)? The obvious answer is one that matches your business requirements and does not negatively impact your business. There is a reason we offer our web hosting customers different service levels. For some organizations, 99.9% is more than satisfactory. Other clients with higher requirements have asked for 99.999% or better, because they know that any outage will have negative consequences that surpass the cost of hosting in that type of environment.
It is important to understand that with each nine you add to your service level, you significantly increase the cost of minimizing outages for your customers. Most companies have found that five nines is the tipping point for increased costs. At that point, you are adding new layers of redundancy to your service offering.
As an example, in building our VMware cloud hosting service, we made the business decision to target five nines as the desired service level for our customers and to build our solution with enterprise-grade hardware from Dell and the leading virtualization platform from VMware.
We built redundant fault-tolerant server clusters that give our customers better availability and failover in the event of an outage, as well as decreased downtime and a shorter time to recovery. However, in doing so we had to invest more than three times as much money into the service offering to guarantee that amount of uptime. Consider this: 99.999% uptime is equivalent to about five minutes of unscheduled downtime a year.
To give you an idea, there are 525,600 minutes in a year, and five minutes a year represents roughly 0.00001 (0.001%) of that total.
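The arithmetic behind each service level can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and it uses a 365-day year to match the 525,600 minutes mentioned above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime allowed by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Compare the service levels discussed in this post.
for level in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(level)
    print(f"{level}% uptime allows about {minutes:,.1f} minutes of downtime a year")
```

Each additional nine cuts the allowed downtime by a factor of ten, from roughly 5,256 minutes a year at 99% to a little over five minutes a year at 99.999%.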
So, is the cloud reliable and safe? Yes. Is it more reliable than classic hosting? In a lot of cases, yes. Is 100% network uptime truly possible? Not really. As we always tell our customers, you need to be prepared. Having a plan B always makes sense, because no matter how much preparation goes in, you can never cover all of the contingencies.