“Fiduciary responsibility”. “Due diligence”. Such are the key watchwords for businesses everywhere. Indeed, these simple phrases are the core concepts underlying regulations that cover stockholder rights, contracts for merger and acquisition, and many other areas of business conduct.
With the ever-increasing reliance of businesses upon their IT systems and electronically stored business data comes an equivalent increase in management’s duty to ensure due diligence and fiduciary responsibility with respect to protecting them against all causes of loss or damage.
The potential costs of failing to do so can be enormous. Organisations of all sizes need to address the assessment of threats to their IT operations, inclusive of systems, applications, and data, and develop solid numbers around the potential costs they represent.
Hidden Financial Risk
According to Dun & Bradstreet, 59% of Fortune 500 companies experience a minimum of 1.6 hours of downtime per week. To put this in perspective, assume that an average Fortune 500 company has 10,000 employees who are paid an average of $56 per hour, including benefits ($40 per hour salary + $16 per hour in benefits). The labour component of downtime costs alone for such a company would be $896,000 weekly, which translates into more than $46 million per year.
Of course, this assumes that everyone in the company would be forced to stop all work in a downtime scenario, and that may not be so. But, since the operations of many companies are increasingly knit together by their information technology, system downtime now hampers the productivity of almost everyone in the organisation, and completely sidelines a significant and growing percentage of them.
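The labour arithmetic above can be sketched in a few lines of Python. This is only a rough illustration, not a costing model: the function name and the `fraction_idled` parameter are hypothetical, the latter added so that the "everyone stops work" assumption can be relaxed.

```python
WEEKS_PER_YEAR = 52

def weekly_labour_cost(employees, hourly_cost, downtime_hours, fraction_idled=1.0):
    """Labour cost of one week's downtime.

    fraction_idled is a hypothetical knob (not from the article's figures):
    the share of staff who cannot work while systems are down.
    """
    return employees * hourly_cost * downtime_hours * fraction_idled

# Figures from the example: 10,000 staff, $40 salary + $16 benefits, 1.6 hours per week
weekly = weekly_labour_cost(10_000, 40 + 16, 1.6)
annual = weekly * WEEKS_PER_YEAR

print(f"Weekly labour cost of downtime: ${weekly:,.0f}")
print(f"Annual labour cost of downtime: ${annual:,.0f}")

# Same company, assuming (purely for illustration) only half the staff are idled
print(f"Weekly cost at 50% idled: ${weekly_labour_cost(10_000, 56, 1.6, 0.5):,.0f}")
```

Lowering `fraction_idled` scales the estimate linearly, which makes it easy to test how sensitive the result is to that one assumption.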
While some insurance providers offer coverage to reimburse companies for sales revenue lost during unplanned server outages, typically, these policies do not cover any other expense besides lost sales. Facilities Management operations often hold these types of policies to lessen their exposure.
Insurers train underwriters to be experts at assessing risk and extrapolating potential losses. IT managers, on the other hand, typically do not have the tools and experience needed to assess the real risks involved. If your organisation stands to lose money and goodwill if a core component of its information management system fails, or if it is difficult to find a window of time to bring the system down for upgrades or modifications, then the right software tools will help you understand and even quantify your costs attributable to the time that a critical IT system is offline.
With users of these systems having more tools and information available to them, overall demand for processing capacity is increasing. Without time to spare, users tend to take uninterrupted access to these systems for granted. But periodic interruptions in availability are a fact of life, and therefore, the need to have systems continuously available is colliding with other business objectives that call for the conservation of assets.
To be clear, the great majority of system and data unavailability is the result of planned downtime that occurs due to required maintenance. But although unplanned downtime accounts for only about 10% of all downtime, its unexpected nature means that any single downtime incident may be more damaging to the enterprise, physically and financially, than many occurrences of planned downtime. Understanding your cost of downtime is therefore critical in either case.
Downtime Threat Analysis
Before you can calculate downtime costs, you need to know the sources of downtime, and not all of them are strictly IT issues. To begin with, it is important that you identify and understand both your internal and external downtime threats. What has the potential to take your business down? The threats to your business could include natural events as well as man-made events, “weather and wires.”
Spend time thinking about what could actually happen and plan accordingly. There could be accidental as well as planned events that could cause or contribute to systems and business downtime. Some events may be within your control while others are not. Some events, like hurricanes, will give you ample warning; some events, like a server power supply burnout or RAID controller crash, may happen quickly and give you very little time to react.
Sadly, you’ll also need to consider extreme external events, including terrorism, or regional disasters, such as widespread power failures or the collapse of a key bridge in a metro area. Such events can affect employee availability and safety, as well as the availability of power and data lines.
Once you catalogue events and conditions that could affect you, be sure to set up processes for real-time monitoring and information gathering for external threats. This can be as simple as signing up for e-mails or alerts from local weather stations so that you are made aware of impending weather events. With some types of events, you will need to estimate the likelihood of the event as reliably as you can, and to consider its potential severity, in order to plan your response properly.
Also, be sure to consider and plan for “what happens next,” in the days and weeks following an event. For example, if you must move locations when a disaster occurs, be sure to plan for how to establish and maintain proper security for users or for devices attaching to a new server while in a temporary environment.
IT outages, planned or unplanned, can unleash a procession of costs and consequences that are direct and indirect, tangible and intangible, short term and long term, immediate and far reaching. The tangible, direct costs include lost transaction revenue, lost wages, lost inventory, remedial labour costs, marketing costs, bank fees and legal penalties from not delivering on service level agreements. The intangible, indirect costs include lost business opportunities, loss of employees and/or employee morale, a decrease in stock value, loss of customer and partner goodwill, brand damage, business driven to competitors and even bad publicity.
The cost that may be assignable to each hour of downtime varies widely depending upon the nature of your business, the size of your company, and the criticality of your IT systems to primary revenue generating processes. For instance, a global financial services firm may lose millions of dollars for every hour of downtime, whereas a small manufacturer that uses IT as an administrative tool would lose only a margin of productivity.
The Opportunity Cost of Downtime
On average, businesses lose between $84,000 and $108,000 (US) for every hour of IT system downtime, according to estimates from studies and surveys performed by IT industry analyst firms. In addition, financial services, telecommunications, manufacturing and energy lead the list of industries with a high rate of revenue loss during IT downtime.
Downtime costs vary not only by industry, but by the scale of business operations. For a medium-sized business, the exact hourly cost may be lower, but the impact on the business may be proportionally much larger. While idled labour and lower productivity costs may seem to be the most substantial cost of downtime, any true cost of downtime estimate should include the value of the opportunities that were lost when the applications were not available.
For example, consider a company that averages a gross profit of $100,000 per hour from Web and telemarketing sales. If its order-processing systems crash for an hour, making it impossible to take orders, what is the cost of the outage? The easy, but erroneous, answer would be $100,000. Some customers will be persistent and call or click back at another time. Sales to them are not lost; cash flow is simply delayed. However, some prospects and customers will give up and go to a competitor.
Still, the value of the purchases that these customers would have made during the outage likely underestimates the loss to the company because a satisfied customer can become a loyal customer. Dissatisfied customers, or prospects that never become customers, do not. Consider a prospect who would have made an immediate $100 purchase and then repeated that purchase once a year. Using a discount rate of 15 percent, the present value of those purchases over a 20-year period (20 purchases in all, the first made immediately) is $719.82. In this example, the company’s loss is more than seven times the value of the first lost sale. (A lower discount rate produces an even higher value.)
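The $719.82 figure can be reproduced with a short discounted cash-flow calculation. The sketch below assumes 20 purchases of $100, the first made immediately and the rest at one-year intervals; the function name is illustrative only.

```python
def present_value(payment, rate, purchases):
    """Present value of equal yearly purchases, the first made today (t = 0)."""
    return sum(payment / (1 + rate) ** year for year in range(purchases))

# $100 now, then $100 once a year: 20 purchases in total, discounted at 15 percent
lifetime_value = present_value(100, 0.15, 20)
print(f"Present value at 15%: ${lifetime_value:,.2f}")

# A lower discount rate (10 percent, chosen only for comparison) gives a higher value
print(f"Present value at 10%: ${present_value(100, 0.10, 20):,.2f}")
```

Dividing the result by the $100 first sale gives the "more than seven times" multiple cited above.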
Downtime and Business Impact Analysis
A business impact analysis is a good framework within which to understand and calculate downtime costs. The central task is to identify your critical business functions, based upon data or application integrity and the sensitivity of each to downtime. You will want to determine the maximum outage time that each specific critical business function can sustain before the business is impacted. Considering the impacts of both long and short-term outages will help you determine what the recovery objective should be for each business function.
Once you have determined where your downtime vulnerabilities are, you will be better able to identify the costs associated with that downtime as well as its overall impact on the business. With that knowledge in hand, you will be better able to define the ROI of various solutions or tactics needed to reduce the costs incurred during business function outages or, preferably, to avoid them altogether.
Determining the Cost of Downtime
One way to project the number of hours that a system may be down unexpectedly each year is to estimate the system’s reliability. This does not equate to the reliability numbers provided by hardware vendors because a system depends on a combination of hardware, software and networking components.
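One simple way to see why vendor figures overstate whole-system reliability is to multiply component availabilities together. The sketch below rests on assumptions that are not from the article: the components are treated as a chain in which any single failure takes the system down, failures are independent, and the three availability figures are made-up placeholders.

```python
import math

HOURS_PER_YEAR = 24 * 365

def system_availability(component_availabilities):
    """Availability of a chain of components that must all be up (assumes independence)."""
    return math.prod(component_availabilities)

def expected_unplanned_hours(component_availabilities):
    """Projected hours per year that the system as a whole is unexpectedly down."""
    return (1 - system_availability(component_availabilities)) * HOURS_PER_YEAR

# Placeholder figures for hardware, software and network; not measured values
components = [0.999, 0.995, 0.998]
combined = system_availability(components)
hours_down = expected_unplanned_hours(components)

print(f"Combined availability: {combined:.4%}")
print(f"Projected unplanned downtime: {hours_down:.1f} hours per year")
```

Under these assumptions the combined availability is always lower than that of the weakest component, which is why the system-level number, not the hardware vendor's, belongs in the cost estimate.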
While unplanned downtime may be significant, often more than 90 percent of downtime is planned, due to system backups, maintenance, upgrades and similar activities. Estimates of yearly planned downtime are usually more accurate than estimates of the unplanned variety, as maintenance activities typically either follow rigid schedules or recur at frequencies that are reasonably predictable on an annual basis.
The first step in deriving an estimate of planned downtime is to perform a rigorous audit of all normal maintenance activities, such as database backups and reorganisations. For each such activity, multiply the historical average downtime per occurrence, adjusted for any growth trends, by the number of times the activity is performed per year. The timing of other planned activities, such as hardware and software updates, is less consistent, but historical averages provide a sufficient guide as to frequency and duration of the required downtime. These averages can be adjusted to incorporate any knowledge of upcoming upgrade requirements.
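The audit described above reduces to a sum over activities: average downtime per occurrence, adjusted for growth, multiplied by yearly frequency. A minimal sketch follows; the class, its field names and the sample activities are all hypothetical and would be replaced by your own audit data.

```python
from dataclasses import dataclass

@dataclass
class MaintenanceActivity:
    name: str
    avg_hours_per_occurrence: float  # historical average downtime per occurrence
    occurrences_per_year: int
    growth_factor: float = 1.0       # e.g. 1.1 if durations are trending up by 10%

    def annual_hours(self):
        return self.avg_hours_per_occurrence * self.growth_factor * self.occurrences_per_year

def annual_planned_downtime(activities):
    """Total planned downtime per year, in hours, across all audited activities."""
    return sum(activity.annual_hours() for activity in activities)

# Illustrative entries only; not figures from the article
audit = [
    MaintenanceActivity("Weekly database backup", 0.5, 52, growth_factor=1.1),
    MaintenanceActivity("Quarterly database reorganisation", 4.0, 4),
]
planned_hours = annual_planned_downtime(audit)

for activity in audit:
    print(f"{activity.name}: {activity.annual_hours():.1f} hours per year")
print(f"Total planned downtime: {planned_hours:.1f} hours per year")
```

Less regular work, such as hardware and software updates, can be added as further entries using historical averages for frequency and duration.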
While it is impossible to predict the precise loss from an outage, it is important to derive reasonable estimates. Only then is it possible to evaluate the economically appropriate level of investment in data recovery or information availability software solutions. Losses in the areas of labour, revenue and service all contribute to the total cost of downtime.
A good starting point for evaluating these factors is to collect statistics on both the duration and associated costs of past downtime as recorded by the accounting department. These include all of the tangible and intangible factors outlined at the beginning of this section and more.
Damaged Reputation and Loyalty
The sales-per-hour number does not include the value of customer loyalty. To more accurately assess total lost sales, the impact percentage must be increased to reflect the lifetime value of customers who permanently defect to a competitor. If a large percentage of customers typically become very loyal after a satisfactory buying experience, the impact factor may significantly exceed 100 percent, possibly by a high multiple. Since determining lifetime value requires a long history of data and assumes, often inaccurately, that the future will reflect the past, an educated guess must suffice.
Establishing the true impact of downtime requires going beyond the IT team and into every operational area of the business. The people “on the ground” are experts in the business pain that results in their area of responsibility if systems are unavailable. Consider your audience. Every manager, VP and CEO has a boss to answer to. But ultimately, it is your customers who are demanding guaranteed availability. Keeping your end-customer’s viewpoint in mind will help you frame the case for high availability (HA) in your business.