Complementary infrastructure must match hardware and software investments in order to meet high availability goals.
Information technology (IT) clients expect availability of “five nines,” or 99.999%. Unfortunately, the substantial investment that a business makes to achieve five-nines in its computer hardware and software platforms is unlikely to be sufficient unless matched with a complementary site infrastructure that can support these availability goals. The overall site tier rating is dependent upon all aspects of the site infrastructure, and will be the lowest of the individual subsystem ratings covering such aspects as power, cooling, and distribution.
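The "lowest subsystem rating wins" rule can be illustrated with a short Python sketch. The subsystem names and tier ratings below are hypothetical and chosen only to show the calculation; they do not describe any real site.

```python
# Hypothetical subsystem tier ratings (1 = Tier I ... 4 = Tier IV).
# Names and values are illustrative assumptions, not survey data.
subsystem_tiers = {
    "power": 4,
    "cooling": 3,
    "distribution": 2,
}


def overall_site_tier(ratings: dict[str, int]) -> int:
    """Return the site tier: the lowest rating among all subsystems."""
    if not ratings:
        raise ValueError("at least one subsystem rating is required")
    return min(ratings.values())


def limiting_subsystems(ratings: dict[str, int]) -> list[str]:
    """Return the subsystems that hold the site at its overall tier."""
    site_tier = overall_site_tier(ratings)
    return sorted(name for name, tier in ratings.items() if tier == site_tier)


print(overall_site_tier(subsystem_tiers))    # 2
print(limiting_subsystems(subsystem_tiers))  # ['distribution']
```

In this made-up case, a Tier IV power system does not lift the site above Tier II, because distribution is the weakest subsystem.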
It is important to be aware that sustainability (how the site is operated once constructed) also plays a significant role in the site availability actually achieved. All too often, people wrongly assume that installing an uninterruptible power supply (UPS) is the end of their problems. However, if the overall design, installation, and ongoing service support are handled badly, it could be just the beginning. For example, it is vital to keep the system's mean time to repair (MTTR) to a minimum if the highest overall availability is to be achieved. Nowhere is this more important than in the design of data centers.
Each industry has a unique uptime need driving the site infrastructure tier level requirement (see sidebar). After careful alignment of IT availability objectives with site infrastructure performance expectations, an informed company may select a site infrastructure based on any of the tier classifications. Data center owners have the responsibility to determine what tier of functionality is appropriate or required for their sites. As such, it is a business decision to determine the tier classification necessary to support site availability objectives. Part of this decision is to balance the IT operational practices with the facility practices that support the IT world. Once selected, however, the desired tier should be uniformly implemented.
Benchmark tier standards
The Uptime Institute (TUI; www.uptimeinstitute.org) has, for more than 10 years, sponsored research and practical studies into data center design, operation, and resultant resilience. The Institute has developed a tier classification to describe and differentiate facilities from an availability standpoint.
A more recent complement to TUI's work is a data center standard, ANSI/TIA-942-2005 Telecommunications Infrastructure Standard for Data Centers, issued by the Telecommunications Industry Association (TIA; www.tiaonline.org). This standard follows TUI’s Tier I through Tier IV format and draws heavily on TUI publications, but extends the detail, especially in connectivity. One point worth noting is that TIA-942 is specifically written for telecommunications-related data center environments with a power density of less than 2,700 W/m².
Maintenance and fault tolerance are the key to the tiers, and progressive levels of redundancy and resilience are required for each successive tier. Table 1 shows the progressive redundancy and resilience requirements, as well as how they might be achieved. It also refers to each of 16 key systems TUI has identified as critical to the operation of a specific data center. For a facility to achieve a tier classification, it must achieve the benchmark in all 16 criteria. Critical power is just one of those 16 criteria.
Achieving a high percentage availability is simple in principle: achieve a long mean time between failures (MTBF) and a very short MTTR. The calculation is Availability (%) = MTBF / (MTBF + MTTR) × 100.
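As a minimal sketch, the availability calculation can be written in a few lines of Python. The function name and the sample MTBF and MTTR values are illustrative assumptions, not figures from any particular product or site.

```python
def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability (%) = MTBF / (MTBF + MTTR) x 100."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100.0


# Illustrative inputs: one assumed MTBF, three assumed repair times (hours).
mtbf = 200_000
for mttr in (1, 4, 24):
    print(f"MTTR {mttr:>2} h -> {availability_percent(mtbf, mttr):.5f}%")
```

With the MTBF held constant, every extra hour of repair time lowers the result, which is why the repair side of the formula deserves as much attention as the failure rate.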
TUI has assigned a target availability (A%) to each of the tiers and, sensibly, recommends measuring the downtime (MDT) over at least a five-year period.
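To make the five-year measurement window concrete, the sketch below converts a target availability into a downtime budget and, in reverse, turns measured downtime into an achieved availability. The 8,760-hour year (leap days ignored) and the function names are simplifying assumptions; the 99.999% input is the "five nines" figure mentioned earlier in this article.

```python
HOURS_PER_YEAR = 8760  # assumes a 365-day year; leap days are ignored


def allowed_downtime_hours(target_percent: float, years: int = 5) -> float:
    """Downtime budget (hours) implied by a target availability."""
    return (1 - target_percent / 100) * HOURS_PER_YEAR * years


def measured_availability_percent(downtime_hours: float, years: int = 5) -> float:
    """Availability (%) achieved, given the downtime recorded over the period."""
    total_hours = HOURS_PER_YEAR * years
    return (total_hours - downtime_hours) / total_hours * 100


budget = allowed_downtime_hours(99.999)  # five nines over five years
print(f"{budget:.3f} hours, or about {budget * 60:.1f} minutes")
```

Under these assumptions, five nines leaves less than half an hour of total downtime across the whole five-year period.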
It will be immediately apparent to the reader that, to achieve a defined overall site availability, each of the 16 subsystems must achieve much higher performance. If the subsystems are treated as independent and in series, the site availability is the product of the 16 subsystem availabilities, so each subsystem needs roughly the 16th root of the site target A%. For the ultimate Tier IV site, this means that every subsystem (e.g., power at the load terminals) has to achieve 99.9994%, the magic five nines.
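The per-subsystem requirement can be checked with a short Python sketch. It assumes the 16 subsystems are independent, equally reliable, and all required for the site to be up (a series model). The 99.99% site target used here is an assumption inferred from the 99.9994% subsystem figure quoted above, not a value stated in the text.

```python
def required_subsystem_availability(site_target_percent: float,
                                    subsystems: int = 16) -> float:
    """Per-subsystem availability (%) needed to meet a site target.

    Series model: site = subsystem ** n, so subsystem = site ** (1 / n).
    """
    site = site_target_percent / 100
    return site ** (1 / subsystems) * 100


# Assumed site target of 99.99% spread across 16 subsystems.
per_subsystem = required_subsystem_availability(99.99)
print(f"{per_subsystem:.4f}%")  # 99.9994%
```

Multiplying sixteen subsystems at that level back together returns the assumed 99.99% site figure, which shows how little margin any single subsystem has.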
Clearly, a wide range of “answers” can be generated by combinations of MTBF and MTTR, but the reality is that only an emergency service backup that can minimize travel time to site, have parts availability, and excel in first-time fix rate will achieve the sort of MTTRs needed to push the availability to the required level.
It is easy to demonstrate that neither Tier III nor Tier IV can tolerate travel times to site of even four hours if they are to achieve the desired availability performance, even with MTBFs in the 200,000- to 400,000-hour range.
This conclusion creates the need for 24x7 remote monitoring, diagnostics, and tele-assisted service via data connectivity, whether over copper/modem or the Internet.
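One way to see the travel-time point is to rearrange the availability formula to give the longest MTTR that still meets a target. The sketch below does this for the 99.9994% subsystem figure quoted above for Tier IV, using the two ends of the MTBF range. The function name is hypothetical, and the sketch assumes MTTR covers everything from fault detection through travel, diagnosis, and repair.

```python
def max_tolerable_mttr_hours(mtbf_hours: float, target_percent: float) -> float:
    """Largest MTTR (hours) that still meets the target availability.

    Rearranged from A = MTBF / (MTBF + MTTR):  MTTR = MTBF * (1 - A) / A
    """
    a = target_percent / 100
    if not 0 < a < 1:
        raise ValueError("target must be strictly between 0% and 100%")
    return mtbf_hours * (1 - a) / a


for mtbf in (200_000, 400_000):
    limit = max_tolerable_mttr_hours(mtbf, 99.9994)
    print(f"MTBF {mtbf:,} h -> MTTR must stay under about {limit:.1f} h")
```

Under these assumptions the whole repair cycle has to finish in roughly one to two and a half hours, so a four-hour journey alone already exceeds the budget.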
The cost of high-tier power
Comparing systems is rather complicated when taking into account the type of load (single-corded or dual-corded) and whether or not static transfer switches (STSs) are deployed in the power system. STSs enable rapid transfer from one emergency backup power system to another and maximize the uptime of a critical power system. In the strictest definition, Tier IV is intended only for dual-corded loads without STSs, whereas STSs are essential in Tier III to transfer the load during maintenance routines. However, at the most fundamental level we can take Tier I as the base cost and MTBF (=1) and make the comparisons outlined in Table 3.
24x7 remote diagnostics, tele-maintenance, parts access, and sub-four-hour emergency repair performance are essential to meet the Tier III and Tier IV availability targets. The first-time fix rate will dictate the site availability.
The most resilient power architecture possible, by more than 50x, is Tier IV for dual-corded loads without STSs. The drawbacks to this architecture are the high capital expense (a 50% premium over Tier III), higher operating expense with partial load inefficiencies, and underutilized plant that can be regarded as resource waste.
If the client needs a specific classification, such as Tier IV for a given business case, then there is little choice but to follow TUI and/or TIA-942. It can be shown, however, that other, non-Tier schemes, all involving STSs and higher plant utilization, can offer 8 to 10 times better performance than Tier III at a 10% to 20% cost premium.
An argument for having 2xTier II instead of 1xTier IV, plus a disaster recovery site, is proven on power system cost grounds as long as the IT strategy supports parallel computing.
Paul Haake is vice president of engineering for Chloride North America (www.chloridepower.com). He is responsible for all aspects of design and engineering of uninterruptible power supplies, power-conditioning, and communication-line protection devices.
Which Tier classification is right for your organization?
Tier I is appropriate for firms such as:
- Small businesses in which IT primarily enhances internal business process
- Companies whose principal use of a “Web presence” is as a passive marketing tool
- Internet-based startup companies without quality-of-service commitments
Tier II is appropriate for firms such as:
- Small businesses whose IT requirements are mostly limited to traditional normal business hours, allowing system shutdown during off hours
- Commercial research-and-development firms, such as software developers, that do not typically have online or real-time service delivery obligations
- Internet-based companies without serious financial penalties for quality-of-service commitments
Tier III is appropriate for firms such as:
- Companies that support internal and external clients 24x7 such as service centers and help desks, but can schedule short periods when limited service is acceptable
- Businesses whose IT resources support automated business processes, so client impacts of system shutdowns are manageable
- Companies spanning multiple time zones with clients and employees spanning regional areas
Tier IV is appropriate for firms such as:
- Companies with an international market presence delivering 24x365 services in a highly competitive client-facing market space
- Businesses based on e-commerce, market transactions, or financial-settlement processes
- Large, global companies spanning multiple time zones where client access to applications and employee exploitation of IT is a competitive advantage—PH