Disaster Recovery Metrics

About the only thing an engineer enjoys more than building technology is competing with the technology.

Consider. The first American car race was held in Chicago in 1895. In 1904, Henry Ford himself set a land speed record by racing across the frozen surface of Lake St. Clair, near my home town. That’s right. Years before the Ford Model T hit the market in 1908, when just getting a car to move was an accomplishment, people were already racing their technology.

In the same vein, building a disaster recovery strategy is good fun. Racing the strategy, tuning and tweaking it, optimizing it, now that is even better. I therefore offer up the following metrics, DR speedometers if you will, for racing your technology.

Disaster Recovery Metrics

Recovery Time Objective (RTO). How soon after an event occurs can you recover operations? Typically measured in hours or days, RTO is the time it takes to resume systems, applications, and business operations. In facility terms, RTO covers the move from the production facility to the disaster recovery facility.

Recovery Point Objective (RPO). How close to when an event occurs can you recover data? Typically measured in hours or days, RPO is the time between the event and the last backup or copy of your business data.

Return to normal operations (RNO). How soon after an event clears can you resume in your production facilities? Typically measured in days or weeks, RNO is the time it takes to go from the disaster recovery facility back to the production facility.

Recovery time granularity (RTG). How many backup jobs or data copies are available within your RPO? Typically measured as a count. For example, assuming a nightly backup and an RPO of one week, the RTG is 7, because at most 7 backup jobs fall within the one-week RPO.

Recovery consistency characteristic (RCC). Does the data require consistency across multiple hard drives or logical volumes? Typically a yes or no metric that is applicable mainly to business databases and data warehouses.

Recovery object granularity (ROG). What level of recovery is needed to resume operations? Typically a list, such as: system level (multiple servers); server level (multiple hard drives); hard drive level (multiple folders); folder level; file level; item or brick level.

Recovery service scalability (RSS). How scalable is this particular method of recovery? Qualitative metric that identifies the bottleneck in the recovery method. For example, the number of tapes that can be restored at one time may limit a tape-based recovery strategy.

Recovery service resiliency (RSR). How tolerant is the disaster recovery of subsequent disasters? Qualitative metric that identifies ways to continue the disaster recovery in the face of other failures and outages.

Recovery management cost (RMC). How cost effective is the recovery strategy? Typically measured as a percentage. RMC is the disaster’s per incident cost divided by the recovery equipment’s operating cost. RMC represents the efficiency of a given recovery strategy.
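For readers who like to see the speedometers wired up, here is a minimal Python sketch of the quantitative metrics above (RTO, RPO, RNO, RTG, and RMC). The RecoveryTimeline class, its field names, the function names, and every timestamp and dollar figure are hypothetical, invented for illustration. The functions report what was actually achieved in a test or event, which you would then compare against your stated objectives. Treat it as one possible reading of the definitions, not a standard implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTimeline:
    """Hypothetical timestamps logged during a DR test or a real event."""
    last_backup: datetime      # last good backup or data copy before the event
    event_start: datetime      # when the event occurred
    dr_site_live: datetime     # operations resumed at the disaster recovery facility
    event_cleared: datetime    # when the event cleared
    production_live: datetime  # operations resumed in the production facility


def measured_rto(t: RecoveryTimeline) -> timedelta:
    """Time from the event until operations resumed (production to DR facility)."""
    return t.dr_site_live - t.event_start


def measured_rpo(t: RecoveryTimeline) -> timedelta:
    """Time between the last backup or copy and the event."""
    return t.event_start - t.last_backup


def measured_rno(t: RecoveryTimeline) -> timedelta:
    """Time from the event clearing until production resumed (DR facility back to production)."""
    return t.production_live - t.event_cleared


def rtg(rpo_window: timedelta, backup_interval: timedelta) -> int:
    """Maximum number of backup jobs that fit inside the RPO window."""
    return rpo_window // backup_interval


def rmc_percent(per_incident_cost: float, operating_cost: float) -> float:
    """Per incident cost divided by recovery operating cost, as a percentage."""
    return 100.0 * per_incident_cost / operating_cost


# Example with made-up numbers.
timeline = RecoveryTimeline(
    last_backup=datetime(2024, 3, 1, 23, 0),
    event_start=datetime(2024, 3, 2, 9, 0),
    dr_site_live=datetime(2024, 3, 2, 15, 0),
    event_cleared=datetime(2024, 3, 4, 9, 0),
    production_live=datetime(2024, 3, 6, 9, 0),
)

print("RTO achieved:", measured_rto(timeline))
print("RPO achieved:", measured_rpo(timeline))
print("RNO achieved:", measured_rno(timeline))
print("RTG:", rtg(timedelta(weeks=1), timedelta(days=1)))  # nightly backup, one week RPO -> 7
print("RMC %:", rmc_percent(50_000, 200_000))
```

The remaining metrics (RCC, ROG, RSS, RSR) are yes/no, list, or qualitative values, so they are better captured in the recovery plan itself than computed.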

Defining Business Continuity and Disaster Recovery

The definitions that people have for Business Continuity and Disaster Recovery are all over the map. The confusion makes consulting on BC/DR interesting, to say the least. These terms do, however, have standard definitions.

Definitions

Business Continuity is the set of actions a business takes to maintain key business processes in the face of a large-scale outage. Business Continuity is holistic. It covers people, processes, and technology. Ideally, it is managed as an ongoing concern. Such a Business Continuity Program will regularly review potential events (disasters or emergencies), assess the impact of such events, develop recovery strategies to meet these events, and perform regular testing.

Disaster Recovery is the set of actions that select business units perform to maintain IT systems and services in the face of an outage. While Business Continuity is holistic, Disaster Recovery is very specific. As a subset of BC, DR concerns the IT applications, servers, and equipment. BC identifies the critical business processes. DR identifies the enabling systems.

Misconceptions

I have heard a few misconceptions about the relationship between Business Continuity and Disaster Recovery. Here is some of what I have heard, and the correct interpretation.

“Our Disaster Recovery is that piece of equipment right there.” DR is not a backup library, a server, or rack space at a facility. These may be resources in a recovery strategy used for a specific critical business process.

“Business Continuity is just really fast, really good Disaster Recovery.” It is one thing to recover, another to continue, right? In fact, DR is one part of a BC program, not a slower version of it.

“We have Business Continuity for the business systems (ERP or back office systems) and Disaster Recovery for our IT systems.” The rationale goes that IT that directly supports the business gets Business Continuity and IT that does not gets Disaster Recovery. The reality is that the business has two different recovery strategies: a faster strategy for ERP and a slower, less expensive strategy for general IT.

“We used to have Disaster Recovery and then we bought XYZ and now we upgraded to Business Continuity.” There is a vendor whose sales people are making these claims. The reality is that the upgrade improves your recovery metrics by reducing your recovery time. But you still have a DR product.

In summary, Business Continuity is the overall program that encompasses all aspects of maintaining a business in the face of disasters. Disaster Recovery is the part of Business Continuity that deals specifically with restoring IT systems and services.