Mainframe Business Continuance Via Asynchronous Replication: Mixing Synchronous and Asynchronous Replication for Maximum Cost-Effectiveness

by Nick Tabellion
August 1, 2004

** Read this article online at http://www.mainframezone.com/operating-systems/mainframe-business-continuance-via-asynchronous-replication-mixing-synchronous-and-asynchronous-replication-for-maximum-cost-effectiveness


Mainframe managers today are struggling to craft a cost-effective strategy for business continuance, of which traditional disaster recovery is only one part. Because its mandate is broader, business continuance is a more complicated and potentially more expensive undertaking than mainframe disaster recovery alone. This is forcing managers to explore a growing set of options, especially asynchronous replication, and to assemble the mix of technologies that meets the organization’s business continuance and disaster recovery goals most cost-effectively.

The traditional mainframe disaster recovery strategy, synchronous mirroring, isn’t appropriate for every organization or for the broader needs of business continuance. Business continuance refers to the ability to maintain business operations, even at a somewhat reduced level of service, in the face of any of a variety of problems that may affect the primary business center. It extends beyond IT to all of the organization’s operating units, people and processes, although systems and data clearly play a central role.

Resuming business operations will likely entail recovering secondary applications and data as well as the organization’s primary applications and databases. It may involve positioning copies of data at multiple sites quite distant from the primary location.

Disaster recovery entails the ability to restore the organization’s primary production applications and data if a natural or man-made disaster temporarily or permanently makes those resources unavailable. In the mainframe environment, this traditionally has meant synchronous mirroring of data and applications across the corporate campus or metropolitan area.

 

Business Continuance Challenges

Events of the past few years have led most organizations to look beyond disaster recovery for their primary applications and to address a broader set of business continuance needs, such as protecting secondary applications and data. Because business continuance brings more varied objectives and requirements, and because conventional disaster recovery strategies must now extend their reach, organizations are starting to look beyond synchronous mirroring and to examine what asynchronous replication offers for meeting both business continuance and disaster recovery challenges.

 

Synchronous and Asynchronous Replication

In the z/OS environment, synchronous mirroring is the primary method of duplicating data for business continuance. With synchronous mirroring, data is written to the second (target) system at the same time it’s written to the primary system, and the primary system doesn’t move on to the next transaction or piece of data until the target system has acknowledged the previous one. This makes synchronous mirroring the best way to ensure that the mirrored data is an exact, real-time replica of the primary data, and it is why synchronous mirroring has become the standard mechanism for maintaining the availability of critical production applications.
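The acknowledgement rule can be illustrated with a toy model. This is a minimal Python sketch, not any vendor’s implementation; the class SyncMirror and the in-memory lists standing in for the primary and target volumes are hypothetical. A write returns only after the target has stored and acknowledged it, so the two copies never diverge.

```python
class SyncMirror:
    """Toy model of synchronous mirroring (hypothetical, for illustration).

    A write is not complete until the target has stored the record and
    acknowledged it, so the application cannot start its next write early.
    """

    def __init__(self):
        self.primary = []  # stands in for the primary volume
        self.target = []   # stands in for the remote target volume

    def _send_to_target(self, record):
        # Stand-in for the remote link: store the record, return the ack.
        self.target.append(record)
        return True

    def write(self, record):
        self.primary.append(record)
        # Block until the target acknowledges; only then is the write done.
        if not self._send_to_target(record):
            raise IOError("target did not acknowledge the write")
        return True


mirror = SyncMirror()
for txn in ["debit 100", "credit 100"]:
    mirror.write(txn)  # each call returns only after the target's ack
print(mirror.primary == mirror.target)  # True
```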

Synchronous mirroring has drawbacks, however. Over long distances it slows system performance, because the primary system must wait for each acknowledgement to travel back from the target system. The cost of a sufficiently fast, dedicated long-distance connection can also be prohibitive, putting it beyond the reach of many organizations. Still, synchronous mirroring remains, for now, the primary disaster recovery method for large mainframe shops.
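The distance penalty can be estimated with simple arithmetic. The sketch below assumes signals travel through optical fiber at roughly 200 km per millisecond and that the fiber route is exactly as long as the stated distance; real routes are usually longer, and switching, protocol and disk-service times add further delay, so these figures are lower bounds. The function name sync_write_penalty_ms is hypothetical.

```python
KM_PER_MILE = 1.609344
# Assumption: light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def sync_write_penalty_ms(distance_miles):
    """Lower bound on latency added to each synchronously mirrored write.

    One round trip is needed: the data travels out and the acknowledgement
    travels back. Switching, protocol and disk-service time are ignored.
    """
    round_trip_km = 2 * distance_miles * KM_PER_MILE
    return round_trip_km / FIBER_KM_PER_MS


for miles in (60, 1000):
    penalty = sync_write_penalty_ms(miles)
    # Serialized writes: each must wait for the previous acknowledgement.
    print(f"{miles} miles: {penalty:.2f} ms per write, "
          f"at most {1000 / penalty:.0f} serialized writes per second")
```

At 60 miles the round trip adds about 0.97 ms per write; at 1,000 miles it adds about 16.09 ms. If each write must wait for its acknowledgement before the next one starts, a single serialized stream of writes over 1,000 miles is capped at roughly 62 writes per second from propagation delay alone.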

In the face of recent government mandates around the world, companies must reconsider their disaster recovery and business continuance strategies, especially with regard to distance. The September 11 attacks, and last year’s massive blackouts across much of the eastern U.S. and in western Europe, made managers acutely aware of the need to keep copies of their data much farther from the primary location, so that the alternative sites lie outside any affected area. Over these distances, however, synchronous mirroring becomes an extremely expensive solution that’s justifiable, at best, only for the most critical applications.

It’s completely out of the question, from a cost standpoint, for the many secondary applications that should also be included in a broader business continuance strategy (see Figure 1).



As a result, organizations are turning to asynchronous replication as a practical, cost-effective disaster recovery option, one inexpensive enough to apply to a much wider range of applications. Asynchronous replication momentarily freezes the primary application to capture a snapshot of its data at that moment, then sends the snapshot to the target while the primary application resumes operations. Depending on how frequently the organization schedules replication, the target data may lag the primary data by anywhere from a few minutes to a few hours. For all but the most time-sensitive, critical systems, such as airline reservations, this lag isn’t a problem.
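The snapshot cycle described above can be modeled the same way. In this minimal sketch (the class AsyncReplicator and its dictionaries are hypothetical), writes complete locally without waiting for the remote site, and a separate replicate step captures a point-in-time copy and ships it. The network transfer is modeled as instantaneous, which a real link is not.

```python
import copy


class AsyncReplicator:
    """Toy model of snapshot-based asynchronous replication (hypothetical)."""

    def __init__(self):
        self.primary = {}
        self.target = {}
        self.target_as_of = None  # time of the snapshot the target holds

    def write(self, key, value):
        # Completes at once; the application never waits on the remote site.
        self.primary[key] = value

    def replicate(self, now):
        # Brief "freeze": capture a point-in-time copy of the primary data.
        snapshot = copy.deepcopy(self.primary)
        # Writes may resume here. The transfer to the target is modeled as
        # instantaneous; over a real link it would take time.
        self.target = snapshot
        self.target_as_of = now


rep = AsyncReplicator()
rep.write("acct-1", 100)
rep.replicate(now=0)
rep.write("acct-1", 250)      # made after the snapshot
print(rep.target["acct-1"])   # 100: the target lags until the next cycle
```

Copying the data at the freeze point is what lets the primary keep accepting writes without disturbing the copy in transit.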

Asynchronous replication over IP leverages existing infrastructure and requires no specialized hardware, making it that much more economical. In addition, unlike synchronous replication, it doesn’t degrade application performance as distance grows: the organization can replicate data to a site 1,000 miles away as easily as to one 60 miles away.

 

RPO and RTO

With two options (synchronous and asynchronous replication) and two strategies to support (disaster recovery for the primary applications and business continuance for the organization as a whole), managers now face the challenge of determining which replication approach to use.

Fortunately, it isn’t an either-or decision. Managers can and should use both, mixing them according to the nature of the data and the needs of each application and of the business. This lets managers find the blend of costly synchronous and inexpensive asynchronous replication that best suits their particular needs at the lowest cost.

To develop the optimum mix, managers need to understand two key metrics: Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

RPO is the point in time to which systems and data must be recovered for effective disaster recovery or business continuance. In effect, RPO defines how far the recovered data may lag behind the primary data, that is, how much recent data the organization can tolerate losing in the event of a failure. If the RPO for an application is zero, for example, the organization can tolerate no lag, and no data loss, for that application. For many organizations and applications, the RPO ranges from a few minutes to hours.
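For snapshot-based asynchronous replication, a replication schedule can be checked against an RPO with a short calculation. Assume, for this sketch, that snapshots are taken every interval_min minutes and each takes transfer_min minutes to reach the target, finishing before the next one starts. The worst case is a failure just before a snapshot finishes arriving: the target still holds the previous snapshot, so everything written since that snapshot was taken (one interval plus one transfer) is lost. The function names are hypothetical.

```python
def worst_case_data_loss_min(interval_min, transfer_min):
    """Worst-case data loss, in minutes, for snapshot-based async replication.

    A failure just before snapshot k+1 finishes arriving leaves the target
    holding snapshot k, so all writes since snapshot k was taken are lost.
    Assumes each transfer completes before the next snapshot is taken.
    """
    if transfer_min > interval_min:
        raise ValueError("transfers would queue up; this model does not apply")
    return interval_min + transfer_min


def meets_rpo(rpo_min, interval_min, transfer_min):
    return worst_case_data_loss_min(interval_min, transfer_min) <= rpo_min


print(meets_rpo(rpo_min=30, interval_min=15, transfer_min=5))  # True (20 <= 30)
print(meets_rpo(rpo_min=30, interval_min=30, transfer_min=5))  # False (35 > 30)
```

Under this model a 30-minute interval cannot meet a 30-minute RPO if transfers take any time at all, and no nonzero interval can meet an RPO of zero, which is why zero-RPO applications call for synchronous mirroring.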

RTO is the time within which systems and applications must be recovered after an outage. Again, an RTO of zero means the organization must provide virtually instantaneous failover. Many applications, however, can tolerate an RTO ranging from minutes to hours.

Applications often have different RPOs and RTOs, depending on their purpose and importance to the business. An airline reservation system or a bank’s deposit and withdrawal system, for example, will have a very different RPO and RTO than a retailer’s inventory restocking system. Even within a single organization, a bank may have financial transaction systems with low or even zero RPO and RTO, but e-mail applications that can tolerate significantly higher RPO and RTO levels.

Only after the organization has determined the RPO and RTO for a particular application can it decide what kind of replication is necessary. If the RPO and RTO are zero, there’s little choice. The organization needs synchronous mirroring. However, if the organization can tolerate even slightly higher RPO and RTO levels, it can take advantage of asynchronous replication, which can meet a wide range of RPO and RTO requirements.

 

Cost-Effective Business Continuance

Traditionally, organizations treated most, if not all, of their applications the same way for disaster recovery. This meant, for example, that applications with an RPO of 30 minutes were protected the same way as applications with an RPO of zero, a costly approach indeed. The only way to save money was to mirror just those applications with an RPO or RTO of zero and rely on slow tape backup for the rest, even though tape could not meet the more relaxed RPO and RTO requirements of most of the other applications. In effect, managers were forced to settle for unsatisfactory business continuance protection to conserve their budgets.

By analyzing the RPO and RTO of each application to be included in the business continuance and disaster recovery strategy, managers can combine synchronous replication, asynchronous replication and tape to protect each application in the most cost-effective way. With this approach, the manager specifies costly synchronous mirroring only for the few applications with the most demanding RPOs and RTOs. For the other applications, the manager can use less costly asynchronous replication, point-in-time snapshot copies, or even tape backup, which is the least costly option of all but will suffice only for applications with highly tolerant RPO and RTO requirements. The result is a cost-effective business continuance and disaster recovery strategy that meets business needs for budget, data protection and availability.
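The per-application analysis can be expressed as picking the cheapest protection tier whose recovery capabilities meet both of an application’s objectives. In this sketch, the tier capability figures and the sample portfolio are hypothetical placeholders, not measurements; each installation would substitute its own. Tiers are listed cheapest first, with tape cheapest and synchronous mirroring costliest as the text describes; the relative placement of PIT copies and asynchronous replication is an assumption.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    rpo_min: float  # tolerable data loss, in minutes
    rto_min: float  # tolerable time to recover, in minutes


# Hypothetical capability table, cheapest first: (tier, best RPO, best RTO)
# in minutes. These are placeholders, not vendor or measured figures.
TIERS = [
    ("tape backup", 24 * 60, 48 * 60),
    ("point-in-time snapshot", 4 * 60, 60),
    ("asynchronous replication", 15, 30),
    ("synchronous mirroring", 0, 0),
]


def cheapest_tier(app):
    """Return the first (cheapest) tier that meets both RPO and RTO."""
    for name, tier_rpo, tier_rto in TIERS:
        if tier_rpo <= app.rpo_min and tier_rto <= app.rto_min:
            return name
    # Only reachable for negative objectives, since the last tier is 0/0.
    raise ValueError(f"no tier meets the objectives for {app.name}")


portfolio = [
    App("deposits and withdrawals", 0, 0),
    App("inventory restocking", 30, 120),
    App("e-mail", 8 * 60, 4 * 60),
    App("archive reporting", 7 * 24 * 60, 3 * 24 * 60),
]
for app in portfolio:
    print(f"{app.name}: {cheapest_tier(app)}")
# deposits and withdrawals: synchronous mirroring
# inventory restocking: asynchronous replication
# e-mail: point-in-time snapshot
# archive reporting: tape backup
```

Because synchronous mirroring is modeled with zero RPO and RTO, it satisfies any non-negative objective, so every application receives some tier, and only the zero-RPO system is assigned the costliest one.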

 

Point-in-Time Replication

With asynchronous replication comes the ability to make Point-In-Time (PIT) copies of data. PIT copies, which are asynchronous snapshots stored on disk, further enhance the organization’s business continuance options. By creating a series of PIT copies, the organization can roll back to the state of the data at a given time.

For example, an organization that discovers its data has been corrupted can use PIT copies to recover the data as it stood before the corruption occurred, ensuring a clean copy. PIT copies also speed recovery, because data can be restored directly from disk, either online or nearline.
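Rolling back to a clean PIT copy amounts to finding the most recent copy taken strictly before the corruption began. This minimal sketch uses Python’s standard bisect module on a sorted list of snapshot times; the function name last_clean_copy and the sample four-hour schedule are hypothetical, and in practice the corruption time itself usually has to be estimated.

```python
from bisect import bisect_left
from datetime import datetime


def last_clean_copy(snapshot_times, corruption_time):
    """Return the latest PIT copy taken strictly before corruption_time.

    snapshot_times must be sorted in ascending order. Returns None when
    every copy was taken at or after the corruption (no clean disk copy).
    """
    # bisect_left gives the index of the first copy at or after the
    # corruption, so the entry just before it is the last clean copy.
    i = bisect_left(snapshot_times, corruption_time)
    return snapshot_times[i - 1] if i > 0 else None


snaps = [datetime(2004, 8, 1, h) for h in (0, 4, 8, 12)]
print(last_clean_copy(snaps, datetime(2004, 8, 1, 9, 30)))
# 2004-08-01 08:00:00
```

A copy taken at the exact moment of corruption is treated as suspect and skipped.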

 

Asynchronous Replication Options

Mainframe managers have a choice of host- or array-based asynchronous replication. Which type of replication the organization chooses should depend on its specific replication objectives, budget and storage environment.

Each replication approach has strengths and limitations. Host-based replication is flexible because it can leverage existing IP networks and servers, including the mainframe. It supports data replication between any two storage arrays. Because it doesn’t require any additional hardware, it tends to cost significantly less.

Array-based replication, commonly used for mainframe synchronous mirroring, offers high performance. However, it requires homogeneous storage and dedicated network links between the arrays, making it a high-cost solution.

Using RPO and RTO as guidelines, a manager can mix host-based asynchronous replication and array-based synchronous mirroring to balance performance and cost while achieving the organization’s business continuance and disaster recovery goals (see Figure 2). This requires IT managers to carefully assess their applications and systems in light of their replication needs, strategy and budget.

 

Business Continuance on the Mainframe

Host-based asynchronous replication makes business continuance feasible for mainframe-centric environments. Lower in cost, it lets managers protect and ensure the availability of all applications, not just those that justify the high cost of synchronous mirroring. It also makes it practical to replicate data across far greater distances and to multiple locations. Host-based replication further increases the flexibility and cost advantages by leveraging existing networks and infrastructure and freeing the organization from reliance on proprietary technology. This becomes particularly important as mainframe-oriented organizations increasingly find themselves operating in a mixed mainframe and open systems environment.

It’s no longer enough to simply mirror key mainframe applications for disaster recovery. Business continuance today requires a much broader strategy for ensuring the availability and protection of more applications and data, both mainframe and open systems, while spanning much greater distances. By combining synchronous mirroring and asynchronous replication, managers can meet these new objectives and reduce costs.


