Using FlashCopy Version 2 for DB2 UDB for OS/390 & z/OS Object-Level Migration

by Daniel L. Luksetich
August 1, 2004

** Read this article online at http://www.mainframezone.com/applications-and-databases/using-flashcopy-version-2-for-db2-udb-for-os-390-z-os-object-level-migration


FlashCopy is an IBM microcode-based technology that comes as an add-on feature to the more advanced IBM DASD subsystems. Other DASD manufacturers typically have similar capabilities under different names. This technology lets the DASD subsystem handle some of the data movement task rather than relying on the operating system or programs. FlashCopy is a useful utility for backing up databases and moving large volumes of data quickly at a DASD volume level. FlashCopy Version 2 introduces data set level copy.

FlashCopy V2 is available for some of the IBM storage subsystems and may already be available now or will soon be available for HDS and EMC subsystems. This article will focus on a technique for using this technology to move  DB2 objects between databases and subsystems much faster than traditional techniques. Use of this technology will change the way we move our data!

Migrating DB2 Data

DBAs are moving data continuously to:

A DBA can employ several techniques to move DB2 data from one object to another. Among the safest options is an unload/load. Using the DSN1COPY stand-alone utility is more risky. Other utilities can be used for data migration; these include DFDSS, AMS REPRO, and FDR (see the DB2 Administration Guide for the complete list).

The DBA begins the process by creating target tablespaces, tables, and indexes that match the source object entirely. The target objects can be in the same or different databases, or even in a different DB2 subsystem.

If unload/load is the method chosen, then the DSNTIAUL or equivalent program is used to unload the source tablespace data into flat files. The DB2 LOAD online utility (or equivalent third-party LOAD utility) is then used at the target to populate the tablespace and build the indexes. This is the safest method, since it’s in DB2’s control. This method also allows the source tablespaces to be online during the unload process.

If DSN1COPY is chosen, the source tablespace and indexspaces can be copied directly from the source to the target. This is significantly faster than the unload/load method, since the data doesn’t have to be staged in a flat file. Also, indexes can be copied using this technique, which can improve speed. The DBA is responsible for translation of internal DB2 object identifiers. Since DSN1COPY is an offline utility, it requires control of the underlying data sets at the source and target. That means DB2 objects affected must be offline for this process to work. It’s a trade-off between speed and availability.

FlashCopy V2 offers a technique similar to DSN1COPY. However, this technology will dramatically reduce the CPU and elapsed time required to move the data. Using the dfsmsDSS system utility, FlashCopy V2 can be used to quickly copy the data from source to target, and reduce the outages associated with DSN1COPY.

 

DB2 Data Sets

DB2 stores all its user data in VSAM linear data sets. The name of the data set corresponds to the name of the object whose data it contains. The format of the data set name is as follows:

catname.DSNDBx.dbname.psname.y0001.Annn

Here:

So, for example, partition 120 of tablespace YLASPACE1 in database YLADB01 in our DSN1 subsystem (VCAT alias of DSN1) would be named:

DSN1.DSNDBC.YLADB01.YLASPACE1.I0001.A120 for the cluster

DSN1.DSNDBD.YLADB01.YLASPACE1.I0001.A120 for the data component.

DB2 typically has exclusive control over these data sets. However, with proper execution, these data sets can be manipulated outside DB2’s control. For object migration, moving the data between these underlying DB2 data sets can result in a faster migration process. This is how we’ll migrate the data using FlashCopy V2.

 

DB2 Internal Object Identifiers

Several DB2 internal identifiers play a major role in our ability to copy DB2 objects outside DB2’s control. DB2 uses them to ensure that the data in the underlying data sets match the database objects they represent. The identifiers are the:

There’s one DBID value for each database in a DB2 subsystem, one PSID for each tablespace in a database, and one OBID for each table and index in a database. DB2 objects have these internal identifiers in multiple places in the underlying physical data sets. For DB2 tablespaces, the DBID and PSID appear on the header page. The tablespace will also have an OBID of a corresponding table on each page that table occupies in a tablespace. For a DB2 indexspace, the DBID and ISOBID appear on the header page, and the index OBID is on each page of the index.

These internal identifiers are important to the data migration process. DB2 stores these identifiers in its system catalog and directory and uses the identifiers to internally verify the identity of the objects it’s accessing. Since DB2 uses these identifiers in this way, then the internal identifier on a given page of a tablespace or indexspace data set must match the corresponding identifier in the system catalog and directory. If there’s a mismatch, DB2 will either return an error or an unexpected result. When copying data from one DB2 object to another outside DB2’s control (as with FlashCopy), ensure that the internal identifiers are translated from the source to the target.

FlashCopy V2 and DB2

With FlashCopy V2, data sets can be copied without an actual physical copy occurring. In other words, this technology enables the DASD subsystem to manage information about changes to volumes or data sets while ensuring those changes are handled across copies in progress. There are options available to instruct the DASD to perform a physical or logical copy and track changes to source and target objects.

Regardless of the type of copy completed, this technology enables volume or data set copies, which used to take hours, to happen in seconds. The operating system knows no difference between an object copied this way and any other object. The technology is limited in that all FlashCopy operations must occur within the same physical DASD subsystems. These subsystems are limited in size, cache, channels, and back-end bus. The largest subsystem available today provides about 60TB of storage.

Since DB2 stores all its user database objects in VSAM linear data sets, FlashCopy V2 can be used to copy those DB2 data sets. DBAs have used a variety of techniques, including DSN1COPY, to copy DB2 objects across databases or subsystems by copying the underlying data sets. Although this data set movement technique is effective and faster than typical database unloads and reloads, it can still be a tedious, lengthy process for unusually large databases. Like DN1COPY, FlashCopy V2 can be used to quickly move these underlying data sets. Where DSN1COPY is impacted by the size of the data set it needs to copy, FlashCopy V2 works outside the operating system to copy the data, so the size of the data to be copied is largely irrelevant.

All the major DASD subsystem vendors will support FlashCopy V2. This means we can now employ an advanced copy solution that’s fast and mostly independent of the operating system without relying on a vendor-specific solution.

FlashCopy V2 DB2 Data Migration Techniques

Let’s explore three ways to use FlashCopy V2 for data migration. Each approach has trade-offs in ease of use, speed and risk. The three techniques are to use FlashCopy V2 to move:

 

Technique #1

Let’s examine use of FlashCopy V2 to move all objects, and maintain source and target objects with the same internal identifiers. The idea is to create a database in another DB2 subsystem so all the internal identifiers for all the DB2 objects of interest match exactly from the source subsystem to the target subsystem. This is difficult to manage because DB2 only gives you control over the table OBID at object creation. IDs for databases, tablespaces, and indexes must be coincidentally the same. Having all the IDs the same could provide for the fastest copy process. However, dummy objects may have to be created to get IDs to match. Even so, it may be difficult, or perhaps impossible, to always keep the IDs in sync. More risky operations, such as modifying the contents of the DB2 catalog and directory outside DB2’s control, are also a possibility. This option is also limited by the fact that you can have only one target database per DB2 subsystem using this technique. This may create additional resource consumption of CPU and memory to maintain several separate DB2 subsystems. However, with all the internal IDs the same for source and target objects, you need not run a process to modify IDs at the target. Target objects can be immediately available once the copy is completed.

Technique #2

Another option is to use FlashCopy V2 to move tablespaces with a repair of the target tablespace-related internal identifiers, and then rebuild target indexes. There are several database object IDs in each database object that must be modified for DB2 to recognize objects at the target. All the IDs, except for the table and index object IDs (OBIDs), exist in the header pages of each DB2 object. You can use the DB2 REPAIR utility to modify the IDs in the header pages (DBID, PSID, ISOBID) only. This utility would run against the target data sets and touch only the first page in each data set. This is much faster than using DSN1COPY. The OBIDs still exist on every index and table data page, however. DB2 table OBIDs can be specified at table creation time, so these IDs can be set when the object is created. Index OBIDs cannot be set at object creation time (you can supply it in the Data Description Language [DDL], but it’s not used), so they would still have to be modified via a DSN1COPY or index rebuild process.

Technique #3

The third option is to use FlashCopy V2 to move all objects with a repair of the target tablespace identifiers and only some index identifiers. This option is the same as the previous option. The only disadvantage to the previous option is that the target index had to be copied using DSN1COPY, or rebuilt using the REBUILD utility. This is because there’s no way to set the OBID for an index, as is possible for a table. However, DB2 seems to ignore the OBID in the index pages, so the index can be copied using FlashCopy. Index DBID and ISOBID still need to be repaired for the header pages only at the target, but the OBID can be ignored. The only disadvantage is that this is undocumented and presumably not an approach IBM supports.

Using FlashCopy V2 to Migrate DB2 Data Sets

Before copying the data sets, define the target objects at the target database, which can be located in the same DB2 subsystem or a different subsystem. All the target database objects need to exactly match the source objects in structure. In addition, you must create target tables with the OBID that matches the OBID of the corresponding source table. This is critical. The names of the target objects do not need to match, but this certainly simplifies the process. All the internal database identifiers for the corresponding source and target objects need to be recorded, as they’ll be used during the copy process. The method that we’ll use is technique #3, in which the OBID of the index is ignored.

Here are the steps in the copy process:

  1. Stop the target database objects.
  2. Use Access Method Services (dfsmsAMS, better known as IDCAMS) to delete all the underlying data sets of the target object.
  3. Stop the source objects. Optionally, the source objects can be started for read-only and quiesced for a read-only outage at the source.
  4. Initiate the copy using dfsmsDSS copy utility.
  5. Start the source database object.
  6. Start the target objects for UT access.
  7. Repair the target objects to adjust the internal identifiers on the header pages, also repairing the LEVELID of the tablespaces and indexes.
  8. Start the target objects for full access.
Figure 1 lists the source and target objects for our test copy process. The copy process will allocate the target data sets, which need to be stopped for DB2 to release its enqueue on the data sets. The following command would be issued via the TSO DSN command processor:

-STOP DB(YLADB02) SP(YLATS01,YLAIX01)

The copy process will allocate the target data sets, so they’ll need to be deleted first to avoid an error. The dfsmsAMS command in Figure 2 can be used to delete the target data sets.

The copy process needs to get an enqueue on the source data sets. It’s important to have a consistent image of the source object copied to the target, so the source data set must be guaranteed to be stable, with all DB2 data externalized from the buffers, and into the data set. Stopping the source objects accomplishes this. However, the source objects can be optionally started in read-only mode; you can use a QUIESCE utility with the WRITE YES option to externalize the data. If the source object is started in read-only mode, the subsequent dfsmsDSS copy process must use the TOLERATE( ENQFAILURE) option. The following TSO DSN command would be issued to stop the source objects:

-STOP DB(YLADB01) SP(YLATS01,YLAIX01)

You initiate the copy process using the dfsmsDSS COPY command. The options of this command specify that the source data sets will be copied to the target data sets. The options also indicate the target data sets be allocated, renamed, and cataloged. Also indicated is use of a fast replication technique. This will initiate the FlashCopy process. The copy will be initiated within the DASD subsystem and, once the initialization is complete, the dfsmsDSS command will terminate. At that point, the copy process has been initiated within the DASD subsystem, which will maintain the consistency. Once the dfsmsDSS command has completed, both the source and target data sets are available for full use. The dfsmsDSS command to copy our example objects will look like Figure 3.

From the operating system perspective, the copy has finished. From a DB2 perspective, the target objects are an exact copy of the source objects with no impact whatsoever to the table data. So now the source objects can be started for full access using this DB2 command via the TSO DSN command processor:

-START DB(YLADB01) SP(YLATS01,YLAIX01)

Once the copy process has started, target objects can also be started. Since the LEVELID and internal object identifiers still need to be adjusted, the safest action is to start the target objects for utility access only. This lets the REPAIR utility work and keeps users out of the target objects. The TSO DSN command processor issues the following DB2 command to put the target objects into utility mode:

-START DB(YLADB02) SP(YLATS01,YLAIX01)

ACCESS(UT)

Use the DB2 REPAIR utility to adjust the internal object identifiers from their source object values to the target object values, as well as the LEVELIDs. Since the REPAIR utility needs only to touch the header pages, this process is quick. Remember, only the DBID, PSID, and ISOBID in the header pages need to be updated since the OBID of the target tables are set at object definition time and the index OBIDs are ignored. Figure 4 shows the DB2 REPAIR utilities issued for our example.

Once the internal object identifiers are adjusted, the target objects can be started for full access. The command is:

-START DB(YLADB02) SP(YLATS01,YLAIX01)

The copy process may still be active inside the DASD subsystem, but the operating system and DB2 are oblivious to that.

 

FlashCopy V2 DB2 Object Level Migration Performance

Please note that times listed in the following performance information are approximate; your performance may vary significantly.

A small-scale performance test was conducted using a single large DB2 table. This was followed by a large-scale test involving several large DB2 tables. During all performance tests, queries were run against the target tables to confirm that the data at the target was available and accurate. The only technique used during performance testing was technique #3.

 

Single, large DB2 table: A partitioned tablespace with a single partitioning index was copied from the one database to another database in the same subsystem. The DB2 pagesets involved encompassed 508 data sets. The combined size of all these data sets was approximately 66GB. The time taken to perform the copy process was about 25 minutes.

 

Six large DB2 tables: Six large DB2 tables were copied from one database to another database in the same subsystem. All tables were partitioned, and some had secondary indexes (NPIs) as well as partitioning indexes. The DB2 pagesets involved encompassed nearly 3,000 data sets. The combined size of all these data sets was approximately 1.1TB. The time taken to perform the entire copy process for all objects was about 46 minutes and 336.55 seconds of CPU time.

 

Performance vs. Traditional Methods

The tests were conducted on an IBM zSeries 900 computer with four available processors. Since the FlashCopy process didn’t consume much CPU, the number of concurrent processes could scale up nicely. This isn’t possible with the more traditional techniques of unload/load or DSN1COPY, as they consume significantly more operating system resources and have to compete with other processes running on the machine.

Using the unload/load method for the same data would take many days and would have to be carefully scheduled and coordinated with other jobs in the system. Recent copies of just one of the largest of the test tables took five days.

A copy process using DSN1COPY of our six tables is projected at 172.5 hours elapsed and 10.6 hours CPU, based upon a sample of jobs run, and provided there’s no competition with other jobs.

This demonstrates the potential elapsed and CPU time-savings you can realize by using FlashCopy V2 rather than the traditional methods.

 

Summary

FlashCopy V2 lets you perform data set level copies quickly and easily using advanced DASD subsystems from all major vendors. This technology works well with DB2 data sets and can be used as a replacement to traditional methods of migrating data with extremely fast copy times and significantly less CPU resource consumption. Use of this technology could significantly change the way we move data!

 

References and Acknowledgments

SC26-9931-03; DB2 Universal Database for OS/390 and z/OS Administration Guide

SC26-9945-04; DB2 Universal Database for OS/390 and z/OS Utility Guide and Reference

SC26-9934-03; DB2 Universal Database for OS/390 and z/OS Command Reference

SC26-7394-01; DFSMS Access Method Services for Catalogs

SC35-0424-02; z/OS DFSMSdss Storage Administration Reference

SG24-5680-03; Implementing ESS Copy Services with IBM eServer zSeries

Special thanks to Craig Cramer and Rich Fazio from TransUnion LLC for their inspiration and support of the research into this advanced copy technique!



This article had no comments at the time of this printout.