Tips on Disk-to-Disk Backup, Part V

Because a pointer-based snapshot (also called a dependent copy) reads any unchanged data from the source disks, performance can degrade while the snapshot is being backed up to tape. This degradation is much less than it would be if the database were in hot-backup mode and being backed up at the same time.
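To make the read path concrete, here is a minimal sketch of a copy-on-write style pointer snapshot. The class and method names (PointerSnapshot, write_source, read_snapshot) are hypothetical, and real arrays implement this in firmware with far more bookkeeping; the sketch only shows why unchanged blocks still generate reads against the source disks.

```python
class PointerSnapshot:
    """Toy pointer-based snapshot over a list of blocks (illustrative only)."""

    def __init__(self, source):
        self.source = source   # live production blocks
        self.preserved = {}    # block index -> original contents, saved on first overwrite

    def write_source(self, index, data):
        # Before the first overwrite of a block, save its original contents
        # so the snapshot still presents the point-in-time image.
        if index not in self.preserved:
            self.preserved[index] = self.source[index]
        self.source[index] = data

    def read_snapshot(self, index):
        # Changed blocks come from the preserved area; unchanged blocks are
        # read from the source disks, where backup reads compete with production.
        if index in self.preserved:
            return self.preserved[index]
        return self.source[index]


source = ["a0", "b0", "c0", "d0"]
snap = PointerSnapshot(source)
snap.write_source(1, "b1")  # production changes one block after the snapshot

image = [snap.read_snapshot(i) for i in range(len(source))]
source_reads = len(source) - len(snap.preserved)  # blocks the backup reads from source
```

In this toy run the backup still sees the original four blocks, and three of the four reads land on the production disks, which is the source of the degradation described above.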

Clone copy. A clone copy, or full-copy snapshot, provides a complete copy of the data. When a full copy is initiated, all of the original data is copied to another area of storage, which may take anywhere from a few minutes to hours.

Each full-copy snapshot (also called an independent copy) requires enough storage to hold an exact copy of the original data; there is a 100% capacity overhead per full-copy snapshot.
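The capacity arithmetic is simple enough to sketch. The function name and the 500GB example below are assumptions for illustration, not figures from any particular product.

```python
def full_copy_capacity_gb(source_gb, copies):
    """Total capacity needed for the source plus `copies` full-copy snapshots.

    Each full copy adds 100% of the source size.
    """
    if source_gb < 0 or copies < 0:
        raise ValueError("source_gb and copies must be non-negative")
    return source_gb * (1 + copies)


# Hypothetical example: a 500GB database with three full-copy snapshots.
total_gb = full_copy_capacity_gb(500, 3)
```

With these assumed numbers, the source and its three copies need 2,000GB in total.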

The attractive part of the full-copy snapshot is that once the copy is complete, it resides on completely different disks and can be used with no performance impact on the original data. This means the full copy can be used not only for backup but also for data mining, testing patches or upgrades, development and so on, all on a server separate from the production server.

Options

Like many storage-related functions, snapshots can be managed in one of three places: the host, an appliance or a storage device.

Host-based snapshots have several issues. A major one is the amount of overhead they can place on the host.

Companies spend a lot of money on servers to run their databases and, to keep those servers running efficiently, should not add functions that may hurt performance. Another issue is having to use different software for each operating system. While some vendors may support multiple operating systems, they probably won’t support every one you use. In addition, many host-based solutions don’t support moving snapshots between hosts.

An appliance-based solution takes away operating system issues, but can be a major bottleneck in the IO path.

For example, two servers with the power that most of these appliance-based solutions have will find it almost impossible to push a high-end midrange storage array to its limit. If the appliances are managing two storage arrays, then 50% or more of the arrays’ performance will remain untapped.

Storage-based snapshots, where the snapshots are performed within the storage array, are the best solution to consider. There is no overhead on the hosts, and the snapshots work for any host attached to the array through a common interface. A storage array is optimized for IO processing and can perform snapshots far more efficiently than the other options.

Solving backup needs used to be relatively easy: use the size of the backup window, the amount of data to be backed up, and the speed of the backup drives to calculate how many drives were needed. Today, when even the smallest companies have applications that need to be up 7×24, it is much harder to solve backup needs by throwing bigger libraries with more tape drives at the problem.
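The traditional sizing rule described above can be sketched in a few lines. The function name, the decimal units (1GB = 1,000MB) and the sample figures are assumptions chosen for illustration; the sketch also ignores real-world factors such as compression, tape changes and drives that cannot be kept streaming.

```python
import math


def drives_needed(data_gb, window_hours, drive_mb_per_s):
    """Number of drives needed to move `data_gb` within the backup window.

    Assumes every drive sustains `drive_mb_per_s` for the whole window.
    """
    if window_hours <= 0 or drive_mb_per_s <= 0:
        raise ValueError("window_hours and drive_mb_per_s must be positive")
    # Capacity one drive can move in the window, in decimal gigabytes.
    per_drive_gb = drive_mb_per_s * 3600 * window_hours / 1000
    return math.ceil(data_gb / per_drive_gb)


# Hypothetical example: 2,000GB of data on drives sustaining 30MB/sec.
overnight = drives_needed(2000, 8, 30)  # eight-hour window
squeezed = drives_needed(2000, 2, 30)   # two-hour window
```

With these assumed figures, shrinking the window from eight hours to two raises the requirement from three drives to ten, which is why adding tape drives stops being a practical answer as backup windows disappear.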

My last four columns covered different ways to implement a disk-to-disk solution to make a backup solution more efficient (faster backups and faster recoveries). This column discussed how to virtually eliminate the backup window by using snapshots. Take an existing backup solution, add a disk-to-disk solution and disk-based snapshots, and you’ve got the ultimate backup solution. Backup windows will drop from hours to seconds or minutes, and recoveries will be dramatically faster than from tape.

Jim McKinstry is senior systems engineer with the Engenio Storage Group of LSI Logic, an OEM of storage solutions for IBM, TeraData, Sun and others.