One of the biggest problems with backups is that the data being backed up is growing, but the time allotted for a backup (the backup window) is shrinking or remaining static. Another challenge is creating consistently reliable backups.
In July, I described in general terms how disk-to-disk backups can make backups more efficient. August’s column explained how to implement a disk-to-disk solution using software that runs on a backup server with separate storage. In September, I covered virtual tape libraries. This month I’ll discuss solutions that integrate disk, tape and software into one cohesive product.
Before disk-based backup solutions, the only way to increase the performance of backups was to buy more tape drives or upgrade to faster tape drives. Over time, solutions became very complex and very expensive.
Companies that could not afford bigger, faster libraries were forced into making some tough decisions: some servers were either not backed up or were backed up infrequently.
Companies that could implement larger solutions introduced a new problem: the more complex the solution, the more likely a backup is to fail.
Each drive added to increase performance also increases the odds of the backup failing. A backup that is spread across multiple tape drives requires the robotics to operate multiple times, requires all the drives in the backup to work correctly, and requires all the tapes selected to be “good.”
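To put a rough number on that, here is a minimal sketch that assumes each drive fails independently with the same probability during a backup. The 2 percent figure is purely hypothetical and the independence assumption is a simplification; real failure rates depend on the drives, robotics and media involved.

```python
def backup_failure_probability(per_drive_failure: float, drives: int) -> float:
    """Chance that at least one of `drives` drives fails during a backup.

    Assumes every drive fails independently with the same probability,
    which is a simplification for illustration only.
    """
    return 1.0 - (1.0 - per_drive_failure) ** drives


# Hypothetical 2 percent chance that any single drive fails during a backup.
for drives in (1, 2, 4, 8):
    risk = backup_failure_probability(0.02, drives)
    print(f"{drives} drive(s): {risk:.1%} chance the backup fails")
```

Under these assumptions, the risk climbs with every drive the backup depends on, even though each individual drive is no less reliable than before.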
While these bigger, more complex solutions helped backup performance, they ignored the most important part of the backup: the ability to recover the data. Too many backup projects focus only on backing up the data to address dwindling backup windows, and forget the sole reason that backups are performed: recoveries.
In order to create a continuous stream of backup data, which is often necessary to maximize the performance of tape devices and reduce backup windows, data from multiple servers is interleaved, or multiplexed, onto a single tape. Unfortunately, since the multiplexed backup has spread a single server’s data all over the tape, the recovery may take much longer than the backup.
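The effect of multiplexing on recovery can be sketched with a toy model. The server names and block labels below are hypothetical, and the model assumes a simple round-robin interleave and a restore that reads the tape sequentially from the start; real backup software and tape drives are more sophisticated, so treat this only as an illustration of the idea.

```python
from itertools import zip_longest


def multiplex(streams: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Interleave blocks from several servers round-robin onto one tape."""
    tagged = [[(server, block) for block in blocks]
              for server, blocks in streams.items()]
    tape = []
    for row in zip_longest(*tagged):
        # Shorter streams run out first; skip their empty positions.
        tape.extend(item for item in row if item is not None)
    return tape


def blocks_read_to_restore(tape: list[tuple[str, str]], server: str) -> int:
    """Blocks that pass under the head before the server's last block is read."""
    last = max(i for i, (name, _) in enumerate(tape) if name == server)
    return last + 1


# Three hypothetical servers, three blocks each, sharing one tape.
streams = {
    "mail": ["m0", "m1", "m2"],
    "db": ["d0", "d1", "d2"],
    "web": ["w0", "w1", "w2"],
}
tape = multiplex(streams)
needed = len(streams["mail"])
read = blocks_read_to_restore(tape, "mail")
print(f"mail needs {needed} blocks but the drive reads {read}")
```

In this model, recovering one server means reading past the other servers' blocks that sit between its own, which is why the restore can take longer than the data volume alone would suggest.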
At first glance, tape-based solutions seem inexpensive. But when you look past the cost of the actual hardware, tape-based solutions can be very expensive. Some studies show that, over the life of a library, users will buy thirty times as many tapes as the library has slots. For a medium-sized, 100-slot library, that’s 3,000 tapes.
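The arithmetic is easy to repeat for a library of any size. In the sketch below, the thirty-times multiplier comes from the studies mentioned above, while the price per tape is a hypothetical placeholder that you would replace with your own media cost.

```python
def lifetime_tape_count(slots: int, multiplier: int = 30) -> int:
    """Tapes bought over a library's life, at `multiplier` tapes per slot."""
    return slots * multiplier


def lifetime_media_cost(slots: int, price_per_tape: float,
                        multiplier: int = 30) -> float:
    """Total media spend over the library's life."""
    return lifetime_tape_count(slots, multiplier) * price_per_tape


slots = 100
price_per_tape = 50.0  # hypothetical price, for illustration only
print(lifetime_tape_count(slots), "tapes")
print(lifetime_media_cost(slots, price_per_tape), "spent on media")
```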
Manufacturers do not sell maintenance contracts for tapes. When a tape goes bad, that investment gets tossed in the trash and a new tape must be purchased. On top of that are the soft costs. If a backup doesn’t complete due to a tape-related hardware failure, most companies do not have the time to re-run that backup.
If a system that has not been backed up then fails (through data loss or corruption), there is the potential of losing hours’ worth of data. Having to use a two-day-old backup means that, in the case of a database for example, more than a day’s worth of logs would need to be applied to bring the database close to the current time. This could take a very long time to complete and cost many thousands of dollars in lost transactions. Add the fact that extensive human intervention is required to manage and maintain a tape solution, and tape is not inexpensive at all.
Disk-to-disk solutions help hide the impact of tape and can greatly improve the performance and reliability of day-to-day backup and restore operations.
With host-based solutions, some intelligence is added to the backup server, which allows the backup software to back up directly to disk. That backup can then be copied or cloned to tape. If the tape has an issue, the production servers won’t ever know because they have already been backed up to disk.
With the appliance-based solution, the backup software remains unchanged and a dedicated appliance is added to the backup environment to emulate tape devices. The backup server just backs up to those artificial tape devices.
As with the host-based solution, the backups that were written to the appliance can be copied to tape at any time, once again hiding the tape from the application servers.
With the host-based solution, there is a backup server, separate storage and separate tape devices. With the appliance-based solution there is a backup server, the appliance (with its own storage or separate storage) and separate tape devices.
A third option is to use a solution that combines disk and tape in a single, integrated device that also contains the logic to manage the movement of data from disk to tape.
With an appliance solution there are four distinct components: backup server, appliance, disk pool and tape device. The integrated library solution has only two components: backup server and integrated library.
The integrated library looks like a standard tape device to a backup server. The backup server backs up its client data to the integrated library as it would to any other tape device. The integrated library stores the backup data on disk. At this point, it performs like the appliance solution and offers the same benefits (fast, reliable backups and recoveries).
Once the data is stored on disk, the integrated library then controls when and how it will copy the data to its internal tape devices. This is where it differs from the appliance and host-based solutions.
With the other solutions, the backup server decides when and how the physical tapes are created and performs the copy itself. With the integrated library solution, the backup server has no knowledge of the tape copy and does not incur the overhead of creating the tapes. The integrated library solution provides a single device to manage with a single support contract.
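The division of labor can be sketched as a small model. The class and method names here are hypothetical and do not correspond to any vendor's product; the point is only that the backup server sees a single write, while the copy to tape happens inside the device on its own schedule.

```python
from dataclasses import dataclass, field


@dataclass
class IntegratedLibrary:
    """Toy model of a combined disk-and-tape device (names are hypothetical)."""
    disk_pool: dict[str, bytes] = field(default_factory=dict)
    tape: dict[str, bytes] = field(default_factory=dict)

    def write_backup(self, name: str, data: bytes) -> None:
        """What the backup server sees: a write to a 'tape' device.

        The data actually lands on the internal disk pool.
        """
        self.disk_pool[name] = data

    def destage(self) -> list[str]:
        """Internal step: copy backups not yet on tape, without the server."""
        copied = [name for name in self.disk_pool if name not in self.tape]
        for name in copied:
            self.tape[name] = self.disk_pool[name]
        return copied

    def restore(self, name: str) -> bytes:
        """Serve from disk when possible, falling back to tape."""
        if name in self.disk_pool:
            return self.disk_pool[name]
        return self.tape[name]


library = IntegratedLibrary()
library.write_backup("db-full", b"database blocks")  # backup server's only job
print("on tape before destage:", list(library.tape))
print("destaged:", library.destage())                # library's own decision
```

In the host-based and appliance models, the equivalent of `destage` would be driven by the backup server rather than by the device.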
A drawback of the integrated solution is that it offers few choices. For example, if you are not pleased with the performance of the disk, you may not be able to swap the integrated disk pool for one from your preferred vendor. Similarly, if you are not satisfied with the tape library, you may not be able to replace it with a different vendor’s product.
Adding even a small amount of disk into an existing backup solution can significantly reduce the load on existing libraries, extending their useful life. Additionally, introducing disk to the backup solution improves reliability and performance.
With a disk-to-disk solution, files can be backed up and recovered at disk speeds; in most cases, a disk-to-disk solution can finish recovering the data before a tape-only solution has even started streaming the data from tape.
So, with all these advantages, is tape dead? Far from it. Most companies will always have a need to archive data to tape for long-term storage. These last five months I have discussed ways to improve backup performance by backing up to disk; next month I will discuss other ways to use storage to increase backup performance.
Jim McKinstry is senior systems engineer with Engenio Information Technologies, an OEM of storage solutions for IBM, TeraData, Sun and others.