Just What Is ‘Content-Addressed Storage’?

When a survey sponsored by EMC in 2000 found that 75% of the information in the world was “fixed content,” it didn’t take a prophet to recognize that technology for dealing with such data was in the pipeline.

“Fixed content” simply refers to data that is written once and never changed. It could be an invoice, a purchase order, a financial statement, archived e-mail, or a medical X-ray. It’s information that needs to be stored but must not be altered in any way. Add in compliance requirements for electronic record-keeping, spurred even further by new SEC regulations, and a storage scheme designed specifically for keeping fixed content secure and in place was inevitable.

Keeping data secure and available according to any number of regulations is also big business. According to a study by Enterprise Storage Group on the impact of compliance on information management, compliance-related storage products and services could be worth as much as $6 billion over the next four years.

EMC coined the term “content-addressed storage” (CAS) in 2002 when it released its Centera product. At its most basic, CAS assigns a digital fingerprint to each stored piece of data. The fingerprint (also known as an ID or logical address) is derived from the content itself, so it can be used to confirm that the data retrieved is exactly the data that was saved. Because identical content yields the same fingerprint, no duplicates are ever stored. It’s a radical departure from the traditional file systems used in most storage systems.
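As a rough illustration of the idea, the Python sketch below builds a toy in-memory store in which an object's address is a hash of its bytes. The article does not say how Centera computes its fingerprints, so the use of SHA-256 here, along with the `ContentStore` class and its method names, is purely an assumption made for the example.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self._objects = {}  # address (hex digest) -> stored bytes

    def put(self, data: bytes) -> str:
        # The address is computed from the content itself (SHA-256 is an
        # assumption; the article does not name a real product's algorithm).
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so a second
        # copy is never stored.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
a = store.put(b"invoice #1001")
b = store.put(b"invoice #1001")  # same content, so the same address comes back
c = store.put(b"invoice #1002")  # different content, different address
```

Here `a` and `b` are the same address and the store holds only two objects, which is the "no duplicates" property in miniature.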

EMC’s Centera is a backend used with any number of applications — including popular, enterprise-class content managers from Documentum and FileNet — thanks to an EMC partner program and APIs. Centera is a magnetic disk-based WORM device, and it’s now available in a Compliance Edition, which is specially suited for enforcing retention periods for fixed content.

Tumbleweed Communications, one of EMC’s Centera partners, uses Centera to archive electronic communications such as e-mail and instant messages for compliance with regulations. Artesia, a maker of digital asset management technology, has also teamed with EMC and will use Centera to archive rich media assets.

While CAS through Centera offers a solution for complying with the more than 15,000 federal and state laws dealing with record retention, what else can it accomplish? That’s a question being asked by Anne MacFarland, an analyst with Wellesley, Mass.-based consultancy The Clipper Group.

“Compliance is the low-hanging fruit of the week,” MacFarland said. But she has a hard time seeing other applications for content-addressed storage, and added that the term itself has relied mostly on the clout of EMC, a powerful name in the storage world.

‘Searchability’ May Be Key to CAS Growth

Content-addressed storage is object-oriented, conceptually not unlike the Java programming language. It also uses disk-based technology, which is more easily searchable than removable media such as tape, but also more expensive. Searchability may be the key to finding more applications for CAS.

“There’s great future value for search by content approaches, particularly for images,” MacFarland said. For example, a hospital or medical research center can construct an application that will search for tumor growth in images stored using CAS.

Kevin Daly, CEO of Irvine, Calif.-based Avamar Technologies, readily admits that EMC coined the term content-addressed storage, even though his company is using the same concept for a disk-based back-up system called Axion. But that’s just fine with him.

“The technology has been around in academic circles for a while,” Daly said, “but it hasn’t really found applications until EMC brought Centera to market.”

Axion is a hardware and software appliance that is going where few backups have ever gone — magnetic disk. By using CAS, Avamar has come up with an approach that reduces the amount of disk needed, and in turn brings disk-based backup closer to the price point of tape.

Daly said Axion does use the same CAS approach as EMC, but Axion uses it at what he calls “a very small grain.” Once Axion sees an object when creating a backup, it never stores that object again. This is possible because each object, as in EMC’s Centera, has a unique logical address related to its content.
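A minimal sketch of what small-grain, store-once backup might look like follows. The article does not describe how Axion actually divides data into objects, so the fixed-size chunks, the tiny `CHUNK_SIZE`, the SHA-256 addresses, and the `backup` and `restore` helpers are all hypothetical choices made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and deliberately tiny so the example is easy to follow


def backup(data: bytes, chunk_store: dict) -> list:
    """Store each previously unseen chunk once; return the list of chunk addresses."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        address = hashlib.sha256(chunk).hexdigest()
        if address not in chunk_store:
            # Only content that has never been seen before consumes space.
            chunk_store[address] = chunk
        recipe.append(address)
    return recipe


def restore(recipe: list, chunk_store: dict) -> bytes:
    """Rebuild the original data by looking up each address in order."""
    return b"".join(chunk_store[address] for address in recipe)


chunk_store = {}
monday = b"AAAABBBBCCCC"
tuesday = b"AAAABBBBDDDD"  # only the last chunk differs from Monday
monday_recipe = backup(monday, chunk_store)
tuesday_recipe = backup(tuesday, chunk_store)
```

Two backups of three chunks each leave only four chunks in `chunk_store`, because the two chunks that did not change between Monday and Tuesday are stored once and referenced twice.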

“We don’t have to do fancy things like pattern searching,” Daly said. And because no object is stored twice, Axion uses one-tenth the capacity of a normal backup. An organization with a 10 TB environment, for example, will require 100 TB of tape for a normal tape backup. With Axion, Daly said, the backup will require just 10 TB.

Both EMC and Avamar claim their CAS systems are nearly self-managing and much more autonomous than traditional file systems because the objects are fixed and randomly placed in the address space.

Avamar has just raised $13 million in its fourth round of funding, and Daly believes his company’s application of CAS will take hold as disk-based backup grows in popularity.

“I believe the entire back-up world will in time move to disk-based,” he said. But right now it has to start small. Disk-based backup may not be efficient for customers with an environment smaller than 2 or 3 TB, according to Daly. CAS relies on a lot of processing because it recomputes an object’s unique identifier each time the object is read, and thus it is more attractive for larger systems.
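The read-time check described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than either vendor's implementation: the `address_of` and `verified_read` helpers are hypothetical, and SHA-256 stands in for whatever identifier scheme a real product uses.

```python
import hashlib


def address_of(data: bytes) -> str:
    """Compute the content-derived identifier (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verified_read(objects: dict, address: str) -> bytes:
    """Return the stored object only if its content still matches its address."""
    data = objects[address]
    # Recomputing the identifier on every read is the processing cost
    # mentioned above; it is also what exposes any change to the object.
    if address_of(data) != address:
        raise ValueError("object content does not match its address")
    return data


objects = {}
original = b"Q3 financial statement"
addr = address_of(original)
objects[addr] = original
intact = verified_read(objects, addr)  # content matches, so the read succeeds

# Simulate silent corruption of the stored bytes.
objects[addr] = b"Q3 financial statemenT"
try:
    verified_read(objects, addr)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

The extra hashing on each read is the trade-off: it costs processing time, but a changed object can no longer be returned as if it were the original.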

Regardless of the system, the unique identifier makes CAS very effective at ensuring that the data you retrieve is the data you saved.

“It gives you a very powerful way of making sure the object you need hasn’t changed,” Daly said. “That’s very hard to do in a traditional file system. This is a real change in the storage business.”