Anyone who’s bought a computer lately knows that the price of storage has been falling rapidly. And as disk drives get cheaper, says one expert in large databases, databases are getting bigger.
That is enabling companies to do things they couldn’t do a few years ago, according to Richard Winter, president of Waltham, Mass.-based Winter Corp., a consulting shop that specializes in very large database installations.
Winter, who periodically surveys his customers about the size of their databases, estimates that the largest commercial databases in use today are in the range of 50 terabytes.
Most databases in the commercial arena are in the 100-gigabyte to one-terabyte range. Winter estimates that there are currently a couple of hundred commercial databases over 10 terabytes, and only a handful in the 50-terabyte range.
Based on what his clients are telling him, by next year that figure could climb to around 75 terabytes, he says.
To put that amount of data in perspective, consider that storing a single terabyte of data on paper would require some 150 miles of bookshelves, according to Winter.
Faster Than Moore’s Law
The rapid growth of data is partly a result of the steady decline in the cost of disk drives, he says. “The price of storage capacity has been dropping by half roughly every nine months,” Winter says. “That’s twice as fast as Moore’s law, which says that the number of units on a chip doubles every 18 months.”
That’s a key enabling factor, according to Winter, because large databases require even larger storage facilities.
“With commercial databases,” he says, “the ratio of total storage to actual data is about five to one, on average.”
That means that typically only one-fifth of the disk space is being used for the actual database. The rest goes to indexing, mirroring or free space for growth of the database, Winter says.
The exact ratio of storage to data depends both on the individual application and on the database technology being used. Major players in the very large database space include Oracle, IBM, Sybase and NCR’s Teradata division.
Larger databases will inevitably mean larger storage area networks or other storage structures, says Winter. “If we’re seeing commercial databases of 75, or maybe even 100 terabytes next year, then the total storage associated with those is probably going to be in the range of 500 terabytes, or half a petabyte,” he says.
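Winter’s projection is simple multiplication of raw data size by the average overhead ratio he cites. As a minimal sketch (the function name and the 5x default are illustrative assumptions, not an industry formula):

```python
def total_storage_tb(raw_data_tb, overhead_ratio=5.0):
    """Estimate total disk (in TB) needed for a database holding
    raw_data_tb terabytes of actual data.

    overhead_ratio covers indexing, mirroring, and free space for
    growth -- roughly 5:1 on average, per Winter's figure.
    """
    return raw_data_tb * overhead_ratio

# A 100 TB database at the average ratio needs about 500 TB of disk,
# i.e. half a petabyte -- the figure Winter projects.
print(total_storage_tb(100))  # 500.0
```

The ratio varies by application and database technology, so the 5.0 default is only the reported average, not a sizing rule.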
Oceans of New Data
And databases are growing rapidly.
“We’ve seen over the last few years that video cameras connected to computers have gotten incredibly cheap,” says Winter. “And not just video: all the devices that gather data are getting smaller, faster and cheaper all the time. So the technology of capturing data is improving rapidly, and the number of devices that capture data is growing, and that results in rapidly growing oceans of data that are available for scientific and commercial analysis.”
These vast quantities of data are opening up new avenues of research for some companies.
“The advances of the last several years are enabling new applications,” says Winter. “In many industries, it has not been economically practical until now to retain full transaction detail for long-term analysis. Large retailers are now at the point where they can store as much as seven years of full transaction detail, for example, which is allowing them to really look at the details of customer purchase patterns.”