Overview: This Aberdeen InSight examines disk storage and storage area network (SAN) trends. Higher-capacity disk drives are decreasing in price, and SAN capacities, as a result, are on the rise. That sounds good, but the devil is in the details. The phenomena known as “access density” and “locality of reference,” together with the behavior of SAN cache memory algorithms, suggest that when the need for simultaneous random access outweighs the need for capacity, small, fast disks could carry the day.
U.S. naval hero Captain John Paul Jones wrote to Congress, "I wish to have no connection
with any ship that does not sail fast, for I intend to go in harm’s way." The quotation is often paraphrased
as, "Give me a small fast ship …"
That brave quote is the inspiration for this InSight, because in modern IT, "harm’s way" could be regarded as fulfilling storage infrastructure
requirements for an enterprise’s most critical application.
Often, the most enterprise-critical application is online transaction processing (OLTP) against
an information database. Databases are densely packed with information and they are structured files (i.e., their contents can be sorted and searched). Databases consume relatively less storage than
semi-structured files (such as this text, which can be searched,
but not sorted), and far less than unstructured files
(such as media files, whose contents cannot be searched or sorted).
OLTP application storage seldom needs to be scaled "up" in capacity as much as "out" in
simultaneous access capability (i.e., number of concurrent users). Enterprises using OLTP ought to consider the
A Disk Limitation
However much storage capacity a disk drive has, it has only one read/write head access mechanism, which can
be positioned only over one disk cylinder at a time. And the disk has only one read/write channel, so only one
data stream can move through one of the heads on the access mechanism at one time.
Even obsolete head-per-track disks had only one read/write channel. The head-per-track design eliminated seek delays,
leaving only rotational latency, but those disks had small capacity and were never designed for simultaneous access.
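The one-access-mechanism limit can be put in rough numbers. The sketch below estimates the random I/O rate of a single drive from its average seek time and rotational speed, then divides by capacity to get access density (I/Os per second per GB). The function names and the drive figures are illustrative assumptions, not measurements from this InSight.

```python
def disk_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random-I/O rate for one drive: one request at a time, each
    costing an average seek plus half a rotation (transfer time ignored)."""
    half_rotation_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + half_rotation_ms)


def access_density(iops: float, capacity_gb: float) -> float:
    """I/Os per second available per gigabyte stored."""
    return iops / capacity_gb


# Hypothetical drives with identical mechanics but different capacities.
per_drive = disk_iops(avg_seek_ms=4.0, rpm=15_000)
small = access_density(per_drive, capacity_gb=18.0)
large = access_density(per_drive, capacity_gb=146.0)

print(f"per-drive IOPS: {per_drive:.0f}")
print(f"access density, small drive: {small:.2f} IOPS/GB")
print(f"access density, large drive: {large:.2f} IOPS/GB")
```

Under these assumptions both drives deliver the same number of I/Os per second, so the larger drive offers proportionally fewer I/Os for each gigabyte it holds.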
Storage Area Network Architecture
To achieve high simultaneous throughput, SAN architectures must often be scaled up in capacity in order to be
scaled out in disk count and port count. Today’s typical "large" (78-GB or even 172-GB) disks in a SAN
often cannot be effectively allocated beyond about 20% of capacity, so that data can be spread out for effective
Every SAN design uses cache memory and a cache maintenance algorithm, in an attempt to smooth
out and speed up effective data access.
Locality of Reference
Most cache algorithms assume locality of reference
(i.e., the next data reference will be to immediately adjacent data), and so they often pre-fetch the disk
data that follows a fetch. Algorithms with that behavior are well suited to applications that generate reports or
perform other "batch"-type work on a database.
Randomness Can Be a Problem
OLTP applications serving multiple simultaneous users tend to follow one access not with a nearby access
but with a random one. When that happens, pre-fetch cache is typically ineffective.
In such situations, building the SAN from many small-capacity, fast disks or holding the
entire database in solid-state storage, if it will fit, could be more effective.
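The effect of randomness on a pre-fetching cache can be illustrated with a small simulation. This is a minimal sketch, assuming a least-recently-used cache that reads ahead a fixed number of blocks after every miss; real SAN cache algorithms are more elaborate, and the cache size, read-ahead depth, and database size here are hypothetical.

```python
import random
from collections import OrderedDict


def hit_rate(accesses, cache_blocks=1000, prefetch=8):
    """Fraction of accesses served from an LRU cache that reads ahead
    `prefetch` blocks after every miss."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
            continue
        # Miss: load the requested block plus the blocks that follow it.
        for b in range(block, block + prefetch + 1):
            cache[b] = True
            cache.move_to_end(b)
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


rng = random.Random(0)      # fixed seed so runs are repeatable
total_blocks = 1_000_000    # hypothetical database size, in blocks
n = 20_000                  # number of accesses in each stream

sequential = list(range(n))                                  # batch-style scan
scattered = [rng.randrange(total_blocks) for _ in range(n)]  # OLTP-like stream

seq_rate = hit_rate(sequential)
rand_rate = hit_rate(scattered)
print(f"sequential hit rate: {seq_rate:.3f}")
print(f"random hit rate:     {rand_rate:.3f}")
```

With these assumed parameters, the sequential scan finds about eight of every nine blocks already in cache, while the scattered stream almost never does.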
The More Things Change, the More They Remain the Same
We’ve known for decades that for best performance, it is better to use many small disks than the same capacity
supplied by fewer large disks. Queuing theory indicates that when too many I/O requests descend on the disk array,
multiple short queues, each served by its own disk, perform better than one long queue at a single disk.
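The queuing argument can be sketched with the textbook M/M/1 formula for mean response time, R = S / (1 - U), where S is the service time and U the utilization. The choice of M/M/1, the even spread of requests across disks, and the load and service-time figures are all assumptions made for illustration; the InSight does not name a specific model.

```python
def mm1_response_ms(arrivals_per_s: float, service_ms: float) -> float:
    """Mean response time (waiting plus service) of an M/M/1 queue, in ms."""
    utilization = arrivals_per_s * service_ms / 1000.0
    if utilization >= 1.0:
        return float("inf")  # the queue grows without bound
    return service_ms / (1.0 - utilization)


load = 150.0    # hypothetical total I/O requests per second
service = 6.0   # hypothetical service time per request, in ms

one_disk = mm1_response_ms(load, service)        # every request queues at one disk
four_disks = mm1_response_ms(load / 4, service)  # requests spread evenly over four

print(f"one disk:   {one_disk:.1f} ms")
print(f"four disks: {four_disks:.1f} ms")
```

In this sketch the single disk runs at 90% utilization and its response time balloons to ten times the service time, while each of the four disks runs at 22.5% utilization and responds in little more than the service time itself.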
The queuing theory principle also applies to decision support. Locality of reference is helpful, but even
the same query will run much faster against data that is striped across multiple disks and accessed in parallel.
In practice, most databases are not that smart about laying out data on multiple disks to maximize parallelism.
Web Architecture Performance
For Web architecture, the types of transactions and data differ from both OLTP and decision support. OLTP includes
many random-access updates on small number/text data items (updates lock out other operations on a data item).
Decision support uses many read-only queries that access a substantial subset of the entire database,
again operating on small number/text data items.
Just like a packaged application, a Web site tends to generate "mixed" transaction streams that include
updates, random-access reads (the most likely case), and queries on multimedia data items that vary widely in size
(e.g., video files of several MB). In this case, IT managers could expect the benefit of many smaller-capacity
disks to fall somewhere between the benefits seen for OLTP and for decision support.
Although mileage may vary, the "small, fast disk" argument should apply equally well to packaged-application
"embedded" databases and databases servicing Web sites. Many OLTP applications are long in the tooth,
but are still business critical.
The Insert/Update/Delete Factor
In an OLTP application, the greater the frequency of insert/delete/update, the longer it takes to get to a particular
data item, because with every insert/delete, the database is only partially reorganized for maximum performance.
Thus, over time, performance degrades drastically, not only because locality of reference is lost, but also because
the indexes used for more rapid access become less well balanced. This performance erosion can occur on both small and large
disks. In some cases, the larger the database, the greater the performance degradation
(linked lists become longer and longer). Online reorganization is now widely available, but many organizations
still do not use this feature.
In the memory hierarchy, a certain ratio between the amount of main memory and disk must be maintained. If disk
performance becomes slower, compensating by adding more disks also means buying more costly main memory than would otherwise be necessary.
IT planners should ask themselves, "Do we need more capacity or more I/O operations per
second?" If they need the latter, perhaps they should choose a SAN architecture that employs many small, fast
disks and has either a very large or an especially well-managed cache. Or perhaps they should use solid-state SAN storage.
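The capacity-versus-I/O question can be framed as a simple sizing calculation: count the drives needed for capacity, count the drives needed for I/O rate, and take the larger. The sketch below is a rough illustration only; the workload, the per-drive I/O rate, and the drive capacities are hypothetical, and cache effects are ignored.

```python
import math


def disks_needed(capacity_gb, iops, disk_gb, disk_iops):
    """Return (drives for capacity, drives for I/O rate, drives to buy)."""
    for_capacity = math.ceil(capacity_gb / disk_gb)
    for_iops = math.ceil(iops / disk_iops)
    return for_capacity, for_iops, max(for_capacity, for_iops)


# Hypothetical OLTP workload: 500 GB of data, 10,000 random I/Os per second,
# served by drives that each sustain about 150 random I/Os per second.
large = disks_needed(500, 10_000, disk_gb=146, disk_iops=150)
small = disks_needed(500, 10_000, disk_gb=18, disk_iops=150)

for label, size_gb, plan in (("large", 146, large), ("small", 18, small)):
    used = 500 / (plan[2] * size_gb)
    print(f"{label} drives: buy {plan[2]}, capacity used {used:.0%}")
```

In both cases the I/O rate, not capacity, sets the drive count; the larger drives simply leave far more of their capacity idle.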
Dan Tanner is director of storage research for Aberdeen Group’s Storage and Storage Management practice. Wayne Kernochan is managing vice president, Databases, Development Environments, and Software Infrastructure. Both analysts are located in Aberdeen’s Boston office. For more information, go to www.Aberdeen.com.