Understanding RAID

A company’s greatest asset, besides its employees, is its data. Millions and millions of dollars are spent to back up data, replicate data, etc., all in an attempt to protect against data loss. Backups and replication don’t actually prevent data from being lost, however; they are ways to recover after a loss has occurred. The only true defense against losing data to a failed disk is to implement a disk solution based on RAID technology.

RAID (redundant array of independent disks) was first defined in a paper published at the University of California, Berkeley, in 1988, in which the “I” originally stood for “inexpensive.” The paper defined RAID levels 1, 2, 3, 4 and 5. More levels have been defined since.

RAID Level Zero – RAID Level Zero is not one of the originally defined RAID levels, and there is some debate about whether it should even be considered a RAID level, since it provides no redundancy.

RAID Level Zero, a.k.a. disk striping, splits data into equal-sized chunks and writes them in turn across a group of disks. Because no redundant copy is kept, if any one of these disks fails, all of the data on the group of disks is lost.

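While not a safe way to protect data, it does deliver higher performance than an equal number of independent disks, because I/O is spread evenly across all the disks and serviced in parallel. RAID Zero is rarely used alone but is frequently combined with other RAID levels (for example, RAID 1+0, which stripes across mirrored pairs) to provide faster performance.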
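To make striping concrete, here is a minimal Python sketch of how a RAID 0 array might map a logical block number to a disk and an offset on that disk. It assumes equal-sized disks and a fixed chunk (stripe unit) size measured in blocks; the function `raid0_map` and its parameters are hypothetical, and real controllers add details such as metadata areas.

```python
def raid0_map(logical_block: int, num_disks: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block to (disk_index, block_offset_on_that_disk).

    Hypothetical helper: assumes equal-sized disks and a fixed chunk size.
    """
    if num_disks < 2 or chunk_blocks < 1:
        raise ValueError("need at least two disks and a chunk of at least one block")
    chunk = logical_block // chunk_blocks   # which chunk the block falls in
    within = logical_block % chunk_blocks   # position of the block inside its chunk
    disk = chunk % num_disks                # chunks are dealt round-robin across the disks
    stripe = chunk // num_disks             # stripe (row of chunks) holding this chunk
    return disk, stripe * chunk_blocks + within


if __name__ == "__main__":
    # 3 disks, 4-block chunks: blocks 0-3 land on disk 0, 4-7 on disk 1,
    # 8-11 on disk 2, then 12-15 wrap around to disk 0 again.
    for lb in (0, 3, 4, 8, 12, 13):
        print(lb, raid0_map(lb, num_disks=3, chunk_blocks=4))
```

Because consecutive chunks land on different disks, a large sequential read keeps every disk busy at once. The same property means any file larger than one chunk has pieces on several disks, which is why losing a single disk takes the whole group’s data with it.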

RAID Level 1 – At this RAID level, the same data is written (or mirrored) to two disks. If a disk fails, data is read from the surviving disk. When the failed disk is replaced, the data on the surviving disk is copied to the new disk to re-create the mirrored pair.

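All of this happens with no loss of data for the host applications. RAID Level 1 is one of the most commonly used RAID levels. It performs very well for reads, which can be served by either disk, while writes perform about as well as on a single disk, since each write goes to both disks in parallel. The trade-off is that only half of the raw capacity is usable.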
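The mirroring and rebuild behavior described above can be sketched as a small in-memory model. This is an illustration only, assuming each disk is a Python dict of block number to bytes and `None` marks a failed disk; the class `Raid1Mirror` and its methods are hypothetical and do not correspond to any real driver API.

```python
from __future__ import annotations


class Raid1Mirror:
    """Hypothetical in-memory model of a two-way RAID 1 mirror (not a real driver)."""

    def __init__(self) -> None:
        # Each disk maps block number -> data; None marks a failed disk.
        self.disks: list[dict[int, bytes] | None] = [{}, {}]

    def _healthy(self) -> list[dict[int, bytes]]:
        return [d for d in self.disks if d is not None]

    def write(self, block: int, data: bytes) -> None:
        # Mirroring: the same data goes to every healthy disk.
        for disk in self._healthy():
            disk[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy copy can serve the read.
        healthy = self._healthy()
        if not healthy:
            raise IOError("both disks have failed; the data is lost")
        return healthy[0][block]

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace_and_rebuild(self, index: int) -> None:
        # Copy every block from the surviving disk onto the replacement disk.
        healthy = self._healthy()
        if not healthy:
            raise IOError("no surviving disk to rebuild from")
        self.disks[index] = dict(healthy[0])


if __name__ == "__main__":
    m = Raid1Mirror()
    m.write(0, b"payroll")
    m.fail(0)                        # disk 0 dies
    print(m.read(0))                 # still served, from disk 1
    m.replace_and_rebuild(0)         # new disk 0 is filled from disk 1
    print(m.disks[0] == m.disks[1])
```

The model always reads from the first healthy disk for simplicity; a real controller can spread reads across both copies, which is where RAID 1’s read advantage comes from.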

RAID Level 2 – RAID Level 2 (bit-level striping with Hamming-code error correction) is not used by any commercial RAID systems on the market and will not be discussed.

RAID Level 3 – RAID Level 3 uses parity, a simple error-correcting code, to protect against the loss of a single disk. Data is striped byte by byte, in parallel, across the data disks (at least two), while parity is written to a dedicated disk.

The disk spindles are synchronized, so each byte of a stripe, and that stripe’s parity, occupies the same position on each disk. This increases throughput by minimizing disk head movement.

When a data disk fails, its contents are recreated by combining the parity disk with the surviving data disks, both to serve host requests and to rebuild the failed drive once it is replaced. If the parity disk fails, parity is recomputed from the data disks and written to the replacement parity disk.
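The byte-level striping, dedicated parity, and single-disk recovery described above can be sketched in Python. This models only the data layout and the parity arithmetic (bytewise XOR), not spindle synchronization or timing; the functions `raid3_stripe` and `raid3_rebuild` are hypothetical, and the zero-byte padding of a short final stripe is an assumption of this sketch.

```python
from __future__ import annotations

from functools import reduce
from operator import xor


def raid3_stripe(data: bytes, num_data_disks: int) -> list[bytearray]:
    """Stripe data byte by byte across the data disks; the last entry is the parity disk."""
    remainder = len(data) % num_data_disks
    if remainder:
        # Pad with zero bytes so every data disk holds the same number of bytes.
        data = data + bytes(num_data_disks - remainder)
    disks = [bytearray(data[i::num_data_disks]) for i in range(num_data_disks)]
    # Parity byte j is the XOR of byte j on every data disk.
    parity = bytearray(reduce(xor, column) for column in zip(*disks))
    return disks + [parity]


def raid3_rebuild(disks: list[bytearray | None]) -> bytearray:
    """Recreate the one missing disk (data or parity) by XORing all surviving disks."""
    survivors = [d for d in disks if d is not None]
    if len(survivors) != len(disks) - 1:
        raise ValueError("RAID 3 can recover from exactly one failed disk")
    return bytearray(reduce(xor, column) for column in zip(*survivors))


if __name__ == "__main__":
    disks = raid3_stripe(b"VIDEOSTREAM!", num_data_disks=3)
    lost = disks[1]
    disks[1] = None                       # simulate a failed data disk
    print(bytes(raid3_rebuild(disks)))    # b'IORM'
    assert raid3_rebuild(disks) == lost
```

The same `raid3_rebuild` call recovers a failed parity disk, since XORing all the data disks is exactly how parity was computed in the first place.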

RAID Level 3 is best for large sequential data access (e.g., video streaming). Performance for small, random access is poor, since every I/O requires activity on every disk. RAID 3 is rarely used today, since RAID Level 5 delivers better performance with identical single-disk protection.

RAID Level 4 – RAID Level 4 is similar to RAID Level 3 (striped data with a dedicated parity disk), except that data is striped in blocks rather than bytes.

Striping in blocks improves random-access performance, since a small I/O may need only one disk instead of every disk in the group, as with RAID 3. However, every write must also update the dedicated parity disk, so that disk can become a bottleneck for writes. Recovery from a lost drive works the same way as in RAID Level 3. RAID Level 4 is not widely adopted.
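The write bottleneck on the dedicated parity disk shows up clearly if you count which disks each write touches. The sketch below assumes a stripe unit of one block and counts only the writes, ignoring the extra reads needed to recompute parity; the helpers `raid4_write_targets` and `count_disk_writes` are hypothetical.

```python
from collections import Counter


def raid4_write_targets(logical_block: int, num_data_disks: int) -> tuple[int, int]:
    """Return (data_disk, parity_disk) written when one logical block is updated.

    Hypothetical layout: disks 0..num_data_disks-1 hold data blocks round-robin,
    and disk number num_data_disks is the dedicated parity disk.
    """
    data_disk = logical_block % num_data_disks  # block-level striping
    parity_disk = num_data_disks                # always the same disk in RAID 4
    return data_disk, parity_disk


def count_disk_writes(blocks, num_data_disks: int) -> Counter:
    """Tally writes per disk for a sequence of single-block writes."""
    tally = Counter()
    for block in blocks:
        tally.update(raid4_write_targets(block, num_data_disks))
    return tally


if __name__ == "__main__":
    # 12 single-block writes on 4 data disks plus 1 parity disk.
    writes = count_disk_writes(range(12), num_data_disks=4)
    print(dict(sorted(writes.items())))  # {0: 3, 1: 3, 2: 3, 3: 3, 4: 12}
```

Each data disk absorbs a quarter of the writes, but the parity disk absorbs all of them, so it saturates first as the write load grows.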

RAID Level 5 – RAID Level 5, like RAID levels 3 and 4, uses parity to protect the data from a single disk failure. Unlike levels 3 and 4, the parity is rotated or distributed across all of the drives in the volume.
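Rotating parity is easiest to see as a table of stripes. The sketch below assumes one simple rotation (parity starts on the last disk and moves one disk to the left on each stripe, with data chunks filling the remaining disks in order); real arrays use several layout variants, and the helpers `raid5_parity_disk` and `raid5_layout` are hypothetical.

```python
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """Disk that holds parity for a stripe; parity moves one disk left per stripe.

    Assumption: this is one simple rotation; real arrays use several variants.
    """
    return (num_disks - 1 - stripe) % num_disks


def raid5_layout(num_stripes: int, num_disks: int) -> list[list[str]]:
    """Table of stripes: 'P' marks the parity chunk, 'Dn' the n-th logical data chunk."""
    rows, chunk = [], 0
    for stripe in range(num_stripes):
        parity = raid5_parity_disk(stripe, num_disks)
        row = []
        for disk in range(num_disks):
            if disk == parity:
                row.append("P")
            else:
                row.append(f"D{chunk}")  # data chunks fill the remaining disks in order
                chunk += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(num_stripes=4, num_disks=4):
        print("  ".join(f"{cell:>3}" for cell in row))
```

With four disks and four stripes, each disk holds the parity chunk exactly once, so parity updates are spread evenly instead of piling onto one drive as in RAID 4.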

Read performance is substantially better than for a single disk, because each disk can be accessed independently. As with Levels 3 and 4, write performance can suffer from parity processing: a small write must read the old data and old parity, then write the new data and new parity. But because parity is distributed across all the drives, RAID 5 has no single-disk bottleneck.
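The parity update behind the RAID 5 write penalty can be sketched in a few lines of Python. Since parity is the XOR of every data chunk in the stripe, overwriting one chunk only requires XORing the old contents out of the parity and the new contents in: new parity = old parity XOR old data XOR new data. This is the standard read-modify-write technique; the helper names `xor_blocks` and `small_write_parity` are hypothetical, and real controllers may instead recompute parity from the full stripe on large writes.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must all be the same length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write: XOR the old data out of the parity and the new data in."""
    return xor_blocks(old_parity, old_data, new_data)


if __name__ == "__main__":
    stripe = [b"\x0f\x00", b"\xf0\x01", b"\x33\x02"]   # three data chunks in one stripe
    parity = xor_blocks(*stripe)

    # Updating chunk 1 costs 2 reads (old data, old parity) + 2 writes (new data, new parity).
    new_chunk = b"\xaa\xbb"
    new_parity = small_write_parity(stripe[1], new_chunk, parity)
    stripe[1] = new_chunk

    # The shortcut matches recomputing parity from the whole stripe.
    assert new_parity == xor_blocks(*stripe)
```

The shortcut avoids reading the other data chunks in the stripe, which keeps the cost of a small write fixed no matter how many disks the group contains.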

RAID Level 5 performance is scalable, as more disks provide more independent access. In the case of a disk failure, each block on the lost drive is computed from the data and parity stored on the other drives in the disk group, using the bitwise exclusive-OR (XOR) operation.
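Why XOR is enough for reconstruction can be shown in a short derivation. Because x XOR x = 0 and XOR is associative and commutative, XORing the parity block with every surviving data block cancels those blocks and leaves only the missing one. The three 4-bit data blocks in the worked example are chosen purely for illustration; real blocks are many kilobytes, and the same arithmetic applies bit by bit.

```latex
\begin{aligned}
P   &= D_0 \oplus D_1 \oplus \cdots \oplus D_{n-1} \\
D_k &= P \oplus \bigoplus_{i \neq k} D_i \\[4pt]
\text{Example: } D_0 &= 1011,\quad D_1 = 0110,\quad D_2 = 1100 \\
P   &= 1011 \oplus 0110 \oplus 1100 = 0001 \\
\text{If } D_1 \text{ is lost: } D_1 &= P \oplus D_0 \oplus D_2 = 0001 \oplus 1011 \oplus 1100 = 0110
\end{aligned}
```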