Linux Server How To

How To Setup, Configure, Manage and Secure a Linux Server




Linux Server RAID

So what is RAID and what has it done for me lately? RAID stands for Redundant Array of Inexpensive Disks and was first proposed in an article published by the University of California, Berkeley in 1987. The article proposed that several smaller disks could be combined into an array that provides redundancy as well as increased performance over a single large drive. On a production Linux server in particular, RAID can protect critical data by mirroring it across several disks, or increase overall disk performance by dividing read/write operations across several hard drives.

There are several different types of RAID array, and you should consider your requirements carefully before selecting one for your Linux server. RAID can be implemented either in hardware or in software. Hardware RAID uses specially designed disk controllers that manage, read from and write to the array independently of the operating system. Software RAID is managed by the Linux operating system itself, using specific kernel modules and some additional tools. Whether you choose hardware or software RAID is a matter of personal choice. I have used both and have found hardware RAID to be more agreeable, but this is of course my personal opinion. Software RAID is often cheaper to implement than hardware RAID, particularly when using SCSI disks, though many modern SATA motherboards now have built-in RAID controllers that make implementing Linux server RAID easier than ever before.

As previously mentioned, there are several different levels of RAID, and each one offers a different balance of redundancy (storing data, or enough information to rebuild it, on more than one disk) and performance. Let's have a look at the most commonly used RAID configurations, what they offer and what their drawbacks might be.

Linux Server RAID - RAID Levels

Linear Mode

In linear mode, two or more disks are combined into one logical drive. The operating system sees the disks as a single disk, and they are written to in a linear fashion: disk 1 fills up first, then disk 2, and so on. There is no redundancy in linear mode; if a disk fails, the data on that disk will be lost. You may be able to recover data from the other disks in the array, but there is no guarantee that any of it will survive intact. That depends wholly on how the data was written; for example, a file that spans two disks will be damaged if either of them fails. Read/write performance will not increase markedly for single reads or writes, although you may see a performance boost if several users are accessing files that happen to sit on different disks.
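The way linear mode maps the single logical drive onto its member disks can be sketched in a few lines of Python. This is a toy model, not how the Linux kernel implements it: the function name `linear_locate` and the disk sizes (counted in blocks) are hypothetical, chosen only to show that a logical block simply lands on whichever disk's range contains it.

```python
def linear_locate(block, disk_sizes):
    """Map a logical block number to (disk_index, block_on_disk) in a linear array.

    disk_sizes lists each member disk's capacity in blocks, in array order
    (a hypothetical, simplified model of linear mode).
    """
    if block < 0:
        raise ValueError("block must be non-negative")
    offset = block
    for disk, size in enumerate(disk_sizes):
        if offset < size:
            return disk, offset
        offset -= size  # skip past this disk's capacity and try the next one
    raise ValueError("block is beyond the end of the array")


# Two disks of 100 and 50 blocks appear as one 150-block logical drive.
sizes = [100, 50]
print(linear_locate(0, sizes))    # (0, 0)
print(linear_locate(99, sizes))   # (0, 99)
print(linear_locate(100, sizes))  # (1, 0)
```

Because disk 2 is only reached once disk 1 is full, a small or nearly empty array may keep all its data on the first disk, which is why single reads and writes see little speed-up.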

RAID-0

RAID-0 is often referred to as striping because of the way data is written to the disks. If you were writing an 8k file to an array of two disks with a 4k chunk (stripe) size, for example, 4k would be written to one disk and 4k to the other. On a Linux server, RAID-0 can provide a substantial increase in disk performance because read/write operations are divided between the two disks, which the operating system sees as one drive. RAID-0 does not provide any redundancy: if one disk fails, your data is lost, because each file is split in chunks across both disks.
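A similar sketch for RAID-0 shows how chunks are dealt round-robin across the disks. The names `raid0_locate` and `split_write` are hypothetical, `split_write` assumes the write starts at offset 0, and the 4k chunk size is an assumption chosen to match the 8k example above; real arrays let you choose the chunk size when the array is created.

```python
def raid0_locate(offset, chunk_size, num_disks):
    """Map a logical byte offset to (disk_index, byte_offset_on_disk) in RAID-0."""
    chunk = offset // chunk_size   # which chunk of the logical drive
    within = offset % chunk_size   # position inside that chunk
    disk = chunk % num_disks       # chunks are dealt round-robin across disks
    stripe = chunk // num_disks    # row (stripe) number on each disk
    return disk, stripe * chunk_size + within


def split_write(data, chunk_size, num_disks):
    """Show which bytes of a write land on which disk (write assumed to start at offset 0)."""
    per_disk = [bytearray() for _ in range(num_disks)]
    for start in range(0, len(data), chunk_size):
        disk, _ = raid0_locate(start, chunk_size, num_disks)
        per_disk[disk] += data[start:start + chunk_size]
    return per_disk


# An 8k write with a 4k chunk size on two disks: 4k goes to each disk.
parts = split_write(bytes(8192), 4096, 2)
print([len(p) for p in parts])  # [4096, 4096]
```

Both disks can service their half of the write at the same time, which is where the performance gain comes from; equally, losing either disk leaves every file that spans a chunk boundary incomplete.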

RAID-1

RAID-1 is the most common form of Linux server RAID. RAID-1 maintains an exact mirror of the information on one disk on the other disks in the array. Although it can be used with many disks, RAID-1 is commonly used with just two, on the premise that only one disk will fail at a time. The obvious advantage of RAID-1 is how easily the array can be reconstructed after a single disk failure, since the replacement disk is simply copied from a surviving one. With software RAID, write performance can be worse than that of a single device, because every additional copy of the data has to travel across the bus. With hardware RAID, the extra copies are generated by the RAID controller itself, sidestepping this bottleneck. Otherwise, RAID-1 performance is virtually identical to that of a single disk. Those who wish to safeguard their data against loss from a disk failure will probably consider RAID-1 the prime candidate.
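The mirroring behaviour, and why rebuilding a RAID-1 array is so simple, can be modelled with a small in-memory class. `Mirror` and its methods are hypothetical names for illustration only; a real array works on disk blocks and a rebuild runs in the background, not on Python lists.

```python
class Mirror:
    """Toy RAID-1: every write goes to all working disks; reads use any working disk."""

    def __init__(self, num_disks, num_blocks):
        self.disks = [[None] * num_blocks for _ in range(num_disks)]
        self.failed = set()

    def write(self, block, data):
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block] = data  # one copy per working mirror member

    def read(self, block):
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[block]
        raise OSError("all mirror members have failed")

    def fail(self, index):
        self.failed.add(index)

    def replace(self, index):
        """Swap in a new disk and rebuild it by copying a surviving member."""
        source = next((d for i, d in enumerate(self.disks)
                       if i != index and i not in self.failed), None)
        if source is None:
            raise OSError("no surviving mirror member to rebuild from")
        self.disks[index] = list(source)  # the rebuild is a straight copy
        self.failed.discard(index)


m = Mirror(2, 4)
m.write(0, b"payroll")
m.fail(0)
print(m.read(0))      # b'payroll' (served by the surviving disk)
m.replace(0)
print(m.disks[0][0])  # b'payroll' (copied back onto the replacement)
```

No calculation is needed to rebuild the array, only a copy, which is the ease of reconstruction described above.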

RAID-4

RAID-4 is not widely used because of a drawback when it comes to write performance, although it does offer redundancy and can survive the failure of one disk in the array. RAID-4 can be used on 3 or more disks: one disk stores parity information, and the data is written to the other disks in a striped fashion similar to RAID-0. The main drawback of RAID-4 is that all the parity information is stored on one disk, and that disk must be updated every time any of the data disks is written to. This makes the parity disk the bottleneck, so a very fast parity disk is needed for RAID-4 to perform at its best.
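Parity in RAID-4 (and RAID-5) is conventionally a byte-wise XOR across the data blocks in a stripe, which is what lets the array rebuild any single missing disk. The sketch below assumes XOR parity and uses a hypothetical helper, `xor_blocks`; it is a toy model with made-up 4-byte blocks, not the kernel's implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (the parity calculation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks plus one dedicated parity disk: four disks in total.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# If data disk 1 fails, XOR the surviving data blocks with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True

# A write to any data disk also changes the parity disk: the new parity is
# old parity XOR old data XOR new data, so every write touches the parity disk.
new_block = b"\xff\x02\x03\x04"
parity = xor_blocks([parity, data[0], new_block])
data[0] = new_block
print(parity == xor_blocks(data))  # True
```

The last step is the bottleneck in miniature: whichever data disk is written, the same single parity disk has to be read and rewritten too.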

RAID-5

RAID-5 can be used on 3 or more disks and is useful for those who wish to combine a large number of disks and still retain redundancy. Like RAID-4, a RAID-5 array can survive the loss of one disk, but instead of storing parity information on one disk, RAID-5 spreads it across all of the disks in the array, avoiding the single parity-disk bottleneck. Write speed can be higher than that of a single disk, though still not at the level of RAID-0, because parity information must be calculated and written as well. Read performance is very comparable to RAID-0.
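Rotating the parity position from stripe to stripe is what distinguishes RAID-5 from RAID-4. The hypothetical `raid5_layout` function below builds one simple rotation scheme for illustration; it is not necessarily the exact layout Linux software RAID uses by default, which also governs how data chunks are ordered around the parity.

```python
def raid5_layout(num_disks, num_stripes):
    """Return, per stripe, which disk holds parity ('P') and which hold data chunks.

    Parity moves one disk to the left on each stripe (a simple illustrative rotation).
    """
    rows = []
    chunk = 0
    for stripe in range(num_stripes):
        parity_disk = num_disks - 1 - (stripe % num_disks)
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P")
            else:
                row.append(f"D{chunk}")
                chunk += 1
        rows.append(row)
    return rows


# Three disks, three stripes: one column per disk, one row per stripe.
for row in raid5_layout(3, 3):
    print(" ".join(row))
# D0 D1 P
# D2 P D3
# P D4 D5
```

Each disk holds the parity for one stripe in three, so parity updates are shared evenly across the array instead of all landing on one disk as in RAID-4.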