[CloudRAID] 2. Basics (Continuation)

This post is a continuation of the blog series about the student research paper CloudRAID.

2. Basics

2.3. Background on RAID Technology

To provide data safety on a common server, a RAID can be used. First introduced in 1988 under the title Redundant Arrays of Inexpensive Disks [DK88], the use of hard drives in an array is state of the art today. The paper by Patterson, Gibson and Katz introduces RAID levels 1 to 5 as follows.


Source: [Cbu06a]

RAID Level 1 provides high data safety with a complete fall-back to a secondary device. This “mirroring” has a space efficiency of 50% of the total disk capacity [DK88] (ch. 7, p. 112) for two disks. In general, the space efficiency is 1/n for n disks. The fault tolerance is n-1, since all disks contain the same data and all but the last can fail without any data loss.
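The two figures for mirroring can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the "disks" are in-memory byte arrays rather than real devices.

```python
from fractions import Fraction


def mirror_space_efficiency(n_disks: int) -> Fraction:
    """Usable fraction of the total capacity when all disks hold the same data."""
    if n_disks < 2:
        raise ValueError("mirroring needs at least two disks")
    return Fraction(1, n_disks)


def mirror_fault_tolerance(n_disks: int) -> int:
    """Number of disks that may fail while one complete copy survives."""
    if n_disks < 2:
        raise ValueError("mirroring needs at least two disks")
    return n_disks - 1


# Simulate a three-way mirror: every disk stores the full payload.
disks = [bytearray(b"payload") for _ in range(3)]

# Two of the three disks fail (n - 1 = 2 failures are tolerated).
disks[0] = None
disks[2] = None

# The data can still be read from the single surviving disk.
survivors = [d for d in disks if d is not None]
recovered = bytes(survivors[0])

print(mirror_space_efficiency(2))   # 1/2, i.e. 50% for two disks
print(mirror_fault_tolerance(3))    # 2
print(recovered)                    # b'payload'
```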


Source: [kna09]

RAID Level 2 defines bit-level striping with an Error Correction Code (ECC) for recovery [DK88] (ch. 8, p. 112). It uses the Hamming code [Wik12a], which is stored on multiple check disks. RAID level 2 systems are no longer used today: they are prohibitively expensive and do not provide more fault tolerance than RAID level 3, 4 or 5. RAID level 2 can recover from one drive failure; for its space efficiency, see [Wik12b].
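As an illustration of the Hamming code mentioned above, the following sketch encodes four data bits into a seven-bit code word and repairs a single flipped bit. The choice of the (7,4) variant and the function names are assumptions made for this example; the source does not state which Hamming code a concrete RAID level 2 system would use. Each of the seven bit positions can be pictured as one disk (four data disks, three check disks).

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit Hamming code word.

    Layout (1-based positions): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code_word):
    """Return (corrected code word, 1-based error position or 0 if none)."""
    c = list(code_word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4   # the syndrome names the faulty position
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return c, position


def hamming74_decode(code_word):
    """Extract the 4 data bits from a (corrected) code word."""
    return [code_word[2], code_word[4], code_word[5], code_word[6]]


word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1                        # simulate a fault at position 5
repaired, position = hamming74_correct(damaged)
print(word, damaged, repaired, position)
```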


Source: [Cbu06b]

RAID Level 3 provides byte-level striping with a single parity per byte [DK88] (ch. 9, pp. 112f). It requires at least three disks, two for content and one dedicated to the parity, and provides a space efficiency of 1 - 1/n and a fault tolerance of one broken disk [Wik12b]. These figures for the minimum number of disks, space efficiency and fault tolerance are the same for RAID levels 4 and 5.
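How a single parity disk allows recovery from one failed disk can be sketched as follows. The example assumes that the parity is the bytewise XOR of the data disks, which is the usual construction but is not spelled out in the text above; the function names and the zero-byte padding are illustrative choices.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_bytes(data: bytes, n_data_disks: int):
    """Byte-level striping: byte i is stored on data disk i % n_data_disks."""
    padded = data + b"\x00" * (-len(data) % n_data_disks)
    return [padded[i::n_data_disks] for i in range(n_data_disks)]


def parity_of(chunks) -> bytes:
    """Content of the dedicated parity disk."""
    return reduce(xor_bytes, chunks)


def rebuild(chunks_with_gap, parity: bytes) -> bytes:
    """Rebuild the one missing chunk (marked None) from the rest plus parity."""
    present = [c for c in chunks_with_gap if c is not None]
    if len(present) != len(chunks_with_gap) - 1:
        raise ValueError("exactly one chunk must be missing")
    return reduce(xor_bytes, present, parity)


# Three disks in total: two data disks and one parity disk,
# so the space efficiency is 1 - 1/3 = 2/3.
chunks = stripe_bytes(b"CloudRAID", 2)
parity = parity_of(chunks)

# Data disk 1 fails; its content is recomputed from disk 0 and the parity.
restored = rebuild([chunks[0], None], parity)
print(chunks, restored == chunks[1])
```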


Source: [Cbu06c]

Like levels 2 and 3, RAID Level 4 uses striping to spread the data over multiple disks, but it splits the data block-wise rather than bit- or byte-wise [DK88] (ch. 10, pp. 113f). This improves Input/Output (I/O) performance but can result in a bottleneck, since the parity is stored on a single device.


Source: [Cbu06d]

RAID Level 5 distributes the parity across all disks instead of keeping it on a dedicated check disk, which resolves the bottleneck that exists in RAID level 4. This does not change the minimum requirements or the fault tolerance, but it increases I/O performance [DK88] (ch. 11, p. 114), [Wik12b]. RAID level 5 is therefore one of the most widely used RAID levels today.
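The difference between a dedicated and a distributed parity can be made visible by printing which disk holds the parity block of each stripe. The rotation rule used for level 5 below is one possible scheme chosen for illustration; real implementations differ in how they rotate the parity, and the function names are hypothetical.

```python
def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Index of the disk that stores the parity block of the given stripe."""
    if n_disks < 3:
        raise ValueError("RAID levels 4 and 5 need at least three disks")
    if level == 4:
        return n_disks - 1                       # always the same check disk
    if level == 5:
        return (n_disks - 1 - stripe) % n_disks  # assumed rotation scheme
    raise ValueError("only RAID levels 4 and 5 are sketched here")


def layout(n_stripes: int, n_disks: int, level: int):
    """One text row per stripe: 'P' marks parity, 'D' marks data."""
    rows = []
    for stripe in range(n_stripes):
        p = parity_disk(stripe, n_disks, level)
        rows.append(" ".join("P" if disk == p else "D" for disk in range(n_disks)))
    return rows


for level in (4, 5):
    print(f"RAID {level}")
    print("\n".join(layout(4, 4, level)))
```

With level 4, every write has to update the same parity disk; with level 5, the parity writes are spread over all disks.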

2.4. Encryption Standards and Hash Algorithms

In modern computer science, cryptography is an important topic when planning and designing software. It is mostly used when data is transferred over an insecure channel or stored at a place that cannot be guaranteed to protect sensitive or confidential data. Encryption algorithms like the Advanced Encryption Standard (AES) and RC4 (also known as ARC4) have been widely used. AES is considered to provide strong encryption, whereas serious weaknesses have since been found in RC4. AES is a block cipher, which means that a fixed-size block of bytes (16 bytes for AES) is encrypted as a unit with the given key. In contrast, RC4 is a stream cipher: each byte is encrypted on its own by combining it with a keystream byte that is derived from the secret key and the evolving internal state of the cipher.
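To illustrate the stream cipher principle, here is a minimal pure-Python sketch of RC4. It is meant for demonstration only and should not be used to protect real data. A comparable AES example would need a third-party library, since the Python standard library does not ship AES, so it is left out here.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes for the given key."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm: permute the state with the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Pseudo-random generation algorithm: one keystream byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        yield state[(state[i] + state[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: each data byte is XORed with one keystream byte."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


ciphertext = rc4(b"Key", b"Plaintext")
print(ciphertext.hex())
print(rc4(b"Key", ciphertext))   # the same call with the same key decrypts
```

Because encryption is a plain XOR with the keystream, applying the function twice with the same key returns the original bytes, which also shows the symmetric nature of the cipher.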

Both algorithms use the same key for encryption and decryption and therefore belong to the group of symmetric encryption algorithms. The RSA algorithm (see U.S. Patent 4,405,829), on the other hand, belongs to the group of asymmetric encryption algorithms, or public-key algorithms, because it uses a public key for encryption and a private key for decryption. One cannot construct one of these keys from the other without knowing additional private information.
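The public/private key relationship can be sketched with a toy RSA example. The primes below are deliberately tiny and chosen purely for illustration; real keys use primes that are hundreds of digits long and add padding, both of which are omitted here. The call `pow(e, -1, phi)` requires Python 3.8 or newer.

```python
# Toy parameters (assumed for illustration, far too small to be secure).
p, q = 61, 53
n = p * q                  # public modulus: 3233
phi = (p - 1) * (q - 1)    # 3120, computable only if p and q are known

e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e mod phi


def encrypt(message: int) -> int:
    """Encrypt with the public key (e, n)."""
    return pow(message, e, n)


def decrypt(cipher: int) -> int:
    """Decrypt with the private key (d, n)."""
    return pow(cipher, d, n)


cipher = encrypt(65)
print(d, cipher, decrypt(cipher))
```

The "additional private information" in this sketch is the factorization of n into p and q: without it, phi and therefore d cannot be computed from the public key alone.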

Although all of the encryption algorithms mentioned keep the encrypted data confidential, they cannot guarantee data integrity and consistency. If an encrypted message or file is changed, decryption returns data that differs from the original data. To detect such changes, so-called hash algorithms are used. Given some input data, a hash function returns a hash sum (also referred to as a checksum or simply a hash) for this data. Since cryptographic hash functions are one-way functions, one cannot recover the original data from the hash sum. A strong hash function always returns the same hash sum for a given input and avoids (or at least minimizes the likelihood of) hash collisions. A hash collision occurs when a hash function returns the same hash sum for two different input values. This can be expressed by the following formula, where x and y denote the input values and H denotes the hash function:

$$H(x) = H(y), \quad x \neq y$$

Based on these requirements, NIST and the German “Bundesnetzagentur” recommend the algorithms of the Secure Hash Algorithm (SHA)-2 family [Eck12] with 224, 256, 384 or 512 bits rather than MD5 (128 bits) or SHA-1 (160 bits).
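A hash-based integrity check with a SHA-2 algorithm can be sketched with Python's standard `hashlib` module. The helper names and the sample payloads are made up for this example; how and where the reference hash is stored is outside the scope of the sketch.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hash sum of the data as a hexadecimal string (SHA-256, 256 bits)."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_hex: str) -> bool:
    """Compare the hash of the data with a previously stored hash sum."""
    return hmac.compare_digest(sha256_hex(data), expected_hex)


original = b"encrypted payload"
stored_hash = sha256_hex(original)     # computed before storing or sending

tampered = b"encrypted pa_load"        # a single byte was changed

print(is_unmodified(original, stored_hash))   # True
print(is_unmodified(tampered, stored_hash))   # False

# Output sizes of the SHA-2 family in bits.
sha2_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha224", "sha256", "sha384", "sha512")
}
print(sha2_bits)
```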


[Cbu06a] Cburnett. RAID 1. Wikimedia Commons, GNU Free Documentation License, December 31, 2006. http://en.wikipedia.org/wiki/File:RAID_1.svg
[Cbu06b] Cburnett. RAID 3. Wikimedia Commons, GNU Free Documentation License, December 31, 2006. http://en.wikipedia.org/wiki/File:RAID_3.svg
[Cbu06c] Cburnett. RAID 4. Wikimedia Commons, GNU Free Documentation License, December 31, 2006. http://en.wikipedia.org/wiki/File:RAID_4.svg
[Cbu06d] Cburnett. RAID 5. Wikimedia Commons, GNU Free Documentation License, December 31, 2006. http://en.wikipedia.org/wiki/File:RAID_5.svg
[DK88] David A. Patterson, Garth Gibson and Randy H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). Technical report, University of California Berkeley, 1988.
[Eck12] Claudia Eckert. IT-Sicherheit – Konzepte – Verfahren – Protokolle. Oldenbourg Verlag München, 7th edition, 2012.
[kna09] knakts. RAID 2. Wikimedia Commons, GNU Free Documentation License, January 18, 2009. http://en.wikipedia.org/wiki/File:RAID2_arch.svg
[Wik12a] Wikipedia. Hamming code - Wikipedia, The Free Encyclopedia, January 22, 2012. http://en.wikipedia.org/w/index.php?title=Hamming_code&oldid=472688059
[Wik12b] Wikipedia. RAID - Wikipedia, The Free Encyclopedia, January 25, 2012. http://en.wikipedia.org/w/index.php?title=RAID&oldid=473130999