This post is a continuation of the blog series about the student research paper CloudRAID.
After Florian’s publications about the architecture the concrete implementation design will follow.
The supplied RAID 5 implementation uses SHA-256 for integrity and validity checks. The hash implementation has been taken from the Apache Portable Runtime Project (APR) that has been published by the Apache Software Foundation under the terms of the Apache License version 2.0.
The native RAID 5 level implementation uses multiple pre-compiler flags and constants to control the split and merge processes:
#define RAID5BLOCKSIZE 1024
#define ENCRYPT_DATA 1
#define ENCRYPTION_SALT_BYTES 256
#define DEBUG 0
#define _NAME_ "CloudRAID-RAID5"
#define _VENDOR_ "CloudRAID Team"
#define _VERSION_ "1.0.0-beta.1"
Listing 1: Pre-compiler flags and constants
For internal storage purposes, the native RAID implementation needs two C structs that are shown in listing 2:
struct sha256_ctx {
uint32_t state[8];
uint32_t total[2];
size_t buflen;
uint32_t buffer[32];
};
typedef struct raid5md {
unsigned char version;
unsigned char hash_dev0[65];
unsigned char hash_dev1[65];
unsigned char hash_dev2[65];
unsigned char hash_in[65];
unsigned char salt[ENCRYPTION_SALT_BYTES];
unsigned int missing;
} raid5md;
Listing 2: Structs used for internal storage within RAID level 5 imlementation
The The Java Native Interface (JNI) implements six functions whereof two are directly used for the RAID features and three are used to determine some version and vendor information about the library. The remaining, sixth, function is used to exchange information about the size of the struct raid5md from the library to CloudRAID. The function signatures are shown in listing 3 and will be explained here. All JNI functions take at least two parameters: JNIEnv *env and jclass cls. The former parameter refers to the Java environment that provides functions to for example convert and access strings. The latter is the instance of the current object the JNI function is working in but is not used within the CloudRAID implementation.
JNIEXPORT jint JNICALL Java_de_dhbw_1mannheim_cloudraid_core_impl_jni_RaidAccessInterface_mergeInterface(JNIEnv *env, jclass cls,
jstring _tempInputDirPath, jstring _hash, jstring _outputFilePath,jstring _key);
JNIEXPORT jstring JNICALL Java_de_dhbw_1mannheim_cloudraid_core_impl_jni_RaidAccessInterface_splitInterface(JNIEnv *env, jclass cls,
jstring _inputBasePath, jstring _inputFilePath, jstring _tempOutputDirPath, jstring _key);
JNIEXPORT jstring JNICALL Java_de_dhbw_1mannheim_cloudraid_core_impl_jni_RaidAccessInterface_getName(JNIEnv *env, jclass cls);
JNIEXPORT jstring JNICALL Java_de_dhbw_1mannheim_cloudraid_core_impl_jni_RaidAccessInterface_getVendor(JNIEnv *env, jclass cls);
JNIEXPORT jstring JNICALL Java_de_dhbw_1mannheim_cloudraid_core_impl_jni_RaidAccessInterface_getVersion(JNIEnv *env, jclass cls);
JNIEXPORT jint JNICALL Java_de_dhbw_1mannheim_cloudraid_core_impl_jni_RaidAccessInterface_getMetadataByteLength(JNIEnv *env, jclass cls);
Listing 3: JNI header functions signatures.
The main internal functions used by the primary two JNI functions above are explained in this paragraph. For both processes, split and merge, there are two functions each, split_file() and split_byte_block() (merge_file() and merge_byte_block() respectively) that actually handle the processes (see listing 4 at the end of this post). The JNI functions for split and merge must be seen as wrapper for the split_file() and merge_file() functions. These functions handle the complete processes show by the figures 1 and 2 (introduced and explained in a previous article of this article series). All FILE* parameters must be correctly opened for read or write access. On Unix systems this includes the binary b mode for fopen().
split_file() expects a file pointer (rb mode) to the input file as first parameter. The FILE *devices[] array must contain three elements to files opened in wb mode. These files are the device files. The meta parameter points to the meta data file as must have been opened in wb mode as well. const char *key is the secure key used for file encryption and will be passed to the hmac() function together with its length keylen.
This function reads up to 2 × RAID5BLOCKSIZE for each iteration shown in figure 1. These bytes are encrypted with the key generated by hmac() if ENCRYPT_DATA is not 0 and then passed to split_byte_block(). The call-by-reference “return” values are written to the device files. After each iteration the parity position is incremented. Starting with device 2 as parity for the first block, the parity position will be 0 for the second block and 1 for the third block. The forth block will restart with position 2.
The split_byte_block() function takes the bytes read and encrypted in split_file() as first parameter and the length of that buffer as second parameter. The function splits the input character array in three parts that will be returned via the unsigned char *out parameter. The output size for each device is returned via the fourth parameter: size_t out_len[]. The alignment for the output buffer is as follows:
Depending on the input length in_len the values of out_len[0] to out_len[2] may vary:
merge_file() is the inverse function to split_file(). It basically has the same function signature, but the first parameter specifies the output file unlike in split_file() and hence must be opened in wb mode. Thus the device files and meta data file, parameters two and three, are input files and therefore they must be opened in rb mode. The fourth parameter, const char *key is the secure key used for file decryption and will be passed to the hmac() function together with its length keylen in the same way as in split_file().
At first, the integrity of the device files is validated as shown in figure 2 by checking the hash sums for the device files. If more than two check sums are wrong the function terminates with an error. Otherwise the function will read up to RAID5BLOCKSIZE from each device file and passes those buffers with their respective lengths to merge_byte_block(). The call-by-reference “return” value from each block merge is passed to a SHA-256 context to validate the correctness of the output file and then decrypted.
The actual merge occurs in the merge_byte_block() function. Again the content for all three buffers is stored in a single character array as first parameter: const unsigned char *in. The second parameter, const size_t in_len[] represents the lengths for the three read buffers from the device files. Both parameters together align the input buffers in the same way as in split_byte_block():
The third parameter, parity_pos, specifies which device is the parity. This is necessary in combination with the fourth parameter, dead_device, to determine the merge strategy: concatenating the content of the buffers for device files 0 and 1 or using the parity in combination with either of first two device files. missing defines how many bytes are missing on the secondary device. Only the last block of each overall merge process may set this parameter to a value > 0. The penultimate parameter, *out stores the merged data with a total length of *out_len but at most 2 × RAID5BLOCKSIZE.
create_salt(): the complete security of the whole encryption process relies on the strength of the key given to split_file() and merge_file(). However, to protect the data against attacks using rainbow tables the encryption key is being salted. The salt of length ENCRYPTION_SALT_BYTES is generated by this function. On Microsoft Windows each byte of the salt will be set by a return value of rand() after initializing the pseudo random generator with srand(time(NULL)). On all other Operating Systems (OSs), that can be expect to be Unix, the functions reads ENCRYPTION_SALT_BYTES from /dev/urandom which can be seen to be random.
The salted key used during encryption and decryption is created by the function hmac(). This function is an implementation of the Federal Information Processing Standards Publication (FIPS PUB) 198 [NIS02] HMAC specification and takes five parameters: the first two parameters (*key and keylen) define the secure key given by the JNI function split_file() and merge_file(). Parameters three and four link to the *salt (optimally generated by the create_salt() function) and its length that is ENCRYPTION_SALT_BYTES by default. The last parameter, *hash, is a call-by-reference parameter and “returns” the HMAC for the given *key and *salt.
The general mode of operation can be expressed by the following two expressions:
The implementation of the HMAC [NIS02] standard has been validated by computing the digest values for the test cases 1, 2, 3, 4, 6 and 7 defined in RFC 4231 [Nys05]. Test case 5 has not been tested since the truncation is not needed in CloudRAID and was therefore not implemented.
int split_file(FILE *in, FILE *devices[], FILE *meta,
const char *key, const int keylen);
void split_byte_block(const unsigned char *in, const size_t in_len,
unsigned char *out, size_t out_len[]);
int merge_file(FILE *out, FILE *devices[], FILE *meta,
const char *key, const int keylen);
void merge_byte_block(const unsigned char *in, const size_t in_len[],
const unsigned int parity_pos, const unsigned int dead_device,
const unsigned int missing, unsigned char *out, size_t *out_len);
int create_salt(unsigned char *salt);
unsigned long hmac(const unsigned char *key,
const unsigned int keylen, const unsigned char *salt,
const unsigned int saltlen, unsigned char *hash);
Listing 4: Internal native header function signatures
[NIS02] | NIST Computer Security Division (CSD). The Keyed-Hash Message Authentication Code (HMAC). Federal Information Processing Standards Publication FIPS PUB 198, National Institute of Standards and Technology, March 6, 2002. http://csrc.nist.gov/publications/fips/fips198/fips-198a.pdf. |
[Nys05] | Magnus Nystrom. Identifiers and Test Vectors for HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512, December 2005. http://tools.ietf.org/html/rfc4231 |