THE way the Internet has penetrated our lives has led to unimaginable amounts of electronic data being created, stored and exchanged every day. Needless to say, there is extreme paranoia that this data may get lost or damaged. And yet, this happens on a daily basis, though most of the data is retrieved or ‘repaired’ through a variety of measures.
A common approach adopted by many firms to address the issue is to store a copy of every bit of data on three different hard disks. This works effectively in most cases but with a mind-boggling amount of information being generated every day, the physical infrastructure required to store this data makes triplication an unviable option.
Error-correcting codes, such as the Reed-Solomon code developed over 50 years ago, provide effective means of encoding data that permits recovery in the event of partial loss. The key to any such error-correcting code is to split data file into several fragments — different methods differ in the numbers of fragments — and store them across several disks. It also involves creating additional ‘fragments’ and storing in them, further ‘information’ from the original file. Taken together, the resultant set of fragments can be used to recreate the entire data in case a few fragments are lost.
Traditionally, a file would be split into six fragments and triplicated to avoid data loss, which would mean storing 18 fragments. An example of the Reed-Solomon (RS) error-correcting code would keep the six fragments, create three new fragments and store just nine fragments.
In recent times, driven by the sheer size of the data involved, two additional concerns have come to the fore: the number of remaining undamaged disks that need to be contacted to recreate a lost fragment and the amount of data that has to be downloaded from these disks.
These have given rise to two new classes of error-correcting codes that improve upon the RS code in these respects and are termed as regenerating and locally repairable codes. An example of the regenerating code created in our lab would take a data file, split it into nine fragments and create 11 additional fragments for a total of 20 fragments. These 20 fragments would then be stored across five disks, four fragments to a disk, so that a failed disk could be recreated by downloading just four fragments.
Locally-recoverable code would split a data file into 12 fragments, create four additional fragments in such a way that an original data fragment can be repaired by talking to a specific set of just six other fragments.
We at the Codes and Signal Design Laboratory have significantly contributed to the development of both classes of codes. We then asked ourselves, is it possible to design a single code that, in comparison with an RS code, can both reduce the amount of data downloaded for disk repair as well as the number of disks contacted? This led us to devise a third class of codes that do precisely this.
By: P Vijay Kumar & Birenjith Sasidharan
Codes and Signal Design Laboratory, Indian Institute of Science, Bangalore
For your research to be considered for this column, please write to Senior Editor Amitabh Sinha at amitabh.sinha