Sunday, May 5, 2013

Cloud Storage Diversity and Vendor Lock-in


There are several risks involved when a large organization relies solely on a single cloud provider.
One way to guard against vendor lock-in is to replicate the data across multiple cloud service providers. But this straightforward replication is costly and poses technical challenges at the same time. The overhead of maintaining data consistency could drastically affect the performance of a virtual storage system consisting of multiple full replicas of the organization's data spread over multiple vendors.
Another solution is based on extending the design principle of a RAID 5 system used for reliable data storage. A RAID 5 system uses block-level striping with distributed parity over a disk array. The disk controller distributes sequential blocks of data across the physical disks and computes a parity block by bitwise XORing the data blocks. The parity block is written to a different disk for each file, whereas in RAID 4 all parity is written to a dedicated disk. This technique allows us to recover the data after a single disk is lost.
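
The parity computation and the difference between rotating (RAID 5) and dedicated (RAID 4) parity can be sketched in a few lines of Python. This is only an illustration: the names xor_blocks and parity_disk, the sample block values, and the particular rotation order are assumptions, not taken from any real controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bitwise XOR of equally sized byte blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of bytes per byte position
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def parity_disk(stripe_index, num_disks, dedicated=False):
    """Index of the disk that holds the parity block of a stripe.

    dedicated=True models RAID 4 (parity always on the last disk).
    Otherwise parity rotates across the disks as in RAID 5; the
    rotation order used here is an assumption for illustration.
    """
    if dedicated:
        return num_disks - 1
    return (num_disks - 1 - stripe_index) % num_disks

# Three hypothetical data blocks of one stripe and their parity block
a1, a2, a3 = b"\x0f\x01", b"\xf0\x02", b"\xff\x04"
ap = xor_blocks([a1, a2, a3])
```

With four disks, parity_disk places the parity of successive stripes on disks 3, 2, 1, 0, 3, and so on, so no single disk becomes the parity bottleneck.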



    The above diagram shows a RAID 5 controller. If disk 2 is lost, file 3 still has all its blocks, and the missing blocks can be found by applying the XOR operation (e.g. a2 = a1 XOR ap XOR a3).
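
The recovery step a2 = a1 XOR ap XOR a3 can be checked with a small Python sketch. The block contents below are made-up sample values, and xor_blocks is a hypothetical helper that assumes all blocks have the same length.

```python
def xor_blocks(*blocks):
    """Bitwise XOR of byte blocks; assumes all blocks have equal length."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Sample data blocks of one stripe (illustrative values only)
a1, a2, a3 = b"clou", b"d-st", b"ripe"

# Parity block written by the controller
ap = xor_blocks(a1, a2, a3)

# Disk 2 is lost: rebuild a2 from the surviving blocks and the parity
recovered_a2 = xor_blocks(a1, ap, a3)
```

The rebuild works because XORing a value with itself gives zero, so a1 and a3 cancel out of the parity and only a2 remains.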

The same process can be applied to detect and correct errors in a single block. We can replicate this system in the cloud: data may be striped across four clouds, and a proxy cloud provides transparent access to the data. The proxy carries out the functions of the RAID controller. The RAID controller allows multiple concurrent accesses to data. For example, blocks a1, a2, and a3 can be read and written concurrently.

The above diagram shows the concept of RAID 5 applied in the cloud to avoid vendor lock-in.
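
To make the idea concrete, here is a toy Python sketch of a proxy that stripes data over four clouds with rotating parity and can still serve reads when one cloud is unavailable. Everything in it is an assumption for illustration: the class name StripingProxy, the in-memory dictionaries standing in for cloud providers, the 4-byte block size, and the zero padding. A real proxy would call each provider's storage API instead.

```python
class StripingProxy:
    """Toy proxy striping data over in-memory 'clouds' with rotating parity."""

    def __init__(self, num_clouds=4, block_size=4):
        self.clouds = [dict() for _ in range(num_clouds)]  # stand-ins for providers
        self.block_size = block_size
        self.length = 0
        self.stripes = 0

    def _xor(self, blocks):
        out = bytearray(self.block_size)
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    def _parity_at(self, stripe):
        n = len(self.clouds)
        return (n - 1 - stripe) % n  # rotation order is an assumption

    def write(self, data):
        for cloud in self.clouds:
            cloud.clear()
        per_stripe = (len(self.clouds) - 1) * self.block_size
        self.length = len(data)
        self.stripes = -(-len(data) // per_stripe)  # ceiling division
        padded = data.ljust(self.stripes * per_stripe, b"\0")
        for s in range(self.stripes):
            chunk = padded[s * per_stripe:(s + 1) * per_stripe]
            blocks = [chunk[i:i + self.block_size]
                      for i in range(0, per_stripe, self.block_size)]
            # Parity is computed from the data blocks, then slotted in
            blocks.insert(self._parity_at(s), self._xor(blocks))
            for cloud, block in zip(self.clouds, blocks):
                cloud[s] = block

    def read(self, failed_cloud=None):
        n = len(self.clouds)
        out = bytearray()
        for s in range(self.stripes):
            blocks = [None if c == failed_cloud else self.clouds[c][s]
                      for c in range(n)]
            if failed_cloud is not None:
                # Rebuild the missing block from the surviving ones
                blocks[failed_cloud] = self._xor(
                    [b for b in blocks if b is not None])
            parity_at = self._parity_at(s)
            out += b"".join(b for c, b in enumerate(blocks) if c != parity_at)
        return bytes(out[:self.length])

proxy = StripingProxy()
data = b"vendor lock-in is avoidable"
proxy.write(data)
```

Calling proxy.read(failed_cloud=2) returns the original bytes even though nothing is fetched from the third cloud, which is the property that protects against losing a single provider.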

The proxy carries out the functions of the RAID controller as well as authentication and other security-related functions. The proxy ensures before-or-after atomicity as well as all-or-nothing atomicity for data access. The proxy buffers the data, possibly converts data manipulation commands, and optimizes data access; for example, it aggregates multiple write operations, converts data to formats specific to each cloud, and so on. This model is used by Amazon. The performance penalty due to proxy overhead is minor, and the cost increase is also minor.
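
Write aggregation, one of the optimizations mentioned above, could look roughly like the following Python sketch. The class WriteBuffer, its max_pending threshold, and the callable backend are hypothetical names chosen for illustration; they do not describe any provider's actual proxy.

```python
class WriteBuffer:
    """Collects block writes and sends them to the backend in batches."""

    def __init__(self, backend, max_pending=3):
        self.backend = backend          # callable taking {block_id: data}
        self.max_pending = max_pending  # flush once this many blocks are dirty
        self.pending = {}

    def write(self, block_id, data):
        # A later write to the same block replaces the earlier one,
        # so only the final value is ever sent to the clouds
        self.pending[block_id] = data
        if len(self.pending) >= self.max_pending:
            self.flush()

    def flush(self):
        if self.pending:
            self.backend(dict(self.pending))
            self.pending.clear()

# Usage: the list 'batches' stands in for the cloud back end
batches = []
buf = WriteBuffer(batches.append, max_pending=2)
buf.write("a1", b"old")
buf.write("a1", b"new")
buf.write("a2", b"x")
```

Here three write calls result in a single batch holding two blocks, because the two writes to a1 are merged before anything is sent.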

