Cloud based De-duplication Directory Service: A Solution for Data Explosion and Data Reduction
Data explosion, the exponential growth of stored data, is a well-known problem facing enterprises and SMBs alike, and all of them are looking for ways to reduce data volume and storage cost. Studies have found that data duplication is one contributor to data explosion. De-duplication is a leading technology for addressing this problem: it keeps a single unique copy of the data and stores references, in the form of data signatures, for the duplicate copies. De-duplication and other forms of data optimization provide ROI by reducing the amount of data that all forms of storage infrastructure must hold, but they are not effective if their scope is limited. Real ROI can be achieved only when de-duplication is performed at a global level. Global de-duplication, however, poses a new set of challenges, chief among them the daunting task of maintaining the data within the de-duplication servers. This problem can be solved with a de-duplication directory service. Given the popularity and adoption of the cloud, a cloud-based global de-duplication directory service could be a powerful solution. The directory service can also maintain location intelligence about the data nodes. This helps with regulatory compliance, because data can be fetched from within the local region, and it also reduces network traffic and helps control data storage requirements. Many other applications could take advantage of a de-duplication directory service, such as controlling data storage growth by replacing old data blocks with block signature references, or adding security by storing only data signatures. A de-duplication directory service relies on many data nodes across the globe that host the data and serve it on request. The directory service maintains the block signatures and references to all data nodes.
When a data access request arrives, the directory service can return the list of source hosts for the requested signatures so that the data can be fetched directly from those hosts. Alternatively, the directory service can fetch the data itself and return it to the requesting application. This paper presents the de-duplication directory service, various applications based on it, and a possible solution approach.
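As a rough illustration of the directory idea described above, the following minimal Python sketch maps block signatures to the data nodes that hold each block and returns a host list ordered so that nodes in the requester's region come first. It is an assumption-laden sketch, not the paper's implementation: the names `DataNode`, `DedupDirectory`, `register`, and `lookup` are hypothetical, the host names are placeholders, and SHA-256 is assumed as the signature function because the abstract does not name one.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Hypothetical data node: a host name plus the region it lives in."""
    host: str
    region: str


def block_signature(block: bytes) -> str:
    """Assumed signature function: SHA-256 hex digest of the block contents."""
    return hashlib.sha256(block).hexdigest()


class DedupDirectory:
    """Toy in-memory directory: signature -> set of nodes holding that block."""

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, signature: str, node: DataNode) -> bool:
        """Record that `node` holds the block.

        Returns True if the signature was previously unknown (the block must
        actually be stored), False if it is a duplicate and only a reference
        needs to be kept.
        """
        is_new = signature not in self._locations
        self._locations[signature].add(node)
        return is_new

    def lookup(self, signature: str, client_region=None):
        """Return the nodes holding the block, same-region nodes first."""
        nodes = self._locations.get(signature, ())
        return sorted(nodes, key=lambda n: (n.region != client_region, n.host))


# Usage: two nodes in different regions report the same block.
directory = DedupDirectory()
sig = block_signature(b"example block")
first = directory.register(sig, DataNode("node-eu-1.example", "eu"))
second = directory.register(sig, DataNode("node-us-1.example", "us"))

# A client in the "us" region gets the local node listed first.
hosts = directory.lookup(sig, client_region="us")
```

This corresponds to the first access approach in the abstract, where the directory hands back a host list and the client fetches the data directly. In the alternative approach, the service would fetch the block from one of these hosts on the client's behalf.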
- by Vishal Bajpai
Principal Software Engineer of Symantec Software