Making Data Safe for the Cloud (and not the Reverse)
While cloud computing has been touted as a way for enterprises to reduce costs, increase business agility, and focus on IT projects with high ROI, security concerns are a large barrier to adoption. In particular, IT shops are nervous about placing sensitive and valuable data in infrastructure that they do not control. In this presentation, we describe an approach and architecture that allows enterprises to feel confident about putting their data into the cloud. Rather than insisting that cloud infrastructure be totally secure, our approach assumes that cloud infrastructure is insecure and that data will be exposed! We do not care if data is exposed, because our approach uses anonymization techniques to render that data worthless to others while still allowing us to process it in a useful way.

There are a number of approaches to obscuring or anonymizing data. You can remove data values, you can add or subtract offsets to numerical values, and you can rename data values to something else. This presentation talks about these techniques and also discusses what software is available to do this anonymization. Anonymization is not as simple, though, as changing values or adding offsets to numbers. With side knowledge, an attacker can correlate outside information with anonymized data to recover the real values corresponding to those data points. We discuss the theory of anonymization and how you can ensure that anonymized data sets are protected from various correlation attacks.

This all might sound wonderful in theory, but to make sure it was practical, we designed and implemented a proof of concept to see if we could really use anonymized data in a cloud to learn useful information. We generated web performance and web access log information for an application developed on PlanetLab, an academic cloud. We anonymized sensitive information, and then sent the performance and access log information to a SaaS log management vendor. We found that we could glean useful performance information and security information from the anonymized data that we stored and analyzed in the cloud. In the presentation, we discuss both the proof of concept implementation and how our approach was validated.
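
The three techniques named above (removing values, offsetting numerical values, and renaming values) can be illustrated with a minimal sketch. It is not the implementation used in the proof of concept; the record layout, the field names, the key, and the offset are all hypothetical, chosen only to show the idea on a web access log entry.

```python
import hashlib
import hmac

# Hypothetical secret that stays inside the enterprise and is never sent to the cloud.
SECRET_KEY = b"keep-this-on-premises"
# Hypothetical constant offset, in seconds, applied to every timestamp.
TIME_OFFSET_SECONDS = 37 * 86_400


def pseudonym(value: str, prefix: str) -> str:
    """Rename a value to a stable keyed pseudonym (same input gives same output)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:12]}"


def anonymize(record: dict) -> dict:
    """Return an anonymized copy of one access-log record (assumed field names)."""
    return {
        # Rename: the client address and URL path become pseudonyms.
        "client": pseudonym(record["client_ip"], "client"),
        "path": pseudonym(record["path"], "path"),
        # Offset: shift the timestamp by a constant so intervals are preserved.
        "timestamp": record["timestamp"] + TIME_OFFSET_SECONDS,
        # Kept unchanged because performance analysis needs them.
        "status": record["status"],
        "response_ms": record["response_ms"],
        # Remove: the "user" field is dropped simply by not copying it.
    }
```

Because the pseudonyms are stable, a log service can still count requests per client or per path and measure response times without seeing the real values. A constant offset and a simple renaming are exactly the kind of transformation that side knowledge can undo, so a sketch like this would be a starting point rather than protection against correlation attacks.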
- by Jeff Sedayao
Enterprise Architect, Intel Corporation
Author's Bio:
Jeff Sedayao is an enterprise architect in Intel's IT Research Group. He focuses on distributed systems, and on cloud computing and security in particular. Jeff has participated in IETF working groups, published papers on policy, network measurement, and network and system administration, and authored the O'Reilly and Associates book Cisco IOS Access Lists.