Privacy and Integrity Preserving in Distributed Systems Presented for Ph.D. Qualifying Examination Fei Chen Michigan State University August 25th, 2009
Introduction: Data collection and publishing is a core operation in many distributed systems. Outsourced database systems: organizations outsource their databases to service providers, so they can focus on their core tasks without having to manage their databases. Two-tiered wireless sensor networks: storage nodes gather data from nearby sensors and process queries from the sink, which saves power and storage for sensors and makes query processing more efficient.
Outsourced database systems: Outsourcing databases offers many advantages. It significantly reduces the management costs of organizations. Service providers have higher bandwidth and lower latency. Having multiple service providers helps prevent an organization's database from becoming a single point of failure. [Figure: An outsourced database system. Organizations outsource their databases to a service provider; customers send queries to the service provider and receive query results.]
Two-tiered sensor networks: [Figure: A two-tiered sensor network. Sensors send data to a storage node; the sink sends queries to the storage node and receives query results.] Benefits: power savings for sensors, memory savings for sensors, and more efficient query processing.
Comparison between the two distributed systems: Similarity: both systems have three common parties: data owners (organizations and sensors), data publishers (service providers and storage nodes), and users (customers and the sink). Both can be modeled as [Figure: a data owner sends outsourced data to a data publisher; a user sends queries to the data publisher and receives query results]. There may be multiple publishers. Outsourced database systems may have multiple users, and two-tiered sensor networks have multiple data owners. Difference: in outsourced database systems, users may not be fully trusted by data owners, whereas in two-tiered sensor networks the sink is fully trusted by sensors.
Security Challenges: Because of the important role of data publishers, there are two security challenges. The first is preserving the privacy of the data stored in a data publisher. [Figure: the data publisher is untrusted, so the data owner encrypts the data before outsourcing it.] How can a data publisher find query results over encrypted data?
Security Challenges: The second is preserving the integrity of query results from a data publisher. [Figure: an untrusted data publisher may manipulate query results by (1) forging data or (2) returning only a portion of the result.] How can we prevent such misbehavior by data publishers?
Problem Statement: Design a storage scheme and query protocol that preserve privacy and integrity. Data and query privacy: publishers cannot figure out the original data or the queries. Queries over data: data publishers can find query results over the encrypted data, e.g., for range queries. Query result integrity: users can detect whether a query result contains forged data or misses some legitimate data. Efficiency: e.g., communication and computation costs.
The Proposed Approaches: To preserve the privacy of the data, the data owner encrypts the data. To let data publishers search, the data owner encodes the private data in a format that supports the search operation. To preserve the integrity of query results, the data owner computes verification objects (VOs) for all possible queries. Let {t_1, t_2, …, t_m} denote the data of a data owner. The basic idea: the data owner sends {encrypt(t_1), …, encrypt(t_m)}, search(t_1, …, t_m), and VOs(t_1, …, t_m) to the data publisher; the user sends search(query); the data publisher returns encrypt(t_i1), …, encrypt(t_ig) together with VO(t_i1, …, t_ig).
Previous Work: Outsourced database systems. Preserving privacy: Bucket Partition [Hacigumus et al., SIGMOD 2002]; a public-key system [Boneh and Waters, TCC 2007]. Preserving integrity: Merkle hash trees [Devanbu et al., Journal of Computer Security 2003]; signature aggregation and chaining techniques [Narasimha and Tsudik, DASFAA 2006]; spatial data structures [Chen et al., ESORICS 2008]. Two-tiered sensor networks: the S&L scheme [Infocom 2008]; the optimized version of the S&L scheme [Infocom 2009, Mobihoc 2009].
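Privacy in outsourced database systems: Bucket Partition [Hacigumus et al., SIGMOD 2002]. Example: the data owner (key K_i) partitions {2, 5, 9, 15, 20, 23, 34, 40} into buckets and outsources the encrypted buckets {2, 5, 9}_Ki, {15, 20, 23}_Ki, and {34, 40}_Ki to the data publisher. For the query [35, 45], the user (key K_i) sends bucket IDs 3 and 4, and the data publisher returns {34, 40}_Ki, i.e., more data than the query asks for.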
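A minimal Python sketch of the bucket partition flow, using the numbers from the example above. The bucket boundaries in `BUCKETS` are hypothetical (the slide shows only the resulting buckets), bucket IDs are sent in the clear, and encryption under K_i is omitted, so each stored list merely stands in for a ciphertext such as {34, 40}_Ki.

```python
# Hypothetical bucket boundaries (inclusive); the slide only shows the resulting buckets.
BUCKETS = {1: (1, 10), 2: (11, 30), 3: (31, 40), 4: (41, 50)}

def bucket_of(x):
    """Return the ID of the bucket whose interval contains x."""
    for bid, (lo, hi) in BUCKETS.items():
        if lo <= x <= hi:
            return bid
    raise ValueError(f"{x} lies outside every bucket")

def owner_outsource(data):
    """Group items by bucket; each group stands in for a ciphertext {...}_Ki (encryption omitted)."""
    stored = {}
    for x in sorted(data):
        stored.setdefault(bucket_of(x), []).append(x)
    return stored

def user_translate(a, b):
    """Map the range query [a, b] to the IDs of all buckets that overlap it."""
    return [bid for bid, (lo, hi) in BUCKETS.items() if lo <= b and a <= hi]

def publisher_answer(stored, bucket_ids):
    """The publisher sees only bucket IDs and returns every stored bucket that matches."""
    return [x for bid in bucket_ids for x in stored.get(bid, [])]

def user_filter(returned, a, b):
    """After decryption, the user discards false positives outside [a, b]."""
    return [x for x in returned if a <= x <= b]
```

Under these assumptions, the query [35, 45] maps to bucket IDs [3, 4], the publisher returns [34, 40], and the user keeps only [40]; 34 is the false positive.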
Privacy in outsourced database systems: Bucket Partition drawbacks. A query result may contain false positives. Data publishers can obtain reasonable estimates of the actual values of data items and queries.
Privacy in outsourced database systems: A public-key system [Boneh and Waters, TCC 2007]. Hidden Vector Encryption uses bilinear groups to produce tokens for searching conjunctive, subset, and range queries on an encrypted database. Drawback: it is computationally expensive, since it relies on public-key cryptography and requires the database owner to perform O(zD) encryptions for each tuple, where z is the number of dimensions and D is the domain size. For example, with z = 2 and D = 1000, that is on the order of 2000 encryptions per tuple.
Integrity in outsourced database systems: Merkle hash trees [Devanbu et al., Journal of Computer Security 2003]. [Figure: a Merkle hash tree over eight encrypted data items (d_1)_ki, …, (d_8)_ki. The leaf hashes are H_1 = h((d_1)_ki), …, H_8 = h((d_8)_ki); each internal node hashes the concatenation of its children, e.g., H_12 = h(H_1|H_2) and H_14 = h(H_12|H_34); the root is H_18 = h(H_14|H_58).]
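Integrity in outsourced database systems: Merkle hash trees. Example: the leaves are the sorted encrypted items (2)_ki, (5)_ki, (9)_ki, (15)_ki, (20)_ki, (23)_ki, (34)_ki, (40)_ki, and the root is H_18. For the query [10, 30], the publisher returns the query result, including the boundary items (9)_ki and (34)_ki, together with a verification object containing the hashes needed to recompute H_18.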
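The sketch below illustrates a Merkle-tree range proof for the example above. Assumptions: SHA-256 plays the role of h, the leaves are plaintext byte strings standing in for the encrypted items (d)_ki, the number of leaves is a power of two, the user already holds an authentic root H_18, and the publisher states the leaf positions lo and hi explicitly. The helper names are hypothetical, not taken from Devanbu et al.

```python
import hashlib
from bisect import bisect_left, bisect_right

def h(data: bytes) -> bytes:
    """Hash function h; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """levels[0] = [H_1, ..., H_n]; each parent is h(left | right). n must be a power of two."""
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[k] + prev[k + 1]) for k in range(0, len(prev), 2)])
    return levels

def result_range(values, a, b):
    """Leaf positions of the items in [a, b] plus one boundary item on each side, if present."""
    first = bisect_left(values, a)
    last = bisect_right(values, b) - 1
    return max(first - 1, 0), min(last + 1, len(values) - 1)

def make_vo(levels, lo, hi):
    """Publisher: sibling hashes needed to rebuild the root from leaves lo..hi (left before right)."""
    vo = []
    for level in levels[:-1]:
        if lo % 2 == 1:
            vo.append(level[lo - 1])
        if hi % 2 == 0:
            vo.append(level[hi + 1])
        lo, hi = lo // 2, hi // 2
    return vo

def verify(root, leaves, lo, hi, vo, n):
    """User: recompute the root from the returned leaves lo..hi and the VO; n = total leaves."""
    if len(leaves) != hi - lo + 1:
        return False  # an item was dropped or added
    it = iter(vo)
    level = [h(x) for x in leaves]
    try:
        while n > 1:
            if lo % 2 == 1:
                level.insert(0, next(it))
                lo -= 1
            if hi % 2 == 0:
                level.append(next(it))
                hi += 1
            level = [h(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
            lo, hi, n = lo // 2, hi // 2, n // 2
    except StopIteration:
        return False  # VO too short
    return next(it, None) is None and level == [root]
```

For [10, 30], `result_range` yields leaf positions 2 through 6 (9, 15, 20, 23, 34) and the VO is [H_8, H_12]; a forged or missing item makes verification fail. For [10, 14] it yields only the boundary items 9 and 15. In a full scheme the user would also check that the boundary items lie just outside the query range.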
Integrity in outsourced database systems: Merkle hash tree drawbacks. A query result has false positives: for example, for the query [10, 14] no data item satisfies the query, yet the publisher must still return the boundary items (9)_ki and (15)_ki together with a verification object. It is also hard to extend Merkle hash trees to verify the integrity of multi-dimensional data.
Integrity in outsourced database systems: Signature aggregation and chaining [Narasimha and Tsudik, DASFAA 2006]. Aggregation combines multiple individual signatures into one unified signature; verifying the unified signature is equivalent to verifying all the individual signatures. Chaining links the signature of each data item with the signatures of its neighbors. Drawbacks: a query result has false positives, and verifying the integrity of multi-dimensional data is computationally expensive.
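To make aggregation and chaining concrete, here is a toy sketch that uses condensed-RSA-style aggregation (multiplying RSA signatures modulo N) as an assumed instantiation. The key is built from two Mersenne primes and is far too small to be secure, and `chained_msg` with its "-inf"/"+inf" sentinels is a hypothetical encoding of the chain.

```python
import hashlib

# Toy RSA key from two Mersenne primes: far too small and structured to be secure.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse; needs Python 3.8+

def h(msg: bytes) -> int:
    """Hash a message into Z_N."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N

def chained_msg(values, i):
    """Hypothetical chain encoding: item i is signed together with both neighbors."""
    prev = values[i - 1] if i > 0 else "-inf"
    nxt = values[i + 1] if i + 1 < len(values) else "+inf"
    return f"{prev}|{values[i]}|{nxt}".encode()

def sign(msg: bytes) -> int:
    """Data owner's RSA signature on one chained message."""
    return pow(h(msg), D, N)

def aggregate(sigs):
    """Publisher condenses many signatures into one by multiplying them mod N."""
    agg = 1
    for s in sigs:
        agg = agg * s % N
    return agg

def verify_aggregate(msgs, agg):
    """User checks agg^E == product of h(m) mod N: one exponentiation for all messages."""
    expected = 1
    for m in msgs:
        expected = expected * h(m) % N
    return pow(agg, E, N) == expected
```

Verifying the aggregate costs one modular exponentiation instead of one per returned item. The neighbor values embedded in each signed message are what a user would check to notice a silently dropped item; that continuity check is not shown here.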
Integrity in outsourced database systems: Spatial data structures [Chen et al., ESORICS 2008]
Integrity in outsourced database systems: Chen et al. proposed a Canonical Range Tree (CRT) to count the number of data items in access control areas and query spaces. Advantages: no false positives; boundary data items need not be returned; it can also be used to enforce access control. Drawback: it applies only to range queries, whereas SQL supports other types of queries as well.
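The canonical decomposition behind range-tree counting can be sketched in one dimension. This is only an illustration under stated assumptions, not Chen et al.'s CRT: the domain is [0, 2**depth), every node of a complete binary tree stores how many items fall in its interval, and authentication of those counts is omitted.

```python
def build_counts(values, depth):
    """Count items per node of a complete binary tree over the domain [0, 2**depth).
    Node (level, idx) covers [idx * 2**level, (idx + 1) * 2**level - 1]; level 0 holds single values."""
    counts = {}
    for v in values:
        for level in range(depth + 1):
            node = (level, v >> level)
            counts[node] = counts.get(node, 0) + 1
    return counts

def canonical_nodes(a, b):
    """Split [a, b] into O(log) maximal tree nodes whose intervals tile it exactly."""
    nodes, level, lo, hi = [], 0, a, b + 1  # half-open [lo, hi) at the current level
    while lo < hi:
        if lo & 1:  # lo is a right child: take it alone and move right
            nodes.append((level, lo))
            lo += 1
        if hi & 1:  # hi - 1 is a left child: take it alone and move left
            hi -= 1
            nodes.append((level, hi))
        lo, hi, level = lo >> 1, hi >> 1, level + 1
    return nodes

def range_count(counts, a, b):
    """Number of items in [a, b], summed over the canonical nodes."""
    return sum(counts.get(node, 0) for node in canonical_nodes(a, b))
```

For the data {2, 5, 9, 15, 20, 23, 34, 40} with depth 6, [10, 30] splits into six canonical nodes whose counts sum to 3, which a user could compare against the number of items actually returned.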
Privacy and Integrity in Two-tiered sensor networks: S&L scheme [Infocom 2008]. Two major drawbacks: storage nodes can estimate data items and queries fairly accurately, and power and space consumption grow exponentially with the number of dimensions. Example: sensor S_i (key K_i) partitions its data {1, 4, 5, 7, 9} into buckets and sends {1, 4}_Ki, {5}_Ki, {7, 9}_Ki, and, for the empty bucket 4, h(i||4||t||K_i) to the storage node. For the query [9, 10], the sink (key K_i) sends bucket IDs 3 and 4; the storage node returns {7, 9}_Ki, where 7 is outside the query range, and h(i||4||t||K_i), which proves that bucket 4 is empty.
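A sketch of the S&L-style empty-bucket proof from the example above. The bucket boundaries are hypothetical, HMAC-SHA256 is an assumed stand-in for h(i||j||t||K_i), and encryption of non-empty buckets is omitted, so the sketch only demonstrates why a storage node cannot drop a bucket or fake an empty one without K_i; it does not protect the contents of non-empty buckets.

```python
import hashlib
import hmac

# Hypothetical bucket boundaries (inclusive); the slide does not specify them.
BUCKETS = {1: (1, 4), 2: (5, 6), 3: (7, 9), 4: (10, 12)}

def empty_proof(key: bytes, i: int, j: int, t: int) -> bytes:
    """Stand-in for h(i||j||t||K_i): HMAC-SHA256 over i, j, t keyed with K_i (assumed choice)."""
    return hmac.new(key, f"{i}|{j}|{t}".encode(), hashlib.sha256).digest()

def sensor_submit(key: bytes, i: int, t: int, data):
    """Sensor S_i's upload: bucket ID -> ("data", items) or ("empty", proof).
    Real items would be encrypted under K_i; encryption is omitted here."""
    upload = {}
    for j, (lo, hi) in BUCKETS.items():
        items = tuple(sorted(x for x in data if lo <= x <= hi))
        upload[j] = ("data", items) if items else ("empty", empty_proof(key, i, j, t))
    return upload

def buckets_for(a, b):
    """Bucket IDs the sink sends for the range query [a, b]."""
    return [j for j, (lo, hi) in BUCKETS.items() if lo <= b and a <= hi]

def storage_node_answer(stored, bucket_ids):
    """An honest storage node returns every requested bucket it holds."""
    return {j: stored[j] for j in bucket_ids if j in stored}

def sink_verify(key: bytes, i: int, t: int, a, b, reply):
    """Return the in-range items, or None if a bucket is missing or an empty-bucket proof is invalid."""
    result = []
    for j in buckets_for(a, b):
        if j not in reply:
            return None  # the storage node dropped a bucket
        kind, payload = reply[j]
        if kind == "empty":
            if not hmac.compare_digest(payload, empty_proof(key, i, j, t)):
                return None  # forged "empty" claim
        else:
            result.extend(x for x in payload if a <= x <= b)  # drop false positives such as 7
    return result
```

With the data {1, 4, 5, 7, 9}, the query [9, 10] maps to buckets [3, 4] and the sink accepts [9]; omitting bucket 4, or replacing bucket 3 with a proof made for a different bucket, makes `sink_verify` return None.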
Privacy and Integrity in Two-tiered sensor networks: Optimized version of the S&L scheme for one-dimensional data [Infocom 2009]. It embeds relationships among the data collected by each sensor by defining a bucket vector in which each bit indicates whether the sensor has data in the corresponding bucket. [Figure: sensors 1–3 report to a storage node; buckets hold {2, 5, 9}, {15, 20, 23}, and {34, 40}; bucket vector V_1; items are stored together with a bit vector, e.g., {3, 1110}_Ki and {18, 1110}_Ki.]
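A sketch of the bucket vector idea: each sensor computes one bit per bucket, and the sink flags any bucket that overlaps the query and is marked non-empty, yet has no returned item. The bucket boundaries and data values are hypothetical (chosen to produce the vector 1110), buckets are indexed from 0, and the sketch assumes the sink has already recovered the sensor's vector from a returned item.

```python
def bucket_vector(data, buckets):
    """Bit j is '1' iff the sensor has at least one item in bucket j (buckets indexed from 0)."""
    return "".join("1" if any(lo <= x <= hi for x in data) else "0" for lo, hi in buckets)

def missing_buckets(vector, buckets, a, b, returned):
    """Indices of buckets that overlap [a, b] and are marked non-empty in the vector,
    yet contain none of the returned items: evidence that the storage node withheld data.
    Assumes the storage node returns whole buckets, as in the bucket-based schemes above."""
    missing = []
    for j, (lo, hi) in enumerate(buckets):
        overlaps = lo <= b and a <= hi
        if overlaps and vector[j] == "1" and not any(lo <= x <= hi for x in returned):
            missing.append(j)
    return missing
```

For example, `bucket_vector([2, 15, 27], [(0, 9), (10, 19), (20, 29), (30, 39)])` gives "1110"; for the query [12, 35], returning only [15] leaves bucket 2 flagged as missing.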
Privacy and Integrity in Two-tiered sensor networks: Optimized version of the S&L scheme for multi-dimensional data [Mobihoc 2009]. [Figure: sensors 1–3 and a storage node; two-dimensional items (12,5)_ki, (15,6)_ki, (23,4)_ki, (45,3)_ki; bit vector V_1.] These two schemes are less secure than the S&L scheme. They inherit its weakness of allowing storage nodes to estimate the original data and queries. In addition, the optimization technique allows a compromised sensor to easily compromise the network's integrity verification by sending falsified bit maps to sensors and storage nodes.
Future Research Directions: For outsourced database systems, there is no complete solution for preserving both privacy and integrity, and preserving privacy and integrity for multi-dimensional data is not well studied. For two-tiered sensor networks: prevent storage nodes from estimating data and queries, support multi-dimensional data, and improve efficiency.
Questions Thank you!