Data Security and Privacy Keke Chen

Presentation transcript:

Cloud Computing: Data Security and Privacy. Keke Chen

Outline: data confidentiality; data privacy; access pattern privacy; integrity of cloud processing.

Private data in cloud computing. To use cloud services, users may have to outsource their data, except when using public services such as maps or search engines. The data can be relational databases, multidimensional vector data, graph data, text data, images, video, and so on.

Problem with outsourced data: data confidentiality. Threat model: against external attackers, encryption is sufficient. Against internal attackers, the cloud cannot work on data encrypted with existing techniques, so new methods are needed.

Problem: access pattern. The access pattern covers the queries, the query results, and how each query is processed. The user's query might be private. The returned data and the access path may reveal data distributions, which can be used to compromise data confidentiality. Threat model: internal attackers.

Problem: query correctness. Is the returned result the correct result? Examples: outsourced database services and content distribution networks. Threat models: malicious service providers; honest but lazy service providers.

Normal assumptions about the attackers. Honest-but-curious service providers: honest, in that they do what you want them to do; curious, in that they want to peek at your information. This assumption applies to data confidentiality and access pattern privacy. Query correctness assumes dishonest providers, which may skip work to reduce the service cost or intentionally change content.

Data confidentiality. The ultimate solution is fully homomorphic encryption (FHE): computing on encrypted data directly, without the need to decrypt it. Examples, where ◊ is an operation on ciphertexts: E(X) ◊ E(Y) = E(X+Y) for Paillier; E(X) ◊ E(Y) = E(X*Y) for ElGamal and RSA. Both '+' and '*': Gentry's method (Gentry 2009). Problem: efficiency. Google search with encrypted keywords and FHE would increase the cost by about a trillion times.
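The two partial homomorphisms listed above can be checked with a small sketch. Everything below is a toy illustration under stated assumptions: the primes 61 and 53, the helper names, and the unpadded ("textbook") RSA variant are chosen for readability and are far too weak for real use.

```python
import math
import random

# Toy parameters (hypothetical, insecure): both schemes reuse n = 61 * 53.
P, Q = 61, 53
N = P * Q
N2 = N * N

# Paillier: multiplying ciphertexts adds the plaintexts.
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
G = N + 1                                             # common simple generator
MU = pow(LAMBDA, -1, N)                               # valid because g = n + 1

def paillier_encrypt(m: int) -> int:
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:          # r must be invertible modulo n
        r = random.randrange(1, N)
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def paillier_decrypt(c: int) -> int:
    u = pow(c, LAMBDA, N2)
    return ((u - 1) // N * MU) % N

# Textbook RSA (no padding): multiplying ciphertexts multiplies the plaintexts.
RSA_E, RSA_D = 17, 2753                 # 17 * 2753 = 1 mod (p-1)(q-1)

def rsa_encrypt(m: int) -> int:
    return pow(m, RSA_E, N)

def rsa_decrypt(c: int) -> int:
    return pow(c, RSA_D, N)

def add_encrypted(c1: int, c2: int) -> int:
    return (c1 * c2) % N2               # E(X) ◊ E(Y) = E(X + Y)

def mul_encrypted(c1: int, c2: int) -> int:
    return (c1 * c2) % N                # E(X) ◊ E(Y) = E(X * Y)
```

For example, `paillier_decrypt(add_encrypted(paillier_encrypt(20), paillier_encrypt(22)))` recovers 20 + 22, and `rsa_decrypt(mul_encrypted(rsa_encrypt(7), rsa_encrypt(6)))` recovers 7 * 6, as long as the results stay below n.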

Data confidentiality: keyword search over encrypted documents (Song 2000). Basic idea: scan linearly through the text and check whether each encrypted word matches. Only the searched word can be correctly decrypted; other words produce random-looking output. Problems: the linear scan is not efficient for large data, and the privacy of the search is not preserved, because the service provider learns which encrypted words match, so special documents might be identified.

Secure keyword search, Song 2000 (paper 186). The seed is random and different for each W_i. Key idea: L_i and R_i are self-verifiable. Advantage of XOR.

How to set K?
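The self-verifiable pair and the role of XOR can be sketched as follows. This is a simplified illustration in the spirit of Song 2000, not the paper's exact construction. As assumptions, HMAC-SHA256 stands in for both the pseudo-random functions and the deterministic word cipher (so words cannot be decrypted here), the check key K is derived from the word itself, and all function names are hypothetical.

```python
import hashlib
import hmac

HALF = 8  # bytes in each half (L_i, R_i) of a 16-byte word block

def prf(key: bytes, msg: bytes, n: int) -> bytes:
    """Keyed pseudo-random function: HMAC-SHA256 truncated to n bytes."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pre_encrypt(k_word: bytes, word: str) -> bytes:
    # Stand-in for a deterministic cipher E(W); enough for matching only.
    return prf(k_word, word.encode(), 2 * HALF)

def word_key(k_check: bytes, x: bytes) -> bytes:
    # One possible answer to "how to set K": derive it from the word.
    return prf(k_check, x, 32)

def encrypt_words(keys, words):
    k_word, k_seed, k_check = keys
    out = []
    for i, w in enumerate(words):
        x = pre_encrypt(k_word, w)
        left = prf(k_seed, i.to_bytes(8, "big"), HALF)   # seed differs per position
        right = prf(word_key(k_check, x), left, HALF)    # right half checks the left
        out.append(xor(x, left + right))
    return out

def trapdoor(keys, word):
    k_word, _, k_check = keys
    x = pre_encrypt(k_word, word)
    return x, word_key(k_check, x)

def server_search(ciphertexts, x, k):
    """Linear scan: XOR off the word, then test whether the pair verifies."""
    hits = []
    for i, c in enumerate(ciphertexts):
        pad = xor(c, x)
        if hmac.compare_digest(pad[HALF:], prf(k, pad[:HALF], HALF)):
            hits.append(i)
    return hits
```

The server receives only the trapdoor `(x, k)` for the searched word. For any other word the XOR leaves a pad whose halves fail the check, except with negligible probability.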

Data confidentiality: public-key keyword search and range search (Shi 2007). Scenario: encrypted data is stored in public storage, and only authorized users can access the authorized part of the data. Example: in an audit log, authorized users access certain days of the log (a range query). Problem: key management. The range of each searchable dimension is discretized, each value needs a public-private key pair, and the search is still a linear scan.

Data confidentiality for data-intensive applications. Problem: a linear scan is not sufficient; indices are needed to make search faster, for example for OLAP and for k-nearest-neighbor search in geological/multimedia databases. What is needed: indexable encrypted data.

Data confidentiality: indexable encryption on relational data. Order-preserving encryption (OPE): map a column of data to an arbitrary distribution while preserving the value order; an index can be built on the column because order is preserved, and queries are easy to translate to the encrypted data. Crypto-index: map a column of data to buckets and give each bucket a random ID; values are represented by their bucket IDs, and a range search becomes a search on a list of bucket IDs.
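A minimal sketch of the crypto-index idea, assuming fixed bucket boundaries and four-digit random IDs. The function names are hypothetical, and the payload is left in the clear only to keep the example short; in practice it would be a ciphertext.

```python
import bisect
import random

def build_crypto_index(boundaries, seed=0):
    """Give a random, unique ID to each bucket [boundaries[i], boundaries[i+1])."""
    return random.Random(seed).sample(range(1000, 10000), len(boundaries) - 1)

def bucket_of(value, boundaries, ids):
    """Bucket ID that represents `value` on the server."""
    return ids[bisect.bisect_right(boundaries, value) - 1]

def translate_range(lo, hi, boundaries, ids):
    """Client side: rewrite lo <= value <= hi as a set of bucket IDs."""
    first = bisect.bisect_right(boundaries, lo) - 1
    last = bisect.bisect_right(boundaries, hi) - 1
    return set(ids[first:last + 1])

def server_select(rows, wanted_ids):
    """Server side: return every row whose bucket ID was requested."""
    return [payload for bucket_id, payload in rows if bucket_id in wanted_ids]
```

The server returns whole buckets, so the client still has to decrypt the candidates and drop the values that fall outside the requested range.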

Problems with OPE and crypto-index. Weak threat model: both assume attackers have no knowledge of the data distribution. Distributional attack: an attacker who knows the column distribution can easily map an OPE-encrypted column back to the original column. Crypto-index: by attacking the queried ranges, an attacker can derive the mapping between bucket IDs and real value ranges. The crypto-index also returns junk records, and to hide the column distribution each bucket has to be equi-height (i.e., the same number of records in each bucket).
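The distributional attack on OPE can be illustrated with rank matching. The sketch assumes the strongest case, where the attacker knows exactly which values occur in the column. The toy OPE table and the function names are hypothetical.

```python
import random

def make_ope_table(domain_size, seed):
    """Toy OPE: a random, strictly increasing map from 0..domain_size-1."""
    rng = random.Random(seed)
    table, cipher = [], 0
    for _ in range(domain_size):
        cipher += rng.randint(1, 1000)   # a positive gap keeps the order
        table.append(cipher)
    return table

def rank_matching_attack(ciphertexts, known_plaintexts):
    """Pair the i-th smallest ciphertext with the i-th smallest known value."""
    guess = {}
    for c, p in zip(sorted(ciphertexts), sorted(known_plaintexts)):
        guess[c] = p
    return guess
```

Because the encryption preserves order, sorting both sides lines them up. No key material is needed.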

Weaker data confidentiality: privacy protection. Do not hide the complete information: attackers may estimate values but cannot get exact values, and data utility is easier to preserve. Techniques: data perturbation and data anonymization.

Data perturbation. Definition: (1) randomly change the original data; (2) the attacker cannot effectively recover the original data; (3) the desired properties are preserved. Techniques: for a single dimension, noise addition; for multidimensional data, geometric perturbation and random projection.

Noise addition: Y = X + R, where X is the original data column, R is random noise (its distribution is published), and Y is the published data. Applications in data mining: reconstructing the column distribution (Rakesh Agrawal, SIGMOD 2000), applied to privacy-preserving decision tree and naïve Bayes classifiers. Attacks: spectral filtering (Kargupta, ICDM 2004) and PCA reconstruction (Huang, SIGMOD 2005).
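Because the noise distribution is published, aggregate properties of X can be estimated from Y alone. The sketch below recovers only the mean and variance, which is much simpler than the distribution reconstruction cited above. Zero-mean Gaussian noise independent of X is assumed, and the names are hypothetical.

```python
import random
import statistics

def perturb(column, sigma, rng):
    """Publish Y = X + R with R drawn from N(0, sigma^2)."""
    return [x + rng.gauss(0.0, sigma) for x in column]

def estimate_moments(published, sigma):
    """Estimate the mean and variance of X from Y and the published sigma."""
    mean_x = statistics.fmean(published)                    # E[Y] = E[X] since E[R] = 0
    var_x = statistics.pvariance(published) - sigma ** 2    # Var(Y) = Var(X) + Var(R)
    return mean_x, var_x
```

Individual values stay blurred by roughly sigma, while the estimates of the aggregates tighten as the column grows.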

Geometric data perturbation: Y = RX + T + D, where R is a secret rotation matrix (preserves Euclidean distances), T is a secret random translation matrix, and D is a secret random noise matrix. Distances are only approximately preserved because of D. Resilient to most attacks on rotation perturbation. Applications: outsourced privacy-preserving data mining; applicable to many classification and clustering algorithms. Attacks: population-based attacks (when the covariance matrix is revealed).
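A two-dimensional sketch of Y = RX + T + D, where each point is one column of X. The rotation angle, shift, and noise level are hypothetical secret parameters chosen for illustration.

```python
import math
import random

def perturb_points(points, theta, shift, noise_sigma, rng):
    """Apply y = R x + t + d to each 2-D point."""
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for x1, x2 in points:
        y1 = c * x1 - s * x2 + shift[0] + rng.gauss(0.0, noise_sigma)
        y2 = s * x1 + c * x2 + shift[1] + rng.gauss(0.0, noise_sigma)
        out.append((y1, y2))
    return out
```

With `noise_sigma = 0` all pairwise distances are unchanged up to floating-point error. A small positive `noise_sigma` changes them slightly, which is the approximate preservation described above.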

Random projection: Y = AX + D, where A is a random projection matrix, e.g., with entries drawn from N(0,1). Distances are approximately preserved. Applications: many classification and clustering algorithms, though with worse accuracy than geometric perturbation; good for sparse high-dimensional data (text data), i.e., sketch methods (A is randomly generated for EACH record). Attacks: possibly more resilient than the other two perturbation methods.
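A sketch of the distance-preservation property, with two simplifications that are assumptions of this example: the noise term D is omitted, and the N(0,1) entries are scaled by 1/sqrt(k) so that projected distances are directly comparable to the originals. The same A is applied to both records.

```python
import math
import random

def random_projection_matrix(k, d, rng):
    """k x d matrix with N(0,1) entries scaled by 1/sqrt(k)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(a, x):
    """Compute y = A x for one record x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
```

Projecting two 300-dimensional records down to 200 dimensions typically changes their distance by a few percent. The error grows as k shrinks.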

Random space perturbation: y = A (E(x), 1, v)^T, where x is a k-dimensional vector, E(x) is the order-preserving encryption of x, v is a random value, and A is a (k+2)x(k+2) random invertible matrix. Properties: preserves half-space queries, e.g., f(x) < 0; does not preserve distances or orders, so it is resilient to the known attacks on OPE, GDP, etc. Applications: range queries and data mining (linear classifier + boosting).
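Why an invertible linear map keeps a half-space test answerable can be shown for k = 1. This is only the underlying linear-algebra identity, u·z = (A^-1)^T u · y for z = (E(x), 1, v); the published scheme is more involved. The 3x3 key matrix is hypothetical, and `ex` and `threshold` are assumed to be already order-preservingly encrypted.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inverse using exact fractions."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[2, 1, 1], [1, 3, 2], [1, 0, 0]]   # hypothetical secret key, determinant -1

def perturb(ex, v):
    """y = A (E(x), 1, v)^T for a single encrypted value ex."""
    z = [ex, 1, v]
    return [sum(a * b for a, b in zip(row, z)) for row in A]

def query_vector(threshold):
    """q = (A^-1)^T u with u = (1, -threshold, 0), so q.y = E(x) - threshold."""
    u = [1, -threshold, 0]
    a_inv = invert(A)
    return [sum(u[i] * a_inv[i][j] for i in range(3)) for j in range(3)]

def server_test(q, y):
    """Server side: decide the half-space condition on perturbed data only."""
    return sum(a * b for a, b in zip(q, y)) < 0
```

The random component v changes y but not the outcome of the test, because the third entry of u is zero.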

Data anonymization: the quasi-identifier anonymization problem. The Governor of MA was uniquely identified using ZIP code, birth date, and sex, which linked his name to a diagnosis; this combination is unique for 87% of the US population. Medical data: Name, SSN, Visit date, Diagnosis, Procedure, Medication, Total charge, ZIP, Birth date, Sex. Voter list: Name, Address, Date registered, Party affiliation, Date last voted, ZIP, Birth date, Sex. Quasi-identifier (shared by both tables): ZIP, Birth date, Sex. (Sweeney, IJUFKS 2002)
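The linkage described above amounts to a join on the quasi-identifier. In the sketch below all records are made up for illustration, and the medical table is assumed to have had names and SSNs removed before release.

```python
QUASI_IDENTIFIER = ("zip", "birth_date", "sex")

# Hypothetical released medical data (direct identifiers already removed).
MEDICAL = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "flu"},
    {"zip": "02139", "birth_date": "1972-05-17", "sex": "F", "diagnosis": "asthma"},
]

# Hypothetical public voter list.
VOTERS = [
    {"name": "A. Smith", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "B. Jones", "zip": "02139", "birth_date": "1980-09-30", "sex": "F"},
]

def link_records(medical_rows, voter_rows, quasi_identifier=QUASI_IDENTIFIER):
    """Return (name, diagnosis) pairs whose quasi-identifier matches one voter."""
    index = {}
    for voter in voter_rows:
        key = tuple(voter[a] for a in quasi_identifier)
        index.setdefault(key, []).append(voter["name"])
    linked = []
    for row in medical_rows:
        names = index.get(tuple(row[a] for a in quasi_identifier), [])
        if len(names) == 1:                 # a unique match re-identifies the record
            linked.append((names[0], row["diagnosis"]))
    return linked
```

Anonymization techniques aim to make sure such a join never yields a unique match.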

Basic problems in anonymization. Linkage can occur between records, attributes, and tables, and between the attacker's prior knowledge and the published data. Current research addresses possible data linkage attacks and proposes techniques to prevent them. A nice survey paper: http://www.cs.sfu.ca/~wangk/pub/FWCY10csur.pdf

Differential privacy. Setting: a statistical database that allows users to submit aggregate queries but needs protection from query inference attacks. Approach: perturb the query results. Cynthia Dwork (http://research.microsoft.com/en-us/people/dwork/) proposed differential privacy, a method for designing the random noise for a user-specified "privacy budget" and specific aggregate functions.
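A common way to perturb an aggregate result is the Laplace mechanism, sketched here for a count query. The parameter name `epsilon` for the privacy budget and the function names are assumptions of this example; a count has sensitivity 1, so the noise scale is 1/epsilon.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(rows, predicate, epsilon, rng):
    """Answer a count query with Laplace noise calibrated to sensitivity 1."""
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller `epsilon` means more noise per answer. Repeating a query spends more of the budget, since averaging the answers would otherwise cancel the noise.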

Privacy of access pattern. We do not want the attacker to know which items the user is accessing, because a known access pattern can be used to derive a lot of private information. Example: AOL search log -> identify a real person -> find what sensitive keywords he/she used. Naive solution: download the entire database.

Privacy of access pattern: private information retrieval (PIR). Basic idea: hide the real access pattern among a lot of accesses. Multi-server information-theoretic PIR (Chor 98). Computational PIR, under the same assumption as computationally secure encryption (Kushilevitz 97). Problem: high communication cost, since a lot of content needs to be transferred. The most efficient solution still needs O(n^{1/2}) communication cost.
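The simplest two-server information-theoretic scheme can be sketched as follows, assuming two servers that hold the same bit database and do not collude. This basic version still sends n bits per query; it shows the hiding idea, not the sublinear constructions. Function names are hypothetical.

```python
import secrets

def make_queries(n, i):
    """Random subset for server 1; the same subset with index i flipped for server 2."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2

def server_answer(db_bits, query):
    """Each server returns the XOR of the database bits selected by its query."""
    acc = 0
    for bit, selected in zip(db_bits, query):
        acc ^= bit & selected
    return acc

def retrieve(db_bits, i):
    """Client: the two answers differ exactly by bit i."""
    q1, q2 = make_queries(len(db_bits), i)
    return server_answer(db_bits, q1) ^ server_answer(db_bits, q2)
```

Each query on its own is a uniformly random bit vector, so neither server learns anything about `i`.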

Query correctness: check whether the server faithfully returns the results. Challenge token (Sion 2005): put a challenge query, whose answer the user already knows, among a batch of b queries. This solution provides only a probabilistic guarantee of query correctness.
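The probabilistic nature of the guarantee can be made concrete with a simplified model that follows the description above rather than the details of Sion 2005. As assumptions, the server cannot tell the challenge from ordinary queries, and a lazy server skips a random subset of the batch. All names are hypothetical.

```python
import random

def audit_batch(queries, challenge, known_answer, server, rng):
    """Hide one challenge in the batch and check the server's answer to it."""
    batch = list(queries) + [challenge]
    rng.shuffle(batch)                       # the server should not spot the challenge
    answers = {q: server(q) for q in batch}
    return answers[challenge] == known_answer, answers

def detection_probability(batch_size, executed):
    """Chance that the challenge is among the skipped queries."""
    return (batch_size - executed) / batch_size
```

For example, a server that executes only 7 of 10 queries is caught with probability 0.3 per batch, so several batches are needed for high confidence.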

Query correctness using a Merkle hash tree (Li 2006). Integrate a Merkle hash tree with the B-tree index. For each search, the service provider needs to provide a hash value to prove that the right access path was used. The client can verify the hash against its own Merkle hash tree.
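The verification step can be sketched with a plain binary Merkle tree. The B-tree integration is not shown. SHA-256 and the rule of duplicating the last node on odd-sized levels are choices made for this example, and the client is assumed to keep only the root.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]      # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Server side: sibling hashes along the path from a leaf to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path

def verify(leaf, path, root):
    """Client side: recompute the root from the returned record and the path."""
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The proof has one hash per tree level, so its size grows logarithmically with the number of records.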

Proof of retrievability. Problem: archived or backup data in the cloud is only sporadically accessed. The owner wants to make sure the data is still there (not discarded by the cloud provider to save money) and has not been modified. Approaches: similar to challenge-based methods.
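A challenge-based check of this kind can be sketched as a spot check with precomputed answers. This is a simplified illustration, not a full proof-of-retrievability scheme. As assumptions, the client prepares single-use challenges before uploading and keeps only those, and the function names are hypothetical.

```python
import hashlib
import secrets

def prepare_challenges(blocks, count):
    """Client, before upload: precompute (index, nonce, expected digest) triples."""
    challenges = []
    for _ in range(count):
        i = secrets.randbelow(len(blocks))
        nonce = secrets.token_bytes(16)
        expected = hashlib.sha256(nonce + blocks[i]).digest()
        challenges.append((i, nonce, expected))
    return challenges

def server_respond(stored_blocks, index, nonce):
    """Server: can answer only if it still holds the unmodified block."""
    return hashlib.sha256(nonce + stored_blocks[index]).digest()

def audit(challenges, stored_blocks):
    """Client: every response must match the precomputed digest."""
    return all(server_respond(stored_blocks, i, nonce) == expected
               for i, nonce, expected in challenges)
```

Each challenge samples one block, so a single audit detects a small amount of damage only with some probability. More challenges raise the odds.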