Secure Data Outsourcing. Outline  Motivation  Background  Research issues  Summary.

Presentation transcript:

Motivation  Cost of maintaining/mining large data: 4-5 times the cost of data acquisition; DBAs are paid well  More and more data service providers Low cost – cloud computing  Maintain one database for one user → multiple users Examples:  Alentus.com  Datapipe.com  Discountasp.net  …  Concerns about data security and privacy Untrusted service provider

Untrusted service provider  Lazy: incentives to perform less  Curious: incentives to acquire information  Malicious: Denial of service Incorrect results Possibly compromised

Challenges  Data confidentiality Data need to be encrypted (?) Utility of protected data?  Query utility  Mining utility  Access pattern privacy  Integrity Data integrity Query integrity  Correct  Complete  Fresh

Why is it hard for query services?  Arbitrary expressivity SQL statements Often restricted to certain types of queries for simplicity (e.g. range query, kNN query)  Cost Communication Computation (server side vs client side)

Why is it hard for mining services?  Many data mining models Different utilities to preserve No one-size-fits-all solution

Data confidentiality  Bucketization method (crypto-index)  Order preserving encryption  Perturbations

Bucketization method  Hacigumus (SIGMOD02)

 Main steps Partition sensitive attributes  Order preserving: supports comparison  Random: query rewriting becomes hard Build index on the partitions Rewrite queries to target partitions  ‘john doe’ → 105  Select * from T’ where name=105 Execute queries and return results Prune/post-process results on client

 Trade-off between confidentiality and overhead Larger partition → increased privacy → increased overheads
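The main steps of bucketization can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the salary attribute, the bucket boundaries `BOUNDS`, the random bucket IDs, and the `enc` field, which holds the plaintext only as a stand-in for a real ciphertext. It is not the construction from the Hacigumus paper, only the partition / rewrite / post-filter flow described above.

```python
import bisect
import random

# Hypothetical partition of a salary attribute: bucket i covers
# [BOUNDS[i], BOUNDS[i + 1]). Values are assumed to lie in [0, 120000).
BOUNDS = [0, 30_000, 60_000, 90_000, 120_000]
rng = random.Random(7)
# Random (not order preserving) bucket IDs play the role of the crypto-index.
BUCKET_IDS = rng.sample(range(100, 1000), len(BOUNDS) - 1)

def bucket_of(value):
    """Client side: map a plaintext value to its opaque bucket ID."""
    return BUCKET_IDS[bisect.bisect_right(BOUNDS, value) - 1]

def rewrite_range(lo, hi):
    """Client side: rewrite lo <= salary <= hi into a set of bucket IDs."""
    first = bisect.bisect_right(BOUNDS, lo) - 1
    last = bisect.bisect_right(BOUNDS, hi) - 1
    return set(BUCKET_IDS[first:last + 1])

def server_select(table, ids):
    """Server side: sees only bucket IDs and opaque payloads."""
    return [row for row in table if row["bucket"] in ids]

# Outsourcing: "enc" would be a real ciphertext; the plaintext is kept
# here only to keep the sketch short.
salaries = [12_000, 45_000, 58_000, 61_000, 95_000]
table = [{"bucket": bucket_of(s), "enc": s} for s in salaries]

lo, hi = 50_000, 65_000
candidates = server_select(table, rewrite_range(lo, hi))
# Client prunes the false positives after decryption.
result = [row["enc"] for row in candidates if lo <= row["enc"] <= hi]
```

The server returns three candidate rows for a query that matches two, which is the overhead side of the trade-off: wider buckets hide more but return more false positives.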

Order preserving encryption  Agrawal2004, Boldyreva2009  The set of data is securely transformed so that the order is preserved but the distribution and domain are changed  Benefits: indexing/searching on OPE encrypted data  Weakness: once the original distribution is known, OPE is broken
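A toy Python sketch of the benefit named above, searching directly on order-preserved ciphertexts. The table-based mapping is an assumption chosen for brevity and is not the Agrawal or Boldyreva construction; the names `make_toy_ope`, `table`, and `ages` are hypothetical.

```python
import random

def make_toy_ope(domain_size, seed):
    """Build a strictly increasing random table: plaintext p -> table[p].

    The seed plays the role of the secret key in this toy mapping.
    """
    rng = random.Random(seed)
    table, current = [], 0
    for _ in range(domain_size):
        current += rng.randint(1, 1000)  # positive gap keeps the order
        table.append(current)
    return table

table = make_toy_ope(domain_size=200, seed=42)

ages = [34, 18, 71, 45, 18, 60]
cipher = [table[a] for a in ages]        # what the server stores

# The client rewrites 30 <= age <= 60 into ciphertext bounds; the server
# answers with plain comparisons (or an ordinary index) on ciphertexts.
lo_c, hi_c = table[30], table[60]
hits = [i for i, c in enumerate(cipher) if lo_c <= c <= hi_c]

# Sorting by ciphertext gives the same order as sorting by plaintext.
same_order = (sorted(range(len(ages)), key=ages.__getitem__)
              == sorted(range(len(ages)), key=cipher.__getitem__))
```

The same property is the weakness: equal plaintexts give equal ciphertexts and the ranks are visible, so an attacker who knows the original distribution can line up ranks with values.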

 Not attribute-wise order preserving Order preserving encryption (OPE, Agrawal et al. 2004) is not resilient to distribution-based attacks [Figure: known original Xi distribution vs. transformed Xi’ distribution; OPE, bucket-based estimation]

Data perturbation  Definition 1. randomly change the original data 2. the attacker cannot effectively recover the original data 3. the desired properties are preserved  Techniques Single dimension: noise addition Multidimensional  Geometric perturbation  Random projection  RASP random space perturbation

Noise addition  Y = X+ R X: original data column, R: random noise (distribution published), Y: published data  Applications in data mining Reconstructing column distribution  Rakesh Agrawal SIGMOD 2000  Applied to privacy-preserving decision tree, naïve bayes classifier  Attacks Spectral filtering (Kargupta ICDM 2004) PCA reconstruction (Huang SIGMOD2005)
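A small Python sketch of Y = X + R, assuming zero-mean Gaussian noise with a published standard deviation (`NOISE_SD`, the uniform data column, and the sample size are illustrative assumptions). It recovers only the mean and variance of X from Y; the distribution reconstruction of Agrawal and Srikant is an iterative procedure that is not reproduced here.

```python
import random
import statistics

rng = random.Random(0)

# X: original data column (hypothetical values drawn uniformly from [20, 80]).
X = [rng.uniform(20, 80) for _ in range(20_000)]

# R: random noise whose distribution is published (zero-mean Gaussian here).
NOISE_SD = 15.0
Y = [x + rng.gauss(0, NOISE_SD) for x in X]   # Y: published data

# Individual values are hidden, but aggregate properties survive:
# E[Y] = E[X] because E[R] = 0, and Var(Y) = Var(X) + Var(R)
# because R is independent of X.
est_mean = statistics.fmean(Y)
est_var = statistics.pvariance(Y) - NOISE_SD ** 2

true_mean = statistics.fmean(X)
true_var = statistics.pvariance(X)
```

With this many records the estimates land close to the true values, which is the utility that the mining applications rely on; the listed attacks exploit the same statistical structure to filter out the noise.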

 Multiplicative perturbations Geometric data perturbation for outsourced data mining Random Projection RASP perturbation for query services (range query, kNN query).

Perturbation-based framework Mining service

Geometric data perturbation  Y = RX + T + D R: secret rotation matrix (preserves Euclidean distances) T: secret random translation matrix, D: secret random noise matrix Distances are approximately preserved (because of D) Resilient to most attacks on rotation perturbation  Applications Outsourced privacy preserving data mining, applicable for many classification and clustering algorithms  Attacks Population based attacks (when covariance matrix is revealed)
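A two-dimensional Python sketch of Y = RX + T + D, under the assumption that R is a plane rotation by a secret angle and D is small Gaussian noise. The names `theta`, `t`, `perturb`, and the noise level are illustrative, not taken from the slides.

```python
import math
import random

rng = random.Random(1)

theta = rng.uniform(0, 2 * math.pi)            # secret rotation angle
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]      # secret rotation matrix
t = [rng.uniform(-5, 5), rng.uniform(-5, 5)]   # secret translation (one column of T)

def perturb(x, noise_sd=0.0):
    """Return R x + t + d for one 2-D record x; d is Gaussian noise."""
    return [sum(R[i][j] * x[j] for j in range(2)) + t[i] + rng.gauss(0, noise_sd)
            for i in range(2)]

a, b = [1.0, 2.0], [4.0, 6.0]
d_orig = math.dist(a, b)                                 # distance in the original space
d_rot = math.dist(perturb(a), perturb(b))                # rotation + translation only
d_noisy = math.dist(perturb(a, 0.05), perturb(b, 0.05))  # with the noise term D
```

Rotation and translation leave the distance unchanged up to floating-point error, and the noise term shifts it only slightly, which is why distance-based classifiers and clustering algorithms can still run on the perturbed data.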

Random Projection  Y=AX+D A: random projection, e.g., entries from N(0,1) Distances are approximately preserved  Applications Many classification and clustering algorithms  Worse accuracy than geometric perturbation Good for sparse high-dimensional data (text data), i.e., sketch methods (A is randomly generated for EACH record)  Attacks Possibly more resilient than other two perturbation methods But utility (distance) is not well preserved

RASP perturbation k-dimensional numeric data, n records, represented as a k x n matrix; x: a record (1) Extend x to k+2 dimensions - the (k+1)-th dimension is always 1 – homogeneous dimension - the (k+2)-th dimension v is a real random number drawn from (2) Encryption - A is a (k+2) x (k+2) invertible real-valued matrix, with at least two non-zero values in each row, and the last column of A has all non-zero values - A is shared by all records

 Properties Not an OPE Preserves convexity of the dataset  Convex dataset in R^k → another convex dataset in R^(k+2). Good for range query  Each range query in R^k → hyperplane-based query → range query in R^(k+2).

RASP properties  Convexity preserving Queried range (hypercube) is convex RASP transforms the range to another convex set (polyhedron) Hyperplane: w^T x = a; half space: w^T x <= a The intersection of convex sets is also convex.

illustration of convexity preserving Original space Encrypted space

Secure query transformation  A naïve solution Based on the convexity preserving property Problems: (1) A^-1 can be probed (2) is.. If a is known, the whole dimension i is breached.

Secure query transformation  Enhanced solution X_(k+2) is always positive, so (X_i - a) <= 0 ⇔ (X_i - a) X_(k+2) <= 0 Correspondingly, in the encrypted space, y^T Θ y <= 0 Problems addressed: (1) A^-1 cannot be derived from Θ (2) (X_i - a) X_(k+2) <= 0 contains the random component X_(k+2) that protects the condition (X_i - a) <= 0
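A Python sketch of how a one-sided condition X_i <= a can be tested on RASP-encrypted records through a quadratic form. Several things are assumptions: the small fixed matrix `A`, exact rational arithmetic in place of real values, the range used for the random component v, and the particular construction Theta = (A^-1)^T u s^T A^-1, where u selects (X_i - a) and s selects the random last dimension. That construction is one way to obtain a quadratic form equal to (X_i - a) * v; other components of the full scheme are omitted.

```python
from fractions import Fraction
import random

# Hypothetical secret key for k = 2: a 4 x 4 invertible matrix (det = 48),
# two or more non-zero entries per row, no zero in the last column.
A = [[2, 1, 0, 1],
     [1, 3, 1, 2],
     [0, 1, 4, 1],
     [1, 0, 1, 3]]

def inverse(M):
    """Gauss-Jordan inverse over exact fractions (M must be invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A_INV = inverse(A)

def encrypt(x, rng):
    """Extend x with the constant 1 and a positive random v, then apply A."""
    v = Fraction(rng.randint(1, 10**6), 1000)
    ext = [Fraction(c) for c in x] + [Fraction(1), v]
    return [sum(m * e for m, e in zip(row, ext)) for row in A]

def make_theta(i, a):
    """Client side: matrix Theta with y^T Theta y = (x_i - a) * v."""
    n = len(A)
    u = [Fraction(0)] * n
    u[i], u[n - 2] = Fraction(1), Fraction(-a)   # u . ext = x_i - a
    wu = [sum(A_INV[r][c] * u[r] for r in range(n)) for c in range(n)]
    ws = A_INV[n - 1]                            # picks v out of A^-1 y
    return [[wu[r] * ws[c] for c in range(n)] for r in range(n)]

def satisfies(y, theta):
    """Server side: evaluate y^T Theta y <= 0 without knowing A or a."""
    n = len(y)
    return sum(y[r] * theta[r][c] * y[c] for r in range(n) for c in range(n)) <= 0

rng = random.Random(3)
records = [(3, 9), (7, 1), (5, 5)]
encrypted = [encrypt(x, rng) for x in records]
theta = make_theta(0, 5)                         # condition X_0 <= 5
answers = [satisfies(y, theta) for y in encrypted]
```

Because v is positive, the sign of the quadratic form equals the sign of (X_i - a), so the server learns which records satisfy the condition while the threshold a is mixed with the random component.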

Efficient two-stage query processing  illustrated Original space Transformed space Stage 1: Query the bounding box. A multidimensional tree index is built on the encrypted data (in the transformed space) on the server. Stage 2: Filter out the junk records

Stage 1: The client calculates the large bounding box; the server uses the index to find the results. Stage 2: Filter the initial results with the conditions y^T Θ_i y <= 0 for Θ_1 … Θ_2k Note: the two-stage strategy works if the output of stage 1 is significantly smaller than the original database and fits into memory. Otherwise, use a linear scan with stage 2 filtering.

RASP-based data mining  Preserving range query → linear classifier  Use the boosting framework to get strong classifiers (PerturBoost, in ICDM 2013)

Access pattern privacy  On database queries Problem is the same as PIR Attackers may use the access pattern to breach data confidentiality  Each of the previous approaches should handle this problem!

PIR is impractical  Solutions based on private Information retrieval (PIR) PIR is still impractical

For Bucketization approach  Based on the architecture of Hacigumus (SIGMOD02)  Hore VLDB04 For range query Privacy concern: reveals the distribution of values in each bucket “Diffusion”: split buckets and combine parts of different buckets Trade-off: now the server needs to return more noisy results → larger size

For OPE  Use queries to find out the distributions, then break the encryption

For RASP  Secure query transformation  Attacks on transformed queries

Oblivious RAM  Access pattern: read/write data items  Setting: Client has a small secure memory Server has large insecure storage, semi-honest Data items are encrypted Client cannot hide the accessed locations  An active area

Existing Approaches  Inside a level Some real blocks  Useful data Some dummy blocks  Random data Randomly permuted  Only the client knows the permutation [Figure: a level with interleaved dummy and real blocks]

Existing Approaches  Reading Read a block from each level One real block. Remaining are dummy blocks [Figure: client reading one real block and several dummy blocks from the server]

Existing Approaches  Writing Shuffle consecutively filled levels. Write into next unfilled level. Clear the source levels [Figure: server levels before and after; client shuffles blocks]

Continuous Shuffling  … To write:

The Problem with Existing Approaches 

Integrity guarantee  Merkle hash tree Parent hash: H(H(x1)+H(x2)), where + is string concatenation Can be stored with tree-like structures: index, XML
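A minimal Python sketch of a Merkle hash tree with membership proofs, using SHA-256 for H and byte concatenation for +. The function names and the rule of pairing an odd node with itself are assumptions made for this illustration; production designs usually also separate leaf hashes from internal hashes.

```python
import hashlib

def h(data: bytes) -> bytes:
    """H: SHA-256 digest."""
    return hashlib.sha256(data).digest()

def build_levels(items):
    """Return all tree levels, leaves first; an odd node is paired with itself."""
    level = [h(x) for x in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):        # odd node was paired with itself
            sibling = index
        proof.append((level[sibling], index % 2 == 0))  # True: sibling on the right
        index //= 2
    return proof

def verify(item, proof, root):
    """Recompute the root from one item and its proof."""
    node = h(item)
    for sibling, sibling_on_right in proof:
        node = h(node + sibling) if sibling_on_right else h(sibling + node)
    return node == root

items = [b"x1", b"x2", b"x3", b"x4", b"x5"]
levels = build_levels(items)
root = levels[-1][0]      # the data owner signs or publishes only this value

ok = verify(b"x3", prove(levels, 2), root)
tampered = verify(b"x3-modified", prove(levels, 2), root)
```

The client needs only the trusted root and a logarithmic number of sibling hashes to check that a returned item belongs to the outsourced data.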

 Hash chains

Query correctness with Merkle trees, by Devanbu et al.

Using Merkle tree Example: 5 <= q <= 10, GLB(q) = 4 (greatest value below the range), LUB(q) = 11 (least value above the range)

 Operations: Selections, projections, equijoins, set ops  Issues Works only on data with verification objects Query expressiveness Expensive  Related work Pang et al. (ICDE04, SIGMOD05), using ElGamal function Sion VLDB05: challenge token F. Li SIGMOD06: freshness

Secure keyword search  Simple information retrieval For a keyword, find the documents containing the keyword  What if the documents are encrypted word by word  and if the keyword is also encrypted

Secure keyword search  Song 2000 Seed is random, different for each Wi Key idea: Li and Ri are self-verifiable Advantage of XOR

How to set K?

 Setting of ki: ki = Fk’(Wi), where k’ is secret User publishes W and k = Fk’(W) Server computes Ci ⊕ W and checks whether the result has the form <L, Fk(L)>; if so, Ci is the ciphertext for W It reveals nothing if Ci is not the ciphertext for W. And Li is random for different Wi – server cannot find any information from Li.
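A Python sketch of this keyword-search scheme, with HMAC-SHA256 standing in for the pseudorandom function F. The block sizes `N` and `M`, the zero padding, and the way the per-position value Li is derived from a seed are assumptions made for illustration, not parameters from the slides. The searched word W is revealed to the server, as in the basic scheme.

```python
import hashlib
import hmac
import os

N, M = 16, 8   # assumed sizes in bytes: word block, and check value Ri

def prf(key: bytes, msg: bytes, out_len: int) -> bytes:
    """HMAC-SHA256 truncated to out_len bytes, standing in for F."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:out_len]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pad(word: str) -> bytes:
    raw = word.encode()
    assert len(raw) <= N, "sketch assumes words fit in one block"
    return raw.ljust(N, b"\0")

def encrypt_doc(words, k_prime, seed):
    """Ci = Wi xor <Li, Ri>, with Ri = F_ki(Li) and ki = F_k'(Wi)."""
    blocks = []
    for i, w in enumerate(words):
        W = pad(w)
        L = prf(seed, i.to_bytes(4, "big"), N - M)   # pseudorandom per position
        k_i = prf(k_prime, W, 32)
        R = prf(k_i, L, M)                           # makes <Li, Ri> self-verifiable
        blocks.append(xor(W, L + R))
    return blocks

def trapdoor(word, k_prime):
    """User side: release W and k = F_k'(W) for one search."""
    W = pad(word)
    return W, prf(k_prime, W, 32)

def server_search(blocks, W, k):
    """Server side: Ci xor W must have the form <L, F_k(L)> to match."""
    hits = []
    for i, C in enumerate(blocks):
        T = xor(C, W)
        L, R = T[:N - M], T[N - M:]
        if hmac.compare_digest(R, prf(k, L, M)):
            hits.append(i)
    return hits

k_prime, seed = os.urandom(32), os.urandom(32)
blocks = encrypt_doc(["secure", "data", "outsourcing", "data"], k_prime, seed)
found = server_search(blocks, *trapdoor("data", k_prime))
missing = server_search(blocks, *trapdoor("cloud", k_prime))
```

A non-matching block passes the check only if a random-looking M-byte value happens to equal the PRF output, so false positives are negligible for M = 8.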

Hidden search  In previous schemes, W is revealed  Weakness: each search has to release k for W  Easy to collect information  Solution: encrypt Wi with a private key, then xor with

Recent developments  Curtmola et al. 2006, “Searchable symmetric encryption: improved definitions and efficient constructions” Addresses this problem with stronger security definitions and efficient constructions proven secure against chosen-keyword attacks

Trusted hardware

Possible benefits

Discussion  Data confidentiality/access pattern Strict cryptographic definition (keyword search) or relaxed definition (perturbation, bucketization, OPE, etc.)  It is very difficult to formulate and prove the security of non-traditional approaches Do we need to reformulate the security model, and how?