1
Distributing Data for Secure Data Services
Vignesh Ganapathy, Dilys Thomas, Tomas Feder, Hector Garcia-Molina, Rajeev Motwani
April 8th, 2011
Stanford, TRDDC, TRUST
2
Roadmap
- Motivation for secure databases
- Column-level distribution
  – Encryption, distribution
  – Privacy constraints
  – Set cover initialization
- Query mediation
  – Cost estimation
  – WHERE and SELECT clause processing
  – Query decomposition
- Experiments
- Related work
3
Motivation 1: Data Privacy in Enterprises
- Health: personal medical details, disease history, clinical research data
- Banking: bank statements, loan details, transaction history
- Finance: portfolio information, credit history, transaction records, investment details
- Insurance: claims records, accident history, policy details
- Outsourcing: customer data for testing, remote DB administration, BPO & KPO
- Retail business: inventory records, individual credit card details, audits
- Manufacturing: process details, blueprints, production data
- Govt. agencies: census records, economic surveys, hospital records
4
Motivation 2: Government Regulations
Country | Privacy legislation
Australia | Privacy Amendment Act of 2000
European Union | Personal Data Protection Directive 1998
Hong Kong | Personal Data (Privacy) Ordinance of 1995
United Kingdom | Data Protection Act of 1998
United States | Security Breach Information Act (S.B. 1386) of 2002; Gramm-Leach-Bliley Act of 1999; Health Insurance Portability and Accountability Act of 1996
5
Motivation 3: Personal Information
- Emails
- Searches on Google/Yahoo
- Profiles on social networking sites
- Passwords / credit card / personal information at multiple e-commerce sites / organizations
- Documents on the computer / network
6
Losses due to Lack of Privacy: ID Theft
- 3% of households in the US affected by ID theft
- US: $5–50B losses/year
- UK: £1.7B losses/year
- AUS: $1–4B losses/year
7
Data Privacy
- Value disclosure: what is the value of attribute salary of person X?
  – Perturbation
  – Privacy-preserving OLAP
- Identity disclosure: whether an individual is present in the database table
  – Randomization, k-anonymity, etc.
  – Data for outsourcing / research
- Linkage disclosure: linking columns from multiple sites
8
Roadmap
- Motivation for secure databases
- Column-level distribution
  – Encryption, distribution
  – Privacy constraints
  – Set cover initialization
- Query mediation
  – Cost estimation
  – WHERE and SELECT clause processing
  – Query decomposition
- Experiments
- Related work
9
Masketeer: A Tool for Data Privacy
Lodha, Patwardhan, Roy, Sundaram et al.
10
Two Can Keep a Secret: A Distributed Architecture for Secure Database Services
Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi, Motwani, Srivastava, Thomas, Xu. CIDR 2005
How to distribute data across multiple sites for (1) redundancy and (2) privacy, so that compromising a single site does not lead to data loss.
11
Motivation
- Data outsourcing is growing in popularity
  – Cheap, reliable data storage and management
  – 1 TB: $399 (< $0.5 per GB)
  – Oracle 10g / SQL Server: $5,000
  – DB administrator: $68k/year
- Privacy concerns are looming ever larger
  – High-profile thefts (often by insiders)
  – UCLA lost 900k records
  – Berkeley lost a laptop with sensitive information
  – Acxiom, JP Morgan, ChoicePoint
  – www.privacyrights.org
12
Present Solutions
- Application level: Salesforce.com On-Demand Customer Relationship Management
  – $65/user/month; $995 / 5 users / 1 year
- Amazon Elastic Compute Cloud
  – 1 instance = 1.7 GHz x86 processor, 1.75 GB RAM, 160 GB local disk, 250 Mb/s network bandwidth
  – Elastic, completely controlled, reliable, secure
  – $0.10 per instance-hour
  – $0.20 per GB of data in/out of Amazon
  – $0.15 per GB-month of Amazon S3 storage used
- Google Apps for your domain
  – Small businesses, enterprise, school, family or group
13
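Encryption-Based Solution
(Figure: the client encrypts its data and stores it at the DSP. The client-side processor translates query Q into Q′ for the DSP. The DSP returns the "relevant data", and the client-side processor computes the answer.)
Problem: Q′ ends up being "SELECT *", because the DSP cannot evaluate predicates over encrypted data.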
14
The Power of Two
(Figure: the client's data is distributed across two service providers, DSP1 and DSP2.)
15
The Power of Two
(Figure: the client-side processor splits query Q into Q1, sent to DSP1, and Q2, sent to DSP2.)
Key: ensure $\mathrm{Cost}(Q_1) + \mathrm{Cost}(Q_2) \approx \mathrm{Cost}(Q)$
16
SB 1386 Privacy
{Name, SSN}, {Name, LicenceNo}, {Name, CaliforniaID}, {Name, AccountNumber} and {Name, CreditCardNo, SecurityCode} are all to be kept private.
A set is private if at least one of its elements is "hidden". An element stored in encrypted form counts as hidden.
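The SB 1386 rule can be checked mechanically for a single site. The Python sketch below assumes that a site "sees" an attribute in the clear when it stores that attribute unencrypted. The helper names `is_private` and `violated` are hypothetical and do not come from the slides.

```python
# Sketch: which SB 1386 constraint sets could a single site read in the clear?
# Assumption: an attribute is "hidden" at a site if it is absent or stored encrypted.

SB1386_CONSTRAINTS = [
    {"Name", "SSN"},
    {"Name", "LicenceNo"},
    {"Name", "CaliforniaID"},
    {"Name", "AccountNumber"},
    {"Name", "CreditCardNo", "SecurityCode"},
]


def is_private(constraint, stored, encrypted):
    """A constraint set stays private if at least one of its elements is hidden."""
    visible_in_clear = set(stored) - set(encrypted)
    return not set(constraint) <= visible_in_clear


def violated(constraints, stored, encrypted=()):
    """Return the constraint sets that a site storing `stored` could reconstruct."""
    return [c for c in constraints if not is_private(c, stored, encrypted)]


site = {"Name", "SSN", "AccountNumber"}
exposed = violated(SB1386_CONSTRAINTS, site)  # {Name, SSN} and {Name, AccountNumber}
still_exposed = violated(SB1386_CONSTRAINTS, site, {"SSN"})  # only {Name, AccountNumber}
```

Encrypting SSN protects {Name, SSN}. {Name, AccountNumber} stays exposed until one of its two elements is also hidden.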
17
Techniques
- Vertical fragmentation
  – Partition attributes across R1 and R2
  – E.g., to obey constraint {Name, SSN}: $R_1 \leftarrow \text{Name}$, $R_2 \leftarrow \text{SSN}$
  – Use tuple IDs for reassembly: $R = R_1 \bowtie R_2$
- Encoding
  – One-time pad: for each value v, construct a random bit sequence r; $R_1 \leftarrow v \oplus r$, $R_2 \leftarrow r$
  – Deterministic encryption: $R_1 \leftarrow E_K(v)$, $R_2 \leftarrow K$; can detect equality and push selections with equality predicates
  – Random addition: $R_1 \leftarrow v + r$, $R_2 \leftarrow r$; can push the aggregate SUM
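The primitives above can be tried out in a few lines. This Python sketch is illustrative only, and it rests on three assumptions of ours:
- The helper names (`fragment`, `otp_split`, `additive_split`, `equality_tag`) are hypothetical.
- Additive shares are taken modulo 2^64.
- HMAC-SHA256 is swapped in for E_K. It preserves the equality property the slide relies on, but unlike a deterministic cipher it cannot be decrypted.

```python
# Illustrative sketch of vertical fragmentation and the three encodings.
# Hypothetical helper names; HMAC-SHA256 stands in for deterministic encryption E_K.
import hashlib
import hmac
import secrets

MODULUS = 2**64  # assumption: additive shares live in Z_(2^64)


def fragment(rows, r1_attrs, r2_attrs):
    """Vertical fragmentation: split each row's attributes, keyed by a tuple ID."""
    r1 = {tid: {a: row[a] for a in r1_attrs} for tid, row in enumerate(rows)}
    r2 = {tid: {a: row[a] for a in r2_attrs} for tid, row in enumerate(rows)}
    return r1, r2


def reassemble(r1, r2):
    """R = R1 JOIN R2 on TID."""
    return [{**r1[tid], **r2[tid]} for tid in sorted(r1.keys() & r2.keys())]


def otp_split(value: bytes):
    """One-time pad: R1 gets v XOR r, R2 gets the random pad r."""
    pad = secrets.token_bytes(len(value))
    return bytes(v ^ p for v, p in zip(value, pad)), pad


def otp_join(masked: bytes, pad: bytes) -> bytes:
    """XOR with the same pad recovers the original value."""
    return bytes(m ^ p for m, p in zip(masked, pad))


def additive_split(value: int):
    """Random addition: R1 gets v + r, R2 gets r (both mod MODULUS)."""
    r = secrets.randbelow(MODULUS)
    return (value + r) % MODULUS, r


def sum_from_shares(r1_shares, r2_shares) -> int:
    """Each server computes its own SUM; the client subtracts the two totals."""
    return (sum(r1_shares) - sum(r2_shares)) % MODULUS


def equality_tag(key: bytes, value: str) -> str:
    """Deterministic keyed tag: equal inputs map to equal tags, so an equality
    selection can be evaluated on tags. Not decryptable, unlike a real E_K."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
```

For example, `fragment(rows, ["Name"], ["SSN"])` obeys the {Name, SSN} constraint. For SUM, R1 sums its `v + r` column and R2 sums its `r` column. The client subtracts the two totals to get the true SUM, and neither server ever sees an individual value.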
18
Example
An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}
Privacy constraints:
- {Telephone}, {Email}
- {Name, Salary}, {Name, Position}, {Name, DoB}
- {DoB, Gender, ZipCode}
- {Position, Salary}, {Salary, DoB}
We will use just vertical fragmentation and encoding.
Decomposed schema:
- R1: {TID, Name, Email, Telephone, Gender, Salary}
- R2: {TID, Name, Email, Telephone, DoB, Position, ZipCode}
Encrypted attributes E: {Telephone, Email, Name}
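The decomposition can be checked against the constraints directly. This Python sketch uses the same rule as the SB 1386 slide. A decomposition ⟨R1, R2, E⟩ is acceptable when R1 and R2 together cover the relation and no server holds every attribute of some constraint in the clear. The helper `obeys` is hypothetical.

```python
# Sketch: does the decomposition <R1, R2, E> obey the Employee privacy constraints?
# A constraint is violated if one server sees all of its attributes unencrypted.

EMPLOYEE = {"Name", "DoB", "Position", "Salary", "Gender", "Email", "Telephone", "ZipCode"}

CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]

R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
E = {"Telephone", "Email", "Name"}


def obeys(r1, r2, encrypted, constraints, relation):
    """True iff R1 and R2 cover the relation and neither server can see
    every attribute of any constraint in the clear."""
    if not relation <= (r1 | r2):
        return False
    clear1, clear2 = r1 - encrypted, r2 - encrypted
    return all(not c <= clear1 and not c <= clear2 for c in constraints)


ok = obeys(R1, R2, E, CONSTRAINTS, EMPLOYEE)  # True
# Adding DoB to R1 in the clear would expose {Salary, DoB} at server 1.
broken = obeys(R1 | {"DoB"}, R2, E, CONSTRAINTS, EMPLOYEE)  # False
```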
19
Partitioning, Execution
- Partitioning problem
  – Partition to minimize communication cost for a given workload
  – Even a simplified version is hard to approximate
  – Hill-climbing algorithm, starting from a weighted set cover
- Query reformulation and execution
  – Consider only centralized plans
  – Algorithm to partition SELECT and WHERE clause predicates between the two partitions
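To make "hill climbing" concrete, here is a toy Python sketch. Everything beyond the idea of local moves is our assumption:
- The cost model: a query costs its frequency unless one server holds all of its attributes in the clear.
- The move set: move one attribute to the other server.
- The workload and the starting placement are both hypothetical.

The encrypted set E = {Name, Email, Telephone} is taken from the Employee example and held fixed.

```python
# Toy hill climbing over placements of the unencrypted Employee attributes.
# Hypothetical cost model: a query is free if one server holds all of its
# attributes in the clear, otherwise it costs its frequency.

CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
# Encrypted attributes (Name, Email, Telephone) are absent from `placement`.


def valid(placement):
    """No server may see every attribute of a constraint in the clear."""
    clear = {s: {a for a, site in placement.items() if site == s} for s in (1, 2)}
    return all(not c <= clear[1] and not c <= clear[2] for c in CONSTRAINTS)


def cost(placement, workload):
    total = 0
    for attrs, freq in workload:
        sites = {placement.get(a) for a in attrs}  # None marks an encrypted attribute
        if len(sites) != 1 or None in sites:
            total += freq
    return total


def hill_climb(placement, workload):
    current = dict(placement)
    assert valid(current), "start from a placement that obeys the constraints"
    while True:
        best, best_cost = None, cost(current, workload)
        for attr in current:
            candidate = {**current, attr: 3 - current[attr]}  # move attr to the other server
            if valid(candidate) and cost(candidate, workload) < best_cost:
                best, best_cost = candidate, cost(candidate, workload)
        if best is None:  # no improving move: local optimum
            return current
        current = best


workload = [({"DoB", "ZipCode"}, 5), ({"Gender", "Salary"}, 3)]
start = {"Gender": 1, "Salary": 1, "DoB": 2, "Position": 2, "ZipCode": 1}
final = hill_climb(start, workload)
```

From this start, the only valid improving move sends ZipCode to server 2. That move yields the R1/R2 split of the Employee example at zero cost under this toy model.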
20
Set Cover + Greedy for Partitioning
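The slide does not spell out the initialization, so the Python sketch below shows only the textbook greedy heuristic for weighted set cover. It rests on two assumptions of ours:
- Each attribute "covers" the privacy constraints that contain it.
- Encrypting a covering set of attributes gives a valid, if expensive, starting point for hill climbing.

The weights are hypothetical stand-ins for per-attribute encryption cost.

```python
# Textbook greedy heuristic for weighted set cover, used here to pick an
# initial set of attributes to encrypt. Weights are hypothetical.

CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
WEIGHT = {"Name": 1, "Telephone": 1, "Email": 1, "Salary": 2,
          "DoB": 2, "Position": 2, "Gender": 3, "ZipCode": 3}


def greedy_cover(constraints, weight):
    """Repeatedly pick the attribute with the lowest weight per newly covered
    constraint until every constraint contains a chosen attribute."""
    uncovered = [frozenset(c) for c in constraints]
    chosen = set()
    while uncovered:
        candidates = {a for c in uncovered for a in c}
        best = min(
            candidates,
            # cost-effectiveness first; ties broken alphabetically for determinism
            key=lambda a: (weight[a] / sum(a in c for c in uncovered), a),
        )
        chosen.add(best)
        uncovered = [c for c in uncovered if best not in c]
    return chosen


initial_E = greedy_cover(CONSTRAINTS, WEIGHT)
```

With these weights, Name is picked first because it hits three constraints at weight 1. Encrypting every chosen attribute satisfies all constraints whatever the placement; a later local search could then try decrypting attributes and fragmenting instead.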
21
Roadmap
- Motivation for secure databases
- Column-level distribution
  – Encryption, distribution
  – Privacy constraints
  – Set cover initialization
- Query mediation
  – Cost estimation
  – WHERE and SELECT clause processing
  – Query decomposition
- Experiments
- Related work
22
Cost Estimation
23
State Definitions
- 0: the condition clause cannot be pushed to either server
- 1: the condition clause can be pushed to Server 1
- 2: the condition clause can be pushed to Server 2
- 3: the condition clause can be pushed to both servers
- 4: the condition clause can be pushed to either server
24
OR State Evaluation
25
AND State Evaluation
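The OR and AND combination tables did not survive extraction, so the Python sketch below is our own reading of the five state definitions, not a reproduction of those tables. Its assumptions:
- Name uses deterministic encryption at both servers, so only an equality on Name can be pushed.
- State 4 means "the whole clause can run at either server".
- State 3 means "parts of the clause can be pushed to both servers".
- An OR cannot be split across servers.

```python
# Sketch: computing the five clause states for the running Employee example.
# Our own reading of the state definitions; the original OR/AND tables are lost.
from dataclasses import dataclass

R1_CLEAR = {"Gender", "Salary"}
R2_CLEAR = {"DoB", "Position", "ZipCode"}
DET_ENCRYPTED = {"Name"}  # assumption: deterministic encryption, stored at both servers


@dataclass(frozen=True)
class Leaf:
    attr: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    parts: tuple


@dataclass(frozen=True)
class Or:
    parts: tuple


def sites(clause):
    """Return (whole, pieces): servers able to evaluate the entire clause, and
    servers able to evaluate at least one conjunct of it."""
    if isinstance(clause, Leaf):
        whole = set()
        if clause.attr in R1_CLEAR:
            whole.add(1)
        if clause.attr in R2_CLEAR:
            whole.add(2)
        if clause.attr in DET_ENCRYPTED and clause.op == "=":
            whole |= {1, 2}
        return frozenset(whole), frozenset(whole)
    children = [sites(part) for part in clause.parts]
    whole = frozenset.intersection(*(w for w, _ in children))
    if isinstance(clause, Or):
        return whole, whole  # an OR cannot be split across servers
    return whole, frozenset().union(*(p for _, p in children))


def state(clause):
    whole, pieces = sites(clause)
    if whole == {1, 2}:
        return 4  # the whole clause can run at either server
    if pieces == {1, 2}:
        return 3  # parts of the clause can be pushed to both servers
    if pieces == {1}:
        return 1
    if pieces == {2}:
        return 2
    return 0  # nothing can be pushed down


name = Leaf("Name", "=", "Tom")
position = Leaf("Position", "=", "Staff")
or_clause = Or((Leaf("ZipCode", "=", "94305"), Leaf("Salary", ">", 60000)))
where = And((And((name, position)), or_clause))
```

Under these rules, (ZipCode = '94305' OR Salary > 60000) gets state 0 because its two leaves live on different servers. The full WHERE clause still gets state 3, since Name = 'Tom' and Position = 'Staff' can be pushed to different servers.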
26
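Query Partitioning
Original query:
SELECT Name, DoB, Salary FROM R WHERE (Name = 'Tom' AND Position = 'Staff') AND (ZipCode = '94305' OR Salary > 60000)
Query 1: SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom'
Query 2: SELECT TID, DoB, ZipCode FROM R2 WHERE Position = 'Staff'
Schemas:
- R1: {TID, Name, Email, Telephone, Gender, Salary}
- R2: {TID, Name, Email, Telephone, DoB, Position, ZipCode}
The predicate (ZipCode = '94305' OR Salary > 60000) cannot be pushed down, because ZipCode is only in R2 and Salary is only in R1. The client evaluates it after joining the two results on TID.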
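The split above can be reproduced with a short Python sketch. Its assumptions, all ours:
- The WHERE clause arrives as a list of top-level conjuncts. Each is tagged with its attributes and with whether it uses only equality.
- Name is deterministically encrypted, so an equality on Name can run at either server.
- A conjunct that fits either server goes to the one with fewer pushed predicates.
- Each needed column is fetched from R1 when R1 has it.

The function names are hypothetical. In a real system the client would also encrypt the constant 'Tom' before sending Query 1.

```python
# Sketch: split a conjunctive WHERE clause into per-server queries plus a
# client-side residual. Helper names and the placement heuristic are hypothetical.

R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
ENCRYPTED = {"Name", "Email", "Telephone"}
EQUALITY_OK = {"Name"}  # assumption: deterministic encryption for Name


def conjunct_sites(attrs, equality_only):
    """Servers that can evaluate a conjunct over `attrs` on their own."""
    def usable(a, rel):
        if a not in rel:
            return False
        return a not in ENCRYPTED or (equality_only and a in EQUALITY_OK)
    return {s for s, rel in ((1, R1), (2, R2)) if all(usable(a, rel) for a in attrs)}


def decompose(select, conjuncts):
    pushed, flexible, residual = {1: [], 2: []}, [], []
    for text, attrs, equality_only in conjuncts:
        sites = conjunct_sites(attrs, equality_only)
        if len(sites) == 1:
            pushed[sites.pop()].append(text)
        elif sites:
            flexible.append(text)
        else:
            residual.append((text, attrs))
    for text in flexible:  # heuristic: prefer the server with fewer filters
        pushed[1 if len(pushed[1]) <= len(pushed[2]) else 2].append(text)

    # Columns needed: the SELECT list plus anything the client-side residual uses.
    needed = dict.fromkeys(list(select) + [a for _, attrs in residual for a in sorted(attrs)])
    columns = {1: ["TID"], 2: ["TID"]}
    for a in needed:
        columns[1 if a in R1 else 2].append(a)

    queries = {}
    for s, rel in ((1, "R1"), (2, "R2")):
        sql = f"SELECT {', '.join(columns[s])} FROM {rel}"
        if pushed[s]:
            sql += " WHERE " + " AND ".join(pushed[s])
        queries[s] = sql
    return queries, [text for text, _ in residual]


queries, residual = decompose(
    ["Name", "DoB", "Salary"],
    [
        ("Name = 'Tom'", {"Name"}, True),
        ("Position = 'Staff'", {"Position"}, True),
        ("(ZipCode = '94305' OR Salary > 60000)", {"ZipCode", "Salary"}, False),
    ],
)
```

On the example, this produces the two queries on the slide and leaves the OR predicate for the client.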
27
Distributed Query Plan
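The plan diagram did not survive extraction. As a hedged illustration of what the client does with the two partial results, this Python sketch takes the results of Query 1 and Query 2. It joins them on TID, applies the residual predicate, and projects the original SELECT list. The rows are made up, and we assume the client has already decrypted Name.

```python
# Sketch of the client side of the plan: join the two partial results on TID,
# apply the residual predicate, then project the original SELECT list.
# Rows below are made-up illustrations, already decrypted by the client.

q1_rows = [  # result of Query 1 from R1 (WHERE Name = 'Tom')
    {"TID": 1, "Name": "Tom", "Salary": 72000},
    {"TID": 4, "Name": "Tom", "Salary": 50000},
    {"TID": 7, "Name": "Tom", "Salary": 55000},
]
q2_rows = [  # result of Query 2 from R2 (WHERE Position = 'Staff')
    {"TID": 1, "DoB": "1970-01-01", "ZipCode": "10001"},
    {"TID": 4, "DoB": "1980-05-05", "ZipCode": "94305"},
    {"TID": 9, "DoB": "1990-09-09", "ZipCode": "94305"},
]


def client_plan(left, right, residual, select):
    by_tid = {row["TID"]: row for row in right}
    out = []
    for row in left:
        match = by_tid.get(row["TID"])
        if match is None:
            continue  # this TID failed the other server's filter
        joined = {**row, **match}
        if residual(joined):
            out.append({a: joined[a] for a in select})
    return out


result = client_plan(
    q1_rows,
    q2_rows,
    residual=lambda r: r["ZipCode"] == "94305" or r["Salary"] > 60000,
    select=["Name", "DoB", "Salary"],
)
```

TID 7 drops out in the join, and TIDs 1 and 4 each satisfy one side of the OR. Only those two rows reach the answer.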
28
Roadmap
- Motivation for secure databases
- Column-level distribution
  – Encryption, distribution
  – Privacy constraints
  – Set cover initialization
- Query mediation
  – Cost estimation
  – WHERE and SELECT clause processing
  – Query decomposition
- Experiments
- Related work
29
Number of Iterations
30
Performance Gain Experiment
31
Iterations vs. Privacy Constraints
32
Papers
[CIDR05] Two Can Keep a Secret.
[PAIS11] Distributing Data for Secure Databases.
[SIGMOD05] Privacy Preserving OLAP.
[ICDT05] Anonymizing Tables.
[PODS06] Clustering for Anonymity.
[KDD07] Probabilistic Anonymity.
33
Acknowledgements: Collaborators
- Stanford Privacy Group
- TRDDC Privacy Group
- PORTIA, TRUST, Google
34
Backup Slides (March 18, 2011)