Lecture 11: Provenance and Data Privacy, December 8, 2010. Dan Suciu, CSEP544 Fall 2010.

Presentation transcript:

Outline
– Database provenance: slides based on Val Tannen’s keynote talk at EDBT 2010
– Data privacy: slides from my UW colloquium talk in 2005

Data Provenance
provenance, n. The fact of coming from some particular source or quarter; origin, derivation [Oxford English Dictionary]
Data provenance [BunemanKhannaTan 01] aims to explain how a particular result (in an experiment, simulation, query, workflow, etc.) was derived.
Most science today is data-intensive. Scientists, e.g., biologists and astronomers, worry about data provenance all the time.
03/24/10, EDBT Keynote, Lausanne

Provenance? Lineage? Pedigree?
Cf. Peter Buneman:
– Pedigree is for dogs
– Lineage is for kings
– Provenance is for art
For data, let’s be artistic (artsy?)

Database transformations?
– Queries
– Views
– ETL tools
– Schema mappings (as used in data exchange)

Outline
– What’s with the semirings? Annotation propagation [GK&T PODS 07, GKI&T VLDB 07]
– Housekeeping in the zoo of provenance models

Propagating annotations through database operations
JOIN (on B)
R (A B C):          … a b c …       annotated p
S (D B E):          … d b e …       annotated r
R ⋈ S (A B C D E):  … a b c d e …   annotated p ⋅ r
The annotation p ⋅ r means joint use of the data annotated by p and the data annotated by r.

Another way to propagate annotations
UNION
R (A B C):      … a b c …   annotated p
S (A B C):      … a b c …   annotated r
R ⋃ S (A B C):  … a b c …   annotated p + r
The annotation p + r means alternative use of data.

Another use of +
PROJECT
R (A B C):   … a b c1   annotated p
             … a b c2   annotated r
             … a b c3   annotated s …
Π_AB R (A B):  … a b …  annotated p + r + s
+ means alternative use of data.

An example in positive relational algebra (SPJU)
Q = σ_{C=e} Π_AC ( Π_AC R ⋈ Π_BC R ⋃ Π_AB R ⋈ Π_BC R )
R:
A  B  C   annotation
a  b  c   p
d  b  e   r
f  g  e   s
Q(R):
A  C   annotation
a  c   (p ⋅ p + p ⋅ p) ⋅ 0
a  e   p ⋅ r ⋅ 1
d  c   r ⋅ p ⋅ 0
d  e   (r ⋅ r + r ⋅ s + r ⋅ r) ⋅ 1
f  e   (s ⋅ s + s ⋅ r + s ⋅ s) ⋅ 1
For selection we multiply with two special annotations, 0 and 1.

Summary so far
– A space of annotations, K
– K-relations: every tuple annotated with some element from K.
– Binary operations on K: ⋅ corresponds to joint use (join), and + corresponds to alternative use (union and projection).
– We assume K contains special annotations 0 and 1.
– “Absent” tuples are annotated with 0!
– 1 is a “neutral” annotation (no restrictions).
Algebra of annotations? What are the laws of (K, +, ⋅, 0, 1)?
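The operations summarized above can be tried out in a few lines. The following Python sketch is an illustration and not part of the lecture: the names Semiring, relation, union, project, join and select are made up here, a K-relation is modelled as a schema plus a dictionary from tuples to annotations, and K is instantiated with the natural numbers (multiplicities), with every tuple of R annotated 1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # alternative use (union, projection)
    times: Callable[[Any, Any], Any]  # joint use (join)
    zero: Any                         # annotation of absent tuples
    one: Any                          # neutral annotation


# Assumption for this demo: annotations are multiplicities (bag semantics).
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)


def relation(K, schema, rows):
    """Build a K-relation; equal tuples are merged with +, zero means absent."""
    data = defaultdict(lambda: K.zero)
    for t, ann in rows:
        data[t] = K.plus(data[t], ann)
    return tuple(schema), {t: a for t, a in data.items() if a != K.zero}


def project(K, r, attrs):
    schema, rd = r
    idx = [schema.index(a) for a in attrs]
    return relation(K, attrs, [(tuple(t[i] for i in idx), a) for t, a in rd.items()])


def union(K, r, s):
    schema, rd = r
    _, sd = project(K, s, schema)  # align attribute order by name
    return relation(K, schema, list(rd.items()) + list(sd.items()))


def join(K, r, s):
    (rs, rd), (ss, sd) = r, s
    common = [a for a in rs if a in ss]
    extra = [a for a in ss if a not in rs]
    rows = []
    for t, a in rd.items():
        for u, b in sd.items():
            if all(t[rs.index(c)] == u[ss.index(c)] for c in common):
                rows.append((t + tuple(u[ss.index(e)] for e in extra), K.times(a, b)))
    return relation(K, rs + tuple(extra), rows)


def select(K, r, pred):
    """Selection multiplies each annotation with 1 (kept) or 0 (dropped)."""
    schema, rd = r
    rows = [(t, K.times(a, K.one if pred(dict(zip(schema, t))) else K.zero))
            for t, a in rd.items()]
    return relation(K, schema, rows)


R = relation(NAT, ("A", "B", "C"),
             [(("a", "b", "c"), 1), (("d", "b", "e"), 1), (("f", "g", "e"), 1)])
ac, bc, ab = (project(NAT, R, x) for x in (("A", "C"), ("B", "C"), ("A", "B")))
both = union(NAT, join(NAT, ac, bc), join(NAT, ab, bc))
_, result = select(NAT, project(NAT, both, ("A", "C")), lambda row: row["C"] == "e")
print(result)
```

With all input annotations equal to 1, the annotation of each output tuple is its number of derivations: 1 for a e, and 3 each for d e and f e.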

Annotated relational algebra
DBMS query optimizers assume certain equivalences:
– union is associative, commutative
– join is associative, commutative, distributes over union
– projections and selections commute with each other and with union and join (when applicable)
– etc., but no R ⋈ R = R ⋃ R = R (i.e., no idempotence, to allow for bag semantics)
Equivalent queries should produce the same annotations!
Proposition. The above identities hold for queries on K-relations iff (K, +, ⋅, 0, 1) is a commutative semiring.
Hence, for each commutative semiring K we have a K-annotated relational algebra.

What is a commutative semiring?
An algebraic structure (K, +, ⋅, 0, 1) where:
o K is the domain
o + is associative, commutative, with 0 identity
o ⋅ is associative, with 1 identity
o ⋅ distributes over +
o a ⋅ 0 = 0 ⋅ a = 0
(the conditions so far define a semiring)
o ⋅ is also commutative
Unlike a ring, there is no requirement for inverses to +.
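For a small finite K the laws listed above can be checked by brute force. The sketch below is only an illustration (the function name is_commutative_semiring is made up here) and says nothing about infinite structures such as the natural numbers.

```python
from itertools import product


def is_commutative_semiring(K, plus, times, zero, one):
    """Exhaustively test the commutative-semiring laws on a finite domain K."""
    K = list(K)
    for a in K:
        # identities for + and ., and annihilation by 0
        if plus(a, zero) != a or times(a, one) != a or times(one, a) != a:
            return False
        if times(a, zero) != zero or times(zero, a) != zero:
            return False
    for a, b in product(K, repeat=2):
        # commutativity of + and .
        if plus(a, b) != plus(b, a) or times(a, b) != times(b, a):
            return False
    for a, b, c in product(K, repeat=3):
        # associativity of + and ., distributivity of . over +
        if plus(plus(a, b), c) != plus(a, plus(b, c)):
            return False
        if times(times(a, b), c) != times(a, times(b, c)):
            return False
        if times(a, plus(b, c)) != plus(times(a, b), times(a, c)):
            return False
    return True


booleans_ok = is_commutative_semiring(
    [False, True], lambda a, b: a or b, lambda a, b: a and b, False, True)
chain_ok = is_commutative_semiring(range(3), max, min, 0, 2)
wrong_one = is_commutative_semiring(range(3), max, min, 0, 1)
print(booleans_ok, chain_ok, wrong_one)
```

The last call fails because 1 is not an identity for min on {0, 1, 2}: min(2, 1) is 1, not 2.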

Back to the example
R:
A  B  C   annotation
a  b  c   p
d  b  e   r
f  g  e   s
Q(R):
A  C   annotation
a  c   (p ⋅ p + p ⋅ p) ⋅ 0
a  e   p ⋅ r ⋅ 1
d  c   r ⋅ p ⋅ 0
d  e   (r ⋅ r + r ⋅ s + r ⋅ r) ⋅ 1
f  e   (s ⋅ s + s ⋅ r + s ⋅ s) ⋅ 1

Using the laws: polynomials
R (A B C): a b c annotated p; d b e annotated r; f g e annotated s
Q(R):
A  C   annotation
a  e   pr
d  e   2r² + rs
f  e   rs + 2s²
Polynomials with coefficients in N and annotation tokens as indeterminates p, r, s capture a very general form of provenance.
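The simplification from the raw annotations to polynomials is ordinary polynomial arithmetic. As a minimal sketch (the representation and the helper names var, const, add, mul and show are assumptions made for this illustration), a polynomial in N[X] can be stored as a map from monomials, written as sorted tuples of tokens, to natural-number coefficients:

```python
from collections import Counter


def var(x):
    return Counter({(x,): 1})


def const(n):
    return Counter({(): n}) if n else Counter()


def add(p, q):
    out = Counter(p)
    for m, c in q.items():
        out[m] += c
    return out


def mul(p, q):
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            # multiplying monomials concatenates their tokens
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out


def show(p):
    def mono(m, c):
        body = "".join(v if m.count(v) == 1 else f"{v}^{m.count(v)}"
                       for v in sorted(set(m)))
        return (str(c) if c != 1 or not body else "") + body
    return " + ".join(mono(m, c) for m, c in sorted(p.items())) or "0"


p, r, s = var("p"), var("r"), var("s")
a_c = mul(add(mul(p, p), mul(p, p)), const(0))                 # (p.p + p.p).0
a_e = mul(mul(p, r), const(1))                                 # p.r.1
d_e = mul(add(add(mul(r, r), mul(r, s)), mul(r, r)), const(1))
f_e = mul(add(add(mul(s, s), mul(s, r)), mul(s, s)), const(1))
print(show(a_c), "|", show(a_e), "|", show(d_e), "|", show(f_e))
```

Tuples whose polynomial is 0, such as a c here, are absent from the result, which is why the simplified table has only three rows.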

Provenance reading of the polynomials
R (A B C): a b c annotated p; d b e annotated r; f g e annotated s
Q(R):
A  C   annotation
a  e   pr
d  e   2r² + rs
f  e   rs + 2s²
Reading of the annotation 2r² + rs of d e:
– three different ways to derive d e
– two of the ways use only r, but they use it twice
– the third way uses r once and s once

Low-hanging fruit: deletion propagation
R (A B C): a b c annotated p; d b e annotated r; f g e annotated s
Q(R):
A  C   annotation
a  e   pr
d  e   2r² + rs
f  e   rs + 2s²
Delete d b e from R? Set r = 0!
Q(R) with r = 0:
A  C   annotation
a  e   0
d  e   0
f  e   2s²
So only f e, annotated 2s², remains in the result.
We used this in Orchestra [VLDB07] for update propagation.
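Setting a token to 0 is a purely syntactic operation on the polynomials: every monomial that mentions the token vanishes. A minimal sketch, assuming each polynomial is stored as a dictionary from monomials (tuples of tokens) to coefficients; the function name delete_token is made up here:

```python
def delete_token(poly, token):
    """Set `token` to 0: drop every monomial that mentions it."""
    return {m: c for m, c in poly.items() if token not in m}


# Annotations of Q(R) from the slide: pr, 2r^2 + rs, rs + 2s^2.
view = {
    ("a", "e"): {("p", "r"): 1},
    ("d", "e"): {("r", "r"): 2, ("r", "s"): 1},
    ("f", "e"): {("r", "s"): 1, ("s", "s"): 2},
}

after = {t: delete_token(poly, "r") for t, poly in view.items()}
# An empty dictionary is the zero polynomial, i.e. the tuple is absent.
after = {t: poly for t, poly in after.items() if poly}
print(after)
```

The view is updated without re-running Q: only f e survives, with annotation 2s².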

But are there useful commutative semirings?
(B, ∨, ∧, ⊥, ⊤)  Set semantics
(ℕ, +, ⋅, 0, 1)  Bag semantics
(P(Ω), ⋃, ⋂, ∅, Ω)  Probabilistic events [FuhrRölleke 97]
(BoolExp(X), ∨, ∧, ⊥, ⊤)  Conditional tables (c-tables) [ImielinskiLipski 84]
(R+ ⋃ {∞}, min, +, ∞, 0)  Tropical semiring (cost/distrust score/confidence need)
(A, min, max, 0, P) where A = P < C < S < T < 0  Access control levels [PODS 08]; P is public, T is top secret
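One way to see these semirings at work is to evaluate a provenance polynomial in each of them. The sketch below is an illustration, not from the slides: it takes the annotation 2r² + rs of the tuple d e, the token values are invented, and the access levels P < C < S < T < 0 are encoded as the ranks 0 to 4.

```python
from functools import reduce


def evaluate(poly, valuation, plus, times, zero, one):
    """Evaluate a polynomial {monomial: coefficient} in the semiring given."""
    total = zero
    for monomial, coeff in poly.items():
        term = reduce(times, (valuation[v] for v in monomial), one)
        for _ in range(coeff):  # coefficient n means term + ... + term (n times)
            total = plus(total, term)
    return total


d_e = {("r", "r"): 2, ("r", "s"): 1}  # 2r^2 + rs

# Set semantics: is d e in the answer if r is present and s is not?
in_answer = evaluate(d_e, {"r": True, "s": False},
                     lambda a, b: a or b, lambda a, b: a and b, False, True)

# Bag semantics: multiplicity of d e if r occurs twice and s three times.
multiplicity = evaluate(d_e, {"r": 2, "s": 3},
                        lambda a, b: a + b, lambda a, b: a * b, 0, 1)

# Tropical semiring: cheapest derivation if r costs 3 and s costs 1.
cost = evaluate(d_e, {"r": 3.0, "s": 1.0},
                min, lambda a, b: a + b, float("inf"), 0.0)

# Access control: ranks P=0, C=1, S=2, T=3, "0"=4; r is S, s is P.
clearance = evaluate(d_e, {"r": 2, "s": 0}, min, max, 4, 0)

print(in_answer, multiplicity, cost, clearance)
```

In the access-control reading, every derivation of d e uses the S-level tuple r, so the result is rank 2, i.e. S.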

Outline
– What’s with the semirings? Annotation propagation
– Housekeeping in the zoo of provenance models [GK&T PODS 07, FG&T PODS 08, Green ICDT 09]

Semirings for various models of provenance (1)
Running example: R (A B C) with a b c annotated p, d b e annotated r, f g e annotated s; we look at the annotation of the tuple d e in Q(R).
Lineage [CuiWidomWiener 00 etc.]: sets of contributing tuples
Semiring: (Lin(X), ⋃, ⋃*, ∅, ∅*)
Annotation of d e: {r, s}

Semirings for various models of provenance (2)
(Witness, Proof) why-provenance [BunemanKhannaTan 01] & [Buneman+ PODS08]: sets of witnesses (a witness is a set of contributing tuples)
Semiring: (Why(X), ⋃, ⋓, ∅, {∅})
Annotation of d e: {{r}, {r, s}}

Semirings for various models of provenance (3)
Minimal witness why-provenance [BunemanKhannaTan 01]: sets of minimal witnesses
Semiring: (PosBool(X), ∨, ∧, ⊥, ⊤)
Annotation of d e: {{r}}

Semirings for various models of provenance (4)
Notation: { } set, [ ] bag
Trio lineage [Das Sarma+ 08]: bags of sets of contributing tuples (of witnesses)
Semiring: (Trio(X), +, ⋅, 0, 1) (defined in [Green, ICDT 09])
Annotation of d e: [{r}, {r}, {r, s}]

Semirings for various models of provenance (5)
Notation: { } set, [ ] bag
Polynomials with boolean coefficients [Green, ICDT 09] (B[X]-provenance): sets of bags of contributing tuples
Semiring: (B[X], +, ⋅, 0, 1)
Annotation of d e: {[r, r], [r, s]}

Semirings for various models of provenance (6)
Provenance polynomials [GKT, PODS 07] (N[X]-provenance): bags of bags of contributing tuples
Semiring: (N[X], +, ⋅, 0, 1)
Annotation of d e: [[r, r], [r, r], [r, s]]

A provenance hierarchy
[Diagram, most informative at the top, least informative at the bottom:]
N[X]
B[X]   Trio(X)
Why(X)
Lin(X)   PosBool(X)

One semiring to rule them all… (apologies!)
[Diagram: the hierarchy N[X]; B[X], Trio(X); Why(X); Lin(X), PosBool(X)]
Example: 2x²y + xy + 5y² + z in N[X]
– drop exponents (Trio(X)): 3xy + 5y + z
– drop coefficients (B[X]): x²y + xy + y² + z
– drop both exp. and coeff. (Why(X)): xy + y + z
– collapse terms (Lin(X)): xyz
– apply absorption, ab + b = b (PosBool(X)): y + z
A path downward from K1 to K2 indicates that there exists an onto (surjective) semiring homomorphism h : K1 → K2.

Using homomorphisms to relate models
[Diagram: the hierarchy N[X]; B[X], Trio(X); Why(X); Lin(X), PosBool(X)]
Homomorphism? h(x+y) = h(x)+h(y), h(xy) = h(x)h(y), h(0) = 0, h(1) = 1
Moreover, for these homomorphisms h(x) = x
Example: 2x²y + xy + 5y² + z in N[X]
– drop exponents (Trio(X)): 3xy + 5y + z
– drop coefficients (B[X]): x²y + xy + y² + z
– drop both exp. and coeff. (Why(X)): xy + y + z
– collapse terms (Lin(X)): xyz
– apply absorption, ab + b = b (PosBool(X)): y + z
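The downward maps in the example can be written as small functions on a polynomial. This is a sketch under assumptions made here: a polynomial in N[X] is a dictionary from monomials to coefficients, a monomial is a tuple of (variable, exponent) pairs, and the function names are invented. It only handles nonzero polynomials; the special treatment of absent tuples in Lin(X) is ignored.

```python
from collections import Counter


def to_trio(poly):
    """Drop exponents, keep coefficients: a bag of sets of variables."""
    out = Counter()
    for mono, coeff in poly.items():
        out[frozenset(v for v, _ in mono)] += coeff
    return out


def to_bx(poly):
    """Drop coefficients, keep exponents: a set of monomials."""
    return set(poly)


def to_why(poly):
    """Drop both exponents and coefficients: a set of witnesses."""
    return {frozenset(v for v, _ in mono) for mono in poly}


def to_lin(poly):
    """Collapse all terms: the set of every contributing variable."""
    return frozenset(v for mono in poly for v, _ in mono)


def to_posbool(poly):
    """Apply absorption: keep only the minimal witnesses."""
    why = to_why(poly)
    return {w for w in why if not any(other < w for other in why)}


# 2x^2y + xy + 5y^2 + z
poly = {
    (("x", 2), ("y", 1)): 2,
    (("x", 1), ("y", 1)): 1,
    (("y", 2),): 5,
    (("z", 1),): 1,
}

trio, why, lin, posbool = to_trio(poly), to_why(poly), to_lin(poly), to_posbool(poly)
print(dict(trio), why, lin, posbool)
```

The outputs correspond to 3xy + 5y + z, xy + y + z, xyz and y + z from the slide; xy is absorbed because the witness {y} is strictly contained in {x, y}.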

Containment and Equivalence [Green ICDT 09]
[Figure: four copies of the provenance hierarchy (N[X], B[X], Trio(X), Why(X), Lin(X), PosBool(X), extended with B and N), one each for SPJ containment, SPJ equivalence, SPJU containment and SPJU equivalence.]

Data Security
Based on my colloquium talk from 2005

Data Security
Dorothy Denning, 1982: Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification
Data Security = Confidentiality + Integrity

Data Security
Distinct from systems and network security
– Assumes these are already secure
Tools:
– Cryptography, information theory, statistics, …
Applications:
– An enabling technology

Outline
– An attack
– Data security research today

Latanya Sweeney’s Finding
In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees
GIC has to publish the data: GIC(zip, dob, sex, diagnosis, procedure, ...)
This is private! Right?

Latanya Sweeney’s Finding
Sweeney paid $20 and bought the voter registration list for Cambridge, Massachusetts: VOTER(name, party, ..., zip, dob, sex)
GIC(zip, dob, sex, diagnosis, procedure, ...)
This is private! Right?

Latanya Sweeney’s Finding
William Weld (former governor) lives in Cambridge, hence is in VOTER
– 6 people in VOTER share his dob
– only 3 of them were men (same sex)
– Weld was the only one in that zip
Sweeney learned Weld’s medical records by matching VOTER and GIC on zip, dob, sex!
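The attack is just a join of the two published tables on the shared attributes zip, dob and sex. The sketch below uses entirely made-up placeholder rows (Voter A, Z1, D1, diagnosis-1 and so on are not real data) and an invented function name, link; it reports a medical record as re-identified only when exactly one voter matches.

```python
QUASI_IDENTIFIER = ("zip", "dob", "sex")


def link(voter, gic):
    """Join GIC with VOTER on (zip, dob, sex); keep only unique matches."""
    index = {}
    for v in voter:
        key = tuple(v[k] for k in QUASI_IDENTIFIER)
        index.setdefault(key, []).append(v["name"])
    hits = []
    for g in gic:
        names = index.get(tuple(g[k] for k in QUASI_IDENTIFIER), [])
        if len(names) == 1:  # exactly one voter fits: record re-identified
            hits.append((names[0], g["diagnosis"]))
    return hits


# Hypothetical toy data, for illustration only.
voter = [
    {"name": "Voter A", "zip": "Z1", "dob": "D1", "sex": "M"},
    {"name": "Voter B", "zip": "Z2", "dob": "D1", "sex": "M"},
    {"name": "Voter C", "zip": "Z1", "dob": "D1", "sex": "F"},
    {"name": "Voter D", "zip": "Z3", "dob": "D2", "sex": "F"},
    {"name": "Voter E", "zip": "Z3", "dob": "D2", "sex": "F"},
]
gic = [
    {"zip": "Z1", "dob": "D1", "sex": "M", "diagnosis": "diagnosis-1"},
    {"zip": "Z3", "dob": "D2", "sex": "F", "diagnosis": "diagnosis-2"},
]

reidentified = link(voter, gic)
print(reidentified)
```

The second GIC row matches two voters, so it stays ambiguous; the first matches only Voter A, which is the situation Weld was in.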

Latanya Sweeney’s Finding
All systems worked as specified, yet important data leaked
How do we protect against that?
Some of today’s research in data security addresses breaches that happen even if all systems work correctly

Today’s Approaches
K-anonymity
– Useful, but not really private
Differential privacy
– Private, but not really useful

k-Anonymity
Original table:
First     Last    Age  Race
Harry     Stone   34   Afr-Am
John      Reyser  36   Cauc
Beatrice  Stone   47   Afr-Am
John      Ramos   22   Hisp
Anonymized table:
First  Last   Age    Race
*      Stone  30-50  Afr-Am
John   R*     20-40  *
*      Stone  30-50  Afr-Am
John   R*     20-40  *
Definition: each tuple is equal to at least k-1 others
Anonymizing: through suppression and generalization
Hard: NP-complete for suppression only
Approximations exist [Samarati&Sweeney’98, Meyerson&Williams’04]
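The definition can be checked mechanically: the largest k for which a table is k-anonymous is the size of its smallest group of identical tuples. A minimal sketch on the two tables from the slide, with the race value spelled consistently as Afr-Am and with an invented function name, anonymity_level:

```python
from collections import Counter


def anonymity_level(rows):
    """Largest k such that every tuple equals at least k-1 other tuples."""
    return min(Counter(rows).values())


original = [
    ("Harry", "Stone", 34, "Afr-Am"),
    ("John", "Reyser", 36, "Cauc"),
    ("Beatrice", "Stone", 47, "Afr-Am"),
    ("John", "Ramos", 22, "Hisp"),
]
generalized = [
    ("*", "Stone", "30-50", "Afr-Am"),
    ("John", "R*", "20-40", "*"),
    ("*", "Stone", "30-50", "Afr-Am"),
    ("John", "R*", "20-40", "*"),
]

k_original = anonymity_level(original)
k_generalized = anonymity_level(generalized)
print(k_original, k_generalized)
```

The original table is only 1-anonymous (every row is unique), while the suppressed and generalized table is 2-anonymous.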

Differential Privacy [Dwork’05]
A randomized algorithm A is differentially private if, by removing/inserting one tuple in the database, the output of A is “almost the same”, i.e. every possible outcome for A has almost the same probability
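“Almost the same probability” is usually made precise with a privacy parameter. The symbols ε, D, D′ and S below are the customary notation and do not appear on the slide: D and D′ are any two databases that differ in one tuple, and S is any set of possible outcomes.

```latex
\Pr[A(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[A(D') \in S]
\qquad \text{for all } D, D' \text{ differing in one tuple and all } S
```

A smaller ε forces the two output distributions closer together, i.e. gives more privacy.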

Differential Privacy [Dwork’05]
How can we achieve that?
Add some random noise to the result of A
For example:
– Query: select count(*) from R where blah
– Add some random noise (Laplacian distribution: density proportional to e^(-|x|/x0))
Problem: can only ask a limited number of queries
– Must keep track of the queries answered, then deny
– Cannot release “the entire data”
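A noisy counting query with query tracking might look as follows. This is a sketch under assumptions that go beyond the slide: the class name NoisyCounter and the parameter epsilon are made up here, the noise scale x0 is set to 1/epsilon (a count changes by at most 1 when one tuple is added or removed), and "keep track, then deny" is modelled as a total budget from which each query's epsilon is subtracted.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


class NoisyCounter:
    """Answers count(*) queries with Laplace noise until the budget runs out."""

    def __init__(self, rows, total_budget, seed=None):
        self.rows = rows
        self.remaining = total_budget
        self.rng = random.Random(seed)

    def count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise PermissionError("privacy budget exhausted; query denied")
        self.remaining -= epsilon
        true_count = sum(1 for row in self.rows if predicate(row))
        return true_count + laplace_noise(1 / epsilon, self.rng)


db = NoisyCounter(list(range(100)), total_budget=1.0, seed=0)
first = db.count(lambda x: x < 40, epsilon=0.5)   # noisy version of 40
second = db.count(lambda x: x % 2 == 0, epsilon=0.5)
try:
    db.count(lambda x: x > 90, epsilon=0.5)
    denied = False
except PermissionError:
    denied = True
print(first, second, denied)
```

The third query is refused because the first two used up the whole budget, which is the limitation the slide points out: the data owner has to stay in the loop and cannot simply publish the table.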

Privacy
All these techniques address confidentiality, but they often claim privacy
Privacy is more complex:
– “Is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others” [Agrawal’03]

Take Home Lessons
– Data management does not stop at normal forms and query optimization
– Our field (Computer Science) is becoming data-centric, dominated by massive amounts of data. This affects businesses, science, society
– Watch the data management & data mining fields for exciting future innovations