Foundations of Privacy Lecture 5 Lecturer: Moni Naor

Recap of Last Week's Lecture
The Exponential Mechanism:
- Differentially private
- May yield good utility/approximation
- Is defined and evaluated by considering all possible answers
Counting Queries:
- The BLR Algorithm
- An efficient algorithm

Synthetic DB: The Output Is a DB
[Diagram: Database → Sanitizer → Synthetic DB; query 1, query 2, ... answered as answer 1, answer 2, answer 3 against the output]
The output is also a DB (of entries from the same universe X); the user reconstructs answers by evaluating each query on the output DB.
- Software- and people-compatible
- Consistent answers

Counting Queries
Queries with low sensitivity. Counting queries: C is a set of predicates c: U → {0,1}; a query asks how many participants in D satisfy c.
- Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: such error is anyway inherent in statistical analysis.
- Assume all queries are given in advance (non-interactive).
[Diagram: universe U, database D of size n, query c]
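
As a toy illustration (the records and the predicate here are invented for the example), a counting query is just a predicate summed over the records:

```python
# Toy counting query: how many records in D satisfy the predicate c?
def counting_query(c, D):
    return sum(1 for x in D if c(x))

D = [23, 41, 35, 62, 58, 29]      # e.g., ages of the n = 6 participants
c = lambda x: x >= 40             # a predicate c: U -> {0, 1}
print(counting_query(c, D))       # exact answer: 3
# A private mechanism may answer anywhere within +/- alpha of 3, w.h.p.
```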

The BLR Algorithm [Blum Ligett Roth 2008]
For DBs F and D: dist(F,D) = max_{q ∈ C} |q(F) − q(D)|.
Algorithm on input DB D: sample from a distribution on DBs of size m (m < n), where DB F gets picked w.p. ∝ e^{−ε·dist(F,D)}.
Intuition: far-away DBs get smaller probability.
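
To make the sampling step concrete, here is a brute-force sketch over a tiny universe (all names and values are illustrative; counts are normalized to fractions so that DBs of sizes m and n are comparable):

```python
import itertools
import math
import random

def dist(F, D, queries):
    # dist(F, D) = max over q in C of |q(F) - q(D)|, with counts
    # normalized to fractions since |F| = m differs from |D| = n.
    return max(abs(sum(map(q, F)) / len(F) - sum(map(q, D)) / len(D))
               for q in queries)

def blr_sample(D, U, m, eps, queries):
    # Enumerate every size-m DB over U and pick F with probability
    # proportional to exp(-eps * dist(F, D)).
    candidates = list(itertools.combinations_with_replacement(U, m))
    weights = [math.exp(-eps * dist(F, D, queries)) for F in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

U = range(5)
queries = [lambda x: x >= 2, lambda x: x >= 4]   # two counting predicates
D = (0, 1, 2, 3, 4, 4)
print(blr_sample(D, U, m=3, eps=1.0, queries=queries))
```

The enumeration over |U|^m candidates is exactly what makes the algorithm super-polynomial, as the running-time slide below notes.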

Counting Queries (restated)
- C is a set of predicates c: U → {0,1}; a query asks how many participants in D satisfy c.
- Relaxed accuracy: answer each query within α additive error w.h.p. (such error is anyway inherent in statistical analysis).
- Goal: a sample F of size m that approximates D on all the given predicates c.
[Diagram: universe U, database D of size n, query c]

The BLR Algorithm: Error Õ(n^{2/3} log|C|)
- There exists F_good of size m = Õ((n/α)² · log|C|) s.t. dist(F_good, D) ≤ α, so Pr[F_good] ∝ e^{−εα}.
- Any F_bad with dist ≥ 2α has Pr[F_bad] ∝ e^{−2εα}.
- Union bound: ∑ over bad DBs F_bad of Pr[F_bad] is ≲ |U|^m · e^{−2εα}.
- For α = Õ(n^{2/3} log|C|): Pr[F_good] >> ∑ Pr[F_bad].
(Recall the algorithm on input DB D: sample from a distribution on DBs of size m (m < n), where F is picked w.p. ∝ e^{−ε·dist(F,D)}.)
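
Filling in the arithmetic behind this choice of α (a sketch that suppresses constants and the precise log factors):

```latex
e^{-\varepsilon\alpha} \gg |U|^{m} e^{-2\varepsilon\alpha}
\iff \varepsilon\alpha \gtrsim m \log|U|
   = \tilde{O}\!\Big(\tfrac{n^{2}}{\alpha^{2}} \log|C| \log|U|\Big)
\;\Longrightarrow\;
\alpha^{3} \gtrsim \tfrac{n^{2} \log|C| \log|U|}{\varepsilon}
\;\Longrightarrow\;
\alpha = \tilde{O}\big(n^{2/3}\big).
```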

The BLR Algorithm: Running Time
Generating the distribution by enumeration: we need to enumerate every size-m database, where m = Õ((n/α)² · log|C|).
Running time ≈ |U|^{Õ((n/α)²·log|C|)}.
(Recall the algorithm on input DB D: sample from a distribution on DBs of size m (m < n), where F is picked w.p. ∝ e^{−ε·dist(F,D)}.)

Conclusion
- Offline algorithm; 2ε-differential privacy for any set C of counting queries.
- Error α is Õ(n^{2/3} log|C| / ε).
- Super-polynomial running time: |U|^{Õ((n/α)²·log|C|)}.

Can We Sanitize Efficiently?
- The good news: if the universe is small, we can sanitize EFFICIENTLY, in time poly(|C|, |U|).
- The bad news: we cannot do much better; namely, we cannot sanitize in time both sub-poly(|C|) AND sub-poly(|U|).

How Efficiently Can We Sanitize?

                |C| sub-poly   |C| poly
 |U| sub-poly   ?              ?
 |U| poly       ?              Good news!

The Good News: Can Sanitize When the Universe Is Small
An efficient sanitizer for query set C:
- DB size n ≥ Õ(|C|^{o(1)} log|U|); error is ~n^{2/3}
- Runtime poly(|C|, |U|)
- Output is a synthetic database
Compare to [Blum Ligett Roth]: n ≥ Õ(log|C| log|U|), but runtime super-poly(|C|, |U|).

Recursive Algorithm
C_0 = C ⊇ C_1 ⊇ C_2 ⊇ ... ⊇ C_b
- Start with DB D and a large query set C.
- Repeatedly choose a random subset C_{i+1} of C_i: shrink the query set by a (small) factor.
- At the end of the recursion: sanitize D w.r.t. the small query set C_b.
- The output is good for all queries in the small set C_{i+1}.
- Extract utility on almost all queries in the larger set C_i.
- Fix the remaining "underprivileged" queries in the large set C_i.

Recursive Algorithm Overview
- We want to sanitize DB D for query set C.
- Say we have a sanitizer A′ for smaller subsets C′ ⊆ C, where A′ outputs a small synthetic database.
- Choose a random C′ ⊆ C and sanitize D for C′ using A′.
- "Magic": the sanitization gives accurate answers on all but a small subset B ⊆ C. (Why? How? Where?)
- Fix the "underprivileged" queries in B "manually".
[Diagram: query set C; A′ sanitizes the subset C′; the subset B is fixed manually]

Sanitize for a Few Queries, Get Utility for Almost All
Consider an m-bit synthetic-DB output y of A′ vs. DB D. If y is "bad" for a query set B_y of fractional size ≥ m/s, then
Pr_{C′}[C′ ∩ B_y = ∅] ≤ (1 − m/s)^{|C′|} ≈ e^{−m}.
W.h.p., simultaneously for all potential m-bit outputs y with a large set B_y of bad queries, C′ intersects B_y (an Occam's-razor argument).
So y* = A′(D), which is good for all of C′, is good for almost all of C.
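
Spelling out the union bound behind "w.h.p. simultaneously for all y" (a sketch, taking |C′| = s and union-bounding over the 2^m potential outputs):

```latex
\Pr_{C'}\Big[\exists\, y:\ |B_y| \ge \tfrac{m}{s}|C| \ \text{and}\ C' \cap B_y = \emptyset\Big]
\;\le\; 2^{m}\Big(1-\tfrac{m}{s}\Big)^{|C'|}
\;\approx\; 2^{m} e^{-m}
\;=\; (2/e)^{m} \;\ll\; 1.
```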

How to Get a Synthetic DB? The Syntheticizer
Problem: we need a small synthetic DB, but we have a large output of another form.
Lemma ["Syntheticizer"]: Given a sanitizer A with α-accuracy and arbitrary output, we can produce a sanitizer A′ with 2α-accuracy whose output is a synthetic DB of size Õ(log|C| / α²). The runtime is poly(|U|, |C|).
Idea: transform the output into a synthetic DB using linear programming, with a variable per item in U and a constraint per query in C.

The Linear Program
- Run the sanitizer A and use it to get differentially private counts v_c on all the concepts in C. (The database is never used again, hence privacy.)
- Come up with a low-weight fractional database that approximates these counts.
- Transform this fractional database into a standard synthetic database by rounding the fractional counts.

For all i ∈ U, a variable x_i. For all c ∈ C, a constraint:
v_c − α ≤ ∑_{i : c(i)=1} x_i ≤ v_c + α

The Linear Program (cont.)
Why is there a fractional solution? The real, integer database is one example!
Rounding (a code sketch follows):
- Scale the fractional database so that its total weight is 1.
- Round each fractional point down to the closest multiple of α/|U|.
- Treat the rounded fractional database as an integer synthetic database of size at most |U|/α.
- If it is too large, subsample.
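
A minimal sketch of the LP and the rounding, assuming hypothetical noisy counts `v` already released by the sanitizer A (universe, queries, and numbers are toy values; uses scipy):

```python
import numpy as np
from scipy.optimize import linprog

def synthetic_db(U, queries, v, alpha):
    # One variable x_i per universe item; one two-sided constraint per
    # query c: v_c - alpha <= sum_{i: c(i)=1} x_i <= v_c + alpha.
    A = np.array([[float(c(i)) for i in U] for c in queries])
    A_ub = np.vstack([A, -A])
    b_ub = np.concatenate([v + alpha, -(v - alpha)])
    res = linprog(c=np.zeros(len(U)), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * len(U))   # feasibility LP
    assert res.success
    x = res.x / res.x.sum()                      # total weight 1
    # Round down to multiples of alpha/|U|: at most |U|/alpha items.
    mult = np.floor(x / (alpha / len(U))).astype(int)
    return [item for item, k in zip(U, mult) for _ in range(k)]

U = list(range(8))
queries = [lambda i: i >= 4, lambda i: i % 2 == 0]
v = np.array([3.2, 4.1])                         # hypothetical noisy counts
print(synthetic_db(U, queries, v, alpha=0.5))
```

The LP is feasible because the real integer database itself satisfies every constraint, exactly as the slide argues.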

How Do We Use a Synthetic DB? Why Synthetic?
1. Easy to "shrink" DBs by subsampling Õ(log|C|/α²) DB items.
2. Gives counts for every query: the output is well-defined even for queries that were not around at sanitization time.

Utility for All Queries: First Attempt
- Sanitizing a small C′ is easy ("brute force"), and we can "shrink" the result using the syntheticizer.
- Subsample a small C′ and get utility for all but a few queries.
- Repeat many times and take the majority.
This doesn't work: some queries remain underprivileged.
[Diagram: query set C with sampled subsets C′, C″ and a bad set B]

Utility for All Queries: Fixing the "Underprivileged"
Lemma: Given a query set C and a differentially private sanitizer A that
1. works for every C′ ⊆ C with |C′| = s, and
2. outputs a synthetic DB of size ≤ m,
we get a sanitizer for C with utility on all queries. Need DB size n ≥ Õ(|C|m/s).

Proof Outline
- Subsample a small C′ and get a synthetic DB that works for all but a few (~|C|m/s) "underprivileged" queries.
- Now "manually" correct those few by "brute force": release noisy counts v_c (noise ~|C|m/s).
- We also need to say which queries are underprivileged, and that depends on the DB D. What about privacy?
- Key point: regardless of D, almost all queries are strongly privileged. Release a noisy indicator vector; for the privacy analysis, we need only consider the ~|C|m/s potentially underprivileged queries.

Recursive Algorithm: Recap
C_0 = C ⊇ C_1 ⊇ C_2 ⊇ ... ⊇ C_b
- Start with DB D and a large query set C.
- Repeatedly choose a random subset C_{i+1} of C_i: shrink by a factor f.
- Sanitize D w.r.t. the small C_b (use the "brute force" sanitizer).
- The syntheticizer transforms the output into a small synthetic DB.
- Fix the "underprivileged" queries (need n ≥ Õ(f)).
- Lose 2^b in accuracy; "brute force" needs n ≥ 2^b |C_b|.
- Get n ≥ |C|^{o(1)} by trading off b and f.
(See the pseudocode sketch below.)
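
The recursion, written as structural pseudocode (the three helpers are hypothetical stubs standing in for the subroutines described above, not the lecture's actual procedures):

```python
import random

def brute_force_sanitize(D, C):       # stub: e.g., release noisy counts
    return {c: sum(map(c, D)) for c in C}

def syntheticize(output):             # stub: the LP + rounding step above
    return output

def fix_underprivileged(F, D, C):     # stub: noisy fixes for the few
    return F                          # inaccurate queries in C

def recursive_sanitize(D, C, f, b):
    # Shrink the query set b times by a factor f: C_0 = C, ..., C_b.
    chain = [list(C)]
    for _ in range(b):
        chain.append(random.sample(chain[-1], max(1, len(chain[-1]) // f)))
    # Base case: brute-force sanitize w.r.t. the small C_b, then shrink
    # the output into a small synthetic DB.
    F = syntheticize(brute_force_sanitize(D, chain[-1]))
    # Unwind: at each level C_i, F is good for almost all queries;
    # repair the underprivileged ones manually.
    for Ci in reversed(chain[:-1]):
        F = fix_underprivileged(F, D, Ci)
    return F
```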

And Now... Bad News
For large C and U we can't get efficient sanitizers: the runtime cannot be sub-poly in both |C| and |U|. Two cases:
- The output is a synthetic DB (as in the positive result).
- General output: the Exponential Mechanism cannot be implemented.
We want hardness... got crypto?

Digital Signatures
A signature scheme has a signing key sk and a verification key vk; signatures can be built from any one-way function [NaYu, Ro].
[Diagram: valid pairs (m_1, sig(m_1)), (m_2, sig(m_2)), ..., (m_n, sig(m_n)) under vk]
It is hard to forge a new valid pair (m′, sig(m′)).

Signatures ⇒ No Synthetic DB
- Universe: pairs (m, s) of message and signature.
- Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk.
- Input DB: valid pairs (m_1, sig(m_1)), ..., (m_n, sig(m_n)), all under the same vk.
- An accurate synthetic output (m′_1, s_1), ..., (m′_k, s_k) must consist mostly of valid signatures under that same vk.
- By unforgeability, an efficient sanitizer cannot create new valid pairs, so input pairs must appear in the output: no privacy!
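
A minimal sketch of this counterexample's query, using Ed25519 from the `cryptography` package (the records and helper names are illustrative):

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

sk = Ed25519PrivateKey.generate()
vk = sk.public_key()

def c_vk(pair):
    # c_vk(m, s) = 1 iff s is a valid signature of m under vk.
    m, s = pair
    try:
        vk.verify(s, m)
        return 1
    except InvalidSignature:
        return 0

# Input DB: n valid (message, signature) pairs under vk.
D = [(msg, sk.sign(msg)) for msg in (b"rec0", b"rec1", b"rec2")]
assert sum(map(c_vk, D)) == len(D)

# An accurate synthetic DB must also score high on c_vk, i.e. contain
# mostly valid pairs; unforgeability says an efficient sanitizer cannot
# mint fresh ones, so it must copy input records -- no privacy.
assert c_vk((b"new message", b"\x00" * 64)) == 0   # a forgery fails
```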

Can We Output a Synthetic DB Efficiently?

                |C| sub-poly   |C| poly
 |U| sub-poly   ?              ?
 |U| poly       ?              ?

Where Is the Hardness Coming From?
The signature example: it is hard to satisfy a given query, but easy to maintain utility for all queries but one.
More natural: each individual query is easy to satisfy, but it is hard to maintain utility for most queries.

Hardness on Average
- Universe: triples (vk, m, s) of key, message, and signature.
- Queries: c_i(vk, m, s) = the i-th bit of ECC(vk); c_v(vk, m, s) = 1 iff s is a valid signature of m under vk.
- Input DB: (vk, m_1, sig(m_1)), ..., (vk, m_n, sig(m_n)), all valid under the same vk.
- Sanitized output DB: (vk′_1, m′_1, s_1), ..., (vk′_k, m′_k, s_k). Are these keys related to vk? Yes: at least one is vk!

Hardness on Average (cont.)
- Accuracy on the queries c_i means: for every i, at least 3/4 of the vk′_j agree with ECC(vk)[i].
- Hence there exists a vk′_j such that ECC(vk′_j) and ECC(vk) are 3/4-close.
- Since ECC is an error-correcting code with large distance, this forces vk′_j = vk.
- Accuracy on c_v then gives a valid signature (m′_j, s_j) under vk, so m′_j appears in the input. No privacy!
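
The averaging step from "for every i, 3/4 of the keys agree" to "some single key is 3/4-close" (a sketch; i ranges over the ℓ codeword coordinates):

```latex
\frac{1}{k}\sum_{j=1}^{k} \Pr_{i \in [\ell]}\big[\mathrm{ECC}(vk'_j)_i = \mathrm{ECC}(vk)_i\big]
= \mathbb{E}_{i}\!\left[\frac{\#\{j : \mathrm{ECC}(vk'_j)_i = \mathrm{ECC}(vk)_i\}}{k}\right]
\ge \frac{3}{4},
```

so some j attains agreement ≥ 3/4. If the code has relative distance > 1/2, two distinct codewords agree on fewer than half the coordinates, so that vk′_j must equal vk.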

Can We Output a Synthetic DB Efficiently?
Revisiting the |C| vs. |U| table (sub-poly vs. poly): the hard cells are now filled in, via the signature construction, via hardness on average, and via PRFs.

General-Output Sanitizers
Theorem: Traitor-tracing schemes exist if and only if sanitizing is hard.
- There is a tight connection between the |U| and |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing.
- The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme.

Traitor Tracing: The Problem
- A center transmits a message to a large group.
- Some users leak their keys to pirates.
- The pirates construct a clone: an unauthorized decryption device.
- Given a pirate box, we want to find out who leaked the keys.
[Diagram: E(Content) → pirate box containing keys K_1, K_3, K_8 → Content; the traitors' "privacy" is violated!]

Equivalence of TT and Hardness of Sanitizing
- Ciphertext ↔ Query
- Key ↔ Database entry
- TT pirate ↔ Sanitizer (for a distribution over collections of DBs)

Traitor Tracing ⇒ Hard Sanitizing
Theorem: If there exists a TT scheme with ciphertext length c(n) and key length k(n), we can construct:
1. a query set C of size ≈ 2^{c(n)},
2. a data universe U of size ≈ 2^{k(n)},
3. a distribution D on n-user databases with entries from U,
such that D is "hard to sanitize": there exists a tracer that can extract an entry of D from any sanitizer's output, violating its privacy.
The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.