Foundations of Privacy Lecture 6 Lecturer: Moni Naor.


Recap of last week's lecture
Counting queries:
–The BLR algorithm
–Efficient algorithm
–Hardness results

Synthetic DB: Output is a DB
[diagram: Database → Sanitizer → Synthetic DB; user poses query 1, query 2, … and reads off answer 1, answer 2, answer 3]
–The output is also a database (of entries from the same universe X); the user reconstructs answers by evaluating each query on the output DB
–Software- and people-compatible
–Consistent answers

Counting Queries
Queries with low sensitivity:
–Counting queries: C is a set of predicates c: U → {0,1}
–Query: how many participants of D satisfy c?
–Relaxed accuracy: answer each query within α additive error w.h.p. (not so bad: such error is inherent in statistical analysis anyway)
–Database D of size n over universe U
–Non-interactive setting: assume all queries are given in advance
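To make the counting-query semantics concrete, here is a minimal Python sketch (not from the lecture; the function names are illustrative) of evaluating a counting query and checking the α-additive-error utility guarantee against a synthetic database of possibly different size:

```python
def counting_query(db, c):
    """Counting query: the number of rows of db satisfying predicate c: U -> {0,1}."""
    return sum(1 for row in db if c(row))

def within_alpha(db, synthetic, c, alpha):
    """Utility check: the synthetic DB's (rescaled) answer is within additive alpha.
    Rescaling by |db|/|synthetic| handles a synthetic DB of a different size."""
    est = counting_query(synthetic, c) * len(db) / len(synthetic)
    return abs(est - counting_query(db, c)) <= alpha
```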

And Now… Bad News
–For large C and U we cannot get efficient sanitizers: runtime cannot be sub-polynomial in |C| or |U|
–Holds both when the output is a synthetic DB (as in the positive result) and for general output
–The exponential mechanism cannot be implemented efficiently
–Want hardness… got crypto?

Showing (Cryptographic) Hardness
Have to come up with a universe U and a concept class C, and a distribution on
–databases
–concepts
that is hard to sanitize. The distribution may use cryptographic primitives.

Digital Signatures
–Key pair (sk, vk): signing and verification keys
–Can be built from any one-way function [NaYu, Ro]
–Given valid signed pairs (m_1, sig(m_1)), (m_2, sig(m_2)), …, (m_n, sig(m_n)) under vk, it is hard to forge a new valid pair (m', sig(m'))

Signatures ⇒ No Synthetic DB
–Universe: pairs (m, s) of message and signature
–Queries: c_vk(m, s) outputs 1 iff s is a valid signature of m under vk
–Input database: (m_1, sig(m_1)), …, (m_n, sig(m_n)), all valid signatures under vk
–The sanitizer outputs (m'_1, s_1), …, (m'_k, s_k); to preserve utility, most must be valid signatures under the same vk
–By unforgeability, any valid pair in the output already appears in the input: no privacy!

Can We Output a Synthetic DB Efficiently?

            |U| subpoly   |U| poly
|C| subpoly      ?            ?
|C| poly         ?

Where is the Hardness Coming From?
Signature example:
–Hard to satisfy a given query
–Easy to maintain utility for all queries but one
More natural:
–Easy to satisfy each individual query
–Hard to maintain utility for most queries

Hardness on Average
–Universe: triples (vk, m, s) of key, message, and signature
–Queries: c_i(vk, m, s) = i-th bit of ECC(vk), where ECC is an error-correcting code; c_v(vk, m, s) = 1 iff s is a valid signature of m under vk
–Input database: (vk, m_1, sig(m_1)), …, (vk, m_n, sig(m_n)), valid signatures under a single vk
–The sanitizer outputs (vk'_1, m'_1, s_1), …, (vk'_k, m'_k, s_k)
–Are these keys related to vk? Yes! At least one is vk!

Hardness on Average (cont.)
–Samples: (vk, m, s); queries c_i and c_v as above
–Utility ⇒ ∀i, 3/4 of the vk'_j agree with ECC(vk)[i]
–⇒ ∃ vk'_j s.t. ECC(vk'_j) and ECC(vk) are 3/4-close
–⇒ vk'_j = vk (by the distance of the error-correcting code)
–Unforgeability ⇒ m'_j appears in the input. No privacy!
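The decoding step in this argument can be illustrated with a toy sketch (my own illustration, using a simple repetition code in place of a real high-distance ECC): if enough of the output keys' encodings agree with ECC(vk) in every coordinate, coordinate-wise majority recovers ECC(vk), and the closest candidate is vk itself.

```python
def ecc(vk: str, rep: int = 3) -> str:
    """Toy 'ECC': a repetition code (a real construction needs a
    high-distance code, but the decoding idea is the same)."""
    return "".join(bit * rep for bit in vk)

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def recover_vk(candidates):
    """If >= 3/4 of the candidates' encodings agree with ECC(vk) in every
    coordinate, the coordinate-wise majority word equals ECC(vk); the
    candidate whose encoding is closest to the majority word is vk."""
    encs = [ecc(v) for v in candidates]
    n = len(encs[0])
    majority = "".join(
        "1" if sum(e[i] == "1" for e in encs) * 2 > len(encs) else "0"
        for i in range(n))
    return min(candidates, key=lambda v: hamming(ecc(v), majority))
```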

Where is Hardness Coming From?
–Signature example: hard to satisfy a given query; easy to maintain utility for all queries but one
–More natural: easy to satisfy each individual query; hard to maintain utility for most queries
–Ullman–Vadhan: even marginals on 2 variables are hard

Can We Output a Synthetic DB Efficiently?

            |U| subpoly   |U| poly
|C| subpoly      ?            ?
|C| poly         ?

(the hardness results filling this table: Signatures, Hardness on Average, and Using PRFs)

Hardness with PRFs
–Let F = {f_s | s seed} be a family of pseudorandom functions f_s: [ℓ] → [ℓ], with seed length k
–Pseudorandom functions: a family of efficiently computable functions such that a random function from the family is indistinguishable (via black-box access) from a truly random function
–Data universe: U = {(a, b) : a, b ∈ [ℓ]}
–Concept class (of polynomial size): C = {c_s | s seed}, where c_s((a, b)) = 1 iff f_s(a) = b

The Hard-to-Sanitize Distribution
The distribution D on samples:
–Generate a key s ∈ {0,1}^k
–Generate n distinct elements a_1, …, a_n ∈ [ℓ]
–The i-th entry of the database X is x_i = (a_i, f_s(a_i))
Claim: any differentially private sanitizer A cannot be better than 1/3 correct.
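A sketch of sampling from this distribution, assuming HMAC-SHA256 as a stand-in for the PRF family (the slides only assume some PRF; the function names here are illustrative):

```python
import hashlib, hmac, secrets

def prf(seed: bytes, a: int, ell: int) -> int:
    """Stand-in PRF f_s: [ell] -> [ell], built from HMAC-SHA256
    (an illustrative choice; any PRF family works)."""
    digest = hmac.new(seed, a.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % ell

def hard_database(n: int, ell: int, seed_len: int = 16):
    """Sample X = ((a_1, f_s(a_1)), ..., (a_n, f_s(a_n))) with distinct a_i."""
    s = secrets.token_bytes(seed_len)
    a_vals = secrets.SystemRandom().sample(range(ell), n)
    return s, [(a, prf(s, a, ell)) for a in a_vals]

def concept(s: bytes, ell: int):
    """The concept c_s: c_s((a, b)) = 1 iff f_s(a) = b."""
    return lambda ab: 1 if prf(s, ab[0], ell) == ab[1] else 0
```

Every input entry satisfies c_s, while for a fresh a the sanitizer cannot predict f_s(a) better than guessing — which is exactly the tension the claim exploits.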

The Hard-to-Sanitize Distribution (cont.)
–f_s is a pseudorandom function: with overwhelming probability over the choice of seed s, for any a ∈ [ℓ] that does not appear in a_1, …, a_n, a sanitizer A cannot predict f_s(a) any better than it could predict a truly random function
–Expect: no more than a (1/ℓ + neg())-fraction of the a's in A(X) that are not in X to appear with the correct b
–Suppose this event does not occur, i.e. some a outside X appears with the correct b with probability noticeably greater than 1/ℓ: this would distinguish f_s from a random function
–Since all of the items in the input X satisfy the concept c_s, an accurate sanitizer must mostly output pairs satisfying c_s, and such pairs must come from X

General Output Sanitizers
Theorem: Traitor-tracing schemes exist if and only if sanitizing is hard.
–Tight connection between the |U|, |C| that are hard to sanitize and the key and ciphertext sizes in traitor tracing
–The separation between efficient and non-efficient sanitizers uses the [BoSaWa] scheme

Traitor Tracing: The Problem
–A center transmits a message to a large group
–Some users leak their keys to pirates
–The pirates construct a clone: an unauthorized decryption device
–Given a pirate box, we want to find who leaked the keys
[diagram: E(Content) fed to a pirate box built from keys K_1, K_3, K_8, which outputs Content]
The traitors' "privacy" is violated!

Traitor Tracing ⇒ Hard Sanitizing
A (private-key) traitor-tracing scheme consists of algorithms Setup, Encrypt, Decrypt and Trace:
–Setup: generates a key bk for the broadcaster and N subscriber keys k_1, …, k_N
–Encrypt: given a bit b, generates a ciphertext using the broadcaster's key bk (need semantic security!)
–Decrypt: takes a given ciphertext and retrieves the original bit using any of the subscriber keys
–Trace: gets bk and oracle access to a pirate decryption box; outputs an index i ∈ {1, …, N} of a key k_i used to create the pirate box

Simple Example of Tracing Traitors
–Let E_K(m) be a good shared-key encryption scheme
–Key generation: generate independent keys for E; bk = (k_1, …, k_N)
–Encrypt: for bit b, generate independent ciphertexts E_{k_1}(b), E_{k_2}(b), …, E_{k_N}(b)
–Decrypt using k_i: decrypt the i-th ciphertext
–Tracing algorithm: via a hybrid argument
–Properties: ciphertext length N, key length 1

Equivalence of TT and Hardness of Sanitizing

Traitor Tracing   Hard Sanitizing
Key               Database entry
Ciphertext        Query (collection of)
Pirate            Sanitizer for a distribution of DBs

Traitor Tracing ⇒ Hard Sanitizing
Theorem: If there exists a TT scheme with
–ciphertext length c(n),
–key length k(n),
we can construct:
1. A query set C of size ≈ 2^c(n)
2. A data universe U of size ≈ 2^k(n)
3. A distribution D on n-user databases with entries from U
D is "hard to sanitize": there exists a tracer that can extract an entry in D from any sanitizer's output, violating its privacy!
The separation between efficient and non-efficient sanitizers uses the [BoSaWa06] scheme.


Collusion
–An important parameter of a traitor-tracing scheme is its collusion-resistance
–A scheme is t-resilient if tracing is guaranteed to work as long as no more than t keys were used to create the pirate decoder
–When t = N the scheme is said to be fully resilient
–Other parameters: the ciphertext and private-key lengths c(n) and k(n)
–One-time t-resilient TT scheme: semantic security is guaranteed only against adversaries given a single ciphertext (this is all we need here)

–Data universe: all possible keys, U = {0,1}^k(n)
–Concept class C: a concept for every possible ciphertext, i.e. for every m ∈ {0,1}^c(n); the concept c_m, on input a key-string K, outputs the decryption of m using the key K
–Hard-to-sanitize distribution: run Setup to generate n decryption keys for the users; these form the database X

Any sanitizer that maintains utility can be viewed as an adversary that outputs an "object" that decrypts encryptions of 0 or 1 correctly. We can then use the traitor-tracing algorithm on such a sanitizer to trace one of the keys in the sanitizer's input.

From Hard to Sanitize to Tracing Traitors
Given hard-to-sanitize distributions, we can create a weak TT scheme:
–Key generation: generate a database of individuals; each key is a separate subset of it
–Ciphertexts correspond to queries: knowing the individuals allows approximating the query on the database
–Need coordination between the different parties, since the approximations may differ

Interactive Model
[diagram: Data → Sanitizer, receiving query 1, query 2, …]
Multiple queries, chosen adaptively.

Counting Queries: Answering Queries Interactively
–Counting queries: C is a set of predicates c: U → {0,1}
–Query: how many participants of D satisfy c?
–Relaxed accuracy: answer each query within α additive error w.h.p. (not so bad: such error is inherent in statistical analysis anyway)
–Database D of size n over universe U
–Interactive setting: queries are given one by one and should be answered as they arrive

Can we answer queries when not known in advance?
–Can always answer with independent noise, but this limits us to a number of queries smaller than the database size
–We do not know the future, but we do know the past! Can answer based on past answers
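A sketch of the independent-noise baseline, assuming the standard Laplace mechanism for a single counting query (sensitivity 1, so noise scale 1/ε):

```python
import math, random

def laplace_noise(scale: float) -> float:
    # sample from Laplace(0, scale) via the inverse CDF
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(db, c, epsilon: float) -> float:
    """One counting query answered with independent Laplace noise.
    A counting query has sensitivity 1, so scale 1/epsilon suffices for
    epsilon-differential privacy of this single answer."""
    return sum(1 for row in db if c(row)) + laplace_noise(1.0 / epsilon)
```

Answering k queries this way costs k·ε in the privacy budget, which is why the approach breaks down once k approaches the database size.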

Idea: Maintain a List of Possible Databases
–Start with D_0 = the list of all databases of size m
–Each round j:
 –if the list D_{j-1} is representative: answer according to the average database in the list
 –otherwise: prune the list to maintain consistency, obtaining D_j

The Algorithm in Detail
–Initialize D_0 = {all databases of size m over U}
–In each round, D_{j-1} = {x_1, x_2, …} where each x_i has size m
–For each query c_1, c_2, …, c_k in turn:
 –Let A_j ← Average_{i ∈ D_{j-1}} min{d(x*, x_i), √n} (low sensitivity!)
 –If A_j is small (compared to a noisy threshold): answer according to the median database in D_{j-1}, and set D_j ← D_{j-1}
 –If A_j is large: remove all databases that are far away to get D_j, and give the true answer plus noise
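A simplified sketch of this maintain-a-list mechanism (my own toy version: fractional counting queries, a small fixed candidate list standing in for all size-m databases, and d(x*, x_i) taken as the disagreement on the current query):

```python
import math, random

def laplace(scale: float) -> float:
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def frac_answer(db, q):
    # fractional counting-query answer on a database
    return sum(1 for r in db if q(r)) / len(db)

def interactive_sanitizer(true_db, candidates, queries, alpha, epsilon):
    D = list(candidates)            # surviving candidate databases
    scale = 1.0 / (epsilon * len(true_db))
    out = []
    for q in queries:
        true_ans = frac_answer(true_db, q)
        # average (capped) disagreement of the candidates with the true answer
        A = sum(min(abs(frac_answer(x, q) - true_ans), 1.0) for x in D) / len(D)
        if A + laplace(scale) < alpha:
            # representative list: answer with the median candidate answer
            vals = sorted(frac_answer(x, q) for x in D)
            out.append(vals[len(vals) // 2])
        else:
            # large round: release a noisy true answer, prune far candidates
            noisy = true_ans + laplace(scale)
            D = [x for x in D if abs(frac_answer(x, q) - noisy) <= alpha] or D
            out.append(noisy)
    return out
```

Small rounds touch the true database only through the noisy threshold test, which is the source of the privacy savings over answering every query with fresh noise.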

Need to Show
Accuracy and functionality:
–The result is accurate
–If A_j is large: many of the x_i ∈ D_{j-1} are removed
–D_j is never empty
Privacy:
–Not many rounds have a large A_j
–Can release which rounds are large
–Can release the noisy answers

Why can we release when large rounds occur?
–Do not expect more than O(m) large rounds
–Make the threshold noisy
–For every pair of neighboring databases D and D', consider the vector of threshold comparisons:
 –if far away from the threshold: the outcome can be the same in both
 –if close to the threshold: can correct at some privacy cost; this cannot occur too frequently

Why is there a good x_i?
–Counting queries: C is a set of predicates c: U → {0,1}; a query asks how many participants of D satisfy c
–Relaxed accuracy: answer each query within α additive error w.h.p. (such error is inherent in statistical analysis anyway)
–A sample F of size m approximates D on all the given queries c

–There exists x of size m = Õ((n/α)² · log|C|) such that max over the c_j of dist(F, D) ≤ α, i.e. F is a good sample
–For α = Õ(n^(2/3) log|C|), m is Õ(n^(2/3) log|C|)