Published by Jair Sherrell. Modified over 10 years ago.
1
When Random Sampling Preserves Privacy
Kamalika Chaudhuri (U.C. Berkeley), Nina Mishra (U. Virginia)
2
The Problem
Setting: Table: a set of rows. Sanitizer: releases each row with probability p.
What are the conditions under which this sanitizer preserves privacy?
[Diagram: Database, Sanitizer, Sanitized Database]
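The sanitizer described on this slide can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `sanitize`, the list-of-rows table representation, and the choice to decide each row independently are assumptions made here.

```python
import random


def sanitize(table, p, rng=None):
    """Release each row of `table` with probability p.

    Assumption: rows are kept or dropped independently of one another.
    `rng` is an optional random.Random instance, useful for reproducibility.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so the test succeeds with probability p.
    return [row for row in table if rng.random() < p]


# Hypothetical usage: a table of 1000 rows, each carrying a single value.
table = ["common"] * 990 + ["rare"] * 10
sample = sanitize(table, p=0.1, rng=random.Random(0))
```

The size of the released sample is itself random here, with expectation p times the number of rows.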
3
Search Data: AOL released user search data, with usernames replaced by random IDs.
4
Search Data: “Berkeley restaurants”, “Low degree spanning trees”, “Tickets to India”, “Privacy sampling”, “Airfare Santa Barbara”, “Traffic on 101N”, “Restaurants Mountain View”, “Rank Aggregation”, “Memory bound functions”, “Crypto registration”, “Falafel Charlottesville”, “Query Auditing”, “Clustering streaming”, “Tickets to SFO”, “Privacy sampling”
[Diagram labels: Kamalika, Cynthia, Nina]
5
U.S. Census Data: a random sample of preprocessed data. Preprocessing consists of removing unique values and merging cells with fewer than a threshold number of individuals.
6
Privacy Definition [DMNS06,…]: ε-Indistinguishability
Two tables T, T’ differ by a single row.
S: output of the sanitizer.
Pr[S | T] ≤ (1 + ε) Pr[S | T’]
7
An Example: ε-indistinguishability cannot always be achieved with random sampling
T: n rows with value 0.
T’: n-1 rows with value 0, 1 row with value 1.
S: 1 row with value 1, s - 1 rows with value 0.
Here Pr[S | T] = 0 while Pr[S | T’] > 0, so no finite factor (1 + ε) can bound the ratio.
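The counterexample on this slide can be checked numerically. The sketch below assumes rows are sampled independently with probability p and treats the output S as a pair of counts (how many 0-rows and how many 1-rows were released); the helper name `prob_counts` and the concrete numbers are illustrative only.

```python
from math import comb


def prob_counts(n_zero, n_one, s_zero, s_one, p):
    """Probability that sampling keeps exactly s_zero of the n_zero rows
    with value 0 and exactly s_one of the n_one rows with value 1."""

    def binom_pmf(n, k):
        # Probability of keeping exactly k out of n rows.
        if k < 0 or k > n:
            return 0.0
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    return binom_pmf(n_zero, s_zero) * binom_pmf(n_one, s_one)


n, s, p = 10, 3, 0.5
# T has n rows with value 0; T' has n-1 rows with value 0 and one row with value 1.
pr_S_given_T = prob_counts(n, 0, s - 1, 1, p)
pr_S_given_T_prime = prob_counts(n - 1, 1, s - 1, 1, p)
# No finite epsilon satisfies pr_S_given_T_prime <= (1 + epsilon) * pr_S_given_T,
# because the right-hand side is zero while the left-hand side is positive.
```

Any output containing the value 1 is impossible under T, so the ratio of the two probabilities is unbounded no matter how small p is.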
8
Privacy Definition [DKMMiNa06, BDMN05]: (ε, δ)-Indistinguishability
Two tables T, T’ differ by a single row.
S: output of the sanitizer.
With probability at least 1 - δ, Pr[S | T] ≤ (1 + ε) Pr[S | T’]
9
An Example: (ε, δ)-indistinguishability cannot always be achieved for all tables
Counterexample: a table where all rows have unique values.
10
When does Random Sampling preserve Privacy?
Parameters: (ε, δ)-indistinguishability
k: number of distinct values in T
t: number of values which occur at most log(k/δ)/ε times in T
Theorem: This can be guaranteed if p < (if t = 0) p < Õ( /t)
11
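Classification of Values
For (ε, δ)-indistinguishability, values are classified by the number of rows with value v:
Rare Value: at most log(k/δ)/ε rows
Infrequent Value: between log(k/δ)/ε and log(k/δ)/p rows
Common Value: more than log(k/δ)/p rows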
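A small sketch of this three-way classification follows. The thresholds log(k/δ)/ε and log(k/δ)/p are taken as read from the slide, with natural logarithms and no hidden constants, and the treatment of counts that fall exactly on a threshold is an assumption; the function name `classify_values` is hypothetical.

```python
from collections import Counter
from math import log


def classify_values(table, eps, delta, p):
    """Label each distinct value in `table` as rare, infrequent, or common.

    Assumed thresholds (constants omitted):
      rare       : count <= log(k/delta)/eps
      infrequent : log(k/delta)/eps < count <= log(k/delta)/p
      common     : count > log(k/delta)/p
    """
    counts = Counter(table)
    k = len(counts)  # number of distinct values
    if k == 0:
        return {}
    rare_max = log(k / delta) / eps
    infrequent_max = log(k / delta) / p
    labels = {}
    for value, count in counts.items():
        if count <= rare_max:
            labels[value] = "rare"
        elif count <= infrequent_max:
            labels[value] = "infrequent"
        else:
            labels[value] = "common"
    return labels


table = ["a"] * 1000 + ["b"] * 20 + ["c"] * 2
labels = classify_values(table, eps=0.5, delta=0.1, p=0.1)
# t, the number of rare values, is the quantity the theorem depends on.
t = sum(1 for label in labels.values() if label == "rare")
```

With p < ε the infrequent band is non-empty, since log(k/δ)/p then exceeds log(k/δ)/ε.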
12
Rare Values: If a rare value v is observed in a random sample, Pr[S|T’]>(1 + log(k Pr[S|T]
13
Common Values: For a common value v, Pr[S|T] ≈ Pr[S|T’]. Typically, the number of rows with a common value is close to its expectation.
[Diagram: scale of row counts, Rare up to log(k/δ)/ε, Infrequent up to log(k/δ)/p, Common above]
14
Infrequent Values: For an infrequent value v, Pr[S|T] ≈ Pr[S|T’]. Typically, the number of rows with an infrequent value is at most log(k/δ) away from its expected value.
[Diagram: scale of row counts, Rare up to log(k/δ)/ε, Infrequent up to log(k/δ)/p, Common above]
15
Properties of a Good Sample
A sample S is ε-indistinguishable if:
It contains no rare values.
The number of rows with each common value v is within a constant factor of its expectation.
The number of rows with each infrequent value v is at most an additive O(log(k/δ)) more than its expected value.
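The three conditions above can be turned into a direct check. This is a sketch under stated assumptions: the slide does not give the constant factor or the constant inside O(log(k/δ)), so `factor` and `slack_const` are hypothetical parameters, and the rare/infrequent/common thresholds are taken as log(k/δ)/ε and log(k/δ)/p with natural logarithms.

```python
from collections import Counter
from math import log


def is_good_sample(table, sample, eps, delta, p, factor=2.0, slack_const=1.0):
    """Return True if `sample` satisfies the three 'good sample' conditions.

    `factor` and `slack_const` stand in for constants the slide leaves
    unspecified; they are illustrative, not values from the talk.
    """
    counts = Counter(table)
    seen = Counter(sample)
    k = len(counts)
    if k == 0:
        return True
    log_term = log(k / delta)
    rare_max = log_term / eps
    infrequent_max = log_term / p
    for value, count in counts.items():
        expected = p * count
        observed = seen.get(value, 0)
        if count <= rare_max:
            # Condition 1: no rare value may appear in the sample.
            if observed > 0:
                return False
        elif count <= infrequent_max:
            # Condition 3: at most an additive O(log(k/delta)) above expectation.
            if observed > expected + slack_const * log_term:
                return False
        else:
            # Condition 2: within a constant factor of expectation.
            if not (expected / factor <= observed <= expected * factor):
                return False
    return True


table = ["a"] * 1000 + ["b"] * 20 + ["c"] * 2
good = is_good_sample(table, ["a"] * 100 + ["b"] * 2, eps=0.5, delta=0.1, p=0.1)
bad = is_good_sample(table, ["a"] * 100 + ["c"], eps=0.5, delta=0.1, p=0.1)
```

In this example the second sample fails only because it reveals the rare value "c".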
16
When does Random Sampling preserve Privacy?
Such a sample occurs with probability at least 1 - δ if p < (if t=0) p < Õ( /t)
17
Utility of Random Sampling
Assuming no rare values, the error in the frequency of each value is additive 1/√n.
[DMNS06] estimates the histogram with an additive error of 1/n in each frequency.
Sampling may give a compact representation of the histogram.
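To make the utility claim concrete, the sketch below estimates the frequency of each value from a sample and measures the largest additive error against the full table. The function names are hypothetical, and the estimator (the fraction of sample rows carrying each value) is an assumption about how the histogram would be read off the sample.

```python
from collections import Counter


def estimate_frequencies(rows):
    """Fraction of rows carrying each value."""
    total = len(rows)
    if total == 0:
        return {}
    return {value: count / total for value, count in Counter(rows).items()}


def max_frequency_error(table, sample):
    """Largest additive gap between true and estimated frequencies."""
    true_freq = estimate_frequencies(table)
    est_freq = estimate_frequencies(sample)
    # A value missing from the sample is estimated at frequency 0.
    return max(abs(true_freq[v] - est_freq.get(v, 0.0)) for v in true_freq)


table = [0] * 600 + [1] * 400
# A hand-picked sample, standing in for the sanitizer's output.
sample = [0] * 50 + [1] * 50
error = max_frequency_error(table, sample)
```

Running this over many random samples of the same table gives an empirical view of how the additive error behaves as the table grows.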
18
Conclusions
Random sampling preserves privacy only when there are few rare values.
With rare values, the probability of failure can be high: = (1/n) as opposed to 1/2^n [DKMMiNa06, BDMN05].
Error in estimating the frequency of each value can be high: additive 1/√n as opposed to the 1/n of [DMNS06].
19
Thank You
20
The Problem What are the conditions under which this sanitizer preserves privacy?