Download presentation
Presentation is loading. Please wait.
Published byDiego Deal Modified over 9 years ago
1
Raef Bassily Penn State Local, Private, Efficient Protocols for Succinct Histograms Based on joint work with Adam Smith (Penn State) (To appear in STOC 2015). ITA 2015
2
Finance.com Fashion.com WeirdStuff.com How many users like Google.com?...... A conundrum Google server How would the server compute aggregate statistics about users without storing user- specific information?
3
Succinct histograms Goal is to produce a succinct histogram: a list of frequent items (“heavy hitters”) and estimates of their frequencies while providing rigorous privacy guarantees to the users....... n 1 2...... Untrusted server A set of items (e.g. websites) = [d] = {1, …, d} Set of users = [n] Frequency of an item a is f(a) = ( ♯ users holding a)/n Finance.com Fashion.com WeirdStuff.com... 1 2 3 Item ♯... d-2 d-1 d f(1) f(2)... f(3) f(d)... 1 2 3 Item ♯... d-2 d-1 d f(1) f(2)... f(3) f(d)
4
Local model of Differential Privacy Algorithm Q is -local differentially private (LDP) if for any pair v, v’ [d], for all events S, v1v1...... v2v2 vnvn Q1Q1 Q1Q1 Q2Q2 Q2Q2 QnQn QnQn...... z1z1 z2z2 znzn Succinct histogram is item of user z i is differentially-private report of user i LDP protocols for frequency estimation is used in Chrome web browser (RAPPOR) [Erlingsson-Korolova-Pihur’14] as a basis for other estimation tasks [Dwork-Nissim’04]
5
Error is measured by the worst-case estimation error: Performance measures v1v1...... v2v2 vnvn Q1Q1 Q1Q1 Q2Q2 Q2Q2 QnQn QnQn...... z1z1 z2z2 znzn implicitly Succinct histogram = is item of user z i is differentially-private report of user i for some A protocol is efficient if it runs in time poly(log(d), n) Communication Complexity measured by number of bits transmitted per user.
6
Contributions 1.Efficient -LDP protocol with optimal error: run in time poly(log(d), n). Estimate all frequencies up to error. 2.Matching lower bound on the error. 3.Generic transformation reducing the communication complexity to 1 bit/user. Previous protocols either ran in time [Mishra-Sandler’06, Hsu-Khanna-Roth’12, EKP’14] or, had larger error [HKR’12] Best previous lower bound was
7
UHH: at least fraction of users have the same item while the rest have (i.e., “no item”) Design paradigm Reduction from a simpler problem with a unique heavy hitter (UHH problem) Efficient protocol with optimal error for UHH efficient protocol with optimal error for the general problem.
8
Construction for the UHH problem v*v* Encoder z1z1 Noising operator (error-correcting code) Encoder z2z2 Noising operator v*v*...... znzn Round Decoder Key idea: is the signal-to-noise ratio. Decoding succeeds when
9
Construction for the general setting Key insight: Decompose general scenario into multiple instances of UHH via hashing. Run parallel copies of the UHH protocol on these instances. Guarantees that w.h.p., every heavy hitter is allocated a “collision-free” copy of the UHH protocol Protocol worst-case error = O( ) Hashing paradigm: Given pair-wise independent HASH: [d] [K] for some fixed K = poly(n): FOR j = 1 to K FOR each user i with item v i IF j = HASH( v i ) THEN user i simulates a HH user in the UHH protocol ELSE user i simulates an idle user in the UHH protocol Hashing paradigm: Given pair-wise independent HASH: [d] [K] for some fixed K = poly(n): FOR j = 1 to K FOR each user i with item v i IF j = HASH( v i ) THEN user i simulates a HH user in the UHH protocol ELSE user i simulates an idle user in the UHH protocol Item whose frequency
10
Transforming to a protocol with 1-bit reports generate public random string; one for each user User i sends a biased bit B i Conditioned on B i = 1, the public string has the same distribution as the output of local randomizer Q i Gen( Q i, v i, s i ) vivi BiBi s i Local randomizer: Q i IF B i = 1, THEN report of user i = s i ELSE ignore user i IF B i = 1, THEN report of user i = s i ELSE ignore user i This transformation works for any local protocol not only heavy hitters. Key idea: What matters is the distribution of the output of each local randomizer. Public string does not depend on private data: can be generated by untrusted server. For our HH protocol, this transformation gives essentially same error and computational efficiency (Gen can be computed in O(log(d))).
11
Summary 1.Efficient -Local Private protocol for succinct histograms with optimal error: run in time poly(log(d), n). Estimate all frequencies up to error. 2.Matching lower bound on the error. 3.Generic transformation in a model with public randomness reducing the communication complexity to 1 bit/user.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.