Raef Bassily Penn State Local, Private, Efficient Protocols for Succinct Histograms Based on joint work with Adam Smith (Penn State) (To appear in STOC.

Presentation transcript:

Local, Private, Efficient Protocols for Succinct Histograms
Raef Bassily, Penn State
Based on joint work with Adam Smith (Penn State) (to appear in STOC 2015).
ITA 2015

A conundrum
[Figure: users whose items are Finance.com, Fashion.com, and WeirdStuff.com send data to a Google server, which asks: "How many users like Google.com?"]
How would the server compute aggregate statistics about users without storing user-specific information?

Succinct histograms
A set of items (e.g., websites) $= [d] = \{1, \dots, d\}$.
Set of users $= [n]$.
Frequency of an item $a$ is $f(a) = (\#\,\text{users holding } a)/n$.
[Figure: $n$ users, holding items such as Finance.com, Fashion.com, and WeirdStuff.com, report to an untrusted server; a histogram shows the frequencies $f(1), f(2), f(3), \dots, f(d)$ of items $1, \dots, d$.]
Goal: produce a succinct histogram, i.e., a list of frequent items ("heavy hitters") and estimates of their frequencies, while providing rigorous privacy guarantees to the users.
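
To make the definitions above concrete, here is a minimal, non-private sketch of the quantity being estimated. The function names and the `threshold` cutoff are illustrative assumptions, not part of the talk; the private protocol never sees the raw items like this.

```python
from collections import Counter

def frequencies(items):
    """Return f(a) = (# users holding a) / n for every item that occurs."""
    n = len(items)
    return {a: c / n for a, c in Counter(items).items()}

def succinct_histogram(items, threshold):
    """Keep only items with frequency >= threshold (a hypothetical cutoff)."""
    return {a: f for a, f in frequencies(items).items() if f >= threshold}

# One item per user.
users = ["Finance.com", "Fashion.com", "Finance.com", "WeirdStuff.com"]
hist = succinct_histogram(users, threshold=0.5)
```

Only the frequent items are listed, so the output size depends on the number of heavy hitters rather than on $d$.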

Local model of Differential Privacy
[Figure: user $i$ holds item $v_i$ and applies a local randomizer $Q_i$ to produce a report $z_i$; the server computes the succinct histogram from $z_1, \dots, z_n$.]
$v_i$ is the item of user $i$; $z_i$ is the differentially private report of user $i$.
Algorithm $Q$ is $\epsilon$-local differentially private (LDP) if for any pair $v, v' \in [d]$ and for all events $S$,
$$\Pr[Q(v) \in S] \le e^{\epsilon} \, \Pr[Q(v') \in S].$$
LDP protocols for frequency estimation are used
- in the Chrome web browser (RAPPOR) [Erlingsson-Korolova-Pihur'14]
- as a basis for other estimation tasks [Dwork-Nissim'04]
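
As an illustration of the definition, the sketch below uses d-ary randomized response, a textbook local randomizer. It is an assumption chosen for simplicity, not the randomizer used in the talk. Because the output space is finite, checking the ratio on single outputs is enough to bound it for all events S.

```python
import math
import random

def rr_prob(z, v, d, eps):
    """Pr[Q(v) = z] for d-ary randomized response over items 1..d."""
    denom = math.exp(eps) + d - 1
    return math.exp(eps) / denom if z == v else 1.0 / denom

def rr_sample(v, d, eps, rng):
    """Draw one report z ~ Q(v); requires d >= 2."""
    if rng.random() < rr_prob(v, v, d, eps):
        return v                      # report the true item
    others = [a for a in range(1, d + 1) if a != v]
    return rng.choice(others)         # otherwise a uniformly random other item

def max_ratio(d, eps):
    """Largest value of Pr[Q(v) = z] / Pr[Q(w) = z] over all z, v, w."""
    items = range(1, d + 1)
    return max(rr_prob(z, v, d, eps) / rr_prob(z, w, d, eps)
               for z in items for v in items for w in items)

z = rr_sample(3, d=4, eps=0.5, rng=random.Random(0))
```

Here `max_ratio(d, eps)` never exceeds `math.exp(eps)`, which is exactly the LDP condition for this randomizer.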

Performance measures
$v_i$ is the item of user $i$; $z_i$ is the differentially private report of user $i$.
Succinct histogram: a list of pairs $(a, \hat{f}(a))$ for some items $a \in [d]$; implicitly, $\hat{f}(a) = 0$ for every item not on the list.
Error is measured by the worst-case estimation error: $\max_{a \in [d]} |\hat{f}(a) - f(a)|$.
A protocol is efficient if it runs in time poly(log(d), n).
Communication complexity is measured by the number of bits transmitted per user.
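
A small sketch of the error measure, assuming both the true frequencies and the succinct histogram are stored as dictionaries (a representation chosen here for illustration). Items missing from a dictionary count as 0, so only items appearing in either one need to be examined.

```python
def worst_case_error(true_freq, estimate):
    """Max over items of |estimate(a) - f(a)|; absent items count as 0."""
    items = set(true_freq) | set(estimate)
    return max(abs(estimate.get(a, 0.0) - true_freq.get(a, 0.0)) for a in items)

true_freq = {1: 0.5, 2: 0.25, 3: 0.25}   # f(a) for the items that occur
estimate = {1: 0.5}                      # succinct histogram listing item 1 only
err = worst_case_error(true_freq, estimate)
```

In this example items 2 and 3 are implicitly estimated as 0, so each contributes an error of 0.25.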

Contributions
1. Efficient $\epsilon$-LDP protocol with optimal error: runs in time poly(log(d), n) and estimates all frequencies up to error $O\big(\sqrt{\log(d)/(\epsilon^2 n)}\big)$.
2. Matching lower bound on the error.
3. Generic transformation reducing the communication complexity to 1 bit/user.
Previous protocols either
- ran in time $\Omega(d)$ [Mishra-Sandler'06, Hsu-Khanna-Roth'12, EKP'14]
- or had larger error [HKR'12].
Best previous lower bound was $\Omega(1/\sqrt{n})$.

Design paradigm
Reduction from a simpler problem with a unique heavy hitter (UHH problem).
UHH: at least a given fraction of users have the same item $v^*$, while the rest have a special "no item" value.
Efficient protocol with optimal error for UHH ⇒ efficient protocol with optimal error for the general problem.

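Construction for the UHH problem
[Figure: each user holding $v^*$ passes it through an encoder (error-correcting code) and then a noising operator to produce a report $z_i$; the server combines $z_1, \dots, z_n$, rounds, and runs the decoder to recover $v^*$.]
Key idea: the fraction of users holding $v^*$ acts as the signal-to-noise ratio. Decoding succeeds when this fraction is large enough relative to the noise.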
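
The encode, noise, round, decode pipeline can be sketched as follows. This is a toy version under loud assumptions: the random codebook stands in for a real error-correcting code, decoding is brute force over all d items (so it is not poly(log d)), each user reports a single noisy coordinate assigned publicly as `i % m`, and idle users are represented by `None`. None of these choices is taken from the talk.

```python
import math
import random

def make_codebook(d, m, rng):
    """Hypothetical stand-in for an error-correcting code: random +/-1 codewords."""
    return {v: [rng.choice((-1, 1)) for _ in range(m)] for v in range(1, d + 1)}

def local_report(v, j, codebook, eps, rng):
    """Encoder plus noising operator for one user; v is an item or None (idle)."""
    if v is None:
        return rng.choice((-1, 1))            # idle user: pure noise
    bit = codebook[v][j]                      # j-th symbol of the codeword of v
    keep = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if rng.random() < keep else -bit

def decode(reports, codebook, m):
    """Sum reports per coordinate, round to a sign, return the closest codeword's item."""
    sums = [0] * m
    for j, z in reports:
        sums[j] += z
    rounded = [1 if s >= 0 else -1 for s in sums]
    return max(codebook,
               key=lambda v: sum(a * b for a, b in zip(codebook[v], rounded)))

def run_uhh(items, d, m, eps, seed=0):
    rng = random.Random(seed)
    codebook = make_codebook(d, m, rng)
    reports = [(i % m, local_report(v, i % m, codebook, eps, rng))
               for i, v in enumerate(items)]
    return decode(reports, codebook, m)

# 6000 users hold item 7, 2000 users are idle.
recovered = run_uhh([7] * 6000 + [None] * 2000, d=16, m=32, eps=1.0)
```

With a large fraction of users holding the same item, the per-coordinate sums have the sign of the codeword with overwhelming probability, so the decoder recovers item 7; shrinking that fraction makes the rounding step unreliable.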

Construction for the general setting
Key insight:
- Decompose the general scenario into multiple instances of UHH via hashing.
- Run parallel copies of the UHH protocol on these instances.
Hashing paradigm: given a pairwise independent HASH: [d] → [K] for some fixed K = poly(n):
  FOR j = 1 to K
    FOR each user i with item v_i
      IF j = HASH(v_i) THEN user i simulates a HH user in the UHH protocol
      ELSE user i simulates an idle user in the UHH protocol
This guarantees that, w.h.p., every heavy hitter is allocated a "collision-free" copy of the UHH protocol.
⇒ Protocol worst-case error = $O\big(\sqrt{\log(d)/(\epsilon^2 n)}\big)$
Heavy hitter: item whose frequency …
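
The hashing step can be sketched as below. The hash family `((a*x + b) mod p) mod K` is a textbook universal-style construction assumed here as a stand-in for the pairwise independent HASH; the prime `p`, the function names, and the use of `None` for an idle user are illustrative choices. Channels are numbered 0 to K-1 rather than 1 to K.

```python
import random

def make_pairwise_hash(K, rng, p=2_147_483_647):
    """Return h(x) = ((a*x + b) mod p) mod K; assumes every item is smaller than p."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda x: ((a * x + b) % p) % K

def split_into_uhh_instances(items, K, hash_fn):
    """Channel j sees v_i when HASH(v_i) == j, and an idle user (None) otherwise."""
    return [[v if hash_fn(v) == j else None for v in items] for j in range(K)]

items = [5, 5, 9, 5, 12]                       # one item per user
h = make_pairwise_hash(4, random.Random(1))
channels = split_into_uhh_instances(items, 4, h)
```

Each user is active in exactly one channel, and all users holding the same item land in the same channel, so a heavy hitter that avoids collisions faces a UHH instance.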

Transforming to a protocol with 1-bit reports
Key idea: what matters is the distribution of the output of each local randomizer.
- Generate a public random string $s_i$, one for each user $i$.
- User $i$, with item $v_i$ and local randomizer $Q_i$, sends a biased bit $B_i = \mathrm{Gen}(Q_i, v_i, s_i)$.
- Conditioned on $B_i = 1$, the public string $s_i$ has the same distribution as the output of the local randomizer $Q_i$.
- Server: IF $B_i = 1$ THEN report of user $i$ = $s_i$, ELSE ignore user $i$.
Remarks:
- The public string does not depend on private data: it can be generated by the untrusted server.
- This transformation works for any local protocol, not only heavy hitters.
- For our HH protocol, this transformation gives essentially the same error and computational efficiency (Gen can be computed in O(log(d))).
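
One way to realize such a Gen is rejection sampling, sketched below under stated assumptions: the public string is drawn from the randomizer applied to a fixed reference item `ref`, the user accepts with half the likelihood ratio, and $\epsilon \le \ln 2$ so that this acceptance probability is at most 1. The randomizer is d-ary randomized response, used only as an example; the slides do not spell out Gen, so this rule is an assumption.

```python
import math
import random

def rr_prob(z, v, d, eps):
    """Pr[Q(v) = z] for d-ary randomized response over items 1..d."""
    denom = math.exp(eps) + d - 1
    return math.exp(eps) / denom if z == v else 1.0 / denom

def accept_prob(v, s, d, eps, ref=1):
    """Pr[B = 1 | public string s] for a user holding v; assumes eps <= ln 2."""
    return 0.5 * rr_prob(s, v, d, eps) / rr_prob(s, ref, d, eps)

def gen(v, s, d, eps, rng, ref=1):
    """Hypothetical Gen: the single biased bit that user i sends."""
    return 1 if rng.random() < accept_prob(v, s, d, eps, ref) else 0

def conditional_dist(v, d, eps, ref=1):
    """Exact distribution of s given B = 1, when s is drawn from Q(ref)."""
    joint = {s: rr_prob(s, ref, d, eps) * accept_prob(v, s, d, eps, ref)
             for s in range(1, d + 1)}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

dist = conditional_dist(v=3, d=5, eps=0.5)
bit = gen(3, 2, 5, 0.5, random.Random(0))
```

The reference probability cancels in the joint distribution, so `dist[s]` equals `rr_prob(s, 3, 5, 0.5)` for every `s`: the accepted public string is distributed exactly like the user's own randomized report.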

Summary
1. Efficient $\epsilon$-LDP protocol for succinct histograms with optimal error: runs in time poly(log(d), n) and estimates all frequencies up to error $O\big(\sqrt{\log(d)/(\epsilon^2 n)}\big)$.
2. Matching lower bound on the error.
3. Generic transformation, in a model with public randomness, reducing the communication complexity to 1 bit/user.