Lower Bounds on Noise
Sergey Yekhanin, Institute for Advanced Study

Setting
- Database of information about individuals, e.g. medical history, census data, customer info.
- Need to guarantee the confidentiality of individual entries.
- Want to make deductions about the database and learn large-scale trends, e.g. learn that a drug V increases the likelihood of heart disease.
- Do not leak info about individual patients.

Message
Two approaches to database privacy:
- Interactive: the analyst asks questions; the curator returns approximate answers.
- Non-interactive: the curator publishes a "summary" of the database; the analyst can use the summary to get answers.
Thesis: the interactive approach is the right way to give good accuracy for a given level of privacy. Any non-interactive solution permitting "too accurate" answers to "too many" questions leaks private information.

Plan
- Mathematical model of databases and queries.
- Attacks:
  - Somewhat accurate answers to all queries lead to privacy leakage (Fourier analysis) [Y] (extends [DiNi]).
  - Somewhat accurate answers to a fraction of the queries lead to privacy leakage (linear programming / polynomial interpolation) [DMT, DY].
- The study of privacy leads to a variety of mathematical challenges!

Model
A simple (easily justifiable) model [Dinur-Nissim]:
- Database: an n-bit binary vector x.
- Query: a vector a.
- True answer: the dot product a·x.
- Response: a·x + e = true answer + noise.
- Privacy leakage: the attacker learns a certain bit of x.
- Blatant non-privacy: the attacker learns n − o(n) bits of x.
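
A minimal sketch of this query/response model, in Python (our own illustration; the noise distribution and names such as curator_answer are assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

n = 256                                  # number of database records
x = rng.integers(0, 2, size=n)           # secret n-bit database

def curator_answer(a, noise_bound):
    """Noisy answer to the dot-product query a: returns a.x + e with |e| <= noise_bound."""
    e = rng.uniform(-noise_bound, noise_bound)
    return float(a @ x) + e

# Example query: "how many of the first n/2 individuals have the attribute?"
a = np.zeros(n)
a[: n // 2] = 1
print(curator_answer(a, noise_bound=3.0))
```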

Fourier attack
Theorem: If the curator adds o(√n) noise to every response, then an attacker can ask n questions, perform O(n log n) computation, and recover n − o(n) bits of the database.
- Put the database records in one-to-one correspondence with the elements of a group.
- Think of the database as a function D from that group to {0,1}.
- Choose queries that ask for the Fourier coefficients of D.
- The noisy Fourier coefficients approximately determine the Boolean function D (Parseval identity).
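
A toy version of this reconstruction (our own sketch, not the exact algorithm or parameters of the theorem): records are indexed by Z_2^m with n = 2^m, each query is a ±1 character, and the database is recovered by inverting the noisy Walsh-Hadamard transform and rounding.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
m = 8
n = 2 ** m
x = rng.integers(0, 2, size=n)                    # secret database D: Z_2^m -> {0,1}

# Character queries: chi_s(t) = (-1)^{<s,t>}. Each query is a +/-1 vector whose
# (noisy) answer is proportional to one Fourier coefficient of D.
idx = np.array(list(product([0, 1], repeat=m)))   # all elements of Z_2^m
H = (-1) ** (idx @ idx.T)                         # n x n matrix of characters

noise = rng.uniform(-0.1 * np.sqrt(n), 0.1 * np.sqrt(n), size=n)   # o(sqrt(n))-scale noise
answers = H @ x + noise                           # one noisy answer per character query

# The noisy coefficients approximately determine D (Parseval), so the inverse
# transform followed by rounding recovers almost all bits.
x_hat = (H.T @ answers / n > 0.5).astype(int)
print("fraction of bits recovered:", (x_hat == x).mean())
```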

Linear programming attack
Theorem: If the curator adds o(√n) noise to all but a small fraction of the responses, then an attacker can ask O(n) questions, perform O(n^3) computation, and recover n − o(n) bits of the database.
- Arbitrarily large error is allowed on an arbitrary and unknown fraction of the answers.

Linear programming attack
- Ask O(n) random +1/−1 questions.
- Obtain y = Ax + e, where e is the error vector.
- A natural approach to recover x from y: solve min ||e'||_0 such that y = Ax' + e', x' in R^n (hard!).
- Instead, solve a linear program [D, CT, MT]: min ||e'||_1 such that y = Ax' + e', x' in R^n.
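
A hedged sketch of the linear-programming step using scipy.optimize.linprog (the sizes n and m, the noise model, and the rounding step are illustrative assumptions of ours, not the parameters of the theorem):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, m = 50, 200                           # database size, number of random +/-1 queries
x = rng.integers(0, 2, size=n)
A = rng.choice([-1, 1], size=(m, n))

e = rng.uniform(-1, 1, size=m)           # small noise on most answers...
bad = rng.choice(m, size=m // 20, replace=False)
e[bad] += rng.uniform(-100, 100, size=bad.size)   # ...arbitrarily large on a few
y = A @ x + e

# Variables [x' (n), e' (m), t (m)]: minimize sum(t) subject to
#   A x' + e' = y   and   -t <= e' <= t,
# which is exactly min ||e'||_1 subject to y = A x' + e'.
c = np.concatenate([np.zeros(n + m), np.ones(m)])
A_eq = np.hstack([A, np.eye(m), np.zeros((m, m))])
A_ub = np.vstack([
    np.hstack([np.zeros((m, n)),  np.eye(m), -np.eye(m)]),   #  e' - t <= 0
    np.hstack([np.zeros((m, n)), -np.eye(m), -np.eye(m)]),   # -e' - t <= 0
])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * m), A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * (n + m) + [(0, None)] * m)

x_hat = (res.x[:n] > 0.5).astype(int)    # round the real-valued solution to bits
print("fraction of bits recovered:", (x_hat == x).mean())
```

At the optimum t equals |e'| entrywise, so the program really does minimize the L1 norm of the error; since most answers are only slightly noisy, the recovered x' is close to x and rounding fixes almost all bits.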

Polynomial interpolation attack
Model: the questions have coefficients of size O(c).
Theorem: If the curator adds o(c) noise to all but a small fraction of the responses, then an attacker can ask c questions, perform O(c^4) computation, and reliably recover any particular bit of the database.
- Arbitrarily large error is allowed on an arbitrary and unknown fraction of the answers.

Polynomial interpolation attack
- Assume c is prime. Think of the space of queries as a linear space.
- To obtain a reliable answer to the query x = (1, 0, …, 0), draw a degree-two curve through x.
- Ask all the queries that correspond to points on the curve (in the figure: the point x and the curve points q1, …, q6).
- Use polynomial interpolation to carefully combine the answers.
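
A stripped-down illustration of the curve-and-interpolation idea, with noiseless answers and everything reduced mod a prime c (the actual attack must also handle noisy and corrupted answers; the dimension d, the helper lagrange_at_zero, and the other names here are our own):

```python
import numpy as np

c = 101                                    # a prime; queries live in (Z_c)^d
d = 6
rng = np.random.default_rng(3)
database = rng.integers(0, 2, size=d)      # secret bit vector

x = np.zeros(d, dtype=int)
x[0] = 1                                   # target query x = (1, 0, ..., 0)
u = rng.integers(0, c, size=d)
v = rng.integers(0, c, size=d)
curve = lambda t: (x + t * u + t * t * v) % c      # degree-2 curve with curve(0) = x

def true_answer(q):                        # the curator's (here noiseless) answer mod c
    return int(q @ database) % c

# f(t) = curve(t) . database is a degree-2 polynomial in t over Z_c, so its values at
# the three points t = 1, 2, 3 determine f(0) = x . database by Lagrange interpolation.
ts = [1, 2, 3]
ys = [true_answer(curve(t)) for t in ts]

def lagrange_at_zero(ts, ys, c):
    total = 0
    for i, (ti, yi) in enumerate(zip(ts, ys)):
        num, den = 1, 1
        for j, tj in enumerate(ts):
            if j != i:
                num = (num * (-tj)) % c
                den = (den * (ti - tj)) % c
        total = (total + yi * num * pow(den, -1, c)) % c
    return total

print(lagrange_at_zero(ts, ys, c), "==", true_answer(x))
```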

Implications
- Privacy has a price: there is no safe way to avoid increasing the noise as the number of queries increases.
- This applies to the non-interactive setting: any non-interactive solution permitting answers that are "too accurate" to "too many" questions is vulnerable to attack. One cannot just output a noisy table.

- The non-interactive approach has inherent limitations.
- The interactive approach works.
- One can also publish a summary, as long as it is clear which statistics are accurate and which ones are not.
- Future directions: fewer queries; understand what can and what cannot be done privately.