Published by Ophelia Elliott. Modified over 8 years ago.
1
Lower Bounds on Noise
Sergey Yekhanin
Institute for Advanced Study
2
Setting
Database of information about individuals, e.g. medical history, census data, customer info.
Need to guarantee confidentiality of individual entries.
Want to make deductions about the database and learn large-scale trends, e.g. learn that a drug V increases the likelihood of heart disease.
Do not leak info about individual patients.
3
Message
Two approaches to database privacy:
Interactive: analyst asks questions; curator returns approximate answers.
(Diagram: Curator, Analyst)
4
Message
Two approaches to database privacy:
Interactive: analyst asks questions; curator returns approximate answers.
Non-interactive: publish a “summary” of the database; analyst can use the summary to get answers.
(Diagram: Curator, Analyst, Summary)
5
Message
Two approaches to database privacy:
Interactive: analyst asks questions; curator returns approximate answers.
Non-interactive: publish a “summary” of the database; analyst can use the summary to get answers.
Thesis: the interactive approach is the right way to give good accuracy for a given level of privacy.
Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information.
6
Plan
Mathematical model of database and queries.
Attacks:
Somewhat accurate answers to all queries lead to privacy leakage (Fourier analysis) [Y] (extends [DiNi]).
Somewhat accurate answers to a fraction of queries lead to privacy leakage (linear programming / polynomial interpolation) [DMT, DY].
Study of privacy leads to a variety of mathematical challenges!
7
Model
[Dinur-Nissim] Simple model (easily justifiable).
Database: n-bit binary vector x.
Query: vector a.
True answer: dot product a·x.
Response: a·x + e = true answer + noise.
Privacy leakage: attacker learns a certain bit of x.
Blatant non-privacy: attacker learns n − o(n) bits of x.
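The model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `true_answer` and `noisy_response`, the 0/1 query vector, and the choice of uniformly distributed noise within a bound are not from the slides, which only say that the response is the true answer plus noise.

```python
import random

def true_answer(a, x):
    """Dot product of the query vector a with the database vector x."""
    return sum(ai * xi for ai, xi in zip(a, x))

def noisy_response(a, x, noise_bound, rng):
    """Curator's response: true answer plus noise e with |e| <= noise_bound.

    Uniform noise is an assumption made for this sketch only.
    """
    e = rng.uniform(-noise_bound, noise_bound)
    return true_answer(a, x) + e

rng = random.Random(0)
n = 16
x = [rng.randrange(2) for _ in range(n)]   # n-bit database
a = [rng.randrange(2) for _ in range(n)]   # one hypothetical 0/1 query
noise_bound = 1.5
response = noisy_response(a, x, noise_bound, rng)
```

The attacker only ever sees `response`; the attacks on the following slides ask how small `noise_bound` can be before many such responses reveal x.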
8
Fourier attack
Theorem: If a curator adds o(√n) noise to every response, then an attacker can ask n questions, perform O(n log n) computation, and recover n − o(n) bits of the database.
Put database records in one-to-one correspondence with elements of a group.
Think of the database as a function D from the group to {0,1}.
Choose queries that ask for the Fourier coefficients of D.
Noisy Fourier coefficients approximately determine the Boolean function D (Parseval identity).
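A minimal sketch of this idea, assuming the group is Z_2^k (so n = 2^k and the Fourier coefficients are Walsh-Hadamard coefficients); the slide does not specify the group. The function names and the uniform noise are illustrative assumptions. Each query is a ±1 character vector, the curator returns its dot product with the database plus bounded noise, and the attacker inverts the transform in O(n log n) time and rounds.

```python
import random

def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n).

    len(values) must be a power of two. Applying it twice multiplies by n.
    """
    v = list(values)
    n = len(v)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    return v

def fourier_attack(noisy_coeffs):
    """Invert the noisy coefficients and round each entry to the nearest bit."""
    n = len(noisy_coeffs)
    estimate = [c / n for c in fwht(noisy_coeffs)]
    return [1 if val > 0.5 else 0 for val in estimate]

rng = random.Random(0)
n = 1024
database = [rng.randrange(2) for _ in range(n)]
noise_bound = 2.0  # well below sqrt(n) = 32
# Simulated curator: every true Fourier coefficient plus bounded noise.
noisy_coeffs = [c + rng.uniform(-noise_bound, noise_bound) for c in fwht(database)]
recovered = fourier_attack(noisy_coeffs)
wrong_bits = sum(r != d for r, d in zip(recovered, database))
```

By Parseval, the squared distance between the estimate and the database is at most `noise_bound**2`, and every wrongly rounded bit contributes at least 1/4 to it, so at most `4 * noise_bound**2` bits can be wrong. With noise o(√n) this is o(n) bits.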
9
Linear programming attack
Theorem: If a curator adds o(√n) noise to a 0.761 fraction of responses, then an attacker can ask O(n) questions, perform O(n^3) computation, and recover n − o(n) bits of the database.
The remaining 0.239 fraction of answers, arbitrary and unknown to the attacker, may carry arbitrarily large error.
10
Linear programming attack
Ask O(n) random +1/−1 questions.
Obtain y = Ax + e, where e is the error vector.
A natural approach to recover x from y:
Solve: min ||e'||_0 such that y = Ax' + e', x' in R^n (hard!)
Instead, solve a linear program [D, CT, MT]: min ||e'||_1 such that y = Ax' + e', x' in R^n.
(Figure labels: Ax', y)
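Why the 1-norm problem is a linear program may not be obvious from the slide. One standard way to write it, with auxiliary variables t_i bounding each residual, is sketched below; the symbols m (number of questions) and t are introduced here for illustration and do not appear in the slides.

```latex
\begin{align*}
\min_{x' \in \mathbb{R}^n,\; t \in \mathbb{R}^m} \quad & \sum_{i=1}^{m} t_i \\
\text{subject to} \quad & -t_i \le y_i - (Ax')_i \le t_i, \qquad i = 1, \dots, m.
\end{align*}
```

At an optimum each t_i equals |y_i − (Ax')_i| = |e'_i|, so the objective is exactly the 1-norm of e'. A natural final step, assumed here rather than stated on the slide, is to round each coordinate of the solution x' to the nearest bit.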
11
Polynomial interpolation attack
Model: Questions have coefficients of size O(c).
Theorem: If a curator adds o(c) noise to a 0.501 fraction of responses, then an attacker can ask c questions, perform O(c^4) computation, and reliably recover any particular bit of the database.
The remaining 0.499 fraction of answers, arbitrary and unknown to the attacker, may carry arbitrarily large error.
12
Polynomial interpolation attack
Assume c is prime. Think of the space of queries as a linear space.
To obtain a reliable answer to the query x = (1, 0, …, 0), draw a degree-two curve through x.
Ask all queries that correspond to points on the curve.
Use polynomial interpolation to carefully combine the answers.
(Figure: curve through x with query points q1, q2, q3, q4, q5, q6)
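The curve-and-interpolate step can be illustrated with a deliberately simplified, noise-free toy: answers are taken exactly, modulo a prime p standing in for c. The real attack must cope with noisy and partly corrupted answers, which this sketch does not attempt. The names `curve_queries` and `interpolate_at_zero`, the random choice of curve, and the mod-p answers are all assumptions made for illustration.

```python
import random

def curve_queries(n, p, rng):
    """Queries gamma(t) = x + t*u + t^2*v (mod p) for t = 1..p-1.

    gamma(0) = x = (1, 0, ..., 0) is the target query; u and v are random.
    """
    x = [1] + [0] * (n - 1)
    u = [rng.randrange(p) for _ in range(n)]
    v = [rng.randrange(p) for _ in range(n)]
    return {t: [(x[i] + t * u[i] + t * t * v[i]) % p for i in range(n)]
            for t in range(1, p)}

def interpolate_at_zero(points, p):
    """Lagrange interpolation over F_p: value at t = 0 of the polynomial
    passing through the given (t, value) pairs with distinct t."""
    total = 0
    for tj, yj in points:
        num, den = 1, 1
        for tm, _ in points:
            if tm != tj:
                num = num * (-tm) % p
                den = den * (tj - tm) % p
        # den is nonzero mod p, so Fermat's little theorem gives its inverse.
        total = (total + yj * num * pow(den, p - 2, p)) % p
    return total

rng = random.Random(1)
p, n = 13, 8
db = [rng.randrange(2) for _ in range(n)]
queries = curve_queries(n, p, rng)
# Noise-free answers mod p: a degree-two polynomial in t whose value at 0 is db[0].
answers = {t: sum(q[i] * db[i] for i in range(n)) % p for t, q in queries.items()}
sample = [(t, answers[t]) for t in (1, 2, 3)]
bit = interpolate_at_zero(sample, p)
```

Because the answer is linear in the query, the answers along the curve form a polynomial of degree at most two in t, so three exact answers already determine the answer to x. Asking all points on the curve provides the redundancy that an error-tolerant interpolation would need.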
13
Implications
Privacy has a price: there is no safe way to avoid increasing the noise as the number of queries increases.
Applies to the non-interactive setting: any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack.
Cannot just output a noisy table.
14
Non-interactive approach has inherent limitations.
Interactive approach works.
Can also publish a summary, as long as it's clear which stats are accurate and which ones are not.
Future directions:
Fewer queries.
Understand what can and what cannot be done privately.