Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise
Database of information about individuals E.g. Medical history, Census data, Customer info. Need to guarantee confidentiality of individual entries Want to make deductions about the database; learn large scale trends. E.g. Learn that a drug V increases likelihood of heart disease Do not leak info about individual patients Setting
Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Curator Analyst Message
Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Curator Analyst Summary Message
Two approaches to database privacy: Interactive: Analyst asks questions; curator returns approximate answers Non-interactive: Publish a “summary” of the database; analyst can use summary to get answers Thesis: The interactive approach is the right way to give good accuracy for a given level of privacy Any non-interactive solution permitting “too accurate” answers to “too many” questions leaks private information. Message
Mathematical model of database and queries Attacks Somewhat accurate answers to all queries lead to privacy leakage. (Fourier analysis) [Y] (extends [DiNi]). Somewhat accurate answers to a fraction of queries lead to privacy leakage. (Linear programming / Polynomial interpolation) [DMT,DY] Study of privacy leads to a variety of mathematical challenges! Plan
[Dinur-Nissim] Simple Model (easily justifiable) Database: n -bit binary vector x Query: vector a True answer: Dot product ax Response is ax + e = True Answer + Noise Privacy Leakage: Attacker learns a certain bit of x. Blatant Non-Privacy: Attacker learns n − o ( n ) bits of x. Model
Theorem: If a curator adds o(√n) noise to every response; then an attacker can ask n questions, perform O(n log n) computation and recover n-o(n) bits of the database. Put database records in one-to-one correspondence with elements of a group. Think of a database as a function D from to {0,1}. Choose queries to ask for Fourier coefficients of D. Noisy Fourier coefficients approximately determine the Boolean function D! (Parseval identity). Fourier attack
Theorem: If a curator adds o(√n) noise to fraction of responses; then an attacker can ask O(n) questions, perform O(n 3 ) computation and recover n-o(n) bits of the database. Arbitrarily large error on arbitrary and unknown fraction on answers. Linear programming attack
Ask O(n) random +1/-1 questions Obtain y = Ax + e, where e is the error vector A natural approach to recover x from y: Solve: min |e'| 0 such that y=Ax'+e‘, x' in R n (hard!) Solve a linear program [D, CT, MT]: min |e'| 1 such that y=Ax'+e' x' in R n Ax ' y Linear programming attack
Model: Questions have O(c) large coefficients Theorem: If a curator adds o(c) noise to fraction of responses; then an attacker can ask c questions, perform O(c 4 ) computation and reliably recover any particular bit of the database. Arbitrarily large error on arbitrary and unknown fraction on answers. Polynomial interpolation attack
Assume c is prime. Think of the space of queries as a linear space. To obtain a reliable answer to query x = (1,0, …, 0), draw a degree two curve through x. Ask all queries that correspond to points on the curve. Use polynomial interpolation to carefully combine the answers. x q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 Polynomial interpolation attack
Privacy has a Price There is no safe way to avoid increasing the noise as the number of queries increases Applies to Non-Interactive Setting Any non-interactive solution permitting answers that are “too accurate” to “too many” questions is vulnerable to attack. Cannot just output a noisy table. Implications
Non-interactive approach has inherent limitations Interactive approach works Can also publish a summary, as long as its clear which stats are accurate, and which ones are not. Future directions: Fewer queries Understand what can and what cannot be done privately