Restriction Access, Population Recovery & Partial Identification Avi Wigderson IAS, Princeton Joint with Zeev Dvir Anup Rao Amir Yehudayoff

Restriction Access: a new model of "grey-box" access

Systems, Models, Observations. From input-output pairs (I_1, O_1), (I_2, O_2), (I_3, O_3), …? Typically more!

Black-box access: Successes & Limits
- Learning: PAC, membership, statistical, … queries. Decision trees, DNFs?
- Cryptography: semantic, CPA, CCA, … security. Cold boot, microwave, … attacks?
- Optimization: membership, separation, … oracles. Strongly polynomial algorithms?
- Pseudorandomness: hardness vs. randomness. Derandomizing specific algorithms?
- Complexity: Σ_2 = NP^NP. What problems can we solve if P = NP?

The gray scale of access. f: Σ^n → Σ^m; D: a "device" computing f (from a family of devices).
- Black box: see only pairs x_1, f(x_1); x_2, f(x_2); x_3, f(x_3); … (natural starting point).
- Gray box: natural intermediate point. How to model it? Many specific ideas; ours is general and clean.
- Clear box: see the device D itself.

Restriction Access (RA). f: Σ^n → Σ^m; D: a "device" computing f.
Restriction: ρ = (x, L), with L ⊆ [n] the set of live variables and x ∈ Σ^n fixing the rest.
Observations: (ρ, D|ρ), where D|ρ (the device simplified after fixing the dead variables to x) computes f|ρ on L.
- Black box: L = ∅; observe (x, f(x)).
- Gray box: observe (ρ, D|ρ).
- Clear box: L = [n]; observe (x, D).

Example: Decision Tree. [Figure: a decision tree D querying x_1, x_2, x_3, x_4; under the restriction ρ = (x, L) with L = {3, 4} and x = (1010), the restricted device D|ρ is a small decision tree querying only x_3 and x_4.]
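A minimal executable sketch of what a restriction does to a decision-tree device. The tree shape, the Node class, and the field names are assumptions made for illustration, not the paper's notation.

```python
# Illustrative sketch: restricting a decision tree D by rho = (x, L).
# Dead variables (outside L) are fixed to their values in x, so the tree
# collapses to a smaller tree over the live variables only.

class Node:
    def __init__(self, var=None, low=None, high=None, leaf=None):
        self.var, self.low, self.high, self.leaf = var, low, high, leaf  # var is 1-indexed

def restrict(node, x, live):
    """Return D|rho: every dead variable is fixed to its value in x."""
    if node.leaf is not None:            # leaves are unaffected
        return node
    if node.var in live:                 # live variable: keep the query, restrict both children
        return Node(var=node.var,
                    low=restrict(node.low, x, live),
                    high=restrict(node.high, x, live))
    # dead variable: follow the branch dictated by x and drop the query
    child = node.high if x[node.var - 1] == 1 else node.low
    return restrict(child, x, live)

# A tree loosely resembling the slide's example, restricted with L = {3, 4}, x = (1, 0, 1, 0):
D = Node(var=1,
         low=Node(var=2, low=Node(leaf=0), high=Node(leaf=1)),
         high=Node(var=4,
                   low=Node(var=3, low=Node(leaf=0), high=Node(leaf=1)),
                   high=Node(leaf=1)))
D_restricted = restrict(D, x=(1, 0, 1, 0), live={3, 4})   # a tree querying only x3 and x4
```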

Modeling choices (RA-PAC). Restriction: ρ = (x, L), L ⊆ [n], x ∈ Σ^n; the device D is unknown.
Input x: could be friendly, adversarial, or random; here, drawn from an unknown distribution (as in PAC).
Live variables L: could be friendly, adversarial, or random; here, each variable is live independently with probability μ (as in random restrictions).

RA-PAC Results. Probably Approximately Correct (PAC) learning of D, from restrictions in which each variable remains alive independently with probability μ.
Thm 1 [DRWY]: a poly(s, ·) algorithm for RA-PAC learning size-s decision trees, for every μ > 0 (reconstruction from pairs of live variables).
Thm 2 [DRWY]: a poly(s, ·) algorithm for RA-PAC learning size-s DNFs, for every μ > .365… (reduction to the "Population Recovery Problem").
Positive results, in contrast to PAC!

Population Recovery (learning a mixture of binomials)

Population Recovery Problem. k species, n attributes from an alphabet Σ.
Unknown: vectors v_1, v_2, …, v_k ∈ Σ^n and a distribution p_1, p_2, …, p_k over them. Given: an accuracy parameter ε > 0.
Task: recover all v_i, p_i (up to ε) from samples.
[Slide: a k × n table of the vectors v_i with their probabilities p_i; colors mark which quantities are known and which are unknown.]

Population Recovery Problem. k species, n attributes from Σ; parameters μ, ε > 0.
v_1, v_2, …, v_k ∈ Σ^n; p_1, p_2, …, p_k are their fractions in the population.
Task: recover all v_i, p_i (up to ε) from samples.
Samplers: first (1) draw u ← v_i with probability p_i; then
- μ-Lossy Sampler: (2) set u(j) ← ? with probability 1 - μ, independently for each j ∈ [n] (e.g. 0110 might become ?1?0).
- μ-Noisy Sampler: (2) flip u(j) with probability 1/2 - μ, independently for each j ∈ [n].
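A short sketch of the two samplers over the binary alphabet {0,1}; the species list, helper names, and parameter values are illustrative assumptions, not from the slides.

```python
import random

def draw_species(species):
    """species: list of (p_i, v_i) pairs with the p_i summing to 1."""
    r, acc = random.random(), 0.0
    for p, v in species:
        acc += p
        if r < acc:
            return v
    return species[-1][1]

def lossy_sample(species, mu):
    """mu-lossy: each coordinate survives independently with probability mu, else becomes '?'."""
    return [b if random.random() < mu else '?' for b in draw_species(species)]

def noisy_sample(species, mu):
    """mu-noisy: each coordinate is flipped independently with probability 1/2 - mu."""
    return [b ^ 1 if random.random() < 0.5 - mu else b for b in draw_species(species)]

species = [(0.5, [0, 1, 1, 0]), (0.3, [1, 1, 0, 0]), (0.2, [0, 0, 0, 1])]
print(lossy_sample(species, mu=0.4), noisy_sample(species, mu=0.1))
```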

Loss – Paleontology. [Slide: "true data" table of species, with attributes Skull, Teeth, Vertebrae, Arms, Ribs, Legs, Tail, Height (ft), Length (ft), Weight (lbs) and population fractions 26%, 11%, 13%, 30%, 20%.]

Loss – Paleontology. [Slide: "from samples" table with the same attributes, one heavily incomplete row per dig: Dig #1, Dig #2, Dig #3, Dig #4, …] Each finding is common to many species! How do they do it?

Noise – Privacy. [Slide: "true data" table with attributes Socialism, Abortion, Gay marriage, Marijuana, Male, Rich, North US and population fractions; "from samples" rows for respondents Joe, Jane, …, each of whom flipped every correct answer with probability 49%.] Deniability? Recovery?

PRP - Applications. Recovering from loss & noise:
- Clustering / Learning / Data mining
- Computational biology / Archeology / …
- Error correction
- Database privacy
- …
Numerous related papers & books.

PRP - Results
Facts: μ = 0 obliterates all information; there is no polytime algorithm for μ = o(1).
Thm 3 [DRWY]: a poly(k, n, 1/ε) algorithm, from lossy samples, for every μ > .365…
Thm 4 [WY]: a poly(k^{log k}, n, 1/ε) algorithm, from lossy and/or noisy samples, for every μ > 0.
Kearns, Mansour, Ron, Rubinfeld, Schapire, Sellie: an exp(k) algorithm for this discrete version.
Moitra, Valiant: an exp(k) algorithm for the Gaussian version (even when the noise is unknown).

Proof of Thm 4. Reconstruct the v_i, p_i from samples.
Lemma 1: We can assume we know the v_i's! Proof: expose one column at a time.
Lemma 2: Easy in exp(n) time! Proof: lossy case - enough samples arrive without any "?"; noisy case - linear algebra on the sample probabilities.
Idea: make n = O(log k) [dimension reduction].
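A sketch of the lossy half of Lemma 2, assuming (via Lemma 1) that the vectors v_i are already known; the function name and representation are illustrative assumptions.

```python
from collections import Counter

# Keep only the samples with no '?' at all; conditioned on being fully observed,
# a sample equals v_i with probability p_i, so the empirical frequencies estimate
# the p_i's. A sample survives fully with probability mu^n, which is why this
# needs on the order of exp(n) samples (and hence time).

def recover_from_lossy(samples, known_vectors):
    full = [tuple(s) for s in samples if '?' not in s]
    if not full:
        return {}
    counts = Counter(full)
    return {tuple(v): counts[tuple(v)] / len(full) for v in known_vectors}
```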

Partial IDs: a new dimension-reduction technique

Dimension Reduction and small IDs. An ID for v_i is a set S_i ⊆ [n] of coordinates on which v_i differs from every other v_j. For a random sample u, let q_i = Pr[u[S_i] = v_i[S_i]].
Lemma: We can approximate p_i in exp(|S_i|) time!
Does one always have small IDs?
[Slide: an n = 8, k = 9 table of vectors with example IDs S_1 = {1,2}, S_2 = {8}, S_3 = {1,5,6}.]
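A hedged sketch of why a small ID suffices, for lossy samples with the v_i known; coordinates are 0-indexed and the names are illustrative.

```python
# If S_i is a genuine ID for v_i (no other vector agrees with v_i on S_i), then
#     Pr[a lossy sample is observed and agrees with v_i on all of S_i] = mu^{|S_i|} * p_i,
# so dividing the empirical frequency of that event by mu^{|S_i|} estimates p_i.
# The event has probability only mu^{|S_i|} * p_i, so accurate estimation costs
# on the order of exp(|S_i|) samples.

def estimate_p(samples, v_i, S_i, mu):
    hits = sum(all(u[j] == v_i[j] for j in S_i) for u in samples)  # '?' never matches a bit
    return (hits / len(samples)) / (mu ** len(S_i))
```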

Small IDs? NO! However, …
[Slide: an n = 8, k = 9 counterexample with IDs S_1 = {1}, S_2 = {2}, …, S_8 = {8}, but S_9 = {1, 2, …, 8}: the ninth vector has no small ID.]

Linear algebra & Partial IDs. However, we can compute p_9 = 1 - p_1 - p_2 - … - p_8.
[Slide: the same example, now with IDs S_1 = {1}, …, S_8 = {8} and the partial ID S_9 = ∅.]

Back substitution and Imposters. A partial ID (PID) S_i can be any subset of [n]; v_j is an imposter of v_i if v_j[S_i] = v_i[S_i] for j ≠ i. For a random sample u, q_i = Pr[u[S_i] = v_i[S_i]] equals p_i plus the total weight of v_i's imposters, so p_i can be recovered by subtracting the imposters' already-computed weights, e.g. p_4 = q_4 - p_1 - p_2.
We can use back substitution if there are no cycles! Are there always acyclic small partial IDs?
[Slide: example with PIDs S_1 = {1,2}, S_2 = {8}, S_3 = {1,5,6}, S_4 = {3}.]

Acyclic small partial IDs exist.
Lemma: Some vector always has an ID of size at most log k. [Slide: n = 8, k = 9 example with S_8 = {1,5,6}.]
Idea: remove that vector and iterate to find PIDs for the rest.
Lemma: Acyclic (log k)-PIDs always exist!

Chains of small Partial IDs.
Compute q_i = Pr[u_i = 1] = Σ_{j ≤ i} p_j from the samples u.
Back substitution: p_i = q_i - Σ_{j < i} p_j.
Problem: long chains! The error can double at each step, so it grows exponentially in the chain length. Want: short chains!
[Slide: n = 8, k = 6 example with PIDs S_1 = {1}, S_2 = {2}, …, S_6 = {6}.]
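A small sketch of back substitution over an acyclic imposter structure; the dictionary layout and names are illustrative assumptions.

```python
# Process the vectors in an order in which every imposter of v_i comes earlier,
# then peel the imposters' (already computed) weights off q_i. Errors compound
# along a chain, which is why short (low-depth) chains matter.

def back_substitute(q, imposters, order):
    """q[i]: estimated Pr[u[S_i] = v_i[S_i]]; imposters[i]: indices j with v_j[S_i] = v_i[S_i]."""
    p = {}
    for i in order:                                   # a topological order of the PID graph
        p[i] = q[i] - sum(p[j] for j in imposters[i])
    return p

# Toy example matching the slide's chain, where q_i = p_1 + ... + p_i:
q = {1: 0.2, 2: 0.5, 3: 0.8, 4: 1.0}
print(back_substitute(q, {1: [], 2: [1], 3: [1, 2], 4: [1, 2, 3]}, order=[1, 2, 3, 4]))
# roughly {1: 0.2, 2: 0.3, 3: 0.3, 4: 0.2}
```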

The PID (imposter) graph.
Given: V = (v_1, v_2, …, v_k), v_i ∈ Σ^n, and PIDs S = (S_1, S_2, …, S_k), S_i ⊆ [n].
Construct G(V; S) by connecting v_j → v_i iff v_i is an imposter of v_j: v_i[S_j] = v_j[S_j].
width = max_i |S_i|; depth = depth(G).
Want: PIDs with small width and depth, for every V.
[Slide: example with PIDs S_1 = {1}, S_2 = {2}, …, S_5 = {5}, where v_i → v_j iff i > j.]

Constructing cheap PID graphs.
Theorem: For every V = (v_1, v_2, …, v_k), v_i ∈ Σ^n, we can efficiently find PIDs S = (S_1, S_2, …, S_k), S_i ⊆ [n], of width and depth at most log k.
Algorithm: Initialize S_i = ∅ for all i.
Invariant: |imposters(v_i; S_i)| ≤ k / 2^{|S_i|}.
Repeat:
(1) Make each S_i maximal: if not, add minority coordinates to S_i.
(2) Make chains monotone: if v_j → v_i then |S_j| < |S_i| (so G is acyclic); if not, set S_i to S_j (and apply (1) to S_i).
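A rough executable sketch of this greedy procedure over the binary alphabet, assuming the vectors are distinct; function and variable names are illustrative and coordinates are 0-indexed.

```python
# Step (1) keeps adding a coordinate on which v_i holds the minority value among
# its current imposters, at least halving the imposter set, so |S_i| stays <= log2(k).
# Step (2) restarts S_i from S_j whenever an edge v_j -> v_i violates monotonicity.

def imposters(V, i, S):
    """Indices of vectors agreeing with V[i] on every coordinate in S (includes i)."""
    return [j for j, v in enumerate(V) if all(v[c] == V[i][c] for c in S)]

def maximize(V, i, S):
    """Step (1): while some coordinate outside S has V[i] in the minority, add it."""
    n = len(V[i])
    while True:
        imp = imposters(V, i, S)
        minority = next((c for c in range(n) if c not in S and
                         2 * sum(V[j][c] == V[i][c] for j in imp) <= len(imp)), None)
        if minority is None:
            return S
        S = S | {minority}

def greedy_pids(V):
    S = [maximize(V, i, set()) for i in range(len(V))]
    changed = True
    while changed:                                    # Step (2): make chains monotone
        changed = False
        for j in range(len(V)):
            for i in imposters(V, j, S[j]):           # edge v_j -> v_i in the PID graph
                if i != j and len(S[j]) >= len(S[i]):
                    S[i] = maximize(V, i, set(S[j]))  # re-apply step (1) starting from S_j
                    changed = True
    return S

V = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(greedy_pids(V))   # e.g. [set(), {0}, {1}, {2}]: the majority vector gets an empty PID
```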

Analysis of the algorithm.
Theorem: For every V = (v_1, v_2, …, v_k), v_i ∈ Σ^n, we can efficiently find PIDs S = (S_1, S_2, …, S_k), S_i ⊆ [n], of width and depth at most log k.
Algorithm: Initialize S_i = ∅ for all i. Invariant: |imposters(v_i; S_i)| ≤ k / 2^{|S_i|}. Repeat: (1) make each S_i maximal; (2) make chains monotone (v_j → v_i implies |S_j| < |S_i|).
Analysis:
- |S_i| ≤ log k throughout, for all i.
- Σ_i |S_i| increases with each step.
- Termination within k log k steps.
- Width ≤ log k, and hence depth ≤ log k.

Conclusions
- Restriction access: a new, general model of "gray box" access (largely unexplored!)
- A general problem of population recovery
- Efficient reconstruction from loss & noise
- Partial IDs: a new dimension-reduction technique for databases
Open: a polynomial-time (in k) algorithm? (currently k^{log k}; PIDs cannot beat k^{log log k})
Open: handle unknown errors?