How Robust are Linear Sketches to Adaptive Inputs? Moritz Hardt, David P. Woodruff IBM Research Almaden.

Two Aspects of Coping with Big Data
- Efficiency: handle enormous inputs
- Robustness: handle adverse conditions
Big Question: Can we have both?

Algorithmic paradigm: Linear Sketches
Unifying idea: a small number of linear measurements applied to the data.
- Data vector x in R^n
- Linear map A
- Output sketch y = Ax in R^r, with r << n
Applications: compressed sensing, data streams, distributed computation, ...
For this talk: the output can be any (not necessarily efficient) function of y.
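As a concrete instance of the paradigm, here is a minimal numpy sketch; the Gaussian matrix A, the sizes n and r, and the norm estimator are illustrative assumptions, not details from the talk. The point is that the algorithm stores only y = Ax and must answer from y alone.

```python
# A minimal linear sketch: r random linear measurements of x in R^n.
import numpy as np

rng = np.random.default_rng(0)
n, r = 10_000, 200
A = rng.normal(0, 1 / np.sqrt(r), (r, n))   # r x n measurement matrix
x = rng.normal(size=n)                       # data vector
y = A @ x                                    # the sketch: all that is stored
# Any answer must be computed from y alone, e.g. a norm estimate:
print(np.linalg.norm(y)**2 / np.linalg.norm(x)**2)   # close to 1
```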

"For each" correctness
For each x: Pr{ Alg(x) correct } > 1 − 1/poly(n), where the probability is over the randomly chosen matrix A.
Does this imply correctness on many inputs? Why not?
There is no guarantee if an input x_2 depends on Alg(x_1) for an earlier input x_1. Correctness holds only under a modeling assumption: inputs are non-adaptively chosen.

Example: Johnson-Lindenstrauss Sketch
Goal: estimate |x|_2 from |Ax|_2.
JL Sketch: if A is a k x n matrix of i.i.d. N(0, 1/k) random variables with k > log n, then Pr[ |Ax|_2 = (1 ± 1/2)|x|_2 ] > 1 − 1/poly(n).
Attack:
1. Query x = e_i and x = e_i + e_j for all standard unit vectors e_i, e_j. Learn |A_i|_2, |A_j|_2, |A_i + A_j|_2, and hence the inner products <A_i, A_j>.
2. Hence, learn A^T A, and so learn the kernel of A.
3. Query a vector x in kernel(A) (so Ax = 0 although |x|_2 > 0).
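This attack is easy to simulate. Below is a hedged numpy sketch (the dimensions, seed, and estimator are illustrative): it learns the Gram matrix A^T A from norm queries via the polarization identity, extracts a kernel vector by SVD, and fools the sketch with it.

```python
# Simulation of the kernel attack on the JL sketch (sizes illustrative).
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 8
A = rng.normal(0, 1 / np.sqrt(k), (k, n))    # JL sketch matrix

def estimate(x):                              # the sketch's view: |Ax|^2
    return np.linalg.norm(A @ x)**2

# 1. Query e_i and e_i + e_j; polarization gives <A_i, A_j>.
E = np.eye(n)
col = np.array([estimate(E[i]) for i in range(n)])        # |A_i|^2
gram = np.array([[(estimate(E[i] + E[j]) - col[i] - col[j]) / 2
                  for j in range(n)] for i in range(n)])  # = A^T A

# 2. A null vector of A^T A is a kernel vector of A.
_, _, Vt = np.linalg.svd(gram)
x = Vt[-1]                                    # singular value ~ 0

# 3. The sketch must fail on x: |x|^2 = 1, yet it sees |Ax|^2 ~ 0.
print(np.linalg.norm(x)**2, estimate(x))
```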

Benign/natural adaptivity: monitor traffic using a sketch, re-route traffic based on the output; this affects future inputs.
Adversarial adaptivity: a DoS attack on a network monitoring unit.
Correlations arise in nearly any realistic setting. Can we thwart the attack? Can we prove correctness?
In this work: broad impossibility results.

Benchmark Problem
GapNorm(B): given x in R^n, decide if |x|_2^2 >= B (YES) or |x|_2^2 <= 1 (NO).
Easily solvable for B = 1 + ε using the "for each" guarantee, by a JL sketch with O(log n / ε^2) rows.
Goal: show impossibility for this very basic problem.
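For contrast, here is a hedged sketch of the easy "for each" direction (the constants are ad hoc, not the tight O(log n / ε^2) bound): estimate |x|^2 by |Ax|^2 and threshold at the midpoint of the gap.

```python
# "For each" solver for GapNorm(1 + eps) via a JL sketch (constants ad hoc).
import numpy as np

rng = np.random.default_rng(0)
n, eps = 10_000, 0.5
k = int(64 * np.log(n) / eps**2)             # O(log n / eps^2) rows
A = rng.normal(0, 1 / np.sqrt(k), (k, n))

def gapnorm(x):
    # YES iff the norm estimate clears the midpoint of the gap [1, 1 + eps].
    return np.linalg.norm(A @ x)**2 >= 1 + eps / 2

x = rng.normal(size=n)
x /= np.linalg.norm(x)                       # |x|^2 = 1: a NO instance
print(gapnorm(x), gapnorm(np.sqrt(1 + eps) * x))   # False, True
```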

Main Result
Theorem. For every B, given oracle access to a linear sketch of dimension r ≤ n − log(Bn), we can find in time poly(r, B) a distribution over inputs on which the sketch fails to solve GapNorm(B).
Corollary. The same result holds for any l_p-norm.
Corollary. The same result holds even if the algorithm uses internal randomness on each query.
The attack is efficient (which rules out cryptographic hardness), and even a slightly non-trivial sketching dimension is impossible.

Application to Compressed Sensing
Theorem. No linear sketch with o(n/C^2) rows guarantees l2/l2 sparse recovery with approximation factor C on a polynomial number of adaptively chosen inputs.
l2/l2 recovery: on input x, output x' for which |x − x'|_2 <= C · min_{k-sparse y} |x − y|_2.
Note: this is impossible to achieve with a deterministic matrix A, but possible with the "for each" guarantee with r = k log(n/k). [Gilbert-Hemenway-Strauss-W-Wootters12] has some positive results.

Outline
- Proof of Main Theorem for GapNorm: proved using a "Reconstruction Attack"
- Sparse Recovery Result: by reduction from GapNorm (not in this talk)

Computational Model
Definition (Sketch). An r-dimensional sketch is any function of x that depends only on the projection P_U x, for some r-dimensional subspace U. The sketch has unbounded computational power on top of P_U x.
Why?
1. The sketches Ax and U^T x are equivalent, where U^T has orthonormal rows and row-span(U^T) = row-span(A).
2. The sketch U^T x is equivalent to P_U x = U U^T x.
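Both equivalences are easy to check numerically. A small sketch (random A, arbitrary sizes): Ax is a fixed linear function of U^T x, so keeping only U^T x (or P_U x) loses nothing.

```python
# Why Ax and U^T x are equivalent sketches (sizes arbitrary).
import numpy as np

rng = np.random.default_rng(0)
r, n = 5, 40
A = rng.normal(size=(r, n))
U, _ = np.linalg.qr(A.T)          # columns: orthonormal basis of row-span(A)
x = rng.normal(size=n)
# A = A U U^T since the rows of A lie in col-span(U), so Ax is
# recoverable from U^T x (and P_U x = U (U^T x) carries the same data):
print(np.allclose(A @ x, (A @ U) @ (U.T @ x)))   # True
```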

Algorithm (Reconstruction Attack)
Input: oracle access to a sketch f using an unknown subspace U of dimension r.
Put V_0 = {0}, the zero subspace.
For t = 1 to r:
  (Correlation Finding) Find vectors x_1, ..., x_m weakly correlated with the unknown subspace U, orthogonal to V_{t-1}
  (Boosting) Find a single vector x strongly correlated with U, orthogonal to V_{t-1}
  (Progress) Put V_t = V_{t-1} + span{x}
Output: subspace V_r
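To make the loop concrete, below is a toy end-to-end run in numpy. Everything in it is an illustrative assumption: the sketch f is a simple norm test on a hidden subspace U, and the sample counts and SVD step stand in for the paper's actual Correlation Finding and Boosting subroutines.

```python
# Toy reconstruction attack against a simple norm-test sketch (all sizes,
# thresholds, and sample counts are ad hoc choices for illustration).
import numpy as np

rng = np.random.default_rng(1)
n, r = 30, 3
U, _ = np.linalg.qr(rng.normal(size=(n, r)))   # hidden r-dim subspace

def f(x):                                      # toy GapNorm-style sketch
    return np.linalg.norm(U.T @ x)**2 >= r

V = np.zeros((n, 0))                           # recovered subspace V_t
for t in range(r):
    P = np.eye(n) - V @ V.T                    # stay orthogonal to V_{t-1}
    # Correlation finding: accepted Gaussian queries are biased toward U.
    G = rng.normal(size=(5000, n)) @ P
    accepted = G[np.array([f(g) for g in G])]
    # Boosting: top right singular vector of the accepted-query matrix.
    _, _, Vt = np.linalg.svd(accepted, full_matrices=False)
    x = P @ Vt[0]
    x /= np.linalg.norm(x)
    V = np.hstack([V, x[:, None]])             # progress: V_t = V_{t-1} + span{x}

print("correlations with U:", np.linalg.norm(U.T @ V, axis=0))
# Final query: a moderately large vector orthogonal to everything recovered.
z = rng.normal(size=n)
z -= V @ (V.T @ z)
z *= 6 / np.linalg.norm(z)
print("|z|^2 =", round(float(np.linalg.norm(z)**2), 1), "sketch accepts:", f(z))
```

With these toy parameters the loop typically recovers U almost exactly, after which the final vector z has large norm yet is declared small by the sketch, so the sketch fails GapNorm.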

Conditional Expectation Lemma
Lemma. Given a d-dimensional sketch f, we can find using poly(d) queries a distribution g such that E[ |g|_2^2 | f(g) = 1 ] >= E[ |g|_2^2 ] + Δ ("advantage over random").
Moreover,
1. Δ >= poly(1/d)
2. g = N(0, σ)^n for a carefully chosen σ unknown to the sketching algorithm

Simplification
Fact: if g is Gaussian, then P_U g = U U^T g is Gaussian as well.
Hence, we can think of the query distribution as choosing a random Gaussian g inside the subspace U. We drop the projection operator P_U from now on for notational simplicity.

The three-step intuition
(Symmetry) Since the queries are random Gaussian inputs g with an unknown variance, by spherical symmetry the sketch f learns nothing more about the query distribution than the norm |g|.
(Averaging) If |g| is larger than expected, the sketch is "more likely" to output 1.
(Bayes) Hence, by a sort-of Bayes rule, conditioned on f(g) = 1, the expectation of |g| is likely to be larger.
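The Bayes step can be checked empirically. A small self-contained sketch (the norm-test f and all sizes are the same ad hoc assumptions as in the toy attack above): conditioning on f(g) = 1 visibly inflates E|g|^2.

```python
# Empirical check: conditioning on f(g) = 1 inflates E|g|^2 (toy f, ad hoc sizes).
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 3
U, _ = np.linalg.qr(rng.normal(size=(n, r)))

G = rng.normal(size=(100_000, n))
norms = (G**2).sum(axis=1)                  # |g|^2 for each query
acc = ((G @ U)**2).sum(axis=1) >= r         # f(g) = 1, vectorized
print(norms.mean(), norms[acc].mean())      # ~n vs noticeably larger
```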

Def. Let p(s) = Pr{ f(y) = 1 } for y uniformly random in U with |y|_2^2 = s.
Fact. If g is Gaussian with E|g|_2^2 = t, then Pr{ f(g) = 1 } = ∫_0^∞ p(s) v_t(s) ds, where v_t is the density of the χ^2-distribution with expectation t and d degrees of freedom.

[Figure: plot of p(s) = Pr{ f(y) = 1 }, for y uniformly random in U with |y|_2^2 = s, as a function of the norm s. By correctness of the sketch, p(s) ≈ 0 for s ≤ d/n and p(s) ≈ 1 for s ≥ Bd/n; the points l and r are marked in between.]

Sliding χ^2-distributions
Δ(s) = ∫_l^r (s − t) v_t(s) dt
[Figure: plot of Δ(s), with the point r − O(r^{1/2} log r) marked on the s-axis.]
∫_0^∞ Δ(s) ds = ∫_0^∞ ∫_l^r (s − t) v_t(s) dt ds = 0

Averaging Argument
h(s) = E[ f(g_t) | |g_t|^2 = s ]
Correctness: for small s, h(s) ≈ 0, while for large s, h(s) ≈ 1.
∫_0^∞ h(s) Δ(s) ds = ∫_0^∞ ∫_l^r h(s) (s − t) v_t(s) dt ds ≥ δ
∃ t so that ∫_0^∞ h(s) s v_t(s) ds ≥ t ∫_0^∞ h(s) v_t(s) ds + δ/(r − l)
For this t, E[ |g_t|^2 | f(g_t) = 1 ] ≥ t + Δ.
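A reconstructed intermediate step (hedged: the constants are schematic, filled in from the surrounding definitions) showing how the displayed inequality yields the conditional-expectation gap:

```latex
% From the averaging inequality to the conditional expectation gap.
% For the good t, with h(s) = E[f(g_t) | |g_t|^2 = s]:
\begin{align*}
\mathbb{E}\!\left[f(g_t)\,(|g_t|^2 - t)\right]
  &= \int_0^\infty h(s)\,(s - t)\,v_t(s)\,ds \;\ge\; \frac{\delta}{r - l},\\
\mathbb{E}\!\left[\,|g_t|^2 \,\middle|\, f(g_t) = 1\right] - t
  &= \frac{\mathbb{E}\!\left[f(g_t)\,(|g_t|^2 - t)\right]}{\Pr[f(g_t) = 1]}
  \;\ge\; \frac{\delta}{r - l} \;=:\; \Delta,
\end{align*}
% using Pr[f(g_t) = 1] <= 1 in the last step.
```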

Algorithm (Reconstruction Attack), recap: for t = 1 to r, (Correlation Finding) find x_1, ..., x_m weakly correlated with U and orthogonal to V_{t-1}; (Boosting) find a single x strongly correlated with U; (Progress) put V_t = V_{t-1} + span{x}.

Boosting small correlations
1. Sample poly(r) vectors x_1, ..., x_m using the Conditional Expectation Lemma, and stack them as the rows of a poly(r) x n matrix M.
2. Compute the top singular vector x of M.
Lemma: |P_U x| > 1 − poly(1/r).
Proof: discretization + concentration.

Implementation in poly(r) time
W.l.o.g. we can assume n = r + O(log nB):
- Restrict the host space to the first r + O(log nB) coordinates.
The matrix M is now O(r) x poly(r), so the singular vector computation takes poly(r) time.

Iterating the previous steps
Generalize the Gaussian to a "subspace Gaussian": a Gaussian vanishing on the maintained subspace V_t.
Intuition: each step reduces the sketch dimension by one.
After r steps:
1. The sketch has no dimensions left!
2. The host space still has n − r = Ω(log nB) dimensions.
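A subspace Gaussian is just a Gaussian with the maintained subspace projected out. A two-line illustration (random V and arbitrary sizes, purely for demonstration):

```python
# A "subspace Gaussian": a Gaussian vector vanishing on the maintained subspace V.
import numpy as np

rng = np.random.default_rng(0)
n = 20
V, _ = np.linalg.qr(rng.normal(size=(n, 4)))   # maintained subspace V_t
g = rng.normal(size=n)
g -= V @ (V.T @ g)                # project out V
print(np.abs(V.T @ g).max())      # ~0: g lives in the complement of V
```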

Problem
The top singular vector is not exactly contained in U; formally, the sketch still has dimension r.
We can fix this by adding a small amount of Gaussian noise to all coordinates.

Algorithm (Reconstruction Attack), recap: for t = 1 to r, (Correlation Finding) find x_1, ..., x_m weakly correlated with U and orthogonal to V_{t-1}; (Boosting) find a single x strongly correlated with U; (Progress) put V_t = V_{t-1} + span{x}; output V_r.

Open Problems
- The achievable polynomial dependence is still open.
- Optimizing efficiency alone may lead to non-robust algorithms. What is the trade-off between robustness and efficiency in various settings of data analysis?