Sublinear Algorithms for Big Data


Sublinear Algorithms for Big Data, Lecture 4
Grigory Yaroslavtsev (http://grigory.us)
Slides are available at http://grigory.us/big-data.html

Today
- Dimensionality reduction
  - AMS as dimensionality reduction
  - Johnson-Lindenstrauss transform
- Sublinear time algorithms
  - Definitions: approximation, property testing
  - Basic examples: approximating diameter, testing properties of images
  - Testing sortedness
  - Testing connectedness

$L_p$-norm Estimation
Stream: $m$ updates $(x_i, \Delta_i) \in [n] \times \mathbb{R}$ that define a vector $f$ where $f_j = \sum_{i : x_i = j} \Delta_i$.
Example: for $n = 4$, the stream $\langle (1,3), (3,0.5), (1,2), (2,-2), (2,1), (1,-1), (4,1) \rangle$ gives $f = (4, -1, 0.5, 1)$.
$L_p$-norm: $\|f\|_p = \left( \sum_i |f_i|^p \right)^{1/p}$
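A minimal Python illustration of this definition, using the example stream above (numpy and 1-indexed coordinates are assumptions of the sketch, not part of the slides):

    import numpy as np

    n = 4
    stream = [(1, 3), (3, 0.5), (1, 2), (2, -2), (2, 1), (1, -1), (4, 1)]

    f = np.zeros(n)
    for x, delta in stream:      # each update adds delta to coordinate x of f
        f[x - 1] += delta        # the example uses 1-indexed coordinates

    def lp_norm(f, p):
        return np.sum(np.abs(f) ** p) ** (1.0 / p)

    print(f)              # [ 4.  -1.   0.5  1. ]
    print(lp_norm(f, 2))  # sqrt(16 + 1 + 0.25 + 1) ~ 4.27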

$L_p$-norm Estimation
$L_p$-norm: $\|f\|_p = \left( \sum_i |f_i|^p \right)^{1/p}$
Two lectures ago:
- $\|f\|_0$ = $F_0$-moment
- $\|f\|_2^2$ = $F_2$-moment (via AMS sketching)
Space: $O\left( \frac{\log n}{\epsilon^2} \log \frac{1}{\delta} \right)$
Technique: linear sketches
- for $\|f\|_0$: $\sum_{i \in S} f_i$ for random sets $S$
- for $\|f\|_2^2$: $\sum_i \sigma_i f_i$ for random signs $\sigma_i$

AMS as dimensionality reduction
Maintain a "linear sketch" vector $\mathbf{Z} = (Z_1, \dots, Z_k) = Rf$, where $Z_i = \sum_{j \in [n]} \sigma_{ij} f_j$ and $\sigma_{ij} \in_R \{-1, 1\}$.
Estimator $\mathbf{Y}$ for $\|f\|_2^2$: $\mathbf{Y} = \frac{1}{k} \sum_{i=1}^{k} Z_i^2 = \frac{\|Rf\|_2^2}{k}$
"Dimensionality reduction": $x \to Rx$, but with a "heavy" tail:
$\Pr\left[ \left| \mathbf{Y} - \|f\|_2^2 \right| \ge c \left( \frac{2}{k} \right)^{1/2} \|f\|_2^2 \right] \le \frac{1}{c^2}$
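A toy numpy version of this sketch. It uses fully independent random signs, whereas the AMS analysis only needs 4-wise independence, so this is an illustration rather than the space-efficient construction:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 1000, 100

    f = rng.normal(size=n)                     # an arbitrary frequency vector
    R = rng.choice([-1.0, 1.0], size=(k, n))   # random signs sigma_ij
    Z = R @ f                                  # linear sketch Z = Rf (can be maintained under updates)
    Y = np.sum(Z ** 2) / k                     # estimator ||Rf||_2^2 / k

    print(np.sum(f ** 2), Y)                   # ||f||_2^2 and its estimate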

Normal Distribution
Normal distribution $N(0,1)$, basic facts:
- Range: $(-\infty, +\infty)$
- Density: $\mu(x) = (2\pi)^{-1/2} e^{-x^2/2}$
- Mean = 0, Variance = 1
More basic facts:
- If $X$ and $Y$ are independent r.v.'s with normal distribution, then $X + Y$ has normal distribution.
- $Var[cX] = c^2 Var[X]$
- If $X$ and $Y$ are independent, then $Var[X + Y] = Var[X] + Var[Y]$.
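A quick empirical check of the two variance facts in numpy (the sample size, seed, and constant are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=1_000_000)
    Y = rng.normal(size=1_000_000)
    c = 3.0

    print(np.var(c * X), c ** 2 * np.var(X))     # Var[cX] vs c^2 Var[X]
    print(np.var(X + Y), np.var(X) + np.var(Y))  # Var[X+Y] vs Var[X] + Var[Y] for independent X, Y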

Johnson-Lindenstrauss Transform
Instead of $\pm 1$, let $\sigma_i$ be i.i.d. random variables from the normal distribution $N(0,1)$ and set $Z = \sum_i \sigma_i f_i$.
We still have $\mathbb{E}[Z^2] = \sum_i f_i^2 = \|f\|_2^2$ because $\mathbb{E}[\sigma_i]\,\mathbb{E}[\sigma_j] = 0$ for $i \ne j$ and $\mathbb{E}[\sigma_i^2]$ = "variance of $\sigma_i$" = 1.
Define $\mathbf{Z} = (Z_1, \dots, Z_k)$ and $\|\mathbf{Z}\|_2^2 = \sum_j Z_j^2$, so that $\mathbb{E}\left[ \|\mathbf{Z}\|_2^2 \right] = k \|f\|_2^2$.
JL Lemma: there exists $C > 0$ s.t. for small enough $\epsilon > 0$:
$\Pr\left[ \left| \|\mathbf{Z}\|_2^2 - k \|f\|_2^2 \right| > \epsilon k \|f\|_2^2 \right] \le \exp(-C \epsilon^2 k)$
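A small numpy sketch of the Gaussian version, checking that $\|\mathbf{Z}\|_2^2 / k$ concentrates around $\|f\|_2^2$ (dimensions and seed are arbitrary choices, not from the slides):

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 2000, 400

    f = rng.normal(size=n)
    G = rng.normal(size=(k, n))   # i.i.d. N(0,1) entries sigma_ij
    Z = G @ f
    est = np.sum(Z ** 2) / k      # E[||Z||_2^2] = k ||f||_2^2, so this estimates ||f||_2^2

    true = np.sum(f ** 2)
    print(true, est, abs(est - true) / true)   # relative error is roughly O(1/sqrt(k))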

Proof of JL Lemma
JL Lemma: $\exists C > 0$ s.t. for small enough $\epsilon > 0$:
$\Pr\left[ \left| \|\mathbf{Z}\|_2^2 - k \|f\|_2^2 \right| > \epsilon k \|f\|_2^2 \right] \le \exp(-C \epsilon^2 k)$
Assume $\|f\|_2^2 = 1$. We have $Z_i = \sum_j \sigma_{ij} f_j$ and $\mathbf{Z} = (Z_1, \dots, Z_k)$, so $\mathbb{E}\left[ \|\mathbf{Z}\|_2^2 \right] = k \|f\|_2^2 = k$.
Alternative form of JL Lemma: $\Pr\left[ \|\mathbf{Z}\|_2^2 > k(1+\epsilon)^2 \right] \le \exp(-\epsilon^2 k + O(k \epsilon^3))$
(Together with the symmetric bound for the lower tail, this implies the lemma after rescaling $\epsilon$, since $(1+\epsilon)^2 = 1 + 2\epsilon + \epsilon^2$.)

Proof of JL Lemma
Alternative form of JL Lemma: $\Pr\left[ \|\mathbf{Z}\|_2^2 > k(1+\epsilon)^2 \right] \le \exp(-\epsilon^2 k + O(k \epsilon^3))$
Let $\mathbf{Y} = \|\mathbf{Z}\|_2^2$ and $\alpha = k(1+\epsilon)^2$. For every $s > 0$ we have $\Pr[\mathbf{Y} > \alpha] = \Pr\left[ e^{s\mathbf{Y}} > e^{s\alpha} \right]$.
By Markov's inequality and independence of the $Z_i$'s:
$\Pr\left[ e^{s\mathbf{Y}} > e^{s\alpha} \right] \le \frac{\mathbb{E}\left[ e^{s\mathbf{Y}} \right]}{e^{s\alpha}} = e^{-s\alpha}\, \mathbb{E}\left[ e^{s \sum_i Z_i^2} \right] = e^{-s\alpha} \prod_{i=1}^{k} \mathbb{E}\left[ e^{s Z_i^2} \right]$
Since $\|f\|_2^2 = 1$, each $Z_i \sim N(0,1)$, hence for $s < 1/2$:
$\mathbb{E}\left[ e^{s Z_i^2} \right] = (2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{s t^2} e^{-t^2/2}\, dt = \frac{1}{\sqrt{1 - 2s}}$
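The last equality is a standard Gaussian integral; one way to fill in the step the slide skips (substitute $u = t\sqrt{1-2s}$, which requires $s < 1/2$):
$\mathbb{E}\left[ e^{s Z_i^2} \right] = (2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{-(1-2s) t^2 / 2}\, dt = (2\pi)^{-1/2} \cdot \frac{1}{\sqrt{1-2s}} \int_{-\infty}^{\infty} e^{-u^2/2}\, du = \frac{1}{\sqrt{1-2s}}$,
using $(2\pi)^{-1/2} \int_{-\infty}^{\infty} e^{-u^2/2}\, du = 1$.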

Proof of JL Lemma
Alternative form of JL Lemma: $\Pr\left[ \|\mathbf{Z}\|_2^2 > k(1+\epsilon)^2 \right] \le \exp(-\epsilon^2 k + O(k \epsilon^3))$
For every $0 < s < 1/2$ we have:
$\Pr[\mathbf{Y} > \alpha] \le e^{-s\alpha} \prod_{i=1}^{k} \mathbb{E}\left[ e^{s Z_i^2} \right] = e^{-s\alpha} (1 - 2s)^{-k/2}$
Let $s = \frac{1}{2}\left( 1 - \frac{k}{\alpha} \right)$ and recall that $\alpha = k(1+\epsilon)^2$. A calculation finishes the proof:
$\Pr[\mathbf{Y} > \alpha] \le \exp(-\epsilon^2 k + O(k \epsilon^3))$
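One way to carry out that final calculation (this expansion is not spelled out on the slide): with $s = \frac{1}{2}\left( 1 - \frac{k}{\alpha} \right)$,
$e^{-s\alpha} (1-2s)^{-k/2} = e^{-(\alpha - k)/2} \left( \frac{\alpha}{k} \right)^{k/2} = \exp\left( -\frac{\alpha - k}{2} + \frac{k}{2} \ln \frac{\alpha}{k} \right)$,
and substituting $\alpha = k(1+\epsilon)^2$ together with $\ln(1+\epsilon) = \epsilon - \frac{\epsilon^2}{2} + O(\epsilon^3)$ gives
$-\frac{k(2\epsilon + \epsilon^2)}{2} + k \ln(1+\epsilon) = -k\epsilon - \frac{k\epsilon^2}{2} + k\epsilon - \frac{k\epsilon^2}{2} + O(k\epsilon^3) = -\epsilon^2 k + O(k\epsilon^3)$.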

Johnson-Lindenstrauss Transform
- Single vector: $k = O\left( \frac{\log \frac{1}{\delta}}{\epsilon^2} \right)$. Tight: $k = \Omega\left( \frac{\log \frac{1}{\delta}}{\epsilon^2} \right)$ [Woodruff '10]
- $n$ vectors simultaneously: $k = O\left( \frac{\log n \log \frac{1}{\delta}}{\epsilon^2} \right)$. Tight: $k = \Omega\left( \frac{\log n \log \frac{1}{\delta}}{\epsilon^2} \right)$ [Molinaro, Woodruff, Y. '13]
- Distances between $n$ vectors = $O(n^2)$ vectors: $k = O\left( \frac{\log n \log \frac{1}{\delta}}{\epsilon^2} \right)$
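A small numpy experiment in the spirit of the last bullet: embed $n$ points with a scaled Gaussian map of $k = O(\log n / \epsilon^2)$ rows and look at the worst distortion over all $O(n^2)$ pairwise distances. The constant 10 and all parameters below are arbitrary choices for the illustration, not values from the lecture:

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(3)
    n, d, eps = 100, 1000, 0.3
    k = int(10 * np.log(n) / eps ** 2)        # target dimension, O(log n / eps^2)

    points = rng.normal(size=(n, d))
    G = rng.normal(size=(k, d)) / np.sqrt(k)  # scaled Gaussian JL map
    proj = points @ G.T

    worst = 0.0
    for i, j in combinations(range(n), 2):
        orig = np.sum((points[i] - points[j]) ** 2)
        new = np.sum((proj[i] - proj[j]) ** 2)
        worst = max(worst, abs(new - orig) / orig)

    print(k, worst)   # worst relative distortion of squared pairwise distances; compare with eps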