Sampling algorithms for ℓ2 regression and applications. Michael W. Mahoney, Yahoo! Research (joint work with P. Drineas and S. (Muthu) Muthukrishnan). SODA 2006.

Regression problems. We seek sampling-based algorithms for solving ℓ2 regression. We are interested in overconstrained problems, n >> d. Typically, there is no x such that Ax = b.
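To make the setting concrete, here is the standard overconstrained least-squares formulation the talk refers to (standard notation; Z_2 is the optimal value used on later slides):

% Overconstrained least-squares (l2) regression:
% A is n x d with n >> d, b is a vector in R^n.
\[
  Z_2 \;=\; \min_{x \in \mathbb{R}^{d}} \; \| b - A x \|_{2},
  \qquad A \in \mathbb{R}^{n \times d},\; b \in \mathbb{R}^{n},\; n \gg d .
\]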

"Induced" regression problems: keep sampled "rows" (elements) of b and sampled rows of A, with a scaling applied to account for the undersampling.

Regression problems, definition. There is work by K. Clarkson (SODA 2005) on sampling-based algorithms for ℓ1 regression (p = 1) for overconstrained problems.
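The definition formula on this slide did not survive extraction; the standard ℓp regression problem it presumably stated (an assumption based on the "(p = 1)" reference) is:

% General overconstrained lp regression; p = 2 recovers the problem above,
% p = 1 is the case studied by Clarkson (SODA 2005).
\[
  Z_p \;=\; \min_{x \in \mathbb{R}^{d}} \; \| b - A x \|_{p},
  \qquad p \in [1, \infty).
\]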

Exact solution: x_opt = A^+ b, where A^+ is the pseudoinverse of A; A x_opt is the projection of b on the subspace spanned by the columns of A.
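Written out (standard least-squares facts, not taken verbatim from the slide):

% Exact solution of the l2 regression problem and its optimal value.
\[
  x_{\mathrm{opt}} \;=\; A^{+} b ,
  \qquad
  A x_{\mathrm{opt}} \;=\; \text{orthogonal projection of } b \text{ onto the column space of } A,
  \qquad
  Z_2 \;=\; \| b - A A^{+} b \|_{2}.
\]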

Singular Value Decomposition (SVD). U (V): orthogonal matrix containing the left (right) singular vectors of A. Σ: diagonal matrix containing the singular values of A. ρ: rank of A. Computing the SVD takes O(d^2 n) time. The pseudoinverse of A is A^+ = V Σ^{-1} U^T.
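In symbols (standard SVD facts, consistent with the notation above):

% Thin SVD of A (rank rho) and the resulting pseudoinverse.
\[
  A \;=\; U \Sigma V^{T},
  \quad U \in \mathbb{R}^{n \times \rho},\;
        \Sigma \in \mathbb{R}^{\rho \times \rho},\;
        V \in \mathbb{R}^{d \times \rho},
  \qquad
  A^{+} \;=\; V \Sigma^{-1} U^{T}.
\]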

Questions … Can sampling methods provide accurate estimates for ℓ2 regression? Is it possible to approximate the optimal vector and the optimal value Z_2 by looking only at a small sample from the input? (Even if it takes some sophisticated oracles to actually perform the sampling …) Equivalently, is there an induced subproblem of the full regression problem whose optimal solution and value Z_{2,s} approximate the optimal solution and its value Z_2?

Creating an induced subproblem. Algorithm:
1. Fix a set of probabilities p_i, i = 1…n, summing up to 1.
2. Pick r indices from {1…n} in r i.i.d. trials, with respect to the p_i's.
3. For each sampled index j, keep the j-th row of A and the j-th element of b; rescale both by (1/(r p_j))^{1/2}.
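A minimal NumPy sketch of this sampling-and-rescaling step (illustrative only; the function name and interface are not from the talk):

import numpy as np

def sample_subproblem(A, b, p, r, seed=None):
    """Sample r rows of (A, b) i.i.d. according to probabilities p
    and rescale each kept row by 1 / sqrt(r * p_j)."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)              # must sum to 1
    n = A.shape[0]
    idx = rng.choice(n, size=r, replace=True, p=p)   # r i.i.d. trials
    scale = 1.0 / np.sqrt(r * p[idx])                # (1 / (r p_j))^{1/2}
    A_s = A[idx] * scale[:, None]                    # sampled, rescaled rows of A
    b_s = b[idx] * scale                             # sampled, rescaled entries of b
    return A_s, b_s

# The induced subproblem can then be solved with any ordinary least-squares solver:
# x_tilde = np.linalg.lstsq(A_s, b_s, rcond=None)[0]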

The induced subproblem: the sampled, rescaled rows of A and the sampled, rescaled elements of b.
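In formulas (a sketch consistent with the algorithm above; the subscript-s notation is mine, not the slide's):

% A_s (r x d) and b_s (r x 1) are the sampled, rescaled rows/entries;
% x_tilde_opt and Z_{2,s} are the solution and value of the induced subproblem.
\[
  Z_{2,s} \;=\; \min_{x \in \mathbb{R}^{d}} \; \| b_{s} - A_{s} x \|_{2},
  \qquad
  \tilde{x}_{\mathrm{opt}} \;=\; A_{s}^{+} b_{s}.
\]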

Our results. If the p_i satisfy certain conditions, then with probability at least 1 − δ, the induced subproblem yields a good approximation to the optimal solution and value of the full problem. The sampling complexity depends on d but not on n.
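The formulas on this slide did not survive extraction. As a hedged paraphrase (the (1+ε) relative-error form is the headline result of the SODA 2006 paper; the exact constants and the dependence on ε and δ below are assumptions, not read off the slide):

% Hedged paraphrase of the result's form; see the paper for the precise statement.
\[
  Z_2 \;\le\; \| b - A \tilde{x}_{\mathrm{opt}} \|_{2} \;\le\; (1 + \epsilon)\, Z_2 ,
  \qquad
  r \;=\; O\!\big( d^{2} / \epsilon^{2} \big) \ \text{(for fixed } \delta\text{)}.
\]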

Our results, cont'd. If the p_i satisfy certain conditions, then with probability at least 1 − δ, a second approximation bound holds; its statement involves κ(A), the condition number of A. The sampling complexity is as on the previous slide.

Back to induced subproblems … sampled elements of b, rescaled; sampled rows of A, rescaled. The relevant information for ℓ2 regression when n >> d is contained in an induced subproblem of size O(d^2)-by-d. (Upcoming writeup: we can reduce the sampling complexity to r = O(d).)

Conditions on the probabilities, SVD. Let U_(i) denote the i-th row of U. Let U^⊥ ∈ R^{n×(n−ρ)} denote the orthogonal complement of U. U (V): orthogonal matrix containing the left (right) singular vectors of A. Σ: diagonal matrix containing the singular values of A. ρ: rank of A.

Conditions on the probabilities, interpretation. What do the lengths of the rows of the n x d matrix U = U_A "mean"? Consider possible n x d matrices U of d left singular vectors:
- I_{n|k} = k columns from the identity: row lengths are 0 or 1; I_{n|k} x -> x.
- H_{n|k} = k columns from the n x n Hadamard (real Fourier) matrix: row lengths all equal; H_{n|k} x -> maximally dispersed.
- U_k = k columns from any orthogonal matrix: row lengths between 0 and 1.
The lengths of the rows of U = U_A correspond to a notion of information dispersal.

Conditions for the probabilities. The conditions that the p_i must satisfy, for some β_1, β_2, β_3 ∈ (0,1], involve the lengths of the rows of the matrix of left singular vectors of A and the component of b not in the span of the columns of A. The sampling complexity depends on the β_i: small β_i ⇒ more sampling.
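The precise conditions were in a formula that did not survive extraction. A plausible form, based on the slide's description and standard leverage-score-style sampling (an assumption, not the paper's verbatim statement); here b^⊥ = b − A A^+ b is the part of b outside the column space of A:

% Hedged sketch of row-sampling conditions of this type (not verbatim from the talk);
% a third condition (with weight beta_3) presumably mixes the two quantities below.
\[
  p_i \;\ge\; \beta_1 \, \frac{\| U_{(i)} \|_2^{2}}{d},
  \qquad
  p_i \;\ge\; \beta_2 \, \frac{ \big(b^{\perp}_i\big)^{2} }{ \| b^{\perp} \|_2^{2} } .
\]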

Computing "good" probabilities. Open question: can we compute "good" probabilities faster, in a pass-efficient manner? Some assumptions might be acceptable (e.g., bounded condition number of A, etc.). In O(nd^2) time we can easily compute p_i's that satisfy all three conditions, with β_1 = β_2 = β_3 = 1/3. (Too expensive in practice for this problem!)
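A minimal NumPy sketch of such an O(nd^2) construction (illustrative; the exact mixture of the three conditions is my assumption, loosely following the β_1 = β_2 = β_3 = 1/3 remark, not the paper's construction):

import numpy as np

def exact_probabilities(A, b):
    """Compute sampling probabilities from a thin SVD of A as an equal
    (1/3 each) mixture of leverage scores, squared residual components
    of b, and the uniform distribution."""
    n = A.shape[0]
    U, _, _ = np.linalg.svd(A, full_matrices=False)   # thin SVD, O(n d^2) time
    lev = np.sum(U**2, axis=1)                        # row norms ||U_(i)||^2
    lev = lev / lev.sum()
    b_perp = b - U @ (U.T @ b)                        # component of b outside col(A)
    res = b_perp**2
    res = res / res.sum() if res.sum() > 0 else np.full(n, 1.0 / n)
    return (lev + res + np.full(n, 1.0 / n)) / 3.0    # probabilities sum to 1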

Critical observation: the induced subproblem is obtained by sampling and rescaling rows of A and the corresponding elements of b.

Critical observation, cont'd: since A = U Σ V^T, sampling and rescaling rows of A amounts to sampling and rescaling only the corresponding rows of U.

Critical observation, cont'd. Important observation: U_s is almost orthogonal, and we can bound the spectral and the Frobenius norm of U_s^T U_s − I. (FKV98, DK01, DKM04, RV04)
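Concretely (a standard consequence, stated here as a sketch rather than the talk's exact bound): if the Gram matrix of U_s is close to the identity, then the singular values of U_s are close to 1, i.e., U_s has nearly orthonormal columns:

% Near-orthogonality of the sampled-and-rescaled U_s.
\[
  \| U_s^{T} U_s - I \|_{2} \;\le\; \epsilon
  \quad\Longrightarrow\quad
  1 - \epsilon \;\le\; \sigma_i^{2}(U_s) \;\le\; 1 + \epsilon \ \ \text{for all } i .
\]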

Application: CUR-type decompositions. Create an approximation to A using rows and columns of A: C consists of O(1) columns of A, R of O(1) rows of A, and U is carefully chosen. Goal: provide (good) bounds for some norm of the error matrix A − CUR. 1. How do we draw the rows and columns of A to include in C and R? 2. How do we construct U?
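For concreteness, the shape of such a decomposition (dimensions only; c and r denote the numbers of sampled columns and rows, both O(1) here):

% CUR-type approximation: C holds sampled columns, R sampled rows, U is a small core matrix.
\[
  A \;\approx\; C\,U\,R,
  \qquad
  A \in \mathbb{R}^{m \times n},\;
  C \in \mathbb{R}^{m \times c},\;
  U \in \mathbb{R}^{c \times r},\;
  R \in \mathbb{R}^{r \times n}.
\]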