Matrix Completion IT530 Lecture Notes.


Matrix Completion IT530 Lecture Notes

Matrix Completion in Practice: Scenario 1 Consider a survey of M people where each is asked Q questions. It may not be possible to ask each person all Q questions. Consider a matrix of size M by Q (each row holds the answers given by one person). This matrix is only partially filled (many missing entries). Is it possible to infer the full matrix given just the recorded entries?

Matrix Completion in Practice: Scenario 2 Some online shopping and streaming sites such as Amazon, Flipkart, eBay and Netflix have recommender systems. These websites collect product ratings from users. Based on these ratings, they try to recommend other products/movies that the user will like with high probability. Consider a matrix with as many rows as users and as many columns as movies/products. This matrix will be HIGHLY incomplete (no user has the patience to rate too many movies!) – maybe only 5% of the entries will be filled. Can the recommender system infer user preferences from just the defined entries?

Matrix Completion in Practice: Scenario 2 Read about the Netflix Prize to design a better recommender system: http://en.wikipedia.org/wiki/Netflix_Prize

Matrix Completion in Practice: Scenario 3 Consider an image or a video with several pixel values missing. This is not uncommon in range imagery or remote sensing applications! Consider a matrix each of whose columns is a (vectorized) patch of M pixels. Let the number of columns be K. This M by K matrix will have many missing entries. Is it possible to infer the complete matrix given just the defined pixel values? If the answer is yes, note the implications for image compression!

Matrix Completion in Practice: Scenario 4 Consider a long video sequence of F frames. Suppose I mark out M salient (interesting) points {Pi}, 1 <= i <= M, in the first frame, and try to track those points in all subsequent frames. Consider a matrix F of size M x 2F, where row j contains the X and Y coordinates of point Pj along its motion trajectory across the F frames. Unfortunately, many salient points may not be trackable in all frames due to occlusion or tracking errors, so F is highly incomplete. Is it possible to infer the true matrix from only the available measurements?

A property of these matrices Scenario 1: Many people will tend to give very similar or identical answers to many survey questions. Scenario 2: Many people will have similar preferences for movies (only a few factors affect user choices). Scenario 3: Non-local self-similarity! This makes the matrices in all these scenarios low in rank!
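To make the low-rank intuition concrete, here is a minimal NumPy sketch (not from the lecture; the sizes and factor count are made up for illustration) showing that ratings generated from a few latent factors form a low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factor model: each user's ratings are driven by only
# k = 3 latent "taste" factors, far fewer than the number of users
# or movies.
n_users, n_movies, k = 100, 50, 3
user_factors = rng.standard_normal((n_users, k))
movie_factors = rng.standard_normal((k, n_movies))
ratings = user_factors @ movie_factors   # a 100 x 50 ratings matrix

# Despite its size, the matrix has rank at most k.
print(np.linalg.matrix_rank(ratings))    # 3
```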

A property of these matrices Scenario 4: The true matrix underlying F has been PROVED to be of low rank (in fact, at most rank 3) under orthographic projection (ref: Tomasi and Kanade, “Shape and Motion from Image Streams Under Orthography: a Factorization Method”, IJCV 1992), and of low rank (up to rank 9) under a few other more complex camera models. The transpose of F can be expressed as a product of two matrices – a rotation matrix of size 2F x 3, and a shape matrix of size 3 x M. F is useful for many computer vision problems such as structure from motion, motion segmentation and multi-frame point correspondences.

Low-rank matrices are cool! The answer to the four questions/scenarios is a big NO in the general case. But it is a big YES if we assume that the underlying matrix has low rank (which, as we have seen, is indeed the case in all four scenarios).

Theorem 1 (Informal Statement) Consider an unknown matrix F of size n1 by n2 having rank r < min(n1, n2). Suppose we observe only a fraction of entries of F in the form of matrix G, where G(i,j) = F(i,j) for all (i,j) belonging to some set W and G(i,j) undefined elsewhere. If (1) F has row and column spaces that are “sufficiently incoherent” with the canonical basis (i.e. identity matrix), (2) r is “sufficiently small”, and (3) W is “sufficiently large”, then we can accurately recover F from G by solving the following rank minimization problem: minimize rank(X) subject to X(i,j) = G(i,j) for all (i,j) in W.

Cool theorem, but …  The aforementioned rank minimization problem is NP-hard (in fact, all known algorithms for solving it exactly require time doubly exponential in the matrix dimension!)

Theorem 2 (Informal Statement) Consider an unknown matrix F of size n1 by n2 having rank r < min(n1, n2). Suppose we observe only a fraction of entries of F in the form of matrix G, where G(i,j) = F(i,j) for all (i,j) belonging to some set W and G(i,j) undefined elsewhere. If (1) F has row and column spaces that are “sufficiently incoherent” with the canonical basis (i.e. identity matrix), (2) r is “sufficiently small”, and (3) W is “sufficiently large”, then we can accurately recover F from G by solving the following “trace-norm” minimization problem: minimize ||X||* subject to X(i,j) = G(i,j) for all (i,j) in W, where ||X||* denotes the trace-norm of X.

What is the trace-norm of a matrix? The trace-norm of a matrix is the sum of its singular values. It is also called the nuclear norm. It is a softened version of the rank of a matrix, just like the L1-norm of a vector is a softened version of its L0-norm. Minimization of the trace-norm (even under the given constraints) is a convex optimization problem and can be solved efficiently (no local minima issues). This is similar to the L1-norm optimization (in compressive sensing) being efficiently solvable.
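As a quick illustration (a sketch, not part of the original slides), the rank and trace-norm of a matrix can both be read off from its singular values:

```python
import numpy as np

def trace_norm(X):
    # Sum of the singular values (the nuclear norm).
    return np.linalg.svd(X, compute_uv=False).sum()

A = np.diag([3.0, 2.0, 0.0])
print(np.linalg.matrix_rank(A))   # 2: rank counts the nonzero singular values
print(trace_norm(A))              # 5.0: the trace-norm sums them
```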

More about trace-norm minimization The efficient trace-norm minimization procedure is provably known to give the EXACT SAME result as the NP-hard rank minimization problem (under the same constraints and the same conditions on the unknown matrix F and the sampling set W). This is analogous to L1-norm optimization yielding the same result as L0-norm optimization in compressive sensing (again under appropriate constraints and conditions). Henceforth we will concentrate only on Theorem 2 (and beyond).

The devil is in the details Beware: not all low-rank matrices can be recovered from partial measurements! Example: consider a matrix containing zeroes everywhere except at the top-right corner. This matrix has rank 1, but it cannot be recovered from knowledge of only a fraction of its entries (any sampling that misses the corner entry sees only zeroes)! Many other such examples exist. In reality, Theorems 1 and 2 work for low-rank matrices whose singular vectors are sufficiently spread out, i.e. sufficiently incoherent with the canonical basis (i.e. with the identity matrix).

Coherence of a basis The coherence of a subspace U of R^n of dimension r with respect to the canonical basis {e_i} is defined as: mu(U) = (n/r) max_{1<=i<=n} ||P_U e_i||^2, where P_U denotes orthogonal projection onto U. It always satisfies 1 <= mu(U) <= n/r.
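Numerically, for an orthonormal basis matrix U of size n x r, ||P_U e_i||^2 is just the squared norm of the i-th row of U, so the coherence is simple to compute. A small NumPy sketch (illustrative, not from the lecture) contrasting the least and most coherent one-dimensional subspaces of R^8:

```python
import numpy as np

def coherence(U):
    # U: n x r matrix with orthonormal columns spanning the subspace.
    # ||P_U e_i||^2 equals the squared norm of the i-th row of U.
    n, r = U.shape
    return (n / r) * (U ** 2).sum(axis=1).max()

n = 8
# Maximally spread-out subspace: the constant vector (all entries equal).
U_spread = np.ones((n, 1)) / np.sqrt(n)
print(coherence(U_spread))   # 1.0 -- lowest possible, easiest to complete

# Maximally coherent subspace: spanned by a canonical basis vector.
U_spiky = np.zeros((n, 1))
U_spiky[0, 0] = 1.0
print(coherence(U_spiky))    # 8.0 -- equals n/r, the worst case
```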

Formal definition of key assumptions Consider an underlying matrix M of size n1 by n2. Let the SVD of M be given as follows: M = sum_{k=1..r} sigma_k u_k v_k^T, with column space U = span{u_k} and row space V = span{v_k}. We make the following assumptions about M: (A0) max(mu(U), mu(V)) <= mu0 for some mu0 > 0. (A1) The maximum absolute entry of the n1 by n2 matrix sum_{k=1..r} u_k v_k^T is upper bounded by mu1 sqrt(r/(n1 n2)) for some mu1 > 0.

Theorem 2 (Formal Statement) Suppose we observe m entries of M, with locations sampled uniformly at random, and let n = max(n1, n2). There exist constants C and c such that if m >= C max(mu1^2, mu0^{1/2} mu1, mu0 n^{1/4}) n r (beta log n) for some beta > 2, then the trace-norm minimizer (in the informal statement of Theorem 2) is unique and equal to M with probability at least 1 - c n^{-beta}.

Comments on Theorem 2 Theorem 2 states that more entries of M must be known (their number is denoted by m) for accurate reconstruction if (1) M has larger rank r, (2) mu0 in (A0) is larger, or (3) mu1 in (A1) is larger. Example: if mu0 = O(1) and the rank r is small, the reconstruction is accurate with high probability provided m >= C n^{5/4} r log n.

Comments on Theorem 2 It turns out that if the singular vectors of the matrix M have bounded entries, condition (A1) almost always holds with mu1 = O(log n).

Matrix Completion under noise Consider an unknown matrix F of size n1 by n2 having rank r < min(n1, n2). Suppose we observe only a fraction of entries of F in the form of matrix G, where G(i,j) = F(i,j) + Z(i,j) for all (i,j) belonging to some set W and G(i,j) undefined elsewhere. Here Z refers to a white noise process which obeys the constraint that: sum over (i,j) in W of Z(i,j)^2 <= delta^2, for some noise level delta.

Matrix Completion under noise In such cases, the unknown matrix F can be recovered by solving the following minimization problem (a semi-definite program): minimize ||X||* subject to sum over (i,j) in W of (X(i,j) - G(i,j))^2 <= delta^2.

Theorem 3 (informal statement) The reconstruction from the previous procedure is accurate, with an error (in the Frobenius norm) bounded by a constant times the noise level delta, where the constant grows as the fraction of observed entries p = m/(n1 n2) shrinks (ref: Candes and Plan, “Matrix Completion with Noise”, Proceedings of the IEEE, 2010).

A Minimization Algorithm Consider the minimization problem: minimize ||X||* subject to X(i,j) = G(i,j) for all (i,j) in W. There are many techniques to solve this problem (http://perception.csl.illinois.edu/matrix-rank/sample_code.html). Out of these, we will study one method called “singular value thresholding”.

Singular Value Thresholding (SVT) Ref: Cai et al, “A Singular Value Thresholding Algorithm for Matrix Completion”, SIAM Journal on Optimization, 2010. Starting from Y0 = 0, SVT iterates: Xk = D_tau(Y(k-1)), then Yk = Y(k-1) + delta_k P_W(G - Xk), where D_tau is the singular value soft-thresholding (shrinkage) operator: if X = U S V^T is the SVD of X, then D_tau(X) = U max(S - tau, 0) V^T. The soft-thresholding operator obeys the following property (which we state w/o proof): D_tau(Y) is the unique minimizer of tau ||X||* + (1/2) ||X - Y||_F^2 over all X.
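A minimal NumPy sketch (illustrative, not the authors' code) of the shrinkage operator D_tau, which soft-thresholds the singular values while keeping the singular vectors:

```python
import numpy as np

def shrink(X, tau):
    # D_tau: soft-threshold the singular values, keep the singular vectors.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

X = np.diag([5.0, 1.0])
Y = shrink(X, 2.0)
# Singular values 5 and 1 become 3 and 0, so the output has lower rank.
print(np.linalg.svd(Y, compute_uv=False))   # [3. 0.]
```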

Properties of SVT (stated w/o proof) The sequence {Xk} converges to the solution of the problem: minimize tau ||X||* + (1/2) ||X||_F^2 subject to X(i,j) = G(i,j) for all (i,j) in W, provided the step-sizes {delta_k} all lie between 0 and 2. For large values of tau, this solution closely approximates the trace-norm minimizer of the original problem.
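The full SVT iteration can be sketched in a few lines of NumPy (an illustrative toy run, not the authors' code; the matrix size, sampling fraction, tau, delta and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def shrink(X, tau):
    # Singular value soft-thresholding operator D_tau.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Ground-truth rank-2 matrix M and a mask W of observed entries.
n = 30
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))
W = rng.random((n, n)) < 0.5          # observe roughly half the entries

# SVT iteration: X_k = D_tau(Y_{k-1}); Y_k = Y_{k-1} + delta * P_W(M - X_k).
tau, delta = 5 * n, 1.2               # step-size in (0, 2) for convergence
Y = np.zeros((n, n))
for _ in range(500):
    X = shrink(Y, tau)
    Y += delta * W * (M - X)

# Relative reconstruction error over ALL entries, including unobserved ones.
rel_err = np.linalg.norm(X - M) / np.linalg.norm(M)
print(rel_err)
```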

Results The SVT algorithm is very efficient and easily implementable in MATLAB. The authors report reconstruction of a 30,000 by 30,000 matrix in just 17 minutes on a 1.86 GHz dual-core desktop with 3 GB RAM, with MATLAB’s multithreading option enabled.

Results (Data without noise)

Results (Noisy Data)

Results on real data Dataset consists of a matrix M of geodesic distances between 312 cities in the USA/Canada. This matrix is of approximately low-rank (in fact, the relative Frobenius error between M and its rank-3 approximation is 0.1159). 70% of the entries of this matrix (chosen uniformly at random) were blanked out.

Results on real data The underlying matrix was estimated using SVT. In just a few seconds and a few iterations, the SVT produces an estimate that is as accurate as the best rank-3 approximation of M.

Results on real data