Subspace Expanders and Low Rank Matrix Recovery


Subspace Expanders and Low Rank Matrix Recovery
Babak Hassibi, joint work with Samet Oymak and Amin Khajehnejad
Duke Workshop on Sensing and Analysis of High-Dimensional Data

Outline
- Expanders: coding, compressed sensing
- Subspace expanders: definition, existence
- Application to low rank matrix recovery: a fast recovery algorithm
- Applications?
- Further work

Expanders
A regular bipartite (n, m, d, α, ε) expander:
- n nodes on the left, m on the right
- each node on the left has degree d
- each group of k ≤ αn left nodes has at least (1 − ε)dk right neighbors
The existence of expander graphs has been known since the works of Pinsker and of Bassalygo and Margulis:
- random d-regular graphs are expanders with high probability
- early explicit constructions used Cayley graphs and yielded expanders with ε > 1/2
- recently, explicit constructions have been given for all ε > 0
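
To make the definition concrete, here is a brute-force sketch that checks the expansion property of a small random bipartite graph; all sizes, the seed, and the parameters below are illustrative choices, not values from the talk.

```python
# Brute-force check of the (n, m, d, alpha, eps) expansion property for a
# small random d-left-regular bipartite graph. Sizes are illustrative only.
import itertools
import random

n, m, d = 12, 16, 3        # left nodes, right nodes, left degree
alpha, eps = 0.25, 0.25    # test every left group of size k <= alpha * n

random.seed(0)
# neighbors[v] = set of right nodes adjacent to left node v
neighbors = [set(random.sample(range(m), d)) for _ in range(n)]

is_expander = True
for k in range(1, int(alpha * n) + 1):
    for group in itertools.combinations(range(n), k):
        boundary = set().union(*(neighbors[v] for v in group))
        if len(boundary) < (1 - eps) * d * k:   # expansion violated
            is_expander = False
print("expander:", is_expander)
```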

Applications of Expanders
Coding: the adjacency matrix of an expander graph can be used as the parity check matrix of an LDPC code; decoders include bit flipping (Sipser and Spielman) and LP decoding (Feldman).
Compressed sensing: recover a sparse vector x ∈ ℝ^n, with ||x||_0 ≤ k, from measurements y = Ax; well suited to sparse measurements (e.g., DNA microarrays).
High-quality expanders (small ε) are the key to success:
- when ε < 1/4, a bit-flipping algorithm requires O(n) operations for recovery, with deterministic guarantees (Xu and Hassibi)
- more results exist: RIP properties (Indyk et al.), minimal expansion (Khajehnejad et al.)

Bit Flipping
Construct an (n, m, d, α, ε) expander with ε < 1/4 and assume k ≤ αn. Given an estimate x̂, define the gap in the i-th equation as g_i = y_i − (Ax̂)_i.
Algorithm:
1. Start with x̂ = 0.
2. If y = Ax̂, exit. Else, find a variable node x̂_j such that at least (1 − 2ε)d of the d equations it participates in have a common nonzero gap g.
3. Set x̂_j ← x̂_j + g and go to 2.
Bit flipping greedily reduces the support of the gap vector y − Ax̂.
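
Below is a minimal Python sketch of this decoder, assuming a 0/1 adjacency matrix and exact (e.g., integer-valued) signals; the iteration cap is an added safeguard, not part of the slide.

```python
# A minimal sketch of the gap-based bit-flipping decoder described above
# (in the spirit of Sipser-Spielman / Xu-Hassibi).
import numpy as np

def bit_flip_recover(A, y, d, eps, max_iter=1000):
    """A: m x n 0/1 adjacency matrix of a d-left-regular expander, y = A @ x."""
    n = A.shape[1]
    x_hat = np.zeros(n)                        # step 1: start with x_hat = 0
    for _ in range(max_iter):
        gap = y - A @ x_hat                    # gap of each check equation
        if not gap.any():                      # step 2: if y = A x_hat, exit
            return x_hat
        for j in range(n):
            checks = np.flatnonzero(A[:, j])   # the d equations x_j is in
            vals, counts = np.unique(gap[checks], return_counts=True)
            i = np.argmax(counts)
            # step 3: flip x_j if enough of its checks share a nonzero gap
            if vals[i] != 0 and counts[i] >= (1 - 2 * eps) * d:
                x_hat[j] += vals[i]
                break
    return x_hat
```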

Another Fast Algorithm
Useful when the signal is nonnegative.
Main idea: let A ∈ ℝ^{m×n} be the adjacency matrix of an expander graph and observe y = Ax, where x is nonnegative. Every zero measurement forces all of its neighboring entries of x to be zero; if x is sufficiently sparse, the system of equations involving the remaining (green) nodes is overdetermined (Khajehnejad, Dimakis and Hassibi).
[Figure: bipartite measurement graph, nonnegative entries marked +]
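
A minimal sketch of this idea, assuming A is an exact 0/1 adjacency matrix and noiseless data: zero measurements identify entries of x that must vanish, and the surviving overdetermined system is solved by least squares.

```python
# Nonnegative recovery via zero measurements + least squares (a sketch).
import numpy as np

def recover_nonnegative(A, y):
    """A: m x n 0/1 expander adjacency matrix, y = A @ x with x >= 0."""
    # any entry touching a zero measurement must itself be zero
    zero_checks = np.flatnonzero(y == 0)
    killed = np.flatnonzero(A[zero_checks].sum(axis=0) > 0)
    support = np.setdiff1d(np.arange(A.shape[1]), killed)
    x_hat = np.zeros(A.shape[1])
    if support.size:
        # the reduced system is overdetermined when x is sparse enough
        x_hat[support], *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    return x_hat
```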

Expanders for Low Rank Matrix Recovery?
Natural questions:
- Can we extend expanders from CS to LRMR?
- Is there an algorithm for the LRMR problem based on expansion?
Main contributions:
- Existence of rank expanders
- Novel algorithms for the LRMR problem

Why Low Rank?
Low rank matrices are abundant:
- system identification
- multitask learning
- graph clustering
- collaborative filtering
A highly related notion is sparsity: in general, many problems have a low-dimensional underlying structure, i.e., are built from just a few atoms.

LRMR Problem
Objective: given linear measurements y = A(X_0), where X_0 is low rank, recover X_0.
The problem is hard because the rank function is nonconvex. A common approach: relax and convexify (replace rank with the nuclear norm), first suggested in Fazel's PhD thesis.
Typical measurements:
- matrix completion: entries are observed at random
- inner products with matrices of i.i.d. entries: y_i = ⟨A_i, X_0⟩

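For concreteness, here is a hedged cvxpy sketch of the nuclear norm relaxation for the second measurement class; the problem sizes, the Gaussian ensemble, and the number of measurements are placeholder choices, not values from the talk.

```python
# Nuclear norm relaxation of LRMR: min ||X||_* s.t. <A_i, X> = y_i.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 20, 2, 240                       # matrix size, rank, # measurements
U = rng.standard_normal((n, r))
X0 = U @ U.T                               # low-rank ground truth
A = [rng.standard_normal((n, n)) for _ in range(p)]
y = np.array([np.sum(Ai * X0) for Ai in A])   # y_i = <A_i, X0>

X = cp.Variable((n, n))
constraints = [cp.sum(cp.multiply(Ai, X)) == yi for Ai, yi in zip(A, y)]
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()
print("recovery error:", np.linalg.norm(X.value - X0))
```
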
Other Types of Measurements?
The second class of measurements seems unreasonable in practice: each measurement touches every entry of the matrix. Can we instead study measurements that are linear combinations of only a few entries of the matrix? How do we construct such measurements? We shall construct expander-based measurements.

How to Construct?
A natural approach, inspired by the vector case:
- map n×n matrices to m×m ones
- (almost) all rank-one matrices should be mapped to rank-d matrices
- any matrix of sufficiently small rank r should be mapped to a rank between (1 − ε)dr and dr
- preserve positivity (for vectors: x nonnegative ⇒ Ax nonnegative)
- keep m small for LRMR purposes: m = O(√(nr)) for the optimal rate

Proposed Method
Let A_1, A_2, …, A_d ∈ ℝ^{m×n}; we propose the linear operator A(X) = Σ_{i=1}^d A_i X A_i^T.
This is a reasonable choice since it:
- already maps rank r to rank at most dr
- preserves positivity
- if the terms {A_i X A_i^T}_{i=1}^d are sufficiently incoherent, the ranks should add up
An added complication compared to the vector case is that some dimensions of X could be killed by the A_i.
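
A quick numerical sketch of the proposed map, with illustrative dimensions: for i.i.d. Gaussian A_i and a generic rank-r PSD input, the output rank is typically exactly dr.

```python
# The proposed operator A(X) = sum_i A_i X A_i^T on a random rank-r input.
import numpy as np

rng = np.random.default_rng(1)
n, m, d, r = 40, 25, 4, 2
A = [rng.standard_normal((m, n)) for _ in range(d)]

def op(X):
    return sum(Ai @ X @ Ai.T for Ai in A)

U = rng.standard_normal((n, r))
X = U @ U.T                                # PSD, rank r
print(np.linalg.matrix_rank(op(X)))        # typically d * r = 8
```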

Expander Structure
[Figure: X mapped along d "edges" to A_1 X A_1^T, A_2 X A_2^T, …, A_d X A_d^T]
(Speaker note, translated from Turkish: the A_i act like edges, so each subspace maps into a union of other subspaces.)

Who Are the Neighbors?
Let X be rank one, i.e., X = v v^T; X corresponds to a left node in the graph. Its neighbors are the rank-one (or zero) matrices A_i v v^T A_i^T, i = 1, …, d. Unlike the vector case, the nodes are no longer discrete.

Definition (Rank Expander)
We say a linear operator A is an (ε, d, r_0) rank expander if:
- for any PSD X, A(X) is PSD;
- for any PSD X with rank(X) = r ≤ r_0, we have (1 − ε)dr ≤ rank(A(X)) ≤ dr.
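
The following sketch checks both conditions of the definition numerically for the Gaussian construction; ε and all sizes are illustrative assumptions.

```python
# Numerically checking the two rank-expander conditions for Gaussian A_i.
import numpy as np

rng = np.random.default_rng(2)
n, m, d, eps, r0 = 40, 30, 3, 0.25, 3
A = [rng.standard_normal((m, n)) for _ in range(d)]
op = lambda X: sum(Ai @ X @ Ai.T for Ai in A)

for r in range(1, r0 + 1):
    U = rng.standard_normal((n, r))
    Y = op(U @ U.T)                             # image of a rank-r PSD matrix
    psd = np.all(np.linalg.eigvalsh(Y) >= -1e-8)        # condition 1
    rk = np.linalg.matrix_rank(Y)
    in_band = (1 - eps) * d * r <= rk <= d * r          # condition 2
    print(r, psd, in_band)
```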

An Initial Recovery Result
Theorem. Assume A is an (ε, d, r_0) rank expander with ε < 1/2. Then for any PSD matrix X_0 of size n with rank at most r_0/2, X_0 is the unique PSD solution of Y = A(X).
Algorithm 1: output the (unique) PSD X satisfying A(X) = Y.
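
Since the theorem makes X_0 the unique PSD point consistent with the measurements, Algorithm 1 amounts to a PSD feasibility program. A hedged cvxpy sketch (the trace objective is only a tie-breaker for the solver, not part of the statement; sizes are illustrative):

```python
# "Algorithm 1" as PSD feasibility: find X >= 0 with A(X) = Y.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, m, d, r = 15, 12, 3, 1
A = [rng.standard_normal((m, n)) for _ in range(d)]
U = rng.standard_normal((n, r))
X0 = U @ U.T
Y = sum(Ai @ X0 @ Ai.T for Ai in A)

X = cp.Variable((n, n), PSD=True)
prob = cp.Problem(cp.Minimize(cp.trace(X)),
                  [sum(Ai @ X @ Ai.T for Ai in A) == Y])
prob.solve()
print("recovery error:", np.linalg.norm(X.value - X0))
```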

Existence Theorem
Theorem (Existence of Rank Expanders). For any 0 < ε < 1 there are constants c_1, c_2 such that for any n and r_0 ≤ n, if we set m = c_1 √(c_2 n r_0) and d = c_2 √(n / (c_1 r_0)) and choose {A_i}_{i=1}^d as i.i.d. m×n Gaussian matrices, the linear operator A(X) = Σ_{i=1}^d A_i X A_i^T is an (ε, d, r_0) rank expander with high probability (in n).

Proof of Existence
Key points:
- singular values are Lipschitz functions
- concentration for Lipschitz functions of Gaussians gives a small probability of deviation
- take an ε-cover over the space of rank-r projections, r ≤ r_0
- union bound
- deal with the perturbation

Optimality
The existence theorem achieves the optimal rate:
- degrees of freedom (DoF) of a rank-r n×n matrix: O(nr)
- DoF of an m×m matrix: O(m²)
Hence m ≥ √(n r_0) (i.e., m² ≥ n r_0) is needed for recoverability. By counting DoF one can similarly obtain a matching bound on d.
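
The counting argument as plain arithmetic, with hypothetical values of n and r_0:

```python
# The m^2 >= n * r0 counting bound, for illustrative n and r0.
import math

n, r0 = 1000, 5
dof_signal = n * r0                         # DoF of a rank-r0 n x n matrix
m_min = math.ceil(math.sqrt(dof_signal))    # smallest m with m^2 >= n * r0
print(m_min)                                # 71 for these values
```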

Fast Algorithm
Algorithm 2: reconstruct a low rank PSD matrix from under-determined linear measurements.
Inputs:
- a constant integer d ≥ 1
- matrices {A_i}_{i=1}^d ∈ ℝ^{m×n}
- measurements Y ∈ ℝ^{m×m}
Output: a low rank PSD matrix X

Fast Algorithm
Algorithm 2 (continued):
Initialize:
- compute Y = S Σ S^T with S of full column rank (SVD)
- set P = I − S S^T
- set Q = Null([(P A_1)^T, …, (P A_d)^T]^T)
This step aims to find span(X): any vector x in span(X) satisfies P A_i x = 0 for all i, so span(X) ⊆ range(Q).
Then:
- compute B_i = A_i Q and set M = Σ_{i=1}^d B_i ⊗ B_i
- find X ∈ ℝ^{n×n} with vec(X) = (Q ⊗ Q) M† vec(Y)
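
A numpy sketch of Algorithm 2 as read off the slide. Assumptions made here: vec(·) stacks columns (Fortran order), Null(·) is computed from an SVD, the eigendecomposition of the PSD matrix Y plays the role of its SVD, and the toy sizes in the demo are illustrative.

```python
# Algorithm 2 (sketch): recover a low rank PSD X from Y = sum_i A_i X A_i^T.
import numpy as np

def recover_psd(A_list, Y, tol=1e-8):
    m, n = A_list[0].shape
    # Initialize: Y = S Sigma S^T with S of full column rank
    w, V = np.linalg.eigh(Y)
    S = V[:, w > tol * w.max()]
    P = np.eye(m) - S @ S.T                  # projector onto range(Y)^perp
    # Q = Null([(P A_1)^T, ..., (P A_d)^T]^T): candidate span of X
    stacked = np.vstack([P @ Ai for Ai in A_list])
    _, s, Vt = np.linalg.svd(stacked)
    rank = int(np.sum(s > tol * s[0]))
    Q = Vt[rank:].T                          # n x q orthonormal basis
    # B_i = A_i Q and M = sum_i B_i kron B_i
    B = [Ai @ Q for Ai in A_list]
    M = sum(np.kron(Bi, Bi) for Bi in B)
    # vec(X) = (Q kron Q) M^dagger vec(Y), with column-stacking vec
    vecX = np.kron(Q, Q) @ (np.linalg.pinv(M) @ Y.flatten(order="F"))
    return vecX.reshape(n, n, order="F")

# toy demonstration
rng = np.random.default_rng(0)
n, m, d, r = 30, 20, 3, 2
A_list = [rng.standard_normal((m, n)) for _ in range(d)]
U = rng.standard_normal((n, r))
X0 = U @ U.T
Y = sum(Ai @ X0 @ Ai.T for Ai in A_list)
print(np.linalg.norm(recover_psd(A_list, Y) - X0))   # ~ 0
```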

Fast Algorithm
Theorem (PSD Recovery). If the operator A is an (ε, d, r_0) rank expander with ε < 1/2, then for every k ≤ (1 − ε) r_0, every PSD matrix X_0 of rank k can be recovered from Y = A(X_0) using Algorithm 2.

Simulation Results (n = 50)
[Figure: empirical recovery results for n = 50]

Future Work
Sparse measurements:
- Gaussians behave nicely but are computationally inefficient, and are not very natural measurements
- random sparse measurements do not work very well (unless m is large); a more systematic construction is needed
Analysis in the presence of noise.
Is there a matrix counterpart of the greedy bit-flipping algorithm?

Matrix Bit-Flipping?
In principle, this would require recovering X one rank-one component at a time. In particular, we would need to find a vector v such that the residual matrix Y − A(v v^T) drops rank. It is not clear how to do this.

Applications?
Do these types of measurements come up anywhere? One possible area is quantum measurements:
- the quantum state is a low rank matrix (the convex combination of a small number of pure states)
- measurements are inner products with certain Pauli matrices (which have only a few non-zero entries)
- given the high dimensions, fast algorithms are a must
- this needs to be studied
Other applications?