1
Optimal Column-Based Low-Rank Matrix Reconstruction
SODA'12
Ali Kemal Sinop, joint work with Prof. Venkatesan Guruswami
2
Outline
Introduction
– Notation
– Problem Definition
– Motivations
– Results
Upper Bound
Randomized Algorithm
Summary
3
Notation: Vectors and Matrices
X: an m-by-n real matrix.
X_i: the i-th column of X.
C: a subset of the columns of X.
X_C: the sub-matrix of X on the columns in C.
4
Formal Problem Definition
Given an m-by-n matrix X = [X_1 X_2 ... X_n], find r columns C minimizing the total squared projection distance of the columns to span(X_C):

$$\min_{|C|=r} \sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2,$$

where d(X_i, span(X_C)) is the projection distance of X_i to X_C.
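The objective can be sketched in NumPy (the helper name is mine, not from the talk): the error for a candidate subset C is obtained by projecting every column onto the span of the selected columns.

```python
import numpy as np

def column_reconstruction_error(X, C):
    """Squared Frobenius error of reconstructing X from the columns in C."""
    Q, _ = np.linalg.qr(X[:, C])    # orthonormal basis for span(X_C)
    residual = X - Q @ (Q.T @ X)    # component of each column off span(X_C)
    return np.sum(residual ** 2)    # sum_i d(X_i, span(X_C))^2
```

For example, `column_reconstruction_error(np.eye(3), [0])` returns 2.0, since the two unselected coordinate vectors are orthogonal to the selected one.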
5
What is Distance to Span?
Given any matrix A and a vector x, write Π_A^⊥ for the orthonormal projection matrix onto the orthogonal complement of the column span of A (the null space of A^T). By the Pythagorean theorem,

$$\|x\|^2 = \|\Pi_A x\|^2 + \|\Pi_A^\perp x\|^2,$$

and thus

$$d(x, \mathrm{span}(A)) = \|\Pi_A^\perp x\|.$$
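The Pythagorean decomposition above can be checked numerically; this NumPy sketch uses arbitrary random data of my own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
x = rng.standard_normal(5)

Q, _ = np.linalg.qr(A)
proj = Q @ (Q.T @ x)   # projection of x onto span(A)
resid = x - proj       # component in the orthogonal complement

# Pythagorean identity: ||x||^2 = ||proj||^2 + ||resid||^2
assert np.isclose(x @ x, proj @ proj + resid @ resid)
```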
7
Problem Formulation
Given an m-by-n matrix X = [X_1 X_2 ... X_n], find r columns C minimizing the reconstruction error

$$\|\Pi_C^\perp X\|_F^2 = \sum_{i=1}^{n} \|\Pi_C^\perp X_i\|^2,$$

where Π_C^⊥ is the orthonormal projection matrix onto the orthogonal complement of span(X_C). No books, no web pages, no images; only geometry.
8
An Example
n = 2, m = 2, r = 1: two vectors X_1 and X_2 at an angle of 135° from the origin. For C = {1}, the error is d(X_2, span(X_1))^2 = ||X_2||^2 sin^2(135°) = ||X_2||^2 / 2; symmetrically, for C = {2} it is ||X_1||^2 / 2.
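A quick numerical version of this example (the concrete coordinates are my assumption; the slide only fixes the 135° angle, so I take both vectors to be unit length with X_1 on the x-axis):

```python
import numpy as np

# Hypothetical coordinates: X1 on the x-axis, X2 at 135 degrees, unit length.
X1 = np.array([1.0, 0.0])
X2 = np.array([np.cos(3 * np.pi / 4), np.sin(3 * np.pi / 4)])
X = np.column_stack([X1, X2])

def one_column_error(X, c):
    # Residual of every column after projecting onto span(X_c).
    q = X[:, c] / np.linalg.norm(X[:, c])
    R = X - np.outer(q, q @ X)
    return np.sum(R ** 2)

# By symmetry both choices leave squared error sin^2(135 deg) = 1/2.
print(one_column_error(X, 0), one_column_error(X, 1))  # 0.5 0.5
```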
9
What is the minimum possible?
X is m-by-n and X_C is m-by-r, so the rank of X_C is at most |C| = r. Replace the column restriction with a rank restriction:
– choose any matrix X(r) of rank r,
– minimizing ||X - X(r)||_F^2.
Since |C| ≤ r implies rank ≤ r, this relaxation lower-bounds the column-selection error.
10
Low-Rank Matrix Approximation
Let X(r) be a rank-r matrix minimizing ||X - X(r)||_F^2. Therefore

$$\|X - X(r)\|_F^2 \le \min_{|C|=r} \|\Pi_C^\perp X\|_F^2.$$

X(r) can be found by the Singular Value Decomposition (SVD).
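A minimal truncated-SVD sketch in NumPy (the function name is mine):

```python
import numpy as np

def best_rank_r(X, r):
    """Truncated SVD: a rank-r matrix closest to X in Frobenius norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]
```

For instance, the best rank-2 approximation of diag(3, 2, 1) has squared error 1, the square of the dropped singular value.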
11
Singular Values of X
There exist m unique non-negative reals σ_1 ≥ σ_2 ≥ ... ≥ σ_m ≥ 0, the singular values of X. The best rank-r reconstruction error is

$$\|X - X(r)\|_F^2 = \sum_{j > r} \sigma_j^2.$$

The singular values act as a "smooth rank" of X: for example, if rank(X) = k, then σ_{k+1} = ... = σ_m = 0 and the best rank-k error is 0.
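The tail-sum formula (Eckart-Young) can be verified numerically; this NumPy sketch uses random data of my own choosing.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 2
Xr = (U[:, :r] * s[:r]) @ Vt[:r, :]

# Squared error of the best rank-r approximation equals the
# tail sum of squared singular values.
assert np.isclose(np.sum((X - Xr) ** 2), np.sum(s[r:] ** 2))
```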
12
First Example
n = 2, m = 2, r = 1. Recall the two vectors X_1 and X_2 at 135°. Quick check: how does the best single-column error compare to the best rank-1 error σ_2^2, and is this configuration the worst possible gap between the two?
13
Our Goal: Do as Well as the Best Rank-k
Given a target rank k and an allowed error ε > 0, choose the smallest C with |C| = r such that

$$\|\Pi_C^\perp X\|_F^2 \le (1+\varepsilon)\, \|X - X(k)\|_F^2,$$

where ||X - X(k)||_F^2 is the best possible rank-k approximation error. How does r depend on k and ε?
14
Practical Motivations
[Drineas, Mahoney'09] DNA microarray data:
– Unsupervised feature selection for cancer detection.
– Column selection + k-means gives better classification.
The same idea applies to many classification problems.
15
Theoretical Applications (our motivation)
Here r = the number of columns needed to get within a (1+ε) factor of the best rank-k approximation.
[Guruswami, S'11] Approximation schemes for many graph partitioning problems.
– Running time: exponential in r = r(k, ε), where k = number of eigenvalues < 1-ε.
[Guruswami, S'12] Significantly faster algorithms for sparsest cut, etc.
– Running time: exponential in r = r(k, ε), where k = number of eigenvalues < Φ/ε.
16
Previous Results
[Frieze, Kannan, Vempala'04] introduced this problem.
[Deshpande, Vempala'06], [Sarlos'06].
[Deshpande, Rademacher'10]: r = k columns achieve a (k+1)-factor approximation.
17
Recent Results
[This paper] We show:
– r = k/ε + k - 1 columns suffice and r ≥ k/ε - o(k) columns are necessary, so r is optimal up to low-order terms.
– A randomized algorithm, and a deterministic algorithm using [Deshpande, Rademacher'10] (ω = matrix multiplication exponent).
(Independently) [Boutsidis, Drineas, Magdon-Ismail'11]:
– r ≤ 2k/ε columns,
– in randomized time O(knm/ε + k^3 ε^{-2/3} n).
18
Outline
Introduction
Upper Bound
– Strategy
– An Algebraic Expression
– Eliminating Min
– Wrapping Up
Randomized Algorithm
Summary
19
Upper Bound
Input: an m-by-n matrix X, target rank k, number of columns r.
Problem: relate min_{|C|=r} ||Π_C^⊥ X||_F^2 to the best possible rank-k approximation error ||X - X(k)||_F^2 = Σ_{j>k} σ_j^2.
Our approach:
– Represent the error in an algebraic form.
– Eliminate the minimum by randomly sampling C.
– Represent the error as a function of the σ's.
– Bound it in terms of Σ_{j>k} σ_j^2.
20
An Algebraic Expression
Remember, our problem is to minimize ||Π_C^⊥ X||_F^2 over C. The projection matrix Π_C^⊥ is hard to manipulate directly. Is there an equivalent algebraic expression?
21
Base Case: r=1
A simple case. When C = {c}, the distance of X_i to span(X_c) is the height of X_i over the line through X_c:

$$d(X_i, \mathrm{span}(X_c))^2 = \|X_i\|^2 - \frac{\langle X_c, X_i\rangle^2}{\|X_c\|^2} = \frac{\mathrm{area}(X_c, X_i)^2}{\|X_c\|^2},$$

where area(X_c, X_i) is the area of the parallelogram spanned by X_c and X_i.
22
Case of r=2
Consider C = {c, d} in 3 dimensions: the distance of X_i to span(X_c, X_d) is the height of X_i over the plane spanned by X_c and X_d,

$$d(X_i, \mathrm{span}(X_c, X_d)) = \frac{\mathrm{vol}(X_c, X_d, X_i)}{\mathrm{area}(X_c, X_d)},$$

the volume of the parallelepiped divided by the area of its base.
23
General Case
Fact: the squared r-dimensional volume of the parallelepiped spanned by the columns of X_C is the determinant det(X_C^T X_C). Using the formula Volume = Base-Volume × Height,

$$d(X_i, \mathrm{span}(X_C)) = \frac{\mathrm{vol}(X_{C \cup \{i\}})}{\mathrm{vol}(X_C)}.$$

Hence

$$\sum_{i} d(X_i, \mathrm{span}(X_C))^2 = \sum_{i} \frac{\det(X_{C \cup \{i\}}^T X_{C \cup \{i\}})}{\det(X_C^T X_C)}.$$
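The Volume = Base-Volume × Height factorization of the Gram determinant can be checked numerically; a NumPy sketch on random data of my own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))

# Squared 3-dimensional volume of the parallelepiped spanned by the columns.
vol2 = np.linalg.det(X.T @ X)

# Split off the last column: base spanned by the first two columns.
B = X[:, :2]
base2 = np.linalg.det(B.T @ B)      # squared base volume
Q, _ = np.linalg.qr(B)
h = X[:, 2] - Q @ (Q.T @ X[:, 2])   # height vector of the last column

assert np.isclose(vol2, base2 * (h @ h))
```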
24
Eliminating Min
Volume sampling [Deshpande, Rademacher, Vempala, Wang'06]:
– Choose C with probability proportional to the squared volume,

$$\Pr[C] = \frac{\det(X_C^T X_C)}{\sum_{|C'|=r} \det(X_{C'}^T X_{C'})}.$$
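A brute-force sketch of the volume-sampling distribution (exponential in n, for illustration only; the cited work and this paper give efficient implementations, and the function name is mine):

```python
import numpy as np
from itertools import combinations

def volume_sample(X, r, rng):
    """Pick an r-subset C of columns with probability
    proportional to det(X_C^T X_C)."""
    n = X.shape[1]
    subsets = list(combinations(range(n), r))
    weights = np.array(
        [np.linalg.det(X[:, list(C)].T @ X[:, list(C)]) for C in subsets]
    )
    probs = weights / weights.sum()
    return subsets[rng.choice(len(subsets), p=probs)]
```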
25
Symmetric Forms
Fact: for any k, the k-th elementary symmetric polynomial of the squared singular values satisfies

$$e_k(\sigma_1^2, \ldots, \sigma_m^2) = \sum_{|S|=k} \prod_{j \in S} \sigma_j^2 = \sum_{|C|=k} \det(X_C^T X_C);$$

that is, the sums of determinants appearing in volume sampling are elementary symmetric polynomials of the σ_j^2.
26
Schur Concavity
Hence the expected error under volume sampling reduces to a ratio of elementary symmetric polynomials of the σ_j^2. This ratio is Schur-concave: flattening the spectrum (σ_1^2, σ_2^2, ..., σ_m^2), as in the sequence of spectra pictured on the slide, can only increase the ratio, so the worst case for the bound is a flat tail σ_{k+1} = ... = σ_m.
27
Wrapping Up

$$\mathbb{E}_C\left[\|\Pi_C^\perp X\|_F^2\right] \le \frac{r+1}{r+1-k} \sum_{j>k} \sigma_j^2.$$

For r = k/ε + k - 1, the factor (r+1)/(r+1-k) = (k/ε + k)/(k/ε) = 1 + ε. QED
28
Algorithms for Choosing C (Main Idea)
A nice recursion gives the randomized algorithm:
1. Choose a column j with the appropriate marginal probability.
2. For all i, replace X_i by its projection onto the orthogonal complement of X_j.
3. Recursively choose r - 1 columns from these projected vectors.
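The recursion can be sketched as follows. Note this uses squared-norm (adaptive) sampling in the style of Deshpande and Vempala as a stand-in for step 1; the paper's exact marginal probabilities differ, and the function name is mine.

```python
import numpy as np

def recursive_column_sample(X, r, rng):
    """Sketch of the recursive selection: sample a column, project the
    rest onto its orthogonal complement, repeat r times."""
    X = X.astype(float).copy()
    chosen = []
    for _ in range(r):
        norms2 = np.sum(X ** 2, axis=0)
        j = rng.choice(X.shape[1], p=norms2 / norms2.sum())
        chosen.append(int(j))
        q = X[:, j] / np.linalg.norm(X[:, j])
        X -= np.outer(q, q @ X)   # project all columns off span of X_j
    return chosen
```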
29
Outline
Introduction
Upper Bound
Randomized Algorithm
Summary
30
Summary
(Upper bound) r = k/ε + k - 1 columns suffice to achieve (1+ε) × the best rank-k error.
(Randomized) Such columns can be found in time r · T_SVD = O(k) · T_SVD (for constant ε).
(Lower bound) k/ε - o(k) columns are needed.
Thanks! Job market alert.