1
A Note on Rectangular Quotients
Achiya Dax, Hydrological Service, Jerusalem, Israel
e-mail: dax20@water.gov.il
2
The Symmetric Case
S = ( s_ij ) is a symmetric positive semi-definite n x n matrix
with eigenvalues λ_1 ≥ λ_2 ≥ … ≥ λ_n ≥ 0 and eigenvectors v_1, v_2, …, v_n:
S v_j = λ_j v_j,  j = 1, …, n.
S V = V D,  V = [v_1, v_2, …, v_n],  V^T V = V V^T = I,  D = diag{ λ_1, λ_2, …, λ_n }
S = V D V^T = Σ_j λ_j v_j v_j^T
3
Low-Rank Approximations
S = λ_1 v_1 v_1^T + … + λ_n v_n v_n^T
T_1 = λ_1 v_1 v_1^T
T_2 = λ_1 v_1 v_1^T + λ_2 v_2 v_2^T
…
T_k = λ_1 v_1 v_1^T + λ_2 v_2 v_2^T + … + λ_k v_k v_k^T
T_k is a low-rank approximation of order k.
4
The Rayleigh Quotient
λ = λ(v, S) = v^T S v / v^T v
λ = arg min f(λ),  f(λ) = || S v - λ v ||_2
λ(v, S) estimates an eigenvalue corresponding to v.
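As a quick numerical illustration (a minimal NumPy sketch, not part of the original talk), the Rayleigh quotient recovers an eigenvalue exactly when v is the corresponding eigenvector:

```python
import numpy as np

def rayleigh_quotient(S, v):
    """lambda(v, S) = v^T S v / v^T v."""
    return float(v @ S @ v) / float(v @ v)

S = np.diag([3.0, 1.0])       # symmetric PSD, eigenvalues 3 and 1
v = np.array([1.0, 0.0])      # eigenvector belonging to the eigenvalue 3
lam = rayleigh_quotient(S, v)
```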
5
The Power Method
Starting with some unit vector p_0, the k-th iteration, k = 1, 2, 3, …:
Step 1: Compute w_k = S p_{k-1}
Step 2: Compute λ_k = ( p_{k-1} )^T w_k
Step 3: Normalize p_k = w_k / || w_k ||_2
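The three steps translate directly into code (a sketch; the iteration count of 100 and the test matrix are arbitrary choices for illustration):

```python
import numpy as np

def power_method(S, p0, iters=100):
    """Power method for a symmetric PSD matrix S, following Steps 1-3."""
    p = p0 / np.linalg.norm(p0)
    lam = 0.0
    for _ in range(iters):
        w = S @ p                     # Step 1
        lam = float(p @ w)            # Step 2: Rayleigh-quotient estimate
        p = w / np.linalg.norm(w)     # Step 3: normalize
    return lam, p

S = np.diag([4.0, 1.0, 0.5])
lam, p = power_method(S, np.array([1.0, 1.0, 1.0]))
```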
6
THE POWER METHOD
Asymptotic Rates of Convergence (assuming λ_1 > λ_2):
{ p_k } → v_1 at a linear rate, proportional to λ_2 / λ_1
{ λ_k } → λ_1 at a linear rate, proportional to ( λ_2 / λ_1 )^2
Monotony: λ_1 ≥ … ≥ λ_k ≥ … ≥ λ_2 ≥ λ_1 > 0
(the estimates λ_k increase monotonically toward the dominant eigenvalue)
7
THE POWER METHOD
The asymptotic rates of convergence depend on the ratio λ_2 / λ_1 and can be arbitrarily slow. Yet λ_k provides a fair estimate of λ_1 within a few iterations!
For a "worst case analysis" see D.P. O'Leary, G.W. Stewart and J.S. Vandergraft, "Estimating the largest eigenvalue of a positive definite matrix", Math. of Comp., 33 (1979), pp. 1289-1292.
8
THE POWER METHOD
An eigenvector v_j is called "large" if λ_j ≥ λ_1 / 2 and "small" if λ_j < λ_1 / 2. In most practical situations, for "small" eigenvectors p_k^T v_j becomes negligible after a small number of iterations. Thus, after a few iterations p_k actually lies in a subspace spanned by "large" eigenvectors.
9
Deflation by Subtraction
S = λ_1 v_1 v_1^T + … + λ_n v_n v_n^T
S_1 = S - λ_1 v_1 v_1^T = λ_2 v_2 v_2^T + … + λ_n v_n v_n^T
S_2 = S_1 - λ_2 v_2 v_2^T = λ_3 v_3 v_3^T + … + λ_n v_n v_n^T
…
S_{n-1} = λ_n v_n v_n^T
S_n = 0
Hotelling (1933, 1943)
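The deflation steps can be reproduced numerically (a sketch that uses NumPy's eigendecomposition to supply exact eigenpairs; the 2 x 2 example matrix is an illustrative choice):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                  # eigenvalues 3 and 1
lams, V = np.linalg.eigh(S)                 # ascending order: lams = [1, 3]

# S_1 = S - lambda_1 v_1 v_1^T removes the dominant eigenpair ...
S1 = S - lams[1] * np.outer(V[:, 1], V[:, 1])
# ... and S_2 = S_1 - lambda_2 v_2 v_2^T leaves the zero matrix.
S2 = S1 - lams[0] * np.outer(V[:, 0], V[:, 0])
```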
10
The Frobenius Norm
A = ( a_ij ),  || A ||_F = ( Σ_{i,j} | a_ij |^2 )^{1/2}
11
The Minimum Norm Approach
Let the vector v* solve the minimum norm problem
minimize E(v) = || S - v v^T ||_F^2.
Then v_1 = v* / || v* ||_2 and λ_1 = (v*)^T v*.
12
The Symmetric Quotient
Given any vector u, the Symmetric Quotient
θ(u) = u^T S u / ( u^T u )^2
solves the one-parameter problem
minimize f(θ) = || S - θ u u^T ||_F^2.
That is, θ(u) = arg min f(θ).
If || u ||_2 = 1 then θ(u) = λ(u) = u^T S u.
13
The Symmetric Quotient Equality
|| S - θ(u) u u^T ||_F^2 = || S ||_F^2 - ( λ(u) )^2,
where λ(u) = u^T S u / u^T u, means that solving
minimize F(u) = || S - u u^T ||_F^2
is equivalent to solving
maximize λ(u) = u^T S u / u^T u.
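The equality can be checked numerically for an arbitrary u (a sketch; theta below denotes the symmetric quotient and lam the ordinary Rayleigh quotient, whose square appears on the right-hand side):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
S = B @ B.T                             # symmetric positive semi-definite
u = rng.standard_normal(4)

theta = (u @ S @ u) / (u @ u) ** 2      # symmetric quotient
lam = (u @ S @ u) / (u @ u)             # Rayleigh quotient

lhs = np.linalg.norm(S - theta * np.outer(u, u), 'fro') ** 2
rhs = np.linalg.norm(S, 'fro') ** 2 - lam ** 2
```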
14
Can we extend these tools to rectangular matrices?
15
The Rectangular Case
A = ( a_ij ), a real m x n matrix,  p = min{ m, n },
with singular values σ_1 ≥ σ_2 ≥ … ≥ σ_p ≥ 0,
left singular vectors u_1, u_2, …, u_p,
and right singular vectors v_1, v_2, …, v_p:
A v_j = σ_j u_j,  A^T u_j = σ_j v_j,  j = 1, …, p.
16
The Singular Value Decomposition
A = U Σ V^T,  Σ = diag{ σ_1, σ_2, …, σ_p },  p = min{ m, n }
U = [u_1, u_2, …, u_p],  U^T U = I
V = [v_1, v_2, …, v_p],  V^T V = I
A V = U Σ,  A^T U = V Σ
A v_j = σ_j u_j,  A^T u_j = σ_j v_j,  j = 1, …, p.
17
Low-Rank Approximations
A = U Σ V^T = Σ_j σ_j u_j v_j^T
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
B_1 = σ_1 u_1 v_1^T
B_2 = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T
…
B_k = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_k u_k v_k^T
B_k is a low-rank approximation of order k.
(Also called "truncated SVD" or "filtered SVD".)
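In NumPy terms, B_k can be formed directly from the SVD (a minimal sketch; by the Eckart-Young theorem the Frobenius error of B_1 in this example equals the discarded singular value sigma_2):

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation B_k = sum of the k dominant terms sigma_j u_j v_j^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

A = np.array([[3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])   # singular values 3 and 2
B1 = truncated_svd(A, 1)
```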
18
The Minimum Norm Approach
Let the vectors u* and v* solve the problem
minimize F(u, v) = || A - u v^T ||_F^2.
Then u_1 = u* / || u* ||_2,  v_1 = v* / || v* ||_2,  and σ_1 = || u* ||_2 || v* ||_2.
(See the Eckart-Young and Schmidt-Mirsky theorems.)
19
The Rectangular Quotient
Given any vectors u and v, the Rectangular Quotient
θ(u, v) = u^T A v / ( u^T u ) ( v^T v )
solves the one-parameter problem
minimize f(θ) = || A - θ u v^T ||_F^2.
That is, θ(u, v) = arg min f(θ).
20
The Rectangular Rayleigh Quotient
Given two vectors u and v, the Rectangular Rayleigh Quotient
ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 )
estimates the "corresponding" singular value.
21
The Rectangular Rayleigh Quotient
Given two unit vectors u and v, the Rectangular Rayleigh Quotient
ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 )
solves the following three problems:
minimize f_1(σ) = || A - σ u v^T ||_F
minimize f_2(σ) = || A v - σ u ||_2
minimize f_3(σ) = || A^T u - σ v ||_2
22
The Rectangular Quotients Equality
Given any pair of vectors u and v, the Rectangular Quotient
θ(u, v) = u^T A v / ( u^T u ) ( v^T v )
satisfies
|| A - θ(u, v) u v^T ||_F^2 = || A ||_F^2 - ( ρ(u, v) )^2,
where ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 ).
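This equality holds for arbitrary u and v, which is easy to confirm numerically (a sketch; theta is the rectangular quotient and rho the rectangular Rayleigh quotient):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
u = rng.standard_normal(5)
v = rng.standard_normal(3)

theta = (u @ A @ v) / ((u @ u) * (v @ v))                     # rectangular quotient
rho = (u @ A @ v) / (np.linalg.norm(u) * np.linalg.norm(v))   # rectangular Rayleigh quotient

lhs = np.linalg.norm(A - theta * np.outer(u, v), 'fro') ** 2
rhs = np.linalg.norm(A, 'fro') ** 2 - rho ** 2
```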
23
The Rectangular Quotients Equality
Solving the least norm problem
minimize F(u, v) = || A - u v^T ||_F^2
is equivalent to solving
maximize ρ(u, v) = u^T A v / ( || u ||_2 || v ||_2 ).
24
Approximating a Left Singular Vector
Given a right singular vector v_1, the corresponding left singular vector u_1 is obtained by solving the least norm problem
minimize g(u) = || A - u v_1^T ||_F^2.
That is, u_1 = A v_1 / ( v_1^T v_1 ).
(The rows of A are orthogonalized against v_1^T.)
25
Approximating a Right Singular Vector
Given a left singular vector u_1, the corresponding right singular vector v_1 is obtained by solving the least norm problem
minimize h(v) = || A - u_1 v^T ||_F^2.
That is, v_1 = A^T u_1 / ( u_1^T u_1 ).
(The columns of A are orthogonalized against u_1.)
26
Rectangular Iterations - Motivation
The k-th iteration, k = 1, 2, 3, …, starts with u_{k-1} and v_{k-1} and ends with u_k and v_k.
Given v_{k-1}, the vector u_k is obtained by solving the problem
minimize g(u) = || A - u v_{k-1}^T ||_F^2.
That is, u_k = A v_{k-1} / ( v_{k-1}^T v_{k-1} ).
Then v_k is obtained by solving the problem
minimize h(v) = || A - u_k v^T ||_F^2,
which gives v_k = A^T u_k / ( u_k^T u_k ).
27
Rectangular Iterations – Implementation
The k-th iteration, k = 1, 2, 3, …:
u_k = A v_{k-1} / ( v_{k-1}^T v_{k-1} ),
v_k = A^T u_k / ( u_k^T u_k ).
The sequence { v_k / || v_k ||_2 } is obtained by applying the Power Method to the matrix A^T A.
The sequence { u_k / || u_k ||_2 } is obtained by applying the Power Method to the matrix A A^T.
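A direct implementation of these iterations (a sketch; the cap of 50 iterations is an arbitrary choice, and the starting vector is assumed to have a component along v_1):

```python
import numpy as np

def rectangular_iterations(A, v0, iters=50):
    """Alternating least-norm updates, equivalent to the power method on A^T A."""
    v = v0.astype(float)
    for _ in range(iters):
        u = A @ v / (v @ v)       # minimize || A - u v^T ||_F over u
        v = A.T @ u / (u @ u)     # minimize || A - u v^T ||_F over v
    sigma = np.linalg.norm(u) * np.linalg.norm(v)
    return sigma, u / np.linalg.norm(u), v / np.linalg.norm(v)

A = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])        # singular values 3 and 1
sigma, u1, v1 = rectangular_iterations(A, np.array([1.0, 1.0]))
```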
28
Left Iterations
u_k = A v_{k-1} / ( v_{k-1}^T v_{k-1} ),  v_k = A^T u_k / ( u_k^T u_k )
v_k^T v_k = v_k^T A^T u_k / ( u_k^T u_k )
Right Iterations
v_k = A^T u_{k-1} / ( u_{k-1}^T u_{k-1} ),  u_k = A v_k / ( v_k^T v_k )
u_k^T u_k = u_k^T A v_k / ( v_k^T v_k )
Can one see a difference?
29
Some Useful Relations
In both cases we have
u_k^T u_k · v_k^T v_k = u_k^T A v_k,
|| u_k ||_2 || v_k ||_2 = u_k^T A v_k / ( || u_k ||_2 || v_k ||_2 ) = ρ(u_k, v_k),
and θ(u_k, v_k) = u_k^T A v_k / ( u_k^T u_k · v_k^T v_k ) = 1.
The objective function F(u, v) = || A - u v^T ||_F^2 satisfies
F(u_k, v_k) = || A ||_F^2 - u_k^T u_k · v_k^T v_k
and
F(u_k, v_k) - F(u_{k+1}, v_{k+1}) = u_{k+1}^T u_{k+1} · v_{k+1}^T v_{k+1} - u_k^T u_k · v_k^T v_k > 0.
30
Convergence Properties
Inherited from the Power Method, assuming σ_1 > σ_2:
The sequences { u_k / || u_k ||_2 } and { v_k / || v_k ||_2 } converge at a linear rate, proportional to ( σ_2 / σ_1 )^2.
{ u_k^T u_k · v_k^T v_k } → ( σ_1 )^2 at a linear rate, proportional to ( σ_2 / σ_1 )^4.
Monotony: ( σ_1 )^2 ≥ u_{k+1}^T u_{k+1} · v_{k+1}^T v_{k+1} ≥ u_k^T u_k · v_k^T v_k > 0
31
Convergence Properties
σ_k = || u_k ||_2 || v_k ||_2 provides a fair estimate of σ_1 within a few rectangular iterations!
32
Convergence Properties
After a few rectangular iterations, { σ_k, u_k, v_k } provides a fair estimate of a dominant singular triplet { σ_1, u_1, v_1 }.
33
Deflation by Subtraction
A_1 = A = σ_1 u_1 v_1^T + … + σ_p u_p v_p^T
A_2 = A_1 - σ_1 u_1 v_1^T = σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
A_3 = A_2 - σ_2 u_2 v_2^T = σ_3 u_3 v_3^T + … + σ_p u_p v_p^T
…
A_{k+1} = A_k - σ_k u_k v_k^T = σ_{k+1} u_{k+1} v_{k+1}^T + … + σ_p u_p v_p^T
34
Deflation by Subtraction
A_1 = A
A_2 = A_1 - σ*_1 u*_1 v*_1^T
A_3 = A_2 - σ*_2 u*_2 v*_2^T
…
A_{k+1} = A_k - σ*_k u*_k v*_k^T,
where { σ*_k, u*_k, v*_k } denotes a computed dominant singular triplet of A_k.
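Combining rectangular iterations with deflation by subtraction gives a sketch of the whole scheme (the iteration count and the all-ones starting vector are arbitrary illustrative choices):

```python
import numpy as np

def deflated_triplets(A, ell, iters=50):
    """Compute ell dominant singular triplets by rectangular iterations + deflation."""
    Ak = A.astype(float).copy()
    triplets = []
    for _ in range(ell):
        v = np.ones(A.shape[1])
        for _ in range(iters):
            u = Ak @ v / (v @ v)
            v = Ak.T @ u / (u @ u)
        sigma = np.linalg.norm(u) * np.linalg.norm(v)
        u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)
        triplets.append((sigma, u, v))
        Ak = Ak - sigma * np.outer(u, v)   # deflation by subtraction
    return triplets

A = np.diag([3.0, 2.0, 1.0])
trip = deflated_triplets(A, 2)
```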
35
The Main Motivation
At the k-th stage, k = 1, 2, …, a few rectangular iterations provide a fair estimate of a dominant singular triplet of A_k.
36
Low-Rank Approximation via Deflation
σ_1 ≥ σ_2 ≥ … ≥ σ_p ≥ 0,
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
B_1 = σ*_1 u*_1 v*_1^T  ( * means computed values )
B_2 = σ*_1 u*_1 v*_1^T + σ*_2 u*_2 v*_2^T
…
B_ℓ = σ*_1 u*_1 v*_1^T + σ*_2 u*_2 v*_2^T + … + σ*_ℓ u*_ℓ v*_ℓ^T
B_ℓ is a low-rank approximation of order ℓ.
(Also called "truncated SVD" or the "filtered part" of A.)
37
Low-Rank Approximation of Order ℓ
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … + σ_p u_p v_p^T
B_ℓ = σ*_1 u*_1 v*_1^T + σ*_2 u*_2 v*_2^T + … + σ*_ℓ u*_ℓ v*_ℓ^T
B_ℓ = U_ℓ Σ_ℓ V_ℓ^T
U_ℓ = [u*_1, u*_2, …, u*_ℓ],  V_ℓ = [v*_1, v*_2, …, v*_ℓ],  Σ_ℓ = diag{ σ*_1, σ*_2, …, σ*_ℓ }
( * means computed values )
38
What About Orthogonality?
Does U_ℓ^T U_ℓ = I and V_ℓ^T V_ℓ = I?
The theory behind the Power Method suggests that the more accurate the computed singular triplets, the smaller the deviation from orthogonality.
Is there a difference (regarding deviation from orthogonality) between U_ℓ and V_ℓ?
39
Orthogonality Properties (assuming exact arithmetic)
Theorem 1: Consider the case when each singular triplet { σ*_j, u*_j, v*_j } is computed by a finite number of "Left Iterations" (at least one iteration for each triplet). In this case U_ℓ^T U_ℓ = I and U_ℓ^T A_{ℓ+1} = 0, regardless of the actual number of iterations!
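A numerical check of this behavior (a sketch; only two left iterations per triplet, yet the computed left vectors come out orthonormal and the deflated residual is annihilated by them to machine precision, even though the triplets themselves are far from converged):

```python
import numpy as np

def left_iteration_deflation(A, ell, iters=2):
    """Left iterations (u first, then v) plus deflation; collects normalized u's."""
    Ak = A.astype(float).copy()
    Us = []
    rng = np.random.default_rng(1)
    for _ in range(ell):
        v = rng.standard_normal(A.shape[1])
        for _ in range(iters):
            u = Ak @ v / (v @ v)       # left iteration, first step
            v = Ak.T @ u / (u @ u)     # left iteration, second step
        Us.append(u / np.linalg.norm(u))
        Ak = Ak - np.outer(u, v)       # equals sigma* u* v*^T for this triplet
    return np.column_stack(Us), Ak

A = np.random.default_rng(2).standard_normal((6, 4))
U, residual = left_iteration_deflation(A, 3, iters=2)
```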
40
Left Iterations
u_k = A v_{k-1} / ( v_{k-1}^T v_{k-1} ),  v_k = A^T u_k / ( u_k^T u_k )
Right Iterations
v_k = A^T u_{k-1} / ( u_{k-1}^T u_{k-1} ),  u_k = A v_k / ( v_k^T v_k )
Can one see a difference?
41
Orthogonality Properties (assuming exact arithmetic)
Theorem 2: Consider the case when each singular triplet { σ*_j, u*_j, v*_j } is computed by a finite number of "Right Iterations" (at least one iteration for each triplet). In this case V_ℓ^T V_ℓ = I and A_{ℓ+1} V_ℓ = 0, regardless of the actual number of iterations!
42
Finite Termination
Assuming exact arithmetic, let r = rank(A).
Corollary: In both cases we have
A = B_r = σ*_1 u*_1 v*_1^T + … + σ*_r u*_r v*_r^T,
regardless of the number of iterations per singular triplet!
43
A New QR Decomposition
Assuming exact arithmetic, let r = rank(A). In both cases we obtain an effective "rank-revealing" QR decomposition
A = U_r Σ_r V_r^T.
In "Left Iterations" U_r^T U_r = I. In "Right Iterations" V_r^T V_r = I.
44
The Orthogonal Basis Problem
The problem is to compute an orthogonal basis of Range(A). The Householder and Gram-Schmidt orthogonalization methods use a "column pivoting for size" policy, which completely determines the basis.
45
The Orthogonal Basis Problem
The new method, "Orthogonalization via Deflation", has greater freedom in choosing the basis. At the k-th stage, the ultimate choice for a new vector to enter the basis is u_k, the k-th left singular vector of A. (But accurate computation of u_k can be "too expensive".)
46
The Main Theme
At the k-th stage, a few rectangular iterations are sufficient to provide a fair substitute for u_k.
47
Applications in Missing Data Reconstruction
Consider the case when some entries of A are missing:
* Missing data in DNA microarrays
* Tables of annual rain data
* Tables of water levels in observation wells
* Web search engines
Standard SVD algorithms are unable to handle such matrices. The Minimum Norm Approach is easily adapted to handle matrices with missing entries.
48
A Modified Algorithm
The objective function F(u, v) = || A - u v^T ||_F^2 is redefined as
F(u, v) = Σ ( a_ij - u_i v_j )^2,
where the sum is restricted to the known entries of A.
(As before, u = (u_1, u_2, …, u_m)^T and v = (v_1, v_2, …, v_n)^T denote the vectors of unknowns.)
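A minimal sketch of this modified rank-one fit, alternating closed-form least-squares updates over the known entries only (the mask M, the example matrix, and the iteration cap are illustrative choices, not from the talk):

```python
import numpy as np

def masked_rank1(A, M, iters=100):
    """Minimize sum over known entries (M == True) of (a_ij - u_i v_j)^2."""
    m, n = A.shape
    u = np.ones(m)
    v = np.ones(n)
    for _ in range(iters):
        # each u_i has a closed-form least-squares update over row i's known entries
        for i in range(m):
            mask = M[i]
            u[i] = (A[i, mask] @ v[mask]) / (v[mask] @ v[mask])
        # and symmetrically for each v_j over column j's known entries
        for j in range(n):
            mask = M[:, j]
            v[j] = (A[mask, j] @ u[mask]) / (u[mask] @ u[mask])
    return u, v

# rank-one matrix with one entry marked as missing; the fit recovers it
A = np.outer(np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0]))
M = np.ones_like(A, dtype=bool)
M[0, 2] = False                    # pretend a_02 is unknown
u, v = masked_rank1(A, M)
```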
49
The Minimum Norm Approach: Concluding Remarks
* Adds new insight into 'old' methods and concepts.
* Fast power methods (relaxation methods, line-search acceleration, etc.).
* Opens the door for new methods and concepts (the rectangular quotients equality, rectangular iterations, etc.).
* Orthogonalization via Deflation: a new QR decomposition (low-rank approximations, rank revealing).
* Capable of handling problems with missing data.