Lecture 20 Empirical Orthogonal Functions and Factor Analysis

Motivation: in Fourier analysis the choice of sine and cosine “patterns” is prescribed by the method. Could we use the data itself as a source of information about the shape of the patterns?

Example: maps of some hypothetical function, say sea surface temperature, forming a sequence in time.

[Figure: the data, a sequence of maps in time]

[Figure: the data]

[Plot: pattern importance versus pattern number]

Choose just the most important patterns. [Plot: pattern importance versus pattern number, with the first 3 patterns marked]

[Figure: the 3 most important patterns]

Comparison: original versus reconstruction using only 3 patterns. Note that this process has reduced the noise (since noise has no pattern common to all the images).

[Plot: amplitudes of the patterns versus time]

Note: there is no requirement that the pattern is periodic in time. [Plot: amplitudes of the patterns versus time]

Discussion: mixing of end members

Ternary diagram: a useful tool for data that has three “components”, A, B, and C. [Figure: ternary diagram with vertices A, B, C]

Works for 3 end-members, as long as A+B+C = 100%. [Figure: ternary diagram graduated in percent A, from 100% A at the vertex down to 0% A; similarly for B and C]

Suppose the data fall near a line on the diagram. [Figure: ternary diagram with data points scattered near a line]

[Figure: the same diagram, with end-members or factors f1 and f2 marked at the ends of the line]

[Figure: the same diagram, with the mixing line between f1 and f2 drawn and the 50% point marked]

Idealize the data as lying on the mixing line between f1 and f2. [Figure: ternary diagram with the data projected onto the mixing line]

You could represent the data exactly with a third ‘noise’ factor, f3; it doesn’t much matter where you put f3, as long as it’s not on the line. [Figure: ternary diagram with f1 and f2 on the mixing line and f3 off the line]

S: components (A, B, C, …) in each sample, s

        (A in s1)  (B in s1)  (C in s1)
        (A in s2)  (B in s2)  (C in s2)
S =     (A in s3)  (B in s3)  (C in s3)
        …
        (A in sN)  (B in sN)  (C in sN)

Note: a sample is along a row of S. With N samples and M components, S is N × M.

F: components (A, B, C, …) in each factor, f

        (A in f1)  (B in f1)  (C in f1)
F =     (A in f2)  (B in f2)  (C in f2)
        (A in f3)  (B in f3)  (C in f3)

With M components and M factors, F is M × M.

C: coefficients of the factors

        (f1 in s1)  (f2 in s1)  (f3 in s1)
        (f1 in s2)  (f2 in s2)  (f3 in s2)
C =     (f1 in s3)  (f2 in s3)  (f3 in s3)
        …
        (f1 in sN)  (f2 in sN)  (f3 in sN)

With N samples and M factors, C is N × M.

S = C F

samples = coefficients × factors
(N × M)    (N × M)       (M × M)

with S, C, and F written out as on the previous slides.
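
As a concrete illustration, here is a minimal MatLab sketch that builds a small S = C F by hand; the numbers are made up purely for illustration and are not from the lecture.

% a minimal sketch with made-up numbers: N=4 samples, M=3 components (A, B, C)
F = [0.6  0.3  0.1;          % factor f1: fractions of A, B, C
     0.1  0.4  0.5;          % factor f2
     1/3  1/3  1/3];         % factor f3 (a 'noise' direction)
C = [0.9  0.1  0.0;          % sample s1 = 90% f1 + 10% f2
     0.5  0.5  0.0;          % sample s2
     0.2  0.8  0.0;          % sample s3
     0.7  0.2  0.1];         % sample s4, with a little f3
S = C*F;                     % the 4-by-3 sample matrix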

S ≈ C′ F′

samples ≈ selected coefficients × selected factors
(N × M)     (N × p)                (p × M)

Ignore f3: the data are approximated using only the most important factors. The p most important factors = those with the biggest coefficients.

View the samples as vectors in the space of components (A, B, C). Let the factors be unit vectors … then the coefficients are the projections (dot products) of the samples onto the factors. [Figure: samples s1, s2, s3 and a factor f as vectors in A-B-C space]

This suggests a method of choosing factors so that they have large coefficients: find the factor f that maximizes

E = Σi [ si · f ]²   subject to the constraint   f · f = 1

Note: square the dot product since it can be negative.

Find the factor f that maximizes E = Σi [ si · f ]² with the constraint L = f · f − 1 = 0.

E = Σi [ si · f ]² = Σi [ Σj Sij fj ] [ Σk Sik fk ] = Σj Σk [ Σi Sij Sik ] fj fk = Σj Σk Mjk fj fk

with Mjk = Σi Sij Sik, that is, M = S^T S (a symmetric matrix), and L = Σi fi² − 1.

Use Lagrange multipliers, minimizing Φ = E − λ² L, where λ² is the Lagrange multiplier (written as a square for reasons that will become apparent later). We solved this problem two lectures ago. Its solution is the algebraic eigenvalue problem M f = λ² f. Recall that the eigenvalue λ² is the corresponding value of E.

So the factors solve the algebraic eigenvalue problem [S^T S] f = λ² f. [S^T S] is a square matrix with as many rows and columns as there are components, so there are as many factors as there are components, and the factors span a space of the same dimension as the components. If you sort the eigenvectors by the size of their eigenvalues, then the ones with the largest eigenvalues have the largest coefficients. So selecting the most important factors is easy.
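
As a sketch of how this could be computed directly in MatLab (the lecture itself uses the SVD below, so this is only for comparison):

% sketch: factors from the eigen-decomposition of S^T S (S is N-by-M)
[V, LAMBDA2] = eig(S'*S);                        % columns of V are candidate factors
[lambda2, k] = sort(diag(LAMBDA2), 'descend');   % sort by eigenvalue, largest first
V = V(:, k);
F = V';                                          % factors as rows, most important first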

An important tidbit from the theory of eigenvalues and eigenvectors that we’ll use later on: for [S^T S] f = λ² f, let Λ² be a diagonal matrix of the eigenvalues λi², and let V be a matrix whose columns are the corresponding factors f(i). Then [S^T S] = V Λ² V^T.
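
A quick numerical check of this tidbit on a random test matrix (a sanity-check sketch, not part of the lecture’s workflow):

% sketch: check that S^T S = V * LAMBDA2 * V^T
S = randn(6, 3);                 % random test matrix
A = (S'*S + (S'*S)')/2;          % symmetrize to guard against round-off
[V, LAMBDA2] = eig(A);
norm(A - V*LAMBDA2*V')           % ~ 0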

Note also that the factors are orthogonal: f(i) · f(j) = 0 if i ≠ j. This is a mathematically pleasant property, but it may not always be the physically most relevant choice. [Figure: two ternary diagrams comparing a non-orthogonal choice of factors f1, f2 with an orthogonal one; annotations: “contains negative A” and “close to mean of data”]

Upshot: the eigenvectors of [S^T S] f = λ² f with the p largest eigenvalues identify a p-dimensional sub-space in which most of the data lie. You can use those eigenvectors as factors, or you can choose any other p factors that span that subspace. In the ternary diagram example, they must lie on the line connecting the two SVD factors.

Singular Value Decomposition (SVD): any N × M matrix S can be written as the product of three matrices

S = U Λ V^T

where U is N × N and satisfies U^T U = U U^T = I, V is M × M and satisfies V^T V = V V^T = I, and Λ is an N × M diagonal matrix of singular values.
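
A minimal sketch checking these properties numerically on a random matrix (purely illustrative):

% sketch: verify the stated SVD properties for a random N-by-M matrix
N = 5;  M = 3;
S = randn(N, M);
[U, LAMBDA, V] = svd(S);
norm(U'*U - eye(N))            % ~ 0: U is orthogonal
norm(V'*V - eye(M))            % ~ 0: V is orthogonal
norm(S - U*LAMBDA*V')          % ~ 0: the product reproduces S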

Now note that if S = U Λ V^T, then

S^T S = [U Λ V^T]^T [U Λ V^T] = V Λ U^T U Λ V^T = V Λ² V^T

Compare with the tidbit mentioned earlier, S^T S = V Λ² V^T: the SVD V is the same V we were talking about earlier. The columns of V are the eigenvectors f, so F = V^T. So we can use the SVD to calculate the factors, F.

But it’s even better than that! Write S = U Λ V^T as

S = U Λ V^T = [U Λ] [V^T] = C F

So the coefficients are C = U Λ and, as shown previously, the factors are F = V^T. So we can use the SVD to calculate both the coefficients, C, and the factors, F.

MatLab Code for computing C and F

[U, LAMBDA, V] = svd(S);   % singular value decomposition, S = U*LAMBDA*V'
C = U*LAMBDA;              % coefficients
F = V';                    % factors

MatLab Code approximating S ≈ Sp using only the p most important factors

p = (whatever);            % choose the number of significant factors
Up = U(:,1:p);             % first p columns of U
LAMBDAp = LAMBDA(1:p,1:p); % first p singular values
Cp = Up*LAMBDAp;           % selected coefficients, N-by-p
Vp = V(:,1:p);             % first p columns of V
Fp = (Vp)';                % selected factors, p-by-M
Sp = Cp * Fp;              % p-factor approximation of S
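
A possible check on this approximation (my addition, not from the slides): the size of the residual should match the discarded singular values.

% sketch: the misfit of the p-factor approximation equals the root-sum-square
% of the discarded singular values
sigma = diag(LAMBDA);                 % all the singular values
norm(S - Sp, 'fro')                   % Frobenius norm of the residual
sqrt(sum(sigma(p+1:end).^2))          % should match the line above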

back to my example

Each pixel is a component of the image and the patterns are factors. Our derivation assumed that the data (the samples, s(i)) were vectors. However, in this example the data are images (matrices), so what I had to do was write out the pixels of each image as a vector.

Steps
1) load images
2) reorganize images into S
3) SVD of S to get U, Λ and V
4) examine Λ to identify the number of significant factors
5) build S′ using only the significant factors
6) reorganize S′ back into images

MatLab code for reorganizing a sequence of images D(p,q,r) (p = 1 … Nx) (q = 1 … Nx) (r = 1 … Nt) into the sample matrix S(r,s) (r = 1 … Nt) (s = 1 … Nx²)

for r = [1:Nt]                 % time r
    for p = [1:Nx]             % row p
        for q = [1:Nx]         % col q
            s = Nx*(p-1)+q;    % index s
            S(r,s) = D(p,q,r);
        end
    end
end

MatLab code for reorganizing the sample matrix S(r,s) (r = 1 … Nt) (s = 1 … Nx²) back into a sequence of images D(p,q,r) (p = 1 … Nx) (q = 1 … Nx) (r = 1 … Nt)

for r = [1:Nt]                            % time r
    for s = [1:Nx*Nx]                     % index s
        p = floor( (s-1)/Nx + 0.01 ) + 1; % row p
        q = s - Nx*(p-1);                 % col q
        D(p,q,r) = S(r,s);
    end
end
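
A possible vectorized alternative (my sketch, not from the slides), which relies on MatLab’s column-major reshape plus a transpose to reproduce the same index ordering s = Nx*(p-1)+q; the name Dp for the reconstructed images is mine.

% vectorized forward step: images D(p,q,r) into samples S(r,s)
S = zeros(Nt, Nx*Nx);
for r = 1:Nt
    S(r,:) = reshape(D(:,:,r)', 1, Nx*Nx);   % transpose gives the row-major ordering
end

% vectorized inverse step: a sample matrix (e.g. Sp) back into images
Dp = zeros(Nx, Nx, Nt);
for r = 1:Nt
    Dp(:,:,r) = reshape(Sp(r,:), Nx, Nx)';
end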

Reality of Factors: are factors intrinsically meaningful, or just a convenient way of representing data? Example: suppose the samples are rocks and the components are element concentrations; then thinking of the factors as minerals might make intuitive sense. Minerals: fixed element composition. Rock: mixture of minerals.

Many rocks, but just a few minerals. [Diagram: rocks 1–7 each drawn as a mixture of minerals (factors) 1–3]

Possibly Desirable Properties of Factors
Factors are unlike each other (different minerals typically contain different elements)
A factor contains either large or near-zero components (a mineral typically contains only a few elements)
Factors have only positive components (minerals are composed of positive amounts of chemical elements)
Coefficients of the factors are positive (rocks are composed of positive amounts of minerals)
Coefficients are typically either large or near-zero (rocks are composed of just a few major minerals)

Transformations of Factors: S = C F. Suppose we mix the factors together to get a new set of factors:

F_new = T F_old

new factors = transformation × old factors
 (M × M)        (M × M)        (M × M)

where T contains the amount of each old factor fj in each new factor f′i, and the rows of F_new give the components (A, B, C) of each new factor f′.

Transformations of Factors: F_new = T F_old. A requirement is that T^-1 exists, else F_new will not span the same space as F_old.

S = C F = C I F = (C T^-1) (T F) = C_new F_new

So you could try to implement the desirable factor properties by designing an appropriate transformation matrix, T. A somewhat restrictive choice of T is T = R, where R is a rotation matrix (rotation matrices satisfy R^-1 = R^T).
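
A small MatLab sketch of such a transformation for the three-component (ternary) example, assuming C and F have already been computed from the SVD as above; the 30° angle is just an arbitrary illustration.

% sketch: rotate factors 1 and 2 in their plane by an angle theta (T = R)
theta = 30*pi/180;                 % arbitrary illustrative angle
R = [ cos(theta)  sin(theta)  0;
     -sin(theta)  cos(theta)  0;
      0           0           1];
Fnew = R*F;                        % new factors
Cnew = C*R';                       % new coefficients (R inverse = R transpose)
norm(Cnew*Fnew - C*F)              % ~ 0: the product S = C*F is unchanged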

A method for implementing one of these properties, specifically: a factor contains either large or near-zero components (a mineral typically contains only a few elements).

“A factor contains either large or near-zero components” is more-or-less equivalent to “lots of variance in the amounts of the components contained in the factor”.

Usual formula for the variance of data x:

σd² = N⁻² [ N Σi xi² − (Σi xi)² ]

Application to a factor f:

σf² = N⁻² [ N Σi fi⁴ − (Σi fi²)² ]

Note that we are measuring the variance of the squares of the elements of f. Thus a factor has large σf² if the absolute value of its elements has a lot of variation. The sign of the elements is irrelevant.
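
As a one-line MatLab sketch of this formula (f is assumed to be a column vector holding the factor’s components):

% sketch: variance of the squared elements of a factor f
N = length(f);
sigma_f2 = ( N*sum(f.^4) - sum(f.^2)^2 ) / N^2;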

Varimax Factors: a procedure for maximizing the variance of the factors while still preserving their orthogonality.

Based on rotating pairs of factors in their plane. [Figure: f1_old and f2_old rotated by an angle θ into f1_new and f2_new]

Rotating a pair of factors (here f2 and f3) in their plane by an amount θ:

[ f1                      ]       [ f1 ]
[  cos(θ) f2 + sin(θ) f3  ]  = R  [ f2 ]
[ −sin(θ) f2 + cos(θ) f3  ]       [ f3 ]
[ f4                      ]       [ f4 ]

with

     [ 1    0        0       0 ]
R =  [ 0    cos(θ)   sin(θ)  0 ]
     [ 0   −sin(θ)   cos(θ)  0 ]
     [ 0    0        0       1 ]

Called a Givens rotation, by the way.
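
A small sketch of building such a rotation in MatLab for an arbitrary pair of factors i and j out of M; the function name givens_rotation is mine (it would live in its own file), not from the lecture.

% sketch: M-by-M Givens rotation mixing factors i and j by an angle theta
function R = givens_rotation(M, i, j, theta)
    R = eye(M);
    R(i,i) =  cos(theta);   R(i,j) = sin(theta);
    R(j,i) = -sin(theta);   R(j,j) = cos(theta);
end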

Varimax procedure for a pair of factors f^s and f^t: find the angle θ that maximizes the sum of their variances,

E(θ) ∝ σ²(f′^s) + σ²(f′^t) = N Σi (f′i^s)⁴ − (Σi (f′i^s)²)² + N Σi (f′i^t)⁴ − (Σi (f′i^t)²)²

where

f′i^s = cos(θ) fi^s + sin(θ) fi^t
f′i^t = −sin(θ) fi^s + cos(θ) fi^t

Just solve dE/dθ = 0.
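
As a numerical alternative to the closed-form angle on the next slide, here is a brute-force MatLab sketch that maximizes E(θ) directly for a given pair of column vectors fs and ft (my own illustration, not the lecture’s method):

% sketch: grid search for the rotation angle that maximizes the summed
% variance of the squared elements of a pair of factors fs, ft
N = length(fs);
thetas = linspace(0, pi/2, 1000);
E = zeros(size(thetas));
for k = 1:length(thetas)
    fps =  cos(thetas(k))*fs + sin(thetas(k))*ft;   % rotated factor pair
    fpt = -sin(thetas(k))*fs + cos(thetas(k))*ft;
    E(k) = N*sum(fps.^4) - sum(fps.^2)^2 ...
         + N*sum(fpt.^4) - sum(fpt.^2)^2;
end
[~, kbest] = max(E);
theta_best = thetas(kbest)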

After much algebra,

θ = ¼ tan⁻¹ [ 2( N Σi ui vi − Σi ui Σi vi ) / ( N Σi (ui² − vi²) − [ (Σi ui)² − (Σi vi)² ] ) ]

where ui = (fi^s)² − (fi^t)² and vi = 2 fi^s fi^t.

Then just apply this rotation to every pair of factors*. The result is a new set of factors that are mutually orthogonal but that have maximal variance, hence the name Varimax. (*Actually, you need to do the whole procedure multiple times to get convergence, since subsequent rotations to some extent undo the work of previous rotations.)

Example 1: f^s = [ ½, ½, ½, ½ ]^T and f^t = [ ½, −½, ½, −½ ]^T. The best rotation angle is θ = 45°, giving

f′^s = [ 1/√2, 0, 1/√2, 0 ]^T and f′^t = [ 0, −1/√2, 0, −1/√2 ]^T

[Plot: sum of variances σ²(f′^s) + σ²(f′^t) versus rotation angle θ; the unrotated factors are the worst case, with zero variance]
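
A quick MatLab check of this example (using the vectors as corrected above):

% sketch: verify Example 1 by rotating the pair through 45 degrees
fs = [0.5; 0.5; 0.5; 0.5];
ft = [0.5; -0.5; 0.5; -0.5];
theta = pi/4;
fps =  cos(theta)*fs + sin(theta)*ft    % -> [1/sqrt(2); 0; 1/sqrt(2); 0]
fpt = -sin(theta)*fs + cos(theta)*ft    % -> [0; -1/sqrt(2); 0; -1/sqrt(2)]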

Example 2: f^s = [0.63, 0.31, 0.63, 0.31]^T and f^t = [0.31, −0.63, 0.31, −0.63]^T. The best rotation angle is θ = 26.56°, giving

f′^s = [0.71, 0.00, 0.71, 0.00]^T and f′^t = [0.00, −0.71, 0.00, −0.71]^T

[Plot: sum of variances σ²(f′^s) + σ²(f′^t) versus rotation angle θ]