Environmental Data Analysis with MatLab Lecture 15: Factor Analysis.


SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal Functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis Testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps

purpose of the lecture: introduce Factor Analysis, a method of detecting patterns in data

example: sediment samples are a mix of several sources
[Figure: source A, source B, and ocean sediment samples s₁ through s₅.]

[Figure: amounts of elements e₁ through e₅ in the ocean sediment samples s₁ and s₂.]
what does the composition of the samples tell you about the composition of the sources?

another example: the Atlantic Rock Dataset, the chemical composition of several thousand rocks

Rocks are a mix of minerals, and minerals have a well-defined composition.
[Figure: rocks 1 through 7, each containing a mix of minerals 1, 2 and 3.]

Which is simpler?
(a) rocks have a chemical composition, or
(b) rocks contain minerals, and minerals have chemical compositions

The answer depends on how many minerals are involved and how many elements are in each mineral.

representing mixing with matrices

the sample matrix, S: N samples by M elements (e.g. sediment samples, rock samples). The word "element" is used in the abstract sense and may not refer to actual chemical elements.

the factor matrix, F: P factors by M elements (e.g. sediment sources, minerals). Note that there are P factors; this is a simplification if P < M.

the loading matrix, C: N samples by P factors; it specifies the mix of factors for each sample.

summary: $S = CF$; samples contain factors, and factors contain elements.
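The bookkeeping of $S = CF$ can be checked with a small sketch. It is written in Python rather than MatLab and uses plain lists so that it needs no libraries; the numbers in C and F are invented for illustration (P = 2 hypothetical sources, M = 3 elements, N = 4 samples).

```python
def matmul(A, B):
    """Return the matrix product A B, with matrices stored as lists of rows."""
    inner = len(B)
    assert all(len(row) == inner for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# F: P = 2 factors (e.g. two sediment sources) by M = 3 elements (invented values)
F = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]

# C: N = 4 samples by P = 2 factors; row i says how much of each factor sample i holds
C = [[1.0, 0.0],
     [0.5, 0.5],
     [0.2, 0.8],
     [0.0, 1.0]]

# S: N = 4 samples by M = 3 elements
S = matmul(C, F)
for row in S:
    print([round(x, 3) for x in row])
```

Because each row of F and each row of C happens to sum to 1 here, each row of S does too; real data need not be normalized this way.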

an important issue: how many factors are needed to represent the samples? At most P = M are needed, but is P < M?

simple example using ternary diagrams

[Ternary diagram: samples plotted by their proportions of three elements; the samples fall along a line.]

a line of samples implies only 2 factors, so P = 2

[Ternary diagram: the samples together with two factors.]

[Figure, panels A) and B): two different pairs of factors, f₁, f₂ and f′₁, f′₂, that fit the same samples.]
the data do not uniquely determine the factors: possible choices include two bracketing factors, or the most typical factor and the deviation from it

mathematically, $S = CF = C'F'$ with $F' = MF$ and $C' = CM^{-1}$, where M is any P×P matrix with an inverse (since $C'F' = CM^{-1}MF = CF$); we must rely on prior information to choose M
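A quick numerical check of this non-uniqueness: any invertible P × P matrix M turns one factorization into another that fits the samples equally well. The matrices below are invented for illustration, and inv2 is a hypothetical helper that only handles the 2 × 2 case.

```python
def matmul(A, B):
    """Matrix product of two lists-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of a 2 x 2 matrix; raises ValueError if M is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("M has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]

F = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]     # P = 2 factors by 3 elements (invented)
C = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8]]    # 3 samples by P = 2 factors (invented)
M = [[2.0, 1.0], [1.0, 3.0]]                # any invertible 2 x 2 matrix

S = matmul(C, F)
F_prime = matmul(M, F)                      # F' = M F
C_prime = matmul(C, inv2(M))                # C' = C M^-1
S_prime = matmul(C_prime, F_prime)          # C' F' = C M^-1 M F = C F

max_diff = max(abs(S[i][j] - S_prime[i][j])
               for i in range(len(S)) for j in range(len(S[0])))
print(max_diff)   # differs from zero only by round-off
```

Since F′ differs from F while S is unchanged, the data alone cannot tell the two sets of factors apart.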

a method to determine the minimum number of factors, P, and one possible set of factors

a digression, but an important one: suppose that we have an N×N square matrix, M, and we experiment with it by multiplying “input” vectors, v, by it to create “output” vectors, w: $w = Mv$

surprisingly, the answer to the question “when is the output parallel to the input?” tells us everything about the matrix

if w is parallel to v, then $w = \lambda v$, where λ is a proportionality factor; the equation $w = Mv$ is then $\lambda v = Mv$, or $(M - \lambda I)v = 0$

but if $(M - \lambda I)v = 0$, then it would seem that $v = (M - \lambda I)^{-1} 0 = 0$, which is not a very interesting solution: w is parallel to v when v is zero

to get an interesting solution, you must choose λ so that $(M - \lambda I)^{-1}$ does not exist, which is equivalent to choosing λ so that $\det(M - \lambda I) = 0$, since a matrix with zero determinant has no inverse

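in the 2×2 case, $\det(M - \lambda I) = (M_{11} - \lambda)(M_{22} - \lambda) - M_{12}M_{21} = \lambda^2 - (M_{11} + M_{22})\lambda + (M_{11}M_{22} - M_{12}M_{21}) = 0$; this is a quadratic equation in λ and so has two solutions, $\lambda_1$ and $\lambda_2$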
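The 2 × 2 case can be worked through directly: the eigenvalues come from the quadratic formula, and each eigenvector is any nonzero vector perpendicular to the rows of $M - \lambda I$. This Python sketch assumes the roots are real (it raises an error otherwise); the helper names eig2 and eigvec2 and the test matrix are made up for illustration.

```python
import math

def eig2(M):
    """Real eigenvalues of a 2 x 2 matrix M, largest first.

    Solves lam**2 - (M11 + M22) lam + (M11 M22 - M12 M21) = 0
    with the quadratic formula.
    """
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0

def eigvec2(M, lam):
    """A unit vector v with (M - lam I) v = 0, for a 2 x 2 matrix M."""
    (a, b), (c, d) = M
    # v must be perpendicular to the rows of (M - lam I); build it from the larger row
    if abs(a - lam) + abs(b) >= abs(c) + abs(d - lam):
        v = (b, lam - a)
    else:
        v = (lam - d, c)
    n = math.hypot(v[0], v[1])
    if n == 0.0:              # M = lam I: every vector is an eigenvector
        return (1.0, 0.0)
    return (v[0] / n, v[1] / n)

M = [[4.0, 1.0],
     [2.0, 3.0]]
for lam in eig2(M):
    v = eigvec2(M, lam)
    Mv = (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])
    print(lam, v, Mv)         # Mv equals lam * v up to round-off
```

The matrix here is deliberately not symmetric, to show that the quadratic works for any 2 × 2 matrix with real roots.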

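in the N×N case, $\det(M - \lambda I) = 0$ is an Nth-order polynomial equation and so has N solutions (counting repeated roots), $\lambda_1, \lambda_2, \ldots, \lambda_N$; each corresponds to a different v: $v^{(1)}, v^{(2)}, \ldots, v^{(N)}$

the λ's are called “eigenvalues” and the v's “eigenvectors”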

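for the N × N matrix M, with $w = Mv$, the output is parallel to the input in N different cases:
$$Mv^{(1)} = \lambda_1 v^{(1)}, \quad Mv^{(2)} = \lambda_2 v^{(2)}, \quad \ldots, \quad Mv^{(N)} = \lambda_N v^{(N)}$$

simplify the notation by collecting the eigenvectors as the columns of $V = [v^{(1)}, v^{(2)}, \ldots, v^{(N)}]$ and the eigenvalues on the diagonal of $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$:
$$MV = V\Lambda$$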

In the text it is shown that if M is symmetric, then all the λ's are real and the v's are orthonormal:
$$v^{(i)T} v^{(j)} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
which implies $V^T V = VV^T = I$

post-multiplying $MV = V\Lambda$ by $V^T$ gives $M = V\Lambda V^T$; M can be constructed from V and Λ, so the answer to “when is the output parallel to the input?” tells you everything about M
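For a symmetric matrix, the claims $MV = V\Lambda$, $V^TV = I$ and $M = V\Lambda V^T$ can be checked numerically. The sketch below finds the eigenvalues and eigenvectors with cyclic Jacobi rotations, a standard method for symmetric matrices chosen here only because it fits in plain Python (in MatLab one would normally call a library routine instead). The function name jacobi_eig and the test matrix are illustrative assumptions.

```python
import math

def jacobi_eig(A, tol=1e-12, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (lams, V): eigenvalues sorted largest first, and V whose
    column j is the unit eigenvector belonging to lams[j].
    """
    n = len(A)
    A = [[float(x) for x in row] for row in A]      # work on a copy
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in A for x in row)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # angle of the plane rotation J that zeros A[p][q] in J^T A J
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                  # V <- V J accumulates rotations
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda j: -A[j][j])
    lams = [A[j][j] for j in order]
    V = [[V[i][j] for j in order] for i in range(n)]
    return lams, V

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[4.0, 1.0, 2.0],
     [1.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]                               # arbitrary symmetric test matrix

lams, V = jacobi_eig(M)
Lam = [[lams[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
MV, VLam = matmul(M, V), matmul(V, Lam)             # should agree: M V = V Lambda
VtV = matmul(transpose(V), V)                       # should be the identity
M_rebuilt = matmul(matmul(V, Lam), transpose(V))    # M = V Lambda V^T
print([round(lam, 6) for lam in lams])
```

Sorting the eigenvalues from largest to smallest is a convenience for the factor analysis that follows; it does not change the decomposition.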

now here’s what this has to do with factors

suppose S is square and symmetric; then $S = CF = V\Lambda V^T$, with $C = V\Lambda$ and $F = V^T$

S can be represented by M mutually perpendicular factors, the rows of F

furthermore, suppose that only P eigenvalues are nonzero; the eigenvectors with zero eigenvalues can be thrown out of the equation

we can reduce the number of factors from M to P: $S = CF = V_P \Lambda_P V_P^T$, with $C = V_P \Lambda_P$ and $F = V_P^T$; S can be represented by P mutually perpendicular factors, F
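The reduction from M to P factors for a square, symmetric S can be sketched by computing the eigen-decomposition, dropping the eigenvectors whose eigenvalues are (numerically) zero, and rebuilding S from $C = V_P\Lambda_P$ and $F = V_P^T$. The 3 × 3 matrix is an invented example with eigenvalues 3, 1 and 0; the eigenvalue routine is cyclic Jacobi, used only to keep the sketch free of libraries, and the relative cutoff of 1e-10 for “zero” is an assumption.

```python
import math

def jacobi_eig(A, tol=1e-12, max_sweeps=100):
    """Symmetric eigen-decomposition by cyclic Jacobi rotations (largest first)."""
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in A for x in row)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                  # A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                  # A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                  # V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda j: -A[j][j])
    return [A[j][j] for j in order], [[V[i][j] for j in order] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

S = [[1.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 1.0]]              # square, symmetric, one zero eigenvalue (invented)

lams, V = jacobi_eig(S)
big = max(abs(lam) for lam in lams)
keep = [j for j, lam in enumerate(lams) if abs(lam) > 1e-10 * big]  # assumed cutoff
P = len(keep)
V_P = [[V[i][j] for j in keep] for i in range(len(S))]              # M x P
lams_P = [lams[j] for j in keep]
C = [[V_P[i][k] * lams_P[k] for k in range(P)] for i in range(len(S))]  # C = V_P Lambda_P
F = transpose(V_P)                                                   # F = V_P^T, P x M
S_rebuilt = matmul(C, F)
print(P, [round(lam, 6) for lam in lams])
```

Only two of the three eigenvectors survive, yet $CF$ reproduces S, which is the sense in which P = 2 factors suffice.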

unfortunately … S is usually neither square nor symmetric, so a patch in the methodology is needed

the trick: $S^T S$ is an M × M square (and symmetric) matrix

suppose $S^T S$ has nonzero eigenvalues $\Lambda_P$ and eigenvectors $V_P$; then $S^T S$ written in terms of its eigenvalues and eigenvectors is
$$S^T S = V_P \Lambda_P V_P^T$$

write $\Lambda_P$ as the product of its square roots:
$$S^T S = V_P \Lambda_P^{1/2} \Lambda_P^{1/2} V_P^T$$

insert the identity matrix, written as $I = U_P^T U_P$ with $U_P$ as yet unknown:
$$S^T S = V_P \Lambda_P^{1/2} U_P^T U_P \Lambda_P^{1/2} V_P^T$$

group, and write the first group as the transpose of a transpose:
$$S^T S = \left(U_P \Lambda_P^{1/2} V_P^T\right)^T \left(U_P \Lambda_P^{1/2} V_P^T\right)$$

compare the two sides: the equation holds if
$$S = U_P \Lambda_P^{1/2} V_P^T$$

and so, writing $\Sigma_P = \Lambda_P^{1/2}$,
$$S = U_P \Sigma_P V_P^T$$

which is called the “singular value decomposition” of S. Now the non-square, non-symmetric matrix S is represented as a mix of P mutually perpendicular factors; the diagonal elements of $\Sigma_P$ are called the “singular values”. Multiplying on the right by $V_P \Sigma_P^{-1}$ gives $U_P = S V_P \Sigma_P^{-1}$.

in $S = U_P \Sigma_P V_P^T$, the matrix of loadings is $C = U_P \Sigma_P$ and the matrix of factors is $F = V_P^T$. Since C depends on $\Sigma_P$, the samples contain more of the factors with large singular values than of the factors with small singular values
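The derivation can be followed step by step in code: form $S^TS$, take its eigenvalues and eigenvectors, keep the P nonzero ones, set $\Sigma_P = \Lambda_P^{1/2}$ and $U_P = S V_P \Sigma_P^{-1}$, and then form the loadings $C = U_P\Sigma_P$ and factors $F = V_P^T$. This plain-Python sketch uses a cyclic Jacobi eigenvalue routine so that it needs no libraries. The 4 × 3 sample matrix is invented (its rows are mixes of two made-up factors, so it has rank 2), and the 1e-10 relative cutoff is an assumption. Library SVD routines, such as the one behind MatLab's svd(), typically avoid forming $S^TS$ because doing so loses precision for ill-conditioned S, so this is for understanding rather than production use.

```python
import math

def jacobi_eig(A, tol=1e-12, max_sweeps=100):
    """Symmetric eigen-decomposition by cyclic Jacobi rotations (largest first)."""
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in A for x in row)
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                  # A <- A J
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                  # A <- J^T A
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                  # V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda j: -A[j][j])
    return [A[j][j] for j in order], [[V[i][j] for j in order] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

S = [[0.7, 0.2, 0.1],
     [0.4, 0.25, 0.35],
     [0.22, 0.28, 0.5],
     [0.1, 0.3, 0.6]]                        # N = 4 samples by M = 3 elements (invented)

StS = matmul(transpose(S), S)                # M x M, square and symmetric
lams, V = jacobi_eig(StS)                    # eigenvalues largest first
P = sum(1 for lam in lams if lam > 1e-10 * lams[0])   # assumed "nonzero" cutoff
V_P = [row[:P] for row in V]                 # M x P
sigma = [math.sqrt(lam) for lam in lams[:P]] # singular values: Sigma_P = Lambda_P^(1/2)
SV = matmul(S, V_P)                          # N x P, equals U_P Sigma_P
U_P = [[SV[i][j] / sigma[j] for j in range(P)] for i in range(len(S))]  # S V_P Sigma_P^-1
C = SV                                       # loadings, C = U_P Sigma_P
F = transpose(V_P)                           # factors, F = V_P^T
S_rebuilt = matmul(C, F)                     # should reproduce S
print(P, [round(x, 4) for x in sigma])
```

Note that the loadings come out as $C = U_P\Sigma_P = SV_P$, so a column of C is large exactly when the corresponding singular value is large.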

in MatLab, svd() computes all M factors (you must decide how many to use)

[Figure: singular values, $\Sigma_{ii}$, of the Atlantic Rock dataset versus index, i, sorted into order of size; the smallest values are close to zero and are discarded.]
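The lecture picks the cutoff by inspecting the plot of sorted singular values. As a rough automation of the “close to zero” judgment, the sketch below counts singular values larger than a fraction rel_tol of the largest one. Both the function name choose_num_factors and the default rel_tol are assumptions, and the singular values in the example are invented, not the Atlantic Rock values.

```python
def choose_num_factors(singular_values, rel_tol=1e-3):
    """Count singular values that are not 'close to zero'.

    A value counts as close to zero when it is at most rel_tol times the
    largest singular value. rel_tol is an assumed, user-chosen cutoff.
    """
    s = sorted(singular_values, reverse=True)   # sort into order of size
    if not s or s[0] <= 0.0:
        return 0
    return sum(1 for x in s if x > rel_tol * s[0])

# invented singular values (not the Atlantic Rock values): five large, three tiny
example = [120.0, 35.0, 12.0, 6.0, 2.5, 1e-6, 3e-7, 1e-8]
P = choose_num_factors(example)
print(P)
```

Because the cutoff is a judgment call, it is worth checking how P changes with rel_tol before settling on a value.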

[Figure: factors of the Atlantic Rock dataset.]

factor 1 is the “typical factor”

factor 2: as MgO increases, Al₂O₃ and CaO decrease

factor 3: as Al₂O₃ increases, FeO and CaO increase

[Figure: graphical representation of factors 2 through 5 (f₂, f₃, f₄, f₅) in terms of SiO₂, TiO₂, Al₂O₃, FeO (total), MgO, CaO, Na₂O and K₂O.]

[Figure: factor loadings C₂ through C₄ plotted in 3D.]
factors 2 through 4 capture most of the variability of the rocks

[Figure, panels A) through D): axis labels include Al, TiO₂, SiO₂, K₂O, FeO and MgO.]