Laboratory in Oceanography: Data and Methods MAR599, Spring 2009 Anne-Marie E.G. Brunner-Suzuki Empirical Orthogonal Functions

Motivation (Miles's class): distinguish patterns from noise; reduce dimensionality; prediction; smoothing.

The Goal: 1. Separate the time and space dependence of the data. 2. Filter out the noise and reveal "hidden" structure.

Matlab Example 1

Data: t is time; xt is one "map" at time t. There are n timesteps and p different measurements at each timestep.

Matlab Example 2 – artificial signal
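The original example is not reproduced here, but a minimal sketch of such an artificial-signal test could look as follows (the signal, noise level and variable names are illustrative assumptions, not the ones used in class): build a space-time field from a known spatial pattern times a known time series plus noise, and check that the leading EOF recovers the pattern.

  % Illustrative artificial-signal test (assumed setup, not the original class example)
  nt = 200; np = 50;                        % number of timesteps and spatial points
  t  = (1:nt)';  s = 1:np;
  pattern = sin(pi*s/np);                   % prescribed spatial structure (1 x np)
  ts      = cos(2*pi*t/25);                 % prescribed time evolution (nt x 1)
  X  = ts*pattern + 0.2*randn(nt,np);       % signal plus noise, time x space
  X  = detrend(X,'constant');               % remove the time mean of each column
  [R,L]     = eig(cov(X));                  % eigenvectors/eigenvalues of the covariance
  [lam,idx] = sort(diag(L),'descend');      % Matlab's eig returns ascending order
  EOF1 = R(:,idx(1));                       % leading spatial pattern
  EC1  = X*EOF1;                            % its expansion coefficient (time series)
  plot(s, EOF1); title('Leading EOF (should resemble the prescribed pattern)')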

Summary: EOF lets us separate an ensemble of data into k different modes. Each mode has a 'space' (EOF = u) and a 'time' (EC = c) component. Pre-treating the data (e.g. taking out the temporal/spatial mean) can be useful in finding "hidden" structures. But all the information is contained in the data; EOF is "just" a mathematical construct. We, the researchers, are responsible for finding appropriate explanations.

Naming conventions: Empirical Orthogonal Function (EOF) analysis; Principal Component Analysis (PCA); discrete Karhunen–Loève functions; Hotelling transform; proper orthogonal decomposition.

How to deal with gaps? Ignore them; leave them be. Introduce randomly generated data to fill the gaps and test over M realizations. Fill the gaps in each data series using e.g. optimal interpolation.

Next time: some math (what happens inside the black box?); how do we know how many modes are significant?; some problems and pitfalls; more advanced EOF; Matlab's own functions.

References: Preisendorfer; von Storch; Hannachi.

Laboratory in Oceanography: Data and Methods MAR599, Spring 2009 Anne-Marie E.G. Brunner-Suzuki Empirical Orthogonal Functions Part II

Pre-treating the data X:
1. Shaping the data set: X has n timesteps and p different measurements: [n,p] = size(X); Use 'reshape' to convert from 3D to 2D, e.g. X = reshape(X3D, [nx*ny ntimes]); (transpose if necessary so that the rows are timesteps).
2. Remove the mean from the data, so that each column (= timeseries) has zero mean: X = detrend(X,'constant');

How to do it:
1. Form the covariance matrix Cx of the demeaned data (Cx = X'X, up to a normalization factor).
2. Solve the eigenvalue problem: Cx R = R Λ. Λ is a diagonal matrix containing all the eigenvalues λ of Cx. The columns ri of R are the eigenvectors of Cx, each corresponding to its λi. We pick the ri to be our EOF patterns: R = EOFs.
3. Arrange the eigenvalues λ1 > λ2 > … > λp and the ri correspondingly.

Eigenvectors & Eigenvalues: Cx R = R Λ. Here, R is a set of vectors that are transformed by Cx into the same vectors except for a multiplicative factor Λ: each eigenvector changes in length, but not in direction. These vectors are called eigenvectors; the Λ are called eigenvalues. Also, because Cx is Hermitian (diagonally symmetric: Cx' = Cx) and has rank p, there are p eigenvectors, and they can be chosen to be mutually orthogonal.

4. Together, all the EOFs explain 100% of the variance; each mode explains part of the total variance. 5. All eigenvectors are orthogonal to each other; hence Empirical ORTHOGONAL Functions. 6. To see how the EOFs evolve in time, we compute the 'expansion coefficients' or amplitudes: ECi = X * EOFi;

In Matlab:
1. Shape your data into time x space.
2. Demean your data: X = detrend(X,'constant');
3. Compute the covariance: Cx = cov(X);
4. Compute eigenvectors and eigenvalues: [EOFs, l] = eig(Cx);
5. Sort according to size (Matlab's eig sorts in ascending order).
6. Compute the ECs: EC1 = X * EOFs(:,1);
7. Compute the variance explained: Var_expl = diag(l)/trace(l);
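Putting the steps together, a minimal runnable sketch of the covariance route might look like this (it assumes X is already a time x space matrix; the explicit descending sort takes care of step 5):

  % Covariance-based EOF analysis; X is assumed to be time x space
  X  = detrend(X,'constant');              % remove the time mean of each column
  Cx = cov(X);                             % p x p covariance matrix
  [R,L] = eig(Cx);                         % eigenvectors and eigenvalues
  [EigVal,idx] = sort(diag(L),'descend');  % eig returns ascending order, so sort
  EOFs = R(:,idx);                         % columns = EOF patterns, largest variance first
  EC   = X*EOFs;                           % expansion coefficients, one column per mode
  Var_expl = EigVal/sum(EigVal);           % fraction of variance explained by each mode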

Normalization: Often the EOFs are normalized, so that the highest value is 1 or 100. Since the product of the EOFs and the ECs still has to reproduce X, the ECs then need to be rescaled correspondingly.
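A small sketch of one common convention (assuming the EOFs and EC from the sketch above, with X reconstructed as EC*EOFs'): scale each EOF so that its largest absolute value is 1 and rescale its EC by the same factor, so the reconstruction is unchanged.

  % Normalize each EOF to a maximum absolute value of 1; rescale the EC to compensate
  for k = 1:size(EOFs,2)
      a = max(abs(EOFs(:,k)));
      EOFs(:,k) = EOFs(:,k)/a;
      EC(:,k)   = EC(:,k)*a;               % the reconstruction X = EC*EOFs' is unchanged
  end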

How to understand this? Let's assume we only have two samples, xa and ya, that evolve in time. If all the observations were random, there would just be a blob in this space; any regularities show up as directionalities in the blob. EOF analysis aims to find these directionalities by defining a new coordinate system whose axes go right along them.

With p observations we have a p-dimensional space, and hence we want to find every cluster by laying a new coordinate system (basis) through the data. The EOF method takes all the variability in a time-evolving field and breaks it into a few standing oscillations, each with a time series to go with it. The ECs show how the EOF modes vary in time.

A word about removing the mean Removing the time means has nothing to do with the process of finding eigenvectors, but it allows us to interpret Cx as a covariance matrix, and hence, we can understand our results. Strictly speaking one can find EOFs without removing any mean.

EOF via SVD. SVD: Singular Value Decomposition. It decomposes any n x p matrix X into the form X = U S V', where U is an n x n orthogonal matrix, V is a p x p orthogonal matrix, and S is a diagonal n x p matrix with elements s_i,i on its diagonal; the s are called singular values. The columns of U and V contain the (left and right) singular vectors of X.

Connecting SVD and EOF. X is the demeaned data matrix as before.
1. Cx = X'X = (U S V')' (U S V') = V S' U' U S V' = V S'S V'
2. Cx = EOFs Λ EOFs' (rewritten eigenvalue problem)
Comparing 1. and 2.: EOFs = V (at least almost), and Λ = S'S: the squared singular values are the eigenvalues. The columns of V contain the eigenvectors of Cx = X'X, i.e. our EOFs. The columns of U contain the eigenvectors of X X'; they give the normalized time series.

How to do it:
1. Use SVD to find U, S and V such that X = U S V'.
2. Compute the eigenvalues of Cx from the squared singular values.
3. The eigenvectors of Cx are the column vectors of V.
We never have to actually compute Cx!

In Matlab:
1. Shape your data into time x space.
2. Demean your data: X = detrend(X,'constant');
3. Perform the SVD: [U, S, V] = svd(X);
4. Compute the eigenvalues: EVal = diag(S.^2);
5. Compute the explained variance: expl_var = EVal/sum(EVal);
6. The EOFs are the column vectors of V: EOFs = V;
7. Compute the expansion coefficients: EC = U*S;
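As before, a minimal runnable version of the SVD route (again assuming X is a time x space matrix; svd(X,'econ') is used here only to keep the matrices small for large data sets):

  % SVD-based EOF analysis; X is assumed to be time x space
  X = detrend(X,'constant');               % remove the time mean of each column
  [U,S,V]  = svd(X,'econ');                % X = U*S*V'
  EigVal   = diag(S).^2;                   % squared singular values = eigenvalues of X'*X
  expl_var = EigVal/sum(EigVal);           % fraction of variance per mode
  EOFs = V;                                % columns of V are the EOF patterns
  EC   = U*S;                              % expansion coefficients (equal to X*EOFs)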

The two techniques. There are basically two techniques: 1. computing the eigenvectors and eigenvalues of the covariance matrix; 2. singular value decomposition (SVD) of the data. Both methods give similar results. Check it out! However: 1. there are some differences in dimensionality; 2. SVD is much faster, especially when your data exceed about 1000 x 1000 points.
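One hedged way to "check it out" on your own data (patterns only agree up to an overall sign, and the eigenvalues only after matching the 1/(n-1) factor that cov uses):

  % Compare the covariance route and the SVD route on the same demeaned data
  X  = detrend(X,'constant');
  [R,L] = eig(cov(X));
  [lam,idx] = sort(diag(L),'descend');      % largest eigenvalue first
  [U,S,V]   = svd(X,'econ');
  sv2 = diag(S).^2/(size(X,1)-1);           % match cov's 1/(n-1) normalization
  fprintf('leading eigenvalue difference: %g\n', abs(lam(1) - sv2(1)))
  fprintf('leading pattern difference:    %g\n', norm(abs(R(:,idx(1))) - abs(V(:,1))))
  % both differences should be near zero if the leading mode is well separated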

Testing Domain Dependency. If the first EOF is unimodal and the second bimodal, the EOF analysis might be domain dependent. Testing: split your domain into two sections (e.g. north and south), repeat the EOF analysis for each domain, and check whether the same results (unimodal and bimodal structures) are obtained for each sub-domain. If yes, the EOF analysis is domain dependent, and interpretation becomes difficult or impossible. A possible solution is "rotated EOFs" (REOF): after the EOF analysis, some of the eigenvectors are rotated.
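A rough sketch of the split-domain test (it assumes the columns of X are ordered so that the first half belongs to one sub-domain and the second half to the other; in practice you would select columns by their actual coordinates):

  % Split the spatial domain in two and repeat the EOF analysis on each half
  p  = size(X,2);
  X1 = X(:, 1:floor(p/2));                 % first sub-domain (assumed column ordering)
  X2 = X(:, floor(p/2)+1:end);             % second sub-domain
  [U1,S1,V1] = svd(detrend(X1,'constant'),'econ');
  [U2,S2,V2] = svd(detrend(X2,'constant'),'econ');
  subplot(2,2,1); plot(V1(:,1)); title('sub-domain 1, EOF 1')
  subplot(2,2,2); plot(V1(:,2)); title('sub-domain 1, EOF 2')
  subplot(2,2,3); plot(V2(:,1)); title('sub-domain 2, EOF 1')
  subplot(2,2,4); plot(V2(:,2)); title('sub-domain 2, EOF 2')
  % If each half again shows a unimodal EOF 1 and a bimodal EOF 2,
  % the full-domain patterns may be artefacts of the domain choice.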

EOF from Hannachi example: Winter (DJF) monthly SLP over the Northern Hemisphere (NH) from NCEP/NCAR reanalyses, January 1948 to December. The mean annual cycle was removed.

Positive contours solid, negative contours dashed. EOFs have been multiplied by 100.

Selection Rules: Visual.

North's Rule of Thumb. North et al. defined "typical errors" for two neighbouring eigenvalues λ, δλ ≈ λ sqrt(2/n), and "typical errors" for the corresponding eigenvectors ψ, δψ_k ≈ (δλ_k / Δλ) ψ_j, where ψ_j is the eigenvector belonging to the closest neighbouring eigenvalue and Δλ is the distance to that eigenvalue. Here n is the number of degrees of freedom, which is generally less than the number of data points. If two modes are too close together, they are called degenerate.
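A sketch of how the rule can be applied in practice; the effective number of degrees of freedom n_eff is an assumption you have to supply yourself (it is not simply the number of maps):

  % North's rule of thumb: flag neighbouring eigenvalues whose error bars overlap
  EigVal = sort(eig(cov(X)), 'descend');   % eigenvalue spectrum of the data (X: time x space)
  n_eff  = 100;                            % assumed effective degrees of freedom
  dlam   = EigVal*sqrt(2/n_eff);           % typical error of each eigenvalue
  for k = 1:numel(EigVal)-1
      if EigVal(k) - EigVal(k+1) < dlam(k)
          fprintf('modes %d and %d may be degenerate\n', k, k+1)
      end
  end
  errorbar(1:numel(EigVal), EigVal, dlam, 'o')   % scree plot with error bars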

Complex EOF. Allows the analysis of propagating signals. Analyze a set of time series by creating a phase lag among them, i.e. by adding a 90-degree phase-shifted version of each series. This is done in complex space using the Hilbert transform. A cool technique, but pretty complex.
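A very small sketch of the idea (hilbert is from the Signal Processing Toolbox and returns the analytic signal column by column; the rest mirrors the SVD route above on the now complex matrix):

  % Complex (Hilbert) EOF: the 90-degree phase-shifted series becomes the imaginary part
  Xc = hilbert(detrend(X,'constant'));     % analytic signal, one column per time series
  [Uc,Sc,Vc] = svd(Xc,'econ');             % complex EOFs are the columns of Vc
  amp   = abs(Vc(:,1));                    % spatial amplitude of the leading mode
  phase = angle(Vc(:,1));                  % spatial phase; gradients indicate propagation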

Monte Carlo. Create surrogate data: a randomized data set made by scrambling the monthly maps in the time domain, in order to break the chronological order. Compute the EOFs of the scrambled dataset and analyze them.
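A hedged sketch of one common variant of this idea, which scrambles each grid point's time series independently rather than whole maps (this destroys the coherent spatial structure and gives a noise baseline for the eigenvalue spectrum; it is not necessarily the exact scrambling used in the lecture):

  % Monte Carlo noise baseline for the eigenvalue spectrum (X: time x space, demeaned)
  nMC   = 100;                             % number of surrogate data sets
  [n,p] = size(X);
  lam_noise = zeros(p, nMC);
  for m = 1:nMC
      Xs = X;
      for j = 1:p
          Xs(:,j) = X(randperm(n), j);     % shuffle each column's time order independently
      end
      lam_noise(:,m) = sort(eig(cov(Xs)), 'descend');
  end
  lam_sorted = sort(lam_noise, 2);
  lam95 = lam_sorted(:, ceil(0.95*nMC));   % approximate 95th percentile of the noise spectrum
  % data eigenvalues that exceed lam95 are unlikely to be explained by noise alone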

Matlab's own functions:
PRINCOMP: [COEFF, SCORE, latent] = princomp(X); i.e. [EOFs, EC, EigVal] = princomp(data); The EOFs are columns, and so are the ECs.
PCACOV: [COEFF, latent, explained] = pcacov(V); note that pcacov expects a covariance matrix, i.e. [EOFs, EigVal, expl_var] = pcacov(cov(data)); I believe this uses svd.
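A quick hedged consistency check (princomp removes the column means itself, so on demeaned data its COEFF columns should match the SVD-based EOFs up to sign):

  % Compare Matlab's princomp with the hand-rolled SVD EOFs (signs are arbitrary)
  [COEFF, SCORE, latent] = princomp(X);
  [U,S,V] = svd(detrend(X,'constant'), 'econ');
  disp(norm(abs(COEFF(:,1)) - abs(V(:,1))))         % should be close to zero
  disp(norm(abs(SCORE(:,1)) - abs(U(:,1)*S(1,1))))  % scores match the EC up to sign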

Assumptions we made: orthogonality; normally distributed data; high signal-to-noise ratio; standing patterns only; "the mean". Problems that might occur: no physical interpretation possible; degenerate modes; domain dependency.

A warning from von Storch and Navarra: “I have learned the following rule to be useful when dealing with advanced methods. Such methods are often needed to find a signal in a vast noisy space, i.e. the needle in the haystack. But after having the needle in our hand, we should be able to identify the needle by simply looking at it. Whenever you are unable to do so there is a good chance that something is rotten in the analysis.”

References:
R. W. Preisendorfer: Principal Component Analysis in Meteorology and Oceanography. Elsevier Science, 1988.
Hans von Storch and Francis W. Zwiers: Statistical Analysis in Climate Research. Cambridge University Press, 1999.
North, G. R., T. L. Bell, R. F. Cahalan, and F. J. Moeng: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699–706, 1982.
Hannachi, A., I. T. Jolliffe and D. B. Stephenson: Empirical orthogonal functions and related techniques in atmospheric science: A review. International Journal of Climatology, 27, 1119–1152, 2007.