Tables, Figures, and Equations


Tables, Figures, and Equations. From: McCune, B. & J. B. Grace. 2002. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon. http://www.pcord.com

Figure 14.1. Comparison of the line of best fit (first principal component) with regression lines. Point C is the centroid (from Pearson 1901).

Figure 14.2. The basis for evaluating “best fit” (Pearson 1901). In contrast, least-squares best fit is evaluated from vertical differences between points and the regression line.

Figure 14.3. Outliers can strongly influence correlation coefficients.

Step by step. 1. From a data matrix A containing n sample units by p variables, calculate a cross-products matrix. The dimensions of S are p rows × p columns. The equation for a correlation matrix is the same except that each difference is divided by the standard deviation, s_j.
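
A minimal NumPy sketch of step 1, assuming the cross-products matrix is the usual variance–covariance matrix (and the correlation matrix when each deviation is divided by its standard deviation); the data matrix A below is made up for illustration:

```python
import numpy as np

# Hypothetical data matrix A: n = 5 sample units (rows) by p = 2 variables (columns).
A = np.array([[2.0, 4.1],
              [3.5, 5.0],
              [1.2, 2.9],
              [4.8, 6.2],
              [2.6, 3.3]])

n, p = A.shape
deviations = A - A.mean(axis=0)              # center each variable on its mean

# Variance-covariance form of the p x p cross-products matrix.
S_cov = deviations.T @ deviations / (n - 1)

# Correlation form: each deviation divided by that variable's standard deviation s_j.
standardized = deviations / A.std(axis=0, ddof=1)
S_cor = standardized.T @ standardized / (n - 1)

print(S_cov)
print(S_cor)                                 # matches np.corrcoef(A, rowvar=False)
```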

2. Find the eigenvalues. Each eigenvalue (= latent root) is a lambda (λ) that solves: |S - λI| = 0, where I is the identity matrix (App. 1). This is the "characteristic equation."

The coefficients in the polynomial are derived by expanding the determinant.

3. Then find the eigenvectors, Y. For every eigenvalue λ_i there is a vector y of length p, known as the eigenvector. Each eigenvector contains the coefficients of the linear equation for a given component (or axis). Collectively, these vectors form a p × p matrix, Y. To find the eigenvectors, we solve p equations with p unknowns: [S - λI]y = 0.
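
A minimal sketch of steps 2 and 3 with NumPy, using np.linalg.eigh (appropriate for a symmetric matrix such as S); the 2 × 2 correlation matrix here is a hypothetical example:

```python
import numpy as np

S = np.array([[1.0, 0.4],
              [0.4, 1.0]])                   # example correlation matrix (p = 2)

# eigh solves |S - lambda*I| = 0 for a symmetric matrix; eigenvalues come back in
# ascending order, with the eigenvectors as the columns of Y.
eigenvalues, Y = np.linalg.eigh(S)

# Reorder so axis 1 carries the largest eigenvalue, as in PCA.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, Y = eigenvalues[order], Y[:, order]

print(eigenvalues)                           # [1.4, 0.6] for this S
print(Y)                                     # unit-length eigenvectors (columns)
```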

4. Then find the scores for each case (or object) on each axis. Scores are the original data matrix post-multiplied by the matrix of eigenvectors: X = B Y, where X is the n × p matrix of scores on each axis (component), B is the n × p original data matrix, and Y is the p × p matrix of eigenvectors.

For eigenvector 1 and entity i, the score on axis 1 is: x_i1 = y_1 a_i1 + y_2 a_i2 + ... + y_p a_ip. This yields a linear equation for each dimension.
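
A hedged sketch of step 4 (scores as the data matrix post-multiplied by the eigenvector matrix); the data matrix B and eigenvector matrix Y below are placeholders:

```python
import numpy as np

# Placeholder data matrix B (n = 4 sample units by p = 2 variables)
# and eigenvector matrix Y (p x p, eigenvectors as columns).
B = np.array([[2.0, 4.1],
              [3.5, 5.0],
              [1.2, 2.9],
              [4.8, 6.2]])
Y = np.array([[0.7071,  0.7071],
              [0.7071, -0.7071]])

# Scores: X = B Y, dimensions (n x p) = (n x p)(p x p).
# (Centering or standardizing B beforehand is common, so that scores sit around the origin.)
X = B @ Y

# Row i, column 1 of X is the linear combination x_i1 = y_1*a_i1 + y_2*a_i2 + ... + y_p*a_ip.
print(X)
```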

5. Calculate the loading matrix. The p × k matrix of correlations between each variable and each component is often called the principal components loading matrix. These correlations can be derived by rescaling the eigenvectors, or they can be calculated as correlation coefficients between each variable and the scores on the components.
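
A minimal sketch of step 5, taking the second route the slide mentions (correlations between each variable and the component scores); the function name loading_matrix is hypothetical:

```python
import numpy as np

def loading_matrix(data, scores, k):
    """Correlations between each of the p variables and the first k component scores."""
    p = data.shape[1]
    loadings = np.empty((p, k))
    for j in range(p):
        for axis in range(k):
            loadings[j, axis] = np.corrcoef(data[:, j], scores[:, axis])[0, 1]
    return loadings                          # p x k loading matrix

# Example, reusing B and X from the score sketch above:
# print(loading_matrix(B, X, k=2))
```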

Geometric analog
1. Start with a cloud of n points in a p-dimensional space.
2. Center the axes in the point cloud (origin at the centroid).
3. Rotate the axes to maximize the variance along the axes. As the angle of rotation (θ) changes, the variance (s²) changes.
Variance along axis v: s²_v = y′ S y, with dimensions (1 × p)(p × p)(p × 1).

At the maximum variance, all partial derivatives will be zero (no slope in all dimensions). This is another way of saying that we find the angle of rotation θ such that ∂s²_v / ∂θ = 0 for each component (the symbol ∂ indicates a partial derivative).
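
A small numerical check of the geometric analog, as a sketch for an assumed 2-D point cloud: sweep a unit axis through angles θ, compute the variance y′Sy along it, and note that the maximum sits at the first eigenvector's direction.

```python
import numpy as np

S = np.array([[1.0, 0.4],
              [0.4, 1.0]])                   # example covariance/correlation matrix

# Candidate unit axes at angles theta; variance along axis y is y' S y.
thetas = np.linspace(0.0, np.pi, 181)
variances = []
for theta in thetas:
    y = np.array([np.cos(theta), np.sin(theta)])    # unit vector, so y'y = 1
    variances.append(y @ S @ y)

best = thetas[int(np.argmax(variances))]
print(np.degrees(best))                      # ~45 degrees: the direction of (1, 1)
print(max(variances))                        # ~1.4, the first eigenvalue
```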

Figure 14.4. PCA rotates the point cloud to maximize the variance along the axes.

Figure 14.5. Variance along an axis is maximized when the axis skewers the longest dimension of the point cloud. The axes are formed by the variables (attributes of the objects).

Example calculations, PCA. Start with a 2 × 2 correlation matrix, S, that we calculated from a data matrix of n × p items where p = 2 in this case:
S = | 1.0  0.4 |
    | 0.4  1.0 |

We need to solve for the eigenvalues, λ, by solving the characteristic equation |S - λI| = 0. Substituting our correlation matrix, S:
| 1 - λ   0.4   |
| 0.4     1 - λ | = 0

Now expand the determinant: (1 - λ)(1 - λ) - (0.4)(0.4) = λ² - 2λ + 0.84 = 0

We then solve this polynomial for the values of λ that will satisfy this equation, using the quadratic formula λ = (-b ± √(b² - 4ac)) / (2a). Since a = 1, b = -2, and c = 0.84, then λ = (2 ± √(4 - 3.36)) / 2 = (2 ± 0.8) / 2. Solving for the two roots gives us λ1 = 1.4 and λ2 = 0.6.
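
As a quick numerical check of the quadratic solution (a sketch, assuming NumPy):

```python
import numpy as np

# Characteristic polynomial: lambda^2 - 2*lambda + 0.84 = 0, i.e. a = 1, b = -2, c = 0.84.
roots = np.roots([1.0, -2.0, 0.84])
print(sorted(roots, reverse=True))           # [1.4, 0.6]
```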

Now find the eigenvectors, Y. For each λ there is a y that satisfies [S - λI]y = 0. For the first root we substitute λ1 = 1.4, giving:
| 1 - 1.4   0.4     | | y1 |   | 0 |
| 0.4       1 - 1.4 | | y2 | = | 0 |

Multiplying this out gives two equations with two unknowns: -0.4 y1 + 0.4 y2 = 0 and 0.4 y1 - 0.4 y2 = 0. Solve these simultaneous equations (y1 = 1 and y2 = 1). Setting up and solving the equations for the second eigenvector yields y1 = 1 and y2 = -1.

We now normalize the eigenvectors, rescaling them so that the sum of squares = 1 for each eigenvector. In other words, the eigenvectors are scaled to unit length. The scaling factor k for each eigenvector i is k = 1 / √(Σ y_j²), the reciprocal of the vector's length. So for the first eigenvector, k = 1 / √(1² + 1²) = 1 / √2 = 0.7071.

Then multiply this scaling factor by all of the items in the eigenvector: 0.7071 × (1, 1) = (0.7071, 0.7071). The same procedure is repeated for the second eigenvector, then the eigenvectors are multiplied by the original data matrix to yield the scores (X) for each of the entities on each of the axes (X = A Y).
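
A short sketch verifying the worked example end to end with NumPy: normalize the hand-derived eigenvectors, confirm S y = λ y, and compare against np.linalg.eigh.

```python
import numpy as np

S = np.array([[1.0, 0.4],
              [0.4, 1.0]])

# Hand-derived (unnormalized) eigenvectors for lambda1 = 1.4 and lambda2 = 0.6.
y1 = np.array([1.0, 1.0])
y2 = np.array([1.0, -1.0])

# Scaling factor k = 1 / sqrt(sum of squares) rescales each vector to unit length.
y1_unit = y1 / np.sqrt(np.sum(y1 ** 2))      # [0.7071, 0.7071]
y2_unit = y2 / np.sqrt(np.sum(y2 ** 2))      # [0.7071, -0.7071]

# Check that S y = lambda y for each eigenvalue/eigenvector pair.
print(np.allclose(S @ y1_unit, 1.4 * y1_unit))   # True
print(np.allclose(S @ y2_unit, 0.6 * y2_unit))   # True

# The library routine gives the same results (up to sign and ordering).
eigenvalues, Y = np.linalg.eigh(S)
print(eigenvalues)                           # [0.6, 1.4]
```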

The broken-stick eigenvalue for axis k is b_k = Σ_{j = k to p} (1/j), where p is the number of columns and j indexes axes k through p.
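
A small sketch of the broken-stick eigenvalues, assuming the form b_k = Σ_{j=k..p} 1/j (which sums to p across all axes, matching the trace of a correlation matrix):

```python
def broken_stick(p):
    """Broken-stick eigenvalue for each axis k = 1..p: b_k = sum_{j=k}^{p} 1/j."""
    return [sum(1.0 / j for j in range(k, p + 1)) for k in range(1, p + 1)]

# Example with p = 5 variables: an observed eigenvalue is typically considered
# interpretable when it exceeds the broken-stick value for the same axis.
print(broken_stick(5))                       # [2.283..., 1.283..., 0.783..., 0.45, 0.2]
```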

Addendum on randomization tests for PCA (not in McCune & Grace 2002, but in PC-ORD version 5; evaluation based on Peres-Neto et al. 2005). The randomization: shuffle values within variables (columns), then recompute the correlation matrix and eigenvalues. Repeat many times. Compare the actual eigenvalues in several ways with the eigenvalues from the randomizations. Calculate the p-value as p = (1 + n) / (1 + N), where n = the number of randomizations where the test statistic ≥ the observed value and N = the total number of randomizations.

Rnd-Lambda – Compare the randomized eigenvalues for an axis to the observed eigenvalue for that axis.
- fairly conservative and generally effective criterion
- more effective than Avg-Rnd when uncorrelated variables are included in the data
- performs better than other measures with strongly non-normal data
Rnd-F – Compare the randomized pseudo-F ratios for an axis to the observed pseudo-F for that axis. The pseudo-F ratio is the eigenvalue for an axis divided by the sum of the remaining (smaller) eigenvalues.
- particularly effective against uncorrelated variables
- performs poorly with grossly non-normal error structures
Avg-Rnd – Compare the observed eigenvalue for a given axis to the average eigenvalue obtained for that axis after randomization.
- good when the data do not contain uncorrelated variables
- less stringent; too liberal when the data contain uncorrelated variables
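
A hedged sketch of the randomization described above, in the Rnd-Lambda style: shuffle values within columns, recompute the correlation matrix and its eigenvalues, and form p = (1 + n) / (1 + N) per axis. The function name rnd_lambda_test and the defaults are illustrative, not PC-ORD's implementation.

```python
import numpy as np

def rnd_lambda_test(data, n_randomizations=999, seed=0):
    """Randomization test for PCA eigenvalues: shuffle values within each variable
    (column), recompute the correlation matrix and its eigenvalues, and count how
    often each randomized eigenvalue meets or exceeds the observed one."""
    rng = np.random.default_rng(seed)
    n, p = data.shape

    def eigenvalues(x):
        return np.sort(np.linalg.eigvalsh(np.corrcoef(x, rowvar=False)))[::-1]

    observed = eigenvalues(data)
    count_ge = np.zeros(p)
    for _ in range(n_randomizations):
        shuffled = np.column_stack([rng.permutation(data[:, j]) for j in range(p)])
        count_ge += eigenvalues(shuffled) >= observed

    # p = (1 + n) / (1 + N), with n = randomizations where the statistic >= observed.
    return (1.0 + count_ge) / (1.0 + n_randomizations)

# Usage with a made-up data matrix (20 sample units by 3 variables):
# p_values = rnd_lambda_test(np.random.default_rng(1).normal(size=(20, 3)))
```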