Multivariate Statistics ESM 206, 5/17/05

Presentation transcript:

1 Multivariate Statistics ESM 206, 5/17/05

2 WHAT IS MULTIVARIATE STATISTICS?
A collection of techniques to help us understand patterns in, and make predictions with, large datasets with many variables
Ordination: find a (hopefully small) number of composite variables that capture most of the variability among data points
Cluster Analysis: discover natural groupings of similar data points
Discriminant Analysis: find a (hopefully small) number of composite variables that can be used to predict the levels of a categorical dependent variable
Canonical Correlation Analysis: find relationships between two groups of variables
  – “Dependent variable” is multivariate

3 WHAT CAN MULTIVARIATE STATISTICS DO?
Reflect more accurately the true multidimensional nature of environmental systems
Provide a way to handle large datasets with large numbers of variables by summarizing the redundancy
Provide rules for combining variables in an “optimal” way
Provide a means of detecting and quantifying truly multivariate patterns that arise out of the correlational structure of the variable set
Provide a means of exploring complex data sets for patterns and relationships from which hypotheses can be generated and subsequently tested experimentally

4 DISTINGUISHING ECOLOGICAL NICHES OF 3 SPECIES

5

6

7 ORDINATION
Simplify the interpretation of complex data by organizing sampling entities along independent gradients or factors defined by combinations of interrelated variables
Uncover a more fundamental set of factors that account for the major patterns across all of the original variables
If a few major gradients explain much of the variability in the data, then the data can be interpreted with respect to these gradients with little loss of information

8 PRINCIPAL COMPONENTS ANALYSIS (PCA)
Most commonly used ordination technique
Given P correlated variables, extract P principal components
  – Linear combinations of the variables
  – Uncorrelated with one another
  – First PC is the direction through the data cloud that captures the most variance in the data
  – Second PC is the direction perpendicular to the first that captures the most remaining variance
  – Etc.
Assumptions of PCA:
  1. Data are multivariate normal
  2. Data are independent
  3. Observed variables depend linearly on underlying factors
May need to transform data to satisfy these
Unless variables are all measured on the same scale, use correlations rather than covariances
  – Gives equal weight to variability in all variables
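The extraction of principal components described on this slide can be sketched with an eigendecomposition of the correlation matrix. This is a minimal illustration, not the software used in the lecture: the function name `pca_correlation` and the synthetic three-variable dataset are assumptions made for the example.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = observations, columns = variables)."""
    # Standardize each variable: zero mean, unit sample standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables (P x P).
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                         # coordinates of each observation on the PCs
    return eigvals, eigvecs, scores

# Synthetic data (an assumption for illustration): 3 variables on very different scales,
# two of which share a common latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([
    10.0 * latent + rng.normal(size=200),
    0.5 * latent + 0.2 * rng.normal(size=200),
    rng.normal(size=200),
])

eigvals, eigvecs, scores = pca_correlation(X)
print(eigvals)                                    # variances of the PCs; they sum to P = 3
print(np.round(np.cov(scores, rowvar=False), 6))  # off-diagonals near 0: PCs are uncorrelated
```

Because the correlation matrix is used, the first variable's much larger raw variance does not dominate the first PC, which is the point of the "use correlations rather than covariances" advice.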

9 EXAMPLE: CHEMICAL SOLUBILITY
72 chemical compounds tested for solubility in each of 6 solvents
  – Solubility measured on log scale
Strong (but not perfect) correlations among the 6 solvents
Can we use fewer than 6 variables to characterize each chemical?

10 SOLUBILITY PCA
Eigenvalue indicates how much of the variability in the data is explained by the PC
  – Magnitude depends on number of variables (and on their variances if done with the covariance matrix)
  – Instead look at percents
Eigenvector gives coefficients of the linear relationship of the PC to each variable
  – NOTE: some software scales the eigenvectors differently
Interpretation:
  – PC1 is axis of overall increasing solubility
  – PC2 is axis of differential solubility in 1-Octanol & Ether vs. the other 4 solvents
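The step from eigenvalues to percents, and the note about eigenvector scaling, can be illustrated as follows. The 72 x 6 table here is a synthetic stand-in generated from one shared factor; it is not the solubility data from the lecture, and the variable names are assumptions for the sketch.

```python
import numpy as np

# Synthetic stand-in for a 72 x 6 table of log solubilities (NOT the data from the slides).
rng = np.random.default_rng(1)
common = rng.normal(size=(72, 1))                     # shared "overall solubility" factor
X = common @ np.ones((1, 6)) + 0.5 * rng.normal(size=(72, 6))

R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)                  # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]    # reorder to descending

percent = 100 * eigvals / eigvals.sum()               # share of total variance per PC
cumulative = np.cumsum(percent)

# Two common eigenvector scalings:
unit_vectors = eigvecs                                # each column has length 1
loadings = eigvecs * np.sqrt(eigvals)                 # column k has length sqrt(eigenvalue k)

print(np.round(percent, 1))
print(np.round(cumulative, 1))
```

With a correlation matrix the eigenvalues sum to the number of variables, so the raw magnitudes are hard to compare across analyses while the percents are not. Under the second scaling, each entry is the correlation between a variable and a PC, which is one reason packages can report different-looking coefficients for the same analysis. The sign of each eigenvector is also arbitrary.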

11

12 CHARACTERISTICS OF ORDINATION
Organizes sampling entities (e.g., species, sites, observations) along continuous environmental gradients
Assesses relationships within a single set of variables; doesn’t define the relationship between a set of independent variables and one or more dependent variables
  – However, PCs can be used as independent variables in a regression
Reduces dimensionality of a multivariate data set by condensing a large number of original variables into a smaller set of new composite variables with minimal loss of information
Summarizes data redundancy by placing similar entities in proximity in ordination space
Defines new composite variables (e.g., principal components) as weighted linear combinations of the original variables
Eliminates noise from a multivariate data set by recovering patterns in the first few composite dimensions and deferring noise to subsequent axes
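The remark that PCs can serve as independent variables in a regression can be sketched as below. The data are synthetic, and the choice of k = 2 retained components is an assumption made for illustration rather than a recommendation from the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 100, 5, 2                      # k = number of PCs kept (a choice for illustration)

# Synthetic predictors driven by two underlying gradients, plus a response y.
gradients = rng.normal(size=(n, 2))
X = gradients @ rng.normal(size=(2, p)) + 0.3 * rng.normal(size=(n, p))
y = 2.0 * gradients[:, 0] - 1.0 * gradients[:, 1] + 0.5 * rng.normal(size=n)

# PCA on the correlation matrix of the predictors.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
scores = Z @ eigvecs[:, order[:k]]       # n x k matrix of scores on the first k PCs

# Ordinary least squares of y on an intercept and the first k PC scores.
design = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
r_squared = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(coef, r_squared)
```

Since the PC scores are uncorrelated with one another, the regression avoids the collinearity that the original correlated variables would introduce, at the cost of coefficients that must be interpreted through the eigenvectors.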

13 OTHER ORDINATION TECHNIQUES
Polar Ordination (PO)
Factor Analysis (FA)
  – This is often used as a generic term meaning “ordination” in social sciences
Nonmetric Multidimensional Scaling (NMMDS)
  – Relaxes normality and linearity assumptions by using ranks
Correspondence Analysis (CA)
  – Allows data (e.g., species abundance) to take on peak values at intermediate levels of the gradient
  – Also called Reciprocal Averaging
Detrended Correspondence Analysis (DCA)
  – Deals particularly well with nonlinear relationships
Canonical Correspondence Analysis (CCA)
  – Like CA, but ordination of variables of interest (e.g., species abundance) is constrained to depend linearly on other variables (e.g., environmental characteristics) measured at the same sites

14 FURTHER READING
McGarigal, K., S. Cushman, and S. Stafford. Multivariate Statistics for Wildlife and Ecology Research (Springer-Verlag, New York).
Gotelli, N.J., and A.M. Ellison. A Primer of Ecological Statistics (Sinauer, Sunderland, MA); Chapter 12.
Spicer, J. Making Sense of Multivariate Data Analysis (Sage Press, Thousand Oaks).