Published by Archibald Newman. Modified over 9 years ago.
1
Canonical Correlation Analysis, Redundancy Analysis and Canonical Correspondence Analysis
Hal Whitehead
BIOL4062/5062
2
Canonical Correlation Analysis
Redundancy Analysis
Canonical Correspondence Analysis
3
Multivariate Statistics with Two Groups of Variables
Look at relationships between two groups of variables:
–species variables vs environmental variables (community ecology)
–genetic variables vs environmental variables (population genetics)
Data layout: units in rows; variables in columns, split into X's and Y's
4
Canonical Correlation Analysis
Multivariate extension of correlation analysis
Looks at the relationship between two sets of variables
5
Canonical Correlation Analysis
Given a linear combination of X variables:
  $F = f_1 X_1 + f_2 X_2 + \dots + f_p X_p$
and a linear combination of Y variables:
  $G = g_1 Y_1 + g_2 Y_2 + \dots + g_q Y_q$
The first canonical correlation is the maximum correlation coefficient between F and G, over all F and G
$F_1 = \{f_{11}, f_{12}, \dots, f_{1p}\}$ and $G_1 = \{g_{11}, g_{12}, \dots, g_{1q}\}$ are the corresponding canonical variates
6
Canonical Correlation Analysis
[Figure: plots on axes X1, X2 and Y1, Y2 with canonical variates F and G; points F(7), F(16), G(7), G(16) marked; objective: maximize r(F,G)]
7
Canonical Correlation Analysis
The first canonical correlation is the maximum correlation coefficient between F and G, over all F and G
–$F_1 = \{f_{11}, f_{12}, \dots, f_{1p}\}$ and $G_1 = \{g_{11}, g_{12}, \dots, g_{1q}\}$ are the corresponding first canonical variates
The second canonical correlation is the maximum correlation coefficient between F and G, over all F orthogonal to $F_1$ and all G orthogonal to $G_1$
–$F_2 = \{f_{21}, f_{22}, \dots, f_{2p}\}$ and $G_2 = \{g_{21}, g_{22}, \dots, g_{2q}\}$ are the corresponding second canonical variates
etc.
8
Canonical Correlation Analysis
So each canonical correlation is associated with a pair of canonical variates
Canonical correlations decrease from the first pair onwards
Canonical correlations are higher than generally found with simple correlations
–as coefficients are chosen to maximize correlations
9
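Canonical Correlation Analysis
Correlation matrix of all p + q variables ($X_1, \dots, X_p, Y_1, \dots, Y_q$), partitioned into blocks:
  $R = \begin{pmatrix} A & C \\ C' & B \end{pmatrix}$
–A (p x p): correlations among the X's
–B (q x q): correlations among the Y's
–C (p x q): correlations between the X's and the Y's; C' (q x p) is its transpose
Canonical correlations are the square roots of the eigenvalues of $B^{-1} C' A^{-1} C$
Canonical variates for the Y variables are the corresponding eigenvectors
Number of canonical correlations = min(No. X's, No. Y's)
Can test whether canonical correlations are significantly different from 0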
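The eigenvalue recipe on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code: the function name `canonical_correlations` and the simulated data are assumptions made for the example, and the eigenvectors returned are coefficients for the standardized Y variables because the calculation starts from a correlation matrix.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations and Y-side coefficients from B^-1 C' A^-1 C.

    X: (n units, p variables); Y: (n units, q variables).
    """
    p, q = X.shape[1], Y.shape[1]
    # Correlation matrix of all p + q variables, split into blocks A, B, C.
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    A, C, B = R[:p, :p], R[:p, p:], R[p:, p:]
    # B^-1 C' A^-1 C, using solve() instead of forming explicit inverses.
    M = np.linalg.solve(B, C.T) @ np.linalg.solve(A, C)
    eigvals, eigvecs = np.linalg.eig(M)
    # M is not symmetric, so discard any tiny imaginary round-off.
    eigvals, eigvecs = eigvals.real, eigvecs.real
    # Sort descending and keep min(p, q) canonical correlations.
    order = np.argsort(eigvals)[::-1][:min(p, q)]
    r = np.sqrt(np.clip(eigvals[order], 0.0, 1.0))
    return r, eigvecs[:, order]

# Simulated data (hypothetical): Y depends linearly on X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.0, 1.0]]) + rng.normal(size=(200, 2))
r, g = canonical_correlations(X, Y)
print(r)  # two canonical correlations, largest first
```

With 3 X variables and 2 Y variables the function returns min(3, 2) = 2 canonical correlations, and the first is at least as large as any simple correlation between one X and one Y.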
10
Canonical Correlation Analysis
What are the canonical correlations?
–Are they, in toto, significantly different from zero?
–Are some significant, others not? Which ones?
What are the corresponding canonical variates?
–How does each original variable contribute towards each canonical variate (use loadings)?
–How much of the joint covariance of the two sets of variables is explained by each pair of canonical variates?
11
Relationship to: Canonical Variate Analysis
We can define dummy (1:0) variables to define groups of units:
–1 = in group; 0 = out of group
A canonical correlation analysis between these dummy grouping variables and the original variables is equivalent to a canonical variate analysis
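The dummy-variable construction can be illustrated as follows. This is a sketch under stated assumptions: the three simulated groups, the helper name `cancorr`, and the choice to drop one group's dummy column (the full set of dummies sums to 1 and would be collinear) are all made up for the example. The canonical correlations are computed here by QR decomposition and SVD, a numerically stable route to the same quantities.

```python
import numpy as np

def cancorr(X, Y):
    """Canonical correlations via QR + SVD of the centred data matrices."""
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Hypothetical data: 3 groups of 30 units measured on 2 variables.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 30)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = centres[labels] + rng.normal(size=(90, 2))

# Dummy variables: 1 = in group, 0 = out of group.
# Only groups 0 and 1 get a column; group 2 is implied by both being 0.
D = (labels[:, None] == np.array([0, 1])).astype(float)

r = cancorr(X, D)
print(r)  # high values indicate well-separated groups
```

With g groups and p measured variables this yields min(p, g - 1) canonical correlations, the same number of axes a canonical variate analysis would give.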
12
Redundancy Analysis
$y_1$, $y_2$: Correlation Analysis
x => y: Simple Regression Analysis
X => y: Multiple Regression Analysis ($X = \{x_1, x_2, \dots\}$)
$Y_1$, $Y_2$: Canonical Correlation Analysis
X => Y: Redundancy Analysis
How one set of variables (X) may explain another set (Y)
13
Redundancy Analysis
“Redundancy” expresses how much of the variance in one set of variables can be explained by the other
14
Redundancy Analysis
Output:
–canonical variates describing how X explains Y
–non-canonical variates (principal components of the residuals of Y)
–results may be presented as a biplot: two types of points representing the units and the X-variables, vectors giving the Y-variables
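One common way to obtain these outputs is to regress Y on X and then run a principal components analysis on the fitted values (canonical axes) and on the residuals (non-canonical axes). The sketch below follows that route; the function name `redundancy_analysis`, the simulated data and the decision to centre but not standardize the variables are assumptions for illustration.

```python
import numpy as np

def redundancy_analysis(X, Y):
    """Share of Y's variance on canonical and non-canonical axes."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Multivariate least-squares regression of Y on X.
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Y_fit = Xc @ coef          # part of Y explained by X
    Y_res = Yc - Y_fit         # residuals of Y
    # PCA via SVD: squared singular values are sums of squares per axis.
    canonical = np.linalg.svd(Y_fit, compute_uv=False) ** 2
    residual = np.linalg.svd(Y_res, compute_uv=False) ** 2
    total = (Yc ** 2).sum()
    return canonical / total, residual / total

# Hypothetical data: 4 Y variables partly driven by 2 X variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
Y = X @ rng.normal(size=(2, 4)) + rng.normal(size=(100, 4))
canonical, residual = redundancy_analysis(X, Y)
print("proportion of Y variance explained by X:", canonical.sum())
```

The sum of the canonical shares is the proportion of the variance in Y explained by X, and the canonical and residual shares together add up to 1.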
15
Hourly records of sperm whale behaviour
Variables:
–Mean cluster size
–Max. cluster size
–Mean speed
–Heading consistency
–Fluke-up rate
–Breach rate
–Lobtail rate
–Spyhop rate
–Sidefluke rate
–Coda rate
–Creak rate
–High click rate
Data collected:
–Off Galapagos Islands
–1985 and 1987
Units:
–hours spent following sperm whales
–440 hours
16
Hourly records of sperm whale behaviour
Same variables, data and units as on the previous slide, with the variables divided into two sets: Physical and Acoustic
17
Canonical Correlation Analysis: Physical vs. Acoustic Behaviour
                                 1      2      3
Canonical correlations         0.72   0.49   0.21
P-values                       0.00   0.00   0.06
Redundancies:
  V(Acoustic) | V(Physical)     34%    20%    <1%
  V(Physical) | V(Acoustic)     32%     8%    <1%
18
Physical vs. Acoustic Behaviour
Canonical correlations       1       2
Loadings:
  Mean cluster size        -0.95    0.07
  Max. cluster size        -0.85    0.47
  Mean speed                0.21    0.06
  Heading consistency       0.32   -0.27
  Fluke-up rate             0.73    0.23
  Breach rate              -0.16    0.02
  Lobtail rate             -0.22    0.03
  Spyhop rate              -0.18    0.32
  Sidefluke rate           -0.21    0.35
  Coda rate                -0.64    0.64
  Creak rate               -0.50    0.79
  High click rate           0.76    0.64
19
Canonical Correspondence Analysis
Canonical correlation analysis assumes a linear relationship between two sets of variables
In some situations this is not reasonable (e.g. community ecology)
Canonical correspondence analysis assumes a Gaussian (bell-shaped) relationship between sets of variables
–“Species” variables are Gaussian functions of “Environmental” variables
Software: CANOCO
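A Gaussian response of a species to an environmental variable can be written as a bell-shaped curve with a peak abundance, an optimum and a tolerance (the spread around the optimum). The sketch below only illustrates that curve; the function name `gaussian_response`, its parameter names and the numbers used are hypothetical.

```python
import numpy as np

def gaussian_response(x, optimum, tolerance, maximum):
    """Expected abundance at environmental value x under a Gaussian response."""
    return maximum * np.exp(-0.5 * ((x - optimum) / tolerance) ** 2)

# Hypothetical species: peak abundance 20 at gradient value 4, tolerance 1.5.
gradient = np.linspace(0.0, 10.0, 11)
abundance = gaussian_response(gradient, optimum=4.0, tolerance=1.5, maximum=20.0)
print(np.round(abundance, 2))
```

Abundance peaks at the optimum and falls away on both sides, which a straight-line (linear) model of abundance against the gradient cannot capture.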
20
[Figure panels: Canonical Correlation Analysis; Canonical Correspondence Analysis]
24
Canonical correspondence analysis: Dutch spiders
26 environmental variables
12 spider species
100 samples (pit-fall traps)
Axes                                   1      2      3      4
Eigenvalues                          .535   .214   .063   .019
Species-environment correlations     .959   .934   .650   .782
Cumulative percentage variance
  of species data                    46.6   65.2   70.7   72.3
  of species-environment relation    63.2   88.5   95.9   98.2
25
[Figure: ordination plot, Axis 1 vs. Axis 2]
26
Canonical correspondence analysis can be detrended
27
The ‘Horseshoe effect’
        Environmental Gradient
Sp A    0 0 0 0 0 0 0 1 1
Sp B    0 0 0 0 0 0 1 1 0
Sp C    0 0 0 0 0 0 1 1 0
Sp D    0 0 0 0 0 1 1 0 0
Sp E    0 0 0 0 1 1 1 0 0
Sp F    0 0 0 1 1 1 0 0 0
Sp G    0 0 0 1 1 0 0 0 0
Sp H    0 0 1 1 0 0 0 0 0
Sp I    1 1 1 0 0 0 0 0 0
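To see where the horseshoe comes from, the presence/absence table on this slide can be run through an ordinary (unconstrained) correspondence analysis. The sketch below is one standard way to do that with an SVD of the standardized residuals; the function name `correspondence_analysis` is an assumption, and no particular output values are claimed.

```python
import numpy as np

# Species (rows, Sp A to Sp I) by sites along the environmental gradient.
table = np.array([
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # Sp A
    [0, 0, 0, 0, 0, 0, 1, 1, 0],  # Sp B
    [0, 0, 0, 0, 0, 0, 1, 1, 0],  # Sp C
    [0, 0, 0, 0, 0, 1, 1, 0, 0],  # Sp D
    [0, 0, 0, 0, 1, 1, 1, 0, 0],  # Sp E
    [0, 0, 0, 1, 1, 1, 0, 0, 0],  # Sp F
    [0, 0, 0, 1, 1, 0, 0, 0, 0],  # Sp G
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # Sp H
    [1, 1, 1, 0, 0, 0, 0, 0, 0],  # Sp I
], dtype=float)

def correspondence_analysis(N):
    """Eigenvalues and principal coordinates of rows and columns."""
    P = N / N.sum()
    r = P.sum(axis=1)              # row (species) masses
    c = P.sum(axis=0)              # column (site) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_scores = (U / np.sqrt(r)[:, None]) * sv
    col_scores = (Vt.T / np.sqrt(c)[:, None]) * sv
    return sv ** 2, row_scores, col_scores

eigenvalues, species_scores, site_scores = correspondence_analysis(table)
# Axis 1 and axis 2 scores for each site along the gradient.
print(np.round(site_scores[:, :2], 2))
```

Plotting the second column of `site_scores` against the first is the usual way to look for the arch: although there is only one underlying gradient, the second axis tends to come out as a curved function of the first.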
28
[Figure: ordination plot, Axis 1 vs. Axis 2]
29
Detrended Canonical Correspondence Analysis
[Figure: ordination plot, Detrended Axis 1 vs. Detrended Axis 2]
30
Canonical Correlation Analysis
–Examines the relationship between two sets of variables
Redundancy Analysis
–Examines how a set of dependent variables relates to a set of independent variables
Canonical Correspondence Analysis
–Counterpart of Canonical Correlation and Redundancy Analyses when the relationship between the sets of variables is Gaussian, not linear