Published by Gerard Parrish. Modified over 8 years ago.
1
Multivariate Data Analysis G. Quinn, M. Burgman & J. Carey 2003
2
Objects
Things we wish to compare
– sampling or experimental units
– e.g. quadrats, animals, plants, cages etc.
3
Variables
Characteristics measured from each object
– usually continuous variables
– e.g. counts of species, size of body parts etc.
4
Ecological data
Objects:
– sampling units (SUs, e.g. quadrats, plots etc.)
Variables:
– species abundances and/or environmental data
Common in community ecology
5
Wisconsin forests (Peet & Loucks 1977)
Plots (quadrats) in Wisconsin forests
Number of individuals of each species of tree recorded in each quadrat
Objects:
– quadrats
Variables:
– abundances of each tree species
6
Data

Plot   Bur oak   Black oak   White oak   Red oak   etc.
1      9         8           5           3
2      8         9           4           4
3      3         8           9           0
4      5         7           9           6
5      6         0           7           9
6      0         0           7           8
etc.
7
Garroch Head dumping ground (Clarke & Ainsworth 1993)
Sewage sludge dumping ground in bay
Transect across dumping ground
Core of mud at each of 10 stations along transect
Objects:
– stations
Variables:
– metal concentrations in ppm
8
Data

Station   Cu    Mn     Co   Ni   Zn    Cd    etc.
1         26    2470   14   34   160   0
2         30    1170   15   32   156   0.2
3         37    394    12   38   182   0.2
4         74    349    12   41   227   0.5
5         115   317    10   37   329   2.2
etc.
9
Morphological data
Objects:
– usually organisms or specimens
Variables:
– morphological measurements
10
Morphological data
Morphological variation between dog species/types
Objects:
– dog types (7)
Variables:
– sizes of 6 different parts of mandible
– mandible breadth, mandible height, etc.
11
Data

                  Variable
Dog type          1      2      3      4      5      6
Modern dog        9.7    21.0   19.4   7.7    32.0   36.5
Jackal            8.1    16.7   18.3   7.0    30.3   32.9
Chinese wolf      13.5   27.3   26.8   10.6   41.9   48.1
Indian wolf       11.5   24.3   24.5   9.3    40.0   44.6
Cuon              10.7   23.5   21.4   8.5    28.8   37.6
Dingo             9.6    22.6   21.1   8.3    34.4   43.1
Prehistoric dog   10.3   22.1   19.1   8.1    32.3   25.0
12
Presentation of Multivariate Data
[Diagram: a raw data matrix (objects O1, O2, ..., Op by variables V1, V2, ..., Vn) leads to a resemblance matrix (objects O1 ... Op by objects O1 ... Op), created using correlations, covariances or dissimilarity indices. These matrices feed into ordination and classification.]
13
Data Standardization
Adjusting of data so that means and/or variances or totals are the same for each variable.
Examples:
– 1) centering + standardizing: $x_i' = \dfrac{x_i - \bar{x}}{s}$
– 2) rescaling relative to the maximum: $x_i' = \dfrac{x_i}{x_{\max}}$
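The two standardizations above can be sketched in a few lines of Python. This is only an illustration: the counts are read from the Wisconsin forests data slide, the sample standard deviation (n - 1 denominator) is assumed for s, and the function names are hypothetical.

```python
from statistics import mean, stdev

# Tree counts per plot, read from the Wisconsin forests data slide
# (columns: bur oak, black oak, white oak, red oak).
plots = [
    [9, 8, 5, 3],
    [8, 9, 4, 4],
    [3, 8, 9, 0],
    [5, 7, 9, 6],
    [6, 0, 7, 9],
    [0, 0, 7, 8],
]

def standardize(column):
    """Centre on the mean, then divide by the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def rescale_to_max(column):
    """Divide every value by the largest value of that variable."""
    top = max(column)
    return [x / top for x in column]

# Work variable by variable: transpose rows (plots) into columns (species).
columns = [list(col) for col in zip(*plots)]
z_scores = [standardize(col) for col in columns]
rel_max = [rescale_to_max(col) for col in columns]
```

After the first transformation every species has mean 0 and standard deviation 1. After the second, every species has a maximum of 1.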
14
Mantel Test
A statistical test of association between the corresponding elements of two matrices.
1) Calculate $r_0$, the correlation between the corresponding elements of the two matrices (cophenetic correlation).
2) Randomly permute the rows and corresponding columns of one matrix.
3) Calculate r, the correlation between the permuted matrix and the unchanged matrix.
15
1) Original matrices

Matrix A
    1    2    3    4    5
1   0
2   20   0
3   41   39   0
4   12   25   53   0
5   13   14   45   17   0

Matrix B
    1    2    3    4    5
1   0
2   84   0
3   26   51   0
4   10   17   45   0
5   22   35   28   32   0

2) Random permutation of the rows (and columns) of Matrix A; Matrix B unchanged

    2    1    5    4    3
2   0
1   20   0
5   14   13   0
4   25   12   17   0
3   39   41   45   53   0
16
Mantel Test
4) Repeat steps 2 and 3 many times (at least 1000).
5) Estimate how likely a correlation as large as $r_0$ is by chance, by comparing $r_0$ to the randomization distribution of r.
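The five steps can be sketched as follows, using Matrix A and Matrix B from the permutation example. This is a minimal illustration, not a reference implementation: it assumes a Pearson correlation, a one-tailed comparison (permuted r at least as large as r_0), and 999 random permutations plus the observed arrangement. The function names are hypothetical.

```python
import random
from math import sqrt

def lower_triangle(m, order):
    """Below-diagonal elements after reordering rows and columns by `order`."""
    return [m[order[i]][order[j]] for i in range(len(order)) for j in range(i)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(a, b, n_perm=999, seed=1):
    """Return (r0, estimated one-tailed probability)."""
    identity = list(range(len(a)))
    b_vals = lower_triangle(b, identity)             # matrix B stays unchanged
    r0 = pearson(lower_triangle(a, identity), b_vals)  # step 1
    rng = random.Random(seed)
    as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):                          # step 4
        order = identity[:]
        rng.shuffle(order)                           # step 2: rows and columns together
        r = pearson(lower_triangle(a, order), b_vals)  # step 3
        if r >= r0:
            as_large += 1
    return r0, as_large / (n_perm + 1)               # step 5

A = [[0, 20, 41, 12, 13],
     [20, 0, 39, 25, 14],
     [41, 39, 0, 53, 45],
     [12, 25, 53, 0, 17],
     [13, 14, 45, 17, 0]]
B = [[0, 84, 26, 10, 22],
     [84, 0, 51, 17, 35],
     [26, 51, 0, 45, 28],
     [10, 17, 45, 0, 32],
     [22, 35, 28, 32, 0]]

r0, p = mantel(A, B)
print(f"r0 = {r0:.3f}, p = {p:.3f}")
```

Calling `lower_triangle(A, [1, 0, 4, 3, 2])` reproduces the permuted Matrix A shown on the slide (object order 2, 1, 5, 4, 3). With only five objects there are just 120 distinct orderings, so the random draws repeat. For matrices this small, enumerating every permutation would be an equally reasonable choice.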
17
Principal Components Analysis
– Aims to reduce a large number of variables to a smaller number of summary variables, called Principal Components (or factors), that explain most of the variation in the data.
– Is basically a rotation of axes after centring to the means of the variables, the rotated axes being the Principal Components.
– Is usually carried out using a matrix algebra technique called eigenanalysis.
18
Steps in PCA
1) From the raw data matrix, calculate the correlation matrix, or the covariance matrix on standardized variables.

Raw data matrix
         NO3   Total organic N   Total N   ....
Site 1
Site 2
Site 3
:

Resulting matrix
         Site 1   Site 2   Site 3   ....
Site 1   1
Site 2   0.37     1
Site 3   0.84     0.13     1
:
19
Steps in PCA
2) Calculate eigenvectors (weightings of each original variable on each component) and eigenvalues (= "latent roots"; relative measures of the variation explained by each component).
20
Eigenvectors
$z_{ik} = c_1 y_{i1} + c_2 y_{i2} + \dots + c_j y_{ij} + \dots + c_p y_{ip}$
where
– $z_{ik}$ = score for component k for object i
– $y_{ij}$ = value of original variable j for object i
– $c_j$ = factor score coefficient (weight) of variable j for component k
Example: soil chemistry in a forest
$z_{ik} = c_1(\mathrm{NO_3}) + c_2(\text{total organic N}) + c_3(\text{total N}) + \dots$
– the objects are sampling sites
– the variables are chemical measurements, e.g. total N
21
Steps in PCA - continued
3) Decide how many components to retain (scree plot of eigenvalues).
[Figure: scree plot of eigenvalue (0 to 5) against factor number (1 to 8).]
22
Steps in PCA
4) Using the factor score coefficients, calculate the factor score for each object on each component: the sum, over variables, of coefficient x (standardized) variable.
23
Steps in PCA
5) Position objects on a scatterplot, using factor scores on the first two (or three) Principal Components.
[Figure: scatterplot of FACTOR(2) against FACTOR(1), with points labelled Site 1, Site 2 and Site 3.]
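The five PCA steps can be sketched with NumPy, here applied to the dog mandible measurements from the data slide. Several choices are assumptions rather than anything stated on the slides: the analysis uses the correlation matrix, variables are standardized with the sample standard deviation, and scores are formed from unit-length eigenvectors (some packages rescale the coefficients, which changes the spread of the scores but not the ordering of objects along each axis).

```python
import numpy as np

names = ["Modern dog", "Jackal", "Chinese wolf", "Indian wolf",
         "Cuon", "Dingo", "Prehistoric dog"]
# Rows are dog types (objects); columns are mandible variables 1 to 6.
Y = np.array([
    [9.7, 21.0, 19.4, 7.7, 32.0, 36.5],
    [8.1, 16.7, 18.3, 7.0, 30.3, 32.9],
    [13.5, 27.3, 26.8, 10.6, 41.9, 48.1],
    [11.5, 24.3, 24.5, 9.3, 40.0, 44.6],
    [10.7, 23.5, 21.4, 8.5, 28.8, 37.6],
    [9.6, 22.6, 21.1, 8.3, 34.4, 43.1],
    [10.3, 22.1, 19.1, 8.1, 32.3, 25.0],
])

# Step 1: standardize the variables and form their correlation matrix.
Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
R = np.corrcoef(Y, rowvar=False)

# Step 2: eigenanalysis of the symmetric matrix, largest eigenvalue first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Step 3: proportion of variation explained (the numbers behind a scree plot).
explained = eigenvalues / eigenvalues.sum()

# Step 4: scores = sum over variables of coefficient x standardized variable.
scores = Z @ eigenvectors

# Step 5: coordinates of each object on the first two components.
for name, (pc1, pc2) in zip(names, scores[:, :2]):
    print(f"{name:16s} {pc1:7.2f} {pc2:7.2f}")
```

Because the correlation matrix is used, the eigenvalues sum to the number of variables (6), and the variance of each column of `scores` equals the corresponding eigenvalue.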
24
Dissimilarity Indices
Dissimilarity indices:
– measure how different objects are in terms of their variable values
– how different sampling units are in species composition
– how different organisms are in morphological structure
25
Dissimilarity Indices
Dissimilarity:
– calculated for each pair of objects in data set
– dissimilarity between 2 quadrats in terms of species composition
– dissimilarity between 2 dogs in terms of morphological structure
26
Dissimilarity
Consider 2 objects j and k (e.g. 2 quadrats).
Let $y_{ij}$ and $y_{ik}$ be the values for variable i in objects j and k (i = 1 to 3):

Quadrat   Sp1   Sp2   Sp3
j         3     6     9
k         6     12    18
27
Quadrat   Sp1   Sp2   Sp3     (i = 1 to 3)
j         3     6     9
k         6     12    18

For sp1, $y_{1j} = 3$ and $y_{1k} = 6$
For sp2, $y_{2j} = 6$ and $y_{2k} = 12$
For sp3, $y_{3j} = 9$ and $y_{3k} = 18$
28
Euclidean Distance
$\sqrt{\sum_i (y_{ij} - y_{ik})^2}$
$\sqrt{(3-6)^2 + (6-12)^2 + (9-18)^2} = 11.2$
29
Euclidean Distance
Distance between objects when plotted in multidimensional (multivariable) space.
[Figure: Quadrat 1 and Quadrat 2 plotted on axes of abundance of species 1 against abundance of species 2 (each 0 to 100); the Euclidean distance is the straight line joining them.]
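The worked Euclidean distance for quadrats j and k can be checked with a short Python sketch. The function name is hypothetical; the abundances are the ones from the quadrat table.

```python
from math import sqrt

def euclidean(yj, yk):
    """Square root of the summed squared differences across variables."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(yj, yk)))

quadrat_j = [3, 6, 9]
quadrat_k = [6, 12, 18]

distance = euclidean(quadrat_j, quadrat_k)
print(round(distance, 1))  # 11.2
```

The same function works for any number of variables, which is the sense in which the distance is measured in multivariable space.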
30
Bray-Curtis (Czekanowski)
$1 - \dfrac{2\sum_i \min(y_{ij}, y_{ik})}{\sum_i (y_{ij} + y_{ik})} = \dfrac{\sum_i |y_{ij} - y_{ik}|}{\sum_i (y_{ij} + y_{ik})}$
– where $\sum_i \min(y_{ij}, y_{ik})$ = sum of the lesser abundance of each species when it occurs in both sampling units
– note summation over species
31
$1 - \dfrac{2\sum_i \min(y_{ij}, y_{ik})}{\sum_i (y_{ij} + y_{ik})} = \dfrac{\sum_i |y_{ij} - y_{ik}|}{\sum_i (y_{ij} + y_{ik})}$
$1 - \dfrac{2(3+6+9)}{9+18+27} = 0.33$ and $\dfrac{3+6+9}{9+18+27} = 0.33$
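Both forms of the Bray-Curtis formula can be sketched in Python to confirm that they give the same value for quadrats j and k. The function names are hypothetical, and the sketch assumes the two quadrats are not both empty (otherwise the denominator is zero).

```python
def bray_curtis(yj, yk):
    """Summed absolute differences divided by summed abundances."""
    return (sum(abs(a - b) for a, b in zip(yj, yk))
            / sum(a + b for a, b in zip(yj, yk)))

def bray_curtis_min_form(yj, yk):
    """Equivalent form: 1 - 2 * (sum of lesser abundances) / (summed abundances)."""
    shared = sum(min(a, b) for a, b in zip(yj, yk))
    return 1 - 2 * shared / sum(a + b for a, b in zip(yj, yk))

quadrat_j = [3, 6, 9]
quadrat_k = [6, 12, 18]

d_abs = bray_curtis(quadrat_j, quadrat_k)
d_min = bray_curtis_min_form(quadrat_j, quadrat_k)
print(round(d_abs, 2), round(d_min, 2))  # 0.33 0.33
```

The two forms agree because, for each species, the sum of the two abundances equals twice the lesser abundance plus the absolute difference.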
32
Dissimilarities in ecology
– reach maximum value (e.g. 1) when quadrats have no species in common

Quadrat   Sp1   Sp2   Sp3
1         0     3     0
2         2     0     4

Euclidean = 5.4
Bray-Curtis = 1
33
– equal 0 when quadrats are identical in species abundances

Quadrat   Sp1   Sp2   Sp3
1         2     4     7
2         2     4     7

Euclidean = 0
Bray-Curtis = 0
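The two boundary cases can be checked with a short Python sketch. The last pair of quadrats is a hypothetical addition (the no-shared-species quadrats with every count multiplied by 10), included to show that Bray-Curtis stays at its maximum of 1 while the Euclidean distance keeps growing with abundance.

```python
from math import sqrt

def euclidean(yj, yk):
    return sqrt(sum((a - b) ** 2 for a, b in zip(yj, yk)))

def bray_curtis(yj, yk):
    return (sum(abs(a - b) for a, b in zip(yj, yk))
            / sum(a + b for a, b in zip(yj, yk)))

# No species in common (values from the slide).
no_overlap = ([0, 3, 0], [2, 0, 4])
# Identical quadrats (values from the slide).
identical = ([2, 4, 7], [2, 4, 7])
# Hypothetical: the no-overlap quadrats with tenfold abundances.
no_overlap_x10 = ([0, 30, 0], [20, 0, 40])

for label, (q1, q2) in [("no overlap", no_overlap),
                        ("identical", identical),
                        ("no overlap x10", no_overlap_x10)]:
    print(f"{label:15s} Euclidean = {euclidean(q1, q2):5.1f}  "
          f"Bray-Curtis = {bray_curtis(q1, q2):.1f}")
```

Both indices return 0 for identical quadrats, but only Bray-Curtis has a fixed maximum that is reached whenever the quadrats share no species.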
34
Preferred dissimilarity indices
Species abundance data:
– zeros common
– max. value when quadrats have no species in common
– Bray-Curtis preferred
Measurement data:
– zeros uncommon
– Euclidean OK
35
Cluster Analysis
– Agglomerative / divisive
– Hierarchical / non-hierarchical
– SAHN: Sequential Agglomerative Hierarchical Non-overlapping classification
36
Distance Matrix

    A    B    C    D    E
A   -
B   2    -
C   6    5    -
D   10   9    4    -
E   9    8    5    3    -
37
Average Linkage (UPGMA)
Unweighted Pair-Group Method using Arithmetic Averages
Distance to a cluster is measured as the average distance from a point to the members of the cluster.
38
1)
    A    B    C    D    E
A   -
B   2    -
C   6    5    -
D   10   9    4    -
E   9    8    5    3    -

Shortest distance is 2, between A and B.
From above, dist(AC) = 6 and dist(BC) = 5.
In the new matrix, group AB is (6 + 5)/2 = 5.5 from C.

2)
      A/B   C     D    E
A/B   -
C     5.5   -
D     9.5   4     -
E     8.5   5     3    -

Shortest distance is now 3, between D and E.
39
From Step 2, dist(CD) = 4 and dist(CE) = 5.
In the new matrix, group DE is (4 + 5)/2 = 4.5 from C.

3)
      A/B   C     D/E
A/B   -
C     5.5   -
D/E   9     4.5   -

4)
        A/B    C/D/E
A/B     -
C/D/E   7.83   -
40
Dendrograms
Linkage values can be used to construct a dendrogram.
[Figure: dendrogram of objects A, B, C, D, E against a distance axis (2 to 8).]

Distance   Groups
0          A, B, C, D, E
2          (A, B), C, D, E
3          (A, B), C, (D, E)
4.5        (A, B), (C, D, E)
7.8        (A, B, C, D, E)
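The clustering sequence above can be reproduced with a small pure-Python sketch of average linkage on the five-object distance matrix. The function and variable names are hypothetical. Cluster-to-cluster distance is computed as the mean of all pairwise distances between members, which matches the (6 + 5)/2 style of calculation shown in the worked steps.

```python
from itertools import combinations

# Pairwise distances from the distance matrix slide.
dist = {frozenset(pair): d for pair, d in {
    "AB": 2, "AC": 6, "BC": 5, "AD": 10, "BD": 9,
    "CD": 4, "AE": 9, "BE": 8, "CE": 5, "DE": 3,
}.items()}

def average_distance(c1, c2):
    """Mean of all pairwise distances between members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def upgma(labels):
    """Return the merges as (linkage distance, cluster 1, cluster 2)."""
    clusters = [(name,) for name in labels]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters and fuse them.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merges.append((average_distance(c1, c2), c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

merges = upgma("ABCDE")
for height, c1, c2 in merges:
    print(f"{height:5.2f}  {''.join(c1)} + {''.join(c2)}")
```

The four linkage distances (2, 3, 4.5 and 7.83) are the heights at which the branches of the dendrogram join.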
41
Other Linkage Methods
Single Linkage (Nearest Neighbour)
– distance measured to closest point in cluster
Complete Linkage (Furthest Neighbour)
– distance between two clusters defined as the furthest distance between any two points in them
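The difference between the linkage rules can be seen by computing the distance between the same pair of clusters under each rule. The sketch below uses the distances from the five-object distance matrix. The function names are hypothetical.

```python
# Pairwise distances from the distance matrix slide.
dist = {frozenset(pair): d for pair, d in {
    "AB": 2, "AC": 6, "BC": 5, "AD": 10, "BD": 9,
    "CD": 4, "AE": 9, "BE": 8, "CE": 5, "DE": 3,
}.items()}

def pair_distances(c1, c2):
    """All distances between a member of c1 and a member of c2."""
    return [dist[frozenset((a, b))] for a in c1 for b in c2]

def single_linkage(c1, c2):
    return min(pair_distances(c1, c2))    # nearest neighbour

def complete_linkage(c1, c2):
    return max(pair_distances(c1, c2))    # furthest neighbour

def average_linkage(c1, c2):
    values = pair_distances(c1, c2)
    return sum(values) / len(values)      # UPGMA

ab, de = "AB", "DE"
print(single_linkage(ab, de), complete_linkage(ab, de), average_linkage(ab, de))
# 8 10 9.0
```

For clusters (A, B) and (D, E), the four member-to-member distances are 10, 9, 9 and 8, so single linkage gives 8, complete linkage gives 10, and average linkage gives 9.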
42
Minimum Spanning Trees
– Edges linked to nearest points (vertices)
– MST may be mapped onto eigenspace, showing which points are distorted in two dimensions
43
Minimum Spanning Trees
Steps:
1) Find the minimum value in the resemblance matrix. Draw the two points and join them with a line. Write the distance value on the line.
2) Find the next lowest value in the matrix. Draw these points and join them with a line.
3) Repeat until all points have been drawn and connected to some other point.
4) Redraw the whole plot to make the line lengths representative of the distances.
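The steps above resemble Kruskal's algorithm, which can be sketched in Python on the five-object distance matrix used in the clustering example. This is an illustration under stated assumptions, not necessarily the exact procedure the slides intend: Kruskal's algorithm takes edges in order of increasing distance but skips any edge that would join two points already connected, and it stops only when every point belongs to one single tree. The function name is hypothetical.

```python
# Pairwise distances from the distance matrix slide, keyed by point pair.
dist = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 5, ("A", "D"): 10, ("B", "D"): 9,
    ("C", "D"): 4, ("A", "E"): 9, ("B", "E"): 8, ("C", "E"): 5, ("D", "E"): 3,
}

def minimum_spanning_tree(labels, dist):
    """Kruskal's algorithm with a simple union-find structure."""
    parent = {v: v for v in labels}

    def find(v):
        # Follow parent links to the representative of v's group.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Take the smallest remaining distance each time.
    for (a, b), d in sorted(dist.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # skip edges that would close a loop
            parent[root_a] = root_b
            tree.append((a, b, d))
    return tree

tree = minimum_spanning_tree("ABCDE", dist)
for a, b, d in tree:
    print(f"{a} - {b}: {d}")
print("total length:", sum(d for _, _, d in tree))
```

For five points the tree has four edges: A-B (2), D-E (3), C-D (4) and B-C (5), with a total length of 14. The C-E edge also has length 5 but is skipped, because C and E are already connected through D.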