Published by Lizbeth Payne. Modified over 9 years ago.
1
Introduction to Multivariate Analysis and Multivariate Distances
Hal Whitehead
BIOL4062/5062
2
Data matrices
Problems with data matrices
– missing values
– outliers
Matrices used in multivariate analysis
Multivariate distances
Association matrices
3
The Data Matrix
[Figure: data matrix, with axes labeled "Variables" and "Units"]
4
The Data Matrix
5
Visualize Data Matrix as: Points in multidimensional space
6
Problems with Data Matrix
Missing values
Outliers
Units not independent
Many zeros
Not multivariate normal
7
Missing Data
Often present in ecological, or other biological, data
– delete columns of data matrix
– delete rows of data matrix
– just delete pairs of elements where one is missing
– interpolate (slide example: 0.12)
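The four options above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data matrix `X` is made up, missing values are coded as `np.nan`, the helper `pairwise_correlation` is a hypothetical name, and filling a gap with the column mean is just one of many possible interpolation rules (the slides do not say which rule was used).

```python
import numpy as np

# Hypothetical data matrix: 5 units (rows) x 3 variables (columns).
# np.nan marks the single missing value.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, np.nan, 5.0],
    [3.0, 6.0, 7.0],
    [4.0, 8.0, 9.0],
    [5.0, 10.0, 11.0],
])

missing = np.isnan(X)

# Option 1: delete every column (variable) that contains a missing value.
drop_columns = X[:, ~missing.any(axis=0)]

# Option 2: delete every row (unit) that contains a missing value.
drop_rows = X[~missing.any(axis=1)]

# Option 3: pairwise deletion. For a given pair of variables, use only
# the units where both values are present.
def pairwise_correlation(X, j, k):
    both_present = ~np.isnan(X[:, j]) & ~np.isnan(X[:, k])
    return np.corrcoef(X[both_present, j], X[both_present, k])[0, 1]

# Option 4: interpolate. Assumption: fill each gap with the mean of the
# observed values of that variable.
filled = X.copy()
column_means = np.nanmean(X, axis=0)
rows, cols = np.where(missing)
filled[rows, cols] = column_means[cols]
```

Deleting rows or columns discards observed data along with the gap, while pairwise deletion and interpolation keep it at the cost of extra assumptions.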
11
Outliers
Statistical packages often indicate “outliers”:
*** WARNING *** Case 86 has large leverage (Leverage = 0.252)
If plausibly:
– the result of biological, or other, processes outside the scope of the model being used,
– or the result of measurement or coding error,
they may be discarded.
Otherwise they should be retained
– (perhaps use a different model)
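The quoted warning does not say how the package computes leverage. A common definition, assumed here, is the diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ of a linear regression with an intercept. The sketch below uses that definition on made-up data; the function name `leverage` and the values are illustrative only.

```python
import numpy as np

def leverage(x):
    """Diagonal of the hat matrix for a linear regression on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Design matrix: a column of ones (intercept) followed by the predictors.
    design = np.column_stack([np.ones(len(x)), x])
    hat = design @ np.linalg.inv(design.T @ design) @ design.T
    return np.diag(hat)

# Hypothetical predictor values; the last case lies far from the others.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
h = leverage(x)

# Leverages sum to the number of fitted parameters (here 2), so a case
# whose leverage is far above the average of 2/n stands out.
most_influential_case = int(np.argmax(h))
```

A high leverage value only flags a case as unusual in its predictor values; whether to discard it is the judgment described on the slide.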
12
Problems with Data Matrix
Missing values
Outliers
Units not independent
– Not a problem unless doing tests
Many zeros
– Special methods (e.g. correspondence analysis)
Not multivariate normal
– Transform if possible
13
Uses of Multivariate Analysis
Large data sets
– simplify
– summarize
– find patterns
Analyze groupings of units
Find groupings of units
Examine relationships between variables
14
Some Matrices Used in Multivariate Analysis
Data matrix: rectangular
– units i = 1, …, n
– variables j, k
Covariance matrix between variables: symmetric (square/triangular)
– $c_{jk} = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k) / (n - 1)$, where $\bar{x}_k = \mathrm{mean}(x_{ik})$
Correlation matrix between variables: symmetric (square/triangular)
– $r_{jk} = c_{jk} / (S_j S_k)$, where $S_k = \mathrm{SD}(x_{ik})$
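The covariance and correlation formulas above can be checked numerically. The sketch below is a minimal NumPy illustration on a made-up data matrix with units as rows and variables as columns (an assumed layout); the variable names are not from the slides.

```python
import numpy as np

# Hypothetical data matrix: n = 4 units (rows), 3 variables (columns).
X = np.array([
    [2.0, 1.0, 10.0],
    [4.0, 3.0, 8.0],
    [6.0, 2.0, 6.0],
    [8.0, 6.0, 4.0],
])
n = X.shape[0]

# Subtract each variable's mean: x_ij - mean of variable j.
centered = X - X.mean(axis=0)

# Covariance matrix: c_jk = sum_i (x_ij - mean_j)(x_ik - mean_k) / (n - 1).
C = centered.T @ centered / (n - 1)

# Standard deviations S_k are the square roots of the diagonal of C.
S = np.sqrt(np.diag(C))

# Correlation matrix: r_jk = c_jk / (S_j * S_k).
R = C / np.outer(S, S)
```

Both matrices are symmetric, which is why only one triangle needs to be shown; `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)` give the same results.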
15
Data Matrix
16
Covariance Matrix
17
Correlation Matrix
18
Multivariate distances between units or groups of units
1. Euclidean distance
p variables
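A minimal Python sketch of the Euclidean distance, assuming the usual definition d_ij = sqrt(Σ_k (x_ik − x_jk)²) with the sum running over the p variables. The function name and example vectors are illustrative, not taken from the slides.

```python
import numpy as np

def euclidean_distance(x_i, x_j):
    """Euclidean distance between two units measured on the same p variables."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    return np.sqrt(np.sum((x_i - x_j) ** 2))

# Hypothetical units measured on p = 3 variables.
a = [0.0, 0.0, 0.0]
b = [1.0, 2.0, 2.0]
d = euclidean_distance(a, b)  # sqrt(1 + 4 + 4) = 3.0
```

Because every variable enters in its own units, a variable with a large numerical range dominates this distance.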
19
Multivariate distances between units or groups of units
2. Penrose distance
p variables
$S_k^2$: variance of $x_{ik}$
Corrects for different units and different ranges of the variables
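The exact formula on this slide is not reproduced in the text, so the sketch below rests on an assumption: each squared difference is divided by the variance S_k² of that variable before summing. Two conventions are then shown, since texts differ: one divides the sum by p, the other takes the square root so the result is on the same scale as a Euclidean distance. All names and numbers are illustrative.

```python
import numpy as np

def standardized_squared_distance(x_i, x_j, variances):
    """Sum over variables of (x_ik - x_jk)^2 / S_k^2."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    return np.sum(diff ** 2 / np.asarray(variances, dtype=float))

# Hypothetical units measured on p = 2 variables with very different scales.
x_a = [10.0, 0.1]
x_b = [14.0, 0.4]
variances = [4.0, 0.01]  # S_k^2 for each variable

total = standardized_squared_distance(x_a, x_b, variances)  # 16/4 + 0.09/0.01
p = len(variances)

penrose_divided_by_p = total / p   # convention that averages over the p variables
penrose_root = np.sqrt(total)      # convention on the scale of a Euclidean distance
```

In this example the second variable contributes more than the first after scaling, although its raw difference is far smaller.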
20
Multivariate distances between units or groups of units
3. Mahalanobis distance
p variables
$v_{rs}$: elements of inverse of covariance matrix
Corrects for correlations between variables
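A minimal sketch of the Mahalanobis distance, assuming the standard squared form D²_ij = Σ_r Σ_s (x_ir − x_jr) v_rs (x_is − x_js), where v_rs are the elements of the inverse covariance matrix. The function name, covariance matrix, and points are made up for illustration; take the square root if an unsquared distance is wanted.

```python
import numpy as np

def mahalanobis_squared(x_i, x_j, cov):
    """Squared Mahalanobis distance between two units, given a covariance matrix."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    v = np.linalg.inv(np.asarray(cov, dtype=float))  # elements v_rs
    return float(diff @ v @ diff)

# Hypothetical covariance matrix: two variables with correlation 0.8.
cov = np.array([
    [1.0, 0.8],
    [0.8, 1.0],
])

origin = [0.0, 0.0]
along = [1.0, 1.0]    # displaced in the direction the variables co-vary
across = [1.0, -1.0]  # displaced against that direction

d2_along = mahalanobis_squared(origin, along, cov)
d2_across = mahalanobis_squared(origin, across, cov)
```

Both points are the same Euclidean distance from the origin, but the one that runs against the correlation is much farther in Mahalanobis terms. With an identity covariance matrix the result reduces to the squared Euclidean distance.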
21
3 species of iris; 4 measurements

Euclidean distances:
A   0
B   3.2    0
C   4.8    1.6    0
    A      B      C

Penrose distances:
A   0
B   2.8    0
C   3.9    1.5    0
    A      B      C

Mahalanobis distances:
A   0
B   89.9   0
C   179.4  17.2   0
    A      B      C
22
The Standard Data Matrix
[Figure: data matrix, with axes labeled "Variables" and "Units"]
23
The Association Matrix
[Figure: association matrix, with axes labeled "Units"]
24
Association matrices
Social structure
– association between individuals
Community ecology
– similarity between species, sites
– dissimilarities between species, sites
Genetic distances
Correlation matrices
Covariance matrices
Distance matrices
– Euclidean, Penrose, Mahalanobis
Similarity / Dissimilarity
25
Association matrices
Dissimilarity/Similarity
Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)
Mahalanobis distances between iris species:
A   0
B   89.9   0
C   179.4  17.2   0
    A      B      C
26
Association matrices
Symmetric/Asymmetric
Genetic relatedness among bottlenose dolphins (Krutzen et al. 2003)
Grooming rates of capuchin monkeys (Perry 1996)
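The two distinctions on the last slides (dissimilarity versus similarity, symmetric versus asymmetric) can be illustrated with a short NumPy sketch. The Mahalanobis values are the iris distances listed in the slides; the grooming matrix is made up, `is_symmetric` is a hypothetical helper, and the rescaling from dissimilarity to similarity is one simple assumed choice among many.

```python
import numpy as np

labels = ["A", "B", "C"]

# Lower triangle as listed in the slides: Mahalanobis distances between iris species.
lower = np.array([
    [0.0, 0.0, 0.0],
    [89.9, 0.0, 0.0],
    [179.4, 17.2, 0.0],
])

# A distance matrix is symmetric, so the full square matrix is the
# lower triangle plus its transpose.
D = lower + lower.T

def is_symmetric(m):
    """True if m is square and equal to its transpose."""
    m = np.asarray(m, dtype=float)
    return m.shape[0] == m.shape[1] and np.allclose(m, m.T)

# Hypothetical asymmetric association matrix: entry [i, j] is the rate
# at which individual i grooms individual j.
grooming = np.array([
    [0.0, 3.0, 1.0],
    [0.5, 0.0, 2.0],
    [1.0, 0.0, 0.0],
])

# Assumed conversion from dissimilarity to similarity on a 0-1 scale:
# the most distant pair gets 0, identical units get 1.
S = 1.0 - D / D.max()
```

A symmetric matrix can be stored as a single triangle, as in the slides, whereas a directed measure such as a grooming rate needs the full square matrix.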