Multivariate Statistical Methods Measuring and Testing Multivariate Distances by Jen-pei Liu, PhD Division of Biometry, Department of Agronomy, National Taiwan University and Division of Biostatistics and Bioinformatics National Health Research Institutes 2019/2/4 Copyright by Jen-pei Liu, PhD
Measuring and Testing Multivariate Distances
- Introduction
- Distances between Individual Observations
- Distances between Populations and Samples
- Distances Based on Proportions
- Presence-absence Data
- The Mantel Randomization Test
- Summary
Introduction
- Multivariate problems can be viewed in terms of distances:
  - between single observations
  - between samples of observations
  - between populations of observations
Introduction
- Mandible measurements of canine groups: dogs, wolves, jackals, cuons, and dingoes
- How far is one of these groups from the other groups?
- Two groups are close if their animals have similar mandible measurements
- Distance measures quantify how similar the measurements are
Introduction
- Distances can be based on different types of measurements
- Example: 16 colonies of a butterfly species
- Two sets of distances between colonies: environmental and genetic
- Question: what is the relationship between these two sets of distances?
Distances between Individual Observations
- n objects, each measured on p variables X1, …, Xp
- Object i: Xi1, Xi2, …, Xip
- Object j: Xj1, Xj2, …, Xjp
- A distance measure is defined between object i and object j
- With two or three variables, the distance can be presented graphically
Distances between Individual Observations
Example: Dogs and Related Species (standardized variables)

Group                X1      X2      X3      X4      X5      X6
Modern Dogs        -0.46   -0.46   -0.68   -0.69   -0.45   -0.57
Golden Jackals     -1.41   -1.79   -1.04   -1.29   -0.80   -1.21
Chinese Wolf        1.78    1.48    1.70    1.80    1.55    1.50
Indian Wolf         0.60    0.55    0.96    0.69    1.17    0.88
Cuon                0.13    0.31   -0.04    0.00   -1.10   -0.37
Dingo              -0.52    0.03   -0.13   -0.17    0.03    0.61
Prehistoric Dog    -0.11   -0.12   -0.78   -0.34   -0.41   -0.83
Distances between Individual Observations
Euclidean Distances between Seven Canine Groups

                        (1)    (2)    (3)    (4)    (5)    (6)    (7)
Modern dog (1)           --
Golden jackal (2)      1.91     --
Chinese wolf (3)       5.38   7.12     --
Indian wolf (4)        3.38   5.06   2.14     --
Cuon (5)               1.51   3.19   4.57   2.91     --
Dingo (6)              1.56   3.18   4.21   2.20   1.67     --
Prehistoric dog (7)    0.66   2.39   5.12   3.24   1.26   1.71     --
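The Euclidean distance between objects i and j is $d_{ij} = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$. As a minimal sketch, the distance matrix can be recomputed from the standardized variables in the preceding table. The names `groups` and `euclidean` are illustrative and not part of the original slides. Because the inputs are rounded to two decimals, the results may differ from the tabulated distances in the last digit.

```python
import math

# Standardized variables X1..X6 for the seven canine groups (from the table).
groups = {
    "Modern dog":      [-0.46, -0.46, -0.68, -0.69, -0.45, -0.57],
    "Golden jackal":   [-1.41, -1.79, -1.04, -1.29, -0.80, -1.21],
    "Chinese wolf":    [1.78, 1.48, 1.70, 1.80, 1.55, 1.50],
    "Indian wolf":     [0.60, 0.55, 0.96, 0.69, 1.17, 0.88],
    "Cuon":            [0.13, 0.31, -0.04, 0.00, -1.10, -0.37],
    "Dingo":           [-0.52, 0.03, -0.13, -0.17, 0.03, 0.61],
    "Prehistoric dog": [-0.11, -0.12, -0.78, -0.34, -0.41, -0.83],
}

def euclidean(x, y):
    """Euclidean distance between two equal-length observation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Print the lower triangle of the distance matrix, one row per group.
names = list(groups)
for i, gi in enumerate(names):
    row = [f"{euclidean(groups[gi], groups[gj]):5.2f}" for gj in names[:i]]
    print(f"{gi:16s}", *row)
```

The smallest distance in the output is between the modern dog and the prehistoric dog, and the largest is between the golden jackal and the Chinese wolf, in agreement with the table.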
Distances between Populations and Samples
- Information about populations: means, variances, covariances
- Measures between populations: Penrose distance, Mahalanobis distance
- Measures between populations and samples
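Distances between Populations and Samples
Penrose Distance
- Variables: X1, …, Xp
- Means of the ith population: μ1i, …, μpi
- Variances of the ith population: v1i, …, vpi; a common variance Vk is assumed for variable Xk in all populations
- Penrose distance between populations i and j: $P_{ij} = \sum_{k=1}^{p} (\mu_{ki} - \mu_{kj})^2 / (p V_k)$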
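Distances between Populations and Samples
The Mahalanobis distance takes the correlations between variables into consideration:
$D_{ij}^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (\mu_{ri} - \mu_{rj}) \, v^{rs} \, (\mu_{si} - \mu_{sj})$
where $v^{rs}$ is the element in row r and column s of the inverse of the covariance matrix.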
Distances between Populations and Samples
The Mahalanobis distance between a sample observation x = (x1, …, xp) and a population with means μ1, …, μp:
$D^2 = \sum_{r=1}^{p} \sum_{s=1}^{p} (x_r - \mu_r) \, v^{rs} \, (x_s - \mu_s)$
Distances between Populations and Samples
Example: Distances between Egyptian skulls
Penrose's distance between sample 1 and sample 2:
$P_{12} = (131.37-132.37)^2/(4 \times 21.112) + (133.60-132.70)^2/(4 \times 23.486) + (99.17-99.07)^2/(4 \times 24.180) + (50.53-50.23)^2/(4 \times 10.154) = 0.023$
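The arithmetic of this example can be checked with a short sketch. The function name `penrose` is illustrative. The first mean of sample 1 is taken as 131.37, the value that reproduces the reported result of 0.023.

```python
def penrose(mean_i, mean_j, variances):
    """Penrose distance: squared mean differences, each scaled by p * V_k."""
    p = len(variances)
    return sum((a - b) ** 2 / (p * v)
               for a, b, v in zip(mean_i, mean_j, variances))

# Sample means of the four skull variables and their common variances.
mean_1 = [131.37, 133.60, 99.17, 50.53]
mean_2 = [132.37, 132.70, 99.07, 50.23]
variances = [21.112, 23.486, 24.180, 10.154]

p12 = penrose(mean_1, mean_2, variances)
print(round(p12, 3))  # 0.023
```

Each variable contributes independently here, so correlations between the variables play no role in the result.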
Distances between Populations and Samples
The inverse of the sample covariance matrix supplies the weights $v^{rs}$ used in the Mahalanobis distance.
Distances between Populations and Samples
The Mahalanobis distance between sample 1 and sample 2:
$D_{12}^2 = (131.37-132.37)(0.0483)(131.37-132.37) + (131.37-132.37)(0.0011)(133.60-132.70) + \dots + (50.53-50.23)(-0.0022)(99.17-99.07) + (50.53-50.23)(0.1041)(50.53-50.23) = 0.091$
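The double sum in this calculation can be sketched generically as follows. The function name `mahalanobis_sq` and the 2 x 2 inverse matrix are hypothetical illustrations; the full inverse covariance matrix for the skull data is not reproduced here, so the example does not recompute 0.091.

```python
def mahalanobis_sq(mean_i, mean_j, cov_inv):
    """Squared Mahalanobis distance: sum over r, s of d_r * v^{rs} * d_s."""
    d = [a - b for a, b in zip(mean_i, mean_j)]
    p = len(d)
    return sum(d[r] * cov_inv[r][s] * d[s]
               for r in range(p) for s in range(p))

# Hypothetical inverse covariance matrix for two correlated variables.
cov_inv = [[2.0, -1.0],
           [-1.0, 2.0]]

print(mahalanobis_sq([1.0, 0.0], [0.0, 0.0], cov_inv))  # 2.0
print(mahalanobis_sq([1.0, 1.0], [0.0, 0.0], cov_inv))  # 2.0
```

In this illustration the two mean differences have squared Euclidean lengths 1 and 2 but the same squared Mahalanobis distance, which shows how the off-diagonal weights change the notion of closeness.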
Distances between Populations and Samples
Penrose Distances

                           (1)     (2)     (3)     (4)     (5)
Early predynastic (1)       --
Late predynastic (2)      0.023     --
12-13th dynastic (3)      0.216   0.163     --
Ptolemaic (4)             0.493   0.404   0.108     --
Roman (5)                 0.736   0.583   0.244   0.066     --
Distances between Populations and Samples
Mahalanobis Distances

                           (1)     (2)     (3)     (4)     (5)
Early predynastic (1)       --
Late predynastic (2)      0.091     --
12-13th dynastic (3)      0.903   0.729     --
Ptolemaic (4)             1.881   1.594   0.443     --
Roman (5)                 2.697   2.176   0.911   0.219     --
Distances Based on Proportions
Animals of a certain species might be classified into K genetic classes, with class proportions pi in colony 1 and qi in colony 2.

Class    Colony 1    Colony 2    Difference
1        p1          q1          p1 - q1
2        p2          q2          p2 - q2
...      ...         ...         ...
K        pK          qK          pK - qK
Distances Based on Proportions
Distance Measures
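As a hedged illustration of how distances can be built from two sets of class proportions, the sketch below computes two commonly used measures: half the sum of absolute differences, and one minus the cosine of the angle between the proportion vectors. These are assumptions chosen for illustration and are not necessarily the measures intended on the original slides. The function names and the proportions are hypothetical.

```python
import math

def half_abs_diff(p, q):
    """Half the sum of absolute differences in proportions (0 to 1)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def one_minus_cosine(p, q):
    """One minus the cosine of the angle between the proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))
    return 1 - dot / norm

# Hypothetical class proportions for two colonies (each sums to 1).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

print(half_abs_diff(p, q))
print(one_minus_cosine(p, q))
```

Both measures are zero when the two colonies have identical proportions and grow as the proportions diverge.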
Presence-absence Data
Presence (1) and absence (0) of two species at 10 locations

Site         1   2   3   4   5   6   7   8   9   10
Species 1    0   0   1   1   1   0   1   1   1   0
Species 2    1   1   1   1   0   0   0   0   1   1
Presence-absence Data

                     Species 2
Species 1    Present    Absent    Total
Present      a          b         a+b
Absent       c          d         c+d
Total        a+c        b+d       n
Presence-absence Data
Distance Measures
- Simple matching index: (a+d)/n
- Ochiai index: $a/\sqrt{(a+b)(a+c)}$
- Dice-Sorensen index: 2a/(2a+b+c)
- Jaccard index: a/(a+b+c)
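As a minimal sketch, the four indices can be evaluated for the two species recorded at the 10 locations. The variable names are illustrative. The counts a, b, c, d follow the 2 x 2 layout with species 1 in the rows and species 2 in the columns.

```python
import math

# Presence (1) / absence (0) of each species at the 10 sites.
species_1 = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]
species_2 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

pairs = list(zip(species_1, species_2))
a = pairs.count((1, 1))  # both species present
b = pairs.count((1, 0))  # species 1 present, species 2 absent
c = pairs.count((0, 1))  # species 1 absent, species 2 present
d = pairs.count((0, 0))  # both species absent
n = len(pairs)

indices = {
    "simple matching": (a + d) / n,
    "Ochiai": a / math.sqrt((a + b) * (a + c)),
    "Dice-Sorensen": 2 * a / (2 * a + b + c),
    "Jaccard": a / (a + b + c),
}
for name, value in indices.items():
    print(f"{name:16s} {value:.3f}")
```

For these data a = b = c = 3 and d = 1. Each index equals 1 when the two species have identical presence patterns, so larger values indicate greater similarity.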
The Mantel Randomization Test
- Detection of space and time clustering of disease: whether cases of a disease that occur close in space also tend to be close in time
- Two 4x4 distance matrices for the same 4 objects
- Both matrices are symmetric
The Mantel Randomization Test
The two matrices are $M = (m_{ij})$ and $E = (e_{ij})$, $i, j = 1, \dots, 4$, with $m_{ij} = m_{ji}$ and $e_{ij} = e_{ji}$.
The Mantel Randomization Test
Mantel Test
- Tests whether the elements of M and E show a significant correlation
- Corresponding elements are paired: m12 with e12, m13 with e13, etc.
The Mantel Randomization Test
- M stays as it is
- A random order is chosen for the objects in E, for example 3, 2, 4, 1
- The rows and columns of E are re-ordered accordingly, and the correlation with M is recomputed
The Mantel Randomization Test
Time Distances

                          (1)    (2)    (3)    (4)    (5)
4000 - 3000 B.C. (1)       --
2999 - 2000 B.C. (2)     0.70     --
1999 - 1000 B.C. (3)     2.15   1.45     --
999 - 0 B.C. (4)         3.80   3.10   1.65     --
A.D. (5)                 4.15   3.45   2.00   0.35     --
The Mantel Randomization Test
- There are 5! = 120 ways to re-order the five samples
- These 120 re-orderings give the randomization distribution of the correlation
- The observed correlation is 0.954
- Only two of the 120 correlations are greater than or equal to 0.954
- The one-sided p-value is 2/120 = 0.017
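The randomization procedure can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: the time distances are paired with the Penrose distances between the five skull samples, a pairing that reproduces the reported correlation of 0.954. The helper names (`full_matrix`, `lower_elements`, `correlation`) are hypothetical.

```python
import math
from itertools import permutations

def full_matrix(lower):
    """Symmetric matrix with zero diagonal built from lower-triangle rows."""
    n = len(lower) + 1
    m = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower, start=1):
        for j, value in enumerate(row):
            m[i][j] = m[j][i] = value
    return m

def lower_elements(m, order):
    """Lower-triangle elements after re-ordering rows and columns."""
    return [m[order[i]][order[j]]
            for i in range(len(order)) for j in range(i)]

def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

penrose = full_matrix([[0.023], [0.216, 0.163],
                       [0.493, 0.404, 0.108],
                       [0.736, 0.583, 0.244, 0.066]])
time = full_matrix([[0.70], [2.15, 1.45],
                    [3.80, 3.10, 1.65],
                    [4.15, 3.45, 2.00, 0.35]])

identity = tuple(range(5))
fixed = lower_elements(penrose, identity)   # first matrix stays as it is
observed = correlation(fixed, lower_elements(time, identity))

# Correlation for every re-ordering of the second matrix (5! = 120).
rs = [correlation(fixed, lower_elements(time, order))
      for order in permutations(range(5))]
# Small tolerance so the identity ordering always counts as a tie.
count = sum(r >= observed - 1e-12 for r in rs)
print(round(observed, 3), len(rs), count, count / len(rs))
```

With only five samples the full set of 120 orderings can be enumerated exactly; for larger matrices a random subset of orderings is typically sampled instead.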
Summary
- Euclidean distances
- Penrose distance
- Mahalanobis distance
- Distance measures for proportions
- Mantel test for similarity between distance matrices