Multivariate Statistical Methods


Multivariate Statistical Methods
Measuring and Testing Multivariate Distances
by Jen-pei Liu, PhD
Division of Biometry, Department of Agronomy, National Taiwan University
and Division of Biostatistics and Bioinformatics, National Health Research Institutes
2019/2/4 Copyright by Jen-pei Liu, PhD

Measuring and Testing Multivariate Distances
  Introduction
  Distances between Individual Observations
  Distances between Populations and Samples
  Distances Based on Proportions
  Presence-absence Data
  The Mantel Randomization Test
  Summary

Introduction
  Multivariate problems in terms of distances
    Between single observations
    Between samples of observations
    Between populations of observations

Introduction
  Mandible measurements of canine groups
    Dogs, wolves, jackals, cuons, and dingos
  How far is one of these groups from the other groups?
  Two groups are close if their animals have similar mandible measurements
  Distance measures quantify how similar the measurements are

Introduction
  Different types of measurements
  Example: 16 colonies of a butterfly species
  Two sets of distances between colonies
    Environmental
    Genetic
  What is the relationship between these two sets of distances?

Distances between Individual Observations
  N objects with p variables X1, …, Xp
  Object i: Xi1, Xi2, …, Xip
  Object j: Xj1, Xj2, …, Xjp
  Distance measures between object i and object j
  Graphical presentation is possible with two or three variables
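The Euclidean distance used in the canine example of this section can be written in the notation of objects i and j above. This is the standard definition, stated here as a reconstruction rather than as text recovered from the slides; the symbol d_ij is an assumed name.

```latex
\[
d_{ij} = \sqrt{\sum_{k=1}^{p} \left( X_{ik} - X_{jk} \right)^{2}}
\]
```

Because the sum mixes all p variables, the variables are usually standardized first so that no single variable dominates the distance.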


Distances between Individual Observations
Example: Dogs and Related Species, Standardized Variables

Group              X1      X2      X3      X4      X5      X6
Modern Dogs       -0.46   -0.46   -0.68   -0.69   -0.45   -0.57
Golden Jackals    -1.41   -1.79   -1.04   -1.29   -0.80   -1.21
Chinese Wolf       1.78    1.48    1.70    1.80    1.55    1.50
Indian Wolf        0.60    0.55    0.96    0.69    1.17    0.88
Cuon               0.13    0.31   -0.04    0.00   -1.10   -0.37
Dingo             -0.52    0.03   -0.13   -0.17    0.03    0.61
Prehistoric Dog   -0.11   -0.12   -0.78   -0.34   -0.41   -0.83

Distances between Individual Observations
Euclidean Distances between Seven Canine Groups

                      (1)    (2)    (3)    (4)    (5)    (6)    (7)
Modern dog (1)        --
Golden jackal (2)     1.91   --
Chinese wolf (3)      5.38   7.12   --
Indian wolf (4)       3.38   5.06   2.14   --
Cuon (5)              1.51   3.19   4.57   2.91   --
Dingo (6)             1.56   3.18   4.21   2.20   1.67   --
Prehistoric dog (7)   0.66   2.39   5.12   3.24   1.26   1.71   --
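As a check on the table, the distances can be recomputed from the standardized variables of the canine example. The sketch below is a minimal illustration; the names `standardized`, `euclidean` and `distance_table` are assumptions introduced here, and because the inputs are rounded to two decimals the results can differ from the tabulated values in the second decimal place.

```python
import math

# Standardized mandible measurements X1..X6, copied from the canine example.
standardized = {
    "Modern dog":      [-0.46, -0.46, -0.68, -0.69, -0.45, -0.57],
    "Golden jackal":   [-1.41, -1.79, -1.04, -1.29, -0.80, -1.21],
    "Chinese wolf":    [1.78, 1.48, 1.70, 1.80, 1.55, 1.50],
    "Indian wolf":     [0.60, 0.55, 0.96, 0.69, 1.17, 0.88],
    "Cuon":            [0.13, 0.31, -0.04, 0.00, -1.10, -0.37],
    "Dingo":           [-0.52, 0.03, -0.13, -0.17, 0.03, 0.61],
    "Prehistoric dog": [-0.11, -0.12, -0.78, -0.34, -0.41, -0.83],
}


def euclidean(x, y):
    """Square root of the summed squared differences over all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_table(groups):
    """Return {(name_i, name_j): distance} for every unordered pair of groups."""
    names = list(groups)
    return {
        (names[i], names[j]): euclidean(groups[names[i]], groups[names[j]])
        for i in range(len(names))
        for j in range(i)
    }


if __name__ == "__main__":
    for (row, col), d in distance_table(standardized).items():
        print(f"{row:16s} {col:16s} {d:5.2f}")
```

The smallest distance is between the modern dog and the prehistoric dog, and the largest is between the golden jackal and the Chinese wolf, which matches the pattern in the table.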

Distances between Populations and Samples
  Information about populations
    Means
    Variances
    Covariances
  Measures
    Between populations
      Penrose distance
      Mahalanobis distance
    Between populations and samples

Distances between Populations and Samples
Penrose Distance
  Variables: X1, …, Xp
  Means of the ith population: μ1i, …, μpi
  Variances of the ith population: v1i, …, vpi
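The Penrose distance between populations i and j can be written as below. This form is inferred from the worked Egyptian skulls example in this section, where each squared mean difference is divided by p times a single variance per variable; the symbol V_k, taken here as the variance of X_k assumed common to the populations, is an assumption of this sketch.

```latex
\[
P_{ij} = \sum_{k=1}^{p} \frac{\left( \mu_{ki} - \mu_{kj} \right)^{2}}{p \, V_{k}}
\]
```

The measure scales each variable by its variance but ignores correlations between variables.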

Distances between Populations and Samples
  The Mahalanobis distance takes the correlations between variables into account

Distances between Populations and Samples
  The Mahalanobis distance between a sample and a population
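The Mahalanobis distance can be written as a quadratic form in the inverse covariance matrix. The expressions below follow the standard definition and agree with the term-by-term expansion used in the skulls example; the symbols v^{rs} (element of the inverse covariance matrix) and the bold vector notation are assumptions of this sketch, not notation recovered from the slides.

```latex
\[
\begin{aligned}
D_{ij}^{2} &= \sum_{r=1}^{p} \sum_{s=1}^{p}
              \left( \mu_{ri} - \mu_{rj} \right) v^{rs} \left( \mu_{si} - \mu_{sj} \right)
            = (\boldsymbol{\mu}_{i} - \boldsymbol{\mu}_{j})^{\top}
              \mathbf{V}^{-1} (\boldsymbol{\mu}_{i} - \boldsymbol{\mu}_{j}), \\
D^{2}      &= (\mathbf{x} - \boldsymbol{\mu})^{\top} \mathbf{V}^{-1} (\mathbf{x} - \boldsymbol{\mu})
\end{aligned}
\]
```

The first line is the distance between two populations; the second is the distance of an observation or sample mean x from a population with mean vector μ.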

Distances between Populations and Samples
Example: Distances between Egyptian skulls
Penrose's distance between sample 1 and sample 2:
  P12 = (131.37 − 132.37)²/(4 × 21.112) + (133.60 − 132.70)²/(4 × 23.486)
      + (99.17 − 99.07)²/(4 × 24.180) + (50.53 − 50.23)²/(4 × 10.154)
      = 0.023
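The arithmetic for P12 can be checked with a few lines of code. This is a minimal sketch: the function name `penrose_distance` is an assumption, and the first mean of sample 1 is taken as 131.37, the value that reproduces the stated result of 0.023.

```python
def penrose_distance(means_i, means_j, variances):
    """Sum of squared mean differences, each divided by p times the variance."""
    p = len(variances)
    return sum(
        (mi - mj) ** 2 / (p * v)
        for mi, mj, v in zip(means_i, means_j, variances)
    )


# Sample means of the four skull measurements and the per-variable variances
# used as divisors in the worked example.
means_1 = [131.37, 133.60, 99.17, 50.53]
means_2 = [132.37, 132.70, 99.07, 50.23]
variances = [21.112, 23.486, 24.180, 10.154]

p12 = penrose_distance(means_1, means_2, variances)

if __name__ == "__main__":
    print(round(p12, 3))
```

Most of the distance comes from the first two measurements; the third contributes almost nothing because the two sample means differ by only 0.10.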

Distances between Populations and Samples
  The inverse of the sample covariance matrix

Distances between Populations and Samples
The Mahalanobis distance between sample 1 and sample 2:
  D12² = (131.37 − 132.37)(0.0483)(131.37 − 132.37)
       + (131.37 − 132.37)(0.0011)(133.60 − 132.70)
       + …
       + (50.53 − 50.23)(−0.0022)(99.17 − 99.07)
       + (50.53 − 50.23)(0.1041)(50.53 − 50.23)
       = 0.091
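The expansion above is a double sum over all pairs of variables, which is easy to express in code. The sketch below is illustrative only: the function name `mahalanobis_sq` is an assumption, and because the full inverse covariance matrix for the skulls is not available in this text, the example uses a hypothetical two-variable inverse matrix and hypothetical means rather than the real data.

```python
def mahalanobis_sq(means_i, means_j, cov_inv):
    """Squared Mahalanobis distance as the double sum diff[r] * v^{rs} * diff[s]."""
    diff = [a - b for a, b in zip(means_i, means_j)]
    p = len(diff)
    return sum(
        diff[r] * cov_inv[r][s] * diff[s]
        for r in range(p)
        for s in range(p)
    )


# Hypothetical inverse covariance matrix and mean vectors (not the skull data).
hypothetical_cov_inv = [
    [0.05, 0.01],
    [0.01, 0.10],
]
hypothetical_means_a = [10.0, 5.0]
hypothetical_means_b = [8.0, 6.0]

d2 = mahalanobis_sq(hypothetical_means_a, hypothetical_means_b, hypothetical_cov_inv)

if __name__ == "__main__":
    print(d2)
```

With an identity matrix in place of the inverse covariance matrix, the function reduces to the squared Euclidean distance, which shows that the off-diagonal terms are what account for correlation.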

Distances between Populations and Samples
Penrose Distances

                         (1)     (2)     (3)     (4)     (5)
Early predynastic (1)    --
Late predynastic (2)     0.023   --
12-13th dynastic (3)     0.216   0.163   --
Ptolemaic (4)            0.493   0.404   0.108   --
Roman (5)                0.736   0.583   0.244   0.066   --

Distances between Populations and Samples
Mahalanobis Distances

                         (1)     (2)     (3)     (4)     (5)
Early predynastic (1)    --
Late predynastic (2)     0.091   --
12-13th dynastic (3)     0.903   0.729   --
Ptolemaic (4)            1.881   1.594   0.443   --
Roman (5)                2.697   2.176   0.911   0.219   --

Distances Based on Proportions
Animals of a certain species might be classified into K genetic classes

Class   Colony 1   Colony 2   Difference
1       p1         q1         p1 − q1
2       p2         q2         p2 − q2
.       .          .          .
K       pK         qK         pK − qK

Distances Based on Proportions
Distance Measures
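The formulas for this section are not present in the text, so the sketch below shows two measures that are commonly used for comparing vectors of proportions: half the sum of absolute differences, and one minus the cosine of the angle between the two vectors. Whether these are the exact measures intended here is an assumption, and the function names and the example proportions are hypothetical.

```python
import math


def half_abs_difference(p, q):
    """0 when the proportions are identical, 1 when no class is shared."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def one_minus_cosine(p, q):
    """1 minus the cosine of the angle between the two proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))
    return 1.0 - dot / norm


# Hypothetical class proportions for two colonies (each sums to 1).
colony_1 = [0.50, 0.30, 0.20]
colony_2 = [0.25, 0.35, 0.40]

if __name__ == "__main__":
    print(half_abs_difference(colony_1, colony_2))
    print(one_minus_cosine(colony_1, colony_2))
```

Both measures range from 0 to 1 for proportion vectors, which makes distances comparable across pairs of colonies.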

Presence-absence Data
Presence (1) and absence (0) of two species at 10 locations

Site        1  2  3  4  5  6  7  8  9  10
Species 1   0  0  1  1  1  0  1  1  1  0
Species 2   1  1  1  1  0  0  0  0  1  1

Presence-absence Data

                     Species 2
Species 1    Present   Absent   Total
Present      a         b        a+b
Absent       c         d        c+d
Total        a+c       b+d      n

Presence-absence Data
Distance Measures (similarity indices; larger values mean more similar)
  Simple matching index: (a+d)/n
  Ochiai index: a/[(a+b)(a+c)]^(1/2)
  Dice-Sorensen index: 2a/(2a+b+c)
  Jaccard index: a/(a+b+c)
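The four indices can be computed directly from the presence-absence records for the two species at the 10 sites. The sketch below is a minimal illustration; the function names `contingency_counts` and `similarity_indices` are assumptions introduced here.

```python
import math

# Presence (1) / absence (0) of each species at sites 1..10.
species_1 = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]
species_2 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]


def contingency_counts(x, y):
    """Return (a, b, c, d): both present, only x, only y, both absent."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def similarity_indices(a, b, c, d):
    """The four indices listed above, keyed by name."""
    n = a + b + c + d
    return {
        "simple_matching": (a + d) / n,
        "ochiai": a / math.sqrt((a + b) * (a + c)),
        "dice_sorensen": 2 * a / (2 * a + b + c),
        "jaccard": a / (a + b + c),
    }


counts = contingency_counts(species_1, species_2)
indices = similarity_indices(*counts)

if __name__ == "__main__":
    print(counts)
    for name, value in indices.items():
        print(f"{name}: {value:.3f}")
```

For these data a = 3, b = 3, c = 3 and d = 1. Only the simple matching index uses d, so it is the only one affected by sites where both species are absent.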

The Mantel Randomization Test
  Detection of space and time clustering of disease: whether cases of a disease that occur close in space also tend to be close in time
  Example: two 4x4 distance matrices for 4 objects
  Both matrices are symmetric


The Mantel Randomization Test
Mantel Test
  Tests whether the elements of the two distance matrices M and E are significantly correlated
  Matches m12 with e12, m13 with e13, etc.

The Mantel Randomization Test
  M stays as it is
  A random order is chosen for the objects in E, for example the order 3, 2, 4, 1
  The rows and columns of E are rearranged into that order and the correlation with M is recomputed

The Mantel Randomization Test
Time Distances

                        (1)    (2)    (3)    (4)    (5)
4000 - 3000 B.C. (1)    --
2999 - 2000 B.C. (2)    0.70   --
1999 - 1000 B.C. (3)    2.15   1.45   --
999 - 0 (4)             3.80   3.10   1.65   --
A.D. (5)                4.15   3.45   2.00   0.35   --

The Mantel Randomization Test
  There are 5! = 120 ways to re-order the five samples
  The randomization distribution of the correlation therefore has 120 elements
  The observed correlation is 0.954
  Only two of the 120 correlations are greater than or equal to 0.954
  The p-value (1-sided) = 2/120 = 0.017
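With only five samples, the full randomization distribution can be enumerated rather than sampled. The sketch below pairs the Penrose distances between the five skull samples with the time distances, since that pairing gives an observed correlation close to the quoted 0.954; the text does not state which morphological distance was used, so this pairing is an assumption, as are the function names. Because the tabulated distances are rounded, the count of correlations at least as large as the observed one may not match the slide exactly.

```python
import math
from itertools import permutations


def symmetric(lower_rows):
    """Build a full symmetric matrix with zero diagonal from lower-triangle rows."""
    n = len(lower_rows) + 1
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower_rows, start=1):
        for j, value in enumerate(row):
            full[i][j] = full[j][i] = value
    return full


def lower_triangle(matrix, order):
    """Below-diagonal elements after re-ordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_exact(m, e):
    """Keep m fixed, permute the objects of e in every possible order."""
    n = len(m)
    m_vals = lower_triangle(m, tuple(range(n)))
    observed = pearson(m_vals, lower_triangle(e, tuple(range(n))))
    count = total = 0
    for order in permutations(range(n)):
        total += 1
        # Small tolerance so the identity ordering always counts itself.
        if pearson(m_vals, lower_triangle(e, order)) >= observed - 1e-12:
            count += 1
    return observed, count, total


penrose = symmetric([[0.023], [0.216, 0.163], [0.493, 0.404, 0.108],
                     [0.736, 0.583, 0.244, 0.066]])
time_dist = symmetric([[0.70], [2.15, 1.45], [3.80, 3.10, 1.65],
                       [4.15, 3.45, 2.00, 0.35]])

observed, count, total = mantel_exact(penrose, time_dist)
p_value = count / total  # one-sided

if __name__ == "__main__":
    print(f"observed r = {observed:.3f}, {count} of {total} as large, p = {p_value:.3f}")
```

For larger numbers of objects the n! orderings cannot be enumerated, and a random sample of permutations is used instead.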

Summary
  Euclidean distances
  Penrose distance
  Mahalanobis distance
  Distance measures for proportions
  Mantel test for similarity between distance matrices