Mahanalobis Distance Dr. A.K.M. Saiful Islam Source:

Slides:



Advertisements
Similar presentations
Estimation of Means and Proportions
Advertisements

Statistical Techniques I
Modeling species distribution using species-environment relationships Istituto di Ecologia Applicata Via L.Spallanzani, Rome ITALY
Objectives (BPS chapter 24)
QUANTITATIVE DATA ANALYSIS
Raster Data. The Raster Data Model The Raster Data Model is used to model spatial phenomena that vary continuously over a surface and that do not have.
Point estimation, interval estimation
Chapter 7 Sampling and Sampling Distributions
Multivariate Distance and Similarity Robert F. Murphy Cytometry Development Workshop 2000.
Probability theory 2011 The multivariate normal distribution  Characterizing properties of the univariate normal distribution  Different definitions.
Chapter 4 Probability Distributions
Lots of Ways to Measure Landscape Pattern (Hargis et al. 1997) Fig 9.1 here Amount of each class –Critical probability at point of percolation 50-65% of.
Chapter 6 Distance Measures From: McCune, B. & J. B. Grace Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon
Slide 1 Statistics Workshop Tutorial 4 Probability Probability Distributions.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 7 th Edition Chapter 9 Hypothesis Testing: Single.
Experimental Evaluation
Slide 1 Statistics Workshop Tutorial 7 Discrete Random Variables Binomial Distributions.
MACHINE LEARNING 6. Multivariate Methods 1. Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Motivating Example  Loan.
Chapter 7 Probability and Samples: The Distribution of Sample Means
1. Normal Curve 2. Normally Distributed Outcomes 3. Properties of Normal Curve 4. Standard Normal Curve 5. The Normal Distribution 6. Percentile 7. Probability.
Chapter 11: Random Sampling and Sampling Distributions
The Multivariate Normal Distribution, Part 1 BMTRY 726 1/10/2014.
1 PREDICTION In the previous sequence, we saw how to predict the price of a good or asset given the composition of its characteristics. In this sequence,
Separate multivariate observations
METU Informatics Institute Min 720 Pattern Classification with Bio-Medical Applications PART 2: Statistical Pattern Classification: Optimal Classification.
Psy B07 Chapter 1Slide 1 ANALYSIS OF VARIANCE. Psy B07 Chapter 1Slide 2 t-test refresher  In chapter 7 we talked about analyses that could be conducted.
Hypothesis Testing:.
Geo479/579: Geostatistics Ch13. Block Kriging. Block Estimate  Requirements An estimate of the average value of a variable within a prescribed local.
Intermediate Statistical Analysis Professor K. Leppel.
Principles of Pattern Recognition
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Review and Preview This chapter combines the methods of descriptive statistics presented in.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Beak of the Finch Natural Selection Statistical Analysis.
Measures of Variability Objective: Students should know what a variance and standard deviation are and for what type of data they typically used.
Geo597 Geostatistics Ch9 Random Function Models.
Yaomin Jin Design of Experiments Morris Method.
Chapter 26 Chi-Square Testing
The Semivariogram in Remote Sensing: An Introduction P. J. Curran, Remote Sensing of Environment 24: (1988). Presented by Dahl Winters Geog 577,
PCB 3043L - General Ecology Data Analysis. OUTLINE Organizing an ecological study Basic sampling terminology Statistical analysis of data –Why use statistics?
1 Chapter 7 Sampling Distributions. 2 Chapter Outline  Selecting A Sample  Point Estimation  Introduction to Sampling Distributions  Sampling Distribution.
1 Sample Geometry and Random Sampling Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking.
Spatial Analysis & Geostatistics Methods of Interpolation Linear interpolation using an equation to compute z at any point on a triangle.
Three Frameworks for Statistical Analysis. Sample Design Forest, N=6 Field, N=4 Count ant nests per quadrat.
Lecture 10 Chapter 23. Inference for regression. Objectives (PSLS Chapter 23) Inference for regression (NHST Regression Inference Award)[B level award]
So, what’s the “point” to all of this?….
PCB 3043L - General Ecology Data Analysis. PCB 3043L - General Ecology Data Analysis.
PCB 3043L - General Ecology Data Analysis.
381 Goodness of Fit Tests QSCI 381 – Lecture 40 (Larson and Farber, Sect 10.1)
Stochastic Hydrology Random Field Simulation Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.
FUNCTIONS AND MODELS Exponential Functions FUNCTIONS AND MODELS In this section, we will learn about: Exponential functions and their applications.
CHAPTER – 1 UNCERTAINTIES IN MEASUREMENTS. 1.3 PARENT AND SAMPLE DISTRIBUTIONS  If we make a measurement x i in of a quantity x, we expect our observation.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 8 th Edition Chapter 9 Hypothesis Testing: Single.
Slide 1 Copyright © 2004 Pearson Education, Inc. Chapter 5 Probability Distributions 5-1 Overview 5-2 Random Variables 5-3 Binomial Probability Distributions.
Simulation-based inference beyond the introductory course Beth Chance Department of Statistics Cal Poly – San Luis Obispo
Computer Graphics Lecture 07 Ellipse and Other Curves Taqdees A. Siddiqi
PCB 3043L - General Ecology Data Analysis Organizing an ecological study What is the aim of the study? What is the main question being asked? What are.
The normal distribution
Density Estimation Converts points to a raster
Estimating the Value of a Parameter Using Confidence Intervals
Point and interval estimations of parameters of the normally up-diffused sign. Concept of statistical evaluation.
PCB 3043L - General Ecology Data Analysis.
Stochastic Hydrology Random Field Simulation
The normal distribution
Geology Geomath Chapter 7 - Statistics tom.h.wilson
The Multivariate Normal Distribution, Part I
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Test 2 Covers Topics 12, 13, 16, 17, 18, 14, 19 and 20 Skipping Topics 11 and 15.
MGS 3100 Business Analysis Regression Feb 18, 2016
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Presentation transcript:

Mahanalobis Distance Dr. A.K.M. Saiful Islam Source:

Mahanalobis Distance Mahalanobis distances provide a powerful method of measuring how similar some set of conditions is to an ideal set of conditions, and can be very useful for identifying which regions in a landscape are most similar to some “ideal” landscape. For example, in the field of wildlife biology we might define an “ideal” landscape as that which best fits the niche of some wildlife species. Through observation, we may find that a wildlife species typically occurs within a particular elevation range, on slopes of a particular steepness, and perhaps within a certain vegetation density. Using Mahalanobis distances, we can quantitatively describe the entire landscape in terms of how similar it is to the ideal elevation, slope and vegetation density of that animal.

Moreover, Mahalanobis distances are based on both the mean and variance of the predictor variables, plus the covariance matrix of all the variables, and therefore take advantage of the covariance among variables. The region of constant Mahalanobis distance around the mean forms an ellipse in 2D space (i.e. when only 2 variables are measured), or an ellipsoid or hyperellipsoid when more variables are used.

Mahalanobis distances are calculated as:

For example, suppose we took a single observation from a bivariate population with Variable X and Variable Y, and that our two variables had the following characteristics:

If, in our single observation, X = 410 and Y = 400, we would calculate the Mahalanobis distance for that single value as: Therefore, our single observation would have a distance of standardized units from the mean (mean is at X = 500, Y = 500).

If we took many such observations, graphed them and colored them according to their Mahalanobis values, we can see the elliptical Mahalanobis regions come out. For example, the cloud of data points to the right are randomly generated from the bivariate population described above:

If we calculate Mahalanobis distances for each of these points and shade them according to their distance value, we see clear elliptical patterns emerge:

We can also draw actual ellipses at regions of constant Mahalanobis values One interesting feature to note from this figure is that a Mahalanobis distance of 1 unit corresponds to 1 standard deviation along both primary axes of variance

Chi-Square P-values: Mahalanobis distances are occasionally converted to Chi-square p-values for analysis (see Clark et al. 1993). When the predictor variables are normally distributed, the Mahalanobis distances do follow the Chi-square distribution with n-1 degrees of freedom (where n = # of habitat variables; 2 in the example above). However, Farber and Kadmon (2003) warn that wildlife habitat variables often fail to meet the assumption of normality. In cases where the predictor variables are not normally distributed, the conversion to Chi-square p-values serves to recode the Mahalanobis distances to a 0-1 scale. Mahalanobis distances themselves have no upper limit, so this rescaling may be convenient for some analyses.Clark et al Farber and Kadmon (2003)

In general, the p-value reflects the probability of seeing a Mahalanobis value as large or larger than the actual Mahalanobis value, assuming the vector of predictor values that produced that Mahalanobis value was sampled from a population with an ideal mean (i.e. equal to the vector of mean predictor variable values used to generate the Mahalanobis value). P-values close to 0 reflect high Mahalanobis distance values and are therefore very dissimilar to the ideal combination of predictor variables. P-values close to 1 reflect low Mahalanobis distances and are therefore very similar to the ideal combination of predictor variables. The closer the p-value is to 1, the more similar that combination of predictor values is to the ideal combination.

Applications to Landscape Analysis: A nice feature of ArcView Spatial Analyst is that we can use actual grids in the Mahalanobis Distance equation rather than numbers, so we can input a vector of habitat grids in place of the vector of input values. We still need the vector of mean values and the covariance matrix, but Spatial Analyst will treat each of these values as an individual landscape-scale grid of that value, and therefore the mathematical functions in Spatial Analyst will work correctly and produce a final grid of Mahalanobis values. Due to a limitation in Spatial Analyst, however, we are limited to 8 input grids for this analysis. Spatial Analyst v. 9 is supposed to fix this limitation. For example, suppose we have a grid of elevation values and a grid of slope values, and we are interested in identifying those regions on the landscape that have similar slopes and elevations to a mean slope and elevation preferred by some species of interest. Furthermore, we want to analyze the slope and elevations in combination so that if our species likes steep slopes at low elevations but shallow slopes at high elevations, then we won’t inadvertently select steep slopes at high elevations or shallow slopes at low elevations.

Additional Reading: The author recommends Clark et al. (1993), Knick & Dyer (1997), and Farber & Kadmon (2002) for a few good papers illustrating the use of Mahalanobis distances in ecological applications. For anyone interested in the details of matrix algebra and computational/statistical algorithms, the author recommends Conover (1980), Neter et al. (1990), Golub and Van Loan (1996), Draper and Smith (1998), Meyer (2000) and Press et al. (2002).Clark et al. (1993) Knick & Dyer (1997)Farber & Kadmon (2002)Conover (1980)Neter et al. (1990)Golub and Van Loan (1996)Draper and Smith (1998)Meyer (2000)Press et al. (2002)