1 Chapter 6 – Analysis of mapped point patterns This chapter introduces methods for analyzing and modeling the spatial distribution of mapped point data, in which the location of every individual in the population is known. Two types of analyses can be conducted with mapped point patterns: (1) pattern detection (hypothesis testing – is a pattern random, regular, or aggregated?), and (2) model fitting (inference – e.g., fitting point process models to an observed point pattern; see Chapter 7). In this chapter we concentrate on the first type of analysis and introduce the key techniques for detecting spatial pattern in mapped data.

2 Nearest-neighbor distribution functions: G(r) and F(r) The distance methods presented in Chapter 4 only provide summary information on a spatial pattern at a particular distance (e.g., the mean first nearest-neighbor distance). We now present methods that describe the whole distribution of nearest-neighbor distances, i.e., we treat the nearest-neighbor distance as a random variable. G(r) is defined as the probability that the distance from a randomly chosen event to its nearest neighboring event is less than or equal to r. Its estimator is
\hat{G}(r) = \frac{1}{n}\sum_{i=1}^{n} I(r_i \le r),
where r_i is the nearest-neighbor distance for event i (i = 1, 2, …, n) and I(r_i \le r) is an indicator function: I(r_i \le r) = 1 if r_i \le r is true, and 0 otherwise. F(r) is the probability that the distance from a randomly chosen point (location) to the nearest event is less than or equal to r; it is also called the "empty space function". Its estimator has exactly the same form as that of G(r), except that the r_i in F(r) are point-to-event distances measured from randomly chosen sample points rather than from events.
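Both functions are available in spatstat as Gest and Fest. A minimal sketch, assuming a ppp object df.ppp built as shown on slide 4:
> library(spatstat)
> df.G=Gest(df.ppp)   # nearest-neighbor (event-to-event) distribution G(r)
> df.F=Fest(df.ppp)   # empty-space (point-to-event) distribution F(r)
> plot(df.G)          # empirical estimates together with the theoretical csr curve
> plot(df.F)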

3 More on G(r) and F(r) Under csr (complete spatial randomness), it can be shown that G(r) has the form
G(r) = 1 - \exp(-\lambda \pi r^2),
where \lambda is the intensity of the process. To judge how far the empirical \hat{G}(r) is from csr, a simulation envelope can be computed based on, say, 100 realizations of s_1, s_2, …, s_{652} drawn from a uniform distribution over the study area (i.e., assuming the 652 Douglas-firs follow a homogeneous Poisson process). The estimator \hat{G}(r) is calculated for each realization, and for each distance r the largest and smallest of the simulated values define the simulation envelope. (The envelope is not shown in the figure.) [Figure: empirical \hat{G}(r) against distance for Douglas-fir, n = 652.]
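The envelope construction described above can be coded directly. A sketch, assuming the 103 × 87 window of the Douglas-fir plot; the distance grid and number of simulations are illustrative:
> n.sim = 100
> r.grid = seq(0, 10, by=0.25)                              # distances at which G(r) is evaluated
> sims = matrix(NA, nrow=n.sim, ncol=length(r.grid))
> for (k in 1:n.sim) {
+   csr.ppp = runifpoint(652, win=owin(c(0,103), c(0,87)))  # one csr realization of 652 points
+   sims[k,] = Gest(csr.ppp, r=r.grid)$km                   # Kaplan-Meier estimate of G(r)
+ }
> env.lo = apply(sims, 2, min)   # lower simulation envelope at each r
> env.hi = apply(sims, 2, max)   # upper simulation envelope at each r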

4 R package spatstat for point pattern analysis Developed by Adrian Baddeley and Rolf Turner. The package supports: 1. creation, manipulation and plotting of point patterns 2. exploratory data analysis 3. simulation of point process models 4. parametric model fitting 5. hypothesis tests and diagnostics. The first step for all of these analyses is to create a ppp object. Using the Douglas-fir data as an example:
> df.dat=subset(victoria.dat,victoria.dat$sp=="DF")
> df.ppp=ppp(df.dat$x,df.dat$y,c(0,103),c(0,87))
> df.ppp=ppp(df.dat$x,df.dat$y,window=owin(c(0,103),c(0,87)))  # equivalent, using an owin window
> df.ppp=ppp(df.dat$x,df.dat$y,poly=list(x=c(0,50,60,0),y=c(0,0,60,50)))  # polygonal window
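It is worth checking the resulting object before any analysis; a minimal sketch:
> summary(df.ppp)   # number of points, window, average intensity
> plot(df.ppp)      # map of the Douglas-fir locations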

5 Baddeley, A.J. & Gill, R.D. 1997. Kaplan-Meier estimators of interpoint distance distributions for spatial point processes. Annals of Statistics 25. [Figures: \hat{G}(r) against first nearest-neighbor distance for a regular and an aggregated pattern.] R implementation: 1. Prepare the Douglas-fir and hemlock data in ppp format (df.ppp, hl.ppp). 2. df.G=Gest(df.ppp) 3. plot(df.G) 4. plot(envelope(df.ppp,fun=Gest)) # generate the envelope. 5. Note that the pointwise envelopes are not "confidence bands" for the true value of the function! The test is constructed by choosing a fixed value of r and rejecting the null hypothesis if the observed function value lies outside the envelope at that value of r. This test has exact significance level alpha = 2*nrank/(1 + nsim), where nrank is the rank of the envelope value amongst the nsim simulated values. A self-contained sketch of these steps is given below.
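A minimal sketch of steps 1–5, assuming df.ppp has been created as on the previous slide; the nsim and nrank values are illustrative and chosen to make the significance level explicit:
> df.G = Gest(df.ppp)
> plot(df.G)
> df.Genv = envelope(df.ppp, fun=Gest, nsim=99, nrank=1)  # pointwise envelopes from 99 csr simulations
> plot(df.Genv)
# With nsim = 99 and nrank = 1, the pointwise test at a pre-chosen r has
# significance level alpha = 2*nrank/(nsim + 1) = 2/100 = 0.02.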

6 K-function The K-function is the most important function for quantifying mapped point patterns. It was proposed by Ripley in 1976 and is often called Ripley's K-function. The K-function is a second-moment measure: it is closely related to the second-order intensity of a stationary, isotropic point process, and it captures the spatial (in)dependence between different regions of the point process. Let us first look at the first- and second-order properties of a spatial point process. First-order property (intensity):
\lambda(x) = \lim_{|A_x| \to 0} \frac{E[N(A_x)]}{|A_x|},
where A_x is an infinitesimal region containing the point x and N(A_x) is the number of events in A_x. For a stationary process, \lambda(x) = \lambda, a constant. Second-order property (second-order intensity):
\lambda_2(x, y) = \lim_{|A_x|, |A_y| \to 0} \frac{E[N(A_x)\,N(A_y)]}{|A_x|\,|A_y|}.
For a stationary and isotropic process, \lambda_2(x, y) = \lambda_2(h), where h = ||x - y||. * Ripley, B. D. 1976. The second-order analysis of stationary point processes. J. of Appl. Prob. 13.

7 Definition of K-function The K-function is defined as
K(h) = \lambda^{-1} E(\text{number of other events within distance } h \text{ of an arbitrary event}),
or equivalently, \lambda K(h) = E(number of other events within distance h of an arbitrary event). [Figure: an arbitrary event with a circle of radius h around it.]

8 The relationship between the K-function and \lambda_2(x, y) For a stationary, isotropic process,
K(h) = \frac{2\pi}{\lambda^2} \int_0^h \lambda_2(r)\, r\, dr,
where \lambda_2(r)/\lambda is interpreted as the conditional intensity of an event at x given an event at 0, i.e., \lambda_2(0, x)/\lambda: the intensity at the point x conditional on there being an event at 0. For a Poisson process, \lambda_2(r) = \lambda^2, and hence K(h) = \pi h^2. This is used as the null model for csr: \hat{K}(h) > \pi h^2 suggests an aggregated pattern, \hat{K}(h) = \pi h^2 suggests a random pattern, and \hat{K}(h) < \pi h^2 suggests a regular pattern.

9 The properties of K(h) 1. For a Poisson process, \lambda_2(r) = \lambda^2, so K(h) = \pi h^2. This is used as the null model for csr: K(h) > \pi h^2 suggests an aggregated pattern, K(h) = \pi h^2 suggests a random pattern, and K(h) < \pi h^2 suggests a regular pattern. 2. The K-function is invariant under random thinning. By "random thinning" we mean that each event of the process is independently retained or deleted according to a Bernoulli trial. This property means that the K-function of the resulting thinned process is identical to that of the original, unthinned process.
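The thinning invariance can be checked empirically with spatstat's rthin. A sketch, assuming the hemlock pattern hl.ppp from slide 5; the retention probability 0.5 is arbitrary:
> hl.K = Kest(hl.ppp, correction="isotropic")
> hl.thin = rthin(hl.ppp, P=0.5)                 # retain each event independently with probability 0.5
> hl.thin.K = Kest(hl.thin, correction="isotropic")
> plot(hl.K$r, hl.K$iso, type="l", xlab="h", ylab="K(h)")
> lines(hl.thin.K$r, hl.thin.K$iso, lty=2)       # close to the unthinned K-function, apart from extra noise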

10 A simple estimator of K(h) A naive estimator is
\hat{K}(h) = \hat{\lambda}^{-1} \frac{1}{n} \sum_{i=1}^{n} \sum_{j \ne i} I(\|s_i - s_j\| \le h),
where \hat{\lambda} = n/|A| is the estimated intensity. Edge effect: points close to the edges of the study area have fewer neighboring points within the circle of radius h than points far from the edges, so the counts near the boundary are too small and the estimator is biased. [Figure: events s_i and s_j with circles of radius h, one circle extending beyond the study-area boundary.]
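A direct, edge-correction-free translation of this estimator into R; simpleK is a hypothetical helper name and the distance values are illustrative:
> simpleK = function(X, h) {
+   n = X$n                                        # number of events
+   lambda.hat = n / area.owin(X$window)           # estimated intensity
+   d = pairdist(X)                                # n x n matrix of inter-event distances
+   diag(d) = Inf                                  # exclude each event's distance to itself
+   counts = sapply(h, function(r) sum(d <= r))    # total number of ordered pairs within distance r
+   (counts / n) / lambda.hat                      # lambda^-1 * (mean number of other events within h)
+ }
> simpleK(df.ppp, h=c(1, 2, 5, 10))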

11 Toroidal unbiased estimator of K(h) Because of the edge effect, the simple estimator is inefficient and biased. An alternative is an estimator based on toroidal edge correction, in which the rectangular study area is wrapped onto a torus (the top edge is joined to the bottom and the left edge to the right), so that every point has a complete neighborhood:
\hat{K}(h) = \frac{|A|}{n^2} N_+,
where N_+ is the number of ordered pairs (i, j), i \ne j, whose toroidal distance \|s_i - s_j\| \le h. Toroidal edge correction should be used only for stationary, isotropic patterns; applying it to non-stationary patterns is a misuse of the correction. [Figures: a point pattern wrapped onto a torus; an example of misuse of toroidal edge correction for a non-stationary pattern.]
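A sketch of the toroidal ("wrap-around") distance for a rectangular window, which could be used in place of pairdist() in the simple estimator above; toroidal.dist is a hypothetical helper:
> toroidal.dist = function(X) {
+   a = diff(X$window$xrange); b = diff(X$window$yrange)    # rectangle dimensions
+   dx = abs(outer(X$x, X$x, "-")); dx = pmin(dx, a - dx)   # wrap horizontal differences around the torus
+   dy = abs(outer(X$y, X$y, "-")); dy = pmin(dy, b - dy)   # wrap vertical differences around the torus
+   sqrt(dx^2 + dy^2)                                       # toroidal distance matrix
+ }
> d.tor = toroidal.dist(df.ppp)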

12 Weighted unbiased estimator of K(h) Another unbiased estimator, proposed by Ripley (1976), gives more weight to pairs of points near the boundary:
\hat{K}(h) = \frac{|A|}{n^2} \sum_{i=1}^{n} \sum_{j \ne i} \frac{I(\|s_i - s_j\| \le h)}{w(s_i, s_j)},
where the weight w(s_i, s_j) is the proportion of the circumference of the circle centered at s_i and passing through s_j that lies within the study area (s_i itself must be within the study area). w(s_i, s_j) = 1 if the circle lies entirely within the study area. [Figure: events s_i and s_j; the circle centered at s_i through s_j extends partly outside the study area.]

13 Computing w(s_i, s_j) Assume the study area is [0, a] × [0, b] and s_i has coordinates s_i = (x, y). Rewrite w(s_i, s_j) = w(s_i, h), where h is the radius of the circle centered at s_i. Denote d_1 = min(x, a - x) and d_2 = min(y, b - y); thus d_1 and d_2 are the distances from s_i to the nearest vertical and horizontal edges of A. w(s_i, h) is calculated as follows: 1. If h^2 \le d_1^2 + d_2^2 (the circle may cross the two nearest edges separately but does not enclose the nearest corner):
w(s_i, h) = 1 - \frac{1}{\pi}\left[\cos^{-1}\!\left(\frac{\min(d_1, h)}{h}\right) + \cos^{-1}\!\left(\frac{\min(d_2, h)}{h}\right)\right].
2. If h^2 > d_1^2 + d_2^2 (the circle encloses the nearest corner):
w(s_i, h) = \frac{3}{4} - \frac{1}{2\pi}\left[\cos^{-1}\!\left(\frac{d_1}{h}\right) + \cos^{-1}\!\left(\frac{d_2}{h}\right)\right].
[Figure: three cases of the circle of radius h around s_i relative to the edges of the study area.]
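A hedged R translation of the two cases above; ripley.w is a hypothetical helper, valid when the circle interacts with at most the two nearest edges (i.e., h is small relative to the window):
> ripley.w = function(x, y, h, a, b) {
+   d1 = min(x, a - x); d2 = min(y, b - y)       # distances to nearest vertical / horizontal edge
+   if (h^2 <= d1^2 + d2^2) {                    # case 1: circle does not enclose the nearest corner
+     1 - (acos(min(d1, h)/h) + acos(min(d2, h)/h)) / pi
+   } else {                                     # case 2: circle encloses the nearest corner
+     3/4 - (acos(d1/h) + acos(d2/h)) / (2*pi)
+   }
+ }
> ripley.w(x=2, y=3, h=5, a=103, b=87)           # a point near the lower-left corner; roughly 0.42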

14 Variance and simulation envelopes As mentioned earlier, under csr K(h) = \pi h^2. It is therefore usual to express K(h) on the transformed scale
K_0(h) = \sqrt{K(h)/\pi}. (*)
This square-root transformation approximately stabilizes the variance of the estimated K_0(h), so that its sampling fluctuations are roughly constant across h. To judge how far the observed K-function deviates from csr, a simulation envelope can be constructed from, say, 99 realizations of s_1, s_2, …, s_{982} drawn from a uniform distribution over the study area (i.e., assuming the 982 western hemlock trees follow a homogeneous Poisson process). The K-function is calculated for each realization, and for each distance h the largest and smallest values define the simulation envelope.

15 R implementation Let's model the distribution of the 982 western hemlocks. The spatstat function Kest computes the K-function, from which the transformed version on the previous page is obtained:
> hl.kest=Kest(hl.ppp)  # hl.ppp is the spatstat ppp object for the hemlocks
> plot(hl.kest)
> plot(hl.kest$r,sqrt(hl.kest$iso/pi)-hl.kest$r)  # transformed (and centred) K-function
> hl.env=envelope(hl.ppp)  # simulation envelope under csr (default fun is Kest)
> plot(hl.kest$r,sqrt(hl.kest$iso/pi))
> lines(hl.env$r,sqrt(hl.env$lo/pi),col=2)
> lines(hl.env$r,sqrt(hl.env$hi/pi),col=2)

16 L-function In practice, the K-function is usually displayed as the L-function, defined as
L(h) = \sqrt{K(h)/\pi} - h.
L(h) > 0 for an aggregated distribution, L(h) = 0 for a random distribution, and L(h) < 0 for a regular distribution. Examples: [Figures: L(h) plotted against h for Douglas-fir (n = 652) and western hemlock (n = 982).]
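spatstat also provides Lest, which computes \sqrt{K(h)/\pi} without subtracting h; subtracting h gives the centred version plotted above. A minimal sketch:
> hl.L = Lest(hl.ppp)      # L(h) = sqrt(K(h)/pi); the theoretical value under csr is h
> plot(hl.L, . - r ~ r)    # plot L(h) - h, so csr corresponds to the horizontal line at 0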

17 g-function (pair correlation function) The g-function is obtained from the derivative of the K-function, defined as
g(h) = \frac{1}{2\pi h}\frac{dK(h)}{dh}.
The g-function therefore describes how the K-function changes with the spatial distance lag h. The K-function is a cumulative function, which may confound large-scale (large h) effects with small-scale (small h) effects; the g-function is able to separate these effects. Under csr, g(h) = 1. R implementation: > pcf(hl.ppp)
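The csr envelope can be computed for g(h) in the same way as for G and K; a minimal sketch, with an illustrative number of simulations:
> hl.g = pcf(hl.ppp)                          # kernel estimate of the pair correlation function
> plot(hl.g)                                  # g(h) = 1 under csr; > 1 suggests aggregation, < 1 regularity
> plot(envelope(hl.ppp, fun=pcf, nsim=99))    # pointwise csr envelope for g(h)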

18 Bivariate spatial point patterns A bivariate spatial point pattern consists of the locations of two types of events in a bounded study area A, e.g., the distributions of two tree species (Douglas-fir and western hemlock). It can be written as {s_j^{(i)}: i = 1, 2; j = 1, 2, …}, the j-th location of the type-i species (i = 1, 2). The two species may or may not be spatially independent. A natural working hypothesis is that the patterns of the two species are independent. Note, however, that independence does not guarantee csr for either species. As in the univariate case, the K-function can be extended to the bivariate case to quantify the relationship between the two species:
K_{12}(h) = \lambda_2^{-1} E(\text{number of type-2 events within distance } h \text{ of an arbitrary type-1 event}).
If the two component patterns are independent of each other, K_{12}(h) (= K_{21}(h)) has the simple form K_{12}(h) = \pi h^2.

19 An unbiased estimator For a given data set, K_{12}(h) can be estimated as
\hat{K}_{12}(h) = \frac{|A|}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \frac{I(\|s_i^{(1)} - s_j^{(2)}\| \le h)}{w(s_i^{(1)}, s_j^{(2)})},
and \hat{K}_{21}(h) analogously with the roles of the two types reversed, where w(s_i^{(1)}, s_j^{(2)}) is the proportion of the circumference of the circle with centre s_i^{(1)} and radius \|s_i^{(1)} - s_j^{(2)}\| that lies within the study region A. [Figure: a type-1 event s_i^{(1)} and a type-2 event s_j^{(2)} with a circle of radius h.]

20 An estimator with reduced variance When the underlying processes for the two species are independent Poisson processes, Lotwick & Silverman (1982) show that the most efficient estimator is the linear combination
\hat{K}^*_{12}(h) = \frac{n_2 \hat{K}_{12}(h) + n_1 \hat{K}_{21}(h)}{n_1 + n_2}.
Because K_{12}(h) = \pi h^2 under independence, we can define an L-function in the same way as before:
L_{12}(h) = \sqrt{K_{12}(h)/\pi} - h,
with L_{12}(h) > 0 for an aggregated (attractive) relationship, L_{12}(h) = 0 for independence, and L_{12}(h) < 0 for a regular (repulsive) relationship. * Lotwick, H. W. & Silverman, B. W. 1982. Methods for analysing spatial processes of several types of points. J. R. Stat. Soc. B 44.
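The combined estimator is not returned directly by Kcross, but it can be assembled from the two cross K estimates. A sketch, assuming victoria.ppp as built on the next slide and the weighting given above; the distance grid is illustrative:
> r.grid = seq(0, 20, by=0.5)
> n.hl = sum(marks(victoria.ppp) == "HL")     # number of type-1 (hemlock) events
> n.cd = sum(marks(victoria.ppp) == "CD")     # number of type-2 (redcedar) events
> K12 = Kcross(victoria.ppp, "HL", "CD", r=r.grid)
> K21 = Kcross(victoria.ppp, "CD", "HL", r=r.grid)
> K.comb = (n.cd * K12$iso + n.hl * K21$iso) / (n.hl + n.cd)   # Lotwick-Silverman combination
> plot(r.grid, sqrt(K.comb/pi) - r.grid, type="l", xlab="h", ylab="L12(h)")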

21 R implementation Let's use spatstat to compute K_{12}(h) for western redcedar and western hemlock.
> victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=victoria.dat$sp)
> cdhl.kcross=Kcross(victoria.ppp,"HL","CD")
> plot(cdhl.kcross)
Also see Kmulti:
> plot(Kmulti(victoria.ppp,victoria.ppp$marks=="CD",victoria.ppp$marks=="HL"))

22 [Figures: bivariate K/L-function results for the hemlock-redcedar and Douglas fir-hemlock species pairs.]

23 Assignment: compute the bivariate L-function for CD and HL in victoria.dat.
> victoria.ppp=ppp(victoria.dat$x,victoria.dat$y,c(0,103),c(0,87),marks=victoria.dat$sp)
> cdhl.kcross=Kcross(victoria.ppp,"HL","CD")
> plot(cdhl.kcross)
> cdhl.env=envelope(victoria.ppp, Kcross, i="HL", j="CD")
> cdhl.lfn=sqrt(cdhl.kcross$iso/pi)-cdhl.kcross$r
> plot(cdhl.kcross$r, cdhl.lfn, ylim=c(-0.25,1.1), xlab="h", ylab="L function")
> lines(cdhl.env$r, sqrt(cdhl.env$hi/pi)-cdhl.env$r, col="red")
> lines(cdhl.env$r, sqrt(cdhl.env$lo/pi)-cdhl.env$r, col="blue")