METU, GGIT 538 CHAPTER V MODELING OF POINT PATTERNS.

OUTLINE (Last Week): ANALYSIS OF POINT PATTERNS
4.1. Introduction
4.2. Case Studies
4.3. Visualizing Spatial Point Patterns
4.4. Exploring Spatial Point Patterns
    - Quadrat Methods
    - Kernel Estimation
    - Nearest Neighbor Distance
    - The K Function

OUTLINE: MODELING OF POINT PATTERNS
5.1. Complete Spatial Randomness (CSR)
5.2. Simple Quadrat Tests for CSR
5.3. Nearest Neighbor Tests for CSR
    - Testing for CSR Based on Various Summary Statistics
    - Testing for CSR Based on the Distribution Function
5.4. The K Function Tests for CSR

Introduction
In most cases exploratory analysis alone is insufficient, and it may be necessary to go further: to carry out explicit tests of various hypotheses, or to construct specific models that explain the observed point pattern. Here the term modeling refers to statistically comparing various summary measures computed from the observed distribution of events with their values under a hypothesized model, which leads to the design and testing of hypotheses. The model most commonly used is complete spatial randomness (CSR).

Reasons for Testing Against CSR
- Rejection of CSR is a prerequisite for any serious attempt to model an observed pattern.
- Tests are used to explore a set of data and to assist in formulating alternatives to CSR.
- CSR operates as a dividing hypothesis between regular and clustered patterns.

5.1. Complete Spatial Randomness (CSR)
CSR is a standard model which states that the events follow a homogeneous Poisson process over the study region. In this model the point pattern is described by the number of events occurring in arbitrary sub-regions A of the whole study region R. The spatial point process is defined by
$\{Y(A),\ A \subseteq R\}$
where Y(A) is the number of events occurring in the area A.

A hypothesis of complete spatial randomness for a spatial point pattern $\{Y(A),\ A \subseteq R\}$ asserts that:
- The number of events in any planar region A with area |A| follows a Poisson distribution with mean $\lambda |A|$. This implies constant intensity, i.e. no first-order effects.
- Given n events in A, the events are an independent random sample from a uniform distribution on A. This implies no spatial interaction.
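The two CSR properties above translate directly into a simulation recipe. The following is a minimal Python sketch, assuming a rectangular region: first draw the total count as Poisson with mean λ|R|, then place that many events independently and uniformly. The function names `poisson_count` and `simulate_hpp` are hypothetical. Python's standard library has no Poisson sampler, so the count is obtained by counting unit-rate exponential arrivals in [0, λ|R|].

```python
import random


def poisson_count(mean, rng):
    """Draw N ~ Poisson(mean) by counting unit-rate exponential arrivals in [0, mean]."""
    n = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        n += 1
        t += rng.expovariate(1.0)
    return n


def simulate_hpp(lam, x1, x2, y1, y2, seed=None):
    """Homogeneous Poisson process with intensity lam on the rectangle (x1, x2) x (y1, y2)."""
    rng = random.Random(seed)
    area = (x2 - x1) * (y2 - y1)
    n = poisson_count(lam * area, rng)  # Y(R) ~ Poisson(lam * |R|)
    # Given n, the events are an independent uniform sample on the region.
    return [(rng.uniform(x1, x2), rng.uniform(y1, y2)) for _ in range(n)]


if __name__ == "__main__":
    pts = simulate_hpp(lam=50, x1=0, x2=2, y1=0, y2=1, seed=1)
    print(len(pts), "events; expected about", 50 * 2)
```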

In other words:
1. Any event has an equal probability of occurring at any position in R.
2. The position of any event is independent of the position of any other, i.e. events do not interact with one another.

Therefore, n events from such a process can be simulated by enclosing R in a rectangle, generating events with x coordinates from a uniform distribution on $(x_1, x_2)$ and y coordinates from a uniform distribution on $(y_1, y_2)$, and retaining only the events that fall inside R. The observed pattern of points can then be compared with patterns simulated under CSR. CSR thus represents a baseline hypothesis against which to assess whether observed patterns are regular, clustered or random.
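The enclosing-rectangle recipe can be sketched as simple rejection sampling. In this sketch the candidates are uniform on the rectangle, and keeping only those inside R leaves the accepted events uniform on R. The region test `in_unit_disc` is a hypothetical stand-in for a real study-region boundary (for example a point-in-polygon check), and `simulate_csr_in_region` is likewise an illustrative name.

```python
import random


def simulate_csr_in_region(n, inside, x1, x2, y1, y2, seed=None):
    """Simulate n CSR events in an irregular region R.

    R is enclosed in the rectangle (x1, x2) x (y1, y2); candidate events are drawn
    uniformly on the rectangle and kept only if inside(x, y) is True, which leaves
    the accepted events uniformly distributed on R.
    """
    rng = random.Random(seed)
    events = []
    while len(events) < n:
        x, y = rng.uniform(x1, x2), rng.uniform(y1, y2)
        if inside(x, y):
            events.append((x, y))
    return events


def in_unit_disc(x, y):
    """Hypothetical study region R: the unit disc centred at the origin."""
    return x * x + y * y <= 1.0


if __name__ == "__main__":
    sims = [simulate_csr_in_region(100, in_unit_disc, -1, 1, -1, 1, seed=s) for s in range(19)]
    print(len(sims), "simulated patterns of", len(sims[0]), "events each")
```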

5.2. Simple Quadrat Tests for CSR
Quadrat counts can be tested for CSR using the so-called index of dispersion test. Let $(x_1, \ldots, x_m)$ be the counts of the number of events in m quadrats, either randomly scattered in R or forming a regular grid covering the whole of R. Randomness can then be tested on the basis that, if the counts follow a Poisson distribution, their mean and variance should be equal, i.e. the variance-to-mean ratio (VMR) should be close to 1.
$H_0$: the point pattern is random, i.e. the mean $\bar{x}$ and the variance $s^2$ of the counts are equal.

When the test is applied to a particular set of observations, the number of points and the number of grid squares are fixed. The mean is therefore the same irrespective of whether the points are clustered, random or regular, and it is differences in the variance that indicate the nature of the point pattern.
- If the VMR is significantly greater than 1.0, clustering of the points is indicated, whereas values significantly lower than 1.0 denote regularity.

The ratio $s^2/\bar{x}$ is called the index of dispersion (I), and $ICS = s^2/\bar{x} - 1$ is called the index of cluster size (ICS):
- E(ICS) = 0: CSR
- E(ICS) > 0: clustering (extra events)
- E(ICS) < 0: regularity (insufficient events)
The index of dispersion test has the advantage that it can be applied in conjunction with the sampling of point patterns. In this case m quadrats are randomly scattered in R and the events in each quadrat are counted exhaustively. Such a sampling scheme can also be used to estimate the intensity λ of the events in R.

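The test statistic based on I is defined as follows:
$$(m-1)I = \frac{(m-1)s^2}{\bar{x}} = \frac{\sum_{i=1}^{m}(x_i - \bar{x})^2}{\bar{x}}$$
where $\bar{x}$ is the mean of the observed counts, $s^2$ is the observed variance of the counts, and m is the number of quadrats (grid cells).
Under CSR its theoretical distribution is approximately chi-square, $(m-1)I \sim \chi^2_{m-1}$, for m > 6 and $\bar{x}$ > 1.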

Properties of Quadrat Tests for CSR
- Under CSR the test statistic (m − 1)I is approximately distributed as $\chi^2_{m-1}$.
- Compare the test statistic with the percentage points of $\chi^2_{m-1}$.
- Significantly large values indicate clustering.
- Significantly small values indicate regularity.
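The test above can be sketched in a few lines of Python. This is a minimal sketch, assuming the counts come from m disjoint quadrats with a positive mean count. The names `chi2_sf` and `index_of_dispersion_test` are hypothetical. To stay within the standard library, the chi-square tail probability is computed from the series for the regularized incomplete gamma function rather than taken from a statistics package. The χ² approximation itself is only trustworthy under the conditions stated above (m > 6, mean count > 1).

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for X ~ chi-square with k degrees of freedom.

    Sums the series P(a, z) = sum_n z^(a+n) e^(-z) / Gamma(a+n+1) for the regularized
    lower incomplete gamma function (a = k/2, z = x/2), each term in log space.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    lower, n = 0.0, 0
    while True:
        term = math.exp((a + n) * math.log(z) - z - math.lgamma(a + n + 1))
        lower += term
        if a + n > z and term < 1e-17:
            break
        n += 1
    return min(1.0, max(0.0, 1.0 - lower))


def index_of_dispersion_test(counts):
    """Index of dispersion test for CSR from m quadrat counts (mean must be > 0)."""
    m = len(counts)
    mean = sum(counts) / m
    var = sum((c - mean) ** 2 for c in counts) / (m - 1)
    vmr = var / mean                    # index of dispersion I = s^2 / xbar
    stat = (m - 1) * vmr                # approx chi-square(m - 1) under CSR
    p_clustered = chi2_sf(stat, m - 1)  # small p: significantly large, clustering
    return {
        "vmr": vmr,
        "ics": vmr - 1.0,                # index of cluster size
        "stat": stat,
        "p_clustered": p_clustered,
        "p_regular": 1.0 - p_clustered,  # small p: significantly small, regularity
    }


if __name__ == "__main__":
    print(index_of_dispersion_test([3, 5, 0, 2, 7, 1, 4, 2, 0, 6]))
```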

Frequency table for the quadrat test, with columns: number of events per quadrat (n); number of quadrats with n events (q); total number of events in quadrats (X); X²; and a final Sum row.
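When the counts are tabulated as above, the mean and variance can be computed directly from the frequency table without listing every quadrat. This is a sketch under the assumption that the X column holds n·q. The variance then needs the sum of n²·q, which equals n·X. The function name `moments_from_frequency_table` is hypothetical.

```python
def moments_from_frequency_table(table):
    """Mean, variance and VMR of quadrat counts from a frequency table.

    table maps n (events per quadrat) -> q (number of quadrats with n events).
    """
    m = sum(table.values())                            # number of quadrats
    total = sum(n * q for n, q in table.items())       # sum of the X = n * q column
    sum_sq = sum(n * n * q for n, q in table.items())  # sum of n^2 * q (= n * X)
    mean = total / m
    var = (sum_sq - total ** 2 / m) / (m - 1)          # sample variance s^2
    return {"m": m, "mean": mean, "var": var, "vmr": var / mean}


if __name__ == "__main__":
    # Hypothetical table: 10 quadrats with 0 events, 6 with 1, 3 with 2, 1 with 4
    print(moments_from_frequency_table({0: 10, 1: 6, 2: 3, 4: 1}))
```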

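If λ is assumed to be constant and CSR holds, the estimate of λ is given by
$\hat{\lambda} = \bar{x}/Q$
where Q is the area of each quadrat. An approximate 95% confidence interval for λ is then
$\hat{\lambda} \pm 1.96\sqrt{\hat{\lambda}/(mQ)}$
since under CSR the total count $\sum_i x_i$ over m disjoint quadrats is Poisson with mean $\lambda m Q$, so that $\mathrm{Var}(\hat{\lambda}) = \lambda/(mQ)$.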
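The intensity estimate and its normal-approximation interval can be sketched as follows. This assumes disjoint quadrats and CSR, and it plugs $\hat{\lambda}$ in for λ in the variance. The function name `intensity_ci` is hypothetical.

```python
import math


def intensity_ci(counts, quadrat_area, z=1.96):
    """Estimate lambda from disjoint quadrat counts, with an approximate 95% CI.

    Assumes CSR, so the total count is Poisson with mean lambda * m * Q and
    Var(lambda_hat) = lambda / (m * Q); lambda_hat is plugged in for lambda.
    """
    m = len(counts)
    lam_hat = sum(counts) / (m * quadrat_area)  # = xbar / Q
    se = math.sqrt(lam_hat / (m * quadrat_area))
    return lam_hat, (lam_hat - z * se, lam_hat + z * se)


if __name__ == "__main__":
    print(intensity_ci([2, 4, 3, 3], quadrat_area=0.5))
```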

Problems Encountered
1. Overlapping quadrats: if randomly scattered quadrats are used, they may overlap one another. This is a problem if it occurs frequently, since the counts $x_i$ will then not be independent. It can be overcome by using a sampling scheme that guarantees disjoint quadrats.
2. Quadrats overlapping the edge of R: if quadrats overlap the edge of R, a guard area can be introduced inside the perimeter of R. Quadrats are then randomly scattered only over the part of R that is not in the guard area, while events in the guard area are still counted in any quadrat that overlaps into it.

3. Choosing an appropriate quadrat size: an empirical suggestion is to aim for a mean quadrat count of about …
4. Quadrat position: usually no account is taken of the relative positions of the quadrats, or of the relative positions of events within a quadrat. One common method that takes the relative position of quadrats into account is the Greig-Smith procedure:
   a. Calculate the variance of the quadrat counts for the original grid.
   b. Successively combine adjacent quadrats of the original grid into blocks of increasing size, and calculate the variance of the block counts at each block size.
   c. Plot the variance estimates against block size; peaks and troughs indicate the scale of pattern.
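Steps a to c can be sketched in a simplified form. This sketch uses only square blocks of side 1, 2, 4, … quadrats on an n × n grid, where n is assumed to be a power of two. It reports the variance of the block totals and their variance-to-mean ratio at each size. It is a simplification rather than the full Greig-Smith computation, and `block_variances` is a hypothetical name.

```python
def block_variances(grid):
    """Variance of block totals for square blocks of side 1, 2, 4, ... quadrats.

    grid is an n x n list of quadrat counts with n a power of two (an assumption of
    this sketch). Block sizes stop while at least 2 x 2 blocks remain.
    Returns {block side: (variance of block totals, variance-to-mean ratio)}.
    """
    n = len(grid)
    results = {}
    size = 1
    while n // size >= 2:
        k = n // size  # number of blocks along each side
        totals = [
            sum(grid[r][c]
                for r in range(i * size, (i + 1) * size)
                for c in range(j * size, (j + 1) * size))
            for i in range(k) for j in range(k)
        ]
        mean = sum(totals) / len(totals)
        var = sum((t - mean) ** 2 for t in totals) / (len(totals) - 1)
        results[size] = (var, var / mean if mean > 0 else float("nan"))
        size *= 2
    return results


if __name__ == "__main__":
    clumped = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for side, (var, vmr) in block_variances(clumped).items():
        print(side, var, vmr)
```

The VMR column makes block sizes comparable: under CSR it stays near 1 at every size, so a rise at a particular block size points to clustering at that scale.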

Table 5.1. Available indexes for testing CSR

5.3. Nearest Neighbor Tests for CSR
To test for CSR using nearest neighbor distances, the cumulative distribution functions G(w) of the event-event distances W and F(x) of the point-event distances X must be known for the specific study area. In practice G(w) and F(x) usually cannot be known exactly because of edge effects, since they depend on the particular shape of R. However, theoretical distributions for W and X can be derived if edge effects are ignored. There are two ways of testing for CSR with nearest neighbor distances:
- testing based on various summary statistics
- testing based on the distribution function

Testing for CSR Based on Various Summary Statistics
Let the mean density of events per unit area be λ. If CSR holds, events are independent and the number of events in any area is Poisson distributed.
- The probability that no events fall within a circle of radius x around a randomly chosen point is $e^{-\lambda\pi x^2}$.
- The distribution function F(x) of the nearest neighbor point-event distance X under CSR is therefore $F(x) = 1 - e^{-\lambda\pi x^2},\ x \ge 0$.
This implies that $\pi X^2$ follows an exponential distribution with parameter (rate) λ, i.e. $2\pi\lambda X^2 \sim \chi^2_2$.

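It may then be deduced that $E(X) = \frac{1}{2}\lambda^{-1/2}$ and $\mathrm{Var}(X) = \frac{4-\pi}{4\pi\lambda}$. If $X_1, \ldots, X_n$ are independent nearest neighbor distances, then $2\pi\lambda\sum_{i=1}^{n} X_i^2 \sim \chi^2_{2n}$.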
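The χ²(2n) result gives a direct test when λ is known and the point-event distances are independent. This is a minimal sketch with hypothetical names. Because the degrees of freedom are even, the chi-square tail has an exact closed form: it equals a finite Poisson sum, so no special-function library is needed. Large values of the statistic mean large empty-space distances, which indicates clustering.

```python
import math


def chi2_even_sf(x, n):
    """P(X > x) for X ~ chi-square with 2n degrees of freedom.

    For even degrees of freedom this equals the Poisson sum
    exp(-x/2) * sum_{k=0}^{n-1} (x/2)^k / k!, evaluated here in log space.
    """
    z = x / 2.0
    if z <= 0:
        return 1.0
    logs = [k * math.log(z) - z - math.lgamma(k + 1) for k in range(n)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def point_event_test(x_dists, lam):
    """Test CSR with independent point-event distances x_1, ..., x_n and known intensity lam."""
    n = len(x_dists)
    stat = 2 * math.pi * lam * sum(d * d for d in x_dists)  # ~ chi-square(2n) under CSR
    p_large = chi2_even_sf(stat, n)   # small p: large empty-space distances, clustering
    return stat, p_large, 1.0 - p_large  # last value: small p suggests regularity


if __name__ == "__main__":
    print(point_event_test([0.05, 0.12, 0.08, 0.2, 0.03], lam=20))
```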

The same arguments apply to the nearest neighbor event-event distances W of a CSR process. Under CSR the distribution function G(w) is $G(w) = 1 - e^{-\lambda\pi w^2},\ w \ge 0$, and E(W) and Var(W) are the same as for X. It is now possible to derive the sampling distributions under CSR of various summary statistics of the observed nearest neighbor distances.
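A quick simulation can check the claim that E(W) = ½λ^(−1/2) for event-event distances. The sketch below simulates n uniform events on the unit square, so λ = n. It measures distances on a torus, wrapping around the square's edges, so that edge effects, which the theory ignores, do not bias the check. The torus device and the name `mean_nn_distance_torus` are assumptions of this sketch.

```python
import math
import random


def mean_nn_distance_torus(n, seed=0):
    """Mean event-event nearest neighbor distance for n CSR events on the unit torus.

    Wrapping distances around the unit square removes edge effects, so the
    result can be compared with E(W) = 0.5 / sqrt(lambda), lambda = n.
    """
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        best = math.inf
        for j, (xj, yj) in enumerate(pts):
            if i != j:
                dx = abs(xi - xj)
                dy = abs(yi - yj)
                best = min(best, math.hypot(min(dx, 1 - dx), min(dy, 1 - dy)))
        total += best
    return total / n


if __name__ == "__main__":
    n = 400
    print(mean_nn_distance_torus(n), "vs theoretical", 0.5 / math.sqrt(n))
```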

Distribution theory for these tests is based on the assumption that the n nearest neighbor measurements randomly sampled from the study region R are independent. This assumption may be violated when the number of events is small and a large proportion of them is used.

Basic Assumptions
1. The nearest neighbor distances used to compute the summary statistics must be sampled independently from the study region. Independence is approximately assured when the number of events is large and only a small proportion of them is sampled.
   - Rule of thumb: the number m of nearest neighbor measurements sampled should be small relative to n, the total number of events (a commonly used guide is $m \le 0.1n$).
   - Remark: the general effect of a lack of independence is that the test statistics have a larger variance than their theoretical values under independence. A standard test may therefore indicate a significant departure from CSR that would not be significant if the dependence were taken into account.
2. The nearest neighbor distances used to compute the summary statistics must not be biased by edge effects.

Various tests have been suggested to detect departures from CSR based on summary statistics of m randomly sampled nearest neighbor event-event distances $(w_1, \ldots, w_m)$ or point-event distances $(x_1, \ldots, x_m)$. The most commonly used are:
- Clark-Evans
- Hopkins
- Byth and Ripley

Clark-Evans: compares
$$CE = \frac{\bar{w} - 0.5\lambda^{-1/2}}{\sqrt{(4-\pi)/(4\pi\lambda m)}}$$
with the percentage points of the standard normal distribution N(0, 1), where $\bar{w}$ is the mean of the m sampled event-event distances.
Basic properties:
- The test is based on event-event distances.
- It requires an enumerated (fully mapped) point pattern from which events can be randomly sampled and their nearest neighbor distances determined.
- λ is usually unknown and must be replaced by an appropriate estimate, $\hat{\lambda} = n/|R|$, where n is the number of events in R and |R| is its area.
- If an estimate of λ is used, it is desirable to use all n event-event distances, if possible, rather than a sample of m of them.

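For the case m = n, Donnelly's edge-corrected approximations can be used for the mean and variance of $\bar{w}$:
$$E(\bar{w}) \approx 0.5\sqrt{\frac{A}{n}} + \left(0.0514 + \frac{0.041}{\sqrt{n}}\right)\frac{P}{n}, \qquad \mathrm{Var}(\bar{w}) \approx 0.0703\,\frac{A}{n^2} + 0.037\,P\sqrt{\frac{A}{n^5}}$$
where P is the perimeter of the study region, which has area A.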
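A sketch of the Clark-Evans statistic, assuming $\hat{\lambda} = n/|R|$ replaces λ. If a perimeter is supplied, the edge-corrected mean and variance for m = n are used instead. The function names are hypothetical. Negative values indicate mean nearest neighbor distances that are shorter than expected, which suggests clustering. Positive values suggest regularity.

```python
import math


def clark_evans(w_dists, n_events, area, perimeter=None):
    """Clark-Evans statistic CE from sampled event-event distances w_1, ..., w_m.

    lambda is estimated as n_events / area. If perimeter is given, Donnelly's
    edge-corrected mean and variance are used instead (intended for m == n).
    CE well below 0 suggests clustering; well above 0 suggests regularity.
    """
    m = len(w_dists)
    w_bar = sum(w_dists) / m
    if perimeter is None:
        lam = n_events / area
        mean_w = 0.5 / math.sqrt(lam)
        var_wbar = (4 - math.pi) / (4 * math.pi * lam * m)
    else:
        n = n_events
        mean_w = 0.5 * math.sqrt(area / n) + (0.0514 + 0.041 / math.sqrt(n)) * perimeter / n
        var_wbar = 0.0703 * area / n ** 2 + 0.037 * perimeter * math.sqrt(area / n ** 5)
    return (w_bar - mean_w) / math.sqrt(var_wbar)


def two_sided_p(z):
    """Two-sided p-value of z against N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    ce = clark_evans([0.04, 0.06, 0.05, 0.03, 0.07] * 20, n_events=100, area=1.0, perimeter=4.0)
    print(ce, two_sided_p(ce))
```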

Hopkins: compares
$$h = \frac{\sum_{i=1}^{m} x_i^2}{\sum_{i=1}^{m} w_i^2}$$
with the percentage points of the F distribution with (2m, 2m) degrees of freedom. The physical implication of the test is that in clustered patterns the point-event distances $x_i$ will be large relative to the event-event distances $w_i$, and vice versa in a regular pattern.
Basic properties:
- The test requires complete enumeration of all n events in the study region, since it uses the $w_i$, so that event-event distances can be randomly sampled.
- This requirement can be relaxed, and the test applied in conjunction with sampling of point patterns, if a semi-systematic sampling scheme is employed, whereby a regular grid of sample points is used for calculating the point-event distances $x_i$.
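The F(2m, 2m) reference distribution has an exact elementary form when the degrees of freedom are equal, even integers. In that case $h/(1+h)$ follows a Beta(m, m) distribution, and a Beta tail with integer parameters equals a binomial cumulative probability. The sketch below uses that identity so that it needs only the standard library. The names `binom_cdf` and `hopkins_test` are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p), with 0 < p < 1, summed in log space."""
    return min(1.0, sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                 + j * math.log(p) + (n - j) * math.log1p(-p))
        for j in range(k + 1)))


def hopkins_test(x_dists, w_dists):
    """Hopkins statistic h = sum(x^2) / sum(w^2) for m paired distances.

    Under CSR h ~ F(2m, 2m). For integer m, h / (1 + h) ~ Beta(m, m), and
    P(Beta(m, m) > u) = P(Binomial(2m - 1, u) <= m - 1), which gives an exact
    upper-tail probability without a special-function library.
    """
    m = len(x_dists)
    if m != len(w_dists):
        raise ValueError("need the same number of x and w distances")
    h = sum(x * x for x in x_dists) / sum(w * w for w in w_dists)
    p_clustered = binom_cdf(m - 1, 2 * m - 1, h / (1.0 + h))  # P(F > h)
    return h, p_clustered, 1.0 - p_clustered  # last: small p suggests regularity


if __name__ == "__main__":
    print(hopkins_test([0.12, 0.30, 0.25, 0.18], [0.05, 0.07, 0.04, 0.06]))
```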

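Byth and Ripley: compares
$$b = \frac{1}{m}\sum_{i=1}^{m} \frac{x_i^2}{x_i^2 + w_i^2}$$
with the percentage points of the normal distribution $N\left(\tfrac{1}{2}, \tfrac{1}{12m}\right)$, where the $x_i$ values are randomly paired with the $w_i$ values.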
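Under CSR, $x_i^2$ and $w_i^2$ are independent exponential variables with the same rate. Each ratio $x_i^2/(x_i^2 + w_i^2)$ is therefore Uniform(0, 1), with mean 1/2 and variance 1/12, which is where the normal reference comes from. The sketch below does the random pairing by shuffling a copy of the $w_i$ and returns b together with its z-score. The name `byth_ripley` is hypothetical, and distances are assumed not to be both zero for any pair.

```python
import math
import random


def byth_ripley(x_dists, w_dists, seed=None):
    """Byth and Ripley statistic b = (1/m) * sum x^2 / (x^2 + w^2).

    The x_i are randomly paired with the w_i (shuffled copy). Under CSR each
    ratio is approximately Uniform(0, 1), so b ~ N(1/2, 1/(12m)).
    Returns (b, z); large z suggests clustering, small z regularity.
    """
    m = len(x_dists)
    w = list(w_dists)
    random.Random(seed).shuffle(w)  # random pairing of x_i with w_i
    b = sum(x * x / (x * x + wi * wi) for x, wi in zip(x_dists, w)) / m
    z = (b - 0.5) / math.sqrt(1.0 / (12 * m))
    return b, z


if __name__ == "__main__":
    print(byth_ripley([0.12, 0.30, 0.25, 0.18], [0.05, 0.07, 0.04, 0.06], seed=0))
```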

Table 5.1. Available statistics for testing CSR in nearest neighbor distances

Testing for CSR Based on the Distribution Function
Looking at the complete estimated distribution function of W or X, rather than at a single summary statistic, is another approach to testing CSR. The basic question is: can we construct a formal method for comparing the whole of the estimated distribution function with its theoretical form under CSR? The theoretical distributions of G(w) and F(x) under CSR are:
$$G(w) = 1 - e^{-\lambda\pi w^2},\ w \ge 0, \qquad F(x) = 1 - e^{-\lambda\pi x^2},\ x \ge 0$$

The theoretical distributions G(w) and F(x) can then be plotted and compared with the estimated (empirical) distribution functions $\hat{G}(w)$ and $\hat{F}(x)$. However, this still gives no formal way of assessing the significance of differences between the plots. A more satisfactory approach is to compare the estimated functions with a simulation estimate of their theoretical distributions.
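The informal comparison above amounts to tabulating the empirical distribution function next to the theoretical curve on a grid of distances. Below is a minimal sketch; edge effects are ignored, as in the theoretical formula, and the helper names are hypothetical.

```python
import math


def theoretical_cdf(r, lam):
    """G(w) = F(x) = 1 - exp(-lam * pi * r^2) under CSR, ignoring edge effects."""
    return 1.0 - math.exp(-lam * math.pi * r * r) if r >= 0 else 0.0


def empirical_cdf(dists, r):
    """Empirical distribution function: proportion of distances <= r."""
    return sum(d <= r for d in dists) / len(dists)


def compare(dists, lam, grid):
    """Rows of (r, empirical, theoretical), ready for plotting."""
    return [(r, empirical_cdf(dists, r), theoretical_cdf(r, lam)) for r in grid]


if __name__ == "__main__":
    for row in compare([0.03, 0.05, 0.06, 0.09, 0.12], lam=50.0, grid=[0.02, 0.05, 0.1]):
        print(row)
```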

The simulation estimate of G(w) under CSR is calculated as
$$\bar{G}(w) = \frac{1}{m}\sum_{i=1}^{m} \hat{G}_i(w)$$
where $\hat{G}_i(w)$, i = 1, …, m, are empirical distribution functions, each estimated from one of m independent simulations of n events under CSR, i.e. n events independently and uniformly distributed in R.

To assess the significance of departures between the simulated CSR distribution $\bar{G}(w)$ and the observed $\hat{G}(w)$, it is also necessary to define upper and lower simulation envelopes:
$$U(w) = \max_{i=1,\ldots,m} \hat{G}_i(w), \qquad L(w) = \min_{i=1,\ldots,m} \hat{G}_i(w)$$
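The simulation estimate and the envelopes can be sketched as follows. This assumes R is the unit square and uses uncorrected nearest neighbor distances. Because the observed and the simulated patterns are measured in the same way, they share the same edge bias. The default m = 19 and the names `nn_distances`, `edf` and `g_envelopes` are choices made for this sketch.

```python
import math
import random


def nn_distances(points):
    """Event-event nearest neighbor distance for every event (no edge correction)."""
    return [min(math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points) if j != i)
            for i, (xi, yi) in enumerate(points)]


def edf(dists, w):
    """Empirical distribution function of the distances at w."""
    return sum(d <= w for d in dists) / len(dists)


def g_envelopes(observed, ws, m=19, seed=0):
    """G-hat for the data, plus G-bar, U and L from m CSR simulations on the unit square."""
    rng = random.Random(seed)
    n = len(observed)
    d_obs = nn_distances(observed)
    g_obs = [edf(d_obs, w) for w in ws]
    sims = []
    for _ in range(m):
        d = nn_distances([(rng.random(), rng.random()) for _ in range(n)])
        sims.append([edf(d, w) for w in ws])
    g_bar = [sum(s[k] for s in sims) / m for k in range(len(ws))]
    upper = [max(s[k] for s in sims) for k in range(len(ws))]
    lower = [min(s[k] for s in sims) for k in range(len(ws))]
    return g_obs, g_bar, upper, lower


if __name__ == "__main__":
    # Hypothetical observed pattern: a regular 10 x 10 lattice in the unit square
    lattice = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
    ws = [0.02, 0.05, 0.08]
    for row in zip(ws, *g_envelopes(lattice, ws)):
        print(row)
```

For the regular lattice, $\hat{G}(w)$ falls below L(w) at short distances, which is the signature of regularity described in the plot interpretation that follows.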

When $\hat{G}(w)$ is plotted against $\bar{G}(w)$ and U(w) and L(w) are added to the plot:
- If the data are compatible with CSR, the plot of $\hat{G}(w)$ against $\bar{G}(w)$ should be roughly linear, along the 45° line.
- If clustering is present, the plot will lie above the line.
- If regularity is present, the plot will lie below the line.

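U(w) and L(w) help to assess the significance of departures from the 45° line, since under CSR they have the property
$$P\left(\hat{G}(w) > U(w)\right) = P\left(\hat{G}(w) < L(w)\right) = \frac{1}{m+1}$$
This also indicates the number of simulations required to detect a departure at a specified significance level. For example, m = 19 simulations give 1/(m + 1) = 0.05 for each tail at a given w.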

5.4. The K Function Tests for CSR
Under CSR the expected number of further events within a distance h of a randomly chosen event is $\lambda\pi h^2$. Since this expected number equals $\lambda K(h)$, theoretically under CSR:
$$K(h) = \pi h^2$$

Hence the K function estimated from the observed data, $\hat{K}(h)$, is compared with the theoretical one. One way of doing this is to use the transformation $\hat{L}(h) = \sqrt{\hat{K}(h)/\pi}$ and compare the plot of $\hat{L}(h) - h$ against h with its theoretical value of zero under CSR:
- Positive peaks: clustering
- Negative troughs: regularity
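The slides do not specify which edge correction is built into $\hat{K}(h)$. The sketch below assumes the translation correction on a rectangle [0, a] × [0, b]. Each close pair is weighted by the area of R divided by the area of overlap between R and R shifted by the pair's displacement. It also assumes $\lambda^2$ is estimated by n(n − 1)/|R|², which makes $\hat{K}(h)$ unbiased for $\pi h^2$ under CSR. Ripley's isotropic correction is a common alternative, and the names here are hypothetical.

```python
import math


def k_hat(points, h, a=1.0, b=1.0):
    """Translation edge-corrected K-hat for events in the rectangle [0, a] x [0, b].

    Each ordered pair (i, j) closer than h is weighted by |R| / |R intersect (R + x_i - x_j)|,
    which for a rectangle is a*b / ((a - |dx|) * (b - |dy|)).
    """
    n = len(points)
    area = a * b
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j:
                dx, dy = abs(xi - xj), abs(yi - yj)
                if math.hypot(dx, dy) <= h:
                    total += area / ((a - dx) * (b - dy))
    return area * total / (n * (n - 1))


def l_minus_h(points, h, a=1.0, b=1.0):
    """L-hat(h) - h, with L-hat(h) = sqrt(K-hat(h) / pi); about 0 under CSR."""
    return math.sqrt(k_hat(points, h, a, b) / math.pi) - h


if __name__ == "__main__":
    pts = [(0.1, 0.2), (0.15, 0.22), (0.7, 0.8), (0.72, 0.79), (0.4, 0.5)]
    for h in (0.05, 0.1, 0.2):
        print(h, k_hat(pts, h), l_minus_h(pts, h))
```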

A formal assessment of the significance of the observed peaks and troughs requires knowledge of the sampling distributions of $\hat{K}(h)$ and $\hat{L}(h)$ under CSR. These are unknown and complex because of the edge corrections built into $\hat{K}(h)$. However, it is possible to use an approach analogous to the one used for nearest neighbor distances.

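The method involves:
- Obtaining a simulation estimate of the sampling distributions by computing $\hat{K}_i(h)$, i = 1, …, m, from m independent simulations of n events under CSR.
- Constructing upper and lower simulation envelopes:
$$U(h) = \max_{i=1,\ldots,m} \hat{K}_i(h), \qquad L(h) = \min_{i=1,\ldots,m} \hat{K}_i(h)$$
- Plotting $\hat{L}(h) - h$ against h, together with the envelopes transformed in the same way, $\sqrt{U(h)/\pi} - h$ and $\sqrt{L(h)/\pi} - h$.
- Assessing the significance of peaks and troughs on the basis of
$$P\left(\hat{K}(h) > U(h)\right) = P\left(\hat{K}(h) < L(h)\right) = \frac{1}{m+1}$$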

Alternative Models to CSR
- For clustered patterns:
    - First-order effects only: heterogeneous Poisson process; Cox process
    - Second-order effects only: Poisson cluster process
- For regular patterns: simple inhibition process; Markov point processes
- Either: Markov point processes