Pattern Statistics
Michael F. Goodchild
University of California, Santa Barbara
Outline
• Some examples of analysis
• Objectives of analysis
• Cross-sectional analysis
• Point patterns
What are we trying to do?
• Infer process
  – processes leave distinct fingerprints on the landscape
  – several processes can leave the same fingerprints
    · enlist time to resolve ambiguity
    · invoke Occam's Razor
    · confirm a previously identified hypothesis
Alternatives
• Expose aspects of pattern that are otherwise invisible
  – Openshaw
  – Cova
• Expose anomalies, patterns
• Convince others of the existence of patterns, problems, anomalies
Cross-sectional analysis
• Social data collected in cross-section
  – longitudinal data are difficult to construct
  – difficult for bureaucracies to sustain
  – compare temporal resolution of process to temporal resolution of bureaucracy
• Cross-sectional perspectives are rich in context
  – can never confirm process
  – though they can perhaps falsify
  – useful source of hypotheses, insights
What kinds of patterns are of interest?
• Unlabeled objects
  – how does density vary?
  – do locations influence each other?
  – are there clusters?
• Labeled objects
  – is the arrangement of labels random?
  – or do similar labels cluster?
  – or do dissimilar labels cluster?
First-order effects
• Random process (CSR)
  – all locations are equally likely
  – an event does not make other events more likely in the immediate vicinity
• First-order effect
  – events are more likely in some locations than others
  – events may still be independent
  – varying density
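To make the distinction concrete, the sketch below simulates both cases on a unit square. It is illustrative only: the function names, the unit-square study area, and the west-to-east density gradient are assumptions, not part of the slides. In both patterns events are placed independently of one another, so the varying density in the second pattern is purely a first-order effect.

```python
import random

def simulate_csr(n, rng):
    """Place n events uniformly and independently on the unit square (CSR)."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def simulate_first_order(n, rng):
    """Place n independent events whose density rises linearly with x.

    Rejection sampling: a candidate at (x, y) is kept with probability x,
    so events are more likely in the east but still do not influence
    each other (hypothetical gradient chosen for illustration).
    """
    points = []
    while len(points) < n:
        x, y = rng.random(), rng.random()
        if rng.random() < x:
            points.append((x, y))
    return points

rng = random.Random(42)
csr = simulate_csr(500, rng)
trend = simulate_first_order(500, rng)
```

A cluster of events in the east of the second pattern would say nothing about interaction between events; it reflects only where events are more likely.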
Second-order effects
• Event makes others more or less likely in the immediate vicinity
  – clustering
  – but is a cluster the result of first- or second-order effects?
  – is there a prior reason to expect variation in density?
Testing methods
• Counts by quadrat
  – Poisson distribution
Deaths by horse-kick in the Prussian army
• Mean m = 0.61, n = 200
Table columns: deaths per year (0, 1, 2, 3, 4), probability, number of years expected, number of years observed
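The probability and expected-frequency columns follow directly from the Poisson distribution with the stated mean. The sketch below recomputes them from m = 0.61 and n = 200; the function name and output layout are illustrative assumptions, and no observed counts are supplied.

```python
import math

def poisson_expected(mean, n_obs, max_count):
    """Return (count, probability, expected frequency) for counts 0..max_count."""
    rows = []
    for k in range(max_count + 1):
        p = math.exp(-mean) * mean ** k / math.factorial(k)
        rows.append((k, p, n_obs * p))
    return rows

rows = poisson_expected(mean=0.61, n_obs=200, max_count=4)
for k, p, expected in rows:
    print(f"{k} deaths: probability {p:.3f}, expected years {expected:.1f}")
```

Comparing these expected frequencies with the observed ones is the basis of the quadrat test.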
Towns in Iowa
• 1173 towns, 154 quadrats of 20 mi by 10 mi
• Chi-square with 8 df = 12.7
• Accept H0
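A quadrat test of this kind can be sketched as follows, assuming the counts per quadrat are available as a list. The pooling of high counts into a single top class and the function name are assumptions for illustration; the slide does not say how its classes were formed. The final line checks the reported statistic against the standard 5% critical value for 8 degrees of freedom (15.507).

```python
import math
from collections import Counter

def quadrat_chi_square(quadrat_counts, max_class):
    """Chi-square of observed quadrat counts against a Poisson model.

    Classes are 0, 1, ..., max_class - 1 and a pooled 'max_class or more'.
    Returns (chi_square, degrees_of_freedom).
    """
    n_quadrats = len(quadrat_counts)
    mean = sum(quadrat_counts) / n_quadrats          # estimated from the data
    observed = Counter(min(c, max_class) for c in quadrat_counts)

    probs = [math.exp(-mean) * mean ** k / math.factorial(k)
             for k in range(max_class)]
    probs.append(1.0 - sum(probs))                   # pooled upper tail

    chi_sq = sum((observed.get(k, 0) - n_quadrats * p) ** 2 / (n_quadrats * p)
                 for k, p in enumerate(probs))
    df = len(probs) - 2   # minus 1 for the total, minus 1 for the fitted mean
    return chi_sq, df

CRITICAL_5PCT_8DF = 15.507        # chi-square critical value, 8 df, alpha = 0.05
print(12.7 < CRITICAL_5PCT_8DF)   # the reported statistic is below it
```

Because 12.7 is below the critical value, the hypothesis of a random (Poisson) arrangement is not rejected at this quadrat size.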
Distance to nearest neighbor
• Observed mean distance r_o
• Expected mean distance r_e = 1/(2√d)
  – where d is density per unit area
• Test statistic:
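A standard form of this statistic, assumed here because it is the usual Clark-Evans formulation, divides the difference between the observed and expected mean distances by the standard error of the mean nearest-neighbor distance under CSR, where n is the number of points tested:

```latex
z = \frac{r_o - r_e}{\sigma_{r_e}}, \qquad
r_e = \frac{1}{2\sqrt{d}}, \qquad
\sigma_{r_e} = \frac{0.26136}{\sqrt{n\,d}}
```

Values of |z| below about 1.96 are consistent with CSR at the 5% level.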
Towns in Iowa
• 622 points tested
• 643 per unit area
• Observed mean distance 3.52
• Expected mean distance 3.46
• Test statistic 0.82
• Accept H0
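These figures can be checked with a short sketch of the nearest-neighbor test. It assumes the Clark-Evans z-score, with standard error 0.26136/√(n·d), and back-computes the density from the stated expected distance, since the unit of "643 per unit area" is not given. The function names are illustrative.

```python
import math

def mean_nearest_neighbor_distance(points):
    """Brute-force mean distance from each point to its nearest neighbor."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def clark_evans_z(r_obs, n, density):
    """z-score of the observed mean nearest-neighbor distance under CSR."""
    r_exp = 1.0 / (2.0 * math.sqrt(density))
    std_err = 0.26136 / math.sqrt(n * density)
    return (r_obs - r_exp) / std_err

# Density implied by the expected distance r_e = 3.46 (an assumption:
# back-computed from r_e = 1 / (2 * sqrt(d)), not read from the slide).
implied_density = 1.0 / (4.0 * 3.46 ** 2)
z = clark_evans_z(r_obs=3.52, n=622, density=implied_density)
print(round(z, 2))
```

Under these assumptions the result is close to the reported 0.82, well inside the range expected under CSR.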
But what about scale?
• A pattern can be clustered at one scale and random or dispersed at another
• Poisson test
  – scale reflected in quadrat size
• Nearest-neighbor test
  – scale reflected in choosing nearest neighbor
  – higher-order neighbors could be analyzed
Weaknesses of these simple methods
• Difficulty of dealing with scale
• Second-order effects only
  – density assumed uniform
• Better methods are needed
K-function analysis
• K(h) = expected number of events within h of an arbitrarily chosen event, divided by d
• How to estimate K?
  – take an event i
  – for every event j lying within h of i:
    · score 1
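The scoring procedure can be sketched directly. This is a naive version with no edge correction (every qualifying pair scores exactly 1), and the function name and the `area` parameter are assumptions for illustration.

```python
import math

def estimate_k(points, h, area):
    """Naive estimate of K(h) with no edge correction.

    Every ordered pair (i, j), i != j, with distance <= h scores 1.
    The total is divided by n to give the mean number of neighbors per
    event, then by the density d = n / area.
    """
    n = len(points)
    score = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                score += 1
    density = n / area
    return (score / n) / density

# Hypothetical example: four events at the corners of a unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(estimate_k(corners, h=1.1, area=1.0))
```

Evaluating the estimate over a range of h values shows how the pattern behaves at each scale, which the quadrat and nearest-neighbor tests cannot do in a single pass.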
Allowing for edge effects
  – score < 1
The K function
• In CSR K(h) = πh²
• So instead plot:
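Since K(h) = πh² under CSR, a commonly plotted transformation (assumed here to be the one intended) is the L function, which equals zero at every distance when the pattern is random:

```latex
\hat{L}(h) = \sqrt{\frac{\hat{K}(h)}{\pi}} - h
```

Positive values suggest clustering at distance h, and negative values suggest dispersion.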
What about labeled points?
• How are the points located?
  – random, clustered, dispersed
• How are the values assigned among the points?
  – among possible arrangements
  – random
  – clustered
  – dispersed
Moran and Geary indices
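Both indices measure how values are arranged among fixed locations. The sketch below uses their standard definitions with a spatial weights matrix; the function name, the binary adjacency weights, and the example values are assumptions for illustration.

```python
def moran_geary(values, weights):
    """Return (Moran's I, Geary's C) for values and a spatial weights matrix.

    weights[i][j] is the weight between locations i and j (0 on the diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sum_sq = sum(d * d for d in dev)
    w_total = sum(sum(row) for row in weights)

    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))

    moran_i = (n / w_total) * cross / sum_sq
    geary_c = ((n - 1) / (2 * w_total)) * sq_diff / sum_sq
    return moran_i, geary_c

# Four locations in a row; neighbors are adjacent locations (hypothetical data).
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(moran_geary([1, 2, 3, 4], chain))   # similar values adjacent
print(moran_geary([1, 4, 1, 4], chain))   # dissimilar values adjacent
```

When similar values are adjacent, Moran's I is positive and Geary's C falls below 1; when dissimilar values are adjacent, I is negative and C rises above 1.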