Inferential Spatial Statistics: Introduction to Concepts

1 Inferential Spatial Statistics: Introduction to Concepts
Today: review standard statistical inference, examine the concept of spatial randomness, and define a random point pattern. Next time: using inferential spatial statistics to analyze point patterns.
Briggs Henan University 2010

2 Spatial Analysis: successive levels of sophistication
Spatial data description: classic GIS capabilities, such as spatial queries and measurement, buffering, and map layer overlay.
Exploratory Spatial Data Analysis (ESDA): searching for patterns and possible explanations, through geovisualization (data graphing and mapping) and descriptive spatial statistics.
Spatial statistical analysis and hypothesis testing: are the data “to be expected” or “unexpected” relative to some statistical model, usually of a random process?
Spatial modeling or prediction: constructing models (of processes) to predict spatial outcomes (patterns).

3 Descriptive & Inferential Statistical Analysis
Last time we discussed descriptive statistics for spatial analysis, which are concerned with obtaining summary measures to describe a set of data: for example, the mean and the standard deviation, or the centroid and the standard distance. This time we will discuss inferential statistics. We begin by reviewing standard (non-spatial) inferential statistics, and then look at inferential spatial statistics.

4 Standard Statistical Inference:
Inferential statistics is concerned with making inferences from a sample (or samples) about a population (or populations), and from observed patterns about underlying processes. I hope you are already familiar with standard (non-spatial) inferential statistics; I will quickly review the main ideas.

5 Populations and Samples
Population: all occurrences of a particular phenomenon. Sample: a part (subset) of the population for which we have data; the sample is used to make inferences about the population. For example, you are a sample of the population of all people in the world. To infer is to draw conclusions about the population from the sample.

6 From Lecture #2 on Spatial Analysis: Process, Pattern and Analysis
Processes create patterns. Often we cannot observe the process, so we have to infer the process by observing the pattern. In the same way, from the sample we infer the process in the population.

7 The Importance of the Sample
How “good” (or “accurate,” or “true”) are our inferences or conclusions? It depends upon the sample! If we get a good sample, one that is representative of the population, the conclusions are good. If we get a bad sample, one that is not representative of the population, the conclusions are not good.

8 The Requirement of a Random Sample
All statistical inference is based on the assumption (requirement) that you have a random sample. What is a random sample? A sample chosen such that every member of the population has an equal chance (probability) of being included. This doesn’t guarantee a representative sample: you could be really unlucky and get

9 Some Definitions: Statistics Are Estimates of Parameters
Sample: the subset of the population for which we have data. Statistics: numbers calculated from the sample. Population: all occurrences. Parameters: numbers calculated from the population. Statistics are estimates of parameters. We can calculate the statistic because we have data for the sample. We cannot calculate the parameter because we do not have data for the entire population.
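The difference between a parameter and a statistic can be made concrete with a small simulation. This is a minimal sketch, assuming a made-up population of 10,000 IQ-like scores; in a real study the population mean would be unknown, and only the sample mean could be calculated. Python's `random.sample` draws a simple random sample, in which every member has the same chance of being chosen.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up IQ-like scores (mean 100, SD 15).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Parameter: calculated from the entire population (unknown in real studies).
population_mean = statistics.mean(population)

# Simple random sample: random.sample gives every member an equal chance.
sample = rng.sample(population, k=100)

# Statistic: calculated from the sample, used as an estimate of the parameter.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

Rerunning with a different seed gives a different sample and so a different statistic, while the parameter stays fixed.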

10 Example: Are girls more intelligent than boys?
Sample of boys: mean IQ* = 115. Sample of girls: mean IQ* = 130. (*IQ = Intelligence Quotient.) “Ha! Ha! Girls are more intelligent than boys. Here is the proof!” “No! No! It depends on the samples we have. The sample statistics are different, but the population parameters may be the same!” Who is correct?

11 How do we decide who is correct? The Null Hypothesis and the Alternative Hypothesis
Assume that in the population the average (mean) IQ of girls is the same as the average IQ of boys. This is called the null hypothesis: there is no difference between girls and boys in the population. The alternative hypothesis: in the population, girls are smarter than boys.

12 Choosing between Null and Alternative
In our two samples, the difference between the sample means was 15. Ask the question: if the population means are the same, how probable is it that, from sampling variation alone, I would get a difference of 15 points between sample means? If this is reasonably probable (likely), accept the null hypothesis. If this is highly improbable (highly unlikely), reject the null and accept the alternative hypothesis.

13 How do I calculate the probability of getting a difference of 15?
We use the sampling distribution. What is this?

14 All girls (the population of girls) and all boys (the population of boys)
Draw random samples from each population. For every pair of samples, calculate the mean of each, and then the difference between these means.

15 The Sampling Distribution
If we have a thousand sample pairs, we have a thousand values for the difference between the sample means. We can draw a frequency distribution showing how often different values occur. The sampling distribution is simply the frequency distribution for some value calculated each time from many, many samples. The calculated value is called the test statistic.
(Figure: sampling distribution with 2.5% of values below -1.96 and 2.5% above 1.96.)
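The “thousand sample pairs” thought experiment can be run directly on a computer. This is a minimal sketch, assuming two made-up populations in which the null hypothesis is true (both have mean IQ 100 and SD 15) and samples of 50; the text histogram approximates the sampling distribution of the difference between sample means.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical populations where the null hypothesis is TRUE:
# girls and boys share the same mean (100) and spread (15).
girls = [rng.gauss(100, 15) for _ in range(5_000)]
boys = [rng.gauss(100, 15) for _ in range(5_000)]

def difference_of_means(n=50):
    """Draw one random sample pair and return mean(girls) - mean(boys)."""
    g = rng.sample(girls, n)
    b = rng.sample(boys, n)
    return statistics.mean(g) - statistics.mean(b)

# A thousand sample pairs give a thousand values of the difference.
differences = [difference_of_means() for _ in range(1_000)]

# Crude text histogram of the simulated sampling distribution (2-point bins).
bins = {}
for d in differences:
    left = 2 * (d // 2)
    bins[left] = bins.get(left, 0) + 1
for left in sorted(bins):
    print(f"{left:+6.0f} to {left + 2:+4.0f} | {'#' * (bins[left] // 5)}")
```

The histogram is centered near zero and roughly symmetric; with these assumed sample sizes, a difference of 15 essentially never arises from sampling variation alone.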

16 Using the Sampling Distribution
(Figure: two sampling distributions, each with 2.5% tails beyond -1.96 and 1.96, with the observed difference of 15 marked.)
In the first, a sample difference of 15 is quite likely. Conclusion: accept the null; boys and girls are the same.
In the second, a sample difference of 15 is very unlikely. Conclusion: reject the null and accept the alternative; girls are smarter than boys.
To reject the null hypothesis, the probability of getting such a difference if the null hypothesis were true should be less than 5% (.05). This probability is called the statistical significance of the test.

17 Calculating a Test Statistic
To find the exact probability of getting a difference of 15 between the girls and boys, we calculate a test statistic. A test statistic is a number, calculated from a sample statistic, whose sampling distribution is known; that is, we know the shape of the frequency distribution of the test statistic when multiple samples are taken. In the case of the difference between two sample means, the test statistic is:
$Z = \frac{\bar{X}_g - \bar{X}_b}{\sqrt{S^2_g / N_g + S^2_b / N_b}}$
where $\bar{X}_g$ and $\bar{X}_b$ are the sample means, $S^2_g$ = variance for girls, $S^2_b$ = variance for boys, and $N_g$ and $N_b$ are the sample sizes. Its sampling distribution is a normal frequency distribution if the sample sizes are greater than 30. Note: many test statistics (for example, t, chi-square, and F) have “degrees of freedom,” which are calculated from the sample size (N).
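The Z calculation is short enough to sketch in Python. This is a minimal sketch, assuming the standard two-sample Z statistic (the difference of means divided by the square root of S²g/Ng + S²b/Nb); the slides give only the two means (130 and 115), so the variances (400) and sample sizes (40) below are hypothetical values chosen for illustration.

```python
import math

def z_difference_of_means(mean_g, mean_b, var_g, var_b, n_g, n_b):
    """Z test statistic for the difference between two sample means."""
    standard_error = math.sqrt(var_g / n_g + var_b / n_b)
    return (mean_g - mean_b) / standard_error

# Means from the slides; variances and sample sizes are hypothetical.
z = z_difference_of_means(mean_g=130, mean_b=115, var_g=400, var_b=400, n_g=40, n_b=40)
print(f"Z = {z:.3f}")  # → Z = 3.354
```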

18 Test Statistic for Normal Frequency Distribution
(Figure: sampling distribution with 2.5% tails beyond -1.96 and 1.96.)
To reject the null hypothesis, the Z test statistic should have a value greater than 1.96 (or less than -1.96). If the means were the same in the population, there would be less than a 5% chance of getting a Z value this extreme. Conclusion: reject the null and accept the alternative; girls are smarter than boys.
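The ±1.96 cut-off can be turned into an exact probability. This is a minimal sketch, assuming a two-tailed test against the standard normal distribution; it uses `math.erfc`, since P(|Z| ≥ |z|) = erfc(|z| / √2). The Z values are illustrative, and the wording of the verdicts follows the lecture’s “accept/reject the null” convention.

```python
import math

def two_tailed_p(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

def decide(z, alpha=0.05):
    """Compare the two-tailed probability with the 5% significance level."""
    p = two_tailed_p(z)
    return ("reject the null" if p < alpha else "accept the null"), p

for z in (0.8, 1.96, 3.354):
    verdict, p = decide(z)
    print(f"Z = {z:5.3f}  p = {p:.4f}  -> {verdict}")
```

At Z = 1.96 the probability is almost exactly 0.05, which is why ±1.96 marks the boundary of the 2.5% tails.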

19 Standard Error: Standard Deviation of the Sampling Distribution
(Figure: two sampling distributions with 2.5% tails beyond -1.96 and 1.96, one with a smaller and one with a larger standard error.)
Test statistic for the difference between two means: $Z = (\bar{X}_g - \bar{X}_b) / SE$, where the standard error for the difference between two means is $SE = \sqrt{S^2_g / N_g + S^2_b / N_b}$.
The standard error is very important. Approximately, it tells you how far, on average, the sample statistic is from the population parameter; thus, it is a measure of sampling variability, or error. The larger the standard error, the more difficult it is to reject the null hypothesis.
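The effect of the standard error on the test can be seen by holding the 15-point difference fixed and changing only the sample size. This is a minimal sketch, assuming hypothetical variances of 400 for both groups and equal sample sizes; it uses the two-sample standard error formula for the difference between means.

```python
import math

def standard_error(var_g, var_b, n_g, n_b):
    """Standard error of the difference between two sample means."""
    return math.sqrt(var_g / n_g + var_b / n_b)

# Same 15-point difference, same hypothetical variances, different sample sizes.
for n in (10, 40, 160):
    se = standard_error(400, 400, n, n)
    print(f"N = {n:3d}  SE = {se:5.2f}  Z = {15 / se:4.2f}")
```

With N = 10 the standard error is large and Z stays below 1.96, so the null hypothesis cannot be rejected; with larger samples the same difference becomes significant.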

20 Reporting the Results of a Statistical Significance Test: Many Ways to Say the Same Thing!
When we use a test statistic and its sampling distribution, we say that we are conducting a statistical significance test. We reject the null hypothesis if, were it true, there would be fewer than 5 chances in 100 of getting a result at least as extreme as ours. We say the results are “statistically significant at the 5% level,” or we say the results are “significant at the 95% confidence level.”

21 The Normal or Gaussian Probability Distribution.
This is the sampling distribution for tests involving differences between means. Why is it this shape? If the null hypothesis is true, what would be the average value of the differences between the sample means? It would be zero (0). We expect many small differences and few big differences, so values would be concentrated around the mean. We expect as many negative differences as positive differences, so the distribution is symmetrical: the same on each side of the mean.

22 How do we find the Sampling Distribution and Test Statistic?
Two methods. By mathematical theory: test statistics and sampling distributions are already known through theory; common distributions are Z (normal), chi-square, and F. By computer simulation: the computer is used to “simulate” multiple samples, and we use these to draw a frequency distribution, as in our “boys and girls” example. This approach is very common in spatial statistics.

23 Spatial Statistical Inference

24 Spatial Statistical Inference: Null and Alternative Hypotheses
Null hypothesis: the spatial pattern is random (IRP/CSR: independent random process / complete spatial randomness). Alternative hypothesis: the spatial pattern is not random; it may be clustered or dispersed.

25 What do we mean by spatially random?
(Figure: example point patterns labeled uniform/dispersed and clustered.)
Random: a point is equally likely to occur at any location, and the position of a point is not affected by the position of any other point.
Uniform: every point is as far from other points as possible (“likely to be distant”).
Clustered: every point is close to other points (“likely to be close”).
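The definition of a random point pattern translates directly into code: place each point independently and uniformly. This is a minimal sketch, assuming a unit-square study area; `csr_points` and `quadrat_counts` are hypothetical helper names, and the grid counts are included only as a rough check that no part of the area is favored.

```python
import random

def csr_points(n, width=1.0, height=1.0, seed=None):
    """Complete spatial randomness: every location is equally likely, and each
    point is placed independently of all the others."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def quadrat_counts(points, k=4, width=1.0, height=1.0):
    """Count points in a k x k grid of equal cells."""
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        col = min(int(x / width * k), k - 1)   # clamp points lying on the far edge
        row = min(int(y / height * k), k - 1)
        counts[row][col] += 1
    return counts

pts = csr_points(200, seed=1)
for row in quadrat_counts(pts):
    print(row)   # each of the 16 cells expects about 200 / 16 = 12.5 points
```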

26 Is it Spatially Random? Difficult to know!
Fact: twice as many people sit “on the corners” as sit opposite each other at tables in a restaurant. Conclusion: a psychological preference for nearness? In actuality, this is an outcome to be expected from a random process: there are two ways to sit opposite, but four ways to sit on the corners (from O’Sullivan and Unwin, p. 69).
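The counting argument can be checked by enumeration and by simulation. This is a minimal sketch, assuming a square table with one seat on each side, numbered clockwise 0 to 3, and two diners who pick seats completely at random.

```python
import random
from itertools import combinations

# Assumption: square table, one seat per side, numbered clockwise 0-3.
def is_opposite(a, b):
    return abs(a - b) == 2          # seats 0-2 and 1-3 face each other

pairs = list(combinations(range(4), 2))   # every way two people can take two seats
corner = [p for p in pairs if not is_opposite(*p)]
opposite = [p for p in pairs if is_opposite(*p)]
print(len(pairs), "pairs:", len(corner), "corner,", len(opposite), "opposite")
# → 6 pairs: 4 corner, 2 opposite

# Purely random seating: the corner share should be close to 4/6.
rng = random.Random(0)
trials = 100_000
corner_hits = sum(not is_opposite(*rng.sample(range(4), 2)) for _ in range(trials))
print(f"simulated share on corners: {corner_hits / trials:.3f}")
```

A random process alone produces the two-to-one ratio, so the observed pattern is no evidence of a preference for nearness.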

27 High Peak district biomass index: ratio of spectral bands B3 and B4 of remotely sensed data
(Figure: two maps of the index, one labeled “spatially clustered” and one labeled “geographically random.”)

28 Why Processes differ from Random
Processes differ from random in two primary ways.
Variation in the study area (first-order effect): diseases cluster because people cluster (e.g., cancer); cancer cases cluster because chemical plants cluster.
Interdependence of the points themselves (second-order effect): diseases cluster because people catch them from others who have the disease (e.g., colds).
In practice, it is very difficult to distinguish these two effects merely by the analysis of spatial data.
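Both kinds of departure from randomness can be generated on purpose. This is a minimal sketch under stated assumptions: the first-order process uses a hypothetical intensity that rises linearly from left to right (generated by rejection sampling), and the second-order process is a hypothetical parent/child cluster process in which each point is placed near a randomly located parent. Neither is a model from the lecture; both only illustrate the two effects.

```python
import random

rng = random.Random(7)

def first_order_pattern(n):
    """Assumed first-order effect: points are independent of one another, but
    intensity increases from x = 0 to x = 1 (a uniform candidate is kept with
    probability x)."""
    points = []
    while len(points) < n:
        x, y = rng.random(), rng.random()
        if rng.random() < x:
            points.append((x, y))
    return points

def second_order_pattern(n_parents, children_per_parent, spread=0.03):
    """Assumed second-order effect: each 'child' point is positioned relative to
    a randomly placed 'parent', so points depend on one another.
    Coordinates wrap around the unit square (% 1.0)."""
    points = []
    for _ in range(n_parents):
        px, py = rng.random(), rng.random()
        for _ in range(children_per_parent):
            points.append(((px + rng.gauss(0, spread)) % 1.0,
                           (py + rng.gauss(0, spread)) % 1.0))
    return points

a = first_order_pattern(2000)
b = second_order_pattern(n_parents=20, children_per_parent=10)
share_right = sum(x > 0.5 for x, _ in a) / len(a)
print(f"first order: {share_right:.2f} of points in the right half (0.75 expected)")
print(f"second order: {len(b)} points in 20 clusters")
```

Mapped, both patterns look “clustered,” which illustrates why the two effects are hard to tell apart from the spatial data alone.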

29 Bank Robberies: First-Order or Second-Order Effect?
Bank robberies are clustered. First order: because banks are clustered. (Map: bank robberies and banks.) In the lecture on spatial analysis we called this the effect of “non-uniformity of space.” Could there also be a second-order effect?

30 Remember our data on software and telecommunications industries in Dallas?
We can think of these data as a sample. We can use statistical inference to test whether the spatial pattern is clustered, or “random” (no pattern). We will look at the actual tests later.

31 Spatial Statistical Hypothesis Testing: Simulation Approach
Because of the complexity of spatial processes, it is often difficult to derive theoretically a test statistic with a known probability distribution. Instead, we often use computer simulations: we take multiple samples from a random spatial pattern, calculate the spatial statistic we are using for each sample, and then draw a frequency distribution. This simulated sampling distribution is used to measure the probability of obtaining our actual observed spatial statistic.
(Figure: empirical frequency distribution from 500 random patterns (“samples”), with our observed value marked.)
Our observed value is highly unlikely to have occurred if the process was random, so we conclude that the process is not random.
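The simulation approach can be sketched end to end. This is a minimal sketch, assuming the mean nearest-neighbor distance as the spatial statistic (one common choice, not necessarily the one behind the figure), a unit-square study area, and 500 random patterns to match the slide. The pseudo p-value (k + 1) / (n_sim + 1) counts the observed pattern as one of the possible outcomes; the clustered test data are made up.

```python
import math
import random

def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(points) if j != i)
    return total / len(points)

def monte_carlo_test(observed_points, n_sim=500, seed=0):
    """Compare the observed statistic with its distribution under randomness."""
    rng = random.Random(seed)
    n = len(observed_points)
    observed = mean_nn_distance(observed_points)
    simulated = [mean_nn_distance([(rng.random(), rng.random()) for _ in range(n)])
                 for _ in range(n_sim)]
    # One-sided test for clustering: random patterns at least as clustered as ours.
    as_extreme = sum(s <= observed for s in simulated)
    p_value = (as_extreme + 1) / (n_sim + 1)
    return observed, simulated, p_value

# Made-up clustered pattern: 13 points around each of three centers.
rng = random.Random(3)
centers = [(0.2, 0.2), (0.8, 0.3), (0.5, 0.8)]
clustered = [((cx + rng.gauss(0, 0.02)) % 1.0, (cy + rng.gauss(0, 0.02)) % 1.0)
             for cx, cy in centers for _ in range(13)]
obs, sims, p = monte_carlo_test(clustered)
print(f"observed mean NN distance {obs:.3f}, p = {p:.3f}")
```

Plotting `sims` as a histogram reproduces the kind of empirical frequency distribution shown in the figure.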

32 Software for Spatial Statistics
ArcGIS 9: the most common GIS software, but $$$$! Spatial Statistics tools for point and polygon analysis; Spatial Analyst tools for kernel density; Geostatistical Analyst tools for interpolation of continuous surface data.
CrimeStat III: download from: A standalone package, free for government and education use. Calculates values for spatial statistics, but has no GIS graphics. Good documentation and explanation of measures and concepts.
OpenGeoDA (Geographic Data Analysis), by Luc Anselin, now at Arizona State University. Download from: Runs on Vista and Windows 7 (also Mac and UNIX). An earlier version, called GeoDA, runs only on XP (0.9.5i_6). Easy to use, with good graphic capabilities.
R: an open-source statistical package, originally on UNIX but now with an MS Windows version. Has the most extensive set of spatial statistical analyses, but is difficult to use. You need to learn it if you are going to do major work in this area.
S-Plus: the only commercial statistical package with extensive support for spatial statistics.

33 References
O’Sullivan and Unwin. Geographic Information Analysis. New York: John Wiley, 1st ed. 2003; 2nd ed. 2010.
Jay Lee and David Wong. Statistical Analysis with ArcView GIS. New York: Wiley, 1st ed. (all page references are to this book); 2nd ed. 2005.
Unfortunately, these books are based on old software (Avenue scripts used with ArcView 3.x), which no longer works in the current versions of ArcGIS (9 or 10).
Ned Levine and Associates. CrimeStat III. Washington: National Institute of Justice, 2010. Available as a PDF download from:
Arthur J. Lembo at (no longer active)

34 Next time: Inferential Statistics for Point Pattern Analysis

36 Software for Spatial Statistics: Examples
This was planned as a separate lecture, but we couldn’t meet last Friday, so I will look at some examples after today’s lecture, and again after the next lecture.

37 1. Using ArcGIS to find the Population Centroid of China
--Open ArcGIS
--Add data files: China.shp and ChinaProvinceData.xls
--Join ChinaProvinceData.xls to China.shp: right-click China and select Joins..., using GMI_Admin as the join field
--Open ArcToolbox by clicking on its toolbar button
--Go to Spatial Statistics Tools > Measuring Geographic Distribution > Mean Center
--Input Feature Class: China; Output: China_MeanCenter.shp; Weight Field: Population 2008
Note the warning: we should have projected the data first! WARNING: The input feature class does not appear to contain projected data.
The population centroid is in southern Henan province!

38 2. Calculate Population Centroid using a Spreadsheet Program (e.g. Excel)
--Make a copy of ChinaProvinceData.xls and open this copy, ChinaProvinceData Copy.xls. It contains centroids for each province obtained from GeoDA. (You need the very expensive ArcInfo version to get centroids for all polygons from ArcGIS, and I do not have it!)
--Calculate XCentroid * Weight (Population 2008), and then sum; calculate YCentroid * Weight (Population 2008), and then sum
--Divide each sum by the sum of the weights (total Population 2008). These are the X and Y coordinates of the China population centroid
--Copy these values into a new worksheet, and create a very simple data table with columns ID, X, Y
--Save the spreadsheet and close Excel
--Read this table into ArcGIS, right-click on the table name, and select Display XY Data. This displays X, Y coordinates from a table on the map
The results are very similar to the value calculated by ArcGIS itself!
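The spreadsheet arithmetic is a weighted mean center: the sum of each centroid coordinate times its weight, divided by the sum of the weights. This is a minimal sketch, assuming each row holds a province’s (X centroid, Y centroid, weight); the three rows are made-up numbers in generic projected units, not the China data, and `weighted_mean_center` is a hypothetical helper name.

```python
def weighted_mean_center(rows):
    """rows: (x_centroid, y_centroid, weight) for each polygon."""
    total_weight = sum(w for _, _, w in rows)
    x = sum(x * w for x, _, w in rows) / total_weight
    y = sum(y * w for _, y, w in rows) / total_weight
    return x, y

# Made-up figures for three hypothetical provinces (projected x, y and population).
rows = [(0.0, 0.0, 50.0), (10.0, 5.0, 30.0), (20.0, 10.0, 20.0)]
print(weighted_mean_center(rows))  # → (7.0, 3.5)
```

As the ArcGIS warning points out, the coordinates should come from projected data before this averaging is done.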

39 3. Use ArcGIS to Calculate the Standard Deviation Ellipse (SDE) for Population and for Illiterate Population
SDE for Population
--Go to Spatial Statistics Tools > Measuring Geographic Distribution > Directional Distribution
--Input Feature Class: China; Output: SDE_Population.shp; Weight Field: Data$.Pop2008
Mean Center for Illiterate Percent
--Go to Spatial Statistics Tools > Measuring Geographic Distribution > Mean Center
--Output: MC_Illit_PerCent.shp; Weight Field: Data$.Illiterate_Prcnt
SDE for Illiterate Percent (Directional Distribution)
--Output: SDE_Illit_PerCent.shp; Weight Field: Data$.Illiterate_Prcnt
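What the Directional Distribution tool summarizes can be approximated from the weighted covariance of the coordinates. This is a minimal sketch, assuming a one-standard-deviation ellipse whose axes are the eigenvectors of the weighted covariance matrix; it is not ArcGIS’s exact formula, which uses its own angle calculation and scaling, so its axis lengths may differ from the tool’s output. `standard_deviational_ellipse` is a hypothetical helper name.

```python
import math

def standard_deviational_ellipse(points, weights=None):
    """Return (center, major_sd, minor_sd, angle_deg) from the weighted covariance.
    The angle is measured counterclockwise from the x axis to the major axis."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total
    my = sum(w * y for (_, y), w in zip(points, weights)) / total
    sxx = sum(w * (x - mx) ** 2 for (x, _), w in zip(points, weights)) / total
    syy = sum(w * (y - my) ** 2 for (_, y), w in zip(points, weights)) / total
    sxy = sum(w * (x - mx) * (y - my) for (x, y), w in zip(points, weights)) / total
    # Eigenvalues of the 2 x 2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))   # guard against rounding below 0
    angle = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))
    return (mx, my), major, minor, angle

center, major, minor, angle = standard_deviational_ellipse([(0, 0), (1, 1), (2, 2), (3, 3)])
print(center, round(major, 3), round(minor, 3), round(angle, 1))
```

Points lying on a diagonal line give a flat ellipse (minor axis 0) oriented at 45 degrees; passing population values as `weights` corresponds to the tool’s Weight Field.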

40 4. Use GeoDA to find the Centroids of the Provinces of China
(You need ArcInfo to do this in ArcGIS, which is expensive. GeoDA is free.)
--The GeoDA program is on my Web site at: or go to
--Download, unzip, and click the file OpenGeoDA.exe to start the software
--It does have some “bugs,” so some things may not work, or it may crash!
--Input the provinces shapefile: File > Open Shape File, China.shp
--Open the data table (Table > Promotion) to see what is there
--Create centroids for each province: Options > Add Centroids to Table. Place a check mark in the X coordinates and Y coordinates boxes, and click OK
--Go to Table > Promotion to open the table; it has the X and Y centroid coordinates
--Save as a new shapefile: Table > Save to Shapefile, as China_Centroids.shp
I then opened the China_Centroids.dbf file (part of the shapefile) with Excel and copied the centroid values into the ChinaProvinceData.xls spreadsheet.
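A polygon’s centroid can be computed from its vertices with the standard shoelace-based formula. This is a minimal sketch, assuming a single simple (non-self-intersecting) ring with vertices listed in order; real province boundaries may have several parts (islands), which would need an area-weighted combination, and this is not necessarily how GeoDA computes its centroids. `polygon_centroid` is a hypothetical helper name.

```python
def polygon_centroid(vertices):
    """Area centroid of a simple polygon; vertices in order, either direction."""
    twice_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]      # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * twice_area).
    return cx / (3 * twice_area), cy / (3 * twice_area)

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(polygon_centroid(square))    # → (1.0, 1.0)
print(polygon_centroid(l_shape))   # about (0.833, 0.833)
```

For the L-shape, the centroid equals the area-weighted average of its two rectangles, (2 × (1, 0.5) + 1 × (0.5, 1.5)) / 3.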

