Why Is It There? Getting Started with Geographic Information Systems Chapter 6
6 Why Is It There?
• 6.1 Describing Attributes
• 6.2 Statistical Analysis
• 6.3 Spatial Description
• 6.4 Spatial Analysis
• 6.5 Searching for Spatial Relationships
• 6.6 GIS and Spatial Analysis
Dueker (1979): "A geographic information system is a special case of information systems where the database consists of observations on spatially distributed features, activities or events, which are definable in space as points, lines, or areas. A geographic information system manipulates data about these points, lines, and areas to retrieve data for ad hoc queries and analyses."
GIS is capable of data analysis
• Attribute Data
  – Describe with statistics
  – Analyze with hypothesis testing
• Spatial Data
  – Describe with maps
  – Analyze with spatial analysis
Describing one attribute
Attribute Description
• The extremes of an attribute are the highest and lowest values, and the range is the difference between them in the units of the attribute.
• A histogram is a two-dimensional plot of attribute values grouped by magnitude and the frequency of records in that group, shown as a variable-length bar.
• For a large number of records with random errors in their measurement, the histogram resembles a bell curve and is symmetrical about the mean.
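As a rough illustration of the extremes, range, and histogram described above, the following Python sketch groups values into equal-width classes. The readings and the 5-meter class width are hypothetical choices made for the example, not data from the book.

```python
import math
from collections import Counter

# Hypothetical elevation readings in meters (illustrative only).
readings = [455.0, 458.0, 458.0, 461.0, 463.0]

def extremes_and_range(values):
    """Return (lowest, highest, range) in the units of the attribute."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

def histogram(values, bin_width):
    """Count records per class; each key is the lower edge of a class."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

if __name__ == "__main__":
    print(extremes_and_range(readings))
    for lower_edge, count in histogram(readings, 5).items():
        # One '#' per record stands in for the variable-length bar.
        print(f"{lower_edge}-{lower_edge + 5}: {'#' * count}")
```

A narrower class width gives more, shorter bars; the choice of width is a judgment call that changes how the distribution looks.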
If the records are:
• Text
  – Length of text
  – word frequency
  – address matching
• Example: Display all places called “State Street”
If the records are:
• Classes
  – histogram by class
  – numbers in class
  – contiguity description
Describing a classed raster grid P (blue) = 19/48
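A proportion such as P(blue) = 19/48 is simply the count of cells in a class divided by the total number of cells. The sketch below shows one way to compute it in Python; the 6 x 8 grid and the class codes "B", "G", and "Y" are hypothetical, arranged only so that 19 of the 48 cells are blue.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical 6 x 8 classed raster: B = blue, G = green, Y = yellow.
grid = [
    "BBBBGGGG",
    "BBBBGGGY",
    "BBBGGGYY",
    "BBBGGYYY",
    "BBBGYYYY",
    "BBGGYYYY",
]

def class_proportions(rows):
    """Return the share of cells in each class as an exact fraction."""
    counts = Counter(cell for row in rows for cell in row)
    total = sum(counts.values())
    return {cls: Fraction(n, total) for cls, n in counts.items()}

if __name__ == "__main__":
    for cls, share in sorted(class_proportions(grid).items()):
        print(cls, share, float(share))
```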
If the records are:
• Numbers
  – statistical description
  – min, max, range
  – variance and standard deviation
Statistical description
• Range (min, max, max-min)
• Central tendency (mode, median, mean)
• Variation (variance, standard deviation)
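The three groups of measures listed above can be sketched with Python's standard `statistics` module. The readings are hypothetical, and `statistics.variance` is the sample variance (sum of squared deviations divided by n - 1), which is an assumption about which variance is wanted.

```python
import statistics

# Hypothetical elevation readings in meters (not the book's data).
readings = [455.0, 458.0, 458.0, 461.0, 463.0]

def describe(values):
    """Summarize one numeric attribute: range, central tendency, variation."""
    lowest, highest = min(values), max(values)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "mode": statistics.mode(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # divides by n - 1
        "stdev": statistics.stdev(values),        # square root of the above
    }

if __name__ == "__main__":
    for name, value in describe(readings).items():
        print(f"{name}: {value}")
```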
Elevation (book example)
Mean
• Statistical average
• Sum of the values for one attribute divided by the number of records

$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Computing the Mean
• Sum of attribute values across all records, divided by the number of records.
• A representative value, and for measurements with normally distributed error, converges on the true reading.
• A value lacking sufficient data for computation is called a missing value.
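One way to handle missing values when computing a mean is to skip them and divide by the number of records that actually have a value. This is a minimal sketch under that assumption; using `None` to mark a missing value is a convention chosen here, not something the slides prescribe.

```python
def mean_ignoring_missing(values):
    """Mean of the non-missing values; None marks a missing value."""
    present = [v for v in values if v is not None]
    if not present:
        # With no usable records, the mean is itself a missing value.
        return None
    return sum(present) / len(present)

if __name__ == "__main__":
    # Hypothetical readings with one missing record.
    print(mean_ignoring_missing([458.0, None, 460.0]))
```

Skipping missing records changes n, so it is worth reporting how many records contributed to the mean.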
Variance
• The total variance is the sum, over all records, of the squared difference between each value and the mean.
• The standard deviation is the square root of the total variance divided by the number of records less one.
Standard Deviation
• Average difference from the mean
• Sum of the mean subtracted from the value for each record, squared, divided by the number of records - 1, square rooted.

$\mathrm{st.dev.} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
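The standard deviation formula can be written out step by step. The sketch below follows the n - 1 form given on the slide; the function name and the sample readings are hypothetical.

```python
import math

def sample_std_dev(values):
    """Square root of the summed squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two records")
    mean = sum(values) / n
    # Total variance: each value with the mean subtracted, then squared.
    total_variance = sum((x - mean) ** 2 for x in values)
    return math.sqrt(total_variance / (n - 1))

if __name__ == "__main__":
    # Hypothetical elevation readings in meters.
    print(sample_std_dev([455.0, 458.0, 458.0, 461.0, 463.0]))
```

For these readings the mean is 459.0 and the squared deviations sum to 38, so the result is the square root of 38 / 4 = 9.5.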
GPS Example Data: Elevation Standard deviation l Same units as the values of the records, in this case meters. l The average amount by which the readings differ from the average l Can be above or below the mean l Elevation is the mean (459.2 meters), plus or minus the expected error of meters l Elevation is most likely to lie between meters and meters. l These limits are called the error band or margin of error.
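The error band described above is the mean plus or minus one standard deviation. A minimal sketch, using hypothetical readings rather than the GPS data from the slide:

```python
import statistics

def error_band(values):
    """Return (lower, upper) limits: mean minus and plus one sample s.d."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample s.d., divides by n - 1
    return mean - std_dev, mean + std_dev

if __name__ == "__main__":
    # Hypothetical elevation readings in meters.
    lower, upper = error_band([455.0, 458.0, 458.0, 461.0, 463.0])
    print(f"most likely between {lower:.1f} and {upper:.1f} meters")
```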
Hypothesis testing
• Establish NULL hypothesis (e.g. Values or Means are the same)
• Establish ALTERNATIVE hypothesis, based on some expectation.
• Test hypothesis. Try to reject NULL.
• If null hypothesis is rejected, there is some support for the alternative (theory-based) hypothesis.
Uses of the standard deviation
• Shorthand description: given the mean and s.d., we know where about 68% of a normal distribution lies (within one s.d. of the mean).
• A standardized measure:
  – a score of 80% can be good or bad, depending on the mean and s.d.
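The "standardized measure" idea is commonly expressed as a z-score: the number of standard deviations a value lies above or below the mean. The sketch below uses hypothetical class means and standard deviations to show how the same score of 80 can be good or bad.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev

if __name__ == "__main__":
    # Hypothetical exam results: same score, two different classes.
    print(z_score(80, mean=70, std_dev=5))  # well above this class average
    print(z_score(80, mean=90, std_dev=5))  # well below this class average
```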
Testing the Mean
• A test of means can establish whether two samples from a population are different from each other, or whether the different measures they have are the result of random variation.
Samples and populations
• A sample is a set of measurements taken from a larger group or population.
• Sample means and variances can serve as estimates for their populations.
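The slide does not name a particular test of means. One common choice, used here as an assumption, is Welch's two-sample t statistic, which does not require the two samples to have equal variances. The sketch computes the statistic and its approximate degrees of freedom; deciding whether to reject the null hypothesis still requires comparing t against a critical value from a t table.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard error contributed by each sample.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

if __name__ == "__main__":
    # Hypothetical samples, for illustration only.
    print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

A t value near zero suggests the difference in means could be random variation; a large absolute value gives grounds to reject the null hypothesis that the means are the same.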
Spatial analysis with GIS
• GIS data description answers the question: Where?
• GIS data analysis answers the question: Why is it there?
• GIS data description is different from statistics because the results can be placed onto a map for visual analysis.
Spatial Statistical Description
• For coordinates, the means and standard deviations correspond to the mean center and the standard distance.
• A centroid is any point chosen to represent a higher dimension geographic feature, of which the mean center is only one choice.
• The standard distance for a set of point spatial measurements is the expected spatial error.
• For coordinates, data extremes define the two corners of a bounding rectangle.
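The spatial counterparts of the mean, standard deviation, and extremes can be sketched for a set of points as follows. The coordinates are hypothetical, and dividing by n in the standard distance is an assumption; some texts divide by n - 1 instead.

```python
import math

def mean_center(points):
    """Return (mean x, mean y) for a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return math.sqrt(sum(squared) / len(points))  # assumption: divide by n

def bounding_rectangle(points):
    """Return the two corners ((min x, min y), (max x, max y))."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))

if __name__ == "__main__":
    # Hypothetical point coordinates in arbitrary map units.
    pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
    print(mean_center(pts), standard_distance(pts), bounding_rectangle(pts))
```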
Geographic extremes
• Southernmost point in the continental United States.
• Range: e.g. elevation difference; map extent
Mean Center: the point (mean x, mean y)
Centroid: mean center of a feature
GIS and Spatial Analysis
• Descriptions of geographic properties such as shape, pattern, and distribution are often verbal.
• Quantitative measures can be devised, although few are computed by GIS.
• GIS statistical computations are most often done using retrieval options such as buffer and spread.
• Also by manipulating attributes with arithmetic commands (map algebra).
An example l Lower 48 United States l 1994 Data from the U.S. Census on gender l Gender Ratio = # females per 100 males l Range is l What does the spatial distribution look like?
Gender Ratio by State: 1994
Searching for Spatial Pattern
• A linear relationship is a predictable straight-line link between the values of a dependent and an independent variable. It is a simple model of the relationship.
• A linear relation can be tested for goodness of fit with least squares methods. The coefficient of determination r-squared is a measure of the degree of fit, and the amount of variance explained.
Simple linear relationship
y = a + bx, where y is the dependent variable, x is the independent variable, a is the intercept, and b is the gradient of the best-fit regression line through the observations.
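A least squares fit of y = a + bx, together with r-squared, can be computed directly from sums of squares. This is a minimal sketch with hypothetical observations; the function name is illustrative.

```python
def linear_fit(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are all identical")
    b = sxy / sxx               # gradient
    a = mean_y - b * mean_x     # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values are all identical")
    # Share of the variance in y explained by the line.
    r_squared = 1 - ss_res / ss_tot
    return a, b, r_squared

if __name__ == "__main__":
    # Hypothetical observations.
    print(linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0]))
```

For these four observations the fit is approximately y = 1.3 + 0.8x with r-squared of 0.64, meaning the line accounts for 64% of the variance in y.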
Testing the relationship gr = long.
Patterns in Residual Mapping
• Differences between observed values of the dependent variable and those predicted by a model are called residuals.
• A GIS allows residuals to be mapped and examined for spatial patterns.
• A model helps explanation and prediction after the GIS analysis.
• A model should be simple, should explain what it represents, and should be examined in the limits before use.
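Residuals are straightforward to compute once a model's intercept and gradient are known. The sketch below keys each residual by a record name so it could be joined back to map features; the record names, values, and model coefficients are all hypothetical.

```python
def residuals(records, intercept, gradient):
    """Observed minus predicted y for each (name, x, y) record."""
    return {
        name: y - (intercept + gradient * x)
        for name, x, y in records
    }

if __name__ == "__main__":
    # Hypothetical records: (feature name, independent x, observed y).
    records = [("A", 0.0, 1.0), ("B", 1.0, 3.0), ("C", 2.0, 2.0), ("D", 3.0, 4.0)]
    for name, value in residuals(records, intercept=1.3, gradient=0.8).items():
        # Positive: the model under-predicts; negative: it over-predicts.
        print(name, round(value, 2))
```

Mapping these values by feature shows whether over- and under-predictions cluster in space, which would hint at a variable missing from the model.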
Mapping residuals from a model
Unexplained variance
• More variables?
• Different extent?
• More records?
• More spatial dimensions?
• More complexity?
• Another model?
• Another approach?
GIS and Spatial Analysis
• Many GIS systems have to be coaxed to generate a full set of spatial statistics.
Analytic Tools and GIS
• Tools for searching out spatial relationships and for modeling are only lately being integrated into GIS.
• Statistical and spatial analytical tools are also only now being integrated into GIS, and many people use separate software systems outside the GIS: “loosely coupled” analyses.
Analytic Tools and GIS
• Real geographic phenomena are dynamic, but GISs have been mostly static. Time-slice and animation methods can help in visualizing and analyzing spatial trends.
• GIS organizes real-world data to allow numerical description and allows the analyst to model, analyze, and predict with both the map and the attribute data.
You can lie with...
• Maps
• Statistics
• Correlation is not causation!