Published by Kelley Atkinson. Modified over 9 years ago
1
Stefan Falke stefan@me.wustl.edu
An Overview of Spatial Data Analysis Stefan Falke
2
Pop vs Soda vs Coke
3
Pop vs Soda vs Coke by County
4
2000 Presidential Election Results
Bush: 30 states, 50,456,169 votes
Gore: 21 states, 50,996,116 votes
5
2000 Presidential Election Results by County
(Map legend: Bush, Gore)
6
Environmental Pattern and Trend Analysis
When analyzing environmental data we examine:
- spatial patterns
- temporal trends
We are particularly interested in changes in these patterns and trends, and in their relationships with other patterns and trends. The analysis also strives to determine why we see these patterns and trends: what are the causal factors, and what are their impacts?
7
Spatial and Temporal Data Analysis
Turns raw data into useful information by adding informative content and value.
(Figure: hierarchy Data → Information → Knowledge / Evidence → Wisdom)
8
What is Spatial Data Analysis?
Spatial analysis is the quantitative and qualitative study of phenomena that are located in space.
Environmental spatial data analysis:
- describes characteristics and behavior of the environment
- explores patterns, trends, and relationships in environmental data
- seeks to explain these patterns, trends, and relationships
It differs from general data analysis and statistics in that spatial data:
- are dependent on location and related by location (they do not satisfy the independence assumption made in conventional data analysis)
- have properties that require special analysis methods
Why is spatial analysis such a big deal? About 85% of environmental data is spatial.
9
What is GIS?
The traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data.
Is ‘GIS’ Geographical Information System, or is it Geographical Information Science?
- GISystems: emphasis on technology and tools
- GIScience: fundamental issues raised by the use of GIS, such as spatial analysis, map projections, accuracy, and scientific visualization
Implementation and application of GIS cover a wide spectrum:
- simple maps
- overlaying multiple map “layers”
- proximity or cluster analysis based on distance
- comparing data sets (simple spatial statistics)
- complex statistical analysis
10
Nature Vol 427 22 January 2004
11
Special Spatial Nomenclature
Geographic – limited to phenomena and problems relating to the Earth’s surface and near-surface
Spatial – any space, including but not restricted to geographic coordinate space, e.g. medical imaging
Geospatial – a recent term for the subset of spatial analysis applied specifically to the Earth’s surface (synonymous with geographic)
13
Tobler’s First Law of Geography
“Everything is related to everything else, but near things are more related than distant things.” (Tobler, 1970)
This general assumption is what makes spatial data subject to special statistical laws.
14
Types of Spatial Analysis
There are literally thousands of techniques. Bailey and Gatrell (1995) offer four classes of spatial data analysis:
- Point Data Analysis: do the locations of point data and the relationships among the points represent a ‘significant’ pattern?
- Continuous Data Analysis: what are the spatial pattern and characteristics over a region, given a set of samples?
- Area Data Analysis: analysis of data that have been aggregated over a spatial zone, e.g. a county
- Spatial Interaction Data Analysis: analysis of flows between locations
15
The John Snow Map
A classic example of the use of location to draw inferences:
- 1854 cholera outbreak in London
- A point map of cases indicated some spatial clustering
- Overlaying a map of water pump locations showed that many cases were concentrated around a single pump
16
Continuous Data Analysis
Temperature data is well suited for converting from point to continuous data:
- It has high spatial density
- Ambient temperature is relatively spatially homogeneous (no sharp gradients)
17
County Level Aggregated Data
Also known as a choropleth map
18
Scale
The most appropriate analysis method depends on the spatial and temporal scales of the problem. The spatial variability of temperature at a ‘local’ scale is not necessarily significant when conducting an analysis at the ‘regional’ or ‘global’ scale.
19
Scale Dependent Measurements
How long is Maine’s coastline? The measured length depends on the measurement scale: 340 km, 355 km, or 415 km (from Longley et al., 2001).
20
What’s in a map, anyway?
Theme: static map
- Maps of entities whose location is known and (relatively) constant: roads, borders, locations of buildings
- These types of layers are often referred to as “thematic” layers
- They are usually used to provide context for other spatial data
Statistical: realization of one of the many possible patterns that may have been generated by a process
- Given a set of conditions, an observed spatial pattern is just one instance among a distribution of possible patterns
- The question is: is the observed realization significantly different from what would be expected by chance?
21
Deterministic versus Stochastic Processes
Deterministic processes have one realization: the value at a given location is always the same, regardless of the number of times the process occurs.
Stochastic processes have multiple realizations that cannot be precisely predicted and involve a random component.
For our purposes, random refers to the method used to generate a pattern, not the resulting pattern itself.
22
Examples of Deterministic & Stochastic Processes
(Figure: examples of deterministic and stochastic processes; one panel is labeled "random variable")
23
Random Spatial Processes
A random process does not mean that all events are independent of one another, as is the case with flipping a coin or rolling dice. Rather, spatial random processes are random with dependence (or rules).
Consider a “conditionally” random display of 4 coins:
- Flip the first 3 coins and display each by its flipped side (heads or tails)
- The 4th coin is not flipped; it is displayed as follows:
  - If the 2nd and 3rd flipped coins are both heads, the 4th is the same as the 1st
  - Otherwise, the 4th is the opposite of the 1st
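A minimal Python sketch of the coin rule above. The function names `fourth_coin` and `conditional_display` are hypothetical. Enumerating every outcome shows the dependence: only 8 of the 16 possible four-coin displays can ever occur, because the 4th coin is fixed by the other three.

```python
import random
from itertools import product

def fourth_coin(first: str, second: str, third: str) -> str:
    """Rule: copy coin 1 if coins 2 and 3 are both heads, else show its opposite."""
    if second == "H" and third == "H":
        return first
    return "T" if first == "H" else "H"

def conditional_display(rng: random.Random) -> str:
    """Flip three coins at random, then derive the 4th coin from the rule."""
    first, second, third = (rng.choice("HT") for _ in range(3))
    return first + second + third + fourth_coin(first, second, third)

# Every possible realization: 8 random outcomes, each with a forced 4th coin.
realizations = ["".join(flips) + fourth_coin(*flips) for flips in product("HT", repeat=3)]
print(realizations)
# → ['HHHH', 'HHTT', 'HTHT', 'HTTT', 'THHT', 'THTH', 'TTHH', 'TTTH']
print(conditional_display(random.Random(42)))  # one random realization
```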
24
Basic Statistical Concepts
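Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Median: the value in the distribution at which 50% of the data points lie above and 50% lie below
Covariance: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
Frequency/probability distributions:
- Normal or Gaussian: mean = median
- Poisson: mean = variance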
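These definitions can be checked with Python's standard `statistics` module. The data are hypothetical, and `sample_covariance` is a helper written here, using the n − 1 denominator so it matches the sample variance.

```python
import statistics

# Hypothetical paired observations of two variables at six sites.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0, 6.0, 7.0]

def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (same convention as statistics.variance)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

print(statistics.mean(x))       # 5.0
print(statistics.median(x))     # 4.5 (average of the two middle values)
print(statistics.variance(x))   # 4.8 (sample variance)
print(sample_covariance(x, y))  # 5.0
```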
25
Distribution Summary Statistics
The features of a distribution can be summarized using:
- Measures of location: mean, median, quantiles
- Measures of spread: standard deviation (the square root of the variance)
- Measures of shape: coefficient of skewness (a measure of symmetry) and kurtosis (a measure of the likelihood of outliers)
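A sketch of the shape measures, assuming the moment-based definitions: skewness $m_3/m_2^{3/2}$ and excess kurtosis $m_4/m_2^2 - 3$, with central moments divided by n. Software packages differ in bias corrections, so other tools may report slightly different values. Helper names and data are hypothetical.

```python
import statistics

def central_moments(data):
    """Return the 2nd, 3rd, and 4th central moments (population form, divided by n)."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((v - m) ** 2 for v in data) / n
    m3 = sum((v - m) ** 3 for v in data) / n
    m4 = sum((v - m) ** 4 for v in data) / n
    return m2, m3, m4

def skewness(data):
    """Moment coefficient of skewness: 0 for symmetric data, > 0 for a long right tail."""
    m2, m3, _ = central_moments(data)
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0; heavier tails score higher."""
    m2, _, m4 = central_moments(data)
    return m4 / m2 ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 15]
print(skewness(symmetric))         # 0.0
print(excess_kurtosis(symmetric))  # about -1.3 (flat, light-tailed)
print(skewness(right_tailed) > 0)  # True
print(statistics.quantiles(right_tailed, n=4))  # the three quartile cut points
```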
26
Complete Spatial Randomness
Take as an example a randomly generated point data set where:
1) the chance of a point existing at a given (x, y) location is equal to the chance of a point existing at any other location (uniform probability distribution)
2) the existence of a point at one location is independent of the existence of any other point
These two conditions constitute an independent random process (IRP), or complete spatial randomness (CSR).
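A minimal sketch of generating a CSR pattern in Python. It assumes a rectangular study area and a fixed number of points; CSR is often formalized as a homogeneous Poisson process in which the point count is itself random, so fixing n is a simplification. `csr_points` is a hypothetical helper.

```python
import random

def csr_points(n, width, height, rng):
    """Draw n points independently and uniformly over a width x height rectangle.

    Every location is equally likely (condition 1) and each point is placed
    without regard to the others (condition 2).
    """
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

rng = random.Random(2024)  # fixed seed so the realization is reproducible
points = csr_points(10_000, 100.0, 50.0, rng)

# Under CSR the expected share of points in any sub-region equals its share of the area.
left_half = sum(1 for x, _ in points if x < 50.0) / len(points)
print(round(left_half, 2))  # close to 0.5 for a large n
```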
27
Exploratory Spatial Data Analysis (ESDA)
Aim: to identify data properties for purposes of pattern detection.
Based on graphical and visual methods and on numerical techniques that are statistically robust, i.e. not much affected by extreme or atypical data values.
The ArcGIS Geostatistical Analyst extension contains a set of ESDA tools:
- Histogram (frequency distribution)
- Voronoi map
- QQ plot
- Trend analysis
28
Exploratory Analysis Example
29
Summary Statistics
30
Quantile Plots
A QQ plot graphs the quantiles of a dataset against the quantiles of a normal distribution. If the data are normally distributed, the points fall close to a straight line.
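A sketch of how QQ plot coordinates can be computed with the standard library's `statistics.NormalDist`. The plotting position (i − 0.5)/n is one of several conventions in use, and the sample values are hypothetical.

```python
from statistics import NormalDist

def qq_pairs(data):
    """Pair each sorted observation with the matching standard-normal quantile.

    Uses plotting positions (i - 0.5) / n for i = 1..n.
    """
    ordered = sorted(data)
    n = len(ordered)
    normal = NormalDist()  # mean 0, standard deviation 1
    return [(normal.inv_cdf((i - 0.5) / n), value) for i, value in enumerate(ordered, start=1)]

# Plotting these (theoretical, observed) pairs as (x, y) gives the QQ plot.
for theoretical, observed in qq_pairs([3.1, 2.4, 5.0, 2.9, 3.6]):
    print(f"{theoretical:6.3f}  {observed}")
```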
31
Voronoi Plot
Voronoi plots assign or calculate a value for each point’s polygon, including:
- the value itself
- the mean of neighboring polygons
- the most frequent value among neighboring polygons
- the unique value among neighbors
- the variation among neighbors
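The basic Voronoi assignment (each location takes the value of its nearest point) can be approximated on a raster, as in this sketch. It covers only the "value itself" option; the neighbor-based options would additionally need the adjacency between polygons. `voronoi_grid` and the site data are hypothetical.

```python
import math

def voronoi_grid(seeds, n_rows, n_cols, cell_size=1.0):
    """Rasterized (discrete) Voronoi map: each cell takes the value of its nearest seed.

    seeds: list of (x, y, value). Distances are measured from cell centers;
    ties go to the seed listed first.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            cx, cy = (c + 0.5) * cell_size, (r + 0.5) * cell_size
            nearest = min(seeds, key=lambda s: math.hypot(cx - s[0], cy - s[1]))
            row.append(nearest[2])
        grid.append(row)
    return grid

# Hypothetical monitoring sites with measured values.
sites = [(0.5, 0.5, 10), (3.5, 0.5, 20), (2.0, 3.5, 30)]
for row in voronoi_grid(sites, 4, 4):
    print(row)
```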
32
Spatial Smoothing/Averaging
33
Data Types
There are two general views of organizing spatial data:
Entities or objects
- point measurements, rivers, structures
- have attributes or features attached to them
- point, vector, or area format
- values exist at discrete locations
Fields
- continuous data such as temperature gradient fields and satellite imagery
- values exist over an area
- raster format (grids)
34
Data Types
Entities and fields can each be transformed into the other type.
35
Raster and Vector Data Models
(Figure: the same real-world scene of trees, a house, and a river shown as a raster representation, a 10 × 10 grid of cells coded B, G, and BK, and as a vector representation plotted on X and Y axes from 0 to 600; adapted from Lembo, 2003)
36
Landcover Raster Grid
(Figure: landcover raster of numbered cells grouped into the value ranges 1-5, 6-10, 11-15, and 16-20; legend: Mixed conifer, Douglas fir, Oak savannah, Grassland)
38
GIS Functionality
- Filtering: retrieves a subset of a dataset. Example: query (search)
- Aggregation: combines attributes or features within data sources (layers). Examples: reclassify, dissolve
- Integration: combines two or more data sources (layers). Examples: polygon overlay, table joining
39
Spatial Queries (Filter)
Identifying features based on spatial criteria. Criteria include variations on adjacency, containment, arrangement, and connectivity.
- Adjacency: which states are adjacent to the State of Missouri?
- Containment: which states “contain” the Mississippi River and its tributaries?
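Containment queries rest on a point-in-polygon test. The sketch below uses the standard ray-casting (even-odd) rule; the polygon and query points are hypothetical. Testing whether a line feature such as a river lies within a state would additionally require checking segment intersections.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count the polygon edges crossed by a ray running right from (x, y).

    polygon: list of (x, y) vertices in order. An odd crossing count means inside.
    Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical L-shaped zone and two query points.
zone = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(point_in_polygon(1, 3, zone))  # True
print(point_in_polygon(3, 3, zone))  # False (in the notch of the L)
```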
40
Reclassification (Aggregation)
An assignment of a class or value based on the attributes or geography of an object
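A sketch of reclassifying a raster by value ranges. The breaks reuse the ranges 1-5, 6-10, 11-15, and 16-20 from the landcover raster slide, but which range maps to which landcover class is not stated, so the output here is just a class index 1-4. The function name is hypothetical.

```python
from bisect import bisect_left

# Hypothetical class breaks: upper bounds of the ranges 1-5, 6-10, 11-15, 16-20.
BREAKS = [5, 10, 15, 20]

def reclassify(grid, breaks=BREAKS):
    """Replace each cell value with the 1-based index of the range it falls in."""
    out = []
    for row in grid:
        new_row = []
        for value in row:
            if not 1 <= value <= breaks[-1]:
                raise ValueError(f"value {value} is outside the classified range")
            new_row.append(bisect_left(breaks, value) + 1)
        out.append(new_row)
    return out

print(reclassify([[2, 17, 16], [15, 14, 11], [8, 6, 5]]))
# → [[1, 4, 4], [3, 3, 3], [2, 2, 1]]
```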
41
Reclassification & Dissolve
42
Variable Distance Buffering
43
Polygon Overlay (Integration)
Topology describes the relationships between elements of a map. A topological data structure defines the elements of the map in a way that makes it possible to know which line segments are connected to each other and to know what polygon is adjacent to each side of a line segment.
44
Polygon Overlay Examples
“Cookie-cutter” method
45
© Paul Bolstad, GIS Fundamentals
Coordinate Systems
A geographical coordinate system uses a three-dimensional spherical surface to define locations on the Earth. It divides space into an orderly structure of locations.
Two types: Cartesian and angular (spherical)
46
Parallels and Meridians
Meridians are lines of constant longitude; each is half of a great circle. Example: the prime meridian.
Parallels are circles of constant latitude. Example: the equator.
- latitude (φ): angular distance from the equator
- longitude (λ): angular distance from a standard meridian
Example coordinates:
- St. Louis: 38° 39' N, 90° 38' W
- New York: 40° 47' N, 73° 58' W
- Los Angeles: 34° 3' N, 118° 14' W
- Rome: 41° 48' N, 12° 36' E
- Sydney: 33° 52' S, 151° 12' E
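Converting the degree-minute listing above to signed decimal degrees, the form most software expects, is a short calculation. This sketch assumes the usual sign convention (south and west negative); `dms_to_decimal` is a hypothetical helper.

```python
def dms_to_decimal(degrees, minutes, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes, seconds to signed decimal degrees (S and W negative)."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value

# St. Louis as listed above: 38° 39' N, 90° 38' W
lat = dms_to_decimal(38, 39, hemisphere="N")
lon = dms_to_decimal(90, 38, hemisphere="W")
print(round(lat, 4), round(lon, 4))  # 38.65 -90.6333
```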
47
Earth’s Expanding Waistline
From the Chronicle of Higher Education Jan 17, 2003
48
Datum
While a spheroid approximates the shape of the Earth, a datum defines the position of the spheroid (ellipsoid) relative to the center of the Earth.
The datum provides a frame of reference for measuring locations on the surface of the Earth.
A datum is chosen to align a spheroid to closely fit the Earth’s surface in a particular area.
49
Map Projections and Distortions
Three general types of projections:
- Equal area: the ratio of areas on the Earth and on the map is constant. Shape, angle, and scale are distorted.
- Conformal: angles are preserved locally, so the shape of any small area of the map is preserved; meridians and parallels intersect at 90-degree angles.
- Equidistant: distances between certain points are preserved. Scale is not maintained correctly everywhere; typically, one or more lines have their scale maintained.
50
Comparing Projections
51
Summary Statistics of a Point Pattern
Mean center: the average of the x and y coordinates (geographic mean)
Standard distance: the root-mean-square distance of the points from the mean center (a measure of dispersion)
Summary circle: centered at the mean center, with a radius equal to the standard distance
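A sketch of the three summaries for points in planar (projected) coordinates. Standard distance is computed here as the root-mean-square distance from the mean center, which is the common definition; the function names and points are hypothetical.

```python
import math

def mean_center(points):
    """Average of the x and y coordinates."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = mean_center(pts)
radius = standard_distance(pts)  # radius of the summary circle
print(center, radius)  # (1.0, 1.0) 1.4142135623730951
```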
52
US Population Density
53
Geographic Center of US Population
The center of the US population is calculated as the average latitude and longitude, weighted by the population at a uniformly spaced set of points.
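The weighted average described above can be sketched as follows. This is a plain weighted mean of latitude and longitude that ignores the Earth's curvature, so it only approximates an official calculation; the places, populations, and function name are hypothetical.

```python
def population_weighted_center(places):
    """Population-weighted average latitude and longitude.

    places: list of (latitude, longitude, population) tuples.
    """
    total = sum(pop for _, _, pop in places)
    lat = sum(la * pop for la, _, pop in places) / total
    lon = sum(lo * pop for _, lo, pop in places) / total
    return lat, lon

# Hypothetical grid points with populations.
print(population_weighted_center([(38.0, -90.0, 1000), (40.0, -80.0, 3000)]))
# → (39.5, -82.5)
```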
54
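Quadrat Count
A quadrat count is conducted by superimposing a regular grid over the data, counting the number of events in each grid cell, and dividing each count by the cell area to get the intensity.
Example: 40 grid cells
Mean cell count: $\mu = \frac{1}{m}\sum_{i=1}^{m} n_i$
Variance: $s^2 = \frac{1}{m-1}\sum_{i=1}^{m} (n_i - \mu)^2$
where $n_i$ is the number of events in cell i and m is the number of cells.
A ratio $s^2/\mu$ greater than 1 indicates clustering; a ratio near 1 is consistent with complete spatial randomness, and a ratio below 1 indicates a dispersed (regular) pattern.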
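A sketch of the quadrat procedure and the variance-to-mean ratio. It assumes a rectangular study area with its corner at the origin and a grid of equal cells; function names and point sets are hypothetical. The ratio is undefined if no events are present.

```python
import statistics

def quadrat_counts(points, width, height, nx, ny):
    """Count events in each cell of an nx-by-ny grid laid over a width x height study area."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)  # clamp points on the far edge
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def variance_mean_ratio(counts):
    """Sample variance of the cell counts divided by their mean."""
    return statistics.variance(counts) / statistics.mean(counts)

clustered = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (0.3, 0.4)]
regular = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
cell_area = (2.0 / 2) * (2.0 / 2)
for name, pts in (("clustered", clustered), ("regular", regular)):
    counts = quadrat_counts(pts, 2.0, 2.0, 2, 2)
    intensity = [c / cell_area for c in counts]  # events per unit area
    print(name, counts, intensity, variance_mean_ratio(counts))
```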
55
Spatial Autocorrelation
Spatial autocorrelation is the correlation between values of the same variable at different spatial locations.
- Positive spatial autocorrelation: like values tend to cluster in space
- Negative spatial autocorrelation: neighbors are dissimilar
- Zero spatial autocorrelation: no correlation
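The slide does not name a statistic, but one widely used measure of spatial autocorrelation is Moran's I. This sketch computes it with binary contiguity weights for a hypothetical row of six cells: positive values suggest clustering of like values, negative values suggest dissimilar neighbors, and values near the expected value −1/(n − 1) suggest no autocorrelation. The helper names are hypothetical.

```python
def chain_weights(n):
    """Binary contiguity weights for n cells in a row: cells sharing an edge are neighbors."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Moran's I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in z)

w = chain_weights(6)
print(morans_i([1, 1, 1, 5, 5, 5], w))  # about 0.6 (like values clustered)
print(morans_i([1, 5, 1, 5, 1, 5], w))  # about -1.0 (neighbors dissimilar)
```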
56
From points to fields
(Figure: point monitoring data → spatial estimation method → continuous surface of estimates (map))
Each estimate is a weighted combination of the measured values:
$c_i = \sum_{j=1}^{n} w_{ij} c_j$, with $\sum_{j=1}^{n} w_{ij} = 1$
where $c_i$ is the estimated value at location i, n is the number of data points, $c_j$ is the value at data point j, and $w_{ij}$ is the weight assigned to data point j, i.e. the factor that determines how much influence data point j has on the estimate.
The weighting factor is usually the distinguishing feature of interpolation methods. Biggest challenge: how to determine the weights?
57
Inverse Distance Interpolation
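Weights: $w_{ij} = \frac{d_{ij}^{-k}}{\sum_{l=1}^{n} d_{il}^{-k}}$, where $d_{ij}$ is the distance from location i to data point j and k is the power of the distance weighting.
Estimates are constrained to the minimum and maximum values in the point data set.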
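A sketch of inverse distance weighting: each weight is proportional to $d_{ij}^{-k}$ and the weights are normalized to sum to 1, so every estimate stays between the smallest and largest measured values. The station coordinates and values are hypothetical; k = 2 is a common default, not one the slide specifies.

```python
import math

def idw(x, y, samples, k=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (x_j, y_j, c_j). Weights are d_ij**-k, normalized to sum to 1.
    """
    numerator = denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # at a data point, the estimate is the measured value
        w = d ** -k
        numerator += w * value
        denominator += w
    return numerator / denominator

stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (0.0, 2.0, 16.0)]
print(idw(1.0, 0.5, stations))
```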
60
Raster Analysis (Continuous Data)
Moving Windows
A 3 × 3 window of cell values:
2 3 5
2 3 6
3 5 7
Window statistics: minimum = 2, maximum = 7, range = 5, mean = 4
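The moving-window statistics can be sketched in Python. Near the grid edge this version simply truncates the window, which is only one of several edge-handling choices; the function names are hypothetical. Applied to the 3 × 3 window from the slide (2 3 5 / 2 3 6 / 3 5 7), it reproduces the values shown.

```python
def window_stats(grid, row, col, size=3):
    """Minimum, maximum, range, and mean of the size x size window centered on (row, col).

    Near the grid edge the window is truncated to the cells that exist.
    """
    half = size // 2
    values = [
        grid[r][c]
        for r in range(max(0, row - half), min(len(grid), row + half + 1))
        for c in range(max(0, col - half), min(len(grid[0]), col + half + 1))
    ]
    lo, hi = min(values), max(values)
    return {"minimum": lo, "maximum": hi, "range": hi - lo, "mean": sum(values) / len(values)}

def focal_mean(grid, size=3):
    """Apply the moving window to every cell, producing a smoothed grid."""
    return [[window_stats(grid, r, c, size)["mean"] for c in range(len(grid[0]))]
            for r in range(len(grid))]

window = [[2, 3, 5],
          [2, 3, 6],
          [3, 5, 7]]
print(window_stats(window, 1, 1))
# → {'minimum': 2, 'maximum': 7, 'range': 5, 'mean': 4.0}
```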
61
Slope
Slope is the change in elevation (rise) with a change in horizontal position (run). The steepest descent between a cell and its neighbors is known as the gradient. Slope is often reported in degrees (0° is flat, 90° is vertical) but is also expressed as a percent (100 × rise / run).
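A sketch of both ideas: converting rise over run to degrees and percent, and finding the steepest drop from a cell to its eight neighbors in an elevation grid. The DEM values, cell size, and function names are hypothetical, and many GIS slope tools instead fit a 3 × 3 finite-difference surface, so their results can differ.

```python
import math

def slope_from_rise_run(rise, run):
    """Return slope as (degrees, percent) from a rise over a horizontal run."""
    ratio = rise / run
    return math.degrees(math.atan(ratio)), 100.0 * ratio

def steepest_descent(dem, row, col, cell_size):
    """Largest drop per unit distance from cell (row, col) to any of its 8 neighbors.

    Diagonal neighbors are cell_size * sqrt(2) away. Returns 0.0 if no neighbor is lower.
    """
    best = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr, dc) == (0, 0) or not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
                continue
            drop = (dem[row][col] - dem[r][c]) / (cell_size * math.hypot(dr, dc))
            best = max(best, drop)
    return best

print(slope_from_rise_run(1.0, 1.0))  # about 45 degrees, 100 percent
dem = [[12, 11, 10], [11, 10, 9], [10, 9, 6]]  # hypothetical elevations (m), 30 m cells
print(math.degrees(math.atan(steepest_descent(dem, 1, 1, 30.0))))
```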
62
Hands-on Exercise: Mapping Census Data
- Database manipulation (table joins)
- Reprojecting maps
- Calculating derived values (population density, population change over time)
- Visualization
63
ArcGIS Main Components
- ArcCatalog
- ArcToolbox
- ArcMap
65
Data Quality
It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable.
Uncertainty is found in data and in its processing and analysis.
The outputs from spatial data analysis and GIS are only as good as the inputs and associated assumptions.
66
Logical Consistency
Logical inconsistencies are representations of data that do not make sense, for example:
- a road in the water
- contours that cross or end
- features on steep slopes
67
Modifiable areal unit problem
There are many ways to aggregate the same data into zones, and different zonings can yield different results.
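A small hypothetical example of the zoning effect: the same six cell values aggregated with two different sets of zone boundaries give very different pictures. (The modifiable areal unit problem also includes a scale effect, from changing zone size, which this sketch does not show.) The function name and data are hypothetical.

```python
def zone_means(values, zones):
    """Average cell values within each zone; zones is a list of lists of cell indices."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]

# Hypothetical values in six cells along a transect.
values = [2, 8, 8, 2, 2, 8]

zoning_a = [[0, 1], [2, 3], [4, 5]]    # three zones of two cells
zoning_b = [[0], [1, 2], [3, 4], [5]]  # same cells, different boundaries

print(zone_means(values, zoning_a))  # [5.0, 5.0, 5.0] -> looks uniform
print(zone_means(values, zoning_b))  # [2.0, 8.0, 2.0, 8.0] -> looks highly variable
```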
68
Anscombe’s Quartet
These four data sets look nearly identical from a statistical perspective: they share almost the same means, variances, correlation, and linear regression line.
69
Anscombe’s Quartet They don’t look anything alike from a graphical perspective!!
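The point can be checked numerically with Anscombe's (1973) data as commonly reproduced. The helper `summarize` is hypothetical; running it shows all four sets sharing nearly the same means, x variance, correlation, and regression line, even though their scatter plots differ completely.

```python
import math

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Means, sample variances, Pearson r, and least-squares line for one data set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"mean_x": mx, "mean_y": my, "var_x": sxx / (n - 1), "var_y": syy / (n - 1),
            "r": sxy / math.sqrt(sxx * syy), "slope": slope, "intercept": my - slope * mx}

for name, (xs, ys) in QUARTET.items():
    print(name, {key: round(val, 2) for key, val in summarize(xs, ys).items()})
```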