An Overview of Spatial Data Analysis Stefan Falke stefan@me.wustl.edu http://capita.wustl.edu/ENVE424/REU/SpatialAnalysis.htm


Pop vs Soda vs Coke http://www.popvssoda.com/

Pop vs Soda vs Coke by County

2000 Presidential Election Results Bush States: 30 votes: 50,456,169 Gore States: 21 votes: 50,996,116

2000 Presidential Election Results by County Bush Gore

Environmental Pattern and Trend Analysis When analyzing environmental data we examine: Spatial Patterns; Temporal Trends. We are particularly interested in changes in these patterns and trends and in their relationships with other patterns and trends. The analysis also strives to determine why we see these patterns and trends: what the causal factors are and what their impacts are.

Spatial and Temporal Data Analysis Turns raw data into useful information by adding greater informative content and value Wisdom Knowledge / Evidence Data Information

What is Spatial Data Analysis? Spatial analysis is the quantitative and qualitative study of phenomena that are located in space. Environmental spatial data analysis describes characteristics and behavior of the environment Explores patterns, trends, and relationships in environmental data Seeks to explain these patterns, trends, and relationships Differs from general data analysis and statistics in that: Spatial data are dependent on location and related by location (they do not adhere to the independence assumption made in regular data analysis) Have properties that require special analysis methods Why is spatial analysis such a big deal? about 85% of environmental data is spatial

What is GIS? Traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data. ‘GIS’ is Geographical Information System OR IS IT Geographical Information Science? GISystems: Emphasis on technology and tools GIScience: Fundamental issues raised by the use of GIS, such as Spatial analysis Map projections Accuracy Scientific visualization Implementation and application of GIS covers a wide spectrum: Simple maps Overlaying multiple map “layers” Conducting proximity or cluster analysis based on distance Comparing data sets (simple spatial statistics) Complex statistical analysis

Nature Vol 427 22 January 2004

Special Spatial Nomenclature Geographic – Limited to phenomena and problems relating to Earth’s surface and near-surface Spatial – Any space, including geographic, but not restricted to geographic coordinate space, e.g. medical imaging Geospatial – A recent term to represent the subset of spatial applied specifically to the Earth’s surface. (synonymous with geographic)

http://labs.google.com/location

Tobler’s First Law of Geography “Everything is related to everything else, but near things are more related than distant things.” Tobler, 1970 This general assumption is what makes spatial data subject to special statistical laws

Types of Spatial Analysis There are literally thousands of techniques. Bailey and Gatrell (1995) offer four spatial data analysis classes: Point Data Analysis: Do the locations of point data and the relationships among the points represent a ‘significant’ pattern? Continuous Data Analysis: What are the spatial patterns and characteristics over a region, given a set of samples? Area Data Analysis: Analysis of data that have been aggregated over a spatial zone, e.g. county

The John Snow Map A classic example of the use of location to draw inferences 1854 cholera outbreak in London Point data map indicated some spatial clustering Overlaying a map of water pump locations showed many cases were concentrated around a single pump

Continuous Data Analysis Temperature data is well suited for converting from point to continuous data - It has high spatial density - Ambient temperature is relatively spatially homogeneous (no sharp gradients)

County Level Aggregated Data Also known as a choropleth map

Scale The most appropriate analysis method to use depends on the spatial and temporal scales of the problem. The spatial variability of temperature at a ‘local’ scale is not necessarily significant when conducting an analysis at the ‘regional’ or ‘global’ scale.

Scale Dependent Measurements How long is Maine’s coastline? length=340 km length=355 km length=415 km From Longley et al., 2001

What’s in a map, anyway? Theme: Static map Maps of entities whose location is known and constant (relatively) Roads, borders, locations of buildings These types of layers are often referred to as “thematic” layers Are usually used to provide context to other spatial data Statistical: Realization of one of the many possible patterns that may have been generated by a process Given a set of conditions, a given spatial pattern is just one instance among a distribution of possible patterns The question is: Is the observed realization significantly different than what would be expected by chance?

Deterministic versus Stochastic Processes Deterministic processes have one realization: the value at a given location is always the same, regardless of the number of times the process occurs. Stochastic processes have multiple realizations that are not precisely predicted and involve a random component. For our purposes, random refers to the method used to generate a pattern, not the resulting pattern itself.

Examples of Deterministic & Stochastic Processes random variable

Random Spatial Processes A random process does not mean that all events are independent of one another, as is the case with flipping a coin or rolling dice. Rather, spatial random processes are random with dependence (or rules). Consider a “conditionally” random display of 4 coins: Flip the first 3 coins and display by their flipped side (head or tails) The 4th coin will not be flipped The 4th coin is displayed as follows: If the 2nd and 3rd flipped coins are heads, the 4th is the same as the first Otherwise, the 4th is opposite of the first.
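
The four-coin rule above can be sketched in a few lines of Python. This is only an illustration: the function name `conditional_coins` and the fixed seed are assumptions, not part of the slides. Each call produces one realization; the fourth coin is never flipped, yet the display as a whole is still random.

```python
import random

def conditional_coins(rng):
    """Return one realization of the four-coin display as ['H'/'T', ...]."""
    # Flip the first three coins independently.
    first, second, third = (rng.choice("HT") for _ in range(3))
    # The fourth coin is not flipped; it follows the rule from the slide.
    if second == "H" and third == "H":
        fourth = first                          # same as the first coin
    else:
        fourth = "T" if first == "H" else "H"   # opposite of the first coin
    return [first, second, third, fourth]

rng = random.Random(0)  # hypothetical seed, only so runs are repeatable
for _ in range(5):
    print(" ".join(conditional_coins(rng)))
```

Every printed row is a different realization of the same process, which is the sense in which "random" describes the generating method rather than the resulting pattern.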

Basic Statistical Concepts Mean, Variance, Covariance: [equations shown on slide] Median: The value in the distribution at which 50% of the data points lie above and 50% lie below. Frequency/Probability Distributions: Normal or Gaussian (mean = median); Poisson (mean = variance)
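
For reference, a sketch of the standard textbook definitions behind these terms, for n observations x_i and paired observations y_i. The population form (dividing by n) is an assumption here; the sample form divides by n - 1 instead.

```latex
\begin{align}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2 \\
\operatorname{cov}(x, y) &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)
\end{align}
```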

Distribution Summary Statistics The features of a distribution can be summarized using: Measures of Location Mean Median Quantiles Measures of Spread Standard Deviation = Square Root of Variance Measures of Shape Coefficient of skewness – a measure of symmetry Kurtosis – a measure of the likelihood of outliers

Complete Spatial Randomness Take as an example a randomly generated point data set where 1) the chance of a given x,y point existing is equal to the chance of any other point existing (uniform probability distribution) 2) the existence of an x,y point is independent of the existence of any other point. These two conditions constitute an independent random process (IRP) or complete spatial randomness (CSR)
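
A minimal sketch of generating a CSR point pattern, assuming a rectangular study area. The function name `simulate_csr` and the 10 by 10 extent are hypothetical. Each coordinate is drawn uniformly and independently, which corresponds to the two conditions listed above.

```python
import random

def simulate_csr(n_points, width, height, seed=None):
    """Draw n_points locations uniformly and independently in a rectangle."""
    rng = random.Random(seed)
    # Condition 1: every location is equally likely (uniform draws).
    # Condition 2: each point is drawn without reference to the others.
    return [(rng.uniform(0, width), rng.uniform(0, height))
            for _ in range(n_points)]

points = simulate_csr(100, 10.0, 10.0, seed=42)
print(points[:3])
```

Patterns simulated this way are often used as the chance baseline against which an observed pattern is compared.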

Exploratory Spatial Data Analysis (ESDA) Aim is to identify data properties for purposes of pattern detection. Based on graphical and visual methods and on numerical techniques that are statistically robust, i.e. not much affected by extreme or atypical data values. ArcGIS Geostatistical Analyst extension contains a set of ESDA tools: Histogram (Frequency Distribution), Voronoi Map, QQPlot, Trend Analysis

Exploratory Analysis Example

Summary Statistics

Quantile Plots Graphs the quantiles of a dataset against the quantiles of a normal distribution

Voronoi Plot Voronoi plots assign or calculate a value for each point’s polygon, including: the value itself; the mean of neighboring polygons; the most frequent value among neighboring polygons; unique value among neighbors; variation among neighbors

Spatial Smoothing/Averaging

Data Types Two general views to organizing spatial data: Entities or objects Point measurements, rivers, structures Have attributes or features attached to them Point, vector or area format Values exist at discrete locations Fields Continuous data such as temperature gradient fields and satellite imagery Values exist over an area Raster format (grids)

Data Types Entities and fields can be transformed to the other type

Raster and Vector Data Models [Figure: a real-world scene (river, trees, house) drawn as a raster representation on a 10 × 10 grid of coded cells and as a vector representation on X and Y axes running from 100 to 600. Adapted from Lembo, 2003]

Landcover Raster Grid [Figure: raster grid of cell values from 2 to 17, grouped into classes (1-5), (6-10), (11-15), (16-20). Legend: Mixed conifer, Douglas fir, Oak savannah, Grassland]


GIS Functionality Filtering: retrieves a subset of a dataset. Example: query (search). Aggregation: combines attributes or features within data sources (layers). Examples: reclassify, dissolve. Integration: combines two or more data sources (layers). Examples: polygon overlay, table joining

Spatial Queries (Filter) Identifying features based on spatial criteria Criteria include variations on: adjacency, containment, arrangement, and connectivity Adjacency Which states are adjacent to the State of Missouri? Containment Which states “contain” the Mississippi River and its tributaries?

Reclassification (Aggregation) An assignment of a class or value based on the attributes or geography of an object

Reclassification & Dissolve

Variable Distance Buffering

Polygon Overlay (Integration) Topology describes the relationships between elements of a map. A topological data structure defines the elements of the map in a way that makes it possible to know which line segments are connected to each other and to know what polygon is adjacent to each side of a line segment.

Polygon Overlay Examples “Cookie-cutter” method

Coordinate Systems A geographical coordinate system uses a three-dimensional spherical surface to define locations on the earth. It divides space into an orderly structure of locations. Two types: Cartesian and angular (spherical). © Paul Bolstad, GIS Fundamentals

Parallels and Meridians Meridians are great circles of constant longitude; an example is the prime meridian. Parallels are circles of constant latitude; an example is the equator. Latitude (φ): angular distance from the equator. Longitude (λ): angular distance from a standard meridian. St. Louis: 38° 39' N, 90° 38' W; New York: 40° 47' N, 73° 58' W; Los Angeles: 34° 3' N, 118° 14' W; Rome: 41° 48' N, 12° 36' E; Sydney: 33° 52' S, 151° 12' E

Earth’s Expanding Waistline From the Chronicle of Higher Education Jan 17, 2003

Datum While a spheroid approximates the shape of the Earth, a datum defines the position of the spheroid relative to the center of the Earth. The datum provides a frame of reference for measuring locations on the surface of the Earth. A datum is chosen to align a spheroid to closely fit the Earth’s surface in a particular area.

Map Projections and Distortions Three general types of projections: Equal area – the ratio of areas on the earth to areas on the map is constant. Shape, angle, and scale are distorted. Conformal – the shape of any small surface of the map is preserved in its original form. If meridians and parallels intersect at 90-degree angles, then angles are also preserved. Equidistant – preserves distances between certain points. Scale is not maintained correctly everywhere; however, typically one or more lines has its scale maintained.

Comparing Projections

Summary Statistics of a Point Pattern Mean center: average of the x and y coordinates (geographic mean). Standard distance: average distance of points from the mean center (provides a measure of dispersion). Summary circle: centered at the mean center with a radius of the standard distance.
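
The three summaries can be sketched as follows. The function names and the four sample points are hypothetical. One assumption to flag: the standard distance is computed here as the root-mean-square distance from the mean center, which is the conventional definition; the slide describes it more loosely as an average distance.

```python
import math

def mean_center(points):
    """Average of the x and y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))

# Hypothetical point pattern: the corners of a 2 x 2 square.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = mean_center(points)
radius = standard_distance(points)
print("summary circle: center", center, "radius", radius)
```

The summary circle is then the circle with that center and radius.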

US Population Density

Geographic Center of US Population The center of the US population is calculated as the average latitude and longitude, weighted by the population at a uniformly spaced set of points
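
A minimal sketch of a population-weighted center, assuming a plain weighted average of latitude and longitude as the slide describes. The function name and the two sample points are made up for illustration, and averaging angles directly ignores the Earth's curvature, so this is an approximation rather than an official method.

```python
def weighted_mean_center(locations, weights):
    """Weighted average of (latitude, longitude) pairs."""
    total = sum(weights)
    lat = sum(w * la for (la, _), w in zip(locations, weights)) / total
    lon = sum(w * lo for (_, lo), w in zip(locations, weights)) / total
    return lat, lon

# Hypothetical grid points and populations (not real census values).
locations = [(40.0, -100.0), (40.0, -80.0)]
population = [1.0, 3.0]
print(weighted_mean_center(locations, population))
```

With three times the population at the eastern point, the center is pulled three quarters of the way toward it.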

Quadrat Count A quadrat count is conducted by superimposing a regular grid over the data, counting the number of events in each grid cell, and dividing each count by the cell area to get intensity. Example: 40 grid cells. With s² the variance and µ the mean of the cell counts, an s²/µ ratio greater than 1 indicates clustering
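
The counting procedure and the variance-to-mean ratio can be sketched as below. The function names, the 2 by 2 grid, and both sample patterns are hypothetical. The sample variance (dividing by the number of cells minus one) is assumed, and the pattern must contain at least one point so that the mean is not zero.

```python
def quadrat_counts(points, width, height, nx, ny):
    """Count events per cell of an nx-by-ny grid over [0, width] x [0, height]."""
    counts = [0] * (nx * ny)
    for x, y in points:
        col = min(int(x / width * nx), nx - 1)   # clamp points on the far edge
        row = min(int(y / height * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts

def variance_mean_ratio(counts):
    """Sample variance of the cell counts divided by their mean."""
    m = len(counts)
    mean = sum(counts) / m
    variance = sum((c - mean) ** 2 for c in counts) / (m - 1)
    return variance / mean

clustered = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.1, 0.3)]
regular = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
cell_area = (2.0 / 2) * (2.0 / 2)
for name, pts in (("clustered", clustered), ("regular", regular)):
    counts = quadrat_counts(pts, 2.0, 2.0, 2, 2)
    intensity = [c / cell_area for c in counts]   # events per unit area
    print(name, counts, intensity, variance_mean_ratio(counts))
```

The clustered pattern puts all events in one cell and gives a ratio above 1, while the evenly spread pattern gives a ratio below 1.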

Spatial Autocorrelation Defines the correlation between values of the same variable at different spatial locations Positive Spatial Autocorrelation Like values tend to cluster in space Negative Spatial Autocorrelation Neighbors are dissimilar Zero Spatial Autocorrelation No correlation
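
One common way to put a number on this is Moran's I. The slides do not name a statistic, so its use here is an assumption, as are the function name and the four-cell example. Neighbors are given as a dictionary of adjacent cell indices with binary weights.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights; neighbors maps index -> adjacent indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0          # sum of w_ij * dev_i * dev_j over neighbor pairs
    weight_total = 0.0   # sum of all weights
    for i, adjacent in neighbors.items():
        for j in adjacent:
            cross += dev[i] * dev[j]
            weight_total += 1.0
    return (n / weight_total) * cross / sum(d * d for d in dev)

# Four cells in a row; each cell neighbors the cells next to it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([1, 1, 5, 5], chain)     # like values side by side
alternating = morans_i([1, 5, 1, 5], chain)   # neighbors are dissimilar
print(clustered, alternating)
```

The first arrangement gives a positive value and the second a negative one, matching the positive and negative cases described above.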

From points to fields A spatial estimation method turns point monitoring data into a continuous surface of estimates (map). c_i is the estimated value at location i; n is the number of data points; c_j is the value at data point j; w_ij is the weight assigned to data point j, the factor that determines how much influence a data point is assigned during the calculation of the estimate. The weighting factor is usually the distinguishing feature of interpolation methods. Biggest challenge: How to determine the weights?

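Inverse Distance Interpolation Weights decrease with distance from the estimation location; k is the power (exponent) applied to distance in the weighting. Estimates are constrained to lie between the minimum and maximum values in the point data set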
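
A minimal sketch of inverse distance weighting, assuming the usual form in which the weight is 1 / d^k and the estimate is the weighted sum of the data values divided by the sum of the weights. The normalization, the function name `idw_estimate`, and the two sample points are assumptions for illustration.

```python
import math

def idw_estimate(target, samples, k=2.0):
    """Estimate the value at target (x, y) from samples of (x, y, value)."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value          # target coincides with a data point
        w = 1.0 / d ** k          # closer points get larger weights
        numerator += w * value
        denominator += w
    return numerator / denominator

samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_estimate((1.0, 0.0), samples))   # midway between the two points
print(idw_estimate((0.5, 0.0), samples))   # closer to the first point
```

Because the normalized weights are positive and sum to 1, every estimate falls between the smallest and largest data values, which is the constraint noted on the slide. A larger k makes the nearest points dominate.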

Spatial Smoothing/Averaging


Raster Analysis (Continuous Data) Moving Windows [Figure: a 3 × 3 window with cell values 2 3 5 / 2 3 6 / 3 5 7 gives minimum = 2, maximum = 7, range = 5, mean = 4]
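
A sketch of the moving-window calculation, using the nine cell values from the slide's figure as a 3 × 3 grid (the row layout is an assumption). The function name `window_stats` is hypothetical; at grid edges the window is simply clipped to the cells that exist.

```python
def window_stats(grid, row, col, size=3):
    """Summary statistics of the size-by-size window centered on (row, col)."""
    half = size // 2
    values = [
        grid[r][c]
        for r in range(max(0, row - half), min(len(grid), row + half + 1))
        for c in range(max(0, col - half), min(len(grid[0]), col + half + 1))
    ]
    low, high = min(values), max(values)
    return {
        "minimum": low,
        "maximum": high,
        "range": high - low,
        "mean": sum(values) / len(values),
    }

grid = [
    [2, 3, 5],
    [2, 3, 6],
    [3, 5, 7],
]
stats = window_stats(grid, 1, 1)
print(stats)
```

Sliding the window across a larger raster and writing one statistic per center cell produces a new raster of the same shape.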

Slope Slope is the change in elevation (rise) with a change in horizontal position (run). The steepest descent between a cell and its neighbors is known as the gradient. Slope is often reported in degrees (0° is flat, 90° is vertical) but is also expressed as a percent
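
The rise-over-run definition and the two reporting units can be sketched as below. The function names, the small elevation grid, and the 10-unit cell size are hypothetical. The steepest-descent search is the simple maximum-drop approach the slide describes; GIS packages often use other estimators over the full 3 × 3 neighborhood.

```python
import math

def slope_percent(rise, run):
    """Slope as a percent: 100 * rise / run."""
    return 100.0 * rise / run

def slope_degrees(rise, run):
    """Slope as an angle: 0 degrees is flat, 90 degrees is vertical."""
    return math.degrees(math.atan2(rise, run))

def steepest_descent(dem, row, col, cell_size):
    """Largest drop per unit distance from a cell to any of its neighbors."""
    best = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr == 0 and dc == 0) or not (0 <= r < len(dem) and 0 <= c < len(dem[0])):
                continue
            run = cell_size * math.hypot(dr, dc)   # diagonals are farther away
            drop = dem[row][col] - dem[r][c]
            best = max(best, drop / run)
    return best

dem = [
    [50, 50, 50],
    [50, 50, 40],
    [50, 50, 50],
]
gradient = steepest_descent(dem, 1, 1, cell_size=10.0)
print(gradient, slope_percent(gradient, 1.0), slope_degrees(gradient, 1.0))
```

A drop of 10 over a run of 10 is a 100 percent slope, which corresponds to 45 degrees, so the two units are not interchangeable.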

Hands-on Exercise: Mapping Census Data Database manipulation (table joins) Reprojecting maps Calculating derived values (population density, population change over time) Visualization

ArcGIS Main Components ArcCatalog ArcToolbox ArcMap

Data Quality It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable Uncertainty is found in data and in its processing and analysis The outputs from spatial data analysis and GIS are only as good as the inputs and associated assumptions.

Logical Consistency Representation of data that does not make sense Road in the water Contours that cross or end Features on steep slopes

Modifiable areal unit problem There are multiple ways to aggregate data into zones, and different aggregations can yield different results.

Anscombe’s Quartet These four data sets look identical from a statistical perspective.

Anscombe’s Quartet They don’t look anything alike from a graphical perspective!!
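
The point of the quartet can be checked numerically. The sketch below uses the published values of Anscombe's four data sets and computes the usual summary statistics by hand; the function name `summarize` is hypothetical.

```python
import math

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(x, y):
    """Means, sample variances, and Pearson correlation of paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
    }

for name, (x, y) in ANSCOMBE.items():
    print(name, {k: round(v, 2) for k, v in summarize(x, y).items()})
```

The four rows of output agree to about two decimal places, even though scatter plots of the same data look completely different, which is why summary statistics should be paired with a plot.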