15. Descriptive Summary, Design, and Inference. Outline Data mining Descriptive summaries Optimization Hypothesis testing.


15. Descriptive Summary, Design, and Inference

Outline
- Data mining
- Descriptive summaries
- Optimization
- Hypothesis testing

Data mining
- Analysis of massive data sets in search of patterns, anomalies, and trends
  - spatial analysis applied on a large scale
  - must be semi-automated because of data volumes
  - widely used in practice, e.g. to detect unusual patterns in credit card use

Descriptive summaries
- Attempt to summarize useful properties of data sets in one or two statistics
- The mean or average is widely used to summarize data
  - centers are the spatial equivalent
  - there are several ways of defining centers

The centroid
- Found for a point set by taking the weighted average of coordinates
- The balance point

Optimization properties
- The centroid minimizes the sum of distances squared
- But not the sum of distances from each point
  - the center with that property is called the point of minimum aggregate travel (MAT)
  - the properties have frequently been confused, e.g. by the U.S. Bureau of the Census in calculating the center of U.S. population
  - the MAT must be found by iteration rather than by calculation
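The difference between the two centers can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the Weiszfeld iteration used here is one common way to find the MAT, not necessarily the method the slides have in mind.

```python
import math

def centroid(points, weights=None):
    """Weighted average of coordinates (the balance point)."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    return cx, cy

def mat_point(points, weights=None, tol=1e-9, max_iter=1000):
    """Point of minimum aggregate travel, found by Weiszfeld iteration."""
    if weights is None:
        weights = [1.0] * len(points)
    x, y = centroid(points, weights)  # the centroid is a convenient starting guess
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for w, (px, py) in zip(weights, points):
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                # Simplification: stop if the estimate lands on a data point,
                # where the update would divide by zero.
                return x, y
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y

def total_distance(center, points):
    """Sum of straight-line distances from center to every point."""
    return sum(math.hypot(px - center[0], py - center[1]) for px, py in points)

points = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(centroid(points))   # pulled toward the outlying point
print(mat_point(points))  # stays near the cluster of three points
```

For these four points the outlier drags the centroid well away from the cluster, while the MAT stays close to it, which is why the two should not be confused.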

Applications of the MAT
- Because it minimizes distance, the MAT is a useful point at which to locate any central service
  - e.g., a school, hospital, store, fire station
  - finding the MAT is a simple instance of using spatial analysis for optimization

Dispersion
- A measure of the spread of points around a center
- Useful for determining positional error
- Related to the width of the kernel used in density estimation
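One common dispersion measure is the standard distance: the square root of the mean squared distance of the points from their centroid. The slide does not name a specific statistic, so treat the choice here, and the function name, as assumptions made for illustration.

```python
import math

def standard_distance(points):
    """Root mean squared distance of points from their (unweighted) centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

# Four corners of a 2 x 2 square: every point is sqrt(2) from the center.
print(standard_distance([(0, 0), (2, 0), (0, 2), (2, 2)]))
```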

Spatial dependence
- There are many ways of measuring this very important summary property
- The semivariogram (see Chapter 13) measures spatial dependence over a range of scales
- The Moran and Geary indices (see Chapter 5)

Descriptions of Pattern
- Many techniques, depending on the type of features and whether they are differentiated by attributes (labeled)
  - measures for unlabeled features look for purely geometric pattern
  - measures for labeled features ask about patterns in the labels

Patterns in Unlabeled Points
- Locations of disease, crimes, traffic accidents
- Do events tend to cluster more in some areas than others?
- Or are they random, equally likely anywhere?
- Or are they dispersed, such that points are less likely in areas close to other points?

The K Function
- Captures how density of points varies with distance away from a reference point
- By comparing to what would be expected in a random distribution of points
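A bare-bones estimate of the K function can be sketched as follows. This version is an assumption-laden simplification: it applies no edge correction, normalizes by n(n - 1), and uses hypothetical names (`ripley_k`, `area`). Under a completely random pattern, K(d) is expected to be close to pi * d^2; larger values suggest clustering at that distance, smaller values suggest dispersion.

```python
import math

def ripley_k(points, area, d):
    """Naive K estimate: scaled count of ordered point pairs within distance d.

    No edge correction is applied, so values are biased low near the
    boundary of the study area.
    """
    n = len(points)
    count = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= d:
                count += 1
    return area * count / (n * (n - 1))

points = [(0, 0), (1, 0), (5, 5)]
d = 1.5
print(ripley_k(points, area=100.0, d=d), "vs random expectation", math.pi * d ** 2)
```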

Point pattern of individual tree locations. A, B, and C identify the individual trees analyzed in the next slide. (Source: Getis A, Franklin J 1987 Second-order neighborhood analysis of mapped point patterns. Ecology 68(3): ).

Analysis of the local distribution of trees around three reference trees in the previous slide (see text for discussion). (Source: Getis A, Franklin J 1987 Second-order neighborhood analysis of mapped point patterns. Ecology 68(3): ).

Pattern in Labeled Features
- How are the attributes (labels) distributed over the features?
  - Clustered, with neighboring features having similar values
  - Random, with labels assigned independently of location
  - Dispersed, with neighboring features having dissimilar values
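The Moran index is one way to place a labeled pattern on the clustered/random/dispersed scale. The sketch below assumes binary adjacency weights and a hypothetical `neighbors` dictionary mapping each feature index to the indices of its neighbors; positive values indicate clustering, negative values indicate dispersion, and values near zero indicate no clear pattern.

```python
def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 if j is a neighbor of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0  # sum of w_ij * dev_i * dev_j
    s0 = 0       # sum of all weights
    for i, nbrs in neighbors.items():
        for j in nbrs:
            cross += dev[i] * dev[j]
            s0 += 1
    return (n / s0) * (cross / sum(d * d for d in dev))

# Four features in a row; each is adjacent to the one before and after it.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1, 1, 5, 5], chain))  # similar neighbors: positive
print(morans_i([1, 5, 1, 5], chain))  # dissimilar neighbors: negative
```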

In the map window the states are colored according to median house value, with the darker shades corresponding to more expensive housing. In the scatterplot window the three points colored yellow are instances where a state of below-average housing value is surrounded by states of above-average value.

Fragmentation statistics
- Measure the patchiness of data sets
  - e.g., of vegetation cover in an area
- Useful in landscape ecology, because of the importance of habitat fragmentation in determining the success of animal and bird populations
  - populations are less likely to survive in highly fragmented landscapes

Three images of part of the state of Rondonia in Brazil, for 1975, 1986, and Note the increasing fragmentation of the natural habitat as a result of settlement. Such fragmentation can adversely affect the success of wildlife populations.

Optimization
- Spatial analysis can be used to solve many problems of design
- A spatial decision support system (SDSS) is an adaptation of GIS aimed at solving a particular design problem

Optimizing point locations
- The MAT is a simple case: one service location and the goal of minimizing total distance traveled
- The operator of a chain of convenience stores or fire stations might want to solve for many locations at once
  - where are the best locations to add new services?
  - which existing services should be dropped?

Location-allocation problems
- Design locations for services, and allocate demand to them, to achieve specified goals
- Goals might include:
  - minimizing total distance traveled
  - minimizing the largest distance traveled by any customer
  - maximizing profit
  - minimizing a combination of travel distance and facility operating cost
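For the first goal, minimizing total distance traveled, a tiny location-allocation problem can be solved by brute force. The sketch below is illustrative only: it tries every combination of `p` candidate sites, which is feasible only for very small problems, and the function and parameter names are assumptions of this example.

```python
import itertools
import math

def p_median(demand, candidates, p):
    """Pick p candidate sites minimizing total distance from demand to nearest site."""
    best_cost, best_sites = math.inf, None
    for sites in itertools.combinations(candidates, p):
        cost = sum(min(math.dist(d, s) for s in sites) for d in demand)
        if cost < best_cost:
            best_cost, best_sites = cost, sites
    # Allocation step: each demand point is served by its nearest chosen site.
    allocation = [min(best_sites, key=lambda s: math.dist(d, s)) for d in demand]
    return best_sites, best_cost, allocation

demand = [(0, 0), (1, 0), (10, 0), (11, 0)]
candidates = [(0, 0), (5, 0), (10, 0)]
print(p_median(demand, candidates, p=2))
```

Swapping `sum` for `max` in the cost line would target the second goal instead, minimizing the largest distance traveled by any customer.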

Routing problems
- Search for optimum routes among several destinations
- The traveling salesman problem
  - find the shortest tour from an origin, through a set of destinations, and back to the origin
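The traveling salesman problem as stated above can be solved exactly for a handful of destinations by checking every visiting order. This is a sketch under strong assumptions: straight-line distances, hypothetical names, and exhaustive search whose cost grows factorially, so practical routing systems rely on heuristics instead.

```python
import itertools
import math

def shortest_tour(origin, destinations):
    """Exhaustive search for the shortest origin -> destinations -> origin tour."""
    best_length, best_tour = math.inf, None
    for order in itertools.permutations(destinations):
        tour = (origin,) + order + (origin,)
        length = sum(math.dist(a, b) for a, b in zip(tour, tour[1:]))
        if length < best_length:
            best_length, best_tour = length, tour
    return best_tour, best_length

# Corners of a unit square: the best tour walks the perimeter.
print(shortest_tour((0, 0), [(0, 1), (1, 1), (1, 0)]))
```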

Routing service technicians for Schindler Elevator. Every day this company’s service crews must visit a different set of locations in Los Angeles. GIS is used to partition the day’s workload among the crews and trucks (color coding) and to optimize the route to minimize time and cost.

Optimum paths
- Find the best path across a continuous cost surface
  - between defined origin and destination
  - to minimize total cost
  - cost may combine construction, environmental impact, land acquisition, and operating cost
- Used to locate highways, power lines, pipelines
- Requires a raster representation
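On a raster, a least-cost path can be found with Dijkstra's algorithm. The sketch below makes several simplifying assumptions that are not from the slides: moves are allowed only between the four edge-adjacent cells, the cost of a path is the sum of the cost values of every cell it visits (including the origin), and the function name is hypothetical. GIS packages typically also allow diagonal moves and weight them by distance.

```python
import heapq
import math

def least_cost_path(cost, start, goal):
    """Dijkstra's algorithm over a grid of non-negative cell costs."""
    rows, cols = len(cost), len(cost[0])
    best = {start: cost[start[0]][start[1]]}  # cheapest known cost to reach each cell
    came_from = {}
    heap = [(best[start], start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > best[cell]:
            continue  # stale queue entry
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < best.get((nr, nc), math.inf):
                    best[(nr, nc)] = nd
                    came_from[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(came_from[path[-1]])
    path.reverse()
    return path, best[goal]

# A costly ridge (9s) in the middle column with a cheap pass in the bottom row.
surface = [
    [1, 9, 1],
    [1, 9, 1],
    [1, 1, 1],
]
print(least_cost_path(surface, (0, 0), (0, 2)))
```

The route detours through the bottom row rather than crossing the ridge directly, mirroring the way a least-cost route seeks out a pass through a mountain range.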

Solution of a least-cost path problem. The white line represents the optimum solution, or path of least total cost, across a friction surface represented as a raster. The area is dominated by a mountain range, and cost is determined by elevation and slope. The best route uses a narrow pass through the range. The blue line results from solving the same problem using a coarser raster.

Hypothesis testing
- Hypothesis testing is a recognized branch of statistics
- A sample is analyzed, and inferences are made about the population from which the sample was drawn
- The sample must normally be drawn randomly and independently from the population

Hypothesis testing with spatial data
- Frequently the data represent all that are available
  - e.g., all of the census tracts of Los Angeles
- It is consequently difficult to think of such data as a random sample of anything
  - not a random sample of all census tracts
- Tobler’s Law guarantees that independence is problematic
  - unless samples are drawn very far apart

Possible approaches to inference
- Treat the data as one of a very large number of possible spatial arrangements
  - useful for testing for significant spatial patterns
- Discard data until cases are independent
  - no one likes to discard data
- Use models that account directly for spatial dependence
- Be content with descriptions and avoid inference
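The first approach, treating the observed data as one of many possible spatial arrangements, can be sketched as a permutation test. The example below is an illustration under stated assumptions: it uses Moran's I with binary adjacency weights as the pattern statistic, shuffles the values over the fixed locations 999 times, and reports a one-sided pseudo p-value for positive spatial dependence. All names and the choice of statistic are assumptions of this sketch.

```python
import random

def morans_i(values, neighbors):
    """Moran's I with binary weights (w_ij = 1 if j is a neighbor of i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    s0 = sum(len(nbrs) for nbrs in neighbors.values())
    return (n / s0) * (cross / sum(d * d for d in dev))

def permutation_p_value(values, neighbors, n_perm=999, seed=0):
    """Share of random rearrangements at least as clustered as the observed one."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_extreme += 1
    # The observed arrangement counts as one of the possible arrangements.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

# Ten features in a row, low values on the left and high values on the right.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
values = [1, 1, 1, 1, 1, 9, 9, 9, 9, 9]
print(permutation_p_value(values, chain))
```

A small pseudo p-value says that few random arrangements of the same values are as clustered as the observed one, without requiring the data to be a random, independent sample.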