Title: Spatial Data Mining in Geo-Business

Overview
• Twisting the Perspective of Map Surfaces: describes the character of spatial distributions through the generation of a customer density surface
• Linking Numeric and Geographic Distributions: investigates the link between numeric and geographic distributions of mapped data
• Interpolating Spatial Distributions: discusses the basic concepts underlying spatial interpolation
• Interpreting Interpolation Results: describes the use of “residual analysis” for evaluating spatial interpolation performance
• Characterizing Data Groups: describes the use of “data distance” to derive similarity among the data patterns in a set of map layers
• Identifying Data Zones: describes the use of “level-slicing” for classifying locations with a specified data pattern (data zones)
• Mapping Data Clusters: describes the use of “clustering” to identify inherent groupings of similar data patterns
• Mapping the Future: describes the use of “linear regression” to develop prediction equations relating dependent and independent map variables
• Mapping Potential Sales: describes an extensive geo-business application that combines retail competition analysis and product sales prediction
Paper available online at

Density Surface Analysis
Processing flow: Customer Street Address → (Geo-Coding) → Customer GIS Location → (Vector to Raster) → Customer Counts (# per cell) → (Roving Window) → Density Surface Totals → (Classify Density Map) → Classified Density Levels
• Vector to Raster: counts the number of customers (points) within each grid cell
• Roving Window: calculates the total number of customers within a roving window (customer density)
Displays: 2D grid display of customer counts; 2D perspective display of density contours; 3D surface plot
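The counting and roving-window steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the presentation: the function names, the square window shape, and the sample column/row positions are all assumptions.

```python
import numpy as np

def count_customers(cols, rows, n_cols, n_rows):
    """Vector-to-raster step: count customer points falling in each grid cell."""
    counts = np.zeros((n_rows, n_cols), dtype=int)
    # np.add.at accumulates correctly when several customers share a cell
    np.add.at(counts, (rows, cols), 1)
    return counts

def roving_window_total(counts, radius=1):
    """Sum the counts in a square window of (2*radius + 1) cells centred on each cell."""
    n_rows, n_cols = counts.shape
    padded = np.pad(counts, radius, mode="constant", constant_values=0)
    density = np.zeros_like(counts)
    # Add up every shifted copy of the grid that falls inside the window
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            density += padded[dr:dr + n_rows, dc:dc + n_cols]
    return density

# Hypothetical customer positions, already expressed as grid columns and rows
cols = np.array([0, 0, 1, 3])
rows = np.array([0, 0, 1, 3])
counts = count_customers(cols, rows, n_cols=4, n_rows=4)
density = roving_window_total(counts, radius=1)
```

Cells near the grid edge see a smaller effective window because the padding contributes zeros; a circular window or a different edge rule would be an equally reasonable choice.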

Identifying Pockets of High Density
Panels: Customer Density (Map Surface); Customer Density (Non-spatial Statistics)
Unusually High = Mean + 1 Standard Deviation
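The "Mean + 1 Standard Deviation" rule can be applied to a density surface as follows. This is a sketch under assumptions: the small grid is made up, and NumPy's default population standard deviation is used because the slide does not say which form was intended.

```python
import numpy as np

def unusually_high(surface):
    """Flag cells above mean + 1 standard deviation of the whole surface."""
    threshold = surface.mean() + surface.std()
    return surface > threshold, threshold

# Hypothetical 3 x 3 customer density surface with one dense pocket
density = np.array([[0, 1, 1],
                    [1, 2, 1],
                    [1, 1, 10]], dtype=float)
mask, threshold = unusually_high(density)
```

The boolean `mask` is itself a map layer: cells set to True are the pockets of unusually high customer density.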

Grid-based Analysis Frame (Keystone Concept)
Processing flow: Customer Database (non-spatial) → GeoCoding → Vector (point) → V to R Conversion → Raster (cell) Analysis Frame → Customer Database (spatial)
• GeoCoding plots each customer's address on the streets map
• V to R Conversion plots each customer's location in the analysis frame (grid)
• The procedure appends Lat, Lon, Column, Row location (Latitude, Longitude, C, R) to customer records
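Appending a column and row to a geocoded customer record amounts to locating the grid cell that contains the point. The sketch below assumes a simple analysis frame defined by its north and west edges and a cell size in degrees; the function name, frame parameters, and the sample record are hypothetical.

```python
import math

def to_grid_cell(lat, lon, north, west, cell_size_deg):
    """Return (column, row) of the grid cell containing a lat/lon point.

    Rows count down from the north edge, columns count right from the west edge.
    """
    col = math.floor((lon - west) / cell_size_deg)
    row = math.floor((north - lat) / cell_size_deg)
    return col, row

# Hypothetical geocoded customer record
customer = {"id": 1, "lat": 39.95, "lon": -104.85}
col, row = to_grid_cell(customer["lat"], customer["lon"],
                        north=40.0, west=-105.0, cell_size_deg=0.1)
# Append the analysis-frame location to the record
customer.update({"col": col, "row": row})
```

A production analysis frame would normally use a projected coordinate system rather than raw degrees; degrees are used here only to keep the example short.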

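Surface Modeling (Spatial Interpolation)
Point Samples → Surface Map
[Figure: “Spikes ‘n Blanket” display of the point samples (“Spikes”) and the interpolated surface; surviving labels: Avg = , 66.3]
Surface modeling “maps the variance” by using geographic position to help explain the differences in the sample values.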

IDW Interpolation (Inverse Distance Weighted)
1) Identify data points in window:
   #11 value = 56.9
   #14 value = 22.5
   #15 value = 52.3
   #16 value = 66.3
2) Calculate distance from location to data points (Pythagorean Theorem):
   #11 distance =
   #14 distance =
   #15 distance = 6.32
   #16 distance =
3) Weight-average values in the window based on distance to grid location: (1/Distance)^2 * Value (“closer has more influence”)
4) Assign weight-averaged value: 53.35
5) Move window to next grid location and repeat
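The five steps can be sketched as a small Python function that computes the estimate as the sum of weight times value divided by the sum of weights, with weight = (1/distance)^2. The sample values below come from the slide, but the coordinates are hypothetical because the slide's distances did not survive, so the result is not expected to reproduce the slide's 53.35.

```python
import math

def idw_estimate(x, y, samples, power=2):
    """Inverse-distance-weighted average of (sx, sy, value) samples at (x, y)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(sx - x, sy - y)  # Pythagorean theorem
        if distance == 0:
            return value  # grid location coincides with a sample
        weight = (1.0 / distance) ** power     # closer has more influence
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Sample values from the slide; the coordinates are hypothetical.
window_samples = [
    (2.0, 6.0, 56.9),   # #11
    (9.0, 1.0, 22.5),   # #14
    (6.0, 2.0, 52.3),   # #15
    (3.0, 3.0, 66.3),   # #16
]
estimate = idw_estimate(4.0, 4.0, window_samples)
```

To build a full surface, the same function would be called once per grid cell, passing only the samples that fall inside the window around that cell.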

Average vs. IDW Interpolated Surface
Panels: Average surface; IDW Surface; Difference Surface (IDW - Average)
Difference Surface legend: Reds = Avg > IDW; Greens = Avg < IDW; Min = ; Max = 29.5

IDW vs. Krig Interpolated Surfaces
Panels: IDW Surface; Krig Surface; Difference Surface (IDW - Krig)
Difference Surface legend: Reds = Krig > IDW; Greens = Krig < IDW; Min = ; Max = 5.0

Assessing Relationships Among Maps
• Housing Density (Units/ac): South has Lower Density
• Home Value ($K): South has Higher Values
• Home Age (Years): South has Newer Homes

Geographic Space and Data Space
Map layers: Density, Value, Age
• Geographic Space: relative spatial position of measurements
• Data Space: relative numerical magnitude of measurements
Comparison Point #1: D = Low (2.4 units/ac), V = High ($407,000), A = Low (18.3 years)
Least Similar Point #2: D = High (4.8 units/ac), V = Low ($190,000), A = High (51.2 years)
Data Similarity is inversely proportional to Data Distance: as data distance increases, the map values for two locations are less similar.

Assessing Map Similarity
“Data Distance” determines similarity among data patterns.
• Comparison Point = 2.4, 407, 18.3
• Least Similar Point = 4.8, 190, 51.2
The farthest-away point in data space (least similar) is set to 0 and the comparison point is set to 100. All other Data Distances are scaled in terms of their relative similarity as “percent similar” to the comparison point (0 to 100).
[Figure: Data Space plot marking the comparison point and the least similar point; Geographic Space map of Percent Similar]
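The percent-similar scaling can be sketched as below. Two things are assumptions here: data distance is taken to be the straight-line (Euclidean) distance between data patterns, and the raw units are used as given, so home value dominates the distance; the slides do not say whether the layers were normalized first. The third location is hypothetical.

```python
import math

def data_distance(a, b):
    """Euclidean distance between two data patterns (tuples of map values)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def percent_similar(pattern, comparison, max_distance):
    """Scale data distance so the comparison point is 100 and the farthest point is 0."""
    return 100.0 * (1.0 - data_distance(pattern, comparison) / max_distance)

comparison = (2.4, 407.0, 18.3)      # density, value ($K), age from the slide
least_similar = (4.8, 190.0, 51.2)   # farthest point in data space, from the slide
farthest = data_distance(least_similar, comparison)

other = (3.6, 300.0, 35.0)           # hypothetical third location
score = percent_similar(other, comparison, farthest)
```

Applying `percent_similar` to every grid cell yields the similarity map shown in geographic space.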

Identifying Data Patterns of Interest
Housing Density (Geographic Space and Data Space): Unusually High; Mean = , StDev = 0.80, Level Min = 4.36
Home Value (Geographic Space and Data Space): Unusually Low; Mean = , -StDev = , Level Max = ; surviving label: 67.2

Level-Slicing Classifier (two variables)
Data Space: Unusually High Housing Density; Unusually Low Home Value
Geographic Space: Unusually High Density and Low Value

Level-Slicing Classifier (three variables)
Common “data zones” can be mapped by identifying specific levels of each mapped variable and then adding the binary maps.
• Data Space: identifies combinations of selected measurements (high D, low V, high A)
• Geographic Space: locates combinations of selected measurements
   (high D, low V, high A) = 7
   (high D, low V but not high A) = 3
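One way to obtain the zone codes 7 and 3 by adding binary maps is to code the three layers as 1, 2 and 4 before summing, so that every combination gets a unique total. That coding is an assumption inferred from the slide's numbers. In this sketch the density cutoff 4.36 is the Level Min from the earlier slide, while the value and age cutoffs and the tiny grids are hypothetical.

```python
import numpy as np

def level_slice(density, value, age, d_min, v_max, a_min):
    """Combine three binary maps into one zone map using codes 1, 2 and 4."""
    high_d = (density >= d_min).astype(int)        # 1 where density is high
    low_v = (value <= v_max).astype(int) * 2       # 2 where value is low
    high_a = (age >= a_min).astype(int) * 4        # 4 where age is high
    return high_d + low_v + high_a

# Hypothetical 2 x 2 map layers
density = np.array([[4.8, 4.5], [2.4, 3.0]])
value = np.array([[190.0, 200.0], [407.0, 350.0]])
age = np.array([[51.2, 20.0], [18.3, 25.0]])
zones = level_slice(density, value, age, d_min=4.36, v_max=250.0, a_min=40.0)
```

A cell coded 7 meets all three conditions, and a cell coded 3 is high density and low value but not high age, matching the two combinations listed on the slide.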

Spatial Data Clustering
“Data clusters” are identified as groups of neighboring data points in Data Space, and then mapped as corresponding grid cells in Geographic Space.
• Data Space: plots and identifies groups of similar data values
• Geographic Space: maps common data patterns (clusters), for example relatively high D, low V and high A versus relatively low D, high V and low A
Maps shown: Two Clusters; Three Clusters; Four Clusters
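The slide does not name a clustering algorithm. As an illustration only, the sketch below uses a very small k-means, which groups data patterns by proximity in data space and then reshapes the labels back onto the grid. The first and third rows reuse the two data patterns from the earlier slides; the other rows, the 2 x 2 grid, and the deterministic starting centers are assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Very small k-means: returns a cluster label for each row of `points`."""
    points = np.asarray(points, dtype=float)
    # Deterministic start: pick k rows evenly spaced through the data
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        # Distance from every point to every center in data space
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the points assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Each row is one grid cell's (density, value in $K, age)
cells = np.array([
    [4.8, 190.0, 51.2],
    [4.6, 200.0, 48.0],
    [2.4, 407.0, 18.3],
    [2.6, 395.0, 20.0],
])
labels = kmeans(cells, k=2)
cluster_map = labels.reshape(2, 2)   # back to a 2 x 2 grid in geographic space
```

Changing `k` to 3 or 4 would correspond to the three-cluster and four-cluster maps, given enough cells to split.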

Spatial Regression (prediction equation)
Map layers (each displayed Low to High): Housing Density, Home Value, Home Age, Loan Concentration
Relationship between Loan Concentration and the independent variables housing Density, Value and Age:
• Loan Concentration vs. Housing Density: Y = * X_density [R^2 = 40%]
• Loan Concentration vs. Home Value: Y = * X_value [R^2 = 46%]
• Loan Concentration vs. Home Age: Y = * X_age [R^2 = 23%]
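A single-variable prediction equation of this kind can be fitted by pairing the two map layers cell by cell and running an ordinary least-squares fit. The sketch below uses made-up layers, so its coefficients and R-squared are not those on the slide; `fit_line` is a hypothetical helper name.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, with R-squared."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    slope, intercept = np.polyfit(x, y, 1)
    predicted = intercept + slope * x
    ss_res = np.sum((y - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)       # total variation
    return intercept, slope, 1.0 - ss_res / ss_tot

# Hypothetical 2 x 3 map layers; ravel() pairs values cell by cell
housing_density = np.array([[2.0, 3.0, 4.0], [2.5, 3.5, 4.5]])
loan_concentration = np.array([[5.0, 7.5, 9.0], [6.5, 8.0, 11.0]])
intercept, slope, r2 = fit_line(housing_density, loan_concentration)
```

Evaluating `intercept + slope * housing_density` then produces a predicted Loan Concentration map that can be compared with the actual layer.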

Competition Analysis (Spatial Analysis Steps)
Step 1: Build travel time maps for entire market area
• Compute travel time from every location to our store
• This requires grid-based map analysis software
• Update customer record with travel time to our store
• Add this to every non-customer record in trading area
Step 2: Repeat for every competitor
• Update every customer record with travel time to competitor store
• Add to every non-customer record in trading area
Step 3: Compute Travel Time Gain for travel to main store
• Every customer and non-customer record is updated
• The greater gain indicates lower travel effort to visit our store
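Step 3 can be illustrated with a short sketch. The slide does not give a formula for Travel Time Gain, so this example assumes it is the travel time to the nearest competitor minus the travel time to our store, which makes a larger gain mean less effort to reach our store. The records and field names are hypothetical.

```python
def travel_time_gain(time_to_our_store, times_to_competitors):
    """Gain = fastest competitor travel time minus travel time to our store."""
    return min(times_to_competitors) - time_to_our_store

# Hypothetical customer and non-customer records with travel times in minutes
records = [
    {"id": "A", "our_store": 5.0, "competitors": [12.0, 9.0]},
    {"id": "B", "our_store": 15.0, "competitors": [6.0, 20.0]},
]
for record in records:
    record["gain"] = travel_time_gain(record["our_store"], record["competitors"])
```

Computing a separate gain against each competitor, rather than only the nearest one, would be an equally plausible reading of the slide.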

Predictive Modeling (Spatial Statistics Steps)
Step 4: Build analytic dataset from customer data
• Geocoding information
• Transactions, sales, product category purchases
• Visitation frequency, recency, spend
• Customer Segment, travel times, demographics
Step 5: Build predictive models
• Probability of Visitation (not possible for this demo)
• Probability of Purchase by Product Category
• Expected Sales and Transactions
• Use store travel time and all competitive differences
Step 6: Map the scores
• The distribution of the scores provides visual evidence of the effects of travel time and competitive pressure
• Spatial hypotheses can be tested and evaluated

Map Analysis Framework
Mapping and Geo-query
While discrete sets of points, lines and polygons have served our mapping demands for over 8,000 years and keep us from getting lost, the expression of mapped data as continuous spatial distributions (surfaces) provides a new foothold for the contextual and numerical analysis of mapped data: “Thinking with Maps”


Weighted Average Calculations for Inverse Distance Weighting (IDW) Spatial Interpolation Technique

Evaluating Interpolation Performance
Residual Analysis is used to evaluate interpolation performance (Krig at .03 Normalized Error is best).
Techniques compared: Average, IDW, Krig
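A residual analysis compares each technique's estimates with measured values at a set of test locations. The slide does not define Normalized Error, so the sketch below assumes it is the average absolute residual divided by the mean of the measured values. All numbers are made up, and no Krig estimates are included because kriging is not implemented here.

```python
def residual_summary(actual, estimated):
    """Summarise residuals (estimated - actual) at a set of test locations."""
    residuals = [e - a for a, e in zip(actual, estimated)]
    mean_abs_error = sum(abs(r) for r in residuals) / len(residuals)
    mean_actual = sum(actual) / len(actual)
    return {
        "residuals": residuals,
        "mean_abs_error": mean_abs_error,
        # Assumed definition: average absolute error relative to the mean actual value
        "normalized_error": mean_abs_error / mean_actual,
    }

actual = [10.0, 20.0, 30.0]                 # hypothetical test-set measurements
estimates = {
    "Average": [20.0, 20.0, 20.0],          # whole-map average used everywhere
    "IDW": [14.0, 20.0, 26.0],              # hypothetical interpolated estimates
}
summaries = {name: residual_summary(actual, est) for name, est in estimates.items()}
best = min(summaries, key=lambda name: summaries[name]["normalized_error"])
```

The technique with the smallest normalized error is preferred, which is the comparison the slide summarizes for Average, IDW and Krig.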