Workshop on GIS Modeling (Part 3)

Presentation transcript:

Workshop on GIS Modeling (Part 3)
Introduction to GIS Modeling, Week 9: Spatial Data Mining. GEOG 3110, University of Denver. Presented by Joseph K. Berry, W. M. Keck Scholar, Department of Geography, University of Denver.
Basic Descriptive Statistics and its GIS Expression: normalizing maps; mapping spatial dependency.
Linking Numeric and Geographic Patterns: map comparison; similarity maps; clustering mapped data; investigating map correlation; developing prediction models; assessing prediction results.
Joseph K. Berry, BA_SIS, Inc. All Rights Reserved.

Kicking for the Finish
Exercise #9: to tailor your work to your interests, you can choose not to complete this standard exercise; in lieu of the exercise you will submit a short paper (4-8 pages) on a GIS modeling topic of your choice. Due 12:00 midnight, Thursday, March 10th.
Optional Exercises: you can turn in these exercises for extra credit anytime before 5:00 pm, Tuesday, March 15th.
Final Exam Study Questions: covering weeks 7-10, Spatial Statistics and Future Directions; the exam is optional and can only improve your grade. You are encouraged to study together and exchange insights about answering the questions. At least three-fourths of the exam will be taken directly (verbatim) from the list of study questions. The format will be similar to the last exam, with questions from Terminology, Procedures and Basic Concepts, How Things Work and Mini-Exercises. Study questions for Exam 2 are posted on the class website now; send me an email if a question needs further explanation and I will post the clarification.
Final Exam: covers material from weeks 7 (GIS Modeling), 8 (Surface Modeling), 9 (Spatial Data Mining) and 10 (Future Directions). The exam will be posted on the class website by 10:00 am, Thursday, March 10th and must be completed by 8:00 am, Monday, March 14th.
At the end of the last class I will be handing out a CD with all of the class material, sort of a “graduation present” that will keep you GIS-ing for years. (Berry)

Concepts in GIS, Topic #1: An Analytic Framework for GIS Modeling
See www.innovativegis.com/basis/Download/IJRSpaper/
(Last week) Surface Modeling operations involve creating continuous spatial distributions from point-sampled data (univariate).
(This week) Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data (multivariate).

Basic Concepts in Statistics (Standard Normal Curve)
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space.

Basic Concepts in Statistics (SN_Curve Shape)
Kurtosis: shape (positive = peaked; negative = flat).
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space.

Basic Concepts in Statistics (SN_Curve Shape continued)
Multi-modal.
Skewness (positive = right; negative = left).
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space.

Preprocessing Mapped Data (Preprocessing Types 1-3)
Preprocessing involves conversion of raw data into consistent units that accurately represent mapped conditions (4 considerations):
1. Calibration: “tweaking” the values, sort of like a slight turn on a bathroom scale to alter the reading to what you know is your “true weight”.
2. Translation: converts map values into appropriate units for analysis, such as feet into meters or bushels per acre (measure of volume) into tons per hectare (measure of mass).
3. Adjustment/Correction: dramatically changes the data, such as post-processing GPS coordinates and/or Mass Flow Lag adjustment. Corrections shown: Antenna Offset; GPS Fix Delay; Overlap and Multiple Passes; Mass Flow Lag and Mixing.

Normalizing Mapped Data (4th type of preprocessing)
Normalization involves standardization of a data set, usually for comparison among different types of data:
  Goal:   Norm_GOAL = (mapValue / 250) * 100
  0-100:  Norm_0-100 = (((mapValue - min) * 100) / (max - min)) + 0
  SNV:    Norm_SNV = ((mapValue - mean) / stdev) * 100
Because normalization applies only scalar mathematics (constants), the shape of the numeric distribution (histogram) and the spatial pattern (map) do not change; the relative distributions stay the same.
Norm_GOAL = (Yield_Vol / 250) * 100 generates a standardized map based on a yield goal of 250 bushels/acre. This map can be used in analysis with other goal-normalized maps, even from different crops (an “apples and oranges to mixed fruit” scale).
Key Concept Note: the generalized equation for rescaling a data set to a fixed range of Rmin to Rmax is
  (((X - Dmin) * (Rmax - Rmin)) / (Dmax - Dmin)) + Rmin
where Rmin and Rmax are the minimum and maximum values of the rescaled range, Dmin and Dmax are the minimum and maximum values of the input data, and X is any value in the data set to be rescaled.
See Beyond Mapping III, Topic 18, Understanding Grid-based Data.
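The three normalizations and the generalized rescaling equation can be sketched in a few lines of Python. This is only an illustration: the function names and the sample yield values are hypothetical, and using the population standard deviation for the SNV is an assumption, since the slide does not say which form is intended.

```python
from statistics import mean, pstdev

def norm_goal(values, goal=250.0):
    """Percent of a goal value (250 bushels/acre on the slide)."""
    return [v / goal * 100 for v in values]

def rescale(values, rmin=0.0, rmax=100.0):
    """Generalized rescaling of [Dmin, Dmax] onto [Rmin, Rmax]."""
    dmin, dmax = min(values), max(values)
    return [(v - dmin) * (rmax - rmin) / (dmax - dmin) + rmin for v in values]

def norm_snv(values):
    """Standard Normal Variable times 100, as written on the slide."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    return [(v - m) / s * 100 for v in values]

# Hypothetical yield values (bushels/acre) for a handful of grid cells.
yield_vol = [100.0, 150.0, 200.0, 250.0]

goal_map = norm_goal(yield_vol)
range_map = rescale(yield_vol)      # the 0-100 case: Rmin = 0, Rmax = 100
snv_map = norm_snv(yield_vol)
```

Each function applies the same constants to every cell, so the rank order of the cells is unchanged in all three outputs.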

Assessing Localized Variation
Question 1: Visual Map Analysis (spatial and numeric distributions)
The “Scan” operation moves a window around the yield map and calculates the Coefficient of Variation within a 2-cell radius of each location:
  Scan Yield_Volume Coffvar Within 2 For Yield_Coffvar
where Coffvar = (StDev / Mean) * 100. Higher values indicate areas with more localized variability.
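A roving-window Coefficient of Variation can be sketched as follows. This is not the Scan command itself: the circular window shape, the handling of cells near the map edge (the window is simply clipped), and the use of the population standard deviation are all assumptions, and `scan_coffvar` is a hypothetical name.

```python
from statistics import mean, pstdev

def scan_coffvar(grid, radius=2):
    """Coefficient of Variation (StDev / Mean * 100) in a window around each cell."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the cells within `radius` of (r, c), clipped at the map edge.
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if (rr - r) ** 2 + (cc - c) ** 2 <= radius ** 2
            ]
            out[r][c] = pstdev(window) / mean(window) * 100
    return out

# Hypothetical yield grid with one unusual cell in the middle.
yield_grid = [[10, 10, 10],
              [10, 40, 10],
              [10, 10, 10]]
coffvar_map = scan_coffvar(yield_grid, radius=1)
```

Cells whose window contains only identical values get a Coffvar of zero; the window around the unusual cell gets a positive value.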

Data Proximity/Buffer Stratification
A Proximity map identifies the distance from point, line or polygon features to all other locations. Example: proximity to the field edge separates edge effects from the interior “Sweet Spot”.
Stratification partitions the data (numeric) or the project area (spatial) into logical groups. Example: on the Yield map, unusually high yield is defined as > Average + 1 StDev, and proximity to those high-yield areas (Far : Close) identifies the “High Yield” vicinity.

Summarizing Map Regions (template/data)
Creates a map summarizing values from a data map (Phosphorus levels) that coincide with the categories of a template map (Soil Types).
Average phosphorus level for each soil type:
  Soil Type   Ve     VdC    BIB    BIA    TuC    HvB
  Pavg        15.0   12.8   11.2   14.6   10.5   11.3
Average P-level for each soil unit (clump first before COMPOSITE): overall BIA Pavg = 14.6; values labeled on the map: 15.5, 13.6, 8.6.
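The template/data summary amounts to averaging the data map within each category of the template map. A minimal sketch, assuming both maps are aligned grids stored as nested lists; `composite_average` and the tiny sample grids are hypothetical and are not the slide's data.

```python
from collections import defaultdict

def composite_average(template, data):
    """Average of the data-map values falling in each template-map category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for template_row, data_row in zip(template, data):
        for zone, value in zip(template_row, data_row):
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}

# Hypothetical 2x2 maps: soil types (template) and phosphorus levels (data).
soil = [["Ve", "Ve"],
        ["BIA", "BIA"]]
phosphorus = [[14, 16],
              [10, 12]]
p_avg = composite_average(soil, phosphorus)
```

To summarize individual soil units rather than soil types, each contiguous unit would first need its own identifier in the template map, which is the role of the clump step mentioned above.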

Comparing Discrete Maps (Multivariate analysis)
Thematic Categorization (High, Medium, Low): we often represent continuous spatial data (map surfaces) as a set of discrete polygons. Which classified map is correct? How similar are the three maps?
Spatial Precision (Where: boundaries) of Points, Lines and Areas (polygons) is a primary concern of GIS, but we are often less concerned with Thematic Accuracy (What: map values).
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning.


Comparing Discrete Maps
Two ways to compare Discrete Maps: Coincidence Summary and Proximal Alignment.
Coincidence Summary generates a cross-tabular listing of the intersection of two maps.
Table Interpretation: Diagonal (Same); Off-diagonal (Above/Below); Percentages (% Same); Overall Percentage. Raster versus Vector.
Question 2
Map1 versus Map2:
  Overall: 631 + 297 + 693 = 1621; (1621/1950) * 100 = 83 percent matched
  Map2 Med: 104 + 297 + 225 = 626; (297/626) * 100 = 47 percent matched
Map1 versus Map3:
  Overall: 475 + 297 + 563 = 1335; (1335/1950) * 100 = 68 percent matched
  Map3 Med: 260 + 297 + 335 = 912; (297/912) * 100 = 36 percent matched
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning.
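A coincidence summary is a cross-tabulation of two aligned categorical maps, and the overall percentage is the share of cells on the diagonal. The sketch below assumes both maps are nested lists with the same shape; the function names and the 2x2 sample maps are hypothetical.

```python
from collections import Counter

def coincidence_table(map_a, map_b):
    """Count how often each (category on A, category on B) pair occurs."""
    return Counter(
        (a, b)
        for row_a, row_b in zip(map_a, map_b)
        for a, b in zip(row_a, row_b)
    )

def percent_same(table):
    """Overall coincidence: diagonal (same-category) cells as a percent of all cells."""
    total = sum(table.values())
    same = sum(n for (a, b), n in table.items() if a == b)
    return same / total * 100

# Hypothetical maps with Low/Med/High categories.
map_a = [["L", "M"], ["H", "H"]]
map_b = [["L", "H"], ["H", "M"]]
table = coincidence_table(map_a, map_b)
agreement = percent_same(table)

# The Map1 versus Map2 overall figure uses the same arithmetic:
overall_map1_map2 = (631 + 297 + 693) / 1950 * 100   # about 83 percent
```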

Coincidence Table (idealized conditions)
Diagonal elements in the map comparison matrix identify agreement (matches) between two progressive ordinal maps; off-diagonal elements identify disagreement (mismatches).
PERFECT COINCIDENCE: all of the increasing ordinal steps (Low, Med, High) are matched (diagonal) and there are no mismatches (off-diagonal). Each category matches 208/208 = 100%. Overall coincidence is 624/624 = 100%, found from the sum of the diagonal elements (matches); the other totals indicate percent agreement by category on each map.
EQUALLY BALANCED: matches and mismatches are evenly spread (cell counts of 69 and 70), so there is no pattern relationship. Each category matches 69/208 = 33%. Overall coincidence is 208/624 = 33%, partially matched and mismatched.
ALL MISMATCHES: there is an opposite relationship (off-diagonal cell counts of 104). Each category matches 0/208 = 0%. Overall coincidence is 0/624 = 0%.

Comparing Discrete Maps
Two ways to compare Discrete Maps: Coincidence Summary and Proximal Alignment.
Proximal Alignment isolates a category on one of the maps, generates its proximity, then identifies the proximity values that align with the same category on the other map:
  Proximity_Map1_Category1 * Binary_Map3_Category1
Non-zero values identify changes and how far away they are.
Table Interpretation: Zeros (Agreement); Values (> Disagreement); PA Index (average).
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning.

Comparing Map Surfaces (Statistical Tests)
Three ways to compare Map Surfaces: Statistical Tests, Percent Difference and Surface Configuration.
Statistical Tests compare one set of cell values to another based on the differences in the distributions of the data. Two choices are involved: 1) the data sets (partition or coincidence; continuous or sampled) and 2) the statistical procedure (t-test, F-test, etc.). Box-and-whisker graphs are shown.
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning.

Comparing Map Surfaces (%Difference)
Three ways to compare Map Surfaces: Statistical Tests, Percent Difference and Surface Configuration.
Percent Difference capitalizes on the spatial arrangement of the values by comparing the values at each map location: %Difference Map, %Difference Table.
Question 3
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning.
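A cell-by-cell percent difference might be computed as below. The slide does not give the formula, so dividing by the first map's value is an assumption, as is returning `None` where that value is zero; `percent_difference` is a hypothetical name.

```python
def percent_difference(map_a, map_b):
    """Percent change from map_a to map_b at each location (assumed: relative to map_a)."""
    return [
        [(b - a) * 100 / a if a != 0 else None for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]

# Hypothetical surfaces covering the same two cells.
surface_1 = [[100, 200]]
surface_2 = [[110, 150]]
pct_diff_map = percent_difference(surface_1, surface_2)
```

A %Difference table could then be built by counting how many cells fall into each range of the resulting values.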

Comparing Map Surfaces (Surface Configuration)
Three ways to compare Map Surfaces: Statistical Tests, Percent Difference and Surface Configuration.
Surface Configuration capitalizes on the spatial arrangement of the values by comparing the localized trend in the values: Slope Map, Aspect Map, Surface Configuration Index.
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning.

Comparing Map Surfaces (Temporal Difference)
1997_Yield_Volume - 1998_Yield_Volume = Yield_Diff
Map Variables: map values within an analysis grid can be mathematically and statistically analyzed. Green indicates areas of increased production; yellow indicates minimal change; red indicates decreased production.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Data Analysis (establishing relationships)
On-farm studies, such as seed hybrid performance, can be conducted using actual farm conditions. Management action recommendations are based on local relationships instead of Experiment Station research hundreds of miles away. This approach is radically changing research and management practices in agriculture and numerous other fields, from business to epidemiology and natural resources.

Spatial Dependency
Spatial Variable Dependence: what occurs at a location in geographic space is related to
  the conditions of that variable at nearby locations, termed Spatial Autocorrelation (intra-variable dependence);
  the conditions of other variables at that location, termed Spatial Correlation (inter-variable dependence).
Map Stack: relationships among maps are investigated by aligning grid maps with a common configuration (number of columns/rows, cell size and geo-reference).
Data Shishkebab: each map represents a variable, each grid space a case and each value a measurement, with all of the rights, privileges and responsibilities of non-spatial mathematical, numerical and statistical analysis.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Visualizing Spatial Relationships
Interpolated Spatial Distribution: Phosphorus (P).
What spatial relationships do you see? Do relatively high levels of P often occur with high levels of K and N? How often? Where?
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Identifying Unusually High Measurements
Isolate areas with values above mean + 1 StDev (tail of the normal curve).
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Level Slicing
Simply multiply the two binary maps to identify joint coincidence: 1 * 1 = 1 marks coincidence (any 0 results in zero).
Question 4
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.
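Level slicing can be sketched in two steps: turn each map into a binary map of unusually high cells, then multiply the binary maps. The cutoff of mean + 1 standard deviation follows the preceding slide; the population standard deviation, the function names and the sample grids are assumptions for illustration.

```python
from statistics import mean, pstdev

def high_mask(grid):
    """1 where a cell exceeds mean + 1 StDev of the whole map, else 0."""
    flat = [v for row in grid for v in row]
    cutoff = mean(flat) + pstdev(flat)
    return [[1 if v > cutoff else 0 for v in row] for row in grid]

def level_slice(mask_a, mask_b):
    """Multiply two binary maps: 1 only where both are 1."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

# Hypothetical P and K surfaces on a 3x3 grid.
p_map = [[1, 1, 1],
         [1, 1, 9],
         [1, 9, 9]]
k_map = [[9, 1, 1],
         [1, 1, 9],
         [1, 1, 9]]
joint_high = level_slice(high_mask(p_map), high_mask(k_map))
```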

Multivariate Data Space
The sum of a binary progression (1, 2, 4, 8, 16, etc.) provides level-slice solutions for many map layers.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.
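The binary progression works because each layer's binary map is weighted by a different power of two, so every combination of layers produces a unique sum. A minimal sketch, with hypothetical names and sample masks:

```python
def combine_slices(masks):
    """Weight binary maps by 1, 2, 4, ... and sum them cell by cell."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(mask[r][c] << i for i, mask in enumerate(masks)) for c in range(cols)]
        for r in range(rows)
    ]

# Hypothetical binary maps of "unusually high" cells for P, K and N.
p_high = [[1, 0]]
k_high = [[1, 1]]
n_high = [[0, 1]]
codes = combine_slices([p_high, k_high, n_high])
```

With weights of 1 for P, 2 for K and 4 for N, a code of 3 means high P and K only, 6 means high K and N only, and 7 would mean all three are high.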

Calculating Data Distance
An n-dimensional plot depicts the multivariate distribution; the distance between points determines the relative similarity in data patterns. The farthest floating ball is the least similar (largest data distance) from the comparison point.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Identifying Map Similarity
Question 5
The relative data distances between the comparison point’s data pattern and those of all other map locations form a Similarity Index. The green tones indicate field locations with fairly similar P, K and N levels; red tones indicate dissimilar areas.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.
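Data distance and a similarity index can be sketched as follows. Euclidean distance in (P, K, N) space is the usual reading of “data distance”; scaling it so that the farthest location scores 0 and an identical location scores 100 is an assumption, since the slides do not give the exact index formula. The names and sample values are hypothetical.

```python
from math import dist

def similarity_index(cells, comparison):
    """Similarity of each cell's data pattern to the comparison point (100 = identical)."""
    distances = [dist(cell, comparison) for cell in cells]
    farthest = max(distances)
    if farthest == 0:                     # every cell matches the comparison point
        return [100.0] * len(cells)
    return [(1 - d / farthest) * 100 for d in distances]

# Hypothetical (P, K, N) values for three grid cells; the first is the comparison point.
cells = [(10, 10, 10), (13, 14, 10), (10, 10, 20)]
similarity = similarity_index(cells, comparison=(10, 10, 10))
```

In practice the layers would usually be normalized first so that a variable with a large numeric range does not dominate the distance.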

Clustering Maps for Data Zones
Question 6
Groups of “floating balls” in data space identify locations in the field with similar data patterns: data zones. A map stack is a spatially organized set of numbers. Fertilization rates vary for the different clusters “on-the-fly” (Variable Rate Application; Cyber-Farmer, Circa 1992).
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Assessing Clustering Results
Clustering results can be roughly evaluated using basic statistics: the Average, Standard Deviation, Minimum and Maximum values within each cluster are calculated. Ideally the averages of the two clusters would be radically different and the standard deviations small (large differences between groups and small differences within groups).
Other checks: standard statistical tests of the two data sets; box-and-whisker plots to visualize differences.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

How Clustering Works (IsoData algorithm)
The scatter plot shows Height versus Weight data that might have been collected in your old geometry class.
1. The data distance from each weight/height measurement pair to each arbitrary cluster center is calculated, and the point is assigned to the closest center.
2. The average X,Y coordinates of the assigned students are calculated and used to reposition the cluster centers.
3. Repeat the data distances, cluster assignments and repositioning until there is no change in cluster membership (centers do not move).
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space.
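The assign-and-reposition loop described above can be sketched as follows. This covers only the loop on the slide; full ISODATA implementations also split and merge clusters, which is not attempted here. The function name, the iteration cap and the height/weight values are hypothetical.

```python
from math import dist

def cluster(points, initial_centers, max_iter=100):
    """Assign points to the nearest center, reposition centers, repeat until stable."""
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its closest cluster center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:   # membership unchanged: centers will not move
            break
        assignment = new_assignment
        # Move each center to the average coordinates of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centers

# Hypothetical (height in inches, weight in pounds) pairs and two arbitrary starting centers.
students = [(60, 100), (62, 110), (61, 105), (72, 180), (74, 190), (73, 185)]
membership, final_centers = cluster(students, [(60, 100), (74, 190)])
```

For mapped data the points would be the cell values pulled from the map stack, and the resulting membership list would be written back to the grid as a map of data zones.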

Spatial Data Mining (The Big Picture)
Making sense out of a map stack: mapped data that exhibits high spatial dependency creates strong prediction functions. As in traditional statistical analysis, spatial relationships can be used to predict outcomes; the difference is that spatial statistics predicts where responses will be high or low.
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships.

Concepts in GIS, Topic #1: An Analytic Framework for GIS Modeling
This week: Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data.
See www.innovativegis.com/basis/Download/IJRSpaper/

Regression (conceptual approach)
A line is “fitted” in data space that balances the data so that the squared differences from the points to the line (residuals) are minimized over all the points and the sum of the differences is zero. The equation of the regression line is used to predict the “Dependent” variable (Y axis) using one or more “Independent” variables (X axis).

Evaluating Prediction Maps (non-spatial)
Non-spatial: the R-squared value looks at the deviations from the regression line, that is, the data patterns about the regression line.
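For a single independent variable, the fitted line and its R-squared value can be computed directly with ordinary least squares. The sketch below uses hypothetical function names and made-up data; it is meant only to show what the two quantities are.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Share of the variation in y explained by the line (1 = perfect, 0 = none)."""
    my = mean(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical independent (x) and dependent (y) values for four locations.
x = [1, 2, 3, 4]
y = [2, 4, 5, 9]
slope, intercept = fit_line(x, y)
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
fit_quality = r_squared(x, y, slope, intercept)
```

The residuals of a least-squares line with an intercept always sum to zero, which is the balancing property of the fitted line.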

Map Variables
The Dependent Map variable is the one that you want to predict (derived from customer data) from a set of existing or easily measured Independent Map variables.
Question 7
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business.

Map Regression Results (Bivariate)
Scatter plots and regression equations relating Loan Density to three candidate driving variables (Housing Density, Value and Age):
  Loans = fn(Housing Density)
  Loans = fn(Home Value)
  Loans = fn(Home Age)
Question 8
The “R-squared index” provides a general measure of how good the predictions ought to be: 40% and 46% indicate moderately weak predictors; 23% indicates a very weak predictor. (An R-squared index of 100% indicates a perfect predictor; 0% indicates an equation with no predictive capability.)
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business.

Generating a Multivariate Regression
Multiple linear regression on all three independent map variables produces a single regression equation, which is used to generate a prediction map.
Question 9
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business.

Evaluating Regression Results (multiple linear)
Optional Question 9-1
The prediction map generated by the multiple linear regression on all three independent map variables is compared to the actual dependent variable data, producing an Error Surface.
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business.
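Once a multiple regression equation is available, the prediction map and the error surface are simple cell-by-cell calculations. The sketch below assumes the coefficients have already been fitted elsewhere; the intercept, coefficients, grids and function names are all hypothetical, and defining error as predicted minus actual is an assumption.

```python
def predict_map(layers, coefficients, intercept):
    """Apply intercept + sum(coefficient * layer value) at every grid cell."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [intercept + sum(c * layer[r][k] for c, layer in zip(coefficients, layers))
         for k in range(cols)]
        for r in range(rows)
    ]

def error_surface(predicted, actual):
    """Predicted minus actual at every cell (positive = over-prediction)."""
    return [[p - a for p, a in zip(row_p, row_a)]
            for row_p, row_a in zip(predicted, actual)]

# Hypothetical independent map layers and already-fitted equation.
housing_density = [[1, 2]]
home_value = [[3, 4]]
home_age = [[5, 6]]
predicted_loans = predict_map([housing_density, home_value, home_age],
                              coefficients=(2, 3, -1), intercept=1)
actual_loans = [[6, 12]]
errors = error_surface(predicted_loans, actual_loans)
```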

Using the Error Map to Stratify
One way to improve the predictions is to stratify the data set by breaking it into groups of similar characteristics and then generating a separate regression for each of the stratified areas (red, yellow and green). Other stratification techniques include indigenous knowledge, level-slicing and clustering.
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business.


Prescriptive Mapping
Four primary types of applied spatial models:
  Suitability: mapping preferences (e.g., Habitat and Routing)
  Economic: mapping financial interactions (e.g., Combat Zone and Sales Propensity)
  Physical: mapping landscape interactions (e.g., Terrain Analysis and Sediment Loading)
  Mathematical/Statistical: mapping numerical relationships
    Descriptive math/stat models summarize existing mapped data (e.g., Standard Normal Variable Map for Unusual Conditions and Clustering for Data Zones)
    Predictive math/stat models develop equations relating mapped data (e.g., Map Regression for Equity Loan Prediction and Probability of Product Sales)
    Prescriptive math/stat models identify management actions based on descriptive/predictive relationships (e.g., Retail Marketing and Precision Ag)
Example: Phosphorus (P)
Discrete Actions: If <condition(s)> Then <Action(s)>
  If P is 0-4 ppm, then apply 50 lbs P2O5/Acre
  If P is 4-8 ppm, then apply 30 lbs P2O5/Acre
  If P is 8-12 ppm, then apply 15 lbs P2O5/Acre
  If P is >12 ppm, then apply 0 lbs P2O5/Acre
Continuous Actions: Equation defining action(s)
  Negative linear equation of the form: y = aX
  Negative exponential equation of the form: y = e^(-x)
[Graphs: P2O5 application rate (50, 30, 15) plotted against increasing P (up to 12).]
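The discrete rules translate directly into a lookup function, and a continuous action replaces the steps with an equation. In the sketch below, treating each class boundary as belonging to the lower class is an assumption (it is consistent with the “>12” rule), and the linear form, which falls from 50 lbs at P = 0 to 0 lbs at P = 12, is a hypothetical reading of the graph rather than the slide's equation.

```python
def discrete_rate(p_ppm):
    """P2O5 rate (lbs/acre) from the If/Then rules; boundaries go to the lower class."""
    if p_ppm <= 4:
        return 50
    if p_ppm <= 8:
        return 30
    if p_ppm <= 12:
        return 15
    return 0

def continuous_rate(p_ppm, max_rate=50.0, p_cutoff=12.0):
    """Assumed negative linear action: max_rate at P = 0, falling to 0 at p_cutoff."""
    return max(0.0, max_rate * (1 - p_ppm / p_cutoff))

# Hypothetical phosphorus map (ppm) converted to a prescription map.
p_map = [[2, 6], [12, 13]]
prescription = [[discrete_rate(p) for p in row] for row in p_map]
```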

Grid-Based Map Analysis
Spatial analysis investigates the “contextual” relationships in mapped data:
  Reclassify: reassigning map values (position; value; size, shape; contiguity)
  Overlay: map overlay (point-by-point; region-wide)
  Distance: proximity and connectivity (movement; optimal paths; visibility)
  Neighbors: “roving windows” (slope/aspect; diversity; anomaly)
Surface modeling maps the spatial distribution and pattern of point data:
  Density Analysis: count/sum of points within a local window
  Spatial Interpolation: weighted average of points within a local window
  Map Generalization: fits a mathematical relationship to all of the point data
Spatial data mining investigates the “numerical” relationships in mapped data:
  Descriptive: summary statistics, comparison, classification (e.g., clustering)
  Predictive: math/stat relationships among map layers (e.g., regression)
  Prescriptive: appropriate actions (e.g., optimization)
...Whew!!! (Berry)