Introduction to GIS Modeling
Week 9: Spatial Data Mining
GEOG 3110, University of Denver
Presented by Joseph K. Berry, W. M. Keck Scholar, Department of Geography, University of Denver
Basic Descriptive Statistics and its GIS Expression: Normalizing maps; Mapping spatial dependency
Linking Numeric and Geographic Patterns: Map comparison; Similarity maps; Clustering mapped data; Investigating map correlation; Developing prediction models; Assessing prediction results

Kicking at the Finish (Waning Class Moments)
The last of the “Learning Opportunities” that remain are…
Exercise #9 on Spatial Data Mining (or paper) for 50 points
Exam #2 on Surface Modeling, Spatial Data Mining and Future Directions material for 150 points
Optional Exercises for up to 50 extra credit points (can only improve your grade)
Special, special offer: provided you fully participate in the study question “group study”, you can choose not to take the second exam.
Fine print: I will simply allocate the points for the exam according to the current percentage of all of your graded materials, which means not taking the exam has no effect on your grade. If you choose to take the exam and get a grade below your current percentage of all graded materials, the exam grade will be ignored …therefore taking the exam can only improve your grade.
2nd Exam Study Questions …posted Monday 2/27 by 12:00 noon. Class initiative to “group study” to collectively address the 24 study questions (complete by 5:00 pm Thursday March 8)
Midterm Exam …you will download and take the 2-hour exam online (honor system) sometime between 10:00 am Friday March 9 and 5:00 pm Tuesday March 13

Spatial Statistics Operations: Numerical Context (Berry)
Map Analysis Toolbox; Grid Map Layers; GIS and Map-ematical Perspectives (SS)
Statistical Perspective:
Basic Descriptive Statistics (Min, Max, Median, Mean, StDev, etc.)
Basic Classification (Reclassify, Binary/Ranking/Rating Suitability)
Unique Map Descriptive Statistics (Roving Window Summaries)
Map Comparison (Joint Coincidence, Statistical Tests)
Surface Modeling (Density Analysis, Spatial Interpolation)
Advanced Classification (Map Similarity, Maximum Likelihood, Clustering)
Predictive Statistics (Map Correlation/Regression, Data Mining Engines)
GIS Perspective:
Surface Modeling (Density Analysis, Spatial Interpolation, Map Generalization)
Spatial Data Mining (Descriptive, Predictive, Prescriptive)

Basic Concepts in Statistics (Standard Normal Curve) (Berry)
… “sorting/counting” individual data values
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space

Basic Concepts in Statistics (SN_Curve Shape) (Berry)
Kurtosis …shape (positive= peaked; negative= flat)
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space

Basic Concepts in Statistics (SN_Curve Shape continued) (Berry)
…multi-modal
…Skewness (positive= right; negative= left)
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space

Linking Numeric & Geographic Distributions (Berry)
…a Histogram depicts the numeric distribution (Mean/Central Tendency focus)
…a Map depicts the geographic distribution (Variance/Variability focus)
…Data Values link the two views: click anywhere on the Map and the Histogram interval is highlighted; click on the Histogram interval and the Map locations are highlighted
…simply different ways to organize and analyze “mapped data” (x,y= Where and z= What)
(See Beyond Mapping III, “Topic 7” for more information)

An Analytic Framework for GIS Modeling (Berry)
(Last week) Surface Modeling operations involve creating continuous spatial distributions from point sampled data (univariate).
(This week) Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data (multivariate).
See www.innovativegis.com/basis/Download/IJRSpaper/

Preprocessing Mapped Data (Preprocessing Types 1-3) (Berry)
Preprocessing involves conversion of raw data into consistent values that accurately represent mapped conditions (4 types of preprocessing)
1. Calibration: “tweaking” the values… sort of like a slight turn on a bathroom scale to alter the reading to what you know is your ‘true weight’
2. Translation: converts map values into appropriate units for analysis, such as feet into meters or bushels per acre (measure of volume) into tons per hectare (measure of mass)
3. Adjustment/Correction: dramatically changes the data, such as post processing GPS coordinates and/or Mass Flow Lag adjustment
Figure labels: Antenna Offset; GPS Fix Delay; Overlap and Multiple Passes; Mass Flow Lag and Mixing
… “trolling” for data

Normalizing Mapped Data (4th type of preprocessing) (Berry)
Normalization involves standardization of a data set, usually for comparison among different types of data…
Goal … Norm_GOAL = (mapValue / 250) * 100
0-100 … Norm_0-100 = ((mapValue - min) * 100) / (max - min) + 0
SNV … Norm_SNV = ((mapValue - mean) / stdev) * 100
Key Concept
Note: the generalized rescaling equation to normalize a data set to a fixed range of Rmin to Rmax is…
Rescaled value = (((X - Dmin) * (Rmax - Rmin)) / (Dmax - Dmin)) + Rmin
…where Rmin and Rmax are the minimum and maximum values for the rescaled range, Dmin and Dmax are the minimum and maximum values for the input data, and X is any value in the data set to be rescaled.
Since normalization involves scalar mathematics (constants), the pattern of the numeric distribution (histogram) and the spatial distribution (map) do not change …same relative distributions
Norm_GOAL = (Yield_Vol / 250) * 100 …generates a standardized map based on a yield goal of 250 bushels/acre. This map can be used in analysis with other goal-normalized maps, even from different crops: “apples and oranges to mixed fruit scale”
See Beyond Mapping III, Topic 18, Understanding Grid-based Data
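
The three normalization equations and the generalized rescaling equation can be sketched in a few lines of Python. This is only an illustration: the function names and the sample yield values are hypothetical, a map is flattened to a plain list of cell values, and the population standard deviation is assumed for SNV because the slide does not say which form is used.

```python
from statistics import mean, pstdev

def normalize_goal(values, goal=250.0):
    """Norm_GOAL: each value as a percent of a goal (250 bushels/acre on the slide)."""
    return [(v / goal) * 100.0 for v in values]

def rescale(values, r_min=0.0, r_max=100.0):
    """Generalized rescaling of the data range [Dmin, Dmax] onto [Rmin, Rmax]."""
    d_min, d_max = min(values), max(values)
    if d_max == d_min:
        raise ValueError("cannot rescale a constant map")
    return [((v - d_min) * (r_max - r_min)) / (d_max - d_min) + r_min for v in values]

def normalize_snv(values):
    """Norm_SNV: standard normal variable times 100, as written on the slide."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("standard deviation is zero")
    return [((v - m) / s) * 100.0 for v in values]

# Hypothetical yield values (bushels/acre) standing in for the Yield_Vol map.
yield_vol = [100.0, 150.0, 200.0, 250.0, 300.0]
goal_map = normalize_goal(yield_vol)
range_map = rescale(yield_vol)          # Norm_0-100 is rescale() with Rmin=0, Rmax=100
snv_map = normalize_snv(yield_vol)
```

Because each function only applies constants to every cell, the rank order of the cells is the same in all three outputs, which is the "same relative distributions" point made on the slide.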

Proximity Stratification (Berry)
…Proximity map identifies the distance from point, line or polygon features to all other locations
…Stratification partitions the data (numeric) or the project area (spatial) into logical groups
…Yield map …unusually high yield > Average + 1 StDev …proximity to high yield: “High Yield” vicinity (Far : Close)
…proximity to field edge: Edge effects; “Sweet Spot” (interior)

Summarizing Map Regions (template/data) (Berry)
…creates a map summarizing values from a data map (Phosphorus levels) that coincide with the categories of a template map (Soil types) or stratification partitioning
…average phosphorus level for each soil type
Soil Type   Ve    VdC   BIB   BIA   TuC   HvB
Pavg        15.0  12.8  11.2  14.6  10.5  11.3
…average P-level for each soil unit (clump first before COMPOSITE)
Overall BIA Pavg = 14.6; individual BIA clumps = 15.5, 13.6, 8.6
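
A region-wide summary of this kind can be sketched as follows. The function name `region_average` and the tiny grids are hypothetical; the idea is simply to average the data-map values that fall in each template category and write that average back to every cell of the category.

```python
from collections import defaultdict

def region_average(template, data):
    """Average the data values coinciding with each template category.

    Returns (summary_map, averages): summary_map has the same shape as the
    template, with each cell holding the average for its category.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t_row, d_row in zip(template, data):
        for zone, value in zip(t_row, d_row):
            sums[zone] += value
            counts[zone] += 1
    averages = {zone: sums[zone] / counts[zone] for zone in sums}
    summary_map = [[averages[zone] for zone in t_row] for t_row in template]
    return summary_map, averages

# Hypothetical 2x3 grids: soil type template and phosphorus data map.
soil = [["Ve", "Ve", "BIA"],
        ["Ve", "BIA", "BIA"]]
phosphorus = [[14.0, 16.0, 10.0],
              [15.0, 12.0, 8.0]]
summary_map, averages = region_average(soil, phosphorus)
```

To get an average per individual soil unit rather than per soil type, the template would first need a unique ID for each contiguous clump (the "clump first" step on the slide); that step is not shown here.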

Data Analysis (establishing relationships) (Berry)
On-farm studies, such as seed hybrid performance, can be conducted using actual farm conditions…
…management action recommendations are based on local relationships instead of Experiment Station research hundreds of miles away
…is radically changing research and management practices in agriculture and numerous other fields from business to epidemiology and natural resources

Comparing Discrete Maps (Multivariate analysis) (Berry)
Spatial Precision (Where: boundaries) of Points, Lines and Areas (polygons) is a primary concern of GIS, but we are often less concerned with Thematic Accuracy (What: map values)
Thematic Categorization …we often represent continuous spatial data (map surfaces) as a set of discrete polygons (High, Medium, Low)
Which classified map is correct? How similar are the three maps?
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

Comparing Discrete Maps (Berry)
Two ways to compare Discrete Maps… Coincidence Summary; Proximal Alignment
…Coincidence Summary generates a cross-tabular listing of the intersection of two maps.
Table Interpretation: Diagonal (Same); Off-diagonal (Above/Below); Percentages (% Same)
Overall Percentage: ((631+297+693)/1950)*100 = 83%; ((475+297+563)/1950)*100 = 68%
Raster versus Vector
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

Comparing Discrete Maps (Coincidence Summary) (Berry)
Two ways to compare Discrete Maps… Coincidence Summary; Proximal Alignment
…Coincidence Summary generates a cross-tabular listing of the intersection of two maps (Map1 compared with Map2 and with Map3).
Table Interpretation: Diagonal (Same); Off-diagonal (Above/Below); Percentages (% Same)
Overall Percentage …helpful in answering Question 2
631 + 297 + 693 = 1621; (1621/1950)*100 = 83 percent matched
475 + 297 + 563 = 1335; (1335/1950)*100 = 68 percent matched
Map2: Med-- 104 + 297 + 225 = 626; (297/626)*100 = 47 percent matched
Map3: Med-- 260 + 297 + 335 = 912; (297/912)*100 = 33 percent matched
Raster versus Vector
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning
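
A coincidence summary can be sketched as a cross-tabulation of two classified grids, with the diagonal giving the cells that match. The function names and the small grids below are hypothetical; the last lines simply re-check the slide's own arithmetic.

```python
from collections import Counter

def coincidence_summary(map_a, map_b):
    """Cross-tabulate two classified grids: (category_a, category_b) -> cell count."""
    table = Counter()
    for row_a, row_b in zip(map_a, map_b):
        for a, b in zip(row_a, row_b):
            table[(a, b)] += 1
    return table

def percent_same(table):
    """Overall percentage: diagonal (same category) cells over all cells."""
    total = sum(table.values())
    same = sum(count for (a, b), count in table.items() if a == b)
    return same / total * 100.0

# Hypothetical 2x3 classified maps.
map1 = [["Low", "Low", "Med"],
        ["Med", "High", "High"]]
map2 = [["Low", "Med", "Med"],
        ["Med", "High", "Low"]]
table = coincidence_summary(map1, map2)
overall = percent_same(table)  # 4 of 6 cells agree

# Re-checking the percentages quoted on the slide.
slide_overall = [round((631 + 297 + 693) / 1950 * 100),
                 round((475 + 297 + 563) / 1950 * 100)]
slide_medium = [round(297 / 626 * 100), round(297 / 912 * 100)]
```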

Comparing Discrete Maps (Proximal Alignment) (Berry)
Two ways to compare Discrete Maps… Coincidence Summary; Proximal Alignment
…Proximal Alignment isolates a category on one of the maps, generates its proximity, then identifies the proximity values that align with the same category on the other map.
Proximity_Map1_Category1 * Binary_Map3_Category1 …non-zero values identify changes and how far away
Table Interpretation: Zeros (Agreement); Values (> Disagreement); PA Index (average)
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning
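
The proximity-times-binary step can be sketched with a brute-force distance calculation. Several things here are assumptions rather than the slide's definition: distances are straight-line distances in cell units, the function names are hypothetical, and the PA Index is taken to be the average aligned distance over the cells holding the category on the second map.

```python
from math import hypot

def proximity(grid, category):
    """Distance (in cells) from every location to the nearest cell of the category."""
    targets = [(r, c) for r, row in enumerate(grid)
               for c, value in enumerate(row) if value == category]
    if not targets:
        raise ValueError("category not found on the map")
    return [[min(hypot(r - tr, c - tc) for tr, tc in targets)
             for c in range(len(row))]
            for r, row in enumerate(grid)]

def proximal_alignment(map_a, map_b, category):
    """Proximity of the category on map_a times the binary category mask of map_b."""
    prox = proximity(map_a, category)
    binary = [[1 if value == category else 0 for value in row] for row in map_b]
    aligned = [[p * b for p, b in zip(p_row, b_row)]
               for p_row, b_row in zip(prox, binary)]
    n_cells = sum(sum(b_row) for b_row in binary)
    # Assumed PA Index: average aligned distance over map_b cells of the category.
    pa_index = sum(sum(a_row) for a_row in aligned) / n_cells if n_cells else None
    return aligned, pa_index

# Hypothetical one-row maps: category 1 extends one cell farther on the second map.
map_first = [[1, 1, 2, 2]]
map_second = [[1, 1, 1, 2]]
aligned, pa_index = proximal_alignment(map_first, map_second, 1)
```

Zeros in `aligned` mark agreement (or cells outside the category); the single non-zero value shows a changed cell one cell away from the original category boundary.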

Comparing Map Surfaces (Statistical Tests) (Berry)
Three ways to compare Map Surfaces… Statistical Tests; Percent Difference; Surface Configuration
…must be quantitative isopleth data
…Statistical Tests compare one set of cell values to that of another based on the differences in the distributions of the data:
1) data sets (partition or coincidence; continuous or sampled)
2) statistical procedure (t-Test, F-Test, etc.)
Box-and-whisker graphs; Table 1
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

Comparing Map Surfaces (%Difference) (Berry)
Three ways to compare Map Surfaces… Statistical Tests; Percent Difference; Surface Configuration
…Percent Difference capitalizes on the spatial arrangement of the values by comparing the values at each map location: %Difference Map, %Difference Table (Table 2)
Question 3
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning
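
A cell-by-cell percent difference might look like the sketch below. The slide does not give the equation, so the form used here, (other - base) / base * 100, is an assumption, as are the function name and the sample surfaces; cells where the base value is zero are left undefined.

```python
def percent_difference(base, other):
    """Cell-by-cell percent difference of `other` relative to `base`.

    Assumed form: (other - base) * 100 / base. Cells with base == 0 get None.
    """
    result = []
    for base_row, other_row in zip(base, other):
        row = []
        for b, o in zip(base_row, other_row):
            row.append(None if b == 0 else (o - b) * 100.0 / b)
        result.append(row)
    return result

# Hypothetical 2x2 map surfaces.
surface_1 = [[100.0, 200.0],
             [50.0, 0.0]]
surface_2 = [[110.0, 150.0],
             [50.0, 5.0]]
diff_map = percent_difference(surface_1, surface_2)
```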

Comparing Map Surfaces (Surface Configuration) (Berry)
Three ways to compare Map Surfaces… Statistical Tests; Percent Difference; Surface Configuration
…Surface Configuration capitalizes on the spatial arrangement of the values by comparing the localized trend in the values: Slope Map, Aspect Map, Surface Configuration Index (Table 3)
See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

Spatial Dependency (Berry)
Map Stack: relationships among maps are investigated by aligning grid maps with a common configuration… #cols/rows, cell size and geo-reference.
Data Shishkebab: each map represents a variable, each grid space a case and each value a measurement, with all of the rights, privileges, and responsibilities of non-spatial mathematical, numerical and statistical analysis
Spatial Variable Dependence: what occurs at a location in geographic space is related to:
the conditions of that variable at nearby locations, termed Spatial Autocorrelation (intra-variable dependence) …Surface Modeling (Discrete Point Map to Continuous Map Surface)
the conditions of other variables at that location, termed Spatial Correlation (inter-variable dependence) …Multivariate Spatial Data Mining
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Visualizing Spatial Relationships (Berry)
Interpolated Spatial Distribution: Phosphorus (P)
What spatial relationships do you see?
…do relatively high levels of P often occur with high levels of K and N? …how often? …where?
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Identifying Unusually High Measurements (Berry)
…isolate areas with values above mean + 1 StDev (tail of normal curve)
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships
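
Isolating the unusually high cells amounts to a binary reclassification against a mean + 1 StDev cutoff. In this sketch the function name and the grid are hypothetical, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def unusually_high(grid):
    """Binary map: 1 where a cell exceeds mean + 1 standard deviation, else 0."""
    values = [v for row in grid for v in row]
    cutoff = mean(values) + pstdev(values)  # assumption: population StDev
    binary = [[1 if v > cutoff else 0 for v in row] for row in grid]
    return binary, cutoff

# Hypothetical phosphorus surface with one hot spot.
p_map = [[10, 10, 10],
         [10, 10, 40]]
high_p, cutoff = unusually_high(p_map)
```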

Level Slicing (Berry)
…simply multiply the two maps to identify joint coincidence: 1*1=1 coincidence (any 0 results in zero)
2-dimensional data space: Box
Question 4
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships
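
Multiplying two binary maps is a one-liner per cell. The function name and the two binary grids below are hypothetical.

```python
def level_slice(binary_a, binary_b):
    """Joint coincidence of two binary maps: 1 only where both maps are 1."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(binary_a, binary_b)]

# Hypothetical binary maps of unusually high P and unusually high K.
high_p = [[1, 1, 0],
          [0, 1, 0]]
high_k = [[1, 0, 0],
          [1, 1, 0]]
high_both = level_slice(high_p, high_k)
```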

Multivariate Data Space (Berry)
…sum of a binary progression (1, 2, 4, 8, 16, etc.) provides level slice solutions for many map layers
3-dimensional space: Cube (Parallelepiped)
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships
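
The binary progression works because each layer gets its own power of two, so every sum identifies exactly one combination of layers. A minimal sketch, with hypothetical function names and one-row binary maps:

```python
def combine_slices(binary_layers):
    """Sum binary maps weighted by 1, 2, 4, ... so each total is a unique combination."""
    rows, cols = len(binary_layers[0]), len(binary_layers[0][0])
    return [[sum(layer[r][c] * 2 ** i for i, layer in enumerate(binary_layers))
             for c in range(cols)]
            for r in range(rows)]

def decode(code, names):
    """List the layers that are 'on' in a combined code."""
    return [name for i, name in enumerate(names) if (code >> i) & 1]

# Hypothetical binary maps of unusually high P, K and N.
high_p = [[1, 0, 1]]
high_k = [[1, 1, 1]]
high_n = [[0, 1, 1]]
combined = combine_slices([high_p, high_k, high_n])
```

With three layers the codes run from 0 (none high) to 7 (all three high), matching the corners of the 3-dimensional data space.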

Calculating Data Distance (Berry)
…an n-dimensional plot depicts the multivariate distribution; the distance between points determines the relative similarity in data patterns
…the farthest floating ball is the least similar (largest data distance) from the comparison point
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Identifying Map Similarity (Berry)
…the relative data distance between the comparison point’s data pattern and those of all other map locations form a Similarity Index
The green tones indicate field locations with fairly similar P, K and N levels; red tones indicate dissimilar areas.
Question 5
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships
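
Data distance and a similarity index can be sketched as follows. The scaling is an assumption, since the slide does not give the equation: here the index is 100 for a pattern identical to the comparison point and 0 for the farthest pattern on the map. The function name and the layers are hypothetical, and in practice the layers would usually be normalized first so that no single variable dominates the distance.

```python
from math import dist

def similarity_map(layers, row, col):
    """Similarity (0 to 100) of every cell's data pattern to the comparison cell.

    layers: list of same-sized grids (e.g. P, K, N); (row, col) is the comparison point.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    target = [layer[row][col] for layer in layers]
    distances = [[dist(target, [layer[r][c] for layer in layers])
                  for c in range(cols)]
                 for r in range(rows)]
    farthest = max(max(d_row) for d_row in distances)
    if farthest == 0:
        return [[100.0] * cols for _ in range(rows)]
    # Assumed index: 100 = identical pattern, 0 = largest data distance on the map.
    return [[(1 - d / farthest) * 100.0 for d in d_row] for d_row in distances]

# Hypothetical two-layer stack; the comparison point is the first cell.
p_layer = [[1.0, 4.0, 1.0]]
k_layer = [[1.0, 5.0, 1.0]]
similarity = similarity_map([p_layer, k_layer], 0, 0)
```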

Clustering Maps for Data Zones (Berry)
…a map stack is a spatially organized set of numbers
…groups of “floating balls” in data space identify locations in the field with similar data patterns: data zones
…fertilization rates vary for the different clusters “on-the-fly” (Variable Rate Application)
Cyber-Farmer, Circa 1992
Question 6
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Assessing Clustering Results (Berry)
…Clustering results can be roughly evaluated using basic statistics: Average, Standard Deviation, Minimum and Maximum values within each cluster are calculated. Ideally the averages of the two clusters would be radically different and the standard deviations small, that is, large differences between groups and small differences within groups.
Standard Statistical Tests of two data sets
Box and Whisker Plots to visualize differences
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

How Clustering Works (IsoData algorithm) (Berry)
Data Space
1) The scatter plot shows Height versus Weight data that might have been collected in your old geometry class
2) The data distance from each weight/height measurement pair to each arbitrary cluster center is calculated and the point is assigned to the closest cluster center
3) The average X,Y coordinates of the students assigned to each “working” cluster are calculated and used to reposition the cluster centers
4) Repeat data distances, cluster assignments and repositioning until there is no change in cluster membership (centers do not move)
See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space
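
Steps 2 through 4 can be sketched as the loop below. This covers only the assign-and-reposition iteration listed on the slide (the same core as k-means); any further refinements of a full IsoData implementation, such as splitting or merging clusters, are not shown. The function name, starting centers and height/weight pairs are hypothetical.

```python
from math import dist

def cluster(points, start_centers, max_iter=100):
    """Iteratively assign points to the closest center and reposition the centers."""
    centers = list(start_centers)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the closest "working" cluster center.
        new_assignment = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                          for p in points]
        # Step 4: stop when cluster membership no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each center to the average coordinates of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centers

# Hypothetical (height in inches, weight in pounds) pairs and arbitrary starting centers.
students = [(60, 110), (62, 120), (61, 115), (72, 190), (74, 200), (73, 195)]
membership, final_centers = cluster(students, [(60, 100), (75, 210)])
```

For a map stack, each grid cell would contribute one point (its P, K and N values, for example) and the resulting membership list would be written back to the cells as a map of data zones.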

Map Correlation (How it works) (Berry)
Point-by-Point: X elev = 2,063 feet; Y slope = 38% (Elevation in Feet, Slope in Percent)
Spatially Aggregated Correlation …one large data table with 25 rows x 25 columns = 625 map values for map wide summary; r = .432 map wide
Localized Correlation …625 small data tables within 5 cell reach = 81 map values for localized summary (Roving Window); r = .562 localized
…r is calculated where x = Elevation value and y = Slope value and n = number of value pairs
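
The two summaries can be sketched with the standard Pearson correlation coefficient, computed once over all cells (map wide) and once per cell over a roving window. The equation on the slide was not captured in this transcript, so the Pearson form is an assumption, as are the function names and the small grids. A circular window with a 5 cell reach does contain 81 cells, which matches the count on the slide; windows are simply clipped at the map edge here.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of paired values; None if either set has no variation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)

def window_offsets(reach):
    """Row/column offsets of a circular roving window with the given reach."""
    return [(dr, dc) for dr in range(-reach, reach + 1)
            for dc in range(-reach, reach + 1)
            if dr * dr + dc * dc <= reach * reach]

def map_wide_correlation(map_x, map_y):
    """One scalar r from a single table of all cell pairs."""
    return pearson_r([v for row in map_x for v in row],
                     [v for row in map_y for v in row])

def localized_correlation(map_x, map_y, reach=5):
    """A map of r values, one per cell, from the pairs inside its roving window."""
    rows, cols = len(map_x), len(map_x[0])
    offsets = window_offsets(reach)
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            cells = [(r + dr, c + dc) for dr, dc in offsets
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out_row.append(pearson_r([map_x[i][j] for i, j in cells],
                                     [map_y[i][j] for i, j in cells]))
        result.append(out_row)
    return result

# Hypothetical 3x3 surfaces where slope rises exactly with elevation.
elevation = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
slope = [[2 * v + 1 for v in row] for row in elevation]
r_map_wide = map_wide_correlation(elevation, slope)
r_localized = localized_correlation(elevation, slope, reach=1)
```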

Map Correlation (Aggregated and Localized results) (Berry)
Spatially Aggregated Correlation: r = .432 map wide. Scalar Value: one value represents the overall non-spatial relationship between the two map surfaces
Localized Correlation: r = .562 localized. Map Variable: a continuous quantitative surface represents the localized spatial relationship between the two map surfaces
Legend: Strong Positive; Minimal Correlation; Strong Negative

An Analytic Framework for GIS Modeling (Berry)
Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data.
See www.innovativegis.com/basis/Download/IJRSpaper/

Regression (conceptual approach) (Berry)
A line is “fitted” in data space that balances the data so the differences from the points to the line (residuals) for all the points are minimized and the sum of the differences is zero…
…the equation of the regression line is used to predict the “Dependent” variable (Y axis) using one or more “Independent” variables (X axis)
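
A minimal sketch of the fitting idea, assuming ordinary least squares for a single independent variable (the usual reading of "minimized", in which the sum of squared residuals is made as small as possible). The function name and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent variable has no variation")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired values (independent x, dependent y).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)  # balances out to zero, up to rounding
```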

Evaluating Prediction Maps (non-spatial) (Berry) Non-spatial …R-squared value looks at the deviations from the regression line; data patterns about the regression line
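
The R-squared value can be sketched as one minus the ratio of the squared deviations about the regression line to the squared deviations about the mean. The function name, data and prediction line (y = 2.2 + 0.6x) below are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted` (0 to 1)."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("actual values have no variation")
    return 1 - ss_residual / ss_total

# Hypothetical data and a hypothetical fitted line y = 2.2 + 0.6x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.2 + 0.6 * x for x in xs]
r2_percent = r_squared(actual, predicted) * 100
```

Being non-spatial, this single number says how well the equation fits overall but nothing about where on the map the predictions are good or poor.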

Map Variables (Berry)
The Dependent Map variable is the one that you want to predict…
…derive from customer data (Question 7)
…from a set of existing or easily measured Independent Map variables
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business

Map Regression Results (Bivariate) (Berry)
Scatter plots and regression equations relating Loan Density to three candidate driving variables (Housing Density, Value and Age): Loans = fn(Housing Density); Loans = fn(Home Value); Loans = fn(Home Age)
The “R-squared index” provides a general measure of how good the predictions ought to be: 40% and 46% indicate moderately weak predictors; 23% indicates a very weak predictor (R-squared index = 100% indicates a perfect predictor; 0% indicates an equation with no predictive capabilities)
Question 7: Creates the Loan Concentration map surface
Question 8: Creates regression equation and R-squared index
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business

Generating a Multivariate Regression (Berry)
…multiple linear regression with all three independent map variables produces a regression equation that is used to generate a prediction map
Question 9
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business
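
A multiple linear regression and the resulting prediction map can be sketched without any libraries by solving the normal equations. Everything named here is hypothetical: the function names, the three stand-in variables and the sample values, which were generated from y = 1 + 2a + 3b - c so the fit can be checked by eye.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple(samples, ys):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    design = [[1.0] + list(sample) for sample in samples]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict_map(layers, coeffs):
    """Apply the regression equation cell by cell to the independent map layers."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[coeffs[0] + sum(b * layer[r][c] for b, layer in zip(coeffs[1:], layers))
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical samples of three independent variables (stand-ins for housing
# density, home value and home age) with a made-up dependent value each.
samples = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (2, 1, 3)]
loans = [1 + 2 * a + 3 * b - c for a, b, c in samples]
coeffs = fit_multiple(samples, loans)
prediction = predict_map([[[1.0, 2.0]], [[0.0, 1.0]], [[3.0, 0.0]]], coeffs)
```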

Evaluating Regression Results (multiple linear) (Berry)
…multiple linear regression with all three independent map variables produces a regression equation that is used to generate a prediction map
…that is compared to the actual dependent variable data: Error Surface
Optional Question 9-1
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business
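
An error surface is a cell-by-cell comparison of the prediction map with the actual dependent map. The sign convention (predicted minus actual, so positive values are over-predictions) is an assumption, as are the function names and sample grids.

```python
def error_surface(predicted, actual):
    """Cell-by-cell error; assumed convention: predicted minus actual."""
    return [[p - a for p, a in zip(p_row, a_row)]
            for p_row, a_row in zip(predicted, actual)]

def mean_absolute_error(errors):
    """Average size of the errors, ignoring their sign."""
    cells = [abs(e) for row in errors for e in row]
    return sum(cells) / len(cells)

# Hypothetical predicted and actual loan density maps.
predicted = [[10.0, 12.0], [8.0, 5.0]]
actual = [[9.0, 15.0], [8.0, 7.0]]
errors = error_surface(predicted, actual)
mae = mean_absolute_error(errors)
```

Unlike the R-squared index, the error surface shows where the equation over- or under-predicts, which is what makes it usable for stratifying the project area.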

Using the Error Map to Stratify (Berry)
One way to improve the predictions, however, is to stratify the data set by breaking it into groups of similar characteristics …and then generating separate regressions
…generate a different regression for each of the stratified areas: red, yellow and green
…other stratification techniques include indigenous knowledge, level-slicing and clustering
Optional Question 9-2
See Beyond Mapping III, Topic 28, Spatial Data Mining in Geo-Business

Spatial Data Mining (The Big Picture) (Berry)
…making sense out of a map stack
Mapped data that exhibit high spatial dependency create strong prediction functions. As in traditional statistical analysis, spatial relationships can be used to predict outcomes …the difference is that spatial statistics predicts where responses will be high or low
See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Prescriptive Mapping (Berry)
Four primary types of applied spatial models:
Suitability: mapping preferences (e.g., Habitat and Routing)
Economic: mapping financial interactions (e.g., Combat Zone and Sales Propensity)
Physical: mapping landscape interactions (e.g., Terrain Analysis and Sediment Loading)
Mathematical/Statistical: mapping numerical relationships…
  Descriptive math/stat models summarize existing mapped data (e.g., Standard Normal Variable Map for Unusual Conditions and Clustering for Data Zones)
  Predictive math/stat models develop equations relating mapped data (e.g., Map Regression for Equity Loan Prediction and Probability of Product Sales)
  Prescriptive math/stat models identify management actions based on descriptive/predictive relationships (e.g., Retail Marketing and Precision Ag)…
Discrete Actions: If Then
If P is 0-4 ppm, then apply 50 lbs P2O5/Acre
If P is 4-8 ppm, then apply 18 lbs P2O5/Acre
If P is 8-12 ppm, then apply 7 lbs P2O5/Acre
If P is >12 ppm, then apply 0 lbs P2O5/Acre
Figure axes: P2O5 application rate (0 to 50) versus Phosphorus (P) level (0 to 12 and more)
Continuous Actions: Equation defining action(s)
Negative linear equation of the form: y = aX
Negative exponential equation of the form: y = e^(-x)
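
The discrete If-Then actions can be sketched as a lookup applied to every cell of a phosphorus map. The slide's ranges share their end points (4 and 8 ppm each appear in two rules), so this sketch assumes each upper bound is inclusive, which is consistent with the last rule starting at ">12". The function names and the sample map are hypothetical.

```python
def p2o5_rate(p_ppm):
    """Prescribed P2O5 application (lbs/acre) for a soil phosphorus level (ppm).

    Assumption: upper bounds are inclusive (4 -> 50, 8 -> 18, 12 -> 7).
    """
    if p_ppm <= 4:
        return 50
    if p_ppm <= 8:
        return 18
    if p_ppm <= 12:
        return 7
    return 0

def prescription_map(p_map):
    """Apply the discrete rules to every cell of a phosphorus map."""
    return [[p2o5_rate(p) for p in row] for row in p_map]

# Hypothetical phosphorus levels (ppm).
p_levels = [[2.0, 6.0, 10.0],
            [12.0, 15.0, 4.0]]
application = prescription_map(p_levels)
```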

Grid-Based Map Analysis (Berry)
Spatial Data Mining investigates the “numerical” relationships in mapped data…
Descriptive: summary statistics, comparison, classification (e.g., clustering)
Predictive: math/stat relationships among map layers (e.g., regression)
Prescriptive: appropriate actions (e.g., optimization)
Surface Modeling maps the “spatial distribution” of point data…
Density Analysis: count/sum of points within a local window
Spatial Interpolation: weighted average of points within a local window
Map Generalization: fits mathematical relationship to all of the point data
Spatial Analysis investigates the “contextual” relationships in mapped data…
Reclassify: reassigning map values (position; value; size, shape; contiguity)
Overlay: map overlay (point-by-point; region-wide)
Distance: proximity and connectivity (movement; optimal paths; visibility)
Neighbors: “roving windows” (slope/aspect; diversity; anomaly)
