Spatial Data Mining
Hari Agung

What is Spatial Data?
- Data related to objects that occupy space: traffic, bird habitats, global climate, logistics, ...
- Object types: points, lines, polygons, etc.
- Used in/for:
  - GIS (Geographic Information Systems)
  - Meteorology
  - Astronomy
  - Environmental studies, etc.

Why do we need Data Mining?
- Large number of records (cases): 10^8 to 10^12 bytes
  - One thousand (10^3) bytes = 1 kilobyte (KB)
  - One million (10^6) bytes = 1 megabyte (MB)
  - One billion (10^9) bytes = 1 gigabyte (GB)
  - One trillion (10^12) bytes = 1 terabyte (TB)
- High-dimensional data (variables): 10 to 10^4 attributes
- Only a small portion, typically 5% to 10%, of the collected data is ever analyzed
- We are drowning in data, but starving for knowledge!

Spatial Data Mining
- Spatial patterns
  - Spatial outliers
  - Location prediction
  - Associations, co-locations
  - Hotspots, clustering, trends, ...
- Primary tasks
  - Mining spatial association rules
  - Spatial classification and prediction
  - Spatial data clustering analysis
  - Spatial outlier analysis

Spatial Classification
- Use spatial information at different (coarse/fine) levels (different indexing trees) for data focusing
- Determine relevant spatial or non-spatial features
- Apply ordinary supervised learning algorithms, e.g., decision trees

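The steps on this slide can be sketched as follows: derive a spatial feature for each object, then hand it to an ordinary supervised learner. This is a minimal illustration, not a method taken from the slides. It uses a one-level decision tree (a decision stump) in place of a full decision tree, and the house coordinates, park location, labels, and function names are all invented for the example.

```python
from math import hypot

def nearest_distance(p, targets):
    """Spatial feature: distance from point p to the closest target point."""
    return min(hypot(p[0] - t[0], p[1] - t[1]) for t in targets)

def fit_stump(feature, labels):
    """Fit a one-level decision tree: predict True when feature <= threshold.

    Returns the threshold with the fewest training errors and that error count.
    """
    best_t, best_err = None, len(labels) + 1
    for t in sorted(set(feature)):
        err = sum((f <= t) != y for f, y in zip(feature, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Hypothetical toy data: planar house coordinates, one park, and a
# made-up "highly valued" label for each house.
houses = [(0, 0), (1, 0), (5, 5), (6, 5)]
parks = [(0, 1)]
highly_valued = [True, True, False, False]

dist_to_park = [nearest_distance(h, parks) for h in houses]
threshold, errors = fit_stump(dist_to_park, highly_valued)
print("predict 'highly valued' when distance to park <=", threshold)
print("training errors:", errors)
```

The spatial step is confined to `nearest_distance`; once the feature column exists, the learner itself needs no knowledge of geometry.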
Spatial Clustering
- Use tree structures to index spatial data
  - DBSCAN: R-tree
  - CLIQUE: grid or quadtree
- Clustering with spatial constraints: obstacles require adjusting the notion of distance

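To show why a spatial index matters for clustering, here is a minimal DBSCAN sketch. The slide pairs DBSCAN with an R-tree; to stay self-contained, this sketch swaps in a uniform grid as the index, so each neighborhood query inspects only the 3x3 block of cells around a point. The sample points and the `eps` and `min_pts` values are invented for illustration.

```python
from collections import defaultdict
from math import hypot

def build_grid(points, cell):
    """Bucket point indices by the grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def region_query(points, grid, cell, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    x, y = points[i]
    cx, cy = int(x // cell), int(y // cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if hypot(points[j][0] - x, points[j][1] - y) <= eps:
                    found.append(j)
    return found

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    grid = build_grid(points, eps)  # cell size = eps, so 3x3 cells suffice
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, grid, eps, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = region_query(points, grid, eps, j, eps)
            if len(more) >= min_pts:  # j is a core point: keep expanding
                queue.extend(more)
    return labels

# Hypothetical points: two tight groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Handling obstacles, as the last bullet mentions, would mean replacing the straight-line `hypot` distance with a path-aware distance; that is not attempted here.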
Spatial Association Rules
- Spatial objects are of major interest, not transactions
- A ⇒ B
  - A, B can be either spatial or non-spatial (3 combinations)
  - What is the fourth combination?
- Association rules can be found w.r.t. the 3 types
- See pp. 234-235

Spatial Data Mining Results
- Understanding spatial data, discovering relationships between spatial and non-spatial data, construction of spatial knowledge bases, etc.
- Results come in various forms
  - The description of the general weather patterns in a set of geographic regions is a spatial characteristic rule.
  - The comparison of two weather patterns in two geographic regions is a spatial discriminant rule.
  - A rule like "most cities in Canada are close to the Canada-US border" is a spatial association rule.
    near(x, coast) ∧ southeast(x, USA) ⇒ hurricane(x) (70%)
  - Others: spatial clusters, ...

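The percentage attached to a spatial association rule is its confidence: among the objects that satisfy the left-hand side, the fraction that also satisfy the right-hand side. The sketch below computes support and confidence for a rule of that shape. The five regions and their predicate sets are invented toy data, so the result does not reproduce the 70% quoted on the slide.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    objects: one set per spatial object, holding the predicates true for it.
    """
    n_ante = sum(1 for preds in objects if antecedent <= preds)
    n_both = sum(1 for preds in objects if (antecedent | consequent) <= preds)
    support = n_both / len(objects)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical regions with precomputed spatial predicates.
regions = [
    {"near_coast", "southeast_usa", "hurricane"},
    {"near_coast", "southeast_usa", "hurricane"},
    {"near_coast", "southeast_usa"},
    {"near_coast"},
    {"southeast_usa", "hurricane"},
]

support, confidence = rule_stats(
    regions,
    antecedent={"near_coast", "southeast_usa"},
    consequent={"hurricane"},
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

In practice the predicates themselves (near, southeast, and so on) would first be derived from geometry; here they are assumed to be given.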
Basic Concepts (1)
- Spatial data mining follows the same functions as general data mining, with the end objective of finding patterns in geography, meteorology, etc.
- The main difference is spatial autocorrelation: the neighbors of a spatial object may influence it and therefore have to be considered as well
- Spatial attributes
  - Topological: adjacency or inclusion information
  - Geometric: position (longitude/latitude), area, perimeter, boundary polygon

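Spatial autocorrelation can be quantified. One common statistic, not named in the slides, is Moran's I: positive when neighboring objects carry similar values, negative when they carry dissimilar ones. The sketch below assumes a hypothetical chain of four cells where each cell neighbors the next, with binary adjacency weights; the attribute values are invented.

```python
def morans_i(values, weights):
    """Moran's I for attribute values and a square spatial weight matrix.

    weights[i][j] > 0 means objects i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# Hypothetical layout: four cells in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)  # neighbors always differ
print(clustered, alternating)
```

A value well above zero is the situation the slide describes: an object cannot be treated as independent of its neighbors.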
Basic Concepts (2)
- Spatial neighborhood
  - Topological relation: "intersect", "overlap", "disjoint", ...
  - Distance relation: "close_to", "far_away", ...
  - Direction/orientation relation: "left_of", "west_of", ...
- A global model might be inconsistent with regional models
[Figure: global model vs. local model]

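The three kinds of neighborhood relation can each be written as a small predicate. The sketch below is a simplification under stated assumptions: coordinates are planar (x grows eastward), regions are axis-aligned bounding boxes rather than true polygons, boxes that merely touch count as overlapping, and the function names and sample threshold are illustrative only.

```python
from math import hypot

def close_to(a, b, threshold):
    """Distance relation: points a and b lie within threshold of each other."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= threshold

def west_of(a, b):
    """Direction relation: point a lies west of point b (smaller x)."""
    return a[0] < b[0]

def bbox_relation(r1, r2):
    """Topological relation between boxes given as (xmin, ymin, xmax, ymax)."""
    ax0, ay0, ax1, ay1 = r1
    bx0, by0, bx1, by1 = r2
    if ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0:
        return "disjoint"
    if ax0 >= bx0 and ay0 >= by0 and ax1 <= bx1 and ay1 <= by1:
        return "inside"      # r1 lies within r2 (equal boxes land here too)
    if bx0 >= ax0 and by0 >= ay0 and bx1 <= ax1 and by1 <= ay1:
        return "contains"    # r2 lies within r1
    return "overlap"

print(close_to((0, 0), (3, 4), threshold=5))     # True
print(west_of((0, 0), (1, 0)))                   # True
print(bbox_relation((0, 0, 2, 2), (1, 1, 3, 3))) # overlap
```

Predicates like these are what feed the rule and classification steps on the earlier slides: they turn raw geometry into attributes that ordinary mining algorithms can consume.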
Applications
- NASA Earth Observing System (EOS): Earth science data
- National Inst. of Justice: crime mapping
- Census Bureau, Dept. of Commerce: census data
- Dept. of Transportation (DOT): traffic data
- National Inst. of Health (NIH): cancer clusters

Example: What Kind of Houses Are Highly Valued? (Associative Classification)

SOM Application for Data Mining: Downscaling Weather Forecasts
- Data
  - ERA-15 using a T106L31 model (from 1978 to 1994) with 1.125° resolution
  - Terabytes of data
  - Comprises data from approx. 20 variables (such as temperature, humidity, pressure, etc.) at 30 pressure levels on a 360x360-node grid
- Adaptive competitive learning
  - Sub-grid details escape from numerical models

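Adaptive competitive learning in a self-organizing map (SOM) works like this: for each input vector, the closest unit wins, and the winner and its map neighbors are pulled toward the input. The sketch below is a minimal one-dimensional SOM in plain Python, not the configuration used in the downscaling study. The two-component input vectors, the number of units, and the learning-rate and neighborhood schedules are all assumptions made for illustration.

```python
import random
from math import exp

def best_unit(units, x):
    """Index of the unit whose weight vector is closest to input x."""
    return min(range(len(units)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(units[k], x)))

def train_som(data, n_units, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM; returns the list of unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = n_units / 2
    # Initialize each unit from a randomly chosen input vector.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays
        sigma = max(sigma0 * (1 - frac), 0.5)    # neighborhood shrinks
        for x in rng.sample(data, len(data)):    # one shuffled pass
            bmu = best_unit(units, x)            # competition: pick winner
            for k, w in enumerate(units):
                # Gaussian neighborhood around the winner on the 1-D map.
                h = exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return units

# Hypothetical normalized inputs, e.g. (temperature, humidity) pairs.
samples = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.90),
           (0.85, 0.80), (0.50, 0.50), (0.20, 0.10)]
som = train_som(samples, n_units=3)
for k, w in enumerate(som):
    print(k, [round(v, 3) for v in w])
```

After training, each unit acts as a prototype for a group of similar inputs, and `best_unit` assigns a new input to its prototype.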
Dept. of Applied Mathematics, Universidad de Cantabria, Santander, Spain
And now discussion