Spatial Data Mining and Spatial Data Warehousing Special Topics In Database Sadra Abedinzadeh Ashkan Zarnani Farzad Peyravi.

Presentation transcript:

Outline
• Motivation and General Description
• Data Warehousing: Basic Concepts and Techniques
• Spatial Data Warehousing and Spatial OLAP Techniques
  – Spatial Data Warehouse: Models and Construction
  – Spatial OLAP: Implementation and Application
• Data Mining: Basic Concepts and Techniques
• Spatial Data Mining
  – Mining Spatial Association Rules
  – Spatial Classification and Prediction
  – Spatial Data Clustering Analysis
• Conclusions and Future Research

Motivation
• Data warehousing: integrating data from multiple sources into large warehouses that support on-line analytical processing and business decision making.
• Data mining (knowledge discovery in databases): extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases.
• Necessity: the data explosion problem. Computerized data collection tools and mature database technology lead to tremendous amounts of data stored in databases.
• We are drowning in data, but starving for knowledge!

Data Warehousing
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
• A data warehouse is
  – a decision support database that is maintained separately from the organization’s operational databases;
  – an integration of data from multiple heterogeneous sources to support the continuing need for structured and/or ad-hoc queries, analytical reporting, and decision support.

Modeling Data Warehouses
• Modeling data warehouses: dimensions & measurements
  – Star schema: a single object (fact table) in the middle connected radially to a number of objects (dimension tables).
  – Snowflake schema: a refinement of the star schema in which the dimensional hierarchy is represented explicitly by normalizing the dimension tables.
  – Fact constellations: multiple fact tables share dimension tables.
• Storage of selected summary tables:
  – Independent summary tables storing pre-aggregated data, e.g., total sales by product by year.
  – Encoding aggregated tuples in the same fact table and the same dimension tables.

Example of Star Schema
[Figure: a Sales fact table in the center, joined to four dimension tables]
• Sales Fact Table: Time_Key, Product_Key, Store_Key, Location_Key; measurements: unit_sales, dollar_sales, Yen_sales
• Time Dimension Table: many time attributes
• Product Dimension Table: many product attributes
• Store Dimension Table: many store attributes
• Location Dimension Table: many location attributes
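A minimal sketch of the star schema idea, using Python's standard sqlite3 module. The table names, column names, and sample rows below are illustrative assumptions (only two of the four dimensions are modeled); they are not taken from the slides. The query computes the kind of summary mentioned earlier, total sales by product by year, by joining the fact table to its dimension tables.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, two dimension tables.

    All names and rows are hypothetical sample data.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE sales_fact (
            time_key INTEGER REFERENCES time_dim(time_key),
            product_key INTEGER REFERENCES product_dim(product_key),
            unit_sales INTEGER,
            dollar_sales REAL
        );
        INSERT INTO time_dim VALUES (1, 1998, 'Q1'), (2, 1998, 'Q2'), (3, 1999, 'Q1');
        INSERT INTO product_dim VALUES (10, 'corn', 'grain'), (11, 'rice', 'grain');
        INSERT INTO sales_fact VALUES
            (1, 10, 5, 50.0), (2, 10, 3, 30.0), (3, 10, 4, 40.0),
            (1, 11, 2, 40.0), (3, 11, 1, 20.0);
        """
    )
    return con


def total_sales_by_product_by_year(con):
    """Roll the fact table up to (product, year) through the dimension tables."""
    return con.execute(
        """
        SELECT p.name, t.year, SUM(f.dollar_sales)
        FROM sales_fact AS f
        JOIN time_dim AS t ON t.time_key = f.time_key
        JOIN product_dim AS p ON p.product_key = f.product_key
        GROUP BY p.name, t.year
        ORDER BY p.name, t.year
        """
    ).fetchall()


if __name__ == "__main__":
    for row in total_sales_by_product_by_year(build_star_schema()):
        print(row)
```

Storing the result of such a query as its own table would correspond to the "independent summary table" option.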

Example of a Snowflake Schema
[Figure: the Sales fact table joined to dimension tables, with the Product and Location dimensions normalized]
• Sales Fact Table: Time_Key, Product_Key, Store_Key, Location_Key; measurements: unit_sales, dollar_sales, Yen_sales
• Time Dimension Table: many time attributes
• Store Dimension Table: many store attributes
• Product Dimension Table: Product_Key, Supplier_Key
• Location Dimension Table: Location_Key, Country, Region

A Star-Net Query Model
[Figure: star-net with one radial line per dimension and one point per level]
• Shipping Method: AIR-EXPRESS, TRUCK
• Customer Orders: ORDER, CONTRACTS
• Customer
• Product: PRODUCT GROUP, PRODUCT LINE, PRODUCT ITEM
• Organization: SALES PERSON, DISTRICT, DIVISION
• Promotion
• Geography: DISTRICT, REGION, COUNTRY
• Time: DAILY, QTRLY, ANNUALLY

Construction of Data Cubes
[Figure: data cube with dimensions Amount (0-20K, 20-40K, 40-60K, 60K-, sum), Province (B.C., Prairies, Ontario, All), and Discipline (Comp_Method, Database, ..., sum)]
• Each dimension contains a hierarchy of values for one attribute.
• A cube cell stores aggregate values, e.g., count, sum, max, etc.
• A “sum” cell stores dimension summation values.
• Sparse-cube technology and MOLAP/ROLAP integration.
• “Chunk”-based multi-way aggregation and single-pass computation.

Efficient Data Cube Computation Methods
• A data cube can be viewed as a lattice of cuboids.
  – The bottom-most cuboid is the base cube.
  – The top-most cuboid contains only one cell.
• Materialization of the data cube
  – Materialize every cuboid, none, or some.
  – Algorithms for selecting which cuboids to materialize:
    ◦ based on size, sharing, and access frequency.
• Efficient cube computation methods
  – ROLAP algorithms.
  – Array-based cubing algorithm.
[Figure: lattice of cuboids over dimensions A, B, C, from ABC through AB, AC, BC and A, B, C to ALL]
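The lattice of cuboids can be sketched in a few lines of Python. This is a naive full materialization that recomputes every cuboid from the base rows, not one of the efficient ROLAP or array-based algorithms named on the slide; the function name, the row format, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def compute_cube(rows, dims, measure):
    """Return {cuboid: {cell: sum}} for every subset of `dims` (2**n cuboids).

    `rows` is a list of dicts. A cuboid is identified by the tuple of dimensions
    it groups by; the empty tuple is the cuboid with a single cell.
    """
    cube = {}
    for k in range(len(dims), -1, -1):          # from the base cuboid down to ALL
        for group in combinations(dims, k):
            cells = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)
                cells[key] += row[measure]
            cube[group] = dict(cells)
    return cube


# Hypothetical sample data.
ROWS = [
    {"province": "B.C.", "discipline": "Database", "year": 1996, "amount": 10.0},
    {"province": "B.C.", "discipline": "Graphics", "year": 1996, "amount": 20.0},
    {"province": "Ontario", "discipline": "Database", "year": 1997, "amount": 5.0},
]

if __name__ == "__main__":
    cube = compute_cube(ROWS, ("province", "discipline", "year"), "amount")
    print(len(cube))               # number of cuboids in the lattice
    print(cube[("province",)])     # roll-up to province
    print(cube[()])                # the single-cell cuboid
```

With three dimensions the lattice has eight cuboids, which is why selecting a subset to materialize matters as the number of dimensions grows.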

OLAP: On-Line Analytical Processing
• A multidimensional, logical view of the data.
• Interactive analysis of the data: drill, pivot, slice_dice, filter.
• Summarization and aggregations at every dimension intersection.
• Retrieval and display of data in 2-D or 3-D crosstabs, charts, and graphs, with easy pivoting of the axes.
• Analytical modeling: deriving ratios, variance, etc., involving measurements or numerical data across many dimensions.
• Forecasting, trend analysis, and statistical analysis.
• Requirement: quick response to OLAP queries.

OLAP Architecture
• Logical architecture:
  – OLAP view: multidimensional and logical presentation of the data in the data warehouse/mart to the business user.
  – Data store technology: the technology options of how and where the data is stored.
• Three service components:
  – data store services,
  – OLAP services, and
  – user presentation services.
• Two data store architectures:
  – Multidimensional data store: multidimensional OLAP (MOLAP).
  – Relational data store: relational OLAP (ROLAP).

Spatial Data Warehouse and Spatial OLAP
• Spatial data warehouse: integrated, subject-oriented, time-variant, and nonvolatile spatial data repository for data analysis and decision making.
• Spatial data integration: a big issue.
• Spatial data cube: multidimensional spatial database.
  – Non-spatial dimensions: time, product, organization hierarchies.
  – Spatial dimensions: formed by geo-spatial hierarchies.
  – Non-spatial (numerical) measurements:
    ◦ distributive, algebraic, holistic.
  – Spatial measurements:
    ◦ collection of spatial object pointers, which may require spatial merge, overlay, or other operations.

Example: Weather Pattern Analysis
• Input:
  – a map with about 3,000 weather probes scattered in B.C.
  – daily data for temperature, precipitation, wind velocity, etc.
  – concept hierarchies for all attributes
• Output:
  – a map that reveals patterns: merged (similar) regions
• Goals:
  – interactive analysis (drill-down, slice, dice, pivot, roll-up)
  – fast response time
  – minimal storage space
• Challenge: a merged region may contain hundreds of “primitive” regions (polygons).

A Model of Spatial Data Warehouses
• Dimensions
  – nonspatial
    ◦ e.g., a temperature range in degrees generalizes to “hot”
  – spatial-to-nonspatial
    ◦ e.g., region “B.C.” generalizes to description “western provinces”
  – spatial-to-spatial
    ◦ e.g., region “Burnaby” generalizes to region “Lower Mainland”
• Measurements
  – numerical
    ◦ distributive (e.g., count, sum)
    ◦ algebraic (e.g., average)
    ◦ holistic (e.g., median, rank)
  – spatial
    ◦ collection of spatial pointers (e.g., pointers to all regions with a given temperature range in July)
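The three kinds of numerical measurement differ in how they combine across partitions of the data. The sketch below, with made-up numbers and hypothetical function names, shows that sum and count merge directly (distributive), that average can be derived from a fixed number of distributive values (algebraic), and that median cannot in general be recovered from per-partition medians (holistic).

```python
from statistics import median


def partial_sum_count(values):
    """Distributive summary of one partition: (sum, count)."""
    return sum(values), len(values)


def merge_sum_count(partials):
    """Sums and counts of partitions add up to the sum and count of the whole."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


def merged_average(partials):
    """Algebraic: the average is computed from two distributive values."""
    total, count = merge_sum_count(partials)
    return total / count


# Two hypothetical partitions, e.g. readings from two sub-regions.
PART_A = [1, 2, 3]
PART_B = [10, 20]

if __name__ == "__main__":
    partials = [partial_sum_count(PART_A), partial_sum_count(PART_B)]
    print(merge_sum_count(partials))                    # same as over all values
    print(merged_average(partials))                     # same as the true average
    print(median(PART_A + PART_B))                      # true median of all values
    print(median([median(PART_A), median(PART_B)]))     # differs: holistic measure
```

This is why pre-aggregated cells can be reused freely for distributive and algebraic measures but holistic ones need the underlying values.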

Star Model of a Spatial Data Warehouse
• Dimensions
  – region_name
  – time
  – temperature
  – precipitation
• Measurements
  – region_map
  – area
  – count
[Figure: star model with a fact table and its dimension tables]

Spatial Merge: Pre- vs On-line Computation
• On-line merge: very expensive.
• Precomputing all: too much storage space.

Spatial Measurements: Selective Materialization
• Methods for computing spatial measurements in a spatial data cube:
  – Collect and store pointers to spatial objects in the spatial data cube and compute on the fly: expensive and slow.
  – Save all the possible combinations: huge space overhead.
  – Precompute and store rough approximations in the spatial data cube: accuracy trade-off.
  – Selective computation: only materialize those which will be accessed frequently; a reasonable choice.
• Cube lattice and granularity of merge-able spatial objects.
  – Cuboid-level vs. cube-cell-level granularity.

Computing Spatial Measurements
• Apply the [HRU96] greedy algorithm to select cuboids.
• The [HRU96] algorithm has granularity on a cuboid level.
• Finer granularity, on a cell level.
• Only selected cells are materialized (not the whole cuboid).
• Factors in the selection of cells:
  – access frequency
  – size of a cell (number of merged objects): it could be better to save {1,3,4,7} than {1,3}
  – benefit for on-the-fly computation: if {1,3} is saved, it can be used for {1,3,6}
• Only neighboring objects are merged.
[Figure: Temperature (cold, mod, warm, hot) by Region_name (Okanagan, Vanc. Isl., Lower Main., Kootenay, Interior BC, Northern BC)]
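As a rough illustration of cuboid-level selection, here is a simplified sketch of a benefit-driven greedy choice in the spirit of [HRU96]. It works at cuboid granularity only, not at the finer cell level discussed above. The cuboid sizes, the function name, and the choice to measure cost purely by row count are assumptions for the example.

```python
def hru_greedy(sizes, k):
    """Greedily pick up to k cuboids to materialize, besides the base cuboid.

    `sizes` maps frozenset(dimensions) -> estimated row count. A cuboid w can be
    answered from a cuboid u when w's dimensions are a subset of u's. The benefit
    of v is the total reduction in answering cost over all cuboids derivable
    from v, relative to what is already materialized.
    """
    base = max(sizes, key=len)              # cuboid over all dimensions
    materialized = [base]

    def cheapest_source(w):
        # Size of the smallest materialized cuboid that can answer w.
        return min(sizes[u] for u in materialized if w <= u)

    picks = []
    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in materialized:
                continue
            benefit = sum(
                max(0, cheapest_source(w) - sizes[v]) for w in sizes if w <= v
            )
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:                    # nothing left that helps
            break
        materialized.append(best)
        picks.append(best)
    return picks


# Hypothetical sizes for a two-dimensional cube.
SIZES = {
    frozenset({"temperature", "region"}): 100,
    frozenset({"temperature"}): 4,
    frozenset({"region"}): 50,
    frozenset(): 1,
}

if __name__ == "__main__":
    for cuboid in hru_greedy(SIZES, 2):
        print(sorted(cuboid))
```

A cell-level variant would apply the same idea to individual groups of merged regions, weighting each candidate by access frequency as the slide suggests.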

Integration of Data Mining and Data Warehousing
• A data warehouse provides clean, integrated data for fruitful mining.
• Data mining provides powerful tools for analysis of data stored in data warehouses.
• OLAP can be viewed as data summarization and simple data mining.
• Data mining provides more analysis tools, e.g., association, classification, clustering, pattern-directed, and trend analysis.
• Mining multi-level knowledge by integration with OLAP facilities: mining in multiple data cubes.

Mining Different Kinds of Knowledge
• Characterization: generalize, summarize, and possibly contrast data characteristics, e.g., dry vs. wet regions.
• Association: rules like “inside(x, city) → near(x, highway)”.
• Classification: classify data based on the values in a classifying attribute, e.g., classify countries based on climate.
• Clustering: cluster data to form new classes, e.g., cluster houses to find distribution patterns.
• Trend and deviation analysis: find and characterize evolution trends, sequential patterns, similar sequences, and deviation data, e.g., housing market analysis.
• Pattern-directed analysis: find and characterize user-specified patterns in large databases, e.g., volcanoes on Mars.

Different Mining Tasks in Spatial DBs
• Spatial data mining tasks:
  – Spatial data characterization and comparison
  – Spatial clustering analysis
  – Spatial classification
  – Spatial association
  – Spatial pattern analysis
• Spatial concept hierarchies: thematic vs. spatial.
  – Thematic hierarchy: e.g., agriculture (food (grain (corn, rice, ...), vegetable, fruit), others (...)).
  – Spatial hierarchy, based on
    ◦ spatial data structures (MBR, quad-tree & R-tree);
    ◦ spatially related semantics (geo-region classification);
    ◦ clustering analysis (e.g., neighborhood or adjacent_to).

A Geo-Spatial Data Mining Query Language: GMQL
    mine characteristic rules
        for “Description of states along I 80 highway”
    from us_hiway, states_census
    where states_census.obj intersects us_hiway.obj
        and highway = "I 80"
    with respect to states_census.obj, state_name, pop90, capita_income
    set attribute threshold 51 for state_name
Annotations on the query:
  – mine characteristic rules: type of rule (characteristic, discriminant, association, clustering, classification)
  – from, where: SQL-like from and where clauses; high-level concepts and spatial joins may be used
  – with respect to: list of relevant attributes
  – set attribute threshold: thresholds for rule filtration
• Extension to Spatial SQL [Egenhofer’94].
• Supports ad-hoc data mining queries.

Background Knowledge for Data Mining
• Conceptual “hierarchies” and generalization operators.
  – Instance-based: {freshman, ..., senior} → undergraduate.
  – Schema-based: address(city, province, country).
  – Rule-based: good(x) ⇐ undergraduate(x) ∧ gpa(x) ≥ 3.5.
  – Operation-based: aggregation, approximation, clustering, etc.
• Where to get such background knowledge?
  – Implicitly stored in databases, such as address.
  – Explicitly defined by experts, such as “physics ⊂ science”.
  – Formed with different attribute combinations:
    ◦ food(category, brand, content_spec, package_size, price).
  – Generated automatically by data distribution analysis.
• May need dynamic adjustment for a particular set of data.
• Choose from multiple hierarchies or try them in parallel.
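An instance-based concept hierarchy can be applied as a simple lookup that replaces low-level values with their parent concept and then merges identical generalized values by counting. The sketch below is illustrative only: the intermediate levels "sophomore" and "junior" and the "graduate" branch are assumptions added to make the example complete, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical instance-based hierarchy: low-level value -> higher-level concept.
STATUS_HIERARCHY = {
    "freshman": "undergraduate",
    "sophomore": "undergraduate",
    "junior": "undergraduate",
    "senior": "undergraduate",
    "M.Sc.": "graduate",
    "Ph.D.": "graduate",
}


def generalize(values, hierarchy):
    """Climb one level of the hierarchy; values without a parent stay as they are."""
    return [hierarchy.get(v, v) for v in values]


def generalized_counts(values, hierarchy):
    """Merge identical generalized values and count them."""
    return Counter(generalize(values, hierarchy))


if __name__ == "__main__":
    students = ["freshman", "senior", "Ph.D.", "junior", "M.Sc.", "senior"]
    print(generalized_counts(students, STATUS_HIERARCHY))
```

Swapping in a different dictionary is one way to "choose from multiple hierarchies" for the same attribute.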

Automatic Generation of Numeric Hierarchies
[Figure: histogram of Count versus Amount]

Spatial OLAP (Characterization)
• Viewing data from different angles
• Summarization on multiple concept levels
[Figure: spatial slicing; drilling down on medium family income]

Mining Discriminant Rules
• Discrimination: comparison of two or more classes.
• Strategy:
  – Collect the relevant data into the target class and the contrasting class, respectively.
  – Generalize both classes to the same high-level concepts.
  – Compare tuples with the same high-level descriptions.
  – Present for every tuple its description and two numbers:
    ◦ support: distribution within a single class
    ◦ comparison: distribution between classes
  – Highlight the tuples with strong discriminant features.
• Interestingness:
  – Different measures of interestingness, e.g., also consider the sizes of the different classes.

Spatial OLAP (Comparison)
• Comparing different classes of data
[Figure: population increases faster in the western part. Drill down and look at different dimensions to get an explanation.]

Mining Association Rules
• Association: finding associations among a set of attributes and their values.
• Applications: pattern association, market analysis, etc.
• Examples:
  – milk → bread [5%, 60%]
  – tire ∧ auto_accessories → auto_services [2%, 80%]
• Methods for mining associations:
  – Apriori (Agrawal & Srikant’94)
  – Partition technique (Savasere, Omiecinski, Navathe’95)
  – Sampling (Toivonen’96)
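The two bracketed numbers attached to each example rule are commonly read as [support, confidence]. A minimal sketch of how both are computed for a single rule follows; the transactions are made up, and this only evaluates one given rule, it does not perform the Apriori search for frequent itemsets.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing antecedent and consequent
    confidence = fraction of antecedent transactions that also contain the consequent
    """
    lhs = set(antecedent)
    both = lhs | set(consequent)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if both <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Hypothetical market baskets.
BASKETS = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"eggs"},
]

if __name__ == "__main__":
    s, c = support_confidence(BASKETS, {"milk"}, {"bread"})
    print(f"milk -> bread [{s:.0%}, {c:.0%}]")
```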

Spatial Associations
    FIND SPATIAL ASSOCIATION RULE DESCRIBING "Golf Course"
    FROM Washington_Golf_courses, Washington
    WHERE CLOSE_TO(Washington_Golf_courses.Obj, Washington.Obj, "3 km")
        AND Washington.CFCC <> "D81"
    IN RELEVANCE TO Washington_Golf_courses.Obj, Washington.Obj, CFCC
    SET SUPPORT THRESHOLD 0.5

Spatial Associations & Hierarchy of Spatial Relationships
• Spatial association: association relationship containing spatial predicates, e.g., close_to, intersect, contains, etc.
  – Topological relations:
    ◦ intersects, overlaps, disjoint, etc.
  – Spatial orientations:
    ◦ left_of, west_of, under, etc.
  – Distance information:
    ◦ close_to, within_distance, etc.
• Hierarchy of spatial relationships:
  – “g_close_to”: near_by, touch, intersect, contain, etc.
  – First search for a rough relationship and then refine it.

Efficient Mining of Spatial Associations
• Two-step computation of spatial associations:
  – Step 1: rough spatial computation as a filter
    ◦ MBR or R-tree rough estimation.
  – Step 2: detailed spatial algorithm as refinement
    ◦ apply only to those pairs which have passed the rough spatial association test (no less than min_support).
• Multi-dimensional mining:
  – explore association relationships at any selected granularity level
  – perform drill-down and roll-up on any dimension.
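The two-step filter-and-refine idea can be sketched for a "close to" predicate. Step 1 compares minimum bounding rectangles (MBRs), which is cheap and never misses a truly close pair; step 2 runs a more expensive exact test only on the surviving candidates. In this sketch objects are plain lists of vertices, the exact distance is taken as the minimum vertex-to-vertex distance, and the object names and coordinates are invented; a real system would use proper geometry and an R-tree rather than a pairwise scan.

```python
from math import hypot


def mbr(points):
    """Minimum bounding rectangle of a vertex list: (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def mbr_distance(a, b):
    """Distance between two rectangles; a lower bound on the vertex distance."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return hypot(dx, dy)


def exact_distance(p, q):
    """Assumed 'detailed' test: minimum distance between any two vertices."""
    return min(hypot(x1 - x2, y1 - y2) for x1, y1 in p for x2, y2 in q)


def close_pairs(objects, threshold):
    """Return (candidates after the MBR filter, pairs confirmed by refinement)."""
    names = sorted(objects)
    boxes = {n: mbr(objects[n]) for n in names}
    candidates = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if mbr_distance(boxes[a], boxes[b]) <= threshold
    ]
    refined = [
        (a, b) for a, b in candidates
        if exact_distance(objects[a], objects[b]) <= threshold
    ]
    return candidates, refined


# Hypothetical objects given as vertex lists.
OBJECTS = {
    "town": [(0, 0), (1, 1)],
    "lake": [(1.5, 0), (3, 0), (3, 3)],
    "road": [(1, 2), (1, 5)],
    "mine": [(10, 10), (11, 11)],
}

if __name__ == "__main__":
    candidates, refined = close_pairs(OBJECTS, threshold=1.0)
    print("after MBR filter:", candidates)
    print("after refinement:", refined)
```

Because the rectangle distance never exceeds the exact distance, the filter can only produce false positives, which the refinement step then removes.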

Example: Spatial Association Rule Mining
• “What kinds of spatial objects are close to each other in B.C.?”
  – Kinds of objects: cities, water, forests, usa_boundary, mines, etc.
• Rules mined:
  – is_a(x, large_town) ^ intersect(x, highway) → adjacent_to(x, water). [7%, 85%]
  – is_a(x, large_town) ^ adjacent_to(x, georgia_strait) → close_to(x, u.s.a.). [1%, 78%]
• Mining method: Apriori + multi-level association + geo-spatial algorithms (from rough to high precision).

Data Classification
• Data categorization based on a set of training objects.
  – Applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis, etc.
  – Example: classify a set of diseases and provide the symptoms which describe each class or subclass.
• The classification task: based on the features present in the class-labeled training data, develop a description or model for each class. It is used for
  – classification of future test data,
  – better understanding of each class, and
  – prediction of certain properties and behaviors.
• Data classification methods: decision trees (e.g., ID3, C4.5), statistics, neural networks, rough sets, etc.

A Decision-Tree Based Classification Method
• A decision tree:
[Figure: decision tree testing outlook (sunny, overcast, rain), humidity, and windy, with leaves labeled P or NP]
• ID3 and C4.5 (Quinlan’93): top-down decision-tree generation algorithms.
  – At the start, all the training examples are at the root.
  – Partition the examples recursively based on selected attributes.
  – Attribute selection: maximize an information gain measure, i.e., favor the partitioning which makes the majority of examples belong to a single class.
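The attribute-selection step can be made concrete with a small information-gain calculation. The sketch below computes entropy and information gain for categorical attributes and picks the attribute a top-down tree builder would split on first. The six training rows are invented for illustration and do not reproduce the tree in the figure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical training examples; P / NP are the class labels.
ROWS = [
    {"outlook": "sunny", "windy": False, "cls": "NP"},
    {"outlook": "sunny", "windy": True, "cls": "NP"},
    {"outlook": "overcast", "windy": False, "cls": "P"},
    {"outlook": "overcast", "windy": True, "cls": "P"},
    {"outlook": "rain", "windy": False, "cls": "P"},
    {"outlook": "rain", "windy": True, "cls": "NP"},
]

if __name__ == "__main__":
    for attr in ("outlook", "windy"):
        print(attr, round(information_gain(ROWS, attr, "cls"), 3))
    print("split on:", best_attribute(ROWS, ["outlook", "windy"], "cls"))
```

Splitting on outlook leaves two of the three branches pure, which is exactly the "majority of examples belong to a single class" criterion.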

Scalable Classification Methods
• Scalability of decision-tree classification algorithms.
• Previous approaches:
  – Incremental tree construction (Quinlan’86): total cost is high.
  – Data sampling and discretizing continuous attributes (Catlett’91): still in main memory.
  – Data partition and merge of parallel partitions (Chan and Stolfo’91): reduced classification accuracy.
• SLIQ & SPRINT (Mehta et al.’96, Shafer et al.’96): disk-based decision-tree construction algorithms.
  – Techniques: pre-sorting, breadth-first tree growing, and tree pruning.

Generalization-Based Decision-Tree Induction
• Integration of generalization with decision-tree induction.
• Classification at primitive concept levels, e.g., precise temperature, humidity, outlook, etc.
  – Weakness: low-level concepts, scattered classes, bushy classification trees, semantic interpretation problems.
• Classification at high or medium concept levels:
  – may lead to imprecise classification.
• Medium-level generalization & adjustment:
  – Generalize to intermediate concept level(s).
  – Merge and split concept levels for better class representation and classification accuracy.
  – Efficiency: analysis is performed in compressed, generalized relations.

Mining Classification Rules
• Classification: based on the features present in the class-labeled training data, develop a description or model for each class.
• Applications: credit approval, target marketing, medical diagnosis, treatment effectiveness analysis, etc.
• Example: classify a set of diseases and provide the symptoms which describe each class or subclass.

Spatial Classification
• Generalization-based induction
• Interactive classification

Predictive Modeling in Databases
• Predictive modeling: predict data values or construct generalized linear models based on the database data.
• One can only predict value ranges or category distributions.
• Method outline:
  – Minimal generalization
  – Attribute relevance analysis
  – Generalized linear model construction
  – Prediction
• Determine the major factors which influence the prediction.
  – Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
• Multi-level prediction: drill-down and roll-up analysis.

Spatial Prediction and Trend Analysis
• Spatial trend predictive modeling (Ester et al.’97):
  – Discover centers: local maxima of some non-spatial attribute.
  – Determine the (theoretical) trend of some non-spatial attribute when moving away from the centers.
  – Discover deviations (from the theoretical trend).
  – Explain the deviations.
• Example: trend of the change in unemployment rate according to the distance to Munich.
• Similar modeling can be used to study the trend of temperature with altitude, the degree of pollution in relation to regions of population density, etc.
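The "determine the trend, then discover deviations" steps can be sketched with an ordinary least-squares line fitted to attribute value versus distance from a center. This is only an illustration under simplifying assumptions: the center is given rather than discovered, the trend is assumed linear, and the place names, coordinates, and values are invented. It is not the algorithm of the cited paper.

```python
from math import dist


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trend_and_deviations(center, observations, tolerance):
    """Fit value against distance from `center`; flag places far off the trend.

    `observations` maps a place name to ((x, y), value).
    Returns (slope, intercept, list of deviating place names).
    """
    names = sorted(observations)
    distances = [dist(center, observations[n][0]) for n in names]
    values = [observations[n][1] for n in names]
    slope, intercept = fit_line(distances, values)
    deviating = [
        n for n, d, v in zip(names, distances, values)
        if abs(v - (slope * d + intercept)) > tolerance
    ]
    return slope, intercept, deviating


# Hypothetical places on a line east of the center, with an attribute value each.
OBSERVATIONS = {
    "A": ((0, 0), 3.0),
    "B": ((1, 0), 5.0),
    "C": ((2, 0), 7.0),
    "D": ((3, 0), 15.0),   # well above the overall trend
    "E": ((4, 0), 11.0),
    "F": ((5, 0), 13.0),
}

if __name__ == "__main__":
    slope, intercept, deviating = trend_and_deviations((0, 0), OBSERVATIONS, 3.0)
    print(round(slope, 3), round(intercept, 3), deviating)
```

The flagged places are the ones the final step, "explain the deviations", would then examine.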

Data Clustering Analysis
• Data clustering (“unsupervised learning”): cluster objects into classes, based on their features, so as to maximize intraclass similarity and minimize interclass similarity.
• Probability-based vs. distance-based clustering analysis.
• Typical probability-based clustering analysis algorithms:
  – COBWEB (Fisher’87): incremental concept formation.
    ◦ Category utility measurement (probability of each concept’s occurrence).
    ◦ Top-down, incremental, hierarchical organization of concepts.
  – CLASSIT (Gennari’89): extends it to real-valued data.
• Typical distance-based clustering analysis algorithms:
  – Statistics-based, k-means, k-medoids, nearest neighbors.

Distance-Based Spatial Clustering Analysis
• Statistical approaches: scan data frequently, iterative optimization, hierarchical clustering, etc.
• CLARANS (Ng & Han’94): randomized search (sampling) + PAM (a distance-based clustering algorithm).
• DBSCAN (Ester et al.’96): density-based clustering using spatial data structures (R*-tree).
• BIRCH (Zhang et al.’96): balanced iterative reducing and clustering using hierarchies.
  – Focus on densely occupied portions of the data space.
  – Measurement reflects the “natural” closeness of points.
  – A height-balanced tree (CF-tree) is used for clustering.
• Describe aggregate proximity relationships (Knorr & Ng’96).
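To make the distance-based idea concrete, here is a small PAM-style k-medoids sketch: it starts from the first k points and keeps swapping a medoid for a non-medoid whenever the swap lowers the total distance of points to their nearest medoid. It tries every possible swap, which is the exhaustive part that CLARANS replaces with randomized sampling. The starting rule, function names, and sample points are assumptions for the example.

```python
from math import dist


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k):
    """Greedy medoid swapping until no single swap reduces the total cost."""
    medoids = list(points[:k])          # simple deterministic start (an assumption)
    improved = True
    while improved:
        improved = False
        best_cost = total_cost(points, medoids)
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best_cost - 1e-12:    # accept strictly better swaps only
                    medoids, best_cost, improved = trial, cost, True
    return medoids


def assign(points, medoids):
    """Map each medoid to the list of points closest to it."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: dist(p, m))].append(p)
    return clusters


# Two hypothetical groups of locations.
POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

if __name__ == "__main__":
    medoids = pam(POINTS, 2)
    print(sorted(medoids))
    print(assign(POINTS, medoids))
```

Each full pass evaluates k times (n minus k) swaps, each costing a scan of all points, which is why sampling-based variants are attractive for large spatial data sets.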

Spatial Clustering
• How can we cluster points?
• What are the distinct features of the clusters?
[Figure: there are more customers with university degrees in clusters located in the West. Thus, we can use different marketing strategies.]

Data and Knowledge Visualization
• Visualization of characteristic and discriminant rules:
  – tables & cubes + bar/pie charts, curves, surfaces, etc.
• Visualization of association rules:
  – Association rule graph: nodes for large 1-itemsets, lines for large 2-itemsets, arrows for implication strength.
  – Association matrix: support/confidence shown as size/color in cells.
• Cluster analysis: viewing clusters and their characteristics.
• Classification: colored decision trees.
• Prediction: curves, pie charts, and relevance analysis results.
• Deviation analysis: boxplots (quartiles, median) and outliers.
• Visual impression of large data mining results:
  – arrange and color data items as pixels (Keim et al.’94).

Visual Data Mining (ref. D. Keim, SIGMOD’96 Tutorial)
• Data visualization and exploratory analysis:
  – Interactive, usually undirected search for structures, trends, etc.
• Typical data visualization techniques:
  – Geometric techniques, icon-based techniques, pixel-oriented techniques, hierarchical techniques, graph-based techniques, 3D techniques, dynamic techniques, and hybrid techniques.
• Database visualization systems:
  – Statistics-oriented systems, visualization-oriented systems, database-oriented systems, and special-purpose systems.
• Visual database exploration is another powerful approach to data mining, especially spatial data mining.

Data Mining Interfaces
• Interactive mining versus a data mining language.
• Specification of data mining tasks.
  – Data sets: any sets of data in databases.
  – Mining task specification: kinds of knowledge or forms of rules to be mined.
  – Background knowledge (e.g., concept hierarchies): specification and manipulation.
  – Interestingness measurement: significance, confidence, thresholds, concept levels, etc.
• Transformation and manipulation of output results.
  – Roll-up vs. drill-down.
  – Multiple output forms: generalized relations, crosstabs, charts, curves, and other visual outputs.

GeoMiner: Graphical User Interface

Systems for Data Warehousing and Data Mining
• Systems for Data Warehousing
  – Arbor Software: Essbase
  – Oracle (IRI): Express
  – Cognos: PowerPlay
  – Redbrick Systems: Redbrick Warehouse
  – Microstrategy: DSS/Server
• Systems or Research Prototypes for Data Mining
  – IBM: QUEST (Intelligent Miner)
  – Silicon Graphics: MineSet
  – Integral Solutions Ltd.: Clementine
  – Information Discovery Inc.: Data Mining Suite
  – SFU (DBTech): DBMiner, GeoMiner
  – Rutgers: DataMine; GMD: Explora; U Munich: VisDB

Conclusions
• Data warehousing and data mining:
  – A rich, promising, young field with broad applications and many challenging research issues.
  – Imminent task: spatial database analysis, from spatial data manipulation to on-line spatial analytical processing (Spatial OLAP) and spatial data mining.
• Spatial data cube construction: fine-granularity analysis.
• Multiple spatial data mining tasks: characterization, association, classification, clustering, sequence and pattern analysis, prediction, etc.
• Integration of data mining with OLAP: OLAP-based spatial data mining.
• Integration of spatial analysis methods, spatial query processing methods, and spatial indexing techniques.

Future Research
• Foundation of spatial data warehousing and data mining.
• Implementation methods:
  – Efficient construction of spatial data cubes.
  – A set of well-tuned spatial data mining operators.
  – Spatial data and knowledge visualization tools.
  – Integration of multiple mining tasks with OLAP functions.
• New spatial indexing techniques for spatial data warehousing and spatial mining.
• New spatial data mining methodologies: statistical tools, neural nets, ad-hoc query-based mining, etc.
• Mining spatiotemporal data, raster data, and integration with existing spatial analysis techniques.


Thank you!