Spatial Data Mining
Yang Yubin
Joint Laboratory for Geoinformation Science, The Chinese University of Hong Kong

2004/09/09, Hong Kong Observatory, Hong Kong Meteorological Society

Agenda
- Motivation and General Description
- Data Mining: Basic Concepts
- Data Mining Techniques
- Spatial Data Mining
- Spatial Data Mining Scenarios in Meteorology and Weather Forecasting
- Conclusions
- Questions & Discussions

Motivation and General Description

Why Do We Need Data Mining?
- Large numbers of records (cases), i.e., large data volumes:
  - one thousand (10^3) bytes = 1 kilobyte (KB)
  - one million (10^6) bytes = 1 megabyte (MB)
  - one billion (10^9) bytes = 1 gigabyte (GB)
  - one trillion (10^12) bytes = 1 terabyte (TB)
- High-dimensional data: many variables (attributes)
- Only a small portion of the collected data, typically 5% to 10%, is ever analyzed
- We are drowning in data, but starving for knowledge!

Scientific Viewpoint
- Data are collected and stored at enormous speeds (GB/hour), e.g., by:
  - remote sensors on satellites
  - telescopes scanning the skies
  - scientific simulations generating terabytes of data
- Classical modeling techniques are infeasible at this scale
- Data mining helps with:
  - data reduction
  - cataloging, classifying, and segmenting data
  - hypothesis formation by scientists

Current Situations (1)
- Great effort goes into constructing and maintaining large information databases
- The data often cannot be analyzed by standard statistical methods, because:
  - there are numerous missing records
  - the data are qualitative rather than quantitative
- We do not always know what information might be represented, or how relevant it might be to the questions asked

Current Situations (2)
- The ways and means of using all this data lag far behind the growth in available data
- Information is often:
  - found only by a lot of coincidence (the internet)
  - not explicitly available (company databases)
  - accessible to human eyes only after heavy processing (astronomical, meteorological, and earth observation data)
- This creates a clear demand for ways to uncover the information and knowledge hidden in massive quantities of data

Data Mining: Basic Concepts

What Is Data Mining?
- Data mining is concerned with solving problems by analyzing existing data
- "Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from huge amount of data"
- Alternative name: Knowledge Discovery in Databases (KDD)
  - a term that originated in the Artificial Intelligence (AI) field
  - KDD consists of several steps, one of which is data mining

Data Mining vs. KDD
- Knowledge Discovery in Databases (KDD): the whole process of finding useful information and patterns in data
- Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process
- Data mining is the core of the knowledge discovery process

KDD Process
- Selection: obtain data from various sources
- Preprocessing: cleanse the data
- Transformation: convert the data to a common format, or transform it to a new format
- Data Mining: obtain the desired results
- Interpretation/Evaluation: present the results to the user in a meaningful manner
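The five stages above can be chained as plain functions, each feeding the next. This is a toy sketch only: the station records, the single cleansing rule (drop missing values), and the "hot reading above 29 °C" pattern are hypothetical choices made for illustration, not part of the process described on the slide.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
SOURCE_A = [{"station": "HKO", "temp_c": "28.1"}, {"station": "HKO", "temp_c": None}]
SOURCE_B = [{"station": "TKL", "temp_c": "30.4"}, {"station": "HKO", "temp_c": "29.0"}]


def select(*sources):
    """Selection: gather records from several sources."""
    return [rec for src in sources for rec in src]


def preprocess(records):
    """Preprocessing: cleanse by dropping records with a missing temperature."""
    return [r for r in records if r["temp_c"] is not None]


def transform(records):
    """Transformation: convert to a common format of (station, float temperature)."""
    return [(r["station"], float(r["temp_c"])) for r in records]


def mine(rows, threshold=29.0):
    """Data mining (toy pattern): count readings above a hypothetical threshold per station."""
    return Counter(station for station, t in rows if t > threshold)


def evaluate(patterns):
    """Interpretation/evaluation: present the patterns in a readable form."""
    return [f"{station}: {n} hot reading(s)" for station, n in sorted(patterns.items())]


report = evaluate(mine(transform(preprocess(select(SOURCE_A, SOURCE_B)))))
print(report)
```

Keeping each stage a separate function mirrors the slide's point that mining is only one step: any stage can be swapped (for example, imputing instead of dropping missing values) without touching the others.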

Data Mining: A KDD Process
- Data mining is the core of the knowledge discovery process
[Figure: databases → data cleaning and data integration → data warehouse → selection of task-relevant data → data mining → pattern evaluation]

Typical Data Mining Architecture
[Figure: layered architecture, from bottom to top: databases and a data warehouse, fed through data cleaning, data integration, and filtering; a database or data warehouse server; the data mining engine; pattern evaluation; and a graphical user interface. A knowledge base sits alongside the mining engine and pattern evaluation.]

Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center of database systems, statistics, machine learning, visualization, information theory, algorithms, and other disciplines]

Data Mining Is:
- a "hot" term for a class of techniques that find patterns in data
- a user-centric, interactive process that leverages analysis technologies and computing power
- a group of techniques that find relationships that have not previously been discovered
- not reliant on an existing database
- a relatively easy task that requires knowledge of the business problem and subject-matter expertise

Experts and Clients Are Needed to:
- define and redefine problems
- determine the relevant aspects of the problem
- supply the data
- remove errors from the data
- provide constraints on possible patterns
- interpret patterns and possibly reject implausible ones
- evaluate predicted effects, …

Data Mining Techniques

Primary Data Mining Tasks (1): Descriptive Modeling
- Finding a compact description for a large dataset [Concept Description]
- Clustering people or things into groups based on their attributes [Clustering]
- Associating events that are likely to occur together [Association Rules]
- Sequencing events that are likely to lead to later events [Sequential Pattern Analysis]
- Discovering the most significant changes [Deviation Detection]

Primary Data Mining Tasks (2): Predictive Modeling
- Classifying people or things into groups by recognizing patterns [Classification]
- Forecasting what may happen in the future by mapping a data item to a real-valued prediction variable [Regression]

Concept Description
- Characterization: provides a concise, succinct summary of a given collection of data
- Discrimination: provides descriptions that compare two or more collections of data
- Concept description can handle complex attribute data types and is a more automated process

Concept Description: Characterization
[Figure: an initial relation generalized into a more compact generalized relation]
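One common way to turn an initial relation into a generalized relation is attribute-oriented induction: drop attributes with too many distinct values, replace remaining values by higher-level concepts from a concept hierarchy, and merge identical tuples while keeping a count. The sketch below assumes a hypothetical relation (name, city, age) and a hypothetical city → country hierarchy; it is an illustration, not the exact procedure behind the slide's figure.

```python
from collections import Counter

# Hypothetical initial relation: (name, city, age)
INITIAL = [
    ("Ann", "Vancouver", 23),
    ("Bob", "Toronto", 27),
    ("Cai", "Seattle", 24),
    ("Dee", "Boston", 45),
]

# Hypothetical concept hierarchy for the city attribute
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Seattle": "USA", "Boston": "USA"}


def age_range(age):
    """Climb the age hierarchy: an exact age becomes a ten-year bucket such as '20-29'."""
    low = age // 10 * 10
    return f"{low}-{low + 9}"


def generalize(relation):
    """Drop 'name', generalize city and age, then merge identical tuples with a count."""
    return Counter((CITY_TO_COUNTRY[city], age_range(age)) for _, city, age in relation)


generalized = generalize(INITIAL)
for (country, ages), count in sorted(generalized.items()):
    print(country, ages, count)
```

The count column is what makes the generalized relation a summary: four individual tuples collapse into three, and the Canadian 20-29 group now carries a count of 2.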

Clustering
- Cluster: a collection of data objects that are
  - similar to one another within the same cluster
  - dissimilar to the objects in other clusters
- Clustering: grouping a set of data objects into clusters based on the principle of maximizing intra-class similarity and minimizing inter-class similarity
- Examples:
  - Land use: identifying areas of similar land use in an earth observation database
  - City planning: identifying groups of houses according to their house type, value, and geographical location
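The slide does not prescribe a clustering algorithm; k-means is one widely used choice that directly pursues "similar within, dissimilar between" by repeatedly assigning points to the nearest centroid and moving each centroid to the mean of its members. This is a minimal sketch assuming 2-D coordinates and seeding with the first k points; the parcel locations are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if empty)
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical parcel locations (x, y) forming two obvious groups
PARCELS = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, clusters = kmeans(PARCELS, k=2)
print(clusters)
```

Seeding from the first k points keeps the example deterministic; real uses usually try several random seedings, because k-means only finds a local optimum.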

Association Rules
- Association (correlation and causality), e.g.:
  age(X, "20..29") ∧ income(X, "20..29K") ⇒ buys(X, "PC") [support = 2%, confidence = 60%]
- Association rule mining:
  - finding frequent patterns, associations, and correlations among sets of items or objects in transaction databases, relational databases, and other information repositories
  - frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database
- Motivation: finding regularities in data
  - What products are often purchased together?

Example: Association Rules
- Itemsets A1, A2 ⊆ {a1, …, ak}
- Find all rules A1 ⇒ A2 that meet minimum confidence and support:
  - support, s: the probability that a transaction contains A1 ∪ A2
  - confidence, c: the conditional probability that a transaction containing A1 also contains A2

Transaction-id   Items bought
10               a1, a2, a3
20               a1, a3
30               a1, a4
40               a2, a5, a6

With min_support = 50% and min_conf = 50%:
- a1 ⇒ a3 (50%, 66.7%)
- a3 ⇒ a1 (50%, 100%)
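The support and confidence figures in this example can be checked by counting directly over the four transactions. The sketch below only enumerates rules between two single items by brute force, which is enough for this tiny table; it is not Apriori or any other scalable frequent-itemset algorithm.

```python
from itertools import permutations

# Transaction database from the example above
TRANSACTIONS = {
    10: {"a1", "a2", "a3"},
    20: {"a1", "a3"},
    30: {"a1", "a4"},
    40: {"a2", "a5", "a6"},
}


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= items for items in TRANSACTIONS.values()) / len(TRANSACTIONS)


def confidence(lhs, rhs):
    """P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)


def single_item_rules(min_support=0.5, min_conf=0.5):
    """Brute-force check of every rule x => y between two single items."""
    items = sorted(set().union(*TRANSACTIONS.values()))
    found = []
    for x, y in permutations(items, 2):
        s = support({x, y})
        if s >= min_support:  # s > 0 here, so support({x}) > 0 as well
            c = confidence({x}, {y})
            if c >= min_conf:
                found.append((x, y, s, c))
    return found


for x, y, s, c in single_item_rules():
    print(f"{x} => {y} (support {s:.0%}, confidence {c:.1%})")
```

Both rules share the same support because support is symmetric in A1 ∪ A2, while confidence differs because a1 appears in three transactions but a3 in only two.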

Sequential Pattern Analysis
- Given a set of sequences, find the complete set of frequent subsequences
- Applications of sequential patterns:
  - customer shopping sequences, e.g., first buy a computer, then a CD-ROM, and then a digital camera, within 3 months
  - weblog click streams
  - telephone calling patterns
- Example: given a sequence database of (SID, sequence) pairs and a support threshold min_sup = 2, a subsequence contained in at least two sequences is a sequential pattern
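The core test behind sequential pattern mining is subsequence containment: each element (itemset) of the pattern must be a subset of some element of the sequence, in the same order. The sketch below counts support for a candidate pattern in a hypothetical shopping database; it ignores time windows such as "within 3 months" and is a building block, not a full miner such as GSP or PrefixSpan.

```python
def contains(sequence, pattern):
    """True if `pattern` (a list of itemsets) is a subsequence of `sequence`.

    Each pattern element must be a subset of a later element of the sequence;
    greedy left-to-right matching is sufficient for this containment test.
    """
    i = 0
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def seq_support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database.values())


# Hypothetical sequence database: SID -> ordered list of purchased itemsets
DB = {
    1: [{"computer"}, {"cd_rom"}, {"camera"}],
    2: [{"computer", "printer"}, {"camera"}],
    3: [{"printer"}, {"computer"}],
}
MIN_SUP = 2

candidate = [{"computer"}, {"camera"}]
print(seq_support(DB, candidate) >= MIN_SUP)
```

Order matters: the reversed candidate [{"camera"}, {"computer"}] is contained in no sequence, even though both items occur together in two of them.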

Deviation Detection
- Outlier analysis
  - Outlier: a data object that does not comply with the general behavior of the data
  - It can be considered noise or an exception, but is quite useful in fraud detection and rare-event analysis
- Trend and evolution analysis
  - trend and deviation: regression analysis
  - periodicity analysis
  - similarity-based analysis
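A simple statistical reading of "does not comply with the general behavior of the data" is the z-score rule: flag values far from the mean in units of standard deviation. This sketch assumes a threshold of 2 and uses hypothetical rainfall readings; it is one basic outlier test among many, not the method the slide has in mind.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return values more than `threshold` population standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical daily rainfall readings (mm) with one unusual day
readings = [2, 3, 2, 4, 3, 2, 3, 40]
print(zscore_outliers(readings))
```

The outlier itself inflates the mean and standard deviation, so with very few points or several outliers this rule can miss them; robust variants based on the median are often preferred.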

Classification and Regression
- Classification:
  - constructs a model (classifier) from a training set and uses it to classify new data
  - example: climate classification, …
- Regression:
  - models continuous-valued functions, i.e., predicts unknown or missing values
  - example: stock trend prediction, …

Classification (1): Model Construction
[Figure: training data → classification algorithm → classifier (model), e.g., IF rank = 'professor' OR years > 6 THEN tenured = 'yes']

Classification (2): Prediction Using the Model
[Figure: the classifier is evaluated on testing data, then applied to unseen data, e.g., (Jeff, Professor, 4) → tenured?]
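The two classification steps can be made concrete by coding the learned rule as a function, estimating its accuracy on labelled testing data, and then applying it to the unseen tuple (Jeff, Professor, 4). The testing records below are hypothetical, and matching rank case-insensitively ('professor' vs. 'Professor') is an assumption of this sketch.

```python
from typing import NamedTuple


class Faculty(NamedTuple):
    name: str
    rank: str
    years: int


def predict_tenured(person: Faculty) -> str:
    """Rule from the model-construction step:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes' (otherwise 'no').
    Case-insensitive rank matching is an assumption of this sketch."""
    if person.rank.lower() == "professor" or person.years > 6:
        return "yes"
    return "no"


# Hypothetical labelled testing data, used to estimate accuracy before deployment
TESTING = [
    (Faculty("Ann", "Assistant Prof", 2), "no"),
    (Faculty("Bo", "Associate Prof", 7), "no"),
    (Faculty("Cy", "Professor", 5), "yes"),
]
accuracy = sum(predict_tenured(p) == label for p, label in TESTING) / len(TESTING)

# Unseen data from the slide
jeff = Faculty("Jeff", "Professor", 4)
print(jeff.name, "tenured?", predict_tenured(jeff))
```

Jeff is predicted tenured through the rank condition alone, even though 4 years does not exceed 6. The hypothetical second testing record shows why the testing step matters: the "years > 6" clause misclassifies it.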

Classification Techniques
- Decision tree induction
- Bayesian classification
- Neural networks
- Genetic algorithms
- Fuzzy sets and fuzzy logic

Regression
- Regression is similar to classification:
  - first, construct a model
  - second, use the model to predict unknown values
- Methods:
  - linear and multiple regression
  - non-linear regression
- Regression differs from classification:
  - classification predicts categorical class labels
  - regression models continuous-valued functions
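The "construct a model, then predict" pattern for simple linear regression fits in a few lines with ordinary least squares. The data below are synthetic points lying exactly on y = 2x + 1, chosen so the fitted coefficients are easy to verify; multiple and non-linear regression follow the same two steps with richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 1: construct the model from synthetic data on y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Step 2: use the model to predict an unknown value (a continuous number, not a class label)
prediction = slope * 5 + intercept
print(slope, intercept, prediction)
```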

Are All the "Discovered" Patterns Interesting?
- A data mining task may generate thousands of patterns, not all of which are interesting
- Interestingness measures:
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures:
  - Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective: based on the user's beliefs about the data, e.g., unexpectedness, novelty, actionability, etc.

Spatial Data Mining

Spatial Data Mining
- Spatial patterns:
  - spatial outliers
  - location prediction
  - associations, co-locations
  - hotspots, clustering, trends, …
- Primary tasks:
  - mining spatial association rules
  - spatial classification and prediction
  - spatial data clustering analysis
  - spatial outlier analysis
- Example: unusual warming of the Pacific Ocean (El Niño) affects weather in the USA…
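A spatial outlier differs from a plain outlier in that it is unusual relative to its neighbours. One common formulation compares each location's value with the average of its neighbours and then standardizes those differences. This is a minimal sketch under that assumption; the station temperatures, the adjacency along a transect, and the threshold of 2 are hypothetical.

```python
import statistics


def spatial_outliers(values, neighbors, threshold=2.0):
    """Flag locations whose value differs unusually from the mean of their neighbours.

    values:    dict location -> attribute value
    neighbors: dict location -> list of neighbouring locations
    """
    # Neighbourhood difference: own value minus average neighbour value
    diffs = {loc: values[loc] - statistics.mean(values[n] for n in neighbors[loc]) for loc in values}
    d = list(diffs.values())
    mu, sigma = statistics.mean(d), statistics.pstdev(d)
    if sigma == 0:
        return []
    # Standardize the differences and keep the extreme ones
    return [loc for loc, diff in diffs.items() if abs(diff - mu) / sigma > threshold]


# Hypothetical temperatures at 7 stations along a transect; neighbours are adjacent stations
temps = dict(enumerate([20, 21, 20, 21, 35, 21, 20]))
adjacent = {i: [j for j in (i - 1, i + 1) if j in temps] for i in temps}
print(spatial_outliers(temps, adjacent))
```

Stations 3 and 5 also get large neighbourhood differences because their neighbour (station 4) is extreme, but only station 4 exceeds the threshold.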

Spatial Data Mining Results
- Understanding spatial data, discovering relationships between spatial and non-spatial data, constructing spatial knowledge bases, etc.
- Results take various forms:
  - A description of the general weather patterns in a set of geographic regions is a spatial characteristic rule.
  - A comparison of the weather patterns in two geographic regions is a spatial discriminant rule.
  - A rule such as "most cities in Canada are close to the Canada-US border" is a spatial association rule, as is
    near(x, coast) ∧ southeast(x, USA) ⇒ hurricane(x) (70%)
  - Others: spatial clusters, …
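A spatial association rule is evaluated like an ordinary one once each object's spatial predicates (near, southeast_of, …) have been computed. The sketch below assumes those predicates are already available as booleans for a handful of hypothetical regions and computes the support and confidence of near(x, coast) ∧ southeast(x, USA) ⇒ hurricane(x); the 70% on the slide comes from the original study, not from this toy data.

```python
# Hypothetical regions with precomputed spatial predicates and an observed outcome
REGIONS = [
    {"name": "R1", "near_coast": True, "southeast_usa": True, "hurricane": True},
    {"name": "R2", "near_coast": True, "southeast_usa": True, "hurricane": True},
    {"name": "R3", "near_coast": True, "southeast_usa": True, "hurricane": False},
    {"name": "R4", "near_coast": False, "southeast_usa": True, "hurricane": False},
    {"name": "R5", "near_coast": True, "southeast_usa": False, "hurricane": False},
]


def rule_stats(regions, antecedent, consequent):
    """Support and confidence of: AND(antecedent predicates) => consequent."""
    body = [r for r in regions if all(r[p] for p in antecedent)]
    hits = [r for r in body if r[consequent]]
    support = len(hits) / len(regions)
    confidence = len(hits) / len(body) if body else 0.0
    return support, confidence


s, c = rule_stats(REGIONS, ["near_coast", "southeast_usa"], "hurricane")
print(f"near(x, coast) AND southeast(x, USA) => hurricane(x): support {s:.0%}, confidence {c:.0%}")
```

In practice the expensive part is computing the spatial predicates themselves (distance and direction tests against geometry), which is why spatial rule miners often evaluate coarse approximations first.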

What Is Spatial Data?
- Data related to objects that occupy space, e.g., traffic, bird habitats, global climate, logistics, …
- Object types: points, lines, polygons, etc.
- Used in/for:
  - GIS (Geographic Information Systems)
  - meteorology
  - astronomy
  - environmental studies, etc.

Basic Concepts (1)
- Spatial data mining performs the same functions as general data mining, with the end objective of finding patterns in geography, meteorology, etc.
- The main difference is spatial autocorrelation: the neighbors of a spatial object may influence it and therefore have to be considered as well
- Spatial attributes:
  - topological adjacency or inclusion information
  - geometric position (longitude/latitude), area, perimeter, boundary polygon
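Spatial autocorrelation can be quantified; a standard global measure (not named on the slide) is Moran's I, I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where wᵢⱼ are neighbour weights and W is their sum. Positive values mean similar values cluster together; negative values mean neighbours tend to differ. This sketch assumes simple binary weights for cells in a row.

```python
def morans_i(values, weights):
    """Global Moran's I.

    values:  list of attribute values x_i, one per spatial unit
    weights: dict mapping (i, j) -> w_ij for neighbouring units (w_ii = 0)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_sum = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    var = sum(d * d for d in dev)
    return (n / w_sum) * (cross / var)


def chain_weights(n):
    """Binary weights for n cells in a row: each cell neighbours the next (both directions)."""
    w = {}
    for i in range(n - 1):
        w[(i, i + 1)] = 1.0
        w[(i + 1, i)] = 1.0
    return w


clustered = morans_i([1, 1, 5, 5], chain_weights(4))    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain_weights(4))  # neighbours differ
print(clustered, alternating)
```

The same four values give opposite signs depending only on their arrangement, which is exactly the information a non-spatial analysis discards.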

Basic Concepts (2)
- Spatial neighborhood relations:
  - topological relations: "intersect", "overlap", "disjoint", …
  - distance relations: "close_to", "far_away", …
  - direction/orientation relations: "left_of", "west_of", …
- A global model might be inconsistent with regional (local) models
[Figure: global model vs. local model]

Applications
- NASA Earth Observing System (EOS): earth science data
- National Institute of Justice: crime mapping
- Census Bureau, Dept. of Commerce: census data
- Dept. of Transportation (DOT): traffic data
- National Institutes of Health (NIH): cancer clusters
- …

Example: What Kind of Houses Are Highly Valued? (Associative Classification)

Spatial Data Mining Scenarios in Meteorology and Weather Forecasting

Meteorological Data Mining
- Motivation: many analysis methods must be applied to fast-growing data for climate studies
- Results: appropriate presentation instruments (graphs, maps, reports, etc.) must be used
- Examples:
  - spatial outliers can be associated with disastrous natural events such as tornadoes, hurricanes, and forest fires
  - associations between disaster events and certain meteorological observations

Case Studies (1): Astronomy
SKICAT (SKy Image Cataloging and Analysis Tool), Caltech, US
- The Palomar Observatory discovered 22 quasars with the help of data mining applied to the Second Palomar Observatory Sky Survey (POSS-II)
  - decision tree methods
  - classification of galaxies, stars, and other celestial objects
- About 3 TB of sky images were analyzed

Case Studies (2): NCAR & UCAR
National Center for Atmospheric Research (NCAR) & University Corporation for Atmospheric Research (UCAR), US
- "Automatic Fuzzy Logic-based systems now compete with human forecasts" (Richard Wagoner, Deputy Director, Research Applications Program (RAP), NCAR)
- Intelligent Weather System (IWS):
  - detection and forecasting of en-route turbulence, en-route icing, ceiling/visibility, and convective hazards for the aviation community
  - road winter maintenance, airport operations, and flash flood forecasting

Operational Application
- Prediction system: WIND-2 (WIND: "Weather Is Not Discrete")
- Consists of three parts:
  - Data:
    - past airport weather observations: 30 years of hourly observations, a time series of 300,000 detailed observations
    - recent and current observations (METARs)
    - model-based guidance (knowledge of near-term changes, e.g., an imminent wind shift, onset/cessation of precipitation)
  - a fuzzy similarity-measuring algorithm
  - prediction composition: predictions based on the k nearest neighbors (k-NN, an instance-based method)
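The WIND-2 idea of "find similar past situations, then compose a prediction from what happened next" can be sketched as a k-nearest-neighbour analog forecast. WIND-2's actual fuzzy similarity operator is not given here, so the per-variable closeness combined with min() (a common fuzzy AND), the tolerances, the variables, and the archive below are all hypothetical.

```python
# Hypothetical archive of past situations: observations and the visibility that followed
ARCHIVE = [
    {"obs": {"wind_kt": 10, "vis_km": 8}, "vis_next_km": 6},
    {"obs": {"wind_kt": 12, "vis_km": 7}, "vis_next_km": 4},
    {"obs": {"wind_kt": 30, "vis_km": 2}, "vis_next_km": 1},
]
# Hypothetical tolerances: a difference this large counts as "not similar at all"
TOLERANCE = {"wind_kt": 10, "vis_km": 5}


def similarity(current, past, tolerance):
    """Fuzzy-style similarity in [0, 1]: per-variable closeness combined with min()."""
    return min(max(0.0, 1 - abs(current[v] - past[v]) / tolerance[v]) for v in tolerance)


def analog_forecast(current, archive, tolerance, k=2):
    """Similarity-weighted average outcome of the k most similar past cases."""
    scored = sorted(
        ((similarity(current, case["obs"], tolerance), case) for case in archive),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(s for s, _ in scored)
    if total == 0:
        return None  # no usable analog in the archive
    return sum(s * case["vis_next_km"] for s, case in scored) / total


forecast = analog_forecast({"wind_kt": 11, "vis_km": 8}, ARCHIVE, TOLERANCE)
print(round(forecast, 2))
```

Weighting by similarity lets a near-perfect analog dominate, while the min() combination means a single badly mismatched variable is enough to rule a past case out.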

Operational Application
- Hybrid methods are used to predict weather:
  - dynamical approach: based on the equations of the atmosphere; uses finite element techniques
  - empirical approach: similar weather situations lead to similar outcomes
- WIND runs in real time for meteorologically different sites
- The data mining/forecast process takes about one second

Case Studies (3): CrossGrid (EU)
- Objective: to develop, implement, and exploit new Grid components for interactive compute- and data-intensive applications, such as decision support systems for flooding crisis teams, and air pollution combined with weather forecasting
- Main tasks in the meteorological applications package:
  - data mining for atmospheric circulation patterns: find a set of representative prototypes of the atmospheric patterns in a region of interest
  - weather forecasting for maritime applications
  - ocean wave forecasting with models of various complexity

Data
- ERA-15, produced with a T106L31 model (from 1978 to 1994) at … ° resolution
- Terabytes of data
- Approx. 20 variables (such as temperature, humidity, pressure, etc.) at 30 pressure levels of a 360x360-node grid
[Figure: SOM application for data mining and for downscaling weather forecasts (adaptive competitive learning); sub-grid details escape numerical models]

Dept. of Applied Mathematics, Universidad de Cantabria, Santander, Spain

Case Studies (4): Typhoon Image Data Mining
- Objectives:
  - to establish algorithms and database models for discovering information and knowledge useful for typhoon analysis and prediction
  - content-based image retrieval technology to search for similar cloud patterns in the past
  - data mining technology to extract spatio-temporal pattern information that is meaningful from a meteorological viewpoint
- Results: alignment of multiple typhoons, exploration by projection onto a 2D plane, diurnal analysis

Methods
- An archive of approximately 34,000 typhoon images for the northern and southern hemispheres
- Various data mining approaches: principal component analysis (PCA), k-means clustering, self-organizing maps (SOM), wavelet transforms
- Retrieval of similar historical patterns from image databases to perform instance-based typhoon analysis and prediction
- Extracting the eigenvectors of the whole typhoon image collection
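"Extracting the eigenvectors of the whole image collection" amounts to PCA: flatten each image to a vector, mean-centre, form the covariance matrix, and take its leading eigenvectors. The sketch below finds only the first principal component by power iteration in pure Python, using hypothetical 3-pixel "images"; real collections would use an optimized linear algebra library and many more components.

```python
import math


def mean_center(vectors):
    """Subtract the per-pixel mean image from every image vector."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    return [[v[j] - mean[j] for j in range(d)] for v in vectors]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply by the matrix and renormalise.
    Assumes the start vector is not orthogonal to the dominant eigenvector."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical 3-pixel "images"; real typhoon images would have thousands of pixels
IMAGES = [[1, 1, 0], [2, 2, 0.1], [3, 3, 0], [4, 4, -0.1]]
centered = mean_center(IMAGES)
pc1 = leading_eigenvector(covariance(centered))
# Projection of each image onto the first principal component (a 1-D coordinate)
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print([round(x, 3) for x in pc1])
```

Projecting onto the first two components instead of one gives the kind of 2D plane used to explore the typhoon collection.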

Case Studies (5): LEAD
- Linked Environments for Atmospheric Discovery
  - accommodates the real-time, on-demand, and dynamically adaptive nature of mesoscale problems
- Complexities: vastly disparate, high-volume, high-bandwidth data; tremendous computational demands
- Used for accessing, preparing, assimilating, predicting, managing, mining/analyzing, and displaying a broad array of meteorological and related information
- Data Mining Solution Center: ITSC, The University of Alabama in Huntsville, US

ADaM (Algorithm Development and Mining)
- A component-architecture data mining toolkit
- For geophysical phenomena detection and feature extraction
- Applications:
  - detecting tropical cyclones and estimating their maximum sustained wind speed
  - mesocyclone identification from RADAR
  - detecting cumulus cloud fields in GOES images

ADaM (cont'd)
- Mesoscale convective systems detection: Special Sensor Microwave/Imager (SSM/I) brightness temperature swaths from DMSP F13 and F14
- Rain detection using SSM/I
- Lightning detection using OLS
- Rain accumulation study

Case Studies (6): Rainfall Classification
University of Oklahoma, Norman
- Goal: classify significant and interesting features within a two-dimensional spatial field of meteorological data (observed or predicted rainfall)
- Data source: estimates of hourly accumulated rainfall from radar and rain gauge data
- "Attributes" for classification: statistical parameters representing the distribution of rainfall amounts across the region
- Classification method: hierarchical cluster analysis
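Hierarchical cluster analysis starts with every field in its own cluster and repeatedly merges the two closest clusters. The linkage rule used in the Oklahoma study is not stated, so this sketch assumes single linkage (distance between the closest members) and Euclidean distance; the per-field statistics (mean and maximum rain rate) are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(points, n_clusters):
    """Agglomerative clustering: start with singletons, repeatedly merge the two
    clusters whose closest members are nearest, until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(pair):
        a, b = pair
        return min(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])

    while len(clusters) > n_clusters:
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves index a valid
        del clusters[b]
    return clusters


# Hypothetical per-field statistics: (mean rain rate, max rain rate) in mm/h
FIELDS = [(0.5, 2.0), (0.6, 2.5), (5.0, 30.0), (5.5, 28.0), (0.4, 1.8)]
groups = single_linkage(FIELDS, 2)
print(groups)
```

Stopping at a fixed number of clusters is one choice; keeping the full merge history instead yields the dendrogram, which lets analysts pick the cut level after inspecting it.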

Many Others…
- JARtool project (Fayyad et al., NASA): identifying volcanoes on the surface of Venus from images transmitted by the Magellan spacecraft
  - more than 30,000 high-resolution Synthetic Aperture Radar (SAR) images of the surface of Venus from different angles
  - the obtained accuracy was about 80%

What Can We Learn from These Scenarios?
- Data mining is a promising approach for meteorological analysis
- Very strong interaction between scientists and the knowledge discovery system is necessary:
  - the users define features of the meteorological phenomena based on their expert knowledge
  - the system extracts the instances of such phenomena
  - further analysis of the phenomena is then possible

Conclusions

Conclusions
- Data mining: discovering interesting patterns from large amounts of data
- A natural evolution of database technology, in great demand, with wide applications
- A KDD process includes data mining and other steps
- Data mining can be performed on a variety of information repositories
- Data mining tasks: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

And now, discussion