Mining Weather Data for Decision Support Roy George Army High Performance Computing Research Center Clark Atlanta University Atlanta, GA 30314.


2 Research: Clustering algorithms for data mining; spatio-temporal domain; parallelization of algorithms; algorithms for feature extraction and knowledge discovery.

3 Challenges of Geographical Data: Complexities associated with data volume (terabyte databases). Domain complexities: interesting signals hidden by stronger patterns; complexities caused by local variation; systems are interconnected. Data gathering and sampling. Interpretation of aggregated data. Formalizing the domain.

4 Background: Issues with Hard Clustering. Issue: data with imprecision and/or uncertainty is forced into discrete classes. Result: important outliers and boundary patterns are missed. Approach: use an approximate clustering technique.

5 Background: K-Means Clustering. Partition the data into K homogeneous clusters. Algorithm: (1) select K time series as initial centroids; (2) assign every time series to the most similar centroid; (3) recompute the centroids; (4) repeat steps 2 and 3 until the centroids no longer change. Variations are based on different measures of similarity.
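The four steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: it assumes Euclidean distance as the similarity measure, takes the first K series as the initial centroids, and uses made-up toy data; the names `k_means`, `euclidean`, and `mean_series` are hypothetical.

```python
import math

def euclidean(a, b):
    # Assumed similarity measure; the slide notes that other measures give variations.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_series(series_list):
    # Point-by-point average of equally long time series.
    n = len(series_list)
    return [sum(vals) / n for vals in zip(*series_list)]

def k_means(series, k, max_iter=100):
    # Step 1: select K time series as initial centroids (here simply the first K).
    centroids = [list(s) for s in series[:k]]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every series to the most similar (nearest) centroid.
        labels = [min(range(k), key=lambda j: euclidean(s, centroids[j]))
                  for s in series]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            new_centroids.append(mean_series(members) if members else centroids[j])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Toy data (illustrative only): two low-valued and two high-valued series.
series = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.9], [1.2, 0.9, 1.0], [4.8, 5.1, 5.0]]
labels, centroids = k_means(series, k=2)
print(labels)
```

Swapping `euclidean` for a correlation-based distance would give one of the similarity-based variations the slide mentions.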

6 Unsupervised Fuzzy K-Means (UKFM) Clustering: (1) choose the initial number of clusters; (2) develop a clustering using Fuzzy K-Means; (3) merge the cluster pair that has maximum correlation; (4) compute the validity measure; (5) repeat until the termination condition is reached.
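A rough sketch of this loop follows. The slide does not name the validity measure, the correlation statistic, or the termination condition, so the code fills those in with assumptions: the partition coefficient as the validity measure, Pearson correlation between centroids to pick the merge pair, the average of the two centroids as the merged centroid, and a minimum cluster count `k_min` as the stopping rule. All function names and the toy data are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def memberships(data, centroids, m, eps=1e-9):
    # Standard fuzzy membership: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
    u = []
    for x in data:
        d = [dist(x, c) + eps for c in centroids]
        u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d) for dj in d])
    return u

def fuzzy_k_means(data, centroids, m=2.0, iters=50):
    u = memberships(data, centroids, m)
    for _ in range(iters):
        # Each centroid is the membership-weighted mean of all series.
        centroids = [
            [sum((row[j] ** m) * x[t] for row, x in zip(u, data)) /
             sum(row[j] ** m for row in u) for t in range(len(data[0]))]
            for j in range(len(centroids))
        ]
        u = memberships(data, centroids, m)
    return u, centroids

def partition_coefficient(u):
    # Assumed validity measure: mean of squared memberships, in [1/k, 1].
    return sum(v * v for row in u for v in row) / len(u)

def ukfm(data, k_init, k_min=2):
    centroids = [list(x) for x in data[:k_init]]   # choose initial clusters
    history = []
    while True:
        u, centroids = fuzzy_k_means(data, centroids)            # fuzzy clustering
        history.append((len(centroids), partition_coefficient(u)))  # validity
        if len(centroids) <= k_min:                # assumed termination condition
            break
        pairs = [(i, j) for i in range(len(centroids))
                 for j in range(i + 1, len(centroids))]
        # Merge the pair of clusters whose centroids are most correlated.
        i, j = max(pairs, key=lambda p: pearson(centroids[p[0]], centroids[p[1]]))
        merged = [(a + b) / 2 for a, b in zip(centroids[i], centroids[j])]
        centroids = [c for n, c in enumerate(centroids) if n not in (i, j)] + [merged]
    return history, centroids

# Toy time series (illustrative only): rising, falling, and oscillating shapes.
data = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2],
    [4.0, 3.0, 2.0, 1.0], [4.1, 2.9, 2.1, 0.9],
    [1.0, 3.0, 1.0, 3.0], [0.9, 3.1, 1.1, 2.9],
]
history, final_centroids = ukfm(data, k_init=4, k_min=2)
for k, validity in history:
    print(k, round(validity, 3))
```

The `history` list records the validity score at each cluster count, which is the information needed to pick an "optimal" stage between the initial and final clusterings.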

7 UKFM Results, Weather Data Set. Initial: 11 clusters. Optimal: 8 clusters. Final: 4 clusters.

8 Global Earth Science Data. Collaborative effort with V. Kumar (UMinn). Test bed for UKFM (comparison with existing techniques). Data set: Global Sea Pressure (1989 – 1993). Ocean Climate Indices (OCIs) capture teleconnections. Result: UKFM can capture even weaker OCIs using coarse clusters.

9 Global Climate Data (Sea Level Pressure) Intermediate: 60 Clusters

10 Global Climate Data (Sea Level Pressure) Final: 26 Clusters

11 Relation with SOI

12 Integrating Multiple Datasets in UKFM Clustering. Motivation: a data-based approach to determining "interesting" clusters; validate using multiple datasets. Rule: retain clusters that have supporting data. Applicable in data-rich environments.

13 UKFM Clustering with Multi-Dataset Validation: (1) choose the initial number of clusters; (2) develop a clustering using Fuzzy K-Means; (3) validate the cluster with the other datasets D_i, i = 1, ..., n; (4) merge if the clusters are uncorrelated, else consider the next candidate pair to merge; (5) repeat until the termination condition is reached.
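The slide is terse about how validation gates the merge, so the following is only one possible reading: candidate pairs are tried in order of decreasing correlation, and a pair is merged only if none of the other datasets supports keeping its two clusters apart. The function `pick_merge_pair`, the `keeps_separate` predicate, and the toy support table are all hypothetical.

```python
def pick_merge_pair(ranked_pairs, other_datasets, keeps_separate):
    """Return the first candidate pair that no other dataset supports keeping apart.

    ranked_pairs   -- candidate cluster pairs, most correlated first (assumed order)
    other_datasets -- identifiers of the validating datasets D_1 .. D_n
    keeps_separate -- hypothetical predicate (pair, dataset) -> bool
    """
    for pair in ranked_pairs:
        if not any(keeps_separate(pair, d) for d in other_datasets):
            return pair          # no supporting data: merge this pair
    return None                  # every pair is supported: nothing to merge

# Toy example: the temperature data supports keeping clusters A and B distinct.
support = {"temperature": {("A", "B")}, "windspeed": set()}
ranked_pairs = [("A", "B"), ("B", "C")]
chosen = pick_merge_pair(
    ranked_pairs,
    list(support),
    lambda pair, name: pair in support[name],
)
print(chosen)
```

Here the most correlated pair is skipped because one dataset backs it, and the next candidate is merged instead, which matches the "else consider next candidate pair" step.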

14 UKFM Multi-Dataset Results: Height, Pressure, Temperature, Windspeed

15 Multi-threading Parallel Algorithm. For each clustering stage, for each iteration: (1) slaves calculate M for each cluster; (2) master normalizes M; (3) slaves calculate C for each cluster; (4) master normalizes C.
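One iteration of this master/slave pattern might look like the sketch below. The slide does not define M and C; the code assumes M is the fuzzy membership matrix and C the set of cluster centroids, with workers computing unnormalized per-cluster values and the master doing the normalization. It uses Python's `concurrent.futures.ThreadPoolExecutor` purely to show the structure; in standard CPython, CPU-bound threads generally do not yield the speedup reported for the original implementation. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def raw_membership(data, centroid, m=2.0, eps=1e-9):
    # Slave task: unnormalized membership of every point in ONE cluster.
    out = []
    for x in data:
        d2 = sum((a - b) ** 2 for a, b in zip(x, centroid)) + eps
        out.append((1.0 / d2) ** (1.0 / (m - 1.0)))
    return out

def raw_centroid(data, weights, m=2.0):
    # Slave task: weighted coordinate sums and total weight for ONE cluster.
    w = [u ** m for u in weights]
    sums = [sum(wi * x[t] for wi, x in zip(w, data)) for t in range(len(data[0]))]
    return sums, sum(w)

def parallel_iteration(data, centroids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Slaves: calculate M for each cluster.
        raw_m = list(pool.map(lambda c: raw_membership(data, c), centroids))
        # Master: normalize M so each point's memberships sum to 1.
        totals = [sum(col) for col in zip(*raw_m)]
        M = [[v / t for v, t in zip(row, totals)] for row in raw_m]
        # Slaves: calculate C for each cluster.
        raw_c = list(pool.map(lambda row: raw_centroid(data, row), M))
    # Master: normalize C by dividing each weighted sum by its total weight.
    C = [[s / total for s in sums] for sums, total in raw_c]
    return M, C

# Toy data (illustrative only): two tight groups of 2-D points.
data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
M, C = parallel_iteration(data, centroids)
print(C)
```

Note that `M` is indexed as `M[cluster][point]`, so that each worker owns one row, which mirrors the per-cluster division of labor on the slide.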

16 Multi-threading Result: Implemented on a Sun Fire workstation with four 900-MHz UltraSPARC® III processors. Near-linear speedup obtained.

17 Relevance to the Army: Directly supports the FBKOF STO (B. Broome). Development of the Weather Information and Tactical Support (WITS) System.

18 Weather Information and Tactical Support (WITS). Objective: extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning.

19 Approach. Development of an OLAP weather repository: GA Weather ( ); sources: Nat. Weather Svc, GA Env. Network. Development of WITS modules: ad-hoc querying; real-time analysis and planning; effects on Army systems; integration with IWEDA; abstract data representation.

20 WITS System Design

21 WITS/IQ

22 WITS/IQ

23 WITS/IWEDA

24 WITS/Analysis

25 WITS/Analysis

26 Work in Progress: Characterization of analysis queries; incorporation of data mining algorithms into WITS; enhancement of WITS/TAPS; implementation of WITS/Real.

27 Hybrid Genetic Fuzzy Systems for Feature Extraction and Knowledge Discovery

28 Project Goals Design and implement hybrid genetic fuzzy system for knowledge discovery. Develop API/Tools. Apply tools to Army related problems.

29 Contribution Hybrid system based on the Simple Genetic Algorithm (SGA). Enhanced the SGA by adding three levels of knowledge discovery. Level 1: Discovers up to k possible rules for a given set of inputs and outputs. It then attempts to minimize the number of rules and tune the knowledge base. Level 2: Takes the set of rules from Level 1 and further minimizes the rules. In addition, it also tunes the knowledge base. Level 3: Makes one last attempt to further tune the architecture of the knowledge base.

30 Rule Discovery. Search for k possible rules from the set of p possible rules. k is an input parameter of the GA application. Discover the smallest value of k, thereby reducing the number of rules needed. Example rules: (1) IF INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high; (2) IF INPUT_1 is high THEN OUTPUT_1 is low.
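To make the example rules concrete, the sketch below evaluates how strongly each one fires for a given pair of inputs. It covers only rule evaluation, not the genetic search. The membership functions for "low", "medium", and "high" on a 0-to-1 scale, the use of `min` for AND, and the sample input values are all assumptions for illustration; the slides do not specify them.

```python
def low(x):
    # Assumed shape: full membership at 0, falling linearly to 0 at 0.5.
    return max(0.0, min(1.0, (0.5 - x) / 0.5))

def medium(x):
    # Assumed shape: triangle peaking at 0.5, reaching 0 at 0 and 1.
    return max(0.0, 1.0 - abs(x - 0.5) / 0.5)

def high(x):
    # Assumed shape: 0 up to 0.5, rising linearly to full membership at 1.
    return max(0.0, min(1.0, (x - 0.5) / 0.5))

TERMS = {"low": low, "medium": medium, "high": high}

# The two example rules from the slide: (antecedent, consequent).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, ("OUTPUT_1", "high")),
    ({"INPUT_1": "high"}, ("OUTPUT_1", "low")),
]

def firing_strength(antecedent, inputs):
    # AND is taken as the minimum of the membership degrees (an assumption).
    return min(TERMS[term](inputs[name]) for name, term in antecedent.items())

inputs = {"INPUT_1": 0.2, "INPUT_2": 0.5}   # hypothetical normalized readings
strengths = [firing_strength(a, inputs) for a, _ in RULES]
for (_, consequent), s in zip(RULES, strengths):
    print(consequent, round(s, 3))
```

With these inputs the first rule fires at about 0.6 and the second not at all. A genetic algorithm of the kind described would treat the choice of rules and the membership-function parameters as the quantities to search over.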

31 Relevance to the Army. Collaborators: Jeff Passner, John Raby (ARL). IMETS weather modeling: post-processing is used to predict additional parameters (visibility, turbulence, fog, etc.). Use of knowledge discovery to predict parameters.

32 Visibility Application. Generate and tune a system that can predict visibility based on input parameters. Tasks for the fuzzy genetic system: (1) search for a set of k rules from p possible rules that describe the relationship of the input parameters with the output (visibility); (2) concurrently discover the architecture and optimize the performance of the knowledge bases in relation to the k rules.

33 Results for Low Visibility Classifier

34 Results for Medium Visibility Classifier