Published by Julianna Bennett. Modified over 9 years ago.
1. Mining Weather Data for Decision Support
Roy George
Army High Performance Computing Research Center
Clark Atlanta University, Atlanta, GA 30314
2. Research
- Clustering Algorithms for Data Mining
  - Spatio-Temporal Domain
  - Parallelization of Algorithms
- Algorithms for Feature Extraction and Knowledge Discovery
3. Challenges of Geographical Data
- Complexities associated with data volume: terabyte databases
- Domain complexities:
  - Interesting signals hidden by stronger patterns
  - Complexities caused by local variation
  - Systems are interconnected
- Data gathering and sampling
- Interpretation of aggregated data
- Formalizing the domain
4. Background: Issues with Hard Clustering
- Issue: Hard clustering forces data with imprecision and/or uncertainty into discrete classes
- Result: Important outliers and boundary patterns are missed
- Approach: Use an approximate clustering technique
5. Background: K-Means Clustering
- Partition the data into K homogeneous clusters
- Algorithm:
  1. Select K time series as the initial centroids
  2. Assign every time series to the most similar centroid
  3. Recompute the centroids
  4. Repeat until the centroids do not change
- Variations are based on different measures of similarity
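Here is a minimal plain-Python sketch of the K-means loop on this slide. It assumes equal-length time series and uses Euclidean distance as the similarity measure; other measures would give the variations the slide mentions. The function names are illustrative and do not come from the original work.

```python
import math
import random
from typing import List, Sequence, Tuple

Series = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two equal-length time series (one possible similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_series(members: List[Series]) -> Series:
    """Point-wise average of a group of time series."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


def kmeans(data: List[Series], k: int, max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Series]]:
    rng = random.Random(seed)
    # Step 1: select k of the time series as the initial centroids.
    centroids = [list(s) for s in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Step 2: assign every series to its most similar (nearest) centroid.
        labels = [min(range(k), key=lambda c: euclidean(s, centroids[c])) for s in data]
        # Step 3: recompute each centroid from its members (keep the old one if a cluster empties).
        new_centroids = []
        for c in range(k):
            members = [s for s, lab in zip(data, labels) if lab == c]
            new_centroids.append(mean_series(members) if members else centroids[c])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

On two well-separated groups of short series, the returned labels put each group in its own cluster.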
6. Unsupervised Fuzzy K-Means (UKFM) Clustering
1. Choose the initial number of clusters
2. Develop a clustering using Fuzzy K-Means
3. Merge the cluster pair that has the maximum correlation
4. Compute the validity measure
5. Repeat until the termination condition is reached
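The UKFM loop can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation:
- Fuzzy K-Means uses the standard fuzzy c-means updates with fuzzifier m = 2.
- "Correlation" is the Pearson correlation between cluster centroids.
- A merged centroid is the simple average of the pair.
- The validity measure is the Xie-Beni index. This is a swapped-in choice, because the slide does not name one.
- The loop stops at a hypothetical minimum cluster count `k_min` and returns the clustering with the best (lowest) validity score.

```python
import math
import random
from typing import List, Tuple

Series = List[float]


def dist2(a: Series, b: Series) -> float:
    """Squared Euclidean distance between two series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pearson(a: Series, b: Series) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def fuzzy_kmeans(data: List[Series], centroids: List[Series], m: float = 2.0, iters: int = 50):
    """Standard fuzzy c-means updates; u[j][i] is the membership of series j in cluster i."""
    p = 1.0 / (m - 1.0)  # exponent for squared distances
    for _ in range(iters):
        u = []
        for s in data:
            inv = [max(dist2(s, c), 1e-12) ** -p for c in centroids]
            total = sum(inv)
            u.append([v / total for v in inv])
        centroids = []
        for i in range(len(u[0])):
            w = [row[i] ** m for row in u]
            wsum = sum(w)
            centroids.append([sum(wj * s[t] for wj, s in zip(w, data)) / wsum
                              for t in range(len(data[0]))])
    return u, centroids


def xie_beni(data: List[Series], u, centroids: List[Series], m: float = 2.0) -> float:
    """Xie-Beni validity index (lower is better); needs at least two clusters."""
    compact = sum(u[j][i] ** m * dist2(s, c)
                  for j, s in enumerate(data) for i, c in enumerate(centroids))
    sep = min(dist2(centroids[a], centroids[b])
              for a in range(len(centroids)) for b in range(a + 1, len(centroids)))
    return compact / (len(data) * max(sep, 1e-12))


def ukfm(data: List[Series], k_init: int, k_min: int = 2, m: float = 2.0, seed: int = 0):
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(data, k_init)]  # initial number of clusters
    history: List[Tuple[float, List[Series]]] = []
    while True:
        u, centroids = fuzzy_kmeans(data, centroids, m)  # fuzzy K-means clustering
        history.append((xie_beni(data, u, centroids, m), centroids))  # validity measure
        if len(centroids) <= k_min:  # hypothetical termination condition
            break
        # Merge the centroid pair with the maximum correlation into their average.
        _, a, b = max((pearson(centroids[i], centroids[j]), i, j)
                      for i in range(len(centroids)) for j in range(i + 1, len(centroids)))
        merged = [(x + y) / 2 for x, y in zip(centroids[a], centroids[b])]
        centroids = [c for i, c in enumerate(centroids) if i not in (a, b)] + [merged]
    _, best_centroids = min(history, key=lambda h: h[0])
    return best_centroids, history
```

Each pass through the loop removes one cluster, so `history` records one validity score per cluster count, from `k_init` down to `k_min`.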
7. UKFM Results: Weather Data Set
- Initial: 11 clusters
- Optimal: 8 clusters
- Final: 4 clusters
8. Global Earth Science Data
- Collaborative effort with V. Kumar (UMinn)
- Test bed for UKFM (comparison with existing techniques)
- Data set: Global Sea Level Pressure (1989-1993)
- Ocean Climate Indices (OCIs) capture teleconnections
- Result: UKFM can capture even weaker OCIs using coarse clusters
9. Global Climate Data (Sea Level Pressure)
Intermediate: 60 clusters
10. Global Climate Data (Sea Level Pressure)
Final: 26 clusters
11. Relation with the SOI (Southern Oscillation Index)
12. Integrating Multiple Datasets in UKFM Clustering
- Motivation: a data-based approach to determining “interesting” clusters
- Validate using multiple datasets
- Rule: retain clusters that have supporting data
- Applicable in data-rich environments
13. UKFM Clustering with Multi-Dataset Validation
1. Choose the initial number of clusters
2. Develop a clustering using Fuzzy K-Means
3. Validate the clusters against the other datasets D_i, i = 1, ..., n
4. Merge if the clusters are uncorrelated; otherwise, consider the next candidate pair to merge
5. Repeat until the termination condition is reached
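The merge decision with multi-dataset validation can be sketched as a search over candidate pairs. The sketch assumes candidate pairs are ranked by their correlation in the primary dataset. The validation test is left as a caller-supplied predicate with the hypothetical name `has_supporting_data(a, b)`, because the slides do not say how support in the other datasets D_i is measured. Following the rule "retain clusters that have supporting data", a supported pair is kept apart and the next candidate pair is considered.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[int, int]


def ranked_pairs(corr: List[List[float]]) -> List[Pair]:
    """All cluster pairs, most correlated (in the primary dataset) first."""
    k = len(corr)
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    return sorted(pairs, key=lambda ab: corr[ab[0]][ab[1]], reverse=True)


def next_merge(corr: List[List[float]],
               has_supporting_data: Callable[[int, int], bool]) -> Optional[Pair]:
    """Return the first candidate pair that lacks support in the other datasets.

    Returns None when every pair is supported, which can serve as a termination condition.
    """
    for a, b in ranked_pairs(corr):
        if not has_supporting_data(a, b):
            return (a, b)  # merge this pair
        # Otherwise keep both clusters and consider the next candidate pair.
    return None
```

Keeping the support test as a parameter lets different validation criteria be plugged in without changing the merge loop.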
14. UKFM Multi-Dataset Results
Datasets: Height, Pressure, Temperature, Wind Speed
15. Multi-threading Parallel Algorithm
For each clustering stage:
  For each iteration:
    Slaves: calculate M for each cluster
    Master: normalize M
    Slaves: calculate C for each cluster
    Master: normalize C
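The master/slave scheme can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. The slide does not define M and C. This sketch assumes they are the fuzzy membership matrix and the cluster centroids of fuzzy K-means, since both split naturally into per-cluster work followed by a normalization step. The sketch shows the decomposition only. In CPython the global interpreter lock keeps pure-Python threads from delivering a near-linear speedup; that would need native threads or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Series = List[float]


def unnormalized_membership(data: List[Series], centroid: Series, p: float) -> List[float]:
    """Slave task: raw membership weight of every series for one cluster."""
    return [max(sum((x - y) ** 2 for x, y in zip(s, centroid)), 1e-12) ** -p for s in data]


def centroid_parts(data: List[Series], u_row: List[float], m: float) -> Tuple[List[float], float]:
    """Slave task: weighted sum and total weight for one cluster's centroid."""
    w = [uj ** m for uj in u_row]
    sums = [sum(wj * s[t] for wj, s in zip(w, data)) for t in range(len(data[0]))]
    return sums, sum(w)


def parallel_fcm_iteration(data: List[Series], centroids: List[Series],
                           m: float = 2.0, workers: int = 4):
    p = 1.0 / (m - 1.0)  # exponent applied to squared distances
    k, n = len(centroids), len(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Slaves: calculate M (raw memberships) for each cluster independently.
        raw = list(pool.map(lambda c: unnormalized_membership(data, c, p), centroids))
        # Master: normalize M so each series' memberships sum to 1 across clusters.
        col_tot = [sum(raw[i][j] for i in range(k)) for j in range(n)]
        M = [[raw[i][j] / col_tot[j] for j in range(n)] for i in range(k)]
        # Slaves: calculate C (centroid numerator and denominator) for each cluster.
        parts = list(pool.map(lambda row: centroid_parts(data, row, m), M))
    # Master: normalize C by dividing each weighted sum by its total weight.
    C = [[v / total for v in sums] for sums, total in parts]
    return M, C
```

Each call performs one iteration. A full run repeats it until the centroids settle, inside each clustering stage.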
16. Multi-threading Result
- Implemented on a Sun Fire workstation with four 900 MHz UltraSPARC® III processors
- Near-linear speedup obtained
17. Relevance to the Army
- Directly supports the FBKOF STO (B. Broome)
- Development of the Weather Information and Tactical Support (WITS) System
18. Weather Information and Tactical Support (WITS)
Objective: Extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning
19. Approach
- Development of an OLAP Weather Repository
  - Georgia weather (1981-2002)
  - Sources: National Weather Service, GA Env. Network
- Development of WITS Modules
  - Ad-hoc Querying
  - Real-time Analysis and Planning
  - Effects on Army Systems: Integration with IWEDA
  - Abstract Data Representation
20. WITS System Design
21. WITS/IQ
22. WITS/IQ
23. WITS/IWEDA
24. WITS/Analysis
25. WITS/Analysis
26. Work in Progress
- Characterization of analysis queries
- Incorporation of data mining algorithms into WITS
- Enhancement of WITS/TAPS
- Implementation of WITS/Real
27. Hybrid Genetic Fuzzy Systems for Feature Extraction and Knowledge Discovery
28. Project Goals
- Design and implement a hybrid genetic fuzzy system for knowledge discovery
- Develop APIs/tools
- Apply the tools to Army-related problems
29. Contribution
- A hybrid system based on the Simple Genetic Algorithm (SGA)
- Enhanced the SGA by adding three levels of knowledge discovery:
  - Level 1: Discovers up to k possible rules for a given set of inputs and outputs, then attempts to minimize the number of rules and tune the knowledge base
  - Level 2: Takes the set of rules from Level 1 and further minimizes them, while also tuning the knowledge base
  - Level 3: Makes a final attempt to further tune the architecture of the knowledge base
30. Rule Discovery
- Search for k rules from the set of p possible rules; k is an input parameter of the GA application
- Discover the smallest value of k, thereby reducing the number of rules needed
- Example rules:
  - If INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high
  - If INPUT_1 is high THEN OUTPUT_1 is low
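To make the example rules concrete, the sketch below evaluates them for given inputs. The following choices are illustrative assumptions; the slides do not specify the inference method the GA application uses:
- triangular membership functions for low/medium/high on a normalized [0, 1] range
- min for AND
- the weighted average of the consequent terms' peaks as output, a zero-order Sugeno-style defuzzification

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical triangular membership functions on a normalized [0, 1] range: (left, peak, right).
TERMS: Dict[str, Tuple[float, float, float]] = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# A rule is (list of (input name, term) antecedents joined by AND, output term).
Rule = Tuple[List[Tuple[str, str]], str]

RULES: List[Rule] = [
    ([("INPUT_1", "low"), ("INPUT_2", "medium")], "high"),  # If INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high
    ([("INPUT_1", "high")], "low"),                          # If INPUT_1 is high THEN OUTPUT_1 is low
]


def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership degree of x for a term with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def infer(rules: List[Rule], inputs: Dict[str, float]) -> Optional[float]:
    """Fire each rule with min-AND, then average the consequent peaks weighted by firing strength."""
    num = den = 0.0
    for antecedents, out_term in rules:
        strength = min(tri(inputs[var], *TERMS[term]) for var, term in antecedents)
        num += strength * TERMS[out_term][1]
        den += strength
    return num / den if den > 0 else None  # None: no rule fired
```

For example, with INPUT_1 = 0.0 and INPUT_2 = 0.5 only the first rule fires, so OUTPUT_1 is the peak of "high", 1.0. The GA's job is to choose which candidate rules like these to keep.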
31. Relevance to the Army
- Collaborators: Jeff Passner, John Raby (ARL)
- IMETS weather modeling
- Post-processing used to predict additional parameters (visibility, turbulence, fog, etc.)
- Use of knowledge discovery to predict these parameters
32. Visibility Application
- Generate and tune a system that can predict visibility from input parameters
- Tasks for the fuzzy genetic system:
  - Search for a set of k rules, from p possible rules, that describes the relationship between the input parameters and the output (visibility)
  - Concurrently discover the architecture and optimize the performance of the knowledge base with respect to the k rules
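The search for a small set of k rules out of p candidates can be sketched as a binary genetic algorithm. Bit i of a chromosome means "candidate rule i is used". This is a hedged sketch and not the authors' three-level system. It replaces the SGA's usual roulette-wheel selection with binary tournament selection and adds elitism. `toy_fitness` is a hypothetical stand-in for classifier accuracy (for example, on visibility data) minus a per-rule penalty, which pushes the search toward smaller k.

```python
import random
from typing import Callable, List, Optional, Tuple

Bits = List[int]


def tournament(pop: List[Bits], fitness: Callable[[Bits], float], rng: random.Random) -> Bits:
    """Binary tournament: the fitter of two randomly chosen chromosomes."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def sga_select_rules(p: int, fitness: Callable[[Bits], float], pop_size: int = 30,
                     generations: int = 60, p_cross: float = 0.8,
                     p_mut: Optional[float] = None, seed: int = 0) -> Tuple[Bits, List[float]]:
    rng = random.Random(seed)
    p_mut = 1.0 / p if p_mut is None else p_mut
    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            mom, dad = tournament(pop, fitness, rng), tournament(pop, fitness, rng)
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randint(1, p - 1)
                child = mom[:cut] + dad[cut:]
            else:
                child = mom[:]
            # Bit-flip mutation: add or drop candidate rules at random.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


NEEDED = {1, 4}  # hypothetical: pretend candidate rules 1 and 4 explain the data


def toy_fitness(bits: Bits) -> float:
    chosen = {i for i, g in enumerate(bits) if g}
    accuracy = len(chosen & NEEDED) / len(NEEDED)
    return accuracy - 0.05 * len(chosen)  # per-rule penalty favors a smaller k
```

Because of elitism, the best fitness in `history` never decreases from one generation to the next. In a real application `toy_fitness` would be replaced by an evaluation of the selected fuzzy rules on held-out data.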
33. Results for Low Visibility Classifier
34. Results for Medium Visibility Classifier