Name: Sujing Wang Advisor: Dr. Christoph F. Eick

Presentation transcript:

A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets. Name: Sujing Wang. Advisor: Dr. Christoph F. Eick. Data Mining & Machine Learning Group

Outline: Introduction; Framework Architecture; Methodology; Case Study; Conclusion and Future Work. Data Mining & Machine Learning, Sujing Wang

Introduction
Spatial Data Mining (SDM): the process of analyzing and discovering interesting and useful patterns, associations, or relationships from large spatial datasets.
Spatial object structure: (<spatial attributes>; <non-spatial attributes>)
Example: [figure]

Introduction
Spatial objects: point, trajectory (line), polygon (region)

Introduction
Challenges:
- Complexity of spatial data types
- Spatial relationships
- Spatial autocorrelation
Motivation:
- Polygons, especially overlapping polygons, are very important for mining spatial datasets.
- Traditional clustering algorithms do not work for spatial polygons.
Research goal:
- Develop new distance functions and new spatial clustering algorithms for polygon clustering.
- Implement novel post-clustering techniques with plug-in reward functions to capture a domain expert's notion of interestingness.

A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets
[Architecture diagram. Labels: Geospatial Datasets; Reward Functions; Spatial Clusters; Poly_SNN; Post-processing; Domain Expert's Notion of Interestingness; DCONTOUR; Meta Clusters; Summaries and Interesting Patterns]

Methodology 1. Domain-Driven Final Clustering Generation Methodology
Inputs:
- A meta-clustering $M = \{X_1, \dots, X_k\}$; at most one object will be selected from each meta-cluster $X_i$ ($i = 1, \dots, k$).
- An individual cluster reward function $Reward_U$, provided by the user, whose values are in $[0, \infty)$.
- A reward threshold $\theta_U$; clusters with low rewards are not included in the final clustering.
- A cluster distance threshold $\theta_d$, which expresses to what extent the user would like to tolerate cluster overlap.
- A cluster distance function $dist$.
Find $Z \subseteq X_1 \cup \dots \cup X_k$ that maximizes: [objective formula not preserved in the transcript]
subject to:
1. $\forall x \in Z \; \forall x' \in Z \; (x \neq x' \Rightarrow dist(x, x') > \theta_d)$
2. $\forall x \in Z \; (Reward_U(x) > \theta_U)$
3. $\forall x \in Z \; \forall x' \in Z \; ((x \in X_i \wedge x' \in X_k \wedge x \neq x') \Rightarrow i \neq k)$
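
The three constraints on this slide can be illustrated with a small sketch. The objective formula itself is not present in the transcript, so the code below assumes the goal is to maximize the sum of the individual cluster rewards. It also uses a simple greedy heuristic, which is not necessarily the selection procedure the authors use and is not guaranteed to find the optimum. The function name `select_final_clustering`, the 1-D interval "clusters", and the helpers `interval_length` and `interval_gap` are all hypothetical stand-ins for polygons, a reward function, and a polygon distance function.

```python
from typing import Callable, List, Sequence, Set, Tuple, TypeVar

C = TypeVar("C")  # type of one cluster (a polygon in the slides; anything here)


def select_final_clustering(
    meta_clusters: Sequence[Sequence[C]],
    reward: Callable[[C], float],
    dist: Callable[[C, C], float],
    theta_u: float,
    theta_d: float,
) -> List[C]:
    """Greedy sketch: pick high-reward clusters that satisfy the constraints.

    ASSUMPTION: the objective is the sum of rewards of the selected clusters.
    """
    # Constraint 2: keep only clusters whose reward exceeds theta_u.
    candidates: List[Tuple[float, int, C]] = [
        (reward(c), i, c)
        for i, meta in enumerate(meta_clusters)
        for c in meta
        if reward(c) > theta_u
    ]
    # Consider the most rewarding clusters first (stable sort keeps ties in order).
    candidates.sort(key=lambda item: item[0], reverse=True)

    selected: List[C] = []
    used_meta: Set[int] = set()
    for _, meta_index, cluster in candidates:
        # Constraint 3: at most one cluster per meta-cluster.
        if meta_index in used_meta:
            continue
        # Constraint 1: every pair of selected clusters is farther apart than theta_d.
        if all(dist(cluster, other) > theta_d for other in selected):
            selected.append(cluster)
            used_meta.add(meta_index)
    return selected


# Hypothetical toy data: 1-D intervals (lo, hi) stand in for polygons.
def interval_length(c: Tuple[float, float]) -> float:
    return c[1] - c[0]


def interval_gap(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    # 0 when the intervals overlap or touch, otherwise the gap between them.
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))


meta = [[(0, 4), (0, 1)], [(3, 5)], [(10, 11)]]
final = select_final_clustering(meta, interval_length, interval_gap, 0.5, 0.0)
print(final)
```

In this toy run, the interval (3, 5) is rejected because it overlaps the already selected (0, 4), and (0, 1) is rejected because its meta-cluster has already contributed a cluster. Raising `theta_d` makes the selection less tolerant of nearby clusters; raising `theta_u` discards more low-reward candidates up front.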

Methodology 2. Finding interesting clusters with respect to a continuous non-spatial variable $v$:
- Let $X_i \in 2^A$ be a cluster in the A-space.
- $\sigma^2$ is the variance of $v$ in dataset $D$.
- $\sigma^2(X_i)$ is the variance of variable $v$ in cluster $X_i$.
- $m_v(X_i)$ is the mean value of variable $v$ in cluster $X_i$.
- $t_1 \geq 0$ is a mean value reward threshold and $t_2 \geq 1$ is a variance reward threshold.
Interestingness function $\varphi$ for each cluster:
$\varphi(X_i) = \max(0, |m_v(X_i)| - t_1) \times \max(0, \sigma^2 - (\sigma^2(X_i) \times t_2))$
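
As a rough illustration of the interestingness function on this slide, the sketch below computes it for one cluster given the values of the variable in that cluster. The function name `interestingness` and its parameters are hypothetical, and the use of the population variance (rather than the sample variance) is an assumption; the slide does not say which is intended.

```python
from statistics import mean, pvariance
from typing import Sequence


def interestingness(
    cluster_values: Sequence[float],
    dataset_variance: float,
    t1: float = 0.0,
    t2: float = 1.0,
) -> float:
    """Reward a cluster whose mean is far from 0 and whose variance is low.

    cluster_values   -- values of the variable v for the objects in the cluster
    dataset_variance -- variance of v over the whole dataset D
    t1               -- mean value reward threshold (t1 >= 0)
    t2               -- variance reward threshold (t2 >= 1)
    """
    cluster_mean = mean(cluster_values)
    # ASSUMPTION: population variance; the slide does not specify the estimator.
    cluster_variance = pvariance(cluster_values)
    mean_term = max(0.0, abs(cluster_mean) - t1)
    variance_term = max(0.0, dataset_variance - cluster_variance * t2)
    return mean_term * variance_term


# A tight cluster with mean 1.0 is rewarded; a cluster centred on 0 is not.
tight = interestingness([1.0, 1.0, 1.0], dataset_variance=0.5, t1=0.5, t2=1.0)
centred = interestingness([-1.0, 1.0], dataset_variance=0.5, t1=0.5, t2=1.0)
print(tight, centred)
```

Both factors must be positive for a cluster to receive any reward: the absolute mean has to exceed `t1`, and the cluster variance scaled by `t2` has to stay below the dataset variance.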

Case Study 1. Meta-clusters generated from multiple spatial datasets: [figure]

Case Study 2. Final clusters with area of polygons as plug-in reward function
Polygon ID: 13, 21, 80, 125, 150
Temperature (°F): 79.0, 86.35, 89.10, 84.10, 88.87
Solar Radiation (Langleys per minute): N/A, 1.33, 1.17, 0.13, 1.10
Wind Speed (miles per hour): 4.50, 6.10, 6.20, 4.90, 5.39
Time of Day: 6 p.m., 1 p.m., 2 p.m., 12 p.m.

Case Study 3. Finding interesting meta-clusters with respect to solar radiation:
Columns: Cluster ID, Mean, Variance, Number of Polygons
Values: 5 -0.9144 0.1981 15 1.1218 0.1334 21 1.0184 0.0350 3

Conclusion & Future Work
Conclusions:
- Our framework can effectively cluster overlapping spatial polygons that are similar in size, shape, and location.
- Our post-clustering techniques with different plug-in reward functions can guide the extraction of interesting patterns and generate summaries from large spatial datasets.
Future work:
- Develop novel spatio-temporal clustering techniques and embed them in our framework.
- Investigate novel change analysis techniques to identify spatial and temporal changes in spatial data.
- Evaluate our framework in challenging case studies.

Publications:
- S. Wang, C.S. Chen, V. Rinsurongkawong, F. Akdag, C.F. Eick, "Polygon-based Methodology for Mining Related Spatial Datasets", ACM SIGSPATIAL GIS Workshop on Data Mining for Geoinformatics (DMG), in conjunction with ACM SIGSPATIAL GIS 2010, San Jose, CA, Nov. 2010. NSF Travel Award for ACM GIS 2010.
- S. Wang, C. Eick, Q. Xu, "A Space-Time Analysis Framework for Mining Geospatial Datasets", CyberGIS'12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012. NSF Travel Award for CyberGIS 2012.
- C. Eick, G. Forestier, S. Wang, Z. Cao, S. Goyal, "A Methodology for Finding Uniform Regions in Spatial Data", CyberGIS'12, the First International Conference on Space, Time, and CyberGIS, University of Illinois at Urbana-Champaign, Champaign, IL, Aug. 6-9, 2012.
- S. Wang, C.F. Eick, "A Polygon-based Clustering and Analysis Framework for Mining Spatial Datasets", Geoinformatica (under review).

Thank you!