Department of Computer Science Research Areas and Projects 1. Data Mining and Machine Learning Group (http://www2.cs.uh.edu/~UH-DMML/index.html), research.

Presentation transcript:

Department of Computer Science
Research Areas and Projects
1. Data Mining and Machine Learning Group (http://www2.cs.uh.edu/~UH-DMML/index.html). Its research focuses on:
   1. Spatial Data Mining
   2. Clustering
   3. Helping Scientists to Find Interesting Patterns in their Data
   4. Classification and Prediction
2. Current Projects
   1. Extracting Regional Knowledge from Spatial Datasets
   2. Analyzing Related Spatial Datasets
   3. Mining Location Data (Trajectory Mining, Co-location Mining, …)
   4. Repository Clustering
   5. Frameworks and Algorithms for Task-driven Clustering
Christoph F. Eick

KDD / Data Mining: Let us find something interesting!
- Motivation: We are drowning in data, but we are starving for knowledge.
- Definition: "KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" (Fayyad)
- Many commercial and experimental tools and tool suites are available.
- Data mining has become a large research field with top conferences attracting paper submissions.

Data Mining & Machine Learning Group ACM-GIS08

Extracting Regional Knowledge from Spatial Datasets, Part 1
RD-Algorithm
- Application 1: Supervised Clustering [EVJW07]
- Application 2: Regional Association Rule Mining and Scoping [DEWY06, DEYWN07]
- Application 3: Find Interesting Regions with respect to a Continuous Variable [CRET08]
- Application 4: Regional Co-location Mining Involving Continuous Variables [EPWSN08]
- Application 5: Find "representative" regions (Sampling)
- Application 6: Regional Regression [CE09]
- Application 7: Multi-Objective Clustering [JEV09]
- Application 8: Change Analysis in Spatial Datasets [RE09]
Figure: Wells in Texas. Green: safe well with respect to arsenic; red: unsafe well. The two panels are labeled with parameter values 1.01 and 1.04.

Extracting Regional Knowledge from Spatial Datasets, Part 2
Objective: Develop and implement an integrated framework to automatically discover interesting regional patterns in spatial datasets.
Framework for Mining Regional Knowledge (diagram components):
- Spatial Databases
- Integrated Data Set
- Domain Experts
- Measures of Interestingness
- Fitness Functions
- Family of Clustering Algorithms
- Regional Association Rule Mining Algorithms
- Ranked Set of Interesting Regions and their Properties
- Regional Knowledge
Figures: Hierarchical Grid-based & Density-based Algorithms; Spatial Risk Patterns of Arsenic.
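How a fitness function can turn a measure of interestingness into a ranked set of regions may be easier to see in a small sketch. The slide does not state a scoring formula, so the form used here (interestingness multiplied by region size raised to a parameter `beta`) is an assumption made only for illustration. The `excess_over_threshold` measure, the threshold of 10.0, and the sample values are likewise hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def excess_over_threshold(values: Sequence[float], threshold: float = 10.0) -> float:
    """Hypothetical interestingness: how far the regional mean exceeds a threshold."""
    mean = sum(values) / len(values)
    return max(0.0, mean - threshold)


def rank_regions(regions: Dict[str, Sequence[float]],
                 interestingness: Callable[[Sequence[float]], float],
                 beta: float = 1.01) -> List[Tuple[float, str]]:
    """Score each region and return (score, name) pairs, best first.

    Assumed scoring form (not taken from the slides):
    interestingness(region) * size(region) ** beta.
    """
    scored = [(interestingness(v) * len(v) ** beta, name)
              for name, v in regions.items()]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Made-up measurements, one list of values per candidate region.
    sample = {
        "A": [12.0, 14.0, 16.0],
        "B": [5.0, 6.0],
        "C": [30.0],
    }
    for score, name in rank_regions(sample, excess_over_threshold):
        print(f"{name}: {score:.2f}")
```

With `beta` slightly above 1, larger regions receive a mild bonus over small ones of equal interestingness; swapping in a different measure only requires passing another function.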

Mining Spatial Trajectories
- Goal: Understand and characterize motion patterns.
- Themes investigated: clustering and summarization of trajectories, classification based on trajectories, likelihood assessment of trajectories, prediction of trajectories.

Finding Regional Co-location Patterns in Spatial Datasets
Objective: Find co-location regions using various clustering algorithms and novel fitness functions.
Applications:
1. Finding regions on planet Mars where shallow and deep ice are co-located, using point and raster datasets. In Figure 1, regions in red have very high co-location and regions in blue have anti co-location.
2. Finding co-location patterns involving chemical concentrations with values on the wings of their statistical distribution in Texas' ground water supply. Figure 2 indicates discovered regions and their associated chemical patterns.
Figure 1: Co-location regions involving deep and shallow ice on Mars
Figure 2: Chemical co-location patterns in Texas water supply
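One way to make "values on the wings of their statistical distribution" concrete is to standardize each variable and look at products of z-scores inside a region. The sketch below is only an illustration under that assumption; the slides do not give the fitness function, and the function names and sample numbers are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values using the population mean and standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined for a constant variable")
    return [(v - mean) / std for v in values]


def colocation_strength(z_a: Sequence[float],
                        z_b: Sequence[float],
                        region: Sequence[int]) -> float:
    """Mean product of two variables' z-scores over the objects in a region.

    Assumed reading: a clearly positive value suggests co-location (both
    variables on the same wing), a clearly negative value suggests
    anti co-location (opposite wings).
    """
    return sum(z_a[i] * z_b[i] for i in region) / len(region)


if __name__ == "__main__":
    # Made-up measurements of two variables at four sample points.
    shallow = z_scores([1.0, 2.0, 3.0, 4.0])
    deep = z_scores([1.0, 2.0, 3.0, 4.0])
    region = [0, 3]  # indices of the objects that fall inside the region
    print(colocation_strength(shallow, deep, region))
```

A clustering algorithm could then search for regions that maximize the absolute value of this quantity, which is one possible interpretation of the fitness-function approach named in the objective.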

Methodologies and Tools to Analyze Related Spatial Datasets
Subtopics:
- Disparity Analysis / Emergent Pattern Discovery ("how do two groups differ with respect to their patterns?")
- Change Analysis ("what is new/different?")
- Correspondence Clustering ("mining interesting relationships between two or more datasets")
- Meta Clustering ("find similarities between multiple datasets")
- Analyzing Relationships between Polygonal Cluster Models
Example: Analyze changes with respect to regions of high variance of earthquake depth.
Novelty change predicate: $\mathrm{Novelty}(r') = r' - (r_1 \cup \dots \cup r_k)$
Figure: Emerging regions based on the novelty change predicate (Time 1, Time 2).
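The novelty change predicate can be read as a set difference: the part of a new region r' that none of the earlier regions r1 … rk covers. The sketch below assumes that regions are represented as sets of grid cells and that r1 … rk are combined by set union; the function name and the sample cells are hypothetical.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (row, column) of a grid cell; an assumed representation


def novelty(new_region: Set[Cell], old_regions: Iterable[Set[Cell]]) -> Set[Cell]:
    """Return the cells of new_region not covered by any of the old regions."""
    covered: Set[Cell] = set().union(*old_regions)
    return set(new_region) - covered


if __name__ == "__main__":
    # Made-up regions at time 1 and one region at time 2.
    time1 = [{(0, 0), (0, 1)}, {(1, 1)}]
    time2_region = {(0, 1), (1, 1), (2, 2)}
    print(novelty(time2_region, time1))
```

A non-empty result marks an emerging area; applying the function to every region found at time 2 yields all emerging regions under this reading.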

Selected Related Publications
1. T. Stepinski, W. Ding, and C. F. Eick, Controlling Patterns of Geospatial Phenomena, to appear in Geoinformatica, Spring.
2. V. Rinsurongkawong and C. F. Eick, Correspondence Clustering: An Approach to Cluster Multiple Related Spatial Datasets, to appear in Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), acceptance rate: 10%, Hyderabad, India, June.
3. C.-S. Chen, V. Rinsurongkawong, A. Nagar, and C. F. Eick, Mining Trajectories using Non-Parametric Density Functions, submitted to a conference, February.
4. W. Ding, T. Stepinski, D. Jiang, R. Parmar, and C. F. Eick, Discovery of Feature-based Hot Spots Using Supervised Clustering, in International Journal of Computers & Geosciences, Elsevier, March.
5. R. Jiamthapthaksin, C. F. Eick, and V. Rinsurongkawong, An Architecture and Algorithms for Multi-Run Clustering, CIDM, Nashville, Tennessee, April.
6. C.-S. Chen, V. Rinsurongkawong, C. F. Eick, and M. Twa, Change Analysis in Spatial Data by Combining Contouring Algorithms with Supervised Density Functions, in Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), acceptance rate: 29%, Bangkok, May.
7. J. Thomas and C. F. Eick, Online Learning of Spacecraft Simulation Models, in Proc. of the 21st Innovative Applications of Artificial Intelligence Conference (IAAI), acceptance rate: 30%, Pasadena, California, July.
8. R. Jiamthapthaksin, C. F. Eick, and R. Vilalta, A Framework for Multi-Objective Clustering and its Application to Co-Location Mining, in Proc. Fifth International Conference on Advanced Data Mining and Applications (ADMA), acceptance rate: 12%, Beijing, China, August.
9. O. U. Celepcikay and C. F. Eick, REG^2: A Regional Regression Framework for Geo-Referenced Datasets, in Proc. 17th ACM SIGSPATIAL International Conference on Advances in GIS (ACM-GIS), acceptance rate: 20%, Seattle, Washington, November.
10. W. Ding, R. Jiamthapthaksin, R. Parmar, D. Jiang, T. Stepinski, and C. F. Eick, Towards Region Discovery in Spatial Datasets, in Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), acceptance rate: 12%, Osaka, Japan, May.
11. C. F. Eick, R. Parmar, W. Ding, T. Stepinski, and J.-P. Nicot, Finding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets, in Proc. 16th ACM SIGSPATIAL International Conference on Advances in GIS (ACM-GIS), acceptance rate: 19%, Irvine, California, November.
12. J. Choo, R. Jiamthapthaksin, C.-S. Chen, O. Celepcikay, C. Giusti, and C. F. Eick, MOSAIC: A Proximity Graph Approach to Agglomerative Clustering, in Proc. 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), acceptance rate: 29%, Regensburg, Germany, September.
13. C. F. Eick, B. Vaezian, D. Jiang, and J. Wang, Discovery of Interesting Regions in Spatial Datasets Using Supervised Clustering, in Proc. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), acceptance rate: 13%, Berlin, Germany, September.
14. W. Ding, C. F. Eick, J. Wang, and X. Yuan, A Framework for Regional Association Rule Mining in Spatial Datasets, in Proc. IEEE International Conference on Data Mining (ICDM), acceptance rate: 19%, Hong Kong, China, December.
15. A. Bagherjeiran, C. F. Eick, C.-S. Chen, and R. Vilalta, Adaptive Clustering: Obtaining Better Clusters Using Feedback and Past Experience, in Proc. Fifth IEEE International Conference on Data Mining (ICDM), acceptance rate: 21%, Houston, Texas, November.
16. C. F. Eick, N. Zeidat, and Z. Zhao, Supervised Clustering --- Algorithms and Benefits, in Proc. International Conference on Tools with AI (ICTAI), acceptance rate: 30%, Boca Raton, Florida, November.
17. C. F. Eick, N. Zeidat, and R. Vilalta, Using Representative-Based Clustering for Nearest Neighbor Dataset Editing, in Proc. Fourth IEEE International Conference on Data Mining (ICDM), acceptance rate: 22%, Brighton, England, November.