UH-COSC Events Today, 4-6p: Student Welcome Party


UH-COSC Events Today, 4-6p: Student Welcome Party. Friday, February 10, 2012 (whole day): PhD Showcase. Event Proposal: Friday, March 30 (half day including lunch): Computer Science Student Dreams 2012 (students give 3-10 minute presentations on what they would like to see happening in the Computer Science Department and in the field of computer science in general, and on their vision of teaching, research, their own future, future jobs, and the role of computer science in our society) … Christoph F. Eick

Research Focus of UH-DMML Helping Scientists to Make Sense of their Data Geographical Information Systems (GIS) Machine Learning Data Mining High Performance Computing Output: Graduated 12 PhD students (5 in 2009-11) and 76 Master Students Christoph F. Eick

Some UH-DMML Graduates 1
Tae-wan Ryu, Professor, Department of Computer Science, California State University, Fullerton
Dr. Wei Ding, Assistant Professor, Department of Computer Science, University of Massachusetts, Boston
Sharon M. Tuttle, Professor, Department of Computer Science, Humboldt State University, Arcata, California
Christoph F. Eick

Some UH-DMML Graduates 2
Ruth Miller, PhD: Postdoc, Washington University in St. Louis, Department of Genetics, Conrad Lab (Human Genetics and Reproductive Biology)
Rachsuda Jiamthapthaksin, PhD: Lecturer, Assumption University, Bangkok, Thailand
Justin Thomas, MS: Section Supervisor at Johns Hopkins University Applied Physics Laboratory
Meikang Wu, MS: Microsoft, Bellevue, Washington
Jing Wang, MS: AOL, California
Christoph F. Eick

Research Areas and Projects Data Mining and Machine Learning Group (http://www2.cs.uh.edu/~UH-DMML/index.html), research is focusing on: Spatial Data Mining Clustering Helping Scientists to Make Sense out of their Data Classification and Prediction Current Projects Spatial Clustering Algorithms with Plug-in Fitness Functions and Other Non-Traditional Clustering Approaches Modeling and Understanding Progression in Spatial Datasets Mining Complex Spatial Objects (polygons, trajectories) Data Mining with a lot of Cores Christoph F. Eick

Non-Traditional Clustering Algorithms With plug-in Fitness Functions Interestingness Hotspot Discovery in Spatial Datasets Mining Related Datasets Parallel CLEVER Parallel Computing Randomized Hill Climbing With a Lot of Cores UH-DMML

Discovering Spatial Interestingness Hotspots. Interestingness hotspots are areas where both income and CTR are high. Ch. Eick

Models for Progression of Hotspots and Other Spatial Objects. Examples: Ozone Hotspot Evolution, Building Evolution, Progression of Glaucoma. Ch. Eick

Models for Progression of Hotspots and Other Spatial Objects. Task: The goal is to develop models of progression. Those models make it possible to predict the next states that follow a given sequence of states. Models are learnt like ordinary machine learning models. Challenges: 1. Representation of models of change (e.g., how do we describe changes in building structures?) 2. Learning models of change from training examples. Ch. Eick

Helping Scientists to Make Sense out of their Data Figure 1: Co-location regions involving deep and shallow ice on Mars Figure 2: Chemical co-location patterns in Texas Water Supply Figure 3: Mining Hurricane Trajectories Ch. Eick

UH-DMML Mission Statement: The Data Mining and Machine Learning Group at the University of Houston aims to develop data analysis, data mining, and machine learning techniques and to apply those techniques to challenging problems in geology, astronomy, environmental sciences, social sciences and medicine. In general, our research group has a strong background in the areas of clustering and spatial data mining. Areas of our current research include: meta-learning, density-based clustering and clustering with plug-in fitness functions, association analysis, interestingness hotspot discovery, geo-regression, change and progression analysis, polygon and trajectory mining, and using machine learning for simulation. Website: http://www2.cs.uh.edu/~UH-DMML/index.html Research Group Publications: http://www2.cs.uh.edu/~ceick/pub.html Data Mining Course Website: http://www2.cs.uh.edu/~ceick/DM/DM11.html Ch. Eick

Clustering and Hotspot Discovery in Labeled Graphs. Potential problems to be investigated: 1. Clustering Proteins Based on Their Interactions 2. Generalizing the Region Discovery Framework to Graph Partitioning Using Plug-in Interestingness Functions 3. … 4. … Ch. Eick

Methodologies and Tools to Analyze and Mine Related Datasets
Subtopics:
Disparity Analysis/Emergent Pattern Discovery (“how do two groups differ with respect to their patterns?”) [SDE10]
Change Analysis (“what is new/different?”) [CVET09]
Correspondence Clustering (“mining interesting relationships between two or more datasets”) [RE10]
Meta Clustering (“cluster the cluster models of multiple datasets”)
Analyzing Relationships between Polygonal Cluster Models
Example: Analyze changes with respect to regions of high variance of earthquake depth (Time 1 vs. Time 2).
Novelty change predicate: $\mathrm{Novelty}(r') = r' \setminus (r_1 \cup \dots \cup r_k)$
[Figure: Emerging regions based on the novelty change predicate]
UH-DMML
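
The novelty change predicate on this slide can be read as a set difference: the part of a new region r' that is not covered by any of the earlier regions r1 … rk. The sketch below illustrates that reading only. It assumes the operators lost between r1 … rk are unions, and it represents regions as sets of grid cells, which is a hypothetical simplification (the slide suggests polygonal regions). The names `Cell` and `novelty` are illustrative and not taken from the group's software.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a region is a set of (row, col) grid cells.
Cell = Tuple[int, int]


def novelty(new_region: Set[Cell], old_regions: Iterable[Set[Cell]]) -> Set[Cell]:
    """Return the cells of new_region not covered by any old region.

    Implements Novelty(r') = r' minus (r1 union ... union rk) on cell sets.
    """
    covered: Set[Cell] = set().union(*old_regions)  # union of all old regions
    return set(new_region) - covered


# Regions found at time 1 and one region found at time 2.
time1_regions = [{(0, 0), (0, 1)}, {(5, 5)}]
time2_region = {(0, 1), (0, 2), (1, 2)}

emerging = novelty(time2_region, time1_regions)
print(sorted(emerging))
```

A non-empty result marks an emerging area; an empty result means the new region lies entirely inside regions that were already known.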

Mining Related Datasets Using Polygon Analysis Work on a methodology that does the following: Generate polygons from spatial cluster extensions / from continuous density or interpolation functions. Meta cluster polygons / set of polygons Extract interesting patterns / create summaries from polygonal meta clusters Analysis of Glaucoma Progression Analysis of Ozone Hotspots Christoph F. Eick

Mining Spatial Trajectories. Goal: Understand and characterize motion patterns. Themes investigated: clustering and summarization of trajectories, classification based on trajectories, likelihood assessment of trajectories, prediction of trajectories. [Figures: Arctic Tern; Arctic Tern Migration; Hurricanes in the Gulf of Mexico] UH-DMML

Current UH-DMML Activities Mining Related Datasets & Polygon Analysis Regional Knowledge Extraction Cluster Correspondence Analysis Yahoo! User Modeling Strasbourg Building Evolution Understanding Glaucoma Knowledge Scoping POLY/TRAJ- SNN Regional Association Analysis Discrepancy Mining Polygonal Meta Clustering Air Pollution Analysis Parallel CLEVER TRAJ-CLEVER Poly-CLEVER Regional Regression Classification Clustering Cluster Polygon Generation SCMRG Sub-Trajectory Mining Trajectory Density Estimation MOSAIC Repository Clustering Trajectory Mining Animal Motion Analysis Cougar^2 Spatial Clustering Algorithms With Plug-in Fitness Functions Christoph F. Eick

What Courses Should You Take to Conduct Data Mining Research? Data Mining (COSC 6335) Machine Learning Parallel Programming/High Performance Computing, AI, Software Design, Data Structures, Databases, Sensor Networks,… UH-DMML

Mining Motion Patterns of Animals. Diverse animal groups, such as birds, fish, mammals (terrestrial/marine/flying: wildebeest/whales/bats), reptiles (e.g. sea turtles), amphibians, insects and marine invertebrates undertake migration. [Figures: Bird Flu/H5N1; Wildebeest] Primary goals: understanding motion patterns and predicting future events. Why is mining animal motion patterns important? Understanding of ecology, life history, and behavior; effective conservation and effective control; conserving the dwindling populations of endangered species; early detection and prevention of disease outbreaks; correlating climate change with animal motion patterns. UH-DMML

In the last 4 years, our research group developed spatial data mining methodologies, algorithms and tools. One of our main contributions is a region discovery framework. The framework provides search-engine-type capabilities to scientists to “find interesting places in spatial datasets”. A second contribution is the development of a family of spatial clustering algorithms with plug-in fitness functions. Plug-in fitness functions enable scientists to describe the characteristics of the clusters they are interested in. A third contribution is a set of co-location and correlation mining frameworks. The figure on the upper left depicts a data mining result concerning co-location patterns between deep and shallow ice on Mars. The areas in red indicate regions on Mars in which deep and shallow ice are co-located, and the areas in blue indicate regions where deep and shallow ice are anti-co-located. Finally, more recently, we started new research centering on change analysis in spatial datasets.

Selected Related Publications
T. Stepinski, W. Ding, and C. F. Eick, Controlling Patterns of Geospatial Phenomena, to appear in Geoinformatica, Spring 2010.
V. Rinsurongkawong and C. F. Eick, Correspondence Clustering: An Approach to Cluster Multiple Related Spatial Datasets, to appear in Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), acceptance rate: 10%, Hyderabad, India, June 2010.
C.-S. Chen, V. Rinsurongkawong, A. Nagar, and C. F. Eick, Mining Trajectories using Non-Parametric Density Functions, submitted to a conference, February 2010.
W. Ding, T. Stepinski, D. Jiang, R. Parmar, and C. F. Eick, Discovery of Feature-based Hot Spots Using Supervised Clustering, in International Journal of Computers & Geosciences, Elsevier, March 2009.
R. Jiamthapthaksin, C. F. Eick, and V. Rinsurongkawong, An Architecture and Algorithms for Multi-Run Clustering, CIDM, Nashville, Tennessee, April 2009.
C.-S. Chen, V. Rinsurongkawong, C. F. Eick, and M. Twa, Change Analysis in Spatial Data by Combining Contouring Algorithms with Supervised Density Functions, in Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), acceptance rate: 29%, Bangkok, May 2009.
J. Thomas and C. F. Eick, Online Learning of Spacecraft Simulation Models, in Proc. of the 21st Innovative Applications of Artificial Intelligence Conference (IAAI), acceptance rate: 30%, Pasadena, California, July 2009.
R. Jiamthapthaksin, C. F. Eick, and R. Vilalta, A Framework for Multi-Objective Clustering and its Application to Co-Location Mining, in Proc. Fifth International Conference on Advanced Data Mining and Applications (ADMA), acceptance rate: 12%, Beijing, China, August 2009.
O. U. Celepcikay and C. F. Eick, REG^2: A Regional Regression Framework for Geo-Referenced Datasets, in Proc. 17th ACM SIGSPATIAL International Conference on Advances in GIS (ACM-GIS), acceptance rate: 20%, Seattle, Washington, November 2009.
W. Ding, R. Jiamthapthaksin, R. Parmar, D. Jiang, T. Stepinski, and C. F. Eick, Towards Region Discovery in Spatial Datasets, in Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), acceptance rate: 12%, Osaka, Japan, May 2008.
C. F. Eick, R. Parmar, W. Ding, T. Stepinski, and J.-P. Nicot, Finding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets, in Proc. 16th ACM SIGSPATIAL International Conference on Advances in GIS (ACM-GIS), acceptance rate: 19%, Irvine, California, November 2008.
J. Choo, R. Jiamthapthaksin, C.-S. Chen, O. Celepcikay, C. Giusti, and C. F. Eick, MOSAIC: A Proximity Graph Approach to Agglomerative Clustering, in Proc. 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK), acceptance rate: 29%, Regensburg, Germany, September 2007.
C. F. Eick, B. Vaezian, D. Jiang, and J. Wang, Discovery of Interesting Regions in Spatial Datasets Using Supervised Clustering, in Proc. 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), acceptance rate: 13%, Berlin, Germany, September 2006.
W. Ding, C. F. Eick, J. Wang, and X. Yuan, A Framework for Regional Association Rule Mining in Spatial Datasets, in Proc. IEEE International Conference on Data Mining (ICDM), acceptance rate: 19%, Hong Kong, China, December 2006.
A. Bagherjeiran, C. F. Eick, C.-S. Chen, and R. Vilalta, Adaptive Clustering: Obtaining Better Clusters Using Feedback and Past Experience, in Proc. Fifth IEEE International Conference on Data Mining (ICDM), acceptance rate: 21%, Houston, Texas, November 2005.
C. F. Eick, N. Zeidat, and Z. Zhao, Supervised Clustering --- Algorithms and Benefits, in Proc. International Conference on Tools with AI (ICTAI), acceptance rate: 30%, Boca Raton, Florida, November 2004.
C. F. Eick, N. Zeidat, and R. Vilalta, Using Representative-Based Clustering for Nearest Neighbor Dataset Editing, in Proc. Fourth IEEE International Conference on Data Mining (ICDM), acceptance rate: 22%, Brighton, England, November 2004.
UH-DMML

Extracting Regional Knowledge from Spatial Datasets
In contrast to other work in spatial data mining, our work centers on extracting regional or local knowledge from spatial datasets, and not on finding global patterns. In particular, we are interested in assisting scientists in finding interesting regions in spatial datasets based on their particular notion of interestingness.
Application 1: Supervised Clustering [EVJW07]
Application 2: Regional Association Rule Mining and Scoping [DEWY06, DEYWN07]
Application 3: Find Interesting Regions with respect to a Continuous Variable [CRET08]
Application 4: Regional Co-location Mining Involving Continuous Variables [EPWSN08]
Application 5: Find “representative” regions (Sampling)
Application 6: Regional Regression [CE09]
Application 7: Multi-Objective Clustering [JEV09]
Application 8: Change Analysis in Spatial Datasets [RE09]
[Figure: RD-Algorithm results for β=1.01 and β=1.04. Wells in Texas: green = safe well with respect to arsenic, red = unsafe well]
UH-DMML

Finding Regional Co-location Patterns in Spatial Datasets Figure 1: Co-location regions involving deep and shallow ice on Mars Figure 2: Chemical Co-location patterns in Texas Water Supply Objective: Find co-location regions using various clustering algorithms and novel fitness functions. Applications: 1. Finding regions on planet Mars where shallow and deep ice are co-located, using point and raster datasets. In figure 1, regions in red have very high co-location and regions in blue have anti co-location. 2. Finding co-location patterns involving chemical concentrations with values on the wings of their statistical distribution in Texas’ ground water supply. Figure 2 indicates discovered regions and their associated chemical patterns. UH-DMML

A Framework for Extracting Regional Knowledge from Spatial Datasets
Objective: Develop and implement an integrated framework to automatically discover interesting regional patterns in spatial datasets.
[Figure: Framework for Mining Regional Knowledge. Components: Spatial Databases, Integrated Data Set, Domain Experts, Measures of interestingness, Fitness Functions, Family of Clustering Algorithms, Hierarchical Grid-based & Density-based Algorithms, Regional Association Rule Mining Algorithms, Ranked Set of Interesting Regions and their Properties, Regional Knowledge. Example result: Spatial Risk Patterns of Arsenic]
Given:
A dataset $O$ with a schema $R$
A distance function $d$ defined on instances of $R$
A fitness function $q(X)$ that evaluates a clustering $X=\{c_1,\dots,c_k\}$ as follows: $q(X) = \sum_{c \in X} \mathrm{reward}(c) = \sum_{c \in X} \mathrm{interestingness}(c) \cdot \mathrm{size}(c)^{\beta}$ with $\beta > 1$
Objective: Find $c_1,\dots,c_k \subseteq O$ such that:
$c_i \cap c_j = \emptyset$ if $i \neq j$
$X=\{c_1,\dots,c_k\}$ maximizes $q(X)$
All clusters $c_i \in X$ are contiguous (each pair of objects belonging to $c_i$ has to be Delaunay-connected with respect to $c_i$ and to $d$)
$c_1 \cup \dots \cup c_k \subseteq O$
$c_1,\dots,c_k$ are usually ranked based on the reward each cluster receives, and low-reward clusters are frequently not reported.
UH-DMML
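
The fitness function on this slide sums, over all clusters, an interestingness score multiplied by the cluster size raised to an exponent greater than 1 (β, shown as b on the slides). The following is a minimal sketch of that formula, not the group's implementation. The interestingness measure `unsafe_share_interestingness`, its threshold, and the encoding of objects as 0/1 labels (1 = unsafe well) are hypothetical choices made only for illustration, and clusters are assumed to be non-empty.

```python
from typing import Callable, List, Sequence

# Hypothetical encoding: a cluster is a non-empty sequence of 0/1 labels,
# where 1 marks an unsafe well and 0 a safe one.
Cluster = Sequence[int]
Interestingness = Callable[[Cluster], float]


def reward(cluster: Cluster, interestingness: Interestingness, beta: float) -> float:
    """reward(c) = interestingness(c) * size(c)^beta."""
    return interestingness(cluster) * len(cluster) ** beta


def q(clustering: Sequence[Cluster], interestingness: Interestingness,
      beta: float = 1.01) -> float:
    """Fitness of a clustering: the sum of the rewards of its clusters."""
    if beta <= 1.0:
        raise ValueError("beta must be greater than 1")
    return sum(reward(c, interestingness, beta) for c in clustering)


def unsafe_share_interestingness(cluster: Cluster, threshold: float = 0.5) -> float:
    """Illustrative plug-in measure: share of unsafe wells above a threshold."""
    share = sum(cluster) / len(cluster)
    return max(0.0, share - threshold)


def rank_by_reward(clustering: Sequence[Cluster], interestingness: Interestingness,
                   beta: float = 1.01) -> List[Cluster]:
    """Order clusters from highest to lowest reward, as the slide describes."""
    return sorted(clustering, key=lambda c: reward(c, interestingness, beta),
                  reverse=True)


clustering = [[0, 0], [1, 1, 1, 0]]
print(q(clustering, unsafe_share_interestingness, beta=2.0))
print(rank_by_reward(clustering, unsafe_share_interestingness, beta=2.0)[0])
```

Because the exponent is greater than 1, a large cluster earns more reward than several small clusters with the same interestingness, so raising β pushes the search toward fewer, larger regions.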

REG^2: a Regional Regression Framework
Motivation: Regression functions vary spatially; they are not constant over space.
Goal: To discover regions with strong relationships between dependent and independent variables and to extract their regional regression functions.
Clustering algorithms with plug-in fitness functions are employed to find such regions; the employed fitness functions reward regions with a low generalization error. Various schemes are explored to estimate the generalization error: example weighting, regularization, penalizing model complexity and using validation sets,…
[Figure: Discovered Regions and Regression Functions]
REG^2 Outperforms Other Models in SSE_TR
Dataset   AIC Fitness   VAL Fitness   RegVAL Fitness   WAIC Fitness
Arsenic   5.01%         11.19%        3.58%            13.18%
Boston    29.80%        35.69%        38.98%           36.60%
Regularization Improves Prediction Accuracy
UH-DMML
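
One of the schemes named on this slide estimates a region's generalization error with a validation set. The sketch below shows the general idea under strong simplifying assumptions: a single independent variable, an ordinary least-squares line per region, and a reward of 1 / (1 + mean squared validation error). That reward formula and all function names are hypothetical and are not the AIC, VAL, RegVAL, or WAIC fitness functions of REG^2.

```python
from typing import Sequence, Tuple

# Hypothetical data layout: (x, y) = (independent, dependent) variable.
Point = Tuple[float, float]


def fit_line(points: Sequence[Point]) -> Tuple[float, float]:
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y  # all x equal: fall back to a constant model
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def validation_sse(train: Sequence[Point], val: Sequence[Point]) -> float:
    """Sum of squared errors on val for a line fitted on train."""
    a, b = fit_line(train)
    return sum((y - (a * x + b)) ** 2 for x, y in val)


def val_interestingness(train: Sequence[Point], val: Sequence[Point]) -> float:
    """Illustrative reward in (0, 1]: higher when the validation error is lower."""
    return 1.0 / (1.0 + validation_sse(train, val) / len(val))


# A region whose points follow y = 2x + 1, and a region with no clear trend.
linear_region = val_interestingness([(0, 1), (1, 3), (2, 5)], [(3, 7), (4, 9)])
noisy_region = val_interestingness([(0, 0), (1, 5), (2, 1)], [(3, 7)])
print(linear_region > noisy_region)
```

A score like this could be plugged in as the interestingness term of a clustering fitness function, so that the search favors regions where a single regression function predicts held-out points well.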
