DATA MINING Spatial Clustering

Presentation transcript:

DATA MINING Spatial Clustering Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Companion slides for the text by Dr. M.H.Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002. © Prentice Hall

Nearest Neighbor: Items are iteratively merged into the closest existing cluster. The method is incremental. A threshold, t, determines whether an item is added to an existing cluster or starts a new cluster.
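The slide's description can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm as printed in the text: the function name, the use of Euclidean distance, and the choice to treat a distance exactly equal to t as "close enough" are all assumptions.

```python
import math

def nearest_neighbor_clustering(points, t):
    """Put each point in the cluster of its nearest already-clustered point,
    or start a new cluster when that nearest distance exceeds the threshold t."""
    clusters = []  # each cluster is a list of points
    for p in points:
        best_cluster, best_dist = None, math.inf
        for cluster in clusters:
            for q in cluster:
                d = math.dist(p, q)  # Euclidean distance (assumption)
                if d < best_dist:
                    best_cluster, best_dist = cluster, d
        if best_cluster is not None and best_dist <= t:
            best_cluster.append(p)   # merge into the closest existing cluster
        else:
            clusters.append([p])     # too far from everything: new cluster
    return clusters

clusters = nearest_neighbor_clustering(
    [(0, 0), (0, 1), (5, 5), (5, 6), (0, 2)], t=1.5
)
```

Because items are processed one at a time, the result of this sketch depends on the order in which the points arrive.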

Nearest Neighbor Algorithm

PAM: Partitioning Around Medoids (PAM), also called K-Medoids. Handles outliers well. The ordering of the input does not affect the results. Does not scale well to large data sets. Each cluster is represented by one item, called the medoid. The initial set of k medoids is chosen randomly.

PAM

PAM Cost Calculation: At each step of the algorithm, a medoid is swapped with a non-medoid if the swap improves (lowers) the overall cost. C_jih is the change in cost for an item t_j caused by swapping medoid t_i with non-medoid t_h.
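The cost calculation can be illustrated with a short sketch. It assumes that the cost of an item is its Euclidean distance to the nearest medoid, so that C_jih is the item's distance after the swap minus its distance before, and that the total change is the sum of C_jih over all items. The helper names are hypothetical.

```python
import math

def assignment_cost(item, medoids):
    """Distance from an item to its nearest medoid (assumed cost measure)."""
    return min(math.dist(item, m) for m in medoids)

def swap_cost(items, medoids, t_i, t_h):
    """Total cost change: the sum over all items t_j of C_jih for
    swapping medoid t_i with non-medoid t_h. Negative means improvement."""
    swapped = [t_h if m == t_i else m for m in medoids]
    return sum(
        assignment_cost(t_j, swapped) - assignment_cost(t_j, medoids)
        for t_j in items
    )

items = [(0, 0), (0, 1), (0, 2), (10, 0), (10, 1)]
medoids = [(0, 0), (10, 0)]
delta = swap_cost(items, medoids, t_i=(0, 0), t_h=(0, 1))
```

Here the swap lowers the total cost by 1, so under the rule on the slide the medoid (0, 0) would be replaced by (0, 1).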

PAM Algorithm

BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies. Incremental, hierarchical, and requires only one scan of the data. Clustering information is saved in a tree. Each entry in the tree contains information about one cluster. Each new item is inserted into the closest entry in the tree.

Clustering Feature
CF Triple: (N, LS, SS)
N: Number of points in the cluster
LS: Sum of the points in the cluster
SS: Sum of the squares of the points in the cluster
CF Tree: A balanced search tree. Each node has a CF triple for each child. A leaf node represents a cluster and has a CF value for each subcluster in it. Each subcluster has a maximum diameter.
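A CF triple is useful because it is additive: two subclusters can be merged by adding their triples, and the centroid and diameter can be computed from the triple alone. The sketch below assumes SS is stored as a scalar (the sum of squared norms) and that the diameter is the root of the average squared pairwise distance; the class and method names are illustrative only.

```python
from dataclasses import dataclass
import math

@dataclass
class CF:
    n: int       # N: number of points
    ls: tuple    # LS: per-dimension sum of the points
    ss: float    # SS: sum of squared norms of the points (scalar form, assumption)

    @classmethod
    def from_point(cls, p):
        return cls(1, tuple(p), float(sum(x * x for x in p)))

    def merge(self, other):
        """CF triples are additive: merging clusters adds them componentwise."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def diameter(self):
        """sqrt(sum of squared pairwise distances / (N * (N - 1))),
        computed from the triple as (2*N*SS - 2*|LS|^2) / (N * (N - 1))."""
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        value = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(0.0, value))  # guard against rounding below zero

cf = CF.from_point((0, 0)).merge(CF.from_point((2, 0)))
```

A leaf entry could compare `cf.diameter()` against the maximum diameter to decide whether a new point still fits in the subcluster.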

BIRCH Algorithm

Improve Clusters

DBSCAN: Density Based Spatial Clustering of Applications with Noise. Outliers do not affect the creation of clusters.
Input:
MinPts: the minimum number of points in a cluster.
Eps: for each point in a cluster there must be another point in the cluster less than this distance away.

DBSCAN Density Concepts
Eps-neighborhood: The points within Eps distance of a point.
Core point: A point whose Eps-neighborhood is dense enough (contains at least MinPts points).
Directly density-reachable: A point p is directly density-reachable from a point q if the distance between them is small (at most Eps) and q is a core point.
Density-reachable: A point is density-reachable from another point if there is a path from one to the other consisting of only core points.
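The four definitions translate almost directly into code. The sketch below is an illustration of the concepts, not the DBSCAN algorithm itself. It assumes Euclidean distance, counts a point as part of its own Eps-neighborhood, and uses "at most Eps" as the distance test; the function names are hypothetical.

```python
import math

def eps_neighborhood(points, p, eps):
    """All points within eps of p (p itself is included, by assumption)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core(points, p, eps, min_pts):
    return len(eps_neighborhood(points, p, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q: p is close to q and q is core."""
    return math.dist(p, q) <= eps and is_core(points, q, eps, min_pts)

def density_reachable(points, p, q, eps, min_pts):
    """p is density-reachable from q: a chain of directly density-reachable
    steps leads from q to p, so every point before p on the chain is core."""
    frontier, seen = [q], {q}
    while frontier:
        current = frontier.pop()
        if not is_core(points, current, eps, min_pts):
            continue  # only core points can extend the chain
        for nxt in eps_neighborhood(points, current, eps):
            if nxt == p:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

pts = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 10)]
```

With Eps = 1 and MinPts = 3, the points (1, 0) and (2, 0) are core points, (0, 0) and (3, 0) are border points, and (10, 10) is noise. Note that direct density-reachability is not symmetric: (0, 0) is reachable from (1, 0), but not the other way around.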

Density Concepts

DBSCAN Algorithm

CURE: Clustering Using Representatives. Uses many points to represent a cluster instead of only one. The representative points are chosen to be well scattered across the cluster.
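One way to pick well-scattered representative points is a farthest-point heuristic: start with the point farthest from the centroid, then repeatedly add the point farthest from the representatives chosen so far. The sketch below shows only this selection step under that assumption; it is not the full CURE algorithm, and the function name and the parameter c (number of representatives) are illustrative.

```python
import math

def scattered_representatives(cluster, c):
    """Pick up to c well-scattered points from a cluster (farthest-point heuristic)."""
    centroid = tuple(sum(xs) / len(cluster) for xs in zip(*cluster))
    # First representative: the point farthest from the centroid.
    reps = [max(cluster, key=lambda p: math.dist(p, centroid))]
    while len(reps) < min(c, len(cluster)):
        # Next representative: the point whose nearest representative is farthest away.
        candidates = [p for p in cluster if p not in reps]
        reps.append(max(candidates,
                        key=lambda p: min(math.dist(p, r) for r in reps)))
    return reps

reps = scattered_representatives([(0, 0), (1, 0), (2, 0), (6, 0)], c=3)
```

For this cluster the sketch selects the two extremes first and then an interior point, so the representatives outline the cluster's extent rather than collapsing it to a single center.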

CURE Approach

CURE Algorithm

CURE for Large Databases

Comparison of Clustering Techniques