Clustering
John Owen, Sarah Smith

What is clustering?
Clustering groups together objects that are similar to one another and dissimilar to the objects in other clusters, much like sorting laundry.

What is Clustering?
Clustering has its roots in statistical analysis. In data mining, clustering is used to give a user a high-level view of what is going on in their database.

Clustering Approach
Clustering algorithms can be complex, but the general approach has five steps:
1. Pattern representation
2. Definition of a pattern proximity measure appropriate to the data domain
3. Grouping (clustering) of the data
4. Data abstraction
5. Assessment of output

Four Clustering Methods
1. Partitioning (k-means clustering)
2. Hierarchical
3. Density-based
4. Grid-based

Partitioning (k-means clustering)
Partitioning divides the data into k groups that meet two requirements: each group must contain at least one object, and each object must belong to exactly one group. The analyst decides how many clusters there should be, and the algorithm then finds the best fit of points to those clusters. To choose k well, the analyst must know the data.
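
The partitioning idea can be sketched with Lloyd's algorithm, the classic way to compute k-means. This is a minimal illustration, not the authors' method: it assumes points are numeric tuples, uses Euclidean distance (math.dist, Python 3.8+), seeds the centroids with k randomly chosen input points, and keeps a centroid in place if its cluster empties. The function names kmeans, assign and centroid are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(group: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    return tuple(sum(coords) / len(group) for coords in zip(*group))


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Put each point in exactly one cluster: the one with the nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # k is chosen by the analyst; start from k randomly chosen input points.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # If a cluster empties, keep its old centroid (one simple policy among several).
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Final labels always match the returned centroids.
    return assign(points, centroids), centroids


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    labels, centers = kmeans(data, k=2)
    print(labels, centers)
```

Because the result depends on the starting centroids, practical tools usually run k-means several times from different seeds and keep the partition with the lowest within-cluster spread.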

Partitioning Example [figure] (source: k-means clustering, http://www.togaware.com/datamining/survivor/K_Means.html)

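Hierarchical Clustering
The analyst need not know the data in advance; in particular, the number of clusters does not have to be fixed before the analysis. Hierarchical clustering is designed primarily for creating micro-clusters in large data sets. A hierarchical method is either agglomerative (bottom-up) or divisive (top-down).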
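
The agglomerative (bottom-up) variant can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses single linkage (cluster distance = closest pair of points), which is only one of several common linkage choices, plus Euclidean distance and a naive quadratic pair scan per step. The function names agglomerative, single_link and cut are hypothetical. The merge history plays the role of the dendrogram, and cut shows how a number of clusters can be chosen after the fact.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Merge = Tuple[List[int], List[int], float]


def single_link(a: List[int], b: List[int], points: Sequence[Point]) -> float:
    """Single-linkage distance: the closest pair of points across two clusters."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points: Sequence[Point]) -> List[Merge]:
    """Bottom-up clustering: start with singletons, repeatedly merge the closest pair.

    Returns the merge history (cluster_a, cluster_b, distance), i.e. the dendrogram.
    """
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    merges: List[Merge] = []
    while len(clusters) > 1:
        # Scan every pair of current clusters (quadratic per step; fine for a sketch).
        best_d, best_x, best_y = math.inf, 0, 1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = single_link(clusters[x], clusters[y], points)
                if d < best_d:
                    best_d, best_x, best_y = d, x, y
        merges.append((clusters[best_x], clusters[best_y], best_d))
        merged = clusters[best_x] + clusters[best_y]
        del clusters[best_y]  # best_y > best_x, so index best_x is still valid
        clusters[best_x] = merged
    return merges


def cut(n: int, merges: List[Merge], k: int) -> List[List[int]]:
    """Replay the first n - k merges to obtain k flat clusters."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in merges[: n - k]:
        clusters -= {frozenset(a), frozenset(b)}
        clusters.add(frozenset(a) | frozenset(b))
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
    history = agglomerative(pts)
    print(cut(len(pts), history, 3))  # → [[0, 1], [2, 3], [4]]
```

A divisive (top-down) method would run in the opposite direction, starting from one cluster holding every point and splitting it repeatedly.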

Hierarchical Example [figure] (source: http://genome.imim.es/~eblanco/seminars/docs/clustering/index_types.html#hierarchy)

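Density-Based Clustering
Density-based clustering defines clusters as dense regions of the data distribution, separated by regions of lower density. It does not require the user to specify the number of clusters before the analysis begins. It is useful for dealing with outliers, because points that lie in sparse regions can be labeled as noise instead of being forced into a cluster.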
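
The slide does not name a specific algorithm. As one illustration, the sketch below follows DBSCAN, a well-known density-based method. It assumes Euclidean distance and DBSCAN's two parameters: eps (neighborhood radius) and min_pts (minimum neighbors, counting the point itself, for a point to be "core"). The function names dbscan and neighbors are hypothetical, and the neighbor search is a naive linear scan. Note that no cluster count is supplied, and outliers come back labeled -1.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1


def neighbors(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Return a cluster id (0, 1, ...) per point, or NOISE (-1) for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster from it
        labels[i] = cluster
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(points, j, eps)
            if len(more) >= min_pts:
                seeds.extend(more)  # j is also core: keep growing the cluster
    return [NOISE if lab is None else lab for lab in labels]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```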

Density-Based Examples [figure] (source: http://klimt.iwr.uni-heidelberg.de/mip/research/hader_clust/)

Grid-Based Clustering
Grid-based clustering is an adaptation of density-based clustering. The data space is divided into a grid of equal-sized cells, and each data point is placed in a cell. Cells can be decomposed into smaller cells.
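
A minimal grid-based sketch for 2-D data is shown below. It is an assumption-laden illustration, not a specific published algorithm. It places points in square cells of side cell_size, keeps cells holding at least min_count points as "dense" (the density-based part), and joins dense cells that touch, including diagonally, into clusters with a breadth-first search. The function name grid_cluster and both parameters are hypothetical.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def grid_cluster(points: List[Tuple[float, float]], cell_size: float,
                 min_count: int) -> List[int]:
    """Label 2-D points by grid-based clustering; -1 marks points in sparse cells."""
    # 1. Place each point in an equal-sized square cell.
    cell_of = [(math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points]
    counts: Dict[Cell, int] = defaultdict(int)
    for c in cell_of:
        counts[c] += 1
    # 2. Keep only dense cells.
    dense = {c for c, n in counts.items() if n >= min_count}
    # 3. Join dense cells that share an edge or corner into clusters (BFS).
    cluster_of_cell: Dict[Cell, int] = {}
    next_id = 0
    for start in sorted(dense):
        if start in cluster_of_cell:
            continue
        cluster_of_cell[start] = next_id
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cluster_of_cell:
                        cluster_of_cell[nb] = next_id
                        queue.append(nb)
        next_id += 1
    # 4. Each point inherits its cell's cluster label (or -1 for a sparse cell).
    return [cluster_of_cell.get(c, -1) for c in cell_of]


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4),
           (1.2, 0.1), (5.1, 5.1), (5.2, 5.3), (5.3, 5.2), (9.5, 0.5)]
    print(grid_cluster(pts, cell_size=1.0, min_count=3))
```

The work after binning depends on the number of occupied cells rather than the number of points, which is the main appeal of grid-based methods. Decomposing cells into smaller ones corresponds here to rerunning with a smaller cell_size. Multi-resolution methods such as STING instead keep a hierarchy of cells.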

Grid-Based Example [figure]

Business Uses of Clustering
Uses of clustering include marketing, for example identifying customers or clients who are outliers; detection of credit card fraud; and scientific inquiry, such as research on the human genome.

Future of Clustering
Artificial intelligence: unsupervised learning from pattern recognition.