Kunstmatige Intelligentie (Artificial Intelligence) / RuG, KI2 - 7: Clustering Algorithms. Johan Everts.


What is Clustering? Find K clusters (or a classification that consists of K clusters) so that the objects of one cluster are similar to each other whereas objects of different clusters are dissimilar. (Bacher 1996)

The Goals of Clustering
Determine the intrinsic grouping in a set of unlabeled data. What constitutes a good clustering? All clustering algorithms will produce clusters, regardless of whether the data actually contains them. There is no gold standard; what counts as good depends on the goal:
- data reduction
- “natural” clusters
- “useful” clusters
- outlier detection

Stages in clustering

Taxonomy of Clustering Approaches

Hierarchical Clustering Agglomerative clustering treats each data point as a singleton cluster, and then successively merges clusters until all points have been merged into a single remaining cluster. Divisive clustering works the other way around.

Single link Agglomerative Clustering In single-link hierarchical clustering, we merge in each step the two clusters whose two closest members have the smallest distance.

Complete link Agglomerative Clustering In complete-link hierarchical clustering, we merge in each step the two clusters whose merger has the smallest diameter.
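Both linkage rules can be sketched as a naive agglomerative loop over a precomputed distance matrix. This is a minimal illustration (quadratic pair search, no dendrogram bookkeeping), not an efficient implementation; libraries such as SciPy's `scipy.cluster.hierarchy.linkage` do this properly.

```python
import numpy as np

def linkage_distance(A, B, D, mode="single"):
    """Distance between clusters A and B (lists of point indices).

    Single link: distance of the two *closest* members.
    Complete link: distance of the two *farthest* members (the merger's diameter).
    """
    pair = [D[i, j] for i in A for j in B]
    return min(pair) if mode == "single" else max(pair)

def agglomerative(D, k, mode="single"):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], D, mode),
        )
        clusters[a] += clusters.pop(b)  # merge b into a (b > a, so indices stay valid)
    return clusters
```

Stopping at k clusters (rather than one) is a choice of this sketch; running until a single cluster remains yields the full hierarchy.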

Example – Single Link AC
Distance matrix over the cities BA, FI, MI, NA, RM and TO. (The numeric entries of the intermediate matrices were lost in transcription; only the merge sequence is recoverable.)
Merge MI and TO: remaining clusters BA, FI, MI/TO, NA, RM.
Merge NA and RM: remaining clusters BA, FI, MI/TO, NA/RM.
Merge BA with NA/RM: remaining clusters BA/NA/RM, FI, MI/TO.
Merge FI with BA/NA/RM: remaining clusters BA/FI/NA/RM and MI/TO, at distance 295.

Taxonomy of Clustering Approaches

Squared Error
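The squared-error criterion that partitional methods such as K-means minimize can be written as follows, where $C_k$ is the $k$-th cluster and $\boldsymbol{\mu}_k$ its centroid (notation chosen here for illustration; the slide's original figure is lost):

```latex
E = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \lVert \mathbf{x}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{\lvert C_k \rvert} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
```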

K-Means
Step 0: Start with a random partition into K clusters.
Step 1: Generate a new partition by assigning each pattern to its closest cluster center.
Step 2: Compute new cluster centers as the centroids of the clusters.
Step 3: Repeat Steps 1 and 2 until the membership no longer changes (the cluster centers then also remain the same).
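The steps above can be sketched in NumPy. The seed handling, iteration cap, and the guard for empty clusters are implementation choices of this sketch, not part of the algorithm as stated:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means: random initial centers, then assign / re-estimate until stable."""
    rng = np.random.default_rng(seed)
    # Step 0: pick k distinct data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: assign each pattern to its closest cluster center.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # Step 2: recompute each center as the centroid of its cluster
        # (keeping the old center if a cluster happens to be empty).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 3: stop when the centers (hence the membership) no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```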

K-Means

K-Means – How many K’s?

Locating the ‘knee’ The knee of a curve is defined as the point of maximum curvature.
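For a discretely sampled error-versus-K curve, the point of maximum curvature can be approximated by the largest second difference, which assumes unit spacing between consecutive K values. A minimal sketch (the SSE values in any example are hypothetical):

```python
import numpy as np

def knee_point(ks, sse):
    """Return the k at the 'knee' of an SSE curve.

    Curvature is approximated by the discrete second difference of the
    (decreasing) SSE values; the largest bend marks the knee.
    """
    second_diff = np.diff(sse, n=2)          # curvature proxy at ks[1:-1]
    return ks[int(np.argmax(second_diff)) + 1]
```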

Leader - Follower (online)
Specify a threshold distance. Then, for each incoming instance:
- Find the closest cluster center.
- If the distance to it is above the threshold, create a new cluster.
- Otherwise, add the instance to that cluster and update the cluster center.
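This online procedure can be sketched as follows. Moving the winning center a fraction `lr` toward the instance is one common way to realize "update cluster center"; the learning rate is a parameter of this sketch, not fixed by the slides:

```python
import numpy as np

def leader_follower(X, threshold, lr=0.1):
    """Online leader-follower clustering over a sequence of instances."""
    centers = [np.array(X[0], dtype=float)]   # first instance founds the first cluster
    labels = [0]
    for x in X[1:]:
        x = np.asarray(x, dtype=float)
        # Find the closest existing cluster center.
        d = [np.linalg.norm(x - c) for c in centers]
        j = int(np.argmin(d))
        if d[j] > threshold:
            centers.append(x.copy())              # distance above threshold: new cluster
            labels.append(len(centers) - 1)
        else:
            centers[j] += lr * (x - centers[j])   # follow: nudge the center toward x
            labels.append(j)
    return labels, centers
```

Note the order-dependence: unlike K-means, the result depends on the sequence in which instances arrive, which is one source of the instability mentioned later.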

Kohonen SOM’s The Self-Organizing Map (SOM) is an unsupervised artificial neural network algorithm. It is a compromise between biological modeling and statistical data processing.

Kohonen SOM’s
- Each weight is representative of a certain input.
- Input patterns are shown to all neurons simultaneously.
- Competitive learning: the neuron with the largest response is chosen.

Kohonen SOM’s
Initialize weights. Repeat until convergence:
- Select the next input pattern.
- Find the Best Matching Unit (BMU).
- Update the weights of the winner and its neighbours.
- Decrease the learning rate and neighbourhood size.
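A minimal NumPy sketch of this training loop. The grid size, linear decay schedules, and Gaussian neighbourhood function are illustrative choices, not Kohonen's exact parameters:

```python
import numpy as np

def train_som(X, grid=(5, 5), n_epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 2-D Kohonen SOM on data X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    h, w = grid
    W = rng.random((h, w, X.shape[1]))        # initialize weights randomly
    # Grid coordinates of every node, used for neighbourhood distances.
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)
    for epoch in range(n_epochs):
        lr = lr0 * (1 - epoch / n_epochs)                    # decrease learning rate
        sigma = max(sigma0 * (1 - epoch / n_epochs), 0.5)    # shrink neighbourhood
        for x in rng.permutation(X):
            # Best Matching Unit: the node whose weight vector is closest to x.
            d = ((W - x) ** 2).sum(axis=-1)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Gaussian neighbourhood on the grid, centred at the BMU.
            g = np.exp(-((coords - np.array(bmu)) ** 2).sum(-1) / (2 * sigma ** 2))
            # Pull the winner and its neighbours toward the input.
            W += lr * g[..., None] * (x - W)
    return W
```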

Kohonen SOM’s: distance-related learning

Kohonen SOM’s

Some nice illustrations

Kohonen SOM’s Kohonen SOM demo (from ai-junkie.com): mapping a 3D color space onto a 2D Kohonen map.

Performance Analysis
K-Means: depends heavily on a priori knowledge (the choice of K); very stable.
Leader-Follower: depends heavily on a priori knowledge (the threshold); faster but unstable.

Performance Analysis
Self-Organizing Map: stability and convergence assured (principle of self-ordering); slow, with many iterations needed for convergence; computationally intensive.

Conclusion
No Free Lunch theorem: any elevated performance over one class of problems is exactly paid for in performance over another class. Ensemble clustering? Use the SOM and the basic Leader-Follower algorithm to identify clusters, then use k-means clustering to refine them.

Any Questions?