DATA MINING CLUSTERING ANALYSIS. Data Mining (by R.S.K. Baber)

Presentation transcript:

DATA MINING CLUSTERING ANALYSIS

Data Mining (by R.S.K. Baber) 2
CLUSTERING
Example: suppose we have 9 balls of three different colours. We want to cluster them into three groups by colour, so that balls of the same colour fall into the same group.
Concept Definition (Cluster, Cluster analysis)

CLUSTERING
Which is a good cluster?
Data structures in data mining / clustering
Types of data in cluster analysis
Types of clustering
K-means:
• Concept
• Algorithm
• Example
• Comments

The K-Means Clustering Method
Example (K=2):
Arbitrarily choose K objects as the initial cluster centers.
Assign each object to the most similar center.

The K-Means Clustering Method
Example (K=2):
Arbitrarily choose K objects as the initial cluster centers.
Assign each object to the most similar center.
Update the cluster means, then reassign each object.

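The choose, assign, update, reassign cycle described on these slides can be sketched in a few lines of Python. This is only an illustrative sketch: the sample points, the use of squared Euclidean distance as the similarity measure, and the choice of the first K objects as the "arbitrary" initial centers are all assumptions, not something the slides specify.

```python
# Minimal K-means sketch. The sample points and the squared Euclidean
# distance are assumptions for illustration, not taken from the slides.

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iter=100):
    # "Arbitrarily choose K objects": here simply the first k points (assumption).
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign each object to the most similar (nearest) center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the means are stable
        labels = new_labels
        # Update each cluster mean; an empty cluster keeps its old center.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return labels, centers

# Hypothetical data: two visually separate groups of three points each.
sample = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(sample, k=2)
print(labels)
print(centers)
```

Stopping when no object changes cluster is one common convergence test; a tolerance on how far the means move would be an equally reasonable choice.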

The K-Means Clustering Method
How does it work?
• Suppose we have 8 points, A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9), and 3 clusters initially.
• The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
• The distance function between two points a = (x1, y1) and b = (x2, y2) is the Manhattan distance d(a, b) = |x2 – x1| + |y2 – y1|.

The K-Means Clustering Method
Iteration # 1:
Point       Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
            (2, 10)       (5, 8)        (1, 2)
A1(2, 10)
A2(2, 5)
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A7(1, 2)
A8(4, 9)

The K-Means Clustering Method
Iteration # 1:
Distance used: ρ(a, b) = |x2 – x1| + |y2 – y1|, with a = (x1, y1) and b = (x2, y2).
• Point (x1, y1) = (2, 10), mean1 (x2, y2) = (2, 10)
  ρ(point, mean1) = |x2 – x1| + |y2 – y1| = |2 – 2| + |10 – 10| = 0 + 0 = 0
• Point (x1, y1) = (2, 10), mean2 (x2, y2) = (5, 8)
  ρ(point, mean2) = |x2 – x1| + |y2 – y1| = |5 – 2| + |8 – 10| = 3 + 2 = 5
• Point (x1, y1) = (2, 10), mean3 (x2, y2) = (1, 2)
  ρ(point, mean3) = |x2 – x1| + |y2 – y1| = |1 – 2| + |2 – 10| = 1 + 8 = 9
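The three distance calculations for A1 can be checked with a short Python sketch. The function name `manhattan` and the variable names are illustrative choices, not part of the slides.

```python
def manhattan(a, b):
    """Distance from the slides: |x2 - x1| + |y2 - y1|."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)

point = (2, 10)                      # A1
means = [(2, 10), (5, 8), (1, 2)]    # initial centers A1, A4, A7

distances = [manhattan(point, m) for m in means]
print(distances)  # [0, 5, 9]
```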

The K-Means Clustering Method
Iteration # 1:
Point       Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
            (2, 10)       (5, 8)        (1, 2)
A1(2, 10)   0             5             9             1
A2(2, 5)
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A7(1, 2)
A8(4, 9)

The K-Means Clustering Method
Iteration # 1:
Point       Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
            (2, 10)       (5, 8)        (1, 2)
A1(2, 10)   0             5             9             1
A2(2, 5)    5             6             4             3
A3(8, 4)    12            7             9             2
A4(5, 8)    5             0             10            2
A5(7, 5)    10            5             9             2
A6(6, 4)    10            5             7             2
A7(1, 2)    9             10            0             3
A8(4, 9)    3             2             10            2
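The completed table can be reproduced with a small Python sketch that computes every point's distance to each initial mean and picks the nearest one. The dictionary layout and names such as `table` are assumptions made for the example; ties, which do not occur here, would go to the lowest-numbered cluster.

```python
def manhattan(a, b):
    """Distance from the slides: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
means = [(2, 10), (5, 8), (1, 2)]  # initial centers A1, A4, A7

table = {}
for name, p in points.items():
    dists = [manhattan(p, m) for m in means]
    cluster = dists.index(min(dists)) + 1  # clusters are numbered from 1
    table[name] = (dists, cluster)
    print(f"{name}{p}: distances={dists} cluster={cluster}")
```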

The K-Means Clustering Method
Iteration # 1:
New clusters:
• Cluster 1: (2, 10)
• Cluster 2: (8, 4) (5, 8) (7, 5) (6, 4) (4, 9)
• Cluster 3: (2, 5) (1, 2)
New means:
• Cluster 1 contains only the point A1(2, 10), which was the old mean, so the cluster center remains (2, 10).
• Cluster 2: ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
• Cluster 3: ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
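The mean update for iteration 1 amounts to averaging the x and y coordinates of each cluster's members. A minimal Python sketch, with `mean_point` and `new_means` as illustrative names:

```python
def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

# Cluster membership after the first assignment step.
clusters = {
    1: [(2, 10)],
    2: [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    3: [(2, 5), (1, 2)],
}

new_means = {c: mean_point(pts) for c, pts in clusters.items()}
print(new_means)  # {1: (2.0, 10.0), 2: (6.0, 6.0), 3: (1.5, 3.5)}
```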

The K-Means Clustering Method
After Iteration 1: [Figure: clusters after iteration 1]

The K-Means Clustering Method
After Iteration 2 & 3: [Figure: clusters after iterations 2 and 3]
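The transcript does not preserve the results of the later iterations, so the following Python sketch simply continues the worked example until no point changes cluster. It assumes the same Manhattan distance is used for every assignment step, that ties go to the lowest-numbered cluster, and that the run stops when the assignment stops changing; the names `history` and `final_clusters` are illustrative.

```python
def manhattan(a, b):
    """Distance from the slides: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
means = [points["A1"], points["A4"], points["A7"]]  # initial centers

assignment = None
history = []  # means recorded after each iteration that changed something
while True:
    # Assignment step: nearest mean under the Manhattan distance.
    new_assignment = {
        name: min(range(3), key=lambda j: manhattan(p, means[j]))
        for name, p in points.items()
    }
    if new_assignment == assignment:
        break  # converged: no point moved to another cluster
    assignment = new_assignment
    # Update step: recompute each mean from its current members.
    for j in range(3):
        members = [points[n] for n, c in assignment.items() if c == j]
        if members:
            means[j] = mean_point(members)
    history.append(list(means))

final_clusters = [
    sorted(n for n, c in assignment.items() if c == j) for j in range(3)
]
for i, m in enumerate(history, start=1):
    print(f"means after iteration {i}: {m}")
print(final_clusters)
```

Under these assumptions the assignment changes in three iterations and then stabilises with the clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7}.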
