Machine Learning CS 165B, Spring 2012

Course outline: Introduction (Ch. 1); Concept learning (Ch. 2); Decision trees (Ch. 3); Ensemble learning; Neural Networks (Ch. 4); Linear classifiers; Support Vector Machines; Bayesian Learning (Ch. 6); Genetic Algorithms (Ch. 9); Instance-based Learning (Ch. 8); Clustering; Computational learning theory (Ch. 7).

What is clustering? Organizing data into classes such that there is high intra-class similarity and low inter-class similarity. Finding the class labels and the number of classes directly from the data (in contrast to classification). More informally, finding natural groupings among objects. Remember Fisher's discriminant.

What is a natural grouping among these objects?

Clustering is subjective. The same objects can be grouped as Simpson's family versus school employees, or as males versus females.

What is similarity? "The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary) Similarity is hard to define, but "we know it when we see it". The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.

Defining distance measures. Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2). Example objects: the names Peter and Piotr.

What properties should a distance measure have?
D(A,B) = D(B,A) (symmetry)
D(A,B) = 0 iff A = B (reflexivity)
D(A,B) ≤ D(A,C) + D(B,C) (triangle inequality)
Example: the string edit distance, for which D(Peter, Piotr) = 3, is defined recursively:
d('', '') = 0
d(s, '') = d('', s) = |s|, i.e. the length of s
d(s1+ch1, s2+ch2) = min( d(s1, s2) + (if ch1 = ch2 then 0 else 1), d(s1+ch1, s2) + 1, d(s1, s2+ch2) + 1 )
When we peek inside one of these black boxes, we see some function of two variables. These functions might be very simple or very complex. In either case it is natural to ask what properties these functions should have.
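
The edit-distance recurrence on this slide can be sketched in Python as a bottom-up dynamic program. This is a minimal illustration, assuming unit cost for insertion, deletion and substitution as in the recurrence; the function name `edit_distance` is hypothetical and not part of the original slides.

```python
def edit_distance(s1: str, s2: str) -> int:
    """Unit-cost string edit distance, computed row by row."""
    # prev[j] is the distance between the empty prefix of s1 and s2[:j],
    # i.e. d('', s) = |s|.
    prev = list(range(len(s2) + 1))
    for i, ch1 in enumerate(s1, start=1):
        curr = [i]  # d(s, '') = |s|
        for j, ch2 in enumerate(s2, start=1):
            substitute = prev[j - 1] + (0 if ch1 == ch2 else 1)
            delete = prev[j] + 1      # drop ch1 from s1
            insert = curr[j - 1] + 1  # add ch2 to s1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("Peter", "Piotr"))
```

Keeping only two rows of the table is a design choice to save memory; a full table would be needed if the actual sequence of edits had to be recovered.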

Two types of clustering: partitional and hierarchical.
Partitional algorithms: Construct various partitions and then evaluate them by some criterion.
Hierarchical algorithms: Create a hierarchical decomposition of the set of objects using some criterion.

Desirable properties of a clustering algorithm: scalability (in terms of both time and space); ability to deal with different data types; minimal requirements for domain knowledge to determine input parameters; ability to deal with noise and outliers; insensitivity to the order of input records; incorporation of user-specified constraints; interpretability and usability.

Summarizing similarity measurements. In order to better appreciate and evaluate the examples given in the early part of this talk, we will now introduce the dendrogram. The similarity between two objects in a dendrogram is represented as the height of the lowest internal node they share.

Hierarchical clustering using string edit distance.
Pedro (Portuguese): Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian).
Cristovao (Portuguese): Christoph (German), Christophe (French), Cristobal (Spanish), Cristoforo (Italian), Kristoffer (Scandinavian), Krystof (Czech), Christopher (English).
Miguel (Portuguese): Michalis (Greek), Michael (English), Mick (Irish!).
[Figure: dendrogram with leaves Piotr, Pyotr, Petros, Pietro, Pedro, Pierre, Piero, Peter, Peder, Peka, Peadar, Michalis, Michael, Miguel, Mick, Cristovao, Christopher, Christophe, Christoph, Crisdean, Cristobal, Cristoforo, Kristoffer, Krystof.]

(How-to) Hierarchical clustering.
The number of dendrograms with n leaves = $(2n-3)! / [2^{n-2} (n-2)!]$
Number of leaves → number of possible dendrograms: … 10 → 34,459,425.
Since we cannot test all possible trees we will have to use heuristics:
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
Top-Down (divisive): Starting with all the data in a single cluster, consider every possible way to divide the cluster into two. Choose the best division and recursively operate on both sides.
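
To get a feel for how quickly the number of dendrograms grows, the counting formula on this slide can be evaluated directly. The following is a small sketch; the function name `num_dendrograms` is hypothetical, and the formula is taken as given on the slide.

```python
from math import factorial


def num_dendrograms(n: int) -> int:
    """Number of dendrograms with n leaves: (2n-3)! / (2^(n-2) * (n-2)!)."""
    if n < 2:
        raise ValueError("the formula is defined for n >= 2 leaves")
    # The quotient is always an integer, so floor division is exact.
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


if __name__ == "__main__":
    for leaves in range(2, 11):
        print(leaves, num_dendrograms(leaves))
```

The last row printed matches the 34,459,425 dendrograms quoted for 10 leaves, which is why exhaustive search over trees is not practical.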

Distance matrix. We begin with a distance matrix which contains the distances between every pair of objects in our database. [Figure: example distance matrix; two of the pictured pairs have D = 8 and D = 1.]

Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together. At each step, consider all possible merges and choose the best.

Extending distance measure to clusters. We know how to measure the distance between two objects, but defining the distance between an object and a cluster, or defining the distance between two clusters, is not obvious.
Single linkage (nearest neighbor): In this method the distance between two clusters is determined by the distance of the two closest objects (nearest neighbors) in the different clusters.
Complete linkage (farthest neighbor): In this method, the distance between two clusters is determined by the greatest distance between any two objects in the different clusters (i.e., by the "furthest neighbors").
Group average linkage: In this method, the distance between two clusters is calculated as the average distance between all pairs of objects in the two different clusters.
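
The three linkage rules, combined with the bottom-up merging heuristic, can be sketched as follows. This is an illustrative, unoptimized version (it recomputes all pairwise cluster distances at every step); the names `cluster_distance` and `agglomerate` and the `linkage` strings are assumptions for this example, not part of the original slides.

```python
from itertools import combinations


def cluster_distance(a, b, dist, linkage):
    """Distance between clusters a and b under the chosen linkage."""
    pair_distances = [dist(x, y) for x in a for y in b]
    if linkage == "single":      # two closest objects
        return min(pair_distances)
    if linkage == "complete":    # two farthest objects
        return max(pair_distances)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(items, dist, linkage="single"):
    """Bottom-up clustering; returns the merges as (left, right, height)."""
    clusters = [(item,) for item in items]  # each item starts alone
    merges = []
    while len(clusters) > 1:
        # Consider all possible merges and choose the closest pair.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], dist, linkage
            ),
        )
        height = cluster_distance(clusters[i], clusters[j], dist, linkage)
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    points = [0, 1, 5, 6, 20]
    for left, right, height in agglomerate(points, lambda x, y: abs(x - y)):
        print(left, right, height)
```

The recorded merge heights are what a dendrogram would draw as the heights of its internal nodes. Any distance function on pairs of objects can be passed as `dist`, including a string edit distance.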

Summary of hierarchical clustering.
No need to specify the number of clusters in advance.
The hierarchical nature maps nicely onto human intuition for some domains.
Hierarchical methods do not scale well: time complexity of at least O(n^2), where n is the total number of objects.
As with any heuristic search algorithm, local optima are a problem.

Partitional clustering. Nonhierarchical: each instance is placed in exactly one of K nonoverlapping clusters. Since only one set of clusters is output, the user normally has to input the desired number of clusters K.

Algorithm k-means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to step 3.
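
The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: it assumes points are tuples of numbers, uses Euclidean distance, initializes the centers by sampling k of the input points, and keeps the old center if a cluster becomes empty. The function name `kmeans` and the `seed` and `max_iter` parameters are assumptions for this example.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, seed=0, max_iter=100):
    """Return (centers, labels) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 2: random initial centers
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Step 3: assign every object to its nearest cluster center.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centers[c])) for p in points
        ]
        # Step 5: stop when no object changed membership.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels)
```

Because the result depends on the initial centers, a common practice is to run the procedure several times with different seeds and keep the partition with the smallest total within-cluster distance.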

K-means clustering: step 1. Algorithm: k-means; distance metric: Euclidean distance. [Figure: data points with cluster centers k1, k2, k3.]

K-means clustering: step 2. Algorithm: k-means; distance metric: Euclidean distance. [Figure: data points with cluster centers k1, k2, k3.]

K-means clustering: step 3. Algorithm: k-means; distance metric: Euclidean distance. [Figure: data points with cluster centers k1, k2, k3.]

K-means clustering: step 4. Algorithm: k-means; distance metric: Euclidean distance. [Figure: data points with cluster centers k1, k2, k3.]

K-means clustering: step 5. Algorithm: k-means; distance metric: Euclidean distance. [Figure: data points with cluster centers k1, k2, k3.]

Evaluation of K-means
Strength
– Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
Note
– Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
Weakness
– Applicable only when a mean is defined, which leaves open how to handle categorical data.
– Need to specify k, the number of clusters, in advance.
– Unable to handle noisy data and outliers.
– Not suitable for clusters with non-convex shapes.

Other clustering methods: density-based clustering; subspace clustering; EM clustering; Gaussian mixture models.
Other concepts: dimensionality reduction; PCA.