Machine Learning Clustering: K-means Supervised Learning

Slides:



Advertisements
Similar presentations
COMPUTER AIDED DIAGNOSIS: CLASSIFICATION Prof. Yasser Mostafa Kadah –
Advertisements

Unsupervised Learning Clustering K-Means. Recall: Key Components of Intelligent Agents Representation Language: Graph, Bayes Nets, Linear functions Inference.
Albert Gatt Corpora and Statistical Methods Lecture 13.
Clustering Paolo Ferragina Dipartimento di Informatica Università di Pisa This is a mix of slides taken from several presentations, plus my touch !
Introduction to Bioinformatics
1 Text Clustering. 2 Clustering Partition unlabeled examples into disjoint subsets of clusters, such that: –Examples within a cluster are very similar.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Revision (Part II) Ke Chen COMP24111 Machine Learning Revision slides are going to summarise all you have learnt from Part II, which should be helpful.
INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Clustering Unsupervised learning Generating “classes”
CSC 4510 – Machine Learning Dr. Mary-Angela Papalaskari Department of Computing Sciences Villanova University Course website:
Data mining and machine learning A brief introduction.
Text Clustering.
Clustering Supervised vs. Unsupervised Learning Examples of clustering in Web IR Characteristics of clustering Clustering algorithms Cluster Labeling 1.
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Data Science and Big Data Analytics Chap 4: Advanced Analytical Theory and Methods: Clustering Charles Tappert Seidenberg School of CSIS, Pace University.
1Ellen L. Walker Category Recognition Associating information extracted from images with categories (classes) of objects Requires prior knowledge about.
1 Clustering: K-Means Machine Learning , Fall 2014 Bhavana Dalvi Mishra PhD student LTI, CMU Slides are based on materials from Prof. Eric Xing,
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Clustering Patrice Koehl Department of Biological Sciences National University of Singapore
Basic Machine Learning: Clustering CS 315 – Web Search and Data Mining 1.
1 Machine Learning Lecture 9: Clustering Moshe Koppel Slides adapted from Raymond J. Mooney.
Data Mining and Text Mining. The Standard Data Mining process.
Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:
Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:
Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:
Data Science Practical Machine Learning Tools and Techniques 6.8: Clustering Rodney Nielsen Many / most of these slides were adapted from: I. H. Witten,
Machine Learning Supervised Learning Classification and Regression
Clustering (1) Clustering Similarity measure Hierarchical clustering
Machine Learning Fisher’s Criteria & Linear Discriminant Analysis
Unsupervised Learning: Clustering
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Sampath Jayarathna Cal Poly Pomona
Unsupervised Learning: Clustering
Semi-Supervised Clustering
Clustering Patrice Koehl Department of Biological Sciences
Clustering CSC 600: Data Mining Class 21.
IMAGE PROCESSING RECOGNITION AND CLASSIFICATION
Constrained Clustering -Semi Supervised Clustering-
Machine Learning Lecture 9: Clustering
Dimension reduction : PCA and Clustering by Agnieszka S. Juncker
Canadian Bioinformatics Workshops
Data Clustering Michael J. Watts
K-means and Hierarchical Clustering
John Nicholas Owen Sarah Smith
Revision (Part II) Ke Chen
PCA, Clustering and Classification by Agnieszka S. Juncker
Roberto Battiti, Mauro Brunato
Information Organization: Clustering
KAIST CS LAB Oh Jong-Hoon
Data Mining 資料探勘 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Machine Learning Ensemble Learning: Voting, Boosting(Adaboost)
Revision (Part II) Ke Chen
Multivariate Statistical Methods
DATA MINING Introductory and Advanced Topics Part II - Clustering
Dimension reduction : PCA and Clustering
CS 391L: Machine Learning Clustering
INTRODUCTION TO Machine Learning
Text Categorization Berlin Chen 2003 Reference:
Clustering Techniques
Machine Learning Support Vector Machine Supervised Learning
Machine Learning Perceptron: Linearly Separable Supervised Learning
Unsupervised Learning: Clustering
What is Artificial Intelligence?
Presentation transcript:

Machine Learning Clustering: K-means Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron: Linearly Separable Multilayer Perceptron & EBP & Deep Learning, RBF Network Support Vector Machine Ensemble Learning: Voting, Boosting(Adaboost) Unsupervised Learning Dimensionality Reduction: Principle Component Analysis Independent Component Analysis Clustering: K-means Semi-supervised Learning & Reinforcement Learning

Classification vs. Clustering Classification (known categories) Clustering (creation of new categories)

Clustering Clustering: the process of grouping a set of objects into classes of similar objects high intra-class similarity low inter-class similarity It is the commonest form of unsupervised learning Unsupervised learning: learning from unlabelled data, as opposed to supervised data where a classification of examples is given A common and important task that finds many applications in science and engineering Group genes that perform the same function Group individuals that have similar political view Categorize documents of similar topics Identify similar objects from photos

Clustering examples People Images Language Species

Issues for clustering “Groupness": What is a natural grouping among these objects? “Similarity/Distance": What makes objects related? “Representation": How do we represent objects? Vectors? Do we normalise? “Parameters": How many clusters? Fixed a priori? Data-driven? “Algorithms": Partitioning the data? Hierarchical algorithm? Formal foundation and convergence

Groupness: What is a natural grouping among these objects?

How do we define similarity?

What properties should a distance measure have? Symmetry Self-similarity Separation Triangular inequality

Distance measures

An example

Hamming distance Manhattan distance is called Hamming distance when all features are binary E.g. Gene expression levels under 17 conditions (1-High, 0-Low)

Similarity measures: Correlation coefficient

Edit distance: a generic technique for measuring similarity To measure the similarity between two objects, transform one of the objects into the other, and measure how much effort it took. The measure of effort becomes the distance measure.

Clustering algorithms

Hierarchical Clustering algorithms

Hierarchical Clustering Bottom-up agglomerative clustering Starts with each object in a separate cluster Then repeatedly joins the closest pair of clusters Until there is only one cluster Top-down divisive clustering Starts with all data in a single cluster Considers every possible way to divide the cluster into two and chooses the best division Recurse on both sides

Distance between clusters We can define the distance between two clusters as Single-link nearest neighbor: their closest members Complete-link farthest neighbor: their farthest members Centroid: the centrois (centers of gravity) of the two clusters Average: the average of all cross-cluster pairs

Single-link

Complete-link

Dendograms

Partitioning Algorithms Partitioning method: construct a partition of n objects into a set of K clusters Given: a set of objects and the number K Find: a partition of K clusters that optimizes the chosen partitioning criterion Globally optimal: exhaustively enumerate all partitions Effective heuristic methods: K-means algorithm

K-means Clustering

K-means Clustering

K-means Clustering Algorithm

K-means Algorithm Decide on a value for k Initialize the k cluster centers randomly Decide the class memberships of the N objects by assigning them to the nearest cluster centroids (aka the center of gravity or mean) Re-estimate the k cluster centers, by assuming the memberships found above are correct If none of the N objects changed membership in the last iteration, exit. Otherwise go back to 3.

K-means

K-means

K-means

K-means

K-means: seed selection

K-means: number of clusters

K-means: criteria for cluster quality

K-means: purity