Unsupervised Learning

Slides:



Advertisements
Similar presentations
Different types of data e.g. Continuous data:height Categorical data ordered (nominal):growth rate very slow, slow, medium, fast, very fast not ordered:fruit.
Advertisements

What we Measure vs. What we Want to Know
Multivariate Description. What Technique? Response variable(s)... Predictors(s) No Predictors(s) Yes... is one distribution summary regression models...
Basic Gene Expression Data Analysis--Clustering
SEEM Tutorial 4 – Clustering. 2 What is Cluster Analysis?  Finding groups of objects such that the objects in a group will be similar (or.
Hierarchical Clustering
Cluster Analysis: Basic Concepts and Algorithms
1 CSE 980: Data Mining Lecture 16: Hierarchical Clustering.
Hierarchical Clustering. Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree-like diagram that.
Hierarchical Clustering, DBSCAN The EM Algorithm
Clustering Categorical Data The Case of Quran Verses
CLUSTERING PROXIMITY MEASURES
Non-linear Dimensionality Reduction CMPUT 466/551 Nilanjan Ray Prepared on materials from the book Non-linear dimensionality reduction By Lee and Verleysen,
Introduction to Bioinformatics
Important clustering methods used in microarray data analysis Steve Horvath Human Genetics and Biostatistics UCLA.
Cluster Analysis Hal Whitehead BIOL4062/5062. What is cluster analysis? Non-hierarchical cluster analysis –K-means Hierarchical divisive cluster analysis.
Statistics for Marketing & Consumer Research Copyright © Mario Mazzocchi 1 Cluster Analysis (from Chapter 12)
2004/05/03 Clustering 1 Clustering (Part One) Ku-Yaw Chang Assistant Professor, Department of Computer Science and Information.
Clustering (1) Clustering Similarity measure Hierarchical clustering Model-based clustering Figures from the book Data Clustering by Gan et al.
6-1 ©2006 Raj Jain Clustering Techniques  Goal: Partition into groups so the members of a group are as similar as possible and different.
Clustering… in General In vector space, clusters are vectors found within  of a cluster vector, with different techniques for determining the cluster.
Dimension reduction : PCA and Clustering Agnieszka S. Juncker Slides: Christopher Workman and Agnieszka S. Juncker Center for Biological Sequence Analysis.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman.
Clustering Petter Mostad. Clustering vs. class prediction Class prediction: Class prediction: A learning set of objects with known classes A learning.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
Introduction to Bioinformatics - Tutorial no. 12
Revision (Part II) Ke Chen COMP24111 Machine Learning Revision slides are going to summarise all you have learnt from Part II, which should be helpful.
Neural Network Homework Report: Clustering of the Self-Organizing Map Professor : Hahn-Ming Lee Student : Hsin-Chung Chen M IEEE TRANSACTIONS ON.
Clustering. What is clustering? Grouping similar objects together and keeping dissimilar objects apart. In Information Retrieval, the cluster hypothesis.
Clustering and MDS Exploratory Data Analysis. Outline What may be hoped for by clustering What may be hoped for by clustering Representing differences.
Chapter 3: Cluster Analysis  3.1 Basic Concepts of Clustering  3.2 Partitioning Methods  3.3 Hierarchical Methods The Principle Agglomerative.
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
Clustering Unsupervised learning Generating “classes”
Kansas State University Department of Computing and Information Sciences KDD Group Presentation #7 Fall’01 Friday, November 9, 2001 Cecil P. Schmidt Department.
Pattern Recognition Introduction to bioinformatics 2006 Lecture 4.
Cluster analysis 포항공과대학교 산업공학과 확률통계연구실 이 재 현. POSTECH IE PASTACLUSTER ANALYSIS Definition Cluster analysis is a technigue used for combining observations.
Microarray data analysis David A. McClellan, Ph.D. Introduction to Bioinformatics Brigham Young University Dept. Integrative Biology.
Multidimensional Scaling Vuokko Vuori Based on: Data Exploration Using Self-Organizing Maps, Samuel Kaski, Ph.D. Thesis, 1997 Multivariate Statistical.
1 Cluster Analysis Objectives ADDRESS HETEROGENEITY Combine observations into groups or clusters such that groups formed are homogeneous (similar) within.
Cluster Analysis Cluster Analysis Cluster analysis is a class of techniques used to classify objects or cases into relatively homogeneous groups.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman modified by Hanne Jarmer.
Clustering.
Clustering Algorithms Presented by Michael Smaili CS 157B Spring
Marketing Research Aaker, Kumar, Day and Leone Tenth Edition Instructor’s Presentation Slides 1.
Dimension Reduction in Workers Compensation CAS predictive Modeling Seminar Louise Francis, FCAS, MAAA Francis Analytics and Actuarial Data Mining, Inc.
Chapter 2: Getting to Know Your Data
Course Work Project Project title “Data Analysis Methods for Microarray Based Gene Expression Analysis” Sushil Kumar Singh (batch ) IBAB, Bangalore.
CZ5225: Modeling and Simulation in Biology Lecture 3: Clustering Analysis for Microarray Data I Prof. Chen Yu Zong Tel:
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall 6.8: Clustering Rodney Nielsen Many / most of these.
Radial Basis Function ANN, an alternative to back propagation, uses clustering of examples in the training set.
Analyzing Expression Data: Clustering and Stats Chapter 16.
Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram – A tree like diagram that.
Applied Multivariate Statistics Cluster Analysis Fall 2015 Week 9.
Data Science Practical Machine Learning Tools and Techniques 6.8: Clustering Rodney Nielsen Many / most of these slides were adapted from: I. H. Witten,
Clustering (1) Clustering Similarity measure Hierarchical clustering
Machine Learning Clustering: K-means Supervised Learning
Dimension reduction : PCA and Clustering by Agnieszka S. Juncker
Similarity and Dissimilarity
John Nicholas Owen Sarah Smith
Revision (Part II) Ke Chen
Clustering and Multidimensional Scaling
Data Mining 資料探勘 分群分析 (Cluster Analysis) Min-Yuh Day 戴敏育
Revision (Part II) Ke Chen
Dimension reduction : PCA and Clustering
Clustering The process of grouping samples so that the samples are similar within each group.
SEEM4630 Tutorial 3 – Clustering.
Machine Learning – a Probabilistic Perspective
Cluster analysis Presented by Dr.Chayada Bhadrakom
Data Mining: Concepts and Techniques — Chapter 2 —
Presentation transcript:

Unsupervised Learning A little knowledge…

A few “synonyms”… Agminatics Aciniformics Q-analysis Botryology Systematics Taximetrics Clumping Morphometrics Nosography Nosology Numerical taxonomy Typology Clustering

Exploratory Data Analysis Visualization methods with little or no numerical manipulation Dimensionality Reduction Multidimensional Scaling Principal Components Analysis, Factor Analysis Self-Organizing Maps Cluster Analysis Hierarchical Agglomerative Divisive Non-hierarchical Unsupervised: No GOLD STANDARD

Outline Proximity Multidimensional Scaling Clustering Distance Metrics Similarity Measures Multidimensional Scaling Clustering Hierarchical Clustering Agglomerative Criterion Functions for Clustering Graphical Representations

Algorithms, similarity measures, and graphical representations Most algorithms are not necessarily linked to a particular metric or similarity measure Also not necessarily linked to a particular graphical representation Be aware of the fact that old algorithms are being reused under new names!

Common mistakes Refer to dendrograms as meaning “hierarchical clustering” in general Misinterpretation of tree-like graphical representations Refer to self-organizing maps as clustering Ill definition of clustering criterion Declare a clustering algorithm as “best” Expect classification model from clusters Expect robust results with little/poor data

Unsupervised Learning Dimensionality Reduction MDS SOM Raw Data Graphical Representation Distance or Similarity Matrix Clustering Hierarchical Non-hierar. Validation Internal Extermal

Metrics

Minkowski r-metric Manhattan (city-block) Euclidean j i

Metric spaces Positivity Reflexivity Symmetry Triangle inequality i j h

More metrics Ultrametric Four-point additive condition j i h

Similarity measures Similarity function For binary, “shared attributes” i = [1,0,1] j = [0,0,1]

Variations… Fraction of d attributes shared Tanimoto coefficient j = [0,0,1]

More variations… Correlation Entropy-based Ad-hoc Linear Rank Mutual information Ad-hoc Neural networks

Dimensionality Reduction

Multidimensional Scaling Geometrical models Uncover structure or pattern in observed proximity matrix Objective is to determine both dimensionality d and the position of points in the d-dimensional space

Metric and non-metric MDS Metric (Torgerson 1952) Non-metric (Shepard 1961) Estimates nonlinear form of the monotonic function

Stress and goodness-of-fit 20 10 5 2.5 Goodness of fit Poor Fair Good Excellent Perfect

Clustering

Non-Hierarchical: Distance threshold Duda et al., “Pattern Classification”

Hierarchical

Additive Trees Commonly the minimum spanning tree Nearest neighbor approach to hierarchical clustering Single-linkage

Other linkages Single-linkage: proximity to the closest element in another cluster Complete-linkage: proximity to the most distant element Average-linkage: average proximity Mean: proximity to the mean (centroid) [only that does not require all distances]

Hierarchical Clustering Agglomerative Technique Successive “fusing” cases Respect (or not) definitions of intra- and /or inter-group proximity Visualization Dendrogram, Tree, Venn diagram

Graphical Representations

Data Visualization