Weighted Chinese Restaurant Process for clustering barcodes Javier Cabrera John Lau Albert Lo DIMACS, Bristol U, and HKUST.



Cluster Analysis: Group the observations into k distinct natural groups.

Non-Bayesian Cluster Analysis:

Hierarchical clustering: Build a hierarchical tree
- Similarity (inter-point distance): Euclidean, Manhattan, …
- Inter-cluster distance: Single Linkage, Complete, Average, Ward

Non-hierarchical clustering:
- K-means
- Divisive
- PAM
- Model Based
- Many Other Methods
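The hierarchical recipe above (choose an inter-point distance, choose an inter-cluster linkage, then merge the closest clusters repeatedly) can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names, the toy points, and the choice to stop at k clusters are hypothetical and do not come from the slides, and Ward linkage is omitted for brevity.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Each linkage reduces the list of pairwise distances between two clusters
# to a single inter-cluster distance.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}

def agglomerate(points, k, dist=euclidean, linkage="single"):
    """Merge the two closest clusters until only k clusters remain."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]  # start with singletons

    def cluster_distance(pair):
        a, b = pair
        return combine([dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b]])

    while len(clusters) > k:
        # combinations yields a < b, so deleting index b leaves a valid.
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerate(points, 3))
print(agglomerate(points, 2, dist=manhattan, linkage="complete"))
```

Recording the order of the merges, rather than stopping at k, would give the nested tree that a dendrogram displays.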

Hierarchical Clustering
[Figure: hierarchical clustering of Specimen 1 through Specimen 7]

Weighted Chinese Restaurant Process
1. The restaurant is full of tables.
2. Customers are seated at tables according to a seating rule.
3. Customers are allowed to move from one table to another, or to a new empty one.
Partition: each seating arrangement of all the customers in the restaurant.
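The slides do not spell out the seating rule. As a point of reference, the sketch below simulates the standard (unweighted) Chinese restaurant process, in which a new customer joins an occupied table with probability proportional to its size and opens a new table with probability proportional to a concentration parameter. The name `crp_seating` and the parameter `alpha` are assumptions made for illustration; the weighted version would additionally scale each table's weight by how well the customer's data fits that table.

```python
import random

def crp_seating(n_customers, alpha=1.0, seed=0):
    """Seat customers one at a time with the standard CRP rule.

    Returns a list of tables; each table is a list of customer indices,
    so the result is one partition of the customers.
    """
    rng = random.Random(seed)
    tables = []
    for customer in range(n_customers):
        # Occupied table t gets weight len(t); a new table gets weight alpha.
        weights = [len(t) for t in tables] + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append([customer])      # open a new table
        else:
            tables[choice].append(customer)
    return tables

tables = crp_seating(20, alpha=1.0)
print(len(tables), "tables:", tables)
```

Larger values of `alpha` make new tables more likely, so the resulting partition tends to have more and smaller blocks.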

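Partitions:
p: a partition of the specimens into species.
$p \in P$, where $P$ is the space of all possible partitions (all arrangements of specimens into species).

Bayes basics:
Prior distribution: $\pi(p)$
Likelihood: $f(x \mid p) = \prod_{1 \le i \le n(p)} k(x_j,\ j \in C_i)$, where $n(p)$ is the number of clusters in $p$ and $C_i$ is the $i$-th cluster.
Posterior: $\pi(p \mid \text{data}) \propto f(x \mid p)\, \pi(p)$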
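For a handful of specimens the posterior over partitions can be computed exactly by enumerating every partition and multiplying prior by likelihood. The sketch below does this under loudly stated assumptions that are not in the slides: barcodes are equal-length strings over A, C, G, T; the kernel k is a per-site Dirichlet-multinomial marginal likelihood with symmetric parameter `a`; and the prior is the usual Chinese restaurant prior with concentration `alpha`. All function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def set_partitions(items):
    """Yield every partition of a list as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):    # add `first` to an existing block
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition        # or give it a block of its own

def log_kernel(seqs, a=0.5):
    """log k(x_j, j in C): assumed Dirichlet-multinomial marginal per site."""
    n, total = len(seqs), 0.0
    for site in zip(*seqs):
        counts = Counter(site)
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + counts[b]) - math.lgamma(a)
                     for b in ALPHABET)
    return total

def posterior(seqs, alpha=1.0, a=0.5):
    """Return (partition, probability) pairs, most probable first."""
    scored = []
    for p in set_partitions(list(range(len(seqs)))):
        # Assumed CRP prior: alpha^(number of blocks) * prod (|C_i| - 1)!
        log_prior = len(p) * math.log(alpha) + sum(math.lgamma(len(c)) for c in p)
        log_lik = sum(log_kernel([seqs[j] for j in c], a) for c in p)
        scored.append((sorted(sorted(c) for c in p), log_prior + log_lik))
    top = max(s for _, s in scored)        # subtract the max for stability
    norm = sum(math.exp(s - top) for _, s in scored)
    return sorted(((p, math.exp(s - top) / norm) for p, s in scored),
                  key=lambda pair: -pair[1])

seqs = ["AAAAAAAA", "AAAAAAAT", "CCGGCCGG", "CCGGCCGT"]
for partition, prob in posterior(seqs)[:3]:
    print(partition, round(prob, 4))
```

The number of partitions grows as the Bell numbers (15 for four specimens, 115975 for ten), which is why exhaustive enumeration is only practical for very small data sets.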

Weighted Chinese Restaurant Process
Approximate the posterior distribution with the WCRP:
- Run the process for a while and obtain a frequency table of the partitions visited.
- Estimate the final partition with the posterior mode.
- Compare the posterior probabilities of the most probable partitions.
New specimens:
- Placed at an existing table.
- Open a new table => new species.
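One way to read "run the process for a while" is as a Gibbs-style reseating sweep: each customer in turn leaves its table and is reseated with probability proportional to table size times how well its barcode fits that table, or proportional to a concentration parameter times its fit to an empty table. The sketch below follows that reading and tallies the partitions visited. It is not the authors' algorithm: the seating weights, the Dirichlet-multinomial kernel, the parameters `alpha` and `a`, and all names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

ALPHABET = "ACGT"

def log_kernel(seqs, a=0.5):
    """Assumed per-site Dirichlet-multinomial marginal for one table."""
    n, total = len(seqs), 0.0
    for site in zip(*seqs):
        counts = Counter(site)
        total += math.lgamma(4 * a) - math.lgamma(4 * a + n)
        total += sum(math.lgamma(a + counts[b]) - math.lgamma(a)
                     for b in ALPHABET)
    return total

def wcrp(seqs, alpha=1.0, sweeps=200, seed=0):
    """Reseat every customer once per sweep; count partitions visited."""
    rng = random.Random(seed)
    tables = [[i] for i in range(len(seqs))]   # start with everyone alone
    visits = Counter()
    for _ in range(sweeps):
        for i in range(len(seqs)):
            # Customer i leaves; drop the table if it becomes empty.
            tables = [[j for j in t if j != i] for t in tables]
            tables = [t for t in tables if t]
            log_w = []
            for t in tables:
                members = [seqs[j] for j in t]
                # Table size times the predictive fit of customer i.
                log_w.append(math.log(len(t))
                             + log_kernel(members + [seqs[i]])
                             - log_kernel(members))
            log_w.append(math.log(alpha) + log_kernel([seqs[i]]))  # new table
            top = max(log_w)
            weights = [math.exp(w - top) for w in log_w]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(tables):
                tables.append([i])
            else:
                tables[choice].append(i)
        visits[tuple(sorted(tuple(sorted(t)) for t in tables))] += 1
    return visits

seqs = ["AAAAAAAA", "AAAAAAAT", "CCGGCCGG", "CCGGCCGT"]
visits = wcrp(seqs, sweeps=300)
for partition, count in visits.most_common(3):
    print(partition, count / 300)
```

The most frequently visited partition serves as the estimate of the posterior mode, and the relative frequencies of the runners-up give a rough comparison of their posterior probabilities. A new specimen can be handled with the same weights: it either joins an existing table or opens a new one, which would correspond to a new species.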

Future Work
WCRP Algorithm for Barcode data:
Data Visualization: Final partition => similarities => Euclidean representation
- Multidimensional Scaling
- Multivariate Data Visualization (used in taxonomy)
- Projection Pursuit Entropy scanning

Lo (1984), Ishwaran and James (2003b), Cabrera, Lau, Lo (2006)