RAPID, THEORETICALLY SOUND MULTIVARIATE CLUSTERING FOR A PARADIGM SHIFT IN FLOW CYTOMETRY DATA ANALYSIS

James S. Cavenaugh♣♦, Jonathan Rebhahn♣, Andrew Pangborn♪, Iftekhar Naim☼, Jeremy Espenshade♪, Sid Pendleberry♪, Gregor von Laszewski♫, Suprakash Datta☻, Gaurav Sharma☼, Axel Wismueller&, Marcus Huber&, J-C. Ernest Wang♣, Sally Quataert♣, Hulin Wu♦, Tim R. Mosmann♣

♣ Center for Vaccine Biology and Immunology and Rochester Human Immunology Center, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642
♦ Department of Biostatistics and Computational Biology and Center for Biodefense Immune Modeling, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642
☼ Image Processing Laboratory, Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY
☻ Department of Computer Science, York University, Toronto, ON, Canada
& Rochester Center for Brain Imaging, Department of Bioengineering, and Department of Radiology, School of Medicine and Dentistry, University of Rochester, Rochester, NY
♪ Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY
♫ Pervasive Technology Institute, Indiana University, Bloomington, Indiana 47408

What's wrong with the current paradigm?
Tedious and scales poorly. The number of bivariate plots increases with the number of dimensions d as (d choose 2) = d(d-1)/2. For example, d = 16 measured parameters already give 120 bivariate plots.
Arbitrary, imprecise. UR results with experienced immunologists show 10-fold variation in gating results for some populations!
False sense of precision. Once gates are set, people tend to believe them, however arbitrary they are.
No provision for overlapping (soft) cell populations.
Obfuscates theoretical justification for statistical inferences on cell populations.

We use multiple approaches:
Mixture models (currently only Gaussian).
SWIFT sampling: Scalable Weighted Iterative Sampling for Flow Cytometry finds rare populations.
CUDA architecture increases speed fold.
MDL: the Minimum Description Length principle can be used as an information theoretic criterion for estimating the best number of clusters.
Scatter matrices are an extension of Euclidean space partitional algorithms (fuzzy derivatives of K-means) which overcome the bias towards spherical clusters.
Exhaustive bivariate clustering uses low dimensional clustering to identify many more higher dimensional clusters.
XOM is a novel nonlinear dimension reduction technique which we are applying to FC data.

Case I: You know what you're interested in and want to find it rapidly and consistently (e.g., monitoring cytokines in a clinical trial).
Case II: You want to explore the data.

GAFF: Gating Assistance For Flow
Here you start with a seed population by drawing a very crude gate around the cells you're interested in. GAFF then back-gates to find the “friends” of the seed population, and from that point on the result is independent of the exact choice of seed. Hence, different operators should be able to draw slightly different seed populations and still robustly get the same final answer.
Step 1: Define an approximate seed population.
Step 2: Backgate on the seed population. After the 1st iteration, the seed no longer matters. Clustering here is based on Gaussian mixture modeling with the EM algorithm.
Step 3: Find the particular cells you're interested in; in this case, live CD4 T cells which make IL-2 or IFN-γ or both.
Step 4: Continue with batch analysis (soon to be implemented).

Model dataset with mouse, human data
Triplicate titrations of human cells with mouse cells stained for the same antigens (different dyes), but also with CFSE. The idea is that clusters for human cells and clusters for mouse cells should be identifiable, and CFSE could be used as a check.
Antigens used: CD3, CD4, CD8, CD11, NK11b, CD16, CD19, live/dead, CFSE = 16, possible binary clusters. N ~
How many real clusters?
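
The Gaussian mixture modeling with EM and the MDL criterion for the number of clusters, both named among the approaches, can be illustrated with a small sketch. This is not the authors' implementation (it is not SWIFT, GAFF, or the CUDA code). It assumes diagonal covariances to stay dependency-free, a farthest-first initialization, and the common two-part form of the description length, -log L + (p/2) log N; the poster does not say which form is used. All function names (fit_gmm, description_length, select_k, demo_data) and the synthetic two-cluster data are hypothetical.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density at x of a Gaussian with diagonal covariance (assumption)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def e_step(data, weights, means, variances):
    """Return (responsibilities, total log-likelihood) for the current model."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        total = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += total
        resp.append([math.exp(l - total) for l in logs])
    return resp, loglik


def farthest_first(data, k):
    """Deterministic initial means: each new mean is the point farthest from those chosen."""
    means = [list(data[0])]
    while len(means) < k:
        def dist(x):
            return min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means)
        means.append(list(max(data, key=dist)))
    return means


def fit_gmm(data, k, n_iter=50, min_var=1e-3):
    """Fit a k-component mixture by EM; return (weights, means, variances, loglik)."""
    n, d = len(data), len(data[0])
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, min_var)
              for j in range(d)]
    weights = [1.0 / k] * k
    means = farthest_first(data, k)
    variances = [list(spread) for _ in range(k)]
    for _ in range(n_iter):
        resp, _ = e_step(data, weights, means, variances)
        for c in range(k):  # M-step: responsibility-weighted updates
            nc = sum(r[c] for r in resp) + 1e-12  # guard against empty components
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc, min_var)
                            for j in range(d)]
    _, loglik = e_step(data, weights, means, variances)
    return weights, means, variances, loglik


def description_length(loglik, k, d, n):
    """Two-part code length: -log L + (p/2) log N, with p free parameters."""
    n_params = (k - 1) + 2 * k * d  # weights + means + diagonal variances
    return -loglik + 0.5 * n_params * math.log(n)


def select_k(data, k_max):
    """Fit k = 1..k_max and return the k with the shortest description length."""
    n, d = len(data), len(data[0])
    lengths = {}
    for k in range(1, k_max + 1):
        *_, loglik = fit_gmm(data, k)
        lengths[k] = description_length(loglik, k, d, n)
    return min(lengths, key=lengths.get), lengths


def demo_data(n_per_cluster=200, seed=0):
    """Synthetic stand-in for two cell populations (not real cytometry data)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 10.0)]
    return [[rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)]
            for cx, cy in centres for _ in range(n_per_cluster)]


if __name__ == "__main__":
    best_k, lengths = select_k(demo_data(), k_max=4)
    print("description lengths:", lengths)
    print("selected k:", best_k)
```

The responsibilities returned by e_step are soft assignments, which is what allows overlapping populations, in contrast to hard manual gates. Full covariance matrices would need a linear algebra library and add d(d+1)/2 parameters per component in place of d.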
Figure legends:
Dataset with 50% human cells and 50% mouse cells.
Left: x axis is CFSE (mouse cells only, as a control); y axis is mCD4 PerCP; clustered with 2 variables using different scatter matrix algorithms (Rousseeuw et al., Computational Statistics and Data Analysis 23: , 1996).
Panels: adaptive distances, 7 clusters; maximum likelihood, 7 clusters; SAND, 7 clusters; SAND, 8 clusters.
Right: same data seen in a pseudo-color density plot in FlowJo.
Far upper right: clustering on 15 variables with 80 clusters using Gaussian mixture models with 2 size variables (FSC-A and SSC-A) and 13 fluorophores (except CFSE). hCD3 vs hCD19. 80 clusters are shown with repeated axes for clarity.
Far right: graphical user interface for clustering.

This work is supported in part by NIH R24 AI (Mosmann, PI).
© 2009, James S. Cavenaugh, Ph.D.