RAPID, THEORETICALLY SOUND MULTIVARIATE CLUSTERING FOR A PARADIGM SHIFT IN FLOW CYTOMETRY DATA ANALYSIS

James S. Cavenaugh♣♦, Jonathan Rebhahn♣, Andrew Pangborn♪, Iftekhar Naim☼, Jeremy Espenshade♪, Sid Pendleberry♪, Gregor von Laszewski♫, Suprakash Datta☻, Gaurav Sharma☼, Axel Wismueller&, Marcus Huber&, J-C. Ernest Wang♣, Sally Quataert♣, Hulin Wu♦, Tim R. Mosmann♣

♣ Center for Vaccine Biology and Immunology and Rochester Human Immunology Center, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642
♦ Department of Biostatistics and Computational Biology and Center for Biodefense Immune Modeling, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642
☼ Image Processing Laboratory, Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY
☻ Department of Computer Science, York University, Toronto, ON, Canada
& Rochester Center for Brain Imaging, Department of Bioengineering, and Department of Radiology, School of Medicine and Dentistry, University of Rochester, Rochester, NY
♪ Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY
♫ Pervasive Technology Institute, Indiana University, Bloomington, Indiana 47408

Case I: You know what you're interested in and want to find it rapidly and consistently (e.g., monitoring cytokines in a clinical trial).
Case II: You want to explore the data.

GAFF: Gating Assistance For Flow
The operator starts by drawing a very crude gate around a seed population of interest. GAFF then back-gates to find the “friends” of the seed population; from that point on, the result is independent of the exact choice of seed. Different operators can therefore draw slightly different seed populations and should still robustly get the same final answer.
Step 1: Define an approximate seed population.
Step 2: Back-gate on the seed population. After the 1st iteration, the seed no longer matters. Clustering here is based on Gaussian mixture modeling with the EM algorithm.
Step 3: Find the particular cells you're interested in. In this case, these are live CD4 T cells which make IL-2 or IFN-γ or both.
Step 4: Continue with batch analysis (soon to be implemented).

Model dataset with mouse, human data
Triplicate titrations of human cells with mouse cells stained for the same antigens (different dyes), but also with CFSE. The idea is that clusters for human cells and clusters for mouse cells should be identifiable, and CFSE can be used as a check.
Antigens used: CD3, CD4, CD8, CD11, NK11b, CD16, CD19, live/dead, CFSE = 16, possible binary clusters. N ~
How many real clusters?

We use multiple approaches:
Mixture models (currently only Gaussian).
SWIFT sampling: Scalable Weighted Iterative Sampling for Flow Cytometry finds rare populations.
CUDA architecture increases speed … fold.
MDL: the Minimum Description Length principle can be used as an information-theoretic criterion for estimating the best number of clusters.
Scatter matrices are an extension of Euclidean-space partitional algorithms (fuzzy derivatives of K-means) which overcome the bias towards spherical clusters.
Exhaustive bivariate clustering uses low-dimensional clustering to identify many more higher-dimensional clusters.
XOM is a novel nonlinear dimension reduction technique which we are applying to flow cytometry (FC) data.

What's wrong with the current paradigm?
Tedious and scales poorly. The number of bivariate plots increases with the number of dimensions d as (d choose 2) = d(d-1)/2.
Arbitrary and imprecise. University of Rochester (UR) results with experienced immunologists show 10-fold variation in gating results for some populations!
False sense of precision. Once set, people tend to believe arbitrary gates.
No provision for overlapping (soft) cell populations.
Obfuscates theoretical justification for statistical inferences on cell populations.
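
As a quick check of the scaling claim above, the following Python sketch counts the bivariate plots needed for a panel of d measured parameters. The function name and the sample panel sizes are illustrative assumptions, not taken from the poster.

```python
from math import comb


def bivariate_plot_count(d: int) -> int:
    """Number of distinct two-parameter plots for d measured parameters.

    This is the binomial coefficient (d choose 2) = d * (d - 1) / 2.
    """
    return comb(d, 2)


# Illustrative panel sizes (hypothetical, chosen only to show the growth).
for d in (4, 8, 16):
    print(f"{d} parameters -> {bivariate_plot_count(d)} bivariate plots")
```

Doubling the number of parameters roughly quadruples the number of plots an operator would have to inspect, which is the sense in which manual gating scales poorly.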
Dataset with 50% human cells and 50% mouse cells
Left: x axis is CFSE (mouse cells only, as a control); y axis is mCD4 PerCP. Clustered with 2 variables using different scatter matrix algorithms (Rousseeuw et al., Computational Statistics and Data Analysis 23: , 1996).
Right: same data seen in a pseudo-color density plot in FlowJo.
Far upper right: clustering on 15 variables with 80 clusters using Gaussian mixture models with 2 size variables (FSC-A and SSC-A) and 13 fluorophores (except CFSE).
Far right: graphical user interface for clustering.
Panel titles: adaptive distances, 7 clusters; maximum likelihood, 7 clusters; SAND, 7 clusters; SAND, 8 clusters.
hCD3 vs hCD19: 80 clusters are shown with repeated axes for clarity.

This work is supported in part by NIH R24 AI (Mosmann, PI).
© 2009, James S. Cavenaugh, Ph.D.
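
The poster's clustering relies on Gaussian mixture models fitted with EM, with the MDL principle proposed for choosing the number of clusters. The sketch below illustrates that combination under strong simplifying assumptions: it is one-dimensional, written in pure Python, run on synthetic data, and uses the common two-part penalty (half the parameter count times log n) as a stand-in for MDL. It is not the authors' implementation; all names (`fit_gmm_1d`, `description_length`) and the synthetic data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(data) / n
    overall_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k

    def mixture_density(x):
        return sum(w * normal_pdf(x, m, v)
                   for w, m, v in zip(weights, means, variances))

    for _ in range(iters):
        # E-step: soft membership (responsibility) of each point in each cluster.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the memberships.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)

    loglik = sum(math.log(mixture_density(x) or 1e-300) for x in data)
    return weights, means, variances, loglik


def description_length(loglik, k, n):
    """Two-part code length: data cost plus parameter cost (an assumed MDL form)."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return -loglik + 0.5 * n_params * math.log(n)


# Synthetic stand-in for one fluorescence channel with two populations.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
data += [rng.gauss(6.0, 1.0) for _ in range(300)]

scores = {}
for k in range(1, 5):
    _, _, _, loglik = fit_gmm_1d(data, k)
    scores[k] = description_length(loglik, k, len(data))

best_k = min(scores, key=scores.get)
print("description length by k:", {k: round(s, 1) for k, s in scores.items()})
print("k with the shortest description:", best_k)
```

The responsibilities computed in the E-step are soft memberships, so a cell can belong partly to several overlapping populations, which hard manual gates cannot express. Real flow cytometry data would need the multivariate form with full covariance matrices.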