
RAPID, THEORETICALLY SOUND MULTIVARIATE CLUSTERING FOR A PARADIGM SHIFT IN FLOW CYTOMETRY DATA ANALYSIS


James S. Cavenaugh♣♦, Jonathan Rebhahn♣, Andrew Pangborn♪, Iftekhar Naim☼, Jeremy Espenshade♪, Sid Pendleberry♪, Gregor von Laszewski♫, Suprakash Datta☻, Gaurav Sharma☼, Axel Wismueller&, Marcus Huber&, J-C. Ernest Wang♣, Sally Quataert♣, Hulin Wu♦, Tim R. Mosmann♣

Case I: You know what you're interested in and want to find it rapidly and consistently (e.g., monitoring cytokines in a clinical trial).
Case II: You want to explore the data.

GAFF: Gating Assistance For Flow

GAFF starts from a seed population: the operator draws a very crude gate around the cells of interest. GAFF then back-gates to find the "friends" of the seed population, and from that point on the result is independent of the exact choice of seed. Different operators can therefore draw slightly different seed populations and should still robustly get the same final answer.

Step 1: Define an approximate seed population.
Step 2: Back-gate on the seed population. After the first iteration, the seed no longer matters. Clustering here is based on Gaussian mixture modeling with the EM algorithm.
Step 3: Find the particular cells you're interested in; in this case, live CD4 T cells that make IL-2, IFN-γ, or both.
Step 4: Continue with batch analysis (soon to be implemented).
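The seed-independence idea in Steps 1 and 2 can be sketched in a few lines. The poster does not describe GAFF's internals, so everything below is an assumption made for illustration: the data are synthetic and one-dimensional, the mixture is fitted with a plain EM loop, and "back-gating" is approximated by a majority vote of the seed cells over the fitted components. The names `fit_gmm_em` and `friends_of_seed` are hypothetical and are not GAFF functions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(data, weights, means, variances):
    """E-step: posterior probability of each component for each cell."""
    resp = []
    for x in data:
        p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = max(sum(p), 1e-300)  # guard against numerical underflow
        resp.append([pj / total for pj in p])
    return resp


def fit_gmm_em(data, k=2, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative only)."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Deterministic start: means spread evenly over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = responsibilities(data, weights, means, variances)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-4)  # variance floor avoids collapse
    return weights, means, variances


def friends_of_seed(data, params, seed_gate):
    """Assumed back-gating rule: return the indices of all cells in the
    mixture component that holds the majority of the crude seed gate."""
    resp = responsibilities(data, *params)
    k = len(params[0])
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    lo, hi = seed_gate
    votes = [0] * k
    for x, lab in zip(data, labels):
        if lo <= x <= hi:
            votes[lab] += 1
    winner = max(range(k), key=lambda j: votes[j])
    return [i for i, lab in enumerate(labels) if lab == winner]


# Synthetic stand-in for one fluorescence channel: 900 negative, 100 positive cells.
random.seed(0)
cells = [random.gauss(1.0, 0.5) for _ in range(900)]
cells += [random.gauss(5.0, 0.6) for _ in range(100)]

params = fit_gmm_em(cells, k=2)
pop_a = friends_of_seed(cells, params, seed_gate=(2.0, 8.0))  # very crude gate
pop_b = friends_of_seed(cells, params, seed_gate=(4.5, 6.0))  # tighter gate
print(len(pop_a), len(pop_b), pop_a == pop_b)
```

The two seed gates differ considerably, yet both map to the same fitted component, so the final population is decided by the mixture model rather than by where the operator drew the gate.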
♣ Center for Vaccine Biology and Immunology and Rochester Human Immunology Center, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642
♦ Department of Biostatistics and Computational Biology and Center for Biodefense Immune Modeling, School of Medicine and Dentistry, University of Rochester, Rochester, NY 14642
☼ Image Processing Laboratory, Department of Electrical and Computer Engineering, University of Rochester, Rochester, NY 14627-0126
☻ Department of Computer Science, York University, Toronto, ON, Canada
& Rochester Center for Brain Imaging, Department of Bioengineering, and Department of Radiology, School of Medicine and Dentistry, University of Rochester, Rochester, NY
♪ Department of Computer Engineering, Rochester Institute of Technology, Rochester, NY
♫ Pervasive Technology Institute, Indiana University, Bloomington, Indiana 47408

Triplicate titrations of human cells with mouse cells, stained for the same antigens (with different dyes) and also with CFSE.
The idea is that clusters for human cells and clusters for mouse cells should both be identifiable, and CFSE can be used as a check. Antigens used: CD3, CD4, CD8, CD11, NK11b, CD16, CD19, live/dead, CFSE... 2^14 = 16,384 possible binary clusters. N ~ 10^6. How many real clusters?
(Panel title: Model dataset with mouse, human data)

What's wrong with the current paradigm?
Tedious and scales poorly. The number of bivariate plots grows with the number of dimensions d as (d choose 2) = d(d-1)/2; for example, d = 14 already gives 91 plots.
Arbitrary and imprecise. UR results with experienced immunologists show 10-fold variation in gating results for some populations!
False sense of precision. Once gates are set, people tend to believe them, however arbitrary they are.
No provision for overlapping (soft) cell populations.
Obfuscates the theoretical justification for statistical inferences on cell populations.

We use multiple approaches:
Mixture models (currently only Gaussian).
SWIFT sampling: Scalable Weighted Iterative Sampling for Flow Cytometry finds rare populations.
CUDA architecture increases speed 50- to 100-fold.
MDL: the Minimum Description Length principle can be used as an information-theoretic criterion for estimating the best number of clusters.
Scatter matrices are an extension of Euclidean-space partitional algorithms (fuzzy derivatives of K-means) which overcome the bias towards spherical clusters.
Exhaustive bivariate clustering uses low-dimensional clustering to identify many more higher-dimensional clusters.
XOM is a novel nonlinear dimension reduction technique which we are applying to FC data.
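The MDL item in the list of approaches can be illustrated with a small model-selection sketch. The poster does not say which MDL formulation is used, so this example assumes the classical two-part code length, -log L + (p/2) ln N, which has the same form as half the BIC. The data are synthetic and one-dimensional, and `fit_gmm_loglik` and `description_length` are hypothetical helper names introduced only for this sketch.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_loglik(data, k, n_iter=60):
    """Fit a k-component 1-D Gaussian mixture with EM; return its log-likelihood."""
    n = len(data)
    lo, hi = min(data), max(data)
    # Deterministic start: means spread evenly over the data range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each cell.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(p), 1e-300)
            resp.append([pj / total for pj in p])
        # M-step: update weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-4)  # variance floor avoids collapse
    loglik = 0.0
    for x in data:
        total = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        loglik += math.log(max(total, 1e-300))
    return loglik


def description_length(loglik, k, n):
    """Assumed two-part code length: data cost plus parameter cost."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -loglik + 0.5 * n_params * math.log(n)


# Synthetic stand-in for one channel with two true populations.
random.seed(1)
cells = [random.gauss(1.0, 0.5) for _ in range(900)]
cells += [random.gauss(5.0, 0.6) for _ in range(100)]

scores = {k: description_length(fit_gmm_loglik(cells, k), k, len(cells))
          for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Each extra component must improve the log-likelihood by more than its parameter cost to be accepted, which is what keeps the criterion from simply preferring the largest number of clusters. In more than one dimension the parameter count would also include the covariance terms.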
Figure: Dataset with 50% human cells and 50% mouse cells.
Left: x axis is CFSE (mouse cells only, as a control); y axis is mCD4 PerCP; clustered with 2 variables using different scatter matrix algorithms (Rousseeuw et al., Computational Statistics and Data Analysis 23: 135-151, 1996).
Right: same data seen in a pseudo-color density plot in FlowJo.
Far upper right: clustering on 15 variables with 80 clusters using Gaussian mixture models, with 2 size variables (FSC-A and SSC-A) and 13 fluorophores (except CFSE). The 80 clusters are shown with repeated axes for clarity.
Far right: graphical user interface for clustering.
Panel labels: adaptive distances, 7 clusters; maximum likelihood, 7 clusters; SAND, 7 clusters; SAND, 8 clusters; hCD3 vs hCD19.

This work is supported in part by NIH R24 AI054953-06 (Mosmann, PI).
© 2009, James S. Cavenaugh, Ph.D.

