Greg Finak, Ph.D., Senior Staff Scientist Marylou Ingram ISAC Scholar

Presentation transcript:

Validation of an Automated Gating Pipeline for ICS Assays in Vaccine Trials Greg Finak, Ph.D., Senior Staff Scientist Marylou Ingram ISAC Scholar Vaccine and Infectious Disease Division Fred Hutchinson Cancer Research Center

Flow Cytometry at the Fred Hutch. The Intracellular Cytokine Staining (ICS) assay is the bread-and-butter assay for the HVTN. 1000s of patient samples / year. Manual analysis and current tools limit the number of cell populations that can be explored. [Timeline figure, 2002 to today to in development: panels of 8, 10, 12, 16, 17 and 27 colors; labels: CD4 / CD8 T-cells & 3 cytokines; TfH, Memory, NK & 7 cytokines]

Challenges for automated gating Reproducibility Validation Ease of use

OpenCyto. General framework for reproducible automated analysis. Built around R/Bioconductor. Automated hierarchical analysis. Flexible and powerful, but requires expert knowledge (R, programming, statistics, machine learning), which makes it difficult to use for the uninitiated. Finak G, et al. PLoS Comput. Biol. (2014)

OpenCyto: How it works. Input: raw FCS files. Pipeline: import, compensate, transform, automated gating, summarize and export. Output: summary tables, plots, analysis reports. Abstract gating template: cell population definition (name, parent, channels, + or -); gating algorithms (12+ available): pick one! Tuning parameters: pick these too!
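The population definition on this slide (name, parent, channels, + or -, plus an algorithm and its tuning parameters) can be pictured as a small record, and a template as a list of such records forming a tree. The sketch below is a hypothetical Python illustration only; it is not the openCyto template format, and the class, field, and algorithm names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PopulationDef:
    """One entry of a hypothetical gating template (not the openCyto format)."""
    name: str                # population name, e.g. "CD4"
    parent: str              # name of the parent population
    channels: List[str]      # channels the gate is drawn on
    sign: str                # "+" or "-": which side of the gate to keep
    algorithm: str           # placeholder algorithm name
    params: Dict[str, float] = field(default_factory=dict)  # tuning parameters


def gating_order(template: List[PopulationDef], root: str = "root") -> List[str]:
    """Return population names so that every parent precedes its children."""
    children: Dict[str, List[str]] = {}
    for pop in template:
        children.setdefault(pop.parent, []).append(pop.name)
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # Push children in reverse so they are visited in template order.
        for child in reversed(children.get(node, [])):
            stack.append(child)
        if node != root:
            order.append(node)
    return order


template = [
    PopulationDef("lymph", "root", ["FSC-A", "SSC-A"], "+", "placeholder_2d"),
    PopulationDef("CD3", "lymph", ["CD3"], "+", "placeholder_1d"),
    PopulationDef("CD4", "CD3", ["CD4"], "+", "placeholder_1d"),
    PopulationDef("CD8", "CD3", ["CD8"], "+", "placeholder_1d"),
    PopulationDef("IFNg", "CD4", ["IFNg"], "+", "placeholder_1d", {"cutoff": 0.5}),
]
order = gating_order(template)
```

Ordering parents before children mirrors the hierarchical analysis described above: a gate can only be applied once its parent population exists.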

OpenCyto: How it works. Gating algorithms (12+ available): pick one! Tuning parameters: pick these too! Choosing algorithms and tuning parameters requires translating expert knowledge from one domain to another.

If you’re not a computational person… How to use automated gating tools

OpenCyto: a simple example. Cytokine gating routine. Three tuning parameters affect the position of the gate. Defaults are not always optimal. Manual tuning is not feasible for hundreds of gates and samples.

Learn Gating Parameters from Data Manual gates are a great source of expert knowledge. Scan through each manual gate. Heuristics guess at the type of gate. Short-list algorithms. Optimize tuning parameters to maximize average accuracy, F-measure, relative to training data. Choose algorithm with best performance
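The optimization step described on this slide can be sketched as a grid search. The Python below is a minimal illustration under stated assumptions: a single one-dimensional cutoff stands in for an algorithm's tuning parameters, and the function names are hypothetical rather than part of openCyto. Each candidate value is scored by its average F-measure against the manual gates of the training samples.

```python
from typing import Iterable, List, Optional, Tuple


def f_measure(manual: List[bool], auto: List[bool]) -> float:
    """F-measure of an automated gate against a manual gate (per-cell membership)."""
    tp = sum(m and a for m, a in zip(manual, auto))
    fp = sum((not m) and a for m, a in zip(manual, auto))
    fn = sum(m and (not a) for m, a in zip(manual, auto))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def threshold_gate(values: List[float], cutoff: float) -> List[bool]:
    """Toy gating 'algorithm': a cell is positive if its value exceeds the cutoff."""
    return [v > cutoff for v in values]


def tune_cutoff(
    samples: List[Tuple[List[float], List[bool]]],
    candidates: Iterable[float],
) -> Tuple[Optional[float], float]:
    """Pick the candidate cutoff with the highest average F-measure over samples."""
    best_cutoff, best_score = None, -1.0
    for cutoff in candidates:
        scores = [f_measure(manual, threshold_gate(values, cutoff))
                  for values, manual in samples]
        avg = sum(scores) / len(scores)
        if avg > best_score:
            best_cutoff, best_score = cutoff, avg
    return best_cutoff, best_score


# Two training samples: (cell intensities, manual gate membership).
training = [
    ([0.1, 0.2, 0.9, 1.1], [False, False, True, True]),
    ([0.0, 0.3, 0.8, 1.5], [False, False, True, True]),
]
best = tune_cutoff(training, [0.0, 0.5, 1.0])
```

Choosing among several short-listed algorithms would repeat the same search per algorithm and keep the one with the best score.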

Inference of OpenCyto Templates from Manual Gates. Build templates for hundreds of cell populations. Push the problem of QC downstream, post-gating (where it’s much easier). Examine gate thresholds / population statistics across samples to identify and correct outliers automatically. Gate imputation: look for the “nearest” sample that passed QC and copy its gate to the outlier. QC at each gate prevents errors from propagating through the gating tree. Gates are still data-driven: use data to pick the gating algorithm and tuning parameters. Need relatively few gated samples (as few as 15).
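The post-gating QC and gate imputation idea can be sketched as follows. This is a hedged Python illustration, not the pipeline's implementation: the outlier rule (median absolute deviation with multiplier `k`) and the notion of "nearest" (distance on a single per-sample summary feature) are assumptions chosen for the example, since the slide does not specify either.

```python
from statistics import median
from typing import Dict, List


def flag_outliers(thresholds: Dict[str, float], k: float = 3.0) -> List[str]:
    """Flag samples whose gate threshold is more than k MADs from the median."""
    values = list(thresholds.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [s for s, v in thresholds.items() if abs(v - med) > k * mad]


def impute_gates(
    thresholds: Dict[str, float],
    features: Dict[str, float],
    k: float = 3.0,
) -> Dict[str, float]:
    """Replace each outlier's gate with the gate of the nearest sample that passed QC.

    `features` holds one hypothetical summary value per sample, used only to
    define which passing sample is "nearest".
    """
    outliers = set(flag_outliers(thresholds, k))
    passed = [s for s in thresholds if s not in outliers]
    fixed = dict(thresholds)
    for s in outliers:
        nearest = min(passed, key=lambda p: abs(features[p] - features[s]))
        fixed[s] = thresholds[nearest]
    return fixed


gate_thresholds = {"s1": 100.0, "s2": 104.0, "s3": 98.0, "s4": 102.0, "s5": 300.0}
sample_features = {"s1": 10.0, "s2": 20.0, "s3": 30.0, "s4": 40.0, "s5": 22.0}
flagged = flag_outliers(gate_thresholds)
fixed = impute_gates(gate_thresholds, sample_features)
```

Running this check at every gate, before gating the children, is what keeps a single bad gate from propagating down the tree.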

Some Results

HVTN Trial ICS Data. Study of 82 subjects in 3 treatment groups. 5 stimulations (4 HIV proteins, 1 negative control - 2x replicate). 384 samples. 8.2 Gb of data. 165 manually gated cell subsets. Data processed with openCyto. Learned template from 15 samples. Gated remaining data using the template.

Estimated Bandwidth Parameters for Cytokine Gates

Trained gates have reasonable accuracy down to tree depth 7 F-measure as a metric of training success. Rare cell populations: smaller F-measure doesn’t necessarily mean gate is wrong. Sensitive for small populations.
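A short worked example shows why a small F-measure does not by itself mean a rare-population gate is wrong. The definitions are the standard ones; the cell counts are made up for illustration and do not come from the trial data.

```latex
% Standard definitions, with TP, FP, FN counted per cell against the manual gate
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}

% Hypothetical rare subset: the manual gate holds 10 of 100{,}000 parent cells,
% the automated gate also holds 10 cells, and 7 cells are shared.
TP = 7,\quad FP = 3,\quad FN = 3
\;\Rightarrow\;
P = R = 0.7,\quad F = \frac{2(0.7)(0.7)}{0.7 + 0.7} = 0.7

% Yet both gates report the same population proportion:
\frac{10}{100{,}000} = 0.01\%
```

A disagreement on three cells drops F to 0.7 while the reported proportion is unchanged, which is the sense in which the metric is sensitive for small populations.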

How do the gates compare? We are dealing with exceedingly rare cell subsets

Training: Cell population statistics are consistent with manual gates Tells us which populations should be trusted and which ones should be carefully reviewed (or the template edited by hand).

Training: Cell Counts are consistent with manual gates

Test Set Results on (N = 379) 56,092 cell populations Absolute error << relative error Relative error in rare cell subsets Does it matter?
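The distinction between absolute and relative error for rare subsets can be made concrete with a small Python sketch. The proportions below are illustrative values, not results from the trial, and the function names are assumptions made for the example.

```python
def abs_error(manual: float, auto: float) -> float:
    """Absolute difference between automated and manual population proportions."""
    return abs(auto - manual)


def rel_error(manual: float, auto: float) -> float:
    """Absolute difference scaled by the manual proportion."""
    if manual == 0:
        # Relative error is undefined for an empty manual gate; report
        # 0 if both are empty and infinity otherwise.
        return 0.0 if auto == 0 else float("inf")
    return abs(auto - manual) / manual


# Illustrative proportions of the parent population.
rare_manual, rare_auto = 0.0001, 0.0002      # 0.01% vs 0.02%
common_manual, common_auto = 0.40, 0.41      # 40% vs 41%

rare = (abs_error(rare_manual, rare_auto), rel_error(rare_manual, rare_auto))
common = (abs_error(common_manual, common_auto), rel_error(common_manual, common_auto))
```

The rare subset has the smaller absolute error but a relative error of about 100%, whereas the common subset has a larger absolute error and a relative error of about 2.5%.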

Key Question: Do the differences affect inference? Fit a linear model to the background-adjusted proportions for each of the 408 (6 cytokines x 4 stims x 17 parents) cell populations. Assess difference in magnitude between treatment arms. Compare effect size between manual and automated gating.
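The comparison described here can be sketched for one cell population. The Python below is a simplified illustration under stated assumptions: only two treatment arms are compared (the study has three), background adjustment is taken to be the stimulated proportion minus the negative-control proportion, the linear model is ordinary least squares on a 0/1 arm indicator, and all numbers are invented for the example.

```python
from statistics import mean
from typing import List


def background_adjust(stim: List[float], neg: List[float]) -> List[float]:
    """Subtract each sample's negative-control proportion from its stimulated one."""
    return [s - n for s, n in zip(stim, neg)]


def arm_effect(y: List[float], arm: List[int]) -> float:
    """OLS slope of y on a 0/1 arm indicator (equals the difference in arm means)."""
    xbar, ybar = mean(arm), mean(y)
    sxy = sum((x - xbar) * (v - ybar) for x, v in zip(arm, y))
    sxx = sum((x - xbar) ** 2 for x in arm)
    return sxy / sxx


arm = [0, 0, 0, 1, 1, 1]  # hypothetical assignment of six subjects to two arms

# Invented proportions (in percent) for one population, manual vs automated gating.
manual_adj = background_adjust([0.10, 0.12, 0.11, 0.30, 0.32, 0.31],
                               [0.02, 0.02, 0.02, 0.02, 0.02, 0.02])
auto_adj = background_adjust([0.11, 0.12, 0.10, 0.31, 0.33, 0.29],
                             [0.02, 0.02, 0.02, 0.03, 0.03, 0.03])

manual_effect = arm_effect(manual_adj, arm)
auto_effect = arm_effect(auto_adj, arm)
```

Repeating this for every population and plotting the manual effect against the automated effect gives a direct view of whether the two gating methods lead to the same inference.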

Consistent Inference between Automated and Manually Gated Data. Excellent agreement. 20 cell subsets / stimulations differ between treatment arms.

Visualize Background Adjusted Cell Population Proportions

Summary and ongoing work New tool for learning openCyto templates from manual gates Few examples needed for training Good agreement with expert manual gates Performs well for rare cell subsets Ongoing Continue validation on retrospective data sets Implement as a regular pipeline component. Standardize interface and release R package.

OpenCyto Command Line Tools http://www.github.com/RGLab/opencytoCL A command-line interface to OpenCyto. Beta release. Do simple things well: import FJ workspaces; process FCS files (compensate, transform, annotate); gate data using openCyto templates; export population statistics.

Acknowledgements. RGLab: Raphael Gottardo, Mike Jiang, Phu Van, Evan Greene. HVTN: Steve De Rosa, Julie McElrath, Peter Gilbert, Kirsten Cohen. VISC: Paul Obrecht. FlowJo: Mike Stadnisky, Jay Almarode. VRC: Mario Roederer, Kathy Foulds.