Validation of an Automated Gating Pipeline for ICS Assays in Vaccine Trials Greg Finak, Ph.D., Senior Staff Scientist Marylou Ingram ISAC Scholar Vaccine and Infectious Disease Division Fred Hutchinson Cancer Research Center
Flow Cytometry at the Fred Hutch. The intracellular cytokine staining (ICS) assay is the bread-and-butter assay for the HVTN: thousands of patient samples per year. Manual analysis and current tools limit the number of cell populations that can be explored. [Figure: timeline of ICS panel sizes from 2002 through today and panels in development (8, 10, 12, 16, 17, and 27 colors), spanning CD4 / CD8 T-cells & 3 cytokines up to TfH, memory, NK & 7 cytokines.]
Challenges for automated gating Reproducibility Validation Ease of use
OpenCyto: a general framework for reproducible automated analysis, built around R/Bioconductor. Automated hierarchical analysis. Flexible and powerful, but it requires expert knowledge (R, programming, statistics, machine learning), which makes it difficult to use for the uninitiated. Finak G, et al. PLoS Comput. Biol. (2014)
OpenCyto: How it works. Raw FCS files -> import, compensate, transform, automated gating, summarize and export -> summary tables, plots, analysis reports. Abstract gating template: Cell Population Definition (name, parent, channels, + or -); Gating Algorithms (12+ available): pick one! Tuning Parameters: pick these too!
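To make the idea of an abstract gating template concrete, here is a minimal sketch in Python (not OpenCyto's own R template format). Each entry carries the fields listed on the slide: name, parent, channels, sign, algorithm, and tuning parameters. The class name `PopulationDef`, the helper `gating_order`, and the marker and algorithm names are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PopulationDef:
    """One row of a hypothetical gating template."""
    name: str
    parent: str
    channels: tuple
    sign: str                      # "+" or "-"
    algorithm: str                 # which gating algorithm to run
    params: dict = field(default_factory=dict)  # tuning parameters


def gating_order(template, root="root"):
    """Return population names so that every parent precedes its children."""
    children = {}
    for pop in template:
        children.setdefault(pop.parent, []).append(pop.name)
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        if node != root:
            order.append(node)
        # Push children reversed so they are popped in template order.
        stack.extend(reversed(children.get(node, [])))
    return order


# Illustrative template; the rows are deliberately listed out of order.
template = [
    PopulationDef("IFNg+", "CD4", ("IFNg",), "+", "tail_gate", {"tol": 0.01}),
    PopulationDef("CD3", "root", ("CD3",), "+", "mindensity"),
    PopulationDef("CD4", "CD3", ("CD4",), "+", "mindensity"),
]
order = gating_order(template)
```

Because the template is hierarchical, the gating engine only needs the parent links to decide the order in which populations are gated.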
OpenCyto: How it works. Gating Algorithms (12+ available): pick one! Tuning Parameters: pick these too! Choosing algorithms and tuning parameters requires translating expert knowledge from one domain to another.
If you’re not a computational person… How to use automated gating tools
OpenCyto: a simple example. The cytokine gating routine has three tuning parameters that affect the position of the gate. Defaults are not always optimal. Manual tuning is not feasible for hundreds of gates and samples.
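The effect of tuning parameters on gate position can be illustrated with a simplified one-dimensional tail gate. This is a stand-in written for illustration, not OpenCyto's actual cytokine gating routine. The parameter names `adjust` (bandwidth multiplier), `tol` (fraction of peak height at which the gate is placed), and `grid_size` are assumptions and are not claimed to be the three parameters the slide refers to.

```python
import math
import statistics


def gaussian_kde(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values) / norm
        for g in grid
    ]


def tail_gate(values, adjust=1.0, tol=0.01, grid_size=200):
    """Place a cutpoint to the right of the tallest density peak.

    The cutpoint is the first grid point past the peak where the density
    drops below tol times the peak height.
    """
    n = len(values)
    # Rule-of-thumb bandwidth, scaled by the tuning parameter `adjust`.
    bandwidth = adjust * 1.06 * statistics.stdev(values) * n ** (-0.2)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = gaussian_kde(values, grid, bandwidth)
    peak = max(range(grid_size), key=dens.__getitem__)
    for i in range(peak, grid_size):
        if dens[i] < tol * dens[peak]:
            return grid[i]
    return hi  # density never fell below the tolerance


# Synthetic data: a large negative population and three positive cells.
values = [i / 100 for i in range(-50, 51)] + [5.0, 5.2, 5.4]
strict_cut = tail_gate(values, tol=0.01)
loose_cut = tail_gate(values, tol=0.1)
```

Raising `tol` pulls the gate toward the negative peak, which is the kind of sensitivity that makes hand-tuning hundreds of gates impractical.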
Learn Gating Parameters from Data. Manual gates are a great source of expert knowledge. Scan through each manual gate. Heuristics guess at the type of gate. Short-list algorithms. Optimize tuning parameters to maximize average accuracy (F-measure) relative to the training data. Choose the algorithm with the best performance.
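The learning step can be sketched as a grid search: for each short-listed algorithm and each candidate parameter value, gate every training sample, score the result against the manual gate with the F-measure, and keep the combination with the best average. The two candidate algorithms (`quantile_gate`, `sd_gate`), their parameter grids, and the function names are hypothetical; the source does not specify the optimizer actually used.

```python
import statistics


def f_measure(manual, auto):
    """F-measure between two sets of cell indices, manual gate as truth."""
    tp = len(manual & auto)
    if tp == 0:
        return 0.0
    precision = tp / len(auto)
    recall = tp / len(manual)
    return 2 * precision * recall / (precision + recall)


def quantile_gate(values, q):
    """Hypothetical candidate: threshold at the q-th quantile."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]


def sd_gate(values, k):
    """Hypothetical candidate: threshold at mean + k standard deviations."""
    return statistics.mean(values) + k * statistics.pstdev(values)


CANDIDATES = {
    "quantile_gate": (quantile_gate, [0.90, 0.95, 0.99]),
    "sd_gate": (sd_gate, [1.0, 2.0, 3.0]),
}


def cells_above(values, cut):
    """Indices of cells falling above a 1-D threshold."""
    return {i for i, v in enumerate(values) if v > cut}


def learn_gate(training):
    """training: list of (values, manual_positive_indices) pairs.

    Returns (algorithm name, parameter, average F-measure) of the best fit.
    """
    best = None
    for name, (algo, grid) in CANDIDATES.items():
        for param in grid:
            scores = [
                f_measure(manual, cells_above(values, algo(values, param)))
                for values, manual in training
            ]
            avg = sum(scores) / len(scores)
            if best is None or avg > best[2]:
                best = (name, param, avg)
    return best


# Two synthetic training samples; cells 18 and 19 were gated positive by hand.
negatives = [i * 0.1 for i in range(18)]
training = [
    (negatives + [10.0, 11.0], {18, 19}),
    (negatives + [9.0, 12.0], {18, 19}),
]
best = learn_gate(training)
```

Ties are resolved in favor of the first combination tried, a simplification that a real implementation would likely revisit.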
Inference of OpenCyto Templates from Manual Gates. Build templates for hundreds of cell populations. Push the problem of QC downstream, post-gating (where it’s much easier). Examine gate thresholds / population statistics across samples to identify and correct outliers automatically. Gate imputation: look for the “nearest” sample that passed QC and copy its gate to the outlier. QC at each gate prevents errors from propagating through the gating tree. Gates are still data-driven: use data to pick the gating algorithm and tuning parameters. Need relatively few gated samples (as few as 15).
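A minimal sketch of post-gating QC and gate imputation, under stated assumptions: outliers are flagged when a gate threshold lies more than `n_mads` median absolute deviations from the median across samples, and the "nearest" passing sample is chosen by distance on a single per-sample feature. Both the outlier rule and the nearness measure are assumptions, since the slide does not define them.

```python
import statistics


def flag_outliers(thresholds, n_mads=3.0):
    """Return sample ids whose gate threshold is far from the median."""
    med = statistics.median(thresholds.values())
    mad = statistics.median(abs(t - med) for t in thresholds.values())
    return {s for s, t in thresholds.items() if abs(t - med) > n_mads * mad}


def impute_gates(thresholds, features, n_mads=3.0):
    """Copy the gate of the nearest passing sample onto each outlier.

    thresholds: sample id -> gate threshold
    features:   sample id -> scalar used to measure nearness (hypothetical)
    """
    outliers = flag_outliers(thresholds, n_mads)
    passed = [s for s in thresholds if s not in outliers]
    fixed = dict(thresholds)
    for s in outliers:
        nearest = min(passed, key=lambda p: abs(features[p] - features[s]))
        fixed[s] = thresholds[nearest]
    return fixed, outliers


# Synthetic example: sample s5 has an implausible gate.
thresholds = {"s1": 2.0, "s2": 2.1, "s3": 1.9, "s4": 2.2, "s5": 6.0}
features = {"s1": 10.0, "s2": 11.0, "s3": 9.0, "s4": 12.0, "s5": 11.8}
fixed, outliers = impute_gates(thresholds, features)
```

Applying this check at every gate, before gating the children, is what keeps a single bad threshold from propagating down the tree.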
Some Results
HVTN Trial ICS Data. Study of 82 subjects in 3 treatment groups. 5 stimulations (4 HIV proteins, 1 negative control in 2x replicate). 384 samples. 8.2 GB of data. 165 manually gated cell subsets. Data processed with openCyto: learned the template from 15 samples, then gated the remaining data using the template.
Estimated Bandwidth Parameters for Cytokine Gates
Trained gates have reasonable accuracy down to tree depth 7. F-measure as a metric of training success. Rare cell populations: a smaller F-measure doesn’t necessarily mean the gate is wrong, because the F-measure is sensitive for small populations.
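A short worked example shows why the F-measure is sensitive for small populations. The same disagreement (one cell missed and one extra cell included) barely moves the score for a large population but lowers it sharply for a rare one. The cell counts are made up for illustration.

```python
def f_measure_from_counts(tp, fp, fn):
    """F-measure from true-positive, false-positive, false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Large population: 10,000 manually gated cells, one missed, one extra.
large = f_measure_from_counts(tp=9999, fp=1, fn=1)

# Rare population: 4 manually gated cells, one missed, one extra.
rare = f_measure_from_counts(tp=3, fp=1, fn=1)
```

Here `large` is 19998/20000 and `rare` is 6/8, even though the two gates disagree with the manual gate by the same two cells.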
How do the gates compare? We are dealing with exceedingly rare cell subsets
Training: Cell population statistics are consistent with manual gates. Tells us which populations should be trusted and which ones should be carefully reviewed (or the template edited by hand).
Training: Cell Counts are consistent with manual gates
Test Set Results (N = 379): 56,092 cell populations. Absolute error << relative error. Relative error in rare cell subsets: does it matter?
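The gap between absolute and relative error for rare subsets follows from simple arithmetic. The sketch below uses made-up counts (not values from the trial) to compare a manual and an automated gate on the same parent population.

```python
def proportion_errors(manual_count, auto_count, parent_count):
    """Absolute and relative error of the automated proportion."""
    manual = manual_count / parent_count
    auto = auto_count / parent_count
    absolute = abs(auto - manual)
    relative = absolute / manual if manual > 0 else float("inf")
    return absolute, relative


# Hypothetical rare subset: 10 vs 15 cells out of 100,000 parent cells.
absolute, relative = proportion_errors(10, 15, 100_000)
```

A five-cell disagreement is an absolute error of 0.005% of the parent population but a relative error of 50%.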
Key Question: Do the differences affect inference? Fit a linear model to the background-adjusted proportions for each of the 408 (6 cytokines x 4 stims x 17 parents) cell populations. Assess difference in magnitude between treatment arms. Compare effect size between manual and automated gating.
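A minimal sketch of the comparison, assuming background adjustment means subtracting the negative-control proportion from the stimulated proportion. For two arms, the treatment coefficient of an ordinary least-squares linear model equals the difference in group means, so that difference is used directly here. The study has three treatment groups, so a real analysis would need a model with more levels. All numbers and function names are illustrative.

```python
from statistics import mean


def background_adjust(stim, control):
    """Subtract each subject's negative-control proportion."""
    return [s - c for s, c in zip(stim, control)]


def arm_effect(adjusted, arms, reference, treated):
    """Difference in mean adjusted proportion, treated minus reference."""
    ref = [a for a, arm in zip(adjusted, arms) if arm == reference]
    trt = [a for a, arm in zip(adjusted, arms) if arm == treated]
    return mean(trt) - mean(ref)


# Synthetic proportions for one cell population, four subjects, two arms.
arms = ["A", "A", "B", "B"]
control = [0.0002, 0.0002, 0.0002, 0.0002]
manual_stim = [0.0010, 0.0012, 0.0030, 0.0034]
auto_stim = [0.0011, 0.0011, 0.0031, 0.0033]

manual_effect = arm_effect(background_adjust(manual_stim, control), arms, "A", "B")
auto_effect = arm_effect(background_adjust(auto_stim, control), arms, "A", "B")
effect_gap = abs(manual_effect - auto_effect)
```

Repeating this for every cell population and plotting manual against automated effect sizes gives a direct view of whether gating differences change the inference.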
Consistent Inference between Automated and Manually Gated Data. Excellent agreement. 20 cell subsets / stimulations differ between treatment arms.
Visualize Background Adjusted Cell Population Proportions
Summary and ongoing work. New tool for learning openCyto templates from manual gates. Few examples needed for training. Good agreement with expert manual gates. Performs well for rare cell subsets. Ongoing: continue validation on retrospective data sets; implement as a regular pipeline component; standardize interface and release R package.
OpenCyto Command Line Tools: http://www.github.com/RGLab/opencytoCL. A command-line interface to OpenCyto (beta release). Do simple things well: import FlowJo workspaces; process FCS files (compensate, transform, annotate); gate data using openCyto templates; export population statistics.
Acknowledgements. RGLab: Raphael Gottardo, Mike Jiang, Phu Van, Evan Greene. HVTN: Steve De Rosa, Julie McElrath, Peter Gilbert, Kirsten Cohen. VISC: Paul Obrecht. FlowJo: Mike Stadnisky, Jay Almarode. VRC: Mario Roederer, Kathy Foulds.