Automating NGS Gene Panel Analysis Workflows

Automating NGS Gene Panel Analysis Workflows
Gabe Rudy, VP of Product & Engineering
20 Most Promising Biotech Technology Providers; Top 10 Analytics Solution Providers; Hype Cycle for Life Sciences

NIH Grant Funding Acknowledgments
Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under:
Award Number R43GM128485
Award Number 2R44 GM125432-01
Award Number 2R44 GM125432-02
Montana SBIR/STTR Matching Funds Program Grant Agreement Number 19-51-RCSBIR-005
PI is Dr. Andreas Scherer, CEO of Golden Helix. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Who Are We?
Golden Helix is a global bioinformatics company founded in 1998.
Filtering and Annotation; ACMG Guidelines; Clinical Reports; CNV Analysis; Pipeline: Run Workflows
Variant Warehouse; Centralized Annotations; Hosted Reports; Sharing and Integration
CNV Analysis; GWAS | Genomic Prediction; Large-N Population Studies; RNA-Seq; Large-N CNV-Analysis

Cited in 1,000s of Peer-Reviewed Publications

Over 400 Customers Globally

When you choose Golden Helix, you receive more than just the software
SOFTWARE IS VETTED: 20,000+ users at 400+ organizations; quality & feedback
DEEPLY ENGRAINED IN SCIENTIFIC COMMUNITY: give back to the community; contribute content and support
SIMPLE, SUBSCRIPTION-BASED BUSINESS MODEL: yearly licensing fee; unlimited training & support
INNOVATIVE SOFTWARE SOLUTIONS: cited in 1,000s of publications

Motivation for Automation
Reduce hands-on steps.
Remove the chance for human error.
Increase the throughput of the lab.
Maximize the time lab personnel spend on interpretation.

Outline
Review the NGS gene panel analysis process.
Discuss strategies and guidelines to automate each step.
Example automated pipeline demonstration.

NGS Analysis Process
[Flow diagram. Raw Seq Data ➜ FASTQ ➜ BAM and VCF. From BAM: Target Coverage ➜ CNV Calling ➜ CNV Interpret ➜ Report. From VCF: Variant Annotation ➜ Filter & Rank ➜ ACMG Scoring ➜ Report.]

Raw Seq Data ➜ FASTQ
Convert raw image data to FASTQ.
Demultiplexing: use barcodes to split lanes into per-sample FASTQ files.
Integrated onboard: MiniSeq and MiSeq. NovaSeq, HiSeq, NextSeq: "bcl2fastq".
Input: run output folder (BCL files); sample_sheet.csv or manifest file.
Output: one directory per sample, or one pair of FASTQ files per sample.
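To make the demultiplexing idea concrete, here is a toy sketch in Python. It is not how bcl2fastq works internally: it assumes reads are already available as (barcode, sequence) pairs, matches barcodes exactly with no mismatch tolerance, and uses a hypothetical function name and an "Undetermined" bucket for unknown barcodes.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group (barcode, sequence) pairs by sample name.

    Reads whose barcode is not in the sample sheet mapping are collected
    under "Undetermined" instead of being dropped.
    """
    per_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = barcode_to_sample.get(barcode, "Undetermined")
        per_sample[sample].append(sequence)
    return dict(per_sample)


# Illustrative data only: two samples sharing one lane.
sheet = {"ACGT": "sample_A", "TTAG": "sample_B"}
lane = [("ACGT", "GATTACA"), ("TTAG", "CCGGTTA"), ("NNNN", "AAAAAAA")]
per_sample_reads = demultiplex(lane, sheet)
```

In a real pipeline each bucket would be written to a pair of per-sample FASTQ files rather than held in memory.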

FASTQ ➜ BAM + VCF
Per-sample steps: align with BWA-MEM and sort; mark duplicates; realign insertions/deletions; recalibrate base quality scores; call variants.
Input: per-sample FASTQ; reference sequence; known indel sites (for realignment); dbSNP (for identifiers); variant caller parameters.
Output: polished BAM; recalibration plots; per-sample VCF files.
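As a rough illustration of the first per-sample step, the sketch below builds an align-and-sort shell pipeline with the open-source `bwa mem` and `samtools sort` commands. The function name, file naming scheme, and read group fields are assumptions for illustration; duplicate marking, realignment, recalibration, and variant calling are left out, and the demo in this talk uses Sentieon for these steps rather than this command line.

```python
import shlex


def align_and_sort_pipeline(sample, ref_fasta, fastq_r1, fastq_r2, threads=4):
    """Return a shell pipeline string: bwa mem piped into samtools sort."""
    # bwa expects literal "\t" separators inside the -R read group string.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    align = ["bwa", "mem", "-t", str(threads), "-R", read_group,
             ref_fasta, fastq_r1, fastq_r2]
    # "-" tells samtools sort to read the alignments from standard input.
    sort = ["samtools", "sort", "-@", str(threads),
            "-o", f"{sample}.sorted.bam", "-"]
    return " | ".join(shlex.join(cmd) for cmd in (align, sort))


pipeline = align_and_sort_pipeline("S1", "ref.fa", "S1_R1.fastq.gz", "S1_R2.fastq.gz")
```

Building the command as a list and quoting it with `shlex.join` keeps sample names with unusual characters from breaking the shell command.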

BAM ➜ Called CNVs
VS-CNV can call CNVs from NGS coverage.
It normalizes coverage and compares it to a pool of reference samples.
It uses multiple metrics to make calls ranging from single targets to whole-chromosome aneuploidy.
Input: target regions; CNV reference samples.
Output: per-sample CNV calls.
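The general idea of normalizing coverage and comparing it to a reference pool can be sketched as follows. This is a simplified illustration, not the VS-CNV algorithm: it assumes per-target mean depths are already computed, normalizes each sample by its total depth, and reports a ratio and z-score per target against the reference samples. All names are hypothetical.

```python
from statistics import mean, stdev


def normalize(coverage):
    """Scale per-target depths so they sum to 1 for the sample."""
    total = sum(coverage.values())
    return {target: depth / total for target, depth in coverage.items()}


def coverage_metrics(sample_cov, reference_covs):
    """Return {target: (ratio, z_score)} for a sample versus a reference pool."""
    sample_norm = normalize(sample_cov)
    ref_norms = [normalize(ref) for ref in reference_covs]
    metrics = {}
    for target, value in sample_norm.items():
        ref_values = [ref[target] for ref in ref_norms]
        ref_mean = mean(ref_values)
        ref_sd = stdev(ref_values)  # needs at least two reference samples
        ratio = value / ref_mean
        # Guard against a reference pool with no variation at this target.
        z_score = (value - ref_mean) / ref_sd if ref_sd > 0 else 0.0
        metrics[target] = (ratio, z_score)
    return metrics


# Illustrative depths only: target "B" has roughly half the expected coverage.
references = [
    {"A": 100, "B": 100, "C": 100},
    {"A": 105, "B": 95, "C": 100},
    {"A": 95, "B": 105, "C": 100},
]
sample = {"A": 100, "B": 50, "C": 100}
metrics = coverage_metrics(sample, references)
```

A low ratio with a strongly negative z-score on a target suggests a deletion; a real caller would also segment adjacent targets and apply further QC.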

CNV Filtering and Analysis
Multiple QC metrics are provided per CNV call: quality flags; average Z-score / ratios; p-value.
Annotations help remove benign CNVs and highlight candidate clinical CNVs.
Input: raw CNV calls; filtering parameters; CNV annotations.
Output: annotated, high-quality calls.
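A filter over the per-call QC metrics might look like the sketch below. The record fields, flag text, and threshold values are hypothetical and chosen only for illustration; appropriate cutoffs depend on the assay and should come from validation data.

```python
from dataclasses import dataclass, field


@dataclass
class CnvCall:
    region: str
    state: str            # e.g. "Deletion" or "Duplication"
    avg_z_score: float
    p_value: float
    flags: list = field(default_factory=list)  # QC flags raised by the caller


def high_quality_calls(calls, min_abs_z=3.0, max_p_value=0.01):
    """Keep calls with no QC flags, a strong z-score, and a small p-value."""
    return [
        call for call in calls
        if not call.flags
        and abs(call.avg_z_score) >= min_abs_z
        and call.p_value <= max_p_value
    ]


# Illustrative calls only.
raw_calls = [
    CnvCall("chr17:target_12", "Deletion", -5.2, 0.0004),
    CnvCall("chr13:target_03", "Duplication", 2.1, 0.08),
    CnvCall("chr2:target_40", "Deletion", -4.8, 0.001, flags=["Low coverage"]),
]
kept_calls = high_quality_calls(raw_calls)
```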

VCF ➜ Prioritized Variants
Quality metrics from the variant caller help optimize precision.
Annotate with public and proprietary annotation sources.
Algorithms for scoring and prioritizing by phenotype.
Input: raw variant calls; filtering parameters; variant annotations; sample phenotypes / gene lists.
Output: annotated candidate variants.
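The filter-and-rank step can be sketched in a few lines, assuming variants have already been annotated and loaded as dictionaries. The keys (`qual`, `depth`, `pop_freq`, `gene`), the thresholds, and the ranking rule are all illustrative assumptions, not the scoring used by any particular product.

```python
def prioritize(variants, gene_list, min_qual=30.0, min_depth=20, max_pop_freq=0.01):
    """Filter on quality and population frequency, then rank.

    Variants in the phenotype gene list come first; within each group,
    rarer variants are ranked higher.
    """
    genes_of_interest = set(gene_list)
    kept = [
        v for v in variants
        if v["qual"] >= min_qual
        and v["depth"] >= min_depth
        and v["pop_freq"] <= max_pop_freq
    ]
    return sorted(kept, key=lambda v: (v["gene"] not in genes_of_interest, v["pop_freq"]))


# Illustrative annotated variants only.
annotated = [
    {"id": "v1", "gene": "TTN", "qual": 80.0, "depth": 55, "pop_freq": 0.0005},
    {"id": "v2", "gene": "BRCA2", "qual": 95.0, "depth": 60, "pop_freq": 0.002},
    {"id": "v3", "gene": "BRCA1", "qual": 12.0, "depth": 8, "pop_freq": 0.0},
    {"id": "v4", "gene": "BRCA1", "qual": 70.0, "depth": 40, "pop_freq": 0.2},
]
candidates = prioritize(annotated, gene_list=["BRCA1", "BRCA2"])
```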

ACMG Scoring of Variants
Candidate variants should be evaluated with appropriate guidelines.
Previous interpretations are incorporated.
Workflow support for following the guidelines accurately and efficiently.
Partly automated, but ultimately requires hands-on interpretation of novel variants.
Input: candidate variants.
Output: scored and interpreted variants ready for clinical reporting.

Clinical Report
Deliverable of the clinical genetic test.
Lab- and test-specific report template that incorporates all relevant output.
Manually reviewed and signed off by the Lab Director.
Input: patient information; interpreted CNVs; interpreted variants.
Output: HTML, PDF, or other structured data format.
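Filling a report template from interpreted results could be sketched as below, using only the Python standard library. The template text, field names, and function name are hypothetical; a real lab template would carry far more content, and the draft still requires manual review and sign-off.

```python
from html import escape
from string import Template

# Hypothetical minimal template; real lab templates are test specific.
REPORT_TEMPLATE = Template(
    "<html><body>"
    "<h1>$test_name</h1>"
    "<p>Patient: $patient</p>"
    "<ul>$findings</ul>"
    "<p>Status: DRAFT, pending Lab Director review</p>"
    "</body></html>"
)


def draft_report(test_name, patient, findings):
    """Render a draft HTML report, escaping all user-supplied text."""
    items = "".join(f"<li>{escape(finding)}</li>" for finding in findings)
    return REPORT_TEMPLATE.substitute(
        test_name=escape(test_name),
        patient=escape(patient),
        findings=items,
    )


report_html = draft_report(
    "Hereditary Cancer Panel",
    "Example Patient <ID 001>",
    ["Interpreted variant placeholder", "Interpreted CNV placeholder"],
)
```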

Automation Guidelines and Strategies
Use a script to chain together command-line tools.
Allow the script to take input parameters that may change.
Use consistent naming and a consistent output structure.
Include logs as part of the output structure.
Precompute as much as possible, so that the "jump in" point for analysis is quick to open.
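These guidelines can be sketched as a small Python driver. Everything here is an illustrative assumption: the directory layout, the file naming, the option names, and the helper functions are hypothetical, and the actual tool commands would be supplied by the lab.

```python
import argparse
import subprocess
import sys
from pathlib import Path


def sample_paths(out_root, batch, sample):
    """Consistent naming: every sample gets the same layout, logs included."""
    base = Path(out_root) / batch / sample
    return {
        "bam": base / f"{sample}.bam",
        "vcf": base / f"{sample}.vcf.gz",
        "log": base / "logs" / f"{sample}.log",
    }


def run_step(name, command, log_path):
    """Run one command-line tool, appending its output to the sample log."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "a") as log:
        log.write(f"== {name}: {' '.join(command)}\n")
        log.flush()  # keep the header ahead of the tool's own output
        subprocess.run(command, stdout=log, stderr=subprocess.STDOUT, check=True)


def parse_args(argv=None):
    """Expose the parameters that may change from run to run."""
    parser = argparse.ArgumentParser(description="Gene panel pipeline driver (sketch)")
    parser.add_argument("--batch", required=True, help="batch name")
    parser.add_argument("--out-root", default="analysis", help="output root folder")
    parser.add_argument("--threads", type=int, default=4)
    return parser.parse_args(argv)
```

Because `check=True` raises on a non-zero exit code, a failed step stops the chain instead of passing incomplete files to the next tool.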

Automation Demo
Starting point: per-sample FASTQ files; samples.csv with patient information.
A file system watcher looks for samples.csv alongside a batch of FASTQ files and kicks off the automation pipeline.
Let's start it and watch!
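One simple way to implement such a watcher is to poll a folder, sketched below. This is not the watcher used in the demo: the folder layout (one subfolder per batch), the `sample` column name in samples.csv, and the `<sample>_*.fastq.gz` naming pattern are all assumptions for illustration.

```python
import csv
from pathlib import Path


def find_new_batches(watch_dir, already_seen):
    """Return batch folders that have a samples.csv and FASTQ files for every sample."""
    ready = []
    for sheet in sorted(Path(watch_dir).glob("*/samples.csv")):
        batch_dir = sheet.parent
        if batch_dir in already_seen:
            continue
        with open(sheet, newline="") as handle:
            samples = [row["sample"] for row in csv.DictReader(handle)]
        # Only start once every listed sample has at least one FASTQ file.
        if samples and all(any(batch_dir.glob(f"{s}_*.fastq.gz")) for s in samples):
            ready.append(batch_dir)
    return ready
```

A driver would call `find_new_batches` in a loop with a short sleep, add each returned folder to `already_seen`, and launch the pipeline for it.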

Automated Pipeline Components
Sentieon secondary analysis: alignment with BWA-MEM; sort, dedup, realign, recalibrate; call variants.
VarSeq (via VSPipeline): create a project for the batch; steps are defined by the project template: VS-CNV coverage and calling; annotate and filter CNVs and variants; VSClinical ACMG auto-classifier; VSReports auto-fill.

Hands-On Steps
Outputs of automation: BAM, recalibration PDF, and VCF files; Excel spreadsheet with variants + CNVs; draft HTML report; prepared project.
Hands-on steps: open the project and review sample stats. Then, per sample: QC and interpret CNVs; interpret candidate variants; finalize the report; export as PDF.

GHI Updates
New eBook release: Clinical Variant Analysis: Applying the ACMG Guidelines to Analyze Germline Diseases.
ACMG 2019: Seattle, WA, April 2-6, 2019. Stop by the Golden Helix booth #622 for one of our live demos or a one-on-one conversation.