HL7 Clinical Sequencing Symposium
Oncology Use Cases
Ellen Beasley, Ph.D., VP, Ion Bioinformatics
September 14, 2011


Overview
Uses and Complexities
– what we need to detect
– what samples we need
Workflows
Bioinformatics
Standards Needs

Uses
Identify genetic variants causing tumor formation, progression or therapy sensitivity/resistance
Current applications for sequencing in cancer are not readily identified with existing molecular methods
Over the next 1-5 years, sequencing is likely to be employed in instances when traditional tumor profiling fails to definitively select a treatment
Adoption of sequencing methods will depend on cancer type and stage

Cancer mutations: overview
Most biological variants have been implicated in cancer
Large Scale (Structural; SV)
– Deletion, Duplication, Transposition, Inversion
Small Scale
– SNP/SNV, STR, microsatellites
– Insertion, Deletion, colocalized Ins+Del
Epigenetic
– DNA Methylation, Histone modification
RNA
– Expression, Splicing, Localization, siRNA, ncRNA
Protein
– Translation, Folding, Modification, Localization

Heterogeneity and Cellularity: Somatic mutation accumulation in life & cancer
MR Stratton et al. Nature 458, (2009) doi: /nature07943

Higher coverage is required for low frequency allele detection in mixtures
[Figure: Allele ratio, Coverage; panels Normal and Tumor. Courtesy of Richard Gibbs, BCM]
Clonality and cellularity vary between cancers and tumor samples
Ability to detect mutations at <25% mixture is important
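The link between coverage and low frequency allele detection can be illustrated with a simple binomial model. This is a minimal sketch, not the method behind the figure: it assumes reads sample alleles independently, ignores sequencing error, and uses a hypothetical rule that a variant counts as detected once at least `min_alt_reads` reads carry it.

```python
from math import comb  # Python 3.8+


def detection_probability(depth: int, allele_fraction: float,
                          min_alt_reads: int = 3) -> float:
    """P(at least min_alt_reads variant reads) at a site covered `depth` times.

    Assumption: each read independently carries the variant allele with
    probability `allele_fraction` (binomial sampling, no sequencing error).
    """
    p_too_few = sum(
        comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    # Compare a 25% mixture with a 5% mixture at three coverage levels.
    for depth in (30, 100, 500):
        for fraction in (0.25, 0.05):
            p = detection_probability(depth, fraction)
            print(f"depth={depth:4d} fraction={fraction:.2f} P(detect)={p:.3f}")
```

Under these assumptions a 25% allele is usually seen at 30x, while a 5% allele needs far deeper coverage to be seen reliably.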

Cancer sequencing samples
Tumor: Formalin-fixed, paraffin-embedded (FFPE) samples
– Standard for clinic samples
– Usually only tumor sample, no control tissue
– Difficult to extract and poor preservation
Fresh/frozen tumor sequencing
– Solid tumor sample obtained during biopsy
– Less common practice
Paired tumor and control samples
– Solid tumor sample obtained during biopsy
– Control is blood and/or adjacent “normal” tissue
– For blood tumors, flow-sorted cells may be obtained, normal and diseased

Paired samples for cancer transcriptome
Primary cancer tumor
Adjacent normal tissue
Blood cells?

[Figure: Allelic Ratios (N, T); Allelic imbalance in expression; DNA methylation]
Transcribed somatic mutations are detectable
Analogous to allelic imbalance between two samples
Problem is to distinguish from clonally amplified library errors

Complexities: Lessons Learned
Heterogeneity within and across tumor types – difficult to identify rules for interpretation
High rate of abnormalities (driver vs. passenger) – prioritization of results becomes a larger challenge than detection
Quality of tissue directly impacts the quality of the data generated
Large scale data generation requires an analytical pipeline to ensure close to “real-time” interpretation of the results

Confounding factors & challenges
Sample quality and quantity influence required depth of coverage and power to detect low frequency variants
– Pathology report indicating tumor cellularity (ideal >70% cellularity)
– SNP array-based methods can be used to estimate cellularity and ploidy to guide/interpret sequencing
Sample availability
– Sample availability depends on standard of care
– Therefore, sample availability will vary with cancer type and regional treatment norms
Circulating tumor cells could result in mutant alleles showing up in normal DNA sequence
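How cellularity and copy number shift the allele fraction a sequencer should expect can be worked through with a small calculation. This is a sketch under stated assumptions: the sample is a simple mix of tumor and normal cells, the mutation is clonal within the tumor, and the function and parameter names are illustrative only.

```python
def expected_allele_fraction(cellularity: float,
                             tumor_copies: int = 2,
                             mutant_copies: int = 1,
                             normal_copies: int = 2) -> float:
    """Expected fraction of reads carrying a clonal somatic mutation.

    cellularity   -- fraction of cells in the sample that are tumor cells
    tumor_copies  -- copy number of the locus in tumor cells (assumed >= 1)
    mutant_copies -- how many of those copies carry the mutation
    normal_copies -- copy number of the locus in normal cells
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    mutant = cellularity * mutant_copies
    total = cellularity * tumor_copies + (1.0 - cellularity) * normal_copies
    return mutant / total


if __name__ == "__main__":
    # Heterozygous mutation in a diploid region at several cellularities.
    for cellularity in (0.9, 0.7, 0.5, 0.2):
        print(cellularity, round(expected_allele_fraction(cellularity), 3))
```

In this model a heterozygous mutation in a diploid region of a 50% cellular sample is expected in only about a quarter of reads, which is why cellularity and ploidy estimates help when choosing depth.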

WORKFLOWS
Workflow variations
– Somatic Tumor / Normal Comparison
– Germline / Somatic Comparison
– Gene Expression
Bioinformatics

Sample to Reads
[Diagram: Library Construction; Sequencing; Sample Extraction; FFPE: Deparaffinization, Reverse cross-linking; Enrichment]

Cancer sequencing formats
Gene Panels (Amplicon/Enrichment) (>500x)
– Tens to hundreds of loci targeted, regions where mutants are known to be associated with cancer
Whole Exome (>60x)
– Fragment or paired-end approach
Whole Genome (>30x)
– Standard single chemistry whole genome approach
– Mixed library whole genome approach
– Shallow mate (10x) + deep targeted (30x exome) approach
Transcriptome
– Single tumor sample
– Paired samples
  – Tumor and adjacent “normal” tissue (best)
  – Tumor and reference normals

Pipeline for cancer genomics: Tumor + Normal

Transcriptional analysis of cancer
Due to library prep and sequencing biases, most analyses are best carried out as comparisons between two conditions (normal vs tumor)
Normal tissue samples (adjacent) are difficult to obtain
– Not indicated in primary tumor resection under normal care, only by special research protocol
– Adjacent tissue may not be the same tissue and may be contaminated with cancer cells
Sample amount may be limiting for small tumors
– RNA can be degraded
– FFPE samples

Transcription pipeline for cancer: Tumor and adjacent

Somatic mutations in RNA
Requires high coverage of transcripts
Reduction of redundancy due to clonal amplification of fragments would make this more cost-effective
Analysis needs to work hand in hand with the specific library preparation protocol
DNA sequence information (tumor and normal), if available, should inform analysis

BIOINFORMATICS
Calls on Instrument
Mapping
Variant Calling
Annotation
Interpretation

Analysis pipeline
[Diagram: Sample Prep, Generate Data, Map/Align, Detect, Annotate, Interpret]
A standard cancer genome analysis bioinformatics pipeline is needed to discover and report all relevant somatic alterations occurring in a tumor
– Alignment or Assembly
– Variant detection
  – Point mutation detection (SNP, SNV)
  – Small indel (insertions, deletions, colocalized insertion/deletion)
  – Copy number variation
  – Structural variation (inversions, translocations, breakpoint resolution)
Detection requirements (for discussion)
– Detect 99% of mutations at allele fractions as low as 5%; 10% false positives (FP)?
Tabular reporting and visualization
– Graphical presentation
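The detection requirement proposed for discussion (99% of mutations at fractions as low as 5%) can be turned into a rough depth target. This is a back-of-the-envelope sketch: it assumes binomial sampling of reads with no sequencing error and a hypothetical caller that needs at least `min_alt_reads` supporting reads. Real pipelines would also have to account for error rates and the false positive budget.

```python
from math import comb  # Python 3.8+


def sensitivity(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under binomial sampling."""
    p_miss = sum(
        comb(depth, k) * allele_fraction ** k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


def minimum_depth(target_sensitivity: float = 0.99,
                  allele_fraction: float = 0.05,
                  min_alt_reads: int = 5,
                  max_depth: int = 5000):
    """Smallest depth meeting the target, or None if max_depth is not enough.

    A linear scan is sufficient because sensitivity only increases with depth
    when the required number of supporting reads is fixed.
    """
    for depth in range(min_alt_reads, max_depth + 1):
        if sensitivity(depth, allele_fraction, min_alt_reads) >= target_sensitivity:
            return depth
    return None


if __name__ == "__main__":
    print("depth for 99% sensitivity at 5%:", minimum_depth())
    print("depth for 99% sensitivity at 25%:", minimum_depth(allele_fraction=0.25))
```

The supporting-read threshold is the main design choice here: raising it lowers false positives from sequencing error but pushes the required depth up.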

Breadth of transcript analysis tools
Raw read alignment counts across genome and per annotation
Differential gene expression: coding and non-coding
– Paired samples
Alternative splicing
– Single sample
– Paired samples
Novel transcripts (non-coding RNA, exons)
– Single sample
Allelic imbalance
– Single sample
– Paired samples
Gene Fusions
– Single sample
– Paired samples
Expressed mutations
– Single sample
– Paired samples

Integrated analyses are more powerful … and more difficult to automate
Integration of point mutation data with structural variation can dramatically change the impact of genomic alterations
– Effect of gene dosage is important
– Homozygous point mutation or a combination of a mutation within a region of copy number change could destroy a gene's activity
Correlation of genomic alterations with gene expression pattern may point to mechanistic significance
– e.g. allelic imbalance with CNV; translocations with fusion transcripts
Output of analysis algorithms should facilitate this analysis
– Common indexing to genome
– Content: e.g. indel sequence provided, etc.
Data formats – need to expand as this unfolds
– RNA Editing
– Epigenetics (Methylation, Histone modification)

Bioinformatics – Annotation
Most of the annotations are already covered in HL7 CG WG draft documents:
– HL7 Version 2 Implementation Guide: Clinical Genomics; Fully LOINC-Qualified Genetic Variation Model, Release 2
May need to extend this to add other relevant annotations (e.g., COSMIC ID)

Bioinformatics – Interpretation
Greatest gap today: interpretation norms, databases and visualization
Most interpretation is done by experts – integration of data types and knowledge of disease, pathways, drugs, etc.
No approved sets of variant to disease/drug annotations are in common practice
Most interpretive reports are currently unstructured
Interpretive reports would benefit from structure and prioritization

Standards Development Needs
Biology is complex!
We’ll need to distinguish between research, translational, and clinical uses to prioritize common clinical uses for standards development
Semantic standards for adding biological/clinical annotations to variants and evidence trails (citations)
Metrics to describe quality/uncertainty of annotations
Structured formats for interpretive reports
In order to learn, genomic data must be integrated with downstream treatment decisions and outcomes

Thank You! © 2011 Life Technologies Corporation. All rights reserved. The trademarks mentioned herein are the property of Life Technologies Corporation or their respective owners. For Research Use Only. Not intended for animal or human therapeutic or diagnostic use.

APPENDIX
Depth of coverage example – DNA
Depth of coverage example – Somatic DNA
Depth of coverage example – RNA Transcription
Gene Fusion example

Variant calling: DNA
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
Prior:
Probability: G=0.8 A=0.001 T=0.001 C=0.001
Call: G/G
P-val: 0.1

Variant calling: DNA
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGGCTTAGGCATTA
Prior:
Probability: G=0.7 A=0.01 T=0.001 C=0.001
Call: G/G
P-val: 0.05

Variant calling: DNA
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGACTTAGGCATTA
Prior:
Probability: G=0.7 A=0.01 T=0.001 C=0.001
Call: G/G
P-val: 0.1

Variant calling: DNA
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGACTTAGGCATTA
 CGTGGTAGGACCTGCTAGGCTAGACT
      TAGGACCTGCTAGGCTAGACTTAGGC
Prior:
Probability: G=0.4 A=0.5 T=0.001 C=0.001
Call: G/A
P-val: 0.01
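The progression in these slides, where the call moves from G/G to G/A as reads supporting A accumulate, can be mimicked with a small Bayesian genotype caller. This is a sketch only and not the caller used to produce the slide values: the uniform per-base error rate, the three-genotype model and the prior values are all assumptions chosen for illustration.

```python
from math import exp, log


def genotype_posteriors(pileup: str, ref: str, alt: str,
                        error_rate: float = 0.01,
                        variant_prior: float = 0.001) -> dict:
    """Posterior probability of ref/ref, ref/alt and alt/alt given pileup bases.

    Assumptions (illustrative): every base has the same error rate, errors are
    spread evenly over the three other bases, and priors strongly favor ref/ref.
    """
    # (label, fraction of reads expected to carry alt, prior probability)
    genotypes = [
        (f"{ref}/{ref}", 0.0, 1.0 - 1.5 * variant_prior),
        (f"{ref}/{alt}", 0.5, variant_prior),
        (f"{alt}/{alt}", 1.0, 0.5 * variant_prior),
    ]

    def base_prob(observed: str, true_allele: str) -> float:
        return 1.0 - error_rate if observed == true_allele else error_rate / 3.0

    log_scores = {}
    for label, alt_fraction, prior in genotypes:
        score = log(prior)
        for base in pileup:
            score += log(alt_fraction * base_prob(base, alt)
                         + (1.0 - alt_fraction) * base_prob(base, ref))
        log_scores[label] = score

    # Normalize in log space so deep pileups do not underflow.
    top = max(log_scores.values())
    weights = {g: exp(s - top) for g, s in log_scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def call_genotype(pileup: str, ref: str, alt: str):
    """Return the most probable genotype and its posterior."""
    posteriors = genotype_posteriors(pileup, ref, alt)
    best = max(posteriors, key=posteriors.get)
    return best, posteriors[best]


if __name__ == "__main__":
    # Bases observed at the variable site as reads are added, as in the slides.
    for pileup in ("GG", "GGG", "GGA", "GGAAA"):
        print(pileup, call_genotype(pileup, ref="G", alt="A"))
```

With these assumed priors a single A among three reads is still explained as an error, while three A reads out of five tip the call to heterozygous.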

Subtractive calling of somatic mutations
1. Call variants in normal sample at low stringency
2. Call tumor at high stringency
3. Subtract germline variants to produce a list of putative somatic mutations
[Figure: Trade-off between true positives and false positives in variant calling; High stringency calling, Low stringency calling]
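The three steps above can be sketched as a set subtraction over variant calls. The `Call` record, the quality thresholds and the example coordinates are hypothetical and only illustrate the asymmetric stringency: a lenient threshold in the normal sample so that germline variants are not missed, and a strict one in the tumor.

```python
from typing import Iterable, List, NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float  # caller-specific confidence score (assumed higher = better)


def variant_key(call: Call):
    """Identify a variant by position and alleles, ignoring its quality."""
    return (call.chrom, call.pos, call.ref, call.alt)


def putative_somatic(tumor_calls: Iterable[Call],
                     normal_calls: Iterable[Call],
                     tumor_min_quality: float = 30.0,
                     normal_min_quality: float = 5.0) -> List[Call]:
    # Step 1: low stringency in the normal sample.
    germline = {variant_key(c) for c in normal_calls
                if c.quality >= normal_min_quality}
    # Step 2: high stringency in the tumor; step 3: subtract germline variants.
    return [c for c in tumor_calls
            if c.quality >= tumor_min_quality and variant_key(c) not in germline]


if __name__ == "__main__":
    # Made-up calls for illustration only.
    normal = [Call("chr1", 100, "G", "A", 8.0), Call("chr2", 500, "C", "T", 2.0)]
    tumor = [
        Call("chr1", 100, "G", "A", 60.0),  # also seen in normal: germline
        Call("chr1", 250, "T", "C", 45.0),  # tumor only, confident: kept
        Call("chr3", 900, "A", "G", 12.0),  # tumor only, weak: dropped
    ]
    for call in putative_somatic(tumor, normal):
        print(call)
```

Lowering the normal threshold removes more germline variants at the cost of discarding some true somatic calls, which is the trade-off the figure refers to.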

Variant Calling: Somatic DNA Mutations
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGACTTAGGCATTA
 CGTGGTAGGACCTGCTAGGCTAGACT
      TAGGACCTGCTAGGCTAGACTTAGGC
Germline: TAGGCTT
Priors: G=0.997 A=0.001 T=0.001 C=0.001
Probability: G=0.5 A=0.5 T=0 C=0
Call: A
P-val:

Variant Calling: Somatic DNA Mutations
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGACTTAGGCATTA
 CGTGGTAGGACCTGCTAGGCTAGACT
      TAGGACCTGCTAGGCTAGACTTAGGC
Germline: TAGACTT, TAGGCTT
Priors: G=0.5 A=0.5 T=0 C=0
Probability: G=0.5 A=0.5 T=0 C=0
Call: No call
P-val:

Variant Calling: Somatic RNA Mutations
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGACTTAGGCATTA
         GACCTGCTAGGCTAGACTTAGGCATTA  (same start point)
Probability: G=0.9 A=0.001 T=0.001 C=0.001
Call: G
P-val: 0.01

Variant Calling: Somatic RNA Mutations
ACGTGGTAGGACCTGCTAGGCTAGGCTTAGGCATTAGGCATTGGCTTAC
        GGACCTGCTAGGCTAGGCTTAGGCATT
            CTGCTAGGCTAGGCTTAGGCATTAGGC
         GACCTGCTAGGCTAGACTTAGGCATTA
         GACCTGCTAGGCTAGACTTAGGCATTA
 CGTGGTAGGACCTGCTAGGCTAGACT
      TAGGACCTGCTAGGCTAGACTTAGGC
Probability: G=0.9 A=0.001 T=0.001 C=0.001
Call: A
P-val:
High depth required
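The point of the two RNA slides is that reads sharing a start point may be clonal copies of one library fragment, so they should not count as independent evidence. Below is a minimal sketch of that idea using the reads from the slides; collapsing purely by start position and the 0-based coordinates are simplifying assumptions, and production tools apply more careful duplicate handling.

```python
from collections import Counter

# (0-based start on the reference, read sequence), taken from the slide.
READS = [
    (8, "GGACCTGCTAGGCTAGGCTTAGGCATT"),
    (12, "CTGCTAGGCTAGGCTTAGGCATTAGGC"),
    (9, "GACCTGCTAGGCTAGACTTAGGCATTA"),
    (9, "GACCTGCTAGGCTAGACTTAGGCATTA"),  # same start point: possible clonal copy
]
VARIANT_POSITION = 24  # 0-based reference position of the G>A difference


def collapse_by_start(reads):
    """Keep only the first read seen at each start position."""
    first_at_start = {}
    for start, sequence in reads:
        first_at_start.setdefault(start, sequence)
    return sorted(first_at_start.items())


def allele_counts(reads, position):
    """Count the bases that reads place at a reference position."""
    counts = Counter()
    for start, sequence in reads:
        offset = position - start
        if 0 <= offset < len(sequence):
            counts[sequence[offset]] += 1
    return counts


if __name__ == "__main__":
    print("raw:      ", dict(allele_counts(READS, VARIANT_POSITION)))
    print("collapsed:", dict(allele_counts(collapse_by_start(READS), VARIANT_POSITION)))
```

After collapsing, only one independent read supports A, which matches the slide's call of G until reads with distinct start points are added.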

Gene Fusions Transcripts [Maher 2009]