Finding the Lost Treasure of NGS Data
Yan Guo, PhD

VANGARD

Modules Overview for DNAseq (exome / whole genome)
Input: fastq files; quality control with FastQC
Alignment: bwa, producing BAM files; alignment QC with bamQC
GATK refinement (best practice): mark duplicates, realignment, recalibration, using dbSNP / indel resources
Variant calling: SNPs/INDELs, filtered and written to vcf files, followed by somatic mutation analysis and gene-level analysis (gene coding changes, gene associations)
Structural variant analysis: translocations, inversions, copy number variants
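One way to make the module dependencies explicit is to encode them as a graph and let a topological sort produce a valid run order. The sketch below is a hypothetical encoding of the DNAseq modules listed above (the dictionary and the stage names are illustrative, and the mark duplicates, realignment, recalibration order assumes the GATK 3-era best practice); it is not VANGARD's actual pipeline code.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph: each module maps to the modules it needs first.
DNA_MODULES = {
    "FastQC": {"fastq files"},
    "bwa alignment": {"fastq files"},
    "bamQC": {"bwa alignment"},
    "mark-duplication": {"bwa alignment"},
    "realignment": {"mark-duplication"},
    "recalibration": {"realignment"},
    "SNP/INDEL calling": {"recalibration"},
    "filter": {"SNP/INDEL calling"},
    "somatic mutation": {"filter"},
    "gene-level analysis": {"filter"},
    "structural variant analysis": {"bwa alignment"},
}


def run_order(graph):
    """Return one valid execution order in which every prerequisite comes first."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(DNA_MODULES)
for step in order:
    print(step)
```

Any order the sorter returns is valid; independent branches (for example bamQC and structural variant analysis) can run in parallel once their prerequisites finish.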

Modules Overview for RNAseq
Input: fastq files; quality control with FastQC
Alignment: tophat, producing BAM files; alignment QC with SeQC
Refinement: cufflinks (annotations), cuffmerge, cuffdiff (comparisons)
Gene quantification and gene lists, followed by functional/pathway analysis and clustering
Novel gene discovery (identifying novel genes) and gene-fusion analysis
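Gene quantification in the cufflinks step is reported as FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows only the basic normalization definition; Cufflinks itself estimates abundances with a statistical model that resolves fragments compatible with several isoforms, so this function is an illustration, not a reimplementation.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if length_bp <= 0 or total_mapped <= 0:
        raise ValueError("transcript length and library size must be positive")
    # Equivalent to fragments / (length_bp / 1e3) / (total_mapped / 1e6).
    return fragments * 1e9 / (length_bp * total_mapped)


print(fpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by transcript length makes long and short genes comparable, and dividing by library size makes samples sequenced to different depths comparable.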

What do you expect to find in NGS data?
DNAseq: SNPs, somatic mutations, small indels, large structural changes, CNVs
RNAseq: gene expression differences, splicing variants, fusion genes

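What don't you expect to find in NGS data?
Exome sequencing reads are split first by whether they map to the reference genome, then by whether they fall in the targeted regions:
Is mapped? No: unmapped DNA reads (virus/microbe DNA, contamination)
Is mapped? Yes, is targeted? Yes: targeted DNA
Is mapped? Yes, is targeted? No: untargeted DNA (intronic DNA, intergenic DNA, mitochondrial DNA)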
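The decision tree above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the `Read` type is hypothetical, only the read's start position is tested against the capture targets, target intervals are assumed non-overlapping and 1-based inclusive, and the mitochondrial contig names (`chrM`, `MT`) depend on the reference build. Splitting untargeted reads into intronic and intergenic would additionally require a gene annotation.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Read:  # hypothetical minimal read record
    name: str
    contig: Optional[str]  # None means the read did not map
    pos: int = 0           # 1-based leftmost mapped position


def build_index(targets):
    """Group (contig, start, end) target intervals by contig and sort them."""
    index = {}
    for contig, start, end in targets:
        index.setdefault(contig, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def classify(read, index, mito_names=("chrM", "MT")):
    if read.contig is None:
        return "unmapped"  # candidate virus/microbe DNA or contamination
    intervals = index.get(read.contig, [])
    # Last interval whose start is <= read.pos.
    i = bisect_right(intervals, (read.pos, float("inf"))) - 1
    if i >= 0 and intervals[i][0] <= read.pos <= intervals[i][1]:
        return "targeted"
    if read.contig in mito_names:
        return "untargeted:mitochondrial"
    return "untargeted"  # intronic or intergenic; needs annotation to split


index = build_index([("chr1", 100, 200), ("chr1", 500, 600)])
for r in [Read("r1", None), Read("r2", "chr1", 150), Read("r3", "chr1", 300), Read("r4", "chrM", 42)]:
    print(r.name, classify(r, index))
```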

Exome Capture

Why do we care about intronic and intergenic regions?
Some introns can encode specific proteins and can be processed after splicing to form noncoding RNA molecules (Rearick, Prakash et al. 2011).
The majority of GWAS SNPs are not in coding regions (706 exonic, 3,986 intronic, 3,323 intergenic).
The ENCODE Project: ENCyclopedia Of DNA Elements
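As a quick check on "the majority", the GWAS SNP counts quoted above can be turned into proportions. The only inputs are the three counts from the slide.

```python
# GWAS catalog SNP counts by region, as quoted on the slide.
counts = {"exon": 706, "intron": 3986, "intergenic": 3323}

total = sum(counts.values())                          # 8015
noncoding = counts["intron"] + counts["intergenic"]   # 7309
exon_share = counts["exon"] / total

print(total, noncoding, f"{noncoding / total:.1%}", f"{exon_share:.1%}")
# 8015 7309 91.2% 8.8%
```

So roughly nine in ten of these SNPs lie outside exons, which is why reads falling in intronic and intergenic regions are worth keeping.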

GWAS catalog SNPs missed by exome capture kits
Kits: SureSelect (v2), TruSeq, SeqCap EZ (v3.0)
Compared by: target total bases; missing exon SNPs; missing intron SNPs; missing intergenic SNPs
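The comparison above amounts to intersecting SNP positions with each kit's capture intervals. The sketch below assumes a single contig, 1-based inclusive intervals, and hypothetical positions and category labels; a real comparison would read each kit's target BED file and the GWAS catalog coordinates on the same genome build.

```python
from bisect import bisect_right
from collections import Counter


def merge(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def target_bases(merged):
    """Total number of bases covered by the merged targets."""
    return sum(end - start + 1 for start, end in merged)


def covered(pos, merged):
    i = bisect_right(merged, [pos, float("inf")]) - 1
    return i >= 0 and merged[i][0] <= pos <= merged[i][1]


def missing_by_category(snps, targets):
    """Count (position, category) SNPs that fall outside every capture interval."""
    merged = merge(targets)
    return Counter(cat for pos, cat in snps if not covered(pos, merged))


targets = [(100, 200), (150, 250), (400, 450)]  # hypothetical kit targets
snps = [(120, "exon"), (300, "intron"), (500, "intergenic"), (420, "exon"), (260, "intron")]
print(target_bases(merge(targets)), missing_by_category(snps, targets))
```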

Columns: samples, average depth, intronic, splicing(1), ncRNA(2), intergenic, exonic, non-synonymous, stopgain, stoploss
Rows: Agilent (N=22), G (N=6), Illumina (N=6), each listing three minimum-depth (≥) cutoffs
1. Variant is within 2 bp of a splicing junction.
2. Variant overlaps a transcript without coding annotation in the gene definition.
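The two footnotes define annotation categories. A minimal sketch of those rules for a single transcript follows, in the spirit of ANNOVAR-style region annotation; the function name, the single-transcript input, and counting only the intronic side of each junction as "splicing" are assumptions, and UTRs are not distinguished from coding exons here.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 1-based inclusive (start, end)


def annotate(pos: int, exons: List[Exon], coding: bool = True, splice_window: int = 2) -> str:
    """Assign a variant position to one region category relative to one transcript."""
    exons = sorted(exons)
    if not exons[0][0] <= pos <= exons[-1][1]:
        return "intergenic"
    if not coding:
        return "ncRNA"  # footnote 2: transcript without coding annotation
    if any(start <= pos <= end for start, end in exons):
        return "exonic"
    # pos is intronic here; test its distance to each exon boundary.
    for start, end in exons:
        if 0 < pos - end <= splice_window or 0 < start - pos <= splice_window:
            return "splicing"  # footnote 1: within 2 bp of a splicing junction
    return "intronic"


exons = [(100, 200), (300, 400)]
for p in (50, 150, 202, 203, 298):
    print(p, annotate(p, exons))
```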

Mitochondria
Mitochondria play an important role in cellular energy metabolism, free radical generation, and apoptosis (Andrews, Kubacka et al. 1999; Verma and Kumar 2007). Mitochondrial DNA (mtDNA) is a maternally inherited, 16,569-bp closed-circle genome that encodes two rRNAs, 22 tRNAs, and 13 polypeptides. Mitochondrial dysfunction is an important cause of many neurological diseases (Fernandez-Vizarra, Bugiani et al. 2007) and drug toxicities (Lemasters, Qian et al. 1999; Wallace and Starkov 2000), and may contribute to carcinogenesis and tumor progression (Modica-Napolitano and Singh 2004; Chen 2012).

Mitochondria Extraction Strategy
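The extraction strategy itself is not spelled out in this transcript. As a hedged illustration only, the sketch below pulls primary, confidently mapped mitochondrial reads out of plain SAM text. The contig names (`chrM`, `MT`) depend on the reference build, and the MAPQ cutoff of 20 is an arbitrary assumption meant to reduce reads from nuclear mitochondrial segments (NUMTs) that map ambiguously.

```python
from typing import Iterable, List

MITO_CONTIGS = {"chrM", "MT"}  # assumption: naming depends on the reference build


def extract_mito(sam_lines: Iterable[str], min_mapq: int = 20) -> List[str]:
    """Return names of mapped, primary SAM records on the mitochondrial contig."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue  # unmapped, secondary, or supplementary alignment
        if rname in MITO_CONTIGS and mapq >= min_mapq:
            kept.append(fields[0])
    return kept


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchrM\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t0\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tchrM\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r5\t16\tMT\t300\t10\t4M\t*\t0\t0\tACGT\tIIII",
    "r6\t16\tMT\t300\t37\t4M\t*\t0\t0\tACGT\tIIII",
]
print(extract_mito(sam))
```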

Results

Virus
Known oncogenic viruses are estimated to cause 15 to 20 percent of all human cancers (Parkin 2006). Understanding the viral integration patterns of cancer-associated viruses may uncover novel oncogenes and tumor suppressors associated with cellular transformation. Viral genomes have been detected using off-target exome sequencing reads (Barzon, Lavezzo et al. 2011; Li and Delwart 2011; Chevaliez, Rodriguez et al. 2012; Radford, Chapman et al. 2012; Capobianchi, Giombini et al. 2013).
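A simple way to screen off-target or unmapped reads for viral sequence is k-mer matching against viral reference genomes. This is a swapped-in technique for illustration: published tools such as PathSeq and VirusSeq use alignment-based pipelines with host-read subtraction. The function names, the toy genome, and the `min_hits` threshold are hypothetical; realistic k values are around 21 to 31, and k=5 is used here only so the toy example stays short.

```python
from collections import Counter


def kmers(seq: str, k: int):
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def build_viral_index(genomes: dict, k: int = 21):
    """Map each k-mer (both strands) to the set of viral genomes containing it."""
    index = {}
    for name, seq in genomes.items():
        for km in kmers(seq, k) | kmers(revcomp(seq), k):
            index.setdefault(km, set()).add(name)
    return index


def screen(reads, index, k: int = 21, min_hits: int = 3) -> Counter:
    """Count reads assigned to each virus by their best-supported k-mer match."""
    hits = Counter()
    for read in reads:
        support = Counter()
        for km in kmers(read, k):
            for name in index.get(km, ()):
                support[name] += 1
        if support:
            name, n = support.most_common(1)[0]
            if n >= min_hits:
                hits[name] += 1
    return hits


genomes = {"virusA": "ATGCGTACGTTAGCCGATCGATTACG"}  # toy sequence, not a real virus
idx = build_viral_index(genomes, k=5)
read = "GTACGTTAGCCGA"
print(screen([read, revcomp(read), "TTTTTTTTTTTT"], idx, k=5))
```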

One example using HNSCC (head and neck squamous cell carcinoma)

Virus Detection in HNSCC in TCGA
Columns: Site; clinical HPV status (clin_hpv_ish, clin_hpv_p16); HPV detected in ExomeSeq, low_pass, and RNAseq data
Sites: buccal mucosa, oropharynx, tonsil

Existing Tools
PathSeq (Kostic, Ojesina et al. 2011)
VirusSeq (Chen, Yao et al. 2012)
ViralFusionSeq (Li, Wan et al. 2013)

SNP and Somatic Mutation Identification Using RNAseq Data
Traditionally, somatic mutations are detected with Sanger sequencing or RT-PCR by comparing paired tumor and normal samples. An obvious limitation of these methods is that the search must be restricted to a predefined genomic region of interest. With the maturity of next-generation sequencing, we can now screen all coding genes, or even the whole genome, for somatic mutations at a reasonable cost.
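The paired tumor/normal comparison can be sketched as a threshold rule on allele counts at one site: enough depth in both samples, clear alternate-allele support in the tumor, and essentially none in the normal. All thresholds below are hypothetical; dedicated somatic callers such as MuTect or VarScan 2 use statistical models rather than fixed cutoffs.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    ref: int  # reads supporting the reference allele
    alt: int  # reads supporting the alternate allele

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def vaf(self) -> float:
        """Variant allele fraction."""
        return self.alt / self.depth if self.depth else 0.0


def is_somatic(tumor: SiteCounts, normal: SiteCounts, min_depth: int = 10,
               min_alt: int = 3, min_tumor_vaf: float = 0.10,
               max_normal_vaf: float = 0.02) -> bool:
    """Hypothetical rule: variant present in the tumor and absent from the matched normal."""
    if tumor.depth < min_depth or normal.depth < min_depth:
        return False  # too little coverage to call either way
    return (tumor.alt >= min_alt and tumor.vaf >= min_tumor_vaf
            and normal.vaf <= max_normal_vaf)


print(is_somatic(SiteCounts(30, 10), SiteCounts(40, 0)))  # True
print(is_somatic(SiteCounts(30, 10), SiteCounts(35, 5)))  # False: also in normal (germline)
```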

Why do we want to detect mutations in RNAseq data?
You don't have DNA sequencing data.
Detecting mutations was not the original goal, but why not?
There is much more RNAseq data than DNAseq data.
A mutation in RNA is more relevant than a mutation in DNA.

Difficulties
Genes that are not expressed do not have enough depth to detect mutations.
Reverse transcription of RNA to cDNA introduces additional errors.
It is hard to distinguish mutations from RNA editing.
In summary, somatic mutation detection using RNAseq data produces many more false positives.
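Two of these difficulties suggest simple candidate filters. In the sketch below, low-depth sites are rejected, and A>G changes are flagged because A-to-I editing is read as A>G on the transcript (for genes on the minus strand it appears as T>C in reference coordinates). The function, its thresholds, and the `known_editing` flag (which would come from a database of known editing sites, for example RADAR) are assumptions; other editing types and reverse-transcription errors are not handled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flag_candidate(ref: str, alt: str, depth: int, strand: str,
                   known_editing: bool = False, min_depth: int = 10) -> str:
    """Return a reason to distrust an RNAseq variant call, or 'pass'."""
    if depth < min_depth:
        return "low_depth"  # gene not expressed enough to support a call
    if known_editing:
        return "known_editing_site"
    if strand == "-":
        # Convert reference-strand alleles to the transcribed strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    if (ref, alt) == ("A", "G"):
        return "possible_A_to_I_editing"
    return "pass"


print(flag_candidate("T", "C", 50, "-"))  # possible_A_to_I_editing
print(flag_candidate("C", "T", 50, "+"))  # pass
```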

Somatic Mutation Caller Designed Specifically for RNAseq Data

Other Ways You Can Mine Your Data

Summary
Get your priorities right: never design a study just for secondary analysis targets.
If you have old data, think about what else you can do with it, and try to maximize its full potential.
At VANGARD, we help you with your basic genomic data analysis needs.
Advanced data analysis can be done through collaboration.

Acknowledgement
Yu Shyr, Tiger Sheng, Chung-I Li, Jiang Li, Mike Guo, David Samuels, Chun Li