RNA-Seq as a Discovery Tool


RNA-Seq as a Discovery Tool Julia Salzman

Deciphering the Genome

Power of RNA-Seq: Quantification and Discovery
RNA isoform-specific gene expression
Gene fusions
Overlooked RNA structural variants
Reference: Salzman, Gawad, Wang, Lacayo, Brown, 2012

Paired-end RNA-Seq
Matched sequences are obtained for each library molecule, for example CTTC…GAAG and GGAC…GCCT.
Data: millions of A/C/G/T sequences, each 70 to 150+ bp long.

Part 1: Isoform Specific Expression

Example: Paired-end Data Aligned
Some reads are informative about isoform-specific expression.

Paired-end RNA-Seq for RNA Isoform Specific Gene Expression
Example gene: Rnpep (figure labels: Exon 1, Exon 4).
Goal: estimate the expression of each isoform. This is nontrivial because we only observe fragments of the sequences.
Since the size distribution of library molecules is known, inferred insert lengths can be used to increase statistical power and improve inference.

Insert Length Distributions
Insert lengths pooled over the entire library can be calculated and used to precisely estimate the distribution of cDNA sizes in the library.
[Figure: distribution of sequenced molecule length in base pairs, axis ticks at 100, 200, 300.]
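As a minimal sketch of the pooling step described on this slide, the snippet below turns a list of observed insert lengths into an empirical probability distribution. The function name `insert_length_pmf` and the toy lengths are illustrative assumptions; how the lengths are collected (for example from read pairs that map unambiguously) is not stated on the slide.

```python
from collections import Counter

def insert_length_pmf(insert_lengths):
    """Return {length: probability} estimated from pooled insert lengths."""
    if not insert_lengths:
        raise ValueError("need at least one insert length")
    counts = Counter(insert_lengths)
    total = len(insert_lengths)
    # Relative frequency of each observed length, ordered by length.
    return {length: n / total for length, n in sorted(counts.items())}

# Toy pooled insert lengths in base pairs (hypothetical values).
pooled = [150, 150, 200, 200, 200, 250, 300, 200]
pmf = insert_length_pmf(pooled)
```

A real library has millions of inserts, so in practice the empirical distribution is far smoother than this toy example suggests.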

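Paired-end RNA-Seq Model
Compute the genome-wide insert length distribution.
A read pair can imply a different insert length under each isoform. In the example, the implied length is 150 when the pair is mapped to Isoform 1 and 90 when it is mapped to Isoform 2.
[Figure: distribution of sequenced molecule length in base pairs, axis ticks at 100, 200, 300.]
Salzman, Jiang, Wong 2011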
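One way to see how implied insert lengths help is a small expectation-maximization sketch. This is a simplified mixture model written for illustration, not the published Salzman, Jiang, Wong model: it ignores isoform length and sampling-rate effects, and the names `em_isoform_fractions`, `pmf`, and `fragments` are assumptions. Each fragment lists the insert length it would imply under every isoform it is compatible with, and the insert length distribution decides how ambiguous fragments are shared out.

```python
def em_isoform_fractions(fragments, pmf, n_isoforms, n_iter=200):
    """Estimate the fraction of fragments from each isoform.

    fragments: list of dicts {isoform_index: implied_insert_length}.
    pmf: dict {insert_length: probability} for the whole library.
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        totals = [0.0] * n_isoforms
        used = 0
        for frag in fragments:
            # E-step: weight each compatible isoform by its current
            # fraction times the probability of the implied insert length.
            weights = {i: theta[i] * pmf.get(length, 0.0)
                       for i, length in frag.items()}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # fragment not explained by any isoform
            used += 1
            for i, w in weights.items():
                totals[i] += w / norm
        if used == 0:
            raise ValueError("no fragment is explained by the distribution")
        # M-step: new fractions are the average responsibilities.
        theta = [t / used for t in totals]
    return theta

# Hypothetical insert length distribution and fragments.
pmf = {90: 0.05, 150: 0.45, 200: 0.50}
# Ten ambiguous fragments (length 150 under isoform 0, 90 under isoform 1)
# plus ten fragments compatible only with isoform 1.
fragments = [{0: 150, 1: 90}] * 10 + [{1: 200}] * 10
theta = em_isoform_fractions(fragments, pmf, n_isoforms=2)
```

In this toy data the ambiguous fragments lean towards isoform 0 because an insert of 150 bp is much more probable than one of 90 bp under the assumed distribution.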

Using PE for quantification is statistically more powerful
The PE model is a statistical improvement over naïve models and has optimal information reduction.
[Figure: "information" gain using PE sequencing.]
Overall, using "mate pair" information gives more power, but experimental artifacts can sometimes affect results.

Paired-end Size Distributions are Foundation for TopHat and other PE RNA-Seq Algorithms
Summary and problems: these algorithms rely on a reference, assume uniformity of size distributions in the library, and overlook biases.
[Figure: Rep. 1 and Rep. 2.]

Paired-End RNA-Seq for Gene Fusions in Ovarian Tumors (2009)
Paired-end sequencing of poly-A selected RNA from 12 late-stage tumors, with a genome-wide search.
Top hit of our novel algorithm: ESRRA-C11orf20 (figure labels: C11orf20, ESRRA, Fusion).
Isoform-specific estimation: ESRRA and the fusion are expressed at roughly equal magnitude (Salzman, Jiang, Wong).

Part 2: Gene Fusions

Recurrent Gene Fusions in Cancer
A handful of recurrent fusions are known in solid tumors:
PAX8-PPARγ fusion (thyroid cancer)
EML4-ALK fusion (non-small cell lung cancer)
TMPRSS2-ERG family fusion (prostate cancer)
These were not genome-wide discoveries. More is to be learned from an unbiased study of RNA.

Fusion Discovery: 2 flavors
Totally "de novo" discovery: search for any RNA fragments out of order with respect to the reference genome, not necessarily coinciding with exon boundaries. Noisy.
Discovery with a reference database: discover fusions at annotated exon boundaries (protein coding), with better statistical checks. Misses some fusions.

Reference Approach
Search for gene fusions with exon A in gene 1 spliced to exon B of gene 2.
[Figure: Exon A joined to Exon B.]

Algorithm (with respect to reference)
Remove all PE reads consistent with the reference.
Identify gene pairs: PE reads where (read1, read2) map to (gene1, gene2).
Find PE reads of the form (gene A, gene A-B junction).
[Figure: Exon A joined to Exon B.]
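The steps on this slide can be sketched on already-aligned read pairs. Everything about the data layout here is an assumption made for illustration: each mate is a tuple `("gene", name)` or `("junction", (gene_a, gene_b))`, the thresholds `min_spanning` and `min_junction` are hypothetical, and the statistical checks mentioned elsewhere in the talk are not reproduced.

```python
from collections import Counter

def candidate_fusions(read_pairs, min_spanning=2, min_junction=1):
    """Return {(gene_a, gene_b): (spanning_pairs, junction_pairs)}."""
    # Step 1: remove pairs consistent with the reference
    # (both mates align within the same gene).
    discordant = [
        (a, b) for a, b in read_pairs
        if not (a[0] == "gene" and b[0] == "gene" and a[1] == b[1])
    ]

    # Step 2: count pairs whose mates map to two different genes.
    spanning = Counter()
    for a, b in discordant:
        if a[0] == "gene" and b[0] == "gene":
            spanning[tuple(sorted((a[1], b[1])))] += 1

    # Step 3: count pairs of the form (gene A, gene A-B junction).
    junction = Counter()
    for a, b in discordant:
        for mate, other in ((a, b), (b, a)):
            if (mate[0] == "gene" and other[0] == "junction"
                    and mate[1] in other[1]):
                junction[tuple(sorted(other[1]))] += 1

    return {
        pair: (n, junction[pair])
        for pair, n in spanning.items()
        if n >= min_spanning and junction[pair] >= min_junction
    }

# Toy alignments (hypothetical); GENE_X and GENE_Y are placeholders.
read_pairs = [
    (("gene", "ESRRA"), ("gene", "ESRRA")),
    (("gene", "ESRRA"), ("gene", "C11orf20")),
    (("gene", "C11orf20"), ("gene", "ESRRA")),
    (("gene", "ESRRA"), ("junction", ("ESRRA", "C11orf20"))),
    (("gene", "GENE_X"), ("gene", "GENE_Y")),
]
fusions = candidate_fusions(read_pairs)
```

Requiring both spanning pairs and at least one junction-supporting pair is one simple way to cut down noise; the cut-offs a real analysis would use are not given in the slides.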

Paired-End RNA-Seq for Gene Fusions in Ovarian Tumors
Paired-end sequencing of poly-A selected RNA from 12 late-stage tumors, with a genome-wide search.
Top hit of our algorithm: ESRRA-C11orf20 (figure labels: C11orf20, ESRRA, Fusion).
Isoform-specific estimation: ESRRA and the fusion are expressed at roughly equal magnitude (Salzman, Jiang, Wong).
Salzman et al., 2011

Part 3: Exploratory Analysis of RNA Rearrangements

Exploratory analysis: biological "noise" in RNA-Seq Data
[Figure: wildtype genome DNA with its canonical transcript; locally rearranged DNA with a scrambled transcript.]
Is exon scrambling present in rRNA-depleted RNA?

Bioinformatic Analysis
Thousands of exon scrambling events are found in RNA from human leukocytes and cancer samples.
These events are inconsistent with the reference genome!
[Figure: wildtype genome DNA and canonical transcript.]

Potential Biological Mechanisms for RNA Rearrangements
DNA rearrangement
RNA rearrangement
Trans-splicing
Template switching
PCR artifact

Analysis of Leukocyte Data
Exons appear in 'scrambled' (non-increasing) order with respect to canonical exon order.
Thousands of genes show evidence of exon scrambling.
Naïve estimate of fractional abundance: the ratio of scrambled read rate to all read rate (per transcript).
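The naïve estimate can be illustrated with a short sketch. The input format is an assumption: each junction-spanning read is given as `(transcript, upstream_exon, downstream_exon)`, with exons numbered in canonical order, and a read counts as scrambled when the exon order is non-increasing. Using junction reads as the denominator is also an assumption, since the slide only says "all read rate".

```python
def is_scrambled(upstream_exon, downstream_exon):
    """True when exon order is non-increasing relative to canonical order."""
    return downstream_exon <= upstream_exon

def scrambled_fraction(junction_reads):
    """Return {transcript: scrambled junction reads / all junction reads}."""
    counts = {}
    for transcript, upstream, downstream in junction_reads:
        scrambled, total = counts.get(transcript, (0, 0))
        counts[transcript] = (
            scrambled + int(is_scrambled(upstream, downstream)),
            total + 1,
        )
    return {t: s / n for t, (s, n) in counts.items()}

# Toy junction reads (hypothetical transcripts and exon numbers).
reads = [
    ("tx1", 1, 2),  # canonical: exon 1 followed by exon 2
    ("tx1", 3, 2),  # scrambled: exon 3 followed by exon 2
    ("tx1", 2, 3),
    ("tx1", 4, 2),
    ("tx2", 1, 2),
    ("tx2", 2, 3),
]
fractions = scrambled_fraction(reads)
```

The estimate is naïve in the sense that it treats every junction read as equally likely to be sampled, which ignores mapping and library biases.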

100s of Transcripts with High Fractions of Scrambled Isoforms
100s of transcripts from B cells, stem cells and neutrophils have >50% of copies from the scrambled isoform.
[Figure: 100s of genes; canonical isoform < 25%, scrambled isoform > 75%.]

What Models Can Explain Exon Scrambling in RNA?

Model 1 to Explain RNA Exon Scrambling

Model 1 Prediction
The prediction can be made statistically precise.
Model 1 is statistically inconsistent with the vast majority of the data.
A subset of genes have evidence of tandem duplication in mRNA.
[Figure: number of transcripts with evidence against Model 1 and for Model 1, axis ticks at 100, 1000, 2000.]

Alternative Model
Model and data are consistent.

Mining RNA-Seq Data for Evidence Consistent with Circular RNA?
In poly-A depleted samples, expect to see strong evidence of scrambled exons (circular RNA).
In poly-A selected samples, expect to see little evidence of scrambled exons (circular RNA).
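A simple way to check this expectation is to compare the scrambled read rate between a poly-A depleted sample and a poly-A selected sample. The sketch below is an illustration only: the function name `depletion_enrichment`, the pseudocount, and the read counts are assumptions, and it is not the circular RNA algorithm referred to in the talk.

```python
def depletion_enrichment(depleted, selected, pseudocount=0.5):
    """Ratio of scrambled read rates, poly-A depleted over poly-A selected.

    depleted, selected: (scrambled_reads, total_reads) for each sample.
    The pseudocount avoids division by zero when a sample has no
    scrambled reads.
    """
    d_scrambled, d_total = depleted
    s_scrambled, s_total = selected
    if d_total <= 0 or s_total <= 0:
        raise ValueError("total read counts must be positive")
    rate_depleted = (d_scrambled + pseudocount) / (d_total + pseudocount)
    rate_selected = (s_scrambled + pseudocount) / (s_total + pseudocount)
    return rate_depleted / rate_selected

# Hypothetical counts for one gene in the two library types.
enrichment = depletion_enrichment(depleted=(40, 1000), selected=(2, 1000))
```

A ratio well above 1 is what the circular RNA hypothesis predicts, since circular molecules have no poly-A tail and are lost in poly-A selection.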

Poly-A Depleted Samples Enriched for Scrambled Exons
Align all reads to a custom database.

Summary of RNA-Seq for NGS
RNA-Seq can be used for discovery.
TopHat and other fusion/splicing algorithms give a broad picture, but may have significant noise and miss important features of RNA expression.

Currently, all published/downloadable algorithms will miss identifying circular RNA! (Feel free to contact me for the algorithm to identify circular RNA!)