mRNA-Seq: methods and applications

mRNA-Seq: methods and applications Jim Noonan GENE 760

Introduction to mRNA-seq
- Technical methodology
- Read mapping and normalization
- Estimating isoform-level gene expression
- De novo transcript reconstruction
- Sensitivity and sequencing depth
- Differential expression analysis

mRNA-seq workflow Wang et al. Nat Rev Genet 10:57 (2009) Martin and Wang Nat Rev Genet 12:671 (2011)

Illumina RNA-seq library preparation
- Capture poly-A RNA with poly-T oligo attached beads (100 ng total) (2x)
  - RNA quality must be high; degradation produces 3' bias
  - Non-poly-A RNAs are not recovered
- Fragment mRNA
- Synthesize ds cDNA
- Ligate adapters
- Amplify
- Generate clusters and sequence

Ribosomal RNA subtraction RiboMinus

Mapping RNA-seq reads and quantifying transcripts

RNA-seq reads mapped to a reference genome
Normalization: reads per kilobase of feature length per million mapped reads (RPKM)
- Quantify expression of known genes (counting)
- Gene model level → composite of the whole gene vs. constitutive
- Differences that we are seeing could be due to splicing → methods for isoform-level expression values
- Transcriptome reconstruction → combination of TopHat, paired-end tags
- What is a "feature"?
- What about genomes with poor genome annotation?
- What about species with no sequenced genome?
For a detailed comparison of normalization methods, see: Bullard et al. BMC Bioinformatics 11:94 (2010); Robinson and Oshlack Genome Biol 11:R25 (2010)
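The RPKM definition above can be written out as a short calculation. This is a minimal sketch: the function name `rpkm`, the gene names, and all counts are hypothetical values chosen for illustration, and a "feature" is assumed here to be a gene with a known length in base pairs.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is a factor of 1e9 overall.
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical data: two genes with the same read count but different lengths.
counts = {"geneA": 500, "geneB": 500}
lengths_bp = {"geneA": 1000, "geneB": 4000}
total_mapped = 20_000_000

rpkm_values = {g: rpkm(counts[g], lengths_bp[g], total_mapped) for g in counts}
for gene, value in rpkm_values.items():
    print(gene, value)  # geneA 25.0, geneB 6.25
```

With equal read counts, the longer gene gets the lower RPKM, which is the point of the length normalization.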

Quantifying gene expression by RNA-seq
Use existing gene annotation:
- Align to genome plus annotated splices
- Depends on high-quality gene annotation
- Which annotation to use: RefSeq, GENCODE, UCSC?
- Isoform quantification?
- Identifying novel transcripts?
Reference-guided alignments:
- Align to genome sequence
- Infer splice events from reads
- Allows transcriptome analyses of genomes with poor gene annotation
De novo transcript assembly:
- Assemble transcripts directly from reads
- Allows transcriptome analyses of species without reference genomes

Composite gene model approach
- Map reads to genome
- Map remaining reads to known splice junctions
- Requires good gene models
- Isoforms are ignored
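One common way to build a composite gene model is to take the union of all annotated exons of a gene, so that isoform differences are collapsed. The sketch below assumes that interpretation; the isoform names and coordinates are hypothetical, and intervals are half-open `(start, end)` pairs.

```python
def merge_exons(exons):
    """Merge overlapping or touching (start, end) half-open intervals."""
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical annotation: two isoforms of one gene.
isoforms = {
    "tx1": [(100, 200), (300, 400), (500, 600)],
    "tx2": [(100, 200), (350, 450), (500, 650)],
}

all_exons = [exon for exons in isoforms.values() for exon in exons]
composite = merge_exons(all_exons)
composite_length = sum(end - start for start, end in composite)

print(composite)         # [(100, 200), (300, 450), (500, 650)]
print(composite_length)  # 400
```

Reads counted against `composite` and normalized by `composite_length` give a single gene-level value, which is why isoforms are ignored in this approach.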

Which gene annotation to use?

Strategies for transcript assembly Garber et al. Nat Methods 8:469 (2011)

Splice-aware short read aligners Martin and Wang Nat Rev Genet 12:671 (2011)

Reference based transcript assembly Martin and Wang Nat Rev Genet 12:671 (2011)

Transcript assembly programs Martin and Wang Nat Rev Genet 12:671 (2011)

Cufflinks: ab initio transcript assembly Step 1: map reads to reference genome Trapnell et al. Nat Biotechnol 28:511 (2010)

Cufflinks: ab initio transcript assembly Isoform abundances estimated by maximum likelihood Trapnell et al. Nat Biotechnol 28:511 (2010)
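To give a feel for maximum-likelihood isoform abundance estimation, here is a simplified expectation-maximization (EM) sketch. It is not the Cufflinks model, which also accounts for fragment length distributions and other effects. The assumptions here are that each read is compatible with at least one isoform, that reads fall uniformly along an isoform, and that all names and numbers are hypothetical.

```python
def em_isoform_fractions(read_compat, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    read_compat: list of tuples, each naming the isoforms a read is compatible with.
    lengths: dict mapping isoform name to its length in bp.
    """
    isoforms = list(lengths)
    theta = {t: 1.0 / len(isoforms) for t in isoforms}  # uniform starting guess
    n_reads = len(read_compat)
    for _ in range(n_iter):
        expected = {t: 0.0 for t in isoforms}
        for compatible in read_compat:
            # E-step: P(read came from t) is proportional to theta[t] / length[t].
            weights = {t: theta[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: new fractions are the expected read assignments.
        theta = {t: expected[t] / n_reads for t in isoforms}
    return theta


# Hypothetical example: two equal-length isoforms sharing most of their sequence.
lengths = {"A": 1000, "B": 1000}
reads = [("A", "B")] * 60 + [("A",)] * 30 + [("B",)] * 10
theta = em_isoform_fractions(reads, lengths)
print(theta)
```

In this example the 60 ambiguous reads are split according to the evidence from the uniquely assignable reads, so the estimate converges to about 0.75 for isoform A and 0.25 for isoform B.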

Graph-based transcript assembly Martin and Wang Nat Rev Genet 12:671 (2011)

Trinity: de novo transcript assembly Grabherr et al. Nat Biotechnol 29:644 (2011)
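Graph-based de novo assemblers build on de Bruijn graphs of k-mers taken from the reads. The toy sketch below illustrates only that core idea and is not Trinity's actual algorithm, which is considerably more involved. The reads, the value of k, and the function names are hypothetical, and the example assumes error-free reads from a single transcript with no repeated (k-1)-mers.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` while each node has exactly one unvisited successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:  # stop on cycles
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


# Hypothetical overlapping reads from one transcript.
reads = ["ATGGCGTAC", "GCGTACCTT", "ACCTTAGC"]
graph = build_de_bruijn(reads, k=5)
contig = extend_contig(graph, "ATGG")
print(contig)  # ATGGCGTACCTTAGC
```

Real data add sequencing errors, repeats, and alternative isoforms, all of which create branches in the graph that an assembler has to resolve.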

What depth of sequencing is required to characterize a transcriptome? Wang et al. Nat Rev Genet 10:57 (2009)

Considerations
- Gene length: long genes are detected before short genes
- Expression level: high expressors are detected before low expressors
- Complexity of the transcriptome: tissues with many cell types require more sequencing
- Feature type: composite gene models, common isoforms, rare isoforms
- Detection vs. quantification: obtaining confident expression level estimates (e.g., "stable" RPKMs) requires greater coverage
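The gene length and expression level effects can be illustrated with a simple sampling model. This is a sketch under an assumed model in which the expected number of reads from a gene is proportional to its transcript abundance times its length. The gene names, abundances, and read depth are hypothetical.

```python
def expected_reads(abundance, length_bp, total_reads):
    """Expected read count per gene if sampling is proportional to abundance * length."""
    weights = {g: abundance[g] * length_bp[g] for g in abundance}
    total_weight = sum(weights.values())
    return {g: total_reads * w / total_weight for g, w in weights.items()}


# Hypothetical transcriptome with three genes.
abundance = {"long_low": 1, "short_low": 1, "short_high": 5}  # relative transcript copies
length_bp = {"long_low": 4000, "short_low": 1000, "short_high": 1000}

expected = expected_reads(abundance, length_bp, total_reads=1_000_000)
for gene, n in expected.items():
    print(gene, round(n))
```

At the same abundance, the 4 kb gene is expected to receive four times as many reads as the 1 kb gene, so it crosses any fixed detection threshold at a shallower sequencing depth.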

Transcript detection is biased in favor of long genes Tarazona et al. Genome Res 21:2213 (2011)

Applications of mRNA-seq
- Characterizing transcriptome complexity: alternative splicing
- Differential expression analysis: gene- and isoform-level expression comparisons
- Novel RNA species: lincRNAs and eRNAs; pervasive transcription
- Translation: ribosome profiling
- Allele-specific expression: effect of genetic variation on gene expression; imprinting
- RNA editing: novel events

Alternative isoform regulation in human tissue transcriptomes Wang et al. Nature 456:470 (2008)

Diversity of alternative splicing events in human tissues Wang et al. Nature 456:470 (2008)

Differential expression Garber et al. Nat Methods 8:469 (2011)

Programs for identifying DE genes in RNA-seq datasets

Program    Assumed distribution for count data    URL
DESeq      Negative binomial                      www-huber.embl.de/users/anders/DESeq/
DEGseq     Poisson                                www.bioconductor.org/packages/2.6/bioc/html/DEGseq.html
edgeR      (not listed)                           www.bioconductor.org/packages/release/bioc/html/edgeR.html
baySeq     (not listed)                           www.bioconductor.org/packages/release/bioc/html/baySeq.html
Cuffdiff   (not listed)                           cufflinks.cbcb.umd.edu/
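The choice between a Poisson and a negative binomial model comes down to how the variance of the counts relates to the mean. Under a Poisson model the variance equals the mean, while a negative binomial model is commonly parameterized as variance = mean + dispersion × mean². The sketch below uses a simple method-of-moments estimate on hypothetical replicate counts for one gene. It is only an illustration and not how DESeq or edgeR estimate dispersion, since those tools share information across genes.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion: (variance - mean) / mean**2, floored at 0."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (v - m) / m ** 2)


# Hypothetical read counts for one gene across five biological replicates.
replicate_counts = [90, 110, 150, 60, 95]

m = mean(replicate_counts)
v = variance(replicate_counts)
dispersion = moment_dispersion(replicate_counts)

print("mean:", m)
print("variance:", v)
print("dispersion:", dispersion)
```

Here the variance is roughly ten times the mean, so a Poisson model would understate the variability between replicates and tend to call too many genes differentially expressed.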

Differential expression: Characterizing transcriptome dynamics during brain development
Embryonic mouse cortex, RNA-seq DEX
[Figure labels: Neuronal functions (synaptic transmission, cell adhesion, neuronal migration); "Stemness" functions (cell cycle, M phase, Sox2, Oct4)]
Ayoub et al. PNAS 108:14950 (2011)

Differential expression: Characterizing transcriptome dynamics during brain development
Embryonic mouse cortex, RNA-seq DE isoforms
[Figure label: Differential isoforms]
Ayoub et al. PNAS 108:14950 (2011)

Novel RNA species: annotating lincRNAs Guttman et al. Nat Biotechnol 28:503 (2010)

Enhancer-associated RNAs (eRNAs) Neurons treated with KCl Kim et al. Nature 465:182 (2010)

Enhancer-associated RNAs (eRNAs) Ren B. Nature 465:173 (2010)

How much of the genome is transcribed? van Bakel et al. PLoS Biol. 8:e1000371 (2010)

Exploiting sequence information in RNA-seq reads Majewski and Pastinen. Trends Genet 27:72 (2011)

Detecting variants that affect splicing Pickrell et al. Nature 464:768 (2010)

Summary: mRNA-seq applications
- Quantify transcriptome complexity and compare across biological states
- Determine how transcriptomes are translated in different biological contexts
- Effect of genetic variation on gene expression
- Imprinting and RNA editing