RNA Quantitation from RNAseq Data

Slides:



Advertisements
Similar presentations
Functional Genomics with Next-Generation Sequencing
Advertisements

RNA-Seq based discovery and reconstruction of unannotated transcripts
IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy
RNAseq.
12/04/2017 RNA seq (I) Edouard Severing.
Exploring the Human Transcriptome
Processing of miRNA samples and primary data analysis
Simon v2.3 RNA-Seq Analysis Simon v2.3.
Peter Tsai Bioinformatics Institute, University of Auckland
DEG Mi-kyoung Seo.
RNA-seq: the future of transcriptomics ……. ?
Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
Transcriptomics Jim Noonan GENE 760.
MCB Lecture #21 Nov 20/14 Prokaryote RNAseq.
Ribosomal Profiling Data Handling and Analysis
RNA-seq Analysis in Galaxy
mRNA-Seq: methods and applications
Brief workflow RNA is isolated from cells, fragmented at random positions, and copied into complementary DNA (cDNA). Fragments meeting a certain size specification.
Li and Dewey BMC Bioinformatics 2011, 12:323
Expression Analysis of RNA-seq Data
Todd J. Treangen, Steven L. Salzberg
Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.
RNAseq analyses -- methods
Lecture 11. Microarray and RNA-seq II
RNA-Seq Analysis Simon V4.1.
RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.
Transcriptomics Sequencing. over view The transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA, and other non coding RNA produced.
RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.
Introduction to RNAseq
Geuvadis Analysis Meeting 16/02/2012 Micha Sammeth CNAG – Barcelona.
Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented.
The iPlant Collaborative
No reference available
Lecture 12 RNA – seq analysis.
Manuel Holtgrewe Algorithmic Bioinformatics, Department of Mathematics and Computer Science PMSB Project: RNA-Seq Read Simulation.
CyVerse Workshop Transcriptome Assembly. Overview of work RNA-Seq without a reference genome Generate Sequence QC and Processing Transcriptome Assembly.
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
Canadian Bioinformatics Workshops
RNA-Seq Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520
Statistics Behind Differential Gene Expression
Simon v RNA-Seq Analysis Simon v
RNAseq: a Closer Look at Read Mapping and Quantitation
Amos Tanay Nir Yosef 1st HCA Jamboree, 8/2017
Gene expression from RNA-Seq
RNA-Seq analysis in R (Bioconductor)
Volume 4, Issue 6, Pages e4 (June 2017)
Lab meeting
S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.
High-Throughput Analysis of Genomic Data [S7] ENRIQUE BLANCO
Kallisto: near-optimal RNA seq quantification tool
Lecture 7. Topics in RNA Bioinformatics (Single-Cell RNA Sequencing)
Measuring transcriptomes with RNA-Seq
Tests for Two Means – Normal Populations
Sarah K. Whitley, William T. Horne, Jay K. Kolls 
Volume 4, Issue 6, Pages e4 (June 2017)
Learning to count: quantifying signal
Assessing changes in data – Part 2, Differential Expression with DESeq2
Working with RNA-Seq Data
The DHX15–CMTR1 interaction inhibits translation of a subset of genes.
Additional file 2: RNA-Seq data analysis pipeline
Quantitative analyses using RNA-seq data
Sequence Analysis - RNA-Seq 2
Analysis of RNA-Seq data Counting, normalization, and statistical tests for differential expression March 16, 2018 Dr. ir. Perry D. Moerland Bioinformatics.
Schematic representation of a transcriptomic evaluation approach.
On the Dependency of Cellular Protein Levels on mRNA Abundance
Volume 11, Issue 7, Pages (May 2015)
Rates, Ratios and Proportions
Differential Expression of RNA-Seq Data
Presentation transcript:

RNA Quantitation from RNAseq Data Jeremy Buhler for GEP Alumni Workshop

RNAseq Pipeline for Expression Analysis 37251 20653 9827 5121 RNAseq Read Count per Transcript Map reads to transcripts RNA Reads Extract and Sequence RNA Source RNA Abundance per Transcript Quantify

Topics for Today Read mapping and RNA quantitation from read counts are two key computational steps in RNAseq expression analysis. Mapping: how do we do it fast? Quantitation: how do we get from mapped reads to transcript abundance?

From mapped reads to transcript abundance More highly expressed transcripts should produce more RNA-Seq reads Read counts versus population of RNA transcripts Inferences we would like to make: Gene g is expressed at k copies per cell Gene g shows 2-fold higher expression than gene h Gene g shows 2-fold higher expression in sample 2 than in sample 1 How to make this statement quantitative? Explore relationship between read counts and the population of RNA transcripts Which of these inferences are feasible?

Simple model for RNA transcript abundance Sample contains n different transcripts Transcript i appears ci times in the sample Sample has M RNA reads with length s Effective length: s-1 The start position of a mapped read (with length s) cannot be at the last s-1 positions of the transcript Assume reads can map to every position in the transcript except the end Assume each start position is equally likely Number of times transcript appears in sample * effective length = C Number of possible start positions across all RNA molecules in a sample:

Number of mapped reads versus number of copies of transcript i fi = fraction of all starting positions for a mapped read that lie within a copy of transcript i Relationship between fi and ci: Every transcript in the sample has the same constant of proportionality: Expect more reads to map to longer transcripts C = number of possible start positions in all RNA molecules in a sample Proportional to number of copies of the transcript in the sample

Estimate transcript abundance from the number of mapped reads mi = number of RNAseq reads in the sample mapped to transcript i M = total number of RNAseq reads is a good estimator of : Estimate from RNAseq data: Model assumes each read is sampled randomly among all feasible positions Hence m_i/M is a good estimator of f_i

Scaled version of Ri = RPKM RPKM = Reads Per Kilobase of transcript i per Million reads sampled: Use RPKM to compare abundance of two transcripts within a sample Ratio of abundances for transcripts i and j:

RPKM values for the same transcript are not comparable across multiple samples The same Ri values in two different samples could correspond to two different counts (ci) The constant of proportionality (C) depends on the quantities of other RNAs in the sample

Estimate abundance using TPM ti = fraction of all RNA molecules in a sample that are copies of transcript i By definition: Estimate ti from RNAseq read counts: Multiply Ti by 106 TPM = copies of Transcript i Per Million RNA molecules Normalize for gene length first, then by sequencing depth = TPM Normalize for sequencing depth, then by gene length = RPKM Measure of transcript abundance that is more meaningful across samples: TPM m_i/M is a good estimator of f_i

Is TPM better than RPKM? TPM is no better than RPKM for comparing transcript abundance within a sample TPM are better than RPKM for comparisons across samples: Show transcript i forms a larger or smaller fraction of all transcripts in sample 2 than in sample 1 One possibility is to use spike-in controls with known copy numbers of each transcript for calibration TPM does not provide the absolute number of copies of transcript i (ci) in a sample

Differential expression analysis tools Examples: DESeq2, edgeR Uses different modeling approach that compares raw read counts Normalize by sequencing depth per sample Often use the negative binomial distribution as the reference distribution

Additional considerations for RNA quantitation For paired-end reads, count fragments instead of reads (i.e., FPKM) Model fragment lengths distribution Biases due to library construction, sequencing Multi-mapped reads Different isoforms, conserved domains, repeats Unmapped reads Incomplete transcriptome / genome Sequencing errors, polymorphisms Tools such as RSEM attempts to take these factors into account