Sequence Analysis - RNA-Seq 2

RNA-Seq
- Identify the sequence of RNA molecules
- “Unbiased”: possible to sequence any molecule in the sample
- Molecules are sequenced in proportion to their relative abundance in the sample
- Most often used for gene abundance estimation

Common Types of RNA-Seq Analyses
- Sequence based
  - Transcriptome reconstruction
  - Splicing analysis
  - Gene fusion discovery
  - Coding variants
- Abundance based
  - Differential expression
  - Allele-specific expression (with genotyping)

Transcriptome reconstruction
- Given RNA fragments, recover the original transcripts
- Two approaches:
  - De novo: no reference transcriptome available
  - Reference or genome guided: reference available
- Very challenging with short reads!

De novo transcriptome reconstruction

Concept: Spliced Alignment
- Coding sequences are interrupted by introns
- mRNA molecules have introns excised
- Some reads span splice junctions
- Spliced alignment aligns junction-spanning (spliced) reads
- Source: https://discoveringthegenome.org/discovering-genome/rna-sequencing-up-close-data/spliced-alignment

Concept: Spliced Alignment
- Splicing-aware programs: STAR, TopHat
- Splicing-unaware programs: bwa, bowtie, most others
- De novo: use only the genome sequence
- Transcriptome guided: use known splice junctions to guide alignment
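
As a rough illustration of what a spliced alignment looks like downstream: in SAM/BAM output, the intron skipped by a junction-spanning read is recorded as an `N` operation in the CIGAR string. The sketch below extracts intron intervals from a single alignment. The helper name `splice_junctions` and the coordinates are hypothetical and not part of the slides.

```python
import re

# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")

def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, for one alignment.

    pos   -- 1-based leftmost reference position of the alignment (SAM POS field)
    cigar -- CIGAR string, e.g. "50M1000N50M"
    """
    junctions = []
    ref_pos = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":  # skipped region: the intron of a spliced read
            junctions.append((ref_pos, ref_pos + length - 1))
        if op in REF_CONSUMING:
            ref_pos += length
    return junctions

# A read spanning one junction, and an unspliced read (made-up coordinates).
print(splice_junctions(101, "50M1000N50M"))
print(splice_junctions(101, "100M"))
```

A splicing-unaware aligner cannot produce the `N` operation, so a junction-spanning read is typically soft-clipped or left unaligned instead.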

Concept: mRNA Isoforms
- Isoform: pattern of exons/transcribed sequence

Alternative Splicing (AS) Analysis
- Detect and/or quantify isoforms
- Different AS types
- Examine pattern of exons in spliced reads
- Methods: Whippet, MISO, rMATS, IRFinder, and many others
- Source: https://www.ncbi.nlm.nih.gov/books/NBK19730/figure/A1150/

Alternative Splicing (AS) Analysis
- Read support: # of reads containing a splice junction
- Grey areas → overall aligned read depth
- Black areas → spliced reads
- The events shown have a minimum of 10 supporting reads per splicing event
- Source: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141298

Reference guided reconstruction

Gene Expression (RNA Abundance)
- Measure relative abundance of RNA species
- # of reads mapping to a gene is proportional to # of molecules transcribed
- Example: Gene A = 5, Gene B = 10, Gene C = 10 reads
  - Gene A is about half as abundant as Gene B
  - Gene B and Gene C have about the same abundance

Bag of Fragments Analogy
- Reads are samples drawn from the distribution of all RNA fragments
- Drawn in proportion to frequency
  - High abundance transcripts are drawn frequently
  - Low abundance transcripts might not be drawn at all (black read in the figure)
- More reads sequenced → more chance to draw low abundance transcripts
- Absence of evidence is not evidence of absence!
- Figure: metaphorical bag containing all RNA fragments (billions and billions)
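
The analogy can be tried out with a small simulation. This is a minimal sketch with made-up relative abundances and a hypothetical function name, `draw_reads`. It draws reads in proportion to abundance at two sequencing depths so the counts for the rare transcript can be compared.

```python
import random

def draw_reads(abundances, n_reads, seed=0):
    """Draw n_reads fragments in proportion to abundance; return counts per transcript."""
    rng = random.Random(seed)
    names = list(abundances)
    weights = [abundances[name] for name in names]
    counts = dict.fromkeys(names, 0)
    for name in rng.choices(names, weights=weights, k=n_reads):
        counts[name] += 1
    return counts

# Hypothetical numbers of fragments of each transcript in the "bag".
bag = {"high": 1_000_000, "medium": 10_000, "rare": 1}

for depth in (1_000, 1_000_000):
    print(depth, draw_reads(bag, depth))
```

At the lower depth the rare transcript will almost always have a count of zero even though it is present in the bag, which is the point of the last bullet.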

RNA-Seq vs Microarray
RNA-Seq compared to microarray:
- “Unbiased”: de novo sequences
- Larger dynamic range
- Information rich
- More complex data
- Much larger data
- More expensive

Dynamic Range - Detection Limits
[Figure: detection limits of low depth RNA-Seq, high depth RNA-Seq, and microarray]

Gene Expression Analysis Strategies
- Align+count
  - Explicit alignment against genome
  - Count reads aligned to known loci (e.g. genes)
- Quantify
  - “All-in-one” approach
  - Quasi-alignment (i.e. “good enough” alignment) against transcriptome only
  - Statistical model estimates abundance

RNA-Seq Analysis: Align and Count
- Align against reference genome
- Compare alignments with annotation
- Count # of reads within desired features (e.g. exons, coding sequence)
- Sum to transcript or gene level
- Read counts are estimates of abundance
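
The counting step can be sketched as an interval-overlap problem. The example below is a toy under strong assumptions (one chromosome, unstranded reads, a hand-written annotation) and is not how any particular counting tool is implemented. The names `annotation`, `count_reads`, `__no_feature` and `__ambiguous` are illustrative.

```python
from collections import Counter

# Hypothetical annotation: gene -> list of (start, end) exon intervals, 1-based inclusive.
annotation = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1200)],
}

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_reads(alignments, annotation):
    """Count reads per gene; reads hitting zero or several genes are set aside."""
    counts = Counter({gene: 0 for gene in annotation})
    for start, end in alignments:
        hits = [gene for gene, exons in annotation.items()
                if any(overlaps(start, end, s, e) for s, e in exons)]
        if len(hits) == 1:
            counts[hits[0]] += 1      # exon-level hits are summed to the gene level
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# Made-up aligned read intervals (start, end).
alignments = [(120, 170), (350, 399), (1100, 1150), (5000, 5050)]
print(count_reads(alignments, annotation))
```

Real counting tools expose many choices here (strandedness, minimum overlap, handling of reads that span several features), which is part of why align+count has many parameters to choose.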

Concept: Multimapping Reads
- Paralogs and repetitive sequences may be transcribed into identical RNA
- Multimapping read: read that maps equally well to multiple loci
- Can cause abundance estimation bias
- Mitigate by:
  - Filtering out multimappers
  - Limiting reads to aligning to a maximum # of loci (e.g. 1, 10)
  - Multimap resolution methods (e.g. mmr, ORMAN)

Abundance Estimation (Quantification)
- Quasi- (or pseudo-)alignment: map reads to sequences without explicit alignment
- Must build reference transcriptome
- Statistical model performs abundance inference
- Handles multimapping reads implicitly
- Similar accuracy to align+count, and faster
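
To give a feel for how a statistical model can handle multimapping reads implicitly, here is a deliberately simplified expectation-maximization (EM) sketch. It assumes each read comes with the set of transcripts it is compatible with. It ignores transcript length and bias terms and is not the actual algorithm used by any specific tool. The function name `em_abundance` and the toy reads are hypothetical.

```python
def em_abundance(read_compat, n_iter=100):
    """Estimate relative transcript abundances from read compatibility sets.

    read_compat -- list of sets; each set holds the transcripts a read is compatible with
    Returns a dict transcript -> estimated fraction of reads (fractions sum to 1).
    """
    transcripts = sorted(set().union(*read_compat))
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each read across its compatible transcripts by current abundance.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: the new abundance is each transcript's share of expected read counts.
        theta = {t: expected[t] / len(read_compat) for t in transcripts}
    return theta

# Three reads unique to A, one unique to B, two compatible with both.
reads = [{"A"}, {"A"}, {"A"}, {"B"}, {"A", "B"}, {"A", "B"}]
print(em_abundance(reads))
```

The ambiguous reads end up assigned mostly to A because the uniquely mapping reads indicate that A is more abundant, so no read has to be discarded.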

Expression Analysis Strategies Summary
- Align+count
  - Spliced alignment methods: STAR, TopHat
  - Counting methods: htseq, featurecounts, VERSE
  - Advantages: flexible, accurate, whole genome
  - Disadvantages: slow, many parameters to choose
- Quantify
  - Methods: salmon, kallisto
  - Advantages: very fast, accurate, handles multimaps
  - Disadvantage: transcriptome only

Differential Expression (DE)
- Start with an expression matrix of genes x samples (counts or estimates)
- Which genes have counts associated with a variable of interest, e.g. case vs control?
- Each gene will have:
  - Significance (e.g. p-value)
  - Effect size (e.g. log2 fold change)
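
For the effect size, a log2 fold change can be computed directly from normalized counts. The sketch below is a naive version that compares group means; the pseudocount and the function name `log2_fold_change` are assumptions for illustration. DE packages estimate this quantity within a statistical model rather than with a plain ratio of means.

```python
import math
from statistics import mean

def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean normalized counts (case over control).

    The pseudocount avoids division by zero and log of zero for unexpressed genes.
    """
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))

# Made-up normalized counts for one gene in two conditions.
print(log2_fold_change(case=[7, 7], control=[3, 3]))    # doubling
print(log2_fold_change(case=[3, 3], control=[7, 7]))    # halving
```

A value of 1 means the gene is about twice as abundant in case as in control, and -1 means about half.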

Concept: Count Normalization
- # of reads differs between libraries
- Counts are proportional to library size
- Must be normalized for samples to be comparable
- Un-normalized counts are called raw counts

Count Normalization Strategies
- DESeq2: median of ratios of each count to the gene's geometric mean across samples
- FPKM/RPKM: fragments (reads) per kilobase per million reads
  - Divide each gene count by (gene length in kilobases × library size in millions of reads)
- Others proposed, these two most common
- Dillies, M.-A. et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14, 671–683 (2013). http://bib.oxfordjournals.org/content/14/6/671.full
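
Both strategies are short enough to write out. The sketch below uses only the standard library and made-up counts. `size_factors` follows the median-of-ratios idea (per-sample median of each count divided by the gene's geometric mean across samples), and `fpkm` uses the usual definition, count × 10^9 / (gene length in bp × total fragments). The function names and the toy matrix are hypothetical, and this is not the DESeq2 code itself.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts -- dict gene -> list of raw counts, one per sample
    Genes with a zero in any sample are skipped (their geometric mean is zero).
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]

def fpkm(count, gene_length_bp, total_fragments):
    """Fragments per kilobase of gene per million fragments in the library."""
    return count * 1e9 / (gene_length_bp * total_fragments)

# Sample 2 was sequenced twice as deeply as sample 1 (made-up raw counts).
raw = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
factors = size_factors(raw)
print(factors)
print({g: [c / f for c, f in zip(row, factors)] for g, row in raw.items()})
print(fpkm(count=500, gene_length_bp=2000, total_fragments=10_000_000))
```

Dividing each raw count by its sample's size factor puts the two samples on a common scale. FPKM additionally corrects for gene length, which matters when comparing different genes within one sample.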

Modeling Count Data
- Count data are not normally distributed:
  - Non-negative integers
  - Mean-variance dependence
  - Long upper tail
- Modeled as negative binomial distributed
- Negative binomial regression is used for DE
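
The mean-variance dependence can be seen in a small simulation. This sketch draws negative binomial counts as a gamma-Poisson mixture, parameterized so that variance = mean + dispersion × mean². The mean, dispersion, sample size and function names are arbitrary choices for illustration.

```python
import math
import random
from statistics import mean, variance

def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the modest rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def negative_binomial(rng, mu, dispersion):
    """Gamma-Poisson mixture with mean mu and variance mu + dispersion * mu**2."""
    shape = 1.0 / dispersion
    rate = rng.gammavariate(shape, mu * dispersion)  # gamma mean = shape * scale = mu
    return poisson(rng, rate)

rng = random.Random(1)
samples = [negative_binomial(rng, mu=50, dispersion=0.2) for _ in range(5000)]
print("mean:", mean(samples))
print("variance:", variance(samples))
```

For a Poisson model the variance would equal the mean. Here the sample variance should come out far larger than the mean, which is the overdispersion the negative binomial is meant to capture.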

Differential Expression Methods
- Current state of the art:
  - DESeq2 (Negative Binomial Regression)
  - edgeR (Negative Binomial Regression)
  - Count transformation + limma (linear regression)
- Deprecated:
  - Cufflinks (Negative Binomial Regression)
- These methods perform normalization