Sequence Analysis - RNA-Seq 2

Slides:

Advertisements

Similar presentations

RNA-Seq as a Discovery Tool

Advertisements

RNA-seq library prep introduction

RNA-Seq based discovery and reconstruction of unannotated transcripts

12/04/2017 RNA seq (I) Edouard Severing.

Peter Tsai Bioinformatics Institute, University of Auckland

DEG Mi-kyoung Seo.

RNA-seq: the future of transcriptomics ……. ?

Transcriptome Assembly and Quantification from Ion Torrent RNA-Seq Data Alex Zelikovsky Department of Computer Science Georgia State University Joint work.

Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520

Transcriptomics Jim Noonan GENE 760.

mRNA-Seq: methods and applications

RNA-Seq and RNA Structure Prediction

Li and Dewey BMC Bioinformatics 2011, 12:323

Expression Analysis of RNA-seq Data

Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.

Transcriptome analysis With a reference – Challenging due to size and complexity of datasets – Many tools available, driven by biomedical research – GATK.

RNAseq analyses -- methods

RNA-Seq Analysis Simon V4.1.

Transcriptome Analysis

RNA-seq workshop ALIGNMENT

Verna Vu & Timothy Abreo

RNA-seq workshop COUNTING & HTSEQ Erin Osborne Nishimura.

RNA-Seq Primer Understanding the RNA-Seq evidence tracks on the GEP UCSC Genome Browser Wilson Leung08/2014.

Introduction to RNAseq

The iPlant Collaborative

RNA-seq: Quantifying the Transcriptome

TOX680 Unveiling the Transcriptome using RNA-seq Jinze Liu.

The iPlant Collaborative

Lecture 12 RNA – seq analysis.

Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops

Reliable Identification of Genomic Variants from RNA-seq Data Robert Piskol, Gokul Ramaswami, Jin Billy Li PRESENTED BY GAYATHRI RAJAN VINEELA GANGALAPUDI.

Canadian Bioinformatics Workshops

RNA-Seq with the Tuxedo Suite Monica Britton, Ph.D. Sr. Bioinformatics Analyst September 2015 Workshop.

RNA-Seq Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520

Statistics Behind Differential Gene Expression

RNA-Seq Primer Understanding the RNA-Seq evidence tracks on

Simon v RNA-Seq Analysis Simon v

3rd Internal RECESS workshop Caroline C. Friedel

RNA Quantitation from RNAseq Data

An Introduction to RNA-Seq Data and Differential Expression Tools in R

RNA-Seq for the Next Generation RNA-Seq Intro Slides

Moderní metody analýzy genomu

Dr. Christoph W. Sensen und Dr. Jung Soh Trieste Course 2017

VCF format: variants c.f. S. Brown NYU

Gene expression from RNA-Seq

RNA-Seq analysis in R (Bioconductor)

Gene expression.

S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.

High-Throughput Analysis of Genomic Data [S7] ENRIQUE BLANCO

Kallisto: near-optimal RNA seq quantification tool

Differential Expression from RNA-seq

Gene expression estimation from RNA-Seq data

Sequence Analysis 2- RNA-Seq

Reference based assembly

Transcriptome analysis

Learning to count: quantifying signal

Alternative Splicing QTLs in European and African Populations

Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs BMI/CS Spring 2019 Colin Dewey

Assessing changes in data – Part 2, Differential Expression with DESeq2

Quantitative analyses using RNA-seq data

Introduction to RNA-seq

BF528 - Sequence Analysis Fundamentals

Schematic representation of a transcriptomic evaluation approach.

Sequence Analysis - RNA-Seq 1

Volume 11, Issue 7, Pages (May 2015)

Differential Expression of RNA-Seq Data

RNA-Seq Data Analysis UND Genomics Core.

Presentation transcript:

Sequence Analysis - RNA-Seq 2

RNA-Seq Identify sequence of RNA molecules “Unbiased” - possible to sequence any molecule in sample Molecules sequenced in proportion to relative abundance in sample Most often used for gene abundance estimation

Common Types of RNA-Seq Analyses Sequence based Transcriptome reconstruction Splicing analysis Gene fusion discovery Coding variants Abundance based Differential expression Allele-specific expression (with genotyping)

Transcriptome reconstruction Given RNA fragments, recover original transcript Two approaches: De novo - no reference transcriptome available Reference or genome guided - reference available Very challenging with short reads!

De novo transcriptome reconstruction

De novo transcriptome reconstruction

Concept: Spliced Alignment Coding sequences interrupted by introns mRNA molecules have introns excised Some reads span splice junctions Spliced alignment aligns junction or spliced reads https://discoveringthegenome.org/discovering-genome/rna-sequencing-up-close-data/spliced-alignment

Concept: Spliced Alignment Splicing-aware programs: STAR, TopHat Splicing unaware programs: bwa, bowtie, most others De novo - use only genome sequence Transcriptome guided - use known splice junctions to guide alignment

Concept: mRNA Isoforms Isoform: pattern of exons/transcribed sequence

Alternative Splicing (AS) Analysis Detect and/or quantify isoforms Different AS types Examine pattern of exons in spliced reads Methods: Whippet MISO rMATS IRFinder, and many others https://www.ncbi.nlm.nih.gov/books/NBK19730/figure/A1150/

Alternative Splicing (AS) Analysis Read support: # of reads containing splice junction Grey areas → overall aligned read depth Black areas → spliced reads These have minimum 10 supporting reads per splicing event https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0141298

Reference guided reconstruction

Reference guided reconstruction

Gene Expression (RNA Abundance) Measure relative abundance of RNA species # reads mapping to a gene is proportional to # of molecules transcribed Example: Gene A = 5, Gene B = 10, Gene C = 10 reads Gene A is about half as abundant Gene B Gene A and Gene C have about the same abundance Gene A Gene B Gene C

Bag of Fragments Analogy Reads are samples drawn from the distribution of all RNA fragments Drawn in proportion to frequency High abundance transcripts drawn frequently Low abundance transcripts might not be drawn at all (black read) More reads sequenced → more chance to draw low abundance transcripts Absence of evidence is not evidence of absence! Metaphorical bag All RNA Fragments (billions and billions)

RNA-Seq vs Microarray RNA-Seq compared to microarray: “Unbiased” - de novo sequences Larger dynamic range Information rich More complex data Much larger data More expensive

Dynamic Range - Detection Limits Low depth RNA-Seq High depth RNA-Seq Microarray

Gene Expression Analysis Strategies Align+count Explicit alignment against genome Count reads aligned to known loci (e.g. genes) Quantify “All-in-one” approach Quasi-alignment (i.e. “good enough” alignment) against transcriptome only Statistical model estimates abundance

RNA-Seq Analysis: Align and Count Align against reference genome Compare alignments with annotation Count # of reads within desired features (e.g. exons, coding sequence) Sum to transcript or gene level Read counts are estimates of abundance

Concept: Multimapping Reads Paralogs, repetitive sequence may transcribe identical RNA Multimapping read: read that maps equally well to multiple loci Can cause abundance estimation bias Mitigate by: Filtering out multimappers Limit reads to aligning to a maximum # of loci (e.g. 1, 10) Multimap resolution methods (e.g. mmr, ORMAN)

Abundance Estimation (Quantification) Quasi- (or pseudo-)alignment: map reads to sequences without explicit alignment Must build reference transcriptome Statistical model performs abundance inference Handles multimapping reads implicitly Similar accuracy, faster than align+count

Expression Analysis Strategies Summary Align+count: Spliced alignment methods: STAR, TopHat Counting methods: htseq, featurecounts, VERSE Advantages: flexible, accurate, whole genome Disadvantages: slow, many parameters to choose Quantify Methods: salmon, kallisto Advantage: very fast, accurate, handles multimaps Disadvantage: transcriptome only

Differential Expression (DE) Start with expression matrix of genes x sample counts/estimates Which genes have counts associated with variable of interest, e.g. case vs control? Each gene will have significance (e.g. p-value) Effect size (e.g. log2 fold change)

Concept: Count Normalization # of reads differ between libraries Counts proportional to library size Must be normalized for samples to be comparable Un-normalized counts called raw counts

Count Normalization Strategies DESeq2 - median of geometric mean count ratio FPKM/RPKM - fragments (reads) per kilobase per million reads Divide each gene count by length of gene*10^6 Others proposed, these two most common Dillies, M.-A. et al. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief Bioinform 14, 671–683 (2013). http://bib.oxfordjournals.org/content/14/6/671.full

Modeling Count Data Count data are not normally distributed: Non-negative integers Mean-Variance dependence Long upper tail Modeled as Negative Binomial Distributed Negative Binomial Regression Utilized for DE

Differential Expression Methods Current state of the art: DESeq2 (Negative Binomial Regression) edgeR (Negative Binomial Regression) Count transformation + limma (linear regression) Deprecated: Cufflinks (Negative Binomial Regression) These methods perform normalization