Differential Expression from RNA-seq

Presentation transcript:

Differential Expression from RNA-seq X. Shirley Liu STAT115/215, BIO/BST282

Sequencing Read Distribution. Analogy: the number of patients arriving in an emergency room between 10 and 11 pm; here, the number of reads mapped to a gene 1 kb long. Poisson distribution: λ = average number of events per interval; K = number of events in an interval; Var = mean = λ.
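The Var = mean = λ property on this slide can be checked numerically. The following is a minimal sketch using only the Python standard library; the rate `lam = 4.0` is an arbitrary illustrative value, not a number from the lecture.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0            # hypothetical average number of reads per gene
ks = range(0, 60)    # truncate the support; the tail beyond 60 is negligible here
probs = [poisson_pmf(k, lam) for k in ks]

mean = sum(k * p for k, p in zip(ks, probs))
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
print(f"mean={mean:.6f} var={var:.6f}")
```

Both numbers come out equal to λ up to floating-point error, which is the constraint that real sequencing data tend to violate.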

Sequencing Read Distribution. In reality, sequencing data are over-dispersed (mean < variance). Negative binomial NB(r, p): the number of successes before the r-th failure, when the probability of success is p.
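To see why NB(r, p) can model over-dispersion, one can compute its mean and variance directly from the definition above (successes before the r-th failure). This is a sketch with arbitrary illustrative parameters `r = 3` and `p = 0.6`; the closed forms in the comments are the standard ones for this parameterization.

```python
import math

def nb_pmf(k: int, r: int, p: float) -> float:
    """P(K = k): k successes before the r-th failure, success probability p."""
    return math.comb(k + r - 1, k) * p**k * (1 - p) ** r

r, p = 3, 0.6         # hypothetical parameters for illustration
ks = range(0, 400)    # truncated support; the remaining tail is negligible
probs = [nb_pmf(k, r, p) for k in ks]

nb_mean = sum(k * q for k, q in zip(ks, probs))
nb_var = sum((k - nb_mean) ** 2 * q for k, q in zip(ks, probs))

# Closed forms: mean = p*r/(1-p), variance = p*r/(1-p)^2 = mean/(1-p)
print(f"mean={nb_mean:.4f} var={nb_var:.4f}")
```

Because the variance equals the mean divided by (1 - p), it always exceeds the mean, unlike the Poisson case.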

Modeling Read Over-Dispersion. Variance is estimated by borrowing information from all the genes (hierarchical models). Test whether the expression of gene i follows the same NB distribution in the 2 conditions. FDR?
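The slide raises FDR without naming a procedure. One common choice for per-gene p-values is the Benjamini-Hochberg adjustment; the sketch below assumes that procedure and uses made-up p-values, so it is an illustration rather than what any particular package does internally.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]   # hypothetical per-gene p-values
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Genes whose adjusted value falls below a chosen threshold (for example 0.05) would be reported as differentially expressed at that FDR.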

Fold Change with Variance Shrinkage. Shrinkage is not equal. Strong moderation for low-information genes: low counts. Almost no shrinkage. Noisy estimates due to low counts. Large FDR from the statistical model, but we shouldn't trust the estimate itself.

Splicing Transcripts Assign reads to splice isoforms (TopHat)

Transcript Assembly. Reference-based assembly: Cufflinks. De novo assembly: Trinity.

Isoform Inference. If given a known set of isoforms, estimate the abundances x to maximize the likelihood of observing the read counts n.
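A common way to carry out this maximum-likelihood estimation is expectation-maximization over read-to-isoform assignments. The sketch below assumes a simplified model (each read is equally likely to come from any position of a compatible isoform, and every read is compatible with at least one isoform); the function name, isoform labels and read data are hypothetical.

```python
def em_isoform_abundance(compat, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    compat:  list of sets; compat[r] holds the isoforms read r is compatible with
             (each set must be non-empty).
    lengths: dict mapping isoform -> effective length.
    """
    isoforms = list(lengths)
    theta = {t: 1.0 / len(isoforms) for t in isoforms}   # uniform start
    for _ in range(n_iter):
        counts = {t: 0.0 for t in isoforms}
        # E-step: split each read across compatible isoforms.
        for read in compat:
            weights = {t: theta[t] / lengths[t] for t in read}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: re-estimate read fractions from the expected counts.
        theta = {t: counts[t] / len(compat) for t in isoforms}
    return theta

# Toy data: 6 reads unique to isoform A, 2 unique to B, 4 shared by both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_isoform_abundance(reads, {"A": 1000, "B": 1000})
print(theta)
```

With equal lengths the shared reads carry no information about the split, so the estimate settles at the ratio given by the unique reads (0.75 for A, 0.25 for B).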

Known Isoform Abundance Inference

Identification of Differential Splicing Between RNA-seq Samples. Most differential splicing detection algorithms call differentially expressed exons, not whole transcripts, especially for novel splicing.

Splicing Isoform Inference. With a known isoform set, gene-level expression inference is sometimes very good, although isoform abundances might be uncertain (e.g. when the known set is incomplete). De novo methods are usually better at detecting differential exon splicing, but not whole transcripts. De novo isoform inference is a non-identifiable problem if RNA-seq reads are short and the gene is long with too many exons. Experimental validation of quantitative differential splicing is still quite hard.

Active Field. HISAT2 for fast alignment: hierarchical index (https://ccb.jhu.edu/software/hisat2/index.shtml). Kallisto and Sleuth: Kallisto for TPM, Sleuth for differential expression; known genes and transcripts (https://scilifelab.github.io/courses/rnaseq/labs/kallisto).

Summary. RNA-seq design considerations. Read mapping: BWA, STAR. Quality control: RSeQC. Expression index: RPKM/FPKM and TPM. Differential expression: limma-voom and DESeq. Transcriptome assembly: Cufflinks, Trinity. Alternative splicing: rMATS. New developments: HISAT2, Kallisto and Sleuth. Break.
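The two expression indices named in the summary differ only in the order of normalization. The sketch below uses the standard definitions (reads per kilobase per million mapped reads, and transcripts per million); the gene names, counts and lengths are invented for illustration.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rate = {g: counts[g] / lengths_bp[g] for g in counts}
    denom = sum(rate.values())
    return {g: rate[g] * 1e6 / denom for g in counts}

counts = {"geneA": 100, "geneB": 300, "geneC": 600}        # hypothetical read counts
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}  # hypothetical lengths

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print(rpkm_values)
print(tpm_values)
```

TPM values always sum to one million within a sample, which makes proportions comparable across samples; RPKM/FPKM totals vary from sample to sample.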

Single Cell RNA-seq

Why Single-Cell RNA-seq? Heterogeneous cell populations Kolodziejczyk et al, Mol Cell 2015

Why Single-Cell RNA-seq?

Two General Approaches From Ziegenhain et al. 2017

Drop-Seq. Drop-seq overview: cells mix with reagents in a droplet; RNA attaches to a particle with a specific barcode, etc. From Macosko et al. 2015

Variations. cDNA conversion rate: 2-25%. Droplet size. Reagent concentration. Cell count and dilution. PCR efficiency. UMI controls for over-amplification of one transcript.

Sequencing Results. Paired-end sequencing ($$$): one read has the cell barcode, UMI and polyA. Collapse all transcripts with the same barcode and the same UMI into 1. From Macosko et al. 2015
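The collapsing step can be sketched as counting distinct UMIs per cell barcode. The version below additionally keys on the gene a read maps to, which is an assumption beyond what the slide states; the barcodes, UMIs and gene names are made up.

```python
from collections import defaultdict

def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples.
    PCR duplicates share barcode, UMI and gene, so they collapse to one molecule.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}

reads = [
    ("AAAC", "GGT", "Actb"),
    ("AAAC", "GGT", "Actb"),   # PCR duplicate of the read above
    ("AAAC", "TTA", "Actb"),   # same cell and gene, different molecule
    ("CCGT", "GGT", "Actb"),   # same UMI but a different cell
    ("AAAC", "GGT", "Gapdh"),  # same cell and UMI but a different gene
]
umi_counts = count_umis(reads)
print(umi_counts)
```

This exact-match version ignores sequencing errors in the UMI or barcode, which production pipelines usually correct before counting.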

SMART-based vs Droplet-based
SMART-based: fresh cells; one cell at a time; small cell population; lower dropout; cell barcode; full length; transcripts / cell higher; per-cell transcription more accurate; $$$
Droplet-based: fresh cells; all droplets together; higher dropout; cell barcode; UMI for PCR bias correction; 3' bias; transcripts / cell lower; per-cluster transcription more accurate; $$$

Potential Applications. Understand stem cell differentiation or state transition. Map heterogeneity in complex tissue types (tumor / brain / blood, etc). Identify new cell types with new functions. Stochastic and dynamic responses to perturbation. … Break.

Quality Control

Dropouts Kharchenko, et al, Nat Meth 2014; Zheng et al, Nat Comm 2017

From Kolodziejczyk et al. 2015. In each single cell, we observe variations in the gene counts. A good proportion of the variation doesn't help us discover biology. UMIs are discussed more on another slide; transcription kinetics are largely unknown. Multiple methods have been proposed for cell cycle adjustment but have had limited success.

Visualizing scRNA-seq data. t-distributed stochastic neighbor embedding (tSNE): a new dimension reduction method. Preserves pairwise distances, but focuses on nearby points; distances between far-away clusters don't matter. Colors are manually labeled. Density should be labeled. Non-deterministic.

Reconstruction of Retinal Cell Types. ~14K high-quality cells from 44K sequenced cells. PCA, then tSNE on the 32 statistically significant principal components. Density-based clustering.

Checking Batch Effect Single cells from different days

Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data Satija et al., Nat. Biotech. 2015

Summary. SMART-based vs Droplet-based single-cell sequencing. Barcode and UMI. Dropout modeling. tSNE for visualization.

Acknowledgement: Wei Li, Michael Love, Alisha Holloway, Simon Andrews, Radhika Khetani, Chengzhong Zhang, Etai Jacob, Caleb Lareau, Luca Pinello, Assieh Saadatpour