Differential Expression from RNA-seq

X. Shirley Liu
STAT115/215, BIO/BST282

Sequencing Read Distribution
The number of patients arriving in an emergency room between 10 and 11 pm and the number of reads mapped to a gene 1 kb long can both be modeled with a Poisson distribution: λ is the average number of events per interval, k is the number of events in an interval, and Var = mean = λ.
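
A quick numerical check of the Poisson property, as a minimal Python sketch. The rate of 20 reads per 1 kb gene is a hypothetical value. Summing over the probability mass function P(K = k) = λ^k e^(-λ) / k! recovers both the mean and the variance as λ.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson distribution with mean lam (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def moments(pmf, max_k: int):
    """Mean and variance of a discrete distribution, summed over k = 0..max_k."""
    probs = [pmf(k) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

# Hypothetical example: on average 20 reads land on a 1 kb gene.
lam = 20.0
mean, var = moments(lambda k: poisson_pmf(k, lam), max_k=200)
print(f"mean={mean:.3f} var={var:.3f}")  # both are approximately lambda
```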

Sequencing Read Distribution
In reality, sequencing data are over-dispersed (variance > mean), so read counts are better described by a negative binomial NB(r, p): the number of successes before the r-th failure, when the probability of success is p.
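
To see the over-dispersion concretely, the sketch below evaluates the NB(r, p) probability mass function in the slide's parameterization (successes before the r-th failure) and sums its moments. The values r = 5 and p = 0.8 are arbitrary illustrations. In closed form the mean is pr/(1-p) and the variance is pr/(1-p)^2 = mean + mean^2/r, always larger than the mean. Tools such as DESeq2 and edgeR write the same model with a mean μ and a dispersion α = 1/r, so Var = μ + αμ^2.

```python
import math

def nb_pmf(k: int, r: float, p: float) -> float:
    """P(K = k): k successes before the r-th failure, success probability p."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + k * math.log(p) + r * math.log(1 - p))

def moments(pmf, max_k: int):
    """Mean and variance of a discrete distribution, summed over k = 0..max_k."""
    probs = [pmf(k) for k in range(max_k + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var

r, p = 5.0, 0.8  # hypothetical parameters
mean, var = moments(lambda k: nb_pmf(k, r, p), max_k=2000)
# Closed forms: mean = p*r/(1-p) = 20, var = p*r/(1-p)**2 = mean + mean**2/r = 100
print(f"mean={mean:.3f} var={var:.3f}")
```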

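Modeling Read Over-Dispersion
Each gene's variance is estimated by borrowing information from all the genes (hierarchical models). For each gene i, test whether its expression follows the same NB distribution in the 2 conditions. Because thousands of genes are tested at once, the false discovery rate (FDR) must be controlled.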
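
One standard way to control the FDR is the Benjamini-Hochberg procedure, which DESeq2 and edgeR use by default for their adjusted p-values. The sketch below is a plain Python version of that adjustment only; it does not cover the dispersion-sharing step, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    # and capped at 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print(benjamini_hochberg(pvals))
```

Genes whose adjusted p-value falls below the chosen FDR level (for example 0.05 or 0.1) are then called differentially expressed.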

Fold Change with Variance Shrinkage
Shrinkage is not equal across genes: low-information genes (low counts) are strongly moderated, while high-information genes receive almost no shrinkage. Low counts give noisy fold-change estimates; the statistical model assigns them a large FDR, but we shouldn't trust the raw estimate itself.
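
A minimal normal-normal sketch of why shrinkage is unequal, under stated assumptions: the maximum-likelihood log2 fold change is treated as normal with standard error se, and the prior on true fold changes is N(0, prior_sd^2). The posterior mean multiplies the estimate by prior_sd^2 / (prior_sd^2 + se^2), so a noisy low-count gene (large se) is pulled strongly toward zero while a well-measured gene barely moves. This is a simplification in the spirit of DESeq2's fold-change shrinkage, not its actual implementation; the se and prior_sd values are hypothetical.

```python
def shrink_lfc(lfc: float, se: float, prior_sd: float) -> float:
    """Posterior mean of a log2 fold change under a N(0, prior_sd^2) prior,
    assuming the estimate is normal with standard error se."""
    weight = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return weight * lfc

# Two hypothetical genes with the same raw log2 fold change of 2.
high_count = shrink_lfc(2.0, se=0.1, prior_sd=1.0)  # many reads, precise estimate
low_count = shrink_lfc(2.0, se=2.0, prior_sd=1.0)   # few reads, noisy estimate
print(round(high_count, 3), round(low_count, 3))
```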

Splicing Transcripts
Align reads across splice junctions (TopHat) so that they can be assigned to splice isoforms.
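
Spliced aligners such as TopHat report a read that spans an intron with the "N" (skipped region) operation of the CIGAR string defined in the SAM format. The hypothetical helper below recovers intron coordinates from an alignment's 1-based start position and its CIGAR string; it is an illustrative sketch, not part of TopHat.

```python
import re

# CIGAR operations that advance along the reference (SAM specification).
REF_CONSUMING = set("MDN=X")

def splice_junctions(pos: int, cigar: str):
    """Return (intron_start, intron_end) pairs, 1-based inclusive, for every
    'N' operation in a spliced alignment that starts at reference position pos."""
    junctions = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions

# A read with 50 aligned bases, a 200 bp intron, then 50 more aligned bases.
print(splice_junctions(100, "50M200N50M"))
```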

Transcript Assembly
Reference-based assembly: Cufflinks
De novo assembly: Trinity
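
De novo assemblers such as Trinity build on de Bruijn graphs of k-mers. The toy sketch below (hypothetical reads, k = 4, no error correction, no handling of branches or repeats) shows the core idea: overlapping k-mers from different reads link into one path, and walking that path spells out a contig. It assumes the graph has at least one node with no incoming edges.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    indegree = defaultdict(int)
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].add(suffix)
        indegree[suffix] += 1
    return graph, indegree

def walk_contig(graph, indegree):
    """Greedy walk from a node with no incoming edges along unbranched edges."""
    start = next(node for node in graph if indegree[node] == 0)
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen or indegree[nxt] != 1:
            break  # stop at cycles or where paths merge
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

reads = ["ATGCGT", "GCGTAC"]  # two overlapping reads from one hypothetical transcript
graph, indeg = build_de_bruijn(reads, k=4)
print(walk_contig(graph, indeg))
```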

Isoform Inference
Given a known set of isoforms, estimate the isoform abundances x that maximize the likelihood of observing the read counts n.
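
The likelihood can be made concrete with compatibility classes: if n_c reads are compatible with exactly the isoform set c, and isoform lengths are ignored (a simplifying assumption), then L(x) = Π_c (Σ_{t in c} x_t)^(n_c). Expectation-maximization (EM) is a standard way to maximize it, and tools such as RSEM and kallisto use EM variants. The sketch below runs plain EM on hypothetical counts for two isoforms, A and B.

```python
def em_isoform_fractions(class_counts, isoforms, n_iter=500):
    """EM estimate of isoform fractions from read counts per compatibility class.

    class_counts maps a frozenset of compatible isoforms to a read count.
    Simplifying assumption: isoform lengths are ignored (treated as equal).
    """
    theta = {t: 1.0 / len(isoforms) for t in isoforms}
    total = sum(class_counts.values())
    for _ in range(n_iter):
        expected = {t: 0.0 for t in isoforms}
        # E-step: split each class's reads among its isoforms in proportion to theta.
        for compatible, count in class_counts.items():
            norm = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += count * theta[t] / norm
        # M-step: the new fractions are the expected shares of all reads.
        theta = {t: expected[t] / total for t in isoforms}
    return theta

counts = {
    frozenset({"A"}): 30,       # reads unique to isoform A
    frozenset({"B"}): 10,       # reads unique to isoform B
    frozenset({"A", "B"}): 40,  # reads compatible with both isoforms
}
theta = em_isoform_fractions(counts, ["A", "B"])
print(theta)
```

Here the fixed point satisfies x_A = (30 + 40 x_A) / 80, so x_A = 0.75 and x_B = 0.25: the shared reads are split in the same 3:1 ratio as the unique reads.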

Known Isoform Abundance Inference

Identification of Differential Splicing Between RNA-seq Samples
Most differential splicing detection algorithms call differentially expressed exons rather than whole transcripts, especially for novel splicing events.
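
Exon-level methods typically summarize each alternative exon by its inclusion level (percent spliced in, PSI). The hypothetical helper below computes a length-normalized PSI in the spirit of rMATS: inclusion and skipping read counts are each divided by an effective length before taking the ratio. The counts and lengths are made up; the inclusion form is given twice the skipping form's effective length, as if inclusion reads could come from two junctions and skipping reads from one.

```python
def inclusion_level(inc_reads: int, skip_reads: int,
                    inc_len: float, skip_len: float) -> float:
    """Length-normalized exon inclusion level (PSI) for one sample."""
    inc = inc_reads / inc_len
    skip = skip_reads / skip_len
    return inc / (inc + skip)

# Hypothetical read counts for one cassette exon in two conditions.
psi_a = inclusion_level(30, 10, inc_len=2.0, skip_len=1.0)
psi_b = inclusion_level(10, 20, inc_len=2.0, skip_len=1.0)
delta_psi = psi_a - psi_b  # the quantity a differential splicing test examines
print(round(psi_a, 3), round(psi_b, 3), round(delta_psi, 3))
```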

Splicing Isoform Inference
With a known isoform set, gene-level expression estimates can be accurate even though isoform abundances remain uncertain (e.g., if the known set is incomplete).
De novo methods are usually better at detecting differential exon splicing than differences in whole transcripts.
De novo isoform inference is a non-identifiable problem when RNA-seq reads are short and the gene is long with many exons.
Experimental validation of quantitative differential splicing is still quite hard.
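
The non-identifiability can be demonstrated directly. Assume reads are so short that each one sees at most one exon or one exon-exon junction, and that every isoform is expressed at the same level. Then the two different hypothetical isoform sets below, over exons 1 to 5 with alternative exons 2 and 4, produce exactly the same exon and junction counts, so no amount of such data can tell them apart.

```python
from collections import Counter

def short_read_profile(isoforms):
    """Exon and adjacent-junction counts seen by reads spanning at most one
    junction, assuming every isoform is expressed at the same level."""
    profile = Counter()
    for exons in isoforms:
        profile.update(("exon", e) for e in exons)
        profile.update(("junction", a, b) for a, b in zip(exons, exons[1:]))
    return profile

set_1 = [(1, 2, 3, 4, 5), (1, 3, 5)]
set_2 = [(1, 2, 3, 5), (1, 3, 4, 5)]
print(short_read_profile(set_1) == short_read_profile(set_2))
```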

Active Field
HISAT2: fast alignment with a hierarchical index (https://ccb.jhu.edu/software/hisat2/index.shtml)
Kallisto and Sleuth: Kallisto quantifies known genes and transcripts in TPM; Sleuth tests for differential expression (https://scilifelab.github.io/courses/rnaseq/labs/kallisto)
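
Kallisto's pseudoalignment replaces base-level alignment with k-mer lookups: a read is assigned the set of transcripts compatible with all of its k-mers (its equivalence class). Kallisto itself stores k-mers in a transcriptome de Bruijn graph; the toy sketch below uses a plain dictionary instead, hypothetical transcripts with k = 3, and simply skips k-mers missing from the index, which is a simplification.

```python
from collections import defaultdict

def build_kmer_index(transcripts, k):
    """Map every k-mer to the set of transcripts that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def pseudoalign(read, index, k):
    """Intersect the transcript sets of the read's k-mers.
    Simplification: k-mers absent from the index are skipped."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()

transcripts = {"T1": "ATGCAT", "T2": "ATGGAT"}  # hypothetical toy transcripts
idx = build_kmer_index(transcripts, k=3)
print(pseudoalign("ATGC", idx, 3))  # only T1 contains TGC
print(pseudoalign("ATG", idx, 3))   # both transcripts start with ATG
```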

Summary
RNA-seq design considerations
Read mapping: BWA, STAR
Quality control: RSeQC
Expression index: RPKM/FPKM and TPM
Differential expression: limma-voom and DESeq
Transcriptome assembly: Cufflinks, Trinity
Alternative splicing: rMATS
New developments: HISAT2, Kallisto and Sleuth
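
The two expression indices differ in the order of normalization. RPKM/FPKM divides counts by library size and then by length in kb; TPM divides by length first and then rescales so that every sample sums to one million, which makes TPM values easier to compare across samples. The sketch below assumes, for simplicity, that the listed genes account for all mapped reads; counts and lengths are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads.
    Assumption: the total mapped reads equal the sum of these counts."""
    total = sum(counts)
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then by the sum."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

counts = [10, 20, 70]        # hypothetical read counts
lengths = [1000, 2000, 500]  # hypothetical transcript lengths in bp
print([round(x, 1) for x in rpkm(counts, lengths)])
print([round(x, 1) for x in tpm(counts, lengths)])
```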

Single Cell RNA-seq

Why Single-Cell RNA-seq?
Heterogeneous cell populations (Kolodziejczyk et al., Mol Cell 2015)

Two General Approaches (Ziegenhain et al. 2017)

Drop-seq (Macosko et al. 2015)
Drop-seq overview: each cell is mixed with reagents in a droplet. The cell's RNA attaches to a particle (bead) carrying a bead-specific barcode, and the captured RNA is then reverse transcribed, amplified, and sequenced.

Sources of Variation
cDNA conversion rate (2-25%); droplet size; reagent concentration; cell count and dilution; PCR efficiency. UMIs control for over-amplification of any one transcript.

Sequencing Results (Macosko et al. 2015)
Paired-end sequencing ($$$): one read contains the cell barcode, the UMI and the polyA; the other read contains the cDNA sequence. All reads with the same cell barcode and the same UMI are collapsed into 1 transcript.
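
The collapsing step amounts to counting distinct UMIs per cell and gene. The sketch below takes hypothetical (cell barcode, UMI, gene) tuples, one per sequenced read, and builds a cell-by-gene count matrix. Real pipelines usually also merge UMIs that differ only by sequencing errors; this version matches UMIs exactly.

```python
from collections import defaultdict

def umi_count_matrix(reads):
    """reads: iterable of (cell_barcode, umi, gene) tuples, one per read.
    Returns {cell: {gene: number of distinct UMIs}}."""
    molecules = defaultdict(set)
    for cell, umi, gene in reads:
        molecules[(cell, gene)].add(umi)
    matrix = defaultdict(dict)
    for (cell, gene), umis in molecules.items():
        matrix[cell][gene] = len(umis)
    return matrix

reads = [
    ("CELL1", "AAAC", "GAPDH"),
    ("CELL1", "AAAC", "GAPDH"),  # PCR duplicate: same barcode, UMI and gene
    ("CELL1", "GGTT", "GAPDH"),
    ("CELL2", "AAAC", "ACTB"),
]
counts = umi_count_matrix(reads)
print(dict(counts))
```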

SMART-based vs Droplet-based
SMART-based: fresh cells; one cell at a time; small cell population; lower dropout; cell barcode; full-length transcripts; more transcripts per cell; per-cell transcription more accurate; $$$
Droplet-based: fresh cells; all droplets processed together; higher dropout; cell barcode; UMI for PCR bias correction; 3' bias; fewer transcripts per cell; per-cluster transcription more accurate; $$$

Potential Applications
Understand stem cell differentiation or state transitions
Map heterogeneity in complex tissues (tumor, brain, blood, etc.)
Identify new cell types with new functions
Study stochastic and dynamic responses to perturbation
…

Quality Control

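Dropouts
A dropout occurs when a transcript present in a cell is not captured or detected, so its observed count is zero (Kharchenko et al., Nat Meth 2014; Zheng et al., Nat Comm 2017).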
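
One common way to model dropouts is a zero-inflated negative binomial: with probability π the transcript drops out and the count is zero; otherwise the count follows an NB with mean μ and dispersion α, giving P(0) = π + (1 - π)(1 / (1 + αμ))^(1/α). The sketch below computes this probability; it is a generic formulation with hypothetical parameter values, not the exact model of either cited paper.

```python
def nb_zero_prob(mean: float, dispersion: float) -> float:
    """P(count = 0) under an NB with Var = mean + dispersion * mean**2."""
    size = 1.0 / dispersion  # the NB size parameter r
    return (size / (size + mean)) ** size

def zinb_zero_prob(mean: float, dispersion: float, dropout: float) -> float:
    """Zero-inflated NB: a dropout zero with probability `dropout`,
    otherwise a draw from the NB count distribution."""
    return dropout + (1.0 - dropout) * nb_zero_prob(mean, dispersion)

# Hypothetical highly expressed gene: the NB alone almost never gives a zero,
# so frequent zeros for such a gene point to dropouts, not low expression.
print(nb_zero_prob(mean=50.0, dispersion=0.1))
print(zinb_zero_prob(mean=50.0, dispersion=0.1, dropout=0.3))
```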

From Kolodziejczyk et al. 2015: In each single cell, we observe variation in the gene counts, and a good proportion of this variation doesn't help us discover biology. UMIs are discussed further on another slide; transcription kinetics are largely unknown. Multiple methods have been proposed for cell-cycle adjustment, but they have had limited success.

Visualizing scRNA-seq Data
t-distributed stochastic neighbor embedding (tSNE) is a newer nonlinear dimension-reduction method. It tries to preserve pairwise distances but focuses on points close by, so distances between far-away clusters don't matter. Colors are labeled manually; density should also be labeled. tSNE is non-deterministic: repeated runs can give different layouts.
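
The focus on nearby points comes from how tSNE builds its input similarities. For each cell it places a Gaussian over the distances to other cells and tunes the Gaussian's width so that the resulting distribution has a user-chosen perplexity (roughly, an effective number of neighbors); far-away cells get almost zero weight. The sketch below performs that per-cell calibration by binary search on beta = 1/(2σ^2) for hypothetical squared distances. The full algorithm then optimizes a low-dimensional embedding from a random starting point, which is one reason repeated runs differ.

```python
import math

def conditional_probs(sq_dists, beta):
    """p_{j|i} proportional to exp(-beta * d_ij^2) over the neighbors j of cell i."""
    d_min = min(sq_dists)  # subtracting the minimum avoids underflow
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy

def calibrate(sq_dists, target, tol=1e-5, max_iter=200):
    """Binary search on beta until the perplexity matches the target."""
    beta, lo, hi = 1.0, 0.0, math.inf
    for _ in range(max_iter):
        probs = conditional_probs(sq_dists, beta)
        perp = perplexity(probs)
        if abs(perp - target) < tol:
            break
        if perp > target:  # too flat: narrow the Gaussian (larger beta)
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (beta + hi) / 2.0
        else:              # too peaked: widen the Gaussian (smaller beta)
            hi = beta
            beta = (beta + lo) / 2.0
    return probs, beta

# Squared distances from one cell to five others (hypothetical values).
sq_dists = [1.0, 2.0, 3.0, 10.0, 20.0]
probs, beta = calibrate(sq_dists, target=2.0)
print([round(p, 3) for p in probs], round(perplexity(probs), 3))
```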

Reconstruction of Retinal Cell Types
PCA on ~14K high-quality cells (out of 44K sequenced cells)
tSNE on 32 statistically significant principal components
Density-based clustering
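
The slide does not say which density-based algorithm was used; DBSCAN is one common choice and serves here as an assumed stand-in. In this pipeline it would run on the 2-D tSNE coordinates. The minimal sketch below labels each point with a cluster id, or -1 for noise, given a distance threshold eps and a minimum neighborhood size min_pts; the coordinates are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels

cells_2d = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(cells_2d, eps=1.5, min_pts=3))
```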

Checking Batch Effect Single cells from different days

Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data Satija et al., Nat. Biotech. 2015

Summary
SMART-based vs droplet-based single-cell sequencing
Barcodes and UMIs
Dropout modeling
tSNE for visualization

Acknowledgement: Wei Li, Michael Love, Alisha Holloway, Simon Andrews, Radhika Khetani, Chengzhong Zhang, Etai Jacob, Caleb Lareau, Luca Pinello, Assieh Saadatpour