Differential Expression from RNA-seq


1 Differential Expression from RNA-seq
X. Shirley Liu STAT115/215, BIO/BST282

2 Sequencing Read Distribution
Example: the number of patients arriving in an emergency room between 10 and 11 pm
RNA-seq analogue: the number of reads mapped to a gene 1 kb long
Poisson distribution: λ = average number of events per interval; K = number of events in an interval
Var = mean = λ
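A minimal sketch of the Poisson property stated on this slide, Var = mean = λ. The rate λ = 20 is a hypothetical value chosen for illustration (for instance, 20 reads expected on a 1 kb gene); it does not come from the slides.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(K = k) for a Poisson distribution with rate lam (computed in log space)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_mean_var(lam: float, k_max: int = 200):
    """Sum the pmf up to k_max to approximate the mean and the variance."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

lam = 20.0  # hypothetical expected read count for one gene
mean, var = poisson_mean_var(lam)
print(f"lambda={lam} mean={mean:.4f} var={var:.4f}")
```

The truncation at k_max is harmless here because the probability mass beyond 200 is negligible for λ = 20.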

3 Sequencing Read Distribution
In reality, sequencing data are over-dispersed (mean < variance)
Negative binomial NB(r, p): the number of successes before the r-th failure, if Pr(success) is p
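A short sketch of why the negative binomial accommodates over-dispersion, using the slide's parameterization (successes before the r-th failure, success probability p). The values r = 5 and p = 0.8 are hypothetical, chosen so that the mean is 20 while the variance is five times larger.

```python
import math

def nb_pmf(k: int, r: int, p: float) -> float:
    """P(K = k): k successes before the r-th failure, success probability p."""
    return math.comb(k + r - 1, k) * p ** k * (1 - p) ** r

def nb_mean_var(r: int, p: float):
    """Closed-form mean and variance for this parameterization."""
    mean = r * p / (1 - p)
    var = r * p / (1 - p) ** 2
    return mean, var

r, p = 5, 0.8  # hypothetical parameters
mean, var = nb_mean_var(r, p)
# Cross-check the closed-form mean against a direct sum over the pmf.
numeric_mean = sum(k * nb_pmf(k, r, p) for k in range(2000))
print(f"mean={mean:.3f} var={var:.3f} numeric_mean={numeric_mean:.3f}")
```

Because var = mean / (1 - p), the variance always exceeds the mean for 0 < p < 1, which is the over-dispersion the slide refers to.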

4 Modeling Read Over Dispersion
Variance is estimated by borrowing information from all the genes (hierarchical models)
Test whether the expression of gene i follows the same NB() between 2 conditions
FDR?
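The slide raises FDR without naming a procedure. One common choice for per-gene tests is the Benjamini-Hochberg adjustment; the sketch below assumes that procedure for illustration, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
print(benjamini_hochberg(pvals))
```

Genes whose adjusted value falls below a chosen threshold (0.05 is a frequent convention) would be called differentially expressed at that FDR.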

5 Fold Change with Var Shrinkage
Shrinkage is not equal across genes
Strong moderation for low-information genes (low counts)
Almost no shrinkage for high-information genes
Noisy estimates due to low counts: large FDR from the statistical model, but we shouldn't trust the estimate itself
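To illustrate why shrinkage is unequal, here is a deliberately simplified sketch: a zero-centered normal prior on the log fold change combined with a normal likelihood. This is a stand-in for illustration, not the procedure used by any particular package; the function name, prior width, and standard errors are all hypothetical.

```python
def shrink_lfc(lfc: float, se: float, prior_sd: float = 1.0) -> float:
    """Posterior mean of a log fold change under a N(0, prior_sd^2) prior."""
    # The weight on the observed estimate falls as its standard error grows.
    weight = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return weight * lfc

# Same raw estimate, different amounts of information (hypothetical values).
high_count_gene = shrink_lfc(lfc=2.0, se=0.1)  # precise estimate
low_count_gene = shrink_lfc(lfc=2.0, se=2.0)   # noisy estimate
print(high_count_gene, low_count_gene)
```

The precise estimate is left almost untouched, while the noisy one is pulled strongly toward zero, matching the pattern described on the slide.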

6 Splicing Transcripts
Assign reads to splice isoforms (TopHat)

7 Transcript Assembly
Reference-based assembly: Cufflinks
De novo assembly: Trinity

8 Isoform Inference
If given a known set of isoforms, estimate x to maximize the likelihood of observing n
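A minimal sketch of maximum-likelihood abundance estimation over a known isoform set, assuming x denotes the isoform abundances and n the observed read counts (the slide does not define either symbol). It uses a plain EM iteration over groups of reads that are compatible with the same isoforms; the function name, the uniform read model, and the toy counts are assumptions made for illustration.

```python
def em_isoform_abundance(class_counts, lengths, n_iter=200):
    """
    class_counts: dict mapping a tuple of compatible isoform indices to a read count.
    lengths: effective length of each isoform.
    Returns theta, the estimated fraction of reads coming from each isoform.
    """
    k = len(lengths)
    total = sum(class_counts.values())
    theta = [1.0 / k] * k  # start from equal abundances
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: split each group of reads across its compatible isoforms.
        for isoforms, count in class_counts.items():
            weights = [theta[j] / lengths[j] for j in isoforms]
            norm = sum(weights)
            for j, w in zip(isoforms, weights):
                expected[j] += count * w / norm
        # M-step: re-estimate read fractions from the expected assignments.
        theta = [e / total for e in expected]
    return theta

# Toy gene with two isoforms of equal length: 60 reads fall in shared exons,
# 30 are unique to isoform 0, and 10 are unique to isoform 1.
counts = {(0, 1): 60, (0,): 30, (1,): 10}
print(em_isoform_abundance(counts, lengths=[1000.0, 1000.0]))
```

In this toy case the shared reads end up divided in the same 3:1 ratio as the uniquely assigned reads, so the estimates approach 0.75 and 0.25.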

9 Known Isoform Abundance Inference

10 Identification of Differential Splicing Between RNA-seq Samples
Most differential splicing detection algorithms call differentially expressed exons, not whole transcripts, especially for novel splicing

11 Splicing Isoform Inference
With a known isoform set, the gene-level expression inference is sometimes great, although isoform abundances might have uncertainty (e.g. known set incomplete)
De novo methods are usually better at detecting differential exon splicing, but not whole transcripts
De novo isoform inference is a non-identifiable problem if RNA-seq reads are short and the gene is long with too many exons
Experimental validation of quantitative differential splicing is still quite hard

12 Active Field
HISAT2 for fast alignment: hierarchical index
Kallisto and Sleuth: Kallisto for TPM, Sleuth for differential expression; known genes and transcripts

13 Summary
RNA-seq design considerations
Read mapping: BWA, STAR
Quality control: RSeQC
Expression index: RPKM/FPKM and TPM
Differential expression: limma-voom and DESeq
Transcriptome assembly: Cufflinks, Trinity
Alternative splicing: rMATS
New developments: HISAT2, Kallisto and Sleuth
Break
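The two expression indices named in the summary differ only in the order of normalization. The sketch below computes both from raw counts and gene lengths; the function names and the three-gene input are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene per million mapped reads."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]

counts = [100, 200, 300]      # hypothetical read counts
lengths = [1000, 2000, 3000]  # hypothetical gene lengths in bp
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which makes proportions comparable across samples; RPKM/FPKM values carry no such guarantee.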

14 Single Cell RNA-seq

15 Why Single-Cell RNA-seq?
Heterogeneous cell populations Kolodziejczyk et al, Mol Cell 2015

16 Why Single-Cell RNA-seq?

17 Two General Approaches
From Ziegenhain et al. 2017

18 Drop-Seq
Drop-seq overview: cells mix with reagents in a droplet; RNA attaches to a particle carrying a specific barcode, and so on. From Macosko et al. 2015

19 Variations
cDNA conversion rate: 2-25%
Droplet size
Reagent concentration
Cell count and dilution
PCR efficiency
UMIs control for over-amplification of one transcript

20 Sequencing Results
Paired-end sequencing ($$$): one read has the cell barcode, UMI and polyA
Compress all transcripts with the same barcode and same UMI into 1
From Macosko et al. 2015
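The collapsing step described on this slide can be sketched as follows. The sketch assumes reads have already been aligned and tagged as (cell barcode, UMI, gene) tuples, and it additionally keys on the gene, which the slide does not state; it performs no barcode or UMI error correction. The function name and sequences are hypothetical.

```python
from collections import defaultdict

def count_umis(reads):
    """
    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Returns {cell_barcode: {gene: number of distinct UMIs}}.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        # Reads sharing barcode, UMI and gene are PCR copies of one molecule.
        seen[(cell, gene)].add(umi)
    counts = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        counts[cell][gene] = len(umis)
    return dict(counts)

reads = [
    ("AAAC", "TTG", "Actb"),
    ("AAAC", "TTG", "Actb"),  # duplicate of the read above, collapsed
    ("AAAC", "CCA", "Actb"),
    ("GGGT", "TTG", "Actb"),  # same UMI but a different cell, counted separately
]
print(count_umis(reads))
```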

21 SMART-based vs Droplet-based
SMART-based: fresh cells; one cell at a time; small cell population; lower dropout; cell barcode; full length; transcripts / cell higher; per-cell transcription more accurate; $$$
Droplet-based: fresh cells; all droplets together; higher dropout; cell barcode; UMI for PCR bias correction; 3’ bias; transcripts / cell lower; per-cluster transcription more accurate; $$$

22 Potential Applications
Understand stem cell differentiation or state transition
Map heterogeneity in complex tissue types (tumor / brain / blood, etc.)
Identify new cell types with new functions
Stochastic and dynamic responses to perturbation
Break

23 Quality Control

24 Dropouts Kharchenko, et al, Nat Meth 2014; Zheng et al, Nat Comm 2017

25 From Kolodziejczyk et al. 2015
In each single cell, we observe variation in the gene counts. A good proportion of this variation doesn’t help us discover biology. UMIs are discussed further on another slide; transcription kinetics are largely unknown. Multiple methods have been proposed for cell cycle adjustment, but they have had limited success.

26 Visualizing scRNA-seq data
t-distributed stochastic neighbor embedding (tSNE)
New dimension reduction method
Preserves pair-wise distances, but focuses on points close by
Distances between far-away clusters don’t matter
Colors are manually labeled
Density should be labeled
Non-deterministic
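A small sketch of why tSNE focuses on nearby points: its input similarities are Gaussian-weighted, so far-away points receive almost no weight however far they are. Only this similarity step is shown, with a fixed bandwidth sigma for simplicity, whereas the full method tunes a bandwidth per point from a perplexity setting and then optimizes a low-dimensional layout. The function name and points are hypothetical.

```python
import math

def conditional_affinities(points, i, sigma=1.0):
    """p_{j|i}: Gaussian similarity of each point j to point i, normalized over j != i."""
    xi = points[i]
    weights = []
    for j, xj in enumerate(points):
        if j == i:
            weights.append(0.0)  # a point is not its own neighbor
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
        weights.append(math.exp(-d2 / (2 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]

# Two close neighbors of point 0, plus two points at very different far distances.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (50.0, 50.0)]
print(conditional_affinities(points, i=0))
```

Both distant points get a similarity that is effectively zero, so the difference between "far" and "very far" carries no information into the embedding.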

27

28

29 Reconstruction of Retinal Cell Types
PCA on ~14K high-quality cells from 44K sequenced cells
tSNE on 32 statistically significant PCs
Density-based clustering

30 Checking Batch Effect
Single cells from different days

31 Seurat is an R package designed for QC, analysis, and exploration of single cell RNA-seq data
Satija et al., Nat. Biotech. 2015

32 Summary
SMART-based vs Droplet-based single-cell sequencing
Barcode and UMI
Dropout modeling
tSNE for visualization

33 Acknowledgement
Wei Li, Michael Love, Alisha Holloway, Simon Andrews, Radhika Khetani, Chengzhong Zhang, Etai Jacob, Caleb Lareau, Luca Pinello, Assieh Saadatpour

