Published by Mildred Green. Modified over 6 years ago.
1
A Correlated Random Effects Hurdle Model for Detecting Differentially Expressed Genes in Discrete Single Cell RNA Sequencing Data
Michael Sekula
Department of Bioinformatics and Biostatistics, University of Louisville
JSM: July 30th, 2018
Advisors: Dr. Susmita Datta and Dr. Jeremy Gaskins
2
Overview
Motivation
Proposed Method
  Hurdle Model with Correlated Random Effects
  Detecting Differentially Expressed (DE) Genes
Results
Conclusions
3
Motivation
4
Differential Expression
Old question on new data
Bulk RNA-sequencing (RNA-seq)
  Averaged gene expression levels
Single cell RNA-sequencing (scRNA-seq)
  Expressions from individual cells
  Abundance of zeros
  Cell-to-cell variability
5
Proposed Method: Hurdle model with correlated random effects
6
Hurdle Model
$Y_{gi}$ = expression count for gene $g$ in cell $i$
$Z_{gi}$ = indicator of expression for gene $g$ in cell $i$ ($Z_{gi} = 1$ when $Y_{gi} > 0$)
Logistic Regression
  $\operatorname{logit}(p_{gi}) = \beta_{0g}^{L} + \boldsymbol{x}_i \boldsymbol{\beta}_g^{L} + \sigma_g \omega_i^{L}$, with $p_{gi} = P(Z_{gi} = 1)$
Truncated Negative Binomial Regression
  $\log(\mu_{gi}) = \beta_{0g}^{C} + \boldsymbol{x}_i \boldsymbol{\beta}_g^{C} + \sigma_g \omega_i^{C}$
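As a rough illustration of how the two parts of a hurdle model combine, the sketch below evaluates the log-likelihood of a single count in plain Python: a logistic part for whether the gene is expressed, and a zero-truncated negative binomial part for the count given expression. The function names, the dispersion parameter `phi` (negative binomial with mean `mu` and variance `mu + mu**2 / phi`), and the argument layout are assumptions made for this example and are not taken from the slides.

```python
import math

def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def nb_log_pmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion phi
    (assumed parameterisation: variance = mu + mu**2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def hurdle_log_lik(y, eta_logit, eta_count, phi):
    """Log-likelihood of one count y under a hurdle model.

    eta_logit: linear predictor of the logistic part (logit scale)
    eta_count: linear predictor of the count part (log scale)
    """
    p = inv_logit(eta_logit)              # probability that the gene is expressed
    if y == 0:
        return math.log(1.0 - p)          # zeros come only from the hurdle part
    mu = math.exp(eta_count)
    log_p0 = nb_log_pmf(0, mu, phi)
    # Zero-truncated NB: renormalise the NB pmf by 1 - P(Y = 0).
    return (math.log(p) + nb_log_pmf(y, mu, phi)
            - math.log(1.0 - math.exp(log_p0)))
```

Because zeros are modelled only by the logistic part, the two sets of coefficients can be interpreted separately: one for the chance of expression, one for the expression level among expressing cells.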
7
Correlated Random Effects
8
Correlated Random Effects
Create initial rough clustering of cells
  $K_0$ subpopulations in control group
  $K_1$ subpopulations in treatment group
9
Correlated Random Effects
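$\omega_i = \gamma_{t, k_t(i)} + \omega_i^{*}$
$\gamma_{t,k} \sim \text{i.i.d. } \mathrm{Normal}(0, \sigma_t^2)$
$\omega_i^{*} \sim \text{i.i.d. } \mathrm{Normal}(0, \sigma_*^2)$
Correlation between cells within same subpopulation of treatment $t$:
  $\rho_t = \dfrac{\sigma_t^2}{\sigma_t^2 + \sigma_*^2}$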
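The decomposition above, a shared subpopulation effect plus an independent cell-level effect, can be sketched in a few lines of Python. This is a minimal illustration for one treatment group only; the function names, the `cluster_of_cell` list of cluster labels, and the variance arguments are hypothetical names chosen for the example.

```python
import math
import random

def within_cluster_correlation(sigma_t2, sigma_star2):
    """Correlation between two cells of the same subpopulation:
    shared variance divided by total variance."""
    return sigma_t2 / (sigma_t2 + sigma_star2)

def simulate_random_effects(cluster_of_cell, sigma_t2, sigma_star2, rng):
    """Draw one random effect per cell as (cluster effect) + (cell effect).

    cluster_of_cell: list giving the cluster label of each cell
    rng: a random.Random instance, passed in for reproducibility
    """
    clusters = sorted(set(cluster_of_cell))
    # One shared draw per subpopulation.
    gamma = {k: rng.gauss(0.0, math.sqrt(sigma_t2)) for k in clusters}
    # Each cell adds its own independent draw to the shared one.
    return [gamma[k] + rng.gauss(0.0, math.sqrt(sigma_star2))
            for k in cluster_of_cell]
```

Setting the cell-level variance to zero makes all cells in a subpopulation share the same effect (correlation 1), while setting the subpopulation variance to zero makes the effects independent (correlation 0).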
10
Proposed Method: Detecting differentially expressed (DE) genes
12
Parameter Estimation
Bayesian approach
Markov Chain Monte Carlo (MCMC) sampling
  Natural choice
  SLOW!
Automatic differentiation variational inference* (ADVI)
  Variational Bayes method implemented in Stan
  Faster alternative
  Obtain samples from the approximate posterior distribution
* Kucukelbir et al., 2015
13
DE Method
Evaluate posterior estimates of $\beta_{1g}^{L}$ and $\beta_{1g}^{C}$
  Coefficients of treatment indicator (treatment vs. control)
Under frequentist null hypothesis: $\beta_{1g}^{L} = 0$ and $\beta_{1g}^{C} = 0$
  $D_g = \boldsymbol{B}_g^{T} \boldsymbol{V}_g^{-1} \boldsymbol{B}_g \sim \chi^2(2)$
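A quadratic-form statistic of this kind can be sketched as follows. The example assumes, purely for illustration, that the vector holds the posterior means of the two treatment coefficients and that the matrix is their 2x2 posterior covariance estimated from the approximate posterior draws; the slides do not spell out these definitions, and the function names are hypothetical. The p-value uses the fact that the chi-square survival function with 2 degrees of freedom is exp(-d/2).

```python
import math

def de_statistic(b, v):
    """Quadratic form b^T V^{-1} b for a length-2 vector b and a 2x2 matrix v."""
    b1, b2 = b
    (v11, v12), (v21, v22) = v
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # V^{-1} b via the closed-form inverse of a 2x2 matrix.
    w1 = (v22 * b1 - v12 * b2) / det
    w2 = (-v21 * b1 + v11 * b2) / det
    return b1 * w1 + b2 * w2

def chi2_2df_pvalue(d):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-d / 2.0)
```

For example, `chi2_2df_pvalue(de_statistic((1.0, 1.0), ((1.0, 0.0), (0.0, 1.0))))` evaluates the statistic for unit estimates with identity covariance; small p-values would flag a gene as a DE candidate.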
14
Data Analysis Results
15
Hurdle model simulation
Method          TPR     FPR     FDR     AUC     DE Genes
CRE, SC3        0.682   0.010   0.046   0.958   1548
CRE, SNN-Cliq   --      0.009   0.045   0.959   1547
CRE, TRUE       --      --      0.047   --      1551
IRE             0.686   0.011   0.050   --      1564
NRE             0.669   0.019   0.086   0.947   1591
MAST            0.594   0.007   0.038   0.948   1337
SCDE            0.126   0.094   0.200   0.646   974
DESeq2          0.402   0.058   0.321   0.777   1304
edgeR           0.396   0.073   0.377   0.764   1400
(-- : value not preserved in this transcript)
Splat* simulation (method labels and some cells not preserved; remaining values in slide order)
0.475 0.006 0.051 0.924 576 0.052 0.923 0.501 0.008 0.063 615 0.404 0.910 497 0.220 0.002 0.032 0.879 264 0.238 0.030 0.905 274 0.601 0.042 0.215 0.887 892 0.740 0.076 0.297 0.911 1219
* Zappia et al., Genome Biology, 2017
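For reference, the rate columns in a comparison like this follow the standard definitions, which can be computed from the set of genes a method calls DE and the set of genes simulated as truly DE. The sketch below is a generic illustration; the function name and the three input collections are hypothetical, and it does not reproduce the numbers in the table.

```python
def classification_rates(called, truly_de, all_genes):
    """Return TPR, FPR and FDR for one method on one simulated data set.

    called:    genes the method declares DE
    truly_de:  genes simulated as DE
    all_genes: every gene that was tested
    """
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    tp = len(called & truly_de)               # true positives
    fp = len(called - truly_de)               # false positives
    fn = len(truly_de - called)               # false negatives
    tn = len(all_genes - called - truly_de)   # true negatives
    nan = float("nan")
    return {
        "TPR": tp / (tp + fn) if tp + fn else nan,  # share of true DE genes found
        "FPR": fp / (fp + tn) if fp + tn else nan,  # share of non-DE genes called
        "FDR": fp / (tp + fp) if tp + fp else nan,  # share of calls that are wrong
    }
```

AUC is omitted here because it needs a ranking score per gene rather than a single set of calls.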
16
Islam et al., Genome Research, 2011
17
Top 500 DE genes from MEC dataset
[Figure: panels for CRE, MAST, SCDE, DESeq2, and edgeR. Key: FC(+) PZ(-); FC(+) PZ(+); FC(-) PZ(-); FC(-) PZ(-)]
18
Conclusions
19
Final Thoughts
Outperform current methods across most performance measures (simulation studies)
Identify high number of DE genes
  More DE genes than other scRNA-seq methods
  Maintain FDR
20
THANK YOU!
21
References
Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J. B., Lönnerberg, P., & Linnarsson, S. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, 21(7).
Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in Neural Information Processing Systems.
Zappia, L., Phipson, B., & Oshlack, A. (2017). Splatter: Simulation of single-cell RNA sequencing data. Genome Biology, 18(1), 174.