A Correlated Random Effects Hurdle Model for Detecting Differentially Expressed Genes in Discrete Single Cell RNA Sequencing Data
Michael Sekula
Department of Bioinformatics and Biostatistics, University of Louisville
JSM: July 30th, 2018
Advisors: Dr. Susmita Datta and Dr. Jeremy Gaskins

Overview
- Motivation
- Proposed Method
  - Hurdle Model with Correlated Random Effects
  - Detecting Differentially Expressed (DE) Genes
- Results
- Conclusions

Motivation

Differential Expression
Old question on new data
- Bulk RNA-sequencing (RNA-seq): averaged gene expression levels
- Single-cell RNA-sequencing (scRNA-seq): expressions from individual cells
  - Abundance of zeros
  - Cell-to-cell variability

Proposed Method: Hurdle model with correlated random effects

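Hurdle Model
$Y_{gi}$ = expression count for gene $g$ in cell $i$
$Z_{gi}$ = indicator of expression for gene $g$ in cell $i$ ($Z_{gi} = 1$ when $Y_{gi} > 0$)

Logistic regression, for $Z_{gi}$:
$$\operatorname{logit}(\theta_{gi}) = \beta_{0g}^{L} + X_i\boldsymbol{\beta}_g^{L} + \omega_i\,\zeta_g^{L}, \qquad \theta_{gi} = P(Z_{gi} = 1)$$

Truncated negative binomial regression, for $Y_{gi}$ given $Y_{gi} > 0$, where $\mu_{gi}$ is the mean parameter of the negative binomial:
$$\log(\mu_{gi}) = \beta_{0g}^{C} + X_i\boldsymbol{\beta}_g^{C} + \omega_i\,\zeta_g^{C}$$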
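To make the two-part structure concrete, here is a minimal Python sketch of the hurdle probability mass function for one gene in one cell. It assumes a single covariate (a 0/1 treatment indicator standing in for $X_i$) and a negative binomial parameterised by its mean $\mu_{gi}$ plus a dispersion ("size") parameter; the dispersion parameter, the coefficient values, and all function names are illustrative assumptions, not taken from the slides. A zero arises only from the logistic part, with probability $1 - \theta_{gi}$; a positive count $y$ has probability $\theta_{gi}$ times the negative binomial mass at $y$, renormalised to exclude zero.

```python
import math


def expit(x: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def nb_logpmf(y: int, mu: float, size: float) -> float:
    """Log-pmf of a negative binomial with mean `mu` and dispersion `size` (assumed parameterisation)."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def linear_predictors(x, omega, coef_L, coef_C):
    """Return (theta, mu) for one gene in one cell.

    x       -- treatment indicator (0 = control, 1 = treatment); a stand-in for X_i
    omega   -- cell-level random effect omega_i
    coef_L  -- (beta_0g^L, beta_1g^L, zeta_g^L) for the logistic part
    coef_C  -- (beta_0g^C, beta_1g^C, zeta_g^C) for the count part
    """
    b0L, b1L, zL = coef_L
    b0C, b1C, zC = coef_C
    theta = expit(b0L + x * b1L + omega * zL)  # P(Z_gi = 1)
    mu = math.exp(b0C + x * b1C + omega * zC)  # NB mean of the count part
    return theta, mu


def hurdle_pmf(y: int, theta: float, mu: float, size: float) -> float:
    """P(Y_gi = y): zeros from the logistic part, positives from a zero-truncated NB."""
    if y == 0:
        return 1.0 - theta
    p0 = math.exp(nb_logpmf(0, mu, size))  # NB mass at zero, removed by truncation
    return theta * math.exp(nb_logpmf(y, mu, size)) / (1.0 - p0)


# Hypothetical coefficients for one treated cell with omega_i = 0.3
theta, mu = linear_predictors(1, 0.3, (-0.5, 0.8, 1.0), (1.2, 0.4, 0.5))
print(round(theta, 4), round(mu, 4), round(hurdle_pmf(0, theta, mu, 2.0), 4))
```

Because the random effect $\omega_i$ enters both linear predictors (scaled by $\zeta_g^{L}$ and $\zeta_g^{C}$), a single cell-level draw shifts both the probability of expression and the expected count.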

Correlated Random Effects

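Create an initial rough clustering of cells:
- $K_0$ subpopulations in the control group
- $K_1$ subpopulations in the treatment group

$$\omega_i = \gamma_{t,k_t(i)} + \omega_i^{*}, \qquad \gamma_{t,k} \overset{\text{i.i.d.}}{\sim} \text{Normal}(0, \sigma_t^2), \qquad \omega_i^{*} \overset{\text{i.i.d.}}{\sim} \text{Normal}(0, \sigma_*^2)$$
where $k_t(i)$ is the subpopulation of cell $i$ within treatment group $t$.

Correlation between cells within the same subpopulation of treatment $t$:
$$\rho_t = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_*^2}$$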
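The correlation formula can be checked numerically. The sketch below (Python; the helper names and variance values are hypothetical) draws $\omega_i = \gamma_{t,k_t(i)} + \omega_i^{*}$ for a set of cells from hand-supplied cluster labels, which stand in for the rough initial clustering, and then estimates the correlation between two cells over many independent draws. For two cells in the same subpopulation of group $t$ the estimate should approach $\rho_t$; for cells in different subpopulations it should be near zero.

```python
import math
import random


def rho(sigma_t2: float, sigma_star2: float) -> float:
    """Within-subpopulation correlation rho_t = sigma_t^2 / (sigma_t^2 + sigma_*^2)."""
    return sigma_t2 / (sigma_t2 + sigma_star2)


def draw_omega(groups, clusters, sigma_t, sigma_star, rng):
    """Draw omega_i = gamma_{t,k_t(i)} + omega_i^* for every cell.

    groups[i]   -- treatment group t of cell i (0 = control, 1 = treatment)
    clusters[i] -- rough subpopulation label k_t(i) of cell i within its group
    sigma_t     -- dict mapping t to the standard deviation of gamma_{t,k}
    sigma_star  -- standard deviation of the cell-specific term omega_i^*
    """
    gamma = {}
    omega = []
    for t, k in zip(groups, clusters):
        if (t, k) not in gamma:
            gamma[(t, k)] = rng.gauss(0.0, sigma_t[t])  # shared by the subpopulation
        omega.append(gamma[(t, k)] + rng.gauss(0.0, sigma_star))
    return omega


def empirical_corr(groups, clusters, sigma_t, sigma_star, n_rep=20000, seed=1):
    """Monte Carlo correlation between omega of the first two cells across n_rep draws."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_rep):
        w = draw_omega(groups, clusters, sigma_t, sigma_star, rng)
        xs.append(w[0])
        ys.append(w[1])
    mx, my = sum(xs) / n_rep, sum(ys) / n_rep
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


sigma_t = {0: 0.5, 1: 1.0}  # hypothetical sigma_0 and sigma_1
same = empirical_corr([1, 1], [0, 0], sigma_t, 1.0)  # same subpopulation
diff = empirical_corr([1, 1], [0, 1], sigma_t, 1.0)  # different subpopulations
print(rho(1.0, 1.0), round(same, 3), round(diff, 3))
```

Because $\sigma_t$ is indexed by treatment group, the control and treatment groups can have different within-subpopulation correlations $\rho_0$ and $\rho_1$.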

Proposed Method: Detecting differentially expressed (DE) genes

Parameter Estimation: Bayesian approach
- Markov chain Monte Carlo (MCMC) sampling: the natural choice, but SLOW!
- Automatic differentiation variational inference* (ADVI)
  - Variational Bayes method implemented in Stan
  - Faster alternative
  - Obtains samples from the approximate posterior distribution
* Kucukelbir et al., 2015
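ADVI fits a Gaussian approximation $q$ to the posterior by stochastic gradient ascent on the evidence lower bound (ELBO), using the reparameterisation $\mu = m + e^{w}\varepsilon$ with $\varepsilon \sim N(0, 1)$. As a toy illustration only, the sketch below applies that idea to a one-parameter conjugate normal model, where the exact posterior is known and the fit can be checked, and then draws samples from the approximate posterior. It is not Stan's implementation, which also transforms constrained parameters to an unconstrained space and adapts its step size. The model, prior, step size, and iteration counts are assumptions chosen for the example.

```python
import math
import random


def advi_normal_mean(y, sigma=1.0, tau=10.0, lr=0.005, n_iter=4000, n_mc=20, seed=0):
    """Toy ADVI for mu in the model y_j ~ N(mu, sigma^2), mu ~ N(0, tau^2).

    The approximation is q(mu) = N(m, exp(w)^2). ELBO gradients are estimated by
    Monte Carlo with the reparameterisation mu = m + exp(w) * eps, eps ~ N(0, 1).
    Returns (m, s): iterates averaged over the second half of the run.
    """
    rng = random.Random(seed)
    total = sum(y)
    prec = len(y) / sigma ** 2 + 1 / tau ** 2  # curvature of the log joint

    def grad_log_joint(mu):
        # d/dmu [log p(y | mu) + log p(mu)] for the normal-normal model
        return total / sigma ** 2 - prec * mu

    m, w = 0.0, 0.0
    m_sum = w_sum = 0.0
    burn_in = n_iter // 2
    for it in range(n_iter):
        s = math.exp(w)
        g_m = g_w = 0.0
        for _ in range(n_mc):
            eps = rng.gauss(0.0, 1.0)
            g = grad_log_joint(m + s * eps)
            g_m += g
            g_w += g * eps * s  # chain rule through mu = m + exp(w) * eps
        g_m /= n_mc
        g_w = g_w / n_mc + 1.0  # +1: gradient of the Gaussian entropy w.r.t. w
        m += lr * g_m
        w += lr * g_w
        if it >= burn_in:
            m_sum += m
            w_sum += w
    n_avg = n_iter - burn_in
    return m_sum / n_avg, math.exp(w_sum / n_avg)


def draw_approx_posterior(m, s, n_draws, seed=1):
    """Sample from the fitted Gaussian approximation q(mu)."""
    rng = random.Random(seed)
    return [rng.gauss(m, s) for _ in range(n_draws)]


data_rng = random.Random(42)
y = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
m, s = advi_normal_mean(y)
exact_mean = sum(y) / (len(y) + 1 / 10.0 ** 2)  # conjugate posterior, sigma = 1, tau = 10
exact_sd = 1 / math.sqrt(len(y) + 1 / 10.0 ** 2)
draws = draw_approx_posterior(m, s, 1000)
print(round(m, 3), round(exact_mean, 3), round(s, 4), round(exact_sd, 4))
```

The speed advantage over MCMC comes from replacing sampling from the posterior with optimisation of a few variational parameters; the price is that $q$ is only an approximation whenever the true posterior is not Gaussian.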

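DE Method
Evaluate posterior estimates of $\beta_{1g}^{L}$ and $\beta_{1g}^{C}$, the coefficients of the treatment indicator (treatment vs. control).
Under the frequentist null hypothesis $\beta_{1g}^{L} = 0$ and $\beta_{1g}^{C} = 0$,
$$W_g = \mathbf{B}_g^{T}\,\mathbf{V}_g^{-1}\,\mathbf{B}_g \xrightarrow{d} \chi^2(2),$$
where $\mathbf{B}_g$ collects the posterior estimates of $\beta_{1g}^{L}$ and $\beta_{1g}^{C}$ and $\mathbf{V}_g$ is their estimated covariance matrix.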
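A minimal sketch of the test statistic computed from posterior draws, assuming $\mathbf{B}_g$ is the posterior mean of $(\beta_{1g}^{L}, \beta_{1g}^{C})$ and $\mathbf{V}_g$ is the sample covariance of those draws; the slides do not spell out which posterior summaries are used, and the function name is hypothetical. Because the $\chi^2(2)$ survival function is $e^{-x/2}$, the p-value needs no special functions.

```python
import math


def wald_de_test(samples):
    """W_g = B^T V^{-1} B from posterior draws of (beta_1g^L, beta_1g^C).

    samples -- list of (bL, bC) pairs from the (approximate) posterior
    Returns (W_g, p_value) using the chi-square(2) reference distribution.
    """
    n = len(samples)
    mL = sum(bL for bL, _ in samples) / n  # B_g, logistic component
    mC = sum(bC for _, bC in samples) / n  # B_g, count component
    vLL = sum((bL - mL) ** 2 for bL, _ in samples) / (n - 1)
    vCC = sum((bC - mC) ** 2 for _, bC in samples) / (n - 1)
    vLC = sum((bL - mL) * (bC - mC) for bL, bC in samples) / (n - 1)
    det = vLL * vCC - vLC ** 2
    if det <= 0:
        raise ValueError("posterior covariance V_g is singular")
    # Closed-form 2x2 inverse: V^{-1} = [[vCC, -vLC], [-vLC, vLL]] / det
    w = (mL * mL * vCC - 2 * mL * mC * vLC + mC * mC * vLL) / det
    p_value = math.exp(-w / 2)  # P(chi2_2 > w) = exp(-w / 2)
    return w, p_value


posterior_draws = [(1.0, 0.0), (3.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
print(wald_de_test(posterior_draws))
```

Testing both coefficients jointly lets a gene be flagged when treatment changes the probability of expression, the level of expression, or both.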

Data Analysis Results

Hurdle model simulation

| Method | TPR | FPR | FDR | AUC | DE Genes |
|---|---|---|---|---|---|
| CRE, SC3 | 0.682 | 0.010 | 0.046 | 0.958 | 1548 |
| CRE, SNN-Cliq | | 0.009 | 0.045 | 0.959 | 1547 |
| CRE, TRUE | | | 0.047 | | 1551 |
| IRE | 0.686 | 0.011 | 0.050 | | 1564 |
| NRE | 0.669 | 0.019 | 0.086 | 0.947 | 1591 |
| MAST | 0.594 | 0.007 | 0.038 | 0.948 | 1337 |
| SCDE | 0.126 | 0.094 | 0.200 | 0.646 | 974 |
| DESeq2 | 0.402 | 0.058 | 0.321 | 0.777 | 1304 |
| edgeR | 0.396 | 0.073 | 0.377 | 0.764 | 1400 |

Splat* simulation: 0.475 0.006 0.051 0.924 576 0.052 0.923 0.501 0.008 0.063 615 0.404 0.910 497 0.220 0.002 0.032 0.879 264 0.238 0.030 0.905 274 0.601 0.042 0.215 0.887 892 0.740 0.076 0.297 0.911 1219

* Zappia et al., Genome Biology, 2017
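For reference, the metrics in the table can be computed from a gene-level truth vector and a set of DE calls as below. This sketch assumes "DE Genes" counts the genes declared DE, and that AUC is the area under the ROC curve for a per-gene score (for example $W_g$), computed here through its Mann-Whitney form; the function names and the toy inputs are hypothetical.

```python
def de_metrics(truth, called):
    """TPR, FPR, FDR and number of DE calls from per-gene booleans."""
    tp = sum(t and c for t, c in zip(truth, called))
    fp = sum((not t) and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    tn = sum((not t) and (not c) for t, c in zip(truth, called))
    return {
        "TPR": tp / (tp + fn),                      # share of true DE genes detected
        "FPR": fp / (fp + tn),                      # share of non-DE genes flagged
        "FDR": fp / (tp + fp) if tp + fp else 0.0,  # share of calls that are false
        "DE Genes": tp + fp,                        # number of genes called DE
    }


def auc(scores, truth):
    """ROC AUC as P(score of a DE gene > score of a non-DE gene), ties counted as 1/2."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


truth = [True, True, False, False, False]
called = [True, False, True, False, False]
scores = [0.9, 0.4, 0.6, 0.1, 0.2]
print(de_metrics(truth, called), auc(scores, truth))
```

TPR, FPR, and FDR depend on the chosen significance cutoff, whereas AUC summarises the ranking of genes over all cutoffs, which is why a method can have a modest TPR yet a high AUC.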

Islam et al., Genome Research, 2011

Top 500 DE genes from MEC dataset
[Figure: top 500 DE genes from the MEC dataset for CRE, MAST, SCDE, DESeq2, and edgeR; key: combinations of FC(+)/FC(-) with PZ(+)/PZ(-)]

Conclusions

Final Thoughts
- Outperform current methods across most performance measures (simulation studies)
- Identify a high number of DE genes
  - More DE genes than other scRNA-seq methods
  - Maintain FDR

THANK YOU!

References
Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J. B., Lönnerberg, P., & Linnarsson, S. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, 21(7).
Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in Neural Information Processing Systems.
Zappia, L., Phipson, B., & Oshlack, A. (2017). Splatter: Simulation of single-cell RNA sequencing data. Genome Biology, 18(1), 174.
