
1 A Correlated Random Effects Hurdle Model for Detecting Differentially Expressed Genes in Discrete Single Cell RNA Sequencing Data Michael Sekula Department of Bioinformatics and Biostatistics University of Louisville JSM: July 30th, 2018 Advisors: Dr. Susmita Datta and Dr. Jeremy Gaskins

2 Overview Motivation Proposed Method
Hurdle Model with Correlated Random Effects Detecting Differentially Expressed (DE) Genes Results Conclusions

3 Motivation

4 Differential Expression
Old question on new data Bulk RNA-sequencing (RNA-seq) Averaged gene expression levels Single cell RNA-sequencing (scRNA-seq) Expressions from individual cells Abundance of zeros Cell-to-cell variability

5 Proposed Method Hurdle model with correlated random effects

6 Hurdle Model
$Y_{gi}$ = expression count for gene $g$ in cell $i$
$Z_{gi}$ = indicator of expression for gene $g$ in cell $i$ ($Z_{gi} = 1$ when $Y_{gi} > 0$)
Logistic Regression: $\mathrm{logit}(\theta_{gi}) = \beta_{0g}^{L} + X_i \boldsymbol{\beta}_g^{L} + \omega_i \zeta_g^{L}$, where $\theta_{gi} = P(Z_{gi} = 1)$
Truncated Negative Binomial Regression: $\log(\mu_{gi}) = \beta_{0g}^{C} + X_i \boldsymbol{\beta}_g^{C} + \omega_i \zeta_g^{C}$
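A minimal sketch of the two-part likelihood on this slide, for one gene in one cell. Several things here are assumptions not stated in the talk: the treatment covariate is taken to be a single 0/1 indicator, the negative binomial is parameterized by its mean and a dispersion (size) parameter `phi`, and all function and argument names are hypothetical.

```python
import math

def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))

def nb_log_pmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and dispersion (size) phi."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def hurdle_log_lik(y, x, omega, logistic, count, phi):
    """Log-likelihood of one count y for one gene in one cell.

    logistic and count are (intercept, treatment coefficient, zeta) tuples
    for the two regressions; x is a 0/1 treatment indicator (assumed scalar);
    omega is the cell-level random effect shared by both parts.
    """
    b0_l, b1_l, zeta_l = logistic
    b0_c, b1_c, zeta_c = count
    # Logistic part: probability that the gene is expressed at all.
    theta = inv_logit(b0_l + x * b1_l + omega * zeta_l)
    if y == 0:
        return math.log(1.0 - theta)
    # Count part: negative binomial truncated at zero, so y >= 1.
    mu = math.exp(b0_c + x * b1_c + omega * zeta_c)
    log_p_zero = nb_log_pmf(0, mu, phi)
    log_truncated = nb_log_pmf(y, mu, phi) - math.log(1.0 - math.exp(log_p_zero))
    return math.log(theta) + log_truncated

# Illustrative parameter values only; they are not estimates from the talk.
logistic = (0.2, 0.8, 1.0)
count = (1.0, 0.5, 1.0)
total = sum(math.exp(hurdle_log_lik(y, 1, 0.1, logistic, count, 2.0))
            for y in range(2000))
```

Summing the probabilities over all counts returns a value very close to 1, which is a quick check that the zero part and the truncated count part fit together as a proper distribution.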

7 Correlated Random Effects

8 Correlated Random Effects
Create initial rough clustering of cells 𝐾 0 subpopulations in control group 𝐾 1 subpopulations in treatment group

9 Correlated Random Effects
πœ” 𝑖 = 𝛾 𝑑, π‘˜ 𝑑 (𝑖) + πœ” 𝑖 βˆ— 𝛾 𝑑,π‘˜ ~ 𝑖.𝑖.𝑑. π‘π‘œπ‘Ÿπ‘šπ‘Žπ‘™(0, 𝜎 𝑑 2 ) πœ” 𝑖 βˆ— ~ 𝑖.𝑖.𝑑. π‘π‘œπ‘Ÿπ‘šπ‘Žπ‘™(0, 𝜎 βˆ— 2 ) Correlation between cells within same subpopulation of treatment t 𝜌 𝑑 = 𝜎 𝑑 2 𝜎 𝑑 2 + 𝜎 βˆ— 2

10 Proposed Method Detecting differentially expressed (DE) genes

11 Parameter Estimation Bayesian approach
Markov Chain Monte Carlo (MCMC) sampling Natural choice SLOW!

12 Parameter Estimation Bayesian approach
Markov Chain Monte Carlo (MCMC) sampling Natural choice SLOW! Automatic differentiation variational inference* (ADVI) Variational Bayes method implemented in Stan Faster alternative Obtain samples from the approximate posterior distribution * Kucukelbir et al., 2015

13 DE Method
Evaluate posterior estimates of $\beta_{1g}^{L}$ and $\beta_{1g}^{C}$
Coefficients of treatment indicator (treatment vs. control)
Under frequentist null hypothesis: $\beta_{1g}^{L} = 0$ and $\beta_{1g}^{C} = 0$
$W_g = \boldsymbol{B}_g^{T} \boldsymbol{V}_g^{-1} \boldsymbol{B}_g \rightarrow \chi^2(2)$
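A sketch of how such a statistic could be computed for one gene from samples of the approximate posterior. It assumes, without confirmation from the slides, that the vector in the quadratic form is the mean of the sampled treatment coefficients and the matrix is their sample covariance; the function name and input layout are hypothetical.

```python
import math

def wald_statistic(draws):
    """Wald-type statistic and chi-square(2) p-value for one gene.

    draws: list of (logistic coefficient, count coefficient) posterior samples
    of the two treatment effects.
    """
    n = len(draws)
    m_l = sum(d[0] for d in draws) / n
    m_c = sum(d[1] for d in draws) / n
    # Entries of the 2x2 sample covariance matrix.
    v_ll = sum((d[0] - m_l) ** 2 for d in draws) / (n - 1)
    v_cc = sum((d[1] - m_c) ** 2 for d in draws) / (n - 1)
    v_lc = sum((d[0] - m_l) * (d[1] - m_c) for d in draws) / (n - 1)
    det = v_ll * v_cc - v_lc ** 2
    # Quadratic form B^T V^{-1} B, using the closed-form 2x2 inverse.
    w = (m_l ** 2 * v_cc - 2.0 * m_l * m_c * v_lc + m_c ** 2 * v_ll) / det
    # For 2 degrees of freedom the chi-square upper tail is exp(-w / 2).
    p_value = math.exp(-w / 2.0)
    return w, p_value

# Tiny made-up set of draws, only to show the call.
example_draws = [(1.0, 2.0), (3.0, 2.0), (1.0, 4.0), (3.0, 4.0)]
w, p_value = wald_statistic(example_draws)
```

Because the reference distribution has two degrees of freedom, the p-value has a closed form and no statistics library is needed. In practice the p-values would still need a multiple-testing adjustment across genes, which this sketch leaves out.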

14 Data Analysis Results

15 Hurdle model simulation
Method TPR FPR FDR AUC DE Genes
CRE, SC3 0.682 0.010 0.046 0.958 1548
CRE, SNN-Cliq 0.009 0.045 0.959 1547
CRE, TRUE 0.047 1551
IRE 0.686 0.011 0.050 1564
NRE 0.669 0.019 0.086 0.947 1591
MAST 0.594 0.007 0.038 0.948 1337
SCDE 0.126 0.094 0.200 0.646 974
DESeq2 0.402 0.058 0.321 0.777 1304
edgeR 0.396 0.073 0.377 0.764 1400
Splat* simulation
0.475 0.006 0.051 0.924 576 0.052 0.923 0.501 0.008 0.063 615 0.404 0.910 497 0.220 0.002 0.032 0.879 264 0.238 0.030 0.905 274 0.601 0.042 0.215 0.887 892 0.740 0.076 0.297 0.911 1219
* Zappia et al., Genome Biology, 2017
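For reference, the first three performance measures follow the standard definitions from a confusion table of true versus called DE genes. The sketch below uses hypothetical names and a made-up toy input; it is not the evaluation code behind the reported numbers.

```python
def de_metrics(truth, called):
    """TPR, FPR and FDR from parallel lists of booleans, one entry per gene.

    truth[i] is True if gene i is truly DE; called[i] is True if it was
    declared DE by the method.
    """
    tp = sum(1 for t, c in zip(truth, called) if t and c)
    fp = sum(1 for t, c in zip(truth, called) if not t and c)
    fn = sum(1 for t, c in zip(truth, called) if t and not c)
    tn = sum(1 for t, c in zip(truth, called) if not t and not c)
    return {
        "TPR": tp / (tp + fn) if tp + fn else 0.0,  # share of true DE genes found
        "FPR": fp / (fp + tn) if fp + tn else 0.0,  # share of non-DE genes called
        "FDR": fp / (fp + tp) if fp + tp else 0.0,  # share of calls that are wrong
        "DE Genes": tp + fp,                        # total number of calls
    }

# Toy input: 3 truly DE genes out of 8, with 3 genes called DE.
truth = [True, True, True, False, False, False, False, False]
called = [True, True, False, True, False, False, False, False]
metrics = de_metrics(truth, called)
```

AUC is omitted because it depends on the full ranking of genes rather than on a single cutoff.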

16 Islam et al., Genome Research, 2011

17 Top 500 DE genes from MEC dataset
[Figure: panels for CRE, MAST, SCDE, DESeq2, and edgeR. Key: FC(+) PZ(-); FC(+) PZ(+); FC(-) PZ(-); FC(-) PZ(-)]

18 Conclusions

19 Final Thoughts
Outperform current methods across most performance measures (simulation studies)
Identify high number of DE genes
More DE genes than other scRNA-seq methods
Maintain FDR

20 THANK YOU!

21 References
Islam, S., Kjällquist, U., Moliner, A., Zajac, P., Fan, J. B., Lönnerberg, P., & Linnarsson, S. (2011). Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Research, 21(7).
Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in Neural Information Processing Systems.
Zappia, L., Phipson, B., & Oshlack, A. (2017). Splatter: Simulation of single-cell RNA sequencing data. Genome Biology, 18(1), 174.

