Tianjian Zhou U of Chicago/UT Austin


1 Bayesian Nonparametric Models for Biomedical Data Analysis: Inference for Tumor Heterogeneity and Missing Data
Tianjian Zhou, U of Chicago/UT Austin

2-11 Tumor Heterogeneity
[Figure, built up over slides 2 to 11: an increasing number of short DNA sequences, written as strings of the bases A, C, and G.]

12 Tumor Heterogeneity
[Figure: the DNA sequences grouped into Subclone 1 (Normal), Subclone 2, and Subclone 3.]

13 Tumor Heterogeneity
Subclone 1, Subclone 2, Subclone 3
Population frequencies: 2/10, 3/10, 5/10
[Figure: the DNA sequence of each subclone.]

14-15 Why Study Tumor Heterogeneity
Mutations are bad; we want to eliminate mutated cells.
Therapy 1: targets the mutation at the 1st locus
Therapy 2: targets the mutation at the 2nd locus
[Figure, slides 14 and 15: subclone DNA sequences shown alongside the two therapies.]

16 Inference for Tumor Heterogeneity
Subclones/DNA sequences are unobserved/latent.
[Figure: Subclone 1, Subclone 2, ..., Subclone ?, with every sequence shown as question marks.]

17 Inference for Tumor Heterogeneity
Data: short DNA reads (mixture of signals from many cells)
Quantities of interest: # of subclones, phylogenetic relationship, genotypes, population frequencies
[Figure: short reads from the two DNA strands, next to subclones with unknown sequences.]

18 Inference for Tumor Heterogeneity
Data: short DNA reads (mixture of signals from many cells)
Quantities of interest: # of subclones, phylogenetic relationship, genotypes, population frequencies
[Figure: the same reads, with proximal mutations marked on the two DNA strands.]

19 Inference for Tumor Heterogeneity
Data: short DNA reads (mixture of signals from many cells)
Quantities of interest: # of subclones, phylogenetic relationship, genotypes, population frequencies
[Figure: the same reads, with the proximal mutations labeled Mutation Pair 1 and Mutation Pair 2.]

20 Representation of Subclones
Subclone 1, Subclone 2, Subclone 3
Population frequencies: 2/10, 3/10, 5/10
[Figure: the DNA sequence of each subclone.]

21 Representation of Subclones
1: mutation; 0: no mutation (reference)
Subclone 1, Subclone 2, Subclone 3
Population frequencies: 2/10, 3/10, 5/10
[Figure: the subclone sequences recoded as 0/1.]

22 Representation of Subclones
Mutation pairs (MP)
1: mutation; 0: no mutation (reference)
Subclone 1, Subclone 2, Subclone 3
Population frequencies: 2/10, 3/10, 5/10
[Figure: the 0/1 coding with MP1 highlighted.]

23 Representation of Subclones
Mutation pairs (MP)
1: mutation; 0: no mutation (reference)
Subclone 1, Subclone 2, Subclone 3
Population frequencies: 2/10, 3/10, 5/10
[Figure: the 0/1 coding with MP2 highlighted.]

24 Representation of Subclones
Latent factor matrix $Z$ (rows MP1, MP2; columns S1, S2, S3)
Factor loadings $w$ = (2/10, 3/10, 5/10)
# of subclones $C = 3$
Phylogenetic tree $T$: 1 → 2 → 3
[Figure: the entries of $Z$ for MP1 and MP2.]

25 Prior Model
Latent factor matrix $Z \mid T, C$, with $T$: 1 → 2 → 3
[Figure, built up over slides 25 to 33: the matrix $Z$ with rows MP1 to MP4 and columns S1, S2, S3, filled in one column at a time.]

26-28 Prior Model
# of new mutations: $A_2 \sim \text{Trunc-Poi}(\lambda)$; in the example, $A_2 = 3$

29 Prior Model
[Figure: $Z$ with the S2 column filled in.]

30-32 Prior Model
# of new mutations: $A_3 \sim \text{Trunc-Poi}(\lambda)$; in the example, $A_3 = 4$

33 Prior Model
[Figure: $Z$ with the S3 column filled in.]
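
The prior on the number of new mutations is a truncated Poisson. The slides do not state the truncation bounds, so the sketch below assumes a Poisson restricted to the range 1 to `upper`, where `upper` is a hypothetical cap (here the number of mutation pairs). The function names and the value of `lam` are illustrative and are not taken from the talk.

```python
import math
import random


def trunc_poisson_pmf(lam, lower, upper):
    """Probabilities of a Poisson(lam) restricted to lower..upper (inclusive)."""
    # Unnormalised log weights k*log(lam) - lam - log(k!), then normalise.
    logw = [k * math.log(lam) - lam - math.lgamma(k + 1)
            for k in range(lower, upper + 1)]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    total = sum(w)
    return [x / total for x in w]


def sample_trunc_poisson(lam, lower, upper, rng):
    """Draw one value from the truncated Poisson by sampling its finite support."""
    support = list(range(lower, upper + 1))
    probs = trunc_poisson_pmf(lam, lower, upper)
    return rng.choices(support, weights=probs, k=1)[0]


rng = random.Random(0)
K = 4        # number of mutation pairs (MP1..MP4 on the slides)
lam = 2.0    # hypothetical rate; the slides give no value
a2 = sample_trunc_poisson(lam, 1, K, rng)   # assumed bounds 1..K
print("A_2 draw:", a2)
```

Because the support is finite, normalising the Poisson weights directly is simpler than rejection sampling and cannot loop indefinitely.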

34 Inference for Tumor Heterogeneity
Data: short DNA reads (mixture of signals from many cells)
Quantities of interest: # of subclones, phylogenetic relationship, genotypes, population frequencies
[Figure: short reads from the two DNA strands, with Mutation Pair 1 and Mutation Pair 2 marked.]

35-57 Sampling Model
[Figure, built up over slides 35 to 57: the three subclone sequences with population frequencies 2/10, 3/10, and 5/10. The animation adds short reads one at a time; each step highlights one subclone and its frequency (5/10, 2/10, 5/10, 3/10, 3/10 in turn) and then shows a short fragment of that subclone's sequence. The final slide shows the reads on the two DNA strands together with the 0/1 coding of the subclones and the frequencies 2/10, 3/10, 5/10.]
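
The animation above suggests a generative reading of the sampling model: each short read comes from one cell, and that cell belongs to a given subclone with probability equal to the subclone's population frequency. The sketch below simulates only that mixture step. It assumes a hypothetical 0/1 genotype matrix `Z` (the actual entries are not recoverable from the transcript) and ignores sequencing error, the pairing of mutations on a read, and the two DNA strands.

```python
import random

# Hypothetical genotypes: Z[k][c] = 1 if subclone c carries the mutation at locus k.
# Column 0 plays the role of the normal subclone.
Z = [
    [0, 1, 1],
    [0, 0, 1],
]
w = [0.2, 0.3, 0.5]  # population frequencies 2/10, 3/10, 5/10 from the slides


def expected_mutant_fraction(Z, w):
    """Probability that a read at locus k shows the mutation: sum_c w[c] * Z[k][c]."""
    return [sum(wc * zkc for wc, zkc in zip(w, row)) for row in Z]


def simulate_reads(Z, w, locus, n_reads, rng):
    """Each read copies the allele of a subclone chosen with probabilities w."""
    subclones = rng.choices(range(len(w)), weights=w, k=n_reads)
    return [Z[locus][c] for c in subclones]


rng = random.Random(1)
expected = expected_mutant_fraction(Z, w)
reads = simulate_reads(Z, w, locus=0, n_reads=10000, rng=rng)
observed = sum(reads) / len(reads)
print("expected:", expected[0], "simulated:", observed)
```

The simulated mutant fraction at a locus fluctuates around the corresponding entry of `expected`, which is how the read counts carry information about both `Z` and `w`.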

58 Inference for Tumor Heterogeneity
Data: short DNA reads (mixture of signals from many cells)
Quantities of interest: # of subclones, phylogenetic relationship, genotypes, population frequencies
Sampling model & Prior model → Posterior inference
[Figure: short reads from the two DNA strands, with Mutation Pair 1 and Mutation Pair 2 marked.]

59 TCGA Lung Cancer Data
Malignant hyper-mutated subclone
Small population frequency

60 Concluding Remark The use of mutation pairs strengthens inference for tumor heterogeneity

61 Inference for Missing Data
$\mathbf{y}$: longitudinal outcomes after treatment with a test drug
$s$: dropout time
Treatment effect: $E[Y_6 - Y_1]$
[Figure: outcome versus time (1 to 6), with example trajectories labeled $s = 4$ and $s = 5$.]

62 Inference for Missing Data
Biased if not MCAR
Inefficient
Can't do sensitivity analysis
Treatment effect: $E[Y_6 - Y_1]$
[Figure: outcome versus time (1 to 6).]

63 Inference for Missing Data
Treatment effect: $E[Y_6 - Y_1]$
[Figure: outcome versus time (1 to 6).]

64 Inference for Missing Data
Covariates $\mathbf{v}$ of the example subjects:
$\mathbf{v}$ = (Age: 66, Height: 185, Weight: 78, Gender: M)
$\mathbf{v}$ = (Age: 41, Height: 170, Weight: 62, Gender: F)
$\mathbf{v}$ = (Age: 29, Height: 166, Weight: 54, Gender: F)
$\mathbf{v}$ = (Age: 33, Height: 159, Weight: 49, Gender: F)
Dropout reasons shown: lack of efficacy; pregnancy
[Figure: outcome versus time (1 to 6), one trajectory per subject.]

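65 Extrapolation Factorization
Joint model for $\mathbf{y}$, $s$ and $\mathbf{v}$:
$p(\mathbf{y}, s, \mathbf{v}) = p(\mathbf{y}_{\text{mis}} \mid \mathbf{y}_{\text{obs}}, s, \mathbf{v})\, p(\mathbf{y}_{\text{obs}}, s, \mathbf{v})$
Observed data distribution $p(\mathbf{y}_{\text{obs}}, s, \mathbf{v})$: identified and can be estimated semi/non-parametrically
Extrapolation distribution $p(\mathbf{y}_{\text{mis}} \mid \mathbf{y}_{\text{obs}}, s, \mathbf{v})$: not identified without uncheckable assumptions (e.g. missing at random, MAR; non-future dependent missingness, NFD)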

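66 Observed Data Distribution: Pattern Mixture Modeling
$p(\mathbf{y}_{\text{obs}}, s, \mathbf{v}) = p(\mathbf{y}_{\text{obs}} \mid s, \mathbf{v})\, p(s \mid \mathbf{v})\, p(\mathbf{v})$
$p(\mathbf{y}_{\text{obs}} \mid s, \mathbf{v})$: Gaussian process (GP), autoregressive (AR), and conditional autoregressive (CAR) priors
$p(s \mid \mathbf{v})$: Bayesian additive regression trees (BART)
$p(\mathbf{v})$: Bayesian bootstrap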
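
The Bayesian bootstrap named here for $p(\mathbf{v})$ avoids a parametric model for the covariates: it keeps the observed covariate vectors as support points and places a Dirichlet(1, ..., 1) distribution on their weights. A minimal sketch, using the four covariate vectors from the earlier missing-data slide purely as illustrative data (function names are hypothetical):

```python
import random


def bayesian_bootstrap_weights(n, rng):
    """One Dirichlet(1, ..., 1) draw, obtained by normalising Exponential(1) variables."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def draw_covariates(v_obs, weights, rng):
    """Draw one covariate vector from the weighted empirical distribution."""
    return rng.choices(v_obs, weights=weights, k=1)[0]


v_obs = [
    {"age": 66, "height": 185, "weight": 78, "gender": "M"},
    {"age": 41, "height": 170, "weight": 62, "gender": "F"},
    {"age": 29, "height": 166, "weight": 54, "gender": "F"},
    {"age": 33, "height": 159, "weight": 49, "gender": "F"},
]

rng = random.Random(2)
weights = bayesian_bootstrap_weights(len(v_obs), rng)  # one posterior draw of p(v)
v_new = draw_covariates(v_obs, weights, rng)
print(weights, v_new)
```

Each call to `bayesian_bootstrap_weights` gives a different random distribution over the same observed points, so repeating it propagates uncertainty about $p(\mathbf{v})$ into downstream summaries.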

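67 Extrapolation Distribution: Identifying Restrictions
Missing at random (MAR): $p(\mathbf{y}_{\text{mis}} \mid \mathbf{y}_{\text{obs}}, s, \mathbf{v})$ is fully identified by $p(\mathbf{y}_{\text{obs}}, s, \mathbf{v})$
Non-future dependent (NFD): $p(\mathbf{y}_{\text{mis}} \mid \mathbf{y}_{\text{obs}}, s, \mathbf{v})$ is partially identified by $p(\mathbf{y}_{\text{obs}}, s, \mathbf{v})$. Put informative priors on non-identified parameters.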
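
For monotone dropout, these two restrictions have a standard pattern-mixture form. The display below is a sketch in that standard form rather than a formula taken from the slides. It writes $\bar{y}_{j-1} = (y_1, \dots, y_{j-1})$ and assumes, as a convention the slide does not state, that $s$ is the last time point at which the outcome is observed.

```latex
% MAR for monotone dropout (available-case missing value restriction), j >= 2:
p(y_j \mid \bar{y}_{j-1}, s = k, \mathbf{v})
  = p(y_j \mid \bar{y}_{j-1}, s \ge j, \mathbf{v}),
  \qquad k < j.

% NFD, j >= 3:
p(y_j \mid \bar{y}_{j-1}, s = k, \mathbf{v})
  = p(y_j \mid \bar{y}_{j-1}, s \ge j - 1, \mathbf{v}),
  \qquad k < j - 1.
```

Under NFD written this way, the distribution $p(y_j \mid \bar{y}_{j-1}, s = j-1, \mathbf{v})$ of the first missing outcome is left unrestricted, which is where informative priors on non-identified parameters would enter.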

68-79 Monte Carlo Integration/G-Computation
$E[t(\mathbf{y})] = \int t(\mathbf{y})\, p(\mathbf{y})\, d\mathbf{y} = \int t(\mathbf{y}) \sum_{s} \int p(\mathbf{y}_{\text{mis}} \mid \mathbf{y}_{\text{obs}}, s, \mathbf{v})\, p(\mathbf{y}_{\text{obs}} \mid s, \mathbf{v})\, p(s \mid \mathbf{v})\, p(\mathbf{v})\, d\mathbf{v}\, d\mathbf{y}$
[Figure, built up over slides 68 to 79: outcome versus time (1 to 6). The slides step through three covariate vectors, each followed by a dropout time.]
Slides 68 to 71: $\mathbf{v}$ = (Age: 33, Height: 166, Weight: 54, Gender: F), then $s = 5$
Slides 72 to 74: $\mathbf{v}$ = (Age: 37, Height: 163, Weight: 51, Gender: F), then $s = 6$
Slides 75 to 78: $\mathbf{v}$ = (Age: 66, Height: 185, Weight: 78, Gender: M), then $s = 3$
Slide 79: $y_6 - y_1$ marked on the plot
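
The integral above is typically approximated by simulation: draw $\mathbf{v}$, then $s \mid \mathbf{v}$, then $\mathbf{y}_{\text{obs}} \mid s, \mathbf{v}$, then $\mathbf{y}_{\text{mis}} \mid \mathbf{y}_{\text{obs}}, s, \mathbf{v}$, and average $t(\mathbf{y})$ over the draws. The sketch below does this for $t(\mathbf{y}) = y_6 - y_1$. The conditional samplers are toy stand-ins invented for illustration; they are not the GP/AR/CAR, BART, or identifying-restriction models of the talk. The sketch also treats $s$ as the last observed visit, and `delta` is a hypothetical sensitivity parameter that shifts the extrapolated values.

```python
import random

J = 6  # number of scheduled visits


def draw_v(rng):
    # Stand-in for a draw from p(v), e.g. via the Bayesian bootstrap.
    return rng.choice([{"age": 33}, {"age": 37}, {"age": 66}])


def draw_s(v, rng):
    # Stand-in for p(s | v): last observed visit, uniform on 1..J.
    return rng.randint(1, J)


def draw_y_obs(s, v, rng):
    # Stand-in for p(y_obs | s, v): a random walk with drift -0.5 per visit.
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(s - 1):
        y.append(y[-1] - 0.5 + rng.gauss(0.0, 1.0))
    return y


def draw_y_mis(y_obs, s, v, rng, delta=0.0):
    # Stand-in for p(y_mis | y_obs, s, v): continue the walk, shifted by delta per visit.
    y = list(y_obs)
    for _ in range(J - s):
        y.append(y[-1] - 0.5 + delta + rng.gauss(0.0, 1.0))
    return y[s:]


def g_computation(n_draws, rng, delta=0.0):
    """Monte Carlo estimate of E[y_6 - y_1] under the toy model."""
    total = 0.0
    for _ in range(n_draws):
        v = draw_v(rng)
        s = draw_s(v, rng)
        y_obs = draw_y_obs(s, v, rng)
        y = y_obs + draw_y_mis(y_obs, s, v, rng, delta)
        total += y[J - 1] - y[0]
    return total / n_draws


estimate = g_computation(20000, random.Random(3))
shifted = g_computation(20000, random.Random(4), delta=0.2)
print(estimate, shifted)
```

In a full Bayesian analysis this loop would be repeated for each posterior draw of the model parameters, and rerunning it over a range of `delta` values is one simple way to see how the estimate responds to an uncheckable assumption.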

80 Schizophrenia Dataset
Test drug improvement over placebo (a negative value represents an improvement)
Conclusion: no evidence that the test drug performs better than placebo

81 Sensitivity Analysis Vary uncheckable assumptions and see whether conclusion differs

82 Concluding Remark The model specifications (GP/AR/CAR/BART) nicely exploit the data structure and lead to improvement over simple parametric approaches

83 References
Zhou, T., Müller, P., Sengupta, S. and Ji, Y. (2019) PairClone: A Bayesian subclone caller based on mutation pairs. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(3).
Zhou, T., Sengupta, S., Müller, P. and Ji, Y. (2019) TreeClone: Reconstruction of tumor subclone phylogeny based on mutation pairs using next generation sequencing data. The Annals of Applied Statistics, 13(2).
Zhou, T., Daniels, M. J. and Müller, P. (2019) A Semiparametric Bayesian Approach to Dropout in Longitudinal Studies with Auxiliary Covariates. Journal of Computational and Graphical Statistics, forthcoming.

84 Thank you! Questions & comments: tjzhou@uchicago.edu
Currently on job market

