Computational Identification of Tumor heterogeneity


1 Computational Identification of Tumor heterogeneity
Sangwoo Kim

2 Tumor heterogeneity
Inter-tumor heterogeneity: genetic and phenotypic variation between individuals with the same tumor type
Intra-tumor heterogeneity: subclonal diversity within a tumor

3 Tumor heterogeneity in AML

4 Tumor progression and response

5 Heterogeneity and resistance
Unfortunately, about a third of ER-positive breast cancers are inherently resistant to endocrine therapy, although they may still respond to other drugs with different mechanisms of action, and 30–40% of the initial responders will also eventually progress to resistant disease. Because the options for these patients are limited, understanding what induces endocrine resistance in these tumors has been one of the longest-standing and most intense areas of breast cancer research.

6 Inferring tumor heterogeneity
1. single-cell sequencing
2. bulk sequencing and reconstruction

7 computational identification of tumor subclones

8 Today’s paper 1 (PyClone)
Sohrab Shah, Ph.D., Associate Professor in the Departments of Pathology and Computer Science, University of British Columbia
Dr. Shah's work focuses on characterizing cancer genomes to determine pathogenic driver mutations in cancer subtypes, and on measuring and quantifying tumour evolution.

9 Conceptual overview Sequencing – pool sequencing – unclassified tools

10 Allele frequency and Cellular prevalence
Allele frequency (AF): ratio of alternative-allele copies to the total number of haploid copies at the locus
Cellular prevalence (CP): proportion of tumor cells harboring a mutation
Example 1: subclone 1 (AA) 70%, subclone 2 (AB) 30% → allele frequency = 15%, cellular prevalence = 30%
Example 2: subclone 1 (AA) 70%, subclone 2 (AAB) 30% → allele frequency = 10%, cellular prevalence = 30%
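A minimal sketch of the two quantities, assuming each genotype is written as a string in which every "B" is one variant-allele copy, and that each population contributes alleles in proportion to its copy number (the weighting used on slide 15). The helper names are hypothetical. For the AB example it reproduces the 15% AF and 30% CP. For the AAB example, the slide's 10% matches the simpler per-cell relation AF = CP × b(g)/c(g) = 0.3 × 1/3; weighting by copy number instead gives 0.3/2.3 ≈ 13%, because AAB cells contribute three copies to the read pool rather than two.

```python
from fractions import Fraction


def allele_frequency(populations):
    """Copy-number-weighted variant allele frequency of a cell mixture.

    populations: list of (cell_fraction, genotype) pairs; a genotype is a
    string such as "AA" or "AAB", where each "B" is one variant allele copy.
    """
    variant = sum(f * g.count("B") for f, g in populations)
    total = sum(f * len(g) for f, g in populations)
    return variant / total


def per_cell_allele_frequency(populations):
    """Simplified AF = sum of CP * b(g)/c(g), ignoring copy-number weighting."""
    return sum(f * Fraction(g.count("B"), len(g)) for f, g in populations)


def cellular_prevalence(populations):
    """Fraction of cells that carry at least one variant allele."""
    return sum(f for f, g in populations if "B" in g)


ab_mix = [(Fraction(7, 10), "AA"), (Fraction(3, 10), "AB")]
aab_mix = [(Fraction(7, 10), "AA"), (Fraction(3, 10), "AAB")]

print(allele_frequency(ab_mix), cellular_prevalence(ab_mix))  # 3/20 3/10
print(per_cell_allele_frequency(aab_mix))                     # 1/10
print(allele_frequency(aab_mix))                              # 3/23
```

Exact fractions are used so the percentages on the slide can be compared without rounding noise.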

11 Allele frequency to cellular prevalence
A toy example (columns: AF, genotype, CP)
mutation 1: AF 10%, genotype AB, CP 20%
mutation 2: genotype AAB, CP 30%
mutation 3: genotype ABB, CP 15%
mutation 4: 40%
mutation 5: genotype AABB
mutation 6: 50%, 100%
mutation 7: 75%
Genotype (copy number) is essential for heterogeneity estimation.
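The toy table follows the simplified relation AF = CP × b(g)/c(g), so CP can be recovered from AF once the genotype is known. The sketch below assumes, as the flattened table appears to imply, that mutations 2 and 3 share the 10% AF of mutation 1; under that assumption it reproduces the listed CPs of 20%, 30% and 15%. The function name is hypothetical.

```python
from fractions import Fraction


def cp_from_af(af, genotype):
    """Invert the simplified relation AF = CP * b(g) / c(g).

    genotype: string such as "AB" or "ABB"; each "B" is one variant allele.
    """
    b = genotype.count("B")  # number of variant alleles, b(g)
    c = len(genotype)        # total copy number, c(g)
    if b == 0:
        raise ValueError("genotype carries no variant allele")
    return af * Fraction(c, b)


af = Fraction(1, 10)  # 10%, assumed shared by mutations 1-3
for name, g in [("mutation 1", "AB"), ("mutation 2", "AAB"), ("mutation 3", "ABB")]:
    print(name, g, cp_from_af(af, g))
# mutation 1 AB 1/5
# mutation 2 AAB 3/10
# mutation 3 ABB 3/20
```

The same AF maps to three different CPs, which is the slide's point: without the genotype, AF alone does not determine CP.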

12 Cellular prevalence and evolution model
Assumptions:
1) the clonal population follows a perfect phylogeny: no site mutates more than once in its evolutionary history, and each site harbors at most one somatic mutant genotype
2) the clonal population follows a persistent phylogeny: mutations do not disappear or revert

13 Cellular prevalence and evolution model
[Figure: clonal tree with subclone fractions 10%, 30%, 30%, 10%, 20%]
What to infer:
1) the number and composition of subclones
2) the cellular prevalence (CP): proportion of tumor cells harboring a mutation
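Under the persistent-phylogeny assumption a mutation is kept by every descendant of the clone where it arose, and under the perfect-phylogeny assumption it arises in exactly one clone, so its CP is the summed fraction of that clone and all of its descendants. The sketch uses the clone fractions from the figure (in percent), but the tree shape and the clone in which each mutation arose are hypothetical.

```python
# Hypothetical tree (child -> parent); fractions from the figure, in percent
parent = {"c1": None, "c2": "c1", "c3": "c1", "c4": "c2", "c5": "c2"}
fraction = {"c1": 10, "c2": 30, "c3": 30, "c4": 10, "c5": 20}
# Hypothetical clone in which each mutation first arose (one clone per mutation)
origin = {"m1": "c1", "m2": "c2", "m3": "c3", "m4": "c4"}


def descendants(clone):
    """The clone itself plus every clone below it in the tree."""
    found = {clone}
    for child, par in parent.items():
        if par == clone:
            found |= descendants(child)
    return found


def cp_percent(mutation):
    # persistent phylogeny: the mutation is never lost, so every descendant carries it
    return sum(fraction[c] for c in descendants(origin[mutation]))


for m in origin:
    print(m, cp_percent(m))  # prints m1 100, m2 60, m3 30, m4 10 on separate lines
```

Inference runs this mapping in reverse: from observed CPs, recover the number of clones and which mutations they share.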

14 Input and Output
Input (observations):
1) a set of deeply sequenced mutations (AF) at one or more loci in each sample
2) a measure of allele-specific copy number at each mutation locus (genotype)
Output:
1) the CP of each mutation
2) a clustering of the mutations
3) the overall CP of each cluster
[Figure labels: mutation (AF), CNV, clusters and CP]

15 Pyclone population structure
Allele frequency of this mutation: 6*4*(2/4) / (2*2 + 4*3 + 6*4) = 12/40 = 0.3
Cellular prevalence of this mutation: 6 / (4 + 6) = 0.6
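The two formulas imply a figure with 2 normal cells carrying 2 copies, 4 reference-population cells carrying 3 copies, and 6 variant-population cells carrying 4 copies, 2 of them variant. Assuming that reading, the arithmetic can be checked with a short sketch (function names hypothetical). Note that CP counts only cancer cells, so the normal population is left out of its denominator.

```python
from fractions import Fraction

# Assumed reading of the figure: (cell count, copy number, variant copies)
normal = (2, 2, 0)
reference = (4, 3, 0)
variant = (6, 4, 2)


def af_from_counts(*populations):
    """Variant alleles over all alleles, summed across cell populations."""
    variant_alleles = sum(n * b for n, c, b in populations)
    total_alleles = sum(n * c for n, c, b in populations)
    return Fraction(variant_alleles, total_alleles)


def cp_from_counts(reference, variant):
    # CP counts cancer cells only: normal cells are excluded from the denominator
    return Fraction(variant[0], reference[0] + variant[0])


print(af_from_counts(normal, reference, variant))  # 3/10
print(cp_from_counts(reference, variant))          # 3/5
```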

16 Things to consider
fraction of cancer cells: t
fraction of normal cells: 1 − t
genotypes of the normal, reference, and variant populations at the nth mutation: g_N, g_R, g_V ∈ {-, A, B, AA, AB, BB, AAA, AAB, ...}
ψ_n = (g_n^N, g_n^R, g_n^V) ∈ G³
read depth at the locus of the nth mutation: d_n
number of reads harboring the nth mutation: b_n
cellular prevalence of the nth mutation: φ_n

17 The generative model
[Figure: graphical model showing prior parameters and posterior parameters]
φ_n = fraction of cancer cells from the variant population
ψ_n = (g_n^N, g_n^R, g_n^V)

18 The probability
ξ(ψ, φ, t): the probability of sampling a read containing the variant allele covering a mutation with state ψ = (gN, gR, gV) and cellular prevalence φ
c(g): copy number of the genotype (e.g. g = AAB, c(g) = 3)
b(g): number of variant alleles in the genotype (e.g. g = AAB, b(g) = 1)
µ(g): probability of sampling a variant allele from a cell = b(g)/c(g)
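Assuming the form given in the PyClone paper, each of the three populations contributes reads in proportion to its share of cells times its copy number, and each of its reads carries the variant with probability µ(g):

$$
\xi(\psi, \phi, t) = \frac{(1-t)\,c(g_N)\,\mu(g_N) + t(1-\phi)\,c(g_R)\,\mu(g_R) + t\phi\,c(g_V)\,\mu(g_V)}{(1-t)\,c(g_N) + t(1-\phi)\,c(g_R) + t\phi\,c(g_V)}
$$

As a check against slide 15, take t = 10/12 (10 of the 12 cells are cancer cells) and φ = 0.6, so that 1 − t = 2/12, t(1 − φ) = 4/12 and tφ = 6/12, with µ(g_N) = µ(g_R) = 0:

$$
\xi = \frac{\tfrac{6}{12}\cdot 4 \cdot \tfrac{2}{4}}{\tfrac{2}{12}\cdot 2 + \tfrac{4}{12}\cdot 3 + \tfrac{6}{12}\cdot 4} = \frac{12}{40} = 0.3
$$

which matches the allele frequency computed on that slide.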

19 The probability of b_n
P(b_n) = Binomial(d_n, ξ(ψ, φ, t))
When the CP is given, we can calculate the probability of observing b_n.
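A sketch of ξ(ψ, φ, t) and the binomial likelihood, assuming ξ is the copy-number-weighted mixture of the normal, reference and variant populations (the weighting slide 15 uses). Genotypes are non-empty strings with "B" marking a variant copy; the empty genotype "-" is not handled. The genotypes (AA, AAA, AABB) are our reading of slide 15, and the 300-of-1000 read count is hypothetical. A grid search then picks the φ that best explains the reads.

```python
import math


def mu(g):
    """mu(g) = b(g)/c(g); g is a non-empty genotype string, each "B" a variant copy."""
    return g.count("B") / len(g)


def xi(psi, phi, t):
    """Probability that a read covering the mutation carries the variant allele."""
    g_n, g_r, g_v = psi
    # each population's share of reads: cell fraction times copy number c(g) = len(g)
    weights = [(1 - t) * len(g_n), t * (1 - phi) * len(g_r), t * phi * len(g_v)]
    mus = [mu(g_n), mu(g_r), mu(g_v)]
    return sum(w * m for w, m in zip(weights, mus)) / sum(weights)


def log_p_reads(b, d, psi, phi, t):
    """log P(b | d) under b ~ Binomial(d, xi(psi, phi, t)); needs 0 < xi < 1."""
    p = xi(psi, phi, t)
    log_choose = math.lgamma(d + 1) - math.lgamma(b + 1) - math.lgamma(d - b + 1)
    return log_choose + b * math.log(p) + (d - b) * math.log1p(-p)


psi = ("AA", "AAA", "AABB")  # assumed gN, gR, gV from slide 15
t = 10 / 12
print(round(xi(psi, 0.6, t), 6))  # 0.3

# Grid search for the phi that best explains 300 variant reads out of 1000
grid = [k / 100 for k in range(1, 101)]
best = max(grid, key=lambda phi: log_p_reads(300, 1000, psi, phi, t))
print(best)  # 0.6
```

A grid maximum gives one point estimate per mutation; the next slides explain why PyClone samples φ instead.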

20 Inferring CP from b_n
1. Mutations with the same cellular prevalence are clustered into the same clone.
2. We want to infer the most likely cellular prevalence of each mutation from the observations, and find the clusters that correspond to subclones.
e.g., if the best φ is [0.7, 0.5, 0.5, 0.4, 0.2, 0.5, 1.0, 0.9, 0.1, 0.4], grouping such point estimates into clusters is always problematic!

21 Getting CP by sampling
CP prior ~ Dirichlet process, to obtain discrete CP values
Sampling: Metropolis-Hastings algorithm
Let f(x) be a function that is proportional to the desired probability distribution P(x).
Initialization: Choose an arbitrary point x_0 to be the first sample, and choose an arbitrary probability density Q(x | y) that suggests a candidate for the next sample value x, given the previous sample value y. For the Metropolis algorithm, Q must be symmetric; in other words, it must satisfy Q(x | y) = Q(y | x). A usual choice is to let Q(x | y) be a Gaussian distribution centered at y, so that points closer to y are more likely to be visited next, making the sequence of samples into a random walk. The function Q is referred to as the proposal density or jumping distribution.
For each iteration t:
Generate a candidate x' for the next sample by picking from the distribution Q(x' | x_t).
Calculate the acceptance ratio α = f(x')/f(x_t), which will be used to decide whether to accept or reject the candidate. Because f is proportional to the density of P, we have α = f(x')/f(x_t) = P(x')/P(x_t).
If α ≥ 1, the candidate is more likely than x_t; automatically accept it by setting x_{t+1} = x'. Otherwise, accept the candidate with probability α; if the candidate is rejected, set x_{t+1} = x_t instead.
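The steps above can be sketched as a random-walk Metropolis sampler for a single φ. This is a simplified, hypothetical target rather than PyClone's full sampler: a uniform prior on (0, 1), a binomial likelihood with ξ = tφ/2 (normal and reference populations AA, variant population AB), and no Dirichlet-process cluster assignments. Working on the log scale avoids underflow at high read depth.

```python
import math
import random


def log_target(phi, b, d, t):
    """Unnormalized log posterior of phi (hypothetical single-mutation model):
    uniform prior on (0, 1) times a binomial likelihood with xi = t * phi / 2."""
    if not 0.0 < phi < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    xi = t * phi / 2.0
    return b * math.log(xi) + (d - b) * math.log1p(-xi)


def metropolis(b, d, t, n_iter=5000, step=0.05, x0=0.5, seed=0):
    rng = random.Random(seed)
    x, log_fx = x0, log_target(x0, b, d, t)
    samples = []
    for _ in range(n_iter):
        x_new = rng.gauss(x, step)  # symmetric proposal: Q(x'|x) = Q(x|x')
        log_f_new = log_target(x_new, b, d, t)
        # alpha = f(x')/f(x); accept with probability min(1, alpha)
        alpha = math.exp(min(0.0, log_f_new - log_fx))
        if rng.random() < alpha:
            x, log_fx = x_new, log_f_new
        samples.append(x)  # a rejected candidate repeats the current state
    return samples


samples = metropolis(b=200, d=1000, t=0.8)
posterior = samples[1000:]  # discard burn-in
print(sum(posterior) / len(posterior))  # should be close to 0.5, since 0.8 * 0.5 / 2 = 200/1000
```

The retained samples approximate the posterior of φ, so their spread expresses the uncertainty that a single point estimate hides.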

22 example of cluster

23 results (synthetic data)
accuracy on synthetic data: d_i ~ Poisson(10,000), t = 0.75, 8 clusters with CP ~ Uniform(0, 1); total copy number of each genotype between 1 and 5; genotype priors compared: AB, BB, NZ, TCN, PCN (defined in slide 25)
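A hypothetical re-creation of this synthetic setup. The read depth, tumor content, number of clusters, CP distribution and copy-number range come from the slide; the number of mutations, one variant copy per variant genotype (b(gV) = 1), and diploid AA normal and reference populations are assumptions of this sketch. The variant-read counts are drawn from the binomial model of slide 19.

```python
import numpy as np


def simulate(n_mutations=100, n_clusters=8, t=0.75, mean_depth=10_000, seed=0):
    """Hypothetical synthetic data: returns cluster labels, CP, depth and variant reads."""
    rng = np.random.default_rng(seed)
    cluster_cp = rng.uniform(0.0, 1.0, size=n_clusters)     # CP ~ Uniform(0, 1)
    labels = rng.integers(0, n_clusters, size=n_mutations)  # cluster of each mutation
    phi = cluster_cp[labels]
    depth = rng.poisson(mean_depth, size=n_mutations)       # d_i ~ Poisson(10,000)
    c_v = rng.integers(1, 6, size=n_mutations)              # total copy number 1..5
    # Assumptions: b(gV) = 1; normal and reference populations are AA (2 copies, no variant)
    numerator = t * phi * 1
    denominator = (1 - t) * 2 + t * (1 - phi) * 2 + t * phi * c_v
    xi = numerator / denominator
    reads = rng.binomial(depth, xi)                         # b_i ~ Binomial(d_i, xi_i)
    return labels, phi, depth, reads


labels, phi, depth, reads = simulate()
print(reads[:5], depth[:5])
```

Because the true labels are known, the inferred clustering can then be scored against them.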

24 results (synthetic data)

25 prior for mutational genotype
Copy number must be measured for each mutation site: c = total copy number; c1, c2 = copy number of each homologous chromosome
5 different strategies for assigning the genotype:
AB prior: gR = AA, gV = AB
BB prior: gR = AA, gV = BB
No Zygosity (NZ) prior: gR = AA, c(gV) = c, b(gV) = 1
Total Copy Number (TCN) prior: c(gV) = c, b(gV) ∈ {1, ..., c}; gR = AA, or c(gR) = c and b(gR) = 0
Parental Copy Number (PCN) prior: c(gV) = c, b(gV) ∈ {1, c1, c2}
if b(gV) ∈ {c1, c2}: gR = gN (AA) ⇒ the mutation occurred before the copy-number increase
if b(gV) = 1: gR = AA, or c(gR) = c and b(gR) = 0 ⇒ the mutation occurred after the copy-number increase
Examples: c = 4, c1 = c2 = 2; c = 3, c1 = 1, c2 = 2
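The five priors differ only in which (gR, gV) pairs they allow. A minimal sketch that enumerates those pairs, encoding each genotype as (c(g), b(g)) and taking gN = AA; the function name is hypothetical, and the PCN branch follows our reading of the slide (b(gV) = 1 allows either reference genotype). The sketch only lists the allowed states; it does not assign them probabilities.

```python
NORMAL = (2, 0)  # gN = AA, encoded as (c(g), b(g))


def genotype_states(prior, c, c1=None, c2=None):
    """Candidate (gR, gV) pairs for one mutation under each genotype prior.
    Genotypes are (copy number c(g), variant copies b(g)) tuples."""
    if prior == "AB":
        return {(NORMAL, (2, 1))}
    if prior == "BB":
        return {(NORMAL, (2, 2))}
    if prior == "NZ":
        return {(NORMAL, (c, 1))}
    if prior == "TCN":
        return {(g_r, (c, b)) for b in range(1, c + 1) for g_r in (NORMAL, (c, 0))}
    if prior == "PCN":
        if c1 is None or c2 is None:
            raise ValueError("PCN needs the parental copy numbers c1 and c2")
        states = set()
        for b in {1, c1, c2} - {0}:
            if b in (c1, c2):
                states.add((NORMAL, (c, b)))  # mutation before the copy-number gain
            if b == 1:
                # mutation after the copy-number gain (our reading of the slide)
                states.add((NORMAL, (c, 1)))
                states.add(((c, 0), (c, 1)))
        return states
    raise ValueError(f"unknown prior: {prior}")


print(sorted(genotype_states("PCN", 3, c1=1, c2=2)))
# [((2, 0), (3, 1)), ((2, 0), (3, 2)), ((3, 0), (3, 1))]
print(len(genotype_states("TCN", 3)))  # 6
```

PCN uses the parental copy numbers to rule out TCN states such as b(gV) = 3 when c = 3, c1 = 1, c2 = 2.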

26 results (real data)
Data = physical mixture of samples from 4 individuals (from the 1000 Genomes Project) in proportions {0.01, 0.05, 0.20, 0.74}: NA12156, NA12878, NA18507, NA19240
This generated 7 clusters: 4 unique to one individual, NA18507+NA19240, NA12878+NA18507+NA19240, and all four shared
BeBin = beta-binomial (instead of binomial), to model over-dispersion
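To see what the beta-binomial buys, the sketch below compares its variance with the binomial's at the same mean. It assumes a mean/precision parameterization (α = sξ, β = s(1 − ξ)); the depth, mean and precision values are hypothetical. The beta-binomial variance is nξ(1 − ξ)(n + s)/(1 + s), larger than the binomial's nξ(1 − ξ) by a factor that shrinks as s grows.

```python
import math


def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def binomial_pmf(k, n, p):
    return math.exp(log_choose(n, k) + k * math.log(p) + (n - k) * math.log1p(-p))


def beta_binomial_pmf(k, n, mean, precision):
    """Beta-binomial with alpha = precision * mean, beta = precision * (1 - mean)."""
    a, b = precision * mean, precision * (1 - mean)
    return math.exp(log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf, n):
    probs = [pmf(k) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    return sum((k - mean) ** 2 * p for k, p in enumerate(probs))


n, xi, s = 100, 0.3, 50.0  # hypothetical depth, variant probability, precision
var_bin = variance(lambda k: binomial_pmf(k, n, xi), n)
var_bb = variance(lambda k: beta_binomial_pmf(k, n, xi, s), n)
print(var_bin, var_bb)  # about 21.0 and 61.8: same mean, larger spread
```

With extra spread available, a few outlying allele counts no longer force the model to open a new cluster.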

27 results (real data)
[Figure panels: naïve model (12 clusters), true answer, PyClone (7 clusters), cluster 1]
The naïve model falsely separates clusters into homozygous and heterozygous mutations.

28 result (ovarian cancer)
[Figure labels: LOH, hetero, CNV 1–3]
Four spatially separated samples of a high-grade serous ovarian cancer → 49 deeply sequenced, validated mutations
IBBMM clusters 1, 2 and 6 should be collapsed into PyClone cluster 1 ⇒ single-cell sequencing of 25

29 result (ovarian cancer)
[Figure: PyClone clusters (yellow box = cluster 1), IBBMM clusters, non-somatic]
IBBMM clusters 1 and 2 form a single cluster, as PyClone predicted.

30 Conclusions
PyClone can infer clonal population structures in cancer:
Beta-binomial emission densities model data sets with more variance in allelic prevalence measurements more effectively than a binomial model.
Flexible prior probability estimates ("priors") of possible mutational genotypes reflect how allelic prevalence measurements are deterministically linked to zygosity and coincident copy-number variation events.
Bayesian nonparametric clustering discovers the groupings of mutations and the number of groups simultaneously. This obviates fixing the number of groups a priori and allows cellular prevalence estimates to reflect uncertainty in this parameter.
Multiple samples from the same cancer may be analyzed jointly to leverage the scenario in which clonal populations are shared across samples.

31 Software
Implemented in Python
Freely available in
License: GPL3 (free for academic use)

32

33 V-measure
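V-measure (Rosenberg and Hirschberg, 2007) scores a clustering against known labels as the harmonic mean of homogeneity (each inferred cluster holds mutations from a single true cluster) and completeness (all mutations of a true cluster land in the same inferred cluster). That it is the accuracy score behind the results slides is an assumption. A from-scratch sketch follows; scikit-learn's `sklearn.metrics.v_measure_score` computes the same quantity.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log(m / n) for m in Counter(labels).values())


def conditional_entropy(a, b):
    """H(A | B) for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    b_counts = Counter(b)
    return -sum((m / n) * math.log(m / b_counts[y]) for (x, y), m in joint.items())


def v_measure(true_labels, pred_labels):
    h_c, h_k = entropy(true_labels), entropy(pred_labels)
    homogeneity = 1.0 if h_c == 0 else 1 - conditional_entropy(true_labels, pred_labels) / h_c
    completeness = 1.0 if h_k == 0 else 1 - conditional_entropy(pred_labels, true_labels) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
print(v_measure([0, 0, 0, 0], [0, 0, 1, 1]))  # 0.0: one true cluster split in two
```

Falsely splitting a true cluster, as the naïve model does on slide 27, leaves homogeneity intact but lowers completeness.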

