Zhi Yang, MS, Department of Preventive Medicine, USC, Jul 29, 2018
Statistical Approach for Investigating Change in Mutational Processes During Cancer Growth and Development
Hello everyone. My name is Zhi, and I am a third-year PhD student in the biostatistics program. In project four, we also do hierarchical modeling, but in tumors, using somatic mutations. More specifically, we describe the somatic mutations with mutational signatures, a concept I will introduce later in the talk. We therefore use hierarchical modeling of mutational signatures in tumors to capture the change during tumor growth.
A Unifying Model to Test Difference?
HiLDA = “Hierarchical Latent Dirichlet Allocation”
Two-step method: Somatic mutations → pmsignature → Estimated Proportions, 𝒒 → Are 𝒒 different in two groups? → Regress 𝒒 on 𝑮 (0=branch, 1=trunk)
HiLDA: Uncertainty in Proportions
If people would like to infer the difference in signature proportions, they can take the point estimates from any current method, for example the R package pmsignature, which assumes independence. They then regress the fractions on the group indicator variable, coded 1 for trunk and 0 for branch.
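The two-step idea described above can be sketched in a few lines. This is only an illustration, not the analysis behind the slides: the function name `two_step_regression`, the simulated proportions, and the choice to regress log-ratios against the first signature (rather than raw fractions) are all assumptions. The indicator here is coded 1 for branch so that the slope reads as "Branch - Trunk"; flipping the coding only flips the sign.

```python
import numpy as np

def two_step_regression(q, g):
    """OLS of log(q_k / q_1) on a binary group indicator, one fit per signature k >= 2.

    q : (n, K) array of estimated signature proportions (each row sums to 1)
    g : (n,) array of 0/1 group labels
    Returns (coef, se), each of shape (K-1,), for the slope on g.
    """
    q = np.asarray(q, dtype=float)
    g = np.asarray(g, dtype=float)
    y = np.log(q[:, 1:] / q[:, [0]])               # log-ratios against signature 1
    X = np.column_stack([np.ones_like(g), g])      # intercept + group indicator
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # shape (2, K-1)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof        # residual variance per signature
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])           # standard error of the slope
    return beta[1], se

# Hypothetical point estimates: 16 trunk and 16 branch samples, 3 signatures.
rng = np.random.default_rng(0)
g = np.repeat([0.0, 1.0], 16)
true_shift = np.array([-0.5, 1.0])                 # made-up group differences
eta = np.array([0.3, -1.0]) + np.outer(g, true_shift)
eta = eta + rng.normal(scale=0.2, size=eta.shape)  # sample-level noise
eta = np.column_stack([np.zeros(len(g)), eta])     # signature 1 is the reference
q = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)

coef, se = two_step_regression(q, g)
print("coefficients:", coef)
print("standard errors:", se)
```

Because the regression treats the estimated proportions as if they were observed without error, the standard errors it reports ignore the uncertainty in 𝒒, which is the gap HiLDA is meant to close.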
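Hierarchical Latent Dirichlet Allocation
[Plate diagram with a Branch side and a Trunk side. Nodes: Hyperprior ($p_i^{0}$, $p_i^{1}$); Proportions ($q_i^{0}$, $q_i^{1}$); Latent signature assignment ($Z_{i,j}^{0}$, $Z_{i,j}^{1}$); Observed Mutation ($X_{i,j}^{0}$, $X_{i,j}^{1}$); Signature ($f_k$); Hyperprior ($\delta_k$). Plates: $k = 1 \dots K$; $i = 1 \dots N$; $j = 1 \dots n_i^{0}$ and $j = 1 \dots n_i^{1}$.]
Adding animation for hyperprior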
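To make the roles of the nodes in the diagram concrete, here is a minimal simulation of the lower part of the model for one tumor: proportions, then a latent signature assignment per mutation, then the observed mutation. It is a sketch under simplifying assumptions. Each signature is treated as a single categorical distribution over `M` mutation types, and a fixed Dirichlet concentration `conc` stands in for the hyperprior layer, which the slide does not spell out. The names `simulate_tumor`, `conc`, `K`, and `M` are hypothetical.

```python
import numpy as np

def simulate_tumor(rng, f, conc, n_mut):
    """Draw one tumor's mutations from an LDA-style generative process.

    f     : (K, M) array, row k is signature k's distribution over M mutation types
    conc  : (K,) Dirichlet concentration for the signature proportions (assumed fixed)
    n_mut : number of mutations to simulate
    Returns (q, z, x): proportions, latent assignments, observed mutation types.
    """
    K, M = f.shape
    q = rng.dirichlet(conc)                           # Proportions q_i
    z = rng.choice(K, size=n_mut, p=q)                # Latent signature assignment Z_ij
    x = np.array([rng.choice(M, p=f[k]) for k in z])  # Observed mutation X_ij
    return q, z, x

rng = np.random.default_rng(1)
K, M = 3, 6                                           # made-up sizes for illustration
f = rng.dirichlet(np.ones(M), size=K)                 # hypothetical signatures f_k
q, z, x = simulate_tumor(rng, f, conc=np.ones(K), n_mut=200)
print("proportions:", np.round(q, 3))
print("mutations per signature:", np.bincount(z, minlength=K))
```

In the diagram the same signatures $f_k$ are shared by the trunk and branch sides, while each side has its own proportions; calling `simulate_tumor` twice with the same `f` mimics that sharing.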
Methods: HiLDA

$$q_{i,k}^{g} = \frac{\phi_{i,k}^{g}}{\sum_{k'} \phi_{i,k'}^{g}}; \qquad \log\frac{q_{i,k}^{g}}{q_{i,1}^{g}} = \alpha_k + \beta_k\, g + \gamma_{i,k}^{g}, \qquad g = 0 \text{ (trunk)},\ g = 1 \text{ (branch)}$$

$\alpha_k$: Baseline difference between 1st and $k$th signature
$\beta_k$: Difference between two groups in $k$th signature
$\gamma_{i,k}^{g}$: Variation for $k$th signature of $i$th tumor in $g$th group

log(fractions) of Signature $k$:

Group g   Tumor i   1st sig   2nd sig                                    3rd sig
Trunk     1         (ref.)    $\alpha_2 + \gamma_{1,2}^{0}$              $\alpha_3 + \gamma_{1,3}^{0}$
Trunk     2         (ref.)    $\alpha_2 + \gamma_{2,2}^{0}$              $\alpha_3 + \gamma_{2,3}^{0}$
Trunk     …
Trunk     16        (ref.)    $\alpha_2 + \gamma_{16,2}^{0}$             $\alpha_3 + \gamma_{16,3}^{0}$
Branch    1         (ref.)    $\alpha_2 + \beta_2 + \gamma_{1,2}^{1}$    $\alpha_3 + \beta_3 + \gamma_{1,3}^{1}$
Branch    2         (ref.)    $\alpha_2 + \beta_2 + \gamma_{2,2}^{1}$    $\alpha_3 + \beta_3 + \gamma_{2,3}^{1}$
Branch    …
Branch    16        (ref.)    $\alpha_2 + \beta_2 + \gamma_{16,2}^{1}$   $\alpha_3 + \beta_3 + \gamma_{16,3}^{1}$

Branch - Trunk   2nd sig   3rd sig
Coefficient      -0.786    0.984
SE               0.152     0.587
P value          <0.001    0.094
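The parameterization on this slide can be checked numerically. In the table, $\beta_k$ appears only in the branch rows, so the sketch below multiplies it by a 0/1 indicator `g` (0 for trunk, 1 for branch). Taking $\phi = \exp(\cdot)$ of the linear predictor with the first signature fixed at zero is one convenient choice that satisfies both formulas, not necessarily the one used in the talk. The function name and parameter values are hypothetical.

```python
import numpy as np

def signature_proportions(alpha, beta, gamma, g):
    """Map log-ratio parameters to signature proportions for one tumor.

    alpha, beta, gamma : length K-1 arrays for signatures 2..K
    g                  : 0 for trunk, 1 for branch
    Returns q of length K, summing to 1.
    """
    eta = np.asarray(alpha, float) + g * np.asarray(beta, float) + np.asarray(gamma, float)
    eta = np.concatenate([[0.0], eta])   # signature 1 is the reference: log(q_1 / q_1) = 0
    phi = np.exp(eta - eta.max())        # unnormalized weights, shifted for stability
    return phi / phi.sum()

alpha = np.array([0.5, -1.0])            # hypothetical baseline log-ratios
beta = np.array([-0.8, 1.0])             # hypothetical branch-vs-trunk shifts
gamma = np.zeros(2)                      # tumor-level variation set to zero for clarity

q_trunk = signature_proportions(alpha, beta, gamma, g=0)
q_branch = signature_proportions(alpha, beta, gamma, g=1)

# The difference in log-ratios between groups recovers beta.
shift = np.log(q_branch[1:] / q_branch[0]) - np.log(q_trunk[1:] / q_trunk[0])
print("trunk proportions: ", np.round(q_trunk, 3))
print("branch proportions:", np.round(q_branch, 3))
print("recovered shift:   ", shift)
```

With `gamma` set to zero, the recovered shift equals `beta`, which is why a test of $\beta_k = 0$ is a test of whether signature $k$ differs between branch and trunk relative to the first signature.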
Results: Two-step Method vs. HiLDA

Branch - Trunk   Two-step Method       HiLDA
                 2nd sig   3rd sig     2nd sig   3rd sig
Coefficient      -0.786    0.984       -0.795    3.417
SE               0.152     0.587       0.179     1.424
P value          <0.001    0.094                 0.016

Under the new model (HiLDA), which accounts for uncertainty in the proportions, the new signature (3rd signature) appears significantly more often in the branch mutations (𝑝=0.016), whereas the two-step method gives 𝑝=0.094.
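The slide does not say how the p-values were computed. As a sanity check only, the sketch below assumes a two-sided Wald z-test (coefficient divided by SE, compared with a standard normal) and applies it to the coefficients and standard errors reported above. The function name `wald_p_value` and the dictionary layout are hypothetical.

```python
import math

def wald_p_value(coef, se):
    """Two-sided p-value for H0: coefficient = 0 under a normal approximation."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

# (coefficient, SE) pairs as reported on the slide.
reported = {
    ("two-step", "2nd sig"): (-0.786, 0.152),
    ("two-step", "3rd sig"): (0.984, 0.587),
    ("HiLDA", "2nd sig"): (-0.795, 0.179),
    ("HiLDA", "3rd sig"): (3.417, 1.424),
}

p_values = {key: wald_p_value(c, s) for key, (c, s) in reported.items()}
for (method, sig), p in p_values.items():
    print(f"{method:9s} {sig}: p = {p:.3g}")
```

Under this assumption the 3rd-signature values come out close to the reported 0.094 (two-step) and 0.016 (HiLDA), and both 2nd-signature values fall below 0.001.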