Distribution of Passenger Mutations in Exponentially Growing Wave 0 Cancer Population

Yifei Chen¹; Mentor: Rick Durrett²
¹ Duke University ’13; ² Department of Mathematics, Duke University

Summary

1. We studied an exponentially growing population of type 0 cancer cells with mutations following the infinite sites model and derived equations for the site frequency spectrum for the population and for a sample of size n.
2. We used Matlab to produce simulations to test the theoretical predictions about the site frequency spectrum.
3. The fit of the mathematical model to the simulated data is generally very good, except for small x in the population frequency and for m = 1 in the site frequency spectrum of the sample.

Introduction

Cancer cells accumulate mutations that confer selective advantages in the microenvironment of the body, so they outcompete normal cells and grow uncontrollably, invading neighboring tissues and metastasizing to distant locations. Somatic mutations that give cancer cells increased fitness are called driver mutations, while those that do not affect fitness are called passenger mutations. Identifying driver mutations in order to find treatments is an important goal of cancer research. Current investigations involve sequencing and analyzing mutations in a set of samples of a particular cancer. However, since many genes are sequenced, it is difficult to tell whether a mutation is a driver or a passenger. A better understanding of the frequency distribution of passenger mutations will help to distinguish between the two types.
To investigate the frequencies of neutral mutations in the branching process under the infinite sites model, it is only necessary to look at those that occur in Y_0(t). In the Yule process, the waiting times for mutations and for branching of lineages are both exponential, so the number of mutations that occur while there are a given number of lineages has a shifted geometric distribution with p = γ/(γ+ν), where ν is the passenger mutation rate. Also, in a Yule process the limiting fraction of the population descended from an individual has a beta distribution. Using these properties, we derived the site frequency spectrum for the population, F(x), and for a sample of size n, Eη_{n,m}.

The mathematical model predicts that many mutations occur in a very small portion of the population, with few occurring in more than 20% of the population. This is expected because genealogies in exponentially growing populations tend to be star-shaped, so a mutation present in a large proportion of the population would have to have occurred very early on.

Results for Site Frequency Spectrum of Population

To simulate the genealogy of an exponentially growing population of cancer cells, we used Matlab’s built-in uniform (0, 1) random number generator to create distinct lineages, since the fraction of individuals descended from one half of a branching lineage is uniformly distributed. The level of the branching is determined by the number of breakpoints, k, which create k+1 distinct lineages; the interval between adjacent breakpoints gives the proportion of the population that belongs to that lineage. At each level, the number of mutations has a shifted geometric distribution with p = γ/(γ+ν), so to simulate the accumulation of passenger mutations in the population we used Matlab’s built-in geometric random number generator.
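The construction described above can be sketched in a few lines. The poster's simulations were written in Matlab; the version below is a Python stand-in, not the original code. It assumes that each mutation arising at level k falls on one of the k lineages chosen uniformly at random, and it draws the shifted geometric count by inverse transform. All function names are illustrative. The printed comparison uses ν/(γx), which matches the F(x) = 0.01/(0.01*x) curve quoted for γ = ν = 0.01; reading the numerator as ν is an inference, not something the text states.

```python
import bisect
import math
import random


def shifted_geometric(p, rng):
    """Number of failures before the first success (support 0, 1, 2, ...)."""
    u = 1.0 - rng.random()          # uniform on (0, 1], avoids log(0)
    return int(math.log(u) / math.log(1.0 - p))


def simulate_mutations(gamma, nu, levels, rng):
    """Return one (left, right) interval of [0, 1) per mutation.

    The interval is the part of the population descended from the lineage
    on which the mutation arose, so right - left is its population frequency.
    """
    p = gamma / (gamma + nu)
    cuts = [0.0, 1.0]               # k lineages <-> k - 1 interior breakpoints
    mutations = []
    for k in range(1, levels + 1):
        for _ in range(shifted_geometric(p, rng)):
            i = rng.randrange(k)    # assumption: uniformly chosen lineage
            mutations.append((cuts[i], cuts[i + 1]))
        bisect.insort(cuts, rng.random())   # next branching adds a breakpoint
    return mutations


def count_above(mutations, x):
    """Number of mutations present in more than a fraction x of the population."""
    return sum(1 for left, right in mutations if right - left > x)


if __name__ == "__main__":
    rng = random.Random(1)
    runs, gamma, nu = 100, 0.01, 0.01
    sims = [simulate_mutations(gamma, nu, 1000, rng) for _ in range(runs)]
    for x in (0.05, 0.1, 0.2, 0.5):
        avg = sum(count_above(m, x) for m in sims) / runs
        print(f"x={x:.2f}  simulated={avg:.2f}  nu/(gamma*x)={nu / (gamma * x):.2f}")
```

Keeping the breakpoints in a sorted list means a lineage is just the gap between two neighbouring entries, so a mutation's frequency can be read off directly as an interval length.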
The simulation records where each mutation occurred on the genealogy, and a histogram is generated to show how many mutations occurred in more than a fraction x of the population. The simulation was run 100 times with γ = 0.01 and ν = 0.01, down to level 1000. The number of mutations in each interval was averaged and graphed along with the expected result, F(x) = 0.01/(0.01·x) = 1/x.

FIGURE 1. Site frequency spectrum results for the population. The count at x includes all mutations with frequency greater than x.

References

Durrett, R. Population genetics of neutral mutations in exponentially growing cancer cell populations. In preparation.

Abstract

We examined the occurrence of neutral or “passenger” mutations in an exponentially growing population of type 0 cancer cells. Assuming mutations follow the infinite sites model, we derived equations for the site frequency spectrum for the population and for a sample of size n. These depend on ν, the rate of passenger mutations per cell division; γ, the rate at which families that do not die out arise; x, the fraction of the population; and m, the number of individuals in the sample with the mutation. To test these equations, we used Matlab to simulate the population growth and mutations. Afterwards, a sample of individuals was taken at random from the population, and the site frequency spectrum was computed for the sample and for the population as a whole. The simulated data fit the derived equations well, except for small x in the population frequency and the m = 1 case of the site frequency spectrum of the sample. This observation prompted the derivation of a more accurate formula for that case (not shown here).
Mathematical Model: Site Frequency Spectrum for Type 0 Cells

The cancer cells are modeled by a branching process Z_0(t) in which individuals give birth at rate a_0 and die at rate b_0 < a_0. When Z_0(t) is conditioned not to die out and Y_0(t) is the number of individuals whose families do not die out, Y_0(t) is a Yule process in which births occur at rate γ = λ_0/a_0, where λ_0 = a_0 - b_0.

F(x) is the expected number of mutations present in more than a fraction x of the population
Eη_{n,m} is the expected number of sites in a sample of size n with m mutants
ν is the passenger mutation rate
γ is the rate of branching/birth of a new family that does not die out
Nγ is the size of the population of the Yule process

Results for Site Frequency Spectrum of Sample Size n

The simulation begins with the same steps as for the population frequency. Then Matlab samples n individuals from the population and counts how many, m, of those individuals have each mutation. The simulation was run 100 times with γ = 0.01 and ν = 0.01, to level 1000, with a sample of size 10. The number of sites at which m individuals in the sample carry the mutation was averaged and plotted along with the expected value Eη_{n,m} at each m.

From Figure 2, it is evident that for m > 1 the fit between theory and simulation is nearly perfect. However, for m = 1, the formula for Eη_{n,m} predicts many more singletons than are observed. After this work was completed, an improved formula was derived, which predicts 36.6 for m = 1. As seen in Figure 1, the formula for F(x) fits the simulated data very well except for small values of x.

FIGURE 2. Site frequency spectrum results for a sample of size 10.
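The sampling step can be illustrated with a short, self-contained Python sketch (again a stand-in for the Matlab code, with illustrative function names). It represents the population as the interval [0, 1), draws n sampled individuals as uniform points, and counts how many fall inside the interval carried by each mutation. Two things here are assumptions rather than statements from the text: mutations are placed on a uniformly chosen lineage at each level, and the comparison value nν/(γ·m(m−1)) for 2 ≤ m < n is what the geometric and beta properties of the Yule process appear to imply, since the poster's own equation is not reproduced in the text.

```python
import bisect
import math
import random
from collections import Counter


def simulate_mutations(gamma, nu, levels, rng):
    """Intervals (left, right) of [0, 1) carried by each passenger mutation."""
    p = gamma / (gamma + nu)
    cuts, mutations = [0.0, 1.0], []
    for k in range(1, levels + 1):
        # shifted geometric draw on 0, 1, 2, ... via inverse transform
        n_mut = int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
        for _ in range(n_mut):
            i = rng.randrange(k)        # assumption: uniformly chosen lineage
            mutations.append((cuts[i], cuts[i + 1]))
        bisect.insort(cuts, rng.random())   # branching adds one breakpoint
    return mutations


def sample_spectrum(mutations, n, rng):
    """Counter mapping m -> number of mutations carried by exactly m of n sampled cells."""
    cells = sorted(rng.random() for _ in range(n))   # positions of sampled cells
    spectrum = Counter()
    for left, right in mutations:
        # number of sampled cells c with left <= c < right
        m = bisect.bisect_left(cells, right) - bisect.bisect_left(cells, left)
        if m > 0:
            spectrum[m] += 1
    return spectrum


if __name__ == "__main__":
    rng = random.Random(2)
    runs, gamma, nu, n = 100, 0.01, 0.01, 10
    total = Counter()
    for _ in range(runs):
        total.update(sample_spectrum(simulate_mutations(gamma, nu, 1000, rng), n, rng))
    print(f"m=1  simulated={total[1] / runs:.2f}")
    for m in range(2, n):
        assumed = n * nu / (gamma * m * (m - 1))     # assumed closed form, 2 <= m < n
        print(f"m={m}  simulated={total[m] / runs:.2f}  assumed={assumed:.2f}")
```

Sorting the sampled positions once lets each mutation's count m be found with two binary searches instead of a scan over the whole sample.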