The coalescent with recombination (Chapter 5, Part 1)

1 The coalescent with recombination (Chapter 5, Part 1)

2 Six Assumptions of Wright-Fisher Model
1. Discrete and non-overlapping generations
2. Haploid individuals or two subpopulations
3. The population size is constant
4. All individuals are equally fit
5. The population has no geographical or social structure
6. The genes are not recombining
Labels on the slide: No need to be relaxed / Have been relaxed in Chapter 4 / To be relaxed soon
2019/1/13 Comp 790-Coalescent with recombination

3 No recombination: the last assumption
The last assumption that needs to be relaxed.
Why does it need to be relaxed?
Recombination occurs in most real data sets.
Why is it the last one to be relaxed?
It is more mathematically complex to analyze.
The sampled sequences are no longer related by a tree, but by a graph or a collection of trees.

4 Outline
What is recombination?
An example of recombination
Hudson’s model of recombination
Wright-Fisher model with recombination
ARG
Simulation Algorithm

5 What is recombination?
Recall the slides in lecture 5.
Recombination: a process in which new gene combinations are introduced.
E.g. crossover, gene conversion

6 What is the result of recombination?
[Figure with panels "No recombination" and "Recombination"; layers labeled Grandparents, Parents, Children.]

7 An example of recombination
The Apolipoprotein E gene
31 different haplotypes (rows)
21 segregating sites (columns)
Some pairs of sites cannot be fitted on a single tree.
There must be recombination.
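One standard way to find pairs of sites that cannot be fitted on a single tree is the four-gamete test, which assumes each site mutated only once (infinite sites). The slides do not name the method they used, so treat this as an illustrative sketch; the function name and the toy haplotypes are hypothetical and are not the Apolipoprotein E data.

```python
from itertools import combinations

def incompatible_pairs(haplotypes):
    """Return index pairs (i, j) of biallelic sites showing all four gametes.

    haplotypes: list of equal-length strings over the alleles '0' and '1'.
    Under the infinite-sites assumption, a pair of sites carrying all four
    combinations 00, 01, 10, 11 cannot be explained by one tree.
    """
    n_sites = len(haplotypes[0])
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs

# Hypothetical toy data: 4 haplotypes (rows), 3 segregating sites (columns).
toy = ["000", "010", "100", "110"]
print(incompatible_pairs(toy))  # [(0, 1)]
```

Only sites 0 and 1 show all four gametes in the toy data, so only that pair is reported as incompatible.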

8 Pair-wise LD measure
LD is an indirect measure of the correlation of genealogical trees for different segregating sites.
The higher the LD, the more correlated the pair of sites.
The color denotes the significance.
There is a weak tendency for highly significant LD to be found between close sites.
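The slides do not say which LD statistic was plotted. As one common choice, the squared correlation r² between two biallelic sites can be computed from haplotype data as sketched below; the function name and the '0'/'1' allele coding are assumptions made for illustration.

```python
def r_squared(haplotypes, i, j):
    """Squared correlation r^2 between biallelic sites i and j.

    haplotypes: list of equal-length strings over the alleles '0' and '1'.
    Both sites must be segregating, otherwise the denominator is zero.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n    # frequency of allele 1 at site i
    p_b = sum(h[j] == "1" for h in haplotypes) / n    # frequency of allele 1 at site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                              # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical toy data: fully linked sites versus independent sites.
print(r_squared(["00", "00", "11", "11"], 0, 1))  # 1.0
print(r_squared(["00", "01", "10", "11"], 0, 1))  # 0.0
```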

9 LD at different distances
LD is smaller the further apart the sites are.
Recombination leads to this pattern.
Sites far apart experience more recombination events.

10 A summary of the example
We cannot use the previous model without recombination to fit these sequences.
Recombination is the cause.
Recombination can generate incompatibilities between pairs of sites.
Segregating sites far apart experience more recombination events, so they become less correlated.

11 Hudson’s model of recombination
Forward perspective:
The parental chromosome is inherited directly from the grandparental chromosomes A and B.
Choose a random point uniformly along the chromosome.
Copy the genetic material from chromosome A to the left of that point.
Copy the genetic material from chromosome B to the right of that point.
[Figure: chromosomes A and B and the resulting recombinant.]
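The forward step can be sketched in a few lines. This is a minimal illustration, assuming chromosomes are equal-length strings and that the breakpoint falls between two adjacent positions; the function name `recombine` is hypothetical.

```python
import random

def recombine(chrom_a, chrom_b, rng=random):
    """Return (child, point): material left of point from A, right of it from B.

    The breakpoint is drawn uniformly from the len - 1 gaps between positions,
    so both parents contribute at least one position.
    """
    if len(chrom_a) != len(chrom_b) or len(chrom_a) < 2:
        raise ValueError("need two equal-length chromosomes of length >= 2")
    point = rng.randint(1, len(chrom_a) - 1)   # inclusive on both ends
    return chrom_a[:point] + chrom_b[point:], point

child, point = recombine("AAAAAA", "BBBBBB", random.Random(1))
print(child, point)
```

Passing a seeded `random.Random` instance keeps the example reproducible without touching the global generator.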

12 Hudson’s model of recombination (cont.)
Reversed:
Choose a chromosome from a parent.
The chromosome splits into two grandparental chromosomes.
[Figure: Recombination]

13 Modeling recombination and coalescence
Recombination events are the opposite of coalescent events.
Looking backwards:
Coalescence is a combining event.
Recombination is a splitting event.
But how can we model both of these events?
Use an idea similar to the one we used before (when adding mutation events to the coalescent).
Question 1: What is this idea?

14 Another exponential distribution
We model the waiting time until a recombination event as exponentially distributed.
This distribution is independent of the coalescent process.
The parameter (the intensity of recombination) depends on the recombination rate ρ of a sequence multiplied by the number of ancestral lineages.

15 From Hudson’s model to Wright-Fisher model
Hudson’s model simplifies the recombination process relative to the biological facts.
The mechanisms of recombination are very different and complicated in eukaryotes, bacteria, and viruses.
The process is still not very well understood at the molecular level.
Even so, the model forms the basis for most applications of coalescent theory to recombining sequences.
Now we modify the Wright-Fisher model to include this simplified model of recombination.

16 Wright-Fisher model with recombination
Diploid Wright-Fisher Model
An individual perspective

17 Wright-Fisher model with recombination (cont.)
Haploid Wright-Fisher Model
We can ignore the existence of individuals under some conditions.
A sequence perspective

18 Discrete time formulation
In the discrete model, let r be the recombination rate, i.e. the probability per generation that a sequence was created by recombination.
T_R denotes the number of generations until the first recombination event.
The probability that a sequence was created by recombination j generations ago is
P(T_R = j) = r(1 − r)^(j − 1)
T_R is geometrically distributed.

19 Continuous time approximation
Let the scaled recombination rate be ρ = 4Nr, similar to θ for mutation.
Measure time in units of 2N generations, j = 2Nt.
The scaled waiting time T_R/(2N) is then approximately exponentially distributed with parameter ρ/2.
Note that the probability so far is for only one sequence.
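The step from the geometric waiting time to the exponential one can be spelled out as follows. This is a sketch of the usual limit argument, using only r, N, ρ = 4Nr and the time scaling j = 2Nt from the slides.

```latex
\begin{align*}
P(T_R > j) &= (1 - r)^{j}
  && \text{(no recombination in $j$ generations)} \\
P(T_R > 2Nt) &= \left(1 - \frac{\rho}{4N}\right)^{2Nt}
  && \text{(substituting $j = 2Nt$ and $r = \rho/(4N)$)} \\
&\longrightarrow e^{-\rho t/2} \quad \text{as } N \to \infty
  && \text{(since $(1 - x/m)^{m} \to e^{-x}$)}
\end{align*}
```

The limit is the survival function of an Exp(ρ/2) variable, which is the rate used for a single sequence.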

20 Continuous time approximation (cont.)
If there are k sequences, the parameter of the exponential distribution is kρ/2.
Question 2: Why?
The waiting times until recombination in the individual sequences are each exponentially distributed, i.e. Exp(ρ/2), and are independent.
The intensity of recombination in any of the k sequences equals the sum of the intensities in each sequence.
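A quick numerical check of this claim: the minimum of k independent Exp(ρ/2) waiting times should have mean 2/(kρ), the mean of an Exp(kρ/2) variable. The sketch below uses only the standard library; the function names, the number of draws and the seed are arbitrary choices for illustration.

```python
import random

def first_recombination_time(k, rho, rng):
    """Time until the first recombination among k independent sequences."""
    return min(rng.expovariate(rho / 2) for _ in range(k))

def mean_first_time(k, rho, n_draws=20000, seed=0):
    """Monte Carlo estimate of the mean waiting time to the first recombination."""
    rng = random.Random(seed)
    total = sum(first_recombination_time(k, rho, rng) for _ in range(n_draws))
    return total / n_draws

k, rho = 5, 2.0
print("simulated mean:", mean_first_time(k, rho))
print("mean of Exp(k*rho/2):", 2 / (k * rho))
```

The two printed values should agree to within Monte Carlo error.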

21 Continuous time approximation (cont.)
Again, the coalescence and recombination events among k sequences are independent, and their waiting times are exponentially distributed.
The waiting time until the first of these events is Exp(k(k − 1)/2 + kρ/2).
The probability that the first event is a coalescence is
[k(k − 1)/2] / [k(k − 1)/2 + kρ/2] = (k − 1)/(k − 1 + ρ)
The probability that it is a recombination is
[kρ/2] / [k(k − 1)/2 + kρ/2] = ρ/(k − 1 + ρ)

22 ARG Simulation algorithm
1. Start with k = n genes.
2. For k sequences with ancestral material, draw a random number from the exponential distribution with parameter k(k − 1)/2 + kρ/2. This is the time to the next event.
3. With probability (k − 1)/(k − 1 + ρ) the event is a coalescence event, otherwise it is a recombination event.
4. If it is a coalescence event, choose two of the ancestral sequences at random and merge them into one sequence that inherits the ancestral material of both. Decrease k by one. If k = 1 end the process, otherwise go to step 2.

23 ARG Simulation algorithm (cont.)
5. If it is a recombination event, draw a random sequence and a random point on the sequence. Create one ancestor sequence with the ancestral material to the left of the chosen point and a second ancestor with the ancestral material to the right of the recombination point. Increase the number of ancestral sequences k by one and go to step 2.
Question 3: Where can we find the missing material of the ancestors?
[Figure: a sequence splitting at a random point.]
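Steps 1 to 5 can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not a reference implementation: the sequence is modeled as the unit interval [0, 1), each lineage stores its ancestral material as a list of half-open intervals, every lineage counts towards k even if a split leaves it with no ancestral material, and all function names are hypothetical.

```python
import random

def merge(a, b):
    """Union of two lists of half-open intervals (start, end)."""
    out = []
    for start, end in sorted(a + b):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out

def split(material, point):
    """Split ancestral material into the parts left and right of point."""
    left = [(s, min(e, point)) for s, e in material if s < point]
    right = [(max(s, point), e) for s, e in material if e > point]
    return left, right

def simulate_arg(n, rho, seed=None):
    """Return (total time, event list, ancestral material of the last lineage)."""
    rng = random.Random(seed)
    lineages = [[(0.0, 1.0)] for _ in range(n)]               # step 1: k = n
    time, events = 0.0, []
    while len(lineages) > 1:
        k = len(lineages)
        time += rng.expovariate(k * (k - 1) / 2 + k * rho / 2)  # step 2
        if rng.random() < (k - 1) / (k - 1 + rho):            # step 3
            i, j = rng.sample(range(k), 2)                    # step 4: coalescence
            merged = merge(lineages[i], lineages[j])
            lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
            events.append((time, "coalescence"))
        else:
            i = rng.randrange(k)                              # step 5: recombination
            left, right = split(lineages.pop(i), rng.random())
            lineages += [left, right]
            events.append((time, "recombination"))
    return time, events, lineages[0]

total_time, events, material = simulate_arg(n=5, rho=1.0, seed=42)
print(total_time, len(events), material)
```

Because splits partition the ancestral material and merges take unions, the single remaining lineage always carries the whole interval. For large ρ the loop can run for a long time, so small values are advisable when experimenting.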

24 Is the single ancestor ever reached?
A coalescence event decreases k by one.
A recombination event increases k by one.
Question 4: Is there an end for the process?
YES! Why?
It is a birth-death process.
The coalescence intensity is k(k − 1)/2 [death rate: it removes a lineage].
The recombination intensity is kρ/2 [birth rate: it adds a lineage].
k(k − 1)/2 ≥ kρ/2 whenever k ≥ ρ + 1, because the death rate grows quadratically in k while the birth rate grows only linearly.
The GMRCA is always found.
But it may take a LONG time.

25 Genealogical structure: From tree to graph
With recombination, we must use a graph rather than a tree to model the relations among the sequences.
ARG (Ancestral Recombination Graph): the graph resulting from the algorithm.

26 Genealogical structure: From graph to a collection of trees
However, if we focus on a single point on the sequence, there will be no recombination!
Question 5: Why?
A given point of a child sequence is always inherited from only one parent sequence.
Local tree: the tree relating the sequences at a single position.
The genealogical graph can be seen as a collection of local trees, one for each position.

27 Next time
More on the simulation algorithm
Effect of a single recombination event
Coalescent events with gene conversion

