The coalescent with recombination (Chapter 5, Part 1)

Six assumptions of the Wright-Fisher model
1. Discrete and non-overlapping generations.
2. Haploid individuals or two subpopulations.
3. The population size is constant.
4. All individuals are equally fit.
5. The population has no geographical or social structure.
6. The genes are not recombining.
Each assumption falls into one of three groups: no need to be relaxed, already relaxed in Chapter 4, or to be relaxed soon.
2019/1/13 Comp 790-Coalescent with recombination

No recombination: the last assumption
This is the last assumption that needs to be relaxed.
Why does it need to be relaxed? Recombination occurs in most real data sets.
Why is it relaxed last? The analysis is mathematically more complex: the sampled sequences are no longer related by a single tree, but by a graph, or equivalently by a collection of trees.

Outline
What is recombination?
An example of recombination
Hudson’s model of recombination
Wright-Fisher model with recombination
ARG simulation algorithm

What is recombination?
Recall the slides in lecture 5.
Recombination is a process in which new gene combinations are introduced, e.g. crossover and gene conversion.

What is the result of recombination?
[Figure: two panels, "No recombination" and "Recombination", each tracing genetic material from a grandparents layer through a parents layer to a children layer.]

An example of recombination: the Apolipoprotein E gene
The data set contains 31 different haplotypes (rows) and 21 segregating sites (columns). Some pairs of sites cannot be fitted on a single tree, so there must have been recombination.
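
The slide does not say how incompatible pairs are detected. A common criterion is the four-gamete test: under the infinite-sites assumption, a pair of biallelic sites can be fitted on one tree only if at most three of the four gametes 00, 01, 10, 11 are observed. The sketch below assumes sites coded as '0'/'1'; the function name `incompatible_pairs` and the toy data are hypothetical, not the ApoE haplotypes.

```python
from itertools import combinations


def incompatible_pairs(haplotypes):
    """Return site pairs (i, j) that fail the four-gamete test.

    Assumes biallelic sites coded '0'/'1' and the infinite-sites model,
    under which a pair of sites fits a single tree only if at most three
    of the four possible gametes (00, 01, 10, 11) are observed.
    """
    n_sites = len(haplotypes[0])
    bad = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            bad.append((i, j))
    return bad


# Hypothetical toy data: rows are haplotypes, columns are segregating sites.
toy = ["000", "011", "111", "100"]
print(incompatible_pairs(toy))  # → [(0, 1), (0, 2)]
```

Sites 1 and 2 show only the gametes 00 and 11, so they are compatible; the other two pairs show all four gametes and need recombination (or repeated mutation) to be explained.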

Pairwise LD measure
LD (linkage disequilibrium) is an indirect measure of the correlation between the genealogical trees at different segregating sites: the higher the LD, the more correlated the pair of sites. In the plot, the color denotes significance. There is a weak tendency for highly significant LD to be found between sites that are close together.
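
The slide does not state which LD statistic the plot uses. One widely used pairwise measure is r², the squared correlation of allelic states at two sites, computed from D = p_ij − p_i·p_j. The sketch below assumes '0'/'1'-coded haplotypes and two segregating sites (otherwise the denominator is zero); `r_squared` is a hypothetical helper name.

```python
def r_squared(haplotypes, i, j):
    """r^2 linkage disequilibrium between biallelic sites i and j ('0'/'1' coded).

    Both sites must be segregating (allele frequencies strictly between 0 and 1).
    """
    n = len(haplotypes)
    p_i = sum(h[i] == "1" for h in haplotypes) / n
    p_j = sum(h[j] == "1" for h in haplotypes) / n
    p_ij = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ij - p_i * p_j  # the classical LD coefficient D
    return d * d / (p_i * (1 - p_i) * p_j * (1 - p_j))


print(r_squared(["00", "11", "00", "11"], 0, 1))  # perfectly correlated sites → 1.0
print(r_squared(["00", "01", "10", "11"], 0, 1))  # independent sites → 0.0
```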

LD versus distance
LD is smaller the further apart the sites are. Recombination causes this pattern: sites far apart experience more recombination events between them.

A summary of the example
The previous model without recombination cannot fit these sequences; recombination is the cause.
Recombination can generate incompatibilities between pairs of sites.
Segregating sites far apart experience more recombination events, so they become less correlated.

Hudson’s model of recombination
Forward perspective: a parental chromosome is inherited directly from two grandparental chromosomes, A and B.
Choose a point uniformly at random along the sequence.
Copy the genetic material of chromosome A to the left of that point.
Copy the genetic material of chromosome B to the right of that point.
[Figure: chromosomes A and B combined by recombination.]
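
The forward step above can be sketched in a few lines. This is a minimal illustration, assuming sequences are strings of discrete sites, so the "uniform point" is drawn among the gaps between adjacent sites rather than on a continuous line; `hudson_recombinant` is a hypothetical name.

```python
import random


def hudson_recombinant(chrom_a, chrom_b, rng=random):
    """Forward-in-time recombinant from a single crossover point.

    Sequences are strings of equal length L >= 2; the breakpoint is drawn
    uniformly among the L - 1 gaps between adjacent sites.
    """
    if len(chrom_a) != len(chrom_b) or len(chrom_a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    point = rng.randint(1, len(chrom_a) - 1)
    # Material left of the point comes from A, material right of it from B.
    return chrom_a[:point] + chrom_b[point:], point


child, point = hudson_recombinant("AAAAAAAA", "BBBBBBBB", random.Random(7))
print(point, child)
```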

Hudson’s model of recombination (cont.)
Reversed (looking backwards in time): choose a chromosome from a parent; the chromosome splits into two grandparental chromosomes.

Modeling recombination and coalescence
Looking backwards, recombination events are the opposite of coalescent events: coalescence is a combining event, while recombination is a splitting event.
But how can we model both kinds of events? Use an idea similar to the one we used before when adding mutation events to the coalescent.
Question 1: What is this idea?

Another exponential distribution
We model the waiting time until a recombination event as exponentially distributed, independently of the coalescent process. Its parameter (the intensity of recombination) is proportional to the recombination rate ρ of the sequence times the number of ancestral lineages; with k lineages it is kρ/2.

From Hudson’s model to the Wright-Fisher model
Hudson’s model simplifies the biology of recombination. The mechanisms of recombination differ considerably between eukaryotes, bacteria and viruses, are complicated, and are still not well understood at the molecular level. Nevertheless, Hudson’s model forms the basis for most applications of coalescent theory to recombining sequences.
We now modify the Wright-Fisher model to include this simplified model of recombination.

Wright-Fisher model with recombination
[Figure: diploid Wright-Fisher model, from an individual perspective.]

Wright-Fisher model with recombination (cont.)
[Figure: haploid Wright-Fisher model, from a sequence perspective.]
Under some conditions we can ignore the existence of individuals and follow sequences only.

Discrete time formulation
In the discrete model, let r be the recombination rate per sequence per generation, and let T_R denote the number of generations until the first recombination event. The probability that a sequence was created by recombination j generations ago, with no recombination in the generations before, is
$P(T_R = j) = (1 - r)^{j-1} r, \quad j = 1, 2, \ldots$
so T_R is geometrically distributed.

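Continuous time approximation
Let the scaled recombination rate be ρ = 4Nr, analogous to θ for mutation. Measuring time in units of 2N generations, j = 2Nt, and
$P(T_R > j) = (1 - r)^{j} \approx e^{-rj} = e^{-r \cdot 2Nt} = e^{-\rho t/2},$
so the scaled waiting time is approximately exponentially distributed with parameter ρ/2.
Note that so far this is the probability for only one sequence.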
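
A quick numerical check of the approximation: compare the exact geometric survival probability (1 − r)^j with e^{−ρt/2} at j = 2Nt. The values N = 10,000 and r = 10⁻⁴ (so ρ = 4) are hypothetical, chosen only to illustrate a large population with a small per-generation rate; `survival_comparison` is a hypothetical helper.

```python
import math


def survival_comparison(N, r, times):
    """Compare exact P(T_R > j) = (1 - r)^j with exp(-rho*t/2), where j = 2Nt and rho = 4Nr."""
    rho = 4 * N * r
    rows = []
    for t in times:
        j = round(2 * N * t)  # generations corresponding to scaled time t
        exact = (1 - r) ** j
        approx = math.exp(-rho * t / 2)
        rows.append((t, exact, approx))
    return rows


# Hypothetical values: N = 10,000, r = 1e-4 per generation, so rho = 4.
for t, exact, approx in survival_comparison(10_000, 1e-4, [0.1, 0.5, 1.0, 2.0]):
    print(f"t={t:.1f}  exact={exact:.5f}  exponential={approx:.5f}")
```

The two columns agree closely because log(1 − r) ≈ −r when r is small; the approximation degrades if r is not small.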

Continuous time approximation (cont.)
If there are k sequences, the parameter of the exponential distribution is kρ/2.
Question 2: Why? The waiting times until recombination for the individual sequences are independent and exponentially distributed, each Exp(ρ/2). The minimum of independent exponential variables is again exponential, with intensity equal to the sum of their intensities, so the intensity of recombination in any of the k sequences is k · ρ/2.
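
The superposition argument can be checked by simulation: draw one Exp(ρ/2) waiting time per lineage, keep the minimum, and compare the sample mean with the mean 2/(kρ) of Exp(kρ/2). The values k = 4 and ρ = 3 and the helper name are hypothetical.

```python
import random
import statistics


def first_recombination_time(k, rho, rng):
    """Minimum of k independent Exp(rho/2) waiting times (one per lineage)."""
    return min(rng.expovariate(rho / 2) for _ in range(k))


rng = random.Random(0)
k, rho = 4, 3.0
samples = [first_recombination_time(k, rho, rng) for _ in range(100_000)]
# Theory: the minimum is Exp(k * rho / 2), whose mean is 2 / (k * rho).
print(statistics.mean(samples), 2 / (k * rho))
```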

Continuous time approximation (cont.)
Again, coalescence and recombination events in k sequences occur independently, with exponentially distributed waiting times. The waiting time until one of these events occurs is
$\mathrm{Exp}\left(\frac{k(k-1)}{2} + \frac{k\rho}{2}\right).$
The probability that the first event is a coalescence is
$\frac{k(k-1)/2}{k(k-1)/2 + k\rho/2} = \frac{k-1}{k-1+\rho},$
and the probability that it is a recombination is
$\frac{k\rho/2}{k(k-1)/2 + k\rho/2} = \frac{\rho}{k-1+\rho}.$
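
The competing-exponentials argument can be illustrated by racing two clocks: a coalescence clock with rate k(k − 1)/2 and a recombination clock with rate kρ/2. The fraction of races won by coalescence should approach (k − 1)/(k − 1 + ρ). The values k = 5, ρ = 2 and the helper name are hypothetical; the sketch requires k ≥ 2 and ρ > 0 so both rates are positive.

```python
import random


def first_event_is_coalescence(k, rho, rng):
    """Race a coalescence clock against a recombination clock; True if coalescence wins."""
    t_coal = rng.expovariate(k * (k - 1) / 2)
    t_rec = rng.expovariate(k * rho / 2)
    return t_coal < t_rec


rng = random.Random(1)
k, rho = 5, 2.0
trials = 100_000
freq = sum(first_event_is_coalescence(k, rho, rng) for _ in range(trials)) / trials
print(freq, (k - 1) / (k - 1 + rho))  # simulated vs (k-1)/(k-1+rho) = 2/3
```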

ARG simulation algorithm
1. Start with k = n genes.
2. For k sequences with ancestral material, draw a random number from the exponential distribution with parameter k(k − 1)/2 + kρ/2. This is the time to the next event.
3. With probability (k − 1)/(k − 1 + ρ) the event is a coalescence event; otherwise it is a recombination event.
4. If it is a coalescence event, choose two of the ancestral sequences at random and merge them into one sequence that carries the ancestral material of both. Decrease k by one. If k = 1, end the process; otherwise go to step 2.

ARG simulation algorithm (cont.)
5. If it is a recombination event, choose a random sequence and a random point on it. Create one ancestor sequence carrying the ancestral material to the left of the chosen point and a second ancestor carrying the ancestral material to the right of that point. Increase the number of ancestral sequences k by one and go to step 2.
Question 3: Where can we find the missing material of the ancestors?
[Figure: a sequence splitting at a random point.]
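
The steps above can be sketched as a short simulation. This is a minimal sketch under stated assumptions: the sequence is the continuous interval [0, 1), each lineage's ancestral material is a list of half-open intervals, and the function names `merge`, `split` and `simulate_arg` are hypothetical. Following the slides literally, both ancestors of a recombination are kept even when one of them receives no ancestral material; simulators that track only lineages carrying ancestral material would drop such empty ancestors. The sketch records only event counts, the time to k = 1 and the final material, not the full graph.

```python
import random


def merge(a, b):
    """Union of two lists of half-open intervals [start, end) on [0, 1)."""
    out = []
    for s, e in sorted(a + b):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def split(material, x):
    """Split ancestral material at point x into left [.., x) and right [x, ..) parts."""
    left = [(s, min(e, x)) for s, e in material if s < x]
    right = [(max(s, x), e) for s, e in material if e > x]
    return left, right


def simulate_arg(n, rho, seed=None):
    rng = random.Random(seed)
    lineages = [[(0.0, 1.0)] for _ in range(n)]  # step 1: k = n genes
    t, n_coal, n_rec = 0.0, 0, 0
    while len(lineages) > 1:
        k = len(lineages)
        # Step 2: time to the next event.
        t += rng.expovariate(k * (k - 1) / 2 + k * rho / 2)
        # Step 3: coalescence with probability (k-1)/(k-1+rho).
        if rng.random() < (k - 1) / (k - 1 + rho):
            # Step 4: merge two random lineages; k decreases by one.
            i, j = rng.sample(range(k), 2)
            merged = merge(lineages[i], lineages[j])
            for idx in sorted((i, j), reverse=True):
                lineages.pop(idx)
            lineages.append(merged)
            n_coal += 1
        else:
            # Step 5: split a random lineage at a uniform point; k increases by one.
            left, right = split(lineages.pop(rng.randrange(k)), rng.random())
            lineages += [left, right]
            n_rec += 1
    return {"time": t, "coalescences": n_coal,
            "recombinations": n_rec, "material": lineages[0]}


print(simulate_arg(5, 2.0, seed=1))
```

Since every coalescence lowers k by one and every recombination raises it by one, the counts always satisfy coalescences − recombinations = n − 1, and the single final ancestor carries all of the ancestral material.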

Is the single ancestor ever reached?
A coalescence event decreases k by one; a recombination event increases k by one.
Question 4: Does the process ever end? Yes. Why? The number of lineages k follows a birth-death process: the recombination intensity kρ/2 is the birth rate and the coalescent intensity k(k − 1)/2 is the death rate. The death rate grows quadratically in k while the birth rate grows only linearly, so k(k − 1)/2 ≥ kρ/2 whenever k ≥ ρ + 1, and k cannot grow without bound. The GMRCA (grand most recent common ancestor) is therefore always found, but it may take a LONG time.
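
The termination argument concerns only the number of lineages, so it can be illustrated with the embedded jump chain of k alone: from k, step down with probability (k − 1)/(k − 1 + ρ) and up otherwise, until k = 1. The helper name and the parameter values are hypothetical; larger ρ makes the walk wander higher and take longer, but it still reaches k = 1.

```python
import random


def lineage_count_walk(n, rho, rng):
    """Embedded jump chain of k: down with prob (k-1)/(k-1+rho), up otherwise.

    Returns (number of events until k == 1, largest k visited).
    """
    k, events, k_max = n, 0, n
    while k > 1:
        if rng.random() < (k - 1) / (k - 1 + rho):
            k -= 1  # coalescence (death)
        else:
            k += 1  # recombination (birth)
        events += 1
        k_max = max(k_max, k)
    return events, k_max


rng = random.Random(2)
for rho in (0.0, 1.0, 10.0):
    print(rho, lineage_count_walk(10, rho, rng))
```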

Genealogical structure: from tree to graph
With recombination, we must use a graph rather than a tree to model the relationships between sequences: the ARG (ancestral recombination graph), which is the graph produced by the algorithm.

Genealogical structure: from graph to a collection of trees
However, if we focus on a single point on the sequence, no recombination is visible.
Question 5: Why? Each point of a child sequence is always inherited from exactly one parent sequence.
Local tree: the tree relating the sequences at a single position.
The genealogy graph can therefore be seen as a collection of local trees, one for each position.
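
The "exactly one parent per point" argument can be made concrete with a toy graph. The ARG below is hypothetical (not from the slides): ordinary nodes point to one parent, and a recombination node stores a breakpoint plus a left parent (positions x < breakpoint) and a right parent (positions x ≥ breakpoint). Following a single position from a sample always yields one path, and collecting these paths gives the local tree at that position.

```python
# Hypothetical toy ARG: s1 descends from a recombination node "rec".
TOY_ARG = {
    "s1": {"parent": "rec"},
    "s2": {"parent": "c1"},
    "s3": {"parent": "c2"},
    "rec": {"breakpoint": 0.4, "left": "c1", "right": "c2"},
    "c1": {"parent": "root"},
    "c2": {"parent": "root"},
    "root": {},
}


def ancestry_path(arg, node, x):
    """Follow the single lineage that carries position x from node to the root."""
    path = [node]
    while arg[node]:  # the root has no parents
        info = arg[node]
        if "breakpoint" in info:
            node = info["left"] if x < info["breakpoint"] else info["right"]
        else:
            node = info["parent"]
        path.append(node)
    return path


print(ancestry_path(TOY_ARG, "s1", 0.2))  # ['s1', 'rec', 'c1', 'root']
print(ancestry_path(TOY_ARG, "s1", 0.7))  # ['s1', 'rec', 'c2', 'root']
```

At x = 0.2, s1 first meets s2 (at c1), giving the local tree ((s1, s2), s3); at x = 0.7 it first meets s3 (at c2), giving ((s1, s3), s2). Different positions can therefore have different local trees.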

Next time
More on the simulation algorithm
The effect of a single recombination event
Coalescent events with gene conversion