1
COALESCENCE AND GENE GENEALOGIES
Population Genetics Silvano Presciuttini
2
Population genetics looks backward
Rapid accumulation of DNA sequence data over the past two decades has substantially changed the perspective of much of population genetics.
The field has seen a shift from a “prospective” view, that of investigating the evolutionary factors involved in changes of allele frequencies, to a “retrospective” view, that of inferring evolutionary events that have occurred in the past.
Understanding the evolutionary causes that have influenced the DNA sequence variation in a sample of individuals, such as the demographic and mutational history of the ancestors of the sample, has become the focus of much population genetics research.
3
Tracing back gene ancestry
Remember that we talked about a paradox when we introduced the distinction between gene copies that are identical by state (IBS) and gene copies that are identical by descent (IBD):
“The paradox of the IBD/IBS distinction is that gene copies identical by state are all, ultimately, also identical by descent, except in cases of convergent evolution of two DNA sequences (homoplasy). Two gene copies that are identical by state may not share a common ancestor if we trace their ancestry back only 20 generations, but they may share a common ancestor if we trace their ancestry back 1000 generations, and neither may have undergone any mutation since they diverged from one another.”
We now introduce an apparently counter-intuitive concept: all extant homologous tracts, sampled from any possible mixture of organisms, however diverse, descend from a single DNA molecule.
4
Coalescence
Consider a sample of homologous tracts from a population, which includes a number of different haplotypes.
If one goes back far enough in time, all haplotypes in the sample coalesce into a single common ancestral haplotype.
This ancestral haplotype was a specific DNA segment that existed in a certain individual; it gave rise to all extant haplotypes by replication, and the different haplotypes were created by mutation events that occurred along some lineages of descent.
The individual that carried the ancestral haplotype may have lived very far in the past, even before the speciation events that led to the present population.
5
Genealogy of a single SNP
Consider a particular site in the genome of a species. All existing copies of this site must be related to each other and to a most recent common ancestor (MRCA) through some form of genealogical tree.
Polymorphism at the site is due to mutations that occurred along the branches of this tree, and the frequency of each sequence variant is determined by the fraction of branches that inherit the variant.
The pattern of polymorphism therefore reflects both the history of the coalescence of lineages, which gives rise to the tree, and the mutational history.
Figure: Polymorphism at a particular site results from mutations (shown here as G→T) along branches of the genealogical tree, which connects sampled copies of the site to their most recent common ancestor (MRCA).
6
The following slides come from “Mike Weale's Seminar 4”, which is part of the 'Introduction to doing genetic history' seminar series of The Center for Genetic Anthropology at University College London.
7
The following series of slides shows how you can build up a genealogical tree to relate a sample of 22 individuals, collected in the present day, at a single locus (e.g. the non-recombining Y chromosome).
Because (for the Y chromosome) one son has only one father, but one father can have more than one son, coalescent events occur in the genealogy, and these inevitably reduce the number of ancestors. Eventually, one ancestor remains: the Most Recent Common Ancestor (MRCA).
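The backward process described above can be sketched in a few lines of Python. This is a minimal illustration and not the method used to draw the slides: it assumes a constant pool of `pop_size` potential fathers and that each lineage picks its father uniformly at random, one generation at a time. The function name and parameter values are hypothetical.

```python
import random


def ancestors_back_in_time(sample_size, pop_size, seed=None):
    """Return the number of distinct ancestors of the sample in each
    generation, going backward in time until a single ancestor remains.

    Assumption (not from the slides): a constant number of potential
    fathers, each equally likely to be picked by every lineage.
    """
    rng = random.Random(seed)
    counts = [sample_size]
    lineages = sample_size
    while lineages > 1:
        # Every surviving lineage picks one father at random; lineages
        # that pick the same father coalesce into one.
        fathers = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(fathers)
        counts.append(lineages)
    return counts


counts = ancestors_back_in_time(sample_size=22, pop_size=100, seed=1)
print("ancestors in the first generations back:", counts[:10])
print("generations back to the MRCA:", len(counts) - 1)
```

The number of ancestors can only stay the same or shrink from one generation to the next, which is why the sample must eventually trace back to a single individual.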
8-27
[Figure sequence: the genealogy is built up backward along the Time axis, from the 22 individuals sampled at the Present, through 18, 16 and 14 ancestors in successive earlier generations, until all lineages meet in the most recent common ancestor (MRCA).]
28
Mutational events can now be added to the genealogical tree, resulting in polymorphic sites. If these sites are typed in the modern sample, they can be used to split the sample into sub-clades (represented by different colours).
29-39
[Figure sequence: mutations are added one at a time to the genealogy, which runs along the Time axis from the most recent common ancestor (MRCA) to the Present. Six haplotypes appear in the present-day sample: TCGAGGTATTAAC, TCTAGGTATTAAC, TCGAGGCATTAAC, TCTAGGTGTTAAC, TCGAGGTATTAGC and TCTAGGTATCAAC. Asterisks in the last slide mark the polymorphic sites.]
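As a small check on the figure, the polymorphic sites can be recovered directly from the six haplotypes shown in the slides. The sketch below simply compares the aligned sequences column by column; the function name is illustrative.

```python
# The six haplotypes that appear in the present-day sample in the slides.
haplotypes = [
    "TCGAGGTATTAAC",
    "TCTAGGTATTAAC",
    "TCGAGGCATTAAC",
    "TCTAGGTGTTAAC",
    "TCGAGGTATTAGC",
    "TCTAGGTATCAAC",
]


def segregating_sites(seqs):
    """Return the 1-based positions at which the aligned sequences differ."""
    return [
        position + 1
        for position, column in enumerate(zip(*seqs))
        if len(set(column)) > 1
    ]


sites = segregating_sites(haplotypes)
print(sites)  # [3, 7, 8, 10, 12]
```

The five positions found this way line up with the pattern of asterisks (`* ** * *`) drawn under the sequences.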
40
If we now imagine that our sample of 22 was in fact drawn from different subpopulations, then allele frequencies will vary between subpopulations, with the degree of difference reflecting the time at which the population split took place. Of course, population splitting is only one model we can think of. Another would be a migration model, where lineages occasionally switch from one subpopulation to another. Note that a population split does not always have to involve just one sub-clade of the tree; in the figure, this is just a limitation of drawing the splitting in only two dimensions.
41
[Figure: the same genealogy and haplotypes, with the present-day sample divided among Population 1, Population 2 and Population 3. Asterisks mark the polymorphic sites; the Time axis runs from the most recent common ancestor (MRCA) to the Present.]
42
Gene trees are not the same as species trees
The basic phylogenetic model relates species to each other through a bifurcating tree. This ‘species tree’ is usually taken to be the estimated genealogy of genes sampled from the different species.
How can this approach be valid if, as we argue, each gene tree is the random outcome of a historical process?
Different genes should give rise to different trees; in fact, even a single gene could have many trees as a result of intragenic recombination.
43
Disagreement about topology
In this example, the species tree and the gene tree disagree about topology.
Before the first species split, a locus is polymorphic with three alleles; because of the founder effect, the blue allele goes into one species and the red and green alleles into the other; at the second species split, the red and green alleles separate.
The result is that the gene genealogy makes species b and c more closely related to each other than a is to b, even though a and b were the last species to separate.
44
Most mutations antedate speciation
In this example, the gene has undergone two mutations in the ancestral species, the first mutation giving rise to the ‘blue’ allele and the second to the ‘green’ allele.
Random genetic drift in association with the two subsequent speciations results in the red allele lineage appearing in species A, the green allele lineage in species B and the blue allele lineage in species C.
Molecular phylogenetics based on the gene sequences will reveal that the red-blue split occurred before the blue-green split, giving the gene tree shown on the right. However, the actual species tree is different.
45
Disagreement about bifurcation time
In this example, the species tree and the gene tree disagree about bifurcation time.
Before the first species split, a locus is polymorphic with two alleles, and they separate at the moment of splitting; during the lifetime of the species ancestral to a and b, a third allele originates by mutation, and the locus remains polymorphic for a long time; then the last two alleles separate at the second species split.
The result is that the gene tree and the species tree have the same topology, but they disagree about the time of splitting.
46
Probability of incongruence in the simplest case
Consider a 3-species tree, and assume that the true speciation history is (A(BC)).
Then three gene genealogies are possible: one correct (a) and two wrong (b and c).
In this case, the probability of incongruence between gene tree and species tree is given by
$P_{\text{incongruence}} = \frac{2}{3}\, e^{-t/(2N)}$,
where t is the time between the two speciation events, measured in number of generations, and N is the population size.
As the ratio t/N increases (either t increases for constant N, or N decreases for constant t), the chance of coalescence in the interval between speciation events increases, and the probability of observing an incongruent gene tree decreases. For small values of t/N, incongruence is more probable.
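To get a feel for how quickly incongruence fades as the interval between speciation events lengthens, the formula can be evaluated for a few values of t/N. This is only a numerical illustration of the expression above; the function name and the chosen ratios are arbitrary.

```python
import math


def p_incongruence(t, N):
    """Probability that a gene tree disagrees with a 3-species tree,
    P = (2/3) * exp(-t / (2N)), with t in generations and N the
    population size, as in the formula on this slide."""
    return (2.0 / 3.0) * math.exp(-t / (2.0 * N))


N = 10_000  # illustrative population size, not taken from the slides
for ratio in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    t = ratio * N
    print(f"t/N = {ratio:4.1f}  ->  P(incongruence) = {p_incongruence(t, N):.3f}")
```

At t/N = 0 the three genealogies are equally likely, so two out of three are wrong; the probability then decays exponentially with t/N.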
47
An example taken from the human lineage
Chen and Li (Am J Hum Genet 2001: 68, 444) obtained gene trees for 53 randomly chosen, non-coding regions in human, gorilla and chimpanzee.
When the 53 autosomal segments were considered together, the neighbor-joining tree supported the Homo-Pan clade with a 100% bootstrap value.
When each segment was considered individually, 31 segments supported the Homo-Pan clade, 10 supported the Homo-Gorilla clade, and 12 supported the Pan-Gorilla clade. This is still strong support for the Homo-Pan clade, but 22 of the 53 gene trees (about 42%) are incongruent.
Using the previous equation, Chen and Li proposed a value for the size of the population in the time interval between the first species split (gorilla vs the Homo-Pan ancestor) and the second split (Homo vs Pan); it came out on the order of hundreds of thousands.
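The step from the observed counts to a population size can be made explicit by inverting the incongruence formula. The sketch below uses only the counts quoted above (31, 10 and 12 segments); it is not Chen and Li's own calculation, and the helper names are hypothetical. Turning the ratio into a value of N requires an independent estimate of t in generations, which is not given on the slide.

```python
import math

# Segment counts quoted on the slide.
homo_pan, homo_gorilla, pan_gorilla = 31, 10, 12
total = homo_pan + homo_gorilla + pan_gorilla
p_obs = (homo_gorilla + pan_gorilla) / total  # observed incongruent fraction


def t_over_2N(p):
    """Invert P = (2/3) * exp(-t / (2N)) to obtain t / (2N)."""
    return -math.log(1.5 * p)


def implied_population_size(t_generations, p):
    """Population size implied by p for an assumed interval length
    t_generations between the two speciation events."""
    return t_generations / (2.0 * t_over_2N(p))


print(f"incongruent fraction: {p_obs:.3f}")          # 0.415
print(f"implied t/(2N):       {t_over_2N(p_obs):.3f}")  # 0.474
```

Because only the ratio t/(2N) is identified by the gene-tree counts, any estimate of N obtained this way is as uncertain as the assumed time between the two splits.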
48
The coalescent model
The basic idea underlying the coalescent is that, in the absence of selection, sampled lineages can be viewed backward in time, as if offspring randomly ‘pick’ their parents.
Whenever two lineages pick the same parent, they coalesce. Eventually, all lineages coalesce into a single lineage, the MRCA of the sample.
The rate at which lineages coalesce depends on how many lineages are picking their parents (the more lineages, the faster the rate) and on the size of the population (the more parents to choose from, the slower the rate).
The coalescent is a probabilistic description of a sample genealogy that takes account of the influence of population size, migration rates, population growth and decline, and recombination rates on the distributions of times to the most recent common ancestors.
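The dependence of the coalescence rate on the number of lineages and on population size can be made concrete with the standard result for a constant-size, randomly mating population. The sketch assumes 2N gene copies (the same convention as the 2N in the incongruence formula), so that k lineages wait on average 4N / (k(k-1)) generations before the next coalescence. This textbook approximation is brought in here as an assumption and is not stated on the slide; the function name is illustrative.

```python
def expected_waiting_times(n, N):
    """Expected time (in generations) spent with k lineages, for
    k = n, n-1, ..., 2, in a constant population of 2N gene copies.

    Assumption: standard coalescent approximation, in which each of
    the k(k-1)/2 pairs of lineages coalesces at rate 1/(2N).
    """
    return {k: 4.0 * N / (k * (k - 1)) for k in range(n, 1, -1)}


N = 1_000  # illustrative population size
times = expected_waiting_times(22, N)
tmrca = sum(times.values())

print(f"expected wait with 22 lineages: {times[22]:.1f} generations")
print(f"expected wait with  2 lineages: {times[2]:.1f} generations")
print(f"expected time to the MRCA:      {tmrca:.1f} generations")
```

Under these assumptions the last two lineages account for more than half of the expected time to the MRCA, which is why the deep branches of a genealogy are long and the recent ones short.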
49
Phylogenetics and the coalescent
Phylogenetic methods estimate trees
A single sequence from each species of interest is usually analyzed, and the genealogy of the sequences is estimated. The estimated gene tree is then used to draw conclusions about relationships between species.
This approach works for distantly related taxa. For populations, however, it might not make sense to try to estimate a population tree: the relevant model might involve migration between populations, population history might not be tree-like, and the rates of migration might be of primary interest.
Genealogical methods do not estimate trees
Instead, they are used to estimate parameters of the random genealogical process that has given rise to each tree.
The genealogical approach has none of the limitations of the phylogenetic methods and provides a coherent statistical framework in which to consider recombination, migration, selection and other processes.