COALESCENCE AND GENE GENEALOGIES


COALESCENCE AND GENE GENEALOGIES Population Genetics 201-14 Silvano Presciuttini

Population genetics looks backward

Rapid accumulation of DNA sequence data over the past two decades has substantially changed the perspective of much of population genetics. The field has shifted from a “prospective” view, which investigates the evolutionary factors that change allele frequencies, to a “retrospective” view, which infers evolutionary events that occurred in the past. Understanding the evolutionary causes that have shaped the DNA sequence variation in a sample of individuals, such as the demographic and mutational history of the sample's ancestors, has become the focus of much population genetics research.

Tracing back gene ancestry

Recall the paradox we met when we distinguished gene copies that are identical by state (IBS) from gene copies that are identical by descent (IBD):

“The paradox of the IBD/IBS distinction is that gene copies identical by state are all, ultimately, also identical by descent, except in cases of convergent evolution of two DNA sequences (homoplasy). Two gene copies that are identical by state may not share a common ancestor if we trace their ancestry back only 20 generations, but they may share a common ancestor if we trace their ancestry back 1000 generations, and neither may have undergone any mutation since they diverged from one another.”

We now introduce an apparently counter-intuitive concept: all extant homologous tracts, sampled from any possible mixture of organisms, however diverse, are descendants of a single DNA molecule.

Coalescence

Consider a sample of homologous tracts from a population, which includes a number of different haplotypes. If one goes back far enough in time, all haplotypes in the sample coalesce into a single common ancestral haplotype. This ancestral haplotype was a specific DNA segment that existed in a certain individual. It gave rise to all extant haplotypes through repeated copying, and the different haplotypes were created by mutation events that occurred along some lineages of descent. The individual that carried the ancestral haplotype may have lived very far in the past, even before the speciation events that led to the present population.

Genealogy of a single SNP

Consider a particular site in the genome of a species. All existing copies of this site must be related to each other and to a most recent common ancestor (MRCA) through some form of genealogical tree. Polymorphism at the site is due to mutations that occurred along the branches of this tree, and the frequency of each sequence variant is determined by the fraction of branches that inherit the variant. The pattern of polymorphism therefore reflects both the history of the coalescence of lineages, which gives rise to the tree, and the mutational history.

[Figure caption: Polymorphism at a particular site results from mutations (shown here as G→T) along branches of the genealogical tree, which connects sampled copies of the site to their most recent common ancestor (MRCA).]

The following slides come from “Mike Weale's Seminar 4”, which is part of the 'Introduction to doing genetic history' seminar series of The Center for Genetic Anthropology at University College London: www.ucl.ac.uk/tcga/presentations/TCGAugss/TCGA_MW_Seminar4.ppt

The following series of slides shows how to build up a genealogical tree relating a sample of 22 individuals, collected in the present day, at a single locus (e.g. the non-recombining Y chromosome). Because (for the Y chromosome) a son has only one father, but a father can have more than one son, coalescent events occur in the genealogy, and going back in time they inevitably reduce the number of ancestors. Eventually only one ancestor remains: the Most Recent Common Ancestor (MRCA).

[Figure sequence: the genealogy is built up backward in time from the Present. The 22 sampled individuals trace back to 18 ancestors, then 16 ancestors, then 14 ancestors, and so on, until a single most recent common ancestor (MRCA) remains. Axis: Time.]

Mutational events can now be added to the genealogical tree, resulting in polymorphic sites. If these sites are typed in the modern sample, they can be used to split the sample into sub-clades (represented by different colours).

[Figure sequence: mutations are added one at a time to the genealogy, which runs from the Present back through Time to the most recent common ancestor (MRCA). The haplotypes that appear are:

TCGAGGTATTAAC
TCTAGGTATTAAC
TCGAGGCATTAAC
TCTAGGTGTTAAC
TCGAGGTATTAGC
TCTAGGTATCAAC

Asterisks in the final panel mark the polymorphic sites.]
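As a small illustration of how typed polymorphic sites split a sample, the sketch below takes the six haplotypes shown on these slides, finds the positions at which they differ, and groups them by the base carried at one of those positions. The function names `segregating_sites` and `split_by_site` are hypothetical helpers introduced here, not part of the original seminar material.

```python
# The six haplotypes shown on the slides (13 aligned sites each).
HAPLOTYPES = [
    "TCGAGGTATTAAC",
    "TCTAGGTATTAAC",
    "TCGAGGCATTAAC",
    "TCTAGGTGTTAAC",
    "TCGAGGTATTAGC",
    "TCTAGGTATCAAC",
]


def segregating_sites(haplotypes):
    """Return the 1-based positions at which the aligned sequences differ."""
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same length")
    return [
        pos + 1
        for pos in range(length)
        if len({h[pos] for h in haplotypes}) > 1
    ]


def split_by_site(haplotypes, position):
    """Group haplotypes by the base they carry at a 1-based position."""
    groups = {}
    for h in haplotypes:
        groups.setdefault(h[position - 1], []).append(h)
    return groups


if __name__ == "__main__":
    print("polymorphic positions:", segregating_sites(HAPLOTYPES))
    # Typing site 3 alone already divides the sample into two groups.
    for base, members in split_by_site(HAPLOTYPES, 3).items():
        print(base, members)
```

Typing further sites subdivides each group in turn, which is the sense in which mutations on the tree define nested sub-clades.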

If we now imagine that our sample of 22 was in fact drawn from different subpopulations, then allele frequencies will vary between subpopulations, with the degree of difference reflecting the time at which the population split took place. Of course, population splitting is only one model we can think of. Another would be a migration model, in which lineages occasionally move from one subpopulation to another. Note that population splits do not always have to involve just one sub-clade of the tree; in the figure, this is just a limitation of drawing the splitting in two dimensions.

[Figure: the same genealogy, haplotypes and polymorphic sites, with the present-day sample divided among Population 1, Population 2 and Population 3.]

Gene trees are not the same as species trees

The basic phylogenetic model relates species to each other through a bifurcating tree. This ‘species tree’ is usually estimated from the genealogy of genes sampled from the different species. How can this approach be valid if, as we argue, each gene tree is the random outcome of a historical process? Different genes should give rise to different trees; in fact, even a single gene could have many trees as a result of intragenic recombination.

Disagreement about topology

In this example, species tree and gene tree disagree about topology. Before the first species split, a locus is polymorphic with three alleles. Because of the founder effect, the blue allele goes into one species and the red and green alleles into the other; at the second species split, the red and green alleles separate. The result is that the gene genealogy makes species b and c more closely related to each other than a is to b, even though a and b were the last species to separate.

Most mutations antedate speciation

In this example, the gene has undergone two mutations in the ancestral species, the first giving rise to the ‘blue’ allele and the second to the ‘green’ allele. Random genetic drift, in association with the two subsequent speciations, results in the red allele lineage appearing in species A, the green allele lineage in species B and the blue allele lineage in species C. Molecular phylogenetics based on the gene sequences will reveal that the red-blue split occurred before the blue-green split, giving the gene tree shown on the right of the slide. However, the actual species tree is different.

Disagreement about bifurcation time

In this example, species tree and gene tree disagree about bifurcation time. Before the first species split, a locus is polymorphic with two alleles, which separate at the moment of splitting. During the lifetime of the species ancestral to a and b, a third allele originates by mutation, and the locus remains polymorphic for a long time; the last two alleles then separate at the second species split. The result is that the gene tree and the species tree have the same topology but disagree about the time of splitting.

Probability of incongruence in the simplest case

Consider a 3-species tree, and assume that the true speciation history is (A(BC)). Three gene genealogies are then possible: one correct (a) and two wrong (b and c). In this case, the probability of incongruence between gene tree and species tree is

$P_{\mathrm{incongruence}} = \frac{2}{3}\, e^{-t/(2N)}$,

where t is the time between the two speciation events, measured in generations, and N is the population size. As the ratio t/N increases (either t increases for constant N, or N decreases for constant t), the chance of coalescence in the interval between speciation events increases, and the probability of observing an incongruent gene tree decreases. For small values of t/N, incongruence is more probable.
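To see how quickly the probability of incongruence falls as t/N grows, the formula can be evaluated directly. This is only a minimal sketch: the function name `p_incongruence`, the population size and the interval lengths are illustrative assumptions, not values from the slides.

```python
import math


def p_incongruence(t, n):
    """Probability that a three-species gene tree disagrees with the species tree.

    t: number of generations between the two speciation events
    n: population size N during that interval
    """
    if t < 0 or n <= 0:
        raise ValueError("t must be >= 0 and n must be > 0")
    return (2.0 / 3.0) * math.exp(-t / (2.0 * n))


if __name__ == "__main__":
    n = 10_000  # ASSUMPTION: hypothetical population size
    for t in (0, 1_000, 10_000, 50_000, 200_000):  # hypothetical intervals
        print(f"t/N = {t / n:5.1f}   P(incongruent) = {p_incongruence(t, n):.4f}")
```

At t = 0 the value is 2/3, because the three genealogies are then equally likely and two of them are wrong; as t/N grows the value approaches zero.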

An example taken from the human lineage

Chen and Li (Am J Hum Genet 2001: 68, 444) obtained gene trees for 53 randomly chosen non-coding regions in human, gorilla and chimpanzee. When the 53 autosomal segments were considered together, the neighbor-joining tree supported the Homo-Pan clade with a 100% bootstrap value. When each segment was considered individually, 31 segments supported the Homo-Pan clade, 10 supported the Homo-Gorilla clade, and 12 supported the Pan-Gorilla clade. This is still strong support for the Homo-Pan clade, but 22 of the 53 gene trees (about 42%) are incongruent with the species tree. Using the previous equation, Chen and Li proposed a value for the size of the population in the time interval between the first species split (gorilla vs the Homo-Pan ancestor) and the second split (Homo vs Pan); the estimate was on the order of hundreds of thousands.
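The logic of this estimate can be sketched by inverting the incongruence formula: the observed fraction of incongruent gene trees gives t/(2N), and an assumed value of t then gives N. The counts below are those quoted on the slide, but the interval lengths are hypothetical placeholders and are not the values used by Chen and Li; the helper names are also introduced here for illustration only.

```python
import math


def t_over_2n_from_incongruence(p_incongruent):
    """Invert P = (2/3) * exp(-t / (2N)) to recover the ratio t / (2N)."""
    if not 0 < p_incongruent <= 2.0 / 3.0:
        raise ValueError("the formula only yields values in (0, 2/3]")
    return math.log((2.0 / 3.0) / p_incongruent)


def population_size(t_generations, t_over_2n):
    """Solve t / (2N) = x for N, given t in generations."""
    return t_generations / (2.0 * t_over_2n)


# Counts quoted on the slide: 10 + 12 incongruent segments out of 53.
incongruent, total = 10 + 12, 53
ratio = t_over_2n_from_incongruence(incongruent / total)

if __name__ == "__main__":
    print(f"fraction incongruent = {incongruent / total:.3f}")
    print(f"t / (2N)             = {ratio:.3f}")
    # ASSUMPTION: hypothetical interval lengths, for illustration only.
    for t in (50_000, 100_000, 200_000):
        print(f"t = {t:>7} generations  ->  N = {population_size(t, ratio):,.0f}")
```

Because N scales linearly with the assumed t, the estimate is only as good as the dating of the two speciation events and the generation time used to convert years into generations.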

The coalescent model

The basic idea underlying the coalescent is that, in the absence of selection, sampled lineages can be followed backward in time as if offspring randomly ‘pick’ their parents. Whenever two lineages pick the same parent, they coalesce. Eventually, all lineages coalesce into a single lineage, the MRCA of the sample. The rate at which lineages coalesce depends on how many lineages are picking parents (the more lineages, the faster the rate) and on the size of the population (the more parents to choose from, the slower the rate). The coalescent is a probabilistic description of a sample genealogy that takes account of the influence of population size, migration rates, population growth and decline, and recombination rates on the distribution of times to the most recent common ancestor.
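The parent-picking idea can be sketched as a short backward-in-time simulation. The version below assumes the simplest possible setting, which the slides do not spell out: a constant population of `pop_size` potential parents, non-overlapping generations, one parent per lineage (as for the Y chromosome), and no selection, migration or recombination. The function names are hypothetical.

```python
import random


def trace_lineages(sample_size, pop_size, rng):
    """Follow sampled lineages backward until one ancestor remains.

    ASSUMPTION: each generation, every surviving lineage picks one parent
    uniformly at random from pop_size candidates; lineages that pick the
    same parent merge. Returns the number of distinct ancestors at each
    generation back, starting with the sample itself.
    """
    if not 1 <= sample_size <= pop_size:
        raise ValueError("need 1 <= sample_size <= pop_size")
    counts = [sample_size]
    lineages = sample_size
    while lineages > 1:
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(parents)
        counts.append(lineages)
    return counts


def mean_tmrca(sample_size, pop_size, replicates, seed=0):
    """Average number of generations back to the MRCA over many replicates."""
    rng = random.Random(seed)
    total = 0
    for _ in range(replicates):
        total += len(trace_lineages(sample_size, pop_size, rng)) - 1
    return total / replicates


if __name__ == "__main__":
    rng = random.Random(1)
    # Ancestor counts per generation, from the present back to the MRCA.
    print(trace_lineages(22, 100, rng))
    # Larger populations take longer to coalesce.
    for n in (25, 100, 400):
        print(n, mean_tmrca(22, n, replicates=200))
```

In a typical run the ancestor count drops quickly while many lineages remain and slowly once only a few are left, and the average time to the MRCA grows with the population size, matching the two dependencies described above.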

Phylogenetics and the coalescent

Phylogenetic methods estimate trees. A single sequence from each species of interest is usually analyzed, and the genealogy of the sequences is estimated. The estimated gene tree is then used to draw conclusions about relationships between species. This approach works for distantly related taxa. For populations it might not make sense to try to estimate a tree: the relevant model might involve migration between populations, population history might not be tree-like, and the rates of migration might be of primary interest.

Genealogical methods do not estimate trees. Instead, they are used to estimate parameters of the random genealogical process that has given rise to each tree. The genealogical approach does not share these limitations of the phylogenetic methods and provides a coherent statistical framework in which to consider recombination, migration, selection and other processes.