Coalescent Models for Genetic Demography

Presentation transcript:

Coalescent Models for Genetic Demography What can the Coalescent do for you? Rosalind Harding University of Oxford

Who was mtEve? The most recent common ancestor (MRCA) to whom all currently sampled mtDNA haplotype diversity can be traced.

One possibility: first a bottleneck, then multiple lineages are established during expansion phases. [Figure label: mtEve]

But what if there wasn’t a bottleneck? Then our predecessors collecting data 20,000 years ago could have identified a different mtEve, an Eve from an earlier generation; in 20,000 years’ time, a new generation will be likely to find that their mtEve is a granddaughter, n generations removed, of our mtEve. While our mtEve may be special to us, for archaeogeneticists of past and future generations she will have no particular significance!

Insights from coalescent models [Figure: axis labelled Time, running to the present; three points each labelled "Eve?"]

What is the coalescent? a simple model which generates a probability distribution for gene genealogies sampled from a population.

Further definitions
simple models: abstractions from complex demographic reality, which preserve key features
population: all individuals within a generation with the potential to contribute to the gene pool (including individuals who are reproductively successful as well as those who are not)
gene genealogies: lineages of transmission of copies of a gene from parents to offspring
coalescence: where two transmission lineages find a common ancestor, looking backwards in time
probability distribution: a set of probabilities for many possible alternative gene genealogies compatible with the model

Models and data
Interpreting genetic polymorphism data: consider a sample of genes from a contemporary population, with their allelic frequencies and sequence identities determined. These data do not reveal our genetic past directly; they must be interpreted.
Options for model choice:
evolution as phylogeny, phylo-geography
evolution as a balance of mutation and genetic drift in a population with a specified demography (population size, mating pattern, offspring distribution)

Characteristics of polymorphism data For a small proportion of sites in human DNA, a second allele is present in populations due to a relatively recent mutation; this is polymorphism. Polymorphism constitutes a transient phase in evolution, intermediate between the occurrence of a mutation and the fixation of either allele at 100%. MtDNA trees may distort frequencies of polymorphisms. They show sets of mutation events as a proxy for fixed differences; it is the new allele that is assumed to fix (attain 100%). These potential sources of error for time scale estimates may be minor but could be substantial.

Ingman and Gyllensten, 2003 Genome Research 13:1600-1606. Neighbor-joining phylogram of 101 mtDNA coding-region sequences. Is phylogenetic branching the right model? Note the variable branch lengths and endpoints, yet all individuals were sampled in the present!

A phylogenetic model with added genealogical detail and molecular clock

Trajectories for neutral alleles

Understanding genetic drift as genealogy (Ne=10, constant over time). Two of the gene copies in generation t are inherited by all of the offspring copies in generation t+x. This is the process of drift that leads eventually to either loss or fixation (100% frequency in the population) of new mutations.
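The drift process described on this slide can be sketched as a small forward simulation. This is a minimal illustration, assuming a Wright-Fisher style scheme in which each offspring copy picks its parent copy uniformly at random; the function name, the seed and the 20-generation horizon are hypothetical choices, not taken from the slides.

```python
import random

def surviving_ancestors(n_copies=10, generations=20, seed=1):
    """Count how many generation-t gene copies still have descendants
    after each subsequent generation (assumed Wright-Fisher sampling)."""
    rng = random.Random(seed)
    # ancestor[i] = index of the starting-generation copy that copy i descends from
    ancestor = list(range(n_copies))
    counts = []
    for _ in range(generations):
        # every offspring copy chooses one parent copy uniformly at random
        ancestor = [ancestor[rng.randrange(n_copies)] for _ in range(n_copies)]
        counts.append(len(set(ancestor)))
    return counts

counts = surviving_ancestors()
print(counts)  # number of original copies with surviving descendants, per generation
```

The count can only stay the same or fall, which is the genealogical view of drift: sooner or later all copies trace back to a single copy in the starting generation.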

Some advantages of coalescent models over phylogeny for interpreting polymorphism data
They make better use of molecular clocks and do not treat polymorphisms as fixed differences.
As models of populations they clarify the difference between ‘absence of evidence’ (eg for Neanderthal ancestry) and ‘evidence of absence’ (any single locus only represents such a small sample of ancestors from >50,000 years ago that with present data we don’t have the statistical power to rule out Neanderthal ancestry).
They incorporate some measure of our uncertainty about the evolution of allele frequencies (a mixed process of mutation and transmission in genealogies).

Assumptions of Kingman’s (1982) coalescent for interpreting polymorphism data (random sample)
Neutrality
All new mutations unique and informative
If individuals are diploid in a population of size N, the model applies to 2N independent, haploid copies of a gene
Random mating within a population
Constant population size, Ne
A very specific probability distribution for transmissions of gene copies to 0, 1, 2 … offspring
Non-overlapping generations

Aims of coalescent modelling: to make inferences from genetic data
to simulate different demographies to see what to expect in polymorphism data;
to estimate parameters under an explicit demographic model, eg Kingman’s coalescent;
to estimate in which generation (and sub-population) particular lineages coalesced or mutations occurred, given explicit demographic assumptions;
to evaluate the uncertainty in our estimates;
to introduce new parameters to improve the model, judging by its fit to data, to learn about demography.

The ancestry of a sample composed of two copies of the gene in generation t0. Following the ancestry of a sample of two copies of a gene (gene A) from time t0, ie the present, backwards (red), we find their most recent common ancestor (MRCA) at generation t8.

Expected coalescence times. Expected time to coalescence for n lineages: E(TMRCA) = 4N(1 - 1/n) generations. As the sample size n increases towards 2N, E(TMRCA) approaches 4N, which equals the expected fixation time for a newly arisen neutral mutation.

Thanks to Lounes for this slide. E(T2) = 2Ne; E(T5) = Ne/5; E(TMRCA) = 4Ne(1 - 1/5). [Figure: genealogies against time for three demographies: constant N, expanding N, and reducing N (from N0 to N1).]
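The expectations quoted above can be checked with a few lines of arithmetic. This sketch assumes the standard constant-size result that the expected time during which exactly k lineages exist is 4Ne/(k(k-1)) generations, with Ne the diploid effective size; the function names are illustrative only.

```python
from fractions import Fraction

def expected_interval(k, n_e):
    """Expected generations during which exactly k lineages exist: 4*Ne / (k*(k-1))."""
    return Fraction(4 * n_e, k * (k - 1))

def expected_tmrca(n, n_e):
    """Expected generations back to the MRCA of n sampled copies (sum over k = n..2)."""
    return sum(expected_interval(k, n_e) for k in range(2, n + 1))

n_e = 10_000  # illustrative effective size
print(expected_interval(2, n_e))  # E(T2) = 2*Ne
print(expected_interval(5, n_e))  # E(T5) = Ne/5
print(expected_tmrca(5, n_e))     # E(TMRCA) = 4*Ne*(1 - 1/5)
```

Because the intervals telescope, the sum for n lineages is 4Ne(1 - 1/n), so most of the expected depth of a genealogy comes from the final interval in which only two lineages remain.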

Simulated genealogies with constant Ne. [Figure: four simulated genealogies, panels 1 to 4; TMRCA values shown: 4.57, 2.93*, 1.48, 0.01, in units of 2Ne generations.] eg 2.93 × 2 × 10,000 × 20 ≈ 1.2 million years
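A hedged sketch of how such TMRCA values could be simulated and converted to years follows. It assumes the constant-size coalescent with time in units of 2Ne generations, so that k lineages wait an exponential time with rate k(k-1)/2, and it reads the slide's worked example as Ne = 10,000 and 20 years per generation. The sample size of 10 and the seed are arbitrary illustrative choices.

```python
import random

def simulate_tmrca(n, rng):
    """One simulated TMRCA for n sampled copies, in units of 2Ne generations."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # number of lineage pairs that could coalesce
        t += rng.expovariate(rate)
    return t

def to_years(t_scaled, n_e=10_000, years_per_generation=20):
    """Convert a time in units of 2Ne generations to years (assumed Ne and generation time)."""
    return t_scaled * 2 * n_e * years_per_generation

rng = random.Random(42)
draws = [simulate_tmrca(10, rng) for _ in range(20_000)]
mean_tmrca = sum(draws) / len(draws)
print(mean_tmrca)      # should be close to the expectation 2*(1 - 1/10)
print(to_years(2.93))  # the slide's worked example, about 1.2 million years
```

Individual draws vary widely around the mean, which is the point of the slide: under constant Ne a single locus gives a very noisy estimate of TMRCA.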

Simulating recent expansion: not much variability in TMRCA between genealogies. TMRCA for genealogies 1 to 4, in units of 2Ne generations: 0.0026, 0.0029, 0.0028, 0.0027. ~1000 years of human evolution

1. A time scale is given by the coalescent model for the demography (drift history) 2. Add mutations

Infinite-sites mutation in a gene tree
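The two steps, a coalescent time scale followed by mutations, can be sketched as below. This is an illustration under stated assumptions rather than the method used for the slides: constant Ne, time in units of 2Ne generations, and infinite-sites mutations falling on branches as a Poisson process with rate θ/2 per lineage per unit time. Under those assumptions the expected number of segregating sites is θ(1 + 1/2 + ... + 1/(n-1)). All names and parameter values are hypothetical.

```python
import random

def simulate_segregating_sites(n, theta, rng):
    """Number of infinite-sites mutations on one simulated genealogy of n copies."""
    total_length = 0.0
    for k in range(n, 1, -1):
        # k lineages each contribute one branch for the duration of this interval
        total_length += k * rng.expovariate(k * (k - 1) / 2)
    mean = theta / 2 * total_length
    # Poisson draw: count arrivals of a unit-rate process before time `mean`
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

rng = random.Random(7)
n, theta = 10, 5.0
sims = [simulate_segregating_sites(n, theta, rng) for _ in range(20_000)]
mean_sites = sum(sims) / len(sims)
expected_sites = theta * sum(1 / i for i in range(1, n))
print(mean_sites, expected_sites)
```

Each mutation creates a new segregating site because, under infinite sites, no site is ever hit twice.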

The relationship between average pairwise sequence difference, π, and the parameter θ in Kingman’s Coalescent. [Figure label: 2N generations]
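For reference, the standard result under Kingman's coalescent with infinite-sites mutation is that the expected average pairwise sequence difference equals θ = 4Neμ, where μ is the mutation rate per sequence per generation: two copies coalesce on average 2N generations back, so they are separated by 4N generations of mutation. The sketch below computes the observed average pairwise difference from aligned sequences; the function name and the toy sequences are hypothetical.

```python
from itertools import combinations

def average_pairwise_difference(sequences):
    """Mean number of differing sites over all pairs of equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

# Toy aligned sample, invented for illustration only
sample = ["ACGTAC", "ACGTTC", "ACCTTC", "ACGTAC"]
pi_hat = average_pairwise_difference(sample)
print(pi_hat)  # an estimate of theta under the model's assumptions
```

Used this way, the observed value serves as a simple estimator of θ, valid only to the extent that the data fit the assumptions of the model.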

Data: Aboriginal Australian mtDNAs. Model: Kingman’s coalescent. MtDNA Coding DNA Sites: 9000 to 16000. [Figure: gene tree.] One colonization event, or several founding lineages at different times? Note the non-uniform spacing of mutations.

Another advantage of coalescent models over phylogeny. While the population bottlenecks implicitly assumed in phylogenetic and phylogeographic analyses can be explicitly assumed in a coalescent framework, alternative demographies may be assumed, or may be inferred. (The relationship between coalescent nodes and colonization events is very ambiguous.)

Kingman’s coalescent as H0 Kingman’s coalescent model is a starting point, available to us even before we collect any data. Having collected data, we can test whether the data show goodness-of-fit to the expectations of our starting model. If not, we should change or add parameters to improve the model. At present there are some options available (not many, but some!)

Variations from Kingman’s coalescent
Selection
Recurrent and back mutation
Recombination
*Non-random mating: eg geographic subdivision with specified migration between subpopulations
*Population size fluctuation, including bottlenecks and expansions
Non-’Poisson’ distributions of offspring numbers
Unequal generation intervals between lineages
*similar model but additional parameters

The coalescent with structure. [Figure panels: much migration; little migration.] Each generation, m alleles are exchanged between sub-populations. In discrete time, an allele migrates with probability m/2N per generation. In continuous time, the waiting time for migration is exponentially distributed, expo(m).
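A minimal sketch of a structured coalescent for two lineages is given below. The set-up is an assumption made for illustration, not a specification from the slides: two sub-populations of equal size N, symmetric migration, time in units of 2N generations, each lineage migrating at rate m per unit time (one reading of "expo(m)"), and coalescence at rate 1 only when both lineages are in the same sub-population.

```python
import random

def pair_coalescence_time(m, same_deme, rng):
    """Coalescence time of two lineages in an assumed symmetric two-deme model (m > 0)."""
    t = 0.0
    while True:
        if same_deme:
            total_rate = 1.0 + 2.0 * m  # coalescence, or either lineage migrating
            t += rng.expovariate(total_rate)
            if rng.random() < 1.0 / total_rate:
                return t                # the event was a coalescence
            same_deme = False           # otherwise one lineage moved away
        else:
            t += rng.expovariate(2.0 * m)  # wait for either lineage to migrate
            same_deme = True

rng = random.Random(3)
reps = 20_000
mean_same = sum(pair_coalescence_time(1.0, True, rng) for _ in range(reps)) / reps
mean_diff = sum(pair_coalescence_time(0.5, False, rng) for _ in range(reps)) / reps
print(mean_same)  # expected value is 2 under these assumptions, whatever m is
print(mean_diff)  # expected value is 2 + 1/(2m), here 3
```

With little migration, lineages sampled from different sub-populations wait a long time before they can coalesce, which lengthens the deep branches of the genealogy. With much migration the model behaves more like a single population.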

Summary and points for discussion
Data drawn as gene trees show the relative ordering of coalescence events.
The length of time between coalescence events is a function of the number of mutation events inferred from the data AND the assumed demographic history. (Molecular clocks should NOT be applied directly.)
Present phylo-geographic methods fudge the data to circumvent thinking about demography. Consequently we do not learn anything about demography from them. Furthermore, these methods may be generating some highly inaccurate time estimates and they don’t provide satisfactory estimates of the uncertainty surrounding these estimates.
Coalescent modelling to date draws attention to many concerns, but to improve ‘phylo-geographic’ inference we need implementations of the structured coalescent appropriate for a colonization/extinction demography.

MtDNA Coding DNA Sites: 500 to 9000

Implications of drift as genealogy All the identical copies of a gene, eg all the copies of the MC1R-151 red hair allele, carried by thousands of people across Europe, have been inherited from a single common ancestor living some time in the past. Although mutation may have generated MC1R-151 alleles many times, all these mutations were quickly lost, except for one. On one occasion only, the new mutation increased in frequency, becoming a common polymorphism. Could this be true? (We think so!)