Published by Neil Mervyn Poole. Modified over 7 years ago.
1
Tracking the genetic legacy of past human populations through the grid
Nicolas Ray
University of Bern / CMPG (University of Geneva & UNEP/GRID-Europe)
ECSAC09, Veli Lošinj, August 26th 2009
2
Archaeology: Paleoanthropology:
3
Human migrations
[Figure: map of the migrations of Homo sapiens sapiens, with the dates [55,000] and [12,000]. Adapted from Cavalli-Sforza & Feldman, 2003]
Let me set the scene of my talk.
4
Why aim for a good demographic model?
5
1. Better understand human evolution
Origin of modern humans (when, where, how many?)
Relationship with other members of the Homo genus
2. Distinguish between the effects of demography and those of selection (biomedical applications)
Find genes under selection
Exclude false positives
6
Observed patterns of genetic diversity in contemporary populations
A complex past demography: fluctuations in effective population size, substructure, migrations
Gene-specific factors: mutations, recombination, selection
7
Semi-spatial approach
8
Statistical Evaluation of Alternative Models of Human Evolution
Nelson Fagundes, Nicolas Ray, Mark Beaumont, Samuel Neuenschwander, Francisco Salzano, Sandro Bonatto, and Laurent Excoffier. PNAS, 104:
50 loci in non-genic regions (Chen and Li, 2001)
About 500 bp each, 24,425 bp in total
30 individuals: 10 Africans, 8 Asians, 12 Amerindians
Chimpanzee sequenced to estimate mutation rates, assuming a divergence time of 6 My
9
Models
[Figure: eight candidate models drawn as population trees (AF, AS, AM) along a time axis. African replacement: AFRIG, AFREG. Assimilation: ASIG, ASEG. Multiregional evolution: MRE1S, MRE2S, MREBIG, MREBEG]
There are two things we are interested in knowing:
What is the most likely model?
What are the likely parameter values (the two main ones: the exit from Africa and the entrance into the Americas)?
This calls for a statistical approach.
10
Model parameters and priors
11
Simulations
Coalescence theory
[Figure: gene genealogy across Africa, Asia and the Americas, plotted against time]
A retrospective model of population genetics
Traces all copies of a gene in a sample from a population to a single ancestral copy shared by all members (MRCA)
Assumes no recombination, no selection
In genetics, coalescent theory is a retrospective model of population genetics that traces all alleles of a gene in a sample from a population to a single ancestral copy shared by all members of the population, known as the most recent common ancestor (MRCA; sometimes also termed the coancestor to emphasize the coalescent relationship). The inheritance relationships between alleles are typically represented as a gene genealogy, similar in form to a phylogenetic tree. This gene genealogy is also known as the coalescent; understanding the statistical properties of the coalescent under different assumptions forms the basis of coalescent theory. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure. Advances in coalescent theory, however, allow extensions to the basic coalescent that can include recombination, selection, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis. The mathematical theory of the coalescent was originally developed in the early 1980s by John Kingman.
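To make the retrospective idea concrete, here is a minimal sketch of the simplest case described above (one population, no recombination, no selection, no structure). It only draws the waiting times between coalescence events, not the tree shape. The function names and the choice of coalescent time units are assumptions made for illustration, and this is not the simulator used in the study.

```python
import random

def simulate_coalescent_times(n, rng):
    """Waiting times between successive coalescence events for a sample
    of n gene copies under the basic (Kingman) coalescent.

    Time is in coalescent units: while k lineages remain, the waiting
    time to the next coalescence is exponential with rate k(k-1)/2.
    """
    times = []
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0
        times.append(rng.expovariate(rate))
        k -= 1  # two lineages merge into one
    return times

def mean_tmrca(n, replicates, seed=1):
    """Average time to the MRCA over independent simulated genealogies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        total += sum(simulate_coalescent_times(n, rng))
    return total / replicates

# Theory gives an expected TMRCA of 2 * (1 - 1/n) in these units.
estimate = mean_tmrca(30, 20000)
```

With 30 sampled copies, the estimate should land close to the theoretical value of 2 × (1 − 1/30) ≈ 1.93 coalescent units.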
12
Simulated genealogy
Mutation model: mutations are placed on the simulated genealogy to generate sequences (e.g. TCCTTGTA…ATTGGT, ACCTAGTACAATCGGTAATGCCATTGGT, ACCGAGTA…GTTGGT)
Summary statistics
Within population: S, π
Between populations: pairwise FST
Globally: global FST
This process is repeated for each independent locus
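As an illustration of within-population summary statistics, the sketch below computes two commonly used ones: the number of segregating sites and the mean number of pairwise differences. That these are exactly the two statistics listed on the slide is an assumption. The toy alignment uses the first eight bases of the three sequences shown above, and the function names are hypothetical.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Number of alignment columns where at least two sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_differences(seqs):
    """Mean number of differences over all pairs of sequences
    (per sequence, not divided by the sequence length)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)

# Toy alignment: first eight bases of the sequences shown on the slide.
sample = ["ACCTAGTA", "ACCGAGTA", "TCCTTGTA"]

s = segregating_sites(sample)             # 3 variable columns
pi = mean_pairwise_differences(sample)    # (1 + 2 + 3) / 3 = 2.0
```

In the full analysis these values would be computed per locus and per population, then combined with the between-population statistics into one vector of summary statistics.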
13
Approximate Bayesian Computations (ABC)
Aim: estimate some parameters Φ in a well-defined model
The rejection-sampling approach (Tavaré et al. 1997; Pritchard et al. 1999):
1. Calculate summary statistics (S) for the observed data set
2. Draw parameter values φ’ from prior distributions, and use them to simulate data
3. Calculate summary statistics (S’) on the simulated data set and compare them to the observations: δ = ||S - S’|| (Euclidean distance)
4. Accept φ’ if δ is below a small, chosen tolerance; otherwise reject it
The ABC approach (Beaumont et al. 2002)
Modification: a local regression is added within the set of accepted φ’ values
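The rejection-sampling steps can be sketched on a toy problem, here estimating the mean of a normal distribution instead of demographic parameters. The simulator, the uniform prior, the tolerance value, and all function names are assumptions made for illustration. The local-regression adjustment of Beaumont et al. 2002 is not included.

```python
import math
import random

def summary(data):
    """Summary statistics of a data set: (mean, standard deviation)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return (mean, math.sqrt(var))

def simulate(mu, n, rng):
    """Toy simulator standing in for the genetic simulations."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]

def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep the prior draws whose simulated summaries fall close
    to the observed summaries."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        phi = rng.uniform(-10.0, 10.0)       # draw from the (assumed) prior
        s_sim = summary(simulate(phi, len(observed), rng))
        delta = math.dist(s_obs, s_sim)      # Euclidean distance ||S - S'||
        if delta < tolerance:
            accepted.append(phi)
    return accepted

rng = random.Random(42)
observed = simulate(3.0, 50, rng)            # pretend this is the real data
posterior = abc_rejection(observed, 10000, 0.3, rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values approximate the posterior distribution of the parameter. A smaller tolerance gives a better approximation but accepts fewer draws, which is why millions of simulations are needed for realistic models.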
14
Rejection-sampling method Tavaré et al. 1997; Pritchard et al. 1999
Aim: estimate some parameters Φ in a well-defined model
Neuenschwander (2006)
15
Computational issues
Loop repeated 1-10 million times: draw parameter values from priors, simulate one genealogy, generate genetic data, compute summary statistics
Computer clusters: UBELIX (>500 nodes), Zooblythii (~40 nodes)
16
For ABC, 5 million demographic simulations are necessary to obtain robust parameter estimates
Each demographic simulation is followed by n genetic simulations (n = number of loci)
Example: 8 simple models, 50 loci, 30 individuals: 2 CPU-years
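A back-of-the-envelope sketch of what these numbers imply. The figure of 500 CPUs and the perfectly even spread of the work are assumptions for illustration only; real clusters lose time to queuing and failed jobs.

```python
def genetic_simulations(demographic_sims, loci):
    """Each demographic simulation is followed by one genetic
    simulation per locus."""
    return demographic_sims * loci

def wall_clock_days(cpu_years, cpus):
    """Elapsed days if the work is spread evenly over `cpus` processors
    (assumes perfect parallel efficiency)."""
    return cpu_years * 365.0 / cpus

total = genetic_simulations(5_000_000, 50)   # 250,000,000 genetic simulations
days = wall_clock_days(2, 500)               # about 1.46 days on 500 CPUs
```

Because every simulation is independent of the others, the elapsed time shrinks almost linearly with the number of processors available.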
17
Relative probabilities of models of human evolution
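[Figure: relative probabilities of the models, shown next to the population trees (AF, AS, AM)]
Comparison across the three families: African replacement (AFREG) 0.781, assimilation (ASEG) 0.001, multiregional evolution (MREBIG) 0.218
Values within each family, in the order extracted from the slide:
African replacement (AFRIG, AFREG): 0.958, 0.042
Assimilation (ASIG, ASEG): 0.091, 0.909
Multiregional evolution (MRE1S, MRE2S, MREBIG, MREBEG): 0.461, 0.422, 0.048, 0.069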
18
Time estimates under the AFREG model (AF, AS, AM)
Speciation time: 142 Kya (104 – 186)
Out-of-Africa time: 51.1 Kya (40.1 – 70.9)
Americas colonization time: 10.3 Kya (7.6 – 15.9)
19
Fully spatial approach
20
A complex demography
[Figure: world map adapted from Cavalli-Sforza & Feldman, 2003, with the dates [55,000] and [10,000], annotated with: demographic and spatial expansions, population bottlenecks, secondary contacts, population isolation, fast migration events]
Show landscape effects on demography: bottlenecks, isolation, expansion, migration, secondary contact. On this figure the environment of the world seems quite uniform, but of course that is not the case…
21
From environment to demography
[Map: carrying capacity, from low to high. Spatial resolution: 100 km]
Based on data from the literature on the observed densities of contemporary hunter-gatherers. Note that the geographic projection has been changed in order to minimize distortions.
22
From environment to demography
[Map: friction, from low to high]
Based on some data from the literature, but even fewer data than for K (carrying capacity).
23
Demographic simulations
Stepping-stone model (cellular automaton)
[Figure: grid of cells (demes); population size in a deme plotted against time]
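A minimal sketch of a stepping-stone demographic simulation on a small grid, assuming logistic growth within each deme followed by migration to the four nearest neighbours. The growth rate `r`, migration rate `m`, uniform carrying capacity, and function names are illustrative assumptions. Friction is left out, and this is not SPLATCHE's actual implementation.

```python
def step(pop, K, r, m):
    """One generation: logistic growth in each deme, then migration."""
    rows, cols = len(pop), len(pop[0])
    # Logistic growth towards the carrying capacity K of each deme.
    grown = [[n + r * n * (1 - n / K[i][j]) for j, n in enumerate(row)]
             for i, row in enumerate(pop)]
    new = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            neighbours = [(i + di, j + dj)
                          for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= i + di < rows and 0 <= j + dj < cols]
            emigrants = m * grown[i][j]
            new[i][j] += grown[i][j] - emigrants
            # Emigrants are shared equally among the neighbouring demes.
            for a, b in neighbours:
                new[a][b] += emigrants / len(neighbours)
    return new

def simulate_expansion(rows, cols, origin, generations,
                       K_value=100.0, r=0.5, m=0.1):
    """Start from a single occupied deme and record every generation."""
    K = [[K_value] * cols for _ in range(rows)]
    pop = [[0.0] * cols for _ in range(rows)]
    pop[origin[0]][origin[1]] = 10.0
    history = [pop]
    for _ in range(generations):
        pop = step(pop, K, r, m)
        history.append(pop)
    return history

history = simulate_expansion(5, 5, (0, 0), 200)
far_corner = [generation[4][4] for generation in history]  # size vs time
```

Plotting `far_corner` against the generation index gives the kind of curve shown on the slide: the deme stays empty until the wave of colonization arrives, then grows towards its carrying capacity.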
24
SPatiaL And Temporal Coalescences in Heterogeneous Environment
SPLATCHE
Windows-based graphical version: extremely useful for exploring simulation outputs
Console version: for faster execution of simulations on Linux-based systems
25
Vegetation maps
[Maps: present potential vegetation; vegetation at the Last Glacial Maximum]
Expert system taking altitudes into account
Developed with Jonathan Adams (Ray and Adams, Internet Archaeology 11)
26
Demography and spatial expansion
[Map: population density]
This is only an example.
27
Dynamic vegetation: LGM, intermediate, PP (present potential)
This is only an example.
29
Genetic simulations
We need to sample indigenous people and avoid admixed individuals
30
Computational issues
A fully spatially-explicit model using 500 loci in 800 individuals: 10 CPU-years
Adding long-distance dispersal: 20 CPU-years
31
SPLATCHE on the grid
Early 2005: joined the Biomed VO of the EGEE project
Mid 2005: tested on the GILDA test bed, and deployed on the Grid
Since late 2005: testing and improvement
Since mid 2006: production mode and optimization
32
Use of SPLATCHE on the grid
[Diagram: N simulations, then statistical tools, then the posterior distribution of the demographic/genetic parameters of interest]
Independent simulations: the more CPUs, the better; job failures are not that bad
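Why job failures are tolerable can be shown with a small sketch: since every simulation is independent, a failed job only removes its own result and the rest can still be analysed. The example uses local threads as a stand-in for grid jobs. The `run_job` function, the 10% failure rate, and the worker count are hypothetical and do not reflect the actual grid middleware.

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_job(job_id):
    """Hypothetical stand-in for one grid job running one simulation."""
    rng = random.Random(job_id)      # independent, reproducible stream per job
    if rng.random() < 0.1:           # pretend that roughly 10% of jobs fail
        raise RuntimeError(f"job {job_id} failed")
    return job_id, rng.gauss(0.0, 1.0)

def run_all(n_jobs, workers=8):
    """Run all jobs, keep the successful results, and list the failures."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_job, j): j for j in range(n_jobs)}
        for fut in as_completed(futures):
            try:
                job_id, value = fut.result()
                results[job_id] = value
            except RuntimeError:
                failed.append(futures[fut])   # could be resubmitted later
    return results, failed

results, failed = run_all(200)
```

The successful results feed the statistical tools directly, and the failed job identifiers can simply be resubmitted or ignored if enough simulations remain.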
33
Optimizations
Submission time: multi-threaded application using up to 30 RBs (used for the WISDOM project)
Fetching time of job outputs: in-house multi-threaded solution for checking status and getting outputs
On the grid: 5 million simulations in 4,000 jobs; this takes about 12 hours with sequential submission
Reduction of the number of simulations (Daniel Wegmann): by MCMC. Promising results (~10 times fewer simulations)
WISDOM (Wide In Silico Docking On Malaria) is an initiative for grid-enabled drug discovery against neglected and emergent diseases. I thank the developers of WISDOM, the team of Vincent Breton, and especially Mathieu Reichstadt (?) and Jean Weismann.
34
The African replacement (AFREG) model best fits the observed genetic diversity in non-genic regions
Scenarios with interbreeding are less well supported
Colonization of the Americas seems to have occurred after the LGM
Use of additional statistics, loci, and samples may increase the power of the method
35
Geographic origin of human dispersal
22 populations
Ray et al. (2005) Genome Research
Talk briefly about multiregional evolution scenarios
37
Mutations surfing during a range expansion
38
Mutations surfing during a range expansion
Klopfstein, Currat and Excoffier (2006) MBE 23(3):
Some mutations can travel with the wave of advance
New mutations can reach high frequencies
More pronounced in small populations
[Figure: spatial distribution of the frequency of a new mutation, K=10]
Phenomenon inferred from simulations, difficult to study analytically
The centroid of the spatial distribution is often far from the origin
39
Selection?
Currat, Excoffier, Maddison, Otto, Ray, Whitlock and Yeaman (2006) Science 313:172a
The discovery of surfing underlines the importance of using realistic models of human evolution before drawing definitive conclusions about selection on particular alleles.
Microcephalin: about 60% difference between Africa and the rest of the world (for a particular haplogroup)
ASPM (abnormal spindle-like microcephaly associated): about 20% difference between Africa and the rest of the world
(2005) Science 309 (5741)
40
Interactions among populations
Interaction between modern humans and Neanderthals in Europe Currat & Excoffier (2004), PLoS Biol.
41
Cane toad invasion in Australia
Estoup, A., Baird, S. J. E., Ray, N., Currat, M., Cornuet, J.-M., Santos, F., Beaumont, M. A. and L. Excoffier. Combining genetic, historical and geographic data to reconstruct the dynamics of the bioinvasion of cane toad Bufo marinus. In prep
42
Take-home message
A good human demographic model is important
Realistic spatially-explicit approaches are essential
The grid is key for sufficient exploration of parameter space
User support and connections outside one’s discipline are crucial
43
Thank you!