Tracking the genetic legacy of past human populations through the grid


Tracking the genetic legacy of past human populations through the grid Nicolas Ray University of Bern / CMPG (University of Geneva & UNEP/GRID-Europe) ECSAC09, Veli Lošinj, August 26th 2009

Archaeology: Paleoanthropology:

Human migrations of Homo sapiens sapiens. Figure adapted from Cavalli-Sforza & Feldman, 2003 (map labels: [12,000], [55,000]). Note: let me set the scene of my talk.

Why aim for a good demographic model?

1. Better understand human evolution: origin of modern humans (when, where, how many?); relationship with other members of the genus Homo. 2. Distinguish between the effects of demography and those of selection (biomedical applications): find genes under selection; exclude false positives.

Observed patterns of genetic diversity in contemporary populations result from two sources. A complex past demography: fluctuations in effective population size, substructure, migrations. Gene-specific factors: mutations, recombination, selection.

Semi-spatial approach

Statistical Evaluation of Alternative Models of Human Evolution. Nelson Fagundes, Nicolas Ray, Mark Beaumont, Samuel Neuenschwander, Francisco Salzano, Sandro Bonatto, and Laurent Excoffier. 2007. PNAS, 104: 17614-17619. Data: 50 loci in non-genic regions (Chen and Li, 2001), about 500 bp each, 24,425 bp in total. 30 individuals: 10 Africans, 8 Asians, 12 Amerindians. Chimpanzee sequenced to estimate mutation rates, assuming a divergence time of 6 My.

Models. African replacement: AFRIG, AFREG. Assimilation: ASIG, ASEG. Multiregional evolution: MRE1S, MRE2S, MREBIG, MREBEG. (Each diagram shows the populations AF, AS and AM along a time axis.) There are two things we want to know: which model is the most likely, and what the likely parameter values are (the two main ones: the exit from Africa and the entry into the Americas). → The statistical approach.

Model parameters and priors

Simulations: coalescence theory (figure labels: Africa, Asia, Americas, time). A retrospective model of population genetics. Traces all copies of a gene in a sample from a population to a single ancestral copy shared by all members (MRCA). Assumes no recombination, no selection. Notes: In genetics, coalescent theory is a retrospective model of population genetics that traces all alleles of a gene in a sample from a population to a single ancestral copy shared by all members of the population, known as the most recent common ancestor (MRCA; sometimes also termed the coancestor to emphasize the coalescent relationship). The inheritance relationships between alleles are typically represented as a gene genealogy, similar in form to a phylogenetic tree. This gene genealogy is also known as the coalescent; understanding the statistical properties of the coalescent under different assumptions forms the basis of coalescent theory. In the simplest case, coalescent theory assumes no recombination, no natural selection, and no gene flow or population structure. Advances in coalescent theory, however, allow extensions of the basic coalescent that can include recombination, selection, and virtually any arbitrarily complex evolutionary or demographic model in population genetic analysis. The mathematical theory of the coalescent was originally developed in the early 1980s by John Kingman.
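To make the basic coalescent concrete, here is a minimal sketch of how waiting times between coalescence events can be drawn for a sample of gene copies. It assumes the simplest setting described above (one population of constant size, no recombination, no selection, no structure); the function name, the sample size of 30 and the population size of 10,000 gene copies are illustrative choices, not values taken from the talk, and this is not how SPLATCHE or the study's simulator is implemented.

```python
import random

def simulate_coalescent_times(n_lineages, pop_size, rng):
    """Waiting times (in generations) between successive coalescences.

    Basic Kingman coalescent for a population of pop_size gene copies:
    while k lineages remain, the time to the next coalescence is
    exponential with rate k*(k-1)/2 per pop_size generations.
    """
    times = []
    k = n_lineages
    while k > 1:
        rate = k * (k - 1) / 2 / pop_size
        times.append(rng.expovariate(rate))
        k -= 1  # two lineages merge into one
    return times

rng = random.Random(42)
n, N = 30, 10_000  # illustrative sample size and number of gene copies

# Time to the most recent common ancestor (TMRCA) = sum of all waiting times
tmrca = [sum(simulate_coalescent_times(n, N, rng)) for _ in range(2000)]
mean_tmrca = sum(tmrca) / len(tmrca)

# Theoretical expectation under the same assumptions: 2N(1 - 1/n)
expected = 2 * N * (1 - 1 / n)
print(round(mean_tmrca), round(expected))
```

Averaged over many replicates, the simulated TMRCA should approach the theoretical expectation, which is a quick sanity check for any coalescent simulator.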

Simulated genealogy → mutation (mutation model) → sequences (e.g. TCCTTGTA…ATTGGT, ACCTAGTACAATCGGTAATGCCATTGGT, ACCGAGTA…GTTGGT) → summary statistics. Within population: S, π. Between populations: pairwise FST. Globally: global FST. This process is repeated for each independent locus.
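The summary statistics named on this slide can be sketched in a few lines. The example below computes the number of segregating sites (S) and the nucleotide diversity (mean number of pairwise differences, usually written π) within a sample, and a simple pairwise FST. The talk does not say which FST estimator was used, so the ratio-of-diversities form here is an assumption, and the short sequences are made up for illustration.

```python
from itertools import combinations, product

def segregating_sites(seqs):
    """S: number of alignment columns showing more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def pairwise_diff(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def nucleotide_diversity(seqs):
    """pi: mean number of pairwise differences within one sample."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)

def pairwise_fst(pop1, pop2):
    """FST as 1 - (mean within-population diversity / between-population diversity).

    Assumed estimator, chosen for simplicity.
    """
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    between_pairs = list(product(pop1, pop2))
    between = sum(pairwise_diff(a, b) for a, b in between_pairs) / len(between_pairs)
    return 1 - within / between if between > 0 else 0.0

# Toy aligned sequences (hypothetical, not data from the study)
pop_a = ["ACCTAGTA", "ACCTAGTA", "ACCGAGTA"]
pop_b = ["TCCTTGTA", "TCCTTGTA", "TCCTAGTA"]

print(segregating_sites(pop_a + pop_b))   # S over both samples
print(nucleotide_diversity(pop_a))        # pi within the first sample
print(pairwise_fst(pop_a, pop_b))         # differentiation between samples
```

In an analysis with many loci, these functions would be applied to each locus separately and the per-locus values combined into the vector of summary statistics.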

Approximate Bayesian Computation (ABC). Aim: estimate some parameters Φ in a well-defined model. The rejection-sampling approach (Tavaré et al. 1997; Pritchard et al. 1999): (1) calculate summary statistics (S) for the observed data set; (2) draw parameter values φ’ from the prior distributions and use them to simulate data; (3) calculate summary statistics (S’) on the simulated data set and compare them to the observations: δ = ||S - S’|| (Euclidean distance); (4) accept φ’ if δ is below an arbitrarily small threshold, otherwise reject the sample. The ABC approach (Beaumont et al. 2002) adds one modification: a local regression is performed within the set of accepted φ’ values.
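The rejection-sampling steps above can be sketched as follows. This is a toy illustration only: the simulator is a hypothetical stand-in (Gaussian draws summarized by their mean and standard deviation) rather than a coalescent simulation, the uniform prior and the tolerance of 0.3 are arbitrary assumptions, and the local-regression adjustment of Beaumont et al. (2002) is not implemented.

```python
import math
import random

def simulate_summaries(phi, rng, n=50):
    """Hypothetical toy simulator: n Gaussian draws with mean phi,
    summarized by (sample mean, sample standard deviation)."""
    data = [rng.gauss(phi, 1.0) for _ in range(n)]
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return (mean, sd)

def abc_rejection(observed, prior_draw, n_sims, epsilon, rng):
    """Keep the parameter draws whose simulated summaries fall within
    Euclidean distance epsilon of the observed summaries."""
    accepted = []
    for _ in range(n_sims):
        phi = prior_draw(rng)                    # draw from the prior
        simulated = simulate_summaries(phi, rng) # simulate and summarize
        delta = math.dist(observed, simulated)   # Euclidean distance
        if delta < epsilon:
            accepted.append(phi)
    return accepted

rng = random.Random(0)
observed = simulate_summaries(3.0, rng)  # pseudo-observed data, true phi = 3

posterior = abc_rejection(
    observed,
    prior_draw=lambda r: r.uniform(0.0, 10.0),  # assumed flat prior
    n_sims=10_000,
    epsilon=0.3,
    rng=rng,
)
estimate = sum(posterior) / len(posterior)
print(len(posterior), round(estimate, 2))
```

The accepted values approximate the posterior distribution of the parameter. A smaller tolerance gives a better approximation but accepts fewer simulations, which is why millions of simulations are needed in practice.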

Rejection-sampling method (Tavaré et al. 1997; Pritchard et al. 1999). Aim: estimate some parameters Φ in a well-defined model. Neuenschwander (2006).

Computational issues. One simulation: draw parameter values from priors → simulate one genealogy → generate genetic data → compute summary statistics. This is repeated 1-10 million times. Computer clusters: UBELIX (>500 nodes), Zooblythii (~40 nodes).

For ABC, 5 million demographic simulations are necessary to obtain robust parameter estimates. Each demographic simulation is followed by n genetic simulations (n = number of loci). Example: 8 simple models, 50 loci, 30 individuals → 2 CPU-years.

Relative probabilities of models of human evolution. [Figure: model diagrams for African replacement (AFRIG, AFREG), assimilation (ASIG, ASEG) and multiregional evolution (MRE1S, MRE2S, MREBIG, MREBEG), each with populations AF, AS, AM. Probability values shown on the slide, in extraction order: 0.781, 0.001, 0.958, 0.042, 0.091, 0.909; 0.218, 0.461, 0.422, 0.048, 0.069.]

Parameter estimates under the AFREG model (populations AF, AS, AM). Speciation time: 142 Kya (104 – 186). Out-of-Africa time: 51.1 Kya (40.1 – 70.9). Americas colonization time: 10.3 Kya (7.6 – 15.9).

Fully spatial approach

A complex demography. Figure adapted from Cavalli-Sforza & Feldman, 2003 (map labels: [10,000], [55,000]). Landscape effects on demography: demographic and spatial expansions, population bottlenecks, population isolation, fast migration events, secondary contacts. Note: in this figure the environment of the world looks quite uniform, but of course that is not the case…

From environment to demography: carrying capacity (map legend: low to high). Based on data from the literature on observed densities of contemporary hunter-gatherers. Note that the geographic projection has been changed in order to minimize distortions. Spatial resolution: 100 km.

From environment to demography: friction (map legend: low to high). Based on some data from the literature, but with even fewer data than for K.

Demographic simulations: stepping-stone model (cellular automaton). Each cell is a deme. (Figure axes: population size against time.)
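A stepping-stone demographic simulation of this kind can be sketched as below. This is a deliberately simplified, deterministic, one-dimensional version: the talk does not give the update rules, so the logistic growth step, the equal split of migrants between the two neighbouring demes, and all parameter values (growth rate, migration rate, carrying capacity, number of founders) are assumptions made for illustration and are not SPLATCHE's actual implementation.

```python
def step(sizes, capacity, growth_rate, migration_rate):
    """One generation: logistic growth in each deme, then migration
    of a fraction of each deme to its nearest neighbours."""
    grown = [n + growth_rate * n * (1 - n / k) if k > 0 else 0.0
             for n, k in zip(sizes, capacity)]
    new_sizes = [0.0] * len(grown)
    for i, n in enumerate(grown):
        new_sizes[i] += n
        for j in (i - 1, i + 1):
            if 0 <= j < len(grown):
                moved = migration_rate * n / 2  # half of the migrants go each way
                new_sizes[j] += moved
                new_sizes[i] -= moved
            # migrants that would leave the lattice stay in their deme
    return new_sizes

def simulate(n_demes, generations, capacity,
             growth_rate=0.3, migration_rate=0.1, founders=50.0):
    """Start with founders in the first deme and record every generation."""
    sizes = [0.0] * n_demes
    sizes[0] = founders
    history = [sizes]
    for _ in range(generations):
        sizes = step(sizes, capacity, growth_rate, migration_rate)
        history.append(sizes)
    return history

capacity = [100.0] * 20          # uniform carrying capacity K per deme
history = simulate(20, 200, capacity)
print([round(n) for n in history[40]])   # expansion wave in progress
print([round(n) for n in history[-1]])   # all demes close to K
```

Replacing the uniform `capacity` list with values derived from a vegetation map, and making migration depend on friction, is the kind of environmental heterogeneity the fully spatial approach is meant to capture.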

SPLATCHE: SPatiaL And Temporal Coalescences in Heterogeneous Environment (http://cmpg.unibe.ch/software/splatche). Windows-based graphical version: extremely useful for exploring simulation outputs. A console version is also available for faster execution of simulations on Linux-based systems.

Vegetation maps: present potential vegetation and vegetation at the Last Glacial Maximum. Expert system taking altitude into account, developed with Jonathan Adams (Ray and Adams. 2001. Internet Archaeology 11).

Demography and spatial expansion: population density. This is only an example.

Dynamic vegetation: LGM, intermediate, PP (present potential). This is only an example.

Genetic simulations: we need to sample indigenous people and avoid admixed individuals.

Computational issues. A fully spatially explicit model using 500 loci in 800 individuals → 10 CPU-years. Adding long-distance dispersal → 20 CPU-years.
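To put these CPU-year figures in perspective, the short calculation below converts a CPU-time budget into wall-clock time for a given number of processors. It assumes perfectly parallel, independent simulations with no scheduling overhead, which is an idealization; the CPU counts are illustrative and not taken from the talk.

```python
HOURS_PER_YEAR = 365.25 * 24

def wall_clock_days(cpu_years, n_cpus):
    """Idealized wall-clock time in days, assuming perfect parallelism."""
    return cpu_years * HOURS_PER_YEAR / n_cpus / 24

for cpu_years in (10, 20):
    for n_cpus in (40, 500, 2000):   # illustrative processor counts
        days = wall_clock_days(cpu_years, n_cpus)
        print(f"{cpu_years} CPU-years on {n_cpus} CPUs: {days:.1f} days")
```

Under these assumptions, 10 CPU-years shrink to about a week on 500 processors, which is the motivation for moving the simulations to a grid.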

SPLATCHE on the grid. Early 2005: joined the Biomed VO of the EGEE project. Mid 2005: tested on the GILDA test bed and deployed on the grid. Since late 2005: testing and improvement. Since mid 2006: production mode and optimization.

Use of SPLATCHE on the grid: N simulations → statistical tools → posterior distribution of the demographic/genetic parameters of interest. The simulations are independent: the more CPUs, the better, and job failures are not that bad.

Optimizations. Reduction of the number of simulations (Daniel Wegmann): by MCMC, with promising results (~10 times fewer simulations). Submission time: multi-threaded application using up to 30 RBs (Resource Brokers), as used for the WISDOM project. Fetching time of job outputs: in-house multi-threaded solution for checking job status and getting outputs. On the grid, 5 million simulations (4,000 jobs) take about 12 hours with sequential submission. WISDOM (Wide In Silico Docking On Malaria) is an initiative for grid-enabled drug discovery against neglected and emergent diseases. I thank the developers of WISDOM, the team of Vincent Breton, and especially Mathieu Reichstadt (?) and Jean Weismann.
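The job layout mentioned on this slide (5 million simulations in 4,000 jobs) and the tolerance to job failures can be illustrated with a small local sketch. A thread pool stands in for the grid here, `run_job` is a hypothetical placeholder in which every tenth job is made to fail on purpose, and the even split of simulations across jobs is an assumption; none of this reflects the actual grid middleware or the in-house submission tools.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def split_into_jobs(n_simulations, n_jobs):
    """Number of simulations per job, spreading any remainder evenly."""
    base, extra = divmod(n_simulations, n_jobs)
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]

def run_job(job_id, n_sims):
    """Hypothetical stand-in for one grid job; every 10th job 'fails'."""
    if job_id % 10 == 0:
        raise RuntimeError(f"job {job_id} lost")
    return n_sims  # a real job would return simulated summary statistics

def run_all(job_sizes, max_workers=8):
    """Submit all jobs concurrently and collect whatever succeeds."""
    completed, failed = 0, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_job, i, n): i for i, n in enumerate(job_sizes)}
        for future in as_completed(futures):
            try:
                completed += future.result()
            except RuntimeError:
                failed.append(futures[future])  # could be resubmitted later
    return completed, sorted(failed)

job_sizes = split_into_jobs(5_000_000, 4_000)   # 1,250 simulations per job
completed, failed = run_all(job_sizes)
print(job_sizes[0], completed, len(failed))
```

Because every simulation is an independent draw from the prior, the simulations from successful jobs remain usable on their own: lost jobs only reduce the total count and can be resubmitted or simply dropped.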

The African replacement (AFREG) model best fits the observed genetic diversity in non-genic regions. Scenarios with interbreeding are less well supported. Colonization of the Americas seems to have occurred after the LGM. Use of additional statistics, loci and samples may increase the power of the method.

Geographic origin of human dispersal: 22 populations. Ray et al. (2005) Genome Research. Note: talk briefly about multiregional evolution scenarios.

Mutations surfing during a range expansion

Mutations surfing during a range expansion. Klopfstein, Currat and Excoffier (2006) MBE 23(3): 482-490. Some mutations can travel with the wave of advance. New mutations can reach high frequencies. The effect is more pronounced in small populations. Figure: spatial distribution of the frequency of a new mutation, K=10. The phenomenon is inferred from simulations and is difficult to study analytically. The centroid of the spatial distribution is often far from the origin.

Selection? Currat, Excoffier, Maddison, Otto, Ray, Whitlock and Yeaman (2006) Science 313:172a. The discovery of surfing underlines the importance of using realistic models of human evolution before drawing definitive conclusions about selection on particular alleles. Microcephalin: about 60% difference between Africa and the rest (for a particular haplogroup). ASPM (abnormal spindle-like microcephaly associated): about 20% difference between Africa and the rest. (2005) Science 309 (5741).

Interactions among populations Interaction between modern humans and Neanderthals in Europe Currat & Excoffier (2004), PLoS Biol.

Cane toad invasion in Australia Estoup, A., Baird, S. J. E., Ray, N., Currat, M., Cornuet, J.-M., Santos, F., Beaumont, M. A. and L. Excoffier. Combining genetic, historical and geographic data to reconstruct the dynamics of the bioinvasion of cane toad Bufo marinus. In prep

Take-home message. A good human demographic model is important. Realistic, spatially explicit approaches are essential. The grid is key for sufficient exploration of parameter space. User support and connections outside one’s discipline are crucial.

Thank you!