Michel Veuille Ecole pratique des Hautes Etudes Director of the Systematics and Evolution dept Muséum National d’Histoire Naturelle Paris Scientific Advisory.


Michel Veuille Ecole pratique des Hautes Etudes Director of the Systematics and Evolution dept Muséum National d’Histoire Naturelle Paris Scientific Advisory Board of the CBOL Data Analysis Working Group

What is the molecular signature of speciation events? There is no molecular signature of speciation events.
What are the other signatures of speciation events? There is no universal signature of speciation events.
But there are local signatures of speciation events, and one kind of signature (e.g. morphological) can be present when another (e.g. genetic) is absent.

Two examples (1st of 2): the European earwig Forficula auricularia
A case of two mtDNA species with no morphological difference.
In 1998, the common European earwig was shown to consist of two sympatric and reproductively isolated species differing only in the number of annual broods (one or two broods per year). The two species differ strikingly in COII sequence. This is because the GC% of these species evolves at a very high rate. But since they present no apparent morphological difference, the two species remain unnamed.
References: Wirth, Le Guellec, Vancassel & Veuille, Evolution 52; Wirth, Le Guellec & Veuille, MBE 16.
Figure: GC% at COII in Hexapoda (earwigs vs. other Hexapoda).

Two examples (2nd of 2): Drosophila santomea and Drosophila yakuba on São Tomé
A case of two morphological species with no mtDNA difference.
Drosophila santomea lives in the highlands of São Tomé, above 1100 m. Drosophila yakuba lives in the lowlands, below 1100 m. They hybridize at 1100 m, and nevertheless remain genetically distinct. They share the same mitochondria, but can be easily identified through the colour pattern of the abdomen.
After Lachaise et al., Proc. Roy. Soc. London, 2000.

Species of the Drosophila melanogaster ("black abdomen") subgroup, with year of description and range:
D. melanogaster: 1830, tropical Africa + worldwide
D. simulans: 1919, tropical Africa + worldwide
D. yakuba: 1954, tropical Africa
D. teissieri: 1971, tropical Africa
D. erecta: 1974, tropical Africa
D. mauritiana: 1974, Mauritius island
D. orena: 1978, Cameroon
D. sechellia: 1981, Seychelles islands
D. santomea: 2000, São Tomé island
D. santomea and D. yakuba share the same mitochondrion through common descent.

The condition of the barcoder is challenging
There are many definitions of species. The species concept is hotly debated.
« Species » make sense to everybody. For example, 12% of the nouns in the French vocabulary* correspond to taxa that make sense to a taxonomist (species, families, varieties).
A solution is to let people use whatever species concept they prefer and limit the barcoder's activity to the domain where he/she can be helpful.
* From the Robert, a classic French dictionary.

What data analysis is about
Data analysis consists in providing data to taxonomists, so that they can make decisions about the status of specimens and taxa. Barcoding and taxonomic decisions are logically distinct, even though they can be performed by the same person.
Diagram labels: ?0,000,000 species; black box; data & tools; « This is species A or B »; « This is a new species »; (taxonomist); (barcoder).

What data analysis is about (contd)
If we want to be 100% sure of the assignment of a taxon, then we must look at the nodes below the closest node excluding a sister group with probability p <
Below this point, a series of statistical and classificatory approaches allow us to estimate the probability that the query sequence does or does not belong to an already described species, based on the available information. Alternatively, additional information from other genes, or an enlarged dataset, can increase our understanding of the taxonomic status of the query.
Diagram labels: query sequence; tree of life; local barcode; sister group; closest COI validated node; closest validated node using additional information.

The population genetics background behind data analysis

Principle: going back in time, two sequences from the same population find their last common ancestor with a constant probability p = 1/N per generation. It is a « death process », very different from a normal distribution. The most probable coalescence time is t = 1, the expectation is t = N, and the probability that the two sequences have still not coalesced is P = 0.05 for t = 3N. (Time axis: past, in generations.)
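The three numbers quoted on this slide can be checked with a short sketch. It assumes the waiting time is geometric with parameter 1/N, which is what a constant per-generation probability implies; the function names and the value N = 1000 are illustrative only and do not come from the talk.

```python
import math

def coalescence_pmf(t, N):
    """P(two lineages find their common ancestor exactly t generations back)."""
    p = 1.0 / N
    return p * (1.0 - p) ** (t - 1)

def prob_not_coalesced(t, N):
    """P(the two lineages still have no common ancestor after t generations)."""
    return (1.0 - 1.0 / N) ** t

N = 1000  # illustrative effective number of genes

# Most probable coalescence time: the pmf is strictly decreasing, so it peaks at t = 1.
mode = max(range(1, 5 * N), key=lambda t: coalescence_pmf(t, N))

# Expected coalescence time, summed far enough into the tail to be negligible.
mean = sum(t * coalescence_pmf(t, N) for t in range(1, 50 * N))

# Probability of not having coalesced by t = 3N, close to exp(-3).
tail = prob_not_coalesced(3 * N, N)

print(mode, round(mean, 1), round(tail, 3), round(math.exp(-3), 3))
```

The long right tail is why the mode (1 generation) and the mean (N generations) differ so much, and why the distribution looks nothing like a normal curve.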

Probability p that the MRCA of a sample of size n is also the MRCA of the species, assuming a standard Wright-Fisher model. In a very large population, p = (n-1)/(n+1). p increases very rapidly with n: the probability is p ≈ 0.67 for n = 5, and p = 0.8 for n = 9. Increasing the sample size beyond this is useless.
Figure: MRCA of the sample (n1); p plotted against n.
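A minimal sketch of the formula p = (n-1)/(n+1) stated on this slide, tabulated for a few sample sizes. The function name is hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def prob_sample_has_species_mrca(n):
    """p = (n - 1) / (n + 1): chance that a sample of n genes contains the
    MRCA of the whole (very large) population, as given on the slide."""
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    return Fraction(n - 1, n + 1)

for n in (2, 5, 9, 20, 50):
    p = prob_sample_has_species_mrca(n)
    print(f"n = {n:2d}  p = {p}  (about {float(p):.3f})")
```

Since 1 - p = 2/(n+1), going from n = 9 to n = 20 only raises p from 0.8 to about 0.9, which illustrates the diminishing return from larger samples.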

Typically, under a standard equilibrium Wright-Fisher model(*), the expected time to the last common ancestor of the tree (MRCA), 2N(1-1/n) generations, is at most twice the expected time to the common ancestor of two randomly sampled sequences, N generations.
(*) assuming: neutrality; constant population size; no structuring; mutation-drift equilibrium; N = effective number of genes.
Figure: genealogy of sample n1 with its MRCA; N generations; 2N(1-1/n) generations.
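The expression 2N(1-1/n) can be recovered from the pairwise rate 1/N used earlier. The derivation below is a sketch under the usual coalescent approximation (an assumption not spelled out on the slide): while k lineages remain, with k much smaller than N, some pair coalesces in a given generation with probability about \binom{k}{2}/N. T_k denotes the time spent with k lineages.

```latex
% Assumption: k lineages among N genes, k << N, at most one coalescence per generation.
\begin{align*}
E[T_k] &= \frac{N}{\binom{k}{2}} = \frac{2N}{k(k-1)}, \\
E[T_{\mathrm{MRCA}}(n)] &= \sum_{k=2}^{n} \frac{2N}{k(k-1)}
  = 2N \sum_{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right)
  = 2N \left( 1 - \frac{1}{n} \right).
\end{align*}
```

For n = 2 this gives N generations, and it approaches 2N as n grows, so the deepest node of the tree is never expected to be more than twice as old as a random pairwise coalescence.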

« The older nodes of a genealogy tend to be revealed in a small sample, whereas more recent portions are, on average, only revealed as the sample size per locus grows large. » Kliman et al.
Using a larger dataset does not increase the information very much at this level.
Figure: genealogies of sample n1 and of sample n2 > n1, with their MRCA; N generations; 2N(1-1/n) generations.

After A. G. Clark, 1997. A long time after they have split, two species still share some neutral polymorphisms. Polymorphisms can go very far back in the past of the species, and enter the ancestral population shared with a sister species.

Exploring shallow nodes

1. Matz and Nielsen's MCMC method
Derived from Nielsen and Hey's (2001) IM method, based on MCMC (Markov chain Monte Carlo). The IM method estimates 5 parameters, and thus involves very long computation times. Matz and Nielsen (2005) reduce it to two parameters: the population size and the time to speciation. They estimate the probability that the query sequence belongs to the same species as the reference sample.

2. Evaluating classification and phylogenetic methods: Austerlitz et al.
They compare two classification methods (CART, random forest) and two phylogenetic methods (neighbour-joining, PhyML).
They simulate n + 1 individuals in each species: n individuals are a reference sample, and the last individual is the query. Repeated simulations allow them to record the rate of correct assignment of the query to its species.
The classification methods partition the dataset using a few characters. The distance methods work well with a small dataset, provided there are enough mutations.
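The evaluation protocol described above (n + 1 simulated individuals per species, the last one held out as the query, correct-assignment rate recorded over replicates) can be sketched as follows. This is only an illustration of the protocol: the sequence model is a toy one (independent site flips around a species ancestor, not a coalescent simulation), and the assignment rule is a simple nearest-reference Hamming distance standing in for CART, random forest, neighbour-joining or PhyML. All names and parameter values are hypothetical.

```python
import random

def mutate(seq, rate, rng):
    """Flip each binary site independently with probability `rate` (toy model)."""
    return [b ^ (rng.random() < rate) for b in seq]

def hamming(a, b):
    """Number of sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(query, references):
    """Label of the species holding the reference sequence closest to the query."""
    return min(references,
               key=lambda sp: min(hamming(query, s) for s in references[sp]))

def correct_assignment_rate(n=10, length=200, divergence=0.05, mu=0.01,
                            replicates=200, seed=1):
    """Fraction of held-out queries assigned to their true species."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        root = [rng.randint(0, 1) for _ in range(length)]
        # Two species whose ancestors have diverged from a common root.
        ancestors = {sp: mutate(root, divergence, rng) for sp in ("A", "B")}
        # n + 1 individuals per species: n references plus one query.
        samples = {sp: [mutate(anc, mu, rng) for _ in range(n + 1)]
                   for sp, anc in ancestors.items()}
        references = {sp: seqs[:-1] for sp, seqs in samples.items()}
        for true_sp, seqs in samples.items():
            hits += assign(seqs[-1], references) == true_sp
    return hits / (2 * replicates)

print(correct_assignment_rate())                 # well-separated species
print(correct_assignment_rate(divergence=0.0))   # no divergence: near chance
```

Swapping `assign` for another method while keeping the simulation loop unchanged is what makes assignment rates comparable across methods.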

Comparison of the methods for a low θ (2 populations, reference sample size = 10). Classification methods perform better when variation is low.

Comparison of the methods for a high θ (2 populations, reference sample size = 10, θ = 30). Phylogenetic methods perform better for a highly variable population.

Conclusion : the appropriate method varies with the properties of the dataset

Comparing methods using realistic datasets

1. Litoria nannotis 2. Astraptes fulgerator
4 species, average sample size: 43.7, average θ =
species, average sample size: 38.8, average θ = 23.5

3. Cowries

Other solutions: Can we replace COI? Can we complement it with other genes?

Properties of bilaterian mtDNA: large number of copies per cell; high mutation rate; low variation / divergence ratio; no recombination (asexual); haploid; maternally inherited.
Other systems with comparable properties: rDNA has a high copy number; microsatellites also; centromeres and telomeres (documented in Drosophila); X chromosome, Y chromosome. The Y is asexual; the other chromosomes recombine.
The main disadvantage of asexuality is that mitochondria do not follow Mendel's second law: mtDNA carries no information on genetic barriers.
The main disadvantage of maternal inheritance is that mitochondria can be transferred horizontally along with Wolbachia endosymbiotic bacteria. Examples: Protocalliphora and Drosophila.
Variation in mtDNA is lowered due to selective sweeps, according to Bazin et al. (2006). Variation is also lowered in some nuclear regions due to background selection.

Maternally transmitted endosymbiotic bacteria: hitchhiking by Wolbachia
Nuclear panel: phylogeny of the fly Protocalliphora based on AFLP (nuclear markers), according to Whitworth et al. (2007). Symbols represent different Wolbachia strains.
mtDNA panel: phylogeny of Protocalliphora based on COI + COII.
The authors claim that the assignment of unknown individuals to species is impossible in 60% of the species.
After Whitworth et al., Proc. Roy. Soc. B, in press.

Phylogenetic tree of mtDNA: a phyletic tree in mtDNA represents true phyletic relationships. Mutations are in linkage disequilibrium because they do not recombine. Having two divergent clades is trivial under a standard Wright-Fisher model.
Phylogram of nuclear DNA: the phylogram of a recombining gene, by contrast, represents distances between haplotypes, where mutations can seem to « appear » repeatedly on several terminal branches. They thus inform us on the existence of barriers to gene flow.

Conclusions
1. There is no mitochondrial signature of speciation. There is no room for a barcode species concept, or for anything like a « barcodon ».
2. Even a moderate sample can provide a wealth of information on the history of a species.
3. Additional information can be obtained in difficult cases, either by increasing the population sample, or by using additional markers.

The END