Published by Gabriel Willis. Modified over 9 years ago
1
Michel Veuille
Ecole pratique des Hautes Etudes
Director of the Systematics and Evolution Department, Muséum National d’Histoire Naturelle, Paris
Scientific Advisory Board of the CBOL Data Analysis Working Group
2
What is the molecular signature of speciation events?
There is no molecular signature of speciation events.
What are the other signatures of speciation events?
There is no universal signature of speciation events. But there are local signatures of speciation events, and one kind of signature (e.g. morphological) can be present when another (e.g. genetic) is absent.
3
Two examples: 1st/2. European earwig Forficula auricularia
A case of two mtDNA species with no morphological difference
In 1998, the common European earwig was shown to consist of two sympatric and reproductively isolated species, differing only in the number of annual broods (one or two broods per year). The two species differ strikingly in COII sequence. This is because the GC% of these species evolves at a very high rate.
[Figure: GC% at COII in earwigs compared with other Hexapoda]
But since they present no apparent morphological difference, the two species remain unnamed.
Wirth, Le Guellec, Vancassel & Veuille. 1998. Evolution 52: 260-265.
Wirth, Le Guellec & Veuille. 1999. MBE 16: 1645-1653.
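GC% is a simple sequence statistic, and a minimal sketch of how it can be computed for a coding sequence such as COII is given below. The function names are hypothetical, and splitting by codon position is a common refinement, not necessarily what the original figure plotted.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * sum(b in "GC" for b in acgt) / len(acgt)


def gc_by_codon_position(cds: str) -> list[float]:
    """GC% at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    return [gc_percent(cds[i::3]) for i in range(3)]


# Toy sequence for illustration only (not real COII data).
toy_cds = "ATGGCACTTCGAGCC"
print(gc_percent(toy_cds), gc_by_codon_position(toy_cds))
```

Comparing such values between the two earwig species and other Hexapoda is one way to see a fast-evolving base composition.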
4
Two examples: 2nd/2. Drosophila santomea and Drosophila yakuba (São Tomé)
A case of two morphological species with no mtDNA difference
Drosophila santomea lives in the highlands of São Tomé, above 1100 m; Drosophila yakuba lives in the lowlands, below 1100 m. They hybridize at 1100 m, and nevertheless remain genetically distinct. They share the same mitochondria, but can easily be identified by the colour pattern of the abdomen.
After Lachaise et al., Proc. R. Soc. London, 2000
5
[Figure: tree of the Drosophila melanogaster subgroup, with the year of description and the distribution of each species]
D. melanogaster: 1830, tropical Africa + worldwide
D. simulans: 1919, tropical Africa + worldwide
D. mauritiana: 1974, Mauritius island
D. sechellia: 1981, Seychelles islands
D. yakuba: 1954, tropical Africa
D. santomea: 2000, São Tomé island
D. teissieri: 1971, tropical Africa
D. erecta: 1974, tropical Africa
D. orena: 1978, Cameroon
D. santomea and D. yakuba share the same mitochondrion through common descent. They belong to the Drosophila melanogaster ("black abdomen") subgroup.
6
There are many definitions of species. The species concept is hotly debated. The position of the barcoder is challenging.
«Species» make sense to everybody. For example, 12% of the nouns in the French vocabulary* correspond to taxa that make sense to a taxonomist (species, families, varieties).
*From the Robert, a classic French dictionary.
A solution is to let people use whatever species concept they prefer, and to limit the barcoder’s activity to the domain where he or she can be helpful.
7
What data analysis is about
[Diagram: ?0,000,000 species → black box: data & tools (barcoder) → «This is species A or B» / «This is a new species» (taxonomist)]
Data analysis consists of providing data to taxonomists so that they can make decisions about the status of specimens and taxa. Barcoding and taxonomic decisions are logically distinct, even though they can be performed by the same person.
8
What data analysis is about (contd)
[Figure: a query sequence placed on the tree of life relative to the local barcode dataset, showing the sister group, the closest validated node based on COI, and the closest validated node using additional information]
If we want to be sure of the assignment of a taxon, we must look at the nodes below the closest node that excludes a sister group with probability p < 0.01. Below this point, a series of statistical and classificatory approaches allow us to estimate the probability that the query sequence does or does not belong to an already described species, given the available information. Alternatively, additional information from other genes, or an enlarged dataset, can improve our understanding of the taxonomic status of the query.
9
The population genetics background behind data analysis
10
Principle: two sequences from the same population find their last common ancestor with a constant probability p = 1/N per generation (N = effective number of genes). It is a «death process», very different from a normal distribution:
- the most probable coalescence time is t = 1 generation;
- the expected coalescence time is t = N generations;
- P = 0.05 for t = 3N.
[Figure axis: past (generations)]
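With a constant per-generation probability 1/N, the coalescence time T of two sequences is geometric: P(T = t) = (1 - 1/N)^(t-1) (1/N). The sketch below checks the three numbers on the slide, reading "P = 0.05 for t = 3N" as the probability that the two lineages have not yet coalesced after 3N generations, (1 - 1/N)^(3N) ≈ e^(-3). The value N = 1000 and all function names are illustrative assumptions.

```python
import math


def coalescence_pmf(t: int, n_genes: int) -> float:
    """P(T = t): two lineages first coalesce exactly t generations back."""
    p = 1.0 / n_genes
    return (1.0 - p) ** (t - 1) * p


def prob_not_coalesced(t: int, n_genes: int) -> float:
    """P(T > t): the two lineages are still distinct t generations back."""
    return (1.0 - 1.0 / n_genes) ** t


N = 1000  # illustrative effective number of genes

# The pmf decreases monotonically, so the mode is the first generation.
mode = max(range(1, 10 * N), key=lambda t: coalescence_pmf(t, N))

# Numerical mean; the tail beyond 50N is negligible (about e^-50).
numeric_mean = sum(t * coalescence_pmf(t, N) for t in range(1, 50 * N))

print("most probable t:", mode)
print("expected t:", round(numeric_mean, 3))
print("P(T > 3N):", round(prob_not_coalesced(3 * N, N), 4), "vs e^-3:", round(math.exp(-3), 4))
```

The long right tail is what makes the distribution so unlike a normal one: the mode sits at 1 while the mean is N.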
11
[Figure: p as a function of sample size n; the MRCA of a sample of size n1 compared with the MRCA of the species]
Probability p that the MRCA of a sample of size n is also the MRCA of the species, assuming a standard Wright-Fisher model. In a very large population, p = (n-1)/(n+1). p increases very rapidly: p = 0.6667 for n = 5, and p = 0.8 for n = 9. Increasing the sample size beyond this is useless for recovering the MRCA of the species.
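The slide's formula can be tabulated directly to show how quickly p saturates. A minimal sketch, assuming the large-population formula p = (n - 1)/(n + 1) exactly as stated on the slide; the function name is hypothetical, and exact fractions are used so the quoted values can be checked without rounding.

```python
from fractions import Fraction


def prob_sample_mrca_is_species_mrca(n: int) -> Fraction:
    """p = (n - 1) / (n + 1): chance that a sample of n genes shares the species MRCA."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return Fraction(n - 1, n + 1)


for n in (2, 3, 5, 9, 19, 39):
    p = prob_sample_mrca_is_species_mrca(n)
    print(f"n = {n:2d}  p = {float(p):.4f}")
```

Doubling n from 19 to 39 only moves p from 0.90 to 0.95, which is the diminishing return the slide describes.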
12
[Figure: genealogy of a sample n1, showing N generations to the common ancestor of two sequences and 2N(1 - 1/n) generations to the MRCA]
Typically, under a standard equilibrium Wright-Fisher model(*), the expected time to the last common ancestor of the whole tree (MRCA), 2N(1 - 1/n) generations, is at most twice the expected time to the common ancestor of two randomly sampled sequences (N generations).
(*) assuming:
- neutrality
- constant population size
- no population structure
- mutation-drift equilibrium
- N = effective number of genes
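The expression 2N(1 - 1/n) follows from summing the expected lengths of the successive coalescent epochs: while k lineages remain, each of the k(k-1)/2 pairs coalesces with probability 1/N per generation, so the epoch lasts on average 2N/(k(k-1)) generations, and the sum telescopes. A minimal sketch, assuming the coalescent approximation (N much larger than n); function names are hypothetical.

```python
from fractions import Fraction


def expected_tmrca(n: int, n_genes: int) -> Fraction:
    """Expected generations to the MRCA of n sequences, summed epoch by epoch."""
    # Epoch with k lineages: mean length N / C(k, 2) = 2N / (k (k - 1)).
    return sum(Fraction(2 * n_genes, k * (k - 1)) for k in range(2, n + 1))


def closed_form(n: int, n_genes: int) -> Fraction:
    """The slide's closed form, 2N (1 - 1/n)."""
    return 2 * n_genes * (1 - Fraction(1, n))


N = 1000  # illustrative effective number of genes
for n in (2, 5, 10, 50):
    t = expected_tmrca(n, N)
    assert t == closed_form(n, N)
    # Share of the expected depth taken by the last epoch (2 lineages -> 1).
    print(n, float(t), round(float(Fraction(N) / t), 3))
```

The last column shows that the final epoch, with only two lineages left, always accounts for at least half of the expected depth of the tree.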
13
[Figure: genealogies of a sample n1 and of a larger sample n2 > n1, each with its MRCA; time scales of N and 2N(1 - 1/n) generations]
«The older nodes of a genealogy tend to be revealed in a small sample, whereas more recent portions are, on average, only revealed as the sample size per locus grows large.» Kliman et al. 2000.
Using a larger dataset does not increase the information very much at this level.
14
After AG Clark 1997
A long time after they have split, two species still share some neutral polymorphisms. Polymorphisms can go very far back in the past of the species, and enter the ancestral population shared with a sister species.
15
Exploring shallow nodes
17
1. Matz and Nielsen’s MCMC method
Derived from Nielsen and Hey’s (2001) IM method, based on MCMC (Markov chain Monte Carlo). That method estimated 5 parameters, and thus required very long computation times. Matz and Nielsen (2005) reduce it to two parameters:
- the population size
- the time since speciation.
They estimate the probability that the query sequence does or does not belong to the same species as the reference sample.
18
2. Evaluating classification and phylogenetic methods: Austerlitz et al.
They compare two classification methods:
- CART
- random forest
and two phylogenetic methods:
- neighbour-joining
- PhyML
They simulate n + 1 individuals in each species: n individuals form the reference sample, and the last individual is the query. Repeated simulations allow them to record the rate of correct assignment of the query to its species.
The classification methods partition the dataset using a few characters. The distance methods work well with a small dataset, provided there are enough mutations.
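The simulation design (n + 1 individuals per species, the last one used as the query, correct-assignment rate over replicates) can be sketched as a toy. This is a hypothetical stand-in, not the Austerlitz et al. simulation: there is no coalescent genealogy (within-species variation is star-like around a founder sequence), and nearest-neighbour Hamming distance replaces NJ or PhyML. All names and parameter values are assumptions.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Copy seq, replacing each site by a different base with probability rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def correct_assignment_rate(n_ref=10, length=300, divergence=0.05,
                            within=0.01, replicates=100, seed=1) -> float:
    """Toy design: n + 1 individuals per species, the (n + 1)-th is the query."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(replicates):
        ancestor = "".join(rng.choice(BASES) for _ in range(length))
        # Two species founders, each diverged from the common ancestor (assumed star model).
        founders = [mutate(ancestor, divergence / 2, rng) for _ in range(2)]
        species = [[mutate(f, within, rng) for _ in range(n_ref + 1)] for f in founders]
        references = [(sp, seq) for sp in range(2) for seq in species[sp][:n_ref]]
        for true_sp in range(2):
            query = species[true_sp][n_ref]
            # Nearest-neighbour assignment; ties go to species 0.
            _, assigned = min((hamming(query, seq), sp) for sp, seq in references)
            correct += assigned == true_sp
            total += 1
    return correct / total


print(correct_assignment_rate())
print(correct_assignment_rate(divergence=0.0))
```

With no divergence between species the rate drops towards chance, which is the kind of contrast the repeated simulations are meant to quantify.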
19
Comparison of the methods for a low θ (2 populations, reference sample size = 10)
Classification methods perform better when variation is low.
20
Comparison of the methods for a high θ (2 populations, reference sample size = 10, θ = 30)
Phylogenetic methods perform better for a highly variable population.
21
Conclusion: the appropriate method varies with the properties of the dataset.
22
Comparing methods using realistic datasets
23
1. Litoria nannotis: 4 species, average sample size 43.7, average θ = 1.54
2. Astraptes fulgerator: 12 species, average sample size 38.8, average θ = 23.5
24
3. Cowries
25
Other solutions: Can we replace CO1? Can we complement it with other genes?
26
Properties of bilaterian mtDNA, and other systems that share them:
Large number of copies per cell: rDNA also has a high copy number
High mutation rate: microsatellites also
Low variation/divergence ratio: centromeres, telomeres (documented in Drosophila)
No recombination: centromeres, telomeres (documented in Drosophila)
Asexual: the Y is asexual (the other chromosomes recombine)
Haploid: X chromosome, Y chromosome
Maternally inherited
The main disadvantage of asexuality is that mitochondria do not follow Mendel’s second law: mtDNA carries no information on genetic barriers.
The main disadvantage of maternal inheritance is that mitochondria can be transferred horizontally along with Wolbachia endosymbiotic bacteria. Examples: Protocalliphora and Drosophila.
Variation in mtDNA is lowered by selective sweeps, according to Bazin et al. (2006). Variation is also lowered in some nuclear regions by background selection.
27
Maternally transmitted endosymbiotic bacteria: hitchhiking by Wolbachia
[Figure, left (nuclear): phylogeny of the fly Protocalliphora based on AFLP (nuclear markers), according to Whitworth et al. (2007). Right (mtDNA): phylogeny of Protocalliphora based on COI+COII. Symbols represent different Wolbachia strains.]
The authors claim that the assignment of unknown individuals to species is impossible in 60% of the species.
After Whitworth et al., Proc. R. Soc. B, in press
28
[Figure: phylogenetic tree of mtDNA, with its MRCA, beside a phylogram of nuclear DNA]
A phyletic tree of mtDNA represents true phyletic relationships: because mtDNA does not recombine, its mutations are in linkage disequilibrium. Having two divergent clades is unremarkable under a standard Wright-Fisher model. By contrast, the phylogram of a recombining gene represents distances between haplotypes, in which mutations can seem to «appear» repeatedly on several terminal branches. Such genes therefore inform us on the existence of barriers to gene flow.
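The contrast between a non-recombining tree and a recombining phylogram can be made concrete with the four-gamete test (Hudson and Kaplan 1985), which the slide does not mention and which is shown here only as an illustration. Under the infinite-sites model, a non-recombining locus such as mtDNA never shows all four combinations of alleles at two biallelic sites; finding them implies recombination or a repeated mutation. The function name and the 0/1 toy haplotypes are hypothetical.

```python
from itertools import combinations


def four_gamete_violations(haplotypes: list[str]) -> list[tuple[int, int]]:
    """Return the pairs of biallelic sites (i, j) at which all four two-site haplotypes occur."""
    n_sites = len(haplotypes[0])
    # Keep only sites with exactly two alleles in the sample.
    biallelic = [s for s in range(n_sites) if len({h[s] for h in haplotypes}) == 2]
    violations = []
    for i, j in combinations(biallelic, 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            violations.append((i, j))
    return violations


# Toy data: a tree-like (non-recombining) sample, then one recombinant-looking haplotype.
tree_like = ["000", "100", "110", "111"]
with_recombinant = ["000", "100", "110", "011"]
print(four_gamete_violations(tree_like))
print(four_gamete_violations(with_recombinant))
```

In the second sample, the mutation at site 1 seems to «appear» on two separate branches, which is the pattern a recombining phylogram displays.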
29
Conclusions
1. There is no mitochondrial signature of speciation. There is no room for a barcode species concept, nor for anything like a «barcodon».
2. Even a moderate sample can provide a wealth of information on the history of a species.
3. Additional information can be obtained in difficult cases, either by increasing the population sample or by using additional markers.
30
The END