Pete Lockhart Massey University Allan Wilson Centre, New Zealand


1 Pete Lockhart Massey University Allan Wilson Centre, New Zealand

2 Can we reconstruct the evolutionary history of ancient divergences from analyses of protein sequences? It is in such cases that model misspecification is most important

3 If the substitution model is misspecified it can: (1) reduce reconstruction accuracy (without favouring a particular topology), e.g. as can happen when you assume all sites are variable when some are invariable, or if you assume sequences have evolved under one evolutionary model when they may have evolved under more than one model
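A small worked example may help with the first case. The sketch below is not from the talk: it assumes a Jukes-Cantor (JC69) model and uses hypothetical function names to compare the distance estimated when all sites are assumed variable with the distance obtained when a known fraction `p_inv` of sites is treated as invariable.

```python
import math

def jc69_distance(p):
    """JC69 distance (substitutions per site) from the observed
    proportion p of sites that differ between two sequences."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_distance_with_invariable(p, p_inv):
    """JC69 distance when a fraction p_inv of sites cannot change.

    All observed differences must sit in the variable fraction, so the
    correction is applied there and then averaged over all sites.
    Only valid while p / (1 - p_inv) < 0.75.
    """
    p_variable = p / (1.0 - p_inv)
    return (1.0 - p_inv) * jc69_distance(p_variable)

if __name__ == "__main__":
    p = 0.3  # illustrative observed proportion of differing sites
    print("all sites assumed variable:", jc69_distance(p))
    print("half the sites invariable: ", jc69_distance_with_invariable(p, 0.5))
```

Under these assumptions, ignoring the invariable sites gives the smaller distance, so long edges are underestimated more than short ones.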

4 Felsenstein (1978); Hendy & Penny (1989) [Figure: four-taxon trees with tips A, B, C, D and an outgroup]

5 PNAS (1996) 93, pp

6 If the substitution model is misspecified it can:
(2) induce topological distortion. Such as might be the case if you assume a stationary distribution of amino acids or DNA bases and some lineages have increased proportions of some residues, or if you assume a constant proportion of variable sites and some lineages have increased their proportion

7 Biosynthesis of chlorophyll and bacteriochlorophyll
PNAS (1996) 93, pp

8 bchL chlL chlL ? bchX nifH

9 Asymmetrical rate variation (XTSRV)
rRNA, EF-1α, -tubulin, RPB1, actin; e.g. Embley and Hirt (1998) Current Opinion in Genetics and Development 8, ; Philippe et al. (2000) Proc R Soc Lond B 267, ; Inagaki et al. (2004) MBE ; Guo and Stiller (2005) MBE 22,

10 Eukaryotic RNA Polymerase II Evolution
core functions co-evolution opportunistic interactions Guo and Stiller (2005) MBE 22,

11 “in different lineages, co-evolution of proteins canalizes the evolution of a protein in different directions” Lopez et al. (2002) MBE 19, 1-7. “…some of the EF-1α auxiliary functions may have been lost/weakened during the reductive evolution of microsporidia” Inagaki et al. (2004) MBE 21,

12 spatial heterogeneity
tempo of evolution of genes encoding core metabolic processes in the photosynthetic apparatus is highly constrained by protein-protein, protein-lipid, and protein-cofactor interactions (collectively called "protein interactions").

13 Rates Across Sites (Uzzell and Corbin 1971)
[Figure: sites labelled slow, fast, etc.; gamma distributions of rates across sites for shape parameter α = 20, 5, 1, 0.5, 0.1; Yang (1994)]
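To give a feel for what the shape parameter does, here is a minimal sketch, assuming the values on the slide are gamma shape parameters α in the sense of Yang (1994). The function names and the 0.1 threshold are illustrative choices, not part of the talk. Rates are drawn from a gamma distribution scaled to have mean 1, so only the spread across sites changes with α.

```python
import random

def gamma_site_rates(alpha, n_sites, seed=0):
    """Draw relative rates for n_sites from a gamma distribution with
    shape alpha and scale 1/alpha, so that the mean rate is 1."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]

def fraction_nearly_invariant(rates, threshold=0.1):
    """Fraction of sites whose relative rate falls below threshold
    (an arbitrary cut-off used here only for illustration)."""
    return sum(r < threshold for r in rates) / len(rates)

if __name__ == "__main__":
    for alpha in (20, 5, 1, 0.5, 0.1):
        rates = gamma_site_rates(alpha, 20000)
        mean = sum(rates) / len(rates)
        slow = fraction_nearly_invariant(rates)
        print(f"alpha={alpha}: mean rate {mean:.2f}, fraction below 0.1 = {slow:.2f}")
```

Small α puts most sites close to a rate of zero with a few very fast sites, while large α makes all sites evolve at nearly the same rate.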

14 An alternative model Fitch and Markowitch (1970)
[Figure labels: plant, animal; N3, N4; ~ only 10% of sites variable at any given time]
The concept of heterotachy was motivated not only by Hervé's own observations but also by observations from many others, including those of Walter Fitch and colleagues. Again, these authors took note of sites that were unvaried in some groups but varied in others (N3, N4), and there seemed to be too many of these sites to be explained by simple rates-across-sites models.
To explain their observations, Fitch and Markowitz proposed a covarion model in which, at any given time, only a small proportion of sites are able to accept substitutions. In mammalian cytochrome c they thought this was about 10% of the sites. Over time the proportion of variable sites remained the same, but the actual sites that could accept substitutions changed, so that a site that was invariable might become variable in a lineage and, correspondingly, a site that was variable might become invariable.
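The idea that the proportion of variable sites stays constant while the identity of those sites changes can be sketched with a simple two-state switch. This is an illustration only: the switching rates `s01` (off to on) and `s10` (on to off), the time step and the function name are all assumptions, with values chosen so that about 10% of sites are on at equilibrium.

```python
import random

def simulate_switching(n_sites=2000, s01=0.1, s10=0.9, steps=200, dt=0.05, seed=1):
    """Let each site switch independently between off and on.

    Uses a discrete-time approximation: in each step of length dt an
    off site turns on with probability s01*dt and an on site turns off
    with probability s10*dt. Returns the on/off state of every site at
    the start and at the end of the lineage.
    """
    rng = random.Random(seed)
    p_on = s01 / (s01 + s10)  # equilibrium proportion of variable sites
    state = [rng.random() < p_on for _ in range(n_sites)]
    start = list(state)
    for _ in range(steps):
        for i in range(n_sites):
            if state[i]:
                if rng.random() < s10 * dt:
                    state[i] = False
            elif rng.random() < s01 * dt:
                state[i] = True
    return start, state

if __name__ == "__main__":
    start, end = simulate_switching()
    still_on = sum(a and b for a, b in zip(start, end))
    print("proportion on at start:", sum(start) / len(start))
    print("proportion on at end:  ", sum(end) / len(end))
    print("sites on at both times:", still_on)
```

The proportion of sites that are on stays near its equilibrium value, yet the sites that are on at the end are mostly not the ones that were on at the start.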

15 covarion Tuffley and Steel (1998) Huelsenbeck (2002)
[Figure: sites switch between off and on states with rates S01 and S10; on states are shown for slow, fast and faster rate classes]

16 covarion Galtier (2001) [Figure: rate classes R1, R2, R3, R4 with switching rate S11 between them]

17 “the number of variable positions can be different between lineages (Germot and Philippe 1999), suggesting that a constant c is a limitation of the covarion model…” Lopez, Casane & Philippe (2002) Mol Biol Evol 19: 1-7
Hervé Philippe was intrigued by the covarion model but felt it was too restrictive for describing protein evolution. He and others have suggested that the proportion of variable sites in different lineages might also change through time. The covarion model of Fitch and Markowitz and all current implementations of covarion models expect a constant proportion of variable sites across all taxa.

18 increased rate in B&C; increased pvar in B&C
[Figure: four-taxon trees with tips A, B, C, D] Sys Biol (2005) 54,

19 mixtures can also be used to simulate changing pvar
[Figure: trees with tips A, B, C, D illustrating long branch attraction and induced topological distortion]
Now, Mike Steel recognised about 5 years ago that you could also use mixtures to simulate changing pvar. The embarrassing thing is that I have only just got round to running a few simulations. The sort of question you can ask is how many variable sites you need to turn on in non-adjacent lineages before tree building starts to get into trouble. What happens as you increase the proportion of variable sites in a lineage is that the branches become longer, and eventually this topological distortion leads to long branch attraction.

20 Simulation: 0-0.3 invariable sites switch on in B+C (TS98)
assume ancestral pvar = 0.2
x = point where we increase pvar in B+C
simulate with seqgen-cov.exe
reconstruct with PAUP* (assuming simple model), report support for each of 3 unrooted trees AB|CD, AD|BC, AC|BD
[Figure: tree with tips A, B, C, D; edge labels 0.3, 0.3, 0.4, 0.4, 0.1, 0.1, 0.02, 0.02]
So here is a simple simulation. Let's assume an ancestral proportion of variable sites of 20%, and at an arbitrary point (1/3 of the way along the edges leading to B+C) we increase the proportion of variable sites by 2-30% (convergently, so sites turning on in B are also turning on in C). We ask the question: how much of an increase in variable sites do you need before methods start getting into trouble?
We have also studied this in two cases: where the underlying substitution model is JC69, and where it is JC69+TS. I have reconstructed trees from these simulated data assuming a Jukes-Cantor model and making some allowance for rate variation using a gamma or invariable-sites model. So we have two types of model misspecification going on when the data have been simulated assuming a Tuffley and Steel model and when the proportion of variable sites is increasing.
Increased pvar will induce a topological distortion. TS will not induce a topological distortion, but it may reduce the reconstruction accuracy.
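The design of this experiment can be sketched in a few lines of Python. This is a toy stand-in and not the seqgen-cov.exe / PAUP* pipeline used for the results. It assumes a true tree ((A,B),(C,D)), JC69 substitution, hypothetical edge lengths (0.3 for external edges, 0.1 for the internal edge), and treats x as a fraction of all sites. Instead of a likelihood reconstruction it simply counts the parsimony-informative site patterns that support each of the three unrooted trees.

```python
import math
import random

BASES = "ACGT"

def evolve(base, t, rng):
    """Evolve one base along an edge of length t (expected substitutions
    per site) under JC69."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))
    if rng.random() < p_change:
        return rng.choice([b for b in BASES if b != base])
    return base

def simulate_site(kind, rng, ext=0.3, internal=0.1):
    """Return tip states (A, B, C, D) for one site on the tree ((A,B),(C,D)).

    kind "on": variable on every edge; "off": invariable everywhere;
    "switch": invariable except on the last 2/3 of the edges to B and C.
    Edge lengths are hypothetical values chosen for illustration.
    """
    node_ab = rng.choice(BASES)
    if kind == "on":
        node_cd = evolve(node_ab, internal, rng)
        return (evolve(node_ab, ext, rng), evolve(node_ab, ext, rng),
                evolve(node_cd, ext, rng), evolve(node_cd, ext, rng))
    if kind == "switch":
        t_on = ext * 2.0 / 3.0  # site turns on 1/3 of the way along the edge
        return (node_ab, evolve(node_ab, t_on, rng),
                evolve(node_ab, t_on, rng), node_ab)
    return (node_ab,) * 4

def pattern_support(n_sites, pvar=0.2, x=0.0, seed=0):
    """Count informative site patterns supporting each unrooted tree.

    pvar is the ancestral proportion of variable sites; x is the extra
    proportion of sites (of all sites, by assumption) that switch on
    convergently in B and C.
    """
    rng = random.Random(seed)
    support = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        u = rng.random()
        kind = "on" if u < pvar else "switch" if u < pvar + x else "off"
        a, b, c, d = simulate_site(kind, rng)
        if a == b and c == d and a != c:
            support["AB|CD"] += 1
        elif a == c and b == d and a != b:
            support["AC|BD"] += 1
        elif a == d and b == c and a != b:
            support["AD|BC"] += 1
    return support

if __name__ == "__main__":
    for x in (0.0, 0.1, 0.3):
        print(x, pattern_support(40000, x=x, seed=1))
```

Sites that switch on only in B and C can only ever add to the AD|BC count, which is the sense in which increasing pvar in non-adjacent lineages pulls B and C together.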

21 Here are some first results when we assume a constant proportion of invariable sites in the reconstruction. For each treatment we have simulated 100 data sets, where the sequences are 10,000 in length, and we are looking at the reconstruction accuracy. So, for example, when you don't increase pvar, ML has good reconstruction accuracy, but as you increase pvar the accuracy falls. By the time you have made 8% of the invariable sites variable, the method is struggling.

22 You see a similar result if, instead of assuming a constant proportion of variable sites in the reconstruction, you estimate the proportion of variable sites for every dataset and treatment.

23 A worrying result concerns the case where you reconstruct assuming a gamma distribution and you estimate alpha. By optimising on this misspecified model you actually increase your chance of finding an incorrect topology.

24 Interestingly, if you assume gamma + I and estimate the relevant parameters, then reconstruction accuracy falls, but the wrong tree is not greatly favoured as was the case when gamma was used alone.

25 Good ol' parsimony isn't terribly good: the results look like the worst case we found with ML, i.e. when we assumed gamma and estimated alpha

26 Summary
Lineage specific differences in structural and functional constraint will affect which sites vary and how many of them vary
Lineage specific changes in proportions of variable sites motivated the concept of heterotachy
Simulations suggest that a relatively small increase in the proportion of variable sites in non-adjacent lineages is a problem for reconstruction accuracy

27 Ellen Nisbet Chris Howe Bill Martin Nicole Gruenheit Mike Steel PLG organisers Microsoft

28 Heterotachy fast slow fast slow Lopez et al. 2002 MBE 19, 1-7
Hervé Philippe and colleagues coined the concept of heterotachy based on inferences of lineage specific differences in rates of evolution. They relied on observations such as those in the slide to infer that, in some proteins, some lineages were evolving faster at some sites. Hervé was particularly worried about the effect heterotachy might have when building trees, because it is well known from the earlier work of Felsenstein (1978) that disproportionate rates of evolution in different lineages can induce topological distortion and lead to the general long branch attraction problem discussed by Hendy and Penny in 1989.

29 Philippe et al. BMC Evolutionary Biology 2005, 5:50

30

