Mareike Fischer Revisiting the question: How many characters are needed to reconstruct the true tree? Mareike Fischer and Marta Casanellas Isaac Newton Institute, 20 June 2011
The Problem. Given: alignment (e.g. DNA). Wanted: reconstruction of the ‘true’ tree. Solution: e.g. [illustration missing]. But: is the alignment long enough for a reliable reconstruction?
Previous Approaches. 1. Churchill, von Haeseler, Navidi (1992): 4-taxa scenario. Observations: The probability of reconstructing the true tree increases with the length of the interior edge. [Figure: reconstruction probability (Rec. Prob.) against interior edge length (int. edge), with curves for more characters]
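The observation above can be illustrated with a small Monte Carlo sketch. It is not the simulation design of Churchill et al.; the Jukes-Cantor model, the equal pendant edge lengths, the tie rule (a tie counts as a failure) and all function names are assumptions made for this illustration only.

```python
import math
import random

STATES = "ACGT"

def change_probability(length):
    """Probability that the two ends of an edge differ (Jukes-Cantor, assumed)."""
    return 0.75 * (1.0 - math.exp(-4.0 * length / 3.0))

def evolve(state, length, rng):
    """Copy a state along an edge, substituting it with the JC probability."""
    if rng.random() < change_probability(length):
        return rng.choice([s for s in STATES if s != state])
    return state

def simulate_character(interior, pendant, rng):
    """Draw one site pattern (a, b, c, d) on the true tree 12|34."""
    u = rng.choice(STATES)          # inner node next to taxa 1 and 2
    v = evolve(u, interior, rng)    # inner node next to taxa 3 and 4
    return (evolve(u, pendant, rng), evolve(u, pendant, rng),
            evolve(v, pendant, rng), evolve(v, pendant, rng))

def mp_picks_true_tree(characters):
    """True if parsimony strictly prefers 12|34 (ties count as failure)."""
    support = {"12|34": 0, "13|24": 0, "14|23": 0}
    for a, b, c, d in characters:
        # Only two-state patterns with a 2-2 split are parsimony-informative.
        if a == b and c == d and a != c:
            support["12|34"] += 1
        elif a == c and b == d and a != b:
            support["13|24"] += 1
        elif a == d and b == c and a != b:
            support["14|23"] += 1
    return support["12|34"] > max(support["13|24"], support["14|23"])

def reconstruction_probability(interior, pendant, k, replicates=200, seed=0):
    """Fraction of simulated alignments of k characters where MP finds 12|34."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        chars = [simulate_character(interior, pendant, rng) for _ in range(k)]
        hits += mp_picks_true_tree(chars)
    return hits / replicates
```

Calling `reconstruction_probability` for a range of `interior` values with fixed `pendant` and `k`, and again with a larger `k`, gives curves of the kind described on the slide.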
Previous Approaches. 2. Yang (1998): 4-taxa scenario, interior edge ‘fixed’ at 5% of tree length; 5 different tree shapes were investigated. Observations: ‘Farris Zone’: MP better. ‘Felsenstein Zone’: ML better. The optimal length for the interior edge ranges between [values missing]. [Figure: reconstruction probability (Rec. Prob.) against tree length]
Limitation of previous approaches. The approaches mentioned so far are based on simulations. Still needed: mathematical analysis of the influence of branch lengths on tree reconstruction.
Previous theoretical results. Steel and Székely (2002): [Figure: 4-taxon tree with interior edge of length x and four pending edges of length y]. Here, the number k of characters needed to reconstruct the true tree grows at rate [formula missing]. But what happens if we fix the ratio (y := px), and then take the value of x that minimizes k?
Previous theoretical results. MF and M. Steel (2009). Setting: 4 taxa, pending edges of length px (with p > 1), short interior edge of length x, 2-state symmetric model. [Figure: 4-taxon tree with interior edge x and pending edges px]. The sequence length needed to reliably reconstruct the tree grows at rate p².
Limitation of previous approaches. Previous approaches are based on simulations, or employ only 2 states (oops). Still needed: mathematical analysis of the influence of branch lengths on tree reconstruction for 4 states, so…
Our Approach. Setting: 4 taxa, pending edges of length px (with p > 1), short interior edge of length x, Jukes-Cantor model. [Figure: 4-taxon tree with interior edge x and pending edges px]
Most importantly… We kindly apologize for criticizing Miss Parsimony in the past and... remorsefully offer her an assistant position on our current project. [Cartoon: Miss Parsimony]
Main Result. For reliable MP reconstruction: k grows at least at rate p². For the optimal value of x, k grows at rate p², so this rate can be achieved for 4 states, too!
Idea of Proof. Set X_i i.i.d., and [formula missing]. Then (by CLT) [formula missing]. Note that the true tree T_1 will be favored over T_2 if and only if Z_k > 0.
Idea of Proof. Since the X_i are i.i.d., μ_k and σ_k depend only on k and the probabilities P(X_1 = 1) and P(X_1 = -1). These probabilities can be calculated (e.g. using Felsenstein, Hadamard or Fourier Transform): [formulas missing] (Here, θ = e^(-4x/3).) Note that P(X_1 = 1) and P(X_1 = -1) only depend on x and p. Then, for fixed p, the ratio [formula missing] can be used to find a value of x that minimizes k.
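As a rough numerical companion to this step, the two probabilities can also be obtained by brute-force enumeration over all state assignments instead of a closed formula. This is only a sketch under stated assumptions: X_1 = 1 for patterns of type aabb (supporting T_1 = 12|34), X_1 = -1 for patterns of type abab (supporting T_2 = 13|24) and 0 otherwise; the criterion k ≈ z²σ²/μ² comes from a normal approximation and is not taken from the slides; all function names are hypothetical.

```python
import math
from itertools import product

def jc_edge(length):
    """Return (P(same state), P(one specific other state)) for a JC edge."""
    theta = math.exp(-4.0 * length / 3.0)   # theta as defined on the slide
    return (1.0 + 3.0 * theta) / 4.0, (1.0 - theta) / 4.0

def _step(probs, s, t):
    """Transition probability from state s to state t along one edge."""
    return probs[0] if s == t else probs[1]

def support_probabilities(x, p):
    """Return (P(X_1 = 1), P(X_1 = -1)) for interior edge x, pending edges p*x.

    Assumed coding: aabb patterns give +1, abab patterns give -1, others 0.
    """
    inner = jc_edge(x)
    pend = jc_edge(p * x)
    plus = minus = 0.0
    # u, v: states at the two inner nodes; a, b, c, d: states at taxa 1..4.
    for u, v, a, b, c, d in product(range(4), repeat=6):
        prob = (0.25 * _step(inner, u, v)
                * _step(pend, u, a) * _step(pend, u, b)
                * _step(pend, v, c) * _step(pend, v, d))
        if a == b and c == d and a != c:
            plus += prob
        elif a == c and b == d and a != b:
            minus += prob
    return plus, minus

def characters_needed(x, p, z=1.645):
    """Approximate k with P(Z_k > 0) near Phi(z) (normal approximation, assumed)."""
    plus, minus = support_probabilities(x, p)
    mu = plus - minus
    variance = plus + minus - mu * mu
    return z * z * variance / (mu * mu)

def best_interior_length(p, grid=None):
    """Grid search for the x that minimizes characters_needed for fixed p."""
    if grid is None:
        grid = [i / 100 for i in range(1, 51)]   # arbitrary illustrative grid
    return min(grid, key=lambda x: characters_needed(x, p))
```

Evaluating `characters_needed(best_interior_length(p), p)` for several values of p gives a way to inspect numerically how k grows with p in this setting.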
Idea of Proof: 2. The X_i are i.i.d. Since the X_i are i.i.d., we have [formula missing].
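The formulas on these proof slides are not available here, so the following is only a sketch of how the argument can be written out, not the authors' own derivation. It assumes that X_i takes the values 1, 0, -1 (character i supports T_1, neither tree, or T_2 under parsimony) and that Z_k is the sum of the X_i; the symbols μ, σ, Φ and z_{1-ε} are introduced for this illustration.

```latex
\begin{align*}
Z_k &= \sum_{i=1}^{k} X_i, \qquad
\mu_k = \mathbb{E}[Z_k] = k\,\mu, \qquad
\sigma_k^2 = \operatorname{Var}(Z_k) = k\,\sigma^2, \\
\mu &= P(X_1 = 1) - P(X_1 = -1), \qquad
\sigma^2 = P(X_1 = 1) + P(X_1 = -1) - \mu^2, \\
P(Z_k > 0) &\approx \Phi\!\left(\frac{\mu_k}{\sigma_k}\right)
            = \Phi\!\left(\sqrt{k}\,\frac{\mu}{\sigma}\right)
\quad \text{(by the CLT)}, \\
P(Z_k > 0) \ge 1 - \varepsilon
&\quad\text{roughly requires}\quad
k \ge \left(\frac{z_{1-\varepsilon}\,\sigma}{\mu}\right)^{2}.
\end{align*}
```

Under these assumptions, minimizing k over x for fixed p amounts to maximizing μ/σ, which depends only on x and p.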
Summary and Extension. For MP, the number k of characters needed to reliably reconstruct the true tree grows at rate p². What about other methods? Can they do better (e.g. rate p)?
Extension. Other methods cannot do better!!! (Can be shown using the so-called Hellinger distance.)
The Hellinger Distance. [formula missing] S: set of site patterns; p, q: probability distributions.
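For reference, one common form of the squared Hellinger distance between two distributions p and q on S is the sum over s in S of (√p(s) - √q(s))². Some texts include an extra factor 1/2, and the normalization used on the slide is not recoverable here. A minimal sketch of that form, with distributions represented as dictionaries from site pattern to probability (a representation chosen for this illustration):

```python
import math

def hellinger_squared(p, q):
    """Squared Hellinger distance, in the form without the factor 1/2.

    p and q map site patterns to probabilities; patterns missing from a
    dictionary are treated as having probability 0.
    """
    patterns = set(p) | set(q)
    return sum((math.sqrt(p.get(s, 0.0)) - math.sqrt(q.get(s, 0.0))) ** 2
               for s in patterns)
```

In this form the value is 0 for identical distributions and 2 for distributions with disjoint support.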
Outlook. Questions for future work: What happens when you approach the Felsenstein Zone? What happens in general with different tree shapes or more taxa?
Advantages of mathematics… Questions so far? Else, let’s finally see why boring maths formulas can be less frustrating than biology at times…
Thanks… … to Marta Casanellas, … to the WWTF and the CRM for funding, … to Roger Hargreaves for his terrific cartoons, … to YOU for listening (or at least waking up just in time to read this message).