Phylogenetic Trees Lecture 4


1 Phylogenetic Trees Lecture 4
Based on: Durbin et al., Chapter 8.

2 Phylogenetic Tree Assumptions
Topology T: bifurcating.
Leaves: 1 … N.
Internal nodes: N+1 … 2N-2.
Lengths: t = { t_i }, one for each branch.
Phylogenetic tree = (Topology, Lengths) = (T, t).
[Figure: a tree with a leaf, a branch, and an internal node labeled.]

3 Maximum Likelihood Approach
Consider the phylogenetic tree to be a stochastic process.
[Figure: a tree with sequences such as AAA, AGA, AAG and GGA at its nodes; the internal nodes are unobserved, the leaves are observed.]
The probability of transition from character a to character b is given by parameters $\theta_{b|a}$. The probability of letter a in the root is $q_a$. These parameters are defined as the rate of change per time unit multiplied by the elapsed time. Given the complete tree, the probability of the data is defined by the values of the $\theta_{b|a}$'s and the $q_a$'s.

4 Maximum Likelihood Approach
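Assume each site evolves independently of the others:
$\Pr(D \mid \text{Tree}, \theta) = \prod_i \Pr(D^{(i)} \mid \text{Tree}, \theta)$
Write down the likelihood of the data (the leaf sequences) given each tree. Use EM to estimate the $\theta_{b|a}$ parameters.
When the tree is not given, search for the tree that maximizes
$\Pr(D \mid \text{Tree}, \theta_{EM}) = \prod_i \Pr(D^{(i)} \mid \text{Tree}, \theta_{EM})$
[Figure: a tree with single-site characters A and G at its nodes.]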

5 Probabilistic Methods
The phylogenetic tree represents a generative probabilistic model (like HMMs) for the observed sequences.
Background probabilities: q( a )
Mutation probabilities: P( a | b, t )
Models for evolutionary mutations:
- Jukes-Cantor
- Kimura 2-parameter model
Such models are used to derive the probabilities.

6 Jukes-Cantor model: a model for mutation rates
Mutation occurs at a constant rate. Each nucleotide is equally likely to mutate into any other nucleotide, with rate $\alpha$.

7 The Jukes-Cantor model (1969)
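We need to develop a formula for DNA evolution via Prob(y | x, t), where x and y are taken from {A, C, G, T} and t is the time length. Jukes-Cantor assumes an equal rate of change $\alpha$ between any two distinct nucleotides, which gives the rate matrix (rows and columns ordered A, C, G, T):
$$R = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$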

8 The Jukes-Cantor model (Cont.)
We denote by S(t) the matrix of transition probabilities: $S(t) = \big(\mathrm{Prob}(y \mid x, t)\big)_{x, y \in \{A, C, G, T\}}$.
We assume the matrix is multiplicative in the sense that S(t + s) = S(t) S(s) for any time lengths s and t.

9 The Jukes-Cantor model (Cont.)
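For a short time period $\varepsilon$, we write: $S(\varepsilon) \approx I + R\varepsilon$
By multiplicativity: $S(t+\varepsilon) = S(t)\,S(\varepsilon) \approx S(t)(I + R\varepsilon)$
Hence: $[S(t+\varepsilon) - S(t)] / \varepsilon \approx S(t)\,R$
Leading to the linear differential equation: $S'(t) = S(t)\,R$
With the additional condition that in the limit as t goes to infinity, every entry of S(t) tends to 1/4.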

10 The Jukes-Cantor model (Cont.)
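By symmetry, S(t) has a single diagonal value $r_t$ and a single off-diagonal value $s_t$. Substituting S(t) into the differential equation yields:
$$r_t' = -3\alpha\, r_t + 3\alpha\, s_t, \qquad s_t' = \alpha\, r_t - \alpha\, s_t$$
Yielding the unique solution, which is known as the Jukes-Cantor model:
$$r_t = \tfrac{1}{4}\left(1 + 3e^{-4\alpha t}\right), \qquad s_t = \tfrac{1}{4}\left(1 - e^{-4\alpha t}\right)$$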
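As a rough numerical illustration, the sketch below evaluates the standard Jukes-Cantor closed form, $r_t = \tfrac{1}{4}(1 + 3e^{-4\alpha t})$ on the diagonal and $s_t = \tfrac{1}{4}(1 - e^{-4\alpha t})$ elsewhere. The function name `jukes_cantor` and the default rate `alpha=1.0` are assumptions made for this example, not part of the lecture.

```python
import math

NUCLEOTIDES = "ACGT"

def jukes_cantor(t, alpha=1.0):
    """Return S(t) as nested lists; rows are the start state x, columns the end state y.

    alpha is the (assumed) substitution rate between any two distinct nucleotides.
    """
    e = math.exp(-4.0 * alpha * t)
    r = 0.25 * (1.0 + 3.0 * e)  # probability of ending in the starting state
    s = 0.25 * (1.0 - e)        # probability of ending in one specific other state
    return [[r if x == y else s for y in NUCLEOTIDES] for x in NUCLEOTIDES]

# S(0) is the identity, and the entries approach 1/4 as t grows.
for t in (0.0, 0.1, 1.0, 10.0):
    same, different = jukes_cantor(t)[0][0], jukes_cantor(t)[0][1]
    print(f"t={t:5.1f}  P(A|A,t)={same:.4f}  P(C|A,t)={different:.4f}")
```

Each row sums to one for every t, since $r_t + 3 s_t = 1$.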

11 Kimura 2-parameter model
Allows a different rate for transitions and transversions.

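12 Kimura’s K2P model (1980)
The Jukes-Cantor model does not take into account that the rates of transitions (between purines, A↔G, and between pyrimidines, C↔T) differ from the rates of transversions (A↔C, A↔T, C↔G, G↔T). Kimura used a different rate matrix, with transition rate $\alpha$ and transversion rate $\beta$ (rows and columns ordered A, C, G, T):
$$R = \begin{pmatrix} -2\beta-\alpha & \beta & \alpha & \beta \\ \beta & -2\beta-\alpha & \beta & \alpha \\ \alpha & \beta & -2\beta-\alpha & \beta \\ \beta & \alpha & \beta & -2\beta-\alpha \end{pmatrix}$$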

13 Kimura’s K2P model (Cont.)
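Using similar methods, this leads to (rows and columns ordered A, C, G, T):
$$S(t) = \begin{pmatrix} r_t & s_t & u_t & s_t \\ s_t & r_t & s_t & u_t \\ u_t & s_t & r_t & s_t \\ s_t & u_t & s_t & r_t \end{pmatrix}$$
Where:
$$s_t = \tfrac{1}{4}\left(1 - e^{-4\beta t}\right), \qquad u_t = \tfrac{1}{4}\left(1 + e^{-4\beta t} - 2e^{-2(\alpha+\beta)t}\right), \qquad r_t = 1 - 2s_t - u_t$$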
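The K2P transition probabilities can be sketched in a few lines, assuming the standard closed form with transition rate $\alpha$ and transversion rate $\beta$: $s_t = \tfrac{1}{4}(1 - e^{-4\beta t})$ for each transversion, $u_t = \tfrac{1}{4}(1 + e^{-4\beta t} - 2e^{-2(\alpha+\beta)t})$ for the transition, and $r_t = 1 - 2s_t - u_t$ for no change. The names `k2p` and `is_transition` are illustrative choices for this example.

```python
import math

NUCLEOTIDES = "ACGT"
PURINES = {"A", "G"}

def is_transition(x, y):
    """True when x and y differ but are both purines (A, G) or both pyrimidines (C, T)."""
    return x != y and ((x in PURINES) == (y in PURINES))

def k2p(t, alpha, beta):
    """Return P(y | x, t) as a dict of dicts under the Kimura 2-parameter model."""
    s = 0.25 * (1.0 - math.exp(-4.0 * beta * t))                  # one specific transversion
    u = 0.25 * (1.0 + math.exp(-4.0 * beta * t)
                - 2.0 * math.exp(-2.0 * (alpha + beta) * t))      # the transition
    r = 1.0 - 2.0 * s - u                                         # no change

    def entry(x, y):
        if x == y:
            return r
        return u if is_transition(x, y) else s

    return {x: {y: entry(x, y) for y in NUCLEOTIDES} for x in NUCLEOTIDES}

# With a higher transition rate, A -> G is more likely than A -> C over a short time.
S = k2p(0.1, alpha=2.0, beta=0.5)
print(S["A"]["G"], S["A"]["C"])
```

Setting `alpha == beta` reduces these formulas to the Jukes-Cantor values, which is a quick sanity check on the implementation.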

14 Mutation Probabilities
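Both models satisfy the following properties:
Lack of memory: $P(b \mid a, t + t') = \sum_c P(c \mid a, t)\, P(b \mid c, t')$
Reversibility: there exist stationary probabilities { $P_a$ } such that $P_a\, P(b \mid a, t) = P_b\, P(a \mid b, t)$
[Figure: the four nucleotides A, G, T, C with arrows between them.]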
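Both properties can be checked numerically. The sketch below does so for Jukes-Cantor only, assuming the closed-form probabilities $\tfrac{1}{4}(1 + 3e^{-4\alpha t})$ and $\tfrac{1}{4}(1 - e^{-4\alpha t})$ and a uniform stationary distribution $P_a = 1/4$; the helper names are made up for this example.

```python
import math

NUCLEOTIDES = "ACGT"

def jc_prob(b, a, t, alpha=1.0):
    """P(b | a, t) under Jukes-Cantor with an assumed rate alpha."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)

def lack_of_memory_gap(t1, t2):
    """Largest |P(b|a,t1+t2) - sum_c P(c|a,t1) P(b|c,t2)| over all pairs (a, b)."""
    return max(
        abs(jc_prob(b, a, t1 + t2)
            - sum(jc_prob(c, a, t1) * jc_prob(b, c, t2) for c in NUCLEOTIDES))
        for a in NUCLEOTIDES for b in NUCLEOTIDES
    )

def reversibility_gap(t, stationary=0.25):
    """Largest |P_a P(b|a,t) - P_b P(a|b,t)| with uniform stationary probabilities."""
    return max(
        abs(stationary * jc_prob(b, a, t) - stationary * jc_prob(a, b, t))
        for a in NUCLEOTIDES for b in NUCLEOTIDES
    )

# Both gaps should be zero up to floating-point rounding.
print(lack_of_memory_gap(0.2, 0.7), reversibility_gap(0.5))
```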

15 Probabilistic Approach
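Given P, q, the tree topology and the branch lengths, we can compute:
$$P(x_1, \dots, x_5 \mid T, t) = q(x_5)\, P(x_4 \mid x_5, t_4)\, P(x_3 \mid x_5, t_3)\, P(x_1 \mid x_4, t_1)\, P(x_2 \mid x_4, t_2)$$
[Figure: a tree with root x5; x5 has children x4 (branch t4) and x3 (branch t3); x4 has children x1 (branch t1) and x2 (branch t2).]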
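For a concrete feel, here is a small sketch of that product for the five-node tree, assuming the root is x5 with children x4 and x3, that x4 has children x1 and x2, and that branch t_i leads down to node x_i. Jukes-Cantor probabilities and a uniform background q = 1/4 are assumptions chosen for the example; `joint_probability` is a hypothetical helper name.

```python
import math

def jc_prob(child, parent, t, alpha=1.0):
    """P(child | parent, t) under Jukes-Cantor with an assumed rate alpha."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if child == parent else 0.25 * (1.0 - e)

def joint_probability(x, t, q=0.25):
    """Probability of a full assignment to all five nodes.

    x maps node index (1..5) to a nucleotide; t maps branch index (1..4) to a length.
    """
    return (q                                # q(x5): background probability at the root
            * jc_prob(x[4], x[5], t[4])      # P(x4 | x5, t4)
            * jc_prob(x[3], x[5], t[3])      # P(x3 | x5, t3)
            * jc_prob(x[1], x[4], t[1])      # P(x1 | x4, t1)
            * jc_prob(x[2], x[4], t[2]))     # P(x2 | x4, t2)

branch_lengths = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.3}
assignment = {1: "A", 2: "G", 3: "A", 4: "A", 5: "A"}
print(joint_probability(assignment, branch_lengths))
```

Summing this quantity over all 4^5 assignments gives 1, which confirms that it is a proper joint distribution.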

16
1. Calculate the likelihood for each site on a specific tree.
2. Sum the per-site L values (log-likelihoods) over all sites on the tree.
3. Compare the total L value across all possible trees.
4. Choose the tree with the highest L value.

17 Computing the Tree Likelihood
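We are interested in the probability of the observed data given the tree and the branch “lengths”: $P(x_1, \dots, x_N \mid T, t)$.
It is computed by summing the joint probability of all nodes over the possible states of the internal nodes.
This can be done efficiently using an upward (leaves-to-root) traversal of the tree.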

18 Tree Likelihood Computation
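Define $P(L_k \mid a)$ = probability of the leaves below node k given that $x_k = a$.
Init: for leaves, $P(L_k \mid a) = 1$ if $x_k = a$; 0 otherwise.
Iteration: if k is a node with children i and j, then
$$P(L_k \mid a) = \Big[\sum_b P(b \mid a, t_i)\, P(L_i \mid b)\Big] \Big[\sum_c P(c \mid a, t_j)\, P(L_j \mid c)\Big]$$
Termination: the likelihood is $\sum_a q(a)\, P(L_{\text{root}} \mid a)$.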
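The recursion can be sketched as a short post-order traversal. This is a minimal illustration, not the lecture's own code: it assumes Jukes-Cantor probabilities, a uniform root distribution q = 1/4, and a made-up tree encoding in which a leaf is a nucleotide string and an internal node is a tuple `(left, t_left, right, t_right)`.

```python
import math

NUCLEOTIDES = "ACGT"

def jc_prob(b, a, t, alpha=1.0):
    """P(b | a, t) under Jukes-Cantor with an assumed rate alpha."""
    e = math.exp(-4.0 * alpha * t)
    return 0.25 * (1.0 + 3.0 * e) if a == b else 0.25 * (1.0 - e)

def conditional(node):
    """Return {a: P(L_k | a)} for the subtree rooted at node."""
    if isinstance(node, str):
        # Init: a leaf is certain to show its observed character.
        return {a: 1.0 if a == node else 0.0 for a in NUCLEOTIDES}
    left, t_left, right, t_right = node
    below_left = conditional(left)
    below_right = conditional(right)
    result = {}
    for a in NUCLEOTIDES:
        # Iteration: multiply the summed contributions of the two children.
        from_left = sum(jc_prob(b, a, t_left) * below_left[b] for b in NUCLEOTIDES)
        from_right = sum(jc_prob(c, a, t_right) * below_right[c] for c in NUCLEOTIDES)
        result[a] = from_left * from_right
    return result

def site_likelihood(root, q=0.25):
    """Termination: sum over root states, weighted by the background probability q."""
    at_root = conditional(root)
    return sum(q * at_root[a] for a in NUCLEOTIDES)

# Root with children (leaf A, leaf G) at branch 0.3 and leaf A at branch 0.4.
tree = (("A", 0.1, "G", 0.2), 0.3, "A", 0.4)
likelihood = site_likelihood(tree)
print(likelihood)
```

Each node is visited once and does a constant amount of work per pair of states, so the cost grows linearly with the number of nodes, whereas summing over every joint assignment of internal nodes grows exponentially.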

19 Maximum Likelihood (ML)
Score each tree by $\log P(D \mid T, t) = \sum_m \log P(D^{(m)} \mid T, t)$, under the assumption of independent positions “m”.
Branch lengths t can be optimized by:
- Gradient ascent
- EM
We look for the highest scoring tree using:
- Exhaustive search
- Sampling methods (Metropolis)
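To make branch-length optimization concrete, here is a deliberately tiny case: two aligned sequences joined by a single branch of length t under Jukes-Cantor. The sketch uses a plain ternary search rather than gradient ascent or EM, which is a substitution made for brevity; the sequences, the rate `alpha = 1.0`, and the search bounds are all invented for the example.

```python
import math

def log_likelihood(t, n_same, n_diff, alpha=1.0):
    """Sum of per-site log-likelihoods for two sequences separated by total length t."""
    e = math.exp(-4.0 * alpha * t)
    same = 0.25 * 0.25 * (1.0 + 3.0 * e)  # q(a) * P(a | a, t)
    diff = 0.25 * 0.25 * (1.0 - e)        # q(a) * P(b | a, t) for b != a
    return n_same * math.log(same) + n_diff * math.log(diff)

def maximize_branch_length(n_same, n_diff, lo=1e-9, hi=10.0, iterations=200):
    """Ternary search for the t in [lo, hi] that maximizes the log-likelihood."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, n_same, n_diff) < log_likelihood(m2, n_same, n_diff):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

seq_x = "ACGTACGTACGTACGTACGT"
seq_y = "ACGTACGAACGTACCTACGA"  # toy data, not from any real alignment
n_diff = sum(a != b for a, b in zip(seq_x, seq_y))
n_same = len(seq_x) - n_diff

t_hat = maximize_branch_length(n_same, n_diff)
p = n_diff / len(seq_x)
t_closed_form = -math.log(1.0 - 4.0 * p / 3.0) / 4.0  # analytic maximizer for alpha = 1
print(t_hat, t_closed_form)
```

In this two-sequence case the maximizer has a closed form, so the search result can be compared against it. On a full tree no such formula exists, so each candidate topology needs its own numerical optimization.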

20 Parametric optimization (EM)
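Optimal Tree Search: perform a search over possible topologies T1, T2, T3, T4, …, Tn. For each candidate topology, run a parametric optimization (EM) in its parameter space; this optimization may end in a local maximum.
[Figure: candidate topologies T1 … Tn, each with its own parameter space; parametric optimization (EM) within each space reaches a local maximum.]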

21 Computational Problem
Such procedures are computationally expensive! Computing the optimal parameters for each candidate requires a non-trivial optimization step. We spend non-negligible computation on a candidate even if it is a low-scoring one. In practice, such learning procedures can only consider small sets of candidate structures.
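One way to see why only small candidate sets are feasible is to count the candidates. The sketch below relies on a standard counting result that the slides do not state: the number of distinct unrooted bifurcating topologies on N labeled leaves is (2N-5)!! = 3 · 5 · … · (2N-5). The function name is chosen for this example.

```python
def count_unrooted_topologies(n_leaves):
    """Number of unrooted bifurcating tree topologies on n_leaves labeled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2 * n_leaves - 5.
    for factor in range(3, 2 * n_leaves - 4, 2):
        count *= factor
    return count

for n in (4, 5, 10, 20):
    print(n, count_unrooted_topologies(n))
```

Since each topology would also need its own parameter optimization, exhaustive search quickly becomes impractical as the number of leaves grows.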

