Published by Adela Collins. Modified over 9 years ago.
1
Lecture 10 – Models of DNA Sequence Evolution

Goals: correct for multiple substitutions when calculating pairwise genetic distances, and derive transformation probabilities for likelihood-based methods.

$$\Pr(r \mid \tau) = \sum_{m}\sum_{k}\sum_{l} \pi_m \times P_{m,k}(v_{3,1}) \times P_{k,A}(v_{1,w}) \times P_{k,G}(v_{1,x}) \times P_{m,l}(v_{3,2}) \times P_{l,C}(v_{2,y}) \times P_{l,C}(v_{2,z})$$

Here $m$ is the state at the root, $k$ and $l$ are the states at the two internal nodes, and each sum runs over the four nucleotides. It is the $P_{i,j}$ terms that we need a substitution model to calculate. The models typically used are Markov processes. A Poisson process is a stochastic process that can be used to model events in time: the time between events is exponentially distributed with rate $\lambda$.
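A minimal Python sketch of the likelihood sum above. Because the symbols were lost from the slide, it assumes that the leading factor is the root frequency $\pi_m$ (1/4 under Jukes-Cantor), that the unobserved states $m$, $k$, $l$ are summed over, and that every branch uses the Jukes-Cantor $P_{ij}$ derived later in the lecture, with branch length $v = \alpha t$. The helper names `jc_prob` and `site_likelihood` and all branch-length values are hypothetical.

```python
import itertools
import math

BASES = "ACGT"

def jc_prob(i, j, v):
    """Jukes-Cantor P_ij for a branch of length v = alpha * t (rate x time)."""
    e = math.exp(-4.0 * v)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(tips, v):
    """Probability of tip states (w, x, y, z) on the rooted tree ((w, x)k, (y, z)l)m.

    The unobserved states at the root m and internal nodes k, l are summed over;
    the root frequency pi_m is 1/4 under Jukes-Cantor.
    """
    w, x, y, z = tips
    total = 0.0
    for m, k, l in itertools.product(BASES, repeat=3):
        total += (0.25
                  * jc_prob(m, k, v["3,1"]) * jc_prob(k, w, v["1,w"]) * jc_prob(k, x, v["1,x"])
                  * jc_prob(m, l, v["3,2"]) * jc_prob(l, y, v["2,y"]) * jc_prob(l, z, v["2,z"]))
    return total

# Hypothetical branch lengths (alpha * t), keyed like the v subscripts on the slide.
branch = {"3,1": 0.05, "1,w": 0.1, "1,x": 0.1, "3,2": 0.05, "2,y": 0.1, "2,z": 0.1}

print(site_likelihood("AGCC", branch))   # the site pattern A, G, C, C from the formula
# Sanity check: the likelihoods of all 4**4 possible site patterns sum to 1.
print(sum(site_likelihood(p, branch) for p in itertools.product(BASES, repeat=4)))
```

Summing over every internal-state combination is the brute-force form of the calculation. It is fine for three internal nodes but grows as $4^{n}$ in the number of internal nodes.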
2
Jukes-Cantor Model

The probability of a site remaining constant is: $p_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$

The probability of a site changing to a particular different base $j$ is: $p_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$

Here $\alpha$ is the rate at which any nucleotide changes to any other per unit time. Given that the state at the site is $i$ at $t_0$, we start by estimating the probability of state $i$ at that site at $t_1$: $p_i(0) = 1$, and $p_i(1) = 1 - 3\alpha$.
3
Jukes-Cantor Model

Now, what is the probability of this site having state $i$ at $t_2$? There are two ways for the site to have state $i$ at $t_2$:
1 – It still has not changed since time $t_0$.
2 – It has changed to something else and back again.
Therefore,
$$p_i(2) = (1 - 3\alpha)\,p_i(1) + \alpha\,[1 - p_i(1)],$$
where $(1 - 3\alpha)\,p_i(1)$ is the probability of no change at the site between $t_1$ and $t_2$, $(1 - 3\alpha)$, times the probability of the site having state $i$ at $t_1$, $p_i(1)$; and $\alpha\,[1 - p_i(1)]$ is the probability of a change to $i$, $\alpha$, times the probability that the site is not in state $i$ at $t_1$, $1 - p_i(1)$.
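A worked example of the two-step calculation, using a hypothetical per-step rate $\alpha = 0.01$ chosen only for illustration:

```latex
\begin{aligned}
p_i(1) &= 1 - 3(0.01) = 0.97 \\
p_i(2) &= (1 - 3\alpha)\,p_i(1) + \alpha\,[1 - p_i(1)] \\
       &= (0.97)(0.97) + (0.01)(0.03) \\
       &= 0.9409 + 0.0003 = 0.9412
\end{aligned}
```

The second term, 0.0003, is the small contribution from sites that changed away from $i$ in the first step and changed back to $i$ in the second.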
4
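We have a recurrence equation:
$$p_i(t+1) = (1 - 3\alpha)\,p_i(t) + \alpha\,[1 - p_i(t)] = p_i(t) - 3\alpha\,p_i(t) + \alpha - \alpha\,p_i(t).$$
We can calculate the change in $p_i(t)$ across one time step:
$$p_i(t+1) - p_i(t) = -3\alpha\,p_i(t) + \alpha - \alpha\,p_i(t) = \alpha - 4\alpha\,p_i(t),$$
so, treating time as continuous,
$$\frac{dp_i(t)}{dt} = \alpha - 4\alpha\,p_i(t),$$
which can be solved for $p_i(t)$.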
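To see how the discrete recurrence relates to the continuous-time result, here is a small Python sketch that simply iterates it. Solving the recurrence exactly gives $p_i(t) = \tfrac14 + \tfrac34(1 - 4\alpha)^t$ when $p_i(0) = 1$, which approaches $\tfrac14 + \tfrac34 e^{-4\alpha t}$ when $\alpha$ is small. The value of `alpha` and the helper `iterate_recurrence` are illustrative assumptions.

```python
import math

def iterate_recurrence(alpha, steps, p0=1.0):
    """Apply p(t+1) = (1 - 3*alpha) * p(t) + alpha * (1 - p(t)) for `steps` steps."""
    p = p0
    history = [p]
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
        history.append(p)
    return history

alpha = 0.01                       # hypothetical per-step rate
hist = iterate_recurrence(alpha, 100)

for t in (1, 2, 10, 100):
    exact_discrete = 0.25 + 0.75 * (1 - 4 * alpha) ** t   # closed form of the recurrence
    continuous = 0.25 + 0.75 * math.exp(-4 * alpha * t)    # continuous-time solution
    print(t, round(hist[t], 6), round(exact_discrete, 6), round(continuous, 6))
```

In every case the iterated value moves toward the fixed point $p = \tfrac14$, where $\alpha - 4\alpha p = 0$.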
5
Jukes-Cantor Model

$$p_i(t) = \tfrac14 + \left(p_i(0) - \tfrac14\right) e^{-4\alpha t}$$

We now have the probability that a site has a particular nucleotide after time $t$, expressed in terms of its initial state. If $i = j$, then $p_i(0) = 1$, so $p_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$. If $i \ne j$, then $p_i(0) = 0$, so $p_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$. Because $\alpha$ is an instantaneous rate, the product $\alpha t$ (rate × time) is a branch length, so branch length is modeled explicitly in these expectations.
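These closed-form probabilities lead directly to the distance correction named in the lecture's first goal. The following is a minimal Python sketch. It assumes that two sequences differ at a site with probability $p = 3\,p_{ij}(t)$ (three possible other bases) and that distance is measured as expected substitutions per site, $d = 3\alpha t$. Solving $p = \tfrac34(1 - e^{-4\alpha t})$ for $\alpha t$ gives the standard Jukes-Cantor correction $d = -\tfrac34 \ln(1 - \tfrac43 p)$. The function names are hypothetical.

```python
import math

def jc_p(same, alpha_t):
    """Jukes-Cantor probability for branch length alpha*t: p_ii if `same`, else p_ij."""
    e = math.exp(-4.0 * alpha_t)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

def jc_distance(p_diff):
    """Expected substitutions per site, d = 3*alpha*t, from the observed proportion p of differing sites."""
    if not 0.0 <= p_diff < 0.75:
        raise ValueError("the Jukes-Cantor correction is undefined for p >= 3/4")
    return -0.75 * math.log(1.0 - 4.0 * p_diff / 3.0)

alpha_t = 0.3                          # hypothetical branch length
p_diff = 3 * jc_p(False, alpha_t)      # three other bases, each with probability p_ij
print(jc_p(True, alpha_t) + p_diff)    # the probabilities sum to 1
print(p_diff, jc_distance(p_diff))     # observed p understates d = 3 * alpha_t = 0.9
```

The gap between the observed proportion (about 0.52 here) and the corrected distance (0.9) is exactly the effect of multiple substitutions at the same site.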
6
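The JC model makes several assumptions:
1) All substitutions are equally likely; there is a single substitution type.
2) Base frequencies are assumed to be equal; each of the four nucleotides occurs at 25% of sites.
3) Each site has the same probability of experiencing a substitution as any other; this is an equal-rates model.
4) The process is constant through time.
5) Sites are independent of each other.
6) Substitution is a Markov process.

These assumptions are summarized in the Q-matrix of instantaneous rates (rows and columns ordered A, C, G, T):
$$Q = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$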
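A short Python check of the Jukes-Cantor Q-matrix, under the assumption that rows and columns are ordered A, C, G, T. It confirms that each row sums to zero, which any rate matrix must do, and that the equal base frequencies of assumption 2 are stationary ($\pi Q = 0$). The function name `jc_q` and the value of `alpha` are hypothetical.

```python
# Jukes-Cantor Q-matrix, rows/columns ordered A, C, G, T.
def jc_q(alpha):
    """Every off-diagonal rate is alpha; the diagonal -3*alpha makes each row sum to 0."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

alpha = 0.1                      # hypothetical rate
Q = jc_q(alpha)
pi = [0.25] * 4                  # assumption 2: equal base frequencies

row_sums = [sum(row) for row in Q]                                   # ~0 for a valid rate matrix
pi_Q = [sum(pi[i] * Q[i][j] for i in range(4)) for j in range(4)]    # ~0: pi is stationary
print(row_sums)
print(pi_Q)
```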
7
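Substitution types and base frequencies.

For the general case (rows and columns ordered A, C, G, T):
$$Q = \mu \begin{pmatrix}
-(a\pi_C + b\pi_G + c\pi_T) & a\pi_C & b\pi_G & c\pi_T \\
g\pi_A & -(g\pi_A + d\pi_G + e\pi_T) & d\pi_G & e\pi_T \\
h\pi_A & j\pi_C & -(h\pi_A + j\pi_C + f\pi_T) & f\pi_T \\
i\pi_A & k\pi_C & l\pi_G & -(i\pi_A + k\pi_C + l\pi_G)
\end{pmatrix}$$
where $\mu$ is the average instantaneous substitution rate, $a, b, c, \ldots, l$ are relative rate parameters (one of them is set to 1), and the $\pi_j$ are the frequencies of the base being substituted to. Because the relative rates of $i \to j$ and $j \to i$ substitutions are free to differ, the full model is non-reversible. Setting $a = g$, $b = h$, $c = i$, $d = j$, $e = k$, and $f = l$ makes it time-reversible.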
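A minimal Python sketch of the general Q-matrix. It uses detailed balance ($\pi_i Q_{ij} = \pi_j Q_{ji}$ for every pair) as the test for time-reversibility. The mapping of slide letters to substitution directions follows the matrix above. The function names, base frequencies, and rate values are hypothetical.

```python
BASES = "ACGT"

# Slide letter -> (from base, to base), following the general Q-matrix above.
LETTERS = {"a": "AC", "b": "AG", "c": "AT", "g": "CA", "d": "CG", "e": "CT",
           "h": "GA", "j": "GC", "f": "GT", "i": "TA", "k": "TC", "l": "TG"}

def general_q(params, pi, mu=1.0):
    """Q[i][j] = mu * rate(i->j) * pi_j off the diagonal; the diagonal makes rows sum to 0."""
    rate = {(pair[0], pair[1]): params[letter] for letter, pair in LETTERS.items()}
    Q = [[0.0] * 4 for _ in range(4)]
    for i, bi in enumerate(BASES):
        for j, bj in enumerate(BASES):
            if i != j:
                Q[i][j] = mu * rate[(bi, bj)] * pi[j]
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)
    return Q

def detailed_balance_gap(Q, pi):
    """Largest |pi_i Q_ij - pi_j Q_ji|; zero (up to rounding) for a reversible model."""
    return max(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) for i in range(4) for j in range(4))

pi = [0.3, 0.2, 0.25, 0.25]      # hypothetical base frequencies
free = dict(a=1.0, b=4.0, c=0.5, g=2.0, d=0.8, e=3.0,
            h=1.5, j=0.7, f=1.2, i=0.9, k=2.5, l=0.6)   # hypothetical, a fixed to 1
tied = dict(free, g=free["a"], h=free["b"], i=free["c"],
            j=free["d"], k=free["e"], l=free["f"])      # a=g, b=h, c=i, d=j, e=k, f=l

print(detailed_balance_gap(general_q(free, pi), pi))   # clearly > 0: non-reversible
print(detailed_balance_gap(general_q(tied, pi), pi))   # ~0: time-reversible
```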
8
Substitution types and base frequencies.

General Time-Reversible (GTR) Model:
$$Q = \mu \begin{pmatrix}
-(a\pi_C + b\pi_G + c\pi_T) & a\pi_C & b\pi_G & c\pi_T \\
a\pi_A & -(a\pi_A + d\pi_G + e\pi_T) & d\pi_G & e\pi_T \\
b\pi_A & d\pi_C & -(b\pi_A + d\pi_C + f\pi_T) & f\pi_T \\
c\pi_A & e\pi_C & f\pi_G & -(c\pi_A + e\pi_C + f\pi_G)
\end{pmatrix}$$
There are six relative transformation rates (one of which is set to 1) and four base frequencies that must sum to 1. Note that $Q$ is not a symmetric matrix, but it can be decomposed into a symmetric rate matrix $R$ and a diagonal matrix of base frequencies $\Pi$.
9
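Substitution types and base frequencies: Visual GTR

$$R = \begin{pmatrix} -(a+b+c) & a & b & c \\ a & -(a+d+e) & d & e \\ b & d & -(b+d+f) & f \\ c & e & f & -(c+e+f) \end{pmatrix}, \qquad \Pi = \begin{pmatrix} \pi_A & 0 & 0 & 0 \\ 0 & \pi_C & 0 & 0 \\ 0 & 0 & \pi_G & 0 \\ 0 & 0 & 0 & \pi_T \end{pmatrix}$$

The off-diagonal entries of $Q$ are those of $\mu R \Pi$ (for example, $Q_{AC} = \mu\, a\, \pi_C$); the diagonal entries of $Q$ are then set so that each row sums to zero.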
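A minimal Python sketch of the GTR decomposition. It builds the off-diagonal part of $Q$ as $\mu R \Pi$ and then resets the diagonal so that rows sum to zero. It also checks that $Q$ is not symmetric, that it satisfies detailed balance, and that $\pi$ is its stationary distribution. The names `mat_mul` and `gtr_q` and all numeric values are hypothetical.

```python
def mat_mul(A, B):
    """Plain n x n matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def gtr_q(a, b, c, d, e, f, pi, mu=1.0):
    """GTR Q built as mu * R * Pi off the diagonal; the diagonal is reset so rows sum to 0."""
    R = [[0.0, a, b, c],     # symmetric exchangeabilities; R's diagonal does not
         [a, 0.0, d, e],     # affect Q's off-diagonal entries, so it is left at 0 here
         [b, d, 0.0, f],
         [c, e, f, 0.0]]
    Pi = [[pi[i] if i == j else 0.0 for j in range(4)] for i in range(4)]
    Q = [[mu * x for x in row] for row in mat_mul(R, Pi)]
    for i in range(4):
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)
    return Q

pi = [0.3, 0.2, 0.25, 0.25]                       # hypothetical base frequencies
Q = gtr_q(1.0, 4.0, 0.5, 0.8, 3.0, 1.2, pi)       # hypothetical rates, a fixed to 1

print(Q[0][1], Q[1][0])   # a*pi_C vs a*pi_A: Q itself is not symmetric
print(max(abs(pi[i] * Q[i][j] - pi[j] * Q[j][i]) for i in range(4) for j in range(4)))  # ~0
print([sum(pi[i] * Q[i][j] for i in range(4)) for j in range(4)])  # ~0: pi is stationary
```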
10
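Common Simplifications

Transition-type substitutions occur at a higher rate than transversion-type substitutions. The K2P model was the first to address this. We set $b = e = \kappa$ (for transitions) and $a = c = d = f = 1$ (for transversions). With all $\pi_i = \tfrac14$, the K2P Q-matrix is:
$$Q = \begin{pmatrix}
-\mu(\kappa+2)/4 & \mu/4 & \kappa\mu/4 & \mu/4 \\
\mu/4 & -\mu(\kappa+2)/4 & \mu/4 & \kappa\mu/4 \\
\kappa\mu/4 & \mu/4 & -\mu(\kappa+2)/4 & \mu/4 \\
\mu/4 & \kappa\mu/4 & \mu/4 & -\mu(\kappa+2)/4
\end{pmatrix}$$
This is often written with $\alpha = \kappa\mu/4$ (the transition rate) and $\beta = \mu/4$ (the rate of each transversion). Thus, $\kappa = \alpha/\beta$.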
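A small Python sketch of the K2P Q-matrix. It assumes the A, C, G, T ordering, so that transitions are A↔G and C↔T. It reads off $\alpha$ and $\beta$ and confirms $\kappa = \alpha/\beta$ and the diagonal $-\mu(\kappa+2)/4$. The helper name `k2p_q` and the values of `kappa` and `mu` are hypothetical.

```python
# Rows/columns ordered A, C, G, T; transitions are A<->G and C<->T.
TRANSITIONS = {(0, 2), (2, 0), (1, 3), (3, 1)}

def k2p_q(kappa, mu=1.0):
    """K2P: transitions at kappa*mu/4, transversions at mu/4, diagonal -mu*(kappa+2)/4."""
    Q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                Q[i][j] = (kappa if (i, j) in TRANSITIONS else 1.0) * mu / 4
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)
    return Q

kappa, mu = 4.0, 0.4            # hypothetical values
Q = k2p_q(kappa, mu)
alpha, beta = Q[0][2], Q[0][1]  # A->G (transition) and A->C (transversion)
print(alpha, beta, alpha / beta)          # alpha = kappa*mu/4, beta = mu/4, kappa = alpha/beta
print(Q[0][0], -mu * (kappa + 2) / 4)     # diagonal matches -mu(kappa+2)/4
```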
11
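Hasegawa-Kishino-Yano (HKY) Model

For HKY:
$$Q = \mu \begin{pmatrix}
-(\kappa\pi_G + \pi_Y) & \pi_C & \kappa\pi_G & \pi_T \\
\pi_A & -(\kappa\pi_T + \pi_R) & \pi_G & \kappa\pi_T \\
\kappa\pi_A & \pi_C & -(\kappa\pi_A + \pi_Y) & \pi_T \\
\pi_A & \kappa\pi_C & \pi_G & -(\kappa\pi_C + \pi_R)
\end{pmatrix}$$
where $\pi_R = \pi_A + \pi_G$ (purines) and $\pi_Y = \pi_C + \pi_T$ (pyrimidines). Many other models are obtained by restricting the Q-matrix in other ways.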
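A minimal Python sketch of the HKY Q-matrix. It checks each diagonal entry against the $\pi_R$ / $\pi_Y$ expressions above and confirms that, with equal base frequencies, the off-diagonal rates match K2P ($\kappa\mu/4$ for transitions, $\mu/4$ for transversions). The helper `hky_q` and all numeric values are hypothetical.

```python
TRANSITIONS = {(0, 2), (2, 0), (1, 3), (3, 1)}   # A<->G, C<->T with order A, C, G, T

def hky_q(kappa, pi, mu=1.0):
    """HKY: Q[i][j] = mu * pi_j, multiplied by kappa for transitions; rows sum to 0."""
    Q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                Q[i][j] = mu * pi[j] * (kappa if (i, j) in TRANSITIONS else 1.0)
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)
    return Q

pi = [0.3, 0.2, 0.2, 0.3]       # hypothetical frequencies of A, C, G, T
kappa = 3.0                     # hypothetical transition/transversion ratio
Q = hky_q(kappa, pi)
pi_R, pi_Y = pi[0] + pi[2], pi[1] + pi[3]

print(Q[0][0], -(kappa * pi[2] + pi_Y))   # row A diagonal: -(kappa*pi_G + pi_Y)
print(Q[1][1], -(kappa * pi[3] + pi_R))   # row C diagonal: -(kappa*pi_T + pi_R)

# With equal frequencies HKY gives the K2P rates: kappa*mu/4 and mu/4.
Q_eq = hky_q(kappa, [0.25] * 4, mu=0.4)
print(Q_eq[0][2], Q_eq[0][1])
```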
12
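Some common models

There are 203 special cases of the GTR model, one for each way of grouping its six relative rates into classes of equal value. That number rises to 406 if each case may also be combined with equal base frequencies.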
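The count of 203 can be checked by brute force. The Python sketch below enumerates every way of tying the six GTR rates $a$ to $f$ into equal-rate classes. Each grouping is encoded as a restricted-growth string so that it is counted exactly once. The names `RATES` and `rate_groupings` are hypothetical.

```python
import itertools

RATES = "abcdef"   # the six GTR relative rates: A-C, A-G, A-T, C-G, C-T, G-T

def rate_groupings(n=len(RATES)):
    """Yield each way of tying the n rates into classes of equal value.

    Each grouping is encoded as a restricted-growth string: rate k gets a class
    label at most one larger than any label used before it, so every partition
    of the rates appears exactly once.
    """
    for labels in itertools.product(range(n), repeat=n):
        if all(labels[k] <= max(labels[:k], default=-1) + 1 for k in range(n)):
            yield labels

groupings = set(rate_groupings())
print(len(groupings), 2 * len(groupings))   # free vs. equal base frequencies doubles the count

print((0, 0, 0, 0, 0, 0) in groupings)   # all rates equal (the JC rate structure)
print((0, 1, 0, 0, 1, 0) in groupings)   # b = e, a = c = d = f (the K2P and HKY rate structure)
print((0, 1, 2, 3, 4, 5) in groupings)   # all rates free (GTR)
```

The count, 203, is the number of set partitions of six items (the Bell number $B_6$).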
13
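Calculating Transformation Probabilities

The Q and R matrices we have been discussing define the instantaneous rates of substitution from one nucleotide to another. The rates are converted to probabilities by matrix exponentiation:
$$P(t) = e^{Qt}$$
For simple models this has a closed form.

Jukes-Cantor: $p_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\alpha t}$ and $p_{ij}(t) = \tfrac14 - \tfrac14 e^{-4\alpha t}$.

K2P (transition rate $\alpha$, rate of each transversion $\beta$): no change, $\tfrac14 + \tfrac14 e^{-4\beta t} + \tfrac12 e^{-2(\alpha+\beta)t}$; a specific transition, $\tfrac14 + \tfrac14 e^{-4\beta t} - \tfrac12 e^{-2(\alpha+\beta)t}$; a specific transversion, $\tfrac14 - \tfrac14 e^{-4\beta t}$.

Again, it is these $P_{ij}$ that are used in the likelihood function.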
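A minimal pure-Python sketch of $P(t) = e^{Qt}$, using scaling and squaring with a truncated Taylor series. A numerical library routine would normally be used instead; this version just avoids any dependency. It checks the result against the K2P closed forms and, with $\alpha = \beta$, against Jukes-Cantor. The helper names `mat_mul`, `expm`, `k2p_rates`, the number of Taylor terms, and the rate values are hypothetical.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(M, terms=20):
    """e^M via scaling and squaring: Taylor series on M / 2**s, then square s times."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0   # makes ||M / 2**s|| <= 1/2
    A = [[x / 2 ** s for x in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]   # identity = first term
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]   # A^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result

TRANSITIONS = {(0, 2), (2, 0), (1, 3), (3, 1)}   # order A, C, G, T

def k2p_rates(alpha, beta):
    """K2P Q written with transition rate alpha and per-transversion rate beta."""
    return [[-(alpha + 2 * beta) if i == j else (alpha if (i, j) in TRANSITIONS else beta)
             for j in range(4)] for i in range(4)]

alpha, beta, t = 0.3, 0.1, 1.5   # hypothetical rates and time
P = expm([[x * t for x in row] for row in k2p_rates(alpha, beta)])

e4b, e2ab = math.exp(-4 * beta * t), math.exp(-2 * (alpha + beta) * t)
closed = {"same": 0.25 + 0.25 * e4b + 0.5 * e2ab,
          "transition": 0.25 + 0.25 * e4b - 0.5 * e2ab,
          "transversion": 0.25 - 0.25 * e4b}
print(P[0][0], closed["same"])          # A -> A
print(P[0][2], closed["transition"])    # A -> G
print(P[0][1], closed["transversion"])  # A -> C

# With alpha == beta, K2P reduces to Jukes-Cantor: p_ii(t) = 1/4 + 3/4 e^{-4 alpha t}.
P_jc = expm([[x * t for x in row] for row in k2p_rates(beta, beta)])
print(P_jc[0][0], 0.25 + 0.75 * math.exp(-4 * beta * t))
```

For models such as HKY or GTR, the same `expm` call applies unchanged to the corresponding $Q t$.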