Lecture 10 – Models of DNA Sequence Evolution


1 Lecture 10 – Models of DNA Sequence Evolution
Correct for multiple substitutions in calculating pairwise genetic distances.
Derive transformation probabilities for likelihood-based methods.
$$\mathrm{Prob}(R_r \mid t) = \pi_m \times P_{m,k}(v_{3,1}) \times P_{k,A}(v_{1,w}) \times P_{k,G}(v_{1,x}) \times P_{m,l}(v_{3,2}) \times P_{l,C}(v_{2,y}) \times P_{l,C}(v_{2,z})$$
It is the $P_{i,j}$ terms that we need a substitution model to calculate.
The models typically used are Markov processes.
A Poisson process is a stochastic process that can be used to model events in time. The time between events is exponentially distributed, with rate $\lambda$.
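As a rough illustration of the Poisson process described above, the sketch below draws exponentially distributed waiting times and accumulates them into event times. The function name `simulate_event_times`, the rate of 2.0, the window length, and the seed are all illustrative assumptions, not values from the lecture.

```python
import random

def simulate_event_times(rate, t_max, rng):
    """Return event times of a Poisson process with the given rate on [0, t_max]."""
    times = []
    t = rng.expovariate(rate)  # first waiting time ~ Exponential(rate)
    while t <= t_max:
        times.append(t)
        t += rng.expovariate(rate)  # each further waiting time is independent
    return times

rng = random.Random(1)  # fixed seed so the run is repeatable
rate = 2.0              # hypothetical rate (events per unit time)
t_max = 1000.0
events = simulate_event_times(rate, t_max, rng)
events_per_unit_time = len(events) / t_max  # should be close to the rate
```

Over a long window the observed number of events per unit time should settle near the rate used to draw the waiting times.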

2 Jukes-Cantor Model
The probability of a site remaining constant is: $p_{ii}(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t}$
The probability of a site changing to a particular different nucleotide is: $p_{ij}(t) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t}$
$\alpha$ is the rate at which any nucleotide changes to any other per unit time.
Given that the state at the site is $i$ at $t_0$, we start by estimating the probability of state $i$ at that site at $t_1$.
$p_i(0) = 1$
$p_i(1) = 1 - 3\alpha$

3 Jukes-Cantor Model
Now, what is the probability of this site having state $i$ at $t_2$? There are two ways for the site to have state $i$ at $t_2$:
1 – It still has not changed since time $t_0$. $(1 - 3\alpha)\,p_i(1)$ = probability of no change at the site between $t_1$ and $t_2$, $(1 - 3\alpha)$, times the probability of the site having state $i$ at time $t_1$, $p_i(1)$.
2 – It has changed to something else and back again. $\alpha\,(1 - p_i(1))$ = probability of a change to $i$, $\alpha$, times the probability that the site is not state $i$ at time $t_1$, $(1 - p_i(1))$.
Therefore, $p_i(2) = (1 - 3\alpha)\,p_i(1) + \alpha\,(1 - p_i(1))$, where $p_i(1) = 1 - 3\alpha$.

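4 Jukes-Cantor Model
We have a recurrence equation.
$$p_i(t+1) = (1 - 3\alpha)\,p_i(t) + \alpha\,(1 - p_i(t)) = p_i(t) - 3\alpha p_i(t) + \alpha - \alpha p_i(t)$$
We can calculate the change in $p_i(t)$ across time, $\Delta t$.
$$p_i(t+1) - p_i(t) = -3\alpha p_i(t) + \alpha - \alpha p_i(t)$$
so, collecting terms,
$$p_i(t+1) - p_i(t) = \alpha - 4\alpha p_i(t)$$
and, treating time as continuous,
$$\frac{dp_i(t)}{dt} = \alpha - 4\alpha p_i(t)$$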
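A quick numerical check can make the recurrence more concrete: iterating it from a known starting state should stay close to the closed form $\tfrac{1}{4} + (p_i(0) - \tfrac{1}{4}) e^{-4\alpha t}$ stated for the Jukes-Cantor model, provided $\alpha$ is small relative to the time step. This is only a sketch; the function names and the value $\alpha = 0.001$ are illustrative assumptions.

```python
import math

def iterate_recurrence(alpha, steps, p0=1.0):
    """Apply p(t+1) = (1 - 3a) p(t) + a (1 - p(t)) for the given number of steps."""
    p = p0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p

def closed_form(alpha, t, p0=1.0):
    """Continuous-time Jukes-Cantor expression for the same quantity."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)

alpha = 0.001  # hypothetical per-step rate, chosen small
discrete = iterate_recurrence(alpha, 500)
continuous = closed_form(alpha, 500)
```

Both versions approach 1/4 as time grows, whichever state the site started in.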

5 Jukes-Cantor Model
$$p_i(t) = \tfrac{1}{4} + \left(p_i(0) - \tfrac{1}{4}\right) e^{-4\alpha t}$$
We have a probability that a site has a particular nucleotide after time $t$, given in terms of its initial state.
If $i = j$, $p_i(0) = 1$. Therefore, $p_{ii}(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4\alpha t}$
If $i \neq j$, $p_i(0) = 0$, and $p_{ij}(t) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4\alpha t}$
$\alpha$ is an instantaneous rate, so we have modeled branch length (rate × time) explicitly in our expectations.
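These expressions also give the multiple-substitution correction named at the start of the lecture. Along a path of total duration $t$, the expected proportion of differing sites is $p = 3\,p_{ij}(t) = \tfrac{3}{4}(1 - e^{-4\alpha t})$ and the expected number of substitutions per site is $d = 3\alpha t$, so $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$. The sketch below applies that formula; the function name `jc_distance` and the example values are illustrative assumptions.

```python
import math

def jc_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion of differing sites."""
    if not 0 <= p < 0.75:
        # At p >= 3/4 the logarithm is undefined: the sequences look saturated.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

# Round trip with a hypothetical alpha * t = 0.1, so the true distance is 3 * 0.1 = 0.3.
alpha_t = 0.1
expected_p = 0.75 * (1 - math.exp(-4 * alpha_t))
recovered_d = jc_distance(expected_p)
```

The corrected distance is always at least as large as the observed proportion, because some substitutions are hidden by later changes at the same site.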

6 The JC model makes several assumptions.
1) All substitutions are equally likely; we have a single substitution type.
2) Base frequencies are assumed to be equal; each of the four nucleotides occurs at 25% of sites.
3) Each site has the same probability of experiencing a substitution as any other; we have an equal-rates model.
4) The process is constant through time.
5) Sites are independent of each other.
6) Substitution is a Markov process.
Q-matrix:
$$Q = \begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix}$$

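7 Substitution types and base frequencies.
For the general case:
$$Q = \begin{pmatrix}
-\mu(a\pi_C + b\pi_G + c\pi_T) & \mu a\pi_C & \mu b\pi_G & \mu c\pi_T \\
\mu g\pi_A & -\mu(g\pi_A + d\pi_G + e\pi_T) & \mu d\pi_G & \mu e\pi_T \\
\mu h\pi_A & \mu j\pi_C & -\mu(h\pi_A + j\pi_C + f\pi_T) & \mu f\pi_T \\
\mu i\pi_A & \mu k\pi_C & \mu l\pi_G & -\mu(i\pi_A + k\pi_C + l\pi_G)
\end{pmatrix}$$
where $\mu$ = the average instantaneous substitution rate, $a, b, c, \ldots, l$ are relative rate parameters (one of them is set to 1), and the $\pi_i$ are the frequencies of the bases being substituted to.
Note that the relative rates are not symmetric (for example, $a$ and $g$ may differ), and therefore the full model is non-reversible.
Setting $a = g$, $b = h$, $c = i$, $d = j$, $e = k$, and $f = l$ makes the relative rates symmetric, which gives the time-reversible model.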

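8 Substitution types and base frequencies.
General Time-Reversible Model
$$Q = \begin{pmatrix}
-\mu(a\pi_C + b\pi_G + c\pi_T) & \mu a\pi_C & \mu b\pi_G & \mu c\pi_T \\
\mu a\pi_A & -\mu(a\pi_A + d\pi_G + e\pi_T) & \mu d\pi_G & \mu e\pi_T \\
\mu b\pi_A & \mu d\pi_C & -\mu(b\pi_A + d\pi_C + f\pi_T) & \mu f\pi_T \\
\mu c\pi_A & \mu e\pi_C & \mu f\pi_G & -\mu(c\pi_A + e\pi_C + f\pi_G)
\end{pmatrix}$$
There are six relative transformation rates (one of which is set to 1).
There are four base frequencies that must sum to 1.
Note that this is not a symmetric matrix, but it can be decomposed into a symmetric rate matrix $R$ and a diagonal frequency matrix $\Pi$.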

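9 Substitution types and base frequencies.
Visual GTR
$$R = \begin{pmatrix}
-\mu(a+b+c) & \mu a & \mu b & \mu c \\
\mu a & -\mu(a+d+e) & \mu d & \mu e \\
\mu b & \mu d & -\mu(b+d+f) & \mu f \\
\mu c & \mu e & \mu f & -\mu(c+e+f)
\end{pmatrix}$$
$$\Pi = \begin{pmatrix}
\pi_A & 0 & 0 & 0 \\
0 & \pi_C & 0 & 0 \\
0 & 0 & \pi_G & 0 \\
0 & 0 & 0 & \pi_T
\end{pmatrix}$$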
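The decomposition can be sketched in code: each off-diagonal entry of Q is the symmetric relative rate multiplied by the frequency of the target base, and the diagonal is then set so that every row sums to zero, matching the GTR matrix given earlier. The function name `gtr_q` and the example rates and frequencies are illustrative assumptions, not values from the lecture.

```python
def gtr_q(rates, freqs, mu=1.0):
    """Build a GTR rate matrix (base order A, C, G, T) from six rates and four frequencies."""
    a, b, c, d, e, f = rates
    # Symmetric relative rates; the diagonal is filled in afterwards.
    r = [[0, a, b, c],
         [a, 0, d, e],
         [b, d, 0, f],
         [c, e, f, 0]]
    q = [[mu * r[i][j] * freqs[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        # Each diagonal entry is minus the sum of the other entries in its row.
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q

rates = (1.0, 4.0, 0.5, 0.7, 3.0, 1.0)  # hypothetical a..f, with f set to 1
freqs = (0.1, 0.2, 0.3, 0.4)            # hypothetical frequencies of A, C, G, T
q = gtr_q(rates, freqs)
```

Because the relative rates are symmetric, the result satisfies $\pi_i q_{ij} = \pi_j q_{ji}$ for every pair of bases, which is the time-reversibility condition.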

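10 Common Simplifications
Transition-type substitutions occur at a higher rate than transversion substitutions.
The Kimura two-parameter (K2P) model was the first to address this.
So we set $b = e = \kappa$ (for transitions), and $a = c = d = f = 1$ (for transversions).
All $\pi_i = \tfrac{1}{4}$.
For K2P:
$$Q = \begin{pmatrix}
-\mu(\kappa + 2)/4 & \mu/4 & \mu\kappa/4 & \mu/4 \\
\mu/4 & -\mu(\kappa + 2)/4 & \mu/4 & \mu\kappa/4 \\
\mu\kappa/4 & \mu/4 & -\mu(\kappa + 2)/4 & \mu/4 \\
\mu/4 & \mu\kappa/4 & \mu/4 & -\mu(\kappa + 2)/4
\end{pmatrix}$$
where $\alpha = \mu\kappa/4$ and $\beta = \mu/4$. Thus, $\kappa = \alpha / \beta$.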

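11 Hasegawa-Kishino-Yano (HKY) Model
For HKY:
$$Q = \begin{pmatrix}
-\mu(\kappa\pi_G + \pi_Y) & \mu\pi_C & \mu\kappa\pi_G & \mu\pi_T \\
\mu\pi_A & -\mu(\kappa\pi_T + \pi_R) & \mu\pi_G & \mu\kappa\pi_T \\
\mu\kappa\pi_A & \mu\pi_C & -\mu(\kappa\pi_A + \pi_Y) & \mu\pi_T \\
\mu\pi_A & \mu\kappa\pi_C & \mu\pi_G & -\mu(\kappa\pi_C + \pi_R)
\end{pmatrix}$$
where $\alpha = \mu\kappa$, $\beta = \mu$, $\pi_R = \pi_A + \pi_G$, and $\pi_Y = \pi_C + \pi_T$.
There are many other models that restrict the Q-matrix.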
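A minimal sketch of the HKY matrix in code, with base order A, C, G, T and negative diagonal entries so that each row sums to zero. With all four frequencies equal to 1/4 it reduces to the K2P matrix. The function name `hky_q` and the parameter values are illustrative assumptions.

```python
def hky_q(kappa, freqs, mu=1.0):
    """Build the HKY rate matrix from kappa and the frequencies of A, C, G, T."""
    pa, pc, pg, pt = freqs
    pr = pa + pg  # purine frequency
    py = pc + pt  # pyrimidine frequency
    return [
        [-mu * (kappa * pg + py), mu * pc, mu * kappa * pg, mu * pt],
        [mu * pa, -mu * (kappa * pt + pr), mu * pg, mu * kappa * pt],
        [mu * kappa * pa, mu * pc, -mu * (kappa * pa + py), mu * pt],
        [mu * pa, mu * kappa * pc, mu * pg, -mu * (kappa * pc + pr)],
    ]

kappa = 2.0  # hypothetical transition/transversion rate ratio
q_hky = hky_q(kappa, (0.1, 0.2, 0.3, 0.4))
q_k2p = hky_q(kappa, (0.25, 0.25, 0.25, 0.25))  # equal frequencies give K2P
```

In `q_k2p` the transition entries equal $\mu\kappa/4$, the transversion entries equal $\mu/4$, and the diagonal entries equal $-\mu(\kappa + 2)/4$.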

12 Some common models
There are 203 special cases of the GTR, 406 if we allow for equal base frequencies.
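The figure of 203 is consistent with counting the ways to group the six GTR relative rates into classes of equal rates, which is the Bell number $B_6$; doubling it for equal versus unequal base frequencies gives 406. The sketch below computes the Bell number with the Bell triangle. The function name `bell_number` is an illustrative choice, and reading the count this way is an interpretation rather than something stated in the lecture.

```python
def bell_number(n):
    """Return the number of ways to partition a set of n items (Bell triangle method)."""
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

rate_groupings = bell_number(6)           # groupings of the six relative rates
with_frequency_choice = 2 * rate_groupings  # equal or unequal base frequencies
```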

13 Calculating Transformation Probabilities.
So the Q and R matrices we have been discussing define the instantaneous rates of substitution from one nucleotide to another.
Convert the rates to probabilities by matrix exponentiation:
$$P(t) = e^{Qt}$$
[Slide figures: Jukes-Cantor, K2P]
Again, it is these $P_{ij}$ that are used in the likelihood function.
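To show the exponentiation step concretely, the sketch below computes $e^{Qt}$ for the Jukes-Cantor Q-matrix in pure Python and compares the result with the closed-form Jukes-Cantor probabilities. It uses a truncated Taylor series with scaling and squaring, which is one simple way to exponentiate a small matrix and not necessarily what phylogenetics software does. All function names and the values of $\alpha$ and $t$ are illustrative assumptions.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=20, squarings=10):
    """Approximate exp(m): Taylor series on m / 2**squarings, then repeated squaring."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current series term, starting at the identity
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling
    return result

def jc_q(alpha):
    """Jukes-Cantor Q-matrix: alpha off the diagonal, -3 * alpha on it."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def transition_probs(q, t):
    """Return P(t) = exp(Q t)."""
    return mat_exp([[x * t for x in row] for row in q])

alpha, t = 0.1, 2.0  # hypothetical rate and time
p = transition_probs(jc_q(alpha), t)
p_same = 0.25 + 0.75 * math.exp(-4 * alpha * t)  # closed-form p_ii(t)
p_diff = 0.25 - 0.25 * math.exp(-4 * alpha * t)  # closed-form p_ij(t)
```

Each row of `p` sums to 1, the diagonal entries agree with `p_same`, and the off-diagonal entries agree with `p_diff`.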

