1
How to date
Xuhua Xia
xxia@uottawa.ca
http://dambe.bio.uottawa.ca
2
Slide 2 Objectives
Two major objectives of molecular phylogenetics
–Branching patterns (speciation or gene duplication events)
–Dating of the speciation or gene duplication events
Classification of methods
–Criteria used
  Maximum likelihood (ML) method (e.g., PAML)
  Bayesian methods (e.g., BEAST)
  Least-squares (LS) method based on distance matrices (e.g., DAMBE)
–Hard or soft bounds
–Global or local clock
Data needed
–A topology
–A set of aligned sequences OR a distance matrix satisfying the molecular clock hypothesis (either globally or locally)
–Calibration points
  One or more from the fossil record
  From sampling times for rapidly evolving species (e.g., RNA viruses)
3
Slide 3 Rationale
Given two species i and j, we can compute the evolutionary distance between them (d_ij, the number of substitutions per site). But how do we know the time (t) since these two species diverged from their common ancestor?
Knowing d_ij alone does not give us time. We also need to know the rate of change (r, which is equivalent to speed).
If a runner runs at a constant speed of 10 km/hr and has covered a distance of 20 km, then he has run for 2 hours (= 20 km / (10 km/hr)).
If r is 0.02 substitutions per site per myr and d_ij = 0.04 substitutions per site, then 2 myr of evolution separate the two species (= 0.04 / 0.02), that is, 1 myr along each lineage since their common ancestor.
Estimation: r (rate) and T (time).
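The arithmetic of the rate example can be written out as a minimal sketch. It assumes a strict clock and that r is the rate along a single lineage, so the two lineages together accumulate d_ij = 2rt; the function name is illustrative and does not come from the slides.

```python
def divergence_time(d_ij, r):
    """Time since two lineages split, assuming a strict molecular clock.

    d_ij: distance between the two species (substitutions per site)
    r:    rate along ONE lineage (substitutions per site per myr)
    Both lineages accumulate changes, so d_ij = 2 * r * t.
    """
    if r <= 0:
        raise ValueError("rate must be positive")
    return d_ij / (2.0 * r)


t = divergence_time(0.04, 0.02)
# t is the age of the common ancestor; 2 * t is the total time
# separating the two species along both lineages.
print(f"each lineage: {t:.1f} myr; total path: {2 * t:.1f} myr")
```

With the numbers from the slide this gives 1 myr per lineage and 2 myr for the whole path between the two species.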
4
Slide 4 The LS method in linear regression
X    Y      R (Residual)
3    11.5   a + b*3 - 11.5
2    7.5    a + b*2 - 7.5
1    5      a + b*1 - 5
4    14     a + b*4 - 14
Y = a + b x
RSS (the sum of the squared residuals) = 0 means a perfect fit of the linear model to the data. A large RSS means a poor fit.
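As a small illustration of the table, the sketch below fits Y = a + b x to the four (X, Y) pairs by ordinary least squares and reports the RSS. The function name is illustrative; only the data points come from the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residuals as written on the slide: predicted minus observed.
    residuals = [a + b * x - y for x, y in zip(xs, ys)]
    rss = sum(e * e for e in residuals)
    return a, b, rss


xs = [3, 2, 1, 4]
ys = [11.5, 7.5, 5, 14]
a, b, rss = fit_line(xs, ys)
print(f"a = {a:.2f}, b = {b:.2f}, RSS = {rss:.2f}")
```

For these four points the fit is approximately a = 1.75 and b = 3.1 with RSS = 0.45, so the line is close to the data but not a perfect fit.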
5
Slide 5 The rationale of the LS method
[Figure: tree of four species (Sp1 to Sp4) with node times t3, t2 and T1, shown next to the pairwise distance matrix]
      Sp1   Sp2   Sp3
Sp2   d12
Sp3   d13   d23
Sp4   d14   d24   d34
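One way to write the least-squares criterion for such a tree is sketched here, assuming a single global rate r. The symbol a(i,j), the node at which species i and j last shared an ancestor, is introduced for illustration and does not appear on the slide.

```latex
\mathrm{RSS} = \sum_{i<j} \left( d_{ij} - 2\, r\, t_{a(i,j)} \right)^2
```

Reading the upper-case T1 as a node time known from calibration, minimizing RSS over r and the lower-case node times would give the rate and the unknown dates.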
6
Slide 6 Multiple calibration points
[Figure: the same four-species tree and distance matrix (d12, d13, d23, d14, d24, d34), now with node times T3, t2 and T1]
7
Slide 7
8
[Figure, panels a) and b): dated tree of human, chimpanzee, bonobo, gorilla, orangutan, sumatran and gibbon; node ages in million years, listed in the order they appear on the slide]
Soft calibration points (14 million years and 7 million years): 1.818±0.180, 5.487±0.434, 7.258±0.530, 3.206±0.280, 14.757±0.217, 20.903±1.503
Hard calibration points (14 million years and 7 million years): 1.754±0.184, 7±0, 7.079±0.527, 3.104±0.273, 14±0, 20.655±1.221
9
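Dating with local clocks
[Figure, panels a) to c): a tree of OTU1 to OTU4 with branch labels 5 r1, 6, 3, 3, 2 and 2 r2, its distance matrix, and two dated versions of the tree]
       OTU1   OTU2   OTU3
OTU2   7
OTU3   10     7
OTU4   16     13     12
One rate for the whole tree: T1 = 10, t2 = 6.2195, t3 = 5.1220; RSS = 13.1667, r = 0.6833
Three local rates: T1 = 10, t2 = 5, t3 = 1.6667; RSS = 0, r0 = 0.6, r1 = 3, r2 = 1.2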
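The numbers on this slide can be checked with a short sketch. It assumes the topology (((OTU1,OTU2)t3,OTU3)t2,OTU4)T1, that T1 = 10 is the calibration, and that r1 and r2 apply to the terminal branches leading to OTU1 and OTU2 while r0 applies everywhere else. These are readings of the figure labels, not statements from the slide, and the function names are illustrative.

```python
# Distance matrix from the slide.
dist = {(1, 2): 7, (1, 3): 10, (2, 3): 7,
        (1, 4): 16, (2, 4): 13, (3, 4): 12}

# ASSUMED topology (((OTU1,OTU2)t3,OTU3)t2,OTU4)T1: each pair maps to
# the node where the two OTUs last shared an ancestor.
node_of = {(1, 2): "t3", (1, 3): "t2", (2, 3): "t2",
           (1, 4): "T1", (2, 4): "T1", (3, 4): "T1"}


def date_global_clock(dist, node_of, calib_node, calib_time):
    """Minimize sum (d_ij - 2*r*t_node)^2 with one calibrated node.

    The ordering constraint t3 <= t2 <= T1 is not enforced here;
    it happens to hold for the slide's data.
    """
    groups = {}
    for pair, d in dist.items():
        groups.setdefault(node_of[pair], []).append(d)
    means = {node: sum(ds) / len(ds) for node, ds in groups.items()}
    # Free nodes can always match their own group mean, so the rate is
    # set by the calibrated node alone: 2 * r * T = mean distance.
    r = means[calib_node] / (2.0 * calib_time)
    times = {node: m / (2.0 * r) for node, m in means.items()}
    rss = sum((d - means[node_of[pair]]) ** 2 for pair, d in dist.items())
    return r, times, rss


def local_clock_rss(dist, T1, t2, t3, r0, r1, r2):
    """RSS when OTU1 and OTU2 branches evolve at r1 and r2 (assumed)."""
    b1, b2 = r1 * t3, r2 * t3      # terminal branches to OTU1, OTU2
    stem12 = r0 * (t2 - t3)        # branch above the (OTU1, OTU2) node
    b3 = r0 * t2                   # terminal branch to OTU3
    stem123 = r0 * (T1 - t2)       # branch above the t2 node
    b4 = r0 * T1                   # terminal branch to OTU4
    pred = {(1, 2): b1 + b2,
            (1, 3): b1 + stem12 + b3,
            (2, 3): b2 + stem12 + b3,
            (1, 4): b1 + stem12 + stem123 + b4,
            (2, 4): b2 + stem12 + stem123 + b4,
            (3, 4): b3 + stem123 + b4}
    return sum((dist[p] - pred[p]) ** 2 for p in dist)


r, times, rss = date_global_clock(dist, node_of, "T1", 10.0)
print(f"global clock: r = {r:.4f}, t2 = {times['t2']:.4f}, "
      f"t3 = {times['t3']:.4f}, RSS = {rss:.4f}")

rss_local = local_clock_rss(dist, 10.0, 5.0, 5.0 / 3.0, 0.6, 3.0, 1.2)
print(f"local clocks: RSS = {rss_local:.4f}")
```

Under these assumptions the global-clock fit reproduces r = 0.6833, t2 = 6.2195, t3 = 5.1220 and RSS = 13.1667, while the three local rates fit the distance matrix with an RSS of essentially zero.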
10
Slide 10 Method comparison
[Figure: labels RY07 and BEAST]
11
[Figure: dated tree of 26 primate taxa; calibration times = 77 Myr, 35 Myr and 10 Myr]
Taxa: Galago, Loris, Varecia, Eulemur, Lemur, Hapalemur, Propithecus, Daubentonia, Callithrix, Macaca, Pongo, Gorilla, Homo, Pan, Lepilemur, M.murinus, M.griseorufus, M.sambiranensis, M.rufus2, M.rufus1, M.myoxinus, M.berthae, M.tavaratra, M.ravelobensis, Mirza, Cheirogaleus
Node ages (in the order they appear on the slide): 46.073±5.575, 9.608±1.533, 14.668±1.861, 18.125±2.285, 26.049±2.955, 49.231±4.101, 66.992±5.038, 7.890±1.308, 9.450±1.522, 12.988±2.053, 32.564±4.103, 56.059±5.436, 78.210±1.871, 7.089±1.119, 4.348±0.845, 2.3±0.4, 2.2±0.4, 4.617±0.808, 5.3±0.9, 7.9±1.4, 9.7±1.3, 21.639±2.906, 26.761±3.528, 37.682±3.785, 36.351±2.849
12
[Figure: tree of the 26 primate taxa with an interval for each node date; calibration times = 77 my, 35 my and 10 my]
Taxa: Homo, Macaca, Daubentonia, M.myoxinus, Gorilla, Loris, M.murinus, M.rufus1, M.sambiranensis, Mirza, Galago, Lemur, M.tavaratra, Varecia, Cheirogaleus, M.ravelobensis, Hapalemur, M.griseorufus, Pan, Propithecus, M.rufus2, Lepilemur, Eulemur, Callithrix, Pongo, M.berthae
Node intervals (in the order they appear on the slide): [6.8351,11.8498], [20.9195,33.1815], [11.2396,17.6519], [1.3,2.6], [3.7,6.0], [53.26,79.827], [10.9426,17.9562], [18.7945,29.488], [68.927,92.9419], [3.3,5.3], [8.059,12.1832], [13.8197,21.595], [2.7447,4.8025], [1.6314,2.9896], [27.9235,37.5614], [39.2193,60.6649], [22.6131,34.6002], [4.8645,8.6395], [31.6677,53.3297], [46.4962,69.6653], [25.534,38.8127], [7.3,11.7], [14.5445,23.4733], [5.7784,9.3783], [5.5629,9.3534]
14
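Slide 14 Rationale of Tip-Dating
[Figure: tree of six sequences sampled in different years: s1@1990, s2@1975, s3@1980, s4@1970, s5@1960, s6@1965. Unknown node times: t1, t2, t3, t4, t5. Sampling-time offsets from s1: 15 yr (s2), 10 yr (s3), 20 yr (s4), 30 yr (s5), 25 yr (s6). Other labels: r = 0.01; 20, 50, 40, 30, 40.]
$$
\begin{aligned}
\mathrm{RSS} ={}& (d_{12}/r + 15 - 2t_1)^2 + (d_{13}/r + 10 - 2t_3)^2 + (d_{14}/r + 20 - 2t_3)^2 \\
&+ (d_{15}/r + 30 - 2t_5)^2 + (d_{16}/r + 25 - 2t_5)^2 + (d_{23}/r + 15 + 10 - 2t_3)^2 \\
&+ (d_{24}/r + 15 + 20 - 2t_3)^2 + (d_{25}/r + 15 + 30 - 2t_5)^2 + (d_{26}/r + 15 + 25 - 2t_5)^2 \\
&+ (d_{34}/r + 10 + 20 - 2t_2)^2 + (d_{35}/r + 10 + 30 - 2t_5)^2 + (d_{36}/r + 10 + 25 - 2t_5)^2 \\
&+ (d_{45}/r + 20 + 30 - 2t_5)^2 + (d_{46}/r + 20 + 25 - 2t_5)^2 + (d_{56}/r + 30 + 25 - 2t_4)^2
\end{aligned}
$$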
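The RSS above is linear in 1/r and in the node times, so it can be minimized in closed form. The sketch below is one way to do that, not necessarily how DAMBE does it. The slide gives no distance values, so the distances here are synthetic: they are generated from r = 0.01 and node times of 20, 30, 40, 40 and 50 years before 1990 for t1 to t5, which is an assumed reading of the figure labels. The function name is illustrative.

```python
# Sampling years from the slide; offsets are years before the latest sample.
sample_year = {1: 1990, 2: 1975, 3: 1980, 4: 1970, 5: 1960, 6: 1965}
latest = max(sample_year.values())
offset = {s: latest - y for s, y in sample_year.items()}

# Node (t1..t5) joining each pair, read off the RSS expression on the slide.
node_of = {(1, 2): 1, (3, 4): 2, (5, 6): 4}
for i in (1, 2):
    for j in (3, 4):
        node_of[(i, j)] = 3
for i in (1, 2, 3, 4):
    for j in (5, 6):
        node_of[(i, j)] = 5


def tip_date(dist, node_of, offset):
    """Minimize sum (d_ij/r + o_i + o_j - 2*t_node)^2 over r and all t."""
    groups = {}
    for (i, j), d in dist.items():
        groups.setdefault(node_of[(i, j)], []).append((d, offset[i] + offset[j]))
    # For a fixed u = 1/r the best t_node centers each group, which leaves
    # a one-variable regression of the offset sums on the distances.
    num = den = 0.0
    centers = {}
    for node, rows in groups.items():
        d_bar = sum(d for d, _ in rows) / len(rows)
        s_bar = sum(s for _, s in rows) / len(rows)
        centers[node] = (d_bar, s_bar)
        num += sum((d - d_bar) * (s - s_bar) for d, s in rows)
        den += sum((d - d_bar) ** 2 for d, _ in rows)
    u = -num / den
    times = {node: (d_bar * u + s_bar) / 2.0
             for node, (d_bar, s_bar) in centers.items()}
    return 1.0 / u, times


# SYNTHETIC distances (not from the slide), built from the assumed rate
# and node times so that the true answer is known.
true_r = 0.01
true_t = {1: 20, 2: 30, 3: 40, 4: 40, 5: 50}
dist = {pair: true_r * (2 * true_t[node] - offset[pair[0]] - offset[pair[1]])
        for pair, node in node_of.items()}

r, times = tip_date(dist, node_of, offset)
dates = {node: latest - t for node, t in sorted(times.items())}
print(f"r = {r:.4f}")
for node, year in dates.items():
    print(f"t{node}: {times[node]:.1f} yr before {latest} -> {year:.0f}")
```

Because the synthetic distances fit a clock exactly, the sketch recovers the rate and node times it was given; with real distances the same steps would return the least-squares estimates instead.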
15
Slide 15 Final dated tree
[Figure: dated tree. Tips: s1@1990, s2@1975, s3@1980, s4@1970, s5@1960, s6@1965. Node dates: 1970 (s1, s2), 1960 (s3, s4), 1950 (s1 to s4), 1950 (s5, s6), 1940 (root).]
16
Slide 16 Dates with standard deviation
[Figure: dated tree. Tips: S1@1980, S2@1965, S6@1970, S5@2000, S3@1945, S4@1968, S7@1962, S8@1985.]
Node dates ± standard deviation: 1902.81±8.42, 1875.51±13.38, 1817.23±16.71, 1791.98±21.08, 1792.77±22.00, 1770.96±24.30, 1766.78±25.64
17
Slide 17 Dating and cospeciation
[Figure: two trees, one with tips H1 to H14 and one with tips P1 to P14]
18
Dating and cospeciation
[Figure: the trees with tips H1 to H14 and P1 to P14, with axis ticks from 12 down to 0 and from 0 up to 12 in steps of 2]
21
Dating gene duplication
[Figure: gene tree with tips S1A to S12A and S1B to S6B; node times T0, T1 and T2]
Dating gene duplication events:
1. The gene duplication event occurred at T0.
2. Two approaches are used to approximate T0.
1) If the duplicated genes, or their third codon positions, conform to a molecular clock, then estimate T0 (next slide).
2) If the duplicated genes do not conform to a molecular clock, then use genes that do conform to a molecular clock to estimate T1, which underestimates T0.
22
Estimating T0
[Figure: gene tree with tips S1A to S12A and S1B to S6B; node times T0, T1 to T5 and T1' to T5']
Ti and Ti' can be estimated with either nonsynonymous or synonymous substitutions, designated Ti.N and Ti.N', and Ti.S and Ti.S'.
Ideally, both paralogous genes conform to a molecular clock and Ti = Ti', i = 1..5.
1. Ti.N [?] Ti.N', and Ti.S [?] Ti.S': Rare.
2. Ti.N [?] Ti.N', and Ti.S [?] Ti.S': Very rare.
3. Ti.N [?] Ti.N', and Ti.S [?] Ti.S': Common.
4. Ti.N [?] Ti.N', and Ti.S [?] Ti.S': Most common.
23
Dating gene duplication
[Figure: gene tree with tips S1A to S12A and S1B to S6B; nodes M and N marked]
The loss of the paralogous lineage leading to S5B and S6B (or S5A and S6A) leads to an underestimate of the gene duplication time, because the inferred duplication shifts from Node M to Node N. Failing to sample the lineage leading to S5B and S6B has the same effect.