Genealogies of time structured data, an application on cave bear ancient DNA Frantz Depaulis Ludovic Orlando Catherine Hannï UMR 5534 Centre de Génétique.


1 Genealogies of time structured data, an application on cave bear ancient DNA
Frantz Depaulis, Ludovic Orlando, Catherine Hannï
UMR 5534 Centre de Génétique Moléculaire et Cellulaire, Université Claude Bernard, Lyon I
UMR 7625 Laboratoire d’écologie, Paris 6/ENS

2 Outline of the presentation
1 Introduction: Gene genealogies
2 Results
  2.1 Simulation exploratory results
  2.2 Cave bear application
3 Conclusions

3 Wright-Fisher neutral model: assumptions
- Selective neutrality (N_e s << 1)
- Demography
  - Isolated panmictic population
  - Constant size N
  - Poisson distribution of offspring number, P(1)
  - Same sampling time
- Mutational, sequence data: infinite site model (ISM)
  - No recombination
  - Independent mutations
  - Constant mutation rate µ, along the sequence and across time
  - Each mutation affects a new nucleotide site
-Coalescence-

4 Genealogy of a gene sample
[Figure: genealogy of a gene sample. Labels: gene sample; ancestral lineage; coalescence = common ancestor; most recent common ancestor (MRCA)]
-Coalescence-

5 Coalescent
[Figure: coalescent tree of a sample of “genes” / of individuals (a to f), with neutral mutations (T, C, C, G, C, G, A, A) on the branches. Labels: common ancestor (CA); most recent common ancestor of the sample (MRCA)]
-Coalescence-

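6 Constructing coalescents (sample a to f)
1°) Ages of the nodes (t1 to t5)
- Two lineages coalesce in a given generation with probability p = 1/2N; the waiting time follows Exp(p)
- With n lineages: p = (n(n-1)/2) / 2N
- Additional assumption: n << N
-Coalescence-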
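A minimal Python sketch of step 1°), assuming the waiting time between successive nodes is drawn from Exp(p) with p = (k(k-1)/2) / 2N while k lineages remain. The function name `coalescent_node_ages` and the parameter `two_N` are illustrative choices, not taken from the slides.

```python
import random

def coalescent_node_ages(n, two_N, rng=None):
    """Return the ages (in generations) of the n-1 nodes of a coalescent tree.

    While k lineages remain, the per-generation coalescence probability is
    p = (k*(k-1)/2) / 2N, and the waiting time to the next node is Exp(p).
    The exponential approximation relies on the assumption n << N.
    """
    rng = rng or random.Random()
    ages = []
    t = 0.0
    for k in range(n, 1, -1):
        p = (k * (k - 1) / 2) / two_N
        t += rng.expovariate(p)  # waiting time while k lineages remain
        ages.append(t)           # age of the node where two lineages merge
    return ages

# Six sampled genes (a to f) give five node ages t1 < t2 < ... < t5.
ages = coalescent_node_ages(n=6, two_N=20000, rng=random.Random(1))
```

The last element of `ages` is the age of the MRCA of the sample.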

7 Constructing-deconstructing coalescents
2°) Topology of the tree
3°) Uniform distribution of mutations
Repeated 100 000 times to obtain the neutral distribution of sequence polymorphism
[Figure: gene sample a to f with node ages t1 to t5, common ancestor (CA) and MRCA; neutral mutations placed on the branches]
-Coalescence-
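The three construction steps (node ages, topology, mutations) can be sketched together in Python. This is an illustration under stated assumptions, not the authors' program: time is measured in units of 2N generations, the number of mutations S is fixed in advance (as in the simulation parameters quoted on the haplotype-test slide), and the function name `simulate_sample` is hypothetical.

```python
import random

def simulate_sample(n, S, rng):
    """Simulate n sequences carrying S infinite-site mutations on a coalescent tree.

    Returns n rows of 0/1 values (0 = ancestral, 1 = derived), one column
    per mutation. Time is in units of 2N generations.
    """
    # Each active lineage is the set of sampled genes that descend from it.
    lineages = [frozenset([i]) for i in range(n)]
    start = {lin: 0.0 for lin in lineages}  # time at which each lineage begins
    branches = []                           # (descendant set, branch length)
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # 1) age of the next node
        t += rng.expovariate(k * (k - 1) / 2)
        # 2) topology: the pair that coalesces is chosen uniformly at random
        a, b = rng.sample(lineages, 2)
        for lin in (a, b):
            branches.append((lin, t - start[lin]))
            lineages.remove(lin)
        parent = a | b
        lineages.append(parent)
        start[parent] = t
    # 3) mutations fall uniformly on the tree, so each branch is hit with
    #    probability proportional to its length
    weights = [length for _, length in branches]
    hit = rng.choices(branches, weights=weights, k=S)
    return [[1 if i in desc else 0 for desc, _ in hit] for i in range(n)]

rows = simulate_sample(n=6, S=8, rng=random.Random(0))
```

Calling the function many times and computing a statistic on each replicate gives that statistic's distribution under the neutral model.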

8 Haplotype tests: simulations
- Haplotype number K
- Haplotype diversity H = 1 - Σ f_i^2
- Simulation parameters‡: S = 8, n = 6; 10 000 simulations
- Example simulated samples: K = 6, H = 0.83; K = 5, H = 0.78; K = 4, H = 0.72
- Observed H: P = 0.03 *
[Figure: distribution (density) of simulated H, with the observed H marked]
Depaulis and Veuille MBE 1998
‡ Hudson 1993
-Coalescence-
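As a small illustration, K and H can be computed from a list of haplotypes as follows. The function name `haplotype_stats` and the toy haplotype labels are hypothetical; with n = 6, the three frequency configurations below give the H values quoted on the slide.

```python
from collections import Counter

def haplotype_stats(sequences):
    """Return (K, H): number of distinct haplotypes and haplotype diversity.

    H = 1 - sum(f_i^2), where f_i is the sample frequency of haplotype i.
    """
    counts = Counter(sequences)
    n = len(sequences)
    K = len(counts)
    H = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return K, H

# Toy samples of n = 6 genes; each letter stands for one haplotype.
k6, h6 = haplotype_stats(list("abcdef"))  # six distinct haplotypes
k5, h5 = haplotype_stats(list("aabcde"))  # one haplotype sampled twice
k4, h4 = haplotype_stats(list("aabbcd"))  # two haplotypes sampled twice
```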

9 Alignment of polymorphic sites: frequencies of mutations
outgroup     GCGCGCGAACCCATT
             GCCCGCGAATCCATT
             GCGTGCGATCCGATT
             GCGTACAATCCCGTC
             GTGTACAATCTCGAC
             GCGTGGAATCCCGTT
             CCGCGCGGTCCCATT
frequencies  121531416121423
n = 7, S = 15
[Figure annotation: C → T]
-Coalescence-

10 Frequency spectrum of mutations & neutrality tests
- f_i: number of occurrences in a sample
- Number of polymorphic sites
- θ = 4 N_e µ
- Each test contrasts estimators of θ, with an expected difference of 0 under neutrality: (Tajima Genetics 1989), (Fu and Li Genetics 1993), and H = π - θ_H (Fay and Wu Genetics 2000)
-Coalescence-
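The formulas on this slide did not survive extraction. As an illustration of one of the cited tests, here is a sketch of Tajima's (1989) D written from its published definition: the difference between the mean number of pairwise differences π and S/a1, scaled by its estimated standard deviation. The function name `tajimas_d` and the toy alignments are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Tajima's (1989) D for a list of aligned sequences of equal length."""
    n = len(sequences)
    # S: number of polymorphic sites
    S = sum(1 for col in zip(*sequences) if len(set(col)) > 1)
    if S == 0:
        raise ValueError("D is undefined without polymorphic sites")
    # pi: mean number of differences between pairs of sequences
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # constants of Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Only singletons (rare variants): D is negative.
d_rare = tajimas_d(["1000", "0100", "0010", "0001"])
# Only intermediate-frequency variants: D is positive.
d_mid = tajimas_d(["11", "11", "00", "00"])
```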

11 Mitochondria, correlation LD/distance: recombination or mutational effects?
- LD measured by r^2, which decreases (↘) with the distance d between sites
- Pearson’s statistic tested by permutations of sites
Awadalla et al. (Science 1999)
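A hedged sketch of such a test: compute r^2 for every pair of biallelic sites, correlate it with the distance between the sites, and compare the observed Pearson coefficient with its distribution when site positions are shuffled. The function names, the 0/1 coding of alleles and the one-sided P value (proportion of permutations at least as negative as the observed value) are assumptions made for this example.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r_squared(col_a, col_b):
    """LD measure r^2 between two polymorphic biallelic sites coded 0/1."""
    n = len(col_a)
    pa, pb = sum(col_a) / n, sum(col_b) / n
    pab = sum(a * b for a, b in zip(col_a, col_b)) / n
    d = pab - pa * pb
    return d * d / (pa * (1 - pa) * pb * (1 - pb))

def ld_distance_test(columns, positions, n_perm, rng):
    """Return (observed correlation, one-sided permutation P value)."""
    pairs = list(combinations(range(len(columns)), 2))
    r2 = [r_squared(columns[i], columns[j]) for i, j in pairs]

    def corr(pos):
        return pearson(r2, [abs(pos[i] - pos[j]) for i, j in pairs])

    observed = corr(positions)
    pos = list(positions)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pos)  # permute the positions of the sites
        if corr(pos) <= observed + 1e-9:
            hits += 1
    return observed, hits / n_perm

# Toy data: 8 sequences, 4 sites; close sites are in complete LD.
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
observed, p_value = ld_distance_test(columns, [10, 20, 300, 310], 2000, random.Random(0))
```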

12 Time structured data & genealogies
- Parasites during disease evolution (virus…)
- Microbial experimental evolution
- Ancient DNA
Issue:
- To what extent are the analyses affected by time structure?
- How to correct for this?
-Coalescence-

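13 Algorithm for time structured coalescent
The exponential law is memoryless!
[Figure: genealogy of samples a to f in which an older subset (n1 = 3) joins at time t1; the number of lineages n changes along the tree (labels n = 2, 3, 4, 5)]
- Simulations-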
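One way to use the memoryless property, sketched in Python under assumptions: going back in time from the recent subset, a waiting time is drawn as usual; if it would cross the age t1 of the older subset, the draw is discarded, the clock is set to t1, the older lineages are added, and a new waiting time is drawn with the updated number of lineages. Times are in units of 2N generations, and the names `serial_coalescent_ages`, `n0` and `n1` are illustrative.

```python
import random

def serial_coalescent_ages(n0, n1, t1, rng):
    """Node ages for n0 genes sampled at time 0 and n1 genes sampled at age t1.

    Time is in units of 2N generations, so k lineages coalesce at rate
    k*(k-1)/2.
    """
    ages = []
    t = 0.0
    k = n0          # lineages currently present
    added = False   # has the older subset joined the genealogy yet?
    while k > 1 or not added:
        wait = rng.expovariate(k * (k - 1) / 2) if k > 1 else float("inf")
        if not added and t + wait >= t1:
            # Memoryless: discard the draw, restart the clock at t1
            # with the older lineages included.
            t = t1
            k += n1
            added = True
            continue
        t += wait
        k -= 1
        ages.append(t)
    return ages

ages_serial = serial_coalescent_ages(n0=3, n1=3, t1=0.2, rng=random.Random(2))
```

At most n0 - 1 nodes can be younger than t1, because only the recent subset can coalesce before the older genes are added.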

14 Age structure effect on gene genealogies
- Contemporaneous sample
- Limited time structure: excess of rare variants, deficit of LD
- Two subsets with large time spacing: deficit of rare variants, excess of LD, differentiation
[Figure labels: t1; n1 = 4]
- Simulations-

15 Effect of subset size on statistical tests: mean
Parameters: t1 = 0.2 N_e generations; n = 40, S = 20
[Figure: mean of each statistic (y-axis) against subset size n1/n (x-axis). Series: Dt, D*fl, Hfw, ZnS, K, H, Pearson 10, S/S0, pi/pi0, Fst]
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1 corresponding to …); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
- Simulations-

16 Effect of subset size on statistical tests: significance rate
Parameters: t1 = 0.2 N_e generations; n = 40, S = 20
[Figure: significance rate (y-axis) against subset size n1/n (x-axis). Series: Dt_inf, D*fl_inf, Hfw_inf, ZnS_inf, K_sup, H_sup, Fst]
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics.
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
- Simulations-

17 Effect of a half subset age on statistical tests: mean
Parameters: n1 = n/2, sampled at age t1
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests (K is scaled to its expected maximal value S+1 corresponding to …); Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations; Fst: Hudson et al.'s (1992) Fst.
- Simulations-

18 Effect of a half subset age on statistical tests: significance rates
Parameters: n1 = n/2, sampled at age t1
Empty symbols: deficit of the statistics; filled symbols: excess of the statistics.
Dt: Tajima's (1989) D; D*fl: Fu and Li's (1993) D*; Hfw: Fay and Wu's (2000) H; ZnS: Kelly's (1997) ZnS; K and H: Depaulis and Veuille's (1998) haplotype tests; Pearson: Pearson correlation coefficient between pairwise allelic correlation and distance between mutations, tested by permutations according to Awadalla et al. (1999); Fst: Hudson et al.'s (1992) Fst, tested by permutations.
- Simulations-

19 Cave bear: Ursus spelaeus (12-300kYA) - Application-

20 Sampling sites - Application-

21 Alignment of polymorphic sites: D-loop of cave bear
REF                   TTGTCAACTT TCGAATTGAA GT
#NOASC3500_40-45      ..A....T.C ..A....... ..
#NOASC3800_40-45      ..A....T.C ..A....... ..
#NOASC85F16_40-45     .......... .......... ..
#NOASC95456_40-45     ..A....T.C ..A....... ..
#NOASC92386_40-45     ..A....T.C ..A....... ..
#NOASC92413_40-45     C.A....T.C ..A....... ..
#NOASC92152_40-45     C.A....T.C ..A....... A.
#NOASC5300_50-60      ..A....T.C ..A....... ..
#NOASC11600_80        .......... .......... ..
#NOASC12500_80        .......... .......... ..
#NOASC13800_80        .......... .......... ..
#NOASC100801_80       .......... .......... ..
#NOASC12400_80        ..A....T.C ..A....... ..
#NOASC11800_80        .CA....T.C ..A.G..... ..
#NOASC11700_80        C.A....T.C ..A....... A.
#NOASC84E16_90-130    C.A....T.C ..A....... ..
#NOASC84G19_90-130    C.A....T.C ..A....... ..
#NOASCbrC5-02_90-130  C.A....T.C ..A....... ..
#NOASC15400_90-130    C.A....T.C ..A......G ..
#NOASC15700_90-130    ....T.G.C. .TA..C..G. ..
#NOATAB2_40           .......... .......... ..
#NOAGrotteMerve_?     .......... .T........ ..
#NOAAZE_80-130        .......... .......... .C
#NOAGigny189F3_?      ..A....T.C ..A....... ..
#NOAJAL104_?          C.A....T.C ..A....... ..
#NOATAB15_25-35       ..A......C ..A....... ..
#NOAGailenreuth_?     ..A......C ..A....... ..
#NOA47910_30          ..A....T.C ..A....A.. ..
#NOAHohleFels_?       ..A....T.C ..A..C.... ..
#NOACLA_35            ..A....T.C C.A....... ..
#NOACLB_35            ..A....T.C C.A....... ..
#NOAChiemsee_35       ..A..G.... ..A...C... ..
#NOARamesch1_?        ..A..G.... ..A...C... ..
#NOARamesch2_?        ..A..G.... ..A...C... ..
#NOAGeissenklt1_?     ...CT..... .T.G.C.... ..
#NOAGeissenklt2_?     ...CT..... .T.G.C.... ..
#NOANixloch_?         ...CT..... .T...C.... ..
--------------------------------------------- Alp barrier
#SOAPoto_?            ...CT..... .T...C.... ..
#SOAVind1_?           ...CT..... .T...C.... ..
#SOAVind2_?           ...CT..... .T...C.... ..
#SOAConturi_?         .......T.. .......... ..
n = 41, S = 22; N_e = 13 000
(Loreille et al. 2001) (Orlando et al. 2002) (Hofreiter et al. 2002) (Kühn et al. 2001)
- Application-

22 Neutrality tests, Belgium cave (Scladina, n = 20, S = 15)
Statistic     Dt            D*fl          Hfw           K             H             ZnS           Pearson
Observed      -0.82         -1.55         -1.32         7             0.79          0.24          -0.39 (2.8*) a
No time structure
P value (%)   (21.0)        (5.3)         (18.4)        (16.4)        (37.7)        (43.7)        (2.8*)
Mean          0.06          -0.05         0.30          8.3           0.79          0.26          0.00
CI            [-1.42;1.51]  [-1.89;1.18]  [-4.46;2.62]  [5;11]        [0.64;0.88]   [0.10;0.55]   [-0.25;0.20]
% rejected    (4.9;5.5)     (5.2;2.8)     (5.4;4.8)     (1.7;3.9)     (4.9;4.6)     (5.5;5.1)     (5.0;/)
Average time structure
P value (%)   (30.0)        (8.8)         (17.2)        (8.6)         (31.2)        (31.7)        (2.7*)
Mean          -0.30         -0.38         0.39          9.1           0.80          0.22          0.00
CI            [-1.56;1.26]  [-1.89;0.84]  [-4.04;2.56]  [6;12]        [0.66;0.89]   [0.08;0.47]   [-0.29;0.23]
% rejected    (7.8;3.0)     (8.2;1.0)     (4.2;3.7)     (0.8;9.5)     (3.3;7.8)     (11.5;2.9)    (4.9;/)
Uncertainty in time structure
P value (%)   (30.0)        (8.6)         (17.4)        (7.9)         (30.9)        (31.9)        (2.8*)
Mean          -0.33         -0.42         0.37          9.1           0.80          0.22          0.00
CI            [-1.59;1.18]  [-1.89;0.84]  [-4.20;2.54]  [6;12]        [0.66;0.89]   [0.08;0.48]   [-0.29;0.24]
% rejected    (9.3;2.8)     (9.3;0.8)     (4.5;3.6)     (0.7;9.8)     (3.7;7.5)     (11.6;2.8)    (4.8;/)
a: permutation test
- Application-

23 Neutrality tests, dated subsample (all dated, n = 27, S = 20)
Statistic     Dt            D*fl          Hfw           K             H             ZnS           Pearson
Observed      -1.21         -2.28         -0.69         12            0.86          0.14          -0.27 (11.4) a
No time structure
P value (%)   (10.5)        (0.6**)       (25.7)        (16.5)        (32.1)        (24.3)        (11.5)
Mean          -0.09         -0.08         0.29          10.3          0.82          0.23          0.00
CI            [-1.49;1.50]  [-1.98;1.32]  [-5.66;3.18]  [7;14]        [0.69;0.90]   [0.09;0.48]   [-0.19;0.16]
% rejected    (5.0;5.2)     (3.6;1.4)     (5.3;4.7)     (4.0;2.8)     (5.3;4.7)     (5.7;5.0)     (4.7;/)
Average time structure
P value (%)   (17.7)        (1.7*)        (24.3)        (38.2)        (42.6)        (41.8)        (11.2)
Mean          -0.42         -0.59         0.35          11.8          0.84          0.18          0.00
CI            [-1.69;1.11]  [-2.28;0.72]  [-5.34;2.98]  [8;15]        [0.71;0.91]   [0.07;0.39]   [-0.23;0.20]
% rejected    (9.3;2.1)     (6.9;0.3)     (4.7;2.6)     (1.2;11.1)    (3.4;9.5)     (13.7;2.4)    (4.9;/)
Uncertainty in time structure
P value (%)   (18.5)        (1.9*)        (23.4)        (39.9)        (43.2)        (41.1)        (11.9)
Mean          -0.44         -0.61         0.37          11.8          0.84          0.18          0.00
CI            [-1.70;1.09]  [-2.28;0.72]  [-5.23;2.99]  [8;16]        [0.71;0.91]   [0.07;0.40]   [-0.24;0.19]
% rejected    (9.3;2.4)     (7.0;0.2)     (4.6;2.7)     (1.2;11.7)    (3.5;9.7)     (14.1;2.5)    (5.4;/)
a: permutation test
- Application-

24 Neutrality tests, total sample (n = 41, S = 22)
Statistic     Dt            D*fl          Hfw           K             H             ZnS           Pearson         Fst
Observed      -0.45         -0.88         1.35          17            0.91          0.10          -0.09 (22.0) a  0.32 (0.4**) a
No time structure
P value (%)   (37.1)        (14.7)        (47.1)        (1.7*)        (3.7*)        (18.1)        (21.5)          (0.4**)
Mean          -0.09         -             0.30          12.3          0.83          0.19          0.00            -0.03
CI            [-1.44;1.52]  [-1.85;1.38]  [-5.84;3.15]  [8;16]        [0.70;0.90]   [0.07;0.41]   [-0.20;0.17]    [-0.38;0.27]
% rejected    (4.5;5.3)     (4.1;1.1)     (4.8;4.7)     (3.0;4.3)     (4.8;4.9)     (5.5;4.6)     (4.8;/)         (/;4.6)
Average time structure
P value (%)   (45.5)        (35.6)        (45.6)        (7.8)         (5.5)         (36.6)        (21.8)          (1.3*)
Mean          -0.45         -0.74         0.32          13.9          0.84          0.15          0.00            -0.01
CI            [-1.71;1.10]  [-2.49;0.73]  [-5.38;2.93]  [9;18]        [0.71;0.91]   [0.05;0.34]   [-0.23;0.20]    [-0.40;0.38]
% rejected    (10.2;2.2)    (10.7;0.1)    (4.2;2.4)     (0.8;16.1)    (4.3;7.9)     (15.2;2.2)    (4.9;/)         (/;8.9)
Uncertainty in time structure
P value (%)   (42.1)        (40.7)        (44.9)        (10.3)        (6.2)         (39.2)        (21.8)          (1.7*)
Mean          -0.54         -0.90         0.26          14.3          0.84          0.14          0.00            -0.01
CI            [-1.76;0.96]  [-2.81;0.73]  [-5.70;2.90]  [10;18]       [0.71;0.91]   [0.05;0.32]   [-0.24;0.21]    [-0.40;0.41]
% rejected    (12.2;1.4)    (14.2;0.1)    (4.5;2.3)     (0.5;19.8)    (4.0;7.9)     (16.7;2.1)    (4.7;/)         (/;9.7)
a: permutation test
- Application-

25 LD as a function of distance - Application-

26 Time structure, Conclusion
Can substantially bias the results
- Even if within 10% of the age of the MRCA: bottom of the tree with more branches; non random subset of mutations (rare ones)
- small: long external branches, excess of rare variants (negative D, deficit of LD)
- great: a long internal branch, apparent differentiation, excess of intermediate frequency variants (positive D, excess of LD) if equilibrated

27 Acknowledgements
- CNRS
- Nick Barton

