
DNA variation in Ecology and Evolution IV: Clustering methods and phylogenetic reconstruction


1 DNA variation in Ecology and Evolution IV: Clustering methods and phylogenetic reconstruction. Maria Eugenia D’Amato. BCB 705: Biodiversity. Available at http://planet.uwc.ac.za/nisl

2 Organization of the presentation: phylogenetic reconstruction (distance, ML, MP); networks; multivariate analysis.

3 Characters must be independent and homologous. They may be continuous or discrete; discrete characters are binary or multistate.

4 DNA sequence characters. An alignment is a hypothesis of homology for each site. Sequence comparison: BLAST search against GenBank. Coding sequence: blastn or blastx. Non-coding DNA: blastn.

5 Blast search results
Sequences producing significant alignments:
                                                                     Score    E
                                                                     (Bits)   Value
gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5...     101      3e-18
gi|343991|dbj|D10368.1|FRGMTURF2 Rana catesbeiana mitochondri...     97.6     5e-17
gi|14209845|gb|AF314017.1|AF314017 Rana sylvatica NADH dehydr...     93.7     8e-16
Annotations: the gi|...| field holds the GenBank accession number of each sequence; the description names the species that match the query; the lower the E-value, the better the alignment.

6 Blast search results
>gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase subunit 2, complete cds
Length=10814
Score = 101 bits (51), Expect = 3e-18
Identities = 99/115 (86%), Gaps = 0/115 (0%)
Strand=Plus/Minus

Query  451    TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA  510
              |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
Sbjct  10203  TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA  10144

Query  511    AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG  565
              | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
Sbjct  10143  AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG  10089

Annotations on the slide: description of the genes contained in the sequence with this accession number; strands aligned; 5’ end; alignment.
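
The "Identities" figure in the BLAST header can be checked by comparing the two aligned strings column by column. The following is a minimal Python sketch, not part of BLAST: the function name `count_identities` is illustrative, and it assumes a gap-free alignment, which holds here (Gaps = 0/115). The sequences are copied from the slide.

```python
def count_identities(query, subject):
    """Return (identical columns, alignment length) for two gap-free aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    return matches, len(query)

# Aligned segments from the slide (Query 451-565, Sbjct 10203-10089).
QUERY = ("TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA"
         "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG")
SBJCT = ("TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA"
         "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG")

matches, length = count_identities(QUERY, SBJCT)
print(f"Identities = {matches}/{length} ({matches / length:.0%})")
```

The count reproduces the 99/115 reported in the header, so the 16 mismatching columns are exactly the positions without a bar in the alignment.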

7 Phylogenetic reconstruction: distance methods. [Figure: a 5 x 7 data matrix (samples 1 to 5, characters C1 to C7) is converted by a distance (similarity/dissimilarity) criterion into a 5 x 5 matrix, from which a dendrogram is drawn.]

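8 Distance criteria for binary data
Jaccard’s coefficient: $J = \dfrac{a}{a + b + c}$, where, for samples 1 and 2, $a$ = bands common to both samples, $b$ = bands exclusive to sample 1, $c$ = bands exclusive to sample 2. $J$ measures similarity; the corresponding distance is $1 - J$.
For two points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$:
Manhattan distance: $M = |x_1 - x_2| + |y_1 - y_2|$
Euclidean distance: $E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$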
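
The three criteria on this slide can be sketched in a few lines of Python. This is an illustration only: the function names and the two band profiles are hypothetical, and the Manhattan and Euclidean functions are written for any number of coordinates rather than just the two-dimensional case shown on the slide.

```python
import math

def jaccard_similarity(x, y):
    """J = a / (a + b + c) for two 0/1 band profiles of equal length."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # bands in both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # bands only in x
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # bands only in y
    if a + b + c == 0:
        raise ValueError("no bands present in either profile")
    return a / (a + b + c)

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(u - v) for u, v in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))

# Hypothetical band profiles over seven characters C1..C7 (1 = band present).
s1 = [1, 1, 0, 1, 0, 0, 1]
s2 = [1, 0, 0, 1, 1, 0, 1]

print("Jaccard similarity:", jaccard_similarity(s1, s2))
print("Jaccard distance:  ", 1 - jaccard_similarity(s1, s2))
print("Manhattan distance:", manhattan(s1, s2))
print("Euclidean distance:", euclidean(s1, s2))
```

Shared absences (0 in both profiles) do not enter the Jaccard coefficient, which is why it is often preferred for band data where the absence of a band is uninformative.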

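9 Distance criterion for DNA data: models of DNA substitution
p-distance: $p = \dfrac{\text{number of different nucleotides}}{\text{total number of nucleotides}}$
Divergence matrix for sequences x and y, where $f_{ij}$ is the proportion of sites with nucleotide $i$ in x and nucleotide $j$ in y:
$$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$$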

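10 Models of DNA substitution
Jukes and Cantor: $D = 1 - (a + f + k + p)$, $d_{xy} = -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} D\right)$
F81 (equal rates, unequal base frequencies): $B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$, $d_{xy} = -B \ln\left(1 - \dfrac{D}{B}\right)$
K2P: $P = c + h + i + n$ (transitions), $Q = b + d + e + g + j + l + m + o$ (transversions), $d_{xy} = \tfrac{1}{2} \ln\left(\dfrac{1}{1 - 2P - Q}\right) + \tfrac{1}{4} \ln\left(\dfrac{1}{1 - 2Q}\right)$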
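
As a worked illustration of the Jukes and Cantor and K2P corrections, the sketch below counts transitions and transversions between two aligned sequences and applies the two formulas from the slide. The function names and the two 20-bp sequences are hypothetical; the code assumes gap-free sequences containing only A, C, G and T.

```python
import math

# Purine <-> purine and pyrimidine <-> pyrimidine changes (cells c, i, h, n of Fxy).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def site_proportions(x, y):
    """Return (P, Q): proportions of transitional and transversional differences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    ts = sum((a, b) in TRANSITIONS for a, b in zip(x, y))
    diffs = sum(a != b for a, b in zip(x, y))
    n = len(x)
    return ts / n, (diffs - ts) / n

def jukes_cantor(D):
    """d = -3/4 ln(1 - 4/3 D), where D is the proportion of differing sites."""
    return -0.75 * math.log(1 - 4 * D / 3)

def kimura_2p(P, Q):
    """d = 1/2 ln(1 / (1 - 2P - Q)) + 1/4 ln(1 / (1 - 2Q))."""
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))

# Hypothetical sequences: two transitions (A>G, C>T) and one transversion (G>T).
seq_x = "ACGTACGTACGTACGTACGT"
seq_y = "GCGTATGTACTTACGTACGT"

P, Q = site_proportions(seq_x, seq_y)
D = P + Q  # uncorrected p-distance
print("p-distance:", D)
print("JC distance:", jukes_cantor(D))
print("K2P distance:", kimura_2p(P, Q))
```

Both corrected distances exceed the raw p-distance, because each model adds an estimate of the multiple substitutions that the observed differences hide.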

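11 Distance criteria for diploid data (allele frequencies $x_i$ and $y_i$ in populations x and y)
Nei 1972: $D_n = -\ln I$, $I = \dfrac{J_{xy}}{\sqrt{J_x J_y}}$, with $J_x = \sum x_i^2$, $J_y = \sum y_i^2$, $J_{xy} = \sum x_i y_i$
Cavalli-Sforza 1967: $D_{arc} = \sqrt{\dfrac{1}{L} \sum \left(\dfrac{2\theta}{\pi}\right)^2}$, $\theta = \cos^{-1} \sum \sqrt{x_i y_i}$, where the outer sum runs over the $L$ loci.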
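
The two allele-frequency distances can be sketched as follows. This is an illustration under stated assumptions: each population is given as a list of loci, each locus as a list of allele frequencies summing to 1, and the function names and example frequencies are hypothetical.

```python
import math

def nei_1972(pop_x, pop_y):
    """Nei's standard distance D = -ln(Jxy / sqrt(Jx * Jy)), summing over all loci."""
    jx = sum(x * x for locus in pop_x for x in locus)
    jy = sum(y * y for locus in pop_y for y in locus)
    jxy = sum(x * y for lx, ly in zip(pop_x, pop_y) for x, y in zip(lx, ly))
    return -math.log(jxy / math.sqrt(jx * jy))

def cavalli_sforza_arc(pop_x, pop_y):
    """Arc distance sqrt((1/L) * sum over loci of (2 * theta / pi)^2)."""
    total = 0.0
    for lx, ly in zip(pop_x, pop_y):
        cos_theta = sum(math.sqrt(x * y) for x, y in zip(lx, ly))
        theta = math.acos(min(1.0, cos_theta))  # clamp against rounding above 1
        total += (2 * theta / math.pi) ** 2
    return math.sqrt(total / len(pop_x))

# Hypothetical data: one locus with two alleles.
pop_a = [[0.5, 0.5]]
pop_b = [[1.0, 0.0]]

print("Nei 1972:", nei_1972(pop_a, pop_b))
print("Arc distance:", cavalli_sforza_arc(pop_a, pop_b))
```

Note that Nei's distance is undefined when two populations share no alleles (the logarithm of zero), whereas the arc distance reaches its maximum of 1 in that case.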

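12 Phylogenetic reconstruction criterion for distance data
Additive tree (NJ). [Figure: tree with taxa A, B, C, D and branches v1 to v5.] Properties:
$d_{AB} = v_1 + v_2$
$d_{AC} = v_1 + v_3 + v_4$
$d_{AD} = v_1 + v_3 + v_5$
$d_{BC} = v_2 + v_3 + v_4$
$d_{BD} = v_2 + v_3 + v_5$
$d_{CD} = v_4 + v_5$
Ultrametric tree (UPGMA). [Figure: tree with taxa A, B, C and branches v1 to v4.] Properties:
$d_{AB} = v_1 + v_2 + v_3$
$d_{AC} = v_1 + v_2 + v_4$
$d_{BC} = v_3 + v_4$
$v_3 = v_4$
$v_1 = v_2 + v_3 = v_2 + v_4$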
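
The difference between the two tree types can be checked directly on a distance matrix: ultrametric distances satisfy the three-point condition (for every triple, the two largest distances are equal), while additive distances satisfy the four-point condition (for every quartet, the two largest of the three pairwise sums are equal). The sketch below is illustrative; the function names and the branch lengths are hypothetical, and it assumes C sits on branch v4 and D on branch v5 of the additive tree.

```python
from itertools import combinations

def dist(d, a, b):
    """Look up a symmetric distance stored under an unordered pair."""
    return d[frozenset((a, b))]

def is_ultrametric(taxa, d, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for a, b, c in combinations(taxa, 3):
        x = sorted([dist(d, a, b), dist(d, a, c), dist(d, b, c)])
        if abs(x[2] - x[1]) > tol:
            return False
    return True

def is_additive(taxa, d, tol=1e-9):
    """Four-point condition: in every quartet the two largest pair sums are equal."""
    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([dist(d, a, b) + dist(d, c, e),
                       dist(d, a, c) + dist(d, b, e),
                       dist(d, a, e) + dist(d, b, c)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

# Additive tree with hypothetical branch lengths v1..v5 = 1, 2, 3, 4, 5.
v1, v2, v3, v4, v5 = 1, 2, 3, 4, 5
additive = {
    frozenset("AB"): v1 + v2, frozenset("AC"): v1 + v3 + v4,
    frozenset("AD"): v1 + v3 + v5, frozenset("BC"): v2 + v3 + v4,
    frozenset("BD"): v2 + v3 + v5, frozenset("CD"): v4 + v5,
}

# Ultrametric tree with hypothetical lengths satisfying v3 = v4 and v1 = v2 + v3.
u1, u2, u3, u4 = 3, 2, 1, 1
ultrametric = {
    frozenset("AB"): u1 + u2 + u3, frozenset("AC"): u1 + u2 + u4,
    frozenset("BC"): u3 + u4,
}

print("additive tree is additive:", is_additive("ABCD", additive))
print("additive tree is ultrametric:", is_ultrametric("ABCD", additive))
print("ultrametric tree is ultrametric:", is_ultrametric("ABC", ultrametric))
```

Every ultrametric matrix is also additive, but not the reverse, which is why UPGMA can mislead when rates differ among lineages while NJ does not require equal rates.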

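13 Maximum Likelihood
Aligned sequences (sites 1 ... j ... N):
(1) C….GGACACGTTTA….C
(2) C….AGACACCTCTA….C
(3) C….GGATAAGTTAA….C
(4) C….GGATAGCCTAG….C
[Figure: unrooted tree for sequences 1 to 4 at site j, and the same tree after rooting at an internal node, with internal nodes 5 and 6.]
Likelihood at site j, summed over the possible states at the internal nodes: $L_j = \text{Prob}(\text{tree with one assignment of ancestral states}) + \text{Prob}(\text{tree with another assignment}) + \ldots$
Likelihood over all sites: $L = L_1 \times L_2 \times L_3 \times \ldots \times L_N = \prod_{j=1}^{N} L_j$
Log-likelihood: $\ln L = \ln L_1 + \ln L_2 + \ldots + \ln L_N = \sum_{j=1}^{N} \ln L_j$
Likelihood of the data D under hypothesis H: $L_D = \Pr(D \mid H)$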
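
The sum over ancestral states in $L_j$ can be made concrete with a brute-force sketch. The assumptions here go beyond the slide and are labelled as such: the Jukes-Cantor model is used for the substitution probabilities, the tree is ((1,2),(3,4)) rooted at internal node 5, every branch has the same hypothetical length `t` (expected substitutions per site), and the alignment consists only of the sequence fragments visible on the slide. Function names are illustrative.

```python
import math
from itertools import product

BASES = "ACGT"

def jc69_prob(i, j, t):
    """Jukes-Cantor probability of observing j at the end of a branch of length t, starting from i."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(tips, t):
    """L_j for tree ((1,2),(3,4)): sum over all states at internal nodes 5 and 6."""
    s1, s2, s3, s4 = tips
    total = 0.0
    for x5, x6 in product(BASES, repeat=2):
        total += (0.25                      # prior on the state at the root (node 5)
                  * jc69_prob(x5, s1, t) * jc69_prob(x5, s2, t)
                  * jc69_prob(x5, x6, t)    # internal branch 5 -> 6
                  * jc69_prob(x6, s3, t) * jc69_prob(x6, s4, t))
    return total

def log_likelihood(alignment, t):
    """ln L = sum over sites of ln L_j."""
    return sum(math.log(site_likelihood(col, t)) for col in zip(*alignment))

# Fragments visible on the slide; t = 0.1 is an arbitrary illustrative value.
FRAGMENTS = ["GGACACGTTTA", "AGACACCTCTA", "GGATAAGTTAA", "GGATAGCCTAG"]
print("ln L =", log_likelihood(FRAGMENTS, t=0.1))
```

Enumerating all 16 combinations of internal states is feasible only for tiny trees; practical programs obtain the same sum with Felsenstein's pruning algorithm and also optimise each branch length separately.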

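14 Hypothesis testing: likelihood ratio test
$\delta = \log L_1 - \log L_0$, and $2\delta$ approximately follows a $\chi^2$ distribution.
Applications: testing rate variation; choosing an appropriate substitution model.
d.f. = number of sequences in the tree - 2 (for the rate-variation test), or d.f. = difference in the number of parameters between H1 and H0.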
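
A likelihood ratio test reduces to a short calculation once the two log-likelihoods are known. The sketch below uses only the Python standard library; `chi2_sf` is an illustrative helper that evaluates the upper tail of the chi-square distribution for integer degrees of freedom through the standard recurrence for the incomplete gamma function, and the log-likelihood values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                  # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))       # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z^a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnL0, lnL1, df):
    """Return (2 * delta, p-value) for nested hypotheses H0 (simpler) and H1."""
    statistic = 2.0 * (lnL1 - lnL0)
    return statistic, chi2_sf(statistic, df)

# Hypothetical log-likelihoods; H1 has one more free parameter than H0.
stat, p_value = likelihood_ratio_test(lnL0=-1250.0, lnL1=-1245.0, df=1)
print(f"2*delta = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the extra parameters of H1 improve the fit more than expected by chance, so the simpler model H0 is rejected.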

15 Bootstrapping: how well supported are the groups? [Figure: trumpet fish example.]
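
Bootstrapping resamples the alignment columns with replacement, rebuilds the tree from each pseudo-replicate, and reports the fraction of replicates in which a group appears. The sketch below shows only the resampling and counting steps. The tree-building step is replaced by a deliberately simple stand-in (`groups_3_and_4`, which asks whether sequences 3 and 4 are strictly the closest pair), so all names are hypothetical; the four sequences are those of the maximum parsimony example in this presentation.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_support(alignment, has_group, replicates=100, seed=1):
    """Fraction of pseudo-replicates for which has_group(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(bool(has_group(bootstrap_alignment(alignment, rng)))
               for _ in range(replicates))
    return hits / replicates

def hamming(a, b):
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def groups_3_and_4(aln):
    """Toy stand-in for tree building: are sequences 3 and 4 strictly the closest pair?"""
    d34 = hamming(aln[2], aln[3])
    others = [hamming(aln[i], aln[j])
              for i in range(4) for j in range(i + 1, 4) if (i, j) != (2, 3)]
    return d34 < min(others)

ALIGNMENT = ["ATATT", "ATCGT", "GCAGT", "GCCGT"]
support = bootstrap_support(ALIGNMENT, groups_3_and_4)
print(f"Support for group (3,4): {support:.2f}")
```

In practice `has_group` would run a full tree search (distance, parsimony or likelihood) on each replicate, and far more than five sites would be needed for the support values to be meaningful.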

16 Maximum Parsimony
Minimize tree length. To obtain rooted trees (and character polarity), use an outgroup; the ingroup is assumed to be monophyletic.
Example sequences:
1 ATATT
2 ATCGT
3 GCAGT
4 GCCGT
[Figure: site 1 (A, A, G, G) mapped onto trees of taxa 1 to 4, with reconstructions labelled "1 change" and "5 changes".]

17 Maximum Parsimony: example
[Figure: sites 2, 3, 4 and 5 mapped onto the tree; site 5 requires no changes.]
Tree length: $L = \sum_{i=1}^{k} l_i$, where $l_i$ is the number of changes at site $i$ and $k$ is the number of sites.

18 Maximum parsimony: example
Tree            Site 1  Site 2  Site 3  Site 4  Site 5  Total
((1,2),(3,4))     1       1       2       1       0       5
((1,3),(2,4))     2       2       1       1       0       6
((1,4),(2,3))     2       2       2       1       0       7
Phylogenetically informative sites: 1, 2 and 3, whose change counts differ among trees. Sites 4 and 5 cost the same on every tree.
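
The change counts in this table can be reproduced with Fitch's algorithm, a standard way to count the minimum number of changes for unordered characters. The slides do not name the counting method, so its use here is an assumption; the function names are illustrative, and the sequences are the four example sequences from the maximum parsimony slides.

```python
def fitch_length(tree, states):
    """Minimum number of changes at one site; tree is a nested tuple of taxon labels."""
    changes = 0

    def post_order(node):
        nonlocal changes
        if not isinstance(node, tuple):          # leaf: its observed state
            return {states[node]}
        left, right = post_order(node[0]), post_order(node[1])
        if left & right:                         # states agree: no change needed
            return left & right
        changes += 1                             # disjoint sets: one change
        return left | right

    post_order(tree)
    return changes

def tree_length(tree, seqs):
    """Tree length L = sum over sites of the per-site change counts."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_length(tree, {k: s[i] for k, s in seqs.items()})
               for i in range(n_sites))

SEQS = {1: "ATATT", 2: "ATCGT", 3: "GCAGT", 4: "GCCGT"}
TREES = [((1, 2), (3, 4)), ((1, 3), (2, 4)), ((1, 4), (2, 3))]

for tree in TREES:
    print(tree, tree_length(tree, SEQS))
```

The lengths come out as 5, 6 and 7, so ((1,2),(3,4)) is the most parsimonious of the three topologies, in agreement with the table.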

19 Networks: a phylogenetic representation that allows reticulation. More appropriate for intraspecific data, where the ancestor may still be alive. Sources of reticulation: hybridization, recombination, horizontal transfer, polyploidization. [Figure: network connecting the haplotypes agct, acat and acct, with elements numbered 1 to 7.]

20 Multivariate clustering. Start from the 5 x 7 data matrix (samples 1 to 5, characters C1 to C7). Apply a similarity criterion (correlations) to obtain a 7 x 7 matrix. Calculate the eigenvectors with the highest eigenvalues. Project the data onto the new axes (eigenvectors): X = 1st axis, Y = 2nd axis, Z = 3rd axis.
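
The steps on this slide correspond to a principal component analysis of the correlation matrix. The sketch below follows them for the first axis only, using the standard library and power iteration to find the leading eigenvector. It is an illustration under assumptions: the data matrix is a hypothetical 5 x 3 example rather than the 5 x 7 matrix of the slide, the function names are invented, and further axes would require deflation or a linear algebra library.

```python
import math

def standardize(data):
    """Centre each column and scale it to unit sample standard deviation."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in data]

def correlation_matrix(z):
    """Correlations between the columns of a standardized matrix."""
    n, p = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / (n - 1) for b in range(p)]
            for a in range(p)]

def leading_eigenpair(matrix, iterations=200):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    p = len(matrix)
    v = [1.0 + 0.1 * i for i in range(p)]        # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(p))
                     for a in range(p))
    return eigenvalue, v

# Hypothetical data: 5 samples (rows) scored for 3 characters (columns).
DATA = [[1, 2, 1], [2, 4, 3], [3, 6, 2], [4, 8, 5], [5, 10, 4]]

z = standardize(DATA)
corr = correlation_matrix(z)
eigenvalue, axis1 = leading_eigenpair(corr)
scores = [sum(a * b for a, b in zip(row, axis1)) for row in z]  # projection on axis 1

print("Share of variance on axis 1:", eigenvalue / len(corr))
print("Sample scores on axis 1:", scores)
```

Because the trace of a correlation matrix equals the number of characters, dividing an eigenvalue by that number gives the share of total variance captured by the corresponding axis.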

