Published by Timothy Townsend. Modified over 9 years ago.
1
DNA variation in Ecology and Evolution IV: Clustering methods and phylogenetic reconstruction
Maria Eugenia D’Amato
BCB 705: Biodiversity
Available at http://planet.uwc.ac.za/nisl
2
Organization of the presentation
- Phylogenetic reconstruction: distance methods, maximum likelihood (ML), maximum parsimony (MP)
- Networks
- Multivariate analysis
3
Characters
Characters should be independent and homologous.
Types: continuous or discrete; discrete characters are binary or multistate.
4
DNA sequence characters
Alignment = hypothesizing a homology relationship for each site.
Sequence comparison: BLAST search against GenBank.
- Coding sequence: blastn or blastx
- Non-coding DNA: blastn
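Alignment as a homology hypothesis can be illustrated with a textbook global aligner. The sketch below is a minimal Needleman-Wunsch implementation with a linear gap penalty; the scores (match +1, mismatch -1, gap -1) are illustrative assumptions, not the settings of any particular program, and practical aligners use affine gap penalties and substitution matrices.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment of s and t; returns (score, aligned_s, aligned_t).

    Scoring parameters are illustrative assumptions (linear gap penalty).
    """
    def pair(a, b):
        return match if a == b else mismatch

    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(s[i - 1], t[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                            # gap in t
                score[i][j - 1] + gap,                            # gap in s
            )
    # Trace back from the last cell to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))

score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score, top, bottom)  # 2 ACGT A-GT
```

Each aligned column (A/A, C/-, G/G, T/T) is one site-by-site homology hypothesis; the gap says the C has no homologous position in the second sequence.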
5
BLAST search results
Sequences producing significant alignments (GenBank accession numbers for each sequence, and the species that match the query):
Sequence                                                             Score (bits)   E value
gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5...     101            3e-18
gi|343991|dbj|D10368.1|FRGMTURF2 Rana catesbeiana mitochondri...     97.6           5e-17
gi|14209845|gb|AF314017.1|AF314017 Rana sylvatica NADH dehydr...     93.7           8e-16
The lower the E-value, the less likely the match is due to chance, and the more significant the alignment.
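The E-value column is tied to the bit score: for a bit score S', the expected number of chance hits is approximately E = m · n · 2^(-S'), where m and n are the effective query and database lengths. The sketch below uses that relation with made-up lengths (m and n are hypothetical, not the search space of the search above) and ranks the three hits listed above by their reported E-values.

```python
# Hits copied from the table above: (description, bit score, reported E-value)
HITS = [
    ("Rana sylvatica NADH dehydr...", 93.7, 8e-16),
    ("Mantella baroni mitochondrial ND5...", 101.0, 3e-18),
    ("Rana catesbeiana mitochondri...", 97.6, 5e-17),
]

def expect_from_bits(bit_score, query_len, db_len):
    """E = m * n * 2**(-S'): expected chance hits scoring at least S' bits."""
    return query_len * db_len * 2.0 ** (-bit_score)

# Lower E-value = less likely to be a chance match; sort best first.
ranked = sorted(HITS, key=lambda hit: hit[2])
for name, bits, evalue in ranked:
    print(f"{evalue:.0e}  {bits:6.1f}  {name}")

# Hypothetical search space: each extra bit halves the expected number of chance hits.
m, n = 600, 10**9
print(expect_from_bits(101.0, m, n) < expect_from_bits(93.7, m, n))  # True
```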
6
BLAST search results
>gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase subunit 2, complete cds
Length=10814
Score = 101 bits (51), Expect = 3e-18
Identities = 99/115 (86%), Gaps = 0/115 (0%)
Strand=Plus/Minus

Query  451    TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA  510
              |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
Sbjct  10203  TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA  10144

Query  511    AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG  565
              | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
Sbjct  10143  AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG  10089

The header line describes the genes contained in the sequence with this accession number. Strand=Plus/Minus gives the strands aligned: the plus strand of the query matches the minus strand of the subject, so subject coordinates decrease. The blocks below show the alignment, starting from the 5' end of the query.
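The Identities line can be checked directly from the two aligned blocks shown above. This sketch concatenates the query and subject segments of the Mantella baroni hit (copied from the alignment), counts identical positions and rebuilds the match line. It assumes an ungapped alignment, which holds here (Gaps = 0/115); gapped alignments would need extra handling.

```python
# Query and subject segments copied from the two blocks of the alignment above.
QUERY = ("TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA"
         "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG")
SBJCT = ("TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA"
         "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG")

def identities(a, b):
    """Count identical positions in an ungapped pairwise alignment."""
    if len(a) != len(b):
        raise ValueError("aligned segments must have the same length")
    return sum(x == y for x, y in zip(a, b)), len(a)

def match_line(a, b):
    """BLAST-style midline: '|' for an identity, ' ' for a mismatch."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))

same, total = identities(QUERY, SBJCT)
print(f"Identities = {same}/{total} ({round(100 * same / total)}%)")
print(match_line(QUERY[:60], SBJCT[:60]))
```

The first line printed is `Identities = 99/115 (86%)`, matching the BLAST report.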
7
Phylogenetic reconstruction: distance methods
Figure: a 5 × 7 data matrix (5 OTUs, characters C1 to C7) is converted, using a distance or similarity/dissimilarity criterion, into a 5 × 5 matrix of pairwise distances, from which a dendrogram is built.
8
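Distance criteria for binary data
Jaccard's coefficient: $J = \frac{a}{a + b + c}$, where a = number of bands shared by individuals A and B, b = number of bands present only in A, and c = number of bands present only in B. J is a similarity; the corresponding Jaccard distance is $1 - J$.
Manhattan distance between points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$: $M = |x_1 - x_2| + |y_1 - y_2|$
Euclidean distance: $E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$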
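The three criteria are straightforward to compute for band (presence/absence) profiles. The profiles below are hypothetical, and the Manhattan and Euclidean functions accept vectors of any length, generalizing the two-coordinate form above.

```python
import math

def jaccard_distance(u, v):
    """1 - a/(a+b+c) for presence/absence (1/0) band profiles."""
    a = sum(1 for x, y in zip(u, v) if x and y)        # bands shared by both
    b = sum(1 for x, y in zip(u, v) if x and not y)    # bands only in u
    c = sum(1 for x, y in zip(u, v) if y and not x)    # bands only in v
    if a + b + c == 0:
        return 0.0  # neither profile has any band; treat as identical
    return 1.0 - a / (a + b + c)

def manhattan(p1, p2):
    return sum(abs(x - y) for x, y in zip(p1, p2))

def euclidean(p1, p2):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p1, p2)))

# Hypothetical band profiles for two individuals (1 = band present).
A = [1, 1, 0, 1, 0]
B = [1, 0, 0, 1, 1]
print(jaccard_distance(A, B), manhattan(A, B), euclidean(A, B))
```

Note that the shared absence at the third position does not enter Jaccard's coefficient, whereas it does contribute (as zero difference) to the Manhattan and Euclidean distances.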
9
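Distance criterion for DNA data: models of DNA substitution
p-distance: $p = \frac{\text{number of sites with different nucleotides}}{\text{total number of nucleotides compared}}$
Divergence matrix between sequences x and y, where $f_{ij}$ is the proportion of sites with nucleotide i in x and nucleotide j in y:
$$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$$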
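A minimal sketch of the p-distance and the divergence matrix $F_{xy}$ for two aligned sequences, assuming gap-free sequences containing only A, C, G and T (the example sequences are hypothetical). Rows index the nucleotide in x and columns the nucleotide in y, in the order A, C, G, T, so the diagonal holds a, f, k and p.

```python
BASES = "ACGT"

def p_distance(x, y):
    """Proportion of aligned sites at which x and y differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)

def divergence_matrix(x, y):
    """4 x 4 matrix F_xy of site-pattern proportions f_ij (i in x, j in y)."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    counts = [[0] * 4 for _ in range(4)]
    for a, b in zip(x, y):
        counts[BASES.index(a)][BASES.index(b)] += 1
    return [[c / len(x) for c in row] for row in counts]

x, y = "AACGTT", "AGCGTA"
F = divergence_matrix(x, y)
# D = 1 - (a + f + k + p): one minus the diagonal equals the p-distance.
D = 1 - sum(F[i][i] for i in range(4))
print(p_distance(x, y), D)
```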
10
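Models of DNA substitution
Jukes and Cantor: $D = 1 - (a + f + k + p)$, $d_{xy} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}D\right)$
F81 (equal rates, unequal base frequencies): $B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$, $d_{xy} = -B\ln\left(1 - \frac{D}{B}\right)$, where $\pi_A, \pi_C, \pi_G, \pi_T$ are the base frequencies
K2P: $P = c + h + i + n$ (transitions), $Q = b + d + e + g + j + l + m + o$ (transversions), $d_{xy} = \frac{1}{2}\ln\frac{1}{1 - 2P - Q} + \frac{1}{4}\ln\frac{1}{1 - 2Q}$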
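The three corrections can be written as functions of the proportions defined above. This is a minimal sketch with illustrative function names and hypothetical base frequencies; each formula is undefined once the sequences are too divergent (the logarithm's argument reaches zero or becomes negative), which Python reports as a `ValueError`.

```python
import math

PURINES, PYRIMIDINES = set("AG"), set("CT")

def jukes_cantor(D):
    """JC69: d = -3/4 ln(1 - 4D/3), D = proportion of differing sites."""
    return -0.75 * math.log(1 - 4.0 * D / 3.0)

def f81(D, freqs):
    """F81: d = -B ln(1 - D/B), B = 1 - sum of squared base frequencies."""
    B = 1 - sum(f * f for f in freqs)
    return -B * math.log(1 - D / B)

def kimura_2p(P, Q):
    """K2P: P = transition proportion, Q = transversion proportion."""
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

def transition_transversion(x, y):
    """Return (P, Q) for two aligned, gap-free sequences."""
    ts = tv = 0
    for a, b in zip(x, y):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts / len(x), tv / len(x)

P, Q = transition_transversion("AACGTT", "AGCGTA")
print(jukes_cantor(P + Q), f81(P + Q, [0.3, 0.2, 0.2, 0.3]), kimura_2p(P, Q))
```

As a consistency check, F81 with equal base frequencies (B = 3/4) and K2P with transitions making up one third of the differences both reduce to the Jukes-Cantor distance.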
11
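Distance criteria for diploid (allele-frequency) data
Nei (1972): $D_N = -\ln I$, with $I = \frac{J_{XY}}{\sqrt{J_X J_Y}}$, $J_X = \sum_i x_i^2$, $J_Y = \sum_i y_i^2$, $J_{XY} = \sum_i x_i y_i$, where $x_i$ and $y_i$ are the frequencies of allele i in populations X and Y (J values averaged over loci).
Cavalli-Sforza (1967): arc distance $D_{arc}$, averaged over the L loci (factor 1/L) and scaled by 2/π, based on the angle $\theta = \cos^{-1}\sum_i \sqrt{x_i y_i}$ between the two populations' allele-frequency vectors.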
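A minimal sketch of Nei's (1972) distance over several loci and of the Cavalli-Sforza angle θ for a single locus. The allele frequencies are hypothetical, and $J_X$, $J_Y$ and $J_{XY}$ are averaged over loci before taking the ratio. The arc distance itself is not computed here.

```python
import math

def nei_1972(x_loci, y_loci):
    """Nei's standard distance D = -ln I, I = Jxy / sqrt(Jx * Jy).

    x_loci, y_loci: one list of allele frequencies per locus, alleles in
    the same order for both populations. J values are averaged over loci.
    """
    L = len(x_loci)
    jx = sum(sum(p * p for p in locus) for locus in x_loci) / L
    jy = sum(sum(q * q for q in locus) for locus in y_loci) / L
    jxy = sum(sum(p * q for p, q in zip(lx, ly))
              for lx, ly in zip(x_loci, y_loci)) / L
    return -math.log(jxy / math.sqrt(jx * jy))

def cavalli_sforza_angle(x, y):
    """theta = arccos(sum_i sqrt(x_i * y_i)) for one locus."""
    s = sum(math.sqrt(p * q) for p, q in zip(x, y))
    return math.acos(min(1.0, s))  # guard against rounding just above 1

# Hypothetical frequencies for two populations at two loci.
pop_x = [[0.5, 0.5], [0.2, 0.3, 0.5]]
pop_y = [[1.0, 0.0], [0.2, 0.3, 0.5]]
print(nei_1972(pop_x, pop_y), cavalli_sforza_angle(pop_x[0], pop_y[0]))
```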
12
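Phylogenetic reconstruction criteria for distance data
Additive tree (NJ): four taxa A, B, C, D; A and B are joined by branches v1 and v2, C and D by v4 and v5, and v3 is the internal branch. Properties:
$d_{AB} = v_1 + v_2$
$d_{AC} = v_1 + v_3 + v_4$
$d_{AD} = v_1 + v_3 + v_5$
$d_{BC} = v_2 + v_3 + v_4$
$d_{BD} = v_2 + v_3 + v_5$
$d_{CD} = v_4 + v_5$
Ultrametric tree (UPGMA): three taxa A, B, C; A is joined to the root by v1, the ancestor of B and C by v2, and B and C by v3 and v4. Properties:
$d_{AB} = v_1 + v_2 + v_3$
$d_{AC} = v_1 + v_2 + v_4$
$d_{BC} = v_3 + v_4$
with $v_3 = v_4$ and $v_1 = v_2 + v_3 = v_2 + v_4$ (all tips are equidistant from the root).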
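UPGMA builds the ultrametric tree by repeatedly joining the closest pair of clusters and averaging distances, weighted by cluster size. The sketch below is a minimal version that returns a Newick string; the example matrices are hypothetical and chosen to be ultrametric, so the output trees reproduce them exactly. Neighbor-joining, which builds the additive tree, is not shown.

```python
def upgma(names, dist):
    """UPGMA clustering of a symmetric distance matrix; returns Newick."""
    n = len(names)
    # cluster id -> (newick subtree, node height, number of leaves)
    clusters = {i: (names[i], 0.0, 1) for i in range(n)}
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}

    def pair_key(a, b):
        return (a, b) if a < b else (b, a)

    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)          # closest pair of clusters
        tree_i, h_i, size_i = clusters.pop(i)
        tree_j, h_j, size_j = clusters.pop(j)
        h = d[(i, j)] / 2                 # node height: half the distance
        node = f"({tree_i}:{h - h_i:g},{tree_j}:{h - h_j:g})"
        # Size-weighted average distance from the new cluster to each other one.
        new = {k: (size_i * d[pair_key(i, k)] + size_j * d[pair_key(j, k)])
                  / (size_i + size_j) for k in clusters}
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        for k, v in new.items():
            d[(k, next_id)] = v           # k < next_id, so the key stays ordered
        clusters[next_id] = (node, h, size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"

# Hypothetical ultrametric distances among four taxa.
D = [[0, 2, 6, 6], [2, 0, 6, 6], [6, 6, 0, 4], [6, 6, 4, 0]]
print(upgma(["A", "B", "C", "D"], D))  # ((A:1,B:1):2,(C:2,D:2):1);
```

For the three-taxon case above, `upgma(["A", "B", "C"], [[0, 6, 6], [6, 0, 2], [6, 2, 0]])` gives `(A:3,(B:1,C:1):2);`, that is v1 = 3, v2 = 2 and v3 = v4 = 1, satisfying v1 = v2 + v3.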
13
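Maximum likelihood
(1) C….GGACACGTTTA….C
(2) C….AGACACCTCTA….C
(3) C….GGATAAGTTAA….C
(4) C….GGATAGCCTAG….C
Sites 1 … j … n.
Figure: unrooted tree of sequences 1 to 4, and the same tree after rooting at an internal node; the internal nodes 5 and 6 have unknown states (A, C, G or T).
The likelihood of site j sums over all possible states at the internal nodes:
$L_j = \sum_{x_5}\sum_{x_6} \Pr(\text{observed states at site } j,\ x_5,\ x_6)$
The likelihood of the tree is the product over sites, and the log-likelihood is the sum:
$L = L_1 \times L_2 \times \cdots \times L_n = \prod_{j=1}^{n} L_j$, $\ln L = \ln L_1 + \ln L_2 + \cdots + \ln L_n = \sum_{j=1}^{n} \ln L_j$
$L = \Pr(D \mid H)$: the likelihood of hypothesis H (tree and substitution model) is the probability of the data D given H.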
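The sum over unobserved internal-node states is usually computed with Felsenstein's pruning algorithm rather than by listing every combination. This sketch assumes the Jukes-Cantor (JC69) model and a tree rooted at an internal node with three descendants, as in the figure; the branch lengths are hypothetical, and only the visible 11-site fragments of sequences 1 to 4 are used (the elided parts are ignored).

```python
import itertools
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """P(i -> j) along a branch of length t (expected substitutions/site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional(node, column):
    """Felsenstein pruning: likelihood of the data below node, per state."""
    if isinstance(node, str):             # leaf: observed nucleotide
        return [1.0 if b == column[node] else 0.0 for b in BASES]
    out = [1.0] * 4
    for child, t in node:                 # internal node: tuple of (child, branch length)
        below = conditional(child, column)
        for xi, x in enumerate(BASES):
            out[xi] *= sum(jc69_prob(x, y, t) * below[yi]
                           for yi, y in enumerate(BASES))
    return out

def site_likelihood(tree, column):
    # JC69 assumes equal base frequencies (1/4) at the root.
    return sum(0.25 * v for v in conditional(tree, column))

def log_likelihood(tree, seqs):
    n = len(next(iter(seqs.values())))
    return sum(math.log(site_likelihood(tree, {k: s[j] for k, s in seqs.items()}))
               for j in range(n))

# Visible fragments of sequences (1)-(4) above.
SEQS = {"1": "GGACACGTTTA", "2": "AGACACCTCTA",
        "3": "GGATAAGTTAA", "4": "GGATAGCCTAG"}
# Hypothetical branch lengths; root = internal node 5 joining 1, 2 and node 6.
TREE = (("1", 0.05), ("2", 0.10), ((("3", 0.08), ("4", 0.12)), 0.07))

print(log_likelihood(TREE, SEQS))
# Sanity check: site likelihoods over all 4^4 possible columns sum to 1.
print(sum(site_likelihood(TREE, dict(zip("1234", pattern)))
          for pattern in itertools.product(BASES, repeat=4)))
```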
14
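Hypothesis testing
Likelihood ratio test: $\Delta = \ln L_1 - \ln L_0$; the statistic $2\Delta$ is compared with a $\chi^2$ distribution.
Applications: testing rate variation among lineages (molecular clock; d.f. = number of sequences in the tree − 2) and choosing an appropriate substitution model (d.f. = difference in the number of free parameters between H1 and H0).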
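A minimal sketch of the likelihood ratio test, assuming H0 is nested within H1. The χ² tail probability uses closed-form series that hold for integer degrees of freedom, so only the standard library is needed; the log-likelihood values in the example are hypothetical.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * term
        term *= y / (i + 0.5)
    return total

def likelihood_ratio_test(lnL0, lnL1, df):
    """Return (2 * (lnL1 - lnL0), p-value) for nested models H0 within H1."""
    stat = 2.0 * (lnL1 - lnL0)
    return stat, chi2_sf(stat, df)

# Hypothetical values, e.g. a clock (H0) vs no-clock (H1) test with 6 sequences.
stat, p = likelihood_ratio_test(lnL0=-2503.4, lnL1=-2496.1, df=6 - 2)
print(stat, p)
```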
15
Bootstrapping
How well supported are the groups?
Example figure: trumpet fish.
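Nonparametric bootstrapping resamples alignment columns with replacement to build pseudoreplicate data sets of the same length; a tree is then built from each replicate, and the support for a group is the percentage of replicate trees that contain it. The sketch below shows only the resampling step, using the four-sequence alignment from the parsimony example as toy input; the seed and number of replicates are arbitrary choices.

```python
import random

def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Resample alignment columns with replacement (nonparametric bootstrap)."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    replicates = []
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]  # sampled site indices
        replicates.append({name: "".join(alignment[name][c] for c in cols)
                           for name in names})
    return replicates

ALIGNMENT = {"1": "ATATT", "2": "ATCGT", "3": "GCAGT", "4": "GCCGT"}
reps = bootstrap_replicates(ALIGNMENT, n_replicates=100, seed=1)
print(reps[0])
```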
16
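Maximum parsimony
Criterion: minimize tree length (the total number of character-state changes).
To obtain rooted trees (and character polarity), use an outgroup; the ingroup is assumed to be monophyletic.
1 ATATT
2 ATCGT
3 GCAGT
4 GCCGT
Figure: the first site mapped onto a tree of sequences 1 to 4, comparing reconstructions that require 1 change and 5 changes.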
17
Maximum parsimony: example
Figure: states at sites 2, 3 and 4 mapped onto the tree; site 5 is T in every sequence and requires no changes.
Tree length: $L = \sum_{i=1}^{k} l_i$, where $l_i$ is the number of changes at site i and k is the number of sites.
18
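Maximum parsimony: example
Number of changes per site on each of the three possible unrooted trees:
Tree            Site 1  Site 2  Site 3  Site 4  Site 5  Total
((1,2),(3,4))   1       1       2       1       0       5
((1,3),(2,4))   2       2       1       1       0       6
((1,4),(2,3))   2       2       2       1       0       7
Phylogenetically informative sites: sites 1, 2 and 3 (site 4 requires one change on every tree and site 5 is invariant). The most parsimonious tree is ((1,2),(3,4)), with length 5.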
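The per-site counts in the table can be reproduced with Fitch's algorithm. In this sketch each unrooted four-taxon tree is rooted arbitrarily on its internal branch, which does not change the Fitch count; a helper also flags parsimony-informative sites (at least two states, each present in at least two sequences).

```python
from collections import Counter

def fitch(node, states):
    """Fitch pass for one site: returns (candidate state set, number of changes)."""
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1       # empty intersection: one extra change

def tree_length(tree, seqs):
    """Per-site changes l_i and total length L = sum of l_i."""
    n = len(next(iter(seqs.values())))
    per_site = [fitch(tree, {k: s[j] for k, s in seqs.items()})[1] for j in range(n)]
    return per_site, sum(per_site)

def informative_sites(seqs):
    """1-based positions where >= 2 states each occur in >= 2 sequences."""
    n = len(next(iter(seqs.values())))
    out = []
    for j in range(n):
        counts = Counter(s[j] for s in seqs.values())
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            out.append(j + 1)
    return out

SEQS = {"1": "ATATT", "2": "ATCGT", "3": "GCAGT", "4": "GCCGT"}
for tree in [(("1", "2"), ("3", "4")), (("1", "3"), ("2", "4")), (("1", "4"), ("2", "3"))]:
    print(tree, tree_length(tree, SEQS))
print("informative:", informative_sites(SEQS))
```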
19
Networks
Phylogenetic representations that allow reticulation.
More appropriate for intraspecific data: ancestors may still be alive (sampled alongside their descendants), and reticulation arises through hybridization, recombination, horizontal transfer and polyploidization.
Figure: network of haplotypes agct, acat and acct.
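The simplest network connects haplotypes that differ by a single mutation. The sketch below does only that, for the three haplotypes in the figure; methods used in practice, such as statistical parsimony (TCS) or median-joining, also infer unsampled intermediate haplotypes and alternative connections, which this sketch does not attempt.

```python
from itertools import combinations

def hamming(a, b):
    """Number of differing positions between two equal-length haplotypes."""
    return sum(x != y for x, y in zip(a, b))

def one_step_edges(haplotypes):
    """Edges between haplotypes separated by exactly one mutation."""
    return sorted(tuple(sorted(pair)) for pair in combinations(haplotypes, 2)
                  if hamming(*pair) == 1)

HAPLOTYPES = ["agct", "acat", "acct"]
print(one_step_edges(HAPLOTYPES))  # [('acat', 'acct'), ('acct', 'agct')]
```

Here acct is the interior node linking the other two haplotypes: a sampled haplotype occupying the position of an ancestor, which a bifurcating tree would place only at an internal node.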
20
Multivariate clustering
Data matrix: 5 OTUs × 7 characters (C1 to C7).
Similarity criterion: correlations among characters, giving a 7 × 7 matrix.
Calculate the eigenvectors with the highest eigenvalues.
Project the data onto the new axes (eigenvectors): X = 1st axis, Y = 2nd axis, Z = 3rd axis.
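A minimal principal component analysis sketch in plain Python: standardize the characters, build the correlation matrix among characters, extract the leading eigenvectors by power iteration with deflation, and project the OTUs onto them. The 5 × 7 data matrix is hypothetical, and power iteration is used only to keep the sketch dependency-free; a numerical library's symmetric eigensolver would normally be preferred.

```python
import math

def standardize(data):
    """Center each column and scale it to unit variance (n - 1 denominator).
    Assumes no column is constant."""
    n, p = len(data), len(data[0])
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in data]

def correlation_matrix(z):
    """p x p correlations among the columns of standardized data z."""
    n, p = len(z), len(z[0])
    return [[sum(z[r][i] * z[r][j] for r in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]

def top_eigenpairs(matrix, k, iterations=1000):
    """Leading k eigenpairs of a symmetric positive semidefinite matrix,
    via power iteration with deflation."""
    a = [row[:] for row in matrix]
    p = len(a)
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(p)]   # arbitrary non-symmetric start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next search.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs

def project(z, vectors):
    """Scores of each row of z on the given axes."""
    return [[sum(x * vj for x, vj in zip(row, v)) for v in vectors] for row in z]

# Hypothetical 5 OTUs x 7 characters (C1..C7).
DATA = [[1, 0, 2, 3, 1, 0, 1],
        [2, 1, 2, 4, 0, 1, 1],
        [3, 1, 5, 2, 1, 1, 0],
        [4, 2, 4, 1, 0, 2, 0],
        [5, 3, 6, 0, 1, 2, 1]]
std = standardize(DATA)
R = correlation_matrix(std)                   # 7 x 7
pairs = top_eigenpairs(R, 3)                  # X, Y, Z axes
scores = project(std, [v for _, v in pairs])  # 5 x 3 coordinates
for lam, _ in pairs:
    print(round(lam, 3))
```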