1 Chapter 5 Character-Based Methods of Phylogenetics
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿 (HUANG, Guan-Shieng) 2004/04/05
2 5.1 Parsimony
Mutations are exceedingly rare events.
The more unlikely events a model invokes, the less likely the model is to be correct.
The explanation that requires the fewest mutations to account for a state is the most likely to be correct.
3 Ockham's Razor
The philosophical rule that entities should not be multiplied unnecessarily.
6 5.1.1 Informative and Uninformative Sites
8 Informative sites carry information that helps construct a tree.
Uninformative sites carry no such information under the parsimony principle.
9 uninformative
10 uninformative
11 informative
12 informative
13 For a position to be informative, it must contain at least two different nucleotides, and each of these nucleotides must be present at least twice.
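The rule on this slide can be checked mechanically. The following is a minimal sketch, assuming the alignment is given as equal-length strings with no gap handling; the function names and the toy alignment are illustrative and not taken from the slides.

```python
from collections import Counter

def is_informative(column):
    """Return True if an alignment column is parsimony-informative."""
    counts = Counter(column)
    # Keep only the nucleotides that occur at least twice in this column.
    repeated = [state for state, n in counts.items() if n >= 2]
    # Informative: at least two different nucleotides, each present twice or more.
    return len(repeated) >= 2

def informative_sites(alignment):
    """Return the 0-based indices of the informative columns."""
    columns = zip(*alignment)
    return [i for i, col in enumerate(columns) if is_informative(col)]

# Hypothetical toy alignment: rows are sequences, columns are sites.
toy_alignment = ["GGAC", "GCAC", "GTGT", "GAGT"]
print(informative_sites(toy_alignment))  # prints [2, 3]
```

In the toy alignment, column 0 is invariant and column 1 has four different nucleotides, so neither favors one tree over another; only columns 2 and 3 are informative.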
14 Informative sites
synapomorphy: a shared derived character that supports the internal branches (true)
homoplasy: a similarity acquired as a result of parallel evolution or convergence (false)
Example: eyes in humans, flies, and mollusks
15 5.1.2 Unweighted Parsimony
Every possible tree is considered individually for each informative site.
The tree with the minimum overall cost is reported.
17 There are several problems:
The number of alternative unrooted trees increases dramatically with the number of sequences.
Calculating the number of substitutions invoked by each alternative tree is difficult.
18 The second problem can be solved by assigning a set of nucleotides to each internal node, working from the leaves toward the root:
intersection: if the intersection of the sets of its two children is not empty
union: if the intersection is empty
The number of unions is the minimum number of substitutions.
For an uninformative site, it is the number of different nucleotides minus one.
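The intersection/union rule above is commonly known as Fitch's algorithm. Below is a minimal sketch of it, assuming a tree is written as nested pairs of leaf names with an arbitrary root (the count does not depend on where the root is placed). The tree encoding, the function names, and the toy data are assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Return (nucleotide set, substitution count) for one site.

    tree   -- a leaf name (str) or a pair (left, right) of subtrees
    states -- dict mapping each leaf name to its nucleotide at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Intersection is not empty: no extra substitution is needed here.
        return common, left_cost + right_cost
    # Intersection is empty: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the per-site substitution counts over all columns."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for u in range(n_sites):
        site = {name: seq[u] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total

# Hypothetical toy data: four sequences and two candidate trees.
toy = {"A": "GGAC", "B": "GCAC", "C": "GTGT", "D": "GAGT"}
print(parsimony_score((("A", "B"), ("C", "D")), toy))  # prints 5
print(parsimony_score((("A", "C"), ("B", "D")), toy))  # prints 7
```

Column 1 of the toy data (G, C, T, A) is uninformative and costs 3 substitutions on either tree, which matches "number of different nucleotides minus one".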
19 /* the u-th position in the k-th sequence */
20 5.1.4 Weighted Parsimony
Not all mutations are equivalent.
Some sequences (e.g., non-coding sequences) are more prone to indels than others.
Functional importance differs from gene to gene.
Subtle substitution biases usually vary between genes and between species.
Weights (scoring matrices) can be added to reflect these differences.
25 Calculating the optimal costs
26 Finding the internal nodes
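The slides for these two steps survive only as titles. As a rough sketch of what they plausibly cover, the code below assumes the usual dynamic programming formulation of weighted parsimony (often called Sankoff's algorithm): each node stores, for every nucleotide, the minimum cost of its subtree given that nucleotide, and the best state at the root is then read off the table. The cost function (transitions 1, transversions 2) and all names are hypothetical choices for illustration.

```python
INF = float("inf")
NUCLEOTIDES = "ACGT"

def example_cost(a, b):
    """Hypothetical weights: transitions cost 1, transversions cost 2."""
    if a == b:
        return 0
    purines = {"A", "G"}
    same_class = (a in purines) == (b in purines)
    return 1 if same_class else 2

def sankoff(tree, states, cost=example_cost):
    """Return {nucleotide: minimum subtree cost if this node has it}."""
    if isinstance(tree, str):
        # A leaf can only take its observed nucleotide.
        return {s: (0 if s == states[tree] else INF) for s in NUCLEOTIDES}
    child_tables = [sankoff(child, states, cost) for child in tree]
    table = {}
    for s in NUCLEOTIDES:
        # For each child, pick the child state that is cheapest to reach from s.
        table[s] = sum(
            min(cost(s, t) + child[t] for t in NUCLEOTIDES)
            for child in child_tables
        )
    return table

def weighted_score(tree, states, cost=example_cost):
    """Return (optimal cost, one optimal nucleotide for the root)."""
    table = sankoff(tree, states, cost)
    best_state = min(table, key=table.get)
    return table[best_state], best_state

# Hypothetical single-site example.
site = {"A": "A", "B": "G", "C": "C", "D": "C"}
print(weighted_score((("A", "B"), ("C", "D")), site)[0])  # prints 3
```

Here one transition (A to G) plus one transversion gives the optimal cost of 3. Repeating the same "pick the cheapest child state" choice from the root downward would assign nucleotides to the remaining internal nodes.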
27 5.2 Inferred Ancestral Sequences
Ancestral sequences can be derived while constructing the tree.
No missing link!
How are the samples chosen? The sampling may be biased.
28 5.3 Strategies for Faster Searches
The number of different phylogenetic trees grows enormously with the number of sequences.
10 sequences: about 2 million trees for an exhaustive search
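The figure of about 2 million for 10 sequences follows from the standard count of unrooted bifurcating trees, (2n - 5)!! = 3 x 5 x ... x (2n - 5). A short sketch, with a function name chosen here for illustration:

```python
def count_unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 labelled sequences."""
    if n < 3:
        raise ValueError("need at least 3 sequences")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

for n in (4, 5, 10):
    print(n, count_unrooted_trees(n))
# prints:
# 4 3
# 5 15
# 10 2027025
```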
29 5.3.1 Branch and Bound
Proposed by Hendy & Penny in 1982.
L: an upper bound (for a minimization problem), obtained from a random search or from a heuristic (e.g., UPGMA)
Grow a tree incrementally. (branch)
Prune any branch whose cost is already greater than L. (bound)
31 Properties
complete search
efficient compared with exhaustive search
About 20 sequences are doable.
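The branch and bound idea can be sketched in a few dozen lines. The version below is an illustration under stated assumptions, not the published algorithm in detail: trees are nested pairs of leaf names, the first sequence is fixed as a root leaf, partial trees are scored with the intersection/union counting rule, and a partial tree is abandoned as soon as its cost exceeds the bound L (adding sequences can never lower the cost). All names and the toy data are hypothetical.

```python
def fitch_site(tree, states):
    """Intersection/union rule for one site: (nucleotide set, cost)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Total substitutions over all sites, using only the leaves in tree."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[u] for name, seq in alignment.items()})[1]
        for u in range(n_sites)
    )

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one edge of tree."""
    yield (tree, leaf)  # attach on the edge above this node
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)

def branch_and_bound(alignment, upper_bound=float("inf")):
    """Return (best score, best tree); needs at least 3 sequences."""
    names = sorted(alignment)
    best_score, best_tree = upper_bound, None

    def search(subtree, next_index):
        nonlocal best_score, best_tree
        cost = parsimony_score((names[0], subtree), alignment)
        if cost > best_score:
            return  # bound: no completion of this partial tree can be better
        if next_index == len(names):
            best_score, best_tree = cost, (names[0], subtree)
            return
        for candidate in insertions(subtree, names[next_index]):
            search(candidate, next_index + 1)  # branch

    search((names[1], names[2]), 3)
    return best_score, best_tree

# Hypothetical toy data.
toy = {"A": "GGAC", "B": "GCAC", "C": "GTGT", "D": "GAGT"}
print(branch_and_bound(toy)[0])  # prints 5
```

Passing a good initial `upper_bound` (for example, the score of a tree from a quick heuristic) lets the search prune earlier; if the bound is lower than every achievable score, the returned tree is `None`.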
32 5.3.2 Heuristic Searches
local search
Alternative trees are not all independent of each other.
branch swapping (Fig. 5.5)
Properties
not complete; may miss the optimal solution
fast and efficient
may stop at a local minimum
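To make the local-search idea concrete, here is a minimal hill-climbing sketch. It uses a deliberately restricted form of branch swapping (a single leaf is pruned and regrafted on every other edge), which is an assumption made to keep the example short and is not necessarily the swap shown in Fig. 5.5. Trees are nested pairs of leaf names with one leaf fixed at the root; all names and data are hypothetical.

```python
def fitch_site(tree, states):
    """Intersection/union rule for one site: (nucleotide set, cost)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Total substitutions over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[u] for name, seq in alignment.items()})[1]
        for u in range(n_sites)
    )

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one edge of tree."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)

def remove_leaf(tree, leaf):
    """Return tree with leaf pruned and its parent node collapsed."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    if left == leaf:
        return right
    if right == leaf:
        return left
    return (remove_leaf(left, leaf), remove_leaf(right, leaf))

def hill_climb(alignment, start_tree):
    """Accept improving leaf moves until none is left (a local minimum)."""
    root_leaf, current = start_tree  # start_tree = (leaf name, rest of tree)
    best = parsimony_score(start_tree, alignment)
    improved = True
    while improved:
        improved = False
        for leaf in sorted(alignment):
            if leaf == root_leaf:
                continue
            pruned = remove_leaf(current, leaf)
            for candidate in insertions(pruned, leaf):
                cost = parsimony_score((root_leaf, candidate), alignment)
                if cost < best:
                    best, current, improved = cost, candidate, True
                    break
            if improved:
                break  # restart the scan from the improved tree
    return best, (root_leaf, current)

# Hypothetical toy data and a deliberately poor starting tree (score 7).
toy = {"A": "GGAC", "B": "GCAC", "C": "GTGT", "D": "GAGT"}
print(hill_climb(toy, ("A", (("B", "C"), "D")))[0])  # prints 5
```

On larger data sets a search like this can stop at a tree that no single move improves even though a better tree exists, which is the "local minimum" caveat above.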
34 5.4 Consensus Trees
Problem: parsimony approaches may yield more than one tree.
consensus tree: an agreement or a summary of these trees
where the trees agree: bifurcation
where the trees do not agree: multifurcation
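One simple way to compute such a summary is a strict consensus: keep only the groupings that appear in every tree. The sketch below assumes rooted trees written as nested tuples of leaf names and returns the shared groupings rather than drawing the tree; the slides do not say which consensus rule is intended, so this choice and all names are assumptions.

```python
def clades(tree):
    """Return (leaf set, set of groupings) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the grouping defined by this internal node
    return leaves, found

def strict_consensus_clades(trees):
    """Return the groupings present in every tree."""
    shared = None
    for tree in trees:
        _, found = clades(tree)
        shared = found if shared is None else shared & found
    return shared

# Hypothetical example: two equally parsimonious trees.
tree_1 = ((("A", "B"), "C"), ("D", "E"))
tree_2 = ((("A", "C"), "B"), ("D", "E"))
for group in sorted(strict_consensus_clades([tree_1, tree_2]), key=len):
    print(sorted(group))
# prints:
# ['D', 'E']
# ['A', 'B', 'C']
# ['A', 'B', 'C', 'D', 'E']
```

The two trees disagree on how A, B, and C are related, so the consensus keeps only the group {A, B, C} and shows it as a multifurcation, while the agreed pair {D, E} stays a bifurcation.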
36 5.5 Tree Confidence
How much confidence can be attached to the overall tree and its component parts?
How much more likely is one tree to be correct than a particular or randomly chosen alternative tree?
37 5.5.1 Bootstrap Tests
1. Randomly choose columns, with replacement, to form a new alignment of the same size.
2. Reconstruct the tree for the new sample.
3. Repeat steps 1 and 2 many times.
4. Summarize the sampled trees with respect to the tested tree (how often each of its groupings recurs).
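The four steps can be sketched as follows. The resampling function is the core of the method; `build_tree` stands for whatever tree-building method is being tested and is assumed here to return a set of groupings (frozensets of sequence names). The `closest_pair_clade` builder is only a toy stand-in so that the example runs, and every name in the block is hypothetical.

```python
import random
from itertools import combinations

def bootstrap_alignment(alignment, rng):
    """Step 1: resample columns with replacement, keeping the same width."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}

def bootstrap_support(alignment, build_tree, clade, replicates=1000, seed=0):
    """Steps 2 to 4: fraction of replicates whose tree contains clade."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        sample = bootstrap_alignment(alignment, rng)
        if clade in build_tree(sample):
            hits += 1
    return hits / replicates

def closest_pair_clade(alignment):
    """Toy stand-in for a tree builder: group the two most similar sequences."""
    def distance(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    best = min(combinations(sorted(alignment), 2), key=distance)
    return {frozenset(best)}

# Hypothetical toy data; a real test would use thousands of replicates.
toy = {"A": "GGAC", "B": "GCAC", "C": "GTGT", "D": "GAGT"}
support = bootstrap_support(toy, closest_pair_clade, frozenset({"A", "B"}),
                            replicates=200)
print(f"support for (A, B): {support:.2f}")
```

Fixing the seed makes a run reproducible; the support value itself depends on the data, the builder, and the number of replicates.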
42 Caution
Tests based on fewer than several hundred iterations are not reliable.
Bootstrap tests tend to underestimate the confidence level at high values and overestimate it at low values.
Some results may appear statistically significant by chance, simply because so many groupings are being considered.
43 Strategy
doing thousands of iterations
using a correction method to adjust for estimation biases
collapsing branches to multifurcations
What happens if a tree-building algorithm always produces the same tree?
44 5.5.2 Parametric Tests (???)
What is the limit of the parsimony principle, especially for distant sequences?
the most parsimonious tree vs. a particular alternative (this can be used to estimate the significance of the built tree)
45 H. Kishino & M. Hasegawa (1989)
Assume that informative sites within an alignment are both independent and equivalent.
D: the difference in the minimum number of substitutions invoked by two trees
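The slide defines D but not how it is turned into a test. One common paired-sites formulation, assumed here rather than taken from the slide, sums the per-site differences D_i, estimates the variance of D from their spread across sites, and compares D with its standard error. The function name and the per-site costs below are hypothetical.

```python
import math

def paired_sites_statistic(site_costs_1, site_costs_2):
    """Return (D, estimated variance of D, D / standard error).

    site_costs_1, site_costs_2 -- minimum substitutions per informative site
    for tree 1 and tree 2 (same sites, same order, at least two sites).
    """
    n = len(site_costs_1)
    diffs = [a - b for a, b in zip(site_costs_1, site_costs_2)]
    d_total = sum(diffs)
    mean = d_total / n
    # Assumed estimator: Var(D) = n / (n - 1) * sum((D_i - mean)^2).
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    ratio = d_total / math.sqrt(variance) if variance > 0 else float("nan")
    return d_total, variance, ratio

# Hypothetical per-site costs for two competing trees.
d_total, variance, ratio = paired_sites_statistic([1, 1, 2, 1], [2, 2, 2, 1])
print(d_total, round(variance, 3), round(ratio, 3))  # prints -2 1.333 -1.732
```

A ratio far from zero suggests that the difference between the two trees is larger than site-to-site variation alone would explain; with only four sites, as in this toy input, no such conclusion would be warranted.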
46 5.6 Comparison of Phylogenetic Methods
If two different methods construct the same tree, that tree is very likely to be correct.
47 5.7 Molecular Phylogenies
Implications
medicine: drug treatment
agriculture: disease resistance factors
conservation: identification of extinct species
48 5.7.1 The Tree of Life
Carl Woese and his colleagues (1970s)
16S rRNA (which all organisms possess)
49 5.7.2 Human Origins
mtDNA
The mean difference between two human populations is about 0.33%.
The greatest differences are found in Africa, not across the different continents!
out-of-Africa theory
mtDNA and Y chromosome data are consistent with this hypothesis.
50 They concluded: mitochondrial Eve and Y-chromosome Adam, about 200,000 years ago
52 References and image sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.
2. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
3. Sylvia S. Mader, Biology, 8th edition, McGraw-Hill, 2003.