Chapter 5 Character-Based Methods of Phylogenetics
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿 (HUANG, Guan-Shieng)
2004/04/05

5.1 Parsimony
- Mutations are exceedingly rare events.
- The more unlikely events a model invokes, the less likely the model is to be correct.
- The explanation that requires the fewest mutations is the most likely to be correct.

Ockham's Razor: the philosophical rule that entities should not be multiplied unnecessarily.

5.1.1 Informative and Uninformative Sites

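- Informative sites carry information for constructing (choosing among) trees.
- Uninformative sites carry no information in the sense of the parsimony principle: every tree requires the same minimum number of substitutions at such a site.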

[Figures: two examples of uninformative sites, followed by two examples of informative sites]

A position is informative only if:
- it has at least two different nucleotides, and
- at least two of those nucleotides are each present at least twice.
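The rule above can be expressed as a short Python check. This is a minimal sketch that assumes the alignment is a list of equal-length strings. Gap characters are simply counted like any other symbol, which is an assumption because the slide does not say how gaps are handled. The helper names `is_informative` and `informative_sites` are hypothetical.

```python
from collections import Counter

def is_informative(column):
    """True if at least two different characters each occur at least twice.

    column: the characters at one alignment position, one per sequence.
    Assumption: a gap symbol would be counted like any other character.
    """
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2

def informative_sites(alignment):
    """Indices of the informative columns of a list of equal-length sequences."""
    return [u for u, col in enumerate(zip(*alignment)) if is_informative(col)]

print(is_informative("AAGGC"))  # True: A and G each appear twice
print(is_informative("AAACG"))  # False: only A appears more than once
print(informative_sites(["GATC", "GACC", "GGTA", "GGTG"]))  # [1]
```

In the last example, column 0 is constant, column 2 has a single C, and column 3 has only one repeated nucleotide. That leaves column 1 (A, A, G, G) as the only informative site.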

Informative sites can reflect either:
- synapomorphy: shared derived states that support true internal branches;
- homoplasy: similar states acquired independently through parallel or convergent evolution, which support false groupings. Example: eyes in humans, flies, and mollusks.

5.1.2 Unweighted Parsimony
Every possible tree is considered individually, and the number of substitutions it requires is counted at each informative site. The tree with the minimum overall cost is reported.
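With four sequences there are only three unrooted trees, so "consider every possible tree" can be written out in full. The hypothetical sketch below scores each four-taxon topology by intersection/union counting (Fitch's method) and reports the cheapest ones. The names `QUARTETS`, `tree_length` and `most_parsimonious` are illustrative only.

```python
def _join(a, b):
    """Intersection if non-empty (no change), otherwise union plus one change."""
    common = a & b
    return (common, 0) if common else (a | b, 1)

# The three possible unrooted trees for sequences 0-3, as splits (i, j) | (k, l).
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def tree_length(alignment, split):
    """Minimum number of substitutions the quartet tree needs, summed over sites."""
    (i, j), (k, l) = split
    total = 0
    for col in zip(*alignment):
        left, c1 = _join({col[i]}, {col[j]})
        right, c2 = _join({col[k]}, {col[l]})
        total += c1 + c2 + _join(left, right)[1]
    return total

def most_parsimonious(alignment):
    """Score every possible tree; return (minimum cost, list of best trees)."""
    scores = {s: tree_length(alignment, s) for s in QUARTETS}
    best = min(scores.values())
    return best, [s for s in QUARTETS if scores[s] == best]

print(most_parsimonious(["ACGT", "ACGA", "GCTT", "GCTA"]))  # (4, [((0, 1), (2, 3))])
```

Here the tree that pairs sequences 0 and 1 needs 4 substitutions, against 5 and 6 for the two alternatives. The constant second column adds nothing to any tree.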

There are several problems:
- The number of alternative unrooted trees increases dramatically with the number of sequences.
- Calculating the number of substitutions invoked by each alternative tree is difficult.

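The second problem can be solved by assigning a set of nucleotides to each internal node, working from the leaves toward the root:
- intersection: if the intersection of its two children's sets is not empty, the node gets the intersection;
- union: if it is empty, the node gets the union.
The number of unions is the minimum number of substitutions at that site. For an uninformative site, this number is simply the number of different nucleotides minus one.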
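The intersection/union rule is Fitch's algorithm. Below is a minimal recursive Python sketch of it. For illustration, a rooted binary tree is written as nested pairs whose leaves are equal-length sequence strings; this representation is an assumption.

```python
def fitch_site(tree, u):
    """Fitch's algorithm at position u of every sequence.

    tree: a leaf sequence (str) or a pair (left_subtree, right_subtree).
    Returns (nucleotide set of the subtree root, number of unions so far).
    """
    if isinstance(tree, str):  # leaf: the observed nucleotide
        return {tree[u]}, 0
    left_set, left_cost = fitch_site(tree[0], u)
    right_set, right_cost = fitch_site(tree[1], u)
    common = left_set & right_set
    if common:  # intersection: no substitution needed at this node
        return common, left_cost + right_cost
    # union: one extra substitution on one of the two child branches
    return left_set | right_set, left_cost + right_cost + 1

def fitch_length(tree):
    """Minimum number of substitutions summed over all positions."""
    leaf = tree
    while not isinstance(leaf, str):
        leaf = leaf[0]
    return sum(fitch_site(tree, u)[1] for u in range(len(leaf)))

tree = (("AA", "CA"), ("AG", "GG"))
print(fitch_length(tree))  # 3
```

Position 0 (A, C, A, G) needs two unions and position 1 (A, A, G, G) needs one. Position 0 is uninformative, so its count, 3 different nucleotides minus 1, would be the same on any tree.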

[Pseudocode slide; the only surviving text is the comment /* the u-th position in the k-th sequence */]

5.1.4 Weighted Parsimony
Not all mutations are equivalent:
- Some sequences (e.g., non-coding sequences) are more prone to insertions and deletions (indels) than others.
- Functional importance differs from gene to gene.
- Subtle substitution biases usually vary between genes and between species.
Weights (scoring matrices) can be added to reflect these differences.
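To make "weights" concrete, the sketch below uses a hypothetical scoring matrix: 1 for a transition and 2 for a transversion. It applies that matrix to a tree whose ancestral nodes are already labeled. The specific weights and the `weight` and `weighted_cost` helpers are illustrative assumptions, not values from the slides.

```python
PURINES = {"A", "G"}  # the pyrimidines are C and T

def weight(x, y):
    """Hypothetical scoring matrix: 0 for a match, 1 for a transition
    (purine<->purine or pyrimidine<->pyrimidine), 2 for a transversion."""
    if x == y:
        return 0
    return 1 if (x in PURINES) == (y in PURINES) else 2

def weighted_cost(edges, labels):
    """Total weighted cost of a tree whose nodes (ancestors included) are labeled.

    edges: list of (parent, child) node names; labels: node name -> sequence.
    """
    return sum(
        weight(a, b)
        for parent, child in edges
        for a, b in zip(labels[parent], labels[child])
    )

edges = [("root", "x"), ("root", "y")]
labels = {"root": "AC", "x": "GC", "y": "AA"}
print(weighted_cost(edges, labels))  # 3
```

Each branch carries one substitution, so unweighted parsimony would count 2. Under these weights, the C to A transversion on the branch to y costs twice as much as the A to G transition on the branch to x.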

[Figures: a worked weighted-parsimony example, with slides titled "Calculating the optimal costs" and "Finding the internal nodes"]
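When the internal nodes are unknown, a standard dynamic-programming approach (Sankoff's algorithm) works in two passes. It first computes, bottom-up, the optimal cost of every subtree for each possible base at its root, then walks back down to choose the internal nodes. The sketch below reuses the hypothetical transition/transversion weights and handles one alignment position at a time. Its tree representation (nested tuples with sequence strings at the leaves) and the function names are assumptions.

```python
from math import inf

BASES = "ACGT"
PURINES = {"A", "G"}

def weight(x, y):
    """Hypothetical weights: 0 match, 1 transition, 2 transversion."""
    if x == y:
        return 0
    return 1 if (x in PURINES) == (y in PURINES) else 2

def costs_up(tree, u):
    """Bottom-up pass: for each base b, the optimal cost of the subtree
    given that its root carries b at position u."""
    if isinstance(tree, str):  # leaf: only its observed base is allowed
        return {"table": {b: 0 if b == tree[u] else inf for b in BASES}, "kids": ()}
    kids = tuple(costs_up(child, u) for child in tree)
    table = {
        b: sum(min(k["table"][x] + weight(b, x) for x in BASES) for k in kids)
        for b in BASES
    }
    return {"table": table, "kids": kids}

def assign_down(node, base, out):
    """Top-down pass: give each child the base that achieved the optimum."""
    out.append(base)
    for k in node["kids"]:
        best = min(BASES, key=lambda x: k["table"][x] + weight(base, x))
        assign_down(k, best, out)

def sankoff_site(tree, u):
    """Return (optimal weighted cost, node labels in pre-order) at position u."""
    root = costs_up(tree, u)
    root_base = min(BASES, key=lambda b: root["table"][b])
    labels = []
    assign_down(root, root_base, labels)
    return root["table"][root_base], labels

print(sankoff_site((("A", "G"), ("A", "A")), 0))
```

For the tree ((A, G), (A, A)) the optimum is 1, a single A to G transition. The pre-order labels list the root first, then its left child and that child's two leaves, then the right child and its leaves.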

5.2 Inferred Ancestral Sequences
- Ancestral sequences can be derived while constructing the tree.
- No "missing link" needs to be found!
- How should the samples be chosen? The choice may introduce bias.

5.3 Strategies for Faster Searches
The number of different phylogenetic trees grows enormously with the number of sequences:
- 10 sequences: about 2 million (2M) trees for an exhaustive search.
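The "2M" figure matches the standard count of unrooted bifurcating trees for n sequences, (2n - 5)!!, i.e. 1 × 3 × 5 × ... × (2n - 5). A short Python check follows; the function names are hypothetical.

```python
def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 labeled sequences: (2n - 5)!!."""
    count = 1
    for i in range(3, n + 1):
        count *= 2 * i - 5  # the i-th sequence can join any of 2i - 5 existing branches
    return count

def rooted_trees(n):
    """Rooted trees: (2n - 3)!!, which equals the unrooted count for n + 1 sequences."""
    return unrooted_trees(n + 1)

for n in (4, 5, 10, 20):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 sequences this gives 2,027,025 unrooted trees (34,459,425 rooted ones), which is why exhaustive search quickly becomes impractical.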

5.3.1 Branch and Bound
Proposed by Hendy & Penny in 1982.
- L: an upper bound on the cost (for a minimization problem), obtained from a random search or by a heuristic (e.g., UPGMA).
- Grow a tree incrementally, one sequence at a time (branch).
- Prune any partial tree whose cost is already greater than L (bound).
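Below is a minimal branch-and-bound sketch in Python. It assumes unweighted (Fitch) parsimony, an unrooted binary tree stored as an edge list, and sequences added in input order. The pruning step relies on the fact that adding a sequence to a tree can never reduce its parsimony length. All names are illustrative. The default bound is infinity rather than the cost of a heuristic tree.

```python
from math import inf

def parsimony_length(edges, seqs):
    """Fitch length of an unrooted binary tree given as an edge list.

    Leaves are 0..len(seqs)-1 (only those already placed appear in `edges`);
    internal nodes have ids >= len(seqs). The Fitch pass roots the tree on
    the edge next to leaf 0.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    def fitch(node, parent, u):
        if node < len(seqs):  # leaf
            return {seqs[node][u]}, 0
        (s1, c1), (s2, c2) = [fitch(x, node, u) for x in adj[node] if x != parent]
        return (s1 & s2, c1 + c2) if s1 & s2 else (s1 | s2, c1 + c2 + 1)

    total = 0
    for u in range(len(seqs[0])):
        s, c = fitch(adj[0][0], 0, u)
        total += c + (0 if seqs[0][u] in s else 1)
    return total

def branch_and_bound(seqs, upper_bound=inf):
    """Exact parsimony search for >= 3 sequences; upper_bound plays the role of L.

    Returns (best cost, list of optimal trees as edge lists).
    """
    n = len(seqs)
    best = {"cost": upper_bound, "trees": []}

    def grow(edges, k, next_id):
        cost = parsimony_length(edges, seqs)
        if cost > best["cost"]:
            return  # bound: adding more sequences can never lower the cost
        if k == n:  # complete tree
            if cost < best["cost"]:
                best["cost"], best["trees"] = cost, []
            best["trees"].append(edges)
            return
        for i, (a, b) in enumerate(edges):  # branch: insert sequence k on every edge
            w = next_id
            grown = edges[:i] + edges[i + 1:] + [(a, w), (w, b), (w, k)]
            grow(grown, k + 1, next_id + 1)

    grow([(0, n), (1, n), (2, n)], 3, n + 1)  # start from the only 3-taxon tree
    return best["cost"], best["trees"]

cost, trees = branch_and_bound(["AA", "AA", "GG", "GG"])
print(cost, len(trees))  # 2 1
```

Pruning with "greater than" rather than "greater than or equal" keeps every tree that ties with the current best. Passing the cost of a heuristic tree as `upper_bound` lets the search prune from the very first steps.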

Properties:
- complete search (guaranteed to find the optimal trees)
- efficient compared with exhaustive search; about 20 sequences are feasible.

5.3.2 Heuristic Searches
Local search:
- Alternative trees are not all independent of each other: small rearrangements turn one tree into closely related trees.
- branch swapping (Fig. 5.5)
Properties:
- not complete: may miss the optimal solution
- fast and efficient
- may get trapped in a local minimum
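One common form of branch swapping is nearest-neighbor interchange (NNI), which exchanges two subtrees across an internal branch. The sketch below pairs NNI with greedy hill climbing under Fitch parsimony. It is a hypothetical illustration: whether Fig. 5.5 shows NNI specifically is an assumption, and the function names are made up. The search stops at the first tree with no better neighbor, which may be only a local minimum.

```python
def parsimony_length(edges, seqs):
    """Fitch length of an unrooted binary tree (leaves 0..n-1, internal ids >= n)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    def fitch(node, parent, u):
        if node < len(seqs):
            return {seqs[node][u]}, 0
        (s1, c1), (s2, c2) = [fitch(x, node, u) for x in adj[node] if x != parent]
        return (s1 & s2, c1 + c2) if s1 & s2 else (s1 | s2, c1 + c2 + 1)

    total = 0
    for u in range(len(seqs[0])):
        s, c = fitch(adj[0][0], 0, u)  # root on the edge next to leaf 0
        total += c + (0 if seqs[0][u] in s else 1)
    return total

def nni_neighbors(edges, n_leaves):
    """Trees one nearest-neighbor interchange away from `edges`.

    For each internal edge (u, v), u keeps one of its other neighbors and
    swaps the second one (b) with either of v's other neighbors.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    neighbors = []
    for u, v in edges:
        if u < n_leaves or v < n_leaves:
            continue  # only internal edges have subtrees on both sides to swap
        b = [x for x in adj[u] if x != v][1]
        for x in [y for y in adj[v] if y != u]:
            kept = [e for e in edges if set(e) not in ({u, b}, {v, x})]
            neighbors.append(kept + [(u, x), (v, b)])
    return neighbors

def hill_climb(edges, seqs):
    """Greedy local search: move to the first better neighbor until none exists."""
    cost = parsimony_length(edges, seqs)
    improved = True
    while improved:
        improved = False
        for cand in nni_neighbors(edges, len(seqs)):
            c = parsimony_length(cand, seqs)
            if c < cost:
                edges, cost, improved = cand, c, True
                break
    return edges, cost  # a local minimum, not necessarily the global one

start = [(1, 4), (2, 4), (0, 5), (5, 4), (5, 3)]  # groups 0 with 3, 1 with 2
print(hill_climb(start, ["AA", "AA", "GG", "GG"])[1])  # 2
```

Starting from a tree that groups sequence 0 with 3 (cost 4), a single interchange reaches the tree grouping 0 with 1 (cost 2).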

5.4 Consensus Trees
Problem: parsimony approaches may yield more than one equally good tree.
Consensus tree: an agreement, or summary, of these trees:
- where the trees agree: a bifurcation
- where they disagree: a multifurcation
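Below is a minimal sketch of strict and majority-rule consensus. For illustration, each rooted tree is represented by its set of clades, stored as frozensets of taxon names. The helper names `consensus` and `clades` are hypothetical.

```python
from collections import Counter

def consensus(trees, majority=False):
    """Consensus of rooted trees, each given as a set of clades.

    Strict consensus keeps clades found in every tree; majority-rule keeps
    clades found in more than half. Groups that are not kept end up as
    multifurcations in the consensus tree.
    """
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {c for c, k in counts.items() if (k > n / 2 if majority else k == n)}

def clades(*groups):
    """Build a clade set from strings of taxon names, e.g. clades("AB", "ABC")."""
    return {frozenset(g) for g in groups}

t1 = clades("AB", "ABC", "DE")  # (((A,B),C),(D,E))
t2 = clades("AC", "ABC", "DE")  # (((A,C),B),(D,E))
t3 = clades("AB", "ABC", "DE")  # same as t1
print(sorted("".join(sorted(c)) for c in consensus([t1, t2, t3])))  # ['ABC', 'DE']
```

In the strict consensus, A, B and C hang from a single node (a multifurcation) because the trees disagree on how to resolve them. The majority-rule tree keeps (A, B), which two of the three trees share.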

5.5 Tree Confidence
- How much confidence can be attached to the overall tree and to its component parts?
- How much more likely is one tree to be correct than a particular or randomly chosen alternative tree?

5.5.1 Bootstrap Tests
1. Randomly choose columns, with replacement, to form a new alignment of the same length.
2. Reconstruct the tree for the new sample.
3. Repeat steps 1 and 2 many times.
4. Summarize the sampled trees with respect to the tested tree (e.g., how often each of its branches appears).
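The four steps can be sketched in Python for the four-sequence case, where "reconstruct the tree" reduces to choosing the most parsimonious of the three four-taxon topologies. This is a hypothetical illustration, and several details are assumptions: the function names, the fixed random seed, and counting a replicate only when the tested tree is the unique best.

```python
import random

def _join(a, b):
    """Fitch step: intersection if it is non-empty, otherwise union plus one change."""
    common = a & b
    return (common, 0) if common else (a | b, 1)

# The three unrooted trees for four sequences, as splits (i, j) | (k, l).
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def quartet_length(columns, split):
    """Parsimony length of one quartet tree over a list of alignment columns."""
    (i, j), (k, l) = split
    total = 0
    for col in columns:
        left, c1 = _join({col[i]}, {col[j]})
        right, c2 = _join({col[k]}, {col[l]})
        total += c1 + c2 + _join(left, right)[1]
    return total

def bootstrap_support(alignment, tested, replicates=1000, seed=0):
    """Fraction of bootstrap replicates whose unique best quartet is `tested`."""
    rng = random.Random(seed)
    columns = list(zip(*alignment))
    hits = 0
    for _ in range(replicates):  # 3. repeat many times
        # 1. resample columns with replacement, keeping the alignment length
        sample = rng.choices(columns, k=len(columns))
        # 2. rebuild the tree: score all three topologies
        scores = {s: quartet_length(sample, s) for s in QUARTETS}
        best = min(scores.values())
        winners = [s for s in QUARTETS if scores[s] == best]
        if winners == [tested]:
            hits += 1
    # 4. summarize: how often the tested tree was recovered
    return hits / replicates

aln = ["ACGTAC", "ACGAAC", "GCTTGC", "GCTAGT"]
print(bootstrap_support(aln, ((0, 1), (2, 3)), replicates=500))
```

The text recommends several hundred to thousands of replicates, so `replicates` defaults to 1000. A fixed seed makes runs reproducible.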

Cautions:
- Tests based on fewer than several hundred iterations are not reliable.
- Bootstrap values tend to underestimate the confidence level at high values and overestimate it at low values.
- Some results may appear statistically significant by chance, simply because so many groupings are being considered.

Strategies:
- Perform thousands of iterations.
- Use a correction method to adjust for estimation biases.
- Collapse poorly supported branches into multifurcations.
Question: what happens if a tree-building algorithm always produces the same tree?

5.5.2 Parametric Tests (???)
What are the limits of the parsimony principle?
- especially for distantly related sequences
- comparing the most parsimonious tree vs. a particular alternative (this can be used to estimate the significance of the built tree)

H. Kishino & M. Hasegawa (1989):
- Assume that the informative sites within an alignment are both independent and equivalent.
- D: the difference between the minimum numbers of substitutions invoked by the two trees.
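Below is a sketch of a paired statistic built on these assumptions. Let d_i be the per-site difference, n the number of informative sites, and D = Σ d_i. The variance estimate is V = n/(n - 1) × Σ (d_i - D/n)², and t = D/√V, compared against Student's t with n - 1 degrees of freedom. The exact formula on the original slide is not recoverable, so treat these formulas as an assumption. The function name `kh_statistic` is hypothetical.

```python
import math

def kh_statistic(steps_tree1, steps_tree2):
    """Paired test on per-site step differences (sketch of the Kishino-Hasegawa idea).

    steps_tree1/2: minimum substitutions each tree needs at each informative
    site (at least two sites). Returns (D, V, t); a positive D means tree 1
    is more parsimonious.
    """
    d = [b - a for a, b in zip(steps_tree1, steps_tree2)]
    n = len(d)
    D = sum(d)
    mean = D / n
    V = n * sum((x - mean) ** 2 for x in d) / (n - 1)  # variance estimate of D
    if V == 0:  # every site differs by the same amount
        t = math.copysign(math.inf, D) if D else 0.0
    else:
        t = D / math.sqrt(V)
    return D, V, t

print(kh_statistic([1, 1, 1, 1], [2, 2, 1, 2]))
```

In the example, the second tree needs one more substitution at three of the four sites, which gives D = 3 and t = 3.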

5.6 Comparison of Phylogenetic Methods
If two different methods construct the same tree, confidence that the tree is correct is high.
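One simple way to check whether two methods built the same unrooted tree is to compare their sets of splits (bipartitions). The number of splits found in one tree but not the other is the Robinson-Foulds distance, and it is 0 exactly when the topologies match. The minimal sketch below assumes each tree is given by its non-trivial splits, each written as one side as a set of taxon names. The helper names are hypothetical.

```python
def normalize(split, taxa):
    """Represent a split by the side that does not contain a reference taxon."""
    side = frozenset(split)
    ref = min(taxa)
    return side if ref not in side else frozenset(taxa) - side

def rf_distance(splits1, splits2, taxa):
    """Number of splits found in one tree but not the other (0 = same topology)."""
    s1 = {normalize(s, taxa) for s in splits1}
    s2 = {normalize(s, taxa) for s in splits2}
    return len(s1 ^ s2)

taxa = "ABCDE"
tree_a = [{"A", "B"}, {"D", "E"}]            # ((A,B),C,(D,E))
tree_b = [{"C", "D", "E"}, {"A", "B", "C"}]  # same tree, other sides of the splits
tree_c = [{"A", "B"}, {"C", "D"}]            # ((A,B),E,(C,D))
print(rf_distance(tree_a, tree_b, taxa))  # 0
print(rf_distance(tree_a, tree_c, taxa))  # 2
```

Normalizing each split to the side without a fixed reference taxon makes {A, B} and {C, D, E} count as the same bipartition.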

5.7 Molecular Phylogenies
Implications:
- medicine: drug treatment
- agriculture: disease resistance factors
- conservation: identifying endangered species

5.7.1 The Tree of Life
Carl Woese and his colleagues (1970s) used 16S rRNA (small-subunit ribosomal RNA, which all organisms possess).

5.7.2 Human Origins
mtDNA:
- The mean difference between two human populations is about 0.33%.
- The greatest differences are found within Africa, not between different continents!
- This supports the out-of-Africa theory.
- mtDNA and Y-chromosome data are consistent with this hypothesis.

They concluded:
- "mitochondrial Eve" and "Y-chromosome Adam"
- about 200,000 years ago

References and figure sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.
2. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
3. Sylvia S. Mader, Biology, 8th edition, McGraw-Hill, 2003.

