Published by Erik Parsons. Modified over 9 years ago.
1 Chapter 4: Distance-Based Methods of Phylogenetics
Department of Computer Science and Information Engineering, National Chi Nan University
Guan-Shieng Huang (黃光璿), 2004/03/29
2 Motivation
- Evolutionary events on genomes: substitutions, insertions, deletions, rearrangements
- We focus on cluster analysis in this chapter.
3 4.1 History of Molecular Phylogenetics
- Taxonomists: naming and grouping (classification)
- Traditional approach: based on anatomical differences
- Linnaeus's system: kingdom, phylum, class, order, family, genus, species
- Darwin
- Nuttall (1902~1904): humans and apes diverged most recently (judged from the immune system)
4
- 1950s: protein electrophoresis (size, charge)
- 1960s: protein sequencing
- 1970s: genomic information; restriction enzymes came first, DNA sequencing later
5 Comparative anatomy
6 Python (snake) and human
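7 Linnaeus
- Eighteenth-century naturalist
- Led untrained students around the world to collect specimens; one third of the students died during the expeditions.
- Created the binomial system of nomenclature: genus name + species name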
9 4.2 Advantages of Molecular Phylogenies
- Fundamental: evolution is defined as genetic change
- Molecular clock hypothesis (Chap. 3)
- In the early days, taxonomists inferred genotypes from phenotypes.
  - phenotype: how an organism looks
  - genotype: the genes that give rise to its physical appearance
10 Later, the following were also studied:
- behavioral characteristics
- ultrastructural characteristics
- biochemical characteristics
11 Problems that the traditional approaches cannot resolve
- Convergent evolution, e.g. eyes in humans, flies, and mollusks
- Many organisms do not have easily studied phenotypic features, e.g. bacteria
- Comparing distantly related organisms, e.g. bacteria, worms, and mammals: few characteristics in common!
12 4.3 Phylogenetic Trees
14 4.3.1 Terminology of Tree Reconstruction
- phylogenetic tree, or dendrogram
- nodes: taxonomic units
- branches
- terminal nodes: collected data (I, II, III, IV, V)
- internal nodes: inferred ancestors (A, B, C, D)
- Newick format: (((I, II), (III, IV)), V)
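To make the Newick notation concrete, here is a minimal sketch that stores the example tree as nested Python tuples and writes it back out as a Newick-style string. The tuple representation and the function names (to_newick, count_internal) are illustrative assumptions, not part of the slides; branch lengths and the trailing semicolon of full Newick files are left out.

```python
def to_newick(tree):
    """Serialize a tree given as nested tuples; leaves are plain strings."""
    if isinstance(tree, str):
        return tree
    return "(" + ", ".join(to_newick(child) for child in tree) + ")"


def count_internal(tree):
    """Count internal nodes (the inferred ancestors)."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_internal(child) for child in tree)


# Terminal nodes I..V; each tuple is one internal node (A, B, C, D on the slide).
example = ((("I", "II"), ("III", "IV")), "V")
print(to_newick(example))       # (((I, II), (III, IV)), V)
print(count_internal(example))  # 4
```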
15
- bifurcating: a node splits into two lineages
- multifurcating: a node splits into three or more lineages
- scaled trees: branch lengths are proportional to the differences between pairs of neighboring nodes
  - additive: in a scaled tree, the path length between two nodes reflects their accumulated difference
- unscaled trees: convey only relative kinship
16 4.3.2 Rooted and Unrooted Trees
18
- N_R = #(rooted binary trees)
- N_U = #(unrooted binary trees)
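For n taxa, the standard closed forms for these counts are N_R = (2n-3)! / (2^(n-2) (n-2)!) and N_U = (2n-5)! / (2^(n-3) (n-3)!). The sketch below simply evaluates them to show how quickly the number of candidate trees grows; the function names are illustrative.

```python
from math import factorial


def num_rooted(n):
    """Number of rooted binary trees on n >= 2 labeled taxa."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def num_unrooted(n):
    """Number of unrooted binary trees on n >= 3 labeled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (3, 4, 5, 10):
    print(n, num_rooted(n), num_unrooted(n))
```

Note that N_U for n taxa equals N_R for n-1 taxa, since rooting an unrooted tree amounts to choosing one of its branches.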
19 4.3.3 Gene vs. Species Trees
- gene tree: built from a single homologous gene
- species tree: best obtained from analysis of multiple genes
- Note: evolution occurs at the level of populations of organisms, not at the level of individuals.
- Gene trees and species trees are different!
21 4.3.4 Character and Distance Data
- characters: DNA sequences, protein sequences, color, behavior, response time, ...
- distance: overall, pairwise difference
- character data → distance data
- pheneticists prefer distance-based methods
- cladists prefer character-based methods
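One simple way to turn character data into distance data is the p-distance: the fraction of aligned sites at which two sequences differ. The sketch below assumes the sequences are already aligned and of equal length; the sequences and names are made up for illustration, and no correction for multiple substitutions is applied.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Map each unordered pair of names to its p-distance."""
    return {
        frozenset((p, q)): p_distance(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)
    }


# Hypothetical aligned sequences (character data).
seqs = {"I": "ACGTACGT", "II": "ACGTACGA", "III": "ACCTAGGA"}
dist = distance_matrix(seqs)
print(dist[frozenset(("I", "II"))])  # 0.125
```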
22 4.4 Distance Matrix Methods
- UPGMA (Unweighted Pair Group Method with Arithmetic mean)
- Transformed Distance Method
- Neighbor's Relation Method
- Neighbor-Joining Method
23 4.4.1 UPGMA
- Unweighted Pair Group Method with Arithmetic mean
- 1960s
- Assumes a constant rate of evolution across all lineages.
29 The definition of d_ij is [equation not preserved] and can be calculated by [equation not preserved].
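As a hedged illustration, the usual UPGMA definition (as given, for example, in Durbin et al.) takes d_ij between clusters C_i and C_j to be the average of d_pq over all p in C_i and q in C_j. When C_i and C_j are merged into C_k, the distance to any other cluster C_l can be updated as d_kl = (|C_i| d_il + |C_j| d_jl) / (|C_i| + |C_j|). The sketch below assumes that definition; the data layout (a dict keyed by frozenset pairs, clusters as nested tuples) and the example matrix are illustrative choices.

```python
from itertools import combinations


def upgma(names, d):
    """Cluster by UPGMA; returns (tree as nested tuples, node heights)."""
    dist = dict(d)                      # frozenset({x, y}) -> distance
    size = {name: 1 for name in names}  # number of leaves per cluster
    height = {name: 0.0 for name in names}
    active = list(names)
    while len(active) > 1:
        # Merge the closest pair of clusters.
        i, j = min(combinations(active, 2), key=lambda p: dist[frozenset(p)])
        new = (i, j)
        size[new] = size[i] + size[j]
        height[new] = dist[frozenset((i, j))] / 2
        active = [c for c in active if c not in (i, j)]
        # Size-weighted average keeps d equal to the mean leaf-to-leaf distance.
        for k in active:
            dist[frozenset((new, k))] = (
                size[i] * dist[frozenset((i, k))]
                + size[j] * dist[frozenset((j, k))]
            ) / size[new]
        active.append(new)
    return active[0], height


# Hypothetical ultrametric distances among four taxa.
d = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6,
    frozenset("AD"): 10, frozenset("BD"): 10, frozenset("CD"): 10,
}
tree, height = upgma("ABCD", d)
print(tree)          # ('D', ('C', ('A', 'B')))
print(height[tree])  # 5.0
```

Each node is placed at half the distance between the two clusters it joins, which is only meaningful under the constant-rate assumption.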
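31 Ultrametric Test
A matrix is ultrametric iff, for every three taxa i, j, k, d_ij ≤ max(d_ik, d_jk); equivalently, the two largest of the three pairwise distances are equal.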
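The test can be run directly on a distance matrix by checking the three-point condition for every triple: the two largest of the three pairwise distances must be equal. This is a minimal sketch; the function name, the tolerance parameter, and the two small example matrices are assumptions made for illustration.

```python
from itertools import combinations


def is_ultrametric(names, d, tol=1e-9):
    """Check the three-point condition on every triple of taxa."""
    for i, j, k in combinations(names, 3):
        smallest, middle, largest = sorted(
            (d[frozenset((i, j))], d[frozenset((i, k))], d[frozenset((j, k))])
        )
        if largest - middle > tol:  # the two largest must coincide
            return False
    return True


# Hypothetical matrices: the first is clock-like, the second is not.
clock_like = {frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6}
not_clock_like = {frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 7}
print(is_ultrametric("ABC", clock_like))      # True
print(is_ultrametric("ABC", not_clock_like))  # False
```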
32 Theorem
UPGMA reconstructs the correct phylogenetic tree as long as the distance matrix is ultrametric.
33 Estimation of Branch Lengths
Performed once the tree topology is given.
34 4.4.2 Transformed Distance Method
- Weakness of UPGMA: it assumes a constant rate of evolution across all lineages.
- Idea: modify the distance matrix so that UPGMA performs better.
35
- Outgroup
- J. Farris, 1977
- Branch length
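A commonly stated form of the transformation, assumed here, is d'_ij = (d_ij - d_io - d_jo) / 2 + mean_o, where o is the outgroup and mean_o is the average distance from the ingroup taxa to o. The sketch below applies it to a small additive matrix with unequal rates; the matrix, the taxon names, and the function name are hypothetical.

```python
from itertools import combinations


def transform(ingroup, outgroup, d):
    """Outgroup-corrected distances among the ingroup taxa."""
    def dist(x, y):
        return d[frozenset((x, y))]

    mean_out = sum(dist(k, outgroup) for k in ingroup) / len(ingroup)
    return {
        frozenset((i, j)): (dist(i, j) - dist(i, outgroup) - dist(j, outgroup)) / 2
        + mean_out
        for i, j in combinations(ingroup, 2)
    }


# Hypothetical additive distances: A and B are true neighbors, but B evolves
# fast, so the raw matrix makes A look closest to C.
d = {
    frozenset("AB"): 4, frozenset("AC"): 3, frozenset("BC"): 5,
    frozenset("AO"): 6, frozenset("BO"): 8, frozenset("CO"): 5,
}
t = transform("ABC", "O", d)
for pair in ("AB", "AC", "BC"):
    print(pair, round(t[frozenset(pair)], 3))
```

In this example the smallest transformed distance is between A and B, so running UPGMA on the transformed matrix would join the true neighbors first.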
36
- It gives only a tree topology and does not provide estimates of branch lengths (Nei, 1987).
- The transformed matrix is ultrametric.
- A matrix is ultrametric iff, for every three taxa i, j, k, d_ij ≤ max(d_ik, d_jk).
38 4.4.3 Neighbor's Relation Method
- Four-point condition: if the tree is additive and A, B are neighbors (as are C, D), then
  d_AB + d_CD < d_AC + d_BD
  d_AB + d_CD < d_AD + d_BC
39 Given any four points, say A, B, C, D, we have the three sums
  d_AB + d_CD
  d_AC + d_BD
  d_AD + d_BC
The smallest sum indicates how to pair up the points.
40 S. Sattath & A. Tversky, 1977
- For any four points, say A, B, C, D, compute
  d_AB + d_CD
  d_AC + d_BD
  d_AD + d_BC
- The pairing with the smallest sum is chosen, and each of its two pairs scores 1 point.
- After all possible quadruples have been tried, the pair with the highest score is grouped.
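The scoring step can be sketched as follows: for every quadruple, find the pairing with the smallest sum and give one point to each of its two pairs. Only the scoring pass is shown, not the repeated grouping; the example distances come from a hypothetical additive tree ((A, B), C, (D, E)) with all branch lengths equal to 1, and the names are illustrative.

```python
from collections import Counter
from itertools import combinations


def neighbor_scores(names, d):
    """Score each pair by how often it wins the four-point comparison."""
    def dist(x, y):
        return d[frozenset((x, y))]

    scores = Counter()
    for a, b, c, e in combinations(names, 4):
        pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
        best = min(pairings, key=lambda p: dist(*p[0]) + dist(*p[1]))
        for pair in best:
            scores[frozenset(pair)] += 1
    return scores


# Hypothetical additive distances for the tree ((A, B), C, (D, E)).
d = {
    frozenset("AB"): 2, frozenset("AC"): 3, frozenset("AD"): 4, frozenset("AE"): 4,
    frozenset("BC"): 3, frozenset("BD"): 4, frozenset("BE"): 4,
    frozenset("CD"): 3, frozenset("CE"): 3, frozenset("DE"): 2,
}
scores = neighbor_scores("ABCDE", d)
for pair, score in scores.most_common():
    print(sorted(pair), score)
```

Here the pairs {A, B} and {D, E} both reach the maximum score of 3, matching the two neighbor pairs of the underlying tree.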
41 Example
44 The lengths of the branches can be determined by the outgroup method.
45 Theorem
If a matrix is additive, then its phylogenetic tree (unrooted, binary) can be reconstructed correctly and uniquely by the Neighbor's Relation Method.
46 4.4.4 Neighbor-Joining Method
48 [equation not preserved] where L is the set of all leaves. (7.4)
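For orientation, one common formulation of neighbor joining (the one in Durbin et al., assumed here) picks the pair i, j minimizing D_ij = d_ij - (r_i + r_j), where r_i = (1 / (|L| - 2)) * sum of d_ik over k in L, and then replaces i and j by a new node k with d_km = (d_im + d_jm - d_ij) / 2. The sketch below recovers only the unrooted topology and omits branch lengths; the data layout and the example matrix are illustrative.

```python
from itertools import combinations


def neighbor_joining(names, d):
    """Return the unrooted topology as a 3-tuple of nested tuples."""
    dist = dict(d)  # frozenset({x, y}) -> distance
    active = list(names)

    def get(x, y):
        return dist[frozenset((x, y))]

    while len(active) > 3:
        n = len(active)
        # r_i: total distance from i to the other leaves, scaled by 1/(|L| - 2).
        r = {i: sum(get(i, k) for k in active if k != i) / (n - 2) for i in active}
        i, j = min(
            combinations(active, 2),
            key=lambda p: get(*p) - r[p[0]] - r[p[1]],
        )
        new = (i, j)
        rest = [m for m in active if m not in (i, j)]
        for m in rest:
            dist[frozenset((new, m))] = (get(i, m) + get(j, m) - get(i, j)) / 2
        active = rest + [new]
    return tuple(active)


# Hypothetical additive distances for the tree ((A, B), C, (D, E)).
d = {
    frozenset("AB"): 2, frozenset("AC"): 3, frozenset("AD"): 4, frozenset("AE"): 4,
    frozenset("BC"): 3, frozenset("BD"): 4, frozenset("BE"): 4,
    frozenset("CD"): 3, frozenset("CE"): 3, frozenset("DE"): 2,
}
print(neighbor_joining("ABCDE", d))  # ('D', 'E', ('C', ('A', 'B')))
```

The result is read as an unrooted tree with a central node joining D, E, and the subtree (C, (A, B)), which is the same topology as ((A, B), C, (D, E)).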
50 Theorem
If a matrix is additive, then its phylogenetic tree (unrooted, binary) can be reconstructed correctly and uniquely by the Neighbor-Joining Method.
51 4.5 Maximum Likelihood Approaches
- purely statistical methods
- multiple substitutions
- Not all sites are necessarily independent.
- No single substitution model has yet come close to general acceptance.
52 4.6 Multiple Sequence Alignment
53 References and image sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.
2. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
3. Sylvia S. Mader, Biology, 8th edition, McGraw-Hill, 2003.