1 Chapter 4: Distance-Based Methods of Phylogenetics. Department of Computer Science and Information Engineering, National Chi Nan University. HUANG, Guan-Shieng (黃光璿), 2004/03/29.

2 Motivation
Evolutionary events on genomes:
• substitutions
• insertions
• deletions
• rearrangements
We focus on cluster analysis in this chapter.

3 4.1 History of Molecular Phylogenetics
Taxonomists:
• naming and grouping
• traditional approach based on anatomical differences
Linnaeus's system
• kingdom, phylum, class, order, family, genus, species
Darwin
Nuttall (1902~1904)
• humans and apes diverged most recently (judged from the immune system)

4 1950s
• protein electrophoresis (size, charge)
1960s
• protein sequencing
1970s
• genomic information
• first came restriction enzymes
• then came DNA sequencing

5 Comparative anatomy

6 Python (snake) and human

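7 Linnaeus
• eighteenth-century naturalist
• led untrained students around the world to collect specimens; one third of the students died during the expeditions
• established the binomial system of nomenclature: genus name + species name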

9 4.2 Advantages of Molecular Phylogenies
Fundamental:
• evolution is defined as genetic change
• molecular clock hypothesis (Chap. 3)
In the early days, taxonomists inferred genotypes from phenotypes.
• phenotype: how an organism looks
• genotype: the genes that give rise to its physical appearance

10 Later, the following were also studied:
• behavior
• ultrastructure
• biochemical characteristics

11 Problems that the traditional approaches cannot resolve
• convergent evolution
  eyes: humans, flies, mollusks
• many organisms do not have easily studied phenotypic features
  bacteria
• comparing distantly related organisms
  bacteria, worms, mammals: few characteristics in common!

12 4.3 Phylogenetic Trees

14 4.3.1 Terminology of Tree Reconstruction
Phylogenetic tree, or dendrogram
• nodes: taxonomic units
• branches
• terminal nodes: collected data (I, II, III, IV, V)
• internal nodes: inferred ancestors (A, B, C, D)
Newick format
• (((I, II), (III, IV)), V)
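The Newick notation above can be produced mechanically from a nested grouping of taxa. The following is a minimal sketch, assuming the tree is held as nested Python tuples of leaf labels; the function name `to_newick` is hypothetical, and branch lengths and internal-node labels are left out.

```python
def to_newick(tree):
    """Serialize a nested tuple of leaf names into a Newick string."""
    def fmt(node):
        if isinstance(node, str):
            # Terminal node: a taxon label.
            return node
        # Internal node: its children, comma-separated, inside parentheses.
        return "(" + ",".join(fmt(child) for child in node) + ")"
    # A Newick string is terminated by a semicolon.
    return fmt(tree) + ";"


# The five-taxon tree from the slide, written as nested tuples.
tree = ((("I", "II"), ("III", "IV")), "V")
newick = to_newick(tree)
print(newick)  # (((I,II),(III,IV)),V);
```

Each pair of parentheses corresponds to one internal node (an inferred ancestor), so the string above has four of them, matching A, B, C, D on the slide.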

15 bifurcating: one lineage splits into two
multifurcating: one lineage splits into three or more
scaled trees
• branch lengths are proportional to the differences between pairs of neighboring nodes
additive (in a scaled tree)
• the physical length of the path between two nodes reflects their accumulated difference
unscaled trees
• convey only relative kinship

16 4.3.2 Rooted and Unrooted Trees

18 $N_R$ = #(rooted binary trees), $N_U$ = #(unrooted binary trees)
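For n labeled taxa, the standard closed forms for these counts are $N_R = (2n-3)! / (2^{n-2}(n-2)!)$ and $N_U = (2n-5)! / (2^{n-3}(n-3)!)$, which are the odd double factorials $(2n-3)!!$ and $(2n-5)!!$. The sketch below evaluates them; the function names are hypothetical and the closed forms are supplied here as the textbook result rather than taken from the slide text.

```python
def odd_double_factorial(m):
    """Return m * (m - 2) * ... * 3 * 1 for odd m, and 1 when m <= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def num_rooted_trees(n):
    """Number of rooted binary trees on n labeled taxa (n >= 2)."""
    if n < 2:
        raise ValueError("need at least 2 taxa")
    return odd_double_factorial(2 * n - 3)


def num_unrooted_trees(n):
    """Number of unrooted binary trees on n labeled taxa (n >= 3)."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    return odd_double_factorial(2 * n - 5)


for n in (3, 4, 5, 10):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
```

With 10 taxa there are already 34,459,425 rooted and 2,027,025 unrooted trees, which is why exhaustive search over topologies quickly becomes impractical.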

19 4.3.3 Gene vs. Species Trees
gene tree
• built within a single homologous gene
species tree
• best obtained from the analysis of multiple genes
Note:
• Evolution occurs at the level of populations of organisms, not at the level of individuals.
⇒ Gene trees and species trees are different!

21 4.3.4 Character and Distance Data
characters
• DNA sequences, protein sequences, color, behavior, response time, ...
distance
• overall, pairwise difference
character data → distance data
pheneticists: prefer distance-based methods
cladists: prefer character-based methods
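One simple way to turn character data into distance data is the proportion of differing sites between two aligned sequences (the p-distance). The sketch below assumes equal-length, already aligned sequences; the sequences and the function names are made up for illustration, and no correction for multiple substitutions is applied.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Build a symmetric dict-of-dicts of pairwise p-distances."""
    names = list(seqs)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = p_distance(seqs[a], seqs[b])
    return dist


# Hypothetical aligned sequences, 8 sites each.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
dist = distance_matrix(seqs)
print(dist["A"]["B"], dist["A"]["C"], dist["B"]["C"])  # 0.125 0.375 0.25
```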

22 4.4 Distance Matrix Methods
UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
Transformed Distance Method
Neighbor's Relation Method
Neighbor-Joining Method

23 4.4.1 UPGMA
Unweighted Pair Group Method with Arithmetic Mean, 1960s
• Assumes a constant rate of evolution across all lineages.

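29 The definition of the distance $d_{ij}$ between clusters $C_i$ and $C_j$ is
$d_{ij} = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i,\ q \in C_j} d_{pq}$
and, for a merged cluster $C_k = C_i \cup C_j$ and any other cluster $C_l$, it can be calculated by
$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$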
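The UPGMA procedure can be sketched in a few lines. This is a minimal illustration, not the slide's own implementation: it assumes the usual average-linkage definition (the distance between two clusters is the mean of all leaf-to-leaf distances, updated as a size-weighted mean when clusters merge), places each new node at half the distance between the clusters it joins, and uses a made-up four-taxon matrix. The function name `upgma` is hypothetical.

```python
def upgma(names, matrix):
    """Cluster leaves by UPGMA.

    names  -- list of leaf labels
    matrix -- symmetric list of lists of pairwise distances
    Returns (tree, height): nested tuples of labels and the root height.
    """
    # Active clusters: id -> (subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): matrix[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        (ti, ni, _), (tj, nj, _) = clusters.pop(i), clusters.pop(j)
        dij = d.pop((i, j))
        # Size-weighted average of the old distances to every other cluster.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        # The new node sits at half the distance between the merged clusters.
        clusters[next_id] = ((ti, tj), ni + nj, dij / 2)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height


# Hypothetical ultrametric distances among four taxa.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, height = upgma(names, matrix)
print(tree, height)  # ('D', ('C', ('A', 'B'))) 5.0
```

A and B are joined first (distance 2), then C (distance 6), then D (distance 10), so the root lies at height 5.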

31 Ultrametric Test
A matrix is ultrametric iff, for every three taxa i, j, k, the two largest of $d_{ij}$, $d_{ik}$, $d_{jk}$ are equal.
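The test can be sketched as a check over all triples of taxa, using the standard three-point condition (among the three pairwise distances of any triple, the two largest are equal). The function name and the tolerance parameter `tol` are assumptions made for this illustration.

```python
from itertools import combinations


def is_ultrametric(matrix, tol=1e-9):
    """Return True if every triple satisfies the three-point condition."""
    n = len(matrix)
    for i, j, k in combinations(range(n), 3):
        # Sort the three pairwise distances of the triple.
        _, middle, largest = sorted(
            (matrix[i][j], matrix[i][k], matrix[j][k]))
        # The two largest distances must coincide (up to the tolerance).
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical examples: the first matrix is ultrametric, the second is not.
ultra = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
not_ultra = [
    [0, 2, 7],
    [2, 0, 6],
    [7, 6, 0],
]
print(is_ultrametric(ultra), is_ultrametric(not_ultra))  # True False
```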

32 Theorem
UPGMA reconstructs the correct phylogenetic tree whenever the distance matrix is ultrametric.

33 Estimation of Branch Lengths
Branch lengths are estimated once the tree topology is given.
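For the simplest topology, three taxa A, B, C joined at a single internal node, the branch lengths follow directly from the three pairwise distances by solving x + y = d_AB, x + z = d_AC, y + z = d_BC. The sketch below shows that calculation; the function name and the example distances are hypothetical, and larger trees need this step applied repeatedly to composite groups.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Branch lengths (x, y, z) from the internal node to A, B and C."""
    x = (d_ab + d_ac - d_bc) / 2  # branch leading to A
    y = (d_ab + d_bc - d_ac) / 2  # branch leading to B
    z = (d_ac + d_bc - d_ab) / 2  # branch leading to C
    return x, y, z


# Hypothetical distances: d_AB = 22, d_AC = 39, d_BC = 41.
x, y, z = three_point_branches(22, 39, 41)
print(x, y, z)  # 10.0 12.0 29.0
```

The result can be checked by adding the branches back up: 10 + 12 = 22, 10 + 29 = 39, and 12 + 29 = 41.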

34 4.4.2 Transformed Distance Method
Weakness of UPGMA
• It assumes a constant rate of evolution across all lineages.
⇒ Modify the distance matrix so that UPGMA can perform better.

35 Outgroup (J. Farris, 1977)
Branch length

36 It only gives a tree topology and does not provide estimates of branch lengths (Nei, 1987).
The transformed matrix is ultrametric.
A matrix is ultrametric iff, for every three taxa i, j, k, the two largest of $d_{ij}$, $d_{ik}$, $d_{jk}$ are equal.
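A commonly cited form of the transformation, given an outgroup D, is $d'_{ij} = (d_{ij} - d_{iD} - d_{jD})/2 + \bar{d}_D$, where $\bar{d}_D$ is the mean distance from the outgroup to the ingroup taxa. The sketch below applies it to a made-up additive matrix in which lineage B evolves faster than A; the formula is supplied here as an assumption about what the slide showed, and the function name is hypothetical.

```python
def transformed_distances(names, matrix, outgroup):
    """Apply the outgroup-based distance transformation to the ingroup."""
    o = names.index(outgroup)
    ingroup = [i for i in range(len(names)) if i != o]
    # Mean distance from the outgroup to the ingroup taxa.
    mean_out = sum(matrix[i][o] for i in ingroup) / len(ingroup)
    transformed = {}
    for pos, i in enumerate(ingroup):
        for j in ingroup[pos + 1:]:
            transformed[(names[i], names[j])] = (
                (matrix[i][j] - matrix[i][o] - matrix[j][o]) / 2 + mean_out)
    return transformed


# Hypothetical additive distances; A and B are true neighbors, D is the
# outgroup, and B sits on a long branch.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 5, 4, 7],
    [5, 0, 7, 10],
    [4, 7, 0, 7],
    [7, 10, 7, 0],
]
t = transformed_distances(names, matrix, "D")
print(t)  # {('A', 'B'): 2.0, ('A', 'C'): 3.0, ('B', 'C'): 3.0}
```

In the raw matrix the smallest entry is d_AC = 4, so UPGMA would wrongly join A with C first. After the transformation the smallest entry is d'_AB = 2, and the two larger values are equal, as the ultrametric condition requires.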

38 4.4.3 Neighbor's Relation Method
Four-point condition
$d_{AB} + d_{CD} < d_{AC} + d_{BD}$
$d_{AB} + d_{CD} < d_{AD} + d_{BC}$
holds if the tree is additive and A, B are neighbors (as are C, D).

39 Given any four points, say A, B, C, D, we have three sums:
$d_{AB} + d_{CD}$, $d_{AC} + d_{BD}$, $d_{AD} + d_{BC}$.
The smallest sum indicates how to pair the points up.

40 S. Sattath & A. Tversky, 1977
For any four points, say A, B, C, D, compute the three sums
$d_{AB} + d_{CD}$, $d_{AC} + d_{BD}$, $d_{AD} + d_{BC}$.
The pairing with the smallest sum is chosen, and each of its two pairs scores 1.
After all possible quadruples have been tried, the pair with the highest score is grouped.
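The scoring step can be sketched as follows. This covers only one round of scoring over all quadruples, not the full iterative tree construction; the function name and the five-taxon additive matrix are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def neighbor_scores(names, matrix):
    """Score every pair by how often it is paired in the best split."""
    scores = Counter()
    for a, b, c, d in combinations(range(len(names)), 4):
        # The three ways of splitting four points into two pairs.
        pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        # Keep the pairing whose sum of within-pair distances is smallest.
        best = min(pairings,
                   key=lambda p: matrix[p[0][0]][p[0][1]]
                   + matrix[p[1][0]][p[1][1]])
        for i, j in best:
            scores[(names[i], names[j])] += 1
    return scores


# Hypothetical additive distances for the unrooted tree ((A,B),C,(D,E)).
names = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 3, 5, 5, 8],
    [3, 0, 6, 6, 9],
    [5, 6, 0, 6, 9],
    [5, 6, 6, 0, 5],
    [8, 9, 9, 5, 0],
]
scores = neighbor_scores(names, matrix)
print(scores[("A", "B")], scores[("D", "E")])  # 3 3
```

The two true neighbor pairs, (A, B) and (D, E), each win in all three quadruples that contain them, while every other pair scores at most 1.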

41 Example

44 The branch lengths can be determined by the outgroup method.

45 Theorem
If a matrix is additive, then its phylogenetic tree (unrooted, binary) can be reconstructed correctly and uniquely by the Neighbor's Relation Method.

46 4.4.4 Neighbor-Joining Method

48 where L is the set of all leaves (Eq. 7.4)

50 Theorem
If a matrix is additive, then its phylogenetic tree (unrooted, binary) can be reconstructed correctly and uniquely by the Neighbor-Joining Method.
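Neighbor joining can be sketched as below. This follows one common textbook formulation, assumed here rather than taken from the slide: with $r_i = \frac{1}{|L|-2}\sum_{k \in L} d_{ik}$, join the pair minimising $D_{ij} = d_{ij} - (r_i + r_j)$, attach i and j to a new node k with branch lengths $(d_{ij} + r_i - r_j)/2$ and $(d_{ij} + r_j - r_i)/2$, and set $d_{km} = (d_{im} + d_{jm} - d_{ij})/2$. The function name, the internal node names N0, N1, ..., and the example matrix are hypothetical.

```python
def neighbor_joining(names, matrix):
    """Return the tree as a list of edges (node, node, branch length)."""
    # Working distances as a dict of dicts keyed by node name.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names) if b != a}
         for i, a in enumerate(names)}
    edges, count = [], 0
    while len(d) > 2:
        n = len(d)
        # r_i: sum of distances from i to all other nodes, divided by |L| - 2.
        r = {a: sum(d[a].values()) / (n - 2) for a in d}
        # Choose the pair minimising D_ij = d_ij - (r_i + r_j).
        i, j = min(((a, b) for a in d for b in d[a] if a < b),
                   key=lambda p: d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        k = f"N{count}"
        count += 1
        # Branch lengths from i and j to the new internal node k.
        li = (d[i][j] + r[i] - r[j]) / 2
        edges += [(i, k, li), (j, k, d[i][j] - li)]
        # Distances from k to every remaining node.
        d[k] = {}
        for m in d:
            if m not in (i, j, k):
                d[k][m] = d[m][k] = (d[i][m] + d[j][m] - d[i][j]) / 2
        # Retire i and j.
        for x in (i, j):
            del d[x]
            for m in d:
                d[m].pop(x, None)
    a, b = d  # the last two nodes are joined directly
    edges.append((a, b, d[a][b]))
    return edges


# Hypothetical additive distances for the unrooted tree ((A,B),C,(D,E)).
names = ["A", "B", "C", "D", "E"]
matrix = [
    [0, 3, 5, 5, 8],
    [3, 0, 6, 6, 9],
    [5, 6, 0, 6, 9],
    [5, 6, 6, 0, 5],
    [8, 9, 9, 5, 0],
]
edges = neighbor_joining(names, matrix)
for u, v, length in edges:
    print(u, v, round(length, 6))
```

The example matrix was built from a tree with leaf branches A = 1, B = 2, C = 3, D = 1, E = 4 and internal branches of length 1 and 2; because the matrix is additive, the sketch recovers those lengths.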

51 4.5 Maximum Likelihood Approaches
purely statistical methods
multiple substitutions
Sites are not necessarily independent.
No single substitution model is as yet close to general acceptance.

52 4.6 Multiple Sequence Alignment

53 References and image sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings, 2003.
2. R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.
3. Sylvia S. Mader, Biology, 8th edition, McGraw-Hill, 2003.

