Chapter 5 Character-Based Methods of Phylogenetics
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿 (HUANG, Guan-Shieng)
2004/04/05

5.1 Parsimony
Mutations are exceedingly rare events. The more unlikely events a model invokes, the less likely the model is to be correct.
- The tree that requires the fewest mutations to explain the observed character states is the most likely to be correct.

Ockham's Razor
The philosophical rule stating that entities should not be multiplied unnecessarily.

Informative and Uninformative Sites

8 informative sites  have information to construct a tree uninformative sites  have no information in the sense of parsimony principle.

[Figures: two examples of uninformative sites and two examples of informative sites]

For a position to be informative, it must have:
- at least two different nucleotides, and
- each of these nucleotides present at least twice.
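The two-part rule above can be checked column by column. The following is a minimal sketch that assumes equal-length, gap-free sequences given as plain strings; the function names and the four example sequences are hypothetical.

```python
from collections import Counter
from typing import List


def is_informative(column: str) -> bool:
    """True if at least two different nucleotides each occur at least twice."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_sites(alignment: List[str]) -> List[int]:
    """Indices of parsimony-informative columns (equal-length, gap-free sequences)."""
    columns = ["".join(col) for col in zip(*alignment)]
    return [i for i, col in enumerate(columns) if is_informative(col)]


alignment = ["GAAC", "GAGT", "GGAC", "GGTT"]
print(informative_sites(alignment))  # [1, 3]
```

Column 0 (all G) and column 2 (A, G, A, T, where only A repeats) are uninformative. Columns 1 (A, A, G, G) and 3 (C, T, C, T) are informative.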

14 informative sites  synapomorphy: support the internal branches (true)  homoplasy: acquired as a result of parallel evolution of convergence (false) 眼睛: humans, flies, mollusks ( 軟體動物 )

Unweighted Parsimony
Every possible tree is considered individually for each informative site. The tree(s) with the minimum overall cost are reported.

There are two problems:
- The number of alternative unrooted trees increases dramatically with the number of sequences.
- Calculating the number of substitutions invoked by each alternative tree is difficult.

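The second problem can be solved by assigning a set of nucleotides to each internal node, working up from the leaves (Fitch's algorithm):
- intersection: if the intersection of its two children's sets is not empty, take the intersection;
- union: if it is empty, take the union.
- The number of unions is the minimum number of substitutions.
- For an uninformative site, this number equals the number of different nucleotides minus one.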
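The intersection/union rule, together with the exhaustive comparison from the Unweighted Parsimony slide, can be sketched as follows. This is a minimal sketch under these assumptions: trees are rooted binary trees written as nested tuples of leaf names, and the four-sequence alignment is hypothetical. The Fitch count does not depend on where the root is placed, so each of the three unrooted four-taxon topologies can be scored through any one rooting.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf name or a (left, right) pair


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """One site: return the candidate set at the root and the number of unions."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_unions = fitch(tree[0], states)
    right_set, right_unions = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # non-empty intersection: no extra substitution
        return common, left_unions + right_unions
    # empty intersection: take the union and count one substitution
    return left_set | right_set, left_unions + right_unions + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Minimum number of substitutions summed over all sites."""
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_sites))


alignment = {"seq1": "AAG", "seq2": "AAA", "seq3": "GGA", "seq4": "GGG"}
topologies = {
    "(seq1,seq2)|(seq3,seq4)": (("seq1", "seq2"), ("seq3", "seq4")),
    "(seq1,seq3)|(seq2,seq4)": (("seq1", "seq3"), ("seq2", "seq4")),
    "(seq1,seq4)|(seq2,seq3)": (("seq1", "seq4"), ("seq2", "seq3")),
}
for label, tree in topologies.items():
    print(label, tree_length(tree, alignment))  # 4, then 6, then 5
```

With this data the first topology is the most parsimonious (4 substitutions).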

[Pseudocode figure; surviving comment: the u-th position in the k-th sequence]

Weighted Parsimony
Not all mutations are equivalent:
- Some sequences (e.g., non-coding sequences) are more prone to indels than others.
- Functional importance differs from gene to gene.
- Subtle substitution biases usually vary between genes and between species.
- Weights (scoring matrices) can be added to reflect these differences.

Calculating the optimal costs

Finding the internal nodes
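The two weighted-parsimony steps (optimal costs computed bottom-up, then internal nodes found by a top-down traceback) can be sketched with dynamic programming in the style of Sankoff's weighted parsimony. This is a minimal sketch, not the slides' own pseudocode. It assumes a rooted binary tree written as nested tuples, and it uses a hypothetical cost matrix in which transitions (A<->G, C<->T) cost 1 and transversions cost 2. The slides do not specify these weights.

```python
import math
from typing import Tuple, Union

BASES = "ACGT"
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical weights: transitions cost 1, transversions cost 2.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
COST = {a: {b: 0 if a == b else (1 if (a, b) in TRANSITIONS else 2) for b in BASES}
        for a in BASES}


def up_pass(tree, states, cost):
    """Bottom-up: S[a] = optimal cost of the subtree when its root has base a."""
    if isinstance(tree, str):
        S = {a: 0 if a == states[tree] else math.inf for a in BASES}
        return {"name": tree, "S": S, "kids": []}
    kids = [up_pass(child, states, cost) for child in tree]
    S = {a: sum(min(cost[a][b] + k["S"][b] for b in BASES) for k in kids)
         for a in BASES}
    return {"name": None, "S": S, "kids": kids}


def label_internal(node, base, cost):
    """Top-down traceback: give each child the base that achieved the minimum."""
    if not node["kids"]:
        return node["name"]
    labelled = []
    for kid in node["kids"]:
        best = min(BASES, key=lambda b: cost[base][b] + kid["S"][b])
        labelled.append(label_internal(kid, best, cost))
    return (base, *labelled)


def weighted_parsimony(tree, states, cost=COST):
    """Return (optimal cost, tree annotated with inferred internal bases)."""
    root = up_pass(tree, states, cost)
    root_base = min(BASES, key=lambda a: root["S"][a])
    return root["S"][root_base], label_internal(root, root_base, cost)


states = {"s1": "A", "s2": "G", "s3": "C", "s4": "C"}
print(weighted_parsimony((("s1", "s2"), ("s3", "s4")), states))
# (3, ('A', ('A', 's1', 's2'), ('C', 's3', 's4')))
```

Ties are broken by the order of `BASES`, so the root is labelled A here. Roots labelled C or G also cost 3, which means other equally optimal ancestral reconstructions exist. With a 0/1 cost matrix the same code reproduces the unweighted (Fitch) count.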

Inferred Ancestral Sequences
- Can be derived while constructing the tree.
- No missing link!
- How are the samples chosen? The sampling may be biased.

Strategies for Faster Searches
The number of different phylogenetic trees grows enormously:
- 10 sequences: about 2 million unrooted trees to examine in an exhaustive search.
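The "about 2 million" figure for 10 sequences follows from the standard counts. There are (2n-5)!! unrooted and (2n-3)!! rooted binary trees for n labelled sequences. The helper names below are illustrative.

```python
def double_factorial_odd(m: int) -> int:
    """m!! = m * (m-2) * ... * 1 for odd m >= -1, with (-1)!! = 1."""
    result = 1
    for k in range(m, 0, -2):
        result *= k
    return result


def num_unrooted_trees(n: int) -> int:
    """(2n-5)!! unrooted binary trees for n >= 3 labelled sequences."""
    return double_factorial_odd(2 * n - 5)


def num_rooted_trees(n: int) -> int:
    """(2n-3)!! rooted binary trees for n >= 2 labelled sequences."""
    return double_factorial_odd(2 * n - 3)


print(num_unrooted_trees(10), num_rooted_trees(10))  # 2027025 34459425
```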

Branch and Bound
Proposed by Hendy & Penny.
- L: an upper bound (for a minimization problem), obtained from a random search or by heuristics (e.g., UPGMA).
- Incrementally grow a tree (branch).
- Prune any branch whose cost is already greater than L (bound).

31 Properties  complete search  efficient w.r.t. exhaustive search 20 sequences are doable.

Heuristic Searches
Local search:
- Alternative trees are not all independent of each other.
- branch swapping (Fig. 5.5)
Properties:
- not complete; may miss the optimal solution
- fast and efficient
- may get trapped in a local minimum
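One simple form of branch swapping is nearest-neighbour interchange (NNI). It can be sketched as a steepest-descent local search over Fitch tree lengths. This is a minimal sketch, assuming a fixed outgroup leaf placed at the root so that every rotation inside the remaining subtree is a genuine NNI. NNI is chosen here for illustration; the slides (Fig. 5.5) may depict a different swapping scheme. The alignment is hypothetical.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_unions(tree: Tree, states: Dict[str, str]):
    """Fitch candidate set and union count for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch_unions(tree[0], states), fitch_unions(tree[1], states)
    return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    n_sites = len(next(iter(alignment.values())))
    return sum(fitch_unions(tree, {k: v[i] for k, v in alignment.items()})[1]
               for i in range(n_sites))


def nni_neighbors(t: Tree) -> List[Tree]:
    """Trees one interchange away: swap a subtree with a child of its sibling."""
    if isinstance(t, str):
        return []
    left, right = t
    out: List[Tree] = []
    if isinstance(left, tuple):
        a, b = left
        out += [((a, right), b), ((right, b), a)]
    if isinstance(right, tuple):
        a, b = right
        out += [(a, (left, b)), (b, (a, left))]
    out += [(n, right) for n in nni_neighbors(left)]
    out += [(left, n) for n in nni_neighbors(right)]
    return out


def branch_swap_search(outgroup: str, subtree: Tree, alignment: Dict[str, str]):
    """Move to the best neighbour until none is better (a local minimum)."""
    def score(t: Tree) -> int:
        return tree_length((outgroup, t), alignment)

    current, current_score = subtree, score(subtree)
    while True:
        candidates = nni_neighbors(current)
        if not candidates:
            break
        best = min(candidates, key=score)
        if score(best) >= current_score:
            break  # no neighbour improves the length
        current, current_score = best, score(best)
    return (outgroup, current), current_score


alignment = {"seq1": "AAG", "seq2": "AAA", "seq3": "GGA", "seq4": "GGG"}
start = ("seq3", ("seq2", "seq4"))  # with seq1 as outgroup: seq1,seq3 | seq2,seq4
print(branch_swap_search("seq1", start, alignment))
# (('seq1', ('seq2', ('seq3', 'seq4'))), 4)
```

The search starts from a tree of length 6 and reaches length 4 in one swap. On larger data it can stop at a local minimum that is not the global optimum, which illustrates the "not complete" property.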

Consensus Trees Problem  Parsimony approaches may yield more than one trees. consensus tree  an agreement or a summary of these trees agree  bifurcation not agree  multi-furcation

Tree Confidence
- How much confidence can be attached to the overall tree and its component parts?
- How much more likely is one tree to be correct than a particular or randomly chosen alternative tree?

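Bootstrap Tests
1. Randomly choose columns (with replacement) to combine into a new alignment of the same size.
2. Reconstruct the tree for the new sample.
3. Repeat steps 1 and 2 many times.
4. Build a consensus of the sampled trees with respect to the tested one: the support for a grouping is the fraction of sampled trees that contain it.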
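The four steps above can be sketched as follows. This is a minimal sketch, not a complete implementation. The tree-building step is left as a caller-supplied function, `build_clades` (hypothetical), which takes a resampled alignment and returns the clades of the reconstructed tree as frozensets of sequence indices. Any of the parsimony searches in this chapter could fill that role.

```python
import random
from typing import Callable, FrozenSet, List, Set


def resample_columns(seqs: List[str], rng: random.Random) -> List[str]:
    """Step 1: draw alignment columns with replacement, keeping the same length."""
    n_cols = len(seqs[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(s[i] for i in picks) for s in seqs]


def bootstrap_support(seqs: List[str],
                      build_clades: Callable[[List[str]], Set[FrozenSet[int]]],
                      clade: FrozenSet[int],
                      replicates: int = 1000,
                      seed: int = 0) -> float:
    """Steps 2-4: rebuild a tree per replicate and return the fraction of
    replicates whose tree contains `clade`."""
    rng = random.Random(seed)
    hits = sum(clade in build_clades(resample_columns(seqs, rng))
               for _ in range(replicates))
    return hits / replicates
```

A builder that ignores its input and always returns the same tree gets support 1.0 for every clade in that tree. This is one way to see that bootstrap values measure stability under resampling, not correctness.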

42 Caution  Test based on fewer than several hundred iterations are not reliable.  Underestimate the confidence level at high values and overestimate it at low values.  Some results may appear to be statistically significant by chance simply so many groupings are being considered.

43 Strategy  doing thousands of iterations  using a correction method to adjust for estimation biases  collapsing branches to multi-furcations What happens if a tree-building algorithm always produces the same tree?

Parametric Tests (???)
What are the limits of the parsimony principle?
- especially for distantly related sequences
- compare the most parsimonious tree vs. a particular alternative (this can be used to estimate the significance of the built tree)

H. Kishino & M. Hasegawa (1989)
- Assume that informative sites within an alignment are both independent and equivalent.
- D: the difference between the minimum numbers of substitutions invoked by the two trees.
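A parsimony version of this comparison can be sketched from per-site step counts on the same informative sites. This is a minimal sketch, assuming the commonly quoted variance estimate Var(D) = n/(n-1) Σ(d_i - d̄)² over the n informative sites, with D = Σ d_i. It also assumes a normal approximation in which |z| > 1.96 suggests a difference at the 5% level. The slides do not give these formulas, and the per-site counts below are hypothetical. Three sites are far too few for the approximation and serve only as an illustration.

```python
import math
from typing import List, Tuple


def kh_parsimony_test(steps1: List[int], steps2: List[int]) -> Tuple[int, float, float]:
    """Per-site minimum substitutions of two trees (n >= 2 informative sites).
    Returns D, the estimated Var(D), and z = D / sqrt(Var(D))."""
    n = len(steps1)
    d = [a - b for a, b in zip(steps1, steps2)]
    D = sum(d)
    mean = D / n
    var_D = n / (n - 1) * sum((x - mean) ** 2 for x in d)
    if var_D == 0:
        z = 0.0 if D == 0 else math.copysign(math.inf, D)
    else:
        z = D / math.sqrt(var_D)
    return D, var_D, z


D, var_D, z = kh_parsimony_test([1, 1, 2], [2, 2, 1])
print(D, round(var_D, 6), round(z, 6))  # -1 4.0 -0.5
```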

Comparison of Phylogenetic Methods
If two different methods construct the same tree, that tree is very likely to be correct.

Implications of Molecular Phylogenies
- medicine: drug treatment
- agriculture: disease resistance factors
- conservation: recognizing extinct species

The Tree of Life
Carl Woese and his colleagues (1970s)
- 16S rRNA (possessed by all organisms)

Human Origins mtDNA  The mean difference between two human populations is about 0.33%.  The greatest differences are found in Alfrica, not across the different continents!  out-of-Africa theory  mtRNA & Y chromosome are consistent with this hypothesis

50 They concluded  mitochondrial Eve & Y chromosome Adam  200’000 years ago

References and Figure Sources
1. Dan E. Krane and Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin/Cummings.
2. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press.
3. Sylvia S. Mader, Biology, 8th edition, McGraw-Hill.