Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.

Outline
Comments about Trees
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) analysis
Other uses of phylogenetic trees
Conversion of Alignment Scores to distances
Maximum Likelihood Approach
Comments on Neighbor Joining Algorithm
Conclusion

Comments on Trees
Trees give insight into the underlying data.
Identical trees can look different depending on the method of display.
Information may be lost when creating the tree.
The tree is not the underlying data.

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) Algorithm
1. Create the distance matrix.
2. Build an internal representation of the tree:
   – Find the two closest members.
   – Combine the members.
   – Calculate new distances for the combined node.
   – Repeat until only one node is left (the root node).
3. Draw the tree:
   DrawTree(Tree):
     If Tree is a single node:
       Draw the node and exit
     Else:
       Find the short and tall subtrees
       DrawTree(Tall_Tree)    (recursive call)
       DrawTree(Short_Tree)   (recursive call)
       DrawConnection(Tall_Tree, Short_Tree)
       Exit
4. Calculate the loss of information (cophenetic correlation coefficient), a number between -1 and 1:
    1  perfect correlation
    0  no correlation
   -1  perfect reverse correlation
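Step 2 of the algorithm above can be sketched in a few lines of Python. This is only a minimal illustration, not the spreadsheet implementation used in the lecture: the function name `upgma`, the nested-tuple tree representation, and the toy distances are all assumptions made for the example. Drawing and the cophenetic correlation are left out.

```python
def upgma(names, dist):
    """Cluster leaves with UPGMA (hypothetical helper, not from the slides).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns (tree, root_height): tree is a nested tuple of labels and
    root_height is half the distance at which the last two clusters merge.
    """
    # Each cluster id maps to (subtree, number of leaves in the subtree).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d[frozenset((i, j))] = dist[frozenset((names[i], names[j]))]
    next_id = len(names)
    root_height = 0.0
    while len(clusters) > 1:
        # Find the two closest members.
        pair = min(d, key=d.get)
        a, b = sorted(pair)
        root_height = d[pair] / 2.0
        (tree_a, size_a) = clusters.pop(a)
        (tree_b, size_b) = clusters.pop(b)
        # New distances: average over all leaves in the combined node.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                size_a * d[frozenset((a, k))] + size_b * d[frozenset((b, k))]
            ) / (size_a + size_b)
        # Drop every distance that still refers to the two merged clusters.
        d = {key: v for key, v in d.items() if a not in key and b not in key}
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, root_height


# Toy distances (made up for illustration).
leaves = ["A", "B", "C", "D"]
toy = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree, root_height = upgma(leaves, toy)
```

With these toy distances A and B merge first, then C joins them, and D joins last.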

Distance Matrix of the 16S rRNA Gene
Global alignments were done between 6 species of bacteria.
The sequences were 500-base-pair sequences from MIDI LABS.
Mismatches were used as the data points for the distance matrix.
sequences.txt alignments.txt
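Counting mismatches between aligned sequences can be sketched as follows. The sequences here are short made-up strings, not the MIDI LABS data, and the helper names are hypothetical. The sketch assumes each pair is already globally aligned to equal length, and it counts a gap opposite a base as a mismatch, which is an assumption rather than something the slide specifies.

```python
from itertools import combinations


def mismatch_count(a, b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def distance_matrix(seqs):
    """Return a symmetric dict-of-dicts of pairwise mismatch counts."""
    names = list(seqs)
    dist = {n: {m: 0 for m in names} for n in names}
    for p, q in combinations(names, 2):
        dist[p][q] = dist[q][p] = mismatch_count(seqs[p], seqs[q])
    return dist


# Toy aligned sequences (hypothetical).
toy = {"sp1": "ACGTACGT", "sp2": "ACGTACGA", "sp3": "TCGAACGA"}
matrix = distance_matrix(toy)
```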

UPGMA Analysis
UPGMA spreadsheet: UPGMAfinal.xls

Other uses of phylogenetic trees
Verification of Taxonomy
– Organisms were classified into various groups before gene sequencing.
– Is there a relationship between genetic differences and existing taxonomy?
– bacpseu.txt CLUSTALW.doc
– bacpseustaph.txt Taxonomy.doc
Identification of Unknowns
– The unknown is placed in the tree along with known samples.
– The relationship between the known and unknown samples allows for identification.
– unknown_id.txt Unknown_Results.doc
Non-genetic analysis (Fatty Acids)
– FattyAcid_PseuBaci.rtf

Conversion of Alignment Scores to Distances
Alignment scores are large for similar sequences.
Distance methods require that the distances between similar sequences be smaller than the distances between less similar sequences.
Large alignment scores need to be mapped to small distances and vice versa.
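The slide does not say which mapping was used. One common choice, shown here only as an assumed example, is a Feng-Doolittle style conversion: the observed score is normalized between a random-alignment score and an identity score, and the negative logarithm of that fraction is taken. The function and parameter names are hypothetical.

```python
import math


def score_to_distance(s_obs, s_rand, s_ident):
    """Map an alignment score to a distance (assumed Feng-Doolittle style).

    s_obs:   observed alignment score for the pair.
    s_rand:  score expected for unrelated (e.g. shuffled) sequences.
    s_ident: score for identical sequences (e.g. mean self-alignment score).
    """
    s_eff = (s_obs - s_rand) / (s_ident - s_rand)
    if s_eff <= 0:
        raise ValueError("observed score is not above the random baseline")
    # High scores give s_eff near 1, hence distances near 0.
    return -math.log(s_eff)


close = score_to_distance(90, 10, 100)   # similar pair
far = score_to_distance(50, 10, 100)     # less similar pair
```

Any monotonically decreasing mapping satisfies the requirement stated above; the logarithm is one option among several.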

Maximum Likelihood Analysis
Similar to Maximum Parsimony, except that nucleotide substitutions are not assumed to be equally probable.
All possible unrooted trees are evaluated. (Same for Parsimony)
Each column of the alignment is processed. (Same for Parsimony)
The substitution A -> T will have a different probability than the substitution G -> C.
Start with a table that specifies the probability of one base being substituted for another base. See probabilities of nucleotide substitution. (Table 6.5 pg 275)
The probability that an unrooted tree predicts each column of the alignment is calculated.
The probabilities for the columns are combined for each tree by multiplying them (equivalently, by summing their logarithms).
The unrooted tree with the highest probability is chosen.

Maximum Likelihood Example
Four sequences are compared (w, x, y and z).
All unrooted trees are shown.
In this example we will examine the first unrooted tree.

Maximum Likelihood Example Continued
L(Tree x) = L0 * L1 * L2 * L3 * L4 * L5 * L6
L0: base probability of the nucleotide at node 0 (0.25).
L1: probability of the nucleotide changing from the value at node 0 to the value at node 1.
L2: probability of the nucleotide changing from the value at node 0 to the value at node 2.
L3: probability of the nucleotide changing from the value at node 1 to the value at node 3 (T).
L4, L5, L6: probability of the nucleotide changing to the value at the leaf.

Maximum Likelihood Example Continued
There are 64 likelihood trees to evaluate: (number of bases) ^ (number of internal nodes), or 4^3.
We will show the evaluation of TTG against the first unrooted tree for column TTAG.
Determine values for L0, ..., L6. Values are determined by looking up probabilities in the transition probability table.
– Probability of L2 is T -> G
– Probability of L5 is G -> A
– Probability of L3 is T -> T
Determine the combined probability L0 * L1 * L2 * ... * L6.
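The evaluation of one column can be sketched in Python. The node layout is inferred from the example (node 0 is the root with internal nodes 1 and 2 below it, leaves 3 and 4 hang off node 1, leaves 5 and 6 off node 2). The substitution probabilities are placeholders, not the values from Table 6.5: 0.7 to keep a base and 0.1 for each change is an assumption made purely so the code runs.

```python
from itertools import product

BASES = "ACGT"


def p_sub(x, y):
    """Hypothetical substitution probability (NOT Table 6.5)."""
    return 0.7 if x == y else 0.1


def assignment_likelihood(n0, n1, n2, leaves):
    """L0 * L1 * ... * L6 for one choice of bases at internal nodes 0, 1, 2."""
    l3, l4, l5, l6 = leaves
    return (
        0.25                 # L0: base probability at node 0
        * p_sub(n0, n1)      # L1: node 0 -> node 1
        * p_sub(n0, n2)      # L2: node 0 -> node 2
        * p_sub(n1, l3)      # L3: node 1 -> leaf 3
        * p_sub(n1, l4)      # L4: node 1 -> leaf 4
        * p_sub(n2, l5)      # L5: node 2 -> leaf 5
        * p_sub(n2, l6)      # L6: node 2 -> leaf 6
    )


def column_likelihood(leaves):
    """Sum over all 4^3 = 64 assignments of bases to the internal nodes."""
    return sum(
        assignment_likelihood(n0, n1, n2, leaves)
        for n0, n1, n2 in product(BASES, repeat=3)
    )


assignments = list(product(BASES, repeat=3))
ttg = assignment_likelihood("T", "T", "G", "TTAG")
column = column_likelihood("TTAG")
```

For the assignment TTG the lookups are the ones listed above: L2 is T -> G, L5 is G -> A, and L3 is T -> T.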

Maximum Likelihood Example Continued
Determine the probability for the combination TGG.
Determine the probability for the other 62 combinations.
Sum all the trees together: L(Tree) = L(Tree1) + L(Tree2) + ... + L(Tree64)
Move to the next column and repeat the same procedure.
Once all columns are complete, combine the column probabilities by multiplying them (equivalently, by summing their logarithms). This is the likelihood of the first unrooted tree.
Continue this process for the other unrooted trees.
Pick the unrooted tree with the highest probability. This is the most likely unrooted tree.
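Written out as a formula, the procedure looks like this. The symbols $C$ (number of alignment columns), $L_c$ (likelihood of column $c$), and $n_0, n_1, n_2$ (bases assigned to the three internal nodes) are notation introduced here, not taken from the slides. In the usual formulation the column likelihoods are multiplied, which is the same as summing their logarithms.

```latex
L_c = \sum_{(n_0, n_1, n_2) \in \{A, C, G, T\}^3} L_0 \, L_1 \, L_2 \, L_3 \, L_4 \, L_5 \, L_6,
\qquad
L(\text{tree}) = \prod_{c=1}^{C} L_c,
\qquad
\ln L(\text{tree}) = \sum_{c=1}^{C} \ln L_c
```

The inner sum has $4^3 = 64$ terms, one per assignment of bases to the internal nodes.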

Comments on Neighbor Joining compared with Fitch
Not nearest neighbor (the objective is to create the smallest tree).
Neighbor Joining is almost identical to Fitch except for the evaluation function.
– Start with a star.
– Evaluate all possibilities by combining any two nodes and running Fitch.
– Evaluate the size of the tree by summing the lengths of the branches.
– Pick the smallest and continue.
Evaluation of Fitch is done by calculating the predicted distance between each pair of sequences for each tree to find the tree that best fits the original data.
Question: Is summing the branches faster than calculating the predicted distances?
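The point that Neighbor Joining is not "nearest neighbor" can be illustrated with the standard Saitou-Nei pair-selection criterion, Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). Note that this is the textbook shortcut for picking the pair that minimizes total tree length, not the run-Fitch-for-every-pair procedure described on the slide. The function name and toy distances are assumptions.

```python
def nj_pick_pair(names, d):
    """Return the pair of nodes with the smallest Q value (first one on ties)."""
    n = len(names)
    totals = {i: sum(d[i][k] for k in names) for i in names}
    best = None
    for idx, i in enumerate(names):
        for j in names[idx + 1:]:
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    return best[1], best[2]


# Toy additive distances for the tree ((A:1, B:4), (C:1, D:4)) with an
# internal branch of length 1 (made up for illustration).
names = ["A", "B", "C", "D"]
d = {
    "A": {"A": 0, "B": 5, "C": 3, "D": 6},
    "B": {"A": 5, "B": 0, "C": 6, "D": 9},
    "C": {"A": 3, "B": 6, "C": 0, "D": 5},
    "D": {"A": 6, "B": 9, "C": 5, "D": 0},
}
joined = nj_pick_pair(names, d)
closest = min(
    ((i, j) for x, i in enumerate(names) for j in names[x + 1:]),
    key=lambda p: d[p[0]][p[1]],
)
```

Here the closest pair by raw distance is A and C, yet the criterion joins A and B, which are the true neighbors in the toy tree.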

Conclusion
Phylogenetic Prediction can be used for more than Evolutionary Distance
– Verification of Taxonomy
– Identification of unknowns
– Techniques work for genetic and non-genetic data (Fatty Acid).
Use multiple methods for verification
– Pick at least two different types of methods from Parsimony, Distance and Likelihood.
– If the analyses are in agreement, there is a higher level of confidence that the analysis is correct.