Parsimony Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.

Slides:



Advertisements
Similar presentations
Parsimony Small Parsimony and Search Algorithms Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Phylogenetic Tree A Phylogeny (Phylogenetic tree) or Evolutionary tree represents the evolutionary relationships among a set of organisms or groups of.
. Class 9: Phylogenetic Trees. The Tree of Life Evolution u Many theories of evolution u Basic idea: l speciation events lead to creation of different.
An Introduction to Phylogenetic Methods
Multiple Sequence Alignment & Phylogenetic Trees.
A Lite Introduction to (Bioinformatics and) Comparative Genomics Chris Mueller August 10, 2004.
1 Dan Graur Methods of Tree Reconstruction. 2 3.
Computing a tree Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
BALANCED MINIMUM EVOLUTION. DISTANCE BASED PHYLOGENETIC RECONSTRUCTION 1. Compute distance matrix D. 2. Find binary tree using just D. Balanced Minimum.
Parsimony based phylogenetic trees Sushmita Roy BMI/CS 576 Sep 30 th, 2014.
 Aim in building a phylogenetic tree is to use a knowledge of the characters of organisms to build a tree that reflects the relationships between them.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
Molecular Evolution Revised 29/12/06
Tree Reconstruction.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
UPGMA and FM are distance based methods. UPGMA enforces the Molecular Clock Assumption. FM (Fitch-Margoliash) relieves that restriction, but still enforces.
Review of cladistic technique Shared derived (apomorphic) traits are useful in understanding evolutionary relationships Shared primitive (plesiomorphic)
Bioinformatics and Phylogenetic Analysis
Distance methods. UPGMA: similar to hierarchical clustering but not additive Neighbor-joining: more sophisticated and additive What is additivity?
BME 130 – Genomes Lecture 26 Molecular phylogenies I.
Phylogenetic reconstruction
. Class 9: Phylogenetic Trees. The Tree of Life D’après Ernst Haeckel, 1891.
. Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony Based on presentations by Dan Geiger, Shlomo.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
Building Phylogenies Distance-Based Methods. Methods Distance-based Parsimony Maximum likelihood.
Gene Ontology and Functional Enrichment Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Phylogenetic trees Sushmita Roy BMI/CS 576
Sequence Alignment and Phylogenetic Prediction using Map Reduce Programming Model in Hadoop DFS Presented by C. Geetha Jini (07MW03) D. Komagal Meenakshi.
Phylogenetic analyses Kirsi Kostamo. The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among.
Terminology of phylogenetic trees
Molecular phylogenetics
Molecular evidence for endosymbiosis Perform blastp to investigate sequence similarity among domains of life Found yeast nuclear genes exhibit more sequence.
Parsimony and searching tree-space Phylogenetics Workhop, August 2006 Barbara Holland.
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
1 Dan Graur Molecular Phylogenetics Molecular phylogenetic approaches: 1. distance-matrix (based on distance measures) 2. character-state.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Molecular phylogenetics 1 Level 3 Molecular Evolution and Bioinformatics Jim Provan Page and Holmes: Sections
Computational Biology, Part D Phylogenetic Trees Ramamoorthi Ravi/Robert F. Murphy Copyright  2000, All rights reserved.
BINF6201/8201 Molecular phylogenetic methods
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
A Lite Introduction to (Bioinformatics and) Comparative Genomics Chris Mueller November 18, 2004 Based on the Genomics in Biomedical Research course at.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Phylogenetic Prediction Lecture II by Clarke S. Arnold March 19, 2002.
Fixations along phylogenetic lineages. Phylogenetic reconstruction: a simplification of the evolutionary process.
Phylogenetic Trees  Importance of phylogenetic trees  What is the phylogenetic analysis  Example of cladistics  Assumptions in cladistics  Frequently.
Introduction to Phylogenetics
Calculating branch lengths from distances. ABC A B C----- a b c.
Phylogenetic Analysis Gabor T. Marth Department of Biology, Boston College BI420 – Introduction to Bioinformatics Figures from Higgs & Attwood.
Introduction to Phylogenetic trees Colin Dewey BMI/CS 576 Fall 2015.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Phylogeny Ch. 7 & 8.
Phylogenetic trees Sushmita Roy BMI/CS 576 Sep 23 rd, 2014.
1 Alignment Matrix vs. Distance Matrix Sequence a gene of length m nucleotides in n species to generate an… n x m alignment matrix n x n distance matrix.
Phylogenetic Trees - Parsimony Tutorial #13
Phylogenetics-2 Marek Kimmel (Statistics, Rice)
Distance-based methods for phylogenetic tree reconstruction Colin Dewey BMI/CS 576 Fall 2015.
Occam's razor: states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference to any.
Phylogenetic trees. 2 Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features.
Introduction to Bioinformatics Resources for DNA Barcoding
Phylogenetic basis of systematics
Phylogeny - based on whole genome data
Distance based phylogenetics
Multiple Alignment and Phylogenetic Trees
Methods of molecular phylogeny
Inferring phylogenetic trees: Distance and maximum likelihood methods
Phylogenetic Trees.
Phylogenetic tree based on 16S rRNA gene sequence comparisons over 1,260 aligned bases showing the relationship between species of the genus Actinomyces.
Phylogeny.
Presentation transcript:

Parsimony Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein

Who am I? Faculty at Genome Sciences Computational systems biologist Training: CS, physics, hi-tech, biology Research interests: Complex biological networks | Evolutionary dynamics | Microbial communities and metagenomics What will change? Not much! Informatics: From sequence to genes and to systems Programming: More emphasis on design and coding practices Tip of the day Coding style Website:

A quick review Trees: Represent sequence relationships A sequence tree has a topology and branch lengths (distances) The number of tree topologies grows very fast! Distance trees Compute pairwise corrected distances Build tree by sequential clustering algorithm (UPGMA or Neighbor-Joining). These algorithms don't consider all tree topologies, so they are very fast, even for large trees.

Maximum Parsimony Algorithm A fundamentally different method: Instead of reconstructing a tree, we will search for the best tree.

Pluralitas non est ponenda sine necessitate

William of Ockham (c – c. 1348) (Maximum) Parsimony Principle Pluralitas non est ponenda sine necessitate (plurality should not be posited without necessity) William of Ockham Occams Razor: Of two equivalent theories or explanations, all other things being equal, the simpler one is to be preferred. "when you hear hoof beats, think horses, not zebras Medical diagnosis The KISS principle: "Keep It Simple, Stupid!" Kelly Johnson, Engineer Make everything as simple as possible, but not simpler Albert Einstein

Parsimony principle for phylogenetic trees Find the tree that requires the fewest evolutionary changes!

Consider 4 species

positions in alignment (usually called "sites) Sequence data: The same approach would work for any discrete property that can be associated with the various species: Gene content (presence/absence of each gene) Morphological features (e.g., has wings, purple or white flowers) Numerical features (e.g., number of bristles)

Consider 4 species positions in alignment (usually called "sites) Sequence data: Parsimony Algorithm 1)Construct all possible trees 2)For each site in the alignment and for each tree count the minimal number of changes required 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score

Consider 4 species All possible unrooted trees: H closest to C Sequence data: H closest to G or H closest to O or

Consider 4 species All possible unrooted trees: H closest to C Sequence data: H closest to G or H closest to O or For each site and for each tree count the minimal number of changes required:

c c a a c c Consider site 1 What is the minimal number of evolutionary changes that can account for the observed pattern? (Note: This is the small parsimony problem)

c c a a c c Consider site 1 What is the minimal number of evolutionary changes that can account for the observed pattern? (Note: This is the small parsimony problem)

c c a a c c Consider site 1

c c a a c c

Uninformative (no changes) Consider site 2

Consider site 3

Put sites 1 and 3 together

Which tree is the most parsimonious? Now put all of them together parsimony score

1)Construct all possible trees 2)For each site in the alignment and for each tree count the minimal number of changes required 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score The parsimony algorithm

1)Construct all possible trees 2)For each site in the alignment and for each tree count the minimal number of changes required 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score The parsimony algorithm Too many!

1)Construct all possible trees 2)For each site in the alignment and for each tree count the minimal number of changes required 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score The parsimony algorithm Too many! How?

1)Construct all possible trees 2)For each site in the alignment and for each tree count the minimal number of changes required 3)Add all sites up to obtain the total number of changes for each tree 4)Pick the tree with the lowest score The parsimony algorithm Too many! How? Fitchs algorithm Search algorithm