Phylogenetic Tree Construction and Related Problems Bioinformatics.

Problems
  Multiple Sequence Alignment
  Construction of Phylogenetic Trees
  Determining Genomic Distance through Rearrangements

Alignment of Multiple Sequences
We can extend the notion of alignment to multiple strings:
    -att-c-
    t-ttac-
    tat-a-t
An alignment of strings S1…Sn is described by strings S1’…Sn’:
  Each Si’ contains the characters of Si, in order, interspersed with spaces (-)
  No position contains a space in every Si’

Scoring a Multiple Alignment
Sum of pairs
  Consider each pair in the set of sequences, determine the similarity score (using gap, match, and mismatch weights) for each pair, and add all pairwise scores
Distance from consensus
  Consider all columns and count the total number of differences from the consensus character
  Variations: just count the characters that differ from the consensus, or assign a difference score to each differing character
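The two scoring schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weights (match = 1, mismatch = -1, gap = -2) are hypothetical, a column holding two gaps is assumed to contribute nothing to a pairwise score, and the gap symbol is assumed to count as an ordinary character when picking the consensus. The function names are made up for this example.

```python
from collections import Counter
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score two equal-length aligned rows column by column."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # assumed convention: a gap/gap column scores nothing
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score

def sum_of_pairs(rows):
    """Add the pairwise scores over every pair of rows."""
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))

def consensus_distance(rows):
    """Count, over all columns, the characters that differ from the
    most frequent character in that column (gaps count as characters)."""
    total = 0
    for column in zip(*rows):
        consensus_count = Counter(column).most_common(1)[0][1]
        total += len(column) - consensus_count
    return total

alignment = ["-att-c-", "t-ttac-", "tat-a-t"]
print(sum_of_pairs(alignment))        # -16 with the assumed weights
print(consensus_distance(alignment))  # 6
```

With other weights the sum-of-pairs value changes, which is why "optimal" always has to be read relative to the chosen scoring method.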

Multiple Alignment Problem
Formulation: Given sequences S1…Sn, obtain an optimal multiple alignment of the sequences
“Optimal” depends on the multiple alignment scoring method
No efficient algorithm is known that always finds an optimal alignment

Strategies
Brute-force algorithm: consider all possible alignments, then pick the one with the best score (time complexity?)
Common heuristic: use regular (two-string) alignment, then repeatedly add a string to a growing alignment
  O(nm^2), where n is the number of strings and m is the maximum string length
  Does not always produce an optimal alignment
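The building block of the heuristic is the regular two-string alignment, which takes O(m^2) time for strings of length at most m. A minimal sketch of that step (global alignment by dynamic programming) is given below. The weights and the function name align_pair are assumptions for illustration, and the sketch does not implement the step of merging a new string into a growing multiple alignment.

```python
def align_pair(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two strings; return the two gapped rows."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,   # a[i-1] aligned with a gap
                score[i][j - 1] + gap,   # b[j-1] aligned with a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b))

row1, row2 = align_pair("attc", "tttac")
print(row1)
print(row2)
```

Repeating this step once per added string is where the n factor in O(nm^2) comes from.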

Phylogenetic Tree Construction
Multiple alignments are often performed on sequences from similar species
Next step: construct an evolutionary tree of these species
Input to the problem: evolutionary distances between each pair of species
  Not the same as a similarity score
  Example: edit distance, i.e., count the number of edits (such as indels) needed to transform one string into the other
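As a rough illustration of how the pairwise input distances could be produced, the sketch below computes a unit-cost edit distance and fills a distance table for a few toy sequences. It assumes that a substitution counts as one edit alongside insertions and deletions; the species names and sequences are made up.

```python
def edit_distance(s, t):
    """Fewest single-character insertions, deletions, or substitutions
    needed to turn s into t (unit cost per edit, an assumption)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,             # delete a
                cur[j - 1] + 1,          # insert b
                prev[j - 1] + (a != b),  # keep or substitute
            ))
        prev = cur
    return prev[-1]

def distance_matrix(seqs):
    """Map every ordered pair of names to the edit distance of their sequences."""
    return {(x, y): edit_distance(seqs[x], seqs[y]) for x in seqs for y in seqs}

# Hypothetical toy data, not real sequences.
toy = {"dog": "attc", "cat": "tttac", "bat": "tatat"}
for pair, d in distance_matrix(toy).items():
    print(pair, d)
```

The resulting table is symmetric with zeros on the diagonal, which is the shape of input the tree construction problem expects.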

Problem Formulation
Given a set of species and the evolutionary distance between each pair (a 2D matrix of numbers), construct a phylogenetic tree consistent with the distances
Details needed:
  Characteristics/constraints of a tree
  Notion of “consistent”

Phylogenetic Tree
Rooted
Leaves are species
Internal nodes are speculated ancestors
A distance is associated with each edge

Example
[Figure: rooted tree with root "ancestor", leaves cat, bat, and dog, and edge distances 1, 4, 2, and 3]

Minimizing Distance Deviation
A phylogenetic tree implies pairwise distances (sum the edge distances along the path between two leaves)
Compare with the input distances to assess consistency
  Sum of differences
  Alternative computations for the deviation (e.g., least squares)

Example
[Figure: tree with root "ancestor" and leaves cat, bat, and dog]
Suppose
  Input: D(dog,cat) = 4, D(dog,bat) = 7, D(cat,bat) = 10
  Implied by tree: D(dog,cat) = 5, D(dog,bat) = 7, D(cat,bat) = 8
Difference = |5-4| + |7-7| + |8-10| = 3
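The arithmetic in this example can be checked with a short sketch. The tree shape and edge lengths below are assumptions chosen to reproduce the implied distances 5, 7, and 8: dog and cat hang off a hypothetical internal node "x" with edges 2 and 3, "x" joins the root with edge 1, and bat joins the root with edge 4. Swapping the 1 and the 4 would give the same leaf-to-leaf distances.

```python
# child -> (parent, edge length); the exact layout is an assumption
parent = {
    "dog": ("x", 2),
    "cat": ("x", 3),
    "x": ("ancestor", 1),
    "bat": ("ancestor", 4),
}

def distances_to_ancestors(node):
    """Map node and each of its ancestors to their distance from node."""
    dist = {node: 0}
    total = 0
    while node in parent:
        node, length = parent[node]
        total += length
        dist[node] = total
    return dist

def tree_distance(a, b):
    """Path length between two nodes: meet at their lowest common ancestor."""
    da, db = distances_to_ancestors(a), distances_to_ancestors(b)
    return min(da[n] + db[n] for n in da if n in db)

observed = {("dog", "cat"): 4, ("dog", "bat"): 7, ("cat", "bat"): 10}
implied = {pair: tree_distance(*pair) for pair in observed}

deviation = sum(abs(implied[p] - observed[p]) for p in observed)
least_squares = sum((implied[p] - observed[p]) ** 2 for p in observed)
print(implied)        # distances implied by the tree
print(deviation)      # 3, sum of absolute differences
print(least_squares)  # 5, one alternative deviation measure
```

Using absolute (or squared) differences keeps over- and under-estimates from cancelling each other out.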

Character-based Tree Construction
Input:
  a set C of m characters
  the possible values for each character
  a set S of n species, where each element of S is associated with a value for each character in C
Output: a phylogenetic tree T such that
  the elements of S are the leaves of T
  each internal node of T has an assigned value per character
  each character value induces a connected subtree of T
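The last output condition can be tested mechanically for a given labeled tree. The sketch below relies on a basic fact about trees: a set of k nodes forms a connected subtree exactly when k - 1 tree edges have both endpoints in the set. The tree, the node names, and the character values are hypothetical, and the function only checks a candidate tree for one character; it does not construct one.

```python
from collections import Counter

def induces_connected_subtrees(edges, value_of):
    """Check, for one character, that the nodes sharing each value
    form a connected subtree of the tree given by edges."""
    nodes_per_value = Counter(value_of.values())
    # Count tree edges whose two endpoints carry the same value.
    edges_per_value = Counter(
        value_of[u] for u, v in edges if value_of[u] == value_of[v]
    )
    return all(
        edges_per_value[value] == count - 1
        for value, count in nodes_per_value.items()
    )

# Hypothetical tree: root "ancestor", internal node "x", three leaves.
edges = [("ancestor", "x"), ("ancestor", "bat"), ("x", "dog"), ("x", "cat")]

# Hypothetical character "has wings" (1 = yes, 0 = no).
good = {"ancestor": 0, "x": 0, "dog": 0, "cat": 0, "bat": 1}
bad = {"ancestor": 0, "x": 0, "dog": 1, "cat": 0, "bat": 1}

print(induces_connected_subtrees(edges, good))  # True
print(induces_connected_subtrees(edges, bad))   # False: dog and bat are not adjacent
```

A full check would run this once per character in C.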

Mutations on Genomes
Mutations can occur at the gene level (indels of nucleotides) or at the genome level (operations on genes)
At the genome level, for similar species, the operations are rearrangements:
  Reversals
  Transpositions
  Translocations

Genomes, Chromosomes, and Permutations
Genome: set of chromosomes
Chromosome: sequence of genes
Genes: unique across the entire genome
Can simplify the representation of a genome by arbitrarily concatenating the chromosomes
  Result: a permutation of a set of unique genes

Determining Genomic Distance
Recall: phylogenetic tree construction needs pairwise distances
For similar species with the same set of genes:
  Can use the number of rearrangement operations as the distance
  Common distance model: the fewest rearrangements necessary to transform genome X into genome Y

Some Assumptions Made
Can limit the allowed rearrangements to some reasonable subset of the possible rearrangements
  e.g., reversals are apparently the most common in plant species
Sometimes gene blocks, instead of individual genes, are the elements that are permuted

Problem Formulations
Given permutations (species) p and q of the set {1…n}, find the shortest sequence of rearrangements (mutations) that transforms p into q
  p · r1 · r2 · … · rs = q
Given a permutation p, find the shortest sequence of rearrangements that sorts p
  p · r1 · r2 · … · rs = 1 2 … n
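For very small permutations, both formulations can be explored by exhaustive breadth-first search. The sketch below assumes that reversals of unsigned genes are the only allowed rearrangement, and it represents each reversal as a 0-based half-open slice (i, j). The function names are made up for this example. Since there are n! permutations, this is only a way to make the definitions concrete, not a practical algorithm.

```python
from collections import deque

def reverse_segment(perm, i, j):
    """Apply one reversal: return perm with perm[i:j] reversed."""
    return perm[:i] + perm[i:j][::-1] + perm[j:]

def shortest_reversals(p, q=None):
    """Return a shortest list of reversals (i, j) turning p into q.
    If q is omitted, the target is the sorted order of p."""
    start = tuple(p)
    target = tuple(sorted(p)) if q is None else tuple(q)
    came_from = {start: None}  # state -> (previous state, reversal used)
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            steps = []
            while came_from[current] is not None:
                current, move = came_from[current]
                steps.append(move)
            return steps[::-1]
        # Try every reversal of a segment of length at least 2.
        for i in range(len(current) - 1):
            for j in range(i + 2, len(current) + 1):
                nxt = reverse_segment(current, i, j)
                if nxt not in came_from:
                    came_from[nxt] = (current, (i, j))
                    queue.append(nxt)
    raise ValueError("q is not a rearrangement of p")

steps = shortest_reversals((3, 1, 2))
print(len(steps), steps)  # two reversals are needed to sort 3 1 2
```

The length of the returned list is the reversal distance between the two permutations, which can then serve as the pairwise distance for tree construction.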