Multiple sequence comparison (MSC) Reading: Setubal/Meidanis, 3.4 Gusfield, Algorithms on Strings, Trees and Sequences, chapter 14.



Why care about similarity? Similar sequences have similar structure

Similar structure -> similar sequence? No, the converse is not true! Convergent evolution. Outwardly similar solutions to similar problems may be internally different. Tiger and ‘Tasmanian tiger’. Fish and dolphin. Bat and bird. Same is true of molecular ‘species’ and ‘anatomies’!

Sequence --> function
Similar sequences have similar function.
‘[T]he same genes that work in flies are the ones that work in humans.’ -- Eric Wieschaus, 1995 Nobel Prize for Drosophila work

Common origins
–Similar sequences have common origins
–‘Descent with modification’ is Nature’s design mechanism
–Strong similarity may imply recent common origin (what do we mean by ‘strong’ and ‘recent’?)
–Strong similarity may imply strong conservation of sequence or motif

Is multiple sequence comparison a generalization?
From a CS point of view, we’re going from two strings to many strings: a generalization.
–Yes, in that it helps detect faint similarities
–No, in that we go from known biological similarity to suspected sequence similarity

‘Big’ uses for MSC
–Represent protein families
–Identify conserved sequence features
–Deduce evolutionary history

Profile representation
Definition: given a multiple alignment of a set of strings, a profile specifies for each column the frequency of each character.

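Profile example
Alignment:
a b c - a
a b a b a
a c c b -
c b - b c
Profile (fraction of each character in each column):
     C1    C2    C3    C4    C5
a    .75   0     .25   0     .50
b    0     .75   0     .75   0
c    .25   .25   .50   0     .25
-    0     0     .25   .25   .25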
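A minimal Python sketch of the profile definition, reading the example alignment as four rows of five characters (an assumption about the flattened slide layout). The name build_profile is illustrative and not from the slides.

```python
from collections import Counter


def build_profile(rows):
    """Return one {character: frequency} dict per alignment column."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all alignment rows must have the same length")
    k = len(rows)
    # zip(*rows) walks the alignment column by column.
    return [{ch: n / k for ch, n in Counter(col).items()} for col in zip(*rows)]


alignment = ["abc-a", "ababa", "accb-", "cb-bc"]
profile = build_profile(alignment)
for index, column in enumerate(profile, start=1):
    print(f"C{index}", column)
```

Characters that never occur in a column are simply absent from that column's dict, which is equivalent to a frequency of 0.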

Fit string S to profile P
Given a profile P and a string S, what is the best alignment (fit) of S to P?
Example:
S: a a b - b c
P:

Two key issues
–How to score an alignment of a string to a profile
–How to compute an optimal alignment, given a scoring system

Scoring and alignment of profile
Scoring: assuming letter-to-letter scores are given, score a character against a column as the frequency-weighted sum of its scores against each character in that column.
Optimal alignment: by DP, similar to string-to-string (S-S) optimal alignment.
Q: How would you do profile-to-profile scoring and alignment?
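The weighted-sum scoring and the DP can be sketched in Python as follows. The letter-to-letter scores (match +1, mismatch -1, character against gap -2, gap against gap 0) and the names pair_score, column_score and fit_string_to_profile are assumptions chosen for illustration; the slides do not specify a scoring scheme.

```python
GAP = "-"


def pair_score(a, b):
    """Assumed letter-to-letter scores (not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def column_score(ch, column):
    """Weighted sum: score of ch against each character, weighted by frequency."""
    return sum(freq * pair_score(ch, y) for y, freq in column.items())


def fit_string_to_profile(s, profile):
    """Best global alignment score of string s against a list of profile columns."""
    n, m = len(s), len(profile)
    V = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # characters of s placed against no column
        V[i][0] = V[i - 1][0] + pair_score(s[i - 1], GAP)
    for j in range(1, m + 1):  # profile columns placed against a gap in s
        V[0][j] = V[0][j - 1] + column_score(GAP, profile[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + column_score(s[i - 1], profile[j - 1]),
                V[i - 1][j] + pair_score(s[i - 1], GAP),
                V[i][j - 1] + column_score(GAP, profile[j - 1]),
            )
    return V[n][m]
```

The recurrence has the same three cases as pairwise global alignment; only the cost of each case is replaced by a score against a whole column.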

Signature (motif) representation
A motif is a regular expression (re).
Example: a helicase motif
[&H][&A]D[DE]x_n[TSN][x_4][QK]Gx_7[&A], where
–[abc] = any of a, b, c
–& = [ILVMFYW]
–x = any amino acid
–a_3 = up to 3 a’s
–a_n = any number of a’s
Find a motif by grep-ing
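As a rough illustration of "grep-ing" for a motif, the helicase signature can be hand-translated into a Python regular expression. This sketch assumes the motif reads [&H][&A]D[DE]x_n[TSN]x_4[QK]Gx_7[&A] (the transcript is garbled at this point), that "up to k" maps to {0,k}, and that "any amino acid" can be approximated by the regex dot. The names HELICASE_MOTIF and find_motif are hypothetical.

```python
import re

HYDROPHOBIC = "ILVMFYW"  # the slide's '&' class

HELICASE_MOTIF = re.compile(
    f"[{HYDROPHOBIC}H]"  # [&H]
    f"[{HYDROPHOBIC}A]"  # [&A]
    "D"
    "[DE]"
    ".*?"                # x_n: any number of residues
    "[TSN]"
    ".{0,4}"             # x_4: up to 4 residues
    "[QK]"
    "G"
    ".{0,7}"             # x_7: up to 7 residues
    f"[{HYDROPHOBIC}A]"  # [&A]
)


def find_motif(protein):
    """Return the (start, end) span of the first motif match, or None."""
    match = HELICASE_MOTIF.search(protein)
    return match.span() if match else None


# Made-up test sequence, not a real helicase.
print(find_motif("MMHADDQQQTAAKGLLLLAMM"))
```

Because x_n is unbounded, such a pattern can match across long stretches of sequence; a real search would probably cap that gap.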

Finding optimal MS alignment
–Need a scoring system
–Given a scoring system, need an (efficient) method of calculation
–If there is no efficient method of getting the right answer, need an efficient way of getting a plausible answer

Need MSC measure
Desirable characteristics:
–variable number of sequences
–column-wise calculation
–order independence
MQPILLL
MLR-LL-
MK-ILLL
MPPVLIL

Sum-of-pairs (SP) measure
–Column score = sum of the pairwise scores over all (k choose 2) pairs of characters in the column
–Reduces to pairwise alignment when k = 2
–Need to assign a value to (-,-)
–May compute in either row or column order
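A short Python sketch of the SP measure, computed both column by column and row pair by row pair to show that the two orders agree. The pairwise scores (match +1, mismatch -1, character against gap -2, and 0 for the (-,-) pair) are assumptions for illustration, as are the function names.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b):
    """Assumed pairwise scores; (-,-) is given the value 0."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def sp_score_by_columns(rows):
    """Sum, over columns, of the scores of all (k choose 2) pairs in the column."""
    return sum(
        pair_score(a, b) for col in zip(*rows) for a, b in combinations(col, 2)
    )


def sp_score_by_rows(rows):
    """Sum, over all pairs of rows, of the induced pairwise alignment score."""
    return sum(
        sum(pair_score(a, b) for a, b in zip(r, s))
        for r, s in combinations(rows, 2)
    )


alignment = ["MQPILLL", "MLR-LL-", "MK-ILLL", "MPPVLIL"]
print(sp_score_by_columns(alignment), sp_score_by_rows(alignment))
```

With two rows there is a single pair per column, so the SP score is exactly the ordinary pairwise alignment score.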

DP approach
–Generalization of two-sequence comparison
–k-dimensional array: space complexity is O(n^k)
–MSC with SP measure is NP-complete
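To make the k-dimensional generalization concrete, here is a small memoized Python sketch that computes the optimal SP score for k sequences. Each cell is a tuple of k prefix lengths, and each of the 2^k - 1 moves advances some non-empty subset of the sequences. The scoring values and function names are assumptions for illustration, and the approach is only practical for a few short sequences.

```python
from functools import lru_cache
from itertools import combinations, product

GAP = "-"


def pair_score(a, b):
    """Assumed pairwise scores (not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1


def column_score(col):
    return sum(pair_score(a, b) for a, b in combinations(col, 2))


def optimal_sp_score(seqs):
    """Optimal sum-of-pairs score over all multiple alignments of seqs."""
    k = len(seqs)
    # Every move consumes one character from a non-empty subset of sequences.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    @lru_cache(maxsize=None)
    def best(pos):
        if not any(pos):
            return 0  # all prefixes empty
        candidates = []
        for m in moves:
            if all(p >= d for p, d in zip(pos, m)):
                prev = tuple(p - d for p, d in zip(pos, m))
                col = [seqs[i][pos[i] - 1] if m[i] else GAP for i in range(k)]
                candidates.append(best(prev) + column_score(col))
        return max(candidates)

    return best(tuple(len(s) for s in seqs))
```

The memo table holds one entry per cell, that is (n+1)^k entries for k sequences of length n, which is where the O(n^k) space bound comes from.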

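MSA speedup heuristic
This ‘heuristic’ guarantees the right answer! But... it doesn’t guarantee the speedup.
General idea:
–find an upper bound L on the cost of the optimal alignment (the cost of any known alignment will do)
–if a lower bound on the cost of any alignment passing through a cell exceeds L, that cell cannot enter into the optimal solution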

Common method: iterative
Simplest implementation:
–Begin with S_i and S_j, the pairwise closest strings
–Iteratively merge in the additional string with the smallest edit distance from any string already in the multiple alignment
–Equivalent to finding a minimum spanning tree (MST) on the edit-distance graph
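The merge order used by the iterative method can be sketched in Python as a Prim-style traversal of pairwise edit distances. This sketch only decides the order in which strings would be merged; it does not build the multiple alignment itself. Unit-cost edit distance and the names edit_distance and merge_order are assumptions for illustration.

```python
def edit_distance(s, t):
    """Unit-cost edit (Levenshtein) distance by row-wise DP."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def merge_order(seqs):
    """Indices of seqs in the order the iterative method would merge them."""
    n = len(seqs)
    dist = [[edit_distance(a, b) for b in seqs] for a in seqs]
    # Start with the pairwise closest strings.
    i, j = min(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    order = [i, j]
    remaining = set(range(n)) - {i, j}
    while remaining:
        # Next: the string closest to any string already merged.
        nxt = min(sorted(remaining), key=lambda r: min(dist[r][m] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


print(merge_order(["AAAA", "AAAT", "CCCC", "ATTT"]))
```

Each step adds the cheapest edge leaving the set of already merged strings, which is why the merge order traces out a minimum spanning tree.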

Clustering method
–Almost any clustering algorithm can be adapted to MSC
–Usually start with small clusters and build big ones
–Also possible to start with one big cluster and divide and conquer
–Not clear which method is best