BNFO 602 Multiple sequence alignment Usman Roshan.

Slides:



Advertisements
Similar presentations
COFFEE: an objective function for multiple sequence alignments
Advertisements

A Hidden Markov Model for Progressive Multiple Alignment Ari Löytynoja and Michel C. Milinkovitch Appeared in BioInformatics, Vol 19, no.12, 2003 Presented.
Multiple alignment: heuristics. Consider aligning the following 4 protein sequences S1 = AQPILLLV S2 = ALRLL S3 = AKILLL S4 = CPPVLILV Next consider the.
Sequence Similarity. The Viterbi algorithm for alignment Compute the following matrices (DP)  M(i, j):most likely alignment of x 1 …x i with y 1 …y j.
CIS786, Lecture 7 Usman Roshan Some of the slides are based upon material by Dennis Livesay and David.
1 “INTRODUCTION TO BIOINFORMATICS” “SPRING 2005” “Dr. N AYDIN” Lecture 4 Multiple Sequence Alignment Doç. Dr. Nizamettin AYDIN
. Class 5: Multiple Sequence Alignment. Multiple sequence alignment VTISCTGSSSNIGAG-NHVKWYQQLPG VTISCTGTSSNIGS--ITVNWYQQLPG LRLSCSSSGFIFSS--YAMYWVRQAPG.
1 Protein Multiple Alignment by Konstantin Davydov.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Expected accuracy sequence alignment
Lecture 1 BNFO 240 Usman Roshan. Course overview Perl progamming language (and some Unix basics) Sequence alignment problem –Algorithm for exact pairwise.
CS262 Lecture 9, Win07, Batzoglou Multiple Sequence Alignments.
BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.
BNFO 602 Lecture 2 Usman Roshan. Sequence Alignment Widely used in bioinformatics Proteins and genes are of different lengths due to error in sequencing.
CS262 Lecture 9, Win07, Batzoglou Phylogeny Tree Reconstruction
BNFO 240 Usman Roshan. Last time Traceback for alignment How to select the gap penalties? Benchmark alignments –Structural superimposition –BAliBASE.
BNFO 602, Lecture 3 Usman Roshan Some of the slides are based upon material by David Wishart of University.
Lecture 1 BNFO 601 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Sequence alignment.
Multiple sequence alignments and motif discovery Tutorial 5.
Pairwise profile alignment Usman Roshan BNFO 601.
Multiple sequence alignment
Lecture 4 BNFO 235 Usman Roshan. IUPAC Nucleic Acid symbols.
BNFO 602 Multiple sequence alignment Usman Roshan.
BNFO 235 Lecture 5 Usman Roshan. What we have done to date Basic Perl –Data types: numbers, strings, arrays, and hashes –Control structures: If-else,
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
Protein Multiple Sequence Alignment Sarah Aerni CS374 December 7, 2006.
Marina Sirota CS374 October 19, 2004 P ROTEIN M ULTIPLE S EQUENCE A LIGNMENT.
Practical multiple sequence algorithms Sushmita Roy BMI/CS 576 Sushmita Roy Sep 23rd, 2014.
CIS786, Lecture 6 Usman Roshan Some of the slides are based upon material by David Wishart of University.
Multiple Sequence Alignments
Multiple sequence alignment methods 1 Corné Hoogendoorn Denis Miretskiy.
CECS Introduction to Bioinformatics University of Louisville Spring 2003 Dr. Eric Rouchka Lecture 3: Multiple Sequence Alignment Eric C. Rouchka,
CISC667, F05, Lec8, Liao CISC 667 Intro to Bioinformatics (Fall 2005) Multiple Sequence Alignment Scoring Dynamic Programming algorithms Heuristic algorithms.
Needleman-Wunsch with affine gaps
CIS786, Lecture 8 Usman Roshan Some of the slides are based upon material by Dennis Livesay and David.
Chapter 5 Multiple Sequence Alignment.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Multiple sequence alignment
Biology 4900 Biocomputing.
Multiple Sequence Alignment
Pair-wise Sequence Alignment What happened to the sequences of similar genes? random mutation deletion, insertion Seq. 1: 515 EVIRMQDNNPFSFQSDVYSYG EVI.
Practical multiple sequence algorithms Sushmita Roy BMI/CS 576 Sushmita Roy Sep 24th, 2013.
Multiple Sequence Alignments Craig A. Struble, Ph.D. Department of Mathematics, Statistics, and Computer Science Marquette University.
Construction of Substitution Matrices
Bioinformatics Multiple Alignment. Overview Introduction Multiple Alignments Global multiple alignment –Introduction –Scoring –Algorithms.
Multiple alignment: Feng- Doolittle algorithm. Why multiple alignments? Alignment of more than two sequences Usually gives better information about conserved.
CrossWA: A new approach of combining pairwise and three-sequence alignments to improve the accuracy for highly divergent sequence alignment Che-Lun Hung,
Chapter 3 Computational Molecular Biology Michael Smith
Progressive multiple sequence alignments from triplets by Matthias Kruspe and Peter F Stadler Presented by Syed Nabeel.
Expected accuracy sequence alignment Usman Roshan.
COT 6930 HPC and Bioinformatics Multiple Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
Doug Raiford Lesson 5.  Dynamic programming methods  Needleman-Wunsch (global alignment)  Smith-Waterman (local alignment)  BLAST Fixed: best Linear:
Burkhard Morgenstern Institut für Mikrobiologie und Genetik Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen WS 2006/2007.
Expected accuracy sequence alignment Usman Roshan.
Multiple Sequence Alignment
1 Multiple Sequence Alignment(MSA). 2 Multiple Alignment Number of sequences >2 Global alignment Seek an alignment that maximizes score.
Local alignment and BLAST Usman Roshan BNFO 601. Local alignment Global alignment recursions: Local alignment recursions.
Multiple Sequence Alignment (cont.) (Lecture for CS397-CXZ Algorithms in Bioinformatics) Feb. 13, 2004 ChengXiang Zhai Department of Computer Science University.
Multiple Sequence Alignments. The Global Alignment problem AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC x y z.
Sequence alignment CS 394C: Fall 2009 Tandy Warnow September 24, 2009.
Evaluation of protein alignments Usman Roshan BNFO 236.
Lecture 1 BNFO 601 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Dynamic programming.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
BIOINFORMATICS Ayesha M. Khan Spring Lec-6.
Multiple sequence alignment (msa)
The ideal approach is simultaneous alignment and tree estimation.
Local alignment and BLAST
BNFO 602 Lecture 2 Usman Roshan.
BNFO 602 Lecture 2 Usman Roshan.
Presentation transcript:

BNFO 602 Multiple sequence alignment Usman Roshan

Optimal pairwise alignment Sum of pairs (SP) optimization: find the alignment of two sequences that maximizes the similarity score given an arbitrary cost matrix. We can find the optimal alignment in O(mn) time and space using the Needleman-Wunsch algorithm. Recursion: Traceback: where M(i,j) is the score of the optimal alignment of x 1..i and y 1..j, s(x i,y j ) is a substitution scoring matrix, and g is the gap penalty

Multiple sequence alignment “Two sequences whisper, multiple sequences shout out loud”---Arthur Lesk Computationally very hard---NP-hard

Multiple sequence alignment Unaligned sequences GGCTT TAGGCCTT TAGCCCTTA ACACTTC ACTT Aligned sequences _G_ _ GCTT_ TAGGCCTT_ TAGCCCTTA A_ _CACTTC A_ _C_ CTT_ Conserved regions help us to identify functionality

Sum of pairs score

What is the sum of pairs score of this alignment? 1. Since computing the alignment with the optimal SP score is NP-hard we resort to heuristics. 2. Plenty of work done in this area. Many standard heuristic approaches in computer science have been applied. 3. Popular programs are based on profiles.

Profile A profile can be described by a set of vectors of nucleotide/residue frequencies. For each position i of the alignment, we we compute the normalized frequency of nucleotides A, C, G, and T

Aligning a profile vector to a nucleotide ClustalW/MUSCLE –Let f be the profile vector –Score(f,j)= –where S(i,j) is substitution scoring matrix

Iterative alignment (heuristic for sum-of-pairs) Pick a random sequence from input set S Do (n-1) pairwise alignments and align to closest one t in S Remove t from S and compute profile of alignment While sequences remaining in S –Do |S| pairwise alignments and align to closest one t –Remove t from S

Iterative alignment Once alignment is computed randomly divide it into two parts Compute profile of each sub-alignment and realign the profiles If sum-of-pairs of the new alignment is better than the previous then keep, otherwise continue with a different division until specified iteration limit

Progressive alignment Idea: perform profile alignments in the order dictated by a tree Given a guide-tree do a post-order search and align sequences in that order Widely used heuristic

Popular alignment programs ClustalW: most popular, progressive alignment MUSCLE: progressive and iterative combination; uses the log expectation score T-COFFEE: consistency based alignment; align sequences in multiple alignment to be close to the optimal pairwise alignment PROBCONS: expected accuracy; probabilistic consistency progressive based scheme MAFFT: alignment based on Fast Fourier Transform

Evaluation of multiple sequence alignments Compare to benchmark “true” alignments Use simulation Measure conservation of an alignment Measure accuracy of phylogenetic trees How well does it align motifs? More…

Benchmarking alignment programs 15/4917.abstract