CAP5510 – Bioinformatics: Multiple Alignment. Tamer Kahveci, CISE Department, University of Florida.

Presentation transcript:

1 CAP5510 – Bioinformatics Multiple Alignment Tamer Kahveci CISE Department University of Florida

2 Goals
Understand
–What multiple alignment is
–Why we align multiple sequences
Learn
–How multiple alignments are scored
–Major multiple alignment methods
  Dynamic programming: standard, MSA
  Progressive alignment: Star, CLUSTALW

3 What is Multiple Alignment?
Alignment of more than two sequences
Global: multiple alignment

scxa_buteu vrdgyiaddk dcayfcgr.. .naycdeeck ...kgaesgk cwyagqygna
scx1_titse .kdgypveyd ncayicwnyd .naycdklck ..dkkadsgy cyw...vhil
scx6_titse .regypadsk gckitcflta .agycntect ..lkkgssgy caw.....pa
scx1_cenno .kdgylvdak gckkncyklg kndycnrecr mkhrggsygy c.....ygfg
six2_leiqu ..dgyirkrd gcklsclfg. .negcnkeck ..syggsygy cwt...wgla

scxa_buteu cwcyklpdwv pikqkvsgk. cn....
scx1_titse cycyglpdse ptktn..gk. cksgkk
scx6_titse cycyglpesv kiwtsetnk. c.....
scx1_cenno cyceglsdst ptwplp.nkt csgk..
six2_leiqu cwceglpd.e ktwksetn.t cg....

4 What is Local Multiple Alignment?
Local: motif

ID HISTONEH5; BLOCK
AC PR00624A; distance from previous block=(9,12)
DE Histone H5 signature
BL adapted; width=22; seqs=9; 99.5%=986; strength=1407
H10_HUMAN|P07305 ( 10) AKPKRAKASKKSTDHPKYSDMI 63
H5A_XENLA|P22844 ( 11) AKPKRSKALKKSTDHPKYSDMI 71
H10_RAT|P43278   ( 10) AKPKRAKAAKKSTDHPKYSDMI 70
H10_MOUSE|P10922 ( 10) AKPKRAKASKKSTDHPKYSDMI 63
Q91759           (  9) AKPRRSKASKKSTDHPKYSDMI 71
H5B_XENLA|P22845 (  9) AKPRRSKASKKSTDHPKYSDMI 71
H5_CHICK|P02259  ( 11) AKPKRVKASRRSASHPTYSEMI 100
H5_CAIMO|P06513  ( 12) AKPKRAKAPRKPASHPSYSEMI 91
H5_ANSAN|P02258  ( 12) AKPKRARAPRKPASHPTYSEMI 100

5 Why Multiple Alignment
Basis for phylogeny
Helps find conserved regions in sets of proteins
–Conserved regions
  Provide insight into substitution patterns
  Give hints about functional sites

6 How to Evaluate Multiple Alignments

7 Sum of Pairs (SP)
Sum of the induced pairwise alignment scores of all pairs
Ignore space pairs aligned together (columns where both sequences have a space are dropped)

Multiple alignment:
A cwcyklpdwv pikqkvsgk. cn....
B cycyglpdse ptktn..gk. cksgkk
C cycyglpesv kiwtsetnk. c.....
D cyceglsdst ptwplp.nkt csgk..

Induced pairwise alignments (their scores are added):
A cwcyklpdwv pikqkvsgk cn....
B cycyglpdse ptktn..gk cksgkk
+
A cwcyklpdwv pikqkvsgk cn
C cycyglpesv kiwtsetnk c.
+
A cwcyklpdwv pikqkvsgk. cn..
D cyceglsdst ptwplp.nkt csgk
+
B cycyglpdse ptktn..gk cksgkk
C cycyglpesv kiwtsetnk c.....
+
B cycyglpdse ptktn.gk. cksgkk
D cyceglsdst ptwplpnkt csgk..
+
C cycyglpesv kiwtsetnk. c...
D cyceglsdst ptwplp.nkt csgk
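A minimal sketch of the sum-of-pairs idea in Python. The slide does not fix a scoring scheme, so the values used here (match +1, mismatch -1, gap -2) and the function names are assumptions chosen only for illustration. Columns in which both sequences have a space contribute nothing, which mirrors the "ignore space pairs" rule.

```python
from itertools import combinations

GAP_CHARS = {"-", "."}

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one column of an induced pairwise alignment (illustrative values)."""
    a_gap, b_gap = a in GAP_CHARS, b in GAP_CHARS
    if a_gap and b_gap:
        return 0            # space aligned with space: ignored
    if a_gap or b_gap:
        return gap          # letter aligned with a space
    return match if a == b else mismatch

def sum_of_pairs(rows):
    """Add up the induced pairwise alignment scores over all pairs of rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(
        pair_score(a, b)
        for r1, r2 in combinations(rows, 2)
        for a, b in zip(r1, r2)
    )

# A small four-sequence alignment used as a toy input.
toy_alignment = ["M-PE", "M-KE", "MSKE", "S-KE"]
toy_score = sum_of_pairs(toy_alignment)
```

With a substitution matrix such as BLOSUM62, only `pair_score` would change; the summation over pairs stays the same.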

8 BAliBASE Benchmark
Compare to a set of hand-aligned sequences
Check positions of letters
–If the letters appear at the same position as in the benchmark => good
Score between 0 and 1
http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE/prog_scores.html
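One way to turn "letters appear at the same position as the benchmark" into a number between 0 and 1 is to count how many residue pairs aligned in the reference are also aligned in the test alignment. The sketch below does this; it is an assumption about a reasonable agreement measure, not a reproduction of the benchmark's own scoring program, and the function names are hypothetical.

```python
from itertools import combinations

def aligned_pairs(rows, gaps="-."):
    """Set of residue pairs ((seq, pos), (seq, pos)) that share a column."""
    counters = [0] * len(rows)       # next residue index in each sequence
    pairs = set()
    for column in zip(*rows):
        present = []
        for k, ch in enumerate(column):
            if ch not in gaps:
                present.append((k, counters[k]))
                counters[k] += 1
        pairs.update(combinations(present, 2))
    return pairs

def agreement(test_rows, reference_rows):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    reference = aligned_pairs(reference_rows)
    if not reference:
        return 1.0
    return len(reference & aligned_pairs(test_rows)) / len(reference)
```

A value of 1 means every pair aligned in the reference is also aligned in the test alignment; 0 means none is.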

9 Finding Multiple Alignments

10 Dynamic Programming

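11 Dynamic Programming
Similar to pairwise alignment
–Compare NV and NS: the last column aligns V with S, V with a space, or a space with S (3 cases)
$\mathrm{score}(NV, NS) = \max\{\ \mathrm{score}(N, N) + \delta(V, S),\ \ \mathrm{score}(N, NS) + \delta(V, -),\ \ \mathrm{score}(NV, N) + \delta(-, S)\ \}$
If k sequences are aligned
–=> a k-dimensional matrix is filled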

12 Dynamic Programming
With k = 3 sequences (last letters V, S, A), each cell has $2^k - 1 = 7$ cases
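The count of cases can be checked by enumeration. In this sketch (names are illustrative), each case is a 0/1 vector over the k sequences: 1 means the sequence contributes its last letter to the column, 0 means it contributes a space. The all-zero vector is excluded because a column of only spaces is not allowed.

```python
from itertools import product

def cell_moves(k):
    """All 2^k - 1 ways to build the last column for k sequences."""
    return [move for move in product((0, 1), repeat=k) if any(move)]

def last_columns(last_letters):
    """Spell out the candidate last columns for the given last letters."""
    return [
        "".join(ch if bit else "-" for ch, bit in zip(last_letters, move))
        for move in cell_moves(len(last_letters))
    ]

pairwise_cases = last_columns("VS")    # the three cases of the pairwise slide
three_way_cases = last_columns("VSA")  # the seven cases for k = 3
```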

13 Complexity
Space complexity: $O(n^k)$ for k sequences, each of length n
Computing at a cell: $O(2^k) \cdot$ cost of computing δ
Time complexity: $O(2^k n^k) \cdot$ cost of computing δ
Finding the optimal solution is exponential in k
Proven to be NP-complete for a number of cost functions
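To make the exponential cost concrete, here is a small sketch of the exact k-dimensional dynamic program, written as a memoized recursion over index tuples. It assumes a sum-of-pairs column cost with made-up values (mismatch 1, gap 2, lower is better); these numbers and the function names are illustrative assumptions. It visits up to $n^k$ index tuples and tries $2^k - 1$ moves at each, so it is only usable on tiny inputs.

```python
from functools import lru_cache
from itertools import combinations, product

def column_cost(column, mismatch=1, gap=2):
    """Sum-of-pairs cost of one column (illustrative costs)."""
    cost = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue                      # space against space is ignored
        if a == "-" or b == "-":
            cost += gap
        elif a != b:
            cost += mismatch
    return cost

def optimal_msa_cost(seqs):
    """Minimum sum-of-pairs cost over all multiple alignments of seqs."""
    k = len(seqs)
    moves = [m for m in product((0, 1), repeat=k) if any(m)]

    @lru_cache(maxsize=None)
    def best(idx):
        if not any(idx):                  # all prefixes empty
            return 0
        result = float("inf")
        for move in moves:
            prev = tuple(i - b for i, b in zip(idx, move))
            if min(prev) < 0:
                continue                  # move would step outside the matrix
            column = tuple(
                s[i - 1] if b else "-" for s, i, b in zip(seqs, idx, move)
            )
            result = min(result, best(prev) + column_cost(column))
        return result

    return best(tuple(len(s) for s in seqs))
```

For long sequences a bottom-up table would avoid deep recursion, but the number of cells and moves stays the same.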

14 MSA (Carrillo, Lipman’ 88)

15 MSA – Idea
[Figure with parts labeled 1, 2, 3]

16 MSA algorithm (1/3)
Find pairwise alignments
Trial multiple alignment produced by a tree, cost = d
This provides a limit to the volume within which optimal alignments are found
Specifics
–Sequences $x_1, \ldots, x_r$
–Alignment A, cost = c(A)
–Optimal alignment $A^*$
–$A_{ij}$ = alignment of $x_i, x_j$ induced by A
–$D(x_i, x_j)$ = cost of the optimal pairwise alignment of $x_i, x_j$, so $D(x_i, x_j) \le c(A_{ij})$

17 MSA algorithm (2/3)
$d \ge c(A^*) = c(A^*_{uv}) + \sum_{i<j,\ (i,j)\ne(u,v)} c(A^*_{ij}) \ge c(A^*_{uv}) + \sum_{i<j,\ (i,j)\ne(u,v)} D(x_i, x_j)$
$c(A^*_{uv}) \le d - \sum_{i<j,\ (i,j)\ne(u,v)} D(x_i, x_j) = B(u,v)$
Compute B(u,v) for each pair u, v
Consider any cell f with projection (s,t) on the u,v plane. If $A^*$ passes through f, then $A^*_{uv}$ passes through (s,t)
–$\mathrm{best}^{st}_{uv}$ = cost of the best pairwise alignment of $x_u, x_v$ that passes through (s,t)
–$\mathrm{best}^{st}_{uv}$ = distance of the prefixes up to (s,t) + cost of aligning position s of $x_u$ with position t of $x_v$ + distance of the suffixes after (s,t)

18 MSA algorithm (3/3)
If $\mathrm{best}^{st}_{uv} > B(u,v)$, then
–$A^*$ cannot pass through cell f
–Discard such cells from the DP computation
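The pruning test can be sketched as follows. This is a simplified illustration under stated assumptions, not the MSA program itself: it uses unit mismatch and gap costs, and it evaluates the best pairwise alignment through each lattice point (s,t) as prefix cost plus suffix cost, rather than through a cell as on the slide. The bound d must come from some trial multiple alignment; here it is simply passed in. All function names are hypothetical.

```python
from itertools import combinations

def prefix_costs(x, y, mismatch=1, gap=1):
    """F[s][t] = cost of the best alignment of x[:s] with y[:t]."""
    F = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for s in range(len(x) + 1):
        for t in range(len(y) + 1):
            if s == 0 and t == 0:
                continue
            options = []
            if s and t:
                options.append(F[s - 1][t - 1] + (0 if x[s - 1] == y[t - 1] else mismatch))
            if s:
                options.append(F[s - 1][t] + gap)
            if t:
                options.append(F[s][t - 1] + gap)
            F[s][t] = min(options)
    return F

def best_through(x, y):
    """Cost of the best alignment of x, y forced through each point (s, t)."""
    F = prefix_costs(x, y)
    R = prefix_costs(x[::-1], y[::-1])    # suffix costs via reversed strings
    n, m = len(x), len(y)
    return [[F[s][t] + R[n - s][m - t] for t in range(m + 1)] for s in range(n + 1)]

def bounds(seqs, d):
    """B(u,v) = d minus the optimal pairwise costs of all other pairs."""
    D = {(u, v): prefix_costs(seqs[u], seqs[v])[-1][-1]
         for u, v in combinations(range(len(seqs)), 2)}
    total = sum(D.values())
    return {uv: d - (total - cost) for uv, cost in D.items()}

def surviving_points(seqs, d):
    """Points (s, t) of each pairwise plane that the optimum may still use."""
    keep = {}
    for (u, v), limit in bounds(seqs, d).items():
        through = best_through(seqs[u], seqs[v])
        keep[(u, v)] = {(s, t)
                        for s, row in enumerate(through)
                        for t, cost in enumerate(row) if cost <= limit}
    return keep
```

A full implementation would then run the k-dimensional DP only over cells whose projections survive in every pairwise plane.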

19 Question
$s_1$: MPE
$s_2$: MKE
$s_3$: MSKE
$s_4$: SKE
Align these sequences using BLOSUM62

20 Progressive Alignment

21 Star Alignment

22 Star Alignments
Heuristic method for multiple sequence alignments
Select a sequence c as the center of the star
For each sequence $x_1, \ldots, x_k$ such that $x_i \ne c$, perform a Needleman-Wunsch global alignment of $x_i$ and c

23 Star Alignments Example
$s_1$: MPE
$s_2$: MKE
$s_3$: MSKE
$s_4$: SKE
Star with center $s_2$, connected to $s_1$, $s_3$, $s_4$

Pairwise alignments with the center:
MPE
| |
MKE

MSKE
| ||
M-KE

SKE
 ||
MKE

Merging the pairwise alignments one at a time:
MPE
MKE

M-PE
M-KE
MSKE

S-KE
M-PE
M-KE
MSKE

Each induced pairwise alignment with the center sequence is an optimal one.
How should we choose a center? (Exercise: try $s_4$ as the center.) Try all of them?
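The star procedure can be sketched in Python as below. The scoring values (match +1, mismatch -1, gap -2) are assumptions for illustration and are not BLOSUM62; the center is chosen as the sequence with the highest total pairwise score, which is one common answer to "how should we choose a center", and the merge step follows the usual "once a gap, always a gap" rule. Function names are hypothetical.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, aligned_x, aligned_y)."""
    def sub(a, b):
        return match if a == b else mismatch
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          S[i - 1][j] + gap, S[i][j - 1] + gap)
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:                 # trace back one optimal path
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return S[n][m], "".join(reversed(ax)), "".join(reversed(ay))

def merge(msa, center_aln, new_aln):
    """Add new_aln to msa; msa[0] and center_aln are gapped copies of the center."""
    gapped_center = msa[0]
    rows, new_row = [[] for _ in msa], []
    p = q = 0
    while p < len(gapped_center) or q < len(center_aln):
        if p < len(gapped_center) and q < len(center_aln) and gapped_center[p] == center_aln[q]:
            for r, row in zip(rows, msa):
                r.append(row[p])
            new_row.append(new_aln[q]); p += 1; q += 1
        elif p < len(gapped_center) and gapped_center[p] == "-":
            for r, row in zip(rows, msa):   # existing gap column: keep it
                r.append(row[p])
            new_row.append("-"); p += 1
        else:                               # new gap in the center: open a column
            for r in rows:
                r.append("-")
            new_row.append(new_aln[q]); q += 1
    return ["".join(r) for r in rows] + ["".join(new_row)]

def star_alignment(seqs):
    """Return the aligned rows in the same order as the input sequences."""
    totals = [sum(needleman_wunsch(s, t)[0] for j, t in enumerate(seqs) if j != i)
              for i, s in enumerate(seqs)]
    c = totals.index(max(totals))
    order, msa = [c], [seqs[c]]
    for i, s in enumerate(seqs):
        if i != c:
            _, center_aln, new_aln = needleman_wunsch(seqs[c], s)
            msa = merge(msa, center_aln, new_aln)
            order.append(i)
    result = [None] * len(seqs)
    for idx, row in zip(order, msa):
        result[idx] = row
    return result
```

On the four example sequences this picks MKE as the center under the assumed scores.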

24 CLUSTAL-W (Thompson, Higgins, Gibson 1994)

25 CLUSTAL-W (1/4)
Given sequences A, B, C, D, E
Compare all pairs and construct a distance matrix
[Table: 5 × 5 distance matrix with rows and columns A, B, C, D, E]
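A sketch of the "compare all pairs" step. CLUSTAL-W derives its distances from pairwise alignments; to keep the example short, plain edit distance is swapped in here as a stand-in, and the sequences assigned to the labels A to D are made up for illustration.

```python
def edit_distance(x, y):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        cur = [i]
        for j, b in enumerate(y, 1):
            cur.append(min(prev[j - 1] + (a != b),  # substitute or match
                           prev[j] + 1,             # delete from x
                           cur[j - 1] + 1))         # insert into x
        prev = cur
    return prev[-1]

def distance_matrix(seqs):
    """seqs: {name: sequence}. Returns a nested dict of pairwise distances."""
    names = list(seqs)
    return {a: {b: edit_distance(seqs[a], seqs[b]) for b in names} for a in names}

# Hypothetical input: labels follow the slide, sequences are invented.
example = {"A": "MPE", "B": "MKE", "C": "MSKE", "D": "SKE"}
matrix = distance_matrix(example)
```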

26 CLUSTAL-W (2/4)
Find a phylogenetic tree for A, B, C, D, E using neighbor joining
[Figure: the tree is built step by step; final leaf order D, B, A, C, E]
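The first step of neighbor joining picks the pair (i, j) that minimizes $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$. The sketch below shows only that selection step, not the full tree construction, and the distances are invented for illustration rather than taken from the slides.

```python
from itertools import combinations

def first_join(dist):
    """dist: {frozenset({a, b}): distance}. Returns the pair with minimal Q."""
    names = sorted({name for pair in dist for name in pair})
    n = len(names)
    row_sum = {a: sum(dist[frozenset((a, b))] for b in names if b != a)
               for a in names}

    def q(a, b):
        return (n - 2) * dist[frozenset((a, b))] - row_sum[a] - row_sum[b]

    return min(combinations(names, 2), key=lambda pair: q(*pair))

# Made-up distances between five taxa.
raw = {("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9, ("A", "E"): 8,
       ("B", "C"): 10, ("B", "D"): 10, ("B", "E"): 9,
       ("C", "D"): 8, ("C", "E"): 7, ("D", "E"): 3}
toy_dist = {frozenset(pair): d for pair, d in raw.items()}
joined = first_join(toy_dist)
```

Note that the pair with the smallest raw distance (D, E) is not the one chosen here; Q also accounts for how far each taxon is from all the others.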

27 CLUSTAL-W (3/4)
Align sequences starting from leaf level (tree leaves in order D, B, A, C, E)
–Edge weights are used to compute the score of the alignment
$O(k^2 n^2)$ time
$O(n^2)$ space
Result depends on sequence order

28 CLUSTAL-W (4/4) Sample query using ClustalW

29 Other Progressive Methods
T-COFFEE
PILEUP
MUSCLE
…

30 T-Coffee (Notredame, Higgins, Heringa 2000)
Find a library of alignments between pairs of sequences.
Create a new scoring matrix for each pair of sequences using the library
–Directly from the alignment of s1 and s2
–Indirectly through the alignments of s1, s3 and s3, s2
[Figure: scoring matrix for s1 and s2]
Use these scoring matrices during progressive alignment
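A minimal sketch of how direct and indirect evidence can be combined into one weight per residue pair. It assumes the library stores a weight for each aligned residue pair and that an indirect path through a third residue contributes the smaller of its two weights; the data layout, names, and weights are assumptions made for illustration.

```python
from collections import defaultdict

def extend_library(library):
    """library: {frozenset({(seq, pos), (seq, pos)}): weight}.

    Returns new weights: direct weight plus, for each third residue linked
    to both residues of a pair, the minimum of the two connecting weights.
    """
    linked = defaultdict(dict)
    for pair, weight in library.items():
        r1, r2 = tuple(pair)
        linked[r1][r2] = weight
        linked[r2][r1] = weight

    extended = defaultdict(float)
    for pair, weight in library.items():
        extended[pair] += weight                    # direct evidence
    for middle, partners in linked.items():
        residues = list(partners)
        for i, a in enumerate(residues):
            for b in residues[i + 1:]:
                if a[0] != b[0]:                    # residues of different sequences
                    extended[frozenset((a, b))] += min(partners[a], partners[b])
    return dict(extended)

# Hypothetical library: first residues of s1, s2, s3 aligned to each other.
toy_library = {
    frozenset({("s1", 0), ("s2", 0)}): 88,
    frozenset({("s1", 0), ("s3", 0)}): 77,
    frozenset({("s3", 0), ("s2", 0)}): 100,
}
toy_extended = extend_library(toy_library)
```

The extended weights would then play the role of the per-pair scoring matrices used during progressive alignment.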

31 Iterative Alignment

32 PRRP (Gotoh 1996) Motivation: If the initial sequences are not good ones, progressive alignment fails. Idea: Iteratively update the alignment

33 PRRP
1. Find some initial alignment
2. Construct phylogenetic tree based on the multiple alignment (tree leaves: D, B, A, C, E)
3. Align sequences
Go back if the result has improved

Alignment shown on the slide:
A cwcyklpdwv pikqkvsgk. cn....
B cycyglpdse ptktn..gk. cksgkk
C cycyglpesv kiwtsetnk. c.....
D cyceglsdst ptwplp.nkt csgk..
E cyceglpdst piwplp.nkt ctgk..

34 Other methods
Genetic algorithm (machine learning)
Partial order graphs (graph matching)
HMMER (hidden Markov model)
For a comparison: –

35 Motif Logos

ID HISTONEH5; BLOCK
AC PR00624A; distance from previous block=(9,12)
DE Histone H5 signature
BL adapted; width=22; seqs=9; 99.5%=986; strength=1407
H10_HUMAN|P07305 ( 10) AKPKRAKASKKSTDHPKYSDMI 63
H5A_XENLA|P22844 ( 11) AKPKRSKALKKSTDHPKYSDMI 71
H10_RAT|P43278   ( 10) AKPKRAKAAKKSTDHPKYSDMI 70
H10_MOUSE|P10922 ( 10) AKPKRAKASKKSTDHPKYSDMI 63
Q91759           (  9) AKPRRSKASKKSTDHPKYSDMI 71
H5B_XENLA|P22845 (  9) AKPRRSKASKKSTDHPKYSDMI 71
H5_CHICK|P02259  ( 11) AKPKRVKASRRSASHPTYSEMI 100
H5_CAIMO|P06513  ( 12) AKPKRAKAPRKPASHPSYSEMI 91
H5_ANSAN|P02258  ( 12) AKPKRARAPRKPASHPTYSEMI 100
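A sequence logo draws each column with a height equal to its information content. The sketch below computes that height for the block above as $\log_2 20$ minus the column entropy; it assumes a 20-letter amino acid alphabet and omits the small-sample correction that logo tools often apply. The function name is hypothetical.

```python
from collections import Counter
from math import log2

HISTONE_H5_BLOCK = [
    "AKPKRAKASKKSTDHPKYSDMI",
    "AKPKRSKALKKSTDHPKYSDMI",
    "AKPKRAKAAKKSTDHPKYSDMI",
    "AKPKRAKASKKSTDHPKYSDMI",
    "AKPRRSKASKKSTDHPKYSDMI",
    "AKPRRSKASKKSTDHPKYSDMI",
    "AKPKRVKASRRSASHPTYSEMI",
    "AKPKRAKAPRKPASHPSYSEMI",
    "AKPKRARAPRKPASHPTYSEMI",
]

def column_information(seqs, alphabet_size=20):
    """Information content (bits) of each column of an ungapped block."""
    heights = []
    for column in zip(*seqs):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        heights.append(log2(alphabet_size) - entropy)
    return heights

heights = column_information(HISTONE_H5_BLOCK)
```

Fully conserved columns reach the maximum height of $\log_2 20$ bits; variable columns are shorter.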