Introduction to bioinformatics 2007 Lecture 10

Presentation transcript:

Introduction to bioinformatics 2007, Lecture 10: Multiple Sequence Alignment (II)

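Progressive multiple alignment [Diagram: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) for five sequences fill a 5×5 similarity matrix; scores are converted to distances; a guide tree is built; the guide tree directs the multiple alignment, with iteration possibilities.]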

Progressive alignment strategy:
1) Perform pair-wise alignments of all of the sequences (all against all, i.e. N(N-1)/2 alignments).
2) Use the alignment scores to make a similarity (or distance) matrix.
3) Use that matrix to produce a guide tree.
4) Align the sequences successively, guided by the order and relationships indicated by the tree (N-1 alignment steps).
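The first three steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `pair_score` is a hypothetical stand-in for a real pairwise alignment score (it just counts identical positions without introducing gaps), and the join order comes from plain average-linkage clustering rather than from any particular program's tree-building method.

```python
from itertools import combinations

def pair_score(a, b):
    """Hypothetical stand-in for a pairwise alignment score:
    fraction of identical positions over the shorter sequence (no gaps)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n

def guide_order(seqs):
    """Return (number of pairwise comparisons, list of cluster merges)."""
    names = list(seqs)
    # Steps 1-2: all-against-all scores, converted to distances (1 - similarity).
    dist = {frozenset(p): 1.0 - pair_score(seqs[p[0]], seqs[p[1]])
            for p in combinations(names, 2)}

    def cluster_dist(c1, c2):
        # Average distance between the members of two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    # Step 3: repeatedly join the two closest clusters; each join corresponds
    # to one alignment step of the progressive protocol (N-1 joins in total).
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return len(dist), merges

toy = {"s1": "ACDEFG", "s2": "ACDEFH", "s3": "ACDKLM", "s4": "WYWYWY"}
n_pairs, merges = guide_order(toy)
print(n_pairs)   # number of pairwise comparisons for N = 4
print(merges)    # the N-1 joins, closest pair first
```

For the four toy sequences this makes 4·3/2 = 6 comparisons and 3 joins, matching the N(N-1)/2 and N-1 counts in the strategy above.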

Progressive alignment strategy. Methods: Biopat (Hogeweg and Hesper, 1984; the first integrated method ever), MULTAL (Taylor, 1987), DIALIGN (1 & 2; Morgenstern, 1996), PRRP (Gotoh, 1996), ClustalW (Thompson et al., 1994), PRALINE (Heringa, 1999), T-Coffee (Notredame, 2000), POA (Lee, 2002), MUSCLE (Edgar, 2004), ProbCons (Do, 2005).

Pair-wise alignment quality versus sequence identity (Vogt et al., JMB 249, 816-831, 1995)

Flavodoxin fold: aligning 13 Flavodoxins + cheY

Flavodoxin-cheY NJ tree

Flavodoxin fold: helix-beta-helix. Flavodoxins are mainly bacterial proteins (found in prokaryotes, including cyanobacteria) but are also found in some eukaryotic algae. They are electron-transfer proteins that function in various electron transport systems. They have an alpha/beta fold consisting of three layers: two alpha-helical layers sandwiching a 5-stranded parallel beta-sheet.

Flavodoxin family - TOPS diagrams. The basic topology of the flavodoxin fold is given below; the other four TOPS diagrams show flavodoxin folds with local insertions of secondary structure elements. [Diagrams: TOPS cartoons with the β-strands numbered 1 to 5; legend: α-helix, β-strand.]

Flavodoxin-cheY NJ tree

Flavodoxin-cheY: Pre-processing (prepro1500)

Protein structure hierarchical levels
PRIMARY STRUCTURE (amino acid sequence): VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
SECONDARY STRUCTURE (helices, strands)
TERTIARY STRUCTURE (fold)
QUATERNARY STRUCTURE (oligomers)

Clustal, ClustalW, ClustalX. CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct a guide tree (see lecture on phylogenetic methods). Sequence blocks are represented by a profile, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree. Further carefully crafted heuristics include: (i) local gap penalties; (ii) automatic selection of the amino acid substitution matrix; (iii) automatic gap penalty adjustment; (iv) a mechanism to delay the alignment of sequences that appear to be distant at the time they are considered. CLUSTAL (W/X) does not allow iteration (for iterative approaches see Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002).
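To make the idea of a sequence-weighted profile concrete, here is a minimal Python sketch. The function name `weighted_profile` and the weights are made up for illustration; in CLUSTAL W the weights are derived from branch lengths in the NJ tree, which is not reproduced here.

```python
from collections import defaultdict

def weighted_profile(block, weights):
    """Turn an aligned block into a list of per-column frequency tables.

    block   -- dict mapping sequence name to aligned sequence (equal lengths)
    weights -- dict mapping sequence name to a positive weight (hypothetical
               values here; a real program would derive them from the tree)
    """
    total = sum(weights.values())
    length = len(next(iter(block.values())))
    profile = []
    for col in range(length):
        freqs = defaultdict(float)
        for name, seq in block.items():
            # Each sequence contributes its normalised weight, not a plain count.
            freqs[seq[col]] += weights[name] / total
        profile.append(dict(freqs))
    return profile

block = {"a": "AC", "b": "AD", "c": "G-"}
weights = {"a": 1.0, "b": 1.0, "c": 2.0}   # "c" counts as much as "a" and "b" together
for column in weighted_profile(block, weights):
    print(column)
```

Down-weighting near-identical sequences in this way keeps a cluster of close relatives from dominating the column frequencies.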

ClustalW web-interface

CLUSTAL X (1.64b) multiple sequence alignment: Flavodoxin-cheY

1fx1        -PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESVH  MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESGI  MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKK
FLAV_DESSA  MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKK
FLAV_DESDE  MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRK
FLAV_CLOAB  -MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN---------ISWEMKKWID-ESSEFNLEGKL
FLAV_MEGEL  --MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK
4fxn        ---MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKK
FLAV_ANASP  SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKL
FLAV_AZOVI  -AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT
2fcr        --KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLP
FLAV_ENTAG  MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKT
FLAV_ECOLI  -AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD-------WDDFFP-TLEEIDFNGKL
3chy        --ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR---

1fx1        VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG----------------LRIDGDPRAARDDIVGWAHDVRGAI---------------
FLAV_DESVH  VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG----------------LRIDGDPRAARDDIVGWAHDVRGAI---------------
FLAV_DESGI  VGVFGCGDSSYTYF--CGAVDVIEKKAEELGATLVASS----------------LKIDGEPDSAE--VLDWAREVLARV---------------
FLAV_DESSA  VSVFGCGDSDYTYF--CGAVDAIEEKLEKMGAVVIGDS----------------LKIDGDPERDE--IVSWGSGIADKI---------------
FLAV_DESDE  VAAFASGDQEYEHF--CGAVPAIEERAKELGATIIAEG----------------LKMEGDASNDPEAVASFAEDVLKQL---------------
FLAV_CLOAB  GAAFSTANSIAGGS--DIALLTILNHLMVKGMLVYSGGVA----FGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF-----------
FLAV_MEGEL  VGLFGSYGWGSGE-----WMDAWKQRTEDTGATVIGTA----------------IVN-EMPDNAPECKE-LGEAAAKA----------------
4fxn        VALFGSYGWGDGK-----WMRDFEERMNGYGCVVVETP----------------LIVQNEPDEAEQDCIEFGKKIANI----------------
FLAV_ANASP  VAYFGTGDQIGYADNFQDAIGILEEKISQRGGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------
FLAV_AZOVI  VALFGLGDQVGYPENYLDALGELYSFFKDRGAKIVGSWSTDGYEFESSEAVV-DGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----
2fcr        VAIFGLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVR-DGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ENTAG  VALFGLGDQLNYSKNFVSAMRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL-------
FLAV_ECOLI  VALFGCGDQEDYAEYFCDALGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
3chy        AD--GAMSALPVL-----MVTAEAKKENIIAAAQAGAS----------------GYV-VKPFTAATLEEKLNKIFEKLGM--------------

The secondary structures of 4 sequences are known and can be used to assess the alignment (red is β-strand, blue is α-helix).

There are problems … Accuracy is very important! Progressive multiple alignment is a greedy strategy.
MAIN PROBLEMS:
1) The progressive alignment protocol suffers from its greediness and is not able to revise any of the alignments made earlier, so that any alignment errors made during the construction of the MSA cannot be repaired and are propagated into later progressive steps.
2) The comparisons of sequences at early steps during progressive alignment cannot make use of information from other sequences, so that the positional information required for correct matching is not available at early stages.
3) It is only later during the alignment progression that more information from other sequences (e.g. through profile representation) becomes employed in the alignment steps, but quite possibly after misalignment has already taken place.

Progressive multiple alignment: “Once a gap, always a gap” (Feng & Doolittle, 1987)
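The rule can be illustrated with a small Python sketch of how two already-aligned blocks are joined. The function `merge_blocks` and the `path` representation (a list of column pairs, with `None` marking a new gap column) are assumptions made for this example, not any program's actual data structure. The point is that columns of a block are only copied or padded, never re-opened, so a gap placed in an earlier step survives every later step.

```python
def merge_blocks(block_a, block_b, path):
    """Join two aligned blocks along a column-to-column alignment path.

    block_a, block_b -- lists of equal-length aligned strings
    path             -- list of (i, j) pairs: column i of block_a is matched to
                        column j of block_b; None means a whole gap column is
                        inserted into that block at this position
    """
    rows_a = ["".join(row[i] if i is not None else "-" for i, _ in path)
              for row in block_a]
    rows_b = ["".join(row[j] if j is not None else "-" for _, j in path)
              for row in block_b]
    return rows_a + rows_b

# The gap in the first row of block_a was introduced in an earlier step.
block_a = ["AC-T", "ACGT"]
block_b = ["ACGTT"]
path = [(0, 0), (1, 1), (2, 2), (3, 3), (None, 4)]
for row in merge_blocks(block_a, block_b, path):
    print(row)
```

Even if the third sequence suggests that the early gap in `AC-T` is misplaced, the merge can only add new gap columns around it.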

Additional strategies for multiple sequence alignment: profile pre-processing (PRALINE); secondary structure-induced alignment; globalised local alignment; matrix extension. Objective: try to avoid (early) errors.

PRALINE web-interface

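Profile pre-processing [Diagram: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) for five sequences; for key sequence 1, the other sequences are stacked onto it by master-slave (N-to-1) alignment to give a pre-alignment, which is then converted into a pre-profile with one row per residue type (A, C, D, ..., Y).]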

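Pre-profile generation [Diagram: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) are compared against a cut-off; for each of the sequences 1 to 5 a pre-alignment is built with that sequence on top, and each pre-alignment is converted into a pre-profile with one row per residue type (A, C, D, ..., Y).]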

Pre-profile alignment [Diagram: the pre-profiles of sequences 1 to 5 (rows A, C, D, ..., Y) are aligned to produce the final alignment of sequences 1 to 5.]

Pre-profile alignment [Diagram: the five pre-alignments, each headed by its own key sequence (1 to 5) followed by the other sequences, are combined into the final alignment of sequences 1 to 5.]

Pre-profile alignment: alignment consistency [Diagram: the five pre-alignments, each headed by its key sequence (1 to 5); residue labels shown: Ala131, A131, L133, C126.]

PRALINE pre-profile generation. Idea: use the information from all query sequences to make a pre-profile for each query sequence that contains information from the other sequences. You can use all sequences in each pre-profile, or use only those sequences that will probably align ‘correctly’; incorrectly aligned sequences in the pre-profiles will increase the noise level. Select using the alignment score: only allow a sequence into a pre-profile if its alignment with the key sequence scores higher than a given threshold value. In PRALINE, this threshold is given as prepro=1500 (alignment score threshold value is 1500; see next two slides).
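The selection rule can be sketched as follows in Python. The function name, the score table and its values are hypothetical and chosen only to show the thresholding; this is not PRALINE's implementation.

```python
def select_preprofile_members(scores, names, threshold):
    """For each key sequence, list the other sequences admitted to its pre-profile.

    scores    -- dict mapping frozenset({name1, name2}) to a pairwise alignment score
    names     -- list of sequence names
    threshold -- minimum score (exclusive) for admission
    """
    members = {}
    for key in names:
        members[key] = [other for other in names
                        if other != key
                        and scores[frozenset((key, other))] > threshold]
    return members

# Made-up scores for three sequences.
scores = {
    frozenset(("a", "b")): 2000,
    frozenset(("a", "c")): 1200,
    frozenset(("b", "c")): 1600,
}
print(select_preprofile_members(scores, ["a", "b", "c"], threshold=1500))
print(select_preprofile_members(scores, ["a", "b", "c"], threshold=0))
```

With the threshold at 1500 the pre-profile of `a` contains only `b`; with a threshold of 0 every sequence with a positive score is admitted to every pre-profile.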

Reliable sequences for pre-profiles. Each curve gives the number of pairwise alignments (y) scoring less than x. The range 1500<x<1800 shows a flat section of the curve that can serve as a natural cut-off point for admitting sequences into the pre-alignment blocks.
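A flat section of such a cumulative curve is simply a score range in which no pairwise alignment falls. The Python sketch below computes the curve and picks the widest such range. Choosing the widest gap automatically is an assumption made for this example (in the lecture the cut-off is read off the plot), and the scores are made up.

```python
from bisect import bisect_left

def count_below(sorted_scores, x):
    """Value of the cumulative curve at x: number of scores strictly less than x."""
    return bisect_left(sorted_scores, x)

def widest_flat_section(scores):
    """Return the (low, high) pair of consecutive scores that are furthest apart.

    Between low and high the cumulative curve does not rise, so any threshold
    in that range admits the same set of sequences.
    """
    s = sorted(scores)
    return max(zip(s, s[1:]), key=lambda pair: pair[1] - pair[0])

scores = [900, 1100, 1300, 1500, 1800, 1900, 2000]   # hypothetical alignment scores
s = sorted(scores)
low, high = widest_flat_section(scores)
print(low, high)
print(count_below(s, 1600), count_below(s, 1800))    # curve is flat across the gap
```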

Global pre-processing (prepro0) Preprocessed profile for sequence 2: 2fcr KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD 1fx1 KALIVYGSTTGNTEYTAETIARQL-ANAGYEVDSRDAASVEAFEGFDLVLLGCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACFGCGDS-SY-E 4fxn -MKIVYWSGTGNTEKMAELIAKGISGKDVNTINVSDVNIDELLNE-DILILGC---SAMGDEVLEESEFEPFIEEISTKISGKKVALGSYGWGDGKWMRD FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKRFDTMSDA-LNVNRVS-AEDFAQYQFLILgTPTLGPGLSSDCENESWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTYYANISWEMK--KW----IDESSEFNLEGKLGAAfSTANAGGSDI FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAA-GGHEVTLLNAADASALADYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E FLAV_DESGI KALIVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDEIELQEDFVP-LYEDLDRAGLKDKKVGVfGCGDS-SY-T FLAV_DESSA KSLIVYGSTTGNTETAaEYVAEAFENK-EIDVELKNVTDVSVANGYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T FLAV_DESVH KALIVYGSTTGNTEYTaETIAREL-ADAGYEVDSRDAASVEAFEGFDLVLLgCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACfGCGDS-SY-E FLAV_ECOLI AIGIFFGSDTGNTENIaKMIQKQLG--KDV-ADVHDISSKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE FLAV_ENTAG TIGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPTLGDGLPGVEAGSSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK FLAV_MEGEL MVEIVYWSGTGNTEAMaNEIEAAVAAGADVSVRFED-TNVDDVASKDVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG--- 3chy KELKFLVVDDFSTRRIVRNLLKELGFNEEAEDGVDALNKLQA-GGYGFVI---SDWNM---PNMDGL---ELLKTIRADGAMSALPVLMV---TAEAKKE 2fcr NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV 1fx1 YFCGAVDAIEEKLKNLGA----------------EIVQD----GLRID--GDPRAARDDIVGWAHDVRGAI-- 4fxn -FEERMNG-YGCVVVE--TPLIVQNEPD----EAE---------------QDCIEFGKKIANI---------- FLAV_ANASP NFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL FLAV_AZOVI 
NYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGL FLAV_CLOAB ALLTILNHVKgMLVYSGG--VAFGKPKTHGYVHINEIQENE------D-ENARI-fGERiANkVKQIF----- FLAV_DESDE HFCGAVPAI-----EERAKELg-----------ATIIAEG--LKMEGDASND--P--EAVASfAEDVLKQL-- FLAV_DESGI YFCGAVDVIEKKAEELgATLVA----------SSLKI-DGE-------------PDSAEVLDwAREVLARV-- FLAV_DESSA YFCGAVDAIEEKLEKMgAVVIGDSLKIDGDPERDEIVSwGS--G-----IADKI------------------- FLAV_DESVH YFCGAVDAIEEKLKNLgA----------------EIVQD----GLRID--GDPRAARDDIVGwAHDVRGAI-- FLAV_ECOLI YFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDHFVGLAID--EDRQPTAERVEKwVKQISEELHL FLAV_ENTAG NFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALENNEFVGLPLDQENQYDLTEERIDSwLEKL--KPAV FLAV_MEGEL EWMDAWKQRTE---DTgATVIG-----------TAIVNE-----MP-----DNAP-ECKElG--EAAAKA--- 3chy NIIAA--------AQAGAS--GY------------VVK--PFTAATLE--------EK-----LNKIFEKLGM Iteration -1 SP= 127728.00 AvSP= 10.705 SId= 3764 AvSId= 0.315

Global pre-processing (prepro0) Preprocessed profile for sequence 3: 4fxn MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE 1fx1 ALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGSYEYFCGA-VDAIE 2fcr IGIFFSTSTGNTTEVADFIGKTL--GAKADAPIDVDDVTDPQALKDDLLFLGANTGADTERSGTSWDEFLYDKLPEVDMKDLPV-AIFGLGDAEGYPDFC FLAV_ANASP IGLFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTWNIGEL-QSDWEGLYSELDVDFNGKLVAYfGTIGYADNDAIGILE FLAV_AZOVI IGLFFGSNTGKTRKVaKSIKKRFDDETMS-DALNVNRVSAEDFAQYQFLILgTPTLGEGELENESWEEFLPKIGLDFSGKTVALfGQVGYPEGELYSFFK FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVDKKFLQESEGIIFgTPTYYANI--SWEMKKWIDESSENLEGKLGAAfSTAGGSDIALLTILN FLAV_DESDE VLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEYVPAIE FLAV_DESGI ALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGSYTYFCGA-VDVIE FLAV_DESSA MSIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGDYTYFCGA-VDAIE FLAV_DESVH ALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGSYEYFCGA-VDAIE FLAV_ECOLI TGIFFGSDTGNTENIaKMIQK---QLGKDVADVDIAKSSKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGDYAFCDAGTIRDIE FLAV_ENTAG IGIFFGSDTGQTRKVaKLIHQK-LDGIADA-PLDVRRATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALfGNYSKNFVSAMRILY FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK 3chy DKELKFLVVDDFSTMRRIVRNLLKELG--FNNVEEAEDGVD-ALNK-LQAGGYGVISDWNMPNMDGLELLKTI--RADGAMSALPVLMVTAEAKKENIIA 4fxn ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI 1fx1 EKLKNLGAEIVQDGLRIDGDPRAARDDIVGWAHDVRGA 2fcr DAIEEHDCFAKQKPVGFSNPDDESKNDQIPMEKRVAGW FLAV_ANASP EKISGYGSKALRNGKFVGLALDEDNQDLTDDRIKVAQL FLAV_AZOVI DRTDGYEAVVVGLALDLDNQSGKTDERVAAwLAQIAPE FLAV_CLOAB HLMKgYGGVAFGKPYVHINEIQENEDENARfGERiANk FLAV_DESDE ERAKELgATIIAEGLKMEGDASNDPEAVASfAEDVLKQ FLAV_DESGI KKAEELgATLVASSLKIDGEPDSAE--VLDwAREVARV 
FLAV_DESSA EKLEKMgAVVIGDSLKIDGDPERDE--IVSwGSGIADI FLAV_DESVH EKLKNLgAEIVQDGLRIDGDPRAARDDIVGwAHDVRGA FLAV_ECOLI PRTAGYGLAFVGLAIDEDRQPELTAERVEKwVKQISEE FLAV_ENTAG DLVIARgCVVGNWPLLENNEPDQENQDLTELEKKPAVL FLAV_MEGEL QRTEDTgATVIGT-AIVNEMPDNA-PECKElGEAAAKA 3chy AAQAGASGYVVK-PFTAATLEEKLNKIFEKLGM----- Iteration -1 SP= 121196.00 AvSP= 10.075 SId= 3288 AvSId= 0.273

Reliable sequences for pre-profiles

Pre-profiles (prepro1500) 2

Pre-profiles (prepro1500) 13 14

Local pre-processing. Local alignments are calculated from high to low scoring. Each time, the sequence parts corresponding to a selected local alignment are blocked, so that the next local alignment has to emerge before or after the earlier selected one. This preserves co-linearity of the local alignments and the associated sequence fragments in the pre-alignments.
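The blocking rule amounts to a greedy selection of co-linear local alignments, which can be sketched in Python as follows. The tuple layout `(score, key_start, key_end, other_start, other_end)` with half-open intervals and the example hits are assumptions for illustration; a real implementation would recompute local alignments on the unblocked regions rather than filter a fixed list.

```python
def select_colinear(hits):
    """Greedily pick local alignments from high to low score.

    hits -- list of (score, key_start, key_end, other_start, other_end),
            with half-open intervals on the key and the other sequence.
    A hit is accepted only if, relative to every hit already chosen, it lies
    entirely before it on both sequences or entirely after it on both.
    """
    chosen = []
    for hit in sorted(hits, key=lambda h: h[0], reverse=True):
        _, ks, ke, os_, oe = hit
        colinear = all((ke <= cks and oe <= cos) or (ks >= cke and os_ >= coe)
                       for _, cks, cke, cos, coe in chosen)
        if colinear:
            chosen.append(hit)
    # Report the accepted fragments in sequence order.
    return sorted(chosen, key=lambda h: h[1])

hits = [
    (90, 10, 30, 12, 32),   # best hit, accepted first
    (70, 40, 60, 45, 65),   # after the best hit on both sequences
    (60, 25, 45, 0, 10),    # overlaps the blocked region on the key sequence
    (50, 0, 8, 50, 58),     # before on the key sequence but after on the other
    (40, 0, 8, 0, 8),       # before the best hit on both sequences
]
for hit in select_colinear(hits):
    print(hit)
```

The hits scoring 60 and 50 are rejected, the first because it overlaps a blocked region and the second because it would cross an accepted alignment.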

Local pre-processing (locprepro0) Preprocessed profile for sequence 2: 2fcr 2fcr KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD 1fx1 ...IVYGSTTGNTEYTAETIARQL---ANAGYEVDDAASVEAFEGFDLVLLGCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACFGCGDS-SY-E 4fxn KI-VYWS-GTGNTEKMAELIAKGIGKDVNT-INVSDVNIDELLNE-DILILGCSA--MGDEVEES--EFEPF----IEEISTKGKKVALFGWGDGKGYG- FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKTM---SDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGSDCENE--SWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTY-------YANISWEKWI-DESSEFNLEGKLGAAfSTANSAGGSD FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAAAADA--SAENLAD-----GYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E FLAV_DESGI ...IVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDIELQ----EDFLYEDLDRAGLKDKKVGVfGCGDS-SY-T FLAV_DESSA ...IVYGSTTGNTETAaEYVAEAFENK---EIDVENVTD-VSVADYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T FLAV_DESVH ...IVYGSTTGNTEYTaETIAREL---ADAGYEVDDAASVEAFEGFDLVLLgCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACfGCGDS-SY-E FLAV_ECOLI ..GIFFGSDTGNTENIaKMIQKQLG-K-----DVADVHDKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE FLAV_ENTAG .IGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPT--LG-DGELPGVSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK FLAV_MEGEL .VEIVYWSGTGNTEAMaNEIEKAAGADVESDTNVDDV----ASK--DVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG--- 3chy ...........................................................ADKELKFLVVDDFIVRNL----LKEL-----GFNNVEEAED 2fcr NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV 1fx1 YFCDAIEE------K--LKNLG-----------AEIVQD----GLRID--GD--PRAARIVGWAHDV...... 4fxn --CVVVE-----------TPLIVQNPDE---AEQDCIEFGK................................ 
FLAV_ANASP NFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL FLAV_AZOVI NYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGL FLAV_CLOAB ---IALLTIH-LMVKSGG--VAFGKPKTHGYVHINEIQENE------D-ENARI-fGERiANkVKQI...... FLAV_DESDE HFCGAVPAI-----EERAKELg-----------ATIIAEGKMEG---DASND--P--EAVASfAEDVLKQ... FLAV_DESGI YFCGAVDVIEKKAEELgATLVASSEPD------SAEVLD.................................. FLAV_DESSA YFCGAVDAIEEKLEKMgAVVIGDSLKIDGDPERDEIVSwGS--G-----IADKI................... FLAV_DESVH YFCDAIEE------K--LKNLg-----------AEIVQD----GLRID--GD--PRAARIVGwAHDV...... FLAV_ECOLI YFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDHFVGLAID--EDRQPTAERVEKwVKQISEE... FLAV_ENTAG NFVSAMRILYDLVIARgACVVG--NPEGYKFSFSAALENNEFVGLPLDQENQYDLTEERIDSwLEAVL..... FLAV_MEGEL EWMDAWKQTED----TgATVIGTANPDN............................................. 3chy G-VDALNKLQ-------AGGYGFSNMPNMDLELLKTIRDGAMSALPVLMVTAEAKKENIIAGYVAATLEE...

Local pre-processing (locprepro0) Preprocessed profile for sequence 3: 4fxn 4fxn MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE 1fx1 ..IVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGC---GDSSYVDAIE 2fcr .KIIFFSSTGNTTEVADFIGKTL---GAKADAIDVDDVTDPQALKDDLLFLGAPTTGADT-ERSSWDEFLPEVDMK--DLPVAIF---GLGDAE------ FLAV_ANASP ..LFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTIGE--L-QSDWEGLYSELDVDFNGKLVAYfGTIGYADGKWSTDFN FLAV_AZOVI ..LFFGSNTGKTRKVaKSIKKRFDETMSD--ALNVNRVSAEDFAQYQFLILgTPTLGEGELNESEFLPKIEGLD--FSGKTVALfGQVGYGEGSWSTD-- FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVD-KKFLQEEGIIFgTPTMKKWIDESSEFN--LEAfSTANSGSDIALLGGVAFGKPK------ FLAV_DESDE ..IVFGSSTGNTEKLEELIAAG----GHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEY-EHFE FLAV_DESGI ..IVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGC---GDSSYTYDIE FLAV_DESSA ..IVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGC---GDS----DYE FLAV_DESVH ..IVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGC---GDSSYVDAIE FLAV_ECOLI ..IFFGSDTGNTENIaKMIQK---QLGKDV--ADVHDISKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGC---GD---QEDYA FLAV_ENTAG ..IFFGSDTGQTRKVaKLIHQGIADAPLDVRR-----ATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALf---GLGDQNYSKNFV FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK 3chy .RIV......N...LKEL---GFVEEAEDVDALNISDPNMDELLRADVLMVTAEAKKENIIAAAQVKPFLEEKLNKIFEK.................... 4fxn ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI 1fx1 EKLKNLGAEIVQDGLRIDGDPRAARDDIV......... 2fcr ----GYPCDAIEKPVGFSN-PDDEESKSVRDGK..... FLAV_ANASP DSRNGVGLALDE-----DNQSDLTD-DRIEFG...... FLAV_AZOVI ----GYEAVVVGLALDLDNQTDELAQIAPEFG...... FLAV_CLOAB THL-GY----VHINEIQENEDENAR---I-fGERiAN. FLAV_DESDE ERAKELgATIIAEGLKMENDP-EAAEDVLK........ 
FLAV_DESGI KKAEELgATLVASSLKIDGEPDSAE--VLDwAREVARV FLAV_DESSA EKLEKMgAVVIGDSLKIDGDPERDE--IVSwGSGIAD. FLAV_DESVH EKLKNLgAEIVQDGLRIDGDPRAARDDIV......... FLAV_ECOLI E----YFCDALGTDII---EP................. FLAV_ENTAG SAMRg-ACVVGNWPLLENNEPDQENQDLTE........ FLAV_MEGEL QRTEDTgATVIGTAIV--NEPDNA-PECKElGE..... 3chy ......................................

CLUSTAL X (1.64b) multiple sequence alignment: Flavodoxin-cheY (the same alignment as shown earlier)

Flavodoxin-cheY: Pre-processing (prepro1500) 1fx1 -PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACF FLAV_DESDE MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAf FLAV_DESVH MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACf FLAV_DESSA MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVf FLAV_DESGI MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf 2fcr --KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIF FLAV_AZOVI -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALf FLAV_ENTAG MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALf FLAV_ANASP SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-SELDDVDFNGKLVAYf FLAV_ECOLI -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-PTLEEIDFNGKLVALf 4fxn -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-EEIS-TKISGKKVALF FLAV_MEGEL MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-TDLA-PKLKGKKVGLf FLAV_CLOAB -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-DESSEFNLEGKLGAAf 3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-KTIRADGAMSALPVLM T 1fx1 GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-------- FLAV_DESDE ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-------- FLAV_DESVH GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------- FLAV_DESSA GCGDS-DY-TYFCGA-VDAIEEKLEKMgAVVIGD---------------------SLKIDGD--PE--RDEIVSwGSGIADKI-------- 
FLAV_DESGI GCGDS-SY-TYFCGA-VDVIEKKAEELgATLVAS---------------------SLKIDGE--PD--SAEVLDwAREVLARV-------- 2fcr GLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKS-VRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------ FLAV_AZOVI GLGDQVGYPENYLDA-LGELYSFFKDRgAKIVGSWSTDGYEFESSEA-VVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-- FLAV_ENTAG GLGDQLNYSKNFVSA-MRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------ FLAV_ANASP GTGDQIGYADNFQDA-IGILEEKISQRgGKTVGYWSTDGYDFNDSKA-LRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------ FLAV_ECOLI GCGDQEDYAEYFCDA-LGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA 4fxn G-----SY-GWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI--------- FLAV_MEGEL G-----SY-GWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNA-PECKElGEAAAKA--------- FLAV_CLOAB STANSIAGGSDIA---LLTILNHLMVKgMLVYSG----GVAFGKPKTHLGYVHINEIQENEDENARIfGERiANkVKQIF----------- 3chy VTAEAKK--ENIIAA---------AQAGAS-------------------------GYVV-----KPFTAATLEEKLNKIFEKLGM------ G Iteration 0 SP= 136944.00 AvSP= 10.675 SId= 4009 AvSId= 0.313

Flavodoxin-cheY: Local Pre-processing (locprepro300) 1fx1 --PKALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACF FLAV_DESVH -MPKALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACf FLAV_DESSA -MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPL--YDSLENADLKGKKVSVf FLAV_DESGI -MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPL--YEDLDRAGLKDKKVGVf FLAV_DESDE -MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSL--FEEFNRFGLAGRKVAAf 4fxn --MK--IVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--IEEIS-TKISGKKVALF FLAV_MEGEL -MVE--IVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--FTDLA-PKLKGKKVGLf 2fcr ---KIGIFFSTSTGNTTEVADFIGKTLGAKADAPI--DVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFL-YDKLPEVDMKDLPVAIF FLAV_ANASP -SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTLH--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL--------QSDWEGL--YSELDDVDFNGKLVAYf FLAV_AZOVI --AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSDA-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--LPKIEGLDFSGKTVALf FLAV_ENTAG -MATIGIFFGSDTGQTRKVaKLIHQKLDG--IADAPLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--TNTLSEADLTGKTVALf FLAV_ECOLI --AITGIFFGSDTGNTENIaKMIQKQLGKDVADVH--DIAKSSK-EDLEAYDILLLgIPTWYYGEA--------QCDWDDF--FPTLEEIDFNGKLVALf FLAV_CLOAB --MKISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNLDAVDKKFLQESEGIIFgTPTYYA-----------NISWEMKKWIDESSEFNLEGKLGAAf 3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM 1fx1 GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEIVQD---------------------GLRID--GDPRAARDDIVGWAHDVRGAI-------- FLAV_DESVH GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEIVQD---------------------GLRID--GDPRAARDDIVGwAHDVRGAI-------- FLAV_DESSA GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVVIGD---------------------SLKID--GDPE--RDEIVSwGSGIADKI-------- FLAV_DESGI 
GCGDS--SY-TYFCGA-VD--VIEKKAEELgATLVAS---------------------SLKID--GEPD--SAEVLDwAREVLARV-------- FLAV_DESDE ASGDQ--EY-EHFCGA-VP--AIEERAKELgATIIAE---------------------GLKME--GDASNDPEAVASfAEDVLKQL-------- 4fxn GS------Y-GWGDGKWMR--DFEERMNGYGCVVVET---------------------PLIVQ--NEPDEAEQDCIEFGKKIANI--------- FLAV_MEGEL GS------Y-GWGSGEWMD--AWKQRTEDTgATVIGT---------------------AI-VN--EMPDNA-PECKElGEAAAKA--------- 2fcr GLGDAE-GYPDNFCDA-IE--EIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------ FLAV_ANASP GTGDQI-GYADNFQDA-IG--ILEEKISQRgGKTVGYWSTDGYDFNDSKALRN-GKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------ FLAV_AZOVI GLGDQV-GYPENYLDA-LG--ELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-- FLAV_ENTAG GLGDQL-NYSKNFVSA-MR--ILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------ FLAV_ECOLI GCGDQE-DYAEYFCDA-LG--TIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA FLAV_CLOAB STANSIAGGSDIALLTILNHLMVKgMLVYSGGVAFGKPKTHLGYVH----------INEIQENEDENARIfGERiANkVKQIF----------- 3chy VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------ G

Strategies for multiple sequence alignment: profile pre-processing; secondary structure-induced alignment (Praline-SS); globalised local alignment; matrix extension. Objective: integrate secondary structure information to anchor alignments and avoid errors.

Protein structure hierarchical levels. PRIMARY STRUCTURE (amino acid sequence): VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH. SECONDARY STRUCTURE (helices, strands). TERTIARY STRUCTURE (fold). QUATERNARY STRUCTURE (oligomers).

Why use (predicted) structural information? “Structure is more conserved than sequence.” Many structural protein families (e.g. globins) have members with very low sequence similarity. For example, globin sequence identities can be as low as 10% while the proteins still share the same fold. This means that equivalent secondary structures can still be observed in homologous proteins even when sequence similarity is extremely low. However, you depend on the quality of the prediction methods. For example, secondary structure prediction is currently about 76% correct, so roughly 1 in 4 residues is still predicted incorrectly.
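As a small illustration of what a figure like 76% means, the sketch below computes the fraction of residues whose predicted state matches the observed one. It assumes that "correctness" refers to per-residue three-state accuracy (often called Q3); the slide does not say which measure is meant, and the example strings are made up.

```python
def q3(predicted, observed):
    """Fraction of positions where the predicted state (H, E or C)
    equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)

# Hypothetical prediction versus observed states for a 12-residue stretch.
accuracy = q3("HHHCCCEEEEEE", "HHHCCCEEECCC")
```

With 9 of 12 states matching, the accuracy here is 0.75, close to the figure quoted above.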

Two superposed protein structures with two well-superposed helices. The superposed structures lead to close pairs of Cα atoms that are taken as equivalent; this leads to a structural alignment in which the amino acids corresponding to equivalent Cα atom pairs are matched. Red: well superposed. Blue: low match quality. The C5 anaphylatoxin proteins from human (PDB code 1kjs) and pig (1c5a) are superposed.

How to combine secondary structure and amino acid information. [Figure: dynamic programming search matrix for the sequence MDAGSTVILCFV with secondary structure HHHCCCEEEEEE; amino acid substitution matrices labelled H, C and E for the matching secondary structure states, plus a Default matrix.]

In terms of scoring… So how would you score a profile using this extra information? The scoring works the same way as before, but you can use secondary-structure-specific substitution scores in various combinations. Where does it fit in? Very important: structure is always more conserved than sequence, so secondary structure elements can help anchor the alignments.
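A minimal sketch of secondary-structure-specific scoring, under stated assumptions: when two residues have the same (predicted) state, a matrix specific to that state is used, otherwise a default matrix. The matrices here are toy match/mismatch functions with made-up values, not real substitution matrices, and the names `SS_MATRICES`, `DEFAULT_MATRIX` and `ss_aware_score` are hypothetical.

```python
def uniform_matrix(match, mismatch):
    """Toy stand-in for a substitution matrix: one value for identical
    residues, another for everything else."""
    def score(x, y):
        return match if x == y else mismatch
    return score

# Hypothetical state-specific matrices (H = helix, E = strand, C = coil).
SS_MATRICES = {
    "H": uniform_matrix(5, -2),
    "E": uniform_matrix(6, -3),
    "C": uniform_matrix(3, -1),
}
DEFAULT_MATRIX = uniform_matrix(4, -2)

def ss_aware_score(aa1, ss1, aa2, ss2):
    """Score a residue pair, choosing the matrix by secondary structure."""
    if ss1 == ss2 and ss1 in SS_MATRICES:
        matrix = SS_MATRICES[ss1]
    else:
        matrix = DEFAULT_MATRIX
    return matrix(aa1, aa2)

# One row of a search matrix: the first residue of the slide's example
# sequence scored against every position of the same sequence.
seq, ss = "MDAGSTVILCFV", "HHHCCCEEEEEE"
row = [ss_aware_score(seq[0], ss[0], aa, s) for aa, s in zip(seq, ss)]
```

The rest of the dynamic programming is unchanged; only the lookup of the exchange score depends on the secondary structure states.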

Workflow: sequences to be aligned; predict secondary structure (HHHHCCEEECCCEEECCHH, HHHCCCCEECCCEEHHH, HHHHHHHHHHHHHCCCEEEE, CCCCCCEECCCEEEECCHH, HHHHHCCEEEECCCEECCC); align sequences using secondary structure; multiple alignment.

Using predicted secondary structure 1fx1 -PK-ALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACF e eeee b ssshhhhhhhhhhhhhhttt eeeee stt tttttt seeee b ee sss ee ttthhhhtt ttss tt eeeee FLAV_DESVH MPK-ALIVYGSTTGNTEYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLFDS-LEETGAQGRKVACf e eeeeee hhhhhhhhhhhhhhh eeeeee eeeeee hhhhhh eeeee FLAV_DESGI MPK-ALIVYGSTTGNTEGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLYED-LDRAGLKDKKVGVf e eeeeee hhhhhhhhhhhhhh eeeeee hhhhhh eeeeeee hhhhhh eeeeee FLAV_DESSA MSK-SLIVYGSTTGNTETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLYDS-LENADLKGKKVSVf eeeeee hhhhhhhhhhhhhh eeeee eeeee hhhhhhh h eeeee FLAV_DESDE MSK-VLIVFGSSTGNTESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLFEE-FNRFGLAGRKVAAf eeee hhhhhhhhhhhhhh eeeee hhhhhhhhhhheeeee hhhhhhh hh eeeee 2fcr --K-IGIFFSTSTGNTTEVADFIGKTLGAK---ADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFLYDKLPEVDMKDLPVAIF eeeee ssshhhhhhhhhhhhhggg b eeggg s gggggg seeeeeee stt s s s sthhhhhhhtggg tt eeeee FLAV_ANASP SKK-IGLFYGTQTGKTESVaEIIRDEFGND--VVTL-HDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLYSE-LDDVDFNGKLVAYf eeeee hhhhhhhhhhhh eee hhh hhhhhhheeeeee hhhhhhhhh eeeeee FLAV_ECOLI -AI-TGIFFGSDTGNTENIaKMIQKQLGKD--VADV-HDIAKSS-KEDLEAYDILLLgIPTWYYGEA--------QCDWDDFFPT-LEEIDFNGKLVALf eee hhhhhhhhhhhh eee hhh hhhhhhheeeee hhhhh eeeeee FLAV_AZOVI -AK-IGLFFGSNTGKTRKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFLPK-IEGLDFSGKTVALf eee hhhhhhhhhhhhh hhh hhhhhhheeeee hhhhhhhhh eeeeee FLAV_ENTAG MAT-IGIFFGSDTGQTRKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFTNT-LSEADLTGKTVALf eeee hhhhhhhhhhhh hhh hhhhhhheeeee hhhhh eeeee 4fxn ----MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVNIDELLNE-DILILGCSAMGDEVL------E-ESEFEPFIEE-IST-KISGKKVALF eeeee ssshhhhhhhhhhhhhhhtt eeeettt sttttt seeeeee btttb ttthhhhhhh hst t tt eeeee FLAV_MEGEL 
M---VEIVYWSGTGNTEAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVASK-DVILLgCPAMGSEEL------E-DSVVEPFFTD-LAP-KLKGKKVGLf hhhhhhhhhhhhhh eeeee hhhhhhhh eeeee eeeee FLAV_CLOAB M-K-ISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNL-DAVDKKFLQESEGIIFgTPTY-YANI--------SWEMKKWIDE-SSEFNLEGKLGAAf eee hhhhhhhhhhhhhh eeeeee hhhhhhhhhh eeee hhhhhhhhh eeeee 3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNN-VEEAEDGV-DALNKLQAGGYGFVISD---WNMPNM----------DGLELLKTIRADGAMSALPVLMV tt eeee s hhhhhhhhhhhhhht eeeesshh hhhhhhhh eeeee s sss hhhhhhhhhh ttttt eeee 1fx1 GCGDS-SY-EYFCGAVDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI-------- eee s ss sstthhhhhhhhhhhttt ee s eeees gggghhhhhhhhhhhhhh FLAV_DESVH GCGDS-SY-EYFCGAVDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI-------- eee hhhhhhhhhhhh eeeee eeeee hhhhhhhhhhhhhh FLAV_DESGI GCGDS-SY-TYFCGAVDVIEKKAEELgATLVAS---------------------SLKIDGE--P--DSAEVLDwAREVLARV-------- eee hhhhhhhhhhhh eeeee hhhhhhhhhhh FLAV_DESSA GCGDS-DY-TYFCGAVDAIEEKLEKMgAVVIGD---------------------SLKIDGD--P--ERDEIVSwGSGIADKI-------- hhhhhhhhhhhh eeeee e eee FLAV_DESDE ASGDQ-EY-EHFCGAVPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL-------- e hhhhhhhhhhhhhh eeeee ee hhhhhhhhhhh 2fcr GLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------ eee ttt ttsttthhhhhhhhhhhtt eee b gggs s tteet teesseeeettt ss hhhhhhhhhhhhhhhht FLAV_ANASP GTGDQIGYADNFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------ hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhh FLAV_ECOLI GCGDQEDYAEYFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA hhhhhhhhhhhhhh eeee hhhhhhhhhhhhhhhhhh FLAV_AZOVI GLGDQVGYPENYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L-- e hhhhhhhhhhhhhh eeeee hhhhhhhhhhh FLAV_ENTAG GLGDQLNYSKNFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------ hhhhhhhhhhhhhhh eeee hhhhhhh hhhhhhhhhhhh 4fxn 
G-----SYGWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI--------- e eesss shhhhhhhhhhhhtt ee s eeees ggghhhhhhhhhhhht FLAV_MEGEL G-----SYGWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNAPE-CKElGEAAAKA--------- hhhhhhhhhhh eeeee eeee h hhhhhhhh FLAV_CLOAB STANSIA-GGSDIALLTILNHLMVK-gMLVYSG----GVAFGKPKTHLG-----YVHINEI--QENEDENARIfGERiANkV--KQIF-- hhhhhhhhhhhhhh eeeee hhhh hhh hhhhhhhhhhhh h 3chy -----------TAEAKKENIIAAAQAGASGY-------------------------VVK----P-FTAATLEEKLNKIFEKLGM------ ess hhhhhhhhhtt see ees s hhhhhhhhhhhhhhht G

Strategies for multiple sequence alignment (not for exam): profile pre-processing; secondary structure-induced alignment; globalised local alignment; matrix extension. Objectives: instead of single amino acid positions, focus on local alignments; consider the best local alignment through each cell in the DP matrix; try to avoid (early) errors.

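Globalised local alignment (not for exam). Double dynamic programming: 1. Local (SW) alignment, using the exchange matrix M and the gap-opening and gap-extension penalties Po and Pe. 2. Global (NW) alignment over the result of step 1, without M or gap penalties.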
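The two steps can be sketched as follows. This is an illustration under simplifying assumptions, not the PRALINE code: a single linear gap penalty replaces Po and Pe, the exchange scores come from a toy function instead of BLOSUM62, and all function names are hypothetical. Step 1 gives every cell the score of the best local alignment that passes through it; step 2 runs a global pass over those values with no further matrix or gap penalties.

```python
def sw_matrix(a, b, score, gap):
    """Smith-Waterman matrix with a linear gap penalty (values >= 0)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
    return H

def local_through_cell(a, b, score, gap):
    """Step 1: best local alignment score through each residue pair.

    The forward matrix gives the best score ending just before a cell,
    the matrix of the reversed sequences gives the best score starting
    just after it.
    """
    n, m = len(a), len(b)
    fwd = sw_matrix(a, b, score, gap)
    bwd = sw_matrix(a[::-1], b[::-1], score, gap)
    return [[fwd[i][j] + score(a[i], b[j]) + bwd[n - 1 - i][m - 1 - j]
             for j in range(m)] for i in range(n)]

def global_without_penalties(W):
    """Step 2: global (NW-style) pass over W, no gap penalties."""
    n, m = len(W), len(W[0])
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + W[i - 1][j - 1],
                          D[i - 1][j],
                          D[i][j - 1])
    return D[n][m]

def toy_score(x, y):
    # Hypothetical exchange score: +2 for identity, -1 otherwise.
    return 2 if x == y else -1

W = local_through_cell("ACGT", "ACGT", toy_score, gap=2)
total = global_without_penalties(W)
```

For two identical sequences every diagonal cell carries the score of the full-length local alignment, so the global pass simply follows the diagonal.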

Globalised local alignment (not for exam). [Figure: steps 1 and 2.]

M = BLOSUM62, Po= 0, Pe= 0 not for exam

M = BLOSUM62, Po= 12, Pe= 1 not for exam

M = BLOSUM62, Po= 60, Pe= 5 not for exam

Strategies for multiple sequence alignment: profile pre-processing; secondary structure-induced alignment; globalised local alignment; matrix extension. Objective: try to avoid (early) errors.

Integrating alignment methods and alignment information with T-Coffee: integrating different pair-wise alignment techniques (NW, SW, ..); combining different multiple alignment methods (consensus multiple alignment); combining sequence alignment methods with structural alignment techniques; plugging in user knowledge.

Matrix extension: T-Coffee (Tree-based Consistency Objective Function For alignmEnt Evaluation). Cedric Notredame (“Bioinformatics for dummies”), Des Higgins, Jaap Heringa. J. Mol. Biol. 302, 205-217 (2000).

Using different sources of alignment information. [Figure: alignments from Clustal, Dialign, Lalign, structure alignments and manual alignments are combined by T-Coffee.]

T-Coffee library system
Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35

Matrix extension. [Figure: pairwise alignments between sequences 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4.]

Search matrix extension – alignment transitivity
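Alignment transitivity can be sketched as a library extension step: the weight of a residue pair is its direct weight plus, for every residue in a third sequence that is aligned to both, the smaller of the two connecting weights. This follows the extension described for T-Coffee, but the data layout, the function name `extend_library` and the toy weights below are assumptions for illustration only.

```python
from collections import defaultdict

def extend_library(library):
    """Extend pairwise weights through shared partner residues.

    library: dict mapping ((seq, pos), (seq, pos)) -> weight, where each
    key pairs residues from two different sequences.
    Returns a dict keyed by frozenset of the two residues.
    """
    partners = defaultdict(dict)
    extended = defaultdict(int)
    for (x, y), w in library.items():
        partners[x][y] = w
        partners[y][x] = w
        extended[frozenset((x, y))] += w      # direct contribution

    # Every residue z links each pair of its partners x and y.
    for z, linked in partners.items():
        items = sorted(linked.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (x, wx), (y, wy) = items[a], items[b]
                if x[0] != y[0]:              # only pairs from different sequences
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)

# Hypothetical library: one residue from each of three sequences A, B, C.
A1, B1, C1 = ("A", 1), ("B", 1), ("C", 1)
toy_library = {(A1, B1): 88, (A1, C1): 77, (C1, B1): 100}
extended = extend_library(toy_library)
```

In this toy case the pair A1-B1 gains support from the path through C1, so its extended weight is larger than its direct weight.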

T-Coffee Other sequences Direct alignment

Search matrix extension

T-COFFEE web-interface

3D-COFFEE computes structure-based alignments. Structures associated with the sequences are retrieved and the information is used to optimise the MSA. More accurate … but for many (many) proteins we do not have the structure!

but..... T-COFFEE (V1.23) multiple sequence alignment Flavodoxin-cheY 1fx1 ----PKALIVYGSTTGNTEYTAETIARQLANAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK----- FLAV_DESVH ---MPKALIVYGSTTGNTEYTAETIARELADAG-YEVDSRDAASVE-AGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPL-FDSLEETGAQGRK----- FLAV_DESGI ---MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-METTVVNVADVT-APGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPL-YEDLDRAGLKDKK----- FLAV_DESSA ---MSKSLIVYGSTTGNTETAAEYVAEAFENKE-IDVELKNVTDVS-VADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPL-YDSLENADLKGKK----- FLAV_DESDE ---MSKVLIVFGSSTGNTESIAQKLEELIAAGG-HEVTLLNAADAS-AENLADGYDAVLFGCSAWGMEDLE------MQDDFLSL-FEEFNRFGLAGRK----- 4fxn ------MKIVYWSGTGNTEKMAELIAKGIIESG-KDVNTINVSDVN-IDELL-NEDILILGCSAMGDEVLE-------ESEFEPF-IEEIS-TKISGKK----- FLAV_MEGEL -----MVEIVYWSGTGNTEAMANEIEAAVKAAG-ADVESVRFEDTN-VDDVA-SKDVILLGCPAMGSEELE-------DSVVEPF-FTDLA-PKLKGKK----- FLAV_CLOAB ----MKISILYSSKTGKTERVAKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQ-ESEGIIFGTPTYYAN---------ISWEMKKW-IDESSEFNLEGKL----- 2fcr -----KIGIFFSTSTGNTTEVADFIGKTLGAKA---DAPIDVDDVTDPQAL-KDYDLLFLGAPTWNTGA----DTERSGTSWDEFLYDKLPEVDMKDLP----- FLAV_ENTAG ---MATIGIFFGSDTGQTRKVAKLIHQKLDGIA---DAPLDVRRAT-REQF-LSYPVLLLGTPTLGDGELPGVEAGSQYDSWQEF-TNTLSEADLTGKT----- FLAV_ANASP ---SKKIGLFYGTQTGKTESVAEIIRDEFGNDV---VTLHDVSQAE-VTDL-NDYQYLIIGCPTWNIGEL--------QSDWEGL-YSELDDVDFNGKL----- FLAV_AZOVI ----AKIGLFFGSNTGKTRKVAKSIKKRFDDET-M-SDALNVNRVS-AEDF-AQYQFLILGTPTLGEGELPGLSSDCENESWEEF-LPKIEGLDFSGKT----- FLAV_ECOLI ----AITGIFFGSDTGNTENIAKMIQKQLGKDV---ADVHDIAKSS-KEDL-EAYDILLLGIPTWYYGEA--------QCDWDDF-FPTLEEIDFNGKL----- 3chy ADKELKFLVVD--DFSTMRRIVRNLLKELGFN-NVE-EAEDGVDALNKLQ-AGGYGFVISDWNMPNMDGLE--------------LLKTIRADGAMSALPVLMV :. . . : . 
:: 1fx1 ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI-------- FLAV_DESVH ---------VACFGCGDSS--YEYFCGA-VDAIEEKLKNLGAEIVQDG---------------------LRIDGDPRAA--RDDIVGWAHDVRGAI-------- FLAV_DESGI ---------VGVFGCGDSS--YTYFCGA-VDVIEKKAEELGATLVASS---------------------LKIDGEPDSA----EVLDWAREVLARV-------- FLAV_DESSA ---------VSVFGCGDSD--YTYFCGA-VDAIEEKLEKMGAVVIGDS---------------------LKIDGDPE----RDEIVSWGSGIADKI-------- FLAV_DESDE ---------VAAFASGDQE--YEHFCGA-VPAIEERAKELGATIIAEG---------------------LKMEGDASND--PEAVASFAEDVLKQL-------- 4fxn ---------VALFGS------YGWGDGKWMRDFEERMNGYGCVVVETP---------------------LIVQNEPD--EAEQDCIEFGKKIANI--------- FLAV_MEGEL ---------VGLFGS------YGWGSGEWMDAWKQRTEDTGATVIGTA---------------------IV--NEMP--DNAPECKELGEAAAKA--------- FLAV_CLOAB ---------GAAFSTANSI--AGGSDIA-LLTILNHLMVKGMLVY----SGGVAFGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF----------- 2fcr ---------VAIFGLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDG-KFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------ FLAV_ENTAG ---------VALFGLGDQLNYSKNFVSA-MRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL------- FLAV_ANASP ---------VAYFGTGDQIGYADNFQDA-IGILEEKISQRGGKTVGYWSTDGYDFNDSKALRNG-KFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------ FLAV_AZOVI ---------VALFGLGDQVGYPENYLDA-LGELYSFFKDRGAKIVGSWSTDGYEFESSEAVVDG-KFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL---- FLAV_ECOLI ---------VALFGCGDQEDYAEYFCDA-LGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA 3chy TAEAKKENIIAAAQAGASGYVVKPFT---AATLEEKLNKIFEKLGM---------------------------------------------------------- .

Multiple alignment methods. Multi-dimensional dynamic programming: extension of pairwise sequence alignment. Progressive alignment: incorporates phylogenetic information to guide the alignment process. Iterative alignment: corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences.

Iteration. Possible outcomes: convergence, limit cycle, divergence. Iteration can help in cases where one can learn from the data produced in a preceding step, so that the next step can be taken in a ‘more informed’ way.
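The three outcomes can be told apart in practice by remembering the states already visited. The sketch below is generic and not tied to any alignment program: `step` stands for one iteration (for example, realigning and returning the new alignment as a hashable value such as a tuple of strings), and divergence can only be reported as "no repeat within the iteration limit".

```python
def classify_iteration(step, start, max_iter=100):
    """Run step repeatedly and report how the iteration behaves.

    Returns ("convergence", n) if the state stops changing at iteration n,
    ("limit cycle", period) if an earlier state is revisited,
    ("no repeat within max_iter", max_iter) otherwise.
    """
    seen = {start: 0}          # state -> iteration at which it appeared
    state = start
    for n in range(1, max_iter + 1):
        nxt = step(state)
        if nxt == state:
            return ("convergence", n)
        if nxt in seen:
            return ("limit cycle", n - seen[nxt])
        seen[nxt] = n
        state = nxt
    return ("no repeat within max_iter", max_iter)

# Toy iterations on integers, standing in for alignment states.
halving = classify_iteration(lambda x: x // 2, 8)
cycling = classify_iteration(lambda x: (x + 1) % 3, 0)
growing = classify_iteration(lambda x: x + 1, 0, max_iter=10)
```

An iterative aligner would typically stop at convergence and, on a limit cycle, pick one of the alignments in the cycle by some score.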

Pre-profile alignment and alignment consistency. [Figure: pre-profiles for sequences 1 to 5, each built from the alignments of one sequence with the other four; example residues Ala131 (A131), L133 and C126.]

Flavodoxin-cheY consistency scores (PRALINE prepro=0) Completely consistently aligned amino acids 1fx1 --7899999999999TEYTAETIARQL8776-6657777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF FLAV_DESVH -46788999999999TEYTAETIAREL7777-7757777777777777553799VL999ST97775599989-435566677798998878AQGRKVACF FLAV_DESDE -47899999999999999999999988776695658888777777778763YDAVL999SAW9877789877753556666669777776789GRKVAAF FLAV_DESGI -46788999999999TEGVAEAIAKTL9997-76678888777777887539DVVL999ST987776--9889546667776697776557777888888 FLAV_DESSA 93677799999999999999999999988759765777888888888876399999999STW77765--9999536666677797998779999999999 4fxn -878779999999999999999999776666967567788888888888777999999988777776--9889577788888897773237888888888 FLAV_MEGEL 9776779999999999999999997777766-665666677788899976799999999987777669--887362334466695555455778888888 2fcr --87899999999999TEVADFIGK996541900300000112233355679DLLF99999855312888111224555555407777777888888888 FLAV_ANASP -47899LFYGTQTGKTESVAEIIR9777653922356677777777897779999999999988843--9998555778777899998879999999999 FLAV_ECOLI 997789999GSDTGNTENIAKMIQ8774222922456678889999995569999999999755553----99262225555495777767778999999 FLAV_AZOVI --79IGLFFGSNTGKTRKVAKSIK99887759657577888888999777899999999999877761112222222244555-5555555778999999 FLAV_ENTAG 94789999999999999999999998755229223234555555555555688899999998875521111111133477777-7777777999999999 FLAV_CLOAB -86999ILYSSKTGKTERVAK9997555555057678887888887777765778899998522223--9888342234455597777777777777777 3chy 0122222223333335666665555555222922222222222221112163335555755553222888877674533344493332222222222222 Avrg Consist 8667778888888889999999998776554844455566666666665557888888888766544887666334445566586666556778888888 Conservation 0125538675848969746963946463343045244355446543473516658868567554455000000314365446505575435547747759 1fx1 G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------ FLAV_DESVH 
G888799955555559888888888899777----7777797787787978---555555566776555677777778888799------ FLAV_DESDE A88878685555555999988888889998879--8777788-98777777--8555555554433245667777777777599------ FLAV_DESGI 87775977755555677777777777777778---88888887667778777775555555555542424667888887777-------- FLAV_DESSA 977768777555556777777777777777767887777777778888-978985555555556536556888888888877-------- 4fxn 867777555555552666666666555555577887767999877777977777665555555555444466666666555798------ FLAV_MEGEL 8577775666666525556777778888888689977888988776558677885544333222222212233223355557-------- 2fcr 877773573333333777766667777765533333333333333322833333333332244444567777777888777633------ FLAV_ANASP 977773775333344777888888777777733334444444444433833333344444444444455577777788777734------ FLAV_ECOLI 977743786444444777788888888888833334444444444444244444555554555775667788888888877734110000 FLAV_AZOVI 97776355333333466666667777777773333444444444444482333355555555555545558888888877772311---- FLAV_ENTAG 977773886555555866666666677666633333333333333322123333344444444455555665566666555582------ FLAV_CLOAB 766627222222212444444444455555587882222222222222111111122222222222344443333333233399------ 3chy 222227222222224111355431113324578-87778997666556877776322222222222322222323344444422------ Avrg Consist 866656564444444666666666666666656665555565555555655565444443444443344455666666666666889999 Conservation 73663057433334163464534444*746710000011010011000000010434744645443225474454448434301000000 Iteration 0 SP= 135136.00 AvSP= 10.473 SId= 3838 AvSId= 0.297 Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)
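The display convention described above (scores 0 to 9 shown as digits, a score of 10 shown as the amino acid itself) can be sketched as below. How the consistency scores themselves are computed is not shown on the slide, so the sketch takes them as given integers; the function name and the example values are hypothetical.

```python
def encode_consistency(residues, scores, gap_char="-"):
    """Render per-residue consistency scores in the slide's notation.

    residues: aligned sequence (gaps allowed).
    scores: one integer from 0 to 10 per position (ignored at gaps).
    """
    if len(residues) != len(scores):
        raise ValueError("need one score per alignment position")
    out = []
    for aa, s in zip(residues, scores):
        if aa == gap_char:
            out.append(gap_char)
        elif not 0 <= s <= 10:
            raise ValueError("scores must lie between 0 and 10")
        elif s == 10:
            out.append(aa)          # fully consistent: show the residue
        else:
            out.append(str(s))
    return "".join(out)

# Made-up scores for a short aligned fragment.
line = encode_consistency("-MPKAL", [0, 4, 6, 10, 10, 9])
```

Read this way, stretches of letters in the slides mark completely consistently aligned residues and low digits mark uncertain regions.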

Flavodoxin-cheY consistency scores (PRALINE prepro=1500) 1fx1 -42444IVYGSTTGNTEYTAETIARQL886666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF FLAV_DESVH -34444IVYGSTTGNTEYTAETIAREL776666666577777775667888DLVLLGCSTW77766----995476666769-77888788AQGRKVACF FLAV_DESSA -33444IVYGSTTGNTET99999888777655777668888899666686YDIVLFGCSTW77777----996466666779-88SL98ADLKGKKVSVF FLAV_DESGI -34444IVYGSTTGNTEGVA9999999999765555677777886666678DVVLLGCSTW77777----995466666779-88887688888KKVGVF FLAV_DESDE -44777IVFGSSTGNTE988777666655566777778899999777777YDAVLFGCSAW88877----997587777779-8887766777GRKVAAF 4fxn -32222IVYWSGTGNTE8888888876666778888888888NI8888586DILILGCSA888888------8-8888886--66665378ISGKKVALF FLAV_MEGEL -12222IVYWSGTGNTEAMA8888888888888888555555555555485DVILLGCPAMGSE77------572222288--8888755588GKKVGLF 2fcr -41456IFFSTSTGNTTEVA999998865432222765554443244779YDLLFLGAPT944411999-111112454441-8DKLPEVDMKDLPVAIF FLAV_ANASP -00456LFYGTQTGKTESVAEII987755323322427776666623589YQYLIIGCPTW55532--999843678W988899998888888GKLVAYF FLAV_AZOVI -42445LFFGSNTGKTRKVAKSIK87777434333536666665467777YQFLILGTPTLGEG862222222222355558-45666666888KTVALF FLAV_ENTAG -266IGIFFGSDTGQTRKVAKLIHQKL6664664424DVRRATR88888SYPVLLLGTPT88888644444444446WQEF8-8NTLSEADLTGKTVALF FLAV_ECOLI -51114IFFGSDTGNTENIAKMI987743311111555555588355599YDILLLGIPT954431----88355225544--44666666779KLVALF FLAV_CLOAB -63666ILYSSKTGKTERVAKLIE63333333333333333333366LQESEGIIFGTPTY63--6--------66SWE33333333333333GKLGAAF 3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM----------DGLEL--LKTIRADGAMSALPVLM Avrg Consist 9334459999999999999999988776655555555666667756667889999999999767658888775555566668967777677889999999 Conservation 0236428675848969746963946463344354312564565414344366588685675544550000003144654460055575345547747759 1fx1 G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899 FLAV_DESVH 
G98879-89-999877977--7788899999999955--88888-9988887798999777778766553344588776666222266899899 FLAV_DESSA G98878-688688888-88--88999999999999979988888887788889-89-9787777666756645577776666654466899899 FLAV_DESGI G98879-898688888987--788888999GATLV7698899-9998789888-8899787878776663122477788888333276899899 FLAV_DESDE AS8888-68-888888899--9999999999988888-99988888988778897888776668854222212255555555333277999999 4fxn GS2228-228222222222--2388888888888888888888888888888888888887778866765535577555533221288888888 FLAV_MEGEL G4888--28-8888882MD--AWKQRTEDTGATVI77---------------------77222--224444222222244222112-------- 2fcr GLGDA5-8Y5DNFC88-88--8877777777777765444555555555544385555777774465333357799999987555333899899 FLAV_ANASP GTGDQ5-GY5899999-99--99EEKISQRGG99975555544444444433284444466665555555556666676666433333899899 FLAV_AZOVI GLGDQ5-885777555-55--55555788888888555555555555555554855555555555666555555888855555544442--288 FLAV_ENTAG GLGDQL-NYSKNFVSA-MR--ILYDLVIARGACVVG8888EGYKFSFSAA6664NEFVGLPLDQEN88888EERIDSWLE88842242688688 FLAV_ECOLI GC99549784688888987997777777778888855444444444444444114444777774455775567788888887433322100100 FLAV_CLOAB STANS6366663333333333336666666666666666663333363366336663333336EDENARIFGERIANKVKQI333333666666 3chy VTAEA---KKENIIAA-----------AQAGAS-------------------------GYVVK-----PFTAATLEEKLNKIFEKLGM------ Avrg Consist 9988779787777777777997788888888888866777777777767766677777676667766655455577776666433355788788 Conservation 746640037154545706300354534444*745753000001010010000000010683760144442335574454448434301000000 Iteration 0 SP= 136702.00 AvSP= 10.654 SId= 3955 AvSId= 0.308 Consistency values are scored from 0 to 10; the value 10 is represented by the corresponding amino acid (red)

Consistency iteration Pre-profiles Multiple alignment positional consistency scores

Pre-profile update iteration Pre-profiles Multiple alignment

Iterate similarity matrix, guide tree and MSA. [Figure: pairwise scores (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix, which gives the guide tree and then the multiple alignment.] This way of iterating was already implemented in 1984 by Hogeweg and Hesper.

Secondary structure-induced alignment

PRALINE: using secondary structure for alignment. [Figure: dynamic programming search matrix for the sequence MDAGSTVILCFV with secondary structure HHHCCCEEEEEE; amino acid exchange weights matrices labelled H, C and E for the matching secondary structure states, plus a Default matrix.]

Flavodoxin-cheY multiple alignment/secondary structure iteration: cheY SSEs
3chy-AA SEQUENCE|| AA |ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|
3chy-ITERATION-0|| PHD | EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE |
3chy-ITERATION-1|| PHD | EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE |
3chy-ITERATION-2|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE |
3chy-ITERATION-3|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE |
3chy-ITERATION-4|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE |
3chy-ITERATION-5|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE |
3chy-ITERATION-6|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE |
3chy-ITERATION-7|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE |
3chy-ITERATION-8|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE |
3chy-ITERATION-9|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE |
3chy-AA SEQUENCE|| AA |NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|
3chy-ITERATION-0|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH |
3chy-ITERATION-1|| PHD | HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-2|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-3|| PHD | HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-4|| PHD | HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-5|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-6|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |
3chy-ITERATION-7|| PHD | HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-8|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |
3chy-ITERATION-9|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |

Flavodoxin-cheY multiple alignment/ secondary structure iteration cheY SSEs 3chy-AA SEQUENCE|| AA |ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP| 3chy-ITERATION-0|| PHD | EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE | 3chy-ITERATION-1|| PHD | EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE | 3chy-ITERATION-2|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE | 3chy-ITERATION-3|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-4|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE | 3chy-ITERATION-5|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-6|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE | 3chy-ITERATION-7|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-8|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE | 3chy-ITERATION-9|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE | 3chy-AA SEQUENCE|| AA |NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM| 3chy-ITERATION-0|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH | 3chy-ITERATION-1|| PHD | HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-2|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-3|| PHD | HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-4|| PHD | HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-5|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-6|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH | 3chy-ITERATION-7|| PHD | HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-8|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-9|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |

Flavodoxin-cheY multiple alignment/ secondary structure iteration cheY SSEs 3chy-AA SEQUENCE|| AA |ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP| 3chy-ITERATION-0|| PHD | EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE | 3chy-ITERATION-1|| PHD | EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE | 3chy-ITERATION-2|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE | 3chy-ITERATION-3|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-4|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE | 3chy-ITERATION-5|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-6|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE | 3chy-ITERATION-7|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-8|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE | 3chy-ITERATION-9|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE | 3chy-AA SEQUENCE|| AA |NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM| 3chy-ITERATION-0|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH | 3chy-ITERATION-1|| PHD | HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-2|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-3|| PHD | HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-4|| PHD | HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-5|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-6|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH | 3chy-ITERATION-7|| PHD | HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-8|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-9|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |

Flavodoxin-cheY multiple alignment/ secondary structure iteration cheY SSEs 3chy-AA SEQUENCE|| AA |ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP| 3chy-ITERATION-0|| PHD | EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE | 3chy-ITERATION-1|| PHD | EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE | 3chy-ITERATION-2|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE | 3chy-ITERATION-3|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-4|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE | 3chy-ITERATION-5|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-6|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE | 3chy-ITERATION-7|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE | 3chy-ITERATION-8|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE | 3chy-ITERATION-9|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE | 3chy-AA SEQUENCE|| AA |NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM| 3chy-ITERATION-0|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH | 3chy-ITERATION-1|| PHD | HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-2|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-3|| PHD | HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-4|| PHD | HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-5|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-6|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH | 3chy-ITERATION-7|| PHD | HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-8|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH | 3chy-ITERATION-9|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |

Is the initial SS prediction good enough?

MUSCLE (Edgar, 2004)

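PRALINE and MUSCLE method
PRALINE and MUSCLE use almost the same formalism to compare two profiles. For two profile columns x and y with residue frequencies $f_i^x$ and $f_j^y$, joint substitution probabilities $p_{ij}$ and background probabilities $p_i$:
MUSCLE: $S(x, y) = \log \sum_i \sum_j f_i^x f_j^y \frac{p_{ij}}{p_i p_j}$
PRALINE: $S(x, y) = \sum_i \sum_j f_i^x f_j^y \log \frac{p_{ij}}{p_i p_j}$
The difference is the position of the log in the above equations. Edgar calls the MUSCLE scoring scheme the "log-expectation (LE) score".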
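To see what moving the log does in practice, the following sketch evaluates both scoring forms on a toy two-letter alphabet. It assumes the usual formulations (log of the expected odds ratio for the MUSCLE-style LE score, expected log-odds for the PRALINE-style score) and leaves out MUSCLE's gap-occupancy factors. The function name `profile_scores` and all probabilities are invented for illustration only.

```python
import math


def profile_scores(fx, fy, p_joint, p_bg):
    """Return (log_expectation, sum_of_log_odds) for two profile columns.

    fx, fy  : dicts mapping residue -> frequency in column x / column y
    p_joint : dict mapping (residue_i, residue_j) -> joint probability p_ij
    p_bg    : dict mapping residue -> background probability p_i
    """
    expectation = 0.0   # sum of f_i^x * f_j^y * odds, log taken at the end
    sum_log_odds = 0.0  # sum of f_i^x * f_j^y * log(odds)
    for i, fxi in fx.items():
        for j, fyj in fy.items():
            odds = p_joint[(i, j)] / (p_bg[i] * p_bg[j])
            expectation += fxi * fyj * odds
            sum_log_odds += fxi * fyj * math.log(odds)
    return math.log(expectation), sum_log_odds


# Toy model (hypothetical numbers): identities are favoured over mismatches.
p_bg = {"A": 0.5, "B": 0.5}
p_joint = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}

col_x = {"A": 1.0, "B": 0.0}   # fully conserved column
col_y = {"A": 0.5, "B": 0.5}   # mixed column

le, psp = profile_scores(col_x, col_y, p_joint, p_bg)
print(f"log-expectation style score: {le:.4f}")
print(f"sum-of-log-odds style score: {psp:.4f}")
```

Because the logarithm is concave, the log-outside form is never smaller than the log-inside form when the column frequencies sum to one (Jensen's inequality), so the two schemes can rank mixed columns differently.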

So what do we do?
A single shot for a good alignment without thinking: MUSCLE, T-COFFEE, PROBCONS (maybe POA).
If you want to experiment with making alignments for a given sequence set: PRALINE
- Profile pre-processing
- Iteration
- Secondary structure-induced alignment
- Globalised local alignment
There is no single method that always generates the best alignment. It is therefore best to use more than one method, e.g. include Dialign2 (local).

Recap
Weighting schemes to use information from all sequences right from the start during the progressive MSA protocol:
- Profile pre-processing (global/local) (PRALINE)
- Matrix extension (well balanced scheme) (T-Coffee)
Smoothing alignment signals: globalised local alignment (PRALINE)
Consistency-based mixing of local and global alignment (T-Coffee)
Using additional information: secondary structure driven alignment (PRALINE)
Iterative schemes to alleviate the 'greediness' of the progressive MSA protocol:
- Profile pre-processing iteration (PRALINE)
- Secondary structure driven iteration (PRALINE)
- 'Classical' distance matrix iteration
- Binary cutting of guide tree and realignment of groups (MUSCLE)
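The iterative schemes in the recap share one control loop: realign, compare with earlier results, and stop when nothing changes any more. A minimal sketch of such a loop is given here, assuming the stopping criteria are convergence, a repeating cycle of results, or an iteration cap. The names `iterate_alignment` and `realign` are hypothetical, and the toy update functions stand in for a real realignment step.

```python
def iterate_alignment(realign, start, max_iter=10):
    """Apply `realign` repeatedly until the result repeats or max_iter is hit.

    `realign` is any function mapping one (hashable) alignment state to the
    next; here it is a placeholder for a real realignment step.
    Returns (final_state, iterations_used, status).
    """
    seen = {start: 0}          # state -> iteration at which it first appeared
    current = start
    for n in range(1, max_iter + 1):
        current = realign(current)
        if current in seen:
            cycle_length = n - seen[current]
            status = "converged" if cycle_length == 1 else "limit cycle"
            return current, n, status
        seen[current] = n
    return current, max_iter, "iteration limit"


# Toy stand-ins for a realignment step (integers play the role of alignments).
def settles(state):
    return min(state + 1, 3)    # climbs to 3 and then stays there


def oscillates(state):
    return (state + 1) % 2      # flips between 0 and 1 forever


print(iterate_alignment(settles, 0))
print(iterate_alignment(oscillates, 0))
```

Tracking every earlier state, not just the previous one, lets the loop recognise results that alternate between a few alternatives instead of settling on one.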

References
Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem., 23, 341-364.
Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205-217.
Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem., 26(5), 459-477.
Simossis, V.A., Kleinjung, J. and Heringa, J. (2005) Homology-extended sequence alignment. Nucleic Acids Res., 33(3), 816-824.