Introduction to bioinformatics 2007 Lecture 10

1 Introduction to bioinformatics 2007 Lecture 10
Multiple Sequence Alignment (II)

2 Progressive multiple alignment
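[Figure: progressive multiple alignment scheme. Pairwise alignment scores for five sequences (Score 1-2, Score 1-3, ..., Score 4-5) fill a 5×5 similarity matrix; the scores are converted to distances, a guide tree is built from them, and the multiple alignment follows; iteration possibilities are indicated.]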

3 Progressive alignment strategy
1) Perform pair-wise alignments of all of the sequences (all against all, i.e. N(N-1)/2 alignments for N sequences);
2) Use the alignment scores to make a similarity (or distance) matrix;
3) Use that matrix to produce a guide tree;
4) Align the sequences successively, guided by the order and relationships indicated by the tree (N-1 alignment steps).
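Steps 1 to 3 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the procedure of any particular program: the pairwise alignments use a toy Needleman-Wunsch scheme (match +1, mismatch -1, linear gap -2), the distance between two sequences is taken as 1 minus the fractional identity of their pairwise alignment, and the guide tree is built with UPGMA for brevity (ClustalW uses neighbour joining instead). The function names are hypothetical.

```python
from itertools import combinations


def nw_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Toy global (Needleman-Wunsch) alignment of a and b.

    Returns (score, fraction of aligned columns that are identical)."""
    n, m = len(a), len(b)
    sub = lambda x, y: match if x == y else mismatch
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trace back one optimal path, counting alignment columns and identities.
    i, j, ident, cols = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            ident += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return S[n][m], ident / cols


def upgma(names, dist):
    """UPGMA guide tree; dist maps frozenset({x, y}) -> distance.

    Returns the tree as nested tuples whose leaves are sequence names."""
    size = {name: 1 for name in names}
    d = dict(dist)
    while len(size) > 1:
        x, y = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        new = (x, y)
        for z in size:
            if z not in (x, y):
                # Size-weighted average distance from the new cluster to z.
                d[frozenset((new, z))] = (size[x] * d[frozenset((x, z))] +
                                          size[y] * d[frozenset((y, z))]) / (size[x] + size[y])
        size[new] = size.pop(x) + size.pop(y)
    return next(iter(size))


def guide_tree(seqs):
    """All-against-all alignment, then distance matrix, then guide tree."""
    names = list(seqs)
    pairs = list(combinations(names, 2))  # N(N-1)/2 pairwise alignments
    dist = {frozenset(p): 1.0 - nw_identity(seqs[p[0]], seqs[p[1]])[1] for p in pairs}
    return len(pairs), upgma(names, dist)


seqs = {"s1": "ACDEFGHIK", "s2": "ACDEFGHIK", "s3": "ACDEYGHIK", "s4": "WWWWPPPPP"}
n_pairs, tree = guide_tree(seqs)
print(n_pairs, tree)  # 6 ('s4', ('s3', ('s1', 's2')))
```

The nesting of the tree gives the order of the N-1 progressive steps: the two identical sequences are aligned first and the unrelated sequence s4 is added last.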

4 Progressive alignment strategy
Methods: Biopat (Hogeweg and Hesper, 1984; the first integrated method), MULTAL (Taylor, 1987), DIALIGN (1 and 2; Morgenstern, 1996), PRRP (Gotoh, 1996), ClustalW (Thompson et al., 1994), PRALINE (Heringa, 1999), T-Coffee (Notredame, 2000), POA (Lee, 2002), MUSCLE (Edgar, 2004), ProbCons (Do, 2005).

5 Pair-wise alignment quality versus sequence identity
Pair-wise alignment quality versus sequence identity (Vogt et al., JMB 249, 1995)

6 Flavodoxin fold: aligning 13 Flavodoxins + cheY

7 Flavodoxin-cheY NJ tree

8 Flavodoxin fold: helix-beta-helix
Flavodoxins are mainly bacterial proteins (including cyanobacteria), but they are also found in some eukaryotic algae. They are electron-transfer proteins that function in various electron transport systems. They have an α/β fold consisting of three layers: two α-helical layers sandwiching a five-stranded parallel β-sheet.

9 Flavodoxin family - TOPS diagrams
The basic topology of the flavodoxin fold is given below; the other four TOPS diagrams show flavodoxin folds with local insertions of secondary structure elements.
[Figure: TOPS diagrams of the flavodoxin fold with β-strands numbered 1-5; legend: α-helix, β-strand]

10 Flavodoxin-cheY NJ tree

11 Flavodoxin-cheY: Pre-processing (prepro1500)

12 Protein structure hierarchical levels
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
[Figure: PRIMARY STRUCTURE (amino acid sequence, shown above); SECONDARY STRUCTURE (helices, strands); TERTIARY STRUCTURE (fold); QUATERNARY STRUCTURE (oligomers)]

13 Clustal, ClustalW, ClustalX
CLUSTAL W/X (Thompson et al., 1994) uses the Neighbour Joining (NJ) algorithm (Saitou and Nei, 1987), widely used in phylogenetic analysis, to construct a guide tree (see the lecture on phylogenetic methods). Sequence blocks are represented by a profile, in which the individual sequences are additionally weighted according to the branch lengths in the NJ tree. Further carefully crafted heuristics include: (i) local gap penalties; (ii) automatic selection of the amino acid substitution matrix; (iii) automatic gap penalty adjustment; and (iv) a mechanism to delay the alignment of sequences that appear to be distant at the time they are considered. CLUSTAL (W/X) does not allow iteration (for iterative strategies see Hogeweg and Hesper, 1984; Corpet, 1988; Gotoh, 1996; Heringa, 1999, 2002).
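The neighbour-joining step that builds the guide tree can be sketched as follows. This is a plain textbook version of Saitou and Nei's algorithm written for illustration (the function name and the small five-sequence distance matrix are made up); it does not reproduce ClustalW's own distance calculation or its branch-length-based sequence weighting.

```python
def neighbour_joining(names, D):
    """Neighbour joining on a symmetric distance matrix D (list of lists).

    Returns (joins, last_edge): joins lists (child_i, child_j, branch_i, branch_j)
    in the order the pairs were merged; last_edge joins the final two nodes."""
    nodes = list(names)
    D = [row[:] for row in D]
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]  # total distance from each node to all others
        # Choose the pair minimising Q(i, j) = (n - 2) * d(i, j) - r_i - r_j
        # (ties are broken by the lowest indices).
        _, i, j = min(((n - 2) * D[a][b] - r[a] - r[b], a, b)
                      for a in range(n) for b in range(a + 1, n))
        bi = 0.5 * D[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        bj = D[i][j] - bi
        joins.append((nodes[i], nodes[j], bi, bj))
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node u to every remaining node k.
        du = [0.5 * (D[i][k] + D[j][k] - D[i][j]) for k in keep]
        D = [[D[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)] + [du + [0.0]]
        nodes = [nodes[k] for k in keep] + [(nodes[i], nodes[j])]
    return joins, (nodes[0], nodes[1], D[0][1])


names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
joins, last = neighbour_joining(names, D)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```

Unlike UPGMA, neighbour joining does not assume a constant rate of change, which is one reason it is popular for guide trees of divergent families.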

14 ClustalW web-interface

15 CLUSTAL X (1.64b) multiple sequence alignment Flavodoxin-cheY
1fx        PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESVH MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESGI MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKK
FLAV_DESSA MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKK
FLAV_DESDE MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRK
FLAV_CLOAB MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN ISWEMKKWID-ESSEFNLEGKL
FLAV_MEGEL MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK
4fxn       MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKK
FLAV_ANASP SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKL
FLAV_AZOVI AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT
2fcr       KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLP
FLAV_ENTAG MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKT
FLAV_ECOLI AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD WDDFFP-TLEEIDFNGKL
3chy       ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR---

1fx        VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG LRIDGDPRAARDDIVGWAHDVRGAI
FLAV_DESVH VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG LRIDGDPRAARDDIVGWAHDVRGAI
FLAV_DESGI VGVFGCGDSSYTYF--CGAVDVIEKKAEELGATLVASS LKIDGEPDSAE--VLDWAREVLARV
FLAV_DESSA VSVFGCGDSDYTYF--CGAVDAIEEKLEKMGAVVIGDS LKIDGDPERDE--IVSWGSGIADKI
FLAV_DESDE VAAFASGDQEYEHF--CGAVPAIEERAKELGATIIAEG LKMEGDASNDPEAVASFAEDVLKQL
FLAV_CLOAB GAAFSTANSIAGGS--DIALLTILNHLMVKGMLVYSGGVA----FGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF
FLAV_MEGEL VGLFGSYGWGSGE-----WMDAWKQRTEDTGATVIGTA IVN-EMPDNAPECKE-LGEAAAKA
4fxn       VALFGSYGWGDGK-----WMRDFEERMNGYGCVVVETP LIVQNEPDEAEQDCIEFGKKIANI
FLAV_ANASP VAYFGTGDQIGYADNFQDAIGILEEKISQRGGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------
FLAV_AZOVI VALFGLGDQVGYPENYLDALGELYSFFKDRGAKIVGSWSTDGYEFESSEAVV-DGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----
2fcr       VAIFGLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVR-DGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ENTAG VALFGLGDQLNYSKNFVSAMRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL
FLAV_ECOLI VALFGCGDQEDYAEYFCDALGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
3chy       AD--GAMSALPVL-----MVTAEAKKENIIAAAQAGAS GYV-VKPFTAATLEEKLNKIFEKLGM

The secondary structures of 4 sequences are known and can be used to assess the alignment (red is β-strand, blue is α-helix).

16 There are problems… Accuracy is very important!
Progressive multiple alignment is a greedy strategy. MAIN PROBLEMS:
1) The progressive alignment protocol suffers from its greediness: it cannot revise any of the alignments made earlier, so alignment errors made during the construction of the MSA can no longer be repaired and are propagated into later progressive steps.
2) The comparisons of sequences at early steps of the progressive alignment cannot make use of information from other sequences, so the positional information required for correct matching is not available at early stages.
3) Only later in the alignment progression does information from other sequences (e.g. through profile representation) become employed in the alignment steps, but quite possibly after misalignment has already taken place.

17 “Once a gap, always a gap”
Progressive multiple alignment: “Once a gap, always a gap” (Feng & Doolittle, 1987)
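The rule can be made concrete with a small sketch of how two already-aligned blocks are merged in one progressive step. We assume the profile-profile dynamic programming (not shown) has produced a path of column pairs; `merge_blocks` and the example path are hypothetical. New gap columns are inserted into every sequence of a block at once, and gaps that a block already contains are copied unchanged, so nothing in this step can move or remove them.

```python
def merge_blocks(block_a, block_b, path):
    """Merge two aligned blocks (lists of equal-length gapped strings).

    path is a list of (i, j) column pairs from a profile-profile alignment;
    i or j is None where that block receives a new all-gap column."""
    if [i for i, _ in path if i is not None] != list(range(len(block_a[0]))):
        raise ValueError("path must use every column of block_a once, in order")
    if [j for _, j in path if j is not None] != list(range(len(block_b[0]))):
        raise ValueError("path must use every column of block_b once, in order")
    # Existing columns (including their gaps) are copied verbatim.
    rows_a = ["".join("-" if i is None else s[i] for i, _ in path) for s in block_a]
    rows_b = ["".join("-" if j is None else s[j] for _, j in path) for s in block_b]
    return rows_a + rows_b


# An earlier step aligned "ACD" with "ACGD" and put a gap in the first sequence.
block_a = ["AC-D", "ACGD"]
block_b = ["AGD"]
path = [(0, 0), (1, None), (2, 1), (3, 2)]
merged = merge_blocks(block_a, block_b, path)
print(merged)  # ['AC-D', 'ACGD', 'A-GD']
```

Even if "AGD" suggests that the gap in "AC-D" was badly placed, the merge has no way to revisit it; that is the greediness described on the previous slide.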

18 Additional strategies for multiple sequence alignment
Profile pre-processing (Praline); secondary structure-induced alignment; globalised local alignment; matrix extension. Objective: try to avoid (early) errors.

19 PRALINE web-interface

20 Profile pre-processing
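[Figure: pairwise alignment scores for sequences 1-5 (Score 1-2, Score 1-3, ..., Score 4-5); sequence 1 acts as key sequence, and sequences 2-5 are pre-aligned to it by master-slave (N-to-1) alignment; the pre-alignment is turned into a pre-profile for sequence 1 with an amino acid distribution (A, C, D, ..., Y) at each position (Pi ... Px)]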

21 Pre-profile generation
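[Figure: pairwise scores for sequences 1-5 are filtered with a cut-off; each sequence becomes the key sequence of its own pre-alignment (1: 2 3 4 5; 2: 1 3 4 5; ...; 5: 1 2 3 4), and each pre-alignment is converted into a pre-profile over the amino acids A, C, D, ..., Y]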

22 Pre-profile alignment
[Figure: pre-profiles 1-5 (each over the amino acids A, C, D, ..., Y) are aligned to give the final alignment of sequences 1-5]

23 Pre-profile alignment
[Figure: the pre-alignments (1: 1 2 3 4 5; 2: 2 1 3 4 5; ...; 5: 5 1 2 3 4) are aligned into the final alignment, so each sequence is represented in several pre-profiles]

24 Pre-profile alignment Alignment consistency
[Figure: alignment consistency across the pre-alignments of sequences 1-5, following residue Ala131 (A131), which is matched with residues such as L133 and C126]

25 PRALINE pre-profile generation
Idea: use the information from all query sequences to make, for each query sequence, a pre-profile that contains information from the other sequences. You can use all sequences in each pre-profile, or only those sequences that will probably align ‘correctly’; incorrectly aligned sequences in the pre-profiles increase the noise level. Select using the alignment score: only admit a sequence into a pre-profile if its alignment with the key sequence scores higher than a given threshold value. In PRALINE this threshold is given here as prepro=1500 (an alignment score threshold of 1500; see the next two slides).
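The selection rule can be sketched as below. The scores are invented for illustration and `preprofile_members` is a hypothetical helper, not PRALINE code: each sequence acts as the key (master) sequence of its own pre-profile, and another sequence is admitted only if its pairwise score with the key exceeds the threshold. With a threshold of 0 and positive scores every sequence is admitted, as in the global pre-processing (prepro0) runs shown later.

```python
def preprofile_members(names, scores, threshold=1500):
    """Return, for each key sequence, the sequences in its pre-alignment.

    scores maps frozenset({x, y}) -> pairwise alignment score. The key sequence
    comes first, followed by admitted sequences in order of decreasing score."""
    members = {}
    for key in names:
        others = [o for o in names
                  if o != key and scores[frozenset((key, o))] > threshold]
        others.sort(key=lambda o: scores[frozenset((key, o))], reverse=True)
        members[key] = [key] + others
    return members


names = ["1fx", "FLAV_DESVH", "3chy"]
scores = {  # made-up scores for illustration only
    frozenset(("1fx", "FLAV_DESVH")): 2500,
    frozenset(("1fx", "3chy")): 900,
    frozenset(("FLAV_DESVH", "3chy")): 1000,
}
print(preprofile_members(names, scores))
# {'1fx': ['1fx', 'FLAV_DESVH'], 'FLAV_DESVH': ['FLAV_DESVH', '1fx'], '3chy': ['3chy']}
```

Here the distantly related 3chy keeps a pre-profile containing only itself, so it cannot add noise to the flavodoxin pre-profiles.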

26 Reliable sequences for pre-profiles
For each score x, the curve gives the number of pairwise alignments (y) scoring less than x. The range 1500 < x < 1800 is a flat section of the curve that can serve as a natural cut-off point for admitting sequences into the pre-alignment blocks.
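The curve, and one simple way to look for its flat section, can be sketched as follows. The scores are invented, and picking the widest empty score interval is only a heuristic: in real data the widest gap may lie at the very top of the score range (a few near-identical pairs), so the curve still needs to be inspected.

```python
from bisect import bisect_left


def count_below(scores, x):
    """y value of the curve: number of pairwise alignments scoring less than x."""
    return bisect_left(sorted(scores), x)


def widest_flat_section(scores):
    """Widest interval between consecutive distinct scores, i.e. the longest
    stretch of x over which count_below() stays constant."""
    s = sorted(set(scores))
    return max(zip(s, s[1:]), key=lambda p: p[1] - p[0])


scores = [300, 420, 510, 640, 800, 950, 1100, 1300, 1450, 1900, 2000, 2150, 2300]
lo, hi = widest_flat_section(scores)
print(lo, hi, count_below(scores, 1500), count_below(scores, 1800))  # 1450 1900 9 9
```

Any threshold inside the flat section (such as 1500) admits exactly the same set of alignments, which is what makes it a natural cut-off.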

27 Global pre-processing (prepro0)
Preprocessed profile for sequence 2:

2fcr       KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD
1fx        KALIVYGSTTGNTEYTAETIARQL-ANAGYEVDSRDAASVEAFEGFDLVLLGCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACFGCGDS-SY-E
4fxn       MKIVYWSGTGNTEKMAELIAKGISGKDVNTINVSDVNIDELLNE-DILILGC---SAMGDEVLEESEFEPFIEEISTKISGKKVALGSYGWGDGKWMRD
FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD
FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKRFDTMSDA-LNVNRVS-AEDFAQYQFLILgTPTLGPGLSSDCENESWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE
FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTYYANISWEMK--KW----IDESSEFNLEGKLGAAfSTANAGGSDI
FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAA-GGHEVTLLNAADASALADYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E
FLAV_DESGI KALIVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDEIELQEDFVP-LYEDLDRAGLKDKKVGVfGCGDS-SY-T
FLAV_DESSA KSLIVYGSTTGNTETAaEYVAEAFENK-EIDVELKNVTDVSVANGYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T
FLAV_DESVH KALIVYGSTTGNTEYTaETIAREL-ADAGYEVDSRDAASVEAFEGFDLVLLgCSTW--GDD---SIELQDDFLFDSLEETGAQGRKVACfGCGDS-SY-E
FLAV_ECOLI AIGIFFGSDTGNTENIaKMIQKQLG--KDV-ADVHDISSKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE
FLAV_ENTAG TIGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPTLGDGLPGVEAGSSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK
FLAV_MEGEL MVEIVYWSGTGNTEAMaNEIEAAVAAGADVSVRFED-TNVDDVASKDVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG---
3chy       KELKFLVVDDFSTRRIVRNLLKELGFNEEAEDGVDALNKLQA-GGYGFVI---SDWNM---PNMDGL---ELLKTIRADGAMSALPVLMV---TAEAKKE

2fcr       NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
1fx        YFCGAVDAIEEKLKNLGA EIVQD----GLRID--GDPRAARDDIVGWAHDVRGAI--
4fxn       FEERMNG-YGCVVVE--TPLIVQNEPD----EAE QDCIEFGKKIANI
FLAV_ANASP NFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL
FLAV_AZOVI NYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGL
FLAV_CLOAB ALLTILNHVKgMLVYSGG--VAFGKPKTHGYVHINEIQENE------D-ENARI-fGERiANkVKQIF-----
FLAV_DESDE HFCGAVPAI-----EERAKELg ATIIAEG--LKMEGDASND--P--EAVASfAEDVLKQL--
FLAV_DESGI YFCGAVDVIEKKAEELgATLVA SSLKI-DGE PDSAEVLDwAREVLARV--
FLAV_DESSA YFCGAVDAIEEKLEKMgAVVIGDSLKIDGDPERDEIVSwGS--G-----IADKI
FLAV_DESVH YFCGAVDAIEEKLKNLgA EIVQD----GLRID--GDPRAARDDIVGwAHDVRGAI--
FLAV_ECOLI YFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDHFVGLAID--EDRQPTAERVEKwVKQISEELHL
FLAV_ENTAG NFVSAMRILYDLVIARgACVVGNWPREGYKFSFSAALENNEFVGLPLDQENQYDLTEERIDSwLEKL--KPAV
FLAV_MEGEL EWMDAWKQRTE---DTgATVIG TAIVNE-----MP-----DNAP-ECKElG--EAAAKA---
3chy       NIIAA AQAGAS--GY VVK--PFTAATLE EK-----LNKIFEKLGM

Iteration -1 SP= AvSP= SId= AvSId= 0.315
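Each pre-processed alignment is reported with SP, AvSP, SId and AvSId values, but the slides do not define them. Assuming SP stands for a sum-of-pairs score and SId for sequence identity (with Av denoting an average over sequence pairs), the two measures could be computed roughly as below; the toy substitution score, the gap penalty and the function names are illustrative choices, not PRALINE's.

```python
from itertools import combinations


def sum_of_pairs(msa, sub, gap=-1):
    """Sum-of-pairs score: score every pair of sequences in every column.

    Residue-residue pairs use sub(x, y), residue-gap pairs score `gap`,
    and gap-gap pairs are ignored."""
    total = 0
    for column in zip(*msa):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            total += gap if "-" in (x, y) else sub(x, y)
    return total


def average_identity(msa):
    """Mean pairwise identity, counted over columns where neither sequence has a gap."""
    values = []
    for s, t in combinations(msa, 2):
        cols = [(x, y) for x, y in zip(s, t) if x != "-" and y != "-"]
        values.append(sum(x == y for x, y in cols) / len(cols) if cols else 0.0)
    return sum(values) / len(values)


msa = ["AC-D", "ACGD", "AC-E"]
identity_score = lambda x, y: 1 if x == y else 0  # toy substitution score
print(sum_of_pairs(msa, identity_score), round(average_identity(msa), 3))  # 5 0.778
```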

28 Global pre-processing (prepro0)
Preprocessed profile for sequence 3:

4fxn       MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE
1fx        ALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGSYEYFCGA-VDAIE
2fcr       IGIFFSTSTGNTTEVADFIGKTL--GAKADAPIDVDDVTDPQALKDDLLFLGANTGADTERSGTSWDEFLYDKLPEVDMKDLPV-AIFGLGDAEGYPDFC
FLAV_ANASP IGLFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTWNIGEL-QSDWEGLYSELDVDFNGKLVAYfGTIGYADNDAIGILE
FLAV_AZOVI IGLFFGSNTGKTRKVaKSIKKRFDDETMS-DALNVNRVSAEDFAQYQFLILgTPTLGEGELENESWEEFLPKIGLDFSGKTVALfGQVGYPEGELYSFFK
FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVDKKFLQESEGIIFgTPTYYANI--SWEMKKWIDESSENLEGKLGAAfSTAGGSDIALLTILN
FLAV_DESDE VLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEYVPAIE
FLAV_DESGI ALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGSYTYFCGA-VDVIE
FLAV_DESSA MSIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGDYTYFCGA-VDAIE
FLAV_DESVH ALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGSYEYFCGA-VDAIE
FLAV_ECOLI TGIFFGSDTGNTENIaKMIQK---QLGKDVADVDIAKSSKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGDYAFCDAGTIRDIE
FLAV_ENTAG IGIFFGSDTGQTRKVaKLIHQK-LDGIADA-PLDVRRATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALfGNYSKNFVSAMRILY
FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK
3chy       DKELKFLVVDDFSTMRRIVRNLLKELG--FNNVEEAEDGVD-ALNK-LQAGGYGVISDWNMPNMDGLELLKTI--RADGAMSALPVLMVTAEAKKENIIA

4fxn       ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
1fx        EKLKNLGAEIVQDGLRIDGDPRAARDDIVGWAHDVRGA
2fcr       DAIEEHDCFAKQKPVGFSNPDDESKNDQIPMEKRVAGW
FLAV_ANASP EKISGYGSKALRNGKFVGLALDEDNQDLTDDRIKVAQL
FLAV_AZOVI DRTDGYEAVVVGLALDLDNQSGKTDERVAAwLAQIAPE
FLAV_CLOAB HLMKgYGGVAFGKPYVHINEIQENEDENARfGERiANk
FLAV_DESDE ERAKELgATIIAEGLKMEGDASNDPEAVASfAEDVLKQ
FLAV_DESGI KKAEELgATLVASSLKIDGEPDSAE--VLDwAREVARV
FLAV_DESSA EKLEKMgAVVIGDSLKIDGDPERDE--IVSwGSGIADI
FLAV_DESVH EKLKNLgAEIVQDGLRIDGDPRAARDDIVGwAHDVRGA
FLAV_ECOLI PRTAGYGLAFVGLAIDEDRQPELTAERVEKwVKQISEE
FLAV_ENTAG DLVIARgCVVGNWPLLENNEPDQENQDLTELEKKPAVL
FLAV_MEGEL QRTEDTgATVIGT-AIVNEMPDNA-PECKElGEAAAKA
3chy       AAQAGASGYVVK-PFTAATLEEKLNKIFEKLGM-----

Iteration -1 SP= AvSP= SId= AvSId= 0.273

29 Reliable sequences for pre-profiles

30 Pre-profiles (prepro1500)

31 Pre-profiles (prepro1500)

32 Local pre-processing
Local alignments are selected from high to low score. Each time a local alignment is selected, the corresponding sequence parts are blocked, so that the next local alignment has to emerge before or after the earlier selected ones. This preserves the co-linearity of the local alignments and of the associated sequence fragments in the pre-alignments.
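The blocking rule can be sketched as a greedy filter over candidate local alignments. As a simplification we assume the candidates (for example from Smith-Waterman runs) are given up front as score plus start/end intervals in the two sequences, whereas the procedure described above lets each next local alignment emerge in the unblocked regions. The function name and the candidate list are hypothetical.

```python
def select_colinear(candidates):
    """Greedy selection of local alignments from high to low score.

    Each candidate is (score, a_start, a_end, b_start, b_end) with exclusive ends.
    A candidate is accepted only if, in BOTH sequences, it lies entirely before or
    entirely after every alignment accepted so far, so the selected fragments
    never overlap and stay co-linear."""
    chosen = []
    for cand in sorted(candidates, key=lambda c: c[0], reverse=True):
        _, a0, a1, b0, b1 = cand
        if all((a1 <= c[1] and b1 <= c[3]) or (a0 >= c[2] and b0 >= c[4])
               for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda c: c[1])  # report in sequence order


candidates = [
    (90, 10, 30, 12, 32),
    (70, 40, 60, 45, 65),
    (60, 35, 50, 5, 20),   # would cross the first alignment: rejected
    (50, 0, 8, 0, 10),
    (40, 25, 45, 30, 50),  # overlaps already blocked regions: rejected
]
print(select_colinear(candidates))
# [(50, 0, 8, 0, 10), (90, 10, 30, 12, 32), (70, 40, 60, 45, 65)]
```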

33 Local pre-processing (locprepro0)
Preprocessed profile for sequence 2 (2fcr):

2fcr       KIGIFFSTSTGNTTEVADFIGKTLGAKADAPIDVDDVTDPQALKDYDLLFLGAPTWNTGADTERSGTSWDEFLYDKLPEVDMKDLPVAIFGLGDAEGYPD
1fx        IVYGSTTGNTEYTAETIARQL---ANAGYEVDDAASVEAFEGFDLVLLGCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACFGCGDS-SY-E
4fxn       KI-VYWS-GTGNTEKMAELIAKGIGKDVNT-INVSDVNIDELLNE-DILILGCSA--MGDEVEES--EFEPF----IEEISTKGKKVALFGWGDGKGYG-
FLAV_ANASP KIGLFYGTQTGKTESVaEIIRDEFGNDVVTLHDVSEVTD---LNDYQYLIIgCPTWNIG---ELQ-SDW-EGLYSELDDVDFNGKLVAYfGTGDQIGYAD
FLAV_AZOVI KIGLFFGSNTGKTRKVaKSIKKTM---SDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGSDCENE--SWEEFL-PKIEGLDFSGKTVALfGLGDQVGYPE
FLAV_CLOAB KISILYSSKTGKTERVaKLIEE--GVKRSGNIEVKDAVDKKFLQESEGIIFgTPTY YANISWEKWI-DESSEFNLEGKLGAAfSTANSAGGSD
FLAV_DESDE KVLIVFGSSTGNTESIaQKLEELIAAAADA--SAENLAD-----GYDAVLFgCSAWGM-EDLEMQ----DDFLFEEFNRFGLAGRKVAAfASGDQE-Y-E
FLAV_DESGI IVYGSTTGNTEGVaEAIAKTLNSEGTTVVNVADVTAPGLAEGYDVVLLgCSTW--GDDIELQ----EDFLYEDLDRAGLKDKKVGVfGCGDS-SY-T
FLAV_DESSA IVYGSTTGNTETAaEYVAEAFENK---EIDVENVTD-VSVADYDIVLFgCSTW--G---EEEIELQDDFLYDSLENADLKGKKVSVfGCGDSD-Y-T
FLAV_DESVH IVYGSTTGNTEYTaETIAREL---ADAGYEVDDAASVEAFEGFDLVLLgCSTW--GDDSELQ----DDFLFDSLEETGAQGRKVACfGCGDS-SY-E
FLAV_ECOLI GIFFGSDTGNTENIaKMIQKQLG-K-----DVADVHDKEDLEAYDILLLgIPTWYYG----EAQCDWDDF-FPTLEEIDFNGKLVALfGCGDQEDYAE
FLAV_ENTAG IGIFFGSDTGQTRKVaKLIHQKLDGIADAPLDVRRATREQFL-SYPVLLLgTPT--LG-DGELPGVSWQEFT-NTLSEADLTGKTVALfGLGDQLNYSK
FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEKAAGADVESDTNVDDV----ASK--DVILLgCPA--MGSE-ELEDSVVEPFFTDLAPK--LKGKKVGLfGYGWGSG---
3chy       ADKELKFLVVDDFIVRNL----LKEL-----GFNNVEEAED

2fcr       NFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV
1fx        YFCDAIEE------K--LKNLG AEIVQD----GLRID--GD--PRAARIVGWAHDV......
4fxn       CVVVE TPLIVQNPDE---AEQDCIEFGK
FLAV_ANASP NFQDAIGILEEKISQRgGKTVGYWSTDGYDFNDSKALRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL
FLAV_AZOVI NYLDALGELYSFFKDRgAKIVGSWSTDGYEFESSEAVVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGL
FLAV_CLOAB IALLTIH-LMVKSGG--VAFGKPKTHGYVHINEIQENE------D-ENARI-fGERiANkVKQI......
FLAV_DESDE HFCGAVPAI-----EERAKELg ATIIAEGKMEG---DASND--P--EAVASfAEDVLKQ...
FLAV_DESGI YFCGAVDVIEKKAEELgATLVASSEPD------SAEVLD
FLAV_DESSA YFCGAVDAIEEKLEKMgAVVIGDSLKIDGDPERDEIVSwGS--G-----IADKI
FLAV_DESVH YFCDAIEE------K--LKNLg AEIVQD----GLRID--GD--PRAARIVGwAHDV......
FLAV_ECOLI YFCDALGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDHFVGLAID--EDRQPTAERVEKwVKQISEE...
FLAV_ENTAG NFVSAMRILYDLVIARgACVVG--NPEGYKFSFSAALENNEFVGLPLDQENQYDLTEERIDSwLEAVL.....
FLAV_MEGEL EWMDAWKQTED----TgATVIGTANPDN
3chy       G-VDALNKLQ AGGYGFSNMPNMDLELLKTIRDGAMSALPVLMVTAEAKKENIIAGYVAATLEE...

34 Local pre-processing (locprepro0)
Preprocessed profile for sequence 3 (4fxn):

4fxn       MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFE
1fx        IVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGDLVLLGCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACFGC---GDSSYVDAIE
2fcr       KIIFFSSTGNTTEVADFIGKTL---GAKADAIDVDDVTDPQALKDDLLFLGAPTTGADT-ERSSWDEFLPEVDMK--DLPVAIF---GLGDAE------
FLAV_ANASP LFYGTQTGKTESVaEIIRD---EFGNDVVTLDVSQAEVTDLNDYQYLIIgCPTIGE--L-QSDWEGLYSELDVDFNGKLVAYfGTIGYADGKWSTDFN
FLAV_AZOVI LFFGSNTGKTRKVaKSIKKRFDETMSD--ALNVNRVSAEDFAQYQFLILgTPTLGEGELNESEFLPKIEGLD--FSGKTVALfGQVGYGEGSWSTD--
FLAV_CLOAB MKILYSSKTGKTERVaKLIEEGVKRSGNEVKTMNLDAVD-KKFLQEEGIIFgTPTMKKWIDESSEFN--LEAfSTANSGSDIALLGGVAFGKPK------
FLAV_DESDE IVFGSSTGNTEKLEELIAAG----GHEVTLLNAADASAENLADYDAVLFgCSAWGMEDLEQDDFLSLFEEFNRGLAGRKVAAfAS---GDQEY-EHFE
FLAV_DESGI IVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAGYDVVLLgCSTWGDDEIEQEDFVPLYEDLDAGLKDKKVGVfGC---GDSSYTYDIE
FLAV_DESSA IVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNYDIVLFgCSTWGEEEIEQDDFIPLYDSLNADLKGKKVSVfGC---GDS----DYE
FLAV_DESVH IVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGDLVLLgCSTWGDDSIEQDDFIPLFDSLETGAQGRKVACfGC---GDSSYVDAIE
FLAV_ECOLI IFFGSDTGNTENIaKMIQK---QLGKDV--ADVHDISKEDLEAYDILLLgIPTYGEAQCDWDDFFPTLEEID--FNGKLVALfGC---GD---QEDYA
FLAV_ENTAG IFFGSDTGQTRKVaKLIHQGIADAPLDVRR-----ATREQFLSYPVLLLgTPTLGDELVEASQYDSWQEFTNTDLTGKTVALf---GLGDQNYSKNFV
FLAV_MEGEL VEIVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVASKDVILLgCPAMGSEELEDSVVEPFFTDLAPKLKGKKVGLfGSYGWGSGEWMDAWK
3chy       RIV......N...LKEL---GFVEEAEDVDALNISDPNMDELLRADVLMVTAEAKKENIIAAAQVKPFLEEKLNKIFEK

4fxn       ERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
1fx        EKLKNLGAEIVQDGLRIDGDPRAARDDIV
2fcr       GYPCDAIEKPVGFSN-PDDEESKSVRDGK.....
FLAV_ANASP DSRNGVGLALDE-----DNQSDLTD-DRIEFG......
FLAV_AZOVI GYEAVVVGLALDLDNQTDELAQIAPEFG......
FLAV_CLOAB THL-GY----VHINEIQENEDENAR---I-fGERiAN.
FLAV_DESDE ERAKELgATIIAEGLKMENDP-EAAEDVLK
FLAV_DESGI KKAEELgATLVASSLKIDGEPDSAE--VLDwAREVARV
FLAV_DESSA EKLEKMgAVVIGDSLKIDGDPERDE--IVSwGSGIAD.
FLAV_DESVH EKLKNLgAEIVQDGLRIDGDPRAARDDIV
FLAV_ECOLI E----YFCDALGTDII---EP
FLAV_ENTAG SAMRg-ACVVGNWPLLENNEPDQENQDLTE
FLAV_MEGEL QRTEDTgATVIGTAIV--NEPDNA-PECKElGE.....
3chy

35 CLUSTAL X (1.64b) multiple sequence alignment Flavodoxin-cheY
1fx        PKALIVYGSTTGNTEYTAETIARQLANAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESVH MPKALIVYGSTTGNTEYTAETIARELADAG-Y-EVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSIE------LQDDFIPLFD-SLEETGAQGRK
FLAV_DESGI MPKALIVYGSTTGNTEGVAEAIAKTLNSEG-M-ETTVVNVADVTAPGLAEGYDVVLLGCSTWGDDEIE------LQEDFVPLYE-DLDRAGLKDKK
FLAV_DESSA MSKSLIVYGSTTGNTETAAEYVAEAFENKE-I-DVELKNVTDVSVADLGNGYDIVLFGCSTWGEEEIE------LQDDFIPLYD-SLENADLKGKK
FLAV_DESDE MSKVLIVFGSSTGNTESIAQKLEELIAAGG-H-EVTLLNAADASAENLADGYDAVLFGCSAWGMEDLE------MQDDFLSLFE-EFNRFGLAGRK
FLAV_CLOAB MKISILYSSKTGKTERVAKLIEEGVKRSGNI-EVKTMNLDAVDKKFLQE-SEGIIFGTPTYYAN ISWEMKKWID-ESSEFNLEGKL
FLAV_MEGEL MVEIVYWSGTGNTEAMANEIEAAVKAAG-A-DVESVRFEDTNVDDVAS-KDVILLGCPAMGSE--E------LEDSVVEPFF-TDLAPKLKGKK
4fxn       MKIVYWSGTGNTEKMAELIAKGIIESG-K-DVNTINVSDVNIDELLN-EDILILGCSAMGDE--V------LEESEFEPFI-EEISTKISGKK
FLAV_ANASP SKKIGLFYGTQTGKTESVAEIIRDEFGNDVVT----LHDVSQAEVTDLND-YQYLIIGCPTWNIGELQ---SD-----WEGLYS-ELDDVDFNGKL
FLAV_AZOVI AKIGLFFGSNTGKTRKVAKSIKKRFDDETMSD---ALNVNRVSAEDFAQ-YQFLILGTPTLGEGELPGLSSDCENESWEEFLP-KIEGLDFSGKT
2fcr       KIGIFFSTSTGNTTEVADFIGKTLGAKADAP---IDVDDVTDPQALKD-YDLLFLGAPTWNTGADTERSGT----SWDEFLYDKLPEVDMKDLP
FLAV_ENTAG MATIGIFFGSDTGQTRKVAKLIHQKLDGIADAP---LDVRRATREQFLS--YPVLLLGTPTLGDGELPGVEAGSQYDSWQEFTN-TLSEADLTGKT
FLAV_ECOLI AITGIFFGSDTGNTENIAKMIQKQLGKDVAD----VHDIAKSSKEDLEA-YDILLLGIPTWYYGEAQ-CD WDDFFP-TLEEIDFNGKL
3chy       ADKELKFLVVDDFSTMRRIVRNLLKELG----FNNVEEAEDGVDALN------KLQAGGYGFV--I------SDWNMPNMDG-LELLKTIR---

1fx        VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG LRIDGDPRAARDDIVGWAHDVRGAI
FLAV_DESVH VACFGCGDSSYEYF--CGAVDAIEEKLKNLGAEIVQDG LRIDGDPRAARDDIVGWAHDVRGAI
FLAV_DESGI VGVFGCGDSSYTYF--CGAVDVIEKKAEELGATLVASS LKIDGEPDSAE--VLDWAREVLARV
FLAV_DESSA VSVFGCGDSDYTYF--CGAVDAIEEKLEKMGAVVIGDS LKIDGDPERDE--IVSWGSGIADKI
FLAV_DESDE VAAFASGDQEYEHF--CGAVPAIEERAKELGATIIAEG LKMEGDASNDPEAVASFAEDVLKQL
FLAV_CLOAB GAAFSTANSIAGGS--DIALLTILNHLMVKGMLVYSGGVA----FGKPKTHLGYVHINEIQENEDENARIFGERIANKVKQIF
FLAV_MEGEL VGLFGSYGWGSGE-----WMDAWKQRTEDTGATVIGTA IVN-EMPDNAPECKE-LGEAAAKA
4fxn       VALFGSYGWGDGK-----WMRDFEERMNGYGCVVVETP LIVQNEPDEAEQDCIEFGKKIANI
FLAV_ANASP VAYFGTGDQIGYADNFQDAIGILEEKISQRGGKTVGYWSTDGYDFNDSKALR-NGKFVGLALDEDNQSDLTDDRIKSWVAQLKSEFGL------
FLAV_AZOVI VALFGLGDQVGYPENYLDALGELYSFFKDRGAKIVGSWSTDGYEFESSEAVV-DGKFVGLALDLDNQSGKTDERVAAWLAQIAPEFGLSL----
2fcr       VAIFGLGDAEGYPDNFCDAIEEIHDCFAKQGAKPVGFSNPDDYDYEESKSVR-DGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ENTAG VALFGLGDQLNYSKNFVSAMRILYDLVIARGACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSWLEKLKPAVL
FLAV_ECOLI VALFGCGDQEDYAEYFCDALGTIRDIIEPRGATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKWVKQISEELHLDEILNA
3chy       AD--GAMSALPVL-----MVTAEAKKENIIAAAQAGAS GYV-VKPFTAATLEEKLNKIFEKLGM

36 Flavodoxin-cheY: Pre-processing (prepro1500)
1fx        PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACF
FLAV_DESDE MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAf
FLAV_DESVH MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACf
FLAV_DESSA MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVf
FLAV_DESGI MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf
2fcr       KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIF
FLAV_AZOVI AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALf
FLAV_ENTAG MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALf
FLAV_ANASP SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL QSDWEGLY-SELDDVDFNGKLVAYf
FLAV_ECOLI AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE AQCDWDDFF-PTLEEIDFNGKLVALf
4fxn       MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL EESEFEPFI-EEIS-TKISGKKVALF
FLAV_MEGEL MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL EDSVVEPFF-TDLA-PKLKGKKVGLf
FLAV_CLOAB MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN ISWEMKKWI-DESSEFNLEGKLGAAf
3chy       ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM DGLELL-KTIRADGAMSALPVLM T

1fx        GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD GLRIDGD--PRAARDDIVGWAHDVRGAI
FLAV_DESDE ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE GLKMEGD--ASNDPEAVASfAEDVLKQL
FLAV_DESVH GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD GLRIDGD--PRAARDDIVGwAHDVRGAI
FLAV_DESSA GCGDS-DY-TYFCGA-VDAIEEKLEKMgAVVIGD SLKIDGD--PE--RDEIVSwGSGIADKI
FLAV_DESGI GCGDS-SY-TYFCGA-VDVIEKKAEELgATLVAS SLKIDGE--PD--SAEVLDwAREVLARV
2fcr       GLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKS-VRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_AZOVI GLGDQVGYPENYLDA-LGELYSFFKDRgAKIVGSWSTDGYEFESSEA-VVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
FLAV_ENTAG GLGDQLNYSKNFVSA-MRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
FLAV_ANASP GTGDQIGYADNFQDA-IGILEEKISQRgGKTVGYWSTDGYDFNDSKA-LRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
FLAV_ECOLI GCGDQEDYAEYFCDA-LGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
4fxn       G-----SY-GWGDGKWMRDFEERMNGYGCVVVET PLIVQNE--PDEAEQDCIEFGKKIANI
FLAV_MEGEL G-----SY-GWGSGEWMDAWKQRTEDTgATVIGT AIVNEM--PDNA-PECKElGEAAAKA
FLAV_CLOAB STANSIAGGSDIA---LLTILNHLMVKgMLVYSG----GVAFGKPKTHLGYVHINEIQENEDENARIfGERiANkVKQIF
3chy       VTAEAKK--ENIIAA AQAGAS GYVV-----KPFTAATLEEKLNKIFEKLGM------

Iteration 0 SP= AvSP= SId= AvSId= 0.313

37 Flavodoxin-cheY: Local Pre-processing (locprepro300)
1fx        PKALIVYGSTTGNTEYTAETIARQLANAGYEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACF
FLAV_DESVH MPKALIVYGSTTGNTEYTaETIARELADAGYEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPL--FDSLEETGAQGRKVACf
FLAV_DESSA MSKSLIVYGSTTGNTETAaEYVAEAFENKEIDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPL--YDSLENADLKGKKVSVf
FLAV_DESGI MPKALIVYGSTTGNTEGVaEAIAKTLNSEGMETTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPL--YEDLDRAGLKDKKVGVf
FLAV_DESDE MSKVLIVFGSSTGNTESIaQKLEELIAAGGHEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSL--FEEFNRFGLAGRKVAAf
4fxn       MK--IVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLN-EDILILGCSAMGDEVL------E-ESEFEPF--IEEIS-TKISGKKVALF
FLAV_MEGEL MVE--IVYWSGTGNTEAMaNEIEAAVKAAGADVESVRFEDTNVDDVAS-KDVILLgCPAMGSEEL------E-DSVVEPF--FTDLA-PKLKGKKVGLf
2fcr       KIGIFFSTSTGNTTEVADFIGKTLGAKADAPI--DVDDVTDPQALKDYDLLFLGAPTWNTGAD----TERSGTSWDEFL-YDKLPEVDMKDLPVAIF
FLAV_ANASP SKKIGLFYGTQTGKTESVaEIIRDEFGNDVVTLH--DVSQAEV-TDLNDYQYLIIgCPTWNIGEL QSDWEGL--YSELDDVDFNGKLVAYf
FLAV_AZOVI AKIGLFFGSNTGKTRKVaKSIKKRFDDETMSDA-LNVNRVSA-EDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEF--LPKIEGLDFSGKTVALf
FLAV_ENTAG MATIGIFFGSDTGQTRKVaKLIHQKLDG--IADAPLDVRRATR-EQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEF--TNTLSEADLTGKTVALf
FLAV_ECOLI AITGIFFGSDTGNTENIaKMIQKQLGKDVADVH--DIAKSSK-EDLEAYDILLLgIPTWYYGEA QCDWDDF--FPTLEEIDFNGKLVALf
FLAV_CLOAB MKISILYSSKTGKTERVaKLIEEGVKRSGNIEVKTMNLDAVDKKFLQESEGIIFgTPTYYA NISWEMKKWIDESSEFNLEGKLGAAf
3chy       ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQ-AGGYGFVI---SDWNMPNM DGLEL--LKTIRADGAMSALPVLM

1fx        GCGDS--SY-EYFCGA-VD--AIEEKLKNLGAEIVQD GLRID--GDPRAARDDIVGWAHDVRGAI
FLAV_DESVH GCGDS--SY-EYFCGA-VD--AIEEKLKNLgAEIVQD GLRID--GDPRAARDDIVGwAHDVRGAI
FLAV_DESSA GCGDS--DY-TYFCGA-VD--AIEEKLEKMgAVVIGD SLKID--GDPE--RDEIVSwGSGIADKI
FLAV_DESGI GCGDS--SY-TYFCGA-VD--VIEKKAEELgATLVAS SLKID--GEPD--SAEVLDwAREVLARV
FLAV_DESDE ASGDQ--EY-EHFCGA-VP--AIEERAKELgATIIAE GLKME--GDASNDPEAVASfAEDVLKQL
4fxn       GS------Y-GWGDGKWMR--DFEERMNGYGCVVVET PLIVQ--NEPDEAEQDCIEFGKKIANI
FLAV_MEGEL GS------Y-GWGSGEWMD--AWKQRTEDTgATVIGT AI-VN--EMPDNA-PECKElGEAAAKA
2fcr       GLGDAE-GYPDNFCDA-IE--EIHDCFAKQGAKPVGFSNPDDYDYEESKSVRD-GKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------
FLAV_ANASP GTGDQI-GYADNFQDA-IG--ILEEKISQRgGKTVGYWSTDGYDFNDSKALRN-GKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------
FLAV_AZOVI GLGDQV-GYPENYLDA-LG--ELYSFFKDRgAKIVGSWSTDGYEFESSEAVVD-GKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--
FLAV_ENTAG GLGDQL-NYSKNFVSA-MR--ILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------
FLAV_ECOLI GCGDQE-DYAEYFCDA-LG--TIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA
FLAV_CLOAB STANSIAGGSDIALLTILNHLMVKgMLVYSGGVAFGKPKTHLGYVH INEIQENEDENARIfGERiANkVKQIF
3chy       VTAEA---KKENIIAA AQAGAS GYVVK-----PFTAATLEEKLNKIFEKLGM------

38 Strategies for multiple sequence alignment
Profile pre-processing; secondary structure-induced alignment (Praline-SS); globalised local alignment; matrix extension. Objective: integrate secondary structure information to anchor alignments and avoid errors.

39 Protein structure hierarchical levels
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
[Figure: PRIMARY STRUCTURE (amino acid sequence, shown above); SECONDARY STRUCTURE (helices, strands); TERTIARY STRUCTURE (fold); QUATERNARY STRUCTURE (oligomers)]

40 Why use (predicted) structural information
“Structure is more conserved than sequence.” Many structural protein families (e.g. globins) have family members with very low sequence similarity. For example, globin sequence identities can be as low as 10% while the fold is identical. This means that equivalent secondary structures can still be observed in homologous proteins even when sequence similarity is extremely low. But you then depend on the quality of prediction methods: secondary structure prediction is currently about 76% correct, so roughly 1 in 4 residues is still assigned the wrong secondary structure state.
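The 76% figure is presumably a three-state accuracy (Q3): the percentage of residues whose secondary structure state (H helix, E strand, C coil) is predicted correctly. A minimal sketch of that measure (the function name is ours):

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state secondary structure
    (H = helix, E = strand, C = coil) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHHCCEEEE", "HHHCCCEEEE"))      # 90.0
print(q3("H" * 76 + "E" * 24, "H" * 100))  # 76.0
```

At Q3 = 76%, 24 residues in every 100 carry a wrong state, close to the 1 in 4 mentioned above.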

41 Two superposed protein structures with two well-superposed helices
The superposed structures lead to close pairs of Cα atoms that are taken as equivalent; this gives a structural alignment in which the amino acids corresponding to equivalent Cα atom pairs are matched. Red: well superposed; blue: low match quality. The human (PDB code 1kjs) and pig (1c5a) C5 anaphylatoxin proteins are superposed.
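Reading a structural alignment off two superposed structures can be sketched roughly as below. The sketch assumes the Cα coordinates are already superposed (the superposition itself is not shown); the 3.5 Å cut-off, the closest-first acceptance order and the function name are illustrative choices, not a published method.

```python
import math


def equivalent_pairs(ca_a, ca_b, cutoff=3.5):
    """Residue pairs (i, j) taken as structurally equivalent.

    ca_a, ca_b: lists of superposed C-alpha (x, y, z) coordinates.
    Pairs closer than `cutoff` are accepted from closest to farthest, keeping
    each residue used at most once and all pairs in sequence order."""
    candidates = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(ca_a)
        for j, q in enumerate(ca_b)
        if math.dist(p, q) < cutoff
    )
    chosen = []
    for _, i, j in candidates:
        if all((i < ci and j < cj) or (i > ci and j > cj) for ci, cj in chosen):
            chosen.append((i, j))
    return sorted(chosen)


ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (20.0, 0.0, 0.0)]
print(equivalent_pairs(ca_a, ca_b))  # [(0, 0), (1, 1)]
```

Residues without a close partner (here residue 2 of the first structure) are left unmatched, i.e. they end up opposite gaps in the structural alignment.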

42 How to combine secondary structure and amino acid information
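[Figure: dynamic programming search matrix for the sequence MDAGSTVILCFV with secondary structure HHHCCCEEEEEE against a second sequence with its own secondary structure; amino acid substitution matrices for the H, C and E states plus a Default matrix]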

43 In terms of scoring…
So how would you score a profile using this extra information? In the same way as before, but with secondary-structure-specific substitution scores, which can be used in various combinations. Where does it fit in? Very important: structure is generally more conserved than sequence, so secondary structure elements can help anchor the alignments.
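One way to combine the two kinds of information is sketched below, under explicit assumptions: the substitution matrix is chosen per residue pair from the secondary structure states on the previous slide (the H, E or C matrix when both residues share that state, the Default matrix otherwise), and the chosen score is used in ordinary global dynamic programming. The four "matrices" are toy match/mismatch placeholders with invented values, not real secondary-structure-specific substitution matrices, and all names are hypothetical.

```python
def toy_matrix(match, mismatch):
    """Placeholder substitution 'matrix': one score for identity, one otherwise."""
    return lambda x, y: match if x == y else mismatch


MATRICES = {  # invented values for illustration only
    "H": toy_matrix(4, -1),
    "E": toy_matrix(5, -2),
    "C": toy_matrix(3, -1),
    "Default": toy_matrix(2, -2),
}


def ss_score(res_a, ss_a, res_b, ss_b):
    """Use the state-specific matrix when both residues share a state, else Default."""
    matrix = MATRICES[ss_a] if ss_a == ss_b else MATRICES["Default"]
    return matrix(res_a, res_b)


def global_score(seq_a, ss_a, seq_b, ss_b, gap=-3):
    """Needleman-Wunsch score with secondary-structure-dependent substitution scores."""
    n, m = len(seq_a), len(seq_b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + ss_score(seq_a[i - 1], ss_a[i - 1], seq_b[j - 1], ss_b[j - 1]),
                S[i - 1][j] + gap,
                S[i][j - 1] + gap,
            )
    return S[n][m]


print(global_score("MDAGSTVILCFV", "HHHCCCEEEEEE", "MDAGSTVILCFV", "HHHCCCEEEEEE"))  # 51
```

Because matching residues in the same secondary structure state score higher than the Default matrix allows, equivalent helices and strands pull the alignment into register, which is the anchoring effect described above.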

44 Sequences to be aligned
[Figure: the sequences to be aligned first receive a predicted secondary structure (e.g. HHHHCCEEECCCEEECCHH, HHHCCCCEECCCEEHHH, HHHHHHHHHHHHHCCCEEEE, CCCCCCEECCCEEEECCHH, HHHHHCCEEEECCCEECCC); the sequences are then aligned using this secondary structure to give the multiple alignment]

45 Using predicted secondary structure
Figure: multiple alignment of the 13 flavodoxins (1fx, FLAV_DESVH, FLAV_DESGI, FLAV_DESSA, FLAV_DESDE, 2fcr, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_ENTAG, 4fxn, FLAV_MEGEL, FLAV_CLOAB) and cheY (3chy), in two blocks. Each sequence is followed by its secondary-structure string (h = helix, e = strand); the entries for the structures 1fx, 2fcr, 4fxn and 3chy use a richer code that also includes t, s, g and b.

46 Strategies for multiple sequence alignment
(not for exam) Profile pre-processing; secondary structure-induced alignment; globalised local alignment; matrix extension.
Objectives: instead of single amino acid positions, focus on local alignments; consider the best local alignment through each cell in the DP matrix; try to avoid (early) errors.

47 Globalised local alignment
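(not for exam) Double dynamic programming:
1. Local (SW) alignment, using the exchange matrix M and the gap penalties Po,e (opening and extension);
2. Global (NW) alignment over the scores from step 1, without M or Po,e.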
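The two steps can be sketched in Python as follows. This is a minimal illustration, not PRALINE's implementation: a toy match/mismatch function (hypothetical `score`) stands in for M, and a single linear gap penalty `gap` stands in for Po,e. Step 1 combines a forward and a reverse Smith-Waterman pass so that each cell holds the best local alignment score that aligns that residue pair; step 2 runs a Needleman-Wunsch pass over those scores with no exchange matrix and no gap penalties.

```python
def score(a, b, match=2, mismatch=-1):
    """Toy exchange score standing in for a matrix such as BLOSUM62 (assumption)."""
    return match if a == b else mismatch


def sw_matrix(x, y, gap=2):
    """Smith-Waterman matrix: H[i][j] = best local score of x[:i] vs y[:j] ending at (i, j)."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          H[i - 1][j] - gap,
                          H[i][j - 1] - gap)
    return H


def through_cell_scores(x, y, gap=2):
    """Step 1: best local alignment score that aligns x[i] with y[j], for every (i, j)."""
    n, m = len(x), len(y)
    fwd = sw_matrix(x, y, gap)              # local alignments ending just before (i, j)
    bwd = sw_matrix(x[::-1], y[::-1], gap)  # reversed sequences: alignments starting just after
    return [[fwd[i][j] + score(x[i], y[j]) + bwd[n - 1 - i][m - 1 - j]
             for j in range(m)] for i in range(n)]


def globalise(T):
    """Step 2: Needleman-Wunsch over T without exchange matrix or gap penalties."""
    n, m = len(T), len(T[0])
    G = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = max(G[i - 1][j - 1] + T[i - 1][j - 1], G[i - 1][j], G[i][j - 1])
    pairs, i, j = [], n, m  # traceback collects matched (i, j) pairs, 0-based
    while i > 0 and j > 0:
        if G[i][j] == G[i - 1][j - 1] + T[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif G[i][j] == G[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return G[n][m], pairs[::-1]


total, pairs = globalise(through_cell_scores("ACGT", "ACGT"))
```

The gap penalty only enters step 1, which is why changing Po and Pe changes the local signal that the global pass then stitches together.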

48 Globalised local alignment
(not for exam) Figure: step 1 (local alignment) and step 2 (global alignment).

49 M = BLOSUM62, Po = 0, Pe = 0 (not for exam)

50 M = BLOSUM62, Po = 12, Pe = 1 (not for exam)

51 M = BLOSUM62, Po = 60, Pe = 5 (not for exam)

52 Strategies for multiple sequence alignment
Profile pre-processing; secondary structure-induced alignment; globalised local alignment; matrix extension.
Objective: try to avoid (early) errors.

53 Integrating alignment methods and alignment information with T-Coffee
Integrating different pair-wise alignment techniques (NW, SW, ...); combining different multiple alignment methods (consensus multiple alignment); combining sequence alignment methods with structural alignment techniques; plugging in user knowledge.

54 Matrix extension T-Coffee
Tree-based Consistency Objective Function For alignmEnt Evaluation. Cedric Notredame (“Bioinformatics for dummies”), Des Higgins, Jaap Heringa. J. Mol. Biol. 302, 2000.

55 Using different sources of alignment information
Figure: Clustal alignments, structure alignments, Dialign, Lalign and manual alignments all feeding into T-Coffee.

56 T-Coffee library system
Seq1  AA1  Seq2  AA2  Weight
3     V31  5     L33  10
3     V31  6     L34  14
5     L33  6     R35  21
5     L33  6     I36  35
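A library like the one above can be sketched as a dictionary from residue pairs to weights. In this sketch each pairwise alignment gives every aligned residue pair a weight equal to the alignment's percent identity, and weights for a pair found by several source alignments (for example a global and a local one) are added. T-Coffee's primary library is weighted by sequence identity, but the function names, the rounding and the `(sequence, residue number)` key layout here are illustrative assumptions, not T-Coffee's file format.

```python
from collections import defaultdict


def percent_identity(a, b):
    """Percent identity over the columns where both gapped strings have a residue."""
    cols = [(p, q) for p, q in zip(a, b) if p != "-" and q != "-"]
    if not cols:
        return 0
    return round(100 * sum(p == q for p, q in cols) / len(cols))


def add_alignment(library, name_a, name_b, aln_a, aln_b):
    """Add every aligned residue pair of one pairwise alignment to the library."""
    w = percent_identity(aln_a, aln_b)
    i = j = 0  # 1-based residue numbers in the ungapped sequences
    for p, q in zip(aln_a, aln_b):
        if p != "-":
            i += 1
        if q != "-":
            j += 1
        if p != "-" and q != "-":
            library[((name_a, i), (name_b, j))] += w  # repeated pairs accumulate weight
    return library


library = defaultdict(int)
add_alignment(library, "s1", "s2", "AC-GT", "ACTGA")  # e.g. from a global aligner
add_alignment(library, "s1", "s2", "AC-GT", "ACTGA")  # same pairs from a second source
```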

57 Matrix extension
Figure: the pairwise comparisons between four sequences (2-1, 3-1, 4-1, 3-2, 4-2, 4-3).

58 Search matrix extension – alignment transitivity
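Alignment transitivity can be sketched as library extension: a residue pair (x, y) gains extra weight for every residue z in a third sequence that is aligned to both, adding min(W(x, z), W(z, y)). This follows the triplet extension described for T-Coffee; the data layout (frozenset keys of `(sequence, residue)` tuples) is an assumption of this sketch.

```python
from collections import defaultdict


def extend_library(primary):
    """Return extended weights EW(x, y) = W(x, y) + sum over z of min(W(x, z), W(z, y)).

    primary maps frozenset({x, y}) -> weight; x and y are (sequence, residue) tuples,
    and pairs are assumed to link residues of different sequences only.
    """
    neighbours = defaultdict(dict)  # residue -> {aligned residue: weight}
    for pair, w in primary.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w
    extended = defaultdict(int, primary)  # start from the direct weights
    for z, links in neighbours.items():
        items = list(links.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (x, wx), (y, wy) = items[a], items[b]
                if x[0] != y[0]:  # x and y come from two sequences other than z's
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)


A1, B1, C2 = ("A", 1), ("B", 1), ("C", 2)
primary = {frozenset((A1, B1)): 80, frozenset((A1, C2)): 60, frozenset((B1, C2)): 70}
ext = extend_library(primary)
```

Pairs that are supported through intermediate sequences end up with higher weights than pairs that only one pairwise alignment proposes, which is what makes the extended library more robust than any single alignment.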

59 T-Coffee: direct alignment versus alignment through other sequences

60 Search matrix extension

61 T-COFFEE web-interface

62 3D-COFFEE: computes structure-based alignments
Structures associated with the sequences are retrieved and the information is used to optimise the MSA. More accurate … but for many (many) proteins we do not have the structure!

63 But... T-COFFEE (V1.23) multiple sequence alignment, flavodoxin-cheY
Figure: T-COFFEE (V1.23) alignment of the 13 flavodoxins and cheY (3chy), in two blocks, with conservation marks (:) beneath the first block.

64 Multiple alignment methods
Multi-dimensional dynamic programming: extension of pairwise sequence alignment.
Progressive alignment: incorporates phylogenetic information to guide the alignment process.
Iterative alignment: corrects for problems with progressive alignment by repeatedly realigning subgroups of sequences.
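A back-of-the-envelope count, under the simplifying assumption that all N sequences have the same length L, shows why exact multi-dimensional dynamic programming is rarely used: the search space has (L + 1)^N cells, and each cell must consider 2^N - 1 predecessors, one for every non-empty subset of sequences that advances by a residue. The length of 150 in the loop is only an illustrative figure.

```python
def msa_dp_cost(length, n_seqs):
    """Return (cells, predecessor evaluations) for exact N-dimensional DP."""
    cells = (length + 1) ** n_seqs
    per_cell = 2 ** n_seqs - 1  # each non-empty subset of sequences may consume a residue
    return cells, cells * per_cell


for n in (2, 3, 14):  # 14 = size of the flavodoxin-cheY set
    cells, work = msa_dp_cost(150, n)
    print(f"N={n}: {cells:.3e} cells, {work:.3e} predecessor checks")
```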

65 Iteration
Iteration can help in cases where one can learn from the data produced in a preceding step, so that the next step can be taken in a ‘more informed’ way. Possible outcomes: convergence, limit cycle or divergence.
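The three outcomes can be told apart by remembering the states an iteration has visited. In this sketch, `step` is a hypothetical stand-in for one round of an iterative alignment scheme (for example, realign and return a hashable alignment). A state that repeats after one step means convergence, a longer repeat means a limit cycle. Divergence cannot be proven by finite iteration, so the sketch only reports that no repeat was found within `max_iter` steps.

```python
def iterate(step, state, max_iter=100):
    """Apply step() repeatedly and classify the run by the first repeated state."""
    seen = {state: 0}  # state -> iteration at which it was first reached
    for k in range(1, max_iter + 1):
        state = step(state)
        if state in seen:
            period = k - seen[state]
            outcome = "convergence" if period == 1 else "limit cycle"
            return outcome, state, period
        seen[state] = k
    return "no convergence within max_iter", state, None


print(iterate(lambda s: s // 2 if s > 1 else 1, 40))  # settles on a fixed point
print(iterate(lambda s: (s + 1) % 3, 0))              # cycles with period 3
```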

66 Pre-profile alignment: alignment consistency
Figure: five pre-profiles, each led by its own sequence (1 to 5), used to check whether residues such as Ala131 (A131), L133 and C126 are aligned consistently.

67 Flavodoxin-cheY consistency scores (PRALINE prepro=0)
Figure: completely consistently aligned amino acids per sequence (1fx, FLAV_DESVH, FLAV_DESDE, FLAV_DESGI, FLAV_DESSA, 4fxn, FLAV_MEGEL, 2fcr, FLAV_ANASP, FLAV_ECOLI, FLAV_AZOVI, FLAV_ENTAG, FLAV_CLOAB, 3chy), with rows for average consistency and conservation. Iteration 0 summary (SP, AvSP, SId, AvSId): 0.297.
Consistency values are scored from 0 to 10; a value of 10 is represented by the corresponding amino acid (in red).
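One way to compute a per-residue consistency value on a 0 to 10 scale is sketched below. It is a hypothetical definition, not necessarily PRALINE's: a residue's score is the share of its pairings in the final alignment that also occur in at least one pre-profile alignment, multiplied by 10 and rounded. Alignments are dictionaries mapping sequence names to gapped strings of equal length.

```python
def aligned_pairs(msa):
    """Set of residue pairs, as frozensets of (name, residue number), aligned in msa."""
    names = list(msa)
    counters = {n: 0 for n in names}
    pairs = set()
    for col in range(len(msa[names[0]])):
        present = []  # residues occupying this column
        for n in names:
            if msa[n][col] != "-":
                counters[n] += 1
                present.append((n, counters[n]))
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add(frozenset((present[a], present[b])))
    return pairs


def consistency(final_msa, pre_alignments):
    """Hypothetical 0-10 score: share of a residue's final pairings found in any pre-alignment."""
    reference = set().union(*(aligned_pairs(p) for p in pre_alignments))
    tally = {}  # residue -> (supported pairings, all pairings)
    for pair in aligned_pairs(final_msa):
        supported = pair in reference
        for residue in pair:
            agree, total = tally.get(residue, (0, 0))
            tally[residue] = (agree + supported, total + 1)
    return {r: round(10 * agree / total) for r, (agree, total) in tally.items()}


final = {"s1": "AC-G", "s2": "ACTG"}
pre = [{"s1": "ACG-", "s2": "ACTG"}]
scores = consistency(final, pre)
```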

68 Flavodoxin-cheY consistency scores
(PRALINE prepro=1500)
Figure: consistently aligned stretches per sequence (1fx, FLAV_DESVH, FLAV_DESSA, FLAV_DESGI, FLAV_DESDE, 4fxn, FLAV_MEGEL, 2fcr, FLAV_ANASP, FLAV_AZOVI, FLAV_ENTAG, FLAV_ECOLI, FLAV_CLOAB, 3chy), with rows for average consistency and conservation. Iteration 0 summary (SP, AvSP, SId, AvSId): 0.308.
Consistency values are scored from 0 to 10; a value of 10 is represented by the corresponding amino acid (in red).
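The summary lines report sum-of-pairs (SP) and sequence identity (SId) statistics. A minimal sketch of both, under stated assumptions: the SP score sums an exchange score over all sequence pairs in every column, with pairs involving a gap contributing zero, and identity is averaged over sequence pairs using only the columns where both have a residue. PRALINE's exact normalisation of SP, AvSP, SId and AvSId may differ.

```python
from itertools import combinations


def sp_score(rows, score):
    """Sum-of-pairs score; pairs involving a gap contribute nothing (assumption)."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a != "-" and b != "-":
                total += score(a, b)
    return total


def mean_pairwise_identity(rows):
    """Average over sequence pairs of the identical fraction of shared non-gap columns."""
    ids = []
    for r1, r2 in combinations(rows, 2):
        cols = [(a, b) for a, b in zip(r1, r2) if a != "-" and b != "-"]
        if cols:
            ids.append(sum(a == b for a, b in cols) / len(cols))
    return sum(ids) / len(ids) if ids else 0.0


rows = ["AC-G", "ACTG", "AGTG"]
print(sp_score(rows, lambda a, b: int(a == b)), mean_pairwise_identity(rows))
```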

69 Consistency iteration
Pre-profiles Multiple alignment positional consistency scores

70 Pre-profile update iteration
Pre-profiles Multiple alignment

71 Iterate similarity matrix, guide tree and MSA
Figure: pairwise scores (1-2, 1-3, ..., 4-5) fill a 5×5 similarity matrix, from which a guide tree and then a multiple alignment are built; the cycle is then repeated. This way of iterating was already implemented by Hogeweg and Hesper.
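The "scores to distances" and "guide tree" steps can be sketched with UPGMA, one common guide-tree method (ClustalW, for instance, uses neighbour joining instead). Converting percent identity to distance as d = 1 - id/100 is an assumption; in an iterative scheme the identities would be recomputed from the new multiple alignment and the tree rebuilt.

```python
def scores_to_distances(identity):
    """Map percent identity per sequence pair to a distance d = 1 - id/100 (assumed)."""
    return {pair: 1 - pid / 100 for pair, pid in identity.items()}


def upgma(names, dist):
    """Guide tree as nested tuples, built by UPGMA from frozenset({a, b}) -> distance."""
    d = dict(dist)
    size = {n: 1 for n in names}
    clusters = list(names)
    while len(clusters) > 1:
        # Closest pair of current clusters
        a, b = min(((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:]),
                   key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in clusters:
            if c not in (a, b):
                # Distance to the merged cluster: size-weighted average (UPGMA rule)
                d[frozenset((merged, c))] = (size[a] * d[frozenset((a, c))] +
                                             size[b] * d[frozenset((b, c))]) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return clusters[0]


identity = {frozenset((a, b)): 20 for a in "12345" for b in "12345" if a < b}
identity.update({frozenset(("1", "2")): 90, frozenset(("1", "3")): 60,
                 frozenset(("2", "3")): 60, frozenset(("4", "5")): 85})
tree = upgma(list("12345"), scores_to_distances(identity))
```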

72 Secondary structure-induced alignment

73 PRALINE Using secondary structure for alignment
Figure: dynamic programming search matrix for two sequences annotated with secondary structure (e.g. MDAGSTVILCFV with HHHCCCEEEEEE), and amino acid exchange weights matrices for helix (H), strand (E) and coil (C), plus a default matrix.
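Choosing an exchange matrix per secondary-structure state can be sketched as a global alignment whose scoring function looks at the states of both residues. The selection rule used here (the same state in both sequences selects that state's matrix, anything else falls back to the default) and the toy matrices are assumptions for illustration; the slide's method uses amino acid exchange weights matrices instead of these toy functions.

```python
def make_scorer(matrices, default):
    """Return s(a, b, state_a, state_b) choosing an exchange function by SS state."""
    def s(a, b, sa, sb):
        exchange = matrices.get(sa, default) if sa == sb else default
        return exchange(a, b)
    return s


def nw_ss(x, sx, y, sy, scorer, gap=2):
    """Needleman-Wunsch score with secondary-structure-dependent exchange weights."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gap * i
    for j in range(1, m + 1):
        F[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + scorer(x[i - 1], y[j - 1], sx[i - 1], sy[j - 1]),
                          F[i - 1][j] - gap,
                          F[i][j - 1] - gap)
    return F[n][m]


def default(a, b):
    return 1 if a == b else -1  # toy fallback matrix


def helix(a, b):
    return 3 if a == b else 0  # toy helix-specific matrix


scorer = make_scorer({"H": helix}, default)
```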

74 Flavodoxin-cheY multiple alignment/secondary structure iteration
cheY SSEs
Figure: PHD secondary-structure predictions (H = helix, E = strand) for cheY (3chy), one line per iteration from 0 to 9, shown beneath the 3chy sequence ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMPNMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM.

85 Is the initial SS prediction good enough?

86 MUSCLE Edgar 2004

87 PRALINE and MUSCLE method
PRALINE and MUSCLE use almost the same formalism to compare two profiles; the difference is the position of the log in the scoring equation. Edgar calls the MUSCLE scoring scheme the “log-expectation score (LE)”.
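For two profile columns x and y, MUSCLE's log-expectation score (Edgar 2004) is $LE_{xy} = (1 - f_G^x)(1 - f_G^y)\log \sum_i \sum_j f_i^x f_j^y \frac{p_{ij}}{p_i p_j}$, where $f_i$ are residue frequencies, $f_G$ the gap fractions, $p_{ij}$ pair probabilities and $p_i$ background frequencies. Assuming PRALINE's profile score is the frequency-weighted sum of log-odds exchange values, $\sum_i \sum_j f_i^x f_j^y \log \frac{p_{ij}}{p_i p_j}$, the log moves inside the sum. The sketch below compares the two on a toy two-letter alphabet; the numbers are illustrative only, the background frequencies are called `q`, and the residue frequencies are assumed to be normalised over residues only.

```python
import math


def muscle_le(fx, fy, gx, gy, p, q):
    """Log-expectation score: the log is applied to the whole double sum."""
    expectation = sum(fx[i] * fy[j] * p[(i, j)] / (q[i] * q[j]) for i in fx for j in fy)
    return (1 - gx) * (1 - gy) * math.log(expectation)


def praline_like(fx, fy, p, q):
    """Assumed PRALINE form: frequency-weighted log-odds, log inside the double sum."""
    return sum(fx[i] * fy[j] * math.log(p[(i, j)] / (q[i] * q[j])) for i in fx for j in fy)


# Toy two-letter alphabet (illustrative numbers only)
q = {"A": 0.5, "B": 0.5}
p = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}
pure, mixed = {"A": 1.0}, {"A": 0.5, "B": 0.5}
print(muscle_le(mixed, pure, 0, 0, p, q), praline_like(mixed, pure, p, q))
```

For conserved columns the two scores coincide; for mixed columns, Jensen's inequality makes the log of the expectation at least as large as the expectation of the log (before MUSCLE's gap factors are applied), so LE penalises mixed columns less.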

88 So what do we do?
A single shot at a good alignment without much thought: MUSCLE, T-COFFEE, PROBCONS (maybe POA).
If you want to experiment with making alignments for a given sequence set: PRALINE (profile pre-processing, iteration, secondary structure-induced alignment, globalised local alignment).
There is no single method that always generates the best alignment, so it is best to use more than one method, e.g. include Dialign2 (local).
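When several methods are run on the same sequence set, one simple way to compare their outputs is the fraction of aligned residue pairs they share. The sketch below computes that agreement for two alignments stored as dictionaries of gapped strings of equal length; it is an illustration, not the implementation of any standard benchmark score.

```python
def aligned_pairs(msa):
    """Set of residue pairs, as frozensets of (name, residue number), aligned in msa."""
    names = list(msa)
    counters = {n: 0 for n in names}
    pairs = set()
    for col in range(len(msa[names[0]])):
        present = []  # residues occupying this column
        for n in names:
            if msa[n][col] != "-":
                counters[n] += 1
                present.append((n, counters[n]))
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                pairs.add(frozenset((present[a], present[b])))
    return pairs


def pair_agreement(msa_a, msa_b):
    """Fraction of residue pairs aligned in msa_a that msa_b aligns identically."""
    pa, pb = aligned_pairs(msa_a), aligned_pairs(msa_b)
    return len(pa & pb) / len(pa) if pa else 0.0


method_1 = {"s1": "AC-G", "s2": "ACTG"}
method_2 = {"s1": "ACG-", "s2": "ACTG"}
print(pair_agreement(method_1, method_2))
```

Columns on which different methods agree are usually the ones worth trusting most.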

89 Recap
Weighting schemes to use information from all sequences right from the start of the progressive MSA protocol:
  Profile pre-processing (global/local) (PRALINE)
  Matrix extension (well-balanced scheme) (T-Coffee)
  Smoothing alignment signals: globalised local alignment (PRALINE)
  Consistency-based mixing of local and global alignment (T-Coffee)
  Using additional information: secondary structure-driven alignment (PRALINE)
Iterative schemes to alleviate the ‘greediness’ of the progressive MSA protocol:
  Profile pre-processing iteration (PRALINE)
  Secondary structure-driven iteration (PRALINE)
  ‘Classical’ distance matrix iteration
  Binary cutting of the guide tree and realignment of groups (MUSCLE)

90 References
Heringa, J. (1999) Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput. Chem. 23.
Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302.
Heringa, J. (2002) Local weighting schemes for protein multiple sequence alignment. Comput. Chem. 26(5).
Simossis, V.A., Kleinjung, J. and Heringa, J. (2005) Homology-extended sequence alignment. Nucleic Acids Res. 33(3).

