Download presentation
Presentation is loading. Please wait.
Published byBarbara O’Neal’ Modified over 9 years ago
1
An Introduction to Multiple Sequence Alignments Cédric Notredame
2
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. :
3
Manguel M, Samaniego F.J., Abraham Wald’s Work on Aircraft Suvivability, J. American Statistical Association. 79, 259-270, (1984)
4
Our Scope How Can I Use My Alignment? How Does The Computer Align The Sequences? How Can I Assemble a Mult. Aln? What are the Difficulties?
5
Outline -Why Do We Need Multiple Sequence Alignment ? -The progressive Alignment Algorithm -A possible Strategy… -Potential Difficulties
6
Pre-requisite -How Do Sequences Evolve? -How can We COMPARE Sequences ? -How can We ALIGN Sequences ?
7
Why Do We Need Multiple Sequence Alignment ?
8
Sometimes Two Sequences Are Not Enough… The man with TWO watches NEVER knows the time
9
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : What is A Multiple Sequence Alignment? Structural Criteria: Residues are arranged so that those playing a similar role end up in the same column. Evolution Criteria: Residues are arranged so that those having the same ancestor end up in the same column.
10
Phylogenic Relation Functional Relation
12
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP unknown -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- unknown AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Beyond The Twilight Zone SwissProt Unkown Sequence Homology? Less Than 30 % id BUT Conserved where it MATTERS
14
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Prosite Patterns
15
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Prosite Patterns P-K-R-[PA]-x(1)-[ST]…
16
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Prosite Patterns SwissProt Uncharacterised Signature Match?
17
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-IQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Prosite Patterns Profiles And HMMs -More Sensitive -More Specific L? K>R A F D E F G H Q I V L W
18
A PROSITE PROFILE A Substitution Cost For Every Amino Acid, At Every Position
19
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Motifs/Patterns Phylogeny chite wheat trybr mouse -Evolution -Paralogy/Orthology Profiles
20
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Motifs/Patterns Phylogeny Profiles Struc. Prediction Column Constraint Evolution Constraint Structure Constraint
21
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Extrapolation Motifs/Patterns Phylogeny Profiles Struc. Prediction PsiPred OR PhD For secondary Structure Prediction: 75% Accurate. Threading: is improving but is not yet as good.
22
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : How Can I Use A Multiple Sequence Alignment? Automatic Multiple Sequence Alignment methods are not always perfect… You know better… With your big BRAIN
24
Why Is It Difficult To Compute A multiple Sequence Alignment? A CROSSROAD PROBLEM BIOLOGY: What is A Good Alignment COMPUTATION What is THE Good Alignment chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: *
25
The Biological Problem. Same as PairWise Alignment Problem We do NOT know how Sequences Evolve. We do NOT understand the Relation Between Structures and Sequences. We would NOT recognize the Correct Alignment if we had it IN FRONT of our eyes…
26
The Biological Problem. The Charlie Chaplin Paradox
27
The Biological Problem. How to Evaluate an Alignment -Substitution Matrix (Blosum) -An Evaluation Function A AAACCAAACC -Gap Penalties. -A nice set of Sequences A A A C Sums of Pairs: Cost=6 C Over-estimation of the Substitutions Easy to compute
28
The COMPUTATIONAL Problem. Producing the Alignment -Substitution Matrix (Blosum) -An Evaluation Function -Gap Penalties. -A nice set of Sequences -An Alignment Algorithm GLOBAL Alignment Will It Work ?
29
HOW CAN I ALIGN MANY SEQUENCES 2 Globins =>1 Min
30
3 Globins =>2 hours HOW CAN I ALIGN MANY SEQUENCES
31
4 Globins => 10 days HOW CAN I ALIGN MANY SEQUENCES
32
5 Globins => 3 years HOW CAN I ALIGN MANY SEQUENCES
33
6 Globins =>300 years HOW CAN I ALIGN MANY SEQUENCES ! DHEA Loaded
34
7 Globins =>30. 000 years HOW CAN I ALIGN MANY SEQUENCES Solidified Fossil, Old stuff
35
8 Globins =>3 Million years HOW CAN I ALIGN MANY SEQUENCES
36
The Progressive Multiple Alignment Algorithm (Clustal W)
38
Making An Alignment Any Exact Method would be TOO SLOW We will use a Heuristic Algorithm. Progressive Alignment Algorithm is the most Popular -Fast -ClustalW -Greedy Heuristic (No Guarranty).
39
Progressive Alignment Feng and Dolittle, 1988; Taylor 1989 Clustering
40
Dynamic Programming Using A Substitution Matrix Progressive Alignment
41
-Depends on the ORDER of the sequences (Tree). -Depends on the CHOICE of the sequences. -Depends on the PARAMETERS: Substitution Matrix. Penalties (Gop, Gep). Sequence Weight. Tree making Algorithm.
42
Progressive Alignment When Does It Work Works Well When Phylogeny is Dense No outlayer Sequence. Image: River Crossing
43
SeqA GARFIELD THE LAST FA-T CAT SeqB GARFIELD THE FAST CA-T --- SeqC GARFIELD THE VERY FAST CAT SeqD -------- THE ---- FA-T CAT CLUSTALW (Score=20, Gop=-1, Gep=0, M=1) SeqA GARFIELD THE LAST FA-T CAT SeqB GARFIELD THE FAST ---- CAT SeqC GARFIELD THE VERY FAST CAT SeqD -------- THE ---- FA-T CAT CORRECT (Score=24) Progressive Alignment When Doesn’t It Work
44
GARFIELD THE LAST FAT CAT GARFIELD THE FAST CAT --- GARFIELD THE LAST FAT CAT GARFIELD THE FAST CAT GARFIELD THE VERY FAST CAT THE FAT CAT GARFIELD THE VERY FAST CAT -------- THE ---- FA-T CAT GARFIELD THE LAST FA-T CAT GARFIELD THE FAST CA-T --- GARFIELD THE VERY FAST CAT -------- THE ---- FA-T CAT
45
Building the Right Multiple Sequence Alignment.
46
Recognizing The Right Sequences When you Meet Them…
47
Gathering Sequences: BLAST
48
Common Mistake: Sequences Too Closely Related PRVA_MACFU SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE PRVA_HUMAN SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE PRVA_GERSP SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE PRVA_MOUSE SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE PRVA_RAT SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE PRVA_RABIT AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE :**::*.*******:***:* :****************..::******:*********** PRVA_MACFU DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES PRVA_HUMAN DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES PRVA_GERSP DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES PRVA_MOUSE DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES PRVA_RAT DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES PRVA_RABIT EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES :*** ******.******.**** *:************.:******:** -IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE MULTIPLE SEQUENCE ALIGNMENT -MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY…
49
Sequence Weighting Within ClustalW
50
Selecting Diverse Sequences (Opus II)
51
Respect Information! This Alignment Is not Informative about the relation Betwwen TPCC MOUSE and the rest of the sequences. PRVA_MACFU ------------------------------------------SMTDLLN----AEDIKKA PRVA_HUMAN ------------------------------------------SMTDLLN----AEDIKKA PRVA_GERSP ------------------------------------------SMTDLLS----AEDIKKA PRVA_MOUSE ------------------------------------------SMTDVLS----AEDIKKA PRVA_RAT ------------------------------------------SMTDLLS----AEDIKKA PRVA_RABIT ------------------------------------------AMTELLN----AEDIKKA TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM : :*..*:::: PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI PRVA_HUMAN VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI PRVA_GERSP IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI PRVA_MOUSE IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI PRVA_RAT IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI PRVA_RABIT IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM -A better Spread of the Sequences is needed
52
Selecting Diverse Sequences (Opus II)
53
PRVB_CYPCA -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE PRVB_BOACO -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE PRV1_SALSA MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE PRVB_LATCH -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE PRVB_RANES -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE PRVA_MACFU -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE PRVA_ESOLU --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE : *:.:..*.:*. * ** *: * : * :* * **:** PRVB_CYPCA EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA- PRVB_BOACO EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG PRV1_SALSA VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ- PRVB_LATCH DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA- PRVB_RANES QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA- PRVA_MACFU EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES PRVA_ESOLU EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA :**.*:.*.* *: ** ::.* **** **::** ** -A REASONABLE Model Now Exists. -Going Further:Remote Homologues.
54
Aligning Remote Homologues PRVA_MACFU ------------------------------------------SMTDLLNA----EDIKKA PRVA_ESOLU -------------------------------------------AKDLLKA----DDIKKA PRVB_CYPCA ------------------------------------------AFAGVLND----ADIAAA PRVB_BOACO ------------------------------------------AFAGILSD----ADIAAG PRV1_SALSA -----------------------------------------MACAHLCKE----ADIKTA PRVB_LATCH ------------------------------------------AVAKLLAA----ADVTAA PRVB_RANES ------------------------------------------SITDIVSE----KDIDAA TPCS_RABIT -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI TPCS_PIG -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM : :: PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI PRVA_ESOLU LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV PRVB_CYPCA LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF PRVB_LATCH LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF PRVB_RANES LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI TPCS_PIG IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM :..:... *: * : * :* :.*:*: :**. PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES- PRVA_ESOLU LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA- PRVB_CYPCA LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-- PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG- PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-- PRVB_LATCH LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-- PRVB_RANES LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-- TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ TPCS_PIG FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE ::.. :: : ::.* :.** *. :** ::
55
Some Guidelines …
56
Do Not Use Two Many Sequences…
57
Reading Your Alignment
59
Going Further… PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI TPCS_PIG IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM TPC_PATYE SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI. :... ::. : * :* :.* *. : *. PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-- PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-- PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--- TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ- TPCS_PIG FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ- TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE- TPC_PATYE LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA :. :: : :: * :..* :. :** ::
60
WHAT MAKES A GOOD ALIGNMENT… -THE MORE DIVERGEANT THE SEQUENCES, THE BETTER -THE FEWER INDELS, THE BETTER -NICE UNGAPPED BLOCKS SEPARATED WITH INDELS -DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK: Completely Conserved Conserved For Size and Hydropathy Conserved For Size or Hydropathy -THE ULTIMATE EVALUATION IS A MATTER OF PERSONNAL JUDGEMENT AND KNOWLEDGE.
62
Potential Difficulties
63
DO NOT OVERTUNE!!! chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. : DO NOT PLAY WITH PARAMETERS IF YOU KNOW THE ALIGNMENT YOU WANT: MAKE IT YOURSELF! chite ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :*:.:... :.. *. *: * chite AATAKQNYIRALQEYERNGG- wheat ANKLKGEYNKAIAAYNKGESA trybr AEKDKERYKREM--------- mouse AKDDRIRYDNEMKSWEEQMAE * :.*. :
64
TUNING or NOT TUNING!!! -MOST METHODS ARE TUNED FOR WORKING WELL ON AVERAGE -PARAMETERS BEHAVIOUR DO NOT NECESSARILY FOLLOW THE THEORY (i.e. Substitution Matrices). -A GOOD ALIGNMENT IS USUALLY ROBUST(i.e. Changes little). -TUNE IF YOU WANT TO CONVINCE YOURSELF. -PARAMETERS TO TUNE USUALLY INCLUDE: GOP/ GEP MATRIX SENSITIVITY Vs SPEED GOP GEP Substitution Matrices (Etzold and al. 1993) Gonnet61.7 % Blosum5059.7 % Pam25059.2 %
66
KEEP A BIOLOGICAL PERSPECTIVE chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP mouse -----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP ***. :::.:... :.. *. *: * chite AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL- wheat -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS trybr -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG mouse ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS * ***.:: ::... : *... : *. *: * DIFFERENT PARAMETERS WRONG ALIGNMENT !!!
67
REPEATS THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT CONTAIN THE SAME NUMBER OF REPEATS IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS AND TO ALIGN THEM. INDIVIDUAL REPEATS CAN BE RECOGNIZED USING DOTTER
69
Naming Your Sequences The Right Way
70
What Are The Available Methods ???
71
Simultaneous Alignments : MSA 1) Set Bounds on each pair of sequences (Carillo and Lipman) 2) Compute the Maln within the Hyperspace -Few Small Closely Related Sequence. -Do Well When They Can Run. -Memory and CPU hungry
72
Simultaneous Alignments : DCA -Few Small Closely Related Sequence, but less limited than MSA -Do Well When Can Run. -Memory and CPU hungry, but less than MSA
73
Dialign II 1) Identify best chain of segments on each pair of sequence. Assign a Pvalue to each Segment Pair. 3) Assemble the alignment according to the segment pairs. 2) Ré-évaluate each segment pair according to its consistency with the others
74
Muscle
75
7.16.1 Progressive Iterative Methods -HMMs, HMMER, SAM, MUSCLE -Slow, Sometimes Inaccurate -Good Profile Generators
76
7.16.1 Progressive MUSCLE
77
7.16.1 Progressive MUSCLE phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
78
MAFFT Fast Fourrier Transformé
79
Prank
80
Stachmo
81
Mixing Heterogenous Data With T-Coffee Local AlignmentGlobal Alignment Multiple Sequence Alignment Multiple Alignment StructuralSpecialist
82
Struct Vs Struct Seq Vs Struct Thread Evaluation on Homestrad Superpose Seq Vs Seq Local Global Mixing Sequences and Structures with T-Coffee
83
www.tcoffee.org
84
What is The Best Method ?
86
A better Question… What is the Best Alignment ? What is the best bit of my alignment ?
87
What is the Local Quality of my Alignment ? II I
88
Choosing the right method
89
Situation Solution
90
Priority Solution Method Priority TreesProfile2D –Pred3D-PredFunc-Pred Accuracy Speed
91
Purpose Solution
92
Conclusion
93
-The BEST alignment Method: Your Brain The Right Data -Beware of repeated elements Multiple Alignment -The Best Evaluation Procedure: Experimental Data (SwissProt) -Choosing The Sequences Well is Important
94
Know Your Problem: What do you want to do with your MSA Multiple Alignment
95
Addresses MAFFT Progressive/iterativewww.biophys.kyoto-u.jp/katoh POA Progressive/Simultaneouswww.bioinformatics.ucla.edu/poa MUSCLE Progressive/Iterativewww.drive5.com/muscle
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.