An Introduction to Multiple Sequence Alignments


1 An Introduction to Multiple Sequence Alignments
Cédric Notredame

2
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse      KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: : * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

3 Mangel M., Samaniego F.J., Abraham Wald’s Work on Aircraft Survivability, Journal of the American Statistical Association 79 (1984)

4 Our Scope
-How Can I Use My Alignment?
-How Does The Computer Align The Sequences?
-How Can I Assemble a Multiple Alignment?
-What are the Difficulties?

5 Outline
-Why Do We Need Multiple Sequence Alignment?
-The Progressive Alignment Algorithm
-A Possible Strategy…
-Potential Difficulties

6 Pre-requisite
-How Do Sequences Evolve?
-How Can We COMPARE Sequences?
-How Can We ALIGN Sequences?

7 Why Do We Need Multiple Sequence Alignment ?

8 Sometimes Two Sequences Are Not Enough…
The man with TWO watches NEVER knows the time

9 What is A Multiple Sequence Alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse      KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: : * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

Structural Criterion: Residues are arranged so that those playing a similar role end up in the same column.
Evolutionary Criterion: Residues are arranged so that those descended from the same ancestral residue end up in the same column.
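The column idea can be made concrete with a minimal Python sketch that marks the fully conserved columns of the first alignment block (the `*` positions only, not the `:` and `.` similarity marks). The function name `conservation_line` is hypothetical, and the five leading gaps on the mouse row are an assumption, since the transcript lost that row's leading padding.

```python
def conservation_line(rows):
    """Mark each alignment column '*' when every row holds the same residue."""
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # Conserved means exactly one symbol in the column, and it is not a gap.
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


# First block of the slide's alignment (mouse padded with 5 gaps: an assumption).
rows = [
    "---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD",  # chite
    "--DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE",  # wheat
    "KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP",  # trybr
    "-----KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP",  # mouse
]

line = conservation_line(rows)
conserved_columns = [i for i, mark in enumerate(line) if mark == "*"]
print(conserved_columns)  # → [6, 7, 8, 39, 45, 48]
```

The six columns found match the six `*` marks in the slide's consensus line, starting with the P-K-R motif.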

10 Functional Relation, Phylogenetic Relation

11

12 How Can I Use A Multiple Sequence Alignment?
chite      ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat   --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr   KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
unknown      KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: : * . *: *

chite   AATAKQNYIRALQEYERNGG-
wheat   ANKLKGEYNKAIAAYNKGESA
trybr   AEKDKERYKREM
unknown AKDDRIRYDNEMKSWEEQMAE
* : .* . :

Less Than 30% Identity BUT Conserved where it MATTERS
Extrapolation Beyond The Twilight Zone
SwissProt
Unknown Sequence
Homology?

13

14 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Extrapolation
Prosite Patterns

15 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Extrapolation
Prosite Patterns: P-K-R-[PA]-x(1)-[ST]…
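As an illustration of how such a pattern can be used to scan other sequences, here is a minimal Python sketch that translates simple PROSITE syntax into a regular expression. The helper name `prosite_to_regex` is hypothetical, only the visible prefix of the slide's pattern is used (the rest is elided on the slide), and anchors such as `<` and `>` are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(1) or A(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # x: any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # {AB}: any residue except A or B
        # [AB] is already valid regex syntax; single letters pass through.
        parts.append(core + ("{" + count + "}" if count else ""))
    return "".join(parts)


regex = prosite_to_regex("P-K-R-[PA]-x(1)-[ST]")
print(regex)  # → PKR[PA].{1}[ST]

# Ungapped N-terminal stretches taken from the four aligned sequences.
fragments = {
    "chite": "ADKPKRPLSAYMLWLNSARESIKRENPDFK",
    "wheat": "DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSV",
    "trybr": "KKDSNAPKRAMTSFMFFSSDFRS",
    "mouse": "KPKRPRSAYNIYVSESFQ",
}
hits = {name: re.search(regex, seq) is not None for name, seq in fragments.items()}
print(hits)
```

All four sequences match, which is the point of the slide: the pattern summarises the conserved columns and can then be searched against uncharacterised sequences.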

16 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Extrapolation
Prosite Patterns
SwissProt
Uncharacterised
Signature
Match?

17 How Can I Use A Multiple Sequence Alignment?
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse      KPKRPRSAYNIYVSESFQ----EAKDDS-IQGKLKLVNEAWKNLSP
***. ::: .: : * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

L? K>R
A F D E G H Q I V L W
Extrapolation
Prosite Patterns
Profiles And HMMs
-More Sensitive
-More Specific

18 A PROSITE PROFILE A Substitution Cost For Every Amino Acid, At Every Position
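To show what "a value for every amino acid, at every position" looks like in practice, here is a minimal Python sketch that builds a position-specific scoring table from alignment columns. It is not the PROSITE profile format: the log-odds scoring, the uniform background, the pseudocount of 1 and the names `build_profile` and `score` are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(columns, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    profile = []
    for column in columns:
        # Count residues only; gap characters are skipped.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum the per-position scores of an ungapped segment."""
    return sum(column[res] for column, res in zip(profile, segment))


# Six ungapped columns (the P-K-R region) of chite, wheat, trybr and mouse.
block = ["PKRPLS", "PKRAPS", "PKRAMT", "PKRPRS"]
profile = build_profile(list(zip(*block)))

print(round(score(profile, "PKRPLS"), 2))
print(round(score(profile, "GGGGGG"), 2))
```

Unlike a pattern, the table gives every residue some score at every position, so a sequence with one unusual residue is penalised rather than rejected outright. A higher score here means a better fit; a cost-based profile would simply flip the sign.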

19 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Extrapolation
Motifs/Patterns
Profiles
Phylogeny (tree leaves: chite, wheat, trybr, mouse)
-Evolution
-Paralogy/Orthology

20 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Column Constraint
Evolution Constraint
Structure Constraint
Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction

21 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Extrapolation
Motifs/Patterns
Profiles
Phylogeny
Struc. Prediction
-PsiPred or PHD for secondary structure prediction: 75% accurate.
-Threading: improving, but not yet as good.

22 How Can I Use A Multiple Sequence Alignment?
Image: alignment of chite, wheat, trybr and mouse (same as slide 9)
Automatic Multiple Sequence Alignment methods are not always perfect…
You know better… With your big BRAIN

23

24 Why Is It Difficult To Compute A Multiple Sequence Alignment?
A CROSSROAD PROBLEM
BIOLOGY: What is A Good Alignment?
COMPUTATION: What is THE Good Alignment?
Image: first block of the chite, wheat, trybr and mouse alignment (same as slide 9)

25 The Biological Problem.
Same as the Pairwise Alignment Problem:
-We do NOT know how Sequences Evolve.
-We do NOT understand the Relation Between Structures and Sequences.
-We would NOT recognize the Correct Alignment if we had it IN FRONT of our eyes…

26 The Biological Problem. The Charlie Chaplin Paradox

27 The Biological Problem. How to Evaluate an Alignment
-Gap Penalties.
-A nice set of Sequences
-Substitution Matrix (Blosum)
-An Evaluation Function
A A A C A C
Sums of Pairs: Cost=6
-Over-estimation of the Substitutions
-Easy to compute
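The sum-of-pairs evaluation function can be sketched in a few lines of Python. The scoring values (match 1, mismatch -1, gap -1) and the function name `sum_of_pairs` are assumptions chosen for illustration; a real evaluation would take its pair scores from a substitution matrix such as Blosum and use separate gap opening and extension penalties.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Score an alignment by summing a pair score over every pair of rows, column by column."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue            # a gap facing a gap carries no information
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["A", "A", "A", "A"]))  # → 6
print(sum_of_pairs(["A", "A", "A", "C"]))  # → 0
```

Four rows give six pairs per column, so a fully identical column scores 6. In the second column a single substitution (one A to C change) is charged three times, once for each pair involving the C, which is the over-estimation of substitutions the slide refers to.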

28 The COMPUTATIONAL Problem. Producing the Alignment
-Gap Penalties.
-A nice set of Sequences
-Substitution Matrix (Blosum)
-An Evaluation Function
-An Alignment Algorithm: GLOBAL Alignment
Will It Work?

29 HOW CAN I ALIGN MANY SEQUENCES
2 Globins =>1 Min

30 HOW CAN I ALIGN MANY SEQUENCES
3 Globins =>2 hours

31 HOW CAN I ALIGN MANY SEQUENCES
4 Globins => 10 days

32 HOW CAN I ALIGN MANY SEQUENCES
5 Globins => 3 years

33 HOW CAN I ALIGN MANY SEQUENCES
! DHEA Loaded 6 Globins =>300 years

34 HOW CAN I ALIGN MANY SEQUENCES
7 Globins => years Solidified Fossil, Old stuff

35 HOW CAN I ALIGN MANY SEQUENCES 8 Globins =>3 Million years
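The explosion shown on these slides follows from the size of the dynamic programming table. A rough Python sketch of the cost model, assuming N sequences of equal length L: the table has about L^N cells, and each cell looks at 2^N - 1 predecessor cells. The sequence length of 150 is a hypothetical globin-sized value, and the sketch counts operations only; it does not reproduce the timings quoted on the slides.

```python
def exact_dp_work(n_sequences: int, length: int) -> int:
    """Rough operation count for exact dynamic programming over N sequences."""
    cells = length ** n_sequences    # one cell per combination of positions
    moves = 2 ** n_sequences - 1     # each row emits a residue or a gap, but not all gaps
    return cells * moves


LENGTH = 150  # assumption: globin-sized sequences
base = exact_dp_work(2, LENGTH)
for n in range(2, 9):
    # Work relative to the pairwise case.
    print(n, exact_dp_work(n, LENGTH) // base)
```

Under this model every extra sequence multiplies the work by a little more than 2L, which is why exact methods stop being practical after a handful of sequences.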

36 The Progressive Multiple Alignment Algorithm
(Clustal W)

37

38 Making An Alignment
Any Exact Method would be TOO SLOW.
We will use a Heuristic Algorithm.
Progressive Alignment Algorithm is the most Popular:
-ClustalW
-Greedy Heuristic (No Guarantee).
-Fast

39 Progressive Alignment Feng and Doolittle, 1988; Taylor 1989
Clustering

40 Progressive Alignment
Dynamic Programming Using A Substitution Matrix
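The dynamic programming step can be illustrated on its simplest case, two single sequences. The Python sketch below is a plain Needleman-Wunsch global alignment with a linear gap penalty; a progressive aligner applies the same recurrence to pairs of already aligned groups. The scoring function (match 1, mismatch -1), the gap penalty of -2 and the name `needleman_wunsch` are assumptions; a real run would use a substitution matrix and separate Gop/Gep penalties.

```python
def needleman_wunsch(a, b, score=lambda x, y: 1 if x == y else -1, gap=-2):
    """Globally align two sequences; return (aligned_a, aligned_b, best_score)."""
    n, m = len(a), len(b)
    # dp[i][j] is the best score for aligning a[:i] with b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                dp[i - 1][j] + gap,                            # gap in b
                dp[i][j - 1] + gap,                            # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), dp[n][m]


top, bottom, best = needleman_wunsch("FATCAT", "FASTCAT")
print(top)     # → FA-TCAT
print(bottom)  # → FASTCAT
print(best)    # → 4
```

Passing a different `score` function, for example a lookup into a Blosum table, changes the alignment without changing the algorithm.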

41 Progressive Alignment
-Depends on the CHOICE of the sequences.
-Depends on the ORDER of the sequences (Tree).
-Depends on the PARAMETERS:
 Substitution Matrix.
 Penalties (Gop, Gep).
 Sequence Weight.
 Tree making Algorithm.

42 Progressive Alignment
When Does It Work?
-Works Well When the Phylogeny is Dense
-No Outlier Sequence.
Image: River Crossing

43 Progressive Alignment
When Doesn’t It Work?
SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD          THE ---- FA-T CAT
CLUSTALW (Score=20, Gop=-1, Gep=0, M=1)

SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST ---- CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD          THE ---- FA-T CAT
CORRECT (Score=24)

44
GARFIELD THE LAST FAT CAT
GARFIELD THE VERY FAST CAT
GARFIELD THE FAST CAT
GARFIELD THE VERY FAST CAT
THE FAT CAT

GARFIELD THE LAST FAT CAT
GARFIELD THE FAST CAT ---

GARFIELD THE LAST FA-T CAT
GARFIELD THE FAST CA-T ---
GARFIELD THE VERY FAST CAT
         THE ---- FA-T CAT

GARFIELD THE VERY FAST CAT
         THE ---- FA-T CAT

45 Building the Right Multiple Sequence Alignment.

46 Recognizing The Right Sequences When you Meet Them…

47 Gathering Sequences: BLAST

48 Common Mistake: Sequences Too Closely Related
PRVA_MACFU SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_HUMAN SMTDLLNAEDIKKAVGAFSATDSFDHKKFFQMVGLKKKSADDVKKVFHMLDKDKSGFIEE
PRVA_GERSP SMTDLLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKTPDDVKKVFHILDKDKSGFIEE
PRVA_MOUSE SMTDVLSAEDIKKAIGAFAAADSFDHKKFFQMVGLKKKNPDEVKKVFHILDKDKSGFIEE
PRVA_RAT   SMTDLLSAEDIKKAIGAFTAADSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIEE
PRVA_RABIT AMTELLNAEDIKKAIGAFAAAESFDHKKFFQMVGLKKKSTEDVKKVFHILDKDKSGFIEE
           :**::*.*******:***:* :****************..::******:***********

PRVA_MACFU DELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_HUMAN DELGFILKGFSPDARDLSAKETKMLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_GERSP DELGFILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVSES
PRVA_MOUSE DELGSILKGFSSDARDLSAKETKTLLAAGDKDGDGKIGVEEFSTLVAES
PRVA_RAT   DELGSILKGFSSDARDLSAKETKTLMAAGDKDGDGKIGVEEFSTLVAES
PRVA_RABIT EELGFILKGFSPDARDLSVKETKTLMAAGDKDGDGKIGADEFSTLVSES
           :*** ******.******.**** *:************.:******:**

-IDENTICAL SEQUENCES BRING NO INFORMATION FOR THE MULTIPLE SEQUENCE ALIGNMENT
-MULTIPLE SEQUENCE ALIGNMENTS THRIVE ON DIVERSITY…
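One way to act on this advice is to measure pairwise identity and discard near-duplicates before aligning. The Python sketch below is a simple greedy filter; the 90% cut-off and the names `percent_identity` and `drop_near_duplicates` are assumptions, and the three fragments are the first 18 residues of sequences from the slide.

```python
def percent_identity(a, b):
    """Percent identity over the columns where both aligned rows hold a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def drop_near_duplicates(aligned, threshold=90.0):
    """Keep a sequence only if it is below `threshold` % identity to all kept ones."""
    kept = {}
    for name, seq in aligned.items():
        if all(percent_identity(seq, other) < threshold for other in kept.values()):
            kept[name] = seq
    return kept


fragments = {
    "PRVA_MACFU": "SMTDLLNAEDIKKAVGAF",
    "PRVA_HUMAN": "SMTDLLNAEDIKKAVGAF",
    "PRVA_RABIT": "AMTELLNAEDIKKAIGAF",
}
kept = drop_near_duplicates(fragments)
print(list(kept))  # → ['PRVA_MACFU', 'PRVA_RABIT']
```

The filter is order dependent: the first member of each near-identical group is the one that survives, so it can be worth listing the best-annotated sequences first.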

49 Sequence Weighting Within ClustalW

50 Selecting Diverse Sequences (Opus II)

51 Respect Information!
PRVA_MACFU                                           SMTDLLN----AEDIKKA
PRVA_HUMAN                                           SMTDLLN----AEDIKKA
PRVA_GERSP                                           SMTDLLS----AEDIKKA
PRVA_MOUSE                                           SMTDVLS----AEDIKKA
PRVA_RAT                                             SMTDLLS----AEDIKKA
PRVA_RABIT                                           AMTELLN----AEDIKKA
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
: :*. .*::::

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_HUMAN VGAFSATDS--FDHKKFFQMVG------LKKKSADDVKKVFHMLDKDKSGFIEEDELGFI
PRVA_GERSP IGAFAAADS--FDHKKFFQMVG------LKKKTPDDVKKVFHILDKDKSGFIEEDELGFI
PRVA_MOUSE IGAFAAADS--FDHKKFFQMVG------LKKKNPDEVKKVFHILDKDKSGFIEEDELGSI
PRVA_RAT   IGAFTAADS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGSI
PRVA_RABIT IGAFAAAES--FDHKKFFQMVG------LKKKSTEDVKKVFHILDKDKSGFIEEEELGFI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM

This Alignment Is not Informative about the relation Between TPCC_MOUSE and the rest of the sequences.
-A better Spread of the Sequences is needed

52 Selecting Diverse Sequences (Opus II)

53 Selecting Diverse Sequences (Opus II)
PRVB_CYPCA -AFAGVLNDADIAAALEACKAADSFNHKAFFAKVGLTSKSADDVKKAFAIIDQDKSGFIE
PRVB_BOACO -AFAGILSDADIAAGLQSCQAADSFSCKTFFAKSGLHSKSKDQLTKVFGVIDRDKSGYIE
PRV1_SALSA MACAHLCKEADIKTALEACKAADTFSFKTFFHTIGFASKSADDVKKAFKVIDQDASGFIE
PRVB_LATCH -AVAKLLAAADVTAALEGCKADDSFNHKVFFQKTGLAKKSNEELEAIFKILDQDKSGFIE
PRVB_RANES -SITDIVSEKDIDAALESVKAAGSFNYKIFFQKVGLAGKSAADAKKVFEILDRDKSGFIE
PRVA_MACFU -SMTDLLNAEDIKKAVGAFSAIDSFDHKKFFQMVGLKKKSADDVKKVFHILDKDKSGFIE
PRVA_ESOLU --AKDLLKADDIKKALDAVKAEGSFNHKKFFALVGLKAMSANDVKKVFKAIDADASGFIE
: *: .: . .* .:*. * ** *: * : * :* * **:**

PRVB_CYPCA EDELKLFLQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA-
PRVB_BOACO EDELKKFLQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG
PRV1_SALSA VEELKLFLQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ-
PRVB_LATCH DEELELFLQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA-
PRVB_RANES QDELGLFLQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA-
PRVA_MACFU EDELGFILKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES
PRVA_ESOLU EEELKFVLKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA
:** .*:.* .* *: ** :: .* **** **::** **

-A REASONABLE Model Now Exists.
-Going Further: Remote Homologues.

54 Aligning Remote Homologues
PRVA_MACFU                                           SMTDLLNA----EDIKKA
PRVA_ESOLU                                            AKDLLKA----DDIKKA
PRVB_CYPCA                                           AFAGVLND----ADIAAA
PRVB_BOACO                                           AFAGILSD----ADIAAG
PRV1_SALSA                                          MACAHLCKE----ADIKTA
PRVB_LATCH                                           AVAKLLAA----ADVTAA
PRVB_RANES                                           SITDIVSE----KDIDAA
TPCS_RABIT -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCS_PIG   -TDQQAEARSYLSEEMIAEFKAAFDMFDADGG-GDISVKELGTVMRMLGQTPTKEELDAI
TPCC_MOUSE MDDIYKAAVEQLTEEQKNEFKAAFDIFVLGAEDGCISTKELGKVMRMLGQNPTPEELQEM
: ::

PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVA_ESOLU LDAVKAEGS--FNHKKFFALVG------LKAMSANDVKKVFKAIDADASGFIEEEELKFV
PRVB_CYPCA LEACKAADS--FNHKAFFAKVG------LTSKSADDVKKAFAIIDQDKSGFIEEDELKLF
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
PRVB_LATCH LEGCKADDS--FNHKVFFQKTG------LAKKSNEELEAIFKILDQDKSGFIEDEELELF
PRVB_RANES LESVKAAGS--FNYKIFFQKVG------LAGKSAADAKKVFEILDRDKSGFIEQDELGLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG   IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
: : *: * : * :* : .*:*: :** .

PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES-
PRVA_ESOLU LKSFAADGRDLTDAETKAFLKAADKDGDGKIGIDEFETLVHEA-
PRVB_CYPCA LQNFKADARALTDGETKTFLKAGDSDGDGKIGVDEFTALVKA--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG-
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ--
PRVB_LATCH LQNFSAGARTLTKTETETFLKAGDSDGDGKIGVDEFQKLVKA--
PRVB_RANES LQNFRASARVLSDAETSAFLKAGDSDGDGKIGVEEFQALVKA--
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCS_PIG   FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE
:: :: : :: .* :.** *. :** ::

55 Some Guidelines …

56 Do Not Use Too Many Sequences…

57 Reading Your Alignment

58

59 Going Further…
PRVA_MACFU VGAFSAIDS--FDHKKFFQMVG------LKKKSADDVKKVFHILDKDKSGFIEEDELGFI
PRVB_BOACO LQSCQAADS--FSCKTFFAKSG------LHSKSKDQLTKVFGVIDRDKSGYIEEDELKKF
PRV1_SALSA LEACKAADT--FSFKTFFHTIG------FASKSADDVKKAFKVIDQDASGFIEVEELKLF
TPCS_RABIT IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNADGYIDAEELAEI
TPCS_PIG   IEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAECFRIFDRNMDGYIDAEELAEI
TPCC_MOUSE IDEVDEDGSGTVDFDEFLVMMVRCMKDDSKGKSEEELSDLFRMFDKNADGYIDLDELKMM
TPC_PATYE  SDEMDEEATGRLNCDAWIQLFER---KLKEDLDERELKEAFRVLDKEKKGVIKVDVLRWI
. : :: : * :* : .* *. : * .

PRVA_MACFU LKGFSPDARDLSAKETKTLMAAGDKDGDGKIGVDEFSTLVAES--
PRVB_BOACO LQNFDGKARDLTDKETAEFLKEGDTDGDGKIGVEEFVVLVTKG--
PRV1_SALSA LQNFCPKARELTDAETKAFLKAGDADGDGMIGIDEFAVLVKQ---
TPCS_RABIT FR---ASGEHVTDEEIESLMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCS_PIG   FR---ASGEHVTDEEIESIMKDGDKNNDGRIDFDEFLKMMEGVQ-
TPCC_MOUSE LQ---ATGETITEDDIEELMKDGDKNNDGRIDYDEFLEFMKGVE-
TPC_PATYE  LS---SLGDELTEEEIENMIAETDTDGSGTVDYEEFKCLMMSSDA
: :: : :: * :..* :. :** ::

60 WHAT MAKES A GOOD ALIGNMENT…
-THE MORE DIVERGENT THE SEQUENCES, THE BETTER
-THE FEWER INDELS, THE BETTER
-NICE UNGAPPED BLOCKS SEPARATED WITH INDELS
-DIFFERENT CLASSES OF RESIDUES WITHIN A BLOCK:
 Completely Conserved
 Conserved For Size and Hydropathy
 Conserved For Size or Hydropathy
-THE ULTIMATE EVALUATION IS A MATTER OF PERSONAL JUDGEMENT AND KNOWLEDGE.
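The "ungapped blocks separated with indels" criterion can be checked mechanically. Here is a minimal Python sketch that lists the gap-free column runs of an alignment; the function name `ungapped_blocks`, the `min_length` parameter and the two-row toy alignment are hypothetical.

```python
def ungapped_blocks(alignment, min_length=1):
    """Return (start, end) column ranges, end exclusive, in which no row has a gap."""
    blocks = []
    start = None
    n_cols = len(alignment[0])
    # Iterate one column past the end so a block touching the last column is closed.
    for col in range(n_cols + 1):
        gap_free = col < n_cols and all(row[col] != "-" for row in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    return blocks


toy = [
    "AB--CDE",
    "ABXXCD-",
]
print(ungapped_blocks(toy))  # → [(0, 2), (4, 6)]
```

Few, long blocks suggest indels concentrated in a small number of places; many short blocks scattered across the alignment are a reason to look at it more closely.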

61

62 Potential Difficulties

63 DO NOT OVERTUNE!!!
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse      KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: : * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

DO NOT PLAY WITH PARAMETERS. IF YOU KNOW THE ALIGNMENT YOU WANT: MAKE IT YOURSELF!

chite ---ADKPKRPL-SAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAP-SAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse      KPKRPR-SAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. :*: .: : * . *: *

chite AATAKQNYIRALQEYERNGG-
wheat ANKLKGEYNKAIAAYNKGESA
trybr AEKDKERYKREM
mouse AKDDRIRYDNEMKSWEEQMAE
* : .* . :

64 Substitution Matrices
TUNING or NOT TUNING!!!
-PARAMETERS TO TUNE USUALLY INCLUDE:
 GOP/GEP
 MATRIX
 SENSITIVITY Vs SPEED
GOP GEP
Substitution Matrices (Etzold et al. 1993): Gonnet % Blosum % Pam %
-MOST METHODS ARE TUNED FOR WORKING WELL ON AVERAGE
-PARAMETER BEHAVIOUR DOES NOT NECESSARILY FOLLOW THE THEORY (e.g. Substitution Matrices).
-A GOOD ALIGNMENT IS USUALLY ROBUST (i.e. Changes little).
-TUNE IF YOU WANT TO CONVINCE YOURSELF.

65

66 KEEP A BIOLOGICAL PERSPECTIVE
chite ---ADKPKRPLSAYMLWLNSARESIKRENPDFK-VTEVAKKGGELWRGLKD
wheat --DPNKPKRAPSAFFVFMGEFREEFKQKNPKNKSVAAVGKAAGERWKSLSE
trybr KKDSNAPKRAMTSFMFFSSDFRS----KHSDLS-IVEMSKAAGAAWKELGP
mouse      KPKRPRSAYNIYVSESFQ----EAKDDS-AQGKLKLVNEAWKNLSP
***. ::: .: : * . *: *

DIFFERENT PARAMETERS

chite AD--K----PKR-PLYMLWLNS-ARESIKRENPDFK-VT-EVAKKGGELWRGL-
wheat -DPNK----PKRAP-FFVFMGE-FREEFKQKNPKNKSVA-AVGKAAGERWKSLS
trybr -K--KDSNAPKR-AMT-MFFSSDFR-S-KH-S-DLS-IV-EMSKAAGAAWKELG
mouse ----K----PKR-PRYNIYVSESFQEA-K--D-D-S-AQGKL-KLVNEAWKNLS
* *** .:: ::... : * : * . *: *

WRONG ALIGNMENT !!!

67 REPEATS
-THERE IS A PROBLEM WHEN TWO SEQUENCES DO NOT CONTAIN THE SAME NUMBER OF REPEATS.
-IT IS THEN BETTER TO MANUALLY EXTRACT THE REPEATS AND TO ALIGN THEM.
-INDIVIDUAL REPEATS CAN BE RECOGNIZED USING DOTTER.
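The idea behind spotting repeats with a dot plot can be sketched in Python: compare a sequence against itself and look for matches off the main diagonal. This is a toy exact-match version written for illustration, not Dotter's scoring scheme; the names `dot_matches` and `repeat_offsets`, the window of 3 and the test sequence are hypothetical.

```python
from collections import Counter


def dot_matches(seq_a, seq_b, window=3):
    """Yield (i, j) where the `window`-long words starting at i and j are identical."""
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                yield i, j


def repeat_offsets(seq, window=3):
    """Count off-diagonal self-matches by offset; a frequent offset suggests a repeat."""
    return Counter(j - i for i, j in dot_matches(seq, seq, window) if j > i)


toy = "MKTAYLQWMKTAYL"  # hypothetical sequence: 'MKTAYL' occurs twice, 8 residues apart
print(repeat_offsets(toy).most_common(1))  # → [(8, 4)]
```

Each frequent offset corresponds to a diagonal line in the plot. Once the repeat boundaries are known, the individual units can be cut out and aligned on their own, as the slide recommends.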

68

69 Naming Your Sequences The Right Way

70 What Are The Available Methods ???

71 Simultaneous Alignments : MSA
1) Set Bounds on each pair of sequences (Carrillo and Lipman)
2) Compute the multiple alignment within the bounded Hyperspace
-Few, Small, Closely Related Sequences.
-Memory and CPU hungry
-Do Well When They Can Run.

72 Simultaneous Alignments : DCA
-Few, Small, Closely Related Sequences, but less limited than MSA
-Do Well When They Can Run.
-Memory and CPU hungry, but less than MSA

73 Dialign II
1) Identify the best chain of segments on each pair of sequences. Assign a P-value to each Segment Pair.
2) Re-evaluate each segment pair according to its consistency with the others.
3) Assemble the alignment according to the segment pairs.

74 Muscle

75 7.16.1 Progressive Iterative Methods
-HMMs, HMMER, SAM, MUSCLE
-Slow, Sometimes Inaccurate
-Good Profile Generators

76 MUSCLE Progressive

77 MUSCLE phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py Progressive

78 MAFFT
Fast Fourier Transform

79 Prank

80 Satchmo

81 Mixing Heterogeneous Data With T-Coffee
Local Alignment, Global Alignment, Multiple Alignment, Specialist, Structural
Multiple Sequence Alignment

82 Mixing Sequences and Structures with T-Coffee
Seq Vs Seq: Local, Global
Seq Vs Struct: Thread
Struct Vs Struct: Superpose
Evaluation on HOMSTRAD

83

84 What is The Best Method ?

85

86 A better Question… What is the Best Alignment ?
What is the best bit of my alignment ?

87 What is the Local Quality of my Alignment ?
II

88 Choosing the right method

89 Situation  Solution

90 Priority  Solution Method Accuracy Speed Priority Trees Profile
2D –Pred 3D-Pred Func-Pred Accuracy Speed

91 Purpose  Solution

92 Conclusion

93 Multiple Alignment
-The BEST alignment Method: Your Brain, The Right Data
-The Best Evaluation Procedure: Experimental Data (SwissProt)
-Choosing The Sequences Well is Important
-Beware of repeated elements

94 Multiple Alignment Know Your Problem: What do you want to do with your MSA

95 Addresses
MAFFT: Progressive/Iterative, www.biophys.kyoto-u.jp/katoh
POA: Progressive/Simultaneous
MUSCLE: Progressive/Iterative

