Published by Hollie Tyler. Modified over 9 years ago.
1
Cédric Notredame (22/02/2016) Comparing Two Protein Sequences Cédric Notredame
2
Our Scope
-Pairwise Alignment methods are POWERFUL
-Pairwise Alignment methods are LIMITED
-If You Understand the LIMITS they Become VERY POWERFUL
-Look once Under the Hood
3
Outline
-WHY Does It Make Sense To Compare Sequences?
-HOW Can we Compare Two Sequences?
-HOW Can we Align Two Sequences?
-HOW can I Search a Database?
4
Why Does It Make Sense To Compare Sequences?
Sequence Evolution
5
Why Do We Want To Compare Sequences
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
EXTRAPOLATE ?????? Homology? SwissProt
6
Why Do We Want To Compare Sequences
7
Why Does It Make Sense To Align Sequences?
-Evolution is our Real Tool.
-Nature is LAZY and Keeps re-using Stuff.
-Evolution is mostly DIVERGENT
Same Sequence -> Same Ancestor
8
Why Does It Make Sense To Align Sequences?
-Same Sequence
-Same Function
-Same 3D Fold
-Same Origin
Many Counter-examples!
9
Comparing Is Reconstructing Evolution
10
An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPLSAYMLWLN
Mutations + Selection
11
An Alignment is a STORY
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation, Insertion, Deletion
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPLSAYMLWLN
Mutations + Selection
12
Evolution is NOT Always Divergent…
-N: AFGP with (ThrAlaAla)n, NOT Similar to Trypsinogen
-S: AFGP with (ThrAlaAla)n, Similar To Trypsinogen
Chen et al, 97, PNAS, 94, 3811-16
13
Evolution is NOT Always Divergent
-N: AFGP with (ThrAlaAla)n, NOT Similar to Trypsinogen
-S: AFGP with (ThrAlaAla)n, Similar To Trypsinogen
SIMILAR Sequences BUT DIFFERENT origin
14
Evolution is NOT always Divergent…
But in MOST cases, you may assume it is…
-Same Sequence
-Same Function
-Same 3D Fold
-Same Origin
Similar Function DOES NOT REQUIRE Similar Sequence
Historical Legacy
15
How Do Sequences Evolve?
Each Portion of a Genome has its own Agenda.
16
How Do Sequences Evolve?
CONSTRAINED Genome Positions Evolve SLOWLY
EVERY Protein Family Has its Own Level Of Constraint
Family            Ks    Ka
Histone3          6.4   0
Insulin           4.0   0.1
Interleukin I     4.6   1.4
Globin            5.1   0.6
Apolipoprot. AI   4.5   1.6
Interferon G      8.6   2.8
Rates in Substitutions/site/Billion Years as measured on Mouse Vs Human (80 Million years)
Ks: Synonymous Mutations, Ka: Non-Synonymous (Non-Neutral) Mutations.
17
Different molecular clocks for different proteins--another prediction
18
How Do Sequences Evolve?
The Amino Acids Venn Diagram
To Make Things Worse, Every Residue has its Own Personality
[Figure: Venn diagram of amino acid properties (Aliphatic, Aromatic, Hydrophobic, Polar, Small)]
19
How Do Sequences Evolve?
In a structure, each Amino Acid plays a Special Role
[Figure: OmpR, Cter Domain]
-In the core, SIZE MATTERS
-On the surface, CHARGE MATTERS
20
How Do Sequences Evolve?
Accepted Mutations Depend on the Structure
-In the core: Big -> Big, Small -> Small, NO DELETION
-On the surface: Charged -> Charged, Small -> Big or Small, DELETIONS
21
How Can We Compare Sequences?
Substitution Matrices
22
How Can We Compare Sequences?
To Compare Two Sequences, We need:
-Their Function
-Their Structure
We Do Not Have Them !!!
23
How Can We Compare Sequences?
We will Need To Replace Structural Information With Sequence Information.
-Same Sequence
-Same Function
-Same 3D Fold
-Same Origin
It CANNOT Work ALL THE TIME !!!
24
How Can We Compare Sequences?
To Compare Sequences, We need to Compare Residues
We Need to Know How Much it COSTS to SUBSTITUTE
-an Alanine into an Isoleucine
-a Tryptophan into a Glycine
-…
The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX
How to derive that matrix?
25
How Can We Compare Sequences?
[Figure: Venn diagram of amino acid properties (Aliphatic, Aromatic, Hydrophobic, Polar, Small)]
Using Knowledge Could Work, But we do not know enough about Evolution and Structure.
Using Data works better.
26
How Can We Compare Sequences?
Making a Substitution Matrix
-Take 100 nice pairs of Protein Sequences, easy to align (80% identical).
-Align them…
-Count each mutation in the alignments
  -25 Tryptophans into Phenylalanine
  -30 Isoleucines into Leucine
  -…
-For each mutation, set the substitution score to the log-odds ratio: Log(Observed / Expected by chance)
27
You're kidding! … I was struck by lightning twice too!!
Gary Larson, The Far Side
28
How Can We Compare Sequences?
Making a Substitution Matrix
-Take 100 nice pairs of Protein Sequences, easy to align (80% identical).
-Align them…
-Count each mutation in the alignments
  -25 Tryptophans into Phenylalanine
  -30 Isoleucines into Leucine
  -…
-For each mutation, set the substitution score to the log-odds ratio: Log(Observed / Expected by chance)
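The log-odds recipe on this slide can be sketched in a few lines. This is only an illustration: the total number of aligned pairs and the background frequencies are made-up values (they are not on the slides), and the base-10 logarithm with a x10 scaling is just one common convention, assumed here.

```python
import math

def log_odds(observed_freq, expected_freq, scale=10.0):
    """Substitution score = scale * log10(observed / expected by chance).

    The scale factor and the log base are assumptions, not taken from the slides.
    """
    return scale * math.log10(observed_freq / expected_freq)

# Hypothetical numbers for illustration only:
# 30 Isoleucine/Leucine pairs counted among an assumed 1000 aligned pairs.
observed_IL = 30 / 1000

# Assumed background frequencies of Isoleucine and Leucine.
freq_I, freq_L = 0.06, 0.09

# Chance of seeing the unordered pair I/L if residues were paired at random.
expected_IL = 2 * freq_I * freq_L

score_IL = log_odds(observed_IL, expected_IL)
print(f"I/L log-odds score: {score_IL:.2f}")
```

A positive score means the substitution is seen more often than chance predicts, zero means exactly as often, and a negative score means it is avoided.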
29
How Can We Compare Sequences?
Making a Substitution Matrix
-The Diagonal Indicates How Conserved a residue tends to be.
-W is VERY Conserved
-Some Residues are Easier To mutate into other similar residues
-Cysteines that make disulfide bridges and those that do not get averaged
30
How Can We Compare Sequences?
Making a Substitution Matrix
31
32
How Can We Compare Sequences?
Using Substitution Matrix
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation, Insertion, Deletion
Given two Sequences and a substitution Matrix, We must Compute the CHEAPEST Alignment
33
Scoring an Alignment
Most popular Substitution Matrices
-PAM250
-Blosum62 (Most widely used)
Raw Score
TPEA
¦| |
APGA
Score = 1+6+0+2 = 9
Question: Is it possible to get such a good alignment by chance only?
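The raw score above is a column-by-column sum of substitution scores. The sketch below reproduces it using only the four pair scores printed on the slide (1, 6, 0, 2); they look like PAM250 entries, but treat that as an assumption. The function names are hypothetical.

```python
# Pair scores exactly as shown on the slide for TPEA vs APGA.
# Assumption: these are PAM250 values; only these four pairs are defined here.
PAIR_SCORES = {
    ("T", "A"): 1,
    ("P", "P"): 6,
    ("E", "G"): 0,
    ("A", "A"): 2,
}

def pair_score(x, y, matrix=PAIR_SCORES):
    """Look up a substitution score, treating the matrix as symmetric."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]

def raw_score(seq_a, seq_b, matrix=PAIR_SCORES):
    """Sum the substitution scores of an ungapped alignment, one column at a time."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(seq_a, seq_b))

print(raw_score("TPEA", "APGA"))  # 1 + 6 + 0 + 2 = 9
```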
34
Insertions and Deletions
Gap Penalties
Opening a gap is more expensive than extending it
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
-Gap Opening Penalty
-Gap Extension Penalty
35
How Can We Compare Sequences?
Limits of the substitution Matrices
-They ignore non-local interactions and Assume that identical residues are equal
-They assume evolution rate to be constant
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPLSAYMLWLN
Mutations + Selection
36
How Can We Compare Sequences?
Limits of the substitution Matrices
Substitution Matrices Cannot Work !!!
37
How Can We Compare Sequences?
Limits of the substitution Matrices
I know… But at least, could I get some idea of when they are likely to do all right?
38
How Can We Compare Sequences?
The Twilight Zone
[Figure: %Sequence Identity plotted against Length, with marks at 30% identity and Length 100. Regions: Similar Sequence, Similar Structure (Same 3D Fold); Twilight Zone; Different Sequence, Structure ????]
39
How Can We Compare Sequences?
The Twilight Zone
Substitution Matrices Work Reasonably Well on Sequences that have more than 30% identity over more than 100 residues
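The rule of thumb above (more than 30% identity over more than 100 residues) is easy to check on an existing alignment. A minimal sketch follows; the function names are hypothetical, and computing identity over aligned non-gap columns only is one convention among several, assumed here.

```python
def aligned_columns(aln_a, aln_b):
    """Return the columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    return [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]

def percent_identity(aln_a, aln_b):
    """Percent of aligned (non-gap) columns holding identical residues."""
    columns = aligned_columns(aln_a, aln_b)
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)

def above_twilight_zone(aln_a, aln_b, min_identity=30.0, min_length=100):
    """True when the alignment has >30% identity over >100 aligned residues."""
    long_enough = len(aligned_columns(aln_a, aln_b)) > min_length
    return long_enough and percent_identity(aln_a, aln_b) > min_identity

# A short alignment fails the length criterion even at 100% identity.
print(percent_identity("FAST", "FA-T"), above_twilight_zone("FAST", "FA-T"))
```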
40
44
How Can We Compare Sequences?
Which Matrix Shall I use
-The Initial PAM matrix was computed on 80% similar Proteins
-It has been extrapolated to more distantly related sequences: Pam 250, Pam 350
-Other Matrices Exist: BLOSUM 42, BLOSUM 62
45
How Can We Compare Sequences?
Which Matrix Shall I use
-PAM: Distant Proteins, High Index (PAM 350)
-BLOSUM: Distant Proteins, Low Index (Blosum30)
-GONNET 250 > BLOSUM62 > PAM 250. But This will depend on:
  -The Family.
  -The Program Used and Its Tuning.
Choosing The Right Matrix may be Tricky…
Insertions, Deletions?
46
HOW Can we Align Two Sequences?
-Dot Matrices
-Global Alignments
-Local Alignment
47
48
Dot Matrices QUESTION What are the elements shared by two sequences ?
49
Dot Matrices
>Seq1 THEFATCAT
>Seq2 THELASTCAT
[Figure: dot matrix with THEFATCAT on one axis and T H E F A S T C A T on the other; parameters: Window, Stringency]
50
Dot Matrices
-Sequences
-Window size
-Stringency
51
Dot Matrices
Stringency
-Window=1 Stringency=1
-Window=11 Stringency=7
-Window=25 Stringency=15
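How Window and Stringency interact can be sketched as follows. This is a minimal illustration, not the code behind any particular dot-plot tool: a dot is drawn at (i, j) when the two windows starting at i and j share at least `stringency` identical positions. Starting the window at the cell rather than centring it on the cell is an assumption, and the function names are hypothetical.

```python
def dot_matrix(seq_x, seq_y, window=1, stringency=1):
    """Return the set of (i, j) cells that receive a dot.

    A dot is placed when seq_x[i:i+window] and seq_y[j:j+window]
    contain at least `stringency` identical positions.
    """
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in range(len(seq_y) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_x[i + k] == seq_y[j + k]
            )
            if matches >= stringency:
                dots.add((i, j))
    return dots

def render(seq_x, seq_y, dots):
    """Draw the matrix as text: seq_x across the top, seq_y down the side."""
    lines = ["  " + seq_x]
    for j, residue in enumerate(seq_y):
        row = "".join("*" if (i, j) in dots else "." for i in range(len(seq_x)))
        lines.append(residue + " " + row)
    return "\n".join(lines)

seq1, seq2 = "THEFATCAT", "THELASTCAT"
noisy = dot_matrix(seq1, seq2, window=1, stringency=1)
clean = dot_matrix(seq1, seq2, window=3, stringency=3)
print(render(seq1, seq2, noisy))
print(render(seq1, seq2, clean))
```

With Window=1 every shared letter produces a dot, so the plot is noisy; a wider window with a high stringency keeps only the diagonal segments where several consecutive residues agree.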
52
Dot Matrices
[Figure: dot matrices with x and y axes]
53
Dot Matrices
http://myhits.isb-sib.ch/cgi-bin/dotlet
54
Dot Matrices
55
Dot Matrices
56
Dot Matrices
57
Dot Matrices
Limits
-Visual aid
-Best Way to EXPLORE the Sequence Organisation
-Does NOT provide us with an ALIGNMENT
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
58
Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
[Figure: Affine Gap Penalty, Cost plotted against gap length L, set by GOP and GEP]
Parsimony: Evolution takes the simplest path (So We Think…)
59
Insertions and Deletions
Gap Penalties
Opening a gap is more expensive than extending it
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
-Gap Opening Penalty
-Gap Extension Penalty
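The affine gap penalty can be sketched as a cost that pays the opening penalty once and the extension penalty per gapped residue. The values GOP=10 and GEP=1 are placeholders chosen for illustration, and charging GEP for every residue including the first is one of two common conventions (the other charges it from the second residue on); both are assumptions.

```python
def affine_gap_cost(length, gop=10.0, gep=1.0):
    """Cost of a single gap of `length` residues: GOP once, GEP per residue."""
    if length <= 0:
        return 0.0
    return gop + gep * length

def total_gap_cost(aligned_seq, gop=10.0, gep=1.0):
    """Sum the affine cost of every run of '-' in one aligned sequence."""
    total, run = 0.0, 0
    # The trailing sentinel closes a gap that ends the sequence.
    for ch in aligned_seq + "X":
        if ch == "-":
            run += 1
        else:
            total += affine_gap_cost(run, gop, gep)
            run = 0
    return total

# One gap of four residues (Seq A on the slide) ...
print(total_gap_cost("GARFIELDTHE----CAT"))
# ... is much cheaper than four separate gaps of one residue each.
print(4 * affine_gap_cost(1))
```

This is why affine penalties favour a few long gaps over many scattered ones, in line with the parsimony argument.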
60
Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
>Seq1 THEFATCAT
>Seq2 THEFASTCAT
-> DYNAMIC PROGRAMMING ->
THEFA-TCAT
THEFASTCAT
61
Global Alignments
Brute Force Enumeration of F A T versus F A S T
-----FAT / FAST---
----FAT- / FAST---
---F-AT- / FAST---
-…
Number of alignments: (L1+L2)! / ((L1)! * (L2)!)
DYNAMIC PROGRAMMING
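To see why brute force is hopeless, the count (L1+L2)! / (L1! * L2!) from this slide can be evaluated directly. A small sketch (the function name is hypothetical; `math.comb` needs Python 3.8 or later):

```python
import math

def count_alignments(l1, l2):
    """Evaluate (L1+L2)! / (L1! * L2!), the binomial coefficient C(L1+L2, L1)."""
    return math.comb(l1 + l2, l1)

# FAT (3 residues) versus FAST (4 residues): 7! / (3! * 4!) = 35
print(count_alignments(3, 4))

# Two 100-residue proteins already give an astronomically large number.
print(f"{count_alignments(100, 100):.2e}")
```

Dynamic programming avoids this explosion by filling a table of only (L1+1) x (L2+1) cells.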
62
Global Alignments
DYNAMIC PROGRAMMING
Dynamic Programming (Needleman and Wunsch)
Match=1 MisMatch=-1 Gap=-1
[Figure: dynamic programming matrix for FAT versus FAST, filled step by step and then traced back]
Resulting alignment:
FAST
FA-T
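The FAT versus FAST example can be reproduced with a short Needleman and Wunsch sketch using the slide's scoring (Match=1, MisMatch=-1, Gap=-1). It uses a single linear gap penalty rather than the GOP/GEP scheme, and the order in which ties are broken during traceback is an assumption; treat it as an illustration rather than a reference implementation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner (diagonal moves tried first).
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

best, top, bottom = needleman_wunsch("FAST", "FAT")
print(best)
print(top)
print(bottom)
```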
63
Global Alignments
DYNAMIC PROGRAMMING
Global Alignments are very sensitive to gap Penalties (GOP, GEP)
64
Global Alignments
DYNAMIC PROGRAMMING
-Global Alignments are very sensitive to gap Penalties
-Global Alignments do not take into account the MODULAR nature of Proteins
Modules: C: K vitamin dep. Ca Binding; K: Kringle Domain; G: Growth Factor module; F: Finger Module
65
Local Alignments
GLOBAL Alignment versus LOCAL Alignment
Smith And Waterman (SW) = LOCAL Alignment
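The difference from the global algorithm is small: cell scores are never allowed to drop below zero, and the traceback starts from the best cell instead of the corner. A minimal Smith and Waterman sketch, assuming the same simple scoring as the global example (Match=1, MisMatch=-1, Gap=-1) and made-up test sequences:

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment: return (best score, aligned part of a, aligned part of b)."""
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The extra 0 lets an alignment restart anywhere.
            h[i][j] = max(
                0,
                h[i - 1][j - 1] + sub(i, j),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score falls to zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical sequences: only the shared module FAT should be reported.
print(smith_waterman("XXXFATXXX", "YYFATYY"))
```

Because unrelated flanking regions are simply left out, local alignment copes with the modular nature of proteins that global alignment ignores.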
66
Local Alignments
We now have a PairWise Comparison Algorithm, We are ready to search Databases
67
Database Search
[Figure: a QUERY (Q) is passed to the Comparison Engine (SW) and compared with every Database entry; each hit gets a score and an E-value, with example E-values ranging from 1.10e-100 to 10]
E-values: How many times do we expect such an Alignment by chance?
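The search loop in the figure can be sketched as: score the query against every database entry with a local alignment, then sort the hits. The database below is made up, the entry names are hypothetical, and the scoring (Match=1, MisMatch=-1, Gap=-1) is the simple scheme used for illustration. The sketch ranks by raw score only; turning scores into E-values needs a statistical model that is not covered here.

```python
def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best Smith and Waterman score between a and b (score only, two rows of memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def search(query, database):
    """Compare the query with every entry and return (score, name), best first."""
    hits = [(local_score(query, seq), name) for name, seq in database.items()]
    hits.sort(key=lambda hit: -hit[0])
    return hits

# Hypothetical toy database, for illustration only.
toy_database = {
    "entry_none": "XXXXXXXX",
    "entry_last": "THELASTCAT",
    "entry_fat": "THEFATCAT",
}

for score, name in search("FATCAT", toy_database):
    print(name, score)
```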
68
69
CONCLUSION
70
Sequence Comparison
-There is a relation between Sequence and Structure.
-The Easiest way to Compare Two Sequences is a dotplot.
-Thanks to evolution, We CAN compare Sequences
-Substitution matrices only work well with similar Sequences (More than 30% id).
71
A few Addresses
72