1
Comparing Two Protein Sequences
Cédric Notredame
2
Our Scope
Look once Under the Hood
-Pairwise Alignment methods are POWERFUL
-Pairwise Alignment methods are LIMITED
-If You Understand the LIMITS they Become VERY POWERFUL
3
Outline
-WHY Does It Make Sense To Compare Sequences ?
-HOW Can we Compare Two Sequences ?
-HOW Can we Align Two Sequences ?
-HOW can I Search a Database ?
4
Why Does It Make Sense To Compare Sequences ?
Sequence Evolution
5
Why Do We Want To Compare Sequences
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
EXTRAPOLATE ?????? Homology? SwissProt
6
Why Do We Want To Compare Sequences
7
Why Does It Make Sense To Align Sequences ?
-Evolution is our Real Tool.
-Nature is LAZY and Keeps re-using Stuff.
-Evolution is mostly DIVERGENT
Same Sequence Same Ancestor
8
Why Does It Make Sense To Align Sequences ?
Same Sequence
-Same Function
-Same Origin
-Same 3D Fold
Many Counter-examples!
9
Comparing Is Reconstructing Evolution
10
An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutations + Selection
Mutations and deletions are the engines of evolution, but selection does the steering… As shown here, it is often impossible to tell apart insertions and deletions, hence their generic name: indels.
Next: Homology
11
An Alignment is a STORY
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
Mutations + Selection
Deletion, Insertion, Mutation:
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
12
Evolution is NOT Always Divergent…
Chen et al, 97, PNAS, 94, AFGP with (ThrAlaAla)n Similar To Trypsinogen N AFGP with (ThrAlaAla)n S NOT Similar to Trypsinogen
13
Evolution is NOT Always Divergent
AFGP with (ThrAlaAla)n Similar To Trypsinogen NOT Similar to Trypsinogen N S SIMILAR Sequences BUT DIFFERENT origin
14
Evolution is NOT always Divergent…
But in MOST cases, you may assume it is…
Similar Function DOES NOT REQUIRE Similar Sequence
Same Sequence: Function, 3D Fold, Origin
Similar Sequence: Historical Legacy
15
How Do Sequences Evolve
Each Portion of a Genome has its own Agenda.
16
How Do Sequences Evolve ?
CONSTRAINED Genome Positions Evolve SLOWLY
EVERY Protein Family Has its Own Level Of Constraint
Family (columns KS, KA): Histone, Insulin, Interleukin I, a-Globin, Apolipoprot. AI, Interferon G
Rates in Substitutions/site/Billion Years as measured on Mouse Vs Human (80 Million years)
Ks: Synonymous Mutations, Ka: Non-Neutral.
17
Different molecular clocks for different proteins--another prediction
The Neutral Theory also makes another prediction about molecular clocks, namely that different types of proteins will have different clock rates. In particular, proteins whose structures are such that a small change in the amino acid sequence can impair the function of that protein should evolve at the slowest rates, whereas proteins whose amino acid sequences can be modified fairly dramatically WITHOUT impairing function should evolve at the fastest rates. Do we, in fact, see evidence of this? Yes. Consider the fibrinopeptide class of protein. These proteins are involved in blood clotting. They can perform this function even when there are numerous amino acid changes. They evolve at a relatively rapid rate, as the slope of the line relating aa substitutions to time shows (slide). On the other hand, cytochrome c, a protein involved in respiration metabolism, cannot tolerate many changes to its aa sequence without losing function. As the slide shows, it evolves (“its clock ticks”) at a much slower rate.
18
How Do Sequences Evolve ? The amino Acids Venn Diagram
To Make Things Worse, Every Residue has its Own Personality
[Venn diagram of the amino acids; groups: Aliphatic, Aromatic, Hydrophobic, Small, Polar]
19
How Do Sequences Evolve ?
In a structure, each Amino Acid plays a Special Role
OmpR, Cter Domain
-In the core, SIZE MATTERS
-On the surface, CHARGE MATTERS
20
How Do Sequences Evolve ?
Accepted Mutations Depend on the Structure
-Big -> Big, Small -> Small, NO DELETION
-Charged -> Charged, Small <-> Big or Small, DELETIONS
21
How Can We Compare Sequences ?
Substitution Matrices
22
How Can We Compare Sequences ?
To Compare Two Sequences, We need:
-Their Structure
-Their Function
We Do Not Have Them !!!
23
How Can We Compare Sequences ?
We will Need To Replace Structural Information With Sequence Information.
Same Sequence
-Same Origin
-Same Function
-Same 3D Fold
It CANNOT Work ALL THE TIME !!!
24
How Can We Compare Sequences ?
To Compare Sequences, We need to Compare Residues
We Need to Know How Much it COSTS to SUBSTITUTE
-an Alanine into an Isoleucine
-a Tryptophan into a Glycine
…
The table that contains the costs for all the possible substitutions is called the SUBSTITUTION MATRIX
How to derive that matrix?
25
How Can We Compare Sequences ?
[Venn diagram of the amino acids; groups: Aliphatic, Aromatic, Hydrophobic, Small, Polar]
Using Knowledge Could Work But we do not know enough about Evolution and Structure.
Using Data works better.
26
How Can We Compare Sequences ? Making a Substitution Matrix
-Take 100 nice pairs of Protein Sequences, easy to align (80% identical).
-Align them…
-Count each mutation in the alignments
  -25 Tryptophans into phenylalanine
  -30 Isoleucine into Leucine
  …
-For each mutation, set the substitution score to the log-odds ratio:
$\text{score} = \log\left(\dfrac{\text{Observed}}{\text{Expected by chance}}\right)$
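The log-odds recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the base-10 logarithm, and the "expected by chance" term taken as the product of two background frequencies are all choices made here, not details given on the slide. Apart from the count of 25 Tryptophan to Phenylalanine changes, every number in the example is invented.

```python
import math

def log_odds_score(observed_pairs, total_pairs, freq_a, freq_b):
    """Log-odds score for one substitution (hypothetical helper).

    observed_pairs: how often residue a is found aligned with residue b
    total_pairs:    total number of aligned residue pairs that were counted
    freq_a, freq_b: background frequencies of the two residues
    """
    observed = observed_pairs / total_pairs   # observed pair frequency
    expected = freq_a * freq_b                # chance pairing (an assumption)
    return math.log10(observed / expected)    # base 10 is an assumption

# 25 W -> F changes (from the slide) out of an invented 10,000 aligned pairs,
# with invented background frequencies of 1% for W and 4% for F.
score_wf = log_odds_score(25, 10_000, 0.01, 0.04)
print(round(score_wf, 3))
```

A positive score means the pair is seen more often than chance predicts, a negative score means it is seen less often, and a score near zero means the substitution carries no information.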
27
You’re kidding! … I was struck by a lightning twice too!!
Gary Larson, The Far Side
28
How Can We Compare Sequences ? Making a Substitution Matrix
-The Diagonal Indicates How Conserved a residue tends to be.
-W is VERY Conserved
-Cysteines that make disulfide bridges and those that do not get averaged
-Some Residues are Easier To mutate into other similar ones
29
How Can We Compare Sequences ? Making a Substitution Matrix
31
How Can We Compare Sequences ? Using Substitution Matrix
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
Mutation, Insertion, Deletion
Given two Sequences and a substitution Matrix, We must Compute the CHEAPEST Alignment
32
Scoring an Alignment
Most popular Substitution Matrices
-PAM250
-Blosum62 (Most widely used)
Raw Score
TPEA
¦| |
APGA
Score = 1 + 6 + 0 + 2 = 9
Question: Is it possible to get such a good alignment by chance only?
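The raw score of the TPEA/APGA example is simply the sum of the substitution scores of the aligned pairs. The sketch below reproduces that sum. The tiny `TOY_MATRIX` holds only the four pair scores implied by the slide's total of 9 (the third term is taken to be 0), so it is an assumption made for illustration and not a full PAM250 or Blosum62 table. The function names are hypothetical as well.

```python
# Only the four pairs needed for the slide's example; values are the ones
# implied by the sum on the slide (an assumption, not a complete matrix).
TOY_MATRIX = {
    ("T", "A"): 1,
    ("P", "P"): 6,
    ("E", "G"): 0,
    ("A", "A"): 2,
}

def pair_score(x, y, matrix):
    """Look up a substitution score, treating the matrix as symmetric."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]  # raises KeyError if the pair is not in the table

def raw_score(row_a, row_b, matrix):
    """Sum the substitution scores over the columns of an ungapped alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(row_a, row_b))

print(raw_score("TPEA", "APGA", TOY_MATRIX))  # 1 + 6 + 0 + 2 = 9
```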
33
Insertions and Deletions
Gap Penalties
Opening a gap is more expensive than extending it
-Gap Opening Penalty
-Gap Extension Penalty
                 gap
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
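A gap penalty with separate opening and extension terms can be sketched as below. The formula GOP + L * GEP is one common convention and is an assumption here (some programs charge GOP + (L - 1) * GEP instead). The penalty values 10 and 1 are invented for the example.

```python
def gap_cost(length, gop, gep):
    """Cost of one gap of `length` residues (hypothetical helper).

    Assumes cost = GOP + length * GEP; other conventions exist.
    """
    if length <= 0:
        return 0
    return gop + length * gep

GOP, GEP = 10, 1  # invented values, with opening more expensive than extending

one_long_gap = gap_cost(4, GOP, GEP)        # the single ---- gap of the slide
two_short_gaps = 2 * gap_cost(2, GOP, GEP)  # same residues split into two gaps
print(one_long_gap, two_short_gaps)
```

With these values a single gap of four residues costs less than two gaps of two residues, which is exactly the behaviour the slide asks for: gaps are grouped rather than scattered.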
34
How Can We Compare Sequences ? Limits of the substitution Matrices
-They ignore non-local interactions and Assume that identical residues are equal
-They assume evolution rate to be constant
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
Mutations + Selection
35
How Can We Compare Sequences ? Limits of the substitution Matrices
Substitution Matrices Cannot Work !!!
36
How Can We Compare Sequences ? Limits of the substitution Matrices
I know… But at least, could I get some idea of when they are likely to do all right?
37
How Can We Compare Sequences ?
The Twilight Zone
[Plot: %Sequence Identity against Length (mark at 100). Above 30%: Similar Sequence, Similar Structure, Same 3D Fold. Below 30%, the Twilight Zone: Different Sequence, Structure ????]
38
How Can We Compare Sequences ?
The Twilight Zone
Substitution Matrices Work Reasonably Well on Sequences that have more than 30 % identity over more than 100 residues
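The rule of thumb above can be checked on an existing alignment with a short helper. This is a sketch under assumptions: percent identity is computed over the columns where neither sequence has a gap (other definitions use a different denominator), and both function names are made up for the example. The thresholds of 30% and 100 residues come from the slide.

```python
def aligned_columns(row_a, row_b):
    """Return the columns where both aligned rows carry a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]

def percent_identity(row_a, row_b):
    """Identical columns as a percentage of the gap-free columns."""
    columns = aligned_columns(row_a, row_b)
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

def outside_twilight_zone(row_a, row_b, min_identity=30.0, min_length=100):
    """True when identity and aligned length both exceed the slide's limits."""
    enough_identity = percent_identity(row_a, row_b) > min_identity
    long_enough = len(aligned_columns(row_a, row_b)) > min_length
    return enough_identity and long_enough

# A perfectly identical but very short alignment still fails the length test.
print(percent_identity("GARFIELDTHE----CAT", "GARFIELDTHELASTCAT"))
print(outside_twilight_zone("GARFIELDTHE----CAT", "GARFIELDTHELASTCAT"))
```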
43
How Can We Compare Sequences ? Which Matrix Shall I use
The Initial PAM matrix was computed on 80% similar Proteins
It has been extrapolated to more distantly related sequences.
-Pam 250
-Pam 350
Other Matrices Exist:
-BLOSUM 42
-BLOSUM 62
44
How Can We Compare Sequences ? Which Matrix Shall I use
PAM: Distant Proteins, High Index (PAM 350)
BLOSUM: Distant Proteins, Low Index (Blosum30)
GONNET 250 > BLOSUM62 > PAM 250. But This will depend on:
-The Family.
-The Program Used and Its Tuning.
Choosing The Right Matrix may be Tricky…
Insertions, Deletions?
45
HOW Can we Align Two Sequences ?
-Dot Matrices
-Global Alignments
-Local Alignment
47
Dot Matrices
QUESTION: What are the elements shared by two sequences ?
48
Dot Matrices
>Seq1
THEFATCAT
>Seq2
THELASTCAT
Window
Stringency
49
Dot Matrices Sequences Window size Stringency
50
Dot Matrices Stringency
-Window=1 Stringency=1
-Window=11 Stringency=7
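The window and stringency parameters can be made concrete with a small dot-matrix sketch. The interpretation used here is an assumption: a dot is placed at (i, j) when at least `stringency` of the `window` residue pairs starting at those positions are identical. The function name and the set-of-coordinates return value are choices made for the example.

```python
def dot_matrix(seq1, seq2, window=1, stringency=1):
    """Return the set of (i, j) start positions that receive a dot.

    A dot is drawn when at least `stringency` of the `window` residues
    starting at seq1[i] and seq2[j] are identical (assumed definition).
    """
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(
                1 for k in range(window) if seq1[i + k] == seq2[j + k]
            )
            if matches >= stringency:
                dots.add((i, j))
    return dots

seq1, seq2 = "THEFATCAT", "THELASTCAT"

noisy = dot_matrix(seq1, seq2, window=1, stringency=1)   # every identity
strict = dot_matrix(seq1, seq2, window=3, stringency=3)  # exact 3-mers only
print(len(noisy), sorted(strict))
```

A window of 1 marks every single identical residue, so the plot is noisy. Raising the window and the stringency keeps only the diagonal runs, here the shared THE, TCA and CAT words.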
51
Dot Matrices
[Dot plot panels with axes x and y]
52
Dot Matrices
53
Dot Matrices
54
Dot Matrices
55
Dot Matrices
56
Dot Matrices Limits
-Visual aid
-Best Way to EXPLORE the Sequence Organisation
-Does NOT provide us with an ALIGNMENT
wheat --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
        | | |||||||| || | |||    ||| | ||||| ||||
????? KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
57
Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
[Plot: Affine Gap Penalty, Cost against gap length L, with GOP and GEP marked]
Parsimony: Evolution takes the simplest path (So We Think…)
58
Insertions and Deletions
Gap Penalties
Opening a gap is more expensive than extending it
-Gap Opening Penalty
-Gap Extension Penalty
                 gap
Seq A GARFIELDTHE----CAT
      |||||||||||    |||
Seq B GARFIELDTHELASTCAT
59
Global Alignments
-Take 2 Nice Protein Sequences
-A good Substitution Matrix (blosum)
-A Gap opening Penalty (GOP)
-A Gap extension Penalty (GEP)
-DYNAMIC PROGRAMMING
>Seq1
THEFATCAT
>Seq2
THEFASTCAT
DYNAMIC PROGRAMMING
THEFA-TCAT
THEFASTCAT
60
Global Alignments
Brute Force Enumeration
F A S T
F A T
$\left(\dfrac{(L1+L2)!}{(L1)!\,(L2)!}\right)^{2}$
----FAT
FAST---

---FAT-
FAST---

--F-AT-
FAST---
DYNAMIC PROGRAMMING
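To get a feel for why brute force is hopeless, the count can be evaluated directly. The sketch below reads the slide's formula as the square of (L1+L2)! / ((L1)! * (L2)!); that reading of a garbled layout is an assumption, as are the function names. The `placements` helper enumerates the ways of laying one sequence over L1+L2 columns, which is what the ----FAT, ---FAT- and --F-AT- rows of the slide illustrate.

```python
import math
from itertools import combinations

def n_placements(l1, l2):
    """Ways to place l1 residues in l1 + l2 columns: (l1+l2)! / (l1! * l2!)."""
    return math.comb(l1 + l2, l1)

def brute_force_count(l1, l2):
    """Assumed reading of the slide: both rows placed freely, so the square."""
    return n_placements(l1, l2) ** 2

def placements(seq, n_cols):
    """Yield every gapped row that keeps `seq` in order over n_cols columns."""
    for cols in combinations(range(n_cols), len(seq)):
        row = ["-"] * n_cols
        for residue, col in zip(seq, cols):
            row[col] = residue
        yield "".join(row)

rows = list(placements("FAT", 7))
print(len(rows), n_placements(3, 4), brute_force_count(3, 4))
print(n_placements(100, 100))  # already astronomically large
```

Even for two 100-residue proteins the number of placements has dozens of digits, which is why dynamic programming is used instead of enumeration.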
61
Global Alignments
Dynamic Programming (Needleman and Wunsch)
Match=1 MisMatch=-1 Gap=-1
       F   A   S   T
   0  -1  -2  -3  -4
F -1   1   0  -1  -2
A -2   0   2   1   0
T -3  -1   1   1   2
F A S T
F A - T
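The FAST/FAT example can be reproduced with a minimal global alignment sketch in the style of Needleman and Wunsch. It uses the slide's parameters (Match=1, MisMatch=-1, Gap=-1) with a single linear gap cost, so it does not model separate opening and extension penalties. The function name and the order in which ties are broken during traceback are choices made for this illustration.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap cost; returns (score, row1, row2)."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill: each cell is the best of a substitution, or a gap in either row.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Traceback from the bottom-right corner to the origin.
    row1, row2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            row1.append(seq1[i - 1]); row2.append(seq2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row1.append(seq1[i - 1]); row2.append("-"); i -= 1
        else:
            row1.append("-"); row2.append(seq2[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row1)), "".join(reversed(row2))

print(needleman_wunsch("FAST", "FAT"))
```

The table is filled in time proportional to L1 * L2, in contrast to the exponential growth of the brute force enumeration.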
62
Global Alignments
DYNAMIC PROGRAMMING
-Global Alignments are very sensitive to gap Penalties (GOP, GEP)
63
Global Alignments
DYNAMIC PROGRAMMING
-Global Alignments are very sensitive to gap Penalties
-Global Alignments do not take into account the MODULAR nature of Proteins
C: K vitamin dep. Ca Binding
K: Kringle Domain
G: Growth Factor module
F: Finger Module
64
Local Alignments
-LOCAL Alignment
-GLOBAL Alignment
Smith And Waterman (SW)=LOCAL Alignment
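A local alignment in the style of Smith and Waterman differs from the global version in two ways: cell scores are never allowed to drop below zero, and the traceback starts at the best cell and stops at the first zero. The sketch below assumes the same toy scoring as the global example of the slides (match 1, mismatch -1, linear gap -1); the function name and the tie-breaking order are choices made for this illustration.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Local alignment with a linear gap cost; returns (score, row1, row2)."""
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Traceback from the best cell until a zero is reached.
    row1, row2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            row1.append(seq1[i - 1]); row2.append(seq2[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            row1.append(seq1[i - 1]); row2.append("-"); i -= 1
        else:
            row1.append("-"); row2.append(seq2[j - 1]); j -= 1
    return best, "".join(reversed(row1)), "".join(reversed(row2))

# Only the shared CAT module is reported; the unrelated flanks are ignored.
print(smith_waterman("XXXCATYYY", "ZZCATWW"))
```

This is what makes local alignment suitable for modular proteins: a shared domain is found even when the rest of the two sequences is unrelated.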
65
Local Alignments
We now have a PairWise Comparison Algorithm, We are ready to search Databases
66
Database Search
Q: QUERY, Comparison Engine (SW), Database
E-values: How many times do we expect such an Alignment by chance?
Example values from the figure: 1.10e-20, 10, 1.10e-100, 1.10e-2, 1.10e-1, 3, 1, 6, 20, 15, 13
68
CONCLUSION
69
Sequence Comparison
-Thanks to evolution, We CAN compare Sequences
-There is a relation between Sequence and Structure.
-Substitution matrices only work well with similar Sequences (More than 30% id).
-The Easiest way to Compare Two Sequences is a dotplot.
70
A few Addresses