Cédric Notredame (22/02/2016) Comparing Two Protein Sequences Cédric Notredame.

Our Scope: Pairwise alignment methods are POWERFUL. Pairwise alignment methods are LIMITED. If you understand the LIMITS, they become VERY POWERFUL. So let us look once under the hood.

Outline
- WHY does it make sense to compare sequences?
- HOW can we align two sequences?
- HOW can I search a database?
- HOW can we compare two sequences?

Why Does It Make Sense To Compare Sequences? Sequence Evolution

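Why Do We Want To Compare Sequences?
wheat  --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
         | | |||||||| || | |||    ||| | ||||| ||||
?????  KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA
If an unknown sequence (?????) aligns well with a characterised SwissProt entry (here, a wheat protein), can we EXTRAPOLATE what is known about the wheat protein to the unknown one? Only if the two are homologous.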


Why Does It Make Sense To Align Sequences?
- Evolution is our real tool.
- Nature is LAZY and keeps re-using stuff.
- Evolution is mostly DIVERGENT: same sequence → same ancestor.

Why Does It Make Sense To Align Sequences? Same sequence → same function, same 3D fold, same origin… but there are many counter-examples!

Comparing Is Reconstructing Evolution

An Alignment is a STORY: mutations + selection turn one sequence into related ones.
ADKPKRPLSAYMLWLN
ADKPKRPKPRLSAYMLWLN
ADKPRRPLS-YMLWLN
ADKPKRPLSAYMLWLN

An Alignment is a STORY: aligning two of these sequences
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
exposes the events that separate them: a Mutation (K/R), an Insertion (KPR) and a Deletion (A), produced by mutations + selection.

Evolution is NOT Always Divergent… Antifreeze glycoproteins (AFGP) made of (ThrAlaAla)n repeats are found in two groups of fish, labelled N and S. In one group the AFGP gene is similar to trypsinogen; in the other it is NOT similar to trypsinogen (Chen et al., 1997, PNAS 94). SIMILAR sequences, BUT DIFFERENT origins.

Evolution is NOT always divergent… but in MOST cases, you may assume it is. Same sequence → same function, same 3D fold, same origin. The reverse does not hold: similar function DOES NOT REQUIRE similar sequence. Sequence similarity is a historical legacy.

How Do Sequences Evolve? Each portion of a genome has its own agenda.

How Do Sequences Evolve? CONSTRAINED genome positions evolve SLOWLY, and EVERY protein family has its own level of constraint.
[Table: Ks and Ka for Histone 3, Insulin, Interleukin I, Globin, Apolipoprotein AI and Interferon G; e.g. Histone 3: Ks = 6.4, Ka = 0. Rates are in substitutions/site/billion years, measured on mouse vs. human (about 80 million years of divergence).]
Ks: synonymous (silent) substitutions; Ka: non-synonymous (amino-acid-changing) substitutions.

Different molecular clocks for different proteins: another prediction

How Do Sequences Evolve? To make things worse, every residue has its own personality.
[Figure: the amino-acid Venn diagram, with overlapping sets Hydrophobic, Aliphatic, Aromatic, Polar and Small.]

How Do Sequences Evolve? In a structure, each amino acid plays a special role (example: the C-terminal domain of OmpR). In the core, SIZE MATTERS; on the surface, CHARGE MATTERS.

How Do Sequences Evolve? Accepted mutations depend on the structure. In the core: big → big, small → small, and NO deletions. On the surface: charged → charged, small → big or small, and deletions are tolerated.

How Can We Compare Sequences? Substitution Matrices

How Can We Compare Sequences? To compare two sequences, we would need their function and their structure. We do not have them!!!

How Can We Compare Sequences? We will need to replace structural information with sequence information, assuming that same sequence → same function, same 3D fold, same origin. It CANNOT work ALL THE TIME!!!

How Can We Compare Sequences? To compare sequences, we need to compare residues. We need to know how much it COSTS to SUBSTITUTE an alanine into an isoleucine, a tryptophan into a glycine, … The table that contains the costs of all possible substitutions is called the SUBSTITUTION MATRIX. How do we derive that matrix?

How Can We Compare Sequences? [Figure: the amino-acid Venn diagram.] Using knowledge of residue properties could work, but we do not know enough about evolution and structure. Using data works better.

How Can We Compare Sequences? Making a Substitution Matrix
- Take 100 nice pairs of protein sequences that are easy to align (80% identical).
- Align them…
- Count each mutation in the alignments, for example:
  - 25 tryptophans into phenylalanine
  - 30 isoleucines into leucine
  - …
- For each mutation, set the substitution score to the log-odds ratio: score = log(Observed / Expected by chance).
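
The counting recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the actual procedure used to build PAM or BLOSUM: it assumes the input pairs are already aligned, skips gapped columns, pools A→B and B→A into one symmetric count, and multiplies the log2 odds by a scale of 2 (half-bit units, a choice borrowed from BLOSUM). The function name `log_odds_matrix` is our own.

```python
import math
from collections import Counter


def log_odds_matrix(pairs, scale=2.0):
    """Build a symmetric log-odds substitution matrix from aligned pairs.

    pairs: list of (seq1, seq2) strings of equal length; '-' marks a gap.
    scale: factor applied to log2(observed / expected) before rounding.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for s1, s2 in pairs:
        if len(s1) != len(s2):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(s1, s2):
            if a == "-" or b == "-":
                continue  # gapped columns carry no substitution information
            # Store unordered pairs so that A->B and B->A are counted together
            pair_counts[tuple(sorted((a, b)))] += 1
            residue_counts[a] += 1
            residue_counts[b] += 1

    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    background = {r: c / total_residues for r, c in residue_counts.items()}

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        # Expected by chance: p_a * p_b for identities, 2 * p_a * p_b otherwise
        if a == b:
            expected = background[a] * background[b]
        else:
            expected = 2 * background[a] * background[b]
        score = round(scale * math.log2(observed / expected))
        scores[(a, b)] = scores[(b, a)] = score
    return scores


matrix = log_odds_matrix([("WAAA", "WAAA"), ("AAAA", "AAAA")])
print(matrix[("W", "W")], matrix[("A", "A")])  # 6 0
```

A rare residue that is conserved (W here) gets a high score, while a frequent one (A) scores close to zero because its identities are largely expected by chance. Pairs never observed get no entry at all; real matrices need much more data, or pseudocounts, to fill every cell.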

“You’re kidding! … I was struck by a lightning twice too!!” (Gary Larson, The Far Side)


How Can We Compare Sequences? Making a Substitution Matrix. The diagonal indicates how conserved a residue tends to be: W is VERY conserved. Some residues mutate more easily into similar ones. Cysteines that make disulfide bridges and those that do not get averaged together.

How Can We Compare Sequences? Using a Substitution Matrix
ADKPRRP---LS-YMLWLN
ADKPKRPKPRLSAYMLWLN
(a mutation, an insertion and a deletion)
Given two sequences and a substitution matrix, we must compute the CHEAPEST alignment (equivalently, the best-scoring one when the matrix holds scores rather than costs).

Most Popular Substitution Matrices: PAM250 and BLOSUM62 (the most widely used).
Scoring an alignment: the raw score is the sum of the matrix values over all aligned pairs.
TPEA
¦| |
APGA
Raw score = 9 (with BLOSUM62: T/A 0 + P/P 7 + E/G -2 + A/A 4 = 9).
Question: is it possible to get such a good alignment by chance alone?
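
Scoring an ungapped alignment is just a sum of table look-ups. The sketch below hard-codes only the BLOSUM62 entries for the five residues in the example (A, T, P, E, G); a real program would load the full 20×20 matrix. The names `BLOSUM62_EXCERPT`, `pair_score` and `raw_score` are illustrative.

```python
# Excerpt of BLOSUM62 restricted to A, T, P, E, G (upper triangle only).
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "T"): 0, ("A", "P"): -1, ("A", "E"): -1, ("A", "G"): 0,
    ("T", "T"): 5, ("T", "P"): -1, ("T", "E"): -1, ("T", "G"): -2,
    ("P", "P"): 7, ("P", "E"): -1, ("P", "G"): -2,
    ("E", "E"): 5, ("E", "G"): -2,
    ("G", "G"): 6,
}


def pair_score(matrix, a, b):
    """Look up a pair in either order, since substitution matrices are symmetric."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


def raw_score(seq1, seq2, matrix):
    """Sum of substitution scores over the columns of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment: sequences must have equal length")
    return sum(pair_score(matrix, a, b) for a, b in zip(seq1, seq2))


print(raw_score("TPEA", "APGA", BLOSUM62_EXCERPT))  # 9
```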

Insertions and Deletions: Gap Penalties
Opening a gap is more expensive than extending it.
Seq A  GARFIELDTHE----CAT
       |||||||||||    |||
Seq B  GARFIELDTHELASTCAT
Each gap pays a gap opening penalty (GOP) once and a gap extension penalty (GEP) for its extension.
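
To see why opening should cost more than extending, compare one 4-residue gap with four separate 1-residue gaps. This sketch assumes the convention "GOP for the first gap position, GEP for each further position"; some programs instead charge GOP + GEP × L, so check the documentation of the tool you use. GOP = 10 and GEP = 1 are illustrative values only.

```python
def affine_gap_cost(length, gop, gep):
    """Cost of a single gap of `length` positions under an affine penalty.

    Assumed convention: the first position pays `gop`, each further one `gep`.
    """
    if length <= 0:
        return 0
    return gop + gep * (length - 1)


# Seq A carries one 4-position gap (----) facing LAST in Seq B.
one_long_gap = affine_gap_cost(4, gop=10, gep=1)
four_short_gaps = 4 * affine_gap_cost(1, gop=10, gep=1)
print(one_long_gap, four_short_gaps)  # 13 40
```

With these penalties, a single block insertion of LAST is far cheaper than scattering four gaps, which matches the intuition that one insertion event is more likely than four.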

How Can We Compare Sequences? Limits of the Substitution Matrices
- They ignore non-local interactions and assume that all occurrences of a given residue are equivalent, wherever they sit in the structure.
- They assume the evolution rate to be constant.
[Figure: the ADKPKRPLSAYMLWLN sequences shaped by mutations + selection.]

How Can We Compare Sequences? Limits of the Substitution Matrices: Substitution Matrices CANNOT Work!!!

How Can We Compare Sequences? Limits of the Substitution Matrices: I know… But could I at least get some idea of when they are likely to do all right?

How Can We Compare Sequences? The Twilight Zone
[Figure: % sequence identity against alignment length. Above about 30% identity: similar sequence, similar structure (same 3D fold, up to 100% identity). Below about 30%: the Twilight Zone, where sequences differ and the structure is uncertain (????).]

How Can We Compare Sequences? The Twilight Zone: substitution matrices work reasonably well on sequences that have more than 30% identity over more than 100 residues.
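
The 30%/100-residue rule of thumb can be checked on any pairwise alignment. Percent identity has several definitions; this sketch assumes "identical columns divided by all alignment columns, gaps included", and counts length as the number of columns where both sequences have a residue. The function names are hypothetical.

```python
def percent_identity(aln1, aln2):
    """Identical columns / total alignment columns, in percent (gaps included)."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for a, b in zip(aln1, aln2) if a == b and a != "-")
    return 100.0 * identical / len(aln1)


def above_twilight_zone(aln1, aln2, min_identity=30.0, min_length=100):
    """Rule of thumb: more than 30% identity over more than 100 aligned residues."""
    aligned = sum(1 for a, b in zip(aln1, aln2) if a != "-" and b != "-")
    return aligned > min_length and percent_identity(aln1, aln2) > min_identity


print(percent_identity("THEFA-TCAT", "THEFASTCAT"))     # 90.0
print(above_twilight_zone("THEFA-TCAT", "THEFASTCAT"))  # False: far too short
```

A high identity on a short stretch is not enough: the rule requires both conditions at once.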

How Can We Compare Sequences? Which Matrix Shall I Use? The initial PAM matrix was computed on 80% similar proteins. It has been extrapolated to more distantly related sequences (PAM250, PAM350). Other matrices exist: BLOSUM42, BLOSUM62.
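
The extrapolation works by matrix multiplication: a PAM-N mutation-probability matrix is PAM1 multiplied by itself N times. The toy sketch below assumes a hypothetical two-letter alphabet in which 1% of positions change per PAM unit, instead of Dayhoff's 20×20 data; `mat_mult`, `mat_power` and `PAM1_TOY` are our own names.

```python
def mat_mult(x, y):
    """Product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, k):
    """Raise a square matrix to an integer power k >= 1 by repeated squaring."""
    if k < 1:
        raise ValueError("k must be >= 1")
    result = None
    base = m
    while k > 0:
        if k & 1:
            result = base if result is None else mat_mult(result, base)
        base = mat_mult(base, base)
        k >>= 1
    return result


# Hypothetical PAM1 for a two-letter alphabet: 1% change per PAM unit.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

for n in (1, 250, 350):
    # Probability that a residue is still unchanged after n PAM units
    print(n, round(mat_power(PAM1_TOY, n)[0][0], 3))
```

As N grows, the diagonal drifts toward the background frequency (0.5 in this toy), which is why high PAM indices are meant for distant proteins: little of the original signal is left.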

How Can We Compare Sequences? Which Matrix Shall I Use?
- PAM: distant proteins → high index (PAM350).
- BLOSUM: distant proteins → low index (BLOSUM30).
- GONNET250 > BLOSUM62 > PAM250.
But this will depend on the family, and on the program used and its tuning. Choosing the right matrix may be tricky… And what about insertions and deletions?

HOW Can We Align Two Sequences? Dot Matrices, Global Alignments, Local Alignments

Dot Matrices. QUESTION: What are the elements shared by two sequences?

Dot Matrices
>Seq1 THEFATCAT
>Seq2 THELASTCAT
[Figure: dot matrix of THEFATCAT against T H E F A S T C A T, controlled by a window and a stringency.]

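Dot Matrices: the parameters are the two sequences, the window size and the stringency. A dot is drawn when, over a window of that many residues, the number of matching positions reaches the stringency.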

Dot Matrices: Stringency. [Figure: the same comparison with Window=1, Stringency=1; Window=11, Stringency=7; Window=25, Stringency=15.]
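
The window/stringency rule can be sketched as follows. Assumptions: residues are scored by identity only (no substitution matrix), each dot is placed at the start of its window rather than its centre, and `dot_matrix` and `show` are illustrative names.

```python
def dot_matrix(seq1, seq2, window=1, stringency=1):
    """Return the (i, j) start positions where the windows seq1[i:i+window]
    and seq2[j:j+window] share at least `stringency` identical positions."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def show(seq1, seq2, dots):
    """Print seq2 along the top and seq1 down the side; '*' marks a dot."""
    print("  " + " ".join(seq2))
    for i, a in enumerate(seq1):
        row = ("*" if (i, j) in dots else "." for j in range(len(seq2)))
        print(a + " " + " ".join(row))


s1, s2 = "THEFATCAT", "THELASTCAT"
show(s1, s2, dot_matrix(s1, s2, window=1, stringency=1))  # noisy: every shared letter
show(s1, s2, dot_matrix(s1, s2, window=3, stringency=3))  # only shared 3-letter words
```

With Window=1 every shared letter lights up; raising the window and stringency removes isolated coincidences and leaves only the diagonals (THE, TCA, CAT) that reflect real shared segments.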

Dot Matrices: Limits
- A visual aid.
- The best way to EXPLORE the sequence organisation.
- Does NOT provide us with an ALIGNMENT such as:
wheat  --DPNKPKRAMTSFVFFMSEFRSEFKQKHSKLKSIVEMVKAAGER
         | | |||||||| || | |||    ||| | ||||| ||||
?????  KKDSNAPKRAMTSFMFFSSDFRS----KHSDL-SIVEMSKAAGAA

Global Alignments
- Take 2 nice protein sequences.
- A good substitution matrix (BLOSUM).
- A gap opening penalty (GOP).
- A gap extension penalty (GEP).
[Figure: affine gap penalty, cost as a function of gap length L: the cost starts at the GOP and grows by the GEP with each further position.]
Parsimony: evolution takes the simplest path (so we think…).


Global Alignments: take 2 nice protein sequences, a good substitution matrix (BLOSUM), a gap opening penalty (GOP) and a gap extension penalty (GEP).
>Seq1 THEFATCAT
>Seq2 THEFASTCAT
DYNAMIC PROGRAMMING gives:
THEFA-TCAT
THEFASTCAT

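Global Alignments: why not try them all? FAT can be aligned with FAST in many ways (----FAT against FAST---, ---FAT- against FAST---, --F-AT- against FAST---, …). Brute-force enumeration must consider (L1+L2)! / (L1! × L2!) alignments, roughly 2^(2L) / √(πL) when both sequences have length L. This is why we need DYNAMIC PROGRAMMING.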
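
Evaluating the slide's formula shows how fast brute force becomes hopeless. This sketch uses `math.comb` (Python 3.8+) for (L1+L2)! / (L1! × L2!) and compares it with the 2^(2L) / √(πL) approximation; the function names are ours.

```python
import math


def n_alignments(l1, l2):
    """The slide's count: (L1 + L2)! / (L1! * L2!)."""
    return math.comb(l1 + l2, l1)


def approx_equal_lengths(l):
    """Approximation 2^(2L) / sqrt(pi * L) for two sequences of length L."""
    return 2 ** (2 * l) / math.sqrt(math.pi * l)


print(n_alignments(3, 4))        # 35 (FAT vs FAST)
print(n_alignments(100, 100))    # two 100-residue proteins: about 9e58
print(approx_equal_lengths(100))
```

Even for two short proteins the count is astronomically large, while dynamic programming only needs L1 × L2 cell updates.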

Global Alignments: DYNAMIC PROGRAMMING (Needleman and Wunsch). Scoring: Match = 1, Mismatch = -1, Gap = -1. [Figure: the dynamic programming matrix for FAT against FAST, filled cell by cell, then traced back.] Result:
FAST
FA-T
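
A minimal Python sketch of the Needleman-Wunsch recursion with the slide's scoring (match = 1, mismatch = -1, gap = -1, i.e. a linear gap cost rather than GOP/GEP). Ties in the traceback are broken in favour of the diagonal, then of a gap in the second sequence; the function name is ours.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # b[j-1] against a gap

    # Traceback from the bottom-right corner to rebuild one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), F[n][m]


print(*needleman_wunsch("FAST", "FAT"), sep="\n")  # FAST / FA-T / 2
```

The same function reproduces the THEFATCAT example: `needleman_wunsch("THEFATCAT", "THEFASTCAT")` returns `("THEFA-TCAT", "THEFASTCAT", 8)`.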

Global Alignments: DYNAMIC PROGRAMMING. Global alignments are very sensitive to the gap penalties (GOP, GEP).
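
The slides do not show how GOP and GEP enter dynamic programming. One standard way, not named in the slides, is Gotoh's three-matrix recursion, sketched here for the score only. Assumptions: a gap of length L costs GOP + GEP × (L - 1), matches score +1 and mismatches -1, and `gotoh_global_score` is a hypothetical name.

```python
NEG = float("-inf")


def gotoh_global_score(a, b, match=1, mismatch=-1, gop=5, gep=1):
    """Best global alignment score with affine gaps (score only, no traceback).

    M[i][j]: alignment of a[:i], b[:j] ending with a[i-1] paired to b[j-1]
    X[i][j]: ... ending with a[i-1] against a gap
    Y[i][j]: ... ending with b[j-1] against a gap
    """
    n, m = len(a), len(b)
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:
                # Opening a new gap costs gop; extending the current one costs gep
                X[i][j] = max(M[i - 1][j] - gop, X[i - 1][j] - gep, Y[i - 1][j] - gop)
            if j > 0:
                Y[i][j] = max(M[i][j - 1] - gop, Y[i][j - 1] - gep, X[i][j - 1] - gop)
    return max(M[n][m], X[n][m], Y[n][m])


for gop, gep in ((1, 1), (5, 1), (10, 1)):
    print(gop, gep, gotoh_global_score("GARFIELDTHECAT", "GARFIELDTHELASTCAT", gop=gop, gep=gep))
```

On this easy pair only the score moves with the penalties; on harder pairs, changing GOP and GEP can also change which alignment is optimal, which is the sensitivity the slide warns about.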

Global Alignments: DYNAMIC PROGRAMMING. Global alignments are very sensitive to gap penalties, and they do not take into account the MODULAR nature of proteins. [Figure: modular proteins built from C: vitamin K-dependent Ca-binding domain; K: kringle domain; G: growth factor module; F: finger module.]

Local Alignments: GLOBAL alignment vs. LOCAL alignment. Smith and Waterman (SW) = LOCAL alignment.
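
The Smith-Waterman change to the global recursion is small: every cell can restart at 0, and the traceback starts from the best cell anywhere in the matrix and stops at the first 0. This sketch reuses the simple +1/-1/-1 scoring for illustration; the function name and the test sequences are ours.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment; returns (score, local_a, local_b)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets a local alignment start fresh at any cell
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until the score drops to 0
    out_a, out_b = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAAGARFIELDBBB", "CCGARFIELDDD"))  # (8, 'GARFIELD', 'GARFIELD')
```

Unlike a global alignment, the unrelated flanks (AAA/BBB and CC/DD) are simply left out instead of being forced into mismatches and gaps.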

Local Alignments: we now have a pairwise comparison algorithm. We are ready to search databases.

Database Search: the QUERY is compared with every database sequence by a comparison engine (SW), and each hit is reported with an E-value. E-value: how many times do we expect such an alignment by chance?
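
For local alignment scores, BLAST-style tools answer "how many by chance?" with the Karlin-Altschul statistic E = K m n e^(-λS), where m is the query length, n the total database length and S the raw score. λ and K depend on the substitution matrix and gap penalties and must be estimated for each scoring system; the numbers in the usage line below are placeholders, not values for any real matrix. Function names are hypothetical.

```python
import math


def karlin_altschul_evalue(score, m, n, lam, k):
    """Expected number of chance local alignments scoring at least `score`."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2, so that E = m n 2^(-S')."""
    return (lam * score - math.log(k)) / math.log(2)


# Placeholder parameters: a 300-residue query against 10^8 database residues.
lam, k = 0.3, 0.1
for s in (40, 80, 120):
    print(s, karlin_altschul_evalue(s, 300, 10**8, lam, k))
```

The E-value falls exponentially with the score and grows linearly with the database size, so the same alignment becomes less significant when searched against a larger database.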

CONCLUSION

Sequence Comparison
- There is a relation between sequence and structure.
- The easiest way to compare two sequences is a dot plot.
- Thanks to evolution, we CAN compare sequences.
- Substitution matrices only work well with similar sequences (more than 30% identity).
