Published by Lucas Sutton. Modified over 9 years ago.
1
MS Identification
Dr. Juan Antonio Vizcaíno, PRIDE Group coordinator
PRIDE team, Proteomics Services Group, PANDA group
European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom
EBI is an Outstation of the European Molecular Biology Laboratory.
2
EBI Bulgaria Roadshow Rotterdam, 12 June 2012 Juan A. Vizcaíno juan@ebi.ac.uk
Overview
- Search engines: peptide identification
- Protein inference
- De novo and spectral searches
- Choosing the right protein sequence DB
You need to learn many things…
3
It should not be a black box…
From: Lilley et al., Proteomics, 2011
4
MS proteomics: shotgun/bottom-up approaches
[Workflow diagram labels: proteins, protocol, peptides, MS analysis, fragmentation, MS/MS analysis, sequence database]
5
PMF IDENTIFICATION
6
Peptide Mass Fingerprinting (PMF): MS analysis
- Each peak in the spectrum represents a peptide (or a mixture of peptides).
- The spectrum gives information about mass and charge only.
- Rarely used at present, except in gel-based approaches (where the molecular weight, MW, of the protein is known).
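The idea behind PMF can be sketched in a few lines: digest a candidate protein in silico, compute the mass of each peptide, and count how many observed peaks those masses explain. This is only an illustration, not how any particular PMF tool works. The function names, the trypsin rule (cleave after K or R, not before P, no missed cleavages) and the 0.2 Da tolerance are assumptions made for the example.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added once per charge


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (needs Python 3.7+)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass, charge):
    """m/z of a peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def count_matching_peaks(observed_mz, protein, charge=1, tolerance=0.2):
    """Number of observed peaks explained by the protein's tryptic peptides."""
    theoretical = [mz(peptide_mass(p), charge) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_mz
    )


# Two of the example peptides from the slides, concatenated into a toy protein.
print(tryptic_digest("LFDQAFGLPRAKPLMELIER"))
```

The K in AKPLMELIER is not cleaved because it is followed by P, so the toy protein yields exactly two peptides.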
7
Peptide Mass Fingerprinting (MS) on the web
- Aldente (Phenyx): http://www.expasy.org/tools/aldente/
- ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
- Bupid: http://zlab.bu.edu/Amemee/
- Mascot: http://www.matrixscience.com/search_form_select.html
- MassSearch: http://www.cbrg.ethz.ch/services/MassSearch
- MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
- PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
- Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe
- XProteo: http://xproteo.com:2698/
8
MS/MS IDENTIFICATION
9
MS/MS
- MS analysis: Peptide Mass Fingerprinting (PMF)
- Fragmentation followed by MS/MS analysis: peptide sequence information (on top of mass and charge)
10
Three types of MS/MS identification
- Protein database based comparison: a theoretical spectrum is derived from each database sequence and compared with the experimental spectrum.
- Sequential comparison (de novo approaches): a de novo sequence is derived from the experimental spectrum and compared with the database sequences.
- Spectral comparison: the experimental spectrum is compared with experimental spectra stored in a spectral library.
Modified from: Eidhammer, Flikka, Martens, Mikalsen, Wiley 2007
11
MS proteomics: peptide IDs and protein IDs
[Diagram labels: MS/MS spectra, proteins]
13
MS proteomics: peptide IDs and protein IDs
[Diagram: MS/MS spectra are matched by a search engine against a sequence database (UniProt, IPI, RefSeq) to give peptides, which point to proteins]
Example peptides: TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL
14
SEARCH ENGINES
15
Search engines: sequence database matching
[Diagram: proteins from a sequence database (UniProt, IPI, RefSeq) are turned into peptides, theoretical spectra are generated for those peptides, and the theoretical spectra are matched against the experimental spectra]
Example peptides: TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL, VDMSLAQR, DIVVQETMEDIDK, …
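As a rough illustration of what a "theoretical spectrum" contains, the sketch below computes singly charged b- and y-ion m/z values for an unmodified peptide. Real search engines also consider other ion types, charge states, neutral losses and modifications; the function name `fragment_ions` is made up for this example.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is the b ion of the first i residues; y_ions[i-1] is the
    complementary y ion made of the remaining residues.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


b, y = fragment_ions("LFDQAFGLPR")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
```

A peptide of n residues gives n-1 ions of each series, so the ten-residue example above yields nine b ions and nine y ions.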
16
Search engines: how good is the correlation between theoretical and experimental spectra?
- Scores are generated by search engines
- Usually the best match is kept
17
Search engines
Taken from Nesvizhskii, J Proteomics, 2010
18
Search engines
Taken from Nesvizhskii, J Proteomics, 2010
19
The most popular algorithms
- MASCOT (Matrix Science): http://www.matrixscience.com
- SEQUEST (Scripps, Thermo Fisher Scientific): http://fields.scripps.edu/sequest
- X!Tandem (The Global Proteome Machine Organization): http://www.thegpm.org/TANDEM
- OMSSA (NCBI): http://pubchem.ncbi.nlm.nih.gov/omssa/
20
Overall concept of scores and cut-offs
[Figure labels: incorrect identifications, correct identifications, threshold score, false positives, false negatives]
Adapted from: www.proteomesoftware.com (Wiki pages)
21
Playing with probabilistic cut-off scores
[Figure labels: false positives, identifications, higher stringency]
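The trade-off shown on the last two slides can be made concrete with a toy example. The scores and correctness labels below are invented purely for illustration (in a real experiment the true labels are unknown), and `confusion_at_threshold` is a hypothetical helper name.

```python
def confusion_at_threshold(scored_ids, threshold):
    """Count outcomes for a list of (score, is_correct) identifications."""
    accepted_correct = sum(1 for s, ok in scored_ids if s >= threshold and ok)
    false_positives = sum(1 for s, ok in scored_ids if s >= threshold and not ok)
    false_negatives = sum(1 for s, ok in scored_ids if s < threshold and ok)
    return {
        "accepted_correct": accepted_correct,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
    }


# Invented data: (score, whether the identification is actually correct).
toy_ids = [(10, False), (20, False), (25, True), (30, False), (40, True), (50, True)]

for cutoff in (22, 35):
    print(cutoff, confusion_at_threshold(toy_ids, cutoff))
```

Raising the cut-off from 22 to 35 removes the one false positive but also rejects one correct identification, which is the effect of higher stringency.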
22
Search engines: SEQUEST
- Very well established search engine
- Can be used for MS/MS (PFF) identifications
- Based on a cross-correlation score (includes experimental peak height)
- Published core algorithm (patented, licensed to Thermo Fisher Scientific)
- Provides a preliminary score (Sp), rank, cross-correlation score (XCorr), and the score difference between the top two ranks (deltaCn)
- Thresholding is up to the user, and is commonly done per charge state
- Many extensions exist to perform a more automatic validation of results
[Formulas for XCorr and deltaCn were shown as images on the slide]
23
Search engines: Sequest
- XCorr is high if the direct comparison is significantly greater than the background.
- deltaCn measures how good the XCorr is relative to the next best match.
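The slide formulas are not preserved in this transcript. For orientation only, the form in which these two scores are commonly written is given below; the symbols (x for the processed experimental spectrum, y for the theoretical spectrum, τ for the offset between them) and the ±75 offset window are assumptions taken from the usual description of SEQUEST, not from the slide.

```latex
% Correlation of the two spectra at offset \tau
R_\tau = \sum_i x_i \, y_{i+\tau}

% XCorr: the direct comparison (\tau = 0) minus the background,
% i.e. the mean correlation over a window of offsets
\mathrm{XCorr} = R_0 - \frac{1}{151} \sum_{\tau=-75}^{75} R_\tau

% deltaCn: relative drop from the best to the second-best candidate
\Delta C_n = \frac{\mathrm{XCorr}_1 - \mathrm{XCorr}_2}{\mathrm{XCorr}_1}
```

Read this way, a large XCorr means the aligned match stands out from what shifted (random) alignments produce, and a large deltaCn means the top candidate is well separated from the runner-up.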
24
Search engines: Mascot
- Very well established search engine
- Can do MS (PMF) and MS/MS (PFF) identifications
- Based on the MOWSE score
- Unpublished core algorithm (trade secret)
- Predicts an a priori threshold score that identifications need to pass
- From version 2.2, Mascot allows integrated decoy searches
- Provides rank, score, threshold and expectation value per identification
- Customizable confidence level for the threshold score
25
Search engines: Mascot
www.matrixscience.com
26
Search engines: X!Tandem
- Open source search engine
- Can be used for MS/MS experiments
- Based on a hyperscore that only takes into account b and y ions
- Published core algorithm, freely available
- Fast and able to handle PTMs in an iterative fashion
- Used as an auxiliary search engine
by-Score = sum of intensities of peaks matching b-type or y-type ions
[Formula for the HyperScore was shown as an image on the slide]
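The HyperScore formula itself did not survive in this transcript. A commonly quoted form multiplies the by-Score by the factorials of the numbers of matched b and y ions; the sketch below assumes that form and is not taken from the slide. The function name and the input format (lists of matched peak intensities) are illustrative, and peak matching is assumed to have been done already.

```python
import math


def hyperscore(matched_b_intensities, matched_y_intensities):
    """by-Score times N_b! times N_y! (assumed form of the hyperscore)."""
    by_score = sum(matched_b_intensities) + sum(matched_y_intensities)
    n_b = len(matched_b_intensities)
    n_y = len(matched_y_intensities)
    return by_score * math.factorial(n_b) * math.factorial(n_y)


# Two matched b ions and one matched y ion, with made-up intensities.
print(hyperscore([1.0, 2.0], [3.0]))
```

The factorial terms reward candidates that match many ions of both series, so the score grows much faster than the summed intensity alone.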
27
Search engines: OMSSA
- Open source search engine
- Can be used for MS/MS experiments
- Relies on a Poisson distribution
- Published core algorithm, freely available
- Provides an expectancy score, similar to the BLAST E-value
- Very good performance in comparison with the others
- Used as an auxiliary search engine
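To give a feel for how a Poisson model can lead to an E-value-like score, here is a deliberately simplified sketch. It is not OMSSA's actual implementation: the assumption is that the number of fragment matches expected by chance follows a Poisson distribution with a known mean, and that the E-value is the resulting tail probability multiplied by the number of candidate peptides tested. All names and numbers are illustrative.

```python
import math


def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


def expectation_value(k_matched, mean_random_matches, n_candidates):
    """Expected number of candidates reaching k_matched matches by chance."""
    return poisson_tail(k_matched, mean_random_matches) * n_candidates


# Made-up numbers: 8 matched ions, 1.5 expected by chance, 1000 candidates.
print(expectation_value(8, 1.5, 1000))
```

As with a BLAST E-value, lower is better: the value estimates how many equally good matches would be expected from random candidates alone.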
28
MS proteomics: peptide IDs and protein IDs
[Diagram: MS/MS spectra are matched by a search engine against a sequence database (UniProt, IPI, RefSeq) to give peptides, which point to proteins]
Example peptides: TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL
So far, we have actually identified peptides, not proteins.
29
MS proteomics: peptide IDs and protein IDs
Peptides: TDMDNQIVVSDYAQ MDRTW, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL
Proteins: IPI00302927, IPI00025512, IPI00002478, IPI00185600, IPI00014537, IPI00298497, IPI00329236, IPI00002232
Protein inference is complex!
30
PROTEIN INFERENCE
31
Intermezzo: Protein inference
The minimal and maximal explanatory sets
[Table, shown twice: peptides a, b, c, d against proteins X, Y and Z; X matches two of the peptides, Y one and Z three. One copy is bracketed "Minimal set (Occam)", the other "Maximal set (anti-Occam)"; "The Truth" is marked between the two.]
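One simple way to approximate the minimal (Occam) set is a greedy set cover: repeatedly pick the protein that explains the most still-unexplained peptides. This is a sketch of the concept rather than the method of any particular tool, and the peptide-to-protein assignments below are hypothetical because the exact matrix on the slide is not recoverable from the transcript.

```python
def minimal_protein_set(protein_to_peptides):
    """Greedy approximation of the smallest protein set explaining all peptides."""
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Ties are broken alphabetically because max() keeps the first maximum.
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical assignments: X matches two peptides, Y one, Z three.
example = {"X": {"a", "b"}, "Y": {"c"}, "Z": {"b", "c", "d"}}

print("minimal set:", minimal_protein_set(example))
print("maximal set:", sorted(example))
```

With these assignments the minimal set is Z plus X, while the maximal (anti-Occam) set simply reports every protein with at least one matching peptide. Protein Y is explained away by Z, although it may well be present in the sample.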
32
Intermezzo: Protein inference
Slide from J. Cottrell, Matrix Science
33
Protein inference
[Diagram labels: B, A, D, C]
41
Protein inference
[Diagram labels: B, A, D, C; one peptide is marked "Unambiguous peptide"]
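An unambiguous peptide is one that maps to a single protein, so finding them only requires inverting the protein-to-peptide map. In the sketch below, A to D are treated as protein labels and the peptide names p1 to p4 and their assignments are invented, since the diagram itself is not preserved.

```python
from collections import defaultdict


def unambiguous_peptides(protein_to_peptides):
    """Map each peptide that occurs in exactly one protein to that protein."""
    peptide_to_proteins = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_to_proteins[peptide].add(protein)
    return {
        peptide: next(iter(proteins))
        for peptide, proteins in peptide_to_proteins.items()
        if len(proteins) == 1
    }


# Invented assignments for proteins A-D.
example = {
    "A": {"p1", "p2"},
    "B": {"p2", "p3"},
    "C": {"p3"},
    "D": {"p3", "p4"},
}
print(unambiguous_peptides(example))
```

Here only A and D are supported by an unambiguous peptide; B and C are supported solely by shared peptides, which is where the inference becomes ambiguous.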
42
OTHER APPROACHES TO PERFORM MS/MS IDENTIFICATION
43
Three types of MS/MS identification
- Protein database based comparison: a theoretical spectrum is derived from each database sequence and compared with the experimental spectrum.
- Sequential comparison (de novo approaches): a de novo sequence is derived from the experimental spectrum and compared with the database sequences.
- Spectral comparison: the experimental spectrum is compared with experimental spectra stored in a spectral library.
Modified from: Eidhammer, Flikka, Martens, Mikalsen, Wiley 2007
44
De novo approaches
- Example of a manual de novo interpretation of an MS/MS spectrum
- No database is needed to extract a sequence!
Algorithms: Lutefisk, Sherenga, PEAKS, PepNovo, …
References: Dancik 1999; Taylor 2000; Fernandez-de-Cossio 2000; Ma 2003; Zhang 2004; Frank 2005; Grossmann 2005; …
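The core of a manual de novo interpretation is reading residues off the mass differences between consecutive peaks of one ion series. The toy sketch below assumes an ideal, complete ladder of singly charged ions from a single series, which real spectra rarely provide; tools such as those listed above handle noise, gaps and mixed ion types. The function name and the 0.02 Da tolerance are assumptions.

```python
# Standard monoisotopic residue masses in Da.
# "L" stands for leucine or isoleucine, which have the same mass.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder_mz, tolerance=0.02):
    """Translate consecutive peak differences into residues ('?' if unclear)."""
    sequence = []
    for lighter, heavier in zip(ladder_mz, ladder_mz[1:]):
        delta = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - delta) <= tolerance]
        sequence.append(matches[0] if len(matches) == 1 else "?")
    return "".join(sequence)


# b1, b2 and b3 of the peptide GAS: the differences spell out "AS".
print(read_ladder([58.02874, 129.06585, 216.09788]))
```

The first residue (G here) is not recovered from differences alone, since it is only visible through the absolute mass of the first peak.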
45
Three types of MS/MS identification
- Protein database based comparison: a theoretical spectrum is derived from each database sequence and compared with the experimental spectrum.
- Sequential comparison (de novo approaches): a de novo sequence is derived from the experimental spectrum and compared with the database sequences.
- Spectral comparison: the experimental spectrum is compared with experimental spectra stored in a spectral library.
Modified from: Eidhammer, Flikka, Martens, Mikalsen, Wiley 2007
46
Spectral searching
- Concept: compare experimental spectra to other experimental spectra.
- Many spectral libraries are publicly available (for instance, from NIST).
- Custom "search engines" have been developed: SpectraST (TPP), X!Hunter (GPM).
- It has been claimed that these searches are more sensitive than sequence database approaches.
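A minimal way to compare two experimental spectra is a normalised dot product (cosine similarity) over binned peaks. The sketch below shows that idea only; it is not the scoring used by SpectraST or X!Hunter, and the 1 Da bin width and function names are assumptions.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks falling in the same bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def spectral_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity of two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Made-up peak lists: a query spectrum and a library spectrum.
query = [(175.1, 30.0), (262.2, 80.0), (375.3, 100.0)]
library = [(175.2, 25.0), (262.1, 90.0), (375.2, 95.0), (488.3, 10.0)]
print(round(spectral_similarity(query, library), 3))
```

Because library spectra carry real fragment intensities, this kind of comparison uses information that a theoretical spectrum usually lacks.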
47
Spectral searching (2)
http://peptide.nist.gov/
48
COMBINING DIFFERENT SEARCH APPROACHES
49
Multi-stage peptide identification strategy
Goal: "squeeze" your good-quality experimental spectra
Taken from Nesvizhskii, J Proteomics, 2010
50
PROTEIN SEQUENCE DATABASES
51
What is needed from a protein database
1. Comprehensive (whatever is not in the DB will not be included in your results).
2. Not too redundant at the protein sequence level:
   - Protein inference gets easier.
   - A database that is too big is a disadvantage.
3. Quality of annotation
4. Stability of identifiers
52
Main databases used
a) UniProt Knowledgebase (UniProtKB): Swiss-Prot (manually curated) / TrEMBL.
b) NCBI non-redundant database: compiles all protein sequences available from GenBank translations, the Protein Data Bank (PDB), UniProtKB/Swiss-Prot, PIR and PRF.
c) Ensembl: genomics-centric resource. Integration of the information with genomics is easy.
d) IPI (International Protein Index): it has been discontinued (9/2012). Different builds for different species (Human, Mouse, Cow, Rat, Zebrafish, Dog, Arabidopsis).
e) Model organism DBs (for instance, TAIR for Arabidopsis).
53
Databases for non-model organisms
- If the species is not well represented in the protein databases, there is a much stronger need to search ESTs or genomic databases.
- The search engine will translate each nucleotide sequence in all six possible reading frames.
- ESTs are not suitable for PMF approaches (incomplete proteins).
- The alternative is to filter comprehensive databases like UniProt by species or genus, or to use a protein DB from a closely related organism.
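The six-frame translation mentioned above can be sketched as follows: three frames on the forward strand and three on the reverse complement, using the standard genetic code. This is a minimal illustration with made-up function names; codons containing anything other than A, C, G or T are translated as "X", and stop codons as "*".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Frames +1, +2, +3 followed by -1, -2, -3."""
    dna = dna.upper()
    return [
        translate(strand[offset:])
        for strand in (dna, reverse_complement(dna))
        for offset in range(3)
    ]


print(six_frame_translation("ATGGCC"))
```

Each nucleotide sequence therefore contributes six candidate protein sequences, which is one reason why EST and genomic searches take longer than protein database searches.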
54
Importance of choosing the right DB
- Since each database has a different focus, the databases can vary in terms of completeness, degree of redundancy, and quality of annotations.
- Bigger, more inclusive protein databases take longer to search.
- Bigger resources may also result in more false-positive identifications and reduced statistical significance (the probability of a random match is higher).
55
POST-VALIDATION OF RESULTS
56
Other concepts that would be nice to learn…
- Concepts of peptide and protein FDR
- Decoy databases
- Software such as PeptideProphet, ProteinProphet, …
- Influence of PTMs in the search
- Scoring of PTM positioning
- …
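As a pointer for the first two items, one common way to estimate the FDR from a decoy search is to divide the number of decoy hits above a score threshold by the number of target hits above it. The sketch below assumes that simple estimator (other variants exist, for instance for concatenated target-decoy databases), and the scores are invented.

```python
def decoy_fdr(psms, threshold):
    """Estimate FDR as decoy hits / target hits at or above the threshold.

    psms is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Invented peptide-spectrum matches: (score, matched a decoy sequence?).
toy_psms = [(50, False), (45, False), (40, False), (38, True), (30, False), (20, True)]

for cutoff in (25, 39):
    print(cutoff, decoy_fdr(toy_psms, cutoff))
```

In practice the threshold is lowered until the estimated FDR reaches the level the experimenter is willing to accept, for example 1%.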
57
Recommended reading…
Nesvizhskii, J Proteomics, 2010, and many more…
58
Conclusions
- Approaches to perform peptide and protein identification
- Sequence database based approaches: search engines
- The protein inference problem
- Importance of choosing the right protein database
- Many things to be learnt…
59
Remember: it should not be a black box…
From: Lilley et al., Proteomics, 2011
60
And still… we haven't touched quantification at all
From: Vaudel et al., Proteomics, 2010
61
Questions?