
1 MS Identification. Dr. Juan Antonio Vizcaíno, PRIDE Group coordinator, PRIDE team, Proteomics Services Group, PANDA group, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom. EBI is an Outstation of the European Molecular Biology Laboratory.

2 EBI Bulgaria Roadshow Rotterdam, 12 June 2012 Juan A. Vizcaíno juan@ebi.ac.uk
Overview …
- Search engines: peptide identification
- Protein inference
- De novo and spectral searches
- Choosing the right protein sequence DB
You need to learn many things…

3 It should not be a black box… (From: Lilley et al., Proteomics, 2011)

4 MS proteomics: Shot-gun/bottom-up approaches
[Figure: protocol diagram with the labels proteins, peptides, MS analysis, fragmentation, MS/MS analysis, sequence database]

5 PMF IDENTIFICATION

6 Peptide Mass Fingerprinting (MS)
- MS analysis yields a peptide mass fingerprint (PMF).
- Each peak in the spectrum represents a peptide (or a mixture of peptides).
- The spectrum provides information about mass and charge.
- Rarely used at present, except in gel-based approaches (in this case the molecular weight, MW, of the protein is known).
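
The PMF idea can be sketched in a few lines: digest a candidate protein in silico, compute the peptide masses, and count how many observed masses they explain. This is only an illustrative sketch. The function names, the 0.01 Da tolerance and the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages, no modifications) are assumptions, not the behaviour of any tool listed in this talk.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide (free N- and C-terminus)


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (simplified rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        at_end = i + 1 == len(protein)
        if aa in "KR" and (at_end or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def pmf_matches(observed_masses, protein, tol=0.01):
    """Count observed masses explained by a tryptic peptide of the protein."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(any(abs(obs - t) <= tol for t in theoretical)
               for obs in observed_masses)
```

Ranking every protein in a database by `pmf_matches` gives a crude fingerprint search; real PMF engines use probabilistic scores rather than a plain count.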

7 Peptide Mass Fingerprinting (MS) on the web
- Aldente (Phenyx): http://www.expasy.org/tools/aldente/
- ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
- Bupid: http://zlab.bu.edu/Amemee/
- Mascot: http://www.matrixscience.com/search_form_select.html
- MassSearch: http://www.cbrg.ethz.ch/services/MassSearch
- MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
- PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
- Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe
- XProteo: http://xproteo.com:2698/

8 MS/MS IDENTIFICATION

9 MS/MS
- MS analysis: Peptide Mass Fingerprinting (PMF)
- Fragmentation followed by MS/MS analysis: peptide sequence information (on top of mass and charge)

10 Three types of MS/MS identification (modified from: Eidhammer, Flikka, Martens, Mikalsen, Wiley 2007)
- Protein database based comparison: a theoretical spectrum derived from a database sequence is compared with the experimental spectrum.
- Sequential comparison (de novo approaches): a de novo sequence derived from the experimental spectrum is compared with the database sequence.
- Spectral comparison: the experimental spectrum is compared with an experimental spectrum from a spectral library.

11 MS proteomics: peptide IDs and protein IDs [Figure labels: MS/MS spectra, proteins]

12 MS proteomics: peptide IDs and protein IDs [Figure labels: MS/MS spectra, proteins]

13 MS proteomics: peptide IDs and protein IDs
[Figure labels: proteins, MS/MS spectra, peptides, search engine, sequence database (UniProt, IPI, RefSeq)]
Example peptides: TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL

14 SEARCH ENGINES

15 Search engines: sequence database matching
[Figure: proteins from a sequence database (UniProt, IPI, RefSeq) give peptides (TDMDNQIVVSDYAQMDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL, VDMSLAQR, DIVVQETMEDIDK, …), which give theoretical spectra; these are matched against the experimental spectra]
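
To make "theoretical spectra" concrete, the sketch below computes the singly charged b- and y-ion m/z values of a peptide. It is a simplified illustration: only b and y ions, charge 1+, no modifications and no intensities are assumptions of this example, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def theoretical_by_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z values for each cleavage."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b ion: N-terminal fragment of length i plus one proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ion: C-terminal fragment plus water plus one proton.
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


b_ions, y_ions = theoretical_by_ions("LFDQAFGLPR")
```

A peptide of length n yields n - 1 ions of each series, so the ten-residue example above gives nine b ions and nine y ions.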

16 Search engines: how good is the correlation between theoretical and experimental spectra?
- Scores are generated by search engines.
- Usually only the best match is kept.
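
A minimal way to picture "score every candidate, keep the best" is a shared peak count. This sketch is a deliberate simplification, since real search engines use far more elaborate scores. The function names, the 0.5 m/z tolerance and the toy peak lists are all hypothetical.

```python
def shared_peak_count(experimental, theoretical, tol=0.5):
    """Count theoretical peaks that have an experimental peak within tol."""
    return sum(any(abs(t - e) <= tol for e in experimental)
               for t in theoretical)


def best_match(experimental, candidates, tol=0.5):
    """candidates maps peptide -> theoretical m/z list; keep the top scorer."""
    scored = [(shared_peak_count(experimental, mz, tol), peptide)
              for peptide, mz in candidates.items()]
    score, peptide = max(scored)
    return peptide, score


# Toy data: m/z values are made up for illustration only.
experimental = [100.0, 200.0, 300.0]
candidates = {
    "PEPA": [100.1, 200.2, 450.0],
    "PEPB": [100.0, 150.0, 170.0],
}
top_peptide, top_score = best_match(experimental, candidates)
```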

17 Search engines [Figure taken from Nesvizhskii, J Proteomics, 2010]

18 Search engines [Figure taken from Nesvizhskii, J Proteomics, 2010]

19 The most popular algorithms
- MASCOT (Matrix Science): http://www.matrixscience.com
- SEQUEST (Scripps, Thermo Fisher Scientific): http://fields.scripps.edu/sequest
- X!Tandem (The Global Proteome Machine Organization): http://www.thegpm.org/TANDEM
- OMSSA (NCBI): http://pubchem.ncbi.nlm.nih.gov/omssa/

20 Overall concept of scores and cut-offs
[Figure labels: incorrect identifications, correct identifications, false positives, false negatives, threshold score. Adapted from: www.proteomesoftware.com, Wiki pages]

21 Playing with probabilistic cut-off scores
[Figure labels: false positives, identifications, higher stringency]
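
The trade-off behind a score cut-off can be illustrated by counting outcomes at different thresholds. This sketch assumes the truth of each identification is known, which is only the case for benchmark data; the function name and the toy scores are hypothetical.

```python
def confusion_at_threshold(scored_ids, threshold):
    """scored_ids: list of (score, is_correct) pairs.

    Returns (true_positives, false_positives, false_negatives), where an
    identification is accepted when its score is >= threshold.
    """
    tp = sum(1 for score, ok in scored_ids if score >= threshold and ok)
    fp = sum(1 for score, ok in scored_ids if score >= threshold and not ok)
    fn = sum(1 for score, ok in scored_ids if score < threshold and ok)
    return tp, fp, fn


# Toy identifications: (score, is_correct). Values are made up.
ids = [(10, False), (20, False), (25, True),
       (30, False), (40, True), (50, True)]

lenient = confusion_at_threshold(ids, 22)   # more IDs, one false positive
strict = confusion_at_threshold(ids, 35)    # no false positives, one miss
```

In this toy data, raising the threshold from 22 to 35 removes the false positive but also loses one correct identification.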

22 SEQUEST
- Very well established search engine
- Can be used for MS/MS (PFF) identifications
- Based on a cross-correlation score (includes experimental peak height)
- Published core algorithm (patented, licensed to Thermo Fisher Scientific)
- Provides a preliminary score (Sp), rank, cross-correlation score (XCorr), and the score difference between the top two ranks (deltaCn, ΔCn)
- Thresholding is up to the user, and is commonly done per charge state
- Many extensions exist to perform a more automatic validation of results
[Equations: XCorr and deltaCn]

23 Search engines: Sequest
- XCorr is high if the direct comparison is significantly greater than the background.
- deltaCn measures how good the XCorr is relative to the next best match.
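
The two statements above can be sketched on binned intensity vectors: the "direct comparison" is the correlation at zero offset, and the "background" is the mean correlation when one spectrum is shifted against the other. This is a rough sketch of the idea, not SEQUEST's implementation. The ±75 bin window, the exclusion of offset 0 from the background, the lack of any spectrum preprocessing and the function names are assumptions of this example.

```python
def xcorr(experimental, theoretical, max_offset=75):
    """Zero-offset correlation minus the mean correlation at shifted offsets.

    Both inputs are equal-length lists of binned intensities.
    """
    n = len(experimental)

    def corr(offset):
        # Dot product with the theoretical spectrum shifted by `offset` bins.
        return sum(experimental[i] * theoretical[i + offset]
                   for i in range(n) if 0 <= i + offset < n)

    background = [corr(t) for t in range(-max_offset, max_offset + 1)
                  if t != 0]
    return corr(0) - sum(background) / len(background)


def delta_cn(xcorrs):
    """Normalized gap between the best and the second-best XCorr."""
    top, second = sorted(xcorrs, reverse=True)[:2]
    return (top - second) / top
```

A deltaCn close to 0 means the runner-up candidate scored almost as well as the top hit, so the identification is less distinctive.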

24 Search engines: Mascot
- Very well established search engine
- Can do MS (PMF) and MS/MS (PFF) identifications
- Based on the MOWSE score
- Unpublished core algorithm (trade secret)
- Predicts an a priori threshold score that identifications need to pass
- From version 2.2, Mascot allows integrated decoy searches
- Provides rank, score, threshold and expectation value per identification
- Customizable confidence level for the threshold score

25 Search engines: Mascot www.matrixscience.com

26 Search engines: X!Tandem
- Open source search engine
- Can be used for MS/MS experiments
- Based on a hyperscore that only takes into account b and y ions
- Published core algorithm, and it is freely available
- Fast and able to handle PTMs in an iterative fashion
- Used as an auxiliary search engine
- by-Score = sum of intensities of peaks matching b-type or y-type ions
- HyperScore = by-Score × Nb! × Ny! (Nb, Ny: number of matched b and y ions)
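
The by-Score and hyperscore can be computed directly from the intensities of the matched peaks. This is a minimal sketch of the published formula (by-Score multiplied by the factorials of the numbers of matched b and y ions); the function name and the input layout are assumptions, and X!Tandem's own peak matching, normalization and log-scaling of the reported score are not modelled.

```python
from math import factorial


def hyperscore(matched_b, matched_y):
    """matched_b / matched_y: intensities of peaks matched to b / y ions."""
    by_score = sum(matched_b) + sum(matched_y)
    return by_score * factorial(len(matched_b)) * factorial(len(matched_y))


# Two matched b ions and three matched y ions (made-up intensities).
score = hyperscore([10, 20], [5, 5, 5])
```

The factorial terms reward candidates that match many ions in both series, so the score grows much faster than the summed intensity alone.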

27 Search engines: OMSSA
- Open source search engine
- Can be used for MS/MS experiments
- Relies on a Poisson distribution
- Published core algorithm, and it is freely available
- Provides an expectancy score, similar to the BLAST E-value
- Very good performance in comparison with the others
- Used as an auxiliary search engine
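
The link between a Poisson model and an E-value can be sketched as follows: if random peak matches follow a Poisson distribution with mean mu, the chance of seeing at least k matches by accident is the Poisson tail, and multiplying by the number of candidates tested gives an expectation value. How OMSSA actually derives mu is not covered here; mu, the candidate count and the function names are hypothetical inputs of this sketch.

```python
from math import exp, factorial


def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    return 1.0 - sum(exp(-mu) * mu ** i / factorial(i) for i in range(k))


def e_value(k, mu, n_candidates):
    """Expected number of candidates reaching k matches by chance."""
    return n_candidates * poisson_tail(k, mu)


# Example: 8 matched peaks when 2 are expected at random, 1000 candidates.
example = e_value(8, 2.0, 1000)
```

As with the BLAST E-value, smaller is better: a value well below 1 means such a match is unlikely to arise by chance among the candidates searched.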

28 MS proteomics: peptide IDs and protein IDs
[Figure labels: proteins, MS/MS spectra, peptides, search engine, sequence database (UniProt, IPI, RefSeq)]
Example peptides: TDMDNQIVVSDYAQ MDR, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL
So far, we have actually identified peptides, not proteins.

29 MS proteomics: peptide IDs and protein IDs
Peptides: TDMDNQIVVSDYAQ MDRTW, LFDQAFGLPR, AKPLMELIER, DESTNVDMSLAQR, DIVVQETMEDIDK, NGMFFSTYDR, GTAGNALMDGASQL
Proteins: IPI00302927, IPI00025512, IPI00002478, IPI00185600, IPI00014537, IPI00298497, IPI00329236, IPI00002232
Protein inference is complex!

30 PROTEIN INFERENCE

31 Intermezzo: Protein inference. The minimal and maximal explanatory sets
[Figure: a peptide-to-protein matrix (peptides a, b, c, d; proteins X, Y, Z) shown twice, labelled "Minimal set (Occam)" and "Maximal set (anti-Occam)", together with the label "The Truth"]
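
One common way to approximate the minimal (Occam) set is a greedy set cover: keep picking the protein that explains the most peptides not yet explained. The sketch below is an illustration under stated assumptions. The peptide-to-protein mapping is hypothetical (it is not recovered from the slide), the function name is made up, and a greedy pass is only an approximation of the true minimal set.

```python
def minimal_protein_set(protein_to_peptides):
    """Greedy set cover over a dict mapping protein -> set of peptides."""
    unexplained = set().union(*protein_to_peptides.values())
    chosen = []
    while unexplained:
        # Sorted keys make tie-breaking deterministic (alphabetical).
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


# Hypothetical mapping for peptides a-d and proteins X, Y, Z.
mapping = {"X": {"a", "b"}, "Y": {"b"}, "Z": {"b", "c", "d"}}

minimal = minimal_protein_set(mapping)   # proteins needed to explain a-d
maximal = sorted(mapping)                # every protein with any peptide
```

With this mapping, protein Y is left out of the minimal set because its only peptide is already explained by X and Z, while the maximal (anti-Occam) set reports all three.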

32 Intermezzo: Protein inference [Slide from J. Cottrell, Matrix Science]

33-41 Protein inference [Figure built up step by step over slides 33 to 41, with the labels A, B, C and D; the final slide marks an unambiguous peptide]

42 OTHER APPROACHES TO PERFORM MS/MS IDENTIFICATION

43 Three types of MS/MS identification: protein database based comparison; sequential comparison (de novo approaches); spectral comparison (modified from: Eidhammer, Flikka, Martens, Mikalsen, Wiley 2007)

44 De novo approaches
- Example of a manual de novo interpretation of an MS/MS spectrum [figure]
- No database is needed to extract a sequence!
- Algorithms: Lutefisk, Sherenga, PEAKS, PepNovo, …
- References: Dancik 1999; Taylor 2000; Fernandez-de-Cossio 2000; Ma 2003; Zhang 2004; Frank 2005; Grossmann 2005; …
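
Manual de novo interpretation boils down to reading the mass differences between consecutive peaks of one ion series and looking them up in a table of residue masses. The sketch below shows that single step only. It assumes a clean, complete ladder of same-series, singly charged ions, and the function name and the 0.02 Da tolerance are hypothetical; the algorithms listed above handle noise, missing peaks and mixed ion series.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ion_mz, tol=0.02):
    """Translate gaps between consecutive ions into residues.

    Isobaric residues are reported together (for example "L/I");
    gaps matching no residue are reported as "?".
    """
    peaks = sorted(ion_mz)
    sequence = []
    for low, high in zip(peaks, peaks[1:]):
        gap = high - low
        hits = [aa for aa, mass in RESIDUE_MASS.items()
                if abs(mass - gap) <= tol]
        sequence.append("/".join(hits) if hits else "?")
    return sequence


# b1, b2, b3 ions of the tripeptide GAL (singly charged).
residues = read_ladder([58.02874, 129.06585, 242.14991])
```

The first residue is not recovered from the gaps alone, and leucine and isoleucine cannot be told apart by mass, which is why the second gap is reported as "L/I".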

45 Three types of MS/MS identification: protein database based comparison; sequential comparison (de novo approaches); spectral comparison (modified from: Eidhammer, Flikka, Martens, Mikalsen, Wiley 2007)

46 Spectral searching
- Concept: compare experimental spectra to other experimental spectra.
- Many spectral libraries are publicly available (for instance, from NIST).
- Custom ‘search engines’ have been developed: SpectraST (TPP), X!Hunter (GPM).
- It has been claimed that these searches are more sensitive than sequence database approaches.
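
Comparing one experimental spectrum with another is often illustrated with a normalized dot product (cosine similarity) on binned peaks. This is a generic sketch of that idea, not the scoring used by SpectraST or X!Hunter; the function names, the 1.0 m/z bin width and the toy spectra are assumptions.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """peaks: list of (mz, intensity). Returns {bin index: summed intensity}."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Normalized dot product of two binned spectra, between 0 and 1."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (sqrt(sum(v * v for v in a.values()))
            * sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Toy spectra (made-up peaks): a query and two library entries.
query = [(100.2, 3.0), (200.7, 4.0)]
library = {"entry_1": [(100.2, 3.0), (200.7, 4.0)],
           "entry_2": [(300.1, 1.0)]}
best_entry = max(library, key=lambda name: cosine_similarity(query, library[name]))
```

Because library spectra carry real fragment intensities, this comparison uses information that a theoretical spectrum built from a sequence alone does not have.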

47 Spectral searching (2) http://peptide.nist.gov/

48 COMBINING DIFFERENT SEARCH APPROACHES

49 Multi-stage peptide identification strategy. Goal: “squeeze” your good-quality experimental spectra [Figure taken from Nesvizhskii, J Proteomics, 2010]

50 PROTEIN SEQUENCE DATABASES

51 What is needed from a protein database
1. Comprehensive (whatever is not in the DB will not be included in your results).
2. Not too redundant at the protein sequence level:
   - Protein inference gets easier.
   - An overly large database is a disadvantage.
3. Quality of annotation.
4. Stability of identifiers.

52 Main databases used
a) UniProt Knowledgebase (UniProtKB): Swiss-Prot (manually curated) / TrEMBL.
b) NCBI non-redundant database: compiles all protein sequences available from GenBank translations, the Protein Data Bank (PDB), UniProtKB/Swiss-Prot, PIR and PRF.
c) Ensembl: genomics-centric resource. Integration of the information with genomics is easy.
d) IPI (International Protein Index): it has been discontinued (9/2012). Different builds for different species (Human, Mouse, Cow, Rat, Zebrafish, Dog, Arabidopsis).
e) Model organism DBs (for instance, TAIR for Arabidopsis).

53 Databases for non-model organisms
- If the species is not well represented in the protein databases, there is a much stronger need to search EST or genomic databases.
- The search engine will translate each nucleotide sequence in all six possible reading frames.
- ESTs are not suitable for PMF approaches (incomplete proteins).
- The alternative is to filter comprehensive databases like UniProt by species or genus, or to use a protein DB from a closely related organism.
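
Six-frame translation means translating the nucleotide sequence at three offsets on the forward strand and three on the reverse complement. The sketch below uses the standard genetic code and is a simplified illustration: the function names are hypothetical, stop codons are written as "*", and unknown codons (for example containing N) are written as "X".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(dna):
    """Complement each base and reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return translations for forward frames 1-3, then reverse frames 1-3."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:])
            for strand in strands for offset in range(3)]


frames = six_frame_translation("ATGGCC")
```

A search engine would then treat each translated frame (usually split at stop codons) as a candidate protein sequence.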

54 Importance of choosing the right DB
- Since each database has a different focus, the databases can vary in completeness, degree of redundancy, and quality of annotations.
- More inclusive, bigger protein databases take longer to search.
- For the bigger resources, this may also result in more false-positive identifications and reduced statistical significance (the probability of a random match is higher).

55 POST-VALIDATION OF RESULTS

56 Other concepts that would be nice to learn…
- Concepts of peptide and protein FDR
- Decoy databases
- Software such as PeptideProphet, ProteinProphet, …
- Influence of PTMs on the search
- Scoring of PTM positioning
- …
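
The link between decoy databases and FDR can be sketched briefly: search a database containing real (target) and nonsense (decoy) sequences, then use the number of decoy hits above a score threshold as an estimate of the number of wrong target hits. This is one common simple estimator (decoys divided by targets); other variants exist, and the function name and toy scores are hypothetical.

```python
def decoy_fdr(psms, threshold):
    """psms: list of (score, is_decoy) peptide-spectrum matches.

    Returns the estimated FDR among target matches with score >= threshold,
    using the number of passing decoys as the estimate of false targets.
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Toy search results (made-up scores).
psms = [(50, False), (45, False), (40, True),
        (35, False), (30, False), (20, True)]

fdr_lenient = decoy_fdr(psms, 30)   # 1 decoy among 4 targets
fdr_strict = decoy_fdr(psms, 45)    # no decoys pass
```

In practice the threshold is chosen so that the estimated FDR stays below a chosen level, such as 1%.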

57 Recommended reading… Nesvizhskii, J Proteomics, 2010, and many more…

58 Conclusions
- Approaches to perform peptide and protein identification
- Sequence database based approaches: search engines
- The protein inference problem
- Importance of choosing the right protein database
- Many things to be learnt…

59 Remember: it should not be a black box… (From: Lilley et al., Proteomics, 2011)

60 And still… we haven’t touched quantification at all (From: Vaudel et al., Proteomics, 2010)

61 Questions?

