Published by Beverley Leonard. Modified over 8 years ago.
1
Protein Identification by Database Searching
John Cottrell, Matrix Science
2
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint
   A set of peptide molecular masses from an enzyme digest of a protein
5
PMF Servers on the Web
ASCQ_ME: https://www.genopole-lille.fr/logiciel/ascq_me/
Bupid: http://zlab.bu.edu/Amemee/
Mascot: http://www.matrixscience.com/search_form_select.html
MassSearch: http://www.cbrg.ethz.ch/services/MassSearch_new
MS-Fit (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
PepMAPPER: http://www.nwsr.manchester.ac.uk/mapper/
Profound (Prowl): http://prowl.rockefeller.edu/prowl-cgi/profound.exe
Mowse, PeptideSearch, Protocall, Aldente, XProteo
6
Search Parameters
database
taxonomy
enzyme
missed cleavages
fixed modifications
variable modifications
protein MW
estimated mass measurement error
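To illustrate how the enzyme, missed cleavages and mass error parameters interact in a peptide mass fingerprint, here is a minimal sketch. It assumes trypsin with the usual rule (cleave after K or R, not before P), monoisotopic masses, and no modifications. The function names and the toy sequence are hypothetical, and simply counting matches is far cruder than the scoring a real search engine applies.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added to the residue sum of a free peptide


def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from cleaving after K/R (not before P)."""
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))
    peptides = []
    for i in range(len(cuts) - 1):
        # Joining up to `missed_cleavages` extra adjacent fragments.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(cuts))):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, sequence, missed_cleavages=1, tol_da=0.2):
    """Count observed masses within tol_da of any predicted peptide mass."""
    predicted = [peptide_mass(p)
                 for p in tryptic_digest(sequence, missed_cleavages)]
    return sum(any(abs(m - t) <= tol_da for t in predicted) for m in observed)


# Hypothetical toy sequence, for illustration only.
print(tryptic_digest("AKRPGKW", missed_cleavages=1))
```

Raising `missed_cleavages` or `tol_da` increases the number of predicted masses that can match by chance, which is why both are search parameters rather than fixed values.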
8
Henzel, W. J., Watanabe, C., Stults, J. T., JASMS 2003, 14, 931-942.
9
Peptide Mass Fingerprint
Fast, simple analysis
High sensitivity
Need database of protein sequences
  not ESTs or genomic DNA
Sequence must be present in database
  or close homolog
Not good for mixtures
  especially a minor component
11
[Figure: peptide backbone (H-N-C-C-...-OH) with side chains R1 to R4, annotated with the N-terminal fragment ions a1-a3, b1-b3, c1-c3 and the C-terminal fragment ions x3-x1, y3-y1, z3-z1]
Roepstorff, P. and Fohlman, J. (1984). Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11, 601.
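As a worked illustration of the nomenclature, the sketch below computes singly charged b- and y-ion m/z values for a peptide. It assumes monoisotopic masses and the usual conventions (b = N-terminal residue sum plus a proton, y = C-terminal residue sum plus water plus a proton). The function name is hypothetical, and the other ion series (a, c, x, z) are left out.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is b_i (first i residues); y_ions[i-1] is y_i (last i residues).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


# Hypothetical tripeptide, for illustration only.
b, y = fragment_ions("GAS")
print([round(m, 4) for m in b], [round(m, 4) for m in y])
```

A useful check is that complementary ions sum to the neutral peptide mass plus two protons, for example b1 + y2 for a tripeptide.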
12
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint
   A set of peptide molecular masses from an enzyme digest of a protein
2. Sequence Query
   Mass values combined with amino acid sequence or composition data
13
Mann, M. and Wilm, M., Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66 4390-9 (1994).
14
1489.430 tag(650.213,GWSV,1079.335)
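The tag query above can be read as a peptide mass value followed by a start mass, a short sequence, and an end mass. The sketch below parses that form and compares the mass gap with the summed residue masses of the tag. It assumes monoisotopic masses and handles only plain tags, not the `[Q|K]` ambiguity syntax. The function names and the regular expression are illustrative, not part of any search engine.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

TAG_PATTERN = re.compile(
    r"^\s*([\d.]+)\s+tag\(\s*([\d.]+)\s*,\s*([A-Z]+)\s*,\s*([\d.]+)\s*\)\s*$"
)


def parse_tag_query(text):
    """Split '<mass> tag(<start>,<SEQ>,<end>)' into its four parts."""
    match = TAG_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a plain sequence tag query: {text!r}")
    mass, start, tag, end = match.groups()
    return float(mass), float(start), tag, float(end)


def tag_mass_error(start, tag, end):
    """Mass gap between the flanking peaks minus the tag's residue masses."""
    return (end - start) - sum(RESIDUE_MASS[aa] for aa in tag)


mass, start, tag, end = parse_tag_query("1489.430 tag(650.213,GWSV,1079.335)")
print(tag, round(tag_mass_error(start, tag, end), 3))
```

For this query the gap and the residue sum differ by less than 0.1 Da, so a fragment tolerance of a few tenths of a dalton would accept the tag.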
15
Sequence Tag Servers on the Web
Mascot: http://www.matrixscience.com/search_form_select.html
MS-Seq (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
MultiIdent (TagIdent, etc.): http://www.expasy.org/tools/multiident/
PeptideSearch, Spider
18
Sequence Tag
Rapid search times
  Essentially a filter
Error tolerant
  Match peptide with unknown modification or SNP
Requires interpretation of spectrum
  Usually manual, hence not high throughput
Tag has to be called correctly
  Although ambiguity is OK
  2060.78 tag(977.4,[Q|K][Q|K][Q|K]EE,1619.7)
19
Three ways to use mass spectrometry data for protein identification
1. Peptide Mass Fingerprint
   A set of peptide molecular masses from an enzyme digest of a protein
2. Sequence Query
   Mass values combined with amino acid sequence or composition data
3. MS/MS Ions Search
   Uninterpreted MS/MS data from a single peptide or from a complete LC-MS/MS run
20
Eng, J. K., McCormack, A. L. and Yates, J. R., 3rd., An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5 976-89 (1994)
SEQUEST
21
MS/MS Ions Search Servers on the Web
Inspect: http://proteomics.ucsd.edu/LiveSearch/
Mascot: http://www.matrixscience.com/search_form_select.html
MS-Tag (Protein Prospector): http://prospector.ucsf.edu/prospector/mshome.htm
Omssa: http://pubchem.ncbi.nlm.nih.gov/omssa/index.htm
PepFrag (Prowl): http://prowl.rockefeller.edu/prowl/pepfrag.html
PepProbe: http://bart.scripps.edu/public/search/pep_probe/search.jsp
RAId_DbS: http://www.ncbi.nlm.nih.gov/CBBResearch/qmbp/RAId_DbS/index.html
Sonar (Knexus): http://hs2.proteome.ca/prowl/knexus.html
X!Tandem (The GPM): http://thegpm.org/TANDEM/index.html
Not on-line: Byonic, Crux, greylag, MassMatrix, Myrimatch, Paragon, Peaks, PepSplice, pFind, Phenyx, ProbID, ProLuCID, ProteinLynx GS, Sequest, SIMS, SpectrumMill
24
MS/MS Ions Search
Easily automated for high throughput
Can get matches from marginal data
Can be slow
  No enzyme
  Many variable modifications
  Large database
  Large dataset
MS/MS is peptide identification
  Proteins by inference
27
Search Parameters
Sequence Database
Swiss-Prot (~500,000 entries)
  High quality, non-redundant
NCBInr, UniRef100 (~19,000,000 entries)
  Comprehensive, non-identical
EST databases (>400,000,000 entries)
  Very large and very redundant
Sequences from a single genome
  A consensus sequence
  Peptides are lost at exon-intron boundaries
(Entry counts are from mid-2012)
28
Search Parameters
Taxonomy (Swiss-Prot 2010_08)
Mammalia (mammals) = 65104
  Primates = 26940
    Homo sapiens (human) = 20292
    Other primates = 6648
  Rodentia (Rodents) = 25473
    Mus. = 16358
      Mus musculus (house mouse) = 16307
    Rattus = 7533
    Other rodentia = 1582
  Other mammalia = 12691
29
Search Parameters
Mass Tolerances
Most search engines support separate mass tolerances for precursors and fragments
May allow fixed units (Da, mmu) or proportional (ppm, %)
Some search engines can correct for selection of the 13C peak
Unless the search engine performs some type of re-calibration, need to provide a conservative estimate of mass accuracy, not precision
This doesn’t have to be a guessing game. Run a standard, then look at the error graphs for strong matches
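The difference between fixed and proportional tolerance units, and the effect of allowing for selection of the 13C peak, can be sketched as follows. The function names and the `max_isotope_error` parameter are hypothetical. The conversions assume 1 mmu = 0.001 Da and a 13C to 12C mass difference of about 1.00335 Da, and individual search engines may define their windows differently.

```python
C13_SHIFT = 1.00335  # approximate mass difference between 13C and 12C, in Da


def tolerance_in_da(mass, tol, unit):
    """Convert a tolerance in Da, mmu, ppm or % to an absolute window in Da."""
    if unit == "Da":
        return tol
    if unit == "mmu":
        return tol / 1000.0
    if unit == "ppm":
        return mass * tol / 1e6
    if unit == "%":
        return mass * tol / 100.0
    raise ValueError(f"unknown tolerance unit: {unit!r}")


def within_tolerance(observed, theoretical, tol, unit, max_isotope_error=0):
    """True if observed matches theoretical, optionally allowing 13C offsets."""
    window = tolerance_in_da(theoretical, tol, unit)
    return any(
        abs(observed - theoretical - k * C13_SHIFT) <= window
        for k in range(max_isotope_error + 1)
    )


# 10 ppm at 1000 Da is a window of 0.01 Da.
print(tolerance_in_da(1000.0, 10, "ppm"))
# A precursor picked one 13C peak too high only matches if the offset is allowed.
print(within_tolerance(1001.0034, 1000.0, 10, "ppm"))
print(within_tolerance(1001.0034, 1000.0, 10, "ppm", max_isotope_error=1))
```

Proportional units give a window that grows with mass, so the same ppm setting is tighter for small peptides than for large ones.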
30
Search Parameters
Enzyme can be
  Fully specific
  Non-specific (“no enzyme”)
Some search engines support
  Limited number of missed cleavage points
  Semi-specific enzymes
  Enzyme mixtures
31
Search Parameters
Common peak list formats
  DTA (Sequest)
  PKL (Masslynx)
  MGF (Mascot)
  mzData (.XML)
  mzML (.mzML)
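As an example of what a peak list looks like to a program, here is a minimal reader for the MGF layout: `BEGIN IONS` / `END IONS` blocks containing `KEY=value` lines followed by m/z and intensity pairs. It is a sketch that covers only this subset of the format. The function name and the sample spectrum are made up for illustration.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': list} spectra."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=, PEPMASS= or CHARGE=.
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by an optional intensity.
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra


# Hypothetical spectrum, for illustration only.
SAMPLE = """BEGIN IONS
TITLE=hypothetical spectrum 1
PEPMASS=745.3750
CHARGE=2+
175.119 1200
262.151 850.5
END IONS
"""

for spectrum in parse_mgf(SAMPLE):
    print(spectrum["params"]["PEPMASS"], len(spectrum["peaks"]))
```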
32
Search Parameters
Modifications
Fixed / static / quantitative modifications cost nothing
Variable / differential / non-quantitative modifications are very expensive
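A rough way to see why variable modifications are expensive: a fixed modification changes the mass of a residue everywhere, so each peptide still has one form, whereas every site of a variable modification may or may not carry it. The sketch below enumerates the forms under the simplifying assumption that each site is independently modified or unmodified. The function name and the example peptide are hypothetical.

```python
from itertools import product


def modified_forms(peptide, variable_residues):
    """Yield each combination as a tuple of the modified positions (0-based)."""
    sites = [i for i, aa in enumerate(peptide) if aa in variable_residues]
    for mask in product((False, True), repeat=len(sites)):
        yield tuple(site for site, on in zip(sites, mask) if on)


# Hypothetical peptide, for illustration only.
peptide = "MAMSTM"
# No variable modification: a single form to test.
print(len(list(modified_forms(peptide, ""))))
# Variable modification on M (three sites): 2**3 forms.
print(len(list(modified_forms(peptide, "M"))))
# Variable modifications on M, S and T (five sites): 2**5 forms.
print(len(list(modified_forms(peptide, "MST"))))
```

The count doubles with each additional modifiable site, and this applies to every candidate peptide in the database.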
33
Search Parameters
Modifications
Common artefacts
Modification                               Delta (Da)  Site                          Cause
Carbamylation                              +43         N-term, K                     Urea in digest buffer
Deamidation                                +1          N                             Low pH
Pyro-glutamic acid                         -17         Q at N-term                   Low pH
Pyro-carbamidomethyl or carboxymethyl Cys  +40         C at N-term                   Low pH, delta is relative to unmodified C
Oxidation                                  +16         M (many other residues also)  Gels
Over alkylation                            +57         N-term, W                     Iodoacetamide
Over alkylation                            +58         N-term, W                     Iodoacetic acid
36
Site Analysis
Ascore: Beausoleil S.A., et al. (2006) Nat. Biotechnol. 24, 1285–1292
MaxQuant:
  Cox J. & Mann M. (2008) Nat. Biotechnol. 26, 1367–1372
  Olsen J.V., et al. (2006) Cell 127, 635–48
Inspect, MS-Alignment, PTMFinder:
  Tanner S., et al. (2008) J. Proteome Res. 7, 170–181
  Payne S., et al. (2008) J. Proteome Res. 7, 3373–3381
  Tsur D., et al. (2005) Nat. Biotechnol. 23, 1562–1567
  Tanner S., et al. (2005) Anal. Chem. 77, 4626-4639
PhosphoScore: Ruttenberg B.E., et al. (2008) J. Proteome Res. 7, 3054-9
Debunker: Lu B., et al. (2007) Anal. Chem. 79, 1301-10
SloMo - ETD/ECD: Bailey C.M., et al. (2009) J. Proteome Res. 8, 1965-71
ModifiComb: Savitski M.M., et al. (2006) Mol. Cell. Proteomics 5, 935–48
Delta Score: Savitski M. M., et al. (2010) Mol. Cell. Proteomics mcp.M110.003830
38
Multi-pass Searches
Implemented under a variety of names
  X!Tandem: Model refinement
  Mascot: Error tolerant search
  Spectrum Mill: Search saved hits, homology mode, unassigned single mass gap
  Phenyx: 2-rounds
  Paragon: Thorough ID, fraglet-taglet
39
Scoring
[Figure: total matches, incorrect matches and correct matches plotted against score]
40
Scoring
Receiver Operating Characteristic
42
Sensitivity & Specificity
Search a “decoy” database
Decoy entries can be reversed or shuffled or randomised versions of target entries
Decoy entries can be separate database or concatenated to target entries
Gives a clear estimate of false discovery rate
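The decoy idea can be sketched in a few lines: build decoy entries by reversing target sequences, search both, and use the number of decoy matches above a score threshold to estimate how many target matches are false. The version below uses the decoy/target ratio, which is one common convention and an assumption here; other estimators are also in use. The function names, scores and labels are hypothetical.

```python
def reverse_decoy(sequence):
    """Make a decoy entry by reversing a target protein sequence."""
    return sequence[::-1]


def estimate_fdr(matches, threshold):
    """Estimate FDR as decoy matches / target matches at or above threshold.

    `matches` is an iterable of (score, is_decoy) pairs.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return decoys / targets


# Hypothetical match list: (score, matched a decoy entry?)
SAMPLE_MATCHES = [(50, False), (45, False), (40, True),
                  (35, False), (20, True), (10, False)]

print(reverse_decoy("MKWV"))
print(round(estimate_fdr(SAMPLE_MATCHES, threshold=30), 3))
```

Raising the threshold lowers the estimated false discovery rate at the cost of fewer accepted target matches.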
43
Sensitivity & Specificity
[Figure: total matches, incorrect matches and correct matches plotted against score]
45
Protein Inference
Protein A: Peptide 1, Peptide 2, Peptide 3
Protein B: Peptide 1, Peptide 3
Protein C: Peptide 2
General approach is to create a minimal list of proteins.
“Principle of parsimony” or “Occam’s razor”
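One simple way to approximate a minimal protein list is a greedy set cover: repeatedly pick the protein that explains the most peptides not yet explained. This is a sketch of the parsimony idea, not the method of any particular search engine, and the greedy choice is not guaranteed to give the smallest possible list. The function name is hypothetical, and the peptide-to-protein assignment in the example is an assumed reading of the slide's diagram.

```python
def parsimonious_proteins(protein_peptides):
    """Greedily choose proteins until every observed peptide is explained.

    `protein_peptides` maps a protein name to the set of peptides matched to it.
    """
    uncovered = set().union(*protein_peptides.values())
    chosen = []
    while uncovered:
        # Ties are broken alphabetically because candidates are sorted first.
        best = max(sorted(protein_peptides),
                   key=lambda name: len(protein_peptides[name] & uncovered))
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen


# Assumed assignment of peptides to proteins, for illustration only.
example = {
    "Protein A": {"Peptide 1", "Peptide 2", "Peptide 3"},
    "Protein B": {"Peptide 1", "Peptide 3"},
    "Protein C": {"Peptide 2"},
}
print(parsimonious_proteins(example))
```

In this example Protein A alone accounts for all three peptides, so Proteins B and C are not needed in the minimal list even though they cannot be ruled out.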
46
Further Reading:
Exercises: http://www.ms-ms.com/exercises/exercises.html