Protein Identification David Fenyő


Protein Identification and Quantitation
Workflow: Samples -> Peptides -> Mass Spectrometry. In the resulting spectrum, intensity reports quantity and m/z reports identity.

Information Content in a Single Mass Measurement
[Figure: average number of matching peptides vs. tryptic peptide mass (1000 to 3000 Da), one panel for human and one for S. cerevisiae.]

Identification – Peptide Mass Fingerprinting
- Experiment: Digestion -> MS
- Sequence DB: Pick Protein -> Digestion -> All Peptide Masses (repeat for each protein)
- Compare, Score, Test Significance -> Identified Proteins
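The database side of peptide mass fingerprinting can be sketched in Python. This is a minimal sketch, not ProFound's implementation. It assumes trypsin cleaves after K or R except before P, uses standard monoisotopic residue masses, and compares neutral peptide masses. The function names and the toy protein sequence are made up for illustration.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_digest(protein, missed_cleavages=0):
    """Cleave after K or R, but not before P (a common simplification)."""
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])
    # Join neighbouring pieces to allow for missed cleavage sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(measured, theoretical, tol=0.5):
    """Number of measured masses within tol Da of any theoretical mass."""
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in measured)


# Toy usage: digest a made-up sequence and list its peptide masses.
for pep in tryptic_digest("MKWVTFRPSLLK", missed_cleavages=1):
    print(pep, round(peptide_mass(pep), 4))
```

A real search repeats this for every protein in the sequence database and then scores the number of matches against what random matching would produce.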

ProFound – Search Parameters

ProFound – Protein Identification by Peptide Mapping W. Zhang & B.T. Chait, Analytical Chemistry 72 (2000) 2482-2489

ProFound Results

Peptide Mapping – Mass Accuracy

Peptide Mapping - Database Size
Expectation values for a peptide mapping example, by database searched:
S. cerevisiae   4.8e-7
Fungi           8.4e-6
All Taxa        2.9e-4

Peptide Mapping - Database Size

Missed Cleavage Sites Expectation Values Peptide mapping example:

Peptide Mapping - Partial Modifications
Protein     No modifications searched   Possible modifications searched: phosphorylation (S, T, or Y)
DARPP-32    0.00006                     0.01
CFTR        0.00002                     0.005
Even if the protein is modified, it is usually better to search a protein sequence database without specifying possible modifications when using peptide mapping data.

Peptide Mapping - Ranking by Direct Calculation of the Significance

General Criteria for a Good Protein Identification Algorithm
- The response to random input data should be random.
- Maximum number of correct identifications and minimum number of incorrect identifications for any data set.
- Maximal separation between scores for correct identifications and the distribution of scores for random matching proteins for any data set.
- The statistical significance of the results should be calculated.
- The searches should be fast.

Response to Random Data
[Figure: score distribution for random input data; y-axis: Normalized Frequency.]

Peptide Fragmentation
Instrument layout: Ion Source -> Mass Analyzer 1 -> Fragmentation -> Mass Analyzer 2 -> Detector
Fragment ion series: b and y

Identification – Tandem MS

Interpretation of Mass Spectra
[Figure: MS/MS spectrum annotated step by step over several slides; x-axis m/z (250 to 1000), y-axis % Relative Abundance (0 to 100). Residue labels on the slide: K, L, E, D, F, G, S.]
b ions: 88, 145, 292, 405, 534, 663, 778, 907, 1020 (the series ends at 1166)
y ions: 147, 260, 389, 504, 633, 762, 875, 1022, 1080
The [M+2H]2+ peak is also labeled.
Mass differences highlighted between adjacent peaks: 113 and 129.
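The b and y ion labels on these slides can be reproduced with a short calculation. This is a sketch under stated assumptions: the peptide is taken to be SGFLEEDELK (choosing L for I/L and K for K/Q, which the mass alone cannot distinguish), fragments are singly charged, and standard monoisotopic masses are used. The function name `fragment_ions` is made up for illustration.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return singly charged m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = PROTON                  # b ion: N-terminal residues + proton
    for m in masses[:-1]:
        running += m
        b_ions.append(running)
    running = WATER + PROTON          # y ion: C-terminal residues + water + proton
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running)
    return b_ions, y_ions


b_ions, y_ions = fragment_ions("SGFLEEDELK")  # assumed peptide
print("b:", [round(x, 2) for x in b_ions])
print("y:", [round(x, 2) for x in y_ions])
```

The computed values agree with the nominal labels on the slides to within about half a mass unit.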

De Novo Sequencing
Amino acid masses + mass differences between peaks -> sequences consistent with spectrum
[Figure: the annotated MS/MS spectrum, % Relative Abundance vs. m/z, with the [M+2H]2+ peak labeled.]

De Novo Sequencing

De Novo Sequencing

De Novo Sequencing
Peptide M+H = 1166
Partial sequences read from the ladder, in both directions: …GF(I/L)EEDE(I/L)… and …(I/L)EDEE(I/L)FG…
1166 - 1079 = 87 => S, giving SGF(I/L)EEDE(I/L)…
1166 - 1020 - 18 = 128 => K or Q, giving SGF(I/L)EEDE(I/L)(K/Q)
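The ladder reading on this slide can be sketched as a lookup of mass differences. This is a simplified illustration using nominal (integer) masses and a complete, noise-free b ion series, which real spectra rarely provide. The function name `read_b_ladder` is made up for illustration.

```python
# Nominal residue masses; I and L share 113, K and Q share 128.
NOMINAL = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "(I/L)", 114: "N", 115: "D", 128: "(K/Q)", 129: "E",
    131: "M", 137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def read_b_ladder(b_ions, m_plus_h):
    """Turn a complete singly charged b ion ladder into a sequence string."""
    # b1 is the first residue plus one proton.
    residues = [NOMINAL.get(b_ions[0] - 1, "X")]
    # Each step up the ladder adds one residue.
    for lower, upper in zip(b_ions, b_ions[1:]):
        residues.append(NOMINAL.get(upper - lower, "X"))
    # Last residue: M+H minus the largest b ion minus water (18).
    residues.append(NOMINAL.get(m_plus_h - b_ions[-1] - 18, "X"))
    return "".join(residues)


ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(read_b_ladder(ladder, 1166))
```

Unknown differences are reported as X, which is where neutral losses, modifications and missing peaks would show up in practice.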

De Novo Sequencing
Challenges in de novo sequencing:
- Neutral loss (-H2O, -NH3)
- Modifications
- Background peaks
- Incomplete information

Tandem MS – Database Search
- Experiment: Lysis -> Fractionation -> Digestion -> LC-MS -> MS/MS
- Sequence DB: Pick Protein -> Pick Peptide -> All Fragment Masses (repeat for all peptides, repeat for all proteins)
- Compare, Score, Test Significance

Algorithms

Comparing and Optimizing Algorithms

MS/MS - Parent Mass Error and Enzyme Specificity
Expectation values for an MS/MS example:
Δm = 2, trypsin          2.5e-5
Δm = 100, trypsin        2.5e-5
Δm = 2, non-specific     7.9e-5
Δm = 100, non-specific   1.6e-4

Sequest Cross-correlation

X! Tandem - Search Parameters http://www.thegpm.org/

X! Tandem - Search Parameters

X! Tandem - Search Parameters

Conventional, single stage searching
spectra + sequences -> Generic search engine: test all cleavages, modifications, & mutations for all sequences

Some hard problems in MS/MS analysis in proteomics
- Allowing for unanticipated peptide cleavages
  - e.g., chymotryptic contamination in trypsin
  - calculation order ~ 200 × tryptic cleavage
  - “unfortunate” coefficient
- Determining potential modifications
  - e.g., oxidation, phosphorylation, deamidation
  - calculation order 2^n
  - NP complete
- Detecting point mutations
  - e.g., sequence homology
  - calculation order 18^N
  - NP complete
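The cost of testing potential modifications can be made concrete: with n modifiable residues in a peptide, each site is either modified or not, so there are 2^n combinations to score. The following sketch enumerates them. The function name and the default site list (S, T, Y, as for phosphorylation) are assumptions for illustration.

```python
from itertools import product


def modification_states(peptide, sites="STY"):
    """List every combination of modified positions for one variable modification."""
    positions = [i for i, aa in enumerate(peptide) if aa in sites]
    states = []
    for flags in product((False, True), repeat=len(positions)):
        states.append(tuple(p for p, on in zip(positions, flags) if on))
    return states


for pep in ("SGFLEEDELK", "STYK", "SSTTYYSK"):
    print(pep, len(modification_states(pep)))
```

Each extra modifiable residue doubles the count, and each additional modification type multiplies it again, which is why a single-stage search over all possibilities becomes impractical.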

Multi-stage searching (X! Tandem)
spectra + sequences -> Tryptic cleavage -> Modifications #1 -> Modifications #2 -> Point mutation (the diagram shows a sequences list passed between stages)

Search Results

Search Results

Sequence Annotations

Search Results

Search Results

Identification – Spectrum Library Search
- Experiment: Lysis -> Fractionation -> Digestion -> LC-MS/MS -> MS/MS
- Library: Pick Spectrum (repeat for all spectra)
- Compare, Score, Test Significance -> Identified Proteins

Steps in making an Annotated Spectrum Library (ASL):
1. Find the best 10 spectra for a particular sequence, with the same PTMs and charge.
2. Add the spectra together and normalize the intensity values.
3. Assign a “quality” value: the median expectation value of the 10 spectra used.
4. Record the 20 most intense peaks in the averaged spectrum, its parent ion z, m/z, sequence, protein accessions & quality.
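Steps 2 to 4 can be sketched as follows. This is a hedged illustration, not the GPM code: it assumes the best spectra have already been selected (step 1), bins peaks to 1 m/z unit before adding them, and normalizes to the most intense peak. The function name `build_library_entry` and its parameters are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def build_library_entry(spectra, expectation_values, n_peaks=20, bin_width=1.0):
    """spectra: list of peak lists [(mz, intensity), ...] for one peptide and charge."""
    # Step 2: add the spectra together, bin by bin.
    summed = defaultdict(float)
    for peaks in spectra:
        for mz, intensity in peaks:
            summed[round(mz / bin_width)] += intensity
    # Step 2 (continued): normalize so the most intense peak is 1.0.
    top = max(summed.values())
    normalized = {b: v / top for b, v in summed.items()}
    # Step 4: keep only the most intense peaks.
    strongest = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)[:n_peaks]
    return {
        "peaks": sorted((b * bin_width, v) for b, v in strongest),
        # Step 3: quality is the median expectation value of the spectra used.
        "quality": median(expectation_values),
    }


entry = build_library_entry(
    [[(100.1, 10.0), (200.2, 5.0)], [(100.0, 30.0), (300.4, 1.0)]],
    [1e-3, 1e-5],
    n_peaks=2,
)
print(entry)
```

A full entry would also carry the parent ion charge and m/z, the sequence and the protein accessions listed in step 4.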

Spectrum Library Characteristics – Peptide Length

Spectrum Library Characteristics – Protein Coverage

Identification – Spectrum Library Search
Library spectrum (5:25)
Test spectrum (5:25)
Results: 4 peaks selected, 1 peak missed

Identification – Spectrum Library Search
How likely is this? Apply a hypergeometric probability model:
- 25 possible m/z values;
- 5 peaks in the library spectrum; and
- 4 selected by the test spectrum.
Matches   Probability
1         0.45
2         0.15
3         0.016
4         0.00039
5         0.0000037
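A hypergeometric point probability can be computed directly from binomial coefficients. The sketch below uses the textbook form and assumes the test spectrum also has 5 peaks, as the "(5:25)" label suggests. The slide does not state exactly how its table was parameterized, so the printed values should not be expected to match the table digit for digit. The function name `hypergeom_pmf` is made up for illustration.

```python
from math import comb


def hypergeom_pmf(k, n_possible, n_library, n_test):
    """P(exactly k of the n_test peaks fall on the n_library library positions)."""
    return (comb(n_library, k)
            * comb(n_possible - n_library, n_test - k)
            / comb(n_possible, n_test))


for k in range(6):
    print(k, hypergeom_pmf(k, n_possible=25, n_library=5, n_test=5))
```

Whatever the parameterization, the point of the slide holds: the probability of a chance match falls steeply as the number of matched peaks grows.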

Identification – Spectrum Library Search
What if there are 1000 possible m/z values and 20 peaks in both the test and the library spectrum?
1 matched: p = 0.6
5 matched: p = 0.0002
10 matched: p = 0.0000000000001

X! Hunter

X! Hunter algorithm:
1. Use dot product to find a library spectrum that best matches a test spectrum.
2. Calculate p-value with hypergeometric distribution.
3. Use p-value to calculate expectation value, given the identification parameters.
4. If expectation value is less than the median expectation value of the library spectrum, report the median value.
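Steps 1 and 4 can be sketched as below. This is an illustration only, not the X! Hunter source: spectra are represented as dictionaries of binned m/z to intensity, the dot product is normalized to the range 0 to 1, and all function names are made up. Steps 2 and 3 are left out because the conversion from p-value to expectation value depends on identification parameters the slide does not give.

```python
from math import sqrt


def dot_product(spec_a, spec_b):
    """Normalized dot product of two spectra given as {m/z bin: intensity}."""
    shared = set(spec_a) & set(spec_b)
    numerator = sum(spec_a[b] * spec_b[b] for b in shared)
    norm_a = sqrt(sum(v * v for v in spec_a.values()))
    norm_b = sqrt(sum(v * v for v in spec_b.values()))
    denominator = norm_a * norm_b
    return numerator / denominator if denominator else 0.0


def best_library_match(test_spectrum, library):
    """Step 1: name of the library spectrum with the highest dot product."""
    return max(library, key=lambda name: dot_product(test_spectrum, library[name]))


def reported_expectation(expectation, library_median):
    """Step 4: never report a value better than the library entry's median."""
    return max(expectation, library_median)


library = {"pepA": {100: 1.0, 200: 1.0}, "pepB": {100: 1.0, 300: 1.0}}
query = {100: 1.0, 200: 0.9}
print(best_library_match(query, library))
print(reported_expectation(1e-9, library_median=1e-6))
```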

X! Hunter Result
[Figure: Query Spectrum compared with Library Spectrum.]

Significance Testing
- False protein identification is caused by random matching.
- An objective criterion for testing the significance of protein identification results is necessary.
- The significance of protein identifications can be tested once the distribution of scores for false results is known.

Significance Testing - Expectation Values The majority of sequences in a collection will give a score due to random matching.

Significance Testing - Expectation Values
M/Z -> Database Search -> List of Candidates -> Distribution of Scores for Random and False Identifications -> Extrapolate and Calculate Expectation Values -> List of Candidates With Expectation Values
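The extrapolation step can be sketched as a straight-line fit to the logarithm of the survival counts of the score distribution. This is a simplified illustration under stated assumptions: the candidate scores are taken to be dominated by random matches, only the upper half of the distribution is fitted, and the fit is ordinary least squares. The function name `expectation_value` is made up, and the tail scores are assumed not to be all identical.

```python
from math import log10


def expectation_value(scores, top_score):
    """Extrapolate the expected number of random matches scoring >= top_score."""
    ordered = sorted(scores)
    n = len(ordered)
    start = n // 2                      # fit the upper half of the distribution
    xs = ordered[start:]
    # Survival count: how many candidates score at least as high as each x.
    ys = [log10(n - i) for i in range(start, n)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Extend the fitted line out to the score of interest.
    return 10 ** (intercept + slope * top_score)


random_scores = list(range(100))        # toy stand-in for a score histogram
print(expectation_value(random_scores, 120))
print(expectation_value(random_scores, 200))
```

A candidate whose score lies far beyond the random distribution gets a small expectation value, meaning few random matches would be expected to score that well.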

Homework Explore search parameter space for X! Tandem. Pick a subject for a short presentation next Tuesday from these: http://www.thegpm.org/basics/index.html#201505201.

Protein Sequence Databases

RefSeq (http://www.ncbi.nlm.nih.gov/books/NBK21091/)
Distinguishing features of the RefSeq collection include:
- non-redundancy
- explicitly linked nucleotide and protein sequences
- updates to reflect current knowledge of sequence data and biology
- data validation and format consistency
- ongoing curation by NCBI staff and collaborators, with reviewed records indicated

Ensembl (http://www.ensembl.org/)
- genome information for sequenced chordate genomes
- evidence-based gene sets for all supported species
- large-scale whole genome multiple species alignments across vertebrates
- variation data resources for 17 species
- regulation annotations based on ENCODE and other data sets

UniProt (http://www.uniprot.org/)
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

Species-Centric Consortia
For some organisms, there are consortia that provide high-quality databases:
- Yeast (http://yeastgenome.org/)
- Fly (http://flybase.org/)
- Arabidopsis (http://arabidopsis.org/)

FASTA (http://en.wikipedia.org/wiki/FASTA_format)

RefSeq:
>gi|168693669|ref|NP_001108231.1| zinc finger protein 683 [Homo sapiens]
MKEESAAQLGCCHRPMALGGTGGSLSPSLDFQLFRGDQVFSACRPLPDMVDAHGPSCASWLCPLPLAPGRSALLACLQDL
DLNLCTPQPAPLGTDLQGLQEDALSMKHEPPGLQASSTDDKKFTVKYPQNKDKLGKQPERAGEGAPCPAFSSHNSSSPPP
LQNRKSPSPLAFCPCPPVNSISKELPFLLHAFYPGYPLLLPPPHLFTYGALPSDQCPHLLMLPQDPSYPTMAMPSLLMMV
NELGHPSARWETLLPYPGAFQASGQALPSQARNPGAGAAPTDSPGLERGGMASPAKRVPLSSQTGTAALPYPLKKKNGKI
LYECNICGKSFGQLSNLKVHLRVHSGERPFQCALCQKSFTQLAHLQKHHLVHTGERPHKCSVCHKRFSSSSNLKTHLRLH
SGARPFQCSVCRSRFTQHIHLKLHHRLHAPQPCGLVHTQLPLASLACLAQWHQGALDLMAVASEKHMGYDIDEVKVSSTS
QGKARAVSLSSAGTPLVMGQDQNN

Ensembl:
>ENSMUSP00000131420 pep:known supercontig:NCBIM37:NT_166407:104574:105272:-1 gene:ENSMUSG00000092057 transcript:ENSMUST00000167991
MFSLMKKRRRKSSSNTLRNIVGCRISHCWKEGNEPVTQWKAIVLGQLPTNPSLYLVKYDGIDSIYGQELYSDDRILNLKVL
PPIVVFPQVRDAHLARALVGRAVQQKFERKDGSEVNWRGVVLAQVPIMKDLFYITYKKDPALYAYQLLDDYKEGNLHMIPD
TPPAEERSGGDSDVLIGNWVQYTRKDGSKKFGKVVYQVLDNPSVFFIKFHGDIHIYVYTMVPKILEVEKS

UniProt:
>sp|Q16695|H31T_HUMAN Histone H3.1t OS=Homo sapiens GN=HIST3H3 PE=1 SV=3
MARTKQTARKSTGGKAPRKQLATKVARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLMREIAQDFK
TDLRFQSSAVMALQEACESYLVGLFEDTNLCVIHAKRVTIMPKDIQLARRIRGERA
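Reading FASTA records takes only a few lines. The sketch below is a minimal parser, assuming each record starts with a ">" header line followed by one or more sequence lines. The helper `uniprot_accession` assumes the UniProt-style header layout shown on the slide (db|accession|entry name) and is not meant for RefSeq or Ensembl headers. Both function names are made up for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def uniprot_accession(header):
    """Return the accession from a header of the form 'db|ACCESSION|NAME ...'."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else None


example = """>sp|Q16695|H31T_HUMAN Histone H3.1t OS=Homo sapiens GN=HIST3H3 PE=1 SV=3
MARTKQTARKSTGGKAPRKQLATKVARKSAPATGGVKKPHRYRPGTVALREIRRYQKSTELLIRKLPFQRLMREIAQDFK
TDLRFQSSAVMALQEACESYLVGLFEDTNLCVIHAKRVTIMPKDIQLARRIRGERA
"""

for header, sequence in parse_fasta(example):
    print(uniprot_accession(header), len(sequence))
```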

PEFF - PSI Extended Fasta Format (http://www.psidev.info/node/363)
>sp:P06748 \ID=NPM_HUMAN \Pname=(Nucleophosmin) (NPM) (Nucleolar phosphoprotein B23) (Numatrin) (Nucleolar protein NO38) \NcbiTaxId=9606 \ModRes=(125|MOD:00046)(199|MOD:00047) \Length=294
>sp:P00761 \ID=TRYP_PIG \Pname=(Trypsin precursor) (EC 3.4.21.4) \NcbiTaxId=9823 \Variant=(20|20|V) \Processed=(1|8|PROPEP)(9|231|CHAIN) \Length=231

Sample-specific protein sequence databases
Peptides -> MS, searched against a Protein DB -> Identified and quantified peptides and proteins

Sample-specific protein sequence databases
- Samples -> Next-generation sequencing of the genome and transcriptome -> Sample-specific Protein DB
- Samples -> Peptides -> MS, searched against the Sample-specific Protein DB -> Identified and quantified peptides and proteins

Data Repositories

ProteomeXchange http://www.proteomexchange.org/

PRIDE http://www.ebi.ac.uk/pride/

PeptideAtlas http://www.peptideatlas.org/

Chorus
Key aspects:
- Upload and share raw data with collaborators
- Analyze data with available tools and workflows
- Create projects and experiments
- Select from public files and (re-)analyze/visualize
- Download selected files

MassIVE
Key aspects:
- Upload files (Spectra and Spectrum libraries, Analysis Results, Sequence Databases, Methods and Protocols)
- Perform analysis using available tools
- Browse public datasets
- Download data

The Global Proteome Machine Databases (GPMDB) http://gpmdb.thegpm.org

Comparison with GPMDB Most proteins show very reproducible peptide patterns

Comparison with GPMDB
[Figure: Query Spectrum compared with the best match in GPMDB and with the second match.]

GPMDB Data Crowdsourcing
1. Any lab performs experiments
2. Raw data sent to public repository (TRANCHE, PRIDE)
3. Data imported by GPMDB
4. Data analyzed & accepted/rejected
5. Accepted information loaded into public collection
6. General community uses information and inspects data

Information for including a data set in GPMDB
MS/MS data (required)
- MS raw data files
- ASCII files: mzXML, mzML, MGF, DTA, etc.
- Analysis files: DAT, MSF, BIOML
Sample information (supply if possible)
- Species: human, yeast
- Cell/tissue type & subcellular localization
- Reagents: urea, formic acid, etc.
- Quantitation: SILAC, iTRAQ
- Proteolysis agent: trypsin, Lys-C
Project information (suggested)
- Project name
- Contact information

How to characterize the evidence in GPMDB for a protein?
- High confidence
- Medium confidence
- Low confidence
- No observation

Statistical model for 212 observations of TP53
[Table: one row per peptide, with columns Start, End, N, -2 through -11, Skew and Kurt.]

Statistical model for observations of DNAH2

Statistical model for observations of GRAP2

DNA Repair

DNA Repair

TP53BP1:p, tumor protein p53 binding protein 1

TP53BP1:p, tumor protein p53 binding protein 1

Sequence Annotations

TP53BP1:p, tumor protein p53 binding protein 1

TP53BP1:p, tumor protein p53 binding protein 1

Peptide observations, catalase
Peptide Sequence         Observations
FSTVAGESGSADTVR          2633
FNTANDDNVTQVR            2432
AFYVNVLNEEQR             1722
LVNANGEAVYCK             1701
GPLLVQDVVFTDEMAHFDR      1637
LSQEDPDYGIR              1560
LFAYPDTHR                1499
NLSVEDAAR                1400
FYTEDGNWDLVGNNTPIFFIR    1386
ADVLTTGAGNPVGDK          1338

Peptide frequency (ω), catalase
Peptide Sequence         ω
FSTVAGESGSADTVR          0.08
FNTANDDNVTQVR            0.07
AFYVNVLNEEQR             0.05
LVNANGEAVYCK
GPLLVQDVVFTDEMAHFDR
LSQEDPDYGIR              0.04
LFAYPDTHR
NLSVEDAAR
FYTEDGNWDLVGNNTPIFFIR
ADVLTTGAGNPVGDK

Global frequency of observation (ω), catalase Peptide sequences

Omega (Ω) value for a protein identification
For any set of peptides observed in an experiment and assigned to a particular protein (1 to j):

Protein Ω’s for a set of identifications
Protein ID   Ω values as listed (columns: z=2, z=3)
SERPINB1     0.88  0.82
SNRPD1       0.59
CFL1         0.81  0.87
SNRPE        0.8
PPIA         0.79  0.64
CSTA         0.36
PFN1         0.76  0.61
CAT          0.71  0.78
GLRX         0.66
CALM1        0.62
FABP5        0.57  0.17

Retention Time Distribution

Mass Accuracy

GO Cellular Processes

KEGG Pathways

Open-Source Resources

ProteoWizard http://proteowizard.sourceforge.net

Protein Prospector http://prospector.ucsf.edu/

UCSC Genome Browser http://genome.ucsc.edu/

Slice - Scalable Data Sharing for Remote Mass Informatics
Developed by Manor Askenazi (openslice.fenyolab.org)
Most mass spectrometry data is acquired in discovery mode, meaning that the data is amenable to open-ended analysis as our understanding of the target biochemistry increases. In this sense, mass spectrometry based discovery work is more akin to an astronomical survey, where the full list of object-types being imaged has not yet been fully elucidated, as opposed to e.g. micro-array work, where the list of probes spotted onto the slide is finite and well understood.

Standardization

Standardization - MIAPE

Standardization – MIAPE-MSI

Standardization – XML Formats
- mzML: experimental results obtained by mass spectrometric analysis of biomolecular compounds
- mzIdentML: describes the outputs of proteomics search engines
- TraML: exchange and transmission of transition lists for selected reaction monitoring (SRM) experiments
- mzQuantML: describes the outputs of quantitation software for proteomics
- mzTab: defines a tab-delimited text file format to report proteomics and metabolomics results
- MIF: describes the molecular interaction data exchange format
- GelML: describes the processing and separation of proteins in samples using gel electrophoresis, within a proteomics experiment