In-depth Analysis of Protein Amino Acid Sequence and PTMs with High-resolution Mass Spectrometry
  Lian Yang (2); Baozhen Shan (1); Bin Ma (2)
  (1) Bioinformatics Solutions Inc, Canada
  (2) University of Waterloo, Canada
Protein sequence analysis
  Problem
  Complete protein sequence coverage
    o antibody confirmation
    o biomarker discovery
  Database search software alone is insufficient
Protein sequence analysis
  Possible reasons for incomplete coverage
  “non-database” peptides
    o unexpected modifications
    o mutated residues
    o novel peptides
  database errors
  Meanwhile
  A large number of high-quality spectra are not matched.
Proposed workflow for in-depth analysis
  A workflow to identify both the database and “non-database” peptides
  Objective
    Maximize protein sequence coverage
    Explain more high-quality MS/MS spectra
Proposed workflow for in-depth analysis
  Workflow
  Multiple protein digests with different enzymes
  High-accuracy MS for both precursor and fragment ions
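Why several enzymes help can be illustrated with a small in-silico digestion. The sketch below is not part of the presented software. It assumes simplified cleavage rules (Trypsin after K or R unless followed by P, LysC after K, GluC after E) and no missed cleavages; the `digest` function and the `RULES` table are hypothetical names introduced for illustration.

```python
# Simplified cleavage rules (assumptions for illustration only):
#   enzyme -> (residues cleaved C-terminally, whether a following P blocks cleavage)
RULES = {
    "Trypsin": ("KR", True),
    "LysC": ("K", False),
    "GluC": ("E", False),
}


def digest(sequence, enzyme):
    """Split a protein sequence at every cleavage site (no missed cleavages)."""
    residues, blocked_by_proline = RULES[enzyme]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in residues and not (blocked_by_proline and next_aa == "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that does not end in a cleavage site.
        peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    toy_protein = "AKRPGEK"  # made-up sequence, not BSA
    for enzyme in RULES:
        print(enzyme, digest(toy_protein, enzyme))
```

Each enzyme cuts the same sequence at different positions, so a region that yields peptides too short or too long to identify in one digest may be observable in another.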
Proposed workflow for in-depth analysis
  Workflow
  PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom. 2003;17(20):
  Identify de novo sequence tags
  Reveal a set of high-quality spectra
Proposed workflow for in-depth analysis
  Workflow
  PEAKS DB: De Novo sequencing assisted database search for sensitive and accurate peptide identification. Mol Cell Proteomics 2012; 11: , 1–8.
  Identify database peptides.
  Database search result validated by de novo tags
  Reveal a set of confident proteins
Proposed workflow for in-depth analysis
  Workflow
  PeaksPTM: Mass spectrometry-based identification of peptides with unspecified modifications. Journal of Proteome Research 10.7 (2011) :
  Identify peptides with unexpected modifications
  Peptides from the set of confident proteins are “modified” in silico by trying all possible modifications in UNIMOD.
  The search is sped up by de novo tags
  For input spectra with
    + highly confident de novo tags
    - no significant database matches
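The idea of "modifying" a peptide in silico can be sketched as a precursor-mass check: add each candidate modification's mass shift to the unmodified peptide mass and see whether the result matches the observed mass. This is a minimal sketch, not PeaksPTM itself. It assumes a single modification per peptide, a five-entry table instead of the full UNIMOD list, and a 10 ppm tolerance; `MODS`, `peptide_mass`, and `explain_with_one_mod` are hypothetical names. A real search would also score fragment ions to confirm and localize the modification.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565

# Tiny illustrative subset of modifications: name -> (mass shift in Da, allowed residues).
MODS = {
    "Oxidation": (15.994915, "M"),
    "Deamidation": (0.984016, "NQ"),
    "Carbamidomethylation": (57.021464, "C"),
    "Phosphorylation": (79.966331, "STY"),
    "Acetylation": (42.010565, "K"),
}


def peptide_mass(peptide):
    """Monoisotopic mass of the unmodified, neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def explain_with_one_mod(peptide, observed_mass, tol_ppm=10.0):
    """Return (modification, 1-based position) pairs whose mass shift explains the observed mass."""
    base = peptide_mass(peptide)
    hits = []
    for name, (delta, sites) in MODS.items():
        theoretical = base + delta
        if abs(observed_mass - theoretical) / theoretical * 1e6 <= tol_ppm:
            # The precursor mass fits; list every residue where the modification could sit.
            hits.extend((name, i + 1) for i, aa in enumerate(peptide) if aa in sites)
    return hits


if __name__ == "__main__":
    observed = peptide_mass("AMK") + 15.994915  # toy peptide carrying one oxidation
    print(explain_with_one_mod("AMK", observed))
```

Restricting the candidates to peptides from already confident proteins, as the slide describes, keeps this otherwise very large search space manageable.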
Proposed workflow for in-depth analysis
  Workflow
  SPIDER: software for protein identification from sequence tags with de novo sequencing error. J Bioinform Comput Biol Jun;3(3):
  Identify peptides with mutations, such as residue insertion, deletion, and substitution.
  Screen the protein database to find short sequences similar to de novo tags
  Use both the de novo tags and the database sequence to reconstruct the most probable sequences that match the spectrum
  For input spectra with
    + highly confident de novo tags
    - no significant database matches
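The screening step, finding database segments similar to a de novo tag, can be illustrated with a deliberately simple stand-in. The sketch below slides the tag along a protein and counts mismatches (Hamming distance), treating I and L as identical because they are isobaric. It is not the SPIDER algorithm, which also models de novo sequencing errors, insertions, and deletions; `find_similar` and the example tag are hypothetical.

```python
def find_similar(tag, protein, max_mismatches=2):
    """Return (mismatches, start, segment) for protein windows close to the tag.

    Substitution-only comparison; I and L are treated as the same residue
    because mass spectrometry cannot tell them apart by mass.
    """
    def normalize(seq):
        return seq.replace("I", "L")

    norm_tag, norm_protein = normalize(tag), normalize(protein)
    hits = []
    for start in range(len(norm_protein) - len(norm_tag) + 1):
        window = norm_protein[start:start + len(norm_tag)]
        mismatches = sum(a != b for a, b in zip(norm_tag, window))
        if mismatches <= max_mismatches:
            hits.append((mismatches, start, protein[start:start + len(tag)]))
    return sorted(hits)  # best (fewest mismatches) first


if __name__ == "__main__":
    # Made-up tag with one substitution (E -> D) relative to the protein segment.
    print(find_similar("ALVDLLK", "KKQTALVELLKHK"))
```

A hit with one or two mismatches is only a candidate mutation. In the workflow described above, the final sequence is reconstructed from both the tag and the database segment and must still match the spectrum.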
Proposed workflow for in-depth analysis
  Workflow
  Unassigned de novo sequence tags are reported as possible novel peptides
Proposed workflow for in-depth analysis
  Result integration
In-depth analysis of BSA
  Test the workflow with standard bovine serum albumin (BSA)
  Sample
    Pure ALBU_BOVIN from SIGMA
    3 digests with Trypsin, LysC, GluC.
    LC-MS/MS with Thermo LTQ-Orbitrap XL.
  Workflow
    Workflow implemented in PEAKS 6
    3 digests in one project
    Searched database: Swiss-Prot
  [Diagram labels: Trypsin, LysC, GluC; LC-MS/MS; Workflow]
Result
  More PSMs are identified in each additional step: 1,737 -> 2,687 PSMs (1,737 + 906 + 44 = 2,687)
  [Diagram labels: 5,152 MS/MS spectra; 1,737 PSMs; 906 PSMs; 44 PSMs; 38 MS/MS spectra; Filtered at 1% FDR; PEAKS ALC score > 70%]
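For readers unfamiliar with FDR filtering, the sketch below shows the common target-decoy estimate: rank PSMs by score and keep the lowest score at which decoys divided by targets stays within the limit. This is a generic illustration and an assumption on our part, not a description of how PEAKS computes its FDR; `score_cutoff_at_fdr` and the toy scores are hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest target score whose cumulative decoy/target ratio is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; a higher score means a better match.
    Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            if decoys / targets <= max_fdr:
                cutoff = score  # every PSM scoring at least this much passes the limit
    return cutoff


if __name__ == "__main__":
    # Toy data: ten target hits, then a decoy, then one more low-scoring target.
    toy = [(s, False) for s in range(10, 0, -1)] + [(0.5, True), (0.4, False)]
    print(score_cutoff_at_fdr(toy, max_fdr=0.05))
```

PSMs scoring at or above the returned cutoff would be reported; the rest are discarded.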
Result
  BSA coverage
  The uncovered 4% is in the protein N-terminal region, which is most likely cleaved off and not present in the purchased sample (1).
  (1) specific binding site (Asp-Thr-His-Lys) for Cu(II) ions. T. Peters Jr., F.A. Blumenstock. J. Biol. Chem., 242 (1967), p. 1574
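Sequence coverage, as reported above, is the fraction of protein residues that fall inside at least one identified peptide. A minimal sketch of that calculation follows. It assumes each peptide matches the protein sequence exactly, so modified or mutated peptides would first need to be mapped back to their database positions; `sequence_coverage` and the toy sequence are hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one exactly matching peptide."""
    covered = [False] * len(protein)
    for peptide in peptides:
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    toy_protein = "KKQTALVELLKHK"  # short fragment used only for illustration
    print(f"{sequence_coverage(toy_protein, ['QTALVELLK', 'HK']):.0%}")
```

Overlapping peptides from different digests count each residue only once, which is why additional enzymes raise coverage only where they reach residues not already covered.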
Result
  Contaminants
  Identified with at least 3 unique peptides.
    – Human keratin proteins (K2C1_HUMAN and K1C_HUMAN)
    – Bacterial protein (SSPA_STAAR)
    – Trypsin (TRY1_BOVIN)
Result
  PTMs
  Unsuspected modifications identified by PTM search
    – Three PTMs specified in database search
      » Carbamidomethylation (C)
      » Oxidation (M)
      » Deamidation (NQ)
Result
  Mutation
  214th amino acid: A -> T
  Brown 1975, Fed. Proc. 34:591
Result
  Unexplained de novo tags
  Might be…
    – Novel peptides outside of the searched database
  KK.QTALVELLK.HK
       |||||||
     DPALVELLKK
Summary
  A software workflow proposed for in-depth protein sequence analysis
  Found many things in a “pure” sample
    – Contaminants
    – Unsuspected PTMs
    – Mutations
  Improved protein sequence coverage
    – BSA coverage: 87% -> 96%
  Explained more high-quality MS/MS spectra
    – Identified MS/MS spectra: 1,737 -> 2,687
Q / A