Protein Sequencing Research Group (PSRG): Results of the PSRG 2011 Study: Sensitivity Assessment of Edman and Mass Spectrometric Terminal Sequencing of an Undisclosed Protein

H.A. Remmer1, J.S. Smith2, W. Sandoval3, B. Xiang4, K. Mawuenyega5, D. Suckau6, V. Katta3, J.J. Walters7, P. Hunziker8

1University of Michigan, Ann Arbor, MI, United States; 2University of Texas Medical Branch, Galveston, TX, United States; 3Genentech, Inc., South San Francisco, CA, United States; 4Monsanto Company, St. Louis, MO, United States; 5Washington University School of Medicine, St. Louis, MO, United States; 6Bruker Daltonics, Bremen, Germany; 7Sigma-Aldrich, St. Louis, MO, United States; 8University of Zurich, Zurich, Switzerland

INTRODUCTION

Establishing the N-terminal sequence of intact proteins plays a critical role in biochemistry and drug development. Edman degradation and top-down and bottom-up mass spectrometry methods for N-terminal sequence analysis have been used for that task. In this study, we set out to determine how well these sequencing techniques cope with various sample formats and to assess their sensitivity.

For the 2011 study, the PSRG distributed three kinds of sample sets (designated A, B or C) of 3 tubes each. Each tube contained the same artificial recombinant (unknown) protein in varying amounts and formats (see table below). Participants chose which of the three sample sets, or which combination of sets, they would like to receive. Participants were given the following information:
(a) the protein MW is ~52 kDa;
(b) the sequence is NOT in a public database;
(c) tube 1, with the lowest sample amount, contains ~5 pmol protein in the selected format;
(d) a co-purified E. coli protein of <20 kDa may be present in Sample Set A but is of no interest to the current study;
(e) the samples in Sample Set A are soluble in 0.1% TFA, 0.1% TFA/20% acetonitrile or 25 mM ammonium bicarbonate (AMBIC).

Study participants were directed to a website to anonymously upload sequences and supporting data.
The analysis of the results of the 2011 study focuses on the length and accuracy of the sequence calls as a function of increasing amounts of protein. A total of 38 participants requested 74 sample sets.

Study Results: Edman Sequencing

Table columns: Sample; Participant; Instrument; Solvent; Initial yield; Rep. Yield; sequence calls at residue positions 1-42.
N-terminal Sequence: G A L R V F D E K P Q N I

Sample A1: Participant 004; Procise 494HT; 0.1% TFA/30% IPA; 2.8; 93.60%; X; Participant 024; Procise 494; 0.1% TFA/20% ACCN; 0.7; 95.60%; Participant 058; 0.1% TFA; na; note 1; PSRG002; 0.1% TFA/50% ACCN; 3.7; 91.30%
Sample A2: Participant 020; 1.2; 86.10%; 9.4; 94.80%
Sample A3: 7.8; 98.60%; 29.3; note 2
Sample B2
Sample C1: Participant 006; Participant 014; Procise HT; 0.5; 94.60%; Participant 016; Procise 494cLC; 95.80%; Participant 036; 0.6; 90.80%; Participant 040; Procise; 1.7; PSRG001; 93.00%; 2.3; 88.00%; g
Sample C2: r; f; 96.40%; 3.5; 95.50%; k; e; q; n; l; i; 1.4; 99.20%; 92.70%; 4.2; 95.70%; 92.20%
Sample C3: p; 10.8; 97.70%; 4.8; 11.5; 93.50%; 18.1; 96.30%; 5.5; 96.60%; H; 11.3; 89.80%

Note 1: no sequence detected. Participant suspects the sample is not soluble in 0.1% TFA.
Note 2: a total of 50 amino acid residues were sequenced.
Legend: a correct N-terminal call is color coded; no call is marked with "X"; a wrong call is denoted by a letter that is not color coded; a tentative call is denoted by a lower-case letter.

STUDY METHODS:

The PSRG prepared the 3 sample sets for distribution as follows: The study protein (95% purity by SEC) was dissolved in 50% acetonitrile/0.1% TFA and lyophilized, and the protein content was determined by amino acid analysis (AAA). The sample was then aliquoted based on protein content to achieve the desired amounts (5 pmol, 15 pmol and 45 pmol, respectively). Samples A were lyophilized; samples B and C were subjected to SDS-PAGE (B) and subsequent electroblotting (C). Test analyses for validation revealed the presence of contaminating proteins, which was acknowledged and considered to mimic a client sample in a core facility setting.
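As a rough guide to what the nominal amounts above mean in terms of mass, the sketch below converts pmol to ng. It assumes the approximate MW of ~52 kDa given to participants; the function and variable names are illustrative only and are not part of the study protocol.

```python
# Approximate molecular weight communicated to participants (~52 kDa).
# This is an assumption for illustration, not an exact mass.
PROTEIN_MW_DA = 52_000.0


def pmol_to_ng(amount_pmol: float, mw_da: float = PROTEIN_MW_DA) -> float:
    """Convert a molar amount in pmol to a mass in ng.

    1 pmol of a molecule with a molar mass of M g/mol weighs M pg,
    which is M / 1000 ng.
    """
    return amount_pmol * mw_da / 1000.0


# Nominal amounts named in the study methods: 5, 15 and 45 pmol.
nominal_amounts_pmol = [5.0, 15.0, 45.0]
nominal_masses_ng = [pmol_to_ng(p) for p in nominal_amounts_pmol]

for pmol, ng in zip(nominal_amounts_pmol, nominal_masses_ng):
    print(f"{pmol:g} pmol of a ~52 kDa protein is about {ng:.0f} ng")
```

Under this assumption, the lowest nominal amount of 5 pmol corresponds to roughly a quarter of a microgram of protein.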
The validation analysis by ISD was performed on an UltrafleXtreme MALDI-TOF/TOF instrument after the samples had been shipped and showed that much less protein was available for analysis than anticipated from the original protein quantification. Participants received instructions for dissolving the samples in set A. However, valid ISD data were only obtained for a nominal 100 pmol of the sample. The participants were asked to use their code number to report their data in Survey Monkey (www.surveymonkey.com).

TYPICAL PARTICIPANT METHODS

Edman Degradation: Most participants performed the analysis on a Procise 494HT sequencer using standard reagents and protocols. The majority of participants used the sample as provided. For sample set C, the PVDF membrane was loaded directly onto the instrument; for set A, the sample was dissolved in 0.1% TFA containing 20%-50% acetonitrile and applied to a ProSorb filter. Initial yields and repetitive yields were reported (see table).

Bottom-up MS Method: Sample sets A and B were used for this analysis. Samples A were dissolved in ammonium bicarbonate and digested, usually with trypsin and 1-2 additional enzymes. The analysis was mostly performed on an LTQ or LTQ Orbitrap, and the MS/MS data were subjected to a database search using Thermo Proteome Discoverer, or manual de novo interpretation and Mascot searches were performed.

Top Down MS Method: The majority of participants used an Ultraflex MALDI-TOF/TOF instrument and performed in-source decay (ISD) with 1,5-diaminonaphthalene (DAN) or 2,5-dihydroxybenzoic acid (DHB) as matrix.
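The poster reports initial and repetitive yields for the Edman runs without stating how they were computed. A common approach, sketched below under that assumption, is a linear least-squares fit of ln(PTH-amino acid yield) against cycle number: the repetitive yield is exp(slope), and the initial yield is the fitted value extrapolated to cycle 1. Conventions differ between instruments and labs (for example, extrapolating to cycle 0 instead), so this may not match what the participants' software did. The function name and the example data are hypothetical and are not study data.

```python
import math


def fit_edman_yields(cycles, pth_pmol):
    """Fit ln(yield) = intercept + slope * cycle by ordinary least squares.

    Returns (initial_yield_pmol, repetitive_yield), where the initial yield
    is the fitted value at cycle 1 and the repetitive yield is the
    fractional yield retained from one cycle to the next.
    """
    if len(cycles) != len(pth_pmol) or len(cycles) < 2:
        raise ValueError("need at least two (cycle, yield) pairs")
    xs = [float(c) for c in cycles]
    ys = [math.log(y) for y in pth_pmol]  # yields must be positive
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("cycle numbers must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    repetitive_yield = math.exp(slope)
    initial_yield = math.exp(intercept + slope)  # extrapolated to cycle 1
    return initial_yield, repetitive_yield


# Hypothetical, synthetic data: 3.0 pmol at cycle 1, decaying by 5% per cycle.
example_cycles = [2, 3, 5, 6, 9]
example_pmol = [3.0 * 0.95 ** (c - 1) for c in example_cycles]

initial, repetitive = fit_edman_yields(example_cycles, example_pmol)
print(f"initial yield ~ {initial:.2f} pmol, repetitive yield ~ {repetitive:.1%}")
```

In practice only residues that are stable and well recovered as PTH derivatives are usually included in such a fit, which is why the example uses a non-contiguous set of cycles.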
Study Results: Bottom-Up Sequencing

Table columns: Sample; Participant; Sample Processing; LC; MS; Terminal sequence (de novo) at N-terminal residue positions 1-27 and C-terminal residue positions 440-465.
N-terminal Sequence: G A L R V F D E K P Q N I
C-terminal Sequence: T Y H

Sample A1: Participant #040; 10 mM AmBiC, Trypsin and Glu-C; N/A; 4700 Proteomics Analyzer; N-terminus: note 1; C-terminus: X
Sample A2: Participant #048; Trypsin and Chymotrypsin; Not provided; LTQ Orbitrap Velos ETD; Participant #034; Trypsin, Glu-C and Lys-C; Ultraflex TOF/TOF; PSRG003; 100 mM AmBiC, Lys-C and Lys-N; Eksigent NanoLC-2D; LTQ MS; note 2
Samples B1, B2, B3: Participant #048*; Ac-M; M; Participant #026; Trypsin

* Participant #048 sequenced more than 200 amino acids by manual spectra interpretation.
Note 1: Participant 040 also sequenced by Edman degradation and had the opportunity to search the MS/MS data for the correct N-terminal peptide.
Note 2: Participant PSRG003 used Lys-C and Lys-N in combination according to a published procedure for N-terminal sequencing (see reference section).
Legend: correct N-terminal and C-terminal calls are color coded; no call is marked with "X"; an incorrect call is denoted by a letter that is not color coded.

CONCLUSION

Edman degradation was successfully employed in this study to obtain N-terminal sequence information for an unknown protein not present in public databases, independent of the sample format. The most frequently selected sample format was the PVDF membrane, followed by the lyophilized sample. A slight dependency of read length on sample amount was found, but the variation within each group was much larger. Bottom-up analysis of the study samples typically yielded sequences of another protein; however, the correct sequence was called as well. One participant also called the 70 C-terminal residues. In this study, top-down sequencing by MALDI-ISD was attempted on samples A without any success.
Investigation of the sample by the PSRG showed that the protein amount in samples A (lyophilized) accessible to the analysis was only ~5% of the amount determined by AAA, potentially due to poor solubility. De novo sequences could only be retrieved from much larger amounts of sample A than were distributed, and several bacterial heat shock proteins (15-16 kDa range) were identified in that sample after LC protein separation. Taken together, Edman sequencing demonstrated that its strict dependency on the sample material, in particular when applied to a membrane after SDS-PAGE, allows it to operate quite robustly and reliably. All mass spectrometric methods, if not strictly linked to an intact protein MW, can easily identify "non-target" sequences. Here the solubility and homogeneity of the sample play a much greater role, in particular for the top-down approaches, which have the highest requirements for sample amount and quality; this should be particularly recognized in future studies.

REFERENCE:

T. Kishimoto, J. Kondo, T. Takako-Igarashi and H. Tanaka. A novel method for analyzing protein terminals. Poster presented at the ASMS conference, Salt Lake City, 2010.

Study Results: Top-Down Sequencing

Table columns: Sample; Participant; Instrument; Matrix; Methods; Sample Prep; Results.

Sample A:
Participant 016; UltraFlex III; DHB, DAN; MALDI-ISD; used sample as provided
Participant 028; Ultraflex II; Flex control; Intact MW, ISD; C4 Zip Tip, eluted with 75% ACN, 0.1% TFA
Participant 002; Information not provided; no details provided; Intact MW
Participant 034; Ultraflex TOF/TOF; DAN; ISD
PSRG001; 4800 MALDI-TOF/TOF; ISD/T3; Cl-MeOH precip. Reconst. in 0.1% TFA

Results: None of the participants were able to call an N-terminal or C-terminal sequence when analyzing sample set A. Investigation of the sample by the PSRG showed that the protein amount in samples A (lyophilized) accessible to the analysis was significantly less than the amount determined by AAA, due to poor solubility of the sample in purely aqueous solvents. The validation analysis by ISD was performed on an UltrafleXtreme MALDI-TOF/TOF instrument after the samples had been shipped. Participants received instructions for dissolving the samples in set A. However, valid ISD data were only obtained for a nominal 100 pmol of the sample after LC purification.

ACKNOWLEDGEMENTS

Dr. Robert English (University of Texas Medical Branch) for accumulation and anonymization of the data; Sigma-Aldrich for donation of the study sample; the Executive Board of the ABRF for support and scrutiny of the study proposal; Dr. Jack Simpson (National Cancer Institute, Frederick, MD) for serving as liaison to the ABRF Executive Board; and the participating labs for analyzing the samples and returning data.