Protein Sequencing Research Group: Results of the PSRG 2012 Study
Terminal Sequencing of Standard Proteins in a Mixture (Year 1 of the 2-Year Study)

Current PSRG Members
Henriette Remmer (Co-Chair), University of Michigan
Jim Walters (Co-Chair), Sigma-Aldrich
Robert English*, University of Texas Medical Branch
Pegah Jalili*, Sigma-Aldrich
Viswanatham Katta, Genentech, Inc.
Kwasi Mawuenyega, Washington University School of Medicine
Detlev Suckau, Bruker Daltonics
Bosong Xiang, Monsanto Co.
Jack Simpson (EB liaison), United States Pharmacopeia
* New members added in 2011

PSRG 2012/13: Study Background and Design
Status of terminal sequencing:
• The field is in the midst of a technology transition from classical Edman sequencing to mass spectrometry (MS)-based sequencing.
• Both techniques have distinct strengths and weaknesses, and both have a role in biochemical research.
• Recognizing their complementary roles, we set out to push the capabilities of the various sequencing techniques, specifically terminal sequencing of proteins in a mixture.
Concept of the 2012 study, terminal sequencing of proteins in a mixture:
• Sequencing proteins in a mixture requires separating the proteins before analysis.
• Edman sequencing: SDS-PAGE and electroblotting before analysis; well established in most core facilities.
• MS-based sequencing: LC separation is necessary before analysis; not well established in most core facilities.
=> The PSRG designed a 2-year study:
YEAR 1: Terminal sequencing and identification of three separated standard proteins
YEAR 2: The same three proteins distributed, this time as a mixture

PSRG 2012 Year 1: Study Objective
To obtain N-terminal sequence information on three standard proteins supplied as separated samples.

2012 Study Design: The Samples
• Participants were asked to analyze the samples for terminal sequencing using any technology available.
• Participants received all three proteins, with their identities, in amounts sufficient to sequence each protein by all three technologies (Edman, top-down MS, bottom-up MS). Feasibility of the analysis had been validated by PSRG members.
• Participants also filled out a survey; all responses were kept anonymous.
Samples provided (protein: amount provided; N-terminally blocked?; comments):
• BSA: 1 mg; not blocked; reference protein/calibrant
• Protein A: 3 x 100 pmol; blocked; fusion protein with blocked N-terminus
• Endostatin: 3 x 100 pmol; not blocked; contains two N-terminal variants

Participation and Survey Results
• 25 laboratories from 12 countries requested samples for Edman sequencing, and most of them (23) also for MS sequencing.
• 14 of the 25 participating laboratories (56%) completed the survey.
• Of these 14 labs, 7 used Edman sequencing, 6 top-down MS, and 6 bottom-up MS.
Results by protein (14 respondents):
• BSA (reference protein): 9 labs analyzed it; 8 correctly determined the N-terminus.
• Protein A: 13 labs analyzed it; 5 correctly determined the N-terminus.
• Endostatin: 14 labs analyzed it; 12 correctly determined the N-terminus, but only 7 identified the presence of the second N-terminus.
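The survey counts can be turned into per-protein success rates for a quick comparison. A minimal Python sketch using only the numbers reported on this slide; the dictionary layout and the helper name `success_rates` are illustrative choices, not part of the PSRG analysis.

```python
# Counts reported in the PSRG 2012 survey (14 responding labs).
# Each entry: (labs that analyzed the protein, labs with the correct result)
SURVEY = {
    "BSA": (9, 8),
    "Protein A": (13, 5),
    "Endostatin (N-terminus)": (14, 12),
    "Endostatin (second N-terminus)": (14, 7),
}


def success_rates(counts):
    """Fraction of analyzing labs that reported the correct result."""
    return {name: correct / analyzed for name, (analyzed, correct) in counts.items()}


if __name__ == "__main__":
    print(f"Survey response rate: {14 / 25:.0%}")  # 14 of 25 labs
    for name, rate in success_rates(SURVEY).items():
        print(f"{name}: {rate:.0%}")
```

On these counts the blocked fusion protein (Protein A, about 38%) stands well below the BSA reference (about 89%).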

Survey Response Results

Purification and separation method before analysis

N-Terminal Techniques: Edman Degradation

Edman Workflows for the PSRG 2012 Samples
• Sample handling: sample used as provided (5); SDS-PAGE followed by blotting on PVDF (2); blotting on PVDF (1)
• Instruments: ABI Procise HT, ABI Procise 492 cLC, Shimadzu PPSQ-33A

Edman Sequencing of Protein A (Lab C10)
Protein A: fusion protein, N-terminus blocked
• Support: Polybrene-precycled glass fiber filters
• Instrument: Applied Biosystems (ABI) Procise Model 494HT
• De-blocking: PGAP; amount: 100 pmol
• Sequence 1: (Met) LRPVETP; C10 call: LRPVETP

Edman Sequencing of Endostatin (Lab A00)
• Sample blotted on PVDF (H2O with 0.1% TFA); instrument: Shimadzu PPSQ-33A
• Probability 1: position 4, proline or arginine
• Probability 2: position 7, histidine or glutamine
• Initial yield: … %; repetitive yield: … %
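The initial and repetitive yields on this slide are linked by the usual simplified Edman model: the amount recovered at cycle n is IY × RY^(n−1), so RY can be estimated by a straight-line fit of log(amount) against cycle number. A minimal Python sketch under that assumption; the 50 pmol and 95% values are hypothetical, not lab A00's data, and real runs also have lag and background that this model ignores.

```python
import math


def expected_yield(initial_yield, repetitive_yield, cycle):
    """Amount expected at a 1-based Edman cycle under a constant per-cycle yield."""
    return initial_yield * repetitive_yield ** (cycle - 1)


def fit_yields(cycles, amounts):
    """Least-squares estimate of (IY, RY) from log(amount) = log(IY) + (cycle-1)*log(RY)."""
    xs = [c - 1 for c in cycles]
    ys = [math.log(a) for a in amounts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


if __name__ == "__main__":
    # Hypothetical run: 50 pmol initial yield, 95 % repetitive yield.
    cycles = list(range(1, 16))
    amounts = [expected_yield(50.0, 0.95, c) for c in cycles]
    iy, ry = fit_yields(cycles, amounts)
    print(f"IY = {iy:.1f} pmol, RY = {ry:.1%}")
    print(f"Cycle 15 signal: {amounts[-1]:.1f} pmol")
```

With these assumed values the signal at cycle 15 is roughly half the initial yield; a lower repetitive yield shortens the usable readout accordingly.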

Edman Sequencing of Endostatin (Lab A00), continued
• Sequence 1: DFQPVLHLVALNSPL (A00 / Variant 1: DFQPVLHLVALNSPL)
• Sequence 2: HSHRDFQPVLHLVAL (A00 / Variant 2: R, Q)
• Sequence verification with BLASTP; information about the sequence from the SwissProt output
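Sequence 2 is sequence 1 extended by four residues (HSHR) at the N-terminus, so every Edman cycle returns two PTH signals. A small Python sketch that finds this offset and lists the residue pair expected at each cycle; the function names are illustrative, and the sequences are the two A00 reads above.

```python
def n_terminal_extension(short_seq, long_seq, min_overlap=5):
    """Smallest k such that long_seq[k:] starts with short_seq's first residues.

    At least `min_overlap` residues must agree; returns None if no offset fits.
    """
    for k in range(len(long_seq) - min_overlap + 1):
        overlap = min(len(short_seq), len(long_seq) - k)
        if overlap >= min_overlap and long_seq[k:k + overlap] == short_seq[:overlap]:
            return k
    return None


def paired_calls(seq_a, seq_b):
    """Residue pair expected at each Edman cycle when both chains are present."""
    return list(zip(seq_a, seq_b))


seq1 = "DFQPVLHLVALNSPL"  # Endostatin sequence 1 (lab A00)
seq2 = "HSHRDFQPVLHLVAL"  # Endostatin sequence 2 (lab A00)

offset = n_terminal_extension(seq1, seq2)
print(f"Sequence 2 carries {offset} extra N-terminal residues: {seq2[:offset]}")
for cycle, (a, b) in enumerate(paired_calls(seq1, seq2), start=1):
    print(f"cycle {cycle:2d}: {a} / {b}")
```

Cycles 4 and 7 give P/R and H/Q, the two positions flagged for lab A00.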

Summary of N-Terminal Sequencing Results (lab ID: amino acid sequence; X = residue not assigned)
BSA
Y20: DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQXPFDEHVKLVN
C10: DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY
N32: DTHKSEIAHRFKDLGEEHFKGLVLIA
A00: DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY
Protein A
Y20: FLRPVETPTREIKKLDGLAQHDEAQQNAFYQVLNMPN
Y20M: FLRPVETPT
C10: LRPVETPTREIKKLDGLAQHDEAQQNAFYQVL
N32: XLRPVETPXREIKKL
A00: MLRPVETPTREIKKLDGL
S10: XLRPVETPTREIKKLDGLAQHDEAQQNA
V00: FLRPVETPTREIKKLDGLAQHDEAQQNAFYQVLNMPN
Endostatin, sequence 1
Y20: DFQPVLHLVALNSPLSGGMRGIRGADFQXFQQA
C10: DFQPVLHLVALNSPLSGGMRGIRGADFQCFQQAR
E20: DFQPVLHLVALNSPLSGGMRGIRGADFQCFQQARAVGLAGT
N32: DFQPVLHLVALNSPLSGGMRGI
A00: DFQPVLHLVALNSPL
S10: DFQPVLHLVALNSPLSGGMRG
Endostatin, sequence 2
Y20: HSHRDFQP
C10: HSHRDFQPXLHXXALNXXXSGGM
E20: HSHRDFQPVLHLVALNSPLSGGMRGIRGADFQC
N32: HSHRDFQPVXHXVALNS
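One way to compare the submitted calls is to score each read position by position against a reference: read length, agreeing residues, unassigned cycles (X) and disagreements. A minimal Python sketch using the BSA sequence from the Sample A slide (DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNELTEF) as the reference; the scoring scheme and the `ReadScore` type are illustrative assumptions, not the PSRG's evaluation criteria.

```python
from dataclasses import dataclass

# BSA N-terminal sequence as shown on the Sample A slide
BSA_REFERENCE = "DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVKLVNELTEF"


@dataclass
class ReadScore:
    length: int
    matches: int
    unassigned: int
    mismatches: int


def score_read(call, reference):
    """Compare an N-terminal read to the reference, position by position.

    'X' counts as an unassigned cycle; residues past the end of the
    reference count as mismatches.
    """
    matches = unassigned = mismatches = 0
    for i, residue in enumerate(call):
        if residue == "X":
            unassigned += 1
        elif i < len(reference) and residue == reference[i]:
            matches += 1
        else:
            mismatches += 1
    return ReadScore(len(call), matches, unassigned, mismatches)


# BSA calls from the summary table
BSA_CALLS = {
    "Y20": "DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQXPFDEHVKLVN",
    "C10": "DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY",
    "N32": "DTHKSEIAHRFKDLGEEHFKGLVLIA",
    "A00": "DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY",
}

for lab, call in BSA_CALLS.items():
    print(lab, score_read(call, BSA_REFERENCE))
```

The same function applies to the Protein A and Endostatin rows once a reference sequence is chosen for each.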

PSRG 2012 Edman Conclusions and Observations
• All labs returned N-terminal data that correlate well with the published protein sequences.
• Edman sequencing produced data both with and without prior separation (SDS-PAGE or chromatography).
• No C-terminal data were produced with Edman.
• If the protein is N-terminally blocked, the reaction will not proceed; this holds for most, but not all, blocking modifications.
• The reagents for Edman sequencing are very expensive.
• Edman sequencing allows direct determination of the protein's N-terminal sequence.

N-Terminal Techniques Overview: MS Techniques

Mass Spectrometry Methods Used
Top-down sequencing (no digests):
• ISD, T³: AB Sciex 4800 MALDI-TOF/TOF MS
• ISD, T³: Bruker Ultraflex MALDI-TOF/TOF MS
• ETD, CID: Bruker maXis 4G UHR-QTOF
Bottom-up MS/MS (digests):
• MALDI-TOF/TOFs: AB Sciex, Bruker
• ESI-Orbitrap: Thermo
Only top-down N-terminal results were returned; some participants used bottom-up MS as a validation step.

Top-Down Experimental Setups
• Sample: as provided; solvents included 0.1% TFA, MeOH/H2O/HOAc, 6 M GndHCl, and various organic/H2O/acid mixtures
• Separation/introduction: HPLC (Agilent 1200), direct infusion (Triversa Nanomate)
• Instruments: AB Sciex 4800, Bruker Ultraflex, Bruker UltrafleXtreme, Bruker Autoflex speed, Bruker maXis 4G
• Fragmentation: ISD/T³, ISD, ETD, CID

Software Used for Top-Down MS Analysis
• BioTools 3.2 (Bruker): sequence tags, automatic de novo sequencing, triggering Mascot top-down (TD) searches, result visualization, terminal assignments, TD report generation
• Mascot 2.3 (Matrix Science): TD and bottom-up (BU) database searches
• BLAST/MS-BLAST (NIH, Harvard/EMBL): protein identification based on sequence tags
• ISDetect (Genentech; Y. Gan et al., in prep.): sequence tags, semi-automatic de novo sequencing, result visualization

The Top-Down MS Standard Analysis Strategy
1. MW determination: check sample quality (and final QC)
2. ETD/ISD: obtain internal sequence tags
3. Identify the protein, e.g., by a Mascot search
4. Extend the sequence toward the N-terminus (and likewise the C-terminus)
5. Compare with the obtained protein sequences (including PTMs) by:
• T³-sequencing, i.e., MS/MS analysis of MALDI-ISD fragments
• Edman sequencing
Problems: unknown terminal modifications and fusion proteins (Protein A), ragged ends (Endostatin)
Example (BSA N-terminus): DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCP

BSA ISD Spectrum in DAN (1,5-diaminonaphthalene) Matrix (PSRG123)
BSA is a good calibrant for ISD spectra.

Sample A: BSA, ISD + Edman (Lab C10), Following the Basic Strategy
BSA sequence accession number: AAI02743
• c-ions in the MALDI-ISD spectrum revealed the sequence from Arg10 to Tyr30.
• Edman sequencing provided Asp1 to Gly15.
• Combining the data from these orthogonal methods yielded 30 residues of BSA sequence.
Final sequence obtained for BSA: DTHKSEIAH RFKDLGEEHF KGLVLIAFSQ YLQQCPFDEH VKLVNELTEF…
Figure legend: coverage by Edman, coverage by MALDI-ISD, coverage by both
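Combining the two orthogonal readouts is an interval union: Edman residues 1 to 15 and ISD residues 10 to 30 merge into one continuous stretch, with residues 10 to 15 confirmed by both. A small Python sketch of that bookkeeping; the helper names are illustrative.

```python
def merge_coverage(*ranges):
    """Union of inclusive (start, end) residue ranges; touching ranges are joined."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap(a, b):
    """Inclusive overlap of two ranges, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


edman = (1, 15)  # Asp1-Gly15 by Edman (lab C10)
isd = (10, 30)   # Arg10-Tyr30 by MALDI-ISD c-ions
combined = merge_coverage(edman, isd)
residues = sum(end - start + 1 for start, end in combined)
print(f"combined {combined}, {residues} residues; confirmed by both: {overlap(edman, isd)}")
```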

Sample B: Endostatin (donated by Sigma)
Issues: ragged N-terminus; C-terminal loss of Lys (the C-terminal K is excised)

Endostatin (Lab L36): Annotated ISD Spectrum from On/Off Gradient
Figure annotation: interfering component

Endostatin (Lab L36): HPLC Chromatogram
• Separation of two variants (fractions F1 and F2); ISD of F1; F2 not assigned
• Recovery from the Endostatin sample may be lower than 100 pmol (compared with a 100 pmol myoglobin standard).
• LC separation detected the protein heterogeneity and removed polymeric contamination, but reduced the sample amount and the readout length.

UHR-QTOF MS Analysis of Endostatin: 2 Components (Lab Z10)
In contrast to MALDI-ISD, the QTOF-ETD analysis takes place after precursor ion selection.

ETD Analysis of Endostatin, First Precursor: Mascot Database Search Result (Lab Z10)
The simplest use of top-down data: a Mascot search

TDS Analysis of Endostatin, First Precursor: Deconvoluted and Annotated ETD Spectrum (Lab Z10)
Annotated ions include c2, c9, and c26.

TDS Analysis of Endostatin, First Precursor: Mass Accuracy of the Intact Protein (Lab Z10)
Measured monoisotopic mass: …; theoretical monoisotopic mass: …; mass error: 3.2 ppm
Figure: measured spectrum (black) vs. simulated spectrum (red)
The precise MW confirms the correct N-terminus and the C-terminal loss of lysine.
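The mass-accuracy argument reduces to two small calculations: the relative error in ppm, and matching an observed mass difference against candidate modifications. A Python sketch; the candidate list uses standard monoisotopic increments (loss of a Lys residue, N-terminal methylation as +CH2), while the example masses are hypothetical, since the slide's measured and theoretical values are not given here.

```python
# Standard monoisotopic mass changes (Da); the candidate list is illustrative
DELTAS = {
    "loss of C-terminal Lys residue": -128.09496,
    "methylation (+CH2)": 14.01565,
}


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def explain_delta(observed_delta, tolerance=0.01):
    """Candidate modifications whose mass change lies within `tolerance` Da."""
    return [name for name, d in DELTAS.items() if abs(observed_delta - d) <= tolerance]


if __name__ == "__main__":
    # Hypothetical masses, for illustration only
    theoretical = 20000.0
    measured = 20000.05
    print(f"mass error: {ppm_error(measured, theoretical):.1f} ppm")
    print(explain_delta(-128.095))
```

At intact-protein masses a few ppm corresponds to a few hundredths of a dalton, which is why the precise MW can separate a missing Lys residue from other explanations in this short list.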

Endostatin: TDS Sequence 1 (PSRG123)

Endostatin: TDS Sequence 2 (PSRG123)
If ISD spectral quality is good, both sequences can be read directly, and the N- and C-termini can be assigned from THE SAME SPECTRUM.

Recombinant Protein A (donated by Repligen)
Issues: N-terminal methylation; fusion site after residue 18 (E. coli β-glucuronidase followed by SPA_STAAU)
The C-terminal sequence does not match the intact MW (a nice challenge for top-down MS in the future).

ISD Spectrum of Protein A (DAN Matrix, Lab E20): Manual Sequence Generation
Residue labels in the figure: T R E I/L K/Q I/L D G K/Q A H D E A

Protein A Identification (Lab E20)
The ISD spectrum of Sample #2 (Protein A) was interpreted manually by sequential subtraction of ion masses. The resulting sequence tag, TRE[IL][KQ][KQ][IL]DG[IL]A[KQ], was searched by BLAST against the Dayhoff public database; only two sequences matched. Homology searching with the N-terminal tag provided (a) β-glucuronidase, (b) its N-terminally extended sequence, and (c) a mass offset indicating N-terminal methylation.
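The manual interpretation described here, sequential subtraction of neighboring ion masses, can be sketched in Python: each gap between consecutive ladder ions is matched to a residue mass, and residues that fall within the tolerance of the same gap are written as a group, as in the [IL] and [KQ] notation above. Residue masses are standard monoisotopic values; the 0.05 Da tolerance and the synthetic ladder are assumptions for illustration, not lab E20's spectrum.

```python
# Monoisotopic residue masses (Da)
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_tag(ladder, tolerance=0.05):
    """Turn a sorted, singly charged ion ladder into a sequence tag.

    Several residues matching one gap are written as a group, e.g. [IL];
    a gap with no match is written as '?'.
    """
    tag = []
    for lighter, heavier in zip(ladder, ladder[1:]):
        gap = heavier - lighter
        hits = sorted(aa for aa, m in RESIDUE_MASSES.items() if abs(gap - m) <= tolerance)
        if not hits:
            tag.append("?")
        elif len(hits) == 1:
            tag.append(hits[0])
        else:
            tag.append("[" + "".join(hits) + "]")
    return "".join(tag)


def ladder_from_sequence(seq, start=0.0):
    """Cumulative masses for a known sequence (used here to build test input)."""
    masses = [start]
    for aa in seq:
        masses.append(masses[-1] + RESIDUE_MASSES[aa])
    return masses


# Synthetic ladder for the Protein A stretch TREIKKLDGLAQ (start mass is arbitrary)
print(read_tag(ladder_from_sequence("TREIKKLDGLAQ", start=1000.0)))
```

On this synthetic ladder the sketch reproduces the ambiguity pattern of the E20 tag. With a tighter tolerance (for example 0.02 Da) and accurate masses, K and Q (128.09496 vs. 128.05858 Da) could in principle be separated; I and L cannot be distinguished by mass alone.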

Protein A MS/MS (Lab E20)
T³-sequencing (MS/MS) of the ISD c9 ion confirms the N-terminal methylation.

Protein A (Lab L36): MS/MS of the N-Terminal Tryptic Fragment
Bottom-up LC-MALDI-TOF/TOF analysis validated the assigned N-terminal methylation and the glucuronidase sequence.

Protein A (Lab L36): Annotated ISD Spectrum
The N-terminal sequence is β-glucuronidase fused with Protein A. The N-terminal methionine is methylated. The N-terminal amino acids not confirmed by ISD were confirmed by MS/MS of the N-terminal tryptic fragment.

Results from MS Analyses
Please look at poster ##?? for more details.

Lessons to Be Learned from This Year's Study
Mass spec lessons:
1. Top-down with ETD or ISD provides reliable N-terminal sequences.
2. Top-down CID was the most easily misinterpreted.
3. Edman and top-down complement each other very well: Edman for the first ~10 residues, top-down for inexpensive extension of the calls (e.g., through the fusion site of Protein A).
4. Validation of the N-terminus by either T³-sequencing or bottom-up works as well.
5. Efficient use of top-down MS requires good software support.
6. Bottom-up was great for confirming N-terminal results but not for generating them.
7. Use of protein HPLC resulted in shortened readouts.
8. Protein A: successful analysis of the fusion required considerable experience.
9. Endostatin: the ragged N-termini were recognized by those who determined the intact molecular weight(s), detected heterogeneity by HPLC, or used Edman sequencing.
10. Top-down by ETD or ISD permitted detection of the C-terminal removal of lysine; intact MW determination validated the finding.

Next Year's ABRF-PSRG 2013 Study: What's Going to Happen?
Most likely, the same proteins will be provided again! But: provided as a stew in a single pot!
Task: isolate/separate them from the mixture.
Problem: SDS-PAGE works well for Edman, but it is difficult to extract intact proteins from the gel.
Hints:
• Protein LC needs to be established to get to the next level!
• Always try to get intact MW information!
• Use high sample amounts, as you lose a lot during LC.

The ABRF-PSRG Acknowledges the Following Support
• Recombinant Protein A was obtained as a donation from Repligen (Waltham, MA).
• Endostatin was obtained as a donation from Sigma-Aldrich (St. Louis, MO).
• Steve Smith (University of Texas Medical Branch) and Larry Dangott (Texas A&M University) performed Edman sequencing to provide reference data for this study.

End. The following slides are bonus material.

In-Source Decay (MALDI-ISD)
• "Pseudo-MS/MS" technique; no precursor selection
• ISD of the protein occurs in the MALDI plume on a sub-nanosecond timescale (similar to ETD).
• Fragmentation results from radical transfer from matrix to analyte (Takayama, 2001).
• a- and c-ions cover the N-terminus; y- and (z+2)-ions cover the C-terminus, allowing simultaneous sequencing of both termini.
• TOF/TOF allows T³-sequencing: MS/MS analysis of ISD fragments.
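Because ISD c-ions carry the N-terminus, their m/z values follow directly from the N-terminal sequence. A minimal Python sketch for singly charged ions, assuming standard monoisotopic masses and the usual relations c = residues + NH3 + H+ and y = residues + H2O + H+. The optional `n_term_delta` argument (for example +14.01565 for a methyl group) is an illustrative addition; a-, z- and multiply charged ions are not covered.

```python
# Standard monoisotopic masses (Da)
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549

RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def c_ions(seq, n_term_delta=0.0):
    """m/z of singly charged c1..c(n-1): N-terminal residues + NH3 + H+ (+ delta)."""
    ions, total = [], n_term_delta
    for aa in seq[:-1]:
        total += RESIDUE_MASSES[aa]
        ions.append(total + AMMONIA + PROTON)
    return ions


def y_ions(seq):
    """m/z of singly charged y1..y(n-1): C-terminal residues + H2O + H+."""
    ions, total = [], 0.0
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASSES[aa]
        ions.append(total + WATER + PROTON)
    return ions


if __name__ == "__main__":
    bsa_n_term = "DTHKSEIAHR"  # first residues of mature BSA
    for n, mz in enumerate(c_ions(bsa_n_term), start=1):
        print(f"c{n}: {mz:.3f}")
```

Shifting every c-ion by the same `n_term_delta` is what makes an N-terminal modification visible as a constant offset in the ladder.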

MALDI-ISD and T³-Sequencing
Suckau & Resemann (2003) Anal. Chem. 75

ESI-ETD (Electron Transfer Dissociation)
• CID: collision with an inert gas heats the protein internally and globally, so it fragments in a statistical process that cleaves the weakest bonds.
• ETD: reaction with electron-donating reagent anions perturbs the electronic structure locally, resulting in local bond cleavages.
• ETD cleaves all backbone N-Cα bonds except those N-terminal to proline, enabling top-down MS/MS of intact proteins with precursor ion selection.
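The proline exception means that cleavage of the N-Cα bond on the N-terminal side of proline releases no c/z pair, because the proline ring keeps the two pieces connected. A Python sketch listing which c-ion indices can be expected for a sequence under that single rule; it ignores charge state, structure and detection, so it is an illustration, not a fragmentation predictor. The example sequences are N-terminal reads from this study.

```python
def etd_c_ion_indices(seq):
    """Indices n of the c_n ions ETD can in principle yield for `seq`.

    c_n requires cleaving the N-Calpha bond of residue n+1 (0-based seq[n]);
    if that residue is proline, no c_n/z pair is released at that site.
    """
    return [n for n in range(1, len(seq)) if seq[n] != "P"]


def missing_sites(seq):
    """c_n indices lost because the following residue is proline."""
    return [n for n in range(1, len(seq)) if seq[n] == "P"]


if __name__ == "__main__":
    examples = {"Endostatin": "DFQPVLHLVAL", "Protein A": "MLRPVETPTREIKK"}
    for name, seq in examples.items():
        print(name, "no c-ion expected at c", missing_sites(seq))
```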

ETD Measurement Cycle on QTOF
Figure labels: reaction cell, nCI source, 10 kHz
1. Precursor ion accumulation
2. Electron transfer reagent addition
3. ETD reaction
4. Fragment ion transfer and detection
Tsybin et al. (2011) Anal. Chem. 83:8919

ISD of Endostatin (DAN Matrix, Lab E20): Initial Manual Interpretation
Residue labels in the figure: I/L N S G G M R G N K/Q D F C

Database Search for [IL]SGGMRGNR[KQ]DF[KQ]CF (Lab E20)
Figure: excerpts from COIA1_HUMAN and COIA1_MOUSE
Human and mouse differ at position -2 from the start of the ISD sequence (i.e., LNSPL in human vs. LNTPL in mouse). The sequence from the spectrum was found beginning at position …, so a handful of residues must precede it.

Endostatin MS/MS (Lab E20; data file 010212_B23_10pmol_Endostatin_MSMS_2kV_)
To confirm the N-terminus not covered in the ISD spectrum, MS/MS was performed on m/z … .
Annotated fragment ions: b2, b3, b4, b5, b7, b8, b9, b10, y6, y7, y9; immonium ions: P, H, I/L, K/Q

Determination of Endostatin N-Termini by Edman Degradation (Lab E20)
• The major sequence matches COIA1_HUMAN at position …; a second sequence was found starting at position … . Both sequences concur with the ISD findings.
• Edman sequencing detected the ragged N-terminus; ISD confirmed and extended it.
• Largely manual analysis of the ISD spectra made it difficult to extract the full information.

2012/2013 PSRG: Timeline of the 2-Year Study
• Planning: discussed ideas for the 2012 study; agreed upon a study design; settled on the 3 standard proteins for distribution as separated proteins in year 1 of the study.
• Year 1 (2012): study announcement; samples sent to participants; extended deadline for returning data; data analysis.
• Year 2 (2013): study announcement; distribution of the proteins in a mixture; deadline for returning data; data analysis.
• Meetings: ABRF 2011, ABRF 2012, ABRF 2013.
• Timeline dates: Feb '11, May '11, Aug '11, Sep '11, Oct '11, Jan '12, Mar '12, May '12, Jun '12, Oct '12, Feb '13.

Participant Comments
• Irreproducible recovery of Endostatin from the tube is a problem if one wants to optimize the settings or try to reproduce the data.
• Thanks, PSRG! It was fun.
• Unless I've missed something, the availability of the proteins in the public domain made this an easy project.
• Sample quality was very good!
• I thought the fusion Protein A solution was blocked? I obtained sequence matches to β-glucuronidase: either β-glucuronidase is fused to Protein A and you were not successful in blocking the protein, or β-glucuronidase is a contaminant.
• It was a very costly study for an Edman lab (reagents).