Database Search Algorithm for Identification of Intact Cross-Links in Proteins and Peptides Using Tandem Mass Spectrometry 2010. 10. 11. 신성호.


Abstract
MassMatrix: a new DB search algorithm to identify and characterize intact x-links in proteins and peptides with high confidence.
Tested with BS3 x-linked Cytochrome C: five x-links were identified and verified.
Discriminates true positive x-linked PSMs from false ones, as shown by the distribution of statistical scores for true and false positives and by ROC analysis.
Also used to search for intact x-links in complex Escherichia coli samples.

Introduction
Identification of x-links in proteins provides invaluable information regarding a protein's structure, conformation, and interactions.
DB search algorithms: false positives need to be controlled because searches that include x-links have an increased search space.
Traditional DB search programs (SEQUEST, Mascot) cannot be used for analysis of x-linked proteins/peptides.

Introduction
New DB search engine:
  Identifies x-links in proteins and peptides.
  Includes three probability-based scoring algorithms.
  For proteins/peptides without any x-links or disulfide bonds, provided better sensitivity than Mascot, SEQUEST, OMSSA, and X!Tandem for a given specificity.

Introduction
Validated for peptides and proteins with disulfide bonds, using peptide standards with known disulfide bonds and bovine pancreatic ribonuclease A.
Tested using data sets collected on an LTQ-FT mass spectrometer for the tryptic digests of Cytochrome C x-linked by BS3.
Five x-links were identified and verified for spatial plausibility by comparison with the 3D structure.

Experimental Section
Material, Sample Preparation, and Mass Spectrometry
Horse heart Cytochrome C was x-linked with the x-linking reagent BS3.
The x-linked protein samples were purified by SDS-PAGE; monomer bands were cut and digested with trypsin.
Escherichia coli cells were cultured in LB broth with shaking at 200 rpm.
Nano-LC-MS/MS experiments were run on an LTQ-FT mass spectrometer.

Experimental Section
DB Search and Search Parameters
During conversion of the raw data to mzXML, isotope distributions were deconvoluted to obtain the charge states and monoisotopic m/z values of the precursor ions.
Cytochrome C DB: Cytochrome C protein sequence + decoy sequence + 20 randomized Cytochrome C sequences.
E. coli search: 41 mzXML data files were searched against an Escherichia coli K-12 strain sequence DB containing 4,285 protein sequences.

Experimental Section
The search parameters
  Enzyme: trypsin
  Missed cleavages: 2
  Modifications: variable iodoacetamide derivative of cysteine and variable oxidation of methionine
  Mass tolerances: 10 ppm for the precursor ions, 0.6 Da for product ions
  Maximum number of modifications: 2
  Peptide length: 5-40 amino acid residues
  Score thresholds: 5.3 for pp, 1.3 for pptag
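The two mass tolerances above use different units, so a small sketch may help show how each would be applied. The function names and example m/z values are hypothetical and are not taken from MassMatrix; only the 10 ppm and 0.6 Da limits come from the slide.

```python
def within_ppm(observed_mz: float, theoretical_mz: float, tol_ppm: float = 10.0) -> bool:
    """Relative tolerance: error is expressed in parts per million of the theoretical m/z."""
    error_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return abs(error_ppm) <= tol_ppm


def within_da(observed_mz: float, theoretical_mz: float, tol_da: float = 0.6) -> bool:
    """Absolute tolerance: error is expressed directly in Da."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Hypothetical precursor ions checked against the 10 ppm precursor tolerance.
precursor_ok = within_ppm(1000.005, 1000.0)   # 5 ppm error
precursor_bad = within_ppm(1000.02, 1000.0)   # 20 ppm error

# Hypothetical product ions checked against the 0.6 Da product tolerance.
product_ok = within_da(500.5, 500.0)
product_bad = within_da(500.7, 500.0)
```

A ppm tolerance scales with the mass being measured, which suits the high-accuracy precursor measurement, while the fixed Da window is the usual choice for lower-resolution product ion spectra.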

Experimental Section
Fig 1 shows the structure and reaction of the x-linking reagent.
The chemical formula of the x-link between two lysine sites is C8H10O2 (138.068 Da).
3 dead-end x-links.
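As a quick arithmetic check, the 138.068 Da value can be reproduced from the formula C8H10O2 using standard monoisotopic atomic masses. The helper names below are illustrative only, and the peptide masses in the usage lines are made-up placeholders.

```python
# Standard monoisotopic masses (Da) of the elements in the linker formula.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(formula: dict) -> float:
    """Sum element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


XLINK_MASS = monoisotopic_mass({"C": 8, "H": 10, "O": 2})


def crosslinked_mass(peptide_a_mass: float, peptide_b_mass: float) -> float:
    """Neutral mass of two peptides joined by one intact x-link (sketch)."""
    return peptide_a_mass + peptide_b_mass + XLINK_MASS


# Placeholder peptide masses, for illustration only.
example = crosslinked_mass(1000.0, 1500.0)
```

Rounded to three decimals, `XLINK_MASS` agrees with the value quoted on the slide.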

Results and Discussion
Search Algorithm
X-link types: interchain and intrachain x-links.
Peptides with more than 2 x-links are difficult to characterize because of poor fragmentation and large size, so up to 2 x-links are considered.
Peptide types:
  Type 1: only interchain x-links
  Type 2: only intrachain x-links
  Type 3: both inter- and intrachain x-links
  Type 4: circular chains
Dead-end x-links in peptides are considered as modifications.

Results and Discussion
The x-link search algorithm is based on the disulfide search algorithm and has 3 search modes.
Exploratory search mode: all occurrences of A and B residues in the protein sequences are considered.
Confirmatory search mode: only the x-links specified in the DB by the user are considered and searched against experimental data. X-links are coded as “A($i)” and “B($i)”, where i is the index number of the specified x-link.

Results and Discussion
Semiexploratory search mode: a limited exploratory search of x-links is performed between the amino acid residues labeled ($ or $x) in the DB.
The MassMatrix process:
  Proteins are digested in silico based on the specified proteolysis reagents.
  Peptides are fragmented using the appropriate fragmentation model (CID or ETD fragmentation methods).
  Each chain undergoes fragmentation independently, and internal fragments are not searched.
  Only product ions created from the rupture of a single bond are considered.
  When one chain undergoes fragmentation, the other(s) are considered as a modification.
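The in silico digestion step can be sketched as follows. This is a minimal illustration and not the MassMatrix implementation: it assumes the common trypsin rule (cleave after K or R unless the next residue is P) and reuses the missed-cleavage and peptide-length limits from the search parameters. The toy sequences are invented.

```python
def cleavage_sites(sequence: str) -> list:
    """Positions where trypsin would cut (assumed rule: after K/R, not before P)."""
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    return sites


def tryptic_digest(sequence: str, missed_cleavages: int = 2,
                   min_len: int = 5, max_len: int = 40) -> list:
    """Enumerate peptides with up to `missed_cleavages` skipped cut sites."""
    sites = cleavage_sites(sequence)
    last = len(sites) - 1
    peptides = []
    for start in range(last):
        # Joining k+1 adjacent pieces corresponds to k missed cleavages.
        for end in range(start + 1, min(start + 1 + missed_cleavages, last) + 1):
            peptide = sequence[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


toy_protein = "AAAAAKCCCCCRDDDDDKEEEEE"  # invented sequence, not Cytochrome C
no_missed = tryptic_digest(toy_protein, missed_cleavages=0)
two_missed = tryptic_digest(toy_protein, missed_cleavages=2)
proline_blocked = tryptic_digest("AAKPAAR", missed_cleavages=0)
```

Even on this toy sequence, allowing two missed cleavages more than doubles the peptide count, which hints at why the theoretical search space grows so quickly once peptide pairs joined by x-links are also enumerated.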

Results and Discussion
Scoring against the experimental MS2 data is the same as that used for peptides without any x-links and those with disulfide bonds, as described previously.
3 independent statistical scores: pp, pp2, and pptag; pptag is the best standard.
MassMatrix produces protein and peptide match lists.
XMapper generates x-link assignments from MassMatrix results.
X-link scoring uses N, the number of peptides assigned to the x-link, and np, the number of spectral matches for peptide p.

Results and Discussion
Validation of the X-Link Search Algorithm
Data set: a tryptic digest of Cytochrome C x-linked by BS3, acquired on an LTQ-FT.
  Reagent to protein ratio: 25:1
  Final protein concentration: 0.12 mg/mL
  MS2: 6,982 spectra
DB (Table 1): Cytochrome C protein sequence + a reversed Cytochrome C sequence + 20 randomized Cytochrome C sequences.
When x-links were considered, the number of theoretical peptides increased dramatically (2.10 × 10^6, search time 55 s).

Results and Discussion
Fig 2 shows 2 representative spectra for x-linked peptides: (a) intrachain, (b) interchain; * marks loss of ammonia and ` marks loss of water.
Fig 3 shows the pptag score distribution for TPs and FPs, and the ROC analysis for the PSMs identified in MassMatrix.
The scoring model can discriminate TPs from FPs.
The ROC analysis indicates that the algorithm performs well: area under the curve (AUC) of 0.91 with x-links and 0.92 without.
Good sensitivity and specificity for both types of peptides, with and without x-links.
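For readers unfamiliar with the AUC figure, the sketch below computes it as the probability that a randomly chosen true positive outscores a randomly chosen false positive (ties count half). The score lists are invented toy values, not the pptag scores from Fig 3, and the function is an illustration rather than the authors' analysis code.

```python
def roc_auc(true_scores: list, false_scores: list) -> float:
    """AUC via pairwise comparison of true-positive and false-positive scores."""
    wins = 0.0
    for t in true_scores:
        for f in false_scores:
            if t > f:
                wins += 1.0
            elif t == f:
                wins += 0.5
    return wins / (len(true_scores) * len(false_scores))


# Toy scores: perfectly separated, and completely overlapping.
perfect = roc_auc([3, 4, 5], [1, 2])
chance = roc_auc([1, 2], [1, 2])
```

An AUC of 1.0 means the score separates the two groups completely and 0.5 means it does no better than chance, so values around 0.91 to 0.92 indicate strong but imperfect separation.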

Results and Discussion
Cytochrome C contains 19 lysine residues, which can potentially form 171 x-links between 2 lysines.
25 x-links were assigned.
Fig 4 maps the scores of all identified x-links for Cytochrome C in a heat map using XMapper.
A majority of the x-links are background (low occurrence and low scores), shown in light blue and cyan.
Background x-links are irrelevant and represent noise.
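The figure of 171 is simply the number of unordered pairs among 19 lysines. A short check, using placeholder indices 1 to 19 rather than the real residue positions:

```python
from itertools import combinations
from math import comb

N_LYSINES = 19

# Placeholder indices 1..19 stand in for the lysine positions.
lysine_pairs = list(combinations(range(1, N_LYSINES + 1), 2))
n_pairs = len(lysine_pairs)
n_pairs_formula = comb(N_LYSINES, 2)  # 19 * 18 / 2
```

Only 25 of these 171 possible pairs were assigned, and most of those turn out to be background.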

Results and Discussion
X-links that form on Cytochrome C because of its representative 3D structure are present at higher occurrence and higher abundance.
5 nonbackground x-links were found and further verified by comparison with the 3D structure, as shown in Fig 5.
Distances between the two lysine residues: K25-K27: 5.3 Å, K86-K87: 3.4 Å, K7-K100: 7.7 Å, K7-K27: 13.9 Å, K99-K100: 13.6 Å.
The last two are slightly longer than the length of the x-link but reasonable given the differences between crystal and solution-phase structures.
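The plausibility check can be sketched as a comparison of each distance with the linker length. The five distances are from the slide; the 11.4 Å spacer length (the value commonly quoted for BS3) and the 3.0 Å flexibility allowance are assumptions added here for illustration and do not come from the presentation.

```python
# Lysine-lysine distances (Å) reported on the slide.
DISTANCES = {
    "K25-K27": 5.3,
    "K86-K87": 3.4,
    "K7-K100": 7.7,
    "K7-K27": 13.9,
    "K99-K100": 13.6,
}

SPACER_LENGTH = 11.4  # Å; assumed BS3 spacer arm length, not stated on the slide
FLEXIBILITY = 3.0     # Å; hypothetical allowance for crystal vs solution differences

# X-links whose distance exceeds the bare spacer length.
longer_than_spacer = sorted(
    name for name, d in DISTANCES.items() if d > SPACER_LENGTH
)

# X-links still plausible once the flexibility allowance is added.
plausible = sorted(
    name for name, d in DISTANCES.items() if d <= SPACER_LENGTH + FLEXIBILITY
)
```

Under these assumed values, two x-links exceed the spacer length but all five remain within the allowance, which matches the slide's reading that the longer distances are still reasonable.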

Results and Discussion
FDR
The background x-links were at a similar level to that of the false positives for decoy proteins, so they can be controlled by the target-decoy search strategy.
FDR = FP_x-link / (TP_x-link + FP_x-link)
The 5 nonbackground x-links survived and all of the background x-links were filtered out at 5% FDR.
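A minimal sketch of how a 5% FDR cutoff could be chosen from scored x-links follows. It applies the slide's formula directly; how hits are labeled as false positives (here, a boolean flag such as "matched a decoy sequence") and the toy scores are assumptions, not details from the presentation.

```python
def fdr(tp: int, fp: int) -> float:
    """FDR = FP / (TP + FP), as defined on the slide."""
    total = tp + fp
    return fp / total if total else 0.0


def score_cutoff_at_fdr(hits: list, max_fdr: float = 0.05):
    """Lowest score cutoff whose accepted set has FDR <= max_fdr.

    `hits` is a list of (score, is_false_positive) tuples; returns None
    if no cutoff satisfies the limit.
    """
    cutoff = None
    tp = fp = 0
    for score, is_false_positive in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_false_positive:
            fp += 1
        else:
            tp += 1
        if fdr(tp, fp) <= max_fdr:
            cutoff = score
    return cutoff


# Toy x-link hits: three high-scoring true hits, then low-scoring background.
toy_hits = [(50, False), (40, False), (30, False), (10, True), (8, False), (5, True)]
cutoff = score_cutoff_at_fdr(toy_hits)
```

In the toy data the cutoff lands just above the first false hit, so the high-scoring x-links survive while the low-scoring background is removed, mirroring the outcome described on the slide.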

Results and Discussion
Effect of Different X-Linking Conditions
X-linking reagent to protein ratios: 1:1, 2.5:1, 5:1, 10:1, 25:1, 50:1, and 100:1, at a final protein concentration of 0.12 mg/mL.
Fig 6 shows the heat maps of the x-links identified.
  All 5 nonbackground x-links were identified (≥25:1).
  Two or more nonbackground x-links were not identified (<25:1).
Fig 7a shows the dependence of the scores of the five nonbackground x-links on the ratio.
  The K86-K87 score was independent of the ratio.
  The other scores increased.

Results and Discussion
Fig 7b shows the number of all the background x-links (FDR 5%).
  The total background increases (<5:1) and the increase becomes much less significant (>5:1).
  All background x-links can be filtered at FDR 5%.
A high x-linking reagent to protein ratio favors the x-link determination in a ratio range of 1:1 to 25:1 (0.12 mg/mL).
The effect of protein concentration
Reagent to protein ratio: 10:1; protein concentrations: 0.06, 0.12, 0.60, and 2.4 mg/mL.
Fig 8 shows the heat maps of the x-links identified in the four samples.
  Higher protein concentrations give higher scores for the 5 nonbackground x-links.
  The improvement becomes much less significant (>0.60 mg/mL).

Results and Discussion
Fig 9a shows the dependence of the scores of the nonbackground x-links on the protein concentration.
The number of total background x-links increases (≤0.12 mg/mL).
The number of background x-links at FDR 5% is independent of the protein concentration and stays low (Fig 9b).
High protein concentration favors the x-link determination experiment, and this benefit becomes insignificant (>0.12 mg/mL at 10:1).

Results and Discussion
X-link Search of a Complex Proteome Sample
The search space increases dramatically when cross-links are considered.
Searching complex proteome samples for x-links against a large protein DB is very challenging: it requires enormous computational resources and takes significantly longer.
A staged search strategy with two stages:
  First stage: search without considering any x-links.
  Second stage: protein matches with significant scores from the first stage are searched for x-links.
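The two-stage idea can be outlined in a few lines. Everything here is a hypothetical skeleton: `staged_search`, the two callables, the toy protein names, and the score threshold of 20 are placeholders for illustration and do not describe MassMatrix internals.

```python
from typing import Callable, Dict, List


def staged_search(
    spectra: List[str],
    proteins: List[str],
    search_linear: Callable[[List[str], List[str]], Dict[str, float]],
    search_xlinks: Callable[[List[str], List[str]], List[str]],
    min_score: float,
) -> List[str]:
    """Stage 1 scores proteins without x-links; stage 2 searches only the shortlist."""
    stage_one_scores = search_linear(spectra, proteins)
    shortlisted = [p for p in proteins if stage_one_scores.get(p, 0.0) >= min_score]
    return search_xlinks(spectra, shortlisted)


# Toy stand-ins for the two search stages.
def toy_linear(spectra, proteins):
    return {"protA": 42.0, "protB": 3.0, "protC": 25.0}


def toy_xlinks(spectra, proteins):
    return [f"x-link search on {p}" for p in proteins]


results = staged_search(
    ["scan1"], ["protA", "protB", "protC"], toy_linear, toy_xlinks, min_score=20.0
)
```

Because the expensive x-link enumeration runs only on the shortlisted proteins, the second stage works with a far smaller search space than a direct x-link search of the whole DB.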

Results and Discussion
A staged DB search violates the assumption used in the target-decoy search strategy, so that strategy cannot be used to estimate and control false positive rates.
In the future, nonstaged searches will become feasible.
Escherichia coli proteome x-linked in vitro by BS3:
  The samples from two replicate experiments were preseparated by SDS-PAGE.
  41 bands were cut and in-gel digested with trypsin.
  Analyzed by LC-MS/MS on an LTQ-FT mass spectrometer.
  341,613 MS2 spectra.

Results and Discussion
DB: Escherichia coli K-12 strain proteins (4,285 proteins).
37,600,000 peptides were calculated (36.70 min).
51,992 PSMs were identified, of which 3,393 were x-linked matches.
Peptides without any cross-links dominate, because of the limited efficiency of the x-linking experiment.
The high complexity of the sample made the identification of x-linked peptides even more challenging.
59 proteins (among 456 proteins) were identified with one or more significant x-links with a score higher than 20.

Results and Discussion
Table 3 shows the top 20 x-links identified.
  12 x-links from 9 proteins were verified for spatial plausibility by comparison with the published 3D structures.
  6 x-links have no available structural data.
In summary, only a limited number of x-links can be identified in complex proteome samples using LC-MS/MS, because of the dominating non-cross-linked peptides and high sample complexity.
The proteome samples should therefore be purified and/or enriched for x-linked peptides.

Conclusions
A new DB search algorithm was developed to identify intact x-links in proteins and peptides, based on the validated statistical scoring models.
A high x-linking reagent to protein ratio favors the x-link determination in the ratio range of 1:1 to 25:1 at a protein concentration of 0.12 mg/mL.
The algorithm is capable of discriminating true positives from false ones, as shown by the distributions of statistical scores and the ROC analysis.