Published by Lionel Pierce. Modified over 7 years ago.
1
Database Search Algorithm for Identification of Intact Cross-Links in Proteins and Peptides Using Tandem Mass Spectrometry
신성호
2
Abstract
MassMatrix
  A new DB search algorithm to identify and characterize intact x-links in proteins and peptides with high confidence
Tested with BS3 x-linked Cytochrome C
  Five x-links were identified and verified
Discriminates true-positive x-linked PSMs from false ones
  The distribution of statistical scores for true and false positives
  ROC analysis
Search for intact x-links in complex Escherichia coli samples
3
Introduction
Identification of X-links in Proteins
  Provides invaluable information regarding a protein's structure, conformation, and interactions
DB Search Algorithms
  False positives need to be controlled
    Due to the increased search space for searches with x-links
  Traditional DB search programs (SEQUEST, Mascot)
    Cannot be used for analysis of x-linked proteins/peptides
4
Introduction
New DB Search Engine
  Identifies x-links in proteins and peptides
  Includes three probability-based scoring algorithms
  Provided better sensitivity than Mascot, SEQUEST, OMSSA, and X!Tandem for a given specificity
    For proteins/peptides without any x-links or disulfide bonds
5
Introduction
Validated for peptides and proteins with disulfide bonds
  By use of peptide standards with known disulfide bonds and bovine pancreatic ribonuclease A
Tested using data sets
  Collected on an LTQ-FT mass spectrometer for the tryptic digests of Cytochrome C x-linked by BS3
  Five x-links were identified and verified for spatial plausibility by comparison with the 3D structure
6
Experimental Section
Material, Sample Preparation, and Mass Spectrometry
  Horse heart Cytochrome C, x-linking reagent BS3
  The x-linked protein samples were purified by SDS-PAGE
    Monomer bands were cut and digested with trypsin
  Escherichia coli cells were cultured in LB broth with shaking at 200 rpm
  Nano-LC-MS/MS experiments
    On an LTQ-FT mass spectrometer
7
Experimental Section
DB Search and Search Parameters
  Isotope distributions
    Deconvolution to obtain the charge states and monoisotopic m/z values of the precursor ions during conversion of raw data to mzXML
  DB: Cytochrome C protein sequence + decoy sequences (randomized Cytochrome C sequences)
  41 mzXML data files were searched against an Escherichia coli K-12 strain sequence DB containing 4,285 protein sequences
8
Experimental Section
The search parameters
  Enzyme: trypsin
  Missed cleavage: 2
  Modifications: variable iodoacetamide derivative of cysteine and variable oxidation of methionine
  Mass tolerances: 10 ppm for the precursor ions; [value missing] Da for product ions
  Maximum number of modifications: 2
  Peptide length: 5-40 amino acid residues
  Score threshold: 5.3 for pp; [value missing] for pptag
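The 10 ppm precursor tolerance listed above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of MassMatrix; the only value taken from the slide is the 10 ppm default.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z, in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the observed precursor m/z lies within +/- tol_ppm of theory."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# At m/z 500, 10 ppm corresponds to a window of about +/- 0.005.
accepted = within_tolerance(500.004, 500.0)   # about 8 ppm off
rejected = within_tolerance(500.006, 500.0)   # about 12 ppm off
```

A relative (ppm) tolerance scales with m/z, which suits high-resolution precursor measurements such as those from the FT analyzer.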
9
Experimental Section
Fig 1 shows the structure and reaction of the x-linking reagent
The chemical formula of the x-link between two lysine sites: C8H10O2 (monoisotopic mass 138.068 Da, calculated from the formula)
3 dead-end x-links
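The mass added by the x-link follows directly from its formula. The sketch below computes a monoisotopic mass from a simple formula string; the parser and the small element table are illustrative assumptions, not code from the presentation.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic masses for a flat formula such as 'C8H10O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


xlink_mass = monoisotopic_mass("C8H10O2")  # about 138.068 Da
```

The parser handles only flat formulas without parentheses, which is sufficient for the x-link formula on this slide.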
10
Results and Discussion
Search Algorithm
  X-link type
    Interchain and intrachain x-links
    Peptides with more than 2 x-links are difficult to characterize
      Poor fragmentation and large size
    Up to 2 x-links considered
  Peptide type
    Type 1: only interchain x-links
    Type 2: only intrachain x-links
    Type 3: both inter- and intrachain x-links
    Type 4: circular chains
  Dead-end x-links in peptides are considered as modifications
11
Results and Discussion
X-link search algorithm based on the disulfide search algorithm
3 search modes
  Exploratory search mode
    All occurrences of A and B residues in the protein sequences are considered
  Confirmatory search mode
    Only the x-links specified in the DB by the user will be considered and searched against experimental data
    X-links are coded as "A($i)" and "B($i)", where i is the index number of the specified x-link
12
Results and Discussion
  Semiexploratory search mode
    A limited exploratory search of x-links will be performed between the amino acid residues labeled ($ or $x) in the DB
The process of MassMatrix
  Protein digestion
    In silico, based on the specified proteolysis reagents
  Fragmentation using the appropriate fragmentation model
    CID, ETD fragmentation methods
    Each chain undergoes fragmentation independently and internal fragments are not searched
      Only product ions created from the rupture of a single bond
    When one chain undergoes fragmentation, the other(s) will be considered as a modification
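The in silico digestion step can be sketched as follows. This is a simplified illustration, not the MassMatrix implementation: it assumes the common trypsin rule (cleave after K or R unless the next residue is P) and reuses the missed-cleavage and peptide-length limits from the search parameters slide. The function name is hypothetical.

```python
def tryptic_peptides(sequence: str, missed_cleavages: int = 2,
                     min_len: int = 5, max_len: int = 40) -> list:
    """Return tryptic peptides allowing up to `missed_cleavages` skipped sites."""
    # Boundaries between peptides: start of the protein, each cleavage
    # site (after K/R, not before P), and the end of the protein.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    for start in range(len(sites) - 1):
        # Joining k+1 consecutive fragments corresponds to k missed cleavages.
        last = min(start + missed_cleavages + 2, len(sites))
        for end in range(start + 1, last):
            peptide = sequence[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Toy sequence (not Cytochrome C) with two cleavage sites.
toy = tryptic_peptides("AAAAAKGGGGGRLLLLL", missed_cleavages=1)
```

With one missed cleavage allowed, the toy sequence yields the three fully cleaved peptides plus the two peptides that each span one uncleaved site.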
13
Results and Discussion
Scoring against the experimental MS2 data
  The same as those used for peptides without any x-links and those with disulfide bonds, as described previously
  3 independent statistical scores: pp, pp2, and pptag
    pptag is the best standard
MassMatrix produces protein and peptide match lists
XMapper
  Generates x-link assignments from MassMatrix results
  Scoring
    N: the number of peptides assigned to the x-link
    np: the number of spectral matches for peptide p
14
Results and Discussion
Validation of the X-Link Search Algorithm
Data set
  A tryptic digest of Cytochrome C x-linked by BS3, acquired on an LTQ-FT
  Reagent to protein ratio: 25:1
  Final protein concentration: 0.12 mg/mL
  MS2: 6,982 spectra
DB (Table 1)
  Cytochrome C protein sequence + a reversed Cytochrome C sequence + 20 randomized Cytochrome C sequences
  When x-links were considered, the number of theoretical peptides increased dramatically (2.10 × 10^6, search time 55 s)
15
Results and Discussion
Fig 2 shows 2 representative spectra for x-linked peptides
  (a): intrachain, (b): interchain, *: loss of ammonia, `: loss of water
Fig 3 shows the pptag score distribution for TPs and FPs, and ROC analysis for the PSMs identified in MassMatrix
  The scoring model can discriminate TPs from FPs
  ROC analysis indicates that the algorithm performs well
    Area under the curve (AUC): 0.91 (with x-links) and 0.92 (without x-links)
    Good sensitivity and specificity for both types of peptides, with and without x-links
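For readers unfamiliar with the AUC figure quoted above, the sketch below shows one standard way to compute it from two score lists: the fraction of (true, false) pairs in which the true match scores higher, with ties counted as half. The function name and toy scores are illustrative assumptions and do not reproduce the data behind Fig 3.

```python
def roc_auc(true_scores, false_scores) -> float:
    """AUC as the probability that a true match outscores a false one."""
    wins = 0.0
    for t in true_scores:
        for f in false_scores:
            if t > f:
                wins += 1.0
            elif t == f:
                wins += 0.5  # ties count as half a win
    return wins / (len(true_scores) * len(false_scores))


# Toy scores: one false match (3.5) outranks one true match (3.0).
auc = roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.5])
```

An AUC of 1.0 means the score separates true and false matches perfectly, and 0.5 means it does no better than chance.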
16
Results and Discussion
Cytochrome C
  Contains 19 lysine residues
  Can potentially form 171 x-links between 2 lysines
  25 x-links assigned
Fig 4 shows the scores of all identified x-links for Cytochrome C, mapped in a heat map using XMapper
  A majority of the x-links are background (low occurrence and low scores), shown in light blue and cyan
  Background x-links are irrelevant and represent the noise
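The figure of 171 is the number of unordered pairs among 19 lysines, C(19, 2) = 19 × 18 / 2. A short sketch that counts and enumerates such candidate pairs, as an exploratory search would have to, is given below; the helper name is hypothetical.

```python
from itertools import combinations
from math import comb


def candidate_pairs(sequence: str, residue: str = "K") -> list:
    """All unordered pairs of 1-based positions of `residue` in `sequence`."""
    positions = [i + 1 for i, aa in enumerate(sequence) if aa == residue]
    return list(combinations(positions, 2))


n_pairs = comb(19, 2)                       # 171 possible lysine-lysine pairs
toy_pairs = candidate_pairs("AKAAKAK")      # [(2, 5), (2, 7), (5, 7)]
```

Because the pair count grows quadratically with the number of linkable residues, the search space expands quickly once x-links are considered.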
17
Results and Discussion
X-links formed on the Cytochrome C protein due to its representative 3D structure
  Present at higher occurrence and higher abundance
  5 nonbackground x-links
The 5 x-links are further verified by comparison with the 3D structure, as shown in Fig 5
  The distances between the two lysine residues
    K25-K27: 5.3 Å, K86-K87: 3.4 Å, K7-K100: 7.7 Å
    K7-K27: 13.9 Å, K99-K100: 13.6 Å
      Slightly longer than the length of the x-link but reasonable given the differences between crystal and solution-phase structures
18
Results and Discussion
FDR
  The background x-links were at a similar level to that of the false positives for decoy proteins
  Can be controlled by the target-decoy search strategy
  FDR = FP_x-link / (TP_x-link + FP_x-link)
  The 5 nonbackground x-links survived and all of the background x-links were filtered out at 5% FDR
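The FDR formula above can be turned into a score cutoff. The sketch below is an assumption-laden illustration: each x-link is given as a (score, is_false) pair, where is_false would come from decoy or background assignments, and the function returns the lowest cutoff at which the FDR stays within the limit. The names and toy scores are hypothetical, not taken from XMapper.

```python
def fdr(num_true: int, num_false: int) -> float:
    """FDR = FP / (TP + FP); defined as 0 when nothing is accepted."""
    total = num_true + num_false
    return num_false / total if total else 0.0


def score_cutoff_for_fdr(matches, max_fdr: float = 0.05):
    """Lowest score cutoff whose accepted set (score >= cutoff) meets max_fdr.

    `matches` is a list of (score, is_false) pairs. Returns None if no
    cutoff satisfies the limit.
    """
    best = None
    for cutoff in sorted({score for score, _ in matches}, reverse=True):
        flags = [is_false for score, is_false in matches if score >= cutoff]
        num_false = sum(flags)
        if fdr(len(flags) - num_false, num_false) <= max_fdr:
            best = cutoff
    return best


# Toy data: five high-scoring true x-links and two low-scoring false ones.
toy_matches = [(50, False), (40, False), (30, False), (20, False),
               (10, False), (5, True), (4, True)]
cutoff = score_cutoff_for_fdr(toy_matches)  # 10: keeps the five, drops the two
```

In this toy case the cutoff keeps all five high-scoring x-links and removes both low-scoring false ones, mirroring the outcome described on the slide.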
19
Results and Discussion
Effect of Different X-Linking Conditions
Different x-linking reagent to protein ratios
  1:1, 2.5:1, 5:1, 10:1, 25:1, 50:1, and 100:1
  Final protein concentration of 0.12 mg/mL
Fig 6 shows the heat maps of the x-links identified
  All 5 nonbackground x-links were identified (≥25:1)
  Two or more nonbackground x-links were not identified (<25:1)
Fig 7a shows the dependence of the scores of the five nonbackground x-links on the ratio
  The score of K86-K87 was independent of the ratio
  The other scores increased with the ratio
20
Results and Discussion
Fig 7b shows the number of all the background x-links (FDR 5%)
  Total background increases (<5:1)
  The increase becomes much less significant (>5:1)
  All background x-links can be filtered at FDR 5%
A high x-linking reagent to protein ratio favors the x-link determination in a ratio range of 1:1 to 25:1 (0.12 mg/mL)
The effect of protein concentration
  Reagent to protein ratio: 10:1
  Various protein concentrations: 0.06, 0.12, 0.60, 2.4 mg/mL
Fig 8 shows the heat maps of the x-links identified in the four samples
  Higher protein concentrations give higher scores for the 5 nonbackground x-links
  The improvement becomes much less significant (>0.60 mg/mL)
21
Results and Discussion
Fig 9a shows the dependence of the scores of the nonbackground x-links on the protein concentration
The number of total background x-links increases (≤0.12 mg/mL)
The number of background x-links at FDR 5% is independent of the protein concentration and stays low (Fig 9b)
A high protein concentration favors the x-link determination experiment, and this benefit becomes insignificant (>0.12 mg/mL at 10:1)
22
Results and Discussion
X-link Search of a Complex Proteome Sample
The search space increases dramatically when cross-links are considered
  Searching complex proteome samples for x-links against a large protein DB is very challenging
    Requires enormous computational resources
    Takes significantly longer
A staged search strategy
  Two stages
    First: search without considering any x-links
    Second: protein matches with significant scores from the first stage are searched for x-links
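The two-stage strategy can be sketched generically. The callables passed in stand for the two searches and are placeholders, not MassMatrix functions; the protein names, scores, and threshold are made up for illustration.

```python
def staged_search(proteins, first_stage_score, xlink_search, min_score):
    """Run the expensive x-link search only on first-stage hits.

    first_stage_score(protein) -> score from a search without x-links
    xlink_search(protein)      -> result of the x-link search for that protein
    """
    # Stage 1: keep proteins with a significant score without x-links.
    shortlisted = [p for p in proteins if first_stage_score(p) >= min_score]
    # Stage 2: search only the shortlisted proteins for x-links.
    return {p: xlink_search(p) for p in shortlisted}


# Toy usage with hypothetical first-stage scores.
scores = {"P1": 45.0, "P2": 3.0, "P3": 20.0}
results = staged_search(scores, scores.get,
                        lambda p: f"x-link search on {p}", min_score=20.0)
```

Restricting the second stage to a shortlist is what shrinks the search space, at the cost of no longer treating target and decoy sequences identically across the whole search.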
23
Results and Discussion
A staged DB search violates the assumption used in the target-decoy search strategy
  The target-decoy strategy cannot be used to estimate and control false positive rates
  In the future, nonstaged searches will become feasible
Escherichia coli proteome in vitro x-linked by BS3
  The samples from two replicate experiments were preseparated by SDS-PAGE
  41 bands were cut and in-gel digested with trypsin
  Analyzed by LC-MS/MS on an LTQ-FT mass spectrometer
  341,613 MS2 spectra
24
Results and Discussion
DB: Escherichia coli K-12 strain proteins (4,285 proteins)
  37,600,000 peptides were calculated (36.70 min)
  51,992 PSMs were identified
    3,393 were x-linked matches
Peptides without any cross-links dominate
  The limited efficiency of the x-linking experiment
  High complexity of the sample
    Made the identification of x-linked peptides even more challenging
59 proteins (among 456 proteins) were identified with one or more significant x-links with a score higher than 20
25
Results and Discussion
Table 3 shows the top 20 x-links identified
  12 x-links from 9 proteins were verified for spatial plausibility by comparison with the published 3D structures
  6 x-links have no available structural data
In summary, only a limited number of x-links can be identified in complex proteome samples using LC-MS/MS
  Due to the dominating non-cross-linked peptides and high sample complexity
  Proteome samples should be purified and/or enriched for x-linked peptides
26
Conclusions
A new DB search algorithm
  Developed to identify intact x-links in proteins and peptides
  Based on the validated statistical scoring models
A high x-linking reagent to protein ratio favors the x-link determination in a ratio range of 1:1 to 25:1 at a protein concentration of 0.12 mg/mL
Capable of discriminating true positives from false ones
  The distributions of statistical scores and ROC analysis