Published by Gunnar Thorsen. Modified over 5 years ago.
Sim and PIC scoring results for standard peptides and the test shotgun proteomics dataset.
A small MudPIT dataset of K562 proteins, the manually validated subset of this dataset, and a manually validated dataset of peptides from standard proteins (Samples 1 and 2 in Table I) were scored with Sim, XCorr, Mowse, and PIC as described under “Materials and Methods.” Two types of searches were carried out: a “normal search” against the IPI database, allowing up to two missed tryptic cleavages and missed cleavage at KP or RP, and an “inverted search” with the same parameters but against a database in which each protein entry had its sequence inverted from C to N terminus. Only peptides with observed mass >950 Da and with standard deviation of DTA ion intensities >1,000 counts were included.
A, Sim score distribution for Sample 2 using sDTA files (⋄) or unprocessed DTA files (⋄). sDTA files yield a clear bimodal distribution, in contrast to unprocessed files, implying higher discrimination between correct and incorrect assignments. The ratio of areas under the two peaks in this distribution is ∼1:3, consistent with the expected number of tryptic peptide MS/MS spectra in this dataset (7). B, Sim score distribution for the Sample 2 dataset searched against an inverted protein sequence database, where all assignments are false positives. Little difference can be seen between sDTA files (▵) and unprocessed DTA files (▴). C, Sim score distribution for the standard protein dataset, processed to sDTA files and searched against a normal database (□) or an inverted database (▵). D, Sim score distribution for the Sample 2 dataset showing all assignments (⋄), correct identifications validated manually (□), and identifications from an inverted database search (▵).
E and F, comparisons of XCorr versus Sim (E) and Mowse versus Sim (F) for the Sample 2 dataset. The data form two clusters; all cases with Sim score >0.53 were validated by manual analysis (see Supplemental Fig. 6). Many validated assignments with high Sim scores could not be captured by Mowse or XCorr, whose values for these assignments fall below high-confidence thresholds. G, Sim versus PIC for the Sample 2 dataset. Cases with moderate PIC scores appear to have two parent peptide ions cosequenced during MS/MS. H, Sim versus PIC for false positives generated by searching the Sample 2 dataset against the inverted sequence database. This control shows that the distribution of scores for incorrect assignments closely resembles the low-scoring peak between Sim = 0 and 0.5 for normal assignments (D). Few incorrect assignments occur within the range for manually validated assignments (Sim > 0.5), reflecting good discrimination by Sim.
Shaojun Sun et al. Mol Cell Proteomics 2007;6:1-17. © 2007 The American Society for Biochemistry and Molecular Biology
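The inverted-database control and the inclusion filter described in the legend can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the use of the population standard deviation is an assumption (the legend does not say which estimator was used), and the Sim score itself is not computed here.

```python
from statistics import pstdev


def invert_sequence(sequence: str) -> str:
    """Return the protein sequence read from C to N terminus (decoy entry)."""
    return sequence[::-1]


def passes_filter(observed_mass: float, ion_intensities: list[float],
                  min_mass: float = 950.0, min_sd: float = 1000.0) -> bool:
    """Apply the legend's inclusion rule: observed mass > 950 Da and
    standard deviation of ion intensities > 1,000 counts.

    Assumption: population standard deviation (pstdev); the source does
    not specify the estimator.
    """
    if not ion_intensities:
        return False
    return observed_mass > min_mass and pstdev(ion_intensities) > min_sd


def fraction_above(scores: list[float], threshold: float) -> float:
    """Fraction of scores strictly above a threshold.

    Applied to scores from an inverted-database search, this gives the share
    of known-incorrect assignments that would pass the threshold.
    """
    if not scores:
        return 0.0
    return sum(score > threshold for score in scores) / len(scores)
```

Calling `fraction_above` on the Sim scores from an inverted search with a threshold of 0.5 would quantify the statement that few incorrect assignments fall within the range of manually validated ones.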