Published by Marilyn Dixon. Modified over 9 years ago.
1
False-Discovery-Rate Aware Protein Inference by Generalized Protein Parsimony
Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
2
Peptide-Spectrum Matches
Sigma49: 32,691 LTQ MS/MS spectra of 49 human protein standards; IPI Human
Yeast: 162,420 LTQ MS/MS spectra from a yeast cell lysate; SGD
X!Tandem E-value (no refinement), 1% FDR
Spectra used in: Zhang, B.; Chambers, M. C.; Tabb, D. L. 2007.
3
Traditional Protein Parsimony
Select the smallest set of proteins that explains all identified peptides.
Sensible principle; it implies:
  Eliminate equivalent/subset proteins
Equivalent proteins are problematic: which one to choose?
Unique-protein peptides force the inclusion of proteins into the solution
  True for most tools, even probability-based ones
  Bad consequences for FDR-filtered IDs
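The traditional principle on this slide is a minimum set cover. A minimal sketch follows, assuming the identifications are given as a mapping from protein to its set of identified peptides; the function name, protein names, and peptide labels are hypothetical, and the exhaustive search is only practical for a small component.

```python
from itertools import combinations

def minimal_protein_set(protein_peptides):
    """Return a smallest set of proteins that covers every identified peptide."""
    all_peptides = set().union(*protein_peptides.values())
    proteins = sorted(protein_peptides)
    # Try subsets in order of increasing size; the first full cover is minimal.
    for size in range(1, len(proteins) + 1):
        for subset in combinations(proteins, size):
            covered = set().union(*(protein_peptides[p] for p in subset))
            if covered == all_peptides:
                return set(subset)
    return set()

# Toy data (hypothetical): P2 is a subset of P1; peptide "d" is unique to P3.
example = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
}
result = minimal_protein_set(example)
```

In this toy input the single peptide "d" is enough to force P3 into the solution, which is the behaviour the slide flags as a problem for FDR-filtered identifications.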
4
Many proteins are easy
Eliminate equivalent / dominated proteins
  Sigma49: 277 → 60 proteins
  Yeast: 1226 → 1085 proteins
Many components have a single protein:
  Sigma49: 52 (3 multi-protein)
  Yeast: 994 (43 multi-protein)
"Unique" peptides force protein inclusion
  Sigma49: 16 single-peptide proteins
  Yeast: 476 single-peptide proteins
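The two reductions on this slide can be sketched as follows. This assumes "equivalent" means identical peptide sets, "dominated" means a peptide set strictly contained in another protein's set, and a "component" is a group of proteins linked by shared peptides; the function names and toy data are hypothetical.

```python
from collections import defaultdict

def collapse_proteins(protein_peptides):
    """Group equivalent proteins and drop groups dominated by another group."""
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    kept = {}
    for peptides, members in groups.items():
        # Dominated: a strict subset of some other group's peptide set.
        if not any(peptides < other for other in groups):
            kept[tuple(sorted(members))] = set(peptides)
    return kept

def components(groups):
    """Split protein groups into components connected by shared peptides."""
    remaining = dict(groups)
    result = []
    while remaining:
        name, peptides = remaining.popitem()
        component, seen = [name], set(peptides)
        grew = True
        while grew:  # keep absorbing groups that share a peptide with the component
            grew = False
            for other in list(remaining):
                if remaining[other] & seen:
                    seen |= remaining.pop(other)
                    component.append(other)
                    grew = True
        result.append(sorted(component))
    return result

# Toy data (hypothetical): A and B are equivalent, C is dominated, E stands alone.
example = {
    "A": {"p1", "p2"},
    "B": {"p1", "p2"},
    "C": {"p1"},
    "D": {"p2", "p3"},
    "E": {"p9"},
}
groups = collapse_proteins(example)
parts = sorted(components(groups))
```

Single-protein components such as the one containing E need no further analysis; only multi-protein components require a real choice.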
5
Must eliminate redundancy
Contained proteins should not be selected
[Figure: 37 distinct peptides]
6
Must eliminate redundancy
Contained proteins should not be selected
  Even if they have some probability mass
The number of sibling peptides matters less if they are shared.
[Figure: values 1.0, 0.8, 0.7, 0.0, 1.0; label: Single AA Difference]
7
Must ignore some PSMs
A single additional peptide should not force a protein into the solution
[Figure: values 1.0, 0.0, 1.0; label: Single AA Difference]
8
Example from Yeast
"Inosine monophosphate dehydrogenase" 4-gene family
Contained proteins should not be selected
Single-peptide evidence for YML056C
[Figure: values 1.0, 0.6, 0.0, 1.0]
9
Must ignore some PSMs
Improving peptide identification sensitivity makes things worse!
False PSMs don't cluster
[Figure labels: 10%, 2x, Proteins, PSMs]
10
Must ignore some PSMs
Improving peptide identification sensitivity makes things worse!
False PSMs don't cluster
[Figure labels: Select Proteins to Explain True PSM%, PSMs, 90%]
11
Must ignore some PSMs
How do we choose?
  Maximize # peptides?
  Minimize FDR (naïve model)?
  Maximize # PSMs?
12
Generalized Protein Parsimony
Weight peptides by number of PSMs
Constrain unique peptides per protein
Maximize explained peptides (PSMs)
Match PSM filtering FDR to % uncovered PSMs
Readily solved by branch-and-bound
Permits complex protein/peptide constraints
Reduces to traditional protein parsimony
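One way to read this formulation is sketched below as a small branch-and-bound search. It is not the presenter's implementation: it assumes that a protein's "unique" peptides are those matched by no other selected protein, that every selected protein must have at least `min_unique` of them, that the objective is the total PSM count of covered peptides, and that ties go to the smaller protein set. All names and the toy data are hypothetical.

```python
def generalized_parsimony(protein_peptides, psm_counts, min_unique=2):
    """Pick proteins maximizing explained PSMs under a unique-peptide constraint."""
    proteins = sorted(protein_peptides)
    best = {"score": (-1, 0), "selected": []}

    def weight(peptides):
        return sum(psm_counts[p] for p in peptides)

    def feasible(selected):
        # Every selected protein needs min_unique peptides no other selected protein has.
        for prot in selected:
            others = set()
            for q in selected:
                if q != prot:
                    others |= protein_peptides[q]
            if len(protein_peptides[prot] - others) < min_unique:
                return False
        return True

    def search(index, selected, covered):
        # Adding proteins only shrinks unique-peptide sets, so an infeasible
        # partial selection can never become feasible again: prune it.
        if not feasible(selected):
            return
        if index == len(proteins):
            score = (weight(covered), -len(selected))
            if score > best["score"]:
                best["score"], best["selected"] = score, sorted(selected)
            return
        # Bound: PSMs reachable even if every undecided protein were selected.
        reachable = covered.union(*(protein_peptides[p] for p in proteins[index:]))
        if weight(reachable) < best["score"][0]:
            return
        prot = proteins[index]
        search(index + 1, selected + [prot], covered | protein_peptides[prot])
        search(index + 1, selected, covered)

    search(0, [], set())
    return best["selected"], best["score"][0]

# Toy component (hypothetical): P2 adds only peptide "e", seen in a single PSM.
toy_proteins = {"P1": {"a", "b", "c", "d"}, "P2": {"b", "c", "e"}, "P3": {"f", "g"}}
toy_psms = {"a": 5, "b": 3, "c": 4, "d": 2, "e": 1, "f": 2, "g": 2}
strict = generalized_parsimony(toy_proteins, toy_psms, min_unique=2)
loose = generalized_parsimony(toy_proteins, toy_psms, min_unique=1)
```

With `min_unique=2` the sketch leaves the one PSM for peptide "e" unexplained rather than admitting P2; with `min_unique=1` the same peptide pulls P2 back in, which is the traditional-parsimony behaviour on this toy input.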
13
Match FDR to uncovered PSMs
Traditional parsimony at 1% FDR: 1085 proteins (609 with 2+ unique peptides)
14
Software
Filter multi-acquisition identifications by: FDR, E-value, probability
Rewrite PSMs to reflect parsimony analysis
  PepXML, CSV, Excel
Component-wise peptide-protein matrix:
  Selected, Dominant, Equivalent, Contained
Selected protein accessions:
  …plus equivalents
15
Conclusions
Many components are clear
  Doesn't matter what technique is used
Traditional techniques do not handle the second protein in a component well
  A single additional peptide should not force a protein into the solution
Explain only the true PSM %:
  Determine protein criteria first
  Adjust PSM filter until explained peptides match
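The last recommendation can be sketched as a simple scan over candidate PSM filters. This is an illustration under stated assumptions, not the presenter's software: each PSM is a hypothetical `(peptide, q_value)` pair, the protein criteria are fixed in advance as a `select` callable, and the chosen filter is the one whose fraction of unexplained PSMs is closest to its FDR. The stand-in selector (at least two distinct peptides per protein) and the data are invented for the example.

```python
def uncovered_fraction(peptides, selected, peptide_proteins):
    """Fraction of PSMs whose peptide maps to no selected protein."""
    explained = sum(1 for pep in peptides if peptide_proteins[pep] & selected)
    return 1 - explained / len(peptides)

def match_filter_to_uncovered(psms, peptide_proteins, select, thresholds):
    """Return the FDR threshold whose uncovered-PSM fraction is closest to it."""
    best = None
    for fdr in thresholds:
        kept = [pep for pep, q_value in psms if q_value <= fdr]
        if not kept:
            continue
        selected = select(kept, peptide_proteins)
        gap = abs(uncovered_fraction(kept, selected, peptide_proteins) - fdr)
        if best is None or gap < best[0]:
            best = (gap, fdr, selected)
    return best[1], best[2]

def two_peptide_selector(peptides, peptide_proteins):
    """Stand-in protein criteria: keep proteins with at least two distinct peptides."""
    support = {}
    for pep in set(peptides):
        for prot in peptide_proteins[pep]:
            support.setdefault(prot, set()).add(pep)
    return {prot for prot, peps in support.items() if len(peps) >= 2}

# Toy data (hypothetical): peptides c and d are lone hits to P2 and P3.
peptide_proteins = {"a": {"P1"}, "b": {"P1"}, "c": {"P2"}, "d": {"P3"}}
psms = (
    [("a", 0.01)] * 5 + [("b", 0.01)] * 5
    + [("a", 0.05)] * 5 + [("b", 0.05)] * 4 + [("c", 0.05)]
    + [("d", 0.10)] * 3 + [("a", 0.10)] * 2
)
chosen_fdr, chosen_proteins = match_filter_to_uncovered(
    psms, peptide_proteins, two_peptide_selector, [0.01, 0.05, 0.10]
)
```

In this toy data the 5% filter keeps 20 PSMs and leaves one of them unexplained, so the uncovered fraction matches the filter; the 1% and 10% filters leave 0 of 10 and 4 of 25 unexplained.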