False-Discovery-Rate Aware Protein Inference by Generalized Protein Parsimony Nathan Edwards Department of Biochemistry and Molecular & Cellular Biology.


1 False-Discovery-Rate Aware Protein Inference by Generalized Protein Parsimony
Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

2 Peptide-Spectrum Matches
- Sigma49 – 32,691 LTQ MS/MS spectra of 49 human protein standards; IPI Human
- Yeast – 162,420 LTQ MS/MS spectra from a yeast cell lysate; SGD
- X!Tandem E-value (no refinement), 1% FDR
Spectra used in: Zhang, B.; Chambers, M. C.; Tabb, D. L. 2007.

3 Traditional Protein Parsimony
- Select the smallest set of proteins that explain all identified peptides.
- Sensible principle, implies:
  - Eliminate equivalent/subset proteins
- Equivalent proteins are problematic:
  - Which one to choose?
- Unique-protein peptides force the inclusion of proteins into solution
  - True for most tools, even probability based ones
  - Bad consequences for FDR filtered ids
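The traditional principle on this slide is a minimum set cover. A minimal sketch, assuming a hypothetical mapping from protein accession to its set of identified peptides (the function name and the toy data are illustrative and not from the talk), and using exhaustive search that is only practical for small components:

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def smallest_explaining_set(protein_peptides: Dict[str, Set[str]]) -> Tuple[str, ...]:
    """Return one smallest set of proteins whose peptides cover every peptide.

    Exhaustive search by increasing set size; ties are broken by sorted
    accession order, which is an arbitrary choice made for this sketch.
    """
    all_peptides: Set[str] = set()
    for peptides in protein_peptides.values():
        all_peptides |= peptides
    proteins = sorted(protein_peptides)
    for size in range(len(proteins) + 1):
        for combo in combinations(proteins, size):
            covered: Set[str] = set()
            for name in combo:
                covered |= protein_peptides[name]
            if covered == all_peptides:
                return combo
    return tuple(proteins)


# Hypothetical toy input: A and D are equivalent, C is contained in B,
# and E is kept only because of its single unique peptide p4.
example = {
    "A": {"p1", "p2"},
    "B": {"p2", "p3"},
    "C": {"p3"},
    "D": {"p1", "p2"},
    "E": {"p4"},
}
print(smallest_explaining_set(example))
```

In the toy input, the choice between A and D is arbitrary, and E enters the solution on the strength of one peptide, which are the two problems the slide points out.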

4 Many proteins are easy
- Eliminate equivalent / dominated proteins
  - Sigma49: 277 → 60 proteins
  - Yeast: 1226 → 1085 proteins
- Many components have a single protein:
  - Sigma49: 52 (3 multi-protein)
  - Yeast: 994 (43 multi-protein)
- "Unique" peptides force protein inclusion
  - Sigma49: 16 single-peptide proteins
  - Yeast: 476 single-peptide proteins
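The reduction described above can be sketched in two steps: collapse proteins with identical peptide sets, drop proteins whose peptide set is strictly contained in another's, then split what remains into connected components. This is a sketch under the assumption that "dominated" means strict containment of peptide sets; the function names and toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def eliminate_equivalent_and_dominated(
    protein_peptides: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Map each kept representative protein to its equivalent proteins."""
    groups: Dict[FrozenSet[str], List[str]] = {}
    for name in sorted(protein_peptides):
        groups.setdefault(frozenset(protein_peptides[name]), []).append(name)
    kept: Dict[str, List[str]] = {}
    for peptides, members in groups.items():
        # Strict subset of another group's peptides: contained, so dropped.
        if any(peptides < other for other in groups):
            continue
        kept[members[0]] = members[1:]
    return kept


def components(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proteins that are linked, directly or indirectly, by shared peptides."""
    remaining = set(protein_peptides)
    result: List[List[str]] = []
    while remaining:
        seed = min(remaining)
        remaining.discard(seed)
        component = {seed}
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            linked = {
                q for q in remaining
                if protein_peptides[q] & protein_peptides[current]
            }
            remaining -= linked
            component |= linked
            frontier.extend(linked)
        result.append(sorted(component))
    return result


toy = {
    "A": {"p1", "p2"},
    "B": {"p2", "p3"},
    "C": {"p3"},
    "D": {"p1", "p2"},
    "E": {"p4"},
}
kept = eliminate_equivalent_and_dominated(toy)
reduced = {name: toy[name] for name in kept}
print(kept)
print(components(reduced))
```

Single-protein components such as the one containing E need no further decision; only multi-protein components require a selection step.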

5 Must eliminate redundancy
- Contained proteins should not be selected
[Figure: 37 distinct peptides]

6 Must eliminate redundancy
- Contained proteins should not be selected
  - Even if they have some probability mass
- Number of sibling peptides matter less if they are shared.
[Figure: values 1.0, 0.8, 0.7, 0.0, 1.0; label: Single AA Difference]

7 Must ignore some PSMs
- A single additional peptide should not force protein into solution
[Figure: values 1.0, 0.0, 1.0; label: Single AA Difference]

8 Example from Yeast
- "Inosine monophosphate dehydrogenase"
  - 4 gene family
- Contained proteins should not be selected
- Single peptide evidence for YML056C
[Figure: values 1.0, 0.6, 0.0, 1.0]

9 Must ignore some PSMs
- Improving peptide identification sensitivity makes things worse!
- False PSMs don't cluster
[Figure: labels 10%, 2x, Proteins, PSMs]

10 Must ignore some PSMs
- Improving peptide identification sensitivity makes things worse!
- False PSMs don't cluster
[Figure: "Select Proteins to Explain True PSM%"; labels: PSMs, 90%]

11 Must ignore some PSMs
- How do we choose?
  - Maximize # peptides?
  - Minimize FDR (naïve model)?
  - Maximize # PSMs?

12 Generalized Protein Parsimony
- Weight peptides by number of PSMs
- Constrain unique peptides per protein
- Maximize explained peptides (PSMs)
- Match PSM filtering FDR to % uncovered PSMs
- Readily solved by branch-and-bound
- Permits complex protein/peptide constraints
- Reduces to traditional protein parsimony
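One way to read the formulation above is sketched below. It is not the author's implementation, and several details are assumptions made for illustration: a peptide counts as "unique" to a selected protein when no other selected protein contains it, every selected protein must have at least `min_unique` such peptides, the objective is the PSM-weighted total of explained peptides, and ties go to the smaller selection. The pruning rule is a plain upper bound on reachable weight.

```python
from typing import Dict, Iterable, List, Set, Tuple


def generalized_parsimony(
    protein_peptides: Dict[str, Set[str]],
    peptide_psms: Dict[str, int],
    min_unique: int = 2,
) -> Tuple[Tuple[str, ...], int]:
    """Return (selected proteins, explained PSM count) for one component."""
    proteins = sorted(protein_peptides)
    best_key = (0, 0)  # (explained PSM weight, -number of proteins)
    best_selection: Tuple[str, ...] = ()

    def union(names: Iterable[str]) -> Set[str]:
        covered: Set[str] = set()
        for name in names:
            covered |= protein_peptides[name]
        return covered

    def weight(peptides: Set[str]) -> int:
        return sum(peptide_psms.get(p, 0) for p in peptides)

    def feasible(chosen: List[str]) -> bool:
        # Assumed constraint: enough peptides not shared with other selections.
        for name in chosen:
            others = union(q for q in chosen if q != name)
            if len(protein_peptides[name] - others) < min_unique:
                return False
        return True

    def search(i: int, chosen: List[str]) -> None:
        nonlocal best_key, best_selection
        # Upper bound: everything the chosen and still-undecided proteins cover.
        bound = weight(union(chosen + proteins[i:]))
        if bound < best_key[0]:
            return
        if i == len(proteins):
            key = (bound, -len(chosen))
            if key > best_key and feasible(chosen):
                best_key, best_selection = key, tuple(chosen)
            return
        search(i + 1, chosen + [proteins[i]])  # branch: include protein i
        search(i + 1, chosen)                  # branch: exclude protein i

    search(0, [])
    return best_selection, best_key[0]


# Hypothetical toy component: 14 PSMs over five peptides.
toy_proteins = {"A": {"p1", "p2", "p3"}, "B": {"p3", "p4"}, "C": {"p5"}}
toy_psms = {"p1": 5, "p2": 3, "p3": 4, "p4": 1, "p5": 1}
print(generalized_parsimony(toy_proteins, toy_psms, min_unique=2))
print(generalized_parsimony(toy_proteins, toy_psms, min_unique=1))
```

With `min_unique=2`, the two single-PSM peptides p4 and p5 are left unexplained rather than forcing B and C into the solution. With `min_unique=1`, every peptide is explained, which is the behavior of traditional parsimony on this toy input.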

13 Match FDR to uncovered PSMs
- Traditional Parsimony at 1% FDR: 1085 (609 2+-Unique) Proteins
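Matching the PSM filtering FDR to the uncovered PSM fraction can be sketched as a scan over candidate thresholds. Everything named here is hypothetical: PSMs are `(peptide, score)` pairs where the score is an FDR-like value, `explain` stands in for whatever protein-selection step is used and returns the peptides it explains, and "match" is taken to mean the smallest absolute gap between the threshold and the uncovered fraction.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set, Tuple


def match_filter_to_uncovered(
    psms: List[Tuple[str, float]],
    thresholds: Iterable[float],
    explain: Callable[[Dict[str, int]], Set[str]],
) -> Tuple[float, float]:
    """Return (threshold, uncovered fraction) where the two agree best."""
    best = None
    for fdr in sorted(thresholds):
        # Filter PSMs at this threshold and count PSMs per peptide.
        counts = Counter(peptide for peptide, score in psms if score <= fdr)
        total = sum(counts.values())
        if total == 0:
            continue
        explained = explain(dict(counts))
        covered = sum(counts[p] for p in explained if p in counts)
        uncovered = 1 - covered / total
        gap = abs(uncovered - fdr)
        if best is None or gap < best[0]:
            best = (gap, fdr, uncovered)
    if best is None:
        raise ValueError("no PSMs pass any threshold")
    return best[1], best[2]


# Hypothetical toy data: peptide "x" is never explained by the selector.
toy_scored_psms = [("a", 0.005)] * 9 + [("x", 0.05)]


def toy_explain(counts: Dict[str, int]) -> Set[str]:
    return {p for p in counts if p != "x"}


print(match_filter_to_uncovered(toy_scored_psms, [0.01, 0.1], toy_explain))
```

Passing the selection step in as a callable keeps the protein criteria fixed while only the PSM filter varies.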

14 Software
- Filter multi-acquisition identifications by:
  - FDR, E-value, probability
- Rewrite PSMs to reflect parsimony analysis
  - PepXML, CSV, Excel
- Component-wise Peptide-Protein matrix:
  - Selected, Dominant, Equivalent, Contained
- Selected protein accessions:
  - …plus equivalents

15 Conclusions
- Many components are clear
  - Doesn't matter what technique is used
- Traditional techniques do not handle the second protein in a component well
  - A single additional peptide should not force a protein into solution
- Explain only the true PSM %:
  - Determine protein criteria first
  - Adjust PSM filter until explained peptides match

