Protein Inference by Generalized Protein Parsimony reduces False Positive Proteins in Bottom-Up Workflows Nathan J. Edwards, Department of Biochemistry.

Nathan J. Edwards, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University

Introduction
Protein inference tools are poorly designed for FDR-filtered peptide identifications:
- True peptide identifications cluster on relatively few true proteins, while false peptide identifications are spread across many different proteins, magnifying the number of false positive proteins.
- Boosting the number of peptide identifications at a fixed FDR increases the number of false positive proteins.
Successful protein inference:
- must ignore a significant proportion of the peptide identifications;
- must ensure that inferred proteins are supported by at least two unique peptides.

Traditional Protein Parsimony
- Dominated and equivalent proteins can be quickly and easily eliminated.
- Unique peptides force proteins into the solution.
- The unresolved protein-peptide bipartite graph can be decomposed into connected components.
- Many components are trivial.
- The greedy solution is optimal for most components.
- Branch-and-bound easily finds the optimal solution for all components.

Generalized Protein Parsimony
- Peptides are weighted by the number of spectra (peptide identifications) they represent.
- Constrain the minimum number of unique peptides per protein.
- Minimize the number of proteins covering a fixed proportion of the peptide identifications (cf. FDR), or maximize the covered peptide identifications subject to protein constraint(s).
- The greedy solution is not necessarily feasible!
- Branch-and-bound readily finds the optimal solution.

Spectra and Peptide Identifications
- tSPMDb: 92,985 LCQ MS/MS spectra of 18 protein standards and contaminants [1]; SwissProt.
- Sigma49: 32,691 LTQ MS/MS spectra of 49 human protein standards [2]; IPI Human.
- Yeast: 162,420 LTQ MS/MS spectra from a yeast cell lysate [2]; Saccharomyces Genome Database.
- X!Tandem (no refinement), filtered at 1% FDR; FDR estimated using a reversed target database.
- 1HW Elim.: one-hit wonders eliminated before parsimony analysis [2].
- Comparison with ProteinProphet [3] applied to FDR-filtered peptide identifications:
  PP: ProteinProphet probability > 0;
  PP*: ProteinProphet probability ≥ (1 − FDR) and number of unique stripped peptides ≥ 2.

Figure 1: Inferred proteins for the a) tSPMDb, b) Sigma49, and c) Yeast datasets.
Figure 2: Large connected components of the a) tSPMDb, b) Sigma49, and c) Yeast protein-peptide bipartite graphs. Rows: proteins; columns: peptides; observed peptides in red.

Conclusions
- Inferred proteins should be supported by at least two unique peptides.
- Inferred proteins from FDR-filtered peptide identifications should leave some peptides uncovered, especially one-hit wonders.
- Branch-and-bound solves generalizations of the protein parsimony problem optimally.

References
1. Purvine, S.; Picone, A. F.; Kolker, E. 2004.
2. Zhang, B.; Chambers, M. C.; Tabb, D. L.
3. Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. 2003.
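The Introduction's argument, that false peptide identifications scatter over many proteins while true ones cluster, can be made concrete with a little arithmetic. This Python sketch is a hypothetical worst case, not data from the poster: it assumes every false peptide identification lands on a different protein and that all true identifications fall on a fixed set of true proteins.

```python
def worst_case_protein_fdr(n_peptide_ids: int, peptide_fdr: float,
                           n_true_proteins: int) -> float:
    """Protein-level false discovery proportion under a worst-case assumption.

    Assumption (hypothetical, for illustration only): each false peptide
    identification hits a different protein, while all true identifications
    fall on a fixed set of n_true_proteins proteins.
    """
    n_false_peptides = round(n_peptide_ids * peptide_fdr)
    n_false_proteins = n_false_peptides  # one new false protein per false ID
    return n_false_proteins / (n_false_proteins + n_true_proteins)


# Hypothetical numbers: 200 true proteins, 1% peptide-level FDR.
for n_ids in (10_000, 20_000):
    print(n_ids, round(worst_case_protein_fdr(n_ids, 0.01, 200), 3))
# 10000 0.333
# 20000 0.5
```

Under these assumptions, doubling the identifications at the same 1% FDR doubles the false proteins while the true proteins stay fixed, so the protein-level false discovery proportion rises from about 33% to 50%.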
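The first reductions listed under Traditional Protein Parsimony (merging equivalent proteins, dropping dominated ones, and forcing in proteins with unique peptides) can be sketched as a single pass over a protein-to-peptides map. This is a minimal illustration, not the poster's code: the input format, the tuple naming of equivalent-protein groups, and the single pass (a fuller implementation might repeat the reductions once forced proteins are set aside) are all assumptions.

```python
from collections import Counter


def reduce_proteins(protein_peptides):
    """One pass of the classic parsimony reductions.

    protein_peptides maps a protein accession to the set of observed peptides
    it contains. Returns (kept, forced): kept maps a tuple of equivalent
    accessions to their shared peptide set; forced holds the kept groups that
    own at least one peptide no other kept group contains.
    """
    # 1. Merge equivalent proteins (identical peptide sets) into one group.
    groups = {}
    for protein, peptides in protein_peptides.items():
        groups.setdefault(frozenset(peptides), []).append(protein)
    merged = {tuple(sorted(names)): set(peps) for peps, names in groups.items()}

    # 2. Drop dominated groups: peptide set is a strict subset of another's.
    kept = {g: peps for g, peps in merged.items()
            if not any(peps < other for other in merged.values())}

    # 3. A peptide found in exactly one kept group forces that group in.
    counts = Counter(pep for peps in kept.values() for pep in peps)
    forced = {g for g, peps in kept.items() if any(counts[p] == 1 for p in peps)}
    return kept, forced


# Hypothetical example: A and B are equivalent, C is dominated by them,
# and E is explained entirely by peptides shared with other groups.
example = {"A": {"p1", "p2"}, "B": {"p1", "p2"}, "C": {"p1"},
           "D": {"p3"}, "E": {"p2", "p4"}, "F": {"p4", "p5"}}
kept, forced = reduce_proteins(example)
print(sorted(kept))    # [('A', 'B'), ('D',), ('E',), ('F',)]
print(sorted(forced))  # [('A', 'B'), ('D',), ('F',)]
```

Group E survives the reductions but is not forced, so it is left for the component-wise search.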
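Decomposing the unresolved protein-peptide bipartite graph into connected components lets each component be solved independently, and single-protein components need no search at all. A minimal Python sketch, assuming the same protein-to-peptide-set input and using a union-find structure (one reasonable choice; breadth-first search would work equally well):

```python
def connected_components(protein_peptides):
    """Split the protein-peptide bipartite graph into connected components.

    Two proteins share a component if a chain of shared peptides links them.
    Returns a list of protein sets, ordered by their sorted members.
    """
    parent = {p: p for p in protein_peptides}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    owner = {}  # first protein seen containing each peptide
    for protein, peptides in protein_peptides.items():
        for pep in peptides:
            if pep in owner:
                a, b = find(protein), find(owner[pep])
                if a != b:
                    parent[a] = b  # shared peptide: merge the two components
            else:
                owner[pep] = protein

    components = {}
    for protein in protein_peptides:
        components.setdefault(find(protein), set()).add(protein)
    return sorted(components.values(), key=sorted)


graph = {"A": {"p1", "p2"}, "B": {"p2", "p3"}, "C": {"p4"},
         "D": {"p4", "p5"}, "E": {"p6"}}
print(connected_components(graph))
```

Here the call yields three components, {A, B}, {C, D}, and the trivial {E}.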
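The claims that the greedy solution is optimal for most components and that branch-and-bound finds the optimum for all can be illustrated on a textbook set-cover instance. In this hypothetical three-protein component, greedy first takes the protein with the most peptides and ends up needing three proteins, while an exact branch-and-bound search finds two. The branching rule (pick an uncovered peptide and try each protein containing it) and the simple size bound are illustrative choices, not necessarily those used in the poster.

```python
def greedy_cover(protein_peptides):
    """Repeatedly pick the protein explaining the most unexplained peptides."""
    uncovered = set().union(*protein_peptides.values())
    chosen = []
    while uncovered:
        # Ties are broken by accession order because max keeps the first maximum.
        best = max(sorted(protein_peptides),
                   key=lambda p: len(protein_peptides[p] & uncovered))
        chosen.append(best)
        uncovered -= protein_peptides[best]
    return chosen


def branch_and_bound_cover(protein_peptides):
    """Exact minimum number of proteins explaining every peptide."""
    proteins = sorted(protein_peptides)
    best = [list(proteins)]  # trivially feasible cover: every protein

    def search(uncovered, chosen):
        if not uncovered:
            if len(chosen) < len(best[0]):
                best[0] = list(chosen)
            return
        # Bound: at least one more protein is needed; prune if that
        # cannot beat the best cover found so far.
        if len(chosen) + 1 >= len(best[0]):
            return
        # Branch: some protein containing this uncovered peptide must be chosen.
        peptide = min(uncovered)
        for protein in proteins:
            if peptide in protein_peptides[protein]:
                chosen.append(protein)
                search(uncovered - protein_peptides[protein], chosen)
                chosen.pop()

    search(set().union(*protein_peptides.values()), [])
    return best[0]


component = {"P1": {"a", "b", "c", "d"}, "P2": {"a", "c", "e"},
             "P3": {"b", "d", "f"}}
print(greedy_cover(component))            # ['P1', 'P2', 'P3']
print(branch_and_bound_cover(component))  # ['P2', 'P3']
```

On real components, which are usually small after the reductions above, greedy often already matches the optimum; the exact search matters for the few components where it does not.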
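Generalized Protein Parsimony adds spectrum-count weights and a per-protein unique-peptide constraint. This Python sketch solves one of the listed variants, minimizing the number of proteins that explain a fixed fraction of the weighted peptide identifications, with an include/exclude branch-and-bound. It assumes that "unique" means a peptide contained in no other selected protein; under that reading the constraint is safe to prune on, because adding proteins can only remove unique peptides. The function name and example data are hypothetical.

```python
def generalized_parsimony(protein_peptides, weight, fraction, min_unique):
    """Fewest proteins explaining >= fraction of the total peptide weight.

    weight[peptide] is its spectrum count. Each chosen protein must keep at
    least min_unique peptides that no other chosen protein contains (an
    assumed reading of the constraint). Returns a list of proteins in
    accession order, or None if no selection is feasible.
    """
    proteins = sorted(protein_peptides)
    target = fraction * sum(weight.values())
    best = [None]

    def covered(selection):
        peptides = set().union(*(protein_peptides[p] for p in selection))
        return sum(weight[x] for x in peptides)

    def unique_ok(selection):
        for p in selection:
            others = set().union(*(protein_peptides[q] for q in selection if q != p))
            if len(protein_peptides[p] - others) < min_unique:
                return False
        return True

    def search(i, selection):
        # Adding proteins only shrinks unique-peptide sets, so a violation
        # here rules out every extension of this selection.
        if not unique_ok(selection):
            return
        if covered(selection) >= target:
            if best[0] is None or len(selection) < len(best[0]):
                best[0] = list(selection)
            return  # extensions may be feasible but are never smaller
        # Size bound: at least one more protein is needed.
        if best[0] is not None and len(selection) + 1 >= len(best[0]):
            return
        # Coverage bound: even every remaining protein would fall short.
        if i == len(proteins) or covered(selection + proteins[i:]) < target:
            return
        selection.append(proteins[i])  # branch 1: include protein i
        search(i + 1, selection)
        selection.pop()
        search(i + 1, selection)       # branch 2: exclude protein i

    search(0, [])
    return best[0]


example = {"A": {"a1", "a2", "s"}, "B": {"b1", "b2", "s"}, "C": {"s", "c1"}}
weight = {"a1": 5, "a2": 3, "b1": 1, "b2": 1, "s": 10, "c1": 1}
print(generalized_parsimony(example, weight, 0.9, 2))  # ['A', 'B']
print(generalized_parsimony(example, weight, 1.0, 2))  # None
```

Requiring 90% of the weight with two unique peptides per protein selects A and B; the pair A and C also reaches 90%, but C keeps only one unique peptide. Demanding 100% of the weight has no feasible answer, which is why the formulation must be allowed to leave some peptide identifications uncovered.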
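The poster filters X!Tandem identifications at 1% FDR, estimated with a reversed target database. One common target-decoy recipe is sketched here; the estimator (#decoy / #target above the threshold), the (score, is_decoy) input format, and the "REV_" accession prefix are assumptions rather than details from the poster. Scores are treated as higher-is-better, so X!Tandem expectation values would need transforming first (for example with -log10).

```python
def reversed_decoys(proteins):
    """Build a decoy database by reversing each target sequence.

    The "REV_" accession prefix is a hypothetical naming convention.
    """
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def fdr_filter(psms, max_fdr=0.01):
    """Keep target identifications above the loosest score threshold whose
    estimated FDR (#decoy / #target above the threshold) is <= max_fdr.

    psms: list of (score, is_decoy) pairs; higher scores are better.
    """
    ranked = sorted(psms, key=lambda x: x[0], reverse=True)
    targets = decoys = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for i, (score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Only place a threshold between distinct scores, never inside a tie.
        at_boundary = i == len(ranked) or ranked[i][0] < score
        if at_boundary and targets and decoys / targets <= max_fdr:
            cutoff = i
    return [psm for psm in ranked[:cutoff] if not psm[1]]


print(reversed_decoys({"P1": "PEPTIDEK"}))  # {'REV_P1': 'KEDITPEP'}
```

With 10 targets, one decoy, 10 more targets, and a final decoy (in score order), a 1% threshold keeps only the first 10 targets, while a 10% threshold keeps all 20.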
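One-hit-wonder elimination (1HW Elim.) removes proteins supported by a single peptide before parsimony analysis. A minimal Python sketch, assuming the same protein-to-peptides map and counting distinct peptide sequences; the function name is hypothetical. It also reports the peptides that no surviving protein explains, which the Conclusions expect some of to be:

```python
def eliminate_one_hit_wonders(protein_peptides, min_peptides=2):
    """Drop proteins supported by fewer than min_peptides distinct peptides.

    Returns (kept, orphaned): the surviving protein-peptide map and the set of
    peptides that no surviving protein explains any more.
    """
    kept = {p: set(peps) for p, peps in protein_peptides.items()
            if len(set(peps)) >= min_peptides}
    all_peptides = set().union(*protein_peptides.values())
    still_explained = set().union(*kept.values())
    return kept, all_peptides - still_explained


kept, orphaned = eliminate_one_hit_wonders(
    {"A": {"p1", "p2"}, "B": {"p3"}, "C": {"p2"}})
print(sorted(kept), sorted(orphaned))  # ['A'] ['p3']
```

Peptide p2 survives because protein A still explains it; p3 is left uncovered because its only protein was a one-hit wonder.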
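The PP and PP* comparison sets are simple filters over ProteinProphet output. This sketch assumes the output has already been parsed into hypothetical (protein, probability, peptides) tuples, that "unique stripped peptides" means distinct sequences after removing modification annotations, and that modifications are written as bracketed mass deltas such as M[+16]; none of these formats are taken from ProteinProphet itself.

```python
import re

# Hypothetical modification notation: mass deltas in brackets, e.g. "M[+16]".
MOD_PATTERN = re.compile(r"\[[^\]]*\]")


def strip_peptide(peptide):
    """Remove modification annotations, leaving the bare residue sequence."""
    return MOD_PATTERN.sub("", peptide)


def pp(results):
    """PP: every protein with ProteinProphet probability > 0."""
    return [protein for protein, prob, _ in results if prob > 0]


def pp_star(results, fdr=0.01, min_stripped=2):
    """PP*: probability >= 1 - FDR and >= min_stripped distinct stripped peptides."""
    kept = []
    for protein, prob, peptides in results:
        stripped = {strip_peptide(p) for p in peptides}
        if prob >= 1 - fdr and len(stripped) >= min_stripped:
            kept.append(protein)
    return kept


results = [("P1", 0.995, ["ACDEFK", "AC[+57]DEFK"]),
           ("P2", 0.999, ["ACDEFK", "GHIKLR"]),
           ("P3", 0.5, ["MNPQR", "STVWY"]),
           ("P4", 0.0, ["LLLLK", "VVVVR"])]
print(pp(results))       # ['P1', 'P2', 'P3']
print(pp_star(results))  # ['P2']
```

P1 passes the probability cut but fails PP* because its two peptides collapse to one sequence once the modification is stripped.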

