Published by Suzanna Short; modified over 9 years ago
1
Generalized Protein Parsimony and Spectral Counting for Functional Enrichment Analysis
Nathan Edwards
Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center
2
Systems Biology
Structured High-Throughput Experiments
Knowledge Databases
3
Systems Biology
molecular biology ↕ phenotype
Knowledge Databases: Localization, Function, Process, Interactions, Pathway, Mutation
Structured High-Throughput Experiments: Proteomics, Sequencing, Microarrays, Metabolomics
4
Systems Biology
molecular biology ↕ phenotype
Mathematical Models
Knowledge Databases: Localization, Function, Process, Interactions, Pathway, Mutation
Structured High-Throughput Experiments: Proteomics, Sequencing, Microarrays, Metabolomics
5
Systems Biology
molecular biology ↕ phenotype
Mathematical Models
Knowledge Databases: Localization, Function, Process, Interactions, Pathway, Mutation
Structured High-Throughput Experiments: Proteomics, Sequencing, Microarrays, Metabolomics
Functional Annotation Enrichment
8
Functional Annotation Enrichment
Draw 10 of 50!
In any draw, we expect ~5 "evens", ~2 "≤ 10", etc.
Each ball is equally likely
Balls are independent
p-value is surprise!
For transcriptomics:
Genes ↔ Balls
Genome ↔ Tumbler
Diff. Expr. ↔ Draw
Annotation ↔ "evens", …
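The urn picture can be made concrete with the hypergeometric distribution. The sketch below assumes every set of 10 balls is equally likely to be drawn (the standard hypergeometric urn model); it computes the expected number of "evens" and "≤ 10" balls and the upper-tail p-value that serves as the measure of surprise. The function names `hypergeom_upper_tail` and `expected_count` are hypothetical, chosen for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k): chance of at least k labelled balls when `draws` balls are
    taken without replacement from `population` balls, `successes` labelled."""
    top = min(successes, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), top + 1)
    )
    return tail / comb(population, draws)


def expected_count(population, successes, draws):
    """Mean of the hypergeometric distribution: draws * successes / population."""
    return draws * successes / population


balls = range(1, 51)                         # the tumbler: balls numbered 1..50
evens = sum(1 for b in balls if b % 2 == 0)  # 25 balls labelled "even"
low = sum(1 for b in balls if b <= 10)       # 10 balls labelled "<= 10"

print(expected_count(50, evens, 10))  # 5.0
print(expected_count(50, low, 10))    # 2.0
# Surprise: probability of 8 or more evens in a single draw of 10.
print(hypergeom_upper_tail(8, 50, evens, 10))
```

For transcriptomics the same function applies with genes as balls, the genome as the tumbler, the differentially expressed genes as the draw, and an annotation term as the label.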
9
Why not in proteomics?
Double counting and false positives…
…due to traditional protein inference
Proteomics cannot see all proteins…
…proteins are not equally likely to be drawn
Good relative abundance is hard…
…extra chemistries, workflows, and software
…missing values are particularly problematic
10
In proteomics…
Double counting and false positives…
…use generalized protein parsimony
Proteomics cannot see all proteins…
…use identified proteins as background
Good relative abundance is hard…
…model differential spectral counts directly
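Using identified proteins as background means the urn holds only the proteins the experiment could actually see. A minimal sketch, assuming proteins are plain accession strings held in Python sets; `annotation_enrichment` and the toy IDs are hypothetical. Annotated proteins that were never identified are dropped before the hypergeometric test.

```python
from math import comb


def annotation_enrichment(background, hits, annotated):
    """Hypergeometric upper-tail p-value for an annotation among `hits`,
    using only the identified proteins (`background`) as the urn."""
    if not hits <= background:
        raise ValueError("hits must be a subset of the background")
    annotated_bg = annotated & background   # ignore proteins never identified
    N, K, n = len(background), len(annotated_bg), len(hits)
    k = len(hits & annotated_bg)            # annotated proteins among the hits
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


identified = {f"P{i}" for i in range(1, 11)}   # 10 identified proteins (toy)
significant = {"P1", "P2", "P3"}               # 3 with changed counts (toy)
membrane = {"P1", "P2", "P3", "P4", "X99"}     # X99 was never identified
print(annotation_enrichment(identified, significant, membrane))
```

Here the urn has 10 balls, 4 of them labelled, so drawing 3 labelled balls out of 3 has probability C(4,3)/C(10,3) = 4/120.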
11
Ignore some PSMs
FDR filtering leaves some false PSMs
Enforce strict protein inference criteria
Leave some PSMs uncovered
[Figure: Proteins and PSMs; 10%]
12
Ignore some PSMs
FDR filtering leaves some false PSMs
Enforce strict protein inference criteria
Leave some PSMs uncovered
[Figure: Proteins and PSMs; 90%]
13
Match uncovered PSMs to FDR
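The last three slides describe a strict protein inference criterion that deliberately leaves some PSMs unexplained, tuned so the uncovered fraction matches the FDR. The sketch below is a hedged illustration, not the author's algorithm: it swaps in a simple greedy heuristic that accepts a protein only if it newly explains at least k still-uncovered PSMs, then picks the smallest k whose uncovered fraction reaches the target FDR. `k_unique_parsimony`, `choose_k_for_fdr`, and the toy data are hypothetical. Counting PSMs (rather than distinct peptides) toward k is also an assumption.

```python
def k_unique_parsimony(protein_to_psms, k):
    """Greedy, parsimony-style protein selection (an assumed heuristic).

    Repeatedly accept the protein explaining the most still-uncovered PSMs;
    stop once no protein would newly explain at least k of them.
    Returns (selected proteins in order, set of uncovered PSMs)."""
    uncovered = set().union(*protein_to_psms.values())
    remaining = {p: set(psms) for p, psms in protein_to_psms.items()}
    selected = []
    while remaining:
        # Ties are broken by protein name so the result is deterministic.
        best = max(remaining, key=lambda p: (len(remaining[p] & uncovered), p))
        if len(remaining[best] & uncovered) < k:
            break
        selected.append(best)
        uncovered -= remaining.pop(best)
    return selected, uncovered


def choose_k_for_fdr(protein_to_psms, fdr, k_max=10):
    """Smallest k (1..k_max) whose uncovered-PSM fraction reaches the FDR."""
    if k_max < 1:
        raise ValueError("k_max must be at least 1")
    total = len(set().union(*protein_to_psms.values()))
    if total == 0:
        raise ValueError("no PSMs to cover")
    for k in range(1, k_max + 1):
        selected, uncovered = k_unique_parsimony(protein_to_psms, k)
        if len(uncovered) / total >= fdr:
            break
    return k, selected, uncovered


# Toy data (hypothetical): protein -> PSM identifiers it could explain.
proteins = {
    "P1": {"s1", "s2", "s3", "s4"},
    "P2": {"s3", "s4", "s5"},
    "P3": {"s6"},
}
print(k_unique_parsimony(proteins, 2))  # selects P1 only; s5 and s6 stay uncovered
print(choose_k_for_fdr(proteins, fdr=0.10))
```

Because the greedy order does not depend on k (k only decides when to stop), raising k can only leave more PSMs uncovered, which is what makes a simple upward scan over k reasonable.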
14
Plasma membrane enrichment
Pellicle enrichment of plasma membrane: Choksawangkarn et al., JPR 2013 (Fenselau Lab)
Six replicate LC-MS/MS analyses each:
Cell lysate (44,861 MS/MS)
Fe3O4-Al2O3 pellicle (21,871 MS/MS)
625 3-unique proteins to match 10% FDR:
Lysate: 18,976 PSMs; Pellicle: 13,723 PSMs
89 proteins with significantly (p < 10^-5) increased counts
15
Plasma membrane enrichment
Na+/K+ ATPase subunit alpha-1 (P05023): Lysate: 1; Pellicle: 90; p-value: 5.2 × 10^-33
Transferrin receptor protein 1 (P02786): Lysate: 17; Pellicle: 63; p-value: 2.0 × 10^-11
DAVID Bioinformatics analysis (89/625):
Plasma membrane (GO:0005886): 29 (5.2 × 10^-5)
Transmembrane (SwissProtKW): 24 (1.3 × 10^-6)
Transmembrane (SwissProtKW): Lysate: 524; Pellicle: 1335; p-value: 2.6 × 10^-158
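One model that appears to fit the per-protein p-values above treats a protein's PSMs as a draw from the pooled PSMs of both samples, with the number landing in the pellicle following a hypergeometric distribution. A sketch under stated assumptions: the condition totals are the post-inference PSM counts (18,976 lysate, 13,723 pellicle), only increased pellicle counts are tested, and `count_increase_pvalue` and `term_pvalue` are hypothetical names. The term-level test simply pools PSM counts over every protein carrying the annotation before applying the same test.

```python
from math import comb


def count_increase_pvalue(a, b, total_a, total_b):
    """Upper-tail hypergeometric p-value for seeing >= b of a protein's a + b
    PSMs in condition B, when the pooled PSMs are total_a (A) + total_b (B)."""
    n = a + b
    tail = sum(
        comb(total_b, i) * comb(total_a, n - i)
        for i in range(b, min(n, total_b) + 1)
    )
    # Exact integer arithmetic; the single division at the end avoids
    # underflow in the intermediate terms.
    return tail / comb(total_a + total_b, n)


def term_pvalue(counts, proteins_with_term, total_a, total_b):
    """Pool (A, B) PSM counts over every protein carrying an annotation term,
    then apply the same test to the pooled counts."""
    a = sum(counts[p][0] for p in proteins_with_term)
    b = sum(counts[p][1] for p in proteins_with_term)
    return count_increase_pvalue(a, b, total_a, total_b)


LYSATE_PSMS, PELLICLE_PSMS = 18_976, 13_723        # totals from the slides
counts = {"P05023": (1, 90), "P02786": (17, 63)}   # (lysate, pellicle) PSMs
for protein, (a, b) in counts.items():
    print(protein, count_increase_pvalue(a, b, LYSATE_PSMS, PELLICLE_PSMS))
# Toy term containing just these two proteins (hypothetical grouping):
print(term_pvalue(counts, ["P05023", "P02786"], LYSATE_PSMS, PELLICLE_PSMS))
# Transmembrane keyword, using the pooled counts reported on the slide:
print(count_increase_pvalue(524, 1335, LYSATE_PSMS, PELLICLE_PSMS))
```

With these totals the per-protein results come out on the order of 10^-33 for P05023 and 10^-11 for P02786, in line with the reported values; whether this is exactly the model used for the slides is an assumption.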
16
A protein's PSMs rise and fall together!
17
A protein's PSMs rise and fall together?
18
Anomalies indicate proteoforms
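If a protein's PSMs rise and fall together, every peptide should split between the two conditions in roughly the same proportion; a peptide that breaks the pattern hints at a distinct proteoform. The slides name a (multivariate-)hypergeometric model but not the exact test, so the sketch below swaps in a Monte Carlo permutation test: shuffling condition labels across the protein's PSMs, with peptide and condition totals held fixed, samples peptide-by-condition tables from the multivariate hypergeometric null, and a Pearson chi-square statistic scores each table. `chi_square`, `proteoform_pvalue`, and the example counts are hypothetical.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic for a peptides x conditions count table."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    total = sum(row_sums)
    if total == 0:
        return 0.0
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_sums[i] * col_sums[j] / total
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat


def proteoform_pvalue(table, n_perm=2000, seed=0):
    """Permutation p-value that all peptides of one protein share a single
    condition split (small values suggest proteoform-like anomalies)."""
    rng = random.Random(seed)
    # One entry per PSM: which peptide it belongs to, and its condition label.
    peptide_of_psm = [i for i, row in enumerate(table) for _ in range(sum(row))]
    condition_labels = [j for row in table for j, c in enumerate(row) for _ in range(c)]
    observed = chi_square(table)
    n_rows, n_cols = len(table), len(table[0])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(condition_labels)   # keeps row and column totals fixed
        perm = [[0] * n_cols for _ in range(n_rows)]
        for pep, cond in zip(peptide_of_psm, condition_labels):
            perm[pep][cond] += 1
        if chi_square(perm) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical protein: peptides 1-2 favour condition A, peptide 3 favours B.
example = [[12, 3], [10, 4], [1, 15]]
print(proteoform_pvalue(example))
```

The `(hits + 1) / (n_perm + 1)` form keeps the estimate away from an impossible p-value of exactly zero; an exact multivariate hypergeometric enumeration would avoid Monte Carlo error but grows quickly with the number of peptides.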
19
Nascent polypeptide-associated complex subunit alpha: 7.3 × 10^-8
20
Pyruvate kinase isozymes M1/M2: 2.5 × 10^-5
21
Summary
Functional annotation enrichment for proteomics too:
Careful counting (generalized parsimony)
Differential abundance by spectral counts
Use (multivariate-)hypergeometric model for:
Differential abundance by spectral counts
Proteoform detection
22
HER2/Neu Mouse Model of Breast Cancer
Paulovich et al., JPR, 2007
Study of normal and tumor mammary tissue by LC-MS/MS
1.4 million MS/MS spectra
Peptide-spectrum assignments:
Normal samples (N_n): 161,286 (49.7%)
Tumor samples (N_t): 163,068 (50.3%)
4270 proteins identified in total
2-unique generalized protein parsimony
23
Distribution of p-values (Yeast)