Gene Set Enrichment and Splicing Detection using Spectral Counting Nathan Edwards Department of Biochemistry and Mol. & Cell. Biology Georgetown University.


1 Gene Set Enrichment and Splicing Detection using Spectral Counting Nathan Edwards Department of Biochemistry and Mol. & Cell. Biology Georgetown University Medical Center

2 Outline: Systems Biology; Gene Sets & Functional Enrichment; Balls in Urns; Proteomics; MS/MS and Peptide ID; Quantitation and Spectrum Counting; Differential Protein Abundance; Detecting Splicing and Isoforms

3 Systems Biology: Mathematical Models; Knowledge Databases; High-Throughput Experiments

4 Systems Biology: Mathematical Models; Knowledge Databases; High-Throughput Experiments (Sequencing, Microarrays, Proteomics, Metabolomics: molecular biology ↕ phenotype)

5 Systems Biology: Mathematical Models; Knowledge Databases (UniProt, OMIM, KEGG: molecular biology ↕ biology); High-Throughput Experiments

6 Systems Biology: Mathematical Models (Software, Statistics, Algorithms: phenotype ↕ biology); Knowledge Databases; High-Throughput Experiments

7 Systems Biology: Mathematical Models (Software, Statistics, Algorithms: phenotype ↕ biology); Knowledge Databases (UniProt, OMIM, KEGG: molecular biology ↕ biology); High-Throughput Experiments (Sequencing, Microarrays, Proteomics, Metabolomics: molecular biology ↕ phenotype)

8 Gene Expression Analysis. Differential expression via: structured experiments, transcript measurements, statistics. But now what?

9 Gene Expression Analysis. Hengel et al. J Immunol. 2003. Structured experiment: CD4+/L-selectin- T-cells vs CD4+/L-selectin+ T-cells. Affymetrix Human Genome U95A Array. Processing & Statistics: MAS 4.0, t-tests, FDR filtering, … 164 probe identifiers for upregulated genes.

10 Gene Expression Analysis 10 34529_AT 38816_AT 679_AT 37105_AT 34623_AT 36378_AT 35648_AT 33979_AT 34529_AT 1372_AT 38646_S_AT 35896_AT 34249_AT 40317_AT 32413_AT 33530_AT 32469_AT 34720_AT 36317_AT 31987_AT 33027_AT 35439_AT 36421_AT 966_AT 967_G_AT 31525_S_AT 38236_AT 34618_AT 34546_AT 31512_AT 40959_AT 38604_AT 33922_AT 40790_AT 35595_AT 33963_AT 33685_AT 35566_F_AT 33684_AT 36436_AT 37166_AT 34453_AT 1645_AT 39469_S_AT 38229_AT 38945_AT 37711_AT 39908_AT 1355_G_AT 38948_AT 1786_AT 39198_S_AT 606_AT 35091_AT 35090_G_AT 37954_AT 822_S_AT 36766_AT 37953_S_AT 38128_AT 40350_AT 37097_AT 33516_AT 38691_S_AT 34702_F_AT 31715_AT 1331_S_AT 34577_AT 33027_AT 38508_S_AT 32680_AT 39187_AT 31506_S_AT 31793_AT 40294_AT 40553_AT 1983_AT 32250_AT 37968_AT 33293_AT 40271_AT 32418_AT 33077_AT 38201_AT 2090_I_AT 34012_AT 34703_F_AT 38482_AT 40058_S_AT 34902_AT 34636_AT 41113_AT 35996_AT 40735_AT 34539_AT 41280_R_AT 37061_AT 34233_I_AT 41703_R_AT 37898_R_AT 35373_AT 37408_AT 35213_AT 31576_AT 39094_AT 32010_AT 919_AT 1855_AT 1391_S_AT 34436_AT 33371_S

11 Gene Expression Analysis
1112_g_at: neural cell adhesion molecule 1
1331_s_at: tumor necrosis factor receptor superfamily, member 25
1355_g_at: neurotrophic tyrosine kinase, receptor, type 2
1372_at: tumor necrosis factor, alpha-induced protein 6
1391_s_at: cytochrome P450, family 4, subfamily A, polypeptide 11
1403_s_at: chemokine (C-C motif) ligand 5
1419_g_at: nitric oxide synthase 2, inducible
1575_at: ATP-binding cassette, sub-family B (MDR/TAP), member 1
1645_at: KiSS-1 metastasis-suppressor
1786_at: c-mer proto-oncogene tyrosine kinase
1855_at: fibroblast growth factor 3 (murine mammary tumor virus integration site (v-int-2) oncogene homolog)
1890_at: growth differentiation factor 15
……

12 Gene Set Enrichment. Candidate genes are “special” with respect to the experiment structure (phenotype). Are they special with respect to general biological knowledge? Are the candidate genes related? Can we filter out the noise? Can we expose associated genes? What genes' changes are linked to the experimental structure / phenotype?

13 Gene Sets. Genes may be related in many ways: same pathway, similar function, cellular location, cytoband, identified in previous study, etc. Define gene sets for relatedness: GO Biological Process, GO Molecular Function, GO Cellular Component, KEGG Pathway, Biocarta Pathway. Biological knowledge databases.

14 Gene Set Enrichment

15 Gene Set Enrichment

16 Gene Set Enrichment

17 Drawing Balls from Urns: 1000 balls, 900 red, 100 blue.

18 Drawing Balls from Urns: 100 balls drawn at random? # Red? # Blue?

19 Drawing Balls from Urns: How surprising is 5, 10, 15, 20, … blue?

20 Drawing Balls from Urns: How surprising is 30, 50, 70, … blue?
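
The "how surprising" question for this urn can be sketched with the hypergeometric distribution. The function name `hypergeom_tail` and its parameter names are illustrative choices, not something from the slides. This is a minimal sketch that assumes every ball is equally likely to be drawn and that draws are made without replacement.

```python
from math import comb

def hypergeom_tail(population, blue_total, draws, blue_seen):
    """P(at least blue_seen blue balls) when drawing without replacement."""
    lower = max(blue_seen, 0)
    upper = min(draws, blue_total)
    # Count the equally likely draws that contain i blue balls, for every i >= blue_seen.
    favorable = sum(
        comb(blue_total, i) * comb(population - blue_total, draws - i)
        for i in range(lower, upper + 1)
    )
    return favorable / comb(population, draws)

# Urn from the slides: 1000 balls, 100 of them blue, 100 drawn at random.
expected_blue = 100 * 100 / 1000  # 10 blue balls on average
for k in (5, 10, 15, 20, 30):
    print(k, hypergeom_tail(1000, 100, 100, k))
```

The tail probability shrinks quickly once the blue count moves well past the expected 10, which is the sense in which larger counts are "more surprising".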

21 Drawing Balls from Urns: 6 of 155 upregulated genes have "oxygen binding" GO annotation! All human genes ( = 25), blue is oxygen binding.

22 How surprised should we be? Classic problem in probability theory. How well do the observed counts match the expected counts? Various mostly equivalent statistical tests are applied: Fisher exact test, hypergeometric test, chi-squared (χ²) test. The p-value measures "surprise".
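
As a rough illustration of comparing observed with expected counts, the chi-squared statistic for a two-color draw can be sketched as below. The draw of 80 red and 20 blue is a hypothetical example, not data from the talk, and the function name is illustrative. The goodness-of-fit form used here ignores the finite size of the urn, so it only approximates the exact hypergeometric answer.

```python
from math import erfc, sqrt

def chi_squared_two_colors(observed, expected):
    """Return (statistic, p-value) for two categories (1 degree of freedom)."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical draw of 100 balls: 80 red and 20 blue observed,
# where the 900/100 urn leads us to expect 90 red and 10 blue.
stat, p = chi_squared_two_colors([80, 20], [90, 10])
print(stat, p)
```

The closed-form p-value only holds for two categories; with more colors the degrees of freedom change and a general chi-squared survival function is needed.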

23 Proteomics. Proteins are the machines that drive much of biology. Genes are merely the recipe. The direct characterization of proteins en masse. What proteins are present? How much of each protein is present? Which proteins change in abundance?

24 Sample Preparation for Tandem Mass Spectrometry: Enzymatic Digest and Fractionation

25 Single Stage MS

26 Tandem Mass Spectrometry (MS/MS)

27 Peptide Fragmentation

28 LC-MS/MS. Powerful combination of liquid chromatography (LC) and tandem mass spectrometry (MS/MS). Automatically collect 100k MS/MS spectra in an afternoon. Tens of thousands of peptide-spectrum assignments. Thousands of proteins identified.

29 Spectral Counting. Abundant proteins are more likely to be identified: selection (by the instrument) for fragmentation is based on intensity; more abundant ions are more likely to fragment in an informative manner. A protein's peptide identification count (spectra) can be used as a crude abundance measurement. Easy, cheap, (relative) protein quantitation.

30 Differential Spectral Counts. Spectral counts are too crude for classical (microarray) statistics: fold change, t-tests, … However, we expect "similar" spectral counts when the protein abundance is unchanged. Recast as drawing balls from urns.

31 HER2/Neu Mouse Model of Breast Cancer. Paulovich, et al. JPR, 2007. Study of normal and tumor mammary tissue by LC-MS/MS. 1.4 million MS/MS spectra. Peptide-spectrum assignments: Normal samples (N_n): 161,286 (49.7%); Tumor samples (N_t): 163,068 (50.3%). 4270 proteins identified in total.

32 Drawing Balls from Urns: All Normal Spectra, All Tumor Spectra
Plastin-2 (Lcp1): 827, 102, 2.437E-123
Osteopontin (Spp1): 334, 19, 2.444E-62
Hypoxia up-regulated protein 1 (Hyou1): 200, 7, 1.437E-40
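
One way to recast a protein's spectral counts as an urn draw is sketched below. The totals N_t = 163,068 and N_n = 161,286 come from the previous slide. Reading the Plastin-2 row as 827 and 102 spectra, with the larger count belonging to tumor, is an assumption about the flattened table. The function name `urn_p_value` is illustrative, and the slides do not say which exact test produced their p-values, so this sketch is not expected to reproduce them.

```python
from math import comb

def urn_p_value(tumor_total, normal_total, tumor_count, normal_count):
    """One-sided hypergeometric p-value for a protein's spectra being tumor-heavy."""
    population = tumor_total + normal_total
    draws = tumor_count + normal_count
    # Count draws of this protein's spectra that contain at least tumor_count tumor spectra.
    favorable = sum(
        comb(tumor_total, i) * comb(normal_total, draws - i)
        for i in range(tumor_count, min(draws, tumor_total) + 1)
    )
    return favorable / comb(population, draws)

N_T, N_N = 163_068, 161_286  # peptide-spectrum assignments per condition (slide 31)

# Assumed reading of the Plastin-2 row: 827 tumor spectra, 102 normal spectra.
p_plastin = urn_p_value(N_T, N_N, 827, 102)
# Hypothetical unchanged protein with 50 spectra in each condition.
p_balanced = urn_p_value(N_T, N_N, 50, 50)
print(p_plastin, p_balanced)
```

Exact integer arithmetic keeps the very small tail probability from underflowing until the final division.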

33 Functional Enrichment. 374 proteins with "significantly" increased abundance in tumor tissue. Use 4270 proteins as background! DAVID gene set enrichment: protein translation; RNA binding, splicing.
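
Why the background matters can be sketched with the same urn calculation. The counts 374 and 4270 are from the slide. The gene-set sizes, the overlap of 40, and the 20,000-gene "whole genome" background are hypothetical values chosen only for illustration, and this is not a description of how DAVID computes its scores.

```python
from math import comb

def enrichment_p(background, set_size, selected, overlap):
    """P(at least `overlap` set members among `selected` proteins drawn from `background`)."""
    favorable = sum(
        comb(set_size, i) * comb(background - set_size, selected - i)
        for i in range(overlap, min(set_size, selected) + 1)
    )
    return favorable / comb(background, selected)

# Background of identified proteins (from the slide): 4270, of which 374 are increased.
# Hypothetical gene set: 200 members among the identified proteins, 40 of them increased.
p_identified = enrichment_p(4270, 200, 374, 40)

# Same overlap scored against a hypothetical 20,000-gene background
# in which the set has 300 members.
p_genome = enrichment_p(20_000, 300, 374, 40)
print(p_identified, p_genome)
```

Under these assumed numbers the whole-genome background yields a much smaller p-value, because it counts proteins that the experiment could never have reported as unchanged.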

34 Differential Spectral Counting. Assumptions of the formal tests (Fisher exact, χ²) are violated, so p-values can be misleading (too small). Use label permutation tests to compute empirical p-values. SLOW! Collapse spectral counts to protein sets (GO terms) directly: potential to observe more subtle spectral count differences.
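
A label permutation test of the kind mentioned here might look like the sketch below. The per-sample spectral counts, the choice of statistic (absolute difference in mean count), and the number of permutations are all assumptions made for illustration. The slides do not specify the authors' procedure.

```python
import random

def permutation_p_value(counts, labels, n_permutations=2000, seed=0):
    """Empirical two-sided p-value for a difference in mean spectral count."""
    def statistic(current_labels):
        tumor = [c for c, l in zip(counts, current_labels) if l == "tumor"]
        normal = [c for c, l in zip(counts, current_labels) if l == "normal"]
        return abs(sum(tumor) / len(tumor) - sum(normal) / len(normal))

    rng = random.Random(seed)
    observed = statistic(labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign the sample labels at random, keeping the group sizes fixed.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical spectral counts for one protein in five tumor and five normal samples.
counts = [41, 38, 45, 50, 39, 9, 12, 7, 11, 10]
labels = ["tumor"] * 5 + ["normal"] * 5
p_value = permutation_p_value(counts, labels)
print(p_value)
```

The cost grows with the number of permutations times the number of proteins, which is the "SLOW!" noted on the slide. The smallest achievable p-value is also limited by the number of distinct label assignments.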

35 Unannotated Splice Isoform

36 Unannotated Splice Isoform

37 Halobacterium sp. NRC-1 ORF: GdhA1. K-score E-value vs PepArML @ 10% FDR. Many peptides inconsistent with annotated translation start site of NP_279651.

38 What if there is no "smoking gun" peptide…

39 What if there is no "smoking gun" peptide…

40 What if there is no "smoking gun" peptide…

41 PKM2 in Peptide Atlas (figure axes: experiments, peptides)

42 What if there is no "smoking gun" peptide… ?

43 Nascent polypeptide-associated complex subunit alpha. Long form is "muscle-specific". Exon 3 is missing from short form. Peptide identifications provide evidence for long form only. 9 peptides are specific to long form. 6 peptides are found in both isoforms. Urn with balls of 15 different colors. p-value of observed spectral counts: 7.3E-8.

44 Nascent polypeptide-associated complex subunit alpha

45 Pyruvate kinase isozymes M1/M2. Exon "substitution" changes sequence in the middle of the protein. Peptide identifications provide evidence for both isoforms. 3 peptides are specific to isoform 1. 5 peptides are specific to isoform 2. Urn with balls of 63 colors for isoform 1. p-value of observed spectral counts: 2.46E-05.

46 Pyruvate kinase isozymes M1/M2

47 Summary. Systems biology requires: experiments, databases, models; informaticians and disease experts. Functional enrichment: quickly navigate knowledge databases using experiment-derived genes. Classical probability experiment: Balls & Urns. How surprised should you be? Still requires a domain expert to pick out gems.

48 Summary. Proteomics: high-throughput protein comparison; proteome "sample" is identified; crude spectral count quantitation. Differential protein abundance: use Balls & Urns to find significant changes; apply functional enrichment tools. Splicing detection: perturbed peptide spectral counts provide evidence for splicing; evaluate using Balls & Urns.

