Published by Magnus Wheeler. Modified over 9 years ago.
1
Do not reproduce without permission 1 Gerstein.info/talks (c) 2003 1 (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation is copyright Mark Gerstein, Yale University, 2002. Feel free to use images in it with PROPER acknowledgement.
2
Computational Proteomics of Protein Complexes. Mark B Gerstein, Yale U. Talk at NIH, 2003.04.07
3
The Interactome: the Next ‘omic Step. Interactome, Proteome, Transcriptome, Genome
4
The popularity of interactome information
5
Computational Proteomics of Complexes
1. Interactions provide a systematic way of defining protein function on a genomic scale
2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome
3. Known complexes provide a focus for the integration of (non-interaction) genomic information, e.g. expression data
4. Extrapolating from known complexes, one can predict protein complexes on a genome scale by integrating experimental interactions and non-interaction information (combining #1 and #2)
6
Circumscribing Protein Function in terms of Interactions
7
Understanding Protein Function on a Genomic Scale. 250 of ~650 known on chr. 22 [Dunham et al.]. >>30K+ proteins in entire human genome (alt. splicing).
8
Issues in defining protein function on a genomic scale
Multi-functionality: 2 functions/protein (also 2 proteins/function)
Role conflation: molecular, cellular, phenotypic
Fun terms… but do they scale? Starry night Sarah (affects female fertility); Sonic; Darkener of apricot & suppressor of white apricot; Redtape, gridlock, roadblock (when mutated, block transport along axons); ROP vs ROM ("Regulator of Copy Number" or RNA-I-II-complex-binding-protein)
For now, definable aspects of function: interactions, location, enzymatic rxn. [Babbit]
9
Ontologies for function: Networks, Hierarchies, DAGs
10
Ontologies for function: Interaction vectors. Lan et al., IEEE (2002) & COSB (2003)
11
Validating and Integrating Genomic Protein-Protein Interaction Datasets with Known Complexes
12
Protein interaction data
Databases (BIND, DIP, MIPS, etc.): literature
High-throughput datasets: in vivo pull-down, yeast two-hybrid
Computational predictions
Tangential genomic data: expression data, phenotypic data, localization data
13
Combining interaction data
High-throughput data is less reliable than more careful, smaller-scale experiments
Orthogonal datasets
Combining data increases accuracy and coverage
How to do this in a quantitative way? How to weight the different data sources?
General classification problem (machine learning)
Bayesian networks: probabilistic
14
Example of data integration: RNA polymerase II. Which subunits interact? -> protein-protein interaction experiments. Kornberg et al., 2001. Compare with Gold Std. structure: Edwards, Kus, Jansen, Greenbaum, Greenblatt, Gerstein, TIG (2002)
15
Data integration: RNA polymerase II
16
Data integration: RNA polymerase II
17
Data integration: RNA polymerase II. Interaction experiments before structure was known
18
Data integration: RNA polymerase II
19
Data integration: RNA polymerase II. Integrate using naive Bayes classifier
20
Data integration: RNA polymerase II. Integrate using naive Bayes classifier
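One way to make the naive Bayes integration concrete: give each interaction data set a likelihood ratio estimated from its hits against gold-standard positives and negatives, then multiply the ratios, which is what the naive independence assumption permits. This is a minimal sketch under stated assumptions, not the implementation behind the talk; the function names, the counts, and the prior odds are all hypothetical.

```python
from math import prod

def likelihood_ratio(tp, fp, n_pos, n_neg):
    """P(evidence | interacting) / P(evidence | not interacting).

    tp, fp: gold-standard positives / negatives reported by the data set.
    n_pos, n_neg: sizes of the gold-standard positive / negative sets.
    Assumes fp > 0; a real analysis would need a pseudocount for fp == 0.
    """
    return (tp / n_pos) / (fp / n_neg)

def posterior_odds(prior_odds, ratios):
    """Naive Bayes: independent pieces of evidence multiply the prior odds."""
    return prior_odds * prod(ratios)

def odds_to_probability(odds):
    return odds / (1.0 + odds)

# Hypothetical counts for two data sets (NOT taken from the slides).
lr_a = likelihood_ratio(tp=50, fp=10, n_pos=100, n_neg=1000)
lr_b = likelihood_ratio(tp=20, fp=50, n_pos=100, n_neg=1000)

# Hypothetical prior odds that a random pair of subunits interacts.
odds = posterior_odds(prior_odds=0.01, ratios=[lr_a, lr_b])
print(round(odds_to_probability(odds), 3))
```

A pair seen in both data sets gets the product of both ratios; a pair seen in only one gets only that ratio, so reliable data sets carry more weight than noisy ones.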
21
Data integration: RNA polymerase II
22
Comparison of interaction data sets. [Table columns: Data set, Method]
23
Comparison of experimental data with gold standards. Positives: 8250 interactions in MIPS complexes. Negatives: ~2.7 M pairs in diff. subcellular compartments. [Figure labels: TP, FP, set of experimental “interactions”]
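The comparison on this slide amounts to intersecting a set of experimental "interactions" with the gold-standard positives (giving TP) and with the gold-standard negatives (giving FP). A minimal sketch follows, assuming interactions are unordered protein pairs; the helper names and the toy protein labels are hypothetical.

```python
def canonical(pair):
    """Order a pair so that (A, B) and (B, A) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)

def score_against_gold(experimental, positives, negatives):
    """Return (TP, FP) for a set of experimental interactions.

    TP: experimental pairs found among gold-standard positives.
    FP: experimental pairs found among gold-standard negatives.
    Pairs in neither gold-standard set are not counted at all.
    """
    exp = {canonical(p) for p in experimental}
    pos = {canonical(p) for p in positives}
    neg = {canonical(p) for p in negatives}
    return len(exp & pos), len(exp & neg)

# Toy data with made-up protein labels.
positives = [("P1", "P2"), ("P2", "P3")]
negatives = [("P1", "P9"), ("P3", "P8")]
experimental = [("P2", "P1"), ("P9", "P1"), ("P5", "P6")]

tp, fp = score_against_gold(experimental, positives, negatives)
print(tp, fp)
```

Because most protein pairs belong to neither gold standard, TP and FP are counts over the overlap only, which is why the slides report them as a TP / FP pair rather than as a single accuracy figure.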
24
Combining experimental data. Jansen et al., JSFG 2002. [Venn diagram of the Gavin, Uetz, and Ho data sets; regions labeled TP / FP: 90/556711/135 1357/6226 6/6 353/212 18/6 15/1]
25
Integrating Structural Complexes with Non-interaction Genomic Information: Using them to Interpret Gene Expression Data
26
Format of Gene Expression Data. [Matrix with rows and columns labeled: MCM3 MCM6 CDC47 MCM2 CDC46 CDC54 DPB3 CDC45 DPB2 CDC2 CDC7 POL2 HYS2 POL32 DBF4 ORC2 ORC6 ORC5 ORC4 ORC3 ORC1]
27
Expression Correlations Segment Replication Complex into Component Parts. [Matrix with rows and columns labeled: MCM3 MCM6 CDC47 MCM2 CDC46 CDC54 DPB3 CDC45 DPB2 CDC2 CDC7 POL2 HYS2 POL32 DBF4 ORC2 ORC6 ORC5 ORC4 ORC3 ORC1; block labels: MCMs, ORC, Polym. & prots.]
28
Range of Expression Correlations within Complexes
Replication Cplx: Overall .05; ORC .19; MCMs .75; Pol. .45, .75
Ribosome: Overall .80; Large .80; Small .81
Proteasome: Overall .43; 20S .50; 19S .51
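A per-complex figure like those above can be read as an average of pairwise expression correlations among the members of a complex. The sketch below assumes Pearson correlation averaged over all member pairs; the slides do not state the exact statistic, and the gene labels and profiles here are made up for illustration.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over all pairs of genes in one complex.

    profiles: dict mapping gene name -> list of expression values.
    """
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)

# Toy profiles with hypothetical gene labels.
complex_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}
print(round(mean_pairwise_correlation(complex_profiles), 3))
```

Applying the same function to a subcomplex (for example only the ORC members) instead of the whole complex is how an overall value can be low while the component values are high.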
29
Protein-Protein Interactions & Expression. [Figure: correlation between selected expression timecourses; cell cycle CDC28 expt. (Davis). Sets of interactions: all pairs (control); from MIPS (strong interactions in permanent complexes, clearly diff.); pairwise interactions (Uetz et al.)]
30
Significance of correlations (complexes). Permanent; Transient/other. Jansen et al., Genome Research, 2002
31
Permanent v. Transient Complexes. Jansen et al., Genome Research, 2002
32
Transient complexes. Example: replication complex. Subparticles behave like permanent complexes. Permanent complexes show strong co-expression, whereas transient complexes have weaker co-expression. Jansen et al., Genome Research, 2002
33
Genome-wide prediction of protein complexes based on both high-throughput interaction data and non-interaction, genomic information
34
Global Network of 3 Different Types of Relationships. ~313K significant relationships from ~18M possible
35
Global Network of 3 Different Types of Relationships. Simultaneous 188K; Inverted 63K; Shifted 67K. ~313K significant relationships from ~18M possible
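The three relationship types can be illustrated with a simple lagged-correlation test: two profiles are simultaneous if they correlate strongly as given, inverted if they anti-correlate strongly, and shifted if they correlate strongly only after one is delayed. The slides do not say how the relationships were scored, so this is a simplified stand-in rather than the method used in the talk; the threshold, the maximum lag, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def classify_relationship(x, y, max_lag=2, threshold=0.9):
    """Label a pair of timecourses, or return None if nothing is significant.

    threshold and max_lag are illustrative values, not taken from the slides.
    """
    r = pearson(x, y)
    if r >= threshold:
        return "simultaneous"
    if r <= -threshold:
        return "inverted"
    for lag in range(1, max_lag + 1):
        y_lags_x = pearson(x[:-lag], y[lag:])   # y follows x by `lag` steps
        x_lags_y = pearson(x[lag:], y[:-lag])   # x follows y by `lag` steps
        if y_lags_x >= threshold or x_lags_y >= threshold:
            return "shifted"
    return None

# Toy periodic profile and three derived partners.
base = [0, 1, 0, -1, 0, 1, 0, -1]
delayed = [-1, 0, 1, 0, -1, 0, 1, 0]       # base delayed by one time point
print(classify_relationship(base, base))
print(classify_relationship(base, [-v for v in base]))
print(classify_relationship(base, delayed))
```

Checking the unshifted correlation first means a pair is labeled shifted only when it fails both the simultaneous and the inverted test.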
36
Globally, how well do expression relationships predict known interactions?
Coverage of the 8250 Known Interactions in Complexes Found [MIPS]; Enrichment Compared to Randomized Expression Relationships
Random: ~2%, 1x (313K/18M)
CC: 42%, 24x
CC: 313K relationships from ~18M possible, from clustering cell-cycle expt.
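The enrichment figure follows from dividing the observed coverage by the coverage expected from a random set of the same size, which is simply the fraction of all possible pairs that were selected. The sketch below reproduces that arithmetic from the rounded numbers on the slides (the KO values are the ones quoted on the following slides); the function name is hypothetical, and the rounded inputs mean the result only approximately matches the quoted factors.

```python
def enrichment(coverage, n_relationships, n_possible):
    """Observed coverage divided by the coverage expected at random.

    A random set of n_relationships pairs out of n_possible is expected
    to recover the fraction n_relationships / n_possible of known pairs.
    """
    random_coverage = n_relationships / n_possible
    return coverage / random_coverage

# Rounded inputs as quoted on the slides.
cc = enrichment(coverage=0.42, n_relationships=313_000, n_possible=18_000_000)
ko = enrichment(coverage=0.34, n_relationships=278_000, n_possible=18_000_000)
print(f"CC: {cc:.1f}x, KO: {ko:.1f}x")
```

With these inputs the random coverage for CC is about 1.7%, which the slide rounds to ~2%, and the two ratios round to 24 and 22.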
37
Combining Expression Data Sets Increases Coverage & Decreases Noise
Coverage of the 8250 Known Interactions in Complexes Found [MIPS]; Enrichment Compared to Randomized Expression Relationships
KO: 34%, 22x
KO: 278K relationships from clustering knock-out profiles [Rosetta]
38
Combining Expression Data Sets Increases Coverage & Decreases Noise
Coverage of the 8250 Known Interactions in Complexes Found [MIPS]; Enrichment Compared to Randomized Expression Relationships
CC: 42%, 24x
KO: 34%, 22x
KO v CC: 55%, 111x
KO ^ CC: 21%, 254x
CC: 313K relationships from ~18M possible, from clustering cell-cycle expt.
KO: 278K relationships from clustering knock-out profiles [Rosetta]
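Reading "KO v CC" as the union of the two relationship sets and "KO ^ CC" as their intersection (an interpretation of the slide's notation, not something the slide states), the trade-off can be shown with plain set operations: the union can only raise coverage, while the intersection keeps fewer pairs but filters out relationships that only one data set supports. The data and helper names below are toy values, not the talk's data.

```python
def pair(a, b):
    """Unordered protein pair, so pair('A', 'B') == pair('B', 'A')."""
    return frozenset((a, b))

def coverage(relationships, known):
    """Fraction of known interactions present among the relationships."""
    return len(relationships & known) / len(known)

# Toy gold standard and two toy relationship sets (hypothetical labels).
known = {pair("A", "B"), pair("A", "C"), pair("B", "C"), pair("D", "E")}
cc = {pair("A", "B"), pair("A", "C"), pair("X", "Y")}
ko = {pair("B", "A"), pair("D", "E"), pair("X", "Z")}

union = cc | ko          # supported by either data set
both = cc & ko           # supported by both data sets

for name, rels in [("CC", cc), ("KO", ko), ("union", union), ("both", both)]:
    print(name, len(rels), coverage(rels, known))
```

In this toy case the intersection contains a single pair and it is a known interaction, which mirrors the slide's pattern of lower coverage but much higher enrichment for the combined set.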
39
Computational Proteomics of Complexes
1. Interactions provide a systematic way of defining protein function on a genomic scale
2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome
3. Known complexes provide a focus for the integration of (non-interaction) genomic information, e.g. expression data
4. Extrapolating from known complexes, one can predict protein complexes on a genome scale by integrating experimental interactions and non-interaction information (combining #1 and #2)
40
For the Future
Developing an accurate interactome for the cell, from prediction and through integration of high-throughput information
Development of statistical approaches to combine and integrate information
Development of database technologies to store heterogeneous and noisy genome-wide interaction datasets
A moderate number of structural complexes are very useful as gold-standard data
41
Protein complexes & Structural Genomics
A computational challenge following from the solution of the parts list
Given many monomeric structures produced by structural genomics, predict (or rationalize) the interactome through docking
Maybe many structures will only be solved as complexes…
42
Association between Protein Sequence Features and Experimental Progress
43
Bottlenecks in analysis of all of TargetDB (Interologs)
44
Acknowledgements
J Qian, R Jansen, A Drawid, C Wilson, D Greenbaum, C Goh, N Lan, H Hegyi, R Das, S Douglas, B Stenger, J Lin, Y Kluger
Collaborators: M Snyder (A Kumar, H Zhu, …), A Edwards, B Kus, J Greenblatt
NIH
GeneCensus.org