1 Do not reproduce without permission 1 Gerstein.info/talks (c) 2003 1 (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation is copyright Mark Gerstein, Yale University, 2002. Feel free to use images in it with PROPER acknowledgement.

2 Computational Proteomics of Protein Complexes. Mark B Gerstein, Yale U. Talk at NIH, 2003.04.07

3 The Interactome: the Next ‘omic Step. Interactome, Proteome, Transcriptome, Genome

4 The popularity of interactome information

5 Computational Proteomics of Complexes. 1. Interactions provide a systematic way of defining protein function on a genomic scale. 2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome. 3. Known complexes provide a focus for the integration of (non-interaction) genomic information, e.g. expression data. 4. Extrapolating from known complexes, one can predict protein complexes on a genome scale by integrating experimental interactions and non-interaction information (combining #1 and #2).

6 Circumscribing Protein Function in terms of Interactions

7 Understanding Protein Function on a Genomic Scale. 250 of ~650 known on chr. 22 [Dunham et al.]. >>30K+ proteins in the entire human genome (alt. splicing).

8 Issues in defining protein function on a genomic scale. Multi-functionality: 2 functions/protein (also 2 proteins/function). Role conflation: molecular, cellular, phenotypic. Fun terms… but do they scale? Starry night Sarah (affects female fertility); Sonic; Darkener of apricot & suppressor of white apricot; Redtape, gridlock, roadblock (when mutated, block transport along axons); ROP vs ROM ("Regulator of Copy Number" or RNA-I-II-complex-binding-protein). For now, definable aspects of function: interactions, location, enzymatic rxn. [Babbit]

9 Ontologies for function: Networks, Hierarchies, DAGs

10 Ontologies for function: Interaction vectors. Lan et al. IEEE (2002) & COSB (2003)

11 Validating and Integrating Genomic Protein-Protein Interaction Datasets with Known Complexes

12 Protein interaction data. Databases (BIND, DIP, MIPS etc.): literature. High-throughput datasets: in vivo pull down, yeast two-hybrid. Computational predictions: tangential genomic data (expression data, phenotypic data, localization data).

13 Combining interaction data. High-throughput data is less reliable than more careful, smaller-scale experiments. Orthogonal datasets: combining data increases accuracy and coverage. How to do this in a quantitative way? How to weight the different data sources? General classification problem (machine learning). Bayesian networks: probabilistic.

14 Example of data integration: RNA polymerase II. Which subunits interact? -> protein-protein interaction experiments. Kornberg et al., 2001. Compare with Gold Std. structure. Edwards, Kus, Jansen, Greenbaum, Greenblatt, Gerstein, TIG (2002)

15 Data integration: RNA polymerase II

16 Data integration: RNA polymerase II

17 Data integration: RNA polymerase II. Interaction experiments before structure was known

18 Data integration: RNA polymerase II

19 Data integration: RNA polymerase II. Integrate using naive Bayes classifier

20 Data integration: RNA polymerase II. Integrate using naive Bayes classifier
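
As a rough illustration of how a naive Bayes classifier can integrate several interaction experiments, the sketch below combines binary evidence sources under the conditional-independence assumption. The function name, the source names, the prior and all probabilities are hypothetical placeholders chosen for the example; they are not values from the talk.

```python
import math

def naive_bayes_posterior(prior, likelihoods, observations):
    """Posterior probability that a protein pair interacts.

    prior        -- assumed prior probability of interaction
    likelihoods  -- source -> (P(hit | interacting), P(hit | not interacting))
    observations -- source -> True if that experiment reported the pair
    Evidence sources are treated as conditionally independent (the
    "naive" assumption), so their log likelihood ratios simply add.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for source, observed in observations.items():
        p_pos, p_neg = likelihoods[source]
        if observed:
            log_odds += math.log(p_pos / p_neg)
        else:
            log_odds += math.log((1.0 - p_pos) / (1.0 - p_neg))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical numbers, for illustration only.
example_likelihoods = {"two_hybrid": (0.6, 0.1), "pull_down": (0.7, 0.2)}
both_hits = naive_bayes_posterior(
    0.1, example_likelihoods, {"two_hybrid": True, "pull_down": True})
no_hits = naive_bayes_posterior(
    0.1, example_likelihoods, {"two_hybrid": False, "pull_down": False})
print(both_hits, no_hits)
```

Working in log-odds keeps the contribution of each data source additive, which is one simple way to answer the question of how to weight different sources: each source is weighted by how much more often it fires on true interactions than on non-interactions.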

21 Data integration: RNA polymerase II

22 Comparison of interaction data sets (table; columns: Data set, Method)

23 Comparison of experimental data with gold standards. Positives: 8250 interactions in MIPS complexes. Negatives: ~2.7 M pairs in different subcellular compartments. TP and FP: the parts of a set of experimental “interactions” that fall among the positives and the negatives, respectively.
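
The comparison with gold standards can be sketched as set overlap. The code below is an illustrative sketch, not the talk's implementation: the function names and the toy protein identifiers are hypothetical, and the likelihood ratio is just one common way to summarize TP and FP counts. Only the gold-standard sizes (8250 positives, ~2.7 M negatives) come from the slide.

```python
def count_tp_fp(predicted, positives, negatives):
    """Count predicted pairs that fall in the gold-standard sets.

    Every pair is a frozenset of two protein names, so (A, B) and
    (B, A) are treated as the same interaction.
    """
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    return tp, fp

def likelihood_ratio(tp, fp, n_positives, n_negatives):
    """(TP / positives) / (FP / negatives); infinite when FP is zero."""
    if fp == 0:
        return float("inf")
    return (tp / n_positives) / (fp / n_negatives)

def pair(a, b):
    return frozenset((a, b))

# Toy gold standards and predictions with hypothetical protein names.
toy_positives = {pair("P1", "P2"), pair("P2", "P3")}
toy_negatives = {pair("P1", "P9"), pair("P3", "P8")}
toy_predicted = {pair("P2", "P1"), pair("P1", "P9"), pair("P5", "P6")}
toy_tp, toy_fp = count_tp_fp(toy_predicted, toy_positives, toy_negatives)

# Gold-standard sizes from the slide; the TP/FP counts here are illustrative.
example_lr = likelihood_ratio(6, 6, 8250, 2_700_000)
print(toy_tp, toy_fp, example_lr)
```

Because the negative set is so much larger than the positive set, equal TP and FP counts still correspond to a strongly enriched data set, which is why the raw counts need to be normalized by the gold-standard sizes.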

24 Combining experimental data (figure: overlap of the Gavin, Uetz and Ho data sets, each region labeled TP / FP; labels as extracted: 90/556711/135, 1357/6226, 6/6, 353/212, 18/6, 15/1). Jansen et al. JSFG 2002

25 Integrating Structural Complexes with Non-interaction Genomic Information: Using them to Interpret Gene Expression data

26 Format of Gene Expression Data (figure; gene labels: MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1)

27 Expression Correlations Segment Replication Complex into Component Parts (figure; gene labels: MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1; groups: MCMs prots., ORC, Polym.)

28 Range of Expression Correlations within Complexes. Replication Cplx: Overall .05; ORC .19; MCMs .75; Pol. .45, .75. Ribosome: Overall .80; Large .80; Small .81. Proteasome: Overall .43; 20S .50; 19S .51.
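
A per-complex figure of this kind can be obtained, for example, by averaging the Pearson correlation over all pairs of members. The sketch below assumes that definition (the transcript does not state how the averages were computed) and uses made-up expression profiles with placeholder gene names g1..g3.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def mean_pairwise_correlation(profiles, members):
    """Average correlation over all unordered pairs of complex members."""
    pairs = list(combinations(members, 2))
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

# Made-up profiles: g1 and g2 rise together, g3 falls.
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
subcomplex_score = mean_pairwise_correlation(toy_profiles, ["g1", "g2"])
overall_score = mean_pairwise_correlation(toy_profiles, ["g1", "g2", "g3"])
print(subcomplex_score, overall_score)
```

The toy data mirror the pattern on the slide: a tightly co-expressed subcomplex scores high on its own, while the average over the whole assembly is pulled down by members that follow a different profile.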

29 Protein-Protein Interactions & Expression (figure: correlation between selected expression timecourses; Cell Cycle CDC28 expt. (Davis)). Sets of interactions: all pairs (control); from MIPS (strong interactions in permanent complexes, clearly diff.); pairwise interactions (Uetz et al.).

30 Significance of correlations (complexes): Permanent, Transient/other. Jansen et al., Genome Research, 2002

31 Permanent v. Transient Complexes. Jansen et al., Genome Research, 2002

32 Transient complexes. Example: replication complex. Subparticles behave like permanent complexes. Permanent complexes show strong co-expression, whereas transient complexes have weaker co-expression. Jansen et al., Genome Research, 2002

33 Genome-wide prediction of protein complexes based on both high-throughput interaction data and non-interaction, genomic information

34 Global Network of 3 Different Types of Relationships. ~313K significant relationships from ~18M possible

35 Global Network of 3 Different Types of Relationships. Simultaneous: 188K. Inverted: 63K. Shifted: 67K. ~313K significant relationships from ~18M possible
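
To make the three relationship types concrete, the sketch below labels a pair of expression timecourses as simultaneous, inverted or shifted using plain and lagged Pearson correlation. This is a simplified stand-in: the transcript does not say how the relationships were scored, and the function names, the 0.8 threshold and the one-step lag are assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = sqrt(sum((a - mean_x) ** 2 for a in x)) * \
        sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / norm if norm > 0 else 0.0

def classify_relationship(x, y, max_lag=1, threshold=0.8):
    """Label a pair of timecourses by its best-scoring relationship type."""
    r0 = pearson(x, y)
    candidates = [("simultaneous", r0), ("inverted", -r0)]
    for lag in range(1, max_lag + 1):
        # y follows x by `lag` time points, or x follows y.
        candidates.append(("shifted", pearson(x[:-lag], y[lag:])))
        candidates.append(("shifted", pearson(x[lag:], y[:-lag])))
    label, score = max(candidates, key=lambda item: item[1])
    return label if score >= threshold else "none"

# Synthetic periodic profile and three derived partners.
wave = [0, 1, 0, -1, 0, 1, 0, -1]
same = list(wave)
flipped = [-v for v in wave]
delayed = [wave[-1]] + wave[:-1]   # wave delayed by one time point

print(classify_relationship(wave, same))
print(classify_relationship(wave, flipped))
print(classify_relationship(wave, delayed))
```

A pair that is uncorrelated at zero lag can still be strongly related once a delay or a sign flip is allowed, which is why the shifted and inverted categories add relationships beyond simple co-expression.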

36 Globally, how well do expression relationships predict known interactions? Coverage of the 8250 known interactions in complexes found [MIPS], with enrichment compared to randomized expression relationships. CC: 313K relationships from ~18M possible, from clustering cell-cycle expt.; coverage 42%, 24x. Random: ~2%, 1x (313K/18M).

37 Combining Expression Data Sets Increases Coverage & Decreases Noise. Coverage of the 8250 known interactions in complexes found [MIPS], with enrichment compared to randomized expression relationships. KO: 278K relationships from clustering knock-out profiles [Rosetta]; coverage 34%, 22x.

38 Combining Expression Data Sets Increases Coverage & Decreases Noise. Coverage of the 8250 known interactions in complexes found [MIPS], with enrichment compared to randomized expression relationships. CC: 313K relationships from ~18M possible, from clustering cell-cycle expt.; 42%, 24x. KO: 278K relationships from clustering knock-out profiles [Rosetta]; 34%, 22x. KO v CC: 55%, 111x. KO ^ CC: 21%, 254x.
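
The enrichment figures for the individual data sets follow from dividing the observed coverage by the coverage a random selection of the same size would give. A short check of the arithmetic, assuming the KO relationships are drawn from the same ~18M possible pairs as the CC set (the slide states the total only for CC); the function name is illustrative:

```python
def enrichment(coverage, n_relationships, n_possible):
    """Observed coverage divided by the coverage expected at random.

    A random set of n_relationships pairs out of n_possible would recover
    about n_relationships / n_possible of the known interactions.
    """
    random_coverage = n_relationships / n_possible
    return coverage / random_coverage

cc_enrichment = enrichment(0.42, 313_000, 18_000_000)   # about 24
ko_enrichment = enrichment(0.34, 278_000, 18_000_000)   # about 22
print(round(cc_enrichment), round(ko_enrichment))
```

The combined KO v CC and KO ^ CC figures cannot be rechecked this way from the transcript, because the sizes of the combined relationship sets are not given.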

39 Computational Proteomics of Complexes. 1. Interactions provide a systematic way of defining protein function on a genomic scale. 2. Known complexes provide a benchmark to validate and integrate genome-wide interaction experiments, providing a more accurate interactome. 3. Known complexes provide a focus for the integration of (non-interaction) genomic information, e.g. expression data. 4. Extrapolating from known complexes, one can predict protein complexes on a genome scale by integrating experimental interactions and non-interaction information (combining #1 and #2).

40 For the Future. Developing an accurate interactome for the cell, from prediction and through integration of high-throughput information. Development of statistical approaches to combine and integrate information. Development of database technologies to store heterogeneous and noisy genome-wide interaction datasets. A moderate number of structural complexes are very useful as gold standard data.

41 Protein complexes & Structural Genomics. A computational challenge following from the solution of the parts list: given many monomeric structures produced by structural genomics, predict (or rationalize) the interactome through docking. Maybe many structures will only be solved as complexes….

42 Association between Protein Sequence Features and Experimental Progress

43 Bottlenecks in analysis of all of TargetDB (Interologs)

44 Acknowledgements. J Qian, R Jansen, A Drawid, C Wilson, D Greenbaum, C Goh, N Lan, H Hegyi, R Das, S Douglas, B Stenger, J Lin, Y Kluger. Collaborators: M Snyder (A Kumar, H Zhu, …), A Edwards, B Kus, J Greenblatt. NIH. GeneCensus.org

