Do not reproduce without permission 1 Gerstein.info/talks (c) (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation is copyright Mark Gerstein, Yale University, 2003, Feel free to use images in it with PROPER acknowledgement.
Towards Protein Function: Relating mRNA Expression, Protein Abundance, and Protein-protein Interactions. Mark B Gerstein, Yale U. Talk at Experimental Biology
Understanding proteins, through analysis of populations rather than individuals
The central post-genomic term. [Figure: PubMed hits, Proteome.] Greenbaum et al., Genome Res. 11:1463
Many Manifestations of Proteins & Research Topics in Proteomics. Manifestations: Structures, Sequences, Arrays, Gels. Topics: Analyzing protein fossils (pseudogenes) in genomes; Predicting protein function on a genomic scale; Comparing folds & families between proteomes; Analyzing protein flexibility in terms of packing
Towards Protein Function: Relating mRNA Expression, Protein Abundance, and Protein-protein Interactions
1. Relating mRNA expression & protein abundance: Correlation within broad categories and merge data
2. Relating mRNA expression & protein interactions (indirectly through abundance): Noisy Correlations in complexes
3. Predicting interactions through expression correlations integrated with other genomic information (reducing the noise)
a. Results on yeast
b. Intuition on Bayesian Methods
Simple models connecting mRNA and Proteins
$$\frac{dP_i}{dt} = k_{s,i}\,[\mathrm{mRNA}_i] - k_{d,i}\,P_i$$
where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants. At steady state:
$$P_i = \frac{k_{s,i}\,[\mathrm{mRNA}_i]}{k_{d,i}}$$
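A minimal sketch of the model above, assuming arbitrary units and hypothetical rate constants (none of the numbers come from the talk). It integrates the rate equation with a simple forward-Euler step and compares the result with the closed-form steady state. It also shows how two genes with the same mRNA level can differ 20-fold in protein abundance if only their degradation constants differ.

```python
def steady_state_protein(k_s, k_d, mrna):
    """Steady-state protein level P = k_s * [mRNA] / k_d."""
    return k_s * mrna / k_d


def simulate_protein(k_s, k_d, mrna, p0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dP/dt = k_s * [mRNA] - k_d * P."""
    p = p0
    for _ in range(steps):
        p += dt * (k_s * mrna - k_d * p)
    return p


# Hypothetical values: same synthesis constant and mRNA level,
# degradation constants that differ by a factor of 20.
p_slow = steady_state_protein(k_s=2.0, k_d=0.05, mrna=10.0)
p_fast = steady_state_protein(k_s=2.0, k_d=1.0, mrna=10.0)
fold_difference = p_slow / p_fast  # equals the ratio of the k_d values

# The simulated trajectory should approach the closed-form value.
simulated = simulate_protein(k_s=2.0, k_d=0.5, mrna=10.0)
expected = steady_state_protein(k_s=2.0, k_d=0.5, mrna=10.0)
```

In this model the protein/mRNA ratio at steady state is just k_s / k_d, so any gene-to-gene variation in either constant weakens the mRNA vs. protein correlation.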
Early Experiments: Gygi et al. MCB (1999). Same mRNA levels yet protein data varied > 20X (N ~100, r = 0.9)
Do some ORFs bias the results? Same mRNA levels yet protein data varied > 20X. 73 proteins (69%): R = 0.36
Data Integration. High-throughput data is less reliable than smaller-scale experiments. Merging reduces noise. Combining data increases accuracy & coverage
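One way to picture the merging step is the sketch below. It is an illustration only, not the procedure used for the merged sets in the talk: each dataset is scaled to a mean of 1 (a crude normalization, chosen here as an assumption), and each ORF is then averaged over the datasets that measured it. The ORF names and values are hypothetical.

```python
from statistics import mean


def merge_datasets(datasets):
    """Average per-ORF values across datasets after scaling each to mean 1.

    datasets: list of dicts mapping ORF name -> measured abundance.
    Returns a dict covering the union of all ORFs.
    """
    scaled = []
    for data in datasets:
        dataset_mean = mean(data.values())
        scaled.append({orf: value / dataset_mean for orf, value in data.items()})

    merged = {}
    for orf in set().union(*scaled):
        # Average only over the datasets in which this ORF was measured.
        merged[orf] = mean(d[orf] for d in scaled if orf in d)
    return merged


# Two hypothetical datasets with partial overlap.
gel_set = {"ORF_A": 2.0, "ORF_B": 4.0}
ms_set = {"ORF_B": 30.0, "ORF_C": 10.0}
merged = merge_datasets([gel_set, ms_set])
```

Coverage grows to the union of the ORFs (three here, versus two per dataset), and an ORF seen in several datasets gets an averaged value in which independent measurement noise partly cancels.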
Merged mRNA vs Protein (2D gels): r = 0.67. Greenbaum et al., Bioinformatics 2001
Outliers (> 2 SD from the mean), by MIPS category [Mewes et al.]. High Protein: Metabolism (1), Energy (2). Low Protein: Prot. Syn. (5), Prot. Fate (6)
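A sketch of how such outliers could be flagged. The slide does not say which quantity the 2 SD cutoff was applied to; the code assumes it is the log10 protein/mRNA ratio, and the gene names and values are hypothetical.

```python
import math
from statistics import mean, pstdev


def find_outliers(mrna, protein, n_sd=2.0):
    """Return (high, low) gene lists whose log10(protein/mRNA) ratio lies
    more than n_sd standard deviations above or below the mean ratio."""
    ratios = {
        gene: math.log10(protein[gene] / mrna[gene])
        for gene in mrna
        if gene in protein
    }
    values = list(ratios.values())
    mu = mean(values)
    sd = pstdev(values)
    high = sorted(g for g, r in ratios.items() if r > mu + n_sd * sd)
    low = sorted(g for g, r in ratios.items() if r < mu - n_sd * sd)
    return high, low


# Ten hypothetical genes: nine with protein == mRNA, one with 100x more protein.
mrna = {f"g{i}": 1.0 for i in range(10)}
protein = {f"g{i}": 1.0 for i in range(9)}
protein["g9"] = 100.0
high, low = find_outliers(mrna, protein)
```

The resulting high and low lists could then be tallied by functional category, which is the kind of breakdown the slide reports.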
Merged mRNA vs Protein (2D gels + MudPIT). mRNA Set: 6249 ORFs. Protein Set #2: 2 2DE sets & 2 MudPIT, ~2000 ORFs
Functional Categories: Co-regulated proteins
Subcellular Localization
Very Broad Categories: Enrichment of Amino Acids in Transcriptome and Translatome relative to Genome
Changes in mRNA vs. protein: Murine hematopoietic precursor MPRO, change in expression (hr) [S Weissmann]. R = 0.58; ~80% of the genes are located in the first and third quadrants
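The quadrant statistic can be read as a sign-concordance fraction: a gene falls in the first or third quadrant when its mRNA change and its protein change have the same sign. A minimal sketch, with hypothetical change values and the assumption that genes with a zero change on either axis are left out:

```python
def concordant_fraction(changes):
    """Fraction of genes whose mRNA and protein changes share a sign.

    changes: iterable of (delta_mrna, delta_protein) pairs, e.g. log ratios.
    Pairs lying exactly on an axis are ignored.
    """
    usable = [(dm, dp) for dm, dp in changes if dm != 0 and dp != 0]
    if not usable:
        raise ValueError("no genes with non-zero changes on both axes")
    same_sign = sum(1 for dm, dp in usable if (dm > 0) == (dp > 0))
    return same_sign / len(usable)


# Five hypothetical genes; four change in the same direction on both axes.
changes = [(1.0, 2.0), (-1.0, -0.5), (2.0, -1.0), (0.5, 0.1), (-2.0, -3.0)]
fraction = concordant_fraction(changes)
```

A high concordant fraction alongside a moderate correlation coefficient suggests that the direction of change agrees more often than the magnitude does.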
Changes in mRNA vs. protein: Ratios of wt+gal to wt-gal (ICAT vs microarray). N ~ 290, r = 0.6. Ideker et al., Science, 2001
2. Relating mRNA expression & protein interactions (indirectly through abundance): Noisy Correlations in complexes
Expression & interactions. Interactions form an indirect relationship between mRNA expression & protein abundance. Proteins in a complex should occur in stoichiometrically equal amounts and (approx.) rise & fall in expression in unison. Types of interactions: Protein complexes (e.g. proteasome, ribosome); Aggregated interactions: in vivo pull down (Ho, Gavin), yeast two-hybrid (Uetz, Ito)
Interactions provide a way to define function: Networks [Eisenberg et al.]; Hierarchies & DAGs [Enzyme, Bairoch; GO, Ashburner; MIPS, Mewes, Frishman]; Interaction Vectors [Lan et al, IEEE 90:1848]
Relationship of Interactions to Abs. Expression Level
Protein-Protein Interactions & Expression: Correlations between selected expression timecourses. Cell Cycle CDC28 expt. (Davis). Sets of interactions: all pairs (control); pairwise interactions (Uetz et al.); strong interactions in permanent complexes (from MIPS), clearly diff.
Representing Expression Correlations within a Large Complex in a Matrix. [Figure: correlation matrix with rows and columns MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1]
Segment Replication Complex into Component Parts & Expression Correlations. [Figure: same matrix, segmented into MCM prots., ORC, Polym.]
Range of Expression Correlations within Complexes
Replication Cplx: Overall .05; ORC .19; MCMs .75; Pol. .45, .75
Ribosome: Overall .80; Large .80; Small .81
Proteasome: Overall .43; 20S .50; 19S .51
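A sketch of how a per-complex number like those above could be obtained, assuming it is the mean Pearson correlation over all pairs of members' expression profiles (the slide does not spell out the averaging). The profiles and gene names below are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(profiles, members):
    """Average Pearson r over all unordered pairs of the listed members."""
    rs = [pearson(profiles[a], profiles[b]) for a, b in combinations(members, 2)]
    return sum(rs) / len(rs)


# Toy timecourses: A and B rise together, C falls.
profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}
sub_complex = mean_pairwise_correlation(profiles, ["A", "B"])
whole_complex = mean_pairwise_correlation(profiles, ["A", "B", "C"])
```

The toy case mirrors the pattern in the replication complex: a tightly co-expressed subgroup scores high on its own, while the average over the whole complex is pulled down by members that follow a different profile.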
3. Predicting interactions through expression correlations integrated with other genomic information (reducing the noise)
a. Results on yeast
Predicting Interactions from Expression Analysis. Correlated mRNA profiles predict possible interactions: ~313K significant relationships from ~18M possible. Very noisy
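For scale, the ~18M figure is about what one gets by counting all unordered pairs among roughly 6,000 ORFs. The gene count is an assumption made here for illustration; the slide gives only the rounded totals.

```python
from math import comb

n_orfs = 6000                       # assumed approximate number of yeast ORFs
possible_pairs = comb(n_orfs, 2)    # all unordered gene pairs
significant_pairs = 313_000         # "~313K" from the slide, rounded
fraction_significant = significant_pairs / possible_pairs
```

Under that assumption fewer than 2% of the possible pairs pass the correlation cutoff, yet that is still hundreds of thousands of candidate relationships, which is why expression alone is a very noisy predictor.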
Overview of information integrated and Bayesian Formalism. Data suggestive of interactions (co-expression, co-localization, similar essentiality). Noisy high-throughput experiments (Gavin et al., Uetz et al. &c). Gold-standard complexes (MIPS, Mewes, Frishman et al.). Jansen et al. Science 302:449
Overview of information integrated and Bayesian Formalism. Cross-validated training and testing. Thresholding L at various values. Tabulation of observed TP and FP at various thresholds. Jansen et al. Science 302:449
Comparison of network against gold standard complex interactions. Positives: 8250 known interactions in MIPS complexes [Mewes]. Negatives: ~2.7 M pairs in diff. subcellular compartments. TP, FP: counted within the set of predicted “interactions” [Related Data in BIND, DIP]
Observed TP/FP Ratio Tracks L, Suggesting a Threshold. Jansen et al. Science 302:449
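The thresholding and tabulation can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: every pair is taken to already have a likelihood ratio L, and the pair names, scores, thresholds, and gold-standard sets are hypothetical toy values.

```python
def tabulate_tp_fp(scored_pairs, positives, negatives, thresholds):
    """For each threshold, count gold-standard positives (TP) and
    gold-standard negatives (FP) among the pairs with L >= threshold.

    scored_pairs: dict mapping a pair (tuple of two names) -> L.
    positives, negatives: sets of pairs from the gold standard.
    Returns a list of (threshold, TP, FP) rows.
    """
    rows = []
    for threshold in thresholds:
        predicted = {pair for pair, score in scored_pairs.items() if score >= threshold}
        tp = len(predicted & positives)
        fp = len(predicted & negatives)
        rows.append((threshold, tp, fp))
    return rows


scored_pairs = {
    ("a", "b"): 900.0,
    ("a", "c"): 50.0,
    ("b", "c"): 2.0,
    ("c", "d"): 300.0,
    ("a", "d"): 0.5,
}
positives = {("a", "b"), ("a", "c")}
negatives = {("b", "c"), ("c", "d"), ("a", "d")}
rows = tabulate_tp_fp(scored_pairs, positives, negatives, [1, 100, 600])
```

Raising the threshold trades coverage for purity: fewer gold-standard positives are recovered, but the TP/FP ratio among the remaining predictions improves.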
Integration of Features Gives Much Higher Likelihood Ratios than Any Individual Feature. Jansen et al. Science 302:449
Comparison of Predictions with Known Complexes. Jansen et al. Science 302:449
Example prediction: Mito. Ribosome. Jansen et al. Science 302:449
Comparison with new experiments: J Greenblatt, RFA cplx. Jansen et al. Science 302:449
b. Intuition on Bayesian Methods
Intuition for how to do Integration of Interactomes
Diverse sources of interaction information: Databases (BIND, DIP, MIPS etc.); Individual expts. in literature; High-throughput datasets: in vivo pull down (Ho, Gavin), yeast two-hybrid (Uetz, Ito); Genomic data: Expression, Phenotypes, Localization, Functional
Noisy: High-throughput data is less reliable than smaller scale experiments [Grigorev, Bork]
Combining data increases accuracy & coverage [Church]
How to do quantitatively? How to weight different data sources? General classification problem (machine learning): Bayesian Approaches…
Example of data integration: RNA polymerase II. Which subunits interact? Based on protein-protein interaction experiments [Kornberg]. Compare with Gold Std. structure. Edwards, Kus, et al. TIG 18:529
Data integration: RNA polymerase II. Interaction experiments before structure was known
Data integration: RNA polymerase II. Integrate using naive Bayes classifier
Weighted Voting: the Likelihood Ratio. Vote: sum = +2 [figure]. With weights: likelihood ratio L = L_1 + L_2 + L_3 …
Correlations between similar features
Relative quality of different expts.: L = (TP/FP) (N/P) [for uncorrelated features]
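A sketch of the likelihood-ratio bookkeeping, using the gold-standard sizes quoted earlier in the talk (P = 8250 positives, N of about 2.7 M negatives) and hypothetical TP/FP counts. One point worth hedging: under the naive Bayes independence assumption the per-feature ratios combine as a product, which is a sum only on a log scale; the code below takes that reading of the weighted-vote picture, and it is an interpretation rather than something the slide states.

```python
import math


def likelihood_ratio(tp, fp, n_pos, n_neg):
    """L = (TP / FP) * (N / P): how much more often a feature value is
    seen among gold-standard positives than among negatives."""
    return (tp / fp) * (n_neg / n_pos)


def combine_naive_bayes(ratios):
    """Combined L for conditionally independent features (product)."""
    return math.prod(ratios)


def combine_log_votes(ratios):
    """Same combination expressed as a sum of log10 'votes'."""
    return sum(math.log10(r) for r in ratios)


N_POS = 8250        # known interactions in MIPS complexes
N_NEG = 2_700_000   # approx. pairs in different subcellular compartments

# Hypothetical counts for one evidence source.
l_single = likelihood_ratio(tp=100, fp=50, n_pos=N_POS, n_neg=N_NEG)

# Hypothetical per-feature ratios for one protein pair.
feature_ratios = [2.0, 5.0, 0.5]
l_combined = combine_naive_bayes(feature_ratios)
log_vote = combine_log_votes(feature_ratios)
```

A ratio below 1 acts as a vote against an interaction, and the product rule holds only for uncorrelated features. Strongly correlated evidence sources would be double-counted if multiplied this way.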
Acknowledgements
Protein Function Prediction (Networks.GersteinLab.org): J Qian, R Jansen, A Drawid, C Wilson, H Yu, D Greenbaum, J Lin, N Luscombe, H Hegyi, Y Kluger
Pseudogenes (Pseudogene.org): P Harrison, Z Zhang, Y Liu, S Balasubramanian, P Bertone, T Johnson, J Karro
Macromolecular Motions (MolMovDB.org): J Junker, H Yu, N Echols, V Alexandrov, W Krebs, D Milburn, U Lehnert
Structural Proteomics (PartsList.org): C Goh, N Lan, H Hegyi, R Das, S Douglas, B Stenger
Collaborators: J Chang, R Basri, J Greenblatt (N Krogan)
Yale CEGS: M Snyder (A Kumar, H Zhu, M Bilgin …), S Weissmann, P Miller (K Cheung)
NESG.org: G Montelione, A Edwards (B Kus)
NIH, NSF
Web-based data analysis system with graphical interface: Data processing; Data quality; Differential hybridisation scoring. Graphical interface allows visualisation of data and comparison with original slides: Slide level; Spot level
Examples: Shifted Relationship (SIM)
Examples: Shifted Relationship (MIM)
Outliers. Generally… Alcohol dehydrogenase is also a stress-induced protein in many organisms (Matton et al. 1990; An et al. 1991; Millar et al. 1994). Faster ramp up? Alternatively, it is possible to look into mRNA stability as a factor. Presently there are many structures within mRNA that are thought to influence stability including, among others, stem loops, UTRs, premature stops and uORFs (Klaff et al. 1996). MIPS fxns AA metabolism & Energy are 2X as likely to have high protein vs mRNA as the general population. MIPS fxns Protein synthesis and Protein fate (folding, modification, destination) are more likely to have low protein vs mRNA than the general population
Understanding Protein Function on a Genomic Scale. Originally, 250 of ~650 known on chr. 22 [Dunham et al.]. >>30K+ Proteins in Entire Human Genome (with alt. splicing)
Issues in defining protein function on a genomic scale. Multi-functionality: 2 functions/protein (also 2 proteins/function). Role Conflation: molecular, cellular, phenotypic. Fun terms… but do they scale?
Names in Biology: Systematic?
Yippee: named for the reaction of a graduate student upon cloning the protein. If she had a good result, she would write "yippee" in the margin of her notebook
vulcan & klingon
stranded at second: mutant dies during development, usually in 2nd larval stage
sarah: affects female fertility (biblical ref.)
Sonic & kryptonite
Darkener of apricot & suppressor of white apricot
ROP vs ROM: "Regulator of Copy Number" or RNA-I-II-complex-binding-protein
Barentsz: named for Dutch explorer who froze to death near the North Pole. The mutant blocks the movement of a key mRNA, causing it to get stuck in the wrong place
Agoraphobic: mutant for which the larvae look normal but never crawl out of the egg
single-minded
Redtape: series of designations given to genes which, when mutated, block transport along axons
Lush & cheapdate: former wants alcohol, latter makes susceptible
[Adapted from conversations + Am. Sci.]
Issues in defining protein function on a genomic scale. Fun terms… but do they scale? Starry night (P Adler, ’94). For now, definable aspects of function: interactions, location, enzymatic rxn. [Babbit]
Predicting Protein Function on a Genome Scale. Relating microarray experiments to protein abundance. Local Clustering (to identify time-shifted and inverted relationships). Relating Clustering to Known Regulatory Relationships. Relating them to protein interactions. Bayesian methods to uniformly & optimally combine evidence (in application to integration of protein interaction data). Predicting interactions in yeast de novo from non-interaction data sources (with verification)