Do not reproduce without permission 1 Gerstein.info/talks (c) 2004 1 (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation.

Presentation transcript:

Do not reproduce without permission 1 Gerstein.info/talks (c) (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation is copyright Mark Gerstein, Yale University, 2003, Feel free to use images in it with PROPER acknowledgement.

Towards Protein Function: Relating mRNA Expression, Protein Abundance, and Protein-protein Interactions
Mark B Gerstein, Yale U
Talk at Experimental Biology

Understanding proteins through analysis of populations rather than individuals

The central post-genomic term: Proteome (PubMed hits)
Greenbaum et al., Genome Res. 11:1463

Many Manifestations of Proteins & Research Topics in Proteomics
- Analyzing protein fossils (pseudogenes) in genomes
- Predicting protein function on a genomic scale
- Comparing folds & families between proteomes
- Analyzing protein flexibility in terms of packing
Structures, Sequences, Arrays, Gels

Towards Protein Function: Relating mRNA Expression, Protein Abundance, and Protein-protein Interactions
1. Relating mRNA expression & protein abundance: correlation within broad categories and merged data
2. Relating mRNA expression & protein interactions (indirectly through abundance): noisy correlations in complexes
3. Predicting interactions through expression correlations integrated with other genomic information (reducing the noise)
   a. Results on yeast
   b. Intuition on Bayesian methods

Simple models connecting mRNA and proteins

$$\frac{dP_i}{dt} = k_{s,i}\,[\mathrm{mRNA}_i] - k_{d,i}\,P_i$$

where $k_{s,i}$ and $k_{d,i}$ are the protein synthesis and degradation rate constants. At steady state:

$$P_i = \frac{k_{s,i}\,[\mathrm{mRNA}_i]}{k_{d,i}}$$
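The steady-state relation can be checked numerically. The sketch below is illustrative only: it integrates the rate equation with a simple forward-Euler step and compares the result with k_s [mRNA] / k_d. The rate constants, mRNA level, step size, and function names are made-up assumptions, not values from the talk.

```python
def steady_state_protein(k_s, k_d, mrna):
    """Closed-form steady state: P = k_s * [mRNA] / k_d."""
    return k_s * mrna / k_d


def simulate_protein(k_s, k_d, mrna, p0=0.0, dt=0.01, steps=5000):
    """Forward-Euler integration of dP/dt = k_s * [mRNA] - k_d * P."""
    p = p0
    for _ in range(steps):
        p += dt * (k_s * mrna - k_d * p)
    return p


# Hypothetical parameters, chosen only for illustration.
K_S, K_D, MRNA = 2.0, 0.5, 10.0

if __name__ == "__main__":
    print("closed form:", steady_state_protein(K_S, K_D, MRNA))
    print("simulated:  ", simulate_protein(K_S, K_D, MRNA))
```

Under this model, two genes with the same mRNA level can still differ in protein abundance whenever their k_s / k_d ratios differ.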

Early experiments: Gygi et al., MCB (1999)
Same mRNA levels, yet protein data varied > 20X (N ~ 100, r = 0.9)

Same mRNA levels, yet protein data varied > 20X
Do some ORFs bias the results? 73 proteins (69%): R = 0.36
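The question of whether a few ORFs bias the overall correlation can be illustrated with synthetic numbers. This is a minimal sketch, not the Gygi et al. data: a weakly correlated low-abundance cluster plus three invented high-abundance points is enough to push the Pearson r close to 1.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic low-abundance genes: mRNA and protein are only weakly related.
low_mrna = [1, 2, 3, 4, 5, 6, 7, 8]
low_prot = [5, 1, 7, 2, 8, 3, 6, 4]

# Three synthetic high-abundance genes that lie close to a straight line.
high_mrna = [100, 200, 300]
high_prot = [110, 190, 310]

r_low = pearson(low_mrna, low_prot)
r_all = pearson(low_mrna + high_mrna, low_prot + high_prot)

if __name__ == "__main__":
    print(f"low-abundance genes only: r = {r_low:.2f}")
    print(f"all genes:                r = {r_all:.2f}")
```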

Data Integration
High-throughput data is less reliable than smaller-scale experiments. Merging reduces noise. Combining data increases accuracy & coverage.

Merged mRNA vs Protein (2D gels): r = 0.67
Greenbaum et al., Bioinformatics 2001

Outliers (> 2 SD from the mean), by MIPS category [Mewes et al.]
High protein: Metabolism (1), Energy (2)
Low protein: Prot. Syn. (5), Prot. Fate (6)
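One way to flag such outliers is sketched below. It assumes, purely for illustration, that the quantity compared against the mean is a per-gene log ratio of protein to mRNA. The slide does not state this, and the gene names and values are invented.

```python
import statistics


def flag_outliers(log_ratios, n_sd=2.0):
    """Return {gene: 'high' | 'low'} for genes more than n_sd SDs from the mean.

    log_ratios maps a gene name to log(protein / mRNA); this choice of
    quantity is an assumption made for this sketch.
    """
    values = list(log_ratios.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    flags = {}
    for gene, value in log_ratios.items():
        if value - mean > n_sd * sd:
            flags[gene] = "high"
        elif mean - value > n_sd * sd:
            flags[gene] = "low"
    return flags


# Hypothetical genes and values.
example = {
    "g1": 0.1, "g2": -0.2, "g3": 0.0, "g4": 0.3, "g5": -0.1,
    "g6": 0.2, "g7": -0.3, "g8": 0.1, "g9": 0.0, "g10": 5.0,
}

if __name__ == "__main__":
    print(flag_outliers(example))
```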

Merged mRNA vs Protein (2D gels + MudPIT)
mRNA set: 6249 ORFs
Protein set #2: 2 2DE sets & 2 MudPIT sets, ~2000 ORFs

Functional Categories: co-regulated proteins

Subcellular Localization

Very Broad Categories: Enrichment of Amino Acids in Transcriptome and Translatome relative to Genome

Changes in mRNA vs. protein
Murine hematopoietic precursor MPRO, change in expression hr [S Weissmann]
R = 0.58; ~80% of the genes are located in the first and third quadrants
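A gene falls in the first or third quadrant when its mRNA and protein levels change in the same direction. The fraction of such genes can be computed as in this sketch. The change values are invented, and genes with a zero change in either measurement are counted as not agreeing, which is an arbitrary choice.

```python
def same_direction_fraction(mrna_changes, protein_changes):
    """Fraction of genes whose mRNA and protein changes share a sign
    (first quadrant: both up; third quadrant: both down)."""
    pairs = list(zip(mrna_changes, protein_changes))
    same = sum(1 for m, p in pairs if m * p > 0)
    return same / len(pairs)


# Hypothetical log-fold changes for five genes.
mrna = [1.2, -0.8, 0.5, -1.5, 0.3]
protein = [0.9, -0.4, -0.2, -1.1, 0.6]

if __name__ == "__main__":
    print(same_direction_fraction(mrna, protein))
```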

Changes in mRNA vs. protein
Ratios of wt +gal to wt -gal (ICAT vs microarray)
Ideker et al., Science, 2001
N ~ 290, r = 0.6

Part 2. Relating mRNA expression & protein interactions (indirectly through abundance): noisy correlations in complexes

Expression & interactions
Interactions form an indirect relationship between mRNA expression & protein abundance
- Proteins in a complex should occur in stoichiometrically equal amounts and (approx.) rise & fall in expression in unison
Types of interactions
- Protein complexes (e.g. proteasome, ribosome)
- Aggregated interactions: in vivo pull-down (Ho, Gavin), yeast two-hybrid (Uetz, Ito)

Interactions provide a way to define function
- Networks [Eisenberg et al.]
- Hierarchies & DAGs [Enzyme, Bairoch; GO, Ashburner; MIPS, Mewes, Frishman]
- Interaction Vectors [Lan et al, IEEE 90:1848]

Relationship of Interactions to Abs. Expression Level

Protein-Protein Interactions & Expression
Correlations between selected expression timecourses: Cell Cycle, CDC28 expt. (Davis)
Sets of interactions:
- All pairs (control)
- Pairwise interactions (Uetz et al.)
- Strong interactions in permanent complexes (from MIPS), clearly diff.

Representing Expression Correlations within a Large Complex in a Matrix
Rows and columns: MCM3, MCM6, CDC47, MCM2, CDC46, CDC54, DPB3, CDC45, DPB2, CDC2, CDC7, POL2, HYS2, POL32, DBF4, ORC2, ORC6, ORC5, ORC4, ORC3, ORC1

Expression Correlations Segment Replication Complex into Component Parts
Blocks in the matrix: MCM prots., ORC, polymerases

Range of Expression Correlations within Complexes
Replication cplx: overall 0.05; ORC 0.19; MCMs 0.75; Pol. 0.45, 0.75
Ribosome: overall 0.80; large 0.80; small 0.81
Proteasome: overall 0.43; 20S 0.50; 19S 0.51
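A low overall correlation alongside high sub-complex correlations, as reported for the replication complex, can arise when each component part is internally co-expressed but the parts are unrelated to one another. The sketch below illustrates this with the mean pairwise Pearson correlation. The four expression profiles and their names are synthetic assumptions, not data from the talk.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(profiles):
    """Average Pearson r over all pairs of expression profiles in a complex."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Synthetic time courses: two sub-complexes, each internally co-expressed,
# but with profiles that are uncorrelated between the two groups.
sub1 = {"s1_a": [1, 2, 3, 4], "s1_b": [2, 4, 6, 8]}
sub2 = {"s2_a": [1, -1, -1, 1], "s2_b": [2, -2, -2, 2]}
whole = {**sub1, **sub2}

if __name__ == "__main__":
    print("sub-complex 1:", mean_pairwise_correlation(sub1))
    print("sub-complex 2:", mean_pairwise_correlation(sub2))
    print("whole complex:", mean_pairwise_correlation(whole))
```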

Part 3a. Predicting interactions through expression correlations integrated with other genomic information (reducing the noise): results on yeast

Predicting Interactions from Expression Analysis
~313K significant relationships from ~18M possible
Correlated mRNA profiles predict possible interactions
Very noisy
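The ~18M figure is consistent with counting all unordered gene pairs. The arithmetic below assumes roughly 6000 yeast genes, a round number chosen for illustration and not stated on the slide.

```python
import math

N_GENES = 6000          # assumed round number of yeast genes
N_SIGNIFICANT = 313_000  # ~313K significant relationships (from the slide)

# Number of unordered gene pairs: n * (n - 1) / 2.
possible_pairs = math.comb(N_GENES, 2)
fraction_significant = N_SIGNIFICANT / possible_pairs

if __name__ == "__main__":
    print(f"possible pairs: {possible_pairs:,}")
    print(f"fraction called significant: {fraction_significant:.3%}")
```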

Overview of information integrated and Bayesian Formalism
- Data suggestive of interactions (co-expression, co-localization, similar essentiality)
- Noisy high-throughput experiments (Gavin et al., Uetz et al. &c)
- Gold-standard complexes (MIPS, Mewes, Frishman et al.)
- Cross-validated training and testing
- Thresholding L at various values
- Tabulation of observed TP and FP at various thresholds
Jansen et al. Science 302:449
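The thresholding and tabulation step can be sketched as follows. The code counts, for each threshold on L, how many gold-standard positives (TP) and negatives (FP) score at or above it. The scored pairs are invented, and the cross-validation step is left out of this sketch.

```python
def tabulate_tp_fp(scored_pairs, thresholds):
    """For each threshold, count gold-standard positives (TP) and negatives (FP)
    among protein pairs whose likelihood ratio L is at or above the threshold.

    scored_pairs: list of (likelihood_ratio, is_gold_positive) tuples.
    Returns a list of (threshold, tp, fp) rows.
    """
    table = []
    for t in thresholds:
        tp = sum(1 for score, is_pos in scored_pairs if score >= t and is_pos)
        fp = sum(1 for score, is_pos in scored_pairs if score >= t and not is_pos)
        table.append((t, tp, fp))
    return table


# Hypothetical scored protein pairs: (L, is the pair in the gold-standard positives?)
pairs = [
    (1000, True), (700, True), (650, False), (300, True),
    (50, False), (5, False), (0.5, False),
]

if __name__ == "__main__":
    for threshold, tp, fp in tabulate_tp_fp(pairs, [1, 100, 600]):
        print(f"L >= {threshold}: TP = {tp}, FP = {fp}")
```

In this toy example, raising the threshold improves the TP/FP ratio, though fewer true positives are recovered.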

Comparison of network against gold standard complex interactions
Positives: 8250 known interactions in MIPS complexes [Mewes]
Negatives: ~2.7 M pairs in diff. subcellular compartments
TP, FP: overlap with the set of predicted "interactions"
[Related data in BIND, DIP]

Observed TP/FP Ratio Tracks L, Suggesting a Threshold
Jansen et al. Science 302:449

Integration of Features Gives Much Higher Likelihood Ratios than Any Individual Feature
Jansen et al. Science 302:449

Comparison of Predictions with Known Complexes
Jansen et al. Science 302:449

Example prediction: Mito. Ribosome
Jansen et al. Science 302:449

Comparison with new experiments [J Greenblatt]: RFA cplx
Jansen et al. Science 302:449

Part 3b. Predicting interactions through expression correlations integrated with other genomic information (reducing the noise): intuition on Bayesian methods

Intuition for how to do Integration of Interactomes
Diverse sources of interaction information
- Databases (BIND, DIP, MIPS etc.): individual expts. in literature
- High-throughput datasets: in vivo pull-down (Ho, Gavin), yeast two-hybrid (Uetz, Ito)
- Genomic data: expression, phenotypes, localization, functional
Noisy
- High-throughput data is less reliable than smaller-scale experiments [Grigorev, Bork]
Combining data increases
- Accuracy & coverage [Church]
How to do quantitatively?
- How to weight different data sources?
- General classification problem (machine learning)
Bayesian approaches….

Example of data integration: RNA polymerase II
Which subunits interact?
Based on protein-protein interaction experiments [Kornberg]
Compare with Gold Std. structure
Edwards, Kus, et al. TIG 18:529

Data integration: RNA polymerase II
Interaction experiments before structure was known
Integrate using naive Bayes classifier

Weighted Voting: the Likelihood Ratio
Vote: +2 =
With weights: likelihood ratio L = L1 + L2 + L3 … (the votes add on a log scale: log L = log L1 + log L2 + log L3 …, i.e. L = L1 × L2 × L3 … for independent features)
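The weighted-voting picture can be sketched in code under the standard naive Bayes assumption that the features are conditionally independent. In that case the per-feature likelihood ratios multiply, so their logarithms add like votes, and the combined ratio scales the prior odds. The ratios and the prior odds below are illustrative assumptions, not numbers from the slides.

```python
import math


def combined_likelihood_ratio(ratios):
    """Naive Bayes combination: product of per-feature likelihood ratios."""
    return math.prod(ratios)


def combined_log_vote(ratios):
    """The same combination as a sum of log-ratios (the 'weighted votes')."""
    return sum(math.log(r) for r in ratios)


def posterior_odds(prior_odds, ratios):
    """Posterior odds = prior odds * combined likelihood ratio."""
    return prior_odds * combined_likelihood_ratio(ratios)


# Hypothetical evidence for one protein pair: two features in favour
# (L > 1) and one against (L < 1).
feature_ratios = [2.0, 5.0, 0.5]
PRIOR_ODDS = 1 / 600  # assumed prior odds, for illustration only

if __name__ == "__main__":
    print("combined L:", combined_likelihood_ratio(feature_ratios))
    print("sum of log votes:", combined_log_vote(feature_ratios))
    print("posterior odds:", posterior_odds(PRIOR_ODDS, feature_ratios))
```

A feature with L > 1 votes for an interaction and one with L < 1 votes against it. The size of L sets the weight of each vote.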

Correlations between similar features

Relative quality of different expts.
L = (TP/FP) (N/P) [for uncorrelated features]
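The formula is the true-positive rate (TP/P) divided by the false-positive rate (FP/N), rearranged. The sketch below evaluates it for a made-up experiment. The gold-standard sizes P = 8250 and N = 2.7M are the counts quoted earlier in the talk, while the TP and FP counts are hypothetical.

```python
def likelihood_ratio(tp, fp, n_pos, n_neg):
    """L = (TP / P) / (FP / N) = (TP / FP) * (N / P)."""
    return (tp / fp) * (n_neg / n_pos)


# Gold-standard sizes quoted in the talk.
N_POS = 8250
N_NEG = 2_700_000

# Hypothetical experiment: 100 gold-standard positives and 50 negatives recovered.
example_l = likelihood_ratio(tp=100, fp=50, n_pos=N_POS, n_neg=N_NEG)

if __name__ == "__main__":
    print(f"L = {example_l:.1f}")
```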

Towards Protein Function: Relating mRNA Expression, Protein Abundance, and Protein-protein Interactions
1. Relating mRNA expression & protein abundance: correlation within broad categories and merged data
2. Relating mRNA expression & protein interactions (indirectly through abundance): noisy correlations in complexes
3. Predicting interactions through expression correlations integrated with other genomic information (reducing the noise)
   a. Results on yeast
   b. Intuition on Bayesian methods

Acknowledgements
Protein Function Prediction (Networks.GersteinLab.org): J Qian, R Jansen, A Drawid, C Wilson, H Yu, D Greenbaum, J Lin, N Luscombe, H Hegyi, Y Kluger
Pseudogenes (Pseudogene.org): P Harrison, Z Zhang, Y Liu, S Balasubramanian, P Bertone, T Johnson, J Karro
Macromolecular Motions (MolMovDB.org): J Junker, H Yu, N Echols, V Alexandrov, W Krebs, D Milburn, U Lehnert
Collaborators: J Chang, R Basri, J Greenblatt (N Krogan)
Yale CEGS: M Snyder (A Kumar, H Zhu, M Bilgin …), S Weissmann, P Miller (K Cheung)
NESG.org: G Montelione, A Edwards (B Kuss)
NIH, NSF
Structural Proteomics (PartsList.org): C Goh, N Lan, H Hegyi, R Das, S Douglas, B Stenger

Web-based data analysis system with graphical interface
- Data processing
- Data quality
- Differential hybridisation scoring
- Graphical interface allows visualisation of data and comparison with original slides
- Slide level
- Spot level

Examples: SIM
Example shifted relationship (SIM)

Examples: MIM
Example shifted relationship (MIM)

Ratios of wt +gal to wt -gal (ICAT vs microarray)
N ~ 290, r = 0.6
Ideker et al., Science, 2001

Murine hematopoietic precursor MPRO, change in expression hr

Outliers: generally…
- Alcohol dehydrogenase is also a stress-induced protein in many organisms (Matton et al. 1990; An et al. 1991; Millar et al. 1994). Faster ramp-up?
- Alternatively, mRNA stability may be a factor. Many structures within mRNA are thought to influence stability, including stem loops, UTRs, premature stops and uORFs (Klaff et al. 1996).
- MIPS fxns AA metabolism & Energy are 2X as likely to have high protein vs mRNA as the general population.
- MIPS fxns Protein synthesis and Protein fate (folding, modification, destination) are more likely to have low protein vs mRNA than the general population.

Murine hematopoietic precursor MPRO, change in expression hr
R = 0.58; ~80% of the genes are located in the first and third quadrants

Understanding Protein Function on a Genomic Scale
Originally, 250 of ~650 known on chr. 22 [Dunham et al.]
>>30K+ proteins in entire human genome (with alt. splicing)

Issues in defining protein function on a genomic scale
- Multi-functionality: 2 functions/protein (also 2 proteins/function)
- Role conflation: molecular, cellular, phenotypic
- Fun terms… but do they scale?

Names in Biology: Systematic?
- Yippee: named for the reaction of a graduate student upon cloning the protein. If she had a good result, she would write "yippee" in the margin of her notebook
- vulcan & klingon
- stranded at second: mutant dies during development, usually in 2nd larval stage
- sarah: affects female fertility (biblical ref.)
- Sonic & kryptonite
- Darkener of apricot & suppressor of white apricot
- ROP vs ROM: "Regulator of Copy Number" or RNA-I-II-complex-binding-protein
- Barentsz: named for the Dutch explorer who froze to death near the North Pole. The mutant blocks the movement of a key mRNA, causing it to get stuck in the wrong place
- Agoraphobic: mutant for which the larvae look normal but never crawl out of the egg
- single-minded
- Redtape: series of designations given to genes which, when mutated, block transport along axons
- Lush & cheapdate: the former wants alcohol, the latter makes susceptible
[Adapted from conversations + Am. Sci.]

Issues in defining protein function on a genomic scale
- Fun terms… but do they scale? Starry night (P Adler, '94)
- For now, definable aspects of function: interactions, location, enzymatic rxn. [Babbit]

Predicting Protein Function on a Genome Scale
Relating microarray experiments to protein abundance
- Local clustering (to identify time-shifted and inverted relationships)
- Relating clustering to known regulatory relationships
Relating them to protein interactions
- Bayesian methods to uniformly & optimally combine evidence (in application to integration of protein interaction data)
- Predicting interactions in yeast de novo from non-interaction data sources (with verification)