Networks of Protein Interactions Introduction and Integration Balaji S. Srinivasan CS 374 Lecture 5 10/11/2005.

1 Networks of Protein Interactions Introduction and Integration Balaji S. Srinivasan CS 374 Lecture 5 10/11/2005

2 Overview. Genomics (1 genome): Assembly, Gene Finding. Comparative Genomics (N genomes): Sequence Alignment. Functional Genomics (1 assay): Microarray Analysis. Integrative Genomics (N assays): Network Integration (this talk).

3 Coexpression. Microarray data (expression matrix, genes x arrays) -> Pearson correlation between expression profiles -> gene-gene network. [Figure: correlation matrix for Gene A, Gene B, Gene C with pairwise values .8, -.6, -.7]
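
A minimal sketch of the coexpression step, assuming each gene's expression profile is a list of values across arrays. The expression values and the helper names `pearson` and `coexpression_network` are illustrative assumptions, not taken from the slide.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_network(expr):
    """Map each unordered gene pair to the Pearson correlation of its profiles."""
    genes = sorted(expr)
    return {(g, h): pearson(expr[g], expr[h])
            for i, g in enumerate(genes) for h in genes[i + 1:]}

# Hypothetical expression matrix: one row per gene, one column per array.
expression = {
    "Gene A": [1.0, 2.0, 3.0, 4.0],
    "Gene B": [1.5, 1.9, 3.2, 3.9],
    "Gene C": [4.0, 3.1, 2.2, 0.9],
}

network = coexpression_network(expression)
```

In this toy data Gene A and Gene B rise together (positive edge weight) while Gene C falls (negative weights against both).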

4 Coinheritance. BLAST bit scores of each protein across species (Species 1 to 4) -> Spearman correlation between inheritance profiles -> protein-protein network. [Figure: bit-score table for Protein A, Protein B, Protein C and the resulting correlation matrix, with pairwise values including .95, -.95 and -1]
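
A rough sketch of the coinheritance score, assuming it is computed as the Spearman correlation (Pearson correlation of ranks) between two proteins' bit-score profiles across species. The bit scores below are illustrative, not a reconstruction of the slide's table, and the helper names are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

# Hypothetical bit scores against Species 1..4 (illustrative values only).
protein_a = [600, 200, 300, 100]
protein_b = [500, 100, 300, 200]
protein_c = [50, 250, 100, 400]

rho_ab = spearman(protein_a, protein_b)  # similar inheritance pattern
rho_ac = spearman(protein_a, protein_c)  # opposite inheritance pattern
```

With these values `rho_ab` is 0.8 and `rho_ac` is -1. Working on ranks rather than raw bit scores makes the score insensitive to the scale of individual BLAST hits.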

5 Colocation. Assembled genomes -> location of each protein on each chromosome (Chrom 1 to 4) -> average chromosomal distance between proteins -> protein-protein network. [Figure: location table for Protein A, Protein B, Protein C and the resulting distance matrix, with pairwise values including .06 and .25]

6 Coevolution. Multiple alignments of protein families (A, A’, A’’, A’’’; B, B’, B’’, B’’’; C, C’, C’’, C’’’) -> tree distances -> correlation between families. [Figure: correlation matrix for Prt Fam A, Prt Fam B, Prt Fam C with pairwise values .9, -.7, -.8]

7 Functional Genomics. Many others… Experimental: TAP + Mass Spec, Y2H, Pheno & antibody arrays, Synthetic lethal, RNAi knockdown. Computational: Rosetta Stone (conserved domain), Shared Operon, PSIMAP.

8 Integration Motivation. Can we combine data? Example: Caulobacter crescentus flagellar proteins. Coexpression cluster; compare to coinheritance. Potential for integration… [Figure panels: Coexpression, Coinheritance]

9 How to use 2 predictors? They agree & disagree… Scales, noise levels, and sources are very different. Can we do network integration? [Figure labels: coinheritance, coexpression]

10 Early Integration Hacks. Given 2 nets: intersection, union, average weights. [Figure: coexpression + coinheritance = …]

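11 Early Integration Hacks. Intersection: too strict. Union: too lenient. Average: too dumb :) [Figure: coexpression edge weights (.9, .8, .7, .6) + coinheritance edge weights (.5, .7, .8, .9) = intersection, union, and average nets; averaged weights shown include .65, .35, .45, .75, .4]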
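
The three hacks can be sketched as follows, assuming each net is a mapping from a protein pair to an edge weight. The `threshold` parameter, the treatment of a missing edge as weight 0, and the example weights are assumptions made for illustration; the slides do not specify them.

```python
def intersection(net_a, net_b, threshold=0.5):
    """Keep an edge only if BOTH nets score it above the threshold (strict)."""
    return {e for e in net_a.keys() & net_b.keys()
            if net_a[e] > threshold and net_b[e] > threshold}

def union(net_a, net_b, threshold=0.5):
    """Keep an edge if EITHER net scores it above the threshold (lenient)."""
    return {e for e in net_a.keys() | net_b.keys()
            if net_a.get(e, 0.0) > threshold or net_b.get(e, 0.0) > threshold}

def average(net_a, net_b):
    """Average the two weights per edge; a missing edge counts as weight 0."""
    return {e: (net_a.get(e, 0.0) + net_b.get(e, 0.0)) / 2
            for e in net_a.keys() | net_b.keys()}

# Hypothetical edge weights on a common 0..1 scale.
coexpression = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.2}
coinheritance = {("A", "B"): 0.7, ("A", "C"): 0.1, ("B", "D"): 0.9}

strict = intersection(coexpression, coinheritance)
lenient = union(coexpression, coinheritance)
averaged = average(coexpression, coinheritance)
```

Here the intersection keeps only A-B, the union keeps A-B, A-C, and B-D, and the average treats both sources as equally reliable, which is the weakness the next slide points out.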

12 Early Integration Hacks. Useful dumb… All data equal? No explicit statistical formulation: different noise levels, different intervals. Uninformed by prior data… [Figure: averaged net, labeled "Too dumb"]

13 Recent work: Bayesian Networks (Troyanskaya 2003), Decision Trees (Wong 2004), Naïve Bayes + Boosting (Lu 2005), Likelihood Ratios (Lee 2004)

14 Recent work. Major innovation: Training Set. MIPS, “Gold Standard” (Gerstein); SSL, synthetic lethals (Wong); DIP (Marcotte). Defines the signal: what is our algorithm learning? [Figure: KEGG (Pyrimidine Metabolism)]

15 Recent work. Major limitations. Method specific: Decision trees -> binary coding; Bayesian Networks -> need to poll people for prior. All methods: Biological -> limited to yeast; Statistical -> dependency hacks! (Lee: heuristic weighting; Naïve Bayes). [Figure panels: Naïve Bayes (Lu 2005), Heuristic Weighting (Lee 2004)]

16 Recap. Just shown: Functional Genomics; Integration Problem; Previous work (all in S. cerevisiae; major innovation: training set; major shortcoming: dependence hacks). To come: training set, common scale; rigorous statistical dependence; microbes only (for now…). [Figure: coexpression + coinheritance + colocation + …]

17 Training Set. Observation: known linkages exist for a nontrivial fraction of pairs. Caulobacter crescentus KEGG: 783 of 3737 proteins are in 1 or more KEGG pathways. Ex: pyrimidine metabolism, pathway 240

18 Training Set. Tabulate pairs: L=1 if shared COG/KEGG/GO; L=0 if unshared; L=? if one or both unknown. Most pairs totally unknown…
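
The tabulation rule can be sketched as below, assuming each annotated protein maps to a set of functional categories (COG, KEGG, or GO identifiers). The protein names, category names, and the function `label_pairs` are hypothetical.

```python
from itertools import combinations

def label_pairs(proteins, annotations):
    """Label every unordered protein pair as 1, 0, or "?".

    1   -> both annotated and sharing at least one category
    0   -> both annotated with no shared category
    "?" -> one or both proteins have no annotation
    """
    labels = {}
    for a, b in combinations(sorted(proteins), 2):
        cats_a, cats_b = annotations.get(a), annotations.get(b)
        if not cats_a or not cats_b:
            labels[(a, b)] = "?"
        elif cats_a & cats_b:
            labels[(a, b)] = 1
        else:
            labels[(a, b)] = 0
    return labels

# Hypothetical annotations; P4 has no known category.
proteins = ["P1", "P2", "P3", "P4"]
annotations = {
    "P1": {"pyrimidine metabolism"},
    "P2": {"pyrimidine metabolism", "flagellar assembly"},
    "P3": {"flagellar assembly"},
}

labels = label_pairs(proteins, annotations)
```

Every pair involving the unannotated protein P4 comes out as "?", which is why most pairs in a real genome end up unlabeled.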

19 Training Sets. Most pairs totally unknown… Caulobacter crescentus: 3737 proteins, 783 KEGG. Training set is small in relative terms, large in absolute terms. All pairs (L=0,1,?): 6980716 pairs = 6667480 pairs + 298961 pairs + 14275 pairs. [Figure: relative frequency, training pairs vs. all pairs]

20 Training Sets. All pairs (L=0,1,?): 6980716 pairs = 6667480 pairs + 298961 pairs + 14275 pairs.
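
As a quick arithmetic check of these counts: the total is the number of unordered pairs among 3737 proteins, and the three parts sum to it. The slide does not state which count belongs to which label; the reading below (largest group is L=?, the two smaller groups are the labeled training pairs) is an assumption.

```python
from math import comb

n_proteins = 3737
all_pairs = comb(n_proteins, 2)  # unordered pairs of distinct proteins

# Counts from the slide. Assumption: the largest group is the L=? pairs,
# and the two smaller groups together form the labeled training set.
unlabeled_pairs = 6667480
labeled_groups = (298961, 14275)

training_pairs = sum(labeled_groups)
training_fraction = training_pairs / all_pairs

print(all_pairs, training_pairs, round(training_fraction, 4))
```

Under that reading the training set is about 4.5% of all pairs: small in relative terms, but still more than 300,000 labeled examples.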

21 Training Sets. Training data is crucial: reveals hidden structure; small effort yields large payoff (L=0,1,? stats). Puts data on a common scale: the meter is in biology (predictive power), not physics (units). [Figure: raw data -> add training set -> hidden structure]

22 Bayes’ Rule in 1D (coexpression). Predict linkages with Bayes’ Rule: calculate the conditional probability of linkage given evidence. Evaluate the posterior P(L=1|E) at millions of pairs with L=?. Optimal decision rule: “Bayes error rate” = min. error rate of classifier.
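
A minimal 1D sketch of this step, using P(L=1|E) = p(E|L=1)P(L=1) / [p(E|L=1)P(L=1) + p(E|L=0)P(L=0)]. It assumes the class-conditional densities are estimated with simple histograms and the prior with the labeled class frequencies; the talk itself moves on to kernel density estimation, so the histogram is a stand-in. All data values and function names are hypothetical.

```python
def bin_index(value, edges):
    """Index of the bin containing value; the last bin includes its right edge."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    if value == edges[-1]:
        return len(edges) - 2
    raise ValueError("value outside histogram range")

def histogram_density(samples, edges):
    """Crude density estimate: per-bin fraction of samples divided by bin width."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        counts[bin_index(s, edges)] += 1
    return [c / (len(samples) * (edges[i + 1] - edges[i]))
            for i, c in enumerate(counts)]

def posterior_linked(e, linked, unlinked, edges):
    """P(L=1 | E=e) by Bayes' rule with histogram class-conditional densities."""
    prior = len(linked) / (len(linked) + len(unlinked))  # P(L=1)
    i = bin_index(e, edges)
    num = histogram_density(linked, edges)[i] * prior
    den = num + histogram_density(unlinked, edges)[i] * (1.0 - prior)
    return num / den if den > 0 else prior  # fall back to the prior in empty bins

# Hypothetical coexpression values for labeled training pairs.
linked = [0.9, 0.8, 0.7, 0.85, 0.3]                                  # L = 1
unlinked = [-0.5, 0.0, 0.1, 0.2, -0.1, 0.3, 0.75, -0.8, 0.4, 0.05]   # L = 0
edges = [-1.0, -0.5, 0.0, 0.5, 1.0]

p_high = posterior_linked(0.8, linked, unlinked, edges)
p_low = posterior_linked(0.2, linked, unlinked, edges)
```

With this toy data `p_high` is 0.8 and `p_low` is 1/7: the posterior turns a raw correlation into a probability of linkage that can be compared across evidence types.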

23 2D Network Integration Account for statistical dependence 2D Scatterplot coexpression vs. coinheritance

24 2D Network Integration Estimate densities Kernel Density Estimation Gray-Moore dual tree algorithm (digression #1)
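
A naive sketch of 2D kernel density estimation, assuming an isotropic Gaussian kernel with a single hand-picked bandwidth. It evaluates the kernel sum directly, which costs one pass over all samples per query point; the dual-tree algorithm named on the slide is a way to speed up such sums and is not implemented here. The sample points and the name `gaussian_kde_2d` are hypothetical.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return a function estimating the 2D density of the (x, y) samples."""
    # Normalizing constant of an isotropic 2D Gaussian, averaged over samples.
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = ((x - px) ** 2 + (y - py) ** 2) / bandwidth ** 2
            total += math.exp(-0.5 * sq_dist)
        return norm * total

    return density

# Hypothetical (coexpression, coinheritance) values for labeled pairs.
linked_points = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7), (0.6, 0.9)]
density_linked = gaussian_kde_2d(linked_points, bandwidth=0.15)

near = density_linked(0.75, 0.85)  # inside the cluster of linked pairs
far = density_linked(-0.5, -0.5)   # far from every sample
```

Because the density is estimated jointly in both dimensions, any dependence between the two evidence types is captured rather than assumed away.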

25 2D Network Integration

26 Posterior probability of interaction P(L=1|E) visual, geometric interpretation

27 2D Network Integration. Hacks revisited: Intersection, Union, Average. All are suboptimal… including decision trees, Naïve Bayes, etc.

28 Hidden Biology Dividend of Network Integration Joint density reveals hidden biology Moderate evidence from multiple sources! Subtle interactions missed by univariate methods…

29 Recap #2. Just shown: Training set (scale to common axes); Scatterplot + KDE; Posterior probability of interaction; Hidden biology. To show: generalizations (N evidences, arbitrary microbes…)

30 Using N predictors. Example with N = 3 (coinheritance, colocation, coexpression). Note evidence coupling: high colocation compensates for low coexpression; nonlinear relation revealed by joint density… [Figure axes: coexpression (E1), colocation (E2), coinheritance (E3)]

31 Binary Classifier Paradigm. For a pair with unknown linkage status, given interaction predictors, predict functional association. [Figure: pair A, B with L=? and E known is scored by P(L|E) as Different Function (L=0) or Same Function (L=1)]
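
One way to sketch this paradigm for N predictors: fit one joint density per class on the labeled pairs, then apply Bayes' rule to any evidence vector. This assumes a Gaussian kernel with one shared bandwidth, evidence already on comparable scales, and a prior taken from the labeled class frequencies; the names `kde` and `train_classifier` and all data values are hypothetical, not the talk's implementation.

```python
import math

def kde(samples, bandwidth):
    """Gaussian kernel density estimate over N-dimensional evidence vectors."""
    dim = len(samples[0])
    norm = 1.0 / (len(samples) * (math.sqrt(2.0 * math.pi) * bandwidth) ** dim)

    def density(e):
        return norm * sum(
            math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(e, s)) / bandwidth ** 2)
            for s in samples)

    return density

def train_classifier(evidence, labels, bandwidth=0.2):
    """Return a function e -> P(L=1 | E=e), trained on pairs labeled 0 or 1.

    Pairs labeled "?" are ignored during training.
    """
    linked = [e for e, l in zip(evidence, labels) if l == 1]
    unlinked = [e for e, l in zip(evidence, labels) if l == 0]
    prior = len(linked) / (len(linked) + len(unlinked))  # P(L=1)
    f1, f0 = kde(linked, bandwidth), kde(unlinked, bandwidth)

    def posterior(e):
        num = f1(e) * prior
        den = num + f0(e) * (1.0 - prior)
        return num / den if den > 0 else prior  # fall back if both densities underflow

    return posterior

# Hypothetical evidence vectors: (coexpression E1, colocation E2, coinheritance E3).
evidence = [(0.8, 0.9, 0.7), (0.7, 0.8, 0.9), (0.2, 0.9, 0.8),
            (0.1, 0.1, 0.2), (0.0, 0.2, 0.1), (0.2, 0.0, 0.0), (0.1, 0.3, 0.1),
            (0.5, 0.5, 0.5)]
labels = [1, 1, 1, 0, 0, 0, 0, "?"]

posterior = train_classifier(evidence, labels)
p_strong = posterior((0.75, 0.85, 0.8))  # strong evidence from all three sources
p_weak = posterior((0.1, 0.1, 0.1))      # weak evidence from all three sources
```

Scoring every L=? pair with `posterior` and keeping the pairs above a chosen cutoff yields the predicted network. The third training example has low coexpression but high colocation and coinheritance, which is the kind of evidence coupling a joint density can pick up.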

32 Blessing of Dimensionality

33 Classifier builds network. Binary classifier on pairs: apply to all microbes, all protein pairs in 230 species. First rigorous nets for many human pathogens. [Figure panels: Escherichia coli K12, Helicobacter pylori 26695, Caulobacter crescentus]

34 MreB. Example: MreB, a relative of eukaryotic actin; predict interaction partners.

35 CtrA and CcrM Laub et al., 2000

36 C. jejuni glycosylation. Eukaryote-like N-linked glycosylation: mysterious; biotechnological & clinical importance.

37 Nets speed experiment. Sec partners for MreB (Natalie Dye); predicting mislocalization (Grant Bowman, Esteban Toro); interacting 2-component proteins (Nathan Hillson).

38 Recap: integrate data sources non-naïvely; rigorous probabilistic formulation; moderate evidence from multiple sources. Result: Unified p-value for prob. of functional linkage given all evidence.

39 Further Directions. Whatcha gonna do with it? M genomes + N assays: Comparative Genomics + Integrative Genomics = Network Alignment. To be continued…

