Protein Interaction Networks

Presentation transcript:

Protein Interaction Networks Thanks to Mehmet Koyuturk

7. Protein Interaction Networks

Protein-Protein Interactions
Physical association between proteins: signal transduction and phosphorylation; docking and complex formation; permanent vs. transient interactions.
Co-location of proteins: proteins that work in the same cellular component. Soluble locations include the lysosome and the mitochondrial matrix; membrane locations include receptors in the plasma membrane and transporters in the mitochondrial membrane.
Functional association of proteins: proteins involved in the same biomolecular activity, such as enzymes in the same pathway or co-regulated proteins.

Permanent vs Transient Interactions
Permanent interactions: some proteins form a stable protein complex that carries out a structural or functional biomolecular role. These proteins are subunits of the complex and work together. Examples: ATPase subunits, subunits of the nuclear pore.
Transient interactions: proteins that come together in certain cellular states to undertake a biomolecular function. Examples: the DNA replicative complex, signal transduction.

Signal Transduction
Phosphorylation: protein-kinase interaction, enzyme activation.
Signaling cascade.

Why Study Protein Interactions?
Identification of functional modules and the interconnections between these modules.
Functional annotation based on binding partners and interaction patterns.
Identification of evolutionarily conserved pathways.
Identification of drug target proteins to minimize side effects.

Identification of Protein Interactions
Traditionally, protein interactions are identified by wet-lab experiments based on hypotheses about candidate proteins.
Small-scale assays: co-immunoprecipitation (immunoprecipitate one protein and see whether the other is also precipitated). Reliable, but can only verify interactions between suspected partners.
High-throughput screening: throw in thousands of ORFs and see which ones bind to each other (yeast two-hybrid, tandem affinity purification). Large scale, but a lot of noise.

Yeast Two-Hybrid
The yeast GAL4 gene, which encodes a transcription factor required for activation of the GAL genes, is split into two parts: an activating domain and a binding domain.
The split protein does not work unless the two parts are in physical contact.

Protein Interaction Networks
Organize all identified interactions in a network, where proteins are represented by nodes and interactions are represented by edges.
Tandem affinity purification (TAP) identifies a group of proteins that are caught by a target protein. This group can be encoded with the spoke model (star network) or the matrix model (clique).
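A minimal Python sketch of the two encodings named on this slide, assuming a purification is given as one bait (target) protein plus the list of prey proteins it pulled down. The function names and the protein labels are illustrative and do not come from the original slides.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: connect the bait to every prey (a star network)."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_edges(bait, preys):
    """Matrix model: connect every pair of purified proteins (a clique)."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait "A" pulls down "B", "C" and "D".
bait, preys = "A", ["B", "C", "D"]
print(len(spoke_edges(bait, preys)))   # 3 edges
print(len(matrix_edges(bait, preys)))  # 6 edges
```

The spoke edges are always a subset of the matrix edges, so the choice between the two models is a trade-off between missing prey-prey interactions and adding spurious ones.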

Functional Modularity in PPI Networks
A protein complex: dense subgraph.
A signal transduction pathway: simple path, parallel paths.
A protein with a common, key, fundamental role (e.g., a kinase): hub node.
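As a rough illustration of two of these patterns, the sketch below measures how dense a candidate complex is and lists hub nodes by degree. The edge list, the degree cutoff, and all function names are assumptions made for this example only.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def density(adj, nodes):
    """Fraction of possible edges that exist among the given nodes."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(len(adj.get(u, set()) & nodes) for u in nodes) / 2
    return internal / (n * (n - 1) / 2)


def hubs(adj, min_degree):
    """Nodes whose degree reaches an (arbitrary) cutoff."""
    return sorted(node for node, nbrs in adj.items() if len(nbrs) >= min_degree)


# Hypothetical network: A, B, C form a complex; K is a kinase-like hub.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("K", "A"), ("K", "D"), ("K", "E"), ("K", "F")]
adj = build_adjacency(edges)
print(density(adj, ["A", "B", "C"]))  # 1.0, a clique
print(hubs(adj, min_degree=4))        # ['K']
```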

Computational Prediction of PPIs
Functional association is a higher-level conceptualization of interaction, for example proteins that act as enzymes catalyzing reactions in the same metabolic pathway.
Functionally associated proteins are likely to show up in similar contexts: co-regulation, co-expression, co-evolution, co-citation, and so on.
Functional association between proteins can be identified computationally from different sources of data, such as sequences, gene expression, and literature.
The approach can also be extended to capture physical associations, for example by taking into account evolution at the structural level.

Conservation of Gene Neighborhood
In bacteria, the genome is organized in such a way that functionally related proteins are encoded by neighboring regions (operons).
When more than one bacterial species is considered, this neighborhood relationship becomes even more relevant.
[Figure: distribution of neighboring genes in H. influenzae and E. coli into functional classes]

Comparison of Nine Bacterial Genomes
trpB-trpA is the only gene pair whose proximity is conserved across nine prokaryotic genomes.
These genes encode the two subunits of tryptophan synthase, which interact and catalyze a single reaction.

Close Orthologs
Run of genes: a set of genes on one strand such that the gap between adjacent genes is less than a threshold g (in practice, g is about 300 bp). Any pair of genes on the same run is said to be close.
Bidirectional best hits (BBH): genes X1 and X2 from genomes G1 and G2 are BBH if their sequence similarity is significant and there is no Y1 in G1 that is more similar to X2 than X1 is, and no Y2 in G2 that is more similar to X1 than X2 is.
Pair of close bidirectional best hits: Xa and Ya are close in G1, Xb and Yb are close in G2, Xa and Xb are BBH, and Ya and Yb are BBH.
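The three definitions on this slide can be sketched in Python as follows. The input formats are assumptions: genes are (name, start, end, strand) tuples, and similarity scores are given only for hits already judged significant. Runs are computed per strand, which is one reading of "a set of genes on one strand"; all names are hypothetical.

```python
from itertools import combinations


def runs(genes, g=300):
    """Group genes on the same strand whose gaps are below g base pairs."""
    result = []
    for strand in ("+", "-"):
        on_strand = sorted((x for x in genes if x[3] == strand), key=lambda x: x[1])
        current, prev_end = [], None
        for name, start, end, _ in on_strand:
            if current and start - prev_end < g:
                current.append(name)
            else:
                if current:
                    result.append(current)
                current = [name]
            prev_end = end
        if current:
            result.append(current)
    return result


def bidirectional_best_hits(sim):
    """sim maps (gene in G1, gene in G2) to a score for significant hits."""
    best_12, best_21 = {}, {}
    for (a, b), score in sim.items():
        if a not in best_12 or score > sim[(a, best_12[a])]:
            best_12[a] = b
        if b not in best_21 or score > sim[(best_21[b], b)]:
            best_21[b] = a
    return {(a, b) for a, b in best_12.items() if best_21[b] == a}


def close_pairs(run_list):
    """All unordered gene pairs that share a run."""
    return {frozenset(p) for run in run_list for p in combinations(run, 2)}


def close_bbh_pairs(runs1, runs2, bbh):
    """Pairs close in G1 whose BBH partners are also close in G2."""
    ortholog = dict(bbh)
    close2 = close_pairs(runs2)
    found = set()
    for pair in close_pairs(runs1):
        xa, ya = sorted(pair)
        if xa in ortholog and ya in ortholog:
            if frozenset((ortholog[xa], ortholog[ya])) in close2:
                found.add((xa, ya))
    return found


# Hypothetical genomes and similarity scores.
g1 = [("a1", 0, 900, "+"), ("b1", 1000, 1900, "+"), ("c1", 5000, 5900, "+")]
g2 = [("a2", 100, 1000, "-"), ("b2", 1100, 2000, "-"), ("c2", 9000, 9900, "+")]
sim = {("a1", "a2"): 200, ("a1", "b2"): 50, ("b1", "b2"): 180, ("c1", "c2"): 150}
bbh = bidirectional_best_hits(sim)
print(close_bbh_pairs(runs(g1), runs(g2), bbh))  # {('a1', 'b1')}
```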

Predicting Interactions
For each pair of close orthologs (occurring in at least one pair of genomes), calculate a score.
The score should increase with the phylogenetic distance between the two genomes, since closely related organisms are more likely to have similar genes nearby due to chance alone.
The existence of a triplet (P1, P2, P3) should be stronger evidence than the existence of two pairs (P1, P2 and P1, P3).
Triplet distance can be estimated as the minimum distance between any pair of organisms (in addition to the pair score).

Reconstructing Pathways
Example: purine metabolism.
The method can identify associations between unknown proteins and known pathways!

Projection of Gene Neighborhood
The composition of operons is evolutionarily variable: a particular set of functionally related genes does not always form an operon.
For a single organism, the applicability of gene-neighborhood-based interaction prediction is limited.
With multiple organisms, it is possible to statistically strengthen conclusions and project findings onto other organisms.
If an operon with functionally related genes exists in several genomes, a functional association can be predicted for other organisms, even if the corresponding genes are scattered.
Variability turns out to be an advantage for prediction.

Gene Neighborhood - Limitations
The method is only directly applicable to bacteria (and archaea), because the relevance of gene order does not necessarily extend to eukaryotes.
For closely related species, conserved gene order might just be due to lack of time for genome rearrangements, whereas we are interested in selective constraints that preserve gene order.
Compared species should therefore be distant enough, but not too distant, because we need a sufficient number of orthologs to derive statistically meaningful results.

Gene Fusion
Domain fusion events: two protein domains that act as independent proteins (components) in one organism may form (part of) a single polypeptide chain (composite) in another organism.
Most proteins involved in domain fusion events are known to be subunits of multiprotein complexes (76% in the E. coli metabolic network).

Gene Fusion Based PPI Prediction
A pair of proteins in the query genome is a candidate interacting pair if:
they show (local) sequence similarity to the same protein (the Rosetta stone) in a reference genome, and
they do not show sequence similarity with each other.
This requires complete genomes!
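The two conditions can be sketched as a small filter, assuming the sequence searches have already been run. Here `hits` maps each query protein to the reference proteins it matches locally, and `similar_pairs` holds query pairs that are similar to each other; both inputs and all identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rosetta_stone_pairs(hits, similar_pairs):
    """Return candidate pairs mapped to the reference proteins linking them."""
    by_reference = defaultdict(set)
    for query, references in hits.items():
        for ref in references:
            by_reference[ref].add(query)

    candidates = {}
    for ref, queries in by_reference.items():
        for a, b in combinations(sorted(queries), 2):
            # Condition 2: the two query proteins must not resemble each other.
            if frozenset((a, b)) not in similar_pairs:
                candidates.setdefault((a, b), set()).add(ref)
    return candidates


# Hypothetical data: q1, q2, q3 all match reference protein R1,
# but q1 and q3 are similar to each other (e.g. paralogs).
hits = {"q1": {"R1"}, "q2": {"R1"}, "q3": {"R1"}, "q4": {"R2"}}
similar_pairs = {frozenset(("q1", "q3"))}
print(rosetta_stone_pairs(hits, similar_pairs))
```

A fuller version would probably also check that the two query proteins align to different regions of the reference protein; that check is left out here to stay close to the two conditions stated on the slide.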

Predicted Interactions
[Figure: predicted interactions, marking known physical interactions and proteins in the same pathway]

Gene Fusion Based Prediction - Results
Interactions predicted based on gene fusion events.
Distance on the circle shows distance on the genome.

Co-evolution of Interacting Proteins
Selective pressure is likely to act on common function.
Interacting proteins are expected either to be conserved together, along with their interactions, or not to be conserved at all.
Hypothesis I: orthologs of interacting proteins also interact in other species (supported by evidence, but there are subtleties, which we will discuss later).
Hypothesis II: if two proteins interact, they will show similar conservation patterns (phylogenetic profiles).

Phylogenetic Profiles

Correlation of Phylogenetic Profiles
Assume we have N genomes; protein X has homologs in x of them, Y has homologs in y of them, and they co-occur in z genomes.
Measures of profile similarity: Hamming distance, Pearson correlation, mutual information, statistical significance.
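The formulas for these measures did not survive in the transcript. The sketch below uses standard textbook forms expressed in N, x, y and z, which may differ in detail from what the slide showed: Hamming distance x + y - 2z, the Pearson correlation of two binary vectors, mutual information from the 2x2 co-occurrence table, and a hypergeometric tail probability for seeing at least z co-occurrences by chance. Function names are illustrative.

```python
from math import comb, log2, sqrt


def profile_counts(profile_x, profile_y):
    """Derive N, x, y, z from two 0/1 presence vectors over the same genomes."""
    n = len(profile_x)
    x, y = sum(profile_x), sum(profile_y)
    z = sum(a & b for a, b in zip(profile_x, profile_y))
    return n, x, y, z


def hamming_distance(n, x, y, z):
    """Number of genomes in which exactly one of the two proteins is present."""
    return x + y - 2 * z


def pearson_correlation(n, x, y, z):
    denom = sqrt(x * (n - x) * y * (n - y))
    return (n * z - x * y) / denom if denom else 0.0


def mutual_information(n, x, y, z):
    """Mutual information (bits) of the presence/absence variables."""
    cells = {(1, 1): z, (1, 0): x - z, (0, 1): y - z, (0, 0): n - x - y + z}
    p_x = {1: x / n, 0: (n - x) / n}
    p_y = {1: y / n, 0: (n - y) / n}
    mi = 0.0
    for (a, b), count in cells.items():
        if count > 0:
            p = count / n
            mi += p * log2(p / (p_x[a] * p_y[b]))
    return mi


def cooccurrence_pvalue(n, x, y, z):
    """Hypergeometric P(co-occurrence >= z) under independence."""
    tail = sum(comb(x, k) * comb(n - x, y - k) for k in range(z, min(x, y) + 1))
    return tail / comb(n, y)


# Hypothetical profiles over 8 genomes.
n, x, y, z = profile_counts([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 1, 0, 0, 0, 0, 0])
print(hamming_distance(n, x, y, z))  # 1
print(pearson_correlation(n, x, y, z))
print(mutual_information(n, x, y, z))
print(cooccurrence_pvalue(n, x, y, z))
```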

Phylogenetic Profiles - Limitations
Many processes may be common across lineages, which leads to too many false positives.
The database of genomes may be biased, yet all organisms are treated equally. Improvement: use trees instead of profiles.
Proteins are assumed to be conserved as a whole, but it is domains that interact. Improvement: use domain profiles.
[Figure: yeast nucleolar and ribosomal proteins across organisms]

Phylogenetic Tree Based Prediction
[Figure: phylogenetic trees of Ntr-family two-component sensor histidine kinases and their corresponding regulators]

Mirror Tree Method
The method needs a sufficient number of genomes that contain homologs of both proteins.
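The slide does not spell out the computation. A common formulation of the mirror tree idea, assumed here, is to compare the two proteins' pairwise evolutionary distance matrices over the same set of species and use their correlation as the score. The matrices and function names below are made up for illustration.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0


def mirror_tree_score(dist_a, dist_b):
    """Correlation between two distance matrices indexed by the same species."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distances for protein families A and B across four species.
dist_a = [[0, 1, 2, 3], [1, 0, 2, 3], [2, 2, 0, 3], [3, 3, 3, 0]]
dist_b = [[0, 2, 4, 6], [2, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
print(mirror_tree_score(dist_a, dist_b))  # close to 1 for proportional matrices
```

With only a few shared species the vectors being correlated are very short, which is one way to see why the method needs enough genomes containing homologs of both proteins.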

Matrix Method
Start with families of proteins that are suspected to interact.
Identify specific pairs of proteins that interact by aligning the phylogenetic trees that underlie the two families.
Assumption: identical number of proteins in each family.

Correlated Mutations
Co-evolution of interacting proteins can be followed more closely by quantifying the degree of co-variation between pairs of residues from these proteins.
Correlated mutations may correspond to compensatory mutations that stabilize mutations in one protein through changes in the other.
[Figure: distribution of distances between amino acid positions on a folded protein]

In silico Two-Hybrid
The correlation of mutations between two positions (which may be on different proteins) can be estimated from a pairwise assessment of aligned multiple sequences.
Position pairs with high correlation are potential contact points.
Interaction index: for a protein pair, compute the aggregate correlation (of mutations) across all positions.
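A simplified Python sketch of this idea follows. It assumes two alignments whose rows are matched by species, scores residue similarity as identity (1 or 0) rather than with a substitution matrix, and takes the plain mean of the inter-protein correlations as the aggregate. These simplifications and the function names are assumptions made here, not the published I2H procedure.

```python
from itertools import combinations
from math import sqrt


def column_similarities(alignment, col):
    """For one column, similarity (identity) of every pair of sequences."""
    rows = range(len(alignment))
    return [1.0 if alignment[k][col] == alignment[l][col] else 0.0
            for k, l in combinations(rows, 2)]


def pearson(u, v):
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (sd_u * sd_v) if sd_u and sd_v else 0.0


def correlation_matrix(aln_a, aln_b):
    """Correlation of each position in protein A with each position in B."""
    sims_a = [column_similarities(aln_a, i) for i in range(len(aln_a[0]))]
    sims_b = [column_similarities(aln_b, j) for j in range(len(aln_b[0]))]
    return [[pearson(sa, sb) for sb in sims_b] for sa in sims_a]


def interaction_index(aln_a, aln_b):
    """Aggregate (here: mean) correlation over all inter-protein position pairs."""
    values = [v for row in correlation_matrix(aln_a, aln_b) for v in row]
    return sum(values) / len(values)


# Hypothetical alignments; row i of each comes from the same species.
aln_a = ["AK", "AK", "GR", "GR"]
aln_b = ["DL", "DL", "EL", "EL"]
print(correlation_matrix(aln_a, aln_b))
print(interaction_index(aln_a, aln_b))
```

Fully conserved columns, such as the second column of `aln_b`, carry no co-variation signal and are scored as zero in this sketch.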

In silico Two-Hybrid

Performance of I2H
I2H predicts physical, rather than functional, association.
It requires complete genomes and a sufficient number of homologs.

Co-citation Based PPI Prediction
Functionally associated proteins are likely to be cited in the same research article.
We can assess the statistical significance of co-citation with a hypergeometric model.
Algorithmic problem: how to recognize and match protein names? Train the algorithm on annotated abstracts using conditional random fields (CRF).
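A minimal sketch of the hypergeometric test, assuming protein names have already been recognized so that each abstract is represented by the set of proteins it mentions. The p-value is the chance of seeing at least the observed number of co-citations if the two proteins were cited independently. The data structure and function names are illustrative.

```python
from math import comb


def cocitation_counts(abstracts, protein_a, protein_b):
    """Count abstracts citing A, citing B, and citing both."""
    n_a = sum(protein_a in names for names in abstracts.values())
    n_b = sum(protein_b in names for names in abstracts.values())
    both = sum(protein_a in names and protein_b in names
               for names in abstracts.values())
    return len(abstracts), n_a, n_b, both


def cocitation_pvalue(total, n_a, n_b, both):
    """Hypergeometric tail: P(co-citations >= both) under independence."""
    tail = sum(comb(n_a, i) * comb(total - n_a, n_b - i)
               for i in range(both, min(n_a, n_b) + 1))
    return tail / comb(total, n_b)


# Hypothetical corpus of four abstracts.
abstracts = {
    "abs1": {"P1", "P2"},
    "abs2": {"P1", "P2"},
    "abs3": {"P3"},
    "abs4": {"P4"},
}
counts = cocitation_counts(abstracts, "P1", "P2")
print(counts)                      # (4, 2, 2, 2)
print(cocitation_pvalue(*counts))  # 1/6
```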

Performance of Co-citation
Statistical significance is quite relevant until it saturates.
The method is robust to the choice of parameters for name recognition.

Integrating PPI Networks
Interaction data come from multiple sources, and different sources refer to different levels of interaction.
Can integration handle noise, making interaction data more reliable?
Superpose interactions based on their reliability.

Bayesian Integration
For each prediction method, compute a log-likelihood score.
Let P(L|E) be the number of interactions predicted by method E for which a functional association between the corresponding proteins is known, and let ~P(L|E) be the number of false positives.
Let P(L) and ~P(L) be the corresponding priors.
Assign weights to methods based on their log-likelihood scores.
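The slide does not give the formula. One common form of the log-likelihood score, assumed here, is ln[(P(L|E) / ~P(L|E)) / (P(L) / ~P(L))], so a method scores above zero only if it enriches for known associations beyond the prior odds. Summing the weights of all methods that predict a pair is a deliberately naive integration rule chosen for this sketch; the counts and method names are invented.

```python
from math import log


def log_likelihood_score(known, false_pos, prior_known, prior_unlinked):
    """ln of (odds of a known link among predictions) / (prior odds)."""
    if min(known, false_pos, prior_known, prior_unlinked) <= 0:
        raise ValueError("all counts must be positive")
    return log((known / false_pos) / (prior_known / prior_unlinked))


def integrate(predictions, weights):
    """predictions: method -> set of pairs. Returns pair -> summed weight."""
    scores = {}
    for method, pairs in predictions.items():
        for pair in pairs:
            scores[pair] = scores.get(pair, 0.0) + weights[method]
    return scores


# Hypothetical benchmark counts: (known links, false positives) per method,
# with 1000 linked and 9000 unlinked pairs in the gold standard.
benchmark = {"gene_fusion": (80, 20), "co_citation": (60, 60)}
weights = {m: log_likelihood_score(tp, fp, 1000, 9000)
           for m, (tp, fp) in benchmark.items()}

predictions = {
    "gene_fusion": {("P1", "P2")},
    "co_citation": {("P1", "P2"), ("P3", "P4")},
}
print(weights)
print(integrate(predictions, weights))
```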

Comparison of Prediction Methods
The integrated network captures functional association better.
Note that the integrated network is "trained" using available data on functional association.

Classification Based Integration
Points: proteins. Space: expression, conservation. Labels: function.
Points: protein pairs. Space: co-expression, co-evolution, etc. Labels: existence of interaction.

Performance of Domain Co-evolution

Co-Evolutionary Matrix

Domain Identification

Difference between Predicted PPIs