Protein Interaction Networks Aalt-Jan van Dijk Applied Bioinformatics, PRI, Wageningen UR & Mathematical and Statistical Methods, Biometris, Wageningen.

Protein Interaction Networks Aalt-Jan van Dijk Applied Bioinformatics, PRI, Wageningen UR & Mathematical and Statistical Methods, Biometris, Wageningen University Feb. 21, 2013

My research
Protein complex structures
– Protein-protein docking
– Correlated mutations
Interaction site prediction/analysis
– Protein-protein interactions
– Enzyme active sites
– Protein-DNA interactions
Network modelling
– Gene regulatory networks
– Flowering related

Overview
– Introduction: protein interaction networks
– Sequences & networks: predicting interaction sites
– Predicting protein interactions
– Sequence and network evolution
– Interaction network alignment

Protein Interaction Networks
Obligatory: hemoglobin
Transient: mitochondrial Cu transporters

Experimental approaches (1) Yeast two-hybrid (Y2H)

Experimental approaches (2) Affinity Purification + mass spectrometry (AP-MS)

Interaction Databases: STRING, HPRD, MINT, IntAct, BioGRID

Some numbers: known physical interactions per organism (BioGRID)
Organism           Number of known interactions
H. sapiens         113,217
S. cerevisiae      75,529
D. melanogaster    35,028
A. thaliana        13,842
M. musculus        11,616

Binding site

Binding site prediction
Applications:
– Understanding network evolution
– Understanding changes in protein function
– Predict protein interactions
– Manipulate protein interactions
Input data:
– Interaction network
– Sequences (possibly structures)

Sequence-based predictions

Sequences and networks
Goal: predict interaction sites and/or motifs
Data: interaction networks, sequences
Validation: structure data, “motif databases”

Motif search in groups of proteins (Neduva et al., PLoS Biol 2005)
Group proteins which have the same interaction partner
Use motif search on each group, e.g. find position weight matrices (PWMs)
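The grouping step can be illustrated with a minimal Python sketch. The protein names and the toy network are hypothetical, and the size cut-off of three members is an arbitrary assumption; the motif search itself (e.g. fitting PWMs) is not shown.

```python
from collections import defaultdict

def group_by_partner(interactions):
    """Map each protein to the set of proteins that interact with it."""
    groups = defaultdict(set)
    for a, b in interactions:
        groups[a].add(b)
        groups[b].add(a)
    return dict(groups)

# Hypothetical toy network: hub H binds three proteins, P1 also binds P2.
toy_interactions = [("H", "P1"), ("H", "P2"), ("H", "P3"), ("P1", "P2")]
partner_groups = group_by_partner(toy_interactions)

# Only groups with several members are worth passing to a motif finder
# (the cut-off of 3 is an assumption for this example).
candidate_groups = {hub: members for hub, members in partner_groups.items()
                    if len(members) >= 3}
print(candidate_groups)
```

Each value in `candidate_groups` is a set of sequences that share a partner and could be searched for a common motif.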

Correlated Motifs

Motif model Search Scoring

Predefined motifs

Correlated Motif Mining
Find motifs in one set of proteins which interact with (almost) all proteins with another motif
Motif models:
– PWM: so far not applied
– (l,d), with l = length and d = number of wildcards
Score: overrepresentation, e.g. χ²
Search:
– Interaction driven
– Motif driven
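As a rough illustration of an (l,d) motif pair and a χ² overrepresentation score, the Python sketch below writes wildcards as "." and scores a motif pair with a 2x2 contingency table over all protein pairs (covered by the motif pair or not, interacting or not). The sequences, interactions and the exact table layout are assumptions made for this example, not the scoring of any specific published tool.

```python
from itertools import combinations

def has_motif(sequence, motif):
    """True if an (l,d) motif, written with '.' as wildcard, occurs in sequence."""
    l = len(motif)
    return any(
        all(m == "." or m == s for m, s in zip(motif, sequence[i:i + l]))
        for i in range(len(sequence) - l + 1)
    )

def chi_square(sequences, interactions, motif_x, motif_y):
    """Chi-square of the 2x2 table 'pair covered by motif pair' x 'pair interacts'."""
    edges = {frozenset(pair) for pair in interactions}
    with_x = {p for p, s in sequences.items() if has_motif(s, motif_x)}
    with_y = {p for p, s in sequences.items() if has_motif(s, motif_y)}
    a = b = c = d = 0
    for p, q in combinations(sorted(sequences), 2):
        covered = (p in with_x and q in with_y) or (p in with_y and q in with_x)
        interacting = frozenset((p, q)) in edges
        if covered and interacting:
            a += 1
        elif covered:
            b += 1
        elif interacting:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Hypothetical toy data: A-proteins carry L.D, B-proteins carry WWC.
toy_sequences = {
    "A1": "AALKDAA", "A2": "GGLEDGG",
    "B1": "PPWWCPP", "B2": "TTWWCTT",
    "C1": "MMMMMMM", "C2": "SSSSSSS",
}
toy_edges = [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2"), ("C1", "C2")]
score = chi_square(toy_sequences, toy_edges, "L.D", "WWC")
print(round(score, 2))
```

Here "L.D" is an (l,d) motif with l = 3 and d = 1; a higher score means the motif pair covers interacting pairs more often than expected.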

Interaction driven approaches
Mine for (quasi-)bicliques, i.e. most-versus-most interaction
Then derive a motif pair from the sequences
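"Most-versus-most" can be made concrete as the fraction of possible pairs between two protein sets that actually interact. The Python sketch below computes that density for a hypothetical toy network; the threshold of 0.8 for calling something a quasi-biclique is an assumption chosen for illustration.

```python
def biclique_density(left, right, interactions):
    """Fraction of the possible left-right pairs that actually interact."""
    edges = {frozenset(pair) for pair in interactions}
    possible = [(a, b) for a in left for b in right if a != b]
    if not possible:
        return 0.0
    present = sum(frozenset((a, b)) in edges for a, b in possible)
    return present / len(possible)

# Hypothetical toy network: one of the six possible pairs (A2-B3) is missing.
toy_edges = [("A1", "B1"), ("A1", "B2"), ("A1", "B3"), ("A2", "B1"), ("A2", "B2")]
density = biclique_density({"A1", "A2"}, {"B1", "B2", "B3"}, toy_edges)

# A full biclique has density 1.0; 0.8 is an assumed quasi-biclique cut-off.
is_quasi_biclique = density >= 0.8
print(density, is_quasi_biclique)
```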

Motif driven approaches Starting from candidate motif pairs, evaluate their support in the network (and improve them)

D-MOTIF Tan BMC Bioinformatics 2006

IMSS: application of D-MOTIF (Van Dijk et al., Bioinformatics 2008; Van Dijk et al., PLoS Comp Biol 2010)
[Figure: motif pairs between protein X and protein Y; plot of test error against number of selected motif pairs]
Experimental validation

SLIDER Boyen et al. Trans Comp Biol Bioinf 2011

SLIDER
Faster approach, enabling genome-wide search
Scoring: χ²
Search: steepest ascent
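Steepest ascent itself is a generic local search: evaluate all neighbours of the current candidate, move to the best one, and stop when nothing improves. The Python sketch below applies it to a single toy motif rather than a motif pair, with a four-letter alphabet and a simple occurrence count as the score; all of these are simplifying assumptions and not SLIDER's actual implementation.

```python
ALPHABET = "ADLM"  # toy alphabet, not the 20 amino acids

def contains(sequence, motif):
    """True if the motif ('.' is a wildcard) occurs somewhere in the sequence."""
    l = len(motif)
    return any(
        all(m == "." or m == s for m, s in zip(motif, sequence[i:i + l]))
        for i in range(len(sequence) - l + 1)
    )

def neighbours(motif):
    """All motifs that differ from `motif` in exactly one non-wildcard position."""
    result = []
    for i, letter in enumerate(motif):
        if letter == ".":
            continue
        for other in ALPHABET:
            if other != letter:
                result.append(motif[:i] + other + motif[i + 1:])
    return result

def steepest_ascent(start, neighbours, score):
    """Move to the best-scoring neighbour until no neighbour improves the score."""
    current, current_score = start, score(start)
    while True:
        candidates = [(score(n), n) for n in neighbours(current)]
        if not candidates:
            return current, current_score
        best_score, best = max(candidates, key=lambda item: item[0])
        if best_score <= current_score:
            return current, current_score
        current, current_score = best, best_score

# Hypothetical toy sequences; the score counts sequences containing the motif.
toy_sequences = ["AALKDA", "GLEDGG", "TTLQDT", "MMMMMM"]

def count_hits(motif):
    return sum(contains(s, motif) for s in toy_sequences)

best_motif, best_score = steepest_ascent("A.D", neighbours, count_hits)
print(best_motif, best_score)
```

Like any local search, this can end in a local optimum, so the result depends on the starting motif.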

Validation
Performance assessment on simulated data
Performance assessment using protein structures

Extensions of SLIDER (Boyen et al. Trans Comp Biol Bioinf 2013)
Extension I: better coverage of network
Extension II: use of more biological information

bioSLIDER (Leal Valentim et al., PLoS ONE 2012)
Example sequence: DGIFELELYLPDDYPMEAPKVRFLTKI, annotated with conservation and accessibility
Thresholds for conservation and accessibility
Extension of motif model: amino acid similarity (BLOSUM)
Using human and yeast data for training and optimizing parameters
[Figure: interaction-coverage and motif-accuracy, comparing “no conservation, no accessibility” with “conservation and accessibility”]
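One plausible way to apply such thresholds is to keep only residues that are both conserved and accessible and to mask the rest before motif search. The Python sketch below does this for the first ten residues of the example sequence; the per-residue scores, the threshold values and the masking strategy are all invented for illustration and are not taken from bioSLIDER.

```python
def eligible_positions(conservation, accessibility, min_conservation, min_accessibility):
    """Indices of residues that pass both thresholds."""
    return [
        i for i, (c, a) in enumerate(zip(conservation, accessibility))
        if c >= min_conservation and a >= min_accessibility
    ]

def mask_sequence(sequence, positions, mask="x"):
    """Replace residues outside `positions` by a mask character."""
    keep = set(positions)
    return "".join(res if i in keep else mask for i, res in enumerate(sequence))

# First ten residues of the slide's example sequence; scores are made up.
sequence = "DGIFELELYL"
conservation = [0.9, 0.2, 0.8, 0.7, 0.1, 0.95, 0.9, 0.3, 0.85, 0.6]
accessibility = [0.5, 0.9, 0.1, 0.6, 0.7, 0.4, 0.05, 0.8, 0.9, 0.3]

positions = eligible_positions(conservation, accessibility, 0.7, 0.3)
masked = mask_sequence(sequence, positions)
print(positions, masked)
```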

Application to Arabidopsis (Arabidopsis Interactome Mapping Consortium, Science 2011)
Input data: 6200 interactions, 2700 proteins
Interface predictions for 985 proteins (on average 20 residues)

Ecotype sequence data (SNPs)
SNPs tend to ‘avoid’ predicted binding sites
In 263 proteins there is a SNP in a binding site; these proteins are much more connected to each other than would be expected by chance

Summary
– Prediction of interaction sites using protein interaction networks and protein sequences
– Correlated motif approaches

Protein Interaction Prediction
Lots of genomes are being sequenced…
[Table: numbers of complete and incomplete genomes for Archaea, Bacteria, Eukarya and in total; values not preserved]
But how do we know how the proteins in there work together?!

Protein Interaction Prediction
– Interactions of orthologs: interologs
– Phylogenetic profiles
– Domain-based predictions

Orthology based prediction
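The interolog idea is that if A and B interact in one species, their orthologs A' and B' are predicted to interact in another. A minimal Python sketch under that assumption is shown below; all protein identifiers and the ortholog mapping are hypothetical.

```python
def predict_interologs(interactions, orthologs):
    """Transfer each interaction A-B to every ortholog pair A'-B'."""
    predicted = set()
    for a, b in interactions:
        for a2 in orthologs.get(a, ()):
            for b2 in orthologs.get(b, ()):
                predicted.add(frozenset((a2, b2)))
    return predicted

# Hypothetical source-species interactions and ortholog mapping.
source_interactions = [("AtA", "AtB"), ("AtB", "AtC")]
ortholog_map = {"AtA": ["OsA"], "AtB": ["OsB1", "OsB2"]}  # AtC has no ortholog

predictions = predict_interologs(source_interactions, ortholog_map)
for pair in sorted(tuple(sorted(p)) for p in predictions):
    print(pair)
```

Interactions involving a protein without a known ortholog (here AtC) yield no prediction, which limits coverage in distantly related species.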

Phylogenetic profiles A B C D

Domain Based Predictions

Duplications

Duplications and interactions
[Figure: gene duplication followed by interaction loss, with rates in Myear⁻¹; the only value preserved is 0.1 Myear⁻¹]

Duplications and interaction loss Duplicate pairs share interaction partners
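How strongly a duplicate pair still shares partners can be quantified, for example, with the Jaccard index of the two partner sets. The Python sketch below uses that measure on hypothetical data; the choice of the Jaccard index is an assumption, not necessarily the statistic behind the slide.

```python
def shared_partner_fraction(p, q, partners):
    """Jaccard index of the partner sets of p and q, ignoring p and q themselves."""
    a = partners.get(p, set()) - {p, q}
    b = partners.get(q, set()) - {p, q}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical duplicate pair D1/D2: two shared partners, one unique partner each.
toy_partners = {
    "D1": {"X", "Y", "Z", "D2"},
    "D2": {"X", "Y", "W", "D1"},
}
fraction = shared_partner_fraction("D1", "D2", toy_partners)
print(fraction)
```

Directly after duplication the value would be 1.0; interaction loss in either copy lowers it over time.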

Interaction network evolution Science 2011

Network alignment
Local network alignment: find multiple, unrelated regions of isomorphism
Global network alignment: find the best overall alignment

PATHBLAST Kelley, PNAS 2003

PATHBLAST: scoring Kelley, PNAS 2003 homology interaction
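The slide names two ingredients of the path score, homology and interaction. One way to combine them, assumed here to be a sum of log-odds terms in the spirit of the published PathBLAST description, is sketched below in Python. The probabilities and background values are invented, and the function is not the original implementation.

```python
import math

def path_score(homology_probs, interaction_probs, p_random, q_random):
    """Sum of log-odds over aligned protein pairs (homology) and aligned edges (interaction)."""
    homology = sum(math.log10(p / p_random) for p in homology_probs)
    interaction = sum(math.log10(q / q_random) for q in interaction_probs)
    return homology + interaction

# Hypothetical aligned path of length L=4: four protein pairs, three edges.
homology_probs = [0.1, 0.1, 1.0, 0.1]   # assumed probability of true homology
interaction_probs = [1.0, 0.1, 1.0]     # assumed probability that an edge is real
score = path_score(homology_probs, interaction_probs, p_random=0.01, q_random=0.1)
print(round(score, 3))
```

With this form, a path scores well only if both the sequence matches and the interactions along it are better supported than the assumed background.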

PATHBLAST: results (Kelley, PNAS 2003)
For yeast vs H. pylori, with L=4, all resulting paths with p ≤ 0.05 can be merged into just five network regions

Multiple alignment (Sharan PNAS 2005)
Scoring: probabilistic model for interaction subnetworks
Sub-networks: bottom-up search, starting with exhaustive search for L=4, followed by local search

Multiple alignment: results (Sharan PNAS 2005)
Applications include protein function prediction and interaction prediction

Global alignment (Singh PNAS 2008)
Alignment: greedy selection of matches
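Greedy selection of matches can be sketched as follows: given a score for every candidate node pair, repeatedly accept the highest-scoring pair whose nodes are both still unmatched. In the Python sketch below the scores are hypothetical; how they are computed is outside the scope of the example.

```python
def greedy_matching(scores):
    """Pick node pairs in order of decreasing score, using each node at most once."""
    used_a, used_b, matches = set(), set(), []
    for (a, b), s in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        if a not in used_a and b not in used_b:
            matches.append((a, b, s))
            used_a.add(a)
            used_b.add(b)
    return matches

# Hypothetical pairwise scores between nodes of two networks (y* and h*).
toy_scores = {
    ("y1", "h1"): 0.9,
    ("y1", "h2"): 0.8,
    ("y2", "h1"): 0.7,
    ("y2", "h2"): 0.2,
    ("y3", "h2"): 0.6,
}
alignment = greedy_matching(toy_scores)
print(alignment)
```

The greedy choice is fast but not guaranteed to maximise the total score; an optimal one-to-one assignment would need a matching algorithm instead.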

Network alignment: the future? Sharan & Ideker Nature Biotech 2006

Summary
– Interaction network evolution: mostly “comparative”, not much mechanistic
– Approaches exist to integrate and model network analysis within context of phylogeny (not discussed)
– Outlook: combine interaction site prediction with network evolution analysis

Exercises
The datafiles “arabidopsis_proteins.lis” and “interactions_arabidopsis.data” contain Arabidopsis MADS proteins (which regulate various developmental processes including flowering) and their mutual interactions, respectively.

Exercise 1
Start by getting familiar with the basic Cytoscape features described in section 1 of the tutorial (al:Introduction_to_Cytoscape).
Load the data into Cytoscape.
Visualize the network and analyze the number of interactions per protein: which proteins have a lot of interactions?

Exercise 2
Write a script that reads interaction data and implements a data structure which enables further analysis of the data (see setup on next slides). Use the datafiles “arabidopsis_proteins.lis” and “interactions_arabidopsis.data” and let the script print a table in the following format:
PROTEIN Number_of_interactions
Make a plot of those data.

# two subroutines

# input: filename
# output: list with content of file
sub read_list {
    my $infile = $_[0];
    # YOUR CODE
}

# input: protein list and interaction list
# output: hash mapping each protein to the list of its partners
sub combine_prot_int($$) {
    my ($plist, $intlist) = @_;
    # YOUR CODE
    return %inthash;
}

# reading input data
my @proteins = read_list($ARGV[0]);
my @interactions = read_list($ARGV[1]);

# obtaining hash with interactions
# YOUR CODE

# loop over all proteins and print their name and their number of interactions
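For orientation, the intended data structure (each protein mapped to the list of its partners) can be illustrated in Python on in-memory toy data. The protein names M1 to M4 are hypothetical, and the real file formats are not specified in the slides, so file reading is left out.

```python
def combine_prot_int(proteins, interactions):
    """Dictionary mapping every listed protein to the sorted list of its partners."""
    partners = {p: set() for p in proteins}
    for a, b in interactions:
        # Ignore interactions that involve proteins missing from the list.
        if a in partners and b in partners:
            partners[a].add(b)
            partners[b].add(a)
    return {p: sorted(s) for p, s in partners.items()}

# Hypothetical toy input standing in for the two data files.
toy_proteins = ["M1", "M2", "M3", "M4"]
toy_interactions = [("M1", "M2"), ("M1", "M3"), ("M2", "M3"), ("M1", "M9")]

partner_map = combine_prot_int(toy_proteins, toy_interactions)
degree_table = [(p, len(partner_map[p])) for p in toy_proteins]

# Table in the format PROTEIN Number_of_interactions.
for protein, degree in degree_table:
    print(protein, degree)
```

The same structure translates directly to a Perl hash of array references, which is what the skeleton above asks for.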

Exercise 3
In “orthology_relations.data” we have a set of predicted orthologs for the Arabidopsis proteins from exercise 1. “protein_information.data” describes, among other things, which species these proteins come from. Finally, “interactions.data” contains interactions between those proteins. Use the Arabidopsis interaction data from exercise 1 to “predict” interactions in other species using the orthology information. Compare your predictions with the real interaction data and make a plot that visualizes how good your predictions are.