
Protein Interaction Networks Aalt-Jan van Dijk Applied Bioinformatics, PRI, Wageningen UR & Mathematical and Statistical Methods, Biometris, Wageningen.


1 Protein Interaction Networks Aalt-Jan van Dijk Applied Bioinformatics, PRI, Wageningen UR & Mathematical and Statistical Methods, Biometris, Wageningen University aaltjan.vandijk@wur.nl Feb. 21, 2013

2 My research
Protein complex structures
– Protein-protein docking
– Correlated mutations
Interaction site prediction/analysis
– Protein-protein interactions
– Enzyme active sites
– Protein-DNA interactions
Network modelling
– Gene regulatory networks
– Flowering related

3 Overview
– Introduction: protein interaction networks
– Sequences & networks: predicting interaction sites
– Predicting protein interactions
– Sequence and network evolution
– Interaction network alignment

4 Protein Interaction Networks Obligatory hemoglobin

5 Protein Interaction Networks
Obligatory: hemoglobin
Transient: mitochondrial Cu transporters

6 Experimental approaches (1) Yeast two-hybrid (Y2H)

7 Experimental approaches (2) Affinity Purification + mass spectrometry (AP-MS)

8-17 Interaction Databases
STRING http://string.embl.de/
HPRD http://www.hprd.org/
MINT http://mint.bio.uniroma2.it/mint/
IntAct http://www.ebi.ac.uk/intact/
BioGRID http://thebiogrid.org/

18 Some numbers (BioGRID, physical interactions)
Organism          Number of known interactions
H. sapiens        113,217
S. cerevisiae     75,529
D. melanogaster   35,028
A. thaliana       13,842
M. musculus       11,616

19 Overview
– Introduction: protein interaction networks
– Sequences & networks: predicting interaction sites
– Predicting protein interactions
– Sequence and network evolution
– Interaction network alignment

20 Binding site


23 Binding site prediction
Applications:
– Understanding network evolution
– Understanding changes in protein function
– Predict protein interactions
– Manipulate protein interactions
Input data:
– Interaction network
– Sequences (possibly structures)

24 Sequence-based predictions


27 Sequences and networks
Goal: predict interaction sites and/or motifs
Data: interaction networks, sequences
Validation: structure data, “motif databases”

28 Motif search in groups of proteins
Group proteins that share an interaction partner
Run a motif search on each group, e.g. to find position weight matrices (PWMs)
Neduva, PLoS Biol 2005

29 Correlated Motifs

30 Motif model Search Scoring

31 Predefined motifs



37 Correlated Motif Mining
Find motifs in one set of proteins which interact with (almost) all proteins with another motif
Motif models:
– PWM (so far not applied)
– (l,d), with l = length and d = number of wildcards
Score: overrepresentation, e.g. χ²
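A minimal Python sketch of the two ingredients on this slide, under stated assumptions. An (l,d) motif is written here as a string of length l with d wildcard positions (the `x` notation is chosen for illustration), and a motif pair is scored with a 2×2 chi-square statistic over all protein pairs. The toy sequences, the function names `matches` and `chi2_motif_pair`, and the exact contingency table are hypothetical and are not taken from the cited papers.

```python
from itertools import combinations

def matches(motif, seq, wildcard="x"):
    """True if the motif occurs anywhere in seq; the wildcard matches any residue."""
    l = len(motif)
    return any(
        all(m == wildcard or m == s for m, s in zip(motif, seq[i:i + l]))
        for i in range(len(seq) - l + 1)
    )

def chi2_motif_pair(motif_a, motif_b, seqs, interactions):
    """Chi-square for: are (A-carrier, B-carrier) pairs enriched for interactions?"""
    edges = {frozenset(e) for e in interactions}
    has_a = {p for p, s in seqs.items() if matches(motif_a, s)}
    has_b = {p for p, s in seqs.items() if matches(motif_b, s)}
    # rows: pair covered by the motif pair / not covered
    # columns: pair interacts / does not interact
    table = [[0, 0], [0, 0]]
    for p, q in combinations(sorted(seqs), 2):
        covered = (p in has_a and q in has_b) or (q in has_a and p in has_b)
        interacting = frozenset((p, q)) in edges
        table[0 if covered else 1][0 if interacting else 1] += 1
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

# Toy data (hypothetical): P1 and P2 carry "KxP", P3 and P4 carry "DxW".
seqs = {"P1": "AKLPTS", "P2": "GKRPTA", "P3": "MDEWQ", "P4": "TTDQWA", "P5": "GGGGG"}
interactions = [("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P2", "P4")]
score = chi2_motif_pair("KxP", "DxW", seqs, interactions)
print(score)  # 10.0
```

In this toy network every covered pair interacts and no uncovered pair does, so the statistic equals the number of protein pairs (10).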

38 Correlated Motif Mining
Find motifs in one set of proteins which interact with (almost) all proteins with another motif
Search:
– Interaction driven
– Motif driven

39 Interaction driven approaches
Mine for (quasi-)bicliques → most-versus-most interaction
Then derive motif pair from sequences
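The "most-versus-most" idea can be illustrated with a small Python check. This is a sketch under an assumption: a pair of protein sets is called a quasi-biclique when the fraction of present cross-edges reaches a global threshold. Published quasi-biclique definitions often bound the missing edges per node instead, so `min_density` and both function names are hypothetical.

```python
def biclique_density(left, right, interactions):
    """Fraction of possible left-right pairs that are connected by an interaction."""
    edges = {frozenset(e) for e in interactions}
    possible = [(p, q) for p in left for q in right if p != q]
    if not possible:
        return 0.0
    present = sum(frozenset(pair) in edges for pair in possible)
    return present / len(possible)

def is_quasi_biclique(left, right, interactions, min_density=0.8):
    """Assumed criterion: density of cross-edges at or above a global threshold."""
    return biclique_density(left, right, interactions) >= min_density

# Toy network (hypothetical): only the B-Z edge is missing.
left, right = {"A", "B"}, {"X", "Y", "Z"}
interactions = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"), ("B", "Y")]
density = biclique_density(left, right, interactions)
print(round(density, 3), is_quasi_biclique(left, right, interactions))
```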

40 Motif driven approaches Starting from candidate motif pairs, evaluate their support in the network (and improve them)

41 D-MOTIF Tan BMC Bioinformatics 2006


43 IMSS: application of D-MOTIF
[Figure: motif pairs between protein X and protein Y; plot of test error versus number of selected motif pairs]
Van Dijk et al., Bioinformatics 2008; Van Dijk et al., PLoS Comp Biol 2010

44-46 Experimental validation
[Figure: motif pairs between protein X and protein Y; plot of test error versus number of selected motif pairs]
Van Dijk et al., Bioinformatics 2008; Van Dijk et al., PLoS Comp Biol 2010

47 SLIDER Boyen et al. Trans Comp Biol Bioinf 2011

48 SLIDER
Faster approach, enabling genome-wide search
Scoring: χ²
Search: steepest ascent
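Steepest ascent as a search strategy can be sketched generically in Python. The sketch below is not SLIDER itself: the neighbourhood (change one non-wildcard position in one motif of the pair, wildcards stay fixed) and the toy score (agreement with a hidden target pair, standing in for the χ² score) are assumptions made for illustration only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def neighbours(pair, wildcard="x"):
    """All motif pairs that differ from `pair` at exactly one non-wildcard position."""
    result = []
    for which, motif in enumerate(pair):
        for i, residue in enumerate(motif):
            if residue == wildcard:
                continue  # assumption: wildcard positions are kept fixed
            for aa in AMINO_ACIDS:
                if aa != residue:
                    changed = motif[:i] + aa + motif[i + 1:]
                    result.append((changed, pair[1]) if which == 0 else (pair[0], changed))
    return result

def steepest_ascent(start, score):
    """Move to the best-scoring neighbour until no neighbour improves the score."""
    current, current_score = start, score(start)
    while True:
        best = max(neighbours(current), key=score, default=None)
        if best is None or score(best) <= current_score:
            return current, current_score
        current, current_score = best, score(best)

# Toy score (hypothetical): number of positions agreeing with a hidden target pair.
TARGET = ("KxP", "DxW")

def toy_score(pair):
    return sum(a == b for m, t in zip(pair, TARGET) for a, b in zip(m, t))

best_pair, best_score = steepest_ascent(("AxP", "DxA"), toy_score)
print(best_pair, best_score)
```

With a real network-based score the search can stop in a local optimum, which is why such methods are typically restarted from several starting pairs.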

49 Validation
Performance assessment on simulated data
Performance assessment using protein structures

50-51 Extensions of SLIDER
Extension I: better coverage of network
Extension II: use of more biological information
Boyen et al., Trans Comp Biol Bioinf 2013


55 bioSLIDER
Example sequence annotated with conservation and accessibility: DGIFELELYLPDDYPMEAPKVRFLTKI
Thresholds for conservation and accessibility
Extension of motif model: amino acid similarity (BLOSUM)
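The thresholding step can be sketched in Python as a per-residue filter. Everything numeric here is hypothetical: the per-residue conservation and accessibility scores, the threshold values `min_cons` and `min_acc`, and the function names are assumptions for illustration, not values used by bioSLIDER.

```python
def eligible_positions(conservation, accessibility, min_cons=0.5, min_acc=0.2):
    """Indices of residues that pass both the conservation and accessibility thresholds."""
    return [
        i
        for i, (c, a) in enumerate(zip(conservation, accessibility))
        if c >= min_cons and a >= min_acc
    ]

def mask_sequence(seq, keep, placeholder="-"):
    """Replace residues outside `keep` so that motifs can only match eligible positions."""
    keep = set(keep)
    return "".join(res if i in keep else placeholder for i, res in enumerate(seq))

# Toy input (hypothetical scores for the first six residues of the slide's sequence).
seq = "DGIFEL"
conservation = [0.90, 0.20, 0.80, 0.70, 0.95, 0.60]
accessibility = [0.50, 0.60, 0.10, 0.40, 0.30, 0.05]

keep = eligible_positions(conservation, accessibility)
print(keep, mask_sequence(seq, keep))
```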

56 bioSLIDER
Using human and yeast data for training and optimizing parameters
[Figure: motif accuracy versus interaction coverage, comparing “No conservation, no accessibility” with “Conservation and accessibility”]
Leal Valentim et al., PLoS ONE 2012

57 Application to Arabidopsis
Arabidopsis Interactome Mapping Consortium, Science 2011
Input data: 6200 interactions, 2700 proteins
Interface predictions for 985 proteins (on average 20 residues)

58 Ecotype sequence data (SNPs)
SNPs tend to ‘avoid’ predicted binding sites
In 263 proteins there is a SNP in a binding site → these proteins are much more connected to each other than would be expected at random

59 Summary Prediction of interaction sites using protein interaction networks and protein sequences Correlated motif approaches

60 Overview
– Introduction: protein interaction networks
– Sequences & networks: predicting interaction sites
– Predicting protein interactions
– Sequence and network evolution
– Interaction network alignment

61-62 Protein Interaction Prediction
Lots of genomes are being sequenced… (www.genomesonline.org)
            Complete   Incomplete
ARCHAEA          182          264
BACTERIA        3767        14393
EUKARYA          183         2897
TOTAL           4132        17514
But how do we know how the proteins in there work together?!

63 Protein Interaction Prediction
– Interactions of orthologs: interologs
– Phylogenetic profiles (e.g. A 1011001, B 1011001)
– Domain-based predictions

64 Orthology based prediction
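Orthology-based (interolog) prediction can be sketched in a few lines of Python: an interaction between two proteins in a source species is transferred to every pair of their orthologs in a target species, and the predictions are then compared with known interactions. The protein identifiers, the ortholog mapping, and the function names are hypothetical toy values.

```python
def predict_interologs(interactions, orthologs):
    """Transfer each source interaction to all pairs of orthologs of its two partners."""
    predicted = set()
    for p, q in interactions:
        for p_orth in orthologs.get(p, ()):
            for q_orth in orthologs.get(q, ()):
                predicted.add(frozenset((p_orth, q_orth)))
    return predicted

def precision_recall(predicted, known):
    """Precision and recall of predicted interactions against known ones."""
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall

# Toy data (hypothetical identifiers): "ath*" in the source, "tgt*" in the target species.
source_interactions = [("ath1", "ath2"), ("ath1", "ath3")]
orthologs = {"ath1": {"tgt1"}, "ath2": {"tgt2"}, "ath3": {"tgt3"}}
known_target = {frozenset(("tgt1", "tgt2")), frozenset(("tgt2", "tgt3"))}

predicted = predict_interologs(source_interactions, orthologs)
precision, recall = precision_recall(predicted, known_target)
print(precision, recall)  # 0.5 0.5
```

Storing each interaction as a `frozenset` makes the comparison independent of the order in which the two partners are listed.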


66 Phylogenetic profiles
A  1 0 1 1 0 0 1
B  1 0 1 1 1 0 1
C  1 0 1 1 1 0 1
D  0 1 0 1 0 0 1
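A short Python sketch of how such profiles can be compared, using the four profiles from this slide. Proteins with (nearly) identical presence/absence patterns across genomes are treated as candidate functional partners. The use of Hamming distance and the cutoff `max_distance=1` are assumptions chosen for illustration.

```python
from itertools import combinations

# Presence (1) / absence (0) of each protein across seven genomes, as on the slide.
profiles = {"A": "1011001", "B": "1011101", "C": "1011101", "D": "0101001"}

def hamming(p, q):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))

def similar_pairs(profiles, max_distance=1):
    """Protein pairs whose profiles differ in at most `max_distance` genomes."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if hamming(profiles[a], profiles[b]) <= max_distance
    ]

candidates = similar_pairs(profiles)
print(candidates)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

B and C have identical profiles, A differs from each of them in one genome, and D differs from all others in at least three genomes.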

67 Domain Based Predictions


69 Overview
– Introduction: protein interaction networks
– Sequences & networks: predicting interaction sites
– Predicting protein interactions
– Sequence and network evolution
– Interaction network alignment

70 Duplications

71 Duplications and interactions Gene duplication


73 Duplications and interactions
0.1 Myear⁻¹ Gene duplication
Interaction loss 0.001 Myear⁻¹

74 Duplications and interaction loss Duplicate pairs share interaction partners
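One way to quantify how strongly a duplicate pair still shares interaction partners is a Jaccard index over their partner sets, sketched below in Python. The choice of the Jaccard index, the toy network, and the function names are assumptions for illustration and are not taken from the slides.

```python
def partners(protein, interactions):
    """Set of proteins that interact with `protein`."""
    result = set()
    for p, q in interactions:
        if p == protein:
            result.add(q)
        elif q == protein:
            result.add(p)
    return result

def shared_partner_fraction(p, q, interactions):
    """Jaccard index of the partner sets of p and q, ignoring p and q themselves."""
    a = partners(p, interactions) - {q}
    b = partners(q, interactions) - {p}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy network (hypothetical): G1 and G2 are duplicates; G1 has lost its interaction with Z.
interactions = [("G1", "X"), ("G1", "Y"), ("G2", "X"), ("G2", "Y"), ("G2", "Z")]
fraction = shared_partner_fraction("G1", "G2", interactions)
print(round(fraction, 3))
```

Directly after duplication the index would be 1.0; each interaction lost by only one of the copies lowers it.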

75 Interaction network evolution Science 2011

76 Overview
– Introduction: protein interaction networks
– Sequences & networks: predicting interaction sites
– Predicting protein interactions
– Sequence and network evolution
– Interaction network alignment

77 Network alignment
Local network alignment: find multiple, unrelated regions of isomorphism
Global network alignment: find the best overall alignment

78 PATHBLAST Kelley, PNAS 2003

79 PATHBLAST: scoring Kelley, PNAS 2003 homology interaction
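The slide names two score components, homology and interaction. A hedged Python sketch of how such a score could be combined is given below: a log-odds sum with one homology term per aligned protein pair and one interaction term per aligned edge. The simplified form, the background probabilities `p_random` and `q_random`, and all numbers are assumptions for illustration; the exact definition should be taken from Kelley, PNAS 2003.

```python
import math

def path_score(homology_probs, interaction_probs, p_random, q_random):
    """Log-odds score of an aligned path: homology terms plus interaction terms."""
    homology = sum(math.log10(p / p_random) for p in homology_probs)
    interaction = sum(math.log10(q / q_random) for q in interaction_probs)
    return homology + interaction

# Toy aligned path (hypothetical): two aligned protein pairs joined by one aligned edge.
score = path_score(
    homology_probs=[0.9, 0.8],   # assumed probability that each aligned pair is homologous
    interaction_probs=[0.5],     # assumed probability that the aligned interaction is real
    p_random=0.1,                # assumed background homology probability
    q_random=0.05,               # assumed background interaction probability
)
print(round(score, 3))
```

A positive score means the aligned path is better supported than expected under the assumed background probabilities.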

80 PATHBLAST: results Kelley, PNAS 2003

81 PATHBLAST: results
For yeast vs H. pylori, with L=4, all resulting paths with p ≤ 0.05 can be merged into just five network regions
Kelley, PNAS 2003

82 Multiple alignment
Scoring: probabilistic model for interaction subnetworks
Subnetworks: bottom-up search, starting with exhaustive search for L=4, followed by local search
Sharan, PNAS 2005

83 Multiple alignment: results Sharan PNAS 2005

84 Multiple alignment: results Applications include protein function prediction and interaction prediction Sharan PNAS 2005

85 Global alignment Singh PNAS 2008


87 Global alignment Alignment: greedy selection of matches Singh PNAS 2008
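Greedy selection of matches can be sketched in Python as follows: given a pairwise score for every candidate match between a node of network 1 and a node of network 2, repeatedly accept the highest-scoring pair whose two nodes are both still unmatched. How the scores are computed is outside this sketch; the toy scores and the function name are hypothetical.

```python
def greedy_alignment(scores):
    """Pick matches in order of decreasing score; each node is matched at most once."""
    aligned, used_1, used_2 = [], set(), set()
    for (u, v), _ in sorted(scores.items(), key=lambda item: -item[1]):
        if u not in used_1 and v not in used_2:
            aligned.append((u, v))
            used_1.add(u)
            used_2.add(v)
    return aligned

# Toy scores (hypothetical): nodes a, b in network 1; nodes x, y in network 2.
scores = {("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.7, ("b", "y"): 0.2}
alignment = greedy_alignment(scores)
print(alignment)  # [('a', 'x'), ('b', 'y')]
```

In this toy case the greedy result (total score 1.1) is not the maximum-weight matching (a-y plus b-x, total 1.5), which illustrates the trade-off between speed and optimality in a greedy strategy.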

88 Network alignment: the future? Sharan & Ideker Nature Biotech 2006

89 Summary
Interaction network evolution: mostly “comparative”, not much mechanistic
Approaches exist to integrate and model network analysis within context of phylogeny (not discussed)
Outlook: combine interaction site prediction with network evolution analysis

90 Exercises
The data files “arabidopsis_proteins.lis” and “interactions_arabidopsis.data” contain Arabidopsis MADS proteins (which regulate various developmental processes including flowering) and their mutual interactions, respectively.

91 Exercise 1
Start by getting familiar with the basic Cytoscape features described in section 1 of the tutorial http://opentutorials.cgl.ucsf.edu/index.php/Tutorial:Introduction_to_Cytoscape
Load the data into Cytoscape
Visualize the network and analyze the number of interactions per protein: which proteins have a lot of interactions?

92 Exercise 2
Write a script that reads interaction data and implements a data structure which enables further analysis of the data (see setup on next slides).
Use the data files “arabidopsis_proteins.lis” and “interactions_arabidopsis.data” and let the script print a table in the following format:
PROTEIN Number_of_interactions
Make a plot of those data

93
    # two subroutines

    # input: filename
    # output: list with content of file
    sub read_list {
        my $infile = $_[0];
        # YOUR CODE
        return @newlist;
    }

    # input: protein list and interaction list
    # output: hash mapping each protein to the list of its partners
    sub combine_prot_int($$) {
        my ($plist, $intlist) = @_;
        # YOUR CODE
        return %inthash;
    }

94
    # reading input data
    my @plist   = read_list($ARGV[0]);
    my @intlist = read_list($ARGV[1]);

    # obtaining hash with interactions
    my %inthash = combine_prot_int(\@plist, \@intlist);

    # YOUR CODE
    # loop over all proteins and print their name and their number of interactions


96 Exercise 3
In “orthology_relations.data” we have a set of predicted orthologs for the Arabidopsis proteins from exercise 1. “protein_information.data” describes, among other things, which species these proteins come from. Finally, “interactions.data” contains interactions between those proteins.
Use the Arabidopsis interaction data from exercise 1 to “predict” interactions in other species using the orthology information.
Compare your predictions with the real interaction data and make a plot that visualizes how good your predictions are.

