Complementarity of network and sequence information in homologous proteins. March 2010. Department of Computing, Imperial College London, London, UK.

1 Complementarity of network and sequence information in homologous proteins
Vesna Memišević (2), Tijana Milenković (2), and Nataša Pržulj (1)
(1) Department of Computing, Imperial College London, London, UK
(2) Department of Computer Science, University of California, Irvine, USA
International Symposium on Integrative Bioinformatics, March 2010

2 Motivation
– Genetic sequences have revolutionized our understanding of biology
– Non-sequence-based data are also important, e.g.:
  – the secondary and tertiary structure of RNA plays the dominant role in RNA function (tRNA: Gautheret et al., Comput. Appl. Biosci., 1990; rRNA: Woese et al., Microbiological Reviews, 1983)
  – a secondary-structure-based approach is more effective at finding new functional RNAs than sequence-based alignments (Webb et al., Science, 2009)
– What about patterns of interconnections in PPI networks?
  – Can they complement the knowledge learned from genomic sequence?
  – Do the wiring patterns of duplicated proteins in the PPI network give insights into their evolutionary distance?
  – Does the information about homologues captured by PPI network topology differ from that captured by their sequence?
Nataša Pržulj natasha@imperial.ac.uk

3 Background
Homologs descend from a common ancestor:
1. Paralogs: in the same species, evolve through gene duplication events
2. Orthologs: in different species, evolve through speciation events

4-11 Background
Sequence-based homology data from:
1. Clusters of Orthologous Groups (COG) [1]
   – Proteins in different genomes are sequence-compared to find the best hits (BeTs)
   – The graph of BeTs is constructed
   – Triangles in it are found
   – Triangles sharing a side are merged into the groups of orthologs and paralogs
   – No dependence on the absolute level of similarity between the compared proteins
2. KEGG Orthology System [2]
[1] Tatusov et al., BMC Bioinformatics, 4:41, 2003.
[2] Kanehisa et al., Nucleic Acids Res., 28:27–30, 2000.
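The triangle-merging step of COG described above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not the COG pipeline: it assumes the best hits are already given as symmetric protein pairs and ignores COG's requirement that a triangle span three different genomes. The function name `cog_like_groups` and the protein IDs are ours.

```python
from itertools import combinations


def cog_like_groups(best_hits):
    """Merge triangles of best hits that share a side into groups.

    best_hits: iterable of (protein_a, protein_b) pairs, read as undirected
    edges of the best-hits graph (simplifying assumption: hits are symmetric).
    Returns a list of groups, each a set of protein IDs.
    """
    # Undirected adjacency map of the best-hits graph.
    adj = {}
    for a, b in best_hits:
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    # Enumerate every triangle exactly once as a sorted triple (u < v < w).
    triangles = []
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                triangles.append((u, v, w))

    # Union-find over triangle indices.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Triangles that share an edge (a "side") are merged into one group.
    owner = {}
    for idx, (u, v, w) in enumerate(triangles):
        for edge in ((u, v), (u, w), (v, w)):
            if edge in owner:
                parent[find(idx)] = find(owner[edge])
            else:
                owner[edge] = idx

    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return list(groups.values())


hits = [("A1", "B1"), ("B1", "C1"), ("A1", "C1"), ("C1", "D1"), ("A1", "D1"), ("X1", "Y1")]
print(cog_like_groups(hits))  # one group containing A1, B1, C1 and D1
```

The two triangles A1-B1-C1 and A1-C1-D1 share the side A1-C1 and end up in one group, while the isolated hit X1-Y1 forms no triangle and is not grouped. As on the slide, only the presence of best hits matters, not how similar the sequences are in absolute terms.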

12-15 Background
Sequence-based homology data from:
1. Clusters of Orthologous Groups (COG) [1]
2. KEGG Orthology System [2]
   – Sequences are aligned
   – If the alignment score is < 10^-8, 1 is assigned as the “similarity bit”; otherwise, 0 is assigned
   – A “bit vector” is constructed for each protein, over all proteins
   – A graph is constructed with protein sequences as nodes and the correlation coefficients of the nodes' bit vectors as edges
   – Cliques found in the graph = orthology groups
   – Again, no dependence on the absolute level of similarity
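A toy Python sketch of the bit-vector and clique steps listed above. Every detail here is an assumption of ours, not the KEGG implementation: the pairwise "alignment scores" are E-value-like numbers where smaller means more similar, stored in a dict keyed by `frozenset` pairs; a protein's bit for itself is 1; two proteins are joined by an edge when the Pearson correlation of their bit vectors reaches a hypothetical cutoff `min_corr`; and maximal cliques are found with the Bron-Kerbosch algorithm.

```python
import math
from itertools import combinations


def bit_vectors(scores, proteins, threshold=1e-8):
    """Similarity bit q of protein p is 1 if score(p, q) < threshold (or q is p)."""
    return {
        p: [1 if q == p or scores.get(frozenset((p, q)), math.inf) < threshold else 0
            for q in proteins]
        for p in proteins
    }


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant vectors; treat as 0
    return cov / math.sqrt(vx * vy)


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; adj maps node -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def kegg_like_groups(scores, proteins, threshold=1e-8, min_corr=0.5):
    bits = bit_vectors(scores, proteins, threshold)
    adj = {p: set() for p in proteins}
    for p, q in combinations(proteins, 2):
        if pearson(bits[p], bits[q]) >= min_corr:
            adj[p].add(q)
            adj[q].add(p)
    # Singletons are not reported as orthology groups.
    return [c for c in maximal_cliques(adj) if len(c) > 1]


scores = {frozenset(("a", "b")): 1e-20, frozenset(("c", "d")): 1e-15, frozenset(("a", "c")): 0.5}
print(kegg_like_groups(scores, ["a", "b", "c", "d"]))  # groups {a, b} and {c, d}
```

Because edges come from correlations between whole bit vectors, two proteins are grouped when they match the same set of other proteins, which is why the grouping does not depend on the absolute level of similarity.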

16-18 Background
Sequence-based homology data from COG [1] and the KEGG Orthology System [2].
We examine yeast proteins only:
– Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”
– There are 9,643 such unique pairs
– What are their topological similarities within the PPI network?
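Extracting "all possible pairs" from the groups amounts to taking every 2-combination of the yeast members of each group and de-duplicating across groups. A minimal Python sketch; the protein IDs are made up, and the `is_yeast` filter (a "Y" prefix, as in S. cerevisiae systematic ORF names) is only an illustrative assumption.

```python
from itertools import combinations


def orthologous_pairs(groups, keep):
    """Unique unordered pairs of proteins that share at least one group.

    groups: iterable of groups (e.g. COG or KEGG groups), each an iterable of IDs.
    keep: predicate selecting the proteins of interest (here: yeast proteins).
    """
    pairs = set()
    for group in groups:
        members = sorted({p for p in group if keep(p)})
        # Sorted tuples make (a, b) and (b, a) the same pair; the set removes
        # pairs that appear in several groups or in both databases.
        pairs.update(combinations(members, 2))
    return pairs


def is_yeast(protein_id):
    return protein_id.startswith("Y")  # hypothetical ID convention


groups = [["YAL001C", "YBR002W", "HSA_1"], ["YBR002W", "YAL001C", "YCL003W"]]
print(len(orthologous_pairs(groups, is_yeast)))  # 3
```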

19 Background
Sequence-based homology data from COG [1] and the KEGG Orthology System [2].
Previous network-topology-assisted approaches:
– Network-alignment-based (ISORank)
– Yosef, Sharan & Noble, Bioinformatics, 2008 (hybrid Rankprop)
These approaches rely heavily on sequence information and use only a limited amount of network topology.

20 Our Method
We examine yeast proteins only:
– Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”
– There are 9,643 such unique pairs
– What are their topological similarities within the PPI network?
PPI networks are noisy:
– We analyze the high-confidence part of the yeast PPI network by Collins et al. [3]: 9,074 edges amongst 1,621 proteins
– We focus on proteins with degree > 3 to avoid noisy PPIs
– This leaves 175 orthologous pairs amongst 181 proteins
[3] Collins et al., Molecular and Cellular Proteomics, 6(3):439–450, 2007.
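The degree filter can be sketched as follows. The slide does not say whether both proteins of a pair must pass the filter; this sketch assumes they must, and the function names and the small example network are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degrees(ppi_edges):
    """Node degrees in an undirected PPI network, ignoring duplicates and self-loops."""
    adj = defaultdict(set)
    for u, v in ppi_edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return {p: len(nbrs) for p, nbrs in adj.items()}


def filter_by_degree(ppi_edges, pairs, min_degree=4):
    """Keep pairs whose two proteins both have degree >= min_degree (degree > 3)."""
    deg = degrees(ppi_edges)
    kept = {
        (a, b) for a, b in pairs
        if deg.get(a, 0) >= min_degree and deg.get(b, 0) >= min_degree
    }
    proteins = {p for pair in kept for p in pair}
    return kept, proteins


# A complete graph on A-E (every degree >= 4) plus a pendant protein F.
edges = list(combinations("ABCDE", 2)) + [("A", "F")]
kept, proteins = filter_by_degree(edges, {("A", "B"), ("A", "F")})
print(kept, len(proteins))  # {('A', 'B')} 2
```

Returning the protein set alongside the pairs mirrors the slide's bookkeeping of pairs versus distinct proteins (175 pairs amongst 181 proteins).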

21 Our Method
– Does PPI network topology contain homology information?
  – Are similarly wired proteins homologous?
– Does homology information obtained from network topology differ from that obtained from sequence?

22-23 Our Method
Graphlets:
– Induced
– Of any frequency
N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, no. 18, pp. 3508-3515, 2004.

24-26 Our Method
– Generalize node degree
N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pp. e177-e183, 2007.

27 Our Method
Graphlet Degree (GD) vectors, or “node signatures”
T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, vol. 4, pp. 257-273, 2008.
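To make the notion of a graphlet degree concrete, here is a minimal Python sketch that computes only the first four entries of a node's GD vector: the orbits of the 2-node and 3-node graphlets. A full GD vector has 73 entries covering graphlets with 2 to 5 nodes, which this sketch does not attempt; the orbit numbering below is our reading of Pržulj (2007) and should be checked against that paper.

```python
from itertools import combinations


def small_orbit_counts(adj, v):
    """Graphlet degrees of node v for orbits 0-3 (graphlets on 2 and 3 nodes).

    adj maps each node to the set of its neighbours.
    orbit 0: v touches an edge (this is just the degree)
    orbit 1: v is an end node of an induced 3-node path
    orbit 2: v is the middle node of an induced 3-node path
    orbit 3: v belongs to a triangle
    """
    nbrs = adj[v]
    o0 = len(nbrs)
    o3 = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Pairs of neighbours that are not adjacent to each other.
    o2 = o0 * (o0 - 1) // 2 - o3
    # Paths v-u-w where w is neither v nor a neighbour of v.
    o1 = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    return [o0, o1, o2, o3]


# Triangle A-B-C with a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
for node in sorted(adj):
    print(node, small_orbit_counts(adj, node))
# A [2, 1, 0, 1]
# B [2, 1, 0, 1]
# C [3, 0, 2, 1]
# D [1, 2, 0, 0]
```

Orbit 0 alone is the node degree; the remaining entries show how graphlet degrees generalize it by recording the position a node takes in each small subgraph.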

28-29 Our Method
Similarity measure between nodes' Graphlet Degree vectors: the Signature Similarity Measure
T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, vol. 4, pp. 257-273, 2008.
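The sketch below encodes the signature similarity as we understand it from Milenković & Pržulj (2008). Each orbit i gets a weight w_i = 1 - log(o_i)/log(73), where o_i is the number of orbits that affect orbit i. The distance for orbit i is w_i · |log(u_i + 1) - log(v_i + 1)| / log(max(u_i, v_i) + 2). D(u, v) is the sum of these distances divided by the sum of the weights, and S(u, v) = 1 - D(u, v). The o_i values are not reproduced here, so `orbit_dependencies` must be supplied by the caller. The example uses four-entry vectors and placeholder o_i values of 1 purely for illustration.

```python
import math


def signature_similarity(u, v, orbit_dependencies, n_orbits=73):
    """Signature similarity S(u, v) = 1 - D(u, v) between two GD vectors.

    orbit_dependencies[i] is o_i, the number of orbits affecting orbit i;
    orbits that depend on many others get a smaller weight w_i.
    """
    if not len(u) == len(v) == len(orbit_dependencies):
        raise ValueError("vectors and dependency counts must have the same length")
    weights = [1 - math.log(o) / math.log(n_orbits) for o in orbit_dependencies]
    total = 0.0
    for ui, vi, wi in zip(u, v, weights):
        # Log-scaled difference; the denominator keeps each term below w_i.
        total += wi * abs(math.log(ui + 1) - math.log(vi + 1)) / math.log(max(ui, vi) + 2)
    return 1 - total / sum(weights)


deps = [1, 1, 1, 1]  # placeholder o_i values, for illustration only
print(signature_similarity([3, 0, 2, 1], [3, 0, 2, 1], deps))  # 1.0
print(round(signature_similarity([3, 0, 2, 1], [1, 2, 0, 0], deps), 3))  # 0.338
```

The logarithms keep a single very large orbit count (typical of hubs) from dominating the score, and the weights reduce the influence of orbits whose counts are largely implied by other orbits.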

30 Our Method
For the 181 proteins in the 175 orthologous pairs, we find:
– Graphlet degree vectors (GDVs) in the entire PPI network
– GDV-similarities (GDS) = topological similarities
– Sequence identities, using Smith-Waterman local alignment with the BLOSUM50 substitution matrix as the scoring scheme
We compare GDV-similarity vs. sequence identity, i.e., topology vs. sequence.
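A self-contained sketch of Smith-Waterman local alignment followed by a percent-identity computation. To keep it short, simple match/mismatch scores stand in for the BLOSUM50 matrix used on the slide, the linear gap penalty and all score values are hypothetical, and identity is defined as identical columns over all aligned columns including gaps (one of several conventions). For real protein work, Biopython's `Bio.Align.PairwiseAligner` with `substitution_matrices.load("BLOSUM50")` provides BLOSUM50 scoring.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4, gap=-8):
    """Local alignment score (linear gaps) and percent identity of the aligned region."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    identical = columns = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1

    identity = 100.0 * identical / columns if columns else 0.0
    return best, identity


score, identity = smith_waterman_identity("HEAGAWGHEE", "HEAGXWGHEE")
print(score, identity)  # 41 90.0
```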

31 Results: Network Topology
Orthologous pairs often perform the same or similar function.
Does GD vector similarity (GDS) imply shared biological function?
Note: most GO annotations were obtained from sequences → similar topology ~ similar sequence ~ similar function

32-34 Results: Network Topology
– Orthologous proteins have high GD vector similarities
– GDS threshold of 85%: p-value < 0.05
– > 20% of orthologous pairs have GDS > 85%
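The slides do not say how the p-value for the 85% threshold was computed. As an illustration only, one common approach is an empirical p-value: the fraction of background scores (for example, GDS values of random protein pairs) that reach the threshold. The sketch computes that, together with the share of orthologous pairs above the threshold. The function names, the background distribution, and all scores are hypothetical.

```python
def fraction_above(scores, threshold):
    """Share of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def empirical_p_value(threshold, background):
    """(k + 1) / (n + 1), where k of the n background scores reach the threshold.

    The +1 terms keep the estimate away from zero for finite samples.
    """
    k = sum(s >= threshold for s in background)
    return (k + 1) / (len(background) + 1)


orthologous_gds = [0.90, 0.86, 0.50, 0.70, 0.95]  # toy values
background_gds = [0.10] * 95 + [0.90] * 5         # toy random-pair values
print(fraction_above(orthologous_gds, 0.85))      # 0.6
print(empirical_p_value(0.85, background_gds))
```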

35-37 Results: Network Topology – Robustness
PPI networks are noisy: random edge additions, deletions and rewirings in the PPI network
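A minimal sketch of the three kinds of random perturbation named above, applied to an undirected edge list. The function and its parameters are hypothetical. The rewiring here (delete k edges, then add k random non-edges) does not preserve node degrees; the slides do not specify which rewiring scheme was used.

```python
import random


def perturb(edges, fraction, mode, rng=None):
    """Return a randomly perturbed copy of an undirected edge list.

    mode: "add" inserts new random edges, "delete" removes random edges,
    "rewire" removes random edges and then adds the same number of new ones.
    fraction: number of edges to change, as a share of the original edge count.
    """
    rng = rng or random.Random()
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = sorted({n for e in edge_set for n in e})
    k = round(fraction * len(edge_set))

    if mode in ("delete", "rewire"):
        # Sort before sampling so a seeded rng gives reproducible results.
        for e in rng.sample(sorted(edge_set, key=sorted), k):
            edge_set.remove(e)

    if mode in ("add", "rewire"):
        max_edges = len(nodes) * (len(nodes) - 1) // 2
        if len(edge_set) + k > max_edges:
            raise ValueError("not enough node pairs left to add edges")
        added = 0
        while added < k:
            e = frozenset(rng.sample(nodes, 2))
            if e not in edge_set:
                edge_set.add(e)
                added += 1
    elif mode != "delete":
        raise ValueError(f"unknown mode: {mode}")

    return sorted(tuple(sorted(e)) for e in edge_set)


ring = [(i, (i + 1) % 10) for i in range(10)]
rng = random.Random(0)
for mode in ("add", "delete", "rewire"):
    print(mode, len(perturb(ring, 0.3, mode, rng)))
# add 13
# delete 7
# rewire 10
```

Recomputing GD vectors and GDS on such perturbed copies at increasing noise levels is one way to check whether high topological similarity between orthologs survives noise.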

38-41 Results: Sequence
Sequence identities for the 175 orthologous pairs:
– ~70% of orthologous pairs have sequence identity < 35%
– ~20% of orthologous pairs have sequence identity > 90%
– The “twilight zone” for homology is 20-35% sequence identity
→ No dependence on absolute similarity in COG & KEGG (e.g., COG relies on triangles in the graph of best matches)

42 Results: Comparison
– ~20% of orthologous pairs have signature similarities above 85% (35 pairs)
– ~30% of orthologous pairs have sequence identities above 35% (53 pairs)
– Overlap: 22 pairs (~60% of the smaller set)
→ Sequence and network topology provide somewhat complementary slices of homology information
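The comparison percentages follow from simple set arithmetic. In this sketch, the two sets are placeholder integers chosen only so that their sizes and overlap match the counts reported on the slide (35, 53 and 22 of 175 pairs); they are not the actual protein pairs.

```python
def overlap_summary(topology_pairs, sequence_pairs, total):
    """Summarise how two sets of pairs (high GDS vs. high sequence identity) overlap."""
    shared = topology_pairs & sequence_pairs
    smaller = min(len(topology_pairs), len(sequence_pairs))
    return {
        "topology_fraction": len(topology_pairs) / total,
        "sequence_fraction": len(sequence_pairs) / total,
        "overlap": len(shared),
        "overlap_of_smaller": len(shared) / smaller if smaller else 0.0,
    }


high_gds = set(range(0, 35))        # placeholder IDs: 35 pairs with GDS > 85%
high_identity = set(range(13, 66))  # placeholder IDs: 53 pairs with identity > 35%
summary = overlap_summary(high_gds, high_identity, total=175)
print({k: round(v, 2) for k, v in summary.items()})
# {'topology_fraction': 0.2, 'sequence_fraction': 0.3, 'overlap': 22, 'overlap_of_smaller': 0.63}
```

The 22 shared pairs make up about 63% of the smaller (GDS) set, so roughly a third of the topologically similar orthologous pairs would be missed by a 35% sequence-identity cutoff.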

43-44 Results: Examples
59 of the yeast ribosomal proteins have retained two genomic copies.
Are duplicated proteins functionally redundant? No: they have different genetic requirements for their assembly and localization, so they are functionally distinct.
Also note: the average sequence identity of structurally similar proteins is ~8-10%.
Two pairs with identical sequence:
– Pair 1: 100% sequence identity, 50% signature similarity, degrees 25 and 5
– Pair 2: 100% sequence identity, 65% signature similarity, degrees 54 and 9

45 Conclusions
– Homology information captured by PPI network topology differs from that captured by sequence
– They are complementary sources for identifying homologs
Future work: could topological similarity be used to identify orthologs from best-hits graph analysis, as is done for sequences?

46 Acknowledgements
This project was supported by the NSF CAREER IIS-0644424 grant.
