341: Introduction to Bioinformatics Dr. Nataša Pržulj Department of Computing Imperial College London Winter 2011.

Topics
- Introduction to biology (cell, DNA, RNA, genes, proteins)
- Sequencing and genomics (sequencing technology, sequence alignment algorithms)
- Functional genomics and microarray analysis (array technology, statistics, clustering and classification)
- Introduction to biological networks
- Introduction to graph theory
- Network properties
  - Global: network/node centralities
  - Local: network motifs and graphlets
- Network models
- Network/node clustering
- Network comparison/alignment
- Software tools for network analysis
- Interplay between topology and biology

Network properties: summary of last class
Network comparisons: comparing large networks is computationally hard due to the NP-completeness of the underlying subgraph isomorphism problem: given two graphs G and H as input, determine whether G contains a subgraph that is isomorphic to H. Thus, network comparisons rely on easily computable heuristics (approximate solutions), called “network properties”. Network properties can roughly and historically be divided into two categories:
1. Global network properties: give an overall view of the network, but might not be detailed enough to capture complex topological characteristics of large networks.
2. Local network properties: more detailed network descriptors, which usually encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary.
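To see why exact comparison does not scale, here is a deliberately naive Python sketch of the subgraph isomorphism test described above (graphs stored as dicts of neighbour sets, an assumption of this example). It tries every injective mapping of H's k nodes into G's n nodes, which is n!/(n-k)! candidates in the worst case; practical solvers search far more cleverly, but the problem stays NP-complete.

```python
from itertools import permutations


def contains_subgraph(g_adj, h_adj):
    """Return True if G has a subgraph (not necessarily induced) isomorphic to H.

    Graphs are dicts mapping each node to the set of its neighbours. Every
    injective mapping of H's nodes into G's nodes is tried, so the running
    time explodes as the graphs grow.
    """
    h_nodes = list(h_adj)
    h_edges = [(u, v) for u in h_adj for v in h_adj[u]]  # each edge listed twice; harmless
    for image in permutations(g_adj, len(h_nodes)):
        phi = dict(zip(h_nodes, image))  # candidate mapping H -> G
        if all(phi[v] in g_adj[phi[u]] for u, v in h_edges):
            return True
    return False


square = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}

print(contains_subgraph(square, triangle))  # False: a 4-cycle has no triangle
print(contains_subgraph(square, path))      # True: e.g. 0-1-2
```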

Network properties: summary of last class
1. Global Network Properties
Readings: Chapter 3 of “Analysis of Biological Networks” by Junker and Schreiber.
Global network properties:
1) Degree distribution
2) Average clustering coefficient
3) Clustering spectrum
4) Average diameter
5) Spectrum of shortest path lengths
6) Centralities
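A minimal Python sketch of several of these global properties, assuming an undirected, unweighted graph stored as a dict of neighbour sets. Conventions differ between tools; here, as an assumption, nodes of degree below 2 get clustering coefficient 0, and the “average diameter” is taken as the mean shortest-path length over all connected pairs of distinct nodes.

```python
from collections import Counter, deque


def degree_distribution(adj):
    """Fraction of nodes having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}


def clustering(adj, v):
    """Local clustering coefficient: fraction of pairs of v's neighbours that are linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention assumed here for degree-0 and degree-1 nodes
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering(adj, v) for v in adj) / len(adj)


def clustering_spectrum(adj):
    """Average clustering coefficient of the nodes of each degree k."""
    by_degree = {}
    for v in adj:
        by_degree.setdefault(len(adj[v]), []).append(clustering(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}


def average_shortest_path(adj):
    """Mean BFS distance over all connected ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(g))    # {1: 0.25, 2: 0.5, 3: 0.25}
print(average_clustering(g))     # (1 + 1 + 1/3 + 0) / 4 = 7/12
print(clustering_spectrum(g))
print(average_shortest_path(g))  # 8 / 6 = 4/3
```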

Network properties: summary of last class
2. Local Network Properties
Readings: Chapter 5 of “Analysis of Biological Networks” by Junker and Schreiber.
1) Network motifs
2) Graphlets
Two network comparison measures based on graphlets:
2.1) Relative Graphlet Frequency Distance between two networks
2.2) Graphlet Degree Distribution Agreement between two networks

1) Network motifs (Uri Alon’s group, ’02-’04). Also see Pajek, MAVisto, and FANMOD.

2) Graphlets
2.1) Relative graphlet frequency distance between two networks
N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18.
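The relative graphlet frequency (RGF) distance, as we understand it from the cited paper, compares the counts of the 29 graphlets on 3 to 5 nodes: with $N_i(G)$ the number of occurrences of graphlet $i$ in $G$ and $T(G) = \sum_i N_i(G)$, define $F_i(G) = -\log(N_i(G)/T(G))$ and $D(G,H) = \sum_i |F_i(G) - F_i(H)|$. This Python sketch takes precomputed counts as input (counting graphlets is the expensive part and is not shown) and, as a simplifying assumption, rejects zero counts, whose logarithm is undefined.

```python
import math


def rgf_distance(counts_g, counts_h):
    """Relative graphlet frequency distance between networks G and H.

    counts_g[i] and counts_h[i] are the numbers of occurrences of graphlet i
    in G and H, using the same graphlet ordering in both lists.
    """
    if len(counts_g) != len(counts_h):
        raise ValueError("graphlet count vectors must have the same length")
    if min(counts_g) <= 0 or min(counts_h) <= 0:
        raise ValueError("this sketch needs positive counts (log of 0 is undefined)")
    total_g, total_h = sum(counts_g), sum(counts_h)
    f_g = [-math.log(c / total_g) for c in counts_g]  # F_i(G)
    f_h = [-math.log(c / total_h) for c in counts_h]  # F_i(H)
    return sum(abs(a - b) for a, b in zip(f_g, f_h))


# Only relative frequencies matter: scaling all counts leaves D unchanged.
print(rgf_distance([10, 20, 30], [1, 2, 3]))  # 0, up to rounding
print(rgf_distance([1, 1], [1, 3]))           # about 1.0986 (= ln 3)
```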

2) Graphlets
2.2) Graphlet degree distribution agreement between two networks
N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.

2) Graphlets
2.2) Graphlet degree distribution agreement between two networks: Graphlet Degree (GD) vectors, or “node signatures”
T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4.

2) Graphlets
2.2) Graphlet degree distribution agreement between two networks: Signature Similarity Measure between nodes u and v
T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4.
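A Python sketch of the signature similarity measure, following our reading of the Milenković and Pržulj definition: for orbit $i$, $D_i(u,v) = w_i \cdot |\log(u_i+1) - \log(v_i+1)| / \log(\max(u_i,v_i)+2)$, then $D(u,v) = \sum_i D_i(u,v) / \sum_i w_i$ and $S(u,v) = 1 - D(u,v)$. The weight $w_i = 1 - \log(o_i)/\log(73)$ down-weights orbits whose counts depend on many other orbits, where $o_i \ge 1$ is the number of orbits affecting orbit $i$. The table of $o_i$ values is not reproduced here, so the caller passes the weights in.

```python
import math


def orbit_weight(o_i, num_orbits=73):
    """w_i = 1 - log(o_i) / log(73); o_i >= 1 counts the orbits affecting orbit i."""
    return 1 - math.log(o_i) / math.log(num_orbits)


def signature_similarity(u, v, weights):
    """S(u, v) = 1 - D(u, v) for two graphlet degree vectors of equal length."""
    if not (len(u) == len(v) == len(weights)):
        raise ValueError("u, v and weights must have the same length")
    weighted = 0.0
    for u_i, v_i, w_i in zip(u, v, weights):
        # log-scaled difference; the denominator keeps each term below 1
        d_i = abs(math.log(u_i + 1) - math.log(v_i + 1)) / math.log(max(u_i, v_i) + 2)
        weighted += w_i * d_i
    return 1 - weighted / sum(weights)


print(signature_similarity([3, 1, 0], [3, 1, 0], [1.0, 1.0, 1.0]))  # 1.0 (identical signatures)
print(signature_similarity([0, 0, 0], [1, 0, 0], [1.0, 1.0, 1.0]))
```

With real data, `weights` would be built as `[orbit_weight(o) for o in o_counts]`, where `o_counts` is a hypothetical list holding the 73 $o_i$ values.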

Software that implements many of these network properties and compares networks with respect to them: GraphCrunch

Another software tool: Cytoscape

Examples of signatures and signature similarities (T. Milenković and N. Pržulj, “Uncovering Biological Network Function via Graphlet Degree Signatures,” Cancer Informatics, 2008 (Highly Visible)):
[Figure: signatures of SMD1, PMA1, and YBR095C; signature similarity 40%]
[Figure: signatures of SMD1, SMB1, and RPO26; signature similarity 90%*]
*Statistically significant threshold at ~85%

Later we will see how to use this and other techniques to link network structure with biological function

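Generalizing the degree distribution of a network (N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” Bioinformatics, vol. 23, pg. e177-e183, 2007):
The degree distribution measures the number of nodes “touching” k edges, for each value of k. The graphlet degree distribution (GDD) generalizes this to each of the 73 automorphism orbits of the 2- to 5-node graphlets: $d_G^j(k)$ is the number of nodes of G touching orbit j exactly k times (orbit 0 is an end of a single edge, so $d_G^0$ is the ordinary degree distribution).
Each distribution is scaled and normalized: $S_G^j(k) = d_G^j(k)/k$ (dividing by k reduces the contribution of large counts), $T_G^j = \sum_{k \ge 1} S_G^j(k)$, and $N_G^j(k) = S_G^j(k)/T_G^j$.
The distance between networks G and H for orbit j is
$D^j(G,H) = \frac{1}{\sqrt{2}} \left( \sum_{k \ge 1} \left[N_G^j(k) - N_H^j(k)\right]^2 \right)^{1/2}$,
where dividing by $\sqrt{2}$ makes it lie between 0 and 1. The agreement for orbit j is $A^j(G,H) = 1 - D^j(G,H)$, and the arithmetic (or geometric) mean of $A^j(G,H)$ over all 73 orbits is called the Graphlet Degree Distribution (GDD) Agreement between networks G and H.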
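A Python sketch of the GDD agreement computation. Each network is given as a list of per-node graphlet degree vectors (orbit counts computed elsewhere, an assumption of this example). How to treat an orbit that never occurs in one of the networks is also an assumption here: its normalized distribution is taken to be all zeros.

```python
import math
from collections import Counter


def normalized_gdd(gdvs, j):
    """N_G^j(k) for k >= 1, from the per-node orbit count vectors of a network."""
    d = Counter(vec[j] for vec in gdvs if vec[j] > 0)  # d_G^j(k)
    s = {k: c / k for k, c in d.items()}                # S_G^j(k) = d_G^j(k) / k
    t = sum(s.values())                                 # T_G^j
    return {k: x / t for k, x in s.items()} if t > 0 else {}


def gdd_agreement(gdvs_g, gdvs_h, num_orbits=73, mean="arithmetic"):
    agreements = []
    for j in range(num_orbits):
        n_g, n_h = normalized_gdd(gdvs_g, j), normalized_gdd(gdvs_h, j)
        ks = set(n_g) | set(n_h)
        squared = sum((n_g.get(k, 0.0) - n_h.get(k, 0.0)) ** 2 for k in ks)
        distance = math.sqrt(squared) / math.sqrt(2)  # D^j(G, H), in [0, 1]
        agreements.append(1 - distance)               # A^j(G, H)
    if mean == "geometric":
        return math.prod(agreements) ** (1 / len(agreements))
    return sum(agreements) / len(agreements)


# Toy check using orbit 0 only (node degrees): a triangle vs. a 3-node path.
triangle = [[2], [2], [2]]
path = [[1], [2], [1]]
print(gdd_agreement(triangle, path, num_orbits=1))  # about 0.2
```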

Topics
- Introduction to biology (cell, DNA, RNA, genes, proteins)
- Sequencing and genomics (sequencing technology, sequence alignment algorithms)
- Functional genomics and microarray analysis (array technology, statistics, clustering and classification)
- Introduction to biological networks
- Introduction to graph theory
- Network properties
  - Network/node centralities
  - Network motifs
- Network models
- Network/node clustering
- Network comparison/alignment
- Software tools for network analysis
- Interplay between topology and biology

What is a network (graph) model?

Does the model network fit the data? Use network properties:
- Local
- Global
Why?
- “Hardness” of graph-theoretic problems, e.g., NP-completeness of subgraph isomorphism: we cannot exactly compare/align networks, so we use heuristics (approximate solutions)
- Exact comparison is inappropriate in biology anyway, due to biological variation
- Noise: models must be revised as data sets evolve

Why model networks?
- Understand the laws that shape networks, which enables reproduction and predictions
- Network models have already been used in biological applications:
  - Network motifs (Shen-Orr et al., Nature Genetics 2002; Milo et al., Science 2002)
  - De-noising of PPI network data (Kuchaiev et al., PLoS Comp. Biology, 2009)
  - Guiding biological experiments (Lappe and Holm, Nature Biotechnology, 2004)
  - Development of algorithms that are computationally easy on PPI networks for problems that are computationally intensive on graphs in general (Przulj et al., Bioinformatics, 2006)

Network models
We will cover the following network models:
I. Erdos–Renyi random graphs
II. Generalized random graphs (with the same degree distribution as the data networks)
III. Small-world networks
IV. Scale-free networks
V. Hierarchical model
VI. Geometric random graphs
VII. Stickiness index-based network model

Erdos–Renyi random graphs (ER)
Model a data network G(V,E) with |V| = n and |E| = m. An ER graph that models G is constructed as follows:
- It has n nodes
- Edges are added between pairs of nodes uniformly at random, each with the same probability p
Two (asymptotically equivalent) methods for constructing ER graphs:
- $G_{n,p}$: pick $p = m / \binom{n}{2}$ so that the resulting model network has m edges on average
- $G_{n,m}$: pick m pairs of nodes uniformly at random and add an edge between each pair (i.e., with probability 1)
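Both constructions as a minimal Python sketch (graphs as dicts of neighbour sets; `er_model_of` is a hypothetical helper name for fitting $G_{n,p}$ to a data network with n nodes and m edges). The $G_{n,m}$ version lists all $\binom{n}{2}$ node pairs, which is fine for small n but wasteful for large sparse networks.

```python
import itertools
import random


def er_gnp(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 node pairs becomes an edge with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def er_gnm(n, m, seed=None):
    """G(n, m): exactly m distinct node pairs, chosen uniformly at random."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in rng.sample(list(itertools.combinations(range(n), 2)), m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def er_model_of(n, m, seed=None):
    """G(n, p) model of a data network with n nodes and m edges: p = m / C(n, 2)."""
    p = m / (n * (n - 1) / 2)
    return er_gnp(n, p, seed)


g = er_gnm(50, 100, seed=1)
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # 100 edges
```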

Erdos–Renyi random graphs (ER)
The expected number of edges, |E| = m, in $G_{n,p}$ is $m = p\binom{n}{2} = p\,\frac{n(n-1)}{2}$.
The average degree is $z = \frac{2m}{n} = p(n-1)$.

Erdos–Renyi random graphs (ER)
Many properties of ER graphs can be proven theoretically (see Bollobas, “Random Graphs,” 2002). Example: at m = n/2 (average degree 1), a giant component suddenly emerges. For m above n/2:
- One connected component of the network has O(n) nodes
- The next largest connected component has O(log(n)) nodes
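The giant-component transition is easy to observe numerically. This Python sketch samples $G_{n,m}$ graphs (by rejection sampling, which is adequate when m is much smaller than $\binom{n}{2}$) and measures component sizes with a union-find structure. Below m = n/2 the largest component should stay small; above it, one component should contain most of the nodes. Exact sizes depend on the random seed.

```python
import random
from collections import Counter


def component_sizes(n, edges):
    """Sizes of the connected components of a graph on nodes 0..n-1, largest first."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
    return sorted(Counter(find(v) for v in range(n)).values(), reverse=True)


def random_edges(n, m, seed=None):
    """m distinct node pairs chosen uniformly at random, i.e. a G(n, m) graph."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return list(edges)


n = 20_000
for m in (n // 4, n // 2, 2 * n):
    sizes = component_sizes(n, random_edges(n, m, seed=42))
    second = sizes[1] if len(sizes) > 1 else 0
    print(f"m = {m}: largest component {sizes[0]}, second largest {second}")
```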

Erdos–Renyi random graphs (ER)
The degree distribution is binomial: $P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$.
For large n, this can be approximated by the Poisson distribution $P(k) \approx \frac{z^k e^{-z}}{k!}$, where $z = p(n-1)$ is the average degree.
However, currently available biological networks have power-law degree distributions.
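The quality of the Poisson approximation can be checked directly. In this Python sketch, n and p are arbitrary illustrative values chosen so that the average degree $z = p(n-1)$ is about 5.

```python
import math


def binomial_pk(n, p, k):
    """Exact ER degree distribution: P(k) = C(n-1, k) p^k (1-p)^(n-1-k)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pk(z, k):
    """Poisson approximation: P(k) ~ z^k e^(-z) / k!, with z the average degree."""
    return z**k * math.exp(-z) / math.factorial(k)


n, p = 10_000, 0.0005  # illustrative values only
z = p * (n - 1)        # average degree, about 5
for k in range(0, 11, 2):
    print(k, f"{binomial_pk(n, p, k):.5f}", f"{poisson_pk(z, k):.5f}")
```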

Erdos–Renyi random graphs (ER)
The clustering coefficient, C, of ER graphs is low (for low p): in expectation C = p, since the probability p of connecting any two nodes in an ER graph is the same regardless of whether the two nodes have a common neighbor.
However, biological networks have high clustering coefficients.

Erdos–Renyi random graphs (ER)
The average diameter of ER graphs is small: it is approximately $\frac{\ln n}{\ln z}$, where z is the average degree.
Biological networks also have small average diameters.

Generalized random graphs (ER-DD)
Preserve the degree distribution of the data (“ER-DD”). Constructed as follows:
- An ER-DD network has n nodes (as does the data network)
- Edges are added between pairs of nodes using the “stubs method”

Generalized random graphs (ER-DD)
The “stubs method” for constructing ER-DD graphs:
- Each node in the model network is assigned a number of “stubs” (to be filled by edges) according to the degree distribution of the real network being modeled
- Edges are created between pairs of nodes with “available” stubs, picked at random
- After an edge is created, the number of stubs left available at each of its end nodes is decreased by one
- Multiple edges between the same pair of nodes are not allowed
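A minimal Python sketch of the stubs method. The slide does not say what to do when the procedure gets stuck (for example, when the only stubs left belong to one node or to nodes that are already linked). This version, as an assumption, rejects self-loops and multiple edges and simply stops after many consecutive rejections, so a few stubs may remain unfilled and the degree sequence is matched only approximately.

```python
import random


def stubs_model(degrees, max_failures=1000, seed=None):
    """Random graph whose degree sequence approximates `degrees`.

    degrees[v] is the number of stubs given to node v. Self-loops and
    multiple edges are rejected; after max_failures consecutive rejections
    the remaining stubs are left unfilled.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(len(degrees))}
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    failures = 0
    while len(stubs) >= 2 and failures < max_failures:
        i, j = rng.sample(range(len(stubs)), 2)  # two distinct available stubs
        u, v = stubs[i], stubs[j]
        if u == v or v in adj[u]:
            failures += 1
            continue
        failures = 0
        adj[u].add(v)
        adj[v].add(u)
        # Remove both used stubs; the larger index goes first so the smaller stays valid.
        for idx in sorted((i, j), reverse=True):
            stubs[idx] = stubs[-1]
            stubs.pop()
    return adj


print(stubs_model([2, 2, 2], seed=0))  # the only simple graph with these degrees: a triangle
```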

Generalized random graphs (ER-DD): summary
Two global network properties are matched by ER-DD. How about local network properties (graphlet frequencies)?
- Low-density graphlets are over-represented in ER and ER-DD
- However, the data have many dense graphlets, since they have high clustering coefficients

Small-world networks (SW)
Watts and Strogatz, 1998. Created from regular ring lattices by randomly rewiring a small percentage of their edges.

Small-world networks (SW): summary
SW networks have:
- High clustering coefficients, introduced by the “ring regularity”
- Small average diameters: the large average diameters of regular lattices are reduced by randomly rewiring a small percentage of edges
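A Python sketch of the Watts-Strogatz construction: build a ring lattice in which every node is linked to its k nearest neighbours on each side, then rewire each lattice edge with probability beta to a new endpoint chosen uniformly at random, avoiding self-loops and duplicate edges. The parameter names are our own. Measuring clustering and average shortest-path length of the result for small beta should show the combination of properties listed above.

```python
import random


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def watts_strogatz(n, k, beta, seed=None):
    """Rewire each lattice edge (v, v + step) with probability beta.

    The rewired edge keeps endpoint v and gets a new endpoint that creates
    neither a self-loop nor a duplicate edge, so the edge count n * k is kept.
    """
    if n <= 2 * k:
        raise ValueError("need n > 2k")
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for step in range(1, k + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < beta:
                candidates = [x for x in range(n) if x != v and x not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


g = watts_strogatz(200, 3, 0.1, seed=2)
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # 600 edges, as in the lattice
```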

Scale-free networks (SF)
Power-law degree distributions: $P(k) \propto k^{-\gamma}$, where $\gamma > 0$; typically $2 < \gamma < 3$

Scale-free networks (SF)
Different models exist, e.g.:
- Preferential Attachment Model (SF-BA) (Barabasi-Albert, 1999)
- Gene Duplication and Mutation Model (SF-GD) (Vazquez et al., 2003)

Scale-free networks (SF): Preferential Attachment Model (SF-BA)
- “Growth” model: nodes are added to an existing network
- New nodes preferentially attach to existing nodes, with probability proportional to the degrees of the existing nodes
- This is repeated until the size of the SF network matches the size of the data
- “The rich get richer”
- The starting network strongly influences the properties of the resulting network (F. Hormozdiari et al., PLoS Computational Biology, 3(7):e118, July)
- SF-BA is particularly effective at describing the Internet
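A Python sketch of preferential attachment in the Barabasi-Albert style: each new node attaches to m distinct existing nodes, chosen with probability proportional to their current degree. The starting network (here a clique on m + 1 nodes) is an assumption of this example; as noted above, that choice influences the result. Keeping a list in which every node appears once per incident edge makes uniform sampling from that list degree-proportional.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m existing nodes."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Starting network (an assumption): a clique on nodes 0..m.
    adj = {v: {w for w in range(m + 1) if w != v} for v in range(m + 1)}
    # Each node appears once per incident edge, so a uniform pick is degree-proportional.
    targets = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        adj[new] = set()
        for v in chosen:
            adj[new].add(v)
            adj[v].add(new)
            targets.extend((new, v))
    return adj


g = preferential_attachment(200, 2, seed=3)
print(max(len(nbrs) for nbrs in g.values()))  # a few early nodes become hubs
```

The resulting edge count is fixed by the construction: $m(m+1)/2$ edges in the starting clique plus m for every added node.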

Scale-free networks (SF): Gene Duplication and Mutation Model (SF-GD)
- Biologically motivated
- Attempts to mimic gene duplication and mutation processes
- At each time step, a node is added to the network as follows:

Scale-free networks (SF) Summary

Hierarchical model
Preserves network “modularity” via a fractal-like generation of the network

Hierarchical model These graphs do not match any biological data and are highly unlikely to be found in data sets

Geometric random graphs
- “Uniform” geometric random graphs (GEO): N. Przulj lab
- Geometric gene duplication and mutation model (GEO-GD): N. Przulj et al., PSB 2010

Geometric random graphs: “uniform” geometric random graphs (GEO)
- Take any metric space and place nodes within it uniformly at random
- Connect two nodes if they are within distance r of each other (the distance is calculated with any chosen distance norm for the space)
- Choose r so that the size (number of edges) of the GEO network matches that of the data
- There are many possible metric spaces (e.g., Euclidean space)
- There are many possible distance norms (e.g., the Euclidean distance, the Chessboard distance, and the Manhattan/Taxi Driver distance)
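A Python sketch of a uniform GEO graph in the unit cube, assuming the model is fitted to a data network with n nodes and m edges. Linking all pairs within distance r is equivalent to linking the m closest pairs, with r equal to the m-th smallest pairwise distance, so here r is obtained by sorting rather than by search. The distance norm is a parameter (`math.dist`, i.e. Euclidean, by default, with Manhattan and chessboard alternatives); the default dimension of 3 is an arbitrary choice for this example.

```python
import itertools
import math
import random


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chessboard(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def geometric_random_graph(n, m, dim=3, dist=math.dist, seed=None):
    """Uniform geometric random graph with n nodes and m edges.

    Points are placed uniformly at random in [0, 1]^dim; the m closest pairs
    are linked, i.e. every pair within distance r of each other, where r is
    the m-th smallest pairwise distance (ties have probability zero).
    """
    if not 1 <= m <= n * (n - 1) // 2:
        raise ValueError("m must be between 1 and n(n-1)/2")
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    pair_dist = {(u, v): dist(points[u], points[v])
                 for u, v in itertools.combinations(range(n), 2)}
    closest = sorted(pair_dist, key=pair_dist.get)[:m]
    adj = {v: set() for v in range(n)}
    for u, v in closest:
        adj[u].add(v)
        adj[v].add(u)
    return adj, pair_dist[closest[-1]]  # graph and the radius r it implies


adj, r = geometric_random_graph(60, 150, seed=5)
print(sum(len(nbrs) for nbrs in adj.values()) // 2, round(r, 3))  # 150 edges and the radius r
```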

Geometric random graphs “Uniform” geometric random graphs (GEO) Summary

Geometric random graphs: geometric gene duplication and mutation model (GEO-GD)
- Gene duplications and mutations can be used to guide the growth process in geometric graphs

Geometric random graphs: geometric gene duplication and mutation model (GEO-GD)
- This variant also reproduces the graphlet properties of the empirical data set
- Also, GEO-GD networks have power-law degree distributions

Stickiness index-based network model (N. Przulj and D. Higham, Journal of the Royal Society Interface, vol. 3, num. 10)
Based on the stickiness index:
- A number based on a protein’s normalized degree in a PPI network
- Used to summarize the abundance and popularity of a protein’s binding domains
Assumption: a high-degree protein has many binding domains (however, remember “date” vs. “party” hubs).
A pair of proteins is more likely to interact under this model if both proteins have high stickiness indices: the probability of an edge between two nodes is the product of their stickiness indices.
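A Python sketch of the model, assuming the normalization $\theta_i = k_i / \sqrt{\sum_j k_j}$ for the stickiness index of a node with degree $k_i$ (our reading of the cited paper). With it, the expected degree of node i is $\theta_i \sum_{j \ne i} \theta_j = k_i (\sum_j k_j - k_i) / \sum_j k_j \approx k_i$, which is how the model reproduces the data's degrees in expectation. Capping edge probabilities at 1 is a further assumption, needed whenever $k_i k_j > \sum_j k_j$.

```python
import math
import random


def stickiness_indices(degrees):
    """theta_i = k_i / sqrt(sum_j k_j)  (normalization assumed, see text)."""
    total = sum(degrees)
    if total == 0:
        return [0.0] * len(degrees)
    root = math.sqrt(total)
    return [k / root for k in degrees]


def sticky_network(degrees, seed=None):
    """Link each pair (u, v) independently with probability theta_u * theta_v (capped at 1)."""
    rng = random.Random(seed)
    theta = stickiness_indices(degrees)
    n = len(degrees)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < min(1.0, theta[u] * theta[v]):
                adj[u].add(v)
                adj[v].add(u)
    return adj


# 200 nodes of degree 4 in the data (400 edges): about 398 edges expected in the model.
g = sticky_network([4] * 200, seed=11)
print(sum(len(nbrs) for nbrs in g.values()) // 2)
```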

Stickiness index-based network model: summary
“Sticky networks” have the expected degree distribution of the data. They also mimic the clustering coefficients and diameters of real-world networks well.

Software that implements many of these network models and evaluates their fit to data networks with respect to a variety of network properties: GraphCrunch (other such tools also exist).
