Comparative Network Analysis BMI/CS 776 Spring 2013 Colin Dewey

Slides:



Advertisements
Similar presentations
Network analysis Sushmita Roy BMI/CS 576
Advertisements

The multi-layered organization of information in living systems
CSE Fall. Summary Goal: infer models of transcriptional regulation with annotated molecular interaction graphs The attributes in the model.
VL Netzwerke, WS 2007/08 Edda Klipp 1 Max Planck Institute Molecular Genetics Humboldt University Berlin Theoretical Biophysics Networks in Metabolism.
Biological networks: Types and sources Protein-protein interactions, Protein complexes, and network properties.
Biological networks: Types and sources Protein-protein interactions, Protein complexes, and network properties.
Weighted networks: analysis, modeling A. Barrat, LPT, Université Paris-Sud, France M. Barthélemy (CEA, France) R. Pastor-Satorras (Barcelona, Spain) A.
1 Evolution of Networks Notes from Lectures of J.Mendes CNR, Pisa, Italy, December 2007 Eva Jaho Advanced Networking Research Group National and Kapodistrian.
Complex Networks Third Lecture TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AA TexPoint fonts used in EMF. Read the.
University at BuffaloThe State University of New York Young-Rae Cho Department of Computer Science and Engineering State University of New York at Buffalo.
Scale Free Networks Robin Coope April Abert-László Barabási, Linked (Perseus, Cambridge, 2002). Réka Albert and AL Barabási,Statistical Mechanics.
Proteome Network Evolution by Gene Duplication S. Cenk Şahinalp Simon Fraser University.
Biological Networks Feng Luo.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Network Statistics Gesine Reinert. Yeast protein interactions.
Regulatory networks 10/29/07. Definition of a module Module here has broader meanings than before. A functional module is a discrete entity whose function.
6. Lecture SS 20005Cell Simulations1 V6: the “interactome” - Protein-protein interaction data is noisy and incomplete - V5: use Bayesian networks to combine.
Global topological properties of biological networks.
Advanced Topics in Data Mining Special focus: Social Networks.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Biological networks: Types and origin
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Computer Science 1 Web as a graph Anna Karpovsky.
Comparative Expression Moran Yassour +=. Goal Build a multi-species gene-coexpression network Find functions of unknown genes Discover how the genes.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
341: Introduction to Bioinformatics Dr. Natasa Przulj Deaprtment of Computing Imperial College London
The Relative Vertex-to-Vertex Clustering Value 1 A New Criterion for the Fast Detection of Functional Modules in Protein Interaction Networks Zina Mohamed.
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
(Social) Networks Analysis III Prof. Dr. Daning Hu Department of Informatics University of Zurich Oct 16th, 2012.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
Biological Networks Lectures 6-7 : February 02, 2010 Graph Algorithms Review Global Network Properties Local Network Properties 1.
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
School of Information University of Michigan SI 614 Network subgraphs (motifs) Biological networks Lecture 11 Instructor: Lada Adamic.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Network Clustering Experimental network mapping Graph theory and terminology Scale-free architecture Integrating with gene essentiality Robustness Lecturer:
Part 1: Biological Networks 1.Protein-protein interaction networks 2.Regulatory networks 3.Expression networks 4.Metabolic networks 5.… more biological.
Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.
Introduction to Bioinformatics Biological Networks Department of Computing Imperial College London March 18, 2010 Lecture hour 18 Nataša Pržulj
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu.
Lecture 10: Network models CS 765: Complex Networks Slides are modified from Networks: Theory and Application by Lada Adamic.
Network Evolution Statistics of Networks Comparing Networks Networks in Cellular Biology A. Metabolic Pathways B. Regulatory Networks C. Signaling Pathways.
Genome Biology and Biotechnology The next frontier: Systems biology Prof. M. Zabeau Department of Plant Systems Biology Flanders Interuniversity Institute.
LECTURE 2 1.Complex Network Models 2.Properties of Protein-Protein Interaction Networks.
Introduction to biological molecular networks
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
Bioinformatics Center Institute for Chemical Research Kyoto University
Network resilience.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
Robustness, clustering & evolutionary conservation Stefan Wuchty Center of Network Research Department of Physics University of Notre Dame title.
1 Lesson 12 Networks / Systems Biology. 2 Systems biology  Not only understanding components! 1.System structures: the network of gene interactions and.
Network Analysis Goal: to turn a list of genes/proteins/metabolites into a network to capture insights about the biological system 1.Types of high-throughput.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Structures of Networks
CSCI2950-C Lecture 12 Networks
Bioinformatics 3 V6 – Biological Networks are Scale- free, aren't they? Fri, Nov 2, 2012.
Biological networks CS 5263 Bioinformatics.
Peer-to-Peer and Social Networks Fall 2017
CSCI2950-C Lecture 13 Network Motifs; Network Integration
Peer-to-Peer and Social Networks
Bioinformatics 3 V6 – Biological PPI Networks - are they really scale-free? - network growth - functional annotation in the network Fri, Nov 8, 2013.
Modelling Structure and Function in Complex Networks
Spectral methods for Global Network Alignment
SEG5010 Presentation Zhou Lanjun.
Presentation transcript:

Comparative Network Analysis BMI/CS Spring 2013 Colin Dewey

Protein-protein Interaction Networks Yeast protein interactions from yeast two-hybrid experiments Largest cluster in network contains 78% of proteins lethal non-lethal slow growth unknown Knock-out phenotype (Jeong et al., 2001, Nature)

Overview Experimental techniques for determining networks Properties of biological networks Comparative network tasks

Experimental techniques Yeast two-hybrid system Protein-protein interactions Microarrays/RNA-Seq Expression patterns of mRNAs Similar patterns imply involvement in same regulatory or signaling network Knock-out studies Identify genes required for synthesis of certain molecules

Yeast two-hybrid system (Stephens & Banting, 2000, Traffic)

Microarrays (Eisen et al., 1998, PNAS) genes

Knock-out studies Rich mediaHis - media Growth? Yeast with one gene deleted

Topological properties of networks Degree: number of edges in/out of a node Average degree Degree distribution: P(k), fraction of nodes with degree k Clustering coefficient: measure of grouping in graph Path length: shortest path between two nodes Average path length

Clustering coefficient Interesting to look at C(k): average clustering coefficient of nodes with degree k

Erdös & Rényi random graphs Erdös & Rényi (1960): On the evolution of random graphs Construction Start with N vertices, zero edges Add each possible edge with probability p Expected number of edges: pN(N - 1)/2 Expected degree: p(N-1)

Properties of ER graphs Degree of nodes ~ Poisson distribution Most nodes have degree close to average degree Average path length ~ log n Clustering coefficient does not depend on degree k (Barabási & Oltvai, 2004)

Scale-free networks Barabási & Albert (1999): Emergence of scaling in random networks Random construction: Start with a few connected nodes Add nodes one at a time Add m edges between new node and previous nodes For each edge, probability of being incident to node i is degree of node j

Properties of scale- free networks Degrees: P(k) ~ k -γ (power law distribution) Most nodes have very small degree A few nodes (hubs) with large degree Average path length ~ log log n In general, flat C(k) Properties depend on value of γ (Barabási & Oltvai, 2004)

Hierarchical network Recursive generation Scale-free Clustering coefficient dependent on degree: C(k) ~ k -1 (Barabási & Oltvai, 2004)

Classifying networks Metabolic networks scale-free PPI networks scale-free Regulatory networks mixed out-degree of transcription factors is scale- free in-degree of regulated genes is exponential

Small-world networks Small-world networks are graphs with small average path length ER graphs are small-world: log n average path length Scale-free graphs often very small: log log n (for some values of ϒ ) However, biological networks are both small-world and disassortative: hubs are not often connected to each other

Evolving networks Growth Early nodes have more links Preferential attachment As new nodes added, more likely to be connected to already highly-connected nodes Leads to scale-free networks Gene duplication Major force in protein network evolution Highly-connected nodes more likely to have neighbors duplicate and add more edges

Network problems Network inference Given raw experimental data Infer network structure Motif finding Identify common subgraph topologies Module detection Identify subgraphs that perform same function Conserved modules Identify modules that are shared in networks of multiple species

Network motifs Problem: Find subgraph topologies that are statistically more frequent than expected Brute force approach Count all topologies of subgraphs of size m Randomize graph (retain degree distribution) and count again Output topologies that are over/under represented Feed-forward loop: over- represented in regulatory networks not very common

Network modules Modules: dense (highly-connected) subgraphs (e.g., large cliques or partially incomplete cliques) Problem: Identify the component modules of a network Difficulty: definition of module is not precise Hierarchical networks have modules at multiple scales At what scale to define modules?

Conserved modules Identify modules in multiple species that have “conserved” topology Typical approach: Use sequence alignment to identify homologous proteins and establish correspondence between networks Using correspondence, output subsets of nodes with similar topology

Conserved interactions Network comparison between species also requires sequence comparison Protein sets compared to identify orthologs Common technique: highest scoring BLAST hits used for establishing correspondences Y X B A humanyeast orthologs interologs interaction

Conserved modules Conserved module: orthologous subnetwork with significantly similar edge presence/absence Y X Z W C A B D humanyeast

Comparative network analysis Compare networks from different... interaction detection methods yeast 2-hybrid, mass spectrometry, etc. conditions heat, media, other stresses time points development, cell cycle species

Comparative tasks Integration Combine networks derived from different methods (e.g. experimental data types) Alignment Identify nodes, edges, modules common to two networks (e.g., from different species) Database query Identify subnetworks similar to query in database of networks

Network alignment graph Analogous to pairwise sequence alignment Y X Z W C A B D A,X B,Z D,W C,Y network alignment graph

Conserved module detection (Sharan & Ideker, 2006)

Real module example Note: a protein may have more than one ortholog in another network yeastwormfly Module for RNA metabolism (Sharan et al., 2005)

Basic alignment strategy Define scoring function on subnetworks high score ⇒ conserved module Use BLAST to infer orthologous proteins Identify “seeds” around each protein: small conserved subnetworks centered around the protein Grow seeds by adding proteins that increase alignment score

Scoring functions via Subnetwork modeling We wish to calculate the likelihood of a certain subnetwork U under different models Subnetwork model (M s ) Connectivity of U given by target graph H, each edge in H appearing in U with probability β (large) Null model (M n ) Each edge appears with probability according to random graph distribution (but with degree distribution fixed) (Sharan et al., 2005)

Noisy observations Typically weight edges in graph according to confidence in interaction (expressed as a probability) Let T uv : event that proteins u, v interact F uv : event that proteins u, v do not interact O uv : observations of possible interactions between proteins u and v

Subnetwork model probability Assume (for explanatory purposes) that subnetwork model is a clique:

Null model probability Given values for p uv : probability of edge (u,v) in random graph with same degrees How to get random graph if we don’t know true degree distribution? Estimate them:

Likelihood ratio Score subnetwork with (log) ratio of likelihoods under the two models Note the decomposition into sum of scores for each edge

Seed construction Finding “heavy induced subgraphs” is NP-hard (Sharan et al., 2004) Heuristic: Find high-scoring subgraph “seeds” Grow seeds greedily Seed techniques: for each node v: Find heavy subgraph of size 4 including v Find highest-scoring length 4 path with v

Randomizing graphs For statistical tests, need to keep degree distribution the same Shuffle step: Choose two edges (a,b), (c,d) in the current graph Remove those edges Add edges (a,d), (c,b) C A B D C A B D

Predictions from alignments Conserved modules of proteins enriched for certain functions often indicate shared function of other proteins Use to predict function of unannotated proteins Sharan et al., 2005: annotated 4,645 proteins with estimated accuracy of % Predict missing interactions Sharan et al., 2005: 2,609 predicted interactions in fly, 40 –52% accurate

Parallels to sequence analysis (Sharan & Ideker, 2006)