Network-based interpretation of perturbations to living systems

Slides:



Advertisements
Similar presentations
Global Mapping of the Yeast Genetic Interaction Network Tong et. al, Science, Feb 2004 Presented by Bowen Cui.
Advertisements

Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Structural Inference of Hierarchies in Networks BY Yu Shuzhi 27, Mar 2014.
ResponseNet Esti Yeger-Lotem, Laura Riva, Linhui Julie Su, Aaron D Gitler, Anil G Cashikar, Oliver D King, Pavan K Auluck, Melissa L Geddie, Julie S Valastyan,
Teresa Przytycka NIH / NLM / NCBI RECOMB 2010 Bridging the genotype and phenotype.
Linking Proteomic and Transcriptional Data through the Interactome and Epigenome Reveals a Map of Oncogene- induced Signaling Anthony Gitter Cancer Bioinformatics.
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break 14:45 – 15:15Regulatory pathways lecture 15:15 – 15:45Exercise.
Chapter 9 Graph algorithms Lec 21 Dec 1, Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
Steiner trees Algorithms and Networks. Steiner Trees2 Today Steiner trees: what and why? NP-completeness Approximation algorithms Preprocessing.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Dependency networks Sushmita Roy BMI/CS 576 Nov 26 th, 2013.
Biological Networks Lectures 6-7 : February 02, 2010 Graph Algorithms Review Global Network Properties Local Network Properties 1.
MATISSE - Modular Analysis for Topology of Interactions and Similarity SEts Igor Ulitsky and Ron Shamir Identification.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
ResponseNet revealing signaling and regulatory networks linking genetic and transcriptomic screening data CSE Fall.
Clustering What is clustering? Also called “unsupervised learning”Also called “unsupervised learning”
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
Dependency networks Sushmita Roy BMI/CS 576 Nov 25 th, 2014.
Introduction to biological molecular networks
De novo discovery of mutated driver pathways in cancer Discussion leader: Matthew Bernstein Scribe: Kun-Chieh Wang Computational Network Biology BMI 826/Computer.
The Bi-Module problem: new algorithms and applications Group meeting January 2013 David Amar.
HIT’nDRIVE: Multi-driver Gene Prioritization Based on Hitting Time Raunak Shrestha, Ermin Hodzic, Jake Yeung, Kendric Wang, Thomas Sauerwald, Phuong Dao,
Algorithms For Solving History Sensitive Cascade in Diffusion Networks Research Proposal Georgi Smilyanov, Maksim Tsikhanovich Advisor Dr Yu Zhang Trinity.
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
A randomized linear time algorithm for graph spanners Surender Baswana Postdoctoral Researcher Max Planck Institute for Computer Science Saarbruecken,
CSE280Stefano/Hossein Project: Primer design for cancer genomics.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Network applications Sushmita Roy BMI/CS 576 Dec 9 th, 2014.
6/11/20161 Graph models and efficient exact algorithms in studying cancer signaling pathways Songjian Lu, Lujia Chen, Chunhui Cai Department of Biomedical.
::Network Optimization:: Minimum Spanning Trees and Clustering Taufik Djatna, Dr.Eng. 1.
Simultaneous identification of causal genes and dys-regulated pathways in complex diseases Yoo-Ah Kim, Stefan Wuchty and Teresa M Przytycka Paper to be.
Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.
High-throughput genomic profiling of tumor-infiltrating leukocytes
Graph clustering to detect network modules
Spectral clustering of graphs
Non negative matrix factorization for Global Network Alignment
Cohesive Subgraph Computation over Large Graphs
Journal club Jun , Zhen.
Hierarchical Agglomerative Clustering on graphs
CSCI2950-C Lecture 12 Networks
Spectral methods for Global Network Alignment
1. SELECTION OF THE KEY GENE SET 2. BIOLOGICAL NETWORK SELECTION
Multi-task learning approaches to modeling context-specific networks
Probabilistic models for interpreting perturbations in networks: Nested effect models Sushmita Roy Computational Network Biology.
Dynamics and context-specificity in biological networks
Dept of Biomedical Informatics University of Pittsburgh
Identifying Signaling Pathways
Hierarchical clustering approaches for high-throughput data
1 Department of Engineering, 2 Department of Mathematics,
Hidden Markov Models Part 2: Algorithms
1 Department of Engineering, 2 Department of Mathematics,
1 Department of Engineering, 2 Department of Mathematics,
Schedule for the Afternoon
Algorithms for Budget-Constrained Survivable Topology Design
Spectral methods for Global Network Alignment
SEG5010 Presentation Zhou Lanjun.
Solving the Minimum Labeling Spanning Tree Problem
Principle of Epistasis Analysis
Lecture 14 Shortest Path (cont’d) Minimum Spanning Tree
Lecture 13 Shortest Path (cont’d) Minimum Spanning Tree
Network-Based Coverage of Mutational Profiles Reveals Cancer Genes
Survey on Coverage Problems in Wireless Sensor Networks
Presentation transcript:

Network-based interpretation of perturbations to living systems Sushmita Roy sroy@biostat.wisc.edu Computational Network Biology Biostatistics & Medical Informatics 826 Computer Sciences 838 https://compnetbiocourse.discovery.wisc.edu Dec 1st, 6th 2016

Review of topics we have seen Integrative network reconstruction Bayesian networks, Dependency networks, Module NEtworks Examining dynamics and context-specificity of networks Dynamic Bayesian networks, Active subgraphs, Input-output HMMs Finding modules on graphs Graph clustering Comparison of graphs Graph alignment and clustering Networks for prioritizing genes Random walks and diffusion kernels Networks for interpreting perturbations Diffusion kernels, factor graphs, information flow on graphs

Perturbations in networks Understanding genetic perturbations are important in biology Genetic perturbations are useful to identify the function of genes What happens if knock gene A down? Measure some morphological phenotype like growth rate or cell size Measure global expression signatures Perturbations can be artificial or natural Artificial perturbations Deletion strains Natural perturbations Single nucleotide polymorphisms Natural genetic variation Perturbations in a network can affect Nodes or edges Edge perturbations Mutations on binding sites

Types of algorithms used to examine perturbations in networks Graph diffusion followed by subnetwork finding methods HOTNET NETBAG Information flow-based methods (also widely used for integrating different types of data) Prize collecting steiner tree Min cost max flow Probabilistic graphical model-based methods Factor graphs Nested Effect Models (NEMs)

Identification of subnetworks perturbed in diseases Cho D-Y, Kim Y-A, Przytycka TM (2012) Chapter 5: Network Biology Approach to Complex Diseases. PLoS Comput Biol 8(12): e1002820. doi:10.1371/journal.pcbi.1002820 http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002820

Motivation of HOTNET Somatic mutations play a major role in cancer Mutation profiles of cancers is very heterogeneous Tumors harbor on average approximately 80 somatic mutations, but two tumors rarely have the same complement of mutations Thousands of genes can be mutated in cancer This makes it difficult to identify “driver” mutations vs “passenger” mutations Mutations at the pathway level (group of genes) is likely responsible for a particular type of cancer Can we identify subnetworks (representing new pathways) that are significantly mutated? F. Vandin, E. Upfal, and B. J. Raphael, "Algorithms for detecting significantly mutated pathways in cancer." Journal of computational biology 2011

HOTNET problem setup Given A network of protein-protein interactions A set of patient tumour mutation profiles Do Find “significantly” mutated subgraphs A subgraph that best connects these genetic alterations Best means a subgraph that includes as many tumour samples with as few genes How? Find the global influence of mutations on a particular gene Search for a subnetwork in this global influence graph that is significantly mutated

HOTNET’s approach vs ActiveSubgraphs HOTNET aims to find a subnetwork that has significantly many more mutations than random subnetworks This bears some resemblance to the ActiveSubgraph approach where we were trying to find subnetworks significantly up or down-regulated The key differences in HOTNET is that we do not have (gene expression) measurement of mutations for all genes only a small number of genes maybe mutated

Key steps of HOTNET algorithm Build an influence graph which specifies the influence of one node over another Graph diffusion Builds a network with the tested genes as well as their neighborhood Find significantly mutated subnetworks (two ways) include genes mutated in a lot of samples Enhanced influence model where the influence edges are weighted by the number of mutations Test for the significance of the number of subnetworks of a particular size

Diffusion kernel used in HOTNET HOTNET uses a specific type of kernel called the heat diffusion kernel L denote the graph Laplacian Let γ denote the constant rate at which heat is lost at any node Lγ is L+γI Let s be a source node The influence of s on all n nodes at time t is denoted as Influence of s on node 1

Diffusion kernel used in HOTNET Rate at which diffusion occurs from a source node on the graph is given by where bs be a unit vector which is 1 for node s and 0 otherwise The influence on all nodes at steady-state is given by This diffusion process is very similar to what we had in the Diffusion kernel we used for ranking genes

Graph diffusion to downplay hub intermediate nodes Mutations in a linear chain are more “interesting” than in a star graph

Computing the influence between vertex pairs Assume we have two vertices u and v Let i(u,v) be the influence from u to v w(u,v) is the influence between u and v and is min[i(u,v),i(v,u)] The influence graph is thus an n X n weighted graph Further prune by removing edges with weight w(u,v)<δ

Key steps of HOTNET algorithm Build an influence graph which specifies the influence of one node over another Graph diffusion Builds a network with the tested genes as well as their neighborhood Find significantly mutated subnetworks (two ways) Set cover to include genes mutated in many samples Enhanced influence model where the influence edges are weighted by the number of mutations Test for the significance of the number of subnetworks of a particular size

HOTNET’s maximal connected cover approach Given A weighted influence network G=(V,E) T be a subset of V, comprising genes with a mutation : the set of samples with mutation in gene vi Do: Find a subnetwork over gene subset C that covers a maximal number of mutated samples Formally, we want to find a C={v1,.. vk} such the following is maximized:

Heuristic algorithm to find a maximal connected cover Finding the maximal connected cover set is computationally difficult A heuristic algorithm is used Add a vertex v such it is connected to the current vertex set via a node u, and maximizes the ratio of the number of new samples covered to the number of nodes between u and v

Heuristic algorithm to find a maximal connected cover Exploration Pv(u)\Pcv is the numerator and specifies what is not covered. Lv(u) is the set of nodes on the path to u from v. lv(u)\Cv is the fewest additional nodes that need to be added. Keep adding a neighbor that has the maximal coverage with fewest additional vertices

Enhanced influence model The enhanced influence model was a more computationally efficient approach Enhance the influence measure between genes by the number of mutations observed in each of these Specifically, let vj and vk be two genes with a mutation Remove edges with influence <δ Decompose the associated enhanced influence graph into connected components Enhanced influence Number of samples with mutation in vj

Statistical analysis for determining significance of subnetworks Null distribution of subnetworks Assign mutations uniformly at random Shuffle gene labels in mutation data (preserve mutation frequencies) Assess significance of the total number of subnetworks with s or more genes Multiple testing correction

Application of HOTNET to cancer dataset 453 mutations in 601 genes in 91 GBM samples 1,013 mutations in 623 genes in 189 samples of lung adenocarcinoma Protein-protein interaction network with 18796 genes and 37,107 edges

HOTNET recovers pathways relevant to cancer

Application of HOTNET of pan-cancer mutation analysis More recently, an updated version of HOTNET (HOTNET2) was applied to mutation profiles of samples from 12 different cancers Dataset description After data pre-processing there were 3,110 samples with mutations in 11,565 genes Genes mutation frequency varied a lot: 1-1,291 samples M. D. M. Leiserson, F. Vandin, H.-T. Wu, J. R. Dobson, J. V. Eldridge, J. L. Thomas, A. Papoutsaki, Y. Kim, B. Niu, M. McLellan, M. S. Lawrence, A. Gonzalez-Perez, D. Tamborero, Y. Cheng, G. A. Ryslik, N. Lopez-Bigas, G. Getz, L. Ding, and B. J. Raphael, "Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes," Nature Genetics, vol. 47, no. 2, pp. 106-114, Dec. 2014. [Online].

HOTNET2 for pan-cancer mutation analysis Very hot genes Leiserson et al . 2014, Nature Genetics

HOTNET2 for pan-cancer mutation analysis Leiserson et al . 2014, Nature Genetics

Overview of HOTNET2 results Which cancer types? Do mutations co-occur in the same pathway or is one type of mutation exclusive of the other in the same pathway Mutations were found in expected and new pathways: PI3K, TP53, cohesin

SWI/SNF complex pathways identified by HOTNET2 Number of samples

HOTNET summary An algorithm to find significantly mutated subnetworks Based on creating an “influence graph”, followed by identification of “interesting” subnetworks Strengths Conceptually simple and helped identify important known and novel pathways Non-local, less sensitive to network hubs Weakness Uses undirected, unweighted graphs Not clear how to integrate other types of measurements Relies on known interaction networks

RECAP: Types of algorithms used to examine perturbations in networks Graph diffusion followed by subnetwork finding methods HOTNET NETBAG Information flow-based methods (also widely used for integrating different types of data) Min cost max flow Prize collecting steiner tree Probabilistic graphical model-based methods Factor graphs Nested Effect Models (NEMs)

Problem definition of information flow Given two node sets and a weighted directed network with edge weights corresponding to the flow between two nodes Do Find the subnetwork that maximizes the flow between the two node sets

Information flow between sink to source nodes Sink nodes

Information flow-based methods Used for integrating different types of data, as well as for examining perturbations and their effect Integration of different types of “omics” data Min cost max flow (ResponseNet; Yeger-Lotem et al 2008) Prize-collecting Steiner tree variants (Huang & Fraenkel 2009, OmicsIntegrator) Application to genetic perturbation data Identifying Causal Genes and Dysregulated Pathways in Complex Diseases Kim Y-A, Wuchty S, Przytycka TM (2011). PLoS Comput Biol 7(3): e1001095. doi:10.1371/journal.pcbi.1001095

Notation A flow network is defined as directed graph G=(V,E), with capacities for each edge V: vertex set E: edge set s: source node t: sink node c(u,v)>0: Capacity of edge (u,v)

Flow in a graph G A flow in G is defined by a function f that has the following properties for each edge (u,v): The value of a flow is defined as Capacity constraint Conservation of flow

Max-flow problem Given A flow network G, source s and sink t Do find a flow f with maximum value

Variation: Min cost max flow Often the question is not to maximize flow, but to find the most efficient/least expensive away of doing this In addition to the flow, there is also a cost associated with each edge For example, the cost might be inversely proportional to the edge confidence So we would try to maximize the overall flow at the smallest cost

Min cost max flow Define cost of each edge as a(u,v) Overall cost: Minimize cost subject to capacity and conservation of flow constraints This idea was used in ResponseNet tool E. Yeger-Lotem, L. Riva, L. J. J. Su, A. D. Gitler, A. G. Cashikar, O. D. King, P. K. Auluck, M. L. Geddie, J. S. Valastyan, D. R. Karger, S. Lindquist, and E. Fraenkel, "Bridging high-throughput genetic and transcriptional data reveals cellular responses to alpha-synuclein toxicity." Nature genetics, vol. 41, no. 3, pp. 316-323, Mar. 2009.

Types of algorithms used to examine perturbations in networks Graph diffusion followed by subnetwork finding methods HOTNET NETBAG Information flow-based methods (also widely used for integrating different types of data) Min cost max flow Prize collecting steiner tree Probabilistic graphical model-based methods Factor graphs Nested Effect Models (NEMs)

Alternate problem definition of information flow Given A node set A weighted network Do Find the minimal graph connecting the nodes, where minimal is defined by the graph with the lowest total weight We will use a Steiner tree approach to address this problem

Steiner tree Let’s start by defining a Steiner tree Given edge-weighted graph G={V, E, w} A subset S of V A Steiner tree is a minimal length tree connecting S, including potentially intermediate nodes This problem is NP-complete

Steiner tree examples A B A B E E F C C D S={A, B, C} S={A, B, C, D}

Prize-collecting Steiner tree objective function p(i): Define prize of node i as y(i): include a node i a(i,j): Define cost of edge (i,j) x(i,j): include an edge Constrain that subnetwork must be a tree PCST objective Solve using variety of optimization techniques E.g. integer linear programming-based method (Ljubic et al, 2006) Trade-off between cost and prize

Prize-collecting Steiner trees (PCST) connect signaling proteins to gene regulation Top: functional screen hits, bottom: mRNA response Predicts relevant nodes, paths, transcription factors Cannot directly predict transcriptome effect from perturbations; edges are not oriented Image from Tuncbag et al, 2012

PCST to phosphoproteomic and transcriptomic data to find genes relevant to glioblastoma multiforme Huang et al, 2013, Figure 2

Summary Graph diffusion methods Define a graph kernel, do a random walk, find a connected subnetwork can identify significant, functional groups from input data Max flow or prize-collecting Steiner trees Find a subnetwork that connects different gene sets useful for connecting two layers of omics data All assist interpretation by showing connections and help prioritize new genes

Web and software resources GeneMANIA (Network integration and diffusion-based subnetworks) http://www.genemania.org HOTNET (Diffusion-based subnetworks) http://compbio.cs.brown.edu/projects/hotnet2/ ResponseNet (flow network) http://netbio.med.ad.bgu.ac.il/respnet/ OmicsIntegrator (PCST) http://fraenkel-nsf.csbi.mit.edu/omicsintegrator/