Elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments Discussion leader: Navneet Scribe: James Computational.

Elucidating regulatory mechanisms downstream of a signaling pathway using informative experiments
Discussion leader: Navneet
Scribe: James
Computational Network Biology BMI 826/Computer Sciences
By Ewa Szczurek, Irit Gat-Viks, Jerzy Tiuryn and Martin Vingron
Molecular Systems Biology, 2009

Problem Overview

Environmental stimuli trigger signaling cascades, which regulate transcription. Suppose we qualitatively understand a signaling pathway. How can we reconstruct the underlying regulatory mechanisms? We must run targeted experiments. Experimental design problem: find the best set of experiments to run.

Detour: Let’s Play Hangman

Hangman
Some word is chosen from a dictionary: least, setal, slate, stale, steal, stela, taels, tales, teals, tesla (anagrams of aestl, from the Scrabble dictionary).
Guess a letter to identify its position in the chosen word. Strategy?

Strategy
Guessing a tells you its location in the word. We can partition the words based on this location:
{taels, tales} {least, slate, stale, teals} {setal, steal} {stela, tesla}
Guessing t gives us a different partition:
{taels, tales, teals, tesla} {stale, steal, stela} {setal} {slate} {least}
Which is better, a or t?

Strategy
Guessing t gives us a partition with more classes, and fewer words per class, than guessing a; i.e., t distinguishes the words better than a. This is captured in the notion of entropy.
a: {taels, tales} {least, slate, stale, teals} {setal, steal} {stela, tesla}
vs.
t: {taels, tales, teals, tesla} {stale, steal, stela} {setal} {slate} {least}
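The comparison between a and t can be checked with a few lines of Python. This is a minimal sketch for the Hangman detour only; the function names `partition_by_letter` and `entropy` are illustrative, and base-2 logarithms are an arbitrary choice.

```python
from collections import defaultdict
from math import log2

WORDS = ["least", "setal", "slate", "stale", "steal",
         "stela", "taels", "tales", "teals", "tesla"]

def partition_by_letter(words, letter):
    """Group words by the position of `letter` (each word contains it once)."""
    blocks = defaultdict(list)
    for word in words:
        blocks[word.index(letter)].append(word)
    return dict(blocks)

def entropy(blocks, total):
    """Shannon entropy (in bits) of a partition given as position -> words."""
    return -sum(len(b) / total * log2(len(b) / total) for b in blocks.values())

for letter in "at":
    blocks = partition_by_letter(WORDS, letter)
    print(letter, len(blocks), "blocks, entropy", round(entropy(blocks, len(WORDS)), 3))
```

Guessing t yields five blocks against four for a, and the entropy of the t partition is the larger of the two, which matches the intuition on the slide.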

MEED Model Expansion Experiment Design

Inputs
1. A logical model of the signaling pathway
2. A set of candidate experiments
3. A set of regulation functions
[Note: No high-throughput TF-DNA binding data required.]

Signaling Pathway Model
Stimulator variables
– Environmental signals
– States are different external stimulations
Regulator variables
– Signaling molecules controlling transcription
– States: activated (+1), neutral (0), deactivated (-1)

Signaling Pathway Model
Regulation function
– Determines the state of a variable as a function of its upstream effectors' states
Structure
– Topology of the signaling network
– May be cyclic
– Stimulators have in-degree 0

Experiment Definition
Stimulation
– States of all the stimulator variables
– i.e., all environmental signals applied in the experiment
Perturbed variables
– Model variables (regulators) subject to perturbation
[Note: At most one perturbed variable is allowed in MEED.]
Perturbation states
– Fixed state of perturbed variables in the experiment
– e.g., knockout (-1) or over-activation (+1)

Predicted Model State
Logical Model + Expt. Definition → Predicted Model State
An assignment of states to all the model variables.
Acyclic models have a unique predicted model state, but cyclic models may have none or multiple model states.
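For the acyclic case, a predicted model state can be obtained by evaluating variables in topological order and overriding the perturbed variable. The sketch below is an illustration under assumptions, not the authors' implementation: the three-variable network (S, A, B), the data layout, and the function name `predict_state` are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def predict_state(parents, functions, stimulation, perturbation=None):
    """Propagate states through an acyclic logical model.

    parents:      regulator -> list of upstream variables (hypothetical layout)
    functions:    regulator -> function of its parents' states, returning -1, 0 or +1
    stimulation:  stimulator -> state fixed by the experiment
    perturbation: optional {regulator: fixed state}, e.g. a knockout
    """
    perturbation = perturbation or {}
    state = dict(stimulation)
    # static_order() yields every variable after all of its parents.
    for var in TopologicalSorter(parents).static_order():
        if var in state:
            continue  # stimulators are set by the experiment
        if var in perturbation:
            state[var] = perturbation[var]
        else:
            state[var] = functions[var](*(state[p] for p in parents[var]))
    return state

# Toy model (assumed): stimulator S activates A, and A inhibits B.
parents = {"A": ["S"], "B": ["A"]}
functions = {"A": lambda s: s, "B": lambda a: -a}

wild_type = predict_state(parents, functions, {"S": 1})
knockout_a = predict_state(parents, functions, {"S": 1}, {"A": -1})
print(wild_type)
print(knockout_a)
```

A cyclic model would need a fixed-point search instead, which is where zero or several consistent states can arise.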

Regulation Functions
Map states of regulators to the state of the regulatory target
[Note: MEED only considers single-regulator functions.]
There are 27 (= 3^3) different possible regulation functions. Not all of them are biologically relevant.
Regulatory Program
– Set of regulators and corresponding regulation functions
– These specify “who” regulates and “how”, respectively.
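The count of 27 follows from choosing one of three output states for each of the three input states. A short enumeration makes this concrete; the lookup-table representation and the two named examples (`activation`, `inhibition`) are illustrative assumptions and are not claimed to be among the biologically relevant functions used by MEED.

```python
from itertools import product

STATES = (-1, 0, 1)

# Each single-regulator function is a lookup table: regulator state -> target state.
all_functions = [dict(zip(STATES, outputs))
                 for outputs in product(STATES, repeat=len(STATES))]
print(len(all_functions))

# Two illustrative members of the set (names are assumptions).
activation = {-1: -1, 0: 0, 1: 1}   # target follows the regulator
inhibition = {-1: 1, 0: 0, 1: -1}   # target opposes the regulator
```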

Regulation Functions

Predicted Response
Regulatory Program + Predicted Model State → Predicted Response (upregulated, neutral, or downregulated)

Predicted Profile
Candidate Expts + Predicted Response → Predicted Profiles
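One way to read the two diagrams above: a regulatory program is a (regulator, function) pair, its predicted response in one experiment is the function applied to the regulator's predicted state, and its predicted profile is the tuple of responses over all candidate experiments. The sketch below assumes that reading; the model states, program names, and lookup tables are made up for illustration.

```python
# Hypothetical lookup tables for two single-regulator functions.
ACTIVATION = {-1: -1, 0: 0, 1: 1}
INHIBITION = {-1: 1, 0: 0, 1: -1}

def predicted_profile(program, model_states):
    """Predicted response of one program in each experiment.

    program:      (regulator name, lookup table)
    model_states: one predicted model state (dict) per candidate experiment
    """
    regulator, function = program
    return tuple(function[state[regulator]] for state in model_states)

# Assumed predicted model states for three candidate experiments.
model_states = [
    {"A": 1, "B": -1},
    {"A": -1, "B": 1},
    {"A": 0, "B": 0},
]
programs = {
    "A activates": ("A", ACTIVATION),
    "A inhibits": ("A", INHIBITION),
    "B activates": ("B", ACTIVATION),
}
profiles = {name: predicted_profile(p, model_states) for name, p in programs.items()}
for name, profile in profiles.items():
    print(name, profile)
```

In this toy setting "A inhibits" and "B activates" receive identical profiles, so these three experiments cannot tell them apart.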

Distinguishing Regulatory Programs
Recall how t distinguishes words in the Hangman example:
{taels, tales, teals, tesla} {stale, steal, stela} {setal} {slate} {least}
Similarly, an experiment or a set of experiments may distinguish regulatory programs. A set of experiments distinguishes two regulatory programs if their predicted profiles are different.

Distinguishing Regulatory Programs
e.g., e_2 distinguishes the regulatory programs f_2(A) and f_3(A).
e.g., e_2 does not distinguish f_1(A) and f_2(A), but e_6 does.
e.g., the set of experiments e_2, e_3 and e_6 does not distinguish f_1(B) and f_3(B).
MEED tries to choose a minimal set of experiments from the candidate set that distinguishes as many regulatory programs as possible.

Entropy Score
MEED tries to choose a minimal set of experiments from the candidate set that distinguishes as many regulatory programs as possible, but the general problem is NP-hard. MEED uses a greedy heuristic based on an entropy score.
Suppose a list E of experiments partitions the r regulatory programs into C disjoint blocks, where block c contains n_c programs, 1 ≤ c ≤ C.

Entropy Score
Entropy is 0 when there is only one block with all programs.
Entropy is log(r) if each block contains exactly one program
– i.e., if all the regulatory programs are distinguished by E
Intuitively, higher entropy means programs are more spread out into more blocks.
Entropy gain from adding an experiment e to a list E is the increase in entropy when e is appended to E. [Note: Not the same as H({e}).]
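A form consistent with both boundary cases is the Shannon entropy of the block sizes. The expressions below are a reconstruction under that assumption, with r programs split into C blocks of sizes n_c; the notation for appending e to E is chosen here for illustration.

```latex
H(E) = -\sum_{c=1}^{C} \frac{n_c}{r} \log \frac{n_c}{r},
\qquad
\Delta H(e \mid E) = H(E \cup \{e\}) - H(E)
```

With a single block, n_1 = r and H(E) = 0; with r singleton blocks, every n_c = 1 and H(E) = log(r). The gain depends on how e refines the partition already induced by E, which is why it generally differs from H({e}).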

MEED Algorithm
Greedy heuristic
– Start with E = empty list of experiments.
– Find the experiment e with maximum entropy gain. Append e to the ordered list E.
– If no experiment has any entropy gain, stop.
Provably approximates the optimal solution of the NP-hard problem within a factor logarithmic in r.
The output list may not distinguish all regulatory programs, but it distinguishes as many regulatory programs as the full candidate experiment set.
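The greedy loop can be sketched as follows. This is an illustrative implementation under assumptions, not the published MEED code: entropy is taken to be the Shannon entropy of the partition of programs by predicted profile, and the profiles, experiment names, and tolerance `tol` are invented for the example.

```python
from collections import Counter
from math import log

def entropy(profiles, experiments):
    """Entropy of the partition of programs induced by `experiments`.

    profiles: program -> {experiment: predicted response}
    """
    r = len(profiles)
    blocks = Counter(tuple(p[e] for e in experiments) for p in profiles.values())
    return -sum(n / r * log(n / r) for n in blocks.values())

def greedy_design(profiles, candidates, tol=1e-12):
    """Repeatedly append the experiment with the largest entropy gain."""
    chosen, remaining = [], list(candidates)
    while remaining:
        current = entropy(profiles, chosen)
        best = max(remaining, key=lambda e: entropy(profiles, chosen + [e]))
        if entropy(profiles, chosen + [best]) - current <= tol:
            break  # no remaining experiment refines the partition
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy predicted profiles (assumed); e4 is a redundant copy of e1.
profiles = {
    "P1": {"e1": 1, "e2": 1, "e3": 0, "e4": 1},
    "P2": {"e1": 1, "e2": -1, "e3": 0, "e4": 1},
    "P3": {"e1": -1, "e2": 1, "e3": 0, "e4": -1},
    "P4": {"e1": -1, "e2": 1, "e3": 1, "e4": -1},
}
order = greedy_design(profiles, ["e1", "e2", "e3", "e4"])
print(order)
```

In the toy data the redundant experiment is never selected alongside its copy, and the three chosen experiments separate all four programs.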

Experimental Design
Predicted Profiles → MEED Greedy Algorithm → Ordered List of Expts
Note: Everything so far is deterministic.
Note: We have not actually run any experiments or obtained any data yet. Everything is based on logical model predictions so far.

Expansion
Regulatory Modules
– Assignment of target genes to the best-matching regulatory program
Input
– Logical model
– List of experiments
– Experiment measurements (gene expression profiles)
Probabilistic Matching
– Compare observed expression profiles with the predicted profiles and compute the probability that they match.
– Considered a match if the probability exceeds the cutoff threshold p^(# of experiments), with p = 0.7
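The matching rule can be illustrated with a small sketch. It rests on two loudly stated assumptions that the slide does not confirm: the threshold is read as p raised to the number of experiments, and the match probability is taken as a product of hypothetical per-experiment agreement probabilities (i.e., experiments treated as independent). The function name `is_match` is likewise illustrative.

```python
from math import prod

def is_match(per_experiment_probs, p=0.7):
    """Decide whether an observed profile matches a predicted profile.

    per_experiment_probs: assumed probabilities that the observed response
    agrees with the predicted response, one value per experiment.
    """
    n = len(per_experiment_probs)
    # Assumption: joint probability is the product; cutoff is p ** n.
    return prod(per_experiment_probs) > p ** n

print(is_match([0.9, 0.8, 0.75]))  # product 0.54 against cutoff 0.343
print(is_match([0.9, 0.9, 0.3]))   # product 0.243 against cutoff 0.343
```

Scaling the cutoff with the number of experiments keeps the per-experiment stringency roughly constant as more experiments are added.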

Expansion
Predicted Profiles + Observed Profiles from Expts → Regulatory Modules

Evaluation

Evaluation of Experiment Design
Evaluation Metric: FUP Score
– Fraction of Undistinguished Pairs
– FUP is 0 when all regulatory programs are distinguished
– FUP is 1 when no regulatory programs are distinguished
– Intuitively, a smaller FUP score means more program pairs are distinguished
– We want to minimize the FUP score
– Based only on model predictions, no measurements needed
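Taking the name literally, FUP can be computed as the number of program pairs that share a predicted profile divided by the total number of pairs. The sketch below assumes that definition; the toy profiles and the function name `fup` are illustrative.

```python
from collections import Counter
from math import comb

def fup(profiles, experiments):
    """Fraction of program pairs not distinguished by `experiments`.

    profiles: program -> {experiment: predicted response}
    """
    r = len(profiles)
    blocks = Counter(tuple(p[e] for e in experiments) for p in profiles.values())
    undistinguished = sum(comb(n, 2) for n in blocks.values())
    return undistinguished / comb(r, 2)

# Toy predicted profiles (assumed).
profiles = {
    "P1": {"e1": 1, "e2": 1, "e3": 0},
    "P2": {"e1": 1, "e2": -1, "e3": 0},
    "P3": {"e1": -1, "e2": 1, "e3": 0},
    "P4": {"e1": -1, "e2": 1, "e3": 1},
}
for chosen in ([], ["e1"], ["e1", "e2"], ["e1", "e2", "e3"]):
    print(chosen, round(fup(profiles, chosen), 3))
```

The score starts at 1 with no experiments and falls to 0 once every pair of programs has a different profile.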

Human Pathway Experiments Input – Tests on four human signaling pathways – Structure: 1000 cyclic models each, generated through random shuffling – Regulators: All variables that are not stimulators – Regulation function: Only activation both

Human Pathway Experiments
Alternative method 1: INDEP
– Same entropy measure, but list generated independently
Alternative method 2: Random network-based
– Perturbed variables chosen using topological features in structure
– Perturbation and stimulation states chosen randomly
– R-IN_DEGREE, R-OUT_DEGREE, R-CONNECTIONS, R-TOPOL
Alternative method 3: Hybrid network-based
– Perturbed variables chosen using topological features in structure
– Perturbation and stimulation states chosen using MEED
– M-IN_DEGREE, M-OUT_DEGREE, M-CONNECTIONS, M-TOPOL

MEED outperforms INDEP – Important to score sets of experiments together, rather than independently

MEED outperforms network-based methods.
Hybrid network-based methods outperform random methods
– MEED is useful in selecting states even when perturbation variables are fixed

Yeast Pathway Experiments
2 stimulators: environmental osmotic concentration, pheromone
15 regulators: all variables except Hog scaffold
6 biologically relevant regulation functions (shown before)
Therefore, 90 (= 15 × 6) regulatory programs
25 candidate experiments from microarray databases

MEED proposes 11 out of 25 experiments

MEED outperforms INDEP and network-based methods.
Hybrid network-based methods outperform random methods
– MEED is useful in selecting states even when perturbation variables are fixed

Evaluation of Expansion Procedure
An unambiguous module matches exactly one regulatory program.
An ambiguous module matches more than one regulatory program.
Ambiguity network
– Nodes: regulatory programs that matched ambiguous modules, and the size of each ambiguous module
– Edges: from each size node to its matching regulatory programs

Example ambiguity network – After 5 experiments, there is one ambiguous module with 58 genes matching 7 regulatory programs

Evaluation of Expansion Procedure
Evaluation Metric: Ambiguity score
– Average number of regulatory programs identified for each gene
– Intuitively, the more regulatory programs matching each ambiguous module, and the more genes it contains, the higher the overall ambiguity score.
– Utilizes experimental data, not just model predictions (unlike FUP)
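The score can be sketched directly from its description as an average over genes. The version below assumes the average is taken over genes that matched at least one program, which the slide does not specify; the gene names and module sizes are invented, loosely echoing a 58-gene module matching 7 programs.

```python
def ambiguity_score(matches):
    """Average number of matched regulatory programs per matched gene.

    matches: gene -> set of regulatory programs it matched
    Assumption: genes with no match are left out of the average.
    """
    counts = [len(programs) for programs in matches.values() if programs]
    return sum(counts) / len(counts)

# Toy assignment (assumed): one ambiguous module of 58 genes matching
# 7 programs, plus 42 genes that each match a single program.
ambiguous_programs = {f"prog{i}" for i in range(7)}
matches = {f"gene{i}": set(ambiguous_programs) for i in range(58)}
matches.update({f"gene{i}": {"prog0"} for i in range(58, 100)})

score = ambiguity_score(matches)
print(round(score, 2))
```

A score of 1 would mean every matched gene is assigned to exactly one program; larger ambiguous modules or more programs per module push the average up.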

Comparison with extant methods: Barrett and Palsson (B&P)
MEED yields more modules than B&P.

MEED mostly showed lower ambiguity score than alternatives (log scale)

Other Results
Some of the larger regulatory modules are significantly enriched for GO terms, so they are functionally coherent.
During expansion using successive experiments,
– subgraphs in ambiguity networks break up, showing reduced ambiguity scores
– genes rarely get reassigned to different regulatory modules
Using additional candidate experiments not in databases, MEED identified two experiments that can significantly reduce ambiguity beyond the prior 11 out of 25.

Evaluation

Advantages
General framework for discovering regulatory modules downstream of a studied signaling pathway.
MEED significantly reduces the lab effort needed to attain the same level of ambiguity in regulatory module assignment.
MEED consistently outperforms alternatives, including INDEP and network-based ones.
MEED outperforms prior work such as Barrett, Ideker, etc.
– Even though they are “online”, i.e., need results of one experiment to suggest the next one
MEED does not require high-throughput binding data.
Logical model can encode expert knowledge succinctly.

Extensions
Signaling model is not probabilistic
– Should be possible to extend to probabilistic network models (e.g., Bayesian networks)
– Bayesian framework can reduce reliance on correctness of prior knowledge (signaling network model)
Limitation to single-regulator programs is necessary only for data and computational complexity reasons
– Incorporating biologically relevant combinatorial regulatory programs may significantly improve regulatory module assignment
– MEED is linear in number of regulatory programs
MEED only considers steady-state regulation. What about dynamic regulation?
Can we incorporate high-throughput experimental data?