Computational Models in Systems Biology Karan Mangla 22nd April, 2008
References Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D. A. & Nolan, G. P. Causal protein-signaling networks derived from multiparameter single-cell data. Science 308, (2005). Fisher, J. & Henzinger, T. A. Executable cell biology. Nat Biotech 25, (2007).
Overview Introduction to Systems Biology Review of Modeling Techniques An Example of Systems Biology in action
Systems Biology Goal of systems biology: How do the individual parts interact to yield system behavior? Biology has focused on figuring out the pieces But what happens when you fit them together? Slide courtesy of Prof. David Dill, Stanford University
Large Data Sets in Biology Protein Interaction Maps Synthetic Lethality Tests Genome Sequencing DNA Microarrays Capture the relative quantities of a large number of mRNAs in the cell
Need for Systems Biology Large data sets Need to store, integrate and analyze this information into a coherent system Simple diagrammatic representation schemes can no longer provide usable information
A Simple Kohn Map Source:
Types of Models Mathematical models are used to represent actual quantitative relations between the molecules in the system Used widely in physics Generally use a system of differential equations to represent the process Can be simulated and, in some cases, analyzed Require very detailed knowledge of the system
A Mathematical Model Source:
Computational Models Allow for abstract representations of biological processes Have an inherent execution scheme attached to the model Certain techniques create finite state machines which can be model checked
Model Checking A technique to analyze finite state machines Essentially can check for certain temporal properties along all possible executions of the machine Properties are of two types: LTL (temporal properties only: in the next state, eventually, always) and CTL (temporal and path properties: there exists a path, along all paths)
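To make the idea concrete, here is a minimal sketch of checking two CTL-style properties on an explicit finite state machine by exploring every reachable state. The toy transition system and the helper names (`reachable`, `exists_eventually`, `always_globally`) are assumptions made up for illustration; real model checkers use far more efficient symbolic techniques.

```python
from collections import deque

# Hypothetical toy transition system: state -> set of successor states.
TRANSITIONS = {
    "idle": {"signal"},
    "signal": {"active", "idle"},
    "active": {"active"},
}

def reachable(start):
    """Return all states reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in TRANSITIONS[state]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def exists_eventually(start, prop):
    """CTL 'EF prop': some path from `start` reaches a state satisfying prop."""
    return any(prop(s) for s in reachable(start))

def always_globally(start, prop):
    """CTL 'AG prop': every reachable state from `start` satisfies prop."""
    return all(prop(s) for s in reachable(start))

# Can the system ever become active? Is it guaranteed never to be active?
ef_active = exists_eventually("idle", lambda s: s == "active")
ag_not_active = always_globally("idle", lambda s: s != "active")
```

Both checks reduce to reachability here because the properties are simple; nested temporal operators would need a fixed-point computation instead.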
Criteria for Evaluating Models Scalability of the modeling scheme Completeness of representation Ability to incorporate a variety of effects at different levels of abstraction Ease and Intuitiveness of the modeling scheme The scheme should be related to actual biology Tools available for the analysis of the information encoded in the model
Types of Biological Processes Gene Regulatory Networks Source:
Metabolic Pathways Source:
Protein Interaction Pathways Source:
Computational Models Boolean Network: Each molecule is considered a node with states as active or inactive Connections between molecules define activation or inhibition of one molecule by another A molecule is considered to become active if the sum of its activations is larger than the sum of its inhibitions
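A minimal sketch of a threshold-style Boolean network update, in which a node switches on when its summed activation exceeds its summed inhibition and switches off in the opposite case. The three-node network, the synchronous update order, and the rule that a tie leaves the node unchanged are all assumptions for illustration.

```python
# Hypothetical three-node network: WEIGHTS[target][source] is +1 for
# activation and -1 for inhibition. Node names are illustrative only.
WEIGHTS = {
    "A": {"C": -1},  # C inhibits A
    "B": {"A": +1},  # A activates B
    "C": {"B": +1},  # B activates C
}

def step(state):
    """Synchronously update every node from the current state."""
    new_state = {}
    for node, inputs in WEIGHTS.items():
        total = sum(weight * state[src] for src, weight in inputs.items())
        if total > 0:
            new_state[node] = 1            # activation outweighs inhibition
        elif total < 0:
            new_state[node] = 0            # inhibition outweighs activation
        else:
            new_state[node] = state[node]  # assumption: a tie keeps the state
    return new_state

state = {"A": 1, "B": 0, "C": 0}
trajectory = [state]
for _ in range(4):
    state = step(state)
    trajectory.append(state)
```

With this starting state the trajectory settles into a state that `step` maps to itself, which is what a fixed point of the network means.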
Robustness of the Yeast Cell Cycle Built a boolean network for the yeast cell cycle Identified one fixed point attracting 86% of the states Found that the cell cycle steps are extremely stable Proc Natl Acad Sci U S A Apr 6;101(14): Epub 2004 Mar 22
Analysis of Boolean Networks Good representation of genetic regulatory networks Cannot capture effects such as strength of inhibition or activation Cannot capture protein interactions easily Useful analysis tools such as finding stable states for the system and looking for cycles Can be used to study large networks due to simplified dynamics
Petri Net Modeling Two types of nodes: places and transitions Edges are either from places to transitions or transitions to places State of the system is defined by the places holding tokens [Figure: Petri net with places p1, p2, p3 and transitions t1, t2, t3]
Petri Net Modeling Any transition for which all incoming places have tokens is active State of the system changes when an active transition fires, shifting tokens from in-places to out-places [Figure: Petri net with places p1, p2, p3 and transitions t1, t2, t3; t1 fires]
Petri Net Modeling Any transition for which all incoming places have tokens is active State of the system changes when an active transition fires, shifting tokens from in-places to out-places [Figure: Petri net with places p1, p2, p3 and transitions t1, t2, t3; t2 fires]
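The firing rule can be sketched in a few lines. The net below is a hypothetical three-place cycle, not necessarily the one drawn on the slides, and the names `enabled` and `fire` are chosen here for illustration.

```python
# Hypothetical net: each transition lists its input and output places.
TRANSITIONS = {
    "t1": {"in": ["p1"], "out": ["p2"]},
    "t2": {"in": ["p2"], "out": ["p3"]},
    "t3": {"in": ["p3"], "out": ["p1"]},
}

def enabled(marking):
    """Transitions whose input places all hold at least one token."""
    return [
        name
        for name, arcs in TRANSITIONS.items()
        if all(marking[place] > 0 for place in arcs["in"])
    ]

def fire(marking, name):
    """Return the marking after firing transition `name`."""
    if name not in enabled(marking):
        raise ValueError(f"transition {name} is not enabled")
    new_marking = dict(marking)
    for place in TRANSITIONS[name]["in"]:
        new_marking[place] -= 1   # consume a token from each in-place
    for place in TRANSITIONS[name]["out"]:
        new_marking[place] += 1   # produce a token in each out-place
    return new_marking

m0 = {"p1": 1, "p2": 0, "p3": 0}  # the state is the token marking
m1 = fire(m0, "t1")
m2 = fire(m1, "t2")
```

Representing the state as a dictionary of token counts keeps the marking explicit, which is also what reachability-style analysis tools operate on.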
Pathalyzer A place represents a molecule, a location and an activation state Transitions represent reactions possible in the process Source:
Factors of Petri Net pathways More general modeling scheme than Boolean Networks Does not intuitively capture inhibition of reactions Can utilize a wide variety of existing tools for analysis associated with Petri Nets
Interacting State Machines Model biological systems as state machines Allow multiple levels of hierarchy to capture different levels of detail in biological systems Model concurrency through definition of parallel communicating state machines Source: “Statecharts: A Visual Formalism for Complex Systems”, Jeff Pang
Modeling C. Elegans Vulval Development using StateChart Actual Biology StateChart Model
Process-Calculus Model molecules as communicating processes Model reactions as communication between these processes Try to capture the underlying constraints behind interactions Source: Phillips et al., Bioconcur, 2004
Pi-Calculus Source: Phillips et al., Bioconcur, 2004
Analysis of pi-calculus Assume that all reactions are binary interactions between molecules Good at modeling protein interaction networks Poor capability to abstract to gene regulatory networks Analyzed through stochastic simulation of model
Hybrid Models Combine mathematical models with computational models Have discrete variables controlled by discrete state changes Have continuous variables with rate of change governed by discrete variable
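A minimal sketch of a hybrid model: one discrete variable (`mode`) selects the rate law of one continuous variable (`x`), integrated with a simple Euler step. The thresholds, rate constants, and the choice to let guards on `x` trigger the discrete switch are all assumptions for illustration, not taken from any specific model.

```python
def simulate(x0=0.0, mode="on", dt=0.01, steps=2000,
             production=1.0, decay=0.5, low=0.5, high=1.5):
    """Simulate a two-mode hybrid system and return a list of (mode, x)."""
    x = x0
    trace = []
    for _ in range(steps):
        # Discrete state change (assumed guards on the continuous variable).
        if mode == "on" and x >= high:
            mode = "off"
        elif mode == "off" and x <= low:
            mode = "on"
        # Continuous dynamics: the rate of change depends on the mode.
        rate = (production if mode == "on" else 0.0) - decay * x
        x += rate * dt
        trace.append((mode, x))
    return trace

trace = simulate()
switches = sum(1 for a, b in zip(trace, trace[1:]) if a[0] != b[0])
```

With these parameters the concentration rises while the mode is "on", decays while it is "off", and the system keeps switching between the two.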
Delta-Notch Example A cell signaling process Constructed as a hybrid model Steady state analysis of the model reveals possible steady configurations of the system
Analysis of Hybrid Models Very powerful modeling tools Can incorporate models at different levels of abstraction Restricted by feasibility of the size of the models Analysis is based on simulation
Challenges for the Future
Discovering Signaling Pathways
Problem Definition Cell Processes require numerous cellular signaling pathways Information flow occurs through a cascade of molecules being modified chemically and physically These transitions activate molecules allowing further propagation of the signal Source:
Traditional Means of Studying Pathways Identify the phenotypic response generated by the pathway Construct mutants to identify genes involved in the pathway Perform double mutant experiments to discover the relations between genes and so understand causality in the pathway
Drawbacks Cannot capture interactions between the different pathways Cannot consider changes in behavior of the pathway under varied conditions
Algorithm for Discovering Signaling Pathways
Flow Cytometry Cells are treated with antibodies which stain specific phosphorylated proteins in the cells These cells are injected into a sheath flow to cross a laser one cell at a time Light scatter and light excitation are used to identify quantity of stained molecules Source:
Sample Pathway
Modeling Scheme-Bayesian Networks A Bayesian network over a set X is a representation of the joint probability distribution over X The representation consists of a directed acyclic graph with variables as nodes and the conditional distribution of each variable given its parents Each variable is independent of its non-descendants, given its parents [Figure: example network over X1, X2, X3, X4, X5] Source: Pe'er et al.
Example- A Simple Garden There are two events which could cause grass to be wet: either the sprinkler is on or it's raining Also, suppose that the rain has a direct effect on the use of the sprinkler (namely that when it rains, the sprinkler is usually not turned on) Source:
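The garden example can be worked through by brute-force enumeration. The sketch below factors the joint distribution along the arcs rain -> sprinkler, rain -> wet grass and sprinkler -> wet grass. The probability values are illustrative assumptions commonly used for this textbook example; they do not come from the slides.

```python
from itertools import product

# Assumed conditional probability tables (illustrative values only).
P_RAIN = 0.2
P_SPRINKLER_GIVEN_RAIN = {True: 0.01, False: 0.4}
P_WET_GIVEN = {  # keyed by (sprinkler, rain)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) = P(rain) P(sprinkler|rain) P(wet|sprinkler, rain)."""
    p = P_RAIN if rain else 1 - P_RAIN
    p_s = P_SPRINKLER_GIVEN_RAIN[rain]
    p *= p_s if sprinkler else 1 - p_s
    p_w = P_WET_GIVEN[(sprinkler, rain)]
    p *= p_w if wet else 1 - p_w
    return p

def prob_rain_given_wet():
    """P(rain | grass is wet), summing the joint over the sprinkler state."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    denominator = sum(
        joint(r, s, True) for r, s in product((True, False), repeat=2)
    )
    return numerator / denominator

posterior = prob_rain_given_wet()
```

Under these assumed numbers the posterior comes out at roughly 0.36: seeing wet grass raises the probability of rain above its prior of 0.2, but the sprinkler remains a competing explanation.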
Modeling Signaling Pathways using Bayesian Networks Model molecules in specific activation states as variables Arcs represent dependencies between molecules Direction of arc is decided using intervention data
Need for Bayesian Learning
Bayesian Inference Algorithm Use standard scoring metrics that reward relatively simple models Adapt model to incorporate interventions
Bayesian Inference Algorithm Start with a random network Explore the possible networks with steps of addition, deletion or reversal of single arc Accept transition if score is increased
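The search loop described above can be sketched as greedy hill climbing over directed acyclic graphs. The score function is left pluggable: `toy_score` below is a made-up stand-in that rewards a hidden target structure, whereas a real application would compute a Bayesian score from the measured data. All function names here are hypothetical.

```python
import random

def is_acyclic(nodes, edges):
    """Repeatedly strip nodes with no incoming edge; a leftover means a cycle."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        roots = {n for n in remaining if not any(dst == n for _, dst in edges)}
        if not roots:
            return False
        remaining -= roots
        edges = {(src, dst) for src, dst in edges if src not in roots}
    return True

def random_dag(nodes, rng):
    """Random starting network: keep forward edges of a shuffled node order."""
    order = list(nodes)
    rng.shuffle(order)
    return frozenset(
        (a, b)
        for i, a in enumerate(order)
        for b in order[i + 1:]
        if rng.random() < 0.5
    )

def neighbours(nodes, edges):
    """Acyclic graphs one arc addition, deletion or reversal away."""
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            if (a, b) in edges:
                candidates = [edges - {(a, b)}, (edges - {(a, b)}) | {(b, a)}]
            elif (b, a) not in edges:
                candidates = [edges | {(a, b)}]
            else:
                candidates = []
            for candidate in candidates:
                if is_acyclic(nodes, candidate):
                    yield frozenset(candidate)

def hill_climb(nodes, score, start, rng):
    """Accept any single-arc change that increases the score; stop at an optimum."""
    current, current_score = frozenset(start), score(start)
    while True:
        options = list(neighbours(nodes, current))
        rng.shuffle(options)
        for candidate in options:
            candidate_score = score(candidate)
            if candidate_score > current_score:
                current, current_score = candidate, candidate_score
                break
        else:
            return current, current_score

# Stand-in score (assumption): +1 per target edge found, -1 per extra edge.
NODES = ["A", "B", "C"]
TARGET = {("A", "B"), ("B", "C")}

def toy_score(edges):
    return len(TARGET & edges) - len(edges - TARGET)

best, best_score = hill_climb(
    NODES, toy_score, random_dag(NODES, random.Random(0)), random.Random(1)
)
```

Because plain hill climbing can stall at a local optimum of a realistic score, it is usually restarted from many random networks.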
Choosing High-Confidence Edges Process initialized 500 times with different random graphs Choose only the high confidence networks Select final edges present in >85% of the high confidence graphs
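The final selection step amounts to counting how often each edge appears across the retained graphs and keeping the frequent ones. A minimal sketch, assuming each graph is given as a collection of `(source, target)` pairs; the function name `consensus_edges` and the example edges are hypothetical.

```python
from collections import Counter

def consensus_edges(graphs, threshold=0.85):
    """Return edges present in more than `threshold` of the given graphs."""
    counts = Counter(edge for graph in graphs for edge in set(graph))
    return {
        edge for edge, count in counts.items()
        if count / len(graphs) > threshold
    }

# Illustrative input: 10 graphs; X->Y appears in 9 of them, Y->Z in 8.
example_graphs = (
    [{("X", "Y"), ("Y", "Z")}] * 8
    + [{("X", "Y")}]
    + [set()]
)
kept = consensus_edges(example_graphs)
```

Here only X->Y survives, since 9/10 exceeds the 85% cutoff while 8/10 does not.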
Results- A High Accuracy Map of the Signal Causality Pathway
Features of the Approach
Experimental Validation of Hypothesis Tested reported edges experimentally To test the predicted causal effect of Erk1 on Akt, small interfering RNA (siRNA) was used to inhibit Erk1
Advantages of Flow Cytometry Ability to observe molecular quantities in each cell separately preventing population averaging of results Large amounts of data generated to enable accurate prediction of pathway structure Possible to apply a variety of intervention reagents to further classify inter-pathway connections
Verification of the Importance of Flow Cytometry Applied Bayesian network analysis to 3 different data sets: an observation-only data set, a population-averaged data set, and a truncated individual-cell data set
Future Possibilities Flow Cytometry will grow in power as more antibodies are discovered to allow measurement of different molecules Handling shortcomings due to need for acyclic graphs enforced by Bayesian Networks All three edges missed in this paper were due to the acyclic condition
Thank You