Pathway Modeling and Problem Solving Environments Cliff Shaffer Department of Computer Science Virginia Tech Blacksburg, VA 24061
The Fundamental Goal of Molecular Cell Biology
Application: Cell Cycle Modeling. How do cells convert genes into behavior? Create proteins from genes; protein interactions; protein effects on the cell. Our study system is the cell cycle of the budding yeast Saccharomyces cerevisiae.
[Figure: the cell cycle, showing G1, S (DNA replication), G2, M (mitosis), and cell division]
[Figure: wiring diagram of the budding yeast cell cycle control network. Labeled components include CDKs, Cln2, Cln3, Bck2, Clb2, Clb5, Sic1, Swi5, SBF, MBF, SCF, Mcm1, APC, APC-P, Cdc20, Cdh1, Cdc14, Cdc15/MEN, Mad2, Bub2, Lte1, Tem1-GDP, Tem1-GTP, Net1, Net1P, RENT, PPX, Esp1, Pds1, and an inactive trimer. Labeled processes include growth, budding, DNA synthesis, unaligned chromosomes, sister chromatid separation, and mitosis.]
Modeling Techniques. One method: use ODEs that describe the rate at which each protein concentration changes. Protein B degrades protein A: … with initial condition [A](0) = A_0. Parameter c determines the rate of degradation. Sometimes modelers use “creative” rate laws to approximate subsystems.
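The slide's own rate equation did not survive extraction, so the following is only a minimal sketch under a stated assumption: protein A is removed at a rate proportional to both [A] and the concentration of its degrader [B], i.e. the mass-action form d[A]/dt = -c[A][B], with [B] held constant. The function names and the numeric values are hypothetical, chosen for illustration.

```python
import math

def degradation_rate(a, b, c):
    """Assumed mass-action rate law: d[A]/dt = -c * [A] * [B]."""
    return -c * a * b

def simulate_degradation(a0, b, c, t_end, steps=1000):
    """Integrate d[A]/dt with classical 4th-order Runge-Kutta; [B] is constant."""
    dt = t_end / steps
    a = a0
    for _ in range(steps):
        k1 = degradation_rate(a, b, c)
        k2 = degradation_rate(a + 0.5 * dt * k1, b, c)
        k3 = degradation_rate(a + 0.5 * dt * k2, b, c)
        k4 = degradation_rate(a + dt * k3, b, c)
        a += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return a

# Hypothetical values, not taken from the yeast model.
A0, B, C, T_END = 1.0, 0.5, 0.2, 10.0
numeric = simulate_degradation(A0, B, C, T_END)
exact = A0 * math.exp(-C * B * T_END)  # closed form when [B] is constant
```

With [B] constant the equation has a closed-form solution, which gives a quick check on the integrator. Setting c = 0 leaves [A] at its initial value.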
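[Figure: Mathematical Model. Wiring diagram annotated with reaction types: synthesis, degradation, binding, activation, inactivation]
[Figure: Simulation of the budding yeast cell cycle. Time courses of mass, CKI, Clb2, Cln2, Cdh1, and Cdc20 against time (min), with the G1 and S/M phases marked]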
Differential equations. Parameter values: k1 = , v2’ = 0.001, v2” = 0.17, k3’ = 0.02, k3” = 0.85, k4’ = 0.01, k4” = 0.9, J3 = 0.01, J4 = 0.01, k9 = 0.38, k10 = 0.2, k5’ = 0.005, k5” = 2.4, J5 = 0.5, k6 = 0.33, k7 = 2.2, J7 = 0.05, k8 = 0.2, J8 = 0.05, … Experimental Data.
Tyson’s Budding Yeast Model. Tyson’s model contains over 30 ODEs, some nonlinear. Events can cause concentrations to be reset. There are about 140 rate constant parameters. Most are unavailable from experiment and must be set by the modeler.
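To illustrate what "events reset concentrations" means for a simulator, here is a toy sketch, not a piece of Tyson's model. It assumes a single variable (cell mass) growing exponentially, with a hypothetical division event that halves the mass whenever it reaches a threshold. The trigger, the rate, and all names are assumptions made for illustration.

```python
def simulate_with_division(m0, mu, threshold, t_end, dt=0.01):
    """Forward-Euler integration of dm/dt = mu * m with a reset event.

    Whenever the mass reaches `threshold`, a (hypothetical) division
    event halves it and the event time is recorded.
    """
    mass = m0
    division_times = []
    steps = int(round(t_end / dt))
    for i in range(steps):
        mass += dt * mu * mass          # continuous dynamics
        if mass >= threshold:           # event condition
            mass /= 2.0                 # event action: reset the variable
            division_times.append((i + 1) * dt)
    return mass, division_times

# Hypothetical numbers: growth rate 0.01 per minute, division at mass 2.
final_mass, division_times = simulate_with_division(1.0, 0.01, 2.0, 300.0)
```

The point of the sketch is structural: the integrator must check event conditions after each step and apply discontinuous resets, which plain ODE solvers do not do on their own.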
Fundamental Activities. Collect information: search literature (databases), lab notebooks. Define/modify models: a user interface problem. Run simulations: equation solvers (ODEs, PDEs, deterministic, stochastic). Compare simulation results to experimental data: analysis.
Modeling Lifecycle
Our Mission: Build Software to Help the Modelers. Typical cycle time for changing the model used to be one month: collect data in paper lab notebooks; convert to differential equations by hand; calibrate the model by trial and error; inadequate analysis tools. Goal: change the model once per day. The bottleneck should shift to the experimentalists.
Another View. Current models of simple organisms contain a few tens of equations. Modeling mammalian systems might require two orders of magnitude more complexity. We hope our current vision for tools can supply one order of magnitude. The other order of magnitude is an open problem.
JigCell. Current primary software components: JigCell Model Builder; JigCell Run Manager; JigCell Comparator; Automated Parameter Estimation (PET); Bifurcation Analysis (Oscill8).
[Figure: workflow diagram linking Model Builder, Run Manager, Comparator, Parameter Values, Parameter Optimizer, and Optimum Parameter Values]
JigCell Model Builder: From a wiring diagram…
JigCell Model Builder: …to a reaction mechanism … to ordinary differential equations (ode files, SBML). N.B. Parameters are given names, not numerical values!
Mutations. Wild-type cell. Mutations are typically caused by gene knockout. Consider a mutant with no B to degrade A: set c = 0. We have about 130 mutations, and each requires a separate simulation run.
Run Manager: Inheritance patterns. Basal Set (wild-type); Derived Set (mutant A); Derived Set (mutant B); Derived Set (mutant C); Derived Set (mutant A’); Derived Set (mutant AB); Derived Set (mutant A’C).
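One way to picture the basal/derived relationship is as layered parameter sets, where a derived set stores only its overrides and inherits everything else. The sketch below uses Python's `collections.ChainMap` for this. It is an illustration of the idea, not the Run Manager's actual data model, and the parameter names, values, and the `derive` helper are hypothetical.

```python
from collections import ChainMap

# Basal (wild-type) set; names and values are hypothetical.
basal = {"c": 0.2, "k_synth": 0.05}

def derive(parent, **overrides):
    """Return a derived set that stores only `overrides` and inherits the rest."""
    return ChainMap(overrides, parent)

mutant_a = derive(basal, c=0.0)            # no B to degrade A: set c = 0
mutant_b = derive(basal, k_synth=0.0)      # hypothetical second mutant
mutant_ab = derive(mutant_a, k_synth=0.0)  # double mutant built on mutant A
```

Because lookups fall through to the parent, a change to the basal set is seen by every derived set that does not override that parameter, so about 130 mutant runs need not be edited one by one.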
JigCell Run Manager
Phenotypes. Each mutant has some observed outcome (“experimental” data), generally qualitative: cell lived; cell died in G1 phase. The model should match the experimental data. The model should not be overly sensitive to the rate constants: overly sensitive biological systems tend not to survive.
Comparator: Visualize results. [Figure: result panels labeled Kumagai1 and Kumagai2]
Optimization. How to decide on parameter values? Key features of optimization: each candidate solution (here, a set of parameter values) is a point in a multidimensional space; each point can be assigned a value by an objective function; the goal is to find the best point in the space as defined by the objective function. We usually settle for a “good” point.
Parameter Optimization
Parameter Optimization. Error function: orthogonal distance regression. Levenberg-Marquardt algorithm.
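As a rough illustration of how Levenberg-Marquardt iterates, the sketch below fits a single decay constant k in the toy model a0·exp(-k·t). It makes two simplifications that should be read as assumptions: it uses ordinary (vertical) least-squares residuals rather than orthogonal distance regression, and it has only one parameter, so the damped normal equations reduce to a scalar division. The data are synthetic and all names are hypothetical.

```python
import math

def levenberg_marquardt_1d(times, observed, a0, k_init, lam=1e-3, iters=50):
    """Fit k in a0 * exp(-k * t) by Levenberg-Marquardt (one parameter)."""
    def residuals(k):
        return [a0 * math.exp(-k * t) - y for t, y in zip(times, observed)]

    def cost(k):
        return sum(r * r for r in residuals(k))

    k = k_init
    for _ in range(iters):
        r = residuals(k)
        # Derivative of each residual with respect to k (the Jacobian column).
        jac = [-t * a0 * math.exp(-k * t) for t in times]
        jtj = sum(j * j for j in jac)
        jtr = sum(j * ri for j, ri in zip(jac, r))
        if jtj == 0.0:
            break
        # Damped normal equations: (J^T J + lam * diag(J^T J)) * step = -J^T r
        step = -jtr / (jtj * (1.0 + lam))
        if cost(k + step) < cost(k):
            k += step
            lam /= 10.0   # step accepted: trust the Gauss-Newton model more
        else:
            lam *= 10.0   # step rejected: damp more, take a shorter step
    return k

# Synthetic, noise-free data generated with a hypothetical true value k = 0.3.
TIMES = [float(t) for t in range(11)]
OBSERVED = [math.exp(-0.3 * t) for t in TIMES]
k_fit = levenberg_marquardt_1d(TIMES, OBSERVED, a0=1.0, k_init=1.0)
```

The damping parameter `lam` is what separates this from plain Gauss-Newton: an overshooting step is rejected and retried with heavier damping instead of being taken.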
Parameter Optimization: Only one experiment is shown here. The model must be fitted simultaneously to many different experiments.
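Fitting to many experiments at once usually means one shared parameter set and one objective that sums the error over every experiment. The sketch below shows that structure with a toy decay model and a coarse scan over a single parameter c. The model, the synthetic data, and the scan are illustrative assumptions, not the JigCell error function.

```python
import math

def model(a0, b, c, t):
    """Toy model (assumed): [A](t) = a0 * exp(-c * b * t) with constant [B] = b."""
    return a0 * math.exp(-c * b * t)

def total_error(c, experiments):
    """Sum of squared errors over all experiments, sharing the parameter c."""
    total = 0.0
    for exp in experiments:
        for t, y in zip(exp["times"], exp["observed"]):
            total += (model(exp["a0"], exp["b"], c, t) - y) ** 2
    return total

# Synthetic experiments generated with a hypothetical true value c = 0.2.
C_TRUE = 0.2
TIMES = [0.0, 2.0, 4.0, 6.0, 8.0]
experiments = [
    {"a0": 1.0, "b": b, "times": TIMES,
     "observed": [model(1.0, b, C_TRUE, t) for t in TIMES]}
    for b in (0.5, 1.0, 2.0)
]

# Coarse scan: settle for the best point on a grid of candidate values.
candidates = [i / 100 for i in range(0, 101)]
best_c = min(candidates, key=lambda c: total_error(c, experiments))
```

A value of c that fits one experiment well but another badly scores poorly overall, which is the reason for fitting simultaneously.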
Global DIRECT Search (DIViding RECTangles)
Composition Motivation. Models are reaching the limits of manageability due to an increase in size and complexity. Making a model suitable for stochastic simulation increases the number of reactions by a factor of 3-5. Models of the mammalian cell cycle will require reactions (even more for stochastic simulation).
Model Composition: Notice that the yeast cell diagram contains natural components.
Composition Processes. Fusion: merging two or more existing models. Composition: build up a model hierarchy from existing models by describing their interactions and connections. Aggregation: connects modular blocks using controlled interfaces (ports). Flattening: convert the hierarchy back into a single “flat” model for use with standard simulators.
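Fusion can be pictured as a merge driven by a species mapping table that says which names in one model denote the same species in the other. The sketch below is a minimal illustration with a hypothetical model representation (a dict holding a species set and a reaction list). It is not the JigCell or SBML data structure, and the example models are invented.

```python
def fuse(model_a, model_b, species_map):
    """Merge model_b into model_a.

    species_map renames model_b species to their model_a equivalents;
    reactions that become identical after renaming are kept only once.
    """
    def rename(name):
        return species_map.get(name, name)

    species = set(model_a["species"]) | {rename(s) for s in model_b["species"]}
    reactions = list(model_a["reactions"])
    for reactants, products, rate in model_b["reactions"]:
        renamed = (
            tuple(rename(s) for s in reactants),
            tuple(rename(s) for s in products),
            rate,
        )
        if renamed not in reactions:
            reactions.append(renamed)
    return {"species": species, "reactions": reactions}

# Hypothetical sub-models. Each reaction is (reactants, products, rate name).
model_a = {
    "species": {"A", "B"},
    "reactions": [(("A", "B"), ("B",), "c")],       # B degrades A
}
model_b = {
    "species": {"ProteinA", "B"},
    "reactions": [
        ((), ("ProteinA",), "k_synth"),            # synthesis of A
        (("ProteinA", "B"), ("B",), "c"),          # same degradation, other name
    ],
}
fused = fuse(model_a, model_b, {"ProteinA": "A"})
```

Here the mapping table has a single entry, and the degradation reaction that appears in both sub-models survives only once in the fused model.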
Composition Processes
Sample Sub-models
Sample Composed Model
Composition Wizard Final Species Mapping Table
Composition Wizard Final Reaction Mapping Table
Aggregated Submodels
Final Aggregated Model
Aggregation Connector
Composition in SBML: Virginia Tech’s proposed language features to support composition/aggregation are being written into the forthcoming SBML Level 3 definition.
Stochastic Simulation. ODE-based (deterministic) models cannot explain behaviors introduced by the random nature of the system: variations in mass at division; variations in the timing of events; differences in gross outcomes.
Gillespie’s Stochastic Simulation Algorithm. There is a population for each chemical species. There is a “propensity” for each reaction, in part determined by population. Each reaction changes the population of its associated species. Loop: pick the next reaction (at random, weighted by propensity); update populations and propensities. The algorithm is slow; there are approximations to speed it up.
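The loop described on this slide corresponds to what is usually called the direct method. Below is a minimal sketch of it using only the standard library. The reaction system (synthesis of A at a constant rate, degradation of A by B) and all rate values are hypothetical, and the representation of a reaction as a (propensity function, state change) pair is an illustrative choice.

```python
import random

def gillespie(populations, reactions, t_end, seed=0):
    """Direct-method SSA.

    reactions: list of (propensity_fn, change) pairs, where propensity_fn
    maps the current state to a non-negative rate and change maps species
    names to integer population changes.
    """
    rng = random.Random(seed)
    state = dict(populations)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        props = [fn(state) for fn, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break                                  # no reaction can fire
        t += rng.expovariate(total)                # waiting time to next event
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        index = rng.choices(range(len(reactions)), weights=props)[0]
        for species, delta in reactions[index][1].items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory

# Hypothetical system: A is synthesized at rate k_s and degraded by B.
K_S, C = 2.0, 0.2
reactions = [
    (lambda s: K_S, {"A": +1}),                    # synthesis of A
    (lambda s: C * s["A"] * s["B"], {"A": -1}),    # degradation of A by B
]
trajectory = gillespie({"A": 100, "B": 1}, reactions, t_end=50.0)
```

Every event costs one pass through the loop, which is why the method becomes slow when populations or reaction counts are large. Repeating the run with different seeds gives the run-to-run variation that a deterministic ODE model cannot show.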
Comments on Collaboration. The domain team routinely underestimates how difficult it is to create reliable and usable software. The CS team routinely underestimates how difficult it is to stay focused on the needs of the domain team. Partial solution: truly integrate.
How to Succeed in CBB. Programming skills are necessary but not sufficient. Math is usually the biggest bottleneck: statistics for bioinformatics; numerical analysis, optimization, and differential equations for computational biology. Chemistry and biochemistry are good choices for domain knowledge. You have to have an “interdisciplinary attitude”.