Stochastic Simulation of Biological Systems
Chemical Reactions Reactants Products m 1 R 1 + m 2 R 2 + ··· + m r R r – ! n 1 P 1 + n 2 P 2 + ··· + n p P p
Example 3 reactions, 2 species: Y 1 ! 2Y 1 (1) Y 1 + Y 2 ! 2Y 2 (2) Y 2 ! (3) Lotka-Volterra
Representations (I) Y1Y2 110= A A = Net Effect Matrix
Example x(t 0 ) = (105) Reaction (2) fires at time t 1 > t 0 Then x(t 1 ) = x(t 0 ) + r.A = x(t 0 ) + (0 1 0).A = (105) + (-11) = (96)
Representations (II) SBML Electronic XML-based system Represent models in a standard format Sharing models Transferring models between different software systems
Simulating the LV model ODEs Use 2 assumptions (i) Continuous quantities (not integers) (ii) System behaves deterministically Assumptions are often appropriate….. ….but not always.
Y 1 ! 2Y 1 (1) Y 1 + Y 2 ! 2Y 2 (2) Y 2 ! (3) c 1, c 2, and c 3 are the rate constants for reactions 1, 2, and 3 respectively.
Start at time t 0 and state x 0 Simulate time of first reaction event ‘Hazards’ of the three reactions are: h 1 =c 1 y 1 h 2 =c 2 y 1 y 2 h 3 =c 3 y 2 Total hazard function is h 0 =h 1 +h 2 +h 3 h 0 is constant during inter-event time periods T ~ Exponential(h 0 )
Now select which of the 3 reactions has occurred. Probability of choosing reaction i is given by: h i / h 0
Update system time: t := t 0 + T Update state vector: x := x 0 + r i.A
Continue moving forwards in time Gillespie’s Stochastic Simulation Algorithm (the SSA) A single run generates a time vector (tvec) and a state matrix (xmat) Algorithm must be run many times Calculate statistical quantities E.g. mean and standard deviation of Y 1 species over the first 50 seconds
SSA is an exact algorithm Each reaction event simulated individually Algorithm must run thousands of times Very time consuming for large, complex systems How to speed things up?
Time Discretization Divide time interval into discrete chunks of width Δt. Go from t to t+Δt in one “leap” More than one reaction may occur in the interval [t, t+Δt) Simulate which reactions have occurred in this interval.
Time interval [0, t] Let X be the number of times reaction 1 fires in [0, t] Divide interval into N discrete chunks of width δt N large, δt small Prob(Reaction 1 fires once in time δt) h 1.δt Prob(Reaction 1 fires more than once in time δt) is negligible (relative to δt) Then X ~ Bin(N, h 1.δt) = Bin(N, h 1.(t/N)) E(X) = h 1.t As N , δt 0, X~Poisson(h 1.t)
“Tau-Leap” Algorithm Start at time t 0, state x 0 Select discrete step size, Move forwards to time t 0 + Simulate the number of firings of reaction i during the interval as a Poisson(h i. ) (i=1,…n) Update the system state according to the reactions which have occurred. Now advance to t , t , ……
Tau-Leap is faster than the SSA But some accuracy is sacrificed Relies on a sensible choice of
Hybrid Algorithms Divide the system species and reactions into discrete and continuous regimes The discrete regime contains species with low concentrations and reactions which fire infrequently The continuous regime contains species with high concentrations and reactions which fire frequently
Use different methods for simulating the two regimes. Simulate the continuous regime using deterministic or stochastic differential equations Simulate the discrete regime using the SSA The two regimes must be syncronized Some species will be involved in both regimes
Measuring Accuracy The accuracy of an approximate algorithm (leaping or hybrid) can be assessed by comparing results with SSA Run SSA and approximate algorithm 1,000 times for the same system, using the same rate constants and initial concentrations Do the results from the approximate algorithm closely resemble the results from the SSA?
The Test Suite for Stochastic Simulators Available at Contains stochastic simulation results for some simple biological systems Results generated using analytic expressions for species means and standard deviations. Can be used to measure the correctness of a stochastic simulator
Testing stochastic simulators is not straightforward. Even if a simulator is working perfectly, it will not produce exactly the results contained in the test suite. Need statistical method to compare the simulator results with the test suite results. E.g. if the concentration of species X after 10 seconds varies by 1% from the test suite value, is this evidence of a problem with the simulator?