Informing disease control strategies using stochastic models
Gastrointestinal Illness # 2 cause of death in children worldwide Largely preventable Transmission pathways –Contaminated water –Lack of sanitation facilities –Poor hygiene
Question: Suppose you are interested in reducing the burden of G.I. illness in children in a country that currently has a high burden of G.I. disease. Also suppose that resources are limited. What intervention(s) should you implement? –Review recent literature
Previous Research Fewtrell & Colford: meta-analyses of RCTs to reduce diarrhea, 2004 (Summary of results from developing countries) Intervention Studies Estimate 95%C.I. Hand Washing (0.33, 0.93) Sanitation (0.53, 0.87) Water Supply (0.73, 1.46) Water Quality (0.53, 0.89) Multiple (0.59, 0.76)
What explains this variability? Effect estimates vary substantially, from a 0% reduction up to an 85% reduction in G.I. illness
Some Considerations: Randomization and blinding procedures (internal validity) Generalizability (external validity) Selection bias (participants are different than non-participants) Publication bias (positive findings are more likely to be published)
Randomization and Blinding: Randomization: ensures that comparison groups differ only by chance Double blinding: ensures that neither the participant nor the investigator knows which treatment group the participant is in – prevents investigator/participant bias Makes RCTs the gold standard of Epi studies
Randomization and Blinding: Both are often missing in GI interventions Fewtrell & Colford –52 distinct studies –16/52 (31%) employed randomization –3/52 (6%) blinded participants to their exposure status
Another Consideration….
We Can Use Models Statistical models? The data are not really independent. YIKES! We need a paddle
Why Mathematical Models? Controlled Inexpensive Able to handle complex interactions Generate and test hypotheses Provide explicit description of the system under study (as opposed to statistical models)
A note on models: A primary goal of using models is not necessarily to come up with accurate predictions about the process under study, but is to observe and describe fundamental principles and relationships that emerge Rule of thumb: don’t make it more complex than it needs to be!
Some model assumptions Population is static: no one enters, no one leaves or dies Individuals can only become infected once People are equally likely to contact one another – there are no “networks” or cliques
Village Model 2500 households 4 people per household All are initially susceptible
- People can become infected by being exposed to an infected person in their household - $\beta_h$ governs this route of infection
- People can become infected by being exposed to an infected person from another household in their community - $\beta_c$ governs this route of infection
- People can become infected by being exposed to pathogens from the environment (does not include drinking water) - $\beta_e$ governs this route of infection
- People can become infected by consuming pathogens in drinking water - $\beta_{dw}$ governs this route of infection
- Infected individuals shed pathogens into the drinking water supply at a constant rate - A shedding-rate parameter governs the rate of shedding - The total number of pathogens shed is directly dependent on the total number of infected individuals, $I_{total}$
- After some time, infected people recover and are no longer susceptible to infection - A recovery-rate parameter governs the time to recovery (household at time of infection, $T_0$; household at time of recovery, $T_1$)
- Pathogens in drinking water and the environment die off at a constant rate - A die-off parameter governs pathogen die-off (shown at times $T_1$ and $T_2$)
Summary of routes of infection for any given individual [Figure: diagram of the routes of infection: environment (not drinking water), household member, people from other households, and pathogens in the source of drinking water, which are shed in proportion to $I_{total}$ and die off]
Simulation steps Determine hazards for each event Determine time to next event Determine type of event Determine which household is affected Update I, S, and R values for each household
Model Events Five events can occur: –Infection via drinking water –Infection via a household member –Infection via another member of the village –Infection via the environment –Recovery The number of Susceptibles (S), Infecteds (I), and Recovereds (R) for each household is updated after each event
Hazards Hazards for each event are calculated based on the transmission parameters ($\beta_c$, $\beta_h$, $\beta_e$, $\beta_{dw}$), the shedding, recovery, and die-off rates, and the current values of I and S. Hazards are calculated for each household; thus, each household has 5 hazards associated with it
Hazard Formulas Hazard for infection via drinking water: $\lambda_{dw} = \beta_{dw} \cdot S_{hh}$ Hazard for infection via household contact: $\lambda_h = \beta_h \cdot S_{hh} \cdot I_{hh}$ Hazard for infection via community contact: $\lambda_c = \beta_c \cdot S_{hh} \cdot I_{total}$
Drinking Water Hazard
Hazard Formulas Hazard for infection via the environment: $\lambda_e = \beta_e \cdot S_{hh}$ Hazard for recovery (moving from I to R): $\lambda_r = \text{recovery rate} \cdot I_{hh}$
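The five per-household hazards can be sketched in Python as follows. This is only an illustration of the formulas as written on the slides: the function name, the dictionary layout, the key names (`beta_dw`, `recovery_rate`, and so on), and all numeric values are assumptions, not taken from the original model code.

```python
def household_hazards(S_hh, I_hh, I_total, params):
    """Return the five event hazards for one household.

    S_hh, I_hh: susceptibles and infecteds in this household.
    I_total:    infecteds in the whole village.
    params:     hypothetical transmission parameters and recovery rate.
    """
    return {
        "dw": params["beta_dw"] * S_hh,           # infection via drinking water
        "h": params["beta_h"] * S_hh * I_hh,      # infection via a household member
        "c": params["beta_c"] * S_hh * I_total,   # infection via the community
        "e": params["beta_e"] * S_hh,             # infection via the environment
        "r": params["recovery_rate"] * I_hh,      # recovery (I -> R)
    }


# Illustrative values only (assumptions, not from the slides).
example_params = {
    "beta_dw": 0.001,
    "beta_h": 0.05,
    "beta_c": 0.0001,
    "beta_e": 0.0005,
    "recovery_rate": 0.2,
}
example = household_hazards(S_hh=3, I_hh=1, I_total=10, params=example_params)
```

Keeping the hazards in a dictionary keyed by event name makes it straightforward to sum them into a total hazard and to map a randomly selected hazard back to its event.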
Time to Infection and Hazard Recall that for an exponential distribution with mean $1/\lambda$: $P(T < t) = 1 - e^{-\lambda t}$ This is a probability, so it must be between 0 and 1
Time of Next Event –We know that $p = 1 - e^{-\lambda t}$ is between 0 and 1 –Solving for t: $t = -\log(1-p)/\lambda$ –Thus, given a uniform random number and substituting it in for $1-p$, we can randomly generate an event time that will come from an exponential distribution with mean $1/\lambda$
Example Suppose $\lambda = 2$ We generate 1000 random numbers between 0 and 1 and plug them into the formula: $t = -\log(1-p)/2$ The resulting distribution of t looks like the following:
Mean(t) = $1/\lambda$ = 1/2
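The example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name and the fixed seed are choices made here for illustration.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one event time by inverting P(T < t) = 1 - exp(-rate * t)."""
    p = rng.random()                   # uniform on [0, 1)
    return -math.log(1.0 - p) / rate   # 1 - p lies in (0, 1], so the log is defined


rng = random.Random(1)                 # fixed seed so the run is repeatable
times = [sample_exponential(2.0, rng) for _ in range(1000)]
sample_mean = sum(times) / len(times)  # should be close to 1/2
print(f"sample mean = {sample_mean:.3f}")
```

With 1000 draws the sample mean typically lands within a few hundredths of 1/2, and a histogram of `times` shows the expected exponential decay.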
Time of Next Event In our case the total hazard is the sum of all the $\lambda$'s, thus: $\lambda_t = \sum_i \lambda_i$ Time to next event is then: $T_{next} = -\log(U(0,1)) / \lambda_t$ The average time will be $1/\lambda_t$
Which event occurred? Where? We still need to determine which event will happen at T next and where it will happen ( which household) Random numbers are employed to make these decisions
[Figure: the total hazard drawn as a bar split into segments $\lambda_{dw1}$, $\lambda_{r1}$, $\lambda_{e1}$, $\lambda_{h1}$, $\lambda_{c1}$ (total for household #1), $\lambda_{dw2}$, $\lambda_{r2}$, $\lambda_{e2}$, $\lambda_{h2}$, $\lambda_{c2}$ (total for household #2), $\lambda_{dw3}$, and so on] Remember that the total hazard is divided up into many smaller hazards for each event in each household
Random selection Randomly selecting a number between 0 and $\lambda_t$ determines which event happens and in which household [Figure: the same segmented bar of hazards for households #1, #2, and so on] In the example shown, an individual in household 2 is infected from someone else in the community
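One way to implement this selection is to walk along the cumulative sum of the hazards until it passes the random number. The sketch below assumes the hazards are stored in a dictionary keyed by (household, event); that layout and the example values are illustrative assumptions.

```python
import random


def select_event(hazards, rng):
    """Pick a (household, event) key with probability proportional to its hazard.

    hazards maps (household_index, event_name) -> hazard value.
    """
    total = sum(hazards.values())
    target = rng.random() * total   # uniform number between 0 and the total hazard
    running = 0.0
    chosen = None
    for key, value in hazards.items():
        if value <= 0:
            continue                # events with zero hazard cannot occur
        running += value
        chosen = key
        if target < running:
            break
    return chosen                   # last positive entry if round-off overshoots


rng = random.Random(42)
hazards = {(1, "dw"): 0.1, (1, "r"): 0.2, (2, "c"): 0.7}
draws = [select_event(hazards, rng) for _ in range(10000)]
share_c2 = draws.count((2, "c")) / len(draws)   # should be close to 0.7
```

Because each segment's width equals its hazard, an event with 70% of the total hazard is chosen about 70% of the time.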
Bookkeeping After an event is determined, the appropriate household is updated so it has the correct number of I’s, S’s, and R’s The process is then repeated: –New hazards are calculated –T next is determined –An event is selected –A household population is updated
When does it end? After each time step the total time, T, is updated: T = T + T next When T becomes greater than a predetermined value, the simulation stops.
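Putting the steps together, the whole simulation loop can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the hazard formulas as written on the slides, treats the drinking-water hazard as depending only on the number of susceptibles (pathogen shedding and die-off are not modelled), and uses a small village and made-up parameter values.

```python
import math
import random


def simulate(n_households, hh_size, params, t_max, seed=0):
    """Event-driven stochastic simulation of S, I, R counts per household."""
    rng = random.Random(seed)
    S = [hh_size] * n_households    # all initially susceptible
    I = [0] * n_households
    R = [0] * n_households
    t, cases = 0.0, 0
    while True:
        I_total = sum(I)
        # Step 1: hazards for each event in each household
        events, weights = [], []
        for hh in range(n_households):
            hz = {
                "dw": params["beta_dw"] * S[hh],
                "h": params["beta_h"] * S[hh] * I[hh],
                "c": params["beta_c"] * S[hh] * I_total,
                "e": params["beta_e"] * S[hh],
                "r": params["recovery_rate"] * I[hh],
            }
            for name, value in hz.items():
                if value > 0:
                    events.append((hh, name))
                    weights.append(value)
        total = sum(weights)
        if total == 0:
            break                   # nobody left to infect or to recover
        # Step 2: time to next event, T = T + T_next
        t += -math.log(1.0 - rng.random()) / total
        if t > t_max:
            break                   # predetermined stopping time reached
        # Steps 3-4: which event, and in which household
        hh, name = rng.choices(events, weights=weights, k=1)[0]
        # Step 5: bookkeeping
        if name == "r":
            I[hh] -= 1
            R[hh] += 1
        else:
            S[hh] -= 1
            I[hh] += 1
            cases += 1
    return cases, S, I, R


# Hypothetical parameter values, chosen only so the example runs quickly.
demo_params = {"beta_dw": 0.001, "beta_h": 0.05, "beta_c": 0.0005,
               "beta_e": 0.0005, "recovery_rate": 0.2}
cases, S, I, R = simulate(n_households=50, hh_size=4, params=demo_params, t_max=365.0, seed=1)
```

Setting `beta_dw` to 0 in `params` gives a run in which infection via drinking water is switched off, which is the kind of comparison the model is meant to support.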
So What? Example questions this model can help to answer: –How do sanitation, hygiene, and water quality impact disease transmission? –Under what conditions is diarrheal disease endemic vs epidemic? –What amount of disease is attributable to drinking water?
How much disease is attributable to contaminated water? Extremely difficult to answer this question with observational data or randomized controlled trials (experiments) Bias due to confounding (observational data) Bias due to lack of blinding and randomization procedures (RCT/experiments)
The ‘Perfect Study’ We’d like to have two observations from everyone –Disease status with clean water –Disease status without clean water In reality, we can only observe one outcome Counterfactual outcomes are the hypothetical outcomes we don’t observe
An Example(?) Can’t observe both – but with models, WE CAN
Example: Water Quality Intervention We run the model two times –First run: don’t allow people to become infected via drinking water. This is like “filtering” their water so that no exposure via drinking water is possible –Second run: normal run. We allow exposure via drinking water –In both runs we keep $\beta_h$ and $\beta_c$ relatively small
Example Total cases with active filter = 412 Total cases with placebo filter = 1651 Total population = Under these conditions (low $\beta_h$, $\beta_c$), (1651 – 412) / 1651 = about 75 percent of cases could have been prevented if all drinking water had been filtered
Example Suppose we repeat this in a population where $\beta_h$ and $\beta_c$ are higher Person-to-person transmission will be more of a factor
Example Total cases with active filter = 5619 Total cases with placebo filter = 7866 Total population = Under these conditions (high $\beta_h$, $\beta_c$), (7866 – 5619) / 7866 = about 29 percent of cases could have been prevented if all drinking water had been filtered
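The two calculations can be checked with a short script. The function and variable names are illustrative; the case counts are the ones given in the two examples.

```python
def fraction_preventable(cases_placebo_filter, cases_active_filter):
    """Share of cases that would have been prevented by filtering all drinking water."""
    return (cases_placebo_filter - cases_active_filter) / cases_placebo_filter


low_ptp = fraction_preventable(1651, 412)     # low person-to-person transmission
high_ptp = fraction_preventable(7866, 5619)   # high person-to-person transmission
print(f"low PTP: {low_ptp:.1%}, high PTP: {high_ptp:.1%}")
# prints "low PTP: 75.0%, high PTP: 28.6%"
```

The same filter prevents roughly three quarters of cases in the low-transmission setting but under a third in the high-transmission setting.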
Impact This example illustrates that the success of water quality interventions is dependent on the level of person-to-person (PTP) transmission occurring in a population –Water quality interventions will likely have more impact when PTP levels are low –They may not be the best choice in areas where PTP is high
But wait – we can do better Investigate many different parameter sets Use a ‘realistic’ population (household size not constant) What generalizations can be made? Can community transmission and household level transmission explain null effects in water quality interventions?
Transmission Pathways
Population Population mirrored that of villages in coastal Ecuador Median household size = 4 –Min: 1 Max: 19 Number of households = 2498 Total population = 11260
Distribution of Household Size
Parameter Values
Simulations Simulations were run for all combinations of parameter sets 5 x 8 x 8 x 2 = 640 parameter sets 10 simulations were run for each set Total simulations = 640 x 10 = 6400
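The size of this design can be enumerated with `itertools.product`. In the sketch below only the number of levels per factor (5, 8, 8, 2) and the 10 repetitions come from the slide; the factor names are placeholders, since which parameter each factor corresponds to, and the values used, are not stated here.

```python
import itertools

# Levels per factor from the 5 x 8 x 8 x 2 design; names are placeholders.
levels = {"factor_a": 5, "factor_b": 8, "factor_c": 8, "factor_d": 2}
runs_per_set = 10

# Each parameter set is a tuple of level indices, one index per factor.
parameter_sets = list(itertools.product(*(range(n) for n in levels.values())))

# One job per (parameter set, repetition) pair.
jobs = [(ps, rep) for ps in parameter_sets for rep in range(runs_per_set)]

print(len(parameter_sets), len(jobs))   # prints "640 6400"
```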
[Figure: % disease attributable to water; axes labeled Poor Hygiene and Poor Sanitation] No shedding of pathogens (contamination) into the water (shedding rate = 0)
[Figure: % disease attributable to water; axes labeled Poor Hygiene and Poor Sanitation] Some contamination (shedding rate = 0.5)
[Figure: % disease attributable to water; axes labeled Poor Hygiene and Poor Sanitation] Moderate contamination (shedding rate = 1.0)
[Figure: % disease attributable to water; axes labeled Poor Hygiene and Poor Sanitation] High contamination (shedding rate = 1.5)
[Figure: % disease attributable to water; axes labeled Poor Hygiene and Poor Sanitation] Very high contamination (shedding rate = 2.0)
Conclusions Person-to-person transmission may explain variability in water quality interventions When both HH and Community transmission levels are high, water interventions may not be the best choice When either is high, water interventions have the potential to significantly reduce disease
Conclusions Rate of pathogen shedding influences effectiveness of intervention Public health efforts to reduce enteric diseases should focus on critical transmission pathways If more than one exists, multiple interventions may be necessary
Acknowledgments Co-authors –Joseph Eisenberg –Travis Porco Contributors –Bryan Lewis
Drinking Water Hazard
Survival function
Drinking Water events In step 3, a uniform random number, U, is selected and compared to the value 1 - F(t), where t is the time of the next non-drinking water event. The value F(t) represents the probability that a drinking water event occurs by time t. Thus, if U < F(t), a drinking water event occurs; otherwise, one of the four non-drinking water events occurs at time t. If a drinking water event occurs, the time of the event is calculated by substituting U in for 1 - F(t) and using a MATLAB interpolating algorithm to solve for t.
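The interpolation step can be illustrated in Python. This is a rough sketch of the general idea rather than the MATLAB routine mentioned above, and it rests on several assumptions: the drinking-water hazard is tabulated on a time grid that ends at the time of the next non-drinking water event, F(t) is computed as 1 minus the exponential of the negative cumulative hazard, and the time is found by solving F(t) = U with linear interpolation. All function names are hypothetical.

```python
import bisect
import math


def cumulative_distribution(times, hazard_values):
    """F(t) = 1 - exp(-cumulative hazard), using the trapezoid rule on a grid."""
    F, H = [0.0], 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        H += 0.5 * (hazard_values[k] + hazard_values[k - 1]) * dt
        F.append(1.0 - math.exp(-H))
    return F


def drinking_water_event_time(u, times, F):
    """Return t with F(t) = u by linear interpolation.

    Returns None when u exceeds F at the end of the grid, meaning no
    drinking water event occurs before the grid's final time.
    """
    if u > F[-1]:
        return None
    k = bisect.bisect_left(F, u)    # first grid index with F[k] >= u
    if k == 0:
        return times[0]
    f0, f1 = F[k - 1], F[k]         # F[k-1] < u <= F[k], so f1 > f0
    return times[k - 1] + (u - f0) / (f1 - f0) * (times[k] - times[k - 1])


# Check against a constant hazard of 2, where F(t) = 1 - exp(-2t) exactly.
grid = [0.01 * k for k in range(501)]
F_grid = cumulative_distribution(grid, [2.0] * len(grid))
t_half = drinking_water_event_time(0.5, grid, F_grid)   # exact answer is log(2)/2
```

For a constant hazard the result can be compared with the closed-form exponential solution, which gives a quick sanity check before applying the same routine to a time-varying hazard.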
Motivation VanDerslice and Briscoe showed that: –The effect of drinking water quality on diarrheal disease varies based on household and community sanitation levels: Water quality interventions have the most impact in places with good sanitation They have less impact in places with poor sanitation Can this be recreated with a disease transmission model?