Relating models to data: A review P.D. O’Neill University of Nottingham.


Caveats
Scope is strictly limited
Review with a view to future challenges

Outline
1. Why relate models to data?
2. How to relate models to data
3. Present and future challenges

1. Why relate models to data?
1. Scientific hypothesis testing
e.g. Can within-host heterogeneity of susceptibility to HIV explain decreasing prevalence?
e.g. Did control measures alone control SARS in Hong Kong?

1. Why relate models to data?
2. Estimation
e.g. What is R0?
e.g. What is the efficacy of a vaccine?

1. Why relate models to data?
3. What-if scenarios
e.g. What would have happened if transport restrictions were in place sooner in the UK foot and mouth outbreak?
e.g. How much would school closure prevent spread of influenza?

1. Why relate models to data?
4. Real-time analyses
e.g. Has the epidemic finished yet?
e.g. Are control measures effective?

1. Why relate models to data?
5. Calibration/parameterisation
e.g. What range of parameter values are sensible for simulation studies?

2. How to relate models to data
2.1 Fitting deterministic models
Options include
(i) “Estimation from the literature”
(ii) Least-squares / minimise metric
(iii) Can be Bayesian (Elderd, Dukic and Dwyer 2006)
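
As an illustration of option (ii), the following is a minimal sketch of a least-squares fit of a deterministic SIR model. Everything in it is an assumption made for illustration and does not come from the slides: the Euler integration, the grid search, the function names, and the synthetic noise-free "data" generated from the model itself.

```python
def sir_curve(beta, gamma, s0, i0, n_days, steps_per_day=20):
    """Euler-integrate dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I.

    Returns the number of infectives at days 0, 1, ..., n_days.
    """
    n = s0 + i0
    s, i = float(s0), float(i0)
    dt = 1.0 / steps_per_day
    curve = [i]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt
            new_rem = gamma * i * dt
            s -= new_inf
            i += new_inf - new_rem
        curve.append(i)
    return curve


def sum_of_squares(beta, gamma, data, s0, i0):
    """Squared distance between the model curve and the observed counts."""
    model = sir_curve(beta, gamma, s0, i0, len(data) - 1)
    return sum((m - d) ** 2 for m, d in zip(model, data))


def grid_fit(data, s0, i0, betas, gammas):
    """Return the (beta, gamma) grid point with the smallest sum of squares."""
    return min(
        ((b, g) for b in betas for g in gammas),
        key=lambda bg: sum_of_squares(bg[0], bg[1], data, s0, i0),
    )


# Synthetic, noise-free data (hypothetical values, for illustration only).
data = sir_curve(0.5, 0.2, 990, 10, 30)
betas = [k / 10 for k in range(1, 11)]
gammas = [k / 20 for k in range(1, 11)]
beta_hat, gamma_hat = grid_fit(data, 990, 10, betas, gammas)
```

With real data a numerical optimiser would normally replace the grid search, and the choice of metric (here plain squared error on prevalence) is itself a modelling decision.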

2. How to relate models to data 2.2 Fitting stochastic models Available methods depend heavily on the model and the data.

2. How to relate models to data 2.2 Fitting stochastic models (i) Explicit likelihood e.g. Longini-Koopman model for household data (Longini and Koopman, 1982)

2. How to relate models to data
[Figure: SEIR model within household.]
P(avoid infection from outside) = q
P(avoid infection from housemate) = p
Given data on final outcome in (independent) households, can formulate likelihood L(p, q)
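
A minimal sketch of how such a likelihood L(p, q) can be computed, assuming the final-size recursion commonly associated with the Longini-Koopman model: P(j of k infected) = C(k, j) · P(j of j infected) · q^(k-j) · p^(j(k-j)), with P(k of k) obtained by subtraction. The function names, the grid search, and the household counts are hypothetical and are not taken from the slides.

```python
from math import comb, log


def final_size_probs(k, p, q):
    """Return [P(j of k household members ever infected)] for j = 0..k.

    q = P(avoid infection from outside), p = P(avoid infection from one
    infected housemate), as on the slide.
    """
    # m[j][n] = P(j infected in a household of size n)
    m = [[0.0] * (k + 1) for _ in range(k + 1)]
    for n in range(k + 1):
        for j in range(n):
            m[j][n] = comb(n, j) * m[j][j] * q ** (n - j) * p ** (j * (n - j))
        m[n][n] = 1.0 - sum(m[j][n] for j in range(n))
    return [m[j][k] for j in range(k + 1)]


def log_likelihood(counts, p, q):
    """counts maps (household size, number infected) -> number of households."""
    return sum(
        n_households * log(final_size_probs(size, p, q)[infected])
        for (size, infected), n_households in counts.items()
    )


# Made-up final outcome data for households of size 2 (illustration only).
counts = {(2, 0): 10, (2, 1): 3, (2, 2): 2}
grid = [k / 20 for k in range(1, 20)]  # interior values of (0, 1) only
p_hat, q_hat = max(
    ((p, q) for p in grid for q in grid),
    key=lambda pq: log_likelihood(counts, pq[0], pq[1]),
)
```

The households enter the likelihood as independent terms, which is why the sum over household types is all that is needed.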

2. How to relate models to data
2.2 Fitting stochastic models
(i) Explicit likelihood (continued)
Examples of related household models:
Bayesian analysis (O’Neill et al., 2000)
Multi-type models (van Boven et al., 2007)

2. How to relate models to data
2.2 Fitting stochastic models
(i) Explicit likelihood (continued)
Methods include
Max likelihood (e.g. Longini and Koopman, 1982)
EM algorithm (e.g. Becker, 1997)
MCMC (e.g. O’Neill et al., 2000)
Rejection sampling (e.g. Clancy and O’Neill, 2007)
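
To show how an explicit likelihood can be used for exact Bayesian sampling, here is a generic rejection-sampling sketch. It is not the algorithm of Clancy and O’Neill (2007). It assumes a deliberately simple toy setting: single-person households, so the likelihood for q is binomial, with a Uniform(0, 1) prior. The counts and all names are hypothetical.

```python
import random


def rejection_sample_posterior(likelihood, max_likelihood, n_samples, rng):
    """Draw from the posterior under a Uniform(0, 1) prior.

    A prior draw theta is accepted with probability
    likelihood(theta) / max_likelihood.
    """
    samples = []
    while len(samples) < n_samples:
        theta = rng.random()
        if rng.random() * max_likelihood <= likelihood(theta):
            samples.append(theta)
    return samples


# Made-up data: 14 individuals escaped infection from outside, 6 did not.
escaped, infected = 14, 6


def likelihood(q):
    return q ** escaped * (1.0 - q) ** infected


q_mle = escaped / (escaped + infected)  # maximiser of the binomial likelihood
rng = random.Random(1)
draws = rejection_sample_posterior(likelihood, likelihood(q_mle), 2000, rng)
posterior_mean = sum(draws) / len(draws)
```

In this toy case the posterior is Beta(15, 7), so the sample mean can be checked against 15/22. The approach needs an upper bound on the likelihood, which is what the maximum-likelihood value provides here.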

2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Can arise due to model complexity and/or insufficient data

2. How to relate models to data
[Figure: two-level mixing model. Legend: ever-infected, never-infected; sample, unseen.]

2. How to relate models to data Individual-based transmission models involve unseen infection times

2. How to relate models to data Even detailed data from studies generally only give bounds on unseen infection times – e.g. infection occurs between last –ve test and first +ve test

2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Solutions include: Use a simpler approximating model, e.g. a pseudolikelihood (Ball, Mollison and Scalia-Tomba, 1997)

2. How to relate models to data
[Figure: two-level mixing model with explicit interactions between households. Legend: ever-infected, never-infected.]

2. How to relate models to data
[Figure: two-level mixing model -> independent households model. Legend: ever-infected, never-infected.]
In a large population, households are approximately independent

2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Solutions include: Use a simpler approximating model e.g. discrete-time model instead of a continuous time model (e.g. Lekone and Finkenstädt, 2006)
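
A minimal sketch of a discrete-time (chain-binomial) SEIR model of the general kind used as an approximation to a continuous-time model. It is written in the spirit of Lekone and Finkenstädt (2006) but is not claimed to match their specification: the transition probabilities 1 - exp(-rate × h), the parameter names, and the values used are assumptions for illustration.

```python
import math
import random


def binomial(n, prob, rng):
    """Binomial(n, prob) draw built from n Bernoulli trials (stdlib only)."""
    return sum(1 for _ in range(n) if rng.random() < prob)


def simulate_seir(s0, e0, i0, beta, rho, gamma, n_steps, h=1.0, seed=0):
    """Simulate (S, E, I, R) on a time grid with step length h."""
    rng = random.Random(seed)
    n = s0 + e0 + i0
    s, e, i, r = s0, e0, i0, 0
    path = [(s, e, i, r)]
    p_onset = 1.0 - math.exp(-rho * h)      # E -> I in one step
    p_removal = 1.0 - math.exp(-gamma * h)  # I -> R in one step
    for _ in range(n_steps):
        p_infection = 1.0 - math.exp(-beta * i / n * h)  # S -> E in one step
        new_exposed = binomial(s, p_infection, rng)
        new_infectious = binomial(e, p_onset, rng)
        new_removed = binomial(i, p_removal, rng)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        path.append((s, e, i, r))
    return path


# Hypothetical parameter values, for illustration only.
path = simulate_seir(990, 0, 10, beta=0.6, rho=0.3, gamma=0.2, n_steps=50, seed=1)
```

Because each step is a product of binomial probabilities, the likelihood of a fully observed path factorises over time steps, which is what makes this approximation convenient for inference.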

2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Solutions include: Direct approach – e.g. Martingale methods (Becker, 1989)

2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood Solutions include: Data augmentation: add in “missing data” or extra model parameters to formulate a likelihood

2. How to relate models to data
2.2 Fitting stochastic models
(ii) No explicit likelihood: Data augmentation (continued)
Common example
- model describes individual-to-individual transmission
- observe times of case ascertainment, test results, etc., but not times of infection/exposure
- augment data with missing infection/exposure times
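
The point of augmentation is that, once the missing infection times are added, the likelihood becomes easy to write down. The sketch below evaluates the log-likelihood of a fully specified path of a Markov SIR model with infection rate beta·S·I/N and removal rate gamma·I, conditional on the first infection. The model choice, the event-level (unlabelled) form of the likelihood, and all names are assumptions for illustration. An MCMC scheme would call such a function repeatedly while proposing new values for the unobserved infection times.

```python
import math


def augmented_loglik(infection_times, removal_times, n_pop, beta, gamma):
    """Log-likelihood of a complete SIR path; -inf if the path is impossible."""
    # kind 0 = infection, kind 1 = removal (infections sort first on ties)
    events = sorted(
        [(t, 0) for t in infection_times] + [(t, 1) for t in removal_times]
    )
    if not events or events[0][1] != 0:
        return -math.inf  # the path must start with the initial infection
    t_prev = events[0][0]
    s, i = n_pop - 1, 1
    loglik = 0.0
    for t, kind in events[1:]:
        if i == 0:
            return -math.inf  # no infectives left to cause or undergo an event
        infection_rate = beta * s * i / n_pop
        removal_rate = gamma * i
        # No event occurs during (t_prev, t)
        loglik -= (infection_rate + removal_rate) * (t - t_prev)
        if kind == 0:
            if s == 0:
                return -math.inf  # nobody left to infect
            loglik += math.log(infection_rate)
            s, i = s - 1, i + 1
        else:
            loglik += math.log(removal_rate)
            i -= 1
        t_prev = t
    return loglik


# Tiny hypothetical example: two individuals, both eventually removed.
value = augmented_loglik([0.0, 1.0], [2.0, 3.0], n_pop=2, beta=2.0, gamma=1.0)
```

Only the removal times would typically be observed; the infection times passed in here play the role of the imputed "missing data".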

2. How to relate models to data
[Figure: timeline for one individual, after Höhle et al. (2005). Labels: exposure time T_E, infectivity starts T_I, infectivity ends; "not observed" and "observed data"; markers for -ve test and +ve test.]

2. How to relate models to data
2.2 Fitting stochastic models
(ii) No explicit likelihood: Data augmentation (continued)
Data-augmentation methods include
MCMC (e.g. Gibson and Renshaw, 1998; O’Neill and Roberts, 1999; Auranen et al., 2000)
EM algorithm (e.g. Becker, 1997)

2. How to relate models to data 2.2 Fitting stochastic models (ii) No explicit likelihood: Data augmentation (continued) Data-augmentation methods can also be used in less “obvious” settings e.g. final size data for complex models

2. How to relate models to data
[Figure: two-level mixing model, with a label "Data". Legend: ever-infected, never-infected.]
Augment parameter space using links to describe potential infections (Demiris and O’Neill, 2005)

3. Present & future challenges 3.1 Large populations/complex models Current methods often struggle with large-scale problems, e.g. large populations, many missing data, or many hard-to-estimate parameters/covariates.

3. Present & future challenges 3.1 Large populations/complex models e.g. UK foot-and-mouth outbreak, 2001: Keeling et al. (2001) used a stochastic discrete-time model, parameterised via likelihood estimation and tuning/simulation. Attempting to fit this kind of model using “standard” Bayesian/MCMC methods does not work well.

3. Present & future challenges Large data set and many missing data can cause problems for standard (and also non-standard) MCMC

3. Present & future challenges 3.1 Large populations/complex models e.g. Measles data: Cauchemez and Ferguson (2008) discuss the problems that arise when standard methods are used to fit a standard SIR model to large-scale, temporally aggregated data from a large population.

3. Present & future challenges 3.1 Large populations/complex models Problems of this kind are usually tackled via approximations (e.g. of the model itself). Challenge: Can generic non-approximate methods be found?

3. Present & future challenges 3.2 Data augmentation Comment: this technique is surprisingly powerful and is (probably) under-developed.

3. Present & future challenges 3.2 Data augmentation e.g. Cauchemez and Ferguson (2008) use a novel MCMC data-augmentation scheme using a diffusion model to approximate an SIR epidemic model.

3. Present & future challenges 3.2 Data augmentation e.g. For final size data, one could impute generations of infection rather than a graph describing infection pathways (joint work with Simon White). This can lead to much faster MCMC algorithms.

3. Present & future challenges
[Figure: two-level mixing model, imputing edges in graph. Legend: ever-infected, never-infected.]

3. Present & future challenges
[Figure: two-level mixing model, infection chain = {1, 3, 1, 2, 1}. Legend: ever-infected, never-infected.]

3. Present & future challenges 3.2 Data augmentation e.g. Augmented data can also (sometimes) be used to bound quantities of interest. Clancy and O’Neill (2008) show how to obtain stochastic bounds on R0 and other quantities by considering “minimal” and “maximal” configurations of unobserved infection times in an SIR model.

3. Present & future challenges 3.2 Data augmentation
[Figure: timeline showing observed removal times and imputed infection times.]

3. Present & future challenges 3.2 Data augmentation
[Figure: timeline showing observed removal times and imputed infection times, with infections placed as soon as possible.]

3. Present & future challenges 3.2 Data augmentation
[Figure: timeline showing observed removal times and imputed infection times, with infections placed as late as possible.]
It can be shown that “Soon as possible” maximises R0, but the minimal value is not necessarily given by “Late as possible”; linear programming is used to find the actual solution.
The general idea is also applicable to final outcome data.

3. Present & future challenges 3.3 Model fit and model choice Various methods are used in the literature to assess model fit, e.g. Simulation-based methods; use of Bayesian predictive distribution; standard methods where applicable; Bayesian p-values

3. Present & future challenges 3.3 Model fit and model choice Likewise for model choice: methods include AIC and RJMCMC. Challenge: a better understanding of the pros and cons of such methods.
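
For reference, AIC is defined as 2k - 2 log L, where k is the number of fitted parameters and L is the maximised likelihood; smaller values are preferred. The sketch below applies that definition to two candidate models. The model labels and log-likelihood values are made up purely for illustration.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical maximised log-likelihoods and parameter counts.
candidates = {
    "one-parameter model": (-104.2, 1),
    "two-parameter model": (-101.5, 2),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

AIC requires a maximised likelihood, so it applies most directly in the explicit-likelihood settings discussed earlier; RJMCMC instead compares models within a Bayesian sampling scheme.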

References
B.D. Elderd, V.M. Dukic and G. Dwyer (2006) Uncertainty in predictions of disease spread and public health responses to bioterrorism and emerging diseases. PNAS 103.
I.M. Longini, Jr and J.S. Koopman (1982) Household and community transmission parameters from final distributions of infections in households. Biometrics 38.
P.D. O’Neill, D.J. Balding, N.G. Becker, M. Eerola and D. Mollison (2000) Analyses of infectious disease data from household outbreaks by Markov Chain Monte Carlo methods. Applied Statistics 49.
M. Van Boven, M. Koopmans, M.D.R. van Beest Holle, A. Meijer, D. Klinkenberg, C.A. Donnelly and H.A.P. Heesterbeek (2007) Detecting emerging transmissibility of Avian Influenza virus in human households. PLoS Computational Biology 3.
D. Clancy and P.D. O’Neill (2007) Exact Bayesian inference and model selection for stochastic models of epidemics among a community of households. Scandinavian Journal of Statistics 34.
N.G. Becker (1997) Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases. Statistical Methods in Medical Research 6.
F.G. Ball, D. Mollison and G-P. Scalia-Tomba (1997) Epidemic models with two levels of mixing. Annals of Applied Probability 7.
M. Höhle, E. Jørgensen and P.D. O’Neill (2005) Inference in disease transmission experiments by using stochastic epidemic models. Applied Statistics 54.

References (continued)
N.G. Becker (1989) Analysis of Infectious Disease Data. Chapman and Hall, London.
G. Gibson and E. Renshaw (1998) Estimating parameters in stochastic compartmental models using Markov chain methods. IMA Journal of Mathematics Applied in Medicine and Biology 15.
P.D. O’Neill and G.O. Roberts (1999) Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society Series A 162.
K. Auranen, E. Arjas, T. Leino and A.K. Takala (2000) Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data. Journal of the American Statistical Association 95.
P.E. Lekone and B.F. Finkenstädt (2006) Statistical Inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 62.
M.J. Keeling, M.E.J. Woolhouse, D.J. Shaw, L. Matthews, M. Chase-Topping, D.T. Haydon, S.J. Cornell, J. Kappey, J. Wilesmith and B.T. Grenfell (2001) Dynamics of the 2001 UK Foot and Mouth Epidemic: Stochastic Dispersal in a Heterogeneous Landscape. Science 294.
S. Cauchemez and N.M. Ferguson (2008) Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in London. Journal of the Royal Society Interface 5.
D. Clancy and P.D. O’Neill (2008) Bayesian estimation of the basic reproduction number in stochastic epidemic models. Bayesian Analysis, in press.