Data and Statistics: New methods and future challenges
Phil O’Neill
University of Nottingham

Professors: How they spend their time

1. High-resolution genetic data
2. Model assessment

Gardy 2011 NEJM

“High-resolution genetic data”: what are they?
- individual-level data on the pathogen
- can be taken at single or multiple time points
- high-dimensional, e.g. whole genome sequences
- proportion of individuals sampled could be high or low
- becoming far more common due to cost reduction

“High-resolution genetic data”: what use are they?
- better inference about transmission paths
- more reliable estimates of epidemiological quantities?
- understand evolution of the pathogen

. A C C C T T G G G A A A.....

Modelling and Data Analysis methods
Two kinds of approaches exist:
1. Separate genetic and epidemic components (e.g. Volz, Rasmussen)
2. Combine genetic and epidemic components (e.g. Ypma, Worby, Morelli)

1. Separate genetic and epidemic components
e.g.:
- estimate phylogenetic tree
- given the tree, fit epidemic model
or
- cluster individuals into genetically similar groups
- given the groups, fit multi-type epidemic model
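The "cluster individuals into genetically similar groups" step could be sketched roughly as follows. This is only an illustration, not the method of any of the cited authors: it assumes aligned sequences of equal length, uses the number of differing sites (Hamming distance) as the genetic distance, and links any two individuals whose distance is at most a chosen threshold (single linkage). The function names, patient labels, sequences and threshold are all hypothetical.

```python
def hamming(seq_a, seq_b):
    """Number of sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def cluster_by_threshold(sequences, max_distance):
    """Single-linkage groups: link individuals whose sequences differ
    at no more than max_distance sites (union-find bookkeeping)."""
    names = list(sequences)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if hamming(sequences[a], sequences[b]) <= max_distance:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sequences, for illustration only
example_sequences = {
    "patient_1": "ACCCTTGGGAAA",
    "patient_2": "ACCCTTGGGAAT",
    "patient_3": "TTTTTTGGGCCC",
}
example_groups = cluster_by_threshold(example_sequences, max_distance=2)
```

The resulting groups would then act as the "types" in a multi-type epidemic model; the choice of distance and threshold is itself a modelling decision.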

1. Separate genetic and epidemic components
+ “Simple” approach
+ Avoids complex modelling
- Ignores any relationship between transmission and genetic information

2. Combine genetic and epidemic components
e.g.:
- model genetic evolution explicitly
- define model featuring both genetic and epidemic parts

2. Combine genetic and epidemic components
+ “Integrated” approach
- Is modelling too detailed?
- Initial conditions: typical sequence?
+/- Model differences between individuals instead?

1. High-resolution genetic data
2. Model assessment

“Model assessment”: what is it?
- Does our model fit the data?
- Is there a better model?

“Model assessment”: why do it?
- Poor fit casts doubt on conclusions from modelling
- Model choice can be a tool for directly addressing questions of interest

Linear regression: $y_k = a x_k + b + e_k$, $e_k \sim N(0, v)$
Minimise distance of model mean from observed data
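As a minimal sketch of the least-squares idea on this slide, the following estimates a and b by minimising the summed squared distance between the model mean and the observations, then forms the residuals that model assessment would inspect. The data values and the function name are made up for illustration.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares estimates of a and b, an estimate of the error
    variance v, and the fitted residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([x, np.ones_like(x)])    # columns: x_k and 1
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - (a * x + b)                       # observed minus model mean
    v = residuals @ residuals / (len(x) - 2)          # unbiased estimate of v
    return a, b, v, residuals


# Illustrative data only (not from the talk)
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
y_obs = [1.1, 2.9, 5.2, 7.1, 8.8]
a_hat, b_hat, v_hat, res = fit_line(x_obs, y_obs)
```

Here the residuals have an obvious definition, which is exactly what is missing for outbreak data.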

For outbreak data:
- What are the right residuals?
- Should observed or unobserved data be compared to the model? (Streftaris and Gibson)
- Mean model may only be available via simulation
- Is the mean the right quantity to consider?

Simulation-based approaches to model fit:
- Forward simulation – “close” to data?
- Choice of summary statistics?
- Close ties to ABC methods (McKinley, Neal)
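The link between forward simulation, summary statistics and ABC can be sketched as below. This is a generic ABC rejection sampler, not the specific algorithms of the cited authors. It assumes a Markov SIR model with infection rate beta*S*I and removal rate gamma*I, one initial infective, final size as the single summary statistic, and a Uniform(0, 0.05) prior on beta; all of these choices, and the numbers in the usage line, are illustrative assumptions.

```python
import random


def simulate_final_size(beta, gamma, n_susceptible, n_infective, rng):
    """Number of initial susceptibles ever infected in a Markov SIR model.
    Only the order of events matters for final size, so event times are skipped."""
    s, i = n_susceptible, n_infective
    while i > 0 and s > 0:
        # Next event is an infection with probability beta*S*I / (beta*S*I + gamma*I).
        p_infection = beta * s / (beta * s + gamma)
        if rng.random() < p_infection:
            s, i = s - 1, i + 1
        else:
            i -= 1
    return n_susceptible - s


def abc_rejection(observed_final_size, n_susceptible, gamma,
                  n_draws, tolerance, seed=0):
    """Keep prior draws of beta whose simulated final size is 'close' to the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        beta = rng.uniform(0.0, 0.05)   # assumed prior, for illustration only
        simulated = simulate_final_size(beta, gamma, n_susceptible, 1, rng)
        if abs(simulated - observed_final_size) <= tolerance:
            accepted.append(beta)
    return accepted


# Hypothetical outbreak: 30 of 100 susceptibles infected
accepted_betas = abc_rejection(observed_final_size=30, n_susceptible=100,
                               gamma=1.0, n_draws=2000, tolerance=3)
```

The same simulator can be reused for model fit: simulate from fitted parameter values and ask whether the observed summary looks typical of the simulated ones. Both uses depend heavily on which summary statistic and tolerance are chosen.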

Approaches to model choice
- Hypermodels/saturated models
- Bayesian non-parametric methods
- Bayesian methods, e.g. RJMCMC
- Mixture models

Hypermodels/saturated models
e.g. Infection rates $\beta S$ or $\beta S I$ or $\beta S I^{0.5}$ in an SIR model? Instead use $\beta S I^{\alpha}$ and estimate $\alpha$ (O’Neill and Wen)
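The nesting idea can be made concrete with a small sketch: a single infection-rate function with an exponent on I, written here as alpha (the symbol and the function name are assumptions), recovers each candidate model as a special case. Estimating alpha from data then replaces a choice between discrete models.

```python
def infection_rate(beta, s, i, alpha):
    """Overall infection rate beta * S * I**alpha in an SIR model."""
    if i == 0:
        return 0.0   # no infectives means no infection (Python gives 0**0 == 1)
    return beta * s * i ** alpha


# The three candidate models on the slide as special cases of the hypermodel
rate_beta_s = infection_rate(0.5, 10, 4, alpha=0.0)         # beta * S
rate_beta_s_sqrt_i = infection_rate(0.5, 10, 4, alpha=0.5)  # beta * S * I**0.5
rate_beta_si = infection_rate(0.5, 10, 4, alpha=1.0)        # beta * S * I
```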

Bayesian non-parametric methods
e.g. Infection rate $\beta(t) S I$ or $\beta(t)$ in an SIR model; estimate $\beta(t)$ in a Bayesian non-parametric manner using Gaussian process machinery (Kypraios, O’Neill and Xu; Knock and Kypraios)
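To give a feel for the Gaussian process ingredient, the sketch below draws one time-varying infection rate from a GP prior. It covers only the prior, not the inference step. The squared-exponential kernel, the log link (used so that the rate stays positive), the hyperparameter values and the function names are all assumptions made for illustration and are not taken from the cited papers.

```python
import numpy as np


def squared_exponential_kernel(t1, t2, variance, length_scale):
    """Covariance between GP values at times t1 and t2."""
    diff = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_beta_path(times, variance=1.0, length_scale=5.0,
                     mean_log_beta=-2.0, seed=0):
    """One prior draw of beta(t) = exp(f(t)), with f a Gaussian process."""
    times = np.asarray(times, dtype=float)
    cov = squared_exponential_kernel(times, times, variance, length_scale)
    cov += 1e-6 * np.eye(len(times))          # jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    log_beta = mean_log_beta + chol @ rng.standard_normal(len(times))
    return np.exp(log_beta)                   # exponentiate to keep beta(t) > 0


time_grid = np.arange(0.0, 30.0, 1.0)         # hypothetical observation window
beta_path = sample_beta_path(time_grid)
```

In a full analysis the values of f on the grid would be treated as unknowns and updated alongside the epidemic model's other parameters, for example within MCMC.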

Reversible Jump MCMC
e.g. Distinct models (usually a small number); estimate Bayes factors by running MCMC on the union of parameter spaces (O’Neill; Neal and Roberts; Knock and O’Neill)

Mixture models
e.g. Given two models $(g, h)$, create mixture model $f(x) = \alpha g(x) + (1 - \alpha) h(x)$; estimation of $\alpha$ enables estimation of Bayes factors (Kypraios and O’Neill)
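One way to see why the mixture weight carries Bayes factor information is the following sketch. It treats g and h as the two models' marginal likelihood values for the observed data and writes the mixture weight as alpha (both assumptions). It further assumes a Uniform(0, 1) prior on alpha, which the slide does not state. Under that prior the posterior density of alpha is proportional to alpha*g + (1 - alpha)*h, so E[alpha | x] = (2g + h) / (3(g + h)). Rearranging gives the Bayes factor g/h = (3E - 1) / (2 - 3E), where E is that posterior mean. The code checks this numerically; in practice E would come from MCMC output rather than a grid.

```python
import numpy as np


def posterior_mean_alpha(g, h, n_grid=10001):
    """E[alpha | x] under an assumed Uniform(0, 1) prior, via the trapezoid rule."""
    alpha = np.linspace(0.0, 1.0, n_grid)
    unnormalised = alpha * g + (1.0 - alpha) * h   # posterior density up to a constant

    def trapezoid(y):
        return float(np.sum((y[1:] + y[:-1]) * np.diff(alpha)) / 2.0)

    return trapezoid(alpha * unnormalised) / trapezoid(unnormalised)


def bayes_factor_from_mean(mean_alpha):
    """Bayes factor g/h implied by the posterior mean of alpha (uniform prior)."""
    return (3.0 * mean_alpha - 1.0) / (2.0 - 3.0 * mean_alpha)


# Hypothetical marginal likelihood values: model g is three times as likely as h
mean_alpha = posterior_mean_alpha(g=3.0, h=1.0)
recovered_bayes_factor = bayes_factor_from_mean(mean_alpha)
```

The posterior mean always lies between 1/3 and 2/3 under this prior, so a noisy estimate close to either end gives an unstable Bayes factor estimate.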
