Bayesian insights into health-care performance monitoring in the UK David Spiegelhalter MRC Biostatistics Unit, Cambridge Thanks to Nicky Best, Clare Marshall,


Bayesian insights into health-care performance monitoring in the UK
David Spiegelhalter, MRC Biostatistics Unit, Cambridge
Thanks to Nicky Best, Clare Marshall, Ken Rice, Paul Aylin, Gordon Murray, Stephen Evans, Vern Farewell, Robin Kinsman, Olivia Grigg, Tom Treasure.

Some background
· 1997: General Medical Council finds 2 Bristol surgeons guilty of ‘Gross Professional Misconduct’ after high mortality rates
· 2000: Dr Harold Shipman convicted of murdering 15 of his patients
· 2001: Bristol Inquiry recommends establishing ‘Office of Information on Health-Care Performance’ as part of CHI
· 2002: ‘Office’ set up, responsible for performance assessment, ‘star ratings’ etc.

Some statistical issues not to be discussed
· Selection and use of performance indicators
· Data quality
· Risk-adjustment
· Aggregating multiple indicators
· Evaluation of programme
· Performance management vs quality improvement
· etc

To be discussed (briefly)
· ‘League tables’ and ranking
· Role of shrinkage estimation?
· ‘Extreme’ or ‘divergent’ cases?
· ‘Null distributions’
· Sequential analysis
· (False discovery rates)
· (Risk-adjusted dynamic models)

Criticisms of ‘league tables’
· Spurious ranking – ‘someone’s got to be bottom’
· Encourages comparison when perhaps not justified
· 95% intervals arbitrary
· No consideration of multiple comparisons
· Risk-adjustment is always inadequate
· Single-year cross-section – what about change?

Funnel plot
· No ranking
· Data displayed
· No preset threshold for ‘out-of-control’
· Visual relationship with volume
· Emphasises increased variability of smaller centres
· Links to Shewhart control chart / SPC methods / six-sigma etc
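The link between volume and variability on a funnel plot can be sketched numerically. The following is a minimal illustration, assuming a binary outcome, a hypothetical target rate of 10%, and a Normal approximation to the binomial; the function name `funnel_limits` and all numbers are illustrative and not taken from the slides.

```python
import math

def funnel_limits(target, n, z=1.96):
    """Approximate control limits for an observed proportion at volume n.

    Assumes a Normal approximation to the binomial (an illustrative
    simplification); z=1.96 gives roughly 95% limits.
    """
    se = math.sqrt(target * (1.0 - target) / n)
    lower = max(0.0, target - z * se)
    upper = min(1.0, target + z * se)
    return lower, upper

# Hypothetical target rate of 10%: limits narrow as volume grows,
# which produces the funnel shape.
for n in (50, 200, 800):
    lo, hi = funnel_limits(0.10, n)
    print(n, round(lo, 3), round(hi, 3))
```

Plotting each centre's observed rate against its volume, with these limits drawn as curves, shows why smaller centres are expected to scatter more widely around the target.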

Can make inferences on ranks using MCMC

‘Dr Foster’ - commercial publisher

Can fit hierarchical / ‘random-effects’ models

Care with random effects / shrinkage models
· Outlying centres, if included, can be very influential in estimating between-centre variability
· But smaller outlier centres can be shrunk and so remain undetected
· Should check plausibility of random-effects distribution before shrinkage

Need to be clear about hypotheses
· A ‘shrunk’ interval tests whether Centre X > average, assuming X is ‘similar’ (exchangeable) with rest
· But we are interested in whether X is ‘similar’
· ‘Divergent’ rather than simply ‘extreme’

‘Null distributions’
· $Y_i \sim p(y_i \mid \theta_i)$
· $H_0$: $\theta_i \sim N(\mu, \tau^2)$
· $H_1$: $\theta_i \sim$ ‘divergent’
· Bayesian perspective: partial exchangeability with unknown group membership
· SPC perspective: over-dispersed ‘in-control’ distribution
· Problem: are $\mu$, $\tau$ estimated from data, or specified as ‘acceptable’ target variability?

Bristol model
· $r_i \sim \text{Binomial}(\pi_i, n_i)$
· $\text{logit}(\pi_i) = \theta_i$
· $H_0$: $\theta_i \sim N(\mu, \tau^2)$
· Uniform priors on $\mu$, $\tau$
Cross-validation:
1. each centre $i$ left out in turn
2. $\theta_i^{rep}$ simulated
3. $r_i^{rep} \mid \theta_i^{rep}$ simulated and compared with $r_i^{obs}$
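The three cross-validation steps can be sketched as a simulation. This is a simplified illustration, not the analysis behind the slides: instead of a full Bayesian fit with uniform priors, it plugs in a crude mean and standard deviation of the empirical logits of the remaining centres (named `mu` and `tau` here by assumption), and the data are hypothetical.

```python
import math
import random
import statistics

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def loo_predictive_pvalue(deaths, ops, i, n_sim=2000, seed=1):
    """Leave centre i out, simulate replicate counts, compare with observed.

    ASSUMPTION: mu and tau are crude plug-in estimates from the other
    centres' empirical logits, standing in for a posterior distribution.
    Returns a mid-P estimate of P(r_rep > r_obs).
    """
    rng = random.Random(seed)
    # Step 1: estimate the null distribution without centre i.
    logits = [math.log((r + 0.5) / (n - r + 0.5))
              for k, (r, n) in enumerate(zip(deaths, ops)) if k != i]
    mu = statistics.mean(logits)
    tau = statistics.stdev(logits)
    tail = 0.0
    for _ in range(n_sim):
        # Step 2: simulate a replicate log-odds for the left-out centre.
        theta_rep = rng.gauss(mu, tau)
        prob = expit(theta_rep)
        # Step 3: simulate a replicate count and compare with the observed.
        r_rep = sum(rng.random() < prob for _ in range(ops[i]))
        if r_rep > deaths[i]:
            tail += 1.0
        elif r_rep == deaths[i]:
            tail += 0.5
    return tail / n_sim

# Hypothetical data: the last centre has a markedly higher death rate.
ops = [200, 150, 180, 220, 160, 190, 210, 170, 140, 230, 175, 143]
deaths = [22, 18, 20, 26, 17, 23, 24, 19, 16, 27, 20, 41]
print(loo_predictive_pvalue(deaths, ops, 0))
print(loo_predictive_pvalue(deaths, ops, 11))
```

Leaving the outlying centre out keeps it from inflating the estimated between-centre spread, so its own predictive P-value is small, while a typical centre checked against a spread that still includes the outlier looks unremarkable.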

(a) Leave-one-out Cross-validation

But there may be many ‘divergent cases’: e.g. re-admission rates in acute hospitals

(b) Model $H_1$
· Want ‘robust’ estimates of $\mu$, $\tau$; uninfluenced by potential divergent centres
· Many options: trimming, censoring, mixtures, e.g. $\theta_i \sim p_0 N(\mu, \tau^2) + (1 - p_0) N(\mu_1, \tau_1^2)$
· Estimate parameters (Bayes, ML)
· Get P-values $P_i = P(Y > y_i^{obs} \mid H_0)$
· Allow for uncertainty about $\mu$, $\tau$
· Can put P-values into FDR analysis
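Feeding per-centre P-values into an FDR analysis could look like the sketch below. The slides do not say which FDR procedure was used; this example assumes the Benjamini-Hochberg step-up rule as one common choice, and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices flagged by the Benjamini-Hochberg step-up rule.

    ASSUMPTION: BH is used here only as a familiar FDR procedure;
    q is the target false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    cutoff_rank = 0
    for rank, k in enumerate(order, start=1):
        # Keep the largest rank whose P-value is under its threshold.
        if p_values[k] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical one-sided P-values for ten centres.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))
```

With many centres tested at once, this kind of rule controls the expected proportion of flagged centres that are false alarms, rather than the chance of any single false alarm.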

Two-sided P-values for re-admission data: null Normal random-effects distribution

One-sided P-values allowing for over-dispersed null distribution; $\tau$ = 0.26 (0.08 to 0.56)

Informative prior distributions
· Often limited evidence on ‘null’ sd $\tau$
· Over-dispersion of ‘in-control’ group
· Represents unexplained heterogeneity, due to inadequate risk-adjustment etc
· Often on logit or log scale
· Interpretation important

Interpretation of $\tau$: sd of $\text{logit}(\pi_i)$
· ‘Range’ of true ORs, i.e. ratio of 97.5% to 2.5% points: $\exp(3.92\tau)$
· Median ratio of two random ORs: $\exp(1.09\tau)$
· Suggests $\tau = 0.2$ is reasonable in many contexts, $\tau = 0.5$ is quite high
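The ‘range’ figure follows from the Normal assumption on the log-odds scale: the 97.5% and 2.5% points lie 1.96 standard deviations either side of the mean, so their ratio on the odds scale is exp(2 × 1.96 × sd) = exp(3.92 × sd). A small sketch, with the function name `or_range` chosen only for illustration:

```python
import math

def or_range(tau):
    """Ratio of the 97.5% to the 2.5% point of the true odds ratios,
    assuming log-odds are Normal with standard deviation tau."""
    return math.exp(2 * 1.96 * tau)

for tau in (0.2, 0.5):
    print(tau, round(or_range(tau), 2))
```

An sd of 0.2 implies roughly a twofold spread between the highest and lowest plausible odds, while 0.5 implies roughly a sevenfold spread, which is the sense in which 0.5 is ‘quite high’.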

Distributions for $\tau$: sd of $\text{logit}(\pi_i)$

Informative prior distributions
· For Bristol, mixture model with uniform prior for $\tau$ gives $\tau$ = 0.26 (0.08 to 0.56); p-Bristol = [value missing]
· Half-normal prior with upper 90% point at $\tau$ = 0.5 gives $\tau$ = 0.22 (0.09 to 0.44); p-Bristol = [value missing]
· Informative prior should be an acceptable input in this context

Shipman Inquiry July 2002: 215 definite victims, 45 probable

(NB: Shipman Inquiry total of definite or probable victims: 189 female > 65, 55 male over 65)

Sequential probability ratio test (SPRT)
· Developed in 1940’s
· Most powerful sequential test: $H_0$ vs $H_1$
· $\text{LLR} = \log [\, p(\text{data} \mid H_1) / p(\text{data} \mid H_0) \,]$
· Contribution to log(likelihood ratio) from $O$ observed and $E$ expected deaths, for relative risk $r$ under $H_1$, is $O \log r - (r - 1) E$
· To detect doubling of risk of death: $\text{LLR} = O \log 2 - E$
· Horizontal thresholds set by error rates
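The SPRT increments can be sketched as follows, assuming Poisson counts with mean E under H0 and r × E under H1, and taking the log-likelihood ratio as H1 relative to H0, which is the orientation that matches the increment O log r − (r − 1) E. The threshold formula is Wald's standard approximation; the error rates and the yearly counts are hypothetical.

```python
import math

def llr_increment(observed, expected, r=2.0):
    """Log-likelihood-ratio contribution (H1 vs H0) of one period,
    assuming Poisson counts: O*log(r) - (r - 1)*E."""
    return observed * math.log(r) - (r - 1.0) * expected

def wald_thresholds(alpha, beta):
    """Wald's approximate horizontal thresholds for type I error alpha
    and type II error beta: (lower, upper)."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper

def sprt_path(observed, expected, r=2.0):
    """Cumulative LLR after each period."""
    total, path = 0.0, []
    for o, e in zip(observed, expected):
        total += llr_increment(o, e, r)
        path.append(total)
    return path

# Hypothetical yearly observed and expected deaths.
obs = [3, 2, 5, 4]
exp_ = [2.0, 2.1, 2.2, 2.0]
print(sprt_path(obs, exp_))
print(wald_thresholds(0.01, 0.01))
```

Crossing the upper threshold favours the doubled-risk hypothesis, while crossing the lower one favours the null; between the two, monitoring continues.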

Risk-adjusted CUSUM
· Same steps as SPRT
· Never drops below 0
· Will always eventually ‘signal’ so need to check performance over limited periods
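A minimal sketch of the risk-adjusted CUSUM, assuming the same Poisson log-likelihood-ratio steps as the SPRT (O log r − (r − 1) E per period) with the running total held at zero from below. The threshold `h=2.5` and the counts are hypothetical and chosen only to show a signal.

```python
import math

def risk_adjusted_cusum(observed, expected, r=2.0, h=2.5):
    """Run a CUSUM with SPRT-style steps, floored at zero.

    Returns the chart values and the index of the first period at which
    the chart reaches the (hypothetical) threshold h, or None.
    """
    x, path, first_signal = 0.0, [], None
    for t, (o, e) in enumerate(zip(observed, expected)):
        # Same increment as the SPRT, but the chart never drops below 0.
        x = max(0.0, x + o * math.log(r) - (r - 1.0) * e)
        path.append(x)
        if first_signal is None and x >= h:
            first_signal = t
    return path, first_signal

# Hypothetical yearly observed deaths against an expected count of 1.
path, signal = risk_adjusted_cusum([1, 0, 3, 4], [1.0, 1.0, 1.0, 1.0])
print(path, signal)
```

Because the chart resets at zero rather than accumulating credit for good periods, it reacts quickly to a later deterioration, but it will also cross any fixed threshold eventually, hence the need to assess it over a limited period.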

Aylin, Best, Bottle, Marshall (2003)
· Retrospective analysis of 1000 GPs, including Shipman
· Risk-adjusted CUSUM (same steps as SPRT, constrained to be ≥ 0)
· Transformation to approximate Normality
· Adjustment for over-dispersion (although $\tau$ estimated including Shipman etc)
· Evaluation of system over limited time period, using simulated sensitivity and FDR
· Retrospective identification of ‘divergent’ GPs

Aylin et al (2003), Fig 1: CUSUM charts for the 12 GPs signalling at any time between 1993 and [year missing]. Charts designed to detect an increase of four standard deviations in standardised excess mortality (K=4) using an alarm threshold of h=3 (h=5 is also shown). Harold Shipman’s CUSUM chart is shown in bold.

Conclusions
· Wide range of issues
· Caution in using hierarchical models
· There is enthusiasm for robust methods with good graphical output
· Partial exchangeability can translate to ‘over-dispersed in-control distribution’
· Tie-in with SPC techniques attractive
· Bayesian insights useful
· ‘Bayesian’ / ‘non-Bayesian’ division not so useful

References
P Aylin, NB Best, A Bottle and EC Marshall (2003) Following Shipman: a pilot system for monitoring mortality rates in primary care. Lancet.
O Grigg, VT Farewell and DJ Spiegelhalter (2003) A comparison of approaches to sequential monitoring of risk-adjusted health outcome. Statistical Methods in Medical Research 12.
EC Marshall and DJ Spiegelhalter (2003) Approximate cross-validatory predictive checks in disease-mapping model. Statistics in Medicine 22.
DJ Spiegelhalter, P Aylin, SJW Evans, GD Murray and NG Best. Commissioned analysis of surgical performance using routine data: lessons from the Bristol Inquiry (with discussion). Journal of the Royal Statistical Society, Series A 165:191–232.
DJ Spiegelhalter (2002) Funnel plots for institutional comparisons. Quality and Safety in Health Care 11:390–391.
DJ Spiegelhalter (2002) An investigation into the relationship between mortality and volume of cases: an example in paediatric cardiac surgery between 1991 to [year missing]. British Medical Journal 324.
DJ Spiegelhalter, R Kinsman, O Grigg and T Treasure (2003) Risk-adjusted sequential probability ratio tests: applications to Bristol, Shipman, and adult cardiac surgery. International Journal for Quality in Health Care 15:7–13.
DJ Spiegelhalter, K Abrams and JP Myles. Bayesian Approaches to Clinical Trials and Health Care Evaluation. Wiley, Chichester, 2003.