Tackling over-dispersion in NHS performance indicators. Robert Irons (Analyst – Statistician), Dr David Cromwell (Team Leader). 20/10/2004.


2 Outline of presentation
NHS Star Ratings Model
Criticism of some of the indicators
The reason – over-dispersion
Options for tackling the problem
Our solution – an additive random effects model
Effects on the ratings indicators

3 Performance Assessment in the UK
1990s: Government focused on efficiency
1997: Labour replaces Conservative government
Late 90s: Labour focus on quality & efficiency
  – Define Performance Assessment Framework
  – Publish NHS Plan in 2000
  – Commission for Health Improvement (CHI) created
  – Performance ratings first published in 2001, responsibility passed to CHI for 2003 publication
  – Healthcare Commission replaces CHI in April 2004, has broader inspection role

4 NHS Performance Ratings
An at-a-glance assessment of NHS trusts' performance
  – Performance rated as 0, 1, 2, or 3 stars
  – Yearly publication
Focus on how trusts deliver government priorities
  – Linked to implementation of key policies
      – Priorities and Planning Framework
      – National Service Frameworks
Have limited role in direct quality improvement
  – Modernisation Agency helps trusts with low rating

5 Scope of NHS ratings
Acute trusts
Ambulance trusts
Mental health trusts
Primary care trusts

6 The ratings model
Overall rating derived from many different indicators
  – and affected by Clinical Governance Reviews
Two types of indicators, organised in 4 groups
  – Key targets & Balanced Scorecard (BS) indicators
  – BS indicators grouped into 3 focus areas: patient focus, clinical focus, capacity & capability

7 Combining the indicators
Indicators are measured on different scales
  – Categorical (e.g. Yes/No)
  – Proportional (e.g. proportion of patients waiting longer than 15 months)
  – Rates (e.g. mortality rate within 30 days following selected surgical procedures)
Further complication
  – Performance on some indicators is measured against published targets, which define thresholds
  – Performance on other indicators is based on relative differences between trusts

8 Combining the indicators
Indicators first transformed so they are all on an equivalent scale
Key targets assigned to three levels:
  – achieved
  – under-achieved
  – significantly under-achieved
Balanced scorecard indicators assigned to five bands:
  – 1 – significantly below average (worst performance)
  – 2 – below average
  – 3 – average
  – 4 – above average
  – 5 – significantly above average (best performance)

9 Transforming the indicators
Key target indicators transformed using thresholds defined by government policy
Balanced scorecard indicators transformed via several methods
  – Percentile method
  – Statistical method
  – Absolute method, if policy target exists
  – Mapping method (for indicators with ordinal scales)
Table columns: Trust type | Acute trusts | Ambulance trusts | Mental health trusts | Primary care trusts
Percentile 1139
Statistical
Absolute 8354
Defined mapping 4587

10 Transforming the indicators - the statistical method
Table columns: Indicators | Acute trusts | Ambulance trusts | Mental health trusts | Primary care trusts
Clinical indicators 42
Patient survey 5545
Staff survey 3333
Change in rate indicators 3

11 The old statistical method
Based on simple confidence intervals
95% and 99% confidence intervals calculated for a trust's indicator value
Trust confidence interval compared with the overall national rate (effectively a single point)
  – 1 – Significantly below average: no 99% confidence interval overlap, higher values
  – 2 – Below average: no 95% confidence interval overlap, higher values
  – 3 – Average: overlapping 95% confidence intervals, e.g. England: 5.51% to 5.55%
  – 4 – Above average: no 95% confidence interval overlap, lower values
  – 5 – Significantly above average: no 99% confidence interval overlap, lower values
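
Illustrative sketch (not part of the original slides): the banding rule above can be written as a few comparisons. The function name `old_band`, the normal-approximation multipliers 1.96 and 2.576, and the treatment of the national rate as a single point are assumptions made for this example. As on the slide, higher indicator values are taken to mean worse performance.

```python
# Assumed normal-approximation multipliers for 95% and 99% intervals.
Z95, Z99 = 1.96, 2.576

def old_band(value, se, national_rate):
    """Return band 1-5 by comparing a trust's confidence intervals
    with the national rate, treated here as a single point."""
    lo95, hi95 = value - Z95 * se, value + Z95 * se
    lo99, hi99 = value - Z99 * se, value + Z99 * se
    if lo99 > national_rate:   # whole 99% interval above the national rate
        return 1               # significantly below average
    if lo95 > national_rate:   # whole 95% interval above the national rate
        return 2               # below average
    if hi99 < national_rate:   # whole 99% interval below the national rate
        return 5               # significantly above average
    if hi95 < national_rate:   # whole 95% interval below the national rate
        return 4               # above average
    return 3                   # 95% interval contains the national rate

if __name__ == "__main__":
    for value in (7.0, 6.6, 5.6, 4.4, 4.0):
        print(value, old_band(value, 0.5, 5.53))
```

With a small standard error even a tiny difference from the national rate leaves the intervals clear of it, which is how large trusts end up in the extreme bands.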

12 The old statistical method - problematic
Not a proper statistical hypothesis test
Differentiates between trusts based on differences that exceed levels of sampling variation
On some indicators, this led to the assignment of too many NHS trusts to the significantly good/bad bands

13 Working example - standardised readmission rate of patients within 28 days of initial discharge
Table columns: Significantly below average | Below average | Average | Above average | Significantly above average | Total

14 Readmissions within 28 days of discharge - funnel plot (2003/04 data)

15 Mortality within 30 days of selected surgical procedures - funnel plot (2003/04 data)

16 Z scores
Standardised residual Z scores are used to summarise extremeness of the indicators
Funnel plot limits approximate to the naïve Z score
Naïve Z score given by
  – $Z_i = (y_i - t)/s_i$
  – where $y_i$ is the indicator value, and $s_i$ is the local standard error
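
Illustrative sketch (not part of the original slides): the naïve Z score and the matching funnel limits. The slide does not define `t`; this example assumes it is the target or national rate that each trust is compared with. The function names and the default multiplier 1.96 are also assumptions.

```python
def naive_z(y, t, s):
    """Naive Z score: distance of the indicator value y from t,
    in units of the local standard error s."""
    return (y - t) / s

def funnel_limits(t, s, z=1.96):
    """Lower and upper funnel limits around t for a trust whose
    standard error is s (z = 1.96 gives approximate 95% limits)."""
    return (t - z * s, t + z * s)

if __name__ == "__main__":
    print(naive_z(6.5, 5.5, 0.5))        # trust two standard errors above t
    print(funnel_limits(5.5, 0.5, 2.0))  # limits two standard errors either side
```

A trust falls outside the funnel limits exactly when the absolute value of its naïve Z score exceeds the chosen multiplier.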

17 Dealing with over-dispersion
Three options were considered
  – Use of an interval null hypothesis
  – Allow for over-dispersion using a multiplicative variance model
  – …or a random-effects additive variance model

18 Interval null hypothesis
Similar to the naïve Z score or standard funnel limits
Uses a judgement of what constitutes a normal range for the indicator
Define normal range (e.g. percentiles, national rate ± x%)
Funnel limits then defined as:
  – Upper/lower limit = Range limit ± $(x \times s_i^0)$
Reduces number of significant results
But might be considered somewhat arbitrary
Interval could be defined based on previous years' data, or prior knowledge
Makes minimal use of the sampling error
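
Illustrative sketch (not part of the original slides): funnel limits around a judged normal range rather than a single point. The function name, the argument names and the default multiplier are assumptions; `s0` stands for the trust's standard error as it appears in the limit formula above.

```python
def interval_null_limits(range_low, range_high, s0, x=1.96):
    """Funnel limits for an interval null hypothesis: the lower limit
    hangs below the bottom of the normal range and the upper limit
    sits above the top, each by x standard errors."""
    return (range_low - x * s0, range_high + x * s0)

if __name__ == "__main__":
    # Hypothetical normal range of 5.0 to 6.0 for the indicator.
    print(interval_null_limits(5.0, 6.0, 0.5, 2.0))
```

Because the limits start from the edges of the range, even a very precise trust is flagged only when it lies outside the range itself.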

19 Interval null hypothesis - a funnel plot

20 Multiplicative variance model
Inflates the variance associated with each observation by an over-dispersion factor ($\phi$):
  – $\sum_i Z_i^2$ = Pearson $X^2$
  – $\phi = X^2 / I$
Limits on funnel plot are then expanded by $\sqrt{\phi}$
Do not want to be influenced by the outliers we are trying to identify
Data are first winsorised (shrinks the extreme Z-values in)
Over-dispersion factor could be provisionally defined based on previous years' data
Statistically respectable, based on a quasi-likelihood approach
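
Illustrative sketch (not part of the original slides): estimating the over-dispersion factor (called `phi` here) as the mean squared Z score, then rescaling. The function names are assumptions, `I` is taken to be the number of trusts, and leaving the scores unchanged when `phi` is below 1 is an extra assumption of this example, not something the slide states. The input would normally be the winsorised Z scores.

```python
import math

def overdispersion_factor(z_scores):
    """phi = (sum of squared Z scores) / I, i.e. Pearson X^2 over I."""
    return sum(z * z for z in z_scores) / len(z_scores)

def multiplicative_z(z_scores):
    """Divide each Z score by sqrt(phi); this is equivalent to
    widening the funnel limits by sqrt(phi)."""
    phi = overdispersion_factor(z_scores)
    # Assumption: never shrink the limits when phi < 1.
    scale = math.sqrt(phi) if phi > 1 else 1.0
    return [z / scale for z in z_scores]

if __name__ == "__main__":
    z = [2.0, -2.0, 4.0, -4.0]
    print(overdispersion_factor(z))
    print(multiplicative_z(z))
```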

21 Multiplicative over-dispersion - a funnel plot (not winsorised, $\phi$ = 21.45)

22 Multiplicative over-dispersion - a funnel plot (10% winsorised, $\phi$ = 13.97)

23 Winsorising
Winsorising consists of shrinking in the extreme Z-scores to some selected percentile, using the following method.
1. Rank cases according to their naive Z-scores.
2. Identify $Z_q$ and $Z_{1-q}$, the (100*q)% most extreme top and bottom naive Z-scores, where q is the chosen proportion (for example, q = 0.1 for the 10% winsorising shown in these slides).
3. Set the lowest (100*q)% of Z-scores to $Z_q$, and the highest (100*q)% of Z-scores to $Z_{1-q}$. These are the Winsorised statistics.
This retains the same number of Z-scores but discounts the influence of outliers.
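
Illustrative sketch (not part of the original slides): the winsorising steps above in a few lines. The function name and the way the cut-off cases are picked (the k-th smallest and k-th largest after pulling in k = floor(I*q) cases at each end) are assumptions; other percentile conventions would give slightly different cut-offs.

```python
def winsorise(z_scores, q=0.1):
    """Pull the lowest and highest (100*q)% of Z scores in to the
    cut-off values Z_q and Z_(1-q); the number of scores is unchanged."""
    n = len(z_scores)
    k = int(n * q)                # cases to pull in at each end
    if k == 0:
        return list(z_scores)     # too few cases to winsorise
    ordered = sorted(z_scores)    # step 1: rank the cases
    z_low = ordered[k]            # step 2: Z_q
    z_high = ordered[n - 1 - k]   #         Z_(1-q)
    # Step 3: clamp every score into [Z_q, Z_(1-q)], keeping input order.
    return [min(max(z, z_low), z_high) for z in z_scores]

if __name__ == "__main__":
    z = [-9, -2, -1, -0.5, 0, 0.5, 1, 2, 3, 12]
    print(winsorise(z, 0.1))
```

Unlike trimming, no trust is dropped, so the count `I` used in later formulas stays the same.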

24 Winsorising (figure panels: non-winsorised; 10% winsorised)

25 Random effects additive variance model
Based on a technique developed for meta-analysis
Originally designed for combining the results of disparate studies into the same effect
In meta-analysis terms, consider the indicator value of each trust to be a separate study
Essentially seeks to compare each trust to a null distribution instead of a point
Assumes that $E[y_i] = \theta_i$, and $V[\theta_i] = \tau^2$
Uses a method-of-moments method to estimate $\tau^2$ (DerSimonian and Laird, 1986)
Based on winsorised estimate of $\phi$

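26 Random effects additive variance model
If $I\phi < (I - 1)$ then
  – the data are not over-dispersed, and $\tau = 0$
  – use standard funnel limits / naïve Z scores
Otherwise:
  $\tau^2 = \dfrac{I\phi - (I - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}$
  where $w_i = 1 / s_i^2$
The new random-effects Z score is then calculated as:
  $Z_i = \dfrac{y_i - t}{\sqrt{s_i^2 + \tau^2}}$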
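
Illustrative sketch (not part of the original slides): a method-of-moments estimate of the between-trust variance (called `tau2` here) and the resulting random-effects Z scores. It assumes the standard DerSimonian and Laird form of the estimator with weights w_i = 1 / s_i^2, and that `phi` is the mean of the squared winsorised Z scores; the function and argument names are made up for the example.

```python
import math

def tau_squared(z_winsorised, s):
    """Between-trust variance from winsorised Z scores and the
    local standard errors s (one per trust)."""
    n_trusts = len(z_winsorised)
    phi = sum(z * z for z in z_winsorised) / n_trusts
    if n_trusts * phi < n_trusts - 1:
        return 0.0                       # not over-dispersed
    w = [1.0 / (si * si) for si in s]    # w_i = 1 / s_i^2
    sum_w = sum(w)
    sum_w2 = sum(wi * wi for wi in w)
    return (n_trusts * phi - (n_trusts - 1)) / (sum_w - sum_w2 / sum_w)

def random_effects_z(y, t, s, tau2):
    """Z scores with tau2 added to each trust's sampling variance."""
    return [(yi - t) / math.sqrt(si * si + tau2) for yi, si in zip(y, s)]

if __name__ == "__main__":
    tau2 = tau_squared([2.0, -2.0, 2.0, -2.0], [1.0, 1.0, 1.0, 1.0])
    print(tau2)
    print(random_effects_z([7.0, 3.0], 5.0, [1.0, 1.0], tau2))
```

When `tau2` comes back as zero the random-effects Z scores reduce to the naïve ones, matching the first branch on the slide.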

27 Comparing to a null distribution

28 Additive over-dispersion - a funnel plot (20% winsorised)

29 Effects on the banding of trusts - Readmissions 2002/03 data
Table columns: Significantly below average | Below average | Average | Above average | Significantly above average
Table rows: Previous banding method; Random-effects (20% winsorised)

30 Why we chose the additive variance method
Generally avoids situations where two trusts which have the same value for the indicator get put in different bands because of precision
A multiplicative model would increase the variance at some trusts more than at others
  – e.g. a small trust with large variance would be affected much more than a large trust with small variance
By contrast, an additive model increases the variance at all trusts by the same amount
Better conceptual fit with our understanding of the problem, that the factors inflating variance affect all trusts equally, so an additive model is preferable
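
Illustrative sketch (not part of the original slides): how much each model adds to a trust's sampling variance. The function name and all the numbers are hypothetical; `s2` is the trust's sampling variance, `phi` the multiplicative over-dispersion factor and `tau2` the additive between-trust variance.

```python
def variance_increase(s2, phi, tau2):
    """Return (multiplicative increase, additive increase) in variance.
    Multiplicative: s2 becomes phi * s2, an increase of (phi - 1) * s2.
    Additive: s2 becomes s2 + tau2, an increase of tau2 for every trust."""
    return ((phi - 1.0) * s2, tau2)

if __name__ == "__main__":
    # Hypothetical small trust (large variance) and large trust (small variance).
    print("small trust:", variance_increase(4.0, 3.0, 1.0))
    print("large trust:", variance_increase(0.25, 3.0, 1.0))
```

Under the multiplicative model the small trust gains sixteen times as much variance as the large one in this example, while the additive model adds the same amount to both.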

31 References:
DJ Spiegelhalter (2004) Funnel plots for comparing institutional performance. Statistics in Medicine, 24, (to appear)
DJ Spiegelhalter (2004) Handling over-dispersion of performance indicators (submitted)
R DerSimonian & N Laird (1986) Meta-analysis in clinical trials. Controlled Clinical Trials, 7:
Acknowledgements: David Spiegelhalter, Adrian Cook, Theo Georghiou
Thank you