1 Difficulties in analysing non-randomised trials (…and ways forward?) RCTs in the Social Sciences: challenges and prospects. York University, 13-15 Sept.

Presentation transcript:

1 Difficulties in analysing non-randomised trials (…and ways forward?)
RCTs in the Social Sciences: challenges and prospects. York University, 13-15 Sept.
Paul Marchant, Leeds Metropolitan University
(Paul Baxter, from the Department of Statistics, University of Leeds, is involved in developing some of this work.)

2 The Basic Point
My thoughts:
If non-RCTs are used, we need a good understanding of the system being studied and a quantitative model to work out what is lost and what the effect is.
The effects being sought may be small, so the impact of small systematic errors can be important.
It is probably best just to use RCTs, especially when the policy implications are costly.

3 The problem
In crime research there is a 5-point ‘Maryland Scientific Methods Scale’ which orders trial designs (the RCT is at the top).
While the ordering may be fine, there is no formal indication of what is lost by using a 4 rather than a 5.
There would seem to be a large potential for drawing false inferences.

4 The Randomised Controlled Trial (A truly marvellous scientific invention)
Note, to avoid ‘bias’:
Allocation is best made tamper-proof (e.g. use ‘concealment’).
Use multiple blinding of: patients, physicians, assessors, analysts …
[Diagram: Population → Take sample → Randomise to 2 groups (Old Treatment, New Treatment) → Compare outcomes (averages), recognising that these are sample results and subject to sampling variation when applying back to the population.]

5 Counts of those cured and not cured under the two treatments

                                Cured    Not Cured
New Treatment                     a          b
Control (Standard treatment)      c          d

By comparing the ratios of numbers ‘cured’ to ‘not cured’ in the 2 arms of the trial, via the CPR = (ad)/(cb), it is possible to tell if the new treatment is better.
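As a hedged illustration of the ratio on this slide, the sketch below computes CPR = (ad)/(cb) for a 2×2 table. The function name `cross_product_ratio` and the counts are invented for the example; they do not come from the presentation.

```python
def cross_product_ratio(a, b, c, d):
    """Cross-product ratio (ad)/(cb) for a 2x2 table.

    a, b: cured / not cured under the new treatment
    c, d: cured / not cured under the control treatment
    """
    if b == 0 or c == 0:
        raise ValueError("b and c must be non-zero for the ratio to be defined")
    return (a * d) / (c * b)


# Hypothetical counts, invented purely for illustration.
cpr = cross_product_ratio(a=30, b=70, c=20, d=80)
print(f"CPR = {cpr:.3f}")
```

With the table laid out this way, a CPR above 1 means the odds of being cured are higher under the new treatment in the sample; whether that difference is more than sampling variation is a separate question.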

6 Confidence Intervals
However, there is sampling variability, because we don’t study everybody of interest, just our random sample.
So we cannot have perfect knowledge of the effect of interest, but only an estimate of it within a confidence interval (CI).
We need to know how to calculate the CI appropriately. This can be done under assumptions which seem reasonable for the case of a clinical RCT, and leads to a simple formula for the approximate CI (± 1.96 standard errors) of ln(CPR):
$(\mathrm{s.e.}(\ln \mathrm{CPR}))^2 = \operatorname{Var}(\ln \mathrm{CPR}) = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$
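The interval described on this slide can be sketched as follows, assuming the usual approximation Var(ln CPR) = 1/a + 1/b + 1/c + 1/d for independent counts. The function name and the counts are hypothetical.

```python
import math


def log_cpr_confidence_interval(a, b, c, d, z=1.96):
    """Return (estimate, lower, upper) for ln(CPR) from a 2x2 table of counts."""
    log_cpr = math.log((a * d) / (c * b))
    # Standard error of ln(CPR), valid only if the counted events are independent.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_cpr, log_cpr - z * se, log_cpr + z * se


# Hypothetical counts, invented purely for illustration.
estimate, lower, upper = log_cpr_confidence_interval(30, 70, 20, 80)

# Back-transform to the CPR scale.
cpr_interval = (math.exp(lower), math.exp(upper))
print(f"ln(CPR) = {estimate:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
print(f"CPR 95% CI ({cpr_interval[0]:.3f}, {cpr_interval[1]:.3f})")
```

The interval is symmetric on the log scale and becomes asymmetric once exponentiated back to the CPR scale.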

7 Crime counts before and after in two areas, one of which gets a crime reduction initiative (CRI) (4 on the Methods Scale)
A similar table results. But this is not the same as the RCT set-up, as:
1. It is not randomised, so no statistical equivalence exists at the start.
2. The unit is area, rather than crime event.

                                                                    Before    After
Treatment Area (Intervention is introduced between the 2 periods)     a         b
Comparison Area (Nothing is changed)                                  c         d

8 Lighting and crime
There seem to be many ‘theoretical suggestions’ as to why lighting might increase or decrease crime.
The meta-analysis by Farrington and Welsh, HORS251, suggests strongly that lighting beats crime.
However, my contention is that this study remains flawed, and so we are ignorant of the effect of lighting on crime. (Note also HORS252 on CCTV.)

9 Forest Plot as HORS 251 Meta-analysis reconstructed

10 But this can’t be right.
The assumptions for calculating the CIs cannot be correct in this case: the unit is area, not crime event, and the events are not statistically independent.
Too much variation (heterogeneity) exists between the individual study results compared with the uncertainty indicated by the confidence intervals (if the lighting has the same effect on crime in every study).
Note that there is great variation in crime counts between periods in the comparison areas, where nothing is changed, so the heterogeneity is inherent to the natural variation of crime.

11 Pointing out the problem
Marchant (2004), a 7-page article in the British Journal of Criminology, drew attention to the problem: the formula used for the CIs must be inappropriate (it also mentions other shortcomings).
The authors of HORS251 had a 20-page response on the next page, justifying the claim that lighting reduces crime.
But I remain unconvinced by the claim.

12 Fixing the Heterogeneity Problem
A way of making the problem go away is simply to increase the uncertainty, i.e. stretch the CIs (‘a quasi-Poisson model’).
Here the CIs are stretched by a factor of 2.1 (equivalent to reducing the events counted in every setting by a factor of 4.4). This adjustment has been made by the authors.
Problem solved ... or is it? Is such a model plausible? It assumes every study should have its CI stretched by the same factor. This cannot be guaranteed.
There are only relatively few (13) studies.
A sensitivity analysis is needed.
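The equivalence between the two factors quoted on this slide can be checked with a short sketch. It assumes the usual approximation Var(ln CPR) = 1/a + 1/b + 1/c + 1/d, under which dividing every count by k multiplies the variance by k and the standard error by √k. The counts are hypothetical; only the factor 2.1 comes from the slide.

```python
import math


def se_log_cpr(a, b, c, d):
    """Standard error of ln(CPR) under the independent-counts approximation."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


stretch = 2.1                    # CI stretch factor quoted on the slide
count_reduction = stretch ** 2   # implied divisor for every count, about 4.4

# Hypothetical before/after counts for one study.
counts = (120.0, 90.0, 110.0, 105.0)

se_original = se_log_cpr(*counts)
se_reduced = se_log_cpr(*(n / count_reduction for n in counts))
ratio = se_reduced / se_original

print(f"count reduction factor = {count_reduction:.2f}")
print(f"standard error ratio   = {ratio:.2f}")
```

The same ratio comes out whatever counts are used, which is exactly the point the slide questions: one common factor is applied to every study regardless of how variable crime in its areas really is.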

13 Time Variation in Crime
It appears that little is known about how crime varies on various scales.
Much more needs to be known about the occurrence of crime events in order to know how to analyse them properly and so be able to find effects.
Access to suitable data sets is needed to examine this issue. This is ongoing research in which my colleagues and I are engaged.
A general point: one needs to have knowledge about the system in order to understand whether an intervention changes things (and in order to design studies).

14 The Bristol Study (Shaftoe 1994)
Shaftoe said ‘no discernable lighting benefit’, but HORS251 said z = 6.6.
Note: had the data for the year immediately prior to the introduction of the relighting (i.e. periods 2 and 3) been used, rather than unnaturally using periods 1 and 2, which leaves a gap of ½ year, the effect found would have been half of that claimed. (This shows the large variability.)

15 Household studies
In a couple of instances, instead of just counting recorded crimes a, b, c, d in the 4 cells (before, after, intervention, comparison), a household survey of recalled crimes is carried out before and after within the 2 areas (intervention, comparison).
One problem (unrecognised by the authors, Painter and Farrington) is that spatial correlation between occurrences of crime needs to be considered. This gives rise to a Design Effect, familiar in clustered designs, which reduces the precision of the estimate of effect.
There are other problems, e.g. differential change of composition between periods.
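To make the Design Effect concrete, here is a minimal sketch using the common textbook approximation DEFF = 1 + (m − 1)ρ for clusters of size m with intra-cluster correlation ρ. That formula is an assumption brought in for illustration; the slide does not say which form applies to the household surveys, and every number below is hypothetical.

```python
def design_effect(cluster_size, intra_cluster_correlation):
    """Textbook approximation to the design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * intra_cluster_correlation


def effective_sample_size(n, deff):
    """Number of independent observations carrying the same information."""
    return n / deff


# Hypothetical survey: 400 households in clusters of 20, correlation 0.05.
deff = design_effect(cluster_size=20, intra_cluster_correlation=0.05)
n_eff = effective_sample_size(400, deff)
se_inflation = deff ** 0.5  # factor by which standard errors grow

print(f"design effect         = {deff:.2f}")
print(f"effective sample size = {n_eff:.0f}")
print(f"s.e. inflation factor = {se_inflation:.2f}")
```

Even a small positive correlation between neighbouring households noticeably shrinks the effective sample size, so confidence intervals that ignore it are too narrow.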

16 Lack of Equivalence between Areas
Invariably it is the most crime-ridden area that gets the lighting, whereas the relatively crime-free ‘control’ area is not re-lit. So there is a lack of equivalence at the start.
One effect of this is to allow ‘regression towards the mean’ to operate.
The name ‘Control Area’ is a misnomer. ‘Comparison Area’ is a better name.

17 Regression towards the mean
[Figure: scatter plot of the after measurement (Y) against the before measurement (X), showing the cloud of data points, the line of equality, and the line of the mean of Y for a given X.]
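The picture on this slide can be mimicked with a small simulation. This is only a sketch under stated assumptions: standardised before and after measurements that are bivariate normal with correlation `rho`, no intervention anywhere, and the ‘worst’ tenth of areas picked on the before measurement. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_rtm(n_areas=20000, rho=0.8, seed=1):
    """Return mean before and after values for the worst 10% of areas."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(n_areas):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # y has unit variance and correlation rho with x; nothing else changes.
        y = rho * x + (1.0 - rho ** 2) ** 0.5 * noise
        before.append(x)
        after.append(y)

    # Select the areas with the highest before measurement.
    cutoff = sorted(before)[int(0.9 * n_areas)]
    selected = [(x, y) for x, y in zip(before, after) if x >= cutoff]
    mean_before = statistics.fmean(x for x, _ in selected)
    mean_after = statistics.fmean(y for _, y in selected)
    return mean_before, mean_after


mean_before, mean_after = simulate_rtm()
print(f"selected areas: mean before = {mean_before:.2f}, mean after = {mean_after:.2f}")
```

The selected areas look better afterwards although nothing was done to them. In this standardised set-up the after mean is pulled towards the overall mean by roughly the factor `rho`, so a weaker correlation gives a larger spurious ‘improvement’.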

18 The response given to the lack of equivalence between the 2 areas (RTM)
Farrington and Welsh (2006) claim that RTM is not a problem because the effect in counted crimes in 250 Police ‘Basic Command Units’ going from 2002/3 to 2003/4 was only small (a few %). This is hardly surprising, as the areas, and hence the numbers of crimes counted, are an order of magnitude larger than in HORS251, so the year-to-year correlation is expected to be higher than for the small lighting study areas.
Note Wrigley (1995): “This tendency for correlation coefficients to increase in magnitude as the size of the areal unit involved increases has been known since the work of Gehlke and Biehl (1934)”.

19 Log crime rates in successive periods

20 Estimating the effect of RTM
On the basis of log-normal crime rates it can be shown that, if the intervention has no effect:
$\operatorname{E}(\ln \mathrm{CPR}) = (1 - \rho\,\sigma_y/\sigma_x)\,\ln(x_1/x_2)$
$\operatorname{Var}(\ln \mathrm{CPR}) = 2\,\sigma_y^2\,(1 - \rho^2)$
where $x_1/x_2$ is the crime rate ratio, $\sigma_x$ and $\sigma_y$ are the sds on the log scale, and $\rho$ is the correlation on the log scale.
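The two expressions on this slide translate directly into code. The sketch below is a plain transcription, not the author's software; the function names and the example inputs (equal sds of 0.5 on the log scale) are hypothetical.

```python
import math


def expected_log_cpr(rate_ratio, rho, sd_before, sd_after):
    """Expected ln(CPR) from RTM alone, i.e. with no intervention effect.

    rate_ratio: initial crime rate ratio x1/x2 between the two areas
    rho:        before/after correlation on the log scale
    sd_before:  sigma_x, sd of log rates before
    sd_after:   sigma_y, sd of log rates after
    """
    return (1.0 - rho * sd_after / sd_before) * math.log(rate_ratio)


def variance_log_cpr(rho, sd_after):
    """Variance of ln(CPR) under the same model."""
    return 2.0 * sd_after ** 2 * (1.0 - rho ** 2)


# Hypothetical inputs chosen only to exercise the formulas.
mean_effect = expected_log_cpr(rate_ratio=1.4, rho=0.8, sd_before=0.5, sd_after=0.5)
var_effect = variance_log_cpr(rho=0.8, sd_after=0.5)
print(f"expected ln(CPR) = {mean_effect:.4f}, variance = {var_effect:.4f}")
```

Both quantities vanish when the correlation is perfect and the sds are equal, and both grow as the correlation weakens, which is why the size of the areas matters.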

21 Estimation of the effect of RTM
The simple model of crime rates suggests that the high year-to-year correlation, typically 0.95 for the BCU data, would indeed give an effect of a few %.
However, the smaller areas used in CRI evaluation would be expected to have a lower correlation.
Burglary data from a study of 124 areas has a correlation of about 0.8, giving, all else equal, an expected effect 4 times larger, comparable to the claimed lighting effect.
Note: in general we don’t know the correlation nor the rates being compared for the lighting studies. However, we do know that, whereas the household crime rate ratio at the start was 1.40 for Dudley, that for Stoke was 2.51, giving a much larger expected RTM effect.
Without better knowledge we can’t be definite about the impact of RTM, but the indications are that the bias could be serious and the uncertainty large.
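The arithmetic behind this slide can be sketched using the relation expected ln CPR = (1 − ρσ_y/σ_x) ln(x_1/x_2). The sketch assumes σ_y = σ_x (‘all else equal’) and borrows the correlation of 0.8 from the burglary data. The true correlations for the lighting studies are unknown, so the scenarios are illustrative rather than estimates for those studies.

```python
import math


def expected_cpr(rate_ratio, rho, sd_ratio=1.0):
    """Expected CPR from RTM alone; sd_ratio = sigma_y / sigma_x (assumed 1)."""
    return math.exp((1.0 - rho * sd_ratio) * math.log(rate_ratio))


# (label, initial crime rate ratio, assumed correlation) - illustrative only.
scenarios = [
    ("rate ratio 1.40, correlation 0.95 (BCU-like)", 1.40, 0.95),
    ("rate ratio 1.40 (Dudley), correlation 0.8", 1.40, 0.80),
    ("rate ratio 2.51 (Stoke), correlation 0.8", 2.51, 0.80),
]

for label, ratio, rho in scenarios:
    percent = 100.0 * (expected_cpr(ratio, rho) - 1.0)
    print(f"{label}: spurious effect of about {percent:.1f}%")

# Dropping the correlation from 0.95 to 0.8 multiplies the log-scale effect by 4.
log_ratio = math.log(expected_cpr(1.40, 0.80)) / math.log(expected_cpr(1.40, 0.95))
```

Under these assumptions the spurious effect rises from roughly 2% to roughly 7% as the correlation falls from 0.95 to 0.8, and to roughly 20% for the larger initial rate ratio.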

22 Expected natural log of CPR and its CI for a set of burglary data.

23 Potential consequences of weak methods
Because there is a tendency to find ‘positive effects’, probably even more so with less rigorous work, one is likely to end up with an even more distorted research record.
This might lead to a bad policy being justified through flimsy cost-benefit analyses.
While it might be possible to estimate the effect of the excess variability or the effect of RTM discussed, it would seem problematic to be confident about adequately adjusting for them.
RCTs would avoid many problems and may be very cheap relative to policy costs.

24 Some conclusions
A ‘Methods Scale’ seems to suggest that designs weaker than RCTs might suffice, without indicating what is lost.
I have indicated some of the problems which result.
We need to ‘foster scepticism’ (Gorard 2002).
I remain to be convinced that the deficiencies can be adequately overcome through estimating quantitatively the consequences of using a weaker design.
Weaker designs might be useful in preliminary research but should not be considered adequate when there are expensive consequences.
RCTs can be problematic enough! (We need registered trials, published protocols, blinding etc.)
Evaluations of policies need to be done to a high scientific standard.

25 References
Farrington D.P. and Welsh B.C. (2002) The Effects of Improved Street Lighting on Crime: A Systematic Review, Home Office Research Study 251.
Farrington D.P. and Welsh B.C. (2004) Measuring the Effects of Improved Street Lighting on Crime: A Reply to Dr. Marchant, The British Journal of Criminology.
Farrington D.P. and Welsh B.C. (2006) How Important is Regression to the Mean in Area-Based Crime Prevention Research?, Crime Prevention and Community Safety 8 50
Gorard S. (2002) Fostering Scepticism: The Importance of Warranting Claims, Evaluation and Research in Education 16 3 p
Marchant P.R. (2004) A Demonstration that the Claim that Brighter Lighting Reduces Crime is Unfounded, The British Journal of Criminology.

26 References continued
Marchant P.R. (2005) What Works? A Critical Note on the Evaluation of Crime Reduction Initiatives, Crime Prevention and Community Safety.
Painter K. and Farrington D.P. (1997) The Crime Reducing Effect of Improved Street Lighting: The Dudley Project, in R.V. Clarke (ed.) Situational Crime Prevention: Successful Case Studies, Harrow and Heston, Guilderland NY.
Shaftoe H. (1994) Easton/Ashley, Bristol: Lighting Improvements, in S. Osborn (ed.) Housing Safe Communities: An Evaluation of Recent Initiatives, 72-77, Safe Neighbourhoods Unit, London.
Tilley N., Pease K., Hough M. and Brown R. (1999) Burglary Prevention: Early Lessons from the Crime Reduction Programme, Crime Reduction Research Series Paper 1, Home Office, London.
Wrigley N. (1995) Revisiting the Modifiable Areal Unit Problem and Ecological Fallacy, pp. 49-71 in P.R. Gould, A.G. Hoare and A.D. Cliff (eds) Diffusing Geography: Essays for Peter Haggett.

27 The RTM problem
The effect of RTM depends on the correlation (the weaker the correlation, the bigger the effect) and increases with the size of the initial difference between groups.
The authors attempt to justify having no concern about RTM with large-area crime data, which show only a small RTM effect. But this is wrong, as the correlation won’t be as high in the smaller areas used in the trials. We also don’t know the rates in the areas in general; for the 2 studies where we do, they are quite different (1.4× and 2.5×).