The Campbell Collaboration – www.campbellcollaboration.org
C2 Training: May 9–10, 2011
Interpretation of Effect Sizes

Why Do We Need to Interpret Effect Sizes?
The importance of some intervention effects is intuitively understood:
– Change in earning power: “College graduates will earn $XX more in their lifetimes than non-graduates.”
– Risk ratio: “… are 1.4 times more likely to …”
– Grade-level equivalency: “Students receiving the intervention scored 5.3 GLE while students not receiving the intervention scored 4.9 GLE.”
But most are not:
– “Statistically significant effect”
– Correlation of +.35, d = -.15
In most cases, we’ll be working with effects that have to be translated so people will have some idea how to interpret them.

Options for Expressing Study Results in an Understandable Metric
Statistical significance:
– Sometimes naively used as a proxy for effect size
– But trivially small effects can be statistically significant
– And large effects can be statistically nonsignificant
– Remember, a p-value expresses the likelihood of observing a result at least this large, assuming a true null hypothesis (see the sketch below)
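A quick simulation makes the first point concrete. This is an illustrative sketch, not from the slides; the effect size (d = 0.02) and sample size are invented to show that an utterly trivial effect turns “significant” once n is large enough:

```python
# Illustrative sketch: a trivially small effect (d = 0.02, invented here)
# becomes statistically significant with a large enough sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 500_000                                           # per group (assumed)
treatment = rng.normal(loc=0.02, scale=1.0, size=n)   # true d = 0.02
control = rng.normal(loc=0.00, scale=1.0, size=n)

t, p = stats.ttest_ind(treatment, control)
print(f"d = 0.02, n = {n:,} per group, p = {p:.2e}")  # p far below .05
```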

More on ES and Statistical Significance
Some students learn that if a statistical test fails to reject the null, the population effect must be zero:
– For example, that the intervention is ineffective
– This is one reason people confuse statistical significance with practical significance (as in: if it is not statistically significant, it can’t be practically significant)
– However …

Point Estimation vs. Interval Estimation
Interval estimation:
– Confidence intervals tell us the likely range of population values
– If a study’s confidence interval for the effect on IQ scores runs from .1 to 10.1 points, that is the likely range of the treatment effect as suggested by this study
Point estimation:
– Point estimates (e.g., the mean) tell us the single most likely value of the population parameter
Point estimation and interval estimation are best kept separate; asserting that the treatment effect is zero whenever the test is not statistically significant confounds the two activities (sketch below).
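As a minimal sketch of the two activities side by side, the snippet below computes both a point estimate and a 95% confidence interval for a treatment effect on IQ scores. The group sizes, means, and SDs are invented so the interval comes out near the .1-to-10.1 range used on the slide:

```python
# Sketch: point estimate vs. interval estimate for a treatment effect on IQ.
# All data values are invented for illustration.
import math
from scipy import stats

n_t = n_c = 70                  # assumed group sizes
mean_t, mean_c = 105.1, 100.0   # hypothetical IQ means
sd = 15.0                       # IQ standard deviation, both groups

diff = mean_t - mean_c                        # point estimate
se = math.sqrt(sd**2 / n_t + sd**2 / n_c)     # SE of the difference
t_crit = stats.t.ppf(0.975, df=n_t + n_c - 2)
lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"Point estimate: {diff:.1f} IQ points; 95% CI [{lo:.1f}, {hi:.1f}]")
```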

Counternull Value of an Effect Size
The counternull value of an effect size points out this problem:
– Assume a study finds d = +.30, p = .10
– Classic H0: δ = 0
– Counternull H0: δ = +.60 (twice the observed ES minus the null value)
There is exactly as much evidence supporting the “classic” null hypothesis as there is supporting the counternull hypothesis! (The ES is not statistically different from either 0 or +.60.)
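The symmetry is easy to verify numerically. In the sketch below, the standard error is backed out of the reported two-sided p-value (my assumption about how that p was obtained); testing d = +.30 against 0 and against the counternull +.60 yields the same p:

```python
# Sketch: counternull = 2 * observed ES - null value = 2(.30) - 0 = +.60.
# The data are equally (in)consistent with the null and the counternull.
from scipy import stats

d_obs, p_two_sided = 0.30, 0.10
z_crit = stats.norm.ppf(1 - p_two_sided / 2)   # ~1.645
se = d_obs / z_crit                            # SE implied by the reported p

for null in (0.0, 0.60):                       # classic null, counternull
    z = (d_obs - null) / se
    p = 2 * stats.norm.sf(abs(z))
    print(f"H0: delta = {null:+.2f} -> p = {p:.2f}")   # both lines: p = 0.10
```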

Proportion of Variance Explained
Common for correlations (r²) and multiple regression (R²).
Research suggests that neither experienced researchers nor experienced statisticians have a good feel for the practical meaning of this type of effect size (Rosenthal, 1984):
– Typically, even well-trained individuals underestimate the importance of results stated in terms of proportion of variance explained
– Not to mention policy makers and the general public

More on Proportion of Variance Explained
Consider a study:
– A program designed to improve the graduation rate among “at-risk” students
– φ = +.32, φ² = .10 (remember, φ is a correlation between two dichotomous variables)
Using proportion of variance as the effect size, one might be tempted to label this a small or even trivial effect, as only 10% of the variance in graduation rates can be attributed to the intervention. But …

Binomial Effect Size Display

                         Graduated   Did not graduate
Received intervention        66              34
Control                      34              66

φ = .32

Physicians’ Aspirin Study
Subsequent heart attack rates:

            No heart attack   Heart attack
Aspirin          10,993            104
Placebo          10,845            189

φ = .03, φ² = .0009, p < .0001, OR = .55, risk ratio = .55 (men who take aspirin have 0.55 times the risk of a second heart attack, i.e., 45% fewer)

Fatality rates, given a second heart attack:

            Survived   Died
Aspirin          99       5
Placebo         171      18

φ = .08, φ² = .006, p = .16, OR = .48, risk ratio = .51

Computing the BESD
For dichotomous outcomes, the BESD illustrates the change in “success rate” corresponding to particular values of r – for example, the number of additional graduates. It is computed (simply) as:
– Treatment group success rate = .50 + (r/2)
– Control group success rate = .50 − (r/2)
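In code the BESD is a pair of one-liners; applying it to the graduation example (r = .32) reproduces the 66/34 table above. The function name is mine:

```python
# BESD: convert a correlation r into the implied "success rates" per group.
def besd(r: float) -> tuple[float, float]:
    """Return (treatment success rate, control success rate)."""
    return 0.50 + r / 2, 0.50 - r / 2

treat, control = besd(0.32)
print(f"Treatment: {treat:.0%}, Control: {control:.0%}")  # 66%, 34%
```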

Risk Ratios
Defined as:
(events in the treatment group ÷ treatment group n) ÷ (events in the control group ÷ control group n)
Interpreted as “the ratio of risk in the treatment group relative to the risk in the control group”:
– The risk ratio for having a second heart attack was .55: men who take aspirin face 55% of the control group’s risk, i.e., 45% fewer second heart attacks
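Applying the definition to the aspirin counts above recovers the .55 (the helper name is mine):

```python
# Risk ratio from 2x2 counts: (events_t / n_t) / (events_c / n_c).
def risk_ratio(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    return (events_t / n_t) / (events_c / n_c)

# Physicians' aspirin study: 104 of 11,097 vs. 189 of 11,034 second attacks
rr = risk_ratio(104, 104 + 10_993, 189, 189 + 10_845)
print(f"RR = {rr:.2f}")  # 0.55: aspirin takers face 55% of the control risk
```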

Odds vs. Risk Ratios
OR and RR are very similar when events are rare; as events become more common, they diverge:
– Study 1 (rare events): OR = .40, RR = .401
– Study 2 (common events): OR = 1.50, RR = 1.25

Study 1     Event   Non-event
Treatment       2       1,000
Control         5       1,000

Study 2     Event   Non-event
Treatment     500         500
Control       400         600

Generally, logged ORs have somewhat better properties for meta-analysis, and any OR can be converted to an RR for interpretation (sketch below).
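Because OR and RR are linked through the control-group risk p0 by exact algebra (from the definitions, RR = OR ÷ (1 − p0 + p0 × OR)), the conversion is a one-liner. A sketch using the two studies above; the function name is mine:

```python
# Convert an odds ratio to a risk ratio given the control-group risk p0.
# From the definitions: RR = OR / (1 - p0 + p0 * OR).
def or_to_rr(odds_ratio: float, p0: float) -> float:
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

print(f"{or_to_rr(0.40, 5 / 1005):.3f}")  # Study 1 (rare events): 0.401, ~ OR
print(f"{or_to_rr(1.50, 0.40):.2f}")      # Study 2 (common events): 1.25, != OR
```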

Risk Difference
Interpreted as the difference in risks between the two groups. Defined as:
(a ÷ (a + b)) − (c ÷ (c + d))

            Heart attack      No heart attack
Aspirin        104 (a)          10,993 (b)
Placebo        189 (c)          10,845 (d)

104 ÷ (104 + 10,993) − 189 ÷ (189 + 10,845) = .0094 − .0171 = −.0077 (or .77%)

Number Needed to Treat
Number needed to treat (NNT) is an additional way to interpret dichotomous outcomes: how many people have to receive the intervention to produce one more positive (or one fewer negative) event? Defined as 1 ÷ risk difference.
– Here, NNT = 1/.0077 ≈ 130: about 130 men who have had a heart attack need to take aspirin to prevent one additional second heart attack
– For the fictitious program designed to increase graduation rates among “at-risk” students, RD = .66 − .34 = .32, so NNT = 1/.32 ≈ 3.1: for roughly every three people who participate in the program, one additional person will graduate
A sketch of both computations follows.
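Both quantities in one short sketch (function name is mine; the counts are the aspirin data from the risk-difference slide):

```python
# Risk difference and number needed to treat from 2x2 counts.
def risk_difference(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    return events_t / n_t - events_c / n_c

rd = risk_difference(104, 11_097, 189, 11_034)   # about -0.0077
nnt = 1 / abs(rd)                                # ~129-130, depending on rounding
print(f"RD = {rd:.4f}; NNT = {nnt:.0f} men treated per second heart attack prevented")
```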

Interpretation of Effect Sizes

Cohen’s Benchmarks
Jacob Cohen (1988) proposed general definitions for interpreting effect size estimates:

           d-index     r
Small        .20      .10
Medium       .50      .30
Large        .80      .50

More on Cohen
Lipsey & Wilson (1993) analyzed 183 meta-analyses in the social sciences:
– 25th percentile: d = .25
– 50th percentile: d = .38
– 75th percentile: d = .62
Cohen intended these to be “rules of thumb” and emphasized that they represent average effects from across the social sciences:
– He cautioned that in some areas smallish effects may be more typical, due to measurement error and the relative weakness of interventions
– He did not intend them to stand as estimates of practical significance!

Yet Another Cohen Metric: U3
– See Cooper, pp. … (esp. the table on p. 130)
– [Table mapping d to U3 (%); under normal, equal-variance assumptions, U3 = Φ(d) × 100]

More on U3
– “What percentage of scores in the lower-meaned group was exceeded by the average score in the higher-meaned group?”
– “What is the probability that a randomly selected member of the treatment group will outperform a randomly selected member of the control group?”
– Example: for HS students, homework has a d of +.20. Imagine two high schools, each with exactly 100 students.
– If the average student in the homework high school moved to the high school with no homework, her rank would improve from 50 to 42 (from the 50th percentile to the 58th percentile).
– If you randomly selected one student from the homework high school and one from the non-homework high school, and did that many times, you’d expect the homework student to outscore the non-homework student about 58% of the time.
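A short sketch of the computation, with two caveats: under a normal, equal-variance model, U3 is simply the standard normal CDF evaluated at d; and, strictly speaking, the “randomly selected member” question is answered by Φ(d/√2) (about 56% for d = .20) rather than by U3 = Φ(d), so the 58% figure corresponds to the percentile interpretation:

```python
# U3 under normal, equal-variance assumptions: U3 = Phi(d).
from scipy.stats import norm

for d in (0.0, 0.2, 0.5, 0.8):
    print(f"d = {d:.1f} -> U3 = {norm.cdf(d):.0%}")
# d = 0.2 -> U3 = 58%: the average "homework" student would sit at the
# 58th percentile of the no-homework school's score distribution.
```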

Converting Back to the Original Metric
It can sometimes be helpful to use the mean difference to translate back into a metric people are more accustomed to working with.
Example: assume we did a research synthesis and meta-analysis of the effects of homework on achievement among HS students. Outcomes included standardized test scores such as the SAT and ACT, plus chapter tests. Assume the overall result was d = +.20 and that type of outcome was not a moderator of effect sizes.
– SAT: average = 500, SD = 100
– ACT: average = 21, SD = 5
– “The overall effect suggests, for example, that the average student doing homework would see an increase in SAT scores from 500 to 520, or in ACT scores from 21 to 22.”
Cautions (sketch below):
– Comparing different constructs (e.g., math achievement vs. attendance) is difficult to impossible
– Even when tests are highly similar, if their distributions differ the comparisons can be misleading
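The translation itself is just d times the outcome’s standard deviation; a minimal sketch using the SAT and ACT figures above (function name is mine):

```python
# Translate a standardized mean difference back into raw score points.
def d_to_raw(d: float, sd: float) -> float:
    return d * sd

print(f"SAT: +{d_to_raw(0.20, 100):.0f} points (500 -> 520)")
print(f"ACT: +{d_to_raw(0.20, 5):.0f} point (21 -> 22)")
```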

Basic Strategy for Comparing Effect Sizes
Holding the intervention constant, are there differential effects across outcomes?
– Does summer school help math more than reading?
Holding the outcome constant, are there differential effects across interventions (or intervention components)?
– Does mentoring affect graduation rates more than tutoring?

Other Considerations When Comparing Effect Sizes
– Are some important outcomes completely missing from the evidence base?
– Are some interventions or intervention components missing from the evidence base?
– Is there covariation between interventions and study methodology?
– Is there covariation between interventions and outcome choice? (Be cautious about comparing different mediating variables.)

Reporting
1. Narrative
2. Tables
– Characteristics of included studies
– Excluded studies: specific reasons for exclusion
– Results of any multivariate analyses
3. Graphs
– Forest plots: study-level effects, pooled effects, homogeneity tests
– Funnel plots, trim-and-fill analysis

Interpretation of Results
1. Quality of available evidence (number of studies in the review, risk of bias)
2. Precision of study-level effects
3. Homogeneity of effects across studies
4. Pooled effects:
   a. magnitude and direction of the point estimate
   b. precision (confidence intervals)
   c. statistical and clinical significance
   d. potential sources of bias
5. Moderator analyses

Guidelines & Standards
1. Conduct of systematic reviews
– Cochrane Handbooks for systematic reviews of intervention effects (Higgins & Green, 2008) and of diagnostic test accuracy
2. Reporting
– PRISMA (Moher et al., 2009): Preferred Reporting Items for Systematic Reviews and Meta-Analyses
– APA reporting guidelines (2008)
3. Assessing the methodological quality of SRs
– AMSTAR (Shea et al., 2007)

Conclusion: Review Methods Matter
Systematic reviews can provide more accurate syntheses of empirical evidence than traditional reviews and stand-alone meta-analyses:
– They, rather than traditional reviews and stand-alone meta-analyses, ought to be used (along with other information) to inform policy and practice
– They should follow current guidelines and standards (Higgins & Green, 2008; Moher et al., 2009)

Informing Practice and Policy: The MST Story Continues
Evidence from the Cochrane/Campbell review:
– MST is neither more nor less effective than the alternatives
– Findings of no difference mean that policy decisions must be made on other grounds
MST continues in Sweden: practitioners and administrators like the MST structure and documentation.
MST was discontinued in Ontario: it was more expensive than equally effective alternatives.
Both decisions are based on the best available evidence.

Evidence for Practice and Policy
[Figure adapted from Gibbs (2003) and Davies (2004)]

The Story Continues
An update of the MST review is underway now:
– New studies
– New follow-up data on previous studies
– Results could change (in either direction), or not
Early studies may over-estimate effects:
– Novelty effects in cumulative meta-analysis (e.g., Trikalinos et al., 2004)

The Story Continues: The Science of Research Synthesis Is Rapidly Evolving
– The Cochrane Handbook is a “living” document (available at …)
– New journal from the Society for Research Synthesis Methodology and Wiley/Blackwell: Research Synthesis Methods

The Future of Research Synthesis
On the horizon:
– Semi-automated screening of titles and abstracts (Wallace et al., 2010)
– Integrated software to manage all stages of the SR process
– Better access to data to counteract reporting and publication biases (e.g., the WHO global platform for prospective registers)
– Better tests and corrections for publication bias (e.g., Moreno et al., 2009)
– Advances in meta-analysis for multivariate data and for diagnostic and prognostic tests
– Adjustments for bias in primary studies (Turner et al., 2009)
– Qualitative and mixed-methods syntheses
Beyond the horizon … ???