Effect size: Why what we teach psychologists is wrong Dr Thom Baguley, Psychology, Nottingham Trent University

Presentation transcript:

2 0. Overview
1. Introduction
2. Standardized effect size (or, I do not think it means what you think it means)
3. We aren’t selling T-shirts here
4. Never mind the quantity, feel the width
5. Big isn’t always better
6. Conclusions

3 1. Introduction
- Statistical significance does not imply practical significance (e.g., Rosenthal, 1994; Kirk, 1996)
- Practical significance depends (but by no means exclusively) on the magnitude of the effect
- Advice (e.g., from the APA) is to report effect size (e.g., alongside the results of a significance test)

4 2. Standardized effect size (or, I do not think it means what you think it means)
Simple (unstandardized) effect size uses the original units of measurement, e.g.,
- unstandardized regression slope (b)
- simple difference in group means (M1 - M2)
Standardized effect size replaces the original units with standard deviation units (or equivalents such as the variance), e.g.,
- standardized regression slope (β or r)
- Cohen’s d = (M1 - M2) / SD_pooled
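
As a minimal sketch of the two definitions above, the following Python computes a simple difference in means and Cohen's d for two groups. The reaction times are made-up illustrative numbers, not data from the talk, and the pooled SD uses the usual degrees-of-freedom weighting, which the slide does not spell out.

```python
import math
import statistics


def simple_difference(group1, group2):
    """Simple (unstandardized) effect size: M1 - M2 in the original units."""
    return statistics.mean(group1) - statistics.mean(group2)


def pooled_sd(group1, group2):
    """Pooled SD, weighting each sample variance by its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))


def cohens_d(group1, group2):
    """Standardized effect size: (M1 - M2) / SD_pooled."""
    return simple_difference(group1, group2) / pooled_sd(group1, group2)


# Hypothetical reaction times in ms (illustrative only).
treatment = [520.0, 540.0, 500.0, 560.0, 530.0]
control = [480.0, 510.0, 470.0, 500.0, 490.0]

print("simple difference (ms):", simple_difference(treatment, control))
print("Cohen's d:", round(cohens_d(treatment, control), 3))
```

The simple difference stays in milliseconds, whereas d is unit-free: rescaling both groups to seconds changes the former but not the latter.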

5 Problems with standardized effect size
- Standardized units are tricky to interpret because the original context is lost (in particular for applied research; see Baguley, 2004)
- Standardized units confound the magnitude of an effect with its variability
Q. Why is the latter a problem?
A. The variability of an effect is not stable …

6 Baguley (in press) discusses some of the factors that influence the variability of an effect:
i) reliability (measurement error)
ii) range restriction
iii) design of the study

7 Attenuation due to unreliability
- According to classical test theory, an observed correlation r_xy depends on the reliability with which X and Y are measured
- It follows that standardized effect size is distorted (and usually reduced) by measurement error
- Simple effect size is robust with respect to reliability (for analyses with orthogonal predictors)
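
Classical test theory usually expresses this dependence through Spearman's attenuation formula: observed r_xy = true correlation × sqrt(r_xx × r_yy), where r_xx and r_yy are the reliabilities of X and Y. The sketch below assumes that form. The true correlation of 0.5 and the reliability values are hypothetical.

```python
import math


def attenuated_r(true_r, rel_x, rel_y):
    """Expected observed correlation under Spearman's attenuation formula."""
    return true_r * math.sqrt(rel_x * rel_y)


true_r = 0.5  # hypothetical error-free correlation
for reliability in (1.0, 0.9, 0.7, 0.5):
    observed = attenuated_r(true_r, reliability, reliability)
    print(f"reliability {reliability:.1f} -> expected observed r = {observed:.3f}")
```

Under this formula the same underlying relationship yields a smaller standardized effect whenever either measure becomes less reliable.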

8 (a) The unstandardized slope between two normal, random variables, X and Y; Y = X
(b) The unstandardized slope, selecting only the upper and lower quartiles of X; Y = X
(c) The standardized slope of X and Y (r_99 = .605)
(d) The standardized slope of X and Y, selecting only the upper and lower quartiles of X (r_49 = .735)
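
The pattern in these panels can be illustrated with a small simulation. This is a sketch under assumed values (true slope 0.6, noise SD 0.8, 20,000 cases, fixed seed), not a reconstruction of the data behind the figure. It selects the upper and lower quartiles of X and compares the unstandardized slope with the correlation.

```python
import random
import statistics


def slope_and_r(xs, ys):
    """Return the least-squares slope of y on x and the Pearson correlation."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


rng = random.Random(1)   # fixed seed so the run is repeatable
n = 20000                # hypothetical sample size
true_slope = 0.6         # hypothetical
noise_sd = 0.8           # hypothetical

xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [true_slope * x + rng.gauss(0, noise_sd) for x in xs]

# Keep only the lower and upper quartiles of X.
pairs = sorted(zip(xs, ys))
quarter = n // 4
ex, ey = zip(*(pairs[:quarter] + pairs[-quarter:]))

b_full, r_full = slope_and_r(xs, ys)
b_ext, r_ext = slope_and_r(ex, ey)
print(f"full sample:     slope = {b_full:.3f}, r = {r_full:.3f}")
print(f"extreme groups:  slope = {b_ext:.3f}, r = {r_ext:.3f}")
```

Selecting on X leaves the unstandardized slope roughly where it was but inflates r, because the variance of X has changed while the relationship itself has not.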

9 Study design
Aspects of a study’s design (such as sample characteristics) also influence variability
Consider a study on negative priming effects comparing young and old people (adapted from Buchner & Mayr, 2004; Experiment 1):
             Young     Old
M1 - M2      53 ms     79 ms
SD           64 ms     136 ms
Cohen’s d
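
A quick worked calculation shows why these numbers matter. It assumes that d here is each group's mean priming effect divided by the SD reported for that group, which is an assumption about how the slide's own values were obtained.

```python
def standardized_effect(mean_effect_ms, sd_ms):
    """Mean effect divided by its SD (assumed definition of d for this example)."""
    return mean_effect_ms / sd_ms


young_effect, young_sd = 53, 64    # ms, from the slide
old_effect, old_sd = 79, 136       # ms, from the slide

young_d = standardized_effect(young_effect, young_sd)
old_d = standardized_effect(old_effect, old_sd)
print(f"young: {young_d:.2f}, old: {old_d:.2f}")  # young: 0.83, old: 0.58
```

On these assumptions the older group shows the larger effect in milliseconds but the smaller standardized effect, purely because its scores are more variable.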

10 3. We aren’t selling T-shirts here
Cohen (1988) labeled effect sizes in the behavioural sciences as:
       ‘small’   ‘medium’   ‘large’
d      0.2       0.5        0.8
r      .1        .3         .5
Cohen originally intended them as a last resort in sample size calculations

11 ‘T-shirt’ effect sizes
- Lenth (2001) has called them ‘canned’ effect sizes or (more recently) ‘T-shirt’ effect sizes
- These labels are dangerous because they ignore so many important factors (e.g., Glass et al., 1981; Lenth, 2001; Baguley, 2004; 2008)
- Comments about the absolute magnitude of an effect (e.g., that it was ‘large’) can mislead (Robinson et al., 2003)

12 4. Never mind the quantity, feel the width
- We focus too much on the point estimate of an effect size (whether standardized or simple)
- The uncertainty in the point estimate needs to be considered when interpreting an effect
- Confidence intervals (CIs) offer a convenient way to do this, e.g., for a Normal distribution, a 95% CI of the mean would be M ± 1.96 SEs
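
A minimal sketch of such an interval, using the Normal approximation described above. The sample is hypothetical, and with a sample this small a t-based multiplier would normally be preferred over 1.96.

```python
import math
import statistics


def mean_ci(sample, level=0.95):
    """Normal-approximation CI for a mean: M +/- z * SE."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se


# Hypothetical priming effects in ms (illustrative only).
sample = [53, 61, 48, 70, 55, 66, 59, 62]
lower, upper = mean_ci(sample)
print(f"M = {statistics.mean(sample):.2f} ms, 95% CI ({lower:.2f}, {upper:.2f})")
```

The width of the interval, rather than the point estimate alone, indicates how precisely the effect has been estimated.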

13 Example: CI for a correlation
Reporting NHST for correlation: r(49) = .168, p > .05
Reporting 95% CI for correlation: r(49) = .168 (-.116, .455)
Unlike the NHST it is obvious from the CI that it is implausible that r is exactly or very close to zero
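
One common way to obtain an interval for a correlation is the Fisher z transformation, sketched below. The slide does not say how its interval was computed, so this is an assumed method and need not reproduce the reported limits exactly. The sample size n = 51 is inferred from the 49 degrees of freedom.

```python
import math
import statistics


def correlation_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via the Fisher z transformation."""
    z_crit = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    z = math.atanh(r)             # Fisher z of the observed correlation
    se = 1 / math.sqrt(n - 3)     # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lower, upper = correlation_ci(0.168, n=51)  # r(49) implies n = 51
print(f"95% CI for r = .168: ({lower:.3f}, {upper:.3f})")
```

With these inputs the approximation gives limits of about -.11 and .42, a wide interval that spans zero but leaves moderately large positive correlations in play.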

14 5. Big isn’t always better
Supposedly ‘small’ effects can be impressive too, e.g., consider the classic Salk vaccine trial data:
             polio +    polio -
vaccine      33         200,712
placebo      115        201,114
r = .0106 (r² ≈ 0.01%) … but the odds ratio = 3.48 (2.36, 5.12)
(The odds of getting polio are 3 to 4 times higher for unvaccinated children)
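
The contrast between the two statistics can be checked directly from the counts above. The sketch below computes the phi coefficient (the correlation for a 2 × 2 table) and the odds ratio with a Wald interval on the log-odds scale. The choice of interval method is an assumption, although it does match the reported limits.

```python
import math
import statistics


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio (a/b)/(c/d) with a Wald CI computed on the log scale."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    odds_ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


def phi_coefficient(a, b, c, d):
    """Pearson correlation between two binary variables in a 2 x 2 table."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Counts from the slide: placebo row first, so the ratio is placebo vs vaccine.
placebo_polio, placebo_no_polio = 115, 201114
vaccine_polio, vaccine_no_polio = 33, 200712

odds_ratio, lower, upper = odds_ratio_ci(
    placebo_polio, placebo_no_polio, vaccine_polio, vaccine_no_polio
)
phi = phi_coefficient(placebo_polio, placebo_no_polio, vaccine_polio, vaccine_no_polio)
print(f"phi = {phi:.4f}")
print(f"odds ratio = {odds_ratio:.2f} ({lower:.2f}, {upper:.2f})")
```

The same table yields a correlation near .01 and an odds ratio near 3.5, which is why a label based on r alone would badly understate the effect.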

15 Likewise, ‘big’ effects can be unimpressive in some contexts or be cause for suspicion, e.g., impossibly large correlations in social neuroscience studies (Vul et al., in press)
Others have argued that unusually large effects in high impact journals are particularly likely to be false (e.g., due to publication bias) (e.g., Ioannidis, 2008; Young et al., 2008)

16 Social neuroscience correlations reported by Vul et al. (in press), by method of calculation
Correlations above 0.7 are implausibly high † (except through sampling error), and observed r around 0.5 or 0.6 would be pretty impressive!
(Even the independent method probably overestimates r)
† Because the reliability of these measures is rarely > 0.7

17 6. Conclusions
- Psychologists overemphasize standardized effect size
- Reporting and interpreting research findings isn’t like selling T-shirts (we can’t cram everything into three sizes)
- Effect sizes are imprecise estimates
- Determining the practical or theoretical importance of a study is highly context dependent

18 A final thought
As teachers of psychology we emphasize the importance of critical thinking to our students
Statistics seems to be an exception to this
More often than not we teach ritualized methods
Why is this?