Evaluating importance: An overview (emphasis on Cohen's d)

Presentation transcript:

Evaluating importance: An overview (emphasis on Cohen's d)
Measures the size and direction of an effect (attempts to address practical significance).
Cohen's d (sometimes equated to a correlation r).
Functional significance or clinical significance, e.g., change in blood pressure, weight, etc.
Remember: CIs for the estimated effect can also be used for this!

Practical vs. statistical significance
Statistical significance (alpha level; p-value) reflects the odds that a particular finding could have occurred by chance.
If the p-value for a difference between two groups is 0.05, a difference that large would be expected to occur by chance just 5 times out of 100 (thus, it is likely to be a "real" difference).
If the p-value for the difference is 0.01, it would be expected to occur by chance just one time out of 100 (thus, we can be even more confident that the difference is real rather than random).

Practical significance
Reflects the magnitude, or size, of the difference, not the probability that the observed result could have happened by chance.
Arguably much more important than statistical significance, especially for clinical questions.
Measures of effect size (ES) quantify the practical significance of a finding.

Effect size
The degree to which the null hypothesis is false; e.g., not just that two groups differ significantly, but how much they differ (Cohen, 1990).
Several measures of ES exist; use "whatever conveys the magnitude of the phenomenon of interest appropriate to the research context" (Cohen, 1990, p. 1310).
IQ and height example (Cohen, 1990).

The height-IQ correlation: Cohen's (1990) example of statistical vs. practical significance
A study of 14,000 children ages 6-17 showed a "highly significant" (p < .001) correlation of r = .11 between height and IQ.
What does this p indicate?
What is the magnitude of this correlation?
– It accounts for about 1% of the variance.
– Based on an r this big, you'd expect that increasing a child's height by 4 feet would increase IQ by 30 points, and that increasing IQ by 233 points would increase height by 4 inches (as a correlation, the predicted relationship can be read in either direction).

Two main types of ES measures
Variance accounted for
– A squared metric reflecting the percentage of variance in the dependent variable explained by the independent variable.
– e.g., squared correlations, odds ratios, kappa statistics.
Standardized difference
– Scales measurements across studies into a single metric referenced to some standard deviation.
– d is the most common and the easiest conceptually: our focus today.
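The two families are related: for two groups of roughly equal size, a standardized difference d can be converted to a correlation-style metric with r = d / sqrt(d^2 + 4), and squaring r gives the variance-accounted-for version. The minimal Python sketch below (function name and example numbers are ours, not from the slides) illustrates the conversion.

```python
import math

def d_to_r(d: float) -> float:
    """Approximate conversion from a standardized difference d to a
    point-biserial correlation r (assumes two groups of roughly equal size)."""
    return d / math.sqrt(d ** 2 + 4)

# A "large" d of 0.8 corresponds to r of about .37, i.e. r^2 of about .14,
# so the grouping variable accounts for roughly 14% of the variance.
d = 0.8
r = d_to_r(d)
print(f"d = {d:.2f} -> r = {r:.2f}, variance accounted for = {r ** 2:.2f}")
```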

Effect size
The APA (2001) Publication Manual mandates: "...it is almost always necessary to include some index of effect size or strength of relationship… provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship" (pp. ).

APA guidelines (2001) mandate inclusion of ES information (not just p-value information) in all published reports.
Until that happy day, if ES information is missing, readers must estimate ES for themselves.
When group means and SDs are reported, you often can estimate effect size quickly and decide whether to keep reading or not.

Finding, estimating, and interpreting d in group comparison studies
d = the difference between the means of the two groups, divided by the standard deviation (SD).
Interpret it as the size of the group difference in SD units.
When the average mean difference between treatment and control groups is 0.8 to 1 SD, practical significance has been defined as "high".

Estimating d
Find the group means, subtract them, and divide by the standard deviation.
When the SDs for the two groups are identical, hooray. When they are not, arguments have been made for using the control-group SD or the average of the two SDs.
– My preference is the second, which is more conservative and strikes me as more appropriate given the large variability we see in many groups of patients with disorders.
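As a minimal sketch of the rule just described (not code from the original lecture), the Python function below computes d from summary statistics and lets you pick which SD to standardize on; the function name and the "average of the two SDs" default simply mirror the preference stated above.

```python
def cohens_d(mean_tx: float, mean_ctrl: float,
             sd_tx: float, sd_ctrl: float,
             sd_choice: str = "average") -> float:
    """Standardized mean difference (treatment minus control).

    sd_choice selects the denominator discussed above:
      "control" - the control-group SD (sometimes called Glass's delta)
      "average" - the average of the two SDs (the preference stated on this slide)
    """
    if sd_choice == "control":
        sd = sd_ctrl
    elif sd_choice == "average":
        sd = (sd_tx + sd_ctrl) / 2
    else:
        raise ValueError("sd_choice must be 'control' or 'average'")
    return (mean_tx - mean_ctrl) / sd

# Made-up numbers: treatment mean 10 (SD 4), control mean 12 (SD 6)
print(round(cohens_d(10, 12, 4, 6, "average"), 2))  # -0.4: treatment ~0.4 SD below control
print(round(cohens_d(10, 12, 4, 6, "control"), 2))  # -0.33
```

Note that averaging the two SDs yields a larger denominator, and hence a smaller d, whenever the treatment group is the more variable one, which is what makes it the more conservative choice.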

Exercise 1: Calculating effect size, given group means and SDs
Data from the Arnold et al. (2004) study comparing scores on the SNAP composite test after four types of treatment for ADHD (scores on the SNAP composite; lower = better):

Treatment group        Mean (SD)
Combined               0.92 (0.50)
Medical management     0.95 (0.51)
Behavioral             1.34 (0.56)
Community care         1.40 (0.54)

Assessment-Procedure.html (link to the SNAP assessment)

d demonstration, comparing SNAP performance in the Combined and Medical Management groups
Combined: 0.92 (0.50); Medical management: 0.95 (0.51)
d = (0.92 - 0.95) / 0.505 = -0.03 / 0.505 = -0.06
Interpretation: The Combined group scored about 6/100ths of a standard deviation better (lower) than the Medical Management group (an extremely tiny difference; these treatment approaches resulted in virtually the same outcomes on the SNAP measure).

d for Combined vs. Community Care treatment groups
Combined: 0.92 (0.50); Community care: 1.40 (0.54)
d = (0.92 - 1.40) / 0.52 = -0.48 / 0.52 = -0.92
Interpretation: The Combined group scored nearly a whole standard deviation better than the Community care group; this is a large effect size. Combined treatment is substantially better than Community care.

d for Medical Management vs. Behavioral treatment
Medical management: 0.95 (0.51); Behavioral: 1.34 (0.56)
d = ? Interpretation?
Assuming equal sample sizes, the pooled variance is the average of the sample variances, from which we can calculate the pooled SD estimate (s_pooled).
d = (1.34 - 0.95) / 0.536 = 0.39 / 0.536 = 0.728

d for Medical Management vs. Behavioral treatment
Medical management: 0.95 (0.51); Behavioral: 1.34 (0.56)
d = (0.95 - 1.34) / 0.536 = -0.39 / 0.536 = -0.73
Interpretation: The Medical Management group scored about three-quarters of a SD better than the Behavioral group. This is a solid effect size, suggesting that Medical Management treatment was substantially more effective than Behavioral treatment.

Exercise 1: Interpreting d in the happy cases when it's reported
Treatment-difference effect sizes (Cohen's d) from Arnold et al. (2004), Table II, p. 45:

Combined vs. Medical Management      0.06
Combined vs. Behavioral              0.79
Combined vs. Community Care          0.92
Medical Management vs. Behavioral    0.728
Medical Mgt vs. Community Care       0.85
Behavioral vs. Community Care        0.11

Note that our calculated d's match these.
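As a quick cross-check (not part of the original slides), the Python sketch below recomputes every pairwise d from the group means and SDs given earlier, using the equal-n pooled SD described above; the printed magnitudes agree with the table to within about a hundredth, with small differences attributable to rounding in the published values.

```python
import math
from itertools import combinations

# SNAP composite means and SDs from Arnold et al. (2004); lower scores = better
groups = {
    "Combined":           (0.92, 0.50),
    "Medical management": (0.95, 0.51),
    "Behavioral":         (1.34, 0.56),
    "Community care":     (1.40, 0.54),
}

def pooled_sd(sd1: float, sd2: float) -> float:
    """Pooled SD assuming equal group sizes: square root of the average variance."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

for (name1, (m1, s1)), (name2, (m2, s2)) in combinations(groups.items(), 2):
    d = abs(m1 - m2) / pooled_sd(s1, s2)
    print(f"{name1} vs {name2}: d = {d:.2f}")
```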

An overview of evaluating precision
Precision is reflected by the width of the confidence interval (CI) surrounding a given finding.
Any given finding is acknowledged to be an estimate of the "real" or "true" finding.
The CI reflects the range of values that includes the real finding with a known probability.
A finding with a narrower CI is more precise (and thus more clinically useful) than a finding with a broader CI.
The standard error (SE) is the key.

Evaluating precision
CIs are calculated by adding and subtracting a multiple of the standard error (SE) of a finding/value (e.g., estimate ± 1.96 × SE(estimate) gives the 95% CI).
As we have seen, the standard error depends on sample size and reliability/variability; larger samples and higher reliability (smaller variability) will result in narrower CIs, all else being equal.

Finding and interpreting evidence of precision
95% CIs for the differences between the means of 206 children receiving early TTP and 196 receiving late TTP for OME (Paradise et al., 2001):

Measure    Early      Late       95% CI for difference
PPVT       92 (13)    92 (15)    -2.8 to 2.8
NDW        124 (32)   126 (30)   -7.6 to 4.8
PCC-R      85 (7)     86 (7)     -2.1 to 0.7

The CIs are narrow, thanks to the large samples.
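A minimal sketch of how such a CI can be obtained from the summary statistics, using the "estimate ± multiple of SE" recipe from the previous slide with a normal-based 1.96 multiplier (an assumption on our part; Paradise et al. may have used a slightly different method). Applied to the PPVT row, it gives roughly ±2.7, close to the published ±2.8.

```python
import math

def ci95_mean_difference(m1: float, sd1: float, n1: int,
                         m2: float, sd2: float, n2: int) -> tuple:
    """Approximate 95% CI for (m1 - m2): estimate +/- 1.96 * SE,
    with SE = sqrt(sd1^2/n1 + sd2^2/n2)."""
    diff = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - 1.96 * se, diff + 1.96 * se

# PPVT row: early TTP mean 92 (SD 13, n = 206) vs. late TTP mean 92 (SD 15, n = 196)
low, high = ci95_mean_difference(92, 13, 206, 92, 15, 196)
print(f"95% CI for the PPVT difference: {low:.1f} to {high:.1f}")  # about -2.7 to 2.7
```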

Contrast this with risk estimates for low PCC-R from smaller samples of children with (n = 15) and without (n = 47) OME-associated hearing loss (Shriberg et al., 2000).
The estimated risk was 9.60 (i.e., children with hearing loss were 9.6 times more likely to have low PCC-R at age 3 than children without).
But the 95% confidence interval was extremely wide, meaning that this increased risk was somewhere between none and a lot. Not very precise!

Predict precision
In one study, children with histories of OME (n = 10) had significantly lower scores on a competitive listening task than children without OME histories (n = 13):

OME+: -6.8 (2.8)    OME-: -9.7 (2.6)    p = .016

How could you quantify importance?
What would you predict about precision?
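One possible way to work the exercise (these choices are ours, not the slide's: a pooled-SD d and a t-based CI with t ≈ 2.08 for 21 degrees of freedom): the effect is large, about one SD, but with only 10 and 13 children the CI for the raw difference is wide, so precision is low.

```python
import math

# Summary statistics from the slide
m1, s1, n1 = -6.8, 2.8, 10   # children with OME histories
m2, s2, n2 = -9.7, 2.6, 13   # children without OME histories

# Importance: standardized mean difference using the pooled SD
sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
d = (m1 - m2) / sp
print(f"d = {d:.2f}")                       # about 1.08: a large effect

# Precision: 95% CI for the raw mean difference (t critical for df = 21 is about 2.08)
se = sp * math.sqrt(1 / n1 + 1 / n2)
low, high = (m1 - m2) - 2.08 * se, (m1 - m2) + 2.08 * se
print(f"95% CI for the difference: {low:.1f} to {high:.1f}")   # roughly 0.5 to 5.3: wide
```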

When multiple studies of a question are available: meta-analysis
A quantitative summary of effects across a number of studies addressing a particular question, usually in the form of a d (effect size) statistic.
In EBP evidence reviews, the highest-quality evidence comes from meta-analysis of studies with strong validity, precision, and importance.

A meta-analysis of OME and speech and language (Casby, 2001)
Casby (2001) summarized the results of available studies of OME and children's language.
For global language abilities, the effect size for comparing mean language scores from children with and without OME histories was d =
Interpretation and a graphic representation.
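Conceptually, the summary d in a meta-analysis is a weighted average of the per-study d's, most simply with inverse-variance weights. The sketch below illustrates that idea with placeholder numbers; it is not Casby's data and not necessarily his exact method.

```python
def fixed_effect_meta(effects):
    """Inverse-variance weighted summary of per-study effect sizes.

    effects: list of (d, variance_of_d) pairs, one per study.
    Returns the pooled d and its 95% CI.
    """
    weights = [1 / var for _, var in effects]
    pooled = sum(w * d for (d, _), w in zip(effects, weights)) / sum(weights)
    se = (1 / sum(weights)) ** 0.5
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical (d, variance) pairs -- placeholders, NOT Casby's actual study values
studies = [(0.20, 0.04), (-0.10, 0.02), (0.05, 0.03), (0.00, 0.05)]
pooled_d, (lo, hi) = fixed_effect_meta(studies)
print(f"pooled d = {pooled_d:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```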

An informative graphic for meta-analyses
Shows the d from each study as well as its associated 95% CI.

d and 95% CI boundaries for OME and vocabulary comprehension (Casby, 2001)
[Forest plot: for each study (Teele 84, Lous 88, Teele 90, Roberts 91l, Roberts 91m, Lonigan 92, Black 93, Paradise 00), d and its upper and lower 95% CI bounds are plotted on an axis running from "Worse with OME" to "Better with OME". Overall d = .001.]
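For readers who want to draw this kind of graphic themselves, the matplotlib sketch below produces a basic forest plot. All labels and numbers are placeholders (the actual per-study estimates are not legible in this transcript); only the plot structure mirrors the figure described above.

```python
import matplotlib.pyplot as plt

# Placeholder effect sizes and 95% CI bounds (generic labels; NOT the Casby estimates)
labels = ["Study A", "Study B", "Study C", "Study D", "Overall"]
d      = [ 0.20, -0.10,  0.05,  0.00,  0.03]
lo     = [-0.20, -0.38, -0.29, -0.44, -0.14]
hi     = [ 0.60,  0.18,  0.39,  0.44,  0.20]

y = range(len(labels))
xerr = [[m - l for m, l in zip(d, lo)],   # distance down to each lower bound
        [h - m for m, h in zip(d, hi)]]   # distance up to each upper bound
plt.errorbar(d, y, xerr=xerr, fmt="o", capsize=3)
plt.axvline(0, linestyle="--")            # d = 0: no difference between groups
plt.yticks(y, labels)
plt.xlabel("Effect size d with 95% CI")
plt.tight_layout()
plt.show()
```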

Evidence levels for evaluating the quality of treatment studies
Best    Ia   Meta-analysis of >1 randomized controlled trial (RCT)
        Ib   Well-designed randomized controlled study
        IIa  Well-designed controlled study without randomization
        IIb  Well-designed quasi-experimental study
        III  Well-designed non-experimental studies, i.e., comparative, correlational, and case studies
Worst   IV   Expert committee report, consensus conference, clinical experience of respected authorities

Summary of effect size
Cohen's d can be "equated" to a correlation.
It provides a standardized measure of effect size.
The best way to discern practical significance from statistical significance is direct quantification of the effect size using a CI.
For meta-analysis, however, we need a standardized ES in order to combine study results.