
Conducting a Meta-Analysis Jose Ramos Southern Methodist University Paper presented at the annual meeting of the Southwest Educational Research Association, San Antonio, TX, February 2-4, 2011.

Outline
- Development & Theory
- Other Methods: Simpson's Paradox
- Methodological Considerations
- Calculating Effect Sizes
- Combining Effect Sizes Across Studies
- Analyzing Variance in Effect Sizes
- Activity

Development & Theory
Development of Meta-Analysis
- Research synthesists' methods employed idiosyncratic rules to reach conclusions
- This individuality did not allow for a streamlined method of properly synthesizing data (i.e., no standardization)
- The narrative approach to synthesizing literature was subjective
What is a meta-analysis?
- A standardized secondary analysis of primary data results from different studies that share the same hypothesis
- A quantitative aggregation of findings during a research synthesis
- Calculating a standardized effect size for multiple studies

Development & Theory
What is an effect size?
- It is "how much?"
- Called the magnitude of the difference or the strength of the relationship (on the dependent variable)
- The degree to which the phenomenon is present in the population
- It is some specific NON-ZERO value in the population
- "The greater this value, the greater the degree to which the phenomenon under study is manifested" (Cooper, 2010, p. 162)
- How does this differ from just saying that the results were "statistically significant"?

Development & Theory
Why do we need meta-analyses?
- Literature expansion in social science research
- Allows researchers to statistically combine countless studies to increase power
- Allows us to measure "how much" of a relationship exists, rather than just whether a relationship exists
- Allows us to account for the variation in results between similar studies based on procedural characteristics of individual studies

Voting Method
The voting method was commonly employed for the aggregation of studies before the conception of meta-analysis
Procedure:
- Studies with a dependent variable and a specific independent variable are examined
- Studies are dichotomized as either statistically significant or not statistically significant
- The classification with the higher tally is considered to be the "true" relationship between variables

Voting Method
Flaws:
- Bias in favor of large-sample studies (why is this a problem?)
- No weighting by sample size
- Tells us nothing about the strength of the relationship
- Does not control for variation between studies

Voting Method
Illustration:
- Researcher A is conducting a study on the effects of RtI on a group of 1st graders' fluency rate.
- In A's study, which has a sample size of n=180, 110 children are given RtI and 70 children are given traditional instruction. After 12 weeks of instruction, children are dichotomized as either "pass" or "fail" on a reading measure.
- The improvement rate for the RtI group is .45 vs. .43 for the control group.

Voting Method
Illustration:
- Researcher B conducts the same study at a different site.
- In B's study, which has a sample size of n=230, 90 children receive RtI and 140 receive traditional instruction.
- Again the improvement rate for the RtI group is higher: .67 vs. .64 for the control group.
- That's 2-0 for the experimental group!

Aggregation of Raw Data
Suppose another researcher aggregates the data from the same studies by summing the raw data instead of employing the voting method
Illustration:
- Add the number of subjects in both studies that received treatment and control: n=200 received RtI and n=210 received traditional instruction
- When dichotomized into "pass" or "fail", the improvement rate for the treatment group is now ? vs. ? for the control group!
- This is known as Simpson's Paradox

Simpson's Paradox
- Occurs when different methods of aggregation are employed on the same studies
- Leads to opposing conclusions
- Has nothing to do with statistical significance
- Can occur in a meta-analysis
So, why does this happen?
- Unbalanced experimental designs
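
The reversal in the illustration above can be checked numerically. Below is a minimal Python sketch; it assumes the pass counts implied by the reported improvement rates (the slides give rates, not raw counts):

```python
# Simpson's paradox with the two RtI studies above.
studies = [
    # (n_rti, rate_rti, n_control, rate_control)
    (110, 0.45, 70, 0.43),   # Researcher A
    (90, 0.67, 140, 0.64),   # Researcher B
]

# Within each study, RtI beats control ("2-0" by the voting method).
for n_t, r_t, n_c, r_c in studies:
    assert r_t > r_c

# Aggregate the raw data instead.
total_t = sum(n for n, _, _, _ in studies)     # 200 in RtI
total_c = sum(n for _, _, n, _ in studies)     # 210 in control
pass_t = sum(n * r for n, r, _, _ in studies)  # ~109.8 implied passers
pass_c = sum(n * r for _, _, n, r in studies)  # ~119.7 implied passers

rate_t = pass_t / total_t   # ~0.549
rate_c = pass_c / total_c   # ~0.570

# The direction reverses: control now looks better -- Simpson's paradox,
# driven by the unbalanced group sizes across the two sites.
print(rate_t, rate_c)
```

The paradox here is purely arithmetic: each site weighted its groups differently, so pooling the raw counts reweights the comparison.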

Methodological Considerations
- Was an appropriate method used to combine and compare results across studies?
- Were the variables and relationship of interest defined?
- Do the included studies address an identical conceptual hypothesis?
- Was an extensive search of the literature done? Where could we look?

Methodological Considerations
Determine the statistic of interest to calculate individual study effect sizes:
- Is your hypothesis assessing the relationship between a dichotomous and a continuous variable? Two continuous variables? Two dichotomous variables?
- What does the preponderance of your studies report as an effect size, if any?
- Based on this information you will choose one standardized effect size for your meta-analysis: r, d, or odds-ratio

Calculating Effect Sizes
d-index:
- Appropriate to use when the difference between two means is being compared; a dichotomous and a continuous variable
- Typically employed in association with t- or F-tests, based on a comparison of two conditions
- Expresses the distance between the two group means in relation to their common SD

Calculating Effect Sizes
d-index formula: d = (M1 - M2) / SD_pooled, where:
SD_pooled = √[((n1 - 1)SD1² + (n2 - 1)SD2²) / (n1 + n2 - 2)]

Calculating Effect Sizes
So, if you were to calculate the standardized mean difference in the fluency rate of the following two groups in an RtI study, what would you get as the effect size?
- Group 1 (experimental): M1 = 80, SD1 = 10, n1 = 250
- Group 2 (control): M2 = 65, SD2 = 20, n2 = 230
- Effect size = ?
What if you had three groups?
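
The pooled-SD computation above is easy to script. This sketch works the two-group example (and so gives away the slide's "?": d comes out near 0.96):

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group standard deviation."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def d_index(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (d with pooled SD)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

# Group 1 (experimental): M=80, SD=10, n=250; Group 2 (control): M=65, SD=20, n=230
d = d_index(80, 10, 250, 65, 20, 230)
print(round(d, 3))  # prints 0.961
```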

Calculating Effect Sizes
What if the means and SDs aren't reported and you only have a t-value?
- Formula for the d-index when only the t-value is reported: d = 2t / √df_error, where df_error = n1 + n2 - 2
What if you have the F-value for two means?
- Formula for the d-index when the F-value for two means is reported: d = 2√F / √df_error (for two groups, F = t²)
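
These two-group conversions can be sketched as small helpers (the t = 3.2, n1 = 80, n2 = 85 figures used for illustration are taken from the activity's Study 3 later in the deck):

```python
import math

def d_from_t(t, n1, n2):
    # Approximate d from a two-group t-test: d = 2t / sqrt(df_error)
    return 2 * t / math.sqrt(n1 + n2 - 2)

def d_from_f(f, n1, n2):
    # For two groups F = t**2, so this reduces to d_from_t
    return 2 * math.sqrt(f) / math.sqrt(n1 + n2 - 2)

# e.g., t = 3.2 with n1 = 80, n2 = 85 gives d of roughly 0.50
print(round(d_from_t(3.2, 80, 85), 2))
```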

Calculating Effect Sizes
What if the F-value is based on 3 or more means and I want to get the d-index?
- You must first consider the dispersion of the means to identify the pattern of variability: minimum, intermediate, or maximum.
- Less variability between means will give you a lower F-value.
- This will dictate which of three formulas you will use.

Calculating Effect Sizes
With k means and Cohen's f estimated from the ANOVA (f = √[(k - 1)F / df_error]):
- Formula for minimum variability: d = f√(2k)
- Formula for intermediate variability: d = f√[12(k - 1) / (k + 1)]
- Formula for maximum variability:
- d = 2f (even number of means)
- d = 2fk / √(k² - 1) (odd number of means)

Calculating Effect Sizes
r-index
- The correlation coefficient tells you about the strength of the relationship between two variables
- Most appropriate metric for expressing an effect size when interested in the relationship strength of two continuous variables
- Most common in correlational studies
- Usually reported when appropriate
- EX: relationship between years of schooling and yearly salary

Calculating Effect Sizes
What if you only have a t-value?
- Formula for the r-index when only the t-value is reported: r = √[t² / (t² + df_error)], where df_error = n1 + n2 - 2
What if you only have an F-value?
- In an ANOVA, Pearson's r is analogous to η (eta)
- In an ANOVA, Pearson's r² is analogous to η² (eta-squared)
- Recall that η² (eta-squared) tells us the proportion of variance in the dependent variable that is explained by the independent variables

Calculating Effect Sizes
Formula to convert an F-value to η (i.e., r): η = √[df_between·F / (df_between·F + df_error)]
Formula to convert an F-value to η² (i.e., r²): η² = df_between·F / (df_between·F + df_error)
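
These conversions can be sketched as functions; the check at the end uses the fact that with two groups (df_between = 1 and F = t²), η equals r:

```python
import math

def r_from_t(t, n1, n2):
    # r from a two-group t-test
    df_error = n1 + n2 - 2
    return math.sqrt(t**2 / (t**2 + df_error))

def eta_squared(f, df_between, df_error):
    # Proportion of DV variance explained, from an ANOVA F-value
    return (df_between * f) / (df_between * f + df_error)

# Two-group check: df_between = 1 and F = t**2 makes eta equal to r
t, n1, n2 = 3.2, 80, 85   # illustrative values (the activity's Study 3)
r = r_from_t(t, n1, n2)
eta = math.sqrt(eta_squared(t**2, 1, n1 + n2 - 2))
print(round(r, 3), round(eta, 3))
```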

Calculating Effect Sizes
Think back to the previous RtI study. The effect size, d = ?, was the standardized difference between the control and experimental groups. Suppose you want to convert this d-index into an r-index: r = d / √(d² + 4) (assuming roughly equal group sizes)
- What do you get?
- r = ?
- What could this correlation represent?
Or vice-versa: d = 2r / √(1 - r²)
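
The two conversions are inverses of each other, which makes a quick sanity check possible. The d = 0.96 below is not from the slides; it is roughly what the earlier two-group fluency example works out to:

```python
import math

def r_from_d(d):
    # Assumes roughly equal group sizes
    return d / math.sqrt(d**2 + 4)

def d_from_r(r):
    return 2 * r / math.sqrt(1 - r**2)

d = 0.96                    # roughly the d from the earlier RtI example
r = r_from_d(d)
assert abs(d_from_r(r) - d) < 1e-9   # the conversions invert each other
print(round(r, 2))  # prints 0.43
```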

Calculating Effect Sizes
Odds-Ratio (OR)
- Applicable when both variables are dichotomous
- The relationship between two sets of odds
- EX: Suppose a study measures the effects of RtI on whether students in two groups (e.g., experimental/control) "pass" or "fail" a math test.

       RtI     Control
Pass   75 (a)  40 (b)
Fail   5 (c)   25 (d)

Calculating Effect Sizes
- Of n=80 in RtI, the odds of passing are 15 to 1.
- Of n=65 in control, the odds of passing are 1.6 to 1.
- Calculate the odds ratio: OR = ad/bc = ?
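
With the 2×2 table above, the cross-product formula is a one-liner (this sketch also answers the slide's "?"):

```python
# Cells from the RtI pass/fail table above
a, b, c, d = 75, 40, 5, 25   # pass-RtI, pass-control, fail-RtI, fail-control

odds_rti = a / c         # 15.0: odds of passing under RtI
odds_control = b / d     # 1.6: odds of passing under control
or_ = (a * d) / (b * c)  # cross-product ratio

# The OR is exactly the ratio of the two sets of odds
assert abs(or_ - odds_rti / odds_control) < 1e-9
print(or_)  # prints 9.375
```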

Combining Effect Sizes
Once individual study effect sizes have been calculated, the next step involves combining them to provide an average effect size.
You must weight the individual effect sizes.
- What do you base this weight on?

Combining Effect Sizes
Suppose you have 7 d-indexes and group ns that compare the effect of homework vs. no homework on a measure of academic achievement:

Combining Effect Sizes
Step One: Weighting
Formula: w_i = 2·n_i1·n_i2·(n_i1 + n_i2) / [2(n_i1 + n_i2)² + n_i1·n_i2·d_i²]
EX: Study 1 (n1 = 259, n2 = 265, d = .02):
w_1 = 2(259)(265)(259 + 265) / [2(259 + 265)² + (259)(265)(.02)²] ≈ 130.98

Combining Effect Sizes Calculations:

Combining Effect Sizes
Step Two: Multiply each weight by its original d-index
Formula: d_i·w_i
EX: What is the answer for Study 1?

Combining Effect Sizes Calculations:

Combining Effect Sizes
Step Three: Divide the sum of these products by the sum of the weights.
Formula: d. = Σd_i·w_i / Σw_i
EX: d. = 62.56 / Σw_i ≈ .115 (average ES)

Combining Effect Sizes
Step Four: Computing Confidence Intervals
Formula: CI_d95% = d. ± 1.96·√(1 / Σw_i)
EX: CI_d95% = .115 ± .084
- Thus, we expect 95% of estimates of this effect to fall between .031 and .199. Do we reject the null?
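
Steps One through Four can be sketched end to end. The full 7-study data table did not survive transcription, so the study list below is illustrative; only Study 1's values (n1 = 259, n2 = 265, d = .02) come from the text:

```python
import math

def weight(n1, n2, d):
    # Inverse-variance weight for a d-index
    return (2 * n1 * n2 * (n1 + n2)) / (2 * (n1 + n2)**2 + n1 * n2 * d**2)

def combine(studies):
    """Fixed-effect average d and its 95% CI from (n1, n2, d) tuples."""
    ws = [weight(n1, n2, d) for n1, n2, d in studies]
    d_avg = sum(w * d for w, (_, _, d) in zip(ws, studies)) / sum(ws)
    half = 1.96 * math.sqrt(1 / sum(ws))
    return d_avg, (d_avg - half, d_avg + half)

# Study 1 from the text; the rest are hypothetical stand-ins
studies = [(259, 265, 0.02), (150, 140, 0.40), (120, 115, 0.10)]
d_avg, ci = combine(studies)
# If the 95% CI excludes zero, reject the null of no average effect
print(round(d_avg, 3), ci)
```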

Combining Effect Sizes
Suppose that you have 6 r-indexes and ns that show the relationship between the amount of time students spend on homework and their score on an achievement test.
Step One: Transform the r-indexes into z-scores, because as r gets larger its distribution gets more skewed.
Formula: z_i = .5·ln[(1 + r_i) / (1 - r_i)]

Combining Effect Sizes
Step Two: Weighting
Formula: w_i = n_i - 3
EX: Study 1: 3,505 - 3 = 3,502

Combining Effect Sizes Calculations:

Combining Effect Sizes
Step Three: Multiply the weight and the effect size (i.e., z-score)
Formula: (n_i - 3)·z_i
EX: Study 1: (3,502)(.06) = 210.12

Combining Effect Sizes Calculations:

Combining Effect Sizes
Step Four: Divide the sum of these products by the sum of the weights.
Formula: z. = Σ(n_i - 3)·z_i / Σ(n_i - 3)
EX: z. = Σ(n_i - 3)·z_i / 26,372 ≈ .207 (average ES)

Combining Effect Sizes
Step Five: Computing Confidence Intervals
Formula: CI_z95% = z. ± 1.96 / √Σ(n_i - 3)
EX: CI_z95% = .207 ± 1.96 / √26,372 = .207 ± .012
- Thus, we expect 95% of estimates of this effect to fall between .195 and .219. Do we reject the null?
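
The r-index pipeline can be sketched the same way. Only Study 1's n = 3,505 and z ≈ .06 survive in the text, so the other entries below are hypothetical; the back-transform to r at the end (tanh, the inverse of Fisher's z) is a standard final step even though the slides stop at z:

```python
import math

def fisher_z(r):
    # Fisher's r-to-z transformation
    return 0.5 * math.log((1 + r) / (1 - r))

def combine_rs(studies):
    """Average Fisher z and its 95% CI from (n, r) pairs."""
    ws = [n - 3 for n, _ in studies]
    zs = [fisher_z(r) for _, r in studies]
    z_avg = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    half = 1.96 / math.sqrt(sum(ws))
    return z_avg, (z_avg - half, z_avg + half)

# Study 1 from the text; the rest are hypothetical stand-ins
studies = [(3505, 0.06), (2000, 0.25), (1200, 0.30)]
z_avg, ci = combine_rs(studies)
r_avg = math.tanh(z_avg)   # back-transform to report an average r
print(round(z_avg, 3), round(r_avg, 3))
```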

Analyzing Variance in ES
Fixed-effect model vs. random-effects model when accounting for error (i.e., variation) between effect sizes
Random-effects model
- Two sources of variability in the outcomes of similar studies:
- Sampling of participants will always vary from the true population
- Differences in how studies are conducted
- Remember, you sample studies for your meta-analysis from a population of all possible studies
- EX: studies on academic achievement based on the level of homework a student does are conducted with students at different grade levels, with different types of tests, in classrooms with different subject matter

Analyzing Variance in ES
Homogeneity Analyses
- Compares the observed variance of the effect sizes to that expected from sampling error
- Calculates the probability that the variance exhibited by the effect sizes would be observed if only sampling error was making them different
- So the question is: "Is the observed variance in effect sizes statistically significantly different from that expected by sampling error alone?" (Cooper, 2010, p. 185)
- If it is, you search for moderator variables
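
The comparison described above is commonly operationalized as a Q statistic: the weighted sum of squared deviations of each effect size from the weighted mean. This sketch uses hypothetical effect sizes and weights:

```python
def q_statistic(ds, ws):
    """Weighted sum of squared deviations from the weighted mean effect."""
    d_avg = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
    return sum(w * (d - d_avg)**2 for w, d in zip(ws, ds))

ds = [0.10, 0.15, 0.60, 0.05]     # hypothetical effect sizes
ws = [120.0, 90.0, 110.0, 100.0]  # hypothetical inverse-variance weights

q = q_statistic(ds, ws)
k = len(ds)
# Q is referred to a chi-square distribution with k - 1 degrees of freedom;
# a significant Q suggests more variance than sampling error alone explains,
# which is the cue to search for moderator variables.
print(round(q, 2), "df =", k - 1)
```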

Activity
Suppose your question is: "What is the effect of RtI on the vocabulary development of ELLs?" and you find the following studies:
Study 1: Data on the effects of RtI on the vocabulary development of 3rd grade ELLs:
- n1 = 120 (RtI); n2 = 115 (control)
- M1 = 65; M2 = 50
- SD1 = 15; SD2 = 30
- ES = ?

Activity
Study 2: The relationship between the length of time 2nd grade ELL students spend in RtI and the number of vocabulary words they learn:
- r = .30
- n1 = 100; n2 = 110
- ES = ?
Study 3: A t-value on the difference between an RtI group (experimental) vs. a control group in a 4th grade bilingual classroom:
- t-value: 3.2
- n1 = 80; n2 = 85
- ES = ?

Activity
Study 4: The standardized mean difference of two 4th grade ELL groups on vocabulary development, one in an RtI treatment and one in a control group:
- d = .55
- n1 = 65; n2 = 60
- ES = ?
Study 5: Data on the effects of RtI on the vocabulary development of 7th grade ELLs:
- n1 = 150 (RtI); n2 = 140 (control)
- M1 = 25; M2 = 18
- SD1 = 5; SD2 = 10
- ES = ?

Activity
Step 1: Decide which standardized effect size is most appropriate to calculate an average effect size.
Step 2: Standardize the effect sizes.
Step 3: Calculate the average effect size.
Step 4: Calculate the 95% confidence intervals.
Step 5: Do we reject the null?
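
One possible pass through the activity is sketched below: put all five studies on the d metric, then weight and combine. Treat it as a worked check rather than the official key, since it uses the approximate conversion formulas from earlier slides (and so reveals candidate answers):

```python
import math

def d_from_means(m1, sd1, n1, m2, sd2, n2):
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def d_from_r(r):
    return 2 * r / math.sqrt(1 - r**2)

def d_from_t(t, n1, n2):
    return 2 * t / math.sqrt(n1 + n2 - 2)

def weight(n1, n2, d):
    return (2 * n1 * n2 * (n1 + n2)) / (2 * (n1 + n2)**2 + n1 * n2 * d**2)

studies = [
    (120, 115, d_from_means(65, 15, 120, 50, 30, 115)),  # Study 1: ~0.64
    (100, 110, d_from_r(0.30)),                          # Study 2: ~0.63
    (80, 85, d_from_t(3.2, 80, 85)),                     # Study 3: ~0.50
    (65, 60, 0.55),                                      # Study 4: already a d
    (150, 140, d_from_means(25, 5, 150, 18, 10, 140)),   # Study 5: ~0.89
]

ws = [weight(n1, n2, d) for n1, n2, d in studies]
d_avg = sum(w * d for w, (_, _, d) in zip(ws, studies)) / sum(ws)
half = 1.96 * math.sqrt(1 / sum(ws))
# Reject the null if the 95% CI excludes zero
print(round(d_avg, 2), (round(d_avg - half, 2), round(d_avg + half, 2)))
```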

References
Blyth, C. R. (1972). On Simpson's paradox and the sure-thing principle. Journal of the American Statistical Association, 67.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New Jersey: LEA.
Cooper, H. (2010). Research synthesis and meta-analysis: A step-by-step approach (4th ed.). New York: SAGE.
Cooper, H., Hedges, L. V., & Valentine, J. C. (1994). The handbook of research synthesis and meta-analysis (2nd ed.). New York: SAGE.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
Glass, G. V. (1977). Integrating findings: The meta-analysis of research. Review of Research in Education, 5.

Answer
Study | n_1 | n_2 | d_i | w_i | d_i·w_i
(per-study values, with Σ totals in the bottom row)