CS 594: Empirical Methods in HCC
Introduction to Meta-Analysis
Dr. Debaleena Chattopadhyay Department of Computer Science debaleena.com hci.cs.uic.edu
Agenda
What is a meta-analysis?
Why study meta-analysis in HCI?
When to do a meta-analysis?
How to do a meta-analysis?
Reality check
Consider the following results:
1. Planned contrasts revealed that the Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001.
2. Planned contrasts revealed that the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05.
3. Planned contrasts revealed that both the Flow menu (3466 ms) and the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01.
How would your interpretation of the results differ in each case? (about which menu is more efficient)
Reality check
Consider the following results:
1. Planned contrasts revealed that the Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001.
2. Planned contrasts revealed that the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05.
3. Planned contrasts revealed that both the Flow menu (3466 ms) and the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01.
All are interpreted the same way: there is a high probability that the Flow menu and the Marking menu will take significantly more time than the Finger Count menu when used by a participant randomly chosen from the population.
The p-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true; the definition of "extreme" depends on how the hypothesis is being tested. The p-value is also described in terms of rejecting H0 when it is actually true; however, it is not a direct probability of this state. The null hypothesis is usually a hypothesis of "no difference", e.g., no difference between blood pressures in group A and group B. Define a null hypothesis for each study question clearly before the start of your study.
p-value does NOT measure "how much"
1. Planned contrasts revealed that the Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001.
2. Planned contrasts revealed that the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05.
3. Planned contrasts revealed that both the Flow menu (3466 ms) and the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01.
Result 1 would NOT mean that the difference between the Flow menu and the Finger Count menu is larger, or stronger, than the difference between the Marking menu and the Finger Count menu.
Beyond the p-value
1. Planned contrasts revealed that the Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001, with a large effect size, d = 0.8.
2. Planned contrasts revealed that the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05, with a small effect size, d = 0.2.
3. Planned contrasts revealed that both the Flow menu (3466 ms) and the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01, with a small effect size, d = 0.3.
How would your interpretation of the results differ in each case? (about which menu is more efficient)
Beyond the p-value
1. Planned contrasts revealed that the Flow menu (3466 ms) took significantly more time than the Finger Count menu (3095 ms), p < .001, with a large effect size, d = 0.8.
2. Planned contrasts revealed that the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3395 ms), p < .05, with a small effect size, d = 0.2.
3. Planned contrasts revealed that both the Flow menu (3466 ms) and the Marking menu (3646 ms) took significantly more time than the Finger Count menu (3335 ms), p < .01, with a small effect size, d = 0.3.
Result 1 would mean that, relative to the Finger Count menu, the Flow menu's effect is stronger than the Marking menu's; more formally, the difference between FM and FCM (d = 0.8) is larger, or stronger, than the difference between MM and FCM (d = 0.2). Result 2 would not.
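To make "how much" concrete: the standardized mean difference is the raw difference in means divided by a pooled standard deviation. A minimal Python sketch follows; the 465 ms pooled SD is a made-up value chosen only so the example reproduces d ≈ 0.8, and is not reported in the results above.
```python
# Cohen's d for result 1, using an assumed (hypothetical) pooled SD.
mean_flow = 3466          # ms, Flow menu (from result 1)
mean_finger_count = 3095  # ms, Finger Count menu (from result 1)
sd_pooled = 465           # ms, illustrative value only; not reported in the study

d = (mean_flow - mean_finger_count) / sd_pooled
print(f"d = {d:.2f}")     # about 0.80: the menus differ by roughly 0.8 pooled SDs
```
Unlike the p-value, d answers "how large is the difference on a common scale?", which is what a meta-analysis combines across studies.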
What is a meta-analysis?
Meta-analysis refers to the statistical synthesis of results from a series of studies. While the statistical procedures used in a meta-analysis can be applied to any set of data, the synthesis will be meaningful only if the studies have been collected systematically. If the effect size is consistent across the series of studies, these procedures enable us to report that the effect is robust across the kinds of populations sampled, and also to estimate the magnitude of the effect more precisely than we could with any of the studies alone. Meta-analyses are conducted to synthesize evidence on the effects of interventions and to support evidence-based policy or practice.
Narrative reviews to systematic reviews to meta-analysis
Prior to the 1990s, the task of combining data from multiple studies had been primarily the purview of the narrative review. An expert in a given field would read the studies that addressed a question, summarize the findings, and then arrive at a conclusion. One limitation is the subjectivity inherent in this approach, coupled with a lack of transparency. A second limitation of narrative reviews is that they become less useful as more information becomes available. Beginning in the mid-1980s and taking root in the 1990s, researchers in many fields have been moving away from the narrative review and adopting systematic reviews and meta-analysis. For systematic reviews, a clear set of rules is used to search for studies and then to determine which studies will be included in or excluded from the analysis. Not all systematic reviews are meta-analyses.
Narrative reviews to systematic reviews to meta-analysis
Unlike the narrative review, where reviewers implicitly assign some level of importance to each study, in meta-analysis the weights assigned to each study are based on mathematical criteria that are specified in advance. While the reviewers and readers may still differ on the substantive meaning of the results (as they might for a primary study), the statistical analysis provides a transparent, objective, and replicable framework for this discussion. Meta-analysis is commonly used in medicine, pharmaceutical studies, education, psychology, criminology, and business. For example, in the field of education, meta-analysis has been applied to topics as diverse as the comparison of distance education with traditional classroom learning. HCI is catching up…
What is a meta-analysis?
"A statistical analysis which combines the results of several independent studies considered by the analyst to be 'combinable'" --- Huque, 1988
"The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings" --- Glass, 1976
Meta-Analysis Presentation: Forest Plot
How do we choose these studies? How do we calculate the weight? How do we calculate the summary estimate?
Why study meta-analysis?
What is the direction of the effect?
What is the size of the effect?
Is the effect consistent across studies?
What is the strength of evidence for the effect?
Antman et al. (1992). JAMA, 268: 240–248.
Why study meta-analysis in HCI?
Although HCI or HCC is not equivalent to medicine, i.e., we do not deal with life and death on a daily basis, computing technologies are becoming more integral to daily life than ever before. Consider the autonomous car, wearables that supplement drug therapy, or immersive technologies for improving educational outcomes. Evidence-based adoption will become more important than ever before in human-centered computing, and very soon.
When to do a meta-analysis?
When you want to know the strength of evidence, or want to combine results quantitatively
When more than one study has estimated an effect
When the differences in study characteristics are unlikely to affect the intervention effect
When the treatment effects have been measured and reported in similar ways (or when the data are available)
When not to do a meta-analysis?
A meta-analysis is only as good as the studies in it; beware of reporting biases.
Studies must address the same question, though the question can, and usually must, be broader.
"Mixing apples with oranges" is not useful for learning about apples or about oranges, although it is useful for learning about fruit!
An analysis of all your prior studies *only* is not a meta-analysis.
How to do a meta-analysis?
Overview of steps:
Frame your research question
Develop a search protocol
Run the search strategy
Retrieve and de-duplicate citations
Screen titles/abstracts
Conduct qualitative synthesis
Conduct meta-analyses
Write the report of the systematic review
How to do a meta-analysis?
Broadly speaking:
Identify the research question
Search and identify a set of studies
Conduct qualitative synthesis
Conduct quantitative meta-analyses
Identify the research question
Conceptualize → Operationalize → RQ
Types of questions, with examples:
Prevalence: What is the incidence of autonomous vehicle use by older adults in urban areas compared to rural areas?
Intervention: Are fitness trackers effective in reducing obesity?
Diagnosis: Are avatar-based screening tests effective in detecting social anxiety disorder?
Etiology: Is social media use causally associated with teen depression?
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." (John Tukey)
Search and identify a set of studies
Ways similar studies can differ:
User population
Intervention composition/timing
Outcome definition
Experimental design and execution
Analysis
Components of well-constructed questions:
Population
Intervention
Comparison group(s)
Outcome
Time
Settings
E.g., for preschool children with mild to moderate visual acuity impairment, are glasses or spectacles and patching effective in improving visual acuity compared with glasses alone or no treatment?
Conduct qualitative synthesis
Document your protocol
Conduct quantitative meta-analyses
First, we work with effect sizes (not p-values) to determine whether or not the effect size is consistent across studies. The terms treatment effects and effect sizes are used in different ways by different people. You will need to know the effect size and the precision of the observed effect size.
How to choose an effect size?
1. The effect sizes from the different studies should be comparable to one another, in the sense that they measure (at least approximately) the same thing.
2. Estimates of the effect size should be computable from the information that is likely to be reported in published research reports; that is, it should not require re-analysis of the raw data (unless these are known to be available).
3. The effect size should have good technical properties. For example, its sampling distribution should be known so that variances and confidence intervals can be computed.
What parameters to use?
If the summary data reported by the primary study are based on means and standard deviations in two groups, the appropriate effect size will usually be the raw difference in means, the standardized difference in means, or the response ratio.
If the summary data are based on a binary outcome, such as events and non-events in two groups, the appropriate effect size will usually be the risk ratio, the odds ratio, or the risk difference.
If the primary study reports a correlation between two variables, then the correlation coefficient itself may serve as the effect size.
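As a quick illustration of the binary-outcome case (the counts below are hypothetical, not taken from any study in the lecture), all three effect sizes can be computed directly from a 2x2 table:
```python
# Sketch: effect sizes for a binary outcome from a 2x2 table (made-up counts).
events_trt, n_trt = 12, 100   # treatment group: events, sample size
events_ctl, n_ctl = 24, 100   # control group: events, sample size

risk_trt = events_trt / n_trt
risk_ctl = events_ctl / n_ctl

risk_ratio = risk_trt / risk_ctl
odds_ratio = (events_trt * (n_ctl - events_ctl)) / (events_ctl * (n_trt - events_trt))
risk_difference = risk_trt - risk_ctl   # negative = fewer events in the treatment group

print(risk_ratio, odds_ratio, risk_difference)   # ~0.5, ~0.43, ~-0.12
```
In practice, ratio measures are usually analyzed on the log scale before being combined across studies.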
Effect sizes based on means
Use Cohen's d or Hedges' g; the standardized mean difference (d or g) transforms all effect sizes to a common metric.
Factors affecting the precision of the effect size: sample size and study design.
Studies that yield more precise estimates of the effect size carry more information and are assigned more weight in the meta-analysis.
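A minimal sketch of how the standardized mean difference and its precision are commonly computed from summary statistics (the formulas follow standard meta-analysis texts; the example numbers are invented):
```python
import math

def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d, Hedges' g, and their approximate variances from two-group summaries."""
    # Pooled within-group standard deviation
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sd_pooled
    # Approximate variance of d: larger samples -> smaller variance -> more weight
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    # Hedges' g applies a small-sample correction factor J
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    g = j * d
    var_g = j**2 * var_d
    return d, var_d, g, var_g

# Hypothetical example: two menu conditions, completion times in ms
print(standardized_mean_difference(3466, 480, 24, 3095, 450, 24))
```
The variance term is what the next slides turn into a weight: the smaller the variance, the larger the study's weight.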
Fixed effect vs. random effects
A study's true effect size is the effect size in the underlying population, and is the effect size that we would observe if the study had an infinitely large sample size (and therefore no sampling error). A study's observed effect size is the effect size that is actually observed.
Under the fixed-effect model, we assume that there is one true effect size (hence the term fixed effect) which underlies all the studies in the analysis, and that all differences in observed effects are due to sampling error.
Under the random-effects model, we allow that the true effect could vary from study to study. For example, the effect size might be higher (or lower) in studies where the participants are older, or more educated, or healthier than in others, or where a more intensive variant of an intervention is used, and so on. The effect sizes in the studies that actually were performed are assumed to represent a random sample of these effect sizes (hence the term random effects).
Fixed effect vs. random effects
Keep in mind that the summary effect is nothing more than the mean of the effect sizes, with more weight assigned to the more precise studies.
Fixed effect vs. random effects
Under the random-effects model, the goal is not to estimate one true effect, but to estimate the mean of a distribution of effects. If the number of studies is very small, then the estimate of the between-studies variance (τ²) will have poor precision.
A fixed-effect meta-analysis estimates a single effect that is assumed to be common to every study, while a random-effects meta-analysis estimates the mean of a distribution of effects.
Some have adopted the practice of starting with a fixed-effect model and then switching to a random-effects model if the test of homogeneity is statistically significant (a test of the null hypothesis that the between-studies variance is zero). However, even if the between-studies variance does not meet the criterion for statistical significance (which may be due simply to low power), we should still take account of this variance when assigning weights.
Measuring the weight for each study (RE)
Under the random-effects model, each study's weight is the inverse of that study's total variance: W*_i = 1 / (V_Yi + τ²), where V_Yi is the within-study variance of the observed effect Y_i and τ² (tau-squared) is the between-studies variance.
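The estimation of τ² appears on the original slides as a figure; the sketch below assumes the common DerSimonian-Laird estimator (one standard choice, not necessarily the exact derivation shown in the lecture), with made-up effect sizes and within-study variances:
```python
# Sketch: random-effects weights via the DerSimonian-Laird estimate of tau-squared.
# yi = observed effect sizes, vi = within-study variances (both invented here).
yi = [0.45, 0.30, 0.62, 0.10]
vi = [0.040, 0.055, 0.030, 0.025]

w_fixed = [1 / v for v in vi]                  # fixed-effect (inverse-variance) weights
sum_w = sum(w_fixed)
q = sum(w * y**2 for w, y in zip(w_fixed, yi)) - sum(w * y for w, y in zip(w_fixed, yi)) ** 2 / sum_w
df = len(yi) - 1
c = sum_w - sum(w**2 for w in w_fixed) / sum_w
tau2 = max(0.0, (q - df) / c)                  # between-studies variance estimate

w_random = [1 / (v + tau2) for v in vi]        # random-effects weights: 1 / (V_Yi + tau^2)
print(round(tau2, 4), [round(w, 1) for w in w_random])
```
Note how adding the same τ² to every study's variance makes the random-effects weights more similar to one another than the fixed-effect weights.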
Measuring the summary effect
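The summary-effect slides are likewise figures. Under the usual inverse-variance approach, the summary effect is the weighted mean of the observed effects, M = (Σ W_i Y_i) / (Σ W_i), with variance V_M = 1 / Σ W_i. A small sketch, continuing the hypothetical numbers from the previous block:
```python
import math

def summary_effect(yi, weights):
    """Weighted mean effect, its variance, and a 95% confidence interval."""
    m = sum(w * y for w, y in zip(weights, yi)) / sum(weights)
    var_m = 1 / sum(weights)
    se_m = math.sqrt(var_m)
    return m, var_m, (m - 1.96 * se_m, m + 1.96 * se_m)

# With the random-effects weights from the previous sketch:
# print(summary_effect(yi, w_random))
```
Using fixed-effect weights in the same function gives the fixed-effect summary; the only difference is which weights are passed in.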
Meta-Analysis Presentation: Forest Plot
How do we choose these studies? How do we calculate the weight? How do we calculate the summary estimate?
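For illustration only (matplotlib, with invented study estimates), a bare-bones forest plot can be drawn from each study's effect and confidence interval, which is how the weights and the summary estimate from the previous slides are usually presented:
```python
import matplotlib.pyplot as plt

# Sketch: a minimal forest plot from effect sizes and 95% CIs (made-up numbers).
labels  = ["Study A", "Study B", "Study C", "Summary"]
effects = [0.45, 0.30, 0.62, 0.44]
lower   = [0.06, -0.16, 0.28, 0.24]
upper   = [0.84, 0.76, 0.96, 0.64]

ys = list(range(len(labels), 0, -1))           # plot studies from top to bottom
xerr = [[e - lo for e, lo in zip(effects, lower)],
        [hi - e for e, hi in zip(effects, upper)]]

plt.errorbar(effects, ys, xerr=xerr, fmt="s", color="black", capsize=3)
plt.axvline(0, linestyle="--", color="gray")   # line of no effect
plt.yticks(ys, labels)
plt.xlabel("Standardized mean difference (95% CI)")
plt.tight_layout()
plt.show()
```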
More to read…
Introduction to Systematic Review and Meta-Analysis