1 Who’s Afraid of External Validity? Nancy Cartwright Error/Evidence Project Error Conference June 2010.


2 Topic: The long road from it-works-somewhere to it-will-work-for-us

3 Context Evidence-based medicine (EBM) and evidence-based public health policy and practice (EBPHPP). We spend a lot of effort and money buying well-conducted RCTs in the belief they will be relevant as evidence (E) to policy prediction (H). The RCT structure can (in the ideal) ensure the RCT conclusion is highly credible (P(E) is high). What shows that this matters to H? So: When are we entitled to treat an RCT as evidentially relevant to our policy prediction?

4 Topic: RCTs, evidential relevance and policy prediction
External validity: the conclusion drawn from the study holds in the target population
• if the two populations are ‘sufficiently similar’, or
• if the two share certain abstract sets of conditions on causal and probabilistic structure.
Evidential relevance: the conclusion drawn from the study is evidentially relevant to a policy prediction in the target population
• under conditions that can be discovered by good searches, horizontally and vertically.

5 Policy prediction: Two kinds of causal claims
1. It-worked-somewhere claims: T caused O somewhere under some conditions (e.g. in study population φ administered by method M).
• Ideal RCTs can clinch these kinds of claims.
2. It-will-work-for-(at-least-some-of)-us claims: T will cause O in some units of our population administered as it will be administered given policy P.
• RCTs can be evidentially relevant, conditional on theoretical evidence and empirical evidence that horizontal and vertical search results are correct.
NB: I pick a weak policy conclusion to minimize the relevance requirements.

6 Two slogans assumed throughout
Achinstein: Evidential relevance = explanatory relevance.
• This works well for evidence for policy predictions.
• The explanations are the ones that would hold were the prediction true.
Cartwright: It ain’t evidence without evidence it’s evidence.
• For evidence-based policy you can’t just say it’s evidence; you have to produce reasons, and good ones.

7 Explanatory relevance
E is explanatorily relevant to H iff
• Direct explanation: E explains H, or H explains E; or
• Indirect explanatory relevance: correct explanations for E and H have some feature in common.
This is an objective relation.
Subjective: Your entitlement to take E as evidence in favour of H depends on the good reasons you have to believe E is explanatorily relevant to H.

8 Indirect explanatory relevance
E: RCT conclusion for φ
H: Policy prediction for θ
U(E): Unshared elements of explanation in φ
U(H): Unshared elements of explanation in θ
X(H,E): Shared element of explanations in φ, θ

9 Flow of evidential support
E: RCT conclusion for φ
H: Policy prediction for θ
U(E): Unshared elements of explanation in φ
U(H): Unshared elements of explanation in θ
X(H,E): Shared element of explanations in φ, θ

10 First lesson
The relevance of E to H flows through the common explanatory element, X(H,E). This route is not available unless the unshared element of the explanation for H, U(H), obtains. So:
• E is evidentially relevant to H only conditional on the unshared element U(H) obtaining.
• You are entitled to treat E as relevant evidence for H only to the extent that you have good reasons to think that U(H) obtains.
Advice on how to find U(H)…

11 Pies, pancakes and unshared elements
Causes are INUS conditions: Insufficient Nonredundant parts of Unnecessary but Sufficient conditions (taking ‘sufficiency’ in Anscombe’s sense of ‘enough’, and contributions, not total effects, as outcomes).
A pie: a set of causes that are jointly enough for a contribution to the effect.

12 Smoking – C3 – causes lung cancer. But not all smokers develop lung cancer. Genetic and environmental factors contribute. Sufficient Cause A is a ‘pie’ of slices, including smoking, that together cause lung cancer. And people develop lung cancer without smoking. Sufficient Cause B is a ‘pie’ of factors, not including smoking (no C3), that together cause lung cancer. Working in a coal mine is C8.
[Figure: Sufficient Cause A; Sufficient Cause B]
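
The pie picture can be sketched in a few lines of Python. This is only an illustrative model, not something from the talk: each pie is a set of factors, and the effect occurs when every slice of at least one pie is present. Only C3 (smoking) and C8 (coal mining) come from the slide; the other factor names, the composition of each pie, and the function names are assumptions made for the example. The sketch also treats the outcome as all-or-nothing, whereas the slides speak of contributions to the effect.

```python
from typing import FrozenSet, Iterable, List

Pie = FrozenSet[str]

# Hypothetical pies: only C3 and C8 are named on the slide.
PIE_A: Pie = frozenset({"C3_smoking", "genetic_factor", "environmental_factor"})
PIE_B: Pie = frozenset({"C8_coal_mining", "genetic_factor"})
PIES: List[Pie] = [PIE_A, PIE_B]


def outcome_occurs(present: Iterable[str], pies: List[Pie] = PIES) -> bool:
    """The effect occurs when every slice of at least one pie is present."""
    present_set = frozenset(present)
    return any(pie <= present_set for pie in pies)


def is_inus(factor: str, pie: Pie, pies: List[Pie] = PIES) -> bool:
    """Check the INUS clauses for `factor` relative to `pie`."""
    if factor not in pie:
        return False
    # Insufficient: the factor alone does not produce the effect.
    insufficient = not outcome_occurs({factor}, pies)
    # Nonredundant: the rest of the pie does not produce the effect without it.
    nonredundant = not outcome_occurs(pie - {factor}, pies)
    # Sufficient: the complete pie produces the effect.
    sufficient = outcome_occurs(pie, pies)
    # Unnecessary: some other pie can produce the effect without this one.
    unnecessary = any(not pie <= other for other in pies if other != pie)
    return insufficient and nonredundant and sufficient and unnecessary


print(is_inus("C3_smoking", PIE_A))    # True
print(outcome_occurs({"C3_smoking"}))  # False: smoking alone is not enough
print(outcome_occurs(PIE_B))           # True: cancer without smoking
```

Representing pies as sets keeps the four clauses of the INUS definition visible as four separate checks.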

13 The treatment variable T is never enough by itself to promote O – it is only a slice of a pie. It won’t work without the other slices. And it won’t work if there are any pies of sufficient strength operating to prevent O.

14 So, the least that’s needed to ensure that T produces O is a set that includes:
• All the helping factors in a pie for T.
• The negation of at least one factor from every pie in any combination of pies strong enough to prevent O.
Call such a set N: N is a set of necessary complementary factors sufficient for T to produce O.
Thus the principle: T & N c→ O
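
A minimal sketch of the set N, under simplifying assumptions that are not in the talk: factors are plain labels, a single helping pie for T is given, and each preventing pie is taken to be strong enough on its own to block O (the slide allows combinations of pies). The labels `h1`, `h2`, `b1`, `b2` and both function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List

Pie = FrozenSet[str]


def n_holds(treatment: str, helping_pie: Pie,
            preventing_pies: List[Pie], present: Iterable[str]) -> bool:
    """N holds when T's helpers are all present and every preventing pie is broken."""
    present_set = frozenset(present)
    # All the helping factors in a pie for T (the pie minus T itself).
    helpers_present = (helping_pie - {treatment}) <= present_set
    # At least one factor missing from every preventing pie.
    preventers_broken = all(not pie <= present_set for pie in preventing_pies)
    return helpers_present and preventers_broken


def t_produces_o(treatment: str, helping_pie: Pie,
                 preventing_pies: List[Pie], present: Iterable[str]) -> bool:
    """The principle from the slide: T together with N causes O."""
    present_set = frozenset(present)
    return treatment in present_set and n_holds(
        treatment, helping_pie, preventing_pies, present_set)


helping = frozenset({"T", "h1", "h2"})
preventing = [frozenset({"b1", "b2"})]

print(t_produces_o("T", helping, preventing, {"T", "h1", "h2", "b1"}))        # True
print(t_produces_o("T", helping, preventing, {"T", "h1", "h2", "b1", "b2"}))  # False
print(t_produces_o("T", helping, preventing, {"T", "h1"}))                    # False
```

The second call fails because a preventing pie is complete, the third because a helping slice is missing; in both cases N does not obtain, so T does not produce O.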

15 Flow of evidential support
RCT conclusion: Pφ(O/T) > Pφ(O/-T)
Policy prediction: T helps some units in θ
N holds in φ
N holds in a subset of θ
T & N c→ O

16 Conclusion 1 The unshared explanatory element is N – a set of complementary factors sufficient to ensure T produces O. Happily, N is smaller than a full set of confounders. But if N does not obtain, the RCT is not evidentially relevant to policy prediction. And without good reason to think N holds in your population, you do not have good reason to count the RCT as evidence.
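
A toy calculation can make the point concrete. It is a sketch under stated assumptions, not a model from the talk: every unit either has the complementary set N or lacks it, O occurs for a treated unit only if N holds, and some units get O from causes that do not involve T. Instead of simulating randomization, the code applies both arms to every unit, which is the difference an ideal RCT would estimate. The populations and their sizes are invented for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass(frozen=True)
class Unit:
    has_n: bool                # the complementary set N holds for this unit
    other_cause: bool = False  # hypothetical: O arises from a pie without T


def outcome(unit: Unit, treated: bool) -> bool:
    """O occurs via T and N together, or via some cause unrelated to T."""
    return unit.other_cause or (treated and unit.has_n)


def effect_of_t(population: List[Unit]) -> Fraction:
    """P(O/T) - P(O/-T), computed exactly by applying both arms to every unit."""
    with_t = sum(outcome(u, True) for u in population)
    without_t = sum(outcome(u, False) for u in population)
    return Fraction(with_t - without_t, len(population))


# Study population phi: N holds for 4 of 10 units.
phi = [Unit(True)] * 4 + [Unit(False)] * 5 + [Unit(False, True)]
# Target population theta: N holds for no unit.
theta = [Unit(False)] * 8 + [Unit(False, True)] * 2

print(effect_of_t(phi))    # 2/5
print(effect_of_t(theta))  # 0
```

The positive result in φ says nothing about θ here, because the factor that carries the effect is absent from θ. Knowing that requires information about N in θ, which the trial itself does not supply.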

17 Pancakes: baking powder, RCTs and evidential relevance
Given flour, milk and eggs, baking powder can turn the whole mix into pancakes.
[Figure: flour, milk and eggs + baking powder = pancakes]

18 But you can’t make pancakes without flour, eggs and milk, no matter how much baking powder you pour in the bowl.
[Figure: baking powder alone ≠ pancakes]
RCTs can’t make evidence without support for the necessary complementary factors.

19 Horizontal v vertical search
Horizontal searches look for complementary factors.
Vertical searches survey levels of abstraction, usually to find shared explanatory principles.
• This is possible since explanations involving features of various levels of abstraction can all be true at once.
Ex: straight lines, great circles and Euclidean lines. On a sphere an object subject to no forces traverses a great circle; on a sphere an object subject to no forces traverses a straight line. On a sphere, traversing a great circle is what it is to traverse a straight line.

20 What counts as a straight line?

21 Cautions
Citing the wrong level of factor in an RCT conclusion can limit the scope of its evidential relevance. Worse:
• You may correctly surmise that populations φ and θ share a common causal structure – same causal principles.
• But incorrectly identify the level of abstraction of the features in those principles.
• Then you make bad predictions from taking results in φ as evidence for predictions in θ.

22 EX: The Bangladesh Integrated Nutrition Program (BINP) BINP provided pregnant women with nutritional counseling. This had ‘worked’ elsewhere. But analysis by the World Bank’s Operations Evaluation Department found no significant impact on infants’ nutritional status in the Bangladesh study population. This despite reasonable horizontal search: knowledge alone is not enough; resources are needed too. Hence a supplemental feeding programme.

23 Failure of evidential relevance
Supposition: Explanations for the results elsewhere and for Bangladesh success, had it occurred, would share a common principle: Better nutritional knowledge in mothers plus supplemental feeding improves the nutritional status of their children. But the two populations did not share this principle.

24 Failure of vertical search
Two problems:
• Food leakage
• Men and mothers-in-law.
Howard White: The program targeted the mothers of young children. But mothers are frequently not the decision makers, and rarely the sole decision makers, with respect to the health and nutrition of their children. For a start, women do not go to market in rural Bangladesh; it is men who do the shopping. And for women in joint households – meaning they live with their mother-in-law – as a sizeable minority do, then the mother-in-law heads the women’s domain. Indeed, project participation rates are significantly lower for women living with their mother-in-law in more conservative parts of the country.

25 Genuinely shared principle
Better nutritional knowledge results in better nutrition for a child in those who
1. have sufficient resources to use that knowledge to improve the child’s nutrition,
2. control what food is procured with those resources,
3. control how food gets dispensed, and
4. hold the child’s interests as central in performing 2 and 3.

26 In the Bangladesh programme
• Supplementary food did not generally constitute sufficient resources.
• The feature ‘being a mother’ did not in general constitute the more abstract features in 2 and 3.
There is a shared principle at a higher level of abstraction but the horizontal complementary factors required by that principle are lacking. Earlier study results are not evidentially relevant without these requisite complements.
Evidential relevance requires
• correct horizontal factors,
• correct vertical factors and
• a correct match between the two.
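
The mismatch between the concrete feature ‘being a mother’ and the abstract conditions of the shared principle can be written out as a small check. This is an illustrative sketch only: the field names paraphrase the four conditions of the shared principle, and the two example profiles are hypothetical, not data from the BINP evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipient:
    """A person who receives nutritional counselling (hypothetical profile)."""
    is_mother: bool
    sufficient_resources: bool      # condition 1
    controls_procurement: bool      # condition 2
    controls_dispensing: bool       # condition 3
    child_interests_central: bool   # condition 4


def principle_applies(r: Recipient) -> bool:
    """All four abstract conditions of the shared principle hold."""
    return (r.sufficient_resources and r.controls_procurement
            and r.controls_dispensing and r.child_interests_central)


def counselling_predicted_to_help(r: Recipient) -> bool:
    """Counselling is predicted to improve nutrition only where the principle applies.

    Targeting by `is_mother` plays no role in the prediction.
    """
    return principle_applies(r)


# Hypothetical mother in a joint household: men shop, mother-in-law dispenses.
joint_household_mother = Recipient(
    is_mother=True, sufficient_resources=False, controls_procurement=False,
    controls_dispensing=False, child_interests_central=True)

# Hypothetical recipient who satisfies the abstract conditions.
decision_maker = Recipient(
    is_mother=False, sufficient_resources=True, controls_procurement=True,
    controls_dispensing=True, child_interests_central=True)

print(counselling_predicted_to_help(joint_household_mother))  # False
print(counselling_predicted_to_help(decision_maker))          # True
```

Keeping `is_mother` as a separate field that the prediction never reads is the design point: the concrete feature and the abstract conditions can come apart.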

27 Flow of evidential support
RCT conclusion: Pφ(O/T) > Pφ(O/-T)
Policy prediction: T helps some units in θ
N holds in φ
N holds in a subset of θ
T & N c→ O

28 Epistemic demands: EBPHPP
Cartwright dictum: E isn’t evidence for you unless you have evidence that it’s evidence.
To stop the regress:
• Cochrane and Campbell reviews are reasonable stopping points for policy analysts in judging quality/credibility of E (P(E) is high).
But who polices the evidential relevance of E?

29 Whence the evidence for relevance? You can’t argue you’ve got the right vertical level and the right horizontal factors without invoking lots of theory and lots of other empirical results. You may not need these to argue for quality, but you can’t avoid them in defending relevance. And without relevance what’s the point?

30 What I have done
1. Reminded you that quality/credibility is not by a long shot all that matters about evidence.
2. Shifted the focus from external validity to evidential relevance.
3. Equated evidential and explanatory relevance.
4. Insisted that it’s not evidence without evidence it’s evidence.
5. Used INUS pies to characterize a set of minimal conditions that must be satisfied to turn RCT results into evidence.
6. Stressed the importance of the abstract and the concrete.
This results in…

31 Bad news
To be justified in taking a result as evidence requires a great deal of theory, local knowledge and empirical results. These go far beyond the trial methodology that we have mastered so well. Without these, taking trial results as evidence is pure guesswork.

32 Good news
The focus on evidential and explanatory relevance provides a structure within which to organize what’s needed:
• The theory of causes as INUS conditions provides an abstract characterization of what you need from a horizontal search.
• And warnings about the relations of the abstract and the concrete show the importance of vertical search.

33 Good news You don’t have to know everything. You have a catalogue for search.