Narrowing the evaluation gap


1 Narrowing the evaluation gap
Session 4: Eight challenges facing the quantitative social sciences in being relevant for public policy
John Jerrim
2018 EEF Evaluators’ Conference: Narrowing the evaluation gap
#EEFeval18 @EducEndowFound

2 Eight challenges facing the quantitative social sciences in being relevant for public policy……

3 Eight challenges
1. The prevalence of overly-complicated methods
2. How slow everything is
3. The problem of peer review
4. Lack of real quality assurance
5. Publication bias
6. Open access data vs pre-registration of studies
7. Over-reliance upon hypothesis-testing / statistical inference
8. Why do the EEF toolkit and RCTs look so different?

4 1. The prevalence of overly-complicated methods
Beauty of RCTs = just compare mean scores across groups.
Other methods are getting increasingly complicated…
Frankly, many (most?) researchers using many of these methods don’t really understand them!
Example: structural equation modelling (SEM).
- Now widely used in sociology/psychology.
- Actually, can be quite a complex method.
- Many people don’t really understand the numbers being produced (fit stats!)
Very hard to communicate the methods/results as things get complex.
- Hard to review/quality assure. Not transparent.
- Hard to communicate!
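To illustrate how little machinery the "just compare mean scores across groups" analysis needs, here is a minimal sketch in Python. The scores are made up for illustration and do not come from any EEF trial. The function names `difference_in_means` and `standardised_effect_size` are hypothetical, and using the pooled standard deviation to standardise the difference is an assumption about how one might report an effect size, not a description of the EEF's own procedure.

```python
from statistics import mean, variance


def difference_in_means(treatment, control):
    """Raw difference between the treatment and control group means."""
    return mean(treatment) - mean(control)


def standardised_effect_size(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return difference_in_means(treatment, control) / pooled_variance ** 0.5


# Made-up post-test scores, for illustration only.
treatment_scores = [52, 55, 61, 58, 49, 63, 57, 60]
control_scores = [50, 53, 59, 54, 48, 60, 55, 57]

print("Difference in means:", difference_in_means(treatment_scores, control_scores))
print("Standardised effect size:", standardised_effect_size(treatment_scores, control_scores))
```

The whole analysis fits in two short functions that a reviewer can check by hand, which is the transparency point being made about RCTs.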

5 Example of overly complicated social science results………
Saunders (2010: Figure 1). Social Mobility Myths.

6 2. How slow everything is…..
If I come up with a policy-relevant research idea, here is what I have to do:
Month 0 = Come up with idea.
Month 3 = Research proposal written up and submitted.
Month 9 = Grant decision made.
Month 10 = Start work.
Month 13 = Complete first draft of paper. Submit to journal.
Month 17 = Decision from journal. Revise and resubmit.
Month 19 = Submit back to journal.
Month 21 = Decision from journal. Accept.
Month 22 = Proofs from journal.
Month 24 = Publication.
I have taken almost half of a parliamentary term to go from my initial idea to getting evidence out there…
…but I could get this out in around 3 months if I just get on and do it!

7 3. The problem of peer-review…..
A large chunk of this time-lag is due to peer review.
- Grant review process (6 months)
- Journal review process (6 months)
Would be OK if peer review in academia worked well. It doesn’t.
Lots of bad practice:
- ESRC does not blind reviewers to the applicant.
- ESRC sent me a peer review to do, of my own PhD student!
- ‘Special issues’ = You become the editor and give your mates an easy ride.
- Journal sent me a paper to review by my co-author.
- Reviews are very subjective.
- Not at all transparent…
If you fail to get accepted, just publish the paper elsewhere.
- It will get published somewhere eventually!

8 4. Lack of real quality assurance procedures.
Peer review of papers in academic journals is a very low bar.
- If at first you don’t succeed, try, try again!
More prestigious journals = higher quality articles?
- Possible, but debatable!
- Many poor papers still get into top journals.
- Policymakers don’t know prestigious journals from any other!
- Only way to judge the quality is to read it yourself!
No one actually checks people’s workings…
- No one checks the code for errors (or typically even asks for it).
- But errors happen! We are all human.
Many journals still do not require code to be published…
- You don’t have to make freely available how you reached your conclusion.

9 What can we do to change, improve or replace peer review?
Improve transparency of reviews (BMJ approach)
- Publish reviewers’ comments and author responses.
- Publish all iterations of papers (first submission through to final article).
- Mandatory open publication of code.
Get rid of journals entirely and publish everything online in working paper series?
- Happens anyway!
- Lots cheaper.
- Why do we really need academic journals anyway?
Fund people rather than specific projects?
- E.g. fund people for renewable 5-year periods.
- Let people get on with things.
- Stop wasting time on funding applications rather than actual research.

10 5. Publication bias
More severe issue in social than medical sciences?
Think about RCTs…
- Researchers already very heavily invested by the time results come.
- Writing up is a relatively small piece of marginal effort.
- High “sunk costs”.
Think about social science research using survey data.
- Very quick to do rough estimations/analysis.
- Get idea of answer within a few days.
- Very low sunk cost.
- Not a lot lost from not writing up!
A lot of zero findings in the social sciences will not be written up!
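The consequence of zero findings going unwritten can be sketched with a small simulation. Everything here is an assumption made for illustration: the true effect of 0.05 standard deviations, the 50 participants per arm, the 2,000 simulated studies, and the rule that a study is "written up" only when its estimate is statistically significant at the 5% level (normal approximation). The function names are hypothetical.

```python
import random
from statistics import mean, variance


def simulate_study(rng, true_effect, n_per_arm):
    """Simulate one two-arm study; return the estimate and its standard error."""
    treatment = [rng.gauss(true_effect, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    estimate = mean(treatment) - mean(control)
    std_error = ((variance(treatment) + variance(control)) / n_per_arm) ** 0.5
    return estimate, std_error


def simulate_literature(true_effect=0.05, n_per_arm=50, studies=2000, seed=1):
    """Compare the average of all estimates with the average of 'written up' ones."""
    rng = random.Random(seed)
    all_estimates, written_up = [], []
    for _ in range(studies):
        estimate, std_error = simulate_study(rng, true_effect, n_per_arm)
        all_estimates.append(estimate)
        # Assumed selection rule: only significant results get written up.
        if abs(estimate) > 1.96 * std_error:
            written_up.append(estimate)
    return mean(all_estimates), mean(written_up), len(written_up) / studies


all_mean, written_mean, share_written = simulate_literature()
print(f"Mean estimate, all studies:        {all_mean:.3f}")
print(f"Mean estimate, written-up studies: {written_mean:.3f}")
print(f"Share of studies written up:       {share_written:.3f}")
```

Under these assumptions the average across all studies stays close to the true effect, while the average across the written-up subset is noticeably larger, because only unusually large estimates clear the significance bar.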

11 6. Is pre-registration/protocols a solution?
Great idea when doing primary data collection (EEF trials)…
…but most quantitative social science (QSS) is still based upon secondary data (e.g. birth cohorts).
Such resources are (quite rightly) open access…
But there is a trade-off between open access and pre-registration!
Pre-registration/protocols are not a viable option for most QSS research!
- What else can we do to make sure zero/small results are written up?

12 7. What on earth is a confidence interval?
1. The probability that the true mean is greater than 0 is at least 95%.
2. The probability that the true mean equals 0 is smaller than 5%.
3. The “null hypothesis” that the true mean equals 0 is likely to be incorrect.
4. There is a 95% probability that the true mean lies between 0.1 and 0.4.
5. We can be 95% confident that the true mean lies between 0.1 and 0.4.
6. If we were to repeat the experiment over and over, then 95% of the time the true mean falls between 0.1 and 0.4.
In groups… Which of these statements are true, and which are false!?

13 7. What on earth is a confidence interval?
Professor Gorard conducts an experiment and reports: “The 95% confidence interval for the mean ranges from 0.1 to 0.4”
In groups, decide which of the following statements are true and which are false.
1. The probability that the true mean is greater than 0 is at least 95%.
2. The probability that the true mean equals 0 is smaller than 5%.
3. The “null hypothesis” that the true mean equals 0 is likely to be incorrect.
4. There is a 95% probability that the true mean lies between 0.1 and 0.4.
5. We can be 95% confident that the true mean lies between 0.1 and 0.4.
6. If we were to repeat the experiment over and over, then 95% of the time the true mean falls between 0.1 and 0.4.

14 How did a (convenience) sample of psychology students and researchers respond?

15 7. Over-reliance upon statistical inference
P-values / confidence intervals / statistical significance are overused!
What is a 95% confidence interval?
If you were to repeat the same random sampling process 20 times, calculating a new interval each time, on average 19 of those intervals would contain the true population parameter.
It gives you an indication of the uncertainty about the ‘true’ value in the population based upon the sampling procedure…
…It tells you nothing about importance, magnitude or policy relevance.
…Nor does it tell you about any other kind of uncertainty (e.g. missing data).
When is it important to report such things?
When we have truly random samples from a well-defined population (e.g. PISA)…
…BUT even then it should be secondary to estimates of magnitude (e.g. effect size).
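The repeated-sampling reading of a 95% confidence interval can be checked with a short simulation: fix a true population mean, draw many random samples, build an interval from each, and count how often the interval contains the true mean. This is a sketch under stated assumptions: a normally distributed population, a true mean of 0.25 and a sample size of 100 (all chosen for illustration), and the normal-approximation multiplier 1.96. The function names are hypothetical.

```python
import random
from statistics import mean, stdev


def confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    centre = mean(sample)
    half_width = z * stdev(sample) / len(sample) ** 0.5
    return centre - half_width, centre + half_width


def coverage(true_mean, sd, sample_size, repetitions, seed=0):
    """Share of repeated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sd) for _ in range(sample_size)]
        lower, upper = confidence_interval(sample)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repetitions


share_covering = coverage(true_mean=0.25, sd=1.0, sample_size=100, repetitions=2000)
print(f"Share of intervals containing the true mean: {share_covering:.3f}")
```

The printed share should be close to 0.95. The true mean is held fixed throughout and it is the interval that moves from sample to sample, so the 95% describes the procedure rather than any single reported interval.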

16 Should p-values be used to decide what is a ‘promising project’?
Texting-parents intervention included within the “promising project” group…
Results below…
Discussion point: Should p-values be used to decide what is a ‘promising project’?
Based upon the results above, do people think this is a promising project?
What criteria should EEF use to define a promising project?

17 8. Why do the EEF toolkit and RCTs look so different?
Lots of effect sizes of 0.4 in the toolkit…
…likewise in well-known studies (e.g. Hattie).
BUT the average EEF RCT effect size = 0.05ish…
Discussion points
1. Why do people think there are such big differences?
2. Should the EEF update the toolkit to reflect this difference? If so, how!?

