Improving Openness and Reproducibility of Scientific Research
Brian Nosek, University of Virginia -- Center for Open Science
http://briannosek.com/ -- http://cos.io/
My general substantive interest is in the gap between values and practices. The work that I am discussing today is a practical application of this interest to the gap between scientific values and practices. In particular, how can I best advance knowledge and my career at the same time? I will discuss the challenges I face when working to advance scientific knowledge and my career at the same time, and how my scientific practices can be adapted to meet my scientific values.
Norms and Counternorms
Communality (open sharing with colleagues) vs. Secrecy (closed)
Universalism (research evaluated only on its own merit) vs. Particularism (research evaluated by reputation or past productivity)
Disinterestedness (scientists motivated by knowledge and discovery, not by personal gain) vs. Self-interestedness (treat science as a competition with other scientists)
Organized skepticism (consider all new evidence, theory, and data, even if it contradicts one’s prior work or point of view) vs. Organized dogmatism (invest career in promoting one’s own most important findings, theories, and innovations)
Quality (seek quality contributions) vs. Quantity (seek high volume)
Anderson, Martinson, & DeVries, 2007
Incentives for individual success are focused on getting it published, not getting it right (Nosek, Spies, & Motyl, 2012)
Problems: low power; flexibility in analysis; selective reporting; ignoring nulls; lack of replication. Examples from: Button et al. (neuroscience); Ioannidis, why most results are false (medicine); GWAS (biology). Two possibilities are that the percentage of positive results is inflated because negative results are much less likely to be published, and that we are using our analytic freedoms to produce positive results that are not really there. Both would inflate the rate of false positives in the published literature. Some evidence from biomedical research suggests that this is occurring. Two different industrial laboratories attempted to replicate 40 or 50 basic science studies that showed positive evidence for markers for new cancer treatments or other issues in medicine. They did not select at random. Instead, they picked studies considered landmark findings. The success rates for replication were about 25% in one study and about 10% in the other. Further, some of the findings they could not replicate had spurred large literatures of hundreds of articles following up on the finding and its implications, without ever testing whether the evidence for the original finding was solid. This is a massive waste of resources. Across the sciences, evidence like this has spurred much discussion and many proposed actions to improve research efficiency, to keep erroneous results from getting into and staying in the literature, and to change a culture of scientific practice that rewards publishing, perhaps at the expense of knowledge building. There have been a variety of suggestions for what to do. For example, the Nature article on the right suggests that publishing standards should be increased for basic science research. [It is not in my interest to replicate, myself or others, to evaluate validity and improve precision in effect estimates (redundant).]
Replication is worth next to zero (Makel data on published replications): researchers are motivated to not call it replication; novelty is supreme; there is zero “error checking”. It is not in my interest to check my work, and not in your interest to check my work (let’s just each do our own thing and get rewarded for that). Irreproducible results will get in and stay in the literature (examples from bio-med). Prinz and Begley articles (make sure to summarize accurately). The Nature article by folks in bio-medicine is great. The solution they offer is a popular one among commentators from the other sciences: raise publishing standards. Sterling, 1959; Cohen, 1962; Lykken, 1968; Tukey, 1969; Greenwald, 1975; Meehl, 1978; Rosenthal, 1979
Figure credit: fivethirtyeight.com Silberzahn et al., 2015
http://compare-trials.org/
N = 32 studies in psychology (Franco, Malhotra, & Simonovits, 2015, SPPS)
Reported tests (N = 122): median p-value = .02; median effect size (d) = .29; % p < .05 = 63%
Unreported tests (N = 147): median p-value = .35; median effect size (d) = .13; % p < .05 = 23%
We find that about 40% of studies fail to fully report all experimental conditions and about 70% of studies do not report all outcome variables included in the questionnaire. Reported effect sizes are about twice as large as unreported effect sizes and are about 3 times more likely to be statistically significant.
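As a quick check on the "about twice as large" and "about 3 times more likely" statements, the ratios can be computed from the medians and percentages shown on the slide. This is a minimal sketch: the variable names are illustrative, and dividing medians and percentages this way is only a rough approximation, not the paper's own analysis.

```python
# Figures as quoted on the slide (Franco, Malhotra, & Simonovits, 2015).
reported = {"n_tests": 122, "median_p": 0.02, "median_d": 0.29, "pct_sig": 63}
unreported = {"n_tests": 147, "median_p": 0.35, "median_d": 0.13, "pct_sig": 23}

# Ratio of median effect sizes, reported vs. unreported tests.
effect_size_ratio = reported["median_d"] / unreported["median_d"]

# Ratio of the share of tests with p < .05, reported vs. unreported.
significance_ratio = reported["pct_sig"] / unreported["pct_sig"]

print(f"Effect size ratio: {effect_size_ratio:.2f}")
print(f"Significance ratio: {significance_ratio:.2f}")
```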
Positive result rate dropped from 57% to 8% after preregistration was required.
Barriers
Perceived norms (Anderson, Martinson, & DeVries, 2007)
Motivated reasoning (Kunda, 1990)
Minimal accountability (Lerner & Tetlock, 1999)
I am busy (Me & You, 2016)
We can understand the nature of the challenge with existing psychological theory. For example:
1. The goals and rewards of publishing are immediate and concrete; the rewards of getting it right are distal and abstract (Trope & Liberman).
2. I have beliefs, ideologies, and achievement motivations that influence how I interpret and report my research (motivated reasoning; Kunda, 1990). And even if I am trying to resist this motivated reasoning, I may simply be unable to detect it in myself, even when I can see those biases in others.
3. And what biases might influence me? Well, pick your favorite. My favorite in this context is the hindsight bias.
4. What’s more, we face these potential biases in a context of minimal accountability. What you know of my laboratory work is only what you get in the published report. …
5. Finally, even if I am prepared to accept that I have these biases and am motivated to address them so that I can get it right, I am busy. So are you. If I introduce a whole bunch of new things that I must now do to check and correct for my biases, I will kill my productivity and that of my collaborators. So, the incentives lead me to think that my best course of action is to just do the best I can and hope that I’m doing it okay.
MEANS REWARDS Novel, Positive, Clean Transparency, Reproducibility Outcomes Process MEANS Research Content Data and Materials Publication REWARDS
Signals: Making Behaviors Visible Promotes Adoption. Badges: Open Data, Open Materials, Preregistration. Psychological Science (Jan 2014). Kidwell et al., 2016
Figure: % of articles reporting data available in repository (axis from 0% to 40%)
PREREGISTRATION
Context of Justification: confirmation; data independent; hypothesis testing; p-values interpretable
Context of Discovery: exploration; data contingent; hypothesis generating; p-values NOT interpretable
Presenting exploratory as confirmatory increases publishability of results at the cost of credibility of results
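One way to see why p-values lose their usual interpretation when analysis choices are data contingent: if a researcher can pick among several tests after seeing the data and report whichever one is significant, the chance of at least one false positive grows quickly. A minimal sketch, assuming independent tests of true null hypotheses at alpha = .05; the function name and the independence assumption are illustrative and not from the talk.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests of a
    true null hypothesis comes out significant at the given alpha."""
    return 1 - (1 - alpha) ** num_tests


# Compare a single preplanned test with increasingly flexible analyses.
for k in (1, 5, 10, 20):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.1%}")
```

With a single preregistered test the rate stays at the nominal alpha; the more undisclosed alternatives there are, the further the real rate drifts from what the reported p-value implies.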
Positive result rate dropped from 57% to 8% after preregistration was required.
Are you okay with receiving treatment based on clinical trials that were not preregistered? “oh but clinical trials are important” Why would I waste my time on something that isn’t important enough to be worth doing the best that I can do it?
Preregistration Challenge http://cos.io/prereg
Registered Reports
Stages: Design, Collect & Analyze, Report, Publish (PEER REVIEW before data collection)
Review of intro and methods prior to data collection; published regardless of outcome
Beauty vs. accuracy of reporting
Publishing negative results
Conducting replications
Peer review focuses on quality of methods
http://osf.io/8mpji, Committee Chair: Chris Chambers
Registered Reports: 39 journals so far
AIMS Neuroscience; Attention, Percept., & Psychophys; Cognition and Emotion; Cognitive Research; Comp. Results in Social Psychology; Cortex; Drug and Alcohol Dependence; eLife; Euro Journal of Neuroscience; Experimental Psychology; Human Movement Science; Int’l Journal of Psychophysiology; Journal of Accounting Research; Journal of Business and Psychology; Journal of Euro. Psych. Students; Journal of Expt’l Political Science; Journal of Personnel Psychology; Journal of Media Psychology; Leadership Quarterly; Nature Human Behaviour; Nicotine and Tobacco Research; NFS Journal; Nutrition and Food Science Journal; Perspectives on Psych. Science; Royal Society Open Science; Social Psychology; Stress and Health; Work, Aging, and Retirement
Mundane and Big Challenges for Reproducibility: forgetting; losing materials and data
http://osf.io
Tyranny of the publication. Solution = Separate publication and evaluation (Nosek & Bar-Anan, 2012)
Improving scientific ecosystem: Technology to enable change; Training to enact change; Incentives to embrace change
UNIVERSITIES ecosystem PUBLISHING FUNDERS SOCIETIES
What can you do?
Try OSF, http://osf.io/
Prereg Challenge, http://cos.io/prereg/
Share a preprint, http://osf.io/preprints/
Editors: Badges, Registered Reports, TOP
Departments: OSF-Reproducibility workshops, hiring and promotion criteria
Individuals: COS Ambassador
Email: Support@cos.io or nosek@virginia.edu
http://improvingpsych.org/
So is there a crisis? Nature survey. Who are these respondents?
And why are they saying that? Because of a pervasive challenge across disciplines in reproducing prior results. A fundamental feature of science is that scientific claims gain credibility, compared to other kinds of claims, based on the potential to independently reproduce the evidence. But, of course, whether there is a reproducibility crisis or not isn’t really a relevant question. The real question is whether there are cultural norms and practices that are undermining the efficiency and effectiveness of knowledge building, and whether there are ways that we can do better.
Conclusion: Psychology is not under threat; it is leading the way to more open, reproducible science. Start with conclusion…. Why such focus on psych? We are getting the attention not because we have bigger problems, but because we are bothering to face them, we have productive ideas, and we are taking steps to deal with them. Think about how much time you spend not caring about what is happening in physics, chemistry, earth sciences, and biology. Turns out, that is just about the same amount of time that physicists, chemists, earth scientists, and biologists spend not caring about us. That’s why the “reproducibility crisis” is actually an opportunity for us, not a threat. Other areas of science will care about us more and more not when we solve our challenges, but when we solve theirs. And, that’s already underway…
What can you do?
Try out OSF, http://osf.io/
Prereg Challenge, http://cos.io/prereg/
Share a preprint, http://osf.io/preprints/
Join SIPS, http://improvingpsych.org/
These slides are shared at: http://osf.io/bq4kn [take a picture of this slide]
Email: Support@cos.io or nosek@virginia.edu
http://cos.io/top
TOP Guidelines: Data citation; Design transparency; Research materials transparency; Data transparency; Analytic methods (code) transparency; Preregistration of studies; Preregistration of analysis plans; Replication
Data sharing
1. Article states whether data are available, and, if so, where to access them.
2. Data must be posted to a trusted repository. Exceptions must be identified at article submission.
3. Data must be posted to a trusted repository, and reported analyses will be reproduced independently prior to publication.
749 Journals, 62 Organizations: AAAS/Science; American Academy of Neurology; American Geophysical Union; American Heart Association; American Meteorological Society; American Society for Cell Biology; Association for Psychological Science; Association for Research in Personality; Association of Research Libraries; Behavioral Science and Policy Association; BioMed Central; Committee on Publication Ethics; Electrochemical Society; Frontiers; MDPI; Nature Publishing Group; PeerJ; Pensoft Publishers; Public Library of Science; The Royal Society; Society for Personality and Social Psychology; Society for a Science of Clinical Psychology; Ubiquity Press; Wiley
Improving scientific ecosystem
Open Access (outcomes): make outcomes more accessible
Open Data (content): make research content more accessible
Open Workflows (process): make research process more accessible
OpenSesame
Dreber et al., in press, PNAS
Fig. 3 (Dreber et al., in press, PNAS). Probability of a hypothesis being true at three different stages of testing: before the initial study (p0), after the initial study but before the replication (p1), and after replication (p2). “Error bars” (or whiskers) represent range, boxes are first to third quartiles, and thick lines are medians. Initially, priors of the tested hypothesis are relatively low, with a median of 8.8% (range, 0.7–66%). A positive result in an initial publication then moves the prior into a broad range of intermediate levels, with a median of 56% (range, 10–97%). If replicated successfully, the probability moves further up, with a median of 98% (range, 93.0–99.2%). If the replication fails, the probability moves back to a range close to the initial prior, with a median of 6.3% (range, 0.01–80%).
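The three stages in the figure (p0, p1, p2) can be read as successive Bayesian updates. A minimal sketch, assuming a hypothetical statistical power of 0.8 and a significance threshold of 0.05 for both the initial study and the replication; these values and the function name are illustrative assumptions and are not taken from Dreber et al. Only the 8.8% median prior comes from the caption.

```python
def update(prior: float, p_result_if_true: float, p_result_if_false: float) -> float:
    """Bayes' rule: probability the hypothesis is true after observing a result."""
    numerator = prior * p_result_if_true
    return numerator / (numerator + (1 - prior) * p_result_if_false)


POWER = 0.8   # assumed chance of a positive result when the hypothesis is true
ALPHA = 0.05  # assumed chance of a positive result when the hypothesis is false

p0 = 0.088                                      # median prior from the caption
p1 = update(p0, POWER, ALPHA)                   # after a positive initial study
p2_success = update(p1, POWER, ALPHA)           # after a successful replication
p2_failure = update(p1, 1 - POWER, 1 - ALPHA)   # after a failed replication

print(f"p0={p0:.3f}  p1={p1:.3f}  "
      f"p2 (replicated)={p2_success:.3f}  p2 (failed)={p2_failure:.3f}")
```

The exact numbers depend entirely on the assumed power and alpha, so they will not match the medians in the figure; the sketch only shows the direction of each update.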
97% 37% xx Open Science Collaboration, 2015, Science
Mr = .396 Mr = .198 Open Science Collaboration, 2015, Science
New Reproducibility Projects: Economics; Cancer Biology; Many Babies (Michael Frank); Computer Science; Tropical Ecology; Health Sciences. RPCB; Economics (Camerer at Caltech).
Where are we going?
- Support teams in many disciplines to adopt the Reproducibility Project model
- Christian Collberg and Todd Proebsting have proposed a replication project in computer science
- Emilio Bruna: Replication Project: Tropical Ecology
- Michael Frank: Many Babies
- Leslie McIntosh and Cynthia Hudson-Vitale: electronic health records proposal, an internal replication at Washington University in St. Louis of a study that was performed with a data set that is not accessible to the general public
How do we get there?
- Focus on evaluating our initiatives and interventions
KEVIN M. ESTERLING, professor of political science at UC Riverside; Christian Collberg; Todd Proebsting; Emilio Bruna; Leslie McIntosh; Cynthia Hudson-Vitale