Practical Steps for Increasing Openness and Reproducibility Courtney Soderberg Statistical and Methodological Consultant Center for Open Science
COMMUNITY METASCIENCE INFRASTRUCTURE So, now that I've hopefully convinced you that analysis plans, pre-registration, and clearly distinguishing between confirmatory and exploratory analyses are important things to do, how do you actually implement these changes, and what is out there to help you make them? Talk is cheap, and researchers are very busy people who often have workflows they really like, so how can we make these changes with as little disruption and time as possible? At the COS, we're interested in finding ways to support researchers in implementing better research practices and to increase the openness, integrity, transparency, and reproducibility of scientific research. We have three main areas of operation: we build free, open source infrastructure to support open research practices; we build a community that helps to incentivize good practices; and we perform metascience research to investigate how changes in scientific practice affect publishing, results, and scientific findings.
Overall, stress that it's community driven, all free, all open source; we hold no intellectual property and are eager to collaborate with anyone who shares the same values and vision (scientists, associations, government, publishers, everyone). This whole section should be an invitation and a ploy to get people to ask to work with us (in addition to people wanting to use our services). How to pitch my new services (maybe do that as the closer): 2 to 3 slides about the services that I bring to the COS (what I can uniquely bring to the table; this is completely a sales pitch, but a needed sales pitch). What I do is more about making research more reproducible. I'm available year round, and I can do remote sessions, group sessions, or come to you or your lab (let us know what you want and what you are not getting from what's being offered elsewhere right now). I could collaborate with university stats labs to help them incorporate some of the reproducible practices we use into the consulting they give (this is our mission; this is specifically what we're supposed to be doing). Talking points: We have three main areas of activity to address the problems with incentives and personal barriers that we mentioned previously: 1) enabling metascience research, 2) building community, and 3) developing infrastructure in the form of the OSF.
Details to have on hand: Change the incentive structure without creating more hurdles. Increase accountability to self and others. Increase discoverability of materials, data, and workflow. Respect existing workflow. Infrastructure: the OSF is a major arm of COS; workflow software to help researchers organize, archive, and connect their work; built flexibly so nothing is mandated, with tools to support your individual workflow; provides altmetrics. Community: transparent practices. Offer a place to use and improve programming skills while making an important impact. Designed badges to recognize open practices, which indicates that these practices are valued; we offer these freely to any journal or organization that wants to sign on. The idea is to foster a diverse and inclusive community that discusses and improves its own practices; we will provide the technical support and venue: we will connect you, and you do whatever you're best at. Metascience: collaborative projects investigating the replicability of current scientific research. What is the problem, exactly? (RP:P, RP:CB, Wysktra-replicability in social science)
Scientific Ideals - Innovative ideas - Reproducible results - Accumulation of knowledge 3) Accumulate knowledge: science as a whole is slowly accumulating knowledge. Individual studies are pieces of evidence that may support or undermine a given theory, and over time these individual pieces of evidence should accumulate into a greater understanding of which theories and phenomena are true: what is really going on in the world around us? We want to believe that science is accumulating knowledge about interesting, real phenomena. But is that really the case? How much can we trust the knowledge that has been accumulated based on published findings?
Unfortunately, it has become apparent over the last few years that perhaps the answer to that question is not all that much. Now, there have been some very prominent cases in the past few years of outright fraud, where people have completely fabricated their data, but I'm not talking about those cases. What I'm talking about is the general sense that many scientific findings in a wide variety of fields don't replicate, and that the published literature has a very high rate of false positives in it. So if a large proportion of our published results aren't replicable and are potential false positives, are we actually accumulating knowledge about real phenomena? I would suggest that the answer to this question is 'no', or at least that we haven't accumulated as much knowledge as we would like to believe.
What is reproducibility? Computational Reproducibility: If we took your data and code/analysis scripts and reran them, we could reproduce the numbers/graphs in your paper Methods Reproducibility: We have enough information to rerun the experiment or survey the way it was originally conducted Results Reproducibility/Replicability: We use your exact methods and analyses, but collect new data, and we reach the same statistical conclusion
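To make computational reproducibility concrete, here is a minimal Python sketch of rerunning an analysis on shared data and checking that it reproduces the numbers reported in a paper. The data, the "reported" values, and the helper name `check_reported` are hypothetical placeholders; a real check would load your archived data file and rerun your actual analysis script.

```python
import math
import statistics

def check_reported(values, reported_mean, reported_sd, decimals=2):
    """Recompute the mean and SD from the raw data and compare them with
    the values printed in the paper, after rounding to the reported precision."""
    mean = round(statistics.mean(values), decimals)
    sd = round(statistics.stdev(values), decimals)  # sample SD (n - 1 denominator)
    return {
        "mean_ok": math.isclose(mean, reported_mean),
        "sd_ok": math.isclose(sd, reported_sd),
        "recomputed": (mean, sd),
    }

# Hypothetical shared data and the values a paper might report for them.
scores = [4.0, 5.0, 6.0, 5.0, 7.0, 3.0]
result = check_reported(scores, reported_mean=5.00, reported_sd=1.41)
print(result)
```

Rounding to the reported precision before comparing avoids flagging a mismatch just because the paper printed two decimals.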
There’s more to it than sharing of discrete objects There’s more to it than sharing of discrete objects. Think about using this as an opportunity to increase transparency by capturing the entire workflow, and to do so while connecting tools and services that make up the parts of the workflow, not requiring people to change all of their practices at once, and providing immediate efficiencies and value to the researcher AS they comply with requirements. Easy, right? Obviously not.
Why should you care? Your own work less efficient Hard to build off our own work, or work of others in our lab We may not have the knowledge we think we have Hard to even check this if reproducibility low
Current Barriers Statistical Transparency Ignoring null results Researcher degrees of freedom Transparency Poor documentation Lack of openness
Steps Create a structured workspace
Open Science Framework https://osf.io contact@cos.io, or my email if you want to talk to an actual person. Also mention the videos on the website. All our data are stored on Amazon Web Services S3. Security practices: all cloud storage, with security measures in place (physical security for the facility; very limited access to databases and servers, so very few people can access data about users: about you, your behavior, and your private projects); we encrypt our data, and there are measures in place to detect hacking. Sensitive confidential data: don't put it there, because we aren't currently compliant with those requirements (we are currently working on a workaround for HIPAA), but we would like to explore this with anyone who has expertise in this area.
Steps Create a structured workspace Create a research plan Pre-registration
Pre-registration Before conducting a study registering: The what of the study: General information about what you are investigating and how Research question Population and sample size General design Variables you’ll be collecting, or dataset you’ll be using
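One way to think about the pre-registration fields listed above is as a structured record that cannot be edited once created. The Python sketch below uses a hypothetical `Preregistration` class whose field names mirror the slide, with a made-up example study; it is not the OSF registration form. A frozen dataclass raises an error on any attempt to change a field, loosely mirroring a registration that can't be changed after the fact.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Preregistration:
    # Hypothetical fields mirroring "the what of the study".
    research_question: str
    population: str
    sample_size: int
    design: str
    variables: tuple  # a tuple keeps the record immutable all the way down

prereg = Preregistration(
    research_question="Does sleep deprivation reduce working-memory accuracy?",
    population="Undergraduate volunteers aged 18-25",
    sample_size=120,
    design="Between-subjects, two conditions",
    variables=("condition", "n_back_accuracy", "hours_slept"),
)

try:
    prereg.sample_size = 150  # attempting to edit after registration
except FrozenInstanceError:
    print("Registered fields cannot be changed")
```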
Pre-registration Study pre-registration decreases file-drawer effects Helps with discovery of unpublished, usually null findings
Figure 1. Positive Results by Discipline. Fanelli D (2010) “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068. doi:10.1371/journal.pone.0010068
Steps Create a structured workspace Create a research plan Pre-registration Pre-analysis plan for confirmatory research
Pre-analysis plan Like a pre-registration Detail the analyses planned for confirmatory hypothesis testing Decrease researcher degrees of freedom
Researcher Degrees of Freedom All data processing and analytical choices made after seeing and interacting with your data Should I collect more data? Which observations should I exclude? Which conditions should I compare? What should be my main DV? Should I look for an interaction effect?
False positive inflation This also assumes your initial false positive rate was .05, which may not be true given that we often work with somewhat unreliable measures and low-powered studies, and don't often treat stimuli as random factors, all of which can also increase false positive rates. Now you may be saying to yourself, well, people don't really do this, do they? Simmons, Nelson, & Simonsohn (2012)
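The inflation can be checked with a little arithmetic. If a researcher runs k independent tests at α = .05 when no true effect exists and reports whichever one comes out significant, the chance of at least one false positive is 1 - (1 - .05)^k. The Python sketch below computes that value and confirms it with a small simulation. It assumes independent dependent variables, known unit variance, and a z-test instead of the t-tests used in practice; these are simplifications (correlated DVs inflate the rate somewhat less, and stacking several flexible choices inflates it more).

```python
import math
import random
from statistics import NormalDist, fmean

def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k

def z_test_p(group_a, group_b, sigma=1.0):
    """Two-sided p-value for a difference in means when sigma is known."""
    n = len(group_a)  # assumes equal group sizes
    se = sigma * math.sqrt(2 / n)
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_flexible_dvs(k, n=20, sims=3000, alpha=0.05, seed=1):
    """Share of null studies declared 'significant' if any of k DVs has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        for _ in range(k):
            # Both groups come from the same distribution: there is no true effect.
            a = [rng.gauss(0, 1) for _ in range(n)]
            b = [rng.gauss(0, 1) for _ in range(n)]
            if z_test_p(a, b) < alpha:
                hits += 1
                break  # stop at the first 'significant' DV, as a flexible analyst might
    return hits / sims

for k in (1, 2, 3, 5):
    print(k, round(familywise_error(0.05, k), 3), simulate_flexible_dvs(k))
```

With five outcome measures to choose from, roughly one null study in four or five would produce a "significant" result.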
Solution: Pre-registered analyses Before data is collected, specify Sample size Data processing and cleaning procedures Exclusion criteria Statistical analyses Registered in a read-only format so it can't be changed Decreases RDF, so p-values are more face valid Registration holds you accountable to yourself and to others; similar to the model used in clinical trials
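Part of pre-registering data processing is writing the exclusion rules down precisely enough that anyone could apply them. Here is a minimal Python sketch. It assumes hypothetical criteria (failed attention check, implausibly fast completion, under 18) and a hypothetical planned N, written before data collection and then applied unchanged to whatever data arrive.

```python
# Hypothetical pre-registered criteria, fixed before data collection.
PREREGISTERED_N = 120          # planned sample size (stopping rule)
MIN_DURATION_SECONDS = 300     # exclude implausibly fast completions
MIN_AGE = 18

def exclusion_reasons(participant):
    """Return the list of pre-registered criteria this participant fails."""
    reasons = []
    if not participant["passed_attention_check"]:
        reasons.append("failed attention check")
    if participant["duration_seconds"] < MIN_DURATION_SECONDS:
        reasons.append("completed too quickly")
    if participant["age"] < MIN_AGE:
        reasons.append("under minimum age")
    return reasons

def apply_exclusions(participants):
    """Split participants into retained and excluded, recording why."""
    kept, excluded = [], []
    # Assumes participants are in order of collection; anyone past the planned N is ignored.
    for p in participants[:PREREGISTERED_N]:
        reasons = exclusion_reasons(p)
        if reasons:
            excluded.append((p["id"], reasons))
        else:
            kept.append(p)
    return kept, excluded

sample = [
    {"id": 1, "passed_attention_check": True, "duration_seconds": 640, "age": 21},
    {"id": 2, "passed_attention_check": False, "duration_seconds": 512, "age": 19},
    {"id": 3, "passed_attention_check": True, "duration_seconds": 150, "age": 17},
]
kept, excluded = apply_exclusions(sample)
print([p["id"] for p in kept], excluded)
```

Because the rules are code written in advance, there is no room to decide after seeing the results which observations "should" be dropped.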
Exploratory vs. Confirmatory Analyses Exploratory: Interested in exploring possible patterns/relationships in data to develop hypotheses Confirmatory: Have a specific hypothesis you want to test Pre-registration of analyses clarifies which are exploratory and which are confirmatory
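Once analyses are pre-registered, labeling each reported result becomes mechanical. Anything that matches the registered plan is confirmatory, and anything else is exploratory. A small Python sketch, with hypothetical analysis identifiers:

```python
# Hypothetical identifiers for analyses listed in the pre-analysis plan.
REGISTERED_ANALYSES = {"H1_condition_on_accuracy", "H2_sleep_moderation"}

def label_analyses(reported):
    """Tag each reported analysis as confirmatory or exploratory."""
    return {
        name: "confirmatory" if name in REGISTERED_ANALYSES else "exploratory"
        for name in reported
    }

reported = ["H1_condition_on_accuracy", "gender_by_condition", "H2_sleep_moderation"]
for name, label in label_analyses(reported).items():
    print(f"{name}: {label}")
```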
Steps Create a structured workspace Create a research plan Pre-registration Pre-analysis plan for confirmatory research Archive materials from study
Steps Create a structured workspace Create a research plan Pre-registration Pre-analysis plan for confirmatory research Archive materials from study Analyze and document analyses
Steps Create a structured workspace Create a research plan Pre-registration Pre-analysis plan for confirmatory research Archive materials from study Analyze and document analyses Share study data, code, materials
What to share Sharing is a continuum Data underlying just results reported in a paper Data underlying publication + information about other variables collected Data underlying publication + embargo on full dataset All data collected for that study
Why you might want to share Journal/Funder mandates Increase impact of work Recognition of good research practices
Signals: Making Behaviors Visible Promotes Adoption Badges Open Data Open Materials Preregistration Psychological Science (Jan 2014)
Also have an initiative to get journals to adopt review of pre-registered study reports before results are known, so that not even publication decisions are data dependent, but are instead made on the soundness of the theory, study design, and analyses
The $1,000,000 Preregistration Challenge Endorse TOP Guidelines Badges for Open Practices Registered Reports Another incentive for researchers to try out preregistration.
https://cos.io/prereg
Registered Reports Design Collect & Analyze Report Publish PEER REVIEW Review of intro and methods prior to data collection; published regardless of outcome Beauty vs. accuracy of reporting Publishing negative results Conducting replications Peer review focuses on quality of methods
Registered Reports 43 journals so far Adv in Mthds & Practices in Psyc Sci AIMS Neuroscience Attention, Percept., & Psychophys Cognition and Emotion Cognitive Research Comp. Results in Social Psychology Cortex Drug and Alcohol Dependence eLife Euro Journal of Neuroscience Experimental Psychology Human Movement Science Infancy Int’l Journal of Psychophysiology Journal of Accounting Research Journal of Business and Psychology Journal of Cogn. Enhancement Journal of Euro. Psych. Students Journal of Expt’l Political Science Journal of Personnel Psychology Journal of Media Psychology Leadership Quarterly Management & Org Review Nature Human Behaviour Nicotine and Tobacco Research NFS Journal Nutrition and Food Science Journal Perspectives on Psych. Science Royal Society Open Science Social Psychology Stress and Health Work, Aging, and Retirement Review of intro and methods prior to data collection; published regardless of outcome Beauty vs. accuracy of reporting Publishing negative results Conducting replications Peer review focuses on quality of methods https://cos.io/rr/, Committee Chair: Chris Chambers
How to make this more efficient? Have conversations with collaborators early What is our data management plan? What/when will we share? Be consistent across studies If an entire lab has the same structure, then it’s easier to find things Document from the beginning
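Being consistent across studies can be as simple as giving every project the same folder skeleton. Here is a Python sketch that creates one. The folder names and the `create_project` helper are a hypothetical convention, not an OSF requirement, so adapt them to whatever your lab agrees on.

```python
from pathlib import Path

# Hypothetical lab-wide layout; every study gets the same folders.
STANDARD_FOLDERS = [
    "preregistration",
    "materials",
    "data/raw",        # never edited by hand
    "data/processed",  # produced only by scripts in analysis/
    "analysis",
    "output",
    "docs",
]

def create_project(root):
    """Create the standard folder skeleton and a README stub under root."""
    root = Path(root)
    for folder in STANDARD_FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():
        readme.write_text("Study overview, data management plan, and sharing plan go here.\n")
    return root
```

Calling `create_project("study_01")` at the start of each study gives everyone in the lab the same places to look, and it is safe to rerun because existing folders and the README are left alone.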
Where to get help: Reproducible research practices? stats-consulting@cos.io The OSF? support@cos.io Have feedback for how we could support you more? contact@cos.io feedback@cos.io
Registered Reports Committee COS helps coordinate a committee of editors and others interested in the format; the committee is led by Chris Chambers (Cardiff University). The RR committee maintains best practices for RRs and tracks use of RRs by journals. Variations include an open submission format for any preregistered research, RRs focused on replication with open calls for participating labs, and special issues focused on RRs. The committee also provides resources to the community: guidelines and templates for editors, and comparisons of features by journal for authors and researchers. Goal: easy adoption.
Early Adopters AIMS Neuroscience Attention, Perception & Psychophysics Comparative Political Studies Comprehensive Results in Social Psychology Cortex Drug and Alcohol Dependence eLife Experimental Psychology Frontiers in Cognition Journal of Business and Psychology Nutrition and Food Science Journal Perspectives on Psychological Science Social Psychology Work, Aging, and Retirement
Data Availability in Psychological Science 10x increase from 2013 to 2015
Connecting silos to enable increased discovery
http://osf.io/share
https://api.osf.io/v2/docs/ Localized connections are now possible with the new OSF API. API Docs https://api.osf.io/v2/docs/
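As a rough illustration of a localized connection, the Python sketch below builds the URL for a single project (a "node") in the OSF API and pulls a few fields out of the response. It assumes the `/v2/nodes/<id>/` path and the JSON:API response layout (`data`, `id`, `attributes`) described at https://api.osf.io/v2/docs/, so check those docs before relying on it. The node id `abc12` and the example response body are hypothetical, and `fetch_node` is not called in the example because it needs network access (private projects would also need an access token).

```python
import json
import urllib.request

API_BASE = "https://api.osf.io/v2"

def node_url(node_id):
    """Build the URL for a single OSF project (a 'node' in the API)."""
    return f"{API_BASE}/nodes/{node_id}/"

def fetch_node(node_id):
    """Download a public node's JSON. Requires network access."""
    with urllib.request.urlopen(node_url(node_id)) as response:
        return json.load(response)

def summarize_node(payload):
    """Pull a few fields out of a JSON:API response body."""
    data = payload["data"]
    attrs = data["attributes"]
    return {"id": data["id"], "title": attrs.get("title"), "public": attrs.get("public")}

# Offline example using a trimmed, hypothetical response body.
example = {"data": {"id": "abc12", "type": "nodes",
                    "attributes": {"title": "Example project", "public": True}}}
print(node_url("abc12"))
print(summarize_node(example))
```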
http://osf.io/ free, open source Share data, share materials, show the research process: make confirmatory results clear, make exploratory discoveries clear; demonstrate the ingenuity, perspiration, and learning across false starts, errant procedures, and early hints. It doesn't have to be written in painstaking detail in the final report; just make it available.
Put data, materials, and code on the OSF
Manage access and permissions
Automate versioning With hashes
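The OSF handles versioning for you. The Python sketch below only illustrates why hashes work as version fingerprints: identical contents always give the same SHA-256 digest, and any change gives a different one. The helper names `file_sha256` and `record_version` are hypothetical, and this is not how OSF storage is implemented.

```python
import hashlib
from pathlib import Path

def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_version(path, log):
    """Append a new version entry only if the file's contents changed."""
    current = file_sha256(path)
    if not log or log[-1] != current:
        log.append(current)
    return log
```

Calling `record_version` after every save keeps the log free of duplicate entries, since an unchanged file produces the same digest as the last recorded version.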
Share your work
OSF Extend beyond core features by connecting to other tools and services in the workflow. This allows for more incremental change while making significant gains in automation, efficiency, and reproducibility. Immediate value from this reinforces more changes. Now
OSF Authoring New storage – some that can run under your desk Analytic workflow Lab notebook Data management plan Institutional platforms like vivo Publishing platforms like OJS and Ubiquity Press OpenSesame Soon 29 grants to develop open tools and services: https://cos.io/pr/2015-09-24/
Get a persistent identifier https://osf.io/tvyxz/
See the impact File downloads