1
Workshop to Support Data Science Workflows
Practical steps for increasing the openness and reproducibility of data science
Supporting Research Workflows in Data Science, UVA, November 11, 2016
Natalie Meyers
2
Objectives Session 1 Understanding reproducible research
Setting up a reproducible project Keeping track of things Session 2 Containing bias Sharing your work
3
https://osf.io/ezcuj/
“Repeat After Me” by Maki Naro, Oct 6, 2016, in The Nib (Psychology). © First Look Media 2016. Used with permission.
4
5
Technology to enable change Training to enact change
Incentives to embrace change Cultural change is crucial to increasing adoption of reproducible principles and practices. The Center for Open Science facilitates a range of communities to achieve the aim of increasing the inclusivity and transparency of research by shifting incentives and practices to align more closely with scientific values.
6
Badges for Open Practices Reproducibility Projects
OSF TOP Guidelines Badges for Open Practices Reproducibility Projects PreReg & Registered Reports SHARE Center for Open Science communities form around particular products or services to define specifications, maximize applicability, promote adoption, and facilitate evaluation and improvement, improving scientific practice through increased reproducibility.
7
What is the problem? Scientific method
Best way to learn how the world works Replication of findings is the highest standard of evaluating evidence Replication of methods allows reuse and extension of new knowledge Observation Question Hypothesis Prediction Testing Analysis Replication Researchers try to figure out how the world works: asking questions + testing hypotheses = discover new knowledge; findings become credible if they are reproducible; reusable methods allow others to build upon new knowledge
8
What is the problem? How do we share this new knowledge?
Published scientific literature: what is "known" should include all discoveries, new evidence, and be reproducible BUT the published literature is not as reliable as we would hope. Why? methodological, statistical, and reporting practices result in a published literature more beautiful and tidy than reality organizational practices result in unavailable, lost, or difficult to use data, code, and materials these practices happen across all scientific disciplines Not talking about fraud. Fraud only small part. This workshop: Discuss standard research practices that are detrimental to scientific ideals Offer practical solutions to improve reproducibility and impact of one's own work.
9
What is reproducibility?
Scientific method Computational reproducibility Observation Question Hypothesis Prediction Testing Analysis Replication Reproducibility = broad term Computational Reproducibility Data + code Reran it Can reproduce your numbers + figures Surprisingly tricky: Quarterly Journal of Political Science requires a ‘replication package’ (data and code) 58% of submissions not computationally reproducible
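A "replication package" of the kind QJPS requires can be as simple as a directory of data and code with one entry-point script that reruns everything. A minimal sketch, where every name and path is hypothetical:

```shell
# Sketch of a minimal replication package layout.
mkdir -p replication/data replication/code replication/output

# One command should rerun the whole pipeline: raw data -> numbers + figures.
cat > replication/run.sh <<'EOF'
#!/bin/sh
# Rerun the full analysis; outputs land in output/.
Rscript code/analysis.R
EOF
chmod +x replication/run.sh

ls replication
```

The test of computational reproducibility is then concrete: a stranger with this directory runs `./run.sh` and gets your numbers and figures.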
10
What is reproducibility?
Scientific method Computational reproducibility Empirical reproducibility Observation Question Hypothesis Prediction Testing Analysis Replication Empirical Reproducibility Can rerun the original experiment the way it was originally conducted Draw the same conclusions
11
What is reproducibility?
Scientific method Computational reproducibility Empirical reproducibility Conceptual reproducibility Observation Question Hypothesis Prediction Testing Analysis New data Replication Conceptual reproducibility An attempt to validate the interpretation of the original study Collect new data on the same conceptual variables Make the same interpretations All 3 types are important; you need the first 2 to achieve conceptual reproducibility This workshop: what you can do to improve all definitions of reproducibility End goal of making scientific studies reproducible computationally, empirically, and conceptually
12
What are the barriers?
[Research lifecycle diagram: Design → Acquire data → Prepare data → Explore → Model → Evaluate → Disseminate]
Many barriers. This workshop: Focus on those normative research practices that act as barriers. One of the barriers: Lack of documentation + transparency across the whole research lifecycle. The only part of the research lifecycle we can normally access is the published report; the rest is not only not shared, it is usually not documented either. Why a problem? We will NOT REMEMBER what we did. We will NOT REMEMBER why we did it. By documenting what we did, we are communicating what we did to our future selves + future researchers. Not documenting means we cannot: pass our knowledge on to other lab personnel or other researchers; defend our methods + process; build on it ourselves. The published report is not enough for the scientific method to work efficiently: published reports lack important details; the published literature is biased (null findings are less likely to be published = publication bias and reporting bias); the published report is static (no indication of the evolution of question, methods, or analysis strategies; no distinction between confirmatory + exploratory analyses; no information on protocol deviations). We need this information to evaluate the findings. For the published literature to be reproducible, we must better document the lifecycle of all research studies, not just published ones. This will provide science with less biased + more complete information about the research being conducted.
13
What are the barriers?
Statistical: low power; researcher degrees of freedom
Transparency: poor documentation; poor reporting; lack of sharing
Not an exhaustive list of barriers to reproducibility. Today we will focus on these barriers.
14
Why practice reproducibility?
The idealist The pragmatist Shoulders of giants! Validates scientific knowledge Allows others to build on your findings Improved transparency Increased transfer of knowledge Increased utility of your data + methods Increased efficiency Reduces false leads based on irreproducible findings Data sharing citation advantage (Piwowar 2013) “It takes some effort to organize your research to be reproducible… the principal beneficiary is generally the author herself.” - Schwab & Claerbout Improving reproducibility is important at the community + individual level: 1. Effect on the research community We don't have the knowledge we think we have Can't check to see if findings will replicate Slows down the progress of science wasted resources redoing studies that were unpublished poor decision making based on potentially non-replicable lines of research 2. Effect on our own work Makes your research less efficient Makes transitions within labs and between labs difficult You can view these issues as an idealist: Improving the reproducibility of your research is good for science. But you can also view these issues as a pragmatist: Improving the reproducibility of your research will benefit you.
15
How can you make your research reproducible?
1. Plan for reproducibility before you start Create a study plan Set-up a reproducible project 2. Keep track of things Documentation Version control 3. Contain bias Registration Reporting What will happen in this workshop? learn ways to increase documentation + transparency learn about tools that can help work through an example research study simulated research groups (PI, graduate student, and research assistant) collaborate to build an open, transparent research project learn good project management practices using the Open Science Framework Group set-up Create groups of 3: PI, graduate student, research assistant 4. Archive + share your materials
16
1. Plan for reproducibility before you start
Create a study plan How? Create a study plan before you gather your data Begin documentation early Shows evolution of study Research questions + hypotheses Study design Type of design Sampling Power and sample size Randomization? Variables measured Meaningful effect size Variables constructed Data processing Data management Analyses Sharing First step in planning for reproducibility: Create a study plan Sent you files yesterday: Materials from the 2012 American National Election Study (ANES) in the US Easily understood for all backgrounds In your groups, begin to create your study plan: Open ‘Questionnaire’ Sample of 12 questions that respondents answered in the survey Each group will decide what their research question will be When you come up with your question, make sure it is one that you know how to analyze quickly Simple is good: “will look at a graph, scatterplot, means” Our question: Is there a difference in political conservatism between males and females?
17
1. Plan for reproducibility before you start
Set-up a reproducible project How? Set-up a centralized location for project management Organization is especially important for collaboration Easily find the most recent file version Eases transition between lab members Allows for back-up and version control The next step in planning for reproducibility is to organize your project. We will be using the Open Science Framework to set up a centralized location for project management Documenting is difficult, especially after the fact A centralized location makes it easier Avoids multiple versions in multiple places (Dropbox, Google Drive, email threads) Hard to: find the most recent version include a new lab member back-up + track changes to the project Login to your OSF account If you don’t have one, set one up, then log in The OSF is: Free Open source A customizable online framework for researchers Why use the OSF: Completed a systematic review with a team of researchers Different versions of files in different places: Dropbox some versions in email threads Google Drive Made it difficult to: know where everything was create back-ups share our data + materials
18
https://osf.io/institutions/uva/
19
https://accounts.osf.io/login?campaign=institution
20
Institutional Login
21
1. Create an OSF project
This is the project dashboard Create each new project here What is an OSF project? Anything: lab group organization, grant, line of research, individual experiment Top level of a nested structure Can nest many things underneath First I will show you how to create a project; when I am finished, I will give instructions to the PI. Create the project: Let's create a project for our study Create a project titled Is there a difference in political conservatism between males and females? PI in your group: Click on the ‘create project’ button Title Description
22
1. Wiki, file tree, components, citation, GUID
The PI should now see something similar to my screen Others follow along This is the project overview page New projects look like this: bare Customize to fit your project or workflow The project overview page has a few different sections Public / private button Wiki Collaborative editing space Important overview information about the project File tree Upload and navigate to files Most file types are accepted Components How you customize structure and nesting Citation widget GUID (Globally Unique Identifier) Permanent, unique identifier Will always point back to this page
23
1. Giving contributors access
Check the GUID Grad students or RAs Type in the GUID of the project your PI created Question: What are GSs & RAs seeing after typing in the GUID? The screen says you don’t have access Because all projects on the OSF are set to private by default. Only people who have been added as contributors to a private project have access The PI is the only contributor because he/she created it. Need to give the other two members of your team access to the project Add your team members as contributors Search for contributors Click the + icon to add them Select permissions for each contributor Read See + download files Cannot add files or edit content Read + Write Add + modify files Cannot change settings (new contributors or privacy settings) Administrator access Can change settings Add contributors, make public, delete components PIs Add contributors Decide permissions (everyone must be able to upload files) Grad students and RAs Try the GUID again Should now be able to see the project page Check your email; you will also notice that you have received an alert Check the citation widget All people are listed Create a non-bibliographic contributor To give someone access to your project but no authorship credit, you can do this via the 'sharing' tab as well Perhaps an RA needs access but not authorship: uncheck the ‘bibliographic contributor’ box
24
1. Creating a wiki We want to document the evolution of our project.
Need to write down important information Research question Hypotheses Background If we track changes to this important information = easy to retrace the project history On the OSF, a good place to put this type of information is in the wiki Real-time, collaborative editor Whole group can work on it at once Markdown Version control Create many wikis Access the wiki Clicking on the widget Clicking on the ‘edit’ button Research question + hypotheses Everyone Enter research question + hypotheses Any other study information
25
1. Adding organizational structure - components
Add structure Now flat Want to add structure by adding sections to group related files Keep data separate from protocols, for example Keep lab members' materials independent Keep separate experiments separate Components Add component Name component (materials, data, protocol, IRB, etc.) Choose a category Inside the Data component: contributors Can choose to keep the same contributors Structure same as the higher-level project Own file trees, wikis, contributors, privacy settings Can nest components within components Organize different types of files Separate sensitive files from those that can be shared publicly Allows you to set up sections with different privacy settings or contributor lists Fine-grained control over access to different parts of a project Examples Sensitive data vs. Anonymized data Copyrighted material vs. reusable material Rough versions vs. good copies Contributors’ access (Raw data) I am going to create Data component Analysis component Materials component while you organize your project Everyone: Organize your project Use components and folders to give the project structure Think of the types of files you will have: Methods, Data, Code/Analyses Think of how permissions and sharing of those files might differ No right or wrong, can move things later Create all the components
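If you keep a local mirror of this layout, the same structure can be created in one step. A sketch, where the component names (materials, data, analysis) follow the slide and the rest (the `raw` / `anonymized` split, the `project/` root) are illustrative:

```shell
# Local mirror of the OSF component structure sketched above.
# 'raw' stays private; 'anonymized' can be shared publicly.
mkdir -p project/materials project/data/raw project/data/anonymized project/analysis
touch project/README.md      # navigational info, like the project wiki
find project -type d | sort  # show the resulting tree
```

The nesting mirrors OSF components: each directory can get its own privacy setting when uploaded as a component.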
26
How can you make your research reproducible?
1. Plan for reproducibility before you start Create a study plan Set-up a reproducible project 2. Keep track of things Documentation Version control 3. Contain bias Registration Reporting Create a study plan Set-up a reproducible project Next step to planning for reproducibility is to Keeping track of things. 4. Archive + share your materials
27
2. Keep track of things Documentation Document everything done by hand
Document your software environment (e.g., dependencies, libraries, sessionInfo() in R) Everything done by hand or not automated from data and code should be precisely documented: README files Make raw data read-only You won’t edit it by accident Forces you to document or code data processing Document in code comments Great! We are now well on our way to creating a reproducible project To review: Want to start documentation of our project before we collect data Things you do by hand, such as editing an Excel file, should be written down Record your software environment and dependencies A trick: Make your raw data read-only Forces you to code your data processing Or remember to document
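Both tricks above can be done from the command line. A sketch, where the file names are hypothetical and the `pip freeze` line stands in for R's `sessionInfo()`:

```shell
# Make raw data read-only so accidental edits fail loudly.
mkdir -p data/raw
echo "id,party_id,gender" > data/raw/anes_sample.csv   # hypothetical raw export
chmod a-w data/raw/anes_sample.csv                     # clear all write bits

# Snapshot the software environment.
# (In R: writeLines(capture.output(sessionInfo()), "environment.txt"))
python3 -m pip freeze > environment.txt || true
```

After `chmod a-w`, any script that tries to overwrite the raw file gets a permission error, which forces the cleaning step into documented code.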
28
2. Keep track of things Version control Version control for data
Track your changes Everything created manually should use version control Tracks changes to files, code, metadata Allows you to revert to old versions Make incremental changes: commit early, commit often Git / GitHub / Bitbucket Metadata should be version controlled More review: Track your changes Allows you to revert to older versions Commit your changes early Commit your changes often Use the OSF or Git Anyone in the room use GitHub? Great for version control And it is integrated with the OSF!
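At the command line, "commit early, commit often" looks like this with Git (the repository and file names are made up for the example):

```shell
# A fresh repo with two small, well-described commits.
git init -q myproject
git -C myproject config user.email "you@example.org"   # identity required before committing
git -C myproject config user.name "Your Name"

echo "# Conservatism by gender (ANES 2012)" > myproject/README.md
git -C myproject add README.md
git -C myproject commit -q -m "Start project: add README"

echo "# first pass at the analysis" > myproject/analysis.R
git -C myproject add analysis.R
git -C myproject commit -q -m "Add first analysis script"

git -C myproject log --oneline    # one line per incremental change
```

Each commit is a labeled point you can revert to, which is exactly the version-control guarantee the slide describes.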
29
2. Version control Now, we’ve created some research materials
Going to store them on the OSF Upload the raw data files Upload the data dictionary Everyone Upload raw data file and data dictionary Now that we have some data, it is time to analyze In R, analyze the data to answer the research question Do a few steps Save it Upload it Now want to make some changes to my analysis Add some comments My files, such as these scripts, change over time Need to track those changes Called version control Version control on the OSF works in a few different ways. Open the analysis script For text files, you can edit through the OSF See the versions I already uploaded Can save new versions I create Click ‘edit’ Edits will be saved as a new version Works for CSV, R files, text files Upload your .R analyses scripts Edit them through the OSF Add comments
30
2. Version control The OSF keeps all files under version control.
Non-text files cannot be edited within OSF, still under version control. Version control for non-text: Open the file on your personal computer Make any changes that you want Save the file with the exact same name on your computer On your computer, you’ve just over-written the older version of the file Upload the file with the same name to the same component Now 2 versions of the file Can toggle between them Everyone Edit their projects Upload the rest of your materials Make changes to files Upload new versions Now we can use the wiki to add navigational information Similar to a README What are our files Where can we find things If information is missing or private, why? Kept under version control Documents the evolution of the project Begin to create table of contents to project Create a table of contents README to navigate the project
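A navigational README like the one described above might look like the following; the contents are an illustrative sketch for the workshop project, not a prescribed template:

```shell
# Generate a table-of-contents README; section names mirror the components above.
cat > README.md <<'EOF'
# Is there a difference in political conservatism between males and females?
- materials/        questionnaire + data dictionary
- data/raw/         original survey export (private: contains sensitive fields)
- data/anonymized/  cleaned file, shared publicly
- analysis/         R scripts; versions tracked on the OSF
EOF
cat README.md
```

Because the wiki is version-controlled, edits to this table of contents document the evolution of the project structure itself.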
31
1. Add-ons When collaborating, you will have a variety of tools already in use by your team: Dropbox GitHub Google Drive figshare Instead of moving files, we can integrate tools Store your code on GitHub How to add the GitHub add-on: Create a new component called ‘Code’ Settings Confirm Apply Username and password Will see your GitHub repositories, select yours Save Go back to Code Now have this repository in Code A two-way door between the OSF and GitHub: changes made on either side are reflected in the other
32
Objectives Session 1 Understanding reproducible research
Setting up a reproducible project Keeping track of things Session 2 Containing bias Sharing your work
33
How can you make your research reproducible?
1. Plan for reproducibility before you start Create a study plan Set-up a reproducible project 2. Keep track of things Documentation Version control 3. Contain bias Registration Reporting Planned for reproducibility Are keeping track of things Next step to planning for reproducibility is to Containing bias. 4. Archive + share your materials
34
3. Contain bias Share important moments in your study
Create public registrations of your study Improves transparency Improves accountability Counters selective reporting and outcome reporting bias Preregistration of all study plans helps counter publication bias What is preregistration? A permanent, read-only copy of a project made before data collection Preregistrations eventually become public Can be embargoed while the study is being conducted, until the report is published Spectrum Ranges from a simple study plan prereg AsPredicted questions Clinicaltrials.gov More in-depth study plan Full methods and background Full preregistration All that is missing is the data and the conclusion Includes a preregistered analysis script Why? Improves transparency A priori design distinguished from post hoc decisions Basic information about unpublished studies is public Increases accountability Switching primary with secondary outcomes is visible Assess whether all outcomes were reported, or just significant ones A step towards countering publication bias What is publication bias? Next slide
35
Publication bias What is publication bias? Null results are less likely to be published than significant, positive results Results in a published literature of mostly positive results across all scientific disciplines range from 70% - 92% positive results How do we know? Neuroscience Remember that the median power of neuroscience studies is 30% If all published findings in neuroscience were true, we would expect about 30% of studies to have positive results However, 85% of neuroscience studies had positive results even if we always had a true hypothesis still wouldn’t have this percentage of positive results Published literature is too good to be true. Reasons Null results are not being published Researchers are incentivized to report significant, novel results Making data-driven choices in design + analyses that can increase the likelihood of a false positive Selectively reporting their findings Switching primary + secondary outcomes based on their significance Only reporting significant results Not submitting null results to publication How prereg counters publication bias? Unearths all the research that is being produced, not just published findings Example: A prominent journal publishes a novel, exciting, significant finding for cancer treatment Even without results, can help assess the likelihood of the study to be a false positive based on how many previous unpublished studies asked the same question. Can assess important changes from the study design to the report Outcome switching, selective reporting When paired with publishing and reporting policies, can boost the % of null results that are reported Fanelli D (2010) “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068.
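The "too good to be true" argument can be made precise with a back-of-the-envelope calculation, assuming the conventional significance threshold α = .05:

```latex
% Expected share of positive results, where \pi is the share of true hypotheses:
P(\text{positive}) = \pi \cdot \text{power} + (1 - \pi)\,\alpha
% Even in the best case (\pi = 1, every hypothesis true), with median power 0.30:
P(\text{positive}) \le 1 \cdot 0.30 = 0.30
```

So at most about 30% of neuroscience studies should report positive results, yet roughly 85% do; the gap must come from selective publication and reporting, not from the underlying science.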
36
3. Register your study AsPredicted: Preregistration made easy
Have any data been collected for this study already? Hypothesis. What's the main question being asked or hypothesis being tested in this study? Dependent variable. Describe the key dependent variable(s) specifying how they will be measured. Conditions. How many and which conditions will participants be assigned to? Begin the preregistration process Remember, a prereg is a spectrum Variety of ways to preregister Will show you how to preregister on the OSF later Everyone: Prereg activity Answer the first 4 “as predicted” questions word doc or wiki
37
3. Contain bias Analysis plan How? Define data analysis set
Register your analysis plan Defines your confirmatory analyses Decreases researcher degrees of freedom Define data analysis set Statistical analyses Primary Secondary Exploratory Missing data Outliers Multiplicity Subgroups + covariates (Adams-Huet and Ahn, 2009) Purpose of preregistration of your study design Improve transparency of important design information Can take this a step further Preregistration of your analysis plan Distinguishes the analysis decisions you made in advance of looking at your data Confirmatory analyses from analysis decisions you made to explore your data Exploratory analyses Creating and registering an analysis plan is NOT meant to discourage or diminish the importance of exploratory work Exploratory work is vital for developing hypotheses! Just to distinguish Why is this distinction important? Next slide
38
Researcher degrees of freedom
any decision a researcher makes after looking at the data these decisions seem reasonable can increase the chance of a false positive No problem with making analysis decisions based on data The problem is when these decisions are not reported as data-driven decisions Need to distinguish between confirmatory analyses and exploratory, data-driven analyses Best way to do this Preregister your analysis plan along with your study plan Simmons, Nelson, & Simonsohn (2012)
39
3. Contain bias SAMPL Reporting How? Report transparently + completely
Transparently means: Readers can use the findings Replication is possible Users are not misled Findings can be pooled in meta-analyses Completely means: All results are reported, no matter their direction or statistical significance SAMPL guidelines (Statistical Analyses and Methods in the Published Literature) Avoid HARKing: Hypothesizing After the Results are Known Report all deviations from your study plan Report which decisions were made after looking at the data An easy way to contain bias is to communicate transparently with the research community. Report your research transparently + completely Report your deviations Report which decisions were a priori Report which were made after seeing the data
40
3. Register your analysis
AsPredicted: Preregistration made easy 5. Analyses. Specify exactly which analyses you will conduct to examine the main question/hypothesis. 6. More analyses. Any secondary analyses? 7. Sample Size. How many observations will be collected or what will determine sample size? 8. Other. Anything else you would like to pre-register? (e.g., data exclusions, variables collected for exploratory purposes, unusual analyses planned?) Preregister your analysis plan: Also a spectrum Can be simple AsPredicted questions - open-ended Can be complete Whole analysis script Includes processing from raw to processed data analysis to presentation Everyone: Prereg activity Answer the rest of “as predicted” questions word doc or wiki How do you register a project? I am going to show how you register a project. Do not do this example because registrations are not reversible. Just watch me. The first thing is to click on the ‘registration’ tab on the top of the project. Note that this will register the part of the project you’re currently in, and all components and folders nested within this part of the project. Clicking register will take you to a registration page. There are a few different ‘registration templates’ that you can choose from that record some metadata about the registration. If you are unsure which to select, just choose the ‘Open-ended Registration’. In the open-ended box, you can say why you are registering the project. This is important, as you can have multiple registrations of a project. The description will help distinguish them. Registrations must become public, but you can either make them public immediately or you can place them under embargo for up to four years. If you choose to embargo, the project will be immediately registered, but the registration will not be made public until after the embargo date that you specify. You can end an embargo early, but you cannot extend it further. A few things to notice about the registration. 
Firstly, the registration GUID is different from the original project's, so you can link directly to the project or the registration, whichever is more appropriate. For this registration, I chose to make it public immediately. Just like the actual project and registration have different GUIDs, they also have different privacy settings. So, even though the registration is public, the actual project is still private, and I can make changes to that project which non-contributors will not be able to see. Since this is a public registration, you’ll notice that you can request a DOI/ARK ID for the registration. DOIs are popular identifiers that are used frequently in citation of research products - like articles or datasets. For embargoed registrations, once the embargo ends and they become public, you can obtain a DOI for them.
41
3. How to register Now I will show you how to preregister
study plan + analysis plan on the OSF Do not do this example Registrations are not reversible + are permanent. Just watch me. To preregister Click ‘registration’ Will register the part of the project you’re currently in AND all components and folders nested within this part of the project Different ‘registration templates’ Record different metadata about the registration ‘Open-ended Registration’ Distinguishes multiple registrations Registrations must become public Make them public immediately Place them under embargo for up to four years Project will be immediately registered Registration will not be public until embargo date Can end embargo early, cannot extend Project can remain private forever, even when registrations are public Show example of completed registration Go to project Click ‘registrations’ Here you will see all the registrations Go to registration Things to note about the registration Registration GUID different from original project Can link to the project or registration, whichever is more appropriate Can continue to update + add to project Will not change registration Changes will not be public When public, can request DOI (Digital Object Identifier)
42
How can you make your research reproducible?
1. Plan for reproducibility before you start Create a study plan – Begin documentation at study inception Set-up a reproducible project – Centralize and organize your project management 2. Keep track of things Documentation – Document provenance, your environment, + everything done by hand Version control – Track your changes 3. Contain bias Registration – Share important moments in your study Reporting – Report transparently + completely Planned for reproducibility Are keeping track of things Are Containing Bias Next step to planning for reproducibility is to Archive and Share. 4. Archive + share your materials
43
“Repeat After Me” by Maki Naro, Oct 6, 2016, in The Nib. © First Look Media 2016. Used with permission.
44
Thomas Harriot’s “release dates” timeline
Thomas Harriot (1560–1621), variously spelled Harriot, Hariot or Harriott, was born in Oxfordshire, England, about 1560, the son of a commoner. He corresponded with Tycho Brahe, Johannes Kepler and Galileo Galilei, and was the first to observe the moon using a telescope (26 July 1609) and to record his observations; Galileo observed the moon with a telescope in late November or December 1609 and published in 1610. Harriot also recorded observations of the 1607 comet and observed sunspots (1610). His accurate observations of the 1607 comet were later used by Friedrich Wilhelm Bessel, who, computing the comet's orbital elements, realized that the 1607 comet was Halley's comet. Unlike his famous Italian contemporary Galileo, Harriot did not publish any of his scientific results, except for his report on his voyage to the New World; his work has had to be reconstructed from his manuscripts. "Release dates" timeline: Artis Analyticae Praxis published in Latin (1631); Harriot's observational work recommended for publication by Count DeBruhl; Rigaud supplement including Harriot's observations published; Artis Analyticae Praxis published in English; notebooks made open online. (Dates shown on the original slide timeline: 1607, 26 July 1609, 1610, 1631, 1784, 1785, 1833, 2007, 2012.)
45
Thomas Harriot’s excuses
An Excerpt from Allan Chapman’s Thomas Harriot: The First Telescopic Astronomer Too Busy? Too Paranoid? Not Lazy! Not Timid! If you don’t trust your community, transparency and openness fall by the wayside A researcher’s relationship to sponsors, political climate, and contemporaries can foster collaboration, openness, and rapid re-use, or secrecy and delay. Tell timeline of Harriot’s life.
46
Funder Mandates for Where to Archive + share
Where doesn’t matter* UNLESS YOU HAVE A GRANT THEN PAY ATTENTION!! If you submitted a Data Management Plan with your proposal, follow it! Deposit where your funder requires or recommends * CHECK what your funder mandates Why share? Your Funder Requires it Your Data Management Plan you submitted with your proposal may have already stated when you were going to share and where Many places to check: Funder Website CENDI Library’s Data Management Team Your University’s Office of Research
47
4. Archive + share your materials
Open Science Framework Where doesn’t matter*. That you share matters. Get credit for your code, your data, your methods Increase the impact of your research *BUT CHECK what your funder mandates Why share? We have now created our project Uploaded all the materials Documented our workflow We have lots to share! Many reasons: Get credit Mandate of a journal or a funder Increase the impact Data citation advantage (see Piwowar et al., 2007) Sharing materials and data Allows others to reproduce and build off your work A core principle of science is transparency: making available the basis of scientific claims for others to review, critique, reuse, or extend
48
4. Share your work
You can share your work right on the OSF, in your project or through registration. Everything on the OSF is private by default. You can make the entire project public, or just subsections of the project: just click the Public button for each component.
You need to decide what can be public; some data are legally or ethically sensitive. Organize your project to keep sensitive files together and private. Our data has personal location information, so we can’t make the raw_data file public. We created an anonymized data file, which we can make public. Both files are in the ‘data’ component, and we can’t make one public and the other private right now.
Question for everyone: What could I do so that I could share my clean data file but not reveal my raw data file with the geolocation variable in it? Create a new component in Data called Raw Data, and move my raw data into the Raw Data component.
I’ve decided to share the top level of my project as well as my materials component. Clicking the Public button on my project will show me all the sections of my project; I check the parts that I want to make public. Demo: Project and Materials.
Discuss how much of your project you can or want to make public. Reorganize any parts of your project to group public and private files, then make those sections of your project public.
49
4. Increasing discoverability
Making things public makes them discoverable. Search on the OSF for “conservatism” to see your project. Public projects on the OSF are also indexed by Google: search for Daniel Lakens effect size on Google and click on the OSF project.
When researchers share their research, they want credit for their work. People can cite your project using its GUID, and the OSF tracks other types of impact as well. The OSF supports indicators of impact beyond citation counts: look at the Analytics page of Daniel Lakens’s effect size project. The related article was published in Frontiers (258 citations); views and downloads give additional information about impact not captured by citations alone. Click on the Files tab to see the download counts for all the files. Measure impact and reach in concrete ways.
50
OSF for Meetings free poster + presentation service
Sharing for workshops and small conferences can be done easily on the OSF too, with a free poster + presentation service. Show DASPOS example.
51
Open Proceedings: Use OSF for Meetings + an OSF Project
Sharing for workshops and small conferences can be done easily on the OSF too. Show DASPOS example
52
OSF for Institutions integration with local research services
Starting with basic features, building a foundation to support integration with more specialized local research services.
53
How to learn more Literate programming Version control
Reproducible Science Curriculum by Jenny Bryan (e-science-curriculum/)
Literate programming: Literate Statistical Programming by Roger Peng (ch?v=YcJb1HBc-1Q)
Version control: Version Control by Software Carpentry (carpentry.org/v4/vc/)
Data management: Data Management from Software Carpentry by Orion Buske (carpentry.org/v4/data/mgmt.html)
Also: 23 Things: Libraries for Research Data; Practical Steps for Increasing the Openness and Reproducibility of Research Data by Natalie Meyers
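The version-control item above is the one topic here that lends itself to a concrete sketch. As a minimal, hedged example (not taken from the workshop materials; the repository name, file names, and identity values are illustrative only), a new analysis project can be put under Git version control from the shell like this:

```shell
# Minimal version-control sketch (assumes git is installed;
# project name, file names, and identity are made-up examples)
git init my-analysis
cd my-analysis

# Identify yourself for this repository only
git config user.name  "Jane Researcher"
git config user.email "jane@example.edu"

# Keep sensitive raw data (e.g., files with geolocation info)
# out of version control, mirroring the OSF public/private split
echo "raw_data/" > .gitignore

git add .gitignore
git commit -m "Start tracking project; exclude raw data"

# Each commit records who changed what and when
git log --oneline

# Return to the parent directory
cd ..
```

Every commit is a recorded, attributable snapshot, which is exactly the "keeping track of things" goal from Session 1, and the `.gitignore` pattern keeps sensitive files from being shared by accident.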
54
Reproducibility training
free stats + methods training
55
Transparency and Openness Promotion (TOP) Guidelines
The Transparency and Openness Promotion (TOP) Committee met at the Center for Open Science in Charlottesville, Virginia, in November 2014 to address one important element of the incentive systems: journals’ procedures and policies for publication. The committee consisted of disciplinary leaders, journal editors, funding agency representatives, and disciplinary experts, largely from the social and behavioral sciences. By developing shared standards for open practices across journals, we hope to translate scientific norms and values into concrete actions and change the current incentive structures to drive researchers’ behavior toward more openness. Although there are some idiosyncratic issues by discipline, we sought to produce guidelines that focus on the commonalities across disciplines.
Reproducibility of research can be improved by increasing transparency of the research process and products. The TOP Guidelines provide a template to enhance transparency in the science that journals publish; with minor adaptation of the text, funders can adopt these guidelines for research that they fund. The guidelines are the output of a meeting held in November 2014, organized by the Berkeley Initiative for Transparency in the Social Sciences, Science magazine, and the Center for Open Science. The TOP Guidelines Committee, sponsored by the Center for Open Science, maintains an information commons for transparency standards, serves as an advisory group for journals and funders, evaluates the guidelines’ effectiveness, and manages guideline updating to maximize quality and interdisciplinary applicability. Updates to standards are recorded with version number and date, so adopting journals and funders can denote the version that they adopt to facilitate tracking and updating of standards over time.
Problem: There are no universal policies and procedures for incentivizing transparency about the research process.
Nudging scientific practices toward greater openness requires complementary efforts from universities, granting agencies, societies, and publishers. The TOP Guidelines provide a modular set of standards for stakeholders to endorse and adopt in order to shift researcher incentives toward transparency.
Strategy: There is tremendous diversity in scientific questions and methodologies, but a shared understanding of scientific norms. Shared language and expectations are a powerful mechanism for promoting cultural change. Across disciplines, research communities can speak a common language for transparency: journals and funders adopt the TOP Guidelines to define expectations for their authors and grantees, while other stakeholders endorse TOP to express shared values.
The TOP Guidelines have a low barrier to entry, are modular, and are applicable to all scientific disciplines. A low barrier to entry facilitates ease of adoption. The modular design is flexible: journals and funders choose the standards and levels of stringency most appropriate for their communities. TOP, as a discipline-agnostic set of guidelines, brings domain-specific efforts into a common framework.
56
Modular standards Low barrier to entry Discipline agnostic
There are eight standards in the TOP Guidelines; each moves scientific communication toward greater openness. These standards are modular, facilitating adoption in whole or in part. However, they also complement each other, in that commitment to one standard may facilitate adoption of others. Moreover, the guidelines are sensitive to barriers to openness by articulating, for example, a process for exceptions to sharing because of ethical issues, intellectual property concerns, or availability of necessary resources.
First, two standards reward researchers for the time and effort they have spent engaging in open practices. Citation standards extend current article citation norms to data, code, and research materials; regular and rigorous citation of these materials credits them as original intellectual contributions. Replication standards recognize the value of replication for independent verification of research results and identify the conditions under which replication studies will be published in the journal. To progress, science needs both innovation and self-correction; replication offers opportunities for self-correction to more efficiently identify promising research directions.
Second, four standards describe what openness means across the scientific process so that research can be reproduced and evaluated. Reproducibility increases confidence in results and also allows scholars to learn more about what results do and do not mean. Design standards increase transparency about the research process and reduce vague or incomplete reporting of the methodology. Research materials standards encourage the provision of all elements of that methodology. Data sharing standards incentivize authors to make data available in trusted repositories such as Dataverse, Dryad, the Inter-university Consortium for Political and Social Research (ICPSR), the Open Science Framework, or the Qualitative Data Repository.
Analytic methods standards do the same for the code comprising the statistical models or simulations conducted for the research. Many discipline-specific standards for disclosure exist, particularly for clinical trials and health research more generally. Finally, two standards address the values resulting from preregistration. Standards for preregistration of studies facilitate the discovery of research, even unpublished research, by ensuring that the existence of the study is recorded in a public registry. Preregistration of analysis plans certifies the distinction between confirmatory and exploratory research, or what is also called hypothesis-testing versus hypothesis-generating research. Making transparent the distinction between confirmatory and exploratory methods can enhance reproducibility (3, 13, 14).
57
Level 1 Disclose Level 2 Require Level 3 Verify http://cos.io/top
Not all of the standards are applicable to all journals or all disciplines. Therefore, rather than advocating for a single set of guidelines, the TOP Committee defined three levels for each standard. Level 1 is designed to have little to no barrier to adoption while also offering an incentive for openness. Level 2 has stronger expectations for authors but usually avoids adding resource costs to editors or publishers that adopt the standard. Level 3 is the strongest standard but also may present some barriers to implementation for some journals.
58
“The policy of the __ is to publish papers where authors indicate whether the data, methods used in the analysis, and materials used to conduct the research will be made available to any researcher for purposes of reproducing the results or replicating the procedure. Authors must, in acknowledgments or the first footnote, indicate if they will or will not make their data, analytic methods, and study materials available to other researchers. If an author agrees to make materials available, the author must specify where that material will be available.”
59
“The policy of the ___________ is to publish papers only if the data used to conduct the research are clearly and precisely documented and are maximally available to any researcher for purposes of reproducing the results or replicating the procedure. Details of:
What must be shared
Legal and ethical exceptions (disclosure at onset of review)
Using trusted repositories”
60
“... are maximally available to any researcher for purposes of reproducing the results or replicating the procedure. All materials supporting the claims made by the author must be made available to the journal prior to publication. The journal, or an entity acting on behalf of the journal, will verify that the findings are replicable using the author’s data and methods of analysis. Failure to replicate at this stage may result in the paper not being published.”
61
752 Journals 63 Organizations
Impact: The TOP Guidelines were published in May. As of March 2016, 538 journals and 58 organizations are signatories, expressing support for the principles TOP outlines and committing to conduct a full review for potential adoption. TOP signatories include Science, BioMed Central, and Wiley.
Next steps: A TOP Coordinating Committee representing stakeholders across disciplines is forming to promote, evaluate, update, and sustain the TOP Guidelines. COS will provide operational support to that committee with the following initiatives.
Signatories: Journals, publishers, societies, repositories, and other organizations with a stake in science are encouraged to join as signatories of the TOP Guidelines. Journal signatories are expressing their support of the principles of openness, transparency, and reproducibility, expressing interest in the guidelines, and committing to conduct a review of the standards and levels for potential adoption within a year. Organization signatories are, if relevant, encouraging associated journals to conduct a review of the standards and levels for potential adoption.
Evaluating the TOP Guidelines: COS and the committee will gather evidence about the effectiveness of the TOP Guidelines in meeting their objectives. Such evidence will inform revisions and improvements to TOP.
Journal/funder scorecard: TOP provides a basis for evaluating journal and funder transparency policies. A scorecard will facilitate promoting funders and journals that are leading examples of research transparency, and stimulate new commitments to transparency. For journals, the scorecard could become a means of demonstrating journal quality on process, as a counterweight to the reviled Impact Factor, which rewards journals on outcome.
Best practices: COS and the committee can use accumulating experience with the TOP Guidelines to support newly adopting journals and funders in efficient and effective integration into their policies and monitoring procedures.
We will foster an information commons and sharing of knowledge across stakeholders and research communities.
Expanding TOP adoption: We aim to obtain adoption of the TOP Guidelines by the large majority of funders and journals supporting scientific research.
62
Badges for Open Practices
Other incentives for sharing: low-risk, low-cost rewards (who doesn’t love a gold star?), and voluntary. A badge signals open practices, and there is evidence that journals that use these badges see more data sharing.
63
DOI: http://dx.doi.org/10.1145/2994031
ACM: Incentivizing Reproducibility. The ACM Task Force on Data, Software, and Reproducibility in Publication. Ronald F. Boisvert. Incentivizing reproducibility. Commun. ACM 59, 10 (September 2016), 5. DOI: http://dx.doi.org/10.1145/2994031. Result and Artifact Review and Badging. ACM. June 8,
64
Case Study: Psychological Science article signals open behaviour
65
Where to get help: reproducibility training, OSF support, work with COS
Slides: All Materials; Workshop Materials