Dryad UK discussion meeting
Mark Patterson, Director of Publishing
April 27, 2010
Committed to making the world’s scientific and medical literature a public resource
Why share data?
  Complete picture of the work
  Reliability of the conclusions/recommendations
  Developing alternative interpretations
  Reusing the data for new analyses
    –Data may be unique/precious
  Human participants deserve it
    –Whilst preserving confidentiality
Consequences of not sharing data
  Misunderstanding
  Uncorrected errors
  Misrepresentation
  Duplication of effort
  Limits research impact
…at least 70 structures demonstrated to be falsified… …the current problems could not have been easily discovered without the availability of the structure-factor files
…the full data must be accessible for scrutiny by the scientific community.
Barriers to (effective) data sharing
  Technical barriers
    –Lack of infrastructure (database)
    –Lack of standards (formats)
    –Too much data
  Administrative and legal barriers
    –Lack of clarity of reuse terms
    –Lots of files to organize and process
    –Publishers don’t make it easy enough
  Cultural barriers
    –Sharing is not the norm
    –Insufficient incentives
    –Maximizing credit via publication encourages hoarding of data
The role of publishers
  Policy requiring data sharing as a condition of publication
  Quality control of data
  Providing incentives to share data
Challenges to policy development
  Discipline-specific differences
    –Data sharing tradition/behaviour
    –Availability of an established database
    –Enforcing the right standards at the right time
    –Privacy/confidentiality issues
  Technical issues
    –Quantity of data
    –CC Zero Waiver
  Policing the policy
    –Making sure restrictions are clear before publication
    –Appropriate action after publication
Quality control - image manipulation
  Images screened for inappropriate manipulation
  Most frequent problem is that original files cannot be found
  Should all raw data be submitted?
Incentives
  Provide a forum for ‘data papers’
  Indicators for the impact of datasets
    –Make sure that datasets are properly cited
PLoS Currents: Influenza Workflow
  Google Knol: Author(s) assemble content and control access and editing.
  Authors submit content to PLoS Currents.
  PLoS Currents: Moderators control posting of content, commenting and version control.
  PubMed Central: Immediate transfer from PLoS Currents site; stable identifier and permanent archiving.
PLoS Currents Influenza
  Very fast
  Very cheap
  Moderated by experts
  Citable
  Version control
  Archived at PubMed Central
  Indexed in PubMed
“Article-level metrics” could be applied to datasets in Dryad