DATA MANAGEMENT PLANS & THE DATA CITATION INDEX
Nigel Robinson
19 May 2014
©2010 Thomson Reuters
THE INCREASING VISIBILITY OF DATA
Grant funding agencies; journal publishers; publisher websites; electronic articles; data repositories & registration agencies
“Data is the new gold” – Neelie Kroes, EU Digital Agenda Commissioner
DIGITAL SCHOLARSHIP
Digital scholarship: very visible within the literature as a concept; articles, projects and university labs are all devoted to digital scholarship in various ways.
Interested parties: authors/researchers; research administrators; librarians and data archivists; publishers; grant funding organizations.
Content: discipline-specific and multidisciplinary content; needs and requirements vary by discipline; diverse content formats, with few standards.
OVERVIEW
Emerging landscape
Opportunities
Citation and attribution
THE EMERGENCE OF FUNDING MANDATES
NIH (2003): the Data Sharing Policy states that all funding applications requesting $500,000 or more per year are expected to address data sharing in the application.
NSF (2011): all funding proposals submitted on or after January 18, 2011, must include a “Data Management Plan” describing how the proposal will conform to NSF policy on the dissemination and sharing of research results.
DATA MANAGEMENT REQUIREMENTS EXTEND ACROSS THE GLOBE
Aug 2011: “expectation that all our funded researchers should maximise access to their research data with as few restrictions as possible. … submit a data management and sharing plan as part of the application process.”
2007: “Researchers are to retain research data and primary materials, manage storage of research data and primary materials, maintain confidentiality of research data and primary materials.”
DATA MANAGEMENT REQUIREMENTS EXTEND ACROSS THE GLOBE
“A further new element in Horizon 2020 is the use of Data Management Plans (DMPs) detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and re-use, and how it will be curated and preserved. The use of a Data Management Plan is required for projects participating in the Open Research Data Pilot. Other projects are invited to submit a Data Management Plan if relevant for their planned research.”
IMPACT ON RESEARCH LIBRARIES
FUNDING MANDATES BECOMING STRONGER
January 14, 2013: “failure to provide the requisite Data Management Plan will result in the application being rejected or terminated.”
WHY SHARE DATA?
Verification: findings can be verified
Extend original findings and address new questions
Reduce costs
Training: data reuse
Increased primary publications
Data sharing leads to more science & more knowledge
A. Pienta, G. Alter, J. Lyle (2010). The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.
DATA SHARING
The level and timing of data sharing vary:
28% share only before publication
35% share only after publication
25% share both before and after publication
INCREASED CITATION WITH SHARED DATA
35% to 69% more citations (courtesy of Jon Sears, AGU)
Bibliometrics: Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi: /journal.pone
DEPOSITION OF DATA BY RESEARCHERS
RESEARCHERS NOT RECEIVING CREDIT
Barriers to creating and sharing data:
Researchers are hesitant to spend time and effort creating and sharing data because they feel the work is not adequately exposed or credited.
Researchers find it difficult to expose the data they have produced because data repositories do not have clear standards or mechanisms for doing so.
RESEARCHER PROBLEMS
Access & discovery
Citation standards
Lack of willingness to deposit and cite
Lack of recognition / credit
DATA MANAGEMENT PLAN BENEFITS
Data deposit: repository must hold data; repository must provide access to data
Active: material added/updated; provide statistics on deposited data; actively curate data in the archive
Sustainable: persistent IDs, DOIs or other permanent IDs; contacts available for confirmation of interpretation; indication of intention to preserve data or provide access over the long term; contingency if the repository were to cease operating; make data accessible (or state licensing terms)
Persistence: funding information available for the repository and deposited data
Data reuse: links to literature; citation in literature databases
CHALLENGES
Metadata: resources; expertise
Citable data source
Metadata quality: unique & persistent identifiers; consistency
Data repositories are not static: how is version control handled?
Partnerships
DATA CITATION
Current citation style: informal citations in the full text of an article
Desired/future citation style: formally cited references
Examples:
U.S. Dept. of Justice, Bureau of Justice Statistics (1996): MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES, Version 1. Inter-university Consortium for Political and Social Research.
Lee, Seung-Jae; Lee, He-Jin; Cho, Ji-Hoon; Rho, Sangchul; Hwang, Daehee (2008): GSE11574: The responses of astrocytes stimulated by extracellular α-synuclein. Gene Expression Omnibus. SE11574
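As a rough illustration of the formal pattern in the example citations (creators, year, title, optional version, repository, optional identifier), the Python sketch below assembles a citation string from its parts. The `DataCitation` class and its field names are hypothetical, chosen only for this example; they are not a Thomson Reuters or repository format, and the punctuation simply mirrors the two examples on this slide.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical record for a formally cited data set (illustration only)."""
    creators: list[str]
    year: int
    title: str
    repository: str
    version: Optional[str] = None
    identifier: Optional[str] = None

    def format(self) -> str:
        # Creators are separated by semicolons, as in the slide examples.
        text = f"{'; '.join(self.creators)} ({self.year}): {self.title}"
        if self.version:
            text += f", Version {self.version}"
        # The repository (publisher of the data) closes the main sentence.
        text += f". {self.repository}."
        if self.identifier:
            # A persistent identifier (e.g. a DOI), when the repository provides one.
            text += f" {self.identifier}"
        return text


icpsr = DataCitation(
    creators=["U.S. Dept. of Justice, Bureau of Justice Statistics"],
    year=1996,
    title="MURDER CASES IN 33 LARGE URBAN COUNTIES IN THE UNITED STATES",
    repository="Inter-university Consortium for Political and Social Research",
    version="1",
)
print(icpsr.format())
```

Keeping each element as a separate field, rather than as free text, is what would let an index match the same data set across many citing articles.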
DATA CITATION
The Data Citation Index links the scientific literature with published data sets, such as the Gene Expression Omnibus example cited above, and supports new data metrics.
DATA CITATION INDEX AIMS
Launched October … ; … M data records
Enable the discovery of data repositories, data studies and data sets in the context of traditional literature
Link data to research publications
Help researchers find data sets and studies and track the full impact of their research output
Provide expanded measurement of researcher and institutional research output and assessment
Facilitate more accurate and comprehensive bibliometric analyses
INDEXING A DATA REPOSITORY ON WEB OF SCIENCE
Record types:
Repository/Source: comprises data studies, data sets and/or microcitations. Stores and provides access to the raw data.
Data Study: descriptions of studies or experiments with associated data which have been used in the data study. Includes serial or longitudinal studies over time.
Data Set: a single or coherent set of data or a data file provided by the repository, as part of a collection, data study or experiment.
Microcitation (nanopublication): an assertion about concepts that have been found to be linked by scientific enquiry, and can be uniquely identified and attributed to its author. Made up of three separate parts: a subject, a predicate and an object.
Indexing workflow: descriptive metadata feed from repository; repository raw metadata is analysed; metadata added.
Diagram: Repository, Data study, Data set, Microcitation
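The four record types can be pictured as a small containment model. The Python sketch below is a hypothetical illustration of the definitions on this slide, not the actual Web of Science data schema: a repository holds data studies, data sets and microcitations; a data study groups the data sets it used; and a microcitation is a subject, predicate and object assertion attributed to an author. All class names, field names and the sample records are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Microcitation:
    """Hypothetical nanopublication-style assertion: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str      # the "object" part of the assertion
    author: str   # each assertion is attributable to its author


@dataclass
class DataSet:
    """A single data file or coherent set of data held by a repository."""
    title: str


@dataclass
class DataStudy:
    """Describes a study or experiment and the data sets used in it."""
    title: str
    data_sets: list[DataSet] = field(default_factory=list)


@dataclass
class Repository:
    """Top-level record: stores and provides access to the raw data."""
    name: str
    studies: list[DataStudy] = field(default_factory=list)
    data_sets: list[DataSet] = field(default_factory=list)  # sets not tied to a study
    microcitations: list[Microcitation] = field(default_factory=list)

    def all_data_sets(self) -> list[DataSet]:
        # Data sets held directly, followed by those reached through data studies.
        found = list(self.data_sets)
        for study in self.studies:
            found.extend(study.data_sets)
        return found


# Hypothetical sample records.
repo = Repository("Example repository")
repo.data_sets.append(DataSet("Standalone data file"))
repo.studies.append(DataStudy("Longitudinal study", [DataSet("Wave 1"), DataSet("Wave 2")]))
repo.microcitations.append(Microcitation("gene X", "is expressed in", "astrocytes", author="A. Researcher"))
print(len(repo.all_data_sets()))  # 3
```

Modelling a data study as a container of data sets mirrors the definition above, where a data set may be provided as part of a data study or directly by the repository.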
Search results in the Data Citation Index offer the Web of Knowledge options for exploring a body of information: data becomes discoverable alongside literature.
Data deposition makes it possible to show related data from the repository
Because data are accessible and able to be cited, they can be linked to publications describing research which uses them
Link out directly to the original item, in this case a Data Study.
Begin to build citation maps for data by associating data with literature
Guidance on how to associate data and literature through citation
DATA CITATION INDEX & DATA MANAGEMENT PLANS
Discovery of the data most important to scholarly research
Data linked to published research literature
Measures of data citation, use and reuse, with attribution assisted by identifiers
New metrics for digital scholarship
Thank you
Nigel Robinson
REPOSITORY SELECTION & EVALUATION
As we evaluate repositories for inclusion, some of the things we consider are:
Editorial content: ensuring that material is desirable to the research community
Persistence and stability of the repository, with a steady flow of new information
Thoroughness and detail of descriptive information
Links from data to research literature
DATA REPOSITORIES
Over 1,000 repositories identified
TYPES OF DATA BY DISCIPLINE
ARTS & HUMANITIES: cultural heritage; language corpora; image collections; recordings
SOCIAL SCIENCES: poll data; economic statistics; longitudinal data; national census; public opinion surveys
SCIENCE & TECHNOLOGY: maps; algorithms; genomics; sky surveys; astrophysics; remote sensing; museum specimens