Outstanding Metadata Issues Affecting Data Citation Accuracy


1 Outstanding Metadata Issues Affecting Data Citation Accuracy
RDA 10th Plenary BoF Meeting
Moderators: Megan Force & Nigel Robinson
Panelists: Varsha Khodiyar, Fiona Murphy
September 2017

2 Introduction - Clarivate Analytics
We have a 60-year legacy of curating the most authoritative knowledgebase, including the Web of Science, a custodian of 100 years’ worth of research. Over 7,000 leading research institutions and scholarly publishers use our market-leading products

3 Journal Citation Reports Specialist Literature
Web of Science platform covers nearly 33,000 journals and other scholarly output including books, conference proceedings, datasets, and patents
Web of Science – the most accurate and comprehensive resource for the world
The Web of Science provides access to an unrivalled breadth of global research literature linked to a rigorously selected core of world class journals, ensuring a unique combination of discovery through meticulously captured metadata and citation connections, coupled with quality, impact and neutrality.
32,840 Total journals in Web of Science*
  18,245 Core Collection*: journals from recognized globally significant journal lists and emerging sources
    8,888 SCIE – Science Citation Index Expanded (included in JCR)
    3,256 SSCI – Social Sciences Citation Index (included in JCR)
    1,785 AHCI – Arts & Humanities Citation Index
    5,369 ESCI – Emerging Sources Citation Index
  9,950 Specialist Literature*
    5,358 BIOSIS Citation Index
    4,941 Zoological Record
    1,038 FSTA
    4,338 INSPEC
    5,533 Medline
    7,945 CABI
  4,646 Regional Collections*
Journal Citation Reports: ~11,500 journals in over 230 science and social sciences disciplines
  Metrics: Total Cites, Journal Impact Factor (JIF), Five-Year Journal Impact Factor, Immediacy Index, Cited Half-Life, Citable Items, JIF Percentile, Eigenfactor® Metrics
Other content: Conferences 191K+, Books 81K+, Datasets 7M+, Patent Records 70M+
* Unique journal count across databases

4 Much work to do! Typical Data citation
Introduction – data
Lack of consensus across data repositories
Varying quality across data repositories
Evidence: Data Citation Index content from 350 data repositories, 7M data records, 6.5M citations
Differences between disciplines
Lack of curation
Much work to do!
Typical data citation: Armenteras, Dolors; Gibbes, Cerian; Anaya, Jesus; Davalos, Liliana (2017): R script to compare models to forest loss alert system. Dryad.
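The shape of the typical citation above (creators, year, title, publisher) can be sketched as a small formatter. This is only an illustration: the function name `format_citation` and the optional `version` and `identifier` parameters are assumptions made for the example, not a format prescribed by the presentation or by any repository.

```python
from typing import Optional, Sequence


def format_citation(creators: Sequence[str], year: int, title: str, publisher: str,
                    version: Optional[str] = None,
                    identifier: Optional[str] = None) -> str:
    """Build a citation string shaped like the example on this slide.

    `version` and `identifier` are hypothetical optional extras; they are
    simply omitted when a repository does not supply them.
    """
    # Creators are joined with "; " as in the example citation.
    parts = [f"{'; '.join(creators)} ({year}): {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    if identifier:
        parts.append(identifier)
    return " ".join(parts)


citation = format_citation(
    ["Armenteras, Dolors", "Gibbes, Cerian", "Anaya, Jesus", "Davalos, Liliana"],
    2017,
    "R script to compare models to forest loss alert system",
    "Dryad",
)
print(citation)
```

Keeping version and identifier optional reflects the point of the slide: repositories differ in which elements they provide, so a formatter has to tolerate missing ones.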

5 Varsha Khodiyar Fiona Murphy Panelists
Varsha Khodiyar: As Data Curation Editor for the Nature Research journal Scientific Data, Varsha is responsible for overseeing the structured metadata files created for each Data Descriptor, and curates Scientific Data’s recommended repository list. Varsha’s curation experience includes a post-doctoral position on the Human Gene Nomenclature Committee, and 8 years of Gene Ontology annotation. Varsha began her publishing career as Data Publishing Manager for F1000Research. Varsha has been an RDA member since the second plenary in September. She previously co-chaired the Data Publication Workflows WG.
Fiona Murphy: Fiona is an independent research data and publishing consultant advising institutions, learned societies and commercial companies. She is also an Associate Fellow at the University of Reading and a Board Member for the Dryad Data Repository. She is a past and current member of several research projects including data2paper (a cloud-based app for automating the data article submission process) and the Scholarly Commons Working Group (a FORCE11 project devising principles and practices for open science systems). She is also past Co-Chair for the RDA Publishing Data Workflows WG and is currently involved with a number of RDA IGs and WGs.

6 Data Citation Metadata Elements
Meeting Agenda
Presentation and discussion of known ‘sticking points’ contributing to data citation/matching issues:
  Authorship
  Dataset versioning
  Data citation dates before digitization
  Data citation dates after the current date
Panel discussion/request for comment
Identify further issues
Wrap up: where does consensus exist on these issues? Where does it not?

7 Data Citation Metadata Elements
Introduction
Following established journal citation practices only gets us so far, as data and other non-traditional scholarly output feature unique and subtle differences with respect to identification and interpretation
Various efforts are underway with respect to specific implementation of guidelines, yet there remain some gaps

8 Data Citation Metadata Elements
Introduction
For many of the citation questions identified, contributions from stakeholder communities are needed for consistency and standardization
Aim is to improve citation practice as well as citation matching, and in turn contribute toward improved author credit and increased incentives for data sharing

9 Data citations: determination of authorship, publishing entity

10 Author entity identification
Data Citation Authorship
Identifying the proper author entity for citation purposes may require significant negotiation with the data repository: custodians, curators, etc. are listed, but who is the author?
Fundamental question of credit; vital for metrics/analytics

11 Author identification: questions
Data Citation Authorship
How is ‘data author’ defined for citation purposes?
What can be done to ensure that data author becomes a more recognized, even mandatory concept/element?

12 Publishing entity identification
Data Publication
The DataCite Publisher element may be broadly defined; it is necessary to identify the primary curator in order to avoid confusion/duplication of records

13 How is ‘data publisher’ defined for citation purposes?
Data Publication
Publisher: questions
How is ‘data publisher’ defined for citation purposes?

14 Data citations: versioning practices

15 Undertaken after observing wide variation in versioning practices
DCI versioning study
Undertaken after observing wide variation in versioning practices
Key metadata element for reproducibility
72% of all data repositories in DCI were found to have no version information for datasets

16 Where is version information found?
Versioning Practices
A significant amount of version information is not readily available through a dedicated metadata tag
Version information may be concatenated with dataset title, or may be found in the dataset identifier (accession number, DOI, etc.)
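The three places version information may live (a dedicated tag, the title, the identifier) can be sketched as a fallback lookup. This is a minimal illustration under stated assumptions: the record keys (`version`, `title`, `identifier`) and both regular expressions are hypothetical, and real repositories encode versions in many more ways than these patterns cover.

```python
import re
from typing import Optional

# Hypothetical patterns, for illustration only.
# Matches e.g. "version 3", "ver. 2", "v1.2" inside a title.
TITLE_VERSION = re.compile(r"\b(?:version|ver\.?|v)\s*(\d+(?:\.\d+)*)\b", re.IGNORECASE)
# Matches a trailing ".v2" or "/v2" suffix on an identifier.
ID_VERSION = re.compile(r"[./]v(\d+(?:\.\d+)*)$", re.IGNORECASE)


def find_version(record: dict) -> Optional[str]:
    """Return a version string from a dedicated field, the title, or the identifier."""
    # 1. Prefer a dedicated metadata field when the repository supplies one.
    version = record.get("version")
    if version:
        return str(version)
    # 2. Fall back to a version concatenated with the dataset title.
    match = TITLE_VERSION.search(record.get("title", ""))
    if match:
        return match.group(1)
    # 3. Finally, look for a version suffix on the dataset identifier.
    match = ID_VERSION.search(record.get("identifier", ""))
    if match:
        return match.group(1)
    # No version information could be recovered.
    return None
```

The order of the checks is a design choice: a dedicated field is the least ambiguous source, while title and identifier parsing are heuristics that can misfire and would need tuning per repository.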

17 Versions in citations?
Versioning Practices
Only 26% of repositories which employ versioning include version in a recommended data citation
Versions are included in formal data citations for a greater share of these repositories

18 Takeaways
Versioning Practices
Repository policies with respect to versioning are not displayed on database websites
Metadata vs. data versioning is generally unclear/difficult to determine
Versioning practices may vary significantly even within the datasets of a single data repository
While versioning practices are being adopted by repositories in the interest of correct citation and reproducibility, little or no guidance exists regarding best practice at a cross-disciplinary or discipline-specific level

19 Versioning practices: questions
Should a single data item with multiple versions have only one citable record (reproducibility based on version information included in the citation blurb), or should a new DOI be issued for each version? Does this depend on format/discipline?

20 Data citations: dates before digitization

21 Citation dates from periods before digitization
Dates Before Digitization
Datasets that were not ‘born digital’
May be the original publication date of a dataset which is now found online
Data may be highly valuable (e.g. 19th century glacier data contributing to studies of climate change effects)

22 Pre-digitization citation dates: DataCite recommendations
Dates Before Digitization Pre-digitization citation dates: DataCite recommendations

23 Dates before digitization: questions
Do older dates in this context cause confusion? Do they make sense for certain disciplines? When does a previously non-digital publication become a data publication?

24 Data citations: dates after the current date

25 Citation dates after the current date
Future Citation Dates
Metadata for datasets that are not yet accessible is being made available for harvest by outside groups, and is included in metadata feeds which primarily describe already-published datasets
‘Published dates’ for these datasets are listed as in the future
Sometimes this data is specified as being under embargo until a specific date; other times no embargo is specified, but there is a statement to the effect that the data will be made available within the next 6 months, year, etc.
Intersection of publisher/repository/funder policies for data availability
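An indexer harvesting such feeds might separate these cases before matching citations. The sketch below is one possible way to do that, not a description of how any index handles it; the function name, its parameters, and the three status labels are all assumptions made for the example.

```python
from datetime import date
from typing import Optional


def classify_publication_date(published: date,
                              today: Optional[date] = None,
                              embargo_until: Optional[date] = None) -> str:
    """Label a harvested record by how its published date relates to today.

    Hypothetical labels:
      "published"    - the date is today or in the past
      "embargoed"    - the date is in the future and an embargo date is stated
      "future-dated" - the date is in the future with no stated embargo
    """
    # Allow the caller to pin "today" so results are reproducible.
    today = today or date.today()
    if published <= today:
        return "published"
    if embargo_until is not None:
        return "embargoed"
    return "future-dated"


# Usage with a fixed reference date.
reference = date(2017, 9, 20)
status = classify_publication_date(date(2018, 3, 1), today=reference)
```

Distinguishing "embargoed" from "future-dated" mirrors the two situations described above: an explicit embargo date versus a looser statement that the data will appear later.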

26 Citation dates after the current date: questions
Future Citation Dates
Is there friction between stakeholder policies for data availability (timing of article publication vs. data object publication, etc.)?
Should dataset metadata be provided to indexers, etc., for data that has not yet been published?

27 Wrap up: consensus and next steps
Data Citation Metadata Elements
“Technical mechanisms for citation are only surface characteristics of the knowledge infrastructures in which they are embedded”
- Christine Borgman, Big Data, Little Data, No Data: Scholarship in the Networked World (2015)

28 Megan Force, Editor, Data Citation Index | (215) 823-6194 | megan
Megan Force, Editor, Data Citation Index | clarivate.com

