1
Outstanding Metadata Issues Affecting Data Citation Accuracy
Megan Force, Editor, Data Citation Index
May 2018
2
Introduction – Clarivate Analytics
• We have a 60-year legacy of curating the most authoritative knowledgebase, including the Web of Science, a custodian of 100 years' worth of research.
• Over 7,000 leading research institutions and scholarly publishers use our products.
3
Web of Science platform
The Web of Science platform covers nearly 33,000 journals and other scholarly output, including books, conference proceedings, datasets, and patents.
Web of Science – the most accurate and comprehensive resource for the world. The Web of Science provides access to an unrivalled breadth of global research literature linked to a rigorously selected core of world-class journals, ensuring a unique combination of discovery through meticulously captured metadata and citation connections, coupled with quality, impact, and neutrality.
• Total journals in Web of Science*: 32,840
• Core Collection*: 18,245 journals from recognized globally significant journal lists and emerging sources
  • SCIE – Science Citation Index Expanded (included in JCR): 8,888
  • SSCI – Social Sciences Citation Index (included in JCR): 3,256
  • AHCI – Arts & Humanities Citation Index: 1,785
  • ESCI – Emerging Sources Citation Index: 5,369
• Specialist Literature*: 9,950
  • BIOSIS Citation Index: 5,358
  • Zoological Record: 4,941
  • FSTA: 1,038
  • INSPEC: 4,338
  • Medline: 5,533
  • CABI: 7,945
• Regional Collections*: 4,646
• Journal Citation Reports: ~11,500 journals in over 230 science and social sciences disciplines. Metrics: Total Cites, Journal Impact Factor (JIF), Five-Year Journal Impact Factor, Immediacy Index, Cited Half-Life, Citable Items, JIF Percentile, Eigenfactor® Metrics
• Other content: Conferences 191K+; Books 81K+; Datasets 7M+; Patent records 70M+
* Unique journal count across databases
4
Definitions
• Data: facts collected for reference or analysis. Data are a non-traditional scholarly output of scientific research that is often analysed in traditional research publications. They may include numerical, textual, image, video, or software information.
• Data repository: an online resource where data are deposited and stored for preservation and access.
5
Data Citation Indexing: Transparency, Reuse, Credit
• Enables research conclusions to be verified and validated
• Makes reproducibility of premises and results possible
• Exposes data findings and their value to a wider audience
• Ensures a mechanism for receiving credit for scholarly work and an opportunity for tracking and translating such attribution into rewards
6
Introduction – data
• Lack of consensus across data repositories
• Varying quality across data repositories
• Differences between disciplines
• Lack of curation
Evidence: Data Citation Index content from 350 data repositories, with 7M data records and 6.5M citations.
Much work to do!
Typical data citation:
Armenteras, Dolors; Gibbes, Cerian; Anaya, Jesus; Davalos, Liliana (2017): R script to compare models to forest loss alert system. Dryad.
7
Data Citation Index: Example Record
8
Data Citation Metadata Elements
• Following established journal citation practices only gets us so far. Data and other non-traditional scholarly output differ from articles in subtle but important ways, especially in how they are identified and interpreted.
• Various efforts to implement specific guidelines are underway, yet some gaps remain.
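To make the idea of citation elements concrete, here is a minimal sketch that assembles a data citation string from individual metadata elements. It assumes a simple record with creators, publication year, title, optional version, publisher and optional identifier. The field names and the `format_citation` helper are hypothetical. The element order loosely follows the Creator (PublicationYear): Title. Version. Publisher. Identifier pattern that DataCite uses for its recommended citation. Missing elements are skipped, which is how a citation like the Dryad example ends up with no version and no identifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    # Hypothetical field names; not taken verbatim from any metadata schema.
    creators: List[str]
    publication_year: Optional[int]
    title: str
    publisher: str
    version: Optional[str] = None
    identifier: Optional[str] = None  # e.g. a DOI or accession number


def format_citation(rec: DatasetRecord) -> str:
    """Build 'Creators (Year): Title. Version. Publisher. Identifier', skipping missing parts."""
    authors = "; ".join(rec.creators) if rec.creators else "[Unknown author]"
    year = f" ({rec.publication_year})" if rec.publication_year is not None else ""
    parts = [f"{authors}{year}: {rec.title}"]
    if rec.version:
        parts.append(f"Version {rec.version}")
    parts.append(rec.publisher)
    if rec.identifier:
        parts.append(rec.identifier)
    return ". ".join(parts) + "."
```

Fed the elements of the typical citation shown earlier, the sketch reproduces that citation exactly. Once a version and an identifier are present, they appear in the output automatically.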
9
Data citations: dates before digitization
10
Dates Before Digitization
Citation dates from periods before digitization
• Datasets that were not ‘born digital’
• The citation date may be the original publication date of a dataset that is now found online
• Such data may be highly valuable (e.g., 19th-century glacier data contributing to studies of climate change effects)
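One way to keep an original, pre-digital date from being mistaken for an online publication date is to store both dates and flag records where they are far apart. The sketch below assumes each record has an optional original date and an optional online-availability date. These are hypothetical fields, loosely modeled on the DataCite schema's distinction between "Created" and "Available"/"Issued" date types. The 50-year threshold is an arbitrary illustrative choice and is not a DataCite recommendation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatedRecord:
    # Hypothetical fields: when the data were originally produced or published,
    # and when they were made available online.
    original_date: Optional[date]
    available_online: Optional[date]


def predigitization_gap_years(rec: DatedRecord) -> Optional[int]:
    """Years between the original date and online availability, if both are known."""
    if rec.original_date is None or rec.available_online is None:
        return None
    return rec.available_online.year - rec.original_date.year


def needs_review(rec: DatedRecord, threshold_years: int = 50) -> bool:
    """Flag records whose original date long predates online availability.

    The default threshold is an illustrative assumption, not a standard value.
    """
    gap = predigitization_gap_years(rec)
    return gap is not None and gap >= threshold_years
```

A flagged record is not necessarily wrong. The flag only tells an indexer to decide which of the two dates the citation should display.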
11
Dates Before Digitization
Pre-digitization citation dates: DataCite recommendations
12
Dates before digitization: questions
Do older dates in this context cause confusion? Do they make sense for certain disciplines? When does a previously non-digital publication become a data publication?
13
Data citations: determination of authorship, publishing entity
14
Data Citation Authorship
Author entity identification
• Identifying the proper author entity for citation purposes may require significant negotiation with the data repository: custodians, curators, etc. may be listed, but who is the author?
• This is a fundamental question of credit, vital for metrics and analytics
15
Data Citation Authorship
Author identification: questions
• How is ‘data author’ defined for citation purposes?
• What can be done to ensure that data author becomes a more recognized, even mandatory, concept/element?
16
Data Publication
Publishing entity identification
• In DataCite, the Publisher element may be broadly defined. The primary curator must be identified to avoid confusion and duplicate records.
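Duplicate records often appear when the same dataset is harvested from more than one source under different publisher names. The sketch below assumes records arrive as (DOI, publisher) pairs, which is a hypothetical input shape. It normalizes each DOI by lower-casing it and stripping common resolver prefixes, which is safe because DOI names are case-insensitive. It then reports every identifier that appears under more than one publisher. The code only surfaces the conflict; deciding which entity is the primary curator still needs human judgment.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Common resolver prefixes seen in harvested identifiers (not an exhaustive list).
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Lower-case a DOI and strip a leading resolver prefix, if any."""
    value = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def conflicting_publishers(records: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each DOI that appears under more than one publisher name to those names.

    `records` is an iterable of (doi, publisher) pairs from a harvested feed.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doi, publisher in records:
        seen[normalize_doi(doi)].add(publisher.strip())
    return {doi: pubs for doi, pubs in seen.items() if len(pubs) > 1}
```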
17
Data Publication
Publisher: questions
• How is ‘data publisher’ defined for citation purposes?
18
Data citations: versioning practices
19
DCI versioning study
• Undertaken after observing wide variation in versioning practices
• Version is a key metadata element for reproducibility
• 72% of all data repositories in DCI were found to have no version information for datasets
20
Versioning Practices
Where is version information found?
• A significant amount of version information is not readily available through a dedicated metadata tag
• Version information may be concatenated with the dataset title, or may be found in the dataset identifier (accession number, DOI, etc.)
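Because version information may sit in a dedicated tag, in the title or in the identifier, an indexer has to look in all three places. The sketch below checks them in that order. The regular expressions are hypothetical heuristics chosen for illustration: "version 2.1" / "v2" in titles, a trailing ".v3" in identifiers, and an accession.version form such as "AB123456.2". Real repositories use many other conventions, and a title such as "V1 receptor data" would produce a false positive.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; not an exhaustive or authoritative list.
TITLE_VERSION = re.compile(r"\b(?:version|ver\.?|v)\s*(\d+(?:\.\d+)*)\b", re.IGNORECASE)
ID_VERSION_PATTERNS = [
    re.compile(r"\.v(\d+)$", re.IGNORECASE),  # identifier ending in ".v3"
    re.compile(r"^[A-Z]{1,6}\d+\.(\d+)$"),    # accession.version, e.g. "AB123456.2"
]


def extract_version(title: str, identifier: str, version_tag: Optional[str] = None) -> Optional[str]:
    """Return version info, preferring a dedicated tag, then the title, then the identifier."""
    if version_tag and version_tag.strip():
        return version_tag.strip()
    match = TITLE_VERSION.search(title)
    if match:
        return match.group(1)
    for pattern in ID_VERSION_PATTERNS:
        match = pattern.search(identifier.strip())
        if match:
            return match.group(1)
    return None
```

Checking the dedicated tag first reflects the slide's point that a tag is the preferred location. Title and identifier parsing are fallbacks, and their results deserve lower trust.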
21
Versioning Practices
Versions in citations?
• Only 26% of repositories that employ versioning include the version in a recommended data citation
• Versions are included in formal data citations for a greater share of these repositories
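The 72% and 26% figures in this study are repository-level shares with different denominators. The first is computed over all repositories, and the second only over repositories that employ versioning. The sketch below makes that arithmetic explicit. It assumes each repository has already been summarized with two hypothetical boolean flags. Any sample data used with it is invented and does not reproduce the study's results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RepoSummary:
    # Hypothetical per-repository flags derived from harvested records.
    name: str
    has_version_info: bool                 # any dataset carries version information
    version_in_recommended_citation: bool  # repository's suggested citation shows the version


def versioning_shares(repos: List[RepoSummary]) -> Tuple[float, float]:
    """Return (share of all repositories without version info,
    share of versioning repositories whose recommended citation includes the version)."""
    if not repos:
        raise ValueError("no repositories supplied")
    without = sum(1 for r in repos if not r.has_version_info)
    versioning = [r for r in repos if r.has_version_info]
    cited = sum(1 for r in versioning if r.version_in_recommended_citation)
    cite_share = cited / len(versioning) if versioning else 0.0
    return without / len(repos), cite_share
```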
22
Versioning Practices
Takeaways
• Repository policies on versioning are not displayed on database websites
• Whether a version applies to the metadata or to the data itself is generally unclear or difficult to determine
• Versioning practices may vary significantly even among the datasets of a single data repository
• Repositories are adopting versioning practices in the interest of correct citation and reproducibility, yet little or no guidance exists on best practice at either a cross-disciplinary or a discipline-specific level
23
Versioning practices: questions
Should a single data item with multiple versions have only one citable record (reproducibility based on version information included in the citation blurb), or should a new DOI be issued for each version? Does this depend on format/discipline?
24
Data citations: dates after the current date
25
Future Citation Dates
Citation dates after the current date
• Metadata for datasets that are not yet accessible is being made available for harvest by outside groups. It is included in metadata feeds that primarily describe already-published datasets
• ‘Published dates’ for these datasets are listed as in the future
• Sometimes the data is specified as being under embargo until a specific date. In other cases no embargo is specified, but a statement says the data will be made available within the next 6 months, year, etc.
• Publisher, repository, and funder policies for data availability intersect here
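An indexer receiving such feeds needs at least to separate already-published records from those with future dates, and to tell a stated embargo apart from an open-ended promise. The sketch below assumes a harvested record with a "published" date and an optional embargo end date. These fields and status strings are hypothetical and are not taken from any specific feed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HarvestedRecord:
    # Hypothetical fields from a harvested metadata feed.
    identifier: str
    published: date
    embargo_until: Optional[date] = None


def availability_status(rec: HarvestedRecord, today: date) -> str:
    """Classify a record by whether its 'published' date lies after `today`."""
    if rec.published <= today:
        return "published"
    if rec.embargo_until is not None:
        return f"embargoed until {rec.embargo_until.isoformat()}"
    return "future date, no embargo stated"
```

The function takes `today` as a parameter instead of calling `date.today()` internally. That keeps the classification reproducible when an old feed is re-processed later.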
26
Future Citation Dates
Citation dates after the current date: questions
• Is there friction between stakeholder policies for data availability (timing of article publication vs. data object publication, etc.)?
• Should dataset metadata be provided to indexers, etc., for data that has not yet been published?
27
Data Citation Metadata Elements
Wrap-up: consensus and next steps
“Technical mechanisms for citation are only surface characteristics of the knowledge infrastructures in which they are embedded”
Christine Borgman, Big Data, Little Data, No Data: Scholarship in the Networked World (2015)
28
Megan Force, Editor, Data Citation Index | (215) 823-6194 | megan