Identifiers and trust: lessons for data publishers


Good afternoon - today I'm going to be speaking to you about identifiers and trust, and some of the lessons learned in the scholarly publishing world that may be valuable for the digital curators of datasets and databases. I think that repositories that organise, maintain and preserve content must start to act like publishers and adopt some of the practices employed by scholarly publishers, particularly in the areas of identifiers, citation, and establishing trust with whoever might find the data useful. Take a step back - look at existing publishing identifiers. Valued Resources: Roles and Responsibilities of Digital Curators and Publishers, FOURTH BLOOMSBURY CONFERENCE ON E-PUBLISHING AND E-PUBLICATIONS, 24 & 25 JUNE 2010

Publishing industry use of identifiers has a venerable history going back to the late 1960s, and these were identifiers to facilitate trading and the supply chain. The ISBN was developed for publishers and book stores in the late 1960s, became an ISO standard in 1970, and was then incorporated into bar codes. The ISSN is an identifier for serial publications - magazines and journals - and became an ISO standard in 1971. It's a bit different in that it's focused on library cataloguing and is issued by National Libraries in many countries, but it is still a supply chain/trading identifier to improve transactions between libraries, publishers and agents. Both have been very successful and have worked well for a long time.

A different type of identifier came along with the World Wide Web: the URL, the Uniform Resource Locator, which is a link to a specific location - it identifies a resource and specifies how to get it. This was very different from the ISBN and ISSN, which were not actionable identifiers. The URL concept broadened out to that of the URI, the Uniform Resource Identifier, which is now the basis of a lot of semantic web activity. The development of the WWW with HTML and the HTTP protocol was all about linking, and this tied in very well with something that is very important to scholarly journals...

and that is citations. Citation is at the core of scientific research and scholarly journal articles - authors acknowledge and give credit to the work of others and show that they've done their due diligence. Citations drive prestige, authors' careers, and funding and tenure decisions.

In a 1998 paper, two then-unknown Stanford grad students laid out their proposal for something called PageRank, which they stated was specifically based on traditional scholarly citation analysis as developed by Eugene Garfield. There is often a lot of talk that publishers don't "get" the web and new developments, but I think this story shows that there is a lot of interconnection between the old and the new.

As scholarly publishers started linking references they encountered two problems: 1) URLs change and links break, and 2) just one set of references in one article can potentially link to dozens of other organizations. Broken links are frustrating for all web users, but they are particularly bad for scholarly publishers and undermine trust in the system.

No matter how you try to make it better, 404 error messages are bad.

and broken links lead to unhappy users. The issue of links going bad - and, over time, an increasing number of links going bad - has been given a name: link rot.

Link rot has been documented in the journal literature. So the need to overcome the problem of broken links, and the issue of scalability - thousands of publishers needing to link to one another - led to the use of DOIs and the creation of CrossRef as a registration agency for DOIs and metadata in order to run a citation linking system.

http://dx.doi.org/10.1098/rspb.2010.0258
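The mechanism behind a link like the one above can be sketched as a layer of indirection. This is a minimal illustration, not CrossRef's actual system: the registry contents and publisher URLs below are made up. A citation carries the persistent, actionable DOI link; the registry maps the DOI to whatever the current landing page happens to be, so when the page moves only the registry entry changes and the citation keeps working.

```python
# Illustrative sketch of DOI-style resolution (registry and URLs are
# hypothetical). Citations carry the persistent link; the registry maps
# the DOI to the current location.

RESOLVER = "http://dx.doi.org/"

# hypothetical registry: DOI -> current publisher landing URL
registry = {
    "10.1098/rspb.2010.0258": "https://publisher.example/old-location",
}

def doi_to_link(doi: str) -> str:
    """Build the actionable link that a citation should carry."""
    return RESOLVER + doi

def resolve(doi: str) -> str:
    """Look up the current location; publishers update this, never the citation."""
    return registry[doi]

# If the article moves, only the registry entry changes - every citation
# in the literature still resolves correctly:
registry["10.1098/rspb.2010.0258"] = "https://publisher.example/new-location"
```

The key design point is that the identifier in the citation never changes; persistence is a property of the registry and the social commitment to keep it updated, not of the string itself.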

Identifiers + Metadata + Services + Community = Persistent citation and linking. I'm not going to talk specifically about CrossRef and DOIs, but effectively what has happened is that scholarly publishers - through CrossRef - have built a registry of identifiers and metadata on which they've built services for the community to achieve persistent citation and linking. That's worked out pretty well. But now I want to talk about trust, because this is something that publishers are still struggling with in the online world.

Trust is something that has to be built over time - and it has to be earned. As my colleague Geoffrey Bilder likes to say, the worst way to get somebody to trust you is to just say "trust me". Publishers have built corporate brands, which may have some importance to libraries but none for researchers. What researchers care about are the journal brands. Journals build their brands by performing the essential services of registration, certification, promoting awareness and maintaining context. Dataset repositories will have to build trust in the same way, by developing these same services. Registration, Certification, Awareness, Stewardship: editorial, production, marketing, access

Publishers have two problems, though: a lot of what they do to add value pre-publication is unseen and therefore unappreciated, and what they do post-publication in terms of stewarding content also isn't appreciated. Journals also build trust through the largely unseen processes that occur between submission and publication. This is an overview of the editorial and production process from an article describing the Open Journal System; it really captures what goes into producing a scholarly journal and demonstrates the added value that publishers provide, but to a large extent this whole process is invisible to readers. Open Journal Systems: An example of open source software for journal management and publishing, J. Willinsky, Library Hi Tech, 2005, Vol. 23, Issue 4, p. 504. doi:10.1108/07378830510636300

Certification of the version of record The perception that the publishing process is completed when the “final version” of an article is published is deadly for publishers. There is no “final” version of content. There is a fixed version of record that is certified by the publisher but the publisher also plays a crucial role in the ongoing stewardship of content - it is the publisher who issues updates, corrections, retractions and withdrawals. Certification of the version of record Ongoing stewardship of scholarly content

Version of record Scholarly Publishing Roundtable (US House Committee on Science and Technology/White House Office of Science and Technology Policy) To the fullest extent possible, access should be to the definitive version of journal articles — the version of record (VoR) produced and stewarded by the publisher.

Versions & Citation When is something a new version? When does something get a new identifier? Focus on citation: if a change will alter the interpretation of a work, it gets a new identifier. Older versions must be kept - users should get to what was cited.
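The rule above can be sketched as a small versioning policy. This is a minimal, hypothetical illustration (the class, method names and example identifiers are mine, not any repository's actual API): a change that could alter the interpretation of a work mints a new identifier, a cosmetic fix does not, and every identified version is retained so a citation always resolves to exactly what was cited.

```python
# Sketch of the versioning rule for citable content (illustrative names).

class VersionedRecord:
    """Keeps every identified version so citations never dangle."""

    def __init__(self, identifier: str, content: str):
        self.versions = {identifier: content}  # ALL versions are retained
        self.current_id = identifier

    def update(self, content: str, changes_interpretation: bool,
               new_id: str = None):
        if changes_interpretation:
            # Substantive change: mint a new identifier; the old version
            # stays available under its original identifier.
            self.versions[new_id] = content
            self.current_id = new_id
        else:
            # Cosmetic fix (typo, layout): same identifier, content
            # replaced in place - interpretation is unaffected.
            self.versions[self.current_id] = content

    def cited(self, identifier: str) -> str:
        """A reader following a citation gets the version actually cited."""
        return self.versions[identifier]
```

For example, fixing a typo leaves the identifier alone, while revising a conclusion creates a v2 identifier without removing v1, so readers of an older paper that cites v1 still see what its author saw.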

"together we can create a reality that we all agree on — the reality we just agreed on…any user can change any entry, and if enough users agree with them, it becomes true."

Sir Tim told BBC News that there needed to be new systems that would give websites a label for trustworthiness once they had been proved reliable sources…So I'd be interested in different organisations labeling websites in different ways.

Industry Problems The scholarly pre-publication process is largely invisible The common belief that the publisher's job is done on publication of the "final" version A proliferation of versions of content online that are not stewarded, with very little information for readers about what's what Trust metrics have not been established on the web So, to summarize, these are some of the issues behind the CrossMark project.

CrossMark A logo identifying a publisher-certified version of record Clicking the logo tells you: If the copy is publisher-maintained and if there have been corrections Where the publisher-maintained version is Other metadata the publisher chooses to include

Enables Researchers to Easily determine if they are looking at a publisher-maintained version of record and, if not, get a link to the publisher version Easily ascertain the current status of the document and if there have been updates Easily access and use any non-bibliographic metadata the publisher has provided

Enables Publishers to Identify the publisher-maintained version of record Emphasize initial certification of the version of record AND ongoing stewardship Highlight and disseminate corrections in an industry standard way Highlight other (often invisible) steps taken to ensure the trustworthiness of the content

Things to think about If it's not online it doesn't exist If it's not linked it doesn't exist The identifier is only one small piece of the puzzle Any ID must be unique, persistent and discoverable Sustainable infrastructure - technical and social

Articles vs data CrossRef builds on existing citation practices established over 350 years The reward system is firmly established for articles and article citation Not the case for data: the social aspects are much harder than the technical Collaboration is critical to interlink data and articles Data is different - publishers don't want it!

Conclusion Identifiers are tools to enable services and are useless without metadata Editorial selection and citation practices are critical More work is needed to establish trust metrics online Journals must establish data policies requiring deposit in appropriate repositories

Data publishers must... Establish trust through editorial and production processes Certify and steward versions of record for citation purposes (and keep old versions!) so researchers get credit Create a system of persistent, actionable IDs and authoritative metadata Develop a community and services

Curing something can lead to very tasty results!

Ed Pentz epentz@crossref.org
