Identifiers and trust: lessons for data publishers
Good afternoon. Today I’m going to be speaking to you about identifiers and trust, and some of the lessons learned in the scholarly publishing world that may be valuable for the digital curators of datasets and databases. I think that repositories that organise, maintain and preserve content must start to act like publishers and adopt some of the practices employed by scholarly publishers, particularly in the areas of identifiers and citation, and of establishing trust with whoever might find the data useful.

Take a step back: look at existing publishing identifiers

Valued Resources: Roles and Responsibilities of Digital Curators and Publishers
FOURTH BLOOMSBURY CONFERENCE ON E-PUBLISHING AND E-PUBLICATIONS, 24 & 25 JUNE 2010

Publishing industry use of identifiers has a venerable history going back to the late 1960s, and these were identifiers to facilitate trading and the supply chain. The ISBN was developed for publishers and bookstores in the late 1960s, became an ISO standard in 1970 and was later incorporated into barcodes. The ISSN is an identifier for serial publications (magazines and journals) and also became an ISO standard. It’s a bit different in that it’s focused on library cataloguing and is issued by national libraries in many countries, but it is still a supply chain/trading identifier that improves transactions between libraries, publishers and agents. Both have been very successful and have worked well for a long time.

A different type of identifier came along with the World Wide Web: the URL, or Uniform Resource Locator, which is a link to a specific location. It identifies a resource and specifies how to get it. This was very different from the ISBN and ISSN, which were not actionable identifiers. The URL concept broadened out to that of the URI, the Uniform Resource Identifier, which is now the basis of a lot of semantic web activity. The development of the WWW with HTML and the HTTP protocol was all about linking, and this tied in very well with something that was very important to scholarly journals...
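To make the distinction concrete, here is a minimal Python sketch using the standard library’s urllib.parse.urlsplit: a URL carries its own access method (the scheme) and location (the host and path), which is exactly what a bare ISBN or ISSN lacks. The address is a hypothetical one on the reserved example.org domain.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict[str, str]:
    """Break a URL into the parts that say how and where to fetch a resource."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # the access method, e.g. "https"
        "host": parts.netloc,    # the server that holds the resource
        "path": parts.path,      # where the resource sits on that server
    }


# Hypothetical address on the reserved example.org domain.
print(describe_url("https://example.org/journals/article-42.html"))
# prints {'scheme': 'https', 'host': 'example.org', 'path': '/journals/article-42.html'}
```

The same split also hints at why URLs break: if the host or path changes, every stored copy of the old string stops working.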

...and that is citations. Citation is at the core of scientific research and scholarly journal articles: authors acknowledge and give credit to the work of others and show that they’ve done their due diligence. Citations drive prestige, authors’ careers, and funding and tenure decisions.

In a 1998 paper, two then-unknown Stanford grad students laid out their proposals for something called PageRank, which they stated was specifically based on traditional scholarly citation analysis as developed by Eugene Garfield. There is often a lot of talk that publishers don’t “get” the web and new developments, but I think this story shows that there is a lot of interconnection between the old and the new.
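The link-as-citation idea can be illustrated with a minimal power-iteration sketch of PageRank over a toy citation graph: each paper passes its own score on, split evenly among the works it cites, so being cited by well-cited papers counts for more. This is the textbook formulation, not Google’s production system, and the paper names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions.

```python
def pagerank(citations: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Score papers by power iteration; `citations` maps a paper to the papers it cites."""
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}  # start with equal scores
    for _ in range(iterations):
        # Every paper receives a small baseline share (the "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for paper in papers:
            cited = citations.get(paper, [])
            # A paper that cites nothing spreads its score over every paper.
            targets = cited if cited else sorted(papers)
            share = damping * rank[paper] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B both cite C; C cites A.
scores = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(max(scores, key=scores.get))  # prints C
```

C comes out on top because it is cited twice, and A outranks B because A is cited by the highly ranked C while nothing cites B.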

As scholarly publishers started linking references, they encountered two problems: 1) URLs change and links break, and 2) the reference list of a single article can link to dozens of other organizations. Broken links are frustrating for all web users, but they are particularly bad for scholarly publishers because they undermine trust in the system.

No matter how you try to make them better, 404 error messages are bad.

...and broken links lead to unhappy users. The problem of links going bad, with an increasing number of them failing over time, has been given a name: link rot.

Link rot has been documented in the journal literature. To overcome the problem of broken links, and the scalability problem of thousands of publishers needing to link to one another, publishers adopted DOIs and created CrossRef as a registration agency for DOIs and metadata in order to run a citation linking system.
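The heart of this fix is a level of indirection: references cite a persistent identifier, and a registry maps that identifier to wherever the content currently lives. The toy Python class below sketches that idea under simplifying assumptions; it is not CrossRef’s or the DOI system’s actual software, and the resolver address, the 10.9999 prefix and the URLs are all hypothetical. (In the real DOI system, links are built on the https://doi.org/ resolver.)

```python
class IdentifierRegistry:
    """Toy registry mapping persistent identifiers to current locations."""

    def __init__(self, resolver_base: str) -> None:
        self.resolver_base = resolver_base    # hypothetical resolver address
        self._locations: dict[str, str] = {}  # identifier -> current URL

    def register(self, identifier: str, url: str) -> None:
        # Identifiers must be unique, so an identifier already in use is refused.
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = url

    def update_location(self, identifier: str, new_url: str) -> None:
        # When content moves, only this one registry entry changes.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def citation_link(self, identifier: str) -> str:
        # The link printed in reference lists depends only on the identifier.
        return self.resolver_base + identifier

    def resolve(self, identifier: str) -> str:
        # Following a citation link looks up wherever the content lives now.
        return self._locations[identifier]


registry = IdentifierRegistry("https://resolver.example.org/")
registry.register("10.9999/demo.123", "https://old-host.example.com/article/123")
link = registry.citation_link("10.9999/demo.123")
registry.update_location("10.9999/demo.123", "https://new-host.example.net/a/123")
print(link)                                  # prints https://resolver.example.org/10.9999/demo.123
print(registry.resolve("10.9999/demo.123"))  # prints https://new-host.example.net/a/123
```

The citing article never has to change: the same link keeps working after the move, which is what lets one shared registry serve thousands of publishers instead of each publisher tracking every other publisher’s URLs.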

Identifiers + Metadata + Services + Community = Persistent citation and linking
I’m not going to talk specifically about CrossRef and DOIs, but effectively what has happened is that scholarly publishers, through CrossRef, have built a registry of identifiers and metadata on which they’ve built services for the community to achieve persistent citation and linking. That’s worked out pretty well. But now I want to talk about trust, because this is something that publishers are still struggling with in the online world.

Trust is something that has to be built over time, and it has to be earned. As my colleague Geoffrey Bilder likes to say, the worst way to get somebody to trust you is to just say “trust me”. Publishers have built corporate brands, which may have some importance to libraries but none for researchers. What researchers care about are the journal brands. Journals build their brands by performing the essential services of registration, certification, promoting awareness and maintaining context. Dataset repositories will have to build trust in the same way, by developing these same services.
Registration, Certification, Awareness, Stewardship: editorial, production, marketing, access

Publishers have two problems, though: a lot of what they do to add value before publication is unseen and therefore unappreciated, and what they do after publication to steward content isn’t appreciated either. Journals build trust through the largely unseen processes that occur between submission and publication. This is an overview of the editorial and production process from an article describing Open Journal Systems; it captures what goes into producing a scholarly journal and demonstrates the value that publishers add, but to a large extent this whole process is invisible to readers.
Open Journal Systems: An example of open source software for journal management and publishing, J Willinsky. Library Hi Tech. 2005, Vol 23, Issue 4, p 504 doi: /

Certification of the version of record
The perception that the publishing process is completed when the “final version” of an article is published is deadly for publishers. There is no “final” version of content. There is a fixed version of record that is certified by the publisher, but the publisher also plays a crucial role in the ongoing stewardship of content: it is the publisher who issues updates, corrections, retractions and withdrawals.
Certification of the version of record
Ongoing stewardship of scholarly content

Version of record
Scholarly Publishing Roundtable (US House Committee on Science and Technology/White House Office of Science and Technology Policy):
To the fullest extent possible, access should be to the definitive version of journal articles — the version of record (VoR) produced and stewarded by the publisher.

Versions & Citation
When is something a new version? When does something get a new identifier?
- Focus on citation: if a change will alter the interpretation of a work, it gets a new identifier
- Must keep older versions: users should get to what was cited
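The rule on this slide can be sketched as a small Python class, assuming a hypothetical identifier scheme that appends a version number to a base ID. Whether a change alters the interpretation of a work is an editorial judgement, so here it is simply passed in as a flag. Every identifier ever issued stays resolvable, so a citation leads back to the version that was cited.

```python
class VersionedWork:
    """Mints a new identifier only for changes that alter interpretation."""

    def __init__(self, base_id: str, content: str) -> None:
        self.base_id = base_id              # hypothetical base identifier
        self.versions: dict[str, str] = {}  # every identifier ever issued -> content
        self._next_version = 1
        self.current_id = self._mint(content)

    def _mint(self, content: str) -> str:
        identifier = f"{self.base_id}.v{self._next_version}"
        self._next_version += 1
        self.versions[identifier] = content
        return identifier

    def revise(self, content: str, changes_interpretation: bool) -> str:
        if changes_interpretation:
            # Substantive change: new identifier; the old version is kept, not overwritten.
            self.current_id = self._mint(content)
        else:
            # Minor fix (e.g. a typo): same identifier, updated in place.
            self.versions[self.current_id] = content
        return self.current_id

    def resolve(self, identifier: str) -> str:
        # A citation to any issued identifier still reaches that version.
        return self.versions[identifier]


work = VersionedWork("dataset-001", "original measurements")
cited = work.current_id
work.revise("original measurements, typo fixed", changes_interpretation=False)
latest = work.revise("recalibrated measurements", changes_interpretation=True)
print(cited, latest)        # prints dataset-001.v1 dataset-001.v2
print(work.resolve(cited))  # prints original measurements, typo fixed
```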

"together we can create a reality that we all agree on — the reality we just agreed on…any user can change any entry, and if enough users agree with them, it becomes true."

Sir Tim told BBC News that there needed to be new systems that would give websites a label for trustworthiness once they had been proved reliable sources…So I'd be interested in different organisations labeling websites in different ways.

Industry Problems
- The scholarly pre-publication process is largely invisible
- The common belief that the publisher’s job is done on publication of the “final” version
- A proliferation of versions of content online that are not stewarded
- Trust metrics have not been established on the web
So to summarize some of the issues behind the CrossMark project, we have the fact that ....

CrossMark: a logo identifying a publisher-certified version of record
Clicking the logo tells you:
- If the copy is publisher-maintained and if there have been corrections
- Where the publisher-maintained version is
- Other metadata the publisher chooses to include
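The questions the logo answers can be sketched as a lookup against a status record held by the publisher. The data model, field names, identifiers and URLs below are hypothetical illustrations of the three points on this slide, not CrossMark’s actual metadata schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    kind: str       # e.g. "correction", "retraction" or "withdrawal"
    notice_id: str  # identifier of the published update notice (hypothetical)


@dataclass
class StatusRecord:
    publisher_url: str                                   # the publisher-maintained copy
    updates: list[Update] = field(default_factory=list)  # issued after publication
    other_metadata: dict[str, str] = field(default_factory=dict)


def check_copy(copy_url: str, record: StatusRecord) -> dict:
    """Report what a reader holding `copy_url` would want to know."""
    return {
        "publisher_maintained": copy_url == record.publisher_url,
        "publisher_version": record.publisher_url,
        "updates": [u.kind for u in record.updates],
        "other_metadata": record.other_metadata,
    }


record = StatusRecord(
    publisher_url="https://publisher.example.org/article/7",
    updates=[Update("correction", "10.9999/erratum.7")],
    other_metadata={"peer_review": "single-blind"},  # hypothetical extra field
)
status = check_copy("https://mirror.example.net/copy-of-7.pdf", record)
print(status["publisher_maintained"], status["updates"])  # prints False ['correction']
```

Because the status record lives with the publisher rather than inside any copy, a correction issued later is reported for every copy a reader might hold.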

Enables researchers to:
- Easily determine if they are looking at a publisher-maintained version of record and, if not, follow a link to the publisher version
- Easily ascertain the current status of the document and whether there have been updates
- Easily access and use any non-bibliographic metadata the publisher has provided

Enables publishers to:
- Identify the publisher-maintained version of record
- Emphasize initial certification of the version of record AND ongoing stewardship
- Highlight and disseminate corrections in an industry-standard way
- Highlight other (often invisible) steps taken to ensure the trustworthiness of the content

Things to think about
- If it’s not online, it doesn’t exist
- If it’s not linked, it doesn’t exist
- The identifier is only one small piece of the puzzle
- Any ID must be unique, persistent and discoverable
- Sustainable infrastructure: technical and social

Articles vs data
- CrossRef builds on existing citation practices established over 350 years
- The reward system is firmly established for articles and article citation
- Not the case for data: the social aspects are much harder than the technical ones
- Collaboration is critical to interlink data and articles
- Data is different: publishers don’t want it!

Conclusion
- Identifiers are tools to enable services and are useless without metadata
- Editorial selection and citation practices are critical
- More work is needed to establish trust metrics online
- Journals must establish data policies requiring deposit in appropriate repositories

Data publishers must...
- Establish trust through editorial and production processes
- Certify and steward versions of record for citation purposes (and keep old versions!) so researchers get credit
- Create a system of persistent, actionable IDs and authoritative metadata
- Develop a community and services

Curing something can lead to very tasty results!

Ed Pentz

A proliferation of versions of content with very little information for readers about what’s what.
