Persistent identifiers: the 7 levels of identification Juha Hakala Helsinki University Library ELAG 2005 1-3 June 2005, CERN.

Presentation transcript:

Persistent identifiers: the 7 levels of identification Juha Hakala Helsinki University Library ELAG June 2005, CERN

Persistence?
- Persistence does not depend on the identifier itself, but on the legal, organisational and technical infrastructure
  - ISSN would collapse without the ISSN standard, a community using it according to generally accepted principles, the ISSN International Centre governing the system, and the ISSN database linking the non-semantic (that is, dumb) identifiers to serials
- Even a technically brilliant system may be discontinued if its mission breaks apart

”Normal” identifiers and resolution services
- Resolution services are a new breed of identifiers which make traditional identifier systems actionable in the Internet (Web) environment
  - Resolve: provide a link from a reference to the resource
- Prime examples: DOI and URN
  - Both may encompass, at least in principle, any existing identifier (URN namespaces have been defined for e.g. ISSN and ISBN)
  - Both are useless without an existing identifier adding flesh to the DOI/URN bones
- From now on, only ”normal” identifiers will be discussed
  - Complex enough topic for 35 minutes…

Seven levels of identifiers
- After the collapse of the integrated library system paradigm and the implementation of IR portals, digital asset management systems, digital archives and e-resource management systems, what do we need to identify?
  - This can be analysed from top to bottom, from organisations to search attributes
  - Such an analysis may reveal gaps and help in the design of identifier systems

Top level: libraries
- The identifier system must cover at least other (memory) organisations as well
- National-level identifiers (union catalogue codes) exist; the Internet / Web made it necessary to develop an international system
- ISIL, International Standard Identifier for Libraries and Related Organisations (ISO standard)
  - Consists of ISO country code, hyphen and UC (union catalogue) code
  - FI-H (Helsinki University Library)
- The Danish Library Authority hosts the ISIL IC; national centres have been established in some countries, but the system needs wider acceptance
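The ISIL structure described on this slide (country code, hyphen, union catalogue code) can be illustrated with a minimal Python sketch. The names `Isil` and `parse_isil` are hypothetical, and the validation is an assumption that checks only the three-part shape given above, not the full rules of the standard.

```python
from typing import NamedTuple


class Isil(NamedTuple):
    """Hypothetical container for the two meaningful parts of an ISIL."""
    country: str   # ISO country code, e.g. "FI"
    uc_code: str   # national union catalogue (UC) code, e.g. "H"


def parse_isil(value: str) -> Isil:
    """Split an ISIL at its first hyphen into country code and UC code.

    Assumption: the country code is two upper-case letters and the UC code
    is any non-empty string; real registries may impose further rules.
    """
    country, sep, uc_code = value.strip().partition("-")
    if not sep:
        raise ValueError(f"missing hyphen in ISIL: {value!r}")
    if len(country) != 2 or not country.isalpha() or not country.isupper():
        raise ValueError(f"unexpected country code in ISIL: {value!r}")
    if not uc_code:
        raise ValueError(f"missing union catalogue code in ISIL: {value!r}")
    return Isil(country, uc_code)


def format_isil(isil: Isil) -> str:
    """Rebuild the printed form: country code, hyphen, UC code."""
    return f"{isil.country}-{isil.uc_code}"


if __name__ == "__main__":
    helsinki = parse_isil("FI-H")   # example from the slide
    print(helsinki.country, helsinki.uc_code)
    print(format_isil(helsinki))
```

Because the national code is reused unchanged, an existing union catalogue registry could in principle act as the national ISIL registry.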

2nd level: collections and services
- These identifiers are important for IR portals; international exchange of collection and service (e.g. a Z39.50 server) metadata is cumbersome unless there is an efficient means of duplicate control
- These identifiers do not exist yet
  - Helsinki University Library is writing a New Work Item proposal for ISO TC 46 on ISCI, the International Standard Collection Identifier
  - There are no ongoing efforts to develop a service ID

ISCI: design principles
- Will be based on ISIL in order to allow efficient decentralization of ISCI assignment and the creation of an Internet-wide resolution service without a global ISCI DB
- Will consist of three parts: ISIL, a delimiting character (colon) and the actual (colon-less) collection identifier
  - FI-H:Slavica (Slavic collection in HUL)
- Need for an international support center?
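The proposed three-part ISCI syntax can be sketched in Python as follows. ISCI was only a proposal at the time of the talk, so the functions `parse_isci` and `build_isci` are hypothetical and implement nothing beyond the rule stated above: ISIL, colon, colon-less collection identifier.

```python
from typing import Tuple


def parse_isci(value: str) -> Tuple[str, str]:
    """Split a proposed ISCI into (ISIL, collection identifier).

    The split is made at the last colon, which is safe because the
    collection identifier is required to be colon-less.
    """
    isil, sep, collection = value.strip().rpartition(":")
    if not sep or not isil or not collection:
        raise ValueError(f"expected '<ISIL>:<collection>', got {value!r}")
    return isil, collection


def build_isci(isil: str, collection: str) -> str:
    """Join an ISIL and a locally assigned collection identifier."""
    if not isil or not collection:
        raise ValueError("both ISIL and collection identifier are required")
    if ":" in collection:
        raise ValueError("the collection identifier must not contain a colon")
    return f"{isil}:{collection}"


if __name__ == "__main__":
    isil, collection = parse_isci("FI-H:Slavica")   # example from the slide
    print(isil, collection)
    print(build_isci("FI-H", "Slavica"))
```

Since the ISIL prefix names the owning organisation, a resolver could route a lookup to that organisation directly, which is how the design avoids a global ISCI database.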

3rd level: authors
- International exchange of authority records can be made more efficient with persistent and unique identification
- ISADN, International Standard Authority Data Number, has been discussed for quite a few years, but it is not yet formally under development
- Retrospective assignment may create interesting ”ownership” problems, especially if the future ISADN contains country of origin
  - Is Franz Liszt German or Hungarian?

4th level: identifiers for works
- ISWC: International Standard Musical Work Code
  - Letter T, 9-digit unique number and check digit
- ISAN: International Standard Audiovisual Number
  - ISAN 006A-15FA-002B-C95F-A
    - 12-digit root segment + 4-digit segment for episode identification and check digit
- ISTC: International Standard Text Code
  - ISTC OA B4A105 6
    - agency code, year, work element & check digit
- These systems were developed at the same time, but their syntax and terminology vary
  - This should not complicate usage too much
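As an illustration of the ISAN layout given above (12-digit root, 4-digit episode segment, check character), the following Python sketch splits the example from the slide into its parts. The function name `split_isan` is hypothetical, the digits are assumed to be hexadecimal as in the example, and the check character is only extracted, not verified.

```python
from typing import NamedTuple

HEX_DIGITS = set("0123456789ABCDEF")


class IsanParts(NamedTuple):
    """Hypothetical container for the segments named on the slide."""
    root: str      # 12 digits identifying the work
    episode: str   # 4 digits identifying the episode or part
    check: str     # single check character (not verified here)


def split_isan(text: str) -> IsanParts:
    """Split a printed ISAN into root, episode and check character."""
    body = text.strip()
    if body.upper().startswith("ISAN"):
        body = body[4:]                      # drop the "ISAN" label
    compact = body.replace("-", "").replace(" ", "").upper()
    if len(compact) != 17:
        raise ValueError(f"expected 16 digits plus a check character: {text!r}")
    digits, check = compact[:16], compact[16]
    if not set(digits) <= HEX_DIGITS or not check.isalnum():
        raise ValueError(f"unexpected characters in ISAN: {text!r}")
    return IsanParts(root=digits[:12], episode=digits[12:], check=check)


if __name__ == "__main__":
    parts = split_isan("ISAN 006A-15FA-002B-C95F-A")   # example from the slide
    print(parts.root, parts.episode, parts.check)
```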

ISTC/ISWC/ISAN issues
- Many library system vendors are investigating the possibility of implementing FRBR, but few have been capable of doing it (VTLS, OCLC)
- Once an ILMS is FRBRized, implementing work identifiers is essential, but there is more than technology to consider here:
  - Do we need to pay for these identifiers, even when retrospectively generating them for old works?
  - Who will establish the national centers and create the identifiers (and the work-level records they require)?

5th level: manifestations
- This used to be familiar terrain for us
  - ISBN, ISSN, NBN belong here
- E-publishing has destroyed the old status quo:
  - Systems that worked well for decades have adaptation problems for different reasons
  - It is not yet entirely clear if the revisions done (or planned) are sufficient

E-problems with manifestations
- It is increasingly difficult to define valid ”targets”
  - ISSN could be assigned to any Web site out there
  - Publishers want to give ISBNs to anything that can in principle be sold separately (e-book chapters, images within a book, teddy bears on sale in book stores)
- The number of things to be identified is growing fast; this will cause syntax problems (ISBN revision was done to make more room) and staff issues in ISSN/ISBN national centers
  - There is no point in giving a persistent identifier to a non-persistent resource; therefore resources must be identified, described & archived, which is a labour-intensive process

Case ISBN
- The old ISBN was running out of number space
- Several extension options were discussed:
  - 13, 16, even 32-digit ISBNs
  - The idea to make ISBN a ”dumb” number such as ISSN was voted down (for this the librarians in the WG are to blame)
- The new ISBN will be compliant with the EAN system
  - 13 digits, starting with 978, 979 or in the future with something else to extend the scope of the system further
  - New check digit calculation algorithm adopted from EAN
  - It is possible to convert from an old ISBN to the new (starting with 978) and back
- Publishers retroconvert to new ISBNs; libraries will keep the old ones
  - ILMS need to do sophisticated things with old/new ISBNs
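The conversion between old and new ISBNs mentioned above can be sketched in Python. The function names are hypothetical and the ISBN used in the usage lines is an illustrative value that does not come from the talk. The old ISBN uses a modulus-11 check digit, the new one the EAN modulus-10 check digit with alternating weights 1 and 3.

```python
def _clean(isbn: str) -> str:
    """Strip hyphens and spaces and upper-case a possible 'X' check digit."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_check_digit(first9: str) -> str:
    """Modulus-11 check digit of the old ISBN (weights 10 down to 2)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    remainder = (11 - total % 11) % 11
    return "X" if remainder == 10 else str(remainder)


def ean13_check_digit(first12: str) -> str:
    """EAN check digit: weights 1 and 3 alternate, result modulus 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix the old ISBN with 978 and recompute the check digit."""
    core = _clean(isbn10)
    if len(core) != 10 or not core[:9].isdigit():
        raise ValueError(f"not a 10-character ISBN: {isbn10!r}")
    if isbn10_check_digit(core[:9]) != core[9]:
        raise ValueError(f"bad check digit in old ISBN: {isbn10!r}")
    body = "978" + core[:9]
    return body + ean13_check_digit(body)


def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert back; only possible for new ISBNs that start with 978."""
    core = _clean(isbn13)
    if len(core) != 13 or not core.isdigit():
        raise ValueError(f"not a 13-digit ISBN: {isbn13!r}")
    if not core.startswith("978"):
        raise ValueError("only 978-prefixed ISBNs have an old-style equivalent")
    if ean13_check_digit(core[:12]) != core[12]:
        raise ValueError(f"bad check digit in new ISBN: {isbn13!r}")
    body = core[3:12]
    return body + isbn10_check_digit(body)


if __name__ == "__main__":
    new = isbn10_to_isbn13("0-306-40615-2")   # illustrative ISBN, not from the talk
    print(new)
    print(isbn13_to_isbn10(new))
```

One simple way for an ILMS to cope with mixed old and new ISBNs would be to normalise both to the 13-digit form before matching; numbers starting with 979 have no 10-digit equivalent, so the reverse conversion has to be allowed to fail.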

6th level: component parts
- Libraries have not done too well in this area in the past due to staff limitations
  - We catalogue serials but not the articles
- E-publishing may force us to change tactics, since now even component parts are separate items that are directly accessible
- Manual processing must be partially or fully replaced by automated processes; this will also have an impact on identifiers
  - Automated ID generation solves the staff bottleneck

SICI: still alive, but not kicking
- Serial Item and Component Identifier
  - NISO standard; has never really taken off
  - Can be generated programmatically provided that the article is structured enough
    - (199502/03)21:3<>1.0.TX;2-Y
  - Complex; consists of the ISSN plus elements identifying the issue and the article within it
  - Publishers have their own systems like PII which have been easier to create and maintain (for them)
  - Still not clear how popular SICI will eventually be

BICI: Dead On Arrival, or conflict between theory and practice
- Book Item and Contribution Identifier
- NISO draft standard, never completed
- Consists of the ISBN plus extra elements to identify the relevant section within the book; may be automatically generated
- Publishers & book stores prefer to rely solely on ISBN in their systems
  - Using ISBN only is not a neat solution (it uses up a lot of ISBNs, and giving an ISBN both to the thing as a whole and to its component parts is messy)

7th level: search attributes etc.
- Within Z39.50, sets (e.g. attribute and diagnostic), record syntaxes etc. are identified by ISO Object Identifiers
  - MARC21:
  - Bib-1: ; term examples:
    - Author:
    - Name:
    - Author-name personal:
    - Personal name:

OID problems
- The Bib-1 attribute set is not quite as coherent as it should be: lots of (domestic) search attributes are missing from it, and sometimes there are too many alternatives
- The attempt to develop Bib-2 failed, and even if we succeed in the future, co-existence of Bib-1 and Bib-2 may cause trouble
- ISO OIDs can be applied to anything
  - Not clear how to use them in a ”bibliographic context” to e.g. identify government publications or parts of them; this is currently being investigated in Finland

Conclusion
- E-publishing and new applications (and their novel metadata) have expanded both the scope of identifiers needed and the requirements placed on existing systems, especially at the manifestation & component part levels
- Standards developers have reacted to these needs, but progress has been slow; still, in some areas system builders have been even slower

Conclusion (2)
- An identifier is more than just a string of characters
  - There must be an agent which assigns the identifier to a resource, and (usually) describes it
- As long as all parts in this picture are stable, identification is a routine process
- Agent breakdowns have been the most common reason for problems in the past
  - A number of national ISSN agencies are inactive
- E-resources have destroyed the balance, and it may take a while before the identification system works again in ”business as usual” style