rfc2141bis, rfc3406bis and the ISBN + NBN namespaces
IETF 83, Paris, France
Juha Hakala, The National Library of Finland

The need for modernization
RFC 2141 was adopted in 1997
– It is based on the original specification of URLs (RFC 1808) and therefore does not use <query> and <fragment>
– Other PID systems (Handle, ARK) are similar in this respect
RFC 3406 does not conform to RFC 5226 (the IANA procedures document), and the revision of RFC 2141 will have an impact on namespace definition procedures as well

Concerning namespace registrations
The changes made to RFC 2141 and RFC 3406 so far do not necessitate re-registration of existing namespaces
– We do have to revise RFCs 3044 and 3187 (ISSN and ISBN), because the identifier standards have changed substantially
– The revision of RFC 3188 (NBN), however, was started because the national libraries want to use all the functionality the new syntax will offer

URN syntax (2141bis, version 02)
Conforms fully to RFC 3986
Adding <fragment> support was non-trivial, since there were many things to consider:
– RFC 3986 requirements and the way in which Web browsers use fragments (they do not pass fragments to the server, but use them "internally" to identify positions within retrieved documents)
– Varying practices in different namespaces
The outcome was a multi-tiered solution in which RFC 3986 is always followed, but namespaces may have their own internal solutions for fragment identification

URN syntax (2)
The role of <query> is restricted to indicating the requested URN resolution service and (possibly) parameters of that service
– For instance, retrieve descriptive metadata about the resource in a particular format such as Dublin Core or MARC (used by libraries)
The character set has been adjusted slightly (to align the text with RFC 3986)
Namespace identifier (NID) syntax was discussed in more detail, but the issue is now settled: we will rely on the common sense of the IANA experts and of the people writing namespace registration requests

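The division of labour between the query and the fragment described on the two syntax slides can be illustrated with a minimal sketch. It only applies the generic RFC 3986 splitting rules and is not taken from the draft: the function name `parse_urn`, the `example` namespace and the query string `s=I2C` are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class ParsedURN(NamedTuple):
    nid: str                 # namespace identifier, e.g. "nbn"
    nss: str                 # namespace specific string
    query: Optional[str]     # would carry the resolution service request
    fragment: Optional[str]  # kept by the client, never sent to a resolver


def parse_urn(urn: str) -> ParsedURN:
    """Split a URN using the generic RFC 3986 component order."""
    # The fragment starts at the first "#"; the query at the first "?" before it.
    rest, has_fragment, fragment = urn.partition("#")
    rest, has_query, query = rest.partition("?")
    # Pad so that a too-short input fails the check below instead of unpacking.
    scheme, nid, nss = (rest.split(":", 2) + ["", ""])[:3]
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    return ParsedURN(
        nid.lower(),
        nss,
        query if has_query else None,
        fragment if has_fragment else None,
    )


# Hypothetical URN: the query form "s=I2C" is invented for this example.
parsed = parse_urn("urn:example:a123?s=I2C#sec2")
# What a browser-like client would send to the resolver (no fragment):
to_resolver = f"urn:{parsed.nid}:{parsed.nss}?{parsed.query}"
```

In this sketch the fragment `sec2` stays on the client side, which mirrors the browser behaviour noted above.
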
Remaining issues
In general, version 02 of rfc2141bis is a mature document
Since the draft builds upon RFC 2141 and RFC 3986, there were few open issues to start with, and nothing that would have been highly controversial, politically or technically
Practical experience from using URNs (tens of millions have been assigned) has not revealed any design flaws in the syntax

Remaining issues (2)
In order to prepare the draft for publication, we may want to:
– Align the statements concerning the URN scope in different parts of the document. The Introduction says that URN does not have a specific scope, since its scope is the sum of the scopes of the namespaces; 7.1 claims that URNs serve as resource identifiers for concrete and abstract objects that have network accessible instances and/or metadata
– Use the term "resource" when referring to what is being identified (instead of object, document, artefact etc.)

Remaining issues (3)
Functional equivalence
– Not properly specified in 2141bis; options:
  – Two URNs within the same namespace resolve to the same instance of a resource; this should not happen
  – Two URNs within the same namespace resolve to different instances of a resource; this is OK in some namespaces (but not in all of them; see e.g. rfc3187bis and rfc3188bis)
  – Two URNs from different namespaces resolve to the same or different instances of a resource; this is OK
  – Two URNs resolve to the same resource at different levels (work, manifestation, fragment of a manifestation); this is OK
– Existing namespace registrations do not discuss functional equivalence; in most namespaces this is not necessary since e.g. two URN:ISBNs should not be functionally equivalent (however, rfc3188bis will discuss this)

rfc3406bis
The aim is to outline a mechanism and provide a template for URN namespace definition
There are 40+ URN namespaces; the level of use and control of use varies a lot
– Tens of millions of URN:NBNs have been assigned, making it the most popular bibliographic identifier ever; some other namespaces are "dead"
– Standard-based namespaces are strictly controlled as regards identifier assignment; there is virtually no control in some other namespaces such as URN:UUID

URN namespace definition mechanisms, version 02
Takes into account both the new features in rfc2141bis and the experience gained so far from the namespace registration processes
There have been no difficult issues, but the fact that RFC 2483 is out of date does have an impact on rfc3406bis as well
– There is a need to specify which services must / should be supported in a namespace; it is hard to do this when some services are missing or lack essential functionality

Remaining issues
Like rfc2141bis, rfc3406bis is much more detailed than the RFC it is based on, due to the understanding gained since the URN system was established
Apart from the problems related to service specification, there are few open issues to discuss (as reflected by the lack of discussion on the URN-WG list)
IMHO the most vital issue is a practical one: how can we make sure that the IANA experts approve only those namespace registrations that deserve it, and how can rfc3406bis support their work?
– A badly managed namespace undermines the value of the URN system as a whole
– Overlap between namespaces is inevitable, but should be avoided if and when possible

rfc3188bis: general
National Bibliography Number (NBN) is not a single standard identifier but a set of identifier systems used (primarily) by national libraries, following local practices and needs
NBNs used to be local identifiers, but using them as URNs makes them globally unique and actionable on the Internet
The namespace has been in production use for over a decade; tens of millions of identifiers have been assigned in several countries, primarily in Europe
– Digitized content, harvested Web documents, e-deposit; generally materials that a) do not qualify for a true standard identifier, and b) are preserved long-term

NBN syntax & semantics
Every NBN string has some embedded meaning
URN:NBN consists of
– ISO two-letter country code
  URN:NBN:FI = Finland
– Sub-division element (optional); the National Library must maintain a registry of these
  URN:NBN:FI:STAT = Statistics Finland
– Publication element
  Beyond the requirements of the URI/URN syntax specifications, there are no additional requirements for this element

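The three elements listed above can be pulled apart with a short sketch. The slide does not show how the publication element is delimited; the hyphen used here follows the layout I understand RFC 3188 to use and should be read as an assumption, as should the function name `parse_nbn` and the made-up identifier `stat-12345`.

```python
import re
from typing import NamedTuple, Optional


class NBN(NamedTuple):
    country: str                # ISO two-letter country code
    subdivision: Optional[str]  # optional sub-division element
    publication: str            # publication element, opaque to this parser


# Assumed layout: urn:nbn:<country>[:<sub-division>]-<publication element>
_NBN_RE = re.compile(
    r"^urn:nbn:(?P<country>[a-z]{2})"
    r"(?::(?P<sub>[a-z0-9]+))?"
    r"-(?P<pub>.+)$",
    re.IGNORECASE,
)


def parse_nbn(urn: str) -> NBN:
    """Split a URN:NBN into country, sub-division and publication element."""
    match = _NBN_RE.match(urn)
    if match is None:
        raise ValueError(f"not a URN:NBN in the assumed layout: {urn!r}")
    sub = match.group("sub")
    return NBN(
        match.group("country").upper(),
        sub.upper() if sub else None,
        match.group("pub"),
    )


# Both identifiers below are illustrative only.
print(parse_nbn("urn:nbn:fi:stat-12345"))
print(parse_nbn("URN:NBN:fi-fe19981001"))
```

The publication element is deliberately left unparsed, since the namespace places no additional requirements on it.
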
URN:NBN and fragments
An NBN can be used to identify a fragment of a publication (section, chapter)
– There will not be a namespace-specific internal method for fragment identification; instead:
  – Physical fragments may be identified using the RFC 3986 procedure; this will produce standard browser functionality (the entire resource is retrieved)
  – Logical fragments may be identified by "normal" NBNs; in this case the result (e.g. a journal article) may not be a physical fragment but a complete file
  – Logical fragments may also be identified by a local fragment syntax (to be recognized by the relevant resolvers)

rfc3188bis: status and plans
Under development since 2010, first as a private contribution, then as a WG deliverable
The text is mature as regards the syntax, but scope and functional equivalence could / should be discussed in more detail
– If two national libraries harvest the same resource into their web archives, they may assign different URN:NBNs to it
– This is not a problem, since these URNs will resolve to different physical copies of the resource

rfc3187bis: about ISBN
An ISO standard, established in the early 1970s
Persistent and unique identifier for books
Each manifestation (hard cover, soft cover, PDF, ePUB) gets its own ISBN
In theory the system has spread almost everywhere; in practice, there are a lot of countries where ISBN assignment is not working (properly / at all)
There are two variants: ISBN-10 (up to 2006) and ISBN-13, specified in 2005 and used since 2007
Examples (ISBN-13) (ISBN-10)
The syntactic differences are the "978" or "979" prefix at the beginning and the check digit calculation algorithm, which in ISBN-13 is EAN-compliant

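The two check digit algorithms mentioned above can be sketched as follows. The arithmetic is the standard ISBN-10 (modulus 11) and ISBN-13 / EAN (modulus 10) calculation; the function names are hypothetical, and the ISBN used is a common textbook example rather than one taken from the slide.

```python
def isbn10_check_digit(first9: str) -> str:
    """Modulus 11 check character for the first nine digits of an ISBN-10."""
    if len(first9) != 9 or not first9.isdigit():
        raise ValueError("expected nine digits")
    # Weights run from 10 down to 2.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn13_check_digit(first12: str) -> str:
    """EAN (modulus 10) check digit for the first twelve digits of an ISBN-13."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected twelve digits")
    # Weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Prefix "978", drop the old check character, compute the new check digit."""
    digits = isbn10.replace("-", "")
    body = "978" + digits[:9]
    return body + isbn13_check_digit(body)


print(isbn10_check_digit("030640615"))    # 2
print(isbn13_check_digit("978030640615"))  # 7
print(isbn10_to_isbn13("0-306-40615-2"))   # 9780306406157
```

The conversion shows why the two variants cannot be compared as plain strings: the same book carries different check digits in its ISBN-10 and ISBN-13 forms.
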
Resolution of URN:ISBNs
ISBN is a "semantic" (non-dumb) identifier:
– 978 = Prefix element (EAN "book land" code; also 979)
– 0 = Registration group element (for English language; also 978-1)
– 395 = Registrant element (Publisher ID)
– = Publication element
– 6 = Check digit
There is no single point where all ISBNs could be resolved (note the difference from ISSN), so a URN:ISBN must contain a hint about where to find a resolver
This hint is the registration group element; in some cases it provides a good hint (951 = Finland), but occasionally it is less useful (3 = Germany, Austria and German-speaking parts of Switzerland)

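How the registration group element could serve as a resolver hint can be sketched as below. This is only an illustration under several assumptions: the input is a hyphenated ISBN-13 (an unhyphenated one would need the ISBN range tables to be split), the resolver URLs are invented placeholders, and the names `split_isbn13` and `resolver_candidates` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class ISBNElements(NamedTuple):
    prefix: str       # "978" or "979"
    group: str        # registration group element
    registrant: str   # registrant element (publisher ID)
    publication: str  # publication element
    check: str        # check digit


# Invented resolver addresses keyed by (prefix, registration group).
RESOLVER_HINTS: Dict[Tuple[str, str], List[str]] = {
    # 951 = Finland: the group points to a single country.
    ("978", "951"): ["https://fi.resolver.example.org/"],
    # 3 = German language area: the hint is ambiguous.
    ("978", "3"): [
        "https://de.resolver.example.org/",
        "https://at.resolver.example.org/",
        "https://ch.resolver.example.org/",
    ],
}


def split_isbn13(isbn: str) -> ISBNElements:
    """Split a hyphenated ISBN-13 into its five elements."""
    parts = isbn.split("-")
    if len(parts) != 5:
        raise ValueError("expected a hyphenated ISBN-13 with five elements")
    return ISBNElements(*parts)


def resolver_candidates(urn: str) -> List[str]:
    """Return the resolvers suggested by the registration group, if any."""
    pieces = urn.split(":", 2)
    if len(pieces) != 3 or (pieces[0].lower(), pieces[1].lower()) != ("urn", "isbn"):
        raise ValueError(f"not a URN:ISBN: {urn!r}")
    elements = split_isbn13(pieces[2])
    return RESOLVER_HINTS.get((elements.prefix, elements.group), [])


# 978-951-1-23456-2 is a made-up identifier with a valid check digit.
print(resolver_candidates("urn:isbn:978-951-1-23456-2"))
print(resolver_candidates("urn:isbn:978-3-16-148410-0"))
```

A client receiving several candidates, as for group 3, would have to try them in turn or consult additional metadata, which is the weakness of the hint noted on the slide.
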
rfc3187bis version 02
The current draft is (relatively) mature
Namespace registration request has been extended so that it takes into account both ISBN-10 and ISBN-13
Fragment usage has been specified
– Complete ISBNs can be assigned to logical fragments of a book, but it is not possible to add anything to the identifier string to indicate a fragment, either in the spirit of RFC 3986 or otherwise

rfc3187bis: status and plans
Include discussion on functional equivalence
– Two different ISBNs should never resolve to the same thing (e.g. a manifestation of a book)
– Two ISBNs may resolve to different manifestations of the same work (and be interconnected via the work level metadata)
– Two ISBNs may resolve to the same manifestation of a book on different levels (an entire book / a single chapter within the book)

rfc3187bis: status and plans (2)
Indicate which resolution services are necessary in the URN namespace
– For instance: retrieve descriptive / administrative metadata; fetch the resource or a list of locations; retrieve metadata about the work and related manifestations of the work
Polish the language
– Make sure that only the terms "resource" or "book" are used
– Remove remaining occurrences of RFC 2119 terms not written in capital letters, so as to avoid confusion