OCLC Online Computer Library Center
Erpanet Symposium on Persistent Identifiers
PURLs
Stuart Weibel, Senior Research Scientist
June 17, 2004
What do we want from Identifiers?
  Authority
  Reliability
  Appropriate functionality (resolution and other services)
  Persistence throughout the life cycle of the information object
What are the business models to support identifiers?
  Not just a matter of money, but costs are part of the equation
PURL: Persistent Uniform Resource Locator
  PURLs look like URLs… they ARE URLs
  PURLs emerged from OCLC’s participation in the IETF URN activity
  A tool for managing names and namespaces since 1996
The 404 Problem
  Resources disappear…
    Some are actually gone
    Disk reorganizations take place
    Changes in responsibility for resources occur… bought, sold, abandoned, removed
  URLs serve double duty as names and locators
  Making URLs symbolic names will improve their usefulness
What is a PURL?
  PURL: Persistent Uniform Resource Locator
  They look like URLs… they ARE URLs
  No new technology, no new protocols
  A toolset for managing names and namespaces
How PURLs Work
  PURLs take advantage of the inherent redirection facility in the HTTP protocol
  PURLs provide an additional level of indirection that maps a symbolic identifier to a network location
  PURLs work without plug-ins or other special code in browsers… they are ‘just’ URLs
  No new technology added… a feature, not a bug
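The indirection described above can be sketched as a lookup table plus an HTTP redirect. This is a minimal illustration, not the PURL server's actual code: the table contents, the `example.org` target, and the `resolve` function name are all assumptions made for the example, and the choice of status 302 is one plausible redirect code rather than something the slides specify.

```python
# Hypothetical PURL table: symbolic name (the URL path) -> current network location.
PURL_TABLE = {
    "/net/example-report": "http://www.example.org/2004/reports/final.html",
}


def resolve(path, table=PURL_TABLE):
    """Return (status, headers) for a GET on a PURL path.

    A known name yields an HTTP redirect to its current location;
    an unknown name yields 404.
    """
    target = table.get(path)
    if target is None:
        return 404, {}
    return 302, {"Location": target}


# When the resource moves, only the table entry changes; the PURL itself does not.
PURL_TABLE["/net/example-report"] = "http://archive.example.org/reports/final.html"
status, headers = resolve("/net/example-report")
```

Because the reply is an ordinary HTTP redirect, any browser can follow it without extra software, which is the point of the "no new technology" bullet.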
What does Persistent mean?
  Not a guarantee of perpetual access
  Not a magic solution to the 404 problem
  Not persistence of resources, but rather of the names
  PURLs are a toolset that can be used to manage resource names and locations with greater reliability
Persistence derives from…
  The social or contractual commitments of organizations responsible for managing information resources.
  Technology can help, but the problem is, at its heart, a social one.
Logical Components of a PURL
  protocol
  resolver address
  asset name
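As a rough illustration of these components, a PURL can be split with a standard URL parser, since it is an ordinary URL. The path `/net/example-report` and the function name `purl_components` are hypothetical, chosen only for this sketch.

```python
from urllib.parse import urlsplit


def purl_components(purl):
    """Split a PURL into its logical parts using ordinary URL parsing."""
    parts = urlsplit(purl)
    return {
        "protocol": parts.scheme,   # how the resolver is contacted
        "resolver": parts.netloc,   # address of the PURL server
        "name": parts.path,         # symbolic name of the asset
    }


components = purl_components("http://purl.org/net/example-report")
```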
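PURL Server as a Redirection Server
  [Diagram: Client, PURL Server, Resource Server]
  Client to PURL Server: http GET
  PURL Server to Client: http redirect
  Client to Resource Server: http GET
  Resource Server to Client: resource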
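The exchange in this diagram can be reproduced on a single machine with two small HTTP servers from the Python standard library. This is a sketch under assumptions: the handler classes, the `/net/report` name, the `/docs/report.txt` location, and the 302 status code are invented for the demonstration and do not describe the real PURL server software.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


class ResourceHandler(BaseHTTPRequestHandler):
    """Stand-in for the resource server that actually holds the document."""

    def do_GET(self):
        body = b"the resource"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def make_purl_handler(table):
    """Build a handler that redirects each known PURL path to its target."""

    class PurlHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = table.get(self.path)
            if target is None:
                self.send_error(404)
                return
            self.send_response(302)
            self.send_header("Location", target)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format, *args):
            pass

    return PurlHandler


def start(handler):
    """Serve on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def http_get(url):
    """Issue one GET without following redirects; return (status, Location, body)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=5)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location"), response.read()
    finally:
        conn.close()


def demo():
    """Run the three-message exchange and return both responses plus the target."""
    resource_server = start(ResourceHandler)
    target = f"http://127.0.0.1:{resource_server.server_port}/docs/report.txt"
    purl_server = start(make_purl_handler({"/net/report": target}))
    purl = f"http://127.0.0.1:{purl_server.server_port}/net/report"
    try:
        first = http_get(purl)       # GET to the PURL server, answered by a redirect
        second = http_get(first[1])  # GET to the resource server, answered by the resource
        return first, second, target
    finally:
        for server in (purl_server, resource_server):
            server.shutdown()
            server.server_close()
```

Calling `demo()` returns the redirect response, the final response, and the target URL, so each hop can be inspected separately. A real browser performs the second GET automatically on seeing the redirect.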
PURL Server as a Resource Server
  [Diagram: Client, PURL Server]
  Client to PURL Server: http GET
  PURL Server to Client: resource
Do I have to run my own PURL Server?
  OCLC’s PURL Server is open to all, including the ability to request domains
  As of Monday, May 24, 2004:
    PURLs Created =
    PURLs Resolved =
    Unique Client Systems =
  The PURL server software is available at the purl.org site for anyone to download and use without cost or restriction.
PURLs and The Identifier Layer Cake
  The Web: http…TCP/IP…
  Functionality
  Technology
  Policy
  Social
  Business
Functional Layer: Operational characteristics of Identifiers
  Is it globally unique? No problem: it’s a URL
  Matching persistence with the need? Organizational commitment
  Can a given identifier be reassigned? No
  Is it resolvable? Yes: to that which is assigned by the registrant
  How does it ‘behave’? Exactly like a URL, but managed
  Is the ‘name’ portion of the identifier opaque, or can it carry ‘semantics’? Determined by the registrant
  Do humans need to read and transcribe them? Probably
  Do identifiers need to be matched to the characteristics of the assets they identify? Determined by the registrant
PURL Technical Layer
  What dependencies are assumed? http
  What is the nature of the system? Open Source, public domain
  Are servers centralized? federated? peer to peer? Distributed and stand-alone, but could be federated (see POIs, as an example)
  How is uniqueness assured? Inherent in the character of URLs
PURL Policy Layer
  Who has the ‘right’ to assign or distribute Identifiers? Anyone can register without cost
  Who has the ‘right’ to resolve them or offer services? Unspecified
  What are appropriate assets? Determined by the registrant
  Can identifiers be recycled? No
  Can ID-Asset bindings be changed? Yes, at the discretion of the registrant
  Is there supporting metadata? No intrinsic PURL metadata
  Is there a governance model? What you do in the privacy of your own PURL server is your own business
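Two of the policy answers above (identifiers are not recycled; bindings may change at the registrant's discretion) can be expressed as rules in a toy registry. This is only one reading of those answers: the `PurlRegistry` class, its method names, and the notion of explicitly retiring a name are assumptions introduced for illustration, not features documented for the PURL server.

```python
class PurlRegistry:
    """Toy registry enforcing two rules: no recycling, registrant-only rebinding."""

    def __init__(self):
        self._bindings = {}    # name -> (registrant, target)
        self._retired = set()  # withdrawn names; never issued again

    def register(self, name, registrant, target):
        # A name that is in use, or was ever in use, cannot be assigned again.
        if name in self._bindings or name in self._retired:
            raise ValueError(f"{name!r} has already been assigned")
        self._bindings[name] = (registrant, target)

    def rebind(self, name, registrant, new_target):
        # The ID-asset binding may change, but only at the registrant's request.
        owner, _ = self._bindings[name]
        if owner != registrant:
            raise PermissionError("only the registrant may change the binding")
        self._bindings[name] = (owner, new_target)

    def retire(self, name, registrant):
        # Hypothetical withdrawal step: the name stops resolving but stays reserved.
        owner, _ = self._bindings[name]
        if owner != registrant:
            raise PermissionError("only the registrant may retire the name")
        del self._bindings[name]
        self._retired.add(name)

    def resolve(self, name):
        entry = self._bindings.get(name)
        return entry[1] if entry else None
```

The registrant names and targets passed to these methods are free-form strings here; a real server would tie them to accounts and validate the target URL.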
PURL Business Model Layer
  Who pays the cost?
    PURL.ORG is maintained by OCLC as a free service
    Anyone can run their own PURL server (and pay for it)
  How, and how much? Negligible costs
  Who decides? The server host
  The problem with identifier business models…
    Those who accrue the value are often not the same as those who bear the costs.
    Libraries are in the business, however, of aggregating costs and making them look free.
    You can’t collect revenue on resolution
PURL Social Layer: Who do you trust?
  Governments?
  Cultural heritage institutions?
  Commercial entities?
  Non-profit consortia?
  It depends on the context, the service, and the motivations for the service.
In Summary
  PURLs offer a methodology and tool set for managing resource names and namespaces
  Neither PURLs nor any other technology is a replacement for policies or commitments to manage resource names
  PURLs represent a community-based solution founded in freely available, widely deployed technology.