Old Dominion University Department of Computer Science

Introduction to Digital Libraries Week 5: Repositories, Identifiers and the Kahn/Wilensky Framework Old Dominion University Department of Computer Science CS 751/851 Fall 2006 Michael L. Nelson <mln@cs.odu.edu> 9/27/06

Arms, Ch. 13. Repositories: “any computer system whose primary function is to store digital material for use in a library”; a collection of “stuff”. Archives: repositories that make longevity promises; covered in future lectures.

“Key Concepts in the Architecture of the Digital Library”: the next 8 slides are taken from Bill Arms’s seminal article in the inaugural issue of D-Lib Magazine: http://www.dlib.org/dlib/July95/07arms.html

The technical framework exists within a legal and social framework. DLs no longer represent systems specific to academics or information specialists; content influences how the DL is used; the architecture must allow the implementation of various policies.

Understanding of digital library concepts is hampered by terminology. “Common English” != “professional English”, and there are multiple professional jargons too. What do these words mean to you? copy, publish, content, document, work.

The underlying architecture should be separate from the content stored in the library. General-purpose functions and content-specific functions should be separated. TL analogy: the more specific the bookshelf is to holding actual books, the harder it is to repurpose the bookshelf in the future.

Names and identifiers are the basic building block for the digital library. names != addresses. In any DL architecture diagram, (almost) anything that can be drawn can be named. Consider the impact that handles/DOIs have had on the publishing/DL community.

Digital library objects are more than collections of bits. objects = metadata + data. “But what is metadata?” Don’t ask hard questions. See figure 2 in http://www.dlib.org/dlib/July95/07arms.html

The digital library object that is used is different from the stored object. What you store is not necessarily what you get. Storage and dissemination are separate events, and can represent separate formats; also, potentially separate from the application-specific format.

Users want intellectual works, not digital objects. The DL architect’s needs should not inconvenience the users’ needs. Recombination of objects: what is an object in your world view? See figure 4 in http://www.dlib.org/dlib/July95/07arms.html

Repositories must look after the information they hold. “Repository Access Protocol”; Kahn/Wilensky Framework: http://www.cnri.reston.va.us/home/cstr/arch/k-w.html ; see figure 3 in http://www.dlib.org/dlib/July95/07arms.html

A Framework for Distributed Digital Object Services. More commonly known as the Kahn/Wilensky Framework (KWF). A high-level document, not even detailed enough to be an architecture, that defines some of the key concepts and terms that form the basis for the next generation of DLs: DLs beyond “make the ftp server look nice”.

Key KWF Terms. Digital objects (DOs): a unit of exchange for the DL with a particular data structure and characteristics. Repository: the place where DOs live. Handles: a unique, persistent name for a DO.

KWF. An Originator makes a Digital Object, which consists of Data and a Handle; the Handle comes from a handle generator. The Digital Object can go in a Repository, which is accessed by the Repository Access Protocol (RAP) and which registers the DO’s handle with a Handle Server, at which point the DO becomes a registered DO.

Digital Objects. Digital object = data + key-metadata. Data is typed; core types include: bit-sequence / set-of-bit-sequences; digital-object / set-of-digital-objects; handle / set-of-handles. Other types can be defined and registered with a global type registry (definition and registration left undefined; similar to MIME?). Key-metadata includes the handle and possibly other metadata (left undefined in KWF).

Digital Objects. Typed data; example from KWF: a DO subtype computer-science-tech-report, with metadata: author, institution, series, etc. Composite DOs: a DO with data of type digital-object; non-composite DOs are elemental DOs. Composite DOs can be used to collect similar works together, e.g. a composite DO that contains a DO for each work of Shakespeare...

Changing Digital Objects. Mutable DOs can be changed once placed in a repository; key-metadata cannot be changed -- the DO’s handle does not change! Immutable DOs cannot be changed once placed in a repository; however, they can be deleted.
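
A minimal Python sketch of the object model on the Digital Objects slides, under stated assumptions: the class name `DigitalObject`, its methods, and the `EXAMPLE.Plays/...` handles are hypothetical and are not defined by the KWF. It only illustrates that data is typed, that a composite DO holds other DOs, and that the handle stays fixed whether or not the data is mutable.

```python
class DigitalObject:
    """Hypothetical model of a KWF digital object: typed data + key-metadata."""

    # Core KWF data types that make a DO composite (data made of other DOs).
    COMPOSITE_TYPES = {"digital-object", "set-of-digital-objects"}

    def __init__(self, handle, data, data_type, mutable=True):
        self._handle = handle        # key-metadata: fixed for the DO's lifetime
        self.data = data
        self.data_type = data_type   # e.g. "bit-sequence"
        self.mutable = mutable

    @property
    def handle(self):
        # Read-only: there is deliberately no setter for the handle.
        return self._handle

    def is_composite(self):
        return self.data_type in self.COMPOSITE_TYPES

    def replace_data(self, data, data_type=None):
        # Only mutable DOs may change; the handle is untouched either way.
        if not self.mutable:
            raise PermissionError(f"{self.handle} is immutable")
        self.data = data
        if data_type is not None:
            self.data_type = data_type


# Elemental DOs (made-up handles), then a composite DO that collects them.
hamlet = DigitalObject("EXAMPLE.Plays/hamlet", b"To be, or not to be",
                       "bit-sequence", mutable=False)
macbeth = DigitalObject("EXAMPLE.Plays/macbeth", b"draft text", "bit-sequence")
works = DigitalObject("EXAMPLE.Plays/collected", [hamlet, macbeth],
                      "set-of-digital-objects")

print(works.handle, "composite:", works.is_composite())
```

Deleting an immutable DO would be a repository operation rather than a method on the object, so it is left out of this sketch.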

Uniform Resource Identifiers. URI: RFC 2396. URL: RFC 1738. URN: RFC 2141.

URIs & URNs. Registered URI schemes: http://www.iana.org/assignments/uri-schemes Registered URN namespaces: http://www.iana.org/assignments/urn-namespaces

From RFC 2396 “A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.”
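
The RFC's distinction can be sketched in Python by looking at the scheme alone. This is a rough illustration, not a complete classifier: the function name `classify` and the short `LOCATOR_SCHEMES` set are assumptions made here, and the `urn:isbn:` value is only an example string.

```python
from urllib.parse import urlsplit

# Assumption: a small, illustrative set of schemes that identify a resource
# by its access mechanism (i.e. behave as locators).
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify(uri: str) -> str:
    """Return "URN", "URL", or plain "URI" based only on the scheme."""
    scheme = urlsplit(uri).scheme  # urlsplit returns the scheme lowercased
    if scheme == "urn":
        return "URN"
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    return "URI"  # some other scheme; not classified by this sketch


for example in ("urn:isbn:0451450523",
                "http://www.dlib.org/dlib/July95/07arms.html",
                "hdl:NASA.LaRC/tm112871"):
    print(example, "->", classify(example))
```

The third case shows why "or both" appears in the RFC text: a scheme can name something without this sketch being able to say whether it also locates it.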

URLs. URLs are tightly coupled with the physical location of an object, and are thus more likely to be transient: “Error 404 - File not found”. Tricks to make URLs more durable: plan ahead when constructing web site structure; use good DNS CNAMEs; symbolic links on filesystems; http server redirects.

URNs. But even with all the tricks available, URLs are not suitable for archival use in DLs. How long will this URL (a report in LTRS) be good: http://techreports.larc.nasa.gov/ltrs/PDF/1997/tm/NASA-97-tm112871.pdf ? How to handle mirroring, replication, etc.? The “appropriate copy” problem… Mnemonic: URL = IP address (128.82.5.173); URN = IP name (blearg.cs.odu.edu).

Handles. Handles can be thought of as a Uniform Resource Name (URN) implementation; see http://www.dlib.org/dlib/february96/02arms.html for a historical comparison of efforts; http://www.handle.net/ contains info about the handle system. Properties: persistence, location independence, multiple instances. Handles are of the general form GlobalAuthority.LocalAuthority/LocallyUniqueString, for example NASA.LaRC/tm112871

NASA.LaRC/tm112871. “NASA” would be assigned from the global naming authority. “LaRC” would be created by whoever registered “NASA”, and the entire string “NASA.LaRC” would be registered. “tm112871” is a locally unique string generated by “LaRC”. ODU.CS/tm112871 is possible...
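
The handle form above can be taken apart with a few lines of Python. This is a sketch under assumptions: the function name `parse_handle` and its validation rules are invented here for illustration and are not the handle system's own syntax checks.

```python
from typing import List, Tuple


def parse_handle(handle: str) -> Tuple[List[str], str]:
    """Split "Global.Local/String" into (authority parts, local string)."""
    # Everything before the first "/" is the naming authority.
    authority, sep, local = handle.partition("/")
    if not sep or not authority or not local:
        raise ValueError(f"not of the form authority/local: {handle!r}")
    parts = authority.split(".")  # e.g. ["NASA", "LaRC"]
    if any(not part for part in parts):
        raise ValueError(f"empty authority segment in {handle!r}")
    return parts, local


authority_parts, local_string = parse_handle("NASA.LaRC/tm112871")
print(authority_parts, local_string)

# The same local string under a different naming authority is a different handle.
print(parse_handle("ODU.CS/tm112871"))
```

Only the naming authority has to be globally coordinated; the local string needs to be unique only within that authority, which is why both handles above can coexist.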

Handle Syntax. In URL-type syntax: <a href="hdl:NASA.LaRC/tm112871">; “hdl” is a scheme, and the handle is resolved into a URL by a locally defined handle server; see http://ftp.ics.uci.edu/pub/ietf/uri/ for a good list of schemes and naming projects. Using a proxy server: <a href="http://hdl.handle.net/NASA.LaRC/tm112871">; hdl.handle.net performs the resolution. From: http://www.handle.net/draft-ietf-handle-system-01.html
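
Converting between the two forms on this slide is a small string operation. The sketch below assumes a hypothetical helper name, `handle_to_proxy_url`; the proxy host is the one named on the slide.

```python
from urllib.parse import quote

PROXY = "http://hdl.handle.net/"  # proxy named on the slide


def handle_to_proxy_url(handle: str, proxy: str = PROXY) -> str:
    """Turn "hdl:NASA.LaRC/tm112871" (or the bare handle) into a proxy URL."""
    if handle.lower().startswith("hdl:"):
        handle = handle[len("hdl:"):]
    # Percent-encode anything unsafe in a URL path, but keep "/" as the
    # separator between naming authority and local string.
    return proxy + quote(handle, safe="/")


print(handle_to_proxy_url("hdl:NASA.LaRC/tm112871"))
print(handle_to_proxy_url("NASA.LaRC/tm112871"))
```

The proxy form works in any browser because it is an ordinary http URL, whereas the `hdl:` form depends on the client knowing how to reach a handle server.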

Handles Observation: isn’t the handle system just the Domain Name System (DNS) all over again? The need for URNs for just general WWW use is obvious; the need for them in DLs even more so...

Semantics in Names. Two schools of thought: semantic clues in names, such as NASA.LaRC/tm112871 or www.larc.nasa.gov, are: good (easy to parse, remember, map to real-world concepts, etc.) or bad (names are not for human consumption, are hurtful or restrictive in the long run, etc.).

“I Love Mom” (without Semantics) image from Eddie Kohler http://www.cs.ucla.edu/~kohler/

PURLs. Persistent URLs (PURLs): http://purl.net/, OCLC. Maps stable URLs (registered in purl.net space) to transient URLs (e.g. cs.odu.edu/~user/ space). Examples: http://www.purl.org/DC http://www.purl.org/NET/oai_explorer
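
The stable-to-transient mapping can be pictured as a lookup table that is updated when the target moves. The Python sketch below is an assumption-laden illustration: the class `PurlResolver` and both target URLs are hypothetical, and a real PURL server answers with an HTTP redirect rather than returning a string.

```python
class PurlResolver:
    """Hypothetical in-memory table mapping stable URLs to transient ones."""

    def __init__(self):
        self._targets = {}  # stable PURL -> current transient URL

    def register(self, purl: str, target: str) -> None:
        if purl in self._targets:
            raise ValueError(f"{purl} is already registered")
        self._targets[purl] = target

    def update(self, purl: str, target: str) -> None:
        # Only the target changes; the stable name is never rewritten.
        if purl not in self._targets:
            raise KeyError(purl)
        self._targets[purl] = target

    def resolve(self, purl: str) -> str:
        return self._targets[purl]


resolver = PurlResolver()
stable = "http://www.purl.org/NET/oai_explorer"
# Both targets below are made-up locations used only for illustration.
resolver.register(stable, "http://www.cs.odu.edu/~user/oai_explorer/")
resolver.update(stable, "http://example.org/oai_explorer/")
print(resolver.resolve(stable))
```

Citations keep pointing at the stable URL, so a move costs one table update instead of a broken link in every citing document.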

DOIs. Digital Object Identifier System (DOIs): http://www.doi.org/ No semantics in the names (well, that’s not always true…); driven by the publishing industry. Examples: doi:10.1045/september2002-rasmussen and 10.1145/544220.544284; resolver: http://dx.doi.org/
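
The two example forms on this slide (with and without the `doi:` label) can be normalized and turned into resolver URLs as follows. This is a sketch: the function names `split_doi` and `doi_url` are hypothetical, and the only syntax rule checked is that the prefix starts with `10.`; suffixes containing characters that are unsafe in URLs would additionally need percent-encoding.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Return (prefix, suffix), accepting an optional leading "doi:"."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    # The prefix identifies the registrant; the suffix is chosen locally.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Build a resolver URL using the resolver named on the slide."""
    prefix, suffix = split_doi(doi)
    return f"{resolver}{prefix}/{suffix}"


print(split_doi("doi:10.1045/september2002-rasmussen"))
print(doi_url("10.1145/544220.544284"))
```

The first suffix plainly carries meaning (an issue date and an author name) while the second is opaque, which is the "not always true" caveat about semantics in the names.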

info URI. “info” URI proposal: http://info-uri.info/ http://www.ietf.org/internet-drafts/draft-vandesompel-info-uri-04.txt How to identify “stuff” that does not resolve? LCSHs? XML namespaces? “Truth”, “Love”, “Beauty”, etc. How to promote locally unique ids to globally unique? URNs require persistence…

Repositories. “A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval” (KWF). A stored DO is a DO that resides in a repository. A registered DO is a DO that the repository has registered with a handle server. Storing and registering can be the same or different processes.

Repositories. A repository keeps a properties record for each DO; it contains key-metadata and any other metadata the repository chooses to keep. A repository of record (ROR) is the first repository that a DO is placed in; the ROR authorizes additional instances of the DO. A dissemination is the result of an access service request.

Repository Access Protocol (RAP). “Protocol” may be misleading; it’s really just the skeleton for a protocol. RAP is designed to be simple; repositories themselves should be simple. KWF defines 3 basic operation classes: ACCESS_DO, DEPOSIT_DO, and ACCESS_REF; the last is the catch-all operation for all meta-services...
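
To make the three operation classes concrete, here is a deliberately small in-memory Python sketch. Everything in it is an assumption for illustration: the classes `HandleServer` and `Repository`, the method signatures, and the shape of the properties record are invented here, since the KWF leaves these details open. In particular, what ACCESS_REF returns is a guess at one possible meta-service.

```python
class HandleServer:
    """Hypothetical registry mapping a handle to the repository holding it."""

    def __init__(self):
        self._registry = {}

    def register(self, handle, repository_name):
        self._registry[handle] = repository_name

    def resolve(self, handle):
        return self._registry[handle]


class Repository:
    """Hypothetical repository exposing the three RAP operation classes."""

    def __init__(self, name, handle_server):
        self.name = name
        self._handle_server = handle_server
        self._objects = {}     # handle -> stored data
        self._properties = {}  # handle -> properties record

    def deposit_do(self, handle, data, metadata=None, register=True):
        # DEPOSIT_DO: store the DO and create its properties record.
        self._objects[handle] = data
        record = {**(metadata or {}), "handle": handle, "registered": False}
        self._properties[handle] = record
        # Storing and registering are separate steps that may happen together.
        if register:
            self._handle_server.register(handle, self.name)
            record["registered"] = True

    def access_do(self, handle):
        # ACCESS_DO: the returned value plays the role of a dissemination.
        return self._objects[handle]

    def access_ref(self):
        # ACCESS_REF: assumed here to describe the repository's services.
        return {"repository": self.name,
                "operations": ["ACCESS_DO", "DEPOSIT_DO", "ACCESS_REF"]}

    def properties(self, handle):
        return dict(self._properties[handle])


server = HandleServer()
repo = Repository("example-repo", server)
repo.deposit_do("NASA.LaRC/tm112871", b"report bytes", {"series": "tm"})
repo.deposit_do("NASA.LaRC/draft-1", b"draft bytes", register=False)

print(server.resolve("NASA.LaRC/tm112871"))
print(repo.properties("NASA.LaRC/draft-1"))
```

The second deposit is a stored but unregistered DO: the repository can serve it, yet the handle server cannot resolve its handle.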

RAP. RAP is fleshed out more in Cornell CS 95-TR1540. Where KWF suggested that the operations would take “metadata”, “key-metadata”, and “digital object” as arguments, TR1540 splits some of those into separate operations. RAP could be implemented as a subset of a more sophisticated protocol (Dienst, Z39.50, etc.); a prelude to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

RAP

Terms and Conditions. First lengthy discussion with respect to KWF in Cornell CS 95 TR-1593. TC are attached to each: DO, dissemination, repository. TC are a precondition for any operation on the above. Repositories are responsible for enforcing TC.

Why Are TC Difficult? A wide-open model -- “everyone can access and do everything” -- is much simpler. How do you: specify TC? inform the user of TC? negotiate TC? enforce TC? (esp. with respect to 3rd party enforcers)

KWF Now. The KWF was never “implemented” in a real DL (the 1995 Cornell TRs notwithstanding), yet it has influenced all repository & object model projects that followed, e.g. Warwick Framework, Fedora, Buckets/SODA, Dienst, OAI-PMH. T&C, or “Rights Expressions”, have mostly been moved out of the DL/repository protocols and into complex object formats: Coyle, “Rights Expression Languages”, 2004 http://www.loc.gov/standards/relreport.pdf

Objects vs. Archives. “Repositories must look after the information they hold”: this is the tenet that I question… Most DL objects are still bound to the applications that generate or render the objects.