ODU CS CS 695 Fall 2002 Michael L. Nelson Introduction to Digital Libraries Week 5: Early DLs and the Kahn/Wilensky Framework Old Dominion University Department of Computer Science CS 695 Fall 2002 Michael L. Nelson 09/26/02
DL Architectural Review
  The purpose of this week’s lecture is to provide background and concepts in preparation for reviewing the architecture of various DLs
  Assumptions
    –TCP/IP connectivity
      no dial-up services, CD-ROMs, etc.
    –distribute “actual stuff” (report, software, etc.)
      no abstract servers, etc.
DL Architecture History
  Two main approaches:
    –build special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP as the transport protocol only
      pros: rich functionality
      cons: high development cost, client distribution problem
      observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!
DL Architecture History
  Two main approaches (cont’d):
    –use standard, higher level, orthogonal TCP/IP protocols: SMTP, FTP, Gopher, WAIS, http, etc.
      con: less functionality
      pros: lower development cost, uses commonly available clients
      observation: this approach is now the most common
Early TCP/IP DLs
  Netlib
    –begun in 1985, distributing mathematical software via e-mail (SMTP)
    –other access methods and protocols added later (ftp, X11 client, http)
Netlib Accesses
  [figure]
  from:
Early TCP/IP DLs
  Physics pre-print server
    –originally:
    –now:
      now run from Cornell U, with support from LANL
    –begun in 1991 as an e-mail service to exchange TeX source of pre-prints
      TeX (and LaTeX, etc.) is a text formatting environment popular in math, physics, CS, etc.
    –ftp, http access added shortly afterward
arXiv usage, 1/94 – 6/97
  Red - Number of connections in each week
  Blue - Number of hosts connecting that week (divide by 10 for correct number)
  Green - Number of new hosts that week (divide by 10)
  from:
arXiv usage, 7/97 – 2/01
  Red - Number of connections in each week
  Blue - Number of hosts connecting that week (divide by 10 for correct number)
  Green - Number of new hosts that week (divide by 10)
  from:
arXiv usage, 7/97 – 9/02
  Red - Number of connections in each week
  Blue - Number of hosts connecting that week (divide by 10 for correct number)
  Green - Number of new hosts that week (divide by 10)
  from:
Early TCP/IP DLs
  Anonymous FTP
    –used by numerous computer science departments and related laboratories for the distribution of both tech reports and software
      ftp://techreports.larc.nasa.gov/
        begun in late 1992
        http access added in 1994
    –see TMs 4567 and from
Characteristics of Early TCP/IP DLs
  Useful
    –could get the “thing” that you were looking for
  Constrained by transport protocol
    –SMTP, FTP, etc. interface inherently “clunky”
    –searching, formatting, sophisticated browsing, etc. difficult to implement
  Small scale
    –would the same systems work well if the holdings went from 100’s or 1000’s to millions?
Early HTTP DLs
  Initial http implementations / conversions pretty much provided incremental steps in DL improvement
    –a “nice” ftp interface, maybe with better searching and browsing
    –but the nature of the DLs changed little
  LTRS is an example of an http DL that is really: FTP+Searching(WAIS)+Browsing
Early HTTP DLs
  But http is a very general transport protocol, and it is possible to build even higher level protocols on top of it
  Combine this with the more expressive WWW client, and there is a lot of potential
  Dienst ( Cornell CS TR
    –builds an actual DL protocol on top of http
      the first to do so?
DL Sophistication Over Time
  [Figure: axes are Sophistication and Time]
  Labels: Traditional IR, Databases, CD-ROMs, etc.; ftp / gopher; http (LTRS, e-print, Netlib, etc.); http (Dienst); Kahn / Wilensky implementations; ?; “We Are Here”
A Framework for Distributed Digital Object Services
  More commonly known as the Kahn/Wilensky Framework (KWF)
  A high level document, not even detailed enough to be an architecture, that defines some of the key concepts and terms that form the basis for the next generation of DLs
    –DLs beyond “make the ftp server look nice”
A Friendlier Intro to KWF
  Key points from Bill Arms’s paper
    –The underlying architecture should be separate from the content stored in the library
    –Names and identifiers are the basic building block for the digital library
    –Digital library objects are more than collections of bits
    –The digital library object that is used is different from the stored object
    –Repositories must look after the information they hold
    –Users want intellectual works, not digital objects
  Prelude to OAIS, digital preservation, etc.
  Well… maybe, or maybe not!
Key KWF Terms
  digital objects (DOs)
    –a unit of exchange for the DL with a particular data structure and characteristics
  repository
    –the place where DOs live
  handles
    –a unique, persistent name for a DO
KWF
  [Diagram, read as a chain:]
  An Originator makes a Digital Object
  which consists of Data and a Handle (which comes from a handle generator)
  which can go in a Repository
  which is accessed by the Repository Access Protocol (RAP)
  and which registers the DO’s handle with a Handle Server
  at which point the DO becomes a registered DO
Digital Objects
  Digital object = data + key-metadata
    –data is typed; core types include:
      bit-sequence / set-of-bit-sequences
      digital-object / set-of-digital-objects
      handle / set-of-handles
    –other types can be defined, and registered with a global type registry
      definition and registration left undefined
      similar to MIME?
    –key-metadata includes handle, possibly other metadata (left undefined in KWF)
Digital Objects
  Typed data; example from KWF:
    –a DO subtype: computer-science-tech-report
    –with metadata: author, institution, series, etc.
  Composite DOs:
    –a DO with data of type digital-object
    –non-composite DOs are elemental DOs
    –composite DOs can be used to collect similar works together
      a composite DO that contains a DO for each work of Shakespeare...
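A minimal Python sketch of the "data + key-metadata" model and of composite vs. elemental DOs. KWF defines the concepts, not an API, so the class name, field names, and the Example.Lib handles are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical model of a KWF digital object: typed data plus key-metadata."""
    handle: str        # key-metadata always includes the handle
    data_type: str     # one of the core type names, e.g. "bit-sequence"
    data: object       # payload; its shape depends on data_type
    other_key_metadata: dict = field(default_factory=dict)  # left undefined in KWF

    def is_composite(self) -> bool:
        # A composite DO carries data of type digital-object (or a set of them);
        # anything else is an elemental DO.
        return self.data_type in ("digital-object", "set-of-digital-objects")


# Illustrative handles and contents only.
hamlet = DigitalObject("Example.Lib/hamlet", "bit-sequence", b"To be, or not to be")
macbeth = DigitalObject("Example.Lib/macbeth", "bit-sequence", b"When shall we three meet again")
works = DigitalObject("Example.Lib/shakespeare", "set-of-digital-objects", [hamlet, macbeth])
```

Here `works` plays the role of the Shakespeare example: one composite DO whose data is a set of elemental DOs, each with its own handle.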
A Digital Object
  figure 2 from
Changing Digital Objects
  Mutable DOs can be changed once placed in a repository
    –key-metadata cannot be changed: the DO’s handle does not change!
  Immutable DOs cannot be changed once placed in a repository
    –however, they can be deleted
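The mutability rules can be sketched in a few lines of Python. This is only an illustration under assumed names (StoredObject, replace_data, holdings, delete); KWF does not prescribe how a repository enforces these rules.

```python
class StoredObject:
    """Hypothetical DO as held by a repository."""

    def __init__(self, handle: str, data: bytes, mutable: bool):
        self._handle = handle
        self._data = data
        self.mutable = mutable

    @property
    def handle(self) -> str:
        # No setter is defined, so assigning to .handle raises AttributeError:
        # the handle never changes, even for a mutable DO.
        return self._handle

    @property
    def data(self) -> bytes:
        return self._data

    def replace_data(self, new_data: bytes) -> None:
        if not self.mutable:
            raise PermissionError(f"{self._handle} is immutable")
        self._data = new_data


holdings = {}  # hypothetical repository contents: handle -> StoredObject


def delete(handle: str) -> None:
    # Deletion is allowed for mutable and immutable DOs alike.
    del holdings[handle]


draft = StoredObject("Example.Lib/draft", b"v1", mutable=True)
final = StoredObject("Example.Lib/final", b"v1", mutable=False)
holdings[draft.handle] = draft
holdings[final.handle] = final

draft.replace_data(b"v2")          # allowed: the DO is mutable
try:
    final.replace_data(b"v2")      # refused: the DO is immutable
except PermissionError as err:
    refusal = str(err)
delete(final.handle)               # allowed even though the DO is immutable
```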
Uniform Resource Identifiers
  [Diagram labels: URI, URL, URN]
  Most people are more familiar with URLs, but both Uniform Resource Locators and Uniform Resource Names are instantiations of Uniform Resource Identifiers
URNs
  Handles can be thought of as a Uniform Resource Name (URN) implementation
  URLs are tightly coupled with the physical location of an object, and are thus more likely to be transient
    –“Error File not found”
  Tricks to make URLs more durable:
    plan ahead when constructing web site structure
    use good DNS CNAMEs
    symbolic links on filesystems
    http server redirects
URNs
  But even with all the tricks available, URLs are not suitable for archival use in DLs
  how long will this URL (a report in LTRS):
    –be good?
    –how to handle mirroring, replication, etc.?
      “appropriate copy” problem…
  mnemonic:
    –URL = IP address ( )
    –URN = IP name (blearg.cs.odu.edu)
Handles
  See RFCs 2141 & 2168 for more URN info
    – for historical comparison of efforts
  contains info about the handle system
    –persistence
    –location independence
    –multiple instances
  Handles are of the general form:
    GlobalAuthority.LocalAuthority/LocallyUniqueString
  or, for example:
    NASA.LaRC/tm112871
NASA.LaRC/tm
  “NASA” would be assigned by the global naming authority
  “LaRC” would be created by whoever registered “NASA”, and the entire string “NASA.LaRC” would be registered
  “tm112871” is a locally unique string generated by “LaRC”
    –ODU.CS/tm is possible...
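The general form GlobalAuthority.LocalAuthority/LocallyUniqueString can be split mechanically. The sketch below assumes exactly the two-part naming authority shown on the slides (split at the first "."); the Handle tuple and parse_handle function are hypothetical names, not part of the handle system software.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    global_authority: str   # e.g. "NASA", assigned by the global naming authority
    local_authority: str    # e.g. "LaRC", created by whoever registered the global part
    local_name: str         # e.g. "tm112871", unique within the local authority


def parse_handle(text: str) -> Handle:
    # The first "/" separates the naming authority from the locally unique string.
    authority, slash, local_name = text.partition("/")
    if not slash or not local_name:
        raise ValueError(f"missing locally unique string: {text!r}")
    # Assumption: the naming authority has the two-part form Global.Local.
    global_authority, dot, local_authority = authority.partition(".")
    if not dot or not global_authority or not local_authority:
        raise ValueError(f"expected GlobalAuthority.LocalAuthority: {text!r}")
    return Handle(global_authority, local_authority, local_name)


example = parse_handle("NASA.LaRC/tm112871")
```

Splitting on the first "/" only means the locally unique string may itself contain slashes, which keeps the local authority free to choose its own naming scheme.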
Handle Syntax
  In URL-type syntax:
    –
    –“hdl” is a scheme; the handle is resolved into a URL by a locally defined handle server
      see for a good list of schemes and naming projects
  Using a proxy server:
    –
    –hdl.handle.net performs resolution
  from:
Handles
  Observation: isn’t the handle system just the Domain Name System (DNS) all over again?
  The need for URNs for just general WWW use is obvious; the need for them in DLs even more so...
  A good project?
    –study & evaluate various URN implementations, including the handle system
Semantics in Names
  Two schools of thought:
    –semantic clues in names, such as:
      –NASA.LaRC/tm
      –
    –are:
      good: easy to parse, remember, map to real-world concepts, etc.
      bad: names are not for human consumption, are hurtful or restrictive in the long run, etc.
Other Naming Projects
  Persistent URLs (PURLs)
    – OCLC
    –maps stable URLs (registered in purl.net space) to transient URLs (e.g. ils.unc.edu/~user/ space)
  Digital Object Identifier System (DOIs)
    –
    –no semantics in the names (well, that’s not always true…)
      driven by the publishing industry
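The PURL idea, a stable name that maps to whatever the current transient URL is, can be sketched as a lookup table that answers with a redirect. This is a toy model, not OCLC's software: the PurlTable class, the "/example/report" path, and the file names under ils.unc.edu/~user/ are all made up for illustration, and the 302/404 status codes are an assumption about how such a resolver would answer over http.

```python
class PurlTable:
    """Hypothetical resolver: stable path -> current (transient) URL."""

    def __init__(self):
        self._targets = {}

    def register(self, stable_path: str, target_url: str) -> None:
        # Registering the same stable path again simply repoints it.
        self._targets[stable_path] = target_url

    def resolve(self, stable_path: str):
        # Returns (status, location), modeled on an http redirect.
        target = self._targets.get(stable_path)
        if target is None:
            return 404, None
        return 302, target


purls = PurlTable()
purls.register("/example/report", "http://ils.unc.edu/~user/report.html")
first = purls.resolve("/example/report")

# The document moves; only the table entry changes, not the published name.
purls.register("/example/report", "http://ils.unc.edu/~user/archive/report.html")
second = purls.resolve("/example/report")
```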
Repositories
  “A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval” (KWF)
  A stored DO is a DO that resides in a repository
  A registered DO is a DO that the repository has registered with a handle server
    –storing and registering can be the same or different processes
Repositories
  A repository keeps a properties record for each DO
    –contains key-metadata and any other metadata the repository chooses to keep
  A repository of record (ROR) is the first repository that a DO is placed in
    –ROR authorizes additional instances of the DO
  A dissemination is the result of an access service request
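The difference between a stored DO and a registered DO, and the role of the properties record, can be sketched as follows. Everything here is an assumption made for illustration (class names, the "registered" and "series" fields, the repository name "larc"); KWF leaves the contents of the properties record largely open.

```python
class HandleServer:
    """Hypothetical handle server: handle -> names of repositories holding the DO."""

    def __init__(self):
        self._locations = {}

    def register(self, handle: str, repository_name: str) -> None:
        self._locations.setdefault(handle, []).append(repository_name)

    def resolve(self, handle: str) -> list:
        return list(self._locations.get(handle, []))


class Repository:
    def __init__(self, name: str, handle_server: HandleServer):
        self.name = name
        self._handle_server = handle_server
        self._data = {}      # handle -> stored data
        self._records = {}   # handle -> properties record

    def store(self, handle: str, data: bytes, **other_metadata) -> None:
        # After this call the DO is a stored DO, but not yet a registered one.
        self._data[handle] = data
        self._records[handle] = {"handle": handle, "registered": False, **other_metadata}

    def register(self, handle: str) -> None:
        # Storing and registering are separate steps in this sketch.
        if handle not in self._records:
            raise KeyError(handle)
        self._handle_server.register(handle, self.name)
        self._records[handle]["registered"] = True

    def properties(self, handle: str) -> dict:
        return dict(self._records[handle])


server = HandleServer()
larc = Repository("larc", server)
larc.store("NASA.LaRC/tm112871", b"report bytes", series="TM")
before = (larc.properties("NASA.LaRC/tm112871")["registered"], server.resolve("NASA.LaRC/tm112871"))
larc.register("NASA.LaRC/tm112871")
after = (larc.properties("NASA.LaRC/tm112871")["registered"], server.resolve("NASA.LaRC/tm112871"))
```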
Repositories
  figure 3 from
Repository Access Protocol (RAP)
  “Protocol” may be misleading; it’s really just the skeleton for a protocol
  RAP is designed to be simple
    –repositories themselves should be simple
  KWF defines 3 basic operation classes:
    –ACCESS_DO
    –DEPOSIT_DO
    –ACCESS_REF
      this is the catch-all operation for all meta-services...
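Since RAP is only the skeleton of a protocol, a sketch can show little more than dispatch over the three operation classes. Only the operation names come from KWF; the arguments, return values, and the SketchRepository class below are assumptions, and ACCESS_REF is reduced here to a single made-up meta-service that describes the repository.

```python
from enum import Enum


class RapOperation(Enum):
    ACCESS_DO = "ACCESS_DO"
    DEPOSIT_DO = "DEPOSIT_DO"
    ACCESS_REF = "ACCESS_REF"


class SketchRepository:
    """Hypothetical repository answering the three RAP operation classes."""

    def __init__(self, name: str):
        self.name = name
        self._holdings = {}  # handle -> data

    def request(self, operation: RapOperation, handle=None, data=None):
        if operation is RapOperation.DEPOSIT_DO:
            self._holdings[handle] = data
            return handle
        if operation is RapOperation.ACCESS_DO:
            # The returned value stands in for a dissemination.
            return self._holdings[handle]
        if operation is RapOperation.ACCESS_REF:
            # Catch-all for meta-services; here it only describes the repository.
            return {"name": self.name, "holdings": sorted(self._holdings)}
        raise ValueError(f"unknown operation: {operation!r}")


repo = SketchRepository("larc")
deposited = repo.request(RapOperation.DEPOSIT_DO, "NASA.LaRC/tm112871", b"report bytes")
dissemination = repo.request(RapOperation.ACCESS_DO, "NASA.LaRC/tm112871")
description = repo.request(RapOperation.ACCESS_REF)
```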
RAP
  RAP is fleshed out more in Cornell CS 95-TR1540
  Where KWF suggested that the operations would take “metadata”, “key-metadata”, and “digital object” as arguments, TR1540 splits some of those into separate operations
  RAP could be implemented as a subset of a more sophisticated protocol (Dienst, Z39.50, etc.)
    –prelude to the Open Archives Initiative (OAI) metadata harvesting protocol
Terms and Conditions
  First lengthy discussion with respect to KWF in Cornell CS 95 TR-1593
  Terms and Conditions (TC) can be arbitrarily complex, but generally consist of:
    –permissions: read, write, etc.
    –authentication: person, group, etc.
    –payment
    –3rd party intervention (possibly in support of the above)
TC
  TC are attached to:
    –each DO
    –dissemination
    –repository
  TC are a precondition for any operation on the above
  Repositories are responsible for enforcing TC
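Treating TC as a precondition means every attached TC must be satisfied before an operation proceeds. The sketch below models only three of the ingredients (permissions, group-based authentication, payment) in a deliberately simple way; the class, its fields, and the fee values are assumptions, and real TC can be arbitrarily more complex.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermsAndConditions:
    """Hypothetical, heavily simplified TC."""
    permissions: frozenset = frozenset({"read"})
    allowed_groups: frozenset = frozenset()  # empty means no group restriction
    fee: float = 0.0

    def permits(self, operation: str, group: str, payment: float = 0.0) -> bool:
        if operation not in self.permissions:
            return False
        if self.allowed_groups and group not in self.allowed_groups:
            return False
        return payment >= self.fee


def may_operate(operation: str, group: str, payment: float, *attached_tc) -> bool:
    # The repository enforces TC: the operation goes ahead only if the TC
    # attached to the repository, the DO, etc. are all satisfied.
    return all(tc.permits(operation, group, payment) for tc in attached_tc)


repository_tc = TermsAndConditions(permissions=frozenset({"read", "write"}))
report_tc = TermsAndConditions(allowed_groups=frozenset({"staff"}), fee=5.0)

staff_paid = may_operate("read", "staff", 5.0, repository_tc, report_tc)
public_paid = may_operate("read", "public", 5.0, repository_tc, report_tc)
staff_unpaid = may_operate("read", "staff", 0.0, repository_tc, report_tc)
staff_write = may_operate("write", "staff", 5.0, repository_tc, report_tc)
```

Only the first request succeeds: the others fail on group membership, payment, and the DO's read-only permissions respectively.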
Booch Diagram for TC
  [Diagram: repository, digital object, and dissemination, each with its own terms and conditions; data; cardinality N]
  Figure 1 from 95 TR-1593
Why Are TC Difficult?
  The wide open model, “everyone can access and do everything”, is much simpler
  How do you:
    –inform user of TC?
    –negotiate TC?
    –enforce TC?
      esp. with respect to 3rd party enforcers
    –specify TC?
Access Rules and TC
  Figure 1 from TR-1540
Access Rules and TC
  TR-1593 makes access_rules an instance of the class terms_and_conditions
  Defines KWF concepts in a Common Object Request Broker Architecture (CORBA) context
    –CORBA is a standard/architecture/mechanism for object communication across heterogeneous everything...
    –
CORBA Implementation
  Messages passed to the Object Request Broker (ORB) by interceptors
    –ca any current projects would likely use SOAP
  Interceptors create:
    –credential object
    –security context object
    –access decision object
CORBA Implementation
  1. Client requests a dissemination
  2. Interceptor creates a credential object to store the client’s privilege attributes (PA)
  3. Client and server establish a security context
  4. Access decision object (ADO) controls access to the DO
  5. ADO looks at the DO’s control attributes (CA) and compares them to the client’s PAs
  6. Negotiation (lots of icky details hidden here)
  7. ADO grants or denies
  8. Dissemination (or failure message) returned to client
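The eight steps can be compressed into a plain-Python sketch that involves no CORBA at all. It assumes the simplest possible comparison rule, that access is granted when the client holds every control attribute the DO requires; that rule, the class names, and the attribute strings are assumptions, and the security context (step 3) and negotiation (step 6) are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential object holding the client's privilege attributes (PA)."""
    principal: str
    privilege_attributes: frozenset


class AccessDecisionObject:
    """Hypothetical ADO guarding one DO via its control attributes (CA)."""

    def __init__(self, control_attributes):
        self.control_attributes = frozenset(control_attributes)

    def grants(self, credential: Credential) -> bool:
        # Assumed rule: grant when the CA are a subset of the client's PA.
        return self.control_attributes <= credential.privilege_attributes


def request_dissemination(principal, privileges, ado, data):
    credential = Credential(principal, frozenset(privileges))  # step 2
    # Steps 3 and 6 (security context, negotiation) are omitted in this sketch.
    if ado.grants(credential):                                 # steps 4, 5, 7
        return "granted", data                                 # step 8: dissemination
    return "denied", None                                      # step 8: failure message


report_ado = AccessDecisionObject({"group:staff", "paid"})
granted = request_dissemination("alice", {"group:staff", "paid", "read"}, report_ado, b"report bytes")
denied = request_dissemination("bob", {"group:staff"}, report_ado, b"report bytes")
```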
“It’s so complex, it’s wonderful!”
  This is a long way from e-mail or anonymous ftp services...
  Things sure were simpler when everyone could read everything
  But open and free communication is a subset of DL applications
    –success, widespread adoption of DLs depends on ability to model the more complex TC for various information
Are We There Yet?
  Grand and glorious projects often collapse under their own weight, fail to achieve critical mass, or generally under-achieve
    –GOSIP, Ada, Z39.50, Hyper-G
  Simple and limited scope projects are more successful, achieve critical mass, etc.
    –TCP/IP, C, OAI-PMH, WWW
Rough Consensus and Running Code
  Are KWF and its implementations:
    –just what the DL community needs?
    –not enough?
    –too much?
  When the “perfect” DL architecture arrives, will we be too invested in our current DLs to transition?
  “We reject kings, presidents and voting. We believe in rough consensus and running code.”
    IETF Credo, Dave Clark, 1992