
CS 502 Computing Methods for Digital Libraries, Cornell University – Computer Science, Herbert Van de Sompel


1 CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de Sompel, herbertv@cs.cornell.edu
Lecture 5: A research perspective on Digital Libraries

2 DL Ancestry

3 URLs to some of these DLs
ADS: http://adswww.harvard.edu/
NCSTRL: http://www.ncstrl.org
UCSTRI: http://www.cs.indiana.edu:800/cstr/cover.html
arXiv: http://arXiv.org
LTRS: http://techreports.larc.nasa.gov/ltrs/
NTRS: http://techreports.larc.nasa.gov/cgi-bin/NTRS

4 DL Architectural Review
Assumptions made in this perspective
–things start with TCP/IP connectivity
–distribute full content (reports, software, etc.), not only metadata

5 DL Architecture History: approach 1
1. Build a special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP only as the transport protocol
pros: rich functionality
cons: high development cost, client distribution problem
observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!

6 DL Architecture History: approach 2
2. Use standard protocols built upon TCP/IP: SMTP, FTP, Gopher, WAIS, HTTP, etc.
con: less functionality (restricted by the protocol)
pros: lower development cost, uses commonly available clients
observation: this approach is now the most common
The ones listed on slide 2 fit into this category

7 Early TCP/IP DLs
A very old one: IETF, http://www.ietf.org/
Internet RFCs
Very first TCP/IP DL?

8 Early TCP/IP DLs
Netlib
–http://www.netlib.org/
–begun in 1985, distributing mathematical software via e-mail (SMTP)
–other access methods and protocols added later (ftp, X11 client, http)

9 Netlib 1995

10 Netlib 2001

11 Los Alamos arXiv
Physics pre-print server
–http://xxx.lanl.gov/ == http://arXiv.org
–begun in 1991 as an e-mail service to exchange the TeX source of pre-prints in high energy physics
–ftp and http access added shortly afterwards
–Now THE communication channel in Physics
–Paul Ginsparg

12 Characteristics of early TCP/IP, non-HTTP DLs
Useful
–could get the “thing” that you were looking for
Constrained by transport protocol
–SMTP, FTP, etc. interface inherently “clunky”
–Higher level services such as searching, sophisticated browsing, etc. difficult to implement
Small scale
–would the same systems work well if the holdings went from hundreds or thousands to millions?

13 Characteristics of early TCP/IP, HTTP DLs
Initial HTTP implementations / conversions mostly provided incremental steps in DL improvement
–a “nice” ftp interface, maybe with better searching and browsing
–but the nature of the DLs changed little
LTRS is an example of an HTTP DL that is really: FTP + Searching (WAIS) + Browsing
http://techreports.larc.nasa.gov/ltrs/
Also check out the user interface of http://arXiv.org

14 Early TCP/IP, HTTP DLs
But HTTP is a very general transport protocol, and it is possible to build even higher level protocols on top of it
Combine this with the expressive HTTP client (web browser), and there is a lot of potential
Dienst
–http://www.ncstrl.org/Dienst/htdocs/Info/protocol4.html
–builds an actual DL protocol on top of HTTP (1994); the first to do so?
Open Archives Initiative: metadata harvesting protocol on top of HTTP
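Layering a DL protocol on top of HTTP can be pictured as encoding each protocol request as an ordinary HTTP GET URL. The following is a minimal sketch only: the base URL is hypothetical, and the `verb` and `metadataPrefix` parameter names follow the convention of the OAI metadata harvesting protocol as an assumption; neither is taken from the slides.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url, verb, **params):
    """Encode a harvesting request as a plain HTTP GET URL."""
    # The protocol operation and its arguments travel as query parameters,
    # so any standard HTTP client can issue the request.
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Hypothetical repository address, used for illustration only.
url = build_harvest_url(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
```

The point of the sketch is that the DL-specific semantics live in the request parameters, while transport, clients, and servers remain standard HTTP.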

15 Sophistication increases, tracks meet
[Figure: sophistication plotted against time for the research track and the library automation track. Labels: e-mail; ftp / gopher; http (LTRS, e-print, Netlib, etc.); http (Dienst)]

16 A Framework for Distributed Digital Object Services
Kahn/Wilensky Framework (KWF) [Kahn 1995]
A high-level document
Almost a definition of key concepts, terminologies, … for next generation DLs
Foundation for a research discipline?
Not detailed enough to be a real architecture.
Architecture is independent of the type of data stored in the DL

17 KWF: key terms
digital object (do)
–A do is a data structure that contains:
  Digital data; data is typed (cf. MIME)
  Persistent key-metadata; especially the handle
  Other metadata (for instance Terms and Conditions)
handle
–A handle is a unique, persistent name for a do
repository
–The place where do’s live
–Has a unique global name
Repository Access Protocol (RAP)
–To deposit/access do’s in repositories
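The digital object described above can be sketched as a small data structure. This is an illustration under assumptions, not a definition from the framework: the class and field names, the `hdl:example/1` handle syntax, and the use of a MIME-style string for the data type are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyMetadata:
    # Frozen: the handle is a unique, persistent name and must not change.
    handle: str


@dataclass
class DigitalObject:
    key_metadata: KeyMetadata
    data: bytes
    data_type: str  # the data is typed; a MIME-style label is assumed here
    other_metadata: dict = field(default_factory=dict)


# Hypothetical example object.
do = DigitalObject(
    key_metadata=KeyMetadata(handle="hdl:example/1"),
    data=b"report text",
    data_type="text/plain",
    other_metadata={"terms_and_conditions": "read-only"},
)
print(do.key_metadata.handle)
```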

18 KWF: flow
[Figure, reconstructed as a sequence]
–An originator makes a digital object, which consists of data and key-metadata (the handle)
–The handle comes from a handle generator
–The do can go in a repository, at which point the do becomes a stored do
–The do’s handle is registered with a handle server, at which point the do becomes a registered do
–The repository keeps a properties record per do (key metadata: handle; other metadata: terms and conditions) and a transaction record per do
–A client accesses/deposits the do in repositories by means of the Repository Access Protocol
–What the client receives as a result of an access to a do is a dissemination
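The flow on this slide can be sketched end to end in a few lines. The class names, method names, and handle syntax below are hypothetical, and the sketch assumes that the repository registers the handle at deposit time, which is only one of the arrangements the framework allows.

```python
import itertools


class HandleGenerator:
    """Hands out fresh handles (the naming scheme is hypothetical)."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._counter = itertools.count(1)

    def new_handle(self):
        return f"{self._prefix}/{next(self._counter)}"


class HandleServer:
    """Maps each registered handle to the repository that holds the do."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository_name):
        if handle in self._locations:
            raise ValueError(f"handle already registered: {handle}")
        self._locations[handle] = repository_name

    def resolve(self, handle):
        return self._locations[handle]


class Repository:
    def __init__(self, name, handle_server):
        self.name = name
        self._handle_server = handle_server
        self._stored = {}

    def deposit(self, handle, data):
        self._stored[handle] = data                      # now a stored do
        self._handle_server.register(handle, self.name)  # now a registered do

    def access(self, handle):
        # The client receives a dissemination, not the stored do itself.
        return {"handle": handle, "data": self._stored[handle]}


generator = HandleGenerator("hdl:example")
server = HandleServer()
repository = Repository("repo-a", server)

handle = generator.new_handle()             # originator obtains a handle
repository.deposit(handle, b"report text")  # deposit stores and registers
dissemination = repository.access(handle)   # client access
print(server.resolve(handle), dissemination["data"])
```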

19 Digital objects
do = data + key-metadata
–data is typed; core types include:
  bit-sequence / set-of-bit-sequences
  digital-object / set-of-digital-objects
  handle / set-of-handles
–other types can be defined, and registered with a global type registry
  definition and registration left undefined (similar to MIME)
–key-metadata includes the handle
–possibly other metadata (left undefined in KWF)

20 Digital objects
Composite do’s:
–a do with data of type digital-object
–non-composite do’s are elemental do’s
–composite do’s can, for instance, be used to collect similar works together
  a composite do that contains a do for each work of Shakespeare...
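The composite/elemental distinction amounts to a recursive structure. The sketch below is an assumption-laden illustration: the class, the helper `elemental_handles`, and the handles are hypothetical, and a Python list stands in for the set-of-digital-objects type.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DigitalObject:
    handle: str
    # bytes for an elemental do, a list of do's for a composite do
    data: Union[bytes, List["DigitalObject"]]

    def is_composite(self):
        return isinstance(self.data, list)


def elemental_handles(do):
    """Collect the handles of all elemental do's reachable from `do`."""
    if not do.is_composite():
        return [do.handle]
    handles = []
    for member in do.data:
        handles.extend(elemental_handles(member))
    return handles


# Hypothetical collection in the spirit of the Shakespeare example.
hamlet = DigitalObject("hdl:example/hamlet", b"To be, or not to be")
macbeth = DigitalObject("hdl:example/macbeth", b"Out, damned spot")
works = DigitalObject("hdl:example/shakespeare", [hamlet, macbeth])

print(elemental_handles(works))
```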

21 Changing digital objects
Mutable do’s can be changed once placed in a repository
–key-metadata cannot be changed
–the do’s handle never changes!
Immutable do’s cannot be changed once placed in a repository
–however, they can be deleted
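These rules can be expressed as checks inside a repository. The following is a minimal sketch with hypothetical class and method names: updates replace only the data, never the handle, and immutable do’s reject updates but can still be deleted.

```python
class ImmutableObjectError(Exception):
    """Raised when an update is attempted on an immutable do."""


class Repository:
    def __init__(self):
        self._objects = {}  # handle -> {"data": ..., "mutable": ...}

    def deposit(self, handle, data, mutable):
        self._objects[handle] = {"data": data, "mutable": mutable}

    def update(self, handle, new_data):
        record = self._objects[handle]
        if not record["mutable"]:
            raise ImmutableObjectError(handle)
        record["data"] = new_data  # only the data changes; the handle stays

    def delete(self, handle):
        del self._objects[handle]  # allowed for immutable do's as well

    def data(self, handle):
        return self._objects[handle]["data"]

    def holds(self, handle):
        return handle in self._objects


repo = Repository()
repo.deposit("hdl:example/draft", b"v1", mutable=True)
repo.deposit("hdl:example/final", b"v1", mutable=False)

repo.update("hdl:example/draft", b"v2")  # accepted
try:
    repo.update("hdl:example/final", b"v2")
except ImmutableObjectError:
    print("update rejected for immutable do")
repo.delete("hdl:example/final")         # deletion is still possible
```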

22 Handles
Guest lecture by Professor Arms, 02/19

23 Repositories
A network-accessible storage system in which digital objects may be stored for possible subsequent access or retrieval
A stored do is a do that resides in a repository
A registered do is a do that the repository has registered with a handle server
–storing and registering can be the same or different processes

24 Repositories
A repository keeps a properties record for each do
–contains key-metadata and any other metadata the repository chooses to keep
A do may have a transaction record associated with it in a repository
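The two records can be sketched as per-handle bookkeeping inside the repository. The names below are hypothetical, and the content of the transaction record (a simple list of operation names) is an assumption, since the slides do not say what it holds.

```python
from dataclasses import dataclass, field


@dataclass
class PropertiesRecord:
    handle: str  # key-metadata
    other_metadata: dict = field(default_factory=dict)


class Repository:
    def __init__(self):
        self._data = {}
        self.properties = {}    # handle -> PropertiesRecord
        self.transactions = {}  # handle -> list of logged operations

    def deposit(self, handle, data, **other_metadata):
        self._data[handle] = data
        self.properties[handle] = PropertiesRecord(handle, dict(other_metadata))
        self._log(handle, "deposit")

    def access(self, handle):
        data = self._data[handle]
        self._log(handle, "access")
        return data

    def _log(self, handle, operation):
        # Assumed format: the transaction record is an append-only list.
        self.transactions.setdefault(handle, []).append(operation)


repo = Repository()
repo.deposit("hdl:example/7", b"data", terms_and_conditions="read-only")
repo.access("hdl:example/7")
print(repo.properties["hdl:example/7"], repo.transactions["hdl:example/7"])
```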

25 Repository Access Protocol
“Protocol” may be misleading; it’s really just the concept for a protocol
RAP is designed to be simple; higher level services should come from other protocols
KWF defines 3 basic operation classes:
–ACCESS_DO [metadata; key-metadata, digital object]
  A dissemination of a do is the result of a request to access a do
–DEPOSIT_DO [metadata; key-metadata, digital object]
–ACCESS_REF
  this is a means to tell the world about other ways (protocols) to access do’s in the repository.
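The three operation classes can be sketched as methods on a repository object. This is an illustration only: KWF describes RAP as a concept rather than a concrete protocol, so the method signatures, the dictionary-shaped dissemination, and the example access methods are all assumptions.

```python
class InMemoryRepository:
    def __init__(self, name, other_access_methods=()):
        self.name = name
        self._objects = {}
        self._other_access_methods = list(other_access_methods)

    def deposit_do(self, handle, data, metadata=None):
        """DEPOSIT_DO: store a do under its handle."""
        self._objects[handle] = {"data": data, "metadata": dict(metadata or {})}

    def access_do(self, handle):
        """ACCESS_DO: return a dissemination of the do."""
        stored = self._objects[handle]
        return {
            "handle": handle,
            "data": stored["data"],
            "metadata": dict(stored["metadata"]),
        }

    def access_ref(self):
        """ACCESS_REF: list other ways (protocols) to access do's here."""
        return list(self._other_access_methods)


# Hypothetical repository that also advertises HTTP and FTP access.
repo = InMemoryRepository("repo-a", other_access_methods=["http", "ftp"])
repo.deposit_do("hdl:example/3", b"report", {"title": "A report"})
dissemination = repo.access_do("hdl:example/3")
print(dissemination["metadata"]["title"], repo.access_ref())
```

Keeping the interface this small mirrors the stated design goal: anything richer, such as searching, would be provided by other protocols layered alongside it.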

26 Terms and Conditions
TC are attached to:
–each do
–each dissemination
–each repository
TC are a precondition for any operation on the above
Repositories are responsible for enforcing TC

27 Terms and Conditions
[Figure 1 from 95 TR-1593: terms and conditions attached to the repository, the digital object, and the dissemination; entities shown are repository, digital object, dissemination, and data, linked with cardinalities 1 and N]

28 Digital Objects: Terms and Conditions
Set by originator and/or repository
Can be arbitrarily complex, but generally consist of:
–permissions: read, write, etc.
–authentication: person, group, etc.
–payment
–3rd party intervention (possibly in support of the above)
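Enforcement of terms and conditions as a precondition can be sketched as a check that the repository runs before each operation. The sketch covers only the permissions component; the class names, the user names, and the representation of permissions as a mapping from operation to allowed users are hypothetical.

```python
from dataclasses import dataclass, field


class TermsViolation(Exception):
    """Raised when an operation does not satisfy the terms and conditions."""


@dataclass
class TermsAndConditions:
    # operation name -> set of users allowed to perform it (assumed model)
    permissions: dict = field(default_factory=dict)

    def allows(self, user, operation):
        return user in self.permissions.get(operation, set())


class Repository:
    def __init__(self):
        self._objects = {}  # handle -> (data, terms)

    def deposit(self, handle, data, terms):
        self._objects[handle] = (data, terms)

    def access(self, handle, user):
        data, terms = self._objects[handle]
        # The repository enforces the terms before performing the operation.
        if not terms.allows(user, "read"):
            raise TermsViolation(f"{user} may not read {handle}")
        return data


repo = Repository()
terms = TermsAndConditions(permissions={"read": {"alice"}})
repo.deposit("hdl:example/9", b"restricted report", terms)

print(repo.access("hdl:example/9", "alice"))
try:
    repo.access("hdl:example/9", "bob")
except TermsViolation as error:
    print(error)
```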

29 Readings
Kahn, R. & Wilensky, R. 1995. A Framework for Distributed Digital Object Services. http://WWW.CNRI.Reston.VA.US/home/cstr/arch/k-w.html
Arms, W.Y. 1995. Key Concepts in the Architecture of the Digital Library. In: D-Lib Magazine. http://www.dlib.org/dlib/July95/07arms.html
VanHeyningen, M. 1994. The Unified Computer Science Technical Report Index: Lessons in indexing diverse resources. http://www.cs.indiana.edu/ucstri/paper/paper.html

