1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel


Slide 1: CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de Sompel
Lecture 5: A research perspective on Digital Libraries

Slide 2: DL Ancestry

Slide 3: URLs to some of these DLs
ADS: NCSTRL: UCSTRI: arXiv: LTRS: NTRS:

Slide 4: DL Architectural Review
Assumptions made in this perspective
  – things start with TCP/IP connectivity
  – DLs distribute full content (reports, software, etc.), not only metadata

Slide 5: DL Architecture History, approach 1
1. Build a special client and server (generally using Motif/X11, Tcl/Tk, etc.), and use TCP/IP as the transport protocol only
pros: rich functionality
cons: high development cost, client distribution problem
observation: many of these projects spent more time building the interfaces, protocols, searching, etc. than populating their DL!

Slide 6: DL Architecture History, approach 2
2. Use standard protocols built upon TCP/IP: SMTP, FTP, Gopher, WAIS, HTTP, etc.
cons: less functionality (restricted by protocol)
pros: lower development cost, uses commonly available clients
observation: this approach is now the most common
The ones listed on slide 2 fit into this category

Slide 7: Early TCP/IP DLs
A very old one: IETF: Internet RFCs
Very first TCP/IP DL?

Slide 8: Early TCP/IP DLs
Netlib
  – begun in 1985, distributing mathematical software via email (SMTP)
  – other access methods and protocols added later (ftp, X11 client, http)

Slide 9: Netlib 1995

Slide 10: Netlib 2001

Slide 11: Los Alamos arXiv
Physics pre-print server
  – begun in 1991 as an email service to exchange TeX source of pre-prints in high energy physics
  – ftp, http access added shortly afterwards
  – now THE communication channel in Physics
  – Paul Ginsparg

Slide 12: Characteristics of early TCP/IP, non-HTTP DLs
Useful
  – could get the “thing” that you were looking for
Constrained by transport protocol
  – SMTP, FTP, etc. interface inherently “clunky”
  – higher level services such as searching, sophisticated browsing, etc. difficult to implement
Small scale
  – would the same systems work well if the holdings went from hundreds or thousands to millions?

Slide 13: Characteristics of early TCP/IP, HTTP DLs
Initial HTTP implementations / conversions pretty much provided incremental steps in DL improvement
  – a “nice” ftp interface, maybe with better searching and browsing
  – but the nature of the DLs changed little
LTRS is an example of an http DL that is really: FTP + Searching (WAIS) + Browsing
Also check out user interface of

Slide 14: Early TCP/IP, HTTP DLs
But http is a very general transport protocol, and it is possible to build even higher level protocols on top of it
Combine this with the expressive HTTP client (web browser), and there is a lot of potential
Dienst
  – builds an actual DL protocol on top of HTTP
  – the first to do so?
Open Archives Initiative: metadata harvesting protocol on top of HTTP
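
To make "a protocol on top of HTTP" concrete, here is a minimal sketch of a metadata-harvesting request expressed as an ordinary HTTP GET URL. The verb and argument names (`ListRecords`, `metadataPrefix`, `oai_dc`) are taken from the OAI metadata harvesting protocol; the repository address and the helper name `harvest_url` are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode


def harvest_url(base_url, verb, **arguments):
    """Build a harvesting request as a plain HTTP GET URL."""
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


# Hypothetical repository address, for illustration only.
BASE_URL = "http://repository.example.org/oai"

url = harvest_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

The point of the sketch is that the DL-level request (which records, in which metadata format) travels entirely in the query string, so any standard HTTP client can issue it.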

Slide 15: Sophistication increases, tracks meet
[Figure. Axes: sophistication, time. Tracks: research track, library automation track. Labels: ftp / gopher; http (LTRS, e-print, Netlib, etc.); http Dienst]

Slide 16: A Framework for Distributed Digital Object Services
Kahn/Wilensky Framework (KWF) [Kahn 1995]
A high level document
Almost a definition of key concepts, terminology, … for next generation DLs
Foundation for a research discipline?
Not detailed enough to be a real architecture
Architecture is independent of the type of data stored in the DL

Slide 17: KWF: key terms
digital object (do)
  – a do is a data structure that contains
      digital data; data is typed (cf. MIME)
      persistent key-metadata; especially the handle
      other metadata (for instance Terms and Conditions)
handle
  – a handle is a unique, persistent name for a do
repository
  – the place where do’s live
  – has a unique global name
Repository Access Protocol (RAP)
  – to deposit/access do’s in repositories

Slide 18: KWF: flow
An originator makes a digital object, which consists of data and key-metadata (the handle)
The handle comes from a handle generator
The do’s handle is registered with a handle server, at which point the do becomes a registered do
The do can go in a repository, at which point the do becomes a stored do
The repository keeps, per do:
  – a properties record (key metadata: handle; other metadata: terms and conditions)
  – a transaction record
A client accesses/deposits the do in repositories by means of the Repository Access Protocol
What the client receives as a result of an access to a do is a dissemination
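
The flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `prefix/number` handle format, and the idea that the handle server maps a handle to a repository name are all hypothetical choices, since KWF itself stays at the conceptual level.

```python
import itertools


class HandleGenerator:
    """Hands out new handles; the 'prefix/number' format is an assumption."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._numbers = itertools.count(1)

    def new_handle(self):
        return f"{self._prefix}/{next(self._numbers)}"


class HandleServer:
    """Records which repository holds each registered handle (assumed mapping)."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, repository_name):
        self._locations[handle] = repository_name

    def is_registered(self, handle):
        return handle in self._locations


class Repository:
    """Keeps digital objects, indexed by their handle."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def store(self, digital_object):
        self._objects[digital_object["handle"]] = digital_object

    def is_stored(self, handle):
        return handle in self._objects


# The originator makes a do: data plus key-metadata (the handle).
generator = HandleGenerator("example")
do = {"handle": generator.new_handle(), "data": b"technical report"}

# Placing the do in a repository makes it a stored do.
repository = Repository("example-repository")
repository.store(do)

# Registering its handle with a handle server makes it a registered do.
handle_server = HandleServer()
handle_server.register(do["handle"], repository.name)
```

Storing and registering are kept as two separate calls here, which matches the remark elsewhere in the lecture that they can be the same or different processes.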

Slide 19: Digital objects
do = data + key-metadata
  – data is typed; core types include:
      bit-sequence / set-of-bit-sequences
      digital-object / set-of-digital-objects
      handle / set-of-handles
  – other types can be defined and registered with a global type registry
      definition and registration left undefined
      similar to MIME
  – key-metadata includes the handle
  – possibly other metadata (left undefined in KWF)

Slide 20: Digital objects
Composite do’s:
  – a do with data of type digital-object
  – non-composite do’s are elemental do’s
  – composite do’s can, for instance, be used to collect similar works together
      a composite do that contains a do for each work of Shakespeare...
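
A minimal sketch of typed data and of the composite/elemental distinction follows. The type names are the core types listed in the lecture; the class layout, the example handles, and the decision to treat `set-of-digital-objects` as composite alongside `digital-object` are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    handle: str      # key-metadata
    data_type: str   # one of the core type names, e.g. "bit-sequence"
    data: object     # the typed data itself


# Assumption: a set of digital objects also counts as composite data.
COMPOSITE_TYPES = {"digital-object", "set-of-digital-objects"}


def is_composite(do):
    """A do is composite when its data is itself made of digital objects."""
    return do.data_type in COMPOSITE_TYPES


# Hypothetical handles, for illustration only.
hamlet = DigitalObject("example/hamlet", "bit-sequence", b"To be, or not to be")
macbeth = DigitalObject("example/macbeth", "bit-sequence", b"Tomorrow, and tomorrow")

# One composite do collecting a do for each work.
works = DigitalObject("example/shakespeare", "set-of-digital-objects", (hamlet, macbeth))
```

Here `hamlet` and `macbeth` are elemental do's, while `works` is a composite do whose data is the collection of the other two.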

Slide 21: Changing digital objects
Mutable do’s can be changed once placed in a repository
  – key-metadata cannot be changed
  – the do’s handle never changes!
Immutable do’s cannot be changed once placed in a repository
  – however, they can be deleted
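
These rules can be sketched as checks a repository might apply. The class and method names below are hypothetical; the sketch only encodes the three statements above: the handle is read-only, immutable do's reject changes, and immutable do's can still be deleted.

```python
class ImmutableObjectError(Exception):
    """Raised when a change is attempted on an immutable do."""


class StoredObject:
    def __init__(self, handle, data, mutable):
        self._handle = handle
        self.data = data
        self.mutable = mutable

    @property
    def handle(self):
        # Read-only: there is no setter, so the handle never changes.
        return self._handle


class Repository:
    def __init__(self):
        self._objects = {}

    def deposit(self, obj):
        self._objects[obj.handle] = obj

    def update(self, handle, new_data):
        obj = self._objects[handle]
        if not obj.mutable:
            raise ImmutableObjectError(handle)
        obj.data = new_data

    def delete(self, handle):
        # Allowed for mutable and immutable do's alike.
        del self._objects[handle]

    def contains(self, handle):
        return handle in self._objects


# Hypothetical handles, for illustration only.
repo = Repository()
repo.deposit(StoredObject("example/draft", b"v1", mutable=True))
repo.deposit(StoredObject("example/final", b"v1", mutable=False))

repo.update("example/draft", b"v2")   # fine: the do is mutable
```

Calling `repo.update("example/final", b"v2")` raises `ImmutableObjectError`, while `repo.delete("example/final")` succeeds.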

Slide 22: Handles
Guest lecture by Professor Arms 02/19

Slide 23: Repositories
A network accessible storage system in which digital objects may be stored for possible subsequent access or retrieval
A stored do is a do that resides in a repository
A registered do is a do that the repository has registered with a handle server
  – storing and registering can be the same or different processes

Slide 24: Repositories
A repository keeps a properties record for each do
  – contains key-metadata and any other metadata the repository chooses to keep
A do may have a transaction record associated with it in a repository

Slide 25: Repository Access Protocol
“Protocol” may be misleading; it’s really just the concept for a protocol
RAP is designed to be simple; higher level services should come from other protocols
KWF defines 3 basic operation classes:
  – ACCESS_DO [metadata; key-metadata, digital object]
      a dissemination of a do is the result of a request to access a do
  – DEPOSIT_DO [metadata; key-metadata, digital object]
  – ACCESS_REF
      a means to tell the world about other ways (protocols) to access do’s in the repository
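
The three operation classes can be sketched as methods on a repository object. KWF does not define signatures, so everything below is an assumption: the method names, the `part` argument (one reading of the bracketed "metadata; key-metadata, digital object" note), and the dictionary layout of what is stored and returned.

```python
class Repository:
    def __init__(self, name, other_protocols=()):
        self.name = name
        self._objects = {}
        self._other_protocols = list(other_protocols)

    def deposit_do(self, handle, data, metadata=None):
        """DEPOSIT_DO: place a do in the repository."""
        self._objects[handle] = {
            "key-metadata": {"handle": handle},
            "metadata": dict(metadata or {}),
            "data": data,
        }

    def access_do(self, handle, part="digital object"):
        """ACCESS_DO: return a dissemination of the requested part."""
        stored = self._objects[handle]
        if part == "digital object":
            return dict(stored)
        return stored[part]  # "metadata" or "key-metadata"

    def access_ref(self):
        """ACCESS_REF: other protocols through which do's can be reached."""
        return list(self._other_protocols)


# Hypothetical repository name and handle, for illustration only.
repo = Repository("example-repository", other_protocols=["http", "ftp"])
repo.deposit_do("example/report-1", b"report text", {"title": "A report"})

dissemination = repo.access_do("example/report-1")
key_metadata = repo.access_do("example/report-1", part="key-metadata")
```

The interface is deliberately small: searching or browsing would be layered on top by other services, in line with the remark that RAP is designed to be simple.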

Slide 26: Terms and Conditions
TC are attached to:
  – each do
  – each dissemination
  – each repository
TC are a precondition for any operation on the above
Repositories are responsible for enforcing TC

Slide 27: Terms and Conditions
[Figure 1 from 95 TR-1593. Labels: repository, digital object, dissemination, data; terms and conditions are shown attached to the repository, the digital object, and the dissemination]

Slide 28: Digital Objects: Terms and Conditions
Set by originator and/or repository
Can be arbitrarily complex, but generally consist of:
  – permissions: read, write, etc.
  – authentication: person, group, etc.
  – payment
  – 3rd party intervention (possibly in support of the above)
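
As a rough sketch of terms and conditions acting as a precondition, the code below checks every attached set of terms before an operation is allowed. It models only the permissions aspect (who may read or write); authentication, payment, and third-party intervention are left out. All names, including `enforce` and the group names, are hypothetical.

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when some attached terms forbid the operation."""


@dataclass
class TermsAndConditions:
    # Maps a person or group name to the operations it may perform.
    permissions: dict = field(default_factory=dict)

    def allows(self, principal, operation):
        return operation in self.permissions.get(principal, set())


def enforce(principal, operation, *attached_terms):
    """Raise unless every attached set of terms permits the operation."""
    for terms in attached_terms:
        if not terms.allows(principal, operation):
            raise AccessDenied(f"{principal} may not {operation}")


# Terms attached to the repository and to one digital object.
repository_terms = TermsAndConditions({"students": {"read"}, "staff": {"read", "write"}})
object_terms = TermsAndConditions({"students": {"read"}, "staff": {"read"}})

# Passes silently: both sets of terms allow students to read.
enforce("students", "read", repository_terms, object_terms)
```

A write by `staff` is rejected here even though the repository allows it, because the terms on the object itself do not; the repository is the place where such a check would run.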

Slide 29: Readings
Kahn, R. & Wilensky, R. A Framework for Distributed Digital Object Services. w.html
Arms, W.Y. Key Concepts in the Architecture of the Digital Library. In: D-Lib Magazine.
VanHeyningen, M. The Unified Computer Science Technical Report Index: Lessons in indexing diverse resources.