OAI and ODL Building Digital Libraries from Components Ryan Richardson Virginia Tech DLRL 18 September 2003.

Similar presentations
OAI from 50,000 Feet OAI develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. Begun in 1999.
Rapid Visual OAI Tool S. Kothamasa, K. Maly, M. Zubair (Old Dominion University) X. Liu (Los Alamos National Laboratory) RCDL 2003, St. Petersburg.
Y.T. a brief history of the OAI 0 Source: Herbert van de Sompel.
OAI in DigiTool DigiTool Version 3.0.
Harvesting Metadata Using OAI-PMH Roy Tennant California Digital Library.
OAI-PMH Dawn Petherick, University Web Services Team Manager, Information Services, University of Birmingham MIDESS Dissemination.
ETD’s at the University of Saskatchewan or… David Fox & Darryl Friesen University of Saskatchewan October 4, 2003.
UCLA Digital Library UC Digital Library Forum August 5, 2002 UCLA Digital Library Presenter: Curtis Fornadley Senior Programmer/Analyst.
OAI Standards for Sheet Music Meeting March 28-29, 2002 Basic OAI Principals How They Apply to Sheet Music Presenter: Curtis Fornadley, Senior Programmer/Analyst.
OAI-PMH at Yale Report on the DLF OAI Training Session November 10, 2005 Charlottesville, VA.
Basic Concepts Architecture Topology Protocols Basic Concepts Open e-Print Archive Open Archive -- generalization of e-print Data Provider and Service.
Digital Library in a Box Ming Luo, Hussein Suleman, Edward Fox Virginia Tech Subcontract to Collaborative Project led by University of Florida (also with.
Introduction to the OAI Metadata Harvesting Protocol Hussein Suleman, Digital Library Research Laboratory Virginia Tech.
How to participate in the Union Catalogue Project Hussein Suleman Sivulile – Open Access South Africa Advanced Information Management.
Metadata Harvesting The Hague, 13 & 14 January 2009 Julie Verleyen Scientific Coordinator, Europeana Office EuropeanaLocal Knowledge Sharing Workshop.
Metadata Harvesting Interoperable digital collections.
Open Archives Iniative – Protocol for Metadata Harvesting Iztok Kavkler, University of Ljubljana Some slides by Stefaan Ternier, KUL Bram Vandenputte,
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
ALA 2002 LITA Open Source Software Open Archives Initiative Hussein Suleman AmericanSouth.org 14 June 2002.
Dec 9-11, 2003 ICADL Challenges in Building Federation Services over Harvested Metadata Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
LIS 654 BUILDING DIGITAL LIBRARIES FALL 2011 NOVEMBER 03, 2011 The OAI-PMH Harvester Plugin for The Omeka Content Management System JAMES R. GRIFFIN III.
OAI-PMH The Open Archives Initiative Protocol for Metadata Harvesting Presenter: Knud Möller Friday,
1 OAI-PMH harvester for agricultural knowledge gathering (Development, testing and implementation) Francesco Castellani and Stefka Kaloyanova 4 February.
Introduction to Digital Libraries hussein suleman uct cs honours 2004.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting T.B. Rajashekar National Centre for Science Information (NCSI) Indian Institute of Science,
Digital Library Component Models hussein suleman uct cs honours 2005.
The OAI Protocol for Metadata Harvesting Van de Sompel, Herbert Los Alamos National Laboratory – Research Library.
Metadata harvesting in regional digital libraries in PIONIER Network Cezary Mazurek, Maciej Stroiński, Marcin Werla, Jan Węglarz.
Digital Library Interoperability Architecture CS 502 – Carl Lagoze – Cornell University.
Tsinghua University Library Yang Zhao & Airong Jiang Tsinghua University Library, Beijing China 4 June, 2004 Electronic Thesis and Dissertation System.
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Phil Barker, March © Heriot-Watt University. You may reproduce all or any part.
Open Archive Initiative – Protocol for metadata Harvesting (OAI-PMH) Surinder Kumar Technical Director NIC, New Delhi
Caltech CODA CODA: Collection of Digital Archives Caltech Scholarly Communication.
Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.
1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
OAI Protocol for Metadata Harvesting hussein suleman uct cs honours 2006.
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
Bitter Harvest Metadata Harvesting Issues, Problems, and Possible Solutions Roy Tennant California Digital Library.
Search Interoperability, OAI, and Metadata Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond Grainger Engineering Library April.
SPASE and the VxOs Jim Thieman Todd King Aaron Roberts.
Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives Hussein Suleman, Digital Library Research.
Building Interoperable and Accessible ETD Collections: A Practical Guide to Creating Open Archives Hussein Suleman, Digital.
The OAI: technical overview OAI Open Meeting – Washington DC – January 23 rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University -- Computer Science.
The Open Archives Initiative Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University
Open Archives Initiative Protocol for Metadata Harvesting.
ETD Search Services Ming Luo Edward A. Fox Virginia Tech.
ETDs and NDLTD Hussein Suleman University of Cape Town May 2004.
Designing Protocols in Support of Digital Library Componentization Hussein Suleman and Edward A. Fox Digital Library Research Laboratory Virginia Tech.
2/22/2016 J Ammerman Open Archives Initiative What is it? What’s it good for?
NSDL & the Open Archives Initiative A Brief Introduction to OAI Timothy W. Cole Mathematics Librarian & Professor of Library Administration.
Open Digital Libraries Edward A. Fox Virginia Tech, Dept. of Computer Science.
Introduction to the OAI Protocol for Metadata Harvesting Version 2.0 Hussein Suleman Virginia Tech DLRL 25 March 2002.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
The NSDL, OAI and Your Metadata Core Infrastructure Metadata Repository (“union catalog”) Naomi Dushay Cornell University.
NDLTD Toward Universal Accessibility of ETDs: Building the NDLTD Union Archive Hussein Suleman, Edward A. Fox,
OAI and ODL Building Digital Libraries from Components Hussein Suleman Virginia Tech DLRL 12 September 2002.
NDLTD Standards, Metadata and the OAI-PMH Hussein Suleman University of Cape Town October 2003.
OAI Protocol for Metadata Harvesting hussein suleman uct cs honours 2009.
Getting a Leg Up on OAI for the NSDL
Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives Hussein Suleman, Digital Library Research Laboratory.
Georges Arnaout Chaitanya Krishna
OAI and Metadata Harvesting
Open Archive Initiative
IVOA Interoperability Meeting - Boston
Presentation transcript:

OAI & ODL - CS6604

Outline
1. Introduction to OAI
2. Definitions and Concepts
3. OAI Protocol for Metadata Harvesting
4. Introduction to ODL
5. OAI and ODL Components

Introduction to OAI
What is the Open Archives Initiative?
– A group of people and organizations dedicated to solving problems of digital library interoperability by developing simple protocols.
Major accomplishment:
– The Protocol for Metadata Harvesting (OAI-PMH)

What is the OAI-PMH?
What is the Protocol for Metadata Harvesting?
– A network protocol to transfer metadata from one archive to another:
   – any metadata (XML-encoded data records)
   – in a continuous stream
   – as simply as possible

General System Strategy
[Diagram: layers labelled Services, Metadata Harvesting and Document Model]

Case Study: AmericanSouth
– Digital library of resources related to Southern history and culture
– Multiple independent university-based collections of electronic documents
[Diagram: collections at Emory, UTK and Virginia Tech feed the AmericanSouth.org portal via the OAI Protocol for Metadata Harvesting]

Versions of OAI-PMH
– v1.0: January 2001
– v1.1: July 2001 (minor revision of v1.0)
– v2.0: June 2002 (mostly syntactical changes)
These notes are based on version 2.0!

Definitions / Concepts
Basic principles
– What is an Open Archive?
– Harvesting vs. federation
– Data and service providers
Underlying technology
– HTTP and XML
Protocol policies
– What is a record?
– Multiplicity of metadata
– Sets
– Datestamps, harvesting and flow control

What is an Open Archive?
Any WWW-based system that can be accessed through the well-defined interface of the OAI Protocol for Metadata Harvesting, also known as an OAI-compliant repository.
No implications for:
– physical storage of data
– cost of data
– metadata and data formats
– access control to the server

Harvesting vs. Federation
Competing approaches to interoperability:
– Federation: services are run remotely on remote data (e.g. meta-searching).
– Harvesting: data/metadata is transferred from the remote source to the destination where the services are located (e.g. union catalogues).
Federation requires more effort at each remote source but is easier for the local system; the reverse holds for harvesting.
OAI currently focuses on harvesting.

Data and Service Providers
– Data providers are entities that possess data/metadata and are willing to share it with others (internally or externally) via well-defined OAI protocols (e.g. database servers).
– Service providers are entities that harvest data from data providers in order to provide higher-level services to users (e.g. search engines).
– In networking terms, the data provider is a network server and a service provider connects to that server as a client.

HTTP and XML
– The Protocol for Metadata Harvesting is an almost stateless request/response protocol.
– Requests and responses are sent via HTTP.
– Requests are encoded as HTTP GET or POST operations.
– Responses are well-formed XML documents.
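Because every response is an XML document, a harvester usually starts by parsing it and checking for a protocol error. The following is a minimal sketch using Python's standard ElementTree; the sample document, its date and the example.org URL are made up for illustration. In OAI-PMH v2.0, protocol errors such as badVerb are reported inside the XML as an error element carrying a code, which is what the check looks for.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH v2.0 response documents.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def parse_response(xml_text: str) -> ET.Element:
    """Parse an OAI-PMH response; raise if it is not one or if it reports an error."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed XML
    if root.tag != f"{{{OAI_NS}}}OAI-PMH":
        raise ValueError(f"not an OAI-PMH response: {root.tag}")
    error = root.find(f"{{{OAI_NS}}}error")
    if error is not None:
        # Protocol errors arrive inside an ordinary XML response, tagged with a code.
        raise RuntimeError(f"{error.get('code')}: {(error.text or '').strip()}")
    return root


# Illustrative response; the date and URL are made up.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2000-01-01T00:00:00Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Archive</repositoryName>
  </Identify>
</OAI-PMH>"""

root = parse_response(SAMPLE)
print(root.findtext(f"{{{OAI_NS}}}Identify/{{{OAI_NS}}}repositoryName"))  # Example Archive
```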

What is a record?
– A record is an independent XML structure that may be associated with digital or physical objects.
– Records are usually associated with metadata, not data.
– OAI advocates harvesting of records, which contain metadata plus additional fields that support the harvesting operation.

As Compared to Z39.50
Aspect | Z39.50 | OAI
Content (objects) | Distributed | Distributed
World view | Bibliographic | Bibliographic
Object presentation | Data provider | Data provider
Searching is | Distributed | Centralized
Search done by | Data provider | Service provider
Metadata searched is | Up to date | Stale
Semantic mapping | When searching | At metadata delivery

What OAI Is Not
– Not search
– Not a database
– Not metadata
– Not OAIS (the Open Archival Information System reference model)

What OAI Is Good For
– Where content is widely distributed, in different kinds of locations that are not Z39.50-enabled
   – A metadata provider is more lightweight than a Z39.50 server
   – Metadata providers scale well
– Where the service provider scales according to its search capability
– Where metadata is sufficient for the services desired
– Where normalization, de-duplication or augmentation is desired
Not mutually exclusive: portals can use both Z39.50 and OAI.

Sample OAI Record
[XML record; surviving values: oai:sigir:ws, "OAI Workshop at SIGIR", Hussein Suleman, English, oai:sigir:ws3md]

Multiplicity of Metadata
– Multiple metadata formats are allowed.
– Dublin Core is mandatory.
– Any other format is allowed as long as it has an XML encoding, e.g. MARC (libraries), IMS (education), ETDMS (theses/dissertations), RFC 1807 (bibliographies).

Sets
– A protocol mechanism that allows harvesting of sub-collections
– No well-defined semantics: their meaning depends entirely on the local data provider
– May be defined by arrangement between data providers and service providers
– E.g. subject areas, years, author names, search queries
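Since set semantics are left to each data provider, the sketch below shows only one possible provider-side convention. It assumes hierarchical set names separated by colons (the setSpec syntax of OAI-PMH v2.0) and assumes that requesting a parent set also returns records filed in its subsets; the record identifiers and set names are hypothetical.

```python
from typing import Dict, List

# Hypothetical records, each tagged with one or more hierarchical set names.
RECORDS: Dict[str, List[str]] = {
    "oai:example.org:1": ["history"],
    "oai:example.org:2": ["history:south"],
    "oai:example.org:3": ["culture"],
    "oai:example.org:4": ["historyofscience"],
}


def in_set(record_sets: List[str], requested: str) -> bool:
    """True if any of the record's sets equals `requested` or lies below it."""
    # Compare against "requested:" so that "history" does not match "historyofscience".
    return any(s == requested or s.startswith(requested + ":") for s in record_sets)


def select_by_set(records: Dict[str, List[str]], requested: str) -> List[str]:
    """Identifiers of the records a set-restricted harvest would return."""
    return sorted(i for i, sets in records.items() if in_set(sets, requested))


print(select_by_set(RECORDS, "history"))
# ['oai:example.org:1', 'oai:example.org:2']
```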

Datestamps & Harvesting
– Each record needs a datestamp that indicates its date of creation or modification.
– Datestamps allow harvesting by date range, enabling incremental and continuous transfer of metadata from a data provider to a service provider.
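Date-range harvesting can be sketched from the data provider's side: select the records whose datestamps fall between the requested from and until dates, both bounds inclusive as in OAI-PMH. This is a minimal illustration assuming day-granularity datestamps; the in-memory store and its identifiers are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

# Hypothetical store: identifier -> datestamp of last change (day granularity).
RECORDS: Dict[str, date] = {
    "oai:example.org:1": date(2003, 1, 15),
    "oai:example.org:2": date(2003, 6, 2),
    "oai:example.org:3": date(2003, 9, 18),
}


def select_by_date(records: Dict[str, date],
                   from_: Optional[date] = None,
                   until: Optional[date] = None) -> List[str]:
    """Identifiers whose datestamp lies in [from_, until]; a missing bound is open."""
    return sorted(
        ident for ident, stamp in records.items()
        if (from_ is None or stamp >= from_) and (until is None or stamp <= until)
    )


# A harvester that last ran on 2003-06-02 asks only for what changed since then.
print(select_by_date(RECORDS, from_=date(2003, 6, 2)))
# ['oai:example.org:2', 'oai:example.org:3']
```

Because the bounds are inclusive, a harvester that reuses its last harvest date as the next from value may see a few records twice; that overlap is harmless, whereas a gap would silently lose changes.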

Flow Control
– The HTTP "Retry-After" mechanism (sent with a 503 Service Unavailable status) can be used to make a client delay its request.
– Resumption tokens can be used to return partial results: the client is issued a token, which it presents to the server to receive more results.
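Both flow-control mechanisms fit into one harvesting loop, sketched below under stated assumptions. The fetch function and the RetryLater exception are hypothetical: fetch(None) stands for the initial request (verb, metadataPrefix and so on), fetch(token) for a follow-up request carrying only the resumptionToken, and RetryLater for an HTTP 503 response with a Retry-After header. A missing or empty token marks the last batch.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple


class RetryLater(Exception):
    """Hypothetical signal a fetch function raises on HTTP 503 with Retry-After."""

    def __init__(self, seconds: int):
        super().__init__(seconds)
        self.seconds = seconds


# fetch(token) -> (records in this batch, next resumptionToken or None/"")
Fetch = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def harvest_all(fetch: Fetch, sleep: Callable[[float], None] = time.sleep) -> Iterator[str]:
    """Yield every record of a list request, following resumption tokens."""
    token: Optional[str] = None  # None means "send the initial request"
    while True:
        try:
            batch, next_token = fetch(token)
        except RetryLater as delay:
            sleep(delay.seconds)  # wait as the server asked, then repeat the same request
            continue
        yield from batch
        if not next_token:  # missing or empty token: this was the last batch
            return
        token = next_token


# A fake server: three partial responses chained by tokens, busy on the first call.
PAGES = {None: (["r1", "r2"], "t1"), "t1": (["r3", "r4"], "t2"), "t2": (["r5"], "")}
calls: List[Optional[str]] = []


def fake_fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
    calls.append(token)
    if len(calls) == 1:
        raise RetryLater(5)
    return PAGES[token]


waits: List[float] = []
print(list(harvest_all(fake_fetch, sleep=waits.append)))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

Passing the sleep function in as a parameter keeps the retry logic testable without real delays.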

How OAI Works
OAI "verbs": Identify, ListSets, ListMetadataFormats, ListIdentifiers, GetRecord, ListRecords
[Diagram: the harvester (OAI service provider) sends an HTTP request carrying an OAI verb to the repository (metadata provider), which returns an HTTP response containing valid XML]

The baseURL
– Requests are sent by HTTP to baseURLs, with parameters appended.
– Responses are the documents returned by the server.
– The baseURL is the point of contact for communicating with a component!
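Forming a request is just a matter of appending the verb and its arguments to the baseURL as a URL-encoded query string. A minimal sketch with Python's urllib; the example.org baseURL is a placeholder. urlencode percent-encodes the colons in the identifier as %3A, the same encoding seen in the GetRecord request on the "Examples of OAI Requests" slide.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def request_url(base_url: str, verb: str, args: Optional[Dict[str, str]] = None) -> str:
    """Append the verb and its arguments to a baseURL as a GET query string."""
    # A dict rather than **kwargs, because "from" is a reserved word in Python.
    query = urlencode({"verb": verb, **(args or {})})
    return f"{base_url}?{query}"


print(request_url("http://example.org/cgi-bin/OAI", "GetRecord",
                  {"identifier": "oai:test:123", "metadataPrefix": "oai_dc"}))
# http://example.org/cgi-bin/OAI?verb=GetRecord&identifier=oai%3Atest%3A123&metadataPrefix=oai_dc
```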

Protocol for Metadata Harvesting
Service requests:
– Identify
– ListSets
– ListMetadataFormats
– ListIdentifiers
– GetRecord
– ListRecords
Metadata multiplicity
Date ranges
Resumption tokens
(Parameter flags on the following slides: R = required, O = optional, X = exclusive.)

Identify
Purpose
– Return general information about the archive and its policies
Parameters
– None
Sample URL
–

ListSets
Purpose
– Provide a hierarchical listing of sets in which records may be organized
Parameters
– resumptionToken – flow control mechanism (X)
Sample URL
–

ListMetadataFormats
Purpose
– List the metadata formats supported by the archive, with their schema locations and namespaces
Parameters
– identifier – for a specific record (O)
Sample URL
– bin/OAI?verb=ListMetadataFormats

ListIdentifiers
Purpose
– List headers for all items corresponding to the specified parameters
Parameters
– from – start date (O)
– until – end date (O)
– set – set to harvest from (O)
– metadataPrefix – metadata format to list identifiers for (R)
– resumptionToken – flow control mechanism (X)
Sample URL
– verb=ListIdentifiers&metadataPrefix=oai_dc
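The (R), (O) and (X) flags translate directly into a data provider's argument check. Below is a minimal sketch for the arguments shared by ListIdentifiers and ListRecords, returning the OAI-PMH error code badArgument on failure. It deliberately leaves out date-format validation and the check that the metadataPrefix is actually supported (which would yield cannotDisseminateFormat).

```python
from typing import Dict, Optional

# Arguments of ListIdentifiers / ListRecords in OAI-PMH v2.0 (besides "verb").
ALLOWED = {"from", "until", "set", "metadataPrefix", "resumptionToken"}


def check_list_args(args: Dict[str, str]) -> Optional[str]:
    """Return None if the arguments are acceptable, else the error code 'badArgument'."""
    if set(args) - ALLOWED:
        return "badArgument"  # unknown argument
    if "resumptionToken" in args:
        # (X) exclusive: a resumptionToken must be the only argument.
        return None if len(args) == 1 else "badArgument"
    if "metadataPrefix" not in args:
        return "badArgument"  # (R) required on an initial request
    return None  # from, until and set are (O) optional


print(check_list_args({"metadataPrefix": "oai_dc", "set": "cs"}))  # None
print(check_list_args({"resumptionToken": "abc", "set": "cs"}))    # badArgument
```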

GetRecord
Purpose
– Return the metadata for a single identifier in the form of an OAI record
Parameters
– identifier – unique id for the record (R)
– metadataPrefix – metadata format (R)
Sample URL
– verb=GetRecord&identifier=oai:test:123&metadataPrefix=oai_dc
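On the harvester side, a GetRecord response can be unpacked with namespace-aware lookups: the header lives in the OAI-PMH namespace and the Dublin Core fields in the dc namespace. This is a minimal sketch; the response is illustrative, reusing the identifier from the sample URL and the values from the "Sample OAI Record" slide, with a made-up datestamp and URL.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative GetRecord response; the datestamp and URL are made up.
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2000-01-01T00:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:test:123"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:test:123</identifier>
        <datestamp>2002-06-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>OAI Workshop at SIGIR</dc:title>
          <dc:creator>Hussein Suleman</dc:creator>
          <dc:language>English</dc:language>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def summarize_record(xml_text: str) -> dict:
    """Pull the header fields and Dublin Core titles out of a GetRecord response."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NS)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "titles": [t.text for t in record.iterfind(".//dc:title", NS)],
    }


print(summarize_record(RESPONSE))
```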

ListRecords
Purpose
– Retrieve metadata for multiple records
Parameters
– from – start date (O)
– until – end date (O)
– set – set to harvest from (O)
– resumptionToken – flow control mechanism (X)
– metadataPrefix – metadata format (R)
Sample URL
– verb=ListRecords&metadataPrefix=oai_dc&from=

Protocol Details
– OAI transaction = an OAI request (HTTP) and the corresponding OAI response (XML)
   – Optional: use resumptionToken and other flow control mechanisms to manage service load
– Item identifiers: persistence and uniqueness
– Item datestamps: date of the last metadata change; support selective harvesting

Examples of OAI Requests
– verb=ListMetadataFormats
– verb=ListIdentifiers&metadataPrefix=oai_dc&from=
– verb=GetRecord&metadataPrefix=oai_dc&identifier=oai%3Aacl.sr.language-archives.org%3AA

An OAI Response
[XML response document; surviving fragment: T19:20:30Z]

An OAI Record
[XML record; surviving fragments: oai:arXiv:cs/, cs, "Using Structural Metadata…"]

Unique Identifiers
– Each item must have a unique identifier.
– Identifiers must follow the rules for valid URIs, e.g.:
   – oai:<repository-identifier>:<local-identifier>
   – oai:etd.vt.edu:etd
– Each identifier must resolve to a single item, and always to the same item:
   – OAI item identifiers cannot be reused.
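The "never reuse" rule is easiest to honour if identifiers are minted by one component that remembers every local id it has handed out, including those of deleted items. The sketch below makes two assumptions. First, split_oai_identifier is only a loose structural check of the oai:<repository>:<local> form, not the full URI or OAI identifier grammar. Second, IdentifierRegistry is a hypothetical helper whose in-memory dict stands in for persistent storage.

```python
from typing import Dict, Tuple


def split_oai_identifier(oai_id: str) -> Tuple[str, str]:
    """Split 'oai:<repository>:<local>' into its two parts (loose check only)."""
    scheme, sep1, rest = oai_id.partition(":")
    repository, sep2, local = rest.partition(":")
    if scheme != "oai" or not (sep1 and sep2 and repository and local):
        raise ValueError(f"not an oai: identifier: {oai_id!r}")
    return repository, local  # the local part may itself contain ':' or '/'


class IdentifierRegistry:
    """Hands out identifiers and refuses to point an old one at a new item."""

    def __init__(self, repository: str):
        self.repository = repository
        # local id -> internal item key; entries are never removed, even on deletion
        self._owner: Dict[str, str] = {}

    def identifier_for(self, local_id: str, item_key: str) -> str:
        owner = self._owner.setdefault(local_id, item_key)
        if owner != item_key:
            raise ValueError(f"local id {local_id!r} already names item {owner!r}")
        return f"oai:{self.repository}:{local_id}"


registry = IdentifierRegistry("etd.vt.edu")
print(registry.identifier_for("etd-1", "thesis-A"))  # oai:etd.vt.edu:etd-1
```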

Datestamps
– Needed for every OAI record to support incremental harvesting
– Must be updated whenever a record is added, modified or deleted, so that changes are correctly propagated to harvesters
– Different from dates within the metadata: the OAI datestamp is used only for harvesting
– Can be either YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (must be in the GMT/UTC time zone)
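Producing datestamps correctly mostly means converting to UTC before formatting. A minimal sketch with Python's datetime module covers both granularities. It rejects naive datetimes, whose time zone is unknown, and the sample time is arbitrary. Note that at day granularity a local date near midnight can fall on a different UTC day.

```python
from datetime import datetime, timedelta, timezone

DAY = "%Y-%m-%d"
SECOND = "%Y-%m-%dT%H:%M:%SZ"


def to_datestamp(moment: datetime, seconds: bool = True) -> str:
    """Format a timezone-aware datetime as an OAI datestamp in UTC."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: time zone unknown")
    utc = moment.astimezone(timezone.utc)
    return utc.strftime(SECOND if seconds else DAY)


def from_datestamp(stamp: str) -> datetime:
    """Parse either datestamp granularity back into an aware UTC datetime."""
    fmt = SECOND if "T" in stamp else DAY
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


# 15:20:30 at UTC-4 is 19:20:30 UTC.
local = datetime(2003, 9, 18, 15, 20, 30, tzinfo=timezone(timedelta(hours=-4)))
print(to_datestamp(local))         # 2003-09-18T19:20:30Z
print(to_datestamp(local, False))  # 2003-09-18
```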

OAI Provider Architectures
[Diagram components: descriptive metadata (DBMS, XML, HTML); OAI administrative metadata (DBMS); OAI application (CGI, ASP, PHP, etc.); web server (HTTP); OAI harvesters]
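The "OAI application" box in such an architecture is essentially a dispatcher: read the verb from the HTTP query string, call the matching handler, and wrap the result in an OAI-PMH envelope. The minimal Python sketch below makes several simplifications. It implements only Identify, and in abbreviated form, since a real v2.0 Identify response has more required elements. It always leaves the request element without attributes and skips per-verb argument checks. The repository name and URL are placeholders. In a CGI deployment, the query string would come from the QUERY_STRING environment variable.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
VERBS = {"Identify", "ListSets", "ListMetadataFormats",
         "ListIdentifiers", "GetRecord", "ListRecords"}


def identify(base_url: str) -> str:
    # Abbreviated: v2.0 also requires protocolVersion, earliestDatestamp,
    # deletedRecord, granularity and adminEmail.
    return ("<Identify><repositoryName>Example Archive</repositoryName>"
            f"<baseURL>{escape(base_url)}</baseURL></Identify>")


HANDLERS = {"Identify": identify}  # the other five verbs are left out of this sketch


def handle(query: str, base_url: str) -> str:
    """Map the query string of one HTTP request to an OAI-PMH XML response."""
    verbs = parse_qs(query).get("verb", [])
    verb = verbs[0] if len(verbs) == 1 else None  # a missing or repeated verb is illegal
    if verb not in VERBS:
        body = '<error code="badVerb">Illegal or missing verb</error>'
    elif verb in HANDLERS:
        body = HANDLERS[verb](base_url)
    else:
        raise NotImplementedError(verb)
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (f'<?xml version="1.0" encoding="UTF-8"?>\n<OAI-PMH xmlns="{OAI_NS}">'
            f"<responseDate>{now}</responseDate>"
            f"<request>{escape(base_url)}</request>{body}</OAI-PMH>")


print(handle("verb=Identify", "http://example.org/oai"))
```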

Repository Explorer

Repository Explorer (RE): Parameter Testing

RE: Formatted View of Data

RE: Raw XML Views of Data

RE: Automatic Test Suite

RE: Error in XML

Introduction to ODL
Open Digital Libraries
– A framework for componentized digital libraries
– Design principles for components
– Protocols for inter-component communication
– Built upon OAI-PMH v1.1

Users and Objects
[Diagram: users on one side and digital objects (documents, programs, images, videos) on the other, with the connection between them marked "?"]

Digital Library
[Diagram: the same users and digital objects, connected by a monolithic and/or custom-built web-based digital library application]

Componentized DL
[Diagram: the same users and digital objects, connected by a componentized digital library made of many unspecified ("?") components]

How about OAI-PMH?
– Metadata transfer among digital libraries is almost the same as metadata exchange among components.
– A few changes are needed to support inter-component communication, including:
   – support for additional information in responses
   – support for adding records as well (PutRecord)

Open Digital Library
[Diagram: users and digital objects connected by components that communicate through OAI-PMH and Extended OAI-PMH (XPMH)]

Open Digital Library Protocol
[Diagram: the ODL protocol builds on Extended OAI-PMH, which builds on the Protocol for Metadata Harvesting]

Open Digital Library Component
[Diagram: an ODL component is an Extended Open Archive, which builds on an Open Archive]

Open Digital Library
A network of Extended Open Archives, where each node acts as a provider of data, services, or both.
– Component = node
– Protocol = arc

Example Open Digital Library
[Diagram: students and researchers use a user interface over an ETD digital library built from Search, Filter, Union, Recent and Browse components, which communicate via OAI-PMH and the ODLSearch, ODLBrowse, ODLRecent and ODLUnion protocols and draw on distributed ETD collections]

Prototype - FrontPage

Prototype - Search

Prototype - Browse

ODL Component Requirements
Search
– Retrieve a list of items
– Index new items
Annotate
– Add an annotation to an item
– Retrieve a list of annotations for an item

Layer 1: OAI-PMH
Protocol for Metadata Harvesting
– Transfers a stream of metadata from one archive or component to another
Service requests
– Identify, ListSets, ListMetadataFormats
– ListIdentifiers, GetRecord, ListRecords

Layer 2: Extended OAI-PMH
OAI-PMH plus extensions for general-purpose inter-component communication:
– Added generic containers to every response for additional information
– Added "PutRecord" to submit a record
– Increased granularity to support times as well as dates (same as OAI-PMH v2.0)
– Ignored the Dublin Core requirement

Layer 3: ODL Protocols
Specialized protocol semantics for different components, e.g.:
– The Search component uses the ODLSearch protocol
   – ListRecords and ListIdentifiers embed query terms in the "set" parameter
– The Annotation component uses the ODLAnnotate protocol
   – ListRecords and ListIdentifiers specify, in the "set" parameter, the item whose annotations are requested
   – PutRecord adds an annotation to an item
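The ODLSearch idea is that an ordinary harvesting request becomes a query once the "set" argument carries the search terms, so no new verb is needed. The slides do not give ODLSearch's actual set syntax. The "odlsearch1/<terms>/<start>/<stop>" layout in this Python sketch is therefore a hypothetical stand-in, as are the function name and the example.org URL.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit


def odlsearch_url(base_url: str, terms: List[str], start: int = 1, stop: int = 10) -> str:
    """Build a ListRecords request whose "set" argument carries a search query.

    HYPOTHETICAL set layout "odlsearch1/<terms>/<start>/<stop>": it only
    illustrates embedding a query in the set parameter.
    """
    set_spec = f"odlsearch1/{' '.join(terms)}/{start}/{stop}"
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": set_spec})


url = odlsearch_url("http://example.org/irdb", ["digital", "libraries"])
# Decoding the query string recovers the embedded search request.
print(parse_qs(urlsplit(url).query)["set"])  # ['odlsearch1/digital libraries/1/10']
```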

Case Study: ETD ODL Prototype
Electronic Thesis and Dissertation Open Digital Library

Ultimate Goal
– Package different configurations into instant DL systems
– DL building = component configuration
– All DLs speak the same language(s)
– Basic services become trivial to provide, so more effort can be spent on advanced DL capabilities

OAI and ODL Components
No one needs to start from scratch!
– OAI components create OAI data providers from existing systems or collections
   – XMLFile, ETD-db extensions, etc.
– ODL components implement basic digital library services and communicate using ODL and OAI protocols
   – Search, Browse, Annotate, etc.

Basic Model
[Diagram: OAI data provider → (OAI-PMH) → ODL service provider component → (ODL protocol) → user interface]

Simple Searching
[Diagram: an OAI data provider (XMLFile) feeds a search engine component (IRDB) over OAI-PMH; the search engine's WWW interface (IRDB user interface) queries IRDB over ODLSearch]

Software to Be Installed
– XMLFile: creates an Open Archive from a collection of XML files
– Harvester: tests harvesting of data from an OAI archive
– IRDB: a simple search engine
– IRDB user interface

Steps in Building It
1. Install XMLFile
   – Test XMLFile
2. Install IRDB
   – Connect it to XMLFile's baseURL
   – Test IRDB
3. Install the user interface
   – Connect it to IRDB's baseURL
   – Test the user interface

Testing: Repository Explorer
– The Repository Explorer is a tool for testing Open Archives.
– You can issue individual commands and validate the results (using XML Schema).
– You can also perform a sequence of automatic tests.

Wrap-up and Discussion
We will build a simple digital library from components!
– XMLFile data provider
– IRDB search engine (with built-in harvester)
– HTML user interface

Final Thoughts
– OAI-PMH is a simple protocol for exporting and importing metadata.
– ODL components based on OAI can be used to build modular systems.
– Lots of tools are available now!
– Lots of interest from other people already, even publishers!

Links
– Open Archives Initiative
– OAI Metadata Harvesting Protocol
– Virginia Tech DLRL OAI Projects
– Repository Explorer
– CITIDEL

More Links
– NDLTD
– ARC Cross-Archive Search Service
– XML Schema Validator
– Dublin Core Metadata Initiative
– E-Prints
– DL-in-a-box
– XML Tools at W3C

That’s All, Folks!
Questions?