The Open Archives Initiative Movement Kurt Maly Old Dominion University Norfolk Virginia, USA Brazilian DL.

Similar presentations
OAI from 50,000 Feet OAI develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. Begun in 1999.

A centre of expertise in digital information management The OAI Protocol for Metadata Harvesting Andy Powell UKOLN,
A brief overview of the Open Archives Initiative Steve Hitchcock Open Citation Project (OpCit) Southampton University Prepared for Z39.50/OAI/OpenURL plenary.
IST Humboldt University Berlin, Germany – Computer and Media Service – Electronic Publishing Group Birgit Matthaei, 4th Sept. 2003, Bath,
Heinrich Stamerjohanns Institute for Science Networking Distributed Open Archives Dr. Heinrich Stamerjohanns Institute for Science Networking at the University.
OLAC Process and OLAC Protocol: A Guided Tour Gary F. Simons SIL International ___________________________ OLAC Workshop 10 Dec 2002, Philadelphia.
Rapid Visual OAI Tool S. Kothamasa, K. Maly, M. Zubair (Old Dominion University) X. Liu (Los Alamos National Laboratory) RCDL 2003, St. Petersburg.
Y.T. a brief history of the OAI. Source: Herbert van de Sompel.
Service Providers: Future Perspectives Michael L. Nelson Old Dominion University Norfolk Virginia, USA 2nd Workshop.
Service Providers: Future Perspectives Michael L. Nelson Old Dominion University Norfolk Virginia, USA
The Library behind the scenes: How does it work? JINR / CERN Grid and advanced information systems 2012 Anne Gentil-Beccot.
14 October 2003, ADASS 2003 – Strasbourg: Resource Registries for the Virtual Observatory R.Plante (NCSA), G. Greene (STScI), R. Hanisch (STScI), T. McGlynn.
Building Reliable Distributed Information Spaces Carl Lagoze CS /22/2002.
Beth Forrest Warner Director, Digital Library Initiatives University of Kansas Presentation to Oregon State University Library May 5, 2003.
The Open Archives Initiative Simeon Warner (Cornell University) Symposium on “Scholarly Publishing and Archiving on the Web”, University.
OAI Standards for Sheet Music Meeting March 28-29, 2002 Basic OAI Principles How They Apply to Sheet Music Presenter: Curtis Fornadley, Senior Programmer/Analyst.
The Open Archives Initiative Simeon Warner (Cornell University) Open Archives seminar “Facilitating Free and Efficient Scientific.
Institutional Repositories Tools for scholarship Mary Westell University of Calgary AMTEC Conference May 26, 2005.
The Open Archives Initiative Simeon Warner Cornell University, Ithaca, NY, USA CREPUQ 2002, Montréal, Canada 14:00, 24 October 2002.
Basic Concepts Architecture Topology Protocols Basic Concepts Open e-Print Archive Open Archive -- generalization of e-print Data Provider and Service.
National Aeronautics and Space Administration Implementing DSpace at NASA Langley Research Center 1 Greta Lowe Librarian NASA Langley Research Center
Digital Library Architecture and Technology
Dienst Distributed Networked Publishing Carl Lagoze Digital Library Scientist Cornell University.
Introduction to Digital Libraries hussein suleman uct cs honours 2004.
Open Archives for Library and Information Science: an international experience Antonella de Robbio and Paula Sequeiros IV EBIB Conference: Open Access.
Open Access to Grey Literature on e-Infrastructures: The BELIEF-II Project Digital Library Stefania Biagioni, Donatella Castelli, Franco Zoppi CNR-ISTI.
A Lightweight Platform for Integration of Resource Limited Devices into Pervasive Grids Stavros Isaiadis and Vladimir Getov University of Westminster
XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
Herbert van de sompel Workshop on OAI and peer review journals in Europe Geneva, Switzerland – March 22nd to 24th 2001 Herbert Van de Sompel Cornell University.
Dec 9-11, 2003, ICADL: Challenges in Building Federation Services over Harvested Metadata Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
The OAI: overview and historical context OAI Open Meeting – Washington DC – January 23rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University --
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting T.B. Rajashekar National Centre for Science Information (NCSI) Indian Institute of Science,
CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
CBSOR, Indian Statistical Institute, 30th March 07, ISI, Kolkata: Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
ICDL 2004 Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science Old Dominion University.
CYCLADES IST CYCLADES: A Personalised Collaborative Digital Library Environment Umberto Straccia I.S.T.I. - C.N.R. Pisa (ITALY)
Sharing With the Open Archives Initiative Jenn Riley Metadata Librarian Indiana University.
Alexandria Digital Earth ProtoType DIGITAL LIBRARIES AND ENVIRONMENTAL INFORMATION Terence R. Smith Alexandria Digital Library Project.
Freelib: A Self-sustainable Digital Library for Education Community Ashraf Amrou, Kurt Maly, Mohammad Zubair Computer Science Dept., Old Dominion University.
Kurt Maly Department of Computer Science Old Dominion University Norfolk, Virginia 23529, USA Digital Libraries, OAI and Free Software.
This presentation describes the development and implementation of WSU Research Exchange, a permanent digital repository system that is being, adding WSU.
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Phil Barker, March © Heriot-Watt University. You may reproduce all or any part.
Open Archive Initiative – Protocol for metadata Harvesting (OAI-PMH) Surinder Kumar Technical Director NIC, New Delhi
Caltech CODA CODA: Collection of Digital Archives Caltech Scholarly Communication.
GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
Automatic Metadata Discovery from Non-cooperative Digital Libraries By Ron Shi, Kurt Maly, Mohammad Zubair IADIS International Conference May 2003.
JISC/NSF PI Meeting, June Archon - A Digital Library that Federates Physics Collections with Varying Degrees of Metadata Richness Department of Computer.
May 26-28, ICNEE 2003: ARCHON: BUILDING LEARNING ENVIRONMENTS THROUGH EXTENDED DIGITAL LIBRARY SERVICES Hesham Anan, Kurt Maly, Mohammad Zubair, et al. Digital.
Oct 12-14, 2003, NSDL: Challenges in Building Federation Services over Harvested Metadata Kurt Maly, Michael Nelson, Mohammad Zubair Digital Library.
OAI from the needle box Humboldt Universität Berlin, March 20, 2002 Thomas Krichel Palmer School of Library and Information Science Long Island University.
Feb 24-27, 2004, ICDL 2004, New Delhi: Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer.
Open Archives Initiative Gail McMillan Digital Library and Archives, Virginia Tech Society for Scholarly Publishing: June 1, 2000.
U.S. Government Use of the OAI-PMH Michael L. Nelson Old Dominion University Norfolk Virginia, USA ISTEC / NSF.
CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
Mod_oai: Metadata Harvesting for Everyone Michael L. Nelson, Herbert Van de Sompel, Xiaoming Liu, Aravind Elango
International Planetary Data Alliance Registry Project Update September 16, 2011.
Systems for scholarly communication
OAI and Metadata Harvesting
Open Archives Initiative
Digitometric Services for Open Archives Environments
Open Archive Initiative
Introduction to Digital Libraries Week 13: Reference Linking & OpenURL
Institutional Repositories
Presentation transcript:

The Open Archives Initiative Movement
Kurt Maly
Old Dominion University, Norfolk, Virginia, USA
Brazilian DL international conference "Política de Informação em Bibliotecas Digitais" (Information Policy in Digital Libraries)
Campinas, Brazil, March 19-22, 2003

Outline*
– Open Archives Initiative: history and summary description
– OAI services
– Why the OAI-PMH is not important
– Defining the OAI-PMH data model
– More interesting services (DP9, Celestial, Kepler)
* Slides from Herbert Van de Sompel, Carl Lagoze & Michael Nelson included

OAI origin
"The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance." (Paul Ginsparg, Rick Luce & Herbert Van de Sompel)
herbert van de sompel

[Diagram: e-print accessibility]

preprint solutions
Santa Fe meeting: improve the accessibility of preprints by improving their searchability through an interoperability specification

Core concepts of the Santa Fe Convention
– low-barrier interoperability
– data-provider & service-provider model
– metadata harvesting model
– shared metadata format and parallel, community-specific metadata formats
– acceptable use
[Diagram labels: Dienst subset, OAMS, XML reply, HTTP based, gentlemen's agreement]

metadata harvesting
[Diagram: metadata records (Author, Title, Abstract, Identifier) describing e-prints]
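The harvesting model separates data providers, which only expose metadata, from service providers, which copy ("harvest") that metadata and build services over their local copy. The following is a minimal sketch of that division of labour, assuming in-memory data providers and a plain keyword search; the class and method names (`DataProvider`, `ServiceProvider`, `harvest`, `search`) are hypothetical and not part of any OAI specification.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Minimal metadata record with the fields named on the slide."""
    identifier: str
    author: str
    title: str
    abstract: str


@dataclass
class DataProvider:
    """Hypothetical data provider: it exposes metadata and offers no search."""
    name: str
    records: list = field(default_factory=list)

    def expose(self):
        return list(self.records)


class ServiceProvider:
    """Hypothetical service provider: harvest first, then search the local copy."""

    def __init__(self):
        self.index = {}  # identifier -> Record

    def harvest(self, provider):
        for record in provider.expose():
            # Re-harvesting overwrites the older copy of the same record.
            self.index[record.identifier] = record

    def search(self, word):
        word = word.lower()
        return sorted(
            r.identifier
            for r in self.index.values()
            if word in r.title.lower() or word in r.abstract.lower()
        )
```

Once two providers have been harvested, a call such as `search("quark")` returns matching identifiers from both without contacting either provider at query time; that is the point of harvesting rather than distributed searching.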

interest from other communities
– Digital Library Federation meetings: the research library community has many materials for which they would like to 'expose' metadata
– OAI San Antonio meeting: interest from librarians, publishers, others, ...

resulting actions: organizational
– establish organizational stability for the OAI: institutional backing from CNI & DLF
– steering committee: policy guidance
– technical committee: technical specifications
– executive group: day-to-day coordination
– workshops: public dissemination, feedback

resulting actions: technical
– [09/2000] revise specifications to allow adoption beyond preprints: technical committee
– [09/…/2001] compile new specifications: editing by Carl and Herbert
– [11/…/2001] alpha-test specifications: oai-alpha group
– [01/2001] discontinue the Santa Fe Convention
– [01/2001] release version 1.0 of the OAI protocol
– [07/2001] version 1.1
– [06/2002] version 2.0

core concepts in OAI 1.0
– low-barrier interoperability
– data-provider & service-provider model
– metadata harvesting model
– shared metadata format and parallel, community-specific metadata formats
– acceptable use
[Diagram labels: flexibility, OAI 1.0 protocol, Dublin Core, HTTP based, community specific, reply XML Schema, self contained]

low-barrier interop umbrella
[Diagram: metadata (Author, Title, Abstract, Identifier) exposed by diverse sources: OPAC, image, FTXT, A&I, e-print]

communication re OAI
– lists: subscribe via
  – oai-general list [replaces UPS list; UPS subscribers will be moved]
  – oai-implementers list
– web:
– FAQ:
– mail:

revision of specifications
– freeze specifications for months: stable for experimentation, but not definitive
– minimize risk for early adopters
– maximize chances for future interoperability across communities

new OAI mission statement
The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.

The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. Continued support of this work remains a cornerstone of the Open Archives program.

The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials. [...]

OAI-PMH Meeting History
From the OAI Open Day (Washington DC, 1/2001) to the CERN meeting (10/2002), the topics moved through:
– protocol definition, development tools
– DPs, retrofitting existing DLs
– SPs, new services
– socio-economic-political issues

Shift of Topics
From the protocol itself, supporting & debugging tools and how to retrofit (existing) DLs…
…to building (new) services that use the OAI-PMH as a core technology, and reporting on their impact on the institution/community

NTRS
– metadata harvesting replacement for nasa.gov/cgi-bin/NTRS
– the previous NTRS was based on distributed searching
– hierarchical harvesting
– (nigh) publicly available

Arc
– harvests all known archives
– first end-user service provider
– source available through SourceForge
– hierarchical harvesting

NCSTRL
– metadata harvesting replacement for the Dienst-based NCSTRL
– based on Arc
– computer science metadata

Archon
– physics metadata
– based on Arc
– features: citation indexing, equation-based searching

Torii
– physics metadata
– features: personalization, recommendations, WAP access

iCite
– physics metadata
– features: citation-based access to arXiv metadata

my.OAI
– covers all registered metadata
– features: result sets, personalization, many other advanced features

Cyclades
– scientific metadata
– features: personalization, recommendations, collaboration
– status?

citebase
– arXiv metadata
– citation-based indexing, reporting

OAIster
– harvests all known archives

Public Knowledge Project
– domain-specific filtering of harvested metadata (?)

Perseus
– claims to harvest all DPs, but only humanities-related DPs appear in the pull-down menu

Service Providers
It is clear that SPs are proliferating, despite (because of?) the inherent bias toward DPs in the protocol
– easy to be a DP -> many DPs -> SPs eventually emerge
– hard to be a DP -> SPs starve
– currently there are about 5x as many DPs as SPs
SPs are beginning to offer increasingly sophisticated services
– the competitive market originally envisioned for SPs is emerging

Why The OAI-PMH is NOT Important
– Users don't care
– OAI-PMH is middleware
  – if done right, the uninterested user should never have to know about "OAI Inside"
– Using the OAI-PMH does not ensure a good SP
– OAI-PMH is (or is becoming) HTTP for DLs
  – few people get excited about http now
  – http & OAI-PMH are core technologies whose presence is now assumed

Other Uses For the OAI-PMH
Assumptions:
– Traditional DLs / SPs will continue on their present path of increasing sophistication: citation indexing, search result visualization, personalization, recommendations, subject-based filtering, etc.
– growth rates remain the same (5x as many DPs as SPs)
Premise: OAI-PMH is applicable to any scenario that needs to update / synchronize distributed state
– future opportunities are possible by creatively interpreting the OAI-PMH data model
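Using OAI-PMH to synchronize distributed state relies on selective harvesting: the harvester remembers when it last harvested, asks only for records changed since then, and follows resumption tokens until the list is complete. The sketch below uses the OAI-PMH 2.0 verb and argument names (`ListRecords`, `metadataPrefix`, `from`, `resumptionToken`) and namespace; the base URL and the injected `fetch` function are assumptions made so the sketch can run against canned responses instead of a live repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request(base_url, verb, **args):
    """Build an OAI-PMH GET URL; 'from' is a Python keyword, so pass from_=..."""
    params = {"verb": verb}
    for key, value in args.items():
        params[key.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


def harvest_since(fetch, base_url, since, prefix="oai_dc"):
    """Return (identifier, datestamp) for every record changed since `since`.

    `fetch(url) -> str` returns the XML response body (injected, an assumption).
    """
    url = build_request(base_url, "ListRecords", metadataPrefix=prefix, from_=since)
    changed = []
    while url:
        root = ET.fromstring(fetch(url))
        for header in root.iter(OAI_NS + "header"):
            changed.append((header.findtext(OAI_NS + "identifier"),
                            header.findtext(OAI_NS + "datestamp")))
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is not None and token.text:
            # A resumed request carries only the verb and the resumption token.
            url = build_request(base_url, "ListRecords", resumptionToken=token.text.strip())
        else:
            # An absent or empty resumptionToken marks the end of the list.
            url = None
    return changed
```

After a successful run the harvester would store the newest datestamp it saw and pass it as `since` next time, so each round transfers only the changes.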

OAI-PMH Data Model
[Diagram: a resource ("David"); the item holds all available metadata about David; its records carry Dublin Core, MARC and SPECTRUM metadata]
– item = identifier
– record = identifier + metadata format + datestamp
– set membership is an item-level property
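The definitions on this slide map onto two small types: an item is identified by its identifier and holds every available metadata format, while a record is one dissemination of that item, addressed by identifier + metadata format and carrying its own datestamp. This is an illustrative sketch; the class names, the `marc21`/`spectrum` prefixes and the identifier are assumptions, and the metadata strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp (plus the metadata)."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str


@dataclass
class Item:
    """item = identifier; it holds all available metadata about one resource."""
    identifier: str
    sets: set = field(default_factory=set)       # set membership is item-level
    formats: dict = field(default_factory=dict)  # prefix -> (datestamp, metadata)

    def add_format(self, prefix, datestamp, metadata):
        self.formats[prefix] = (datestamp, metadata)

    def record(self, prefix):
        """Disseminate the item in one metadata format, as a record."""
        datestamp, metadata = self.formats[prefix]
        return Record(self.identifier, prefix, datestamp, metadata)
```

Because each format has its own datestamp, updating only the MARC description of "David" changes that record's datestamp while the Dublin Core record stays untouched, and an incremental harvester sees exactly that one change.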

Typical Values
– repository: collection of publications
– resource: scholarly publication
– item: all metadata (DC + MARC)
– record: a single metadata format
– datestamp: last update / addition of a record
– metadata format: bibliographic metadata format
– set: originating institution or subject categories

Interesting Services
– DP9: gateway to expose repository contents in HTML suitable for web crawlers
– Celestial: OAI "cache", also a 1.1 -> 2.0 converter
– Static (mini-) repositories: XML files, based on OLAC work
– OpenURL metadata format registries: record = metadata format

DP9 Architecture
[Diagram: DP9 architecture; see Liu et al., JCDL 2002]
Slide from Liu

DP9 Formatting
– Format of URLs: … report-10 &prefix=oai_dc
– HTML meta tags: some crawlers (such as Inktomi) use HTML meta tags to index Web pages; DP9 also maps Dublin Core metadata to the corresponding HTML meta tags.
– For pages that are designed exclusively for robot navigation, a noindex robots meta tag is used.
– The X-Forwarded-For header is used to distinguish between different users coming in via a proxy.
Slide from Liu
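The meta-tag mapping can be illustrated with a short renderer. The slide does not give the exact tag names DP9 emits, so this sketch assumes the `DC.element` naming convention from RFC 2731 (Encoding Dublin Core Metadata in HTML); the function name and the `noindex,follow` value for navigation pages are likewise assumptions.

```python
from html import escape


def dc_to_html_head(dc, robots_only=False):
    """Render a Dublin Core record (element -> list of values) as an HTML <head>.

    The DC.<element> meta names follow RFC 2731 style; DP9's real output may differ.
    """
    lines = ["<head>"]
    titles = dc.get("title", [])
    if titles:
        lines.append(f"  <title>{escape(titles[0])}</title>")
    for element in sorted(dc):
        for value in dc[element]:
            lines.append(f'  <meta name="DC.{element}" content="{escape(value)}">')
    if robots_only:
        # Navigation pages exist only so crawlers can reach record pages:
        # keep them out of the index but let robots follow their links.
        lines.append('  <meta name="robots" content="noindex,follow">')
    lines.append("</head>")
    return "\n".join(lines)
```

Escaping every value matters here because titles and abstracts routinely contain `&`, `<` and quotes, which would otherwise break the generated HTML that crawlers parse.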

Celestial
Developed by Southampton
– designed to complement DP9
– see Liu, Brody, et al., D-Lib Magazine 8(11)
Where DP9 is a non-caching proxy, Celestial caches the metadata records
– can off-load work from individual archives and offer higher availability
– can harvest OAI-PMH 1.1 and 2.0; exports in 2.0
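A metadata cache of this kind keeps the newest harvested copy of each record and answers selective requests itself, so downstream harvesters never touch the origin archive. The sketch below is not Celestial's implementation; the class and method names are hypothetical, and datestamps are assumed to be day-granularity `YYYY-MM-DD` strings so that string comparison matches date order.

```python
class MetadataCache:
    """Hypothetical cache sitting between harvesters and an origin archive."""

    def __init__(self):
        self._records = {}  # (identifier, metadata_prefix) -> (datestamp, metadata)

    def store(self, identifier, prefix, datestamp, metadata):
        """Keep a harvested record unless the cache already holds a newer copy."""
        key = (identifier, prefix)
        current = self._records.get(key)
        # YYYY-MM-DD datestamps sort correctly as plain strings.
        if current is None or datestamp >= current[0]:
            self._records[key] = (datestamp, metadata)

    def list_records(self, prefix, from_date=None, until_date=None):
        """Serve a selective ListRecords-style request entirely from the cache."""
        result = []
        for (identifier, p), (datestamp, metadata) in sorted(self._records.items()):
            if p != prefix:
                continue
            if from_date and datestamp < from_date:
                continue
            if until_date and datestamp > until_date:
                continue
            result.append((identifier, datestamp, metadata))
        return result
```

If the origin archive goes offline, `list_records` still answers from the last harvest, which is the availability benefit the slide attributes to caching.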

"Static" Repositories
Premise: a repository does not wish to have an executing program on its site, so it has a "static" XML file with some of the OAI-PMH responses in place
– Design still being discussed
  – accessed through a proxy
  – could be a low-functionality node, or the XML file could be produced by a process and moved outside a firewall
Based on OLAC work by Bird & Simons
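On the proxy side, the idea reduces to: load one XML file published by the repository and answer a subset of OAI-PMH requests from it, so that only the proxy executes code. The file layout below is invented for this sketch (the slide says the design was still under discussion), and `StaticGateway` with its methods is hypothetical; only the `idDoesNotExist` error name is taken from OAI-PMH.

```python
import xml.etree.ElementTree as ET

# Hypothetical static-file layout, invented for this sketch only.
STATIC_XML = """<repository>
  <record id="oai:example.org:1" datestamp="2003-01-15">
    <title>A grammar sketch</title>
  </record>
  <record id="oai:example.org:2" datestamp="2003-03-02">
    <title>A word list</title>
  </record>
</repository>"""


class StaticGateway:
    """Answers a small subset of OAI-PMH requests from one static XML file."""

    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self.records = {r.get("id"): r for r in root.findall("record")}

    def get_record(self, identifier):
        """GetRecord-style lookup returning (datestamp, title)."""
        record = self.records.get(identifier)
        if record is None:
            # Mirrors the OAI-PMH error code for an unknown identifier.
            raise LookupError("idDoesNotExist")
        return record.get("datestamp"), record.findtext("title")

    def list_identifiers(self, from_date=None):
        """ListIdentifiers-style listing with optional datestamp filtering."""
        return sorted(
            ident
            for ident, r in self.records.items()
            if from_date is None or r.get("datestamp") >= from_date
        )
```

Regenerating `STATIC_XML` behind the firewall and copying it out is all the repository has to do; every request-handling step runs in the gateway.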

Original Kepler Framework
– Support "personal data providers" or "archivelets"
– An archivelet is a self-contained, self-installing software package that easily allows a researcher to create and maintain a small, OAI-PMH-compliant archive.
– The general public has seamless access to the totality of all such published materials.

Enhanced Kepler Framework (EKF)
– Improve scalability and service reliability through the concepts of buddy nodes and SuperNodes.
– Extend OAI-PMH with a push model and a hybrid push/pull model.
– Rapid discovery of content as soon as it is published.
– Works with firewalls and network address translation (NAT) proxies.
– Support community-based installation and integration.

Motivation behind Kepler
– The success of Peer-to-Peer (P2P) networks.
– The vision of author self-archiving.
– The efficient repository synchronization technology defined by OAI-PMH.

Peer to Peer Network
– File-sharing P2P networks such as Napster, Gnutella, Freenet.
– LOCKSS (Lots Of Copies Keep Stuff Safe) provides long-term preservation of scientific journals.
– Recent arrival of FastTrack and openFT:
  – a 2-tier system of SuperNodes and Nodes to solve the scalability problem
  – Kazaa (based on FastTrack technology) claims 20M downloads and scales well.

Author Self-Archiving
– Subject based: arXiv.org is a very successful subject-based self-archiving service. Since its inception in 1991, nearly 200K documents have been submitted.
– Institution based: Eprints software from Southampton.
– Personal based: personal homepages (indexed by ResearchIndex) and Kepler (indexed by OAI-PMH-compliant services such as Arc).

Original Kepler Framework

Problems of the Original Kepler Framework
– Centralized Registration Server (LDAP): increases the complexity of installing the Kepler server-side software.
– Open Protocol: the archivelet uses a non-standard protocol for registration, thus inhibiting the development of third-party applications.
– Security and NAT (Network Address Translation): in many instances an archivelet is behind a firewall or NAT proxy, which makes it difficult for the service provider to harvest it.

Problems of the Original Kepler Framework
– Availability: archivelets are extremely unstable (frequently offline), and some use dynamic IP addresses.
– Freshness: a large number of archivelets with sparse changes does not fit well with OAI-PMH's "poll"-based model.
– Full text vs. metadata: as an archivelet is not up all the time, it is desirable to harvest the full-text documents as well, to improve the availability of full text to end users.

Improvement through EKF
– A "push" model and a hybrid "push/pull" model to address the scalability, security and freshness problems.
– SuperNodes and buddy nodes to improve scalability and server availability.
– The protocol in EKF is open, and we hope it will inspire third-party development of Kepler tools.

EKF-Architecture

EKF Push/Pull Model
– Pull: retrieval without prior coordination (e.g., as used by current robots and OAI-PMH)
– Hybrid push/pull: retrieval after notification
– Push: notification followed by a provider push
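The three models differ mainly in who opens each connection. This sketch lists the messages of one update cycle for each model; the `Notify` and `PushMetadata` names come from the EKF verbs and `ListRecords` from OAI-PMH, while the exact sequencing is an illustrative reading of the slide rather than a specification.

```python
def exchange(model):
    """Return the (sender, receiver, message) steps of one update cycle.

    "DP" is the data provider (e.g. an archivelet), "SP" the service provider.
    """
    if model == "pull":
        # No coordination: the SP polls on its own schedule, changed or not.
        return [("SP", "DP", "ListRecords"), ("DP", "SP", "records")]
    if model == "hybrid":
        # The DP announces new data; the SP then pulls it with ordinary OAI-PMH.
        return [("DP", "SP", "Notify(event=update)"),
                ("SP", "DP", "ListRecords"),
                ("DP", "SP", "records")]
    if model == "push":
        # The DP announces and then delivers the metadata itself, so the SP
        # never opens a connection to the DP (helpful behind firewalls or NAT).
        return [("DP", "SP", "Notify(event=update)"),
                ("DP", "SP", "PushMetadata")]
    raise ValueError(f"unknown model: {model}")
```

In the push case every message originates at the data provider, which is why that model suits archivelets that cannot accept incoming connections.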

EKF-Push/Pull Model

EKF: Why extend OAI-PMH?
"Update Overhead" problem:
– Frequent crawling has to be done to keep service providers synchronized with data providers.
– This is inefficient if the data providers seldom change during a harvest interval.
– On the other hand, without frequent crawling, service providers may become inconsistent with data providers.
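A back-of-envelope cost model makes the "update overhead" concrete. It rests on assumptions not on the slide: N archivelets, each polled once every Δ over an observation period T, with archivelet i changing u_i times in that period; C counts requests, and the hybrid figure assumes one Notify plus one harvest per change, ignoring start/stop notifications.

```latex
C_{\text{pull}} = N \cdot \frac{T}{\Delta}
\qquad
C_{\text{hybrid}} \approx 2 \sum_{i=1}^{N} u_i
\qquad
\text{staleness}_{\text{pull}} \le \Delta
```

For example, with N = 1000, T = 30 days, Δ = 1 day and one change per archivelet per month, pull costs 1000 · 30 = 30,000 requests, of which at most 1,000 find anything new, while the hybrid model needs about 2,000. Shrinking Δ to reduce staleness raises C_pull proportionally, which is the trade-off the slide describes.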

Extension of OAI-PMH on the Service Provider Side
– The AddFriend and Notify verbs support the hybrid push/pull model.
  – The AddFriend verb informs the service provider of the existence of a data provider.
  – The Notify verb informs the service provider that a data provider is up/down or that new data is available.
– A PushMetadata verb is added to support the push model.
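A service provider that understands these verbs needs little more than a registry of friends and a dispatcher. In this sketch the argument names `id`, `baseURL`, `event` and `contents` follow the EKF protocol syntax; the class, its status strings other than the OAI-PMH-style `badVerb`/`badArgument`, and the injected `harvest` callable are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


class KeplerServiceProvider:
    """Hypothetical service provider handling the EKF extension verbs."""

    def __init__(self, harvest):
        # harvest(base_url) -> list of records; injected (an assumption of this sketch).
        self.harvest = harvest
        self.friends = {}  # archivelet id -> {"baseURL": str, "online": bool}
        self.records = []

    def handle(self, url):
        """Dispatch one request URL and return a short status string."""
        query = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
        verb = query.get("verb")
        if verb == "AddFriend":
            # Register the archivelet so the SP knows it exists.
            self.friends[query["id"]] = {"baseURL": query["baseURL"], "online": True}
            return "ok"
        if verb == "Notify":
            friend = self.friends.get(query.get("id"))
            if friend is None:
                return "unknown archivelet"
            if "baseURL" in query:
                # Archivelets on dynamic IP addresses can report their current address.
                friend["baseURL"] = query["baseURL"]
            event = query.get("event")
            if event in ("start", "stop"):
                friend["online"] = event == "start"
            elif event == "update":
                # Hybrid push/pull: harvest only after being told there is new data.
                self.records.extend(self.harvest(friend["baseURL"]))
            else:
                return "badArgument"
            return "ok"
        if verb == "PushMetadata":
            # Push: the archivelet delivers the metadata itself; nothing to harvest.
            self.records.append(query["contents"])
            return "ok"
        return "badVerb"
```

Tracking `online` lets the service provider skip archivelets that announced a shutdown instead of timing out on them.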

Design and Implementation
Loose Name Space Management
– Use the archivelet's address to uniquely identify it.
– Avoid the effort of maintaining a global namespace.
Sample: oai:kepler.cs.odu.edu:

Archivelet Components
– File-system-based, OAI-PMH-compliant repository.
– Publication tool.
– A simple extended HTTP server that supports the OAI-PMH protocol and the push/pull model. It might act as a SuperNode and a BuddyNode at the same time.
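A file-system-based repository can derive everything ListIdentifiers needs from the directory itself. This sketch assumes one metadata file per document, takes the identifier from the file name and the datestamp from the file's modification day in UTC; the layout and the `oai:archivelet:` prefix are hypothetical (the sample identifier on the slide is truncated).

```python
import datetime as dt
from pathlib import Path


def list_identifiers(repo_dir, prefix="oai:archivelet:", from_date=None):
    """List (identifier, datestamp) for every *.xml metadata file in repo_dir.

    Hypothetical layout: one metadata file per document; editing a file
    changes its modification time and therefore its datestamp.
    """
    entries = []
    for path in sorted(Path(repo_dir).glob("*.xml")):
        mtime = dt.datetime.fromtimestamp(path.stat().st_mtime, tz=dt.timezone.utc)
        datestamp = mtime.strftime("%Y-%m-%d")
        if from_date is None or datestamp >= from_date:
            entries.append((prefix + path.stem, datestamp))
    return entries
```

Using the file system as the database keeps the archivelet self-installing: there is no separate store to set up, and publishing a document amounts to writing one file.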

SuperNode
– A SuperNode has all the functionality of an archivelet.
– The SuperNode collects all documents and metadata from the archivelets in its friends list, and builds value-added services over the harvested data.
– A SuperNode is typically deployed at an institution with a high-quality network.

Protocol Syntax
– AddFriend: request to be added as a friend
  ?verb=AddFriend&id=&baseURL=
– Notify: used for major events of an archivelet, including startup, shutdown and document update
  ?verb=Notify&event=[start/stop/update]&id=&baseURL=
– PushMetadata:
  ?verb=PushMetadata&contents=
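The request URLs above can be built with standard URL encoding. The slide elides the base URLs, so the service-provider and archivelet addresses used with these functions are hypothetical; percent-encoding `baseURL` and `contents` is also an assumption, since the syntax does not say how those values are escaped.

```python
from urllib.parse import urlencode

VALID_EVENTS = ("start", "stop", "update")


def add_friend_url(sp_base, archivelet_id, archivelet_base):
    """?verb=AddFriend&id=...&baseURL=..."""
    return sp_base + "?" + urlencode(
        {"verb": "AddFriend", "id": archivelet_id, "baseURL": archivelet_base})


def notify_url(sp_base, archivelet_id, archivelet_base, event):
    """?verb=Notify&event=[start/stop/update]&id=...&baseURL=..."""
    if event not in VALID_EVENTS:
        raise ValueError(f"event must be one of {VALID_EVENTS}")
    return sp_base + "?" + urlencode(
        {"verb": "Notify", "event": event, "id": archivelet_id, "baseURL": archivelet_base})


def push_metadata_url(sp_base, contents):
    """?verb=PushMetadata&contents=... (contents percent-encoded)."""
    return sp_base + "?" + urlencode({"verb": "PushMetadata", "contents": contents})
```

Encoding matters because `baseURL` itself contains `:` and `/`, and pushed metadata contains `<`, `>` and `&`, any of which would otherwise corrupt the query string.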

Optional Implementation
Features outside the kernel protocol, which may act as a "hook" to attract more use of Kepler:
– Cache full-text documents.
– Query service in the archivelet.
– Security and spoofing issues.

Conclusions
Protocol / transport gateways
– Dienst: DOG
– Z39.50: ZMARCO (UIUC)
– SOAP: VT (Suleman) & ODU (Zubair)

OAI-PMH Will Have Arrived When:
– general web robots issue OAI-PMH verbs
  – …DP9 will no longer be needed
  – requires a shift in "control": harvester or repository?
– mod_oai is developed and included in the default Apache configuration
– OAI-PMH fades into the background
  – similar to TCP/IP, http, XML, etc.
  – next year's workshop is on OpenURL

Conclusions
– DPs continue to proliferate, and spawn SPs!
– SPs are / are becoming a competitive market
  – e.g., at least 10 different interfaces to arXiv metadata
  – growing sophistication of services
  – differentiation of SPs will be on features that have little to nothing to do with OAI-PMH