Published by Herbert Riley. Modified over 9 years ago.
1
OAI from the needle box Humboldt Universität Berlin, March 20, 2002 Thomas Krichel Palmer School of Library and Information Science Long Island University With apologies to Carl Lagoze
2
Where I come from...
- Trained economist
- Early (1991) visionary of free online scholarship
- Creator of NetEc in 1993
- Principal founder of RePEc in 1997
  - Largest distributed academic DL in the world
  - Collection that is open for contribution and usage
  - Grown to over 200 archives and over 10 partly interoperable user services
3
Metadata collection process
- Metadata is expensive to collect.
- Free online scholarship requires academic self-documentation.
- Building a free metadata collection is difficult:
  - no established business model
  - no established funding channels
- Only a collaborative effort will succeed.
4
The example of e-print servers
- attractive building block for the transformation of scholarly communication
- but isolated efforts do not make for a scholarly communication system
- need to federate archives
- need to interoperate with other scholarly communication components
5
Example: e-print accessibility (diagram: e-prints)
6
Example: e-print accessibility (diagram: e-print)
7
Metadata harvesting (diagram: metadata, e-print)
8
Metadata harvesting (diagram: metadata fields Author, Title, Abstract, Identifier; e-print)
9
Other examples within the area of scholarly communication, already implemented in RePEc
- Sharing of log data between service providers
- Provision of non-document data for document data providers
  - personal data
  - institutional data
10
Core concepts in OAI 1.1
- low-barrier interoperability
- data-provider / service-provider model
- metadata harvesting model
- OAI 1.1 protocol: HTTP based; reply in XML Schema; self contained
- shared metadata format: Dublin Core
- parallel metadata formats: community specific
11
Harvester / repository (diagram: repository, OAI protocol, harvester; labels: support data harvesting, data items)
12
OAI protocol requests
Supporting protocol requests:
- Identify
- ListMetadataFormats
- ListSets
Harvesting protocol requests:
- ListRecords
- ListIdentifiers
- GetRecord
(diagram: the harvester is the service provider, the repository is the data provider)
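The two groups of requests above can be sketched as a small lookup. The verb names come from the slide; the tuple names and the `request_kind` helper are hypothetical and only illustrate the grouping.

```python
# Verb names are taken from the slide; the grouping constants and the
# helper below are hypothetical illustrations, not part of the protocol.
SUPPORTING = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING = ("ListRecords", "ListIdentifiers", "GetRecord")


def request_kind(verb: str) -> str:
    """Return 'supporting' or 'harvesting'; raise ValueError for unknown verbs."""
    if verb in SUPPORTING:
        return "supporting"
    if verb in HARVESTING:
        return "harvesting"
    raise ValueError(f"unknown OAI verb: {verb!r}")


print(request_kind("Identify"))
print(request_kind("GetRecord"))
```

A harvester would typically issue the supporting requests first, to learn what a repository offers, and the harvesting requests afterwards.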
13
HTTP encoding - requests
BASE-URL --> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET:
GET http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST:
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 27
Content-Type: application/x-www-form-urlencoded

verb=ListIdentifiers&set=S1
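As a minimal sketch of the encoding above, the same ListIdentifiers request can be built for GET and for POST with Python's standard library. The base URL is the example one from the slide; the two helper functions are hypothetical names introduced here.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide


def build_get_url(base_url, verb, **arguments):
    """Append the verb and its keyword arguments to the base URL as a query string."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def build_post_body(verb, **arguments):
    """Encode the same keyword arguments as an x-www-form-urlencoded POST body."""
    return urlencode({"verb": verb, **arguments}).encode("ascii")


url = build_get_url(BASE_URL, "ListIdentifiers", set="S1")
body = build_post_body("ListIdentifiers", set="S1")
print(url)
print(body)
print(len(body))  # the value a client would send as Content-Length
```

Both encodings carry identical keyword arguments; only the place where they travel (query string or request body) differs.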
14
HTTP encoding - responses
Response header (with XML namespaces):
- response date: 2000-19-01T19:30:30-04:00
- request URL: http://an.oa.org/OAI-script?verb=GetRecord&identifier=oai%3AarXiv%3A0001&metadataPrefix=oai_dc
Response data:
- record contents
- additional records
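The request URL echoed in the response is percent-encoded (`%3A` stands for a colon). A short sketch, using only the standard library, of how a harvester might decode it back into the base URL and the keyword arguments; `decode_request` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

# Request URL as shown on the slide, joined into one string.
request_url = (
    "http://an.oa.org/OAI-script?verb=GetRecord"
    "&identifier=oai%3AarXiv%3A0001&metadataPrefix=oai_dc"
)


def decode_request(url):
    """Split a request URL into its base URL and a dict of decoded arguments."""
    parts = urlsplit(url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs returns a list per key; each argument appears once here.
    arguments = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return base_url, arguments


base_url, arguments = decode_request(request_url)
print(base_url)
print(arguments)
```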
15
Record
- protocol support: oai:eg:001, 1999-01-01
- format-specific metadata: My Example
- community-specific record data: No restrictions
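To make the three parts of a record concrete, here is a sketch that parses a simplified record. The values are the ones on the slide; the namespace-free markup and the element names are assumptions made for illustration, since the slide's original XML is not preserved in this transcript.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free markup. The element names are illustrative
# assumptions; only the four values come from the slide.
RECORD_XML = """
<record>
  <header>
    <identifier>oai:eg:001</identifier>
    <datestamp>1999-01-01</datestamp>
  </header>
  <metadata>
    <title>My Example</title>
  </metadata>
  <about>
    <rights>No restrictions</rights>
  </about>
</record>
"""


def parse_record(xml_text):
    """Pull the protocol, metadata, and community-specific parts out of a record."""
    root = ET.fromstring(xml_text.strip())
    return {
        "identifier": root.findtext("header/identifier"),  # protocol support
        "datestamp": root.findtext("header/datestamp"),    # protocol support
        "title": root.findtext("metadata/title"),          # format-specific metadata
        "rights": root.findtext("about/rights"),           # community-specific data
    }


record = parse_record(RECORD_XML)
print(record)
```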
16
Selective harvesting - datestamps (diagram: harvest records from the repository within a date range)
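Datestamp-based selective harvesting amounts to a range filter over record datestamps. The sketch below shows the idea on the repository side; the sample records, the function name, and the choice of inclusive bounds are assumptions for illustration.

```python
from datetime import date

# Hypothetical sample records; only the idea of a datestamp per record
# comes from the slides.
RECORDS = [
    {"identifier": "oai:eg:001", "datestamp": date(1999, 1, 1)},
    {"identifier": "oai:eg:002", "datestamp": date(2001, 6, 15)},
    {"identifier": "oai:eg:003", "datestamp": date(2002, 3, 1)},
]


def harvest_by_date(records, from_date=None, until_date=None):
    """Return identifiers of records whose datestamp lies in the range.

    Both bounds are optional and treated as inclusive (an assumption here).
    """
    selected = []
    for record in records:
        stamp = record["datestamp"]
        if from_date is not None and stamp < from_date:
            continue
        if until_date is not None and stamp > until_date:
            continue
        selected.append(record["identifier"])
    return selected


print(harvest_by_date(RECORDS, from_date=date(2001, 1, 1)))
print(harvest_by_date(RECORDS, until_date=date(2001, 12, 31)))
```

A harvester that remembers the date of its last visit can pass it as the lower bound and fetch only records changed since then.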
17
Selective harvesting - sets (diagram: repository with sets S1 and S2; harvest records within set S1)
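Set-based selective harvesting can be sketched as a membership test. The set names S1 and S2 come from the slide; the sample records, the function name, and the assumption that a record may belong to more than one set are illustrative.

```python
# Hypothetical sample records. Set names S1 and S2 are from the slide;
# letting a record sit in several sets is an assumption of this sketch.
SET_RECORDS = [
    {"identifier": "oai:eg:001", "sets": {"S1"}},
    {"identifier": "oai:eg:002", "sets": {"S2"}},
    {"identifier": "oai:eg:003", "sets": {"S1", "S2"}},
]


def harvest_by_set(records, set_spec):
    """Return identifiers of the records that belong to the requested set."""
    return [r["identifier"] for r in records if set_spec in r["sets"]]


print(harvest_by_set(SET_RECORDS, "S1"))
```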
18
Communication re OAI
- lists: subscribe via http://www.openarchives.org
  - oai-general list
  - oai-implementers list
- web: http://www.openarchives.org
- FAQ: http://www.openarchives.org/faq.htm
- mail: openarchives@openarchives.org
19
Version 1.1
- frozen specifications for 12-18 months:
  - stable for experimentation; not definitive
  - minimize risk for early adopters
  - maximize chances for future interoperability across communities
- revision of specifications
The technical committee is working on the “definitive” specifications, which are due out on 2002-05-01.
20
The technical committee
- Herbert Van de Sompel (LANL)
- Carl Lagoze (Cornell U)
- Thomas Krichel (Long Island U & RePEc)
- Jeff Young (OCLC)
- Tim Cole (U of Illinois at Urbana-Champaign)
- Hussein Suleman (Virginia Tech)
- Simeon Warner (Cornell U & arXiv)
- Michael Nelson (NASA & NACA)
- Caroline Arms (Library of Congress)
- Muhammad Zubair (Old Dominion U & ARC)
- Steven Bird (U Penn & Open Language Archive Community)
- Robert Tansley (MIT & DSpace)
- Andy Powell (UKOLN, UK)
- Mogens Sandfær (DTV, Denmark)
- Thomas Severiens (Oldenburg U & Physnet)
- Thomas Baron (CERN)
- Les Carr (U of Southampton)
- Thomas Place (Tilburg U)
21
Issues in front of the committee
- Error handling
- SOAP
- Harvesting granularity
- Mandatory DC
- Set semantics and collection description
- XML Schema
- Result set filtering
- Flow control, result set cardinality, response level container
- Awareness mechanisms
- Multiple metadata return and "best" metadata selection
- Machine-readable rights management
- From GetRecord to GetRecords
- Deduping issues
- Idempotency of base URLs
- XML format for mini-archives
- Response compression
22
Thank you for your attention! Thomas Krichel Palmer School of Library and Information Science 720 Northern Boulevard Brookville NY 11548-1300 USA http://openlib.org/home/krichel Krichel@openlib.org
23
Error handling
- badArgument
- badGranularity
- badResumptionToken
- badVerb
- cannotDisseminateFormat
- idDoesNotExist
- noRecordsMatch
- noSetHierarchy
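A rough sketch of how a repository might map a bad request onto some of the error codes listed above. The error code strings are from the slide; the required-argument table, the known identifiers, and the `check_request` helper are assumptions for illustration and are not taken from the specification.

```python
KNOWN_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListRecords", "ListIdentifiers", "GetRecord",
}

# Assumed for illustration only: which arguments a verb must carry.
REQUIRED_ARGUMENTS = {"GetRecord": {"identifier", "metadataPrefix"}}

# Hypothetical repository contents.
KNOWN_IDENTIFIERS = {"oai:eg:001"}


def check_request(arguments):
    """Return one of the slide's error codes, or None if the request looks fine."""
    verb = arguments.get("verb")
    if verb not in KNOWN_VERBS:
        return "badVerb"
    missing = REQUIRED_ARGUMENTS.get(verb, set()) - set(arguments)
    if missing:
        return "badArgument"
    if verb == "GetRecord" and arguments["identifier"] not in KNOWN_IDENTIFIERS:
        return "idDoesNotExist"
    return None


print(check_request({"verb": "GetRecords"}))
print(check_request({"verb": "GetRecord", "identifier": "oai:eg:001"}))
print(check_request({"verb": "GetRecord", "identifier": "oai:eg:999",
                     "metadataPrefix": "oai_dc"}))
```

Named error codes let a harvester react programmatically, for example by dropping an unknown identifier instead of retrying the request.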
24
SOAP
SOAP is a mechanism to transmit service requests over the Internet. It is not yet a fully matured protocol. A SOAP-compatible version of the protocol may be written later.
25
Harvesting granularity
- The from and until arguments may allow finer timestamps, down to one second.
- The level supported is chosen by the data provider and stated in the response to the Identify verb.
- All times are expressed in UTC.
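A small sketch of what day-level versus second-level granularity could look like when a timestamp is normalized to UTC. The two format strings and the `format_datestamp` helper are assumptions made for illustration; the slide only states that the finest level is one second and that all times are in UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumed output formats for the two granularities; the slide does not
# spell out the exact syntax.
FORMATS = {
    "day": "%Y-%m-%d",
    "second": "%Y-%m-%dT%H:%M:%SZ",
}


def format_datestamp(moment, granularity):
    """Convert a timezone-aware datetime to UTC and format it at the given level."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime(FORMATS[granularity])


# A local time four hours behind UTC, chosen only as sample input.
local = datetime(2002, 3, 20, 19, 30, 30, tzinfo=timezone(timedelta(hours=-4)))
print(format_datestamp(local, "day"))
print(format_datestamp(local, "second"))
```

A harvester would read the supported level from the Identify response and format its from and until arguments accordingly.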