The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Website: http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm
Editors: Carl Lagoze (Cornell University), Herbert Van de Sompel (Los Alamos Laboratory), Michael Nelson (NASA Langley Research Ctr), Simeon Warner (Cornell University)
Presented by: Georges Arnaout, Chaitanya Krishna
CS 791/891-WEB SYNDICATION FORMATS
OAI: Open Archives Initiative
Open: the protocol is openly documented, and metadata is "exposed" to at least some peer group.
Archives: an archive is defined as a "collection of stuff", or "repository".
Initiative: OAI is happening at break-neck speed...
Figure reference: http://www.cs.odu.edu/~mln/oaf-nelson.ppt
Definition
OAI-PMH: a protocol that provides an application-independent interoperability framework based on metadata harvesting.
But what is interoperability?
What is interoperability?
It is the ability of two or more applications or systems to exchange information and to use the information that has been exchanged.
What is a harvester?
A harvester is a client application that issues OAI-PMH requests. It is operated in order to collect metadata from repositories.
What is a repository?
A repository is a network-accessible server where data is stored and maintained; think of it as a big database.
The data contained in the repository is the metadata that is exposed to harvesters.
Verbs Summary
Verb                 Function
Identify             description of the repository
ListMetadataFormats  metadata formats supported by the repository
ListSets             sets defined by the repository
ListIdentifiers      OAI unique identifiers contained in the repository
ListRecords          listing of N records
GetRecord            listing of a single record
Figure reference: http://www.cs.odu.edu/~mln/oaf-nelson.ppt
OAI-PMH Data Model
OAI-PMH distinguishes between three entities related to the exposed metadata:
1- Resource: the object that the metadata is about.
2- Item: a constituent of a repository from which metadata about a resource can be disseminated. That metadata may be disseminated on the fly, cross-walked from some canonical form, or actually stored in the repository.
3- Record: metadata in a specific metadata format.
Example (figure): a resource; an item (item = identifier) holding all available metadata about David; and the records disseminated from that item as Dublin Core metadata, MARC, and SPECTRUM (record = identifier + metadata format + datestamp).
Figure reference: http://www.cs.odu.edu/~mln/oaf-nelson.ppt
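The item/record distinction can be sketched as two small data structures. This is only an illustration, not part of the protocol: the class names, the identifier oai:example.org:david, the datestamps, and the idea of storing one payload per format are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata for one item, expressed in one metadata format."""
    identifier: str        # unique identifier of the item
    metadata_prefix: str   # e.g. "oai_dc" or "marc" (illustrative values)
    datestamp: str         # date of creation, modification, or deletion
    metadata: str = ""     # XML payload, abbreviated in this sketch


@dataclass
class Item:
    """A repository entry from which records can be disseminated."""
    identifier: str
    # Assumption of this sketch: one stored (datestamp, payload) pair per
    # format. A repository could also generate records on the fly.
    formats: dict = field(default_factory=dict)

    def record(self, metadata_prefix: str) -> Record:
        datestamp, metadata = self.formats[metadata_prefix]
        return Record(self.identifier, metadata_prefix, datestamp, metadata)


# Hypothetical item with two available formats.
david = Item("oai:example.org:david", {
    "oai_dc": ("2002-05-13", "<dc>...</dc>"),
    "marc": ("2002-05-13", "<marc>...</marc>"),
})
dc_record = david.record("oai_dc")
```

The same item yields a different record for each metadata format, which is why a record is identified by identifier, metadata format, and datestamp together.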
The XML encoding of records
A record is encoded in three parts: header, metadata, and about.
The following link shows the encoding of a record in XML: http://www.openarchives.org/OAI/openarchivesprotocol.html#Record
What happens if a record is deleted from the repository?
Repositories must declare one of three levels of support for deleted records (the deletedRecord element of the Identify response):
1- no: the repository does not maintain information about deletions. It MUST NOT reveal a deleted status in any response.
2- persistent: the repository maintains information about deletions with no time limit. It MUST persistently keep track of deletions and reveal the status of a deleted record.
3- transient: the repository keeps information about deletions only for a limited time. It MAY reveal a deleted status; not revealing the status is acceptable.
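On the harvester side, a deleted record is recognised by the status="deleted" attribute on its header. The following is a minimal sketch of that check using Python's standard XML parser; the sample record and its identifier are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def is_deleted(header):
    """Return True if a record header carries status="deleted"."""
    return header.get("status") == "deleted"


# Hypothetical record as a repository might return it after a deletion.
sample = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:example.org:1</identifier>
    <datestamp>2002-03-20</datestamp>
  </header>
</record>"""

header = ET.fromstring(sample).find(OAI_NS + "header")
deleted = is_deleted(header)
```

A harvester that mirrors a repository would typically remove or flag its local copy when this check is true.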
Selective Harvesting (datestamp and set)
Selective harvesting allows harvesters to limit harvest requests to portions of the metadata available from a repository.
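Selective Harvesting via datestamps
The from and until arguments, accepted by ListIdentifiers and ListRecords, bound the datestamps of the records returned.
Request: http://www3.bth.se/servlet/Cupp?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2006-01-01&until=2007-01-01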
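What a datestamp range selects can be sketched as a simple filter. This sketch assumes that both bounds are inclusive and that all datestamps use the same granularity; the function name and the sample datestamps are illustrative only.

```python
def in_range(datestamp, from_=None, until=None):
    """Return True if datestamp lies in the inclusive range [from_, until]."""
    # ISO 8601 strings of equal granularity sort chronologically,
    # so plain string comparison is enough in this sketch.
    if from_ is not None and datestamp < from_:
        return False
    if until is not None and datestamp > until:
        return False
    return True


# Made-up datestamps around the range used in the request.
datestamps = ["2005-12-31", "2006-01-01", "2006-07-15", "2007-01-01", "2007-01-02"]
selected = [d for d in datestamps if in_range(d, "2006-01-01", "2007-01-01")]
```

Omitting from_ or until leaves that side of the range open.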
Set membership
A set is an optional construct for grouping items for the purpose of selective harvesting.
Think of it as a fraternity: a student (item) may belong to a fraternity, but not all students belong to one.
Selective Harvesting via set
<record>
  <header>
    <identifier>oai:arXiv:cs/0112017</identifier>
    <datestamp>2001-12-14</datestamp>
    <setSpec>cs</setSpec>
    <setSpec>math</setSpec>
  </header>
  <metadata> ….. </metadata>
</record>
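A harvester can read the set membership of a record from the setSpec elements in its header. The sketch below parses the header shown on the slide; like the slide, it omits the XML namespace, which a real OAI-PMH response would carry, and the helper name set_specs is an assumption.

```python
import xml.etree.ElementTree as ET

# Header copied from the slide, without the OAI-PMH namespace.
header_xml = """<header>
  <identifier>oai:arXiv:cs/0112017</identifier>
  <datestamp>2001-12-14</datestamp>
  <setSpec>cs</setSpec>
  <setSpec>math</setSpec>
</header>"""


def set_specs(header):
    """Return the setSpec values listed in a record header."""
    return [element.text for element in header.findall("setSpec")]


sets = set_specs(ET.fromstring(header_xml))
```

Here the item belongs to two sets at once, cs and math.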
Date/time
1957-03-20T20:30:00Z is 8:30:00 PM UTC on March 20th, 1957.
Encoding: ISO 8601, UTC with Z notation.
Request: YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ.
Response: YYYY-MM-DDThh:mm:ssZ.
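Both datestamp forms can be parsed and produced with Python's standard datetime module. This is a minimal sketch; the function names and the rule of picking the format by looking for a "T" are assumptions of the example.

```python
from datetime import datetime, timezone

DAY = "%Y-%m-%d"                 # YYYY-MM-DD
SECONDS = "%Y-%m-%dT%H:%M:%SZ"   # YYYY-MM-DDThh:mm:ssZ


def parse_datestamp(text):
    """Parse either datestamp form into a UTC-aware datetime."""
    fmt = SECONDS if "T" in text else DAY
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def format_datestamp(moment, fmt=SECONDS):
    """Format an aware datetime as a UTC datestamp."""
    return moment.astimezone(timezone.utc).strftime(fmt)


parsed = parse_datestamp("1957-03-20T20:30:00Z")
```

Passing DAY as the second argument of format_datestamp gives the day-granularity form.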
The BIG PICTURE (figure)
Figure reference: http://www.oaforum.org/tutorial/english/page3.htm
Request/Response
Requests are encoded in HTTP; responses are encoded in XML.
Figure reference: http://www.cs.odu.edu/~mln/oai-cendi.ppt
GET Example
http://export.arxiv.org/oai2?verb=GetRecord&identifier=oai:arXiv.org:cs/0112017&metadataPrefix=oai_dc
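A GET request is the base URL followed by the verb and its arguments as key=value pairs. The sketch below builds the request above with Python's urllib.parse.urlencode; the helper name build_request is an assumption. Note that urlencode percent-encodes the ":" and "/" in the identifier, which decodes to the same request.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


url = build_request(
    "http://export.arxiv.org/oai2",
    "GetRecord",
    identifier="oai:arXiv.org:cs/0112017",
    metadataPrefix="oai_dc",
)
```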
Flow Control
List requests: several OAI-PMH requests (ListSets, ListIdentifiers, ListRecords) return a list.
The list could be very large, so a repository may partition it among a series of requests and responses, linked by a resumptionToken.
Flow Control Example (figure)
Harvester to repository (RDBMS): ListRecords
Repository to harvester: records 1-100, resumptionToken=AXad31
Harvester to repository: ListRecords, resumptionToken=AXad31
Repository to harvester: records 101-200, resumptionToken=pQ22-x
Harvester to repository: ListRecords, resumptionToken=pQ22-x
Repository to harvester: records 201-277
Figure reference: http://www.cs.odu.edu/~mln/oaf-nelson.ppt
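The exchange above amounts to a loop on the harvester side: keep reissuing the request with the latest resumptionToken until no token comes back. Here is a minimal sketch with no network access; fetch_page is a hypothetical callable standing in for the HTTP request and XML parsing, and the PAGES table imitates the repository in the figure.

```python
def harvest(fetch_page):
    """Collect all records from a list request split across responses.

    fetch_page(token) is a hypothetical callable: it takes a resumption
    token (None for the first request) and returns (records, next_token),
    where next_token is empty or None once the list is complete.
    """
    records, token = [], None
    while True:
        page, token = fetch_page(token)
        records.extend(page)
        if not token:
            return records


# Stand-in for a repository holding 277 records, served 100 at a time.
PAGES = {
    None: (list(range(1, 101)), "AXad31"),
    "AXad31": (list(range(101, 201)), "pQ22-x"),
    "pQ22-x": (list(range(201, 278)), None),
}

all_records = harvest(PAGES.__getitem__)
```

The harvester never interprets the token; it only hands it back to the repository unchanged.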
Response with no errors
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord"… …>http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata> ….. </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
Response with errors
In the event of an error or exception condition, repositories must indicate OAI-PMH errors by including the error in the response.
Example request with an illegal verb, which produces a badVerb error: http://arXiv.org/oai2?verb=nastyVerb
Example response to a ListRecords request with an illegal argument (badArgument):
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListRecords" from="2002-06-01T02:00:00Z" until="2002-06-01T03:020:00Z" metadataPrefix="oai_marc">http://memory.loc.gov/cgi-bin/oai</request>
  <error code="badArgument"/>
</OAI-PMH>
Figure reference: http://www.openarchives.org/OAI/openarchivesprotocol.html#Identify
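Because errors arrive inside an ordinary XML response, a harvester has to look for an error element before using the rest of the document. The following is a minimal sketch of that check; the OAIError class, the check_response helper, and the shortened sample response are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH error element."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}" if message else code)
        self.code = code


def check_response(xml_text):
    """Parse a response and raise OAIError if it reports an error."""
    root = ET.fromstring(xml_text)
    error = root.find(OAI_NS + "error")
    if error is not None:
        raise OAIError(error.get("code"), (error.text or "").strip())
    return root


# Shortened error response, modelled on the example above.
response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request>http://memory.loc.gov/cgi-bin/oai</request>
  <error code="badArgument"/>
</OAI-PMH>"""

try:
    check_response(response)
    error_code = None
except OAIError as exc:
    error_code = exc.code
```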
Request Verbs
There are six different request types:
1) GetRecord
2) Identify
3) ListIdentifiers
4) ListMetadataFormats
5) ListRecords
6) ListSets
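Argument Summary
Verb                 metadataPrefix  from      until     set       resumptionToken  identifier
Identify             -               -         -         -         -                -
ListMetadataFormats  -               -         -         -         -                optional
ListSets             -               -         -         -         exclusive        -
ListIdentifiers      required        optional  optional  optional  exclusive        -
ListRecords          required        optional  optional  optional  exclusive        -
GetRecord            required        -         -         -         -                required
A dash means the argument is not allowed. An exclusive argument, when present, must be the only argument besides the verb.
Figure reference: http://www.cs.odu.edu/~mln/jcdl03/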
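The argument rules can be written down as a lookup table and checked before a request is sent. This is a sketch under the rules of OAI-PMH 2.0 as summarised here; the ARGUMENTS table layout and the validate function are assumptions of the example, not part of any library.

```python
ARGUMENTS = {
    # verb: (required, optional, accepts resumptionToken as exclusive)
    "Identify": (set(), set(), False),
    "ListMetadataFormats": (set(), {"identifier"}, False),
    "ListSets": (set(), set(), True),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, True),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), False),
}


def validate(verb, arguments):
    """Return the OAI-PMH error code for a request, or None if it is legal."""
    if verb not in ARGUMENTS:
        return "badVerb"
    required, optional, exclusive = ARGUMENTS[verb]
    names = set(arguments)
    if exclusive and "resumptionToken" in names:
        # An exclusive argument must be the only one besides the verb.
        return None if names == {"resumptionToken"} else "badArgument"
    if not required <= names or not names <= required | optional:
        return "badArgument"
    return None
```

For instance, validate("ListRecords", {"metadataPrefix": "oai_dc", "set": "cs"}) returns None, while adding a resumptionToken to those arguments yields "badArgument".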
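Error Summary
Abbreviations: BA = badArgument, NMF = noMetadataFormats, IDDNE = idDoesNotExist, BRT = badResumptionToken, NSH = noSetHierarchy, CDF = cannotDisseminateFormat, NRM = noRecordsMatch
Identify: BA
ListMetadataFormats: BA, NMF, IDDNE
ListSets: BA, BRT, NSH
ListIdentifiers: BA, BRT, NSH, CDF, NRM
ListRecords: BA, BRT, NSH, CDF, NRM
GetRecord: BA, IDDNE, CDF
Figure reference: http://www.cs.odu.edu/~mln/jcdl03/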
Dublin Core
The Dublin Core metadata element set is a standard for cross-domain information resource description.
It has been the mandated metadata format since the initial release of the protocol.
The purpose of this requirement was to promote interoperability among data providers.
Example
http://memory.loc.gov/cgi-bin/oai2_0?verb=Identify
http://edoc.hu-berlin.de/OAI-2.0?verb=Identify
Repository explorer and example
Repository explorer: http://re.cs.uct.ac.za/
We shall discuss the following HU-Berlin example in the repository explorer above: http://edoc.hu-berlin.de/OAI-2.0
OAI-PMH service provider
http://www.ncstrl.org/ is a service provider that uses OAI-PMH.
Conclusion
OAI-PMH allows for any metadata format, so long as it is encoded in XML with an XML schema.
All repositories must support oai_dc for a minimum level of interoperability.
OAI-PMH now defines a single XML Schema to validate responses to all OAI-PMH requests.
In a successful and trend-setting collaboration with the Dublin Core Metadata Initiative (DCMI), an XML Schema for unqualified Dublin Core has been created. It is hosted by the DCMI and used to deliver metadata in the mandatory DC format in OAI-PMH.
Questions?
What are the benefits of OAI-PMH?
Is the Open Archives Initiative only concerned with metadata?
Why was Dublin Core chosen as the standard metadata format for OAI-PMH?
References
http://www.openarchives.org/OAI/openarchivesprotocol.html
http://www.oaforum.org/tutorial/
http://dublincore.org/
http://www.rsp.ac.uk/usage/harvesters
http://www.cs.odu.edu/~mln/jcdl03/
http://www.cs.odu.edu/~mln/oai-cendi.ppt [CENDI Meeting, MD (4/3/02)]
http://www.cs.odu.edu/~mln/oaf-nelson.ppt [OA Forum Workshop, Pisa, Italy (5/13/02)]