Published by Augustus Osborne. Modified over 9 years ago.
1
Metadata Harvesting: Interoperable digital collections
2
Distributed libraries
The reality in most digital libraries is that no single location holds all the materials that may be of interest. It is often more efficient to let a number of sites each retain some of the materials. How can we assure clients that they will see all relevant resources, regardless of which library they search?
3
Two basic approaches
One service provider with access to resources stored in multiple locations
 – Information about all the resources is located at the service provider.
 – Services (DL scenarios) use that information to provide connections to resources at multiple locations.
Distributed services
 – Information is kept with the resources.
 – Services, local to each collection, interact with other collection sites.
4
Two protocols
Z39.50
 – Developed before the web
 – Protocol for communicating with collection holders in order to provide services
Open Archives Initiative
 – Recent innovation
 – A central service provider gathers information from collection holders
5
Z39.50, briefly
Information Retrieval Service Definition and Protocol Specifications for Library Applications
Initially developed over the OSI network standards
Protocol for information exchange
 – Frees the information seeker from the need to know the details of the target database configuration
Each site provides services
 – Each service queries remote sites for the information it needs
Information requests are mapped to database queries at the collection site.
 – Some inconsistency in the interpretation of queries
6
Distributed Resources, Multiple Services
[Figure: one service provider (search, browse, compare, etc.) connected to several data providers]
Approach 1: one service provider gathers information about the data and uses it to provide services.
7
Distributed data and services
[Figure: several systems, each offering its own services (search, browse; or search, browse, compare)]
Approach 2: each system is both a data repository and a service provider. Services query other data providers as needed.
8
Hybrid systems
[Figure: service providers (search, browse, compare, etc.) alongside data providers]
Each server is likely to have its own clients. The difference is whether the information exchange is periodic or ad hoc.
9
Open Archives Initiative (OAI)
Web-based
 – Uses HTTP to communicate between sites
Centralized server
 – Services are provided from a site that has already gathered, from a distributed collection of sites, the information it needs for those services.
10
OAI-PMH: Interoperability through Metadata Exchange
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs, or services, that are invoked within HTTP.
http://www.openarchives.org/pmh/
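Because each OAI-PMH request is an ordinary HTTP GET whose query string carries the verb and its arguments, a harvester can build requests with standard URL encoding. The following is a minimal Python sketch; `build_oai_request` is a hypothetical helper name, and `http://archive.org/oai` is the placeholder base URL that appears on the Identify slide.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# The six OAI-PMH verbs.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url: str, verb: str,
                      args: Optional[Dict[str, str]] = None) -> str:
    """Return the HTTP GET URL for one OAI-PMH request (hypothetical helper).

    The verb travels as an ordinary query parameter; any other arguments
    (identifier, metadataPrefix, from, until, set, resumptionToken) are
    URL-encoded after it.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **(args or {})})
    return f"{base_url}?{query}"


print(build_oai_request("http://archive.org/oai", "Identify"))
# http://archive.org/oai?verb=Identify
print(build_oai_request("http://archive.org/oai", "ListIdentifiers",
                        {"metadataPrefix": "oai_dc", "from": "2002-12-01"}))
# http://archive.org/oai?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01
```

The arguments are passed as a dictionary, not as keyword arguments, because `from` is a reserved word in Python.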
11
OAI-ORE: Aggregations of Web Resources
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. These aggregations, sometimes called compound digital objects, may combine distributed resources with multiple media types including text, images, data, and video. The goal of these standards is to expose the rich content in these aggregations to applications that support authoring, deposit, exchange, visualization, reuse, and preservation. Although a motivating use case for the work is the changing nature of scholarship and scholarly communication, and the need for cyberinfrastructure to support that scholarship, the intent of the effort is to develop standards that generalize across all web-based information, including the increasingly popular social networks of "web 2.0".
http://www.openarchives.org/ore/
12
OAI-ORE example
13
OAI-ORE
ORE allows related web pages to be aggregated into a logical unit.
 – The representation gives access to all of the components of a resource at once.
http://www.openarchives.org/ore/1.0/primer.html#Example
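The ORE data model can be made concrete with a handful of RDF statements: a Resource Map describes an Aggregation, and the Aggregation aggregates its component resources. This is a minimal sketch assuming only the ORE terms namespace; the function names and the example.org URIs are invented for illustration. A real resource map would normally be produced with an RDF library and carry further metadata.

```python
from typing import List, Tuple

# OAI-ORE vocabulary namespace and the RDF "type" predicate.
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def resource_map_triples(rem_uri: str, aggregation_uri: str,
                         parts: List[str]) -> List[Triple]:
    """Describe one aggregation (compound digital object) as RDF triples.

    The Resource Map describes the Aggregation; the Aggregation
    aggregates each component web resource.
    """
    triples: List[Triple] = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    for part in parts:
        triples.append((aggregation_uri, ORE + "aggregates", part))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    # N-Triples: one "<subject> <predicate> <object> ." line per triple.
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical compound object: an article plus its dataset.
triples = resource_map_triples(
    "http://example.org/rem/article-1",
    "http://example.org/aggregation/article-1",
    ["http://example.org/article-1.pdf", "http://example.org/article-1-data.csv"],
)
print(to_ntriples(triples))
```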
14
Our focus
We will concentrate on OAI-PMH
 – Allowing us to know about other resources of interest to our societies
 – Allowing others to know about the resources we have available
15
Older approaches (1)
Z39.50
 – Special-purpose protocol (machine to machine, not a web interface)
 – Gathers information when it is requested, not on a scheduled basis
16
OAI compared to Z39.50
                       Z39.50            OAI
Content (objects)      Distributed (both)
World view             Bibliographic (both)
Object presentation    Data provider (both)
Searching is           Distributed       Centralized
Search done by         Data provider     Service provider
Metadata searched is   Up to date        Stale
Semantic mapping       When searching    At metadata delivery
Source: oai.grainger.uiuc.edu/FinalReport/JCDL_2003_OAI_Intro.ppt
17
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
[Figure: a Service Provider's OAI harvester sends an HTTP request carrying an OAI verb to a Metadata Provider's OAI repository, which replies with an HTTP response in XML]
OAI-PMH defines an interface between the harvester and any number of repositories.
 – Implemented as CGI, ASP, PHP, or other
 – Any system may serve as a harvester, a repository, or both
18
OAI-PMH components
 – Service Providers and Data Providers
 – Requests and Responses
http://www.oaforum.org/tutorial/english/page3.htm#section3
19
Records
A record is the metadata of a resource. It has three parts:
 – Header (required)
   Identifier (required: exactly 1)
   Datestamp (required: exactly 1)
   setSpec elements (optional: 0, 1, or more)
   Status attribute, marking a deleted item
 – Metadata (required, except for deleted items)
   XML-encoded metadata with a root tag and namespace
   Repositories must support Dublin Core; other formats are optional
 – "About" statement (optional)
   Rights statements
   Provenance statements
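The three-part record structure maps directly onto XML elements in the OAI-PMH 2.0 namespace. A minimal Python sketch using the standard-library ElementTree parser; `parse_header` is a hypothetical helper, and the sample record is invented (it reuses the identifier from the Identifiers slide and the `biology` set from the ListRecords example). In a real response the `<record>` would sit inside a `<GetRecord>` or `<ListRecords>` element.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response elements, in ElementTree's {uri} form.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical deleted record: it keeps a header but has no <metadata>.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header status="deleted">
    <identifier>oai:etd.vt.edu:etd-1234567890</identifier>
    <datestamp>2002-12-01</datestamp>
    <setSpec>biology</setSpec>
  </header>
</record>"""


def parse_header(record_xml: str) -> dict:
    """Pull the required and optional header parts out of one <record>."""
    record = ET.fromstring(record_xml)
    header = record.find(OAI + "header")
    if header is None:
        raise ValueError("record has no <header>")
    return {
        "identifier": header.findtext(OAI + "identifier"),  # exactly one
        "datestamp": header.findtext(OAI + "datestamp"),    # exactly one
        "setSpecs": [s.text for s in header.findall(OAI + "setSpec")],  # 0..n
        "deleted": header.get("status") == "deleted",
        "hasMetadata": record.find(OAI + "metadata") is not None,
    }


print(parse_header(SAMPLE_RECORD))
```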
20
Identifiers
Globally unique identifier
Valid URI
 – Examples: oai:<namespace-identifier>:<local-identifier>, such as oai:etd.vt.edu:etd-1234567890
 – Must resolve to one item
   No duplicates
   No reuse of previously used identifiers
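The oai-identifier convention can be checked and split with plain string handling. This is a simplified sketch, not the full grammar from the specification; both function names are hypothetical, and `issue_identifier` only illustrates the "no duplicates, no reuse" rule from a repository's point of view.

```python
from typing import Set, Tuple


def split_oai_identifier(identifier: str) -> Tuple[str, str]:
    """Split 'oai:<namespace-identifier>:<local-identifier>' into its parts.

    Simplified check: the scheme must be 'oai', the namespace part must
    look like a domain name, and the local part must be non-empty.
    The local part may itself contain ':'.
    """
    scheme, _, rest = identifier.partition(":")
    namespace, sep, local = rest.partition(":")
    if scheme != "oai" or not sep or "." not in namespace or not local:
        raise ValueError(f"not an oai-identifier: {identifier!r}")
    return namespace, local


def issue_identifier(identifier: str, ever_issued: Set[str]) -> None:
    """Hypothetical repository-side guard: no duplicates and no reuse.

    ever_issued must keep identifiers of deleted items too, so that an
    old identifier is never handed out again.
    """
    if identifier in ever_issued:
        raise ValueError(f"identifier already used: {identifier}")
    ever_issued.add(identifier)


print(split_oai_identifier("oai:etd.vt.edu:etd-1234567890"))
# ('etd.vt.edu', 'etd-1234567890')
```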
21
Datestamps
Date of last modification of a record
 – Used only for harvesting (meta-metadata?)
Mandatory for each item in the repository
Two levels of granularity are possible
 – YYYY-MM-DD
 – YYYY-MM-DDThh:mm:ssZ (T separates the date from the time; Z marks the time zone, which must be UTC/GMT)
Allows incremental harvesting: get only what is new since the last visit
 – Selected with the from and until arguments
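Incremental harvesting mostly comes down to formatting the time of the previous visit as a UTC datestamp at a granularity the repository supports (advertised in its Identify response). A minimal sketch; `to_datestamp` and `incremental_args` are hypothetical helper names. Note that `from` is inclusive, so records stamped exactly at the boundary may be fetched twice.

```python
from datetime import datetime, timedelta, timezone

DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"


def to_datestamp(moment: datetime, granularity: str = DAY) -> str:
    """Format an aware datetime as an OAI-PMH datestamp, always in UTC."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    if granularity == DAY:
        return utc.strftime("%Y-%m-%d")
    if granularity == SECONDS:
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unknown granularity: {granularity}")


def incremental_args(last_harvest: datetime, granularity: str = DAY) -> dict:
    """Request only records added, changed, or deleted since the last visit."""
    return {"verb": "ListRecords", "metadataPrefix": "oai_dc",
            "from": to_datestamp(last_harvest, granularity)}


# A harvest that finished at 21:37:44 local time in a UTC-4 time zone.
last = datetime(2006, 10, 16, 21, 37, 44, tzinfo=timezone(timedelta(hours=-4)))
print(to_datestamp(last, SECONDS))  # 2006-10-17T01:37:44Z
print(incremental_args(last))
```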
22
The OAI-PMH verbs
Each requests a specific response from a data repository.
23
Identify
Function: description of the archive
Example: http://www.language-archives.org/cgi-bin/olaca3.pl?verb=Identify
Parameters: none
Errors/exceptions:
 – badArgument (if any argument other than verb is supplied)
Response format:
Element             Example                              Ordinality ‡
repositoryName      My Archive                           1
baseURL             http://archive.org/oai               1
protocolVersion     2.0                                  1
earliestDatestamp   1999-01-01                           1
deletedRecord       no, transient, persistent            1
granularity         YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ     1
adminEmail          oai-admin@archive.org                +
compression         deflate, compress                    *
description         oai-identifier, eprints, friends, …  *
‡ Ordinality: 1 = mandatory, exactly 1; + = mandatory, 1 or more; * = optional, 0 or more
24
Actual response from http://www.language-archives.org/cgi-bin/olaca3.pl?verb=Identify
 responseDate: 2006-10-17T01:37:44Z
 request: http://www.language-archives.org/cgi-bin/olaca3.pl
 repositoryName: OLAC Aggregator
 baseURL: http://www.language-archives.org/cgi-bin/olaca3.pl
 protocolVersion: 2.0
 adminEmail: mailto:haejoong@ldc.upenn.edu
 earliestDatestamp: 2002-12-14
 deletedRecord: no
 granularity: YYYY-MM-DD
 <!-- maybe later identity -->
Continued
25
description (oai-identifier):
 scheme: oai
 repositoryIdentifier: OLACA.language-archives.org
 delimiter: :
 sampleIdentifier: oai:ethnologue.com:aaa
Continued
26
description (continued):
 http://www.language-archives.org:8082/dp9/
 Steven Bird & Gary Simons, Coordinators, mailto:olac-admin@language-archives.org
 Open Language Archives Community, http://www.language-archives.org/, Philadelphia, U.S.A.
 "This repository contains all records from OLAC-registered archives. It is intended to be used by services which do not want to harvest individual OLAC archives."
 "Metadata may be used only subject to the access permissions given by the individual archives."
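A harvester typically reads the Identify response once and keeps the fields it needs, such as `granularity` and `earliestDatestamp`, for later requests. The sketch below parses an abridged XML version of the OLAC response above (description blocks omitted), using the OAI-PMH 2.0 element names; `parse_identify` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Abridged version of the OLAC Identify response (descriptions omitted).
IDENTIFY_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2006-10-17T01:37:44Z</responseDate>
  <request verb="Identify">http://www.language-archives.org/cgi-bin/olaca3.pl</request>
  <Identify>
    <repositoryName>OLAC Aggregator</repositoryName>
    <baseURL>http://www.language-archives.org/cgi-bin/olaca3.pl</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>mailto:haejoong@ldc.upenn.edu</adminEmail>
    <earliestDatestamp>2002-12-14</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

# Elements that appear exactly once in an Identify response.
SINGLE = ("repositoryName", "baseURL", "protocolVersion",
          "earliestDatestamp", "deletedRecord", "granularity")


def parse_identify(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    ident = root.find(OAI + "Identify")
    if ident is None:
        raise ValueError("response has no <Identify> element")
    info = {name: ident.findtext(OAI + name) for name in SINGLE}
    # adminEmail is mandatory but may repeat, so collect every occurrence.
    info["adminEmail"] = [e.text for e in ident.findall(OAI + "adminEmail")]
    return info


info = parse_identify(IDENTIFY_XML)
print(info["repositoryName"], info["granularity"])  # OLAC Aggregator YYYY-MM-DD
```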
27
ListMetadataFormats
Function: retrieve the metadata formats available from an archive
Example: archive.org/oai-script?verb=ListMetadataFormats&identifier=oai:HUBerlin.de:3000218
Parameters: identifier (optional)
Errors/exceptions:
 – badArgument
 – idDoesNotExist
 – noMetadataFormats
28
Response to http://www.language-archives.org/cgi-bin/olaca3.pl?verb=ListMetadataFormats (responseDate 2006-10-17T01:58:06Z)
metadataPrefix   schema                                                metadataNamespace
olac             http://www.language-archives.org/OLAC/1.0/olac.xsd    http://www.language-archives.org/OLAC/1.0/
olac_display     http://www.language-archives.org/OLAC/1.0/olac.xsd    http://www.language-archives.org/OLAC/1.0/
oai_dc           http://www.openarchives.org/OAI/2.0/oai_dc.xsd        http://www.openarchives.org/OAI/2.0/oai_dc/
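Before asking for records, a harvester can check which `metadataPrefix` values the repository offers. A minimal sketch over an abridged XML form of the response above (`olac_display` omitted); `list_formats` is a hypothetical helper, and it turns an OAI-PMH `<error>` element (for example `idDoesNotExist` or `noMetadataFormats`) into a Python exception.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Abridged from the OLAC response above (olac_display omitted).
FORMATS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2006-10-17T01:58:06Z</responseDate>
  <request verb="ListMetadataFormats">http://www.language-archives.org/cgi-bin/olaca3.pl</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>olac</metadataPrefix>
      <schema>http://www.language-archives.org/OLAC/1.0/olac.xsd</schema>
      <metadataNamespace>http://www.language-archives.org/OLAC/1.0/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_formats(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each metadataPrefix to its (schema, metadataNamespace) pair."""
    root = ET.fromstring(xml_text)
    error = root.find(OAI + "error")
    if error is not None:  # e.g. idDoesNotExist or noMetadataFormats
        raise RuntimeError(f"{error.get('code')}: {(error.text or '').strip()}")
    return {
        fmt.findtext(OAI + "metadataPrefix"):
            (fmt.findtext(OAI + "schema"), fmt.findtext(OAI + "metadataNamespace"))
        for fmt in root.iter(OAI + "metadataFormat")
    }


formats = list_formats(FORMATS_XML)
print(sorted(formats))  # ['oai_dc', 'olac']
```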
29
ListSets
Function: retrieve the set structure of a repository
Example: archive.org/oai-script?verb=ListSets
Parameters: resumptionToken (exclusive: if present, it is the only argument besides verb)
Errors/exceptions:
 – badArgument
 – badResumptionToken
 – noSetHierarchy
Sets are optional. They divide a repository into separate units that will be of interest to different harvesters.
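Set specifications are colon-separated paths, so a flat ListSets response also encodes a hierarchy. The sketch below parses a hypothetical, abridged ListSets response (the mandatory `responseDate` and `request` elements are left out for brevity) and recovers each set's parent; `parse_sets` and `parent_set` are invented helper names, and the `biology:genetics` set is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, abridged ListSets response (responseDate/request omitted).
SETS_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>biology</setSpec><setName>Biology</setName></set>
    <set><setSpec>biology:genetics</setSpec><setName>Genetics</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text: str) -> Dict[str, str]:
    """Map each setSpec to its human-readable setName."""
    root = ET.fromstring(xml_text)
    return {s.findtext(OAI + "setSpec"): s.findtext(OAI + "setName")
            for s in root.iter(OAI + "set")}


def parent_set(spec: str) -> Optional[str]:
    # setSpecs are colon-separated paths: "biology:genetics" lies inside "biology".
    head, sep, _ = spec.rpartition(":")
    return head if sep else None


for spec, name in parse_sets(SETS_XML).items():
    print(spec, name, parent_set(spec))
```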
30
ListIdentifiers
Function: abbreviated form of ListRecords; retrieves only headers
Example: archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from=2002-12-01
Parameters:
 – from (optional)
 – until (optional)
 – metadataPrefix (required)
 – set (optional)
 – resumptionToken (exclusive)
Errors/exceptions:
 – badArgument
 – badResumptionToken
 – cannotDisseminateFormat
 – noRecordsMatch
 – noSetHierarchy
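The parameter list above amounts to three rules: only the five named arguments are allowed, `metadataPrefix` is required, and `resumptionToken` must appear alone. A minimal sketch of a repository-side check for ListIdentifiers and ListRecords; `check_list_args` is a hypothetical name, it covers only these slide-level rules (not datestamp validation), and it signals failure with a ValueError whose message mimics the `badArgument` error code.

```python
from typing import Dict

# Arguments that ListIdentifiers and ListRecords accept besides "verb".
LIST_ARGS = {"from", "until", "metadataPrefix", "set", "resumptionToken"}


def check_list_args(args: Dict[str, str]) -> None:
    """Raise ValueError, mirroring badArgument, when a ListIdentifiers or
    ListRecords request breaks the parameter rules."""
    unknown = set(args) - LIST_ARGS
    if unknown:
        raise ValueError(f"badArgument: unexpected {sorted(unknown)}")
    if "resumptionToken" in args:
        # Exclusive: a resumption request carries nothing else.
        if len(args) > 1:
            raise ValueError("badArgument: resumptionToken is exclusive")
        return
    if "metadataPrefix" not in args:
        raise ValueError("badArgument: metadataPrefix is required")


check_list_args({"metadataPrefix": "oai_dc", "from": "2002-12-01"})  # passes
```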
31
ListRecords
Function: harvest records from a repository
Example: archive.org/oai-script?verb=ListRecords&metadataPrefix=oai_dc&set=biology
Parameters:
 – from (optional)
 – until (optional)
 – metadataPrefix (required)
 – set (optional)
 – resumptionToken (exclusive)
Errors/exceptions:
 – badArgument
 – badResumptionToken
 – cannotDisseminateFormat
 – noRecordsMatch
 – noSetHierarchy
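A complete harvest usually spans several responses: the repository returns part of the list plus a resumptionToken, and the harvester sends that token back (alone, since it is exclusive) until a response arrives with an empty or missing token. A minimal sketch of that loop; `harvest`, `http_fetch`, and the `Fetch` type are hypothetical names, and the transport is passed in as a function so the loop can be exercised without a live repository.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# fetch(args) returns the XML text of one OAI-PMH response.
Fetch = Callable[[Dict[str, str]], str]


def http_fetch(base_url: str) -> Fetch:
    """Real transport: one HTTP GET per request."""
    def fetch(args: Dict[str, str]) -> str:
        with urlopen(f"{base_url}?{urlencode(args)}") as response:
            return response.read().decode("utf-8")
    return fetch


def harvest(fetch: Fetch, metadata_prefix: str = "oai_dc",
            set_spec: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield every <record> of a ListRecords result, following
    resumptionTokens until the list is complete."""
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        args["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(args))
        error = root.find(OAI + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result, not a failure
            raise RuntimeError(error.get("code"))
        page = root.find(OAI + "ListRecords")
        if page is None:
            raise RuntimeError("response has no <ListRecords> element")
        yield from page.findall(OAI + "record")
        token = page.find(OAI + "resumptionToken")
        # A missing or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return
        # The token is exclusive: the next request carries only verb and token.
        args = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# Usage against a live repository (not run here):
# for record in harvest(http_fetch("http://archive.org/oai"), set_spec="biology"):
#     ...
```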
32
GetRecord
Function: retrieve an individual metadata record from a repository
Example: archive.org/oai-script?verb=GetRecord&identifier=oai:HUBerlin.de:3000218&metadataPrefix=oai_dc
Parameters:
 – identifier (required)
 – metadataPrefix (required)
Errors/exceptions:
 – badArgument
 – cannotDisseminateFormat
 – idDoesNotExist
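With `metadataPrefix=oai_dc`, the `<metadata>` part of the returned record holds a Dublin Core block, and its fields can be collected generically by namespace. A minimal sketch; `dublin_core` is a hypothetical helper, the identifier comes from the example above, and the datestamp, title, and creators in the sample are invented. Every Dublin Core element may repeat, so each field maps to a list.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical GetRecord response; title, creators, datestamp are invented.
GETRECORD_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:HUBerlin.de:3000218</identifier>
        <datestamp>2002-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example title</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def dublin_core(xml_text: str) -> Dict[str, List[str]]:
    """Collect every Dublin Core element of a GetRecord response."""
    root = ET.fromstring(xml_text)
    metadata = root.find(f"{OAI}GetRecord/{OAI}record/{OAI}metadata")
    if metadata is None:
        return {}  # e.g. a deleted record carries only a header
    fields: Dict[str, List[str]] = {}
    for elem in metadata.iter():
        if elem.tag.startswith(DC):
            fields.setdefault(elem.tag[len(DC):], []).append(elem.text)
    return fields


print(dublin_core(GETRECORD_XML))
```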
35
Interoperability
The goal: communication, without human intervention, between information sources
 – Books that "talk to each other"
   Live links for references
   Knowledge of how to find relevant resources when needed
   Ability to query other information locations
36
Protocols
Precise rules for interactions between independent processes
 – Format of the messages: both structure and content
 – Specified behavior in response to specific messages
There are many ways to accomplish the same result, but both sides must share the same understanding of the rules of engagement.
37
Protocol types
RPC model
 – Point to point
 – Completely open to definition by the developer
   Verbs (methods)
   Nouns (objects, resources)
 – Useful to a closed community or group that knows about the availability of the resource
38
SOAP
The words the acronym originally stood for (Simple Object Access Protocol) have been dropped; "SOAP" is now simply a name.
Initially promoted as part of the Microsoft .NET paradigm
 – Now a W3C standard (SOAP 1.2 became a W3C Recommendation in 2003)
Stateless, one-way message exchange paradigm
XML encoded
The flexibility of RPC, but more constrained in how the communication is formatted
39
REST
REpresentational State Transfer
An after-the-fact definition of the architecture of the World Wide Web
The model is:
 – Client/server
 – Stateless
 – Cacheable
 – Layered
The resource interface is constrained:
 – Restricted verbs
 – Restricted content types
40
REST and RPC
RPC provides flexibility for any type of interaction between any type of resources.
REST provides consistency, allowing resources to interact without prior discovery of the accepted actions and responses.
41
SOAP and REST
There is debate in the Web community about which is the better paradigm for application development.
 – REST: restricted, but a simple extension of existing Web processes
 – SOAP: added flexibility, at a cost in bandwidth, security, and development complexity
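To make the bandwidth point concrete, the sketch below phrases the same GetRecord request both ways. OAI-PMH itself is defined only over plain HTTP GET/POST, so the SOAP form is purely hypothetical: the `APP_NS` namespace and the body element names are invented for illustration, while the envelope namespace is the real SOAP 1.2 one. The function names are also hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlencode

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"    # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/hypothetical-oai-service"  # invented for illustration
ET.register_namespace("env", SOAP_ENV)


def rest_request(base_url: str, args: Dict[str, str]) -> str:
    # REST style: a GET on a URL; the query string carries everything.
    return f"GET {base_url}?{urlencode(args)}"


def soap_request(args: Dict[str, str]) -> str:
    # SOAP style: the same arguments wrapped in an XML envelope (sent by POST).
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{args['verb']}")
    for name, value in args.items():
        if name != "verb":
            ET.SubElement(call, f"{{{APP_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


ARGS = {"verb": "GetRecord", "identifier": "oai:HUBerlin.de:3000218",
        "metadataPrefix": "oai_dc"}
rest = rest_request("http://archive.org/oai", ARGS)
soap = soap_request(ARGS)
print(len(rest), rest)
print(len(soap), soap)
```

The SOAP message is several times longer before any HTTP headers are counted, which is the bandwidth cost mentioned above; in exchange, the envelope can carry structured headers and arbitrary nested content that a query string cannot.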
42
References
Giving SOAP a REST. http://www.devx.com/DevX/Article/8155
SOAP Version 1.2 Part 0: Primer. http://www.w3.org/TR/2003/REC-soap12-part0-20030624/#L1153
OAI for Beginners: The Open Archives Forum online tutorial. http://www.oaforum.org/tutorial/index.php
Z39.50 Resource Page. http://www.niso.org/standards/resources/Z3950_Resources.html
Z39.50: An Overview of Development and the Future (1995). http://www.cqs.washington.edu/~camel/z/z.html