The Open Archives Initiative. DRIADE Workshop, Durham NC, May 16-17, 2007. Michael L. Nelson, Computer Science, Old Dominion University


1 The Open Archives Initiative
Michael L. Nelson
Computer Science, Old Dominion University
www.cs.odu.edu/~mln/
www.openarchives.org

2 Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Data providers / repositories:
o “A repository is a network accessible server that can process the 6 OAI-PMH requests in the manner described in [the OAI-PMH document]. A repository is managed by a data provider to expose metadata to harvesters.”
Service providers / harvesters:
o “A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.”

3 Data Providers / Service Providers
[Figure: data providers (repositories) and service providers (harvesters)]

4 Overview of OAI-PMH Verbs
Repository verbs:
o Identify: description of the repository
o ListMetadataFormats: metadata formats supported by the repository
o ListSets: sets defined by the repository
Metadata harvesting verbs:
o ListIdentifiers: OAI unique identifiers contained in the repository
o ListRecords: listing of N records
o GetRecord: listing of a single record
Most verbs take arguments: dates, sets, identifiers, metadata formats, and a resumption token (for flow control).
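To make the verbs and the resumption-token flow control concrete, here is a minimal harvesting sketch in Python using only the standard library. The function names (`build_request`, `resumption_token`, `harvest`) and the example base URL are hypothetical; the six verbs, the `from` argument, the OAI-PMH 2.0 namespace, and the rule that `resumptionToken` is an exclusive argument come from the OAI-PMH specification. Error responses and HTTP retries are deliberately ignored.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# The six OAI-PMH verbs listed on the slide.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def build_request(base_url: str, verb: str, **args: str) -> str:
    """Return an OAI-PMH GET request URL for `verb` with the given arguments."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for key, value in args.items():
        # 'from' is a Python keyword, so callers spell it from_.
        params["from" if key == "from_" else key] = value
    return f"{base_url}?{urlencode(params, safe=':')}"


def resumption_token(xml_text: str) -> Optional[str]:
    """Return the resumptionToken of a list response, or None when the
    element is absent or empty (i.e., the list is complete)."""
    token = ET.fromstring(xml_text).find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


def harvest(base_url: str, verb: str = "ListRecords", **args: str) -> Iterator[str]:
    """Yield raw XML response pages, following resumption tokens."""
    url: Optional[str] = build_request(base_url, verb, **args)
    while url is not None:
        with urllib.request.urlopen(url) as resp:
            page = resp.read().decode("utf-8")
        yield page
        token = resumption_token(page)
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        url = build_request(base_url, verb, resumptionToken=token) if token else None
```

For example, `harvest("http://example.org/oai", metadataPrefix="oai_dc", from_="2007-05-16")` would walk an entire incremental harvest one page at a time; the repository, not the harvester, decides how many records each page holds.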

5 OAI-PMH Data Model
[Figure: resource, item, records]
o item: entry point to all records pertaining to the resource; identified by the OAI-PMH identifier and may belong to OAI-PMH sets
o records: metadata pertaining to the resource (e.g., Dublin Core metadata, MARCXML metadata); each record is identified by the OAI-PMH identifier, metadataPrefix, and datestamp
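The data model can be sketched as two Python dataclasses. The class and method names, the example identifier, and the `"physics"` set are hypothetical; the only rules taken from OAI-PMH are that an item is the entry point to all records about a resource, that items can belong to sets, and that a record is identified by the triple (identifier, metadataPrefix, datestamp).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Metadata pertaining to the resource, in one metadata format."""
    metadata_prefix: str  # e.g. "oai_dc"
    datestamp: str        # e.g. "2007-05-16"
    metadata: str         # serialized metadata, opaque in this sketch


@dataclass
class Item:
    """Entry point to all records pertaining to one resource."""
    identifier: str                                   # OAI-PMH identifier
    sets: set[str] = field(default_factory=set)       # set membership
    records: dict[str, Record] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        # At most one record per metadata format for a given item.
        self.records[record.metadata_prefix] = record

    def get_record(self, metadata_prefix: str) -> Record:
        # A real repository would answer a missing format with the
        # cannotDisseminateFormat error; this sketch just raises KeyError.
        return self.records[metadata_prefix]

    def record_key(self, metadata_prefix: str) -> tuple[str, str, str]:
        """The triple that unambiguously identifies a record."""
        record = self.get_record(metadata_prefix)
        return (self.identifier, record.metadata_prefix, record.datestamp)
```

Keying records by metadataPrefix inside the item mirrors how `GetRecord` takes exactly an identifier and a metadataPrefix.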

6 Complexity Comes to OAI-PMH…
First noticed in how people would populate their Dublin Core records:
o people need the HTML splash page
o crawlers need the PDF file
Ad-hoc conventions and methods were used to expose the repository’s knowledge about the structure of the object.
The next three slides are taken from “Resource Harvesting Within the OAI-PMH Framework”:
o http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html

7 Dublin Core Encoding Type 1
A Simple Parallel-Plate Resonator Technique for Microwave Characterization of Thin Resistive Films
Vorobiev, A.
ING-INF/01 Elettronica (Electronics)
A parallel-plate resonator method is proposed for non-destructive characterisation of resistive films used in microwave integrated circuits. A slot made in one...
Microwave engineering Europe
2002
Documento relativo ad una Conferenza o altro Evento (Document relating to a conference or other event)
PeerReviewed
http://amsacta.cib.unibo.it/archive/00000014/ (splash page)
pdf
http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf (locator of resource)

8 Dublin Core Encoding Type 2
…
http://amsacta.cib.unibo.it/archive/00000014/ (splash page)
http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf (locator of resource)
…

9 Dublin Core Encoding Type 3
…
http://amsacta.cib.unibo.it/archive/00000014/ (splash page)
http://resolver.unibo.it/00000014/
http://amsacta.cib.unibo.it/archive/00000014/01/GaAs_1_Vorobiev.pdf (locator of resource)
…
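Because the three encodings put the splash page and the PDF in the record in different ad hoc ways, a harvester that wants "the actual file" is left guessing. The sketch below shows one such guess, classifying URLs by file extension. It is a hypothetical heuristic (the extension list and function name are assumptions, not part of Dublin Core or OAI-PMH) and is meant to show how fragile this guessing is, not to recommend it.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extensions treated as "the resource itself".
FILE_EXTENSIONS = {".pdf", ".ps", ".doc", ".zip"}


def classify_locators(urls: list[str]) -> dict[str, list[str]]:
    """Guess which URLs point at a file and which at a splash/other page.

    The DC record does not say which URL is which; this heuristic only
    looks at the path's file extension."""
    guess: dict[str, list[str]] = {"resource": [], "splash_or_other": []}
    for url in urls:
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        key = "resource" if ext in FILE_EXTENSIONS else "splash_or_other"
        guess[key].append(url)
    return guess
```

Applied to the Type 3 URLs, the heuristic cannot tell the resolver URL from the splash page, which is exactly the kind of ambiguity that motivates describing object structure explicitly.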

10 OAI Object Re-Use and Exchange (ORE)
Develop, identify, and profile extensible standards and protocols to allow repositories, agents, and services to interoperate in the context of use and reuse of compound digital objects beyond the boundaries of the holding repositories.
Aim for more effective and consistent ways:
o to facilitate discovery of these objects,
o to reference (link to) these objects (and parts thereof),
o to obtain a variety of disseminations of these objects,
o to aggregate and disaggregate these objects,
o to enable processing by automated agents.

11 The Structure of Compound Objects is Obfuscated When Mapped to the Web

12 What is useful for humans is often different from what is useful for applications
[Figure label: HTTP Link header]
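The figure labels an HTTP Link header as the application-facing path. As a sketch, assuming the header advertises a machine-readable description under some `rel` value, a client could read it like this. The `rel="resourcemap"` value and the URLs in the usage note are hypothetical, and the parser is deliberately simplified: it does not handle commas or semicolons inside URIs or quoted parameter values.

```python
def parse_link_header(value: str) -> dict[str, str]:
    """Map each rel value to its target URI in a simple HTTP Link header."""
    links: dict[str, str] = {}
    for part in value.split(","):
        segments = [s.strip() for s in part.split(";")]
        target = segments[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue  # not a well-formed link-value; skip it
        uri = target[1:-1]
        for param in segments[1:]:
            name, _, val = param.partition("=")
            if name.strip().lower() == "rel":
                # A rel parameter may carry several space-separated types.
                for rel in val.strip().strip('"').split():
                    links[rel] = uri
    return links
```

For example, `parse_link_header('<http://example.org/rem/14>; rel="resourcemap"')` maps `"resourcemap"` to the hypothetical map URI, while a browser rendering the same HTML page for a human would simply ignore the header.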

13 Through the Resource Map, the Web application sees the compound object

14 This approach reveals compound objects in the Web graph

15 OAI: It’s Not Just for Metadata Harvesting Anymore…
OAI-PMH / OAI-ORE:
o repository structure / object structure
o metadata centric / resource centric
o metadata harvesting / object re-use (obtain, harvest, register)
OAI-PMH and OAI-ORE are complementary:
o you can do one without the other
o you can do them together

16 OAI-ORE: Current Status
Ongoing definition of the ORE framework:
o Reach joint problem statement
o Issues regarding identification
o Model for ORE resource
o Publishing ORE resources to the Web
o Discovering ORE resources
Review of appropriate technologies for ORE Model and Resource Map:
o ATOM
o DID/DIDL, IMS/CP, METS, Ramlet
o RDF, RDF/XML
o Dublin Core Abstract Model
o …

17 OAI-ORE: Current Status
Explore demonstrators using these concepts in preparation for the May 2007 ORE Technical Committee meeting.
Post May 2007 meeting:
o Hopefully work towards alpha specs for the ORE resource, the Resource Map, and discovery of ORE resources
o Experimentation with alpha specs

18 My research group’s approach to OAI/preservation integration…

19 Preservation: Fortress Model
Five Easy Steps for Preservation:
1. Get a lot of $
2. Buy a lot of disks, machines, tapes, etc.
3. Hire an army of staff
4. Load a small amount of data
5. “Look upon my archive ye Mighty, and despair!”
Image from: http://www.itunisie.com/tourisme/excursion/tabarka/images/fort.jpg

20 Alternate Models of Preservation
Lazy Preservation
o Let Google, IA (Internet Archive), et al. preserve your website
Just-In-Time Preservation
o Wait for it to disappear first, then recover a “good enough” version
Shared Infrastructure Preservation
o Push your content to sites that might preserve it
Web Server Enhanced Preservation
o Use Apache modules to create archival-ready resources
Image from: http://www.proex.ufes.br/arsm/knots_interlaced.htm

21 Web Site Preservation: 2 Problems
The counting problem
o How many pages are on that site?
o To save it you have to find it
The representation problem
o What’s that page all about?
o Future use requires understanding
[Image caption: Guess the bean count, win the jar]

22 OAI-PMH Data Model
o resource
o item: the OAI-PMH identifier is the entry point to all records pertaining to the resource
o records: metadata pertaining to the resource, or a modeled representation of the resource
  o Dublin Core metadata: simple model
  o MARCXML metadata: more expressive model
  o MPEG-21 DIDL: complex model
  o METS: complex model

23 Integrate OAI-PMH functionality into the web server itself…
1. Use mod_oai
o an Apache 2.0 module
o automatically answers OAI-PMH requests for an HTTP server
o written in C
o respects values in .htaccess, httpd.conf
2. Install mod_oai on http://www.foo.edu/
3. Define baseURL: http://www.foo.edu/modoai
Result: web harvesting with OAI-PMH semantics (e.g., from, until, sets)
mod_oai implementation using OAI-PMH:
http://www.foo.edu/modoai?verb=ListRecords&metadataPrefix=oai_didl&from=2004-09-15&set=mime:video:mpeg
In words: give me all resources, and their preservation metadata, from site foo, dating from 9/15/2004 through today, that are of MIME type video/mpeg.
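The example request can be rebuilt programmatically. This sketch assumes that the `mime:video:mpeg` set name generalizes to `mime:<type>:<subtype>` for other MIME types; that pattern is inferred from this single example URL, and the function names are hypothetical. Leaving out `until` gives the "through today" reading, since OAI-PMH treats a missing `until` as no upper bound.

```python
from datetime import date
from urllib.parse import urlencode


def mime_set_spec(mime_type: str) -> str:
    """Turn 'video/mpeg' into 'mime:video:mpeg' (assumed naming pattern)."""
    major, _, minor = mime_type.partition("/")
    return f"mime:{major}:{minor}"


def modoai_list_records(base_url: str, mime_type: str, since: date,
                        metadata_prefix: str = "oai_didl") -> str:
    """Build a ListRecords request for resources of one MIME type."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": since.isoformat(),  # YYYY-MM-DD day granularity
        "set": mime_set_spec(mime_type),
    }
    # Keep ':' readable in the set spec; other reserved characters are escaped.
    return f"{base_url}?{urlencode(params, safe=':')}"
```

Calling `modoai_list_records("http://www.foo.edu/modoai", "video/mpeg", date(2004, 9, 15))` reproduces the URL on the slide.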

24 Addressing the Counting Problem: ListIdentifiers
CRAWLER:
o issues a ListIdentifiers request and finds URLs of updated resources
o does HTTP GET on updates only
o can get URLs of resources with specified MIME types
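A crawler's side of this exchange can be sketched as parsing the ListIdentifiers response and keeping only headers that still have something to fetch. The sketch assumes, as the slide suggests for mod_oai, that each OAI-PMH identifier is the URL to GET; the function name and the sample URLs in the usage note are hypothetical, while the namespace and the `status="deleted"` header attribute come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def updated_urls(list_identifiers_xml: str) -> list[tuple[str, str]]:
    """Return (identifier, datestamp) pairs for non-deleted headers.

    Assumes each identifier is the URL the crawler should fetch."""
    root = ET.fromstring(list_identifiers_xml)
    pairs = []
    for header in root.iterfind(".//oai:header", NS):
        if header.get("status") == "deleted":
            continue  # the resource is gone; nothing to GET
        ident = header.findtext("oai:identifier", namespaces=NS)
        stamp = header.findtext("oai:datestamp", namespaces=NS)
        pairs.append((ident, stamp))
    return pairs
```

Only the returned URLs need an HTTP GET, which is how the counting problem shrinks to "what changed since my last visit".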

25 Addressing the Representation Problem: ListRecords in DIDL Format
CRAWLER:
o makes a ListRecords query
o gets updates as MPEG-21 DIDL records (HTTP headers, resource by value or by reference)
o can get resources with specified MIME types
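The by-value / by-reference distinction can be sketched by walking the `Resource` elements of a DIDL record: a `ref` attribute points at the resource, while inline content (here assumed base64 when `encoding="base64"`) carries it by value. The namespace is the MPEG-21 DIDL one; the function name and sample data are hypothetical, and this sketch ignores the HTTP header descriptors the slide mentions.

```python
import base64
import xml.etree.ElementTree as ET

DIDL = {"didl": "urn:mpeg:mpeg21:2002:02-DIDL-NS"}


def didl_resources(didl_xml: str) -> list[dict]:
    """List DIDL Resource elements as by-reference URLs or by-value bytes."""
    root = ET.fromstring(didl_xml)
    out = []
    for res in root.iterfind(".//didl:Resource", DIDL):
        entry = {"mimeType": res.get("mimeType")}
        if res.get("ref"):
            # By reference: the crawler must still dereference the URL.
            entry["by"] = "reference"
            entry["ref"] = res.get("ref")
        elif res.get("encoding") == "base64":
            # By value: the bytes travel inside the record itself.
            entry["by"] = "value"
            entry["content"] = base64.b64decode(res.text or "")
        else:
            entry["by"] = "value"
            entry["content"] = (res.text or "").encode("utf-8")
        out.append(entry)
    return out
```

By-value records make a single harvest self-sufficient for archiving, at the cost of much larger responses.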

26 CRATE: Preservation Metadata at Dissemination Time
o Harnesses the web server to support preservation
o Moves preservation metadata from “strict validation at ingest” to “best-effort description at dissemination”
[Figure: plug-in configuration table with columns Plug-in Name and Executable path]
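The plug-in table (name plus executable path) suggests a simple model of "best-effort description at dissemination": run each configured tool on the file being served and keep whatever output succeeds. The following is a minimal sketch under that assumption; the plug-in name, the stand-in checksum command, and the function name are hypothetical and not CRATE's actual configuration.

```python
import subprocess
import sys

# Hypothetical plug-in table: name -> executable plus fixed arguments.
# A Python one-liner stands in for a real characterization tool.
EXAMPLE_PLUGINS = {
    "checksum": [sys.executable, "-c",
                 "import hashlib, pathlib, sys; "
                 "print(hashlib.sha256(pathlib.Path(sys.argv[1]).read_bytes()).hexdigest())"],
}


def run_plugins(plugins: dict[str, list[str]], path: str,
                timeout: float = 10.0) -> tuple[dict[str, str], dict[str, str]]:
    """Run each plug-in on `path`, collecting stdout as metadata.

    Best effort: a missing, failing, or slow plug-in is recorded in
    `errors` instead of aborting the whole description."""
    metadata: dict[str, str] = {}
    errors: dict[str, str] = {}
    for name, argv in plugins.items():
        try:
            proc = subprocess.run(argv + [path], capture_output=True,
                                  text=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired) as exc:
            errors[name] = str(exc)
            continue
        if proc.returncode == 0:
            metadata[name] = proc.stdout.strip()
        else:
            errors[name] = proc.stderr.strip() or f"exit {proc.returncode}"
    return metadata, errors
```

Recording failures instead of rejecting the file is the design choice that separates this from strict validation at ingest.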

27 Validation is Subjective
Preservation metadata is like a David Hockney photo collage: each image is both true and incomplete, and while the result is not faithful, it does capture the “essence”.
Images from: http://facweb.cs.depaul.edu/sgrais/collage.htm

