Open Archives Initiative – Protocol for Metadata Harvesting
Iztok Kavkler, University of Ljubljana
Some slides by Stefaan Ternier, KUL; Bram Vandenputte, KUL; Joris Klerkx, KUL
2 What is OAI?
– Harvesting standard, documented at
– Six service verbs
  – Identify
  – ListMetadataFormats
  – GetRecord
  – ListRecords
  – ListIdentifiers
  – ListSets
– Allows multiple metadata formats
  – the DC (Dublin Core) format is mandatory
3 How OAI works
– The harvester (OAI service provider) sends an HTTP request carrying an OAI verb
– The repository (metadata provider) answers with an HTTP response containing valid XML
– OAI "verbs": Identify, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords, ListSets
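
The request half of this exchange can be sketched in a few lines of Java. This is only an illustration: the class name OaiRequestBuilder and the base URL are assumptions, not part of the slides, and the code builds the request URL without contacting any server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds OAI request URLs, performs no network I/O.
class OaiRequestBuilder {

    // Joins the base URL, the verb and any extra arguments into a GET request URL.
    static String buildRequest(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder url = new StringBuilder(baseUrl).append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            url.append('&').append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    // Percent-encodes one query component as UTF-8.
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("metadataPrefix", "oai_dc");
        // Assumed base URL; substitute the address of a real repository.
        System.out.println(buildRequest("http://localhost:8080/oaicat/oaiHandler", "ListIdentifiers", extra));
    }
}
```

A harvester would fetch the resulting URL over HTTP and parse the XML that comes back.
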
4 Try it
– Install Apache Tomcat or any other Java servlet container
– Download WAR file from
– Deploy WAR
– Demo html
– Or type a service verb, e.g.
5 The raw XML
– By default, the resulting XML has a stylesheet attached for pretty rendering
– To remove the stylesheet, comment out the line
    OAIHandler.styleSheet=testoai/oaicat.xsl
  in the file oaicat.properties (in the WAR file or the web-app dir)
6 OAI XML example T06:48:58Z oai:oai.xyz-repository.com:exercises/ T22:38:28Z exercises <resumptionToken expirationDate=" T07:48:58Z" completeListSize="42" cursor="10">
7 OAICat - a Java implementation
– OAICat home at
– Takes care of
  – web service details
  – OAI XML specification
– The implementer has to provide three classes
  – RepositoryOAICatalog
  – RepositoryRecordFactory
  – Repository2oai_dc (lom, ...) - usually more than one
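8 A sample implementation
(Source code and libs in
– Create a new web module
– Add servlet oaiHandler to web.xml
    <servlet>
      <servlet-name>LreOAIHandler</servlet-name>
      <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
      <load-on-startup>5</load-on-startup>
    </servlet>
    <servlet-mapping>
      <servlet-name>LreOAIHandler</servlet-name>
      <url-pattern>/oaiHandler</url-pattern>
    </servlet-mapping>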
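9 (cont)
– Define properties file location
    <context-param>
      <param-name>properties</param-name>
      <param-value>oaicat.properties</param-value>
    </context-param>
– Welcome file for testing
    <welcome-file-list>
      <welcome-file>testoai/index.html</welcome-file>
    </welcome-file-list>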
10 Sample record
– A record with the basic fields id, url, title, descr and date
– SampleOAICatalog contains an array with 3 sample records
11 SampleOAICatalog.listIdentifiers
Parameters
– from: date to harvest from (String in ISO 8601 format)
  – date or datetime, depending on granularity
– to: date to harvest to
– set: a set name, list only records from this set (if null, list all records)
  – set names classify objects into natural groups
  – every record may belong to multiple sets (or none)
– metadataPrefix: list only records that support this format (sample formats: oai_dc, oai_lom, ...)
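
How these parameters narrow the result can be sketched as a simple filter. The sketch rests on assumptions: FilterRecord is a hypothetical stand-in for SampleRecord (its sets field is invented for the example), dates use day granularity, both bounds are treated as inclusive, and the metadataPrefix check is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for SampleRecord; the sets field is an assumption.
class FilterRecord {
    final String id;
    final String date;        // ISO 8601, day granularity, e.g. "2009-03-30"
    final List<String> sets;  // sets this record belongs to (may be empty)

    FilterRecord(String id, String date, String... sets) {
        this.id = id;
        this.date = date;
        this.sets = Arrays.asList(sets);
    }
}

class HarvestFilter {

    // Returns the records matching the optional from/to/set arguments; null means "no restriction".
    static List<FilterRecord> select(List<FilterRecord> all, String from, String to, String set) {
        List<FilterRecord> result = new ArrayList<>();
        for (FilterRecord rec : all) {
            // ISO 8601 strings of equal granularity sort chronologically, so compareTo is enough.
            if (from != null && rec.date.compareTo(from) < 0) continue;
            if (to != null && rec.date.compareTo(to) > 0) continue;
            if (set != null && !rec.sets.contains(set)) continue;
            result.add(rec);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FilterRecord> all = Arrays.asList(
                new FilterRecord("id001", "2009-01-10", "exercises"),
                new FilterRecord("id002", "2009-02-20"),
                new FilterRecord("id003", "2009-03-30", "exercises", "videos"));
        for (FilterRecord rec : select(all, "2009-02-01", null, "exercises")) {
            System.out.println(rec.id);
        }
    }
}
```
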
12 SampleOAICatalog.listIdentifiers
Must return a map with two fields
– headers: a String iterator of OAI headers
– identifiers: a String iterator of OAI identifiers
Both are created by the call (rec is a SampleRecord)
    String[] header = getRecordFactory().createHeader(rec);
    headers.add(header[0]);
    identifiers.add(header[1]);
Create the result
    Map listIdMap = new HashMap();
    listIdMap.put("headers", headers.iterator());
    listIdMap.put("identifiers", identifiers.iterator());
    return listIdMap;
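
The fragments above can be put together into a self-contained sketch. Here createHeader is a hypothetical stand-in for getRecordFactory().createHeader(rec), the oai:example.org: prefix is invented, and each record is reduced to a {localId, datestamp} pair; only the shape of the returned map follows the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class ListIdentifiersSketch {

    // Hypothetical stand-in for getRecordFactory().createHeader(rec):
    // element 0 is the header XML, element 1 the bare OAI identifier.
    static String[] createHeader(String localId, String datestamp) {
        String oaiId = "oai:example.org:" + localId;
        String xml = "<header><identifier>" + oaiId + "</identifier><datestamp>"
                + datestamp + "</datestamp></header>";
        return new String[] { xml, oaiId };
    }

    // Builds the two-field map described on the slide from {localId, datestamp} pairs.
    static Map<String, Iterator<String>> listIdentifiers(String[][] records) {
        List<String> headers = new ArrayList<>();
        List<String> identifiers = new ArrayList<>();
        for (String[] rec : records) {
            String[] header = createHeader(rec[0], rec[1]);
            headers.add(header[0]);
            identifiers.add(header[1]);
        }
        Map<String, Iterator<String>> listIdMap = new HashMap<>();
        listIdMap.put("headers", headers.iterator());
        listIdMap.put("identifiers", identifiers.iterator());
        return listIdMap;
    }

    public static void main(String[] args) {
        Map<String, Iterator<String>> map = listIdentifiers(new String[][] { { "id001", "2009-01-10" } });
        System.out.println(map.get("identifiers").next());
    }
}
```
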
13 getRecordFactory().createHeader(rec)
Creates the header by calling the methods in SampleRecordFactory
– String getOAIIdentifier(Object rec): returns the full OAI identifier, e.g. "oai:oay.rep.com:id001"
– String getDatestamp(Object rec): returns the date in ISO 8601 format
– Iterator getSetSpecs(Object rec)
    ArrayList list = new ArrayList();
    list.add(...);
    return list.iterator();
– Iterator getAbouts(Object rec)
– String fromOAIIdentifier(String id): helper method, converts the OAI identifier to a local id
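
The two identifier methods are inverses of each other, which a small sketch makes concrete. IdentifierMapper is a hypothetical class, not the SampleRecordFactory from the sample code, and it works on plain local id strings rather than on record objects.

```java
// Hypothetical helper showing the round trip between local ids and OAI identifiers.
class IdentifierMapper {
    private final String prefix;  // "oai:" + repository identifier + ":"

    IdentifierMapper(String repositoryIdentifier) {
        this.prefix = "oai:" + repositoryIdentifier + ":";
    }

    // Local id -> full OAI identifier.
    String getOAIIdentifier(String localId) {
        return prefix + localId;
    }

    // Full OAI identifier -> local id; rejects identifiers from another repository.
    String fromOAIIdentifier(String oaiId) {
        if (!oaiId.startsWith(prefix)) {
            throw new IllegalArgumentException("Not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(prefix.length());
    }

    public static void main(String[] args) {
        IdentifierMapper mapper = new IdentifierMapper("example.org");  // assumed repository name
        String oaiId = mapper.getOAIIdentifier("id001");
        System.out.println(oaiId + " -> " + mapper.fromOAIIdentifier(oaiId));
    }
}
```
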
14 SampleOAICatalog.listSets
– Takes no parameters, returns the list of all sets in this repository
– Each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set
15 SampleOAICatalog.getSchemaLocations
– Like getRecord, but returns the Vector of all metadata schema locations the record supports
– To obtain them, just call
    getRecordFactory().getSchemaLocations(rec);
16 SampleOAICatalog.getRecord
String getRecord(String id, String metadataPrefix)
– Find the record and convert it to an XML string (<record> element)
– id is in global format; to get the local value call getRecordFactory().fromOAIIdentifier(id)
– Throw IdDoesNotExistException if the record is not found
– To generate the XML use constructRecord
    constructRecord(rec, metadataPrefix)
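
The lookup-or-throw flow of getRecord can be sketched without OAICat on the classpath. Everything here is an assumption made for illustration: RecordNotFoundException stands in for OAICat's IdDoesNotExistException, the oai:example.org: prefix is invented, and the returned XML is a heavily simplified record rather than the output of constructRecord. Titles are assumed to contain no XML special characters.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in for OAICat's IdDoesNotExistException.
class RecordNotFoundException extends Exception {
    RecordNotFoundException(String id) {
        super("No record with identifier " + id);
    }
}

class GetRecordSketch {
    private static final String PREFIX = "oai:example.org:";  // assumed repository prefix
    private final Map<String, String> titlesByLocalId = new HashMap<>();

    void add(String localId, String title) {
        titlesByLocalId.put(localId, title);
    }

    // Converts the global id to a local one, looks the record up, and serializes it.
    String getRecord(String oaiId, String metadataPrefix) throws RecordNotFoundException {
        String localId = oaiId.startsWith(PREFIX) ? oaiId.substring(PREFIX.length()) : null;
        String title = (localId == null) ? null : titlesByLocalId.get(localId);
        if (title == null) {
            throw new RecordNotFoundException(oaiId);
        }
        // Simplified record: a real one also carries a datestamp and metadata in the requested format.
        return "<record><header><identifier>" + oaiId + "</identifier></header>"
                + "<metadata format=\"" + metadataPrefix + "\"><title>" + title + "</title></metadata></record>";
    }

    public static void main(String[] args) throws RecordNotFoundException {
        GetRecordSketch catalog = new GetRecordSketch();
        catalog.add("id001", "Sample exercise");
        System.out.println(catalog.getRecord("oai:example.org:id001", "oai_dc"));
    }
}
```
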
17 SampleOAICatalog.listRecords
– Just like listIdentifiers, only it generates a list of XML elements
– Return a map with one element
    Map listRecMap = new HashMap();
    listRecMap.put("records", records.iterator());
    return listRecMap;
18 Crosswalks
– Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc
– Only two methods per implementation
  – boolean isAvailableFor(Object rec)
  – String createMetadata(Object rec)
      SampleRecord record = (SampleRecord) rec;
      return LOMFormat.writeStringWithSchema(record.toLOM());
– Throw CannotDisseminateFormatException if the metadata is not available in this format
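
A Dublin Core crosswalk with the same two methods might look roughly like the sketch below. It is written with plain string building instead of LOM-j, CrosswalkRecord is a hypothetical stand-in for SampleRecord, an IllegalStateException stands in for CannotDisseminateFormatException, and the schema location attributes a real oai_dc element carries are omitted for brevity.

```java
// Hypothetical stand-in for SampleRecord with just the fields this sketch maps.
class CrosswalkRecord {
    final String title;
    final String url;
    final String descr;

    CrosswalkRecord(String title, String url, String descr) {
        this.title = title;
        this.url = url;
        this.descr = descr;
    }
}

class DublinCoreCrosswalkSketch {

    // Assumption for this sketch: a record can be rendered as DC only if it has a title.
    static boolean isAvailableFor(CrosswalkRecord rec) {
        return rec.title != null;
    }

    // Serializes the record as an oai_dc element (schemaLocation omitted).
    static String createMetadata(CrosswalkRecord rec) {
        if (!isAvailableFor(rec)) {
            // The real crosswalk would throw CannotDisseminateFormatException here.
            throw new IllegalStateException("Record cannot be disseminated as oai_dc");
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<oai_dc:dc xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\"")
           .append(" xmlns:dc=\"http://purl.org/dc/elements/1.1/\">");
        xml.append("<dc:title>").append(escape(rec.title)).append("</dc:title>");
        if (rec.url != null) {
            xml.append("<dc:identifier>").append(escape(rec.url)).append("</dc:identifier>");
        }
        if (rec.descr != null) {
            xml.append("<dc:description>").append(escape(rec.descr)).append("</dc:description>");
        }
        xml.append("</oai_dc:dc>");
        return xml.toString();
    }

    // Escapes the characters that are unsafe in XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(createMetadata(
                new CrosswalkRecord("Fractions & decimals", "http://example.org/ex/1", null)));
    }
}
```
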
19 SampleRecord.toLOM
– Uses the LOM-j lib to quickly hack together LOM
  – automatic serialization/deserialization of LOM and DC XML formats
– Example
    lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
    lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
    lom.newTechnical().newLocation(-1).setString(url);
    lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
    lom.newGeneral().newTitle().newString(0).setString(title);
20 Resumption
– A repository usually has a fixed limit on the number of records to return in one call
– If more are available, it returns a resumption token that lets the harvester request the next batch
– Implemented by the functions listIdentifiers(String resumptionToken) and listRecords(String resumptionToken)
– See XYZOAICatalog for details
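
One simple way to implement resumption is to let the token encode the offset of the next batch. The sketch below takes that approach as an assumption; it is not the scheme used by XYZOAICatalog, and the names ResumptionSketch, Page and PAGE_SIZE are invented. A production repository would also validate the token, remember the original query arguments, and expire old tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResumptionSketch {
    static final int PAGE_SIZE = 2;  // assumed per-call limit, kept tiny for the demo

    // One batch of results plus the token for the next batch (null when the list is complete).
    static class Page {
        final List<String> items;
        final String resumptionToken;

        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    // First call: no token yet, start at offset 0.
    static Page firstPage(List<String> all) {
        return page(all, 0);
    }

    // Follow-up call: the token is simply the offset of the next unread item.
    static Page nextPage(List<String> all, String resumptionToken) {
        return page(all, Integer.parseInt(resumptionToken));
    }

    private static Page page(List<String> all, int offset) {
        int end = Math.min(offset + PAGE_SIZE, all.size());
        List<String> items = new ArrayList<>(all.subList(offset, end));
        String token = (end < all.size()) ? String.valueOf(end) : null;
        return new Page(items, token);
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("a", "b", "c", "d", "e");
        Page page = firstPage(all);
        while (true) {
            System.out.println(page.items + " token=" + page.resumptionToken);
            if (page.resumptionToken == null) break;
            page = nextPage(all, page.resumptionToken);
        }
    }
}
```

The harvester side mirrors the loop in main: it keeps sending the returned token until a response arrives without one.
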
21 References SIO/Trubar OAI url