Tutorial 1 -- OAI-PMH repositories: quality issues regarding metadata and protocol compliance Part II – Shareable Metadata Timothy W. Cole


1 Tutorial 1 -- OAI-PMH repositories: quality issues regarding metadata and protocol compliance Part II – Shareable Metadata Timothy W. Cole (t-cole3@uiuc.edu) Mathematics Librarian & Professor of Library Administration University of Illinois at Urbana-Champaign, USA Thursday 20 October 2005 OAI4: CERN Workshop on Innovations in Scholarly Communication http://imlsdcc.grainger.uiuc.edu/OAI4_Tutorial_1_URLs.htm Copyright © 2005 Timothy W. Cole & University of Illinois Board of Trustees

2 2 Shareable Metadata OAI4 @ CERN, 20 October 2005 t-cole3@uiuc.edu University of Illinois at UC Metadata is... Structured data about digital (and non-digital) resources that can be used to help support a wide range of operations -- UKOLN Data associated with an information object for purposes of description, administration, legal requirements, technical functionality, use and usage, and preservation. -- Getty Research Institute

3 Descriptive metadata is... Information that serves the purposes of discovery, identification, and selection of a digital or non-digital resource (Priscilla Caplan, Metadata Fundamentals for All Librarians). These purposes are similar to those of traditional library cataloging: to find an entity; to identify an entity; to select an entity; to obtain access to the entity.

4 Considerations when creating metadata. Audience/Purpose: Who are you describing it for? Why? Standards: Are there standards you can use? What: Are you describing the digital manifestation of a work? A physical object that has been digitized? Both? Granularity: What level of description? Item level? Archival unit? Collection level? (FRBR issues here as well.) Context: What relations and contextual information should you include?

5 Quality metadata. Characteristics of quality metadata: completeness; accuracy; provenance; conformance to expectations; logical consistency and coherence; timeliness; accessibility. From Hillmann & Bruce, “The Continuum of Metadata Quality,” in Metadata in Practice (ALA, 2004).

6 Benefits of sharing metadata. Enhances visibility of your resources. Enables reuse of digital content in new contexts. Supports interoperability across collections. Enables repurposing of digital content to support new services. Facilitates discovery of new linkages and relationships between and among content objects, metadata, and services. Ultimately supports a “recombinant” vision of digital libraries. First step toward enhanced models of digital libraries, e.g., Lorcan Dempsey, et al., “Metadata Switch: thinking about some metadata management and knowledge organization issues....” http://www.oclc.org/research/publications/archive/2004/dempsey-mslitaguide.pdf

7 DLF-NSDL OAI Best Practices. http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?OAI_Best_Practices Collaborative effort of the Digital Library Federation and the National Science Digital Library (U.S.), involving both data providers and service providers. Scope: develop technical implementation and shareable metadata best practices for OAI data providers. Targeted at a broad audience, but the group is drawn mostly from U.S. library and digital library communities. July 2004: first meeting of the Working Group. August 2005: limited distribution of a review draft of best practices for shareable metadata; technical best practices still to come. Comments welcome. Intended to complement existing resources: OAI listservs, OAForum, and community-specific guidelines and best practices.

8 Shareable metadata. Shareable metadata has all the positive characteristics of quality metadata, plus it: includes proper context; has content coherence; uses standard vocabularies; has consistency; exhibits technical conformance. These features are necessary to support useful and effective aggregation of heterogeneous metadata harvested from widely dispersed sources.

9 OAI depends on shared work model. OAI / metadata sharing depends on collaboration between data providers and service providers. Data providers: conform to OAI-PMH (incl. Implementation Guidelines); provide shareable metadata of appropriate richness; provide access to resources described by metadata. Service providers: conform to OAI-PMH (incl. Implementation Guidelines); filter, normalize, and enrich metadata for purpose; present aggregated metadata without bias; do no harm.

10 Data provider should document practices. Provide documentation on choices made when providing metadata for exposure via OAI; remember, metadata itself has provenance. Provide human-readable documentation on your Website. Utilize OAI optional containers to help service providers. Document especially: use of terminologies / controlled vocabularies; source of metadata (was it transformed from a different format?); value encoding practices for names, dates, identifiers,...; local practices for quality control, updating frequencies,... Most service providers have some capacity to normalize harvested metadata on a provider-by-provider basis.

11 Appropriate representation of resources. OAI provides only one view of a resource. Records shared via OAI should be appropriate for purpose. Consider: Contents: what to include in the OAI metadata record. Context: make explicit that which might be implicit in your local system; OAI metadata records must stand alone. Metadata format: provide multiple formats if possible; select metadata schema(s) appropriate to content; retain the richness of the native scheme as much as possible.

12 Illustration of metadata for purpose http://images.library.uiuc.edu:8081/u?/tdc,107

13 http://images.library.uiuc.edu:8081/cgi-bin/oai.exe?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:images.library.uiuc.edu:tdc/107

14 http://snuffy.lib.umn.edu:8080/image/oai/HandleRequest.do?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:digital.lib.umn.edu:mpw00250
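
The two GetRecord requests above share one shape: a repository base URL, a verb, and the verb's arguments. As a rough sketch (the helper name oai_request_url is hypothetical and not part of any OAI toolkit), such a request could be assembled in Python as follows. Note that the helper percent-encodes the ":" and "/" characters inside the identifier, which the slide URLs leave unescaped.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a base URL, a verb, and its arguments."""
    # urlencode percent-encodes reserved characters such as ':' and '/' in values.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Base URL and identifier taken from slide 13.
url = oai_request_url(
    "http://images.library.uiuc.edu:8081/cgi-bin/oai.exe",
    "GetRecord",
    metadataPrefix="oai_dc",
    identifier="oai:images.library.uiuc.edu:tdc/107",
)
print(url)
```

The same helper covers the other verbs in this tutorial, for example oai_request_url(base, "ListSets").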

15 Describing Versions and Reproductions. The Dublin Core one-to-one rule is infrequently followed: it can imply significant cost; it is not always the best way to present resources to end-users; it is not appropriate for richer metadata formats. Other approaches (each has implications for use of the identifier element): entry page approach (1 record, entry page lists versions); clustering approach (cluster like formats together); vocabulary approach (for linking related records); relation element linking approach; intellectual object approach (1 record, multiple relation fields).

16

17

18 Identifiers, URLs, & linking. Use persistent URIs and recognized standard identifiers (e.g., ISBNs, ISSNs, DOIs, PURLs, EPICUR, OpenURLs,...). Provide one unambiguous URI for primary user access to the resource. If the schema allows, encode the nature of each identifier provided. For analog resources, still provide a URL for how to access the item. Other links: rights statement; access restrictions; collection or institution home page; curator contacts; alternate versions;... Don’t confuse resource identifiers and OAI identifiers (see the Guidelines). Express multiple identifiers in repeated fields, but only include multiple identifiers where the meaning/function of each is clear; consensus regarding “actionable” resource URLs is still evolving.

19 Granularity, differentiation, & context. Appropriate granularity is defined by anticipated use: individual images on a page, full pages, or whole book? Individual letters or complete archive? General advice: use the smallest granularity appropriate for the resource; consider the user and use common sense. OAI-PMH tends to encourage de-contextualization: records shared via OAI often lose the implicit linkages and context of the local implementation. Use OAI Set Descriptions and item-level relation fields to preserve as much useful context as possible; see also: http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?SetPractices

20 http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListRecords&metadataPrefix=oai_dc&set=oaiall:fish2icbib

21 http://www.hti.umich.edu/cgi/b/broker20/broker20?verb=ListSets
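
A harvester typically reads the setSpec and setName pairs out of a ListSets response before issuing selective ListRecords requests like the one on slide 20. The following is a minimal sketch using Python's standard library. The SAMPLE response is hand-written for illustration: the setSpec comes from slide 20, while the setName is invented and the real repository's response will differ.

```python
import xml.etree.ElementTree as ET

# Namespace URI of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, abbreviated response; the setName value is hypothetical.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set>
      <setSpec>oaiall:fish2icbib</setSpec>
      <setName>Hypothetical set name</setName>
    </set>
  </ListSets>
</OAI-PMH>"""


def list_sets(xml_text):
    """Return (setSpec, setName) pairs found in a ListSets response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for set_element in root.findall("oai:ListSets/oai:set", OAI_NS):
        spec = set_element.findtext("oai:setSpec", namespaces=OAI_NS)
        name = set_element.findtext("oai:setName", namespaces=OAI_NS)
        pairs.append((spec, name))
    return pairs


print(list_sets(SAMPLE))
```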

22 Metadata formats. MYTH: OAI only allows exposure of simple Dublin Core (DC) records. MYTH: OAI allows exposure of items in only a single metadata format. There is a distinction in OAI between items and records: an item is all available metadata for a resource; a record is the dissemination of an item in a specific metadata format. Expose the richest metadata you can; crosswalk to less rich formats (oai_dc) as needed and as makes most sense for purpose. OAI requires an XML Schema (.xsd) for all formats exposed. Must list available formats in the ListMetadataFormats response; should also list formats for given set in ... Consider QDC, DARE, MODS, MARCXML, MABXML, IMS, METS, MPEG,... See also distinct schemas in the Univ. of Illinois OAI Data Provider Registry: http://gita.grainger.uiuc.edu/registry/ListSchemas.asp

23 XML namespaces & schemas. OAI mandates use of W3C XML Schema Language: allows validation of OAI records using off-the-shelf XML parsers; XML schemas are potentially recombinant (mix and match); XML schemas exploit XML namespaces; prefer XML schemas endorsed by relevant communities. XML namespaces are useful to disambiguate metadata semantics: XML namespace values are URIs; namespaces are not always well exploited by service providers, which sometimes rely on the XSD URI instead of the namespace URI to recognize semantics. Neither schemas nor namespaces express meaning (need RDF, OWL,... for that).
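
To illustrate recognizing semantics by namespace URI rather than by prefix or schema location, here is a small sketch using Python's standard library. The RECORD fragment is made up for the example and deliberately uses the unusual prefix "foo"; the title and date values are borrowed from later slides. The function name dc_values is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative fragment: the prefix is arbitrary, the namespace URI is what matters.
RECORD = """<metadata xmlns:foo="http://purl.org/dc/elements/1.1/">
  <foo:title>Partially fixed dunes north of the lighthouse, Ludington, Michigan</foo:title>
  <foo:date>1916-06-16</foo:date>
</metadata>"""


def dc_values(xml_text, element_name):
    """Collect the values of one Dublin Core element, whatever prefix is used."""
    root = ET.fromstring(xml_text)
    # ElementTree expands prefixed names to the form {namespace-uri}localname.
    return [element.text for element in root.iter(f"{{{DC_NS}}}{element_name}")]


print(dc_values(RECORD, "title"))
print(dc_values(RECORD, "date"))
```

Matching on the namespace URI keeps the lookup working even if a data provider changes prefixes or moves its .xsd file.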

24 A schema for Qualified DC plus Thumbnails <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:dcmitype="http://purl.org/dc/dcmitype/" xmlns="http://cicharvest.grainger.uiuc.edu/schemas/QDC/" targetNamespace="http://cicharvest.grainger.uiuc.edu/schemas/QDC/" elementFormDefault="qualified" attributeFormDefault="unqualified"> http://cicharvest.grainger.uiuc.edu/schemas/QDC/2004/07/14/ThumbnailElements.xsd http://imlsdcc.grainger.uiuc.edu/registry/qualifieddc.xsd

25 Crosswalking Logic. Not bi-directional: map metadata from more robust formats to simpler ones (some lossiness inevitable). Crosswalks both map elements from one schema to another AND transform values to meet requirements of the target schema. Don’t conflate values; repeat elements when the target schema allows. Include appropriate context when crosswalking. Try to include a title element when crosswalking to DC. Exclude indications of unknown or inapplicable data. Exclude artifacts of descriptive practices not useful in the target schema. Consider multi-step crosswalks. See: LC MARC to DC/QDC; OCLC Crosswalk Repository [D-Lib Article].
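
Several of these rules (repeat elements rather than conflating values, drop placeholders for unknown data, accept that unmapped fields are lost) can be shown in a few lines of Python. This is only a sketch under stated assumptions: the source field names, FIELD_MAP, and the PLACEHOLDERS list are hypothetical and do not come from any real crosswalk.

```python
# Hypothetical mapping from fields of a richer local schema to simple DC elements.
FIELD_MAP = {
    "title": "title",
    "creator_name": "creator",
    "topic": "subject",
    "date_created": "date",
}

# Hypothetical markers of unknown or inapplicable data that should not be exported.
PLACEHOLDERS = {"", "unknown", "n/a"}


def crosswalk_to_dc(record):
    """Map a {field: [values]} record to {dc_element: [values]}, one value per element."""
    dc = {}
    for source_field, values in record.items():
        target = FIELD_MAP.get(source_field)
        if target is None:
            continue  # no DC equivalent: the value is dropped (crosswalks are lossy)
        for value in values:
            cleaned = value.strip()
            if cleaned.lower() in PLACEHOLDERS:
                continue  # exclude indications of unknown or inapplicable data
            # Repeat the target element for each value instead of concatenating.
            dc.setdefault(target, []).append(cleaned)
    return dc


source = {
    "title": ["Partially fixed dunes north of the lighthouse, Ludington, Michigan"],
    "creator_name": ["Fuller, George D. (George Damon), 1869-1961"],
    "topic": ["Lighthouses -- Michigan", "Unknown"],
    "local_shelf_mark": ["X-123"],  # invented local artifact with no DC mapping
}
print(crosswalk_to_dc(source))
```

A real crosswalk would also transform values (for example, date formats) to meet the target schema; this sketch only copies them.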

26 Honoré Daumier lithograph (Brandeis University), shown as MARCXML, oai_dc, MODS, and Qualified DC: same object, same information, different metadata formats.

27 Elements – Title. Service providers rely on element values. Try to provide a title in every oai_dc record; if necessary, supply one according to established standards. DCMI: Title = “A name given to the resource”. Express added forms of title in repeated fields; some service providers rely on order to pick the display title. Distinguish title from sub-title through the metadata format if allowed, or else using standard punctuation. Example: Partially fixed dunes north of the lighthouse, Ludington, Michigan

28 Elements – Name. Include all known names associated with the resource. Reflect best practice for your community. Format names consistently for all items within an OAI set or repository. Use community-accepted authorities when available, but prefer the complete form of a name when allowable. Provide as granular an encoding of a name as possible given the native data and metadata formats used. Typically do not include affiliation as part of the name value. Example: Fuller, George D. (George Damon), 1869-1961

29 Elements – Date. For OAI, dates should contain values important for discovery of the resource by end-users. Only one date value per metadata element; provide multiple date elements only when the metadata format allows a way to clearly differentiate the meaning of each date value in the life cycle of the resource (use of labels is deprecated). Present dates in a consistent format, according to established machine-readable standards. Format for imprecise dates and periods or eras is still problematic. Example: 1916-06-16
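
The slide does not name a particular date standard. Assuming the date-only forms of the W3CDTF profile of ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD) as one such machine-readable standard, a data provider could screen date values before exposing them with a check like the sketch below. The function name is hypothetical.

```python
import re
from datetime import date

# Date-only W3CDTF forms: YYYY, YYYY-MM, or YYYY-MM-DD (an assumed target standard).
W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_w3cdtf_date(value):
    """Return True if value is a calendar-valid YYYY, YYYY-MM, or YYYY-MM-DD date."""
    match = W3CDTF_DATE.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1 so the remaining parts can be range-checked.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


for candidate in ["1916-06-16", "1916", "June 16, 1916", "1916-02-30"]:
    print(candidate, is_w3cdtf_date(candidate))
```

Imprecise dates, periods, and eras fall outside this check, which matches the slide's remark that they remain problematic.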

30 Elements – Subject. Use relevant controlled vocabularies. Be consistent; document practice. Reflect community consensus when available. Exploit features of the metadata format: to maximize precision, put values in the best field available; where allowed, explicitly label values with term source. Express multiple subjects in repeated fields (not concatenated together). Example: Lighthouses -- Michigan

31 Elements – Language. Use when relevant to the resource (i.e., for text resources). Format the value according to the metadata format being used. Be consistent; prefer machine-readable values. The language element within a metadata record is typically used to give the language of the resource, e.g., DC: German; MODS: eng. Attributes are commonly used to give the language for the value of a metadata record element, e.g., MODS: Broadside advertising a funeral ceremony...

32 Elements – Place name. Choose geographic place name values from relevant terminologies / controlled vocabularies. Be consistent. Document the terminology / controlled vocabulary used. Examples: MODS: North and Central America / Canada / Quebec / Matapedia, Lac. QDC: Calgary (Alta.). DC: Santa Cruz County (Calif.); in the simple DC case, document use of LCSH in SetDescription or on your Website.

33 Character encoding. OAI-PMH mandates UTF-8 encoding. Files containing only 7-bit ASCII characters (no byte values > 127) are simultaneously UTF-8, but many control characters (values < 32) are not allowed in XML. Must convert all extended ASCII (byte values > 127) to appropriate UTF-8 (preferred) or to character references. In UTF-8, characters above 127 become multi-byte sequences: 2 to 3 bytes for Unicode Plane 0 (Basic Multilingual Plane). Transform Unicode files not in UTF-8 to UTF-8.
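
As a sketch of the conversion described here, the following Python function re-encodes "extended ASCII" bytes as UTF-8 and removes the control characters that XML 1.0 forbids. It assumes the legacy data is ISO-8859-1 (Latin-1); that is an assumption for the example, since the actual source encoding must be known for each data set. The function name is hypothetical.

```python
import re

# Control characters below 32 that XML 1.0 does not allow (tab, LF, and CR are allowed).
XML_ILLEGAL_CONTROLS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def to_xml_safe_utf8(raw_bytes, source_encoding="latin-1"):
    """Decode legacy bytes, drop XML-illegal control characters, re-encode as UTF-8."""
    text = raw_bytes.decode(source_encoding)
    text = XML_ILLEGAL_CONTROLS.sub("", text)
    return text.encode("utf-8")


# 0xE9 is 'é' in Latin-1; in UTF-8 it becomes the two-byte sequence 0xC3 0xA9.
print(to_xml_safe_utf8(b"Honor\xe9 Daumier"))  # b'Honor\xc3\xa9 Daumier'
```

Pure 7-bit ASCII input passes through unchanged, which is the sense in which such files are "simultaneously UTF-8".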

34 Character references (not named entities). As an alternative to UTF-8, OAI-PMH allows character references, for example: &#xA9; or &#169; for © (do not use &copy;); &#xE4; or &#228; for ä (do not use &auml;); etc. The predefined XML entities &lt; | &gt; | &amp; | &quot; | &apos; are allowed. Avoid embedded HTML in your XML; thus prefer UTF-8 or &#x00B2; for ² (superscript two). UTF-8 is best at facilitating string searching.
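
One possible way to produce numeric character references, and to replace HTML named entities that have crept into metadata values, is sketched below with Python's standard library. Both function names are hypothetical. The second function assumes its input is element text that may contain HTML-style entities.

```python
import html
from xml.sax.saxutils import escape


def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal character reference."""
    # The 'xmlcharrefreplace' error handler emits references of the form &#NNN;.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def replace_named_entities(text):
    """Turn HTML named entities such as &copy; into numeric references."""
    decoded = html.unescape(text)   # &copy; -> ©, but also &amp; -> &
    xml_safe = escape(decoded)      # restore &amp;, &lt;, &gt; for XML
    return to_numeric_references(xml_safe)


print(to_numeric_references("ä"))                       # &#228;
print(replace_named_entities("&copy; 2005 A &amp; B"))  # &#169; 2005 A &amp; B
```

Where the pipeline can emit UTF-8 directly, that remains the simpler option, as the slide notes.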

35 11:15 recap (1 of 2). Technical compliance (most common errors listed first): Use standard and well-tested software libraries or packages if you can. Follow the protocol carefully and check your site with the validator at http://www.openarchives.org/Register/ValidateSite, which performs a number of tests for errors that would not show up with RE. Be careful with character encoding, schema validation, use of datestamps, use of resumptionTokens, and responses to invalid OAI requests.

36 11:15 recap (2 of 2). Shareable metadata best practices: Metadata is created for a purpose; shareable metadata is more than just high-quality metadata created for a local application. Consider the needs of service providers in constructing OAI metadata. In the OAI context you must consider the granularity of records disseminated and include, or provide pointers to, context. Document metadata rules and practices used. Use multiple metadata formats.

