1 Some standards, some examples, and a UK perspective Paul Miller Interoperability Focus UK Office for Library & Information Networking (UKOLN) UKOLN is funded by the Library and Information Commission, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
2 Outline Scoping the problem Cultural Heritage information Other ‘memory organizations’ The Internet Standard solutions Catalogues, Metadata, Terminology control… Localised developments Dublin Core XML/RDF Z39.50 Examples AHDS and ADS CIMI The Distributed National Electronic Resource.
3 Cultural Heritage Information traditionally analogue Tradition of preservation Complex in nature Often transcends (current) national boundaries, raising political issues Currently ‘cool’ potentially profitable (Micro$oft) Important for Lifelong Learning and other hot phrases (NGfL).
4 Other memory organizations Similar analogue tradition Similar focus on preservation Greater tendency to single format Books Musical scores Archival manuscripts Traditionally complex cataloguing paradigm.
5 The Internet Traditionally digital Now adding access to analogue Two million web sites Half a billion addressable pages Everyone’s an author Everyone’s a publisher Everyone can assert authority Sustainable models prove elusive Still waiting for the ‘killer app’?
6 Standard solutions The nice thing about standards… …is that there are so many to choose from!
8 Different solutions to information management have evolved Detailed catalogues –Curatorial tradition –Principally for internal management? –MARC/AACR, SPECTRUM… ‘Metadata’ catalogues –Access tradition –Principally for external use –Dublin Core… etc Control of semantics and syntax useful.
9 Semantics, Structure, Syntax Semantic Interoperability: “Let’s talk English”; standardisation of content; “cat milk sat drank mat”. Structural Interoperability: “Here’s how to make a sentence”; standardisation of form; “Cat sat on mat. Drank milk.” Syntactic Interoperability: “These are the rules of grammar”; standardisation of expression; “The cat sat on the mat. It drank some milk.”
10 What is a catalogue? A database of holdings/resources within defined collection policy, with stated cataloguing procedures, and with some intention towards comprehensiveness? –A library catalogue –A museum collection management system –A national register of monuments Can be a single resource –OPAC97, the British Library catalogue A union of other catalogues –COPAC –vCUC (virtual) –ADS (virtual) –AHDS (virtual).
11 What is ‘Metadata’? –meaningless jargon –or a fashionable term for what we’ve always done –or “a means of turning data into information” –and “data about data” –and the name of a film director (‘Luc Besson’) –and the title of a book (‘The Lord of the Flies’).
12 What is ‘Metadata’? Metadata exists for almost anything; People Places Objects Concepts Web pages Databases.
13 What is ‘Metadata’? Metadata fulfils three main functions; Description of resource content –“What is it?” Description of resource form –“How is it constructed?” Description of resource use –“Can I afford it?”.
14 Introducing the Dublin Core An attempt to improve resource discovery on the Web –now adopted more broadly Building an interdisciplinary consensus about a core element set for resource discovery –simple and intuitive –cross–disciplinary — not just libraries!! –international –flexible. See
15 Introducing the Dublin Core 15 elements of descriptive metadata All elements optional All elements repeatable The whole is extensible –offers a starting point for semantically richer descriptions Interdisciplinary –libraries, government, museums, archives… International –available in 20 languages, with more on the way...
16 Introducing the Dublin Core Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
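The "all elements optional, all elements repeatable" rule can be sketched as a small data structure. This is only an illustration: the `DCRecord` class and its method names are hypothetical, and the sample subject values are invented.

```python
from collections import defaultdict

# The fifteen element names listed on the slide.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


class DCRecord:
    """Hypothetical container: every element optional, every element repeatable."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, element, value):
        # Reject anything outside the fifteen-element core.
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self._values[element].append(value)  # repeatable: values accumulate
        return self

    def get(self, element):
        # Optional: an element that was never supplied yields an empty list.
        return list(self._values.get(element, []))


record = DCRecord()
record.add("Title", "Maastricht Presentation").add("Creator", "Paul Miller")
record.add("Subject", "metadata").add("Subject", "interoperability")
```

Here `record.get("Subject")` holds two values while `record.get("Rights")` is simply empty, which is all that "repeatable" and "optional" require of an implementation.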
17 Extending DC (semantic) Improve descriptive precision by adding sub–structure (subelements and schemes) –Greater precision = lesser interoperability Should ‘dumb down’ gracefully Creator First Name Surname Contact Info Affiliation Element qualifier Value qualifier Based on a slide by Stu Weibel
18 Extending DC (modularity) Modular extensibility… Additional elements to support local needs Complementary packages of metadata …but only if we get the building blocks right! Description Spatial character Terms & Conditions Based on a slide by Stu Weibel
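One way to picture graceful "dumbing down" is a function that discards the qualifier and keeps the value under the plain element. The tuple layout and the `dumb_down` name are assumptions made for this sketch. The qualifier names come from the slide, and "UKOLN" is used as a plausible affiliation value.

```python
def dumb_down(qualified):
    """Collapse (element, qualifier, value) triples to plain (element, value) pairs."""
    return [(element, value) for element, _qualifier, value in qualified]


# A Creator described with sub-structure, as on the slide.
qualified_creator = [
    ("Creator", "First Name", "Paul"),
    ("Creator", "Surname", "Miller"),
    ("Creator", "Affiliation", "UKOLN"),
]

# A client that only understands unqualified Dublin Core still gets usable,
# if less precise, Creator values.
simple = dumb_down(qualified_creator)
```

The trade-off named on the slide is visible here: the qualified form is more precise, but only the dumbed-down form is guaranteed to be understood everywhere.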
19 Extending DC? DC offers a semantic framework Through use of further substructure, meaning can often be clarified… Unqualified, “John” is ambiguous: John Inc.? John xyz? xyz John? With substructure, “John” can be stated as John Inc., John xyz or xyz John.
20 Extending DC? DC offers a semantic framework Use of domain–specific schemes greatly increases precision Unqualified, “Washington” is ambiguous: Washington State? Washington DC? Washington monument? A value taken from a scheme removes the ambiguity: “North and Central America, United States, Washington”
21 Introducing XML eXtensible Markup Language World Wide Web Consortium recommendation Simplified subset of SGML for use on the Web Addresses HTML’s lack of evolvability Easily extended Supported by major vendors Increasingly used as a transfer syntax, but capable of far more…. See
22 Introducing RDF Resource Description Framework World Wide Web Consortium recommendation Fully compliant application of XML Improves upon XML, HTML, PICS… Machine understandable metadata! Supports structure Encourages authenticity assertions. See
23 Data Integration “The author of this document is Paul” “Paul is the author of this document” “This document is authored by Paul” “The author of this document is Paul” 3 Representation(s) in XML: Paul Paul <document href = “ author = “Paul” />
24 Data Integration Querying XML documents is hard: there are N ways of mapping XML to logical structure, so effective query requires the normalization of all possible representations. The representations mean the same thing to a person but very different things to a machine. RDF is much less flexible, and less flexible = more interoperable! It gives a consistent way of representing statements.
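The normalization problem can be illustrated with three XML encodings of the same statement. The element and attribute names below are assumptions, since the markup on the original slide did not survive, and `http://example.org/doc` is a placeholder URI. Each encoding needs its own extraction rule before a query can treat them alike.

```python
import xml.etree.ElementTree as ET

DOC = "http://example.org/doc"  # placeholder URI, not taken from the slides

# Three hypothetical encodings of "the author of this document is Paul".
variants = [
    f"<author><uri>{DOC}</uri><name>Paul</name></author>",
    f'<document href="{DOC}"><author>Paul</author></document>',
    f'<document href="{DOC}" author="Paul" />',
]


def to_statement(xml_text):
    """Normalize one encoding into a (resource, property, value) statement."""
    root = ET.fromstring(xml_text)
    if root.tag == "author":
        # Encoding 1: the author element wraps both the document URI and the name.
        return (root.findtext("uri"), "author", root.findtext("name"))
    if root.get("author") is not None:
        # Encoding 3: the author is an attribute of the document element.
        return (root.get("href"), "author", root.get("author"))
    # Encoding 2: the author is a child element of the document element.
    return (root.get("href"), "author", root.findtext("author"))


statements = {to_statement(v) for v in variants}
```

All three variants collapse to the single statement `(DOC, "author", "Paul")`, but only because the code knows about every encoding in advance. A uniform statement model moves that burden out of each query.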
25 RDF Data Model Imposes structural constraints on the expression of application models for consistent encoding, exchange and processing of metadata Enables resource description communities to define their own semantics Provides for structural interoperability
26 RDF Data Model basics Resource Property Value Resource Statement
27 A simple example Resource Author “Paul”
28 RDF Model Example URI:R “Maastricht Presentation” title creator dc: “Paul Miller”
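The model on this slide can be written down as two statements, each a resource, a property and a value. This is a minimal sketch: the `Statement` type and the `values_of` helper are hypothetical names, and `URI:R` is kept as the slide's stand-in for a real URI.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a resource has a property with a value."""
    resource: str
    prop: str
    value: str


statements = [
    Statement("URI:R", "dc:title", "Maastricht Presentation"),
    Statement("URI:R", "dc:creator", "Paul Miller"),
]


def values_of(resource, prop, graph):
    """Return every value asserted for the given resource and property."""
    return [s.value for s in graph if s.resource == resource and s.prop == prop]
```

Because every assertion has the same three-part shape, a single query function such as `values_of("URI:R", "dc:creator", statements)` works for any property.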
29 RDF Syntax Example URI:R “Maastricht Presentation” title creator dc: “Paul Miller” <RDF xmlns = “ xmlns:dc = “ Maastricht Presentation Paul Miller
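As a rough illustration of the syntax, the same two statements can be serialized as RDF/XML with Python's standard library. The namespace URIs on the slide were lost, so the ones below are assumptions: the W3C RDF syntax namespace and the Dublin Core 1.1 element namespace, which may not be the ones the slide used. `http://example.org/R` is a placeholder for `URI:R`.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the slide's own values did not survive.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe(about, title, creator):
    """Build an rdf:RDF document with one rdf:Description for the resource."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    ET.SubElement(desc, f"{{{DC_NS}}}title").text = title
    ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe("http://example.org/R", "Maastricht Presentation", "Paul Miller")
```

Parsing `rdf_xml` back yields one description whose `dc:title` and `dc:creator` children carry the two values from the model.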
30 Where do you stop…? Model provides enabling technology for almost infinite cross–linking How far any one community goes should be governed by Domain needs, best practice and experience Organizational/ institutional policy Economics
31 RDF Schemas Declaration of vocabularies properties defined by a particular community characteristics of properties and/or constraints on corresponding values Schema Type System - Basic Types Property, Class, SubClassOf, Domain, Range Minimal (but extensible) at this time minimize significant clashes with typing system designed for XML NG DTDs (1999?) Expressible in the RDF model and syntax Interest in trying this with some of the Getty thesauri…
32 Schema Vocabularies Enables communities to share machine readable tokens and locally define human readable labels. dc:Creator “Nom” rdfs:label “Author” rdfs:label “$100 $a” rdfs:label
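The idea of one shared token with several local labels might be sketched as a lookup table. The three label strings come from the slide. The community keys (`"french"`, `"english"`, `"marc"`) and the `label_for` helper are assumptions for illustration only.

```python
# One shared, machine-readable token; several human-readable labels.
# The community keys are assumed; only the label strings are from the slide.
LABELS = {
    "dc:Creator": {"french": "Nom", "english": "Author", "marc": "$100 $a"},
}


def label_for(token, community):
    """Return the community's label, falling back to the shared token itself."""
    return LABELS.get(token, {}).get(community, token)
```

Software exchanges `dc:Creator` unchanged, while each community sees its own wording, for example `label_for("dc:Creator", "english")` for an English-language interface.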
33 Relationships between elements URI:R ms:Kgrip “John Smith”; ms:Kgrip rdfs:subPropertyOf dc:Creator; ms:Kgrip rdfs:label “Key Grip”
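The practical effect of `rdfs:subPropertyOf` is that a search on the general property also finds values recorded under the specialised one. The following is a minimal sketch of that inference, with hypothetical helper names and plain Python tuples standing in for an RDF store.

```python
# Schema knowledge: ms:Kgrip is declared a sub-property of dc:Creator.
SUB_PROPERTY_OF = {"ms:Kgrip": "dc:Creator"}

# Instance data: the only statement actually recorded about the resource.
statements = [("URI:R", "ms:Kgrip", "John Smith")]


def is_a(prop, target):
    """True if prop equals target or reaches it via rdfs:subPropertyOf links."""
    seen = set()
    while prop is not None and prop not in seen:
        if prop == target:
            return True
        seen.add(prop)  # guard against accidental cycles in the schema
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def values(resource, target):
    """Values of the target property, including those of its sub-properties."""
    return [v for r, p, v in statements if r == resource and is_a(p, target)]
```

A generic client asking `values("URI:R", "dc:Creator")` therefore still finds John Smith, even though the record only says he was the key grip.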
34 Some reading eXtensible Markup Language Resource Description Framework Dublin Core Expressing Dublin Core in RDF resources/dc/datamodel/WD-dc-rdf/
35 Introducing Z39.50 International Standard (ISO 23950) Originally library–centric Permits remote searching of databases Access via Z client or over web Relies upon ‘Profiles’ CIMI profile for cultural heritage See
36 Z39.50 Challenges Profiles for each discipline Defeats interoperability? Bib–1 bloat Largely invisible Seen as complicated Seen as expensive Seen as old–fashioned Surely no match for XML/RDF/whatever.
37 Z39.50 Futures International Interoperability Profile Cross–Domain Attribute Set Attribute Architecture Bib–2 XER DNER/ RDN/ NGDF/ New Library?.
38 Examples: AHDS Arts & Humanities Data Service Funded by JISC to preserve and provide access to digital arts and humanities resources Five ‘service providers’ for archaeology, history, text, visual and performing arts –Each Service provider offers its own access to holdings –AHDS–wide access also provided through Z39.50/DC gateway. See
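The gateway idea, where each provider keeps its own catalogue and a common Dublin Core view sits on top, can be sketched in a few lines. Everything here is invented for illustration: the provider field names, the sample records and the function names are assumptions, and an in-memory loop stands in for the Z39.50 searches a real gateway would issue.

```python
# Hypothetical native records and field-to-DC mappings for two providers.
CATALOGUES = {
    "archaeology": (
        [{"site_name": "Roman villa survey", "excavator": "A. Smith"}],
        {"site_name": "Title", "excavator": "Creator"},
    ),
    "history": (
        [{"heading": "Roman census returns", "compiler": "B. Jones"}],
        {"heading": "Title", "compiler": "Creator"},
    ),
}


def to_dc(record, mapping):
    """Re-express a native record using Dublin Core element names."""
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def cross_search(element, term):
    """Search every provider through the shared Dublin Core view."""
    hits = []
    for provider, (records, mapping) in CATALOGUES.items():
        for record in records:
            dc = to_dc(record, mapping)
            if term.lower() in dc.get(element, "").lower():
                hits.append((provider, dc))
    return hits
```

A single call such as `cross_search("Title", "roman")` reaches both providers, while each keeps its native structure for its own, richer interface.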
39 Examples: ADS Archaeology Data Service Service provider of the AHDS Specialising in archaeological data, as well as advising on geospatial data issues to the other four services ArcHSearch catalogue system, making data available for local government agencies, national agencies, universities, contractual organizations… –Dublin Core used to extract ‘essence’ of legacy rich data –Perceived as neutral ‘honest broker’ Holdings also visible through AHDS Gateway Working with SCRAN and others to link cultural heritage data of relevance to the UK through Z See
40 Examples: CIMI Consortium for the Computer Interchange of Museum Information Membership organization, comprising museums, cultural heritage agencies and system vendors Work through series of test beds –Z39.50 –Institutional Information Management –Dublin Core Metadata –Uses XML transfer syntax –Investigating RDF See
41 Examples: DNER Distributed National Electronic Resource Vision of the Joint Information Systems Committee (JISC), funders of most home–grown network content in the UK HE sector Raise awareness of available resources (SBIGs, Data Centres…) Offer distributed cross–searching of diverse resources Interfaces –Resource specific (as now) –DNER –Institutional –Personal. DC and Z39.50 likely as key enabling technologies –International Interoperability Profile. See