RDF, XML and interoperability
Managing networks: understanding new technologies, Birmingham, 13 September 2001
Pete Johnston
UKOLN, University of Bath, Bath, BA2 7AY
UKOLN is supported by: URL

Managing networks: understanding new technologies, Birmingham, 13 Sep
RDF, XML & interoperability
– Metadata: a reprise
– Communities, communication & XML
– An introduction to RDF
– RDF, XML and interoperability

What is metadata?
“Data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person.” –Dempsey and Heery, 1998
“Machine understandable information about web resources or other things.” –Berners-Lee, 1997
Structured data about resources that can be used to help support a wide range of operations

What resources, objects, things?
HTML documents, digital images, databases, books, museum objects, archival records, metadata records, collections, services, physical places, people, abstract “works”, concepts, events

What operations?
User wants to
– find, identify, select, obtain / use
Owner / manager / provider wants to
– describe
– enable and control access/use
– administer
Different “flavours” of metadata serve different purposes
– simple, generic vs. rich, specific

Communities & communication
Effective transmission of information requires agreement on
– semantics: what terms mean, e.g. “cat”, “to sit”, “mat”
– structure: significance of the arrangement of terms, e.g. sentence: subject -> verb -> object (in English...)
– syntax: rules of expression, e.g. “The cat sat on the mat.”
A resource description community is defined by consensus on conventions

Communication using XML (1)
An example
– I prepare a music catalogue using the (imaginary!) AlbumCat XML schema
– I publish my XML document on the Web
– someone else prepares a catalogue using the same XML schema and publishes their XML document
I can read their XML document and locate tracks created by Don Van Vliet in their catalogue
But, more importantly...

Communication using XML (2)
... my software can search their document because I have programmed it to map:
User request: Find identifiers of all tracks with creator “Don Van Vliet”
Program action: Find values of dc:identifier attributes of track elements which have a dc:creator child element with content “Don Van Vliet”

Communication using XML (3)
Program action: Find values of dc:identifier attributes of track elements which have a dc:creator child element with content “Don Van Vliet”
(Example document content: “The Spotlight Kid”, Van Vliet, Don; “Grow fins”, Van Vliet, Don)

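As a minimal sketch of the hard-wired mapping described above, the Python below applies this program action to a small AlbumCat document using the standard library's ElementTree. AlbumCat is imaginary, so its namespace URI, its element names (catalogue, album, track) and the identifier values are hypothetical; only the Dublin Core 1.1 namespace URI is the real one.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core 1.1 element namespace.
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for the imaginary AlbumCat schema.
AC = "http://example.org/albumcat"

# Hypothetical AlbumCat document; identifiers are made up for illustration.
ALBUMCAT_DOC = f"""<ac:catalogue xmlns:ac="{AC}" xmlns:dc="{DC}">
  <ac:album>
    <dc:title>The Spotlight Kid</dc:title>
    <ac:track dc:identifier="urn:example:track:1">
      <dc:title>Grow fins</dc:title>
      <dc:creator>Don Van Vliet</dc:creator>
    </ac:track>
    <ac:track dc:identifier="urn:example:track:2">
      <dc:title>Another track</dc:title>
      <dc:creator>Someone Else</dc:creator>
    </ac:track>
  </ac:album>
</ac:catalogue>"""


def albumcat_track_ids(xml_text, creator):
    """Values of dc:identifier attributes of track elements that have a
    dc:creator child element whose content equals `creator`.

    Where AlbumCat puts each piece of data is hard-coded here: this is
    the prior "mapping" the program needs.
    """
    root = ET.fromstring(xml_text)
    ids = []
    for track in root.iter(f"{{{AC}}}track"):
        creators = [c.text for c in track.findall(f"{{{DC}}}creator")]
        if creator in creators:
            ids.append(track.get(f"{{{DC}}}identifier"))
    return ids


print(albumcat_track_ids(ALBUMCAT_DOC, "Don Van Vliet"))  # ['urn:example:track:1']
```

The function only works because it knows that AlbumCat puts dc:identifier in an attribute and dc:creator in a child element; nothing in the XML itself tells it that.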
Metadata use
Resource users wish to
– search across the boundaries of communities
– combine resources from different communities
Resource providers wish to
– exchange descriptions with members of other communities
Third parties wish to
– describe resources owned/described by others
Metadata is
– used beyond its creator community
– combined with metadata from other communities

Communication using XML (4)
Continuing the example
– a museum describes their holdings using the (imaginary...) ArtCat XML schema and publishes their XML document
I can read their XML document and locate pictures created by Don Van Vliet listed in their catalogue
– this requires my guesswork and/or reference to the semantics of the ArtCat schema
But...

Communication using XML (5)
... to search across both catalogues, my software now has to be programmed with two mappings:
User request: Find identifiers of all “works” with creator “Don Van Vliet”
Program action (AlbumCat): Find values of dc:identifier attributes of track elements which have a dc:creator child element with content “Don Van Vliet”
Program action (ArtCat): Find content of dc:identifier elements which have a picture parent element with a details child element which has a dc:creator child element with content “Don Van Vliet”

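A sketch of what "two mappings" means in practice: the Python below needs one hand-written function per schema, chosen by the namespace of each document's root element, and it simply fails on a schema it has not been programmed for. Both schemas are imaginary, so the namespace URIs, element names and identifiers are hypothetical; the structures follow the two program actions above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespaces for the two imaginary schemas.
AC = "http://example.org/albumcat"
ART = "http://example.org/artcat"

ALBUMCAT_DOC = f"""<ac:catalogue xmlns:ac="{AC}" xmlns:dc="{DC}">
  <ac:track dc:identifier="urn:example:track:1">
    <dc:creator>Don Van Vliet</dc:creator>
  </ac:track>
</ac:catalogue>"""

ARTCAT_DOC = f"""<art:collection xmlns:art="{ART}" xmlns:dc="{DC}">
  <art:picture>
    <dc:identifier>urn:example:picture:7</dc:identifier>
    <art:details>
      <dc:creator>Don Van Vliet</dc:creator>
    </art:details>
  </art:picture>
</art:collection>"""


def albumcat_ids(root, creator):
    # Mapping 1: dc:identifier is an *attribute* of track;
    # dc:creator is a direct child of track.
    return [t.get(f"{{{DC}}}identifier")
            for t in root.iter(f"{{{AC}}}track")
            if any(c.text == creator for c in t.findall(f"{{{DC}}}creator"))]


def artcat_ids(root, creator):
    # Mapping 2: dc:identifier is an *element* inside picture;
    # dc:creator sits one level deeper, inside details.
    ids = []
    for pic in root.iter(f"{{{ART}}}picture"):
        creators = pic.findall(f"{{{ART}}}details/{{{DC}}}creator")
        if any(c.text == creator for c in creators):
            ids.extend(i.text for i in pic.findall(f"{{{DC}}}identifier"))
    return ids


# One hand-written mapping per known schema, keyed by root-element namespace.
MAPPINGS = {AC: albumcat_ids, ART: artcat_ids}


def find_works(documents, creator):
    results = []
    for text in documents:
        root = ET.fromstring(text)
        ns = root.tag[1:].split("}")[0]  # '{ns}local' -> 'ns'
        mapping = MAPPINGS.get(ns)
        if mapping is None:
            # Unknown schema: the program cannot tell what its structure means.
            raise ValueError(f"no mapping programmed for schema {ns!r}")
        results.extend(mapping(root, creator))
    return results


print(find_works([ALBUMCAT_DOC, ARTCAT_DOC], "Don Van Vliet"))
```

Every new schema means another entry in MAPPINGS, which is the scaling problem the next slide describes.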
The problem
Statement
– this resource (track, picture, etc.!) has dc:creator “Don Van Vliet”
Multiple expressions in XML
– different XML schemas make different choices
– all “good” (and valid)
– a human reader of the document can interpret them (maybe)
– a program needs prior “knowledge” of the structural conventions in each XML schema
Not scalable in an “open” environment
– how to manage an ever-increasing set of conventions?
– always encountering unknown schemas

The problem (2)
“XML allows users to add arbitrary structure to their documents but says nothing about what the structures mean.” –Berners-Lee, 2001
Consensus on syntax
– use of XML
Consensus on semantics of terms
– meaning of elements/attributes (uniquely named through XML namespaces)
No consensus on meaning of structure
– e.g. parent-child element relations

Introducing RDF
Resource Description Framework (RDF) Model & Syntax: W3C Recommendation, 1999
Generic “architecture” for metadata
– a set of conventions for applications exchanging metadata
– allows semantics to be defined by different resource description communities
– accommodates mixing of metadata from diverse sources

Introducing RDF (2)
Defines
– a model for making statements about resources
– conventions for encoding statements using XML syntax
Object types
– Resource: any object identified by a URI (not necessarily accessible via the Web)
– Property: an “attribute” used to describe a resource; properties are also uniquely identified by URI
– Statement: a “triple” of a specific resource, a named property, and a value

The RDF model
(Diagram: a resource with an arc labelled “author” pointing to the literal “Pete”)
A resource has some property whose value is either (i) a simple string value (literal)...
– The resource identified by the URI has a property “author” whose value is “Pete”
– Or, “Pete” is the “author” of the resource identified by

The RDF model (2)
... or (ii) another resource...
(Diagram: an arc labelled “author” pointing to another resource, which has an arc labelled “name”)
– The value of property “author” is another resource which has a property “name” with value “Pete” and a property “ ” with value

The RDF model (3)
... which may itself have a URI
(Diagram: an arc labelled “author” pointing to a resource identified by a URI, which has a property “name” with value “Pete”)

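The three cases above can be sketched as plain data: every statement is a (resource, property, value) triple, properties are always URIs, and a value is a literal, a resource without a URI, or a resource with one. The Python below is an illustrative model only, not any particular RDF library's API; the class names and all URIs are hypothetical, since the slides' own URIs were not preserved.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    """A resource with no URI of its own (case (ii))."""
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Resource = Union[URI, BNode]


@dataclass(frozen=True)
class Statement:
    subject: Resource
    predicate: URI                      # properties are always named by URI
    object: Union[URI, BNode, Literal]


# Hypothetical URIs standing in for those on the original slides.
DOC = URI("http://example.org/doc")
AUTHOR = URI("http://example.org/terms/author")
NAME = URI("http://example.org/terms/name")

# (i) the value is a literal string
case_i = {Statement(DOC, AUTHOR, Literal("Pete"))}

# (ii) the value is another resource, which has its own properties
someone = BNode("a1")
case_ii = {Statement(DOC, AUTHOR, someone),
           Statement(someone, NAME, Literal("Pete"))}

# (iii) ... and that resource may itself have a URI
pete = URI("http://example.org/people/pete")
case_iii = {Statement(DOC, AUTHOR, pete),
            Statement(pete, NAME, Literal("Pete"))}


def objects(graph, subject, predicate):
    """All values of `predicate` for `subject` in a set of statements."""
    return [st.object for st in graph
            if st.subject == subject and st.predicate == predicate]


for g in (case_i, case_ii, case_iii):
    print(objects(g, DOC, AUTHOR))
```

Only case (iii) gives the author a URI that other, independently written descriptions can also refer to.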
The power of RDF
Extensible model
– supports any vocabularies
Supports arbitrary complexity of description
URIs as unique fixed points to identify
– resources
– properties
Descriptions created independently can be “merged” using URIs as “anchors”

First source
(Diagram: a resource with property “author”, whose value is a resource with property “name”, value “Pete”)

Second source
(Diagram: a resource with property “subject”, value “XML”)

Third source
(Diagram: a resource with property “organisation”, value “UKOLN”)

Three descriptions merged
(Diagram: the three descriptions joined on their shared URIs: “author” -> resource with “name” “Pete”; “subject” “XML”; “organisation” “UKOLN”)

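A minimal sketch of merging on URIs, assuming statements are stored as (subject, property, value) tuples: merging is then just set union, and statements from different sources join up wherever they use the same URI. All URIs are hypothetical, and it is an assumption that the third source describes the person rather than the page (the diagram does not say which).

```python
# Statements as (subject, property, value) tuples of strings.
# All URIs are hypothetical stand-ins. Literals are plain strings here;
# a real RDF store would keep literals distinct from URIs.
PAGE = "http://example.org/some-page"
PETE = "http://example.org/people/pete"
EX = "http://example.org/terms/"

first = {(PAGE, EX + "author", PETE), (PETE, EX + "name", "Pete")}
second = {(PAGE, EX + "subject", "XML")}
# Assumption: the third source says something about the person.
third = {(PETE, EX + "organisation", "UKOLN")}

# Merging is set union: statements that use the same URI
# automatically attach to the same node.
merged = first | second | third


def objects(graph, subject, prop):
    """Sorted values of `prop` for `subject`."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


author = objects(merged, PAGE, EX + "author")[0]
print(author,
      objects(merged, author, EX + "name"),
      objects(merged, author, EX + "organisation"))
```

No source had to know about the others; the shared URI for the person is what lets the query go from page to author to organisation.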
The RDF/XML syntax
XML representation of the model
– to store/exchange descriptions
Property names made unique through use of XML namespaces
Variant XML syntaxes for RDF
(Example fragment: an rdf:Description element with an about attribute, URI not preserved, and the value “Pete”)

The RDF/XML syntax (2)
Using the RDF/XML syntax means accepting conventions for the meaning of structures in an XML document
So, an RDF/XML processor can “know in advance” the meaning of structures
– even if the description uses unanticipated vocabularies
– “partial understanding”
Can read multiple descriptions into a store and “merge” them on URIs
Will be generated/consumed by software!

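To illustrate "partial understanding", here is a deliberately small RDF/XML reader sketched in Python with ElementTree. It handles only a subset of the syntax (rdf:Description elements with an about attribute whose property elements contain text, an rdf:resource attribute, or one nested rdf:Description), but it turns property elements from any namespace into triples without knowing that vocabulary in advance. It is not a full RDF/XML parser; the example vocabulary namespace and URIs are hypothetical, while the rdf namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_attr(elem, name):
    # The 1999 Model & Syntax examples use unqualified attributes
    # (about="..."); later RDF/XML uses rdf:about. Accept either.
    return elem.get(f"{{{RDF}}}{name}") or elem.get(name)


def property_uri(tag):
    # '{namespace}local' -> namespace + local: the full property URI.
    ns, local = tag[1:].split("}")
    return ns + local


def parse_rdfxml(text):
    """Minimal subset reader returning a set of (subject, property, value)
    triples. Literals are returned as plain strings."""
    triples = set()

    def walk(desc):
        subject = rdf_attr(desc, "about")
        for prop in desc:
            p = property_uri(prop.tag)       # works for ANY namespace
            ref = rdf_attr(prop, "resource")
            nested = prop.find(f"{{{RDF}}}Description")
            if ref is not None:
                triples.add((subject, p, ref))
            elif nested is not None:
                triples.add((subject, p, rdf_attr(nested, "about")))
                walk(nested)
            else:
                triples.add((subject, p, (prop.text or "").strip()))

    root = ET.fromstring(text)
    for desc in root.findall(f"{{{RDF}}}Description"):
        walk(desc)
    return triples


# A vocabulary the reader has never seen (hypothetical namespace).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:x="http://example.org/never-seen/">
  <rdf:Description about="http://example.org/some-page">
    <x:author>
      <rdf:Description about="http://example.org/people/pete">
        <x:name>Pete</x:name>
      </rdf:Description>
    </x:author>
    <x:subject>XML</x:subject>
  </rdf:Description>
</rdf:RDF>"""

for t in sorted(parse_rdfxml(DOC)):
    print(t)
```

Because the result is a set of triples, the output of several such documents can be merged with set union, in the same way as the descriptions on the preceding slides.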
First source
(Example: the first source encoded in RDF/XML; recoverable content: property “author”, property “name”, value “Pete”)

Second source
(Example: the second source encoded in RDF/XML; recoverable content: property “subject”, value “XML”)

Third source
(Example: the third source encoded in RDF/XML; recoverable content: property “organisation”, value “UKOLN”)

Three descriptions merged
(Example: the merged description in RDF/XML, with nested rdf:Description elements whose about URIs were not preserved; recoverable values: “Pete”, “UKOLN”, “XML”)

A Dublin Core description
(Example RDF/XML record: an rdf:RDF element declaring the rdf and dc namespaces, URIs not preserved, describing the UKOLN home page. Recoverable values, in order: “UKOLN home page”; “Web-support Team, UKOLN”; “digital information management; metadata”; “The home page of the UKOLN web site. UKOLN is a national focus of expertise in digital information management. It provides policy, research and awareness services to the UK library, information and cultural heritage communities. UKOLN is based at the University of Bath.”; “UKOLN”; “Text”; “text/html”; a size in bytes, number not preserved)

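Since RDF/XML descriptions like this one are meant to be generated by software, here is a hedged sketch of producing a simple Dublin Core record with Python's ElementTree. The namespace URIs are the standard RDF and DC 1.1 ones (not necessarily those on the original slide), the subject URI is hypothetical, and assigning each recovered value to a particular DC element (title, creator, and so on) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def dc_description(about, fields):
    """Serialise one resource description as RDF/XML: an rdf:RDF wrapper,
    one rdf:Description with rdf:about, and one literal-valued dc:*
    element per (name, value) pair."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": about})
    for name, value in fields:
        ET.SubElement(desc, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical subject URI; the element each value goes in is assumed.
record = dc_description("http://example.org/ukoln-home", [
    ("title", "UKOLN home page"),
    ("creator", "Web-support Team, UKOLN"),
    ("subject", "digital information management; metadata"),
    ("publisher", "UKOLN"),
    ("type", "Text"),
    ("format", "text/html"),
])
print(record)
```

Registering the prefixes only makes the output readable (rdf:, dc: rather than ns0:, ns1:); an RDF/XML processor cares about the namespace URIs, not the prefixes.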
RDF, XML & interoperability
Why isn’t XML enough?
– a simple statement could be expressed in XML in many different ways
– a human reader makes an interpretation/guess
– an application program requires prior knowledge of the schema/DTD design
RDF/XML
– imposes extra syntactic constraints on how a statement is expressed
– both human and program can interpret the description consistently
Less flexibility, greater interoperability

RDF, XML & interoperability
Tentatively...
Use XML for exchange when
– partners (humans, applications) both “know” the semantics conveyed by the structure of the (meta)data
Use RDF/XML for exchange when
– the (meta)data is potentially used by applications without prior “knowledge” of the specific schema
– the (meta)data incorporates overlapping structures from different domains
N.B. raises issues of trust
– who made the statements?

A note of caution
RDF is not (yet?) a widely adopted technology
Addresses cross-organisation/domain problems
Some scepticism?
– perceived as theoretical, “academic”?
– also considerable enthusiasm!
Some revisions to Model & Syntax in progress at W3C
– XML 1.0 is stable
– RDF less so
Limited tools available (at present!)
But also a growing number of applications

Exercise (optional)
DC-dot
– Web-based tool
– generates DC metadata for Web pages, based on existing tags, heading content, etc.
Experiment with DC-dot to generate DC metadata for pages of your choice
View the RDF/XML representations

Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.