Metadata Modularization Concepts and Tools Carl Lagoze CS
Metadata Structured data about data….
Why is Metadata important? Key to organizing, managing, preserving, and locating content and services in digital libraries
Why is Metadata difficult? Cost; Interoperability (syntax, semantics); Customizability; Extensibility; Distribution; Integrity, Authenticity, Quality; Human and Machine Factors; Naming
Metadata Thoughts: Metadata takes a variety of forms: descriptive cataloging; specialized (terms and conditions, administrative, content ratings, provenance, linkage)
More Metadata Thoughts: New metadata sets will continually evolve. Many metadata sets are “community-specific” (administration; use). Human and machine use.
Dublin Core Metadata Set for Simple Resource Discovery: 15 elements allowing simple descriptive sentences about document-like objects: “Document has title Hamlet”; “Document has creator William Shakespeare”; “Document has subject love and anguish”
The Dublin Core 15: Title; Creator; Subject/Keywords; Description; Publisher; Other Contributor; Date; Resource Type; Format; Resource Identifier; Source; Language; Relation; Coverage; Rights Management
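The "simple descriptive sentences" idea can be sketched in a few lines of Python. This is only an illustration, not part of any Dublin Core specification: the element names are copied from the slide, while the function name `describe` and the dictionary layout are assumptions made for the example.

```python
# The 15 element names as listed on the slide.
DUBLIN_CORE_ELEMENTS = (
    "Title", "Creator", "Subject/Keywords", "Description", "Publisher",
    "Other Contributor", "Date", "Resource Type", "Format",
    "Resource Identifier", "Source", "Language", "Relation", "Coverage",
    "Rights Management",
)


def describe(record):
    """Turn an element -> value mapping into 'Document has ...' sentences.

    `describe` is a hypothetical helper, not a Dublin Core API.
    """
    sentences = []
    for element, value in record.items():
        if element not in DUBLIN_CORE_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        sentences.append(f"Document has {element.lower()} {value}")
    return sentences


hamlet = {
    "Title": "Hamlet",
    "Creator": "William Shakespeare",
    "Subject/Keywords": "love and anguish",
}

for sentence in describe(hamlet):
    print(sentence)
```

Each entry is a flat attribute/value pair attached to the document, which is what keeps the set simple.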
A Scope for the Dublin Core Increase or decrease number of elements? Structured or Unstructured value syntax? Accommodate community extensions?
Warwick Framework: Provide context for the Dublin Core effort. Integrate multiple sets of metadata, addressing issues of individual integrity, distinct audiences, and separate realms of responsibility and management.
Warwick Framework Design: Containers for aggregating packages of typed metadata sets. General principles (information hiding): the only operation defined at the container level returns the sequence of contained packages; packages are opaque at the container level; access to package contents is subject to terms and conditions.
Package Types: Simple metadata set (segregating distinct metadata into separate packages); Recursive container (nesting semantically related metadata sets); Indirect reference (allowing distribution and sharing of metadata sets)
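A minimal Python sketch of the container and the three package types may help make the information-hiding principle concrete. All class and method names here are assumptions for illustration; the Warwick Framework itself does not prescribe an API. The `add` method exists only so the example can build a container; the single read operation is `packages()`, matching the principle that a container only returns its sequence of packages.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class MetadataSet:
    """Simple package: one typed metadata set, opaque to the container."""
    type_name: str   # e.g. "Dublin Core" or "MARC"
    payload: bytes   # never interpreted at the container level


@dataclass
class IndirectReference:
    """Package that points at metadata held elsewhere."""
    uri: str


@dataclass
class Container:
    """A container is itself usable as a package (recursive nesting)."""
    _packages: List["Package"] = field(default_factory=list, init=False)

    def add(self, package: "Package") -> None:
        # Construction helper (an assumption of this sketch).
        self._packages.append(package)

    def packages(self) -> List["Package"]:
        # The only read operation: the sequence of contained packages.
        return list(self._packages)


Package = Union[MetadataSet, IndirectReference, Container]

inner = Container()
inner.add(MetadataSet("MARC", b"..."))

outer = Container()
outer.add(MetadataSet("Dublin Core", b"..."))
outer.add(inner)                                           # recursive container
outer.add(IndirectReference("http://example.org/terms"))   # hypothetical URI

print([type(p).__name__ for p in outer.packages()])
```

The container never inspects a payload; interpreting a package is left to whoever understands its declared type.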
Metadata Container [Figure: a container holding a nested Container package, a Dublin Core package, a MARC record package, and an Indirect Reference package; remaining labels: Terms and Conditions, URI]
Open Implementation Issues: Data encoding; Semantic interaction of overlapping sets (between semantically-related packages; between semantically distinct packages); Type registry
Modeling & Encoding Metadata Components: XML Namespaces. Prevent term clash (record? creator?). Establish concept spaces through URIs: xmlns:dc=“ xmlns:abc=“ Herbert Van de Sompel, Cornell University
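To illustrate how namespaces prevent a term clash, the sketch below parses a record in which two vocabularies both define an element called `creator`. The namespace URIs are hypothetical placeholders (the URIs on the slide are not recoverable), and the element values are invented for the example. Only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this example.
DC = "http://example.org/dc#"
ABC = "http://example.org/abc#"

doc = f"""
<record xmlns:dc="{DC}" xmlns:abc="{ABC}">
  <dc:creator>value from the dc vocabulary</dc:creator>
  <abc:creator>value from the abc vocabulary</abc:creator>
</record>
"""

root = ET.fromstring(doc)

# ElementTree addresses a qualified name as "{namespace-uri}localname",
# so the two "creator" elements stay distinct.
dc_creator = root.find(f"{{{DC}}}creator").text
abc_creator = root.find(f"{{{ABC}}}creator").text

print(dc_creator)
print(abc_creator)
```

The prefixes `dc` and `abc` are just local shorthand; it is the URI behind each prefix that establishes the concept space.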
Modeling & Encoding Metadata Components: RDF. RDF (Resource Description Framework): the instantiation of the Warwick Framework on the Web. Provides enabling technology for richly-structured metadata. Rich data model supporting notions of distinct entities and properties. Syntax expressed in XML.
RDF Components: Formal data model; Syntax for interchange of data; Schema type system (schema model)
RDF Data Model: Directed labeled graphs. Model elements: Resource; Property; Value; Statement; Containers
RDF Model Primitives [Figure: a Resource linked by a Property arc to a Value, which may itself be a Resource; the three together form a Statement]
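The model primitives can be sketched as plain triples. This is an illustration only, not an RDF library: the `Statement` type and the `values` helper are assumptions, and the labels `URI:R`, `dc:Title`, and `dc:Creator` are taken from the deck's CIMI example.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    resource: str  # the thing being described (a graph node)
    prop: str      # the named property (the arc label)
    value: str     # a literal, or the identifier of another resource


# A directed labeled graph is just a set of statements.
graph = {
    Statement("URI:R", "dc:Title", "CIMI Presentation"),
    Statement("URI:R", "dc:Creator", "Eric Miller"),
}


def values(graph, resource, prop):
    """Return all values of `prop` for `resource`, sorted for stable output."""
    return sorted(
        s.value for s in graph if s.resource == resource and s.prop == prop
    )


print(values(graph, "URI:R", "dc:Creator"))
```

Because a value may be the identifier of another resource, the same structure can chain statements into a larger graph.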
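RDF Syntax Example [Figure: URI:R has dc:Title “CIMI Presentation” and dc:Creator “Eric Miller”. XML serialization, markup partly lost: <RDF xmlns = “ xmlns:dc = “ CIMI Presentation Eric Miller]
RDF Model Example #2 [Figure: directed labeled graph. Nodes: URI:R, URI:ERIC, URI:OCLC; literals: “CIMI Presentation”, “Eric Miller”, “OCLC”, and a value ending in “oclc.org”; arc labels: Title, Creator, bib:Name, bib:Aff, and the truncated labels “bib:”, “oa:”, “dc:”]
RDF Syntax Example #2 [XML serialization of RDF Model Example #2, markup partly lost: <RDF xmlns = “ xmlns:dc = “ xmlns:bib = “ CIMI Presentation Eric Miller]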
RDF Containers: Permit the aggregation of several values for a property. Express multiple aggregation semantics: unordered; sequential or priority order; alternative
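The three aggregation semantics can be sketched as three small Python classes. The names Bag, Seq, and Alt follow the RDF Model and Syntax specification rather than the slide itself; the methods `same_as` and `default` are assumptions made for this illustration.

```python
class Bag:
    """Unordered collection: member order carries no meaning."""

    def __init__(self, *members):
        self.members = list(members)

    def same_as(self, other):
        return sorted(self.members) == sorted(other.members)


class Seq:
    """Ordered collection: sequence or priority order is significant."""

    def __init__(self, *members):
        self.members = list(members)

    def same_as(self, other):
        return self.members == other.members


class Alt:
    """Alternatives: any one member may be chosen; the first is the default."""

    def __init__(self, *members):
        if not members:
            raise ValueError("Alt needs at least one member")
        self.members = list(members)

    def default(self):
        return self.members[0]


print(Bag("love", "anguish").same_as(Bag("anguish", "love")))      # order ignored
print(Seq("first author", "second author").same_as(
    Seq("second author", "first author")))                         # order matters
print(Alt("mirror-1", "mirror-2").default())
```

Choosing the container type is therefore a statement about meaning, not just about storage.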
RDF Schemas: Declaration of vocabularies (properties defined by a particular community; characteristics of properties and/or constraints on corresponding values). Schema Type System, Basic Types: Property, Class, SubClassOf, Domain, Range; minimal (but extensible) at this time; minimizes significant clashes with the typing system of the XML Schema WG. Expressible in the RDF model and syntax.
Relationships among vocabularies [Figure: relationships among the properties dc:Creator, ms:director, marc:100, and bib:Author]
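One way such relationships could be used is sketched below. The slide does not state the direction or kind of the relationships, so this example assumes, purely for illustration, that ms:director, marc:100, and bib:Author are each treated as refinements of dc:Creator. The `REFINES` table, the `generalize` helper, and the record values are all hypothetical.

```python
# Assumed mapping: narrower property -> broader property.
REFINES = {
    "ms:director": "dc:Creator",
    "marc:100": "dc:Creator",
    "bib:Author": "dc:Creator",
}


def generalize(prop):
    """Map a community-specific property to its broader term, if any."""
    return REFINES.get(prop, prop)


# Illustrative statements from three different vocabularies.
records = [
    ("R1", "bib:Author", "Author A"),
    ("R2", "ms:director", "Director B"),
    ("R3", "dc:Title", "Some Title"),
]

# A query phrased in Dublin Core terms also finds the refined properties.
creators = [(r, v) for r, p, v in records if generalize(p) == "dc:Creator"]
print(creators)
```

A search expressed in the simpler vocabulary can then reach metadata that was recorded in richer, community-specific ones.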
Bringing it together: RDF Data Model: supports consistent encoding, exchange and processing of metadata, which is critical when aggregating data from multiple sources. RDF Schema: declare, define, reuse vocabularies. RDF metadata transmission: XML encoding.
Interoperability among Metadata Vocabularies [Figure: core classes shared among Dublin Core, MARC, INDECS, and IMS]
Attribute/Value approaches to metadata… “Hamlet has a creator Shakespeare”: subject (Hamlet), implied verb (has), metadata noun (creator), literal (Shakespeare). With the metadata adjective “playwright”: “The playwright of Hamlet was Shakespeare”. [Figure: R1 --dc:title--> “Hamlet”; R1 --dc:creator.playwright--> “Shakespeare”]
…run into problems for richer descriptions… “The playwright of Hamlet was Shakespeare, who was born in Stratford” becomes “Hamlet has a creator Shakespeare” and “Hamlet has a creator birthplace Stratford”. [Figure: R1 --dc:creator.playwright--> “Shakespeare”; R1 --dc:creator.birthplace--> “Stratford”]
…because of their failure to model entity distinctions [Figure: R1 --title--> “Hamlet”; R1 --creator--> R2; R2 --name--> “Shakespeare”; R2 --birthplace--> “Stratford”]
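The difference between the flat attribute/value form and the entity-based form can be shown with a small Python sketch. The triples mirror the Hamlet example (R1 is the work, R2 is its creator); the helper names `objects` and `follow` are assumptions made for this illustration.

```python
# Entity-based model: the creator is its own resource (R2),
# so the birthplace attaches to the person, not to the play.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def objects(subject, prop):
    """All values of `prop` on `subject`."""
    return [o for s, p, o in triples if s == subject and p == prop]


def follow(start, *path):
    """Walk a chain of properties from `start` and return the end values."""
    frontier = [start]
    for prop in path:
        frontier = [o for node in frontier for o in objects(node, prop)]
    return frontier


print(follow("R1", "creator", "name"))        # who created Hamlet?
print(follow("R1", "creator", "birthplace"))  # where was that creator born?
print(objects("R1", "birthplace"))            # the play itself has no birthplace
```

In the flat form, "Stratford" hangs directly off the document, and nothing records that it describes the creator rather than the work.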
Understanding Metadata based on Query Capabilities: Simple boolean tags? Agent, time, place questions (who was responsible for what, and when)?
Applying a Model-Centric Approach: Formally define common entities and relationships underlying multiple metadata vocabularies. Describe them (and their inter-relationships) in a simple logical model. Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies.
Conceptual Basis: Evolution of Content over Time. IFLA Entity Model. From Bearman et al., D-Lib Magazine, January 1999.
Events are key to understanding metadata relationships? Recognizing inherent lifecycle aspects of digital content: transformation of “input” resources to “output” resources and of their descriptions (e.g., the IFLA model). Modeling implied events as first-class objects provides attachment points for common entities, e.g., agents, contexts (times and places), roles. Clarifying attachment points facilitates mapping across common entities in different vocabularies.
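A minimal sketch of events as first-class objects follows. It is an illustration under stated assumptions, not the model from the cited paper: the `Event` fields, the `responsible_for` helper, and every identifier and value in the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """An event transforms input resources into output resources."""
    kind: str
    inputs: List[str]
    outputs: List[str]
    agents: Dict[str, str] = field(default_factory=dict)  # role -> agent
    time: Optional[str] = None    # context: when
    place: Optional[str] = None   # context: where


# Hypothetical lifecycle: a work is created, then translated.
events = [
    Event("creation", [], ["work:hamlet"],
          {"playwright": "agent:shakespeare"}, time="t1"),
    Event("translation", ["work:hamlet"], ["work:hamlet-translated"],
          {"translator": "agent:translator"}, time="t2"),
]


def responsible_for(resource):
    """Who was responsible for producing `resource`, in what role, and when?"""
    return [
        (role, agent, event.kind, event.time)
        for event in events
        if resource in event.outputs
        for role, agent in event.agents.items()
    ]


print(responsible_for("work:hamlet"))
print(responsible_for("work:hamlet-translated"))
```

Because agents, roles, times, and places attach to the event rather than to the resource, "who did what, and when" questions are answered by looking up events instead of parsing qualified attribute names.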
Content, Events, & Descriptions
Museum Data