Metadata for the Web From Discovery to Description

Slides:



Advertisements
Similar presentations
Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
Advertisements

Dublin Core for Digital Video: Overview of the ViDe Application Profile.
T. Baker / 27 March 2000 A Registry for Dublin Core Thomas Baker, GMD IuK 2000: "Information, Knowledge and Knowledge Management Darmstadt, 27 March 2000.
T. Baker / 23 Sep 2000 Dublin Core Qualifiers and A Grammar for Dublin Core Thomas Baker DC-8, National Library of Canada, Ottawa 4 October 2000.
Dublin Core Metadata Tutorial July 9, 2007 Stuart Weibel Senior Research Scientist OCLC Programs and Research.
A centre of expertise in digital information management Approaches To The Validation Of Dublin Core Metadata Embedded In (X)HTML Documents Background The.
UKOLN, University of Bath
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
Developing a Metadata Exchange Format for Mathematical Literature David Ruddy Project Euclid Cornell University Library DML 2010 Paris 7 July 2010.
CS570 Artificial Intelligence Semantic Web & Ontology 2
Natalia Wehler: Dublin Core Requirements on Metadata  multiple softwares to use metadata  management of changing standards  needs to be functional,
© Tefko Saracevic, Rutgers University1 metadata considerations for digital libraries.
RDF Kitty Turner. Current Situation there is hardly any metadata on the Web search engine sites do the equivalent of going through a library, reading.
Cornell CS 502 Metadata for the Web From Discovery to Description CS 502 – Carl Lagoze – Cornell University.
1 CS 502: Computing Methods for Digital Libraries Lecture 17 Descriptive Metadata: Dublin Core.
Metadata for the Web A Necessary Evil? CS 431 – March 2, 2005 Carl Lagoze – Cornell University.
OLC Spring Chapter Conferences Metadata, Schmetadata … Tell Me Why I Should Care? OLC Spring Chapter Conferences, 2004 Margaret.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Resource Description Framework ( RDF ) Xinxia An.
Some URLs JODI Paper – Harmony project –
Stuart Weibel OCLC, Inc. October, 1997 Dublin Core Metadata Stuart Weibel Consulting Research Scientist OCLC Office of Research purl.org/net/weibel October.
UKOLUG - July Metadata for the Web RDF and the Dublin Core Andy Powell UKOLN, University of Bath UKOLN.
Metadata and identifiers for e- journals Copenhagen Juha Hakala Helsinki University Library
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
The Semantic Web Service Shuying Wang Outline Semantic Web vision Core technologies XML, RDF, Ontology, Agent… Web services DAML-S.
By: Dan Johnson & Jena Block. RDF definition What is Semantic web? Search Engine Example What is RDF? Triples Vocabularies RDF/XML Why RDF?
Scalable Metadata Definition Frameworks Raymond Plante NCSA/NVO Toward an International Virtual Observatory How do we encourage a smooth evolution of metadata.
Meta Tagging / Metadata Lindsay Berard Assisted by: Li Li.
JENN RILEY METADATA LIBRARIAN IU DIGITAL LIBRARY PROGRAM Introduction to Metadata.
Metadata Modularization Concepts and Tools Carl Lagoze CS
Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – Carl Lagoze – Cornell University.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
Modularization and Interoperability: Dublin Core and the Warwick Framework Sandra D. Payette Digital Library Research Group Cornell University November.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
Metadata Bridget Jones Information Architecture I February 23, 2009.
A Quick Introduction to Metadata Michael Day UKOLN: The UK Office for Library and Information Networking, University of Bath
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
Introduction to Metadata Jenn Riley Metadata Librarian IU Digital Library Program.
1 The ABC Metadata Ontology and Model Carl Lagoze, Cornell University Jane Hunter, DSTC.
21 June 2001Managing Information Resources for e-Government1 The Dublin Core Makx Dekkers, Managing Director, Dublin Core Metadata Initiative
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
A centre of expertise in digital information managementwww.ukoln.ac.uk DCMI Affiliates: Implications for Institutions Rosemary Russell UKOLN University.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Metadata : an overview XML and Educational Metadata, SBU, London, 10 July 2001 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY UKOLN is supported.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
THE BIBFRAME EDITOR AND THE LC PILOT Module 3 – Unit 1 The Semantic Web and Linked Data : a Recap of the Key Concepts Library of Congress BIBFRAME Pilot.
Metadata and Meta tag. What is metadata? What does metadata do? Metadata schemes What is meta tag? Meta tag example Table of Content.
Future of Cataloguing: how RDA positions us for the future for RDA Workshop June, 2010.
Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – Carl Lagoze – Cornell University.
Metadata for the Web Beyond Dublin Core? CS 431 – March 9, 2005 Carl Lagoze – Cornell University Acknowledgements to Liz Liddy and Geri Gay.
The Semantic Web. What is the Semantic Web? The Semantic Web is an extension of the current Web in which information is given well-defined meaning, enabling.
1 RDF, XML & interoperability Metadata : a reprise Communities, communication & XML An introduction to RDF RDF, XML and interoperability.
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
Dublin Core Basics Workshop Lisa Gonzalez KB/LM Librarian.
Attributes and Values Describing Entities. Metadata At the most basic level, metadata is just another term for description, or information about an entity.
prepared by Dr. Ammar Yakan
Building the Semantic Web
Metadata Standards - Types
WHAT DOES THE FUTURE HOLD? Ann Ellis Dec. 18, 2000
Catherine Lai MUMT-611 MIR January 27, 2005
Alphabet Soup: Choosing Among DC, QDC, MARC, MARCXML, and MODS
Introduction to Metadata
RDF For Semantic Web Dhaval Patel 2nd Year Student School of IT
Applications of IFLA Namespaces
Attributes and Values Describing Entities.
Introduction to Semantic Metadata & Semantic Web
Introduction to Metadata
Some Options for Non-MARC Descriptive Metadata
Attributes and Values Describing Entities.
Presentation transcript:

Metadata for the Web From Discovery to Description CS 502 – 20020224 Carl Lagoze – Cornell University Cornell CS 502

The fifteen Dublin Core Elements http://dublincore.org/usage/terms/dc/current-elements/ http://dublincore.org Cornell CS 502

A Pidgin for Digital Tourists Metadata is language Dublin Core is a small and simple language -- a pidgin -- for finding resources across domains. Speakers of different languages naturally "pidginize" to communicate E.g., tourists using simple phrases to order beer ("zwei Bier bitte" "dva pivo" "biru o san bai"...) We are all "tourists" on the global Internet. Cornell CS 502

What is the Dublin Core (1) A simple set of properties to support resource discovery on the web (fuzzy search buckets)? Domain Independent view Cornell CS 502

An extensible ontology for resource desciption? What is Dublin Core (2)? An extensible ontology for resource desciption? Greater Functionality & Cost Cornell CS 502

What is the Dublin Core (3)? A cross-domain switchboard for interoperable metadata? Dublin Core MARC INDECS IMS - projections to application-specific metadata vocabularies Switchboard Cornell CS 502

Dublin Core Qualifiers From fuzzy buckets to more specific description Model of “graceful degradation” Support both simplicity and specificity Intra-domain and inter-domain semantics Cornell CS 502

Varieties of qualifiers: Element Refinements Make the meaning of an element narrower or more specific. Narrowing implies an is a relationship a "date created“ is a "date“ an "is part of relation“ is a "relation“ If your software does not understand the qualifier, you can safely ignore it. Cornell CS 502

Varieties of Qualifiers: Value Encoding Schemes Says that the value is a term from a controlled vocabulary (e.g., Library of Congress Subject Headings) a string formatted in a standard way (e.g., "2001-05-02" means May 3, not February 5) Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery. Cornell CS 502

A Grammar of Dublin Core http://www.dlib.org/dlib/october00/baker/10baker.html By design not as subtle as mother tongues, but easy to learn and extremely useful in practice Pidgins: small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives) Simple grammars: sentences (statements) follow a simple fixed pattern... Cornell CS 502

Example Dublin Core statements Resource has Title 'Grammar of Dublin Core'. Resource has Creator 'Tom Baker'. Resource has Subject 'Metadata'. Resource has Relation http://foo.org/file.htm. Cornell CS 502

Resource has property X implied verb one of 15 properties property value (an appropriate literal) DC:Creator DC:Title DC:Subject DC:Date... implied subject Resource has property X Cornell CS 502

Resource has property X implied verb one of 15 properties property value (an appropriate literal) DC:Creator DC:Title DC:Subject DC:Date... implied subject Resource has property X qualifiers (adjectives) [optional qualifier] [optional qualifier] Cornell CS 502

Resource has Subject "Languages -- Grammar" Resource has Date LCSH Resource has Date "2000-06-13" ISO8601 Revised Cornell CS 502

Dumb-Down Principle for Qualifiers The fifteen elements should be usable and understandable with or without the qualifiers Qualifiers refine meaning (but may be harder to understand) Nouns can stand on their own without adjectives If your software encounters an unfamiliar qualifier, look it up -- or just ignore it! "has a“ relations break the model E.g., a creator has a hair color Cornell CS 502

Resource has Subject "Languages -- Grammar" Resource has Date Test for “good““ qualifiers: cover and ask: -- Does the statement still make sense? -- Is it still correct? Resource has Subject "Languages -- Grammar" LCSH Resource has Date "2000-06-13" ISO8601 Revised Cornell CS 502

“Incorrect” Qualification Resource has creator “Cornell University” affiliation Resource has subject “pre-schoolers” audience Cornell CS 502

Open questions in this model Are uncontrolled and unconstrained values really useful for discovery? Is it possible for an organization (DCMI) to control the evolution of a language? How can "simple discovery metadata" be combined with complex descriptions? Is there a notion of graceful degradation? Can DC serve as a lingua franca (mapping template) among more complex models Cornell CS 502

Models for Deploying Metadata Embedded in the resource low deployment threshold Limited flexibility, limited model Linked to from resource Using xlink Is there only one source of metadata? Independent resource referencing resource Model of accessing the object through its surrogate Resource doesn’t ‘have’ metadata, metadata is just one resource annotating another Cornell CS 502

Syntax Alternatives: HTML Advantages: Simple Mechanism – META tags embedded in content Widely deployed tools and knowledge Disadvantages Limited structural richness (won’t support hierarchical,tree-structured data or entity distinctions). Cornell CS 502

Dublin Core in HTML http://www.dublincore.org/documents/2000/08/15/dcq-html/ HTML constructs <link> to establish pseudo-namespace <meta> for metadata statements name attribute for DC element (DC.element.ER) content attribute for element value scheme attribute for encoding scheme or controlled vocabulary lang attribute for language of element value Cornell CS 502

Dublin Core in HTML example <link rel="schema.DC" href="http://purl.org/dc/elements/1.1"> <meta name="DC.Title" content="Business Unusual”> <meta name=“DC.Title” lang=“es” content=“negocio inusual”> <meta name="DC.Creator" content="Carl Lagoze"> <meta name="DC.Subject" content="bibliographic control web cataloging "> <meta name="DC.Date.Created" scheme="W3CDTF" content="2000-10-23"> <meta name="DC.Format" content="text/html"> <meta name="DC.Identifier" content="http://lcweb.loc.gov/lagoze_paper.html"> Cornell CS 502

Unqualified Dublin Core in XML http://dublincore.org/documents/2002/09/09/dc-xml-guidelines/ Cornell CS 502

Multi-entity nature of object description Photographer Camera type Software Computer artist Cornell CS 502

Attribute/Value approaches to metadata… The playwright of Hamlet was Shakespeare subject implied verb metadata noun literal Hamlet has a creator Shakespeare Playwright metadata adjective R1 “Shakespeare” “Hamlet” dc:creator.playwright dc:title Cornell CS 502

…run into problems for richer descriptions… The playwright of Hamlet was Shakespeare, who was born in Stratford Hamlet has a creator Stratford birthplace “Stratford” R1 “Shakespeare” dc:creator.playwright dc:creator.birthplace Cornell CS 502

…because of their failure to model entity distinctions “Shakespeare” name R1 R2 creator birthplace title “Stratford” “Hamlet” Cornell CS 502

Applying a Model-Centric Approach Formally define common entities and relationships underlying multiple metadata vocabularies Describe them (and their inter-relationships) in a simple logical model Provide the framework for extending these common semantics to domain and application-specific metadata vocabularies. Cornell CS 502

Events are key to understanding resource complexity? Events are implicit in most metadata formats (e.g., ‘date published’, ‘translator’) Modeling implied events as first-class objects provides attachment points for common entities – e.g., agents, contexts (times & places), roles. Clarifying attachment points facilitates understanding and querying “who was responsible for what when”. Cornell CS 502

ABC/Harmony Event-aware metadata ontology Recognizing inherent lifecycle aspects of description (esp. of digital content) Modeling incorporates time (events and situations) as first-class objects Supplies clear attachment points for agents, roles, existential properties Resource description as a “story-telling” activity Cornell CS 502

Resource-centric Metadata Title Anna Karenina Author Leo Tolstoy Illustrator Orest Vereisky Translator Margaret Wettlin Date Created 1877 Date Translated 1978 Description Adultery & Depression ? Birthplace Moscow Birthdate 1828 Cornell CS 502

“Orest Vereisky” “Leo Tolstoy” "Moscow" “1828” “Margaret Wettlin” “illustrator” “author” “translator” “creation” “1877” “1978” “translation” “Russian” “English” “Anna Karenina” “Tragic adultery and the search for meaningful love” Cornell CS 502

Breaking the metadata bottleneck Human vs. machine generation Simple text scraping HTML tags as hint Other structural methods Natural language methods and machine learning Contextual methods Google (text and image search) Cornell CS 502

Putting metadata in its place Cornell CS 502

Query engine architecture space Cornell CS 502