CS3352 Metadata The Semantic Web Directories and Thesauri XML is not enough Topic maps RDF.

1 CS3352 Metadata
– The Semantic Web
– Directories and Thesauri
– XML is not enough
– Topic maps
– RDF

2 CS3352 Sources of Knowledge for finding documents [DeRose99]
“The user, including their current explicit query and any historical or profile information the system may have gained earlier.
The documents in the library or on the web, including their nominal "content" and whatever metadata has been attached.
The world, about which the system may have certain information, such as dictionaries and thesauri of natural language terms; basic knowledge of object categories ("dog is-a animal"), and much more…”
[Diagram labels: Text, image; Mark-up, Links, Catalogue database; Ontologies, Thesauri; Knowledge]

3 CS3352 What is metadata?
Data cataloguing resources
– Administrative cataloguing: acquisition history, author…
– Structural: size, image format…
Data describing the content and meaning of resources
– e.g. royal UK male trophy presenter, footballer trophy winner

4 CS3352 Metadata Representation
– Expressive, so we can say what we want
– Compositional, so that we can build complex terms out of simple pieces
– Controlled, so we only say consistent and coherent things
– Incremental, so we can keep adding descriptions

5 CS3352 Dublin Core
A standard for metadata defined by the digital library community
Others: MARC, VRA…
15 Elements:
– Title, Subject, Description
– Creator, Publisher, Contributor
– Date, Type, Format
– Identifier, Source, Language
– Relation, Coverage, Rights
From: Metadata for images, Michael Day http://www.ukoln.ac.uk
Core elements defined in RFC 2413: http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2413.txt
http://www.ariadne.ac.uk
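A minimal sketch of how the 15 Dublin Core elements can act as a controlled set of field names. The element names come from the slide; the sample record and the `unknown_elements` helper are hypothetical and only illustrate the idea of checking a description against the element set.

```python
# The 15 Dublin Core elements listed on the slide.
DC_ELEMENTS = {
    "Title", "Subject", "Description",
    "Creator", "Publisher", "Contributor",
    "Date", "Type", "Format",
    "Identifier", "Source", "Language",
    "Relation", "Coverage", "Rights",
}


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(set(record) - DC_ELEMENTS)


# Hypothetical record, for illustration only.
record = {
    "Title": "Being a Dog Is a Full-Time Job",
    "Creator": "Charles M. Schulz",
    "Language": "en",
    "Colour": "black and white",  # deliberately not a Dublin Core element
}

print(unknown_elements(record))
```

A real deployment would also have to agree on how values are written (date formats, subject vocabularies); the element set alone does not settle that.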

6 CS3352 Metadata on the web yesterday Meta tags
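As a rough sketch of the meta-tag approach: metadata sits in `<meta name="…" content="…">` elements in the page head, and a program can read it back with Python's standard `html.parser` module. The page below and its tag values are made up for illustration, and `MetaTagCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Only keep tags that carry both a name and a content attribute.
        if "name" in attrs and "content" in attrs:
            self.meta[attrs["name"].lower()] = attrs["content"]


# Hypothetical page, for illustration only.
page = """<html><head>
<title>Snoopy</title>
<meta name="keywords" content="beagle, Peanuts, comic strip">
<meta name="author" content="Charles M. Schulz">
</head><body></body></html>"""

collector = MetaTagCollector()
collector.feed(page)
print(collector.meta)
```

The values are free text chosen by the page author, so two pages can use different words for the same subject; that is the gap shared vocabularies are meant to close.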

7 CS3352 Metadata on the Web yesterday
[Example document; markup lost in extraction, text content only]
– Being a Dog Is a Full-Time Job, Charles M. Schulz
– Snoopy: Peppermint Patty, 1950-10-04, extroverted beagle
– Peppermint Patty: 1966-08-22, bold, brash and tomboyish

8 CS3352 Metadata on the web yesterday

9 CS3352 World Wide Web Tim Berners-Lee reprise… “... a goal of the Web was that, if the interaction between person and hypertext could be so intuitive that the machine-readable information space gave an accurate representation of the state of people's thoughts, interactions, and work patterns, then machine analysis could become a very powerful management tool, seeing patterns in our work and facilitating our working together through the typical problems which beset the management of large organizations.” Berners-Lee 1996

10 CS3352 Web = Data+Information-Knowledge
– Browse the Links
– Search using Words: steamer, tank
– Search using experience
– Link structure is content – rhetorical narratives
– Search using indexes
– Metadata and classifications

11 CS3352 “Find a very successful European team-based sports person”
– Resource describing UK soccer players and their careers
– Resource listing sporting competitions including FA Cup and Superbowl
– Resource that lists teams that have won the FA Cup
– Resource describing the Olympic Games
– Steve Redgrave’s home page
? Metadata, Knowledge, Inference

12 CS3352 [Ontology diagram; layout lost in extraction]
Concepts: People, Sports Person, Soccer player, Rower, Sport, Soccer (participants = 11), Rowing, Coxless Fours (participants = 4), Tennis, Competition, Tournament, Event, Sports Tournament, Soccer Tournament, Tennis Tournament, FA Cup, Wimbledon, Olympic Games, Country, UK, Europe
Relations: participates, win, nationality, partof, holds
Defined classes: Rower win Olympic Games; UK Rower win Olympic Games > 2 times; Soccer player wins FA Cup once
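A small sketch of the kind of inference the query on slide 11 needs: following is-a and part-of links to decide that a UK rower counts as a European sports person. The links below are one plausible reading of the diagram, not a transcription of it, and the function names are hypothetical.

```python
# Assumed fragment of the ontology: child -> parent links.
IS_A = {
    "Soccer player": "Sports Person",
    "Rower": "Sports Person",
    "Sports Person": "People",
    "Olympic Games": "Sports Tournament",
    "Sports Tournament": "Tournament",
}
PART_OF = {"UK": "Europe"}


def ancestors(term, relation):
    """Follow `relation` upwards from `term`, returning every term reached."""
    found = []
    while term in relation:
        term = relation[term]
        found.append(term)
    return found


def is_european_sports_person(kind, nationality):
    """True if `kind` is-a Sports Person and `nationality` is part of Europe."""
    kinds = [kind] + ancestors(kind, IS_A)
    places = [nationality] + ancestors(nationality, PART_OF)
    return "Sports Person" in kinds and "Europe" in places


print(is_european_sports_person("Rower", "UK"))
```

Neither fact is stated directly on any single resource; the answer only appears once the metadata ("Rower", "UK") is combined with the shared knowledge in `IS_A` and `PART_OF`.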

13 CS3352 A Shared Understanding
Metadata
– Data describing the content and meaning of resources
– But everyone must speak the same language…
Terminologies
– Shared and common vocabularies
– For search engines, agents, curators, authors and users
– But everyone must mean the same thing…
Ontologies
– Shared and common understanding of a domain
– Essential for exchange and discovery

14 CS3352 Ontologies
“The [reusable] specification of conceptualizations, used to help programs and humans share knowledge” [Gruber93]
An ontology will include:
– a vocabulary of terms, and
– some specification of their meaning
– structure on the domain that constrains the possible interpretations of terms [Uschold99]
– a precise notion of what meaning means
Ontologies provide: a shared and common understanding of a domain that can be communicated across people and applications

15 CS3352 Ontology
Precise notion of what meaning means
– formal, explicit, rigorous
– unambiguous
– for agents, not just people
– machine computable
– from machine-readable to machine-understandable
– use knowledge representation and reasoning to supply the meaning

16 CS3352 What is an Ontology?
[Spectrum diagram, from least to most formal]
– Catalog/ID
– Terms/glossary
– Thesauri “narrower term” relation
– Informal is-a
– Formal is-a
– Formal instance
– Frames (properties)
– Value Restrs.
– Disjointness, Inverse, part-of…
– General Logical constraints
From Debbie McGuinness

17 CS3352 Ontologies and E-Anything
Simple ontologies provide:
– Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language)
– Organization (and navigation support)
– Expectation setting (left side of many web pages)
– Browsing support (tagged structures such as Yahoo!)
– Search support (query expansion approaches such as FindUR, e-Cyc)
– Sense disambiguation
– Conflict detection
– Structured, comparative search
– Generalization/Specialization
– …
From Debbie McGuinness
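One way to picture the "search support (query expansion)" item is a thesaurus lookup that adds narrower terms to a query. This is a generic sketch under assumed data, not a description of how FindUR or e-Cyc work; the `NARROWER` table and `expand` function are hypothetical.

```python
# Hypothetical thesaurus: each term maps to its narrower terms.
NARROWER = {
    "sport": ["soccer", "rowing", "tennis"],
    "rowing": ["coxless fours"],
}


def expand(term):
    """Return `term` followed by all of its narrower terms, depth-first."""
    expanded = [term]
    for child in NARROWER.get(term, []):
        expanded.extend(expand(child))
    return expanded


print(expand("sport"))
```

A search for "sport" can then also match documents that only mention "coxless fours", which plain keyword matching would miss.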

18 CS3352 The Semantic Web http://www.semanticweb.org

19 CS3352 Metadata on the web tomorrow
Resources annotated with metadata using knowledge as a shared vocabulary
– Metadata held outside the resource
Knowledge structures for holding the ontology
– XML DTDs
– Product classifications
– Directories: Home > Recreation > Sports > Events > International Games > Olympic Games >
– W3C: RDF and RDFS – Resource Description Framework
– Topic maps
– DAML+OIL
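To make "metadata held outside the resource" concrete, here is a sketch in the spirit of RDF: each statement is a (subject, predicate, object) triple stored separately from the page it describes. This uses plain Python tuples rather than any RDF library, and every URI and property name below is invented for illustration.

```python
# Statements about resources, kept apart from the resources themselves.
# All URIs and property names are hypothetical.
TRIPLES = [
    ("http://example.org/redgrave", "type", "Rower"),
    ("http://example.org/redgrave", "nationality", "UK"),
    ("http://example.org/redgrave", "wins", "Olympic Games"),
    ("UK", "partof", "Europe"),
]


def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


print(match(predicate="nationality"))
```

Because the statements live outside the page, anyone can add descriptions of a resource without editing it, provided they use the shared vocabulary for predicates and classes.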

20 CS3352 XML is not good for describing ontologies
– XML defines grammars to verify and structure documents
– The grammar enforces constraints on tags
– Different grammars define the same content
– XML lacks a semantic model – it only has a surface model which is a tree
[Tree diagram: course; teacher, title, students; name, http…]
node = label + attr/values + contents
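The point that different grammars can encode the same content, and that XML only sees the tree, can be sketched with Python's standard `xml.etree.ElementTree`. The two documents below are hypothetical encodings of one fact (a course taught by "Smith", an invented name); `shape` is an illustrative helper that mirrors the slide's "node = label + attr/values + contents".

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same fact.
as_child = ET.fromstring("<course><teacher>Smith</teacher></course>")
as_attr = ET.fromstring('<course teacher="Smith"/>')


def shape(node):
    """Describe a node as (label, attributes, text, child shapes)."""
    return (
        node.tag,
        dict(node.attrib),
        (node.text or "").strip(),
        [shape(child) for child in node],
    )


print(shape(as_child))
print(shape(as_attr))
print(shape(as_child) == shape(as_attr))
```

A person reads both as "Smith teaches the course", but a program comparing the trees sees two unrelated structures unless the mapping between them is agreed in advance.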

21 CS3352 XML is not good for describing ontologies
Meaning of XML documents is intuitively clear
– “semantic” markup tags are domain terms
But computers do not have intuition
– Tag names per se do not provide semantics
– The semantics are encoded outside the XML specification
XML makes no commitment on:
(1) Domain specific ontological vocabulary
(2) Ontological modelling primitives
so it requires pre-arranged agreement on (1) & (2)
Feasible for closed collaboration
– agents in a small & stable community
– pages on a small & stable intranet

22 CS3352 XML DTDs and XML Schema
– DTD does not distinguish between objects and relations
– XML Schema’s type extension mechanism is a red herring – it can’t be used to model ontological subtypes
– XML has been used as a serialisation syntax for other markup languages – e.g. SMIL, XOL
[Example fragment: person, year-of-birth, 1]

23 CS3352 Requirements for an Ontology-language
Well designed
– Useful and proven modelling primitives
– Intuitive to human users
– Can say simple things simply
– Expressive enough to capture many ontologies
– Efficient, sound and complete reasoning support
Well defined
– Clear syntax – read ontologies
– Formal semantics – understand (process) ontologies, to facilitate machine interpretation of that semantics
– Expressive enough to capture many ontologies
Compatible
– Easy mapping to/from other ontology languages
– Maximum compatibility with XML and RDF(S)

24 CS3352 Sem Web Research Issues
Ontology creation
– Millions of ontologies will be built
– Ontology Engineering is difficult and time-consuming
– Ontology Learning
– Scalable RDF Repositories (all is built on top of the same data model!)
Infrastructure
– Scalable reasoning services for different languages
– Resource-ID Management
– Versioning of ontologies and corresponding metadata

25 CS3352 Sem Web Research Issues
Metadata Management
– legacy data (HTML, XML, ...) -> legacy data migration:
– Annotation of Web documents (HTML, PDF, ...)
– Semi-automation using information extraction
– XML-Wrapper / Transformer
– Database Converter / Exporter
Maintenance of Metadata, ontologies and resources
– sources, ontologies, and metadata have to be maintained in a consistent way
– organizational process is needed
– tools are needed
– metadata have to reflect changes of the sources
– metadata have to reflect changes of the ontologies

26 CS3352 Selected Semantic Web Projects
COHSE
– http://inanna.ecs.soton.ac.uk/cohse/
Ontobroker
– http://ontobroker.aifb.uni-karlsruhe.de/
SHOE
– http://www.cs.umd.edu/projects/plus/SHOE/

