Resource Description Framework: A Tutorial
Nancy Ide, Vassar College, USA
EUROLAN 2003, July 28 - August 8, Bucharest - Romania

EUROLAN 2003 THE SEMANTIC WEB AND LANGUAGE TECHNOLOGY July 28 - August 8, Bucharest - Romania

Outline
  The Semantic Web: Where RDF fits in
  RDF overview
    Concepts
    Data Model
    RDF Syntax
  RDF Schema
  RDF, RDFS and language technology

Overview
What is the Semantic Web?
  “a conceptual information space in which resources identified by URIs can be processed by machines”
Relies on three key elements:
  identification of resources
  defining the semantics of resource descriptions and relationships among resources
  inferring new knowledge from available information
All of this must be done using common, machine-processable notations

Supporting Technologies
The Layer-cake model
  XML
  RDF
  RDF Schema
  Ontologies (OWL)
  Rules
  Logic Framework

The Base: XML
Provides a common syntax for marking up documents
Data model: ordered, labeled tree
Tree diagram
  element nodes: bookinfo, title, author, persName, roleName, foreName, surName, title, placeName
  text: The Royal Navy; Sir Edward Bulwer-Lytton, Baron Lytton of Knebworth

Why Do We Need RDF?
XML provides only impoverished semantics
What the human sees
  The Royal Navy
  Sir Edward Bulwer-Lytton, Baron Lytton of Knebworth
What the computer sees
  The Royal Navy type="pen name"> Sir Edward Bulwer-Lytton Baron Lytton of Knebworth

XML “semantics”
No agreement on
  structure
    what does nesting mean? Part-of? Something else?
    is bookInfo an object? class? attribute? relation? something else?
  vocabulary
    do both title elements mean the same thing?
    is author the same as creator?

RDF
Resource Description Framework
  Provides a machine-processable way to give meaning to information
  W3C Recommendation
  A data model for describing data about data (metadata)

RDF
Three object types
  Resources
    Things being described by RDF expressions. Resources are always named by URIs
    e.g., an HTML document, a specific XML element within the document source, a collection of pages, a book
  Properties
    A specific aspect, characteristic, attribute or relation used to describe a resource
    e.g., Creator, Title, Name
  Statements
    Resource (Subject) + Property (Predicate) + Property Value (Object)

The Data Model
RDF Statements
  Three parts: subject, predicate, object
  Describe properties of resources
Resource
  Anything that can be identified by a URI
  a document, part of a document, or image on the Web
  a real world object, e.g. a book: isbn://

URIs
Uniform Resource Identifier
  The generic set of all names/addresses consisting of short strings that refer to resources
  URLs (Uniform Resource Locators) are a particular type of URI, used on the WWW
  URIs look like URLs, sometimes with fragment identifiers to point at specific parts of a document

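The fragment-identifier point can be illustrated with a minimal sketch using Python's standard urllib.parse module. The URI below is hypothetical (example.org and the fragment name are placeholders, not taken from the tutorial).

```python
from urllib.parse import urldefrag

# Hypothetical URI: the host, path and fragment are placeholders.
uri = "http://example.org/papers/annotation.html#section2"

# urldefrag splits a URI into the part naming the whole document
# and the fragment identifier naming a specific part of it.
document, fragment = urldefrag(uri)

print(document)  # address of the document as a whole
print(fragment)  # identifier of the part being pointed at
```

Both the full URI and the fragment-free URI can serve as resource names in RDF; they simply name different things.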
RDF
Basic element is the triple
  a resource (the subject) is linked to another resource (the object) via an arc labeled by a relation (the predicate)
  equivalently: the subject has a property (the predicate) valued by the object
Example
  Nancy Ide --author-of--> Encoding Syntactic Annotation

Examples
Statements
  The English word “car” translates to the French word “voiture”
  The word “car” is a noun
  Nancy Ide is the author of “Encoding Syntactic Annotation”

SUBJECT      PREDICATE       OBJECT
CAR          translates-to   voiture
CAR          is-a            noun
Nancy Ide    author-of       Encoding Syntactic Annotation

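The three example statements can be written down directly as subject/predicate/object records. This is a minimal sketch, assuming plain strings stand in for the URIs that real RDF would use; the Triple type and the objects_of helper are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; plain labels stand in for URIs here."""
    subject: str
    predicate: str
    object: str


statements = [
    Triple("CAR", "translates-to", "voiture"),
    Triple("CAR", "is-a", "noun"),
    Triple("Nancy Ide", "author-of", "Encoding Syntactic Annotation"),
]


def objects_of(subject, predicate, triples):
    """Return every value the given subject has for the given property."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_of("CAR", "is-a", statements))
```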
RDF Triples
The subject of one statement can be the object of another statement
RESULT: a labeled directed graph
Diagram: Nancy Ide, Encoding Syntactic Annotation and Vassar College, connected by arcs labeled author-of and employee

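A minimal sketch of how chained triples form a labeled directed graph. The direction of the employee arc (from Vassar College to Nancy Ide) is an assumption, since the diagram's arrowheads are not recoverable; the reachable helper is an illustrative name.

```python
from collections import defaultdict

triples = [
    ("Nancy Ide", "author-of", "Encoding Syntactic Annotation"),
    # Assumed direction: the college has Nancy Ide as an employee.
    ("Vassar College", "employee", "Nancy Ide"),
]

# Adjacency list: each subject maps to its outgoing (label, target) arcs.
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def reachable(start, graph):
    """Collect every node that can be reached by following arcs from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _label, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(reachable("Vassar College", graph))
```

Because Nancy Ide is the object of one statement and the subject of another, the paper becomes reachable from the college in two steps.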
RDF Syntax
One syntax for expressing RDF statements is XML
Tags and attributes have a specific meaning
  Description element describes a resource
  every attribute or nested element inside a Description is a property of that resource
Example: a Description with the property value “Encoding Syntactic Annotation”
Does this solve the structure and vocabulary problems?

RDF/XML Syntax is Just a Syntax
Different ways to express the same model
  Encoding Syntactic Annotation
  Encoding Syntactic Annotation
  <Description about=" author-of="Encoding Syntactic Annotation"/>

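A minimal sketch of the "same model, different syntax" point: one Description carries the property as a nested element, the other as an attribute, and both reduce to the same triple. The about URI is hypothetical, and triples_from is an illustrative helper built on Python's xml.etree.ElementTree; it is not a conforming RDF/XML parser.

```python
import xml.etree.ElementTree as ET

ABOUT = "http://example.org/people/NancyIde"  # hypothetical resource URI

# The property written as a nested element.
nested_form = f"""
<Description about="{ABOUT}">
  <author-of>Encoding Syntactic Annotation</author-of>
</Description>
"""

# The same property written as an attribute.
attribute_form = f"""
<Description about="{ABOUT}" author-of="Encoding Syntactic Annotation"/>
"""


def triples_from(xml_text):
    """Read one Description element and return its statements as a set."""
    description = ET.fromstring(xml_text.strip())
    subject = description.attrib["about"]
    found = set()
    # Every attribute other than 'about' is a property of the resource.
    for name, value in description.attrib.items():
        if name != "about":
            found.add((subject, name, value))
    # Every nested element is a property of the resource as well.
    for child in description:
        found.add((subject, child.tag, (child.text or "").strip()))
    return found


print(triples_from(nested_form) == triples_from(attribute_form))
```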
Namespaces
Use namespaces to indicate where the defining RDF schema exists
  <rdf:RDF xmlns:rdf=" xmlns:vassar=" xmlns:biblio="
  Encoding Syntactic Annotation

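A minimal sketch of what a namespace declaration does: a prefixed name such as biblio:author-of is shorthand for a full URI pointing at the defining vocabulary. The RDF namespace URI below is the standard W3C one; the biblio namespace and the resource URI are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
BIBLIO_NS = "http://example.org/schemas/biblio#"        # hypothetical schema location

doc = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:biblio="{BIBLIO_NS}">
  <rdf:Description rdf:about="http://example.org/people/NancyIde">
    <biblio:author-of>Encoding Syntactic Annotation</biblio:author-of>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(doc.strip())
description = root[0]
prop = description[0]

# ElementTree stores qualified names as "{namespace-uri}local-name",
# so the prefix is already resolved to the URI it was declared with.
namespace, local_name = prop.tag[1:].split("}")

print(namespace)
print(local_name)
```

Two documents that pick different prefixes for the same namespace URI therefore end up using the same property name.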
What is RDF Used For?
Make explicit statements about web resources
The computer knows that these are statements, knows how the statements relate, and can compare values
But... we still lack a way to define a vocabulary
  Should we use author or creator?
  Is Nancy Ide an author?
  Are there other authors?
  What properties can authors have?

RDF and RDFS
RDF is a data model that allows you to assert relations between two objects
RDFS (RDF Schema) is a means to define classes and sub-classes of objects and the relations that may hold between these objects

RDF Schema
RDF provides a data model for metadata annotation and a way to express it in XML, but it cannot define the vocabulary for a domain
RDF Schema allows you to define vocabulary terms and the relations between these terms
  Adds semantics to RDF predicates and resources
  Defines how a term should be interpreted by specifying its properties and the kinds of objects that can be the values of these properties

Some RDF Schema Terminology
RDF Schema core primitives
  Class, Property
  type, subClassOf, domain, range
Vocabulary definition with these primitives:
These are just RDF statements, but in RDF Schema they have special meaning

The semantics of RDF Schema
The semantics of RDF Schema are expressed in natural language:
rdfs:subClassOf
  “This property specifies a subset/superset relation between classes. The rdfs:subClassOf property is transitive. If class A is a subclass of some broader class B, and B is a subclass of C, then A is also implicitly a subclass of C. Consequently, resources that are instances of class A will also be instances of class C, since A is a subset of both B and C. Only instances of rdfs:Class can have the rdfs:subClassOf property and the property value is always of rdf:type rdfs:Class. A class may be a subclass of more than one class.”

RDF Model Theory
Set-theoretical semantics for RDF and RDFS
Specifies entailment rules, for example:
  [rdfs7b] (reflexivity)
    (xxx, rdf:type, rdfs:Class) => (xxx, rdfs:subClassOf, xxx)
  [rdfs8] (transitivity)
    (xxx, rdfs:subClassOf, yyy) & (yyy, rdfs:subClassOf, zzz) => (xxx, rdfs:subClassOf, zzz)

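The two entailment rules can be sketched as a small fixed-point computation: apply both rules repeatedly until no new triple appears. This is a minimal illustration covering only rdfs7b and rdfs8 as stated above; the class names in the sample data and the rdfs_closure function are assumptions made for the example.

```python
TYPE, CLASS, SUBCLASS = "rdf:type", "rdfs:Class", "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rdfs7b and rdfs8 until no new statements can be derived."""
    closure = set(triples)
    while True:
        derived = set()
        # [rdfs7b] reflexivity: every class is a subclass of itself.
        for s, p, o in closure:
            if p == TYPE and o == CLASS:
                derived.add((s, SUBCLASS, s))
        # [rdfs8] transitivity: chain two subClassOf statements.
        pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for x, y in pairs:
            for y2, z in pairs:
                if y == y2:
                    derived.add((x, SUBCLASS, z))
        if derived <= closure:  # nothing new: fixed point reached
            return closure
        closure |= derived


# Sample data; the class names are illustrative.
asserted = {
    ("CommonNoun", TYPE, CLASS),
    ("CommonNoun", SUBCLASS, "Noun"),
    ("Noun", SUBCLASS, "PartOfSpeech"),
}
entailed = rdfs_closure(asserted)
print(sorted(entailed - asserted))
```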
Example RDF Schema
Diagram
  Ontology Level: classes Part-of-Speech, Noun, Verb, Common Noun and Motion Verb, linked by sub-class of arcs; property Subject-of with domain and range arcs
  Data Level: Dogs Subject-of run, linked to the Ontology Level by type arcs

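A minimal sketch of the two levels in the diagram. It assumes that Subject-of has domain Noun and range Verb, and that Dogs and run are typed as Common Noun and Motion Verb; the arc endpoints are not recoverable from the slide, so these are assumptions. The check also treats domain and range as constraints to verify, which is a simplification: in RDFS semantics they are used to infer types rather than to reject data.

```python
# Ontology level (assumed arc endpoints, see lead-in).
subclass_of = {
    "Noun": "Part-of-Speech",
    "Verb": "Part-of-Speech",
    "Common Noun": "Noun",
    "Motion Verb": "Verb",
}
prop_domain = {"Subject-of": "Noun"}
prop_range = {"Subject-of": "Verb"}

# Data level: each instance and its asserted type.
instance_type = {"Dogs": "Common Noun", "run": "Motion Verb"}


def is_instance_of(resource, cls):
    """Walk up the sub-class chain from the resource's asserted type."""
    current = instance_type.get(resource)
    while current is not None:
        if current == cls:
            return True
        current = subclass_of.get(current)
    return False


def fits_schema(subject, predicate, obj):
    """Check a data-level statement against the property's domain and range."""
    return (is_instance_of(subject, prop_domain[predicate])
            and is_instance_of(obj, prop_range[predicate]))


print(fits_schema("Dogs", "Subject-of", "run"))
print(fits_schema("run", "Subject-of", "Dogs"))
```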
Diagram
  Language Level: Resource, Class and Property, linked by sub-class of arcs
  Ontology Level: classes Part-of-Speech, Noun, Verb, Common Noun and Motion Verb, linked by sub-class of arcs; property Subject-of with domain and range arcs

Observations
Classes and properties are modeled separately!
  Different from typical Object-Oriented modeling, where properties (attributes) are part of a class
  Because of this, domain/range statements are very restrictive
Remember: RDF Schema is just RDF, but with some added meaning given to particular terms

Domain Restrictions
Diagram: classes Part-of-Speech, Noun, Verb, Common Noun and Motion Verb; property Gender with a domain arc; instances chat and bouge, each with Gender value M
“M” is a literal value

Problem solved...
Diagram: Part-of-Speech with sub-classes Noun and Verb; the Gender domain arc now points to Part-of-Speech
Moving the domain restriction up the hierarchy solves the problem
But this risks over-generalization
  properties get “loose” restrictions
  classes may be allowed properties they should not have
  e.g. now any part of speech has the GENDER property

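The trade-off can be sketched with the RDFS domain entailment rule (rdfs2 in the RDF Semantics): if a property has domain X, any resource that uses the property is inferred to be of type X. The sketch assumes the original restriction placed Gender's domain on Noun and that bouge is a verb; the slide's diagram does not make either explicit. The function name is illustrative.

```python
def infer_types_from_domain(data, domains):
    """rdfs2: (p, rdfs:domain, X) and (s, p, o) entail (s, rdf:type, X)."""
    return {(s, "rdf:type", domains[p]) for s, p, _value in data if p in domains}


# "M" is a literal value; chat is a noun and bouge a verb (assumed).
data = [("chat", "Gender", "M"), ("bouge", "Gender", "M")]

# Narrow restriction: the verb bouge is wrongly inferred to be a Noun.
narrow = infer_types_from_domain(data, {"Gender": "Noun"})

# Restriction moved up the hierarchy: the inference is now harmless,
# but every part of speech is allowed to carry Gender.
broad = infer_types_from_domain(data, {"Gender": "Part-of-Speech"})

print(sorted(narrow))
print(sorted(broad))
```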
RDF Schema Syntax
Property Definition
  <rdfs:domain rdf:resource="
Class Definitions
  Noun
  Class for nouns
  <rdfs:subClassOf rdf:resource="
  POS
  Class for the general category part of speech

Putting It All Together
The schema file:
  <rdf:RDF xmlns:rdf=" xmlns:rdfs="
  <rdfs:domain rdf:resource="
  Noun
  Class for nouns
  <rdfs:subClassOf rdf:resource="
  POS
  Class for the general category part of speech

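A minimal sketch of what a complete schema file along these lines might look like, and how the class hierarchy and property domain can be read back out of it. This is an illustrative reconstruction, not the tutorial's file: the property name gender, the use of rdf:ID with local references such as #POS, and the overall layout are assumptions. The rdf and rdfs namespace URIs are the standard W3C ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical schema file: two classes and one property.
schema_xml = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="POS">
    <rdfs:label>POS</rdfs:label>
    <rdfs:comment>Class for the general category part of speech</rdfs:comment>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Noun">
    <rdfs:label>Noun</rdfs:label>
    <rdfs:comment>Class for nouns</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#POS"/>
  </rdfs:Class>
  <rdf:Property rdf:ID="gender">
    <rdfs:domain rdf:resource="#Noun"/>
  </rdf:Property>
</rdf:RDF>
"""

ID = "{" + RDF + "}ID"
RESOURCE = "{" + RDF + "}resource"

root = ET.fromstring(schema_xml.strip())

# Class definitions: record each class's direct superclass, if any.
superclass_of = {}
for cls in root.findall("{" + RDFS + "}Class"):
    parent = cls.find("{" + RDFS + "}subClassOf")
    if parent is not None:
        superclass_of[cls.get(ID)] = parent.get(RESOURCE).lstrip("#")

# Property definitions: record each property's domain, if any.
domain_of = {}
for prop in root.findall("{" + RDF + "}Property"):
    domain = prop.find("{" + RDFS + "}domain")
    if domain is not None:
        domain_of[prop.get(ID)] = domain.get(RESOURCE).lstrip("#")

print(superclass_of)
print(domain_of)
```

Note that the resource attribute belongs to the rdf namespace even when it appears on an rdfs element such as rdfs:subClassOf.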
Using the Schema
  <rdf:RDF xmlns:rdf=" xmlns:pos="

Defining a Default Namespace
  <rdf:RDF xmlns:rdf=" xmlns="

Referring To Another Resource
  <rdf:RDF xmlns:rdf=" xmlns="

Creating Pre-defined Linguistic Objects
One possible use of RDF is to pre-define “linguistic objects” that can be used by other resources such as lexicons, taggers, etc.
An RDF schema defines a class and its properties, but does not instantiate objects of that class
  in previous examples, “dogs” and “run” were instantiated as objects of class Noun

A “Data Category” Definition
File:
  <rdf:RDF xmlns:rdf=" xmlns="

Using the Definition
  <rdf:RDF xmlns:rdf=" xmlns:ling="
Additions to the linguistics schema.rdf
  <rdf:RDF xmlns:rdf=" xmlns:rdfs="
  Word
  Class for a word
  <rdfs:domain rdf:resource="

Beyond RDF and RDFS
RDF and RDFS give us the capability to provide some semantics for resources and the relations between them
But there is a lot missing
  boolean operators, cardinality constraints, disjunction, etc.
These are in the next level: OWL

The Semantic Web and Language Technology
The previous examples suggest how the Semantic Web can benefit language technology
Resources
  Pre-defined linguistic objects can be used in lexicons, term banks, annotations, etc.
  Goes toward a commonly agreed-upon set of categories
Language Processing applications
  Can exploit linguistic knowledge “attached” to data to enhance capability

Resources and Tools
  W3C RDF Model and Syntax Specification
  W3C RDF Schema Specification
  W3C RDF Validation Service
  W3C RDF List of RDF resources
  SiRPAC - Simple RDF Parser & Compiler (Java)
  Libwww - RDF Parser (C)