Computer Science Department Brigham Young University CS652 – Spring 2004 Yihong Ding XML, RDF, and OWL The Derivation of Web Ontology Language
2 Acknowledgments This presentation uses several researchers’ previous examples. Special thanks to Roger L. Costello and David B. Jacobs at MITRE Corporation, Hamish Cunningham and Kalina Bontcheva at the University of Sheffield, David De Roure of the GGF Semantic Grid Research Group, and one anonymous researcher who provided an excellent explanation of RDF syntax.
3 The Holy Grail Hamish Cunningham and Kalina Bontcheva, Ontology-Aware Information Extraction, 2002
4 Semantic Web Wedding Cake
5 XML: document = labeled tree course teacher title students name http = XML Schema: grammars for describing legal trees and datatypes Why not use XML to represent semantics?
6 Syntax and Semantics Syntax: structure of the data Semantics: meaning of the data Two conditions necessary for interoperability: Adopt a common syntax: this enables applications to parse the data. Adopt a means for understanding the semantics: this enables applications to use the data.
7 Can XML represent semantics? … title: a heading that names a statute or legislative bill. title: the name of a work of art or literary composition etc. title: a general or descriptive heading for a section of a written work. title: the status of being a champion. title: a legal document signed and sealed and delivered to effect a transfer of property and to show the legal right to possess it … (from WordNet)
8 XML: limitations for semantic markup XML makes no commitment on: Domain-specific ontological vocabulary Ontological modeling primitives Requires pre-arranged agreement on & Only feasible for closed collaboration agents in a small & stable community pages on a small & stable intranet Not suited for sharing Web-resources
9 What is the purpose of RDF? The purpose of RDF (Resource Description Framework) is to give a standard way of specifying data "about" something. Here's an example of an XML document that specifies data about China's Yangtze river : <River id="Yangtze" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea "Here is data about the Yangtze River. It has a length of 6300 kilometers. Its startingLocation is western China's Qinghai-Tibet Plateau. Its endingLocation is the East China Sea."
10 From XML to RDF <River id="Yangtze" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea XML <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea RDF Yangtze.xml Yangtze.rdf "convert to"
11 Internationalized Resource Identifier (IRI) <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea RDF provides an ID attribute for identifying the resource being described. The ID attribute is in the RDF namespace. Add the "fragment identifier symbol" to the namespace
12 Namespaces Newest version: W3C Recommendation of February 4, 2004 (Namespaces in XML 1.1) A simple method for qualifying element and attribute names used in XML documents Identified by IRI references
13 RDF Namespace ID about type resource Description
14 RDF Framework Model Property TypeValue Property RDF Description Resource IRI
15 The RDF Format <Class rdf:ID="Resource" xmlns:rdf=" xmlns="uri"> value...
16 More Interpretation <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea Identifies the type (class) of the resource being described. Identifies the resource being described. This resource is an instance of River. These are properties, or attributes, of the type (class). Values of the properties
17 Uniquely Identify the Resource RDF is very concerned about uniquely identifying the type (class) and the properties. RDF is also very concerned about uniquely identifying the resource, e.g., <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea This is the resource being described. We want to uniquely identify this resource.
18 rdf:ID The value of rdf:ID is a "relative URI". The "complete URI" is obtained by concatenating the URL of the XML document with "#" and then the value of rdf:ID, e.g., <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea Suppose that this RDF/XML document is located at this URL: Thus, the complete URI for this resource is: Yangtze.rdf
19 xml:base By default, the URL of the document provides the base URI. Depending on the location of the document is brittle: it will break if the document is moved or copied to another location. A more robust solution is to specify the base URI in the document, e.g., <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" xml:base=" kilometers western China's Qinghai-Tibet Plateau East China Sea Resource URI = concatenation(xml:base, '#', rdf:ID) = concatenation( '#', "Yangtze") =
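The concatenation rule on this slide can be sketched in a few lines of Python. The base URI `http://example.org/rivers` and the helper name `resource_uri` are illustrative assumptions; the base URI used on the original slide is not recoverable from this transcript.

```python
def resource_uri(base, rdf_id):
    """Build the complete URI for an rdf:ID value.

    Follows the slide's rule: base URI + '#' + rdf:ID.
    Any fragment already present on the base is dropped first,
    as in ordinary relative-URI resolution.
    """
    base_without_fragment = base.split("#", 1)[0]
    return base_without_fragment + "#" + rdf_id


# Hypothetical base URI, standing in for the xml:base value in the document.
BASE = "http://example.org/rivers"
yangtze = resource_uri(BASE, "Yangtze")
```

Because the base is passed in explicitly, the resulting URI stays the same wherever the document is stored, which is the robustness argument made for xml:base.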
20 rdf:about Instead of identifying a resource with a relative URI (which then requires a base URI to be prepended), we can give the complete identity of a resource. However, we use rdf:about, rather than rdf:ID, e.g., <River rdf:about=" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea
21 rdf:Description + rdf:type There is another way of representing the XML. This way makes it very clear that you are describing something, and it makes it very clear what the type (class) is of the thing you are describing: <rdf:Description rdf:about=" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea
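To illustrate that the typed form (`<River rdf:about=...>`) and the `rdf:Description` + `rdf:type` form say the same thing, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The RDF namespace URI is the standard one; the vocabulary namespace `http://example.org/water#`, the resource URI, and the helper names are assumptions made for this example, since the URIs on the original slide are not recoverable. This is not a full RDF/XML parser: it handles a single node element and does not distinguish URI values from literal values.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = RDF_NS + "type"
EX = "http://example.org/water#"               # hypothetical vocabulary namespace
YANGTZE = "http://example.org/rivers#Yangtze"  # hypothetical resource URI

TYPED_FORM = """<River rdf:about="http://example.org/rivers#Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://example.org/water#">
  <length>6300 kilometers</length>
  <endingLocation>East China Sea</endingLocation>
</River>"""

DESCRIPTION_FORM = """<rdf:Description rdf:about="http://example.org/rivers#Yangtze"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://example.org/water#">
  <rdf:type rdf:resource="http://example.org/water#River"/>
  <length>6300 kilometers</length>
  <endingLocation>East China Sea</endingLocation>
</rdf:Description>"""


def clark_to_uri(tag):
    """Turn ElementTree's '{namespace}local' tag into namespace + local."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def node_to_triples(xml_text):
    """Read one rdf:about node element into a set of (s, p, o) triples."""
    node = ET.fromstring(xml_text)
    subject = node.get("{%s}about" % RDF_NS)
    triples = set()
    # A typed node element states rdf:type through its element name.
    if node.tag != "{%s}Description" % RDF_NS:
        triples.add((subject, RDF_TYPE, clark_to_uri(node.tag)))
    for child in node:
        # Property values are either linked (rdf:resource) or in-line text.
        resource = child.get("{%s}resource" % RDF_NS)
        value = resource if resource is not None else (child.text or "").strip()
        triples.add((subject, clark_to_uri(child.tag), value))
    return triples
```

Under these assumptions, both documents yield the same three triples, including the `rdf:type` statement that the typed form expresses only through its element name.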
22 RDF Triple Model RDF “statements” consist of resources (= nodes) which have properties which have values (= nodes,strings) “6300 kilometers” = subject = predicate = object “ has a of 6300 kilometers ” resource value property
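A statement in the triple model can be held as a plain (subject, predicate, object) record. The sketch below uses a Python `namedtuple`; the two URIs are hypothetical stand-ins, since the ones on the slide are missing from this transcript.

```python
from collections import namedtuple

# One RDF statement: resource (subject), property (predicate), value (object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

EX = "http://example.org/water#"  # hypothetical vocabulary namespace

statement = Triple(
    subject="http://example.org/rivers#Yangtze",  # hypothetical resource URI
    predicate=EX + "length",
    object="6300 kilometers",
)


def describe(triple):
    """Render a triple in the slide's '<resource> has a <property> of <value>' pattern."""
    return "%s has a %s of %s" % (triple.subject, triple.predicate, triple.object)
```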
23 RDF Graph Model “6300 Kilometers” “western China's Qinghai-Tibet Plateau” “East China Sea”
24 Naming Convention The convention is to use a capital letter to start a type (class) name, and use a lowercase letter to start a property name. This helps the eye quickly discern the striping pattern. <River rdf:about=" xmlns:rdf=" xmlns=" kilometers western China's Qinghai-Tibet Plateau East China Sea uppercase lowercase
25 Complex Values RDF/XML can also represent graphs that include nodes that have no IRIrefs, i.e., blank nodes. Syntactically, values can be embedded (i.e., lexically in-line) or referenced (linked). …:location …:starting …:ending “western China's Qinghai-Tibet Plateau” “East China Sea”
26 Complex Values (RDF code) …:location …:starting …:ending “western China's Qinghai-Tibet Plateau” “East China Sea” <rdf:Description rdf:about=" xmlns:rdf=" xmlns=" western China's Qinghai-Tibet Plateau East China Sea
27 rdf:ID versus rdf:about When should rdf:ID be used? When should rdf:about be used? When you want to introduce a resource and provide an initial set of information about it, use rdf:ID. When you want to extend the information about a resource, use rdf:about. The RDF philosophy is akin to the Web philosophy. That is, anyone, anywhere, anytime can provide information about a resource.
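One way to picture "anyone can extend a resource" is that statements from different documents simply accumulate under the same URI. The following is a minimal sketch, assuming triples are stored as Python tuples; the URIs and the function names `merge_descriptions` and `properties_of` are invented for this example.

```python
def merge_descriptions(*graphs):
    """Union several collections of (subject, predicate, object) triples."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def properties_of(graph, subject):
    """All (predicate, object) pairs stated about one subject."""
    return {(p, o) for (s, p, o) in graph if s == subject}


YANGTZE = "http://example.org/rivers#Yangtze"  # hypothetical resource URI
EX = "http://example.org/water#"               # hypothetical vocabulary namespace

# The document that introduces the resource (it would use rdf:ID="Yangtze").
introducing_doc = {(YANGTZE, EX + "length", "6300 kilometers")}
# A later document that extends it (it would use rdf:about with the full URI).
extending_doc = {(YANGTZE, EX + "endingLocation", "East China Sea")}

combined = merge_descriptions(introducing_doc, extending_doc)
```

Both documents contribute to one description because they agree on the resource's URI, which is why unique identification matters so much in RDF.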
28 RDF Description Resource 1 Resource 2 Resource 3 PropertyType1 PropertyType3 PropertyType2 PropertyType4 “Atomic Value”
29 RDF Parser There is a nice RDF validation Web service at the W3C Web site, which will tell you whether your XML is in the proper RDF format.
30 Notes on using the RDF Format Constrained: the RDF format constrains how you design your XML (i.e., you can't design your XML in an arbitrary fashion). RDF uses namespaces to uniquely identify types (classes), properties, and resources. Thus, you must have a solid understanding of namespaces. Another XML vocabulary to learn: to use the RDF format you must learn the RDF vocabulary.
31 Two Main Areas of RDF RDF Schema RDF Syntax RDF XML
32 RDF Schema (RDFS) Defines small vocabulary for RDF: Class, subClassOf, type Property, subPropertyOf domain, range Vocabulary can be used to define other vocabularies for your application domain The benefit of an RDFS is that it facilitates inferences on your data, and enhanced searching. Person Student Researcher subClassOf Jeen type hasSupervisor domain range Frank type hasSupervisor
33 Ocean Lake BodyOfWater River Stream Properties: length: Literal emptiesInto: BodyOfWater Sea NaturallyOccurringWaterSource TributaryBrook Inference Engine Inferences: - Yangtze is a Stream - Yangtze is a NaturallyOccurringWaterSource - is a BodyOfWater Yangtze.rdf Rivulet <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers
34 Ocean Lake BodyOfWater River Stream Properties: length: Literal emptiesInto: BodyOfWater Sea NaturallyOccurringWaterSource TributaryBrook <River rdf:ID="Yangtze" xmlns:rdf=" xmlns=" kilometers Search Engine Results: - Yangtze is a Stream, so this document is relevant to the query. "Show me all documents that contain info about Streams" Yangtze.rdf Rivulet
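The inference and search behaviour on these two slides can be sketched as a walk up the subClassOf hierarchy. Only the two subclass links that the deck states explicitly (River under Stream, Stream under NaturallyOccurringWaterSource) are encoded here; the dictionary layout and function names are assumptions for this example, not a real inference engine.

```python
# subClassOf links stated in the deck: River -> Stream -> NaturallyOccurringWaterSource
SUBCLASS_OF = {
    "River": ["Stream"],
    "Stream": ["NaturallyOccurringWaterSource"],
}


def superclasses(cls, subclass_of):
    """All classes reachable from cls by following subClassOf links."""
    found = []
    stack = list(subclass_of.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(subclass_of.get(current, []))
    return found


def inferred_types(declared_class, subclass_of):
    """The declared class plus every class it is a subclass of."""
    return [declared_class] + superclasses(declared_class, subclass_of)


def search(documents, wanted_class, subclass_of):
    """Names of documents whose resource is, by inference, a wanted_class."""
    return sorted(
        name
        for name, (resource, declared_class) in documents.items()
        if wanted_class in inferred_types(declared_class, subclass_of)
    )


# Yangtze.rdf declares the resource Yangtze to be a River.
DOCUMENTS = {"Yangtze.rdf": ("Yangtze", "River")}
```

A query for "Stream" matches Yangtze.rdf even though the document never mentions Stream, because River is a subclass of Stream in the schema.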
35 RDF Schema is all about defining taxonomies (class hierarchies) <rdf:RDF xmlns:rdf=" xmlns:rdfs=" xml:base=" This is read as: "I hereby define a River Class. River is a subClassOf Stream." "I hereby define a Stream Class. Stream is a subClassOf NaturallyOccurringWaterSource."... NaturallyOccurringWaterSource.rdfs (snippet) All classes and properties are defined within rdf:RDF Defines the River class Defines the Stream class Since the Stream class is defined in the same document we can reference it using a fragment identifier. 1 2 Assigns a namespace to the taxonomy! 3 4 5
36 rdfs:Class This type is used to define a class. The rdf:ID provides a name for the class. The contents are used to indicate the members of the class. The contents are ANDed together. Name of the class ANDed
37 rdfs:subClassOf Stream River This represents the set of Streams, i.e., the set of instances of type Stream. This represents the set of Rivers, i.e., the set of instances of type River.
38 Multiple rdfs:subClassOf Properties Stream River SedimentContainer - a River is both a Stream and a SedimentContainer. The conjunction (AND) of two subClassOf statements is a subset of the intersection of the classes.
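The "subset of the intersection" reading can be checked directly with Python sets. The instance names other than Yangtze are made up for this illustration.

```python
# Hypothetical class extensions (sets of instances).
streams = {"Yangtze", "Mill Brook"}
sediment_containers = {"Yangtze", "Settling Tank"}
rivers = {"Yangtze"}


def respects_subclass_axioms(members, *superclass_extensions):
    """True if every member lies in the intersection of all superclasses."""
    required = set.intersection(*superclass_extensions)
    return members <= required
```

Here `rivers` passes because Yangtze is in both superclasses, while a set containing "Mill Brook" would fail because it is a Stream but not a SedimentContainer.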
39 rdf:Property This type is used to define a property. The rdf:ID provides a name for the property. The contents are used to indicate the usage of the property. The contents are ANDed together. Name of the property ANDed
40 Example of multiple rdfs:range BodyOfWater range CoastalWater - the value of emptiesInto is a BodyOfWater and a CoastalWater.
41 Example of multiple rdfs:domain River domain Vessel - emptiesInto is to be used in instances that are of type River and Vessel.
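In RDFS, multiple domain and range statements are read conjunctively, and they license inferences about the subject and value of any statement that uses the property. A minimal sketch, assuming plain class and resource names instead of URIs, and combining the domain and range examples from these two slides:

```python
# Multiple rdfs:domain / rdfs:range values are ANDed together.
DOMAIN = {"emptiesInto": ["River", "Vessel"]}
RANGE = {"emptiesInto": ["BodyOfWater", "CoastalWater"]}


def infer_types(triple, domain=DOMAIN, range_=RANGE):
    """Return (resource, class) pairs implied by a property's domain and range."""
    subject, predicate, value = triple
    inferred = set()
    for cls in domain.get(predicate, []):
        inferred.add((subject, cls))   # the subject belongs to every domain class
    for cls in range_.get(predicate, []):
        inferred.add((value, cls))     # the value belongs to every range class
    return inferred


# "EastChinaSea" is a placeholder name for the value resource.
result = infer_types(("Yangtze", "emptiesInto", "EastChinaSea"))
```

A single `emptiesInto` statement therefore yields four type statements: the subject is both a River and a Vessel, and the value is both a BodyOfWater and a CoastalWater.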
42 Class and Property: different namespaces Class is in the rdfs namespace. Property is in the rdf namespace.
43 Properties are defined separately from classes RDF Schema approach is to define a class, and then separately define properties and state that they are to be used with the class. The advantage of this approach is that anyone, anywhere, anytime can create a property and state that it is usable with the class!
44 Problems Equivalent classes Cardinality constraints More … no precisely described meaning no inference model
45 Beyond RDF: & OIL (Ontology Inference Layer) extends RDF Schema to a fully-fledged knowledge representation language. logical expressions data-typing cardinality quantifiers DAML (DARPA Agent Markup Language) = US sister of OIL Merged as DAML+OIL in 2001 Became OWL, a W3C Recommendation as of February 10, 2004
46 DARPA’s DAML/ W3C’s OWL Language Web Languages RDF/S XML DAML-ONT Formal Foundations Description Logics FACT, CLASSIC, DLP, … Frame Systems DAML+OIL (OWL) OIL
47 OWL Web Ontology Language OWL
48 OWL cannot be a simple semantic extension of RDF/S Relationship between layers: Syntactically no restriction; Semantically preserve meanings. Russell’s paradox: A very large collection of built-in sets. These built-in sets include the set consisting of those sets that do not contain themselves. Is this set a member of itself? Yes? It contains itself, so no. No? It does not contain itself, so yes. This violates the very principle of set theory: set membership should be a well-defined relationship.
49 OWL cannot be a simple semantic extension of RDF/S If OWL were layered on top of RDF/S as a same-syntax extension, there would have to be a large collection of built-in classes in any model for the logical foundations of classes in the extension to work correctly. This collection includes the class that is defined as those resources that do not belong to the class: Russell’s paradox. RDF/S does not fall into this paradox because it does not need a large collection of built-in classes. The RDFS theory of classes and properties is very weak: it is not possible to give a class a formula or determine which resources belong to it. OWL is designed to allow for defined classes and more relationships between classes. This richer theory clashes with the underlying principle of RDF/S.
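In symbols, the paradoxical set described on this slide is the standard Russell set. The name $R$ is introduced here only for illustration:

```latex
R = \{\, x \mid x \notin x \,\}
\qquad\Longrightarrow\qquad
R \in R \iff R \notin R
```

Either answer to "is $R$ a member of itself?" implies the opposite answer, so membership in $R$ is not well defined.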
50 OWL Extends RDF RDF-schema Class, subclass Property, subproperty + Restrictions Range, domain Local, global Existential Cardinality + Combinators Union, Intersection Complement Symmetric, transitive + Mapping Equivalence Inverse
51 Again! What is an Ontology? An ontology answers questions that are implicit in your data. ABCD ZXYZXY How many guns can have this serial number? How many people can have this driver's license number? Can this gun be registered in other gun licenses? How many guns/people are registered in a gun license?
52 Gun License Ontology answers the Questions! ABCD ZXYZXY Only one gun can have this serial number. Only one person can have this driver's license number. A gun can be registered in only one gun license. A gun license registers one gun to one person
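"Only one gun can have this serial number" is the kind of statement OWL expresses with an inverse functional property. The sketch below checks instance data against that rule; the identifiers (`gun1`, `SN-001`, and so on) and the function name are invented for this example. Note one simplification: an OWL reasoner would typically conclude that two resources sharing a serial number are the same individual, whereas this sketch just reports them.

```python
from collections import defaultdict


def inverse_functional_violations(triples, prop):
    """Return each value of prop that is shared by more than one subject."""
    owners = defaultdict(set)
    for subject, predicate, value in triples:
        if predicate == prop:
            owners[value].add(subject)
    return {
        value: sorted(subjects)
        for value, subjects in owners.items()
        if len(subjects) > 1
    }


# Hypothetical instance data: two guns claim the same serial number.
DATA = [
    ("gun1", "serialNumber", "SN-001"),
    ("gun2", "serialNumber", "SN-001"),
    ("gun3", "serialNumber", "SN-002"),
]
```

The same check applied to a driver's license number property would cover the "only one person can have this driver's license number" rule.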
53 Ontologies vs. Markups Ontologies contain “persistent” information Markups – data about specific instances of classes and properties E.g., general knowledge about the class River (ontology) vs. data about specific river in a country (markup) OWL does not enforce this separation
54 Ontologies vs. Markups <River rdf:ID="Yangtze" xmlns:rdf=" xmlns="
55 Properties
56 OWL Full, OWL DL, and OWL Lite OWL Full OWL DL OWL Lite Description Logics provide a careful balance between expressivity and computational complexity OWL provides sublanguages with reduced expressivity and computational complexity
57 Language Constructs: OWL Lite Class rdf:Property rdfs:subClassOf rdfs:subPropertyOf rdfs:domain rdfs:range Individual equivalentClass equivalentProperty sameIndividualAs differentFrom allDifferent inverseOf TransitiveProperty SymmetricProperty FunctionalProperty InverseFunctionalProperty allValuesFrom someValuesFrom minCardinality (only 0 or 1) maxCardinality (only 0 or 1) cardinality (only 0 or 1) intersectionOf Imports priorVersion …more
58 Language Constructs: DL & Full oneOf disjointWith equivalentClass (applied to class expressions) rdfs:subClassOf (applied to class expressions) unionOf intersectionOf complementOf Arbitrary Cardinality minCardinality maxCardinality cardinality hasValue
59 More OWL example
60 More OWL example
61 Differences OWL Lite Supports classification hierarchies and simple constraint features. Tool support is simple. Provides a quick migration path for thesauri and other taxonomies. Supports cardinality constraints, but only with values 0 or 1. OWL DL Supports maximum expressiveness without losing computational completeness and decidability of reasoning systems. Supports the existing Description Logic business segment. A class cannot also be an individual or property; a property cannot also be an individual or class. OWL Full Maximum expressiveness and the syntactic freedom of RDF with no computational guarantees. Allows an ontology to augment the meaning of the pre-defined (RDF or OWL) vocabulary. A class can be treated simultaneously as a collection of individuals and as an individual in its own right.
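Two of the differences listed on this slide are easy to state as checks. The sketch below is a deliberately simplified illustration with made-up function names; a real species validator examines many more conditions.

```python
def lite_cardinality_ok(cardinalities):
    """OWL Lite rule from the slide: cardinality values may only be 0 or 1."""
    return all(n in (0, 1) for n in cardinalities)


def dl_vocabulary_separated(classes, properties, individuals):
    """OWL DL rule from the slide: no name is shared between classes,
    properties, and individuals."""
    classes, properties, individuals = set(classes), set(properties), set(individuals)
    overlap = (classes & properties) | (classes & individuals) | (properties & individuals)
    return not overlap
```

An ontology that uses `River` both as a class and as an individual fails the second check, which places it in OWL Full rather than OWL DL.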
62 OWL Validator OWL Validator: Web-based or command-line utility Performs basic validation of OWL file OWL Ontology Validator: a "species validator" that checks use of OWL Lite, OWL DL, and OWL Full constructs
63