Semantic Web: The Story So Far
Ian Horrocks <ian.horrocks@comlab.ox.ac.uk>
Oxford University Computing Laboratory
The Semantic Web
What is it?
- Web “invented” by Tim Berners-Lee (amongst others)
- (Conceptual) simplicity of the web has contributed to its success, but is also a limiting factor
- Tim has ambitious goals for the future of the web
- Objective is to overcome existing limitations
- This vision of the future of the Web has become known as the Semantic Web
  - “… a consistent logical web of data …”
  - “… information is given well-defined meaning …”
Notes:
- Even if he didn’t invent it, TBL was a key figure in the development of the Web.
- This is what he originally intended, and what he would like to see the web become.
Why do we want it?
- Many tasks are difficult or impossible using the existing web, e.g.:
  - Find images of FvH, PFPS and Alan Rector
  - [Image caption: Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois]
Notes:
- Concrete motivation for why we want the semantic web: some tasks are hard work using the existing web.
Why do we want it?
- Many tasks are difficult or impossible using the existing web:
  - Complex queries involving background knowledge
    - Find information about “animals that use sonar but are neither bats nor dolphins”, e.g., Barn Owl
  - Locating information in data repositories
    - Travel enquiries
    - Prices of goods and services
    - Results of human genome experiments
  - Finding and using “web services”
    - Given DNA sequence, identify genes, determine proteins they produce, and hence biological processes they control
Notes:
- Workflows are widely used in eScience; it is hard to (automatically) find relevant services.
- The final bullet can be thought of as a sort of grand challenge: the web equivalent of robot football.
What is the Problem?
- Consider a typical web page:
- Markup consists of:
  - Rendering information (e.g., font size and colour)
  - Hyperlinks to related content
- Semantic content is accessible to humans, but not (easily) to computers…
Notes:
- Information is in natural language, and even embedded in images.
- And if this is a problem for web pages, what about hypermedia, databases and other web resources?
How Will It Work?
- Add semantic annotations to web resources
  - Dr. Alan Rector, Professor of Computer Science, University of Manchester
  - Dr. <Person>Alan Rector</Person>, <Job>Professor of Computer Science</Job>, University of Manchester
  - Rev. Alan M. Gates, Associate Rector of the Church of the Holy Spirit, Lake Forest, Illinois
  - Rev. <Person>Alan M. Gates</Person>, <Job>Associate Rector</Job> of the Church of the Holy Spirit, Lake Forest, Illinois
Notes:
- NL annotations (possibly with rendering annotation) are already associated with images (the only way Google can find them).
- Augment NL with semantic annotation.
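The effect of the `<Person>` and `<Job>` tags can be sketched in a few lines of Python. This is only an illustration, not a proposed annotation format: the `<snippet>` wrapper element and the function names are assumptions introduced here so that each fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# The two annotated sentences from the slide. The <snippet> root element is a
# hypothetical wrapper added only so each fragment is well-formed XML.
SNIPPETS = [
    "<snippet>Dr. <Person>Alan Rector</Person>, "
    "<Job>Professor of Computer Science</Job>, University of Manchester</snippet>",
    "<snippet>Rev. <Person>Alan M. Gates</Person>, "
    "<Job>Associate Rector</Job> of the Church of the Holy Spirit, "
    "Lake Forest, Illinois</snippet>",
]


def keyword_search(word):
    """Plain text search: ignores the tags and matches anywhere in the text."""
    return [s for s in SNIPPETS if word in "".join(ET.fromstring(s).itertext())]


def extract(snippet):
    """Collect the annotated fields of one snippet, e.g. {'Person': ..., 'Job': ...}."""
    return {child.tag: child.text for child in ET.fromstring(snippet)}


def find_people(name_fragment):
    """Annotation-aware search: matches only inside the <Person> field."""
    records = [extract(s) for s in SNIPPETS]
    return [r for r in records if name_fragment in r.get("Person", "")]
```

With these definitions, `keyword_search("Rector")` matches both sentences, while `find_people("Rector")` matches only the record whose `Person` field contains the name.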
How Will It Work?
[Image caption: “Now... that should clear up a few things around here”]
Giving Semantics to Annotations
- Agree on meaning of a set of annotation tags
  - E.g., Dublin Core
  - Limited flexibility and extensibility
  - Limited number of things can be expressed
- Agree on language used to define meanings
  - E.g., an ontology language
  - Flexible and extensible
  - New terms can be formed by combining existing ones
  - Meaning (semantics) of such terms is formally specified
- Is “Big Apple” a large fruit or a city?
The Web Ontology Language OWL Development and standardisation of (Semantic) Web ontology languages has been responsible for huge increase in the development and application of ontologies.
Web Ontology Language OWL
- Semantic Web led to requirement for a “web ontology language”
- W3C set up the Web-Ontology (WebOnt) Working Group
  - WebOnt developed the OWL language
  - OWL is based on earlier languages: RDF, OIL and DAML+OIL
  - OWL is now a W3C recommendation (i.e., a standard)
- OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full
- OIL, DAML+OIL and OWL (DL & Lite) are based on Description Logics
- Has facilitated development of a wide range of high quality tools & infrastructure
- OWL is now the language of choice in many applications
What Are Description Logics?
- A family of logic based Knowledge Representation formalisms
  - Descendants of semantic networks and KL-ONE
  - Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals
- Operators allow for composition of complex concepts
- Names can be given to complex concepts, e.g.:
  HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
- Object oriented style of modelling
Why (Description) Logic?
- OWL exploits results of 15+ years of DL research
- Well defined (model theoretic) semantics
  - Most DLs are subsets of C2, i.e., decidable fragments of FOL
- Formal properties well understood (complexity, decidability)
  - “I can’t find an efficient algorithm, but neither can all these famous people.” [Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.]
- Known reasoning algorithms
- Implemented systems (highly optimised), e.g., Pellet, KAON2, CEL
Class/Concept Constructors Concept can be thought of as a FOL formula with one free variable
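As an illustration of the "formula with one free variable" reading, the HappyParent concept used elsewhere in this talk corresponds to Parent(x) ∧ ∀y. (hasChild(x, y) → (Intelligent(y) ∨ Athletic(y))). The sketch below evaluates that formula over a small finite interpretation. The individuals and their properties are hypothetical, chosen only for the example.

```python
# A hypothetical finite interpretation: a domain plus the extensions of the
# class and property names that occur in the HappyParent concept.
DOMAIN = {"ann", "bob", "cat", "dan"}
PARENT = {"ann", "bob"}
INTELLIGENT = {"cat"}
ATHLETIC = set()
HAS_CHILD = {("ann", "cat"), ("bob", "cat"), ("bob", "dan")}


def happy_parent(x):
    """Parent(x) and, for every y with hasChild(x, y), Intelligent(y) or Athletic(y)."""
    return x in PARENT and all(
        (y in INTELLIGENT or y in ATHLETIC)
        for (p, y) in HAS_CHILD
        if p == x
    )


# The extension of the concept is the set of domain elements satisfying the formula.
EXTENSION = {x for x in DOMAIN if happy_parent(x)}
```

Here `ann` is in the extension, but `bob` is not, because his child `dan` is neither intelligent nor athletic in this interpretation.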
Knowledge Base / Ontology Axioms
OWL RDF/XML Exchange Syntax
E.g., Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic):

    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Parent"/>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasChild"/>
          <owl:allValuesFrom>
            <owl:Class>
              <owl:unionOf rdf:parseType="Collection">
                <owl:Class rdf:about="#Intelligent"/>
                <owl:Class rdf:about="#Athletic"/>
              </owl:unionOf>
            </owl:Class>
          </owl:allValuesFrom>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
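To make the correspondence between the exchange syntax and the DL expression concrete, here is a rough sketch that reads such a fragment with Python's standard `xml.etree.ElementTree` and prints it back in DL notation. Several things are assumptions made for the example: the fragment is wrapped in an `rdf:RDF` root with the usual OWL and RDF namespace declarations, `rdf:parseType` is written as `Collection`, and the union is nested inside an anonymous `owl:Class` node. The `render` function is hypothetical and handles only the constructs that occur in this example; it is not an OWL parser.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = f"""<rdf:RDF xmlns:owl="{OWL}" xmlns:rdf="{RDF}">
  <owl:Class>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Parent"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#hasChild"/>
        <owl:allValuesFrom>
          <owl:Class>
            <owl:unionOf rdf:parseType="Collection">
              <owl:Class rdf:about="#Intelligent"/>
              <owl:Class rdf:about="#Athletic"/>
            </owl:unionOf>
          </owl:Class>
        </owl:allValuesFrom>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>"""


def q(namespace, name):
    """Qualified tag or attribute name in ElementTree's {namespace}name form."""
    return f"{{{namespace}}}{name}"


def render(node):
    """Render a class expression element in DL notation (this example's constructs only)."""
    if node.tag == q(OWL, "Class"):
        about = node.get(q(RDF, "about"))
        if about is not None:
            return about.lstrip("#")
        for tag, symbol in (("intersectionOf", " ⊓ "), ("unionOf", " ⊔ ")):
            operands = node.find(q(OWL, tag))
            if operands is not None:
                return "(" + symbol.join(render(child) for child in operands) + ")"
    if node.tag == q(OWL, "Restriction"):
        prop = node.find(q(OWL, "onProperty")).get(q(RDF, "resource")).lstrip("#")
        filler = node.find(q(OWL, "allValuesFrom"))[0]
        return f"∀{prop}.{render(filler)}"
    raise ValueError(f"unsupported element: {node.tag}")


EXPRESSION = render(ET.fromstring(DOC)[0])
```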
Ontology based Information Systems
- Similar to relational databases
  - Ontology ≈ schema; instances ≈ data
- Some important (dis)advantages
  - (Relatively) easy to maintain and update schema
    - Both schema and data are “self organising”
  - Query answers reflect both schema and data
    - Able to answer both intensional and extensional queries
  - Semantics may be counter-intuitive or even inappropriate
    - Open -v- closed world; axioms -v- constraints
  - Query answering (logical entailment) much more difficult
    - Can lead to scalability problems
- Very useful, but don’t expect miracles!
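The open versus closed world distinction is one source of the counter-intuitive semantics mentioned above. A minimal sketch, using hypothetical facts and function names: a database-style (closed world) query treats a missing fact as false, whereas an ontology-style (open world) query only answers "no" when the negation is itself known.

```python
# Hypothetical knowledge: one positive and one explicitly negated assertion.
ASSERTED = {("hasChild", "anna", "ben")}
NEGATED = {("hasChild", "anna", "carl")}


def closed_world_answer(fact):
    """Database-style reading: anything not stored is taken to be false."""
    return fact in ASSERTED


def open_world_answer(fact):
    """Ontology-style reading: absence of a fact is not evidence of its negation."""
    if fact in ASSERTED:
        return "yes"
    if fact in NEGATED:
        return "no"
    return "unknown"
```

For the unstored fact `("hasChild", "anna", "dora")`, the closed world answer is `False`, while the open world answer is `"unknown"`.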
Ontologies and Reasoning
Support for Ontology Engineering
- Developing and maintaining quality ontologies is very challenging
- Users need tools and services, e.g., to help check if ontology is:
  - Meaningful: all named classes can have instances
  - Correct: captures intuitions of domain experts
    - Is a banana split a kind of fruit sundae?
  - Minimally redundant: no unintended synonyms
    - E.g., Banana split and Banana sundae
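Two of these checks can be sketched for a very restricted setting: an ontology containing only named-class subclass axioms and pairwise disjointness axioms. The axioms below are hypothetical, and the check is sound only for this tiny fragment; a real OWL reasoner handles far more than this.

```python
from itertools import combinations

# Hypothetical told axioms: (sub, super) pairs and pairs of disjoint classes.
SUBCLASS = {
    ("BananaSplit", "FruitSundae"),
    ("BananaSundae", "BananaSplit"),
    ("BananaSplit", "BananaSundae"),
    ("FruitSundae", "Dessert"),
    ("DessertPizza", "Dessert"),
    ("DessertPizza", "MainCourse"),
}
DISJOINT = {("Dessert", "MainCourse")}
CLASSES = {name for pair in SUBCLASS for name in pair}


def superclasses(cls):
    """Reflexive, transitive closure of the told subclass axioms."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in SUBCLASS:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def unsatisfiable():
    """Classes that cannot have instances: subsumed by two disjoint classes."""
    return {
        c for c in CLASSES
        if any(a in superclasses(c) and b in superclasses(c) for a, b in DISJOINT)
    }


def synonyms():
    """Pairs of distinct names that subsume each other."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(CLASSES), 2)
        if b in superclasses(a) and a in superclasses(b)
    }
```

On these axioms, `DessertPizza` is reported as unsatisfiable and `BananaSplit` / `BananaSundae` as synonyms. Whether either finding is an error is still a question for the domain expert.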
Support for Ontology Engineering
- Range of new “non-standard” services supporting, e.g.:
  - Modular design and integration
    - What is the effect of merging O2 into O1?
    - In general, check that O1 ∪ O2 ⊨ C iff O1 ⊨ C for any concept C constructed using vocabulary occurring in O1
  - Module extraction
    - Extract a (small) module from O capturing all “relevant” information about some vocabulary V
    - In general, find O′ ⊆ O s.t. O′ ⊨ C iff O ⊨ C for any concept C constructed using terms from V
  - Bottom-up design
    - Find a (small and specific) concept describing a set of individuals
    - In general, find most specific C s.t. O ⊨ C(i1) ∧ … ∧ C(in)
    - Where C may be “small” and/or in a sub-language (of O)
  - Error diagnosis and repair
Support for Query Answering
- In an Ontology based Information System (OIS), query answering ≈ computing logical entailment
- Reasoner needed in order to answer queries, e.g.:
  - C is a sub-class of D iff O ⊨ ∀x. C(x) → D(x)
  - a is an instance of C iff O ⊨ C(a)
- OIS with no reasoner ≈ DBMS with no query engine
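A toy illustration of why query answers depend on entailment and not only on stored data. It assumes an ontology consisting solely of named-class subclass axioms and class assertions; all names are hypothetical, and the closure computed here stands in for what a real reasoner would do over a much richer language.

```python
# Hypothetical ontology O: told subclass axioms (schema) and class assertions (data).
SUBCLASS_OF = {"Cat": {"Mammal"}, "Mammal": {"Animal"}}
ASSERTED = {"felix": {"Cat"}, "rex": {"Animal"}}


def subsumers(cls):
    """All D such that O entails: every instance of cls is an instance of D."""
    result, todo = {cls}, [cls]
    while todo:
        for sup in SUBCLASS_OF.get(todo.pop(), ()):
            if sup not in result:
                result.add(sup)
                todo.append(sup)
    return result


def is_subclass(c, d):
    """Does O entail that C is a sub-class of D?"""
    return d in subsumers(c)


def is_instance(a, c):
    """Does O entail C(a)?"""
    return any(c in subsumers(told) for told in ASSERTED.get(a, ()))


def instances_of(c):
    """Query answer: every individual entailed to be an instance of c."""
    return {a for a in ASSERTED if is_instance(a, c)}
```

A plain lookup in `ASSERTED` would return only `rex` for the query `Animal`; with the schema taken into account, `instances_of("Animal")` also returns `felix`.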
Example Applications
e-Science
- E.g., for “in silico” investigations and “hypothesis testing”
  - Comparing data (e.g., on proteins) to (model of) biological knowledge
  - Characteristics of proteins captured in an ontology O
  - Goal is to identify protein instances based on characteristics
    - Equivalent to answering queries of the form: O ⊨ P(i)? for protein P and instance i
  - Result may be discovery of new kinds of protein
    - And these may be potential drug targets if unique to a pathogen
  - Result may also be discovery of errors in model
    - Which may reflect gaps/errors in existing knowledge
[Figure: protein functional domains (sequences of amino acids?); the identifying characteristics of different proteins.]
Healthcare
- UK NHS has a £6.2 billion “Connecting for Health” IT programme
- Key component is Care Records Service (CRS)
  - “Live, interactive patient record service accessible 24/7”
  - Patient data distributed across local centres in 5 regional clusters, and a national DB
  - Detailed records held by local service providers
  - Diverse applications support radiology, pharmacy, etc.
  - Applications exchange messages containing “semantically rich clinical information”
  - Summaries sent to national database
- SNOMED-CT ontology provides common vocabulary for data
  - Clinical data uses terms drawn from ontology
SNOMED
- Over 400,000 concepts
- Schema only: no instances
- Language used is a (well known) fragment of OWL
- NHS version extended with 1,000s of additional classes
- OWL reasoner (FaCT++) used to classify and check ontology
  - Currently takes ≈ 4 hours
  - 180 missing subClass relationships were found, e.g.:
    - Periocular_dermatitis subClassOf Disease_of_face
    - Fibrin_measurement subClassOf Coagulation_factor_assay
SNOMED
- Vocabulary is extensible at point of use: “post coordination”
  - Users (e.g. clinicians) may add/define new vocabulary
  - Terminology service (reasoner) used to insert in ontology
- Typical new term: almond_allergy ≡ “allergy caused_by almond”
- OWL reasoner (FaCT++) used to classify new term
  - Takes <10 ms
  - Classified as a kind of “nut allergy”
- Clearly of crucial importance to recognise patients with an allergy caused by almonds as patients with a nut allergy
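The classification step can be sketched as a structural subsumption test. This is not how FaCT++ works internally; it is a simplified illustration under several assumptions: the new term is read as allergy ⊓ ∃caused_by.almond, "nut allergy" is assumed to be defined as allergy ⊓ ∃caused_by.nut, the hierarchy contains almond ⊑ nut, and fillers are plain class names. All identifiers are hypothetical.

```python
# Hypothetical told hierarchy over primitive class names.
PRIMITIVE_PARENTS = {"almond": {"nut"}, "nut": {"food"}}

# A definition is (set of primitive conjuncts, set of (role, filler) restrictions).
# Existing defined terms in the (hypothetical) ontology:
DEFINITIONS = {
    "nut_allergy": ({"allergy"}, {("caused_by", "nut")}),
    "food_allergy": ({"allergy"}, {("caused_by", "food")}),
    "drug_allergy": ({"allergy"}, {("caused_by", "drug")}),
}


def ancestors(name):
    """The name itself plus everything above it in the told hierarchy."""
    seen, todo = {name}, [name]
    while todo:
        for parent in PRIMITIVE_PARENTS.get(todo.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                todo.append(parent)
    return seen


def subsumed_by(c_def, d_def):
    """True if every conjunct of d_def is matched by a more specific conjunct of c_def."""
    c_atoms, c_restrictions = c_def
    d_atoms, d_restrictions = d_def
    atoms_ok = all(any(d in ancestors(c) for c in c_atoms) for d in d_atoms)
    restrictions_ok = all(
        any(role_c == role_d and filler_d in ancestors(filler_c)
            for role_c, filler_c in c_restrictions)
        for role_d, filler_d in d_restrictions
    )
    return atoms_ok and restrictions_ok


def classify(new_def):
    """Names of the existing defined terms that subsume the new definition."""
    return {name for name, d in DEFINITIONS.items() if subsumed_by(new_def, d)}


ALMOND_ALLERGY = ({"allergy"}, {("caused_by", "almond")})
```

Under these assumptions, `classify(ALMOND_ALLERGY)` places the new term below `nut_allergy` and `food_allergy`, but not below `drug_allergy`.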
Columbia Presbyterian Medical Center
- Ontology used in analysis of results in path lab
- OWL reasoner used to check this ontology
- Several errors and omissions found that “would have led to missed test results”
Recent Developments
Improving Scalability
- Optimisation techniques
  - Improve performance of DL reasoners, e.g., [Tsarkov et al, JAR, 2007]
- New reasoning techniques
  - Reduction to disjunctive Datalog [Motik et al, KR-04]
  - Hybrid DL-DB systems [Horrocks et al, CADE-05]
  - Hypertableau based algorithms [Motik et al, CADE-07]
- Polynomial time algorithms for sub-ALC logics
  - Graph based techniques for EL+ [Baader et al, IJCAI-05]
  - Database techniques for DL-Lite [Calvanese et al, AAAI-05]
Extending Tools and Infrastructure
- Editors/environments
  - OilEd, Protégé, Swoop, TopBraid, Ontotrack, …
- Reasoning systems
  - Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, …
- Design methodologies
  - Modularity, foundational ontologies, etc.
[Figure: foundational ontology diagram; node labels: Entity, Endurant, Perdurant, Quality, Substantial, Event, Stative, Achievement, Accomplishment]
Increasing Expressive Power
- Database style keys [Lutz et al, JAIR 2004]
- Rule language extensions
  - W3C RIF WG (see http://www.w3.org/2005/rules/)
  - First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005]
  - Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JWS, 2005]
  - LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05]
- Other extensions
  - Temporal, Fuzzy, …
- OWL 1.1 extension to OWL
Notes:
- It is clear that OWL isn’t expressive enough for all applications; users always want/need more.
OWL 1.1
- Is an extension of OWL
  - Addresses deficiencies identified by users and developers (at OWLED workshop)
- Is based on a more expressive DL: SROIQ (OWL is based on SHOIN)
- W3C working group now chartered
  - Will develop recommendation based on existing member submission
- Already supported by popular OWL tools
  - Protégé, Swoop, TopBraid, FaCT++, Pellet
What’s New in OWL 1.1?
Four kinds of features:
- More expressive logic
  - Qualified cardinality restrictions, e.g.: ObjectMinCardinality(2 friendOf hacker)
  - Property chain inclusion axioms, e.g.: SubObjectPropertyOf(SubObjectPropertyChain(parent brother) uncle)
  - Local reflexivity restrictions, e.g.: ObjectExistsSelf(likes) [for narcissists]
  - Reflexive, irreflexive, symmetric, and antisymmetric properties, e.g.: ReflexiveObjectProperty(knows); IrreflexiveObjectProperty(husbandOf)
  - Disjoint properties, e.g.: DisjointObjectProperties(childOf spouseOf)
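Two of these constructs can be illustrated over a small set of asserted facts. The sketch is deliberately naive: it works only on explicitly listed pairs, treats differently named individuals as distinct (OWL itself makes no unique name assumption), and reads `parent(x, y)` as "x has parent y", which is an assumption about the intended direction. All individuals are hypothetical.

```python
# Hypothetical property assertions as sets of (subject, object) pairs.
PARENT = {("alice", "bob")}    # read here as: alice has parent bob
BROTHER = {("bob", "carl")}    # bob has brother carl
FRIEND_OF = {("dave", "erin"), ("dave", "finn"), ("gina", "erin")}
HACKER = {"erin", "finn"}


def compose(first, second):
    """Pairs (x, z) such that first(x, y) and second(y, z) hold for some y."""
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}


def min_cardinality(n, prop, cls):
    """Individuals with at least n distinct prop-successors that belong to cls."""
    counts = {}
    for x, y in prop:
        if y in cls:
            counts[x] = counts.get(x, 0) + 1
    return {x for x, k in counts.items() if k >= n}


# SubObjectPropertyOf(SubObjectPropertyChain(parent brother) uncle):
# every parent-then-brother path yields an uncle pair.
UNCLE = compose(PARENT, BROTHER)

# ObjectMinCardinality(2 friendOf hacker): friends of at least two hackers.
TWO_HACKER_FRIENDS = min_cardinality(2, FRIEND_OF, HACKER)
```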
What’s New in OWL 1.1?
Four kinds of features:
- More expressive datatypes
  - User-defined datatypes using facets from XML Schema Datatypes, e.g.: SubClassOf(Adult DataSomeValuesFrom(age DatatypeRestriction(xsd:integer minInclusive "18"^^xsd:integer)))
  - Simple relationships between values of functional data-valued properties, e.g.: DataSomeValuesFrom(shoeSize IQ greaterThan)
What’s New in OWL 1.1?
Four kinds of features:
- Metamodelling and annotations
  - Names can be used as any or all of an individual, a class, or a property
  - Allows for a restricted form of metamodelling (“punning”), e.g.: SubClassOf(SnowLeopard BigCat); ClassAssertion(SnowLeopard EndangeredSpecies)
  - Annotations of axioms as well as entities, e.g.: ClassAssertion(Comment("source: WWF") SnowLeopard EndangeredSpecies)
What’s New in OWL 1.1?
Four kinds of features:
- Syntactic sugar (make things easier to say)
  - Disjoint unions, e.g.: DisjointUnion(Element Earth Wind Fire Water)
  - Negative assertions, e.g.: NegativeObjectPropertyAssertion(Ian hasChild Mary); NegativeDataPropertyAssertion(Ian hasAge 21)
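"Syntactic sugar" means the construct adds no expressive power: a disjoint union can be expanded into an equivalence with a union plus a pairwise disjointness axiom. The helper below sketches that expansion as plain string rewriting. The function name is hypothetical, and the output uses functional-style axiom names (`EquivalentClasses`, `ObjectUnionOf`, `DisjointClasses`) as an illustration, not as output of any particular tool.

```python
def desugar_disjoint_union(cls, parts):
    """Expand DisjointUnion(cls parts...) into two equivalent plain axioms."""
    joined = " ".join(parts)
    return [
        f"EquivalentClasses({cls} ObjectUnionOf({joined}))",
        f"DisjointClasses({joined})",
    ]


AXIOMS = desugar_disjoint_union("Element", ["Earth", "Wind", "Fire", "Water"])
```

For the slide's example this yields `EquivalentClasses(Element ObjectUnionOf(Earth Wind Fire Water))` and `DisjointClasses(Earth Wind Fire Water)`.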
Tractable Fragments
- OWL defines only one fragment (OWL Lite)
  - And it isn’t very tractable!
- OWL 1.1 defines several different fragments with useful computational properties
  - E.g., reasoning complexity in range LOGSPACE to PTIME
  - Smaller fragments implementable using RDBs
Summary
- Semantic Web aims to make web content more accessible to automated processes
  - Adds semantic annotations to web resources
- OWL ontologies provide vocabulary for annotations
  - Terms have well defined meaning
- OWL now being used in a wide range of applications
  - e-Science, medicine, geography, geology, …
- Reasoning enabled tools are of crucial importance
  - For both design and deployment of ontologies
- Active research area
  - Expressive power, scalability, methodologies, tools, …
Thank you for listening
Any questions?
[Cartoon: FRAZZ, © Jeff Mallett/Dist. by United Feature Syndicate, Inc.]