Ontologies
Charlie Abela
Department of Artificial Intelligence
charlie.abela@um.edu.mt
Last Lecture
- Introduction to Description Logics and their relevance to knowledge representation
- Problems?
CSA 3210 Ontologies ©
Lecture Outline
- What is an Ontology?
- Components
- Ontology development
What is an Ontology?
- An ontology is an explicit specification of a conceptualization (Gruber 1993)
- Ontologies are defined as a formal specification of a shared conceptualization (Borst 1997)
- Studer explains:
  - Conceptualization: an abstract model of some phenomenon in the world, identified by the concepts of that phenomenon
  - Explicit: both the types of concepts and the constraints on these concepts are explicitly defined
  - Formal: the ontology should be machine-readable
  - Shared: the ontology captures consensual knowledge, accepted by a group
Why develop an Ontology?
- To share a common understanding of the structure of information among people or software agents
  - facilitates information extraction and aggregation by software agents
- To enable reuse of domain knowledge
  - reuse, extend and integrate existing ontologies
- To make domain assumptions explicit
  - an explicit definition of such knowledge (instead of hard-coding it) makes it easier to change
- To separate domain knowledge from operational knowledge
  - e.g. use different ontologies with the same algorithm (to solve different problems)
- To analyse domain knowledge
  - possible once a declarative specification of the terms is available (valuable when reusing and extending ontologies)
Existing Domain-Specific Ontologies
- Medical domain:
  - Cancer ontology from the National Cancer Institute in the United States
- Cultural domain:
  - Art and Architecture Thesaurus (AAT), with 125,000 terms in the cultural domain
  - Union List of Artist Names (ULAN), with 220,000 entries on artists
  - Iconclass vocabulary of 28,000 terms for describing cultural images
- Geographical domain:
  - Getty Thesaurus of Geographic Names (TGN), containing over 1 million entries
Integrated Vocabularies
- Merge independently developed vocabularies into a single large resource
- E.g. the Unified Medical Language System (UMLS), integrating 100 biomedical vocabularies
  - The UMLS meta-thesaurus contains 750,000 concepts, with over 10 million links between them
- The level of semantics in a resource that integrates many independently developed vocabularies is rather low
- Such a resource is nevertheless very useful as a starting point in many applications
Upper-Level Ontologies
- Some attempts have been made to define very generally applicable ontologies
- They describe very general concepts that are common across domains
- They provide general notions under which all the terms in existing ontologies should be linked
- Examples:
  - OpenCyc, with 60,000 assertions on 6,000 concepts
  - DOLCE: a Descriptive Ontology for Linguistic and Cognitive Engineering
  - Suggested Upper Merged Ontology (SUMO)
Linguistic Resources
- Describe semantic constructs rather than model specific domains
- Used mostly in NLP
- They are bound to the semantics of grammatical units (words, adjectives etc.)
- E.g. WordNet (focus on word meaning)
Main Components of an Ontology
- Concepts or Classes
  - related to a domain of discourse
- Slots (Roles or Properties)
  - properties of concepts describing various features and attributes of concepts
- Facets (Slot/Property Restrictions)
  - restrictions on slots/properties
- Instances
  - individual instances of classes (constitute a knowledge base)
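The four components can be pictured as plain data structures. The sketch below is only an illustration in Python, not the representation of any particular ontology tool: the names `Facet`, `Slot`, `OntClass` and `Instance`, the individual `GrandHotel`, and the choice of an integer rating are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Facet:
    """Restriction on a slot: value type and (optional) cardinality."""
    value_type: type
    cardinality: Optional[int] = None  # None means "not restricted"


@dataclass
class Slot:
    """A property of a class, together with its facet."""
    name: str
    facet: Facet


@dataclass
class OntClass:
    """A concept; 'parent' places it in a taxonomy."""
    name: str
    parent: Optional["OntClass"] = None
    slots: Dict[str, Slot] = field(default_factory=dict)


@dataclass
class Instance:
    """An individual of a class, with values for some slots."""
    name: str
    cls: OntClass
    values: Dict[str, Any] = field(default_factory=dict)


# Classes and a slot (assumed here to take a single integer value)
lodging = OntClass("Lodging")
hotel = OntClass("Hotel", parent=lodging)
lodging.slots["hasRating"] = Slot("hasRating", Facet(value_type=int, cardinality=1))

# Instances together make up the knowledge base
grand = Instance("GrandHotel", hotel, {"hasRating": 5})
knowledge_base = [grand]
```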
Classes
- Represent concepts in a particular domain
  - E.g. locations (cities, villages), lodgings (hotels, hostels), means of transportation (planes, trains)
- Classes are organised in taxonomies through which inheritance mechanisms can be applied
  - E.g. a taxonomy of travel packages (economy, business)
- Classes can represent both abstract and specific concepts
  - abstract (beliefs, intentions)
  - specific (people, computers)
Properties or Slots
- Describe properties of classes and instances
  - E.g. subClassOf(Hotel, Lodging) for classification
- They represent binary relations
  - E.g. arrivalPlace(flight, location)
- Slots are inherited by subclasses
  - E.g. the arrivalPlace slot can be applied to the class Travel, but it also applies to its subclass Flight
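Slot inheritance can be sketched by walking up the subclass chain and collecting the slots declared along the way. This is a minimal illustration that assumes single inheritance and an acyclic hierarchy; the dictionaries `SUBCLASS_OF` and `DIRECT_SLOTS` and the function `slots_of` are hypothetical names.

```python
# child -> parent, as in "Flight is a subclass of Travel"
SUBCLASS_OF = {"Flight": "Travel"}

# slots attached directly to each class
DIRECT_SLOTS = {
    "Travel": {"arrivalPlace"},
    "Flight": {"flightNumber"},
}


def slots_of(cls):
    """Collect the slots declared on cls and on all of its ancestors."""
    slots = set()
    while cls is not None:
        slots |= DIRECT_SLOTS.get(cls, set())
        cls = SUBCLASS_OF.get(cls)  # None once the top of the chain is reached
    return slots


print(sorted(slots_of("Flight")))  # ['arrivalPlace', 'flightNumber']
```

Flight picks up arrivalPlace from Travel without declaring it again, while Travel does not gain flightNumber.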
Facets or Property Restrictions
- Facets describe the value type, allowed values, cardinality, domain and range of slots
- Example:
  slot: flightNumber
  domain: Flight
  range: string
  cardinality: 1
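The flightNumber facets can be read as checks on the values given to the slot. The sketch below is one possible reading in Python, with the simplifying assumption that the domain check compares class names directly and ignores subclasses; `check_slot` and the dictionary layout are hypothetical.

```python
# Facets of the flightNumber slot, as listed on the slide
FLIGHT_NUMBER_FACETS = {
    "domain": "Flight",
    "range": str,
    "cardinality": 1,
}


def check_slot(facets, instance_class, values):
    """Return the names of the facets violated by the given slot values."""
    problems = []
    if instance_class != facets["domain"]:
        problems.append("domain")
    if any(not isinstance(v, facets["range"]) for v in values):
        problems.append("range")
    if len(values) != facets["cardinality"]:
        problems.append("cardinality")
    return problems


print(check_slot(FLIGHT_NUMBER_FACETS, "Flight", ["KM2561"]))  # []
print(check_slot(FLIGHT_NUMBER_FACETS, "Flight", [2561]))      # ['range']
```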
Building Ontologies
- No one correct way to model a domain
  - Alternatives will always exist
  - Best solution depends on application and anticipated extensions
- Development is an iterative process
  - Start with a rough draft, then revise and refine, filling in details along the way
- Concepts should be close to objects and relationships in the domain
  - Most likely to be nouns (objects) or verbs (relationships) in sentences that describe the domain
Ontology Development
- Noy, McGuinness method: 7 steps identified
  1. Determine the domain and scope of the ontology
  2. Consider reusing existing ontologies
  3. List important terms in the ontology
  4. Define the classes and class hierarchy
  5. Define the properties of the classes (slots)
  6. Define the facets for the slots
  7. Create instances
- 8th step: Check for anomalies
1. Determine the domain and scope
- Answer basic questions to identify scope and intended use
  - What is the domain to be covered?
  - What is the use of the ontology?
  - Which terms are relevant?
- Define a set of competency questions that the ontology will have to answer. These questions are just a sketch and need not be exhaustive
  - Which characteristics of travelling need to be considered?
  - Which types of lodging need to be considered?
  - Which transportation methods need to be considered?
- E.g.
  - domain: travelling
  - purpose: knowledge used by travel agencies or a tourist catalogue system
  - relevant terms: places, lodging, types of lodging, transport, types of transport
2. Consider Reuse
- Look for libraries of reusable ontologies
  - Search the Web or use Swoogle
  - Some libraries:
    - Ontolingua ontology library
    - Open Directory Project (www.dmoz.org)
- It is almost always worth checking whether it is possible to refine and extend existing ontologies
- There is almost always an ontology available from a third party that provides at least a useful starting point for our own ontology
3. Enumerate important terms
- Elaborate the list of important terms
- List the properties related to these terms
- Initially the list should be comprehensive, without caring about
  - overlap between concepts
  - relations among terms
  - properties related to concepts
  - whether concepts are classes or slots
4. Define classes and class hierarchy
- Several approaches
  - Top-down: start with the most general concepts
  - Bottom-up: start with the most specific
  - A combination: define the more important concepts and then generalise and specialise as required
- Organise classes into a hierarchical taxonomy by following the rule:
  - If class A is a superclass of class B, then every instance of B is also an instance of A
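Read as first-order logic, with classes treated as unary predicates, the taxonomy rule can be written as follows. The predicate name subClassOf is borrowed from the slide on slots; the notation itself is an assumption made for illustration.

```latex
\[
\mathrm{subClassOf}(B, A) \;\Longrightarrow\; \forall x\,\bigl(B(x) \rightarrow A(x)\bigr)
\]
\[
\forall x\,\bigl(C(x) \rightarrow B(x)\bigr) \,\wedge\, \forall x\,\bigl(B(x) \rightarrow A(x)\bigr)
\;\Longrightarrow\; \forall x\,\bigl(C(x) \rightarrow A(x)\bigr)
\]
```

The second line is the consequence that makes the hierarchy transitive: an instance of a class is also an instance of every class further up the chain.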
Classes in the Travelling Domain
- Classes
  - Lodging
    - Hotel
      - LuxuryHotel
  - Transport
    - Train
- Class Hierarchy
  - Hotel is-a-kind-of Lodging
  - LuxuryHotel is-a-kind-of Hotel
  - Train is-a-kind-of Transport
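The is-a-kind-of statements for the travelling classes can be stored as a child-to-parent table, from which membership in more general classes follows by climbing the table. A minimal sketch, assuming single inheritance; `IS_A`, `superclasses` and `is_instance_of` are hypothetical names.

```python
# is-a-kind-of links from the travelling example (child -> parent)
IS_A = {
    "Hotel": "Lodging",
    "LuxuryHotel": "Hotel",
    "Train": "Transport",
}


def superclasses(cls):
    """Return all ancestors of cls, nearest first."""
    result = []
    while cls in IS_A:
        cls = IS_A[cls]
        result.append(cls)
    return result


def is_instance_of(instance_class, target):
    """An individual of instance_class also belongs to every superclass."""
    return instance_class == target or target in superclasses(instance_class)


print(superclasses("LuxuryHotel"))               # ['Hotel', 'Lodging']
print(is_instance_of("LuxuryHotel", "Lodging"))  # True
print(is_instance_of("Train", "Lodging"))        # False
```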
5. Define Properties
- After identifying classes from the list of terms and defining them formally, choose the terms that define properties of these classes
  - These properties become attached to these classes
  - Since properties are inherited by subclasses, they should be attached to the most general class to which they apply
- Examples of properties:
  - departureDate, arrivalDate: attached to class Travel
  - flightNumber: attached to class Airplane
  - hasRating, hasLocation: attached to class Lodging
6. Define Facets (Restrictions)
- Several common facets can be defined
  - Property cardinality
    - Only one departureDate (cardinality = 1)
  - Property-value type
    - flightNumber is of type string
  - Domain and range of properties
    - transportMeans: Travel (domain), Transport (range)
    - singleFare: Travel (domain), currencyQuantity (range)
    - hasRating: Lodging (domain), Rating (range)
    - hasLocation: Lodging (domain), Location (range)
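Domain and range facets can be used to test whether a statement relating two individuals is admissible. The sketch below treats them as constraints to be checked, which is an assumption: some ontology languages instead use domain and range to infer the classes of the individuals involved. The helper names and the small hierarchy are hypothetical, reusing class names from earlier slides.

```python
# child -> parent links, reused from earlier slides
SUBCLASS_OF = {"Flight": "Travel", "Train": "Transport"}

# property -> (domain, range), from the facet examples
PROPERTIES = {
    "transportMeans": ("Travel", "Transport"),
    "hasLocation": ("Lodging", "Location"),
}


def belongs_to(cls, target):
    """True if cls is target or one of its (transitive) subclasses."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def assertion_ok(prop, subject_class, object_class):
    """Check a statement prop(subject, object) against domain and range."""
    domain, range_ = PROPERTIES[prop]
    return belongs_to(subject_class, domain) and belongs_to(object_class, range_)


print(assertion_ok("transportMeans", "Flight", "Train"))  # True
print(assertion_ok("transportMeans", "Train", "Flight"))  # False
```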
7. Create Instances
- Defining an individual instance requires
  - Choosing a class
  - Creating an individual instance of that class
  - Filling in the property values
- A flight instance
  - KM2561 is an instance of the class Flight
  - Fare currency for KM2561 is in Euros
  - Departure date for KM2561 is 24-12-2007
  - Destination of KM2561 is Madrid
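The three steps for the KM2561 flight can be sketched with a plain dictionary standing in for the individual. The property names `fareCurrency` and `destination` and the helper `create_instance` are assumptions made for this example; only the values come from the slide.

```python
from datetime import date, datetime


def create_instance(name, cls, **values):
    """Choose a class, create the individual, fill in its property values."""
    return {"name": name, "type": cls, **values}


km2561 = create_instance(
    "KM2561",
    "Flight",
    fareCurrency="Euro",
    # the slide gives the date in day-month-year order
    departureDate=datetime.strptime("24-12-2007", "%d-%m-%Y").date(),
    destination="Madrid",
)

print(km2561["departureDate"])  # 2007-12-24
```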
8. Check Anomalies
- Detect possible inconsistencies
  - In the ontology itself, or in the ontology together with its instances
- Examples of common inconsistencies
  - incompatible domain and range definitions for transitive, symmetric, or inverse properties
  - cardinality properties
  - requirements on property values can conflict with domain and range restrictions
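One of the listed anomalies, incompatible domain and range definitions for inverse properties, lends itself to a simple check: if q is declared the inverse of p, the domain of p should match the range of q and vice versa. The sketch below uses strict equality of class names, which is a simplification; the properties `locationOf` and `ratingOf` are invented for the example, and `ratingOf` is declared incorrectly on purpose.

```python
PROPERTIES = {
    "hasLocation": {"domain": "Lodging", "range": "Location"},
    "locationOf": {"domain": "Location", "range": "Lodging"},
    "hasRating": {"domain": "Lodging", "range": "Rating"},
    # deliberately wrong: domain and range were not swapped
    "ratingOf": {"domain": "Lodging", "range": "Rating"},
}

INVERSE_OF = [("hasLocation", "locationOf"), ("hasRating", "ratingOf")]


def inverse_anomalies(properties, inverse_pairs):
    """Return the inverse pairs whose domain/range declarations do not mirror."""
    bad = []
    for p, q in inverse_pairs:
        mirrored = (
            properties[p]["domain"] == properties[q]["range"]
            and properties[p]["range"] == properties[q]["domain"]
        )
        if not mirrored:
            bad.append((p, q))
    return bad


print(inverse_anomalies(PROPERTIES, INVERSE_OF))  # [('hasRating', 'ratingOf')]
```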
Ontology Mapping
- Problem:
  - Many ontologies are available that describe the same domain
    - Different ontologies emphasise different aspects
    - Different vocabulary for the same classes/properties
    - Different degree of detail
  - Applications that are based on different ontologies need to collaborate
- We need to translate one ontology into another!
- Manual mapping has high precision, but it does not scale!
- Large ontologies require automatic mapping
Ontology Mapping II
Ontology Mapping - Techniques
- Tag name mapping: names of XML tags are mapped
  - Problem: same expressions for different classes/properties
- Element content: searches for the same or similar content in two instances
  - Problem: text in elements has to be unique and expressive
- Structure analyzer: considers the XML document as a graph and tries to map subgraphs to each other
  - Problem: is structure sufficient to extract the semantics?
- Overall problem: precision
  - Combining techniques is possible and increases precision
  - Other approaches to increase precision:
    - Semi-automatic mapping: tools that support efficient manual mapping by recommending mapping candidates
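Name-based mapping combined with the semi-automatic idea can be sketched as a function that proposes candidates for a human to confirm. The example uses string similarity from Python's standard `difflib` module as a stand-in for whatever measure a real mapping tool applies; the function `suggest_mappings`, the cutoff of 0.6 and the sample names are assumptions.

```python
from difflib import get_close_matches


def suggest_mappings(source_names, target_names, cutoff=0.6):
    """For each source name, propose up to three similar target names."""
    lowered = {t.lower(): t for t in target_names}  # compare case-insensitively
    suggestions = {}
    for name in source_names:
        hits = get_close_matches(name.lower(), list(lowered), n=3, cutoff=cutoff)
        suggestions[name] = [lowered[h] for h in hits]
    return suggestions


suggestions = suggest_mappings(
    ["Lodging", "Train"],
    ["lodging", "Accommodation", "Railway"],
)
print(suggestions["Lodging"])  # ['lodging']
print(suggestions["Train"])    # []
```

Purely name-based matching proposes nothing for terms that are related in meaning but spelled differently, which is one reason the candidates are left for a person to accept or reject.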
Ontology Mapping - Related Techniques
- Ontology merging: join two different ontologies into one coherent ontology
  - The final ontology consists of the full, unchanged source ontologies
- Ontology alignment: generate links between two ontologies (usually with complementary domains)
- Ontology integration: integrate two or more ontologies into one (with some changes)
Merging Example
- Two ontologies, one about the ear and another about the lungs, are merged
- Concepts from the second ontology are added to the preferred ontology one at a time
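The one-at-a-time merge can be sketched with each ontology reduced to a table from concept to its broader concepts. The concept names below are invented placeholders, not the content of the ear and lung ontologies from the slide, and `merge_into` is a hypothetical helper.

```python
def merge_into(preferred, other):
    """Add the concepts of 'other' to a copy of 'preferred', one at a time."""
    merged = {name: set(broader) for name, broader in preferred.items()}
    added = []
    for concept, broader in other.items():
        if concept not in merged:
            merged[concept] = set()
            added.append(concept)  # genuinely new concept
        merged[concept] |= set(broader)  # shared concepts keep both sets of links
    return merged, added


# concept -> broader concepts (placeholder content)
ear = {"Organ": set(), "Ear": {"Organ"}, "Cochlea": {"Ear"}}
lungs = {"Organ": set(), "Lung": {"Organ"}, "Bronchus": {"Lung"}}

merged, added = merge_into(ear, lungs)
print(added)  # ['Lung', 'Bronchus']
```

The shared concept Organ is kept once, and the preferred ontology itself is left unmodified because the merge works on a copy.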
Suggested Reading
- Jena RDF tutorial, http://jena.sourceforge.net/documentation.html
- Ontology languages for the Semantic Web, http://www.cs.umbc.edu/771/papers/ieeeIntelligentSystems/webservices/ontologyLanguages.pdf
- Ontology development 101: A guide to creating your first ontology
- OWL guide, http://www.w3.org/TR/owl-guide/
Next Lecture
- OWL: Web Ontology Language
Ontology Building Methodologies
- Uschold and King method: uses the following process
  - Identify the purpose of the ontology
  - Build it
  - Evaluate it
  - Document it
- Gruninger and Fox method: uses the following process
  - Identify the main scenarios (i.e. possible applications for the ontology)
  - Identify a set of competency questions (to determine scope)
  - Use the answers to extract the main concepts, properties, relations and axioms
  - Formally express the knowledge in first-order logic
Ontology Building Methodologies
- Methontology: based on the IEEE standard for software development
- Three categories of activities are advised
  - Ontology management activities, such as scheduling, control, quality assurance
  - Ontology development-oriented activities
    - Pre-development: environment study, feasibility study
    - Development: specification, conceptualisation, formalisation and implementation
    - Post-development: maintenance
  - Ontology support activities
    - Knowledge acquisition, evaluation, integration, merging, alignment
Ontology Languages
- Ontolingua and KIF (Knowledge Interchange Format)
  - Ontolingua is an ontology language based on KIF and the Frame Ontology
  - KIF is based on first-order predicate calculus
- OCML (Operational Conceptual Modelling Language)
  - Frame-based language with Lisp-like syntax
  - Provides primitives to define classes, relations, functions, axioms and instances
  - Also provides primitives to define rules (for backward and forward chaining)
- F-Logic (Frame Logic)
  - Integrates features from OO, frame-based KR and first-order logic
  - Features: inheritance, polymorphic types, query methods and encapsulation
  - Can be combined with HiLog to improve ontology reasoning