Week 7: Semantic Web and Semantic Search Arantza Aldea Ontology developed by Dr. David Sutton
Some reading
- David Amerland, Google Semantic Search
- W3.org, Semantic Web section
- Google, How Search Works
- Google Knowledge Graph
- Cambridge Semantics, Semantic Search and Semantic Web
- Simon Penson, The Search Engine Watch Blog
Content
- How Semantic Search Works
- Google Knowledge Graph
- Entity Extraction and the Semantic Web
- Semantic Search and Semantic Web
- The role of ontologies on the Semantic Web
- How to develop an ontology
Semantic Search
Summary of Semantic Search steps
- Crawling
- Indexing
  - Indexing looks for semantic tags to get as much information as possible about what the web page is about
  - Entity Extraction
  - The Knowledge Graph
- Extraction of the most relevant information from the index
- Semantic search means:
  - Auto-complete of the query
  - Understanding of the meaning of the query
  - Ranked list of results
The Knowledge Graph
Indexing: Semantic Entity Extraction
- Entity Detection: convert raw data in web pages into a meaningful entity
  - Sentence analysis and segmentation
  - Handling synonyms
- Relation Detection: understand the meaning of the words and put them into context
  - Linked data
- A Semantic Entity is created
Semantic Document Parsing
- Semantic tags can be added to web pages to facilitate Semantic Entity Extraction
- Semantic tools available:
  - XML
  - Microdata
  - RDF, RDFa
  - Web Ontology Language (OWL)
- Google recommends rich snippets using microformats, microdata, and RDFa
- Rich snippets are a few lines of text that appear under every search result
- www.schema.org
- Microdata draws on ideas from RDFa and microformats
Query Understanding
- Understand what the user wants
- Extract the meaningful words from the query
- Context of the query:
  - User profile
  - Location
  - Current trends
- Ontologies play an important role in semantic search: the Knowledge Graph is based on them
The Semantic Search and the Semantic Web
- Semantic search aims to understand the query and return meaningful information, taking into account:
  - Context
  - Location
  - User profile
  - Meaning of the words
- The Semantic Web is a set of technologies used to represent, query and store information
- Semantic search uses some Semantic Web technologies
- Schema.org represents a merger between semantic search and the Semantic Web
Semantic Web technologies
- Linked Data: Web of Data (RDF, RDFa)
- Inference: reasoning about data through rules (OWL, Rule Interchange Format)
- Vocabularies: Ontology (RDF/OWL, Turtle)
- Queries: SPARQL
Ontology Development What is an ontology and how to build one
What is an Ontology?
Some definitions:
- An ontology is a data model that represents a set of concepts within a domain and the relationships between those concepts (Wikipedia).
- An ontology is a specification of a conceptualisation (Gruber 1993).
- [Ontologies are] “explicit formal specifications of the terms in the domain and relations among them” (Noy and McGuinness)
How do we create an ontology?
Various languages and development environments have been created in order to allow the formal expression of ontologies. These include:
- Frame-based systems (e.g. Protégé Frames)
- Web-based systems (e.g. OWL)
- Systems based on logic (e.g. OWL-DL)
Note that these terms are not mutually exclusive. For instance OWL-DL is a web-based language that uses a particular kind of logic known as description logic.
What Does An Ontology Describe? Concepts (classes). Properties of concepts and relationships between concepts. Constraints on properties and relationships. Instances (sometimes but not always)
Semantic Languages
- XML
  - From HTML to XML
  - XML document structure
  - Relationships to other web knowledge representations
- RDF
  - Uniform Resource Identifiers
  - RDF statements
  - Representation of sets of RDF statements
  - Literals, types and datatypes
  - Containers
- RDFS
- OWL
An HTML Document
    <html>
      <head>
        <title>David Sutton's Contact Details</title>
      </head>
      <body>
        <h1>Contact Details</h1>
        <p>My name is <strong>David Sutton</strong>. My office is <i>WHE – T2.13</i>, and my telephone number is <em>+44 (0) 1865 484576</em>.</p>
      </body>
    </html>
Angle brackets <..> enclose HTML tags.
Markup For Structure. HTML marks up text in order to help the browser display that document to a human user. An HTML document can only contain predefined HTML tags. XML is one of a series of technologies that allow us to mark up text in ways that allow it to be processed by computers. It is extensible in that you can define your own tags.
An XML Document
    <?xml version="1.0"?>
    <contact staffid="p0012345">
      <name>David Sutton</name>
      <address>WHE–T2.13</address>
      <phone>+44 (0) 1865484576</phone>
    </contact>
Contains user-defined tags.
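Because the tags are user-defined, a program can pick out each field by name. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name read_contact is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The contact document from the slide, with the XML declaration properly closed.
CONTACT_XML = """<?xml version="1.0"?>
<contact staffid="p0012345">
  <name>David Sutton</name>
  <address>WHE–T2.13</address>
  <phone>+44 (0) 1865484576</phone>
</contact>"""

def read_contact(xml_text):
    """Return the attribute and child elements of <contact> as a dictionary."""
    root = ET.fromstring(xml_text)
    record = {"staffid": root.get("staffid")}
    for child in root:
        # Each user-defined tag becomes a key, its text content the value.
        record[child.tag] = child.text
    return record

contact = read_contact(CONTACT_XML)
print(contact["name"], contact["phone"])
```

The same lookup against the HTML version would have to guess which strong or em element holds the name, which is the point of marking up structure rather than presentation.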
Technologies for Representing Data and Knowledge on the Web
- The Extensible Markup Language (XML) allows users to define their own tags to describe structured data.
- The Resource Description Framework (RDF) allows data to be described in terms of entities, attributes, and relationships.
- RDF Schema (RDFS) allows entities to be arranged into classes and subclasses.
- The Web Ontology Language (OWL) provides a rich language for describing properties and classes.
Valid Documents. XML provides two mechanisms for defining the structure of an XML document, i.e. what tags are allowed and what attributes and content such tags can have. A Document Type Definition (DTD) defines the structure using an EBNF grammar. An XML Schema uses XML itself to define the structure. A document is valid if, and only if, it conforms to the structure defined by its schema or DTD.
Example
DTD:
    <!-- contact.dtd -->
    <!ELEMENT contact (name)>
    <!ELEMENT name (#PCDATA)>
Document:
    <?xml version="1.0"?>
    <!DOCTYPE contact SYSTEM "contact.dtd">
    <kontact>
      <kname>Peter Marshall</kname>
    </kontact>
The document is invalid because the DTD does not define a “kontact” tag.
N.B. This example uses a DTD rather than a schema solely because it was easier to fit it on the slide that way!
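The difference between a well-formed document and a valid one can be sketched in a few lines. Python's standard library does not validate against DTDs, so the ALLOWED table below is a hand-written stand-in for contact.dtd (an assumption for illustration, not a real DTD processor), and the DOCTYPE line is left out of the test strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for contact.dtd: each element name maps to the
# child elements it must contain, in order. An empty list means text only.
ALLOWED = {"contact": ["name"], "name": []}

def is_well_formed(xml_text):
    """True if the text obeys XML syntax, whatever tags it uses."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def is_valid(xml_text, root_name="contact"):
    """True if the document also follows the structure in ALLOWED."""
    root = ET.fromstring(xml_text)
    if root.tag != root_name:
        return False

    def check(element):
        if element.tag not in ALLOWED:
            return False
        children = list(element)
        if [c.tag for c in children] != ALLOWED[element.tag]:
            return False
        return all(check(c) for c in children)

    return check(root)

GOOD = "<contact><name>Peter Marshall</name></contact>"
BAD = "<kontact><kname>Peter Marshall</kname></kontact>"

print(is_well_formed(BAD), is_valid(BAD), is_valid(GOOD))
```

The “kontact” document parses without error, so it is well-formed, but it fails the structural check, which mirrors the distinction the slide draws.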
From XML to RDF XML allows us to make statements about entities and their properties. It would be useful to be able to: Indicate that two separate statements are describing the same entity. Use properties to describe relationships between entities. Make statements about properties themselves, e.g. give them natural language descriptions. The Resource Description Framework (RDF) provides us with mechanisms for doing these things.
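The three wishes listed above can be illustrated with plain subject-predicate-object tuples. This is a sketch only: the example.org URIs and the property names are invented for illustration and do not come from any real vocabulary.

```python
# Hypothetical URIs, used for illustration only.
SUTTON = "http://example.org/staff/p0012345"
BROOKES = "http://example.org/org/brookes"
EX = "http://example.org/terms/"

statements = [
    # Two separate statements about the same entity share one URI.
    (SUTTON, EX + "name", "David Sutton"),
    (SUTTON, EX + "office", "WHE-T2.13"),
    # A property relating one entity to another entity.
    (SUTTON, EX + "worksFor", BROOKES),
    # A statement about a property itself: a natural language description.
    (EX + "office", EX + "description", "The room in which a member of staff works"),
]

def describe(resource, triples):
    """Collect every predicate/object pair whose subject is the given resource."""
    return {p: o for s, p, o in triples if s == resource}

print(describe(SUTTON, statements))
```

Because the subject is a URI rather than a position in a document, statements from different sources can be merged simply by pooling the tuples.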
RDF
- RDFa can be used to describe (add meaning to) specific HTML information on the web.
- RDFa is added to web pages in order to make them understandable for computers as well as people.
- By adding RDFa, browsers, search engines, and other software can understand more about the pages, and thus return better results.
- RDF is used to talk about entities and their properties: persons, products, events.
- RDF requires an RDF Schema that defines the entities and their properties.
- To indicate that a part of a web page is an instance of an entity, attributes are added to HTML tags such as span and div.
RDF example
    <div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
      My name is <span property="v:name">Muhammad Younas</span>
      People call me <span property="v:lastname">Younas</span>
      Here is my homepage:
      <a href="http://cms.brookes.ac.uk/staff/MYounas" rel="v:url">http://cms.brookes.ac.uk/staff/MYounas</a>
      I live in Oxford, Oxfordshire, and work as a <span property="v:title">lecturer</span>
      at <span property="v:affiliation">Oxford Brookes University</span>
    </div>
Extracted from week 5 slides
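To see how software might read such markup, here is a rough sketch using Python's standard html.parser module. It only collects the text of elements that carry a property attribute; it is not a full RDFa processor (it ignores rel, typeof and nesting), and the class name PropertyExtractor is made up for this example.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect the text of every element that carries a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.current = None   # property name of the element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self.current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self.current:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: assumes annotated elements are not nested.
        self.current = None

SNIPPET = """<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  My name is <span property="v:name">Muhammad Younas</span>
  and I work as a <span property="v:title">lecturer</span>
  at <span property="v:affiliation">Oxford Brookes University</span>
</div>"""

extractor = PropertyExtractor()
extractor.feed(SNIPPET)
person = extractor.found
print(person)
```

The surrounding sentence text is ignored; only the annotated spans end up in the extracted record, which is what makes the page easier for a search engine to interpret.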
From RDF to RDF Schema
RDF allows us to represent resources and their properties. What we cannot do yet is:
- Group resources into classes.
- Specify the properties that we expect resources of a given class to have.
- Arrange classes into inheritance hierarchies.
These facilities are provided by RDF Schema (RDFS).
Properties
A property of a class is described by a resource whose type is the predefined URI rdf:Property. To indicate that a property applies to a particular class we use the predicate rdfs:domain. To indicate that the values of a property are instances of a particular class or datatype, we use the predicate rdfs:range.
Example graph, written as triples:
    exs:Person  rdf:type     rdfs:Class
    exs:Book    rdf:type     rdfs:Class
    exs:author  rdf:type     rdf:Property
    exs:author  rdfs:domain  exs:Book
    exs:author  rdfs:range   exs:Person
Subclasses
We use the predicate rdfs:subClassOf to indicate that one class is a subclass of another.
Example graph, written as triples:
    exs:Vehicle  rdf:type         rdfs:Class
    exs:Car      rdf:type         rdfs:Class
    exs:Bus      rdf:type         rdfs:Class
    exs:Car      rdfs:subClassOf  exs:Vehicle
    exs:Bus      rdfs:subClassOf  exs:Vehicle
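One consequence of rdfs:subClassOf is that an instance of a subclass is also an instance of every superclass. The sketch below works over plain tuples rather than a real RDF library; the instance exs:myCar is hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

triples = {
    ("exs:Car", SUBCLASS, "exs:Vehicle"),
    ("exs:Bus", SUBCLASS, "exs:Vehicle"),
    ("exs:myCar", RDF_TYPE, "exs:Car"),   # hypothetical instance
}

def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def types_of(resource, triples):
    """Declared types of a resource plus the types implied by the hierarchy."""
    direct = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return direct | inferred

print(sorted(types_of("exs:myCar", triples)))
```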
Interpreting RDF Schema
Although RDF Schema resembles the type systems of object-oriented programming languages, there are some important differences. The most important of these is that OO class definitions are interpreted as constraints on objects, whereas RDF Schema can be interpreted more freely. For instance, suppose that an RDF Schema defines the range of property exs:author to be the class exs:Person. An application processing this RDF can interpret this as either:
- A constraint: the author of a book must be explicitly declared to be a Person.
- A rule of inference: the application will infer that the author of a book is a Person, even if no explicit statement to that effect has been made.
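The two readings of rdfs:range can be put side by side in code. This is a sketch over plain tuples; the resources exs:book1, exs:book2, exs:alice and exs:bob are invented for illustration.

```python
RDF_TYPE = "rdf:type"

# rdfs:range of exs:author, as in the example above.
schema_range = {"exs:author": "exs:Person"}

data = {
    ("exs:book1", "exs:author", "exs:alice"),   # hypothetical resources
    ("exs:book2", "exs:author", "exs:bob"),
    ("exs:alice", RDF_TYPE, "exs:Person"),
}

def range_violations(triples, ranges):
    """Constraint reading: report values not explicitly declared to be in range."""
    return sorted(
        o for s, p, o in triples
        if p in ranges and (o, RDF_TYPE, ranges[p]) not in triples
    )

def infer_range_types(triples, ranges):
    """Inference reading: add the missing rdf:type statements instead."""
    inferred = {(o, RDF_TYPE, ranges[p]) for s, p, o in triples if p in ranges}
    return triples | inferred

print(range_violations(data, schema_range))
print(("exs:bob", RDF_TYPE, "exs:Person") in infer_range_types(data, schema_range))
```

Under the constraint reading exs:bob is reported as a problem; under the inference reading the same data silently gains a new statement that exs:bob is a Person.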
OWL: The Web Ontology Language.
RDFS and RDF allow us to define classes, properties, and instances. However it would be useful to be able to make more complex statements about classes and properties, and draw inferences from these statements. Examples:
- Restrict the cardinality of a property: e.g. “An instance of class Person has a mother property with exactly one value”.
- State that one property is the inverse of another: e.g. “if X is the parent of Y then Y is the child of X”.
- Indicate transitivity of properties: e.g. “if X is the ancestor of Y and Y is the ancestor of Z then X is the ancestor of Z”.
The Web Ontology Language (OWL) defines RDF resources that allow us to make such statements, and defines the inferences that can be drawn from them.
Species of OWL
OWL comes in three varieties.
- OWL Full: Any set of RDF statements can be interpreted as an OWL Full ontology. However there is no guarantee that any reasoning software will be able to work out all the inferences that can be drawn from an OWL Full ontology.
- OWL DL: Only some RDF statements are allowed in an OWL DL ontology. These restrictions mean that in principle it is possible to construct reasoning software that correctly processes all inferences of the ontology.
- OWL Lite: An entry-level version of OWL that provides simple classification and constraint facilities.
Which to use?
- OWL Lite vs OWL DL: both can have full reasoning support, so the question is whether you need the more expressive constructs provided by OWL DL.
- OWL DL vs OWL Full: if you require full reasoning support choose OWL DL; if you require the meta-modelling facilities of RDF Schema (defining classes of classes, attaching properties to classes) choose OWL Full.
OWL Lite Provides… RDF Schema Features
- Class (Thing, Nothing)
- rdfs:subClassOf
- rdf:Property
- rdfs:subPropertyOf
- rdfs:domain
- rdfs:range
- Individual
Class
A class defines a group of individuals that belong together because they share some properties. There is a built-in most general class: Thing. There is a built-in most specific class: Nothing. Here we define some root classes (owl:Class is a subclass of rdfs:Class).
OWL/XML:
    <Declaration>
      <Class IRI="#Monster"/>
    </Declaration>
RDF/XML:
    <owl:Class rdf:about="http://www.brookes.ac.uk/p0073862/Ontology1.owl#Monster">
    </owl:Class>
subClassOf
Used to create hierarchies.
OWL/XML (Dragon is a subclass of Monster):
    <SubClassOf>
      <Class IRI="#Dragon"/>
      <Class IRI="#Monster"/>
    </SubClassOf>
RDF/XML (Humanoid is a subclass of Monster):
    <owl:Class rdf:about="http://www.brookes.ac.uk/p0073862/Ontology1.owl#Humanoid">
      <rdfs:subClassOf rdf:resource="http://www.brookes.ac.uk/p0073862/Ontology1.owl#Monster"/>
    </owl:Class>
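Since RDF/XML is ordinary XML, the hierarchy can be read back with a namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree and handles only this one serialisation pattern (owl:Class with nested rdfs:subClassOf); it is not a general RDF/XML parser, and the wrapping rdf:RDF element is assumed.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
BASE = "http://www.brookes.ac.uk/p0073862/Ontology1.owl#"

RDF_XML = f"""<rdf:RDF xmlns:rdf="{NS['rdf']}" xmlns:rdfs="{NS['rdfs']}" xmlns:owl="{NS['owl']}">
  <owl:Class rdf:about="{BASE}Monster"/>
  <owl:Class rdf:about="{BASE}Humanoid">
    <rdfs:subClassOf rdf:resource="{BASE}Monster"/>
  </owl:Class>
</rdf:RDF>"""

def subclass_pairs(rdf_xml):
    """Return (subclass, superclass) URI pairs found in the document."""
    root = ET.fromstring(rdf_xml)
    # ElementTree spells namespaced attributes as {namespace-uri}localname.
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    pairs = []
    for cls in root.findall("owl:Class", NS):
        for parent in cls.findall("rdfs:subClassOf", NS):
            pairs.append((cls.get(about), parent.get(resource)))
    return pairs

print(subclass_pairs(RDF_XML))
```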
Defining Classes in Protégé
Disjoint Classes In OWL classes are not assumed to be disjoint. In other words a given individual can be a member of two classes that are not related by inheritance (e.g. it could be both a Dragon and a Humanoid). We can make them disjoint by using the Disjoint widget in Protégé to add disjointWith statements.
Disjoint Classes Disjoints Widget
Disjoint Classes
OWL/XML:
    <DisjointClasses>
      <Class IRI="#Dragon"/>
      <Class IRI="#Humanoid"/>
    </DisjointClasses>
RDF/XML:
    <owl:Class rdf:about="http://www.brookes.ac.uk/p0073862/Ontology1.owl#Dragon">
      <rdfs:subClassOf rdf:resource="http://www.brookes.ac.uk/p0073862/Ontology1.owl#Monster"/>
      <owl:disjointWith rdf:resource="http://www.brookes.ac.uk/p0073862/Ontology1.owl#Humanoid"/>
    </owl:Class>
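Once two classes are declared disjoint, an individual that belongs to both makes the ontology inconsistent. A reasoner would detect this; the sketch below imitates the check over a plain dictionary, with hypothetical individuals smaug and chimera.

```python
# Each frozenset names a pair of classes declared disjoint.
disjoint = {frozenset({"Dragon", "Humanoid"})}

# Hypothetical individuals and the classes they have been asserted to belong to.
membership = {
    "smaug": {"Dragon"},
    "chimera": {"Dragon", "Humanoid"},
}

def clashes(membership, disjoint):
    """Individuals asserted to belong to both classes of some disjoint pair."""
    problems = []
    for individual, classes in sorted(membership.items()):
        for pair in disjoint:
            if pair <= classes:   # both disjoint classes are present
                problems.append(individual)
    return problems

print(clashes(membership, disjoint))
```

Without the disjointness statement the same data would be perfectly acceptable, since OWL does not assume that unrelated classes exclude one another.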
Properties
Two main types of property:
- Object properties link an individual to another individual.
- Datatype properties link an individual to a literal value (e.g. an integer or a string).
    <Declaration>
      <ObjectProperty IRI="#isWeapon"/>
    </Declaration>
    <Declaration>
      <DataProperty IRI="#hasIntelligence"/>
    </Declaration>
Properties
Characteristics of Properties
Properties in OWL can have a variety of different characteristics:
- A property can have an inverse (as in frame-based ontologies).
- A property can be functional, that is to say it can only have one value for a given individual. For instance a student can have only one surname.
- A property can be inverse functional, that is to say that its inverse is functional.
- A property can be transitive in the sense that if A is related to B and B is related to C then A is related to C. For example if Babs is the sister of Joy, and Teddie is the sister of Babs, then Teddie is also the sister of Joy.
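The characteristics above can be mimicked with a few functions over tuples. This is a sketch, not how an OWL reasoner is implemented; the property names isSisterOf and hasSurname and the individuals Smaug and LonelyMountain are assumptions for illustration. Note also that the functional check treats two values as an error, whereas a reasoner would first try to conclude that the two values denote the same thing.

```python
def apply_inverse(triples, prop, inverse):
    """If (x prop y) holds, add (y inverse x), and vice versa."""
    extra = {(o, inverse, s) for s, p, o in triples if p == prop}
    extra |= {(o, prop, s) for s, p, o in triples if p == inverse}
    return triples | extra

def apply_transitive(triples, prop):
    """Repeat until no new (a prop c) can be derived from (a prop b), (b prop c)."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for s, p, o in result if p == prop]
        for a, b in links:
            for c, d in links:
                if b == c and (a, prop, d) not in result:
                    result.add((a, prop, d))
                    changed = True
    return result

def functional_violations(triples, prop):
    """Subjects that have more than one value for a functional property."""
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return sorted(s for s, objs in values.items() if len(objs) > 1)

facts = {
    ("Teddie", "isSisterOf", "Babs"),
    ("Babs", "isSisterOf", "Joy"),
    ("Smaug", "hasLair", "LonelyMountain"),   # hypothetical individuals
}
facts = apply_transitive(facts, "isSisterOf")
facts = apply_inverse(facts, "hasLair", "isLairOf")
print(("Teddie", "isSisterOf", "Joy") in facts)
```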
Characteristics of Properties
Characteristics of Properties
    <owl:ObjectProperty rdf:ID="hasLair">
      <owl:inverseOf rdf:resource="#isLairOf"/>
    </owl:ObjectProperty>
    <owl:ObjectProperty rdf:ID="hasSibling">
      <rdf:type rdf:resource="&owl;TransitiveProperty"/>
    </owl:ObjectProperty>
Property Restrictions We can place restrictions on the individuals that belong to a class by imposing conditions on the values of their properties. These restrictions include: Existential restrictions, where we insist that a certain property must have at least one value of a certain kind. Universal restrictions, where we insist that all the values of a certain property must be of a certain kind.
Existential Restrictions
We can impose a restriction that all dragons must have at least one lair by:
- Creating an anonymous class consisting of all individuals with at least one lair, and
- Imposing a restriction that the named class Dragon is a subclass of this anonymous class.
Existential Restrictions
Existential Restrictions
    <owl:Class rdf:ID="Dragon">
      <rdfs:subClassOf rdf:resource="#Monster"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasLair"/>
          <owl:someValuesFrom rdf:resource="#Lair"/>
        </owl:Restriction>
      </rdfs:subClassOf>
      <owl:disjointWith rdf:resource="#Humanoid"/>
    </owl:Class>
Universal Restrictions We can create a class Warrior, and impose a restriction that the only possessions that a Warrior may have are his weapons by Creating an anonymous class consisting of all individuals whose only possessions are weapons, and Making the Warrior class a subclass of this anonymous class.
Universal Restrictions Superclasses of Warrior
Universal Restrictions
    <owl:Class rdf:ID="Warrior">
      <rdfs:subClassOf rdf:resource="#Humanoid"/>
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasPossession"/>
          <owl:allValuesFrom rdf:resource="#Weapon"/>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>
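The difference between someValuesFrom and allValuesFrom can be sketched as two small checks. This sketch reads the restrictions as closed-world tests over the listed facts only. An OWL reasoner works under the open-world assumption, so it would not flag a dragon with no recorded lair, and it might infer that a warrior's possession is a Weapon rather than report a violation. The individuals below are hypothetical.

```python
# Hypothetical individuals and their asserted classes.
types = {
    "smaug": {"Dragon"}, "erebor": {"Lair"},
    "conan": {"Warrior"}, "sword": {"Weapon"}, "lute": {"Instrument"},
}
facts = {
    ("smaug", "hasLair", "erebor"),
    ("conan", "hasPossession", "sword"),
    ("conan", "hasPossession", "lute"),
}

def values(subject, prop):
    """All recorded values of prop for the given subject."""
    return {o for s, p, o in facts if s == subject and p == prop}

def satisfies_some(subject, prop, cls):
    """someValuesFrom: at least one value of prop belongs to cls."""
    return any(cls in types.get(v, set()) for v in values(subject, prop))

def satisfies_all(subject, prop, cls):
    """allValuesFrom: every value of prop belongs to cls (true if there are none)."""
    return all(cls in types.get(v, set()) for v in values(subject, prop))

print(satisfies_some("smaug", "hasLair", "Lair"))
print(satisfies_all("conan", "hasPossession", "Weapon"))
```

Note that satisfies_all is true for an individual with no possessions at all, which matches the meaning of a universal restriction: it limits what the values may be without requiring that any exist.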
Summary of XML XML can be used to mark up documents in ways that reveal their structure to applications that process them. XML documents are structured by dividing them into elements. Each element begins with a start tag and ends with an end tag. The start tag of an element may define values for attributes of that element. An XML document contains a single root element. An XML document is well-formed if it conforms to the syntax of XML. It is valid if it uses tags in ways that are allowed by an associated DTD or Schema. XSLT stylesheets may be used to convert XML documents into other forms (e.g. into HTML documents). RDF, RDFS, and OWL are technologies that use XML to transmit information over the Web in ways that allow this information to be processed by computers.
Summary of RDF An RDF document contains a set of statements. Each statement has a subject, a predicate, and an object. Statements can be represented as graphs, as triples, and in XML. RDF uses URIs to identify entities (which it refers to as “resources”). The object of an RDF statement can be a URI, a typed or plain literal, or a container. A container can be a Bag, a Seq, or an Alt.
Summary of RDFS and OWL RDFS allows us to define classes of resources, to indicate subclass and instance relationships and to describe properties of classes. OWL allows us to make statements about classes from which reasoning software may draw inferences. OWL comes in three varieties: OWL Full, OWL DL, and OWL Lite.