1 SemSearch: A Search Engine for the Semantic Web
Yuangui Lei, Victoria Uren, Enrico Motta
Knowledge Media Institute, The Open University
EKAW 2006
Presented by Jungyeon, Yang

2 Copyright 2008 by CEBT Outline
- Research background
- SemSearch overview
- Query interface
- Search process
- Implementation & examples
- Conclusions

3 Research background
- Semantic search: extending traditional search with semantic web technology
  - Exploiting the explicit meaning of documents (i.e., ontology-based metadata)
- Current semantic search tools
  - Form-based, e.g., SHOE, Magnet
  - QA-based, e.g., AquaLog, ORAKEL
  - Keyword-based, e.g., TAP, Squiggle, DOSE

4 Support for ordinary end users
- Form-based tools
  - Forms are intuitive
  - Issues: knowledge overhead; scalability
- QA-based tools
  - Easy to use
  - Issue: heavy NLP
- Keyword-based tools
  - Easy to post queries; quick response
  - Issues: typically one keyword only; general knowledge of the problem domain required

5 The goal of our search engine
- Hide the complexity of semantic search from end users:
  - Low barrier to access: easy to post queries
    – Avoiding the form-based routine
  - Dealing with relatively complex queries
    – Supporting multiple keywords
  - Precise and self-explanatory results:
    – Results satisfy user queries
    – Results are easy to understand
  - Quick response
    – Avoiding linguistic processing

6 SemSearch Architecture
[Figure: SemSearch architecture, labelled with end users, layers, and components]
- Layers:
  - Google-like User Interface Layer
  - Text Search Layer
  - Semantic Query Layer
  - Formal Query Language Layer (SPARQL, SeRQL, etc.)
  - Semantic Data Layer
- Components:
  - Google-like query interface
  - Semantic entity indexing engine
  - Semantic entity search engine
  - Formal query construction engine
  - Query engine
  - Ranking engine

7 The Google-like query interface
- Extending traditional keyword search languages by allowing the specification of:
  - The queried subject (the type of expected search results)
  - The combination of keywords
- Three operators are used:
  - “:” captures the query subject
  - “and”/“or” specify the combination of keywords
- Query formats:
  - One keyword: finding entities that have relations with the keyword match
  - Multiple keywords: “subject:keyword1 and/or keyword2 and/or keyword3”, e.g., “ ”
- Advantages:
  - More flexible than form-based query interfaces
  - More powerful than state-of-the-art keyword-based semantic search interfaces
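The query format above can be illustrated with a small parser. This is only a sketch under assumptions: the names `ParsedQuery` and `parse_query` are hypothetical, the query strings are made up for illustration, and the slides do not say how mixed "and"/"or" sequences are grouped, so the operators are simply recorded in order.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParsedQuery:
    subject: Optional[str]   # text before ":", or None if no subject was given
    keywords: List[str]      # keywords in the order they were typed
    operators: List[str]     # operators[i] joins keywords[i] and keywords[i + 1]


def parse_query(text: str) -> ParsedQuery:
    """Split 'subject:keyword1 and/or keyword2 ...' into its parts."""
    subject: Optional[str] = None
    rest = text.strip()
    if ":" in rest:
        head, rest = rest.split(":", 1)
        subject = head.strip() or None
    # The capturing group keeps the operators in the result list, so
    # keywords and operators alternate: kw, op, kw, op, kw ...
    parts = re.split(r"\s+(and|or)\s+", rest.strip(), flags=re.IGNORECASE)
    keywords = [p.strip() for p in parts[0::2]]
    operators = [p.lower() for p in parts[1::2]]
    if any(not k for k in keywords):
        raise ValueError(f"empty keyword in query: {text!r}")
    return ParsedQuery(subject, keywords, operators)
```

Multi-word keywords such as "phd students" stay intact because the split only happens on a standalone "and" or "or" surrounded by whitespace.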

8 The search process
- Step 1: Making sense of the user queries
- Step 2: Translating user queries into formal queries
- Step 3: Querying the back-end semantic data repository
- Step 4: Ranking the query results

9 Making sense of user queries
- Finding out the semantic meaning of keywords
  - Class (e.g., the keyword “phd students”)
  - Relation (e.g., “author”)
  - Instance (e.g., “Enrico”, “KMi director”)
- Method: text search over entity labels (rdfs:label)
  - Short literals are also used when matching instances
    – When searching for “KMi director”, the matching instances can be picked up
- Two components in the search engine
  - The semantic entity index engine
  - The semantic entity search engine
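The label-matching step can be sketched as a tiny in-memory inverted index. Everything here is an assumption made for illustration: the `Entity` and `LabelIndex` names, the `ex:` URIs, and the rule that every token of the keyword must occur in the label. The slides only state that labels and short literals are text-searched.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Entity:
    uri: str    # hypothetical identifier
    kind: str   # "class", "property", or "instance"
    label: str  # rdfs:label, or a short literal for instances


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LabelIndex:
    """Index engine (add) and search engine (search) in one toy class."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Entity]] = defaultdict(set)

    def add(self, entity: Entity) -> None:
        for token in tokenize(entity.label):
            self._postings[token].add(entity)

    def search(self, keyword: str) -> List[Entity]:
        tokens = tokenize(keyword)
        if not tokens:
            return []
        # Assumed matching rule: an entity matches when its label
        # contains every token of the keyword.
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits, key=lambda e: (e.uri, e.label))
```

Indexing a short literal such as "KMi director" as a second entry for the same instance is what lets a search for that phrase return the instance.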

10 Translating user queries into formal queries
- The search engine takes as input the semantic matches of the user's search terms
- The search engine outputs an appropriate formal query according to the semantic meanings of the keywords
- One user query, in which each keyword has multiple matches -> search engine -> multiple formal queries
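Why one user query leads to several formal queries can be shown with a Cartesian product over the matches of each keyword. This is a sketch only: `enumerate_combinations` and the `ex:` identifiers are hypothetical, and the slides do not state how SemSearch enumerates the combinations internally.

```python
from itertools import product
from typing import Dict, List


def enumerate_combinations(matches: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return one keyword-to-entity assignment per candidate formal query."""
    keywords = list(matches)
    return [
        dict(zip(keywords, combo))
        for combo in product(*(matches[k] for k in keywords))
    ]


# Hypothetical matches for three keywords of a single user query.
example_matches = {
    "news": ["ex:News", "ex:NewsStory"],
    "phd students": ["ex:PhdStudent"],
    "enrico": ["ex:enrico-motta", "ex:enrico-other"],
}
combinations = enumerate_combinations(example_matches)
```

The number of combinations is the product of the match counts per keyword (here 2 x 1 x 2), which is why the count grows quickly as keywords and matches are added.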

11 Simple user queries
- There are only two keywords involved:
- Fixed number of combination types
  [Table: Subject match | Keyword match | Example; cells in extraction order: Class, Property, Instance, Instance, Property, Instance, Property, Instance, Property]
- SeRQL query templates are defined for these combination types

12 A template example
- Pattern: Subject -> Class Cs; Keyword -> Class Ck
- Results: associated with exploratory links
- Example: news stories about phd students
- A simplified template in Sesame SeRQL:

    select {Is}, {R}, {Ik}
    from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Is} R {Ik}
    union
    select {Is}, {R}, {Ik}
    from {Is} rdf:type {Cs}, {Ik} rdf:type {Ck}, {Ik} R {Is}
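Filling the simplified class-class template can be sketched with Python's `string.Template`. The template text is copied from the slide as written; the function name `build_class_class_query` and the `ex:` class names are assumptions, and the output is a string in the slide's simplified notation, not a query verified against a Sesame repository.

```python
from string import Template

# The slide's simplified template, with Cs and Ck turned into placeholders.
CLASS_CLASS_TEMPLATE = Template(
    "select {Is}, {R}, {Ik} "
    "from {Is} rdf:type {$cs}, {Ik} rdf:type {$ck}, {Is} R {Ik} "
    "union "
    "select {Is}, {R}, {Ik} "
    "from {Is} rdf:type {$cs}, {Ik} rdf:type {$ck}, {Ik} R {Is}"
)


def build_class_class_query(subject_class: str, keyword_class: str) -> str:
    """Instantiate the template for a subject class Cs and a keyword class Ck."""
    return CLASS_CLASS_TEMPLATE.substitute(cs=subject_class, ck=keyword_class)


# Hypothetical classes for "news stories about phd students".
query = build_class_class_query("ex:NewsStory", "ex:PhdStudent")
```

The two halves joined by `union` differ only in the direction of the relation `R`, so instances are found whichever way the relation is stated in the data.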

13 Complex user queries
- Instances of the subject that either have relations with all the keywords or have relations with some of the keywords
- Operational problem: the number of combinations grows large when many keywords are involved and each keyword has many matches
- Rules for combination reduction:
  - Only consider class entities as matches for the subject keyword
  - Choose the matches that are as close as possible to the keyword
  - Choose the most specific class match among the class matches
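The three reduction rules might be applied along these lines. This is a sketch under loud assumptions: the `Match` fields `score` (closeness to the keyword) and `depth` (position in the class hierarchy, larger meaning more specific) are invented for illustration, as are the `top_n` cut-off and the way the rules are combined. The slides do not define how closeness or specificity are measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    uri: str
    kind: str         # "class", "property", or "instance"
    score: float      # assumed closeness to the keyword; higher is closer
    depth: int = 0    # assumed class-hierarchy depth; larger is more specific


def reduce_subject_matches(matches: List[Match]) -> List[Match]:
    """Rules 1 and 3: keep only class matches, then the most specific one."""
    classes = [m for m in matches if m.kind == "class"]
    return [max(classes, key=lambda m: m.depth)] if classes else []


def reduce_keyword_matches(matches: List[Match], top_n: int = 2) -> List[Match]:
    """Rules 2 and 3: keep the closest matches, collapsing classes to one."""
    closest = sorted(matches, key=lambda m: m.score, reverse=True)[:top_n]
    classes = [m for m in closest if m.kind == "class"]
    reduced = [m for m in closest if m.kind != "class"]
    if classes:
        reduced.append(max(classes, key=lambda m: m.depth))
    return reduced
```

With fewer candidates per keyword, the number of combinations (the product of the per-keyword match counts) shrinks accordingly.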

14 Query construction
- In SeRQL
  - Three building blocks
    – Head block: what needs to be retrieved, i.e.,
    – Body block: how to retrieve the triples
    – Condition block: conditions that need to be satisfied
  - Union block: to cover bidirectional relations

    SELECT DISTINCT label(ArtefactTitle), MuseumName
    FROM {Artefact} arts:created_by {} arts:first_name {"Rembrandt"},
         {Artefact} arts:exhibited {} dc:title {MuseumName},
         {Artefact} dc:title {ArtefactTitle}
    WHERE isLiteral(ArtefactTitle)
          AND lang(ArtefactTitle) = "en"
          AND label(ArtefactTitle) LIKE "*night*"
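The building blocks can be modelled as plain lists that are joined into query text. This is a minimal sketch, assuming the head, body, and condition blocks correspond to the SELECT, FROM, and WHERE parts of the example above; the `QueryBlocks` class and the `union` helper are hypothetical names, not part of SemSearch or Sesame.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryBlocks:
    head: List[str]                                       # what to retrieve
    body: List[str]                                       # path expressions
    conditions: List[str] = field(default_factory=list)   # optional filters

    def compose(self) -> str:
        """Join the blocks into a single SeRQL-style query string."""
        query = "SELECT DISTINCT " + ", ".join(self.head)
        query += " FROM " + ", ".join(self.body)
        if self.conditions:
            query += " WHERE " + " AND ".join(self.conditions)
        return query


def union(*queries: str) -> str:
    """Union block: combine queries, e.g. one per relation direction."""
    return " UNION ".join(queries)


# The example query from the slide, rebuilt from its three blocks.
rembrandt = QueryBlocks(
    head=["label(ArtefactTitle)", "MuseumName"],
    body=[
        '{Artefact} arts:created_by {} arts:first_name {"Rembrandt"}',
        "{Artefact} arts:exhibited {} dc:title {MuseumName}",
        "{Artefact} dc:title {ArtefactTitle}",
    ],
    conditions=[
        "isLiteral(ArtefactTitle)",
        'lang(ArtefactTitle) = "en"',
        'label(ArtefactTitle) LIKE "*night*"',
    ],
)
rembrandt_query = rembrandt.compose()
```

Keeping the blocks separate until the last step means a construction routine can append to the body or conditions per keyword match and only compose the text once.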

15 Query construction algorithm
1. Initialize the query blocks
2. Has keyword match?
   - No: go to step 4
   - Yes: go to step 3
3. Check the type of the keyword match, then return to step 2
   - Is class? Yes: add query blocks for class-class relations retrieval
   - Is property? Yes: add query blocks for class-property relations retrieval
   - Is instance? Yes: add blocks for class-instance relations retrieval
4. Compose queries using the blocks
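The flow above can be sketched as a loop that dispatches on the type of each keyword match. The triple patterns chosen for each case, the variable naming scheme (`S`, `K0`, `R0`, `V2`), and the function name `build_body_blocks` are assumptions made for illustration; the slides name the three cases but do not show the blocks themselves. Only one relation direction is generated here, so the union block is left out.

```python
from typing import List, Tuple


def build_body_blocks(
    subject_class: str, keyword_matches: List[Tuple[str, str]]
) -> List[str]:
    """Build body blocks for a subject class and (kind, uri) keyword matches."""
    # Step 1: initialise the blocks with the subject's type constraint.
    blocks = [f"{{S}} rdf:type {{{subject_class}}}"]
    # Steps 2 and 3: loop while keyword matches remain, dispatching on type.
    for i, (kind, uri) in enumerate(keyword_matches):
        if kind == "class":
            # class-class: subject instances related to instances of the class
            blocks.append(f"{{K{i}}} rdf:type {{{uri}}}")
            blocks.append(f"{{S}} R{i} {{K{i}}}")
        elif kind == "property":
            # class-property: subject instances that have a value for the property
            blocks.append(f"{{S}} {uri} {{V{i}}}")
        elif kind == "instance":
            # class-instance: subject instances related to the given instance
            blocks.append(f"{{S}} R{i} {{{uri}}}")
    return blocks


def compose(blocks: List[str]) -> str:
    """Step 4: compose a query from the collected blocks."""
    return "SELECT DISTINCT S FROM " + ", ".join(blocks)
```

A match of any other kind is skipped, which mirrors the flowchart having no action on the last "No" branch.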

16 Simple query example

17 Refinement support

18 Complex query example

19 Conclusions
- A keyword-based semantic search engine has been developed
  - Google-like query interface
  - Supporting relatively complex queries
  - Providing relatively quick response

20 Opinions
- Pros
  - Google-like query interface (intuitive)
  - Supporting relatively complex queries
- Cons
  - Limited to one target data format (RDF)
  - Ranking
  - Simple semantic matching
- Issues
  - Finding out the semantic meaning of keywords
  - Storage modeling
  - Strategy for the semantic match between keywords and semantic entities

