Swoogle: A Semantic Web Search and Metadata Engine Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng Pavan Reddivari, Vishal Doshi, Joel.

1 Swoogle: A Semantic Web Search and Metadata Engine
Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, Joel Sachs
Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County
CIKM '04
Dongmin Shin, IDS Lab, 2008.10.22

2 Index
• Introduction
• Semantic Web Documents
• Swoogle Architecture
• Finding SWDs
• SWD Metadata
• Ranking SWDs
• Indexing and Retrieval of SWDs
• Conclusions
• Evaluation and Discussion
Copyright © 2008 by CEBT, Center for E-Business Technology

3 Introduction
• Semantic Web documents (SWDs) are characterized by semantic annotations and meaningful references to other SWDs
• Conventional search engines do not take advantage of these features
• A search engine customized for SWDs is needed
• Swoogle is a crawler-based indexing and retrieval system for the Semantic Web

4 Introduction
• Three activities of Swoogle
  Finding appropriate ontologies
    – Allows users to query for ontologies that contain specified terms anywhere in the document
    – The ontologies returned are ranked
  Finding instance data
    – Enables querying SWDs with constraints on which classes and properties they use or define
  Characterizing the Semantic Web
    – By collecting metadata about the Semantic Web, Swoogle reveals interesting structural properties
• Swoogle automatically discovers SWDs, indexes their metadata, and answers queries about it
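The term-based lookup behind "finding ontologies" and "finding instance data" can be pictured as an inverted index from terms to SWDs. This is a minimal sketch, assuming each SWD has already been reduced to the set of terms it uses or defines; `build_term_index`, `find_swds`, and the sample URLs are hypothetical, and Swoogle's real index structure is not described on the slide.

```python
from collections import defaultdict


def build_term_index(doc_terms):
    """Map each term to the set of SWDs that use or define it (an inverted index)."""
    index = defaultdict(set)
    for doc, terms in doc_terms.items():
        for term in terms:
            index[term].add(doc)
    return index


def find_swds(index, required_terms):
    """Return the SWDs that contain every required term."""
    # index.get avoids inserting empty entries into the defaultdict.
    matches = [index.get(term, set()) for term in required_terms]
    return set.intersection(*matches) if matches else set()


# Hypothetical sample data: document URL -> terms it uses or defines.
docs = {
    "http://example.org/people.rdf": {"foaf:Person", "foaf:name"},
    "http://example.org/onto.owl": {"owl:Class", "foaf:Person"},
}
index = build_term_index(docs)
print(sorted(find_swds(index, ["foaf:Person", "foaf:name"])))
```

Ranking the returned ontologies is a separate step and is not shown here.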

5 Semantic Web Documents
• SWD
  A document in a Semantic Web language that is online and accessible to web users and software agents
• Two kinds of SWD
  SWOs (Semantic Web Ontologies)
    – Correspond to T-Boxes
    – A significant proportion of their statements define new terms or extend the definitions of terms defined in other SWDs
  SWDBs (Semantic Web Databases)
    – Correspond to A-Boxes
    – Do not define or extend a significant number of terms
    – Can introduce individuals and make assertions about them, or make assertions about individuals defined in other SWDs

6 Swoogle Architecture
• SWD discovery
  Discovers potential SWDs throughout the Web
• Metadata creation
  Caches a snapshot of each SWD and generates objective metadata about SWDs
• Data analysis
  Uses the cached SWDs and the created metadata to derive analytical reports
• Interface
  Provides data services to the Semantic Web community

7 Finding SWDs
• Google crawler
  Uses the Google Web Service
  Starts with queries for SWD file-type extensions
  Appends constraints (keywords) to build more specific queries, then combines their results
• Focused crawler
  Crawls documents within a given website
  Extension constraint
    – Skips URLs with certain extensions, e.g. “.jpg” or “.html”
  Focus constraint
    – Only crawls URLs relative to the given base URL
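The focused crawler's two constraints amount to a URL filter. This is a minimal sketch: `should_crawl` and `EXCLUDED_EXTENSIONS` are hypothetical names, and the excluded set contains only the two examples the slide gives.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical exclusion list; the slide names only ".jpg" and ".html" as examples.
EXCLUDED_EXTENSIONS = {".jpg", ".html"}


def should_crawl(url: str, base_url: str) -> bool:
    """Return True if url passes both the extension and the focus constraint."""
    # Extension constraint: skip URLs whose path ends with an excluded extension.
    path = urlparse(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension in EXCLUDED_EXTENSIONS:
        return False
    # Focus constraint: only follow URLs that lie under the given base URL.
    return url.startswith(base_url)


print(should_crawl("http://example.org/onto/wine.owl", "http://example.org/onto/"))
```

The plain prefix test is a simplification. With a base URL that lacks a trailing slash, it would also accept sibling paths such as `/onto-old/`.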

8 Finding SWDs
• Web interface
  Registered users can submit the URL of either a SWD or a web directory
• Jena2-based Swoogle crawler
  Analyzes the content of a SWD and discovers new SWDs
    – e.g. by using URIrefs, owl:imports, rdfs:seeAlso, foaf:Person
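The crawler's link-following heuristics can be approximated over a list of parsed triples. This is a rough sketch, not the Jena2-based crawler itself. It assumes triples are plain `(subject, predicate, object)` URI strings. It also assumes that objects of owl:imports and rdfs:seeAlso, plus the namespaces of the properties and classes a document uses, are worth fetching. `namespace_of` and `discover_candidates` are hypothetical helpers.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


def namespace_of(uri: str) -> str:
    """Heuristically strip the local name from a URIref."""
    if "#" in uri:
        return uri.split("#", 1)[0]
    return uri.rsplit("/", 1)[0] + "/"


def discover_candidates(triples):
    """Return URLs that may lead to further SWDs."""
    candidates = set()
    for s, p, o in triples:
        # Follow explicit links to other documents.
        if p in (OWL_IMPORTS, RDFS_SEE_ALSO):
            candidates.add(o)
        # The namespace of each property used may be an ontology document.
        candidates.add(namespace_of(p))
        # So may the namespace of each class an individual is typed with (e.g. foaf:Person).
        if p == RDF_TYPE:
            candidates.add(namespace_of(o))
    return candidates


me = "http://example.org/people#me"
triples = [
    (me, RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    (me, RDFS_SEE_ALSO, "http://example.org/friends.rdf"),
    ("http://example.org/onto", OWL_IMPORTS, "http://example.org/base.owl"),
]
for url in sorted(discover_candidates(triples)):
    print(url)
```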

9 SWD Metadata – Basic Metadata
• Language features
  Properties describing the syntactic or semantic features of a SWD
    – Encoding: syntactic encoding of a SWD: RDF/XML, N-TRIPLE, N3
    – Language: Semantic Web language used by a SWD: OWL, DAML, RDFS, RDF
    – OWL species: language species of a SWD written in OWL: OWL-LITE, OWL-DL, OWL-FULL
• RDF statistics
  Properties summarizing the node distribution of the RDF graph
  Focus on how SWDs define new classes, properties, and individuals
    – SWDBs and SWOs are distinguished by the ontology-ratio R(foo) of a SWD foo
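The slide names the ontology-ratio R(foo) but does not define it. This sketch assumes R is the share of defined classes and properties among all defined classes, properties, and individuals, so a value near 1 suggests an SWO and near 0 an SWDB. The cut-off of 0.8 and the names `RdfStats` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RdfStats:
    classes: int      # classes defined in the SWD
    properties: int   # properties defined in the SWD
    individuals: int  # individuals defined in the SWD


def ontology_ratio(stats: RdfStats) -> float:
    """Assumed definition: (classes + properties) / (classes + properties + individuals)."""
    defined = stats.classes + stats.properties + stats.individuals
    if defined == 0:
        return 0.0  # nothing defined: treat as data rather than ontology
    return (stats.classes + stats.properties) / defined


def classify(stats: RdfStats, threshold: float = 0.8) -> str:
    """Label a SWD as SWO or SWDB using a hypothetical threshold."""
    return "SWO" if ontology_ratio(stats) >= threshold else "SWDB"


print(classify(RdfStats(classes=8, properties=2, individuals=0)))
```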

10 SWD Metadata – Basic Metadata
• Ontology annotation
  Properties that describe a SWD as an ontology
    – label: rdfs:label
    – comment: rdfs:comment
    – versionInfo: owl:versionInfo and daml:versionInfo
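Collecting these annotations amounts to grouping the listed properties under three metadata fields. This is a minimal sketch. It assumes triples use prefixed names and that the annotations are attached to a single ontology header node. `ontology_annotations` and the `ex:` sample data are hypothetical.

```python
# Annotation properties from the slide, mapped to the metadata field they fill.
ANNOTATION_PROPERTIES = {
    "rdfs:label": "label",
    "rdfs:comment": "comment",
    "owl:versionInfo": "versionInfo",
    "daml:versionInfo": "versionInfo",
}


def ontology_annotations(triples, ontology_uri):
    """Collect label, comment, and versionInfo values stated about the ontology node."""
    result = {}
    for s, p, o in triples:
        if s == ontology_uri and p in ANNOTATION_PROPERTIES:
            result.setdefault(ANNOTATION_PROPERTIES[p], []).append(o)
    return result


triples = [
    ("ex:wineOntology", "rdfs:label", "Wine ontology"),
    ("ex:wineOntology", "owl:versionInfo", "1.0"),
    ("ex:wineOntology", "daml:versionInfo", "0.9"),
    ("ex:Wine", "rdfs:label", "Wine"),  # a class label, not an ontology annotation
]
print(ontology_annotations(triples, "ex:wineOntology"))
```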

11 SWD Metadata – Relations among SWDs
• TM/IN
  Term reference relations between two SWDs
    – i.e. a SWD uses terms defined by some other SWDs
• IM
  An ontology imports another ontology
• EX
  An ontology extends another
    – e.g. ontology A defines class AC, which is related by “rdfs:subClassOf” to class BC defined in ontology B
• PV
  An ontology is a prior version of another
• CPV
  An ontology is a prior version of, and is compatible with, another
• IPV
  An ontology is a prior version of, but is incompatible with, another
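Several of these relations can be read off standard OWL vocabulary. This sketch assumes a mapping that the slide does not spell out: owl:imports gives IM; owl:priorVersion, owl:backwardCompatibleWith, and owl:incompatibleWith give PV, CPV, and IPV; and an rdfs:subClassOf link to a class from another document gives EX. The convention that the prefix of `B:BC` names its defining document, and the function names, are hypothetical.

```python
# Assumed mapping from OWL ontology properties to the slide's relation labels.
LINK_PROPERTIES = {"owl:imports": "IM"}
VERSION_PROPERTIES = {
    "owl:priorVersion": "PV",
    "owl:backwardCompatibleWith": "CPV",
    "owl:incompatibleWith": "IPV",
}


def doc_of(term):
    """Hypothetical convention: the prefix of "B:BC" names the defining document "B"."""
    return term.split(":", 1)[0]


def swd_relations(doc, triples):
    """Derive (relation, from_doc, to_doc) tuples from the triples of one SWD."""
    relations = set()
    for s, p, o in triples:
        if p in LINK_PROPERTIES:
            # doc imports o
            relations.add((LINK_PROPERTIES[p], doc, o))
        elif p in VERSION_PROPERTIES:
            # "doc p o" states that o is a prior version of doc
            relations.add((VERSION_PROPERTIES[p], o, doc))
        elif p == "rdfs:subClassOf" and doc_of(s) == doc and doc_of(o) != doc:
            # doc subclasses a class defined elsewhere, so doc extends that document
            relations.add(("EX", doc, doc_of(o)))
    return relations


triples = [
    ("A:onto", "owl:imports", "B"),
    ("A:AC", "rdfs:subClassOf", "B:BC"),
    ("A:onto", "owl:backwardCompatibleWith", "A0"),
    ("A:AC2", "rdfs:subClassOf", "A:AC"),  # internal subclass: no EX relation
]
print(sorted(swd_relations("A", triples)))
```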

12 Ranking SWDs
• The random surfing model (PageRank) is not appropriate for the Semantic Web
    – The semantics of links lead to a non-uniform probability of following a particular outgoing link
• Rational random surfing model
  Inter-SWD links fall into four categories
    – imports(A,B), uses-term(A,B), extends(A,B), asserts(A,B)
  The more terms in B referenced by A, the more likely a surfer is to follow the link from A to B
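The rational surfer replaces PageRank's uniform 1/out-degree with probabilities proportional to how many of the target's terms the source references. This is a minimal sketch under that reading. How the four link categories are weighted against each other is not given on the slide, so only term counts are used. `transition_probabilities` and the sample counts are hypothetical.

```python
def transition_probabilities(term_refs):
    """Normalize per-link term counts into follow probabilities for each source SWD.

    term_refs[a][b] is the number of terms defined in b that a references.
    """
    probs = {}
    for a, targets in term_refs.items():
        total = sum(targets.values())
        # A SWD with no outgoing term references has no links to follow.
        probs[a] = {b: n / total for b, n in targets.items()} if total else {}
    return probs


# Hypothetical counts: A references 3 terms of B and 1 term of C.
print(transition_probabilities({"A": {"B": 3, "C": 1}}))
# prints {'A': {'B': 0.75, 'C': 0.25}}; a uniform random surfer would use 0.5 for each link
```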

13 Ranking SWDs
• Google
  [Figure: example link graph over documents A, B, C and D]
  $$PR(A) = (1-d) + d\left(\frac{1}{4} + \frac{1}{2} + \frac{1}{3}\right)$$
• Swoogle
  [Figure: the same link graph with a weight on each link]
  $$\mathit{rawPR}(A) = (1-d) + d\left(\frac{0.4}{0.4+0.3+0.2+0.4} + \frac{0.6}{0.6+0.1} + \frac{0.5}{0.5+0.1+0.7}\right)$$
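Both expressions follow the original PageRank form PR(A) = (1-d) + d Σ PR(T)/C(T), with each linking document's own score taken as 1. In the weighted version, 1/C(T) becomes the link's weight divided by the total weight leaving T. This sketch evaluates the slide's two expressions and also iterates the weighted form to a fixed point. The damping value d = 0.85 and the names `slide_formula` and `raw_pr` are assumptions, not Swoogle's published code. Documents with no outgoing links simply lose their score here, a common simplification.

```python
def slide_formula(in_links, d=0.85):
    """One-step score as written on the slide.

    in_links holds (w, out_total) pairs: w is the weight of a link into the target and
    out_total the summed weight of all links leaving that link's source. Each source's
    own score is taken as 1, matching how the slide writes the expressions.
    """
    return (1 - d) + d * sum(w / out_total for w, out_total in in_links)


def raw_pr(weights, d=0.85, iterations=50):
    """Iterate weighted PageRank: each source passes its score along its out-links
    in proportion to link weight (uniform weights give plain PageRank)."""
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    out_total = {y: sum(t.values()) for y, t in weights.items()}
    pr = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: 1 - d for n in nodes}
        for y, targets in weights.items():
            if out_total[y] == 0:
                continue
            for x, w in targets.items():
                new[x] += d * pr[y] * w / out_total[y]
        pr = new
    return pr


d = 0.85  # damping factor: an assumed, commonly used value
google = slide_formula([(1, 4), (1, 2), (1, 3)], d)
swoogle = slide_formula([(0.4, 0.4 + 0.3 + 0.2 + 0.4), (0.6, 0.6 + 0.1), (0.5, 0.5 + 0.1 + 0.7)], d)
print(round(google, 4), round(swoogle, 4))
```

With d = 0.85 the two slide expressions evaluate to roughly 1.071 and 1.467. The weighted links into A carry a larger share of their sources' outgoing weight than the uniform 1/4, 1/2, 1/3.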

14 Ranking SWDs

15 Indexing and Retrieval of SWDs
• Using traditional IR techniques
  Reasoning over large collections of documents can be expensive
  IR techniques are faster, although they take a somewhat coarser view of the text
  They include well-researched methods for ranking matches and computing similarity between documents
• Using N-grams
  Can result in a larger vocabulary
  Inter-word relationships are preserved
  Somewhat resistant to certain kinds of errors
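The N-gram trade-offs above can be seen with overlapping character n-grams and cosine similarity. The slide does not say whether Swoogle's n-grams are over characters or words. This sketch assumes character trigrams, lower-cased, with windows that run across word and prefix boundaries. `char_ngrams` and `cosine` are illustrative names.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Overlapping character n-grams; windows also span word boundaries."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cosine(a, b, n=3):
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    dot = sum(va[g] * vb[g] for g in va)  # Counter returns 0 for missing n-grams
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# A misspelled term still shares most trigrams with the correct one.
print(round(cosine("foaf:Person", "foaf:Persn"), 3), round(cosine("foaf:Person", "owl:Class"), 3))
```

The misspelled "foaf:Persn" keeps a high similarity to "foaf:Person", which illustrates the error resistance. The vocabulary of distinct trigrams is larger than a vocabulary of whole terms.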

16 Conclusions
• Current web search engines
  Do not work well with SWDs, as they are designed for natural language and expect documents to contain unstructured text composed of words
• Swoogle
  A prototype crawler-based indexing and retrieval system for Semantic Web documents

17 Evaluation and Discussion
• Pros
  Clear contribution on methods:
    – How to discover potential SWDs
    – How to rank SWDs
• Cons
  Poor explanation of the ranking algorithm:
    – Why they differentiate between SWOs and SWDBs
    – How the ranking formulas (which differ by type of SWD) are derived
• Discussion
  How can a Semantic Web retrieval system resolve conflicts between SWDs? By ranking? By TF-IDF? By some other method?

18 Current Status (@ 2005)
• Based on
  Li Ding et al., "Finding and Ranking Knowledge on the Semantic Web", Proceedings of the 4th International Semantic Web Conference, November 2005
  Tim Finin et al., "Swoogle: Searching for knowledge on the Semantic Web", AAAI-05 (Intelligent Systems Demo), July 2005
• System architecture
  Metadata creation -> Digest
    – Computes metadata for SWDs and Semantic Web terms (SWTs) and identifies relations among them

19 Current Status (@ 2005)
• Size
  SWDs: 135K -> 368K
  SWOs: 13.29% of SWDs -> 1% of SWDs
• Ranking SWDs and SWTs

