Roi Adadi, David Ben-David

• Semantic Web Document (SWD)
  ◦ A web page that serializes an RDF graph.
  ◦ Uses one of the recommended RDF syntax languages, i.e. RDF/XML, N-TRIPLE or N3.
• Semantic Web Term (SWT)
  ◦ An RDF resource that represents an instance of rdfs:Class or rdf:Property, and can be universally referenced by its URI reference (URIref).
• Semantic Web Ontology (SWO)
  ◦ An SWD is considered to be an SWO when a significant proportion of the statements it makes define new SWTs.
• Semantic Web Database (SWDB)
  ◦ An SWD that does not define or extend a significant number of terms.
  ◦ Introduces individuals and makes assertions about them.
  ◦ Makes assertions about individuals defined in other SWDs.
[Figure: RDF/XML example (… Computer Science, Object Oriented Programming, 3.0 …) with the SWD and an SWT labeled]

[Figure: excerpt of the FOAF ontology showing Class Document, Class Organization and Property mbox]
The FOAF ontology contains 12 classes and 51 properties (in 466 triples) and no individuals.

[Figure: FOAF description for Tim Finin, highlighting a name statement and a nick name statement]
The FOAF description for Tim Finin defines three individuals and makes statements about them (no classes or properties).

• Current form of the Semantic Web
  ◦ A web of Semantic Web Documents (SWDs).
• Navigating the Semantic Web is difficult
  ◦ Paucity of explicit hyperlinks (beyond namespaces in URIrefs).
  ◦ Relations such as rdfs:seeAlso and owl:imports are rare.
• There is a need for a search engine customized for SWDs
  ◦ Find and analyze SWDs on the web.
  ◦ Suggest a measure of SWD importance (ranking).

• Semantic Web researchers
  ◦ Search for SWTs and SWOs for publishing their knowledge.
• Software agents
  ◦ Search SWDs for external knowledge.
  ◦ Retrieve SWOs to fully understand SWTs.
Example: find the most popular ontology for publishing a personal profile.

• Conventional web navigation and ranking models are not suitable for the Semantic Web.
• They do not differentiate SWDs from other web pages.
• They do not parse or use the internal structure of SWDs or the external semantic links among SWDs
  ◦ They are designed to work with natural language and unstructured text.
Example: the FOAF ontology is not among the first 10 Google search results for “person ontology”.

• Finding appropriate ontologies
  ◦ Qualified search (Terms + Types).
  ◦ Ontologies are sorted by their popularity.
• Finding instance data
  ◦ Querying SWDs with constraints on the classes and properties they use.
  ◦ Helps to integrate Semantic Web data on the web.
• Characterizing the Semantic Web
  ◦ Structural properties

• Ontology-Based Annotation Systems
  ◦ SHOE, Ontobroker, webKB, QuizRDF, CREAM, …
  ◦ Annotate online documents.
  ◦ Documents are indexed based on the annotations, not on the entire document.
  ◦ Use their own ontologies, which might not suit some SWDs.

• Ontology Repositories
  ◦ DAML Ontology Library, SemWebCentral, Schema Web, …
  ◦ Collect ontologies (simply store the entire RDF document).
  ◦ Do not automatically discover SWDs but rather require people to submit URLs.
  ◦ Constitute a small portion of the Semantic Web.

• Semantic Web Browsers
  ◦ W3C’s Ontaria
    ▪ Searchable and browsable directory of RDF documents developed by the W3C.
  ◦ Does not automatically discover SWDs.
  ◦ Stores the full RDF graphs.
  ◦ Indexes individuals of well-known classes
    ▪ e.g. foaf:Person, rss:Item
Experiments show: Swoogle outperforms them all!

• Crawler-based indexing and retrieval system for the Semantic Web.
• Discovers Semantic Web documents.
• Computes relations between documents.
• Stores and reasons over extracted metadata
  ◦ The system is designed to scale up to handle tens of millions of documents.
• Enables rich query constraints on semantic relations.

• Collects candidate URLs to find and cache SWDs
  ◦ Submitted URLs.
  ◦ A Web crawler.
  ◦ A customized meta-crawler (using conventional search engines).
  ◦ SwoogleBot Semantic Web Crawler.
• Analyzes SWDs to produce new candidates.
So far, Swoogle has found over 1.7M SWDs with more than 1G triples!

• Analyzes the discovered SWDs.
• Generates the bulk of Swoogle’s metadata about the Semantic Web
  ◦ Characterizes features associated with SWDs and SWTs.
  ◦ Tracks relations among SWDs and SWTs.
Example questions: How do SWDs use, define or populate a given SWT? How are two SWTs associated? …

• Analyzes the generated metadata.
  ◦ Classification of SWOs and SWDBs.
• Hosts the modular ranking mechanisms.
  ◦ Ontology Rank.

• Provides search services to software agents and users, allowing them to access metadata and navigate the Semantic Web
  ◦ Swoogle Search – searches SWDs using constraints on URLs, SWTs being used or defined, etc.
  ◦ Ontology Dictionary – searches ontologies at the term level and offers more navigational paths.

• SWD metadata is collected to make SWD search more efficient and effective.
• It is derived from the content of SWDs as well as the relations among SWDs.
• 3 categories of metadata:
  ◦ Basic metadata
  ◦ Relations among SWDs
  ◦ Analytical results

• Language Features – properties describing the syntactic or semantic features of an SWD.
  ◦ Encoding – syntactic encoding of an SWD.
    ▪ “RDF/XML”, “N-TRIPLE” and “N3”.
  ◦ Language – the language used by an SWD.
    ▪ “OWL”, “DAML+OIL”, “RDFS” and “RDF”.
  ◦ OWL Species – the language species of an SWD written in OWL.
    ▪ “OWL-LITE”, “OWL-DL” and “OWL-FULL”.

• RDF Statistics – properties summarizing the node distribution of the RDF graph of an SWD.
  ◦ How an SWD defines new classes, properties and individuals.
  ◦ Let foo be an SWD and let C(foo), P(foo), I(foo) be the sets of classes, properties and individuals defined in foo, respectively. The ontology-ratio R(foo) is calculated by:
    $$R(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|}$$
  ◦ R(foo) ranges from 0 to 1, where 0 implies that foo is a pure SWDB and 1 implies that foo is a pure SWO.
[Figure: RDF/XML example (Computer Science, Object Oriented Programming, 3.0)]
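The ontology-ratio can be illustrated with a few lines of Python. This is a minimal sketch, assuming the ratio is the share of defined classes and properties among all defined nodes (which matches the stated range from 0 for a pure SWDB to 1 for a pure SWO). The function name, the use of plain collections of names, and the handling of an empty SWD are assumptions made for illustration.

```python
def ontology_ratio(classes, properties, individuals):
    """Return (|C| + |P|) / (|C| + |P| + |I|) for a single SWD."""
    defined_terms = len(classes) + len(properties)
    total = defined_terms + len(individuals)
    if total == 0:
        # Assumption: an SWD that defines nothing is given ratio 0.0.
        return 0.0
    return defined_terms / total


# FOAF ontology from the slides: 12 classes, 51 properties, no individuals.
foaf_ontology = ontology_ratio(range(12), range(51), [])

# FOAF description for Tim Finin: three individuals, no classes or properties.
finin_profile = ontology_ratio([], [], ["person1", "person2", "person3"])

print(foaf_ontology, finin_profile)  # 1.0 0.0
```

With these inputs the FOAF ontology comes out as a pure SWO and the personal profile as a pure SWDB, which is the distinction the two FOAF examples earlier in the slides illustrate.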

• Ontology Annotations – properties that describe an SWD as an ontology.
  ◦ The SWD has an instance of owl:Ontology.
  ◦ Swoogle records the following properties:
    ▪ label (rdfs:label)
    ▪ comment (rdfs:comment)
    ▪ versionInfo (owl:versionInfo/daml:versionInfo)

• Capturing and analyzing relations at the RDF node level is hard.
• Swoogle generalizes RDF node level relations and focuses on SWD level relations.
• Swoogle captures the following SWD level relations:
  ◦ TM/IN – an SWD uses terms defined by some other SWDs.
  ◦ IM – an ontology imports another ontology.
  ◦ EX – an ontology extends another ontology.
  ◦ PV – an ontology is a prior version of another.
  ◦ CPV – an ontology is a prior version of another and is compatible with it.
  ◦ IPV – an ontology is a prior version of another and is incompatible with it.

Indicators of inter-ontology relation

• OntologyRank is inspired by Google’s PageRank algorithm.
• Underlying Random Surfing Model:
  ◦ The surfer jumps to a random URL.
  ◦ With probability d, the surfer randomly chooses a link to follow.
  ◦ With probability 1-d, the surfer jumps to another random URL.

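• Given a document A, A’s PageRank is computed by:
  $$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)$$
  where T_1, …, T_n are the web documents that link to A; C(T_i) is the total number of outlinks of T_i; and d is a damping factor, typically set to 0.85.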
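The random surfing model can be made concrete with a small iterative computation. The sketch below assumes the classic non-normalized PageRank form, in which a page's rank is (1 - d) plus d times the sum of each in-linking page's rank divided by its outlink count. The function name, the dictionary-based graph and the fixed iteration count are illustrative choices, not part of the slides.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iteratively compute PageRank.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each in-linking page T contributes PR(T) / C(T).
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank


ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page c is linked to by both a and b and ends up with the highest rank, while b, which nothing links to, keeps only the (1 - d) share.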

• The graph formed by SWDs has a richer set of relations.
  ◦ The edges have explicit semantics.
• Users can navigate the Semantic Web within or across the web and RDF graph through 7 groups of navigational paths.

• The semantics of links lead to a non-uniform probability of following a particular outgoing link.
• Given SWDs A and B, Swoogle classifies inter-SWD links into four categories:
  ◦ imports(A,B) – A imports all content of B.
  ◦ uses-term(A,B) – A uses some of the terms defined by B (without importing B).
  ◦ extends(A,B) – A extends the definitions of terms defined by B.
  ◦ asserts(A,B) – A makes assertions about the individuals defined by B.
• Each category is assigned a different weight, which represents the probability of following that kind of link.

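• Given an SWD a, Swoogle computes its raw rank by:
  $$rawPR(a) = (1 - d) + d \sum_{x \in L(a)} rawPR(x) \, \frac{f(x,a)}{f(x)}$$
  $$f(x,a) = \sum_{l \in links(x,a)} weight(l), \qquad f(x) = \sum_{a' \in T(x)} f(x,a')$$
  where L(a) is the set of SWDs that link to a, T(x) is the set of SWDs that x links to, links(x,a) is the set of links from x to a, and weight(l) is the weight assigned to the category of link l.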
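A weighted variant of the PageRank iteration illustrates how link categories can bias the surfer. This is a sketch under stated assumptions: the rank that an SWD x passes to a is taken to be proportional to the summed weight of the links from x to a, divided by the summed weight of all of x's outgoing links. The weight values in `LINK_WEIGHTS`, the function name and the tuple-based input format are hypothetical, since the slides do not give concrete weights.

```python
# Hypothetical weights per link category; the slides do not state the values.
LINK_WEIGHTS = {"imports": 4.0, "extends": 3.0, "uses-term": 2.0, "asserts": 1.0}


def raw_rank(links, d=0.85, iterations=50):
    """Compute a weighted raw rank.

    links is a list of (source, target, category) tuples between SWDs.
    """
    docs = set()
    pair_weight = {}  # (source, target) -> summed weight of links between them
    out_weight = {}   # source -> summed weight of all its outgoing links
    for src, dst, category in links:
        docs.update((src, dst))
        w = LINK_WEIGHTS[category]
        pair_weight[(src, dst)] = pair_weight.get((src, dst), 0.0) + w
        out_weight[src] = out_weight.get(src, 0.0) + w

    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {doc: 1 - d for doc in docs}
        for (src, dst), w in pair_weight.items():
            # The share of src's rank passed on depends on the link weights.
            new_rank[dst] += d * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


ranks = raw_rank([
    ("profile.rdf", "onto1", "imports"),
    ("profile.rdf", "onto2", "asserts"),
])
print(ranks)
```

Under these assumed weights, the imported ontology receives four times as much of the source document's rank as the one that is only the target of an assertion.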

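• Then, Swoogle computes the rank for SWDBs and SWOs by:
  $$PR_{SWDB}(a) = rawPR(a)$$
  $$PR_{SWO}(a) = \sum_{x \in TC(a)} rawPR(x)$$
  where rawPR(x) is the raw rank of x and TC(a) is the transitive closure of SWOs imported by a.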
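The second ranking step can be sketched as a graph traversal followed by a sum. The sketch assumes that an SWDB keeps its raw rank unchanged, and that an SWO's rank is the sum of raw ranks over the transitive closure of the ontologies it imports. Whether that closure contains the ontology itself is not stated in the slides; the code includes it, which is an assumption. All function and parameter names are illustrative.

```python
def transitive_imports(doc, imports):
    """Return every SWO reachable from doc by following imports edges."""
    seen = set()
    stack = list(imports.get(doc, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(imports.get(current, ()))
    return seen


def final_rank(doc, is_ontology, raw, imports):
    """Combine raw ranks into the rank of an SWDB or an SWO."""
    if not is_ontology:
        # An SWDB keeps its raw rank.
        return raw[doc]
    # Assumption: the closure is taken to include the ontology itself.
    closure = transitive_imports(doc, imports) | {doc}
    return sum(raw[x] for x in closure)


raw = {"a": 0.5, "b": 0.3, "c": 0.2, "d": 0.4}
imports = {"a": ["b"], "b": ["c"]}
print(final_rank("a", True, raw, imports))   # sums the raw ranks of a, b and c
print(final_rank("d", False, raw, imports))  # raw rank of d, unchanged
```

The `seen` set keeps the traversal finite even if ontologies import each other in a cycle.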

• The problem of indexing and searching SWDs
  ◦ Significant semantic information is encoded in marked-up documents.
  ◦ Reasoning over a large collection of documents can be expensive.
• Traditional information retrieval techniques
  ◦ Faster (coarse view of the text).
  ◦ Can quickly retrieve a set of SWDs based on similarities of the source text alone.

• SWDs are not entirely markup.
  ◦ Search should be applied to both the structured and unstructured components of the document.
• We may want SWDs to be available to commonly used search engines
  ◦ Documents must be transformed to a form that a standard IR engine can understand and manipulate.
• Well-researched methods exist for ranking matches, computing similarities between documents and employing relevance feedback.

• Look at a document as a collection of either tokens or N-Grams.
• URIrefs of classes, properties and individuals correspond to words in natural languages.
• Apply the following process to an SWD
  ◦ Reduce it to triples.
  ◦ Extract URIrefs (with duplicates).
  ◦ Discard URIrefs of blank nodes.
  ◦ Hash each URI to a token.
  ◦ Index the document.
Indexing is by either N-Gram or URIrefs.
Matching “time” to:
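The URIref-based indexing process listed above (reduce to triples, extract URIrefs, discard blank nodes, hash, index) can be sketched as follows. This is an illustration only: triples are assumed to be already parsed into string tuples, blank nodes are assumed to carry the `_:` prefix, literals are assumed to be quoted strings, and SHA-1 is an arbitrary choice of hash. The inverted index is a plain dictionary rather than a real IR engine.

```python
import hashlib


def uri_token(uri):
    """Hash a URIref to a fixed-length token."""
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def swd_tokens(triples):
    """Turn (subject, predicate, object) triples into URIref tokens."""
    tokens = []
    for triple in triples:
        for node in triple:
            # Assumption: blank nodes start with "_:" and literals with a quote.
            if node.startswith("_:") or node.startswith('"'):
                continue
            tokens.append(uri_token(node))  # duplicates are kept
    return tokens


def build_index(documents):
    """Map each token to the set of SWD identifiers that contain it."""
    index = {}
    for doc_id, triples in documents.items():
        for token in swd_tokens(triples):
            index.setdefault(token, set()).add(doc_id)
    return index


FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
documents = {
    "doc1": [("_:b0", FOAF_NAME, '"Tim Finin"')],
    "doc2": [("http://example.org/me", "http://xmlns.com/foaf/0.1/mbox",
              "mailto:me@example.org")],
}
index = build_index(documents)
print(index[uri_token(FOAF_NAME)])  # {'doc1'}
```

Looking up a term then amounts to hashing its URIref and reading the matching entry of the index, which is the kind of operation a standard IR engine is built to handle.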