1
Roi Adadi David Ben-David
2
Semantic Web Document (SWD)
◦ A web page that serializes an RDF graph.
◦ Uses one of the recommended RDF syntax languages, i.e. RDF/XML, N-TRIPLE or N3.
Semantic Web Term (SWT)
◦ An RDF resource that represents an instance of rdfs:Class or rdf:Property, and can be universally referenced by its URI reference (URIref).
Semantic Web Ontology (SWO)
◦ An SWD is considered to be an SWO when a significant proportion of the statements it makes define new SWTs.
Semantic Web Database (SWDB)
◦ An SWD that does not define or extend a significant number of terms.
◦ Introduces individuals and makes assertions about them.
◦ Makes assertions about individuals defined in other SWDs.
3
The FOAF ontology (http://xmlns.com/foaf/spec/index.rdf)
◦ Contains 12 classes and 51 properties in 466 triples, and no individuals.
◦ Example terms: class Document, class Organization, property mbox.
4
FOAF description for Tim Finin (www.cs.umbc.edu/~finin//foaf.rdf)
◦ Defines three individuals and makes statements about them; it defines no classes or properties.
◦ Example statements: name, nick name.
5
Current form of the Semantic Web
◦ A web of Semantic Web Documents (SWDs).
Navigating the Semantic Web is difficult
◦ Paucity of explicit hyperlinks (beyond namespaces in URIrefs).
◦ Relations such as rdfs:seeAlso and owl:imports are rare.
There is a need for a search engine customized for SWDs
◦ Find and analyze SWDs on the web.
◦ Suggest a measure of SWD importance (ranking).
6
Semantic Web researchers
◦ Search for SWTs and SWOs for publishing their knowledge.
Software agents
◦ Search SWDs for external knowledge.
◦ Retrieve SWOs to fully understand SWTs.
Example: find the most popular ontology for publishing a personal profile.
7
Conventional web navigation and ranking models are not suitable for the Semantic Web.
They do not differentiate SWDs from other web pages.
They do not parse and use the internal structure of SWDs or the external semantic links among SWDs
◦ They are designed to work with natural language and unstructured text.
Example: the FOAF ontology is not among the first 10 Google search results for “person ontology”.
8
Finding appropriate ontologies
◦ Qualified search (terms + types).
◦ Ontologies are sorted by their popularity.
Finding instance data
◦ Querying SWDs with constraints on the classes and properties they use.
◦ Helps to integrate Semantic Web data on the web.
Characterizing the Semantic Web
◦ Structural properties.
9
Ontology-Based Annotation Systems
◦ SHOE, Ontobroker, webKB, QuizRDF, CREAM, …
◦ Annotate online documents.
◦ Index documents based on the annotations, not on the entire document.
◦ Use their own ontologies, which might not suit some SWDs.
10
Ontology Repositories
◦ DAML Ontology Library, SemWebCentral, Schema Web, …
◦ Collect ontologies (simply store the entire RDF document).
◦ Do not automatically discover SWDs but rather require people to submit URLs.
◦ Cover only a small portion of the Semantic Web.
11
Semantic Web Browsers
◦ W3C’s Ontaria: a searchable and browsable directory of RDF documents developed by the W3C.
◦ Does not automatically discover SWDs.
◦ Stores the full RDF graphs.
◦ Indexes individuals of well-known classes, e.g. foaf:Person, rss:Item.
Experiments show that Swoogle outperforms them all!
12
A crawler-based indexing and retrieval system for the Semantic Web.
Discovers Semantic Web documents.
Computes relations between documents.
Stores and reasons over extracted metadata
◦ The system is designed to scale up to handle tens of millions of documents.
Enables rich query constraints on semantic relations.
14
Collects candidate URLs to find and cache SWDs
◦ Submitted URLs.
◦ A web crawler.
◦ A customized meta-crawler (using conventional search engines).
◦ SwoogleBot, a Semantic Web crawler that analyzes SWDs to produce new candidates.
So far Swoogle has found over 1.7M SWDs with more than 1G triples!
15
Analyzes the discovered SWDs.
Generates the bulk of Swoogle’s metadata about the Semantic Web
◦ Characterizes features associated with SWDs and SWTs.
◦ Tracks relations among SWDs and SWTs, for example: how do SWDs use, define or populate a given SWT? How are two SWTs associated?
16
Analyzes the generated metadata
◦ Classification of SWOs and SWDBs.
Hosts the modular ranking mechanisms
◦ Ontology Rank.
17
Provides search services to software agents and users, allowing them to access metadata and navigate the Semantic Web
◦ Swoogle Search: searches SWDs using constraints on URLs, SWTs being used or defined, etc.
◦ Ontology Dictionary: searches ontologies at the term level and offers more navigational paths.
18
SWD metadata is collected to make SWD search more efficient and effective.
It is derived from the content of SWDs as well as the relations among SWDs.
Three categories of metadata:
◦ Basic metadata
◦ Relations among SWDs
◦ Analytical results
19
Language Features: properties describing the syntactic or semantic features of an SWD.
◦ Encoding: the syntactic encoding of an SWD, one of “RDF/XML”, “N-TRIPLE” and “N3”.
◦ Language: the language used by an SWD, one of “OWL”, “DAML+OIL”, “RDFS” and “RDF”.
◦ OWL Species: the language species of an SWD written in OWL, one of “OWL-LITE”, “OWL-DL” and “OWL-FULL”.
20
RDF Statistics: properties summarizing the node distribution of the RDF graph of an SWD.
◦ How an SWD defines new classes, properties and individuals.
◦ Let foo be an SWD and let C(foo), P(foo), I(foo) be the sets of classes, properties and individuals defined in foo, respectively. The ontology ratio R(foo) is calculated by:
$$R(\mathrm{foo}) = \frac{|C(\mathrm{foo})| + |P(\mathrm{foo})|}{|C(\mathrm{foo})| + |P(\mathrm{foo})| + |I(\mathrm{foo})|}$$
◦ R(foo) ranges from 0 to 1, where 0 implies that foo is a pure SWDB and 1 implies that foo is a pure SWO.
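A minimal sketch of the ontology ratio, assuming it is the share of defined classes and properties among all defined nodes (classes, properties and individuals). The function name `ontology_ratio` and the handling of an empty document are illustrative assumptions, not part of Swoogle.

```python
def ontology_ratio(classes, properties, individuals):
    """Share of defined nodes that are terms (classes or properties)."""
    terms = len(classes) + len(properties)
    total = terms + len(individuals)
    if total == 0:
        # Assumption: the ratio is left undefined for a document that defines nothing.
        return None
    return terms / total


# A document that defines one class and populates it with three individuals.
color_doc = ontology_ratio({"Color"}, set(), {"blue", "green", "red"})

# A document that only defines terms behaves like a pure SWO.
pure_swo = ontology_ratio({"Person", "Document"}, {"mbox"}, set())

# A document that only introduces individuals behaves like a pure SWDB.
pure_swdb = ontology_ratio(set(), set(), {"alice", "bob"})
```

Under this assumption the first document is mostly instance data (one term out of four defined nodes), so it sits closer to the SWDB end of the scale.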
21
Ontology Annotations: properties that describe an SWD as an ontology.
◦ Apply when the SWD has an instance of owl:Ontology.
◦ Swoogle records the following properties: label (rdfs:label), comment (rdfs:comment), versionInfo (owl:versionInfo/daml:versionInfo).
22
Capturing and analyzing relations at the RDF node level is hard.
Swoogle generalizes RDF node level relations and focuses on SWD level relations.
Swoogle captures the following SWD level relations:
◦ TM/IN: an SWD uses terms defined by other SWDs.
◦ IM: an ontology imports another ontology.
◦ EX: an ontology extends another ontology.
◦ PV: an ontology is a prior version of another.
◦ CPV: an ontology is a prior version of another and is compatible with it.
◦ IPV: an ontology is a prior version of another and is incompatible with it.
23
Indicators of inter-ontology relation
24
OntologyRank is inspired by Google’s PageRank algorithm.
Underlying random surfing model:
◦ The surfer jumps to a random URL.
◦ With probability d, the surfer randomly chooses a link to follow.
◦ With probability 1-d, the surfer jumps to another random URL.
25
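Given a document A, A’s PageRank is computed by:
$$PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)} \right)$$
where T_1, …, T_n are the web documents that link to A; C(T_i) is the total number of outlinks of T_i; and d is a damping factor, typically set to 0.85.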
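The PageRank recurrence can be evaluated by simple fixed-point iteration. The sketch below assumes the classic non-normalized form, in which each page receives (1 - d) plus d times the rank of every linking page divided by that page's outlink count. The function name `pagerank`, the fixed iteration count and the toy graph are illustrative choices.

```python
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page passes on its rank divided by its outlink count.
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Toy web of three pages: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
```

In this toy graph C collects rank from both A and B, so it ends up ranked above B, which is linked only from A.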
27
The graph formed by SWDs has a richer set of relations
◦ The edges have explicit semantics.
Users can navigate the Semantic Web within or across the web and the RDF graph through 7 groups of navigational paths.
29
The semantics of links lead to a non-uniform probability of following a particular outgoing link.
Given SWDs A and B, Swoogle classifies inter-SWD links into four categories:
◦ imports(A,B): A imports all content of B.
◦ uses-term(A,B): A uses some of the terms defined by B (without importing B).
◦ extends(A,B): A extends the definitions of terms defined by B.
◦ asserts(A,B): A makes assertions about the individuals defined by B.
Each category is assigned a different weight, which represents the probability of following that kind of link.
30
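Given an SWD a, Swoogle computes its raw rank by:
$$rawPR(a) = (1 - d) + d \sum_{x \in L(a)} rawPR(x) \frac{f(x, a)}{f(x)}$$
$$f(x, a) = \sum_{l \in links(x, a)} weight(l), \qquad f(x) = \sum_{b \in T(x)} f(x, b)$$
where L(a) is the set of SWDs that link to a, T(x) is the set of SWDs that x links to, links(x, a) is the set of links from x to a, and weight(l) is the weight of the category of link l.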
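A sketch of a weighted variant of the random surfer, assuming that the share of rank a document x passes to a is the total weight of the links from x to a divided by the total weight of all links leaving x. The weight values in `WEIGHTS`, the function name `raw_rank` and the sample documents are hypothetical; the slides state that each link category has its own weight but do not give the numbers.

```python
from collections import defaultdict

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"imports": 4.0, "extends": 3.0, "uses-term": 2.0, "asserts": 1.0}


def raw_rank(links, d=0.85, iterations=50):
    """links is a list of (source, category, target) triples between SWDs."""
    docs = {s for s, _, _ in links} | {t for _, _, t in links}
    # f[x][a] accumulates the total weight of all links from x to a.
    f = defaultdict(lambda: defaultdict(float))
    for source, category, target in links:
        f[source][target] += WEIGHTS[category]
    rank = {doc: 1.0 for doc in docs}
    for _ in range(iterations):
        new_rank = {}
        for a in docs:
            # x passes on its rank in proportion to the weight of its links to a.
            incoming = sum(
                rank[x] * f[x][a] / sum(f[x].values())
                for x in docs
                if a in f[x]
            )
            new_rank[a] = (1 - d) + d * incoming
        rank = new_rank
    return rank


sample_links = [
    ("profile.rdf", "uses-term", "foaf"),
    ("profile.rdf", "asserts", "other.rdf"),
    ("myont.owl", "imports", "foaf"),
]
raw = raw_rank(sample_links)
```

Here `foaf` receives two thirds of the rank of `profile.rdf` and all of the rank of `myont.owl`, while `other.rdf` receives only the remaining third of `profile.rdf`.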
31
Then, Swoogle computes the rank for SWDBs and SWOs by:
$$PR_{SWDB}(a) = rawPR(a)$$
$$PR_{SWO}(a) = \sum_{x \in TC(a)} rawPR(x)$$
where TC(a) is the transitive closure of SWOs imported by a.
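A sketch of the ontology rank, assuming it sums raw ranks over the transitive closure of the imports relation. Whether the closure includes the document itself is not stated in the slides, so the `include_self` flag below is an assumption, as are the function names and the sample data.

```python
def import_closure(doc, imports):
    """All ontologies reachable from doc by following imports edges."""
    closure = set()
    stack = list(imports.get(doc, ()))
    while stack:
        current = stack.pop()
        if current not in closure:  # guards against import cycles
            closure.add(current)
            stack.extend(imports.get(current, ()))
    return closure


def swo_rank(doc, imports, raw_ranks, include_self=True):
    """Sum of raw ranks over the import closure of doc."""
    members = import_closure(doc, imports)
    if include_self:
        # Assumption: the ontology's own raw rank is counted as well.
        members = members | {doc}
    return sum(raw_ranks[x] for x in members)


# a imports b, b imports c, and c imports b back (a cycle).
sample_imports = {"a": ["b"], "b": ["c"], "c": ["b"]}
sample_raw = {"a": 1.0, "b": 2.0, "c": 3.0}
rank_a = swo_rank("a", sample_imports, sample_raw)
```

The iterative traversal with a visited set keeps the closure finite even when ontologies import each other in a cycle.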
32
The problem of indexing and searching SWDs
◦ Significant semantic information is encoded in marked-up documents.
◦ Reasoning over a large collection of documents can be expensive.
Traditional information retrieval techniques
◦ Faster (they take a coarse view of the text).
◦ Can quickly retrieve a set of SWDs based on similarities of the source text alone.
33
SWDs are not entirely markup
◦ Search should be applied to both the structured and unstructured components of the document.
We may want SWDs to be available to commonly used search engines
◦ Documents must be transformed into a form that a standard IR engine can understand and manipulate.
Standard IR offers well-researched methods for ranking matches, computing similarities between documents and employing relevance feedback.
34
Look at a document as a collection of either tokens or N-Grams.
URIrefs of classes, properties and individuals correspond to words in natural languages.
Apply the following process to an SWD
◦ Reduce it to triples.
◦ Extract URIrefs (with duplicates).
◦ Discard URIrefs of blank nodes.
◦ Hash each URIref to a token.
◦ Index the document.
The index is built by either N-Grams or URIrefs.
Example: matching “time” to:
◦ http://foo.com/timeont.owl#timeInterval
◦ http://foo.com/timeont.owl#calendarClockInterval
◦ http://purl.org/upper/temporal/t13.owl#timeThing
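The difference between the two indexing granularities can be sketched on the “time” example. The code below is an illustration only: it assumes character n-grams over the whole URIref on one side and camel-case word tokens from the local name on the other, and the helper names `char_ngrams`, `ngram_match` and `local_name_words` are hypothetical.

```python
import re


def char_ngrams(text, n):
    """Set of lowercase character n-grams of text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_match(query, uriref, n=4):
    """True if every character n-gram of the query occurs in the URIref."""
    n = min(n, len(query))
    if n == 0:
        return False
    return char_ngrams(query, n) <= char_ngrams(uriref, n)


def local_name_words(uriref):
    """Split the part after the last '#' or '/' into lowercase words."""
    local = re.split(r"[#/]", uriref)[-1]
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", local)
    return [word.lower() for word in words]


urirefs = [
    "http://foo.com/timeont.owl#timeInterval",
    "http://foo.com/timeont.owl#calendarClockInterval",
    "http://purl.org/upper/temporal/t13.owl#timeThing",
]

ngram_hits = [u for u in urirefs if ngram_match("time", u)]
word_hits = [u for u in urirefs if "time" in local_name_words(u)]
```

With n-grams the query matches all three URIrefs, because “time” also occurs in the namespace `timeont.owl`; with word tokens taken from the local name only, `calendarClockInterval` is not matched.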