Swoogle: search and metadata for the semantic web. Presented by the eBiquity group, UMBC, at CIKM’04, Nov 12, 2004.


1 Swoogle: search and metadata for the semantic web
  Presented by the eBiquity group, UMBC, at CIKM’04, Nov 12, 2004
  Partial research support was provided by DARPA contract F30602-00-0591 and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.

2 Outline
  - Motivation
  - Concepts
  - Demo
  - Architecture
    - document discovery
    - metadata creation
    - ontology rank
  - Status
  - Summary
  http://swoogle.umbc.edu/

3 Motivation
  - (Google + Web) has made us all smarter.
  - Something similar is needed by people and software agents for information on the semantic web.

4 Motivation – Common Questions
  Find an ontology
    - What are the ontologies about “time”?
    - Shall I use an existing ontology or create one?
  Find instance data
    - Show me the instances of the class “http://foo.com/Person”.
    - Gather relevant information for my application.
  Characterize the Semantic Web
    - How many RDF documents are online?
    - What are the most popular ontologies?
    - What graph properties does the semantic web have?
    - Does a namespace URI link to the corresponding ontology?

5 The Role of Swoogle in the Semantic Web
  [Diagram. Node labels: Semantic Web Services; Data Service; Software Agents, Applications; SW data service; database; (Web) document; RDF document; Directory/Digest Service (Swoogle); Service Finder; Data Finder. Edge labels: uses, digests, searches.]

6 Related work
  Ontology-based annotation & search
    - Annotate web documents: SHOE (UMCP, 1997), Ontobroker (AIFB, Karlsruhe, 1998), WebKB (Martin & Eklund, 1999), QuizRDF (BT, 2002)
    - Annotate proper reference & relations: CREAM (AIFB, 2003)
  Ontology repositories
    - Ontology level: DAML Ontology Library, Schema Web, SemWebCentral
    - Term level: W3C’s Ontaria (2004)
  Ontology management systems
    - Stanford’s Ontolingua
    - IBM’s Snobase
  Swoogle aims to be a Google-like online ontology repository
    - Based on both ontology and instance documents
    - Automated discovery
    - Search and rank ontologies and terms
    - Digest but not store
    - Create metadata based on RDF and OWL semantics
    - Provide services to both human and software agents

7 Concepts
  Document
    - A Semantic Web Document (SWD) is an online document written in semantic web languages (i.e. RDF and OWL).
    - An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
    - An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
  Term
    - A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
    - Example: foaf:Person rdf:type rdfs:Class
  Individual
    - An individual is a non-anonymous RDF resource which is the URI reference of a class member.
    - Example: http://.../foaf.rdf#finin rdf:type foaf:Person
  In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.
  *JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
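
The validity rule on this slide can be sketched as follows. This is an illustration only, not Swoogle's code: the slide names Jena (a Java framework) as the parser, whereas here `parse` is a hypothetical stand-in callable, and `toy_ntriples_parser` is an invented, deliberately naive parser for whitespace-separated triples that exists only to make the example runnable.

```python
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]


def is_valid_swd(text: str, parse: Callable[[str], Iterable[Triple]]) -> bool:
    """Return True iff `parse` succeeds and yields at least one triple."""
    try:
        triples = list(parse(text))
    except ValueError:
        # A parse failure means the document is not a valid SWD.
        return False
    return len(triples) >= 1


def toy_ntriples_parser(text: str) -> Iterable[Triple]:
    """Hypothetical stand-in: one 'subject predicate object .' per line."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.rstrip(".").split()
        if len(parts) != 3:
            raise ValueError(f"cannot parse line: {line!r}")
        yield (parts[0], parts[1], parts[2])
```

An empty document parses without error but yields no triples, so it is rejected just like a document that fails to parse.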

8 Concepts Example
  [Diagram: an ontology document (SWO) and an instance document (SWI)]
  SWO: http://xmlns.com/foaf/1.0/
    - foaf:Person rdf:type rdfs:Class (Class)
    - foaf:Person rdfs:subClassOf wordNet:Agent
    - foaf:mbox rdf:type rdf:Property (Property)
    - foaf:mbox rdfs:domain foaf:Person
  SWI: http://foo.com/foaf.rdf
    - http://foo.com/foaf.rdf#finin rdf:type foaf:Person (Individual)
    - http://foo.com/foaf.rdf#finin foaf:mbox finin@umbc.edu
  Classes and properties are Terms; SWOs and SWIs are SWDs.
  NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows
    rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
    rdfs: => http://www.w3.org/2000/01/rdf-schema#
    foaf: => http://xmlns.com/foaf/1.0/
    wordNet: => http://xmlns.com/wordnet/1.6/
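
The QName shorthand in the note can be sketched as a small prefix expansion. The table holds the four prefixes listed on the slide, with the `rdfs:` namespace written with its trailing `#` as in the W3C namespace URI. The function name `expand_qname` is illustrative and not part of Swoogle.

```python
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/1.0/",
    "wordNet": "http://xmlns.com/wordnet/1.6/",
}


def expand_qname(qname: str) -> str:
    """Expand 'prefix:local' to a full URI reference if the prefix is known."""
    prefix, sep, local = qname.partition(":")
    if sep and prefix in NAMESPACES:
        return NAMESPACES[prefix] + local
    # Unknown prefix (or already a full URI such as 'http://...'): unchanged.
    return qname
```

For example, `expand_qname("foaf:Person")` yields `http://xmlns.com/foaf/1.0/Person`, while a full URI such as `http://foo.com/foaf.rdf#finin` is returned as-is because `http` is not a registered prefix.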

9 Demo
  1. Find “Time” Ontology (Swoogle Search)
  2. Digest “Time” Ontology
     - Document view
     - Term view
  3. Find Term “Person” (Ontology Dictionary)
  4. Digest Term “Person”
     - Class properties
     - (Instance) properties
  5. Swoogle Statistics

10 Find “Time” Ontology (Demo 1)
  We can use a set of keywords to search for ontologies. For example, “time, before, after” are basic concepts for a “Time” ontology.

11 Usage of Terms in SWD
  [Diagram: how terms are used in two documents]
  http://xmlns.com/foaf/1.0/
    - foaf:Person rdf:type rdfs:Class (defined Class)
    - foaf:Person rdfs:subClassOf wordNet:Agent
    - foaf:mbox rdf:type rdf:Property (defined Property)
    - foaf:mbox rdfs:domain foaf:Person
  http://foo.com/foaf.rdf (the diagram also shows http://www.cs.umbc.edu/~finin/foaf.rdf with the same instance data)
    - http://foo.com/foaf.rdf#finin (defined Individual)
    - http://foo.com/foaf.rdf#finin rdf:type foaf:Person (populated Class)
    - http://foo.com/foaf.rdf#finin foaf:mbox finin@umbc.edu (populated Property)
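
The five usage labels on this slide (defined/populated Class, defined/populated Property, defined Individual) can be sketched as a pass over a document's triples. This is a simplified reading of the diagram, not Swoogle's analyzer: the function name, the result keys, and the rule that properties from the `rdf:`, `rdfs:` and `owl:` vocabularies are not counted as populated are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}
BUILTIN_PREFIXES = ("rdf:", "rdfs:", "owl:")  # assumed to be excluded


def term_usage(triples: Iterable[Triple]) -> Dict[str, Set[str]]:
    """Group the terms and individuals of one document by how they are used."""
    usage: Dict[str, Set[str]] = {
        "defined_classes": set(),
        "defined_properties": set(),
        "populated_classes": set(),
        "populated_properties": set(),
        "defined_individuals": set(),
    }
    for subject, predicate, obj in triples:
        if predicate == "rdf:type" and obj in CLASS_TYPES:
            usage["defined_classes"].add(subject)
        elif predicate == "rdf:type" and obj in PROPERTY_TYPES:
            usage["defined_properties"].add(subject)
        elif predicate == "rdf:type":
            # Typing a resource with a user class populates that class.
            usage["populated_classes"].add(obj)
            usage["defined_individuals"].add(subject)
        elif not predicate.startswith(BUILTIN_PREFIXES):
            usage["populated_properties"].add(predicate)
    return usage


ontology_doc = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
instance_doc = [
    ("http://foo.com/foaf.rdf#finin", "rdf:type", "foaf:Person"),
    ("http://foo.com/foaf.rdf#finin", "foaf:mbox", "finin@umbc.edu"),
]
```

Running `term_usage` on `ontology_doc` reports `foaf:Person` as a defined class and `foaf:mbox` as a defined property; on `instance_doc` the same two terms show up as populated instead.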

12 Digest “Time” Ontology (term view) Demo 2(a)
  [Screenshot: term list including TimeZone, before, intAfter]

13 Document Metadata
  Web document metadata
    - When/how discovered/fetched
    - Suffix of URL
    - Last modified time
    - Document size
  SWD metadata
    - Language features: OWL species, RDF encoding
    - Statistical features: defined/used terms, declared/used namespaces, Ontology Ratio
    - Ontology Rank
  Ontology annotation
    - Label
    - Version
    - Comment
  Related Relational Metadata
    - Links to other SWDs: imported SWDs, referenced SWDs, extended SWDs, prior version
    - Links to terms: classes/properties defined/used

14 Digest “Time” Ontology (document view) Demo 2(b)

15 Find Term “Person” Demo 3 Not capitalized! URIref is case sensitive!

16 Term Metadata: An integrated definition
  Integrated definition of foaf:Person
    Class Definition
      - rdfs:subClassOf -- foaf:Agent
      - rdfs:label -- “Person”
    Properties (from SWI)
      - foaf:name
      - dc:title
    Properties (from SWO)
      - foaf:mbox
      - foaf:name
  [Diagram: the definition is assembled from three documents (Onto 1, Onto 2, SWD3). Statements shown: foaf:mbox rdfs:domain foaf:Person; foaf:Person rdf:type owl:Class; foaf:Person rdfs:label “Person”; foaf:Person rdfs:subClassOf foaf:Agent; an individual with rdf:type foaf:Person and foaf:name “Tim Finin”.]

17 Digest Term “Person” Demo 4 167 different properties 562 different properties

18 Demo 5 Swoogle Statistics

19 Swoogle Architecture
  [Diagram. Stages: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]

20 1. SWD Discovery
  Swoogle uses three crawlers to discover likely SWD URLs
    - A Google Crawler uses Google to find URLs using
        keywords: http://www.w3.org/2000/01/rdf-schema, ...
        file type suffixes: .rdf, .owl
    - A Focused Crawler crawls through HTML files recursively within a given website.
    - A SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
  To determine the likely SWD URLs:
    - Non-SWD extension filter: .jpg, .mp3, etc.
    - Protocol filter: file://, urn:, etc.
    - Namespace of RDF resources in SWD
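
The two URL filters listed on this slide can be sketched with the standard library. Only the extensions and protocols named on the slide are included (the slide ends both lists with "etc."), and the function name `is_candidate_swd_url` is hypothetical. The third item, collecting namespaces of RDF resources, is a source of new candidates and is not modeled here.

```python
from urllib.parse import urlparse

# Only the examples given on the slide; a real list would be longer.
NON_SWD_EXTENSIONS = (".jpg", ".mp3")
REJECTED_SCHEMES = {"file", "urn"}


def is_candidate_swd_url(url: str) -> bool:
    """Return False for URLs the two filters would discard, True otherwise."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in REJECTED_SCHEMES:
        return False  # protocol filter
    if parsed.path.lower().endswith(NON_SWD_EXTENSIONS):
        return False  # non-SWD extension filter
    return True
```

Passing a URL through both filters only makes it a candidate; whether it is actually a SWD is decided later, when the document is fetched and parsed.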

21 2. Metadata Creation
  Document metadata
    - General metadata
    - SWD metadata
    - Ontology metadata
  Term Metadata (definition)
    - Class property
    - (Instance) property: i.e. class-property bond
  Relational metadata
               | Term                             | Document
    Term       | rdfs:subClassOf, rdfs:domain, …  | rdfs:seeAlso, …
    Document   | Uses, Defines, …                 | owl:imports, …

22 2.1 Ontology Ratio
  Why?
    - The distinction between ontology and instance documents is fuzzy
  Given a SWD foo, let
    - C(foo): the set of classes defined in foo
    - P(foo): the set of properties defined in foo
    - I(foo): the set of instances defined in foo
  Ontology Ratio as a heuristic for the classification
    - 0: pure SWI
    - 1: pure SWO
    - > 0.8: foo is said to be an ontology.
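
A minimal sketch of this classification heuristic, assuming the ratio is R(foo) = (|C(foo)| + |P(foo)|) / (|C(foo)| + |P(foo)| + |I(foo)|), which matches the two endpoints on the slide (0 for a pure SWI, 1 for a pure SWO). The function names and the handling of an empty document are illustrative choices, not Swoogle's.

```python
def ontology_ratio(num_classes: int, num_properties: int, num_instances: int) -> float:
    """Fraction of defined terms among all defined terms and instances (assumed form)."""
    terms = num_classes + num_properties
    total = terms + num_instances
    if total == 0:
        # Nothing is defined in the document, so the ratio is undefined.
        raise ValueError("document defines no classes, properties or instances")
    return terms / total


def is_ontology(ratio: float, threshold: float = 0.8) -> bool:
    """Apply the slide's rule: a ratio strictly above 0.8 means 'ontology'."""
    return ratio > threshold
```

Under this assumed formula, a document that defines 8 classes, 1 property and 1 instance has a ratio of 0.9 and is classified as an ontology, while one with 4 classes, 4 properties and 2 instances sits exactly at 0.8 and is not.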

23 2.2 Relational Metadata
  Inter-document relation
    - rdfs:seeAlso
    - IMport (IM), e.g. owl:imports
    - Similar/Equal SWD
  Inter-term relation
    - EXtension (EX), e.g. rdfs:subClassOf
    - use-TerM (TM), e.g. rdfs:range
    - use-INdividual (IN), e.g. owl:sameAs
    - Prior Version (PV, IPV, CPV)
  Generalized inter-document relations
    - Generalized from individual-level relations
    - Capture more relations with less complexity
  Usage
    - Link SWDs
    - Ontology rank

24 3. Data analysis: Ranking SWD
  [Diagram: kinds of documents on the web: SWOs, SWIs, HTML documents, Images, Audio files, Video files]
  Why?
    - Ranking captures page importance and popularity
    - Ranking has proven useful in HTML search.
    - SWDs differ from HTML documents and have more semantics
    - So, a new SWD ranking mechanism is needed!
  Related ideas?
    - Google’s PageRank
    - Kleinberg’s HITS

25 3.1 Random surfer model (PageRank)
  How is PageRank computed?
    - Page A’s rank is
        $$PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
      where {T_i} are the pages that link to A, C(X) is the number of page X’s out-links, and d is a damping factor (e.g., 0.85)
    - Compute by iterating until convergence
  Uniform probability of following any link is the convention on the Web but not in the SW
    - Links have semantics that influence the probability of following them
    - Rational users read an ontology and all ontologies it references.
  [Flowchart: read page; bored? no: follow a random link; yes: jump to a random page]
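
The "iterate until convergence" step can be sketched as follows, assuming the non-normalized form PR(A) = (1 - d) + d * sum(PR(T_i) / C(T_i)) over the pages T_i that link to A. The function name, the starting rank of 1.0, and the tolerance and iteration cap are illustrative choices. This is plain PageRank with uniform link probabilities, not Swoogle's weighted variant.

```python
from typing import Dict, Iterable


def pagerank(links: Dict[str, Iterable[str]], d: float = 0.85,
             tol: float = 1e-9, max_iter: int = 1000) -> Dict[str, float]:
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until ranks stop changing."""
    out_links = {page: set(targets) for page, targets in links.items()}
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    if not pages:
        return {}

    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {}
        for page in pages:
            # Each linking page T shares its rank equally among its C(T) out-links.
            incoming = sum(rank[src] / len(targets)
                           for src, targets in out_links.items()
                           if page in targets)
            new_rank[page] = (1 - d) + d * incoming
        delta = max(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

With two pages A and B that both link only to C, A and B settle at 0.15 each and C at 0.15 + 0.85 * (0.15 + 0.15) = 0.405.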

26 3.2 Rational Random Surfer Model
  - Weighted random behavior
  - Rational behavior
  - Rank of a SWI
  - Rank of a SWO, where TC(A) is the transitive closure of SWOs referencing A
  [Flowchart: read page; SWO? yes: read referenced SWOs; bored? no: follow a random link; yes: jump to a random page]

27 3.3 Ontology Rank Example
  [Diagram: four documents and the relations between them]
    - http://www.cs.umbc.edu/~finin/foaf.rdf: an individual with rdf:type foaf:Person and foaf:mbox finin@umbc.edu
    - http://xmlns.com/foaf/1.0/: foaf:Person rdf:type rdfs:Class; foaf:Person rdfs:subClassOf wordNet:Person
    - http://xmlns.com/wordnet/1.6/: wordNet:Person rdf:type rdfs:Class; wordNet:Person rdfs:subClassOf wordNet:Individual
    - http://www.w3.org/2000/01/rdf-schema: rdfs:subClassOf rdf:type rdf:Property
  Relation labels shown between documents: TM, EX

28 3.3 Ontology Rank Example (cont’d)
  [Diagram: the documents http://www.cs.umbc.edu/~finin/foaf.rdf, http://xmlns.com/wordnet/1.6/, http://xmlns.com/foaf/1.0/ and http://www.w3.org/2000/01/rdf-schema, linked by EX and TM relations and annotated with rawPR values (0.2, 100, 3, 300) and PR values (0.2, 100, 103, 403)]
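
The PR values in this example can be reproduced with a short sketch under assumptions inferred from the numbers rather than stated on the slides: the rank of a SWI equals its rawPR; the rank of a SWO is its own rawPR plus the rawPR of every SWO that references it directly or transitively; the reference chain is foaf.rdf -> foaf -> wordnet -> rdf-schema; and the rawPR values are paired with documents as shown in `raw_pr`. The function and variable names are hypothetical.

```python
from typing import Dict, Set

FOAF_RDF = "http://www.cs.umbc.edu/~finin/foaf.rdf"
FOAF = "http://xmlns.com/foaf/1.0/"
WORDNET = "http://xmlns.com/wordnet/1.6/"
RDFS = "http://www.w3.org/2000/01/rdf-schema"

# Assumed pairing of rawPR values with documents.
raw_pr = {FOAF_RDF: 0.2, FOAF: 100, WORDNET: 3, RDFS: 300}
# references[x] is the set of documents that x references (assumed chain).
references = {FOAF_RDF: {FOAF}, FOAF: {WORDNET}, WORDNET: {RDFS}}
ontologies = {FOAF, WORDNET, RDFS}


def referencing_swos(target: str, references: Dict[str, Set[str]],
                     ontologies: Set[str]) -> Set[str]:
    """SWOs that reference `target` directly or through other SWOs."""
    found: Set[str] = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for doc, refs in references.items():
            if (current in refs and doc in ontologies
                    and doc != target and doc not in found):
                found.add(doc)
                stack.append(doc)
    return found


def ontology_rank(raw_pr: Dict[str, float], references: Dict[str, Set[str]],
                  ontologies: Set[str]) -> Dict[str, float]:
    """SWI: PR = rawPR. SWO: rawPR plus rawPR of all SWOs referencing it."""
    ranks = {}
    for doc, raw in raw_pr.items():
        if doc in ontologies:
            others = referencing_swos(doc, references, ontologies)
            ranks[doc] = raw + sum(raw_pr[x] for x in others)
        else:
            ranks[doc] = raw
    return ranks


ranks = ontology_rank(raw_pr, references, ontologies)
```

Under these assumptions the instance document keeps 0.2, FOAF keeps 100 (it is referenced only by an instance document), WordNet becomes 3 + 100 = 103, and RDF Schema becomes 300 + 3 + 100 = 403.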

29 Current Status
  Swoogle Watch reported (Nov 7, 2004)
    - 40 M triples
    - 270 K SWDs: 4 K ontologies
    - 144 K terms: 91 K classes & 51 K properties
  Ongoing work
    - Ontology Dictionary
    - Swoogle Statistics
    - Web Service interface (see Swoogle website)
    - IR with the Semantic Web (content search): Character N-Grams, Bag of URIrefs, Swangling

30 Summary
  Swoogle (Mar, 2004)
    - Automated SWD discovery
    - SWD metadata creation and search
    - Ontology rank (rational surfer model)
    - Swoogle watch
    - Web Interface
  Swoogle2 (Sep, 2004)
    - Ontology dictionary
    - Swoogle statistics
    - Web service interface (WSDL)
    - Bag of URIref IR search
  Swoogle3 (2005)
    - Better crawl & refresh strategies
    - More metadata (ontology mapping)
    - More IR features
    - Better web service interfaces
    - Capture and store all triples
    - More reasoning

31 The End
  Website: http://swoogle.umbc.edu
  Slides at: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
  Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/

