Swoogle Tutorial (Part I: Swoogle R & D)
  A brief introduction to Swoogle
  An overview of Swoogle research
  A summary of Swoogle development
Presented by eBiquity Lab, CSEE, UMBC
1. Introduction
  Motivation
  Swoogle in the Semantic Web
  Glossary
  Swoogle Architecture
Motivation
  (Google + Web) has made us all smarter.
  Something similar is needed by people and software agents for information on the Semantic Web.
The Role of Swoogle in the Semantic Web
  [Diagram. Labels: Semantic Web services; Semantic Web data (SW data service, database, (Web) document, RDF document); Software agents, applications; Directory/Digest Service (Swoogle) with Service Finder and Data Finder. Edge labels: uses, digests, searches.]
Concepts Explained
  [Diagram. Node labels: wordNet:Agent, rdfs:Class, foaf:Person, foaf:mbox, rdf:Property. Edge labels: rdf:type, rdfs:subClassOf, rdfs:domain. Annotations: Class, Property, Term, Individual, SWO, SWI, SWD.]
  NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows: rdf: => rdfs: => foaf: => wordNet: =>
Glossary
  Document
    A Semantic Web Document (SWD) is an online document written in Semantic Web languages (i.e. RDF and OWL).
    An ontology document (SWO) is an SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
    An instance document (SWI or SWDB) is an SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
  Term
    A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
  Individual
    An individual is a non-anonymous RDF resource which is the URI reference of a class member.
  In Swoogle, a document D is a valid SWD if and only if Jena* correctly parses D and produces at least one triple.
  * Jena is a Java framework for writing Semantic Web applications.
  [Figure labels: rdf:type, rdfs:Class, foaf:Person (term); rdf:type, foaf:Person (individual)]
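The validity rule in the glossary can be sketched as follows. This is a minimal illustration, not Swoogle's code: Swoogle relies on Jena, while the hypothetical `parse_ntriples` below is a tiny stand-in that accepts only very simple N-Triples lines, so its notion of "parses correctly" is far narrower than Jena's.

```python
import re

# Stand-in for a real RDF parser (assumption: Swoogle uses Jena, not this).
# Accepts only lines of the form `<s> <p> <o> .` or `<s> <p> "literal" .`.
_TRIPLE = re.compile(
    r'^\s*<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.\s*$'
)

def parse_ntriples(text):
    """Return a list of (s, p, o) tuples; raise ValueError on a bad line."""
    triples = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        m = _TRIPLE.match(line)
        if m is None:
            raise ValueError(f"cannot parse line: {line!r}")
        s, p, o_uri, o_lit = m.groups()
        triples.append((s, p, o_uri if o_uri is not None else o_lit))
    return triples

def is_valid_swd(text, parser=parse_ntriples):
    """Glossary rule: valid iff the parser succeeds and yields >= 1 triple."""
    try:
        return len(parser(text)) >= 1
    except ValueError:
        return False

# Hypothetical example document (example.org URI is made up).
DOC = ('<http://example.org/alice> '
       '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
       '<http://xmlns.com/foaf/0.1/Person> .')
print(is_valid_swd(DOC))
print(is_valid_swd("<html><body>hello</body></html>"))
```

Passing the parser as a parameter keeps the rule itself (parse succeeds, at least one triple) separate from whichever RDF parser is actually used.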
Swoogle Architecture
  [Diagram. Layers: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]
2. Swoogle Research
  Discovery
  Digest
  Search & Navigation
  Rank
  Statistics
Discovery -- research
  Discovering URLs of possible SWDs automatically
    Google crawler
    Focused crawler
    Semantic Web crawler, e.g. scutter
  Revisiting URLs
Discovery -- results
  Crawler performance
    Google crawler is the best
    Focused crawler needs to be improved
  Verified pure SWDs are only about 1/3 of discovered URLs
  Some NSWDs contain embedded RDF graphs

  Crawler            SWD              NSWD             Undecided   TOTAL
  Focused Crawler      1,465  (7%)     10,580 (52%)      8,292        20,337
  google crawler     273,023 (36%)    369,371 (49%)    110,794       753,188
  swd_crawler         61,870 (15%)    285,506 (70%)     57,709       405,085
  TOTAL              336,358          665,457          176,795     1,178,610

  Source: Swoogle (2005-Jan-05)
    SELECT `discovered_by`, sum(isRDF), sum(1-isRDF), count(*)
    FROM `digest_url`
    WHERE 1
    GROUP BY discovered_by
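The per-crawler query on this slide can be tried on toy data with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types, the toy rows, and the idea that undecided URLs store NULL in `isRDF` (so both sums skip them while `count(*)` still counts them) are guesses, not details confirmed by the slide.

```python
import sqlite3

def crawler_summary(rows):
    """Run the slide's GROUP BY query on an in-memory table."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema; only discovered_by and isRDF appear on the slide.
    conn.execute(
        "CREATE TABLE digest_url (url TEXT, discovered_by TEXT, isRDF INTEGER)"
    )
    conn.executemany("INSERT INTO digest_url VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT discovered_by, sum(isRDF), sum(1 - isRDF), count(*) "
        "FROM digest_url WHERE 1 "
        "GROUP BY discovered_by ORDER BY discovered_by"
    ).fetchall()
    conn.close()
    return result

# Made-up rows: isRDF is 1 (SWD), 0 (NSWD) or None (undecided, assumption).
TOY_ROWS = [
    ("http://example.org/a.rdf", "google crawler", 1),
    ("http://example.org/b.html", "google crawler", 0),
    ("http://example.org/c", "google crawler", None),
    ("http://example.org/d.owl", "swd_crawler", 1),
    ("http://example.org/e.html", "swd_crawler", 0),
    ("http://example.org/f.html", "swd_crawler", 0),
]

for crawler, swd, nswd, total in crawler_summary(TOY_ROWS):
    # Undecided = rows counted by count(*) but skipped by both sums.
    print(crawler, swd, nswd, total - swd - nswd, total)
```

Under the NULL assumption, the Undecided column of the table falls out as the difference between the total and the two sums.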
Digest -- research
  Document metadata
    Annotative
      General metadata
      SWD metadata
      Ontology metadata
    Inter-document relations
    Document-term relations
  Term metadata
    Term Definition
    Inter-term Relation
      Class-property bond (C-P bond): rdfs:domain
      Property-Class bond (P-C bond): rdfs:range
Document Metadata
  Web document metadata
    When/how discovered/fetched
    Suffix of URL
    Last modified time
    Document size
  SWD metadata
    Language features
      OWL species
      RDF encoding
    Statistical features
      # of defined/used terms
      # of declared/used namespaces
      Ontology Ratio
      Ontology Rank
  Ontology annotation
    Label
    Version
    Comment
  Relations
    Links to other SWDs
      Imported SWDs
      Referenced SWDs
      Extended SWDs
      Prior version
    Links to terms
      Classes/properties defined
      Classes/properties used
Digest “Time” Ontology (document view) Demo 2(a)
Document-Term Relation
  [Diagram. Node labels: wordNet:Agent, rdfs:Class, foaf:Person, foaf:mbox, rdf:Property. Edge labels: rdf:type, rdfs:subClassOf, rdfs:domain. Annotations: defined Class, populated Class, defined Property, populated Property, defined Individual.]
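One simplified reading of the defined/populated distinction can be sketched over plain triples. The rules below are assumptions for illustration: a term counts as "defined" when the document types it as a class or property, and as "populated" when it is used as the object of rdf:type (class) or as a predicate (property). The triple directions are a reading of the flattened diagram, and `ex:alice` is a hypothetical individual.

```python
RDF_TYPE = "rdf:type"
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property"}

def document_term_relations(triples):
    """Classify the terms of one document (simplified, assumed rules)."""
    rel = {
        "defined_class": set(),
        "defined_property": set(),
        "populated_class": set(),
        "populated_property": set(),
    }
    for s, p, o in triples:
        rel["populated_property"].add(p)   # every predicate is a used property
        if p == RDF_TYPE:
            rel["populated_class"].add(o)  # o has s as an instance
            if o in CLASS_TYPES:
                rel["defined_class"].add(s)
            elif o in PROPERTY_TYPES:
                rel["defined_property"].add(s)
    return rel

# Ontology-like document (SWO); directions assumed from the diagram labels.
SWO = [
    ("foaf:Person", "rdf:type", "rdfs:Class"),
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdf:type", "rdf:Property"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
# Instance-like document (SWI); ex:alice and the mailbox are made up.
SWI = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
]

print(document_term_relations(SWO))
print(document_term_relations(SWI))
```

With these rules the ontology document defines foaf:Person and foaf:mbox, while the instance document only populates them.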
Digest “Time” Ontology (term view) Demo 2(b)
Term Metadata
  Term Definition
    rdfs:subClassOf -- foaf:Agent
    rdfs:label -- “Person”
  C-P bond (from SWI)
    foaf:name
    dc:title
  C-P bond (from SWO)
    foaf:mbox
    foaf:name
  [Diagram. Labels: Onto 1, Onto 2, SWD3; foaf:Person, owl:Class, foaf:Agent, foaf:mbox, foaf:name, “Person”, “Tim Finin”; rdf:type, rdfs:label, rdfs:subClassOf, rdfs:domain.]
Digest Term “Person” Demo 4
Term Distribution (grouped by local name)
  case-insensitive / case-sensitive
  Name 656       1  name          source 129
  Person 399     2  Person
  Title 349      3  title         Book 124
  Location 334   4  description   address 121
  Description 288  5  location    Event 117
  Date 257       6  type          Location 114
  Type 242       7  date          author 111
  country 236    8  value         Animal 111
  Address 212    9  Organization  Country 104
  organization      country       language 103
  total             total 76827
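Grouping terms by local name, with and without case folding, can be sketched as below. The splitting rule (everything after the last `#` or `/`) is an assumption about how a local name is obtained, and the example.org URIrefs are made up.

```python
from collections import Counter

def local_name(uriref):
    """Return the part of a URIref after its last '#' or '/' (assumed rule)."""
    cut = max(uriref.rfind("#"), uriref.rfind("/"))
    return uriref[cut + 1:]

def group_by_local_name(urirefs, case_sensitive=True):
    """Count terms per local name, optionally folding case."""
    names = (local_name(u) for u in urirefs)
    if not case_sensitive:
        names = (n.lower() for n in names)
    return Counter(names)

# Illustrative terms; only the FOAF URIref is a real one.
TERMS = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://example.org/onto1#Person",
    "http://example.org/onto2#person",
    "http://example.org/onto2#name",
]

print(group_by_local_name(TERMS, case_sensitive=True))
print(group_by_local_name(TERMS, case_sensitive=False))
```

Because URIrefs are case-sensitive, `Person` and `person` are different terms; the case-insensitive grouping merges them into one bucket.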
Digest -- result
  Ontological Term Distribution (populated, defined)

  type      Pop.  Def.  # term          Total Terms  # populated        Total populated
  class     0     1     83,602 (88%)                 0 (0%)
  class     1     0      3,954  (4%)                 1,002,961 (13%)
  class     1     1      7,065  (7%)    94,621       6,483,485 (87%)     7,486,446
  property  0     1     42,853 (73%)                 0 (0%)
  property  1     0      8,312 (14%)                 2,438,455  (6%)
  property  1     1      7,836 (13%)    59,001       36,899,842 (94%)   39,338,297

  Source: Swoogle (2005-Jan-05)
    SELECT res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0), count(*), sum(cnt_instance_populate)
    FROM `digest_term`
    WHERE 1
    GROUP BY res_type, sign(cnt_instance_populate>0), sign(cnt_swd_def>0)
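The totals and percentages on this slide can be cross-checked with a few lines of arithmetic. The counts below are copied from the slide; the keys are (populated, defined) flags, and the helper name `shares` is just for this sketch.

```python
def shares(counts):
    """Return each count as a rounded percentage of the column total."""
    total = sum(counts.values())
    return {key: round(100 * value / total) for key, value in counts.items()}

# Keys are (populated, defined); values are the slide's "# term" column.
CLASS_TERMS = {(0, 1): 83602, (1, 0): 3954, (1, 1): 7065}
PROPERTY_TERMS = {(0, 1): 42853, (1, 0): 8312, (1, 1): 7836}

# Values are the slide's "# populated" column (instances per group).
CLASS_POPULATED = {(1, 0): 1002961, (1, 1): 6483485}
PROPERTY_POPULATED = {(1, 0): 2438455, (1, 1): 36899842}

print(sum(CLASS_TERMS.values()), shares(CLASS_TERMS))
print(sum(PROPERTY_TERMS.values()), shares(PROPERTY_TERMS))
print(sum(CLASS_POPULATED.values()), shares(CLASS_POPULATED))
print(sum(PROPERTY_POPULATED.values()), shares(PROPERTY_POPULATED))
```

The sums reproduce the slide's totals (94,621 classes and 59,001 properties), which helps when reading the flattened table.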
Search & Navigation -- research
  The Semantic Web is not the Web
  Search service
    Document search: an RDF document is not free text
    Term search: URIref and compound local name
  Navigation service
    The RDF graph: typed links
    The web of RDF documents: few hyperlinks
    The social network of agents: trust & provenance
Find “Time” Ontology
  We can use a set of keywords to search for an ontology. For example, “time”, “before” and “after” are basic concepts of a “Time” ontology.
  Demo 1
Find Term “Person” Demo 3 Not capitalized! URIref is case sensitive!
Current Swoogle Navigation Model
  A URIref refers to
    A term, i.e. an instance of an RDFS class/property
    An individual, i.e. populated terms
  A SWD could be
    SWO: term definitions
    SWI: individuals
  Observations
    RDF resources are semantically linked in the RDF graph
    SWDs are poorly linked due to the absence of an explicit hyperlink concept
    Ontologies are more interesting
  Approach
    Build inter-document relations
    Rational surfing model
  [Diagram labels: SWOs, SWIs, HTML documents, Images, Audio files, Video files]
Semantic Web Navigation Model (new!)
  [Diagram. Node labels: URL, URIref, Resource, RDF Document, Ontology, Namespace, RDF Graph Navigation, Term Search, Document Search.
   Edge labels: populatesClass, populatesProperty, refersClass, refersProperty, definesClass, definesProperty, rdfsOntology, owldlOntology, owl:imports, owl:priorVersion, owl:backwardCompatibleWith, owl:incompatibleWith, rdfs:seeAlso, rdfs:isDefinedBy, isDefinedBy, isUsedBy, usesNamespace, rdfs:subClassOf, sameNamespace, sameLocalname, ...]
Ranking -- research
  Surfing models
  Ranking method: PageRank variation

  Model                   What to rank     Scope         Idea
  Rational surfing model  SWD              Semantic Web  Summarize inter-document relations as EX, TM, IM, PV
  Plain Graph Model       Resource         RDF graph     RDF graph is browsed as a weighted directed graph
  RDFS-based Model        Resource         RDF graph     RDF graph is browsed only with RDFS semantics
  SW navigation model     Resource & SWD   Semantic Web  Assume Swoogle is used in navigation
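A PageRank variation over weighted inter-document relations can be sketched with plain power iteration. This is a generic weighted PageRank, not necessarily the exact Swoogle formula: the relation weights in `WEIGHTS`, the damping factor, the handling of documents without outgoing links, and the toy documents are all assumptions. EX, TM, IM and PV are taken here to mean extends, uses-term, imports and prior-version.

```python
# Hypothetical weights per relation type; the slide does not give values.
WEIGHTS = {"IM": 4.0, "EX": 3.0, "TM": 2.0, "PV": 1.0}

def build_links(relations, weights=WEIGHTS):
    """Turn (source, relation, target) triples into {src: {dst: weight}}."""
    links = {}
    for src, rel, dst in relations:
        out = links.setdefault(src, {})
        out[dst] = out.get(dst, 0.0) + weights[rel]
    return links

def weighted_pagerank(links, damping=0.85, iterations=50):
    """Power iteration; a surfer follows links in proportion to weight."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            out = links.get(src, {})
            total = sum(out.values())
            if total == 0:
                # Document without outgoing relations: spread rank evenly.
                for v in nodes:
                    new[v] += damping * rank[src] / n
            else:
                for dst, w in out.items():
                    new[dst] += damping * rank[src] * w / total
        rank = new
    return rank

# Toy data: two instance documents and one ontology all point at "foaf".
RELATIONS = [
    ("swi1", "TM", "foaf"),
    ("swi2", "TM", "foaf"),
    ("swi2", "TM", "onto2"),
    ("onto2", "EX", "foaf"),
]

ranks = weighted_pagerank(build_links(RELATIONS))
for doc, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(doc, round(score, 3))
```

In this toy graph the widely referenced ontology ends up with the highest score, which matches the intuition that well-known ontologies rank highly.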
Ranking with Rational Surfing Model: An Example
  [Diagram. Labels: foaf:mbox, foaf:Person, wordNet:Person, wordNet:Individual, rdfs:Class, rdf:Property; rdf:type, rdfs:subClassOf. Inter-document relations shown: TM, EX.]
Demo 6: Swoogle's top 10
  This report is dynamically generated from the latest data and takes 5 to 10 seconds.
  Swoogle uses a PageRank-like algorithm to rank Semantic Web documents.
  Well-known ontologies are highly ranked.
Statistics -- research
  Summarize the dataset collected by Swoogle
  Swoogle Watch
    Swoogle Today
      Distribution of visited URLs
      Document discovery log
      Term discovery log
    Semantic Web Watch
      SWD distribution by last-modified month
      SWD distribution by website
      SWD distribution by suffix
    Ontology Watch
      Term (class/property) usage
      Namespace usage
Demo 5(a) Swoogle Today
Demo 5(b) Swoogle Statistics FOAF Trustix W3C Stanford
Demo 5(c) Swoogle Statistics
Miscellaneous
  Submit URL for focused crawler
  Swoogle Web Service (delivered in Sept.)
    Search document
    Search term
    Term digest
Demo 7: Submit URL for focused crawler
  If you can't find your ontologies in Swoogle, they may not have been indexed by Swoogle yet. Please submit them to increase their visibility.
  From site map
  When your query fails
3. Summary
  Summary
  Current Status
Summary
  Swoogle (Mar, 2004), Swoogle2 (Sep, 2004), Swoogle3
    Automated SWD discovery
    SWD metadata creation and search
    Ontology rank (rational surfer model)
    Swoogle watch
    Web interface
    Ontology dictionary
    Swoogle statistics
    Web service interface (WSDL)
    Bag of URIref IR search
    Better discovery & revisit strategies
    Better navigation models
    Semantic web dataset
    Index instance data
    More metadata (ontology mapping)
    Better web service interfaces
Current Status
  Swoogle Watch reported (Jan 6, 2005)
    46.7 M triples
    336 K SWDs: 4 K ontologies
    153 K terms: 94 K classes & 59 K properties
  Ongoing work
    Research
      Self-adaptive SWD discovery
      Efficient SWD digest and RDF Graph Abstract
      Semantic Web navigation model
    Engineering
      Enhancing Web Service interface