Published by Giles Lamb. Modified over 9 years ago.
1
UMBC an Honors University in Maryland
Search Engines for Semantic Web Knowledge
Tim Finin, University of Maryland, Baltimore County
Joint work with Li Ding, Anupam Joshi, Yun Peng, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
2
This talk: Motivation; Semantic web 101; Swoogle Semantic Web search engine; Use cases and applications; State of the Semantic Web; Conclusions
3
Once there were only a few large computers
4
Then there were many,
5
All connected 24x7: Cellular telephony, RFID, 802.11, Bluetooth, TCP/IP, Ultra Wide Band, Software Radio, IrDA
6
Interoperating: tcp/ip, ftp, smtp, rpc, corba, ssh, http, html, xml, gif, jpg, mpg, mp3, pdf, …
7
Access to the world’s knowledge: del.icio.us
8
Google has made us smarter
9
But what about our agents? Agents still have a very minimal understanding of text and images. [Diagram labels: tell, register]
10
This talk: Motivation; Semantic web 101; Swoogle Semantic Web search engine; Use cases and applications; State of the Semantic Web; Conclusions
11
XML helps
“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.” -- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.
12
Semantic Web adds semantics
“The Semantic Web will globalize KR, just as the WWW globalized hypertext” -- Tim Berners-Lee
13
Semantic Web 101
RDF/XML: <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:uni="http://ebiquity.umbc.edu/ontologies/uni/"> … Li Ding …
Callouts on the RDF/XML: rdf:RDF tag, namespaces, ontologies
Semantic graph, URIs as nodes & links
Triples: one node with foaf:name “Li Ding” and rdf:type uni:Student
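The graph on this slide can be read as two subject-predicate-object triples. The following is a minimal Python sketch of that idea, not Swoogle code: the `LI_DING` node identifier and the `match` helper are hypothetical names introduced only for illustration, and the slide does not give a URI for the node.

```python
# Hypothetical identifier for the node; the slide does not name its URI.
LI_DING = "http://example.org/people#liding"

# The two triples shown in the slide's graph, written as (subject, predicate, object).
triples = [
    (LI_DING, "foaf:name", "Li Ding"),
    (LI_DING, "rdf:type", "uni:Student"),
]


def match(graph, s=None, p=None, o=None):
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Which nodes are typed as uni:Student?
students = [subj for subj, _, _ in match(triples, p="rdf:type", o="uni:Student")]
```

Because both triples share the same subject, the name and the type attach to one node, which is what makes the description a graph and not two unrelated statements.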
14
Where’s the semantics?
URIs as common “rigid designators”
Conventions let URIs denote things in the “real world”
Namespaces + URIs give an unambiguous shared vocabulary
RDF, RDFS and OWL have semantics defined using model theory and also axioms
Ontologies allow agents to draw inferences
–uni:Student is a subclass of foaf:Person
–Every uni:Student uni:attends at least one uni:School
–A foaf:Person with a uni:school is necessarily a uni:Student
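Two of the inferences listed above (the subclass rule and the "person with a school" rule) can be sketched as a small fixed-point computation. This is an illustrative Python sketch under stated assumptions, not an RDFS or OWL reasoner: the `infer_types` function and the `ex:` identifiers are hypothetical, and only these two hand-coded rules are applied.

```python
# Subclass axiom from the slide: uni:Student is a subclass of foaf:Person.
SUBCLASS_OF = {"uni:Student": "foaf:Person"}


def infer_types(triples):
    """Return the input triples plus the rdf:type facts the two rules imply."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            # Rule 1: an instance of a subclass is an instance of the superclass.
            if p == "rdf:type" and o in SUBCLASS_OF:
                new.add((s, "rdf:type", SUBCLASS_OF[o]))
            # Rule 2: a foaf:Person with a uni:school is a uni:Student.
            if p == "uni:school" and (s, "rdf:type", "foaf:Person") in inferred:
                new.add((s, "rdf:type", "uni:Student"))
        if new <= inferred:  # nothing new was derived, so we have a fixed point
            return inferred
        inferred |= new


# Hypothetical example data.
closure_a = infer_types({("ex:liding", "rdf:type", "uni:Student")})
closure_b = infer_types({
    ("ex:jo", "rdf:type", "foaf:Person"),
    ("ex:jo", "uni:school", "ex:umbc"),
})
```

The loop repeats until no rule adds a new fact, which is the simplest way to make sure chained inferences are all picked up.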
15
Much of the RDF data will come from databases, just like HTML content.
17
RDF/a
RDF/a is a W3C proposal for embedding RDF in XHTML documents
An HTML Document with RDF embedded: Jo Lambda's Home Page. Hello. This is Jo Lambda's home page. Work: If you want to contact me at work, you can either email me, or call +1 777 888 9999.
The triples in ntriple format: <> foaf:name "Jo Lambda"^^rdf:XMLLiteral ; foaf:mbox ; foaf:phone "+1 777 888 9999"^^rdf:XMLLiteral .
18
But what about our agents? A Google for knowledge on the Semantic Web is needed by software agents and programs. [Diagram labels: Swoogle, tell, register]
19
This talk: Motivation; Semantic web 101; Swoogle Semantic Web search engine; Use cases and applications; State of the Semantic Web; Conclusions
20
http://swoogle.umbc.edu/
Running since summer 2004
1.4M RDF documents, 250M RDF triples, 10K ontologies
21
Swoogle Architecture
[Diagram. Layers: Discovery, Analysis, Index, Search Services. Components: Bounded Web Crawler, Google Crawler, SwoogleBot, Candidate URLs, SWD classifier, Ranking, document cache, IR Indexer, SWD Indexer, Semantic Web metadata, Web Server, Web Service. Users: human (html), machine (rdf/xml). Sources: the Web, Semantic Web. Legend: information flow, Swoogle’s web interface.]
22
A Hybrid Harvesting Framework
[Diagram labels: Manual submission, RDF crawling, Bounded HTML crawling, Meta crawling; Seeds M, Seeds H, Seeds R; Swoogle Sample Dataset; Inductive learner; the Web; Google API call; crawl; true; would; google]
23
Performance – Site Coverage
SW06MAR - Basic statistics (Mar 31, 2006)
–1.3M SWDs from 157K websites
–268M triples
–61K SWOs including >10K of high quality
–1.4M SWTs using 12K namespaces
Significance
–Compare with existing works (DAML crawler, scutter)
–Compare SW06MAR with Google’s estimated SWDs
[Chart axes: SWDs per website, Website]
24
Performance – crawlers’ contribution
High SWD ratio: 42% of URLs are confirmed as SWDs
Consistent growth rate: 3000 SWDs per day
RDF crawler: best harvesting method
HTML crawler: best accuracy
Meta crawler: best in detecting websites
[Chart axis: # of documents]
25
This talk: Motivation; Semantic web 101; Swoogle Semantic Web search engine; Use cases and applications; State of the Semantic Web; Conclusions
26
Applications and use cases
Supporting Semantic Web developers
–Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
Searching specialized collections
–Spire: aggregating observations and data from biologists
–InferenceWeb: searching over and enhancing proofs
–SemNews: Text Meaning of news stories
Supporting SW tools
–Triple shop: finding data for SPARQL queries
28
80 ontologies were found that had these three terms. By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size. Let’s look at this one.
29
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
32
These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace. All of this is available in RDF form for the agents among us.
33
Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
34
We can also search for terms (classes, properties), such as terms for “person”.
35
10K terms associated with “person”! Ordered by use. Let’s look at foaf:Person’s metadata.
45
UMBC Triple Shop
http://sparql.cs.umbc.edu/
Online SPARQL RDF query processing based on HP’s Jena and Joseki with several interesting features
Selectable level of inference over model
Automatically finds SWDs for given queries using Swoogle backend database
–Provide dataset creation wizard
–Dataset can be stored on our server or downloaded
–Tag, share and search over saved datasets
46
Web-scale semantic web data access
[Diagram: agent, data access service, the Web]
agent: ask (“person”); service: Search vocabulary (Search URIrefs in SW vocabulary); service: inform (“foaf:Person”)
agent: Compose query; ask (“?x rdf:type foaf:Person”); service: Search URLs in SWD index; service: inform (doc URLs)
agent: Fetch docs; Populate RDF database (Index RDF data); Query local RDF database
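The exchange in this diagram can be sketched end to end as follows. This is a toy Python sketch under stated assumptions, not Swoogle's API: `VOCAB_INDEX`, `SWD_INDEX`, `WEB`, and every function name and URL here are hypothetical in-memory stand-ins for the vocabulary index, the SWD index, and the Web.

```python
# Hypothetical stand-ins for the data access service's two indexes.
VOCAB_INDEX = {"person": ["foaf:Person"]}
SWD_INDEX = {"foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"]}

# Hypothetical stand-in for the Web: document URL -> triples in that document.
WEB = {
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:bob", "foaf:name", "Bob"),
    ],
}


def search_vocabulary(keyword):
    """Service side: ask("person") -> inform("foaf:Person")."""
    return VOCAB_INDEX.get(keyword.lower(), [])


def search_documents(term):
    """Service side: ask("?x rdf:type <term>") -> inform(doc URLs)."""
    return SWD_INDEX.get(term, [])


def find_instances(keyword):
    """Agent side: search vocabulary, find docs, fetch, load, query locally."""
    local_db = []
    found = set()
    for term in search_vocabulary(keyword):
        for url in search_documents(term):
            local_db.extend(WEB.get(url, []))  # fetch docs, populate local database
        # Query the local database for "?x rdf:type <term>".
        found |= {s for s, p, o in local_db if p == "rdf:type" and o == term}
    return sorted(found)


people = find_instances("person")
```

The point of the split is that the service only answers "which terms?" and "which documents?"; the agent does the fetching and the actual query against its own local store.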
47
Who knows Anupam Joshi? Show me their names, email addresses and pictures
48
The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
49
No FROM clause! Constraints on where the data comes from
50
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  ?p1 foaf:name "Anupam Joshi" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
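The query does not require ?p1 and ?p3 to be the same node; it only requires them to share a foaf:mbox, so descriptions of the same person in different documents can still be joined. A hand-rolled Python sketch of those joins, assuming triples are plain tuples, is shown here. The `who_knows` function and all sample data (node ids, names other than the one in the query, and email addresses) are hypothetical, and this is not how Jena or Joseki evaluate SPARQL.

```python
FOAF = "http://xmlns.com/foaf/0.1/"


def who_knows(triples, name):
    """Mirror the query's joins: ?p1 (by name) -> mbox -> ?p3 <- foaf:knows <- ?p2."""
    graph = set(triples)
    # Mailboxes of any node carrying the target name (?p1 foaf:mbox ?p1mbox).
    target_mboxes = {o for s, p, o in graph
                     if p == FOAF + "mbox" and (s, FOAF + "name", name) in graph}
    # Any node sharing one of those mailboxes (?p3 foaf:mbox ?p1mbox).
    targets = {s for s, p, o in graph if p == FOAF + "mbox" and o in target_mboxes}
    # Nodes that know one of them (?p2 foaf:knows ?p3).
    knowers = {s for s, p, o in graph if p == FOAF + "knows" and o in targets}
    rows = set()
    for p2 in knowers:
        names = [o for s, p, o in graph if s == p2 and p == FOAF + "name"]
        mboxes = [o for s, p, o in graph if s == p2 and p == FOAF + "mbox"]
        # OPTIONAL: keep the row even when there is no depiction.
        pics = [o for s, p, o in graph if s == p2 and p == FOAF + "depiction"] or [None]
        rows |= {(n, m, x) for n in names for m in mboxes for x in pics}
    return sorted(rows, key=lambda row: row[0])  # ORDER BY ?p2name


# Hypothetical data: _:a and _:b describe the same person in two documents.
DATA = [
    ("_:a", FOAF + "name", "Anupam Joshi"),
    ("_:a", FOAF + "mbox", "mailto:joshi@example.org"),
    ("_:b", FOAF + "mbox", "mailto:joshi@example.org"),
    ("_:c", FOAF + "knows", "_:b"),
    ("_:c", FOAF + "name", "Pat Example"),
    ("_:c", FOAF + "mbox", "mailto:pat@example.org"),
]

answers = who_knows(DATA, "Anupam Joshi")
```

In the sample data the foaf:knows link points at `_:b`, which has no name at all; the match still succeeds because `_:b` shares a mailbox with the named node `_:a`.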
52
Swoogle found 292 RDF data files that appear relevant to answering our query
53
Let’s save the dataset before we use it
55
And tag it so we and others can find it more easily.
56
Here we are using it to get an answer to “Who knows Anupam Joshi”
57
He has many friends!
59
This talk: Motivation; Semantic web 101; Swoogle Semantic Web search engine; Use cases and applications; State of the Semantic Web; Conclusions
60
Will it Scale? How?
Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling
System/date   Terms      Documents  Individuals  Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3      2x10^5     7x10^5     1.5x10^7     7.5x10^7   1x10^10
2006: 1x10^6, 5x10^7, 5x10^9, 5x10^11
2008: 5x10^6, 5x10^9, 5x10^11, 5x10^13
We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
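A quick way to read the two measured rows (Swoogle2 and Swoogle3) is to derive per-document and per-triple ratios from them. The short Python sketch below does only that arithmetic on the figures quoted on the slide; the `ratios` helper and dictionary keys are names introduced here for illustration, and the 2006 and 2008 rows are left out because their column alignment is not recoverable from the transcript.

```python
# Figures copied from the Swoogle2 and Swoogle3 rows of the slide's table.
rows = {
    "Swoogle2": {"documents": 3.5e5, "triples": 5e7, "bytes": 7e9},
    "Swoogle3": {"documents": 7e5, "triples": 7.5e7, "bytes": 1e10},
}


def ratios(row):
    """Derive average triples per document and average bytes per triple."""
    return {
        "triples_per_document": row["triples"] / row["documents"],
        "bytes_per_triple": row["bytes"] / row["triples"],
    }


summary = {name: ratios(row) for name, row in rows.items()}

for name, r in summary.items():
    print(f"{name}: {r['triples_per_document']:.0f} triples/doc, "
          f"{r['bytes_per_triple']:.0f} bytes/triple")
```

On these figures both rows come out at roughly 100 to 150 triples per document and roughly 130 to 140 bytes per triple, so the byte column tracks the triple count fairly closely.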
61
How much reasoning?
SwoogleN (N<=3) does limited reasoning
–It’s expensive
–It’s not clear how much should be done
More reasoning would benefit many use cases
–e.g., type hierarchy
Recognizing specialized metadata
–e.g., that ontology A maps some terms from B to C
62
This talk: Motivation; Semantic web 101; Swoogle Semantic Web search engine; Use cases and applications; State of the Semantic Web; Conclusions
63
Conclusion
The web will contain the world’s knowledge in forms accessible to people and computers
–We need better ways to discover, index, search and reason over SW knowledge
SW search engines address different tasks than HTML search engines
–So they require different techniques and APIs
Swoogle-like systems can help create consensus ontologies and foster best practices
–Swoogle is for Semantic Web 1.0
–Semantic Web 2.0 will make different demands
64
For more information: http://ebiquity.umbc.edu/ (Annotated in OWL)
65
backup