1
UMBC an Honors University in Maryland
Knowledge Sharing on the Semantic Web
Tim Finin, University of Maryland, Baltimore County
Department of Homeland Security Advanced Scientific Computing Program, Text Analysis Workshop, 25 May 2005
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.
2
This talk
Motivation
The knowledge sharing problem
Some ongoing projects
–Finding knowledge on the web
–Evaluating provenance and trust
Conclusions
3
“XML is Lisp's bastard nephew, with uglier syntax and no semantics. Yet XML is poised to enable the creation of a Web of data that dwarfs anything since the Library at Alexandria.” -- Philip Wadler, Et tu XML? The fall of the relational empire, VLDB, Rome, September 2001.
4
“The web has made people smarter. We need to understand how to use it to make machines smarter, too.” -- Michael I. Jordan (UC Berkeley), paraphrased from a talk at AAAI, July 2002
5
“The Semantic Web will globalize KR, just as the WWW globalized hypertext” -- Tim Berners-Lee
7
Knowledge Sharing 1.0
In 1990 the DARPA knowledge sharing effort defined an approach for interoperability among KB systems and agents
–KIF + Shared Ontologies + KQML
It was (and is) a great vision that resulted in much good research and some sound standards
–Supporting knowledge interoperability, agent communication, agent tasking and cooperation, etc.
It never really made it out of the lab
8
Knowledge Sharing 2.0
The Web is a Blob, consuming all in its path. Resistance is futile.
More seriously, it promotes sharing, building on others’ content, offering your content for building upon, decentralization, community development and evolution, common identifiers (URIs), using a working infrastructure, collaborating with industry, etc.
These are significant advantages.
The Semantic Web can be the interlingua and infrastructure for interoperability and knowledge sharing.
9
From where will the markup come?
A few authors will add it manually.
More will use annotation tools.
–SMORE: Semantic Markup, Ontology and RDF Editor
Intelligent processors (e.g., NLP) can understand documents and add markup (hard)
–Machine learning powered information extraction tools show promise
Lots of web content comes from databases, and we can generate SW markup along with the HTML
–See http://ebiquity.umbc.edu/
10
From where will the markup come?
In many tools, part of the metadata information is present, but thrown away at output
–e.g., a business chart can be generated by a tool…
–…it “knows” a chart’s structure, classification, etc.
–…but, usually, this information is lost
–…storing it in metadata is easy!
So “semantic web aware” tools can produce lots of metadata
–E.g., Adobe’s use of its XMP platform
12
Google has made us smarter
Something similar is needed by people and software agents for information on the semantic web.
13
Why use IR techniques?
We will want to retrieve over the structured and unstructured parts of a Semantic Web Document (SWD)
We should prepare for the appearance of text documents with embedded SW markup
We may want to get our SWDs into conventional search engines, such as Google
IR techniques also have some unique characteristics that may be very useful, e.g., ranking matches, measuring similarity between documents, relevance feedback, etc.
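As a rough illustration of what "ranking matches" and "measuring similarity between documents" can mean, here is a minimal sketch of cosine similarity over bags of terms. The slides do not say which similarity measure Swoogle uses, so cosine similarity is an assumption chosen because it is a common IR baseline. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-frequency vectors."""
    freq_a, freq_b = Counter(terms_a), Counter(terms_b)
    # Only terms present in both bags contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_matches(query_terms, documents):
    """Return document names ordered from most to least similar to the query."""
    return sorted(
        documents,
        key=lambda name: cosine_similarity(query_terms, documents[name]),
        reverse=True,
    )


# Hypothetical SWDs, each reduced to a bag of terms (words or URIrefs).
documents = {
    "time.owl": ["time", "before", "after", "TimeZone"],
    "people.rdf": ["Person", "name", "mbox"],
}
print(rank_matches(["time", "before", "after"], documents))
```

The same function works whether the terms are plain words or URIrefs, which is one reason a bag-of-terms view suits documents that mix text and markup.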
15
Swoogle Architecture
[Architecture diagram. Layers: SWD discovery, metadata creation, data analysis, interface. Components: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]
340K SWDs, 48M triples, 97K classes, 55K properties, 7M individuals (April 2005)
16
Find “Time” Ontology Demo 1
We can use a set of keywords to search for an ontology. For example, “time, before, after” are basic concepts for a “Time” ontology.
17
Digest “Time” Ontology (document view) Demo 2(a)
18
Digest “Time” Ontology (term view) Demo 2(b)
[Screenshot: term listing including TimeZone, before, intAfter]
19
Find Term “Person” Demo 3
Not capitalized! URIref is case sensitive!
20
Digest Term “Person” Demo 4
[Screenshot callouts: 167 different properties; 562 different properties]
21
Demo 5(a) Swoogle Today
22
Swoogle’s Triple Store lets you shop and check out your triples into any of several reasoners
23
Summary
Versions: Swoogle (Mar, 2004), Swoogle2 (Sep, 2004), Swoogle3 (July 2005)
Automated SWD discovery
SWD metadata creation and search
Ontology rank (rational surfer model)
Swoogle watch
Web Interface
Ontology dictionary
Swoogle statistics
Web service interface (WSDL)
Bag of URIref IR search
Triple shopping cart
Better (re-)crawling strategies
Better navigation models
Index instance data
More metadata (ontology mapping and OWL-S services)
Better web service interfaces
IR component for string literals
24
Will it Scale? And How?
An open question is how well our approach will scale and what techniques will work as the semantic web grows.
Here’s a rough estimate of the data on the semantic web based on Swoogle’s crawling:
System/date   Terms       Documents   Individuals   Triples     Bytes
Swoogle2      1.5x10^5    3.5x10^5    7x10^6        5x10^7      7x10^9
Swoogle3      1.75x10^5   5x10^5      1x10^7        7.5x10^7    1x10^10
2005          2.5x10^5    5x10^6      5x10^7        5x10^8      5x10^10
2008          5x10^5      5x10^7      5x10^8        5x10^9      5x10^11
25
Harnessing Google
Google started indexing RDF documents some time in late 2003
Can we take advantage of this?
We’ve developed techniques to get some structured data to be indexed by Google and then later retrieved
Technique: give Google enhanced documents with additional annotations containing Swangle Terms™
26
Swangle definition
swan·gle
Pronunciation: ‘swa[ng]-g&l
Function: transitive verb
Inflected Forms: swan·gled; swan·gling /-g(&-)li[ng]/
Etymology: Postmodern English, from C++ mangle
Date: 20th century
1: to convert an RDF triple into one or more IR indexing terms
2: to process a document or query so that its content bearing markup will be indexed by an IR system
Synonym: see tblify
- swan·gler /-g(&-)l&r/ noun
27
What’s the point?
We’d like to get our documents into Google
The Swangle terms look like words to Google and other search engines.
We use cloaking to avoid having to modify the document
–Add rules to the web server so that, when a search spider asks for document X, the document swangled(X) is returned
Caching makes this efficient
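To make "convert an RDF triple into one or more IR indexing terms" concrete, here is a minimal sketch of one way swangling could work. The slides do not give the encoding, so everything here is an assumption: the SHA-1 hash, the base32 encoding, the `*` wildcard, and the names `swangle_term` and `swangle`. The idea is to emit one word-like term for the full triple and one for each pattern with some positions wildcarded, so that a query such as "any triple with this subject" can be swangled to a single term and looked up in an ordinary search engine.

```python
import base64
import hashlib
from itertools import product

WILDCARD = "*"  # assumed marker for "any value in this position"


def swangle_term(subject, predicate, obj):
    """Hash one triple pattern into a 32-character alphanumeric token."""
    key = "\x00".join((subject, predicate, obj)).encode("utf-8")
    digest = hashlib.sha1(key).digest()  # 20 bytes
    # 20 bytes encode to exactly 32 base32 characters, with no padding.
    return base64.b32encode(digest).decode("ascii").lower()


def swangle(triple):
    """Return the 7 index terms for a triple: the full triple plus every
    pattern in which one or two positions are replaced by the wildcard."""
    terms = []
    for keep in product((True, False), repeat=3):
        if not any(keep):
            continue  # skip the all-wildcard pattern, which matches everything
        pattern = [part if k else WILDCARD for part, k in zip(triple, keep)]
        terms.append(swangle_term(*pattern))
    return terms


triple = ("http://foo.com/john", "foaf:name", "John Smith")
for term in swangle(triple):
    print(term)
```

A query for everything said about `http://foo.com/john` would then search for `swangle_term("http://foo.com/john", "*", "*")`. This sketch cannot distinguish a wildcard from a component that is literally `*`; a real encoding would need to handle that case.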
29
Levels of granularity on the Semantic Web
The semantic web has several levels of granularity. We’re most familiar with documents and triples.
We’ve been exploring the notion of an RDF Molecule as a “meaningful” collection of RDF triples.
We believe that RDF molecules will be useful for gathering evidence to verify an RDF graph and for recording the provenance.
Levels, from coarsest to finest: Universal RDF Graph, RDF Documents, Named Graphs, Molecules, Triples
30
RDF Molecules
An RDF graph can be decomposed into subgraphs.
A lossless decomposition is one in which the original graph can be recovered by concatenating the components.
The presence of “blank nodes” limits our ability to completely reduce the graph to triples.
RDF molecules are subgraphs which cannot be further decomposed.
RDF molecules are useful as minimal units of “evidence” in support of a graph.
31
[Figure: An RDF graph of interest]
32
[Figure: An RDF graph of interest and the graph’s molecules]
33
[Figure: An RDF graph of interest, the graph’s molecules, and web pages containing one or more molecules discovered by Swoogle]
34
Blank nodes cause RDF molecules
G1: RDF graph without blank node (2 molecules)
@prefix foaf:.
(http://foo.com/john foaf:name “John Smith”)
(http://foo.com/john foaf:mbox mailto:john@foo.com)
G2: RDF graph with one blank node (1 molecule)
@prefix foaf:.
( ?x foaf:name “John Smith” )
( ?x foaf:mbox mailto:john@foo.com )
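The G1 and G2 counts can be reproduced with a minimal sketch of a naive decomposition: a triple with no blank node is a molecule on its own, and triples that share a blank node, directly or through a chain of other triples, stay together. This is an illustration under assumptions, not the authors' implementation. Triples are plain tuples, blank nodes are strings starting with `?` as on the slide, and the function names are hypothetical.

```python
def is_blank(node):
    """Blank nodes are written ?x, ?y, ... as on the slides (an assumption)."""
    return node.startswith("?")


def naive_decompose(triples):
    """Group triples into molecules: triples connected through shared blank
    nodes end up in the same molecule; ground triples stand alone."""
    parent = list(range(len(triples)))  # union-find over triple indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # blank node -> index of the first triple mentioning it
    for i, (subject, _predicate, obj) in enumerate(triples):
        for node in (subject, obj):
            if is_blank(node):
                if node in first_seen:
                    parent[find(i)] = find(first_seen[node])
                else:
                    first_seen[node] = i

    groups = {}
    for i, triple in enumerate(triples):
        groups.setdefault(find(i), []).append(triple)
    return list(groups.values())


g1 = [
    ("http://foo.com/john", "foaf:name", "John Smith"),
    ("http://foo.com/john", "foaf:mbox", "mailto:john@foo.com"),
]
g2 = [
    ("?x", "foaf:name", "John Smith"),
    ("?x", "foaf:mbox", "mailto:john@foo.com"),
]
print(len(naive_decompose(g1)), len(naive_decompose(g2)))
```

The decomposition is lossless in the sense used on the earlier slide: taking the union of all molecules gives back the original set of triples.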
35
Impact of functional dependency
G3:
@prefix foaf:.
t1: (?x foaf:firstName “John”)
t2: (?x foaf:surname “Smith”)
t3: (?x foaf:mbox mailto:john@foo.com )
Is foaf:mbox an Inverse Functional Property?
N: one molecule {t1, t2, t3}
Y: two molecules {t1, t3}, {t2, t3}
Molecule(s) produced after functional decomposition
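The G3 outcome can be sketched in code for the simplest case only: every triple has the same blank-node subject and a non-blank object, as in G3. If some predicate is an inverse functional property (IFP), its triple identifies the blank node, so each remaining triple can travel with that identifying triple as a smaller molecule. The function name and the restriction to this case are assumptions made for illustration. The sketch does not cover chains of blank nodes or several IFP triples on one node.

```python
def functional_decompose(triples, inverse_functional_properties):
    """Decompose triples that all share one blank subject (as in G3).

    Without an IFP triple the blank node cannot be identified, so the whole
    set is a single molecule. With one, each other triple is paired with the
    identifying triple.
    """
    anchors = [t for t in triples if t[1] in inverse_functional_properties]
    if not anchors:
        return [list(triples)]
    anchor = anchors[0]  # simplification: use the first IFP triple only
    others = [t for t in triples if t != anchor]
    if not others:
        return [[anchor]]
    return [[t, anchor] for t in others]


t1 = ("?x", "foaf:firstName", "John")
t2 = ("?x", "foaf:surname", "Smith")
t3 = ("?x", "foaf:mbox", "mailto:john@foo.com")
g3 = [t1, t2, t3]

print(functional_decompose(g3, set()))          # mbox not treated as an IFP
print(functional_decompose(g3, {"foaf:mbox"}))  # mbox treated as an IFP
```

Smaller molecules are easier to match against evidence found elsewhere, since a source only needs to state one fact plus the identifying mailbox rather than the whole description.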
36
Propagation of functional dependency
G4:
@prefix foaf:.
@prefix kin:.
(?y foaf:surname "Wang")
(?y kin:motherOf ?x)
(?x foaf:name "Li Ding")
(?x foaf:mbox mailto:dingli1@umbc.edu )
foaf:mbox and kin:motherOf are IFP
[Figure: graph G4 with triples labeled t1 to t4, and a table listing two terminal molecules, two non-terminal molecules, and no contextual molecule]
37
Beyond functional dependency
Our examples relied on OWL inverse functional properties
A more general (and realistic) approach will be based on probabilities
At issue is the conditional probability that two blank nodes S1 and S2 are equivalent if each has a P property with value O:
prob(S1=S2 | P(S1,O), P(S2,O))
A set of properties can be used to get a high probability, e.g., John Smith and J. Smith share the same home phone number and office phone number
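The slide does not say how evidence from several shared properties should be combined. As one possible illustration, the sketch below uses a noisy-OR combination, which assumes each shared property is an independent piece of evidence that the two nodes are the same. Both the combination rule and the numbers are assumptions for illustration, not values from the talk.

```python
import math


def combined_match_probability(per_property_probabilities):
    """Noisy-OR: probability that S1 = S2 given several shared properties,
    assuming each property is independent evidence. Each input is
    prob(S1=S2 | P(S1,O), P(S2,O)) for one property P."""
    for p in per_property_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    # The nodes differ only if every piece of evidence fails to identify them.
    prob_all_fail = math.prod(1.0 - p for p in per_property_probabilities)
    return 1.0 - prob_all_fail


# Hypothetical per-property values for a shared home phone and office phone.
home_phone, office_phone = 0.6, 0.7
print(combined_match_probability([home_phone, office_phone]))
```

Under this rule an inverse functional property is the limiting case of a per-property probability of 1, which forces the combined probability to 1 regardless of the other properties.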
38
Utility of Molecules
Why are RDF molecules interesting?
Suppose we have a graph and we seek evidence from the web to verify its accuracy.
–E.g., verifying the information in a foaf description.
Approach:
–Decompose the graph into molecules
–Search for instances of each using Swoogle4
–Note the source and provenance of each molecule
40
Conclusion
The web will contain the world’s knowledge in forms accessible to people and computers
We need better ways to discover, index, search and reason over SW knowledge
Special attention must be applied to provenance and trust
We must develop, deploy and build on open, non-proprietary standards for knowledge sharing.
The W3C standards RDF and OWL are a foundation for the first generation
41
For more information
http://ebiquity.umbc.edu/
Annotated in OWL
42
Nobody ever got fired for buying IBM
43
Nobody ever got fired for choosing Web technology
44
This talk
Motivation
The knowledge sharing problem
Some ongoing projects
–Finding knowledge on the web
–Evaluating provenance and trust
–NLP meets the semantic web
Conclusions
45
NLP meets the semantic web
Agents can benefit from knowledge and information extracted by sophisticated NLP systems.
NLP systems can make good use of facts published on the web.
The semantic web provides both an interlingua and publication method for this information exchange
We’re working on a system to translate information between OntoSem and OWL
46
O2O System Architecture
[Architecture diagram. Components: NL Text, OntoSem, Ontology, Fact Repository, TMR, OntoSem2OWL, OWL Ontology, TMRs in OWL, OWL2OntoSem.]
47
Issues
Mismatch between NLP KR systems and Semantic Web KR languages, e.g.
–Most NLP systems use default reasoning
–Relaxing constraints for metaphorical readings
Practical ontology mapping systems need to be developed
–Combining distributed, partial maps is an interesting idea
48
Types of molecule
G4:
@prefix foaf:.
t1: (http://www.cs.umbc.edu/~dingli1 foaf:name "Li Ding")
t2: (http://www.cs.umbc.edu/~dingli1 foaf:knows ?x )
t3: (?x foaf:name "Tim Finin")
t4: (?x foaf:mbox mailto:finin@umbc.edu)
t5: (?x foaf:mbox mailto:finin@cs.umbc.edu)
foaf:mbox is not IFP: terminal molecule {t1}; non-terminal molecule n/a; contextual molecule {t2, t3, t4, t5}
foaf:mbox is IFP: three terminal molecules; two non-terminal molecules; contextual molecule n/a
[Figure: graph G4 with triples labeled t1 to t5, and tables of its molecules for both cases]