Published by Silas Watson. Modified over 9 years ago.
1
Creating and Exploiting a Web of Semantic Data
Tim Finin, UMBC
Earth and Space Science Informatics Workshop, 05 August 2009
http://ebiquity.umbc.edu/resource/html/id/272/
2
Overview
Introduction
Semantic Web 101
Recent Semantic Web trends
Examples: DBpedia, Wikitology
Conclusion
3
The Age of Big Data
Massive amounts of data are available today
Advances in many fields are driven by the availability of unstructured data, e.g., text, audio, images
Increasingly, large amounts of structured and semi-structured data are also online
Much of this is available in the Semantic Web language RDF, fostering integration and interoperability
Such structured data is especially important for the sciences
4
Twenty years ago…
Tim Berners-Lee’s 1989 WWW proposal described a web of relationships among named objects, unifying many information management tasks
http://www.w3.org/History/1989/proposal.html
Capsule history
  – Guha’s MCF (~94)
  – XML+MCF=>RDF (~96)
  – RDF+OO=>RDFS (~99)
  – RDFS+KR=>DAML+OIL (00)
  – W3C’s SW activity (01)
  – W3C’s OWL (03)
  – SPARQL, RDFa (08)
  – Rules (09)
5
Ten years ago…
The W3C started developing standards for the Semantic Web
The vision, technology and use cases are still evolving
Moving from a web of documents to a web of data
6
Today
4.5 billion integrated facts published on the Web as RDF Linked Open Data
7
Tomorrow
Large collections of integrated facts published on the Web for many disciplines and domains
8
W3C’s Semantic Web Goal
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
-- Berners-Lee, Hendler and Lassila, The Semantic Web, Scientific American, 2001
9
From a Web of linked documents
10
To a Web of linked data
11
Contrast with a non-Web approach
The W3C Semantic Web approach is
  – Distributed
  – Open
  – Non-proprietary
  – Standards based
12
How can we share data on the Web?
POX, Plain Old XML, is one approach, but it has deficiencies
The Semantic Web languages RDF and OWL offer a simpler and more abstract data model (a graph) that is better for integration
Its well-defined semantics supports knowledge modeling and inference
It is supported by a stable, funded standards organization, the World Wide Web Consortium
13
Simple RDF Example
[Figure: RDF graph. The resource http://umbc.edu/~finin/talks/idm02/ has dc:Title “Intelligent Information Systems on the Web and in the Aether” and dc:Creator pointing to a blank node. The blank node has bib:name “Tim Finin”, bib:email “finin@umbc.edu” and bib:Aff http://umbc.edu/.]
Note: “blank node”
14
The RDF Data Model
An RDF document is an unordered collection of statements, each with a subject, predicate and object
Each such triple can be thought of as a labelled arc in a graph
Statements describe properties of resources
A resource is any object that can be referenced or denoted by a URI
Properties themselves are also resources (URIs)
Dereferencing a URI produces useful additional information, e.g., a definition or additional facts
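The data model above can be sketched in a few lines of Python. This is only an illustration, not a real RDF library: the graph is a plain set of (subject, predicate, object) tuples, the helper `match` is a hypothetical name, and the label `_:b0` is an assumed stand-in for the blank node in the example graph.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

DOC = "http://umbc.edu/~finin/talks/idm02/"

# An RDF graph is an unordered collection of statements, so a set fits well.
graph: Set[Triple] = {
    (DOC, "dc:title", "Intelligent Information Systems on the Web and in the Aether"),
    (DOC, "dc:creator", "_:b0"),  # "_:b0" is an assumed label for the blank node
    ("_:b0", "bib:name", "Tim Finin"),
    ("_:b0", "bib:email", "finin@umbc.edu"),
    ("_:b0", "bib:aff", "http://umbc.edu/"),
}


def match(g: Set[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return every triple matching the pattern; None acts as a wildcard."""
    return sorted(t for t in g
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# Follow the labelled arcs: document -> creator (blank node) -> name.
creator = match(graph, s=DOC, p="dc:creator")[0][2]
names = [obj for (_, _, obj) in match(graph, s=creator, p="bib:name")]
print(names)
```

Walking from the document to its creator and then to the creator's name is the same as following two labelled arcs in the graph.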
15
RDF is the first SW language
RDF is a simple language for graph-based representations
Three views of the RDF data model:
  – XML encoding: good for machine processing
  – Graph: good for human viewing
  – Triples: good for storage and reasoning
Triples:
  stmt(docInst, rdf_type, Document)
  stmt(personInst, rdf_type, Person)
  stmt(inroomInst, rdf_type, InRoom)
  stmt(personInst, holding, docInst)
  stmt(inroomInst, person, personInst)
16
XML encoding for RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:bib="http://daml.umbc.edu/ontologies/bib/">
  <rdf:Description rdf:about="http://umbc.edu/~finin/talks/idm02/">
    <dc:title>Intelligent Information … and in the Aether</dc:title>
    <dc:creator>
      <rdf:Description>
        <bib:Name>Tim Finin</bib:Name>
        <bib:Email>finin@umbc.edu</bib:Email>
        <bib:Aff rdf:resource="http://umbc.edu/"/>
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>
17
N3 is a friendlier encoding
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix bib: <http://daml.umbc.edu/ontologies/bib/> .
<http://umbc.edu/~finin/talks/idm02/>
    dc:title "Intelligent... and in the Aether" ;
    dc:creator [ bib:Name "Tim Finin" ;
                 bib:Email "finin@umbc.edu" ;
                 bib:Aff <http://umbc.edu/> ] .
18
RDFS supports simple inferences
RDF Schema adds vocabulary for classes, properties & constraints
An RDF ontology plus some RDF statements may imply additional RDF statements (not possible in XML)
Note that this is part of the data model and not of the accessing or processing code.
Asserted:
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix : <…> .
  :parent a rdf:Property ; rdfs:domain :person ; rdfs:range :person .
  :mother rdfs:subPropertyOf :parent ; rdfs:domain :woman ; rdfs:range :person .
  :eve :mother :cain .
Implied:
  :person a rdfs:Class .
  :woman rdfs:subClassOf :person .
  :mother a rdf:Property .
  :eve a :person ; a :woman ; :parent :cain .
  :cain a :person .
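As a rough illustration of how such implied statements can be computed, the sketch below applies three RDFS rules (subPropertyOf, domain, range) repeatedly until no new triples appear. It is a toy forward chainer under stated assumptions, not a complete RDFS reasoner: the function name `rdfs_closure` is hypothetical, terms are plain strings, and only these three rules are covered.

```python
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
TYPE = "rdf:type"

# The asserted schema and the single asserted fact from the slide.
asserted = {
    (":parent", DOMAIN, ":person"),
    (":parent", RANGE, ":person"),
    (":mother", SUBPROP, ":parent"),
    (":mother", DOMAIN, ":woman"),
    (":mother", RANGE, ":person"),
    (":eve", ":mother", ":cain"),
}


def rdfs_closure(triples):
    """Apply the subPropertyOf, domain and range rules to a fixed point."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            # Look up schema statements whose subject is this triple's predicate.
            for (p2, rel, x) in inferred:
                if p2 != p:
                    continue
                if rel == SUBPROP:
                    new.add((s, x, o))      # s p o, p subPropertyOf x => s x o
                elif rel == DOMAIN:
                    new.add((s, TYPE, x))   # s p o, p domain x => s type x
                elif rel == RANGE:
                    new.add((o, TYPE, x))   # s p o, p range x => o type x
        if new <= inferred:
            return inferred
        inferred |= new


derived = rdfs_closure(asserted) - asserted
for triple in sorted(derived):
    print(triple)
```

The derived set contains the instance-level statements about eve and cain. Statements about the classes and properties themselves would need further rules that this sketch leaves out.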
19
OWL adds further richness
OWL adds richer representational vocabulary, e.g.
  – parentOf is the inverse of childOf
  – Every person has exactly one mother
  – Every person is a man or a woman but not both
  – A man is the equivalent of a person with a sex property with value “male”
OWL is based on ‘description logic’
  – A logic subset with efficient reasoners that are complete
  – Good algorithms for reasoning about descriptions
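Two of the statements above can be mimicked in a small Python sketch: an inverse property (parentOf and childOf) and a pair of disjoint classes (man and woman). This is only an assumption-laden illustration, not how a description logic reasoner works. The names `add_inverses` and `disjointness_violations` and the individual `:pat` are hypothetical.

```python
TYPE = "rdf:type"

# parentOf is the inverse of childOf (and vice versa).
INVERSE_OF = {":parentOf": ":childOf", ":childOf": ":parentOf"}

# No individual may be both a man and a woman.
DISJOINT = [frozenset({":Man", ":Woman"})]


def add_inverses(triples):
    """For every s p o where p has an inverse q, also assert o q s."""
    out = set(triples)
    for (s, p, o) in triples:
        if p in INVERSE_OF:
            out.add((o, INVERSE_OF[p], s))
    return out


def disjointness_violations(triples):
    """Return individuals typed with both members of a disjoint class pair."""
    types = {}
    for (s, p, o) in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return sorted(s for s, ts in types.items()
                  if any(pair <= ts for pair in DISJOINT))


facts = {
    (":eve", ":parentOf", ":cain"),
    (":eve", TYPE, ":Woman"),
    (":pat", TYPE, ":Man"),    # hypothetical individual with conflicting types
    (":pat", TYPE, ":Woman"),
}

closed = add_inverses(facts)
print((":cain", ":childOf", ":eve") in closed)
print(disjointness_violations(closed))
```

The "exactly one mother" statement is deliberately left out: under OWL's open-world reading, two stated mothers would lead a reasoner to conclude they are the same individual, which a simple count check does not capture.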
20
That was then, this is now
1996-2000: focus on RDF and data
2000-2007: focus on OWL, developing ontologies, sophisticated reasoning
2008-…: integrating and exploiting large RDF data collections backed by lightweight ontologies
21
A Linked Data story
Wikipedia as a source of knowledge
  – Wikis are a great way to collaborate on building up knowledge resources
Wikipedia as an ontology
  – Every Wikipedia page is a concept or object
Wikipedia as RDF data
  – Map this ontology into RDF
DBpedia as the lynchpin for Linked Data
  – Exploit its breadth of coverage to integrate things
22
Populating Freebase KB
23
Underlying Powerset’s KB
24
Mined by TrueKnowledge
25
Wikipedia as an ontology
Using Wikipedia as an ontology
  – Each article (~3M) is an ontology concept or instance
  – Terms linked via category system (~200k), infobox template use, inter-article links, infobox links
  – Article history contains metadata for trust, provenance, etc.
It’s a consensus ontology with broad coverage
Created and maintained by a diverse community for free!
Multilingual
Very current
Overall content quality is high
26
Wikipedia as an ontology
Uncategorized and miscategorized articles
Many ‘administrative’ categories: articles needing revision; useless ones: 1949 births
Multiple infobox templates for the same class
Multiple infobox attribute names for the same property
No datatypes or domains for infobox attribute values
etc.
27
DBpedia: Wikipedia in RDF
A community effort to extract structured information from Wikipedia and publish it as RDF on the Web
Effort started in 2006 with EU funding
Data and software open sourced
DBpedia doesn’t extract information from Wikipedia’s text, but from its structured information, e.g., links, categories, infoboxes
28
DBpedia: Linked Data lynchpin
29
http://lookup.dbpedia.org/
33
DBpedia uses WP structured data
DBpedia extracts structured data from Wikipedia, especially from infoboxes
34
DBpedia ontology
DBpedia 3.2 (Nov 2008) added a manually constructed ontology with
  – 170 classes in a subsumption hierarchy
  – 880K instances
  – 940 properties with domain and range
A partial, manual mapping was constructed from infobox attributes to these terms
Current domain and range constraints are “loose”
Namespace: http://dbpedia.org/ontology/

Class       Instances
Place       248,000
Person      214,000
Work        193,000
Species      90,000
Org.         76,000
Building     23,000
35
http://dbpedia.org/sparql/
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX dbpo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?Property ?Place
WHERE {
  dbp:Barack_Obama ?Property ?Place .
  ?Place rdf:type dbpo:Place .
}
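A query like this is typically sent to an endpoint as an HTTP GET with the query text in a `query` parameter, as defined by the SPARQL protocol. The sketch below only builds the request URL and sends nothing. The prefix IRIs are assumptions filled in for illustration (the transcript lost the originals), the `rdf:` prefix is declared explicitly rather than relying on the server, and `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"

# Prefix IRIs are assumed; the slide transcript does not preserve them.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbp: <http://dbpedia.org/resource/>
PREFIX dbpo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?Property ?Place
WHERE {
  dbp:Barack_Obama ?Property ?Place .
  ?Place rdf:type dbpo:Place .
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)

# Decode the URL again to confirm the query text survives the round trip.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

Fetching `url` with any HTTP client would then return the result bindings, in whatever format the endpoint and the request's Accept header agree on.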
36
DBpedia: Linked Data lynchpin
37
Consider Baltimore, MD
38
Looking at the RDF description
We find assertions equating DBpedia's object for Baltimore with those in other LOD datasets:
dbpedia:Baltimore%2C_Maryland
    owl:sameAs census:us/md/counties/baltimore/baltimore ;
    owl:sameAs cyc:concept/Mx4rvVin-5wpEbGdrcN5Y29ycA ;
    owl:sameAs freebase:guid.9202a8c04000641f800000000004921a ;
    owl:sameAs geonames:4347778/ .
Since owl:sameAs is defined as an equivalence relation, the mapping works both ways
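Because owl:sameAs is an equivalence relation (reflexive, symmetric, transitive), a set of sameAs assertions partitions identifiers into groups that all denote the same thing. A minimal sketch of that idea, using a union-find structure, is shown below. The function name `same_as_classes` is hypothetical and identifiers are treated as opaque strings.

```python
def same_as_classes(pairs):
    """Group identifiers into equivalence classes from (a, b) sameAs pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


BALTIMORE = "dbpedia:Baltimore%2C_Maryland"
pairs = [
    (BALTIMORE, "census:us/md/counties/baltimore/baltimore"),
    (BALTIMORE, "cyc:concept/Mx4rvVin-5wpEbGdrcN5Y29ycA"),
    (BALTIMORE, "freebase:guid.9202a8c04000641f800000000004921a"),
    (BALTIMORE, "geonames:4347778/"),
]

classes = same_as_classes(pairs)
print(len(classes), len(classes[0]))
```

All five identifiers end up in one class, so a fact stated about the GeoNames identifier can be combined with one stated about the Cyc identifier even though no assertion links those two directly.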
39
Linked Data Cloud, March 2009
40
Four principles for linked data
Use URIs to identify things that you expose to the Web as resources
Use HTTP URIs so that people can locate and look up (dereference) these things
When someone looks up a URI, provide useful information
Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web
-- Tim Berners-Lee, 2006
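In practice, looking up a linked-data URI usually means an HTTP GET with an Accept header asking for an RDF serialization. The sketch below only constructs such a request with Python's standard library and does not send it. The expanded DBpedia URI and the helper name `rdf_request` are assumptions made for illustration.

```python
from urllib.request import Request


def rdf_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a GET request that asks for RDF via content negotiation."""
    return Request(uri, headers={"Accept": media_type})


# Assumed expansion of dbpedia:Baltimore%2C_Maryland into a full HTTP URI.
req = rdf_request("http://dbpedia.org/resource/Baltimore%2C_Maryland")

print(req.get_method())
print(req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the lookup. A server following the principles above would respond with a description of the resource, including links to related URIs.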
41
4.5 billion triples for free
The full public LOD dataset had about 4.5 billion triples as of March 2009
Linking assertions are spotty, but probably include on the order of 10M equivalences
Availability:
  – Download the data in RDF
  – Query it via public SPARQL servers
  – Load it as an Amazon EC2 public dataset
  – Launch it and the required software as a public Amazon AMI
42
Wikitology
We’ve been exploring a different approach to derive an ontology from Wikipedia through a series of use cases:
  – Identifying user context in a collaboration system from documents viewed (2006)
  – Improving IR accuracy by adding Wikitology tags to documents (2007)
  – ACE: cross-document co-reference resolution for named entities in text (2008)
  – TAC KBP: knowledge base population from text (2009)
  – Improving Web search engines by tagging documents and queries (2009)
43
Wikitology 2.0 (2008)
[Figure: Wikitology 2.0 architecture. Labels: WordNet, Yago, human input & editing, databases, Freebase KB, RDF, text, graphs.]
44
Conclusion
The Semantic Web is a powerful approach for data interoperability and integration
The research focus is shifting to a “Web of Data” perspective
Many research issues remain: uncertainty, provenance, trust, parallel graph algorithms, reasoning over billions of triples, user-friendly tools, etc.
Just as the Web enhances human intelligence, the Semantic Web will enhance machine intelligence
The ideas and technology are still evolving
45
http://ebiquity.umbc.edu/