Tutorial at WWW 2011 Scalable Integration and Processing of Linked Data Andreas Harth, Aidan Hogan, Spyros Kotoulas, Jacopo Urbani.


1 Tutorial at WWW 2011 Scalable Integration and Processing of Linked Data Andreas Harth, Aidan Hogan, Spyros Kotoulas, Jacopo Urbani

2 2 Outline Session 1: Introduction to Linked Data Foundations and Architectures Crawling and Indexing Querying Session 2: Integrating Web Data with Reasoning Introduction to RDFS/OWL on the Web Introduction and Motivation for Reasoning Session 3: Distributed Reasoning: Because Size Matters Problems and Challenges MapReduce and WebPIE Session 4: Putting Things Together (Demo) The LarKC Platform Implementing a LarKC Workflow

3 3 PART I: How can we query Linked Data? PART 2: How can we reason over Linked Data? (start of Session 2)

4 4 Answer: SPARQL (W3C Rec. 2008) …SPARQL 1.1 upcoming (W3C Rec. 201?)

5 5 SPARQL Protocol and RDF Query Language (SPARQL) Introducing SPARQL Standardised query language (and supporting recommendations) for querying RDF ~SQL-like language …but only if you squint …and without the vendor-specific headaches

6 6 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname The anatomy of a typical SPARQL query Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS RESULT CLAUSE QUERY CLAUSE SOLUTION MODIFIERS DATASET CLAUSE

7 7 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname The anatomy of a typical SPARQL query Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS RESULT CLAUSE QUERY CLAUSE SOLUTION MODIFIERS DATASET CLAUSE

8 8 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: Prefix Declarations foaf:Person ⇔ Use http://prefix.cc/ … PREFIX DECLARATIONS
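
A prefix declaration is just a shortcut for a URI namespace. A minimal sketch of the expansion in Python follows; the namespace URIs below are the standard RDF, RDFS and FOAF ones and are not read from the slide, and the `expand` helper is a hypothetical name used only for illustration.

```python
# Prefix table: standard namespace URIs (an assumption, not taken from the slide).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix}")
    return prefixes[prefix] + local


print(expand("foaf:Person"))  # http://xmlns.com/foaf/0.1/Person
```

An undeclared prefix raises an error here, which mirrors what a SPARQL engine does when a query uses a prefix it was never told about.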

9 9 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname The anatomy of a typical SPARQL query Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS RESULT CLAUSE QUERY CLAUSE SOLUTION MODIFIERS DATASET CLAUSE

10 10 SELECT ?name ?expertise Result Clause 1. SELECT 2. CONSTRUCT (RDF) 3. ASK 4. DESCRIBE (RDF) RESULT CLAUSE

11 11 Result Clause 1. SELECT … RESULT CLAUSE
SELECT ?name ?expertise
Return all tuples for the bindings of the variables ?name and ?expertise
--------------------------------------------------------
| “Professor Robert Allen”  | “Control engineering”    |
| “Professor Robert Allen”  | “Biomedical engineering” |
| “Prof Carl Leonetto Amos” |                          |
| “Professor Peter Ashburn” | “Silicon technology”     |
| “Professor Robert Allen”  | “Control engineering”    |
--------------------------------------------------------
Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

12 12 Result Clause 1. SELECT DISTINCT …
SELECT DISTINCT ?name ?expertise
Return all unique tuples for the bindings of the variables ?name and ?expertise
--------------------------------------------------------
| “Professor Robert Allen”  | “Control engineering”    |
| “Professor Robert Allen”  | “Biomedical engineering” |
| “Prof Carl Leonetto Amos” |                          |
| “Professor Peter Ashburn” | “Silicon technology”     |
| “Professor Robert Allen”  | “Control engineering”    |
--------------------------------------------------------
Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname
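
What DISTINCT does to the result table can be sketched in a few lines of Python. The rows are the ones on the slide; representing an unbound ?expertise as `None` is an assumption of this sketch.

```python
# Result rows as (name, expertise) tuples; None stands for an unbound variable.
rows = [
    ("Professor Robert Allen", "Control engineering"),
    ("Professor Robert Allen", "Biomedical engineering"),
    ("Prof Carl Leonetto Amos", None),
    ("Professor Peter Ashburn", "Silicon technology"),
    ("Professor Robert Allen", "Control engineering"),  # repeat of the first row
]

# dict.fromkeys keeps the first occurrence of each row and preserves order.
distinct_rows = list(dict.fromkeys(rows))

print(len(rows), len(distinct_rows))  # 5 4
```

Note that the two remaining "Professor Robert Allen" rows are both kept: DISTINCT compares whole tuples, not single columns.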

13 13 CONSTRUCT { ?person foaf:name ?name ; ex:expertise ?expertise. } Return RDF using bindings for the variables: ex:RAllen foaf:name “Professor Robert Allen” ; ex:expertise “Biomedical engineering”, “Control engineering”. ex:PAshburn foaf:name “Peter Ashburn ” ; ex:expertise “Silicon technology”. Result Clause 2. CONSTRUCT … RESULT CLAUSE Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

14 14 ASK … WHERE { … } Are there any results? Returns: true or false Result Clause 3. ASK … RESULT CLAUSE

15 15 Result Clause 4. DESCRIBE … RESULT CLAUSE DESCRIBE ?person … WHERE { ?person … } Returns some RDF which “describes” the given resource… No standard for what to return! Typically returns: all triples where the given resource appears as subject and/or object OR Concise Bounded Descriptions…

16 16 DESCRIBE ex:RAllen (…can give URIs directly without need for a WHERE clause.) Result Clause 4. DESCRIBE (DIRECT) … RESULT CLAUSE

17 17 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname The anatomy of a typical SPARQL query Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS RESULT CLAUSE QUERY CLAUSE SOLUTION MODIFIERS DATASET CLAUSE

18 18 FROM NAMED Dataset clause ( FROM / FROM NAMED ) DATASET CLAUSE (Briefly) Restrict the dataset against which you wish to query SPARQL stores named graphs: sets of triples which are associated with (URI) names Can match across graphs! Named graphs typically correspond with data provenance (i.e., documents)! Default graph typically corresponds to the merge of all graphs Many engines will typically dereference a graph if not available locally! Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

19 19 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname The anatomy of a typical SPARQL query Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS RESULT CLAUSE SOLUTION MODIFIERS DATASET CLAUSE QUERY CLAUSE

20 20 Query clause ( WHERE ) QUERY CLAUSE WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname Example match: ?person = ex:PAshburn, ?name = “Professor Peter Ashburn”, ?surname = “Ashburn”, ?title = “Professor” (✓ rdf:type foaf:Person, ✓ FILTER), ?expertiseURI = ex:Silicon, ?expertise = “Silicon technology”
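
To see how the WHERE clause produces bindings, here is a rough Python sketch of a basic graph pattern matcher with FILTER and OPTIONAL. It is a toy nested-loop evaluator under stated assumptions, not how a real SPARQL engine works: terms are plain strings, anything starting with `?` is a variable, and `ex:CAmos` is a hypothetical second person added to show the OPTIONAL case.

```python
import re

GRAPH = [
    ("ex:PAshburn", "foaf:name", "Professor Peter Ashburn"),
    ("ex:PAshburn", "foaf:familyName", "Ashburn"),
    ("ex:PAshburn", "rdf:type", "foaf:Person"),
    ("ex:PAshburn", "foaf:title", "Professor"),
    ("ex:PAshburn", "oo:availableToCommentOn", "ex:Silicon"),
    ("ex:Silicon", "rdfs:label", "Silicon technology"),
    # Hypothetical second person with no listed expertise.
    ("ex:CAmos", "foaf:name", "Prof Carl Leonetto Amos"),
    ("ex:CAmos", "foaf:familyName", "Amos"),
    ("ex:CAmos", "rdf:type", "foaf:Person"),
    ("ex:CAmos", "foaf:title", "Prof"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on failure."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound elsewhere
                return None
        elif p != t:
            return None
    return new


def evaluate(patterns, graph, bindings=None):
    """Evaluate a list of triple patterns by nested-loop join."""
    solutions = [{}] if bindings is None else bindings
    for pattern in patterns:
        solutions = [m for b in solutions for t in graph
                     if (m := match(pattern, t, b)) is not None]
    return solutions


def optional(solutions, patterns, graph):
    """Left join: keep a solution as-is when the optional part has no match."""
    out = []
    for b in solutions:
        extended = evaluate(patterns, graph, [b])
        out.extend(extended if extended else [b])
    return out


required = [
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:familyName", "?surname"),
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:title", "?title"),
]
extra = [
    ("?person", "oo:availableToCommentOn", "?expertiseURI"),
    ("?expertiseURI", "rdfs:label", "?expertise"),
]

solutions = evaluate(required, GRAPH)
solutions = [s for s in solutions if re.search("^Prof", s["?title"])]  # FILTER
solutions = optional(solutions, extra, GRAPH)                          # OPTIONAL
solutions.sort(key=lambda s: s["?surname"])                            # ORDER BY
rows = [(s["?name"], s.get("?expertise")) for s in solutions]
print(rows)
```

The person without expertise still appears in the output, with ?expertise left unbound, which is the whole point of OPTIONAL.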

21 21 WHERE { … {?person oo:availableToCommentOn ?expertiseURI. } UNION {?person foaf:interest ?expertiseURI. } … } Quick mention for UNION QUERY CLAUSE Represent disjunction (OR) Useful when there’s more than one property/class that represents the same information you’re interested in (heterogeneity) Reasoning can also help, assuming terms are mapped (more later)

22 22 PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname The anatomy of a typical SPARQL query Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS RESULT CLAUSE SOLUTION MODIFIERS DATASET CLAUSE QUERY CLAUSE

23 23 Solution Modifiers Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname SOLUTION MODIFIERS ORDER BY ?surname Order output results by surname (as you probably guessed) …also… LIMIT OFFSET SOLUTION MODIFIERS ORDER BY ?surname LIMIT 10 Only return 10 results SOLUTION MODIFIERS ORDER BY ?surname LIMIT 10 OFFSET 20 Return results 21 ‒ 30 (skip the first 20)
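
The effect of ORDER BY, LIMIT and OFFSET maps neatly onto sorting and slicing. A minimal sketch, assuming solutions are Python dicts and using made-up surnames; `apply_modifiers` is a hypothetical helper name.

```python
def apply_modifiers(solutions, order_by, limit=None, offset=0):
    """Sort solutions by one variable, then skip `offset` and keep `limit`."""
    ordered = sorted(solutions, key=lambda s: s[order_by])
    end = None if limit is None else offset + limit
    return ordered[offset:end]


# 35 hypothetical solutions: Surname01 ... Surname35.
solutions = [{"surname": f"Surname{i:02d}"} for i in range(1, 36)]

# ORDER BY ?surname LIMIT 10 OFFSET 20
page = apply_modifiers(solutions, "surname", limit=10, offset=20)
print(page[0]["surname"], page[-1]["surname"])  # Surname21 Surname30
```

OFFSET 20 skips the first twenty solutions, so the page holds the 21st to 30th results. Paging like this is only stable when an ORDER BY is present.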

24 24 The summary of a typical SPARQL query PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname PREFIX DECLARATIONS: Shortcuts for URIs RESULT CLAUSE: Which results do you want? DATASET CLAUSE: Where should we look? QUERY CLAUSE: What are you looking for? SOLUTION MODIFIERS: How should results be ordered/split?

25 25 Trying out a typical SPARQL query PREFIX rdf: PREFIX rdfs: PREFIX foaf: PREFIX oo: SELECT ?name ?expertise FROM NAMED WHERE { ?person foaf:name ?name ; foaf:familyName ?surname. ?person rdf:type foaf:Person. ?person foaf:title ?title. FILTER regex(?title, "^Prof") OPTIONAL { ?person oo:availableToCommentOn ?expertiseURI. ?expertiseURI rdfs:label ?expertise } } ORDER BY ?surname Give me a list of names of professors in Southampton and their expertise (if available), in order of their surname

26 26 SparqlEndpoints (W3C Wiki) http://www.w3.org/wiki/SparqlEndpoints (or just use Google) List of Public SPARQL Endpoints:

27 27 SPARQL 1.1 Currently a W3C Working Draft http://www.w3.org/TR/sparql11-query/ (or just use Google) Coming Soon:

28 28 “SPARQL by example” By Cambridge Semantics Lee Feigenbaum & Eric Prud'hommeaux http://www.cambridgesemantics.com/2008/09/sparql-by-example/ (or just use Google) Highly recommend checking out:

29 29 After the break… Session 1: Introduction to Linked Data Foundations and Architectures Crawling and Indexing Querying Session 2: Integrating Web Data with Reasoning Introduction to RDFS/OWL on the Web Introduction and Motivation for Reasoning Session 3: Distributed Reasoning: Because Size Matters Problems and Challenges MapReduce and WebPIE Session 4: Putting Things Together (Demo) The LarKC Platform Implementing a LarKC Workflow

30 30 During the break… Question: Find the people who have won both an academy award for best director and a raspberry award for worst director Endpoint: http://dbpedia.org/sparql/ (that is, if you want to use SPARQL… feel free to use whatever) or http://google.com/ (to make it fair) Hint: Look at http://dbpedia.org/page/Michael_Bay and http://dbpedia.org/page/Woody_Allen for examples (The same prefixes therein are understood by the endpoint, …so no need to declare them in the query)

31 31 The Winning (?) Query: SELECT DISTINCT ?name WHERE{ ?director dcterms:subject category:Worst_Director_Golden_Raspberry_Award_winners, category:Best_Director_Academy_Award_winners ; foaf:name ?name. } The Answer: … And the answer is…
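
For anyone who wants to send the query from a script rather than a web form: under the SPARQL Protocol, a query can be passed to an endpoint as the `query` parameter of an HTTP GET. The sketch below only builds the request URL and does not contact the endpoint; result-format negotiation is left out, and `build_request_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://dbpedia.org/sparql/"  # endpoint named on the slides

# The query from the slide; prefixes are assumed to be predeclared by the endpoint.
QUERY = """SELECT DISTINCT ?name WHERE {
  ?director dcterms:subject
      category:Worst_Director_Golden_Raspberry_Award_winners,
      category:Best_Director_Academy_Award_winners ;
    foaf:name ?name .
}"""


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Round trip: decoding the URL gives back exactly the query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded == QUERY)  # True
```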

32 32 PART I: How can we query Linked Data? PART 2: How can we reason over Linked Data? …and why?!

33 33 … A Web of Data Images from: http://richard.cyganiak.de/2007/10/lod/; Cyganiak, Jentzsch September 2010 August 2007 November 2007 February 2008 March 2008 September 2008 March 2009 July 2009

34 34 Reasoning explicit data implicit data How can consumers query the implicit data

35 35 …so what’s The Problem ? … …heterogeneity …need to integrate data from different sources

36 36 Take Query Answering… Gimme webpages relating to Tim Berners-Lee foaf:page timbl:i timbl:i foaf:page ?pages.

37 37 Heterogeneity in schema… webpage: properties foaf:page foaf:homepage foaf:isPrimaryTopicOf foaf:weblog doap:homepage foaf:topic foaf:primaryTopic mo:musicBrainz mo:myspace … = rdfs:subPropertyOf = owl:inverseOf

38 38 Linked Data, RDFS and OWL: Linked Vocabularies … … Image from http://blog.dbtune.org/public/.081005_lod_constellation_m.jpg; Giasson, Bergman

39 39 Heterogeneity in naming… Tim Berners-Lee: URIs … timbl:i dblp:100007 identica:45563 adv:timbl fb:en.tim_berners-lee db:Tim-Berners_Lee = owl:sameAs

40 40 Returning to our simple query… Gimme webpages relating to Tim Berners-Lee foaf:page timbl:i timbl:i foaf:page ?pages.... 7 x 6 = 42 possible patterns foaf:homepage foaf:isPrimaryTopicOf doap:homepage foaf:topic foaf:primaryTopic mo:myspace dblp:100007 identica:45563 adv:timbl fb:en.tim_berners-lee db:Tim-Berners_Lee
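
The 7 x 6 blow-up is easy to reproduce. The sketch below enumerates every pattern a user would have to write without reasoning, and joins them with UNION. Treating foaf:topic and foaf:primaryTopic as the inverse-direction properties (page first, person second) follows the FOAF vocabulary and is an assumption about how the slide counts them.

```python
from itertools import product

# The six URIs for Tim Berners-Lee listed on the slide.
URIS = ["timbl:i", "dblp:100007", "identica:45563",
        "adv:timbl", "fb:en.tim_berners-lee", "db:Tim-Berners_Lee"]

# Properties pointing from the person to the page.
FORWARD = ["foaf:page", "foaf:homepage", "foaf:isPrimaryTopicOf",
           "doap:homepage", "mo:myspace"]

# Properties pointing from the page to the person (assumed inverses).
INVERSE = ["foaf:topic", "foaf:primaryTopic"]


def expand_patterns(uris, forward, inverse, var="?pages"):
    """List one triple pattern per (URI, property) combination."""
    patterns = [f"{u} {p} {var} ." for u, p in product(uris, forward)]
    patterns += [f"{var} {p} {u} ." for u, p in product(uris, inverse)]
    return patterns


patterns = expand_patterns(URIS, FORWARD, INVERSE)
print(len(patterns))  # 42

# One big disjunctive query covering all combinations.
query = ("SELECT DISTINCT ?pages WHERE { "
         + " UNION ".join("{ " + p + " }" for p in patterns)
         + " }")
```

Every new synonym property or alias URI multiplies the query size again, which is the motivation for letting a reasoner apply the mappings instead.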

41 41 …reasoning to the rescue?

42 42 Challenges… …what (OWL) reasoning is feasible for Linked Data?

43 43 Linked Data Reasoning: Challenges

44 44 Scalability At least tens of billions of statements (for the moment) Near linear scale!!! Noisy data Inconsistencies galore Publishing errors Linked Data Reasoning: Challenges

45 45 Challenges (Semantic Web Wikipedia Article) Some of the challenges for the Semantic Web include vastness, vagueness, uncertainty, inconsistency and deceit. Automated reasoning systems will have to deal with all of these issues in order to deliver on the promise of the Semantic Web. Vastness: The World Wide Web contains at least 48 billion pages as of this writing (August 2, 2009). The SNOMED CT medical terminology ontology contains 370,000 class names, and existing technology has not yet been able to eliminate all semantically duplicated terms. Any automated reasoning system will have to deal with truly huge inputs. Vagueness: These are imprecise concepts like "young" or "tall". This arises from the vagueness of user queries, of concepts represented by content providers, of matching query terms to provider terms and of trying to combine different knowledge bases with overlapping but subtly different concepts. Fuzzy logic is the most common technique for dealing with vagueness. Uncertainty: These are precise concepts with uncertain values. For example, a patient might present a set of symptoms which correspond to a number of different distinct diagnoses each with a different probability. Probabilistic reasoning techniques are generally employed to address uncertainty. Inconsistency: These are logical contradictions which will inevitably arise during the development of large ontologies, and when ontologies from separate sources are combined. Deductive reasoning fails catastrophically when faced with inconsistency, because "anything follows from a contradiction". Defeasible reasoning and paraconsistent reasoning are two techniques which can be employed to deal with inconsistency. Deceit: This is when the producer of the information is intentionally misleading the consumer of the information. Cryptography techniques are currently utilized to ameliorate this threat. Linked Data Reasoning: Challenges

46 46 Proposition 1 Web data is noisy. Proof: 08445a31a78661b5c746feff39a9db6e4e2cc5cf sha1-sum of ‘mailto:’ common value for foaf:mbox_sha1sum An inverse-functional (uniquely identifying) property!!! Any person who shares the same value will be considered the same Q.E.D. Noisy Data: Omnipotent Being
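
A small Python sketch of how this goes wrong. Exporters compute foaf:mbox_sha1sum as the SHA-1 of the mailto: URI, so a profile with an empty email field hashes the bare string "mailto:", and every such profile ends up with the same "unique" key. The profiles and the `mbox_sha1sum` helper are hypothetical.

```python
import hashlib
from collections import defaultdict


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI for an email address."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


# Hypothetical profiles; two of them were exported with an empty email field.
profiles = {
    "ex:alice": "",
    "ex:bob": "",
    "ex:carol": "carol@example.org",
}

# Group people by the value of the inverse-functional property.
by_hash = defaultdict(set)
for person, email in profiles.items():
    by_hash[mbox_sha1sum(email)].add(person)

# Any group with more than one member would be inferred to be one individual.
merged = [people for people in by_hash.values() if len(people) > 1]
print(merged)
```

A naive reasoner would conclude that ex:alice and ex:bob are the same person, and the same goes for everyone else on the Web who shares that value.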

47 47 Noisy Data: Redefining everything …and home in time for tea Alternate proof (courtesy of http://www.eiao.net/rdf/1.0) rdf:type rdf:type owl:Property. rdf:type rdfs:label “type”@en. rdf:type rdfs:comment “Type of resource”. rdf:type rdfs:domain eiao:testRun. rdf:type rdfs:domain eiao:pageSurvey. rdf:type rdfs:domain eiao:siteSurvey. rdf:type rdfs:domain eiao:scenario. rdf:type rdfs:domain eiao:rangeLocation. rdf:type rdfs:domain eiao:startPointer. rdf:type rdfs:domain eiao:endPointer. rdf:type rdfs:domain eiao:header. rdf:type rdfs:domain eiao:runs.

48 48 foaf:Person owl:disjointWith foaf:Document. Inconsistent Data: Cannot compute…

49 49 …herein, we look at (monotonic) rules. Expressive reasoning (also) possible through tableaux, but yet to demonstrate desired scale

50 50 Rules IF (Body/Antecedent/Condition) ⇒ THEN (Head/Consequent) ?c1 rdfs:subClassOf ?c2. ?x rdf:type ?c1. ⇒ ?x rdf:type ?c2. foaf:Person rdfs:subClassOf foaf:Agent. timbl:me rdf:type foaf:Person. ⇒ timbl:me rdf:type foaf:Agent. Schema/Terminology/Ontological Instance/Assertional

51 51 Rules (Inconsistencies [a.k.a. Contradictions]) IF (Body/Antecedent/Condition) ⇒ THEN (Head/Consequent) ?c1 owl:disjointWith ?c2. ?x rdf:type ?c1. ?x rdf:type ?c2. ⇒ false foaf:Person owl:disjointWith foaf:Document. ex:sleepygirl rdf:type foaf:Person. ex:sleepygirl rdf:type foaf:Document. ⇒ false
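
The disjointness rule can be checked with a short Python sketch. Rather than deriving "false" and stopping, this version reports which resource violates which pair of classes; `find_inconsistencies` is a hypothetical helper, and triples are plain string tuples.

```python
def find_inconsistencies(triples):
    """Return (resource, class1, class2) for every disjointness violation."""
    disjoint = {(s, o) for s, p, o in triples if p == "owl:disjointWith"}
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return [(x, c1, c2)
            for x, classes in types.items()
            for c1, c2 in disjoint
            if c1 in classes and c2 in classes]


triples = [
    ("foaf:Person", "owl:disjointWith", "foaf:Document"),
    ("ex:sleepygirl", "rdf:type", "foaf:Person"),
    ("ex:sleepygirl", "rdf:type", "foaf:Document"),
    ("timbl:me", "rdf:type", "foaf:Person"),
]

print(find_inconsistencies(triples))
# [('ex:sleepygirl', 'foaf:Person', 'foaf:Document')]
```

Reporting the offending resource is what makes inconsistency useful for finding noise in the data.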

52 52 Materialisation (Forward-Chaining): Write the consequences of the rules down Executing rules: Materialisation

53 53 Materialisation Forward-chaining Materialisation Avoid runtime expense Users taught impatience by Google Pre-compute for quick retrieval Web-scale systems should scale well More data = more disk-space/machines

54 54 INPUT: Flat file of triples (quads) OUTPUT: Flat file of (partial) inferred triples (quads)
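
A minimal sketch of forward-chaining materialisation in Python: read a flat list of triple lines, apply rules until nothing new appears, and write out only the inferred triples. It implements just two RDFS-style rules (subclass and subproperty), keeps everything in memory, and splits lines on whitespace, so it is an illustration of the idea and nowhere near a Web-scale system.

```python
def rule_subclass(triples):
    """?c1 rdfs:subClassOf ?c2 . ?x rdf:type ?c1 . => ?x rdf:type ?c2 ."""
    sub = [(s, o) for s, p, o in triples if p == "rdfs:subClassOf"]
    return {(x, "rdf:type", c2)
            for x, p, c in triples if p == "rdf:type"
            for c1, c2 in sub if c1 == c}


def rule_subproperty(triples):
    """?p1 rdfs:subPropertyOf ?p2 . ?x ?p1 ?y . => ?x ?p2 ?y ."""
    sub = [(s, o) for s, p, o in triples if p == "rdfs:subPropertyOf"]
    return {(x, p2, y)
            for x, p, y in triples
            for p1, p2 in sub if p1 == p}


def materialise(lines, rules=(rule_subclass, rule_subproperty)):
    """Apply the rules to a fixpoint and return only the inferred triples."""
    # Naive parsing: three whitespace-separated terms per line (no literals with spaces).
    explicit = {tuple(line.split()[:3]) for line in lines if line.strip()}
    closure = set(explicit)
    while True:
        new = set().union(*(rule(closure) for rule in rules)) - closure
        if not new:
            return closure - explicit
        closure |= new


flat_file = [
    "foaf:Person rdfs:subClassOf foaf:Agent .",
    "timbl:me rdf:type foaf:Person .",
    "foaf:homepage rdfs:subPropertyOf foaf:page .",
    "timbl:me foaf:homepage w3c:timblhomepage .",
]

for triple in sorted(materialise(flat_file)):
    print(" ".join(triple), ".")
```

Because the rules are monotonic, the loop only ever adds triples and is guaranteed to stop once a pass produces nothing new.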

55 55 “Standard” RDFS OWL 2 RL (W3C Rec: 27 Oct. 2009) “Non-standard” DLP pD* (OWL Horst) OWL – … What rulesets?

56 56 Let’s look at a recent corpus of Linked Data and see what schema’s inside (and what the rulesets support) Open-domain crawl May 2010 1.1 billion quadruples 3.985 million sources (docs) 780 pay-level domains (e.g., dbpedia.org ) Ran “special” PageRank over documents 86 thousand docs contained some RDFS/OWL schema data (2.2% of docs... but <0.2% of triples) Summated ranks of docs using each primitive What rules?

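57 57 Survey of Linked Data schema: Top 15 ranks
---------------------------------------------------------------------
| #  | Axiom                         | Rank(Σ) | RDFS | Horst | O2R |
---------------------------------------------------------------------
| 1  | rdfs:subClassOf               | 0.295   | ✓    | ✓     | ✓   |
| 2  | rdfs:range                    | 0.294   | ✓    | ✓     | ✓   |
| 3  | rdfs:domain                   | 0.292   | ✓    | ✓     | ✓   |
| 4  | rdfs:subPropertyOf            | 0.090   | ✓    | ✓     | ✓   |
| 5  | owl:FunctionalProperty        | 0.063   | ✘    | ✓     | ✓   |
| 6  | owl:disjointWith              | 0.049   | ✘    | ✘     | ✓   |
| 7  | owl:inverseOf                 | 0.047   | ✘    | ✓     | ✓   |
| 8  | owl:unionOf                   | 0.035   | ✘    | ✘     | ✓   |
| 9  | owl:SymmetricProperty         | 0.033   | ✘    | ✓     | ✓   |
| 10 | owl:TransitiveProperty        | 0.030   | ✘    | ✓     | ✓   |
| 11 | owl:equivalentClass           | 0.021   | ✘    | ✓     | ✓   |
| 12 | owl:InverseFunctionalProperty | 0.030   | ✘    | ✓     | ✓   |
| 13 | owl:equivalentProperty        | 0.030   | ✘    | ✓     | ✓   |
| 14 | owl:someValuesFrom            | 0.030   | ✘    | ✓     | ✓   |
| 15 | owl:hasValue                  | 0.028   | ✘    | ✓     | ✓   |
---------------------------------------------------------------------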

58 58 What about noise? … …need to consider the provenance of Web data

59 59 Consider source of schema data Class/property URIs dereference to their authoritative document FOAF spec authoritative for foaf:Person ✓ MY spec not authoritative for foaf:Person ✘ Allow “extension” in third-party documents my:Person rdfs:subClassOf foaf:Person. (MY spec) ✓ BUT: Reduce obscure memberships foaf:Person rdfs:subClassOf my:Person. (MY spec) ✘ ALSO: Protect specifications foaf:knows a owl:SymmetricProperty. (MY spec) ✘ Authoritative Reasoning
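
The three examples on this slide can be reproduced with a deliberately simplified check: accept a schema axiom only if the document serving it is authoritative for the axiom's subject. This is an assumption that happens to cover the cases shown; the full definition in the cited papers depends on which rule is being applied. The `AUTHORITY` table stands in for URI dereferencing and is hypothetical.

```python
# Hypothetical lookup: which document each namespace dereferences to.
AUTHORITY = {"foaf": "FOAF spec", "my": "MY spec"}


def is_authoritative(source, term):
    """True if `source` is the document that `term` dereferences to."""
    prefix = term.split(":", 1)[0]
    return AUTHORITY.get(prefix) == source


def accept_axiom(source, axiom):
    """Simplified test: the source must be authoritative for the subject."""
    subject, _, _ = axiom
    return is_authoritative(source, subject)


axioms = [
    ("my:Person", "rdfs:subClassOf", "foaf:Person"),     # extension: accepted
    ("foaf:Person", "rdfs:subClassOf", "my:Person"),     # obscure membership: rejected
    ("foaf:knows", "rdf:type", "owl:SymmetricProperty"),  # redefining FOAF: rejected
]

for axiom in axioms:
    print(accept_axiom("MY spec", axiom), axiom)
```

The first axiom only adds inferences for instances of my:Person, so a third party can extend FOAF without changing what FOAF data means for everyone else.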

60 60 Noisy Data: Redefining everything …and home in time for tea More proof (courtesy of http://www.eiao.net/rdf/1.0) rdf:type rdf:type owl:Property. rdf:type rdfs:label “type”@en. rdf:type rdfs:comment “Type of resource”. rdf:type rdfs:domain eiao:testRun. rdf:type rdfs:domain eiao:pageSurvey. rdf:type rdfs:domain eiao:siteSurvey. rdf:type rdfs:domain eiao:scenario. rdf:type rdfs:domain eiao:rangeLocation. rdf:type rdfs:domain eiao:startPointer. rdf:type rdfs:domain eiao:endPointer. rdf:type rdfs:domain eiao:header. rdf:type rdfs:domain eiao:runs.

61 61 Authoritative Reasoning: read more … w/ essential plugs Gong Cheng, Yuzhong Qu. "Integrating Lightweight Reasoning into Class-Based Query Refinement for Object Search." ASWC 2008. Aidan Hogan, Andreas Harth, Axel Polleres. "Scalable Authoritative OWL Reasoning for the Web." IJSWIS 2009. Aidan Hogan, Jeff Z. Pan, Axel Polleres and Stefan Decker. "SAOR: Template Rule Optimisations for Distributed Reasoning over 1 Billion Linked Data Triples." ISWC 2010. My thesis: http://aidanhogan.com/docs/thesis/ (or use Google).

62 62 Quarantined reasoning! Separate and cache hierarchy of schema documents/dependencies… Alternative to Authoritative Reasoning?

63 63 Quarantined Reasoning [Delbru et al.; 2008]

64 64 Quarantined Reasoning [Delbru et al.; 2008]

65 65 Quarantined Reasoning [Delbru et al.; 2008]

66 66 A-Box / Instance Data (e.g., a FOAF file) T-Box / Ontology Data (e.g., the FOAF ontology and its indirect imports) Quarantined Reasoning [Delbru et al.; 2008]

67 67 Noisy Data: Redefining everything …and home in time for tea More proof (courtesy of http://www.eiao.net/rdf/1.0) rdf:type rdf:type owl:Property. rdf:type rdfs:label “type”@en. rdf:type rdfs:comment “Type of resource”. rdf:type rdfs:domain eiao:testRun. rdf:type rdfs:domain eiao:pageSurvey. rdf:type rdfs:domain eiao:siteSurvey. rdf:type rdfs:domain eiao:scenario. rdf:type rdfs:domain eiao:rangeLocation. rdf:type rdfs:domain eiao:startPointer. rdf:type rdfs:domain eiao:endPointer. rdf:type rdfs:domain eiao:header. rdf:type rdfs:domain eiao:runs.

68 68 Quarantined Reasoning: read more R. Delbru, A. Polleres, G. Tummarello and S. Decker. "Context Dependent Reasoning for Semantic Documents in Sindice." 4th International Workshop on Scalable Semantic Web Knowledge Base Systems, 2008.

69 69 …what about owl:sameAs ?

70 70 Consolidation for Linked Data

71 71 Consolidation: Baseline Use provided owl:sameAs mappings in the data timbl:i owl:sameAs identica:45563. dbpedia:Berners-Lee owl:sameAs identica:45563. Store “equivalences” found: each of timbl:i, identica:45563 and dbpedia:Berners-Lee -> { timbl:i, identica:45563, dbpedia:Berners-Lee }

72 72 For each set of equivalent identifiers, choose a canonical term timbl:i identica:45563 dbpedia:Berners-Lee Consolidation: Baseline

73 73 Canonicalisation Afterwards, rewrite identifiers to their canonical version: Before: timbl:i rdf:type foaf:Person. identica:48404 foaf:knows identica:45563. dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date. After: dbpedia:Berners-Lee rdf:type foaf:Person. identica:48404 foaf:knows dbpedia:Berners-Lee. dbpedia:Berners-Lee dpo:birthDate “1955-06-08”^^xsd:date. Equivalence set: timbl:i identica:45563 dbpedia:Berners-Lee
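
The three baseline steps (collect equivalences, pick a canonical term, rewrite) can be sketched with a union-find structure in Python. This is an in-memory illustration, not the tutorial's actual implementation. Picking the lexicographically smallest identifier as canonical is an arbitrary choice made here; it happens to select dbpedia:Berners-Lee, as on the slide.

```python
def equivalence_classes(same_as_pairs):
    """Group identifiers connected by owl:sameAs using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


def canonical_map(classes, choose=min):
    """Map every identifier to the canonical term of its class."""
    return {term: choose(cls) for cls in classes for term in cls}


def rewrite(triples, canon):
    """Replace each identifier by its canonical version, if it has one."""
    return [tuple(canon.get(t, t) for t in triple) for triple in triples]


same_as = [("timbl:i", "identica:45563"),
           ("dbpedia:Berners-Lee", "identica:45563")]
data = [
    ("timbl:i", "rdf:type", "foaf:Person"),
    ("identica:48404", "foaf:knows", "identica:45563"),
    ("dbpedia:Berners-Lee", "dpo:birthDate", '"1955-06-08"^^xsd:date'),
]

canon = canonical_map(equivalence_classes(same_as))
for triple in rewrite(data, canon):
    print(" ".join(triple), ".")
```

Union-find handles transitivity for free: timbl:i and dbpedia:Berners-Lee land in the same class even though no triple links them directly.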

74 74 Extended Consolidation Infer owl:sameAs through reasoning (OWL 2 RL/RDF) 1.explicit owl:sameAs (again) 2.owl:InverseFunctionalProperty 3.owl:FunctionalProperty 4.owl:cardinality 1 / owl:maxCardinality 1 foaf:homepage a owl:InverseFunctionalProperty. timbl:i foaf:homepage w3c:timblhomepage. adv:timbl foaf:homepage w3c:timblhomepage. ⇒ timbl:i owl:sameAs adv:timbl. …then apply consolidation as before
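
The inverse-functional case can be sketched as a simple group-by in Python: subjects sharing a value for an inverse-functional property are paired up as owl:sameAs. Only case 2 from the list is covered, the set of inverse-functional properties is passed in by hand, and `same_as_from_ifp` is a hypothetical helper name.

```python
from collections import defaultdict


def same_as_from_ifp(triples, inverse_functional):
    """Infer sameAs pairs from shared inverse-functional property values."""
    by_value = defaultdict(list)
    for s, p, o in triples:
        if p in inverse_functional:
            by_value[(p, o)].append(s)

    pairs = []
    for subjects in by_value.values():
        first = subjects[0]
        # Linking every subject to the first one is enough for consolidation.
        pairs.extend((first, other) for other in subjects[1:] if other != first)
    return pairs


triples = [
    ("timbl:i", "foaf:homepage", "w3c:timblhomepage"),
    ("adv:timbl", "foaf:homepage", "w3c:timblhomepage"),
    ("timbl:i", "foaf:name", "Tim Berners-Lee"),
]

print(same_as_from_ifp(triples, {"foaf:homepage"}))
# [('timbl:i', 'adv:timbl')]
```

The resulting pairs can be fed into the same equivalence-class and rewriting steps used for explicit owl:sameAs.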

75 75 For our Linked Data corpus: 1.~12 million explicit owl:sameAs triples (as before) 2.~8.7 million thru. owl:InverseFunctionalProperty 3.~106 thousand thru. owl:FunctionalProperty 4.none thru. owl:cardinality / owl:maxCardinality In terms of equivalences found (baseline vs. extended): ~2.8 million sets of equivalent identifiers (1.31x baseline) ~14.86 million identifiers involved (2.58x baseline) ~5.8 million URIs !!(1.014x baseline)!! Consolidation: Results

76 76 Conclusion…

77 77 Conclusions Heterogeneity poses a significant problem for consuming Linked Data 1.Heterogeneity in schema 2.Heterogeneity in naming …but we can use the mappings provided by publishers to integrate heterogeneous Linked Data corpora (with a little caution) 1.Lightweight rule-based reasoning can go a long way 2.Deceit/Noise ≠ End Of World Consider source of data! 3.Inconsistency ≠ End Of World Useful for finding noise in fact! 4.Explicit owl:sameAs vs. extended consolidation: Extended consolidation mostly (but not entirely) for consolidating blank-nodes from older FOAF exporters

78 78 How can we reason at Web scale? Scalable/distributed rule-based materialisation over MapReduce using the WebPIE system Next up…

79 79 timbl:i foaf:page ?pages. timbl:i identica:45563 dbpedia:Berners-Lee dbpedia:Berners-Lee foaf:page ?pages.

80 80 Authoritative Reasoning (Appendix) OWL 2 RL rule prp-inv1 ?p1 owl:inverseOf ?p2. ?x ?p1 ?y. ⇒ ?y ?p2 ?x. OWL 2 RL rule prp-inv2 ?p1 owl:inverseOf ?p2. ?x ?p2 ?y. ⇒ ?y ?p1 ?x. TBOX: foo:doesntKnow owl:inverseOf foaf:knows. (from foo: ) ABOX: bar:Aidan foo:doesntKnow bar:Axel. bar:Stefan foaf:knows bar:Jeff. AUTHORITATIVE INFERENCE: bar:Axel foaf:knows bar:Aidan. ✓ bar:Jeff foo:doesntKnow bar:Stefan. ✘

