Uncertainty reasoning for Linked Data
Dave Reynolds
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
21 March 2016
Uncertainty reasoning for linked data
Linked data, a strikingly successful model for exploiting semantic web technology, exhibits uncertainty-related issues:
− ambiguity
− misalignment
− reliability
What approach could we take to address these issues without losing the simplicity that has enabled its significant adoption?
Linked data
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.
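A minimal sketch of principles 3 and 4, assuming a hypothetical in-memory dictionary (`WEB`) stands in for HTTP dereferencing: looking up a URI returns triples, and any object that is itself an HTTP URI is followed to discover more things. All URIs are made up, and a real client would instead issue HTTP GET requests asking for RDF.

```python
from collections import deque

# Hypothetical in-memory stand-in for the Web (all URIs are made up):
# dereferencing a URI returns (subject, predicate, object) triples about it.
WEB = {
    "http://example.org/id/London": [
        ("http://example.org/id/London", "rdfs:label", "London"),
        ("http://example.org/id/London", "ex:country", "http://example.org/id/UK"),
        ("http://example.org/id/London", "owl:sameAs", "http://example.net/res/London"),
    ],
    "http://example.org/id/UK": [
        ("http://example.org/id/UK", "rdfs:label", "United Kingdom"),
    ],
    "http://example.net/res/London": [
        ("http://example.net/res/London", "rdfs:label", "London"),
    ],
}


def dereference(uri):
    """Principle 3: look up a URI and get useful triples back (here from WEB)."""
    return WEB.get(uri, [])


def crawl(start):
    """Principle 4: follow links (objects that are HTTP URIs) to discover more things."""
    seen = {start}
    queue = deque([start])
    triples = []
    while queue:
        uri = queue.popleft()
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            # Only HTTP URIs are followed; literals such as "London" are not.
            if o.startswith("http://") and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen, triples


seen, triples = crawl("http://example.org/id/London")
print(len(seen), len(triples))  # 3 5
```

The `seen` set keeps the breadth-first traversal from looping when links point back at each other, as `owl:sameAs` links often do.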
Uncertainty in linked data
1. Misalignment of instance matches
− datasets are linked by resolving co-references and publishing the resulting links
− links are published as owl:sameAs, which is all or nothing
− match errors:
  − the uncertainty of a match is not accessible to consumers
  − consumers can draw erroneous conclusions (e.g. the clinical trial example)
− this can be partly addressed by using the SKOS mapping vocabulary
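Why owl:sameAs is all or nothing can be shown with a small sketch. Assuming a client treats owl:sameAs as symmetric and transitive (as OWL defines it) and pools the facts of each equivalence class, a single wrong link fuses two distinct entities. The helper names, URIs and data below are hypothetical.

```python
class UnionFind:
    """Disjoint sets of URIs; each set is one owl:sameAs equivalence class."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merged_view(same_as_links, facts, uri):
    """All (predicate, object) facts a client sees for `uri` once owl:sameAs
    is closed under symmetry and transitivity: every fact about any member
    of the equivalence class applies to every member."""
    uf = UnionFind()
    for a, b in same_as_links:
        uf.union(a, b)
    root = uf.find(uri)
    view = set()
    for subject, props in facts.items():
        if uf.find(subject) == root:
            view |= props
    return view


# Hypothetical data: two different people who share a name.
facts = {
    "a:JohnSmith1": {("ex:occupation", "surgeon"), ("ex:birthYear", "1950")},
    "b:JSmith": {("ex:birthYear", "1950")},
    "c:JohnSmith2": {("ex:occupation", "footballer")},
}
links = [
    ("a:JohnSmith1", "b:JSmith"),  # correct match
    ("b:JSmith", "c:JohnSmith2"),  # wrong match, e.g. from a name-only comparison
]
# a:JohnSmith1 now also appears to have the occupation "footballer".
print(sorted(merged_view(links, facts, "a:JohnSmith1")))
```

Only owl:sameAs pairs feed the closure here; a link published with skos:closeMatch, which SKOS does not define as transitive, would leave the two people apart.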
Uncertainty in linked data
2. Ambiguity from merging datasets
− datasets make different assumptions and use different definitions and contexts (especially time) for different measures
− merging them leads to multiple different values for what appears to be a single property
− e.g. dbo:populationMetro ; dbp:populationMetro “12,300,000 to 13,945,000”; dbo:populationTotal ; owl:sameAs. :population
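A minimal sketch of how the ambiguity surfaces, assuming a merge maps several source-specific predicates onto one shared predicate (here via a hypothetical `property_alias` table). The function reports every subject/property pair that ends up with more than one value. The prefixes and numbers are invented for illustration.

```python
from collections import defaultdict


def conflicting_values(datasets, property_alias=None):
    """Merge datasets of (s, p, o) triples and return every (s, p) pair that
    ends up with more than one distinct object. `property_alias` maps
    source-specific predicates onto a shared one, as a merge might do."""
    property_alias = property_alias or {}
    values = defaultdict(set)
    for triples in datasets:
        for s, p, o in triples:
            values[(s, property_alias.get(p, p))].add(o)
    return {sp: objs for sp, objs in values.items() if len(objs) > 1}


# Hypothetical figures: three different measures of "population".
ds_a = {("ex:London", "a:populationMetro", "100"),
        ("ex:London", "a:populationTotal", "80")}
ds_b = {("ex:London", "b:population", "95")}
alias = {"a:populationMetro": ":population",
         "a:populationTotal": ":population",
         "b:population": ":population"}

print(conflicting_values([ds_a, ds_b]))         # {}: each measure kept apart
print(conflicting_values([ds_a, ds_b], alias))  # one :population, three values
```

Without the alias table each measure keeps its own predicate and nothing conflicts; collapsing them, as the owl:sameAs to :population on the slide does, is what produces several competing values.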
Uncertainty in linked data
3. Other issues
− Misalignment of models
  − e.g. the links generated between Freebase and DBpedia caused (temporary) problems: :Musician owl:equivalentClass :Person
− Source reliability
  − not unique to linked data, but linking amplifies it
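The effect of the faulty :Musician owl:equivalentClass :Person link can be sketched with naive type inference. This assumes equivalence is read as subclass in both directions and that the intended model had :Musician as a subclass of :Person; the instances `ex:alice` and `ex:bob` are hypothetical.

```python
def infer_types(types, subclass_of, equivalent_class):
    """Close rdf:type under rdfs:subClassOf and owl:equivalentClass.

    types:            instance -> set of asserted classes
    subclass_of:      set of (subclass, superclass) pairs
    equivalent_class: set of (c1, c2) pairs, each read as subclass in both directions
    """
    edges = set(subclass_of)
    for c1, c2 in equivalent_class:
        edges.add((c1, c2))
        edges.add((c2, c1))
    inferred = {}
    for instance, classes in types.items():
        closed = set(classes)
        changed = True
        while changed:  # naive fixpoint; fine for small class hierarchies
            changed = False
            for sub, sup in edges:
                if sub in closed and sup not in closed:
                    closed.add(sup)
                    changed = True
        inferred[instance] = closed
    return inferred


types = {"ex:alice": {":Person"}, "ex:bob": {":Musician"}}
subclass_of = {(":Musician", ":Person")}  # assumed intended model

print(infer_types(types, subclass_of, set())["ex:alice"])     # {':Person'}
bad_link = {(":Musician", ":Person")}  # the faulty equivalence from the slide
print(infer_types(types, subclass_of, bad_link)["ex:alice"])  # now also :Musician
```

One class-level link is enough to mistype every person in the merged data, which is why model misalignment tends to be more damaging than a single bad instance match.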
Mitigation approaches?
1. Weighted link vocabulary
− Develop a simple, common vocabulary for expressing uncertain co-reference links
− Clients or intermediates can choose how to map the link evidence to equivalence assertions
void:LinkSet a ur:UncertainLinkSet ur:matchAlgorithm alg:JaroStringMatch.
[a ur:WeightedLink; ur:target ; ur:match ; ur:weight 0.7] …
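A sketch of two ways a client might turn weighted links into owl:sameAs assertions, assuming a hypothetical Python `WeightedLink` that mirrors the target, match and weight of a ur:WeightedLink. The 0.7 weight comes from the slide; the second candidate and both thresholds are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedLink:
    """Hypothetical Python mirror of a ur:WeightedLink resource."""
    target: str
    match: str
    weight: float  # match evidence, assumed to lie in [0, 1]


def threshold_policy(links, threshold):
    """Assert owl:sameAs for every link whose weight reaches the threshold."""
    return [(link.target, "owl:sameAs", link.match)
            for link in links if link.weight >= threshold]


def best_match_policy(links, threshold):
    """Assert owl:sameAs only for the single best candidate per target."""
    best = {}
    for link in links:
        if link.weight < threshold:
            continue
        current = best.get(link.target)
        if current is None or link.weight > current.weight:
            best[link.target] = link
    return [(link.target, "owl:sameAs", link.match) for link in best.values()]


links = [
    WeightedLink("a:London", "b:London", 0.7),
    WeightedLink("a:London", "b:London_Ontario", 0.4),  # hypothetical weaker candidate
]
print(threshold_policy(links, 0.5))   # only the 0.7 link
print(threshold_policy(links, 0.3))   # both links: London and London, Ontario merged
print(best_match_policy(links, 0.3))  # only the 0.7 link
```

Because the weights stay in the published data, each consumer can pick its own policy and threshold rather than inheriting the publisher's all-or-nothing decision.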
Mitigation approaches?
2. Imprecise value vocabulary
− Develop a simple, common vocabulary for expressing imprecise values, which can arise from known measurement uncertainty or from merge ambiguity
:London :population [a ur:ImpreciseValue
    :sampleValue [:value ; :source :dbpedia; :context :year2009];
    :sampleValue [:value ; :source :okkam; :context :year2008];
    :estimatedValue ].
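A minimal sketch of how a client might consume an imprecise value, assuming hypothetical Python classes that mirror ur:ImpreciseValue and :sampleValue and a context that is simply a year. The sources and years follow the :London example on the slide; the numbers are invented. The sketch leaves :estimatedValue out, since the slide does not say how it is derived.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SampleValue:
    """One observed value with its provenance (mirrors :sampleValue)."""
    value: int
    source: str
    context: int  # assumption: the context is just the year the value refers to


@dataclass
class ImpreciseValue:
    """Hypothetical mirror of ur:ImpreciseValue: keeps every sample instead of picking one."""
    samples: list[SampleValue] = field(default_factory=list)

    def value_range(self):
        values = [s.value for s in self.samples]
        return min(values), max(values)

    def most_recent(self):
        return max(self.samples, key=lambda s: s.context)

    def from_source(self, source):
        return [s for s in self.samples if s.source == source]


# Hypothetical numbers for :London :population.
population = ImpreciseValue([
    SampleValue(100, ":dbpedia", 2009),
    SampleValue(90, ":okkam", 2008),
])
print(population.value_range())         # (90, 100)
print(population.most_recent().source)  # :dbpedia
```

Keeping source and context on every sample lets one client ask for a range while another trusts a single source or the most recent year.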
Mitigation approaches?
3. Override graphs
− Allow clients to choose which parts of merged data sources they adopt (“trust”) and to publish that decision
− Allow clients to publish deltas to public datasets that correct merge or other artefacts, at per-link and per-assertion granularity
[Diagram: void:DataSet, ur:ComputedDataSet, ur:Combinator (ur:Difference, Union), ur:argGraph]
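An override graph can be sketched as a computed dataset built from Difference and Union combinators over sets of triples. The classes below are hypothetical and only loosely mirror ur:ComputedDataSet, ur:Difference and Union from the diagram; the exact semantics of that vocabulary are not given here, and the triples are invented.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]


class DataSet:
    """Anything that can yield a set of triples (cf. void:DataSet)."""

    def triples(self) -> frozenset[Triple]:
        raise NotImplementedError


@dataclass(frozen=True)
class StaticGraph(DataSet):
    """A published dataset held in memory."""
    data: frozenset[Triple]

    def triples(self):
        return self.data


@dataclass(frozen=True)
class UnionGraph(DataSet):
    """Combinator: every triple in any argument graph."""
    args: tuple[DataSet, ...]

    def triples(self):
        out = set()
        for graph in self.args:
            out |= graph.triples()
        return frozenset(out)


@dataclass(frozen=True)
class DifferenceGraph(DataSet):
    """Combinator: triples of `base` minus those in `removed`."""
    base: DataSet
    removed: DataSet

    def triples(self):
        return self.base.triples() - self.removed.triples()


public = StaticGraph(frozenset({
    ("a:London", "owl:sameAs", "b:London"),
    ("a:London", "rdfs:label", "London"),
}))
# A client's published override: retract the sameAs link, add a weaker SKOS link.
retractions = StaticGraph(frozenset({("a:London", "owl:sameAs", "b:London")}))
additions = StaticGraph(frozenset({("a:London", "skos:closeMatch", "b:London")}))

view = UnionGraph((DifferenceGraph(public, retractions), additions))
for t in sorted(view.triples()):
    print(t)
```

The public dataset is never modified: the override is itself just two small graphs plus a recipe, so it can be published and reused at per-assertion granularity.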
Conclusion
− multiple issues of ambiguity and uncertainty in linked data
− the problems and solutions proposed are illustrative rather than definitive
  − low-hanging fruit
  − an area ripe for contribution