Published by Dustin McCarthy, modified over 9 years ago
1
Probabilistic RDF
Octavian Udrea¹, V.S. Subrahmanian¹, Zoran Majkić²
¹ University of Maryland, College Park; ² University “La Sapienza”, Rome, Italy
2
Motivation
Not all information on the Web is easily expressible in “classic” models (e.g., relational).
RDF extraction from text: STORY is the first, very successful prototype.
RDF needs to be extended with temporal and uncertainty components.
Goal: build a logical model of RDF with uncertainty and provide query algorithms.
3
The Probabilistic RDF idea
An RDF theory is a set of triples (subject, property, value), e.g. (USA hasCapital Washington DC), (Washington DC hasPopulation 500,000).
Probabilistic RDF extends this model with uncertainty over the set of values, e.g. (USA hasCapital {(Washington DC, 0.95), (State of Washington, 0.05)}).
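To make the data model concrete, here is a minimal Python sketch, assuming a probabilistic triple can be stored as a subject, a property and a mapping from each candidate value to its probability. The class `PTriple` and its methods are hypothetical names chosen for illustration, not the authors' code. It expands the hasCapital example into weighted simple triples and checks that the probabilities sum to at most 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PTriple:
    """A probabilistic triple (subject, property, (V, delta)).

    Hypothetical representation: `values` maps each candidate value v in V
    to its probability delta(v).
    """
    subject: str
    prop: str
    values: Dict[str, float]

    def is_well_formed(self, tol: float = 1e-9) -> bool:
        # Each delta(v) lies in [0, 1] and the total over V is at most 1.
        probs = self.values.values()
        return all(0.0 <= p <= 1.0 for p in probs) and sum(probs) <= 1.0 + tol

    def expand(self) -> List[Tuple[Tuple[str, str, str], float]]:
        # One simple (s, p, v) triple per candidate value, paired with delta(v).
        return [((self.subject, self.prop, v), p) for v, p in self.values.items()]


capital = PTriple("USA", "hasCapital",
                  {"Washington DC": 0.95, "State of Washington": 0.05})
print(capital.is_well_formed())  # True
for triple, prob in capital.expand():
    print(triple, prob)
```

The small tolerance only absorbs floating-point rounding when the probabilities are meant to sum to exactly 1, as they do here.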
4
Probabilistic RDF example
(Figure: example extracted from www.wrongdiagnosis.com)
5
Probabilistic RDF example
8
Probabilistic RDF syntax
Schema uncertainty: (c subClassOf (C, δ)), with $\sum_{d \in C} \delta(d) \le 1$
Class-instance uncertainty: (x rdf:type (C, δ)), with $\sum_{d \in C} \delta(d) \le 1$
Instance-based uncertainty: (x p (Y, δ)), with $\sum_{y \in Y} \delta(y) \le 1$
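The three forms differ only in the property used; all share the same constraint on δ. A minimal sketch, assuming triples are encoded as `(subject, property, {value: probability})` tuples; the function names and the example triples other than the Flu one are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical encoding of a pRDF triple: (subject, property, {value: delta(value)}).
PTriple = Tuple[str, str, Dict[str, float]]


def uncertainty_kind(triple: PTriple) -> str:
    """Classify a pRDF triple by the kind of uncertainty it expresses."""
    _, prop, _ = triple
    if prop == "subClassOf":
        return "schema"          # (c subClassOf (C, delta))
    if prop == "rdf:type":
        return "class-instance"  # (x rdf:type (C, delta))
    return "instance"            # (x p (Y, delta)) for any other property


def sum_constraint_holds(triple: PTriple, tol: float = 1e-9) -> bool:
    """All three kinds share one constraint: delta sums to at most 1."""
    _, _, delta = triple
    return (all(0.0 <= p <= 1.0 for p in delta.values())
            and sum(delta.values()) <= 1.0 + tol)


examples = [
    ("FluPatient", "subClassOf", {"RespiratoryPatient": 0.9}),        # hypothetical
    ("patient42", "rdf:type", {"FluPatient": 0.6, "ColdPatient": 0.3}),  # hypothetical
    ("Flu", "associatedWith", {"AcuteBronchitis": 0.7}),
]
for t in examples:
    print(t[0], uncertainty_kind(t), sum_constraint_holds(t))
```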
9
Probabilistic RDF syntax
Sanity requirement: if a theory contains both $(c\ \mathrm{subClassOf}\ (C_1, \delta_1))$ and $(c\ \mathrm{subClassOf}\ (C_2, \delta_2))$, then either $C_1 = C_2$ and $\delta_1 = \delta_2$, or $C_1 \cap C_2 = \emptyset$.
The same applies to the other types of uncertainty.
Transitive properties (e.g., associatedWith, controlledBy) provide a simple inferential capability.
P-path: a set of triples connected by transitive properties.
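The sanity requirement can be checked pairwise. Below is a minimal sketch under the same hypothetical tuple encoding as before; `sanity_violations` is an illustrative name, and the Pneumonia and Sinusitis triples are made-up data used only to trigger the check.

```python
from itertools import combinations
from typing import Dict, List, Tuple

PTriple = Tuple[str, str, Dict[str, float]]


def sanity_violations(theory: List[PTriple]) -> List[Tuple[PTriple, PTriple]]:
    """Return the pairs of triples that break the sanity requirement.

    Two triples with the same subject and property must either be identical
    (same value set C and same delta) or have disjoint value sets.
    """
    violations = []
    for t1, t2 in combinations(theory, 2):
        (s1, p1, d1), (s2, p2, d2) = t1, t2
        if (s1, p1) != (s2, p2):
            continue  # the requirement only relates triples sharing s and p
        identical = d1 == d2                      # C1 = C2 and delta1 = delta2
        disjoint = not (d1.keys() & d2.keys())    # C1 and C2 share no value
        if not (identical or disjoint):
            violations.append((t1, t2))
    return violations


theory = [
    ("Flu", "associatedWith", {"AcuteBronchitis": 0.7}),
    ("Flu", "associatedWith", {"Pneumonia": 0.3}),                          # disjoint: OK
    ("Flu", "associatedWith", {"AcuteBronchitis": 0.5, "Sinusitis": 0.2}),  # overlaps the first
]
print(sanity_violations(theory))
```

Comparing two dicts with `==` checks both the key sets (C) and the probabilities (δ) in one step, which matches the two conditions of the requirement.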
10
Example p-path
11
P-path semantics and t-norms
We cannot generally assume independence between the triples on a transitive path, e.g. Flu, AcuteBronchitis, Pneumonia.
T-norms are used to express the user’s knowledge of the relationship between triples. A t-norm $\otimes : [0,1] \times [0,1] \to [0,1]$ is associative and commutative, satisfies $0 \otimes x = 0$ and $1 \otimes x = x$, and is monotone: $x \le y$ and $z \le w$ imply $x \otimes z \le y \otimes w$.
P-path probability: the t-norm applied to the individual probabilities on the path.
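Because a t-norm is associative with identity 1, a p-path probability can be computed as a left fold over the edge probabilities. A minimal sketch; the three t-norms below are standard ones chosen for illustration, and the function names are hypothetical.

```python
from functools import reduce
from typing import Callable, Iterable

TNorm = Callable[[float, float], float]


def product_tnorm(x: float, y: float) -> float:
    return x * y


def minimum_tnorm(x: float, y: float) -> float:
    # Goedel t-norm: the largest t-norm.
    return min(x, y)


def lukasiewicz_tnorm(x: float, y: float) -> float:
    return max(0.0, x + y - 1.0)


def path_probability(probs: Iterable[float], tnorm: TNorm) -> float:
    """Fold a t-norm over the probabilities of the triples on a p-path.

    Starting the fold at 1.0 is safe because 1 is the identity (1 ⊗ x = x);
    associativity means the grouping does not matter mathematically.
    """
    return reduce(tnorm, probs, 1.0)


path = [0.7, 0.65]  # Flu -> AcuteBronchitis -> Pneumonia
for name, t in [("product", product_tnorm), ("min", minimum_tnorm),
                ("Lukasiewicz", lukasiewicz_tnorm)]:
    print(name, round(path_probability(path, t), 4))
```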
12
Example p-path
(Flu associatedWith (Pneumonia, 0.455)) w.r.t. the product t-norm
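The 0.455 is the product of the two edge probabilities on the path, 0.7 (Flu to Acute Bronchitis) and 0.65 (Acute Bronchitis to Pneumonia). For comparison, here is the same path under two other standard t-norms, minimum (Gödel) and Łukasiewicz. These alternatives are our illustration, not taken from the slides:

```latex
\begin{aligned}
\text{product:}        \quad & 0.7 \otimes 0.65 = 0.7 \times 0.65 = 0.455\\
\text{minimum:}        \quad & 0.7 \otimes 0.65 = \min(0.7,\ 0.65) = 0.65\\
\text{\L{}ukasiewicz:} \quad & 0.7 \otimes 0.65 = \max(0,\ 0.7 + 0.65 - 1) = 0.35
\end{aligned}
```

The same evidence yields anything from 0.35 to 0.65 depending on the assumed dependence between the triples. The product corresponds to independence, while minimum and Łukasiewicz are the most and least favorable dependence assumptions.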
13
pRDF semantics
A world W is a set of simple triples (with no probabilities).
An interpretation I assigns a probability to each world.
I satisfies a pRDF theory iff for each triple (s, p, (V, δ)) in it and each v ∈ V, $\delta(v) \le \sum_{W \ni (s,p,v)} I(W)$, where the sum ranges over the worlds W that contain (s, p, v).
The same applies to paths w.r.t. a given t-norm.
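The satisfaction condition translates directly into a check over a finite interpretation. A minimal sketch, assuming worlds are frozensets of simple triples and the interpretation is a dict from worlds to probabilities (worlds not listed get probability 0). `satisfies` is a hypothetical name, and path constraints are deliberately left out.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]                 # a simple triple (s, p, v)
World = FrozenSet[Triple]                     # a world: a set of simple triples
PTriple = Tuple[str, str, Dict[str, float]]   # (s, p, {v: delta(v)})


def satisfies(interp: Dict[World, float], theory: List[PTriple],
              tol: float = 1e-9) -> bool:
    """Check the satisfaction condition for (non-path) pRDF triples.

    `interp` is assumed to be a probability distribution over finitely many
    worlds. Path constraints w.r.t. a t-norm are not checked here.
    """
    for s, p, delta in theory:
        for v, prob in delta.items():
            # Total probability of the worlds that contain (s, p, v).
            mass = sum(pr for world, pr in interp.items() if (s, p, v) in world)
            if mass + tol < prob:
                return False
    return True


DC = ("USA", "hasCapital", "Washington DC")
WA = ("USA", "hasCapital", "State of Washington")
theory = [("USA", "hasCapital", {"Washington DC": 0.95, "State of Washington": 0.05})]

good = {frozenset({DC}): 0.95, frozenset({WA}): 0.05}
bad = {frozenset({DC}): 1.0}  # no world contains WA, so 0 < 0.05 fails
print(satisfies(good, theory), satisfies(bad, theory))  # True False
```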
14
pRDF semantics
A theory is consistent iff it has a satisfying interpretation. Every pRDF theory is consistent.
Entailment: T entails T′ iff every satisfying interpretation of T satisfies T′.
Closure of a theory: the entire set of triples entailed by the theory, maximal w.r.t. the probability values.
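One short argument for why every pRDF theory is consistent is sketched below. It assumes that any set of simple triples counts as a world, since the slides do not spell out the proof. The idea is to put all probability mass on a single world $W^{*}$ that contains every simple triple over the vocabulary of T:

```latex
W^{*} = \{(s, p, v) \mid s, v \text{ terms occurring in } T,\ p \text{ a property of } T\},
\qquad
I^{*}(W) =
\begin{cases}
1 & \text{if } W = W^{*}\\
0 & \text{otherwise}
\end{cases}

\text{for every } (s, p, (V, \delta)) \in T \text{ and every } v \in V:\quad
\delta(v) \;\le\; 1 \;=\; I^{*}(W^{*}) \;=\; \sum_{W \ni (s, p, v)} I^{*}(W)
```

Every t-norm maps into [0, 1], so the same bound also covers path constraints. This assumes that those constraints compare a path's probability against the mass of the worlds containing the derived triple.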
15
pRDF fixpoint semantics
The closure operator Δ adds exactly one entailed triple at each step.
(Flu associatedWith (Acute Bronchitis, 0.7)) and (Acute Bronchitis associatedWith (Pneumonia, 0.65)) yield (Flu associatedWith (Pneumonia, 0.455)) w.r.t. the product t-norm, since 0.7 × 0.65 = 0.455.
Δ has a fixpoint, which is the theory closure.
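A minimal sketch of a Δ-like operator for a single transitive property, assuming the property's triples are stored as a dict from (subject, value) pairs to probabilities. `delta_step` and `closure` are hypothetical names, not the paper's algorithm. Each step adds one entailed triple, or raises one triple's probability, keeping the maximal value as the closure definition requires. For t-norms like the product, extending a path never raises its probability, so the iteration stops.

```python
from typing import Callable, Dict, Optional, Tuple

Edge = Tuple[str, str]  # (subject, value) of one transitive property
TNorm = Callable[[float, float], float]


def delta_step(edges: Dict[Edge, float],
               tnorm: TNorm) -> Optional[Tuple[Edge, float]]:
    """Find ONE entailed triple that is new or has a higher probability.

    Looks for a chain (a, b), (b, c) whose combined probability under the
    t-norm beats the current probability of (a, c), or (a, c) is absent.
    Returns None when no such triple exists, i.e. at the fixpoint.
    """
    for (a, b), p1 in edges.items():
        for (b2, c), p2 in edges.items():
            if b != b2 or a == c:  # not a chain, or a self-loop (skipped)
                continue
            combined = tnorm(p1, p2)
            if combined > edges.get((a, c), 0.0):
                return (a, c), combined
    return None


def closure(edges: Dict[Edge, float], tnorm: TNorm) -> Dict[Edge, float]:
    """Apply delta_step repeatedly, one triple per step, until the fixpoint."""
    result = dict(edges)
    while (step := delta_step(result, tnorm)) is not None:
        pair, prob = step
        result[pair] = prob  # keep the maximal probability seen so far
    return result


edges = {("Flu", "AcuteBronchitis"): 0.7, ("AcuteBronchitis", "Pneumonia"): 0.65}
closed = closure(edges, lambda x, y: x * y)  # product t-norm
print(closed)
```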
16
pRDF query processing
We consider only simple queries: a triple with a variable term, plus a probability threshold.
Example: (? associatedWith Pneumonia 0.4), i.e. what is associated with Pneumonia with probability above 0.4?
Simple method: compute the closure, then select every triple in the closure that matches the query. This is very expensive computationally.
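The simple method can be sketched as follows, assuming the product t-norm and a single transitive property stored as a dict of (subject, value) pairs. The function names and the (Cold, Flu) triple are hypothetical. The full closure is built before any filtering, which is exactly why this approach is expensive on large theories.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (subject, value) of the queried transitive property


def product_closure(edges: Dict[Edge, float]) -> Dict[Edge, float]:
    """Naive closure of one transitive property under the product t-norm."""
    result = dict(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), p1 in list(result.items()):
            for (b2, c), p2 in list(result.items()):
                if b == b2 and a != c and p1 * p2 > result.get((a, c), 0.0):
                    result[(a, c)] = p1 * p2
                    changed = True
    return result


def naive_subject_query(edges: Dict[Edge, float], obj: str,
                        threshold: float) -> List[Tuple[str, float]]:
    """Answer (? p obj threshold): build the whole closure, then filter it."""
    closed = product_closure(edges)
    return sorted((s, pr) for (s, o), pr in closed.items()
                  if o == obj and pr > threshold)


edges = {
    ("Cold", "Flu"): 0.5,  # hypothetical extra triple
    ("Flu", "AcuteBronchitis"): 0.7,
    ("AcuteBronchitis", "Pneumonia"): 0.65,
}
print(naive_subject_query(edges, "Pneumonia", 0.4))
```

Cold reaches Pneumonia only with probability 0.5 × 0.455 ≈ 0.23, so it is filtered out, but only after the closure has already computed it.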
17
pRDF query processing
A set of algorithms answers simple queries and conjunctions: pRDF_Subject, pRDF_Property, …, pRDF_conjunction.
Central idea: apply Δ only in those directions that yield triples relevant to the query.
Cut off path computations when the threshold can no longer be reached: since $x \otimes y \le \min(x, y)$ for any t-norm, extending a path never raises its probability.
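A minimal sketch of both ideas for a subject query (? p obj threshold), assuming the product t-norm. This is our illustration, not the paper's pRDF_Subject algorithm. The search walks backwards from the queried object along incoming edges only, and it abandons a partial path once its probability is no longer above the threshold. The function name and the (Cold, Flu) triple are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (subject, value) of the queried transitive property


def pruned_subject_query(edges: Dict[Edge, float], obj: str,
                         threshold: float) -> Dict[str, float]:
    """Answer (? p obj threshold) without computing the whole closure."""
    incoming: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (s, o), p in edges.items():
        incoming[o].append((s, p))

    best: Dict[str, float] = {}
    stack: List[Tuple[str, float]] = [(obj, 1.0)]
    while stack:
        node, prob = stack.pop()
        for src, p in incoming[node]:
            new_prob = prob * p  # product t-norm
            if new_prob <= threshold:
                continue  # longer paths can only be smaller: cut off here
            if new_prob > best.get(src, 0.0):
                best[src] = new_prob  # better path found: re-expand from src
                stack.append((src, new_prob))
    best.pop(obj, None)  # ignore cycles leading back to the object itself
    return best


edges = {
    ("Cold", "Flu"): 0.5,  # hypothetical extra triple
    ("Flu", "AcuteBronchitis"): 0.7,
    ("AcuteBronchitis", "Pneumonia"): 0.65,
}
print(pruned_subject_query(edges, "Pneumonia", 0.4))
```

Unlike the naive method, the Cold path is cut as soon as it drops to about 0.23, and triples that cannot reach Pneumonia are never touched.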
18
Experimental results
Implementation: Java, 1700 LOC, with disk-based storage for pRDF theories.
Synthetically generated datasets, according to varying underlying distributions.
Datasets extracted from Web sources.
19
Experimental questions
Does the underlying distribution affect query running time?
From a practical point of view, which are the “fastest” types of queries?
How does running time vary with the number of atoms in a conjunction?
What other theory-dependent factors affect running time (theory width, number of properties)?
20
Query running time (Poisson)
21
Query running time (Zipf)
22
Conjunctive queries running time
23
Dependence on property width
24
Number of properties
25
Take-away points
RDF syntax with uncertainty
Model-theoretic and fixpoint semantics for pRDF
Efficient query algorithms for pRDF
26
The end
http://om.umiacs.umd.edu/
Thank you! Questions & comments