Published by Alfred Williamson. Modified over 9 years ago.
1
URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases
Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald
2
bornOn(Jeff, 09/22/42), gradFrom(Jeff, Columbia), hasAdvisor(Jeff, Arthur), hasAdvisor(Surajit, Jeff), knownFor(Jeff, Theory)
Information Extraction: YAGO/DBpedia et al.
New fact candidates: type(Jeff, Author) [0.9], author(Jeff, Drag_Book) [0.8], author(Jeff, Cind_Book) [0.6], worksAt(Jeff, Bell_Labs) [0.7], type(Jeff, CEO) [0.4]
- More than 120 M facts for YAGO2 (mostly from Wikipedia infoboxes)
- Hundreds of millions of additional facts from Wikipedia text
3
Outline
- Motivation & Problem Setting
  - URDF running example: people graduating from universities
- Efficient MAP Inference
  - MaxSAT solving with soft & hard constraints
- Grounding
  - Deductive grounding of soft rules (SLD resolution)
  - Iterative grounding of hard rules (closure)
- MaxSAT Algorithm
  - MaxSAT algorithm in 3 steps
- Experiments & Future Work
Query-Time Reasoning in Uncertain RDF Knowledge Bases 3
4
URDF: Uncertain RDF Data Model
- Extensional Layer (information extraction & integration)
  - High-confidence facts: existing knowledge base ("ground truth")
  - New fact candidates: extracted facts with confidence values
  - Integration of different knowledge sources: ontology merging or explicit Linked Data (owl:sameAs, owl:equivProp.)
  - Large "uncertain database" of RDF facts
- Intensional Layer (query-time inference)
  - Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
  - Hard rules: consistency constraints (more general FOL rules)
  - Propositional & probabilistic consistency reasoning
6
Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints
- People may live in more than one place:
  livesIn(x,y) ∧ marriedTo(x,z) → livesIn(z,y) [0.8]
  livesIn(x,y) ∧ hasChild(x,z) → livesIn(z,y) [0.5]
  Rule-based (deductive) reasoning: Datalog, RDF/S, OWL2-RL, etc.
- People are not born in different places/on different dates:
  bornIn(x,y) ∧ bornIn(x,z) → y=z
- People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z → disjoint(t1,t2)
  FOL constraints (in particular mutex): Datalog with constraints, X-tuples in probabilistic DBs, owl:FunctionalProperty, etc.
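Grounding a functional-property hard rule such as bornIn(x,y) ∧ bornIn(x,z) → y=z over candidate facts yields mutex sets: candidates that share predicate and subject but differ in object, of which at most one may be true. The sketch below assumes binary facts as (predicate, subject, object) tuples; FUNCTIONAL reuses the five predicates named on the experiments slide, and the candidate facts and the helper name mutex_sets are hypothetical. livesIn is deliberately not functional, since people may live in more than one place.

```python
from collections import defaultdict

# Functional predicates (from the experiments slide): one object per subject.
FUNCTIONAL = {"bornIn", "diedIn", "bornOnDate", "diedOnDate", "marriedTo"}

def mutex_sets(facts):
    """Ground p(x,y) ∧ p(x,z) → y=z for every functional predicate p:
    candidates with the same predicate and subject but different objects
    form one mutex set (at most one of them may be true)."""
    groups = defaultdict(set)
    for pred, subj, obj in facts:
        if pred in FUNCTIONAL:
            groups[(pred, subj)].add((pred, subj, obj))
    # Only groups with competing objects constrain anything.
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical candidate facts (predicate, subject, object).
CANDIDATES = [
    ("bornIn", "Jeff", "Princeton"), ("bornIn", "Jeff", "Stanford"),
    ("bornIn", "Surajit", "Princeton"),
    ("livesIn", "Jeff", "Princeton"), ("livesIn", "Jeff", "Stanford"),  # not functional
]
print(mutex_sets(CANDIDATES))
```

Only the two bornIn candidates for Jeff end up in a mutex set; the two livesIn facts can both be true.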
7
URDF Running Example
KB: RDF base facts [figure: knowledge graph over Jeff, Surajit, David, Stanford and Princeton with edges type [1.0] (Computer Scientist, University), worksAt [0.9], hasAdvisor [0.8] and [0.7], graduatedFrom [0.6], [0.7] and [0.9]]
Derived facts: gradFrom(Surajit, Stanford), gradFrom(David, Stanford) (graduatedFrom [?])
First-Order Rules:
- hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
- graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
8
Basic Types of Inference
- Maximum-A-Posteriori (MAP) Inference
  Find the most likely assignment to the query variables y under given evidence x.
  Compute: $\arg\max_y P(y \mid x)$ (NP-hard for propositional formulas, e.g., MaxSAT over CNFs)
- Marginal/Success Probabilities
  Probability that query y is true in a random world under given evidence x.
  Compute: $P(y \mid x) = \sum_{w \models y} P(w \mid x)$, summing over all possible worlds $w$ in which y holds (#P-hard for propositional formulas)
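The slide leaves the distribution $P(\cdot \mid x)$ open. As an illustration only, assume Markov-logic-style semantics, where a world's probability is proportional to exp of the total weight of the clauses it satisfies and worlds violating a hard rule get probability 0, and take just the two mutex candidates for Surajit with the unit weights from the Step 1 slide. MAP returns one whole world; the marginal sums over every world in which the query fact holds. WEIGHTS, worlds and score are hypothetical names.

```python
import math
from itertools import product

# Unit-clause weights of the two mutex candidates (from the Step 1 slide).
WEIGHTS = {"gradFrom(Surajit,Stanford)": 0.7, "gradFrom(Surajit,Princeton)": 0.6}
FACTS = list(WEIGHTS)

def worlds():
    """Possible worlds that respect the mutex rule: at most one fact is true."""
    for bits in product([False, True], repeat=len(FACTS)):
        if sum(bits) <= 1:
            yield dict(zip(FACTS, bits))

def score(world):
    """Assumed log-linear (Markov-logic-style) weight: exp(total satisfied weight)."""
    return math.exp(sum(w for f, w in WEIGHTS.items() if world[f]))

Z = sum(score(w) for w in worlds())               # normalization constant
map_world = max(worlds(), key=score)              # MAP: argmax over whole worlds
marginal = sum(score(w) for w in worlds()         # P(query | x): sum over worlds
               if w["gradFrom(Surajit,Stanford)"]) / Z
print(map_world, round(marginal, 3))
```

Here the MAP world makes gradFrom(Surajit, Stanford) true even though its marginal probability is only about 0.42, which is why the two kinds of inference are treated separately.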
9
General Route: Grounding & MaxSAT Solving
Query: graduatedFrom(x, y)
CNF (clause, weight):
- (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))  1000
- (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))  1000
- (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))  0.4
- (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))  0.4
- worksAt(Jeff, Stanford)  0.9
- hasAdvisor(Surajit, Jeff)  0.8
- hasAdvisor(David, Jeff)  0.7
- graduatedFrom(Surajit, Princeton)  0.6
- graduatedFrom(Surajit, Stanford)  0.7
- graduatedFrom(David, Princeton)  0.9
1) Grounding: consider only the facts (and rules) that are relevant for answering the query.
2) Propositional formula in CNF, consisting of grounded hard & soft rules and uncertain base facts.
3) Propositional reasoning: find a truth assignment to the facts such that the total weight of the satisfied clauses is maximized.
MAP inference: compute the "most likely" possible world.
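For a grounded query this small, MAP inference can be checked by brute force. The sketch below enumerates all $2^7$ truth assignments of the grounded facts, discards those violating a hard clause (instead of giving the hard clauses weight 1000), and keeps the assignment with the largest total soft-clause weight. It is only a correctness reference, exponential in the number of facts; the short variable names and helper functions are hypothetical.

```python
from itertools import product

haS, haD = "hasAdvisor(Surajit,Jeff)", "hasAdvisor(David,Jeff)"
wa = "worksAt(Jeff,Stanford)"
gSS, gSP = "gradFrom(Surajit,Stanford)", "gradFrom(Surajit,Princeton)"
gDS, gDP = "gradFrom(David,Stanford)", "gradFrom(David,Princeton)"
FACTS = [haS, haD, wa, gSS, gSP, gDS, gDP]

# A literal is (fact, is_positive). Hard clauses are enforced as constraints.
HARD = [[(gSS, False), (gSP, False)], [(gDS, False), (gDP, False)]]
SOFT = [
    (0.4, [(haS, False), (wa, False), (gSS, True)]),
    (0.4, [(haD, False), (wa, False), (gDS, True)]),
    (0.9, [(wa, True)]), (0.8, [(haS, True)]), (0.7, [(haD, True)]),
    (0.6, [(gSP, True)]), (0.7, [(gSS, True)]), (0.9, [(gDP, True)]),
]

def satisfied(lits, world):
    """A clause holds if at least one literal matches the world."""
    return any(world[f] == positive for f, positive in lits)

def soft_weight(world):
    return sum(w for w, lits in SOFT if satisfied(lits, world))

def map_by_enumeration():
    """Exact MAP: scan all 2^n worlds, keep those satisfying every hard clause."""
    worlds = (dict(zip(FACTS, bits)) for bits in product([False, True], repeat=len(FACTS)))
    feasible = (w for w in worlds if all(satisfied(c, w) for c in HARD))
    return max(feasible, key=soft_weight)

best = map_by_enumeration()
print(sorted(f for f, v in best.items() if v), round(soft_weight(best), 2))
```

The optimum sets gradFrom(Surajit, Stanford) and gradFrom(David, Princeton) to true (together with worksAt and both hasAdvisor facts) and reaches a soft weight of 4.4; only the David rule clause (0.4) and gradFrom(Surajit, Princeton) (0.6) remain unsatisfied.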
10
Why are high weights for hard rules not enough?
Consider the following CNF (for $A, B > 0$, $A \gg B$):
- (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))  weight $A$
- graduatedFrom(Surajit, Princeton)  weight $0$
- graduatedFrom(Surajit, Stanford)  weight $B$
The optimal solution has weight $A+B$; the next-best solution has weight $A+0$. Hence the ratio of the optimal to the next-best solution is $(A+B)/A$. In general, any $(1+\epsilon)$-approximation algorithm, with $\epsilon > 0$, may set graduatedFrom(Surajit, Princeton) to true, since $(A+B)/A \to 1$ as $A \to \infty$.
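The argument can be checked numerically. The sketch scores the optimal assignment (Stanford only) and the Princeton-only assignment of the three-clause CNF for growing $A$ with $B = 1$; the function names are hypothetical. As $A$ grows, the ratio falls below $1+\epsilon$ for any fixed $\epsilon > 0$, so an approximation algorithm is allowed to return the wrong fact.

```python
def clause_weight(clauses, world):
    """Total weight of clauses with at least one true literal."""
    return sum(w for w, lits in clauses if any(world[v] == pos for v, pos in lits))

def opt_over_next_best(A, B):
    clauses = [
        (A, [("stanford", False), ("princeton", False)]),  # hard rule, encoded as weight A
        (0.0, [("princeton", True)]),                       # graduatedFrom(Surajit, Princeton)
        (B, [("stanford", True)]),                          # graduatedFrom(Surajit, Stanford)
    ]
    opt = clause_weight(clauses, {"stanford": True, "princeton": False})        # A + B
    next_best = clause_weight(clauses, {"stanford": False, "princeton": True})  # A + 0
    return opt / next_best

for A in (10.0, 1e3, 1e6):
    print(A, opt_over_next_best(A, B=1.0))
```

With $\epsilon = 0.001$, for instance, $A = 10^6$ already makes the Princeton-only assignment an acceptable $(1+\epsilon)$-approximation.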
11
URDF: MaxSAT Solving with Soft & Hard Rules
Find: $\arg\max_y P(y \mid x)$, which resolves to a variant of MaxSAT for propositional formulas.
Special case: Horn clauses as soft rules & mutex constraints as hard rules.
S: Mutex constraints
- { graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
- { graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
- (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))  0.4
- (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))  0.4
- worksAt(Jeff, Stanford)  0.9
- hasAdvisor(Surajit, Jeff)  0.8
- hasAdvisor(David, Jeff)  0.7
- graduatedFrom(Surajit, Princeton)  0.6
- graduatedFrom(Surajit, Stanford)  0.7
- graduatedFrom(David, Princeton)  0.9
MaxSAT Alg.
    Compute $W_0 = \sum_C w(C)\, P(C \text{ is satisfied})$;
    For each hard constraint $S_t$ {
        For each fact $f$ in $S_t$ {
            Compute $W^t_{f+} = \sum_C w(C)\, P(C \text{ is satisfied} \mid f = \text{true})$;
        }
        Compute $W^t_{S-} = \sum_C w(C)\, P(C \text{ is satisfied} \mid S_t = \text{false})$;
        Set to true the single fact $f \in S_t$ with the largest $W^t_{f+}$, or none of them if $W^t_{S-}$ is larger; set all other facts in $S_t$ to false;
        Remove satisfied clauses $C$;
        t++;
    }
Runtime: $O(|S|\,|C|)$
Approximation guarantee of 1/2
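The loop above can be sketched in Python in the style of the method of conditional expectations, which is how we read the potential-function description; this is not URDF's implementation. Assumptions: the slides do not say how Step 1 picks $p_i$, so each fact of a mutex set $S$ gets a uniform $1/(|S|+1)$, leaving the same share for "none"; facts outside every hard rule form singleton partitions; hard rules are enforced by choosing at most one fact per partition rather than by weighted clauses; and the potential is recomputed from scratch instead of removing satisfied clauses, so this sketch does not attain the $O(|S|\,|C|)$ bound. sat_prob, potential and maxsat_mutex are hypothetical names.

```python
# Hard rules are mutex sets ("partitions"); soft rules are weighted clauses.
# A literal is (fact, is_positive); a clause is (weight, [literals]).

def sat_prob(lits, part_of, dist):
    """Exact P(clause satisfied) when each partition independently selects
    one of its facts, or none, with the probabilities in dist[partition]."""
    by_part = {}
    for fact, positive in lits:
        by_part.setdefault(part_of[fact], []).append((fact, positive))
    p_all_false = 1.0
    for pid, plits in by_part.items():
        pos = {f for f, is_pos in plits if is_pos}
        neg = {f for f, is_pos in plits if not is_pos}
        d = dist[pid]  # fact -> probability; "none" keeps the remaining mass
        if len(neg) > 1:
            return 1.0  # two facts of one partition are never both true
        if neg:
            (g,) = neg
            q = 0.0 if g in pos else d.get(g, 0.0)  # all false iff g is chosen
        else:
            q = 1.0 - sum(d.get(f, 0.0) for f in pos)  # all false iff no pos fact chosen
        p_all_false *= q
    return 1.0 - p_all_false

def potential(clauses, part_of, dist):
    """W = sum over clauses C of w(C) * P(C is satisfied)."""
    return sum(w * sat_prob(lits, part_of, dist) for w, lits in clauses)

def maxsat_mutex(clauses, partitions):
    part_of = {f: i for i, facts in enumerate(partitions) for f in facts}
    # Step 1 (assumption): uniform p_i = 1/(|S|+1) for every fact of S.
    dist = [{f: 1.0 / (len(facts) + 1) for f in facts} for facts in partitions]
    w0 = potential(clauses, part_of, dist)
    for t, facts in enumerate(partitions):
        best_w, best_choice = float("-inf"), None
        for choice in [*facts, None]:  # W_{f+} for every f, then W_{S-}
            dist[t] = {choice: 1.0} if choice is not None else {}
            w = potential(clauses, part_of, dist)
            if w > best_w:
                best_w, best_choice = w, choice
        dist[t] = {best_choice: 1.0} if best_choice is not None else {}
    true_facts = {f for d in dist for f in d}
    return true_facts, w0, potential(clauses, part_of, dist)

# Running example (weights from the Step 1 slide; short names are hypothetical).
haS, haD, wa = "hasAdvisor(Surajit,Jeff)", "hasAdvisor(David,Jeff)", "worksAt(Jeff,Stanford)"
gSS, gSP = "gradFrom(Surajit,Stanford)", "gradFrom(Surajit,Princeton)"
gDS, gDP = "gradFrom(David,Stanford)", "gradFrom(David,Princeton)"
clauses = [
    (0.4, [(haS, False), (wa, False), (gSS, True)]),
    (0.4, [(haD, False), (wa, False), (gDS, True)]),
    (0.9, [(wa, True)]), (0.8, [(haS, True)]), (0.7, [(haD, True)]),
    (0.6, [(gSP, True)]), (0.7, [(gSS, True)]), (0.9, [(gDP, True)]),
]
# Two mutex sets, then one singleton partition per unconstrained fact.
partitions = [[gSS, gSP], [gDS, gDP], [wa], [haS], [haD]]
true_facts, w0, w_final = maxsat_mutex(clauses, partitions)
print(sorted(true_facts), round(w0, 3), round(w_final, 3))
```

On the running example this sketch starts from $W_0 = 2.6$, picks gradFrom(Surajit, Stanford) and gradFrom(David, Princeton), and ends at 4.4, the same value as $W_2$ on the Step 3 slide; its intermediate values differ from the slides because its initial probabilities differ.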
12
Deductive Grounding Algorithm (SLD Resolution/Datalog)
Query: graduatedFrom(Surajit, y)
[Figure: SLD proof tree for the query, with leaves graduatedFrom(Surajit, Princeton), graduatedFrom(Surajit, Stanford), and a conjunction (∧) node over hasAdvisor(Surajit, Jeff) and worksAt(Jeff, Stanford) that derives graduatedFrom(Surajit, Stanford)]
First-Order Rules:
- hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
- graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
Base Facts:
- graduatedFrom(Surajit, Princeton) [0.7]
- graduatedFrom(Surajit, Stanford) [0.6]
- graduatedFrom(David, Princeton) [0.9]
- hasAdvisor(Surajit, Jeff) [0.8]
- hasAdvisor(David, Jeff) [0.7]
- worksAt(Jeff, Stanford) [0.9]
- type(Princeton, University) [1.0]
- type(Stanford, University) [1.0]
- type(Jeff, Computer_Scientist) [1.0]
- type(Surajit, Computer_Scientist) [1.0]
- type(David, Computer_Scientist) [1.0]
Grounded Rules:
- hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford) → gradFrom(Surajit, Stanford)
- ¬gradFrom(Surajit, Stanford) ∨ ¬gradFrom(Surajit, Princeton)
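A minimal depth-bounded SLD sketch for this query. Assumptions: atoms are Python tuples, variables are strings starting with "?", only the relevant base facts (without confidences) and the single soft rule are included, and a simple bound on the number of rule applications stands in for the tabling URDF uses to handle recursive rules. All function names are hypothetical. The sketch collects the query answers and the grounded soft rules that derive them.

```python
# Atoms are (predicate, arg1, arg2); strings starting with "?" are variables.
FACTS = [
    ("graduatedFrom", "Surajit", "Princeton"), ("graduatedFrom", "Surajit", "Stanford"),
    ("graduatedFrom", "David", "Princeton"), ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"), ("worksAt", "Jeff", "Stanford"),
]
# Soft rule: hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
RULES = [(("graduatedFrom", "?x", "?z"), [("hasAdvisor", "?x", "?y"), ("worksAt", "?y", "?z")], 0.4)]

def is_var(t):
    return t.startswith("?")

def walk(t, s):
    """Follow bindings in substitution s to a constant or an unbound variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Unify two atoms under s; return the extended substitution or None."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s

def ground(atom, s):
    return tuple([atom[0]] + [walk(t, s) for t in atom[1:]])

def rename(atom, tag):
    return tuple(t + tag if is_var(t) else t for t in atom)

def sld(goals, s, depth, used, out):
    """Resolve the first goal against base facts, then against rule heads
    (at most `depth` rule applications per proof); record each proof."""
    if not goals:
        out.append((s, used))
        return
    goal, rest = goals[0], goals[1:]
    for fact in FACTS:
        s2 = unify(goal, fact, s)
        if s2 is not None:
            sld(rest, s2, depth, used, out)
    if depth == 0:
        return
    for i, (head, body, w) in enumerate(RULES):
        tag = f"_{depth}_{i}"  # rename rule variables apart from the goal's
        new_head, new_body = rename(head, tag), [rename(a, tag) for a in body]
        s2 = unify(goal, new_head, s)
        if s2 is not None:
            sld(new_body + rest, s2, depth - 1, used + [(new_head, new_body, w)], out)

query = ("graduatedFrom", "Surajit", "?y")
proofs = []
sld([query], {}, 1, [], proofs)
answers = {ground(query, s) for s, _ in proofs}
grounded_rules = [(ground(h, s), [ground(a, s) for a in b], w)
                  for s, used in proofs for h, b, w in used]
# All answers share the subject Surajit, so the hard rule grounds to one mutex set.
mutex = sorted(answers) if len(answers) > 1 else []
print(sorted(answers), grounded_rules, mutex)
```

The query has two answers (Princeton and Stanford); Stanford is additionally derived through the grounded soft rule, and the two answers form the mutex set of the grounded hard rule.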
13
Dependency Graph of a Query
SLD grounding always starts from a query literal and first pursues the soft deduction rules. Grounding is then also iterated over the hard rules in a top-down fashion, using the literals in each hard rule as new subqueries. Cycles (due to recursive rules) are detected and resolved via a form of tabling known from Datalog. Grounding terminates when a closure is reached, i.e., when no new facts can be grounded from the rules and all subgoals are either resolved or form the root of a cycle.
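The cycle handling can be illustrated at the predicate level. The sketch below runs a depth-first expansion over a dependency graph built from the livesIn soft rules of the soft-vs-hard-rules slide: a subgoal reached again while it is still being expanded is recorded as a cycle root and not expanded again, so the expansion terminates. This only shows the cycle detection; real tabling also stores and reuses the answers of each subgoal, which this sketch omits. DEPENDS and expand are hypothetical names.

```python
# Head predicate -> body predicates, from the livesIn soft rules
# (livesIn ∧ marriedTo → livesIn, livesIn ∧ hasChild → livesIn).
DEPENDS = {
    "livesIn": ["livesIn", "marriedTo", "hasChild"],
    "marriedTo": [],
    "hasChild": [],
}

def expand(root, graph):
    """DFS over subgoals; returns (expanded subgoals, cycle roots)."""
    on_stack, done, cycle_roots = set(), set(), set()

    def visit(node):
        if node in on_stack:       # reached again while still open: a cycle
            cycle_roots.add(node)
            return
        if node in done:           # already expanded: reuse, do not repeat
            return
        on_stack.add(node)
        for child in graph.get(node, []):
            visit(child)
        on_stack.remove(node)
        done.add(node)

    visit(root)
    return done, cycle_roots

reached, roots = expand("livesIn", DEPENDS)
print(sorted(reached), roots)
```

The recursive livesIn rules make livesIn the root of a cycle, while marriedTo and hasChild are resolved directly.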
14
Weighted MaxSAT Algorithm
General idea: compute a potential function $W_t$ that iterates over all hard rules $S_t$ and sets the fact $f \in S_t$ that maximizes $W_t$ (or none of them) to true; all other facts in $S_t$ are set to false.
- At iteration 0, we have $W_0 = \sum_C w(C)\, P(C \text{ is satisfied})$.
- At any intermediate iteration $t$, we compare $W^t_{f+} = \sum_C w(C)\, P(C \text{ is satisfied} \mid f = \text{true})$ for each $f \in S_t$ with $W^t_{S-} = \sum_C w(C)\, P(C \text{ is satisfied} \mid S_t = \text{false})$.
- At the final iteration $t_{max}$, all facts are assigned either true or false, and $W_{t_{max}}$ equals the total weight of all satisfied clauses.
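Why choosing the maximum never lowers the potential: a short derivation of the standard method-of-conditional-expectations argument, under assumptions the slides do not state explicitly, namely that facts of different partitions are chosen independently and that the probabilities $p_f$ of the facts in each $S_t$ sum to at most 1 (the remaining mass being "none of them").

```latex
\begin{align*}
W_{t-1} &= \sum_{f \in S_t} p_f\, W^{t}_{f+} + \Bigl(1 - \sum_{f \in S_t} p_f\Bigr) W^{t}_{S-}
  \le \max\Bigl(\max_{f \in S_t} W^{t}_{f+},\; W^{t}_{S-}\Bigr) = W_t, \\
W_0 &\le W_1 \le \dots \le W_{t_{\max}}.
\end{align*}
```

The first line holds because $W_{t-1}$ is a convex combination of the candidate values for $S_t$; the chain follows by induction over $t$.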
15
Step 1: Weights $w(f_i)$ and probabilities $p_i$
S: Mutex constraints
- { gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) }
- { gradFrom(David, Stanford), gradFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
- (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford))  0.4
- (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford))  0.4
- worksAt(Jeff, Stanford)  0.9
- hasAdvisor(Surajit, Jeff)  0.8
- hasAdvisor(David, Jeff)  0.7
- gradFrom(Surajit, Princeton)  0.6
- gradFrom(Surajit, Stanford)  0.7
- gradFrom(David, Princeton)  0.9
17
Step 2
$C_1$: ¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)
$P(C_1) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1$
Probability that a literal is true, for a fact in a single partition:
- negated literal: $1 - p_i$
- positive literal: $p_i$
18
Step 2 (cont.)
$P(C_1 \text{ is satisfied}) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1$
$P(C_2 \text{ is satisfied}) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0$
...
$W_0 = 0.4 + 0.9 + 0.8 + 0.7 + 0.6 + 0.7 + 0.9 = 5.0$
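The slide's arithmetic can be reproduced in a few lines. Assumptions: the literals of a clause are treated as independent (each fact in its own partition), and the fact probabilities are the ones implied by the slide's factors: $p = 1$ for both hasAdvisor facts, worksAt(Jeff, Stanford), gradFrom(Surajit, Stanford) and every unit-clause fact counted in $W_0$, and $p = 0$ for gradFrom(David, Stanford). horn_sat_prob is a hypothetical helper name.

```python
from math import prod

def horn_sat_prob(body_ps, head_p):
    """P(¬b1 ∨ ... ∨ ¬bk ∨ h is satisfied) for independent literals: the
    clause fails only if every body fact is true and the head is false.
    Each factor 1 - (1 - p) mirrors the slide's (1-(1-p)) terms."""
    p_fail = prod(1 - (1 - p) for p in body_ps) * (1 - head_p)
    return 1 - p_fail

# C1: hasAdvisor(Surajit,Jeff), worksAt(Jeff,Stanford) -> gradFrom(Surajit,Stanford)
p_c1 = horn_sat_prob([1.0, 1.0], 1.0)
# C2: same body for David; gradFrom(David,Stanford) has p = 0
p_c2 = horn_sat_prob([1.0, 1.0], 0.0)

# (weight, P(clause satisfied)) for C1, C2 and the six unit clauses.
terms = [(0.4, p_c1), (0.4, p_c2), (0.9, 1.0), (0.8, 1.0),
         (0.7, 1.0), (0.6, 1.0), (0.7, 1.0), (0.9, 1.0)]
w0 = sum(w * p for w, p in terms)
print(p_c1, p_c2, round(w0, 2))
```

C2 contributes nothing because its body is certainly true while its head is certainly false, which is why only one 0.4 appears in the slide's sum for $W_0$.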
19
Step 3
$W_1 = 0.4 + 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.8$
$W_2 = 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.4$
20
Experiments – Setup
- YAGO Knowledge Base: 2 million entities, 20 million facts
- Soft Rules: 16 soft rules (hand-crafted deduction rules with weights)
- Hard Rules: 5 predicates with functional properties (bornIn, diedIn, bornOnDate, diedOnDate, marriedTo)
- Queries: 10 conjunctive SPARQL queries
- Markov Logic as competitor (based on MCMC)
  - MAP inference: Alchemy employs a form of MaxWalkSAT
  - MC-SAT: iterative MaxSAT & Gibbs sampling
21
YAGO Knowledge Base: URDF vs. Markov Logic
- URDF: SLD grounding & MaxSAT solving
  - |C|: number of ground literals in soft rules
  - |S|: number of ground literals in hard rules
- URDF vs. Markov Logic (MAP inference & MC-SAT)
  - First run: ground each query against the rules (SLD grounding + MaxSAT solving) & report the sum of runtimes
  - Asymptotic runtime checks: synthetic soft rule expansions
22
Recursive Rules & LUBM Benchmark
- 42 inductively learned (partly recursive) rules over 20 million facts in YAGO
- URDF grounding with different maximum SLD levels
- URDF (SLD grounding + MaxSAT) vs. Jena (only grounding) over the LUBM benchmark:
  - SF-1: 103,397 triples
  - SF-5: 646,128 triples
  - SF-10: 1,316,993 triples
23
Current & Future Topics...
- Temporal consistency reasoning
  - Soft/hard rules with temporal predicates
  - Soft deduction rules: deduce confidence distribution of derived facts
- Learning soft rules & consistency constraints
  - Explore how Inductive Logic Programming can be applied to large, uncertain & incomplete knowledge bases
- More solving/sampling
  - Linear-time constrained & weighted MaxSAT solver
  - Improved Gibbs sampling with soft & hard rules
- Scale-out
  - Distributed grounding via message passing
- Updates/versioning for (linked) RDF data
- Non-monotonic answers for rules with negation!
24
Online Demo!
urdf.mpi-inf.mpg.de