First Order Probabilistic Inference, by David Poole (2003) Rodrigo de Salvo Braz
Bayesian Networks $P(x_1,x_2,x_3,x_4) = P(x_3 \mid x_1,x_2)\,P(x_1)\,P(x_2 \mid x_4)\,P(x_4)$ [Figure: directed network over x1, x2, x3, x4]
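The factorization on this slide can be sketched in a few lines of Python. The conditional probability tables below are hypothetical numbers chosen only for illustration; the slide gives the structure of the product, not the values.

```python
from itertools import product

# Hypothetical CPTs for binary variables (illustrative values, not from the slides).
P_x1 = {True: 0.3, False: 0.7}
P_x4 = {True: 0.6, False: 0.4}
# P_x2_given_x4[x4][x2] = P(x2 | x4)
P_x2_given_x4 = {
    True: {True: 0.9, False: 0.1},
    False: {True: 0.2, False: 0.8},
}
# P_x3_true[(x1, x2)] = P(x3 = True | x1, x2)
P_x3_true = {
    (True, True): 0.95,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.05,
}

def joint(x1, x2, x3, x4):
    """P(x1,x2,x3,x4) = P(x3|x1,x2) P(x1) P(x2|x4) P(x4)."""
    p3 = P_x3_true[(x1, x2)]
    p_x3 = p3 if x3 else 1.0 - p3
    return p_x3 * P_x1[x1] * P_x2_given_x4[x4][x2] * P_x4[x4]

# The joint over all 16 assignments should sum to 1.
total = sum(joint(*values) for values in product([True, False], repeat=4))
```

Because every table is a proper conditional distribution, the product is already normalized, which is what separates this case from the general factor network on the next slide.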
Factor Networks $P(x_1,x_2,x_3,x_4) \propto \phi_1(x_1)\,\phi_2(x_1,x_2,x_3)\,\phi_3(x_2,x_4)\,\phi_4(x_4)$ [Figure: factor network in which each factor node $\phi_1,\dots,\phi_4$ is connected to the variables it mentions]
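Variable Elimination (VE) in Factor Networks $\dots \sum_y \phi_1(z_1,y)\,\phi_2(z_2,y)\,\phi_3(z_3,y)\,\phi_4(z_4,y) \dots$ becomes $\dots \phi_5(z_1,z_2,z_3,z_4) \dots$ [Figure: y is connected to z1, z2, z3, z4 through $\phi_1,\dots,\phi_4$; after eliminating y, a single factor $\phi_5$ connects z1, z2, z3, z4]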
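VE in Factor Networks Query $P(x_2) = ?$ [Figure: factor network with $\phi_1(x_1)$, $\phi_2(x_1,x_2,x_3)$, $\phi_3(x_2,x_4)$, $\phi_4(x_4)$]
VE in Factor Networks Query $P(x_2) = ?$ [Figure: x4 eliminated; $\phi_3$ and $\phi_4$ are replaced by $\phi_5(x_2)$]
VE in Factor Networks Query $P(x_2) = ?$ [Figure: x1 eliminated; $\phi_1$ and $\phi_2$ are replaced by $\phi_6(x_2,x_3)$]
VE in Factor Networks $P(x_2) \propto \phi_5(x_2)\,\phi_7(x_2)$ [Figure: x3 eliminated; $\phi_6$ is replaced by $\phi_7(x_2)$]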
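The elimination sequence on these slides (x4, then x1, then x3) can be sketched as plain Python over binary variables. The `Factor` class, the helper names and the factor values are assumptions made for this illustration; only the factor scopes and the elimination order come from the slides.

```python
from itertools import product

class Factor:
    """A table over binary variables: maps value tuples to non-negative numbers."""
    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)

def make(variables, fn):
    """Build a factor by evaluating fn on every 0/1 assignment."""
    return Factor(variables, {v: fn(*v) for v in product([0, 1], repeat=len(variables))})

def multiply(f, g):
    """Pointwise product over the union of the two scopes."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for values in product([0, 1], repeat=len(variables)):
        a = dict(zip(variables, values))
        fv = f.table[tuple(a[v] for v in f.variables)]
        gv = g.table[tuple(a[v] for v in g.variables)]
        table[values] = fv * gv
    return Factor(variables, table)

def sum_out(f, var):
    """Marginalize var out of f."""
    i = f.variables.index(var)
    table = {}
    for values, p in f.table.items():
        key = values[:i] + values[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(f.variables[:i] + f.variables[i + 1:], table)

def eliminate(factors, var):
    """Multiply all factors mentioning var, sum var out, keep the rest untouched."""
    touching = [f for f in factors if var in f.variables]
    rest = [f for f in factors if var not in f.variables]
    prod = touching[0]
    for f in touching[1:]:
        prod = multiply(prod, f)
    return rest + [sum_out(prod, var)]

# Hypothetical factor values; the scopes follow the slides.
phi1 = make(["x1"], lambda x1: 1.0 + x1)
phi2 = make(["x1", "x2", "x3"], lambda x1, x2, x3: 1.0 + x1 + x3 + 2.0 * x2 * x3)
phi3 = make(["x2", "x4"], lambda x2, x4: 1.0 + x4 + 3.0 * x2 * x4)
phi4 = make(["x4"], lambda x4: 2.0 - x4)

factors = [phi1, phi2, phi3, phi4]
for var in ["x4", "x1", "x3"]:          # yields phi5(x2), phi6(x2,x3), phi7(x2)
    factors = eliminate(factors, var)

result = factors[0]
for f in factors[1:]:
    result = multiply(result, f)          # phi5(x2) * phi7(x2)
z = sum(result.table.values())
p_x2 = {values[0]: p / z for values, p in result.table.items()}
```

Each call to `eliminate` touches only the factors that mention the variable being removed, which is the property the lifted version later tries to preserve at the level of parfactors.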
First Order Probabilistic Inference If a person P has been in a country C and C speaks a language L, then P speaks L with probability 0.4; otherwise P speaks L with probability 0.01.
[Figure: parfactor connecting h_b(P,C), cn_spks(C,L) and spks(P,L), with a potential table over h_b, cn_spks and spks (entries shown: T 0.4, F 0.6, 0.01, 0.5)]
First Order Probabilistic Inference domain: nutland, crazynia, john, mary, french. P(john has been to nutland | john speaks french, mary has been to nutland, mary speaks french) = ?
[Figure: ground network over h_b(john,nutland), h_b(mary,nutland), h_b(john,france), h_b(mary,france), cn_spks(nutland,french), cn_spks(crazynia,french), spks(john,french), spks(mary,french)]
First Order Probabilistic Inference Full grounding is infeasible: the ground network grows quickly with the size of the domain. We may know that there are, e.g., 200 countries and 300 languages, without knowing any difference between them.
[Figure: ground network for john, repeating h_b(john,c1), ..., h_b(john,c100), cn_spks(c1,l1), ..., cn_spks(c100,l200), spks(john,l1), ..., spks(john,l200)]
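A rough count shows how fast the ground network grows. The function below is a sketch that assumes one ground random variable per instance of h_b(P,C), cn_spks(C,L) and spks(P,L), and one ground factor per binding of (P, C, L). The two people are john and mary from the earlier slide; the counts of countries and languages are the ones quoted here.

```python
def ground_sizes(n_people, n_countries, n_languages):
    """Count ground random variables and ground factors for the language rule."""
    variables = (
        n_people * n_countries        # h_b(P, C)
        + n_countries * n_languages   # cn_spks(C, L)
        + n_people * n_languages      # spks(P, L)
    )
    factors = n_people * n_countries * n_languages  # one per binding of (P, C, L)
    return variables, factors

variables, factors = ground_sizes(n_people=2, n_countries=200, n_languages=300)
```

The factor count is a product of the domain sizes, so adding people, countries or languages multiplies the size of the ground network.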
First Order Probabilistic Inference Domain U = {a, b, ..., z, a1, ...}. p(a), p(b), ..., q(a,b), ... are random variables. X, Y are logical variables. A factor on p(X), q(X,Y) represents infinitely many ground factors, one for each binding of X and Y. [Figure: grid of ground factors pairing p(a), p(b), ..., p(z) with q(a,a), q(b,a), ..., q(z,a), q(a,b), ..., q(z,z), q(a,a1), ...]
Implicit representation Keep it implicit: the parameterized factor (parfactor) on p(X), q(X,Y) with constraints $\{X \in U, Y \in U\}$. [Figure: the ground factors on p(a), q(a,a), q(a,b), ...; p(b), q(b,a), q(b,b), ...; p(z), q(z,a), q(z,b), ... collapse into this single parfactor]
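Implicit representation With Constraints [Figure: the ground factors on p(a) and q(a,a), q(a,b), q(a,c), ..., q(a,z), q(a,a1), ... are written either as a parfactor on p(a), q(a,Y) with constraints $\{Y \in U\}$ or as a parfactor on p(X), q(X,Y) with constraints $\{X = a, Y \in U\}$]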
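One way to read a parfactor is as a recipe for its ground instances: enumerate the bindings of its logical variables and keep those that satisfy the constraints. The sketch below assumes a small finite domain and represents constraints as a Python predicate; both are simplifications for illustration, since the slides allow an infinite domain and keep constraints symbolic.

```python
from itertools import product

def ground_parfactor(logvars, domain, constraint):
    """Yield each substitution of logvars over domain that satisfies constraint."""
    for values in product(domain, repeat=len(logvars)):
        theta = dict(zip(logvars, values))
        if constraint(theta):
            yield theta

U = ["a", "b", "c"]  # hypothetical finite stand-in for the domain U

# Parfactor on p(X), q(X,Y) with {X in U, Y in U}: every binding is allowed.
unconstrained = list(ground_parfactor(["X", "Y"], U, lambda t: True))

# Same parfactor with {X = a, Y in U}: only bindings with X = a remain.
x_is_a = list(ground_parfactor(["X", "Y"], U, lambda t: t["X"] == "a"))

ground_atoms = [(f"p({t['X']})", f"q({t['X']},{t['Y']})") for t in x_is_a]
```

With three constants the unconstrained parfactor stands for nine ground factors and the constrained one for three, all sharing one potential table.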
Variable Elimination (VE) Example Parfactors: $\phi_1(t(W), p(W))\ \{W \neq b\}$; $\phi_2(p(V), s(V))\ \{V \neq c\}$; $\phi_3(q(X,Y), p(Y))\ \{\}$. We want to eliminate the variables with predicate p.
VE Example – Grounding What are the common variables on p? Common variables: $p(Y')\ \{Y' \neq b, Y' \neq c\}$ [Figure: ground view with $\phi_1$ on t(w), p(w) for every $w \neq b$, $\phi_2$ on p(v), s(v) for every $v \neq c$, and $\phi_3$ on q(x,y), p(y) for all x, y]
VE Example – Common variables Unification($p(Y)$, $p(W)\{W \neq b\}$, $p(V)\{V \neq c\}$) = $p(Y')\ \{Y = V = W = Y', Y' \neq b, Y' \neq c\}$
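Over a finite domain, the effect of this unification can be checked extensionally: the common instances of p are the constants allowed by all three parfactors at once. The four-constant domain below is an assumption for illustration, and the set intersection stands in for the symbolic unification of constraints shown on the slide.

```python
U = {"a", "b", "c", "d"}      # hypothetical finite domain

allowed_Y = set(U)            # p(Y) with constraints {}
allowed_W = U - {"b"}         # p(W) with constraints {W != b}
allowed_V = U - {"c"}         # p(V) with constraints {V != c}

# Instances of p shared by all three parfactors: Y' != b and Y' != c.
common = allowed_Y & allowed_W & allowed_V
```

The intersection is exactly the set of constants other than b and c, matching the unified constraints on Y'.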
VE Example – Unified parfactors only $\phi_1(t(Y'), p(Y'))$, $\phi_2(p(Y'), s(Y'))$, $\phi_3(q(X,Y'), p(Y'))$, each with constraints $\{Y' \neq b, Y' \neq c\}$. What about the remaining bindings?
VE Example – Splitting parfactors Original $-$ Unification $=$ Residual. For $\phi_1(t(W), p(W))$: $\{W \neq b\} - \{W \neq b, W \neq c\} = \{W = c\}$
VE Example – Splitting parfactors Original $-$ Unification $=$ Residual. For $\phi_2(p(V), s(V))$: $\{V \neq c\} - \{V \neq b, V \neq c\} = \{V = b\}$
VE Example – Splitting parfactors Original $-$ Unification $=$ Residual. For $\phi_3(q(X,Y), p(Y))$: $\{\} - \{Y \neq b, Y \neq c\} = \{Y \in \{b,c\}\}$
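The three splits can be reproduced with set differences over a small finite domain. This is a sketch under that assumption; `split` and the four-constant domain are hypothetical names and values, and the slides express the same operation symbolically on constraints.

```python
def split(allowed, unified):
    """Return (unified part, residual part) of a parfactor's allowed bindings."""
    return allowed & unified, allowed - unified

U = {"a", "b", "c", "d"}       # hypothetical finite domain
unified = U - {"b", "c"}       # bindings with Y' != b and Y' != c

parfactors = {
    "phi1": U - {"b"},         # {W != b}
    "phi2": U - {"c"},         # {V != c}
    "phi3": set(U),            # {}
}

# Residual bindings left over after taking out the unified part.
residuals = {name: split(allowed, unified)[1] for name, allowed in parfactors.items()}
```

The residuals come out as {c} for $\phi_1$, {b} for $\phi_2$ and {b, c} for $\phi_3$, the same constraints as on the three splitting slides.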
VE Example – Grounded view For each Y (shown for Y = a and Y = d): $\phi_1$ on t(y), p(y); $\phi_2$ on p(y), s(y); $\phi_3$ on q(a,y), p(y), on q(b,y), p(y), ...
VE Example – Grounded view For each X, Y: one combined factor $\phi_1^{1/|X|}\,\phi_2^{1/|X|}\,\phi_3$ on t(y), q(x,y), p(y), s(y).
VE Example – Grounded view For each X, Y, summing out p(y): $\phi_4 = \sum_{p(Y)} \phi_3\,\phi_1^{1/|X|}\,\phi_2^{1/|X|}$ on t(y), q(x,y), s(y).
VE Example – Grounded view The ground factors $\phi_4$ are the instances of the parfactor $\phi_4(t(Y'), q(X,Y'), s(Y'))\ \{Y' \neq b, Y' \neq c\}$. Emphasize that this is almost what intuition says, but not quite.
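The exponents $1/|X|$ appear because $\phi_1$ and $\phi_2$ do not mention X, yet they are copied once for every binding of X when they are merged with $\phi_3$. The sketch below checks only this reweighting step: multiplying the $|X|$ copies, each raised to $1/|X|$, gives back the original value. The domain and the table for $\phi_1$ are hypothetical, and the check does not cover the summation over p.

```python
import math

domain_x = ["a", "b", "c", "d"]   # hypothetical bindings for X

# Hypothetical potential phi1(t(y), p(y)) for one fixed y.
phi1 = {
    (False, False): 0.9,
    (False, True): 0.1,
    (True, False): 0.3,
    (True, True): 0.7,
}

def product_of_copies(t_value, p_value):
    """Multiply one copy of phi1^(1/|X|) per binding of X."""
    exponent = 1.0 / len(domain_x)
    return math.prod(phi1[(t_value, p_value)] ** exponent for _ in domain_x)

recovered = {key: product_of_copies(*key) for key in phi1}
```

Each entry of `recovered` agrees with the matching entry of `phi1` up to floating-point error, so spreading a factor over the bindings of X in this way leaves the overall product unchanged.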
VE Example – Lifted view $\phi_1(t(Y'), p(Y'))$, $\phi_2(p(Y'), s(Y'))$, $\phi_3(q(X,Y'), p(Y'))$, each with constraints $\{Y' \neq b, Y' \neq c\}$.
VE Example – Lifted view The three parfactors combine into $\phi_1^{1/|X|}\,\phi_2^{1/|X|}\,\phi_3$ on t(Y'), p(Y'), s(Y'), q(X,Y') with constraints $\{Y' \neq b, Y' \neq c\}$.
VE Example – Lifted view Eliminating p gives $\sum_{p(Y)} \phi_3\,\phi_1^{1/|X|}\,\phi_2^{1/|X|}$ on t(Y'), q(X,Y'), s(Y') with constraints $\{Y' \neq b, Y' \neq c\}$.
VE Example 2 – Losing logic variables $\phi_1(q(X), p(Y))\ \{\}$; $\phi_2(p(Y), s(Z))\ \{\}$. No constraints, so the parfactors are already unified; the logical variable Y will disappear.
VE Example 2 – Grounded view For each Y: $\phi_1$ on q(a), p(y), on q(b), p(y), ...; $\phi_2$ on p(y), s(a), on p(y), s(b), ...
VE Example 2 – Grounded view For each X, Y, Z: one combined factor $\phi_1^{1/|Z|}\,\phi_2^{1/|X|}$ on q(x), p(y), s(z).
VE Example 2 – Grounded view For each X, Y, Z, summing out p(y): $\sum_{p(Y)} \phi_1^{1/|Z|}\,\phi_2^{1/|X|}$ on q(x), s(z), with |Y| repetitions of each such factor.
VE Example 2 – Grounded view The |Y| repetitions multiply into $\left(\sum_{p(Y)} \phi_1^{1/|Z|}\,\phi_2^{1/|X|}\right)^{|Y|}$ on q(X), s(Z) with constraints $\{\}$.
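VE Example 2 – Lifted view $\phi_1(q(X), p(Y))\ \{\}$ and $\phi_2(p(Y), s(Z))\ \{\}$ become $\left(\sum_{p(Y)} \phi_1^{1/|Z|}\,\phi_2^{1/|X|}\right)^{|Y|}$ on q(X), s(Z) with constraints $\{\}$.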
General Case Parfactors $\phi_i(q_i(X_i,Z), p(Z,Y))\ \{C\}$, $i = 1,\dots,n$. Eliminating $p(Z,Y)$ gives a parfactor on $q_1(X_1,Z),\dots,q_n(X_n,Z)$ with constraints $\{C\}$: $\phi(q_1(X_1,Z),\dots,q_n(X_n,Z)) = \sum_{p(Z,Y)} \prod_i \phi_i(X_i,Z,Y)^{|C(Y)|/|C(X \setminus X_i)|}$
Conclusions Clear semantics; probabilistic first-order inference without grounding; parfactors are specialized only as needed, by splitting them; problem: no functions, but this can be fixed.
The End
General Case
Parfactors $\phi_i(q_i(X_i,Z), p(Z,Y))\ \{C\}$, $i = 1,\dots,n$
Same set of constraints C for all of them
$X_i$: logic variables unique to $\phi_i$
$p(Z,Y)$: literal being eliminated
$q_i(X_i,Z)$: combination of all other literals in $\phi_i$
Y: logic variables unique to p
Z: logic variables in both p and all $q_i$'s
X: sequence $X_1,\dots,X_n$
$\phi(q_1(X_1,Z),\dots,q_n(X_n,Z))\ \{C\} = \sum_{p(Z,Y)} \prod_i \phi_i(X_i,Z,Y)^{|C(Y)|/|C(X \setminus X_i)|}$
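Variable Elimination Same Bindings [Figure: a parfactor on p(X), q(X,Y) and a parfactor on q(X,Y), r(Y), both with constraints $\{X \in U, Y \in U\}$; the result is a parfactor on p(X), r(Y) with the same constraints]
Variable Elimination Different Bindings, Same Variables [Figure: a parfactor on p(X), q(X,Y) with constraints $\{X \in U, Y \in U\}$ and a parfactor on q(X,Y), r(Y) with constraints $\{X = a, Y \in U\}$] No one-to-one correspondence!
Splitting [Figure: the parfactor on p(X), q(X,Y) with constraints $\{X \in U, Y \in U\}$ is split into one with $\{X = a, Y \in U\}$ and one with $\{X \neq a, Y \in U\}$]
Variable Elimination After Splitting [Figure: the parfactor on p(X), q(X,Y) and the parfactor on q(X,Y), r(Y) now both have constraints $\{X = a, Y \in U\}$; the result is a parfactor on p(X), r(Y) with the same constraints]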
Probabilistic Propositional Logic 0.2: q 0.8: p q 0.3: q r 1: (p q) r p q 0.2 q p p
Probabilistic Propositional Logic 0.4 if t, 0.5 otherwise noisy-or of (3) and (4) 1 if p s and q 1 if p s and q 0.3 if q and (p s) 0.5 if q and (p s) t r 0.2 q (1) 0.2: q (2) 0.8: p q (3) 0.3: q r (4) 1: (p s) r (5) 0.4: t r 0.8 if p, 0.5 otherwise p s