
1 First-Order Probabilistic Inference Rodrigo de Salvo Braz

2 A remark: Feel free to ask clarification questions!

3 Slides online: just search for "Rodrigo de Salvo Braz" and check the presentations page (www.cs.berkeley.edu/~braz).

4 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

5 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

6 AI Problems [diagram]: solving commonsense reasoning, natural language processing, planning, vision, ... each pairs problem-specific knowledge (domain knowledge, language knowledge, domain and planning knowledge, objects and optics knowledge) with inference and learning; the inference and learning component can be abstracted out and shared across all of them.

7 Inference and Learning: what capabilities should it have? It should represent and use: predicates (relations) and quantification over objects, e.g. ∀X tank(X) ∨ plane(X), ∀X tank(X) ⇒ color(X)=green, color(o121) = red; varying degrees of uncertainty, e.g. ∀X { tank(X) ⇒ color(X)=green }, since it may hold often but not absolutely (essential for language, vision, pattern recognition, commonsense reasoning, etc.); modal knowledge, utilities, and more.

8 Abstracting Inference and Learning: Commonsense reasoning uses domain knowledge: tank(X) ∨ plane(X), { tank(X) ⇒ color(X)=green }, color(o121) = red, ... Natural Language Processing uses language knowledge: { Sentence(S) ∧ Verb(S,"attacks") ∧ Subject(S,X) ∧ Object(S,Y) ⇒ attacking(X,Y) }, ... Both share the same inference and learning module.

9 Abstracting Inference and Learning: Commonsense reasoning WITH Natural Language Processing. The domain knowledge (tank(X) ∨ plane(X), { tank(X) ⇒ color(X)=green }, color(o121) = red, ...) and the language knowledge ({ Sentence(S) ∧ Verb(S,"attacks") ∧ Subject(S,X) ∧ Object(S,Y) ⇒ attacking(X,Y) }, ...) feed the same inference and learning module, connected by facts such as Verb(s13,"attacks"), Subject(s13,o121), ...; the joint solution is made simpler.

10 Standard solutions fall short. Logic has objects and properties, but its statements are absolute. Graphical models and machine learning have varying degrees of uncertainty, but they are propositional: treating objects and quantification is awkward.

11 Standard solutions fall short: graphical models and objects. Grounding tank(X) ∨ plane(X), { tank(X) ⇒ color(X)=green }, { next(X,Y) ∧ tank(X) ⇒ tank(Y) }, color(o121)=red, color(o135)=green yields a network over color(o121), tank(o121), plane(o121), next(o121, o135), tank(o135), plane(o135), color(o135). The transformation (propositionalization) is not part of the graphical model framework; graphical models have only random variables, not objects; different data creates a distinct, formally unrelated model.

12 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

13 First-Order Probabilistic Inference. Many languages have been proposed: Knowledge-based model construction (Breese, 1992); Probabilistic Logic Programming (Ng & Subrahmanian, 1992); Probabilistic Logic Programming (Ngo and Haddawy, 1995); Probabilistic Relational Models (Friedman et al., 1999); Relational Dependency Networks (Neville & Jensen, 2004); Bayesian Logic (BLOG) (Milch & Russell, 2004); Bayesian Logic Programs (Kersting & De Raedt, 2001); DAPER (Heckerman, Meek & Koller, 2007); Markov Logic Networks (Richardson & Domingos, 2004). "Smart" propositionalization is the main inference method.

14 Propositionalization: Bayesian Logic Programs use Prolog-like statements (with | instead of :-) such as rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier). and wounded(Soldier) | in(Soldier,Battalion), attacked(Battalion). Each statement is associated with a CPT and a combination rule.

15 Propositionalization: rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier). (plus its CPT and combination rule) is grounded, via an and-or tree, into a Bayesian network over rescue(bat13), in(john,bat13), wounded(john), in(peter,bat13), wounded(peter), ...
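
To make the grounding step concrete, here is a minimal Python sketch (not from the talk) of the kind of CPT such a grounded rule induces over rescue(bat13), assuming for illustration a noisy-OR combination rule, a two-soldier domain, and made-up probabilities:

# Hypothetical CPT for one ground instance of the rule:
# P(rescue(bat13) = true | in(s, bat13), wounded(s)).  Numbers are made up.
def rule_cpt(in_bat, wounded):
    return 0.9 if (in_bat and wounded) else 0.05

soldiers = ["john", "peter"]

# Noisy-OR combination rule: rescue(bat13) fails only if every rule instance
# independently fails to cause it.
def p_rescue(assignment):
    p_no_cause = 1.0
    for s in soldiers:
        p_cause = rule_cpt(assignment["in(%s,bat13)" % s], assignment["wounded(%s)" % s])
        p_no_cause *= 1.0 - p_cause
    return 1.0 - p_no_cause

assignment = {"in(john,bat13)": True, "wounded(john)": True,
              "in(peter,bat13)": True, "wounded(peter)": False}
print(p_rescue(assignment))   # 1 - (1-0.9)*(1-0.05) = 0.905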

16 Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004) are built from first-order fragments, e.g. in(Soldier,Battalion) and wounded(Soldier) as parents of rescue(Battalion), and in(Soldier,Battalion) and attacked(Battalion) as parents of wounded(Soldier).

17 Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004): the first-order fragments are instantiated, e.g. in(Soldier,Battalion), wounded(Soldier) → rescue(Battalion) becomes in(john,bat13), wounded(john) → rescue(bat13); in(peter,bat13), wounded(peter) → rescue(bat13); ...

18 Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004): the instantiated fragments in(john,bat13), wounded(john) → rescue(bat13); in(peter,bat13), wounded(peter) → rescue(bat13); ... are combined into a single ground network over rescue(bat13) and its parents.

19 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

20 Lifted Inference: reason directly with rescue(Battalion) ⇐ in(Soldier,Battalion) ∧ wounded(Soldier) and { wounded(Soldier) ⇐ in(Soldier,Battalion) ∧ attacked(Battalion) }, answering a query on rescue(bat13) over wounded(Soldier), attacked(bat13), in(Soldier, bat13) with Soldier left as an unbound, general variable. Faster, more compact, more intuitive, and higher level: more information and structure is available for optimization.

21 Bayesian Networks (directed): epidemic is the parent of sick_john, sick_mary, and sick_bob, with CPTs P(epidemic), P(sick_john | epidemic), P(sick_mary | epidemic), P(sick_bob | epidemic); P(sick_john, sick_mary, sick_bob, epidemic) = P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

22 Factor Networks (undirected): nodes epidemic_france, epidemic_belgium, epidemic_uk, epidemic_germany; P(epi_france, epi_belgium, epi_uk, epi_germany) ∝ φ1(epi_france, epi_belgium) * φ2(epi_france, epi_uk) * φ3(epi_france, epi_germany) * φ4(epi_belgium, epi_germany), where each φi is a factor, or potential function.
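
As a concrete (hypothetical) illustration of what the proportionality above means, a short Python sketch with a made-up potential:

from itertools import product

# Illustrative pairwise potential: higher value when the two neighboring
# countries agree on whether there is an epidemic.
def phi(a, b):
    return 2.0 if a == b else 1.0

variables = ["epi_france", "epi_belgium", "epi_uk", "epi_germany"]
edges = [("epi_france", "epi_belgium"), ("epi_france", "epi_uk"),
         ("epi_france", "epi_germany"), ("epi_belgium", "epi_germany")]

# The unnormalized joint is the product of the edge potentials.
def unnormalized(assignment):
    value = 1.0
    for a, b in edges:
        value *= phi(assignment[a], assignment[b])
    return value

# Normalizing constant Z: sum over all joint assignments.
assignments = [dict(zip(variables, values))
               for values in product([False, True], repeat=len(variables))]
Z = sum(unnormalized(a) for a in assignments)
print(unnormalized({v: True for v in variables}) / Z)   # P(all four have an epidemic)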

23 Bayesian Nets as Factor Networks: P(sick_john, sick_mary, sick_bob, epidemic) ∝ P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic); each CPT is treated as a factor over epidemic and one of sick_john, sick_mary, sick_bob.

24 Inference: Marginalization. P(sick_john) ∝ Σ_epidemic Σ_sick_mary Σ_sick_bob P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

25 Inference: Variable Elimination (VE). P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * [Σ_sick_mary P(sick_mary | epidemic)] * [Σ_sick_bob P(sick_bob | epidemic)]

26 Inference: Variable Elimination (VE). Eliminating sick_bob yields a new factor φ1(epidemic) = Σ_sick_bob P(sick_bob | epidemic), so P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * [Σ_sick_mary P(sick_mary | epidemic)] * φ1(epidemic)

27 Inference: Variable Elimination (VE). P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ1(epidemic) * [Σ_sick_mary P(sick_mary | epidemic)]

28 Inference: Variable Elimination (VE). Eliminating sick_mary yields φ2(epidemic) = Σ_sick_mary P(sick_mary | epidemic), so P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ1(epidemic) * φ2(epidemic)

29 Inference: Variable Elimination (VE). Finally, eliminating epidemic yields φ3(sick_john), and P(sick_john) ∝ φ3(sick_john).
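
A minimal Python sketch of this elimination, with made-up CPT numbers, checking that eliminating sick_mary and sick_bob first gives the same P(sick_john) as brute-force summation:

from itertools import product

# Made-up CPTs (not from the talk).
p_epidemic = {True: 0.1, False: 0.9}
p_sick_true = {True: 0.7, False: 0.05}   # P(sick_x = true | epidemic)

def p_sick(epidemic, sick):
    return p_sick_true[epidemic] if sick else 1.0 - p_sick_true[epidemic]

# Brute-force marginalization: sum over all variables except sick_john.
def brute_force(sick_john):
    total = 0.0
    for epidemic, sick_mary, sick_bob in product([False, True], repeat=3):
        total += (p_epidemic[epidemic] * p_sick(epidemic, sick_john)
                  * p_sick(epidemic, sick_mary) * p_sick(epidemic, sick_bob))
    return total

# Variable elimination: eliminate sick_bob and sick_mary first, producing
# factors phi1(epidemic) and phi2(epidemic), then eliminate epidemic.
def variable_elimination(sick_john):
    phi1 = {e: sum(p_sick(e, s) for s in (False, True)) for e in (False, True)}
    phi2 = {e: sum(p_sick(e, s) for s in (False, True)) for e in (False, True)}
    return sum(p_epidemic[e] * p_sick(e, sick_john) * phi1[e] * phi2[e]
               for e in (False, True))

print(brute_force(True), variable_elimination(True))   # the two agree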

30 A Factor Network [diagram]: nodes sick(mary,measles), sick(mary,flu), hospital(mary), sick(bob,measles), sick(bob,flu), hospital(bob), epidemic(measles), epidemic(flu), ..., connected by many repeated factors.

31 First-order Representation: sick(P,D), hospital(P), epidemic(D). Atoms represent a set of random variables; logical variables parameterize random variables; parfactors are parameterized factors; constraints on logical variables, e.g. P ≠ john.

32 Semantics: the first-order representation stands for the ground factor network over sick(mary,measles), sick(mary,flu), hospital(mary), sick(bob,measles), sick(bob,flu), hospital(bob), epidemic(measles), epidemic(flu), ..., with one ground factor per instantiation, e.g. φ(sick(mary,measles), epidemic(measles)).

33 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

34 Inversion Elimination (IE): eliminating sick(P, D) directly from the parfactor φ(epidemic(D), sick(P,D)) with constraint D ≠ measles, i.e. computing Σ_sick(P,D) φ(epidemic(D), sick(P,D)), gives the same result as the long route of grounding (..., sick(john, flu) with epidemic(flu), sick(mary, rubella) with epidemic(rubella), ...), running VE on the ground factors, and abstracting back to a parfactor on epidemic(D), D ≠ measles.
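
A small Python sketch (with an illustrative potential table, not from the talk) of why inversion elimination matches grounding followed by VE here: summing out every ground sick(P, d) factors into one sum per person, so the grounded result is the lifted one-instance sum raised to the number of people.

from itertools import product

# Illustrative parfactor phi(epidemic(D), sick(P, D)) as a table.
phi = {(True, True): 0.7, (True, False): 0.3,
       (False, True): 0.1, (False, False): 0.9}

people = ["p1", "p2", "p3"]   # groundings of the logical variable P
n = len(people)

# Grounded elimination for a fixed disease d: sum over all joint assignments
# to sick(p1, d), ..., sick(pn, d) of the product of the ground factors.
def grounded(epidemic_value):
    total = 0.0
    for sick_values in product([False, True], repeat=n):
        prod = 1.0
        for s in sick_values:
            prod *= phi[(epidemic_value, s)]
        total += prod
    return total

# Inversion elimination: eliminate sick(P, d) once at the first-order level,
# then account for the n independent instances by exponentiation.
def lifted(epidemic_value):
    return sum(phi[(epidemic_value, s)] for s in (False, True)) ** n

for e in (False, True):
    print(grounded(e), lifted(e))   # identical values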

35 Inversion Elimination (IE): proposed by Poole (2003); it lacked formalization and did not identify the cases in which it is incorrect; we formalized it and identified correctness conditions (with Dan Roth and Eyal Amir).

36 Inversion Elimination - Limitations: IE requires the eliminated RVs to occur in separate instances of the parfactor. For φ(sick(mary, D), epidemic(D)) with D ≠ measles, each ground RV sick(mary, flu), sick(mary, rubella), ... occurs in its own ground factor with epidemic(flu), epidemic(rubella), ..., so Inversion Elimination is correct.

37 Inversion Elimination - Limitations: IE is not correct here. For the parfactor on epidemic(D1), epidemic(D2) (and month) with D1 ≠ D2, the ground RVs epidemic(measles), epidemic(flu), epidemic(rubella), ... each occur in many ground factors together, so the eliminated RVs do not occur in separate instances of the parfactor.

38 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

39 Counting Elimination: we need to consider joint assignments to the ground RVs, and there is an exponential number of those; but the potential actually depends only on the histogram of values in the assignment (00101 is the same as 11000), so a polynomial number of histograms suffices. Example: the parfactor on epidemic(D1), epidemic(D2) (and month) with D1 ≠ D2, grounded to epidemic(measles), epidemic(flu), epidemic(rubella), ...
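
A Python sketch of the counting idea on this example, with an illustrative potential table: the product over all ordered pairs of distinct diseases depends only on how many epidemic(D) values are true, so we can sum over histograms, weighting each by the number of assignments that realize it.

from itertools import product
from math import comb

# Illustrative parfactor phi(epidemic(D1), epidemic(D2)), D1 != D2.
phi = {(True, True): 1.5, (True, False): 0.5,
       (False, True): 0.5, (False, False): 1.2}

n = 6   # number of diseases

# Naive elimination: sum over all 2^n joint assignments.
def naive():
    total = 0.0
    for assignment in product([False, True], repeat=n):
        prod = 1.0
        for i in range(n):
            for j in range(n):
                if i != j:
                    prod *= phi[(assignment[i], assignment[j])]
        total += prod
    return total

# Counting elimination: the product depends only on k, the number of true
# values, so sum over the n+1 histograms (k, n-k), each weighted by the
# number of assignments with that histogram, C(n, k).
def counting():
    total = 0.0
    for k in range(n + 1):
        prod = (phi[(True, True)] ** (k * (k - 1))
                * phi[(False, False)] ** ((n - k) * (n - k - 1))
                * phi[(True, False)] ** (k * (n - k))
                * phi[(False, True)] ** ((n - k) * k))
        total += comb(n, k) * prod
    return total

print(naive(), counting())   # identical values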

40 A Simple Experiment [results figure]

41 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

42 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

43 BLOG (Bayesian LOGic): Milch & Russell (2004); a probabilistic logic language; current inference is propositional sampling; special characteristics: open universe and an expressive language.

44 BLOG (Bayesian LOGic): an open-universe, expressive language. Example model:
type Battalion;
#Battalion ~ Uniform(1,10);
random Boolean Large(Battalion bat) ~ Bernoulli(0.6);
random NaturalNum Region(Battalion bat) ~ TabularCPD[[0.3, 0.4, 0.2, 0.1]]();
type Soldier;
#Soldier(BattalionOf = Battalion bat)
  if Large(bat) then ~ Poisson(1500);
  else if Region(bat) = 2 then = 300;
  else ~ Poisson(500);
query Average( {#Soldier(b) for Battalion b} );

45 Inference in BLOG:
random Boolean Attacked(bat) ~ Bernoulli(0.7);
random Boolean Wounded(Soldier sol)
  if Attacked(BattalionOf(sol)) then
    if Experienced(sol) then ~ Bernoulli(0.1);
    else ~ Bernoulli(0.4);
  else = False;
random Boolean Experienced(Soldier sol) ~ Bernoulli(0.1);
random Boolean Rescue(Battalion bat) =
  exists Soldier sol BattalionOf(sol) = bat & Wounded(sol);
guaranteed Battalion bat13;
query Rescue(bat13);

46 Inference in BLOG - Sampling: to answer the query, the ancestors of Rescue(bat13) are sampled as needed: Attacked(bat13), #Soldier(bat13) (an open-universe variable), Wounded(soldier1), ..., Wounded(soldier73), ...; in this sample Rescue(bat13) comes out false.

47 Inference in BLOG - Sampling: another run samples Attacked(bat13), #Soldier(bat13) (open universe), Experienced(soldier1), Wounded(soldier1), ..., Wounded(soldier49), ...; this time Rescue(bat13) comes out true.
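
A rough Python sketch of this kind of forward sampling for the Rescue(bat13) query, using a smaller soldier count (Poisson(5)) and ignoring the Large/Region structure purely to keep the illustration short:

import math
import random

def sample_poisson(lam):
    # Knuth's method for sampling a Poisson random variable.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def sample_rescue():
    attacked = random.random() < 0.7                 # Attacked(bat13) ~ Bernoulli(0.7)
    num_soldiers = sample_poisson(5)                 # open universe: #Soldier(bat13)
    for _ in range(num_soldiers):
        experienced = random.random() < 0.1          # Experienced(sol) ~ Bernoulli(0.1)
        if attacked:
            p_wounded = 0.1 if experienced else 0.4
            if random.random() < p_wounded:
                return True                          # Rescue = exists a wounded soldier
    return False

samples = [sample_rescue() for _ in range(100000)]
print(sum(samples) / len(samples))                   # estimate of P(Rescue(bat13))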

48 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

49 Temporal models in BLOG: the language is expressive enough to directly write temporal models.
type Aircraft;
#Aircraft ~ Poisson[3];
random Real Position(Aircraft a, NaturalNum t)
  if t = 0 then ~ UniformReal[-10, 10]()
  else ~ Gaussian(Position(a, Pred(t)), 2);
type Blip;
// num of blips from aircraft a is 0 or 1
#Blip(Source = Aircraft a, Time = NaturalNum t) ~ TabularCPD[[0.2, 0.8]]()
// num false alarms has Poisson distribution.
#Blip(Time = NaturalNum t) ~ Poisson[2];
random Real ApparentPos(Blip b, NaturalNum t)
  ~ Gaussian(Position(Source(b), Pred(t)), 2);

50 Temporal models in BLOG: the inference algorithm does not exploit the temporal structure, namely the Markov property (the state depends on the previous state only) and the fact that evidence and queries arrive at successive time steps. Dynamic BLOG (DBLOG) does, analogously to Dynamic Bayesian Networks (DBNs).

51 DBLOG Syntax: the only change in syntax is a special Timestep type, with the same semantics as NaturalNum:
type Aircraft;
#Aircraft ~ Poisson[3];
random Real Position(Aircraft a, Timestep t)
  if t = @0 then ~ UniformReal[-10, 10]()
  else ~ Gaussian(Position(a, Prev(t)), 2);
type Blip;
// num of blips from aircraft a is 0 or 1
#Blip(Source = Aircraft a, Time = Timestep t) ~ TabularCPD[[0.2, 0.8]]()
// num false alarms has Poisson distribution.
#Blip(Time = Timestep t) ~ Poisson[2];
random Real ApparentPos(Blip b, Timestep t)
  ~ Gaussian(Position(Source(b), Prev(t)), 2);

52 DBLOG Particle Filtering: a similar idea to DBN particle filtering. State samples for t-1 are propagated, weighted by the evidence at t (likelihood weighting), and resampled to give the state samples for t.
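
A minimal bootstrap particle filter in Python for a one-dimensional random-walk state with Gaussian observation noise, as a stand-in for the aircraft Position / ApparentPos model; this is a sketch of the scheme above, not the DBLOG implementation:

import math
import random

def transition(x):
    return random.gauss(x, 2.0)                # state(t) ~ Gaussian(state(t-1), 2)

def likelihood(obs, x, sigma=2.0):             # Gaussian observation weight
    return math.exp(-((obs - x) ** 2) / (2 * sigma ** 2))

def particle_filter(observations, num_particles=1000):
    particles = [random.uniform(-10, 10) for _ in range(num_particles)]
    for obs in observations:
        particles = [transition(x) for x in particles]        # propagate t-1 -> t
        weights = [likelihood(obs, x) for x in particles]     # weight by evidence at t
        particles = random.choices(particles, weights=weights, k=num_particles)  # resample
    return particles

final = particle_filter([0.5, 1.1, 2.3])
print(sum(final) / len(final))                 # estimated posterior mean of the state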

53 Weighting by evidence in DBNs [diagram]: the state samples are carried from t-1 to t and weighted by the evidence at t.

54 Weighting evidence in DBLOG: the state at t-1 includes #Aircraft, Position(a1,@1), Position(a2,@1); the evidence obs {Blip b : Source(b) = a1} = {B1}; obs ApparentPos(B1, @2) = 3.2; triggers lazy instantiation of only the variables it needs at t: #Blip(a1,@2), B1, Position(a1,@2), ApparentPos(B1,@2).

55 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

56 Retro-instantiation in DBLOG [diagram]: state #Aircraft, Position(a1,@1), Position(a2,@1); B1 with #Blip(a1,@2), Position(a1,@2), ApparentPos(B1,@2); B8 with #Blip(a2,@9), Position(a2,@9), ApparentPos(B8,@9). The evidence obs {Blip b : Source(b) = a2} = {B8}; obs ApparentPos(B8,@2) = 2.1; refers to aircraft a2, whose chain Position(a2,@2), ... was never instantiated at its own time steps and must now be instantiated retroactively.

57 Coping with retro-instantiation: analyse the possible queries and evidence (provided by the user) and pre-instantiate random variables that will be necessary later; use mixing time to decide when to stop sampling back.

58 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

59 Improving BLOG inference: rejection sampling. Does the sample match the evidence? We count only the samples that match the evidence.

60 Improving BLOG inference: likelihood weighting. Evidence variables are fixed to their observed values, and we count samples using the evidence likelihood as the weight.
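
A tiny likelihood-weighting sketch in Python on a toy epidemic/sick model (made-up numbers, not from the talk), estimating P(epidemic | sick = true):

import random

def likelihood_weighting(num_samples=100000):
    weighted_true = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        epidemic = random.random() < 0.1       # sample the non-evidence variable from its prior
        weight = 0.7 if epidemic else 0.05     # weight = P(sick = true | epidemic)
        weight_total += weight
        if epidemic:
            weighted_true += weight
    return weighted_true / weight_total

print(likelihood_weighting())   # close to 0.07 / (0.07 + 0.045) ≈ 0.609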

61 Improving BLOG inference: with evidence such as {Happy(p) : Person p} = {true, true, false} and a sample Happy(john) = true, Happy(mary) = false, Happy(peter) = false, the evidence is a deterministic function of the sampled values, so its likelihood is always 0 or 1, and likelihood weighting is no better than rejection sampling.

62 Improving BLOG inference: with evidence {Salary(p) : Person p} = {90,000, 75,000, 30,000} and a sample Salary(john) = 100,000, Salary(mary) = 80,000, Salary(peter) = 500,000, the more unlikely the evidence, the worse it gets.

63 Improving BLOG inference: with evidence {ApparentPos(b) : Blip b} = {1.2, 0.5, 5.1} and a sample ApparentPos(b1) = 2.1, ApparentPos(b2) = 3.3, ApparentPos(b3) = 3.5, this doesn't work at all for continuous evidence: a forward sample essentially never matches the observed values.

64 Structure-dependent automatic importance sampling: given evidence {ApparentPos(b) : Blip b} = {1.2, 0.5, 5.1}, the apparent positions are set to the observed values (ApparentPos(b1) = 1.2, ApparentPos(b2) = 5.1, ApparentPos(b3) = 0.5) and the parents Position(Source(b1)) = 1.5, Position(Source(b2)) = 3.9, Position(Source(b3)) = 4.0 are sampled taking the evidence into account; the high-level language allows the algorithm to automatically apply better sampling schemes.
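
A Python sketch of the idea for a single blip, with made-up priors: instead of sampling Position from its prior and weighting by the continuous-evidence likelihood (as plain likelihood weighting would), the position is proposed from a distribution centered at the observed apparent position and corrected by an importance weight. This illustrates the general importance-sampling idea, not BLOG's actual implementation.

import math
import random

def gaussian_pdf(x, mean, sigma):
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_mean_position(observed=2.1, num_samples=100000):
    # Prior: Position ~ Gaussian(0, 5).  Observation: ApparentPos ~ Gaussian(Position, 2).
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(num_samples):
        position = random.gauss(observed, 2.0)                    # proposal q, centered at the evidence
        weight = (gaussian_pdf(position, 0.0, 5.0)                # prior p(position)
                  * gaussian_pdf(observed, position, 2.0)         # likelihood p(observed | position)
                  / gaussian_pdf(position, observed, 2.0))        # divided by the proposal density
        weighted_sum += weight * position
        weight_total += weight
    return weighted_sum / weight_total

print(posterior_mean_position())   # about 2.1 * 25 / 29 ≈ 1.81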

65 Outline: A goal: First-order Probabilistic Inference (An ultimate goal, Propositionalization); Lifted Inference (Inversion Elimination, Counting Elimination); DBLOG (BLOG background, DBLOG, Retro-instantiation, Improving BLOG inference); Future Directions and Conclusion

66 For the future: lifted inference works on symmetric, indistinguishable objects only, so adapt approximations for dealing with merely similar objects; parameterized queries: P(sick(X, measles)) = ?; function symbols: φ(diabetes(X), diabetes(father(X))); equality (paramodulation); learning.

67 Conclusion: expressive, first-order probabilistic representations can be used in inference too; lifted inference is equivalent to grounded inference, much faster yet yielding the same exact answer; DBLOG brings techniques for reasoning about temporal processes to first-order probabilistic reasoning.


