
1 First-Order Probabilistic Inference Rodrigo de Salvo Braz SRI International

2 A remark: Feel free to ask clarification questions!

3 Slides online: just search for “Rodrigo de Salvo Braz” and check the presentations page (www.cs.berkeley.edu/~braz).

4 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

5 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

6 Abstracting Inference and Learning: AI problems such as commonsense reasoning, Natural Language Processing, Planning and Vision each pair their own knowledge (domain knowledge, language knowledge, domain & planning knowledge, objects & optics knowledge) with an Inference and Learning component; abstracting that component gives a single Inference and Learning module shared across AI problems.

7 Inference and Learning: what capabilities should it have? It should represent and use: predicates (relations) and quantification over objects: ∀X male(X) ∨ female(X), ∀X male(X) ⇒ sick(X), sick(p121); varying degrees of uncertainty: ∀X { male(X) ⇒ sick(X) }, since it may hold often but not absolutely (essential for language, vision, pattern recognition, commonsense reasoning, etc.); modal knowledge, utilities, and more.

8 Abstracting Inference and Learning. Commonsense reasoning has domain knowledge: male(X) ∨ female(X), { male(X) ⇒ sick(X) }, sick(o121), ... Natural Language Processing has language knowledge: { sentence(S) ∧ verb(S,"had") ∧ object(S,"fever") ∧ subject(S,X) ⇒ fever(X) }, ... Both share the same Inference and Learning module.

9 Abstracting Inference and Learning. Commonsense reasoning WITH Natural Language Processing: domain knowledge (male(X) ∨ female(X), { male(X) ⇒ sick(X) }, sick(o121), ...) plus language knowledge ({ sentence(S) ∧ verb(S,"had") ∧ object(S,"fever") ∧ subject(S,X) ⇒ fever(X) }, verb(s13,"had"), object(s13,"fever"), subject(s13,o121), ...) feed a single Inference and Learning module; the joint solution is made simpler.

10 Standard solutions fall short • Logic: has objects and properties, but statements are absolute • Graphical models and machine learning: have varying degrees of uncertainty, but are propositional; treating objects and quantification is awkward.

11 Standard solutions fall short: graphical models and objects. First-order statements male(X) ∨ female(X), { male(X) ⇒ sick(X) }, { friend(X,Y) ∧ male(X) ⇒ male(Y) }, sick(o121), sick(o135) are grounded into propositional random variables sick_o121, male_o121, female_o121, friend_o121_o135, male_o135, female_o135, sick_o135. The transformation (propositionalization) is not part of the graphical model framework; graphical models have only random variables, not objects; different data creates a distinct, formally unrelated model.

12 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

13 First-Order Probabilistic Inference • Many languages have been proposed: Knowledge-based model construction (Breese, 1992); Probabilistic Logic Programming (Ng & Subrahmanian, 1992); Probabilistic Logic Programming (Ngo & Haddawy, 1995); Probabilistic Relational Models (Friedman et al., 1999); Relational Dependency Networks (Neville & Jensen, 2004); Bayesian Logic (BLOG) (Milch & Russell, 2004); Bayesian Logic Programs (Kersting & De Raedt, 2001); DAPER (Heckerman, Meek & Koller, 2007); Markov Logic Networks (Richardson & Domingos, 2004) • “Smart” propositionalization is the main method.

14 Propositionalization • Bayesian Logic Programs: Prolog-like statements ('|' instead of ':-') such as medicine(Hospital) | in(Patient,Hospital), sick(Patient). and sick(Patient) | in(Patient,Hospital), exposed(Hospital). associated with CPTs and combination rules.

15 Propositionalization. medicine(Hospital) | in(Patient,Hospital), sick(Patient). (plus CPT) is grounded into a BN from an and-or tree: medicine(hosp13) has parents in(john,hosp13), sick(john), in(peter,hosp13), sick(peter), ...
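A minimal Python sketch of this grounding step, not the BLP system itself; the patient population, the noisy-or combination rule and the probability 0.8 are assumptions for illustration.

patients = ["john", "peter"]

def ground_parents(hospital):
    # Ground atoms that the rule makes parents of medicine(hospital).
    parents = []
    for p in patients:
        parents.append(("in", p, hospital))
        parents.append(("sick", p))
    return parents

def medicine_cpt(parent_values, hospital, p_cause=0.8):
    # Noisy-or combination rule (assumed): each patient who is in the hospital
    # and sick independently "causes" medicine with probability p_cause.
    p_no_medicine = 1.0
    for p in patients:
        if parent_values[("in", p, hospital)] and parent_values[("sick", p)]:
            p_no_medicine *= 1.0 - p_cause
    return 1.0 - p_no_medicine  # P(medicine(hospital) = true | parents)

values = {("in", "john", "hosp13"): True, ("sick", "john"): True,
          ("in", "peter", "hosp13"): True, ("sick", "peter"): False}
print(ground_parents("hosp13"))
print(medicine_cpt(values, "hosp13"))  # 0.8: only john is in hosp13 and sick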

16 Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004). First-order fragments: in(Patient,Hospital) and sick(Patient) are parents of medicine(Hospital); in(Patient,Hospital) and exposed(Hospital) are parents of sick(Patient).

17 Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004). The first-order fragments are instantiated: in(john,hosp13) and sick(john) are parents of medicine(hosp13); in(peter,hosp13) and sick(peter) are parents of medicine(hosp13); ...

18 Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004). The instantiated fragments that share random variables are then combined into a single ground Bayesian network, in which medicine(hosp13) has parents in(john,hosp13), sick(john), in(peter,hosp13), sick(peter), ...

19 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

20 Lifted Inference. medicine(Hospital) ⇐ in(Patient,Hospital) ∧ sick(Patient); { sick(Patient) ⇐ in(Patient,Hospital) ∧ exposed(Hospital) }. Inference keeps unbound, general variables such as Patient in sick(Patient) and in(Patient, hosp13) while answering queries like medicine(hosp13) with evidence like exposed(hosp13). Faster, more compact, more intuitive; higher level, so more information/structure is available for optimization.

21 Bayesian Networks (directed). epidemic is the parent of sick_john, sick_mary and sick_bob; each node carries a CPT (P(epidemic), P(sick_john | epidemic), P(sick_mary | epidemic), P(sick_bob | epidemic)). P(sick_john, sick_mary, sick_bob, epidemic) = P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

22 Factor Networks (undirected). P(epi_france, epi_belgium, epi_uk, epi_germany) ∝ φ1(epi_france, epi_belgium) * φ2(epi_france, epi_uk) * φ3(epi_france, epi_germany) * φ4(epi_belgium, epi_germany), where each φ is a factor, or potential function, linking epidemic_france, epidemic_belgium, epidemic_uk and epidemic_germany.

23 Bayesian Nets as Factor Networks. P(sick_john, sick_mary, sick_bob, epidemic) ∝ P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic), with each CPT acting as a factor.

24 Inference: Marginalization. P(sick_john) ∝ Σ_epidemic Σ_sick_mary Σ_sick_bob P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)

25 Inference: Variable Elimination (VE). Push the sums inward: P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * [Σ_sick_mary P(sick_mary | epidemic)] * [Σ_sick_bob P(sick_bob | epidemic)]

26 Inference: Variable Elimination (VE). Eliminating sick_bob produces a new factor φ1(epidemic): P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * [Σ_sick_mary P(sick_mary | epidemic)] * φ1(epidemic)

27 Inference: Variable Elimination (VE). P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ1(epidemic) * [Σ_sick_mary P(sick_mary | epidemic)]

28 Inference: Variable Elimination (VE). Eliminating sick_mary produces φ2(epidemic): P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ1(epidemic) * φ2(epidemic)

29 Inference: Variable Elimination (VE). Eliminating epidemic produces φ3(sick_john), and P(sick_john) ∝ φ3(sick_john)
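A minimal Python sketch of this elimination sequence; the CPT numbers are illustrative assumptions, not values from the talk.

P_epidemic = {True: 0.1, False: 0.9}
P_sick_given_epi = {True: {True: 0.6, False: 0.4},    # P(sick_* | epidemic)
                    False: {True: 0.05, False: 0.95}}

def eliminate_sick(cpt):
    # Sum one sick_* variable out, leaving a factor on epidemic.
    return {epi: sum(cpt[epi].values()) for epi in (True, False)}

phi1 = eliminate_sick(P_sick_given_epi)   # from eliminating sick_bob
phi2 = eliminate_sick(P_sick_given_epi)   # from eliminating sick_mary

# Eliminate epidemic, leaving a factor on sick_john.
phi3 = {sj: sum(P_sick_given_epi[epi][sj] * P_epidemic[epi] * phi1[epi] * phi2[epi]
                for epi in (True, False))
        for sj in (True, False)}

Z = sum(phi3.values())                          # normalize
print({sj: v / Z for sj, v in phi3.items()})    # P(sick_john)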

30 A Factor Network. Ground random variables epidemic(measles), epidemic(flu), ..., sick(mary,measles), sick(mary,flu), sick(bob,measles), sick(bob,flu), ..., hospital(mary), hospital(bob), ...; each sick(Person,Disease) variable is connected by factors to the corresponding epidemic(Disease) and to hospital(Person).

31 First-order Representation. Atoms such as epidemic(D), sick(P,D) and hospital(P) represent sets of random variables; logical variables (P, D) parameterize the random variables; parfactors (parameterized factors) connect the atoms; constraints on logical variables, such as P ≠ john, restrict the groundings.
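A minimal sketch of a parfactor as a Python data structure, assuming binary random variables and a potential stored as a dict; the names and fields are illustrative, not the talk's implementation.

from dataclasses import dataclass, field
from itertools import product

@dataclass
class Parfactor:
    logical_vars: list      # e.g. ["P", "D"]
    atoms: list             # e.g. [("epidemic", ("D",)), ("sick", ("P", "D"))]
    potential: dict         # maps tuples of atom values to a nonnegative weight
    constraints: list = field(default_factory=list)   # e.g. [("P", "!=", "john")]

    def groundings(self, domains):
        # Enumerate the ground factors this parfactor stands for.
        for binding in product(*(domains[v] for v in self.logical_vars)):
            env = dict(zip(self.logical_vars, binding))
            if any(env[v] == c for v, op, c in self.constraints if op == "!="):
                continue
            yield [(pred, tuple(env[a] for a in args)) for pred, args in self.atoms]

pf = Parfactor(["P", "D"],
               [("epidemic", ("D",)), ("sick", ("P", "D"))],
               {(True, True): 2.0, (True, False): 1.0, (False, True): 0.5, (False, False): 1.0},
               constraints=[("P", "!=", "john")])
print(list(pf.groundings({"P": ["john", "mary", "bob"], "D": ["measles", "flu"]})))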

32 Semantics. A parfactor stands for all of its groundings in the factor network above: the parfactor on epidemic(D) and sick(P,D) stands for the ground factors φ(sick(mary,measles), epidemic(measles)), φ(sick(mary,flu), epidemic(flu)), φ(sick(bob,measles), epidemic(measles)), ...

33 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

34 Inversion Elimination (IE). The lifted step (IE) eliminates sick(P, D) from the parfactor on epidemic(D), sick(P,D) with constraint D ≠ measles by computing Σ_sick(P,D) φ(epidemic(D), sick(P,D)) once, giving a parfactor on epidemic(D), D ≠ measles. It is equivalent to grounding (..., sick(john, flu) with epidemic(flu), sick(mary, rubella) with epidemic(rubella), ...), running VE on every ground factor, and abstracting the result back.
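A minimal Python sketch of why the inversion works when each eliminated RV appears in exactly one ground instance of the parfactor: the sum over all sick values and the product over groundings can be swapped, so one per-instance sum, raised to the number of groundings, equals the grounded computation. The potential values and population are assumptions.

phi = {(True, True): 2.0, (True, False): 1.0,     # phi(epidemic(D), sick(P, D))
       (False, True): 0.5, (False, False): 1.0}
patients = ["john", "mary", "bob"]

def eliminate_sick_lifted(phi):
    # Lifted: sum sick out of a single copy, then raise to the number of patients.
    summed = {epi: phi[(epi, True)] + phi[(epi, False)] for epi in (True, False)}
    return {epi: summed[epi] ** len(patients) for epi in (True, False)}

def eliminate_sick_grounded(phi):
    # Grounded check: sum each sick(p, D) out of its own ground factor.
    result = {True: 1.0, False: 1.0}
    for _p in patients:
        for epi in (True, False):
            result[epi] *= phi[(epi, True)] + phi[(epi, False)]
    return result

print(eliminate_sick_lifted(phi))     # factor on epidemic(D), per disease D
print(eliminate_sick_grounded(phi))   # identical to the lifted result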

35 Inversion Elimination (IE) • Proposed by Poole (2003) • Lacked formalization • Did not identify cases in which it is incorrect • Formalized and identified correctness conditions (with Dan Roth and Eyal Amir).

36 Inversion Elimination - Limitations • IE requires the eliminated RVs to occur in separate instances of the parfactor. For the parfactor on epidemic(D), sick(mary, D) with D ≠ measles, each eliminated RV (..., sick(mary, flu), sick(mary, rubella), ...) occurs in its own ground instance (with epidemic(flu), epidemic(rubella), ...), so Inversion Elimination is correct.

37 Inversion Elimination - Limitations • IE requires the eliminated RVs to occur in separate instances of the parfactor. For the parfactor on epidemic(D1), epidemic(D2), month with D1 ≠ D2, each ground RV epidemic(measles), epidemic(flu), epidemic(rubella), ... occurs in many ground instances, so IE is not correct here.

38 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

39 Counting Elimination (for the parfactor on epidemic(D1), epidemic(D2), month with D1 ≠ D2, grounded over epidemic(measles), epidemic(flu), epidemic(rubella), ...) • Need to consider joint assignments • There are exponentially many of those • But the potential actually depends only on the histogram of values in the assignment: 00101 is the same as 11000 • So only a polynomial number of assignments needs to be considered.
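A minimal Python sketch of the counting idea for a symmetric pairwise parfactor over n binary RVs, with made-up potential values: assignments are grouped by the number k of true values, and each group is weighted by the binomial coefficient C(n, k), so the sum ranges over n + 1 histograms instead of 2^n assignments.

from math import comb
from itertools import product

phi = {(True, True): 2.0, (True, False): 0.5,     # symmetric potential, assumed
       (False, True): 0.5, (False, False): 1.0}
n = 10                                            # number of diseases

def partition_by_counting(phi, n):
    # Sum over joint assignments, grouped by k = number of true values.
    total = 0.0
    for k in range(n + 1):
        weight = (phi[(True, True)] ** comb(k, 2) *
                  phi[(False, False)] ** comb(n - k, 2) *
                  phi[(True, False)] ** (k * (n - k)))
        total += comb(n, k) * weight     # comb(n, k) assignments share this histogram
    return total

def partition_by_enumeration(phi, n):
    # Brute-force check over all 2^n joint assignments.
    total = 0.0
    for a in product((True, False), repeat=n):
        w = 1.0
        for i in range(n):
            for j in range(i + 1, n):
                w *= phi[(a[i], a[j])]
        total += w
    return total

print(partition_by_counting(phi, n))
print(partition_by_enumeration(phi, n))   # agrees, but scales exponentially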

40 A Simple Experiment

41 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

42 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

43 BLOG (Bayesian LOGic) • Milch & Russell (2004) • A probabilistic logic language • Current inference is propositional sampling • Special characteristics: Open Universe; Expressive language.

44 BLOG (Bayesian LOGic): Open Universe, expressive language.
type Hospital;
#Hospital ~ Uniform(1,10);
random Boolean Large(Hospital hosp) ~ Bernoulli(0.6);
random NaturalNum Region(Hospital hosp) ~ TabularCPD[[0.3, 0.4, 0.2, 0.1]]();
type Patient;
#Patient(HospitalOf = Hospital hosp)
  if Large(hosp) then ~ Poisson(1500);
  else if Region(hosp) = 2 then = 300;
  else ~ Poisson(500);
query Average( {#Patient(b) for Hospital b} );

45 Inference in BLOG
random Boolean Exposed(hosp) ~ Bernoulli(0.7);
random Boolean Sick(Patient patient)
  if Exposed(HospitalOf(patient)) then
    if Male(patient) then ~ Bernoulli(0.1);
    else ~ Bernoulli(0.4);
  else = False;
random Boolean Male(Patient patient) ~ Bernoulli(0.5);
random Boolean Medicine(Hospital hosp) = exists Patient patient HospitalOf(patient) = hosp & Sick(patient);
guaranteed Hospital hosp13;
query Medicine(hosp13);

46 Inference in BLOG - Sampling. One sample instantiates Medicine(hosp13) together with the variables it depends on: #Patient(hosp13) (Open Universe), Exposed(hosp13), Sick(patient1), ..., Sick(patient73); here Exposed(hosp13) is sampled false, so Medicine(hosp13) comes out false.

47 Inference in BLOG - Sampling. In another sample, Exposed(hosp13) is sampled true, #Patient(hosp13) is sampled (Open Universe), Male(patient1) is instantiated, Sick(patient1) is sampled true, so Medicine(hosp13) is true.
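A minimal Python sketch of this kind of propositional (ancestral) sampling for the model above; the number-of-patients distribution and the Bernoulli parameters are simplified assumptions, not the exact BLOG model.

import random

def sample_medicine(p_exposed=0.7, max_patients=10):
    exposed = random.random() < p_exposed              # Exposed(hosp13)
    n_patients = random.randint(1, max_patients)       # #Patient(hosp13), assumed distribution
    any_sick = False
    for _ in range(n_patients):
        male = random.random() < 0.5                   # Male(patient)
        if exposed:
            p_sick = 0.1 if male else 0.4              # Sick(patient) CPT
            any_sick = any_sick or (random.random() < p_sick)
        # if not exposed, Sick(patient) = False
    return any_sick                                    # Medicine(hosp13) = exists a sick patient

samples = [sample_medicine() for _ in range(10000)]
print(sum(samples) / len(samples))   # Monte Carlo estimate of P(Medicine(hosp13) = true)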

48 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

49 Temporal models in BLOG • Expressive enough to directly write temporal models:
type Aircraft;
#Aircraft ~ Poisson[3];
random Real Position(Aircraft a, NaturalNum t)
  if t = 0 then ~ UniformReal[-10, 10]()
  else ~ Gaussian(Position(a, Pred(t)), 2);
type Blip;
// num of blips from aircraft a is 0 or 1
#Blip(Source = Aircraft a, Time = NaturalNum t) ~ TabularCPD[[0.2, 0.8]]();
// num false alarms has Poisson distribution.
#Blip(Time = NaturalNum t) ~ Poisson[2];
random Real BlipPosition(Blip b, NaturalNum t) ~ Gaussian(Position(Source(b), Pred(t)), 2);

50 Temporal models in BLOG • The inference algorithm does not use the temporal structure: the Markov property (the state depends on the previous state only) and the fact that evidence and queries come in successive time steps • Dynamic BLOG (DBLOG) • Analogous to Dynamic Bayesian Networks (DBNs).

51 DBLOG Syntax • The only change in syntax is a special Timestep type, with the same semantics as NaturalNum:
type Aircraft;
#Aircraft ~ Poisson[3];
random Real Position(Aircraft a, Timestep t)
  if t = @0 then ~ UniformReal[-10, 10]()
  else ~ Gaussian(Position(a, Prev(t)), 2);
type Blip;
// num of blips from aircraft a is 0 or 1
#Blip(Source = Aircraft a, Time = Timestep t) ~ TabularCPD[[0.2, 0.8]]();
// num false alarms has Poisson distribution.
#Blip(Time = Timestep t) ~ Poisson[2];
random Real BlipPosition(Blip b, Timestep t) ~ Gaussian(Position(Source(b), Prev(t)), 2);

52 DBLOG Particle Filtering: similar idea to DBN particle filtering. State samples for t - 1 are extended and weighted by the evidence at t (likelihood weighting), then resampled to produce the state samples for t.

53 Weighting by evidence in DBNs: the state at t - 1 is propagated to t and each sample is weighted by the evidence at t.

54 Weighting evidence in DBLOG. State at t - 1: #Aircraft, Position(a1,@1), Position(a2,@1). Evidence at t: obs {Blip b : Source(b) = a1} = {B1}; obs BlipPosition(B1, @2) = 3.2; Only the variables needed to weight this evidence, #Blip(a1,@2), BlipPosition(B1,@2) and Position(a1,@2), are instantiated (lazy instantiation).
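A minimal Python particle-filtering sketch in the spirit of the scheme above, for a single aircraft with one blip observation per step; the variances, particle count and observations are assumptions.

import math, random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def particle_filter(blip_observations, n_particles=1000, trans_var=2.0, obs_var=2.0):
    # Position(a, @0) ~ UniformReal[-10, 10].
    particles = [random.uniform(-10, 10) for _ in range(n_particles)]
    for obs in blip_observations:
        # Propagate: Position(a, t) ~ Gaussian(Position(a, t-1), trans_var).
        particles = [random.gauss(p, math.sqrt(trans_var)) for p in particles]
        # Weight each particle by the evidence likelihood (likelihood weighting).
        weights = [gaussian_pdf(obs, p, obs_var) for p in particles]
        total = sum(weights)
        # Resample in proportion to the weights.
        particles = random.choices(particles, weights=[w / total for w in weights], k=n_particles)
    return particles

posterior = particle_filter([3.2, 3.5, 3.1])
print(sum(posterior) / len(posterior))   # posterior mean of the position at the last step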

55 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

56 Retro-instantiation in DBLOG. Lazy instantiation had created #Aircraft, Position(a1,@1), Position(a2,@1), and at @2 only #Blip(a1,@2), BlipPosition(B1,@2), Position(a1,@2). When later evidence obs {Blip b : Source(b) = a2} = {B8}; obs BlipPosition(B8,@9) = 2.1; mentions a2 (through B8, #Blip(a2,@9), BlipPosition(B8,@9) and Position(a2,@9)), the intermediate chain Position(a2,@2), ... must be instantiated retroactively.

57 Coping with retro-instantiation • Analyse possible queries and evidence (provided by the user) and pre-instantiate random variables that will be necessary later • Use mixing time to decide when to stop sampling back.

58 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

59 Improving BLOG inference • Rejection sampling: the evidence variables are sampled like everything else, and we count only the samples that match the evidence.

60 Improving BLOG inference • Likelihood weighting: the evidence is fixed, and we count samples using the evidence likelihood as a weight.

61 Improving BLOG inference. Evidence: {Happy(p) : Person p} = {true, true, false}. Sample: Happy(john) = true, Happy(mary) = false, Happy(peter) = false. The evidence is a deterministic function of the sampled values, so its likelihood is always 0 or 1, and likelihood weighting is no better than rejection sampling.

62 Improving BLOG inference. Evidence: {Salary(p) : Person p} = {90,000, 75,000, 30,000}. Sample: Salary(john) = 100,000, Salary(mary) = 80,000, Salary(peter) = 500,000. The more unlikely the evidence, the worse it gets.

63 Improving BLOG inference. Evidence: {BlipPosition(b) : Blip b} = {1.2, 0.5, 5.1}. Sample: BlipPosition(b1) = 2.1, BlipPosition(b2) = 3.3, BlipPosition(b3) = 3.5. This doesn't work at all for continuous evidence, since a sampled value never matches the observed one exactly.

64 Structure-dependent automatic importance sampling. Evidence: {BlipPosition(b) : Blip b} = {1.2, 0.5, 5.1}. Given sampled source positions Position(Source(b1)) = 1.5, Position(Source(b2)) = 3.9, Position(Source(b3)) = 4.0, the blip positions are set directly to the evidence values (BlipPosition(b1) = 1.2, BlipPosition(b2) = 5.1, BlipPosition(b3) = 0.5) and the sample is weighted accordingly. The high-level language allows the algorithm to automatically apply better sampling schemes.
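A minimal Python sketch of this importance-sampling idea for continuous evidence: sample the latent source positions, set the blip positions to their observed values instead of sampling them, and weight the sample by the observation density. The prior, noise model and values are assumptions.

import math, random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

observed_blips = [1.2, 0.5, 5.1]    # evidence {BlipPosition(b) : Blip b}
obs_var = 2.0

def weighted_sample():
    positions, weight = [], 1.0
    for blip in observed_blips:
        pos = random.uniform(-10, 10)                 # sample Position(Source(b)) from its prior
        weight *= gaussian_pdf(blip, pos, obs_var)    # clamp BlipPosition(b) = blip, weight by likelihood
        positions.append(pos)
    return positions, weight

samples = [weighted_sample() for _ in range(20000)]
total_weight = sum(w for _, w in samples)
# Posterior mean of the first source's position, estimated by importance sampling.
print(sum(pos[0] * w for pos, w in samples) / total_weight)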

65 Outline • A goal: First-order Probabilistic Inference (An ultimate goal; Propositionalization) • Lifted Inference (Inversion Elimination; Counting Elimination) • DBLOG (BLOG background; DBLOG; Retro-instantiation; Improving BLOG inference) • Future Directions and Conclusion

66 For the future • Lifted inference works on symmetric, indistinguishable objects only; adapt approximations for dealing with similar objects • Parameterized queries: P(sick(X, measles)) = ? • Function symbols: φ(diabetes(X), diabetes(father(X))) • Equality (paramodulation) • Learning.

67 Conclusion • Expressive, first-order probabilistic representations in inference too! • Lifted inference is equivalent to grounded inference: much faster, yet yielding the same exact answer • DBLOG brings techniques for reasoning about temporal processes to first-order probabilistic reasoning.


