Abductive Plan Recognition By Extending Bayesian Logic Programs
Sindhu V. Raghavan & Raymond J. Mooney
The University of Texas at Austin



Plan Recognition
- Predict an agent's top-level plans based on the observed actions
- Abductive reasoning: inferring cause from effect
- Applications: story understanding, strategic planning, intelligent user interfaces

Plan Recognition in Intelligent User Interfaces

$ cd test-dir
$ cp test1.txt my-dir
$ rm test1.txt

What task is the user performing? move-file
Which files and directories are involved? test1.txt and test-dir
Data is relational in nature: several files and directories, and several relations between them

Related Work
- First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992]
  - Knowledge base of plans and actions
  - Default reasoning or logical abduction to predict the best plan based on the observed actions
  - Unable to handle uncertainty in data or estimate the likelihood of alternative plans
- Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005]
  - Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models
  - Unable to handle relational/structured data
- Statistical relational learning based approaches
  - Markov Logic Networks for plan recognition [Kate and Mooney, 2009; Singla and Mooney, 2011]

Our Approach
- Extend Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] for plan recognition
- BLPs integrate first-order logic and Bayesian networks
- Why BLPs?
  - Efficient grounding mechanism that includes only those variables that are relevant to the query
  - Easy to extend by incorporating any type of logical inference to construct networks
  - Well suited for capturing causal relations in data

Outline
- Motivation
- Background
  - Logical Abduction
  - Bayesian Logic Programs (BLPs)
- Extending BLPs for Plan Recognition
- Experiments
- Conclusions

Logical Abduction
- Abduction: the process of finding the best explanation for a set of observations
- Given
  - Background knowledge B, a set of (Horn) clauses in first-order logic
  - Observations O, a set of atomic facts in first-order logic
- Find
  - A hypothesis H, a set of assumptions (atomic facts) that, together with the theory, logically entails the observations: B ∪ H ⊨ O
- The best explanation is the one with the fewest assumptions

Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
- A set of Bayesian clauses a | a1, a2, ..., an
  - Definite clauses that are universally quantified
  - Range-restricted, i.e., variables(head) ⊆ variables(body)
  - Each clause has an associated conditional probability table (CPT) specifying P(head | body)
- Bayesian predicates a, a1, a2, ..., an have finite domains
- A combining rule such as noisy-or maps the multiple CPTs for a ground atom into a single CPT
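A Bayesian clause and its CPT can be represented along these lines. This is a minimal Python sketch; the class name, fields, and probability values are illustrative, not the actual BLP implementation:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical encoding of a Bayesian clause "head | body1, ..., bodyn"
# with an associated CPT P(head | body). Names are for illustration only.
@dataclass
class BayesianClause:
    head: str                 # e.g. "burglary(X)"
    body: Tuple[str, ...]     # e.g. ("neighborhood(X)",)
    # Maps an assignment of truth values to the body atoms
    # to P(head = true | that assignment).
    cpt: Dict[Tuple[bool, ...], float] = field(default_factory=dict)

    def p_head_true(self, body_values: Tuple[bool, ...]) -> float:
        return self.cpt[body_values]

clause = BayesianClause(
    head="burglary(X)",
    body=("neighborhood(X)",),
    cpt={(True,): 0.7, (False,): 0.1},   # made-up probabilities
)
print(clause.p_head_true((True,)))   # 0.7
```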

Inference in BLPs [Kersting and De Raedt, 2001]
- Logical inference
  - Given a BLP and a query, SLD resolution is used to construct proofs for the query
- Bayesian network construction
  - Each ground atom is a random variable
  - Edges are added from the ground atoms in the body of a clause to the ground atom in its head
  - CPTs are specified by the conditional probability distribution of the corresponding clause
  - P(X) = ∏i P(Xi | Pa(Xi))
- Probabilistic inference
  - Marginal probability given evidence
  - Most Probable Explanation (MPE) given evidence

BLPs for Plan Recognition
- SLD resolution is deductive inference, used for predicting observations from top-level plans
- Plan recognition is abductive in nature and involves predicting the top-level plan from observations
BLPs cannot be used as-is for plan recognition

Extending BLPs for Plan Recognition
BLPs + Logical Abduction = BALPs
BALPs: Bayesian Abductive Logic Programs

Logical Abduction in BALPs
- Given
  - A set of observation literals O = {O1, O2, ..., On} and a knowledge base KB
- Compute a set of abductive proofs of O using Stickel's abduction algorithm [Stickel, 1988]
  - Backchain on each Oi until it is proved or assumed
  - A literal is proved if it unifies with a fact or the head of some rule in KB; otherwise it is assumed
- Construct a Bayesian network from the resulting set of proofs, as in BLPs

Example: Intelligent User Interfaces
- Top-level plan predicates: copy-file, move-file, remove-file
- Action predicates: cp, rm
- Knowledge base (KB)
  - cp(Filename,Destdir) | copy-file(Filename,Destdir)
  - cp(Filename,Destdir) | move-file(Filename,Destdir)
  - rm(Filename) | move-file(Filename,Destdir)
  - rm(Filename) | remove-file(Filename)
- Observed actions
  - cp(test1.txt, mydir)
  - rm(test1.txt)
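Backchaining over this KB can be sketched as follows. This is a simplified, hypothetical rendering of Stickel-style abduction: the rule encoding and helper names are made up for illustration, and an unbound body variable (such as Destdir when explaining rm) is left free here, whereas Stickel's algorithm would also try to unify it against existing assumptions:

```python
# Each rule (head, body) reads "body can explain head".
# Capitalized arguments are variables, lowercase ones are constants.
RULES = [
    (("cp", "F", "D"), ("copy-file", "F", "D")),
    (("cp", "F", "D"), ("move-file", "F", "D")),
    (("rm", "F"),      ("move-file", "F", "D")),
    (("rm", "F"),      ("remove-file", "F")),
]

OBSERVATIONS = [("cp", "test1.txt", "mydir"), ("rm", "test1.txt")]

def is_var(term):
    return term[0].isupper()

def match(pattern, ground):
    """Match a rule head pattern against a ground observation."""
    if pattern[0] != ground[0] or len(pattern) != len(ground):
        return None
    theta = {}
    for p, g in zip(pattern[1:], ground[1:]):
        if is_var(p):
            if theta.setdefault(p, g) != g:   # conflicting binding
                return None
        elif p != g:
            return None
    return theta

def abduce(observations, rules):
    """Collect assumed literals that explain each observation."""
    assumed = set()
    for obs in observations:
        for head, body in rules:
            theta = match(head, obs)
            if theta is None:
                continue
            # Substitute the bindings we have; unbound variables stay free.
            assumed.add((body[0],) + tuple(theta.get(a, a) for a in body[1:]))
    return assumed

print(abduce(OBSERVATIONS, RULES))
```

Running this yields the same assumptions the next few slides construct: copy-file(test1.txt,mydir), move-file(test1.txt,mydir), and remove-file(test1.txt).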

Abductive Inference 14 copy-file(test1.txt,mydir) cp(test1.txt,mydir) cp(Filename,Destdir) | copy-file(Filename,Destdir) Assumed literal

Abductive Inference 15 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) cp(Filename,Destdir) | move-file(Filename,Destdir) Assumed literal

Abductive Inference 16 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(Filename) | move-file(Filename,Destdir) rm(test1.txt) Match existing assumption

Abductive Inference 17 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(Filename) | remove-file(Filename) rm(test1.txt) remove-file(test1) Assumed literal

Structure of Bayesian network 18 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1)

Probabilistic Inference
- Specifying probabilistic parameters
  - Noisy-and: specifies the CPT for combining the evidence from the conjuncts in the body of a clause
  - Noisy-or: specifies the CPT for combining the disjunctive contributions from different ground clauses with the same head; models "explaining away"
- Noisy-and and noisy-or models reduce the number of parameters learned from data
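The noisy-or combining rule reduces a CPT over n contributing clauses to n parameters. A minimal sketch, with illustrative trigger probabilities:

```python
# Noisy-or: each ground clause with the same head is an independent cause.
# The probabilities used below are illustrative, not learned parameters.

def noisy_or(cause_states, probs, leak=0.0):
    """P(head = true) when cause i, if active, triggers the head with probs[i]."""
    fail = 1.0 - leak
    for active, p in zip(cause_states, probs):
        if active:
            fail *= (1.0 - p)     # cause i fails to trigger with prob 1 - p
    return 1.0 - fail

# Two active causes, each triggering with probability 0.9:
print(noisy_or([True, True], [0.9, 0.9]))    # ~0.99
# One active cause:
print(noisy_or([True, False], [0.9, 0.9]))   # ~0.9
```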

Probabilistic Inference 20 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1) Noisy-or

Probabilistic Inference
- Most Probable Explanation (MPE)
  - For multiple plans, compute the MPE: the most likely assignment of truth values to all unknown literals given the evidence
- Marginal probability
  - For predicting a single top-level plan, compute the marginal probability of every instance of the plan predicate and pick the instance with maximum probability
- When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used
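The distinction can be illustrated by brute-force enumeration over the running example's network, with cp and rm observed true. The priors and trigger probabilities below are made-up illustrative values, not learned parameters:

```python
from itertools import product

PRIOR = 0.2      # assumed prior that any one top-level plan is active
P_CAUSE = 0.9    # assumed prob. an active plan triggers its observed action

def p_obs(active_causes):
    """Noisy-or probability that an observed action is true."""
    fail = 1.0
    for active in active_causes:
        if active:
            fail *= (1.0 - P_CAUSE)
    return 1.0 - fail

def joint(copy, move, remove):
    """P(plans) * P(cp = 1 | copy, move) * P(rm = 1 | move, remove)."""
    p = 1.0
    for plan in (copy, move, remove):
        p *= PRIOR if plan else (1.0 - PRIOR)
    return p * p_obs([copy, move]) * p_obs([move, remove])

weights = {a: joint(*a) for a in product([False, True], repeat=3)}

# MPE: single most likely joint assignment given the evidence.
mpe = max(weights, key=weights.get)
print("MPE (copy, move, remove):", mpe)

# Marginal: P(move = true | evidence), normalizing over all assignments.
total = sum(weights.values())
p_move = sum(w for a, w in weights.items() if a[1]) / total
print("P(move | evidence) =", round(p_move, 3))
```

Under these numbers the MPE is move-file alone, since one plan explains both observations ("explaining away"), while the marginal query instead ranks each plan instance by its posterior probability.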

Probabilistic Inference 22 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1) Noisy-or

Probabilistic Inference 23 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1) Noisy-or Evidence

Probabilistic Inference 24 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1) Noisy-or Evidence Query variables

Probabilistic Inference 25 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1) Noisy-or Evidence Query variables TRUE FALSE MPE

Probabilistic Inference 26 copy-file(test1.txt,mydir) cp(test1.txt,mydir) move-file(test1.txt,mydir) rm(test1.txt) remove-file(test1) Noisy-or Evidence Query variables TRUE FALSE MPE

Parameter Learning
- Learn noisy-or/noisy-and parameters using the EM algorithm adapted for BLPs [Kersting and De Raedt, 2008]
- Partial observability
  - In the plan recognition domain, data is partially observable
  - Evidence is present only for observed actions and top-level plans; sub-goals, noisy-or, and noisy-and nodes are not observed
- Simplifying the learning problem
  - Learn noisy-or parameters only
  - Use logical-and instead of noisy-and to combine evidence from the conjuncts in the body of a clause

Experimental Evaluation
- Monroe (strategic planning)
- Linux (intelligent user interfaces)
- Story Understanding (story understanding)

Monroe and Linux [Blaylock and Allen, 2005]
- Task
  - Monroe involves recognizing top-level plans in an emergency response domain (artificially generated using an HTN planner)
  - Linux involves recognizing top-level plans based on Linux commands
  - Single correct plan in each example
- Data
             No. examples   Avg. obs./example   Top-level plan predicates   Observed action predicates
  Monroe     1000           –                   –                           –
  Linux      457            –                   –                           –

Monroe and Linux
- Methodology
  - Manually encoded the knowledge base
  - Learned noisy-or parameters using EM
  - Computed marginal probabilities for plan instances
- Systems compared
  - BALPs
  - MLN-HCAM [Singla and Mooney, 2011] (MLN-PC and MLN-HC do not run on Monroe and Linux due to scaling issues)
  - Blaylock and Allen's system [Blaylock and Allen, 2005]
- Performance metric
  - Convergence score: the fraction of examples for which the plan predicate was predicted correctly

Results on Monroe (chart: convergence score for BALPs, MLN-HCAM, and Blaylock & Allen)
* Differences are statistically significant wrt BALPs

Results on Linux (chart: convergence score for BALPs, MLN-HCAM, Blaylock & Allen 36.1)
* Differences are statistically significant wrt BALPs

Experiments with Partial Observability
- Limitations of the convergence score
  - Does not account for predicting the plan arguments correctly
  - Requires all the observations to be seen before plans can be predicted
- Early plan recognition with a partial set of observations
  - Perform plan recognition after observing the first 25%, 50%, 75%, and 100% of the observations
  - Accuracy: assign partial credit for predicting the plan predicate and a subset of its arguments correctly
- Systems compared
  - BALPs
  - MLN-HCAM [Singla and Mooney, 2011]

Results on Monroe (chart: accuracy vs. percent of observations seen)

Results on Linux (chart: accuracy vs. percent of observations seen)

Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992]
- Task
  - Recognize characters' top-level plans based on actions described in narrative text
  - Multiple top-level plans in each example
- Data
  - 25 examples in the development set and 25 examples in the test set
  - 12.6 observations per example
  - 8 top-level plan predicates

Story Understanding
- Methodology
  - Knowledge base created for ACCEL [Ng and Mooney, 1992]
  - Parameters set manually (insufficient examples in the development set to learn parameters)
  - Computed the MPE to get the best set of plans
- Systems compared
  - BALPs
  - MLN-HCAM [Singla and Mooney, 2011] (best performing MLN model)
  - ACCEL-Simplicity [Ng and Mooney, 1992]
  - ACCEL-Coherence [Ng and Mooney, 1992] (specific to story understanding)

Results on Story Understanding (chart)
* Differences are statistically significant wrt BALPs

Conclusion
- BALPs: an extension of BLPs for plan recognition that employs logical abduction to construct Bayesian networks
- Automatic learning of model parameters using EM
- Empirical results on all benchmark datasets demonstrate advantages over existing methods

Future Work
- Learn the abductive knowledge base automatically from data
- Compare BALPs with other probabilistic logics such as ProbLog [De Raedt et al., 2007], PRISM [Sato, 1995], and Poole's Horn Abduction [Poole, 1993] on plan recognition

Questions 41

Backup 42

Completeness in First-order Logic
- Completeness: if a sentence is entailed by a KB, then it is possible to find a proof of it
- Entailment in first-order logic is semidecidable, i.e., it is not always possible to determine whether a sentence is entailed by a KB
- Resolution is refutation-complete in first-order logic
  - If a set of sentences is unsatisfiable, then it is possible to derive a contradiction

First-order Logic  Terms  Constants – individual entities like anna, bob  Variables – placeholders for objects like X, Y  Predicates  Relations over entities like worksFor, capitalOf  Literal – predicate or its negation applied to terms  Atom – Positive literal like worksFor(X,Y)  Ground literal – literal with no variables like worksFor(anna,bob)  Clause – disjunction of literals  Horn clause has at most one positive literal  Definite clause has exactly one positive literal 44

First-order Logic  Quantifiers  Universal quantification - true for all objects in the domain  Existential quantification - true for some objects in the domain  Logical Inference  Forward Chaining– For every implication p  q, if p is true, then q is concluded to be true  Backward Chaining – For a query literal q, if an implication p  q is present and p is true, then q is concluded to be true, otherwise backward chaining tries to prove p 45

Forward chaining
- For every implication p → q, if p is true, then q is concluded to be true
  - Results in the addition of a new fact to the KB
- Efficient, but incomplete
  - Inference can explode and forward chaining may never terminate
  - The addition of new facts might cause further rules to be satisfied
- Data-driven, not goal-driven
  - Might result in irrelevant conclusions

Backward chaining
- For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p
- Efficient, but not complete
  - May never terminate; might get stuck in an infinite loop
  - Exponential in the worst case
- Goal-driven
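The forward-chaining fixpoint described above can be sketched as follows. This is a propositional toy; the fact and rule names are illustrative, borrowed from the alarm example that appears later in these slides:

```python
# Propositional Horn rules: (set of premises, conclusion).
RULES = [
    ({"neighborhood_james"}, "burglary_james"),
    ({"burglary_james"}, "alarm_james"),
    ({"lives_james_yorkshire", "tornado_yorkshire"}, "alarm_james"),
]

def forward_chain(facts, rules):
    """Fire rules whose premises all hold until no new facts are added."""
    facts = set(facts)
    changed = True
    while changed:                  # fixpoint iteration
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"neighborhood_james"}, RULES)
print("alarm_james" in derived)     # True
```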

Herbrand Model Semantics
- Herbrand universe: all constants in the domain
- Herbrand base: all ground atoms over the Herbrand universe
- Herbrand interpretation: a set of ground atoms from the Herbrand base that are true
- Herbrand model: a Herbrand interpretation that satisfies all clauses in the knowledge base
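For a toy signature, the Herbrand universe and base can be enumerated directly; the constants and predicates here are made up for illustration:

```python
from itertools import product

constants = ["anna", "bob"]                  # Herbrand universe
predicates = {"worksFor": 2, "person": 1}    # predicate name -> arity

# Herbrand base: every predicate applied to every tuple of constants.
herbrand_base = [
    (pred,) + args
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(len(herbrand_base))   # 2^2 worksFor atoms + 2 person atoms = 6
```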

Advantages of SRL Models over Vanilla Probabilistic Models
- Compactly represent domain knowledge in first-order logic
- Employ logical inference to construct ground networks
- Enable parameter sharing

Parameter sharing in SRL
(network: father(john) → parent(john), father(mary) → parent(mary), father(alice) → parent(alice), with a dummy node; a separate parameter θ1, θ2, θ3 for each grounding)

Parameter sharing in SRL
father(X) → parent(X)
(same network as before, but all groundings share a single parameter θ)

Noisy-and Model
- Several causes ci have to occur simultaneously for event e to occur
- ci fails to trigger e with probability pi
- inh accounts for some unknown cause due to which e fails to trigger
P(e) = (1 − inh) ∏i (1 − pi)^(1 − ci)

Noisy-or Model
- Any of several causes ci can cause event e to occur
- ci independently triggers e with probability pi
- leak accounts for some unknown cause due to which e is triggered
P(e) = 1 − [(1 − leak) ∏i (1 − pi)^ci]
- Models explaining away
  - If there are several causes of an event, and there is evidence for one of the causes, then the probability that the other causes caused the event goes down

Noisy-and And Noisy-or Models 54 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire)

Noisy-and And Noisy-or Models 55 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) dummy1 dummy2 Noisy/logical-and Noisy-or

Logical Inference in BLPs SLD Resolution 56 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 57 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 58 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 59 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 60 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 61 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 62 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 63 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) lives(james,yorkshire) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 64 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) lives(james,yorkshire) tornado(yorkshire) Example from Ngo and Haddawy, 1997

Logical Inference in BLPs SLD Resolution 65 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) lives(james,yorkshire) tornado(yorkshire) Example from Ngo and Haddawy, 1997
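The derivation traced above can be reproduced with a tiny backward chainer. This is a propositionalized sketch: real SLD resolution performs unification over variables, which is elided here by grounding the alarm KB in advance:

```python
# Ground facts and Horn rules from the alarm example (propositionalized).
FACTS = {"lives_james_yorkshire", "neighborhood_james", "tornado_yorkshire"}
RULES = {
    "burglary_james": [["neighborhood_james"]],
    "alarm_james": [["burglary_james"],
                    ["lives_james_yorkshire", "tornado_yorkshire"]],
}

def sld_prove(goals, trace=None):
    """Replace the first goal by a matching rule body, or discharge it
    against a fact; return the sequence of resolved atoms, or None."""
    trace = [] if trace is None else trace
    if not goals:
        return trace
    goal, rest = goals[0], goals[1:]
    if goal in FACTS:
        return sld_prove(rest, trace + [goal])
    for body in RULES.get(goal, []):
        result = sld_prove(body + rest, trace + [goal])
        if result is not None:
            return result
    return None     # goal fails

print(sld_prove(["alarm_james"]))
```

The first successful branch resolves alarm(james) via burglary(james) and the fact neighborhood(james), matching the proof built up across the preceding slides.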

Bayesian Network Construction 66 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses

Bayesian Network Construction 67 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses

Bayesian Network Construction 68 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire) tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses

Bayesian Network Construction 69 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses Use combining rule to combine multiple CPTs into a single CPT lives(stefan,freiburg) ✖

Probabilistic Inference
(network: copy-file(test1.txt,mydir), move-file(test1.txt,mydir) → cp(test1.txt,mydir); move-file(test1.txt,mydir), remove-file(test1) → rm(test1.txt))
A full CPT over two parents requires 4 parameters

Probabilistic Inference
(same network, with parameters θ1, θ2, θ3, θ4, one per clause edge)
2 parameters per node: noisy models require a number of parameters linear in the number of parents

Learning in BLPs [Kersting and De Raedt, 2008]
- Parameter learning
  - Expectation Maximization
  - Gradient-ascent based learning
  - Both approaches optimize likelihood
- Structure learning
  - Hill-climbing search through the space of possible structures
  - Initial structure obtained from CLAUDIEN [De Raedt and Dehaspe, 1997]
  - Learns from only positive examples

Probabilistic Inference and Learning
- Probabilistic inference
  - Marginal probability given evidence
  - Most Probable Explanation (MPE) given evidence
- Learning [Kersting and De Raedt, 2008]
  - Parameters: Expectation Maximization, gradient-ascent based learning
  - Structure: hill-climbing search through the space of possible structures

Expectation Maximization for BLPs/BALPs
- Perform logical inference to construct a ground Bayesian network for each example
- Let r denote a rule, X a node, and Pa(X) the parents of X
- E step: compute the expected counts for each rule from the posterior over its groundings (the inner sum is over all groundings of rule r)
- M step: re-estimate the CPT entries from the expected counts
(From the SRL tutorial at ECML 07)

Decomposable Combining Rules  Express different influences using separate nodes  These nodes can be combined using a deterministic function 75

Combining Rules 76 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) dummy1 dummy2 Logical-and Noisy-or

Decomposable Combining Rules 77 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) dummy1 dummy2 Logical-and Noisy-or dummy1-newdummy2 -new Logical-or

BLPs vs. PLPs
- Differences in representation
  - In BLPs, Bayesian atoms take values from a finite set; in PLPs, each atom is logical in nature and can only take the values true or false
  - Instead of having neighborhood(x) = bad as in BLPs, PLPs have neighborhood(x,bad)
  - To compute the probability of a query like alarm(james), PLPs have to construct one proof tree covering all possible values of all predicates
  - Inference is cumbersome
- BLPs subsume PLPs

BLPs vs. Poole's Horn Abduction
- Differences in representation
  - For example, if P(x) and R(x) are two competing hypotheses, then either P(x) or R(x) can be true, and their prior probabilities must sum to 1
  - Restrictions of this kind do not exist in BLPs
- PLPs, and hence BLPs, are more flexible and have a richer representation

BLPs vs. PRMs  BLPs subsume PRMs  PRMs use entity-relationship models to represent knowledge and they use KBMC-like construction to construct a ground Bayesian network  Each attribute becomes a random variable in the ground network and relations over the entities are logical constraints  In BLP, each attribute becomes a Bayesian atom and relations become logical atoms  Aggregator functions can be transformed into combining rules 80

BLPs vs. RBNs  BLPs subsume RBNs  In RBNs, each node in BN is a predicate and probability formulae are used to specify probabilities  Combining rules can be used to represent these probability formulae in BLPs. 81

BALPs vs. BLPs  Like BLPs, BALPs use logic programs as templates for constructing Bayesian networks  Unlike BLPs, BALPs uses logical abduction instead of deduction to construct the network 82

Monroe [Blaylock and Allen, 2005]
- Task
  - Recognize top-level plans in an emergency response domain (artificially generated using an HTN planner)
  - Plans include set-up-shelter, clear-road-wreck, provide-medical-attention
  - Single correct plan in each example
  - Domain consists of several entities and sub-goals
  - Tests the ability to scale to large domains
- Data
  - Contains 1000 examples

Monroe  Methodology  Knowledge base constructed based on the domain knowledge encoded in planner  Learn noisy-or parameters using EM  Compute marginal probability for instances of top level plans and pick the one with the highest marginal probability  Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o Blaylock and Allen’s system [Blaylock and Allen, 2005]  Convergence score - measures the fraction of examples for which the plan schema was predicted correctly 84

Learning Results - Monroe
(table: convergence score and accuracy for the MW, MW-Start, and Rand-Start initializations; convergence score for MW: 98.4)

Linux [Blaylock and Allen, 2005]
- Task
  - Recognize top-level plans based on Linux commands
  - Human users were asked to perform tasks in Linux and their commands were recorded
  - Top-level plans include find-file-by-ext, remove-file-by-ext, copy-file, move-file
  - Single correct plan in each example
  - Tests the ability to handle noise in the data
    - Users indicate success even when they have not achieved the task correctly
    - Some top-level plans like find-file-by-ext and find-file-by-name have identical actions
- Data
  - Contains 457 examples

Linux  Methodology  Knowledge base constructed based on the knowledge of Linux commands  Learn noisy-or parameters using EM  Compute marginal probability for instances of top level plans and pick the one with the highest marginal probability  Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o Blaylock and Allen’s system [ Blaylock and Allen, 2005]  Convergence score - measures the fraction of examples for which the plan schema was predicted correctly o 87

Learning Results - Linux
(table: convergence score and accuracy under partial observability for the MW, MW-Start, and Rand-Start initializations)

Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992]
- Task
  - Recognize characters' top-level plans based on actions described in narrative text
  - Logical representations of the action literals are provided
  - Top-level plans include shopping, robbing, restaurant dining, partying
  - Multiple top-level plans in each example
  - Tests the ability to predict multiple plans
- Data
  - 25 development examples
  - 25 test examples

Story Understanding  Methodology  Knowledge base constructed for ACCEL by Ng and Mooney [1992]  Insufficient number of examples to learn parameters o Noisy-or parameters set to 0.9 o Noisy-and parameters set to 0.9 o Priors tuned on development set  Compute MPE to get the best set of plans  Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o ACCEL-Simplicity [Ng and Mooney, 1992 ] o ACCEL-Coherence [Ng and Mooney, 1992] –Specific for Story Understanding 90

Results on Story Understanding (chart)
* Differences are statistically significant wrt BALPs

Other Applications of BALPs
- Medical diagnosis
- Textual entailment
- Computational biology
  - Inferring gene relations based on the output of micro-array experiments
- Any application that requires abductive reasoning

ACCEL [Ng and Mooney, 1992]
- First-order logic based system for plan recognition
- Simplicity metric selects explanations that make the fewest assumptions
- Coherence metric selects explanations that connect the maximum number of observations
  - Measures explanatory coherence
  - Specific to text interpretation

System by Blaylock and Allen [2005]
- Statistical n-gram models to predict plans based on observed actions
- Performs plan recognition in two phases
  - Predicts the plan schema first
  - Predicts the arguments based on the predicted schema