Download presentation
Presentation is loading. Please wait.
Published byRichard Moore Modified over 9 years ago
1
Extending Bayesian Logic Programs for Plan Recognition and Machine Reading Sindhu V. Raghavan Advisor: Raymond Mooney PhD Proposal May 12 th, 2011 1
2
$ cd my-dir $ cp test1.txt test-dir $ rm test1.txt $ cd my-dir $ cp test1.txt test-dir $ rm test1.txt What task is the user performing? move-file Which files and directories are involved? test1.tx and test-dir Plan Recognition in Intelligent User Interfaces 2 Can the task be performed more efficiently? Data is relational in nature - several files and directories and several relations between them
3
Characteristics of Real World Data Relational or structured data Several entities in the domain Several relations between entities Do not always follow the i.i.d assumption Presence of noise or uncertainty Uncertainty in the types of entities Uncertainty in the relations 3 Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.
4
Statistical Relational Learning (SRL) Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007] –Combines strengths of both first-order logic and probabilistic models SRL formalisms –Stochastic Logic Programs (SLPs) [Muggleton, 1996] –Probabilistic Relational Models (PRMs) [Friedman et al., 1999] –Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] –Markov Logic Networks (MLNs) [Richardson and Dominogs, 2006] 4
5
Statistical Relational Learning (SRL) Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007] –Combines strengths of both first-order logic and probabilistic models SRL formalisms –Stochastic Logic Programs (SLPs) [Muggleton, 1996] –Probabilistic Relational Models (PRMs) [Friedman et al., 1999] –Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] –Markov Logic Networks (MLNs) [Richardson and Dominogs, 2006] 5
6
Bayesian Logic Programs (BLPs) Integrate first-order logic and Bayesian networks Why BLPs? Efficient grounding mechanism that includes only those variables that are relevant to the query Easy to extend by incorporating any type of logical inference to construct networks Well suited for capturing causal relations in data 6
7
Objectives 7 BLPs for Plan Recognition BLPs for Machine Reading
8
Objectives 8 Plan recognition involves predicting the top-level plan of an agent based on its actions BLPs for Machine Reading
9
Objectives 9 BLPs for Plan Recognition Machine Reading involves automatic extraction of knowledge from natural language text
10
Outline Motivation Background First-order logic Logical Abduction Bayesian Logic Programs (BLPs) Completed Work Part 1 – Extending BLPs for Plan Recognition Part 2 – Extending BLPs for Machine Reading Proposed Work Conclusions 10
11
First-order Logic Terms Constants – individual entities like anna, bob Variables – placeholders for objects like X, Y Predicates Relations over entities like worksFor, capitalOf Literal – predicate or its negation applied to terms Atom – Positive literal like worksFor(X,Y) Ground literal – literal with no variables like worksFor(anna,bob) Clause – disjunction of literals Horn clause has at most one positive literal Definite clause has exactly one positive literal 11
12
First-order Logic Quantifiers Universal quantification - true for all objects in the domain Existential quantification - true for some objects in the domain Logical Inference Forward Chaining– For every implication p q, if p is true, then q is concluded to be true Backward Chaining – For a query literal q, if an implication p q is present and p is true, then q is concluded to be true, otherwise backward chaining tries to prove p 12
13
Logical Abduction Abduction Process of finding the best explanation for a set of observations Given Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic Observations, O, in the form of atomic facts in first-order logic Find A hypothesis, H, a set of assumptions (atomic facts) that logically entail the observations given the theory: B H O Best explanation is the one with the fewest assumptions 13
14
Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] Set of Bayesian clauses a|a 1,a 2,....,a n Definite clauses that are universally quantified Head of the clause - a Body of the clause - a 1, a 2, …, a n Range-restricted, i.e variables{head} variables{body} Associated conditional probability table (CPT) o P(head|body) Bayesian predicates a, a 1, a 2, …, a n have finite domains Combining rule like noisy-or for mapping multiple CPTs into a single CPT. 14
15
Logical Inference in BLPs SLD Resolution 15 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) Example from Ngo and Haddawy, 1997
16
Logical Inference in BLPs SLD Resolution 16 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) Example from Ngo and Haddawy, 1997
17
Logical Inference in BLPs SLD Resolution 17 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997
18
Logical Inference in BLPs SLD Resolution 18 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997
19
Logical Inference in BLPs SLD Resolution 19 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997
20
Logical Inference in BLPs SLD Resolution 20 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) Example from Ngo and Haddawy, 1997
21
Logical Inference in BLPs SLD Resolution 21 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) Example from Ngo and Haddawy, 1997
22
Logical Inference in BLPs SLD Resolution 22 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) lives(james,yorkshire) Example from Ngo and Haddawy, 1997
23
Logical Inference in BLPs SLD Resolution 23 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) lives(james,yorkshire) tornado(yorkshire) Example from Ngo and Haddawy, 1997
24
Logical Inference in BLPs SLD Resolution 24 BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Proof alarm(james) burglary(james) neighborhood(james) lives(james,Y) tornado(Y) lives(james,yorkshire) tornado(yorkshire) Example from Ngo and Haddawy, 1997
25
Bayesian Network Construction 25 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses
26
Bayesian Network Construction 26 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses
27
Bayesian Network Construction 27 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire) tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses
28
Bayesian Network Construction 28 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses Use combining rule to combine multiple CPTs into a single CPT lives(stefan,freiburg) ✖
29
Probabilistic Inference and Learning Probabilistic inference Marginal probability given evidence Most Probable Explanation (MPE) given evidence Learning [Kersting and De Raedt, 2008] Parameters o Expectation Maximization o Gradient-ascent based learning Structure o Hill climbing search through the space of possible structures 29
30
Part 1 Extending BLPs for Plan Recognition [Raghavan and Mooney, 2010] 30
31
Plan Recognition Predict an agent’s top-level plans based on the observed actions Abductive reasoning involving inference of cause from effect Applications Story Understanding o Recognize character’s motives or plans based on its actions to answer questions about the story Strategic planning o Predict other agents’ plans so as to work co-operatively Intelligent User Interfaces o Predict the task that the user is performing so as to provide valuable tips to perform the task more efficiently 31
32
Related Work First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992] Knowledge base of plans and actions Default reasoning or logical abduction to predict the best plan based on the observed actions Unable to handle uncertainty in data or estimate likelihood of alternative plans Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005] Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models Unable to handle relational/structured data 32
33
Related Work Markov Logic based approaches [Kate and Mooney, 2009; Singla and Mooney, 2011] Logical inference in MLNs is deductive in nature MLN-PC [Kate and Mooney, 2009] o Add reverse implications to handle abductive reasoning o Does not scale to large domains MLN-HC [Singla and Mooney, 2011] o Improves over MLN-PC by adding hidden causes o Still does not scale to large domains MLN-HCAM [Singla and Mooney, 2011] uses logical abduction to constrain grounding o Incorporates ideas from the BLP approach for plan recognition [Raghavan and Mooney, 2010] o MLN-HCAM performs better than MLN-PC and MLN-HC 33
34
BLPs for Plan Recognition Why BLPs ? Directed model captures causal relationships well Efficient grounding process results in smaller networks, unlike in MLNs SLD resolution is deductive inference, used for predicting observed actions from top-level plans Plan recognition is abductive in nature and involves predicting the top-level plan from observed actions 34 BLPs cannot be used as is for plan recognition
35
Extending BLPs for Plan Recognition 35 BLPs Logical Abduction BALPs BALPs – Bayesian Abductive Logic Programs
36
Logical Abduction in BALPs Given A set of observation literals O = {O 1, O 2,….O n } and a knowledge base KB Compute a set abductive proofs of O using Stickel’s abduction algorithm [Stickel 1988] Backchain on each O i until it is proved or assumed A literal is said to be proved if it unifies with a fact or the head of some rule in KB, otherwise it is said to be assumed Construct a Bayesian network using the resulting set of proofs as in BLPs. 36
37
Example – Intelligent User Interfaces Top-level plans predicates copy-file, move-file, remove-file Actions predicates cp, rm Knowledge Base (KB) cp(filename,destdir) | copy-file(filename,destdir) cp(filename,destdir) | move-file(filename,destdir) rm(filename) | move-file(filename,destdir) rm(filename) | remove-file(filename) Observed actions cp(Test1.txt, Mydir) rm(Test1.txt) 37
38
Abductive Inference 38 copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) cp(filename,destdir) | copy-file(filename,destdir) Assumed literal
39
Abductive Inference 39 copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1.txt,Mydir) cp(filename,destdir) | move-file(filename,destdir) Assumed literal
40
Abductive Inference 40 copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1.txt,Mydir) rm(filename) | move-file(filename,destdir) rm(Test1.txt) Match existing assumption
41
Abductive Inference 41 copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1.txt,Mydir) rm(filename) | remove-file(filename) rm(Test1.txt) remove-file(Test1) Assumed literal
42
Structure of Bayesian network 42 copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1.txt,Mydir) rm(Test1.txt) remove-file(Test1)
43
Probabilistic Inference Specifying probabilistic parameters Noisy-and o Specify the CPT for combining the evidence from conjuncts in the body of the clause Noisy-or o Specify the CPT for combining the evidence from disjunctive contributions from different ground clauses with the same head o Models “explaining away” Noisy-and and noisy-or models reduce the number of parameters learned from data 43
44
Probabilistic Inference 44 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) 4 parameters
45
Probabilistic Inference 45 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) 2 parameters θ1θ1 θ2θ2 θ3θ3 θ4θ4 Noisy models require parameters linear in the number of parents
46
Probabilistic Inference Most Probable Explanation (MPE) For multiple plans, compute MPE, the most likely combination of truth values to all unknown literals given this evidence Marginal Probability For single top level plan prediction, compute marginal probability for all instances of plan predicate and pick the instance with maximum probability When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints is used 46
47
Probabilistic Inference 47 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) Noisy-or
48
Probabilistic Inference 48 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) Noisy-or Evidence
49
Probabilistic Inference 49 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) Noisy-or Evidence Query variables
50
Probabilistic Inference 50 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) Noisy-or Evidence Query variables TRUE FALSE MPE
51
Probabilistic Inference 51 copy-file(Test1,txt,Mydir) cp(Test1.txt,Mydir) move-file(Test1,txt,Mydir) rm(Test1.txt) remove-file(Test1) Noisy-or Evidence Query variables TRUE FALSE MPE
52
Parameter Learning Learn noisy-or/noisy-and parameters using the EM algorithm adapted for BLPs [Kersting and De Raedt, 2008] Partial observability In plan recognition domain, data is partially observable Evidence is present only for observed actions and top- level plans; sub-goals, noisy-or, and noisy-and nodes are not observed Simplify learning problem Learn noisy-or parameters only Used logical-and instead of noisy-and to combine evidence from conjuncts in the body of a clause 52
53
Experimental Evaluation Monroe (Strategic planning) Linux (Intelligent user interfaces) Story Understanding (Story understanding) 53
54
Monroe and Linux [Blaylock and Allen, 2005] Task Monroe involves recognizing top level plans in an emergency response domain (artificially generated using HTN planner) Linux involves recognizing top-level plans based on linux commands Single correct plan in each example Data 54 No. examples Avg. observations / example Total top-level plan predicates Total observed action predicates Monroe100010.1910 30 Linux4576.119 43
55
Monroe and Linux Methodology Manually encoded the knowledge base Learned noisy-or parameters using EM Computed marginal probability for plan instances Systems compared BALPs MLN-HCAM [Singla and Mooney, 2011] o MLN-PC and MLN-HC do not run on Monroe and Linux due to scaling issues Blaylock and Allen’s system [Blaylock and Allen, 2005] Performance metric Convergence score - measures the fraction of examples for which the plan schema was predicted correctly 55
56
Results on Monroe 56 94.2 * Convergence Score BALPs MLN-HCAMBlaylock & Allen * - Differences are statistically significant wrt BALPs
57
Results on Linux 57 Convergence Score BALPs MLN-HCAMBlaylock & Allen 36.1 * * - Differences are statistically significant wrt BALPs
58
Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992] Task Recognize character’s top level plans based on actions described in narrative text Multiple top-level plans in each example Data 25 examples in development set and 25 examples in test set 12.6 observations per example 8 top-level plan predicates 58
59
Story Understanding Methodology Knowledge base was created for ACCEL [Ng and Mooney, 1992] Parameters set manually o Insufficient number of examples in the development set to learn parameters Computed MPE to get the best set of plans Systems compared BALPs MLN-HCAM [Singla and Mooney, 2011] o Best performing MLN model ACCEL-Simplicity [Ng and Mooney, 1992] ACCEL-Coherence [Ng and Mooney, 1992] o Specific for Story Understanding 59
60
Results on Story Understanding 60 * - Differences are statistically significant wrt BALPs
61
Results on Story Understanding 61 * - Differences are statistically significant wrt BALPs * *
62
Summary of BALPs for Plan Recognition Extend BLPs for plan recognition by employing logical abduction to construct Bayesian networks Automatic learning of model parameters using EM Empirical results on all benchmark datasets demonstrate advantages over existing methods 62
63
Part 2 Extending BLPs for Machine Reading 63
64
Machine Reading Machine reading involves automatic extraction of knowledge from natural language text Information extraction (IE) systems extract factual information like entities and relations between entities that occur in text [Cohen, 1999; Craven et al., 2000; Bunescu and Mooney, 2007; Etzioni et al, 2008] Extracted information can be used for answering queries automatically 64
65
Limitations of IE Systems Extract information that is stated explicitly in text Natural language text is not necessarily complete Commonsense information is not explicitly stated in text Well known facts are omitted from the text Missing information cannot be inferred from text Human brain performs deeper inference using commonsense knowledge IE systems have no access to commonsense knowledge Errors in extraction process result in some facts not being extracted 65
66
Our Objective Improve performance of IE Learn general knowledge or commonsense information in the form of rules using the facts extracted by the IE system Infer additional facts that are not stated explicitly in text using the learned rules 66
67
Example 67 “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.’’
68
Example 68 “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor." Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad)
69
Example 69 “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor." Query isLedBy(X,Y) Query isLedBy(X,Y) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad)
70
Example 70 “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor." Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahatir-mohamad) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahatir-mohamad) Query isLedBy(X,Y) Query isLedBy(X,Y) X = malaysian Y = mahathir-mohamad X = malaysian Y = mahathir-mohamad
71
Example 71 “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor." Query citizenOf(mahathir-mohamad,Y) Query citizenOf(mahathir-mohamad,Y) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad)
72
Example 72 “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor." Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahatir-mohamad) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahatir-mohamad) Query citizenOf(mahathir-mohamad,Y) Query citizenOf(mahathir-mohamad,Y) Y = ?
73
Related Work Learning propositional rules [Nahm and Mooney, 2000] Learn propositional rules from the output of an IE system on computer-related job postings Perform deductive inference to infer new facts Learning first-order rules [Carlson at el., 2010, Schoenmackers et al., 2010; Doppa et al., 2010] Carlson et al. and Doppa et al. modify existing rule learners like FOIL and FARMER to learn probabilistic rules o Perform deductive inference to infer additional facts Schoenmackers et al. develop a new first-order rule learner based on statistical relevance o Use MLN based approach to infer additional facts 73
74
Our Approach Use an off-the shelf IE system to extract facts Learn commonsense knowledge from the extracted facts in the form of first-order-rules Infer additional facts based on the learned rules using BLPs Pure logical deduction results in inferring a large number of facts Inference using BLPs is probabilistic in nature, i.e inferred facts are assigned probabilities. Probabilities can be used to filter out facts with high confidence from the rest Efficient grounding mechanism in BLPs enables our approach to scale to natural language text 74
75
Extending BLPs for Machine Reading SLD resolution does not result in inference of new facts Given a query, SLD resolution will generated deductive proofs that prove the query Employ forward chaining to infer additional facts Forward chain on extracted facts to infer new facts Use the deductive proofs from forward chaining to construct Bayesian network for probabilistic inference 75
76
Learning First-order Rules 76 “ Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA……. ’’ Extracted facts nationState(USA) Person(BarackObama) isLedBy(USA,BarackObama) hasBirthPlace(BarackObama,USA) citizenOf(BarackObama,USA) Extracted facts nationState(USA) Person(BarackObama) isLedBy(USA,BarackObama) hasBirthPlace(BarackObama,USA) citizenOf(BarackObama,USA)
77
Learning Patterns from Extracted Facts nationState(USA) ∧ isLedBy(USA,BarackObama) citizenOf(BarackObama,USA) nationState(USA) ∧ isLedBy(USA,BarackObama) hasBirthPlace(BarackObama,USA) hasBirthPlace(BarackObama,USA) citizenOf(BarackObama,USA). 77
78
Generalizing Patterns to Rules 78 nationState(Y) ∧ isLedBy(Y,X) citizenOf(X,Y) nationState(Y) ∧ isLedBy(Y,X) hasBirthPlace(X,Y) hasBirthPlace(X,Y) citizenOf(X,Y) nationState(Y) ∧ isLedBy(Y,X) citizenOf(X,Y) nationState(Y) ∧ isLedBy(Y,X) hasBirthPlace(X,Y) hasBirthPlace(X,Y) citizenOf(X,Y)
79
Inductive Logic Programming (ILP) [Muggleton, 1992] 79 ILP Rule Learner ILP Rule Learner Target relation citizenOf(X,Y) Positive instances citizenOf(BarackObama, USA) citizenOf(GeorgeBush, USA) citizenOf(IndiraGandhi,I ndia). Negative instances citizenOf(BarackObama, India) citizenOf(GeorgeBush, India) citizenOf(IndiraGandhi,U SA). KB hasBirthPlace(BarackObama,USA) person(BarackObama) nationState(USA) nationState(India). Rules nationState(Y) ∧ isLedBy(Y,X) citizenOf(X,Y). Rules nationState(Y) ∧ isLedBy(Y,X) citizenOf(X,Y).
80
Example 80 Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad) Extracted facts nationState(malaysian) Person(mahathir-mohamad) isLedBy(malaysian,mahathir-mohamad) Learned rule nationState(B) ∧ isLedBy(B,A) citizenOf(A,B) Learned rule nationState(B) ∧ isLedBy(B,A) citizenOf(A,B) “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.’’
81
Inference of Additional Facts 81 nationState(B) ∧ isLedBy(B,A) citizenOf(A,B) nationState(malaysian ) isLedBy(malaysian,mahathir- mohamad) citizenOf(mahathir-mohamad, malaysian)
82
Experimental Evaluation Data DARPA’s intelligent community (IC) data set Consists of news articles on politics, terrorism, and others international events 2000 documents, split into training and test in the ratio 9:1 Approximately 30,000 sentences Information Extraction Information extraction using SIRE, IBM’s information extraction system [Florin et al., 2004] 82
83
Experimental Evaluation Learning first-order rules using LIME [McCreath and Sharma, 1998] Learns rules from only positive instances Handles noise in data Models BLP-Pos-Only o Learn rules using only positive instances BLP-Pos-Neg o Learn rules using both positive and negative instances o Apply closed world assumption to automatically generate negative instances BLP-All o Includes rules learned from BLP-Pos-Only and BLP-Pos- Neg 83
84
Experimental Evaluation Specifying probabilistic parameters Manually specified parameters Logical-and model to combine evidence from conjuncts in the body of the clause Set noisy-or parameters to 0.9 Set priors to maximum likelihood estimates 84
85
Experimental Evaluation Methodology Learned rules for 13 relations including hasCitizenship, hasMember, attendedSchool Baseline o Perform pure logical deduction to infer all possible additional facts BLP-0.9 o Facts inferred using the BLP approach with marginal probability greater or equal to 0.9 BLP-0.95 o Facts inferred using the BLP approach with marginal probability greater or equal to 0.95 85
86
Performance Evaluation Precision Evaluate if the inferred facts are correct, i.e if they can be inferred from the document o No ground truth information available o Perform manual evaluation o Sample 20 documents randomly from the test set and evaluate inferred facts manually for all models 86
87
Precision (preliminary results) Model BLP-ALLBLP-Pos-NegBLP-Pos-Only Baseline – Purely logical approach [No. facts inferred] 3813751849 Precision 24.9416.0031.80 87
88
Precision (preliminary results) Model BLP-ALLBLP-Pos-NegBLP-Pos-Only Baseline – Purely logical approach [No. facts inferred] 3813751849 Precision 24.9416.0031.80 BLP-0.9 [No. facts inferred] 1208661158 Precision 34.6813.6335.49 88
89
Precision (preliminary results) Model BLP-ALLBLP-Pos-NegBLP-Pos-Only Baseline – Purely logical approach [No. facts inferred] 3813751849 Precision 24.9416.0031.80 BLP-0.9 [No. facts inferred] 1208661158 Precision 34.6813.6335.49 BLP-0.95 [No. of facts inferred] 5610 Precision 91.07100Nil 89
90
Performance Evaluation Estimated recall For each target relation, eliminate instances of it from the extracted facts in the test set Try to infer the eliminated instances correctly based on the remaining facts using the BLP approach o For each threshold point (0.1 to 1), compute fraction of eliminated instances that were inferred correctly o True recall cannot be calculated because of lack of ground truth 90
91
Estimated Recall (preliminary results) 91 Estimated recall Confidence threshold
92
Estimated Recall (preliminary results) 92 Estimated recall Confidence threshold
93
Estimated Recall (preliminary results) 93 Estimated recall Confidence threshold
94
Estimated Recall (preliminary results) 94 Estimated recall Confidence threshold
95
First-order Rules from LIME politicalParty(A) ^ employs(A,B) hasMemberPerson(A,B) building(B) ^ eventLocation(A,B) ^ bombing(A) thingPhysicallyDamaged(A,B) employs(A,B) hasMember(A,B) citizenOf(A,B) hasCitizenship(A,B) nationState(B) ^ person(A) ^ employs(B,A) hasBirthPlace(A,B) 95
96
Proposed Work Learning parameters automatically from data EM or gradient-ascent based algorithm adapted for BLPs by Kersting and De Raedt [2008] Evaluation Human evaluation for inferred facts using Amazon Mechanical Turk [Callison-Burch and Dredze, 2010] Use inferred facts to answer queries constructed for evaluation in the DARPA sponsored machine reading project Comparison to existing approaches like MLNs Develop a structure learner for incomplete and uncertain data 96
97
Future Extensions Multiple predicate learning Learn rules for multiple relations at the same time Discriminative parameter learning for BLPs EM and gradient-ascent based algorithms for learning BLP parameters optimize likelihood of the data No algorithm that learns parameters discriminatively for BLPs has been developed to date Comparison of BALPs to other probabilistic logics for plan recognition Comparison to PRISM [Sato, 1995], Poole’s Horn Abduction [Poole, 1993], Abductive Stochastic Logic Programs [Tammadoni- Nezhad et al., 2006] 97
98
Conclusions Extended BLPs to abductive plan recognition Enhance BLPs with logical abduction, resulting in Bayesian Abductive Logic Programs (BALPs) BALPs outperform state-of-the-art approaches to plan recognition on several benchmark data sets Extended BLPs to machine reading Preliminary results on the IC data set are encouraging Perform an extensive evaluation in immediate future Develop a first-order rule learner that learns from incomplete and uncertain data in future 98
99
Questions 99
100
Backup 100
101
Timeline Additional experiments comparing MLN-HCAM and BALPs for plan recognition [Target completion – August 2011] Use Sample Search to perform inference in MLN-HCAM Evaluation of BLPs for Machine Reading [Target completion – November 2011] Learn parameters automatically Evaluation using Amazon Mechanical Turk Evaluation of BLPs for answering queries Comparison of BLPs with MLNs for machine reading Developing structure learner from positive and uncertain examples [Target completion – May 2012] Proposed defense and graduation [August 2012] 101
102
Completeness in First-order Logic Completeness - If a sentence is entailed by a KB, then it is possible to find the proof that entails it Entailment in first-order logic is semidecidable, i.e it is not possible to know if a sentence is entailed by a KB or not Resolution is complete in first-order logic If a set of sentences is unsatisfiable, then it is possible to find a contradiction 102
103
Forward chaining For every implication p q, if p is true, then q is concluded to be true Results in addition of a new fact to KB Efficient, but incomplete Inference can explode and forward chaining may never terminate Addition of new facts might result in rules being satisfied It is data-driven, not goal-driven Might result in irrelevant conclusions 103
104
Backward chaining For a query literal q, if an implication p q is present and p is true, then q is concluded to be true, otherwise backward chaining tries to prove p Efficient, but not complete May never terminate, might get stuck in infinite loop Exponential Goal-driven 104
105
Herbrand Model Semantics Herbrand universe All constants in the domain Herbrand base All ground atoms atoms over Herbrand universe Herbrand interpretation A set of ground atoms from Herbrand base that are true Herbrand model Herbrand interpretation that satisfies all clauses in the knowledge base 105
106
Advantages of SRL models over vanilla probabilistic models Compactly represent domain knowledge in first- order logic Employ logical inference to construct ground networks Enables parameter sharing 106
107
Parameter sharing in SRL 107 father(john) parent(john) father(mary) parent(mary) father(alice) parent(alice) dummy θ1θ1 θ1θ1 θ2θ2 θ2θ2 θ3θ3 θ3θ3
108
Parameter sharing in SRL father(X) parent(X) 108 father(john) parent(john) father(mary) parent(mary) father(alice) parent(alice) dummy θ θ θ θ θ θ θ θ
109
Noisy-and Model Several causes c i have to occur simultaneously if event e has to occur c i fails to trigger e with probability p i inh accounts for some unknown cause due to which e has failed to trigger P(e) = (I – inh) Π i (1-p i )^(1-c i ) 109
110
Noisy-or Model Several causes c i cause event e has to occur c i independently triggers e with probability p i leak accounts for some unknown cause due to which e has triggered P(e) = 1 – [(I – inh) Π i (1-p i )^(1-c i )] Models explaining away If there are several causes of an event, and if there is evidence for one of the causes, then the probability that the other causes have caused the event goes down 110
111
Noisy-and And Noisy-or Models 111 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire)
112
Noisy-and And Noisy-or Models 112 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) dummy1 dummy2 Noisy/logical-and Noisy-or
113
Inference in BLPs [Kersting and De Raedt, 2001] Logical inference Given a BLP and a query, SLD resolution is used to construct proofs for the query Bayesian network construction Each ground atom is a random variable Edges are added from ground atoms in the body to the ground atom in head CPTs specified by the conditional probability distribution for the corresponding clause P(X) = P(X i | Pa(X i )) Probabilistic inference Marginal probability given evidence Most Probable Explanation (MPE) given evidence 113
114
Learning in BLPs [Kersting and De Raedt, 2008] Parameter learning Expectation Maximization Gradient-ascent based learning Both approaches optimize likelihood Structure learning Hill climbing search through the space of possible structures Initial structure obtained from CLAUDIEN [De Raedt and Dehaspe, 1997] Learns from only positive examples 114
115
Expectation Maximization for BLPs/BALPs Perform logical inference to construct a ground Bayesian network for each example Let r denote rule, X denote a node, and Pa(X) denote parents of X E Step The inner sum is over all groundings of rule r M Step 115 * * * From SRL tutorial at ECML 07 115
116
Decomposable Combining Rules Express different influences using separate nodes These nodes can be combined using a deterministic function 116
117
Combining Rules 117 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) dummy1 dummy2 Logical-and Noisy-or
118
Decomposable Combining Rules 118 alarm(james) burglary(james) neighborhood(james) lives(james,yorkshire)tornado(yorkshire) dummy1 dummy2 Logical-and Noisy-or dummy1-newdummy2 -new Logical-or
119
BLPs vs. PLPs Differences in representation In BLPs, Bayesian atoms take finite set of values, but in PLPs, each atom is logical in nature and it can take true or false Instead of having neighborhood(x) = bad, in PLPs, we have neighborhood(x,bad) To compute probability of a query alarm(james), PLPs have to construct one proof tree for all possible values for all predicates Inference is cumbersome BLPs subsume PLPs 119
120
BLPs vs. Poole's Horn Abduction Differences in representation For example, if P(x) and R(x) are two competing hypothesis, then either P(x) could be true or R(x) could be true Prior probabilities of P(x) and R(x) should sum to 1 Restrictions of these kind are are not these in BLPs PLPs and hence BLPs are more flexible and have a richer representation 120
121
BLPs vs. PRMs BLPs subsume PRMs PRMs use entity-relationship models to represent knowledge and they use KBMC-like construction to construct a ground Bayesian network Each attribute becomes a random variable in the ground network and relations over the entities are logical constraints In BLP, each attribute becomes a Bayesian atom and relations become logical atoms Aggregator functions can be transformed into combining rules 121
122
BLPs vs. RBNs BLPs subsume RBNs In RBNs, each node in BN is a predicate and probability formulae are used to specify probabilities Combining rules can be used to represent these probability formulae in BLPs. 122
123
BALPs vs. BLPs Like BLPs, BALPs use logic programs as templates for constructing Bayesian networks Unlike BLPs, BALPs uses logical abduction instead of deduction to construct the network 123
124
Monroe [Blaylock and Allen, 2005] Task Recognize top level plans in an emergency response domain (artificially generated using HTN planner) Plans include set-up-shelter, clear-road-wreck, provide- medical-attention Single correct plan in each example Domain consists of several entities and sub-goals Test the ability to scale to large domains Data Contains 1000 examples 124
125
Monroe Methodology Knowledge base constructed based on the domain knowledge encoded in planner Learn noisy-or parameters using EM Compute marginal probability for instances of top level plans and pick the one with the highest marginal probability Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o Blaylock and Allen’s system [Blaylock and Allen, 2005] Convergence score - measures the fraction of examples for which the plan schema was predicted correctly 125
126
Learning Results - Monroe 126 MWMW-StartRand-Start Conv Score98.4 Acc-10079.16 79.86 Acc-7546.0644.6344.73 Acc-5020.6720.2619.7 Acc-257.27.3310.46
127
Linux [Blaylock and Allen, 2005] Task Recognize top level plans based on Linux commands Human users asked to perform tasks in Linux and commands were recorded Top-level plans include find-file-by-ext, remove-file-by-ext, copy-file, move-file Single correct plan in each example Tests the ability to handle noise in data o Users indicate success even when they have not achieved the task correctly o Some top-level plans like find-file-by-ext and file-file-by- name have identical actions Data Contains 457 examples 127
128
Linux Methodology Knowledge base constructed based on the knowledge of Linux commands Learn noisy-or parameters using EM Compute marginal probability for instances of top level plans and pick the one with the highest marginal probability Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o Blaylock and Allen’s system [ Blaylock and Allen, 2005] Convergence score - measures the fraction of examples for which the plan schema was predicted correctly o 128
129
Learning Results - Linux 129 Accuracy Partial Observability MWMW-StartRand-Start Conv Score39.8246.641.57
130
Story Understanding [Charniak and Goldman, 1991; Ng and Mooney, 1992] Task Recognize character’s top level plans based on actions described in narrative text Logical representation of actions literals provided Top-level plans include shopping, robbing, restaurant dining, partying Multiple top-level plans in each example Tests the ability to predict multiple plans Data 25 development examples 25 test examples 130
131
Story Understanding Methodology Knowledge base constructed for ACCEL by Ng and Mooney [1992] Insufficient number of examples to learn parameters o Noisy-or parameters set to 0.9 o Noisy-and parameters set to 0.9 o Priors tuned on development set Compute MPE to get the best set of plans Systems compared o BALPs o MLN-HCAM [Singla and Mooney, 2011] o ACCEL-Simplicity [Ng and Mooney, 1992 ] o ACCEL-Coherence [Ng and Mooney, 1992] –Specific for Story Understanding 131
132
Other Applications of BALPs Medical diagnosis Textual entailment Computational biology Inferring gene relations based on the output of micro-array experiments Any application that requires abductive reasoning 132
133
ACCEL [Ng and Mooney, 1992] First-order logic based system for plan recognition Simplicity metric selects explanations that have the least number of assumptions Coherence metric selects explanations that connect maximum number of observations Measures explanatory coherence Specific to text interpretation 133
134
System by Blaylock and Allen [2005] Statistical n-gram models to predict plans based on observed actions Performs plan recognition in two phases Predicts the plan schema first Predicts arguments based on the predicted schema 134
135
Machine Reading Machine reading involves automatic extraction of knowledge from natural language text Approaches to machine reading Extract factual information like entities and relations between entities that occur in text [Cohen, 1999; Craven et al., 2000; Bunescu and Mooney, 2007; Etzioni et al, 2008] Extract commonsense knowledge about the domain [Nahm and Mooney, 2000; Carlson at el., 2010, Schoenmackers et al., 2010; Doppa et al., 2010] 135
136
Natural Language Text Natural language text is not necessarily complete Commonsense information is not explicitly stated in text Well known facts are omitted from the text Missing information has to be inferred from text Human brain performs deeper inference using commonsense knowledge IE systems have no access to commonsense knowledge 136 IE systems are limited to extracting information stated explicitly in text
137
IC ontology 57 entities 79 relations 137
138
Estimated Precision (preliminary results) Model BLP-ALLBLP-Pos-NegBLP-Pos-Only Baseline [No. facts inferred] 38,1511,51018,501 Estimated precision 24.94 (951/3813)* 16.00 (12/75)* 31.80 (588/1849)* 138 Total facts extracted – 12,254 * - x out of y inferred facts were correct. These inferred facts are from the 20 randomly sampled documents from test set for manual evaluation
139
Estimated Precision (preliminary results) Model BLP-ALLBLP-Pos-NegBLP-Pos-Only Baseline [No. facts inferred] 38,1511,51018,501 Estimated precision 24.94 (951/3813)* 16.00 (12/75)* 31.80 (588/1849)* BLP-0.9 [No. facts inferred] 12,82388011,717 Estimated precision 34.68 (419/1208)* 13.63 (9/66)* 35.49 (411/1158)* 139 Total facts extracted – 12,254 * - x out of y inferred facts were correct. These inferred facts are from the 20 randomly sampled documents from test set for manual evaluation
140
Estimated Precision (preliminary results) Model BLP-ALLBLP-Pos-NegBLP-Pos-Only Baseline [No. facts inferred] 38,1511,51018,501 Estimated precision 24.94 (951/3813)* 16.00 (12/75)* 31.80 (588/1849)* BLP-0.9 [No. facts inferred] 12,82388011,717 Estimated precision 34.68 (419/1208)* 13.63 (9/66)* 35.49 (411/1158)* BLP-0.95 [No. of facts inferred] 82611949 Estimated precision 91.07 (51/56)* 100 (1/1)* Nil (0/0)* 140 Total facts extracted – 12,254 * - x out of y inferred facts were correct. These inferred facts are from the 20 randomly sampled documents from test set for manual evaluation
141
Estimated Recall 141 “Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA……." Extracted facts nationState(USA) hasBirthPlace(BarackObama,USA) Person(BarackObama) citizenOf(BarackObama,USA) isLedBy(USA,BarackObama) Extracted facts nationState(USA) hasBirthPlace(BarackObama,USA) Person(BarackObama) citizenOf(BarackObama,USA) isLedBy(USA,BarackObama) Rule hasBirthPlace(A,B) citizenOf(A,B) Rule hasBirthPlace(A,B) citizenOf(A,B)
142
Estimated Recall for citizenOf 142 Extracted facts nationState(USA) hasBirthPlace(BarackObama,USA) Person(BarackObama) citizenOf(BarackObama,USA) isLedBy(USA,BarackObama) Extracted facts nationState(USA) hasBirthPlace(BarackObama,USA) Person(BarackObama) citizenOf(BarackObama,USA) isLedBy(USA,BarackObama) hasBirthPlace(A,B) citizenOf(A,B) hasBirthPlace(BarackObama,USA) citizenOf(BarackObama,USA) Estimated Recall – 1/1
143
Estimated Recall for hasBirthPlace 143 Extracted facts nationState(USA) hasBirthPlace(BarackObama,USA) Person(BarackObama) citizenOf(BarackObama,USA) isLedBy(USA,BarackObama) Extracted facts nationState(USA) hasBirthPlace(BarackObama,USA) Person(BarackObama) citizenOf(BarackObama,USA) isLedBy(USA,BarackObama) hasBirthPlace(A,B) citizenOf(A,B) Estimated Recall – 0/1 ? ?
144
Proposed Work Learn first-order rules from incomplete and uncertain data Facts extracted from natural language text are incomplete o Existing rule/structure learners assume that data is complete Extractions from IE system have associated confidence scores o Existing structure learners cannot use these extraction probabilities Absence of negative instances o Most rule learners require both positive and negative instances o Closed world assumption does not hold for machine reading 144
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.