Bayesian Logic Programs for Plan Recognition and Machine Reading
Sindhu Raghavan
Advisor: Raymond Mooney
PhD Oral Defense, Nov 29th,
Outline
– Motivation
– Background
  – Bayesian Logic Programs (BLPs)
– Plan Recognition
– Machine Reading
  – BLPs for inferring implicit facts
  – Online Rule Learning
  – Scoring Rules using WordNet
– Future Work
– Conclusions
Machine Reading
– Machine reading involves the automatic extraction of knowledge from natural language text
– Example: “Barack Obama is the current President of the USA……. Obama was born on August 4, 1961, in Hawaii, USA…….”
– Extracted facts: nationState(usa), person(barackobama), isLedBy(usa,barackobama), hasBirthPlace(barackobama,usa), employs(usa,barackobama)
– Data is relational in nature: several entities and several relations between them
Characteristics of Real-World Data
– Relational or structured data
  – Several entities in the domain
  – Several relations between entities
  – Not always independent and identically distributed (i.i.d.)
– Presence of noise or uncertainty
  – Uncertainty in the types of entities
  – Uncertainty in the relations
– Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both
Statistical Relational Learning (SRL)
– Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007]
  – Overcomes the limitations of traditional approaches
– SRL formalisms
  – Stochastic Logic Programs (SLPs) [Muggleton, 1996]
  – Probabilistic Relational Models (PRMs) [Friedman et al., 1999]
  – Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
  – Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
– Integrate first-order logic and Bayesian networks
– Why BLPs?
  – Efficient grounding mechanism that includes only those variables that are relevant to the query
  – Easy to extend by incorporating any type of logical inference to construct networks
  – Well suited for capturing causal relations in data
Objectives
– Plan recognition: predicting the top-level plan of an agent based on its observed actions
– Machine reading: automatic extraction of knowledge from natural language text
Common characteristics
– Inference and learning from partially observed or incomplete data
– Plan recognition
  – The top-level plan is not observed
  – Some of the executed actions can be unobserved
– Machine reading
  – Information that is implicit is rarely observed in data
  – Common sense knowledge is not always explicitly stated
Thesis Contributions
– Plan recognition
  – Bayesian Abductive Logic Programs (BALPs) [ECML 2011]
– Machine reading
  – BLPs for learning to infer implicit facts from natural language text [ACL 2012]
  – Online rule learner for learning common sense knowledge from natural language extractions [In Submission]
  – Approach to scoring first-order rules (common sense knowledge) using WordNet [In Submission]
Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
– A set of Bayesian clauses a | a_1, a_2, ..., a_n
  – Definite clauses that are universally quantified
  – Range-restricted, i.e., variables(head) ⊆ variables(body)
  – Each clause has an associated conditional probability table (CPT) P(head | body)
– Bayesian predicates a, a_1, a_2, ..., a_n have finite domains
– A combining rule such as noisy-or maps the multiple CPTs for the same head into a single CPT
– Given a set of Bayesian clauses and a query, SLD resolution is used to construct a ground Bayesian network for probabilistic inference
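The grounding step can be illustrated with a minimal Python sketch. It is a hypothetical, simplified SLD resolver, not Kersting and De Raedt's implementation: it assumes function-free atoms written as tuples, upper-case strings as variables, a non-recursive clause set, and extracted facts treated as unit clauses. Every clause instance used in a proof of the query contributes one family (body atoms as parents of the head atom) to the ground network; CPTs and the combining rule are left out.

```python
from itertools import count

# Atoms are tuples (predicate, arg1, arg2, ...). Upper-case strings are
# variables, everything else is a constant (an assumption of this sketch).


def is_var(term):
    return term[:1].isupper()


def walk(term, subst):
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    # Unify two function-free atoms; return the extended substitution or None.
    if a[0] != b[0] or len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


def rename(clause, n):
    # Standardize a clause apart by suffixing its variables with a fresh index.
    def r(atom):
        return (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    head, body = clause
    return r(head), [r(a) for a in body]


def ground_network(query, clauses, facts):
    """Collect every ground clause instance used in an SLD proof of `query`."""
    fresh = count()

    def prove(goals, subst, used):
        if not goals:
            yield subst, used
            return
        goal, rest = goals[0], goals[1:]
        for fact in facts:  # resolve the leftmost goal against a fact
            s = unify(goal, fact, subst)
            if s is not None:
                yield from prove(rest, s, used)
        for clause in clauses:  # ... or against a renamed clause head
            head, body = rename(clause, next(fresh))
            s = unify(goal, head, subst)
            if s is not None:
                yield from prove(body + rest, s, used + [(head, body)])

    def ground(atom, subst):
        return (atom[0],) + tuple(walk(t, subst) for t in atom[1:])

    network = set()
    for subst, used in prove([query], {}, []):
        for head, body in used:
            network.add((ground(head, subst), tuple(ground(a, subst) for a in body)))
    return network


clauses = [
    (("hasCitizenship", "A", "B"), [("nationState", "B"), ("person", "A"), ("isLedBy", "B", "A")]),
    (("hasCitizenship", "A", "B"), [("nationState", "B"), ("person", "A"), ("employs", "B", "A")]),
]
facts = {("nationState", "usa"), ("person", "barackobama"),
         ("isLedBy", "usa", "barackobama"), ("employs", "usa", "barackobama")}
network = ground_network(("hasCitizenship", "barackobama", "usa"), clauses, facts)
for head, body in sorted(network):
    print(head, "<-", body)
```

Only atoms reachable from the query enter the network, which is the "efficient grounding" property listed among the reasons for using BLPs.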
Probabilistic Inference and Learning
– Probabilistic inference
  – Marginal probability
    – Exact inference
    – SampleSearch [Gogate and Dechter, 2007]
– Learning [Kersting and De Raedt, 2008]
  – Parameters
    – Expectation Maximization
    – Gradient-ascent based learning
Plan Recognition
– Predict an agent’s top-level plan based on its observed actions
– Requires abductive reasoning: inferring causes from their effects
– Since the SLD resolution used in BLPs is deductive, BLPs cannot be used as is for plan recognition
Extending BLPs for Plan Recognition
– BLPs + logical abduction (Stickel’s abduction algorithm) → BALPs
– BALPs: Bayesian Abductive Logic Programs
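The difference between deduction and abduction can be made concrete with a small Python sketch. It is a ground, propositional simplification in the spirit of Stickel's algorithm, not the BALP implementation: there is no unification and no assumption cost, rules are written as (effect, [causes]), the rule set is assumed acyclic, and the toy plan library is hypothetical. Instead of proving observations from facts, it backchains from each observed action and assumes any literal that no rule concludes, reusing earlier assumptions so that one plan can explain several actions.

```python
# Ground simplification of abductive backchaining (hypothetical sketch).
# Rules are (effect, [causes]); the rule set is assumed to be acyclic.


def explain(observations, rules):
    """Return every set of assumed literals that explains all observations."""
    concluded = {head for head, _ in rules}

    def prove(literal, assumed):
        if literal in assumed:
            # Reuse an earlier assumption, so one plan can explain several actions.
            yield assumed
        elif literal not in concluded:
            # No rule concludes this literal, so it can only be assumed.
            yield assumed | {literal}
        else:
            # Backchain through every rule whose head is this literal.
            for head, body in rules:
                if head == literal:
                    yield from prove_all(body, assumed)

    def prove_all(literals, assumed):
        if not literals:
            yield assumed
            return
        for extended in prove(literals[0], assumed):
            yield from prove_all(literals[1:], extended)

    return set(prove_all(list(observations), frozenset()))


# Hypothetical toy plan library: each rule says that a plan causes an action.
rules = [
    ("go(john,store)", ["shopping(john,store)"]),
    ("pay(john,store)", ["shopping(john,store)"]),
    ("go(john,store)", ["robbing(john,store)"]),
]
explanations = explain(["go(john,store)", "pay(john,store)"], rules)
for e in sorted(explanations, key=len):
    print(sorted(e))
```

Here the single assumption shopping(john,store) explains both actions, while robbing(john,store) needs shopping(john,store) as well. In a BALP, such abductive explanations would be used to build the ground Bayesian network, so that candidate plans are ranked by probability rather than by explanation size.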
Experimental Evaluation
– Data
  – Monroe [Blaylock and Allen, 2005]
  – Linux [Blaylock and Allen, 2005]
  – Story Understanding [Ng and Mooney, 1992]
– Systems compared
  – BALPs
  – MLN-HCAM [Singla and Mooney, 2011]
  – Blaylock and Allen’s system [Blaylock and Allen, 2005]
  – ACCEL-Simplicity [Ng and Mooney, 1992]
  – ACCEL-Coherence [Ng and Mooney, 1992]
Summary of Results
– Monroe and Linux
  – BALPs outperform both MLN-HCAM and the system by Blaylock and Allen
– Story Understanding
  – BALPs outperform both MLN-HCAM and ACCEL-Simplicity
  – ACCEL-Coherence outperforms BALPs and the other systems
    – Specifically developed for text interpretation
    – Automatic learning of model parameters using EM
Machine Reading
– Natural language text is typically “incomplete”
  – Some information is always implicit
  – Common sense information is not always explicitly stated
  – Grice’s maxim of quantity [Grice, 1975]
– Information extraction (IE) systems extract information that is explicitly stated [Cowie and Lehnert, 1996; Sarawagi, 2008]
  – They cannot extract information that is implicit
Example
– Natural language text: “Barack Obama is the President of the United States of America.”
– Query: “Barack Obama is a citizen of what country?”
– IE systems cannot answer this query since citizenship information is not explicitly stated.
Objective
– Infer implicit facts from explicitly stated information
  – Extract explicitly stated facts using an off-the-shelf IE system
  – Learn common sense knowledge in the form of first-order rules to deduce additional facts
  – Use BLPs for inference of additional facts
Related Work
– Logical deduction based approaches
  – Learning propositional rules [Nahm and Mooney, 2000]
    – Purely logical deduction is brittle since it cannot assign probabilities to inferences
  – Learning probabilistic first-order rules using FOIL and FARMER [Carlson et al., 2010; Doppa et al., 2010]
    – Probabilities are not computed using well-founded probabilistic graphical models
– MLN-based approaches for inferring additional facts [Schoenmackers et al., 2010; Sorower et al., 2011]
  – “Brute force” inference could result in intractably large networks for large domains
  – Scaling of MLNs to large domains [Schoenmackers et al., 2010; Niu et al., 2012]
Objectives
– BLPs for learning to infer implicit facts from natural language text
– Online rule learner for learning common sense knowledge from natural language extractions
– Approach to scoring first-order common sense knowledge using WordNet
System Architecture
– Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Rule Learner → First-Order Logical Rules → BLP Weight Learner → Bayesian Logic Program (BLP) → BLP Inference Engine (applied to Test Document Extractions) → Inferences with probabilities
– Example training text: “Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA”
– Extracted facts: nationState(USA), person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
– Learned rules
  – nationState(B) ∧ isLedBy(B,A) → hasCitizenship(A,B)
  – nationState(B) ∧ employs(B,A) → hasCitizenship(A,B)
– Bayesian clauses with noisy-or parameters
  – hasCitizenship(A,B) | nationState(B), isLedBy(B,A). (0.9)
  – hasCitizenship(A,B) | nationState(B), employs(B,A). (0.6)
– Test document extractions: nationState(malaysia), person(mahathir-mohamad), isLedBy(malaysia,mahathir-mohamad), employs(malaysia,mahathir-mohamad)
– Inference: hasCitizenship(mahathir-mohamad,malaysia) with probability 0.75
System Architecture
– Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Inductive Logic Programming (LIME) → First-Order Logical Rules → BLP Weight Learner → Bayesian Logic Program (BLP) → BLP Inference Engine (applied to Test Document Extractions) → Inferences with probabilities
Inductive Logic Programming (ILP) for learning first-order rules
– Input to the ILP rule learner
  – Target relation: hasCitizenship(X,Y)
  – Positive instances: hasCitizenship(BarackObama,USA), hasCitizenship(GeorgeBush,USA), hasCitizenship(IndiraGandhi,India)
  – Negative instances, generated using the closed-world assumption: hasCitizenship(BarackObama,India), hasCitizenship(GeorgeBush,India), hasCitizenship(IndiraGandhi,USA)
  – KB: hasBirthPlace(BarackObama,USA), person(BarackObama), nationState(USA), nationState(India)
– Output rules: nationState(Y) ∧ person(X) ∧ isLedBy(Y,X) → hasCitizenship(X,Y)
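Generating negatives under the closed-world assumption can be sketched in a few lines of Python. The sketch assumes that candidate pairs are every known person crossed with every nationState in the KB; the function name is hypothetical. Any pair not listed as positive becomes a negative, so a dual citizen who is simply not mentioned as such would be labeled negative, which is one way artificially generated negatives become noisy.

```python
def closed_world_negatives(positives, persons, nations):
    """Closed-world assumption (hypothetical helper): every (person, nation)
    pair that is not a known positive hasCitizenship instance is negative."""
    known = set(positives)
    return sorted((p, n) for p in persons for n in nations if (p, n) not in known)


positives = [("BarackObama", "USA"), ("GeorgeBush", "USA"), ("IndiraGandhi", "India")]
persons = sorted({p for p, _ in positives})
nations = ["USA", "India"]  # taken from the nationState facts in the KB
negatives = closed_world_negatives(positives, persons, nations)
print(negatives)
```

For this KB the three generated pairs coincide with the negative instances listed on the slide.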
Inference using BLPs
– Test document: “Barack Obama is the current President of the USA……. Obama was born on August 4, 1961, in Hawaii, USA…….”
– Extracted facts: nationState(usa), person(barackobama), isLedBy(usa,barackobama), hasBirthPlace(barackobama,usa), employs(usa,barackobama)
– Learned rules
  – nationState(B) ∧ person(A) ∧ isLedBy(B,A) → hasCitizenship(A,B)
  – nationState(B) ∧ person(A) ∧ employs(B,A) → hasCitizenship(A,B)
Logical Inference: Proof 1
– Rule: nationState(B) ∧ person(A) ∧ isLedBy(B,A) → hasCitizenship(A,B)
– Facts used: nationState(usa), person(barackobama), isLedBy(usa,barackobama)
– Conclusion: hasCitizenship(barackobama,usa)
Logical Inference: Proof 2
– Rule: nationState(B) ∧ person(A) ∧ employs(B,A) → hasCitizenship(A,B)
– Facts used: nationState(usa), person(barackobama), employs(usa,barackobama)
– Conclusion: hasCitizenship(barackobama,usa)
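The two proofs above can be reproduced with a small forward-chaining sketch in Python. It applies each rule once to the extracted facts with a naive join, assuming function-free, range-restricted rules with upper-case variables; the helper names are hypothetical, and this is not the Logical Deduction baseline used in the experiments. It records every proof of each conclusion, which is exactly the information purely logical deduction cannot turn into a probability.

```python
# Atoms are tuples (predicate, arg, ...); upper-case arguments are variables.


def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals the ground `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term[:1].isupper():
            # A variable must keep the value it was first bound to.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def deduce(rules, facts):
    """Apply each rule once to the facts; map each conclusion to its proofs."""
    proofs = {}
    for body, head in rules:
        partial = [({}, [])]  # (variable binding, facts used so far)
        for atom in body:
            extended = []
            for binding, used in partial:
                for fact in facts:
                    b = match(atom, fact, binding)
                    if b is not None:
                        extended.append((b, used + [fact]))
            partial = extended
        for binding, used in partial:
            conclusion = (head[0],) + tuple(binding.get(t, t) for t in head[1:])
            proofs.setdefault(conclusion, []).append(tuple(used))
    return proofs


facts = [("nationState", "usa"), ("person", "barackobama"),
         ("isLedBy", "usa", "barackobama"), ("hasBirthPlace", "barackobama", "usa"),
         ("employs", "usa", "barackobama")]
rules = [
    ([("nationState", "B"), ("person", "A"), ("isLedBy", "B", "A")], ("hasCitizenship", "A", "B")),
    ([("nationState", "B"), ("person", "A"), ("employs", "B", "A")], ("hasCitizenship", "A", "B")),
]
proofs = deduce(rules, facts)
for conclusion, found in proofs.items():
    print(conclusion, "has", len(found), "proofs")
```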
Bayesian Network Construction
– Nodes: nationState(usa), person(barackobama), isLedBy(usa,barackobama), employs(usa,barackobama), hasCitizenship(barackobama,usa)
Bayesian Network Construction
– dummy1: logical AND of nationState(usa), person(barackobama), isLedBy(usa,barackobama)
– dummy2: logical AND of nationState(usa), person(barackobama), employs(usa,barackobama)
– hasCitizenship(barackobama,usa): noisy-or of dummy1 and dummy2
– Marginal probability of hasCitizenship(barackobama,usa)?
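The marginal for this small network can be computed by brute-force enumeration, sketched below in Python. Assumptions: the body atoms are independent root nodes with given priors (evidence is encoded as prior 1.0), there is no leak term, and the noisy-or parameters 0.9 and 0.6 are borrowed from the example Bayesian clauses in the system architecture. The sketch is exponential in the number of body atoms and is only meant to make the logical-AND and noisy-or semantics concrete; it is not the inference engine used in the thesis.

```python
from itertools import product


def marginal(instances, noisy_or, prior):
    """Exact P(head = true) by enumerating every assignment to the body atoms.

    instances: one tuple of body atoms per ground clause (one dummy node each)
    noisy_or:  noisy-or parameter of each clause, in the same order
    prior:     P(atom = true) for every body atom; evidence is encoded as 1.0
    """
    atoms = sorted({a for body in instances for a in body})
    total = 0.0
    for values in product([True, False], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        weight = 1.0
        for atom in atoms:  # body atoms are treated as independent root nodes
            weight *= prior[atom] if world[atom] else 1.0 - prior[atom]
        if weight == 0.0:
            continue
        p_false = 1.0
        for body, p in zip(instances, noisy_or):
            if all(world[a] for a in body):  # dummy node = logical AND of its body
                p_false *= 1.0 - p  # noisy-or: each true dummy fails independently
        total += weight * (1.0 - p_false)
    return total


instances = [
    ("nationState(usa)", "person(barackobama)", "isLedBy(usa,barackobama)"),
    ("nationState(usa)", "person(barackobama)", "employs(usa,barackobama)"),
]
noisy_or = [0.9, 0.6]
evidence = {a: 1.0 for body in instances for a in body}
print(marginal(instances, noisy_or, evidence))  # equals 1 - (1 - 0.9) * (1 - 0.6)
```

Enumerating worlds, rather than multiplying per-clause probabilities, matters because both dummy nodes share the parents nationState(usa) and person(barackobama).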
Experimental Evaluation
– Data
  – DARPA’s intelligence community (IC) data set from the Machine Reading Project (MRP)
  – Consists of news articles on politics, terrorism, and other international events
  – 10,000 documents in total
– 10-fold cross-validation
Experimental Evaluation
– Learning first-order rules using LIME [McCreath and Sharma, 1998]
  – Learn rules for 13 target relations
  – Learn rules both from positive and negative instances and from positive instances only
  – Include all unique rules learned from the different models
– Learning BLP parameters
  – Learn noisy-or parameters using Expectation Maximization (EM)
  – Set priors to maximum likelihood estimates
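A minimal sketch of EM for noisy-or parameters, assuming the standard latent-cause formulation: each rule whose body holds independently "fires" with probability p_i, and the head is true if any rule fires. This is not the BLP learner of Kersting and De Raedt [2008], and the example counts are hypothetical. It also shows why rules whose heads are rarely extracted receive low weights: an unextracted head is counted as false.

```python
def em_noisy_or(examples, n_rules, iterations=50, init=0.5):
    """EM for noisy-or parameters with one latent cause per satisfied rule.

    examples: (active, y) pairs; `active` lists the rules whose bodies hold for a
              ground head atom, and y says whether that head was observed true.
    """
    p = [init] * n_rules
    for _ in range(iterations):
        expected = [0.0] * n_rules  # expected number of times each rule fired
        counts = [0] * n_rules  # number of examples in which each rule was active
        for active, y in examples:
            p_none = 1.0
            for i in active:
                p_none *= 1.0 - p[i]
            p_y = 1.0 - p_none  # P(y = 1) under noisy-or
            for i in active:
                counts[i] += 1
                if y and p_y > 0.0:
                    # E-step: a firing cause forces y = 1, so
                    # P(cause i fired | y = 1) = p_i / P(y = 1); for y = 0 it is 0.
                    expected[i] += p[i] / p_y
        # M-step: maximum likelihood estimate of each Bernoulli cause.
        p = [expected[i] / counts[i] if counts[i] else p[i] for i in range(n_rules)]
    return p


# Hypothetical data: rule 0 concludes an often-extracted relation,
# rule 1 an implicit relation whose head is rarely extracted.
examples = ([([0], True)] * 7 + [([0], False)] * 3
            + [([1], True)] * 2 + [([1], False)] * 8)
print(em_noisy_or(examples, n_rules=2))
```

With non-overlapping rules the estimates reduce to the observed head frequencies (0.7 and 0.2 here); when several rules are active for the same head, the E-step splits the credit among them.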
Experimental Evaluation
– Performance evaluation
  – No ground truth is available for evaluation
  – Manually evaluated the facts inferred from 40 documents, randomly selected from each test set
  – Compute precision: the fraction of inferences that are correct
  – Compute two precision scores
    – Unadjusted (UA): does not account for the extractor’s mistakes
    – Adjusted (AD): accounts for the extractor’s mistakes
  – Rank inferences by marginal probability and evaluate the top-n
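Top-n precision can be sketched as follows in Python. The label set is hypothetical, and the way AD is computed here is an assumption (one plausible reading of "accounts for the extractor's mistakes"): inferences that are wrong only because of an extraction error are dropped from the top-n before precision is computed, whereas UA counts them as incorrect.

```python
def precision_at_n(inferences, n, adjusted=False):
    """Precision of the n most probable inferences.

    inferences: (marginal_probability, label) pairs, where label is one of
    "correct", "incorrect", or "extraction_error" (hypothetical labels).
    """
    top = sorted(inferences, key=lambda item: item[0], reverse=True)[:n]
    if adjusted:
        # Assumed AD variant: ignore inferences that are wrong only because
        # the extractor made a mistake.
        top = [item for item in top if item[1] != "extraction_error"]
    if not top:
        return 0.0
    return sum(label == "correct" for _, label in top) / len(top)


ranked = [(0.96, "correct"), (0.90, "extraction_error"),
          (0.75, "correct"), (0.60, "incorrect")]
print(precision_at_n(ranked, 4), precision_at_n(ranked, 4, adjusted=True))
```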
Experimental Evaluation
– Systems compared
  – BLP Learned Weights: noisy-or parameters learned using EM
  – BLP Manual Weights: noisy-or parameters set to 0.9
  – Logical Deduction
  – MLN Learned Weights: weights learned using a generative online weight learner
  – MLN Manual Weights: a weight of 10 assigned to all rules and MLE priors to all predicates
[Figure: Unadjusted Precision]
Inferior performance of EM
– Insufficient training data
– Lack of ground truth information for relations that can be inferred
  – Implicit relations are seen less frequently in the training data
  – EM learns lower weights for rules corresponding to implicit relations
Performance of MLNs
– Inferior performance of MLNs
  – Insufficient training data for learning
  – Use of the closed-world assumption for inference and learning
  – Lack of a strictly typed ontology: e.g., a GeopoliticalEntity could be an Agent as well as a Location
– Improvements to MLNs
  – Integrity constraints to avoid inference of spurious facts like employs(a,a)
  – Incorporate techniques proposed by Sorower et al. [2011]
Limitations of LIME
– Assumes data is accurate
  – Artificially generated negative instances are usually noisy and inaccurate
  – Extraction errors result in noisy data
– Does not scale to large corpora
– Goal: develop an approach that can learn first-order rules from noisy and incomplete IE extractions
Online Rule Learning
– Incorporates the incomplete nature of natural language text
  – Body consists of relations that are explicitly stated
  – Head is a relation that can be inferred
– Relations that are implicit occur less frequently than those that are explicitly stated
  – Use frequency of occurrence as a heuristic to distinguish the two types of relations
– Process examples in an online manner to scale to large corpora
Approach
– Learning from positive instances only
– For each example, construct a directed graph of relation extractions
– Add directed edges between nodes that share one or more constants
  – Relations connected by edges are related and participate in the same rule
– Traverse the graph to learn first-order rules
Example
– Text: “Barack Obama is the current President of the USA……. Obama, citizen of the USA was born on August 4, 1961, in Hawaii, USA…….”
– Extracted facts: nationState(USA), person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
Example: entities
– nationState(USA), person(BarackObama)
Example: relations
– isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
Directed graph construction
– Nodes: isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
– Relation frequencies: isLedBy 33, hasBirthPlace 25, hasCitizenship 17
– Edge directions between the nodes are determined from these frequencies
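Graph construction for one example can be sketched in Python. Assumptions, not stated as such in the slides: an edge points from the more frequent relation to the less frequent one (consistent with the learned rules, whose heads are the rarer relations), relations with equal frequency are left unconnected, and the function name is hypothetical. The frequencies are those listed above.

```python
def build_graph(extractions, frequency):
    """Directed graph over the relation extractions of one example (hypothetical).

    Two extractions are linked when they share at least one constant; the edge
    points from the more frequent relation to the less frequent one, following
    the heuristic that implicit relations are extracted less often.
    """
    edges = set()
    for a in extractions:
        for b in extractions:
            shares_constant = bool(set(a[1:]) & set(b[1:]))
            if a != b and shares_constant and frequency[a[0]] > frequency[b[0]]:
                edges.add((a, b))
    return edges


extractions = [("isLedBy", "USA", "BarackObama"),
               ("hasBirthPlace", "BarackObama", "USA"),
               ("hasCitizenship", "BarackObama", "USA")]
frequency = {"isLedBy": 33, "hasBirthPlace": 25, "hasCitizenship": 17}
edges = build_graph(extractions, frequency)
for source, target in sorted(edges):
    print(source[0], "->", target[0])
```

Each of the three edges corresponds to one of the rules learned from this example.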
Graph Traversal
– isLedBy(USA,BarackObama) → hasBirthPlace(BarackObama,USA)
Graph Traversal
– Add entity types: isLedBy(USA,BarackObama) ∧ person(BarackObama) ∧ nationState(USA) → hasBirthPlace(BarackObama,USA)
Graph Traversal
– Replace constants with variables: isLedBy(X,Y) ∧ person(Y) ∧ nationState(X) → hasBirthPlace(Y,X)
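The three traversal steps (take an edge, add the entity types of its constants, replace constants by variables) can be sketched in Python. This is a hypothetical helper that handles one edge at a time; the variable names are assigned in order of first appearance from an assumed fixed pool, so body atoms may come out in a different order than on the slides, which does not change the conjunction.

```python
VARIABLES = "XYZUVW"  # assumed variable pool, enough for short rules


def edge_to_rule(source, target, entity_types):
    """Turn one graph edge (source -> target) into a first-order rule string.

    The body is the source relation plus the entity type of each constant;
    constants are then replaced by variables in order of first appearance.
    """
    names = {}

    def var(constant):
        if constant not in names:
            names[constant] = VARIABLES[len(names)]
        return names[constant]

    def fmt(atom):
        return f"{atom[0]}({','.join(var(c) for c in atom[1:])})"

    constants = list(dict.fromkeys(source[1:] + target[1:]))
    body = [source] + [(entity_types[c], c) for c in constants if c in entity_types]
    return " ∧ ".join(fmt(a) for a in body) + " → " + fmt(target)


types = {"USA": "nationState", "BarackObama": "person"}
print(edge_to_rule(("isLedBy", "USA", "BarackObama"),
                   ("hasBirthPlace", "BarackObama", "USA"), types))
```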
Rules learned
– isLedBy(X,Y) ∧ person(Y) ∧ nationState(X) → hasBirthPlace(Y,X)
– isLedBy(X,Y) ∧ person(Y) ∧ nationState(X) → hasCitizenship(Y,X)
– hasBirthPlace(X,Y) ∧ person(X) ∧ nationState(Y) → hasCitizenship(X,Y)
Sample rules
– employs(X,Y) ∧ commercialOrganization(X) → hasMemberPerson(X,Y)
– isLedBy(X,Y) ∧ nationState(X) → hasCitizenship(Y,X)
– isLedBy(X,Y) ∧ nationState(X) ∧ person(Y) → hasBirthPlace(Y,X)
Experimental Evaluation
– Learn first-order rules for two sets of target relations
  – Full-set: 14 target relations
  – Subset: 10 target relations
– Manually set noisy-or parameters to 0.9
– Systems compared
  – Online Rule Learner (ORL)
  – LIME [McCreath and Sharma, 1998]
  – Combined
[Figure: results on the Full-set]
Inferior performance of ORL on the Full-set
– Several incorrect inferences with high marginal probabilities
  – Instances of thingPhysicallyDamaged and eventLocationGPE
  – High probabilities due to multiple rules inferring these instances
  – Rules are not very accurate, resulting in inaccurate inferences
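The effect of multiple rules follows directly from the noisy-or combining rule. Assuming no leak term, k applicable rules whose bodies all hold, and every parameter at the manually set value of 0.9, the marginal of the shared head grows quickly with k, regardless of how accurate each individual rule is:

```latex
P(\text{head} = \text{true}) = 1 - \prod_{i=1}^{k} (1 - p_i) = 1 - (1 - 0.9)^{k},
\qquad k = 1: 0.9, \quad k = 2: 0.99, \quad k = 3: 0.999
```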
[Figure: results on the Subset]
Running Time
– LIME
  – Learns rules for one target relation at a time
  – Includes the time taken to learn from positive instances only and from positive and negative instances
– ORL
  – Learns rules for all target relations at once
– ORL: 3.8 mins; LIME: 11.23 hrs
Scoring first-order rules
– Predicate names employ English words
– Confident rules typically have predicates whose words are semantically related
– Use word similarity or relatedness to calculate weights
  – Word similarity computed using WordNet
– Compute weights between 0 and 1, which are then used as noisy-or parameters
  – Higher weights indicate more confident rules
WordNet [Fellbaum, 1998]
– Lexical knowledge base consisting of 130,000 English words
– Nouns, verbs, adjectives, and adverbs organized into “synsets” (synonym sets)
– wup [Wu and Palmer, 1994] similarity measure to compute word similarity
  – Computes scaled similarity scores between 0 and 1
  – Divides twice the depth of the least common subsumer of the given words by the sum of the depths of the given words
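Written out, the Wu and Palmer measure for two concepts c_1 and c_2, with lcs their least common subsumer in the WordNet hierarchy, is:

```latex
\mathrm{wup}(c_1, c_2) = \frac{2 \cdot \mathrm{depth}\big(\mathrm{lcs}(c_1, c_2)\big)}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}
```

It equals 1 when the two concepts coincide and shrinks as their common ancestor lies higher in the hierarchy. Because WordNet scores synsets rather than words, a common convention when applying it to words is to take the maximum over their synset pairs; whether the thesis used that convention is not stated in these slides.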
Scoring rules using WUP
– Compute word similarity using wup for every pair of words (w_i, w_j)
  – w_i refers to words in the body
  – w_j refers to words in the head
– Compute the average similarity over all pairs of words
– Predicate names like hasCitizenship and hasMember are segmented into has, citizenship, and member
  – Stop words are removed
Example
– Rule: employs(X,Y) ∧ governmentOrganization(X) → hasMember(X,Y)
– Body words: (employs, government, organization); head words: (member)
– wup scores
  – employs, member: .50
  – government, member: .75
  – organization, member: .85
  – Average: .70
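The scoring of this example can be sketched in Python. Assumptions: camelCase predicate names are split with a regular expression, the stop-word list is a small hypothetical one, and the wup scores are looked up from the table above instead of being computed from WordNet (NLTK's WordNet interface provides `Synset.wup_similarity` for real scores, though mapping words to synsets is a further design choice).

```python
import re

STOP_WORDS = {"has", "is", "by", "of"}  # hypothetical stop-word list


def predicate_words(predicate):
    """Split a camelCase predicate name into lower-case words minus stop words."""
    words = [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", predicate)]
    return [w for w in words if w not in STOP_WORDS]


def wup_avg(body_predicates, head_predicate, similarity):
    """Average word similarity over all (body word, head word) pairs."""
    body = [w for p in body_predicates for w in predicate_words(p)]
    head = predicate_words(head_predicate)
    scores = [similarity(wi, wj) for wi in body for wj in head]
    return sum(scores) / len(scores)


# wup scores copied from the worked example; a real system would query WordNet.
WUP = {("employs", "member"): 0.50,
       ("government", "member"): 0.75,
       ("organization", "member"): 0.85}


def table_similarity(wi, wj):
    return WUP[(wi, wj)]


score = wup_avg(["employs", "governmentOrganization"], "hasMember", table_similarity)
print(round(score, 2))
```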
Example
– employs(X,Y) ∧ governmentOrganization(X) → hasMember(X,Y) (.70)
  – Words: (employs, government, organization) and (member)
– employs(X,Y) ∧ person(Y) ∧ nationState(X) → hasBirthPlace(Y,X) (.67)
  – Words: (employs, person, nation, state) and (birth, place)
Scoring rules using WUP
– WUP-AVG
  – Use words from both entities and relations
  – Use the average similarity over all pairs of words as the weight
– WUP-MAX
  – Use words from both entities and relations
  – Use the maximum similarity among all pairs of words as the weight
– WUP-MAX-REL
  – Use words from relations only
  – Use the maximum similarity among all pairs of words as the weight
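The three variants differ only in which words are paired and how the pairwise scores are aggregated, as the following Python sketch shows. Assumptions: the caller states which body predicates are entity types (here governmentOrganization), the stop-word list is hypothetical, and the wup scores come from the worked example rather than from WordNet.

```python
import re

STOP_WORDS = {"has", "is", "by", "of"}  # hypothetical stop-word list


def words(predicate):
    # Split camelCase predicate names into lower-case words, dropping stop words.
    found = [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", predicate)]
    return [w for w in found if w not in STOP_WORDS]


def score_rule(body, head, similarity, variant, entity_types=frozenset()):
    """Score a rule body -> head with one of the three WUP variants.

    entity_types names the body predicates that are entity types (an
    assumption of this sketch); WUP-MAX-REL ignores their words.
    """
    if variant == "WUP-MAX-REL":
        body = [p for p in body if p not in entity_types]
    scores = [similarity(wi, wj) for p in body for wi in words(p) for wj in words(head)]
    if variant == "WUP-AVG":
        return sum(scores) / len(scores)
    if variant in ("WUP-MAX", "WUP-MAX-REL"):
        return max(scores)
    raise ValueError(f"unknown variant: {variant}")


WUP = {("employs", "member"): 0.50,
       ("government", "member"): 0.75,
       ("organization", "member"): 0.85}


def sim(wi, wj):
    return WUP[(wi, wj)]


body, head = ["employs", "governmentOrganization"], "hasMember"
for variant in ("WUP-AVG", "WUP-MAX", "WUP-MAX-REL"):
    print(variant, round(score_rule(body, head, sim, variant, {"governmentOrganization"}), 2))
```

For this rule WUP-MAX picks the entity-type word organization, while WUP-MAX-REL falls back to the single relation pair (employs, member).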
Experimental Evaluation
– Target relations
  – Full-set
  – Subset
– Models
  – COMBINED
– Rule scoring approaches compared
  – WUP-AVG
  – WUP-MAX
  – WUP-MAX-REL
  – Default (manual weights set to 0.9)
  – EM (weights learned using EM)
[Figure: rule scoring results on the Full-set]
[Figure: rule scoring results on the Subset]
Summary
– BLP approach for inferring implicit facts with high precision
– Superior performance of BLPs over purely logical deduction and MLNs
– Efficient learning of probabilistic first-order rules using online rule learning
– Efficacy of WUP-AVG for scoring first-order rules
Future Work
– Plan recognition
  – Structure learning of abductive knowledge bases for BALPs
  – Comparison of BALPs to other SRL models
    – ProbLog [Kimmig et al., 2008]
    – PRISM [Sato, 1995]
    – Poole’s Horn Abduction [Poole, 1993]
    – Abductive Stochastic Logic Programs [Tamaddoni-Nezhad, Chaleil, Kakas, & Muggleton, 2006]
Future Work
– Machine reading
  – Large-scale evaluation using crowdsourcing
  – Comparison of BLPs to existing approaches to machine reading [Schoenmackers et al., 2010; Carlson et al., 2010; Doppa et al., 2010; Sorower et al., 2011]
  – Alternative approaches to scoring rules
    – Use models from distributional semantics [Garrette et al., 2011]
Long-term Directions
– Parameter learning
  – Using approximate inference techniques
  – Discriminative learning of parameters
– Lifted inference for BLPs and BALPs
Conclusions
– Demonstrated the efficacy of BLPs on two diverse tasks
  – Plan recognition
    – BALPs
  – Machine reading
    – Infer implicit facts from natural language text
    – Online rule learner for efficient learning of first-order rules from noisy IE extractions
    – Scoring first-order rules using WordNet
Conclusions
– Demonstrated superior performance of BLPs over MLNs on both tasks
– Contributions could have a direct impact on the advancement of applications that use plan recognition and machine reading
  – Siri
  – IBM’s Watson system
Questions