Bayesian Logic Programs for Plan Recognition and Machine Reading
Sindhu Raghavan (Advisor: Raymond Mooney)
PhD Oral Defense, Nov 29th, 2012

Outline
Motivation
Background
– Bayesian Logic Programs (BLPs)
Plan Recognition
Machine Reading
– BLPs for inferring implicit facts
– Online Rule Learning
– Scoring Rules using WordNet
Future Work
Conclusions

Machine Reading
Machine reading involves the automatic extraction of knowledge from natural language text.
Example: “Barack Obama is the current President of the USA……. Obama was born on August 4, 1961, in Hawaii, USA…….”
Extracted facts: nationState(usa), person(barackobama), isLedBy(usa,barackobama), hasBirthPlace(barackobama,usa), employs(usa,barackobama)
The data is relational in nature: several entities and several relations between them.

Characteristics of Real World Data
Relational or structured data
– Several entities in the domain
– Several relations between entities
– Not always independent and identically distributed (i.i.d.)
Presence of noise or uncertainty
– Uncertainty in the types of entities
– Uncertainty in the relations
Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.

Statistical Relational Learning (SRL)
Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007] to overcome the limitations of traditional approaches.
SRL formalisms
– Stochastic Logic Programs (SLPs) [Muggleton, 1996]
– Probabilistic Relational Models (PRMs) [Friedman et al., 1999]
– Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
– Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]

Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
Integrate first-order logic and Bayesian networks.
Why BLPs?
– Efficient grounding mechanism that includes only those variables that are relevant to the query
– Easy to extend by incorporating any type of logical inference to construct networks
– Well suited for capturing causal relations in data

Objectives
Plan Recognition: predicting the top-level plan of an agent based on its observed actions.
Machine Reading: automatic extraction of knowledge from natural language text.

Common characteristics
Inference and learning from partially observed or incomplete data.
Plan recognition
– The top-level plan is not observed
– Some of the executed actions can be unobserved
Machine Reading
– Information that is implicit is rarely observed in data
– Common sense knowledge is not always explicitly stated

Thesis Contributions
Plan Recognition
– Bayesian Abductive Logic Programs (BALPs) [ECML 2011]
Machine Reading
– BLPs for learning to infer implicit facts from natural language text [ACL 2012]
– Online rule learner for learning common sense knowledge from natural language extractions [In Submission]
– Approach to scoring first-order rules (common sense knowledge) using WordNet [In Submission]

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]
Set of Bayesian clauses of the form a | a1, a2, ..., an
– Definite clauses that are universally quantified
– Range-restricted, i.e., variables(head) ⊆ variables(body)
– Associated conditional probability table (CPT) P(head | body)
Bayesian predicates a, a1, a2, ..., an have finite domains
– A combining rule like noisy-or maps multiple CPTs into a single CPT
Given a set of Bayesian clauses and a query, SLD resolution is used to construct ground Bayesian networks for probabilistic inference.
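A minimal sketch of how a Bayesian clause and its CPT might be represented, assuming a simple Python encoding; the CPT entries for false body assignments are illustrative assumptions, not values from the thesis:

```python
# Illustrative data structure for a Bayesian clause, not the BLP system's API.
from dataclasses import dataclass

@dataclass
class BayesianClause:
    head: str   # e.g. "hasCitizenship(A,B)"
    body: list  # e.g. ["nationState(B)", "isLedBy(B,A)"]
    cpt: dict   # P(head=True | assignment of body atoms)

clause = BayesianClause(
    head="hasCitizenship(A,B)",
    # Range-restricted: every variable in the head appears in the body.
    body=["nationState(B)", "isLedBy(B,A)"],
    cpt={(True, True): 0.9, (True, False): 0.0,   # values for body-false
         (False, True): 0.0, (False, False): 0.0},  # rows are assumed here
)
print(clause.cpt[(True, True)])  # P(head | both body atoms true)
```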

Probabilistic Inference and Learning
Probabilistic inference
– Marginal probability: exact inference; SampleSearch [Gogate and Dechter, 2007]
Learning [Kersting and De Raedt, 2008]
– Parameters: Expectation Maximization; gradient-ascent based learning

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

Plan Recognition
Predict an agent's top-level plan based on its observed actions.
Abductive reasoning involving inference of cause from effect.
Since the SLD resolution used in BLPs is deductive in nature, BLPs cannot be used as-is for plan recognition.

Extending BLPs for Plan Recognition
BLPs + logical abduction (Stickel's abduction algorithm) → BALPs
BALPs: Bayesian Abductive Logic Programs

Experimental Evaluation
Data
– Monroe [Blaylock and Allen, 2005]
– Linux [Blaylock and Allen, 2005]
– Story Understanding [Ng and Mooney, 1992]
Systems compared
– BALPs
– MLN-HCAM [Singla and Mooney, 2011]
– Blaylock and Allen's system [Blaylock and Allen, 2005]
– ACCEL-Simplicity [Ng and Mooney, 1992]
– ACCEL-Coherence [Ng and Mooney, 1992]

Summary of Results
Monroe and Linux
– BALPs outperform both MLN-HCAM and the system by Blaylock and Allen
Story Understanding
– BALPs outperform both MLN-HCAM and ACCEL-Simplicity
– ACCEL-Coherence outperforms BALPs and the other systems, but it was specifically developed for text interpretation, while BALPs learn model parameters automatically using EM

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

Machine Reading
Natural language text is typically “incomplete”
– Some information is always implicit
– Common sense information is not always explicitly stated
– Grice's maxim of quantity [1975]
Information extraction (IE) systems extract information that is explicitly stated [Cowie and Lehnert, 1996; Sarawagi, 2008]
– They cannot extract information that is implicit

Example
Natural language text: “Barack Obama is the President of the United States of America.”
Query: “Barack Obama is a citizen of what country?”
IE systems cannot answer this query since citizenship information is not explicitly stated.

Objective
Infer implicit facts from explicitly stated information
– Extract explicitly stated facts using an off-the-shelf IE system
– Learn common sense knowledge in the form of first-order rules to deduce additional facts
– Use BLPs for inference of additional facts

Related Work
Logical deduction based approaches
– Learning propositional rules [Nahm and Mooney, 2000]: purely logical deduction is brittle since it cannot assign probabilities to inferences
– Learning probabilistic first-order rules using FOIL and FARMER [Carlson et al., 2010; Doppa et al., 2010]: probabilities are not computed using well-founded probabilistic graphical models
MLN based approaches for inferring additional facts [Schoenmackers et al., 2010; Sorower et al., 2011]
– “Brute force” inference could result in intractably large networks for large domains
– Scaling MLNs to large domains [Schoenmackers et al., 2010; Niu et al., 2012]

Objectives
BLPs for learning to infer implicit facts from natural language text.
Online rule learner for learning common sense knowledge from natural language extractions.
Approach to scoring first-order common sense knowledge using WordNet.

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

System Architecture
Pipeline: Training documents → Information Extractor (IBM SIRE) → Extracted facts → Rule learner → First-order logical rules → BLP weight learner → Bayesian Logic Program (BLP) → BLP inference engine, which takes extractions from a test document and produces inferences with probabilities.
Example training text: “Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA”
Extracted facts: nationState(USA), person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
Learned rules:
nationState(B) ∧ isLedBy(B,A) → hasCitizenship(A,B)
nationState(B) ∧ employs(B,A) → hasCitizenship(A,B)
Weighted Bayesian clauses:
hasCitizenship(A,B) | nationState(B), isLedBy(B,A). (0.9)
hasCitizenship(A,B) | nationState(B), employs(B,A). (0.6)
Test extractions: nationState(malaysia), person(mahathir-mohamad), isLedBy(malaysia,mahathir-mohamad), employs(malaysia,mahathir-mohamad)
Inference: hasCitizenship(mahathir-mohamad, malaysia) with probability 0.75

System Architecture
The same pipeline with the rule learner instantiated as an inductive logic programming system (LIME): Training documents → Information Extractor (IBM SIRE) → Extracted facts → Inductive Logic Programming (LIME) → First-order logical rules → BLP weight learner → BLP → BLP inference engine → inferences with probabilities on test document extractions.

Inductive Logic Programming (ILP) for learning first-order rules
Target relation: hasCitizenship(X,Y)
Positive instances: hasCitizenship(BarackObama, USA), hasCitizenship(GeorgeBush, USA), hasCitizenship(IndiraGandhi, India), ...
Negative instances (generated using the closed-world assumption): hasCitizenship(BarackObama, India), hasCitizenship(GeorgeBush, India), hasCitizenship(IndiraGandhi, USA), ...
KB: hasBirthPlace(BarackObama,USA), person(BarackObama), nationState(USA), nationState(India), ...
Learned rule: nationState(Y) ∧ person(X) ∧ isLedBy(Y,X) → hasCitizenship(X,Y)
A sketch of the closed-world negative generation follows.
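A minimal sketch, under the assumption that negatives are simply all unasserted (person, country) pairs for the target relation; the helper below is hypothetical, not LIME's actual interface:

```python
# Closed-world negative generation for hasCitizenship: any (person, country)
# pair not asserted as a positive instance is treated as a negative.
def closed_world_negatives(persons, countries, positives):
    return [("hasCitizenship", p, c)
            for p in persons for c in countries
            if (p, c) not in positives]

negatives = closed_world_negatives(
    ["BarackObama", "GeorgeBush", "IndiraGandhi"],
    ["USA", "India"],
    {("BarackObama", "USA"), ("GeorgeBush", "USA"),
     ("IndiraGandhi", "India")},
)
print(negatives)  # yields the three negative instances shown on the slide
```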

Inference using BLPs
Test document: “Barack Obama is the current President of the USA……. Obama was born on August 4, 1961, in Hawaii, USA…….”
Extracted facts: nationState(usa), person(barackobama), isLedBy(usa,barackobama), hasBirthPlace(barackobama,usa), employs(usa, barackobama)
Learned rules:
nationState(B) ∧ person(A) ∧ isLedBy(B,A) → hasCitizenship(A,B)
nationState(B) ∧ person(A) ∧ employs(B,A) → hasCitizenship(A,B)

Logical Inference - Proof 1
Applying nationState(B) ∧ person(A) ∧ isLedBy(B,A) → hasCitizenship(A,B) to nationState(usa), person(barackobama), and isLedBy(usa,barackobama) derives hasCitizenship(barackobama,usa).

Logical Inference - Proof 2
Applying nationState(B) ∧ person(A) ∧ employs(B,A) → hasCitizenship(A,B) to nationState(usa), person(barackobama), and employs(usa,barackobama) derives hasCitizenship(barackobama,usa) a second way.

Bayesian Network Construction
Ground network: the fact nodes nationState(usa), person(barackobama), isLedBy(usa, barackobama), and employs(usa, barackobama) feed two logical-AND nodes (dummy1 and dummy2), one per ground rule, and the AND nodes are combined into hasCitizenship(barackobama, usa) through a noisy-or node.
Marginal probability of hasCitizenship(barackobama, usa)? See the sketch below.
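A minimal sketch of the noisy-or combination when both proofs succeed, reusing the illustrative clause weights 0.9 and 0.6 from the system-architecture example; in the full network the body nodes have their own probabilities, so the deployed system's marginals differ:

```python
# Noisy-or combining rule: P(head) = 1 - prod(1 - p_i) over the
# ground rules whose bodies are true.
def noisy_or(rule_probs):
    prob_none_fires = 1.0
    for p in rule_probs:
        prob_none_fires *= 1.0 - p
    return 1.0 - prob_none_fires

# Both ground rules fire: the isLedBy rule (0.9) and the employs rule (0.6).
print(noisy_or([0.9, 0.6]))  # 0.96
```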

Experimental Evaluation
Data
– DARPA's intelligence community (IC) data set from the Machine Reading Project (MRP)
– Consists of news articles on politics, terrorism, and other international events
– 10,000 documents in total
Perform 10-fold cross validation.

Experimental Evaluation
Learning first-order rules using LIME [McCreath and Sharma, 1998]
– Learn rules for 13 target relations
– Learn rules using both positive and negative instances, and using only positive instances
– Include all unique rules learned from the different models
Learning BLP parameters
– Learn noisy-or parameters using Expectation Maximization (EM)
– Set priors to maximum likelihood estimates

Experimental Evaluation
Performance evaluation
– Lack of ground truth for evaluation, so inferred facts from 40 documents, randomly selected from each test set, were evaluated manually
– Compute precision: the fraction of inferences that are correct
– Two precision scores: Unadjusted (UA), which does not account for the extractor's mistakes, and Adjusted (AD), which accounts for the extractor's mistakes
– Rank inferences using marginal probabilities and evaluate the top-n; a sketch follows
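A minimal sketch of the two scores under an assumed label encoding (1 = correct, 0 = incorrect, None = incorrect only because the underlying extraction was wrong); this encoding is hypothetical, not the thesis's evaluation code:

```python
# Precision over the top-n inferences ranked by marginal probability.
def precision_at_n(labels, n, adjusted=False):
    top = labels[:n]
    if adjusted:
        # Adjusted (AD): discount inferences wrong only due to extractor errors.
        top = [l for l in top if l is not None]
    else:
        # Unadjusted (UA): extractor-caused mistakes count as incorrect.
        top = [0 if l is None else l for l in top]
    return sum(top) / len(top)

labels = [1, 1, None, 1, 0]
print(precision_at_n(labels, 5))                 # UA: 3/5 = 0.6
print(precision_at_n(labels, 5, adjusted=True))  # AD: 3/4 = 0.75
```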

Experimental Evaluation
Systems compared
– BLP Learned Weights: noisy-or parameters learned using EM
– BLP Manual Weights: noisy-or parameters set to 0.9
– Logical Deduction
– MLN Learned Weights: weights learned using a generative online weight learner
– MLN Manual Weights: a weight of 10 assigned to all rules and MLE priors to all predicates

Unadjusted Precision [results figure not preserved in the transcript]

Inferior performance of EM
Insufficient training data.
Lack of ground truth information for relations that can be inferred
– Implicit relations are seen less frequently in training data
– EM learns lower weights for rules corresponding to implicit relations

Performance of MLNs
Inferior performance of MLNs
– Insufficient training data for learning
– Use of the closed world assumption for inference and learning
– Lack of a strictly typed ontology: a GeopoliticalEntity could be an Agent as well as a Location
Improvements to MLNs
– Integrity constraints to avoid inference of spurious facts like employs(a,a)
– Incorporate techniques proposed by Sorower et al. [2011]

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

Limitations of LIME
Assumes data is accurate
– Artificially generated negative instances are usually noisy and inaccurate
– Extraction errors result in noisy data
Does not scale to large corpora.
Goal: develop an approach that can learn first-order rules from noisy and incomplete IE extractions.

Online Rule Learning
Incorporates the incomplete nature of natural language text
– Body consists of relations that are explicitly stated
– Head is a relation that can be inferred
Relations that are implicit occur less frequently than those that are explicitly stated
– Use frequency of occurrence as a heuristic to distinguish the different types of relations
Processes examples in an online manner to scale to large corpora.

Approach
For each example, construct a directed graph of relation extractions.
Add directed edges between nodes that share one or more constants
– Relations connected by edges are related and participate in the same rule
Traverse the graph to learn first-order rules (a sketch of the graph construction follows below).
Learning is from positive instances only.
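A minimal sketch of the graph construction under stated assumptions: extractions are (predicate, args) tuples, and edges are directed from the more frequent (explicitly stated) relation toward the less frequent (implicit) one, per the frequency heuristic above; this is an illustration, not the thesis implementation:

```python
from collections import defaultdict
from itertools import combinations

def build_graph(extractions, freq):
    """Directed graph over relation extractions that share constants."""
    edges = defaultdict(set)
    for (p1, args1), (p2, args2) in combinations(extractions, 2):
        if set(args1) & set(args2):  # the two extractions share a constant
            # Direct the edge from the more frequent relation (rule body)
            # toward the less frequent one (rule head).
            if freq[p1] >= freq[p2]:
                edges[(p1, args1)].add((p2, args2))
            else:
                edges[(p2, args2)].add((p1, args1))
    return edges

facts = [("isLedBy", ("USA", "BarackObama")),
         ("hasBirthPlace", ("BarackObama", "USA")),
         ("hasCitizenship", ("BarackObama", "USA"))]
corpus_freq = {"isLedBy": 33, "hasBirthPlace": 25, "hasCitizenship": 17}
print(dict(build_graph(facts, corpus_freq)))
```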

Example
“Barack Obama is the current President of the USA……. Obama, citizen of the USA was born on August 4, 1961, in Hawaii, USA…….”
Extracted facts: nationState(USA), person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA)
The extractions contain both entities (USA, BarackObama) and relations (isLedBy, hasBirthPlace, hasCitizenship).

Directed graph construction
Nodes: isLedBy(USA, BarackObama), hasBirthPlace(BarackObama, USA), hasCitizenship(BarackObama, USA).
Relation frequencies: isLedBy 33, hasBirthPlace 25, hasCitizenship 17.
Edge directions between the nodes are determined by these frequencies: more frequent (explicit) relations point to less frequent (implicit) ones.

Graph Traversal
Traversing the edge from isLedBy(USA, Barack Obama) to hasBirthPlace(Barack Obama, USA) yields, in three steps:
1. The ground rule isLedBy(USA, Barack Obama) → hasBirthPlace(Barack Obama, USA)
2. With entity types added: isLedBy(USA, Barack Obama) ∧ person(Barack Obama) ∧ nationState(USA) → hasBirthPlace(Barack Obama, USA)
3. After replacing constants with variables: isLedBy(X, Y) ∧ person(Y) ∧ nationState(X) → hasBirthPlace(Y, X)
A sketch of the final variablization step follows.
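A minimal sketch of that last step, using a hypothetical variablize helper; the names and string parsing are illustrative assumptions:

```python
def variablize(literals):
    """Replace each distinct constant with a fresh variable, consistently."""
    mapping, lifted = {}, []
    for lit in literals:
        pred, rest = lit.split("(")
        args = [a.strip() for a in rest.rstrip(")").split(",")]
        # Reuse the same variable for repeated constants across literals.
        vars_ = [mapping.setdefault(a, chr(ord("X") + len(mapping)))
                 for a in args]
        lifted.append(f"{pred}({', '.join(vars_)})")
    return lifted

print(variablize(["isLedBy(USA, BarackObama)", "person(BarackObama)",
                  "nationState(USA)", "hasBirthPlace(BarackObama, USA)"]))
# ['isLedBy(X, Y)', 'person(Y)', 'nationState(X)', 'hasBirthPlace(Y, X)']
```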

Rules learned
isLedBy(X, Y) ∧ person(Y) ∧ nationState(X) → hasBirthPlace(Y, X)
isLedBy(X, Y) ∧ person(Y) ∧ nationState(X) → hasCitizenship(Y, X)
hasBirthPlace(X, Y) ∧ person(X) ∧ nationState(Y) → hasCitizenship(X, Y)

Sample rules
employs(X, Y) ∧ commercialOrganization(X) → hasMemberPerson(X, Y)
isLedBy(X, Y) ∧ nationState(X) → hasCitizenship(Y, X)
isLedBy(X, Y) ∧ nationState(X) ∧ person(Y) → hasBirthPlace(Y, X)

Experimental Evaluation
Learn first-order rules for 14 target relations
– Full-set: all 14 target relations
– Subset: 10 target relations
Noisy-or parameters manually set to 0.9.
Systems compared
– Online Rule Learner (ORL)
– LIME [McCreath and Sharma, 1998]
– Combined

Full-set [results figure not preserved in the transcript]

Inferior performance of ORL on Full-set
Several incorrect inferences with high marginal probabilities
– Instances of thingPhysicallyDamaged and eventLocationGPE
– High probabilities due to multiple rules inferring these instances
– The rules are not very accurate, resulting in inaccurate inferences

Subset [results figure not preserved in the transcript]

Running Time
LIME learns rules for one target relation at a time; its time includes learning from positive-only examples and from positive and negative examples.
ORL learns rules for all target relations at once.
ORL: 3.8 mins. LIME: 11.23 hrs.

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

Scoring first-order rules
Predicate names employ English words.
Confident rules typically have predicates whose words are semantically related.
Use word similarity or relatedness to calculate weights
– Word similarity computed using WordNet
Compute weights between 0 and 1, which are then used as noisy-or parameters
– Higher weights indicate more confident rules

WordNet [Fellbaum, 1998]
Lexical knowledge base consisting of 130,000 English words.
Nouns, verbs, adjectives, and adverbs are organized into “synsets” (synonym sets).
The wup similarity measure [Wu and Palmer, 1994] computes word similarity
– Produces scaled similarity scores between 0 and 1
– Scales twice the depth of the least common subsumer of the two words by the sum of the depths of the words: wup(s1, s2) = 2 · depth(lcs(s1, s2)) / (depth(s1) + depth(s2))

Scoring rules using WUP
Compute word similarity using wup for every pair of words (wi, wj)
– wi refers to words in the body
– wj refers to words in the head
Compute the average similarity over all pairs of words.
Predicate names like hasCitizenship and hasMember are segmented into has, citizenship, and member
– Stop words are removed

Example
employs(X,Y) ∧ governmentOrganization(X) → hasMember(X,Y)
Segmented words: (employs, government, organization) → (member)
Word pair wup scores: (employs, member) .50; (government, member) .75; (organization, member) .85; average .70.
employs(X,Y) ∧ governmentOrganization(X) → hasMember(X,Y) (.70)
employs(X,Y) ∧ person(Y) ∧ nationState(X) → hasBirthPlace(Y,X) (.67)
(employs, person, nation, state) → (birth, place)

Scoring rules using WUP
WUP-AVG: use words from both entities and relations; use the average similarity over all pairs of words as the weight.
WUP-MAX: use words from both entities and relations; use the maximum similarity among all pairs of words as the weight.
WUP-MAX-REL: use words from relations only; use the maximum similarity among all pairs of words as the weight.
A sketch of WUP-based scoring follows.
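A minimal sketch of WUP-AVG and WUP-MAX scoring using NLTK's WordNet interface; this is not the thesis implementation, and the naive synset handling below (best score over all synset pairs per word pair) means the numbers will differ from the slide's example:

```python
from itertools import product
from nltk.corpus import wordnet as wn  # requires the 'wordnet' corpus

def wup(word1, word2):
    """Best Wu-Palmer similarity over all synset pairs of the two words."""
    scores = [s1.wup_similarity(s2)
              for s1, s2 in product(wn.synsets(word1), wn.synsets(word2))]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

def score_rule(body_words, head_words, variant="avg"):
    """WUP-AVG averages all body-head word-pair scores; WUP-MAX takes the max."""
    pairs = [wup(b, h) for b, h in product(body_words, head_words)]
    return max(pairs) if variant == "max" else sum(pairs) / len(pairs)

# Rule: employs(X,Y) ∧ governmentOrganization(X) → hasMember(X,Y)
body = ["employs", "government", "organization"]  # "has" dropped as a stop word
print(score_rule(body, ["member"]))                 # WUP-AVG weight
print(score_rule(body, ["member"], variant="max"))  # WUP-MAX weight
```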

Experimental Evaluation
Target relations: Full-set and Subset.
Model: COMBINED.
Rule scoring approaches compared
– WUP-AVG
– WUP-MAX
– WUP-MAX-REL
– Default (manual weights set to 0.9)
– EM (weights learned using EM)

Full-set [results figure not preserved in the transcript]

Subset [results figure not preserved in the transcript]

Summary
BLP approach for inferring implicit facts with high precision.
Superior performance of BLPs over purely logical deduction and MLNs.
Efficient learning of probabilistic first-order rules using online rule learning.
Efficacy of WUP-AVG for scoring first-order rules.

Outline Motivation Background – Bayesian Logic Programs (BLPs) Plan Recognition Machine Reading – BLPs for inferring implicit facts – Online Rule Learning – Scoring Rules using WordNet Future Work Conclusions

Future Work
Plan recognition
– Structure learning of abductive knowledge bases for BALPs
– Comparison of BALPs to other SRL models: ProbLog [Kimmig et al., 2008], PRISM [Sato, 1995], Poole's Horn Abduction [Poole, 1993], Abductive Stochastic Logic Programs [Tamaddoni-Nezhad, Chaleil, Kakas, & Muggleton, 2006]

Future Work
Machine Reading
– Large scale evaluation using crowdsourcing
– Comparison of BLPs to existing machine reading approaches [Schoenmackers et al., 2010; Carlson et al., 2010; Doppa et al., 2010; Sorower et al., 2011]
– Alternate approaches to scoring rules, e.g., models from distributional semantics [Garrette et al., 2011]

Long-term Directions
Parameter learning
– Using approximate inference techniques
– Discriminative learning of parameters
Lifted inference for BLPs and BALPs

Conclusions
Demonstrated the efficacy of BLPs on two diverse tasks
– Plan recognition: BALPs
– Machine reading: inferring implicit facts from natural language text; an online rule learner for efficient learning of first-order rules from noisy IE extractions; scoring first-order rules using WordNet

Conclusions
Demonstrated superior performance of BLPs over MLNs on both tasks.
These contributions could have a direct impact on the advancement of applications that use plan recognition and machine reading, such as Siri and IBM's Watson system.

Questions