Natural Language Semantics using Probabilistic Logic


Natural Language Semantics using Probabilistic Logic Islam Beltagy Doctoral Dissertation Defense Supervising Professors: Raymond J. Mooney, Katrin Erk

Who is the first president of the United States? George Washington. (“George Washington was the first President of the United States, the Commander-in-Chief of the Continental Army and one of the Founding Fathers of the United States”) Where was George Washington born? Westmoreland County, Virginia. (“George Washington was born at his father's plantation on Pope's Creek in Westmoreland County, Virginia”) What is the birthplace of the first president of the United States? ??? What semantic representation do we need? A bag of words will not work; we need something more expressive. In this example, the representation needs to capture lexical variations (born at ➜ birthplace of) and combine evidence from multiple sentences.

Objective Develop a new semantic representation. With better semantic representations, more NLP applications can be done better: automated grading, machine translation, summarization, question answering …

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering · Future work · Conclusion

Formal Semantics Natural language ➜ Formal language [Montague, 1970] A person is driving a car ∃x,y,z. person(x) ∧ agent(y,x) ∧ drive(y) ∧ patient(y,z) ∧ car(z) ✅ Expressive: entities, events, relations, negations, disjunctions, quantifiers … ✅ Automated inference: theorem proving ❌ Brittle: unable to handle uncertain knowledge

Distributional Semantics “You shall know a word by the company it keeps” [John Firth, 1957] Words as vectors in a high-dimensional space ✅ Captures graded similarity (e.g. cut and slice are close in the space, drive is far) ❌ Does not capture the structure of the sentence

Proposal: Probabilistic Logic Semantics [Beltagy et al., *SEM 2013] Logic: expressivity of formal semantics. Reasoning with uncertainty: encode linguistic resources, e.g. distributional semantics. Combines advantages of symbolic and continuous representations.

Related Work A spectrum of representations along two axes, logical structure and uncertainty: distributional semantics and compositional distributional models (uncertainty, little logical structure); Natural Logic [MacCartney and Manning 2007, 2008] [Angeli and Manning 2014]; semantic parsing with a fixed ontology [Lewis and Steedman 2013]; formal semantics (logical structure, no uncertainty); and our work, which combines logical structure with uncertainty.

Proposal: Probabilistic Logic Semantics Logic + Statistics [Nilsson, 1986] [Getoor and Taskar, 2007] Implementations: Markov Logic Networks (MLNs) [Richardson and Domingos, 2006] and Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]. Weighted first-order logic rules, with weights from, e.g., distributional similarity or WSD confidence: ∀x. slice(x) → cut(x) | 2.3 ∀x. apple(x) → company(x) | 1.6 Combines advantages of symbolic and continuous representations.

Markov Logic Networks [Richardson and Domingos, 2006] Weighted first-order logic rules: ∀x,y. ogre(x) ∧ friend(x,y) → ogre(y) | 1.1 ∀x. ogre(x) → grumpy(x) | 1.5 Grounding the rules with constants (S: Shrek, F: Fiona) yields a graphical model over the ground atoms ogre(S), ogre(F), grumpy(S), grumpy(F), friend(S,F), friend(F,S), friend(S,S), friend(F,F): a probability distribution over possible worlds. Inference computes P(Q|E,KB), e.g. P(grumpy(Shrek) | friend(Shrek, Fiona), ogre(Fiona)).

Markov Logic Networks [Richardson and Domingos, 2006] Probability mass function (PMF) over a possible truth assignment x: P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) ), where wᵢ is the weight of formula i, nᵢ(x) is the number of true groundings of formula i in x, and Z is the normalization constant.
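To make the PMF concrete, here is a brute-force sketch in Python (a toy illustration, not the Alchemy implementation): it grounds only the second rule from the previous slide, ∀x. ogre(x) → grumpy(x) | 1.5, over constants {S, F}, enumerates all 16 possible worlds, and computes probabilities exactly.

```python
# Toy, brute-force rendering of the MLN PMF P(X = x) = (1/Z) exp(sum_i w_i n_i(x)),
# using only the rule  forall x. ogre(x) -> grumpy(x) | 1.5  and constants {S, F}.
import itertools
import math

ATOMS = ["ogre(S)", "ogre(F)", "grumpy(S)", "grumpy(F)"]

def n_true_groundings(world):
    # a grounding of the implication is true unless the body holds and the head fails
    return sum(1 for c in "SF"
               if not (world[f"ogre({c})"] and not world[f"grumpy({c})"]))

def unnormalized(world):
    return math.exp(1.5 * n_true_groundings(world))

worlds = [dict(zip(ATOMS, v)) for v in itertools.product([False, True], repeat=4)]
Z = sum(unnormalized(w) for w in worlds)  # normalization constant
print("P(all four atoms true) =", unnormalized(dict.fromkeys(ATOMS, True)) / Z)

# Conditional queries are ratios of sums over consistent worlds:
num = sum(unnormalized(w) for w in worlds if w["ogre(S)"] and w["grumpy(S)"])
den = sum(unnormalized(w) for w in worlds if w["ogre(S)"])
print("P(grumpy(S) | ogre(S)) =", num / den)
```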

PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012] Designed with focus on efficient inference Atoms have continuous truth values ∈ [0,1] (MLN: Boolean atoms) Łukasiewicz relaxation of AND, OR, NOT I(ℓ1 ∧ ℓ2) = max {0, I(ℓ1) + I(ℓ2) – 1} I(ℓ1 ∨ ℓ2) = min {1, I(ℓ1) + I(ℓ2) } I(¬ ℓ1) = 1 – I(ℓ1) Inference: linear program (MLN: combinatorial counting problem)
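The Łukasiewicz operators above are simple enough to state directly as code; a minimal sketch:

```python
# The Lukasiewicz relaxations from the slide, as plain functions over
# truth values in [0, 1].
def l_and(a, b):  # I(l1 AND l2)
    return max(0.0, a + b - 1.0)

def l_or(a, b):   # I(l1 OR l2)
    return min(1.0, a + b)

def l_not(a):     # I(NOT l1)
    return 1.0 - a

# On Boolean inputs (0/1) these reduce to classical logic:
assert l_and(1, 1) == 1 and l_and(1, 0) == 0
print(l_and(0.8, 0.7), l_or(0.8, 0.7), l_not(0.8))  # 0.5 1.0 0.2
```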

PSL: Probabilistic Soft Logic [Kimmig et al., NIPS 2012] PDF over a possible continuous truth assignment I: p(I) = (1/Z) exp( −Σᵣ λᵣ dᵣ(I) ), where λᵣ is the weight of rule r, dᵣ(I) is the distance to satisfaction of rule r, the sum runs over all rules, and Z is the normalization constant. Inference: Most Probable Explanation (MPE), solved as a linear program.

Tasks requiring deep semantic understanding: Textual Entailment (RTE) [Beltagy et al., 2013, 2015, 2016]; Textual Similarity (STS) [Beltagy et al., 2014] (proposal work); Question Answering (QA).

Pipeline for an Entailment T: A person is driving a car H: A person is driving a vehicle Does T ⊨ H ? Logical form ∃x,y,z. person(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z) ∃x,y,z. person(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ vehicle(z) Knowledge base KB: ∀x. car(x) → vehicle(x) | w Inference Calculating P(H|T, KB)

Summary of proposal work Efficient MLN inference for the RTE task [Beltagy et al., 2014]. MLN and PSL inference for the STS task [Beltagy et al., 2013]. Reasons why MLNs fit RTE and PSL fits STS.

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering · Future work · Conclusion

Logical form T: A person is driving a car H: A person is driving a vehicle Parsing, using Boxer, a rule-based system on top of a CCG parser [Bos, 2008]: T: ∃x,y,z. person(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z) H: ∃x,y,z. person(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ vehicle(z) Formulate the probabilistic logic problem based on the task, e.g. P(H|T,KB). Knowledge base construction, KB: ∀x. car(x) → vehicle(x) | w. Inference: calculating P(H|T, KB).

Adapting logical form Theorem proving: T ∧ KB ⊨ H Probabilistic logic: P(H|T,KB) Finite domain: explicitly introduce needed constants Prior probabilities: results are sensitive to prior probabilities Adapt logical form to probabilistic logic

Adapting logical form [Beltagy and Erk, IWCS 2015] Finite domain (proposal work) Quantifiers don’t work properly T: Tweety is a bird. Tweety flies bird(🐤) ∧ agent(F, 🐤) ∧ fly(F) H: All birds fly ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y) Solution: additional entities Add an extra bird(🐧)

Adapting logical form [Beltagy and Erk, IWCS 2015] Prior probabilities: ground atoms have prior probability 0.5, and P(H|KB) determines how informative P(H|T,KB) is. If both values are high, we cannot tell whether T entails H or the prior probability of H is simply high. Example: T: My car is green H: There is a bird. Goal: make P(H|T,KB) less sensitive to P(H|KB).

Adapting logical form [Beltagy and Erk, IWCS 2015] Prior probabilities, solution 1: use the ratio P(H|T,KB) / P(H|KB). Not a good fit for the entailment task: T: A person is driving a car H: A person is driving a green car. The ratio is high, but T does not entail H.

Adapting logical form [Beltagy and Erk, IWCS 2015] Prior probabilities, solution 2: set ground atom priors such that P(H|KB) ≈ 0. Matches the definition of the entailment task: T: Obama is the president of the USA H: Austin is in Texas. Even though H is true in the real world, it is not entailed by T, so P(H|T,KB) should stay low.

Adapting logical form [Beltagy and Erk, IWCS 2015] Prior probabilities, solution 2: set ground atom priors such that P(H|KB) ≈ 0. Ground atoms not entailed by T ∧ KB are set to false (everything is false by default); the prior probability of negated predicates of H is set to a high value. T: A dog is eating H: A dog does not fly.

Adapting logical form [Beltagy and Erk, IWCS 2015] Evaluation — entailment datasets. Synthetic: T: No man eats all delicious food H: Some hungry men eat not all food. Covers the quantifiers some, all, no, and not all, in all monotonicity directions.

Adapting logical form [Beltagy and Erk, IWCS 2015] Evaluation — entailment datasets. SICK [SemEval 2014] (5K training, 5K testing): short video-description sentences. Example: T: A young girl is dancing H: A young girl is standing on one leg. FraCas [Cooper et al., 1996]: 46 manually constructed entailments to evaluate quantifiers. Example: T: A Swede won a Nobel prize. Every Swede is a Scandinavian H: A Scandinavian won a Nobel prize.

Adapting logical form [Beltagy and Erk, IWCS 2015] Evaluation — Results

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering · Future work · Conclusion

Knowledge Base Logic handles sentence structure and quantifiers + knowledge base encodes lexical information.

Knowledge Base [Beltagy et al., CompLing 2016] Collect the relevant weighted KB from different resources. Precompiled rules: WordNet rules map semantic relations to logical rules; paraphrase rules translate PPDB to weighted logical rules. Generate on-the-fly rules for a specific dataset/task, since lexical resources are never complete. Logic handles sentence structure and quantifiers + learned rules handle lexical variations.

On-the-fly rules [Beltagy et al., CompLing 2016] Simple solution (proposal work): generate rules between all pairs of words and use distributional similarity to evaluate the rules, e.g. T: A person is driving a car H: A person is driving a vehicle. Drawbacks: it generates a lot of useless rules, and the generated rules have limited, predefined forms.
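A sketch of this baseline, with tiny made-up word vectors standing in for real distributional vectors; note how many of the nine proposed rules are useless:

```python
# The "all word pairs" baseline: propose a rule between every (T-word, H-word)
# pair, weighted by cosine similarity of their vectors (vectors invented here).
import math

vec = {"car": [0.9, 0.1, 0.3], "vehicle": [0.8, 0.2, 0.4],
       "drive": [0.1, 0.9, 0.2], "person": [0.2, 0.1, 0.9]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

t_words = ["person", "drive", "car"]
h_words = ["person", "drive", "vehicle"]
for t in t_words:
    for h in h_words:
        print(f"forall x. {t}(x) -> {h}(x) | {cosine(vec[t], vec[h]):.2f}")
```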

On-the-fly rules [Beltagy et al., CompLing 2016] Better solution: Use the logic to propose relevant lexical rules Use the training set to learn rule weights

On-the-fly rules [Beltagy et al., CompLing 2016] 1) Rule proposal using Robinson resolution. T: person(P) ∧ agent(D, P) ∧ drive(D) ∧ patient(D, C) ∧ car(C) H: ∃x,y,z. person(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ vehicle(z). Resolving away the matched literals step by step leaves T: agent(D, P) ∧ patient(D, C) ∧ car(C) and H: ∃z. agent(D, P) ∧ patient(D, z) ∧ vehicle(z), then T: car(C) and H: vehicle(C). Proposed rule, KB: ∀x. car(x) → vehicle(x)
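A much-simplified sketch of the idea: literals shared by T and H cancel, as they would when resolving T against the negation of H, and the leftovers on each side become a candidate rule. Real Robinson resolution handles unification and quantifiers; here H is treated as already grounded to the same constants as T, purely for illustration.

```python
# Simplified rule proposal: cancel literals shared by T and H, then propose
# leftover(T) -> leftover(H) as a candidate lexical rule.
T = {"person(P)", "agent(D,P)", "drive(D)", "patient(D,C)", "car(C)"}
H = {"person(P)", "agent(D,P)", "drive(D)", "patient(D,C)", "vehicle(C)"}

shared = T & H
lhs = sorted(T - shared)
rhs = sorted(H - shared)
print(" AND ".join(lhs), "->", " AND ".join(rhs))
# car(C) -> vehicle(C), i.e. forall x. car(x) -> vehicle(x) after generalizing C
```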

On-the-fly rules [Beltagy et al., CompLing 2016] Example, complex rule: T: A person is solving a problem H: A person is finding a solution to a problem KB: ∀e,x. solve(e) ∧ patient(e,x) → ∃s. find(e) ∧ patient(e,s) ∧ solution(s) ∧ to(s,x)

On-the-fly rules [Beltagy et al., CompLing 2016] Example: negative rule T: A person is driving H: A person is walking KB: ∀x. drive(x) → walk(x)

On-the-fly rules [Beltagy et al., CompLing 2016] Automatically annotating rules: proposed rules of entailing examples are positive rules; proposed rules of non-entailing examples are negative rules. The result is a lexical entailment dataset (a good chance to evaluate different lexical entailment systems).

On-the-fly rules [Beltagy et al., CompLing 2016] T: A man is walking ⊨ H: A person is walking gives the positive rule ∀x. man(x) → person(x). T: I have a green car ⊭ H: I have a green bike gives the negative rule ∀x. car(x) → bike(x).

On-the-fly rules [Beltagy et al., CompLing 2016] 2) Weight learning. The task of evaluating the lexical rules is called “lexical entailment”, usually viewed as a classification task (positive/negative rules). We use the lexical entailment classifier by Roller and Cheng [Beltagy et al., CompLing 2016], which uses various linguistic features to learn how to evaluate unseen rules. Use the annotated rules of the training set to train the classifier, use the classifier to evaluate the rules of the test set, and use classifier confidence as the rule weight.
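A minimal sketch of this weight-learning step, with invented two-dimensional features standing in for the much richer linguistic features of the Roller and Cheng classifier:

```python
# Weight learning as binary classification over proposed rules.
# Features here are hypothetical: [cosine similarity, is-WordNet-hypernym].
from sklearn.linear_model import LogisticRegression

X_train = [[0.9, 1], [0.8, 1], [0.3, 0], [0.2, 0]]
y_train = [1, 1, 0, 0]  # 1 = positive rule (entailing pair), 0 = negative rule

clf = LogisticRegression().fit(X_train, y_train)

# Classifier confidence on an unseen rule becomes its weight in the KB:
weight = clf.predict_proba([[0.85, 1]])[0][1]
print(f"forall x. car(x) -> vehicle(x) | {weight:.2f}")
```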

On-the-fly rules [Beltagy et al., CompLing 2016] Entailment training set ➜ rule proposal using Robinson resolution ➜ automatically annotated rules ➜ lexical entailment training set. Entailment testing set ➜ rule proposal using Robinson resolution ➜ unseen lexical rules ➜ lexical entailment classifier ➜ weighted rules of the test set.

On-the-fly rules [Beltagy et al., CompLing 2016] Entailment = Lexical Entailment + Probabilistic Logic Inference

On-the-fly rules — Evaluation [Beltagy et al., CompLing 2016] Recognizing Textual Entailment (RTE) [Dagan et al., 2013]: given two sentences T and H, decide whether T entails, contradicts, or is unrelated (neutral) to H. Examples Entailment: T: A man is walking through the woods. H: A man is walking through a wooded area. Contradiction: T: A man is jumping into an empty pool. H: The man is jumping into a full pool. Neutral: T: A young girl is dancing. H: A young girl is standing on one leg.

Textual Entailment — Settings Logical form CCG parser + Boxer + Multiple parses Logical form adaptations Special entity coreference assumption for the detection of contradictions Knowledge base Precompiled rules: WordNet + PPDB On-the-fly rules using Robinson resolution alignment Inference P(H|T, KB), P(¬H|T, KB) Efficient MLN inference for RTE (proposal work) Simple rule weights mapping from [0-1] to MLN weights

Efficient MLN Inference for RTE Inference problem: P(H|T, KB). Two components: speeding up inference, and calculating the probability of a complex query formula.

Speeding up Inference [Beltagy and Mooney, StarAI 2014] MLN grounding generates very large graphical models, especially in NLP applications: H has O(c^v) ground clauses, where v is the number of variables in H and c is the number of constants in the domain.

Speeding up Inference [Beltagy and Mooney, StarAI 2014] H: ∃x,y. guy(x) ∧ agent(y, x) ∧ drive(y) Constants {A, B, C} Ground clauses guy(A) ∧ agent(A, A) ∧ drive(A) guy(A) ∧ agent(B, A) ∧ drive(B) guy(A) ∧ agent(C, A) ∧ drive(C) guy(B) ∧ agent(A, B) ∧ drive(A) guy(B) ∧ agent(B, B) ∧ drive(B) guy(B) ∧ agent(C, B) ∧ drive(C) guy(C) ∧ agent(A, C) ∧ drive(A) guy(C) ∧ agent(B, C) ∧ drive(B) guy(C) ∧ agent(C, C) ∧ drive(C)

Speeding up Inference [Beltagy and Mooney, StarAI 2014] Closed-world assumption: assume everything is false by default (in the world, most things are false). This enables a speedup: a large number of ground atoms are trivially false, and removing them simplifies the inference problem. These ground atoms are found using “evidence propagation”.

Speeding up Inference [Beltagy and Mooney, StarAI 2014] T: man(M) ∧ agent(D, M) ∧ drive(D) KB: ∀x. man(x) → guy(x) | 1.8 H: ∃x,y. guy(x) ∧ agent(y, x) ∧ drive(y) Ground atoms: man(M), man(D), guy(M), guy(D), drive(M), drive(D), agent(D, D), agent(D, M), agent(M, D), agent(M, M). After evidence propagation, the only remaining ground clause is: guy(M) ∧ agent(D, M) ∧ drive(D)
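A sketch of this pruning on the example above (the helper logic is hypothetical; the real system works inside the MLN grounder): propagate evidence from T through the KB rule, then drop every ground clause of H that contains a trivially false atom.

```python
# Closed-world pruning: generate ground clauses of H over the domain, keep
# only those fully supported after propagating evidence from T through KB.
from itertools import product

constants = ["M", "D"]
evidence = {"man(M)", "agent(D,M)", "drive(D)"}                       # T
derived = {f"guy({c})" for c in constants if f"man({c})" in evidence}  # KB: man(x) -> guy(x)
true_or_unknown = evidence | derived

# Ground clauses of H: exists x,y. guy(x) AND agent(y,x) AND drive(y)
for x, y in product(constants, repeat=2):
    atoms = [f"guy({x})", f"agent({y},{x})", f"drive({y})"]
    if all(a in true_or_unknown for a in atoms):   # everything else is false
        print(" AND ".join(atoms))
# -> guy(M) AND agent(D,M) AND drive(D)  (the only surviving ground clause)
```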

Query Formula [Beltagy and Mooney, StarAI 2014] MLN implementations calculate probabilities of ground atoms only. How can we calculate the probability of a complex query formula H? Workaround: add the hard rule H ↔ result() | w = ∞ and report P(result()).

Query Formula [Beltagy and Mooney, StarAI 2014] Our inference algorithm supports query formulas directly [Gogate and Domingos, 2011], computing the query probability from the normalization constant Z of the probability distribution. To calculate Z we use SampleSearch [Gogate and Dechter, 2011], which works with mixed graphical models (probabilistic and deterministic).

Evaluation [Beltagy and Mooney, StarAI 2014] Dataset: SICK RTE [SemEval, 2014]

System                            | Accuracy | CPU Time   | Timeouts (30 min)
MLN                               | 57%      | 2 min 27 s | 96%
MLN + query formula               | 69%      | 1 min 51 s | 30%
MLN + CWA speedup                 | 66%      | 10 s       | 2.5%
MLN + query formula + CWA speedup | 72%      | 7 s        | 2.1%

MLN inference can be fast and efficient. The Alchemy implementation will be released soon.

Textual Entailment [Beltagy et al., CompLing 2016] Dataset: SICK RTE [SemEval, 2014]

System                                                                                   | Accuracy
Logic                                                                                    | 73.4%
Logic + precompiled rules + weight mapping + multiple parses                             | 80.4%
Logic + Robinson resolution rules                                                        | 83.0%
Logic + Robinson resolution rules + precompiled rules + weight mapping + multiple parses | 85.1%
Current state of the art (Lai and Hockenmaier 2014)                                      | 84.6%

Textual Similarity Semantic Textual Similarity (STS) [Agirre et al., 2012]: given two sentences S1 and S2, evaluate their semantic similarity on a scale from 0 to 5. Examples S1: “A man is playing a guitar.” S2: “A woman is playing the guitar.” score: 2.75 S1: “A car is parking.” S2: “A cat is playing.” score: 0.00

Textual Similarity — Settings [Beltagy, Erk and Mooney, ACL 2014] (proposal work) Logical form CCG parser + Boxer Knowledge base Precompiled rules: WordNet On-the-fly rules between all pairs of words Inference P(S1|S2, KB), P(S2|S1, KB) MLN and PSL inference algorithms suited for the task

PSL Relaxed Conjunction (for STS) [Beltagy, Erk and Mooney, ACL 2014] Conjunction in PSL (and MLN) does not fit STS T: A man is playing a guitar. H: A woman is playing the guitar. (score: 2.75) Introduce a new “average operator” (instead of conjunction) I(ℓ1 ∧ … ∧ ℓn) = avg( I(ℓ1), …, I(ℓn)) Inference “average” is a linear function No changes in the optimization problem Heuristic grounding (details omitted) Integrated into the official release of PSL
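A sketch contrasting the two operators on hypothetical literal truth values: n-ary Łukasiewicz conjunction, max(0, Σᵢ I(ℓᵢ) − (n−1)), bottoms out at 0 as soon as a few literals are imperfectly satisfied, while the average degrades gracefully.

```python
# The average operator proposed for STS, next to the standard Lukasiewicz
# conjunction it replaces.
def lukasiewicz_and(values):
    return max(0.0, sum(values) - (len(values) - 1))

def avg_and(values):
    return sum(values) / len(values)

# e.g. H-literals partially supported by T (woman vs. man mismatch);
# the truth values below are invented for illustration.
truth = [1.0, 0.7, 0.4, 1.0, 0.8]
print(lukasiewicz_and(truth))  # 0.0 -- conjunction collapses
print(avg_and(truth))          # 0.78 -- average still reflects partial match
```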

Evaluation – STS inference [Beltagy, Erk and Mooney, ACL 2014] Comparing MLN with PSL on the STS task (MCW applied to MLN for a fairer comparison, because PSL already has lazy grounding):

Dataset | PSL time | MLN time    | MLN timeouts (10 min)
msr-vid | 8 s      | 1 min 31 s  | 9%
msr-par | 30 s     | 11 min 49 s | 97%
SICK    | 10 s     | 4 min 24 s  | 36%

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering · Future work · Conclusion

Question Answering Open-domain question answering. Inference: given a document T and a query H(x), find the named entity e from T that best fills x in H(x). T: …. The Arab League is expected to give its official blessing to the military operation on Saturday, which could clear the way for a ground invasion, CNN's Becky Anderson reported. The Arab League actions are … H(x): X blessing of military action may set the stage for a ground invasion

Question Answering New challenges Long and diverse text Different inference objective

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering (Logical form · Inference) · Future work · Conclusion

Question Answering — Logical form Translating dependency trees to Boxer-like output. Rule-based translation: more accurate, but less expressive (no negation or quantifiers). Example: ∃x,y,z,t. move(x) ∧ tmod(x, y) ∧ time(y) ∧ around(y) ∧ nsubj(x, z) ∧ they(z) ∧ adjmod(x, t) ∧ faster(t) ∧ even(t)

Question Answering — Logical form Algorithm: start from the root, then iteratively, for every relation, do one of the following: introduce a new entity, merge with an existing entity, or ignore. The resulting logical form is a conjunction of predicates and relations. Limitation: it does not represent any construct that requires “scope”: negation, quantifiers, relative clauses.
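A runnable sketch of this translation, simplified to implement only the “introduce new entity” case, with hypothetical dependency triples as input:

```python
# Translate dependency triples (head, relation, dependent) into a flat
# conjunction of unary predicates and binary relations.
def deps_to_logic(root, triples):
    var_of, preds = {}, []

    def entity(word):
        # "introduce new entity": fresh variable + unary predicate per word
        if word not in var_of:
            var_of[word] = f"x{len(var_of)}"
            preds.append(f"{word}({var_of[word]})")
        return var_of[word]

    entity(root)
    for head, rel, dep in triples:  # merge/ignore cases omitted for brevity
        preds.append(f"{rel}({entity(head)},{entity(dep)})")
    return "exists " + ",".join(var_of.values()) + ". " + " AND ".join(preds)

triples = [("move", "nsubj", "they"), ("move", "tmod", "time"),
           ("move", "adjmod", "faster")]
print(deps_to_logic("move", triples))
# exists x0,x1,x2,x3. move(x0) AND they(x1) AND nsubj(x0,x1) AND ...
```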

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering (Logical form · Inference) · Future work · Conclusion

Question Answering — Knowledge base On-the-fly rules — Robinson resolution rules assume there is only one way to align T and H, and only work for short sentences with relatively similar structure, so they are not suitable for QA.

Question Answering — Knowledge base On-the-fly rules — graph-based alignment: view T and H as graphs, align T and H based on a set of potentially matching entities, and extract rules from the alignment.

Question Answering — Knowledge base T: …. The Arab League is expected to give its official blessing to the military operation on Saturday, which could clear the way for a ground invasion, CNN's Becky Anderson reported. The Arab League actions are … H: X blessing of military action may set the stage for a ground invasion

Question Answering — Knowledge base Alignment of the T graph (Arab League, expected, give, official bless, Saturday, clear way, ground invasion, military operation, Becky Anderson, report, actions) with the H graph (X, bless, military action, set stage, ground invasion), yielding correspondences such as: X expected to give official bless ⇒ X bless; official bless military operation ⇒ bless military action; official bless clear way ground invasion ⇒ bless set stage ground invasion.

Question Answering — Knowledge base KB: r1: Arab League expected to give official blessing ⇒ X blessing r2: official blessing to military operation ⇒ blessing of military action r3: official blessing clear way for ground invasion ⇒ blessing set stage for ground invasion r4: Arab League actions ⇒ X blessing of military action r5: Becky Anderson reported give official blessing ⇒ X blessing Notes: Rules correspond to multiple possible alignments We have a procedure to automatically annotate the rules as positive and negative

Question Answering — Knowledge base Annotating rules: run inference to find the rules relevant to the right answer (positive rules); the remaining rules are negative rules. Use the annotated rules to train a classifier to weight rules, then repeat (Expectation Maximization).

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering (Logical form · Inference) · Future work · Conclusion

Question Answering — Inference The inference problem can be solved using MLNs or PSL, but they are not the most efficient. We define our own graphical model and its inference algorithm: it encodes all possible ways of aligning the document and question, and inference finds the best one.

Question Answering — Inference One multivalued random variable for each entity in the question (X, bless, military action, set stage, ground invasion), instead of a large number of binary random variables. Its values are document entities (Arab League, Becky Anderson, official blessing, military operation, ground invasion), connected through the extracted rules r1–r5. Exact inference starts from X and exhaustively scans the search space.
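A toy sketch of the idea: one multivalued variable (X), candidate document entities as its values, and rule weights (invented here) scoring each assignment; exact inference is an exhaustive scan.

```python
# Exhaustive scoring of the multivalued variable X over document entities.
candidates = ["Arab League", "Becky Anderson"]

# rule -> (value of X it supports, classifier weight); cf. r1, r4, r5 on the
# earlier slide -- both entities and weights are illustrative.
rules = {"r1": ("Arab League", 0.9),
         "r4": ("Arab League", 0.6),
         "r5": ("Becky Anderson", 0.3)}

def score(x_value):
    # total weight of rules consistent with assigning x_value to X
    return sum(w for (val, w) in rules.values() if val == x_value)

best = max(candidates, key=score)
print(best, score(best))  # Arab League 1.5
```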

Question Answering — Evaluation Dataset: collected from CNN (Hermann et al., 2015); 380K training, 4K validation, 3K testing.

System                                            | Accuracy | Runtime
Preliminary PSL implementation                    | 33%      | 4 seconds
This work                                         | 43%      | 9 milliseconds
This work + lexical entailment classifier         | 48%      |
This work + alignment classifier                  | 63%      |
State of the art, neural network (Chen et al., 2016) | 72%   |

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering · Future work · Conclusion

Future Work Generalize the QA implementation: inference as alignment. Logical form: learn the transformation of dependency trees to logical form, to recover scope and other phenomena that dependency parsers do not support. Generalize our graphical model formulation to other tasks, and extend it to support negation and quantifiers.

Future Work Deep learning to integrate symbolic and continuous representations

Outline Introduction · Logical form adaptations · Knowledge base · Question Answering · Future work · Conclusion

Conclusion Probabilistic logic is a powerful representation that can effectively integrate symbolic and continuous aspects of meaning. Our contributions include adaptations of the logical form, various ways of collecting lexical knowledge and several inference algorithms for three natural language understanding tasks.

Multiple Parses Reduce the effect of mis-parses: use the top CCG parse from each of C&C [Clark and Curran 2004] and EasyCCG [Lewis and Steedman 2014]. Each sentence then has two parses: Text: T1, T2; Query: H1, H2. Run our system with all combinations and use the highest probability.

Precompiled rules: WordNet 1) WordNet rules WordNet: a lexical database of words and their semantic relations. Synonyms: ∀x. man(x) ↔ guy(x) ⎮ w = ∞ Hyponyms: ∀x. car(x) → vehicle(x) ⎮ w = ∞ Antonyms: ∀x. tall(x) ↔ ¬short(x) ⎮ w = ∞
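As an illustration, rules of this shape can be extracted with NLTK's WordNet interface (assuming nltk is installed and the wordnet corpus downloaded via nltk.download("wordnet"); the system's actual extraction code may differ):

```python
# Extract synonym and hypernym rules for a word from WordNet via NLTK.
from nltk.corpus import wordnet as wn

word = "car"
for syn in wn.synsets(word, pos=wn.NOUN):
    for lemma in syn.lemmas():                 # synonyms -> biconditional rules
        if lemma.name() != word:
            print(f"forall x. {word}(x) <-> {lemma.name()}(x) | inf")
    for hyper in syn.hypernyms():              # hypernyms -> implication rules
        for lemma in hyper.lemmas():
            print(f"forall x. {word}(x) -> {lemma.name()}(x) | inf")
```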