Page 1 Global Inference and Learning Towards Natural Language Understanding Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign CRI-06 Workshop on Machine Learning in Natural Language Processing

Page 2 Nice to Meet You

Page 3 Learning and Inference  Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome.  (Learned) classifiers for different sub-problems  Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints.  Global inference for the best assignment to all variables of interest.

Page 4 Comprehension 1. Christopher Robin was born in England. 2. Winnie the Pooh is a title of a book. 3. Christopher Robin’s dad was a magician. 4. Christopher Robin must be at least 65 now. A process that maintains and updates a collection of propositions about the state of affairs. (ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.

Page 5 How to Address Comprehension? (cartoon) It’s an Inference Problem. Map into a well-defined language; use standard reasoning tools. Huge number of problems: Variability of Language, Knowledge Acquisition, Reasoning Patterns. Multiple levels of ambiguity; make language precise; Canonical Mapping? Underspecificity calls for “purposeful”, goal-specific mapping. Statistics: know a word by its neighbors. Counting, Machine Learning, Clustering, Probabilistic models, Structured Models, Classification.

Page 6 What we Know: Stand Alone Ambiguity Resolution. Examples of ambiguity: Illinois’ bored of education [board]; ...Nissan Car and truck plant is … …divide life into plant and animal kingdom; (This Art) (can N) (will MD) (rust V) vs. V, N, N; The dog bit the kid. He was taken to a veterinarian / a hospital. Learn a function f: X → Y that maps observations in a domain to one of several categories. Broad Coverage.

Page 7 Classification is Well Understood  Theoretically: generalization bounds  How many examples does one need to see in order to guarantee good behavior on previously unobserved examples?  Algorithmically: good learning algorithms for linear representations.  Can deal with very high dimensionality (10^6 features)  Very efficient in terms of computation and # of examples. On-line.  Key issues remaining:  Learning protocols: how to minimize interaction (supervision); how to map domain/task information to supervision; semi-supervised learning; active learning; ranking; sequences  What are the features? No good theoretical understanding here.  Programming systems that have multiple classifiers

Page 8 Comprehension 1. Christopher Robin was born in England. 2. Winnie the Pooh is a title of a book. 3. Christopher Robin’s dad was a magician. 4. Christopher Robin must be at least 65 now. A process that maintains and updates a collection of propositions about the state of affairs. (ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.

Page 9 This Talk  Integrating Learning and Inference  Historical Perspective  Role of Learning  Global Inference over Classifiers  Semantic Parsing  Multiple levels of processing  Textual Entailment  Summary and Future Directions

Page 10 Learning, Knowledge Representation, Reasoning. There has been a lot of work on these three topics in AI. But, mostly,  work on Learning, and  work on Knowledge Representation and Reasoning. Very little has been done on an integrative framework.  Inductive Logic Programming  Some work from the perspective of probabilistic AI  Some work from the perspective of Machine Learning (EBL)  Recent: Valiant’s Robust Logic  Learning to Reason

Page 11 Learning to Reason [’94-’97]: Key Insights. A unified framework to study Learning, Knowledge Representation and Reasoning. The goal is to Reason (deduction; abduction - best explanation). Reasoning is not done from a static Knowledge Base but rather done with knowledge that is learned via interaction with the world. Intermediate Representation is important - but only to the extent that it is learnable and it facilitates reasoning. Feedback to learning is given by the reasoning stage. There may not be a need (or even a possibility) to learn the intermediate representation exactly, but only to the extent that it supports Reasoning. [Khardon & Roth JACM97, AAAI94; Roth95, Roth96, Khardon & Roth99; Learning to Plan: Khardon’99]

Page 12 Learning to Reason: Interaction with the World.  Knowledge representation is:  - Chosen to facilitate inference  - Learned by interaction with the world. Performance is measured with respect to the world, on a reasoning task. [Diagram: World (W), Learning, Knowledge Representation (KB), Reasoning, Task]

Page 13 L2R: Computational Advantages. Deduction (Khardon & Roth 94, 97): decide f ⊨ q, where f ∈ CNF and q ∈ MonCNF.

Page 14 L2R: Computational Advantages. Deduction (Khardon & Roth 94, 97): decide f ⊨ q, where f ∈ CNF and q ∈ MonCNF. Learning to Reason is easy when Reasoning is hard. [Diagram: f, q]

Page 15 L2R: Computational Advantages. Deduction (Khardon & Roth 94, 97): decide f ⊨ q, where f ∈ CNF and q ∈ MonCNF. Learning to Reason is easy when Reasoning is hard. [Diagram: W, KB(f), q]

Page 16 L2R: Computational Advantages. Deduction (Khardon & Roth 94, 97): decide f ⊨ q, where f ∈ CNF and q ∈ MonCNF. Learning to Reason is easy when Reasoning is hard. Learning to Reason is easy when Learning is hard. [Diagram: W, RL, KB(f); W, L, f; q]

Page 17 L2R: Computational Advantages. Deduction (Khardon & Roth 94, 97): decide f ⊨ q, where f ∈ CNF and q ∈ MonCNF. Learning to Reason is easy when Reasoning is hard. Learning to Reason is easy when Learning is hard. [Diagram: W, RL, KB(f); W, RL; q, q]

Page 18 L2R: Computational Advantages. Deduction (Khardon & Roth 94, 97): decide f ⊨ q, where f ∈ CNF and q ∈ MonCNF. Learning to Reason is easy when Reasoning is hard. Learning to Reason is easy when Learning is hard. No Magic:  Learn a different representation for f, one that allows efficient reasoning.  Learn a different function, which only approximates f but is sufficient for exact reasoning. Other studies on non-monotonic reasoning as learning, etc. [Diagram: W, RL, KB(f); W, RL; q, q]

Page 19 Learning to Reason [’94-’97]: Relevant? How? A unified framework to study Learning, Knowledge Representation and Reasoning. The goal is to Reason (deduction; abduction - best explanation). Reasoning is not done from a static Knowledge Base but rather done with knowledge that is learned via interaction with the world. Intermediate Representation is important - but only to the extent that it is learnable and it facilitates reasoning. Feedback to learning is given by the reasoning stage. There may not be a need (or even a possibility) to learn the intermediate representation exactly, but only to the extent that it supports Reasoning. [Khardon & Roth JACM97, AAAI94; Roth95, Roth96, Khardon & Roth99; Learning to Plan: Khardon’99]

Page 20 Comprehension 1. Christopher Robin was born in England. 2. Winnie the Pooh is a title of a book. 3. Christopher Robin’s dad was a magician. 4. Christopher Robin must be at least 65 now. A process that maintains and updates a collection of propositions about the state of affairs. (ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.

Page 21 Learning serves to support abstraction  A context sensitive operation that is done at multiple levels  From names (Mr. Robin, Christopher Robin) to Relations (wrote, author) and concepts Learning serves to generate the vocabulary  over which reasoning is possible (Part-of-speech; a subject-of,…) Knowledge Acquisition; tuning; memory,…. Learning in the context of a reasoning system  Training with an Inference Mechanism  Feedback is done at the inference level, not at the single classifier level  Labeling using Inference It’s an Inference Problem Today’s Research Issues What is the role of Learning?

Page 22 This Talk  Integrating Learning and Inference  Historical Perspective  Role of Learning  Global Inference over Classifiers  Semantic Parsing  Multiple levels of processing  Textual Entailment  Summary and Future Directions

Page 23 What I will not talk about: Knowledge Representation. A unified representation that is used  as an input to learning processes, and  as an output of learning processes. Specifically, we use an abstract representation that is centered around a semantic parse (predicate-argument representation), augmented by additional information, and formalized as a hierarchical concept graph (Description Logic inspired): Feature Description Logic [Cumby & Roth ’00, ’02, ’03]. [Concept graph for “Mohammed Atta met with an Iraqi intelligence agent in Prague in April 2001”: meeting, participant, person, organization, affiliation, nationality, name(Iraq), location, city, name(Prague), country, time, date, month(April), year(2001)]

Page 24 Inference and Learning  Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome.  Learned classifiers for different sub-problems  Incorporate classifiers’ information, along with constraints, in making coherent decisions – decisions that respect the local classifiers as well as domain & context specific constraints.  Global inference for the best assignment to all variables of interest. How to induce a predicate argument representation of a sentence. How to use inference methods over learned outcomes. How to use declarative information over/along with learned information.

Page 25 Semantic Role Labeling. I left my pearls to my daughter in my will. [ I ] A0 left [ my pearls ] A1 [ to my daughter ] A2 [ in my will ] AM-LOC. A0: Leaver; A1: Things left; A2: Benefactor; AM-LOC: Location. Special case (structured output problem): here, all the data is available at one time; in general, classifiers might be learned from different sources, at different times, in different contexts. Implications on training paradigms. Overlapping arguments. If A2 is present, A1 must also be present.

Page 26 Problem Setting. Random Variables Y; Conditional Distributions P (learned by classifiers); Constraints C - any Boolean function defined on partial assignments (possibly with weights W). Goal: Find the “best” assignment - the assignment that achieves the highest global accuracy. This is an Integer Programming Problem: Y* = argmax_Y P · Y subject to constraints C (+ W · C). [Diagram: variables y1…y8 with constraints C(y1, y4) and C(y2, y3, y6, y7, y8)]
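As a toy illustration of this setting (my own sketch, not the inference procedure used in the talk), the code below enumerates all 0/1 assignments to a handful of variables, scores each one with per-variable classifier probabilities, and keeps the best assignment that satisfies the Boolean constraints. Variable names, scores, and constraints are all invented.

```python
from itertools import product

# Hypothetical per-variable scores P(y_i = 1) coming from local classifiers.
scores = {"y1": 0.9, "y2": 0.2, "y3": 0.7, "y4": 0.6}

# Boolean constraints C on assignments, e.g. y1 and y4 cannot both be 1,
# and if y2 is on then y3 must also be on.
constraints = [
    lambda a: not (a["y1"] and a["y4"]),
    lambda a: (not a["y2"]) or a["y3"],
]

def assignment_score(a):
    # Score of an assignment: sum of the local scores of the chosen values.
    return sum(scores[v] if a[v] else 1 - scores[v] for v in scores)

best, best_score = None, float("-inf")
for values in product([0, 1], repeat=len(scores)):
    a = dict(zip(scores, values))
    if all(c(a) for c in constraints):          # keep only coherent assignments
        s = assignment_score(a)
        if s > best_score:
            best, best_score = a, s

print(best, best_score)   # best global assignment that respects the constraints
```

Exhaustive enumeration is only feasible for toy problems; the following slides replace it with an integer linear program.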

Page 27 A General Inference Setting: Inference as Optimization [Yih & Roth CoNLL’04] [Punyakanok et al. COLING’04] [Punyakanok et al. IJCAI’04].  Markov Random Field  [standard] Optimization Problem (e.g., Metric Labeling Problems) [Chekuri et al. ’01]  Linear Programming Problems  An Integer Linear Programming (ILP) formulation.  General: works on non-sequential constraint structure  Expressive: can represent many types of declarative constraints  Optimal: finds the optimal solution  Fast: commercial packages are able to quickly solve very large problems (hundreds of variables and constraints; sparsity is important)

Page 28 Semantic Role Labeling (1/2): who did what to whom, when, where, why,… For each verb in a sentence: 1. Identify all constituents that fill a semantic role. 2. Determine their roles: core arguments, e.g., Agent, Patient or Instrument, and their adjuncts, e.g., Locative, Temporal or Manner. I left my pearls to my daughter-in-law in my will. [A0: leaver] [A1: thing left] [A2: benefactor] [AM-LOC]. The pearls which I left to my daughter-in-law are fake. [A0: leaver] [A1: thing left] [A2: benefactor] [R-A1]. The pearls, I said, were left to my daughter-in-law. [A0: sayer] [A1: utterance] [C-A1: utterance].

Page 29 Semantic Role Labeling (2/2). PropBank [Palmer et al. 05] provides a large human-annotated corpus of semantic verb-argument relations.  It adds a layer of generic semantic labels to Penn Tree Bank II.  (Almost) all the labels are on the constituents of the parse trees. Core arguments: A0-A5 and AA  different semantics for each verb  specified in the PropBank Frame files. 13 types of adjuncts labeled as AM-arg  where arg specifies the adjunct type.

Page 30 Algorithmic Approach. Identify argument candidates:  Pruning [Xue & Palmer, EMNLP’04]  Argument Identifier: binary classification (SNoW). Classify argument candidates:  Argument Classifier: multi-class classification (SNoW). Inference:  Use the estimated probability distribution given by the argument classifier  Use structural and linguistic constraints  Infer the optimal global output. [Running example: “I left my nice pearls to her” with bracketed candidate arguments - identify the vocabulary (candidate arguments), then run inference over the (old and new) vocabulary]

Page 31 Argument Identification & Classification. Both the argument identifier and the argument classifier are trained phrase-based classifiers. Features (some examples):  voice, phrase type, head word, path, chunk, chunk pattern, etc. [some make use of a full syntactic parse]. Learning Algorithm - SNoW:  Sparse network of linear functions; weights learned by a regularized Winnow multiplicative update rule.  Probability conversion is done via softmax: p_i = exp{act_i} / Σ_j exp{act_j}. [Example: “I left my nice pearls to her” with bracketed candidate arguments]
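A minimal sketch of this activation-to-probability conversion (not the SNoW implementation itself); the activation values below are invented:

```python
import math

def softmax(activations):
    # p_i = exp(act_i) / sum_j exp(act_j), shifted by the max for numerical stability.
    m = max(activations)
    exps = [math.exp(a - m) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw activations of the argument classifier for one candidate,
# one score per argument label (e.g., A0, A1, A2, AM-LOC).
print(softmax([2.1, 0.3, -1.0, 0.5]))
```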

Page 32 Inference I left my nice pearls to her The output of the argument classifier often violates some constraints, especially when the sentence is long. Finding the best legitimate output is formalized as an optimization problem and solved via Integer Linear Programming. [Punyakanok et. al 04, Roth & Yih 04] Input:  The probability estimation (by the argument classifier)  Structural and linguistic constraints Allows incorporating expressive (non-sequential) constraints on the variables (the arguments types).

Page 33 Integer Linear Programming (ILP). Maximize: [objective, shown as a formula on the slide]. Subject to: [constraints, shown as formulas on the slide].

Page 34 Integer Linear Programming Inference. For each argument a_i, set up a Boolean variable a_{i,t} indicating whether a_i is classified as type t. Goal is to maximize Σ_i Σ_t score(a_i = t) · a_{i,t}, subject to the (linear) constraints. Any Boolean constraint can be encoded as linear constraint(s). If score(a_i = t) = P(a_i = t), the objective is to find the assignment that maximizes the expected number of arguments that are correct and satisfies the constraints.
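A small sketch of this formulation using the open-source PuLP modeler (the slides report solving with Xpress-MP; PuLP is just a convenient stand-in here). The candidate arguments, labels, and scores are invented for illustration:

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

# Hypothetical classifier scores P(a_i = t) for 3 candidate arguments and 3 labels.
labels = ["A0", "A1", "NONE"]
scores = {
    0: {"A0": 0.7, "A1": 0.2, "NONE": 0.1},
    1: {"A0": 0.3, "A1": 0.6, "NONE": 0.1},
    2: {"A0": 0.4, "A1": 0.4, "NONE": 0.2},
}

prob = LpProblem("srl_inference", LpMaximize)

# One 0/1 indicator variable x[i, t] per (candidate, label) pair.
x = {(i, t): LpVariable(f"x_{i}_{t}", cat=LpBinary) for i in scores for t in labels}

# Objective: expected number of correctly labeled arguments.
prob += lpSum(scores[i][t] * x[i, t] for i in scores for t in labels)

# Each candidate gets exactly one label.
for i in scores:
    prob += lpSum(x[i, t] for t in labels) == 1

# Example declarative constraint: at most one A0 in the sentence.
prob += lpSum(x[i, "A0"] for i in scores) <= 1

prob.solve()
print({i: next(t for t in labels if value(x[i, t]) == 1) for i in scores})
```

Any ILP solver that handles binary variables can be used in the same way; the declarative constraints of the following slides are simply added as further inequalities.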

Page 35 Inference. Maximize the expected number correct:  T* = argmax_T Σ_i P(a_i = t_i). Subject to some constraints:  structural and linguistic (e.g., R-A1 ⇒ A1). Solved with Integer Linear Programming. [Example: “I left my nice pearls to her” with three assignments - Independent Max, cost 1.8; Non-Overlapping, cost 1.6; Blue ⇒ Red & N-O, cost 1.4]

Page 36 Constraints (universally quantified rules). No duplicate argument classes: Σ_{a ∈ PotArg} x{a = A0} ≤ 1. R-ARG (if there is an R-ARG phrase, there is an ARG phrase): ∀ a2 ∈ PotArg, Σ_{a ∈ PotArg} x{a = A0} ≥ x{a2 = R-A0}. C-ARG (if there is a C-ARG phrase, there is an ARG phrase before it): ∀ a2 ∈ PotArg, Σ_{a ∈ PotArg, a before a2} x{a = A0} ≥ x{a2 = C-A0}. Many other possible constraints: unique labels; no overlapping or embedding; relations between number of arguments; if the verb is of type A, no argument of type B. Any Boolean rule can be encoded as a linear constraint.
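To make the "any Boolean rule can be encoded as a linear constraint" point concrete, here is a small check (my own illustration, not code from the paper) that the R-ARG-style encoding agrees with its Boolean reading over all 0/1 assignments:

```python
from itertools import product

# Candidates 0..2 can each carry an A0 indicator and an R-A0 indicator.
# Boolean rule: "if any candidate is labeled R-A0, some candidate is labeled A0".
# Linear encoding: for every candidate j,  sum_i x[i, A0] >= x[j, R-A0].

def rule_bool(a0, ra0):
    return (not any(ra0)) or any(a0)

def rule_linear(a0, ra0):
    return all(sum(a0) >= ra0[j] for j in range(len(ra0)))

# Exhaustive check over all 0/1 assignments of 3 A0 and 3 R-A0 indicators.
for a0 in product([0, 1], repeat=3):
    for ra0 in product([0, 1], repeat=3):
        assert rule_bool(a0, ra0) == rule_linear(a0, ra0)

print("The linear encoding matches the Boolean R-ARG rule on all assignments.")
```

In the full ILP, these inequalities are stated over the x{a = t} indicator variables of the previous slides.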

Page 37 This approach produces a very good semantic parser. Top ranked system in CoNLL’05 shared task:  Key difference is the Inference Easy and fast: ~1 Sentence/Second (using Xpress-MP) A lot of room for improvement (additional constraints) Demo available  Significant also in enabling knowledge acquisition So far, shown the use of only declarative (deterministic) constraints. In fact, this approach can be used both with statistical and declarative constraints. Semantic Parsing: Summary I

Page 38 ILP as a Unified Algorithmic Scheme. Consider a common model for sequential inference: HMM/CRF.  Inference in this model is done via the Viterbi Algorithm. Viterbi is a special case of the Linear Programming based inference:  Viterbi is a shortest path problem, which is an LP with a canonical constraint matrix that is totally unimodular; therefore, you get integral solutions for free.  One can now incorporate non-sequential/expressive/declarative constraints by modifying this canonical matrix.  The extension reduces to a polynomial scheme under some conditions (e.g., when constraints are sequential, when the solution space does not change, etc.).  Does not necessarily increase complexity, and very efficient in practice [Roth & Yih, ICML’05]. [Diagram: HMM/CRF chain y1..y5 over observations x1..x5, and the corresponding Viterbi trellis from s to t with states A, B, C]
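For reference, a minimal Viterbi sketch (my illustration, with made-up log-probability scores) for the sequential special case that the ILP view generalizes; the shortest-path/LP formulation described on this slide would recover the same assignment:

```python
# Viterbi over a tiny 3-state tagger; all scores are invented log-probabilities.
states = ["A", "B", "C"]
start = {"A": -0.5, "B": -1.0, "C": -2.0}
trans = {s: {t: (-0.3 if s == t else -1.2) for t in states} for s in states}
emit = [  # one dict per position: log P(observation | state)
    {"A": -0.2, "B": -1.5, "C": -2.0},
    {"A": -1.0, "B": -0.4, "C": -1.8},
    {"A": -1.6, "B": -0.9, "C": -0.3},
]

def viterbi(start, trans, emit, states):
    # best[i][s]: score of the best path ending in state s at position i
    best = [{s: start[s] + emit[0][s] for s in states}]
    back = [{}]
    for i in range(1, len(emit)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[i - 1][p] + trans[p][s]) for p in states),
                key=lambda kv: kv[1],
            )
            best[i][s] = score + emit[i][s]
            back[i][s] = prev
    # Recover the best sequence by following back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for i in range(len(emit) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(viterbi(start, trans, emit, states))
```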

Page 39 An Inference method for the “best explanation”, used here to induce a semantic representation of a sentence.  A general Information Integration framework. Allows expressive constraints  Any Boolean rule can be represented by a set of linear (in)equalities Combining acquired (statistical) constraints with declarative constraints  Start with shortest path matrix and constraints  Add new constraints to the basic integer linear program. Solved using off-the-shelf packages  If the additional constraints don’t change the solution, LP is enough  Otherwise, the computational time depends on sparsity; fast in practice Demo available Integer Linear Programming Inference - Summary

Page 40 This Talk  Integrating Learning and Inference  Historical Perspective  Role of Learning  Global Inference over Classifiers  Semantic Parsing  Multiple levels of processing  Textual Entailment  Summary and Future Directions

Page 41  Global decisions in which several local decisions play a role but there are mutual dependencies on their outcome.  So far, this was a single stage process.  Learn (acquire a new vocabulary) and  Run inference over it to guarantee the coherency of the outcome.  Is that it?  Of course, this isn’t sufficient.  The process of learning and Inference needs to be done in phases. Inference and Learning It’s turtles all the way down…

Page 42 Pipeline. Pipelining is a crude approximation; interactions occur across levels, and downstream decisions often interact with previous decisions. This leads to propagation of errors: occasionally, later-stage problems are easier, but upstream mistakes will not be corrected. There are good reasons for pipelining decisions. Global inference over the outcomes of different levels can be used to break away from this paradigm [between pipeline & fully global]. It allows a flexible way to incorporate linguistic and structural constraints. Vocabulary is generated in phases; left-to-right processing of sentences is also a pipeline process. [Pipeline diagram: Raw Data → POS Tagging → Phrases → Semantic Entities → Relations; also Parsing, WSD, Semantic Role Labeling]

Page 43 Entities and Relations: Information Integration. J.V. Oswald was murdered at JFK after his assassin, K. F. Johns… Identify: [annotations on the sentence: location, person, Kill(X, Y)]. Identify named entities. Identify relations between entities. Exploit mutual dependencies between named entities and relations to yield a coherent global detection. [Roth & Yih, COLING’02; CoNLL’04] Some knowledge (classifiers) may be known in advance; some constraints may be available only at decision time.
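A toy sketch of exploiting the mutual dependency (invented mention names and scores, not the paper's system): the relation classifier's preference for Kill(X, Y) can only be kept if the entity labels are compatible (both arguments must be persons), so the joint assignment is chosen to maximize the total score under that constraint:

```python
from itertools import product

entity_labels = ["person", "location", "organization"]

# Hypothetical local classifier scores for two mentions and one relation.
entity_scores = {
    "E1": {"person": 0.6, "location": 0.3, "organization": 0.1},
    "E2": {"person": 0.4, "location": 0.5, "organization": 0.1},
}
relation_scores = {"Kill": 0.7, "NoRelation": 0.3}   # relation between E1 and E2

best, best_score = None, float("-inf")
for l1, l2, r in product(entity_labels, entity_labels, relation_scores):
    # Coherence constraint: Kill(E1, E2) requires both arguments to be persons.
    if r == "Kill" and not (l1 == "person" and l2 == "person"):
        continue
    score = entity_scores["E1"][l1] + entity_scores["E2"][l2] + relation_scores[r]
    if score > best_score:
        best, best_score = (l1, l2, r), score

print(best, best_score)   # coherent joint assignment with the highest total score
```

With more mentions and relations, this enumeration is replaced by the same ILP machinery used for SRL.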

Page 44 This Talk  Integrating Learning and Inference  Historical Perspective  Role of Learning  Global Inference over Classifiers  Semantic Parsing  Multiple levels of processing  Textual Entailment  Summary and Future Directions

Page 45 Comprehension 1. Christopher Robin was born in England. 2. Winnie the Pooh is a title of a book. 3. Christopher Robin’s dad was a magician. 4. Christopher Robin must be at least 65 now. A process that maintains and updates a collection of propositions about the state of affairs. (ENGLAND, June, 1989) - Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book. He made up a fairy tale land where Chris lived. His friends were animals. There was a bear called Winnie the Pooh. There was also an owl and a young pig, called a piglet. All the animals were stuffed toys that Chris owned. Mr. Robin made them come to life with his words. The places in the story were all near Cotchfield Farm. Winnie the Pooh was written in Children still love to read about Christopher Robin and his animal friends. Most people don't know he is a real person who is grown now. He has written two books of his own. They tell what it is like to be famous.

Page 46 Textual Entailment. Given: Q: Who acquired Overture? Determine: A: Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc last year. [Diagram: “Eyeing the huge market potential, currently led by Google, Yahoo took over search company Overture Services Inc. last year” Entails “Yahoo acquired Overture”; Subsumed by:  Overture is a search company  Google is a search company  ……….  Google owns Overture] By “semantically entailed” we mean: most people would agree that one sentence implies the other. Simply - making plausible inferences. Phrasal verb paraphrasing [Connor & Roth’06]. Entity matching [Li et al., AAAI’04, NAACL’04]. Semantic Role Labeling.

Page 47 Discussing Textual Entailment. Requires an inference process that makes use of a large number of learned (and knowledge-based) operators.  A sound approach for determining whether a statement of interest holds in a given sentence. [Braz et al., AAAI05]  A pair (sentence, hypothesis) is transformed into a simpler pair, in an entailment-preserving manner.  Constrained Optimization formulation, over a large number of learned operators, aimed at the best (simplest) mapping between predicate-argument representations. Inference is purposeful: no canonical representation, but rather reasoning on sentences; the transformation depends on the hypothesis. What is shown next is a proof. At any stage, a large number of operators are entertained; some do not fire, some lead nowhere. This is a path through the optimization process that leads to a justifiable (and explainable) answer.
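A highly simplified sketch of the operator-chaining idea (entirely illustrative; the real system searches over learned operators under a constrained-optimization objective, and its matching step handles negation, modality, and quantifiers, which this toy check ignores). Each operator maps the (sentence, hypothesis) pair to a simpler, entailment-preserving pair; the phrasal-verb table is a hypothetical lookup:

```python
# Each operator takes (sentence, hypothesis) and returns a simplified pair,
# or the pair unchanged if it does not apply.
PHRASAL_VERBS = {"took over": "acquired", "made up their minds": "decided"}

def phrasal_verb_op(s, h):
    for phrase, verb in PHRASAL_VERBS.items():
        s = s.replace(phrase, verb)
    return s, h

def focus_of_attention_op(s, h):
    # Toy version: drop a leading participial clause before the first comma.
    if s.startswith("Eyeing"):
        s = s.split(", ", 1)[1]
    return s, h

def entails(s, h):
    # Toy matching step: every content word of the hypothesis appears in the text.
    return all(w in s.lower().split() for w in h.lower().rstrip(".").split())

pair = ("Eyeing the huge market potential, Yahoo took over Overture last year.",
        "Yahoo acquired Overture")
for op in (phrasal_verb_op, focus_of_attention_op):
    pair = op(*pair)

print(pair[0])
print("Entails?", entails(*pair))
```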

Page 48 Sample Entailment Pair S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. Does ‘T’ follow from ‘S’?

Page 49 S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally made up their minds to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. OPERATOR 1: Phrasal Verb Replace phrasal verbs with an equivalent single word verb

Page 50 S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. They finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. OPERATOR 2: Coreference Resolution Replace pronouns/possessive pronouns with the entity to which they refer

Page 51 S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments, involved supplies of crude or refined oil products. S:Hurricane Katrina petroleum-supply outlook improved somewhat, yesterday, as U.S. and European governments finally reached a consensus. U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. OPERATOR 3: Focus of Attention Remove segments of a sentence that do not appear to be necessary; may allow more accurate annotation of remaining words

Page 52 S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Individual European governments offered supplies of crude or refined oil products. S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Offers by individual European governments involved supplies of crude or refined oil products. OPERATOR 4: Nominalization Promotion Replace a verb that does not express a useful/meaningful relationship with a nominalization in one of its arguments involved supplies … Offers by individual … offered Individual …supplies … Requires semantic role labeling (for noun predicates)

Page 53 S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Individual European governments offered supplies of crude or refined oil products. S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Individual European governments supplied crude or refined oil products. S:U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T:Individual European governments offered supplies of crude or refined oil products. OPERATOR 4: Nominalization Promotion Replace a verb that does not express a useful/meaningful relationship with a nominalization in one of its arguments offered Individual … supplies of crude … supplied Individual …crude …

Page 54 S: U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T: Individual European governments supplied crude or refined oil products. OPERATOR 5: Predicate Embedding Resolution. Replace a verb compound where the first verb may indicate modality or negation with a single verb, marked with a negation/modality attribute: decided (U.S. and …, release 2 million barrels a day, of oil …) becomes released (U.S. and …, 2 million barrels a day, of oil …). S: U.S. and European governments finally decided to release 2 million barrels a day, of oil and refined products, from their reserves. T: Individual European governments supplied crude or refined oil products. S: U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves. T: Individual European governments supplied crude or refined oil products. ‘decided’ (almost) does not change the meaning of the embedded verb. But what if the embedding verb had been ‘refused’? ENTAILMENT SHOULD NOT SUCCEED: refused (U.S. and …, release 2 million barrels a day, of oil …) becomes released (U.S. and …, 2 million barrels a day, of oil …) with a negation attribute. Requires semantic role labeling (for noun predicates).

Page 55 S:U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves. T:Individual European governments supplied crude or refined oil products. OPERATOR 6: Predicate Matching System matches PREDICATES and their ARGUMENTS -- accounts for monotonicity, modality, negation, and quantifiers S:U.S. and European governments finally released 2 million barrels a day, of oil and refined products, from their reserves. T:Individual European governments supplied crude or refined oil products. ENTAILMENT SUCCEEDS Requires lexical abstraction

Page 56 Discussed a general paradigm for learning and inference in the context of natural language understanding tasks Did not discuss: Knowledge Representation How to train?  Key insight – what to learn is driven by global decisions.  Luckily: # of components is much smaller than # of decisions. Emphasis should be on  Learn Locally and make use globally (via global inference) [Punyakanok et. al IJCAI’05]  Ability to make use of domain & constraints to drive supervision [Klementiev & Roth, ACL’06] Conclusions

Page 57 Discussed a general paradigm for learning and inference in the context of natural language understanding tasks  Incorporate classifiers’ information, along with expressive constraints, within an inference framework for the best explanation. We can now incorporate many good old ideas. Learning allows us to develop the right vocabulary, and supports appropriate abstractions so that we can study natural language understanding as a problem of reasoning. Room for new research on reasoning patterns in NLP Conclusions (2)

Page 58 Acknowledgement. Many of my students contributed significantly to this line of work: Vasin Punyakanok, Scott Yih, Mark Sammons, Xin Li, Dav Zimak, Rodrigo de Salvo Braz, Chad Cumby, Yair Even-Zohar, Michael Connor, Kevin Small, Alex Klementiev. Funding:  ARDA, under the AQUAINT program  NSF: ITR IIS, ITR IIS; ITR IIS  A DOI grant under the Reflex program  DASH Optimization

Page 59 Questions? Thank you