Introductory Notes about Constrained Conditional Models and ILP for NLP
Dan Roth
Department of Computer Science, University of Illinois at Urbana-Champaign
Notes CS546-11
Goal: Learning and Inference
- Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcome.
  - E.g., structured output problems: multiple dependent output variables
- (Learned) models/classifiers for different sub-problems
  - In some cases, not all local models can be learned simultaneously
    - Engineering issues: we don't have data annotated with all aspects of the problem; distributed development
    - In these cases, constraints may appear only at evaluation time
  - In other cases, we may prefer to learn independent models
- Incorporate models' information, along with prior knowledge/constraints, in making coherent decisions: decisions that respect the local models as well as domain- and context-specific knowledge/constraints.
ILP & Constrained Conditional Models (CCMs)
- Making global decisions in which several local interdependent decisions play a role.
- Informally: everything that has to do with constraints (and learning models)
- Formally:
  - We typically make decisions based on models such as:
      $\arg\max_y \; w^T \phi(x, y)$
  - CCMs (specifically, ILP formulations) make decisions based on models such as:
      $\arg\max_y \; w^T \phi(x, y) + \sum_{c \in C} \rho_c \, d(y, 1_C)$
  - We do not define the learning method, but we'll discuss it and make suggestions
  - CCMs make predictions in the presence of / guided by constraints
- Issues to attend to:
  - While we formulate the problem as an ILP problem, inference can be done in multiple ways: search, sampling, dynamic programming, SAT, ILP
  - The focus is on joint global inference
  - Learning may or may not be joint; decomposing models is often beneficial
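The CCM objective above can be illustrated with a minimal sketch that performs inference by exhaustive search, one of the inference options the slide lists. The label set, the local scores, and the penalty value are hypothetical; the sketch assumes that $w^T \phi(x, y)$ decomposes into per-position label scores, that $d(y, 1_C)$ is a 0/1 violation indicator, and that $\rho_c$ is negative so each violation lowers the score.

```python
from itertools import product

LABELS = ("A", "B", "O")  # hypothetical label set


def local_score(local_scores, y):
    """Stand-in for w^T phi(x, y): sum of per-position label scores (assumption)."""
    return sum(local_scores[i][label] for i, label in enumerate(y))


def violation_a_and_b(y):
    """d(y, 1_C) as a 0/1 indicator: 1 if the sequence uses both A and B."""
    return 1 if ("A" in y and "B" in y) else 0


def ccm_argmax(local_scores, constraints):
    """Exhaustive search for argmax_y [ local score + sum_c rho_c * d_c(y) ].

    `constraints` is a list of (rho_c, d_c) pairs; rho_c is assumed negative.
    """
    best_y, best_val = None, float("-inf")
    for y in product(LABELS, repeat=len(local_scores)):
        val = local_score(local_scores, y) + sum(rho * d(y) for rho, d in constraints)
        if val > best_val:
            best_y, best_val = y, val
    return best_y, best_val


# Hypothetical local model scores for a three-token input.
scores = [
    {"A": 2.0, "B": 0.5, "O": 1.0},
    {"A": 0.2, "B": 1.5, "O": 1.0},
    {"A": 1.0, "B": 0.3, "O": 0.8},
]

unconstrained = ccm_argmax(scores, [])
constrained = ccm_argmax(scores, [(-10.0, violation_a_and_b)])
```

Without the constraint the best sequence is A, B, A, which mixes A and B states; with the penalty the search returns A, O, A instead. Exhaustive search is exponential in the sequence length, so it only serves to make the objective concrete.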
Constraint Driven Learning
The Space of Problems
- Examples
- How to solve? [Inference]
  - An Integer Linear Program
  - Exact (ILP packages) or approximate solutions
- How to train? [Learning]
  - Training is learning the objective function [a lot of work on this]
  - Decouple? Joint Learning vs. Joint Inference
- Difficulty of Annotating Data
  - Indirect Supervision
  - Constraint Driven Learning
  - Semi-supervised Learning
- New Applications
Introductory Examples: Constrained Conditional Models (aka ILP for NLP)
- CCMs can be viewed as a general interface to easily combine domain knowledge with data-driven statistical models
- Formulate NLP problems as ILP problems (inference may be done otherwise)
  1. Sequence tagging (HMM/CRF + global constraints)
  2. Sentence compression (language model + global constraints)
  3. SRL (independent classifiers + global constraints)
- Sequential Prediction
  - HMM/CRF based: $\arg\max \sum \lambda_{ij} x_{ij}$
  - Linguistic constraints: cannot have both A states and B states in an output sequence.
- Sentence Compression/Summarization
  - Language model based: $\arg\max \sum \lambda_{ijk} x_{ijk}$
  - Linguistic constraints: if a modifier is chosen, include its head; if a verb is chosen, include its arguments.
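The sentence compression constraints can be written as linear inequalities over 0/1 keep-variables: "if a modifier is chosen, include its head" becomes x_mod - x_head <= 0, and likewise for a verb and its arguments. The sketch below is a simplified illustration, not the language-model formulation on the slide: it uses one hypothetical score per word instead of the trigram variables x_ijk, a made-up sentence with made-up dependencies, and brute-force enumeration in place of an ILP solver.

```python
from itertools import product

WORDS = ["the", "old", "man", "bought", "a", "car"]
# Hypothetical per-word keep scores (a stand-in for the lambda coefficients).
SCORE = [-3, 4, -2, 10, -4, -5]


def implies(i, j):
    """Encode 'x_i = 1 implies x_j = 1' as x_i - x_j <= 0.

    Returned as ({index: coefficient}, bound), meaning sum(coeff * x) <= bound.
    """
    return ({i: 1, j: -1}, 0)


# Assumed dependencies: "old" modifies "man"; "bought" has arguments "man" and "car".
CONSTRAINTS = [implies(1, 2), implies(3, 2), implies(3, 5)]


def feasible(x, constraints):
    """Check every linear inequality against the 0/1 assignment x."""
    return all(
        sum(coeff * x[i] for i, coeff in coeffs.items()) <= bound
        for coeffs, bound in constraints
    )


def solve(scores, constraints):
    """Brute-force 0-1 program: maximize sum(score_i * x_i) over feasible x."""
    candidates = (
        x for x in product((0, 1), repeat=len(scores)) if feasible(x, constraints)
    )
    return max(candidates, key=lambda x: sum(s * xi for s, xi in zip(scores, x)))


def kept(x):
    """Words selected by the assignment x."""
    return [w for w, xi in zip(WORDS, x) if xi]


free_choice = kept(solve(SCORE, []))
constrained_choice = kept(solve(SCORE, CONSTRAINTS))
```

With these scores the unconstrained solution keeps only "old" and "bought", which leaves a modifier without its head and a verb without its arguments; the constrained solution keeps "old man bought car".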
Next Few Meetings (Part I): How to pose the inference problem
- Introduction to ILP
- Posing NLP problems as ILP problems
  1. Sequence tagging (HMM/CRF + global constraints)
  2. SRL (independent classifiers + global constraints)
  3. Sentence compression (language model + global constraints)
- Detailed examples
  1. Co-reference
  2. A bunch more ...
- Inference Algorithms (ILP & Search)
  - Compiling knowledge to linear inequalities
  - Inference algorithms
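As a small preview of compiling knowledge to linear inequalities, the sketch below pairs a few Boolean formulas over two 0/1 variables with standard linear encodings and checks by truth table that each encoding accepts exactly the assignments that satisfy the formula. The rule names and the data layout are illustrative choices, not notation from these notes.

```python
from itertools import product

# Each rule: (Boolean formula, list of inequalities).
# An inequality (coeffs, bound) means coeffs[0]*a + coeffs[1]*b <= bound.
RULES = {
    "a implies b": (lambda a, b: (not a) or b, [((1, -1), 0)]),      # a - b <= 0
    "a or b": (lambda a, b: a or b, [((-1, -1), -1)]),               # a + b >= 1
    "not (a and b)": (lambda a, b: not (a and b), [((1, 1), 1)]),    # a + b <= 1
    "a iff b": (lambda a, b: a == b, [((1, -1), 0), ((-1, 1), 0)]),  # a - b = 0
}


def satisfies(x, inequalities):
    """True if the 0/1 assignment x satisfies every linear inequality."""
    return all(
        sum(c * v for c, v in zip(coeffs, x)) <= bound
        for coeffs, bound in inequalities
    )


def encoding_is_exact(formula, inequalities):
    """Truth-table check: inequalities hold exactly when the formula is true."""
    return all(
        bool(formula(*x)) == satisfies(x, inequalities)
        for x in product((0, 1), repeat=2)
    )


all_exact = all(encoding_is_exact(f, ineqs) for f, ineqs in RULES.values())
```

The same check rejects a wrong encoding: pairing "a implies b" with a + b <= 1 fails on the assignment a = 1, b = 1.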
Next Few Meetings (Part II): Training Issues
- Learning models
  - Independently of constraints (L+I); jointly with constraints (IBT)
  - Decomposed to simpler models
- Learning constraints' penalties
  - Independently of learning the model
  - Jointly, along with learning the model
- Dealing with lack of supervision
  - Constraints Driven Semi-Supervised learning (CODL)
  - Indirect Supervision
  - Learning Constrained Latent Representations
- Markov Logic Networks
  - Relations and Differences
Summary: Constrained Conditional Models
[Figure: a Conditional Markov Random Field and a Constraints Network, each drawn over the output variables y1, ..., y8]
$y^* = \arg\max_y \sum_i w_i \phi_i(x; y) - \sum_i \rho_i \, d_C(x, y)$
- Conditional Markov Random Field (first term)
  - Linear objective functions
  - Often $\phi(x, y)$ will be local functions, or $\phi(x, y) = \phi(x)$
- Constraints Network (second term)
  - Expressive constraints over output variables
  - Soft, weighted constraints
  - Specified declaratively as FOL formulae
  - For example, the first argument of the born_in relation must be a person, and the second argument must be a location.
- Clearly, there is a joint probability distribution that represents this mixed model.
- We would like to:
  - Learn a simple model or several simple models
  - Make decisions with respect to a complex model: regularize in the posterior rather than in the prior.
- Key difference from MLNs, which provide a concise definition of a model, but the whole joint one.
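The born_in example can be turned into a small sketch of deciding with several simple models plus a soft constraint. Everything numeric here is hypothetical: the three score tables stand in for independently learned entity and relation classifiers, d_C counts violated argument-type conditions, and the decision maximizes the summed local scores minus rho times d_C by enumeration.

```python
from itertools import product

ENTITY_LABELS = ("person", "location")
RELATION_LABELS = ("born_in", "no_relation")


def violations(e1, e2, rel):
    """d_C: number of violated argument-type conditions of born_in(e1, e2)."""
    if rel != "born_in":
        return 0
    return int(e1 != "person") + int(e2 != "location")


def decide(s_e1, s_e2, s_rel, rho):
    """argmax over (e1, e2, rel) of local scores minus rho * d_C."""
    def objective(y):
        e1, e2, rel = y
        return s_e1[e1] + s_e2[e2] + s_rel[rel] - rho * violations(e1, e2, rel)

    return max(product(ENTITY_LABELS, ENTITY_LABELS, RELATION_LABELS), key=objective)


# Hypothetical scores: the first entity classifier wrongly prefers "location".
s_e1 = {"person": 1, "location": 3}
s_e2 = {"person": 0, "location": 4}
s_rel = {"born_in": 5, "no_relation": 1}

no_penalty = decide(s_e1, s_e2, s_rel, rho=0)
weak_penalty = decide(s_e1, s_e2, s_rel, rho=1)
strong_penalty = decide(s_e1, s_e2, s_rel, rho=5)
```

With rho = 0 or rho = 1 the decision keeps the inconsistent labeling (location, location, born_in); with rho = 5 the penalty outweighs the local preference and the first entity is relabeled as a person. The local models are never retrained; only the joint decision changes.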