Introductory Notes about Constrained Conditional Models and ILP for NLP
Dan Roth, Department of Computer Science, University of Illinois at Urbana-Champaign
CS546-11 Notes

Goal: Learning and Inference

Global decisions in which several local decisions play a role, but there are mutual dependencies on their outcome.
- E.g., Structured Output Problems: multiple dependent output variables.
- (Learned) models/classifiers for different sub-problems.
  - In some cases, not all local models can be learned simultaneously.
  - Engineering issues: we don't have data annotated with all aspects of the problem; distributed development.
  - In these cases, constraints may appear only at evaluation time.
  - In other cases, we may prefer to learn independent models.
- Incorporate the models' information, along with prior knowledge/constraints, in making coherent decisions: decisions that respect the local models as well as domain- and context-specific knowledge/constraints.

ILP & Constrained Conditional Models (CCMs)

Making global decisions in which several local interdependent decisions play a role.
Informally: everything that has to do with constraints (and learning models).
Formally, we typically make decisions based on models such as

    argmax_y w^T φ(x, y)

CCMs (specifically, ILP formulations) make decisions based on models such as

    argmax_y w^T φ(x, y) + Σ_{c ∈ C} ρ_c d(y, 1_C)

We do not define the learning method, but we'll discuss it and make suggestions.
CCMs make predictions in the presence of / guided by constraints.

Issues to attend to:
- While we formulate the problem as an ILP problem, inference can be done in multiple ways: search, sampling, dynamic programming, SAT, ILP.
- The focus is on joint global inference.
- Learning may or may not be joint; decomposing models is often beneficial.
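
A minimal sketch of what this objective computes, using brute-force search over a toy label space instead of an ILP solver; the scores, the single constraint, and the penalty ρ are illustrative assumptions (the penalty term acts as a cost subtracted for violating the constraint):

```python
# Brute-force CCM decoding sketch (toy example, not an ILP formulation).
from itertools import product

LABELS = ["A", "B", "O"]
weights = {("x1", "A"): 1.2, ("x1", "B"): 0.3, ("x1", "O"): 0.1,
           ("x2", "A"): 0.2, ("x2", "B"): 1.0, ("x2", "O"): 0.4}

def model_score(tokens, y):
    """w^T phi(x, y): phi decomposes into per-token indicator features here."""
    return sum(weights.get((tok, lab), 0.0) for tok, lab in zip(tokens, y))

def violation(y):
    """d(y, 1_C): degree of violation of 'A and B labels cannot co-occur'."""
    return 1.0 if ("A" in y and "B" in y) else 0.0

def ccm_decode(tokens, rho=2.0):
    """argmax_y  w^T phi(x, y) - rho * d(y, 1_C)  over all label sequences."""
    return max(product(LABELS, repeat=len(tokens)),
               key=lambda y: model_score(tokens, y) - rho * violation(y))

print(ccm_decode(["x1", "x2"]))   # ('A', 'O'): the constraint overrides ('A', 'B')
```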

Constraint Driven Learning: The Space of Problems

How to solve? [Inference]
- An Integer Linear Program
- Exact (ILP packages) or approximate solutions

How to train? [Learning]
- Training is learning the objective function [a lot of work on this]
- Decouple? Joint Learning vs. Joint Inference

Difficulty of annotating data:
- Indirect Supervision
- Constraint Driven Learning
- Semi-supervised Learning
- New Applications

Introductory Examples: Constrained Conditional Models (aka ILP for NLP)

CCMs can be viewed as a general interface to easily combine domain knowledge with data-driven statistical models.

Formulate NLP problems as ILP problems (inference may be done otherwise):
1. Sequence tagging (HMM/CRF + global constraints)
2. Sentence compression (language model + global constraints)
3. SRL (independent classifiers + global constraints)

Sequential prediction, HMM/CRF based: argmax Σ λ_{ij} x_{ij}
  Linguistic constraint: cannot have both A states and B states in an output sequence.

Sentence compression/summarization, language-model based: argmax Σ λ_{ijk} x_{ijk}
  Linguistic constraints: if a modifier is chosen, include its head; if a verb is chosen, include its arguments.
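
To make example 1 concrete, here is a hedged sketch of the sequential prediction problem posed as a small ILP, using the PuLP package as an assumed solver interface (any ILP package would do); the scores λ and the two-token input are toy values:

```python
# Sequence tagging as an ILP with one global constraint (toy sketch).
import pulp  # pip install pulp

labels = ["A", "B", "O"]
scores = [{"A": 1.2, "B": 0.3, "O": 0.1},   # lambda_{i,label} for token i
          {"A": 0.2, "B": 1.0, "O": 0.4}]

prob = pulp.LpProblem("sequence_tagging", pulp.LpMaximize)
x = {(i, l): pulp.LpVariable(f"x_{i}_{l}", cat="Binary")
     for i in range(len(scores)) for l in labels}

# Objective: argmax sum_{i,l} lambda_{i,l} * x_{i,l}
prob += pulp.lpSum(scores[i][l] * x[i, l] for i in range(len(scores)) for l in labels)

# Each token gets exactly one label.
for i in range(len(scores)):
    prob += pulp.lpSum(x[i, l] for l in labels) == 1

# Global constraint: A states and B states cannot both appear in the output,
# encoded with auxiliary indicators usedA, usedB and usedA + usedB <= 1.
usedA = pulp.LpVariable("usedA", cat="Binary")
usedB = pulp.LpVariable("usedB", cat="Binary")
for i in range(len(scores)):
    prob += x[i, "A"] <= usedA
    prob += x[i, "B"] <= usedB
prob += usedA + usedB <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([next(l for l in labels if x[i, l].value() == 1) for i in range(len(scores))])
```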

Next Few Meetings (Part I): How to Pose the Inference Problem

- Introduction to ILP
- Posing NLP problems as ILP problems:
  1. Sequence tagging (HMM/CRF + global constraints)
  2. SRL (independent classifiers + global constraints)
  3. Sentence compression (language model + global constraints)
- Detailed examples:
  1. Co-reference
  2. A bunch more ...
- Inference algorithms (ILP & search)
  - Compiling knowledge to linear inequalities
  - Inference algorithms
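
As a preview of "compiling knowledge to linear inequalities", these are the standard encodings of common Boolean constraints over 0/1 indicator variables; for instance, "if a modifier is chosen, include its head" is just the implication row:

```latex
% Boolean constraints over indicators y_i \in \{0,1\} as linear inequalities
\begin{align*}
  y_i \Rightarrow y_j                  &\;\;\leadsto\;\;  y_i \le y_j                 && \text{(include head if modifier chosen)}\\
  \neg(y_i \wedge y_j)                 &\;\;\leadsto\;\;  y_i + y_j \le 1             && \text{(mutual exclusion)}\\
  y_i \vee y_j \vee y_k                &\;\;\leadsto\;\;  y_i + y_j + y_k \ge 1       && \text{(at least one)}\\
  \text{exactly one of } y_1,\dots,y_k &\;\;\leadsto\;\;  \textstyle\sum_{i=1}^{k} y_i = 1 && \text{(unique label)}
\end{align*}
```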

Next Few Meetings (Part II): Training Issues

- Learning models
  - Independently of constraints (L+I); jointly with constraints (IBT)
  - Decomposed into simpler models
- Learning constraints' penalties
  - Independently of learning the model
  - Jointly, along with learning the model
- Dealing with lack of supervision
  - Constraints Driven Semi-Supervised Learning (CODL)
  - Indirect Supervision
  - Learning Constrained Latent Representations
- Markov Logic Networks: relations and differences
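
A minimal, assumption-laden sketch of the L+I vs. IBT distinction, using a structured-perceptron-style update in which constrained inference runs inside the training loop (IBT); the feature map, candidate set, and data are toy stand-ins:

```python
# IBT-style structured perceptron sketch (toy features and data).
import numpy as np

def phi(x, y):
    """Toy joint feature map: counts of 'A' and 'B' labels in the output."""
    return np.array([y.count("A"), y.count("B")], dtype=float)

def decode_with_constraints(w, x, feasible_ys):
    """Constrained inference: argmax of w . phi(x, y) over feasible outputs only."""
    return max(feasible_ys, key=lambda y: float(w @ phi(x, y)))

def ibt_train(data, feasible_ys, epochs=5):
    """IBT: constrained inference runs inside training, so weights are updated
    against the constrained prediction.  L+I would instead train the local
    models ignoring the constraints and apply them only at prediction time."""
    w = np.zeros(2)
    for _ in range(epochs):
        for x, y_gold in data:
            y_pred = decode_with_constraints(w, x, feasible_ys)
            if y_pred != y_gold:
                w += phi(x, y_gold) - phi(x, y_pred)   # perceptron update
    return w

data = [(["tok1", "tok2"], ("A", "A"))]
feasible = [("B", "B"), ("A", "B"), ("A", "A")]   # outputs allowed by the constraints
print(ibt_train(data, feasible))                  # learns to prefer the gold structure
```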

Summary: Constrained Conditional Models

[Figure: output variables y1..y8 shown twice, once as a Conditional (Markov) Random Field and once with a constraints network overlaid.]

y* = argmax_y Σ_i w_i φ_i(x, y) − Σ_i ρ_i d_{C_i}(x, y)

- The first term is a linear objective function; often φ(x, y) will be local functions, or φ(x, y) = φ(x).
- The second term adds expressive constraints over output variables: soft, weighted constraints, specified declaratively as FOL formulae. For example, the first argument of the born_in relation must be a person, and the second argument is a location.

Clearly, there is a joint probability distribution that represents this mixed model. We would like to:
- Learn a simple model or several simple models
- Make decisions with respect to a complex model

Key idea: regularize in the posterior rather than in the prior. This is the key difference from MLNs, which provide a concise definition of a model, but of the whole joint one.
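
For illustration, a toy sketch (the representation of the output y is an assumption) of how the declaratively stated born_in constraint can be scored as a violation count d_C, which the objective above then weights by ρ and subtracts:

```python
# Scoring a soft, declaratively specified constraint as a violation count (toy sketch).
def born_in_violations(y):
    """d_C(x, y): number of born_in(a, b) facts whose argument types are wrong.
    `y` maps mentions to predicted types and lists predicted relations."""
    violations = 0
    for rel, arg1, arg2 in y["relations"]:
        if rel == "born_in":
            if y["types"].get(arg1) != "PERSON":
                violations += 1
            if y["types"].get(arg2) != "LOCATION":
                violations += 1
    return violations

y = {"types": {"m1": "PERSON", "m2": "ORGANIZATION"},
     "relations": [("born_in", "m1", "m2")]}
print(born_in_violations(y))   # 1: the second argument is not a location
# In the CCM objective, this count is weighted by rho_i and subtracted from the model score.
```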