A Fast Finite-state Relaxation Method for Enforcing Global Constraints on Sequence Decoding Roy Tromble & Jason Eisner Johns Hopkins University.


Seminar – Friday, April 1. Speaker: Monty Hall. Location: Auditorium #1. “Let’s Make a Dilemma”: Monty Hall will host a discussion of his famous paradox.
We know what the labels should look like!
Agreement:
–Named Entity Recognition (Finkel et al., ACL 2005)
–Seminar announcements (Finkel et al., ACL 2005)
Label structure:
–Bibliography parsing (Peng & McCallum, HLT-NAACL 2004)
–Semantic Role Labeling (Roth & Yih, ICML 2005)
*One role per string
*One string per role

[Figure: sequence modeling quality vs. decoding runtime for local models, global constraints, and finite-state constraint relaxation.]
Exploit the quality of the local models!

Semantic Role Labeling
Label each argument to a verb
–Six core argument types (A0-A5)
CoNLL-2004 shared task
–Penn Treebank section 20
–4305 propositions
Follow Roth & Yih (ICML 2005)
Example: Sales for the quarter rose to $ 1.63 billion from $ 1.47 billion.
Argument spans: A1 A4 A3
Labels: A1 A1 A1 O O A4 O A3 O

Encoding constraints as finite-state automata

Roth & Yih’s constraints as FSAs: NO DUPLICATE ARGUMENTS
Each argument type (A0, A1, ...) can label at most one sub-sequence of the input.
[^A0]*A0*[^A0]*
[^A1]*A1*[^A1]*
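To make the three-state automaton behind each of these regular expressions concrete, here is a minimal Python sketch. It is an illustration, not the FSA toolkit used in the talk; the function names `accepts_at_most_one_run` and `no_duplicate_arguments` are hypothetical, and the label sequence is the example from the Semantic Role Labeling slide.

```python
def accepts_at_most_one_run(labels, target):
    """Simulate the DFA for [^T]*T*[^T]* over a list of labels.

    States: 0 = before the run of `target`, 1 = inside the run,
    2 = after the run. Seeing `target` again in state 2 rejects.
    """
    state = 0
    for label in labels:
        if label == target:
            if state == 2:
                return False  # a second, separate run of `target`
            state = 1
        elif state == 1:
            state = 2  # the run of `target` has just ended
    return True


def no_duplicate_arguments(labels, arg_types):
    """Conjunction of one such constraint per argument type."""
    return all(accepts_at_most_one_run(labels, a) for a in arg_types)


ARG_TYPES = ["A0", "A1", "A2", "A3", "A4", "A5"]
example = "A1 A1 A1 O O A4 O A3 O".split()
print(no_duplicate_arguments(example, ARG_TYPES))
print(no_duplicate_arguments(["A1", "O", "A1"], ARG_TYPES))
```

The first sequence satisfies every constraint; the second uses A1 for two separate sub-sequences and is rejected.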

Roth & Yih’s constraints as FSAs: AT LEAST ONE ARGUMENT
The label sequence must contain at least one instance that is not O.
O*[^O]?*
Regular expressions on any sequences: grep for sequence models

Roth & Yih’s constraints as FSAs: DISALLOW ARGUMENTS
Only allow argument types that are compatible with the proposition’s verb.

Roth & Yih’s constraints as FSAs: KNOWN VERB POSITION
The proposition’s verb must be labeled O.

Roth & Yih’s constraints as FSAs: ARGUMENT CANDIDATES
Certain sub-sequences must receive a single label.
Any constraints on bounded-length sequences

Roth & Yih’s local model as a lattice: “soft constraints” or “features”. Unigram model!

A brute-force FSA decoder: Sentence → Local model → Intersect (with all Global constraints) → Decode → Labeling

NO DUPLICATE A0

NO DUPLICATE A0, A1

NO DUPLICATE A0, A1, A2

NO DUPLICATE ARGUMENTS: Any approach would blow up in the worst case! Satisfying global constraints is NP-hard.
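The growth these slides illustrate can be reproduced with a small experiment. The sketch below is an assumption-laden illustration rather than the talk's own measurement: it counts the live states in the product of k "no duplicate" automata (one per argument type), ignoring the sentence lattice itself, and the helper names `step` and `reachable_product_states` are hypothetical.

```python
from collections import deque


def step(state, label, target):
    """One 'no duplicate <target>' DFA: 0 = before run, 1 = in run, 2 = after run."""
    if label == target:
        return None if state == 2 else 1  # None marks a rejected (dead) path
    return 2 if state == 1 else state


def reachable_product_states(arg_types):
    """Count live states in the product of one DFA per argument type."""
    alphabet = list(arg_types) + ["O"]
    start = tuple(0 for _ in arg_types)
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over product states
        states = queue.popleft()
        for label in alphabet:
            nxt = tuple(step(s, label, t) for s, t in zip(states, arg_types))
            if None not in nxt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


for k in range(1, 7):
    print(k, reachable_product_states([f"A{i}" for i in range(k)]))
```

Each added argument type more than doubles the number of product states, and intersecting with the sentence lattice multiplies that by the number of positions, which is why intersecting every constraint up front becomes expensive.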

Handling an NP-hard problem
Roth & Yih (ICML 2005):
Express path decoding and global constraints as an integer linear program (ILP).
Apply ILP solver:
–Relax ILP to (real-valued) LP.
–Apply polynomial-time LP solver.
–Branch and bound to find optimal integer solution.

The ILP solver doesn’t know it’s labeling sequences
Path constraints:
–State 0: outflow ≤ 1; State 3: inflow ≤ 1
–States 1 & 2: outflow = inflow
At least one argument:
–Arcs labeled O: flow ≤ 1

Maybe we can fix the brute-force decoder?

Local model usually violated no constraints

Most constraints were rarely violated

Finite-state constraint relaxation
Local models already capture much structure. Relax the constraints instead!
Find the best path using a linear decoding algorithm.
Apply only those global constraints that the path violates.

Brute-force algorithm: Sentence → Local model → Intersect (with all Global constraints) → Decode → Labeling

Constraint relaxation algorithm: Sentence → Local model → Decode → Test: any violated constraints? If no, output the Labeling (Optimal!). If yes, Intersect only the violated Global constraints (among C1, C2, C3) and Decode again. Some constraints are never intersected!
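The loop on this slide can be sketched in Python under some simplifying assumptions: the local model is a unigram lattice given as one `{label: score}` dict per position, each constraint is a small DFA written as a `(step, start, accept)` triple, and all violated constraints are added in each round (one of several possible policies). All names here are hypothetical; this is not the FSA toolkit used in the talk.

```python
def make_no_dup(target):
    """DFA for 'target labels at most one sub-sequence'; None = dead state."""
    def step(state, label):
        if label == target:
            return None if state == 2 else 1
        return 2 if state == 1 else state
    return step, 0, lambda s: True


def make_at_least_one(outside="O"):
    """DFA for 'at least one label is not O'."""
    def step(state, label):
        return 1 if label != outside else state
    return step, 0, lambda s: s == 1


def violates(dfa, labels):
    step, state, accept = dfa
    for label in labels:
        state = step(state, label)
        if state is None:
            return True
    return not accept(state)


def decode(scores, active):
    """Best path through the lattice intersected with the active DFAs."""
    best = {tuple(d[1] for d in active): (0.0, [])}  # product state -> (score, labels)
    for position in scores:
        nxt = {}
        for states, (total, path) in best.items():
            for label, s in position.items():
                new = tuple(d[0](st, label) for d, st in zip(active, states))
                if None in new:
                    continue  # some active constraint rejects this path
                if new not in nxt or total + s > nxt[new][0]:
                    nxt[new] = (total + s, path + [label])
        best = nxt
    finals = [v for st, v in best.items()
              if all(d[2](x) for d, x in zip(active, st))]
    return max(finals, key=lambda v: v[0])[1] if finals else None


def relaxed_decode(scores, constraints):
    """Decode, test, and intersect only the constraints the best path violates."""
    active = []
    while True:
        labels = decode(scores, active)
        if labels is None:
            return None, active  # the constraints cannot all be satisfied
        violated = [c for c in constraints if violates(c, labels)]
        if not violated:
            return labels, active  # feasible for all constraints, so optimal
        active.extend(violated)


scores = [{"A0": 2.0, "A1": 1.0, "O": 0.0},
          {"A0": 0.0, "A1": 0.5, "O": 1.0},
          {"A0": 2.0, "A1": 1.5, "O": 0.0}]
constraints = [make_no_dup("A0"), make_no_dup("A1"), make_at_least_one()]
labels, used = relaxed_decode(scores, constraints)
print(labels, len(used))
```

In this toy lattice the unconstrained best path is A0 O A0, which violates only the A0 constraint; after intersecting that one automaton the best path satisfies all three constraints, so the other two are never intersected. The result is optimal because the relaxed problem's best path happens to be feasible for the full problem.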

Finite-state constraint relaxation is faster than the ILP solver
State-of-the-art implementations:
–Xpress-MP for ILP,
–FSA (Kanthak & Ney, ACL 2004) for constraint relaxation.
Why?

No sentences required more than a few iterations. Many take one iteration even though two constraints were violated.

Buy one, get one free
Argument spans: A1 A4 A3 A1
Sales for the quarter rose to $ 1.63 billion from $ 1.47 billion.

Lattices remained small
[Figure panels: arcs at each iteration for examples that required 5 intersections; arcs in brute force lattice for examples that required 5 intersections.]

Take-home message
Global constraints aren’t usually doing that much work for you:
–Typical examples violate only a small number using local models.
They shouldn’t have to slow you down so much, even though they’re NP-hard in the worst case:
–Figure out dynamically which ones need to be applied.

Future work
–General soft constraints (We discuss binary soft constraints in the paper.)
–Choose order to test and apply constraints, e.g. by reinforcement learning.
–k-best decoding

Thanks to Scott Yih for providing both data and runtime, and to Stephan Kanthak for FSA.