A Fast Finite-state Relaxation Method for Enforcing Global Constraints on Sequence Decoding Roy Tromble & Jason Eisner Johns Hopkins University
Seminar – Friday, April 1
Speaker: Monty Hall
Location: Auditorium #1
“Let’s Make a Dilemma”
Monty Hall will host a discussion of his famous paradox.

We know what the labels should look like!
Agreement:
– Named Entity Recognition (Finkel et al., ACL 2005)
– Seminar announcements (Finkel et al., ACL 2005)
Label structure:
– Bibliography parsing (Peng & McCallum, HLT-NAACL 2004)
– Semantic Role Labeling (Roth & Yih, ICML 2005)
  *One role per string  *One string per role
[Chart: sequence modeling quality vs. decoding runtime, positioning local models, global constraints, and finite-state constraint relaxation]
Exploit the quality of the local models!
Semantic Role Labeling
Label each argument to a verb
– Six core argument types (A0–A5)
CoNLL-2004 shared task
– Penn Treebank section 20
– 4305 propositions
Follow Roth & Yih (ICML 2005)
Example: [A1 Sales for the quarter] rose [A4 to $1.63 billion] [A3 from $1.47 billion].
Encoding constraints as finite-state automata
Roth & Yih’s constraints as FSAs
NO DUPLICATE ARGUMENTS
Each argument type (A0, A1, ...) can label at most one sub-sequence of the input:
[^A0]* A0* [^A0]*
[^A1]* A1* [^A1]*
Roth & Yih’s constraints as FSAs
AT LEAST ONE ARGUMENT
The label sequence must contain at least one label that is not O:
O* [^O] ?*   (where ? stands for any label)
Regular expressions on any sequences: grep for sequence models
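Since these constraints are ordinary regular expressions over label strings, they can be prototyped with Python's `re` module before reaching for an FSA toolkit. A minimal sketch, assuming a hypothetical one-character-per-label encoding ("0" for A0, "O" for outside):

```python
import re

# Hypothetical encoding: one character per label ("0" = A0, "O" = outside),
# so Python's re module can stand in for finite-state machinery.
NO_DUP_A0 = re.compile(r"[^0]*0*[^0]*")   # A0 labels at most one contiguous span
AT_LEAST_ONE = re.compile(r"O*[^O].*")    # at least one label that is not O

def satisfies(labels: str, constraint: re.Pattern) -> bool:
    """A labeling satisfies a constraint iff the whole string matches."""
    return constraint.fullmatch(labels) is not None

print(satisfies("OO000OO", NO_DUP_A0))     # one A0 span   -> True
print(satisfies("O0OO0OO", NO_DUP_A0))     # two A0 spans  -> False
print(satisfies("OOOOOOO", AT_LEAST_ONE))  # all O         -> False
```

The `satisfies` helper and the label encoding are illustrative assumptions; the talk's actual implementation uses the FSA toolkit of Kanthak & Ney.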
Roth & Yih’s constraints as FSAs
DISALLOW ARGUMENTS
Only allow argument types that are compatible with the proposition’s verb.
Roth & Yih’s constraints as FSAs
KNOWN VERB POSITION
The proposition’s verb must be labeled O.
Roth & Yih’s constraints as FSAs
ARGUMENT CANDIDATES
Certain sub-sequences must receive a single label.
Any constraints on bounded-length sequences
Roth & Yih’s local model as a lattice
“Soft constraints” or “features”
A unigram model!
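Because the local model is a unigram model, unconstrained decoding reduces to an independent argmax at each position of the lattice. A tiny sketch with made-up scores:

```python
# Unigram local model: with no global constraints applied, the best path is
# just an independent argmax over the labels at every position.
# (Scores below are made up for illustration.)
scores = [
    {"A1": 2.0, "O": 0.5},   # position 1
    {"A1": 1.0, "O": 1.5},   # position 2
]
best = [max(s, key=s.get) for s in scores]
print(best)  # ['A1', 'O']
```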
A brute-force FSA decoder
[Diagram: Sentence → Local model lattice; intersect with the global constraints; decode → Labeling]
NO DUPLICATE A0
NO DUPLICATE A0, A1
NO DUPLICATE A0, A1, A2
NO DUPLICATE ARGUMENTS
Any approach would blow up in the worst case: satisfying global constraints is NP-hard.
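The blow-up is easy to quantify: each NO-DUPLICATE-Ai automaton can be written with three states (before, inside, and after the single Ai span), and intersecting independent automata multiplies their state counts. A back-of-the-envelope count for all six core argument types:

```python
# Each NO-DUPLICATE-Ai DFA needs 3 states: before / inside / after the Ai span.
# Intersecting independent DFAs multiplies their state counts, so applying all
# six constraints up front can inflate the search space by a factor of 3^6
# (a worst-case count, not a measurement from the talk).
states_per_constraint = 3
num_arg_types = 6            # A0 .. A5
blowup = states_per_constraint ** num_arg_types
print(blowup)  # 729
```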
Handling an NP-hard problem
Roth & Yih (ICML 2005): express path decoding and global constraints as an integer linear program (ILP), then apply an ILP solver:
– Relax the ILP to a (real-valued) LP.
– Apply a polynomial-time LP solver.
– Branch and bound to find the optimal integer solution.
The ILP solver doesn’t know it’s labeling sequences
Path constraints:
– State 0: outflow ≤ 1; State 3: inflow ≤ 1
– States 1 & 2: outflow = inflow
At least one argument:
– Arcs labeled O: flow ≤ 1
Maybe we can fix the brute-force decoder?
Local model usually violated no constraints
Most constraints were rarely violated
Finite-state constraint relaxation
Local models already capture much structure. Relax the constraints instead!
– Find the best path using a linear-time decoding algorithm.
– Apply only those global constraints that the path violates.
Brute-force algorithm (recap)
[Diagram: Sentence → Local model; intersect with all global constraints; decode → Labeling]
Constraint relaxation algorithm
[Diagram: Sentence → Local model → Decode → Labeling → Test against global constraints C1, C2, C3. If some constraint is violated: intersect it into the model and re-decode. If none: the labeling is optimal. Constraints that are never violated are never intersected!]
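The relaxation loop can be sketched end-to-end in a few dozen lines. This is a hedged toy reconstruction, not the authors' FSA-toolkit implementation: labels are single characters, each constraint is a tiny hand-built DFA, and `decode` runs Viterbi over the product of lattice positions and the applied DFAs' states (which is exactly decoding the intersection):

```python
# Toy reconstruction of the relaxation loop. Labels are single characters
# ("0" = A0, "O" = outside); a constraint is a small DFA represented as
# (start_state, accepting_states, delta), where delta[(state, label)] gives
# the next state, or None to reject.

def make_no_dup(label):
    """3-state DFA enforcing at most one contiguous run of `label`."""
    delta = {}
    for l in "0O":                       # tiny assumed label alphabet
        delta[("before", l)] = "inside" if l == label else "before"
        delta[("inside", l)] = "inside" if l == label else "after"
        delta[("after", l)] = None if l == label else "after"
    return ("before", {"before", "inside", "after"}, delta)

def accepts(dfa, seq):
    start, accepting, delta = dfa
    state = start
    for l in seq:
        state = delta.get((state, l))
        if state is None:
            return False
    return state in accepting

def decode(scores, dfas):
    """Viterbi over positions x product of applied DFA states."""
    chart = {tuple(d[0] for d in dfas): (0.0, "")}
    for pos_scores in scores:
        new = {}
        for states, (sc, lab) in chart.items():
            for l, w in pos_scores.items():
                nxt = tuple(d[2].get((st, l)) for st, d in zip(states, dfas))
                if None in nxt:
                    continue                     # some DFA rejects this label
                cand = (sc + w, lab + l)
                if nxt not in new or cand[0] > new[nxt][0]:
                    new[nxt] = cand
        chart = new
    # Best entry whose product state is accepting in every applied DFA.
    return max((v for k, v in chart.items()
                if all(st in d[1] for st, d in zip(k, dfas))),
               key=lambda v: v[0])[1]

def relaxation_decode(scores, constraints):
    applied = []
    while True:
        labeling = decode(scores, applied)
        violated = [c for c in constraints
                    if c not in applied and not accepts(c, labeling)]
        if not violated:
            return labeling              # satisfies everything: optimal
        applied.extend(violated)         # intersect only what was violated

# Made-up scores that prefer a duplicate A0 ("0O0"); NO-DUPLICATE-A0 then
# forces a single span on the second decoding pass.
scores = [{"0": 2.0, "O": 1.0},
          {"0": 0.0, "O": 2.0},
          {"0": 1.5, "O": 1.0}]
print(decode(scores, []))                             # unconstrained: 0O0
print(relaxation_decode(scores, [make_no_dup("0")]))  # constrained:   0OO
```

If the path found under a subset of the constraints happens to satisfy all of them, it is the optimum under the full set, since dropping constraints can only enlarge the search space; that is why the loop can stop as soon as nothing is violated.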
Finite-state constraint relaxation is faster than the ILP solver
State-of-the-art implementations:
– Xpress-MP for ILP
– FSA (Kanthak & Ney, ACL 2004) for constraint relaxation
Why?
No sentences required more than a few iterations
Many took only one iteration even though two constraints were violated.
Buy one, get one free
Sales for the quarter rose to $ 1.63 billion from $ 1.47 billion.
(Initial decode labeled spans A1, A4, A3, A1 — intersecting one violated constraint can repair other violations for free.)
Lattices remained small
[Chart: arcs at each iteration vs. arcs in the brute-force lattice, for examples that required 5 intersections]
Take-home message
Global constraints aren’t usually doing that much work for you:
– Typical examples violate only a small number of constraints under local models.
They shouldn’t have to slow you down so much, even though they’re NP-hard in the worst case:
– Figure out dynamically which constraints need to be applied.
Future work
– General soft constraints (we discuss binary soft constraints in the paper)
– Choose the order to test and apply constraints, e.g. by reinforcement learning
– k-best decoding
Thanks to Scott Yih for providing both data and runtime, and to Stephan Kanthak for FSA.