Feature Forest Models for Syntactic Parsing
Yusuke Miyao, University of Tokyo
Probabilistic models for NLP
Widely used for disambiguation of linguistic structures
Ex.) POS tagging
[Figure: tag lattice for "A pretty girl is crying", with candidate tags DT/JJ/NN/VBZ/VBG for each word; each local step is scored by a probability such as P(NN | a/NN, pretty)]
Implicit assumption
Processing state = Primitive probability
–Efficient algorithm for searching
–Avoids exponential explosion of ambiguities
[Figure: the same tag lattice; POS tag = processing state = primitive probability]
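A minimal sketch of this assumption in code (the tags, probabilities, and the `local_prob` helper are all illustrative, not the talk's model): Viterbi decoding in which the POS tag is both the processing state and the event the local probability attaches to, so search stays polynomial.

```python
# Viterbi decoding: the POS tag is both the processing state and the
# event the local probability P(tag | prev_tag, word) attaches to, so
# the best sequence is found in O(n * |TAGS|^2) steps rather than by
# scoring all |TAGS|^n sequences.  Probabilities are made-up numbers.
TAGS = ["DT", "JJ", "NN", "VBZ", "VBG"]

def local_prob(prev_tag, tag, word):
    # Stand-in for an estimated P(tag | prev_tag, word),
    # e.g. P(NN | a/NN, pretty) on the previous slide.
    toy = {("<s>", "DT", "a"): 0.9, ("DT", "JJ", "pretty"): 0.8,
           ("JJ", "NN", "girl"): 0.9, ("NN", "VBZ", "is"): 0.9,
           ("VBZ", "VBG", "crying"): 0.9}
    return toy.get((prev_tag, tag, word), 0.01)

def viterbi(words):
    best = {"<s>": (1.0, [])}           # state -> (best prob, best tag path)
    for w in words:
        # Keeping only the best path into each state is what avoids the
        # exponential explosion of ambiguities.
        best = {t: max((p * local_prob(prev, t, w), path + [t])
                       for prev, (p, path) in best.items())
                for t in TAGS}
    return max(best.values())

print(viterbi("a pretty girl is crying".split()))
# -> (0.52488..., ['DT', 'JJ', 'NN', 'VBZ', 'VBG'])
```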
Is the assumption right?
Ex.) Shallow parsing, NE recognition
–B (Begin), I (Internal), and O (Other) tags are introduced to represent multi-word tags
[Figure: tag lattice for "A pretty girl is crying", with candidate tags NP-B, NP-I, VP-B, VP-I, O for each word]
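A small sketch of the B/I/O encoding (the helper name and the span representation are mine): chunk spans are flattened into per-word tags so that a word-level tagger can handle multi-word units.

```python
# Encode labeled chunk spans as per-word B/I/O tags (illustrative helper).
def to_bio(n_words, chunks):
    """chunks: list of (start, end_exclusive, label) spans."""
    tags = ["O"] * n_words                  # O (Other): outside any chunk
    for start, end, label in chunks:
        tags[start] = label + "-B"          # B: begins a multi-word chunk
        for i in range(start + 1, end):
            tags[i] = label + "-I"          # I: internal to the chunk
    return tags

# "A pretty girl" is an NP, "is crying" is a VP:
print(to_bio(5, [(0, 3, "NP"), (3, 5, "VP")]))
# -> ['NP-B', 'NP-I', 'NP-I', 'VP-B', 'VP-I']
```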
Is the assumption right?
Ex.) Syntactic parsing
–Non-local dependencies are not represented
[Figure: parse of "What do you want to give?"; a local probability such as P(VP | VP to give) cannot capture the long-distance dependency between "what" and "give"]
Problem of existing models
Processing state ≠ Primitive probability
How can we model the probability of ambiguous structures with more flexibility?
Possible solution
A complete structure is a primitive event
–Ex.) Shallow parsing: the probability of a complete sequence of multi-word tags
[Figure: all possible NP/VP chunk sequences for "A pretty girl is crying"]
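One way to read "a complete structure is a primitive event" is a log-linear model over whole sequences, P(y | x) = exp(w · f(x, y)) / Z(x). The sketch below (features and weights are illustrative) computes Z(x) by brute-force enumeration, which is exactly the exponential cost pointed out on the next slide.

```python
import math
from itertools import product

# A log-linear model whose primitive event is the COMPLETE tag sequence:
# P(y | x) = exp(w . f(x, y)) / Z(x).  Z(x) is computed here by brute
# force over every candidate sequence -- the exponential cost that the
# next slide points out.  Features and weights are illustrative.
TAGS = ["NP-B", "NP-I", "VP-B", "VP-I", "O"]

def features(words, tags):
    feats = [f"word={w},tag={t}" for w, t in zip(words, tags)]
    feats += [f"bigram={t1},{t2}" for t1, t2 in zip(tags, tags[1:])]
    return feats

def prob(words, tags, weights):
    def score(y):
        return math.exp(sum(weights.get(f, 0.0) for f in features(words, y)))
    z = sum(score(y) for y in product(TAGS, repeat=len(words)))  # |TAGS|**n terms
    return score(tags) / z

words = "a pretty girl is crying".split()
weights = {"word=girl,tag=NP-I": 1.5, "bigram=NP-B,NP-I": 0.7}
print(prob(words, ["NP-B", "NP-I", "NP-I", "VP-B", "VP-I"], weights))
```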
Possible solution
A complete structure is a primitive event
–Ex.) Syntactic parsing: the probability of a complete argument structure
[Figure: argument structure of "What do you want to give?" with ARG1, ARG2, and MODIFY dependencies]
Problem
Complete structures have exponentially many ambiguities
[Figure: exponentially many chunk sequences for "A pretty girl is crying"]
Proposal
Feature forest model [Miyao and Tsujii, 2002]
–Exponentially many trees are packed into a forest of conjunctive and disjunctive nodes
–Features are assigned to each conjunctive node
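A minimal data-structure sketch matching these definitions (class and field names are mine): conjunctive (AND) nodes carry the features and point to disjunctive children; disjunctive (OR) nodes list their alternative conjunctive daughters.

```python
from dataclasses import dataclass, field
from typing import List

# A feature forest is an AND/OR graph: disjunctive (OR) nodes pack the
# alternatives, conjunctive (AND) nodes carry the features, and the
# shared structure keeps exponentially many trees polynomial in size.
@dataclass
class DisjunctiveNode:
    choices: List["ConjunctiveNode"]        # alternative analyses

@dataclass
class ConjunctiveNode:
    features: List[str]                     # features live on AND nodes
    children: List[DisjunctiveNode] = field(default_factory=list)

# A toy forest: two analyses packed under one shared top node.
shared_top = ConjunctiveNode(
    features=["f:top"],
    children=[DisjunctiveNode(choices=[
        ConjunctiveNode(features=["f:analysis-1"]),
        ConjunctiveNode(features=["f:analysis-2"]),
    ])],
)
```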
Feature forest model
Feature forest models can be efficiently estimated without exponential explosion [Miyao and Tsujii, 2002]
When the forest is unpacked, the model is equivalent to maximum entropy models [Berger et al., 1996]
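The core of the efficiency claim, sketched below under the definitions above (this reuses the node classes from the previous sketch and is a simplification of the idea, not the paper's exact algorithm): the maximum entropy partition function Z = Σ_trees exp(w · f(tree)) falls out of one bottom-up pass, multiplying at conjunctive nodes and summing at disjunctive nodes, so the forest is never unpacked.

```python
import math

# Inside scores over the forest above (reusing ConjunctiveNode and
# DisjunctiveNode from the previous sketch): multiply at AND nodes, sum
# at OR nodes.  The result equals  Z = sum over all unpacked trees of
# exp(w . f(tree)),  computed in time linear in the forest size.
def inside_conj(node, weights):
    s = math.exp(sum(weights.get(f, 0.0) for f in node.features))
    for d in node.children:                 # AND node: multiply children
        s *= inside_disj(d, weights)
    return s

def inside_disj(node, weights):
    return sum(inside_conj(c, weights) for c in node.choices)  # OR: sum

weights = {"f:analysis-1": 1.0}
print(inside_conj(shared_top, weights))     # e^1 + e^0 over the two packed trees
```

An analogous inside-outside pass gives the expected feature counts needed for the maximum entropy gradient.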
Application to parsing
Applying a feature forest model to disambiguation of argument structures
–How to represent exponential ambiguities of argument structures with a feature forest?
–Argument structures are not trees, but DAGs (including reentrant structures)
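A tiny sketch of the reentrancy point (the dict representation is mine): in the example on the next slide, the node for "I" is the ARG1 of both "want" and "argue", so two paths reach the same object and the structure is a DAG, not a tree.

```python
# Argument structures as DAGs: the node for "I" is the ARG1 of both
# "want" and "argue", so two paths reach the same object (reentrancy).
# The dict representation is illustrative.
i_node = {"pred": "I"}
argue = {"pred": "argue", "ARG1": i_node}
want = {"pred": "want", "ARG1": i_node, "ARG2": argue}

assert want["ARG1"] is want["ARG2"]["ARG1"]   # the shared, reentrant node
```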
Packing argument structures
An example including reentrant structures:
She neglected the fact that I wanted to argue.
–Inactive parts: argument structures whose arguments are all instantiated
–Inactive parts are packed into conjunctive nodes
[Figure: step-by-step packing of the inactive parts, e.g. argue1(ARG1 = I), want(ARG1 = I, ARG2 = argue1), argue2(ARG1 = I, ARG2 = fact), want(ARG1 = I, ARG2 = argue2), and fact(ARG1 = want)]
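A sketch of the packing criterion (the function name and dict representation are mine): analyses whose inactive parts are identical, i.e. the same predicate with the same fully instantiated arguments, can share one conjunctive node.

```python
# Pack analyses by the signature of their inactive parts: an argument
# structure whose arguments are all instantiated behaves identically in
# any surrounding analysis, so equal signatures share one conjunctive
# node.  The representation and names are illustrative.
def signature(pred, args):
    """Inactive part = predicate plus fully instantiated arguments."""
    return (pred, tuple(sorted(args.items())))

packed = {}
for analysis in [
    {"pred": "want", "args": {"ARG1": "I", "ARG2": "argue1"}},
    {"pred": "want", "args": {"ARG1": "I", "ARG2": "argue1"}},  # duplicate
    {"pred": "want", "args": {"ARG1": "I", "ARG2": "argue2"}},
]:
    packed.setdefault(signature(analysis["pred"], analysis["args"]), analysis)

print(len(packed))                          # -> 2 conjunctive nodes, not 3
```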
Feature forest representation of argument structures
–Conjunctive nodes correspond to argument structures whose arguments are all instantiated
[Figure: the feature forest for "She neglected the fact that I wanted to argue.", packing neglect(ARG1 = she, ARG2 = fact), fact(ARG1 = want), want(ARG1 = I, ARG2 = argue1 or argue2), argue1(ARG1 = I), and argue2(ARG1 = I, ARG2 = fact)]
Experiments
Grammar: a treebank grammar of HPSG [Miyao and Tsujii, 2003]
–Extracted from the Penn Treebank [Marcus et al., 1994]
Training: Section … of the Penn Treebank
Test: sentences from Section 22 covered by the grammar
Measure: accuracy of dependencies in argument structures
Experiments
Features: combinations of
–Surface strings/POS
–Labels of dependencies (ARG1, ARG2, …)
–Labels of lexical entries (head noun, transitive, …)
–Distance
Estimation algorithm: limited-memory BFGS [Nocedal, 1980] with MAP estimation [Chen & Rosenfeld, 1999]
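A toy sketch of the estimation step (data are made-up; SciPy's L-BFGS-B stands in for the limited-memory BFGS of Nocedal 1980, and a Gaussian/L2 prior for the MAP estimation of Chen & Rosenfeld): minimize the penalized negative log-likelihood of the correct analysis among its competitors.

```python
import numpy as np
from scipy.optimize import minimize

# MAP estimation of a maximum entropy model with a Gaussian (L2) prior,
# optimized with L-BFGS.  Row i of F holds the feature values of
# candidate analysis i; candidate 0 is the correct one.  Toy numbers.
F = np.array([[1.0, 0.0, 1.0],              # correct analysis
              [0.0, 1.0, 1.0],              # competing analysis
              [0.0, 0.0, 1.0]])             # competing analysis
sigma2 = 1.0                                # prior variance (tuned in practice)

def neg_log_posterior(w):
    scores = F @ w
    logZ = np.logaddexp.reduce(scores)
    return (logZ - scores[0]) + w @ w / (2 * sigma2)

def grad(w):
    p = np.exp(F @ w - np.logaddexp.reduce(F @ w))   # P(candidate i)
    return F.T @ p - F[0] + w / sigma2      # expected - observed + prior term

w = minimize(neg_log_posterior, np.zeros(3), jac=grad, method="L-BFGS-B").x
print(np.round(w, 3))                       # weight 0 up, weight 1 down, 2 near 0
```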
Preliminary results
Estimation time: 143 min.
Accuracy (precision/recall):

                  exact           partial
Baseline          48.1 / …        … / 56.2
Unigram           77.3 / …        … / 81.3
Feature forest    85.5 / …        … / 88.2
Conclusion
Feature forest models allow probabilistic modeling of complete structures without exponential explosion
The application to syntactic parsing achieved high accuracy
Ongoing work
–Refinement of the grammar and tuning of estimation parameters
–Development of efficient algorithms for best-first/beam search