PROBABILISTIC GRAPH-BASED DEPENDENCY PARSING WITH CONVOLUTIONAL NEURAL NETWORK Zhisong Zhang, Hai Zhao and Lianhui Qin Shanghai Jiao Tong University
Outline
Background: Dependency Parsing
Training Criteria: Probabilistic Criterion
Neural Parsing: Basic, Convolution, Ensemble
Experiments and Results
Background Dependency parsing aims to predict a dependency tree, in which all the edges connect head-modifier pairs. In graph-based methods, a dependency tree is factored into sub-trees (factors); the score of a dependency tree T is defined as the sum of the scores of all its factors p.
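For concreteness, the factorization can be written as follows (notation ours, not from the original slides):

\[ \mathrm{Score}(x, T) \;=\; \sum_{p \in T} \mathrm{Score}(x, p) \]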
Decoding Algorithms Chart-based dynamic programming algorithms (for projective parsing). Explored extensively in previous work (Eisner, 1996; McDonald et al., 2005; McDonald and Pereira, 2006; Koo and Collins, 2010; Ma and Zhao, 2012).
Decoding Algorithm Third-order Grand-Sibling model (the most complex factorization that we explore).
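As a sketch, a grand-sibling part consists of a grandparent g, a head h, a modifier m and the adjacent sibling s of m, so the tree score decomposes as (notation ours):

\[ \mathrm{Score}(x, T) \;=\; \sum_{(g,\, h,\, m,\, s) \in T} \mathrm{Score}(x, g, h, m, s) \]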
Highlights
Probabilistic criteria for neural network training.
Sentence-level representation learned from a convolutional layer.
Ensemble models with a stacked linear output layer.
Probabilistic Model As in log-linear models such as the Conditional Random Field (CRF) (Lafferty et al., 2001), we can treat parsing in a probabilistic way. This is not new and has been explored in much previous work.
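Concretely, the tree probability takes the usual log-linear (CRF-style) form, where \( \mathcal{Y}(x) \) denotes all candidate trees for sentence x (notation ours):

\[ P(T \mid x) \;=\; \frac{\exp\!\big(\mathrm{Score}(x, T)\big)}{\sum_{T' \in \mathcal{Y}(x)} \exp\!\big(\mathrm{Score}(x, T')\big)} \]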
Training Criteria
Again DP How do we calculate the marginal probabilities? Also by dynamic programming, using a variant of the well-known inside-outside algorithm; Paskin (2001) and Ma and Zhao (2015) provide the corresponding algorithms for dependency parsing.
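The marginal of a factor p is the probability mass of all trees that contain it; the inside-outside variant computes these quantities without enumerating trees (a sketch, notation ours):

\[ m(p \mid x) \;=\; \sum_{T \in \mathcal{Y}(x),\; p \in T} P(T \mid x) \]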
MLE and Max-Margin The probabilistic criterion can be viewed as a soft version of the max-margin criterion. Gimpel and Smith (2010) provide a good review of several training criteria.
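A sketch of the two criteria for a gold tree \( T^{*} \), where \( \Delta \) is a Hamming-style cost (replacing the max in the margin loss with a cost-augmented log-sum-exp yields the softmax-margin of Gimpel and Smith (2010)):

\[ \mathcal{L}_{\mathrm{MLE}} = -\log P(T^{*} \mid x), \qquad \mathcal{L}_{\mathrm{MM}} = \max_{T \in \mathcal{Y}(x)} \big(\mathrm{Score}(x, T) + \Delta(T, T^{*})\big) - \mathrm{Score}(x, T^{*}) \]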
Neural Parsing: Recent Work
(Durrett and Klein, 2015): Neural-CRF parsing (phrase-structure parsing).
(Pei et al., 2015): Graph-based parsing with a feed-forward NN.
(Weiss et al., 2015): Transition-based parsing with structured training.
(Dyer et al., 2015): Transition-based parsing with stack LSTMs.
And many others …
Neural Model: Basic A simple feed-forward neural network with a window-based approach.
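A minimal PyTorch-style sketch of such a window-based feed-forward scorer; the layer sizes, feature set and class name are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class WindowFFScorer(nn.Module):
    """Scores a head-modifier pair from word/POS windows around both tokens."""
    def __init__(self, vocab_size, pos_size, emb_dim=50, hidden_dim=200, window=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(pos_size, emb_dim)
        # two windows (head and modifier), each with word + POS embeddings
        in_dim = 2 * window * 2 * emb_dim
        self.hidden = nn.Linear(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, head_words, head_pos, mod_words, mod_pos):
        # each argument: (batch, window) indices of tokens in the local window
        feats = torch.cat([self.word_emb(head_words), self.pos_emb(head_pos),
                           self.word_emb(mod_words), self.pos_emb(mod_pos)], dim=-1)
        h = torch.tanh(self.hidden(feats.flatten(start_dim=1)))
        return self.out(h).squeeze(-1)  # one score per head-modifier pair
```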
Neural Model: Convolutional Model To encode sentence-level information and obtain sentence embeddings, a convolutional layer over the whole sentence followed by a max-pooling layer is adopted. The scheme uses the distance embedding of the whole convolution window as the position feature.
Neural Model: Convolutional Model (architecture figure: sentence-wide convolution with position features, followed by max-pooling)
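A minimal PyTorch-style sketch of this component; for simplicity the distance feature is attached per token here, whereas the slides describe one distance embedding per convolution window, and all names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvSentenceEncoder(nn.Module):
    """Sentence embedding: convolution over word + distance embeddings, then max-pooling."""
    def __init__(self, vocab_size, num_dist_buckets, word_dim=50, dist_dim=20,
                 conv_dim=100, window=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.dist_emb = nn.Embedding(num_dist_buckets, dist_dim)  # position feature
        self.conv = nn.Conv1d(word_dim + dist_dim, conv_dim,
                              kernel_size=window, padding=window // 2)

    def forward(self, words, dists):
        # words, dists: (batch, sent_len) token and bucketed-distance indices
        x = torch.cat([self.word_emb(words), self.dist_emb(dists)], dim=-1)
        x = torch.tanh(self.conv(x.transpose(1, 2)))  # (batch, conv_dim, sent_len)
        return x.max(dim=2).values                    # max-pool over positions
```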
Neural Model: Ensemble Models Ensembles of models of different orders for scoring. Scheme 1: simple addition of the sub-models' scores.
Neural Model: Ensemble Models Scheme 2: stacking another linear output layer over the sub-models' scores (a sketch of both schemes follows).
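A minimal sketch of both ensemble schemes, assuming each sub-model produces a per-factor score tensor of the same shape; the names and the use of a single linear layer for Scheme 2 are illustrative assumptions:

```python
import torch
import torch.nn as nn

def ensemble_add(scores):
    """Scheme 1: simply sum the scores from models of different orders."""
    return torch.stack(scores, dim=-1).sum(dim=-1)

class StackedEnsemble(nn.Module):
    """Scheme 2: a stacked linear output layer over the sub-models' scores."""
    def __init__(self, num_models):
        super().__init__()
        self.out = nn.Linear(num_models, 1)

    def forward(self, scores):
        # scores: list of per-factor score tensors, one from each sub-model
        return self.out(torch.stack(scores, dim=-1)).squeeze(-1)
```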
Experiments
English Penn Treebank (PTB), with three converters:
1. Penn2Malt and the head rules of Yamada and Matsumoto (2003), noted as PTB-Y&M.
2. Stanford parser v3.3.0 with Stanford Basic Dependencies (De Marneffe et al., 2006), noted as PTB-SD.
3. LTH Constituent-to-Dependency Conversion Tool (Johansson and Nugues, 2007), noted as PTB-LTH.
Chinese Penn Treebank (CTB), using the Penn2Malt converter.
Model Analysis To verify the effectiveness of the proposed methods, only the PTB-SD development set is used in these experiments.
Model Analysis: on Dependency Length
Main Results
References
Marie-Catherine De Marneffe, Bill MacCartney, Christopher D. Manning, et al. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, volume 6, pages 449–454.
Greg Durrett and Dan Klein. 2015. Neural CRF parsing. In Proceedings of ACL, pages 302–312, Beijing, China, July.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of ACL, pages 334–343, Beijing, China, July.
Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics, pages 340–345, Copenhagen, August.
Kevin Gimpel and Noah A. Smith. 2010. Softmax-margin CRFs: Training log-linear models with cost functions. In Proceedings of NAACL, pages 733–736, Los Angeles, California, June.
Richard Johansson and Pierre Nugues. 2007. Extended constituent-to-dependency conversion for English. In 16th Nordic Conference of Computational Linguistics, pages 105–112. University of Tartu.
Implementation for reference:
References
Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of ACL, pages 1–11, Uppsala, Sweden, July.
John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML.
Xuezhe Ma and Hai Zhao. 2012. Fourth-order dependency parsing. In Proceedings of COLING, pages 785–796, Mumbai, India, December.
Xuezhe Ma and Hai Zhao. 2015. Probabilistic models for high-order projective dependency parsing. arXiv preprint.
Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of ACL, pages 91–98, Ann Arbor, Michigan, June.
Mark A. Paskin. 2001. Cubic-time parsing and learning algorithms for grammatical bigram models. Technical report.
Wenzhe Pei, Tao Ge, and Baobao Chang. 2015. An effective neural network model for graph-based dependency parsing. In Proceedings of ACL, pages 313–322, Beijing, China, July.
David Weiss, Chris Alberti, Michael Collins, and Slav Petrov. 2015. Structured training for neural network transition-based parsing. In Proceedings of ACL, pages 323–333, Beijing, China, July.
Thanks … Q & A