Graph-based Dependency Parsing with Bidirectional LSTM Wenhui Wang and Baobao Chang Institute of Computational Linguistics, Peking University
Outline Introduction Model Details Experiments Conclusion
Introduction
Graph-based models are among the most successful solutions to dependency parsing. Given a sentence x, graph-based models formulate the parsing process as a search problem:
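A sketch of the standard formulation (my notation, not copied from the slide): the parser searches for the highest-scoring tree among all valid trees for x, and first-order models decompose the tree score over individual head-modifier edges.

```latex
\hat{y} = \arg\max_{y \in \mathcal{Y}(x)} \mathrm{Score}(x, y),
\qquad
\mathrm{Score}(x, y) = \sum_{(h,\, m) \in y} \mathrm{ScorePart}(x, h, m)
```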
Introduction
The most common choice for the score function (a sketch follows)
Problems:
– Heavy reliance on feature engineering; feature design requires domain expertise, and millions of hand-crafted features substantially slow down parsing
– The conventional first-order model limits the scope of feature selection; high-order features have proven useful for recovering long-distance dependencies, but incorporating them usually comes at a high cost in efficiency
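As a sketch of this conventional choice (standard graph-based parsing notation, assumed rather than taken from the slide), the score of each edge is a dot product between a weight vector and a sparse vector of hand-crafted features:

```latex
\mathrm{ScorePart}(x, h, m) = \mathbf{w} \cdot \mathbf{f}(x, h, m)
```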
Introduction
Pei et al. (2015) propose a feed-forward neural network to score subgraphs
Advantages:
– Learns feature combinations automatically
– Exploits sentence segment information by averaging (a sketch follows)
Problems:
– Requires a large feature set
– The context window limits the model's ability to detect long-distance information
– Still relies on a high-order factorization strategy to further improve accuracy
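If I read the averaging approach correctly (my notation, an assumption for illustration), a segment spanning tokens i..j is represented by the average of the embeddings of the words it contains:

```latex
\mathbf{s}_{i..j} = \frac{1}{j - i + 1} \sum_{k=i}^{j} \mathbf{x}_k
```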
Introduction
We propose an LSTM-based neural network model for graph-based parsing
Advantages:
– Captures long-range contextual information and exhibits improved accuracy in recovering long-distance dependencies
– Reduces the number of features to a minimum
– An LSTM-based sentence segment embedding method, LSTM-Minus, is used to effectively learn sentence-level information
– Our model is a first-order model, so its computational cost remains the lowest among graph-based models
Model Details
Architecture of our model [figure; labels include: input token, direction-specific transformation]
Model Details
Segment embeddings: LSTM-Minus (a sketch follows)
Compared with averaging:
– LSTM-Minus enables our model to learn segment embeddings from information both outside and inside the segments, and thus enhances our model's ability to access sentence-level information
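A minimal PyTorch sketch of the LSTM-Minus idea, assuming the segment embedding is taken as the difference of (bi)LSTM hidden states at the segment boundaries; all tensor names and sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sent_len, emb_dim, hidden_dim = 8, 50, 100

# Bidirectional LSTM run over the whole sentence (stand-in word embeddings).
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
tokens = torch.randn(1, sent_len, emb_dim)
hidden, _ = lstm(tokens)                              # (1, sent_len, 2 * hidden_dim)
fwd, bwd = hidden[..., :hidden_dim], hidden[..., hidden_dim:]

def segment_embedding(i, j):
    """LSTM-Minus embedding of the segment covering tokens i..j (0-based, inclusive)."""
    # Forward direction: state at the segment end minus the state just before the segment.
    h_fwd = fwd[0, j] - (fwd[0, i - 1] if i > 0 else torch.zeros(hidden_dim))
    # Backward direction: state at the segment start minus the state just after the segment.
    h_bwd = bwd[0, i] - (bwd[0, j + 1] if j + 1 < sent_len else torch.zeros(hidden_dim))
    return torch.cat([h_fwd, h_bwd])

print(segment_embedding(2, 5).shape)                  # torch.Size([200])
```

Because the boundary hidden states summarize everything the LSTM has read up to (and beyond) the segment, the subtraction captures information both inside and outside the segment.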
Model Details
Direction-specific transformation (a sketch follows)
– The direction of an edge is very important in dependency parsing
– This information is encoded in the model parameters: the transformation uses different parameters depending on the edge direction
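A minimal sketch of direction-specific parameters (hypothetical layer names and sizes, not the paper's exact architecture): the hidden layer picks a different weight matrix depending on whether the head precedes or follows the modifier.

```python
import torch
import torch.nn as nn

feat_dim, hidden_dim = 400, 100   # illustrative sizes

# One set of transformation parameters per edge direction.
transform = nn.ModuleDict({
    "head_before_modifier": nn.Linear(feat_dim, hidden_dim),
    "head_after_modifier":  nn.Linear(feat_dim, hidden_dim),
})

def direction_specific_hidden(features, head_idx, mod_idx):
    # Choose the parameters that match the direction of the candidate edge.
    key = "head_before_modifier" if head_idx < mod_idx else "head_after_modifier"
    return transform[key](features)

h = direction_specific_hidden(torch.randn(feat_dim), head_idx=3, mod_idx=1)
print(h.shape)   # torch.Size([100])
```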
Model Details
Learning feature combinations
– Activation function: tanh-cube (see the formula below)
– Intuitively, the cube term in each hidden unit directly models feature combinations in a multiplicative way
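The tanh-cube activation as used by Pei et al. (2015), in my rendering of the standard form, where l denotes the pre-activation of a hidden layer:

```latex
g(\ell) = \tanh\!\left(\ell^{3} + \ell\right), \qquad \ell = W\mathbf{x} + b
```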
Model Details
Features in our model
Experiments
Dataset
English Penn Treebank (PTB)
– Penn2Malt conversion for Penn-YM
– Stanford parser conversion for Penn-SD
– Stanford POS Tagger for POS tagging
Chinese Penn Treebank
– Gold segmentation and POS tags
Two models
– Basic model
– Basic model + segment features
Experiments
Comparison with previous graph-based models
Experiments
Comparison with previous state-of-the-art models
Experiments
Model performance with different ways of learning segment embeddings
Experiments
Advantage in recovering long-distance dependencies
– Using an LSTM shows a similar effect to the high-order factorization strategy
Conclusion
We propose an LSTM-based neural network model for graph-based dependency parsing, together with an LSTM-based sentence segment embedding method (LSTM-Minus)
Our model makes parsing decisions from a global perspective with first-order factorization, avoiding the expensive computational cost introduced by high-order factorization
Our model minimizes the effort required for feature engineering
Recent work
A better word representation for Chinese
Recent work
Experimental results
Thank you!