Graph-based Dependency Parsing with Bidirectional LSTM Wenhui Wang and Baobao Chang Institute of Computational Linguistics, Peking University
Outline Introduction Model Details Experiments Conclusion
Introduction
Graph-based models are among the most successful solutions to dependency parsing. Given a sentence x, graph-based models formulate the parsing process as a search problem:
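A sketch of the standard formulation (my notation, not copied from the slide): the parser searches for the highest-scoring tree among all valid trees for x, and first-order models decompose the tree score over individual head-modifier edges.

```latex
\hat{y} = \arg\max_{y \in \mathcal{Y}(x)} \mathrm{Score}(x, y),
\qquad
\mathrm{Score}(x, y) = \sum_{(h,\, m) \in y} \mathrm{ScorePart}(x, h, m)
```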
Introduction
The most common choice for the score function (a sketch follows)
Problems:
– Heavy reliance on feature engineering; feature design requires domain expertise, and millions of hand-crafted features substantially slow down parsing
– The conventional first-order model limits the scope of feature selection; high-order features have proven useful for recovering long-distance dependencies, but incorporating them usually comes at a high cost in efficiency
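As a sketch of this conventional choice (standard graph-based parsing notation, assumed rather than taken from the slide), the score of each edge is a dot product between a weight vector and a sparse vector of hand-crafted features:

```latex
\mathrm{ScorePart}(x, h, m) = \mathbf{w} \cdot \mathbf{f}(x, h, m)
```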
Introduction
Pei et al. (2015) propose a feed-forward neural network to score subgraphs
Advantages:
– Learns feature combinations automatically
– Exploits sentence segment information by averaging (a sketch follows)
Problems:
– Requires a large feature set
– The context window limits the model's ability to detect long-distance information
– Still relies on a high-order factorization strategy to further improve accuracy
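If I read the averaging approach correctly (my notation, an assumption for illustration), a segment spanning tokens i..j is represented by the average of the embeddings of the words it contains:

```latex
\mathbf{s}_{i..j} = \frac{1}{j - i + 1} \sum_{k=i}^{j} \mathbf{x}_k
```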
Introduction
We propose an LSTM-based neural network model for graph-based parsing
Advantages:
– Captures long-range contextual information and exhibits improved accuracy in recovering long-distance dependencies
– Reduces the number of features to a minimum
– An LSTM-based sentence segment embedding method, LSTM-Minus, is used to effectively learn sentence-level information
– Our model is a first-order model, so its computational cost remains the lowest among graph-based models
Model Details
Architecture of our model [figure; labels include: input token, direction-specific transformation]
Model Details
Segment embeddings: LSTM-Minus (a sketch follows)
Compared with averaging:
– LSTM-Minus enables our model to learn segment embeddings from information both outside and inside the segments, and thus enhances our model's ability to access sentence-level information
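A minimal PyTorch sketch of the LSTM-Minus idea, assuming the segment embedding is taken as the difference of (bi)LSTM hidden states at the segment boundaries; all tensor names and sizes are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
sent_len, emb_dim, hidden_dim = 8, 50, 100

# Bidirectional LSTM run over the whole sentence (stand-in word embeddings).
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
tokens = torch.randn(1, sent_len, emb_dim)
hidden, _ = lstm(tokens)                              # (1, sent_len, 2 * hidden_dim)
fwd, bwd = hidden[..., :hidden_dim], hidden[..., hidden_dim:]

def segment_embedding(i, j):
    """LSTM-Minus embedding of the segment covering tokens i..j (0-based, inclusive)."""
    # Forward direction: state at the segment end minus the state just before the segment.
    h_fwd = fwd[0, j] - (fwd[0, i - 1] if i > 0 else torch.zeros(hidden_dim))
    # Backward direction: state at the segment start minus the state just after the segment.
    h_bwd = bwd[0, i] - (bwd[0, j + 1] if j + 1 < sent_len else torch.zeros(hidden_dim))
    return torch.cat([h_fwd, h_bwd])

print(segment_embedding(2, 5).shape)                  # torch.Size([200])
```

Because the boundary hidden states summarize everything the LSTM has read up to (and beyond) the segment, the subtraction captures information both inside and outside the segment.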
Model Details
Direction-specific transformation (a sketch follows)
– The direction of an edge is very important in dependency parsing
– This information is encoded in the model parameters: the transformation uses different parameters depending on the edge direction
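A minimal sketch of direction-specific parameters (hypothetical layer names and sizes, not the paper's exact architecture): the hidden layer picks a different weight matrix depending on whether the head precedes or follows the modifier.

```python
import torch
import torch.nn as nn

feat_dim, hidden_dim = 400, 100   # illustrative sizes

# One set of transformation parameters per edge direction.
transform = nn.ModuleDict({
    "head_before_modifier": nn.Linear(feat_dim, hidden_dim),
    "head_after_modifier":  nn.Linear(feat_dim, hidden_dim),
})

def direction_specific_hidden(features, head_idx, mod_idx):
    # Choose the parameters that match the direction of the candidate edge.
    key = "head_before_modifier" if head_idx < mod_idx else "head_after_modifier"
    return transform[key](features)

h = direction_specific_hidden(torch.randn(feat_dim), head_idx=3, mod_idx=1)
print(h.shape)   # torch.Size([100])
```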
Model Details
Learning feature combinations
– Activation function: tanh-cube (see the formula below)
– Intuitively, the cube term in each hidden unit directly models feature combinations in a multiplicative way
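The tanh-cube activation as used by Pei et al. (2015), in my rendering of the standard form, where l denotes the pre-activation of a hidden layer:

```latex
g(\ell) = \tanh\!\left(\ell^{3} + \ell\right), \qquad \ell = W\mathbf{x} + b
```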
Model Details
Features in our model
Experiments
Dataset
English Penn Treebank (PTB)
– Penn2Malt conversion for Penn-YM
– Stanford parser conversion for Penn-SD
– Stanford POS Tagger for POS tagging
Chinese Penn Treebank
– Gold segmentation and POS tags
Two models
– Basic model
– Basic model + segment features
Experiments
Comparison with previous graph-based models
Experiments
Comparison with previous state-of-the-art models
Experiments
Model performance with different ways of learning segment embeddings
Experiments
Advantage in recovering long-distance dependencies
– Using an LSTM shows a similar effect to the high-order factorization strategy
Conclusion
We propose an LSTM-based neural network model for graph-based dependency parsing, together with an LSTM-based sentence segment embedding method (LSTM-Minus)
Our model makes parsing decisions from a global perspective with first-order factorization, avoiding the expensive computational cost introduced by high-order factorization
Our model minimizes the effort required for feature engineering
Recent work
A better word representation for Chinese
Recent work
Experimental results
Thank you!