Neural Joint Model for Transition-based Chinese Syntactic Analysis

Presentation transcript:

Neural Joint Model for Transition-based Chinese Syntactic Analysis. Authors: Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi. Advisor: Jia-Ling Koh. Speaker: Yin-Hsiang Liao. Source: ACL 2017. Date: 2018/06/26

Outline Introduction Model Experiments Conclusion

Introduction Goal: Present a neural network-based joint model for Chinese word segmentation, POS tagging and dependency parsing.

Introduction Syntactic analysis of Chinese involves word segmentation (by the nature of Chinese, words are not delimited), part-of-speech tagging, and dependency parsing. Error propagation: wrong segmentations lead to mistakes in tagging and parsing in pipeline models.

Introduction Joint model: a joint learning framework that combines the tasks (SegTag, SegTagDep). A traditional solution to error propagation problems. (The improvement may also be due to extra useful information shared across the tasks.)

Introduction Transition-based model: data structures are a queue and a stack; parsing proceeds by applying actions to a state (see the sketch below).
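The slide only names the data structures, so here is a minimal Python sketch of a plain arc-standard transition system over a queue (buffer) and a stack. The `State` class and the action functions are illustrative only; the paper's joint system additionally interleaves segmentation and POS-tagging actions, which are omitted here.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Parser state: a queue (buffer) of pending tokens and a stack of partial subtrees."""
    buffer: list                                 # tokens not yet processed, leftmost first
    stack: list = field(default_factory=list)
    arcs: list = field(default_factory=list)     # collected (head, dependent) pairs

def shift(state: State) -> None:
    """SHIFT: move the front of the queue onto the stack."""
    state.stack.append(state.buffer.pop(0))

def reduce_left(state: State) -> None:
    """Attach the second-top stack item as a left dependent of the top item."""
    dep = state.stack.pop(-2)
    state.arcs.append((state.stack[-1], dep))

def reduce_right(state: State) -> None:
    """Attach the top stack item as a right dependent of the item below it."""
    dep = state.stack.pop()
    state.arcs.append((state.stack[-1], dep))

def is_terminal(state: State) -> bool:
    """Parsing ends when the queue is empty and one tree remains on the stack."""
    return not state.buffer and len(state.stack) == 1
```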

Outline Introduction Model Experiments Conclusion

Model (figure): POS tags.

Model Preprocessing: pretrained word/character embeddings, plus embeddings of character strings. Inspiration: incomplete tokens may provide information. Ex: 台北車站 vs. 北車 (no such word in general lexicons). Lookup (flowchart): is it in the embeddings?* If yes, use the word/character embeddings; if no, fall back to the embeddings of character strings.
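A minimal sketch of that lookup, assuming the pretrained vectors are stored in plain dictionaries. The `embed` function and the mean-of-characters fallback are illustrative assumptions, not necessarily the paper's exact composition of character-string embeddings.

```python
import numpy as np

def embed(token, word_vectors, char_vectors, dim=200):
    """Look a token up in the pretrained word/character embeddings;
    if it is missing, fall back to a character-string embedding
    (here approximated by averaging character vectors)."""
    if token in word_vectors:                    # "is it in the embeddings?" -> yes
        return word_vectors[token]
    chars = [char_vectors[c] for c in token if c in char_vectors]
    if chars:                                    # no -> compose from characters
        return np.mean(chars, axis=0)
    return np.zeros(dim)                         # completely unknown string
```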

Model Feed-forward neural network: larger than previous works (8000d and 8000d hidden layers over an 8200d* input). Loss function (greedy): cross-entropy over the correct transition action. Optimizer: AdaGrad (quick convergence). https://vimeo.com/234944928
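A hedged PyTorch sketch (not the paper's implementation) of such a transition classifier with the layer sizes from the slide, a greedy cross-entropy loss, and AdaGrad. The number of actions and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

class TransitionFFNN(nn.Module):
    """Feed-forward classifier over the 8200d concatenation of feature embeddings."""
    def __init__(self, input_dim=8200, hidden_dim=8000, num_actions=64):
        super().__init__()
        self.h1 = nn.Linear(input_dim, hidden_dim)
        self.h2 = nn.Linear(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_actions)   # num_actions is a placeholder

    def forward(self, x):
        x = torch.relu(self.h1(x))                      # activation choice is an assumption
        x = torch.relu(self.h2(x))
        return self.out(x)                              # unnormalized scores per action

model = TransitionFFNN()
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                         # greedy loss over the gold action

# One greedy training step (features: (batch, 8200), gold_actions: (batch,)):
# loss = loss_fn(model(features), gold_actions); loss.backward(); optimizer.step()
```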

Model Feature list:
S0w: the top word of the stack
S1w: the second word of the stack
L0: the leftmost child; L1: the second leftmost child
R0: the rightmost child; R1: the second rightmost child
q0: the last shifted word; q1: the second most recently shifted word
q0e: the last character of q0
q0f1: the sub-word of q0 (q0[1:]); q0f2: the sub-word of q0 (q0[2:])
q0b1: the string across q0 and the buffer (q0 + b[:1]); q0b2: the string across q0 and the buffer (q0 + b[:2])
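To make the feature names concrete, here is a small, illustrative extractor over the `State` sketch from the transition-system example above; the child features (L0, L1, R0, R1) are omitted because they additionally require dependency links on each stack item.

```python
def extract_features(state, q0, buffer):
    """Collect the string-valued features named on the slide (simplified)."""
    s0 = state.stack[-1] if state.stack else None
    s1 = state.stack[-2] if len(state.stack) > 1 else None
    q0 = q0 or ""
    return {
        "S0w": s0, "S1w": s1,
        "q0": q0,                              # the last shifted (partial) word
        "q0e": q0[-1:],                        # its last character
        "q0f1": q0[1:], "q0f2": q0[2:],        # sub-words of q0
        "q0b1": q0 + "".join(buffer[:1]),      # strings across q0 and the buffer
        "q0b2": q0 + "".join(buffer[:2]),
    }
```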

Model (figure): word-based and character-based feature templates over the stack (children L0, L1, R0, R1 and their own L/R children) and the buffer (B0-2 for the word-based model, B0-3 for the character-based model).

Model (figure): word-based and character-based features around the partial word q0: its last character q0e, its sub-words q0f1 and q0f2, and the strings q0b1, q0b2, q0b3 across q0 and the buffer.

Input representation (figure): 40 text features with 200d text embeddings, 9 POS features with 20d POS embeddings, and 1 length-of-q0 feature with a 20d embedding (40×200 + 9×20 + 1×20 = 8200d in total). Each text feature goes through the same lookup: is it in the embeddings?* If yes, use the word/character embeddings; if no, fall back to the embeddings of character strings.

Model Beam search: a perceptron layer whose inputs come from the hidden layers and from the greedy output layer.
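The search itself is ordinary beam search over transition sequences; the sketch below shows only that skeleton, with hypothetical hooks (`score_actions`, `legal_actions`, `apply_action`, `is_final`) standing in for the perceptron-layer scoring described on the slide.

```python
import heapq

def beam_search(initial_state, score_actions, legal_actions, apply_action,
                is_final, beam_size=16):
    """Keep the beam_size highest-scoring partial transition sequences.
    score_actions(state) -> {action: score}; the other callables are
    problem-specific hooks (hypothetical names used only for this sketch)."""
    beam = [(0.0, initial_state)]
    while not all(is_final(state) for _, state in beam):
        candidates = []
        for total, state in beam:
            if is_final(state):
                candidates.append((total, state))        # finished hypotheses carry over
                continue
            scores = score_actions(state)
            for action in legal_actions(state):
                candidates.append((total + scores[action], apply_action(state, action)))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])                 # best-scoring full sequence
```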

Model Structured learning model, as proposed by Weiss et al. (2015) and Andor et al. (2016).

Model (speaker's guess, figure): is the gold path still in the beam? If yes, backpropagate down to the embeddings; if no, only backpropagate to the hidden layers.

Model Bi-LSTM: it can see the whole sentence, so only light feature engineering is needed.
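A minimal PyTorch sketch of such a bidirectional encoder over character embeddings; all dimensions are placeholders, and the downstream feature selection (the 4 or 8 features fed to the classifier, as in the experiments below) is not shown.

```python
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM over character embeddings: every position can see the
    whole sentence, so only a few features need to be fed to the classifier.
    All dimensions here are placeholders, not the paper's settings."""
    def __init__(self, vocab_size, emb_dim=200, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                  # char_ids: (batch, seq_len) int tensor
        outputs, _ = self.lstm(self.embed(char_ids))
        return outputs                            # (batch, seq_len, 2 * hidden_dim)
```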

Model (figure): Bi-LSTM input example over the characters 技 術 有 了 新, each looked up with the same "is it in the embeddings?" check.

Outline Introduction Model Experiments Conclusion

Experiments Settings: Penn Chinese Treebank 5.1 (CTB-5) and 7 (CTB-7). Three FFNN models: SegTag, SegTagDep, and Dep (similar to Weiss et al. and Andor et al.). Two Bi-LSTM models: Bi-LSTM 4 feat and Bi-LSTM 8 feat.

Experiments (table: Seg and POS scores). SegTag does not use the children and children-of-children of the stack words as features.

Experiments (table: Seg, POS, and Dep scores).

Experiments Bi-LSTMs: not better than the FFNN models.

Outline Introduction Model Experiments Conclusion

Conclusion Two joint parsing models are proposed, based on feed-forward and bi-LSTM neural networks. The character-string embeddings help capture the similarities of incomplete tokens. The SegTagDep model gets better Seg and POS scores than previous joint models. SegTag + Dep achieves a state-of-the-art dependency parsing score.

Reference
Daniel Andor et al. 2016. Globally normalized transition-based neural networks.
Jun Hatori et al. 2012. Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese.
David Weiss et al. 2015. Structured training for neural network transition-based parsing.