Neural Joint Model for Transition-based Chinese Syntactic Analysis

Presentation transcript:

Neural Joint Model for Transition-based Chinese Syntactic Analysis. Authors: Shuhei Kurita, Daisuke Kawahara, Sadao Kurohashi. Advisor: Jia-Ling Koh. Speaker: Yin-Hsiang Liao. Source: ACL 2017. Date: 2018/06/26

Outline Introduction Model Experiments Conclusion

Introduction Goal: Present a neural network-based joint model for Chinese word segmentation, POS tagging and dependency parsing.

Introduction Syntactic analysis of Chinese involves word segmentation (by the nature of Chinese, words are not delimited), part-of-speech tagging, and dependency parsing. Error propagation: wrong segmentations lead to mistakes in tagging and parsing in pipeline models.

Introduction Joint model: a joint learning framework that combines the tasks (SegTag, SegTagDep). A traditional solution to error propagation problems. (The improvement may also be due to extra useful information shared across the tasks.)

Introduction Transition-based model: data structures are a queue and a stack; parsing proceeds by applying actions to a state (see the sketch below).
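The slide only names the data structures, so here is a minimal Python sketch of a plain arc-standard transition system over a queue (buffer) and a stack. The `State` class and the action functions are illustrative only; the paper's joint system additionally interleaves segmentation and POS-tagging actions, which are omitted here.

```python
from dataclasses import dataclass, field

@dataclass
class State:
    """Parser state: a queue (buffer) of pending tokens and a stack of partial subtrees."""
    buffer: list                                 # tokens not yet processed, leftmost first
    stack: list = field(default_factory=list)
    arcs: list = field(default_factory=list)     # collected (head, dependent) pairs

def shift(state: State) -> None:
    """SHIFT: move the front of the queue onto the stack."""
    state.stack.append(state.buffer.pop(0))

def reduce_left(state: State) -> None:
    """Attach the second-top stack item as a left dependent of the top item."""
    dep = state.stack.pop(-2)
    state.arcs.append((state.stack[-1], dep))

def reduce_right(state: State) -> None:
    """Attach the top stack item as a right dependent of the item below it."""
    dep = state.stack.pop()
    state.arcs.append((state.stack[-1], dep))

def is_terminal(state: State) -> bool:
    """Parsing ends when the queue is empty and one tree remains on the stack."""
    return not state.buffer and len(state.stack) == 1
```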

Outline Introduction Model Experiments Conclusion

Model (figure): POS tags.

Model Preprocessing: pretrained word/character embeddings, plus embeddings of character strings. Inspiration: incomplete tokens may provide information. Ex: 台北車站 vs. 北車 (no such word in general lexicons). Lookup (flowchart): is it in the embeddings?* If yes, use the word/character embeddings; if no, fall back to the embeddings of character strings.
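A minimal sketch of that lookup, assuming the pretrained vectors are stored in plain dictionaries. The `embed` function and the mean-of-characters fallback are illustrative assumptions, not necessarily the paper's exact composition of character-string embeddings.

```python
import numpy as np

def embed(token, word_vectors, char_vectors, dim=200):
    """Look a token up in the pretrained word/character embeddings;
    if it is missing, fall back to a character-string embedding
    (here approximated by averaging character vectors)."""
    if token in word_vectors:                    # "is it in the embeddings?" -> yes
        return word_vectors[token]
    chars = [char_vectors[c] for c in token if c in char_vectors]
    if chars:                                    # no -> compose from characters
        return np.mean(chars, axis=0)
    return np.zeros(dim)                         # completely unknown string
```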

Model Feed-forward neural network: larger than previous works (8000d and 8000d hidden layers over an 8200d* input). Loss function (greedy): cross-entropy over the correct transition action. Optimizer: AdaGrad (quick convergence). https://vimeo.com/234944928
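A hedged PyTorch sketch (not the paper's implementation) of such a transition classifier with the layer sizes from the slide, a greedy cross-entropy loss, and AdaGrad. The number of actions and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn

class TransitionFFNN(nn.Module):
    """Feed-forward classifier over the 8200d concatenation of feature embeddings."""
    def __init__(self, input_dim=8200, hidden_dim=8000, num_actions=64):
        super().__init__()
        self.h1 = nn.Linear(input_dim, hidden_dim)
        self.h2 = nn.Linear(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_actions)   # num_actions is a placeholder

    def forward(self, x):
        x = torch.relu(self.h1(x))                      # activation choice is an assumption
        x = torch.relu(self.h2(x))
        return self.out(x)                              # unnormalized scores per action

model = TransitionFFNN()
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()                         # greedy loss over the gold action

# One greedy training step (features: (batch, 8200), gold_actions: (batch,)):
# loss = loss_fn(model(features), gold_actions); loss.backward(); optimizer.step()
```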

Model Feature list:
S0w: the top word of the stack
S1w: the second word of the stack
L0: the leftmost child; L1: the second leftmost child
R0: the rightmost child; R1: the second rightmost child
q0: the last shifted word; q1: the second most recently shifted word
q0e: the last character of q0
q0f1: the sub-word of q0 (q0[1:]); q0f2: the sub-word of q0 (q0[2:])
q0b1: the string across q0 and the buffer (q0 + b[:1]); q0b2: the string across q0 and the buffer (q0 + b[:2])
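To make the feature names concrete, here is a small, illustrative extractor over the `State` sketch from the transition-system example above; the child features (L0, L1, R0, R1) are omitted because they additionally require dependency links on each stack item.

```python
def extract_features(state, q0, buffer):
    """Collect the string-valued features named on the slide (simplified)."""
    s0 = state.stack[-1] if state.stack else None
    s1 = state.stack[-2] if len(state.stack) > 1 else None
    q0 = q0 or ""
    return {
        "S0w": s0, "S1w": s1,
        "q0": q0,                              # the last shifted (partial) word
        "q0e": q0[-1:],                        # its last character
        "q0f1": q0[1:], "q0f2": q0[2:],        # sub-words of q0
        "q0b1": q0 + "".join(buffer[:1]),      # strings across q0 and the buffer
        "q0b2": q0 + "".join(buffer[:2]),
    }
```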

Model (figure): word-based and character-based feature templates over the stack (children L0, L1, R0, R1 and their own L/R children) and the buffer (B0-2 for the word-based model, B0-3 for the character-based model).

Model (figure): word-based and character-based features around the partial word q0: its last character q0e, its sub-words q0f1 and q0f2, and the strings q0b1, q0b2, q0b3 across q0 and the buffer.

Input representation (figure): 40 text features with 200d text embeddings, 9 POS features with 20d POS embeddings, and 1 length-of-q0 feature with a 20d embedding (40×200 + 9×20 + 1×20 = 8200d in total). Each text feature goes through the same lookup: is it in the embeddings?* If yes, use the word/character embeddings; if no, fall back to the embeddings of character strings.

Model Beam search: a perceptron layer whose inputs come from the hidden layers and from the greedy output layer.
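The search itself is ordinary beam search over transition sequences; the sketch below shows only that skeleton, with hypothetical hooks (`score_actions`, `legal_actions`, `apply_action`, `is_final`) standing in for the perceptron-layer scoring described on the slide.

```python
import heapq

def beam_search(initial_state, score_actions, legal_actions, apply_action,
                is_final, beam_size=16):
    """Keep the beam_size highest-scoring partial transition sequences.
    score_actions(state) -> {action: score}; the other callables are
    problem-specific hooks (hypothetical names used only for this sketch)."""
    beam = [(0.0, initial_state)]
    while not all(is_final(state) for _, state in beam):
        candidates = []
        for total, state in beam:
            if is_final(state):
                candidates.append((total, state))        # finished hypotheses carry over
                continue
            scores = score_actions(state)
            for action in legal_actions(state):
                candidates.append((total + scores[action], apply_action(state, action)))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])                 # best-scoring full sequence
```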

Model Structured learning model, as proposed by Weiss et al. (2015) and Andor et al. (2016).

Model (speaker's guess, figure): is the gold path still in the beam? If yes, backpropagate down to the embeddings; if no, only backpropagate to the hidden layers.

Model Bi-LSTM: it can see the whole sentence, so only light feature engineering is needed.
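A minimal PyTorch sketch of such a bidirectional encoder over character embeddings; all dimensions are placeholders, and the downstream feature selection (the 4 or 8 features fed to the classifier, as in the experiments below) is not shown.

```python
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM over character embeddings: every position can see the
    whole sentence, so only a few features need to be fed to the classifier.
    All dimensions here are placeholders, not the paper's settings."""
    def __init__(self, vocab_size, emb_dim=200, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                  # char_ids: (batch, seq_len) int tensor
        outputs, _ = self.lstm(self.embed(char_ids))
        return outputs                            # (batch, seq_len, 2 * hidden_dim)
```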

Model (figure): Bi-LSTM input example over the characters 技 術 有 了 新, each looked up with the same "is it in the embeddings?" check.

Outline Introduction Model Experiments Conclusion

Experiments Settings: Penn Chinese Treebank 5.1 (CTB-5) and 7 (CTB-7). Three FFNN models: SegTag, SegTagDep, and Dep (similar to Weiss et al. and Andor et al.). Two Bi-LSTM models: Bi-LSTM 4 feat and Bi-LSTM 8 feat.

Experiments (table: Seg and POS scores). SegTag does not use the children and children-of-children of the stack words as features.

Experiments (table: Seg, POS, and Dep scores).

Experiments Bi-LSTMs: not better than the FFNN models.

Outline Introduction Model Experiments Conclusion

Conclusion Two joint parsing models are proposed, based on feed-forward and bi-LSTM neural networks. The character-string embeddings help capture the similarities of incomplete tokens. The SegTagDep model gets better Seg and POS scores than previous joint models. SegTag + Dep achieves a state-of-the-art dependency parsing score.

Reference
Daniel Andor et al. 2016. Globally normalized transition-based neural networks.
Jun Hatori et al. 2012. Incremental joint approach to word segmentation, POS tagging, and dependency parsing in Chinese.
David Weiss et al. 2015. Structured training for neural network transition-based parsing.