PROBABILISTIC GRAPH-BASED DEPENDENCY PARSING WITH CONVOLUTIONAL NEURAL NETWORK Zhisong Zhang, Hai Zhao and Lianhui Qin Shanghai Jiao Tong University
Outline
Background: Dependency Parsing
Training Criteria: Probabilistic Criterion
Neural Parsing: Basic, Convolution, Ensemble
Experiments and Results
Background Dependency parsing aims to predict a dependency tree, in which all the edges connect head-modifier pairs. In graph-based methods, a dependency tree is factored into sub-trees (factors); the score of a dependency tree T is defined as the sum of the scores of all its factors p.
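For concreteness, the factorization can be written as follows (notation ours, not from the original slides):

\[ \mathrm{Score}(x, T) \;=\; \sum_{p \in T} \mathrm{Score}(x, p) \]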
Decoding Algorithms Chart-based dynamic programming algorithms (for projective parsing). Explored extensively in previous work (Eisner, 1996; McDonald et al., 2005; McDonald and Pereira, 2006; Koo and Collins, 2010; Ma and Zhao, 2012).
Decoding Algorithm Third-order Grand-Sibling model (the most complex factorization that we explore).
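As a sketch, a grand-sibling part consists of a grandparent g, a head h, a modifier m and the adjacent sibling s of m, so the tree score decomposes as (notation ours):

\[ \mathrm{Score}(x, T) \;=\; \sum_{(g,\, h,\, m,\, s) \in T} \mathrm{Score}(x, g, h, m, s) \]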
Highlights
Probabilistic criteria for neural network training.
Sentence-level representation learned from a convolutional layer.
Ensemble models with a stacked linear output layer.
Probabilistic Model As in log-linear models such as the Conditional Random Field (CRF) (Lafferty et al., 2001), we can treat parsing in a probabilistic way. This is not new and has been explored in much previous work.
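Concretely, the tree probability takes the usual log-linear (CRF-style) form, where \( \mathcal{Y}(x) \) denotes all candidate trees for sentence x (notation ours):

\[ P(T \mid x) \;=\; \frac{\exp\!\big(\mathrm{Score}(x, T)\big)}{\sum_{T' \in \mathcal{Y}(x)} \exp\!\big(\mathrm{Score}(x, T')\big)} \]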
Training Criteria
Again DP How do we calculate the marginal probabilities? Also by dynamic programming, using a variant of the well-known inside-outside algorithm; Paskin (2001) and Ma and Zhao (2015) provide the corresponding algorithms for dependency parsing.
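The marginal of a factor p is the probability mass of all trees that contain it; the inside-outside variant computes these quantities without enumerating trees (a sketch, notation ours):

\[ m(p \mid x) \;=\; \sum_{T \in \mathcal{Y}(x),\; p \in T} P(T \mid x) \]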
MLE and Max-Margin The probabilistic criterion can be viewed as a soft version of the max-margin criterion. Gimpel and Smith (2010) provide a good review of several training criteria.
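A sketch of the two criteria for a gold tree \( T^{*} \), where \( \Delta \) is a Hamming-style cost (replacing the max in the margin loss with a cost-augmented log-sum-exp yields the softmax-margin of Gimpel and Smith (2010)):

\[ \mathcal{L}_{\mathrm{MLE}} = -\log P(T^{*} \mid x), \qquad \mathcal{L}_{\mathrm{MM}} = \max_{T \in \mathcal{Y}(x)} \big(\mathrm{Score}(x, T) + \Delta(T, T^{*})\big) - \mathrm{Score}(x, T^{*}) \]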
Neural Parsing: Recent Work
(Durrett and Klein, 2015): Neural-CRF parsing (phrase-structure parsing).
(Pei et al., 2015): Graph-based parsing with a feed-forward NN.
(Weiss et al., 2015): Transition-based parsing with structured training.
(Dyer et al., 2015): Transition-based parsing with stack LSTMs.
And many others …
Neural Model: Basic A simple feed-forward neural network with a window-based approach.
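A minimal PyTorch-style sketch of such a window-based feed-forward scorer; the layer sizes, feature set and class name are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class WindowFFScorer(nn.Module):
    """Scores a head-modifier pair from word/POS windows around both tokens."""
    def __init__(self, vocab_size, pos_size, emb_dim=50, hidden_dim=200, window=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(pos_size, emb_dim)
        # two windows (head and modifier), each with word + POS embeddings
        in_dim = 2 * window * 2 * emb_dim
        self.hidden = nn.Linear(in_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, head_words, head_pos, mod_words, mod_pos):
        # each argument: (batch, window) indices of tokens in the local window
        feats = torch.cat([self.word_emb(head_words), self.pos_emb(head_pos),
                           self.word_emb(mod_words), self.pos_emb(mod_pos)], dim=-1)
        h = torch.tanh(self.hidden(feats.flatten(start_dim=1)))
        return self.out(h).squeeze(-1)  # one score per head-modifier pair
```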
Neural Model: Convolutional Model To encode sentence-level information and obtain sentence embeddings, a convolutional layer over the whole sentence followed by a max-pooling layer is adopted. The scheme uses the distance embedding of the whole convolution window as the position feature.
Neural Model: Convolutional Model (architecture figure: sentence-wide convolution with position features, followed by max-pooling)
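A minimal PyTorch-style sketch of this component; for simplicity the distance feature is attached per token here, whereas the slides describe one distance embedding per convolution window, and all names and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvSentenceEncoder(nn.Module):
    """Sentence embedding: convolution over word + distance embeddings, then max-pooling."""
    def __init__(self, vocab_size, num_dist_buckets, word_dim=50, dist_dim=20,
                 conv_dim=100, window=3):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.dist_emb = nn.Embedding(num_dist_buckets, dist_dim)  # position feature
        self.conv = nn.Conv1d(word_dim + dist_dim, conv_dim,
                              kernel_size=window, padding=window // 2)

    def forward(self, words, dists):
        # words, dists: (batch, sent_len) token and bucketed-distance indices
        x = torch.cat([self.word_emb(words), self.dist_emb(dists)], dim=-1)
        x = torch.tanh(self.conv(x.transpose(1, 2)))  # (batch, conv_dim, sent_len)
        return x.max(dim=2).values                    # max-pool over positions
```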
Neural Model: Ensemble Models Ensembles of models of different orders for scoring. Scheme 1: simple addition of the sub-models' scores.
Neural Model: Ensemble Models Scheme 2: stacking another linear output layer over the sub-models' scores (a sketch of both schemes follows).
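A minimal sketch of both ensemble schemes, assuming each sub-model produces a per-factor score tensor of the same shape; the names and the use of a single linear layer for Scheme 2 are illustrative assumptions:

```python
import torch
import torch.nn as nn

def ensemble_add(scores):
    """Scheme 1: simply sum the scores from models of different orders."""
    return torch.stack(scores, dim=-1).sum(dim=-1)

class StackedEnsemble(nn.Module):
    """Scheme 2: a stacked linear output layer over the sub-models' scores."""
    def __init__(self, num_models):
        super().__init__()
        self.out = nn.Linear(num_models, 1)

    def forward(self, scores):
        # scores: list of per-factor score tensors, one from each sub-model
        return self.out(torch.stack(scores, dim=-1)).squeeze(-1)
```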
Experiments
English Penn Treebank (PTB), with three converters:
1. Penn2Malt and the head rules of Yamada and Matsumoto (2003), noted as PTB-Y&M.
2. Stanford parser v3.3.0 with Stanford Basic Dependencies (De Marneffe et al., 2006), noted as PTB-SD.
3. LTH Constituent-to-Dependency Conversion Tool (Johansson and Nugues, 2007), noted as PTB-LTH.
Chinese Penn Treebank (CTB), using the Penn2Malt converter.
Model Analysis To verify the effectiveness of the proposed methods, only the PTB-SD development set is used in these experiments.
Model Analysis: on Dependency Length
Main Results
References
Marie-Catherine De Marneffe, Bill MacCartney, Christopher D. Manning, et al. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, volume 6, pages 449–454.
Greg Durrett and Dan Klein. 2015. Neural CRF parsing. In Proceedings of ACL, pages 302–312, Beijing, China, July.
Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of ACL, pages 334–343, Beijing, China, July.
Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics, pages 340–345, Copenhagen, August.
Kevin Gimpel and Noah A. Smith. 2010. Softmax-margin CRFs: Training log-linear models with cost functions. In Proceedings of NAACL, pages 733–736, Los Angeles, California, June.
Richard Johansson and Pierre Nugues. 2007. Extended constituent-to-dependency conversion for English. In 16th Nordic Conference of Computational Linguistics, pages 105–112. University of Tartu.
Implementation for reference:
References
Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of ACL, pages 1–11, Uppsala, Sweden, July.
John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of ICML.
Xuezhe Ma and Hai Zhao. 2012. Fourth-order dependency parsing. In Proceedings of COLING, pages 785–796, Mumbai, India, December.
Xuezhe Ma and Hai Zhao. 2015. Probabilistic models for high-order projective dependency parsing. arXiv preprint.
Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of ACL, pages 91–98, Ann Arbor, Michigan, June.
Mark A. Paskin. 2001. Cubic-time parsing and learning algorithms for grammatical bigram models. Technical report.
Wenzhe Pei, Tao Ge, and Baobao Chang. 2015. An effective neural network model for graph-based dependency parsing. In Proceedings of ACL, pages 313–322, Beijing, China, July.
David Weiss, Chris Alberti, Michael Collins, and Slav Petrov. 2015. Structured training for neural network transition-based parsing. In Proceedings of ACL, pages 323–333, Beijing, China, July.
Thanks … Q & A