1 A Classifier-based Deterministic Parser for Chinese
Mengqiu Wang
Advisor: Prof. Teruko Mitamura
Joint work with Kenji Sagae

2 Outline of the talk
- Background
- Deterministic parsing model
- Classifiers and feature selection
- POS tagging
- Experiments and results
- Discussion and future work
- Conclusion

3 Background
- Constituency parsing is one of the most fundamental tasks in NLP.
- State-of-the-art Chinese constituency parsers achieve precision and recall in the low 80% range when using automatically generated POS tags.
- Most of the parsing literature reports only accuracy; efficiency is typically ignored.
- In practice, however, parsers are often deemed too slow for many NLP applications (e.g. IR).

4 Deterministic Parsing Model
Originally developed in [Sagae and Lavie 2005].
Input: by convention, deterministic parsers assume that input sentences (Chinese in our case) are already segmented and POS tagged. [1]
Main data structures:
- A queue, storing the input word-POS pairs
- A stack, holding partial parse trees
Trees are lexicalized; we use the same head-finding rules as [Bikel 2004].
The parser performs binary shift-reduce actions based on classifier decisions; a sketch follows, and a worked example starts on the next slide.
[1] We perform our own POS tagging based on SVM (see the POS tagging slide).
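The control loop below is a minimal sketch of this model, not the talk's implementation: the classifier, the feature extractor, and the head-picking helper are hypothetical stand-ins, and the real head finding uses the Bikel (2004) head tables.

```python
# Minimal sketch of the classifier-driven shift-reduce loop (hypothetical
# names throughout; `classifier` and `extract_features` are stand-ins).
from collections import deque

class Node:
    def __init__(self, label, head, children=()):
        self.label = label              # POS tag (leaf) or nonterminal (tree)
        self.head = head                # lexical head word
        self.children = list(children)

def pick_head(label, children):
    # Placeholder: real head finding uses the Bikel (2004) head tables.
    return children[-1]

def parse(tagged_words, classifier, extract_features):
    """tagged_words: (word, POS) pairs -- already segmented and tagged."""
    queue = deque(Node(pos, word) for word, pos in tagged_words)
    stack = []
    while queue or len(stack) > 1:
        # The classifier decides the next action from the current state.
        action = classifier.predict(extract_features(stack, queue))
        if action == "SHIFT":
            stack.append(queue.popleft())
        else:                           # e.g. "REDUCE-1-NP", "REDUCE-2-VP"
            _, n, label = action.split("-")
            children = stack[-int(n):]
            del stack[-int(n):]
            stack.append(Node(label, pick_head(label, children).head, children))
    return stack[0]                     # the finished parse tree
```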

5 Deterministic Parsing Model Cont.
Input sentence: 布朗/NR (Brown, proper noun) 访问/VV (visits, verb) 上海/NR (Shanghai, proper noun)
Initial parser state:
Stack: Θ
Queue: (NR 布朗) (VV 访问) (NR 上海)

6 Deterministic Parsing Model Cont.
Action 1: Shift
Parser state:
Stack (bottom to top): (NR 布朗)
Queue: (VV 访问) (NR 上海)

7 Deterministic Parsing Model Cont.
Action 2: Reduce the first item on the stack to an NP node, with (NR 布朗) as the head
Parser state:
Stack (bottom to top): NP(NR 布朗)
Queue: (VV 访问) (NR 上海)

8 Deterministic Parsing Model Cont.
Action 3: Shift
Parser state:
Stack (bottom to top): NP(NR 布朗) | (VV 访问)
Queue: (NR 上海)

9 Deterministic Parsing Model Cont.
Action 4: Shift
Parser state:
Stack (bottom to top): NP(NR 布朗) | (VV 访问) | (NR 上海)
Queue: Θ

10 Deterministic Parsing Model Cont.
Action 5: Reduce the first item on the stack to an NP node, with (NR 上海) as the head
Parser state:
Stack (bottom to top): NP(NR 布朗) | (VV 访问) | NP(NR 上海)
Queue: Θ

11 Deterministic Parsing Model Cont.
Action 6: Reduce the first two items on the stack to a VP node, with (VV 访问) as the head
Parser state:
Stack (bottom to top): NP(NR 布朗) | VP(VV 访问)
Queue: Θ

12 Deterministic Parsing Model Cont.
Action 7: Reduce the first two items on the stack to an IP node, taking the head of the VP subtree, (VV 访问), as the head
Parser state:
Stack (bottom to top): IP(VV 访问)
Queue: Θ

13 Deterministic Parsing Model Cont.
Parsing terminates when the queue is empty and the stack contains only one item.
Final parse tree: (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海))))
The action sequence can be replayed mechanically, as in the toy trace below.
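The trace below reproduces the walkthrough; it is a toy, independent of any real classifier, and its one-line head rule is a deliberate simplification that happens to be correct for this sentence.

```python
# Toy replay of the seven actions above; prints each parser state and the
# final bracketed tree. Nodes are (label, head, children) triples.
def show(n):
    kids = " ".join(show(c) for c in n[2])
    return f"({n[0]} {kids})" if n[2] else f"({n[0]} {n[1]})"

queue = [("NR", "布朗", []), ("VV", "访问", []), ("NR", "上海", [])]
stack = []
actions = ["SHIFT", "REDUCE-1-NP", "SHIFT", "SHIFT",
           "REDUCE-1-NP", "REDUCE-2-VP", "REDUCE-2-IP"]
for a in actions:
    if a == "SHIFT":
        stack.append(queue.pop(0))
    else:
        _, n, label = a.split("-")
        children, stack = stack[-int(n):], stack[:-int(n)]
        # Toy head rule: take the verbal child's head if any, else the
        # rightmost child's head (matches the slides for this sentence).
        head = next((c[1] for c in children if c[0].startswith("V")),
                    children[-1][1])
        stack.append((label, head, children))
    print(f"{a:12s} stack: {'  '.join(show(s) for s in stack)}")
# Final line prints: (IP (NP (NR 布朗)) (VP (VV 访问) (NP (NR 上海))))
```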

14 Classifiers
Classification is the most important part of deterministic parsing. We experimented with four different classifiers:
- SVM: finds the hyperplane with the maximum soft margin, minimizing the expected risk.
- Maximum entropy: estimates a set of parameters that maximize entropy over the distributions satisfying constraints that force the model to best account for the training data.
- Decision tree: we used C4.5.
- Memory-based learning: a kNN classifier; a lazy learner with short training time, ideal for prototyping.
A scikit-learn sketch of these four families follows.
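For concreteness, here is a hedged scikit-learn sketch of the four families. The actual experiments used dedicated toolkits (e.g. C4.5 and TiMBL, per the references), so these are stand-ins, and the hyperparameters are guesses.

```python
# Stand-ins for the four classifier families (not the original toolkits).
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression   # a maximum entropy model
from sklearn.tree import DecisionTreeClassifier       # CART here, C4.5 in the talk
from sklearn.neighbors import KNeighborsClassifier    # kNN / memory-based learning

classifiers = {
    "SVM":    SVC(kernel="poly", degree=2),           # kernel choice is a guess
    "Maxent": LogisticRegression(max_iter=1000),
    "DTree":  DecisionTreeClassifier(),
    "MBL":    KNeighborsClassifier(n_neighbors=5),
}
# Given vectorized parser states X and gold actions y:
# for name, clf in classifiers.items():
#     clf.fit(X_train, y_train); print(name, clf.score(X_dev, y_dev))
```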

15 Features
The features we used are distributionally derived or linguistically motivated. Each feature carries information about the context of a particular parse state. We denote the top item on the stack as S(1), the second item from the top as S(2), and so on. Similarly, we denote the first item on the queue as Q(1), the second as Q(2), and so on.

16 Features
- A Boolean feature indicating whether a closing punctuation mark is expected.
- A Boolean feature indicating whether the queue is empty.
- A Boolean feature indicating whether a comma separates S(1) and S(2).
- The last action given by the classifier, and the number of words in S(1) and S(2).
- The head word and its POS for S(1), S(2), S(3) and S(4), and the word and POS for Q(1), Q(2), Q(3) and Q(4).
- The nonterminal label of the root of S(1) and S(2), and the number of punctuation marks in S(1) and S(2).
- Rhythmic features, and the linear distance between the head words of S(1) and S(2).
- The number of words found so far to be dependents of the head words of S(1) and S(2).
- The nonterminal label, POS and head word of the immediate left and right children of the root of S(1) and S(2).
- The most recently found word-POS pair to the left of the head word of S(1) and S(2).
- The most recently found word-POS pair to the right of the head word of S(1) and S(2).
A sketch of a few of these templates as a feature map follows.
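The snippet below sketches a handful of the templates above as a key-value feature map; the encoding is hypothetical, and stack/queue items are simplified to (label, head) pairs (nonterminal plus head word for stack trees, POS tag plus word for queue items).

```python
# Sketch of a few feature templates over a parser state (hypothetical encoding).
def extract_features(stack, queue, last_action):
    feats = {"queue_empty": not queue, "last_action": last_action}
    for i in (1, 2, 3, 4):
        s_label, s_head = stack[-i] if len(stack) >= i else ("<none>",) * 2
        feats[f"S{i}_label"], feats[f"S{i}_head"] = s_label, s_head
        q_pos, q_word = queue[i - 1] if len(queue) > i - 1 else ("<none>",) * 2
        feats[f"Q{i}_pos"], feats[f"Q{i}_word"] = q_pos, q_word
    return feats

# State after Action 4 of the walkthrough: three stack items, empty queue.
state = [("NP", "布朗"), ("VV", "访问"), ("NR", "上海")]
print(extract_features(state, [], "SHIFT")["S1_head"])   # -> 上海
```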

17 POS tagging
In our model, POS tagging is treated as a separate problem and is done prior to parsing. Since we care about the parser's performance in realistic situations, we evaluate with automatically generated POS tags. We implemented a simple two-pass POS tagging model based on SVM, which achieved 92.5% accuracy; a plausible sketch of such a tagger follows.
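The slide gives no tagger details beyond "two-pass" and "SVM", so the feature scheme below is one plausible reading, stated as an assumption: pass one tags each word from word-window features, pass two retags with the neighbors' first-pass tags added as features.

```python
# Hypothetical two-pass window features for an SVM tagger (an assumption;
# the talk does not specify its feature set). One would train a LinearSVC
# on the pass-1 features, tag the corpus, then train a second model on the
# pass-2 features that include the first-pass tags.
def window_feats(words, i, first_pass_tags=None):
    f = {"w0": words[i],
         "w-1": words[i - 1] if i > 0 else "<s>",
         "w+1": words[i + 1] if i + 1 < len(words) else "</s>"}
    if first_pass_tags is not None:          # second pass only
        f["t-1"] = first_pass_tags[i - 1] if i > 0 else "<s>"
        f["t+1"] = first_pass_tags[i + 1] if i + 1 < len(words) else "</s>"
    return f

print(window_feats(["布朗", "访问", "上海"], 1, ["NR", "VV", "NR"]))
```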

18 Experiments
Standard data split:
- Training set: sections 001-270 of the Penn Chinese Treebank (3,484 sentences).
- Development set: sections 301-325.
- Testing set: sections 271-300.
- In total, about 1/10 of the size of the English Penn Treebank in words.
Standard corpus preparation:
- Empty nodes were removed.
- Functional labels of nonterminal nodes were removed, e.g. NP-Subj -> NP.
For scoring we used the evalb program. We report labeled recall (LR), labeled precision (LP), and their harmonic mean F1 (computed as in the snippet below).
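F1 here is just the harmonic mean of LP and LR; a one-line check against the SVM dev-set row on the next slide:

```python
# evalb-style F1: harmonic mean of labeled precision and labeled recall.
def f1(lp, lr):
    return 2 * lp * lr / (lp + lr)

print(round(f1(87.9, 86.9), 1))   # 87.4, matching the SVM row on the next slide
```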

19 Results
Comparison of classifiers on the development set, using gold-standard POS tags:

Model  | Classification accuracy | LR    | LP    | F1    | Fail | Time
SVM    | 94.3%                   | 86.9% | 87.9% | 87.4% | 0    | 3m 19s
Maxent | 92.6%                   | 84.1% | 85.2% | 84.6% | 5    | 0m 21s
DTree1 | 92.0%                   | 78.8% | 80.3% | 79.5% | 42   | 0m 12s
DTree2 | N/A                     | 81.6% | 83.6% | 82.6% | 30   | 0m 18s
MBL    | 90.6%                   | 74.3% | 75.2% | 74.7% | 2    | 16m 11s

(Fail: number of sentences for which the parser failed to produce a parse.)

20 Classifier Ensemble
Using stacked-classifier techniques, we improved performance on the dev set to 90.3% LR and 90.5% LP, a 3.4% improvement in LR and a 2.6% improvement in LP over the SVM model. A generic stacking sketch follows.
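The slides do not spell out the stacking recipe, so the snippet below is only a generic illustration of stacked classification, with scikit-learn's StackingClassifier standing in: the base classifiers' outputs become inputs to a meta-classifier.

```python
# Generic stacking illustration (not the talk's actual ensemble).
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

stacked = StackingClassifier(
    estimators=[("svm", LinearSVC()), ("dtree", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
# stacked.fit(X_train, y_train); stacked.predict(...) would then drive the parser.
```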

21 Comparison with related work
Results on the test set, using automatically generated POS tags.

22 Comparison with related work cont.
Comparison of parsing speed:

Model          | Runtime
Bikel          | 54m 6s
Levy & Manning | 8m 12s
DTree          | 0m 14s
Maxent         | 0m 24s
SVM            | 3m 50s

23 Discussion and future work
Among the classifiers, SVM has high accuracy but low speed; DTree has lower accuracy but great speed; Maxent sits between the two in both accuracy and speed. It is desirable to bring the two ends of the spectrum closer, i.e. to increase the accuracy of the DTree classifier and to lower the computational cost of SVM classification.
Action items:
- Apply boosting techniques (AdaBoost, random forests, bagging, etc.) to DTree; see the sketch below. (A preliminary attempt did not yield better performance and calls for further investigation.)
- Feature selection (especially on lexical items) to reduce the computational cost of classification.
- Re-implement the parser in C++ (avoiding invoking external processes and expensive I/O).
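A sketch of the first action item, again with scikit-learn ensembles as assumed stand-ins (version 1.2 or later for the `estimator` argument) and guessed hyperparameters; per the slide, a preliminary attempt along these lines did not yet beat the plain DTree.

```python
# Boosted / bagged / forest variants of the decision-tree classifier
# (scikit-learn >= 1.2 stand-ins; hyperparameters are guesses).
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier

boosted = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=8),
                             n_estimators=100)
bagged = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)
forest = RandomForestClassifier(n_estimators=100)
# Each would be trained on the same (parser state, action) data as DTree.
```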

24 Conclusion
- We implemented a classifier-based deterministic constituency parser for Chinese.
- We achieved results comparable to the state of the art in Chinese parsing.
- Very fast parsing is made possible for speed-critical applications, with some tradeoff in accuracy.
- Advances in machine learning can be applied directly to the parsing problem, opening up many opportunities for further improvement.

25 References
Daniel M. Bikel and David Chiang. 2000. Two statistical parsing models applied to the Chinese Treebank. In Proceedings of the Second Chinese Language Processing Workshop.
Daniel M. Bikel. 2004. On the Parameter Space of Generative Lexicalized Statistical Parsing Models. Ph.D. thesis, University of Pennsylvania.
David Chiang and Daniel M. Bikel. 2002. Recovering latent information in treebanks. In Proceedings of the 19th International Conference on Computational Linguistics.
Michael John Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van den Bosch. 2004. TiMBL: Tilburg memory based learner, version 5.1, reference guide. Technical Report 04-02, ILK Research Group, Tilburg University.
Pascale Fung, Grace Ngai, Yongsheng Yang, and Benfeng Chen. 2004. A maximum-entropy Chinese parser augmented by transformation-based learning. ACM Transactions on Asian Language Information Processing, 3(2):159–168.
Mary Hearne and Andy Way. 2004. Data-oriented parsing and the Penn Chinese Treebank. In Proceedings of the First International Joint Conference on Natural Language Processing.
Zhengping Jiang. Statistical Chinese parsing. Honours thesis, National University of Singapore.
Zhang Le. Maximum Entropy Modeling Toolkit for Python and C++. Reference manual.
Roger Levy and Christopher D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics.
Xiaoqiang Luo. 2003. A maximum entropy Chinese character-based parser. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.
David M. Magerman. 1994. Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University.
Kenji Sagae and Alon Lavie. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the Ninth International Workshop on Parsing Technology.
Deyi Xiong, Shuanglong Li, Qun Liu, Shouxun Lin, and Yueliang Qian. 2005. Parsing the Penn Chinese Treebank with semantic knowledge. In Proceedings of the International Joint Conference on Natural Language Processing 2005.

26 Thank you! Questions?