Improving Parsing Accuracy by Combining Diverse Dependency Parsers
Daniel Zeman and Zdeněk Žabokrtský
ÚFAL MFF, Univerzita Karlova, Praha
Vancouver, 10.10.2005
Overview
introduction
existing parsers and their accuracies
methods of combination: switching, unbalanced voting
results
conclusion
Dependency Parsing
parse: S → 2^N
S = set of all sentences
N = set of natural numbers
Dependency Parsing
[Example slides: the Czech sentence "Píše dopis svému příteli ." ("He is writing a letter to his friend .") is tokenized, each token receives candidate morphological tags (e.g. VB-S---3P-AA--, NNIS4-----A---, P8ZS3----------), and the dependency tree attaches every token to its parent.]
Prague Dependency Treebank (PDT 1.0)
Czech
1 255 590 training tokens in 73 088 non-empty sentences
63 353 tune tokens in 3 646 sentences
62 677 test tokens in 3 673 sentences
accuracy = percentage of tokens with correctly assigned parent nodes (each dependency tree has an artificial root node)
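For concreteness, a parse can be stored as a list of parent indices (0 = artificial root), and the accuracy defined above then reduces to a per-token comparison. A minimal illustrative sketch (the representation, function name, and example parents are our own, not taken from the slides):

```python
def attachment_accuracy(gold_parents, pred_parents):
    """Percentage of tokens whose predicted parent index matches the gold one.

    Both arguments are lists of parent indices, one per token;
    index 0 denotes the artificial root node.
    """
    assert len(gold_parents) == len(pred_parents)
    correct = sum(1 for g, p in zip(gold_parents, pred_parents) if g == p)
    return 100.0 * correct / len(gold_parents)

# "Píše dopis svému příteli ." -- tokens 1..5, hypothetical parents as indices
gold = [0, 1, 4, 1, 1]          # assumed gold parents for the 5 tokens
pred = [0, 1, 1, 1, 1]          # a parser's output; token 3 attached wrongly
print(attachment_accuracy(gold, pred))   # 80.0
```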
Existing Parsers
For Czech (PDT; accuracies on the Tune set):
83.6 % Eugene Charniak’s (ec) [ported from English]
81.7 % Michael Collins’ (mc) [ported from English]
74.3 % Zdeněk Žabokrtský’s (zz) [hand-made rules]
73.8 % Daniel Zeman’s (dz) [dependency n-grams]
Tomáš Holan’s:
71.0 % pshrt
69.5 % left-to-right [push-down automaton]
62.0 % right-to-left [push-down automaton]
More Existing Parsers
New parsers (2005):
Nivre & Jenssen [push-down automaton]
McDonald & Ribarov [maximum spanning tree]
EC++ (Hall & Novák)
No accuracy figures for our Tune set, but they are better than most of the parsers in our pool.
Good Old Truth: Two Heads Are Better Than One!
van Halteren et al.: tagging
Brill and Wu: tagging
Brill and Hladká: bagging parsers
Henderson and Brill: constituent parsing
Frederking and Nirenburg: machine translation
Fiscus: speech recognition
Borthwick: named entity recognition
Inui and Inui: partial parsing
Florian and Yarowsky: word sense disambiguation
Chu-Carroll et al.: question answering
Voting
Question: “What is the index of the parent of the i-th node?”
Answers: ec: “7”, mc: “7”, zz: “5”, dz: “11”
Resulting answer: “7”
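A minimal sketch of this per-node vote (the function name and tie handling are our illustrative choices; the slides resolve the no-majority case by backing off to ec, as discussed below):

```python
from collections import Counter

def vote_parent(proposals):
    """Pick the parent index proposed by the largest number of parsers.

    `proposals` maps parser name -> proposed parent index for one node.
    Ties are broken arbitrarily here; the actual combination backs off
    to the best single parser (ec) when no majority exists.
    """
    counts = Counter(proposals.values())
    parent, _ = counts.most_common(1)[0]
    return parent

print(vote_parent({"ec": 7, "mc": 7, "zz": 5, "dz": 11}))  # 7
```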
Emerging Issues
Are the parsers different enough to contribute uniquely?
What do we do if all parsers disagree?
What if the resulting structure is not a tree?
Uniqueness of a Parser
How many parents did only parser X find?
pool of 7 parsers (ec, mc, zz, dz, thr, thl, thp), test data set
ec: 1.7 %
zz: 1.2 % (rule-based parser, no statistics, handles non-projectivities!)
mc: 0.9 %
others: 0.3 – 0.4 %
Uniqueness of a Parser
How many parents did only parser X find?
four best parsers (ec, mc, zz, dz), test data set
ec: 3.0 %
zz: 2.0 % (rule-based parser, no statistics, handles non-projectivities!)
mc: 1.7 %
dz: 1.0 %
Uniqueness of a Parser
How many parents did only parser X find?
two best parsers (ec, mc), test data set
ec: 8.1 %
mc: 6.2 %
Uniqueness of a Parser
The unique contributions are hard to push through.
The real strength lies where the parsers agree (voting).
Majority vs. Oracle (test data; ec alone: 85.0 %)
Pool: 7 parsers (ec mc zz dz thr thl thp) / 4 best (ec mc zz dz) / 3 best (ec mc zz)
Majority (correct parent proposed by >half of the parsers): 76.8 % / 75.1 % / 82.9 %
Oracle (correct parent proposed by at least one parser): 95.8 % / 94.0 % / 93.0 %
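Both bounds follow directly from per-token correctness; a small illustrative helper (our own code, not from the paper):

```python
def majority_and_oracle(gold, parser_outputs):
    """Majority and oracle accuracies for a pool of parsers.

    `gold` is the list of gold parent indices; `parser_outputs` is a list of
    predicted parent-index lists, one per parser.
    Majority: share of tokens where more than half of the parsers are correct.
    Oracle:   share of tokens where at least one parser is correct.
    """
    n_tokens = len(gold)
    n_parsers = len(parser_outputs)
    majority = oracle = 0
    for i, g in enumerate(gold):
        correct = sum(1 for out in parser_outputs if out[i] == g)
        if correct > n_parsers / 2:
            majority += 1
        if correct >= 1:
            oracle += 1
    return 100.0 * majority / n_tokens, 100.0 * oracle / n_tokens
```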
Majority Voting
Three parsers (ec+mc+zz): 83 % of the parents are known to at least two parsers (majority).
However, ec alone achieves 85 %!
For some parents there is no majority (ec, mc and zz all disagree). In such cases, use ec’s opinion.
Together: 86.7 %
Weighting the Parsers
We have backed off to ec. Why? It’s the best parser of all!
How do we know? We can measure the accuracies on the Tune data set.
Can we use the accuracies in a more sophisticated way?
Weighting the Parsers
A parser gets as many votes as the percentage points of accuracy it achieves.
E.g., mc+zz would outvote ec+thr: 81.7 + 74.3 = 156.0 > 154.6 = 83.6 + 71.0
Context is not taken into account (so far).
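A sketch of this weighted voting rule, reusing the Tune-set accuracies listed earlier as vote weights (the function and dictionary names are our own):

```python
# Tune-set accuracies used as vote weights (from the "Existing Parsers" slide;
# the weight for thr comes from the example on this slide)
WEIGHTS = {"ec": 83.6, "mc": 81.7, "zz": 74.3, "dz": 73.8, "thr": 71.0}

def weighted_vote_parent(proposals, weights=WEIGHTS):
    """Each parser votes for its proposed parent with its tune-set accuracy
    as the vote weight; the parent with the largest total weight wins."""
    totals = {}
    for parser, parent in proposals.items():
        totals[parent] = totals.get(parent, 0.0) + weights[parser]
    return max(totals, key=totals.get)

# mc+zz (81.7 + 74.3 = 156.0) outvote ec+thr (83.6 + 71.0 = 154.6)
print(weighted_vote_parent({"ec": 3, "thr": 3, "mc": 5, "zz": 5}))  # 5
```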
Context
Hope: one parser is good at, for example, PP attachment, while another knows how to build coordination.
Features such as the morphology of the dependent node may help to find the right parser.
The context-sensitive combining classifier was trained on the Tune data set.
Context Features
For each node (the dependent, and the parents proposed by the respective parsers): part of speech, subcategory, gender, number, case, inner gender, inner number, person, degree of comparison, negativeness, tense, voice, semantic flags (proper name, geography…)
For each governor-dependent pair: mutual position (left neighbor, right far…)
For each parser pair: do the two parsers agree?
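A rough sketch of how such a feature vector for one node could be assembled (the field names, the PDT tag positions used, and the helper itself are our illustrative assumptions, not the exact feature set of the paper):

```python
def context_features(node, proposals, tags):
    """Build a feature dict for one dependent node.

    `node` is the index of the dependent token,
    `proposals` maps parser name -> proposed parent index,
    `tags` maps token index -> positional morphological tag (PDT style).
    """
    feats = {}
    # morphology of the dependent and of each proposed parent
    for label, idx in [("dep", node)] + [(p, proposals[p]) for p in proposals]:
        tag = tags.get(idx, "root")
        feats[f"{label}_pos"] = tag[0] if tag != "root" else "root"
        feats[f"{label}_case"] = tag[4] if tag != "root" else "root"
    # mutual position of each proposed governor and the dependent
    for p, parent in proposals.items():
        feats[f"{p}_dir"] = "left" if parent < node else "right"
    # pairwise agreement between parsers
    parsers = sorted(proposals)
    for i, a in enumerate(parsers):
        for b in parsers[i + 1:]:
            feats[f"agree_{a}_{b}"] = proposals[a] == proposals[b]
    return feats
```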
Decision Trees
We have trained C5 (Quinlan).
Very minor improvement (0.1 %).
The resulting decision trees are quite simple.
They mimic voting: the parser agreement flags are the most important features (in fact, this is not context).
Did not help with just two parsers (ec+mc): no voting possible.
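The slides use Quinlan’s C5; as a rough stand-in, the same setup can be tried with an off-the-shelf decision tree. The sketch below (toy data, scikit-learn instead of C5, all names our own) only illustrates how the tune-set feature dicts and winning-parser labels would feed a classifier:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the tune-set data: one feature dict per node (as produced by
# context_features() above) and, as the label, a parser that got the node right.
X = [
    {"agree_mc_zz": True,  "dep_case": "4"},
    {"agree_mc_zz": False, "dep_case": "6"},
    {"agree_mc_zz": False, "dep_case": "1"},
]
y = ["zz", "zz", "ec"]

clf = make_pipeline(DictVectorizer(sparse=False), DecisionTreeClassifier())
clf.fit(X, y)
print(clf.predict([{"agree_mc_zz": True, "dep_case": "4"}]))  # ['zz']
```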
Example of a Decision Tree (C5 output)
agreezzmc = yes: zz (3041/1058)
agreezzmc = no:
:...agreemcec = yes: ec (7785/1026)
    agreemcec = no:
    :...agreezzec = yes: ec (2840/601)
        agreezzec = no:
        :...zz_case = 6: zz (150/54)
            zz_case = 3: zz (34/10)
            zz_case = X: zz (37/20)
            zz_case = undef: ec (2006/1102)
            zz_case = 7: zz (83/48)
            zz_case = 2: zz (182/110)
            zz_case = 4: zz (108/57)
            zz_case = 1: ec (234/109)
            zz_case = 5: mc (1)
            zz_case = root:
            :...ec_negat = A: mc (117/65)
                ec_negat = undef: ec (139/65)
                ec_negat = N: ec (1)
                ec_negat = root: ec (2)
It is not guaranteed that the result is a tree!
[Figure: three nodes (1, 2, 3) whose combined parent choices, each taken from a different parser’s tree, form a cycle instead of a tree.]
Note: We Actually May Be Willing to Accept Non-Trees
The way accuracy is computed motivates us to look at individual nodes, not at the whole structure.
Suppose one edge in a cycle is wrong but we do not know which one; all the others are good. If we break the cycle at the wrong edge, we end up with two wrong edges instead of one.
When only partial relations are sought, the whole structure may not matter.
How to Preserve Treeness
In each step (adding a new dependency), rule out parsers whose proposal would introduce a cycle.
If all parsers propose cycles, abandon the whole structure and use ec’s tree as is instead.
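A minimal sketch of the cycle check used to rule out proposals (our own helper; the slides give no implementation details):

```python
def would_create_cycle(parents, child, proposed_parent):
    """Return True if attaching `child` to `proposed_parent` would close a cycle,
    given the partial structure `parents` (child index -> parent index, 0 = root)."""
    node = proposed_parent
    while node != 0:                 # walk up towards the artificial root
        if node == child:            # we reached the child again: a cycle
            return True
        node = parents.get(node, 0)  # unattached nodes count as reaching the root
    return False

# node 3 is already attached under node 2, so attaching 2 under 3 would be a cycle
parents = {3: 2}
print(would_create_cycle(parents, child=2, proposed_parent=3))  # True
print(would_create_cycle(parents, child=2, proposed_parent=1))  # False
```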
Results
Baseline (ec): 85.0 %
Four parsers (ec+mc+zz+dz), cycles allowed: 87.0 % (91.6 % of the structures are trees)
Four parsers (ec+mc+zz+dz), cycles banned: 86.9 %
(sorry for the typos in the paper, sec. 5.4)
Unbalanced Combination (Brill & Hladká in Hajič et al., 1998)
Is precision more important to us than recall? Better to say nothing than to make a mistake.
That may be our priority when:
preprocessing text for annotators
extracting various phenomena from a corpus (if a sentence has no parse, never mind, we simply will not extract anything from it)
Unbalanced Combination
Include only dependencies proposed by at least half of the parsers. Some nodes won’t get a parent.
Results for 7 parsers: precision 90.7 %, recall 78.6 %, f-measure 84.2 %
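A sketch of one reading of this rule (our own code): every parent supported by at least half of the parsers is kept, so a node may end up with no parent at all, or, with an even pool, with two candidate parents, which is why the next slide sees recall rise above precision.

```python
from collections import Counter

def unbalanced_combine(node_proposals, n_parsers):
    """Return every parent proposed by at least half of the parsers for one node.

    With an odd pool at most one parent can qualify; with an even pool two
    candidates can tie at exactly half, so a node may receive two parents.
    `node_proposals` maps parser name -> proposed parent index.
    """
    counts = Counter(node_proposals.values())
    return [parent for parent, support in counts.items()
            if support >= n_parsers / 2]

print(unbalanced_combine({"ec": 7, "mc": 7, "zz": 5, "dz": 5}, 4))  # [7, 5]
print(unbalanced_combine({"ec": 7, "mc": 5, "zz": 3, "dz": 2}, 4))  # []
```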
Unbalanced Combination
Interesting: unbalanced voting with an even number of parsers prefers recall over precision!
Sometimes one half of the parsers proposes one parent while the other half agrees on another candidate.
Results for the 4 best parsers: precision 85.4 %, recall 87.7 %, f-measure 86.5 %
Related Work
Brill and Hladká combined several “parsers”: in fact one parser, trained on different bags of the training data. 6 % error reduction, cf. our 13 %.
Henderson and Brill combined three constituency-based parsers. They did not find context helpful either. Lemma: majority voting over constituents introduces no crossing brackets.
Summary
Combination techniques successfully applied to dependency parsing.
Keeping treeness is not too expensive (in terms of accuracy).
Future Work
We are preparing voting rights for the new parsers (Nivre/Jenssen, Ribarov/McDonald, Charniak/Hall/Novák).
As these parsers are better than most of our current parser pool, we expect the results to improve, provided the new parsers are able to contribute new ideas.
Thank you.