Approximating a Deep-Syntactic Metric for MT Evaluation and Tuning Matouš Macháček, Ondřej Bojar; {machacek, Charles University.

Slides:



Advertisements
Similar presentations
Dependency tree projection across parallel texts David Mareček Charles University in Prague Institute of Formal and Applied Linguistics.
Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Baselines for Recognizing Textual Entailment Ling 541 Final Project Terrence Szymanski.
MEANT: semi-automatic metric for evaluating for MT evaluation via semantic frames an asembling of ACL11,IJCAI11,SSST11 Chi-kiu Lo & Dekai Wu Presented.
MT Evaluation: Human Measures and Assessment Methods : Machine Translation Alon Lavie February 23, 2011.
Universität des Saarlandes Seminar: Recent Advances in Parsing Technology Winter Semester Jesús Calvillo.
Tools for Data Partnering: How the MAP test puts students and parents in the driver’s seat with an NRT. Ann Thober, MEd– Reading Coach Torri Ortiz Lienemann,
June 2004 D ARPA TIDES MT Workshop Measuring Confidence Intervals for MT Evaluation Metrics Ying Zhang Stephan Vogel Language Technologies Institute Carnegie.
Methodologies for Evaluating Dialog Structure Annotation Ananlada Chotimongkol Presented at Dialogs on Dialogs Reading Group 27 January 2006.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation.
Minimum Error Rate Training in Statistical Machine Translation By: Franz Och, 2003 Presented By: Anna Tinnemore, 2006.
1 Statistical NLP: Lecture 13 Statistical Alignment and Machine Translation.
TectoMT two goals of TectoMT –to allow experimenting with MT based on deep- syntactic (tectogrammatical) transfer –to create a software framework into.
1/36 TectoMT Zdeněk Žabokrtský Institute of Formal and Applied Linguistics MFF UK Software framework for developing MT systems (and other NLP applications)
Czech-to-English Translation: MT Marathon 2009 Session Preview Jonathan Clark Greg Hanneman Language Technologies Institute Carnegie Mellon University.
Albert Gatt Corpora and Statistical Methods Lecture 9.
Today Evaluation Measures Accuracy Significance Testing
1/21 Introduction to TectoMT Zdeněk Žabokrtský, Martin Popel Institute of Formal and Applied Linguistics Charles University in Prague CLARA Course on Treebank.
Metadata generation and glossary creation in eLearning Lothar Lemnitzer Review meeting, Zürich, 25 January 2008.
1 Statistical NLP: Lecture 10 Lexical Acquisition.
Profile The METIS Approach Future Work Evaluation METIS II Architecture METIS II, the continuation of the successful assessment project METIS I, is an.
METEOR-Ranking & M-BLEU: Flexible Matching & Parameter Tuning for MT Evaluation Alon Lavie and Abhaya Agarwal Language Technologies Institute Carnegie.
Kyoshiro SUGIYAMA, AHC-Lab., NAIST An Investigation of Machine Translation Evaluation Metrics in Cross-lingual Question Answering Kyoshiro Sugiyama, Masahiro.
1 TURKOISE: a Mechanical Turk-based Tailor-made Metric for Spoken Language Translation Systems in the Medical Domain Workshop on Automatic and Manual Metrics.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
Morpho Challenge competition Evaluations and results Authors Mikko Kurimo Sami Virpioja Ville Turunen Krista Lagus.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
Czech-English Word Alignment Ondřej Bojar Magdalena Prokopová
Part-Of-Speech Tagging using Neural Networks Ankur Parikh LTRC IIIT Hyderabad
Training dependency parsers by jointly optimizing multiple objectives Keith HallRyan McDonaldJason Katz- BrownMichael Ringgaard.
Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and.
Cs target cs target en source Subject-PastParticiple agreement Czech subject and past participle must agree in number and gender. Two-step translation.
CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.
JHU WORKSHOP July 30th, 2003 Semantic Annotation – Week 3 Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Boncheva,
A Cascaded Finite-State Parser for German Michael Schiehlen Institut für Maschinelle Sprachverarbeitung Universität Stuttgart
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
Tokenization & POS-Tagging
Chinese Word Segmentation Adaptation for Statistical Machine Translation Hailong Cao, Masao Utiyama and Eiichiro Sumita Language Translation Group NICT&ATR.
Basic Implementation and Evaluations Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
2/5/01 Morphology technology Different applications -- different needs –stemmers collapse all forms of a word by pairing with “stem” –for (CL)IR –for (aspects.
Shallow Parsing for South Asian Languages -Himanshu Agrawal.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
Building Sub-Corpora Suitable for Extraction of Lexico-Syntactic Information Ondřej Bojar, Institute of Formal and Applied Linguistics, ÚFAL.
Chunk Parsing II Chunking as Tagging. Chunk Parsing “Shallow parsing has become an interesting alternative to full parsing. The main goal of a shallow.
Exploiting Named Entity Taggers in a Second Language Thamar Solorio Computer Science Department National Institute of Astrophysics, Optics and Electronics.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Approaching a New Language in Machine Translation Anna Sågvall Hein, Per Weijnitz.
Pastra and Saggion, EACL 2003 Colouring Summaries BLEU Katerina Pastra and Horacio Saggion Department of Computer Science, Natural Language Processing.
CSA2050: Introduction to Computational Linguistics Part of Speech (POS) Tagging II Transformation Based Tagging Brill (1995)
NSF PARTNERSHIP FOR RESEARCH AND EDUCATION : M EANING R EPRESENTATION FOR S TATISTICAL L ANGUAGE P ROCESSING 1 TectoMT TectoMT = highly modular software.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
1/16 TectoMT Zdeněk Žabokrtský ÚFAL MFF UK Software framework for developing MT systems (and other NLP applications)
Language Identification and Part-of-Speech Tagging
Fabien Cromieres Chenhui Chu Toshiaki Nakazawa Sadao Kurohashi
METEOR: Metric for Evaluation of Translation with Explicit Ordering An Improved Automatic Metric for MT Evaluation Alon Lavie Joint work with: Satanjeev.
Linguistic Graph Similarity for News Sentence Searching
Web News Sentence Searching Using Linguistic Graph Similarity
Neural Machine Translation by Jointly Learning to Align and Translate
Statistical NLP: Lecture 13
Statistical NLP: Lecture 9
Sadov M. A. , NRU HSE, Moscow, Russia Kutuzov A. B
Statistical vs. Neural Machine Translation: a Comparison of MTH and DeepL at Swiss Post’s Language service Lise Volkart – Pierrette Bouillon – Sabrina.
Johns Hopkins 2003 Summer Workshop on Syntax and Statistical Machine Translation Chapters 5-8 Ethan Phelps-Goodman.
Statistical NLP : Lecture 9 Word Sense Disambiguation
Neural Machine Translation by Jointly Learning to Align and Translate
Presentation transcript:

Approximating a Deep-Syntactic Metric for MT Evaluation and Tuning Matouš Macháček, Ondřej Bojar; {machacek, Charles University in Prague, MFF, ÚFAL Overview Introduced by Kos and Bojar (2009), inspired by Giménez and Márquez (2007). Counts overlapping of deep-syntactic lemmas (t-lemmas) of content words. Lemmas are matched only if semantic parts-of-speech (Sgall et al. 1986) agree. Does not consider word order and auxiliary words. SemPOS metric SemPOS requires full parsing up to the deep syntactic layer. => SemPOS is computational costly. There are tools assigning t-lemmas and semposes only for Czech and English. => SemPOS is difficult to adapt for other languages. Issues Approximate t-lemmas and semposes using only tagger output. => Faster and more adaptable for other languages. => More suitable for MERT tuning. Proposed Solution: Approximations Morphological tag determines sempos. CzEng corpus (Bojar and Žabokrtský, 2009) used to create dictinoray which maps morphological tag to most frequent sempos. Surface lemmas are used instead of t- lemmas. Accuracy on CzEng e-test: 93.6 % for English 88.4 % for Czech Sempos from Tag The number of most frequent words which are thrown away in English Restricting the Set of Semposes Deep syntactic layer does not contain auxiliary words. Assumed that auxiliary words are the most frequent words, we exclude a certain number of most frequent words. Stopwords lists were obtained from CzEng corpus. Exact cut-offs: 100 words for English 220 words for Czech Contribution of each sempos type to the overall performance can differ a lot. We assume that some sempos types raise the correlation and some lower it. We restrict the set of considered semposes to the better ones. Correlation with Human Judgments Excluding Stop-Words Custom Tagger Full text, acknowledgement and the list of references in the proceedings of EMNLP 2011 Workshop on Statistical Machine Translation (WMT11). Results English as a target language ApproximationOverlappingMinMaxAvg approx-restrcap-macro taggercap-macro origcap-macro approx-restrcap-micro taggercap-micro origcap-micro approxcap-micro approx+cap-micro and BLEU approxcap-macro BLEU Czech as a target language Overlapping boost-micro cap-micro We use sequence labeling algorithm to choose the t-lemma and sempos tag. The CzEng corpus served to train two taggers (for English and Czech). At each token, the tagger use word form, surface lemma and morphological tag of the current and previous two tokens. Tagger chooses sempos from all sempos tags which were seen in corpus with the given morphological tags. The t-lemma is often the same as the surface lemma, but it could also be surface lemma with an auxiliary word (kick off, smát se). The tagger can also choose such t-lemma if the auxiliary word is present in the sentence. The overall accuracy on CzEng e-test: 97.9 % for English 94.9 % for Czech Src:Polovina míst v naší nabídce zůstává volná. Ref: Half of our capacity remains available. Hyp:Half of the seats in our offer remains free. Hypothesis half n.denot seat n.denot #perspron n.pron.def.pers offer n.denot remain v free adj.denot half n.denot available adj.denot #perspron n.pron.def.pers capacity n.denot Reference remain v Hypothesis half n.denot seat n.denot #perspron n.pron.def.pers offer n.denot remain v free adj.denot half n.denot available adj.denot #perspron n.pron.def.pers capacity n.denot Reference remain v Hypothesis half n.denot seat n.denot #perspron n.pron.def.pers offer n.denot remain v free adj.denot half n.denot available adj.denot #perspron n.pron.def.pers capacity n.denot Reference remain v cap-macro ApproximationOverlappingMinMaxAvg approxcap-micro origcap-macro approxcap-macro taggercap-micro origcap-micro approx+cap-micro and BLEU taggercap-macro BLEU Overlapping performance Average rank in our experiments Overlappingin Englishin Czech boost-micro1213 cap-macro6.65 cap-micro5.46 Boost-micro is not suitable for sempos- based metrics. Tunable Metric Task We optimized towards linear combination (equal weights) of BLEU and Approx + Cap-micro. BLEU chooses sentences with correct morfology and word order, while SemPOS prefers sentences with correctly translated content words. ≥ others> others the fifththe first Our final result heavily depends on the interpretation of human rankings. Out of 8, we are: TagMinMaxAvg v n.denot adj.denot n.pron.indef Semposes with the highest correlation in English. This is also the restricted set of semposes used in English. Tested on newstest2008, test2008, newstest2009, newsyscombtest2010. Morph. Tag SemposRel. Freq. NNn.denot0.989 VBZv0.766 VBNv0.953 JJadj.denot0.975 NNPn.denot0.999 PRPn.pron.def.pers0.999 Example of used dictionary in English approx approx-stopwords approx-restr tagger (Giménez, Márquez, 2007) (Bojar, Kos, 2007) (our)