Cs target cs target en source Subject-PastParticiple agreement Czech subject and past participle must agree in number and gender. Two-step translation.

Slides:



Advertisements
Similar presentations
Dependency tree projection across parallel texts David Mareček Charles University in Prague Institute of Formal and Applied Linguistics.
Advertisements

Patent documentation - comparison of two MT strategies Lene Offersgaard, Claus Povlsen Center for Sprogteknologi, University of Copenhagen
Development of a German- English Translator Felix Zhang.
Unsupervised Dependency Parsing David Mareček Institute of Formal and Applied Linguistics Charles University in Prague Doctoral thesis defense September.
Grammar Unit 1: Lesson 4.2 Making Collective Nouns Agree with Verbs.
A Syntactic Translation Memory Vincent Vandeghinste Centre for Computational Linguistics K.U.Leuven
Chapter 4 Basics of English Grammar
English-Hindi Translation in 21 Days Ondřej Bojar, Pavel Straňák, Daniel Zeman ÚFAL MFF, Univerzita Karlova, Praha.
POS based on Jurafsky and Martin Ch. 8 Miriam Butt October 2003.
Machine Translation via Dependency Transfer Philip Resnik University of Maryland DoD MURI award in collaboration with JHU: Bootstrapping Out of the Multilingual.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
TIDES MT Workshop Review. Using Syntax?  ISI-small: –Cross-lingual parsing/decoding Input: Chinese sentence + English lattice built with all possible.
April 26, 2007Workshop on Treebanking, NAACL-HTL 2007 Rochester1 Treebanks and Parsing Jan Hajič Institute of Formal and Applied Linguistics School of.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
BIOI 7791 Projects in bioinformatics Spring 2005 March 22 © Kevin B. Cohen.
Formal Commands in Spanish Mandatos formales en español.
Grammar Level 2: The Parts of the Sentence The study of the sentence is the study of thought itself. In order to express a thought, we must do two things:
Czech-to-English Translation: MT Marathon 2009 Session Preview Jonathan Clark Greg Hanneman Language Technologies Institute Carnegie Mellon University.
Latin Grammar: Singular and Plural Magister Henderson Latin I.
Verb Patterns and the Be Patterns Ed McCorduck English 402--Grammar SUNY Cortland
Chapter 4 Basics of English Grammar Business Communication Copyright 2010 South-Western Cengage Learning.
LING 388: Language and Computers Sandiway Fong Lecture 17.
March 5, 2008Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax 1 The PDT Morphology and Surface Syntax.
Phonemes A phoneme is the smallest phonetic unit in a language that is capable of conveying a distinction in meaning. These units are identified within.
Tree-based Machine Translation using syntax and semantics
April 17, 2007MT Marathon: Tree-based Translation1 Tree-based Translation with Tectogrammatical Representation Jan Hajič Institute of Formal and Applied.
The Prague (Czech-)English Dependency Treebank Jan Hajič Charles University in Prague Computer Science School Institute of Formal and Applied Linguistics.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
Czech-English Word Alignment Ondřej Bojar Magdalena Prokopová
Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop Nizar Habash and Owen Rambow Center for Computational Learning.
NUDT Machine Translation System for IWSLT2007 Presenter: Boxing Chen Authors: Wen-Han Chao & Zhou-Jun Li National University of Defense Technology, China.
Approximating a Deep-Syntactic Metric for MT Evaluation and Tuning Matouš Macháček, Ondřej Bojar; {machacek, Charles University.
Ibrahim Badr, Rabih Zbib, James Glass. Introduction Experiment on English-to-Arabic SMT. Two domains: text news,spoken travel conv. Explore the effect.
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
Computational linguistics A brief overview. Computational Linguistics might be considered as a synonym of automatic processing of natural language, since.
NRC Report Conclusion Tu Zhaopeng NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.
Rules for the correct pronunciation of the –s ending (1) The sounds /s/ /z/ or / ɪ z/ (plural nouns and third person singular -s) If a word ends with the.
Auckland 2012Kilgarriff: NLP and Corpus Processing1 The contribution of NLP: corpus processing.
Parsing and Translating
Annotation Procedure in Building the Prague Czech-English Dependency Treebank Marie Mikulová and Jan Štěpánek Institute of Formal and Applied Linguistics.
Exploiting Reducibility in Unsupervised Dependency Parsing David Mareček and Zdeněk Žabokrtský Institute of Formal and Applied Linguistics Charles University.
Building Sub-Corpora Suitable for Extraction of Lexico-Syntactic Information Ondřej Bojar, Institute of Formal and Applied Linguistics, ÚFAL.
Subject-Verb Agreement & Parallel Structure
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Error Analysis of Two Types of Grammar for the purpose of Automatic Rule Refinement Ariadna Font Llitjós, Katharina Probst, Jaime Carbonell Language Technologies.
Word classes and part of speech tagging. Slide 1 Outline Why part of speech tagging? Word classes Tag sets and problem definition Automatic approaches.
October 10, 2003BLTS Kickoff Meeting1 Transfer with Strong Decoding Learning Module Transfer Rules {PP,4894} ;;Score: PP::PP [NP POSTP] -> [PREP.
Word classes and part of speech tagging Chapter 5.
NOUNS CHAPTER 2. WHAT ARE THEY? Nouns name a person, place, thing, or idea. Nouns can be singular or plural. Nouns can be possessive. Nouns can be common.
BAMAE: Buckwalter Arabic Morphological Analyzer Enhancer Sameh Alansary Alexandria University Bibliotheca Alexandrina 4th International.
Subject/Predicate Bell Ringer…
Inverted and Interrupted Order
Learning to Generate Complex Morphology for Machine Translation Einat Minkov †, Kristina Toutanova* and Hisami Suzuki* *Microsoft Research † Carnegie Mellon.
English Grammar Lecture 6: Verb Patterns and the "Be" Patterns
Lecture 9: Part of Speech
Yr 9 grammar review Using venir de + infinitive
Syntax and parsing Introduction to Computational Linguistics – 28 March 2017.
Lecture – VIII Monojit Choudhury RS, CSE, IIT Kharagpur
Grammar Unit 1: Lesson 4.2 Making Collective Nouns Agree with Verbs.
LING/C SC/PSYC 438/538 Lecture 21 Sandiway Fong.
VERBS.
Chapter 4 Basics of English Grammar
Los sustantivos (Nouns)
Agreement Notes: Indefinite pronouns ending in one, thing, or body are singular Both, few, many, and several are plural Subjects joined by and are usually.
The CoNLL-2014 Shared Task on Grammatical Error Correction
Chapter 4 Basics of English Grammar
Natural Language Processing
15 minutes of Independent Reading
Conventions of Standard English Anchor 1
ALI139 – Arabic Grammar I Week 2.
Presentation transcript:

cs target cs target en source Subject-PastParticiple agreement Czech subject and past participle must agree in number and gender. Two-step translation with grammatical post-processing David Mareček, Rudolf Rosa, Petra Galuščáková, Ondřej Bojar; {marecek, rosa, galuscakova, Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (ÚFAL) Full text, acknowledgement and the list of references in the proceedings of WMT SystemBeforeAfterImprovement cmu-heaf cu-bojar cu-zeman dcu dcu-combo eurotrans koc koc-combo onlineA onlineB potsdam rwth-combo sfu uedin upv-combo Rule Fired Improved Worsened % Improved SubjCase SubjPP NounAdj NounNum PrepNoun SubjPred ReflTant PrepNoCh Depfix improvements in BLEU score SystemBeforeAfterImprovement cu-twostep cmu-heaf commerc cu-bojar cu-popel cu-tamch cu-zeman jhu online-B udein upv-prhlt System Annotator Changed Improved Worsened Indefinite cu-bojar-twostepA %3914.5% % cu-bojar-twostepB %5018.6% % online-BA % %5221.1% online-BB % %187.3% Test Set Changed Improved Worsened Indefinite BLEU: before after diff newssyscombtest newssyscombtest A/BImpr.Wors.Indef. Impr Wors Indef Noun number In Czech, plural has sometimes the same form as singular. Correct nouns tagged as singular to plural if the English counterpart is in plural. Subject case Czech subject must be in nominative case. Subject-Predicate agreement Subject and predicate must agree in morphological number. Preposition-Noun agreement Nouns must agree with prepositions in morphological case. Noun-Adjective agreement Adjectives must agree with nouns in number, gender, and case. Reflexive particle deletion Czech reflexive verbs are accompanied by reflexive particles (‘se’, ‘si’). Particles not belonging to any verb are deleted. Prepositions without children Prepositions cannot be leaves. Nouns are attached to them according to English source. Czech translated: Řada lidí z Česká republika nedorazilo. (wrong) English source: A number of people from Czech Republic did not arrive. A Det DT number Subj NN from AuxP IN of AuxP IN people Atr NN not AuxV ?? did AuxV VB Czech Atr NN Republic Atr NN arrive. Pred VB Řada Subj N,fem,sg,nom lidí Atr N,pl,gen republika Adv N,fem,sg,nom z AuxP P,gen Česká Atr A,fem,sg,nom nedorazilo. Pred V,neut,sg,past the Det DT Manual and automatic evaluation across different datasets. Impact of rules Manual evaluation BLEU: → BLEU: → Goal: Improve Czech grammar Phrase-based MT often breaks morphological agreement. Some of the dependencies still recovered in parses of MT output. => We can check and fix the agreements. Depfix A rule-based grammar correction of MT to Czech. Applicable on top of any MT system. cu-bojar Simple non-factored Moses setup. Truecasing based on lemmatizer output. Additional parallel data include Official Journal of EU. Large monolingual data: Includes Czech National Corpus, our web collection... Two LMs (5-gr and 6-gr) weighted by MERT. Each LM already interpolated from domain-specific sources. cu-marecek MT in three steps: 1)Moses: English -> Lemmatized Czech 2)Moses: Lemmatized Czech -> Czech 3)Depfix: Rule-based grammar correction. Only 1-best output passed between the steps. The 2nd Moses trained on a large monolingual corpus. Lemmatized Czech includes morphological features over in English. Our WMT11 submissions WMT10 WMT11 Rules nedorazila. Pred V,fem,sg,past republiky Adv N,fem,sg,gen České Atr A,fem,sg,gen Czech after depfix: Řada lidí z České republiky nedorazila. (correct) NounAdj PrepNoun SubjPP Interannotator agreement improved indefinite worsened en source