Generation-Heavy Hybrid Machine Translation
Nizar Habash, Postdoctoral Researcher, Center for Computational Learning Systems, Columbia University
Columbia University NLP Colloquium, October 28, 2004

The Intuition Generation-Heavy Machine Translation: Español, عربي → English with a dictionary and gisting

Introduction Research Contributions A general, reusable and extensible Machine Translation (MT) model that transcends the need for large amounts of deep symmetric knowledge Development of reusable large-scale resources for English A large-scale Spanish-English MT system, Matador, which is more robust across genres and produces more grammatical output than simple statistical or symbolic techniques

Roadmap  Introduction  Generation-Heavy Machine Translation  Evaluation  Conclusion  Future Work

Introduction MT Pyramid: Source word → Source syntax → Source meaning (Analysis); Target meaning → Target syntax → Target word (Generation). Interlingua at the apex, Transfer across the middle levels, Gisting at the word level.

Introduction MT Pyramid: resources needed at each level. Word level: dictionaries / parallel corpora; syntax level: transfer lexicons; meaning level: interlingual lexicons.

Introduction MT Pyramid: Gisting operates at the word level; Transfer at the syntax level (source word, source syntax, source meaning; target meaning, target syntax, target word).

Introduction Why gisting is not enough –SP: Sobre la base de dichas experiencias se estableció en 1988 una metodología. –GIST: Envelope her basis out speak experiences them settle at 1988 one methodology. –EN: On the basis of these experiences, a methodology was arrived at in 1988.

Introduction Translation Divergences 35% of sentences in TREC El Norte Corpus (Dorr et al. 2002) Divergence Types –Categorial (X tener hambre → X be hungry) –Conflational (X dar puñaladas a Z → X stab Z) –Structural (X entrar en Y → X enter Y) –Head Swapping (X cruzar Y nadando → X swim across Y) –Thematic (X gustar a Y → Y like X)

Roadmap  Introduction  Generation-Heavy Machine Translation  Evaluation  Conclusion  Future Work

Generation-Heavy Hybrid Machine Translation Problem: asymmetric resources –High-quality, broad-coverage semantic resources for the target language –Low-quality resources for the source language –Low-quality (many-to-many) translation lexicon Thesis: we can approximate interlingual MT without the use of symmetric interlingual resources

Relevant Background Work –Hybrid Natural Language Generation: Constrained Overgeneration + Statistical Ranking: Nitrogen (Langkilde and Knight 1998), Halogen (Langkilde 2002), FERGUS (Rambow and Bangalore 2000) –Lexical Conceptual Structure (LCS) based MT (Jackendoff 1983; Dorr 1993)

LCS-based MT Example (Dorr, 1993)

Generation-Heavy Hybrid Machine Translation Pipeline: Analysis → Translation → Generation (Theta Linking → Expansion → Assignment → Pruning → Linearization → Ranking → …)

Matador Spanish-English GHMT: Spanish Analysis → Translation → English Generation (Theta Linking → Expansion → Assignment → Pruning → Linearization → Ranking). The generation component is EXERGE: Expansive Rich Generation for English.

GHMT Analysis Source language syntactic dependency Example: Yo le di puñaladas a Juan. Features of representation –Approximation of predicate-argument structure –Long-distance dependencies Dependency tree: dar (:subj Yo, :obj puñalada, :mod a Juan)
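To make the representation concrete, here is a minimal sketch in Python of the dependency analysis for the example sentence; the nested-dict encoding is illustrative, not Matador's actual data structure:

```python
# Each node is a dict with a lemma, a part of speech, and (relation, child)
# pairs; this encodes dar(:subj Yo, :obj puñalada, :mod a(Juan)).
tree = {
    "lemma": "dar", "pos": "V", "deps": [
        (":subj", {"lemma": "yo", "pos": "PRON", "deps": []}),
        (":obj",  {"lemma": "puñalada", "pos": "N", "deps": []}),
        (":mod",  {"lemma": "a", "pos": "P", "deps": [
            (":obj", {"lemma": "Juan", "pos": "N", "deps": []})]}),
    ],
}
```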

GHMT Translation Lexical transfer but NO structural change Translation Lexicon (tener V) → ((have V) (own V) (possess V) (be V)) (deber V) → ((owe V) (should AUX) (must AUX)) (soler V) → ((tend V) (usually AV)) Applied to the example tree: dar (:subj Yo, :obj puñalada, :mod a Juan) → {ADMINISTER, CONFER, DELIVER, EXTEND, GIVE, GRANT, HAND, LAND, RENDER} (:subj {I, MY, MINE}, :obj {STAB, KNIFE_WOUND}, :mod {AT, BY, INTO, THROUGH, TO} JOHN)
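A minimal sketch of this step under the same nested-dict encoding: every node's lemma is replaced by its full candidate set from the many-to-many lexicon, while the tree shape is left untouched. The lexicon contents come from the slide; the function and field names are hypothetical:

```python
# Many-to-many translation lexicon; ambiguity is kept, not resolved.
TRANSLATION_LEXICON = {
    ("dar", "V"):      ["administer", "confer", "deliver", "extend", "give",
                        "grant", "hand", "land", "render"],
    ("yo", "PRON"):    ["I", "my", "mine"],
    ("puñalada", "N"): ["stab", "knife_wound"],
    ("a", "P"):        ["at", "by", "into", "through", "to"],
    ("Juan", "N"):     ["John"],
}

def transfer(node):
    """Attach the full target candidate list to every node; the dependency
    structure itself is carried over unchanged (no structural transfer)."""
    node["candidates"] = TRANSLATION_LEXICON.get(
        (node["lemma"], node["pos"]), [node["lemma"]])
    for _, child in node["deps"]:
        transfer(child)

tree = {"lemma": "dar", "pos": "V", "deps": [
    (":subj", {"lemma": "yo", "pos": "PRON", "deps": []}),
    (":obj",  {"lemma": "puñalada", "pos": "N", "deps": []}),
    (":mod",  {"lemma": "a", "pos": "P", "deps": [
        (":obj", {"lemma": "Juan", "pos": "N", "deps": []})]})]}
transfer(tree)   # the tree now carries a word lattice at every node
```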

GHMT Thematic Linking Syntactic Dependency → Thematic Dependency Before (syntactic): {ADMINISTER, CONFER, DELIVER, EXTEND, GIVE, GRANT, HAND, LAND, RENDER} (:subj {I, MY, MINE}, :obj {STAB, KNIFE_WOUND}, :mod {AT, BY, INTO, THROUGH, TO} JOHN) After (thematic): {EXTEND, GIVE, GRANT, RENDER} (Agent {I, MY, MINE}, Theme {STAB, KNIFE_WOUND}, Goal JOHN)

GHMT Thematic Linking Resources Word Class Lexicon :NUMBER "V.13.1.a.ii" :NAME "Give - No Exchange" :POS V :THETA_ROLES (((ag obl) (th obl) (goal obl to)) ((ag obl) (goal obl) (th obl))) :LCS_PRIMS (cause go) :WORDS (feed give pass pay peddle refund render repay serve) Syntactic-Thematic Linking Map (:subj → ag instr th exp loc src goal perc mod-poss poss) (:obj2 → goal src th perc ben) (across → goal loc) (in → loc mod-poss perc goal poss prop) (to → prop goal ben info th exp perc pred loc time)
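A sketch of how such a linking map could be consulted, with hypothetical encodings and function names: a dependent in a given syntactic relation may fill a theta slot only if the map licenses that pairing.

```python
# Slices of the syntactic-thematic linking map from the slide.
LINKING_MAP = {
    ":subj": {"ag", "instr", "th", "exp", "loc", "src", "goal",
              "perc", "mod-poss", "poss"},
    ":obj2": {"goal", "src", "th", "perc", "ben"},
    "to":    {"prop", "goal", "ben", "info", "th", "exp", "perc",
              "pred", "loc", "time"},
}

def can_fill(relation, role):
    """A dependent in syntactic `relation` may fill theta role `role`
    only if the linking map licenses the pairing."""
    return role in LINKING_MAP.get(relation, set())

# Linking the grid ((ag obl) (th obl) (goal obl to)) of GIVE V.13.1.a.ii:
# subject -> agent, second object -> theme, "to"-phrase -> goal.
proposed = [(":subj", "ag"), (":obj2", "th"), ("to", "goal")]
print(all(can_fill(rel, role) for rel, role in proposed))   # -> True
```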

GHMT Thematic Linking Syntactic Dependency → Thematic Dependency Theta grids consulted for the candidate verbs (applied to the candidate tree above): ((ADMINISTER V.13.2 ((AG OBL) (TH OBL) (GOAL OPT TO))) (CONFER V.37.6.b ((EXP OBL))) (DELIVER V.11.1 ((AG OBL) (GOAL OBL) (TH OBL) (SRC OPT FROM))) (EXTEND V.47.1 ((TH OBL) (MOD-LOC OPT. T))) (EXTEND V.13.3 ((AG OBL) (TH OBL) (GOAL OPT TO))) (EXTEND V.13.3 ((AG OBL) (GOAL OBL) (TH OBL))) (EXTEND V.13.2 ((AG OBL) (TH OBL) (GOAL OPT TO))) (GIVE V.13.1.a.ii ((AG OBL) (TH OBL) (GOAL OBL TO))) (GIVE V.13.1.a.ii ((AG OBL) (GOAL OBL) (TH OBL))) (GRANT V.29.5.e ((AG OBL) (INFO OBL THAT))) (GRANT V.29.5.d ((AG OBL) (TH OBL) (PROP OBL TO))) (GRANT V.13.3 ((AG OBL) (TH OBL) (GOAL OPT TO))) (GRANT V.13.3 ((AG OBL) (GOAL OBL) (TH OBL))) (HAND V.11.1 ((AG OBL) (TH OBL) (GOAL OPT TO) (SRC OPT FROM))) (HAND V.11.1 ((AG OBL) (GOAL OBL) (TH OBL) (SRC OPT FROM))) (LAND V.9.10 ((AG OBL) (TH OBL))) (RENDER V.13.1.a.ii ((AG OBL) (TH OBL) (GOAL OBL TO))) (RENDER V.13.1.a.ii ((AG OBL) (GOAL OBL) (TH OBL))) (RENDER V.10.6.a ((AG OBL) (TH OBL) (MOD-POSS OPT OF))) (RENDER V.10.6.a.LOCATIVE ((AG OPT) (SRC OBL) (TH OPT OF))))

Interlingua Approximation through Expansion Operations –Relation Variation: enter (:subj John, :obj room) ↔ enter (:subj John, in room) –Relation Conflation / Inflation: enter (:subj John, :obj room) ↔ go (:subj John, in room) –Categorial Variation: development N ↔ develop V –Node Conflation / Inflation: butter V ↔ put V + butter N

Interlingua Approximation 2nd Degree Expansion cross (:subj John, :obj river, :mod swimming) –Relation Inflation→ go (:subj John, across river, :mod swimming) –Node Conflation→ swim (:subj John, across river)

GHMT Structural Expansion Conflation Example GIVE V (Agent I, Theme STAB N, Goal JOHN) → STAB V (Agent I, Goal JOHN)

GHMT Structural Expansion Conflation and Inflation Structural Expansion Resources –Word Class Lexicon :NUMBER "V.42.2" :NAME "Poison Verbs" :POS V :THETA_ROLES (((ag obl) (goal obl))) :LCS_PRIMS (cause go) :WORDS (crucify electrocute garrotte hang knife poison shoot smother stab strangle) –Categorial Variation Database (Habash and Dorr 2003) (:V (hunger) :N (hunger hungriness) :AJ (hungry)) (:V (validate) :N (validation validity) :AJ (valid)) (:V (cross) :N (crossing cross) :P (across)) (:V (stab) :N (stab))
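A sketch of the conflation check these two resources support, with hypothetical names: a light verb whose class carries the LCS primitives (cause go), combined with a theme noun that has a verbal categorial variant in a matching class, collapses into a single verb node.

```python
# Slice of the categorial variation database: stab N <-> stab V.
CATVAR = {("stab", "N"): [("stab", "V")], ("cross", "N"): [("cross", "V")]}

# Verbs of class V.42.2 ("Poison Verbs"), LCS primitives (cause go).
POISON_VERBS = {"crucify", "electrocute", "garrotte", "hang", "knife",
                "poison", "shoot", "smother", "stab", "strangle"}
# Light verbs whose class also carries (cause go), e.g. GIVE (V.13.1.a.ii).
CAUSE_GO_VERBS = {"give", "render", "extend", "grant"}

def conflate(light_verb, theme_noun):
    """give(ag, th=stab_N, goal) -> stab_V(ag, goal): succeeds when the theme
    noun has a verbal categorial variant in a class sharing the light verb's
    LCS primitives."""
    if light_verb not in CAUSE_GO_VERBS:
        return None
    for lemma, pos in CATVAR.get((theme_noun, "N"), []):
        if pos == "V" and lemma in POISON_VERBS:
            return lemma
    return None

print(conflate("give", "stab"))   # -> "stab": "give a stab" ~ "stab"
```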

GHMT Structural Expansion Conflation Example Step 1: the categorial variation database maps the theme STAB N to STAB V. GIVE V (Agent I, Theme STAB N, Goal JOHN) + STAB V

GHMT Structural Expansion Conflation Example Step 2: STAB V's theta grid (Agent *, Goal *) and LCS primitives [CAUSE GO] are matched against GIVE V (Agent I, Theme STAB N, Goal JOHN)

GHMT Structural Expansion Conflation Example, Goal STAB V I JOHN Agent Goal GIVE V I STAB N JOHN Theme Agent

GHMT Syntactic Assignment Thematic → Syntactic Mapping From STAB V (Agent I, Goal JOHN) and GIVE V (Agent I, Theme {STAB N, KNIFE_WOUND N}, Goal JOHN): –STAB V (Subject {I, MY, …}, Object JOHN) –GIVE V (Subject {I, MY, …}, Object {STAB N, KNIFE_WOUND N}, IObject JOHN) –GIVE V (Subject {I, MY, …}, Object {STAB N, KNIFE_WOUND N}, Mod {TO, AT, …} JOHN)
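A sketch of this mapping step, with a hypothetical role-to-position table and function name: each theta role in a verb's grid is realized as a syntactic position, or as a prepositional modifier when the grid names a preposition.

```python
# Hypothetical default mapping from theta roles to syntactic positions.
THETA_TO_SYNTAX = {"ag": "Subject", "th": "Object", "goal": "IObject"}

def assign(grid, fillers):
    """Map each (role, obligatoriness, preposition) entry of a theta grid
    to a syntactic position, e.g. give(ag=I, th=stab, goal=John)."""
    out = []
    for role, _obl, prep in grid:
        position = f"Mod {prep}" if prep else THETA_TO_SYNTAX[role]
        out.append((position, fillers[role]))
    return out

# The grid ((ag obl) (th obl) (goal obl to)) of GIVE V.13.1.a.ii:
grid = [("ag", "obl", None), ("th", "obl", None), ("goal", "obl", "to")]
print(assign(grid, {"ag": "I", "th": "stab", "goal": "John"}))
# -> [('Subject', 'I'), ('Object', 'stab'), ('Mod to', 'John')]
```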

GHMT Structural N-gram Pruning Statistical lexical selection prunes each candidate list: –STAB V (Subject {I, MY, …}, Object JOHN) → STAB V (Subject I, Object JOHN) –GIVE V (Subject {I, MY, …}, Object {STAB N, KNIFE_WOUND N}, IObject JOHN) → GIVE V (Subject I, Object STAB N, IObject JOHN) –GIVE V (Subject {I, MY, …}, Object {STAB N, KNIFE_WOUND N}, Mod {TO, AT, …} JOHN) → GIVE V (Subject I, Object STAB N, Mod TO JOHN)

GHMT Target Statistical Resources –Structural N-gram Model: long-distance, over lexemes, e.g. for "every cloud has a silver lining": have→cloud, have→lining, cloud→every, lining→a, lining→silver –Surface N-gram Model: local, over surface forms
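A sketch of what a structural bigram is, reusing the nested-dict tree encoding from above: head-dependent lexeme pairs collected from a dependency tree, regardless of how far apart the words are in the surface string.

```python
from collections import Counter

def structural_bigrams(node, counts):
    """Collect (head lexeme, dependent lexeme) pairs from a dependency tree;
    unlike surface bigrams, these pair words at any surface distance."""
    for _, child in node["deps"]:
        counts[(node["lemma"], child["lemma"])] += 1
        structural_bigrams(child, counts)

# "every cloud has a silver lining" as a dependency tree.
tree = {"lemma": "have", "deps": [
    (":subj", {"lemma": "cloud", "deps": [
        (":mod", {"lemma": "every", "deps": []})]}),
    (":obj",  {"lemma": "lining", "deps": [
        (":det", {"lemma": "a", "deps": []}),
        (":mod", {"lemma": "silver", "deps": []})]})]}

counts = Counter()
structural_bigrams(tree, counts)
# -> (have, cloud), (have, lining), (cloud, every), (lining, a), (lining, silver)
```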

GHMT Linearization &Ranking Oxygen Linearization (Habash 2000) Halogen Statistical Ranking (Langkilde 2002) I stabbed John. [ ] I gave a stab at John. [ ] I gave the stab at John. [ ] I gave an stab at John. [ ] I gave a stab by John. [ ] I gave a stab to John. [ ] I gave a stab into John. [ ] I gave a stab through John. [ ] I gave a knife wound by John. [ ]
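A minimal sketch of how surface n-gram ranking picks among such candidates; the log-probabilities below are invented for illustration (the actual model is estimated from 3M words of UN text):

```python
# Toy surface bigram log-probabilities (hypothetical values).
LOGP = {("i", "stabbed"): -2.0, ("stabbed", "john"): -3.0,
        ("i", "gave"): -2.5, ("gave", "a"): -1.5, ("a", "stab"): -6.0,
        ("stab", "at"): -7.0, ("at", "john"): -4.0}
UNSEEN = -12.0   # flat back-off penalty for unseen bigrams

def lm_score(sentence):
    """Sum of surface bigram log-probabilities; higher is better."""
    words = sentence.lower().rstrip(".").split()
    return sum(LOGP.get(pair, UNSEEN) for pair in zip(words, words[1:]))

candidates = ["I stabbed John.", "I gave a stab at John.",
              "I gave an stab at John.", "I gave a stab by John."]
print(max(candidates, key=lm_score))   # -> "I stabbed John."
```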

Roadmap  Introduction  Generation-Heavy Machine Translation  Evaluation  Overall Evaluation  Component Evaluation  Conclusion  Future Work

Overall Evaluation Systems
–Gisting (GIST): symbolic, word-based; translation model: 400K surface-lexeme pairs; language model: unigrams, Brown Corpus, 1M words; development time: 1 person-month
–Systran (SYST): symbolic, transfer-based; translation model: 120K lexeme-lexeme pairs and large transfer lexicon; development time: hundreds of person-years
–IBM Model 4 (IBM4): statistical, word-based; translation model: Model 4, GIZA-trained on 50K UN sentence pairs; language model: bigrams, 3M words (UN); development time: 1 person-month
–Matador (MTDR): hybrid, generation-heavy; translation model: 50K lexeme-lexeme pairs; language models: bigrams, 3M words (UN), and structural bigrams, 1.5M words (UN); development time: 1 person-year
(Brown et al. 1990) (Al-Onaizan et al. 1999) (Germann and Marcu 2000) (Resnik 1997)

Overall Evaluation Bleu Metric Bleu –BiLingual Evaluation Understudy (Papineni et al. 2001) –Modified n-gram precision with length penalty –Quick, inexpensive and language-independent –Correlates highly with human evaluation –Bias against synonyms and inflectional variations

Overall Evaluation Test Sets Genre: UN (United Nations documents), FBIS (news broadcast), Bible (religious) Spanish-English sentence pairs: 2,000 / 1,000 Sentence length (words): …

Overall Evaluation Results

–Systran is overall best –GIST is overall worst –Matador is more robust than IBM4 –Matador is more grammatical than IBM4 –Matador has less information loss than IBM4

Overall Evaluation Grammaticality Example –SP: Además dijo que solamente una inyección masiva de capital extranjero... –EN: Further, he said that only a massive injection of foreign capital... –IBM4: further stated that only a massive inyección of capital abroad... –MTDR: Also he spoke only a massive injection of foreign capital... Parsed all sentences (Spanish, English reference and English output) –Can we find the main verb? –Pro-drop restoration

Overall Evaluation Grammaticality: Verb Determination

Overall Evaluation Grammaticality: Subject Realization

Overall Evaluation Loss of Information Example –SP: El daño causado al pueblo de Sudáfrica jamás debe subestimarse. –EN: The damage caused to the people of his country should never be underestimated. –IBM4: the damage * the people of south * must never underestimated. –MTDR: Never the causado damage to the people of South Africa should be underestimated. Output length relative to reference, Gisting (GIST) / Systran (SYST) / IBM Model 4 (IBM4) / Matador (MTDR): 109% 94% 104%

Component Evaluation Conducted several component evaluations –Parser ~75% correct (labeled dependency links) –Categorial Variation Database 81% Precision-Recall –Structural Expansion –Structural N-grams

Component Evaluation Structural Expansion –Insignificant increase in Bleu score –40% of divergences are pragmatic –LCS lexicon coverage issues –Minimal handling of nominal divergences –Over-expansion Examples: –SP: Además, destruyó totalmente sus cultivos de subsistencia … –EN: It had totally destroyed Samoa's staple crops... –MTDR: Furthermore, it totaled their cultivations of subsistence … –SP: Dicha adición se publica sólo en años impares. –EN: That addendum is issued in odd-numbered years only. –MTDR: concerned addendum is excluded in odd years.

Component Evaluation Structural N-grams 60% speed-up with no effect on quality

Roadmap  Introduction  Generation-Heavy Machine Translation  Evaluation  Conclusion  Future Work

Conclusion Research Contributions –A general, reusable and extensible MT model that transcends the need for large amounts of symmetric knowledge –A systematic non-interlingual/non-transfer framework for handling translation divergences –Extending the concept of symbolic overgeneration to include conflation and head-swapping structural variations –A model for language-independent syntactic-to-thematic linking

Conclusion Research Contributions –Development of reusable large-scale modules and resources: Exerge, the Categorial Variation Database, etc. –A large-scale Spanish-English GHMT implementation –An evaluation of Matador against four models of machine translation found it to be robust across genres and to produce more grammatical output

Ongoing Work –Retargetability to new languages: Chinese, Arabic –Extending the system to use bi-texts: phrase dictionary; weighted translation pairs –Generation-Heavy parsing: small dependency grammar for the foreign language; English structural n-grams to rank parses –Extending the system with new optional modules: cross-lingual headline generation; DepTrimmer (work with Bonnie Dorr), extending Trimmer (Dorr et al. 2003) to a dependency representation

Future Work Categorial Variation Database –Improving word-cluster correctness Structural Expansion –Extending to nominal divergences –Improving thematic linking with a statistical model Structural N-grams –Enriching with syntactic/thematic relations

Thank you! Questions?

Overall Evaluation Bleu Metric Test Sentence: colorless green ideas sleep furiously Gold Standard References: all dull jade ideas sleep irately / drab emerald concepts sleep furiously / colorless immature thoughts nap angrily Unigram precision = 4/5 = 0.8 Bigram precision = 2/4 = 0.5 Bleu Score = (p1 p2 … pn)^(1/n) = (0.8 × 0.5)^(1/2) ≈ 0.6325
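A minimal sketch, in Python, of the computation above: modified n-gram precision with reference clipping, combined by geometric mean (BLEU's brevity penalty is omitted since this example does not exercise it):

```python
from collections import Counter

def ngram_precision(hyp, refs, n):
    """Modified n-gram precision: each hypothesis n-gram count is clipped
    by the maximum count it receives in any single reference."""
    hyp_ngrams = Counter(zip(*(hyp[i:] for i in range(n))))
    matched = 0
    for ng, count in hyp_ngrams.items():
        max_ref = max(Counter(zip(*(r[i:] for i in range(n)))).get(ng, 0)
                      for r in refs)
        matched += min(count, max_ref)
    return matched / max(1, sum(hyp_ngrams.values()))

hyp = "colorless green ideas sleep furiously".split()
refs = ["all dull jade ideas sleep irately".split(),
        "drab emerald concepts sleep furiously".split(),
        "colorless immature thoughts nap angrily".split()]

p1 = ngram_precision(hyp, refs, 1)   # 4/5 = 0.8
p2 = ngram_precision(hyp, refs, 2)   # 2/4 = 0.5
print((p1 * p2) ** 0.5)              # geometric mean, approx. 0.6325
```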

Overall Evaluation Investigating BLEU’s bias towards inflectional variants –SP: Los programas de ajuste estructural se han aplicado rigurosamente. –EN: Structural adjustment programmes had been rigorously implemented. –IBM4: structural adjustment programmes have been applied strictly. –MTDR: programmes of structural adjustment have been added rigurosament.

Overall Evaluation Inflectional Normalization