Machine Translation Overview

Alon Lavie
Language Technologies Institute, Carnegie Mellon University
LTI Immigration Course, August 24, 2006

Machine Translation: History
MT started in the 1940s, as one of the first conceived applications of computers
Promising “toy” demonstrations in the 1950s failed miserably to scale up to “real” systems
ALPAC Report: MT recognized as an extremely difficult, “AI-complete” problem in the early 1960s
MT revival started in earnest in the 1980s (US, Japan)
Field dominated by rule-based approaches, requiring 100s of K-years of manual development
Economic incentive for developing MT systems for a small number of language pairs (mostly European languages)
March 24, 2006 LTI IC 2006

Machine Translation: Where are we today?
Age of Internet and Globalization: great demand for MT
  Multiple official languages of UN, EU, Canada, etc.
  Documentation dissemination for large manufacturers (Microsoft, IBM, Caterpillar)
Economic incentive is still primarily within a small number of language pairs
Some fairly good commercial products in the market for these language pairs
  Primarily a product of rule-based systems after many years of development
Pervasive MT between most language pairs still non-existent and not on the immediate horizon

Example of Current Best MT
PAHO’s Spanam system:

Spanish source:
Mediante petición recibida por la Comisión Interamericana de Derechos Humanos (en adelante …) el 6 de octubre de 1997, el señor Lino César Oviedo (en adelante …) denunció que la República del Paraguay (en adelante …) violó en su perjuicio los derechos a las garantías judiciales … en su contra.

MT output:
Through petition received by the `Inter-American Commission on Human Rights` (hereinafter …) on 6 October 1997, Mr. Linen César Oviedo (hereinafter “the petitioner”) denounced that the Republic of Paraguay (hereinafter …) violated to his detriment the rights to the judicial guarantees, to the political participation, to // equal protection and to the honor and dignity consecrated in articles 8, 23, 24 and 11, respectively, of the `American Convention on Human Rights` (hereinafter …”), as a consequence of judgments initiated against it.

Core Challenges of MT

Ambiguity:
  Human languages are highly ambiguous, and differently in different languages
  Ambiguity at all “levels”: lexical, syntactic, semantic, language-specific constructions and idioms
Amount of required knowledge:
  At least several 100k words, at least as many phrases, plus syntactic knowledge (i.e. translation rules)
  How do you acquire and construct a knowledge base that big that is (even mostly) correct and consistent?

How to Tackle the Core Challenges
Manual Labor: 1000s of person-years of human experts developing large word and phrase translation lexicons and translation rules.
  Example: Systran’s RBMT systems.
Lots of Parallel Data: data-driven approaches for finding word and phrase correspondences automatically from large amounts of sentence-aligned parallel texts.
  Example: Statistical MT systems.
Learning Approaches: learn translation rules automatically from small amounts of human translated and word-aligned data.
  Example: AVENUE’s XFER approach.
Simplify the Problem: build systems that are limited-domain or constrained in other ways.
  Examples: CATALYST, NESPOLE!.

State-of-the-Art in MT
What users want:
  General purpose (GP): any text
  High quality (HQ): human level
  Fully automatic (FA): no user intervention
We can meet any 2 of these 3 goals today, but not all three at once:
  FA + HQ: Knowledge-Based MT (KBMT)
  FA + GP: Corpus-Based (Example-Based) MT
  GP + HQ: Human-in-the-loop (efficiency tool)

Types of MT Applications:
Assimilation: multiple source languages, uncontrolled style/topic.
  General purpose MT, no semantic analysis. (GP + FA or GP + HQ)
Dissemination: one source language, controlled style, single topic/domain.
  Special purpose MT, full semantic analysis. (FA + HQ)
Communication: lower quality may be okay, but system robustness and real-time operation are required.

Approaches to MT: Vauquois MT Triangle

Interlingua (apex, reached by Analysis and left by Generation):
  Give-information+personal-data (name=alon_lavie)
Transfer (middle level):
  [s [vp accusative_pronoun “chiamare” proper_name]] -> [s [np [possessive_pronoun “name”]] [vp “be” proper_name]]
Direct (base):
  Mi chiamo Alon Lavie -> My name is Alon Lavie

Analysis and Generation Main Steps
Analysis:
  Morphological analysis (word-level) and POS tagging
  Syntactic analysis and disambiguation (produce syntactic parse-tree)
  Semantic analysis and disambiguation (produce symbolic frames or logical form representation)
  Map to language-independent Interlingua
Generation:
  Generate semantic representation in TL
  Sentence Planning: generate syntactic structure and lexical selections for concepts
  Surface-form realization: generate correct forms of words

Direct Approaches

No intermediate stage in the translation
First MT systems developed in the 1950s-60s (assembly code programs)
  Morphology, bilingual dictionary lookup, local reordering rules
  “Word-for-word, with some local word-order adjustments”
Modern Approaches: EBMT and SMT

Statistical MT (SMT)

Proposed by IBM in the early 1990s: a direct, purely statistical model for MT
Statistical translation models are trained on a sentence-aligned parallel bilingual corpus
  Train word-level alignment models
  Extract phrase-to-phrase correspondences
  Apply them at runtime on source input and “decode”
Attractive: completely automatic, no manual rules, much reduced manual labor
Main drawbacks:
  Effective only with large volumes (several mega-words) of parallel text
  Broad domain, but domain-sensitive
  Still viable only for a small number of language pairs!
Impressive progress in last 5 years
  Large DARPA funding programs (TIDES, GALE)
  Lots of research in this direction
  GIZA++, Pharaoh, CAIRO
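
The "train word-level alignment models" step can be illustrated with a toy EM estimator in the style of IBM Model 1. This is only a sketch under stated assumptions: the three-sentence corpus, the function name `train_model1`, and the omission of a NULL source word are illustrative choices, not the training procedure of GIZA++ or of any system named on the slide.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """Estimate t(target_word | source_word) by EM over sentence pairs.

    corpus: list of (source_tokens, target_tokens) pairs.
    Returns a dict keyed by (target_word, source_word).
    """
    src_vocab = {s for src, _ in corpus for s in src}
    tgt_vocab = {t for _, tgt in corpus for t in tgt}
    # Uniform initialisation: every target word is equally likely.
    t = {(tw, sw): 1.0 / len(tgt_vocab) for sw in src_vocab for tw in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (target, source) links
        total = defaultdict(float)   # expected count of links per source word
        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalise the expected counts per source word.
        t = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return t


# Hypothetical toy corpus (English source, German target).
toy_corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
probs = train_model1(toy_corpus)
```

With this toy data, `probs[("haus", "house")]` ends up larger than `probs[("das", "house")]`, because "das" is better explained by "the", which co-occurs with it twice.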

EBMT Paradigm

New Sentence (Source):
  Yesterday, 200 delegates met with President Clinton.
Matches to Source Found:
  Yesterday, 200 delegates met behind closed doors…
    Gestern trafen sich 200 Abgeordnete hinter verschlossenen…
  Difficulties with President Clinton…
    Schwierigkeiten mit Praesident Clinton…
Alignment (Sub-sentential):
  Yesterday, 200 delegates met behind closed doors…
    Gestern trafen sich 200 Abgeordnete hinter verschlossenen…
  Difficulties with President Clinton over…
    Schwierigkeiten mit Praesident Clinton…
Translated Sentence (Target):
  Gestern trafen sich 200 Abgeordnete mit Praesident Clinton.

Transfer Approaches

Syntactic Transfer:
  Analyze SL input sentence to its syntactic structure (parse tree)
  Transfer SL parse-tree to TL parse-tree (various formalisms for specifying mappings)
  Generate TL sentence from the TL parse-tree
Semantic Transfer:
  Analyze SL input to a language-specific semantic representation (e.g., Case Frames, Logical Form)
  Transfer SL semantic representation to TL semantic representation
  Generate syntactic structure and then surface sentence in the TL

Transfer Approaches: Main Advantages and Disadvantages

Syntactic Transfer:
  No need for semantic analysis and generation
  Syntactic structures are general, not domain specific -> less domain dependent, can handle open domains
  Requires word translation lexicon
Semantic Transfer:
  Requires deeper analysis and generation, symbolic representation of concepts and predicates -> difficult to construct for open or unlimited domains
  Can better handle non-compositional meaning structures -> can be more accurate
  No word translation lexicon: generate in TL from symbolic concepts

Knowledge-based Interlingual MT
The classic “deep” Artificial Intelligence approach:
  Analyze the source language into a detailed symbolic representation of its meaning
  Generate this meaning in the target language
“Interlingua”: one single meaning representation for all languages
  Nice in theory, but extremely difficult in practice:
    What kind of representation?
    What is the appropriate level of detail to represent?
    How to ensure that the interlingua is in fact universal?

Notes:
“Demonstration” is now a set of KANT interlinguas, downloaded here from the AMTA interlingua workshop webpage: clicking on sentences shows the IL
Note use of phrasal lexical units!
Start with #4: “simple” example (with a 2-level genl. quantifier!)
  1: rel. clause
  3: conj “but”, coordination, another genl. quant.
  5: implicit rel. clause, apposition
  6: “latter”/“former”
  7: appositive VP
  8: quote marks
  13/14: complex [numbers different on frontpage and files!]
  21/22: “all” is (:OR SINGULAR PLURAL) [ambiguity packing!]

Interlingua versus Transfer
With interlingua, need only N parsers/generators instead of N^2 transfer systems.
[Figure: six languages, L1 to L6, drawn twice: once linked to each other by pairwise transfer systems and once linked through a central interlingua]
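
The scaling argument can be checked with a few lines of arithmetic. This is a sketch under one counting assumption that the slide does not spell out: an interlingua system needs one analyzer and one generator per language (2N modules), while transfer needs one system per ordered language pair (N(N-1), which grows like N^2). The function names are hypothetical.

```python
def interlingua_modules(n_languages):
    """One analyzer plus one generator per language."""
    return 2 * n_languages


def transfer_systems(n_languages):
    """One directed transfer system per ordered pair of distinct languages."""
    return n_languages * (n_languages - 1)


# Six languages, as in the figure (L1 to L6).
for n in (2, 6, 20):
    print(n, interlingua_modules(n), transfer_systems(n))
```

For the six languages in the figure this gives 12 modules against 30 transfer systems, and the gap widens quadratically as languages are added.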

Multi-Engine MT

Apply several MT engines to each input in parallel
Create a combined translation from the individual translations
Goal is to combine strengths and avoid weaknesses
  Along all dimensions: domain limits, quality, development time/cost, run-time speed, etc.
Various approaches to the problem

Speech-to-Speech MT

Speech just makes MT (much) more difficult:
  Spoken language is messier
    False starts, filled pauses, repetitions, out-of-vocabulary words
    Lack of punctuation and explicit sentence boundaries
  Current speech technology is far from perfect
Need for speech recognition and synthesis in foreign languages
Robustness: MT quality degradation should be proportional to SR quality
Tight Integration: rather than separate sequential tasks, can SR + MT be integrated in ways that improve end-to-end performance?

Major Sources of Translation Problems
Lexical Differences:
  Multiple possible translations for SL word, or difficulties expressing SL word meaning in a single TL word
Structural Differences:
  Syntax of SL is different than syntax of the TL: word order, sentence and constituent structure
Differences in Mappings of Syntax to Semantics:
  Meaning in TL is conveyed using a different syntactic structure than in the SL
Idioms and Constructions

MT at the LTI

LTI originated as the Center for Machine Translation (CMT) in 1985
MT continues to be a prominent sub-discipline of research within the LTI
  More MT faculty than any of the other areas
  More MT faculty than anywhere else
Active research on all main approaches to MT: Interlingua, Transfer, EBMT, SMT
Leader in the area of speech-to-speech MT
Multi-Engine MT (MEMT)
MT Evaluation (METEOR, BLANC)

KBMT: KANT, KANTOO, CATALYST
Deep knowledge-based framework, with symbolic interlingua as intermediate representation
  Syntactic and semantic analysis into an unambiguous detailed symbolic representation of meaning using unification grammars and transformation mappers
  Generation into the target language using unification grammars and transformation mappers
First large-scale multi-lingual interlingua-based MT system deployed commercially:
  CATALYST at Caterpillar: high quality translation of documentation manuals for heavy equipment
Limited domains and controlled English input
Minor amounts of post-editing
Active follow-on projects
Contact Faculty: Eric Nyberg and Teruko Mitamura

EBMT

Developed originally for the PANGLOSS system in the early 1990s
  Translation between English and Spanish
Generalized EBMT under development for the past several years
Used in a variety of projects in recent years
  DARPA TIDES and GALE programs
  DIPLOMAT and TONGUES
Active research work on improving alignment and indexing, decoding from a lattice
Contact Faculty: Ralf Brown and Jaime Carbonell

Statistical MT

Word-to-word and phrase-to-phrase translation pairs are acquired automatically from data and assigned probabilities based on a statistical model
Extracted and trained from very large amounts of sentence-aligned parallel text
  Word alignment algorithms
  Phrase detection algorithms
  Translation model probability estimation
Main approach pursued in CMU systems in the DARPA/TIDES program and now in GALE
  Chinese-to-English and Arabic-to-English
Most active work is on phrase detection and on advanced decoding techniques
Contact Faculty: Stephan Vogel and Alex Waibel

Speech-to-Speech MT

Evolution from JANUS/C-STAR systems to NESPOLE!, LingWear, BABYLON, TC-STAR
Early 1990s: first prototype system that fully performed sp-to-sp (very limited domains)
Interlingua-based, but with shallow task-oriented representations:
  “we have single and double rooms available”
  [give-information+availability] (room-type={single, double})
Semantic Grammars for analysis and generation
Multiple languages: English, German, French, Italian, Japanese, Korean, and others
Stat-MT applied in Speech-to-Speech scenarios
Most active work on portable speech translation on small devices: Arabic/English and Thai/English
Contact Faculty: Alan Black, Stephan Vogel, Tanja Schultz and Alex Waibel

AVENUE/LETRAS: Learning-based Transfer MT
Develop new approaches for automatically acquiring syntactic MT transfer rules from small amounts of elicited translated and word-aligned data
  Specifically designed to bootstrap MT for languages for which only limited amounts of electronic resources are available (particularly indigenous minority languages)
  Use machine learning techniques to generalize transfer rules from specific translated examples
  Combine with SMT-inspired decoding techniques for producing the best translation of new input from a lattice of translation segments
Languages: Hebrew, Hindi, Mapudungun, Quechua
Most active work on designing a typologically comprehensive elicitation corpus, advanced techniques for automatic rule learning, improved decoding, and rule refinement via user interaction
Contact Faculty: Alon Lavie, Lori Levin, Jaime Carbonell and Bob Frederking

Multi-Engine MT

New approach developed over past two years under DoD and DARPA funding (used in GALE)
Main ideas:
  Treat original engines as “black boxes”
  Align the word and phrase correspondences between the translations
  Build a collection of synthetic combinations based on the aligned words and phrases
  Score the synthetic combinations based on Language Model and confidence measures
  Select the top-scoring synthetic combination
Architecture Issues: integrating “workflows” that produce multiple translations and then combine them with MEMT
  IBM’s UIMA architecture
Contact Faculty: Alon Lavie

Synthetic Combination MEMT
Two Stage Approach:
  Align: identify common words and phrases across the translations provided by the engines
  Decode: search the space of synthetic combinations of words/phrases and select the highest scoring combined translation
Example:
  announced afghan authorities on saturday reconstituted four intergovernmental committees
  The Afghan authorities on Saturday the formation of the four committees of government
  MEMT: the afghan authorities announced on Saturday the formation of four intergovernmental committees
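
A minimal sketch of the Align stage only, applied to the two engine outputs in the example. It assumes the simplest possible matcher (case-insensitive identical words, each target word used at most once, scanned left to right); the matcher actually used in MEMT is not described on the slide, and the function name `align_common_words` is hypothetical. The Decode stage is not shown.

```python
def align_common_words(hyp_a, hyp_b):
    """Return (i, j) index pairs of identical words in two hypotheses.

    Matching is case-insensitive and greedy from left to right; each
    word of hyp_b can be linked to at most one word of hyp_a.
    """
    a = hyp_a.lower().split()
    b = hyp_b.lower().split()
    used = set()
    pairs = []
    for i, word in enumerate(a):
        for j, other in enumerate(b):
            if j not in used and word == other:
                used.add(j)
                pairs.append((i, j))
                break
    return pairs


sys1 = ("announced afghan authorities on saturday reconstituted "
        "four intergovernmental committees")
sys2 = ("The Afghan authorities on Saturday the formation of "
        "the four committees of government")
links = align_common_words(sys1, sys2)
shared = [sys1.split()[i] for i, _ in links]
```

Here `shared` holds the words both engines agree on: afghan, authorities, on, saturday, four, committees. These form the anchors around which combined hypotheses can be built.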

Automatic MT Evaluation
METEOR: new metric developed at CMU
Improves upon BLEU metric developed by IBM and used extensively in recent years
Main ideas:
  Assess the similarity between a machine-produced translation and (several) human reference translations
  Similarity is based on word-to-word matching that matches:
    Identical words
    Morphological variants of same word (stemming)
    Synonyms
  Similarity is based on weighted combination of Precision and Recall
  Address fluency/grammaticality via a direct penalty: how well-ordered is the matching of the MT output with the reference?
Improved levels of correlation with human judgments of MT Quality
Contact Faculty: Alon Lavie

The METEOR Metric

Example:
  Reference: “the Iraqi weapons are to be handed over to the army within two weeks”
  MT output: “in two weeks Iraq’s weapons will give army”
Matching:
  Ref: Iraqi weapons army two weeks
  MT: two weeks Iraq’s weapons army
P = 5/8 = 0.625
R = 5/14 = 0.357
Fmean = 10*P*R/(9P+R) = 0.3731
Fragmentation: 3 frags of 5 words = (3-1)/(5-1) = 0.50
Discounting factor: DF = 0.5 * (frag**3) = 0.0625
Final score: Fmean * (1 - DF) = 0.3731 * 0.9375 = 0.3498
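
The arithmetic of this example can be reproduced with a short function that follows the formulas exactly as written on the slide. It is a sketch, not the released METEOR scorer: the word matching itself is taken as given (5 matched words in 3 fragments), and the function and parameter names are assumptions.

```python
def meteor_from_counts(matches, mt_length, ref_length, fragments):
    """Score a translation from match counts, using the slide's formulas."""
    precision = matches / mt_length
    recall = matches / ref_length
    # Recall-weighted harmonic mean: Fmean = 10PR / (9P + R).
    fmean = 10 * precision * recall / (9 * precision + recall)
    # Fragmentation is 0 when all matches form one contiguous run.
    frag = (fragments - 1) / (matches - 1) if matches > 1 else 0.0
    discount = 0.5 * frag ** 3
    return {
        "P": precision,
        "R": recall,
        "Fmean": fmean,
        "frag": frag,
        "DF": discount,
        "score": fmean * (1 - discount),
    }


# 5 matched words, 8 words in the MT output, 14 in the reference, 3 fragments.
result = meteor_from_counts(matches=5, mt_length=8, ref_length=14, fragments=3)
```

For these counts, `result["Fmean"]` is about 0.3731, `result["DF"]` is 0.0625, and `result["score"]` is about 0.3498.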

Summary

Main challenges for current state-of-the-art MT approaches: Coverage and Accuracy
  Acquiring broad-coverage high-accuracy translation lexicons (for words and phrases)
  Learning syntactic mappings between languages from parallel word-aligned data
  Overcoming syntax-to-semantics differences and dealing with constructions
  Stronger Target Language Modeling

Questions…

Example

Sys1: feature prominently venezuela ranked fifth in exporting oil field in the world and eighth in production
Sys2: Venezuela is occupied by the fifth place to export oil in the world, eighth in production
Sys3: Venezuela the top ranked fifth in the oil export in the world and the eighth in the production
MEMT Sentence:
  Selected: venezuela is the top ranked fifth in the oil export in the world to eighth in production.

MEMT Example

IBM: korea stands ready to allow visits to verify that it does not manufacture nuclear weapons
ISI: North Korea Is Prepared to Allow Washington to Verify that It Does Not Make Nuclear Weapons
CMU: North Korea prepared to allow Washington to the verification of that is to manufacture nuclear weapons
Selected MEMT Sentence: north korea is prepared to allow washington to verify that it does not manufacture nuclear weapons

Example

Sys1: announced afghan authorities on Saturday reconstituted four intergovernmental committees accelerate the process of disarmament removal packing between fighters and pictures of war are still have enjoyed substantial influence
Sys2: The Afghan authorities on Saturday the formation of the four committees of government to speed up the process of disarmament demobilization of fighters of the leaders of the war who still have a significant influence.
Sys3: the authorities announced Saturday Afghan form four committees government accelerate the process of disarmament and complete disarmament and demobilization followed the leaders of the war who continues to enjoy considerable influence
MEMT Sentence:
  Selected: the afghan authorities on Saturday announced the formation of the four committees of government to speed up the process of disarmament and demobilization of fighters of the leaders of the war who still have a significant influence.

MEMT Example

IBM: the sri lankan prime minister criticizes head of the country's
ISI: The President of the Sri Lankan Prime Minister Criticized the President of the Country
CMU: Lankan Prime Minister criticizes her country
MEMT Sentence:
  Selected: the sri lankan prime minister criticizes president of the country

Example

Sys1: victims russians are one man and his wife and abusing their eight year old daughter plus a ( 11 and 7 years ) man and his wife and driver , egyptian nationality .
Sys2: The victims were Russian man and his wife, daughter of the most from the age of eight years in addition to the young girls ) 11 7 years ( and a man and his wife and the bus driver Egyptian nationality.
Sys3: the victims Cruz man who wife and daughter both critical of the eight years old addition to two Orient ( 11 ) 7 years ) woman , wife of bus drivers Egyptian nationality .
MEMT Sentence:
  Selected: the victims were russian man and his wife and daughter of the eight years from the age of a 11 and 7 years in addition to man and his wife and bus drivers egyptian nationality
  Oracle: the victims were russian man and wife and his daughter of the eight years old from the age of a 11 and 7 years in addition to the man and his wife and bus drivers egyptian nationality young girls

Lexical Differences

SL word has several different meanings that translate differently into TL
  Ex: financial bank vs. river bank
Lexical Gaps: SL word reflects a unique meaning that cannot be expressed by a single word in TL
  Ex: English snub doesn’t have a corresponding verb in French or German
TL has finer distinctions than SL -> SL word should be translated differently in different contexts
  Ex: English wall can be German Wand (internal), Mauer (external)

Lexical Differences

Lexical gaps:
Examples: these have no direct equivalent in English:
  gratiner (v., French, “to cook with a cheese coating”)
  ōtosanrin (n., Japanese, “three-wheeled truck or van”)

Lexical Differences

[Figure from Hutchins & Somers]

MT Handling of Lexical Differences
Direct MT and Syntactic Transfer:
  Lexical Transfer stage uses bilingual lexicon
  SL word can have multiple translation entries, possibly augmented with disambiguation features or probabilities
  Lexical Transfer can involve use of limited context (on SL side, TL side, or both)
  Lexical Gaps can partly be addressed via phrasal lexicons
Semantic Transfer:
  Ambiguity of SL word must be resolved during analysis -> correct symbolic representation at semantic level
  TL Generation must select appropriate word or structure for correctly conveying the concept in TL

Structural Differences
Syntax of SL is different than syntax of the TL:
  Word order within constituents:
    English NPs: art adj n          the big boy
    Hebrew NPs: art n art adj       ha yeled ha gadol
  Constituent structure:
    English is SVO: Subj Verb Obj   I saw the man
    Modern Arabic is VSO: Verb Subj Obj
  Different verb syntax:
    Verb complexes in English vs. in German
    I can eat the apple             Ich kann den Apfel essen
  Case marking and free constituent order:
    German and other languages that mark case:
    den Apfel esse ich              the(acc) apple eat I(nom)

MT Handling of Structural Differences
Direct MT Approaches:
  No explicit treatment: Phrasal Lexicons and sentence level matches or templates
Syntactic Transfer:
  Structural Transfer Grammars
    Trigger rule by matching against syntactic structure on SL side
    Rule specifies how to reorder and re-structure the syntactic constituents to reflect syntax of TL side
Semantic Transfer:
  SL Semantic Representation abstracts away from SL syntax to functional roles -> done during analysis
  TL Generation maps semantic structures to correct TL syntax

Syntax-to-Semantics Differences
Meaning in TL is conveyed using a different syntactic structure than in the SL
  Changes in verb and its arguments
  Passive constructions
  Motion verbs and state verbs
  Case creation and case absorption
Main Distinction from Structural Differences:
  Structural differences are mostly independent of lexical choices and their semantic meaning -> addressed by transfer rules that are syntactic in nature
  Syntax-to-semantic mapping differences are meaning-specific: require the presence of specific words (and meanings) in the SL

Structure-change example:
  I like swimming
  “Ich schwimme gern”
  I swim gladly

Verb-argument example:
  Jones likes the film.
  “Le film plaît à Jones.” (lit: “the film pleases to Jones”)
Use of case roles can eliminate the need for this type of transfer
  Jones = Experiencer
  film = Theme

Passive Constructions

Example: French reflexive passives:
  Ces livres se lisent facilement
  *“These books read themselves easily”
  These books are easily read

Same intention, different syntax
rigly bitiwgacny          my leg hurts
candy wagac fE rigly      I have pain in my leg
rigly bitiClimny
fE wagac fE rigly         there is pain in my leg
rigly bitinqaH calya      my leg bothers on me
Romanization of Arabic from CallHome Egypt.

MT Handling of Syntax-to-Semantics Differences
Direct MT Approaches:
  No explicit treatment: Phrasal Lexicons and sentence level matches or templates
Syntactic Transfer:
  “Lexicalized” Structural Transfer Grammars
    Trigger rule by matching against “lexicalized” syntactic structure on SL side: lexical and functional features
    Rule specifies how to reorder and re-structure the syntactic constituents to reflect syntax of TL side
Semantic Transfer:
  SL Semantic Representation abstracts away from SL syntax to functional roles -> done during analysis
  TL Generation maps semantic structures to correct TL syntax

Example of Structural Transfer Rule (verb-argument)
[Figure from Hutchins & Somers]

Semantic Transfer: Theta Structure (case roles)
[Figure from Hutchins & Somers]
Abstracts away from grammatical functions
Looks more like a “semantic f-structure”
The basis for “semantic transfer”

Idioms and Constructions
Main Distinction: meaning of whole is not directly compositional from meaning of its sub-parts -> no compositional translation
Examples:
  George is a bull in a china shop
  He kicked the bucket
  Can you please open the window?

Formulaic Utterances

Good night.
  tisbaH cala xEr
  waking up on good
Romanization of Arabic from CallHome Egypt

Constructions

Identifying speaker intention rather than literal meaning for formulaic and task-oriented sentences.
  How about …            suggestion
  Why don’t you…         suggestion
  Could you tell me…     request info.
  I was wondering…       request info.

MT Handling of Constructions and Idioms
Direct MT Approaches:
  No explicit treatment: Phrasal Lexicons and sentence level matches or templates
Syntactic Transfer:
  No effective treatment
  “Highly Lexicalized” Structural Transfer rules can handle some constructions
    Trigger rule by matching against entire construction, including structure on SL side
    Rule specifies how to generate the correct construction on the TL side
Semantic Transfer:
  Analysis must capture non-compositional representation of the idiom or construction -> specialized rules
  TL Generation maps construction semantic structures to correct TL syntax and lexical words

Transfer with Strong Decoding
[Figure: system architecture. Components shown: Word-aligned elicited data, Learning Module, Transfer Rules, Translation Lexicon, Run Time Transfer System, Lattice Decoder, English Language Model, Word-to-Word Translation Probabilities]
Sample transfer rule shown in the figure:
  {PP,4894}
  ;;Score:
  PP::PP [NP POSTP] -> [PREP NP]
  ((X2::Y1) (X1::Y2))

MT for Minority and Indigenous Languages: Challenges
Minimal amount of parallel text
Possibly competing standards for orthography/spelling
Often relatively few trained linguists
Access to native informants possible
Need to minimize development time and cost

Learning Transfer-Rules for Languages with Limited Resources
Rationale:
  Large bilingual corpora not available
  Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using elicitation tool
  Elicitation corpus designed to be typologically comprehensive and compositional
  Transfer-rule engine and new learning approach support acquisition of generalized transfer-rules from the data

English-Hindi Example

GEBMT vs. Statistical MT
Generalized-EBMT (GEBMT) uses examples at run time, rather than training a parameterized model. Thus:
  GEBMT can work with a smaller parallel corpus than Stat MT
  Large target language corpus still useful for generating target language model
  Much faster to “train” (index examples) than Stat MT; until recently was much faster at run time as well
  Generalizes in a different way than Stat MT (whether this is better or worse depends on match between statistical model and reality):
    Stat MT can fail on a training sentence, while GEBMT never will
    GEBMT generalizations based on linguistic knowledge, rather than statistical model design

MEMT chart example

[Chart: candidate translation edges from several engines over the Spanish input “lideres politicos rusos firman pacto de paz”]
Edge labels and scores, in chart order: KBMT (0.8), EBMT (0.65), EBMT (0.9), EBMT (0.7), GLOSS (1.0), DICT (1.0), EBMT (1.0)
Edge text, in chart order: Russian leaders signed / compact of peace / political leaders / compact of / civilian / tactful / pact of peace / civil expedients bargain for civil peace political Russians subscribe of quiet leaders politic Russian sign compact peace
Point out overlapping edges

Why Machine Translation for Minority and Indigenous Languages?
Commercial MT economically feasible for only a handful of major languages with large resources (corpora, human developers)
Is there hope for MT for languages with limited resources?
Benefits include:
  Better government access to indigenous communities (epidemics, crop failures, etc.)
  Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
  Language preservation
  Civilian and military applications (disaster relief)

English-Chinese Example

Spanish-Mapudungun Example

English-Arabic Example

The Elicitation Corpus
Translated, aligned by bilingual informant
Corpus consists of linguistically diverse constructions
Based on elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
Organized compositionally: elicit simple structures first, then use them as building blocks
Goal: minimize size, maximize linguistic coverage

Transfer Rule Formalism
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

Parts of the rule:
  Type information: NP::NP
  Part-of-speech/constituent information: [DET ADJ N] -> [DET N DET ADJ]
  Alignments: (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  x-side constraints: the constraints on X1 and X3
  y-side constraints: the constraints on Y1, Y2, Y3 and Y4
  xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

Transfer Rule Formalism (II)
;SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)

Value constraints: a feature is set equal to a fixed value, e.g. ((X1 AGR) = *3-SING)
Agreement constraints: two features must have the same value, e.g. ((Y2 GENDER) = (Y4 GENDER))

The Transfer Engine

Analysis
  Source text is parsed into its grammatical structure. Determines transfer application ordering.
  Example: 他看书。(he read book)
  [Parse tree figure with nodes S, NP, VP, N, V, NP over 他 看 书]
Transfer
  A target language tree is created by reordering, insertion, and deletion.
  [Target tree figure: he read, followed by DET N for “a book”]
  Article “a” is inserted into object NP. Source words translated with transfer lexicon.
Generation
  Target language constraints are checked and final translation produced.
  E.g. “reads” is chosen over “read” to agree with “he”.
  Final translation: “He reads a book”

Rule Learning - Overview
Goal: Acquire Syntactic Transfer Rules
Use available knowledge from the source side (grammatical structure)
Three steps:
  Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
  Compositionality: use previously learned rules to add hierarchical structure
  Seeded Version Space Learning: refine rules by learning appropriate feature constraints

Flat Seed Rule Generation
Learning Example: NP
Eng: the big apple
Heb: ha-tapuax ha-gadol
Generated Seed Rule:
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))

Flat Seed Generation
Create a transfer rule that is specific to the sentence pair, but abstracted to the POS level. No syntactic structure.

Element              Source
SL POS sequence      f-structure
TL POS sequence      TL dictionary, aligned SL words
Type information     corpus, same on SL and TL
Alignments           informant
x-side constraints
y-side constraints   TL dictionary, aligned SL words (list of projecting features)
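The core of flat seed generation, abstracting an aligned phrase pair to the POS level, can be sketched as follows. The function name `flat_seed_rule` and the input format (lists of (word, POS) pairs plus 1-based word alignments) are assumptions for illustration; feature constraints are left out.

```python
def flat_seed_rule(rule_type, sl_tagged, tl_tagged, word_alignments):
    """Build a flat seed rule string from a POS-tagged, word-aligned pair.

    sl_tagged / tl_tagged: lists of (word, POS) pairs.
    word_alignments: (source index, target index) pairs, 1-based.
    """
    sl_pos = " ".join(pos for _, pos in sl_tagged)
    tl_pos = " ".join(pos for _, pos in tl_tagged)
    aligns = " ".join(f"(X{i}::Y{j})" for i, j in sorted(word_alignments))
    return f"{rule_type}::{rule_type} [{sl_pos}] -> [{tl_pos}] ({aligns})"


seed = flat_seed_rule(
    "NP",
    [("the", "ART"), ("big", "ADJ"), ("apple", "N")],
    [("ha", "ART"), ("tapuax", "N"), ("ha", "ART"), ("gadol", "ADJ")],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)
print(seed)
```

For the English-Hebrew NP example this reproduces the seed rule NP::NP [ART ADJ N] -> [ART N ART ADJ] with its four alignments.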

Compositionality
Initial Flat Rules:
S::S [ART ADJ N V ART N] -> [ART N ART ADJ V P ART N]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2) (X4::Y5) (X5::Y7) (X6::Y8))
NP::NP [ART ADJ N] -> [ART N ART ADJ]
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N]
((X1::Y1) (X2::Y2))
Generated Compositional Rule:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4))

Compositionality - Overview
Traverse the c-structure of the English sentence, add compositional structure for translatable chunks
Adjust constituent sequences, alignments
Remove unnecessary constraints, i.e. those that are contained in the lower-level rule
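The adjustment of constituent sequences and alignments can be sketched on the flat S rule from the compositionality example. The sketch below is an assumption-laden simplification: instead of traversing a real c-structure it scans the source POS sequence left to right, it labels every matched chunk with a single label (`NP`), and it ignores feature constraints. The function name `compose` is hypothetical.

```python
def compose(sl, tl, aligns, sub_rules, label="NP"):
    """Replace spans covered by lower-level rules with a constituent label.

    sl, tl: source/target POS sequences; aligns: 1-based (x, y) pairs;
    sub_rules: (sl_seq, tl_seq) pairs of previously learned rules.
    """
    chunks = []  # (label, set of 1-based target positions) per source chunk
    i = 0
    while i < len(sl):
        for sub_sl, sub_tl in sorted(sub_rules, key=lambda r: -len(r[0])):
            n = len(sub_sl)
            tgt = sorted({y for x, y in aligns if i < x <= i + n})
            if (sl[i:i + n] == sub_sl and tgt
                    and tgt == list(range(tgt[0], tgt[-1] + 1))
                    and tl[tgt[0] - 1:tgt[-1]] == sub_tl):
                chunks.append((label, set(tgt)))
                i += n
                break
        else:  # no lower-level rule applies: keep the single POS tag
            chunks.append((sl[i], {y for x, y in aligns if x == i + 1}))
            i += 1
    new_sl = [lab for lab, _ in chunks]
    new_tl, new_aligns, seen = [], [], set()
    for pos in range(1, len(tl) + 1):
        owner = next((k for k, (_, t) in enumerate(chunks) if pos in t), None)
        if owner is None:
            new_tl.append(tl[pos - 1])  # unaligned target word, e.g. P
        elif owner not in seen:
            seen.add(owner)
            new_tl.append(chunks[owner][0])
            new_aligns.append((owner + 1, len(new_tl)))
    return new_sl, new_tl, sorted(new_aligns)


new_sl, new_tl, new_aligns = compose(
    ["ART", "ADJ", "N", "V", "ART", "N"],
    ["ART", "N", "ART", "ADJ", "V", "P", "ART", "N"],
    [(1, 1), (1, 3), (2, 4), (3, 2), (4, 5), (5, 7), (6, 8)],
    [(["ART", "ADJ", "N"], ["ART", "N", "ART", "ADJ"]),
     (["ART", "N"], ["ART", "N"])],
)
print(new_sl, new_tl, new_aligns)
```

On this input the sketch yields the sequences [NP V NP] and [NP V P NP] with alignments (X1::Y1) (X2::Y2) (X3::Y4), matching the generated compositional rule.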

Seeded Version Space Learning
Input: Rules and their Example Sets
S::S [NP V NP] -> [NP V P NP] {ex1,ex12,ex17,ex26}
((X1::Y1) (X2::Y2) (X3::Y4))
NP::NP [ART ADJ N] -> [ART N ART ADJ] {ex2,ex3,ex13}
((X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2))
NP::NP [ART N] -> [ART N] {ex4,ex5,ex6,ex8,ex10,ex11}
((X1::Y1) (X2::Y2))
Output: Rules with Feature Constraints:
S::S [NP V NP] -> [NP V P NP]
((X1::Y1) (X2::Y2) (X3::Y4)
 (X1 NUM = X2 NUM)
 (Y1 NUM = Y2 NUM)
 (X1 NUM = Y1 NUM))

Seeded Version Space Learning: Overview
Goal: add appropriate feature constraints to the acquired rules
Methodology:
  Preserve general structural transfer
  Learn specific feature constraints from example set
Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
The seed rules in a group form the specific boundary of a version space
The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints

Seeded Version Space Learning: Generalization
The partial order of the version space:
  Definition: A transfer rule tr1 is strictly more general than another transfer rule tr2 if all f-structures that are satisfied by tr2 are also satisfied by tr1.
Generalize rules by merging them:
  Deletion of constraint
  Raising two value constraints to an agreement constraint, e.g.
  ((x1 num) = *pl), ((x3 num) = *pl) -> ((x1 num) = (x3 num))
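The two generalization operations can be written down directly. In this sketch a feature path such as (x1 num) is the tuple ("x1", "num"), a value constraint pairs a path with a literal string, and an agreement constraint pairs two paths; this encoding and the function names are assumptions for illustration.

```python
def delete_constraint(constraints, constraint):
    """Generalize by dropping one constraint."""
    return [c for c in constraints if c != constraint]


def raise_to_agreement(constraints, c1, c2):
    """Replace two value constraints on the same feature and value
    with a single agreement constraint between their paths."""
    (path1, value1), (path2, value2) = c1, c2
    if isinstance(value1, tuple) or isinstance(value2, tuple):
        raise ValueError("both inputs must be value constraints")
    if path1[1] != path2[1] or value1 != value2:
        raise ValueError("constraints must set the same feature to the same value")
    rest = [c for c in constraints if c not in (c1, c2)]
    return rest + [(path1, path2)]


x1_pl = (("x1", "num"), "*pl")
x3_pl = (("x3", "num"), "*pl")
x1_def = (("x1", "def"), "*def")

generalized = raise_to_agreement([x1_pl, x1_def, x3_pl], x1_pl, x3_pl)
print(generalized)
```

The result keeps the untouched constraint on (x1 def) and replaces the two plural constraints with ((x1 num) = (x3 num)), which is also satisfied by f-structures where both are singular.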

Seeded Version Space Learning
1. Group seed rules into version spaces as above.
2. Make use of partial order of rules in version space. Partial order is defined via the f-structures satisfying the constraints.
3. Generalize in the space by repeated merging of rules:
   Deletion of constraint
   Moving value constraints to agreement constraints, e.g.
   ((x1 num) = *pl), ((x3 num) = *pl) -> ((x1 num) = (x3 num))
4. Check translation power of generalized rules against sentence pairs

Seeded Version Space Learning: The Search
The Seeded Version Space algorithm itself is the repeated generalization of rules by merging
A merge is successful if the set of sentences that can correctly be translated with the merged rule is a superset of the union of sets that can be translated with the unmerged rules, i.e. check power of rule
Merge until no more successful merges
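The merge loop and its success criterion can be sketched as below. Several things here are assumptions rather than the actual algorithm: rules are reduced to frozensets of constraint strings, the merge operator simply keeps the constraints both rules share (deletion only, no raising to agreement constraints), and `coverage` is a toy stand-in for running the transfer engine on the example sentence pairs.

```python
from itertools import combinations


def merge(r1, r2):
    """Assumed merge operator: keep only the constraints both rules share."""
    return r1 & r2


def seeded_vs_search(rules, coverage):
    """Repeatedly merge rule pairs while the merged rule translates at least
    everything the two unmerged rules translated correctly."""
    rules = set(rules)
    changed = True
    while changed:
        changed = False
        for r1, r2 in combinations(sorted(rules, key=sorted), 2):
            merged = merge(r1, r2)
            if coverage(merged) >= coverage(r1) | coverage(r2):
                rules -= {r1, r2}
                rules.add(merged)
                changed = True
                break
    return rules


# Toy examples: (features of the example, constraints a rule must carry
# for its translation of this example to be correct).
EXAMPLES = {
    "ex1": ({"num=sg", "def=+"}, set()),
    "ex2": ({"num=pl", "def=+"}, set()),
    "ex3": ({"num=pl", "def=-", "count=+"}, {"count=+"}),
}


def coverage(rule):
    """Examples the rule applies to and translates correctly (toy model)."""
    return {name for name, (feats, needed) in EXAMPLES.items()
            if rule <= feats and needed <= rule}


seeds = [frozenset(feats) for feats, _ in EXAMPLES.values()]
result = seeded_vs_search(seeds, coverage)
print(result)
```

In the toy data the first two seed rules merge into a rule that only requires def=+, while the third stays separate because every merge involving it would lose the example that needs count=+.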

Seeded VSL: Some Open Issues
Three types of constraints:
  X-side: constrain applicability of rule
  Y-side: assist in generation
  X-Y: transfer features from SL to TL
Which of the three types improves translation performance?
  Use rules without features to populate lattice, decoder will select the best translation…
  Learn only X-Y constraints, based on list of universal projecting features
Other notions of version-spaces of feature constraints:
  Current feature learning is specific to rules that have identical transfer components
  Important issue during transfer is to disambiguate among rules that have same SL side but different TL side – can we learn effective constraints for this?

Examples of Learned Rules (Hindi-to-English)
{NP,14244}
;;Score:0.0429
NP::NP [N] -> [DET N]
( (X1::Y2) )

{NP,14434}
;;Score:0.0040
NP::NP [ADJ CONJ ADJ N] -> [ADJ CONJ ADJ N]
(X1::Y1) (X2::Y2) (X3::Y3) (X4::Y4)

{PP,4894}
;;Score:
PP::PP [NP POSTP] -> [PREP NP]
( (X2::Y1) (X1::Y2) )

Manual Transfer Rules: Hindi Example
;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
;; passive of 43 (7b)
{VP,28}
VP::VP : [V V V] -> [Aux V]
(
  (X1::Y2)
  ((x1 form) = root)
  ((x2 type) =c light)
  ((x2 form) = part)
  ((x2 aspect) = perf)
  ((x3 lexwx) = 'jAnA')
  ((x3 form) = part)
  ((x3 aspect) = perf)
  (x0 = x1)
  ((y1 lex) = be)
  ((y1 tense) = past)
  ((y1 agr num) = (x3 agr num))
  ((y1 agr pers) = (x3 agr pers))
  ((y2 form) = part)
)

Manual Transfer Rules: Example
[Figure: source tree for “jIvana ke eka aXyAya” and target tree for “one chapter of life”]
; NP1 ke NP2 -> NP2 of NP1
; Ex: jIvana ke eka aXyAya
; life of (one) chapter
; ==> a chapter of life
;
{NP,12}
NP::NP : [PP NP1] -> [NP1 PP]
(
  (X1::Y2)
  (X2::Y1)
; ((x2 lexwx) = 'kA')
)
{NP,13}
NP::NP : [NP1] -> [NP1]
(X1::Y1)
{PP,12}
PP::PP : [NP Postp] -> [Prep NP]

A Limited Data Scenario for Hindi-to-English
Conducted during a DARPA “Surprise Language Exercise” (SLE) in June 2003
Put together a scenario with “miserly” data resources:
  Elicited Data corpus: phrases
  Cleaned portion (top 12%) of LDC dictionary: ~2725 Hindi words (23612 translation pairs)
  Manually acquired resources during the SLE:
    500 manual bigram translations
    72 manually written phrase transfer rules
    105 manually written postposition rules
    48 manually written time expression rules
  No additional parallel text!!

Manual Grammar Development
Covers mostly NPs, PPs and VPs (verb complexes)
~70 grammar rules, covering basic and recursive NPs and PPs, verb complexes of main tenses in Hindi (developed in two weeks)

Adding a “Strong” Decoder
XFER system produces a full lattice of translation fragments, ranging from single words to long phrases or sentences
Edges are scored using word-to-word translation probabilities, trained from the limited bilingual data
Decoder uses an English LM (70m words)
Decoder can also reorder words or phrases (up to 4 positions ahead)
For XFER(strong), ONLY edges from basic XFER system are used!

Testing Conditions
Tested on section of JHU provided data: 258 sentences with four reference translations
  SMT system (stand-alone)
  EBMT system (stand-alone)
  XFER system (naïve decoding)
  XFER system with “strong” decoder
    No grammar rules (baseline)
    Manually developed grammar rules
    Automatically learned grammar rules
  XFER+SMT with strong decoder (MEMT)

Automatic MT Evaluation Metrics
Intended to replace or complement human assessment of the quality of MT-produced translations
Principal idea: measure how similar the MT-produced translation is to human translation(s) of the same input
Main metric in use today: IBM’s BLEU
  Count n-gram (unigrams, bigrams, trigrams, etc.) overlap between the MT output and several reference translations
  Calculate a combined n-gram precision score
NIST variant of BLEU used for official DARPA evaluations
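The n-gram counting behind BLEU can be sketched in a few lines. The code below computes the clipped ("modified") n-gram precision against multiple references and combines the precisions with a geometric mean; it leaves out BLEU's brevity penalty and corpus-level aggregation, so it is a simplified illustration rather than a full BLEU implementation. The function names are chosen for this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it occurs in the reference where it is most frequent."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def combined_precision(candidate, references, max_n=4):
    """Geometric mean of the 1..max_n modified precisions (no brevity penalty)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)


refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision("the the the".split(), refs, 1))
print(combined_precision("the cat is on the mat".split(), refs))
```

Clipping is what keeps a degenerate output such as "the the the" from receiving a perfect unigram score.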

Results on JHU Test Set

System                          BLEU    M-BLEU   NIST
EBMT                            0.058   0.165    4.22
SMT                             0.093   0.191    4.64
XFER (naïve) man grammar        0.055   0.177    4.46
XFER (strong) no grammar        0.109   0.224    5.29
XFER (strong) learned grammar   0.116   0.231    5.37
XFER (strong) man grammar       0.135   0.243    5.59
XFER+SMT                        0.136            5.65

Effect of Reordering in the Decoder

Observations and Lessons (I)
XFER with strong decoder outperformed SMT even without any grammar rules in the miserly data scenario
  SMT trained on elicited phrases that are very short
  SMT has insufficient data to train more discriminative translation probabilities
  XFER takes advantage of morphology
    Token coverage without morphology:
    Token coverage with morphology:
Manual grammar currently somewhat better than automatically learned grammar
  Learned rules did not yet use version-space learning
  Large room for improvement on learning rules
  Importance of effective well-founded scoring of learned rules

Observations and Lessons (II)
MEMT (XFER and SMT) based on strong decoder produced best results in the miserly scenario.
Reordering within the decoder provided very significant score improvements
  Much room for more sophisticated grammar rules
  Strong decoder can carry some of the reordering “burden”

XFER MT for Hebrew-to-English
Two-month intensive effort to apply our XFER approach to the development of a Hebrew-to-English MT system
Challenges:
  No large parallel corpus
  Only limited coverage translation lexicon
  Morphology: incomplete analyzer available
Plan:
  Collect available resources, establish methodology for processing Hebrew input
  Translate and align Elicitation Corpus
  Learn XFER rules
  Develop (small) manual XFER grammar as a point of comparison
  Evaluate performance on unseen test data using automatic evaluation metrics

Hebrew-to-English XFER System
First end-to-end integration of system completed yesterday (March 2nd)
No transfer rules yet, just word-to-word Hebrew-to-English translation
No strong decoding yet
Amusing Example:
  office brains the government crack H$BW& in committee the elections the central et the possibility conduct poll crowd about TWKNIT the NSIGH from goat

Conclusions
Transfer rules (both manual and learned) offer significant contributions that can complement existing data-driven approaches
  Also in medium and large data settings?
Initial steps to development of a statistically grounded transfer-based MT system with:
  Rules that are scored based on a well-founded probability model
  Strong and effective decoding that incorporates the most advanced techniques used in SMT decoding
Working from the “opposite” end of research on incorporating models of syntax into “standard” SMT systems [Knight et al]
Our direction makes sense in the limited data scenario

Future Directions
Continued work on automatic rule learning (especially Seeded Version Space Learning)
Improved leveraging from manual grammar resources, interaction with bilingual speakers
Developing a well-founded model for assigning scores (probabilities) to transfer rules
Improving the strong decoder to better fit the specific characteristics of the XFER model
MEMT with improved:
  Combination of output from different translation engines with different scorings
  Strong decoding capabilities

Language Modeling for MT
Technique stolen from Speech Recognition
Try to match the statistics of English
Trigram example: “George W. …”
Combine quality score with trigram score, to factor in “English-like-ness”
Problem: this gives billions of possible overall translations
Solution: “beam search”. At each step, throw out all but the “best” possibilities
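Combining a translation quality score with a trigram score under beam search can be sketched as follows. This is a simplified, monotone decoder written for illustration: the input format, the function names, and the toy trigram table with a flat back-off penalty are all assumptions, and real decoders also handle phrases and reordering.

```python
def beam_search(options, trigram_logprob, beam_size=3):
    """Pick one word per position, scoring quality plus trigram log-probability.

    options: one list per position of (word, quality_logprob) candidates.
    trigram_logprob(w1, w2, w3): log-probability of w3 following w1 w2.
    """
    beam = [(0.0, ("<s>", "<s>"))]  # (total score, words so far)
    for candidates in options:
        expanded = []
        for score, words in beam:
            for word, quality in candidates:
                lm = trigram_logprob(words[-2], words[-1], word)
                expanded.append((score + quality + lm, words + (word,)))
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:beam_size]  # throw out all but the best hypotheses
    best_score, best_words = beam[0]
    return list(best_words[2:]), best_score


# Toy trigram model: one known trigram, flat penalty for everything else.
TRIGRAMS = {("<s>", "he", "reads"): -0.1}


def toy_lm(w1, w2, w3):
    return TRIGRAMS.get((w1, w2, w3), -2.0)


options = [
    [("he", -0.1)],
    [("read", -0.2), ("reads", -0.4)],
    [("a", -0.1)],
    [("book", -0.1)],
]
words, score = beam_search(options, toy_lm)
print(" ".join(words), score)
```

Here the translation model alone prefers "read", but the trigram score tips the decision towards "reads", which is the effect the language model is meant to have.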

Speech-to-speech translation for eCommerce
CMU, Karlsruhe, IRST, CLIPS, 2 commercial partners
Improved limited-domain speech translation
Experiment with multimodality and with MEMT
EU-side has strict scheduling and deliverables
First test domain: Italian travel agency
Second “showcase”: international Help desk
Tied in to CSTAR-III
