Towards Interactive and Automatic Refinement of Translation Rules

Towards Interactive and Automatic Refinement of Translation Rules
Ariadna Font Llitjós PhD Thesis Proposal Jaime Carbonell (advisor) Alon Lavie (co-advisor) Lori Levin Bonnie Dorr (Univ. Maryland) 5 November 2004 ! posar arbre amb fulles Thank the advisors and the committee and the audience for attending

Interactive and Automatic Rule Refinement
Outline Introduction Thesis statement and scope Preliminary Research Interactive elicitation of error information A framework for automatic rule adaptation Proposed Research Contributions and Thesis Timeline Motivation, related work, goals Interactive and Automatic Rule Refinement

Machine Translation (MT)
Source Language (SL) sentence: Gaudi was a great artist Spanish translation: Gaudi era un gran artista MT System outputs : *Gaudi estaba un artista grande  *Gaudi era un artista grande If we are given the sentence (SL): Gaudi is a great artist we would like our MT system to translate it like: Gaudi es un gran artista But instead our MT system outputs two incorrect Target Language (TL) sentences Interactive and Automatic Rule Refinement

Completed Work Spanish Adjectives Automatic Rule Adaptation General order: grande  big in size NP DET N ADJ DET ADJ N a big house una casa grande Exception: gran  exceptional This is an exception to the rule, since “gran” is a pre-nominal adjective NP DET ADJ N a great artist un gran artista Interactive and Automatic Rule Refinement

Commercial and Online Systems
Correct Translation: Gaudi era un gran artista Systran, Babelfish (Altavista), WorldLingo, Translated.net : *Gaudi era  gran artista ImTranslation: *El Gaudi era un gran artista 1-800-Translate  *Gaudi era un fenomenal artista Even for such a short and simple sentence like this, the output of commercial MT systems is often incorrect. In this case, and unlike the output of our system, the meaning is pretty much preserved, but more often than not, State-of-the-art MT systems mistranslate sentences giving them a totally incorrect meaning. [Show this is a big/general problem -> solution has great impact Other MT systems also produce incorrect translations  MT output requires post-editing] ImTranslation: translation2.paralink.com/ Interactive and Automatic Rule Refinement

Post-editing Current solutions: Post-editing [Allen, 2003] by human linguists or editors (experts) Automated post-edition module (APE) [Allen & Hogan, 2000] to alleviate the tedious task of correcting most frequent errors over and over No solution to fully automate post-editing process MT output still requires post-editing!!! post-editing: the correction of MT output by human linguists or editors human post-editors: some tools exist to assist in this tedious task – - APE: prototype model, much better before my research, as far as I know, there is no solution to fully automate the post-editing process, in the sense of automatically learning the post-editing rules that would fix any MT output. Interactive and Automatic Rule Refinement

Drawbacks of Current Methods
Manual post-editing  Corrections do not generalize  Gaudi era un artista grande  Juan es un amigo grande (Juan is a great friend)  Era una oportunidad grande (It is a great opportunity) APE  Humans need to predict all the errors ahead of time and code for the post-editing rules; given new error   Current systems do not recycle post-editing efforts back into the system, beyond adding as new training data. Manual post-editing: most common method very tedious, since they need to do the same actions over and over for the same MT error, it would be nice to automatize APE: prototype module to do precisely that, but errors still need to be detected and predicted by a human first and then post-editing rules need to be written manually for each type of error In the face of a new error type, the APE doesn’t do anything, human editor needs to fix it, over an over. Interactive and Automatic Rule Refinement

My Solution Automate post-editing efforts by feeding them back into the MT system. Possible alternatives: Automatic learning of post-editing rules + system independent - several thousands of sentences might need to be corrected for the same error Automatic refinement of translation rules + attacks the core of the problem for transfer-based MT systems (need rules to fix!) ALPER: system independent, but interaction of two incorrect rules can generate thousands of (very) incorrect sentences ARTR: system dependent, goes to the core of the problem, fixing one rule can prevent thousands of incorrect sentences from being generated (both are language-pair dependent) Interactive and Automatic Rule Refinement

Related Work [Corston-Oliver & Gammon, 2003] [Imamura et al. 2003] [Menezes & Richardson, 2001] [Brill, 1993] [Gavaldà, 2000] [Callison-Burch, 2004] Fixing Machine Translation Rule Adaptation [Su et al. 1995] My Thesis Post-editing My research vs. Post-editing: replacing expert post-editor by a non-expert bilingual speakers + software but it does more than just mere correcting the current sentence  it corrects the problem in its core so that the same error would not occur twice. Callison-Burch  Fixing MT: experts fixing alignments (training data) for SMT Su et al  adjusting evaluation parameters of a SMT to respect user style My research: transfer-based MT systems (there need to be rules to refine) The idea of Rule Adaptation to correct language Technology applications is not new. There have been researchers applying RA to (among other NLP applications): - parsing [brill 93] (reduce error by learning a set of simple structural transformations), - NLU [gavalda, 2000] (mechanism to enable a non-expert user to dynamically extend the coverage of a NLU module, just by answering simple classification questions) - and, more relevant for my thesis, to MT [click] where researchers have applied different techniques ranging from [Corston-Oliver and Gammon] DTs to fix automatically learned linguistic representation to reduce noise (very similar to our goal!) to [Imamura and Menezes and Richardson] comparing MT translations with human reference translations to correct and eliminate redundant rules. In contrast to filtering out incorrect or redundant rules, though, I propose to actually refine and expand the translation rules themselves, by editing valid but inaccurate rules that might be lacking a constraint for example, which will result into correctly translating sentences, that were incorrectly translated before. Furthermore, and most importantly, my research does not rely on the existence of a parallel training corpus or a set of human reference translations, unlike all other existing methods. No pre-existing training data required No human reference translations required Non-expert user feedback [Allen & Hogan, 2000] Interactive and Automatic Rule Refinement

Resource-poor Scenarios (AVENUE)
Lack of electronic parallel data Lack of computational linguists  Lack of manual grammar Why bother? Indigenous communities have difficult access to crucial information that directly affects their life (such as land laws, plagues, health warnings, etc.) Preservation of their language and culture Resource-poor Languages: Mapudungun Quechua Aymara My research approach can be applied to any language-pair (given a transfer MT system), but it’s specifically designed to be able to tackle languages with little or no electronic resources, such as Mapudungun and Quechua, which makes the problem of refining translation rules even harder. Since my work is embedded in the AVENUE project, I can’t assume that there is any pre-existing training corpus or even reference translations for my testing sets. 1. (often oral tradition)  SMT and EBMT out of the question 2. lack of computational linguists with knowledge of the RP language Interactive and Automatic Rule Refinement

How is MT possible for resource-poor languages?
Bilingual speakers Interactive and Automatic Rule Refinement

AVENUE Project Overview
Learning Module Transfer Rules Lexical Resources Run Time Transfer System Lattice Word-Aligned Parallel Corpus Elicitation Tool Elicitation Corpus Elicitation Rule Learning Run-Time System Handcrafted rules Morphology Morpho-logical analyzer For those of you who are not familiar with the Avenue project, there are 4 main modules. The first one, elicits translations and word alignments from bilingual speakers with a user-friendly tool. The Rule Learner (3rd module) learns the translation rules from the word-aligned parallel corpus result of the Elicitation phase, and The last module is the actual transfer engine, which given the translation rules and lexical entries, and a sentence to be translated, produces a list of translation candidates or lattice. But, until now there was no way to validate the generalizations learned by the Automatic Rule Learner Nor to verify the quality of the translations produced by the system Interactive and Automatic Rule Refinement

My Thesis Interactive and Automatic Rule Refinement Elicitation
Morphology Rule Learning Run-Time System Rule Refinement Word-Aligned Parallel Corpus Learning Module Translation Correction Tool Handcrafted rules Run Time Transfer System Transfer Rules Morpho-logical analyzer Rule Refinement Module Elicitation Corpus So, my thesis work consists of extracting bilingual speaker feedback with an online GUI (the TCTool) and hypothesizing refinement operations to be performed both on the grammar (transfer rules) and the lexicon so as to improve translation quality and coverage. Note that my this approach is specifically well-suited for languages with scarce resources, but can be applied to any language pair, as long as there are bilingual speakers available and there is an initial set of transfer rules for that pair. Lexical Resources Lattice Elicitation Tool Interactive and Automatic Rule Refinement

Resource-poor languages
Related Work Post-editing Rule Adaptation Fixing Machine Translation My Thesis Resource-poor languages My research vs. Post-editing: replacing expert post-editor by a non-expert bilingual speakers + software but it does more than just mere correcting the current sentence  it corrects the problem in its core so that the same error would not occur twice. Callison-Burch  Fixing MT: experts fixing alignments (training data) Su et al  adjusting evaluation parameters of a SMT to respect user style The idea of Rule Adaptation to correct language Technology applications is not new. There have been researchers applying RA to (among other NLP applications): - parsing [brill 93] (reduce error by learning a set of simple structural transformations), - NLU [gavalda, 2000] (mechanism to enable a non-expert user to dynamically extend the coverage of a NLU module, just by answering simple classification questions) - and, more relevant for my thesis, to MT [click] where researchers have applied different techniques ranging from [Corston-Oliver and Gammon] DTs to fix automatically learned linguistic representation to reduce noise (very similar to our goal!) to [Imamura and Menezes and Richardson] comparing MT translations with human reference translations to correct and eliminate redundant rules. In contrast to filtering out incorrect or redundant rules, though, I propose to actually refine and expand the translation rules themselves, by editing valid but inaccurate rules that might be lacking a constraint for example, which will result into correctly translating sentences, that were incorrectly translated before. Furthermore, and most importantly, my research does not rely on the existence of a parallel training corpus or a set of human reference translations, unlike all other existing methods. Interactive and Automatic Rule Refinement

Thesis Statement Given a rule-based Transfer MT system: - Extract useful information from non-expert bilingual speakers to correct MT output. - Automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality. Given a rule-based Transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable. Furthermore, We can automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality. Interactive and Automatic Rule Refinement

Assumptions No parallel training data available No human reference translations available The SL sentence needs to be fully parsed by the translation grammar. Bilingual speakers can give enough information about the MT errors. First 2, imposed by the AVENUE project  makes the problem more challenging  3. some rule needs to have applied, so that we can attempt to refine it automatically. 4. Which I’ll show you is true in the preliminary work section. Interactive and Automatic Rule Refinement

Scope Automatically refine types of errors with: 1. Just user correction information. 2. Correction and error information. 3. A reasonable amount of further user interaction and available correction and error information. Both in manually written and automatically learned grammars [AMTA 2004]. I plan to cover (types of errors that) that can be refined fully automatically with: 1 and 2 And investigate 3. Fall outside the scope Types of errors for which - users cannot give the required information or - further user interaction might take too long and might not provide with the required information In the AMTA paper we compare manual and automatically learned grammars for the purpose of automatic RR. We found that there is a difference (albeit) small between the TQ produced by the manual grammar and the AL grammar, but the important distinction is that I need to apply different types of RR for different types of grammar. Interactive and Automatic Rule Refinement

Automatic Evaluation of Refinement process
Technical Challenges Automatic Evaluation of Refinement process Automatically Refine and Expand Translation Rules minimally Manually written Automatically Learned Base: crucial component, without which it makes no sense to try to automatically learn Refinement operations.  Minimal in this context refers to minimal post-editing, which usually means to make the least amount of changes possible for producing an understandable working document, rather than producing a high quality document. However, unlike Allen’s (2003) definition of MPE, I do not mean MPE in the sense of producing MT output of less quality, but rather of having bilingual users perform the least amount of changes to the translation so that the same rule that originally generated the SL sentence can be refined. Because my goal is to elicit rather technical information from non-expert users, I need to find a way to classify errors that is not too hard for naive users to do accurately, and at the same time is useful for automatic RR. 2. (minimal description length) MDL = parsimony so that grammar size doesn't increase unnecessarily (blow up) with lots of local refinements  Part of the challenge of this second aspect of my thesis is to refine both manual and automatically learned translation rules In order to be able to improve translation quality and coverage automatically, We need to devise a method evaluate the refined system output also automatically so that evaluation metrics results can drive the refinement process. Elicit minimal MT information from non-expert users Interactive and Automatic Rule Refinement

Preliminary Work Interactive elicitation of error information
A framework for automatic rule adaptation

Error Typology for Automatic Rule Refinement (simplified)
Completed Work Interactive elicitation of error information Error Typology for Automatic Rule Refinement (simplified) Local vs Long distance Word vs. phrase + Word change Sense Form Selectional restrictions Idiom Missing constraint Extra constraint Missing word Extra word Wrong word order Incorrect word Wrong agreement After looking at several hundred sentences (eng2spa) I organized the different types of MT errors into a typology, keeping in mind the ultimate goal of automatic rule refinement. Walk them thru it, explain!!!  Need to Find appropriate level of granularity for MT error classification Interactive and Automatic Rule Refinement

TCTool (Demo) Interactive elicitation of error information Actions: Add a word Delete a word Modify a word Change word order To address the types of MT errors identified (described in previous page), we built an online GUI which allows non-expert bilingual users to reliably detect and minimally correct MT errors by doing one of the following correcting actions: (missing a word ) Adding a word (Extra word ) deleting a word, etc. (incorrect word + wrong agreement ) Modify a word (wrong word order ) change word order Given: SL sentence (e.g. I see them) TL sentence (e.g. Yo veo los) word-to-word alignments (I-yo, see-veo, them-los) (context) Interactive and Automatic Rule Refinement

1st Eng2Spa User Study Completed Work Interactive elicitation of error information [LREC 2004] MT error classification  9 linguistically-motivated classes [Flanagan, 1994], [White et al. 1994]: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation precision recall error detection 90% 89% error classification 72% 71% In the LREC paper, we show that users can reliably detect and classify MT errors, given an initial error classification with 9 linguistically-motivated classes  it turns out this is harder than needs be, since some of these distinctions are not important for automatic RR, whereas some other info which could be elicited is not currently being elicited with this classification. Manual grammar (12 rules) lexical entries Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect) Interested in high precision, even at the expense of lower recall Users did not always fix a translation in the same way When the final translation was not = gold standard, it was still correct or better (better stylistically) ------ For more details, see LREC paper or thesis proposal document Interactive and Automatic Rule Refinement

Translation Rules {NP,8} NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ;; English parsing: ((x0 def) = (x1 def))  NP definiteness = DET definiteness (x0 = x3)  NP = N (N is the head of the NP) ;; Spanish generation: ((y1 agr) = (y2 agr))  DET agreement = N agreement ((y3 agr) = (y2 agr))  ADJ agreement = N agreement (y2 = x3) )  Pass the features of English N to Spanish N ADJ::ADJ |: [nice] -> [bonito] ((X1::Y1) ((x0 pos) = adj) ((x0 form) = nice) ((y0 agr num) = sg)  Spanish ADJ is singular in number ((y0 agr gen) = masc))  Spanish ADJ is masculine in number Rule formalism Showing raw material my module works on (grammar + lexicon) X-side = English (SL) Y-side = Spanish (TL) Interactive and Automatic Rule Refinement

Automatic Rule Refinement Framework
Completed Work Automatic Rule Adaptation Find best RR operations given a: Grammar (G), Lexicon (L), (Set of) Source Language sentence(s) (SL), (Set of) Target Language sentence(s) (TL), Its Parse tree (P), and Minimal correction of TL (TL’) such that TQ2 > TQ1 Which can also be expressed as: max TQ(TL|TL’,P,SL,RR(G,L)) such that the translation quality of the refined grammar is higher than that of the original grammar. And/Or coverage increased And/Or ambiguity reduced Interactive and Automatic Rule Refinement

Types of Refinement Operations
Completed Work Types of Refinement Operations Automatic Rule Adaptation 1. Refine a translation rule: R0  R1 (R0 modified, either made more specific or more general) R0: NP DET N ADJ DET ADJ N a nice house una casa bonito una casa bonita N gender = ADJ gender There are 2 main refinement operations that can be applied both to grammar rules and lexical entries, with some minor differences. Sometimes a rule is mostly correct, but is missing an agreement constraint:  Example on this slide Automatically learned grammars some times over-generalize, and such cases the rules have an extra agreement constraint, which makes the rule incorrect.  Results in higher precision and a tighter grammar reducing size of candidate list R1: Interactive and Automatic Rule Refinement

Types of Refinement Operations (2)
Completed Work Types of Refinement Operations (2) Automatic Rule Adaptation 2. Bifurcate a translation rule: R0  R0 (same, general rule)  R1 (R0 modified, specific rule) R0: NP DET ADJ N NP DET N ADJ a nice house una casa bonita The second type of operation I’ll like to mention is to add a specific rule by making a copy of a general rule (leaving the original untouched) and making changes to the copy (specific rule) Rop1: leave original R0 rule as is, modify the duplicate, R1, This example shows the refinement operation required to also cover the pre-nominal order of ADJ in Spanish (exception to the general rule). You can find a more comprehensive discussion of possible types of refinement operations in the proposal document. Unfortunately I cannot cover them all in my talk. In the document I give a more comprehensive discussion of When users add or delete a word, the safest Rop type is bifurcate, until we have evidence that the original rule can never apply ( interactive mode) High precision, but increases size of candidate list Coverage in terms of TL patterns: every time R0 is modified ->R0’, the coverage on the TL side increases R1: NP DET ADJ N a great artist un gran artista Interactive and Automatic Rule Refinement

Formalizing Error Information
Completed Work Formalizing Error Information Automatic Rule Adaptation Wi = error Wi’ = correction Wc = clue word NP DET ADJ N NP DET N ADJ Wi = bonito a nice house una casa bonito Wi = namely the word that needs to be modified, deleted or dragged into a different position by the user in order for the sentence to be correct Wi’= namely the user modification of Wi or the word that needs to be added by the user in order for the sentence to be correct. Wc = represents the word that gives away the clue with respect to what triggered the correction, namely the cause of the error. Example: in the case of lack of agreement between a noun and the adjective that modifies it, as in *el auto roja (the red car), Wc is instantiated with auto, namely the word that gives away the clue about what the gender agreement feature value of Wi (roja) should be. Wc can also be a phrase or constituent like a plural subject (e.g.. *[Juan y Maria] cai). Wc is not always present and it can be before or after Wi (i>c or i<c). They can be contiguous or separated by one or more words. Wc = casa NP DET ADJ N NP DET N ADJ N gender = ADJ gender a nice house una casa bonita Wi’ = bonita Interactive and Automatic Rule Refinement

Triggering Feature Detection
Completed Work Triggering Feature Detection Automatic Rule Adaptation Comparison at the feature level to detect triggering feature(s) Delta function: (Wi,Wi’) Examples: (bonito,bonita) = {gender} (comiamos,comia) = {person,number} (mujer,guitarra) = {} If  set is empty, need to postulate a new binary feature gen = masc gen = masc Once we have user’s correction (Wi’), we can compare it with Wi at the feature level and find which is the triggering feature. Triggering feature: namely what feature attributes has a different value in Wi and Wi’ Delta set empty = the existing feature language is not expressive enough to distinguish between Wi and Wi’ Interactive and Automatic Rule Refinement

Deciding on the Refinement Op
Completed Work Deciding on the Refinement Op Automatic Rule Adaptation Given: - Action performed by the user (add, delete, modify, change word order) - Error information available (clue word, word alignments, etc.)  Refinement Action Deciding on the appropriate rule refinement operation: Given (1 and 2), the method proposed can determine what Refinement action to take Interactive and Automatic Rule Refinement

Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ This tree represents the space of all possible Rule refinement operations, given a user correcting action and the amount of information available at refinement time And here is an example of how the proposed approach would work thru a simulation Interactive and Automatic Rule Refinement

- Batch and Interactive mode User Studies Evaluation
Proposed Work - Batch and Interactive mode User Studies Evaluation

Rule Refinement Example
Automatic Rule Adaptation Change word order SL: Gaudí was a great artist MT system output: TL: Gaudí era un artista grande Goal (given by user correction): *Gaudí era un artista grande Gaudí era un gran artista Given incorrect MT output and the translation grammar that produced it, an expert would know what to do to fix the error (editor: fix sentence + computational linguist: fix the translation rule), but it is a very hard problem for a piece of software to do that. Here is a successful example of how my approach can solve this. Interactive and Automatic Rule Refinement

Automatic Rule Adaptation 1. Error Information Elicitation Given the TL produced by the MT system for a specific SL sentence, Feed the translation pair thru the TCTool to elicit the correction and error information. Refinement Operation Typology Interactive and Automatic Rule Refinement

2. Variable Instantiation from Log File
Automatic Rule Adaptation Correcting Actions: 1. Word order change (artista grande  grande artista): Wi = grande 2. Edited grande into gran: Wi’ = gran identified artist as clue word  Wc = artista In this case, even if user had not identified Wc, refinement process would have been the same Input correction log file with transfer engine output (parse tree) to Refinement module  variable instantiation. Assumption:  The reason I know “great” should be “gran” and not “grande” is that it modifies “artist”. (in this case artista might not really be the triggering word (gran is always a pre-nominal adj) but I do it for the sake of the argument, I don’t have time to introduce a new example and explain it well) Interactive and Automatic Rule Refinement

3. Retrieve Relevant Lexical Entries
Automatic Rule Adaptation No lexical entry for [great  gran] Duplicate lexical entry [great  grande] and change TL side: ADJ::ADJ |: [great] -> [gran] ((X1::Y1) (…) ((y0 agr num) = sg) ((y0 agr gen) = masc)) (Morphological analyzer: grande = gran) ADJ::ADJ |: [great] -> [grande] ((X1::Y1) (…) ((y0 agr num) = sg) ((y0 agr gen) = masc)) First step is to retrieve Relevant Lexical Entries, but since there is no lexical entry for greatgran, need to Add Lexical Entry for “gran” Modify grammar and lexicon by applying RR ops. (The type of this operation is bifurcate) Lex0  Lex0 + Lex1[Lex0 +  TLword] Even if we had a morphological analyzer , it would find no difference between grande and gran  my research can do something we couldn’t do even with a morphology module grande grande AQ0CS0 grande NCCS000 gran gran AQ0CS0 Parole tags: A=adjective Q=qualifying 0=no degree C=common gender S= number sg 0= no case Interactive and Automatic Rule Refinement

4. Finding Triggering Feature(s)
Automatic Rule Adaptation Feature  function: (Wi, Wi’) =   need to postulate a new binary feature: feat1 5. Blame assignment (from MT system output) tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )> Grammar Empty delta set = feature language is not expressive enough to distinguish between “gran” and “grande”, need to postulate a new feature Blame assignment = what rules need to be refined (from parse output by xfer engine):  This is the MT output tracing what rules applied and in what order. Even if user had not identified Wc, the same RR process could be done fully automatically, since “grande/gran” is moved around locally within the NP. S,1 … NP,1 NP,8 Interactive and Automatic Rule Refinement

6. Variable Instantiation in the Rules
Automatic Rule Adaptation Wi = grande  POSi = ADJ = Y3, y3 Wc = artista  POSc = N = Y2, y2 {NP,8} ;; Y1 Y2 Y3 NP::NP : [DET ADJ N] -> [DET N ADJ] ( (X1::Y1) (X2::Y3) (X3::Y2) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y2 agr)) ; det-noun agreement ((y3 agr) = (y2 agr)) ; adj-noun agreement (y2 = x3) ) So that when the order gets flipped we know what variables need to be changed in the rule Interactive and Automatic Rule Refinement

7. Refining Rules Automatic Rule Adaptation Bifurcate NP,8  NP,8 (R0) + NP,8’ (R1) (flip order of ADJ-N) {NP,8’} NP::NP : [DET ADJ N] -> [DET ADJ N] ( (X1::Y1) (X2::Y2) (X3::Y3) ((x0 def) = (x1 def)) (x0 = x3) ((y1 agr) = (y3 agr)) ; det-noun agreement ((y2 agr) = (y3 agr)) ; adj-noun agreement (y2 = x3) ((y2 feat1) =c + )) =c requires the feature to be specified in the ADJ lexical entry and to be + If it’s underspecified or -, won’t unify Interactive and Automatic Rule Refinement

8. Refining Lexical Entries
Automatic Rule Adaptation ADJ::ADJ |: [great] -> [grande] ((X1::Y1) ((x0 form) = great) ((y0 agr num) = sg) ((y0 agr gen) = masc) ((y0 feat1) = -)) ADJ::ADJ |: [great] -> [gran] ((y0 feat1) = +)) 5. Modify grammar and lexicon by applying RR ops. Interactive and Automatic Rule Refinement

Done? Not yet Automatic Rule Adaptation NP,8 (R0) ADJ(grande) [feat1 = -] NP,8’ (R1) ADJ(gran) [feat1 =c +] [feat1 = +] Need to restrict application of general rule (R0) to just post-nominal ADJ un artista grande un artista gran un gran artista *un grande artista Now the refined grammar produces the correct translation, but it still produces an incorrect translation that we know is incorrect (by the minimal post-editing maximum). Since the Refinement operation has increased ambiguity in the grammar: translation candidate list size has increased by more than double, since both “grande” and “gran” can be unified with {NP,8} and “gran” now unifies with {NP,8’} Given another adjective underspecified with respect to feat1 (all new adjectives), it will unify with the general rule (NP,8), which is the desired effect  default. Interactive and Automatic Rule Refinement

Add Blocking Constraint
Automatic Rule Adaptation NP,8 (R0) ADJ(grande) [feat1 = -] [feat1 = -] NP,8’ (R1) ADJ(gran) [feat1 =c +] [feat1 = +] Can we also eliminate incorrect translations automatically? un artista grande *un artista gran un gran artista *un grande artista Now the refined grammar produces the correct translation, but it still produces incorrect translations that we know are incorrect. Since the Refinement operation has increased ambiguity in the grammar: translation candidate list size has increased by more than double, since both “grande” and “gran” can be unified with {NP,8} and “gran” now unifies with {NP,8’} Eliminate incorrect translations automatically?  tighter grammar (minimal description length)  Since great translates both as grande and gran, both rules will be applied, one with each type of adjective. Interactive and Automatic Rule Refinement

Making the grammar tighter
Automatic Rule Adaptation If Wc = artista Add [feat1= +] to N(artista) Add agreement constraint to NP,8 (R0) between N and ADJ ((N feat1) = (ADJ feat1)) *un artista grande *un artista gran un gran artista *un grande artista If user identified Wc, the RR module can also tag artista with ((feat1)= +) and add an agreement constraint between ADJ and N ((y2 feat1) = (y3 feat1)) to {NP,8} For each new adjective that appears in this context, the output produced by one of the two NP rules will be picked (as best) by users, and the RR module will be able to tag them as being pre-nominal (feat1 = +) or post-nominal (feat1 = -) in the lexicon. Note that without Wc  RR generated correct translation, but can’t eliminate all incorrect translations Interactive and Automatic Rule Refinement

Batch Mode Implementation
Proposed Work Batch Mode Implementation Automatic Rule Adaptation Given a set of user corrections, apply refinement module. For Refinement Operations of errors that can be refined fully automatically: Just by using correction information 2. Using correction and error information Main focus of my thesis 2. Such as type of error and whether there is a clue word Interactive and Automatic Rule Refinement

Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ This is the Refinement typology introduced at the end of the preliminary work. (This tree represents the space of all possible Rule refinement operations, given a user correcting action and the amount of information available at refinement time) [click] Interactive and Automatic Rule Refinement

1. Correction info only Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Fully automatically just by using correction information There is no information about the clue word or error type, just the info from the user correcting with the TCTool Add –Wc +al(ignment) John and May fell  Juan y Maria cayeron  Juan y Maria se cayeron from “se” to “fell” (Se cayeron  fell) Change word order WiWc WiWi’ POS!=POS: I will help him fix the car  ayudare a el a arreglar el auto  le ayudare… It is a nice house – Es una casa bonito  Es una casa bonita Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista Interactive and Automatic Rule Refinement

2. Correction and Error info Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ 2: Fully automatically using correction and error information, such as clue words, error type Need to know that orgullosa is the word which is triggering the correction (namely the addition of “de”) Wc=orgullosa in order to be able to automatically refine it. The rule PP(Prep NP) already exists (the level of generality appropriate for the constraint is set by default and can be confirmed thru user interaction). PP  PREP NP I am proud of you – Estoy orgullosa tu  Estoy orgullosa de ti Interactive and Automatic Rule Refinement

Interactive Mode Implementation
Proposed Work Interactive Mode Implementation Automatic Rule Adaptation Extra error information is required to determine triggering context automatically  Need to give other relevant sentences to the user at run-time (minimal pairs) For Refinement Operations of errors that can be refined fully automatically but: 3. require a reasonable amount of further user interaction and can be solved by available correction and error information. In some cases, however, extra info required So I’ll also need to implement an interactive mode of the rule refinement module so that when it doesn’t have enough info to determine the triggering context, it can present users with more sentences to evaluate This mode of operation will be implemented for Ref Op of errors… Typically requires more effort from users and takes longer  smaller test set Interactive and Automatic Rule Refinement

Focus 3 Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ This shows the refinements for error types that require a reasonable amount of further user interaction and can be solved by available correction and error information (Focus 3) -rule: suppose the rule required to generate the new POS sequence is not in the grammar (if the grammar didn’t have a PP) Let’s take a closer look at this last example and how the proposed approach would go about refining it. I see them – Veo los  Los veo Interactive and Automatic Rule Refinement

Example Requiring Minimal Pair
Proposed Work Example Requiring Minimal Pair Automatic Rule Adaptation 1. Run SL sentence through the transfer engine I see them  *veo los  los veo 2. Wi = los but no Wi’ nor Wc Need a minimal pair to determine appropriate refinement: I see cars  veo autos 3. Triggering feature(s): (los,autos) = {pos} PRON(los)[pos=pron] N(autos)[pos=n] In Spanish if the direct Obj is a pronoun (clitic), it needs to be in a pre-verbal position. Wc  Not a specific word, but rather a combination of features in Wi(pron+acc) If operating in batch mode, I would flip the order of the VP rule and would generate the correct translation, But would not eliminate incorrect translations (autos veo, etc.) Interactive and Automatic Rule Refinement

Refining and Adding Constraints
Proposed Work Refining and Adding Constraints VP,3: VP NP  VP NP VP,3’: VP NP  NP VP + [NP pos =c pron] Percolate triggering features up to the constituent level: NP: PRON  PRON + [NP pos = PRON pos] Block application of general rule (VP,3): VP,3: VP NP  VP NP + [NP pos = (*NOT* pron)] To the appropriate rules … but for that to have any effect, we need to make sure that they pos feature is passed from the PRON to the constituent level if applicable [know if it’s applicable and which rule from MT output tree] NP… Which has the desired effect of not overgenerating for cases where the Npobj is not a pronoun But we’d still need to block the application of the general rule… Interactive and Automatic Rule Refinement

Generalization Power  When triggering feature already exists in the feature language (pos, gender, number, etc.) - I love him  lo amo (before: *amo lo) - They called me yesterday  me llamaron ayer (before: *llamaron me ayer) - Mary helps her with her homework  Maria le ayuda con sus tareas (before: *Maria ayuda le con sus tareas) Unlike when the feat language is not expressive enough, and a new feature has to be postulated (feat1) With previous refinement,all these sentences will now be correctly translated by the system: Now X instead of Y Different rule (NP VP NP) John gave her the book  Juan le dio el libro Interactive and Automatic Rule Refinement

Proposed Work User Studies TCTool: new MT classification (Eng2Spa) Different language pair Mapudungun or Quechua  Spanish Batch vs Interactive mode Amount of information elicited just corrections vs + error information 1. New way of eliciting MT error information simple statements/questions about the error possibly ordering from most informative to least informative Eng2Spa II: test new version of TCTool (v0.2) and compare error correction + classification accuracy results. Need to find the tradeoff between safety and generality. In particular, how much does the “interactivity” add (in safety) to the most general batch settings? (Alon’s suggestion) This will probably result into at least 4-8 more user studies  Interactive and Automatic Rule Refinement

Evaluation of Refined MT System
Evaluate best translation  Automatic evaluation metrics (BLEU, NIST, METEOR) Evaluate translation candidate list  precision (+parsimony) Automatic evaluation of Refined MT output Automatic Eval Necessary to maximize TQ and C while deciding on what RR operations to apply have 2 or more reference translations (sets of user corrections) Interactive and Automatic Rule Refinement

Evaluate Best translation
Hypothesis file (translations to be evaluated automatically) Raw MT output: Best sentence (picked by user to be correct or requiring the least amount of correction) Refined MT output: Use METEOR score at sentence level to pick best candidate from the list  Run all automatic metrics on the new hypothesis file using user corrections as reference translations. Assumption: user corrections = gold standard  reference translations Method: compare raw MT output with MT output by the refined grammar and lexicon using automatic evaluation metrics, such as BLEU and METEOR Interactive and Automatic Rule Refinement

Evaluate Candidate List
Precision: “tp” binary {0,1} tp + fp total number of TC SL TL SL TL SL TL SL TL   = “tp” binary, since it’s either like the user correction (reference translation)  1 or it isn’t  0. precision is also a measure of parsimony, since if the size of CL grows, precision decreases recall (tp/tp+fn) makes no sense in this context since  we would need to know all the possible correct translations in order to calculate fn (but there is no predefined set of fn) Interactive and Automatic Rule Refinement

Expected Contributions
An efficient online GUI to display translations and alignments and solicit pinpoint fixes from non-expert bilingual users. An expandable set of rule refinement operators triggered by user corrections, to automatically refine and expand different types of grammars. A mechanism to automatically evaluate rule refinements with user corrections as reference translations. 1. contribution 90% done 2. 50% done (laying out the theory, figuring out how to do it (is an important contribution) 3. 70% (method identified, easy to implement) Interactive and Automatic Rule Refinement

Thesis Timeline Research components Duration (months) Back-end implementation User Studies Resource-poor language (data + manual grammar) 2 Adapt system to new language pair 1 Active Learning methods Evaluation Write and defend thesis Total Acknowledgements Committee Avenue team Friends and colleagues Expected graduation date: May 2006 Interactive and Automatic Rule Refinement

References Add references: Related work Probst et al. 2002 AL Interactive and Automatic Rule Refinement

Thanks! Questions? Interactive and Automatic Rule Refinement

Proposed Work Data Set Split development set (~400 sentence) into: Dev set  Run User Studies Develop Refinement Module Validate functionality Test set  Evaluate effect of Refinement operations + Wild test set (from naturally occurring text) Requirement: need to be fully parsed by grammar 1. which can be fully parsed by the original manual grammar ( rules) [Right now ~40 rules (from 12 in the previous manual grammar)] from Typological and Structural Elicitation Corpus categorized by error MT error correction larger data set MT error correction+error info smaller data set Interactive and Automatic Rule Refinement

Some Questions Is the refinement process deterministic? Interactive and Automatic Rule Refinement

Others TCTool Demo Simulation RR operation patterns Automatic Evaluation feasibility study AMTA paper results BLEU, NIST and METEOR Precision, recall and F1 Interactive and Automatic Rule Refinement

Automatic Rule Adaptation Interactive and Automatic Rule Refinement

Automatic Rule Adaptation SL + best TL picked by user Interactive and Automatic Rule Refinement

Automatic Rule Adaptation Changing “grande” into “gran” Interactive and Automatic Rule Refinement

Automatic Rule Adaptation Back to main Interactive and Automatic Rule Refinement

Automatic Rule Adaptation 1 2 3 Interactive and Automatic Rule Refinement

Input to RR module Automatic Rule Adaptation User correction log file Transfer engine output (+ parse tree): sl: I see them tl: VEO LOS tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,0 (PRON,2:3 "LOS") ) ) ) )> sl: I see cars tl: VEO AUTOS tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,2 (N,1:3 “AUTOS") ) ) ) )> Interactive and Automatic Rule Refinement

Completed Work Types of RR Operations Automatic Rule Adaptation Grammar: R0  R0 + R1 [=R0’ + constr] Cov[R0]  Cov[R0,R1] R0  R1[=R0 + constr= -]  R2[=R0’ + constr=c +] Cov[R0]  Cov[R1,R2] R0  R1 [=R0 + constr] Cov[R0]  Cov[R1] Lexicon Lex0  Lex0 + Lex1[=Lex0 + constr] Lex0  Lex1[=Lex0 + constr] Lex0  Lex0 + Lex1[Lex0 +  TLword]   Lex1 (adding lexical item) bifurcate refine Grammar: Bifurcates: leave original as is, modify the other one (maybe change word oder + make the other one more specific [VP: V Nppron in Spanish] Make more specific [agreement constraint missing] Bifurcate: general case (blocking constraint) + specific case (triggering constraint) [pre-nominal adjectives] Cov in terms of TL patterns, every time R0 is modified ->R0’, the coverage on the TL side increases Lexicon 1. cayeron  se cayeron 2. woman + animate= + 4. Sense missing 3. OVW Interactive and Automatic Rule Refinement

Manual vs Learned Grammars
Automatic Rule Adaptation Manual inspection: Automatic MT Evaluation: [AMTA 2004] Same test set than user study (32 sentences) Same manual grammar + automatically learned grammar using Avenue’s Rule Learning module [Probst et al. 2002] (+ added feature constraints) Looking at 5 first translations Looking at the best translation for each grammar and compared Automatic eval reflects the findings thru manual evaluation that even though MG output is better, it is also the same in many cases, scores are not that different Different types of errors  different types of RR operations required Learned grammars higher level of lexicalization, need RR module to achieve appropriate level of generalization. Conclusions: - Manual G will need to be refined to encode exceptions, whereas Learned G will need to be refined to achieve the right level of generalization. - We expect the RR to give the most leverage when combined with the Learned Grammar. NIST BLEU METEOR Manual grammar 4.3 0.16 0.6 Learned grammar 3.7 0.14 0.55 Interactive and Automatic Rule Refinement

Human Oracle experiment
Completed Work Human Oracle experiment Automatic Rule Adaptation As a feasibility experiment, compared raw output with manually corrected MT: statistically significant (confidence interval test) These is an upper-bound on how much difference we should expect any refinement approach to make. NIST is the least discriminative of metrics, since for bigrams that appear only once (very small set) the system doesn’t get any credit. Interactive and Automatic Rule Refinement

Proposed Work Active Learning Automatic Rule Adaptation Minimize the number of examples a human annotator must label [Cohn et al. 1994] usually by processing examples in order of usefulness. . Minimize the number of Minimal Pairs presented to users Method to ... usually by processing examples in order of usefulness. [Lewis and Catlett 94] used uncertainty as a measure of usefulness. 2. [Callison-Burch 2003] proposed AL to reduce the cost of creating a corpus of labeled training examples for Statistical MT. Add example of RR op requiring AL and minimal pair Interactive and Automatic Rule Refinement

Order deterministic? Application of Rule Refinement operations is not deterministic, it directly depends on: The order in which it sees the corrected sentences Example: 1st agr constraint + bifurcate (WWO) C-set Reverse order C-set (!=) Given the same set of corrected sentences, the application of RR operations is not deterministic Interactive and Automatic Rule Refinement

Recycle corrections of Machine Translation output back into the system by refining and expanding existing translation rules In other words, my thesis focuses on Recyling non-expert user corrections back into the Machine Translation system by refining and expanding existing translation rules Left: represents an initial grammar (small manual grammar or automatically learned grammar) Right: refined grammar (codes for exceptions to general rules, has more (agreement) constraints, new rules, etc.)

1. Correction info only Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1: Fully automatically just by using correction information  There is no information about the clue word or error type, just the info from the user correcting with the TCTool 2nd example: +al(ignment) from “se” to “fell” (Se cayeron  fell) It is a nice house – Es una casa bonito  Es una casa bonita John and Mary fell – Juan y Maria  cayeron  Juan y Maria se cayeron Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista Interactive and Automatic Rule Refinement

1. Correction info only Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1: Fully automatically just by using correction information  There is no information about the clue word or error type, just the info from the user correcting with the TCTool Es una casa bonito  Es una casa bonita J y M cayeron  J y M se cayeron Gaudi was a great artist – Gaudi era un artista grande  Gaudi era un gran artista I will help him fix the car – Ayudaré a él a arreglar el auto  Le ayudare a arreglar el auto Interactive and Automatic Rule Refinement

1. Correction info only Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 1: Fully automatically just by using correction information  There is no information about the clue word or error type, just the info from the user correcting with the TCTool I would like to go – Me gustaria que ir  Me gustaria  ir I will help him fix the car – Ayudaré a él a arreglar el auto  Le ayudare a arreglar el auto Interactive and Automatic Rule Refinement

2. Correction and Error info Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ Focus 2: Fully automatically using correction and error information, such as clue words, error type The rule PP(Prep NP) already exists (the level of generality appropriate for the constraint is set by default and can be confirmed thru user interaction). PP  PREP NP I am proud of you – Estoy orgullosa tu  Estoy orgullosa de ti Interactive and Automatic Rule Refinement

Focus 3 Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ This shows the refinements for error types that require a reasonable amount of further user interaction and can be solved by available correction and error information (Focus 3) -rule: suppose the rule required to generate the new POS sequence is not in the grammar (if the grammar didn’t have a PP) Let’s take a closer look at this last example and how the proposed approach would go about refining it. Wally plays the guitar – Wally juega la guitarra  Wally toca la guitarra I saw the woman – Vi  la mujer  Vi a la mujer I see them – Veo los  Los veo Interactive and Automatic Rule Refinement

Outside Scope of Thesis Rule Refinement Operations Modify Add Delete Change W Order +Wc –Wc Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc  = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’ By definition, there is no MP that will provide with the triggering context for this example  John would have to be in the object position (I gave John the book – le di el libro a Juan), and there are only two words in common. If a word is moved outside the local rule, the RR module will feed the corrected sentence pair back into the system (to the RL) as a new training example. John read the book – A Juan leyó el libro   Juan leyó el libro Where are you from? – Donde eres tu de?  De donde eres tu? Interactive and Automatic Rule Refinement

Towards Interactive and Automatic Refinement of Translation Rules

Similar presentations

Presentation on theme: "Towards Interactive and Automatic Refinement of Translation Rules"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Towards Interactive and Automatic Refinement of Translation Rules

Similar presentations

Presentation on theme: "Towards Interactive and Automatic Refinement of Translation Rules"— Presentation transcript:

Similar presentations

About project

Feedback