Towards Interactive and Automatic Refinement of Translation Rules. Ariadna Font Llitjós, PhD Thesis Proposal. Jaime Carbonell (advisor), Alon Lavie (co-advisor), Lori Levin, Bonnie Dorr (Univ. Maryland). 5 November 2004
Outline: Introduction (motivation, related work, goals); Thesis statement and scope; Preliminary Research (interactive elicitation of error information; a framework for automatic rule adaptation); Proposed Research; Contributions and Thesis Timeline.
Machine Translation (MT). Source Language (SL) sentence: Gaudi was a great artist. Spanish translation: Gaudi era un gran artista. MT system outputs: *Gaudi estaba un artista grande; *Gaudi era un artista grande. Given the SL sentence "Gaudi was a great artist", we would like our MT system to translate it as "Gaudi era un gran artista", but instead it outputs two incorrect Target Language (TL) sentences.
Spanish Adjectives. General order: grande ("big" in size) follows the noun. NP: [DET ADJ N] -> [DET N ADJ], e.g. a big house -> una casa grande. Exception: gran ("exceptional") precedes the noun. NP: [DET ADJ N] -> [DET ADJ N], e.g. a great artist -> un gran artista. This is an exception to the general rule, since "gran" is a pre-nominal adjective.
Commercial and Online Systems. Correct translation: Gaudi era un gran artista. Systran, Babelfish (Altavista), WorldLingo: *Gaudi era gran artista. ImTranslation: *El Gaudi era un gran artista. 1-800-Translate: *Gaudi era un fenomenal artista. Even for a sentence as short and simple as this one, the output of commercial MT systems is often incorrect. In this case, unlike the output of our system, the meaning is largely preserved, but more often than not state-of-the-art MT systems mistranslate sentences, giving them a completely incorrect meaning.
Post-editing. Current solutions: post-editing by human linguists or editors (experts) [Allen, 2003]; an automated post-editing module (APE) [Allen & Hogan, 2000] to alleviate the tedious task of correcting the most frequent errors over and over. There is no solution that fully automates the post-editing process: MT output still requires post-editing.
Drawbacks of Current Methods. Manual post-editing: corrections do not generalize. Fixing Gaudi era un artista grande does not fix Juan es un amigo grande (Juan is a great friend) or Era una oportunidad grande (It was a great opportunity). APE: humans need to predict all the errors ahead of time and code the post-editing rules, and every new error requires a new rule. Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data.
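To make the generalization problem concrete, here is a minimal Python sketch of a string-level post-editing rule, assuming (hypothetically) that an APE-style rule is stored as a literal rewrite learned from one correction. The rule fixes the sentence it was learned from but leaves the other two sentences with the same error untouched, because nothing in it refers to the underlying adjective-placement rule.

```python
# Hypothetical APE-style rule learned from a single correction:
# a literal rewrite from the erroneous MT output to the fixed string.
POST_EDIT_RULES = {"un artista grande": "un gran artista"}


def post_edit(sentence: str) -> str:
    """Apply every literal rewrite rule to the MT output."""
    for wrong, right in POST_EDIT_RULES.items():
        sentence = sentence.replace(wrong, right)
    return sentence


print(post_edit("Gaudi era un artista grande"))  # the sentence the rule was learned from
print(post_edit("Juan es un amigo grande"))      # same error, but the rule does not apply
```

Refining the translation rule instead would fix all three sentences at once, since they share the same NP rule.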
My Solution. Automate post-editing efforts by feeding them back into the MT system. Possible alternatives: (a) automatic learning of post-editing rules: + system independent; - several thousand sentences might need to be corrected for the same error. (b) automatic refinement of translation rules: + attacks the core of the problem for transfer-based MT systems, which have explicit rules that can be fixed.
Related Work. Areas: fixing machine translation, rule adaptation, and post-editing. Works: [Corston-Oliver & Gammon, 2003], [Imamura et al. 2003], [Menezes & Richardson, 2001], [Brill, 1993], [Gavaldà, 2000], [Callison-Burch, 2004], [Su et al. 1995], [Allen & Hogan, 2000]. My research vs. post-editing: it replaces the expert post-editor with non-expert bilingual speakers plus software, and it does more than just correct the current sentence: it corrects the problem at its core, so that the same error does not occur twice. No pre-existing training data required. No human reference translations required. Non-expert user feedback.
Resource-poor Scenarios (AVENUE). Lack of electronic parallel data; lack of computational linguists; lack of a manual grammar. Why bother? Indigenous communities have difficult access to crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.), and their language and culture need to be preserved. Resource-poor languages: Mapudungun, Quechua, Aymara. My research approach can be applied to any language pair (given a transfer MT system), but it is specifically designed to tackle languages with little or no electronic resources, such as Mapudungun and Quechua, which makes the problem of refining translation rules even harder. Since my work is embedded in the AVENUE project, I cannot assume that there is any pre-existing training corpus, or even reference translations, for my test sets.
How is MT possible for resource-poor languages? Bilingual speakers
AVENUE Project Overview. Four modules: Elicitation (Elicitation Tool and Elicitation Corpus, producing a word-aligned parallel corpus), Morphology (morphological analyzer), Rule Learning (Learning Module producing transfer rules, alongside handcrafted rules), and the Run-Time System (Run-Time Transfer System, which uses the transfer rules and lexical resources to produce a lattice). For those not familiar with the AVENUE project: the first module elicits translations and word alignments from bilingual speakers with a user-friendly tool. The Rule Learner (third module) learns translation rules from the word-aligned parallel corpus resulting from the Elicitation phase. The last module is the transfer engine itself, which, given the translation rules, the lexical entries and a sentence to be translated, produces a list of translation candidates, or lattice. Until now, however, there was no way to validate the generalizations learned by the Automatic Rule Learner, nor to verify the quality of the translations produced by the system.
My Thesis. The AVENUE architecture (Elicitation, Morphology, Rule Learning, Run-Time System) is extended with a Rule Refinement stage: the Translation Correction Tool (TCTool) and the Rule Refinement Module, which act on the transfer rules and the lexical resources. My thesis work consists of extracting bilingual speaker feedback with an online GUI (the TCTool) and hypothesizing refinement operations to be performed both on the grammar (transfer rules) and on the lexicon, so as to improve translation quality and coverage. Note that this approach is especially well suited to languages with scarce resources, but it can be applied to any language pair, as long as bilingual speakers are available and there is an initial set of transfer rules for that pair.
Related Work, now including resource-poor languages. Areas: post-editing, rule adaptation, fixing machine translation, and resource-poor languages.
Thesis Statement. Given a rule-based transfer MT system, we can: - extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable; - automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality.
Assumptions: no parallel training data available; no human reference translations available; the SL sentence needs to be fully parsed by the translation grammar; bilingual speakers can give enough information about the MT errors.
Scope. Automatically refine types of errors with: 1. just user correction information; 2. correction and error information; 3. a reasonable amount of further user interaction plus the available correction and error information. This applies both to manually written and to automatically learned grammars [AMTA 2004]. I plan to cover the types of errors that can be refined fully automatically with 1 and 2, and to investigate 3. Outside the scope: types of errors for which users cannot give the required information, or for which further user interaction might take too long and still not provide the required information. In the AMTA paper we compare manually written and automatically learned grammars for the purpose of automatic rule refinement (RR). We found a small difference between the translation quality (TQ) produced by the manual grammar and by the automatically learned grammar, but the important distinction is that different types of RR need to be applied to different types of grammar.
Technical Challenges: automatic evaluation of the refinement process; automatically refining and expanding translation rules minimally, both manually written and automatically learned; eliciting minimal MT error information from non-expert users.
Preliminary Work Interactive elicitation of error information A framework for automatic rule adaptation
Error Typology for Automatic Rule Refinement (simplified). Missing word; extra word; wrong word order (local vs. long distance; word vs. phrase; + word change); incorrect word (sense, form, selectional restrictions, idiom); wrong agreement (missing constraint, extra constraint). After looking at several hundred English-to-Spanish sentences, I organized the different types of MT errors into a typology, keeping in mind the ultimate goal of automatic rule refinement. The challenge is to find the appropriate level of granularity for MT error classification.
TCTool (Demo). Given an SL sentence (e.g. I see them), a TL sentence (e.g. Yo veo los), the word-to-word alignments (I-yo, see-veo, them-los) and context, the TCTool lets users perform four correcting actions: add a word, delete a word, modify a word, and change word order. To address the types of MT errors identified in the typology, we built this online GUI, which allows non-expert bilingual users to reliably detect and minimally correct MT errors.
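The four correcting actions can be modeled as operations on the TL word list. The sketch below assumes a hypothetical `TranslationPair` record and 0-based word positions; it is an illustration, not the TCTool's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationPair:
    """What the TCTool shows the user (field names are hypothetical)."""
    sl: list[str]
    tl: list[str]
    alignments: list[tuple[int, int]] = field(default_factory=list)  # (SL index, TL index)


def add_word(tl: list[str], pos: int, word: str) -> list[str]:
    return tl[:pos] + [word] + tl[pos:]


def delete_word(tl: list[str], pos: int) -> list[str]:
    return tl[:pos] + tl[pos + 1:]


def modify_word(tl: list[str], pos: int, word: str) -> list[str]:
    return tl[:pos] + [word] + tl[pos + 1:]


def move_word(tl: list[str], src: int, dst: int) -> list[str]:
    """Change word order: take the word at `src` and reinsert it at `dst`."""
    rest = tl[:src] + tl[src + 1:]
    return rest[:dst] + [tl[src]] + rest[dst:]


pair = TranslationPair(["I", "see", "them"], ["Yo", "veo", "los"], [(0, 0), (1, 1), (2, 2)])
corrected = delete_word(move_word(pair.tl, 2, 1), 0)
print(corrected)
```

Moving los before veo and then deleting Yo turns *Yo veo los into Los veo (ignoring capitalization).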
1st Eng2Spa User Study [LREC 2004]. MT error classification with 9 linguistically motivated classes [Flanagan, 1994], [White et al. 1994]: word order, sense, agreement error (number, person, gender, tense), form, incorrect word, and no translation. Results (precision / recall): error detection 90% / 89%; error classification 72% / 71%. Setup: manual grammar (12 rules) + 442 lexical entries; test set of 32 sentences from the AVENUE Elicitation Corpus (4 correct / 28 incorrect). In the LREC paper, we show that users can reliably detect and classify MT errors, given an initial error classification with 9 linguistically motivated classes. It turns out that this is harder than it needs to be, since some of these distinctions are not important for automatic RR, while other information that could be elicited is not being elicited with this classification.
Translation Rules. Rule formalism: X-side = English (SL), Y-side = Spanish (TL).
{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
 (X1::Y1) (X2::Y3) (X3::Y2)
 ;; English parsing:
 ((x0 def) = (x1 def))   ; NP definiteness = DET definiteness
 (x0 = x3)               ; NP = N (N is the head of the NP)
 ;; Spanish generation:
 ((y1 agr) = (y2 agr))   ; DET agreement = N agreement
 ((y3 agr) = (y2 agr))   ; ADJ agreement = N agreement
 (y2 = x3)               ; pass the features of the English N to the Spanish N
)
ADJ::ADJ |: [nice] -> [bonito]
(
 (X1::Y1)
 ((x0 pos) = adj)
 ((x0 form) = nice)
 ((y0 agr num) = sg)     ; Spanish ADJ is singular in number
 ((y0 agr gen) = masc)   ; Spanish ADJ is masculine in gender
)
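A minimal Python sketch of how {NP,8} maps an English [DET ADJ N] onto Spanish [DET N ADJ]: the alignments X1::Y1, X2::Y3, X3::Y2 reorder the constituents, and the two agreement equations filter the lexical choices. The toy lexicon and the treatment of unification as plain equality of agreement features are simplifying assumptions.

```python
# Alignments of {NP,8}, 1-based: SL position -> TL position (X1::Y1, X2::Y3, X3::Y2).
NP8_ALIGN = {1: 1, 2: 3, 3: 2}

# Hypothetical toy lexicon: English word -> [(Spanish form, agreement features)].
LEX = {
    "a": [("una", {"gen": "fem", "num": "sg"}), ("un", {"gen": "masc", "num": "sg"})],
    "nice": [("bonita", {"gen": "fem", "num": "sg"}), ("bonito", {"gen": "masc", "num": "sg"})],
    "house": [("casa", {"gen": "fem", "num": "sg"})],
}


def transfer_np8(det: str, adj: str, noun: str) -> list[str]:
    """Apply {NP,8} to an English [DET ADJ N], keeping only agreeing choices."""
    results = []
    for d, d_agr in LEX[det]:
        for a, a_agr in LEX[adj]:
            for n, n_agr in LEX[noun]:
                # ((y1 agr) = (y2 agr)) and ((y3 agr) = (y2 agr)), as equality checks
                if d_agr == n_agr and a_agr == n_agr:
                    by_sl_pos = {1: d, 2: a, 3: n}
                    tl = [""] * 3
                    for x, y in NP8_ALIGN.items():
                        tl[y - 1] = by_sl_pos[x]
                    results.append(" ".join(tl))
    return results


print(transfer_np8("a", "nice", "house"))
```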
Automatic Rule Refinement Framework. Find the best RR operations given a grammar (G), a lexicon (L), a (set of) source language sentence(s) (SL), a (set of) target language sentence(s) (TL), its parse tree (P), and a minimal correction of TL (TL'), such that $TQ_2 > TQ_1$, i.e. the translation quality of the refined grammar is higher than that of the original grammar. This can also be expressed as:
$$\max_{\mathrm{RR}} \; TQ\big(TL \mid TL', P, SL, \mathrm{RR}(G, L)\big) \quad \text{subject to} \quad TQ_2 > TQ_1$$
and/or coverage increased, and/or ambiguity reduced.
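The optimization can be read as a search over candidate refinement operations. The sketch assumes each operation is a function from grammar to grammar and that some translation-quality function `tq` is available (for instance, an automatic metric against user corrections); both are hypothetical, and only the $TQ_2 > TQ_1$ condition is modeled, not coverage or ambiguity.

```python
from typing import Callable, Iterable, Optional

Grammar = set[str]
Op = Callable[[Grammar], Grammar]


def choose_refinement(ops: Iterable[Op], grammar: Grammar,
                      tq: Callable[[Grammar], float]) -> Optional[Op]:
    """Return the op maximizing TQ of the refined grammar, or None if none beats TQ1."""
    best_op, best_tq = None, tq(grammar)  # start from TQ1, the original grammar
    for op in ops:
        score = tq(op(grammar))           # TQ2 for this candidate refinement
        if score > best_tq:
            best_op, best_tq = op, score
    return best_op


# Toy usage with hypothetical operations and a hypothetical quality function.
def no_change(g: Grammar) -> Grammar:
    return set(g)


def add_np8_prime(g: Grammar) -> Grammar:
    return g | {"NP,8'"}


def toy_tq(g: Grammar) -> float:
    return 1.0 if "NP,8'" in g else 0.5


print(choose_refinement([no_change, add_np8_prime], {"NP,8"}, toy_tq))
```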
Types of Refinement Operations. 1. Refine a translation rule: R0 -> R1 (R0 modified, made either more specific or more general). R0: NP: [DET ADJ N] -> [DET N ADJ]; a nice house -> *una casa bonito. R1: R0 + (N gender = ADJ gender); a nice house -> una casa bonita. There are two main refinement operations, which can be applied both to grammar rules and to lexical entries, with some minor differences. Sometimes a rule is mostly correct but is missing an agreement constraint, as in this example. Automatically learned grammars sometimes over-generalize, and in such cases a rule may instead have an extra agreement constraint, which makes the rule incorrect. Refining a rule results in higher precision and a tighter grammar, reducing the size of the candidate list.
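A minimal sketch of the refine operation, assuming constraints are predicates over the noun's and the adjective's gender (a hypothetical simplification of the rule formalism). Without the N-ADJ gender constraint, R0 licenses both bonito and bonita; adding it (R1) leaves only the correct form.

```python
ADJ_FORMS = {"bonito": "masc", "bonita": "fem"}  # both translate "nice"
NOUN_GENDER = {"casa": "fem"}


def gender_agreement(noun_gen: str, adj_gen: str) -> bool:
    """N gender = ADJ gender."""
    return noun_gen == adj_gen


def generate(noun: str, constraints: list) -> list[str]:
    """All 'una <noun> <adj>' outputs of the NP rule that satisfy its constraints."""
    out = []
    for adj, adj_gen in ADJ_FORMS.items():
        if all(check(NOUN_GENDER[noun], adj_gen) for check in constraints):
            out.append(f"una {noun} {adj}")
    return out


R0 = []                        # original rule: agreement constraint missing
R1 = R0 + [gender_agreement]   # refined rule: more specific

print(generate("casa", R0))
print(generate("casa", R1))
```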
Types of Refinement Operations (2). 2. Bifurcate a translation rule: R0 -> R0 (same, general rule) + R1 (modified copy of R0, specific rule). R0: NP: [DET ADJ N] -> [DET N ADJ]; a nice house -> una casa bonita. R1: NP: [DET ADJ N] -> [DET ADJ N]; a great artist -> un gran artista. The second type of operation adds a specific rule by making a copy of a general rule, leaving the original untouched, and making changes to the copy (the specific rule): leave the original rule R0 as is and modify the duplicate, R1. This example shows the refinement operation required to also cover the pre-nominal order of ADJ in Spanish (the exception to the general rule). A more comprehensive discussion of possible types of refinement operations can be found in the proposal document. When users add or delete a word, the safest refinement operation is to bifurcate, until there is evidence (in interactive mode) that the original rule can never apply. Bifurcation gives high precision but increases the size of the candidate list. Coverage in terms of TL patterns: every time R0 is modified into R0', the coverage on the TL side increases.
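Bifurcation can be sketched as copy-then-modify: the general rule stays in the grammar and a modified deep copy is added under a new name. Rules are represented here as hypothetical Python dicts, not in the AVENUE rule format.

```python
import copy


def bifurcate(grammar: dict, rule_id: str, new_id: str, modify) -> dict:
    """Keep the general rule untouched; add a modified copy as a new, specific rule."""
    specific = copy.deepcopy(grammar[rule_id])
    modify(specific)
    refined = dict(grammar)
    refined[new_id] = specific
    return refined


grammar = {"NP,8": {"sl": ["DET", "ADJ", "N"], "tl": ["DET", "N", "ADJ"], "constraints": []}}


def pre_nominal(rule: dict) -> None:
    """Hypothetical modification: Spanish ADJ placed before the noun."""
    rule["tl"] = ["DET", "ADJ", "N"]


refined = bifurcate(grammar, "NP,8", "NP,8'", pre_nominal)
print(refined)
```

Because the copy is deep, changing the specific rule can never alter the general rule it was derived from.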
Formalizing Error Information. Wi = error; Wi' = correction; Wc = clue word. Example: NP: [DET ADJ N] -> [DET N ADJ]; a nice house -> *una casa bonito, Wi = bonito. Wi is the word that the user needs to modify, delete or drag into a different position for the sentence to be correct. Wi' is the user's modification of Wi, or the word that the user needs to add for the sentence to be correct. Wc is the word that gives away the clue about what triggered the correction, namely the cause of the error. For example, in the case of lack of agreement between a noun and the adjective that modifies it, as in *el auto roja (the red car), Wc is instantiated with auto, namely the word that gives away the clue about what triggered the correction.
Triggering Feature Detection. Comparison at the feature level to detect the triggering feature(s), using the delta function δ(Wi, Wi'). Examples: δ(bonito, bonita) = {gender}; δ(comiamos, comia) = {person, number}; δ(mujer, guitarra) = {}. If the set is empty, we need to postulate a new binary feature. Once we have the user's correction (Wi'), we can compare it with Wi at the feature level and find the triggering feature, namely the feature attribute(s) with a different value in Wi and Wi'. An empty delta set means that the existing feature language is not expressive enough to distinguish between Wi and Wi'. Interactive and Automatic Rule Refinement
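The delta function compares two feature structures attribute by attribute. In this sketch the feature values are assumptions chosen to reproduce the slide's examples (in particular, comia is given 3rd person singular, although the form is also 1st person singular), and an empty delta falls back to postulating a new binary feature named feat1, as on the slides.

```python
# Hypothetical feature structures for the slide's examples.
LEX = {
    "bonito": {"pos": "adj", "gender": "masc", "number": "sg"},
    "bonita": {"pos": "adj", "gender": "fem", "number": "sg"},
    "comiamos": {"pos": "v", "person": 1, "number": "pl", "tense": "imperfect"},
    "comia": {"pos": "v", "person": 3, "number": "sg", "tense": "imperfect"},
    "mujer": {"pos": "n", "gender": "fem", "number": "sg"},
    "guitarra": {"pos": "n", "gender": "fem", "number": "sg"},
}


def delta(wi: dict, wi_prime: dict) -> set[str]:
    """Attributes whose values differ between Wi and Wi' (a missing value counts as different)."""
    attrs = wi.keys() | wi_prime.keys()
    return {a for a in attrs if wi.get(a) != wi_prime.get(a)}


def triggering_features(wi: str, wi_prime: str, new_feature: str = "feat1") -> set[str]:
    """The delta set, or a newly postulated binary feature when the set is empty."""
    d = delta(LEX[wi], LEX[wi_prime])
    return d if d else {new_feature}


print(triggering_features("bonito", "bonita"))
print(triggering_features("mujer", "guitarra"))
```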
Deciding on the Refinement Operation. Given (1) the action performed by the user (add, delete, modify, change word order) and (2) the error information available (clue word, word alignments, etc.), the proposed method can determine which refinement action to take.
Rule Refinement Operations. [Figure: tree of rule refinement operations. The top-level branches are the user's correcting actions (Modify, Add, Delete, Change Word Order); lower branches distinguish whether a clue word was identified (+Wc / –Wc), whether the word is aligned (+al / –al), whether POSi = POSi' or POSi ≠ POSi', and whether the required rule exists in the grammar (+rule / –rule); one leaf points to the Rule Learner.] This tree represents the space of all possible rule refinement operations, given a user correcting action and the amount of information available at refinement time. The following example shows how the proposed approach works, through a simulation.
Proposed Work: batch and interactive mode; user studies; evaluation.
Rule Refinement Example: Change Word Order. SL: Gaudí was a great artist. MT system output (TL): *Gaudí era un artista grande. Goal (given by the user correction): Gaudí era un gran artista. Given incorrect MT output and the translation grammar that produced it, an expert would know how to fix the error (an editor fixes the sentence and a computational linguist fixes the translation rule), but this is a very hard problem for software. Here is an example of how my approach solves it.
1. Error Information Elicitation. Given the TL produced by the MT system for a specific SL sentence, feed the translation pair through the TCTool to elicit the correction and error information.
2. Variable Instantiation from the Log File. Correcting actions: 1. Word order change (artista grande -> grande artista): Wi = grande. 2. Edited grande into gran: Wi' = gran. The user identified artista as the clue word: Wc = artista. In this case, even if the user had not identified Wc, the refinement process would have been the same. The correction log file, together with the transfer engine output (parse tree), is the input to the Rule Refinement module for variable instantiation. Assumption: the reason we know that "great" should be "gran" and not "grande" is that it modifies "artist". (Here artista might not really be the triggering word, since gran is always a pre-nominal adjective, but it is treated as such for the sake of the argument.)
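Variable instantiation reads Wi, Wi' and Wc off the user's correcting actions. The log format below (one dict per action, with `action`, `word`, `new` and `clue` keys, and 0-based positions) is hypothetical; the TCTool's real log may differ.

```python
# Hypothetical log for "*Gaudí era un artista grande" -> "Gaudí era un gran artista".
log = [
    {"action": "change_word_order", "word": "grande", "from": 4, "to": 3},
    {"action": "modify", "word": "grande", "new": "gran", "clue": "artista"},
]


def instantiate(log: list[dict]) -> dict:
    """Instantiate Wi, Wi' and Wc from a list of correcting actions."""
    wi = wi_prime = wc = None
    for entry in log:
        wi = wi or entry["word"]          # Wi: the first word the user touched
        if entry["action"] == "modify":
            wi_prime = entry["new"]       # Wi': the user's modification of Wi
        wc = entry.get("clue", wc)        # Wc: only if the user identified a clue word
    return {"Wi": wi, "Wi'": wi_prime, "Wc": wc}


print(instantiate(log))
```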
3. Retrieve Relevant Lexical Entries. There is no lexical entry for [great -> gran], so duplicate the lexical entry [great -> grande] and change its TL side:
ADJ::ADJ |: [great] -> [grande] ((X1::Y1) (…) ((y0 agr num) = sg) ((y0 agr gen) = masc))
ADJ::ADJ |: [great] -> [gran] ((X1::Y1) (…) ((y0 agr num) = sg) ((y0 agr gen) = masc))
This lexicon refinement is a bifurcation: Lex0 -> Lex0 + Lex1 [Lex0 + TL word]. Even if we had a morphological analyzer, it would find no difference between grande and gran (Parole tags): grande: grande AQ0CS0, grande NCCS000; gran: gran AQ0CS0, where A = adjective, Q = qualifying, 0 = no degree, C = common gender, S = singular number, 0 = no case. So my approach can do something we could not do even with a morphology module.
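To see why a morphological analyzer does not help, the Parole adjective tag can be decoded position by position using the glosses on the slide. This sketch covers only the tag characters that appear here; the analyses are copied from the slide.

```python
# Position-by-position glosses of the 6-character adjective tag, as given on the slide.
FIELDS = [
    ("category", {"A": "adjective"}),
    ("type", {"Q": "qualifying"}),
    ("degree", {"0": "none"}),
    ("gender", {"C": "common"}),
    ("number", {"S": "singular"}),
    ("case", {"0": "none"}),
]

ANALYSES = {"grande": ["AQ0CS0", "NCCS000"], "gran": ["AQ0CS0"]}


def decode(tag: str) -> dict:
    """Map each tag character to its gloss; unknown characters are kept as-is."""
    return {name: glosses.get(ch, ch) for (name, glosses), ch in zip(FIELDS, tag)}


def adjective_readings(word: str) -> list[dict]:
    return [decode(t) for t in ANALYSES[word] if t.startswith("A")]


# Identical adjective readings: the tags cannot tell grande and gran apart.
print(adjective_readings("grande") == adjective_readings("gran"))
```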
4. Finding Triggering Feature(s). Feature function: δ(Wi, Wi') = ∅, so we need to postulate a new binary feature: feat1. An empty delta set means that the feature language is not expressive enough to distinguish between "gran" and "grande". 5. Blame Assignment (from the MT system output). Blame assignment determines which rules need to be refined, using the parse output by the transfer engine, which traces what rules applied and in what order: tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>. The grammar rules involved are S,1 … NP,8, and the rule to refine is NP,8, which generated the misplaced ADJ. Even if the user had not identified Wc, the same refinement process could be done fully automatically, since grande/gran is moved around locally within the NP.
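Blame assignment can be sketched as walking the bracketed tree output by the transfer engine and returning the rules that dominate Wi; the innermost one is the rule to refine. The tokenizer assumes exactly the tree syntax shown above (parenthesized `LABEL,ID` nodes with quoted words); `blame_path` is a hypothetical helper name.

```python
import re

TREE = ('<((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) '
        '(NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>')

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def blame_path(tree: str, word: str) -> list[str]:
    """Rule labels dominating `word`, outermost first (its lexical node excluded)."""
    stack: list = []
    expect_label = False
    for tok in TOKEN.findall(tree.strip("<>")):
        if tok == "(":
            stack.append(None)        # label filled in by the next token, if any
            expect_label = True
        elif tok == ")":
            stack.pop()
            expect_label = False
        elif tok.startswith('"'):
            if tok.strip('"') == word:
                # stack[-1] is the lexical node (e.g. ADJ,5:4); its ancestors are rules
                return [label for label in stack[:-1] if label is not None]
        elif expect_label:
            stack[-1] = tok
            expect_label = False
    raise ValueError(f"{word!r} not found in tree")


path = blame_path(TREE, "GRANDE")
print(path, "-> refine", path[-1])
```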
6. Variable Instantiation in the Rules. Wi = grande -> POSi = ADJ = Y3, y3; Wc = artista -> POSc = N = Y2, y2. Instantiating these variables tells us which variables need to be changed in the rule when the order gets flipped.
{NP,8}
;;                                 Y1  Y2 Y3
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
 (X1::Y1) (X2::Y3) (X3::Y2)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y2 agr))   ; det-noun agreement
 ((y3 agr) = (y2 agr))   ; adj-noun agreement
 (y2 = x3)
)
7. Refining Rules. Bifurcate NP,8 -> NP,8 (R0) + NP,8' (R1), flipping the order of ADJ and N:
{NP,8'}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
 (X1::Y1) (X2::Y2) (X3::Y3)
 ((x0 def) = (x1 def))
 (x0 = x3)
 ((y1 agr) = (y3 agr))   ; det-noun agreement
 ((y2 agr) = (y3 agr))   ; adj-noun agreement
 (y3 = x3)               ; the Spanish N is now Y3
 ((y2 feat1) =c +)
)
The =c constraint requires the feature to be specified in the ADJ lexical entry and to be +; if it is underspecified or -, the rule will not unify.
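The flip from {NP,8} to {NP,8'} is mechanical once the variables are instantiated: swap TL positions 2 and 3, rename y2/Y2 and y3/Y3 throughout the alignments and constraints, and append the new =c constraint. The dict representation and the `swap_tl` helper are hypothetical stand-ins for the rule format; the renaming yields (y3 = x3) because the Spanish N moves to position 3.

```python
import re

NP8 = {
    "id": "NP,8",
    "sl": ["DET", "ADJ", "N"],
    "tl": ["DET", "N", "ADJ"],
    "align": ["(X1::Y1)", "(X2::Y3)", "(X3::Y2)"],
    "constraints": ["((x0 def) = (x1 def))", "(x0 = x3)",
                    "((y1 agr) = (y2 agr))", "((y3 agr) = (y2 agr))", "(y2 = x3)"],
}


def swap_tl(rule: dict, i: int, j: int, new_id: str, extra: list[str]) -> dict:
    """Copy `rule`, swap TL positions i and j (1-based), rename y/Y variables, add constraints."""
    mapping = {str(i): str(j), str(j): str(i)}

    def rename(text: str) -> str:
        # yi <-> yj and Yi <-> Yj; x-side variables are left untouched
        return re.sub(r"\b([yY])(\d+)\b",
                      lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), text)

    tl = list(rule["tl"])
    tl[i - 1], tl[j - 1] = tl[j - 1], tl[i - 1]
    return {
        "id": new_id,
        "sl": list(rule["sl"]),
        "tl": tl,
        "align": [rename(a) for a in rule["align"]],
        "constraints": [rename(c) for c in rule["constraints"]] + extra,
    }


NP8_PRIME = swap_tl(NP8, 2, 3, "NP,8'", ["((y2 feat1) =c +)"])
print(NP8_PRIME)
```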
8. Refining Lexical Entries. Modify the grammar and the lexicon by applying the RR operations:
ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = -))
ADJ::ADJ |: [great] -> [gran]
((X1::Y1) (…)
 ((y0 feat1) = +))
Done? Not yet. NP,8 (R0) unifies with ADJ(grande) [feat1 = -]; NP,8' (R1) carries [feat1 =c +] and unifies with ADJ(gran) [feat1 = +]. Candidates: un artista grande; un artista gran; un gran artista (*un grande artista is not generated). We need to restrict the application of the general rule (R0) to post-nominal ADJs only. The refined grammar now produces the correct translation, but it still produces a translation that we know is incorrect (by the minimal post-editing maxim). The refinement operation has increased ambiguity in the grammar: the translation candidate list has more than doubled in size, since both "grande" and "gran" unify with {NP,8}, and "gran" now also unifies with {NP,8'}. Any other adjective underspecified with respect to feat1 (i.e. all new adjectives) will unify with the general rule (NP,8), which is the desired default.
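A small simulation of the bifurcated grammar shows the overgeneration. The encoding is hypothetical: each rule lists its TL order and the constraints on its ADJ, with '=c' requiring the feature to be present and '=' letting an unspecified feature unify. The adjective famoso stands in for a new adjective underspecified for feat1.

```python
ADJS = {"grande": {"feat1": "-"}, "gran": {"feat1": "+"}}  # both translate "great"


def satisfies(feats: dict, feat: str, op: str, value: str) -> bool:
    """'=c' needs the feature present with that value; '=' lets an unspecified feature unify."""
    if op == "=c":
        return feats.get(feat) == value
    return feats.get(feat, value) == value


RULES = {
    "NP,8": {"tl": ["DET", "N", "ADJ"], "adj_constraints": []},                      # R0
    "NP,8'": {"tl": ["DET", "ADJ", "N"], "adj_constraints": [("feat1", "=c", "+")]},  # R1
}


def candidates(adjs: dict, noun: str = "artista") -> set[str]:
    """Every NP translation of 'a great artist' licensed by R0 or R1."""
    out = set()
    for rule in RULES.values():
        for adj, feats in adjs.items():
            if all(satisfies(feats, *c) for c in rule["adj_constraints"]):
                words = {"DET": "un", "N": noun, "ADJ": adj}
                out.add(" ".join(words[p] for p in rule["tl"]))
    return out


print(sorted(candidates(ADJS)))
print(sorted(candidates({"famoso": {}})))
```

The first list contains the incorrect *un artista gran next to the two acceptable outputs; the underspecified adjective only goes through the general rule.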
Add Blocking Constraint. NP,8 (R0) now also carries [feat1 = -], so it unifies only with ADJ(grande) [feat1 = -]; NP,8' (R1) [feat1 =c +] unifies with ADJ(gran) [feat1 = +]. Candidates: un artista grande; *un artista gran (eliminated); un gran artista; *un grande artista. Can we also eliminate incorrect translations automatically? Since great translates both as grande and as gran, both rules apply, one with each type of adjective; eliminating the incorrect candidates automatically yields a tighter grammar (minimal description length).
Making the Grammar Tighter. If Wc = artista: add [feat1 = +] to N(artista), and add an agreement constraint between N and ADJ to NP,8 (R0): ((N feat1) = (ADJ feat1)), i.e. ((y2 feat1) = (y3 feat1)) in {NP,8}. Candidates: *un artista grande; *un artista gran; un gran artista; *un grande artista. If the user identified Wc, the RR module can tag artista with ((feat1) = +) and add this agreement constraint. For each new adjective that appears in this context, users will pick the output produced by one of the two NP rules as best, and the RR module will be able to tag the adjective in the lexicon as pre-nominal (feat1 = +) or post-nominal (feat1 = -). Note that without Wc the RR module generated the correct translation, but it cannot eliminate all incorrect translations.
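The effect of the blocking constraint and of the N-ADJ agreement can be simulated stage by stage. This sketch assumes a simplified unification over flat feature dicts, with =c constraints checked after all other equations; the encoding and helper names are hypothetical, and it is an illustration rather than the transfer engine's unifier.

```python
ADJS = {"grande": {"feat1": "-"}, "gran": {"feat1": "+"}}


def unify(fs: dict, feat: str, value: str) -> bool:
    """'=': succeed if feat is unset (and set it) or already has this value."""
    if fs.get(feat, value) != value:
        return False
    fs[feat] = value
    return True


def licensed(rule: dict, noun_fs: dict, adj_fs: dict) -> bool:
    n, a = dict(noun_fs), dict(adj_fs)  # work on copies of the feature structures
    for kind, feat, value in rule["constraints"]:
        if kind == "adj=" and not unify(a, feat, value):
            return False
        if kind == "agree":  # ((N feat) = (ADJ feat))
            shared = n.get(feat, a.get(feat))
            if shared is not None and not (unify(n, feat, shared) and unify(a, feat, shared)):
                return False
    # =c constraints only check the final, unified structure
    return all(a.get(f) == v for k, f, v in rule["constraints"] if k == "adj=c")


def generate(rules: list, nouns: dict, adjs: dict = ADJS) -> set[str]:
    out = set()
    for rule in rules:
        for noun, n_fs in nouns.items():
            for adj, a_fs in adjs.items():
                if licensed(rule, n_fs, a_fs):
                    words = {"DET": "un", "N": noun, "ADJ": adj}
                    out.add(" ".join(words[p] for p in rule["tl"]))
    return out


R1 = {"tl": ["DET", "ADJ", "N"], "constraints": [("adj=c", "feat1", "+")]}  # NP,8'
R0_BIFURCATED = {"tl": ["DET", "N", "ADJ"], "constraints": []}               # NP,8
R0_BLOCKED = {"tl": ["DET", "N", "ADJ"], "constraints": [("adj=", "feat1", "-")]}
R0_TIGHT = {"tl": ["DET", "N", "ADJ"],
            "constraints": [("adj=", "feat1", "-"), ("agree", "feat1", None)]}

stage1 = generate([R0_BIFURCATED, R1], {"artista": {}})
stage2 = generate([R0_BLOCKED, R1], {"artista": {}})
stage3 = generate([R0_TIGHT, R1], {"artista": {"feat1": "+"}})
print(sorted(stage1), sorted(stage2), sorted(stage3), sep="\n")
```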
Batch Mode Implementation. Given a set of user corrections, apply the refinement module. This covers refinement operations for errors that can be refined fully automatically: 1. just by using correction information; 2. using correction and error information (such as the type of error and whether there is a clue word). These are the main focus of my thesis.
Rule Refinement Operations. This is the refinement typology introduced at the end of the preliminary work: the tree represents the space of all possible rule refinement operations, given a user correcting action and the amount of information available at refinement time.
1. Correction Info Only. Refinement operations that can be performed fully automatically just by using correction information: there is no information about the clue word or error type, just the information from the user's corrections with the TCTool. Add, –Wc, +al(ignment): John and Mary fell -> *Juan y Maria cayeron -> Juan y Maria se cayeron, with an alignment from "se" to "fell" (se cayeron = fell). Modify, POSi ≠ POSi': I will help him fix the car -> *ayudare a el a arreglar el auto -> le ayudare… Modify, POSi = POSi': It is a nice house -> *Es una casa bonito -> Es una casa bonita. Change word order: Gaudi was a great artist -> *Gaudi era un artista grande -> Gaudi era un gran artista.
2. Correction and Error Info. Refinement operations that can be performed fully automatically using correction and error information, such as clue words and error type. Example (PP -> PREP NP): I am proud of you -> *Estoy orgullosa tu -> Estoy orgullosa de ti. We need to know that orgullosa is the word triggering the correction (namely the addition of "de"), Wc = orgullosa, in order to refine it automatically. The rule PP -> PREP NP already exists (the appropriate level of generality for the constraint is set by default and can be confirmed through user interaction).
Interactive Mode Implementation. Extra error information is required to determine the triggering context automatically, so the system needs to give other relevant sentences (minimal pairs) to the user at run time. This mode covers refinement operations for errors that can be refined fully automatically but that 3. require a reasonable amount of further user interaction and can be solved with the available correction and error information. When the refinement module does not have enough information to determine the triggering context, it presents users with more sentences to evaluate. This typically requires more effort from users and takes longer, so it will be evaluated on a smaller test set.
Focus 3: Rule Refinement Operations. These are the refinements for error types that require a reasonable amount of further user interaction and can be solved with the available correction and error information. –rule: the rule required to generate the new POS sequence is not in the grammar (e.g. if the grammar had no PP rule). Let's take a closer look at the following example and how the proposed approach would refine it: I see them -> *Veo los -> Los veo.
Example Requiring a Minimal Pair. 1. Run the SL sentence through the transfer engine: I see them -> *veo los (correct: los veo). 2. Wi = los, but there is no Wi' nor Wc, so we need a minimal pair to determine the appropriate refinement: I see cars -> veo autos. 3. Triggering feature(s): δ(los, autos) = {pos}, with PRON(los)[pos = pron] and N(autos)[pos = n]. In Spanish, if the direct object is a pronoun (clitic), it needs to be in pre-verbal position. Here Wc is not a specific word, but rather a combination of features in Wi (pron + acc). If operating in batch mode, I would flip the order of the VP rule, which would generate the correct translation but would not eliminate incorrect translations (*autos veo, etc.).
Refining and Adding Constraints. VP,3: [VP NP] -> [VP NP] (general rule). VP,3': [VP NP] -> [NP VP] + [NP pos =c pron]. Percolate the triggering features up to the constituent level: NP: [PRON] -> [PRON] + [NP pos = PRON pos]. Block the application of the general rule (VP,3): [VP NP] -> [VP NP] + [NP pos = (*NOT* pron)]. The constraint in VP,3' only has an effect if the pos feature is passed from the PRON up to the NP constituent where applicable; the MT output tree tells us whether this is applicable and which rule to change. This has the desired effect of not overgenerating in cases where the object NP is not a pronoun, but we still need to block the application of the general rule, which is what the *NOT* constraint on VP,3 does.
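The refined VP rules can be sketched as two orderings guarded by the pos feature that the NP rule percolates up from its head. The toy lexicon and the function names are hypothetical.

```python
# Hypothetical toy lexicon: English word -> (Spanish form, features).
LEX = {
    "see": ("veo", {"pos": "v"}),
    "them": ("los", {"pos": "pron", "case": "acc"}),
    "cars": ("autos", {"pos": "n"}),
}


def np_rule(word: str) -> tuple[str, dict]:
    """NP: [PRON] -> [PRON] (or N), with [NP pos = PRON pos]: percolate pos up to the NP."""
    form, feats = LEX[word]
    return form, {"pos": feats["pos"]}


def vp3_ok(np_feats: dict) -> bool:
    """VP,3 blocking constraint [NP pos = (*NOT* pron)]."""
    return np_feats.get("pos") != "pron"


def vp3_prime_ok(np_feats: dict) -> bool:
    """VP,3' constraint [NP pos =c pron]: pos must be present and equal to pron."""
    return np_feats.get("pos") == "pron"


VP_RULES = [(["VP", "NP"], vp3_ok), (["NP", "VP"], vp3_prime_ok)]  # VP,3 and VP,3'


def translate_vp(verb: str, obj: str) -> list[str]:
    verb_form, _ = LEX[verb]
    np_form, np_feats = np_rule(obj)
    parts = {"VP": verb_form, "NP": np_form}
    return [" ".join(parts[c] for c in order) for order, ok in VP_RULES if ok(np_feats)]


print(translate_vp("see", "them"))
print(translate_vp("see", "cars"))
```

Each object NP now licenses exactly one order: the clitic goes before the verb and the full noun after it.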
Generalization Power. When the triggering feature already exists in the feature language (pos, gender, number, etc.), unlike when the feature language is not expressive enough and a new feature (feat1) has to be postulated, the refinement generalizes. With the previous refinement, all these sentences will now be translated correctly by the system: I love him -> lo amo (before: *amo lo); They called me yesterday -> me llamaron ayer (before: *llamaron me ayer); Mary helps her with her homework -> Maria le ayuda con sus tareas (before: *Maria ayuda le con sus tareas). A different rule (NP VP NP) is involved in John gave her the book -> Juan le dio el libro.
Proposed Work: User Studies. TCTool: new MT error classification (Eng2Spa). Different language pair: Mapudungun-Spanish or Quechua-Spanish. Batch vs. interactive mode. Amount of information elicited: just corrections vs. corrections + error information. 1. New way of eliciting MT error information: simple statements/questions about the error, possibly ordered from most to least informative. Eng2Spa II: test the new version of the TCTool (v0.2) and compare error correction and classification accuracy results. We need to find the tradeoff between safety and generality; in particular, how much does interactivity add (in safety) to the most general batch settings? (Alon's suggestion.) This will probably result in at least 4-8 more user studies.
Evaluation of the Refined MT System. Evaluate the best translation with automatic evaluation metrics (BLEU, NIST, METEOR). Evaluate translation candidate list precision (which also measures parsimony). Automatic evaluation of the refined MT output is necessary to maximize translation quality (TQ) and coverage (C) while deciding which RR operations to apply. We will have 2 or more reference translations (sets of user corrections).
Evaluate the Best Translation. Method: compare the raw MT output with the MT output of the refined grammar and lexicon using automatic evaluation metrics, such as BLEU and METEOR. Hypothesis file (translations to be evaluated automatically): for the raw MT output, the best sentence (picked by the user as correct or as requiring the least amount of correction); for the refined MT output, the best candidate from the list, picked with the sentence-level METEOR score. Run all automatic metrics on the new hypothesis file, using the user corrections as reference translations. Assumption: user corrections = gold-standard reference translations.
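Picking the best candidate from the refined system's list needs a sentence-level score against the user correction. The official METEOR tool also uses stemming, synonym matching and a more careful alignment; the sketch below is only a simplified stand-in modeled on METEOR's recall-weighted harmonic mean, 10PR/(R+9P), and its fragmentation penalty, 0.5(chunks/matches)^3, using exact matches and a greedy alignment. The function names are hypothetical.

```python
def meteor_like(hyp: str, ref: str) -> float:
    """Simplified METEOR-style score: exact unigram matches only, greedy alignment."""
    h, r = hyp.split(), ref.split()
    used = [False] * len(r)
    alignment = []  # (hyp index, ref index), in hypothesis order
    for i, word in enumerate(h):
        for j, ref_word in enumerate(r):
            if not used[j] and ref_word == word:
                used[j] = True
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision, recall = matches / len(h), matches / len(r)
    fmean = 10 * precision * recall / (recall + 9 * precision)
    # A chunk is a maximal run of matches that are adjacent in both hypothesis and reference.
    chunks = 1 + sum(
        1 for (i1, j1), (i2, j2) in zip(alignment, alignment[1:])
        if not (i2 == i1 + 1 and j2 == j1 + 1)
    )
    penalty = 0.5 * (chunks / matches) ** 3
    return fmean * (1 - penalty)


def pick_best(candidates: list[str], user_correction: str) -> str:
    """Candidate with the highest sentence-level score against the user correction."""
    return max(candidates, key=lambda c: meteor_like(c, user_correction))


candidates = ["Gaudi era un artista grande", "Gaudi era un artista gran", "Gaudi era un gran artista"]
print(pick_best(candidates, "Gaudi era un gran artista"))
```

The fragmentation penalty matters here: *Gaudi era un artista gran matches every reference word, so only its scrambled order separates it from the correct translation.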
Evaluate Candidate List Precision
Precision = tp / (tp + fp), where tp + fp = the total number of translation candidates (TC) and each candidate's "tp" value is binary {0,1}.
"tp" is binary because a candidate is either identical to the user correction (reference translation), 1, or it isn't, 0. Precision is also a measure of parsimony: as the candidate list grows, precision tends to decrease. Recall (tp / (tp + fn)) makes no sense in this context, since we would need to know all the possible correct translations in order to calculate fn, and there is no predefined set of fn.
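Candidate list precision as defined above can be sketched as follows. It assumes that a candidate counts as a true positive only when it matches the user correction exactly after whitespace and case normalization; that normalization, and the function names, are our own illustrative choices:

```python
def normalize(sentence: str) -> str:
    """Case- and whitespace-insensitive comparison key (an assumption of this sketch)."""
    return " ".join(sentence.lower().split())


def candidate_list_precision(test_set):
    """test_set: list of (candidate_list, user_correction) pairs."""
    tp = total = 0
    for candidates, correction in test_set:
        ref = normalize(correction)
        tp += sum(1 for c in candidates if normalize(c) == ref)  # binary per candidate
        total += len(candidates)                                 # tp + fp
    return tp / total if total else 0.0


lean = [(["Los veo", "Veo los"], "Los veo"), (["Vi a la mujer"], "Vi a la mujer")]
bloated = [(["Los veo", "Veo los", "Lo veo"], "Los veo"), (["Vi a la mujer"], "Vi a la mujer")]
print(candidate_list_precision(lean), candidate_list_precision(bloated))
```

The second list has one extra incorrect candidate, so precision drops from 2/3 to 1/2, which is the parsimony effect described above.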
Expected Contributions
1. An efficient online GUI to display translations and alignments and solicit pinpoint fixes from non-expert bilingual users.
2. An expandable set of rule refinement operators, triggered by user corrections, to automatically refine and expand different types of grammars.
3. A mechanism to automatically evaluate rule refinements, with user corrections as reference translations.
Status: (1) 90% done; (2) 50% done (laying out the theory and figuring out how to do it is an important contribution); (3) 70% done (method identified, easy to implement).
Thesis Timeline
Research components (duration in months):
- Back-end implementation: 7
- User studies: 3
- Resource-poor language (data + manual grammar): 2
- Adapt system to new language pair: 1
- Active Learning methods: 1
- Evaluation: 1
- Write and defend thesis: 3
Total: 18
Expected graduation date: May 2006
Acknowledgements: Committee, Avenue team, friends and colleagues
References
Add references: related work, Probst et al. 2002, AL
Thanks! Questions?
Proposed Work: Data Set
Split the development set (~400 sentences) into:
- Dev set: run user studies, develop the refinement module, validate functionality
- Test set: evaluate the effect of refinement operations
- + Wild test set (from naturally occurring text)
Requirement: sentences need to be fully parsed by the grammar.
1. Sentences from the Typological and Structural Elicitation Corpus, categorized by error, which can be fully parsed by the original manual grammar (50-100 rules). [Right now ~40 rules (from 12 in the previous manual grammar).]
MT error correction → larger data set; MT error correction + error information → smaller data set.
Some Questions
Is the refinement process deterministic?
Others
- TCTool demo
- Simulation
- RR operation patterns
- Automatic evaluation feasibility study
- AMTA paper results: BLEU, NIST and METEOR
- Precision, recall and F1
Input to RR Module
Automatic Rule Adaptation
- User correction log file
- Transfer engine output (+ parse tree):
sl: I see them
tl: VEO LOS
tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,0 (PRON,2:3 "LOS") ) ) ) )>
sl: I see cars
tl: VEO AUTOS
tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,2 (N,1:3 "AUTOS") ) ) ) )>
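To use the transfer engine output, the RR module has to read the bracketed parse trees. A minimal parser sketch, assuming only the format visible above (labels such as "VP,3" or "V,1:2", target words in double quotes, and an unlabeled outer wrapper); `parse_tree` and `leaves` are hypothetical helpers, and the numeric parts of the labels are simply dropped:

```python
import re

# Tokens: parentheses, quoted words, or bare labels such as "VP,3" or "V,1:2".
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_tree(s: str):
    """Parse a bracketed tree into nested (category, children) tuples."""
    tokens = TOKEN.findall(s.strip().lstrip("<").rstrip(">"))
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", f"expected '(' at token {pos}"
        pos += 1
        label, children = None, []
        while tokens[pos] != ")":
            tok = tokens[pos]
            if tok == "(":
                children.append(node())           # nested constituent
            elif tok.startswith('"'):
                children.append(tok.strip('"'))   # target-language word
                pos += 1
            else:
                label = tok                       # e.g. "NP,0"
                pos += 1
        pos += 1  # consume ")"
        if label is None and len(children) == 1:
            return children[0]  # unlabeled wrapper around the S node
        return (label.split(",")[0] if label else None, children)

    return node()


def leaves(tree):
    """Return (POS, word) pairs in surface order."""
    category, children = tree
    if all(isinstance(c, str) for c in children):
        return [(category, w) for w in children]
    return [leaf for child in children for leaf in leaves(child)]


them = parse_tree('<((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,0 (PRON,2:3 "LOS") ) ) ) )>')
cars = parse_tree('<((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,2 (N,1:3 "AUTOS") ) ) ) )>')
print(leaves(them))  # [('V', 'VEO'), ('PRON', 'LOS')]
print(leaves(cars))  # [('V', 'VEO'), ('N', 'AUTOS')]
```

The two trees differ only in the POS of the object (PRON vs. N), which is the kind of contrast a refinement keyed on an existing feature could use.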
Completed Work: Types of RR Operations
Automatic Rule Adaptation
Grammar:
- Bifurcate: R0 → R0 + R1 [= R0' + constr]; Cov[R0] → Cov[R0, R1]
- Bifurcate: R0 → R1 [= R0 + constr = -] + R2 [= R0' + constr =c +]; Cov[R0] → Cov[R1, R2]
- Refine: R0 → R1 [= R0 + constr]; Cov[R0] → Cov[R1]
Lexicon:
- Bifurcate: Lex0 → Lex0 + Lex1 [= Lex0 + constr]
- Refine: Lex0 → Lex1 [= Lex0 + constr]
- Bifurcate: Lex0 → Lex0 + Lex1 [= Lex0 + TLword]
- → Lex1 (adding a lexical item)
Grammar: bifurcate leaves the original rule as is and modifies the copy (e.g., changes the word order and makes it more specific [VP: V NPpron in Spanish]); refine makes the rule more specific [missing agreement constraint]; the second type of bifurcation splits the rule into a general case (blocking constraint) and a specific case (triggering constraint) [pre-nominal adjectives]. Coverage is measured in terms of TL patterns: every time R0 is modified into R0', the coverage on the TL side increases.
Lexicon examples: 1. cayeron → se cayeron; 2. woman + animate = +; 3. OVW; 4. sense missing.
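The grammar operations above can be sketched on a toy rule representation. This assumes a rule is just a TL constituent order plus a set of constraint strings, and that coverage can be approximated by the set of TL orders the grammar produces; `Rule`, `refine`, `bifurcate` and `coverage` are hypothetical names, and "(ADJ feat1)" reuses the postulated feature mentioned earlier:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    rhs: tuple                            # target-language constituent order
    constraints: frozenset = frozenset()


def refine(rule, constraint):
    """R0 -> R1 [= R0 + constr]: same order, more specific."""
    return [replace(rule, constraints=rule.constraints | {constraint})]


def bifurcate(rule, new_rhs, constraint, block_original=False):
    """Keep a version of R0 and add a reordered, more specific copy R0'."""
    if block_original:
        # R0 -> R1 [= R0 + constr = -] + R2 [= R0' + constr =c +]
        kept = replace(rule, constraints=rule.constraints | {f"{constraint} = -"})
        new = Rule(new_rhs, rule.constraints | {f"{constraint} =c +"})
    else:
        # R0 -> R0 + R1 [= R0' + constr]
        kept = rule
        new = Rule(new_rhs, rule.constraints | {constraint})
    return [kept, new]


def coverage(grammar):
    """Approximate coverage as the set of TL orders the grammar can produce."""
    return {r.rhs for r in grammar}


R0 = Rule(("DET", "N", "ADJ"))
refined = refine(R0, "(N gender) = (ADJ gender)")
split = bifurcate(R0, ("DET", "ADJ", "N"), "(ADJ feat1)", block_original=True)
print(len(coverage(refined)), len(coverage(split)))  # 1 2
```

Refinement leaves TL coverage unchanged while adding a constraint; bifurcation adds a new TL order. Whether the original is blocked (`block_original=True`) decides if both outputs survive for the same input.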
Manual vs Learned Grammars
Automatic Rule Adaptation
Setup:
- Same test set as the user study (32 sentences)
- Same manual grammar + automatically learned grammar using Avenue's Rule Learning module [Probst et al. 2002] (+ added feature constraints)
Manual inspection: looking at the first 5 translations.
Automatic MT evaluation [AMTA 2004]: looking at the best translation for each grammar and comparing them.

                 NIST   BLEU   METEOR
Manual grammar   4.3    0.16   0.6
Learned grammar  3.7    0.14   0.55

The automatic evaluation reflects the findings of the manual evaluation: even though the manual grammar output is better, it is the same as the learned grammar output in many cases, so the scores are not that different.
Different types of errors require different types of RR operations. Learned grammars have a higher level of lexicalization and need the RR module to achieve the appropriate level of generalization.
Conclusions:
- The manual grammar will need to be refined to encode exceptions, whereas the learned grammar will need to be refined to achieve the right level of generalization.
- We expect the RR module to give the most leverage when combined with the learned grammar.
Human Oracle Experiment
Completed Work: Automatic Rule Adaptation
As a feasibility experiment, we compared the raw MT output with manually corrected MT output: the difference is statistically significant (confidence interval test). This is an upper bound on how much difference we should expect any refinement approach to make.
NIST is the least discriminative of the metrics, since for bigrams that appear only once (a very small set) the system doesn't get any credit.
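The slide does not say which confidence interval test was used. As one common option, a paired bootstrap over per-sentence scores can be sketched as follows. This is a swapped-in technique, not necessarily the one behind the reported result, and it averages sentence-level scores instead of recomputing corpus-level BLEU or NIST on each resample; the function name and the example scores are hypothetical:

```python
import random


def paired_bootstrap_ci(base, refined, n_samples=1000, alpha=0.05, seed=0):
    """Percentile CI for the mean per-sentence score gain (refined - base)."""
    if len(base) != len(refined) or not base:
        raise ValueError("need two equal-length, non-empty score lists")
    rng = random.Random(seed)
    n = len(base)
    diffs = [r - b for b, r in zip(base, refined)]
    means = sorted(
        sum(diffs[rng.randrange(n)] for _ in range(n)) / n  # resample sentences
        for _ in range(n_samples)
    )
    lower = means[int(alpha / 2 * n_samples)]
    upper = means[int((1 - alpha / 2) * n_samples) - 1]
    return lower, upper


raw = [0.20, 0.30, 0.40, 0.50]        # hypothetical per-sentence scores, raw output
corrected = [0.35, 0.45, 0.50, 0.65]  # hypothetical scores, corrected output
print(paired_bootstrap_ci(raw, corrected))
```

If the whole interval lies above zero, the improvement is significant at the chosen level; resampling sentences in pairs keeps each raw/corrected comparison on the same input.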
Proposed Work: Active Learning
Automatic Rule Adaptation
- Minimize the number of examples a human annotator must label [Cohn et al. 1994], usually by processing examples in order of usefulness.
- Minimize the number of Minimal Pairs presented to users.
Method to ... usually by processing examples in order of usefulness.
1. [Lewis and Catlett 94] used uncertainty as a measure of usefulness.
2. [Callison-Burch 2003] proposed AL to reduce the cost of creating a corpus of labeled training examples for Statistical MT.
Add example of RR op requiring AL and minimal pair.
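Ordering examples by uncertainty, in the spirit of [Lewis and Catlett 94], can be sketched as follows. It assumes each candidate minimal pair comes with a model estimate p that a refinement applies; those estimates, the example ids, and the helper names are hypothetical, and binary entropy is just one possible uncertainty measure:

```python
import math


def binary_entropy(p: float) -> float:
    """Uncertainty (in bits) of a yes/no prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def order_by_uncertainty(candidates):
    """Sort (example_id, p) pairs so the most uncertain examples come first."""
    ranked = sorted(candidates, key=lambda c: binary_entropy(c[1]), reverse=True)
    return [example_id for example_id, _ in ranked]


# Hypothetical minimal pairs with hypothetical model estimates.
pairs = [("mp1", 0.9), ("mp2", 0.5), ("mp3", 0.2)]
print(order_by_uncertainty(pairs))  # ['mp2', 'mp3', 'mp1']
```

Presenting the most uncertain minimal pairs first is one way to reduce how many pairs a user has to judge before the refinement is settled.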
Order Deterministic?
Example: applying the agreement constraint first and then the bifurcation (WWO) yields one C-set; the reverse order yields a different C-set (!=).
Given the same set of corrected sentences, the application of RR operations is therefore not deterministic: it directly depends on the order in which the corrected sentences are seen.
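A toy reproduction of this example, assuming rules are (TL order, constraint set) pairs and that each operation only touches the rule that fired for the corrected sentence. Both are assumptions of this sketch, and `refine` and `bifurcate` are hypothetical helpers:

```python
POST, PRE = ("DET", "N", "ADJ"), ("DET", "ADJ", "N")
AGR = "(N gender) = (ADJ gender)"


def refine(grammar, fired, constraint):
    """Add a constraint to the rule that produced the corrected sentence."""
    return {(rhs, cons | {constraint}) if rhs == fired else (rhs, cons)
            for rhs, cons in grammar}


def bifurcate(grammar, fired, new_rhs, feature):
    """Block the fired rule and add a reordered copy triggered by `feature`."""
    out = set()
    for rhs, cons in grammar:
        if rhs == fired:
            out.add((rhs, cons | {f"{feature} = -"}))
            out.add((new_rhs, cons | {f"{feature} =c +"}))
        else:
            out.add((rhs, cons))
    return out


g0 = {(POST, frozenset())}
# Correction 1: "una casa bonito" -> "una casa bonita" (agreement), produced by POST.
# Correction 2: "un artista grande" -> "un gran artista" (word order), produced by POST.
agr_first = bifurcate(refine(g0, POST, AGR), POST, PRE, "(ADJ feat1)")
wo_first = refine(bifurcate(g0, POST, PRE, "(ADJ feat1)"), POST, AGR)
print(agr_first == wo_first)  # False
```

When the agreement correction comes first, the bifurcated pre-nominal rule inherits the agreement constraint. In the reverse order it does not, because the agreement correction was produced by the post-nominal rule only.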
Recycle corrections of Machine Translation output back into the system by refining and expanding existing translation rules.
In other words, my thesis focuses on recycling non-expert user corrections back into the Machine Translation system by refining and expanding existing translation rules.
Left: the initial grammar (a small manual grammar or an automatically learned grammar). Right: the refined grammar (encodes exceptions to general rules, has more (agreement) constraints, new rules, etc.).
1. Correction Info Only
Rule Refinement Operations
Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’
Focus 1: fully automatic, using only correction information. There is no information about the clue word or error type, just the information from the user correcting with the TCTool.
- It is a nice house – Es una casa bonito → Es una casa bonita
- John and Mary fell – Juan y Maria cayeron → Juan y Maria se cayeron
- Gaudi was a great artist – Gaudi era un artista grande → Gaudi era un gran artista
In the 2nd example, the correction adds an alignment (+al) from "se" to "fell" (se cayeron ↔ fell).
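With correction information only, the RR module has to infer the edit type by comparing the MT output with the user correction. A minimal sketch using Python's `difflib.SequenceMatcher` on word tokens; the labels mirror the Modify/Add/Delete/Change W Order columns, but the move-detection heuristic (a word deleted in one place and added in another) and the function name are our own assumptions:

```python
from difflib import SequenceMatcher


def classify_correction(mt_output: str, correction: str) -> list:
    """Label word-level edits between an MT output and its user correction."""
    a, b = mt_output.lower().split(), correction.lower().split()
    edits = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "replace":
            edits.append(("modify", " ".join(a[i1:i2]), " ".join(b[j1:j2])))
        elif tag == "insert":
            edits.append(("add", "", " ".join(b[j1:j2])))
        elif tag == "delete":
            edits.append(("delete", " ".join(a[i1:i2]), ""))
    # Heuristic: a word deleted in one place and added in another was moved.
    added = {e[2] for e in edits if e[0] == "add"}
    moved = {e[1] for e in edits if e[0] == "delete" and e[1] in added}
    edits = [e for e in edits
             if not (e[0] in ("add", "delete") and (e[1] in moved or e[2] in moved))]
    return edits + [("change word order", w, w) for w in sorted(moved)]


print(classify_correction("Es una casa bonito", "Es una casa bonita"))
# [('modify', 'bonito', 'bonita')]
print(classify_correction("Juan y Maria cayeron", "Juan y Maria se cayeron"))
# [('add', '', 'se')]
print(classify_correction("Veo los", "Los veo"))
# [('change word order', 'los', 'los')]
```

A plain word-level diff reports the Gaudi example as adding "gran" and deleting "grande"; recognizing it as a move plus a modification needs the word alignments that the TCTool records.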
1. Correction Info Only
Rule Refinement Operations
Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’
Focus 1: fully automatic, using only correction information. There is no information about the clue word or error type, just the information from the user correcting with the TCTool.
- Es una casa bonito → Es una casa bonita
- J y M cayeron → J y M se cayeron
- Gaudi was a great artist – Gaudi era un artista grande → Gaudi era un gran artista
- I will help him fix the car – Ayudaré a él a arreglar el auto → Le ayudare a arreglar el auto
1. Correction Info Only
Rule Refinement Operations
Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’
Focus 1: fully automatic, using only correction information. There is no information about the clue word or error type, just the information from the user correcting with the TCTool.
- I would like to go – Me gustaria que ir → Me gustaria ir
- I will help him fix the car – Ayudaré a él a arreglar el auto → Le ayudare a arreglar el auto
2. Correction and Error Info
Rule Refinement Operations
Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’
Focus 2: fully automatic, using correction and error information such as clue words and error type.
The rule PP → PREP NP already exists (the level of generality appropriate for the constraint is set by default and can be confirmed through user interaction).
- I am proud of you – Estoy orgullosa tu → Estoy orgullosa de ti
Focus 3
Rule Refinement Operations
Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’
This shows the refinements for error types that require a reasonable amount of further user interaction and can be solved with the available correction and error information (Focus 3).
-rule: suppose the rule required to generate the new POS sequence is not in the grammar (e.g., if the grammar didn't have a PP).
Let's take a closer look at this last example and how the proposed approach would go about refining it.
- Wally plays the guitar – Wally juega la guitarra → Wally toca la guitarra
- I saw the woman – Vi la mujer → Vi a la mujer
- I see them – Veo los → Los veo
Outside Scope of Thesis
Rule Refinement Operations
Modify Add Delete Change W Order +Wc –Wc +Wc –Wc +al –al Wi Wc Wi(…) Wc –Wc = +rule –rule +al –al =Wi WiWi’ RuleLearner POSi=POSi’ POSiPOSi’
By definition, there is no minimal pair (MP) that will provide the triggering context for this example: John would have to be in the object position (I gave John the book – le di el libro a Juan), and there are only two words in common.
If a word is moved outside the local rule, the RR module will feed the corrected sentence pair back into the system (to the Rule Learner, RL) as a new training example.
- John read the book – A Juan leyó el libro → Juan leyó el libro
- Where are you from? – Donde eres tu de? → De donde eres tu?