Towards Interactive and Automatic Refinement of Translation Rules


1 Towards Interactive and Automatic Refinement of Translation Rules
Ariadna Font Llitjós. PhD Thesis Proposal. Committee: Jaime Carbonell (advisor), Alon Lavie (co-advisor), Lori Levin, Bonnie Dorr (Univ. of Maryland). 5 November 2004. Thank the advisors, the committee, and the audience for attending.

2 Interactive and Automatic Rule Refinement
Outline: Introduction (motivation, related work, goals); Thesis statement and scope; Preliminary research (interactive elicitation of error information; a framework for automatic rule adaptation); Proposed research; Contributions and thesis timeline. Note: think about whether there is a better place to put the outline, perhaps before post-editing.

3 Machine Translation (MT)
Source Language (SL) sentence: Gaudi was a great artist. Desired Spanish translation: Gaudi era un gran artista. MT system outputs: *Gaudi estaba un artista grande, *Gaudi era un artista grande. If we are given the SL sentence "Gaudi was a great artist", we would like our MT system to translate it as "Gaudi era un gran artista", but instead our MT system outputs two incorrect Target Language (TL) sentences. Spanish has two forms of the verb "to be", one that indicates a temporary state (estaba) and one that indicates a more permanent state (era). In this case, being a great artist is permanent (unlike being sick). On the other hand, "gran" is a pre-nominal adjective, which instantiates an exception to the general rule... [next slide]

4 Interactive and Automatic Rule Refinement
Completed Work: Spanish Adjectives (Automatic Rule Adaptation). General order, grande = big in size: NP [DET ADJ N] → [DET N ADJ]; a big house → una casa grande. Exception, gran = exceptional: this is an exception to the rule, since "gran" is a pre-nominal adjective: NP [DET ADJ N] → [DET ADJ N]; a great artist → un gran artista.

5 Commercial and Online Systems
Correct translation: Gaudi era un gran artista. Systran, Babelfish (Altavista), WorldLingo, Translated.net: *Gaudi era gran artista. ImTranslation: *El Gaudi era un gran artista. 1-800-Translate: *Gaudi era un fenomenal artista. Explain the errors! Even for a sentence as short and simple as this one, the output of commercial MT systems is often incorrect. In this case, and unlike the output of our system, the meaning is mostly preserved, but more often than not state-of-the-art MT systems mistranslate sentences, giving them a totally incorrect meaning. [Show this is a big, general problem, so a solution has great impact. Other MT systems also produce incorrect translations, so MT output requires post-editing.] ImTranslation: translation2.paralink.com/

6 Interactive and Automatic Rule Refinement
Post-editing. Current solutions: post-editing [Allen, 2003] by human linguists or editors (experts); an Automated Post-Editing prototype module (APE) [Allen & Hogan, 2000] to alleviate the tedious task of correcting the most frequent errors over and over. There is no solution that fully automates the post-editing process, so MT output still requires post-editing. Post-editing: the correction of MT output by human linguists or editors. Some tools exist to assist human post-editors in this tedious task, such as the APE prototype module. As far as I know, before my research there was no solution to fully automate the post-editing process, in the sense of automatically learning the post-editing rules that would fix any MT output.

7 Drawbacks of Current Methods
Manual post-editing → corrections do not generalize: Gaudi era un artista grande; Juan es un amigo grande (Juan is a great friend); Era una oportunidad grande (It was a great opportunity). APE → humans need to predict all the errors ahead of time and code the post-editing rules; given a new error, it does nothing. Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data. Manual post-editing is the most common method and very tedious, since editors perform the same actions over and over for the same MT error; it would be nice to automate this. APE is a prototype module to do precisely that, but errors still need to be detected and predicted by a human first, and post-editing rules need to be written manually for each type of error. In the face of a new error type, the APE does nothing, and a human editor needs to fix it, over and over.

8 Interactive and Automatic Rule Refinement
My Solution: automate post-editing efforts by feeding them back into the MT system. Possible alternatives: (1) Automatic learning of post-editing rules: + system independent; – several thousand sentences might need to be corrected for the same error. (2) Automatic refinement of translation rules: + attacks the core of the problem for transfer-based MT systems (there need to be rules to fix!). Goal: take an existing MT system and improve it (with user feedback). Automatic learning of post-editing rules is system independent, but the interaction of two incorrect rules can generate thousands of (very) incorrect sentences. Automatic refinement of translation rules is system dependent and goes to the core of the problem: fixing one rule can prevent thousands of incorrect sentences from being generated. (Both are language-pair dependent.)

9 Interactive and Automatic Rule Refinement
Related Work. Fixing Machine Translation: [Corston-Oliver & Gammon, 2003], [Imamura et al., 2003], [Menezes & Richardson, 2001], [Callison-Burch, 2004], [Su et al., 1995]. Rule Adaptation: [Brill, 1993], [Gavaldà, 2000]. Post-editing: [Allen & Hogan, 2000]. My Thesis. My research vs. post-editing: it replaces an expert post-editor with non-expert bilingual speakers plus software, but it does more than merely correct the current sentence; it corrects the problem at its core so that the same error does not occur twice. Callison-Burch: experts fixing alignments (training data) for SMT, i.e. human intervention to create training data. Su et al.: adjusting the evaluation parameters of an SMT system to respect user style. [My research targets transfer-based MT systems: there need to be rules to refine.] The idea of rule adaptation to correct language technology applications is not new. Researchers have applied it, among other NLP applications, to parsing [Brill, 1993] (reducing error by learning a set of simple structural transformations), to NLU [Gavaldà, 2000] (a mechanism that lets a non-expert user dynamically extend the coverage of an NLU module just by answering simple classification questions) and, more relevant for my thesis, to MT, with techniques ranging from decision trees that fix automatically learned linguistic representations to reduce noise [Corston-Oliver & Gammon, 2003] (very similar to our goal!) to comparing MT output with human reference translations to correct and eliminate redundant rules [Imamura et al., 2003; Menezes & Richardson, 2001]. In contrast to filtering out incorrect or redundant rules, I propose to actually refine and expand the translation rules themselves, by editing valid but inaccurate rules that might, for example, be lacking a constraint; this results in correctly translating sentences that were incorrectly translated before. Make sure this is clear: as far as I know, nobody has taken user corrections and incorporated them into the MT system (my niche!). Furthermore, and most importantly, my research does not rely on the existence of a parallel training corpus or a set of human reference translations, unlike all other existing methods. No pre-existing training data required. No human reference translations required. Non-expert user feedback.

10 Resource-poor Scenarios (AVENUE)
Lack of electronic parallel data and lack of computational linguists → lack of a manual grammar. Why bother? Indigenous communities have difficult access to crucial information that directly affects their lives (such as land laws, plagues, health warnings, etc.), and MT supports the preservation of their language and culture. Resource-poor languages: Mapudungun, Quechua, Aymara. My approach can be applied to any language pair (given a transfer MT system), but it is specifically designed to tackle languages with little or no electronic resources, such as Mapudungun and Quechua, which makes the problem of refining translation rules even harder. Since my work is embedded in the AVENUE project, I cannot assume any pre-existing training corpus, or even reference translations for my test sets. 1. Often oral tradition → SMT and EBMT are out of the question. 2. Lack of computational linguists with knowledge of the resource-poor language.

11 How is MT possible for resource-poor languages?
Bilingual speakers. Mapuche people from Chile (I took the picture).

12 AVENUE Project Overview
[Architecture diagram: Elicitation (Elicitation Tool, Elicitation Corpus) → word-aligned parallel corpus → Rule Learning (Learning Module, handcrafted rules, morphological analyzer) → transfer rules and lexical resources → Run-Time Transfer System → lattice.] For those of you who are not familiar with the AVENUE project, there are four main modules. The first elicits translations and word alignments from bilingual speakers with a user-friendly tool. The Rule Learner (third module) learns the translation rules from the word-aligned parallel corpus resulting from the elicitation phase. The last module is the actual transfer engine, which, given the translation rules and lexical entries and a sentence to translate, produces a list of translation candidates, or lattice. Until now, however, there was no way to validate the generalizations learned by the automatic Rule Learner, nor to verify the quality of the translations produced by the system.

13 My Thesis
[Same architecture diagram, now with a Translation Correction Tool and a Rule Refinement Module connecting the run-time transfer system back to the transfer rules and lexical resources.] My thesis work consists of eliciting bilingual speaker feedback with an online GUI (the TCTool) and hypothesizing refinement operations to be performed both on the grammar (transfer rules) and on the lexicon, so as to improve translation quality and coverage. Note that this approach is specifically well suited for languages with scarce resources, but it can be applied to any language pair, as long as bilingual speakers are available and there is an initial set of transfer rules for that pair.

14 Resource-poor languages
Related Work (recap). [Diagram positioning My Thesis relative to: Post-editing, Rule Adaptation, Fixing Machine Translation, Resource-poor languages.]

15 Interactive and Automatic Rule Refinement
Thesis Statement. Given a rule-based transfer MT system, we can: extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable, and automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, so that the resulting set of rules has better coverage and higher overall MT quality.

16 Interactive and Automatic Rule Refinement
Assumptions: (1) No pre-existing parallel training data. (2) No pre-existing human reference translations. (3) The SL sentence can be fully parsed by the translation grammar. (4) Bilingual speakers can give enough information about the MT errors. What these mean: 1 and 2: no pre-existing data of this form is assumed; this is imposed by the AVENUE project and makes the problem more challenging. 3: needed to reconstruct the entire TL tree, so that I can search through it to find the source of the error (some rule needs to have applied, so that we can attempt to refine it automatically). 4: which I will show to be true in the preliminary work section.

17 Interactive and Automatic Rule Refinement
Scope. Evaluate automatic refinement under the following conditions: (1) user correction information only; (2) correction and error information; (3) extra information is required → user interaction. Both on manually written and automatically learned grammars [AMTA 2004]. I will evaluate how well automatic rule refinement does under each condition: (1) taking into account only the correction of the MT output given by the user; (2) taking into account the correction plus any other error information elicited from the user, such as error type, triggering word, etc.; (3) a reasonable amount of user interaction is required. Outside the scope fall error types for which users cannot give the required information, or for which further user interaction might take too long and might not provide the required information. In the AMTA paper we compare manual and automatically learned grammars for the purpose of automatic rule refinement. We found a difference, albeit a small one, between the translation quality produced by the manual grammar and the automatically learned grammar, but the important distinction is that different types of refinement are needed for different types of grammar.

18 Automatic Evaluation of Refinement process
Technical Challenges: (1) elicit minimal MT information from non-expert users; (2) automatically refine and expand translation rules, both manually written and automatically learned, minimally; (3) automatic evaluation of the refinement process. (3) is a crucial component, without which it makes no sense to try to automatically learn refinement operations. "Minimal" here refers to minimal post-editing, which usually means making the fewest changes needed to produce an understandable working document rather than a high-quality one. However, unlike Allen's (2003) definition of minimal post-editing, I do not mean producing MT output of lower quality, but rather having bilingual users perform the fewest changes to the translation so that the rule that originally generated the sentence can be refined. Because my goal is to elicit rather technical information from non-expert users, I need a way to classify errors that is not too hard for naive users to apply accurately and that is at the same time useful for automatic rule refinement. (2) Minimal description length (MDL) means parsimony, so that grammar size does not blow up with lots of local refinements; part of the challenge is to refine both manual and automatically learned translation rules. In order to improve translation quality and coverage automatically, we also need a method to evaluate the refined system output automatically, so that evaluation metric results can drive the refinement process.

19 Preliminary Work Interactive elicitation of error information
A framework for automatic rule adaptation

20 Error Typology for Automatic Rule Refinement (simplified)
Completed Work: Interactive elicitation of error information. Error Typology for Automatic Rule Refinement (simplified). Error types: missing word, extra word, wrong word order, incorrect word, wrong agreement; each can be local vs. long distance and at the word vs. phrase level. Word change subtypes: sense, form, selectional restrictions, idiom, missing constraint, extra constraint. Walk them through it and explain. After looking at several hundred sentences (English to Spanish), I organized the different types of MT errors into a typology, keeping in mind the ultimate goal of automatic rule refinement. The challenge is to find the appropriate level of granularity for MT error classification.

21 Interactive and Automatic Rule Refinement
TCTool (Demo). Interactive elicitation of error information. Actions: add a word, delete a word, modify a word, change word order. To address the types of MT errors identified on the previous slide, we built an online GUI that allows non-expert bilingual users to reliably detect and minimally correct MT errors by performing one of the following correcting actions: adding a word (missing word), deleting a word (extra word), modifying a word (incorrect word, wrong agreement), changing word order (wrong word order). For the edit-word window, just say: it asks the user to choose among several non-technical statements (as opposed to asking them to classify between linguistically motivated classes). Given: the SL sentence (e.g. I see them), the TL sentence (e.g. Yo veo los), word-to-word alignments (I-yo, see-veo, them-los), and context.

22 Interactive and Automatic Rule Refinement
1st Eng2Spa User Study (Completed Work, interactive elicitation of error information) [LREC 2004]. MT error classification → 9 linguistically motivated classes [Flanagan, 1994], [White et al., 1994]: word order, sense, agreement error (number, person, gender, tense), form, incorrect word and no translation. Error detection: 90% precision, 89% recall. Error classification: 72% precision, 71% recall. In the LREC paper we show that users can reliably detect and classify MT errors, given an initial error classification with 9 linguistically motivated classes. It turns out this is harder than it needs to be, since some of these distinctions are not important for automatic rule refinement, whereas other information that could be elicited is not currently elicited with this classification. Manual grammar (12 rules) plus lexical entries. Test set: 32 sentences from the AVENUE Elicitation Corpus (4 correct, 28 incorrect). We are interested in high precision, even at the expense of lower recall. Users did not always fix a translation in the same way; when the final translation did not match the gold standard, it was still correct or better (stylistically better). For more details, see the LREC paper or the thesis proposal document.

23 Interactive and Automatic Rule Refinement
Translation Rules (rule formalism; the raw material my module works on: grammar + lexicon; X-side = English (SL), Y-side = Spanish (TL)). A data-structure sketch of such a rule follows below.

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
( (X1::Y1) (X2::Y3) (X3::Y2)
  ;; English parsing:
  ((x0 def) = (x1 def))   ; NP definiteness = DET definiteness
  (x0 = x3)               ; NP = N (N is the head of the NP)
  ;; Spanish generation:
  ((y1 agr) = (y2 agr))   ; DET agreement = N agreement
  ((y3 agr) = (y2 agr))   ; ADJ agreement = N agreement
  (y2 = x3) )             ; pass the features of the English N to the Spanish N

ADJ::ADJ |: [nice] -> [bonito]
( (X1::Y1)
  ((x0 pos) = adj)
  ((x0 form) = nice)
  ((y0 agr num) = sg)     ; Spanish ADJ is singular in number
  ((y0 agr gen) = masc) ) ; Spanish ADJ is masculine in gender
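As a rough illustration of what the refinement module has to manipulate, here is a minimal Python sketch of a transfer rule as a data structure. This is an assumed representation for illustration, not the actual AVENUE code; the field names (x_side, y_side, alignments, constraints) are hypothetical.

from dataclasses import dataclass, field

@dataclass
class TransferRule:
    """Minimal stand-in for a transfer rule such as {NP,8} (hypothetical schema)."""
    name: str        # e.g. "NP,8"
    x_side: list     # SL constituent sequence, e.g. ["DET", "ADJ", "N"]
    y_side: list     # TL constituent sequence, e.g. ["DET", "N", "ADJ"]
    alignments: list # (x position, y position) pairs, 1-based
    constraints: list = field(default_factory=list)  # constraint strings

np8 = TransferRule(
    name="NP,8",
    x_side=["DET", "ADJ", "N"],
    y_side=["DET", "N", "ADJ"],
    alignments=[(1, 1), (2, 3), (3, 2)],
    constraints=[
        "((x0 def) = (x1 def))",   # English parsing
        "(x0 = x3)",
        "((y1 agr) = (y2 agr))",   # Spanish generation: det-noun agreement
        "((y3 agr) = (y2 agr))",   # adj-noun agreement
        "(y2 = x3)",
    ],
)
print(np8.name, np8.x_side, "->", np8.y_side)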

24 Automatic Rule Refinement Framework
Completed Work: Automatic Rule Adaptation. Automatic Rule Refinement Framework. Find the best rule refinement (RR) operations given: a grammar (G), a lexicon (L), a (set of) source language sentence(s) (SL), a (set of) target language sentence(s) (TL), its parse tree (P), and a minimal correction of TL (TL'), such that TQ2 > TQ1. This can also be expressed as: max TQ(TL | TL', P, SL, RR(G, L)), such that the translation quality of the refined grammar is higher than that of the original grammar, and/or coverage is increased, and/or ambiguity is reduced.

25 Types of Refinement Operations
Completed Work: Types of Refinement Operations (Automatic Rule Adaptation). 1. Refine a translation rule: R0 → R1 (change R0 to make it more specific or more general). R0: NP [DET ADJ N] → [DET N ADJ]; a nice house → *una casa bonito. R1: NP [DET ADJ N] → [DET N ADJ] + (N gender = ADJ gender); a nice house → una casa bonita. There are two main refinement operations that can be applied both to grammar rules and to lexical entries, with some minor differences. Refine means changing an existing rule, replacing the old (incorrect) rule with the new corrected rule. Sometimes a rule is mostly correct but is missing an agreement constraint, as in the example on this slide. Automatically learned grammars sometimes over-generalize, and in such cases a rule has an extra agreement constraint, which makes it incorrect. Refining results in higher precision and a tighter grammar, reducing the size of the candidate list.

26 Types of Refinement Operations (2)
Completed Work: Types of Refinement Operations (2) (Automatic Rule Adaptation). 2. Bifurcate a translation rule: R0 → R0 (same, general rule) + R1 (a new, more specific rule). R0: NP [DET ADJ N] → [DET N ADJ]; a nice house → una casa bonita. R1: NP [DET ADJ N] → [DET ADJ N] + (ADJ type: pre-nominal); a great artist → un gran artista. Bifurcation adds a copy of the general rule and makes the copy more specific without removing the original (more general) rule: leave the original rule R0 as is and modify the duplicate, R1. This example shows the refinement operation required to also cover the pre-nominal order of adjectives in Spanish (an exception to the general rule). A more comprehensive discussion of possible refinement operations can be found in the proposal document; I cannot cover them all in this talk. When users add or delete a word, the safest operation type is bifurcate, until we have evidence that the original rule can never apply (interactive mode). Bifurcation keeps high precision but increases the size of the candidate list. Coverage is measured in terms of TL patterns: every time R0 is modified into R0', the coverage on the TL side increases. A code sketch of both operations follows.
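The following is a minimal, hypothetical sketch of the two operations on a dictionary-based rule representation; the field names and the helper functions are assumptions for illustration, not the actual refinement module.

import copy

# A rule as a plain dict: SL/TL constituent sequences plus constraint strings.
np8 = {
    "name": "NP,8",
    "x_side": ["DET", "ADJ", "N"],
    "y_side": ["DET", "N", "ADJ"],
    "constraints": ["((y1 agr) = (y2 agr))", "(y2 = x3)"],
}

def refine(rule, new_constraint):
    """Refine: modify the rule in place, e.g. add a missing agreement constraint."""
    rule["constraints"].append(new_constraint)
    return rule

def bifurcate(rule, new_name, new_y_side, new_constraint):
    """Bifurcate: keep the general rule and add a more specific copy of it."""
    specific = copy.deepcopy(rule)
    specific["name"] = new_name
    specific["y_side"] = new_y_side
    specific["constraints"].append(new_constraint)
    return [rule, specific]          # the grammar now contains both rules

# 1. Refine NP,8 with the missing adj-noun gender agreement constraint.
refine(np8, "((y3 agr gen) = (y2 agr gen))")

# 2. Bifurcate NP,8 to also cover pre-nominal adjectives (un gran artista).
grammar = bifurcate(np8, "NP,8'", ["DET", "ADJ", "N"], "((y2 feat1) =c +)")
for rule in grammar:
    print(rule["name"], rule["y_side"], rule["constraints"])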

27 Formalizing Error Information
Completed Work: Formalizing Error Information (Automatic Rule Adaptation). Wi = error word, Wi' = correction, Wc = clue word. Example: NP [DET ADJ N] → [DET N ADJ]; a nice house → *una casa bonito (Wi = bonito, Wc = casa); with the constraint N gender = ADJ gender: a nice house → una casa bonita (Wi' = bonita). Wi is the word that needs to be modified, deleted, or dragged into a different position by the user for the sentence to be correct. Wi' is the user's modification of Wi, or the word that needs to be added by the user for the sentence to be correct. Wc is the word that gives away the clue about what triggered the correction, namely the cause of the error. Example: in the case of lack of agreement between a noun and the adjective that modifies it, as in *el auto roja (the red car), Wc is instantiated with auto, the word that reveals what the gender agreement feature value of Wi (roja) should be. Wc can also be a phrase or constituent, such as a plural subject (e.g. *[Juan y Maria] cai). Wc is not always present, and it can be before or after Wi (i > c or i < c); they can be contiguous or separated by one or more words.

28 Triggering Feature Detection
Completed Work: Triggering Feature Detection (Automatic Rule Adaptation). Comparison at the feature level to detect the triggering feature(s). Delta function: Δ(Wi, Wi'). Examples: Δ(bonito, bonita) = {gender}; Δ(comiamos, comia) = {person, number}; Δ(mujer, guitarra) = {}. If the Δ set is empty, we need to postulate a new binary feature. Once we have the user's correction (Wi'), we can compare it with Wi at the feature level and find the triggering feature, namely the feature attribute that has a different value in Wi and Wi'. An empty delta set means the existing feature language is not expressive enough to distinguish between Wi and Wi'. A sketch of this comparison follows.
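A minimal sketch of the delta comparison, assuming each word form is looked up as a dictionary of feature values; the tiny lexicon below is illustrative only.

def delta(feats_wi, feats_wi_prime):
    """Return the set of feature attributes whose values differ between Wi and Wi'."""
    keys = set(feats_wi) | set(feats_wi_prime)
    return {k for k in keys if feats_wi.get(k) != feats_wi_prime.get(k)}

# Illustrative feature structures for the slide's examples.
lexicon = {
    "bonito":   {"pos": "adj", "gender": "masc", "number": "sg"},
    "bonita":   {"pos": "adj", "gender": "fem",  "number": "sg"},
    "comiamos": {"pos": "v", "person": 1, "number": "pl", "tense": "past"},
    "comia":    {"pos": "v", "person": 3, "number": "sg", "tense": "past"},
    "mujer":    {"pos": "n", "gender": "fem", "number": "sg"},
    "guitarra": {"pos": "n", "gender": "fem", "number": "sg"},
}

print(delta(lexicon["bonito"], lexicon["bonita"]))      # {'gender'}
print(delta(lexicon["comiamos"], lexicon["comia"]))     # {'person', 'number'}
print(delta(lexicon["mujer"], lexicon["guitarra"]))     # set(): postulate a new feature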

29 Deciding on the Refinement Op
Completed Work: Deciding on the Refinement Operation (Automatic Rule Adaptation). Given: (1) the action performed by the user (add, delete, modify, change word order) and (2) the error information available (clue word, word alignments, etc.), the proposed method can determine which refinement action to take. A sketch of this decision follows.
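A hypothetical sketch of that decision, loosely following the operation tree on the next slide; the exact branching and the operation names are assumptions for illustration.

def choose_refinement(action, has_clue_word, has_alignment, wi=None, wi_prime=None):
    """Map a user correcting action plus the available error info to a refinement operation.

    Illustrative simplification: added or reordered words are handled conservatively
    by bifurcating, a modified word triggers a feature-level comparison (see delta()
    above), and missing information falls back to user interaction or the Rule Learner.
    """
    if action == "modify":
        return "refine: compare Wi/Wi' and add or adjust a constraint" if wi and wi_prime else "interactive mode: ask user"
    if action == "change_word_order":
        return "bifurcate: duplicate the rule with flipped constituent order"
    if action == "add":
        return ("bifurcate: add the word to a duplicated rule"
                if has_alignment else "interactive mode: elicit alignment")
    if action == "delete":
        return "bifurcate: drop the word from a duplicated rule"
    return "outside scope: feed the corrected pair back to the Rule Learner"

print(choose_refinement("change_word_order", has_clue_word=True, has_alignment=True))
print(choose_refinement("add", has_clue_word=False, has_alignment=False))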

30 Interactive and Automatic Rule Refinement
Rule Refinement Operations. [Decision tree over the user actions Modify, Add, Delete and Change Word Order, branching on the information available: presence or absence of a clue word (±Wc), word alignments (±al), whether Wi equals Wi', whether the POS of Wi equals the POS of Wi', and whether the needed rule exists (±rule); one branch hands the example back to the Rule Learner.] This tree represents the space of all possible rule refinement operations, given a user correcting action and the amount of information available at refinement time. Next, an example of how the proposed approach works through a simulation.

31 Proposed Work
Rule Refinement Example; Batch mode implementation; Interactive mode implementation; User studies; Evaluation.

32 Rule Refinement Example
Rule Refinement Example (Automatic Rule Adaptation): change word order. SL: Gaudí was a great artist. MT system output (TL): *Gaudí era un artista grande. Goal (given by the user correction): Gaudí era un gran artista. Given incorrect MT output and the translation grammar that produced it, an expert would know how to fix the error (an editor would fix the sentence and a computational linguist would fix the translation rule), but this is a very hard problem for a piece of software. Here is a successful example of how my approach can solve it.

33 Interactive and Automatic Rule Refinement
1. Error Information Elicitation (Automatic Rule Adaptation). Given the TL sentence produced by the MT system for a specific SL sentence, feed the translation pair through the TCTool to elicit the correction and error information. (Refinement operation typology.)

34 2. Variable Instantiation from Log File
Correcting actions (Automatic Rule Adaptation): 1. Word order change (artista grande → grande artista): Wi = grande. 2. Edited grande into gran: Wi' = gran. 3. Identified artista as the clue word: Wc = artista. In this case, even if the user had not identified Wc, the refinement process would have been the same. The correction log file, together with the transfer engine output (parse tree), is input to the refinement module for variable instantiation. Assumption: the reason I know "great" should be "gran" and not "grande" is that it modifies "artist". (In this case artista might not really be the triggering word, since gran is always a pre-nominal adjective, but I use it for the sake of the argument; there is no time to introduce and explain a new example.)

35 3. Retrieve Relevant Lexical Entries
3. Retrieve Relevant Lexical Entries (Automatic Rule Adaptation). There is no lexical entry for [great → gran], so duplicate the lexical entry [great → grande] and change its TL side:

ADJ::ADJ |: [great] -> [grande]
((X1::Y1) (…) ((y0 agr num) = sg) ((y0 agr gen) = masc))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1) (…) ((y0 agr num) = sg) ((y0 agr gen) = masc))

(Morphological analyzer: grande = gran.) The first step is to retrieve relevant lexical entries, but since there is no lexical entry for great → gran, we need to add a lexical entry for "gran". This is a lexical bifurcation (adding a new sense to the lexicon): Lex0 → Lex0 + Lex1 [= Lex0 with a new TL word]. Copying every feature is a simple starting position; we might want to copy only certain features to the new entry (maybe POS-dependent), but which features is a research question. Even if we had a morphological analyzer, it would find no difference between grande and gran, so my approach can do something we could not do even with a morphology module. Parole tags: grande AQ0CS0, grande NCCS000, gran AQ0CS0 (A = adjective, Q = qualifying, 0 = no degree, C = common gender, S = singular number, 0 = no case). A sketch of this lexical bifurcation follows.
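A minimal sketch of the lexical bifurcation, assuming lexical entries are stored as small dictionaries; the representation and the copy-all-features policy are assumptions (which features should really be copied is an open question, as noted above).

import copy

lexicon = [
    {"sl": "great", "tl": "grande", "pos": "ADJ",
     "constraints": ["((y0 agr num) = sg)", "((y0 agr gen) = masc)"]},
]

def bifurcate_lexical_entry(lexicon, sl_word, new_tl_word):
    """Add a new sense: copy an existing entry for sl_word and change its TL side."""
    source = next(e for e in lexicon if e["sl"] == sl_word)
    new_entry = copy.deepcopy(source)       # simple policy: copy all features
    new_entry["tl"] = new_tl_word
    lexicon.append(new_entry)
    return new_entry

# No entry [great -> gran] exists, so create one from [great -> grande].
gran = bifurcate_lexical_entry(lexicon, "great", "gran")
print(gran)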

36 4. Finding Triggering Feature(s)
4. Finding Triggering Feature(s) (Automatic Rule Adaptation). Feature delta function: Δ(Wi, Wi') = {} → we need to postulate a new binary feature, feat1. An empty delta set means the feature language is not expressive enough to distinguish between "gran" and "grande". 5. Blame Assignment (from the MT system output): tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>. Blame assignment determines which rules need to be refined, from the parse output of the transfer engine: this is the MT output trace showing which rules applied and in what order; the rules on the path to the error word here are S,1, VP,3 and NP,8. Even if the user had not identified Wc, the same refinement process could be done fully automatically, since "grande/gran" is moved around locally within the NP. A sketch of extracting the blamed rules from this tree string follows.
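A minimal sketch of that extraction, assuming the transfer-engine tree format shown above; the tokenization details are assumptions.

import re

def blame_assignment(tree_str, error_word):
    """Return the chain of node labels (root to leaf) covering error_word.

    Assumes the tree format shown on the slide, e.g. (NP,8 (DET,0:3 "UN") ...),
    where the token right after '(' is the node label.
    """
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', tree_str.strip('<> '))
    stack, path = [], None
    for i, tok in enumerate(tokens):
        if tok == '(':
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ')'
            stack.append(nxt if nxt not in '()' else None)
        elif tok == ')':
            stack.pop()
        elif tok.startswith('"') and tok.strip('"').lower() == error_word.lower():
            path = [label for label in stack if label is not None]
    return path

tree = ('<((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) '
        '(NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>')
print(blame_assignment(tree, "grande"))   # ['S,1', 'VP,3', 'NP,8', 'ADJ,5:4']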

37 6. Variable Instantiation in the Rules
6. Variable Instantiation in the Rules (Automatic Rule Adaptation). Wi = grande → POSi = ADJ = Y3, y3. Wc = artista → POSc = N = Y2, y2.

{NP,8} ;; Y1 Y2 Y3
NP::NP : [DET ADJ N] -> [DET N ADJ]
( (X1::Y1) (X2::Y3) (X3::Y2)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y2 agr))   ; det-noun agreement
  ((y3 agr) = (y2 agr))   ; adj-noun agreement
  (y2 = x3) )

So that when the order gets flipped, we know which variables need to be changed in the rule.

38 Interactive and Automatic Rule Refinement
7. Refining Rules (Automatic Rule Adaptation). Bifurcate NP,8 → NP,8 (R0) + NP,8' (R1), flipping the order of ADJ and N:

{NP,8'}
NP::NP : [DET ADJ N] -> [DET ADJ N]
( (X1::Y1) (X2::Y2) (X3::Y3)
  ((x0 def) = (x1 def))
  (x0 = x3)
  ((y1 agr) = (y3 agr))   ; det-noun agreement
  ((y2 agr) = (y3 agr))   ; adj-noun agreement
  (y2 = x3)
  ((y2 feat1) =c + ) )

The =c constraint requires the feature to be specified in the ADJ lexical entry and to be +; if it is underspecified or -, it will not unify.

39 8. Refining Lexical Entries
8. Refining Lexical Entries (Automatic Rule Adaptation).

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
 ((x0 form) = great)
 ((y0 agr num) = sg)
 ((y0 agr gen) = masc)
 ((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
(…
 ((y0 feat1) = +))

Overall step 5: modify the grammar and lexicon by applying the rule refinement operations.

40 Interactive and Automatic Rule Refinement
Done? Not yet (Automatic Rule Adaptation). NP,8 (R0) with ADJ(grande) [feat1 = -]; NP,8' (R1) with ADJ(gran) [feat1 =c +] [feat1 = +]. We need to restrict the application of the general rule (R0) to post-nominal adjectives only. Current outputs: un artista grande, un artista gran, un gran artista, *un grande artista. The refined grammar now produces the correct translation, but it still produces a translation that we know is incorrect (by the minimal post-editing maxim), since the refinement operation has increased ambiguity in the grammar: the translation candidate list has more than doubled in size, because both "grande" and "gran" unify with {NP,8} and "gran" now also unifies with {NP,8'}. Given another adjective underspecified with respect to feat1 (i.e. all new adjectives), it will unify with the general rule (NP,8), which is the desired default behavior.

41 Add Blocking Constraint
Add Blocking Constraint (Automatic Rule Adaptation). NP,8 (R0): ADJ(grande) [feat1 = -], with blocking constraint [feat1 = -]; NP,8' (R1): ADJ(gran) [feat1 = +], with constraint [feat1 =c +]. Can we also eliminate incorrect translations automatically? Outputs: un artista grande, *un artista gran, un gran artista, *un grande artista. The refined grammar produces the correct translation, but it still produces translations we know are incorrect, since the refinement operation has increased ambiguity in the grammar: both "grande" and "gran" could unify with {NP,8}, and "gran" now unifies with {NP,8'}. Eliminating incorrect translations automatically yields a tighter grammar (minimal description length). Since great translates both as grande and as gran, both rules will be applied, one with each type of adjective.

42 Making the grammar tighter
Making the grammar tighter (Automatic Rule Adaptation). If Wc = artista: add [feat1 = +] to N(artista) and add an agreement constraint between N and ADJ to NP,8 (R0): ((N feat1) = (ADJ feat1)). Outputs: *un artista grande, *un artista gran, un gran artista, *un grande artista. If the user identified Wc, the refinement module can also tag artista with (feat1 = +) and add an agreement constraint between ADJ and N, ((y2 feat1) = (y3 feat1)), to {NP,8}. For each new adjective that appears in this context, the output produced by one of the two NP rules will be picked (as best) by users, and the refinement module will be able to tag that adjective as pre-nominal (feat1 = +) or post-nominal (feat1 = -) in the lexicon. Note that without Wc, the refinement generates the correct translation but cannot eliminate all incorrect translations. I plan to make the grammar as tight as possible with the information available; if no triggering word is identified for a particular error, it might be impossible to do so automatically.

43 Batch Mode Implementation
Proposed Work: Batch Mode Implementation (Automatic Rule Adaptation). Given a set of user corrections, apply the refinement module for refinement operations of errors that can be refined fully automatically using: (1) correction information only, or (2) correction and error information (such as the type of error and whether there is a clue word). This is the main focus of my thesis.

44 Interactive and Automatic Rule Refinement
Rule Refinement Operations. [Same operation tree as in the preliminary work: user actions Modify, Add, Delete, Change Word Order, branching on ±Wc, ±alignments, Wi vs. Wi', POS equality and ±rule, with one branch going back to the Rule Learner.] This is the refinement typology introduced at the end of the preliminary work; it represents the space of all possible rule refinement operations, given a user correcting action and the amount of information available at refinement time.

45 Interactive and Automatic Rule Refinement
1. Correction info only. [Operation tree with the branches that can be handled using only correction information highlighted.] These refinements can be performed fully automatically from the correction information alone: there is no information about the clue word or error type, just what the user did with the TCTool, so default settings apply (e.g. look for words with the value fem for gender in the sentence, hypothesize constraints, and validate with a test set). Examples: Add, no Wc, with alignment: John and Mary fell → *Juan y Maria cayeron → Juan y Maria se cayeron, with an alignment from "se" to "fell". Change word order: Gaudi was a great artist → *Gaudi era un artista grande → Gaudi era un gran artista. Modify, same POS: It is a nice house → *Es una casa bonito → Es una casa bonita. Modify, different POS: I will help him fix the car → *Ayudaré a él a arreglar el auto → Le ayudaré a arreglar el auto.

46 Interactive and Automatic Rule Refinement
2. Correction and Error info. [Operation tree with the branches that need correction plus error information highlighted.] These refinements can be done fully automatically using the correction plus error information such as clue words and error type. Example: I am proud of you → *Estoy orgullosa de tu → Estoy orgullosa de ti. We need to know that orgullosa is the word triggering the correction (e.g. the addition of "de"), i.e. Wc = orgullosa, in order to refine automatically. The rule PP → PREP NP already exists (the level of generality appropriate for the constraint is set by default and can be confirmed through user interaction).

47 Interactive Mode Implementation
Proposed Work: Interactive Mode Implementation (Automatic Rule Adaptation). When extra error information is required to determine the triggering context automatically, we need to give the user other relevant sentences at run time (minimal pairs). This mode is for refinement operations of errors that can be refined fully automatically but (3) require further user interaction. In some cases extra information is required, so I will also implement an interactive mode of the rule refinement module: when it does not have enough information to determine the triggering context, it presents users with more sentences to evaluate. This mode will be implemented for refinement operations of errors that require a reasonable amount of further user interaction and can be solved with the available correction and error information. It typically requires more effort from users and takes longer, hence a smaller test set. I would like to investigate Active Learning methods to make the most of user time (by presenting the most informative minimal pairs first). Active Learning is a method to minimize the number of examples a human annotator must label [Cohn et al., 1994], usually by processing examples in order of usefulness. [Lewis and Catlett, 1994] used uncertainty as a measure of usefulness. [Callison-Burch, 2003] proposed active learning to reduce the cost of creating a corpus of labeled training examples for statistical MT.

48 Interactive and Automatic Rule Refinement
3. Further user interaction. [Operation tree with the branches requiring further user interaction highlighted.] This shows the refinements for error types that require a reasonable amount of further user interaction and can be solved with the available correction and error information. (-rule: suppose the rule required to generate the new POS sequence is not in the grammar, e.g. if the grammar did not have a PP rule.) Example: I see them → *Veo los → Los veo; object pronouns in Spanish need to precede the verb. Let us take a closer look at this last example and how the proposed approach would go about refining it.

49 Example Requiring Minimal Pair
Proposed Work: Example Requiring a Minimal Pair (Automatic Rule Adaptation). 1. Run the SL sentence through the transfer engine: I see them → *veo los; correct TL: los veo. 2. Wi = los, but there is no Wi' nor Wc, so we need a minimal pair to determine the appropriate refinement: I see cars → veo autos. 3. Triggering feature(s): comparing (veo los, veo autos), Δ(los, autos) = {pos}; PRON(los) [pos = pron] vs. N(autos) [pos = n]. In Spanish, if the direct object is a pronoun (clitic), it needs to be in pre-verbal position. Wc is not a specific word here, but rather a combination of features in Wi (pron + acc).

50 Refining and Adding Constraints
Proposed Work: Refining and Adding Constraints. VP,3: VP NP → VP NP (veo los, veo autos). VP,3': VP NP → NP VP + [NP pos =c pron] (los veo, *autos veo). Percolate the triggering features up to the constituent level: NP: PRON → PRON + [NP pos = PRON pos]. Block the application of the general rule (VP,3): VP,3: VP NP → VP NP + [NP pos = (*NOT* pron)], giving *veo los, veo autos. These constraints are added to the appropriate rules, but for them to have any effect we need to make sure the pos feature is passed from the PRON up to the constituent level when applicable (we know whether it is applicable, and in which rule, from the MT output tree). This has the desired effect of not over-generating for cases where the object NP is not a pronoun. If operating in batch mode, the refined rule would have the order of the VP and NP flipped and would generate the correct translation, but it would not be able to eliminate incorrect translations (such as *autos veo); we would still need to block the application of the general rule.

51 Interactive and Automatic Rule Refinement
Generalization Power. When the triggering feature already exists in the feature language (pos, gender, number, etc.): I see them → *veo los → los veo. With the previous refinement, all of these sentences will now be correctly translated by the system: I love him → lo amo (before: *amo lo); They called me yesterday → me llamaron ayer (before: *llamaron me ayer); Mary helps her with her homework → Maria le ayuda con sus tareas (before: *Maria ayuda le con sus tareas). Is the correction specific to a particular lexical entry, or can it be generalized across a group of lexical entries that have the same properties (and thus behave the same way)? This is unlike the case where the feature language is not expressive enough and a new feature has to be postulated (feat1), where the refinement does not generalize to other lexical entries exhibiting the same behavior unless they are tagged by the system from new (local) user corrections. Those cases represent a step towards generalization (semantic annotation), which can be used a posteriori to make generalizations automatically; this is the case of the Gaudi example shown earlier, and WordNet could be used to find relations between words that required the same refinement. A different rule (NP VP NP) covers: John gave her the book → Juan le dio el libro.

52 Interactive and Automatic Rule Refinement
Proposed Work: User Studies. TCTool with a new MT error classification (Eng2Spa). A different language pair: Mapudungun or Quechua → Spanish. Batch vs. interactive mode. Amount of information elicited: just corrections vs. corrections plus error information. 1. A new way of eliciting MT error information: simple statements/questions about the error, possibly ordered from most to least informative. Eng2Spa II: test the new version of the TCTool (v0.2) and compare error correction and classification accuracy results. We need to find the trade-off between safety and generality; in particular, how much does interactivity add (in safety) over the most general batch settings? (Alon's suggestion.) This will probably result in at least 4-8 more user studies.

53 Evaluation of Refined MT Output
1. Evaluate the best translation → automatic evaluation metrics (BLEU, NIST, METEOR). 2. Evaluate the translation candidate list size → precision (which also captures parsimony). Automatic evaluation of the refined MT output is necessary to maximize translation quality and coverage while deciding which refinement operations to apply. With at least two users correcting the same set of sentences, we have two or more reference translations (sets of user corrections).

54 1. Evaluate Best translation
Build a hypothesis file (translations to be evaluated automatically). Raw MT output: the best sentence (picked by the user as correct or as requiring the least amount of correction). Refined MT output: use the METEOR score at sentence level to pick the best candidate from the list. Then run all automatic metrics on the new hypothesis file, using the user corrections as reference translations. Assumption: user corrections = gold standard → reference translations. Method: compare the raw MT output with the output of the refined grammar and lexicon using automatic evaluation metrics such as BLEU and METEOR. A sketch of this pipeline follows.
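A minimal sketch of that pipeline; sentence_score() is a crude stand-in for a real sentence-level metric such as METEOR (a hypothetical placeholder, not an actual METEOR API).

def sentence_score(hypothesis, references):
    """Placeholder for a sentence-level metric such as METEOR (hypothetical)."""
    hyp = set(hypothesis.lower().split())
    best = 0.0
    for ref in references:
        r = set(ref.lower().split())
        overlap = len(hyp & r)
        if overlap:
            p, rec = overlap / len(hyp), overlap / len(r)
            best = max(best, 2 * p * rec / (p + rec))   # crude F1 stand-in
    return best

def build_hypothesis_file(candidate_lists, reference_sets):
    """For each sentence, keep the candidate that scores best against the user corrections."""
    return [max(cands, key=lambda c: sentence_score(c, refs))
            for cands, refs in zip(candidate_lists, reference_sets)]

# Usage: refined-grammar candidates vs. user corrections as references.
candidates = [["Gaudi era un gran artista", "Gaudi era un artista grande"]]
references = [["Gaudi era un gran artista"]]
print(build_hypothesis_file(candidates, references))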

55 2. Evaluate Translation Candidate List
Precision = tp / (tp + fp), where tp is binary per candidate ({0, 1}; 1 = the candidate matches a user correction) and tp + fp is the total number of TL candidates. [Figure: five example candidate lists for the same SL sentence, with the candidates matching a user correction marked.] A candidate counts as a "tp" in a binary sense: either it is like the user correction (reference translation), scoring 1, or it is not, scoring 0. Precision is also a measure of parsimony, since if the candidate list grows, precision decreases. Examples: (1) 3 incorrect → precision 0/3 = 0; (2) 1 correct, 4 incorrect → 1/5 = 0.2; (3) 2 correct and 3 incorrect, but precision is still 1/5, because the second correct translation does not match any user correction (reference translation), so the system does not know it is correct and it does not contribute to the score; (4) 1 correct, 2 incorrect → 1/3; (5) 1 correct, 1 incorrect → 1/2. Case (3) illustrates a limitation of automatic evaluation metrics; there is no predefined set of all possible correct translations (fn), so if no user corrected the sentence in that way it will not be picked up by the measure. That is also why recall cannot be used in this context to measure improvement: recall (tp / (tp + fn)) makes no sense here, since we would need to know all possible correct translations in order to compute fn (correct translations not identified by users). A sketch of this computation follows.
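A minimal sketch of this candidate-list precision, assuming candidates and user corrections are plain strings; the normalization is simplistic and illustrative.

def candidate_list_precision(candidates, user_corrections):
    """Fraction of TL candidates that exactly match some user correction (binary tp)."""
    refs = {c.strip().lower() for c in user_corrections}
    if not candidates:
        return 0.0
    tp = sum(1 for cand in candidates if cand.strip().lower() in refs)
    return tp / len(candidates)

refs = ["Gaudi era un gran artista"]
print(candidate_list_precision(
    ["Gaudi era un artista grande", "Gaudi era un gran artista",
     "Gaudi estaba un gran artista"], refs))   # 1/3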

56 Expected Contributions
(1) An efficient online GUI to display translations and alignments and solicit pinpoint fixes from non-expert bilingual users. (2) An expandable set of rule refinement operations, triggered by user corrections, to automatically refine and expand different types of grammars. (3) A mechanism to automatically evaluate rule refinements with user corrections as reference translations. Status: contribution 1 is almost completed; contribution 2 is partially done (laying out the theory and figuring out how to do it is itself an important contribution); contribution 3 is about 70% done (method identified, easy to implement).

57 Interactive and Automatic Rule Refinement
Thesis Timeline. Research components and duration (months): back-end implementation; user studies; resource-poor language (data + manual grammar): 2; adapt the system to the new language pair: 1; evaluation; write and defend thesis; total. Expected graduation date: May 2006. Acknowledgements: committee, AVENUE team, friends and colleagues.

58 Interactive and Automatic Rule Refinement
Thanks! Questions?

59 Interactive and Automatic Rule Refinement

60 Interactive and Automatic Rule Refinement

61 Interactive and Automatic Rule Refinement
Some Questions: What if users' corrections differ (user noise)? More than one correction per sentence? Wc example. Data set. When is it appropriate to refine vs. bifurcate? Lexical bifurcation. Is the refinement process deterministic?

62 Interactive and Automatic Rule Refinement
Others: TCTool demo simulation; RR operation patterns; automatic evaluation feasibility study; AMTA paper results; user studies map; precision, recall, F1; NIST, BLEU, METEOR.

63 Interactive and Automatic Rule Refinement
Precision, Recall and F1. Precision = tp / (tp + fp) (fp: selected but incorrect). Recall = tp / (tp + fn) (fn: correct but not selected). F1 = 2PR / (P + R). F measures overall performance; F1 is an even combination of precision and recall, defined as 2*p*r / (p + r). The F1 measure is often the only simple measure worth maximizing on its own: you can get a perfect precision score by always assigning zero categories, or a perfect recall score by always assigning every category. A truly smart system assigns the correct categories and only the correct categories, maximizing precision and recall at the same time, and therefore maximizing F1. Percentage of things I got right → accuracy; percentage of things I got wrong → error (usually not good measures, since tn is huge!). (back to main)

64 Automatic Evaluation Metrics
BLEU: averages the precision for unigrams, bigrams and up to 4-grams and applies a length penalty [Papineni, 2001]. NIST: instead of n-gram precision, the information gain from each n-gram is taken into account [NIST 2002]. METEOR: assigns most of the weight to recall instead of precision and uses stemming [Lavie, 2004]. BLEU applies a length penalty if the generated sentence is shorter than the best matching (in length) reference translation. The NIST metric is derived from the BLEU evaluation criterion but differs in one fundamental aspect: the idea behind the NIST score is to give more credit when a system gets an n-gram match that is difficult, and less credit for an n-gram match that is easy. Recent research has shown that a balanced harmonic mean (F1 measure) of unigram precision and recall outperforms the widely used BLEU and NIST metrics for MT evaluation in terms of correlation with human judgments of translation quality. METEOR (Metric for Evaluation of Translation with Explicit word Ordering) [Lavie, 2004] performs a maximal-cardinality match between translations and references and uses the match to compute a coherence-based penalty, assessing the extent to which the matched words between translation and reference constitute well-ordered, coherent "chunks". (back to main)

65 Interactive and Automatic Rule Refinement
Proposed Work: Data Set. Split the development set (~400 sentences) into: a dev set, used to run user studies, develop the refinement module and validate functionality; and a test set, used to evaluate the effect of the refinement operations. In addition, a wild test set (from naturally occurring text). Requirement: sentences need to be fully parsed by the grammar. 1. They must be fully parsable by the original manual grammar (currently ~40 rules, up from 12 in the previous manual grammar), drawn from the Typological and Structural Elicitation Corpora and categorized by error. MT error correction → larger data set; MT error correction + error information → smaller data set. (back to questions)

66 Interactive and Automatic Rule Refinement
Refine vs. Bifurcate. Batch mode → bifurcate (there is no way to tell whether the original rule should never apply). Interactive mode → refine (change the original rule) if we can get enough evidence that the original rule never applies. Corrections involving agreement constraints seem to hold for all cases → refine. (Open research question.) In batch mode we need to be conservative: there is no way to tell, unless there are plenty of relevant minimal-pair examples in the test set supporting the hypothesis that the general rule is incorrect for all cases. (back to questions)

67 More than one correction/sentence
[Diagram: two corrections, A and B, applied to different words of the same TL sentence, handled one at a time.] A "Tetris" approach to automatic rule refinement. Assumption: different corrections to different words → different errors; two different corrections to two different words are assumed to be different corrections. Exception: structural divergences (I swam across the lake → nadé a través del lago → crucé el lago nadando), see the next slide. (back to questions)

68 Exception: structural divergences
He danced her out of the room → *La bailó fuera de la habitación (her he-danced out of the room) → La sacó de la habitación bailando (her he-took-out of the room dancing). We have no way of knowing that these corrections are related. Do one error at a time; if translation quality decreases over the test set, hypothesize that it is a divergence and feed the pair to the Rule Learner as a new (manually corrected) training example. How can we automatically distinguish between structural divergences and totally garbled sentences? This essentially does what Bonnie's divergence module (DUSTER) does. (back to questions)

69 Constituent order change
I gave him the tools → *di a él las herramientas (I-gave to him the tools) → le di las herramientas a él (to-him I-gave the tools to him). Desired refinement: VP → VP PP(a PRON) NP becomes VP → VP NP PP(a PRON). We can extract constituent information from the MT output (parse tree) and treat this as one error/correction. Even if the user moves a whole constituent to a different position, if it contains more than one word the refinement module will see two different correcting actions done to two different words; we might want to check in the parse tree (MT output) whether this is the case and, if so, treat it as one correction/error.

70 More than one correction/error
[Diagram: two corrections, A and B, applied to the same word of a TL sentence.] Example: edit and move the same word (like gran, bailó). Occam's razor. Assumption: both corrections are part of the same error; this can always be validated with automatic evaluation. (back to questions)

71 Interactive and Automatic Rule Refinement
Wc Example. I am proud of you → *Estoy orgullosa de tu (I-am proud of you-nom); Wc = de, tu → ti; correct: Estoy orgullosa de ti (I-am proud of you-oblique). Without Wc information, we would need to increase the ambiguity of the grammar significantly: adding [you → ti] gives I love you → *ti quiero (te quiero, …) and you read → *ti lees (tú lees, …). If we added a lexical entry for ti (supposing there is not one already) without any restriction on the context in which it should appear, the MT system would output "ti" as a translation for "you" both in object and in subject positions. (back to questions)

72 Interactive and Automatic Rule Refinement
Lexical bifurcation. Should the system copy all the features to the new entry? It is a good starting point, but we might want to copy just a subset of features (possibly POS-dependent). Open research question: more research is needed to determine which features should get copied. (back to questions)

73 Interactive and Automatic Rule Refinement
Automatic Rule Adaptation

74 Interactive and Automatic Rule Refinement
Automatic Rule Adaptation: SL + best TL picked by the user.

75 Interactive and Automatic Rule Refinement
Automatic Rule Adaptation: changing "grande" into "gran".

76 Interactive and Automatic Rule Refinement
Automatic Rule Adaptation. (back to main)

77 Interactive and Automatic Rule Refinement
Automatic Rule Adaptation. [Screenshot annotated with steps 1, 2, 3.]

78 Interactive and Automatic Rule Refinement
Input to the RR module (Automatic Rule Adaptation). User correction log file. Transfer engine output (+ parse tree): sl: I see them; tl: VEO LOS; tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,0 (PRON,2:3 "LOS") ) ) ) )>. sl: I see cars; tl: VEO AUTOS; tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,2 (N,1:3 "AUTOS") ) ) ) )>. (back to main)

79 Interactive and Automatic Rule Refinement
Completed Work: Types of RR Operations (Automatic Rule Adaptation). Grammar: (1) bifurcate: R0 → R0 + R1 [= R0' + constraint]; Cov[R0] → Cov[R0, R1]; (2) bifurcate into general + specific: R0 → R1 [= R0 + constraint = -] + R2 [= R0' + constraint =c +]; Cov[R0] → Cov[R1, R2]; (3) refine: R0 → R1 [= R0 + constraint]; Cov[R0] → Cov[R1]. Lexicon: Lex0 → Lex0 + Lex1 [= Lex0 + constraint]; Lex0 → Lex1 [= Lex0 + constraint]; Lex0 → Lex0 + Lex1 [= Lex0 with a new TL word]; ∅ → Lex1 (adding a lexical item). Grammar notes: bifurcation leaves the original as is and modifies the copy (maybe changing word order and making the copy more specific, e.g. VP: V NP-pron in Spanish); refining makes a rule more specific (e.g. a missing agreement constraint); bifurcation can also split into a general case (blocking constraint) and a specific case (triggering constraint), as with pre-nominal adjectives. Coverage is in terms of TL patterns: every time R0 is modified into R0', the coverage on the TL side increases. Lexicon examples: 1. cayeron → se cayeron; 2. woman + animate = +; 4. sense missing; 3. OVW. (back to main)

80 Manual vs Learned Grammars
Manual vs. Learned Grammars (Automatic Rule Adaptation) [AMTA 2004]. Manual inspection and automatic MT evaluation on the same test set as the user study (32 sentences); the same manual grammar, and an automatically learned grammar produced with AVENUE's Rule Learning module [Probst et al., 2002] (with added feature constraints); looking at the first 5 translations, and at the best translation for each grammar. The automatic evaluation reflects the findings of the manual evaluation: even though the manual grammar output is better, it is also the same in many cases, and the scores are not that different. Different types of errors → different types of RR operations required. Learned grammars have a higher level of lexicalization and need the RR module to achieve the appropriate level of generalization. Conclusions: the manual grammar will need to be refined to encode exceptions, whereas the learned grammar will need to be refined to achieve the right level of generalization; we expect RR to give the most leverage when combined with the learned grammar.

                  NIST   BLEU   METEOR
Manual grammar    4.3    0.16   0.6
Learned grammar   3.7    0.14   0.55

(back to main)

81 Human Oracle experiment
Completed Work: Human Oracle Experiment (Automatic Rule Adaptation). As a feasibility experiment, we compared raw MT output with manually corrected MT output; the difference is statistically significant (confidence interval test). This gives an upper bound on how much difference we should expect any refinement approach to make. NIST is the least discriminative of the metrics, since for bigrams that appear only once (a very small set) the system does not get any credit. (back to main)

82 Interactive and Automatic Rule Refinement
Is the order deterministic? The application of RR operations is not deterministic: it depends directly on the order in which the corrected sentences are input to the system. Example: (a) first gran artista → bifurcate (2 rules); then casa bonito → add the agreement constraint to only one rule (the original, general rule); the specific rule is still incorrect (missing agreement constraint). (b) first casa bonito → add the agreement constraint; then gran artista → bifurcate; now both rules have the agreement constraint (the optimal order). Given the same set of corrected sentences, the application of RR operations therefore depends on ordering, so we establish a refinement operation hierarchy: 1. refine operations apply first (agreement constraints); 2. bifurcate operations apply second (different word order, add a word, delete a word). (back to questions)

83 Interactive and Automatic Rule Refinement
User noise? Solution: have several users evaluate and correct the same test set, and apply a threshold (90% agreement) on the correction and on the error information (type, clue word). Only modify the grammar if there is enough evidence that a rule is incorrect; in general, we only want to refine the grammar or lexicon if we have enough evidence that it is wrong. (back to questions)

84 Interactive and Automatic Rule Refinement
Proposed Work: User Studies Map. [Matrix of planned studies over the dimensions: batch mode vs. interactive mode (with active learning); corrections only vs. corrections + error information; manual vs. learned grammars; with or without the RR module; language pairs Eng2Spa and Mapu2Spa; X marks the planned combinations.] Most of these dimensions interact with each other, and thus several sets of user studies need to be run. (back to main)

86 Recycle corrections of Machine Translation output back into the system by refining and expanding existing translation rules
In other words, my thesis focuses on recycling non-expert user corrections back into the Machine Translation system by refining and expanding existing translation rules. Left: an initial grammar (a small manual grammar or an automatically learned grammar). Right: the refined grammar (encodes exceptions to general rules, has more (agreement) constraints, new rules, etc.).

86 Interactive and Automatic Rule Refinement
1. Correction info only. [Same operation tree, focus 1 branches highlighted.] Focus 1: fully automatic refinement using only correction information; there is no information about the clue word or error type, just the user's correction from the TCTool. Second example: an alignment is added from "se" to "fell" (se cayeron → fell). Examples: It is a nice house → Es una casa bonito → Es una casa bonita; John and Mary fell → Juan y Maria cayeron → Juan y Maria se cayeron; Gaudi was a great artist → Gaudi era un artista grande → Gaudi era un gran artista.

87 Interactive and Automatic Rule Refinement
1. Correction info only (continued). [Same operation tree.] Focus 1: fully automatic refinement using only correction information. Examples: Es una casa bonito → Es una casa bonita; Juan y Maria cayeron → Juan y Maria se cayeron; Gaudi was a great artist → Gaudi era un artista grande → Gaudi era un gran artista; I will help him fix the car → Ayudaré a él a arreglar el auto → Le ayudaré a arreglar el auto.

88 Interactive and Automatic Rule Refinement
1. Correction info only (continued). [Same operation tree.] Focus 1: fully automatic refinement using only correction information. Examples: I would like to go → Me gustaría que ir → Me gustaría ir; I will help him fix the car → Ayudaré a él a arreglar el auto → Le ayudaré a arreglar el auto.

89 Interactive and Automatic Rule Refinement
2. Correction and Error info. [Same operation tree, focus 2 branches highlighted.] Focus 2: fully automatic refinement using correction and error information, such as clue words and error type. The rule PP → PREP NP already exists (the level of generality appropriate for the constraint is set by default and can be confirmed through user interaction). Example: I am proud of you → Estoy orgullosa tu → Estoy orgullosa de ti.

90 Interactive and Automatic Rule Refinement
Focus 3: further user interaction. [Same operation tree, focus 3 branches highlighted.] This shows the refinements for error types that require a reasonable amount of further user interaction and can be solved with the available correction and error information (focus 3). -rule: suppose the rule required to generate the new POS sequence is not in the grammar (e.g. if the grammar did not have a PP rule). Let us take a closer look at the last example and how the proposed approach would go about refining it. Examples: Wally plays the guitar → Wally juega la guitarra → Wally toca la guitarra; I saw the woman → Vi la mujer → Vi a la mujer; I see them → Veo los → Los veo.

91 Interactive and Automatic Rule Refinement
Outside the Scope of the Thesis. [Same operation tree, out-of-scope branches highlighted.] By definition, there is no minimal pair that can provide the triggering context for this example: John would have to be in the object position (I gave John the book → le di el libro a Juan), and there are only two words in common. If a word is moved outside the local rule, the RR module will feed the corrected sentence pair back into the system (to the Rule Learner) as a new training example. Examples: John read the book → A Juan leyó el libro → Juan leyó el libro; Where are you from? → Donde eres tu de? → De dónde eres tú?

