1
Towards Interactive and Automatic Refinement of Translation Rules
Ariadna Font Llitjós, PhD Thesis Proposal. Jaime Carbonell (advisor), Alon Lavie (co-advisor), Lori Levin, Bonnie Dorr (Univ. Maryland). 5 November 2004. [Speaker note: add a tree with leaves. Thank the advisors, the committee, and the audience for attending.]
2
Outline: Introduction; Thesis statement and scope; Preliminary research (interactive elicitation of error information; a framework for automatic rule adaptation); Proposed research; Contributions and thesis timeline.
3
Machine Translation (MT)
Source Language (SL) sentence: Gaudi was a great artist. In Spanish, it translates as: Gaudi era un gran artista. But instead our MT system outputs two incorrect Target Language (TL) sentences: *Gaudi estaba un artista grande and *Gaudi era un artista grande. If we are given the SL sentence Gaudi is a great artist, we would like our MT system to translate it as: Gaudi es un gran artista. [Speaker notes: explain the general post-nominal adjective order in Spanish; explain the meaning of Gaudi era un artista grande (grande = big in size).]
4
Commercial and Online MT Systems
Correct translation: Gaudi era un gran artista. ImTranslator: *El Gaudi era un gran artista. Systran, Babelfish (AltaVista), WorldLingo, Translated.net: *Gaudi era gran artista. 1-800-Translate: *Gaudi era un fenomenal artista. [Speaker notes: show that this is a big, general problem, so a solution has great impact. Other MT systems also produce incorrect translations; MT output requires post-editing. ImTranslator: translation2.paralink.com]
5
Post-editing. Current solutions: manual post-editing [Allen, 2003]; an automated post-editing module (APE) [Allen & Hogan, 2000]. MT output still requires post-editing! Post-editing: the correction of MT output by human linguists or editors. Current systems do not recycle post-editing efforts back into the system, beyond adding them as new training data. [Speaker notes: Human post-editors: some tools exist to assist in this tedious task; the drawback is that corrections do not generalize. APE: a prototype model, much better, but a human needs to predict the errors and code the post-editing rules manually. Before my research, as far as I know, there was no way to fully automate the post-editing process, in the sense of automatically learning the post-editing rules that would fix any MT output.]
6
Drawbacks of Current Methods
Manual post-editing: corrections do not generalize. Gaudi era un artista grande; Juan es un amigo grande (Juan is a great friend); Era una oportunidad grande (It was a great opportunity). APE: humans need to predict all the errors ahead of time and code the post-editing rules, and it fails on any new error. [Speaker notes: Manual post-editing is the most common method, but very tedious, since editors repeat the same actions over and over for the same MT error; it would be nice to automate this. APE is a prototype module to do precisely that, but errors still need to be detected and predicted by a human first, and then post-editing rules need to be written manually for each error type. In the face of a new error type, the APE does nothing; the human editor needs to fix it, over and over.]
7
My Solution: automate post-editing efforts by feeding them back into the MT system. Possible alternatives: (1) Automatic learning of post-editing rules: + system independent; - several thousand sentences might need to be corrected for the same error. (2) Automatic refinement of translation rules: + attacks the core of the problem for transfer-based MT systems (there need to be rules to fix!). [Speaker notes: explain the drawbacks of each (both are language-pair dependent). Automatic learning of post-editing rules is system independent, but the interaction of two incorrect rules can generate thousands of (very) incorrect sentences. Automatic refinement of translation rules is system dependent, but it goes to the core of the problem: fixing one or two basic rules in an easy way gets rid of many cumulative errors, and fixing one rule can prevent thousands of incorrect sentences from being generated.]
8
Related Work: [Corston-Oliver & Gamon, 2003], [Imamura et al., 2003], [Menezes & Richardson, 2001], [Brill, 1993], [Gavaldà, 2000]. Machine translation / rule adaptation: [Callison-Burch, 2004], [Su et al., 1995]. My thesis: post-editing with no pre-existing training data required, no human reference translations required, and non-expert user feedback. [Speaker note: see the related-work paper.]
9
Resource-poor Scenarios (AVENUE)
Lack of electronic parallel data; lack of a manual grammar (or only a very small initial grammar); need to validate the elicitation corpus and the automatically learned translation rules. Why bother? Indigenous communities have difficult access to crucial information that directly affects their lives (such as land laws, plagues, and health warnings), and there is value in preserving their language and culture. Languages: Mapudungun, Quechua, Aymara. [Speaker notes: my approach can be applied to any language pair (given a transfer-based MT system), but it is specifically designed to tackle languages with little or no electronic resources, such as Mapudungun and Quechua, which makes the problem of refining translation rules even harder. Since my work is embedded in the AVENUE project, I cannot assume any pre-existing training corpus, or even reference translations for my test sets. These languages often have an oral tradition, so SMT and EBMT are out of the question, and there is a lack of computational linguists with knowledge of the resource-poor language.]
10
How is MT possible for resource-poor languages?
Bilingual speakers
11
AVENUE Project Overview
[Architecture diagram: Elicitation Tool + Elicitation Corpus produce a word-aligned parallel corpus; the Learning Module (rule learning) produces transfer rules, alongside handcrafted rules; a morphological analyzer and lexical resources feed the Run-Time Transfer System, which outputs a translation lattice.] For those of you who are not familiar with the AVENUE project, there are four main modules: Elicitation, Rule Learning, the Run-Time System, and Morphology. The first elicits translations and word alignments from bilingual speakers with a user-friendly tool. The Rule Learner learns translation rules from the word-aligned parallel corpus produced by the elicitation phase. The last module is the actual transfer engine, which, given the translation rules, the lexical entries, and a sentence to be translated, produces a list of translation candidates, or lattice. But until now there was no way to validate the generalizations learned by the automatic Rule Learner, nor to verify the quality of the translations.
12
My Thesis
[Architecture diagram: the same AVENUE pipeline (Elicitation, Rule Learning, Morphology, Run-Time System), now extended with a Translation Correction Tool that collects bilingual-speaker feedback on the lattice and a Rule Refinement Module that updates the transfer rules and lexicon.] My thesis work consists of eliciting bilingual-speaker feedback with an online GUI (the TCTool) and hypothesizing refinement operations to be performed both on the grammar (transfer rules) and on the lexicon, so as to improve translation quality and coverage. Note that this approach is specifically well suited to languages with scarce resources, but it can be applied to any language pair, as long as bilingual speakers are available and there is an initial set of transfer rules for that pair.
13
Recycle corrections of Machine Translation output back into the system by refining and expanding existing translation rules. In other words, my thesis focuses on recycling non-expert user corrections back into the Machine Translation system by refining and expanding existing translation rules. [Diagram: left, an initial grammar (a small manual grammar or an automatically learned grammar); right, the refined grammar (encodes exceptions to general rules, has more (agreement) constraints, new rules, etc.).]
14
Thesis Statement. Given a rule-based transfer MT system, we can extract useful information from non-expert bilingual speakers about the corrections required to make MT output acceptable. Furthermore, we can automatically refine and expand translation rules, given corrected and aligned translation pairs and some error information, to improve coverage and overall MT quality.
15
Assumptions: (1) No parallel training data available. (2) No human reference translations available. (3) The SL sentence is fully parsed by the translation grammar. (4) Bilingual speakers can give enough information about the MT errors. [Speaker notes: the first two are imposed by the AVENUE project and make the problem more challenging. For (3), some rule needs to have applied so that we can attempt to refine it automatically. (4) is something I will show to be true in the preliminary-work section.]
16
Scope. Types of errors that: (Focus 1) can be refined fully automatically just by using correction information; (Focus 2) can be refined fully automatically using correction and error information; (Focus 3) require a reasonable amount of further user interaction and can be solved with the available correction and error information. Outside the scope: types of errors for which users cannot give the required information, or for which further user interaction might take too long and might not provide the required information. [Speaker note: these different settings add to the number of user studies I need to run.]
17
Technical Challenges: Elicit MT error information from non-expert users (minimal post-editing). Refine and expand a grammar and lexicon minimally (minimal description length) to improve translation quality and coverage automatically. Refine both manual and automatically learned translation rules.
18
Technical Challenges. (1) Refine and expand a grammar and lexicon minimally. This is the crucial component, without which it makes no sense to try to automatically learn refinement operations. Minimal here refers to minimal post-editing, which usually means making the fewest changes needed to produce an understandable working document, rather than a high-quality one. However, unlike Allen's (2003) definition of minimal post-editing, I do not mean producing MT output of lower quality, but rather having bilingual users perform the least amount of changes to the translation, so that the rule that originally generated the output can be refined. Because my goal is to elicit rather technical information from non-expert users, I need a way to classify errors that is not too hard for naive users to apply accurately and that, at the same time, is useful for automatic rule refinement. Minimal description length = parsimony, so that grammar size does not blow up with lots of local refinements. (2) Elicit minimal MT information from non-expert users.
19
Preliminary Work: Interactive elicitation of error information
A framework for automatic rule adaptation
20
Interactive Elicitation of MT Errors
Goal: simplify the MT correction task as much as possible, so that non-expert users can do it reliably. Challenges: find the appropriate level of granularity for MT error classification; design a user-friendly graphical user interface that allows non-expert bilingual users to reliably detect and minimally correct MT errors, given the SL sentence (e.g., I see them), the TL sentence (e.g., Yo veo los), word-to-word alignments (I-yo, see-veo, them-los), and context.
21
MT Error Typology for RR (simplified)
Completed Work: interactive elicitation of error information. Error types: missing word; extra word; wrong word order (local vs. long-distance, word vs. phrase); incorrect word (word change: sense, form, selectional restrictions, idiom); wrong agreement (missing constraint, extra constraint). A sketch of this typology as a data structure follows below. [Speaker note: after looking at several hundred sentences (English-to-Spanish), I organized the different types of MT errors into a typology, keeping in mind the ultimate goal of automatic rule refinement.]
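Read as a data structure, the typology has two levels: a top-level error class plus the dimensions that matter for choosing a refinement operation. A minimal sketch (illustrative names, not the TCTool's internal representation):

```python
# MT error typology for rule refinement, encoded as class -> dimensions.
MT_ERROR_TYPOLOGY = {
    "missing word": {},
    "extra word": {},
    "wrong word order": {"distance": ["local", "long-distance"],
                         "unit": ["word", "phrase"]},
    "incorrect word": {"word change": ["sense", "form",
                                       "selectional restrictions", "idiom"]},
    "wrong agreement": {"constraint": ["missing", "extra"]},
}
```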
22
TCTool (Demo). Interactive elicitation of error information. Actions: add a word, delete a word, modify a word, change word order. [Speaker notes: to address the types of errors identified, the TCTool has four main correcting actions, which can be applied to words and alignments. The first five translations from the lattice produced by the transfer engine are shown. The user picks the correct translation, or the best incorrect one, where best means requiring the least amount of correction.]
23
1st Eng2Spa User Study. Completed work: interactive elicitation of error information [LREC 2004]. MT error classification with 9 linguistically motivated classes: word order, sense, agreement error (number, person, gender, tense), form, incorrect word, and no translation. Results: error detection, 90% precision / 89% recall; error classification, 72% precision / 71% recall. Setup: manual grammar (12 rules) plus lexical entries; test set of 32 sentences from the AVENUE elicitation corpus (4 correct / 28 incorrect). [Speaker notes: we are interested in high precision, even at the expense of lower recall. Users did not always fix a translation in the same way; when the final translation differed from the gold standard, it was still correct, or even better stylistically. For more details, see the LREC paper or the thesis proposal document.]
24
Automatic Rule Refinement Framework
Completed Work: automatic rule adaptation. Find the best RR operations given: a grammar (G), a lexicon (L), a (set of) source-language sentence(s) (SL), a (set of) target-language sentence(s) (TL), its parse tree (P), and a minimal correction of TL (TL'), such that TQ2 > TQ1. This can also be expressed as: max TQ(TL | TL', P, SL, RR(G, L)), i.e., maximize translation quality such that the translation quality of the refined grammar is higher than that of the original grammar, or coverage is increased, or ambiguity is reduced.
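Spelled out in LaTeX (a reconstruction of the slide's compact notation, with TQ1 and TQ2 written as the translation quality before and after refinement):

```latex
\[
  RR^{*} \;=\; \arg\max_{RR}\; TQ\bigl(TL \,\big|\, TL',\, P,\, SL,\, RR(G,L)\bigr)
  \qquad \text{subject to} \quad TQ_{2} > TQ_{1}
\]
% TQ_1: translation quality with the original grammar G and lexicon L
% TQ_2: translation quality with the refined grammar/lexicon RR(G, L)
```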
25
Types of Refinement Operations
Completed Work: automatic rule adaptation. 1. Refine a translation rule: R0 → R1 (R0 modified, either made more specific or more general). Example:

R0: NP : [DET ADJ N] -> [DET N ADJ]   (a nice house → *una casa bonito)
R1: NP : [DET ADJ N] -> [DET N ADJ] + (N gender = ADJ gender)   (a nice house → una casa bonita)

[Speaker notes: sometimes a rule is mostly correct but is missing an agreement constraint, as in this example. Automatically learned grammars sometimes over-generalize, and in such cases a rule has an extra agreement constraint, which makes it incorrect. Refining results in higher precision (and reduces the size of the candidate list).]
26
Types of Refinement Operations (2)
Completed Work: automatic rule adaptation. 2. Bifurcate a translation rule: R0 → R0 (the same, general rule) + R1 (R0 modified into a specific rule). Example (change of word order):

R0: NP : [DET ADJ N] -> [DET N ADJ]   (a nice house → una casa bonita)
R1: NP : [DET ADJ N] -> [DET ADJ N]   (a great artist → un gran artista)

[Speaker notes: leave the original rule R0 as is and modify the duplicate, R1. When users add or delete a word, the safest refinement-operation type is bifurcation, until we have evidence that the original rule can never apply (interactive mode). High precision, but it increases the size of the candidate list. Coverage in terms of TL patterns: every time R0 is modified into R0', coverage on the TL side increases.] A sketch of both operations as code follows below.
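A minimal sketch of these two operations over a toy rule representation (the class and function names are illustrative assumptions, not the thesis implementation):

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class TransferRule:
    rule_id: str
    lhs: str                          # e.g. "NP"
    src: Tuple[str, ...]              # source-side constituent sequence
    tgt: Tuple[str, ...]              # target-side constituent sequence
    constraints: Tuple[str, ...] = () # e.g. "(N gender) = (ADJ gender)"

def refine(rule: TransferRule, constraint: str) -> TransferRule:
    """Op 1: modify the rule in place by adding a constraint."""
    return replace(rule, constraints=rule.constraints + (constraint,))

def bifurcate(rule: TransferRule, new_tgt: Tuple[str, ...],
              constraint: str) -> List[TransferRule]:
    """Op 2: keep the general rule and add a specialized duplicate."""
    specific = TransferRule(rule.rule_id + "'", rule.lhs, rule.src,
                            new_tgt, rule.constraints + (constraint,))
    return [rule, specific]

# Agreement example: a nice house -> una casa bonita
r0 = TransferRule("NP,8", "NP", ("DET", "ADJ", "N"), ("DET", "N", "ADJ"))
r1 = refine(r0, "(N gender) = (ADJ gender)")

# Word-order example: a great artist -> un gran artista
grammar = bifurcate(r0, ("DET", "ADJ", "N"), "(ADJ feat1) =c +")
```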
27
Formalizing Error Information
Completed Work: automatic rule adaptation. Formalizing error information: Wi = error word; Wi' = correction; Wc = clue word. Example: SL: the red car; TL: *el auto roja; TL': el auto rojo. Here Wi = roja, Wi' = rojo, Wc = auto (auto and roja need to agree). [Speaker notes: Wi is the word that needs to be modified, deleted, or dragged into a different position by the user for the sentence to be correct. Wi' is the user's modification of Wi, or the word that needs to be added by the user for the sentence to be correct. Wc is the word that gives away the clue about what triggered the correction, namely the cause of the error. In the case of lacking agreement between a noun and the adjective that modifies it, as in *el auto roja (the red car), Wc is instantiated to auto, the word that reveals what the gender agreement feature value of Wi (roja) should be. Wc can also be a phrase or constituent, like a plural subject (e.g., *[Juan y Maria] cai). Wc is not always present, and it can come before or after Wi (i > c or i < c); they can be contiguous or separated by one or more words.]
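As a data record, one user correction can be captured roughly as follows (a sketch; the class and field names are assumptions, not the TCTool log format):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Correction:
    sl: str                   # source-language sentence
    tl: str                   # MT output
    tl_prime: str             # minimally corrected MT output
    w_i: str                  # error word (modified, deleted, or moved)
    w_i_prime: Optional[str]  # user's correction of w_i (None if deleted)
    w_c: Optional[str] = None # clue word/constituent, if the user marked one

c = Correction(sl="the red car", tl="el auto roja", tl_prime="el auto rojo",
               w_i="roja", w_i_prime="rojo", w_c="auto")
```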
28
Triggering Feature Detection
Completed Work: automatic rule adaptation. Comparison at the feature level to detect the triggering feature(s). Delta function: δ(Wi, Wi'). Examples: δ(rojo, roja) = {gender}; δ(comiamos, comia) = {person, number}; δ(mujer, guitarra) = {}. If the set is empty, we need to postulate a new binary feature. [Speaker notes: once we have the user's correction (Wi'), we can compare it with Wi at the feature level and find the triggering feature, namely the feature attribute whose value differs between Wi and Wi'. An empty delta set means the existing feature language is not expressive enough to distinguish Wi from Wi'.] A sketch of δ in code follows below.
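A minimal sketch of δ, assuming each word form maps to a flat feature dictionary (a toy lexicon here; the thesis would draw these features from the lexicon and the morphological analyzer):

```python
TOY_LEXICON = {
    "rojo":     {"pos": "adj", "gender": "masc", "number": "sg"},
    "roja":     {"pos": "adj", "gender": "fem",  "number": "sg"},
    "comiamos": {"pos": "v", "person": 1, "number": "pl", "tense": "impf"},
    "comia":    {"pos": "v", "person": 3, "number": "sg", "tense": "impf"},
    "mujer":    {"pos": "n", "gender": "fem", "number": "sg"},
    "guitarra": {"pos": "n", "gender": "fem", "number": "sg"},
}

def delta(w_i: str, w_i_prime: str) -> set:
    """Features whose values differ between the error word and its fix."""
    f1, f2 = TOY_LEXICON[w_i], TOY_LEXICON[w_i_prime]
    return {k for k in set(f1) | set(f2) if f1.get(k) != f2.get(k)}

assert delta("rojo", "roja") == {"gender"}
assert delta("comiamos", "comia") == {"person", "number"}
assert delta("mujer", "guitarra") == set()  # empty: postulate a new feature
```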
29
Deciding on the Refinement Op
Completed Work: automatic rule adaptation. Given (1) the action performed by the user (add, delete, modify, change word order) and (2) whatever error information is available (clue word, word alignments, etc.), the proposed method can determine which refinement action to take. Next is an example of how the proposed approach would work, through a simulation.
30
Proposed Work: batch and interactive mode; user studies; evaluation.
31
Rule Refinement Example
Automatic Rule Adaptation. Change word order. SL: Gaudí was a great artist. TL: Gaudí era un artista grande. Corrected TL (TL'): Gaudí era un gran artista.
32
Automatic Rule Adaptation. 1. Error information elicitation: given the TL produced by the MT system for a specific SL sentence, feed the translation pair through the TCTool to elicit the correction and the error information (per the refinement operation typology).
33
2. Variable Instantiation from Log File
Automatic Rule Adaptation. Correcting actions: 1. Word-order change (artista grande → grande artista): Wi = grande. 2. Edited grande into gran: Wi' = gran. 3. Identified artista as the clue word: Wc = artista. [Speaker notes: in this case, even if the user had not identified Wc, the refinement process would have been the same. The correction log file, together with the transfer-engine output (parse tree), is input to the refinement module for variable instantiation.]
34
3. Retrieve Relevant Lexical Entries
Automatic Rule Adaptation. There is no lexical entry for great → gran, so we duplicate the lexical entry great → grande and change its TL side (a lexical bifurcation: Lex0 → Lex0 + Lex1[Lex0 + TL word]):

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))

ADJ::ADJ |: [great] -> [gran]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc))

(Morphological analyzer: grande = gran.) [Speaker notes: even if we had a morphological analyzer, it would find no difference between grande and gran, so my approach can do something we could not do even with a morphology module. Parole tags: grande AQ0CS0, grande NCCS000, gran AQ0CS0, where A = adjective, Q = qualifying, 0 = no degree, C = common gender, S = singular number, 0 = no case.]
35
4. Finding Triggering Feature(s)
Automatic Rule Adaptation. 4. Finding triggering feature(s). Delta function: δ(Wi, Wi') = {}, so we need to postulate a new binary feature, feat1 (the empty delta set means the feature language is not expressive enough to distinguish gran from grande). 5. Blame assignment: determine which rules need to be refined, from the parse output by the transfer engine:

tree: <((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) (NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>

[Speaker note: even if the user had not identified Wc, the same RR process could be done fully automatically.] A sketch of blame assignment over this tree follows below.
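A minimal sketch of reading which rules fired out of this bracketed tree (the helper name is an assumption; non-terminal nodes carry a rule index, while leaves carry a word position after a colon):

```python
import re

TREE = ('<((S,1 (NP,2 (N,5:1 "GAUDI") ) (VP,3 (VB,2 (AUX,17:2 "ERA") ) '
        '(NP,8 (DET,0:3 "UN") (N,4:5 "ARTISTA") (ADJ,5:4 "GRANDE") ) ) ) )>')

def rules_applied(tree: str):
    """Return (constituent, rule-index) pairs for non-terminal nodes.
    Leaves like (N,4:5 ...) have a ':' after the index, so the trailing
    whitespace requirement in the pattern excludes them."""
    return re.findall(r"\((\w+),(\d+)\s", tree)

print(rules_applied(TREE))
# [('S', '1'), ('NP', '2'), ('VP', '3'), ('VB', '2'), ('NP', '8')]
```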
36
6. Variable Instantiation in the Rules
Automatic Rule Adaptation. Wi = grande, POSi = ADJ = Y3, y3; Wc = artista, POSc = N = Y2, y2.

{NP,8}
NP::NP : [DET ADJ N] -> [DET N ADJ]
(
(X1::Y1)
(X2::Y3)
(X3::Y2)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y2 agr)) ; det-noun agreement
((y3 agr) = (y2 agr)) ; adj-noun agreement
(y2 = x3) )   (R0)

4. Determine the appropriate rule refinement operations to apply. [Speaker note: explain the rule formalism — Xn/Ym index source- and target-side constituents, Xn::Ym are alignments, and x0/y0 are the feature structures of the source and target parent nodes.]
37
7. Refining Rules. Automatic Rule Adaptation.

{NP,8'}
NP::NP : [DET ADJ N] -> [DET ADJ N]
(
(X1::Y1)
(X2::Y2)
(X3::Y3)
((x0 def) = (x1 def))
(x0 = x3)
((y1 agr) = (y3 agr)) ; det-noun agreement
((y2 agr) = (y3 agr)) ; adj-noun agreement
(y2 = x3)
((y2 feat1) =c +) )   (R1)

5. Modify the grammar and lexicon by applying the RR operations.
38
8. Refining Lexical Entries
Automatic Rule Adaptation.

ADJ::ADJ |: [great] -> [grande]
((X1::Y1)
((x0 form) = great)
((y0 agr num) = sg)
((y0 agr gen) = masc)
((y0 feat1) = -))

ADJ::ADJ |: [great] -> [gran]
((y0 feat1) = +))

5. Modify the grammar and lexicon by applying the RR operations.
39
Done? Not yet. Automatic Rule Adaptation. NP,8 (R0): ADJ(grande) [feat1 = -]. NP,8' (R1): [feat1 =c +], ADJ(gran) [feat1 = +]. We still need to restrict the application of the general rule (R0) to post-nominal adjectives only: un artista grande / un artista gran / un gran artista / *un grande artista. [Speaker notes: the refined grammar now produces the correct translation, but it still produces a translation that we know is incorrect (by the minimal post-editing maxim), and the refinement operation has increased ambiguity in the grammar: the translation candidate list has more than doubled in size, since both grande and gran unify with {NP,8} and gran now also unifies with {NP,8'}. Grande: post-nominal adjective; gran: pre-nominal adjective.]
40
Add Blocking Constraint
Automatic Rule Adaptation. NP,8 (R0): ADJ(grande) [feat1 = -], with blocking constraint [feat1 = -]. NP,8' (R1): [feat1 =c +], ADJ(gran) [feat1 = +]. Can we also eliminate the incorrect translations automatically? un artista grande / *un artista gran / un gran artista / *un grande artista. [Speaker notes: a tighter grammar (minimal description length). Since great translates both as grande and as gran, both rules will apply, one with each type of adjective.]
41
Making the grammar tighter
Automatic Rule Adaptation. If Wc = artista: add [feat1 = +] to N(artista) and add an agreement constraint between N and ADJ to NP,8 (R0): ((N feat1) = (ADJ feat1)). Result: *un artista grande / *un artista gran / un gran artista / *un grande artista. [Speaker notes: if the user identified Wc, the RR module can also tag artista with ((feat1) = +) and add an agreement constraint between ADJ and N, ((y2 feat1) = (y3 feat1)), to {NP,8}. For each new adjective that appears in this context, one of the two NP rules will be picked by users, and the RR module will be able to tag the adjective in the lexicon as pre-nominal (feat1 = +) or post-nominal (feat1 = -). Without Wc, RR generated the correct translation but could not eliminate all the incorrect ones.]
42
Batch Mode Implementation
Proposed Work: batch mode implementation. Automatic Rule Adaptation. For refinement operations on errors that can be refined fully automatically, just by using correction information (Focus 1), or fully automatically, using correction and error information (Focus 2).
43
Interactive Mode Implementation
Proposed Work: interactive mode implementation. Automatic Rule Adaptation. Extra error information is required, and more sentences need to be evaluated (and corrected) by users: relevant minimal pairs (MPs). Focus 3: types of errors that require a reasonable amount of further user interaction and can be solved with the available correction and error information. [Speaker note: this typically requires more effort from users and takes longer, hence a smaller test set.]
44
Automatic Rule Adaptation. [Refinement Operation Typology diagram: nodes colored by focus — black: batch mode (Focus 1 and 2); green and blue: interactive mode (Focus 3); some error types fall outside the scope. Speaker note: make a tree with the nodes colored according to focus.]
45
Example Requiring Minimal Pair
Proposed Work: example requiring a minimal pair. Automatic Rule Adaptation. 1. Run the SL sentence through the transfer engine: I see them → *veo los (correction: los veo). 2. Wi = los, but there is no Wi' and no Wc. We need a minimal pair to determine the appropriate refinement: I see cars → veo autos. 3. Triggering feature(s): δ(los, autos) = {pos}, with PRON(los)[pos = pron] and N(autos)[pos = n]. [Speaker notes: in Spanish, if the direct object is a pronoun (clitic), it needs to be in pre-verbal position. Wc is not a specific word here, but rather a combination of features in Wi (pron + acc). If operating in batch mode, I would flip the order of the VP rule and would generate the correct translation, but would not eliminate incorrect translations (autos veo, etc.).]
46
Refining and Adding Constraints
Proposed Work: refining and adding constraints.

VP,3: VP : [VP NP] -> [VP NP]
VP,3': VP : [VP NP] -> [NP VP] + [(NP pos) =c pron]

Percolate the triggering features up to the constituent level in the appropriate rules: NP : [PRON] + [(NP pos) = (PRON pos)]. Block the application of the general rule (VP,3): VP,3: VP : [VP NP] -> [VP NP] + [(NP pos) = (*NOT* pron)]. [Speaker notes: for this to have any effect, we need to make sure that the pos feature is passed from the PRON up to the constituent level, if applicable (we know whether it is applicable, and which rule, from the MT output tree). This has the desired effect of not overgenerating for cases where the NP object is not a pronoun, but we still need to block the application of the general rule.]
47
Generalization Power: other example sentences with the same error will be translated correctly after the refinement!
48
Proposed Work: user studies. TCTool: new MT error classification (Eng2Spa). A different language pair (Mapudungun or Quechua to Spanish). Manual vs. learned grammars. Batch vs. interactive mode (+ active learning). Amount of error information elicited. [Speaker notes: 1. A new way of eliciting MT error information: simple statements/questions about the error, possibly ordered from most to least informative. Eng2Spa II: test the new version of the TCTool (v0.2) and compare error correction and classification accuracy results. We need to find the trade-off between safety and generality; in particular, how much does interactivity add (in safety) to the most general batch settings? (Alon's suggestion.) This will probably result in at least 4-8 more user studies.]
49
Evaluation of Refined MT System
Automatic evaluation of refined MT output: evaluate the best translation with automatic evaluation metrics (BLEU, NIST, METEOR), and evaluate the candidate list with precision (also a measure of parsimony). [Speaker notes: automatic evaluation is necessary to maximize TQ and coverage while deciding which RR operations to apply; we have two or more reference translations (the sets of user corrections).]
50
Evaluate Best Translation
Hypothesis file (translations to be evaluated automatically). Raw MT output: the best sentence (picked by the user as correct, or as requiring the least amount of correction). Refined MT output: use the METEOR score at the sentence level to pick the best candidate from the list. Run all automatic metrics on the new hypothesis file, using user corrections as reference translations. Assumption: user corrections = gold-standard reference translations. Method: compare raw MT output with the output of the refined grammar and lexicon, using automatic evaluation metrics such as BLEU and METEOR. A sketch of this selection step follows below.
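A minimal sketch of the pick-best-candidate step, with NLTK's sentence-level BLEU standing in for the sentence-level METEOR scorer named above (the function name and data layout are assumptions):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1

def best_candidate(candidates, references):
    """Pick the candidate translation that scores highest against the
    user-corrected references (the thesis uses METEOR at this step)."""
    refs_tok = [r.split() for r in references]
    # Bigram weights keep the score well-behaved on very short sentences.
    return max(candidates,
               key=lambda c: sentence_bleu(refs_tok, c.split(),
                                           weights=(0.5, 0.5),
                                           smoothing_function=smooth))

candidates = ["un artista grande", "un gran artista", "un grande artista"]
references = ["un gran artista"]          # user correction = gold standard
print(best_candidate(candidates, references))   # -> "un gran artista"
```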
51
Evaluate Candidate List
Precision = tp / (tp + fp), where tp is binary {0, 1} (a translation candidate counts as a true positive iff it matches the user correction, i.e., the reference translation) and tp + fp is the total number of translation candidates. [Diagram: a list of SL-TL candidate pairs.] [Speaker notes: precision is also a measure of parsimony, since if the size of the candidate list grows, precision decreases. Recall (tp / (tp + fn)) makes no sense in this context, since we would need to know all the possible correct translations in order to compute fn, but there is no predefined set of false negatives.]
52
Expected Contributions
An efficient online GUI to display translations and alignments and solicit pinpoint fixes from non-expert bilingual users. An expandable set of rule-refinement operators, triggered by user corrections, to automatically refine and expand different types of grammars. A mechanism to automatically evaluate rule refinements using user corrections as reference translations. [Status notes: contribution 1 is 90% done; contribution 2 is 50% done (laying out the theory and figuring out how to do it is an important contribution); contribution 3 is 70% done (method identified, easy to implement).]
53
Thesis Timeline. Research components and duration (in months): back-end implementation; user studies; resource-poor language (data + manual grammar): 2; adapt system to new language pair: 1; active learning methods; evaluation; write and defend thesis. Expected graduation date: May 2006. Acknowledgements: committee, AVENUE team, friends and colleagues.
54
References. [Speaker note: add references for the related work, Probst et al. 2002, and active learning.]
55
Thanks! Questions?
56
Some Questions: Is the refinement process deterministic?
57
Others: TCTool demo; simulation; RR operation patterns; automatic evaluation feasibility study; AMTA paper results (BLEU, NIST and METEOR; precision, recall and F1).
58
Automatic Rule Adaptation
59
Automatic Rule Adaptation. SL + best TL picked by user.
60
Automatic Rule Adaptation. Changing "grande" into "gran".
61
Automatic Rule Adaptation
62
Automatic Rule Adaptation. [Figure labels: 1, 2, 3.]
63
Input to RR module. Automatic Rule Adaptation. User correction log file, plus transfer-engine output (+ parse tree):

sl: I see them
tl: VEO LOS
tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,0 (PRON,2:3 "LOS") ) ) ) )>

sl: I see cars
tl: VEO AUTOS
tree: <((S,0 (VP,3 (VP,1 (V,1:2 "VEO") ) (NP,2 (N,1:3 "AUTOS") ) ) ) )>
64
Completed Work: types of RR operations. Automatic Rule Adaptation.

Grammar:
1. R0 → R0 + R1 [= R0' + constr]; Cov[R0] → Cov[R0, R1] (bifurcate)
2. R0 → R1 [= R0 + constr = -] + R2 [= R0' + constr =c +]; Cov[R0] → Cov[R1, R2] (bifurcate: general case with a blocking constraint plus a specific case with a triggering constraint, e.g. pre-nominal adjectives)
3. R0 → R1 [= R0 + constr]; Cov[R0] → Cov[R1] (refine)

Lexicon:
1. Lex0 → Lex0 + Lex1 [= Lex0 + constr]
2. Lex0 → Lex1 [= Lex0 + constr]
3. Lex0 → Lex0 + Lex1 [Lex0 + TL word]
4. → Lex1 (adding a lexical item)

[Speaker notes: Grammar: bifurcate leaves the original rule as is and modifies the duplicate (maybe changing word order and/or making it more specific, e.g. VP: V NP-pron in Spanish); refine makes a rule more specific (e.g. a missing agreement constraint). Coverage is in terms of TL patterns: every time R0 is modified into R0', coverage on the TL side increases. Lexicon examples: 1. cayeron → se cayeron; 2. woman + [animate = +]; 3. OVW; 4. missing sense.]
65
Manual vs Learned Grammars
Automatic Rule Adaptation. Manual inspection and automatic MT evaluation [AMTA 2004]. Same test set as the user study (32 sentences); same manual grammar, plus an automatically learned grammar built with AVENUE's Rule Learning module [Probst et al. 2002] (with added feature constraints). Manual inspection looked at the first five translations; automatic evaluation compared the best translation from each grammar:

NIST / BLEU / METEOR — Manual grammar: 4.3 / 0.16 / 0.60; Learned grammar: 3.7 / 0.14 / 0.55.

[Speaker notes: the automatic evaluation reflects the findings of the manual evaluation: even though the manual grammar's output is better, it is also the same in many cases, and the scores are not that different. Different types of errors require different types of RR operations. Learned grammars have a higher level of lexicalization and need the RR module to achieve the appropriate level of generalization. Conclusions: the manual grammar will need to be refined to encode exceptions, whereas the learned grammar will need to be refined to achieve the right level of generalization; we expect RR to give the most leverage when combined with the learned grammar.]
66
Human Oracle experiment
Completed Work: human oracle experiment. Automatic Rule Adaptation. As a feasibility experiment, we compared raw MT output with manually corrected MT output; the difference is statistically significant (confidence-interval test). This gives an upper bound on how much difference we should expect any refinement approach to make. [Speaker note: NIST is the least discriminative of the metrics, since for bigrams that appear only once (a very small set) the system does not get any credit.]
67
Proposed Work: active learning. Automatic Rule Adaptation. Minimize the number of examples a human annotator must label [Cohn et al. 1994], usually by processing examples in order of usefulness; here, minimize the number of minimal pairs presented to users. [Speaker notes: [Lewis and Catlett 1994] used uncertainty as a measure of usefulness. [Callison-Burch 2003] proposed AL to reduce the cost of creating a corpus of labeled training examples for statistical MT. To do: add an example of an RR operation requiring AL and a minimal pair.] A sketch of uncertainty-ordered selection follows below.
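A minimal sketch of ordering candidate minimal pairs by usefulness, using an uncertainty score in the spirit of [Lewis and Catlett 1994] (the scoring function here is a toy stand-in, not the thesis method):

```python
from typing import Callable, List

def order_by_usefulness(candidates: List[str],
                        uncertainty: Callable[[str], float]) -> List[str]:
    """Present the most uncertain (most informative) examples first,
    so fewer user judgments are needed overall."""
    return sorted(candidates, key=uncertainty, reverse=True)

# Toy usage: uncertainty approximated by the number of competing
# refinement hypotheses still compatible with each candidate sentence.
hypotheses_left = {"I see them": 3, "I see her": 2, "I see cars": 1}
queue = order_by_usefulness(list(hypotheses_left), hypotheses_left.get)
print(queue)   # ['I see them', 'I see her', 'I see cars']
```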