Published by Sophie Rich. Modified over 9 years ago.
1
Generalising lexical translation strategies for MT using comparable corpora
Bogdan Babych, Serge Sharoff, Anthony Hartley
Centre for Translation Studies, University of Leeds, Leeds, UK
{b.babych,s.sharoff,a.hartley}@leeds.ac.uk
2
29 May 2008 LREC 2008 Generalising Lexical Translation Strategies for MT
Overview
Indirect translation equivalents in MT: current limitations
Increasing the range of translation equivalents used by MT
–Equivalent-oriented vs. strategy-oriented approaches
–Methodology for discovering translation strategies using comparable corpora
–Applications for terminology research
Conclusions and future work
3
Indirect equivalents in MT
Data-driven MT (statistical & example-based)
–Reusing equivalents learnt from parallel corpora
–Problem: lack of generalisation
  Equivalents expressed as word patterns
  Do not generalise beyond lemmas
–Cannot generate indirect equivalents for ‘unseen’ expressions
  Difficult to maintain many specific patterns
Fundamental limits on the range of translation solutions generated by MT
4
Indirect equivalents: Change of perspective
Problems for MT: non-fluent translations & mistranslations
Ru: Из кризисов такого рода как парламентский можно выходить за счет демократических методов.
–lit.: 'From crises of such type as parliamentary it is possible to go out by means of democratic methods'
–RBMT: Such as parliamentary it is possible to leave crises due to democratic methods.
–SMT: This kind of crisis as a parliamentary, can go through democratic methods.
HT: We can escape crises like these through democratic means
5
From equivalents to lexical translation strategies
Indirect equivalents = ‘creative’ solutions to non-trivial problems
Parallel corpora: too small, sparse and specialised
–The same problem often solved idiosyncratically: no clear statistical model
–Set of ‘indirect’ translation problems is open
Our solution: higher-order model
–Generalising classes of equivalents as strategies
  By similarity of usage in comparable corpora
  Equivalents to unseen expressions are generated from discovered strategies
6
Current methodology
One fixed strategy: rephrasing words using similarity of ‘collocation vectors’ ~ near-synonyms
Generator of equivalents from ASSIST project
–выходить из кризиса (go out of crisis) ~ {to approach, to face, to get over} crisis
  Выходить (go out) .sim задходить (come) .dict + collocations of (crisis) to approach
No other strategies yet implemented
–Transposition (change of syntactic perspective)
–Modulation (change of lexical perspective) …
–Further goal: to find ~ escape from crisis … via …
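The re-phrasing strategy above can be illustrated with a minimal sketch. It is not the ASSIST generator: the collocation counts, the dictionary entries, the similarity threshold and the function names `cosine` and `rephrase` are all hypothetical, chosen only to show the three steps (find near-synonyms by collocation-vector similarity, look them up in a dictionary, keep candidates that collocate with the target noun).

```python
import math
from collections import Counter

# Toy data, invented for illustration (not taken from the ASSIST project).
RU_COLLOCATES = {
    "выходить": Counter({"кризис": 8, "тупик": 5, "положение": 4}),
    "преодолевать": Counter({"кризис": 7, "тупик": 3, "трудность": 6}),
    "читать": Counter({"книга": 9, "газета": 6}),
}
RU_EN_DICT = {"преодолевать": ["overcome", "get over"], "читать": ["read"]}
EN_COLLOCATES = {"crisis": {"overcome", "get over", "face", "approach"}}


def cosine(a, b):
    """Cosine similarity of two sparse collocation vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rephrase(ru_verb, en_noun, threshold=0.5):
    """Suggest English verbs for ru_verb that collocate with en_noun."""
    candidates = []
    for other, vector in RU_COLLOCATES.items():
        if other == ru_verb:
            continue
        # Step 1: keep only distributionally similar source words.
        if cosine(RU_COLLOCATES[ru_verb], vector) < threshold:
            continue
        # Step 2: translate the near-synonym with the bilingual dictionary.
        for en_verb in RU_EN_DICT.get(other, []):
            # Step 3: filter by target-language collocations of the noun.
            if en_verb in EN_COLLOCATES.get(en_noun, set()):
                candidates.append(en_verb)
    return candidates


print(rephrase("выходить", "crisis"))
```

With this toy data only "преодолевать" passes the similarity threshold, so the suggestions come from its dictionary entry; real collocation vectors would be far larger and the threshold would need tuning.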
7
Strategy evaluation
Coverage of problems vs. coverage of solutions
Several strategies cover the same problem (variation)
–Ru: Механизм принятия решений будет публичным. (lit.: 'The mechanism of making decisions will be public')
–публичный механизм (‘public mechanism’)
  Public process / … a greater public interaction (current re-phrasing strategy)
  The answer will come from the people. (change-of-perspective strategy)
It is harder to match solutions: diversity of strategies
8
Coverage of translation problems by re-phrasing strategy
Characterising linguistic productivity of the strategy
Experiment: 12 translators suggest indirect solutions to the same set of problems
–36 translation problems (25 Ru & 11 En)
–210 different human solutions (5.83 solutions / problem)
Task of the system: to generate a possible solution for each problem
9
Coverage of translation problems by re-phrasing strategy
For 75% of problems: at least 1 match by re-phrasing strategy
Average coverage of a set of human solutions: 34.7%
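The two figures above can be read as two different coverage measures. The sketch below shows one plausible way to compute them; the slides do not spell out the exact definitions, so both are assumptions: "problem coverage" as the share of problems with at least one generated solution matching a human one, and "solution coverage" as the per-problem share of human solutions matched, averaged over problems. The data and the function name `coverage` are invented for illustration.

```python
def coverage(human, system):
    """Return (problem coverage, average solution coverage).

    human:  dict mapping each problem to a non-empty set of human solutions
    system: dict mapping each problem to the set of generated solutions
    """
    matched_problems = 0
    per_problem = []
    for problem, solutions in human.items():
        hits = solutions & system.get(problem, set())
        if hits:
            matched_problems += 1
        # Share of this problem's human solutions that the system reproduced.
        per_problem.append(len(hits) / len(solutions))
    return matched_problems / len(human), sum(per_problem) / len(per_problem)


# Toy data: four problems with made-up solution labels.
human = {
    "p1": {"a", "b", "c", "d"},
    "p2": {"e", "f"},
    "p3": {"g"},
    "p4": {"h", "i"},
}
system = {"p1": {"a", "x"}, "p2": {"e", "f"}, "p3": set(), "p4": {"h"}}

problem_cov, solution_cov = coverage(human, system)
print(problem_cov, solution_cov)
```

The toy example makes the gap between the two measures visible: a system can match something for most problems while reproducing only a minority of the solutions translators proposed.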
10
Coverage of translation solutions by re-phrasing strategy
Comparing coverage of indirect equivalents by:
–(1) bilingual dictionary solutions (Oxford Russian)
–(2) solutions extracted from word alignment in parallel corpus:
  Training set: Ru-En news, 700k words
  Test set: Euronews Ru-En interviews, 100k words
–(3) strategy-based (i.e. re-phrasing) solutions:
  Collocation vectors from monolingual corpora (BNC, RNC) ~100M
  Filtered by co-occurrence in news corpora ~200M
11
Coverage of solutions by re-phrasing strategy
Task of the system: to generate an exact solution for each problem
12
Coverage of solutions by re-phrasing strategy: Conclusions
Learning individual equivalents is not efficient
–Low coverage of unseen problems
–Lower generalisation of idiosyncratic alignments
Re-phrasing strategy: productive but not sufficient
13
On-going project: beyond re-phrasing strategy
Modelling transposition and modulation strategies
–Learning strategies from parallel data
–Aligning ‘indirect’ solutions (discontinuous MWEs)
  выходить из кризиса (go out of crisis) ~ escape crisis
–Generalising equivalents with similarity classes
–Covering unseen expressions:
  {Выходить / выводить…} из {конфликта / застоя / депрессии…} (go out / lead out from conflict, stagnation, depression)
  to escape conflict / controversy, to flee difficulty, to survive disaster / tragedy …
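Generalising one aligned pair into a strategy with similarity classes might look like the sketch below. It is only an illustration of the idea on this slide, not the project's implementation: the class membership, the noun dictionary, the English collocation list and the names `STRATEGY` and `generate` are all hypothetical, and nouns are given as lemmas rather than in the genitive forms shown above.

```python
# One learnt strategy: a class of source verbs plus a class of source nouns
# maps to a class of target verbs (all membership invented for illustration).
STRATEGY = {
    "src_verbs": {"выходить", "выводить"},
    "src_nouns": {"кризис", "конфликт", "застой", "депрессия"},
    "tgt_verbs": ["escape", "flee", "survive"],
}
NOUN_DICT = {
    "кризис": "crisis",
    "конфликт": "conflict",
    "застой": "stagnation",
    "депрессия": "depression",
}
# Verb-noun pairs assumed to be attested in a target-language corpus.
EN_COLLOCATIONS = {
    ("escape", "crisis"),
    ("survive", "crisis"),
    ("escape", "conflict"),
    ("escape", "stagnation"),
    ("survive", "depression"),
}


def generate(ru_verb, ru_noun):
    """Generate indirect equivalents for a possibly unseen verb + noun pair."""
    if ru_verb not in STRATEGY["src_verbs"] or ru_noun not in STRATEGY["src_nouns"]:
        return []  # the strategy does not apply to this expression
    en_noun = NOUN_DICT[ru_noun]
    # Keep only target verbs that actually collocate with the translated noun.
    return [
        f"{en_verb} {en_noun}"
        for en_verb in STRATEGY["tgt_verbs"]
        if (en_verb, en_noun) in EN_COLLOCATIONS
    ]


print(generate("выводить", "застой"))
print(generate("выходить", "кризис"))
```

The point of the design is that "выводить из застоя" never has to appear in parallel data: it is covered because both of its words fall into classes attached to an already discovered strategy.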
14
MT-oriented evaluation
Improvements for incomprehensible translations and mistranslations:
Es: Es verdad que empezamos vacilantes pero era lógico. (lit.: started hesitant)
HT: Of course we had our doubts to begin with but that's normal
SMT: It is true that we started to waver but was logical (unacceptable literal translation)
–empezar vacilante ~ begin doubt (modulation)
–Indirect solutions: we had our fears / doubts to start with; we began with fear / scepticism / worries...; we were not convinced then; after our early scepticism; we were soon / gradually / quickly convinced
15
Application to terminological research
Terminological equivalents are usually direct
–Rarely change lexical or syntactic perspective
–Standard fixed equivalents preferred
Distributional similarity framework
–Yields a network of related terms (not paraphrases)
–Useful for automating terminological research
Prototype terminological workbench for translators
–English-French corpora in a specialised domain (2M words in total); Giza alignments; termbanks
–Translators explore systems of related terms
16
Terminological interface for translators
French term plan and the English term plain
17
Conclusions and future work
Making testable predictions for indirect equivalents
–Model for re-phrasing, transposition & modulation strategies
–Match human translators’ solutions for unseen phrases
Future work
–Automatic identification of phrases which need non-literal translation
–Building fluent equivalents around solutions
–Integrating strategy-based generator into SMT decoder
–Evaluation of the improvement in coverage
–Evaluation of the productivity / reusability of strategies