
1 DP-based Search Algorithms for Statistical Machine Translation
Presenter: Mauricio Zuluaga
Based on Christoph Tillmann's presentation and "Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation", C. Tillmann, H. Ney

2 Computational Challenges in M.T.
Source sentence f (French), target sentence e (English).
Bayes' rule: Pr(e|f) = Pr(e) · Pr(f|e) / Pr(f)
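Since Pr(f) is constant for a given input sentence, the decoder can drop it from the maximization; written out as the search criterion:

```latex
\hat{e} \;=\; \operatorname*{argmax}_{e}\, \Pr(e \mid f)
        \;=\; \operatorname*{argmax}_{e}\, \Pr(e)\cdot\Pr(f \mid e)
```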

3 Computational Challenges in M.T.
Estimating the language model probability Pr(e) (L.M. problem; trigram).
Estimating the translation model probability Pr(f|e) (translation problem).
Finding an efficient way to search for the English sentence that maximizes the product (search problem). We want to focus only on the most likely hypotheses during the search.

4 Approach based on Bayes' rule
[Diagram: source language text → transformation → global search (maximize Pr(e) · Pr(f|e) over the language model and the translation model) → inverse transformation → target language text.]

5 Model Details
Trigram language model.
Translation model (simplified):
1. Lexicon probabilities
2. Fertilities
3. Class-based distortion probabilities: "Here, j is the currently covered input sentence position and j′ is the previously covered input sentence position. The input sentence length J is included, since we would like to think of the distortion probability as normalized according to J." [Tillmann]
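The formulas on this slide were images and did not survive transcription. A hedged reconstruction from the definitions above (the exact conditioning of the distortion model on word classes varies between model variants):

```latex
\Pr(e_1^I) \approx \prod_{i=1}^{I} p(e_i \mid e_{i-2}, e_{i-1})
\;\;\text{(trigram LM)}, \qquad
p(f \mid e) \;\;\text{(lexicon)}, \qquad
p(j \mid j', J) \;\;\text{(distortion)}
```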

6 Model Details (Model 4 vs. Model 3)
Same except in the handling of distortion probabilities. In Model 4 there are two separate distortion probabilities: one for the head of a tablet and one for the remaining words of the tablet. The probability depends on the previous tablet and on the identity (class) of the French word being placed (e.g., adjectives appear before nouns in English but after them in French). "We expect d1(−1 | A(e), B(f)) to be larger than d1(+1 | A(e), B(f)) when e is an adjective and f is a noun. Indeed, this is borne out in the trained distortion probabilities for Model 4, where we find that d1(−1 | A(government's), B(développement)) is 0.7986, while d1(+1 | A(government's), B(développement)) is 0.0168." A and B are class functions of the English and French words (in this implementation |A| = |B| = 50 classes).

7 Decoder
DP-based beam search decoder for IBM Model 4, described in "Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation", C. Tillmann, H. Ney. Others have followed different approaches for decoders. This is the part where we have to be efficient!

8 Example Alignment
Word-to-word alignment (source to target); the alignment is hidden. German source: "In diesem Fall kann mein Kollege Sie am vierten Mai nicht besuchen." English target: "In this case my colleague can not visit you on the fourth of May."

9 Inverted Alignments
[Figure: target positions i−1 and i mapped to source positions.]
Inverted alignment (target to source): each target position i is assigned a source position.
Coverage constraint: introduce a coverage vector over the source positions.
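A hedged sketch of the missing notation, following Tillmann & Ney: the inverted alignment maps each target position to a source position, and the coverage set records which source positions have already been translated:

```latex
b_i \in \{1,\dots,J\} \;\; (i = 1,\dots,I), \qquad
C = \{\, b_k \mid k = 1,\dots,i \,\} \subseteq \{1,\dots,J\}
```

At the end of the search, every source position must be covered, i.e. |C| = J.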

10 Traveling Salesman Problem
Problem: visit J cities, with costs for transitions between cities; visit each city exactly once while minimizing the overall cost. Solvable by dynamic programming (Held-Karp 1962).
Cities correspond to source sentence positions (words; coverage constraint).
Costs: the negative logarithm of the product of the translation, alignment, and language model probabilities.

11 Traveling Salesman Problem
DP with auxiliary quantity D(C, j): the cost of the shortest path from city 1 to city j visiting all cities in C. Complexity using DP: O(J² · 2^J). The order in which the cities in C are visited is not important; only the cost of the best path reaching j has to be stored. Recall that the minimum edit distance formulation was also a DP search problem.
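Not from the slides: a minimal Python sketch of the Held-Karp recursion, assuming a square cost matrix `dist` and using city 0 in place of city 1 above.

```python
from itertools import combinations

def held_karp(dist):
    """Cost of the shortest tour over all cities, starting and ending at city 0.

    dist[i][j] is the transition cost from city i to city j.
    D[(C, j)] is the cost of the best path that starts at city 0, visits
    exactly the cities in frozenset C, and ends at city j.
    """
    n = len(dist)
    D = {(frozenset([0]), 0): 0.0}
    for size in range(2, n + 1):                      # cardinality-synchronous
        for subset in combinations(range(1, n), size - 1):
            C = frozenset(subset) | {0}
            for j in subset:                          # j = last city visited
                D[(C, j)] = min(D[(C - {j}, i)] + dist[i][j]
                                for i in C - {j})
    full = frozenset(range(n))
    # Close the tour by returning to city 0.
    return min(D[(full, j)] + dist[j][0] for j in range(1, n))
```

For example, `held_karp([[0, 1, 2], [1, 0, 1], [2, 1, 0]])` returns 4.0, the cost of the tour 0 → 1 → 2 → 0.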

12 ({1},1) ({1,2},2) ({1,3},3) ({1,4},4) ({1,2,3},3) ({1,2,4},4) ({1,2,5},5) ({1,2,3},2) ({1,3,4},4) ({1,3,5},5) ({1,2,4},2) ({1,3,4},3) ({1,4,5},5) ({1,5},5) ({1,2,5},2) ({1,3,5},3) ({1,4,5},4) ({1,2,3,4,5},2) ({1,2,3,4,5},3) ({1,2,3,4,5},4) ({1,2,3,4,5},5) Final ({1,2,3,5},5) ({1,2,4,5},5) ({1,3,4,5},5) ({1,2,3,4},4) ({1,2,4,5},4) ({1,3,4,5},4) ({1,2,3,4},3) ({1,2,3,5},3) ({1,3,4,5},3) ({1,2,3,4},2) ({1,2,3,5},2) ({1,2,4,5},2)

13 ({1},1) ({1,2},2) ({1,3},3) ({1,4},4) ({1,2,3},3) ({1,2,4},4) ({1,2,5},5) ({1,2,3},2) ({1,3,4},4) ({1,3,5},5) ({1,2,4},2) ({1,3,4},3) ({1,4,5},5) ({1,5},5) ({1,2,5},2) ({1,3,5},3) ({1,4,5},4) ({1,2,3,4,5},2) ({1,2,3,4,5},3) ({1,2,3,4,5},4) ({1,2,3,4,5},5) Final ({1,2,3,5},5) ({1,2,4,5},5) ({1,3,4,5},5) ({1,2,3,4},4) ({1,2,4,5},4) ({1,3,4,5},4) ({1,2,3,4},3) ({1,2,3,5},3) ({1,3,4,5},3) ({1,2,3,4},2) ({1,2,3,5},2) ({1,2,4,5},2)

14 M.T. Recursion Equation
Complexity: exponential in J (the slide's formula was an image; with a trigram LM it is on the order of E³ · J² · 2^J), where E is the size of the target-language vocabulary (still too large…).
Maximum approximation: keep only the best-scoring partial hypothesis for each state.
*Q(e, C, j) is the probability of the best partial hypothesis (e_1 … e_i, b_1 … b_i), where C = {b_k | k = 1, …, i}, b_i = j, e_i = e, and e_{i−1} = e′.
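The recursion itself was also an image. A hedged reconstruction in the spirit of Tillmann & Ney, written with a bigram LM for brevity (the paper uses a trigram LM and the IBM-4 distortion model):

```latex
Q(e, C, j) \;=\; p(f_j \mid e)\cdot
\max_{e',\; j' \in C\setminus\{j\}}
\big[\, p(j \mid j', J)\cdot p(e \mid e')\cdot Q\big(e', C\setminus\{j\}, j'\big) \,\big]
```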

15 DP-based Search Algorithm
Input: source string f_1 … f_J
Initialization.
for each cardinality c = 1, …, J do
  for each pair (C, j), where |C| = c and j ∈ C, do
    for each target word e do
      evaluate Q(e, C, j)
Trace back: find the shortest tour and recover the optimal target word sequence.

16 IBM-Style Re-ordering (S3)
Procedural restriction: to extend a hypothesis, select one of the first 4 empty (uncovered) source positions 1 ≤ j ≤ J. This restriction yields an upper bound on the word re-ordering complexity.
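A tiny sketch of the S3 restriction (the helper name and data layout are assumptions, not from the paper):

```python
def s3_extensions(coverage, window=4):
    """Return the first `window` uncovered source positions.

    coverage[j] is True if source position j has already been translated;
    IBM-style (S3) re-ordering only allows extending into these positions.
    """
    empty = [j for j, covered in enumerate(coverage) if not covered]
    return empty[:window]

# Example: positions 0 and 3 are covered; the candidates are 1, 2, 4, 5.
print(s3_extensions([True, False, False, True, False, False, False]))
```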

17 Verb Group Re-ordering (GE)
[Same alignment figure as slide 8: "In diesem Fall kann mein Kollege Sie am vierten Mai nicht besuchen." / "In this case my colleague can not visit you on the fourth of May."]
Mostly a monotonic traversal of the source positions from left to right, so the complexity is lower than for S3.

18 Beam Search Pruning
Search proceeds cardinality-synchronously over coverage vectors.
Three pruning types:
1. Coverage pruning
2. Cardinality pruning
3. Observation pruning (the number of target words that a source word f may produce is limited)

19 Beam Search Pruning
Four kinds of thresholds:
the coverage pruning threshold t_C
the coverage histogram threshold n_C
the cardinality pruning threshold t_c (looks only at the cardinality)
the cardinality histogram threshold n_c (looks only at the cardinality)
Define new probabilities based on the uncovered positions (using only trigram and lexicon probabilities), and maintain only the hypotheses above the thresholds.

20 Beam Search Pruning
Compute the best score and apply the threshold:
1. for each coverage vector C
2. for each cardinality c, also using histogram pruning
Observation pruning: for each source word f_j, select only the best target words e.
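A minimal sketch of threshold-plus-histogram pruning for one group of hypotheses (the data layout is an assumption; in the decoder this would run once per coverage vector with t_C, n_C and once per cardinality with t_c, n_c):

```python
def prune(hyps, t_prune, n_hist):
    """Keep hypotheses scoring within a factor t_prune of the best one,
    then cap the survivors at n_hist (histogram pruning).

    hyps: dict mapping a hypothesis to a probability-like score
    (higher is better).
    """
    if not hyps:
        return {}
    best = max(hyps.values())
    # Threshold (beam) pruning relative to the best score in the group.
    kept = {h: s for h, s in hyps.items() if s >= t_prune * best}
    # Histogram pruning: keep at most n_hist of the best survivors.
    ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n_hist])
```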

21 German-English Verbmobil
German to English, IBM Model 4.
Evaluation measures: m-WER and SSER.
Training: 58K sentence pairs. Vocabulary: 8K (German), 5K (English).
Test-331: held-out data (used to tune the scaling factors for the language and distortion models).
Test-147: evaluation.

22 Effect of Coverage Pruning

Re-ordering | pruning threshold t_C | CPU time [sec] | m-WER [%]
GE          |  0.01                 |   0.21         | 73.5
GE          |  0.1                  |   0.43         | 53.1
GE          |  1.0                  |   1.43         | 30.3
GE          |  2.5                  |   4.75         | 25.8
GE          |  5.0                  |  29.6          | 24.6
GE          | 10.0                  | 630            | 24.9
S3          |  0.01                 |   5.48         | 70.0
S3          |  0.1                  |   9.21         | 50.9
S3          |  1.0                  |  46.2          | 31.6
S3          |  2.5                  | 190            | 28.4
S3          |  5.0                  | 830            | 28.3

23 TEST-147: Translation Results

Re-ordering           | CPU [sec] | m-WER [%] | SSER [%]
MON (no re-ordering)  |  0.2      | 40.6      | 28.6
GE (verb group)       |  5.2      | 33.4      | 21.4
S3 (like IBM patent)  | 13.7      | 34.2      | 20.3

24 References
"Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation", C. Tillmann, H. Ney.
"A DP based Search Using Monotone Alignments in Statistical Translation", C. Tillmann, S. Vogel, H. Ney, A. Zubiaga.
"The Mathematics of Statistical Machine Translation: Parameter Estimation", Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, Robert L. Mercer.
"Accelerated DP Based Search for Statistical Translation", C. Tillmann, S. Vogel, H. Ney, A. Zubiaga, H. Sawaf.
"Word Re-ordering and DP-based Search in Statistical Machine Translation", H. Ney, C. Tillmann.

