LING 575: Seminar on Statistical Machine Translation
Spring 2010, Lecture 4
Kristina Toutanova, MSR & UW
With slides borrowed from Philipp Koehn and Hwee Tou Ng
Overview
- Assignments: notes about HW1; final projects
- Decoding for machine translation
- Brief review of search in AI: search space, costs
- Search for phrase-based translation: representing hypotheses, pruning, multi-stack decoding
Notes about Homework 1
- Updated version with clarifications on how to run the Moses experiments
- Do we need an extension of the due date?
Final projects
- Proposals due 4/27
- Updates due 5/11
- Final reports due June 1
- Presentations due June 1
Final project scope
- About four times the work of one homework assignment, per person
- Project proposal: a short (one- or two-paragraph) description of what the problem and general approach are, who is in the group and who will do what, and what data you are using
- Project update: a one-page description of what you have done so far, with some preliminary results if possible
- Final report: a four- to eight-page description of the problem and results
- Final presentation: depending on the number of groups, an x-minute presentation
What this lecture is about
- A language model and a translation model define scores for candidate translations
- A decoder searches for the (approximately) highest-scoring translation, using a search algorithm
Search: a review from introductory AI classes
[Figure: map of Romania with step costs in km]
(Slide copied from Hwee Tou Ng's AI course slides)
Search problem formulation
A problem is defined by four items:
1. Initial state, e.g., "at Arad"
2. Actions or successor function S(x) = set of (action, successor-state) pairs, e.g., S(Arad) = { ⟨Arad → Zerind, Zerind⟩, … }
3. Goal test, e.g., x = "at Bucharest"
4. Path cost (additive), e.g., sum of distances or number of actions executed; c(x, a, y) is the step cost, assumed ≥ 0 in standard AI search but not guaranteed in MT
A solution is a sequence of actions leading from the initial state to a goal state. We need to find the lowest-cost solution. For MT, we can define cost as the negative score (max score = min cost).
(Slide copied from Hwee Tou Ng's AI course slides)
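As a concrete instance of this formulation, here is a minimal Python sketch of the Romania route-finding problem from the map above. The step costs are the standard textbook (AIMA) figures for a subset of the map, included only for illustration.

```python
# A search problem as (initial state, successor function, goal test, step cost).
# Step costs in km are the standard AIMA Romania-map figures; only a subset
# of the map is shown here.
GRAPH = {
    "Arad":           {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu":          {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras":        {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti":        {"Rimnicu Vilcea": 97, "Bucharest": 101},
}

INITIAL_STATE = "Arad"

def successors(state):
    """S(x): the set of (action, successor state) pairs reachable from `state`."""
    return [(f"go to {nxt}", nxt) for nxt in GRAPH.get(state, {})]

def is_goal(state):
    return state == "Bucharest"

def step_cost(state, action, next_state):
    """c(x, a, y): cost of taking `action` in `state`, ending in `next_state`."""
    return GRAPH[state][next_state]
```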
How to search: tree search algorithms
Basic idea: offline, simulated exploration of the state space, generating successors of already-explored states (a.k.a. expanding states) and building up a tree of explored states.
Best-first search
Idea: use an evaluation function f(n) for each node, an estimate of its "desirability", and expand the most desirable unexpanded node.
Implementation: order the nodes in the fringe in decreasing order of desirability.
Special cases: greedy best-first search, A* search.
(Slide copied from Hwee Tou Ng's AI course slides)
[Figure: map of Romania with step costs in km, shown again]
(Slide copied from Hwee Tou Ng's AI course slides)
A* best-first search
Evaluation function: f(n) = g(n) + h(n), where
- g(n) = cost of the path to node n
- h(n) = heuristic estimate of the cost from n to the closest goal, e.g., h_SLD(n) = straight-line distance from n to Bucharest
A* uses an admissible heuristic, one that does not overestimate the cost of the best path from n to a goal state. This property makes A* optimal.
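A compact, generic A* sketch over the problem interface above, using the straight-line distance to Bucharest as the admissible heuristic (the standard textbook values; again only illustrative, not MT-specific code):

```python
import heapq

# Straight-line distances to Bucharest (AIMA figures), used as h(n).
# They never overestimate the true road distance, so h is admissible.
H_SLD = {"Arad": 366, "Zerind": 374, "Timisoara": 329, "Sibiu": 253,
         "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100,
         "Bucharest": 0}

def a_star(initial, successors, is_goal, step_cost, h):
    """Expand nodes in order of f(n) = g(n) + h(n); return (cost, path)."""
    fringe = [(h(initial), 0, initial, [initial])]   # (f, g, state, path)
    best_g = {initial: 0}
    while fringe:
        f, g, state, path = heapq.heappop(fringe)
        if is_goal(state):
            return g, path
        for action, nxt in successors(state):
            g2 = g + step_cost(state, action, nxt)
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(fringe, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

# a_star("Arad", successors, is_goal, step_cost, H_SLD.get)
# -> (418, ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'])
```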
A* search example
[Figure sequence not reproduced]
Search in phrase-based translation
Basic phrase translation
Decisions: target sentence, segmentation, and alignment, given the source sentence.
- The source sentence is segmented into source phrases (not a linguistically motivated segmentation)
- Each source phrase is translated into a target phrase, independently of the other source phrases and their translations
- The resulting target phrases are re-ordered to form the output
Translation as a sequence of actions: the decoding process
- Build the translation from left to right
- Select a sequence of foreign words to be translated
- Select an English phrasal translation from the phrase table
- Append the English words to the end of the partial translation
- Mark the foreign words as translated, so we know not to select them again later on
Decoding process: worked example (figure sequence not reproduced)
- One-to-many translation
- Many-to-one translation
- Reordering
- Translation finished (reached a goal)
Phrase translation options for a source sentence
Many different phrase-translation options are available for a sentence; we can look them all up before starting to decode.
Decoding organization
Each sequence of actions we explore defines a partial translation hypothesis; translation hypotheses are the analogue of search nodes in general search. The data we keep in each translation hypothesis should be sufficient to:
- tell us which actions are applicable (we need to know which foreign words have been translated)
- tell us the cost so far (in the examples we will use probability instead, multiplying probabilities rather than adding costs)
- allow us to compute the cost of each possible next action
- allow us to read off the target translation
Search hypotheses and the initial hypothesis
Start with an initial empty hypothesis (a data-structure sketch follows):
- e: no English words have been output
- f: no foreign words have been translated
- the probability so far is 1 (we will multiply in the probability of each next action)
- end of previous phrase = 0 (not shown here, but we need the end position of the previous phrase in f for the distortion model computation)
- previous 2 English words (not shown, but we need them for the language model computation)
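A minimal sketch of what such a hypothesis might look like in Python. The field names are illustrative, not Moses internals; the `<s>` start symbols for the empty LM state are an assumption. It stores exactly the information listed above, plus a parent link for reading off the translation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Hypothesis:
    coverage: frozenset             # indices of source words already translated
    last_src_end: int               # end of the previous source phrase (for distortion)
    last_two_en: Tuple[str, ...]    # last 2 English words (trigram LM state)
    prob: float                     # probability of the partial translation so far
    phrase_en: Tuple[str, ...]      # English words this hypothesis appended
    parent: Optional["Hypothesis"]  # back-pointer for reading off the output

def initial_hypothesis() -> Hypothesis:
    # Empty hypothesis: nothing translated, probability 1.
    # "<s>" sentence-start symbols are an assumed convention for the LM state.
    return Hypothesis(coverage=frozenset(), last_src_end=0,
                      last_two_en=("<s>", "<s>"), prob=1.0,
                      phrase_en=(), parent=None)
```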
Hypothesis expansion
Pick a translation option and create the next hypothesis using this action:
- e: "Mary" is in the output
- f: "Maria" has been translated
- p: the probability of the partial translation so far
Computing the probability of actions
The probability of an action depends on the models used (a scoring sketch follows):
- Translation models: phrasal probabilities in both directions, lexical weighting probabilities, word count, phrase count
- Reordering model probability: can be computed given the current phrase pair and the source positions of the current and previous phrases
- Language model probability: can be computed given the English side of the current phrase pair and the last 2 previous English words (for a trigram LM)
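A hedged sketch of scoring one expansion step. The model interfaces `phrase_table`, `lm`, and `distortion` are hypothetical stand-ins, not Moses APIs, and only one phrase-translation direction and a trigram LM are shown.

```python
def action_prob(hyp, src_phrase, tgt_words, src_start, src_end,
                phrase_table, lm, distortion):
    """Probability of translating src_phrase as tgt_words next.

    `phrase_table`, `lm`, and `distortion` are hypothetical callables;
    real systems combine more features (both phrase directions,
    lexical weights, word/phrase penalties).
    """
    p = phrase_table(src_phrase, tgt_words)        # translation model
    # Reordering: depends only on the end of the previous source phrase
    # and the start of the current one.
    p *= distortion(hyp.last_src_end, src_start)
    # Trigram LM: each new English word is conditioned on the previous
    # two words, starting from the hypothesis's LM state.
    w1, w2 = hyp.last_two_en
    for w in tgt_words:
        p *= lm(w, (w1, w2))
        w1, w2 = w2, w
    return p
```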
Hypothesis expansion (continued)
- Add further hypotheses using other translation options
- Continue until all foreign words are translated
- Trace the parent links back to the beginning to collect the full translation (see the sketch below)
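Continuing the illustrative `Hypothesis` structure above, a sketch of one expansion step and of reading off the final translation by tracing parent links:

```python
def expand(hyp, src_start, src_end, src_phrase, tgt_words, models):
    """Create the successor hypothesis for one translation option.
    `models` is the (phrase_table, lm, distortion) triple from the
    scoring sketch above."""
    new_words = tuple(tgt_words)
    return Hypothesis(
        coverage=hyp.coverage | frozenset(range(src_start, src_end)),
        last_src_end=src_end,
        last_two_en=(hyp.last_two_en + new_words)[-2:],
        prob=hyp.prob * action_prob(hyp, src_phrase, new_words,
                                    src_start, src_end, *models),
        phrase_en=new_words,
        parent=hyp)

def read_off_translation(hyp):
    """Trace parent links back to the initial hypothesis and collect phrases."""
    phrases = []
    while hyp is not None:
        phrases.append(hyp.phrase_en)
        hyp = hyp.parent
    return [w for ph in reversed(phrases) for w in ph]
```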
Hypothesis expansion
The search space explodes: it grows exponentially with sentence length.
Explosion of the search space
The search graph grows exponentially with sentence length, and due to the number of possible re-orderings the problem is NP-complete [Knight, 1999]. We need to reduce the search space:
- Recombine equivalent hypotheses (loss-less, risk-free pruning)
- Apply other kinds of pruning: histogram pruning, threshold pruning (see the sketch below)
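Both pruning styles are easy to state over a stack of hypotheses. A sketch, assuming hypotheses are compared by a score function (ideally the future-cost-adjusted score discussed later); the parameter values are illustrative, not Moses defaults.

```python
def prune(stack, score, max_size=100, threshold=1e-4):
    """Histogram pruning: keep at most `max_size` hypotheses.
    Threshold pruning: drop hypotheses whose score is worse than
    `threshold` times the best score in the stack."""
    if not stack:
        return stack
    stack = sorted(stack, key=score, reverse=True)   # best first
    best = score(stack[0])
    stack = [h for h in stack if score(h) >= threshold * best]
    return stack[:max_size]
```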
Hypothesis recombination
Example: different paths lead to the same English output in partial hypotheses; they correspond to different phrasal segmentations.
Hypothesis recombination
Combine equivalent hypotheses and drop the weaker one; the weaker path is still available for lattice generation.
Hypothesis recombination
The merged hypotheses do not need to match completely; we just need them to have the same best path to completion, i.e., the same applicable future expansions with the same scores: the same last 2 English words, coverage vector, and source position of the last phrase. Since any path that goes through the worse hypothesis can be changed to use the path to the better hypothesis and then the same path to the end, we lose nothing.
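In code, recombination amounts to keying hypotheses on exactly the state that future expansions can see; a sketch over the illustrative fields above:

```python
def recombination_key(hyp):
    """Hypotheses agreeing on this key have the same future expansions
    with the same scores, so only the better one needs to be kept."""
    return (hyp.last_two_en,        # trigram LM state
            hyp.coverage,           # which source words remain untranslated
            hyp.last_src_end)       # needed by the distortion model

def recombine(stack):
    best = {}
    for h in stack:
        k = recombination_key(h)
        if k not in best or h.prob > best[k].prob:
            best[k] = h             # drop (or lattice-link) the weaker one
    return list(best.values())
```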
Pruning
Hypothesis stacks
The i-th stack contains hypotheses in which i source words have been translated. A sketch of the full loop follows.
- Process the stacks in order
- Expand all hypotheses in a stack
- Place the expanded hypotheses on the corresponding stacks
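Putting the earlier sketches together, the multi-stack decoding loop might look like this. Translation-option lookup is abstracted as `options(coverage)`, a hypothetical helper returning applicable (src_start, src_end, src_phrase, tgt_words) tuples; for clarity, pruning here compares raw probabilities, which the next slides refine with a future-cost estimate.

```python
def decode(src_len, options, models, stack_size=100):
    """Multi-stack decoding sketch: stacks[i] holds hypotheses covering
    exactly i source words."""
    stacks = [[] for _ in range(src_len + 1)]
    stacks[0].append(initial_hypothesis())
    for i in range(src_len):
        stacks[i] = recombine(stacks[i])
        stacks[i] = prune(stacks[i], score=lambda h: h.prob,
                          max_size=stack_size)
        for hyp in stacks[i]:
            for (start, end, src_phrase, tgt_words) in options(hyp.coverage):
                new = expand(hyp, start, end, src_phrase, tgt_words, models)
                stacks[len(new.coverage)].append(new)
    # Goal: all source words covered; return the best complete hypothesis.
    return max(stacks[src_len], key=lambda h: h.prob, default=None)
```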
How to compare hypotheses
So far we only have the probability (cost) of each hypothesis. Comparing hypotheses that have translated the same number of words makes these costs more comparable. Can we do better than comparing based on cost so far?
Comparing hypotheses
When comparing two hypotheses that have translated the same number of words, the one translating an easier part of the sentence is preferred. We can do better by also considering the future cost of translating the rest of the source words.
Estimating future cost
The closer our estimate is to the correct future cost, the better for our search, but computing the estimate should not take too long. A future cost estimate that is less than or equal to the true cost (optimistic) guarantees optimality in A* search, but this has usually been too slow in practice, so we do not use A* with admissible heuristics.
Estimating future cost
The future cost is the sum of the costs of the actions (translations) we will take in the future. We can estimate the cost of each translation option for the sentence:
- Translation probabilities: context independent
- Language model: context dependent, so we approximate, e.g., P(to) · P(the | to)
- Reordering model cost: ignored, since it cannot be estimated without context
Probability for an option = LM × TM
Future cost estimation
Find the cost of the cheapest translation option for a given source phrase (the one with the highest probability).
Future cost estimation
For each span of the source sentence (each contiguous sequence of words), compute the cost of the cheapest combination of translation options. This can be done efficiently using dynamic programming, as in the sketch below.
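A sketch of the dynamic program. Here `cheapest_option(i, j)` is a hypothetical lookup returning the best single-option LM×TM probability for source span [i, j), and the table is filled bottom-up over span lengths.

```python
import math

def future_cost_table(src_len, cheapest_option):
    """cost[i][j]: estimated cost (negative log probability) of translating
    source span [i, j). `cheapest_option(i, j)` returns the best LM*TM
    probability of any single option covering the span, or 0.0 if none."""
    INF = float("inf")
    cost = [[INF] * (src_len + 1) for _ in range(src_len + 1)]
    for length in range(1, src_len + 1):
        for i in range(src_len - length + 1):
            j = i + length
            p = cheapest_option(i, j)
            if p > 0.0:
                cost[i][j] = -math.log(p)
            # Or split the span into two cheaper sub-spans.
            for k in range(i + 1, j):
                cost[i][j] = min(cost[i][j], cost[i][k] + cost[k][j])
    return cost
```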
Estimating the combined score of a hypothesis
Add up the costs of the contiguous spans of untranslated words to compute the future cost; add the future cost to the cost so far to compute the combined score used for pruning (sketch below).
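Given the table above, the future cost of a hypothesis is the sum over its contiguous uncovered spans; a sketch using the illustrative fields from earlier:

```python
import math

def combined_score(hyp, cost, src_len):
    """Cost so far plus estimated future cost; lower is better."""
    future = 0.0
    i = 0
    while i < src_len:
        if i in hyp.coverage:
            i += 1
            continue
        j = i
        while j < src_len and j not in hyp.coverage:
            j += 1
        future += cost[i][j]    # cheapest estimate for uncovered span [i, j)
        i = j
    return -math.log(hyp.prob) + future
```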
Limits on reordering
Limits on reordering can reduce the search space dramatically:
- Monotone decoding: target phrases follow the same order as the source phrases
- Reordering limit n (used in Moses): forbid jumps with distance greater than n; results in polynomial-time inference
In addition to the speed-up, reordering limits often lead to improved translations, because the reordering models are weak.
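The reordering limit is just a filter on which expansions are allowed; a sketch (Moses' actual distortion-limit bookkeeping is somewhat more involved, and the default value here is illustrative):

```python
def within_reordering_limit(hyp, src_start, limit=6):
    """Forbid jumps between the end of the previous source phrase and the
    start of the next one whose distance exceeds `limit`."""
    return abs(src_start - hyp.last_src_end) <= limit
```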
Word lattices
We can easily extract a word lattice from the search graph, and we can extract n-best translation hypotheses. N-best lists are used for discriminative re-ranking and for training the log-linear model parameters.
Summary: search in phrase-based translation
- Decomposing translation into a sequence of actions
- Building partial translation hypotheses from left to right
- Computing cost by adding up the costs of actions
- Recombining hypotheses for loss-less memory and time savings
- Pruning based on estimated score, for risky search-space reduction
- Organizing hypotheses in comparable stacks
- Estimating future cost
Readings and next time
For this week: Chapter 6 of the SMT textbook, on decoding. Optional: Germann et al. (2003), a nice paper comparing optimal and other decoders for IBM Model 4.
For next week: starting on tree-based models.