
1 Search Applications: Machine Translation
Next time: Constraint Satisfaction
Reading for today: See "Machine Translation Paper" under links
Reading for next time: Chapter 5

2 Homework Questions?

3 Agenda
- Introduction to machine translation
- Statistical approaches
  - Use of parallel data
  - Alignment
  - What functions must be optimized?
- Comparison of A* and greedy local search (hill climbing) algorithms for translation
  - How they work
  - Their performance

4 Approach to Statistical MT
- Translate from past experience
  - Observe how words, phrases, and sentences are translated
  - Given new sentences in the source language, choose the most probable translation in the target language
- Data: large corpus of parallel text
  - E.g., Canadian Parliamentary proceedings

5 Data
- Example
  - Ce n'est pas clair.
  - It is not clear.
- Quantity
  - 200 billion words (2004 MT evaluation)
- Sources
  - Hansards: Canadian parliamentary proceedings
  - Hong Kong: official documents published in multiple languages
  - Newspapers published in multiple languages
  - Religious and literary works

6 Alignment - the first step
- Which sentences or paragraphs in one language correspond to which sentences or paragraphs in the other language? (Or which words?)
- Problems
  - Translators don't produce word-for-word translations
  - Crossing alignments
- Types of alignment
  - 1:1 (90% of the cases)
  - 1:2, 2:1
  - 3:1, 1:3

7 An example of 2:2 alignment
- French: Quant aux eaux minerales et aux limonades, elles rencontrent toujours plus d'adeptes. En effet notre sondage fait ressortir des ventes nettement superieures a celles de 1987, pour les boissons a base de cola notamment.
- Word-for-word gloss: With regard to [the] mineral waters and the lemonades (soft drinks), they encounter still more users. Indeed our survey makes stand out sales clearly superior to those in 1987, for cola-based drinks especially.
- English translation: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above average growth rates.

8 Fertility: a word may be translated by more than 1 word
  - Notamment -> in particular (fertility 2)
  - Limonades -> soft drinks
- Fertility 0: a word translated by 0 words
  - Des ventes -> sales (des has fertility 0)
  - Les boissons a base de cola -> cola drinks
- Many to many:
  - Elles rencontrent toujours plus d'adeptes -> The growing popularity

9 Bead for sentence alignment
- A group of sentences in one language that corresponds in content to some group of sentences in the other language
- Either group can be empty
- How much content has to overlap between sentences to count as an alignment?
  - An overlapping clause can be sufficient

10 Methods for alignment
- Length based
- Offset alignment
- Word based
- Anchors (e.g., cognates)

11 Word Based Alignment
- Assume the first and last sentences of the texts align (anchors).
- Then, until most sentences are aligned:
  - Form an envelope of candidate alignments from the cartesian product of the lists of sentences; exclude alignments that cross anchors or are too distant
  - Choose pairs of words that tend to occur in alignments
  - Find pairs of source and target sentences which contain many possible lexical correspondences
  - The most reliable pairs augment the set of anchors
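A minimal Python sketch of that loop, in the spirit of word-based (Kay and Roscheisen style) alignment. The envelope test, co-occurrence scoring, and thresholds here are simplified stand-ins, not the exact procedure from the reading.

```python
from itertools import product

def crosses_anchor(i, j, anchors):
    """Candidate pair (i, j) crosses anchor (a, b) if it lies on opposite sides of it."""
    return any((i - a) * (j - b) < 0 for a, b in anchors.items())

def align_sentences(src, tgt, rounds=3, max_offset=3, min_shared=3):
    """Toy word-based sentence alignment.

    src, tgt: lists of tokenized sentences (lists of words).
    Anchors start as the first and last sentence pairs; each round, sentence
    pairs sharing many 'reliable' word correspondences become new anchors.
    """
    anchors = {0: 0, len(src) - 1: len(tgt) - 1}   # assume first/last sentences align

    for _ in range(rounds):
        # Envelope: candidate pairs that neither cross an anchor nor are too distant.
        envelope = [(i, j) for i, j in product(range(len(src)), range(len(tgt)))
                    if not crosses_anchor(i, j, anchors) and abs(i - j) <= max_offset]

        # Word pairs that tend to co-occur in candidate sentence pairs.
        cooc = {}
        for i, j in envelope:
            for w in set(src[i]):
                for v in set(tgt[j]):
                    cooc[(w, v)] = cooc.get((w, v), 0) + 1

        # Sentence pairs with many lexical correspondences augment the anchors.
        for i, j in envelope:
            shared = sum(1 for w in src[i] for v in tgt[j] if cooc.get((w, v), 0) >= 2)
            if shared >= min_shared:
                anchors[i] = j

    return sorted(anchors.items())
```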

12 The Noisy Channel Model for MT
- Language Model: P(e)
- Translation Model: P(f|e)
- Decoder: e' = argmax_e P(e|f)
(diagram: noisy channel)
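Spelling out the decoder's objective with Bayes' rule (standard noisy-channel algebra; P(f) is constant for a fixed input sentence, so it can be dropped from the argmax):

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\,P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} \; \underbrace{P(e)}_{\text{language model}} \,\underbrace{P(f \mid e)}_{\text{translation model}}
```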

13 The problem
- Language model: constructed from a large corpus of English
  - Bigram model: probability of word pairs
  - Trigram model: probability of 3 words in a row
  - From these, compute sentence probability
- Translation model: can be derived from alignment
  - For any pair of English/French words, what is the probability that the pair is a translation?
- Decoding is the problem: given an unseen French sentence, how do we determine the translation?

14 Language Model
- Predict the next word given the previous words
  - P(w_n | w_1 ... w_{n-1})
- Markov assumption
  - Only the last few words affect the next word
  - Usual cases: bigram, trigram, 4-gram
  - Sue swallowed the large green ...
- Parameter estimation (20,000-word vocabulary)
  - Bigram: 20,000 x 19,999 ≈ 400 million
  - Trigram: 20,000^2 x 19,999 ≈ 8 trillion
  - 4-gram: 20,000^3 x 19,999 ≈ 1.6 x 10^17
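A toy illustration of bigram estimation by counting (maximum likelihood, no smoothing; the tiny corpus and function names are made up for the example):

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate P(w_n | w_{n-1}) by maximum likelihood from a tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        unigrams.update(padded[:-1])                     # counts of context words
        bigrams.update(zip(padded[:-1], padded[1:]))     # counts of word pairs
    return {(prev, w): c / unigrams[prev] for (prev, w), c in bigrams.items()}

def sentence_logprob(lm, words):
    """Sum of log bigram probabilities; unseen bigrams get a tiny floor probability."""
    padded = ["<s>"] + words + ["</s>"]
    return sum(math.log(lm.get(pair, 1e-12)) for pair in zip(padded[:-1], padded[1:]))

lm = train_bigram_lm([["it", "is", "not", "clear"], ["it", "is", "clear"]])
print(lm[("is", "not")])   # 0.5: "is" occurs twice as a context, once followed by "not"
```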

15 Translation Model
- For a particular word alignment, multiply the m translation probabilities:
  - P(Jean aime Marie | John loves Mary)
  - = P(Jean|John) x P(aime|loves) x P(Marie|Mary)
- Then sum the probabilities of all alignments
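A brute-force sketch of that computation in the style of IBM Model 1 (the constant alignment prior is omitted, and the word-translation table t is assumed to be given; real decoders never enumerate alignments like this):

```python
from itertools import product

def p_f_given_e(french, english, t):
    """Sum over all word alignments of the product of word translation probabilities.

    t[(f, e)] = probability that English word e translates to French word f.
    Each French word aligns to exactly one English word.
    """
    total = 0.0
    for alignment in product(range(len(english)), repeat=len(french)):
        prob = 1.0
        for j, i in enumerate(alignment):      # j-th French word aligned to i-th English word
            prob *= t.get((french[j], english[i]), 0.0)
        total += prob
    return total

t = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9}
# Only the diagonal alignment has a nonzero product here: 0.9 * 0.8 * 0.9 = 0.648
print(p_f_given_e(["Jean", "aime", "Marie"], ["John", "loves", "Mary"], t))
```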

16 Decoding is NP-complete
- When considering any word re-ordering
  - Swapped words
  - Words with fertility > 1 (insertions)
  - Words with fertility 0 (deletions)
- Usual strategy: examine a subset of likely possibilities and choose from that
- Search error: the decoder returns e' but there exists some e s.t. P(e|f) > P(e'|f)

17 Example Decoding Errors
- Search error
  - Permettez que je donne un exemple a la chambre.
  - Let me give the House one example.
  - Let me give an example in the House.
- Model error
  - Vous avez besoin de toute l'aide disponible.
  - You need all the help you can get.
  - You need of the whole benefits available.

18 Search
- Traditional decoding method: stack decoder
  - A* algorithm
  - Deeply explores each hypothesis
- Fast greedy algorithm
  - Much faster than A*
  - How often does it fail?
- Integer Programming method
  - Transform to Traveling Salesman (see paper)
  - Very slow
  - Guaranteed to find the best choice

19 Large branching factors
- Machine Translation
  - Input: sequence of n words, each with up to 200 possible target word translations.
  - Output: sequence of m words in the target language that has a high score under some goodness criterion.
- Search space: a 6-word French sentence has 10^300 distinct translation scores under the IBM M4 translation model. [Soricut, Knight, Marcu, AMTA'2002]

20 Stack decoder: A*
- Initialize the stack with an empty hypothesis
- Loop:
  - Pop h, the best hypothesis, off the stack
  - If h is a complete sentence, output h and terminate
  - For each possible next word w, extend h by adding w and push the resulting hypothesis onto the stack

21 Complications
- It's not a simple left-to-right translation
- Because we multiply probabilities as we add words, shorter hypotheses will always win
  - Use multiple stacks, one for each length
- Given fertility possibilities, when we add a new target word for an input source word, how many do we add?
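A highly simplified Python sketch of the stack-decoder loop from the previous two slides, with one stack per hypothesis length as suggested above. The candidate generator, scoring function, and completion test are placeholders, not the IBM translation model.

```python
import heapq

def stack_decode(source, next_words, score, is_complete, max_len, beam=10):
    """Toy multi-stack decoder sketch.

    next_words(source, hyp)  -> candidate target words to extend hyp with
    score(source, hyp)       -> log-probability-style score, higher is better
    is_complete(source, hyp) -> True once hyp accounts for the whole input
    One stack per hypothesis length, so short hypotheses (which always score
    higher under a product of probabilities) only compete with each other.
    """
    stacks = [[] for _ in range(max_len + 1)]
    stacks[0] = [(0.0, ())]                               # the empty hypothesis
    complete = []
    for length in range(max_len):
        # Extend only the `beam` best hypotheses of this length.
        for sc, hyp in heapq.nlargest(beam, stacks[length]):
            for w in next_words(source, hyp):
                new_hyp = hyp + (w,)
                new_sc = score(source, new_hyp)
                if is_complete(source, new_hyp):
                    complete.append((new_sc, new_hyp))
                else:
                    stacks[length + 1].append((new_sc, new_hyp))
    return max(complete) if complete else (float("-inf"), ())
```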

22 Example

23 Hill climbing

Generic hill climbing:
  function HillClimbing(problem, initial-state, queuing-fn)
    node ← MakeNode(initial-state(problem));
    while T do
      next ← Best(SearchOperator-fn(node, cost-fn));
      if (IsBetter-fn(next, node)) then continue;
      else if (GoalTest(node)) then return node;
      else exit;
    end while
    return Failure;

MT (Germann et al., ACL-2001):
  node ← targetGloss(sourceSentence);
  while T do
    next ← Best(LocallyModifiedTranslationOf(node));
    if (IsBetter(next, node)) then continue;
    else print node; exit;
  end while
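A runnable Python skeleton of that greedy loop, assuming a gloss function, a neighbor generator, and a scoring model are supplied by the caller (all three are placeholders here, not the published decoder):

```python
def greedy_decode(source, gloss, neighbors, score):
    """Greedy hill climbing over translations (Germann et al.-style skeleton).

    gloss(source)  -> initial word-for-word translation (the starting node)
    neighbors(hyp) -> translations reachable by one local modification
    score(hyp)     -> model score, e.g. P(e) * P(f|e), higher is better
    Stops at the first hypothesis none of whose neighbors improves the score.
    """
    node = gloss(source)
    while True:
        candidates = list(neighbors(node))
        if not candidates:
            return node
        best = max(candidates, key=score)
        if score(best) > score(node):
            node = best          # keep climbing
        else:
            return node          # local maximum reached
```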

24 Types of changes
- Translate one or two words (j1 e1 j2 e2)
- Translate and insert (j e1 e2)
- Remove word of fertility 0 (i)
- Swap segments (i1 i2 j1 j2)
- Join words (i1 i2)
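For illustration, here are two of these local operators over a hypothesis represented as a plain list of target words. This is a simplification: the real operators also update the word-level alignment and fertilities.

```python
def swap_segments(hyp, i1, i2, j1, j2):
    """Return a copy of hyp with the non-overlapping segments [i1:i2] and [j1:j2] exchanged."""
    assert i2 <= j1, "segments must not overlap"
    return hyp[:i1] + hyp[j1:j2] + hyp[i2:j1] + hyp[i1:i2] + hyp[j2:]

def remove_word(hyp, i):
    """Drop the word at position i (e.g., a target word aligned to nothing)."""
    return hyp[:i] + hyp[i + 1:]

print(swap_segments(["let", "me", "give", "the", "house", "one", "example"], 2, 3, 5, 7))
# ['let', 'me', 'one', 'example', 'the', 'house', 'give']
```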

25 Example: a total of 77,421 possible translations attempted

26

27

28 How to search better?
- MakeNode(initial-state(problem))
- RemoveFront(Q)
- SearchOperator-fn(node, cost-fn)
- queuing-fn(problem, Q, (Next, Cost))

29 Example 1: Greedy Search
- MakeNode(initial-state(problem))

Machine Translation (Marcu and Wong, EMNLP-2002):
  node ← targetGloss(sourceSentence);
  while T do
    next ← Best(LocallyModifiedTranslationOf(node));
    if (IsBetter(next, node)) then continue;
    else print node; exit;
  end while

30 Climbing the wrong peak
- Which sentence is more grammatical?
  1. better bart than madonna, i say
  2. i say better than bart madonna,
- Can you make a sentence with these words?
  a and apparently as be could dissimilar firing identical neural really so things thought two
- Model validation
- Model stress-testing

31 Language-model stress-testing
- Input: bag of words
- Output: best sequence according to a linear combination of
  - an n-gram LM
  - a syntax-based LM (Collins, 1997)
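A sketch of the combined objective, with both component scorers and the interpolation weight treated as given (the function names and the brute-force search over orderings are illustrative only):

```python
from itertools import permutations

def combined_score(sentence, ngram_logprob, syntax_logprob, weight=0.5):
    """Linear combination of two log-probability scores for one word sequence."""
    return weight * ngram_logprob(sentence) + (1 - weight) * syntax_logprob(sentence)

def best_sequence(bag_of_words, ngram_logprob, syntax_logprob, weight=0.5):
    """Brute-force search over orderings of a small bag of words (exponential; illustration only)."""
    return max(permutations(bag_of_words),
               key=lambda s: combined_score(list(s), ngram_logprob, syntax_logprob, weight))
```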

32
- Size: 10-25 words long
  - Best searched (51.6): and so could really be a neural apparently thought things as dissimilar firing two identical
  - Original word order (64.3): could two things so apparently dissimilar as a thought and neural firing really be identical
- Size: 3-7 words long
  - Best searched (32.3): i say better than bart madonna,
  - Original word order (41.6): better bart than madonna, i say
- SBLM*: trained on an additional 160k WSJ sentences.

33 End of Class Questions

