Published by Avice Morgan. Modified over 9 years ago.
1
Stephan Vogel - Machine Translation1 Machine Translation Factored Models Stephan Vogel Spring Semester 2011
2
Stephan Vogel - Machine Translation2 Overview lFactored Language Models lMulti-Stream Word Alignment lFactored Translation Models
3
Stephan Vogel - Machine Translation3 Motivation lVocabulary grows dramatically for morphology rich languages lLooking at surface word form does not take connections (morphological derivations) between words into account lExample: ‘book’ and ‘books’ as unrelated as ‘book’ and ‘sky’ lDependencies within sentences between words are not well detected lExample: number or gender agreement Singular: der alte Tisch (the old table) Plural: die alten Tische (the old tables) lConsider word as a bundle of factors lSurface word form, stem, root, prefix, suffix, POS, gender marker, case marker, number marker, …
4
Stephan Vogel - Machine Translation4 Two solutions lMorphological decomposition into stream of morphemes lCompound noun splitting lPrefix-stem-suffix splitting lWords as bundle of (parallel) factors word lemma POS morphology word class prefix stem suffix prefix stem suffix … w1 w2 w3 w4 ….
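
The "bundle of parallel factors" view can be sketched in a few lines of Python. This is only an illustration: the `surface|lemma|POS|morphology` token layout, the `FactoredWord` type, and the hand-written annotations of the example sentence are assumptions, not output of a real analyzer.

```python
from typing import List, NamedTuple


class FactoredWord(NamedTuple):
    """One word seen as a bundle of parallel factors (hypothetical layout)."""
    surface: str
    lemma: str
    pos: str
    morph: str


def parse_factored(token: str) -> FactoredWord:
    """Parse a 'surface|lemma|POS|morph' token into its factors."""
    parts = token.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 factors, got {len(parts)}: {token!r}")
    return FactoredWord(*parts)


def stream(sentence: str, factor: str) -> List[str]:
    """Read off one factor for every word, giving one parallel stream."""
    return [getattr(parse_factored(tok), factor) for tok in sentence.split()]


# Hand-annotated example (annotations are assumptions for illustration).
sentence = "die|der|DET|plural alten|alt|ADJ|plural Tische|Tisch|NN|plural"
lemmas = stream(sentence, "lemma")
numbers = stream(sentence, "morph")
```

Reading the `morph` stream makes the number agreement explicit (`plural plural plural`), which the surface forms alone do not show.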
5
Stephan Vogel - Machine Translation5 Questions lWhich information is the most useful lHow to use this information? lIn the language model lIn the translation model lHow to use it at training time lHow to use it at decoding time
6
Stephan Vogel - Machine Translation6 Factored Models lMorphological preprocessing lA significant body of work lFactored language models lKirchhoff et al lHierarchical lexicon lNiessen at al lBi-Stream alignment lZhao et al lFactored translation models lKoehn et al
7
Stephan Vogel - Machine Translation7 Factored Language Model Some papers: Bilmes and Kirchhoff, 2003 Factored Language Models and Generalized Parallel Backoff Duh and Kirchhoff, 2004 Automatic learning of language model structure Kirchhoff and Yang, 2005 Improved Language Modeling for Statistical Machine Translation
8
Stephan Vogel - Machine Translation8 Factored Language Model lRepresentation: lLM probability:
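
The formulas for these two bullets are not part of the transcript. A sketch of what they presumably express, written in the notation commonly used for factored language models (Bilmes and Kirchhoff, 2003); the symbols $f_t^k$ and $K$ and the trigram-style history are assumptions:

```latex
% Representation: each word is a bundle of K factors
\[
  w_t \equiv \{ f_t^1, f_t^2, \ldots, f_t^K \} = f_t^{1:K}
\]

% LM probability: trigram-style model over factor bundles
\[
  p(f_1^{1:K}, \ldots, f_T^{1:K})
    \approx \prod_{t=1}^{T} p\bigl(f_t^{1:K} \mid f_{t-1}^{1:K}, f_{t-2}^{1:K}\bigr)
\]
```

With $K = 1$ (surface form only) this reduces to an ordinary trigram LM.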
9
Stephan Vogel - Machine Translation9 Language Model Backoff lSmoothing by backing off lBackoff paths in standard LM in factored LM
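
The difference between the two kinds of backoff paths can be made concrete with a small enumeration. This is a sketch under simple assumptions: a standard LM always drops the most distant conditioning variable first, while a factored LM may drop any remaining conditioning variable at each step. The function names and the parent labels (`W1`, `S1`, `T1`) are illustrative.

```python
from itertools import permutations


def standard_backoff_path(parents):
    """Single path: parents are ordered most distant first and dropped in that order."""
    remaining = list(parents)
    path = [tuple(remaining)]
    while remaining:
        remaining.pop(0)  # drop the most distant parent
        path.append(tuple(remaining))
    return path


def factored_backoff_paths(parents):
    """All paths: at each step any remaining parent may be dropped."""
    paths = []
    for drop_order in permutations(parents):
        remaining = list(parents)
        path = [tuple(remaining)]
        for parent in drop_order:
            remaining.remove(parent)
            path.append(tuple(remaining))
        paths.append(path)
    return paths


# Standard trigram: p(w | w2, w1) -> p(w | w1) -> p(w)
trigram_path = standard_backoff_path(["w2", "w1"])

# Factored LM with three parents, e.g. previous word, stem, and tag.
flm_paths = factored_backoff_paths(["W1", "S1", "T1"])
flm_nodes = {node for path in flm_paths for node in path}
```

Three parents already give 3! = 6 paths through a graph of 2^3 = 8 nodes, which is why choosing among paths becomes an optimization problem.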
10
Stephan Vogel - Machine Translation10 Choosing Backoff Paths lDifferent possibitities lFixed path lChoose path dynamically during training lChoose multiple paths dynamically during training and combine results (Generalized Parallel Backoff) lMany paths -> optimization problem lDuh & Kirchhoff (2004) use genetic algorithm lBilmes and Kirchhoff (2003) report LM perplexities lKirchhoff and Yang (2005) use FLM to rescore n-best list generated by SMT system l3-gram FLM slightly worse then standard 4-gram LM lCombined LM does not outperform standard 4-gram LM
11
Stephan Vogel - Machine Translation11 Hierarchical Lexicon l Morphological analysis lUsing GERCG, a constraint grammar parser for German for lexical analysis and morphological and syntactic disambiguation lBuild equivalence classes lGroup words which tend to translate into same target word lDon’t distinguish what does not need to be distinguished! lEg. for nouns: gender is irrelevant as is nominative, dative, and accusative; but genitive translates differently Sonja Nießen and Hermann Ney, Toward hierarchical models for statistical machine translation of inflected languages. Proceedings of the workshop on data-driven methods in machine translation - Volume 14, 2001.
12
Stephan Vogel - Machine Translation12 Hierarchical Lexicon lEquivalence classes at different levels of abstraction lExample: ankommen ln is full analysis ln-1: drop “first person” -> group “ankomme”, “ankommst”, “ankommt” ln-2: drop singular/plural distinction l…
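
The levels of abstraction can be sketched as truncating an ordered tag sequence: the fewer tags are kept, the coarser the equivalence class. The analyses below are hand-written assumptions for illustration (they are not GERCG output), and `equivalence_classes` is a hypothetical helper.

```python
from collections import defaultdict

# form -> (lemma, tags ordered from general to specific); hypothetical analyses
ANALYSES = {
    "ankomme":  ("ankommen", ("V", "present", "singular", "1")),
    "ankommst": ("ankommen", ("V", "present", "singular", "2")),
    "ankommt":  ("ankommen", ("V", "present", "singular", "3")),
    "ankommen": ("ankommen", ("V", "present", "plural", "1")),
}


def equivalence_classes(analyses, level):
    """Group word forms that share the lemma and the first `level` tags."""
    classes = defaultdict(list)
    for form, (lemma, tags) in analyses.items():
        classes[(lemma, tags[:level])].append(form)
    return dict(classes)


full = equivalence_classes(ANALYSES, 4)       # level n: full analysis
no_person = equivalence_classes(ANALYSES, 3)  # level n-1: person dropped
no_number = equivalence_classes(ANALYSES, 2)  # level n-2: number dropped
```

At level n-1 the three singular forms fall into one class, and at level n-2 all four forms do.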
13
Stephan Vogel - Machine Translation13 Hierarchical Lexicon lTranslation probability Probability for taking all factors up to i into account lAssumption: does not depend on e and word form follows unambiguously from tags lLinear combination of p i
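
The formulas for this slide are not in the transcript. One plausible reading of the bullets, given as a sketch: the notation $t_0^n$ for the lemma plus the ordered tags of the source word $f$, and the interpolation weights $\lambda_i$, are assumptions.

```latex
% Chain rule, splitting the tags at level i (exact)
\[
  p(f, t_0^n \mid e) = p(t_0^i \mid e) \cdot p(f, t_{i+1}^n \mid t_0^i, e)
\]

% Assumption: the second term does not depend on e
\[
  p_i(f, t_0^n \mid e) := p(t_0^i \mid e) \cdot p(f, t_{i+1}^n \mid t_0^i)
\]

% Linear combination over the levels of abstraction
\[
  p(f, t_0^n \mid e) = \sum_{i=0}^{n} \lambda_i \, p_i(f, t_0^n \mid e),
  \qquad \lambda_i \ge 0, \quad \sum_{i=0}^{n} \lambda_i = 1
\]
```

Small $i$ gives coarse classes with reliable counts; large $i$ gives fine distinctions with sparser counts, and the interpolation balances the two.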
14
Stephan Vogel - Machine Translation14 Multi-Stream Word Alignment lUse multiple annotations: stem, POS, … lConsider each annotation as additional stream or tier lUse generative alignment models lModel each stream lBut tie streams through alignment lExample: Bi-Stream HMM word alignment (Zhao et al 2005)
15
Stephan Vogel - Machine Translation15 Bi-Stream HMM Alignment lHMM: lRelative word position as distortion component (can be conditioned on word classes) lForward-backward algorithms for training
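
The HMM formula itself is missing from the transcript. The standard HMM word alignment model has the following form, shown here as a sketch; the symbols ($f_1^J$ source sentence, $e_1^I$ target sentence, $a_j$ alignment of source position $j$) are the usual ones and are assumed rather than taken from the slide:

```latex
\[
  p(f_1^J \mid e_1^I)
    = \sum_{a_1^J} \prod_{j=1}^{J}
      p(f_j \mid e_{a_j}) \cdot p(a_j \mid a_{j-1}, I)
\]
```

The distortion term $p(a_j \mid a_{j-1}, I)$ is typically parameterized by the jump width $a_j - a_{j-1}$, which is the "relative word position" mentioned above.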
16
Stephan Vogel - Machine Translation16 Bi-Stream HMM Alignment lBi-Stream HMM: lAssume the hidden alignment generates 2 data stream: words and word class labels Stream 1: Stream 2: Stream 1 Stream 2
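
A minimal sketch of the idea, assuming the two streams are conditionally independent given the alignment, so that the emission probability is the product of a word lexicon probability and a word-class lexicon probability. The forward pass below computes the sentence likelihood; all probability tables, class labels, and the uniform transition model are toy assumptions.

```python
def bistream_forward(src_words, src_classes, tgt_words, tgt_classes,
                     p_word, p_class, transition):
    """Forward algorithm for an HMM whose hidden alignment emits two streams."""
    I, J = len(tgt_words), len(src_words)

    def emit(j, i):
        # Stream 1 (words) times stream 2 (word classes), tied by alignment i.
        return (p_word.get((src_words[j], tgt_words[i]), 0.0)
                * p_class.get((src_classes[j], tgt_classes[i]), 0.0))

    # Uniform distribution over the first alignment position.
    alpha = [emit(0, i) / I for i in range(I)]
    for j in range(1, J):
        alpha = [emit(j, i) * sum(alpha[k] * transition(i, k, I) for k in range(I))
                 for i in range(I)]
    return sum(alpha)


# Toy model: values are invented for illustration only.
p_word = {("das", "the"): 0.8, ("das", "house"): 0.1,
          ("Haus", "the"): 0.1, ("Haus", "house"): 0.8}
p_class = {("D1", "C1"): 0.9, ("D1", "C2"): 0.1,
           ("D2", "C1"): 0.1, ("D2", "C2"): 0.9}


def uniform_transition(i, k, I):
    """Placeholder distortion model: every jump is equally likely."""
    return 1.0 / I


likelihood = bistream_forward(["das", "Haus"], ["D1", "D2"],
                              ["the", "house"], ["C1", "C2"],
                              p_word, p_class, uniform_transition)
```

Replacing `uniform_transition` with a jump-width distribution would give the distortion component described for the plain HMM.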
17
Stephan Vogel - Machine Translation17 Second Stream: Bilingual Word Clusters lIdeally, we want to use word classes with group translations of words in source language cluster into cluster on target side lBilingual Word Clusters (Och, 1999) lAssume monolingual clusters fixed first lOptimize the clusters for the other language (mkcls in GIZA++) lBilingual Word Spectral Clusters lEigen-structure analysis lK-means or single linkage clustering. lOther Word Clusters lLDA (Blei, 2000) lCo-clusters, etc.
18
Stephan Vogel - Machine Translation18 Bi-Stream HMM with Word Clusters lEvaluating Word Alignment Accuracy: F-measure lBi-stream HMM (Bi-HMM) is better than HMM; lBilingual word-spectral clusters are better than traditional ones; lHelps more for small training data. TreeBank, F2ETreeBank, E2F FBIS, F2EFBIS, E2F F-Measure HMM Trad.with-SpecHMMTrad. with-Spec
19
Stephan Vogel - Machine Translation19 Factored Translation Models Paper: Koehn and Hoang, Factored Translation Models, EMNLP 2007 lFactored translation model as extension of phrase-based SMT lInteresting for translating into or between morphology rich languages lExperiments for English-German, English Spanish, English-Czech (I follow that paper. Description on Moses web site is nearly identical. See http://www.statmt.org/moses/?n=Moses.FactoredModelshttp://www.statmt.org/moses/?n=Moses.FactoredModels Example also from: http://www.inf.ed.ac.uk/teaching/courses/mt/lectures/factored-models.pdf)
20
Stephan Vogel - Machine Translation20 Factored Model lAnalysis as preprocessing lNeed to specify the transfer lNeed to specify the generation word lemma POS morphology word class word lemma POS morphology word class …… InputOutput word lemma POS morphology word class word lemma POS morphology word class …… InputOutput Factored Representation Factored Model: transfer and generation
21
Stephan Vogel - Machine Translation21 Transfer lMapping individual factors: lAs we do with non-factored models lExample: Haus -> house, home, building, shell lMapping combinations of factors: lNew vocabulary as Cartesian product of the vocabularies of the individual factors, e.g. NN and singular -> NN|singular lMap these combinations lExample: NN|plural|nominative -> NN|plural, NN|singular lNumber of factors on source and target side can differ
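
The combined vocabulary can be illustrated with `itertools.product`. The factor vocabularies below are tiny invented examples, and `combined_vocabulary` is a hypothetical helper; the mapping entry is the one given in the example above.

```python
from itertools import product


def combined_vocabulary(*factor_vocabs):
    """Cartesian product of factor vocabularies, joined with '|'."""
    return ["|".join(combo) for combo in product(*factor_vocabs)]


pos_tags = ["NN", "VB"]
numbers = ["singular", "plural"]
combined = combined_vocabulary(pos_tags, numbers)

# Mapping combinations of factors; source and target use different
# numbers of factors (three on the source side, two on the target side).
morphology_table = {
    "NN|plural|nominative": ["NN|plural", "NN|singular"],
}
targets = morphology_table["NN|plural|nominative"]
```

The size of the combined vocabulary is the product of the individual vocabulary sizes, so adding factors to one mapping step quickly makes its counts sparse.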
22
Stephan Vogel - Machine Translation22 Generation lGenerate surface form from factors lExamples: house|NN|plural -> houses house|NN|singular -> house house|VB|present|3 rd -person -> houses
23
Stephan Vogel - Machine Translation23 Example including all Steps lGerman word Häuser lAnalysis: lhäuser|haus|NN|plural|nominative|neutral lTranslation lMapping lemma: { ?|house|?|?|?, ?|home|?|?|?, ?|building|?|?|? } lMapping morphology: { ?|house|NN|plural, ?|house|NN|singular, ?|home|NN|plural, ?|building|NN||plural } lGeneration lGenerating surface forms: {houses|house|NN|plural, house|house|NN|singular, homes|home|NN|plural, buildings|building|NN||plural }
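
The three steps can be traced in a toy sketch. The tables hold only the entries from this example, and the table and function names are hypothetical. One simplification is assumed: the sketch expands all lemma and morphology combinations and lets the generation table filter out the ones it cannot realize, whereas a real system would score and prune candidates.

```python
from itertools import product

# Toy tables containing only the entries of the example (hypothetical names).
LEMMA_TABLE = {"haus": ["house", "home", "building"]}
MORPH_TABLE = {
    ("NN", "plural", "nominative", "neutral"): [("NN", "plural"), ("NN", "singular")],
}
GENERATION_TABLE = {
    ("house", "NN", "plural"): "houses",
    ("house", "NN", "singular"): "house",
    ("home", "NN", "plural"): "homes",
    ("building", "NN", "plural"): "buildings",
}


def translate(analysis):
    """Map lemma, map morphology, then generate surface forms."""
    _surface, lemma, *morph = analysis.split("|")
    options = []
    candidates = product(LEMMA_TABLE.get(lemma, []),
                         MORPH_TABLE.get(tuple(morph), []))
    for tgt_lemma, tgt_morph in candidates:
        form = GENERATION_TABLE.get((tgt_lemma, *tgt_morph))
        if form is not None:  # keep only combinations that can be generated
            options.append("|".join((form, tgt_lemma, *tgt_morph)))
    return options


options = translate("häuser|haus|NN|plural|nominative|neutral")
```

All translation and generation steps are applied to the same source word, which mirrors the requirement that every mapping step works on the same segmentation.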
24
Stephan Vogel - Machine Translation24 Training the Model lParallel data needs to be annotated -> preprocessing lSource and target side annotation typically independent of each other lSome work on ‘coupled’ annotation, e.g. inducing word classes through clustering with mkcls, or morphological analysis of Arabic conditioned on English side (Linh) lWord alignment lOperate on surface form only lUse multi-stream alignment (example: BiStream HMM) lUse discriminative alignment (example: CRF approach) lEstimate translation probabilities: collect counts for factors or combination of factors lPhrase alignment lExtract from word alignment using standard heuristics lEstimate various scoring functions
25
Stephan Vogel - Machine Translation25 Training the Model lWord alignment (symmetrized)
26
Stephan Vogel - Machine Translation26 Training the Model lExtract phrase: natürlich hat john # naturally john has
27
Stephan Vogel - Machine Translation27 Training the Model lExtract phrase for other factors: ADV V NNP # ADV NNP V
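
Once a phrase pair has been identified from the word alignment, the same source and target spans can be read off for every factor. A minimal sketch, assuming `word|POS` tokens and spans given as half-open index ranges; the helper names are hypothetical, and the spans are supplied by hand instead of being derived from an alignment.

```python
def factor_phrase(tokens, span, factor):
    """Project one factor of the tokens inside a half-open span [start, end)."""
    start, end = span
    return " ".join(tok.split("|")[factor] for tok in tokens[start:end])


def extract_phrase_pair(src_tokens, tgt_tokens, src_span, tgt_span, factor):
    """Format a phrase pair for one factor as 'source # target'."""
    src = factor_phrase(src_tokens, src_span, factor)
    tgt = factor_phrase(tgt_tokens, tgt_span, factor)
    return f"{src} # {tgt}"


src_tokens = "natürlich|ADV hat|V john|NNP".split()
tgt_tokens = "naturally|ADV john|NNP has|V".split()

# Same spans, different factors: 0 = surface form, 1 = POS tag.
surface_pair = extract_phrase_pair(src_tokens, tgt_tokens, (0, 3), (0, 3), 0)
pos_pair = extract_phrase_pair(src_tokens, tgt_tokens, (0, 3), (0, 3), 1)
```

Because both pairs come from the same spans, the surface and POS phrase tables stay consistent with a single segmentation.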
28
Stephan Vogel - Machine Translation28 Training the Generation Steps lTrain on target side of corpus lCan use additional monolingual data lMap factor(s) to factor(s), e.g. word->POS and POS->word lExample: lThe/DET big/ADJ tree/NN lCount collection: count( the, DET )++ count( big, ADJ )++ count( tree, NN )++ lProbability distributions (maximum likelihood estimates) p( the | DET ) and p( DET | the ) p( big | ADJ ) and p( ADJ | big ) p( tree | NN ) and p( NN | tree )
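
The count collection and the maximum likelihood estimates can be sketched as follows. The `word/TAG` input format follows the example above; the second training sentence is an invented addition so that the estimates are not all 1.0, and `train_generation` is a hypothetical helper.

```python
from collections import Counter


def train_generation(tagged_sentences):
    """Collect (word, tag) counts and return both conditional distributions."""
    pair_counts, word_counts, tag_counts = Counter(), Counter(), Counter()
    for sentence in tagged_sentences:
        for token in sentence.split():
            word, tag = token.rsplit("/", 1)
            word = word.lower()
            pair_counts[(word, tag)] += 1
            word_counts[word] += 1
            tag_counts[tag] += 1
    # Maximum likelihood estimates: relative frequencies.
    p_word_given_tag = {(w, t): c / tag_counts[t] for (w, t), c in pair_counts.items()}
    p_tag_given_word = {(w, t): c / word_counts[w] for (w, t), c in pair_counts.items()}
    return p_word_given_tag, p_tag_given_word


corpus = [
    "The/DET big/ADJ tree/NN",
    "A/DET big/ADJ house/NN",  # invented second sentence
]
p_word_given_tag, p_tag_given_word = train_generation(corpus)
```

Since only the target side is needed, the same function could be run on additional monolingual tagged text.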
29
Stephan Vogel - Machine Translation29 Combination of Components lLog-linear components of feature functions lSentence translation generated from a set of phrase pairs Translation component: Feature functions h defined over phrase pairs Generation component: Feature function h defined over output words
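
The formulas for this slide are missing from the transcript. A sketch of the log-linear model and the two kinds of feature functions, in the style of Koehn and Hoang (2007); the symbols $\lambda_i$, $\tau$, $\gamma$, the phrase pairs $(\bar{e}_j, \bar{f}_j)$, and the output words $e_k$ are assumed notation:

```latex
% Log-linear combination of feature functions
\[
  p(e \mid f) = \frac{1}{Z} \exp \sum_{i=1}^{n} \lambda_i \, h_i(e, f)
\]

% Translation component: sum over the phrase pairs used
\[
  h_{\tau}(e, f) = \sum_{j} \tau(\bar{e}_j, \bar{f}_j)
\]

% Generation component: sum over the output words
\[
  h_{\gamma}(e, f) = \sum_{k} \gamma(e_k)
\]
```

Here $Z$ is a normalization constant that does not affect the search for the best translation.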
30
Stephan Vogel - Machine Translation30 Decoding with Factored Models lInstead of just phrase table, now multiple tables lImportant: all mappings operate on same segmentation of source sentence into phrases lMore target translations are now possible lExample: … beat … can be verb or noun Translations: beat # schlag (NN or VB), schlagen (VB), Rhythmus (NN) … beat … schlag schlagen Rhythmus … beat … schlag|NN|Nom schlag|VB|1-person|singular schlag|NN|Dat schlag|NN|Akk Not-factored Factored
31
Stephan Vogel - Machine Translation31 Decoding with Factored Models lCombinatorial explosion -> harsher pruning needed lNotice: translation step features and generation step features depend only on phrase pair lAlternative translations can be generated and inserted into the translation lattice before best-path search begins (building fully expanded phrase table?) lFeatures can be calculate and used for translation model pruning (observation pruning) lPruning in Moses decoder lNon-factored model: default is 20 alternatives lFactored model: default is 50 alternative lIncrease in decoding time: factor 2-3
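
Observation pruning amounts to keeping only the best-scoring alternatives for each source phrase before search starts. A minimal sketch: `prune_options` and its `limit` parameter are hypothetical names (not Moses options), and the log scores are invented for illustration.

```python
import heapq


def prune_options(options, limit):
    """Keep the `limit` highest-scoring (target, score) translation options."""
    return heapq.nlargest(limit, options, key=lambda option: option[1])


# Expanded factored options for the source word 'beat' with made-up log scores.
beat_options = [
    ("schlag|NN|Nom", -1.2),
    ("schlag|VB|1-person|singular", -2.5),
    ("schlag|NN|Dat", -1.9),
    ("schlag|NN|Akk", -1.7),
]
kept = prune_options(beat_options, limit=2)
```

Because the translation and generation features depend only on the phrase pair, these scores can be computed once per option, before any hypothesis expansion.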
32
Stephan Vogel - Machine Translation32 Factored LMs in Moses lThe training script allows to specify multiple LMs on different factors, with individual orders (history length) lExample: --lm 0:3:factored-corpus/surface.lm // surface form 3-gram LM --lm 2:3:factored-corpus/pos.lm // POS 3-gram LM lThis generates different LMs on the different factors, not a factored LM lDifferent LMs are used as independent features in decoder lNo backing-off between different factors
33
Stephan Vogel - Machine Translation33 Summary lFactored models to lDeal with large vocabulary in morphology rich LMs l‘Connect’ words, thereby getting better model estimates lExplicitly model morphological dependencies within sentences lFactored models are not always called factored models lHierarchical model (lexicon) lMulti-stream model (alignment) lFactored LMs introduced for ASR lMany backoff paths lMoses decoder lAllows factored TMs and factored LMs lBut no backing-off between factors, only log-linear combination