1
Improving a Statistical MT System with Automatically Learned Rewrite Rules
Fei Xia and Michael McCord
IBM T. J. Watson Research Center, Yorktown Heights, New York
2
Previous attempt at using syntax
2003 NSF/JHU MT Summer Workshop (Och et al., 2004):
Method: run SMT to generate the top-N translations, then use syntactic information to rerank the candidates. => Parsing MT output is problematic.
Result: no gain from syntax.
3
Outline
–Current phrase-based SMT systems (a.k.a. clump-based systems)
–Overview of the new approach
–Learning and applying rewrite rules
–Experimental results
–Conclusion and future work
4
Clump-based SMT
The unit of translation is a clump rather than a word. A clump is simply a word n-gram.
Ex: P(est le premier | is the first)
Baseline system: (Tillmann & Xia, 2003)
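A minimal sketch of what a clump library might look like: each entry pairs a source-language word n-gram with candidate target n-grams and their conditional probabilities, e.g. P(est le premier | is the first). The entries and probabilities below are illustrative, not taken from the paper.

```python
# Hypothetical clump library: source clump -> [(target clump, probability)].
clump_library = {
    ("is", "the", "first"): [(("est", "le", "premier"), 0.72)],
    ("is", "the"):          [(("est", "le"), 0.65)],
    ("is",):                [(("est",), 0.80)],
    ("France",):            [(("France",), 0.95), (("la", "France"), 0.04)],
}

def lookup(clump):
    """Return candidate translations for a source clump, best first."""
    return sorted(clump_library.get(tuple(clump), []),
                  key=lambda tp: -tp[1])

print(lookup(["is", "the", "first"]))
```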
5
Clump-based system: training stage
Parallel data (source sentence, target sentence) => Preprocessor => Clump Extractor => Clump Library
6
Clump-based system: translation stage
Source sentence => Preprocessor => Decoder (using the Clump Library and a Language Model) => target translation
7
Baseline system: clump extraction
Eng: France is the first western country …
Fr: La France est le premier pays occidental …
Clumps extracted: France => France; France is => France est
8
Baseline system: clump extraction
Eng: France is the first western country …
Fr: La France est le premier pays occidental …
Clumps extracted: France => France; France is => France est; France is the => France est le
9
Baseline system: decoding
Input: He is the first international student
Matching clumps: he => il; he is => il est; he is the => il est le; first => premier; is the first => est le premier; international => international; student => étudiant
10
Monotonic vs. non-monotonic decoding
Source clumps for "He is the first international student": S1 = he is the, S2 = first, S3 = international, S4 = student
Monotonic decoding (S1, S2, S3, S4): il est le premier international étudiant
Non-monotonic decoding (S1, S2, S4, S3): il est le premier étudiant international
Another ordering (S2, S1, S3, S4): premier il est le international étudiant …
11
Challenges for current clump-based systems
(1) Non-monotonic decoding is expensive (n! orderings), and it can hurt performance.
Ex: He is the first international student => S1 S2 S3 S4
(S2, S1, S3, S4): premier il est le international étudiant
(S2, S3, S1, S4): premier international il est le étudiant
(S2, S3, S4, S1): premier international étudiant il est le
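The n! blow-up above is easy to make concrete: non-monotonic decoding must consider orderings of the n source clumps, up to n! permutations, while monotonic decoding keeps the single left-to-right order.

```python
import itertools
import math

# The four source clumps from the running example.
clumps = ["il est le", "premier", "international", "étudiant"]

# Non-monotonic decoding: every permutation of clump positions is a candidate.
orderings = list(itertools.permutations(range(len(clumps))))
assert len(orderings) == math.factorial(len(clumps))  # 4! = 24

# Monotonic decoding: the identity ordering is the only one considered.
monotonic = tuple(range(len(clumps)))
print(len(orderings), "orderings vs 1 monotonic pass")
```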
12
Challenges for current clump-based systems (ctd.)
(2) No phrase-level generalizations are learned and used.
Ex: France is the first western country. / He is the first international student.
Rewrite rules are useful:
word-level rule: Adj N => N Adj
phrase-level rule: Subj V Obj => V Subj Obj
13
New approach
He is the first international student
Applying rewrite rules (Adj N => N Adj): He is the first student international
Monotonic decoding: il est le premier étudiant international
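The key move above can be sketched in a few lines (the tag set and sentence are from the slides; the flat word/tag representation is a simplification of the parse trees the system actually rewrites): apply a rule such as Adj N => N Adj to the source sentence first, so the reordered source matches target word order and a cheap monotonic decode suffices.

```python
def apply_adj_n_swap(tagged):
    """Swap adjacent (Adj, N) pairs in a list of (word, tag) tuples,
    implementing the rewrite rule Adj N => N Adj."""
    out, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "Adj" and tagged[i + 1][1] == "N"):
            out.extend([tagged[i + 1], tagged[i]])  # emit N Adj
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out

sent = [("He", "Pron"), ("is", "V"), ("the", "Det"),
        ("first", "Adj"), ("international", "Adj"), ("student", "N")]
print([w for w, _ in apply_adj_n_swap(sent)])
# -> ['He', 'is', 'the', 'first', 'student', 'international']
```

Note that only the innermost Adj N pair is swapped ("international student" => "student international"), reproducing the reordered source shown on the slide.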
14
Rewrite rules
Ex: a tree S => NP V NP is rewritten as S => V NP NP:
NP0 V NP1 => V NP0 NP1
15
Defaults and exceptions
Default: Adj N => N Adj; exception: Adj (first) N => Adj N
Default: NP0 V NP1 => NP0 V NP1; exception: NP0 V NP1 (iobj, pron) => NP0 NP1 V
Learn both defaults and exceptions
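A sketch of how default-vs-exception lookup could work (the rule store and lookup function are hypothetical): the most specific matching rule wins, so the lexicalized exception "Adj (first) N => Adj N" overrides the default "Adj N => N Adj".

```python
# Hypothetical rule store: (label, head word or None for the default) -> RHS.
rules = {
    ("Adj", None):    "N Adj",   # default: Adj N => N Adj
    ("Adj", "first"): "Adj N",   # lexicalized exception for "first"
}

def best_rhs(adj_word):
    """Prefer a rule lexicalized on this adjective, else fall back
    to the unlexicalized default."""
    return rules.get(("Adj", adj_word), rules[("Adj", None)])

print(best_rhs("first"))    # exception fires: Adj N
print(best_rhs("western"))  # default fires:   N Adj
```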
16
New approach: training stage 1
Parallel data => source and target sentences are parsed => Phrase Aligner => Rewrite Rule Extractor => rewrite rules
17
New approach: training stage 2
The source sentence is parsed and the Rewrite Rule Applier puts it in target word order; the Preprocessor and Clump Extractor then build the Clump Library from the reordered source and the target sentence.
18
New approach: translation stage
Source sentence => Parser => Rewrite rule applier (source sentence in target word order) => Preprocessor => Decoder (using the Clump Library and a Language Model) => target translation
19
Tasks Learn rewrite rules automatically from data. Apply rewrite rules to source parse trees.
20
Learning rewrite rules
–Parse source and target sentences
–Align linguistic phrases
–Extract rewrite rules
–Organize rewrite rules into a hierarchy
21
Parsing
Ex: France is the first western country
Slot grammar (McCord 1980, 1993, …)
22
Parse trees in Penn Treebank style
(S (NP-SBJ France) (V is) (NP-PRD (Det the) (Adj first) (Adj western) (N country)))
23
Aligning phrases
Eng: (S (NP-SBJ France) (V is) (NP-PRD (Det the) (Adj first) (Adj western) (N country)))
Fr: (S (NP-SBJ (Det La) (N France)) (V est) (NP-PRD (Det le) (Adj premier) (N pays) (Adj occidental)))
Corresponding phrases in the two trees are aligned.
24
Extracting rewrite rules
From the aligned trees for "France is the first western country" / "La France est le premier pays occidental":
NP0 (France) V (is) NP1 (country) => NP0 V NP1
25
Extracting rewrite rules
From the aligned trees for "France is the first western country" / "La France est le premier pays occidental":
NP0 (France) V (is) NP1 (country) => NP0 V NP1
N (France) => Det (la) N
26
Extracting rewrite rules
From the aligned trees for "France is the first western country" / "La France est le premier pays occidental":
NP0 (France) V (is) NP1 (country) => NP0 V NP1
N (France) => Det (la) N
Det (the) Adj1 (first) Adj2 (western) N (country) => Det Adj1 N Adj2
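A sketch of rule extraction at one aligned tree node, under an assumed representation (each source child is a (label, head word) pair, plus the order in which those children surface on the target side); the helper name is ours, not the paper's.

```python
def extract_rule(src_children, tgt_order):
    """Given source children and their target-side ordering, emit a
    lexicalized rewrite rule 'LHS => RHS' (labels only on the RHS)."""
    lhs = " ".join(f"{lab}({word})" for lab, word in src_children)
    rhs = " ".join(src_children[i][0] for i in tgt_order)
    return f"{lhs} => {rhs}"

# "the first western country" => "le premier pays occidental":
# source order Det Adj1 Adj2 N surfaces on the target side as Det Adj1 N Adj2.
children = [("Det", "the"), ("Adj1", "first"),
            ("Adj2", "western"), ("N", "country")]
print(extract_rule(children, [0, 1, 3, 2]))
# -> Det(the) Adj1(first) Adj2(western) N(country) => Det Adj1 N Adj2
```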
27
Creating more generalized rules
From Det (the) Adj1 (first) Adj2 (western) N (country) => Det Adj1 N Adj2:
Adj (first) N (country) => Adj N
* Adj N => Adj N
* Adj (first) N => Adj N
Adj (western) N (country) => N Adj
* Adj N => N Adj
* Adj (western) N => N Adj
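Generalization can be sketched as dropping head words from subsets of positions in a fully lexicalized rule, as in "Adj (first) N (country) => Adj N" generalizing to "Adj N => Adj N" (the generator below is a simplified illustration, not the paper's exact procedure).

```python
import itertools

def generalize(lhs_items, rhs):
    """lhs_items: list of (label, head word). Yield every rule variant
    obtained by keeping or dropping each head word."""
    for mask in itertools.product([True, False], repeat=len(lhs_items)):
        lhs = " ".join(f"{lab}({w})" if keep else lab
                       for (lab, w), keep in zip(lhs_items, mask))
        yield f"{lhs} => {rhs}"

rules = list(generalize([("Adj", "first"), ("N", "country")], "Adj N"))
for r in rules:
    print(r)
# 4 variants, from fully lexicalized down to the unlexicalized Adj N => Adj N
```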
28
Merging counts and normalizing
ADJ N => N ADJ                      539328   0.64
ADJ N => ADJ N                      278091   0.33
ADJ (first) N => ADJ N               10245   0.99
ADJ (first) N => N ADJ                 103   0.01
ADJ (first) N (country) => ADJ N        27   1.0
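The merge-and-normalize step amounts to counting rule instances and computing probabilities per left-hand side. The sketch below uses a subset of the counts from the slide, so only the probability for the lexicalized exception (0.99) matches the slide exactly; with all right-hand sides included, the unlexicalized probabilities would come out as 0.64/0.33.

```python
from collections import Counter, defaultdict

# (LHS, RHS) -> instance count, echoing (a subset of) the slide's numbers.
counts = Counter({
    ("ADJ N", "N ADJ"):        539328,
    ("ADJ N", "ADJ N"):        278091,
    ("ADJ(first) N", "ADJ N"):  10245,
    ("ADJ(first) N", "N ADJ"):    103,
})

# Total count per left-hand side.
by_lhs = defaultdict(int)
for (lhs, _), c in counts.items():
    by_lhs[lhs] += c

# Normalize: P(RHS | LHS) = count(LHS => RHS) / count(LHS).
probs = {(lhs, rhs): c / by_lhs[lhs] for (lhs, rhs), c in counts.items()}
print(round(probs[("ADJ(first) N", "ADJ N")], 2))  # -> 0.99
```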
29
Organizing rewrite rules (a hierarchy, most general at the top)
N: => N 0.9; => Det N 0.1
Adj N: => N Adj 0.64; => Adj N 0.33
N PP: => N PP 0.91; => Det N PP 0.05
Adj N PP: => N Adj PP 0.61; => Adj N PP 0.30
Adj (first) N: => Adj N 0.99; => N Adj 0.01
Adj (first) N (country): => Adj N 1.0
30
Organizing rewrite rules
Adj N: => N Adj 0.64; => Adj N 0.27; => Det Adj N 0.35
Adj (first) N: => Adj N 0.99; => N Adj 0.26
31
Applying rewrite rules
Candidate rules: Adj N => N Adj; Adj1 Adj2 N => N Adj2 Adj1; Adj1 (first) Adj2 N => Adj1 N Adj2
Applying the most specific matching rule to the tree for "He is the first international student" yields "He is the first student international".
32
Decoding
He is the first international student
Applying rewrite rules (Adj (first) Adj N => Adj N Adj): He is the first student international
Monotonic decoding: il est le premier étudiant international
33
Experimental setup
Training data: 90M-word Eng-Fr Canadian Hansard
Test data: 500 sentences in the news domain
Metric: Bleu score (Papineni et al., 2002), 1 reference translation
Parser: English and French slot grammars
Baseline system: (Tillmann and Xia, 2003)
34
Extracted rewrite rules
Extracted rules: 15.0M
After removing singletons: 2.9M
After filtering with the hierarchy: 56K
–1K unlexicalized rules
–55K lexicalized rules, represented as 760 compact rule schemes
Ex: Adj (w) N => Adj N, where w ranges over new, first, prime, many, other, …
35
Most commonly used rules
Rules applied per sentence: 1.4 on average
Adj N => N Adj: 0.32
Adj (w) N => Adj N: 0.15
NP1 ’s NP2 => NP2 de NP1: 0.05
NP Adv V => NP V Adv: 0.03
NP1 V NP2 (pron) => NP1 NP2 V: 0.03
36
Monotonic vs. non-monotonic decoding
[Chart: Bleu scores (roughly 0.17-0.215) for the baseline vs. the system with rewrite rules, under monotonic and non-monotonic decoding]
37
Monotonic decoding results
[Chart: Bleu score (1-ref) vs. max source clump size, baseline vs. w/ rewrite rules]
38
Conclusion
Use automatically learned rewrite rules to reorder source sentences:
–Rewrite rules allow generalizations
–Monotonic decoding speeds up translation
Gain: 10% improvement in Bleu (0.196 => 0.215)
39
Future work Try other language pairs (Ar-Eng, Ch-Eng). Inject rewriting lattice into statistical models. Use rewrite rules directly in the decoder.
40
Backup slides
41
An example of filtering rules
Hierarchy nodes with instance counts: N* (10^9), Adj N* (10^6), Det Adj N*, Adj(prime) N* (10^3), Det Adj(prime) N* (10^2)
Best rewrites: N* => N 0.9; Adj N* => N* Adj 0.7 / Adj N* 0.3; Adj(prime) N* => Adj N 1.0; Det Adj(prime) N* => Det Adj N 0.85
Gains w.r.t. parents: 0, 4*10^5, 10^3, 0
42
An example
Eng: the prime minister ’s press office issued the following press release
Word-by-word Fr: le premier ministre de presse service diffusé le suivant communiqué
Correct Fr: le service de presse du premier ministre a diffusé le communiqué suivant
43
Main issues in MT
–Word choice: office => bureau, cabinet, …, service; release => libération, sortie, disque, …, communiqué
–Inserting glue words: e.g., the preposition "de", the auxiliary verb "a"
–Ordering target words: service de presse
–Morphing target words: subject-verb agreement, contraction (de + le => du), etc.
44
Two approaches to MT Syntax-based MT Statistical MT (SMT)
45
Syntax-based MT
Major steps:
–Parse the source sentence
–Translate source words into target words
–Reshape the source parse tree with rewrite rules
–Read the target sentence off the tree
46
Translation lexicon: prime => premier; ’s => de; office => service (if modified by "press")
Rewrite rules: NP1 ’s NP2 => NP2 de NP1; N1 N2 => N2 de N1
Applying them to the tree for "the prime minister ’s press office" yields: service de presse de le premier ministre (de + le => du)
47
Syntax-based approach
It requires:
–a parser for the source language
–a translation lexicon
–a set of rewrite rules
Normally, these components are created by hand.
48
Statistical Machine Translation
–Learns from a parallel corpus
–Easier to create translation systems for new language pairs
–"Phrase-based" models outperform word-based models
49
Advantages of phrase pairs
–Translating source words with extended context: press office => service de presse
–Glue word insertion: e.g., "de"
–Ordering of target words
–Morphing target words: the prime minister ’s => du premier ministre (de + le => du)
50
NSF workshop experiments
Data: 150M-word Chinese-English parallel corpora; top 1000 candidates, 4 references
Baseline (SMT): 0.316 Bleu
Oracle result: 0.398 Bleu
Each method: 0.304 to 0.325
Adding all good methods: 0.332
Typical improvements: no syntax > shallow ~ tricky > deep syntax
Sadly, no gain from syntax
51
Syntax-based rewrite rules
Form: (X0 => X1 … Xn) => (Y0 => Y1 … Yn)
Xi, Yi: head word, thematic role (e.g., subj, obj), syntax label (POS tag of the head word), etc.
Ex: V NP (iobj, it) => NP (iobj, le) V
52
Parsing: ESG parser
Slot grammar is a lexicalized, dependency-oriented system (McCord 1980).
Languages covered: English, German, French, Spanish, Italian, and Portuguese.
54
Training (1): parse and learn rewrite rules
Eng: (NP (NP the prime minister) ’s (NP press office))
Fr: (NP (NP le service de presse) de (NP le premier ministre))
Rules learned: NP1 ’s NP2 => NP2 de NP1; N1 N2 => le N2 de N1; Det Adj N => Det Adj N
55
Training (2): put Eng sentences into Fr order
Rules: NP1 ’s NP2 => NP2 de NP1; N1 N2 => le N2 de N1; Det Adj N => Det Adj N
the prime minister ’s press office => le office de press de the prime minister
56
Training (3): learn phrases from training data
Eng (reordered): le office de press de the prime minister
Fr: le service de presse du premier ministre
Phrase pairs learned:
le office de press => le service de presse
press de the prime minister => presse du premier ministre
le => le; de => de; de the => du
57
Translating (1): put Eng sentences into Fr order
Rules: NP1 ’s NP2 => NP2 de NP1; Adj N => N Adj
the government ’s economic policy => policy economic de the government
58
Translating (2): translate with SMT decoder
Eng (reordered): policy economic de the government
Phrase pairs learned at training time: policy => politique; economic => economique; de the government => du gouvernement
SMT output (translating in linear order): politique economique du gouvernement
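The two-stage translation above can be sketched as a reordered source fed to a strictly left-to-right decoder. The greedy longest-match decoder below is a deliberate simplification of a real SMT decoder (no language model, no search), and the phrase table holds just the pairs from the slide.

```python
# Illustrative phrase table from the slide's example.
phrase_table = {
    ("policy",): ("politique",),
    ("economic",): ("economique",),
    ("de", "the", "government"): ("du", "gouvernement"),
}

def monotonic_decode(words):
    """Greedy longest-match translation, strictly left-to-right:
    no permutations of the (already reordered) source are explored."""
    out, i = [], 0
    while i < len(words):
        for n in range(len(words) - i, 0, -1):
            key = tuple(words[i:i + n])
            if key in phrase_table:
                out.extend(phrase_table[key])
                i += n
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return out

reordered = ["policy", "economic", "de", "the", "government"]
print(" ".join(monotonic_decode(reordered)))
# -> politique economique du gouvernement
```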
59
Test the idea
Training data: 90M-word English-French Candide data
Test data: 500 sentences, 1 reference translation
Parser: English and French slot grammars (ESG and FSG)
Rewrite rules: 10 hand-written rewrite rules, e.g., Adj N => N Adj
60
Experimental results
Improvement so far: from 0.196 to 0.214 (+9%); NSF workshop: no gain from syntax

                     Not reorder target   Reorder target
Not reorder source   0.196                0.187
Reorder source       0.214                0.184
61
Learning rewrite rules from data
There are many rules and many exceptions:
ADJ N => N ADJ  0.47
ADJ N => ADJ N  0.27
Ex: small, recent, past, former, next, last, good, previous, serious, certain, large, great, various, …
62
Algorithm
–Parse source and target sentences
–Align linguistic phrases
–Extract rewrite rules
63
Filtering rewrite rules
Why? Too many rules; most are "redundant".
How? Put rules into a hierarchy and calculate gains w.r.t. parents.
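One plausible reading of "gain w.r.t. the parent" is sketched below (the scoring formula is our assumption, not the paper's exact definition): a rule's gain is how many of its instances would be handled worse by deferring to the parent rule's best right-hand side, so zero-gain rules are redundant and can be dropped.

```python
def gain(child_count, child_best_prob, parent_best_rhs, child_rhs_probs):
    """Extra instances handled correctly by keeping the child rule
    rather than applying its parent's best right-hand side.
    child_rhs_probs: RHS -> probability under the child rule."""
    parent_prob_on_child = child_rhs_probs.get(parent_best_rhs, 0.0)
    return round(child_count * (child_best_prob - parent_prob_on_child))

# "Adj(prime) N*" (10^3 instances) always prefers "Adj N", while its parent
# "Adj N*" prefers "N Adj": keeping the child gains ~10^3 instances.
print(gain(1000, 1.0, "N Adj", {"Adj N": 1.0, "N Adj": 0.0}))  # -> 1000

# A child whose parent already picks the same RHS with the same
# probability gains nothing and is filtered out.
print(gain(100, 0.85, "Det Adj N", {"Det Adj N": 0.85}))  # -> 0
```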
64
Translation results
Baseline (no rewrite rules): 0.196
With 10 hand-written rules: 0.214
With 1K unlexicalized rules: 0.211
With 1K unlexicalized rules and 760 "meta" rules: 0.215
65
Details of filtering algorithm
–Remove redundant unlexicalized rules
–Remove redundant lexicalized rules w.r.t. the corresponding unlexicalized rules
–Put