1
Improving a Statistical MT System with Automatically Learned Rewrite Rules
Fei Xia and Michael McCord
IBM T. J. Watson Research Center, Yorktown Heights, New York
2
Previous attempt at using syntax
2003 NSF/JHU MT Summer Workshop (Och et al., 2004):
Method: run SMT to generate the top-N translations, then use syntactic information to rerank the candidates. => Parsing MT output is problematic.
Result: no gain from syntax.
3
Outline
–Current phrase-based SMT systems (a.k.a. clump-based systems)
–Overview of the new approach
–Learning and applying rewrite rules
–Experimental results
–Conclusion and future work
4
Clump-based SMT
The unit of translation is a clump rather than a word. A clump is simply a word n-gram.
Ex: P(est le premier | is the first)
Baseline system: (Tillmann & Xia, 2003)
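A minimal sketch of what a clump library might look like: each entry pairs a source-language word n-gram with candidate target n-grams and their conditional probabilities, e.g. P(est le premier | is the first). The entries and probabilities below are illustrative, not taken from the paper.

```python
# Hypothetical clump library: source clump -> [(target clump, probability)].
clump_library = {
    ("is", "the", "first"): [(("est", "le", "premier"), 0.72)],
    ("is", "the"):          [(("est", "le"), 0.65)],
    ("is",):                [(("est",), 0.80)],
    ("France",):            [(("France",), 0.95), (("la", "France"), 0.04)],
}

def lookup(clump):
    """Return candidate translations for a source clump, best first."""
    return sorted(clump_library.get(tuple(clump), []),
                  key=lambda tp: -tp[1])

print(lookup(["is", "the", "first"]))
```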
5
Clump-based system: training stage
Parallel data (source sentence, target sentence) => Preprocessor => Clump Extractor => Clump Library
6
Clump-based system: translation stage
Source sentence => Preprocessor => Decoder (using the Clump Library and a Language Model) => target translation
7
Baseline system: clump extraction
Eng: France is the first western country …
Fr: La France est le premier pays occidental …
Clumps extracted: France => France; France is => France est
8
Baseline system: clump extraction
Eng: France is the first western country …
Fr: La France est le premier pays occidental …
Clumps extracted: France => France; France is => France est; France is the => France est le
9
Baseline system: decoding
Input: He is the first international student
Matching clumps: he => il; he is => il est; he is the => il est le; first => premier; is the first => est le premier; international => international; student => étudiant
10
Monotonic vs. non-monotonic decoding
Source clumps for "He is the first international student": S1 = he is the, S2 = first, S3 = international, S4 = student
Monotonic decoding (S1, S2, S3, S4): il est le premier international étudiant
Non-monotonic decoding (S1, S2, S4, S3): il est le premier étudiant international
Another ordering (S2, S1, S3, S4): premier il est le international étudiant …
11
Challenges for current clump-based systems
(1) Non-monotonic decoding is expensive (n! orderings), and it can hurt performance.
Ex: He is the first international student => S1 S2 S3 S4
(S2, S1, S3, S4): premier il est le international étudiant
(S2, S3, S1, S4): premier international il est le étudiant
(S2, S3, S4, S1): premier international étudiant il est le
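The n! blow-up above is easy to make concrete: non-monotonic decoding must consider orderings of the n source clumps, up to n! permutations, while monotonic decoding keeps the single left-to-right order.

```python
import itertools
import math

# The four source clumps from the running example.
clumps = ["il est le", "premier", "international", "étudiant"]

# Non-monotonic decoding: every permutation of clump positions is a candidate.
orderings = list(itertools.permutations(range(len(clumps))))
assert len(orderings) == math.factorial(len(clumps))  # 4! = 24

# Monotonic decoding: the identity ordering is the only one considered.
monotonic = tuple(range(len(clumps)))
print(len(orderings), "orderings vs 1 monotonic pass")
```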
12
Challenges for current clump-based systems (ctd.)
(2) No phrase-level generalizations are learned and used.
Ex: France is the first western country. / He is the first international student.
Rewrite rules are useful:
word-level rule: Adj N => N Adj
phrase-level rule: Subj V Obj => V Subj Obj
13
New approach
He is the first international student
Applying rewrite rules (Adj N => N Adj): He is the first student international
Monotonic decoding: il est le premier étudiant international
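The key move above can be sketched in a few lines (the tag set and sentence are from the slides; the flat word/tag representation is a simplification of the parse trees the system actually rewrites): apply a rule such as Adj N => N Adj to the source sentence first, so the reordered source matches target word order and a cheap monotonic decode suffices.

```python
def apply_adj_n_swap(tagged):
    """Swap adjacent (Adj, N) pairs in a list of (word, tag) tuples,
    implementing the rewrite rule Adj N => N Adj."""
    out, i = [], 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and tagged[i][1] == "Adj" and tagged[i + 1][1] == "N"):
            out.extend([tagged[i + 1], tagged[i]])  # emit N Adj
            i += 2
        else:
            out.append(tagged[i])
            i += 1
    return out

sent = [("He", "Pron"), ("is", "V"), ("the", "Det"),
        ("first", "Adj"), ("international", "Adj"), ("student", "N")]
print([w for w, _ in apply_adj_n_swap(sent)])
# -> ['He', 'is', 'the', 'first', 'student', 'international']
```

Note that only the innermost Adj N pair is swapped ("international student" => "student international"), reproducing the reordered source shown on the slide.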
14
Rewrite rules
Ex: a tree S => NP V NP is rewritten as S => V NP NP:
NP0 V NP1 => V NP0 NP1
15
Defaults and exceptions
Default: Adj N => N Adj; exception: Adj (first) N => Adj N
Default: NP0 V NP1 => NP0 V NP1; exception: NP0 V NP1 (iobj, pron) => NP0 NP1 V
Learn both defaults and exceptions
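A sketch of how default-vs-exception lookup could work (the rule store and lookup function are hypothetical): the most specific matching rule wins, so the lexicalized exception "Adj (first) N => Adj N" overrides the default "Adj N => N Adj".

```python
# Hypothetical rule store: (label, head word or None for the default) -> RHS.
rules = {
    ("Adj", None):    "N Adj",   # default: Adj N => N Adj
    ("Adj", "first"): "Adj N",   # lexicalized exception for "first"
}

def best_rhs(adj_word):
    """Prefer a rule lexicalized on this adjective, else fall back
    to the unlexicalized default."""
    return rules.get(("Adj", adj_word), rules[("Adj", None)])

print(best_rhs("first"))    # exception fires: Adj N
print(best_rhs("western"))  # default fires:   N Adj
```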
16
New approach: training stage 1
Parallel data => source and target sentences are parsed => Phrase Aligner => Rewrite Rule Extractor => rewrite rules
17
New approach: training stage 2
The source sentence is parsed and the Rewrite Rule Applier puts it in target word order; the Preprocessor and Clump Extractor then build the Clump Library from the reordered source and the target sentence.
18
New approach: translation stage
Source sentence => Parser => Rewrite rule applier (source sentence in target word order) => Preprocessor => Decoder (using the Clump Library and a Language Model) => target translation
19
Tasks Learn rewrite rules automatically from data. Apply rewrite rules to source parse trees.
20
Learning rewrite rules
–Parse source and target sentences
–Align linguistic phrases
–Extract rewrite rules
–Organize rewrite rules into a hierarchy
21
Parsing
Ex: France is the first western country
Slot grammar (McCord 1980, 1993, …)
22
Parse trees in Penn Treebank style
(S (NP-SBJ France) (V is) (NP-PRD (Det the) (Adj first) (Adj western) (N country)))
23
Aligning phrases
Eng: (S (NP-SBJ France) (V is) (NP-PRD (Det the) (Adj first) (Adj western) (N country)))
Fr: (S (NP-SBJ (Det La) (N France)) (V est) (NP-PRD (Det le) (Adj premier) (N pays) (Adj occidental)))
Corresponding phrases in the two trees are aligned.
24
Extracting rewrite rules
From the aligned trees for "France is the first western country" / "La France est le premier pays occidental":
NP0 (France) V (is) NP1 (country) => NP0 V NP1
25
Extracting rewrite rules
From the aligned trees for "France is the first western country" / "La France est le premier pays occidental":
NP0 (France) V (is) NP1 (country) => NP0 V NP1
N (France) => Det (la) N
26
Extracting rewrite rules
From the aligned trees for "France is the first western country" / "La France est le premier pays occidental":
NP0 (France) V (is) NP1 (country) => NP0 V NP1
N (France) => Det (la) N
Det (the) Adj1 (first) Adj2 (western) N (country) => Det Adj1 N Adj2
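A sketch of rule extraction at one aligned tree node, under an assumed representation (each source child is a (label, head word) pair, plus the order in which those children surface on the target side); the helper name is ours, not the paper's.

```python
def extract_rule(src_children, tgt_order):
    """Given source children and their target-side ordering, emit a
    lexicalized rewrite rule 'LHS => RHS' (labels only on the RHS)."""
    lhs = " ".join(f"{lab}({word})" for lab, word in src_children)
    rhs = " ".join(src_children[i][0] for i in tgt_order)
    return f"{lhs} => {rhs}"

# "the first western country" => "le premier pays occidental":
# source order Det Adj1 Adj2 N surfaces on the target side as Det Adj1 N Adj2.
children = [("Det", "the"), ("Adj1", "first"),
            ("Adj2", "western"), ("N", "country")]
print(extract_rule(children, [0, 1, 3, 2]))
# -> Det(the) Adj1(first) Adj2(western) N(country) => Det Adj1 N Adj2
```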
27
Creating more generalized rules
From Det (the) Adj1 (first) Adj2 (western) N (country) => Det Adj1 N Adj2:
Adj (first) N (country) => Adj N
* Adj N => Adj N
* Adj (first) N => Adj N
Adj (western) N (country) => N Adj
* Adj N => N Adj
* Adj (western) N => N Adj
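Generalization can be sketched as dropping head words from subsets of positions in a fully lexicalized rule, as in "Adj (first) N (country) => Adj N" generalizing to "Adj N => Adj N" (the generator below is a simplified illustration, not the paper's exact procedure).

```python
import itertools

def generalize(lhs_items, rhs):
    """lhs_items: list of (label, head word). Yield every rule variant
    obtained by keeping or dropping each head word."""
    for mask in itertools.product([True, False], repeat=len(lhs_items)):
        lhs = " ".join(f"{lab}({w})" if keep else lab
                       for (lab, w), keep in zip(lhs_items, mask))
        yield f"{lhs} => {rhs}"

rules = list(generalize([("Adj", "first"), ("N", "country")], "Adj N"))
for r in rules:
    print(r)
# 4 variants, from fully lexicalized down to the unlexicalized Adj N => Adj N
```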
28
Merging counts and normalizing
ADJ N => N ADJ                      539328   0.64
ADJ N => ADJ N                      278091   0.33
ADJ (first) N => ADJ N               10245   0.99
ADJ (first) N => N ADJ                 103   0.01
ADJ (first) N (country) => ADJ N        27   1.0
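The merge-and-normalize step amounts to counting rule instances and computing probabilities per left-hand side. The sketch below uses a subset of the counts from the slide, so only the probability for the lexicalized exception (0.99) matches the slide exactly; with all right-hand sides included, the unlexicalized probabilities would come out as 0.64/0.33.

```python
from collections import Counter, defaultdict

# (LHS, RHS) -> instance count, echoing (a subset of) the slide's numbers.
counts = Counter({
    ("ADJ N", "N ADJ"):        539328,
    ("ADJ N", "ADJ N"):        278091,
    ("ADJ(first) N", "ADJ N"):  10245,
    ("ADJ(first) N", "N ADJ"):    103,
})

# Total count per left-hand side.
by_lhs = defaultdict(int)
for (lhs, _), c in counts.items():
    by_lhs[lhs] += c

# Normalize: P(RHS | LHS) = count(LHS => RHS) / count(LHS).
probs = {(lhs, rhs): c / by_lhs[lhs] for (lhs, rhs), c in counts.items()}
print(round(probs[("ADJ(first) N", "ADJ N")], 2))  # -> 0.99
```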
29
Organizing rewrite rules (a hierarchy, most general at the top)
N: => N 0.9; => Det N 0.1
Adj N: => N Adj 0.64; => Adj N 0.33
N PP: => N PP 0.91; => Det N PP 0.05
Adj N PP: => N Adj PP 0.61; => Adj N PP 0.30
Adj (first) N: => Adj N 0.99; => N Adj 0.01
Adj (first) N (country): => Adj N 1.0
30
Organizing rewrite rules
Adj N: => N Adj 0.64; => Adj N 0.27; => Det Adj N 0.35
Adj (first) N: => Adj N 0.99; => N Adj 0.26
31
Applying rewrite rules
Candidate rules: Adj N => N Adj; Adj1 Adj2 N => N Adj2 Adj1; Adj1 (first) Adj2 N => Adj1 N Adj2
Applying the most specific matching rule to the tree for "He is the first international student" yields "He is the first student international".
32
Decoding
He is the first international student
Applying rewrite rules (Adj (first) Adj N => Adj N Adj): He is the first student international
Monotonic decoding: il est le premier étudiant international
33
Experimental setup
Training data: 90M-word Eng-Fr Canadian Hansard
Test data: 500 sentences in the news domain
Metric: Bleu score (Papineni et al., 2002), 1 reference translation
Parser: English and French slot grammars
Baseline system: (Tillmann and Xia, 2003)
34
Extracted rewrite rules
Extracted rules: 15.0M
After removing singletons: 2.9M
After filtering with the hierarchy: 56K
–1K unlexicalized rules
–55K lexicalized rules, represented as 760 compact rule schemes
Ex: Adj (w) N => Adj N, where w ranges over new, first, prime, many, other, …
35
Most commonly used rules
Rules applied per sentence: 1.4 on average
Adj N => N Adj: 0.32
Adj (w) N => Adj N: 0.15
NP1 ’s NP2 => NP2 de NP1: 0.05
NP Adv V => NP V Adv: 0.03
NP1 V NP2 (pron) => NP1 NP2 V: 0.03
36
Monotonic vs. non-monotonic decoding
[Chart: Bleu scores (roughly 0.17-0.215) for the baseline vs. the system with rewrite rules, under monotonic and non-monotonic decoding]
37
Monotonic decoding results
[Chart: Bleu score (1-ref) vs. max source clump size, baseline vs. w/ rewrite rules]
38
Conclusion
Use automatically learned rewrite rules to reorder source sentences:
–Rewrite rules allow generalizations
–Monotonic decoding speeds up translation
Gain: 10% improvement in Bleu (0.196 => 0.215)
39
Future work Try other language pairs (Ar-Eng, Ch-Eng). Inject rewriting lattice into statistical models. Use rewrite rules directly in the decoder.
40
Backup slides
41
An example of filtering rules
Hierarchy nodes with instance counts: N* (10^9), Adj N* (10^6), Det Adj N*, Adj(prime) N* (10^3), Det Adj(prime) N* (10^2)
Best rewrites: N* => N 0.9; Adj N* => N* Adj 0.7 / Adj N* 0.3; Adj(prime) N* => Adj N 1.0; Det Adj(prime) N* => Det Adj N 0.85
Gains w.r.t. parents: 0, 4*10^5, 10^3, 0
42
An example
Eng: the prime minister ’s press office issued the following press release
Word-by-word Fr: le premier ministre de presse service diffusé le suivant communiqué
Correct Fr: le service de presse du premier ministre a diffusé le communiqué suivant
43
Main issues in MT
–Word choice: office => bureau, cabinet, …, service; release => libération, sortie, disque, …, communiqué
–Inserting glue words: e.g., the preposition "de", the auxiliary verb "a"
–Ordering target words: service de presse
–Morphing target words: subject-verb agreement, contraction (de + le => du), etc.
44
Two approaches to MT Syntax-based MT Statistical MT (SMT)
45
Syntax-based MT
Major steps:
–Parse the source sentence
–Translate source words into target words
–Reshape the source parse tree with rewrite rules
–Read the target sentence off the tree
46
Translation lexicon: prime => premier; ’s => de; office => service (if modified by "press")
Rewrite rules: NP1 ’s NP2 => NP2 de NP1; N1 N2 => N2 de N1
Applying them to the tree for "the prime minister ’s press office" yields: service de presse de le premier ministre (de + le => du)
47
Syntax-based approach
It requires:
–a parser for the source language
–a translation lexicon
–a set of rewrite rules
Normally, these components are created by hand.
48
Statistical Machine Translation
–Learns from a parallel corpus
–Easier to create translation systems for new language pairs
–"Phrase-based" models outperform word-based models
49
Advantages of phrase pairs
–Translating source words with extended context: press office => service de presse
–Glue word insertion: e.g., "de"
–Ordering of target words
–Morphing target words: the prime minister ’s => du premier ministre (de + le => du)
50
NSF workshop experiments
Data: 150M-word Chinese-English parallel corpora; top 1000 candidates, 4 references
Baseline (SMT): 0.316 Bleu
Oracle result: 0.398 Bleu
Each method: 0.304 to 0.325
Adding all good methods: 0.332
Typical improvements: no syntax > shallow ~ tricky > deep syntax
Sadly, no gain from syntax
51
Syntax-based rewrite rules
Form: (X0 => X1 … Xn) => (Y0 => Y1 … Yn)
Xi, Yi: head word, thematic role (e.g., subj, obj), syntax label (POS tag of the head word), etc.
Ex: V NP (iobj, it) => NP (iobj, le) V
52
Parsing: ESG parser
Slot grammar is a lexicalized, dependency-oriented system (McCord 1980).
Languages covered: English, German, French, Spanish, Italian, and Portuguese.
54
Training (1): parse and learn rewrite rules
Eng: (NP (NP the prime minister) ’s (NP press office))
Fr: (NP (NP le service de presse) de (NP le premier ministre))
Rules learned: NP1 ’s NP2 => NP2 de NP1; N1 N2 => le N2 de N1; Det Adj N => Det Adj N
55
Training (2): put Eng sentences into Fr order
Rules: NP1 ’s NP2 => NP2 de NP1; N1 N2 => le N2 de N1; Det Adj N => Det Adj N
the prime minister ’s press office => le office de press de the prime minister
56
Training (3): learn phrases from training data
Eng (reordered): le office de press de the prime minister
Fr: le service de presse du premier ministre
Phrase pairs learned:
le office de press => le service de presse
press de the prime minister => presse du premier ministre
le => le; de => de; de the => du
57
Translating (1): put Eng sentences into Fr order
Rules: NP1 ’s NP2 => NP2 de NP1; Adj N => N Adj
the government ’s economic policy => policy economic de the government
58
Translating (2): translate with SMT decoder
Eng (reordered): policy economic de the government
Phrase pairs learned at training time: policy => politique; economic => economique; de the government => du gouvernement
SMT output (translating in linear order): politique economique du gouvernement
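The two-stage translation above can be sketched as a reordered source fed to a strictly left-to-right decoder. The greedy longest-match decoder below is a deliberate simplification of a real SMT decoder (no language model, no search), and the phrase table holds just the pairs from the slide.

```python
# Illustrative phrase table from the slide's example.
phrase_table = {
    ("policy",): ("politique",),
    ("economic",): ("economique",),
    ("de", "the", "government"): ("du", "gouvernement"),
}

def monotonic_decode(words):
    """Greedy longest-match translation, strictly left-to-right:
    no permutations of the (already reordered) source are explored."""
    out, i = [], 0
    while i < len(words):
        for n in range(len(words) - i, 0, -1):
            key = tuple(words[i:i + n])
            if key in phrase_table:
                out.extend(phrase_table[key])
                i += n
                break
        else:
            out.append(words[i])  # pass unknown words through unchanged
            i += 1
    return out

reordered = ["policy", "economic", "de", "the", "government"]
print(" ".join(monotonic_decode(reordered)))
# -> politique economique du gouvernement
```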
59
Test the idea
Training data: 90M-word English-French Candide data
Test data: 500 sentences, 1 reference translation
Parser: English and French slot grammars (ESG and FSG)
Rewrite rules: 10 hand-written rewrite rules, e.g., Adj N => N Adj
60
Experimental results
Improvement so far: from 0.196 to 0.214 (+9%); NSF workshop: no gain from syntax

                     Not reorder target   Reorder target
Not reorder source   0.196                0.187
Reorder source       0.214                0.184
61
Learning rewrite rules from data
There are many rules and many exceptions:
ADJ N => N ADJ  0.47
ADJ N => ADJ N  0.27
Ex: small, recent, past, former, next, last, good, previous, serious, certain, large, great, various, …
62
Algorithm
–Parse source and target sentences
–Align linguistic phrases
–Extract rewrite rules
63
Filtering rewrite rules
Why? Too many rules; most are "redundant".
How? Put rules into a hierarchy and calculate gains w.r.t. parents.
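One plausible reading of "gain w.r.t. the parent" is sketched below (the scoring formula is our assumption, not the paper's exact definition): a rule's gain is how many of its instances would be handled worse by deferring to the parent rule's best right-hand side, so zero-gain rules are redundant and can be dropped.

```python
def gain(child_count, child_best_prob, parent_best_rhs, child_rhs_probs):
    """Extra instances handled correctly by keeping the child rule
    rather than applying its parent's best right-hand side.
    child_rhs_probs: RHS -> probability under the child rule."""
    parent_prob_on_child = child_rhs_probs.get(parent_best_rhs, 0.0)
    return round(child_count * (child_best_prob - parent_prob_on_child))

# "Adj(prime) N*" (10^3 instances) always prefers "Adj N", while its parent
# "Adj N*" prefers "N Adj": keeping the child gains ~10^3 instances.
print(gain(1000, 1.0, "N Adj", {"Adj N": 1.0, "N Adj": 0.0}))  # -> 1000

# A child whose parent already picks the same RHS with the same
# probability gains nothing and is filtered out.
print(gain(100, 0.85, "Det Adj N", {"Det Adj N": 0.85}))  # -> 0
```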
64
Translation results
Baseline (no rewrite rules): 0.196
With 10 hand-written rules: 0.214
With 1K unlexicalized rules: 0.211
With 1K unlexicalized rules and 760 "meta" rules: 0.215
65
Details of filtering algorithm
–Remove redundant unlexicalized rules
–Remove redundant lexicalized rules w.r.t. the corresponding unlexicalized rules
–Put