CMU MilliRADD Small-MT Report, TIDES PI Meeting 2002

1 CMU MilliRADD Small-MT Report
TIDES PI Meeting 2002
The CMU MilliRADD Team: Jaime Carbonell, Lori Levin, Ralf Brown, Stephan Vogel, Alon Lavie, Kathrin Probst, Erik Peterson, Chris Monson
Language Technologies Institute, CMU

2 MilliRADD Approaches
MT for minimal-resource languages:
- Limited parallel corpora (100K)
- Limited dictionary (10K)
- Limited native-speaker access
- No pre-existing grammars, tree-banks, etc.
MT engines:
- Statistical MT (SMT)
- Example-Based MT (EBMT)
- Transfer-based MT (learning transfer rules)
- Multi-Engine MT (MEMT)

3 Statistical MT
Core engine: same as MegaRADD
Improvements since the dry run:
- Treatment of articles/morphology
- Phrasal alignments and units
- Dynamic re-segmentation
Best-performing MT engine (official NIST score = 6.14; latest NIST score = 6.30)

4 Improvements to SMT Measured on Dry-Run Data
System                            Score
Baseline SMT                      6.38
SMT + phrasing for S-T and T-S    6.74
SMT + phrasing for S-S and T-T    6.80
SMT + both phrasings              6.88

5 Example-Based MT
- Longest-fragment match in the corpus
- Permits inexact matching
- Multiple matches generate a lattice
- Target language model finds the best path
Augmentation: 10K dictionary + bilingual word pairs extracted statistically from the 100K corpus
Official NIST score = 5.29 (3.97 without the dictionary)
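
A rough Python sketch of the lattice search described on this slide. It is an illustration under stated assumptions, not the CMU EBMT engine: a hypothetical fragment table stands in for corpus matching and sub-sentential alignment, only exact matches are used (the slide notes the real engine also permits inexact matching), the bigram scores are toy values, and a hypothetical per-fragment penalty is one simple way to prefer longer fragments. Each lattice edge covers a source span; a Viterbi search keyed on the last target word picks the path the target language model scores highest.

```python
from collections import defaultdict

BACKOFF = -5.0           # hypothetical log-prob for an unseen target bigram
FRAGMENT_PENALTY = -1.0  # hypothetical cost per fragment: fewer, longer matches win


def build_lattice(source, fragments):
    """Edges (i, j, target) for every span source[i:j] listed in the fragment
    table, which stands in for corpus matching plus alignment."""
    edges = []
    for i in range(len(source)):
        for j in range(i + 1, len(source) + 1):
            for target in fragments.get(tuple(source[i:j]), []):
                edges.append((i, j, tuple(target)))
    return edges


def lm_score(words, bigrams, prev):
    """Bigram log-probability of `words` following `prev`; also returns the new last word."""
    total = 0.0
    for w in words:
        total += bigrams.get((prev, w), BACKOFF)
        prev = w
    return total, prev


def best_path(source, edges, bigrams):
    """Viterbi search over the lattice; a state is (position, last target word)."""
    n = len(source)
    by_start = defaultdict(list)
    for i, j, target in edges:
        by_start[i].append((j, target))
    best = defaultdict(dict)  # best[pos][last_word] = (score, output words)
    best[0]["<s>"] = (0.0, ())
    for i in range(n):  # edges only move right, so best[i] is final here
        for last, (score, out) in list(best[i].items()):
            for j, target in by_start[i]:
                s, new_last = lm_score(target, bigrams, last)
                cand = (score + s + FRAGMENT_PENALTY, out + target)
                if new_last not in best[j] or cand[0] > best[j][new_last][0]:
                    best[j][new_last] = cand
    if not best[n]:
        return None  # no sequence of fragments covers the whole input
    _, (_, out) = max(
        best[n].items(),
        key=lambda kv: kv[1][0] + bigrams.get((kv[0], "</s>"), BACKOFF),
    )
    return list(out)


# Toy usage: "a b" is covered either by one longer fragment or by two single words.
fragments = {("a", "b"): [("X", "Y")], ("a",): [("X",)], ("b",): [("Y2",)], ("c",): [("Z",)]}
source = ["a", "b", "c"]
print(best_path(source, build_lattice(source, fragments), bigrams={}))  # ['X', 'Y', 'Z']
```

Keeping the last target word in the search state is what lets the bigram model score the junction between two fragments; with a strong enough bigram for `X Y2`, the same search would prefer the two-fragment path instead.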

6 EBMT: Combine Trx Fragments
English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.
English: The tallest man is my father.
Mapudungun: Chi doy fütra chi wentru fey ta inche ñi chaw.
English: I would like to meet the tallest man
Mapudungun (new): Ayükefun trawüael Chi doy fütra chi wentru
Mapudungun (correct): Ayüken ñi trawüael chi doy fütra wentruengu.
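
The naive "(new)" combination on this slide can be reproduced with a greedy longest-match splice over aligned fragments. The fragment alignments below ("I would like to meet" ↔ "Ayükefun trawüael", "The tallest man" ↔ "Chi doy fütra chi wentru") are read off the slide's output and are an assumption, as is the helper name `splice`. Because target fragments are concatenated verbatim, nothing adjusts word forms, word order, or capitalization, which is why the result differs from the correct Mapudungun sentence.

```python
# Hypothetical fragment alignments taken from the two example pairs on the slide.
FRAGMENTS = {
    "i would like to meet": "Ayükefun trawüael",
    "the tallest man": "Chi doy fütra chi wentru",
}


def splice(sentence, fragments):
    """Cover the input left to right with the longest matching source fragment
    and concatenate the aligned target fragments unchanged."""
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            key = " ".join(words[i:j])
            if key in fragments:
                output.append(fragments[key])
                i = j
                break
        else:
            raise ValueError(f"no fragment covers {words[i]!r}")
    return " ".join(output)


print(splice("I would like to meet the tallest man", FRAGMENTS))
# Ayükefun trawüael Chi doy fütra chi wentru
```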

7 Transfer Rules
- Manually developed transfer rules (only 71 hours of development time)
- Strict compositionality
- Lexicon = 10K + statistical pairs from the 100K parallel corpus
- Target language model disambiguation
Official NIST score = 4.84
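
A minimal sketch of how a flat transfer rule, a lexicon with several candidates per word, and a target language model can fit together. The rule notation follows the x::y alignments used on slide 10; the toy lexicon, bigram scores, and function names are hypothetical, and exhaustive enumeration of candidates is used only because the example is tiny (a real system would search a lattice rather than enumerate).

```python
from itertools import product

BACKOFF = -5.0  # hypothetical log-prob for an unseen target bigram


def lm_logprob(words, bigrams):
    """Bigram log-probability of a full sentence, with <s> and </s> markers."""
    total, prev = 0.0, "<s>"
    for w in list(words) + ["</s>"]:
        total += bigrams.get((prev, w), BACKOFF)
        prev = w
    return total


def apply_rule(words, tags, rule, lexicon, bigrams):
    """Apply one flat rule (source POS, target POS, 1-based x::y alignments),
    expand every lexical alternative, and let the target LM choose."""
    src_pos, tgt_pos, alignment = rule
    if tuple(tags) != tuple(src_pos):
        return None  # rule does not match the input
    slots = [None] * len(tgt_pos)
    for x, y in alignment:
        slots[y - 1] = lexicon[words[x - 1]]  # candidates for target slot y
    if any(s is None for s in slots):
        raise ValueError("every target position needs an aligned source word")
    return " ".join(max(product(*slots), key=lambda c: lm_logprob(c, bigrams)))


# Toy data (hypothetical), loosely based on the example sentence on slide 10.
tags = ["det", "n", "v", "det", "n"]
rule = (tuple(tags), tuple(tags), ((1, 1), (2, 2), (3, 3), (4, 4), (5, 5)))
lexicon = {"the": ["der", "die", "das"], "applicant": ["Bewerber"],
           "visits": ["besucht"], "company": ["Firma"]}
bigrams = {("<s>", "der"): -1.0, ("der", "Bewerber"): -1.0, ("Bewerber", "besucht"): -1.0,
           ("besucht", "die"): -1.0, ("die", "Firma"): -1.0, ("Firma", "</s>"): -1.0}
words = ["the", "applicant", "visits", "the", "company"]
print(apply_rule(words, tags, rule, lexicon, bigrams))  # der Bewerber besucht die Firma
```

Here the rule fixes the structure and the language model alone picks the article forms, which is one way to read "target language model disambiguation" under strict compositionality.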

8 Segmentation Differences on Mandarin Dry-Run Evaluation
Engine      Segmentation with large dict    10K dictionary + T-bank w's
SMT         6.88                            6.64
EBMT        5.82                            5.93
Transfer    5.25                            5.38

9 Next Major Developments
- Continued improvements to all engines
- New algorithm for EBMT (for Mega- and Milli-RADD)
- Multi-engine combinations
- Automatically acquired transfer rules: approach based on seeded version spaces (VS)

10 Compositionality
Goal: Syntactic Transfer Rules
1) Flat Seed Generation: produce rules from word-aligned sentence pairs, abstracted only to POS level; no syntactic structure
2) Add compositional structure to the seed rule by exploiting previously learned rules
3) Seeded Version Space Learning: group seed rules by constituent sequences and alignments; seed rules form the s-boundary of the VS; generalize with validation

Flat Seed Generation
The highly qualified applicant visits the company.
Der äußerst qualifizierte Bewerber besucht die Firma.
((1,1),(2,2),(3,3),(4,4),(5,5),(6,6))
S::S [det adv adj n v det n] [det adv adj n v det n]
((x1::y1) (x2::y2) … ((x4 agr) = *3-sing) … ((y3 case) = *nom) …)

Compositionality
Adjust the rule to reflect compositionality: the NP rule can be used to translate part of the sentence; keep / add context constraints, eliminate unnecessary ones.
S::S [NP v det n] [NP n v det n]
((x1::y1) (x2::y2) … ((y1 case) = *nom) …)
NP::NP [det adv adj n] [det adv adj n]
((x1::y1) … ((y4 agr) = (x4 agr)) …)

Seeded Version Space Learning
Group seed rules into version spaces: … NP v det n …
Notes:
1) Partial order of rules in VS
2) Generalization via merging
Merge two rules:
1) Deletion of constraint
2) Raising of two value constraints to one agreement constraint, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
3) Use merged rule to translate
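
The flat seed generation step and the two merge operations can be sketched as below. This is one possible reading of the slide, not the authors' implementation: constraints are simplified to (constituent, feature, value) triples and (constituent, constituent, feature) agreements, two seeds are treated as belonging to the same version space when their POS sequences and alignments match, and the compositional step and validation are left out. The example sentences and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    src_pos: tuple                       # e.g. ("det", "n", "v")
    tgt_pos: tuple
    alignment: tuple                     # 1-based (x, y) pairs, i.e. x::y
    values: frozenset = frozenset()      # {("x2", "num", "sg"), ...}
    agreements: frozenset = frozenset()  # {("x2", "x3", "num"), ...}


def flat_seed(src, tgt, alignment):
    """Abstract a word-aligned pair to POS level (no syntactic structure).
    src/tgt are lists of (word, pos, {feature: value})."""
    values = set()
    for side, prefix in ((src, "x"), (tgt, "y")):
        for idx, (_word, _pos, feats) in enumerate(side, start=1):
            values.update((f"{prefix}{idx}", f, v) for f, v in feats.items())
    return Rule(tuple(pos for _, pos, _ in src), tuple(pos for _, pos, _ in tgt),
                tuple(sorted(alignment)), frozenset(values))


def implied_agreements(rule):
    """Pairs of constituents that carry the same value for the same feature."""
    agreements = set(rule.agreements)
    for (a, fa, va), (b, fb, vb) in combinations(sorted(rule.values), 2):
        if fa == fb and va == vb:
            agreements.add((a, b, fa))
    return agreements


def merge(r1, r2):
    """Deletion: drop value constraints the seeds do not share.
    Raising: keep agreement constraints implied by both seeds."""
    if (r1.src_pos, r1.tgt_pos, r1.alignment) != (r2.src_pos, r2.tgt_pos, r2.alignment):
        raise ValueError("rules belong to different version spaces")
    return Rule(r1.src_pos, r1.tgt_pos, r1.alignment, r1.values & r2.values,
                frozenset(implied_agreements(r1) & implied_agreements(r2)))


align = [(1, 1), (2, 2), (3, 3)]
sg = flat_seed([("the", "det", {}), ("applicant", "n", {"num": "sg"}), ("visits", "v", {"num": "sg"})],
               [("der", "det", {}), ("Bewerber", "n", {}), ("besucht", "v", {})], align)
pl = flat_seed([("the", "det", {}), ("applicants", "n", {"num": "pl"}), ("visit", "v", {"num": "pl"})],
               [("die", "det", {}), ("Bewerber", "n", {}), ("besuchen", "v", {})], align)
merged = merge(sg, pl)
print(sorted(merged.values), sorted(merged.agreements))  # [] [('x2', 'x3', 'num')]
```

The differing `*sg`/`*pl` values are deleted, while the fact that noun and verb agree in both seeds survives as a single agreement constraint, mirroring the `((x1 num) = (x3 num))` example on the slide.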

