1 TIDES MT Workshop Review

2 Using Syntax?
• ISI-small:
  – Cross-lingual parsing/decoding
    Input: Chinese sentence + English lattice built with all possible phrase substitutions
    Output: English parse tree
    Algorithm: lattice parsing (?)
• JHU:
  – Failed to incorporate “linguistic knowledge”
    Morphology, NE, Syntax
    Modeling phrase movement did not help

3 Phrase Alignment

4 ITC-irst
• ITC-irst: very similar to Franz’s system
• Log-linear model / minimum error training
• Phrase-based model
• Preprocessing
  – Chinese numerical translation, segmentation, splitting of long sentences (testing)
• LM adaptation: mixture of LMs from different corpora
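The "mixture of LMs" bullet can be illustrated with plain linear interpolation. This is a minimal sketch, not ITC-irst's implementation: the function `mixture_prob`, the two toy models, and the weights are all hypothetical, and the slide does not say how the mixture weights were chosen.

```python
def mixture_prob(word, context, models, weights):
    """Linearly interpolate P(word | context) across several language models.

    `models` are callables returning a probability; `weights` must sum to 1.
    Both are illustrative assumptions, not taken from the slides.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * lm(word, context) for lm, w in zip(models, weights))


# Two toy unigram models standing in for LMs trained on different corpora.
def news_lm(word, context):
    return {"market": 0.6, "goal": 0.1}.get(word, 0.3)


def sports_lm(word, context):
    return {"market": 0.1, "goal": 0.7}.get(word, 0.2)


# Weight the sports model more heavily, as one might for a sports document.
p_goal = mixture_prob("goal", (), [news_lm, sports_lm], [0.25, 0.75])
print(round(p_goal, 4))
```

In practice the weights would typically be tuned on held-out text from the target domain; that step is omitted here.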

5 JHU
• Alignment template + WFST
• Multiple phrase segmentations of the source sentence are essential in translation
• Bitext chunking (DP and divisive clustering): similar idea to our sentence splitting
• Phrase-level movement
• Document-specific LM (LM adaptation)
  – Gains from doc-specific LMs and BMR-Bleu are not additive

6 ATR
• ATR:
  – Unsupervised Chinese word segmenter
  – Truecasing by Conditional Random Field

7 IBM
• Word reordering
  – Pre-ordering: reorder the source sentence
    10% improvement
  – Word-level and block-level reordering

8 ISI
• Franz system:
  – Log-linear model
  – Alignment template
  – Discriminative training
  – DP search
  – New feature functions
    Lexicalized reordering: +1% Bleu
    Penalize word deletions: +2% Bleu
  – Tight integration of rule-based translations: +2%
    Translation Components (TC): numbers, NE, dates
    Train a classifier to identify where TC works
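To make the "log-linear model" and "new feature functions" bullets concrete, here is a minimal sketch of how a log-linear model scores translation hypotheses as a weighted sum of feature values. The feature names (`log_tm`, `log_lm`, `deleted_words`), the weights, and the hypotheses are invented for illustration; they are not the ISI system's actual features or tuned values.

```python
def loglinear_score(features, weights):
    """Score a hypothesis as the weighted sum of its feature values."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))


# Hypothetical weights; a negative weight on deleted_words penalizes
# hypotheses that leave source words untranslated.
weights = {"log_tm": 1.0, "log_lm": 0.5, "deleted_words": -2.0}

hypotheses = [
    {"text": "the meeting ended",
     "features": {"log_tm": -2.0, "log_lm": -4.0, "deleted_words": 1}},
    {"text": "the meeting has ended",
     "features": {"log_tm": -2.5, "log_lm": -4.5, "deleted_words": 0}},
]

best = best_hypothesis(hypotheses, weights)
print(best["text"])
```

Adding a feature function in this framing means adding one more key to each feature dictionary and one more weight. Discriminative training would then fit the weights against a translation quality metric, which this sketch does not attempt.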

9 ISI
• Franz: important things in system development
  – Good engineering is important
    Scalability
    Efficiency
    No bugs in software
    Good overall system architecture
  – Error analysis should drive research
    Step 1: what is the major error in the current system?
    Step 2: fix it!
    Step 3: go to step 1

10 Comparable Corpora
• ISI:
  – Arabic: 99M -> 106M; Bleu: 43.8 -> 42.99
  – Chinese: 168M -> 176M; Bleu: 32.05 -> 32.85

11 Confidence Intervals
• Bootstrapping
• IBM’s method
  – Chop the test data into 50 pieces
• NIST’s method
  – Sign test
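Two of the methods named on this slide, bootstrapping and the sign test, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the procedure any workshop participant used: `bootstrap_ci`, `sign_test_p`, the toy per-segment scores, and the use of a simple mean as the metric are all hypothetical. A real Bleu evaluation would recompute corpus-level Bleu on each resample.

```python
import random
from math import comb


def bootstrap_ci(segments, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(segments).

    Resamples the test segments with replacement and recomputes the metric
    on each resample.
    """
    rng = random.Random(seed)
    n = len(segments)
    stats = sorted(metric(rng.choices(segments, k=n)) for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def sign_test_p(wins_a, wins_b):
    """Two-sided sign test p-value; ties are assumed to be dropped beforehand."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mean(xs):
    return sum(xs) / len(xs)


# Toy per-segment scores standing in for a real evaluation metric.
scores = [0.21, 0.35, 0.40, 0.28, 0.33, 0.45, 0.30, 0.38]
low, high = bootstrap_ci(scores, mean)
print(f"95% interval for the mean: [{low:.3f}, {high:.3f}]")

# System A judged better on 8 segments, system B on 2.
print(sign_test_p(8, 2))
```

The bootstrap gives an interval around a single system's score, while the sign test compares two systems segment by segment, so the two answer slightly different questions.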

12 New Players
• BYU: simple transfer system
• Linear B: human post-editing of MT hypotheses (HAMT)
• MTM linguaSoft: based on the CIMOS rule-based system
• NTT: WFST-based decoder

