1
TIDES MT Workshop Review
2
Using Syntax?
ISI-small:
–Cross-lingual parsing/decoding
    Input: Chinese sentence + English lattice built with all possible phrase substitutions
    Output: English parse tree
    Algorithm: lattice parsing (?)
JHU:
–Failed to incorporate “linguistic knowledge”
    Morphology, NE, Syntax
    Modeling phrase movement did not help
3
Phrase Alignment
4
ITC-irst
Very similar to Franz’s system
Log-linear model / minimum error training
Phrase-based model
Preprocessing
–Chinese numerical translation, segmentation, splitting of long sentences (testing)
LM adaptation: mixture of LMs from different corpora
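The slide does not say how the LM mixture is formed. A common choice is linear interpolation, which is assumed in the minimal sketch below. The toy unigram models, their probabilities, and the mixture weights are all hypothetical.

```python
def mixture_prob(word, models, weights):
    """Linear interpolation: P(word) = sum_i weights[i] * P_i(word).

    `models` is a list of dicts mapping a word to its probability under
    one corpus-specific LM (toy unigram models, an assumption here).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * m.get(word, 0.0) for w, m in zip(weights, models))


# Hypothetical unigram LMs, one per training corpus.
news_lm = {"market": 0.6, "game": 0.4}
sports_lm = {"market": 0.1, "game": 0.9}

# Weight the news LM at 0.7 and the sports LM at 0.3.
p_game = mixture_prob("game", [news_lm, sports_lm], [0.7, 0.3])
```

In practice the weights would be tuned on held-out data close to the test domain rather than set by hand.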
5
JHU
Alignment template + WFST
Multiple phrase segmentations of the source sentence are essential in translation
BiText chunking (DP and divisive clustering): similar idea to our sentence splitting
Phrase-level movement
Document-specific LM (LM adaptation)
–Gains from doc-specific LMs and BMR-Bleu are not additive
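To illustrate what "multiple phrase segmentations of the source sentence" means, the sketch below enumerates every way to split a token sequence into contiguous phrases of bounded length. It is not the JHU WFST implementation; the function name and the `max_len` limit are assumptions made for illustration.

```python
def segmentations(words, max_len=3):
    """Yield every split of `words` into contiguous phrases of at most max_len words."""
    if not words:
        yield []
        return
    for k in range(1, min(max_len, len(words)) + 1):
        head = tuple(words[:k])  # first phrase covers k words
        for rest in segmentations(words[k:], max_len):
            yield [head] + rest


# Three tokens with phrases of at most two words give three segmentations.
segs = list(segmentations(["a", "b", "c"], max_len=2))
```

A decoder that keeps all of these alternatives, rather than committing to one, can choose the segmentation whose phrases translate best.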
6
ATR
–Unsupervised Chinese word segmenter
–Truecasing by conditional random field
7
IBM
Word reordering
–Pre-ordering: reorder the source sentence
    10% improvement
–Word-level and block-level reordering
8
ISI
Franz system:
–Log-linear model
–Alignment template
–Discriminative training
–DP search
–New feature functions
    Lexicalized reordering: +1% Bleu
    Penalize word deletions: +2% Bleu
–Tight integration of rule-based translations: +2%
    Translation components (TC): numbers, NE, dates
    Train a classifier to identify where TC works
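A log-linear model scores each candidate translation as a weighted sum of feature functions and keeps the highest-scoring one. The sketch below shows only that scoring step. The feature names, values, and weights are hypothetical, and in a real system the weights would come from discriminative training rather than being set by hand.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values: sum_i weights[i] * h_i."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(h["features"], weights))


# Hypothetical weights and candidates; "tm" and "lm" are log-probabilities.
weights = {"tm": 1.0, "lm": 0.5, "length": -0.1}
hypotheses = [
    {"text": "candidate one", "features": {"tm": -2.0, "lm": -4.0, "length": 5}},
    {"text": "candidate two", "features": {"tm": -3.0, "lm": -1.0, "length": 4}},
]

best = best_hypothesis(hypotheses, weights)
```

Adding a feature such as lexicalized reordering or a word-deletion penalty amounts to adding one more entry to each feature dictionary and one more weight.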
9
ISI
Franz: important things in system development
–Good engineering is important
    Scalability
    Efficiency
    No bugs in software
    Good overall system architecture
–Error analysis should drive research
    Step 1: what is the major error in the current system?
    Step 2: fix it!
    Step 3: go to step 1
10
Comparable Corpora
ISI:
–Arabic: 99M->106M; Bleu: 43.8->42.99
–Chinese: 168M->176M; Bleu: 32.05->32.85
11
Confidence Intervals
Bootstrapping
IBM’s method
–Chop the test data into 50 pieces
NIST’s method
–Sign test
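The two ideas named on this slide can be sketched as follows, under simplifying assumptions. Both functions work on per-segment scores and the bootstrap uses their mean as the metric. Bleu is a corpus-level metric, so a real evaluation would recompute it on each resample instead. The function names and default parameters are hypothetical.

```python
import random
from math import comb


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    means = []
    for _ in range(n_resamples):
        # Resample n segments with replacement and record the mean.
        sample = [scores[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def sign_test_p(scores_a, scores_b):
    """Two-sided sign test p-value for paired scores; ties are dropped."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Binomial(n, 0.5) tail probability of k or fewer successes.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-segment scores for two systems on six segments.
system_a = [0.42, 0.38, 0.51, 0.47, 0.40, 0.45]
system_b = [0.40, 0.35, 0.49, 0.41, 0.39, 0.44]

ci_low, ci_high = bootstrap_ci(system_a)
p_value = sign_test_p(system_a, system_b)
```

System A wins on all six segments here, so the sign test p-value is 2 × (1/2)^6 = 0.03125.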
12
New Players
BYU: simple transfer system
Linear B: human post-edit of MT hypotheses (HAMT)
MTM linguaSoft: based on CIMOS rule-based system
NTT: WFST-based decoder