1
Course Summary
LING 575
Fei Xia
03/06/07
2
Outline
Introduction to MT: 1
Major approaches
–SMT: 3
–Transfer-based MT: 2
–Hybrid systems: 2
Other topics
3
Introduction to MT
4
Major challenges
Translation is hard.
Getting the right words:
–Choosing the correct root form
–Getting the correct inflected form
–Inserting “spontaneous” words
Putting the words in the correct order:
–Word order: SVO vs. SOV, …
–Unique constructions:
–Divergence
5
Lexical choice
Homonymy/Polysemy: bank, run
Concept gap: no corresponding concept in the other language: go Greek, go Dutch, fen sui, lame duck, …
Coding (concept → lexeme mapping) differences:
–More distinctions in one language: e.g., kinship vocabulary.
–Different division of conceptual space:
6
Major approaches
Transfer-based
Interlingua
Example-based (EBMT)
Statistical MT (SMT)
Hybrid approach
7
The MT triangle
[Figure: the MT triangle. Labels: word, Word, Meaning (interlingua); Analysis, Synthesis; Word-based SMT, EBMT; Phrase-based SMT, EBMT; Transfer-based.]
8
Comparison of resource requirements
Columns: Transfer-based, Interlingua, EBMT, SMT
dictionary: + + +
transfer rules: +
parser: + + + (?)
semantic analyzer: +
parallel data: + +
others: universal representation, generator, thesaurus
9
Evaluation
Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
Human evaluation: accuracy, fluency, …
–Problem: expensive, slow, subjective, non-reusable.
Automatic measures:
–Edit distance
–Word error rate (WER), Position-independent WER (PER)
–Simple string accuracy (SSA), Generation string accuracy (GSA)
–BLEU
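Two of the automatic measures listed above, WER and PER, can be sketched in a few lines. This is a minimal illustration and not a reference implementation: the function names are made up for this example, the sentences are assumed to be already tokenized, and PER is computed with one common bag-of-words definition (other variants exist).

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance; each edit operation costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent error rate (assumed bag-of-words variant)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

Because PER ignores word order, a hypothesis that contains exactly the reference words in a different order gets PER 0 while its WER stays above 0.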
10
Major approaches
11
Word-based SMT
IBM Models 1-5
Main concepts:
–Source channel model
–Hidden word alignment
–EM training
12
Source channel model for MT
[Diagram: Eng sent → Noisy channel → Fr sent, annotated with P(E) and P(F | E)]
Two types of parameters:
–Language model: P(E)
–Translation model: P(F | E)
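The way the two parameter types combine at translation time can be written out with Bayes' rule. This is the standard noisy-channel decision rule rather than something stated on the slide; \hat{E} is a symbol introduced here for the chosen English sentence.

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        = \arg\max_{E} P(E)\, P(F \mid E)
```

The denominator P(F) is dropped because it does not depend on E.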
13
Modeling p(F | E) with alignment
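As a sketch of what modeling with alignment usually means for the IBM models: the alignment is treated as a hidden variable and summed out. The notation below is an assumption following the usual convention, with E = e_1 … e_l, F = f_1 … f_m, e_0 the empty (NULL) word, and a_j the position of the English word that f_j is aligned to.

```latex
P(F \mid E) = \sum_{a} P(F, a \mid E),
\qquad a = a_1 \dots a_m,\quad a_j \in \{0, 1, \dots, l\}
```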
14
Modeling
Parameters:
–Length prob: P(m | l)
–Translation prob: t(f_j | e_i)
–Distortion prob (for Model 2): d(i | j, m, l)
Model 1:
Model 2:
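For reference, the standard textbook forms of the two models can be written with the parameters listed above. This is a reconstruction under the usual assumptions (l English words plus the NULL word e_0, m foreign words) and may differ in notation from the formulas originally shown on the slide.

```latex
\text{Model 1:}\quad
P(F \mid E) = \frac{P(m \mid l)}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

\text{Model 2:}\quad
P(F \mid E) = P(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```

Model 1 is the special case of Model 2 in which the distortion probability is uniform, d(i | j, m, l) = 1/(l+1).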
15
Training Model 1:
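EM training for Model 1 can be sketched as below. This is a minimal illustration and not the course's code: `train_model1`, the `<NULL>` token, the uniform initialization, and the fixed iteration count are all assumptions made for this example, and only the translation probabilities t(f | e) are estimated.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """pairs: list of (english_tokens, foreign_tokens).
    Returns a dict-like table t[(f, e)] approximating t(f | e)."""
    null = "<NULL>"  # hypothetical token standing for the empty word e_0
    f_vocab = {f for _, fs in pairs for f in fs}
    # Uniform initialization over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        # E-step: distribute each foreign word over the English words.
        for es, fs in pairs:
            es = [null] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a toy corpus such as `[(["the", "house"], ["das", "haus"]), (["the", "book"], ["das", "buch"]), (["a", "book"], ["ein", "buch"])]`, the table should come to prefer pairing "das" with "the" over pairing "haus" with "the", even though no alignments were given.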
16
Finding the best alignment
Given E and F, we are looking for
Model 1:
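Under the standard Model 1, the most probable alignment factorizes over foreign positions, so each foreign word can simply pick the English word (or NULL) with the highest translation probability. The sketch below assumes that formulation; the function name, the `<NULL>` token, and the dictionary layout `t[(f, e)]` are hypothetical.

```python
def best_alignment_model1(es, fs, t, null="<NULL>"):
    """Return a list a where a[j] is the English position (0 = NULL)
    that maximizes t(f_j | e_i) for the j-th foreign word.
    Ties are broken in favor of the smallest position."""
    es = [null] + list(es)
    return [max(range(len(es)), key=lambda i: t.get((f, es[i]), 0.0))
            for f in fs]
```

For example, with `t = {("das", "the"): 0.9, ("haus", "house"): 0.8, ("das", "house"): 0.1, ("haus", "the"): 0.1}`, aligning `["das", "haus"]` to `["the", "house"]` links each foreign word to its counterpart. For Model 2 the same search would multiply in the distortion probability before taking the maximum.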
17
Clump-based SMT
The unit of translation is a clump.
Training stage:
–Word alignment
–Extracting clump pairs
Decoding stage:
–Try all segmentations of the src sent and all the allowed permutations
–For each src clump, try TopN tgt clumps
–Prune the hypotheses
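The first decoding step, trying all segmentations of the source sentence, can be sketched as a small recursive generator. This only illustrates the enumeration, not a decoder; `segmentations` and its optional `max_len` cap are names assumed for this example.

```python
def segmentations(words, max_len=None):
    """Yield every way to split `words` into contiguous clumps.
    Each segmentation is a list of tuples of words."""
    if not words:
        yield []
        return
    limit = len(words) if max_len is None else min(max_len, len(words))
    for k in range(1, limit + 1):
        head = tuple(words[:k])  # first clump covers the first k words
        for rest in segmentations(words[k:], max_len):
            yield [head] + rest
```

Without a length cap, a sentence of n words has 2^(n-1) segmentations, and each one can additionally be permuted, which is why the hypotheses have to be pruned.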
18
Transfer-based MT
Analysis, transfer, generation:
–Example: (Quirk et al., 2005)
1. Parse the source sentence
2. Transform the parse tree with transfer rules
3. Translate source words
4. Get the target sentence from the tree
Translation as parsing:
–Example: (Wu, 1995)
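The four numbered steps can be illustrated with a toy pipeline. This is only a sketch of the general idea, not the system of Quirk et al. (2005): the parse tree is written by hand instead of coming from a parser, the single SVO-to-SOV reordering rule is hypothetical, and the three-word lexicon is a toy English-to-Japanese mapping that ignores particles and inflection.

```python
# A tree is (label, child, child, ...); a leaf is (pos_tag, word).

def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)

def transfer(tree):
    """Step 2: apply one hypothetical rule, VP -> V NP  =>  VP -> NP V."""
    if is_leaf(tree):
        return tree
    label, *kids = tree
    kids = [transfer(k) for k in kids]
    if label == "VP" and [k[0] for k in kids] == ["V", "NP"]:
        kids = [kids[1], kids[0]]
    return (label, *kids)

def translate_words(tree, lexicon):
    """Step 3: replace each source word; unknown words are copied."""
    if is_leaf(tree):
        return (tree[0], lexicon.get(tree[1], tree[1]))
    label, *kids = tree
    return (label, *(translate_words(k, lexicon) for k in kids))

def tree_yield(tree):
    """Step 4: read the target sentence off the leaves, left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [w for k in tree[1:] for w in tree_yield(k)]

# Step 1 (parsing) is assumed to have produced this tree for "I eat apples".
source_tree = ("S",
               ("NP", ("PRP", "I")),
               ("VP", ("V", "eat"), ("NP", ("NNS", "apples"))))
toy_lexicon = {"I": "watashi", "eat": "taberu", "apples": "ringo"}

target_words = tree_yield(translate_words(transfer(source_tree), toy_lexicon))
```

Here `target_words` comes out in subject, object, verb order, since the reordering is applied to the tree before the words are replaced.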
19
Hybrid approaches
Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005)
Postprocessing with taggers, parsers, etc.: JHU 2003 workshop
Hierarchical phrase-based model: (Chiang, 2005)
…
20
Other topics
21
Other issues
Resources
–MT for low-density languages
–Using comparable corpora and Wikipedia
Special translation modules
–Identifying and translating named entities and abbreviations
–…
22
To build an MT system (1)
Gather resources
–Parallel corpora, comparable corpora
–Grammars, dictionaries, …
Process data
–Document alignment, sentence alignment
–Tokenization, parsing, …
23
To build an MT system (2)
Modeling
Training
–Word alignment and extracting clump pairs
–Learning transfer rules
Decoding
–Identifying entities and translating them with special modules (optional)
–Translation as parsing, or parse + transfer + translation
–Segmenting the src sentence, replacing each src clump with a target clump, …
24
To build an MT system (3)
Post-processing
–System combination
–Reranking
Using the system for other applications:
–Cross-lingual IR
–Computer-assisted translation
–…
25
Misc
Grades
–Assignments (hw1-hw3): 30%
–Class participation: 20%
–Project: presentation 25%, final paper 25%