Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.


1 Course Summary LING 575 Fei Xia 03/06/07

2 Outline
Introduction to MT: 1
Major approaches
–SMT: 3
–Transfer-based MT: 2
–Hybrid systems: 2
Other topics

3 Introduction to MT

4 Major challenges
Translation is hard.
Getting the right words:
–Choosing the correct root form
–Getting the correct inflected form
–Inserting “spontaneous” words
Putting the words in the correct order:
–Word order: SVO vs. SOV, …
–Unique constructions:
–Divergence

5 Lexical choice
Homonymy/Polysemy: bank, run
Concept gap: no corresponding concepts in another language: go Greek, go Dutch, fen sui, lame duck, …
Coding (Concept → lexeme mapping) differences:
–More distinctions in one language: e.g., kinship vocabulary.
–Different division of conceptual space:

6 Major approaches
Transfer-based
Interlingua
Example-based (EBMT)
Statistical MT (SMT)
Hybrid approach

7 The MT triangle
[Figure: the MT triangle, with Analysis on one side and Synthesis on the other, rising from Word to Meaning (interlingua). Labeled levels: Word-based SMT, EBMT; Phrase-based SMT, EBMT; Transfer-based.]

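8 Comparison of resource requirements
Columns: Transfer-based, Interlingua, EBMT, SMT (the column position of each + mark is not preserved in this transcript)
dictionary: + + +
Transfer rules: +
parser: + + + (?)
semantic analyzer: +
parallel data: + +
others: Universal representation, Generator, thesaurus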

9 Evaluation
Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
Human evaluation: accuracy, fluency, …
–Problem: expensive, slow, subjective, non-reusable.
Automatic measures:
–Edit distance
–Word error rate (WER), Position-independent WER (PER)
–Simple string accuracy (SSA), Generation string accuracy (GSA)
–BLEU
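
A minimal sketch of two of the automatic measures listed above, WER and PER, over whitespace-tokenized sentences. The function names are illustrative, and the PER formula is one common formulation (bag-of-words matches, normalized by reference length); other variants exist.

```python
from collections import Counter


def edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hyp, ref):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(hyp, ref) / len(ref)


def per(hyp, ref):
    """Position-independent error rate: compares bags of words, ignoring order."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)
```

A hypothesis that contains the right words in the wrong order gets a PER of 0 but a nonzero WER, which is the point of having both.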

10 Major approaches

11 Word-based SMT
IBM Models 1-5
Main concepts:
–Source channel model
–Hidden word alignment
–EM training

12 Source channel model for MT
Eng sent, P(E) → Noisy channel, P(F | E) → Fr sent
Two types of parameters:
Language model: P(E)
Translation model: P(F | E)
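
The decision rule behind this picture, written out. This is the standard Bayes-rule form; the hat notation for the chosen translation is a convention added here, not taken from the slide.

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        = \arg\max_{E} P(E)\, P(F \mid E)
```

P(F) drops out because it does not depend on E, which leaves exactly the two parameter types named above.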

13 Modeling p(F | E) with alignment
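
The slide's formula is not preserved in this transcript. A standard way to write the idea, treating the alignment as a hidden variable and summing it out (the symbols a_j and the empty word at position 0 are notation assumed here):

```latex
F = f_1 \dots f_m, \qquad E = e_1 \dots e_l, \qquad
a = a_1 \dots a_m, \quad a_j \in \{0, 1, \dots, l\}

P(F \mid E) = \sum_{a} P(F, a \mid E)
```

Here a_j = i means the French word f_j is generated by the English word e_i, and a_j = 0 means it is generated by the empty word.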

14 Modeling
Parameters:
Length prob: P(m | l)
Translation prob: t(f_j | e_i)
Distortion prob (for Model 2): d(i | j, m, l)
Model 1:
Model 2:
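
The Model 1 and Model 2 formulas are not preserved in this transcript. In terms of the parameters listed on this slide, the standard forms from the IBM models are as follows. This is a reconstruction, not the slide's own rendering; position i = 0 is the empty word.

```latex
\text{Model 1:}\quad
P(F \mid E) = \frac{P(m \mid l)}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

\text{Model 2:}\quad
P(F \mid E) = P(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```

Model 1 is the special case of Model 2 in which every distortion probability is the uniform value 1/(l+1).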

15 Training Model 1:
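
The training formulas on this slide are not preserved in the transcript. A minimal sketch of EM training for the Model 1 translation table t(f | e), assuming tokenized sentence pairs and uniform initialization; the function name, the "NULL" token, and the data layout are illustrative choices.

```python
from collections import defaultdict


def train_model1(bitext, iterations=10):
    """bitext: list of (english_tokens, french_tokens) pairs.

    Returns a dict-like table with t[(f, e)] = t(f | e).
    """
    f_vocab = {f for _, fs in bitext for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform start for every (f, e)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e generating f
        total = defaultdict(float)  # expected count of e generating anything
        # E-step: split each French word's unit count over the English words
        for es, fs in bitext:
            es = ["NULL"] + es  # position 0 is the empty word
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize the expected counts per English word
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t
```

On a tiny corpus such as "the house / la maison", "the flower / la fleur", "a flower / une fleur", the table moves toward pairing "house" with "maison", because "la" is better explained by "the".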

16 Finding the best alignment Given E and F, we are looking for Model 1:
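
The formulas on this slide are not preserved in the transcript. For Model 1 the search is easy: the alignment probability factors over French positions, so each f_j can independently pick the English position i that maximizes t(f_j | e_i). A minimal sketch, assuming a translation table stored as a dict keyed by (f, e); the table values below are made up for illustration.

```python
def best_alignment(es, fs, t):
    """Return a list a where a[j] is the position of the English word
    (0 = NULL, 1..l = words of es) that best explains fs[j] under Model 1."""
    es = ["NULL"] + es
    return [max(range(len(es)), key=lambda i: t.get((f, es[i]), 0.0))
            for f in fs]


# Hypothetical translation probabilities, not trained values.
t = {("la", "the"): 0.7, ("la", "house"): 0.1,
     ("maison", "the"): 0.05, ("maison", "house"): 0.8}
alignment = best_alignment(["the", "house"], ["la", "maison"], t)
```

With distortion probabilities (Model 2) the same per-position argmax still works, using t times d as the score.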

17 Clump-based SMT
The unit of translation is a clump.
Training stage:
–Word alignment
–Extracting clump pairs
Decoding stage:
–Try all segmentations of the src sent and all the allowed permutations
–For each src clump, try TopN tgt clumps
–Prune the hypotheses
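
A sketch of the "extracting clump pairs" step, assuming the usual consistency criterion from phrase-based SMT: a source span and a target span form a pair if no word inside either span is aligned to a word outside the other. The slide does not say which criterion the course used, so this is an assumption, and the function name and max_len limit are illustrative. Unaligned boundary words are not handled.

```python
def extract_clump_pairs(src, tgt, alignment, max_len=3):
    """src, tgt: token lists. alignment: set of (src_index, tgt_index) links,
    0-based. Returns a list of (src_clump, tgt_clump) strings."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start >= max_len:
                continue
            # Consistency check: a target word inside the span must not
            # be linked to a source word outside the source span
            if any(t_start <= t <= t_end and not (s_start <= s <= s_end)
                   for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[s_start:s_end + 1]),
                          " ".join(tgt[t_start:t_end + 1])))
    return pairs


src = ["the", "green", "house"]
tgt = ["la", "maison", "verte"]
links = {(0, 0), (1, 2), (2, 1)}
clumps = extract_clump_pairs(src, tgt, links)
```

In this example "green house / maison verte" is extracted, while "the green" yields no pair because its target span would include "maison", which is aligned to "house".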

18 Transfer-based MT
Analysis, transfer, generation:
–Example: (Quirk et al., 2005)
1. Parse the source sentence
2. Transform the parse tree with transfer rules
3. Translate source words
4. Get the target sentence from the tree
Translation as parsing:
–Example: (Wu, 1995)
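
A toy sketch of steps 2 to 4 above, assuming step 1 (parsing) has already produced a tree. It is not the method of the cited papers: the tree encoding, the single SVO-to-SOV reordering rule, and the three-word lexicon (romanized Japanese, particles ignored) are all made up for illustration.

```python
def is_leaf(node):
    """A leaf is (pos_tag, word); an inner node is (label, child, ...)."""
    return len(node) == 2 and isinstance(node[1], str)


def transfer(node, rules):
    """Step 2: reorder the children of every node that matches a rule."""
    if is_leaf(node):
        return node
    label, *children = node
    children = [transfer(c, rules) for c in children]
    order = rules.get((label, tuple(c[0] for c in children)))
    if order is not None:
        children = [children[i] for i in order]
    return (label, *children)


def translate_leaves(node, lexicon):
    """Step 3: replace each source word using a word-for-word lexicon."""
    if is_leaf(node):
        pos, word = node
        return (pos, lexicon.get(word, word))
    label, *children = node
    return (label, *[translate_leaves(c, lexicon) for c in children])


def yield_of(node):
    """Step 4: read the target sentence off the tree, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in yield_of(child)]


# Hypothetical parse, rule, and lexicon.
tree = ("S", ("NP", ("N", "John")),
             ("VP", ("V", "eats"), ("NP", ("N", "apples"))))
rules = {("VP", ("V", "NP")): [1, 0]}  # verb-object becomes object-verb
lexicon = {"John": "Jon", "eats": "taberu", "apples": "ringo"}

target = yield_of(translate_leaves(transfer(tree, rules), lexicon))
```

Here target is the word list "Jon ringo taberu": the VP rule moves the object in front of the verb before the words are translated.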

19 Hybrid approaches
Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005)
Postprocessing with taggers, parsers, etc.: JHU 2003 workshop
Hierarchical phrase-based model: (Chiang, 2005)
…

20 Other topics

21 Other issues
Resources
–MT for low-density languages
–Using comparable corpora and Wikipedia
Special translation modules
–Identifying and translating named entities and abbreviations
–…

22 To build an MT system (1)
Gather resources
–Parallel corpora, comparable corpora
–Grammars, dictionaries, …
Process data
–Document alignment, sentence alignment
–Tokenization, parsing, …

23 To build an MT system (2)
Modeling
Training
–Word alignment and extracting clump pairs
–Learning transfer rules
Decoding
–Identifying entities and translating them with special modules (optional)
–Translation as parsing, or parse + transfer + translation
–Segmenting the src sentence, replacing each src clump with a target clump, …

24 To build an MT system (3)
Post-processing
–System combination
–Reranking
Using the system for other applications:
–Cross-lingual IR
–Computer-assisted translation
–…

25 Misc
Grades
–Assignments (hw1-hw3): 30%
–Class participation: 20%
–Project: presentation 25%, final paper 25%

