Course Summary
LING 575
Fei Xia
03/06/07

Outline
Introduction to MT: 1
Major approaches
–SMT: 3
–Transfer-based MT: 2
–Hybrid systems: 2
Other topics


Introduction to MT

Major challenges
Translation is hard.
Getting the right words:
–Choosing the correct root form
–Getting the correct inflected form
–Inserting “spontaneous” words
Putting the words in the correct order:
–Word order: SVO vs. SOV, …
–Unique constructions:
–Divergence

Lexical choice
Homonymy/Polysemy: bank, run
Concept gap: no corresponding concept in the other language: go Greek, go Dutch, fen sui, lame duck, …
Coding (concept → lexeme mapping) differences:
–More distinctions in one language: e.g., kinship vocabulary.
–Different division of conceptual space:

Major approaches
Transfer-based
Interlingua
Example-based (EBMT)
Statistical MT (SMT)
Hybrid approach

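The MT triangle
[Figure: the MT triangle. Analysis runs up the source side and synthesis down the target side, from words at the base to meaning (interlingua) at the apex. Approaches are placed by depth of analysis: word-based SMT and EBMT at the base, then phrase-based SMT and EBMT, then transfer-based MT.]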

Comparison of resource requirements
(columns: Transfer-based, Interlingua, EBMT, SMT)
dictionary: + + +
transfer rules: +
parser: + + + (?)
semantic analyzer: +
parallel data: + +
others: universal representation, generator, thesaurus

Evaluation
Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
Human evaluation: accuracy, fluency, …
–Problem: expensive, slow, subjective, non-reusable.
Automatic measures:
–Edit distance
–Word error rate (WER), Position-independent WER (PER)
–Simple string accuracy (SSA), Generation string accuracy (GSA)
–BLEU
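As an illustration of the edit-distance family of measures, here is a minimal sketch of word error rate, assuming the common definition (word-level Levenshtein distance divided by the reference length) and whitespace tokenization. The function names are illustrative and not taken from the course.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of a hypothesis against a single reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions are counted, WER under this definition can exceed 1 when the hypothesis is much longer than the reference. With a single reference it also penalizes legitimate reorderings, which is one motivation for position-independent variants such as PER.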

Major approaches

Word-based SMT
IBM Models 1-5
Main concepts:
–Source channel model
–Hidden word alignment
–EM training

Source channel model for MT
[Figure: Eng sent, generated with probability P(E), passes through a noisy channel with probability P(F | E) and comes out as Fr sent.]
Two types of parameters:
Language model: P(E)
Translation model: P(F | E)
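Decoding under the source channel model can be written as follows. This is the usual Bayes-rule derivation rather than something preserved from the slide; the hat notation for the chosen English sentence is an added convention.

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        = \arg\max_{E} P(E)\, P(F \mid E)
```

P(F) is dropped in the last step because it does not depend on E.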

Modeling p(F | E) with alignment
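One way to write the idea in this title, assuming the usual hidden-alignment formulation of the IBM models: the alignment a = (a_1, …, a_m) assigns each French position j an English position a_j in {0, …, l}, with 0 standing for the empty word, and is summed out.

```latex
P(F \mid E) = \sum_{a} P(F, a \mid E),
\qquad a = (a_1, \dots, a_m), \quad a_j \in \{0, 1, \dots, l\}
```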

Modeling
Parameters:
Length prob: P(m | l)
Translation prob: t(f_j | e_i)
Distortion prob (for Model 2): d(i | j, m, l)
Model 1:
Model 2:
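The equations for the two models are not preserved in this transcript. A reconstruction in the standard IBM-model form, using only the parameters listed on this slide (l is the English length, m the French length, and e_0 the empty word), would be:

```latex
% Model 1: every alignment is equally likely, giving the factor 1/(l+1)^m
P(F \mid E) = \frac{P(m \mid l)}{(l+1)^{m}}
              \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

% Model 2: the uniform alignment term is replaced by the distortion probability
P(F \mid E) = P(m \mid l)
              \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```

Model 1 is the special case of Model 2 in which d(i | j, m, l) = 1/(l+1) for every i.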

Training Model 1:
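The training equations are not preserved here. As a rough sketch of EM for Model 1 under the standard formulation, the code below estimates t(f | e) from sentence pairs. The function name, the `<null>` token for the empty word e_0, and the uniform initialization are assumptions made for illustration, not details from the course.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical token standing for the empty English word e_0


def train_model1(pairs, iterations=10):
    """Estimate t(f | e) with EM.

    pairs: list of (english_tokens, french_tokens).
    Returns a dict-like table indexed as t[(f, e)].
    """
    f_vocab = {f for _, fr in pairs for f in fr}
    # Uniform initialization of t(f | e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links from e
        for en, fr in pairs:
            en = [NULL] + en
            for f in fr:
                # E-step: posterior over which English word generated f.
                z = sum(t[(f, e)] for e in en)
                for e in en:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize the expected counts for each English word.
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

For example, training on the toy pairs ("the house", "das Haus"), ("the book", "das Buch"), ("a book", "ein Buch") should shift probability mass for "the" toward "das", since "das" is the only word that co-occurs with it in both sentences.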

Finding the best alignment Given E and F, we are looking for Model 1:
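Under Model 1 the search decomposes: the length and uniform alignment terms are constant, so each French word can be aligned independently to the English word that maximizes t(f_j | e_i). A minimal sketch, assuming a translation table stored as a plain dict keyed by (f, e) and a hypothetical `<null>` token for e_0:

```python
NULL = "<null>"  # hypothetical token standing for the empty English word e_0


def best_alignment(en, fr, t):
    """Return a list a where a[j] is the position in [NULL] + en chosen for fr[j].

    Position 0 means the French word is aligned to the empty word.
    Pairs missing from t are treated as probability 0; ties go to the
    lowest position.
    """
    en = [NULL] + en
    return [max(range(len(en)), key=lambda i: t.get((f, en[i]), 0.0))
            for f in fr]
```

This shortcut does not carry over to Models 3-5, where fertility makes the alignment decisions interdependent.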

Clump-based SMT
The unit of translation is a clump.
Training stage:
–Word alignment
–Extracting clump pairs
Decoding stage:
–Try all segmentations of the src sent and all the allowed permutations
–For each src clump, try TopN tgt clumps
–Prune the hypotheses
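The slide does not say how clump pairs are extracted from the word alignment. One plausible reading, assumed here, is the consistency criterion familiar from phrase-based SMT: a source span and a target span form a pair if no alignment link leaves the box they define. The sketch below is simplified (it does not extend spans over unaligned words), and the function name and `max_len` parameter are illustrative.

```python
def extract_clump_pairs(src, tgt, links, max_len=3):
    """Extract clump pairs consistent with a word alignment.

    src, tgt: token lists.
    links: set of (i, j) pairs meaning src[i] is aligned to tgt[j].
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no word in the target span links outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in links):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]),
                          " ".join(tgt[j1:j2 + 1])))
    return pairs
```

For "the red house" / "la maison rouge" with links the–la, red–rouge, house–maison, the span "the red" is rejected because "maison" falls inside its target span but links to "house", while "red house" / "maison rouge" is accepted.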

Transfer-based MT
Analysis, transfer, generation:
–Example: (Quirk et al., 2005)
1. Parse the source sentence
2. Transform the parse tree with transfer rules
3. Translate source words
4. Get the target sentence from the tree
Translation as parsing:
–Example: (Wu, 1995)

Hybrid approaches
Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005)
Postprocessing with taggers, parsers, etc.: JHU 2003 workshop
Hierarchical phrase-based model: (Chiang, 2005)
…
