
1 Machine Translation: Challenges and Approaches Nizar Habash Post-doctoral Fellow Center for Computational Learning Systems Columbia University Invited Lecture CS 4705: Introduction to Natural Language Processing Fall 2004

2 Sounds like Faulkner? http://www.ee.ucla.edu/~simkin/sounds_like_faulkner.html
Which of these passages is Faulkner, and which is machine translation?
– "It lay on the table a candle burning at each corner upon the envelope tied in a soiled pink garter two artificial flowers."  Faulkner (William Faulkner, "The Sound and the Fury")
– "Not hit a man in glasses."  Machine Translation
– "It was once a shade, which was in all beautiful weather under a tree and varied like the branches in the wind."  Machine Translation: Systran's rendering of Helmut Wördemann, "Der unzufriedene Schatten" ("Es war einmal ein Schatten, der lag bei jedem schönen Wetter unter einem Baum und schwankte wie die Zweige im Wind." — "Once upon a time there was a shadow that lay under a tree in every kind of fine weather and swayed like the branches in the wind.")

3 Progress in MT Statistical MT example From a talk by Charles Wayne, DARPA
2002 system output: insistent Wednesday may recurred her trips to Libya tomorrow for flying Cairo 6-4 ( AFP ) - an official announced today in the Egyptian lines company for flying Tuesday is a company " insistent for flying " may resumed a consideration of a day Wednesday tomorrow her trips to Libya of Security Council decision trace international the imposed ban comment.
2003 system output: Egyptair Has Tomorrow to Resume Its Flights to Libya Cairo 4-6 (AFP) - said an official at the Egyptian Aviation Company today that the company egyptair may resume as of tomorrow, Wednesday its flights to Libya after the International Security Council resolution to the suspension of the embargo imposed on Libya.
Human translation: Egypt Air May Resume its Flights to Libya Tomorrow Cairo, April 6 (AFP) - An Egypt Air official announced, on Tuesday, that Egypt Air will resume its flights to Libya as of tomorrow, Wednesday, after the UN Security Council had announced the suspension of the embargo imposed on Libya.

4 Road Map Why Machine Translation (MT)? Multilingual Challenges for MT MT Approaches MT Evaluation

5 Why (Machine) Translation?
Languages in the world: 6,800 living languages; 600 with a written tradition; 95% of the world population speaks 100 languages.
Translation market: $8 billion global market, doubling every five years (Donald Barabé, invited talk, MT Summit 2003).

6 Why Machine Translation?
Full translation
– Domain specific: weather reports
Machine-aided translation
– Translation dictionaries
– Translation memories
– Requires post-editing
Cross-lingual NLP applications
– Cross-language IR
– Cross-language summarization

7 Road Map Why Machine Translation (MT)? Multilingual Challenges for MT –Orthographic variations –Lexical ambiguity –Morphological variations –Translation divergences MT Approaches MT Evaluation

8 Multilingual Challenges
Orthographic variations
– Ambiguous spelling: كتب الاولاد اشعارا vs. كَتَبَ الأوْلادُ اشعَاراً
– Ambiguous word boundaries
Lexical ambiguity
– Bank → بنك (financial) vs. ضفة (river)
– Eat → essen (human) vs. fressen (animal)

9 Multilingual Challenges Morphological Variations
Affixation vs. root+pattern:
– write → written / كتب → مكتوب
– kill → killed / قتل → مقتول
– do → done / فعل → مفعول
Tokenization:
– And the cars → والسيارات → w+ Al+ SyArAt (conjunction + article + plural noun)
– Et les voitures → et le voitures
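A minimal sketch of that tokenization step in Python, over Buckwalter-transliterated Arabic. The two proclitic rules and the length guards are illustrative assumptions made here for the single example above; real tokenizers disambiguate clitics in context.

    # Toy clitic segmentation for Buckwalter-transliterated Arabic.
    PROCLITICS = ["w", "f"]   # conjunctions wa- "and", fa- "so/then"
    ARTICLE = "Al"            # definite article

    def segment(token):
        parts = []
        for p in PROCLITICS:
            # only strip a proclitic if a plausible stem remains
            if token.startswith(p) and len(token) > len(p) + 2:
                parts.append(p + "+")
                token = token[len(p):]
                break
        if token.startswith(ARTICLE) and len(token) > len(ARTICLE) + 2:
            parts.append(ARTICLE + "+")
            token = token[len(ARTICLE):]
        parts.append(token)
        return parts

    print(segment("wAlSyArAt"))  # ['w+', 'Al+', 'SyArAt'] = "and the cars"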

10 Multilingual Challenges Translation Divergences
How languages map semantics to syntax; found in 35% of sentences in the TREC El Norte corpus (Dorr et al 2002).
Divergence types:
– Categorial (X tener hambre → X be hungry) [98%]
– Conflational (X dar puñaladas a Z → X stab Z) [83%]
– Structural (X entrar en Y → X enter Y) [35%]
– Head swapping (X cruzar Y nadando → X swim across Y) [8%]
– Thematic (X gustar a Y → Y like X) [6%]

11 Translation Divergences: Conflation
English: "I am not here" — be / I / here / not as separate words.
Arabic: لست هنا (lastu huna), "I-am-not here" — the verb "be" (ليس), the subject (انا) and the negation conflate into the single word لست.
French: "Je ne suis pas ici", "I not be not here" — être / je / ici / ne…pas.
[slide shows the dependency tree of each sentence]

12 Translation Divergences: Categorial, Thematic and Structural
English: "I am cold" — be / I / cold.
Spanish: "tengo frío", "I-have cold" — tener / yo / frío: the adjective maps to a noun.
Hebrew: קר לי, "cold for-me" — קר / אני: the experiencer surfaces as a dative.
Arabic: انا بردان, "I cold" — no copula.
[slide shows the corresponding dependency trees]

13 Translation Divergences: Head Swap and Categorial
English: "I swam across the river quickly" — swim / I / quickly / across / river.
Arabic: اسرعت عبور النهر سباحة, "I-sped crossing the-river swimming" — اسرع / انا / سباحة / عبور / نهر: the speed adverb is promoted to the main verb and "swim" demotes to a noun.

14 Translation Divergences: Head Swap and Categorial
English: "I swam across the river quickly" — swim / I / quickly / across / river.
Hebrew: חציתי את הנהר בשחיה במהירות, "I-crossed obj the-river in-swimming speedily" — חצה / אני / את / נהר / ב+שחיה / מהירות: "across" is promoted to the main verb and "swim" demotes to a noun.

15 Translation Divergences: Head Swap and Categorial
[slide contrasts the parts of speech across the three analyses: English swim (verb), quickly (adverb), across (prep), river (noun); Arabic اسرع (verb), سباحة (noun), عبور (noun), نهر (noun); Hebrew חצה (verb), שחיה (noun), מהירות (noun)]

16 Translation Divergences: Orthography + Morphology + Syntax
Chinese: 妈妈的车 (mama de che), "mom possessed-by car" → mom's car
Arabic: سيارة ماما (sayyArat mama), "car mom"
French: la voiture de maman, "the car of mom"

17 Road Map Why Machine Translation (MT)? Multilingual Challenges for MT MT Approaches –Gisting / Transfer / Interlingua –Statistical / Symbolic / Hybrid –Practical Considerations MT Evaluation

18 MT Approaches MT Pyramid
[pyramid diagram: source word → source syntax → source meaning up the analysis side; target meaning → target syntax → target word down the generation side; the word-to-word path at the base is gisting]

19 MT Approaches Gisting Example
Source: Sobre la base de dichas experiencias se estableció en 1988 una metodología.
Gist: Envelope her basis out speak experiences them settle at 1988 one methodology.
Human: On the basis of these experiences, a methodology was arrived at in 1988.
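Gisting is essentially word-for-word dictionary lookup in source order, which is why the output above reads as it does. A minimal sketch, with a made-up Spanish–English glossary that commits to one translation per word:

    # Word-for-word "gisting": substitute each source word with a single
    # dictionary translation, keeping source word order (toy glossary).
    GLOSS = {"sobre": "on", "la": "the", "base": "basis", "de": "of",
             "dichas": "said", "experiencias": "experiences", "se": "itself",
             "estableció": "established", "en": "in", "una": "one",
             "metodología": "methodology"}

    def gist(sentence):
        # unknown words (e.g. numbers) pass through; punctuation is dropped
        return " ".join(GLOSS.get(w.strip(".,").lower(), w)
                        for w in sentence.split())

    print(gist("Sobre la base de dichas experiencias "
               "se estableció en 1988 una metodología."))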

20 MT Approaches MT Pyramid
[pyramid diagram as before, now with two paths marked: gisting at the word level and transfer at the syntax level]

21 MT Approaches Transfer Example
Transfer lexicon: map SL structure to TL structure.
poner (:subj X, :obj mantequilla, :mod en (:obj Y)) ⇔ butter (:subj X, :obj Y)
X puso mantequilla en Y → X buttered Y
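A transfer lexicon entry like the one above can be read as a tree-to-tree rewrite. A minimal sketch, assuming a toy (head, {role: child}) tree encoding invented here for illustration; real transfer systems pattern-match full parse trees.

    # One transfer rule: poner(X, mantequilla, en Y) -> butter(X, Y).
    def transfer(tree):
        head, args = tree
        if (head == "poner" and args.get("obj") == "mantequilla"
                and "mod" in args):
            _, y = args["mod"]                  # the Y under "en Y"
            return ("butter", {"subj": args["subj"], "obj": y})
        return tree                             # no rule matched

    spanish = ("poner", {"subj": "X", "obj": "mantequilla",
                         "mod": ("en", "Y")})
    print(transfer(spanish))  # ('butter', {'subj': 'X', 'obj': 'Y'})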

22 MT Approaches MT Pyramid
[pyramid diagram with all three paths marked: gisting (word level), transfer (syntax level), interlingua (meaning level)]

23 MT Approaches Interlingua Example: Lexical Conceptual Structure (Dorr, 1993)

24 MT Approaches MT Pyramid
[pyramid diagram showing the three paths together: interlingua, transfer, gisting]

25 MT Approaches MT Pyramid
[pyramid diagram annotated with the resources each level needs: dictionaries/parallel corpora at the word level, transfer lexicons at the syntax level, interlingual lexicons at the meaning level]

26 MT Approaches Statistical vs. Symbolic
[pyramid diagram: source word → source syntax → source meaning; target meaning → target syntax → target word; analysis and generation sides]

27 MT Approaches Noisy Channel Model Portions from http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
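The noisy channel model picks the target sentence e maximizing P(e | f) ∝ P(e) · P(f | e): a language model scoring fluency times a translation model scoring adequacy. A minimal sketch with hand-set toy probabilities (the numbers and candidate sentences are invented for illustration):

    import math

    def language_model(e):        # P(e): how fluent is the candidate?
        lm = {"I am not here": 0.02, "I not here am": 0.0001}
        return lm.get(e, 1e-9)

    def translation_model(f, e):  # P(f|e): how well does e explain f?
        tm = {("je ne suis pas ici", "I am not here"): 0.3,
              ("je ne suis pas ici", "I not here am"): 0.3}
        return tm.get((f, e), 1e-9)

    def decode(f, candidates):
        # argmax_e  log P(e) + log P(f|e)
        return max(candidates,
                   key=lambda e: math.log(language_model(e))
                                 + math.log(translation_model(f, e)))

    print(decode("je ne suis pas ici",
                 ["I am not here", "I not here am"]))
    # -> "I am not here": the language model breaks the tie the TM leaves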

28 MT Approaches IBM Model (Word-based Model) http://www.clsp.jhu.edu/ws03/preworkshop/lecture_yamada.pdf
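The heart of the IBM word-based models is estimating word-translation probabilities t(f|e) from sentence-aligned text with EM. A minimal IBM Model 1 sketch on a two-sentence toy corpus (the NULL word and alignment distortion of the higher models are omitted for brevity):

    from collections import defaultdict

    corpus = [("la maison".split(), "the house".split()),
              ("la fleur".split(), "the flower".split())]

    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform t(f|e) start

    for _ in range(20):
        count, total = defaultdict(float), defaultdict(float)
        for fs, es in corpus:
            for f in fs:
                # E-step: split f's count across English words by current t
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():           # M-step: renormalize
            t[(f, e)] = c / total[e]

    print(round(t[("maison", "house")], 3))  # climbs toward 1.0

The co-occurrence of "la" with both sentences lets EM pin "la" to "the", which in turn forces "maison" onto "house"; the same estimation, run at scale, doubles as the dictionary extraction mentioned later under resource poverty.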

29 MT Approaches Statistical vs. Symbolic vs. Hybrid
[pyramid diagram as on the previous slides]

30 MT Approaches Statistical vs. Symbolic vs. Hybrid
[pyramid diagram as on the previous slides]

31 MT Approaches Hybrid Example: GHMT
Generation-Heavy Hybrid Machine Translation: lexical transfer but NO structural transfer.
Source: Maria puso la mantequilla en el pan.
Lexical fan-out: puso → {lay, locate, place, put, render, set, stand}; mantequilla → {butter, bilberry}; en → {on, in, into, at}; pan → {bread, loaf}
[slide shows the source dependency tree (:subj, :obj, :mod) with each node expanded to its candidate translations]

32 MT Approaches Hybrid Example: GHMT
LCS-driven expansion, conflation example:
PUT_V (Agent: MARIA, Theme: BUTTER_N, Goal: BREAD) ⇔ BUTTER_V (Agent: MARIA, Goal: BREAD) — a [CAUSE GO] structure
Categorial variation: the noun "butter" is promoted to the verb "butter".

33 MT Approaches Hybrid Example: GHMT
Structural overgeneration:
– put Maria butter on bread
– lay Maria butter at loaf
– render Maria butter into loaf
– butter Maria bread
– Maria butter …

34 MT Approaches Hybrid Example: GHMT Target Statistical Resources
– Structural N-gram Model: long-distance, over lexemes (John buy car a red)
– Surface N-gram Model: local, over surface forms (John bought a red car)
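The surface model here is an ordinary n-gram language model over word forms. A minimal bigram sketch with add-alpha smoothing (the corpus and smoothing constant are made up); a structural n-gram model would instead walk dependency links between lexemes, capturing the long-distance relations:

    import math
    from collections import Counter

    corpus = ["john bought a red car", "john bought a car",
              "mary bought a red hat"]
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] + sent.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))

    def logprob(sentence, alpha=0.1):
        words = ["<s>"] + sentence.split()
        V = len(unigrams)
        # add-alpha smoothing: unseen bigrams are penalized, not zeroed
        return sum(math.log((bigrams[(a, b)] + alpha)
                            / (unigrams[a] + alpha * V))
                   for a, b in zip(words, words[1:]))

    print(logprob("john bought a red car"))  # less negative: fluent
    print(logprob("john car red a bought"))  # more negative: disfluent

Scoring candidate linearizations this way is exactly what produces the ranked list on the next slide.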

35 MT Approaches Hybrid Example: GHMT Linearization & Ranking
Maria buttered the bread -47.0841
Maria butters the bread -47.2994
Maria breaded the butter -48.7334
Maria breads the butter -48.835
Maria buttered the loaf -51.3784
Maria butters the loaf -51.5937
Maria put the butter on bread -54.128

36 MT Approaches Practical Considerations
Resource availability
– Parsers and generators: input/output compatibility
– Translation lexicons: word-based vs. transfer/interlingua
– Parallel corpora: domain of interest; bigger is better
Time availability
– Statistical training, resource building

37 MT Approaches Resource Poverty
No parser? No translation dictionary? Use a parallel corpus:
– Align with a resource-rich language
– Extract a dictionary
– Parse the rich side and infer parses for the poor side
– Build a statistical parser from the inferred parses

38 Road Map Why Machine Translation (MT)? Multilingual Challenges for MT MT Approaches MT Evaluation

39 MT Evaluation More art than science.
Wide range of metrics/techniques: interface, …, scalability, …, faithfulness, … space/time complexity, … etc.
Automatic vs. human-based: dumb machines vs. slow humans.

40 MT Evaluation Metrics System-based Metrics
Count internal resources: size of lexicon, number of grammar rules, etc.
– easy to measure
– not comparable across systems
– not necessarily related to utility (Church and Hovy 1993)

41 MT Evaluation Metrics Text-based Metrics
– Sentence-based metrics: quality (accuracy, fluency, coherence, etc.), rated on scales from 3-point to 100-point
– Comprehensibility metrics: comprehension, informativeness, x-point scales, questionnaires
– most related to utility, but hard to measure

42 MT Evaluation Metrics Text-based Metrics (cont'd)
– Amount of post-editing: number of keystrokes per page; not necessarily related to utility
Cost-based Metrics
– Cost per page
– Time per page

43 Human-based Evaluation Example Accuracy Criteria

44 Human-based Evaluation Example Fluency Criteria

45 Fluency vs. Accuracy
[chart plotting MT use cases on accuracy and fluency axes: FAHQ MT (fully automatic high quality), Prof. MT, Info. MT, conMT]

46 Automatic Evaluation Example Bleu Metric
Bleu – BiLingual Evaluation Understudy (Papineni et al 2001)
– Modified n-gram precision with length penalty
– Quick, inexpensive and language independent
– Correlates highly with human evaluation
– Bias against synonyms and inflectional variations

47 Automatic Evaluation Example Bleu Metric
Test sentence: colorless green ideas sleep furiously
Gold standard references:
– all dull jade ideas sleep irately
– drab emerald concepts sleep furiously
– colorless immature thoughts nap angrily

48 Automatic Evaluation Example Bleu Metric
Test sentence: colorless green ideas sleep furiously
Gold standard references:
– all dull jade ideas sleep irately
– drab emerald concepts sleep furiously
– colorless immature thoughts nap angrily
Unigram precision = 4/5

49 Automatic Evaluation Example Bleu Metric
Test sentence: colorless green ideas sleep furiously
Gold standard references:
– all dull jade ideas sleep irately
– drab emerald concepts sleep furiously
– colorless immature thoughts nap angrily
Unigram precision = 4/5 = 0.8
Bigram precision = 2/4 = 0.5
Bleu score = (a_1 × a_2 × … × a_n)^(1/n) = (0.8 × 0.5)^(1/2) = 0.6325 → 63.25
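A short sketch reproducing the numbers on this slide. The brevity penalty is 1 here (the candidate is no shorter than the closest reference), so it is omitted:

    import math
    from collections import Counter

    def ngrams(words, n):
        return Counter(zip(*[words[i:] for i in range(n)]))

    def modified_precision(cand, refs, n):
        counts = ngrams(cand, n)
        # clip each n-gram by its maximum count in any single reference
        clipped = {g: min(c, max(ngrams(r, n)[g] for r in refs))
                   for g, c in counts.items()}
        return sum(clipped.values()) / sum(counts.values())

    cand = "colorless green ideas sleep furiously".split()
    refs = ["all dull jade ideas sleep irately".split(),
            "drab emerald concepts sleep furiously".split(),
            "colorless immature thoughts nap angrily".split()]

    p1 = modified_precision(cand, refs, 1)   # 4/5 = 0.8
    p2 = modified_precision(cand, refs, 2)   # 2/4 = 0.5
    print(round(math.sqrt(p1 * p2), 4))      # 0.6325: geometric mean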

