New Directions in Machine Translation: Introduction. 陳惠群, Academia Sinica (Institute of Linguistics / Institute of Information Science).


1 New Directions in Machine Translation: Introduction
陳惠群, Academia Sinica (Institute of Linguistics / Institute of Information Science)

2 10/22/1998 Why MT Matters
Economics
  –Costs / Quality / Turnaround
  –Many MT developers, customers, and sponsors have invested heavily for years.
Politics
  –Multilingual countries / minority languages
Intelligence Gathering
  –Governments / companies / individuals
Research
  –AI / CS / linguistics / psychology / and so on

3 Recent Trends
PC-based MT Systems
Online MT Services, MT on Demand
  –Email, Web pages, Uploads
Sub-language MT Systems
Dialog-based (Speech-to-Speech) MT Systems
Computer-Assisted Translation

4 Classifying MT Systems
Operations
  Fully-Automatic MT
  Semi-automatic MT
  Computer-Assisted Translation (CAT-Tools)
Input
  Unrestricted Texts
  Restricted Texts (e.g. Technical Manuals) / MT in mind
  Sub-languages / Controlled languages
Quality
  High / Low / Acceptable / Applicable / Readable
  How to evaluate an MT system?
Strategies (see next page)

5 MT Strategies
Fundamentals
  Direct Translation MT
  Transfer-based MT
  Interlingua MT
Linguists vs. Empiricists
New Strategies
  Knowledge-based MT
  Example-based MT
  Statistics-based MT
  Hybrid MT
    –Japanese manufacturers know well that a single linguistic theory cannot lead to a good MT system. They realize that a huge amount of language phenomena must be processed in an ad-hoc manner. (M. Nagao)

6 Direct MT
[Diagram: Source Text to Target Text, through the following steps]
  Simple syntactic analysis (disambiguation)
  Bilingual lexicon (word-by-word translation)
  Re-ordering rules
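The direct strategy on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the Spanish-English lexicon, the tag names, and the single re-ordering rule (noun-adjective becomes adjective-noun) are hypothetical, and the syntactic analysis step is reduced to a tag stored in the lexicon.

```python
# Toy direct MT: bilingual lexicon lookup followed by one local re-ordering rule.
# The lexicon, tags, and rule are hypothetical and not taken from any real system.

LEXICON = {  # Spanish word -> (English word, part-of-speech tag)
    "el": ("the", "DET"),
    "gato": ("cat", "NOUN"),
    "negro": ("black", "ADJ"),
    "duerme": ("sleeps", "VERB"),
}


def translate_direct(sentence: str) -> str:
    # Word-by-word translation; unknown words pass through unchanged.
    words = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Re-ordering rule: NOUN ADJ -> ADJ NOUN.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "NOUN" and words[i + 1][1] == "ADJ":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return " ".join(word for word, _ in words)


print(translate_direct("el gato negro duerme"))  # the black cat sleeps
```

The sketch shows why direct systems are cheap to build and also why they are brittle: every structural difference between the two languages needs its own ad-hoc rule.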

7 Transfer-based MT
[Diagram: Source Text (ST) → ST analysis → ST Structure → structure transfer → TT Structure → TT generation → Target Text (TT)]
  ST analysis: SL grammar & lexicon
  structure transfer: SL-TL lexicon & transfer rules
  TT generation: TL grammar & lexicon
SL - source language; TL - target language
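The three stages of the transfer architecture can be sketched as three small functions. This is only an illustration under stated assumptions: the toy SL grammar covers a single Spanish noun-phrase pattern (DET NOUN ADJ*), and the lexicons and function names are hypothetical.

```python
# Toy transfer-based MT for Spanish noun phrases of the form DET NOUN ADJ*.
# All lexicon entries and the grammar coverage are hypothetical.

SL_LEXICON = {"el": "DET", "gato": "NOUN", "negro": "ADJ"}          # SL lexicon
TRANSFER_LEXICON = {"el": "the", "gato": "cat", "negro": "black"}   # SL-TL lexicon


def analyze(text: str) -> dict:
    """ST analysis: use the SL grammar and lexicon to build an ST structure."""
    tokens = text.lower().split()
    tags = [SL_LEXICON[t] for t in tokens]
    if tags[:2] != ["DET", "NOUN"] or any(t != "ADJ" for t in tags[2:]):
        raise ValueError(f"not covered by the toy SL grammar: {text!r}")
    return {"det": tokens[0], "head": tokens[1], "mods": tokens[2:]}


def transfer(st: dict) -> dict:
    """Structure transfer: map SL lexical items to TL ones, keeping the structure."""
    return {
        "det": TRANSFER_LEXICON[st["det"]],
        "head": TRANSFER_LEXICON[st["head"]],
        "mods": [TRANSFER_LEXICON[m] for m in st["mods"]],
    }


def generate(tt: dict) -> str:
    """TT generation: the TL grammar places modifiers before the head noun."""
    return " ".join([tt["det"], *tt["mods"], tt["head"]])


def translate(text: str) -> str:
    return generate(transfer(analyze(text)))


print(translate("el gato negro"))  # the black cat
```

Unlike the direct approach, word order is never patched after the fact here: it falls out of generating from the target-side structure.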

8 Interlingua-based MT
[Diagram: Source Text (ST) → ST analysis → Interlingua representation (+SL-TL lexicon) → TT generation → Target Text (TT)]
  ST analysis: SL grammar & lexicon
  TT generation: TL grammar & lexicon

9 Knowledge-based MT
All world knowledge? A long-term research goal
Practical Systems: e.g. CMU’s KANT
  –narrow domain
  –domain model: defines all semantic classes and instances to represent all concepts in the domain
  –each concept definition includes:
      concept head (name of the concept)
      slots: allowable semantic roles
      fillers: allowable concept classes that the roles can contain
  –disambiguation by filler restriction
  –knowledge acquisition: automatic or semi-automatic
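Disambiguation by filler restriction can be sketched with a tiny domain model. This is not KANT's actual representation: the concept heads, the class hierarchy, and the function names below are all hypothetical, chosen only to show how slot restrictions rule out word senses.

```python
# Hypothetical mini domain model illustrating concept heads, slots, and fillers.

ISA = {"battery": "device", "customer": "human"}  # instance or class -> parent class

CONCEPTS = {
    # concept head -> {slot (semantic role): allowed filler class}
    "*supply-power": {"theme": "device"},
    "*bill": {"theme": "human"},
}

SENSES = {"charge": ["*supply-power", "*bill"]}  # ambiguous word -> candidate concepts


def is_a(cls, ancestor):
    """Walk up the class hierarchy looking for `ancestor`."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ISA.get(cls)
    return False


def disambiguate(word, fillers):
    """Keep the concepts whose slot restrictions accept every observed filler."""
    return [
        head
        for head in SENSES[word]
        if all(
            slot in CONCEPTS[head] and is_a(filler, CONCEPTS[head][slot])
            for slot, filler in fillers.items()
        )
    ]


print(disambiguate("charge", {"theme": "battery"}))   # ['*supply-power']
print(disambiguate("charge", {"theme": "customer"}))  # ['*bill']
```

The approach only works when every concept in the domain has been modeled, which is why the slide stresses a narrow domain.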

10 Example-based MT
A companion module to improve MT quality
Typically includes the following (Nirenburg 1995):
  –sentence-aligned corpus
  –intra-language matching: find chunks from the source-language part of the corpus that are the best candidates for matching an input chunk
  –inter-language matching: find the target-language chunk corresponding to the chunk from the source-language part of the corpus
  –chunk combination
S. Nirenburg. The PANGLOSS Mark III Machine Translation System. Technical Report CMU-CMT-95-145, 1995. (available online at http://www.lti.cs.cmu.edu/Research/CMT-home.html)
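The matching and combination steps listed above can be sketched as follows. This is a deliberately simplified illustration, not the PANGLOSS method: it assumes a hypothetical table of chunk pairs that has already been extracted from a sentence-aligned corpus, uses greedy longest-match for intra-language matching, and combines chunks by plain concatenation.

```python
# Toy example-based MT over a hypothetical table of aligned chunk pairs.

CHUNK_PAIRS = {  # source chunk (token tuple) -> target chunk
    ("press", "the", "button"): "appuyez sur le bouton",
    ("to", "start"): "pour démarrer",
    ("the", "printer"): "l'imprimante",
}


def translate_by_example(sentence: str) -> str:
    tokens = sentence.lower().split()
    max_len = max(len(chunk) for chunk in CHUNK_PAIRS)
    out, i = [], 0
    while i < len(tokens):
        # Intra-language matching: longest known source chunk starting at i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in CHUNK_PAIRS:
                # Inter-language matching: fetch the aligned target chunk.
                out.append(CHUNK_PAIRS[chunk])
                i += n
                break
        else:
            # No example covers this word; mark it for another module.
            out.append(f"<{tokens[i]}>")
            i += 1
    # Chunk combination: naive left-to-right concatenation.
    return " ".join(out)


print(translate_by_example("Press the button to start the printer"))
# appuyez sur le bouton pour démarrer l'imprimante
```

Words that no example covers are left marked in angle brackets, which fits the slide's framing of example-based MT as a companion module rather than a complete engine.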

11 Statistics-based MT (1)
Maximize Pr(S|T) = Pr(S) Pr(T|S) / Pr(T)
  Pr(S): source language model
  Pr(T|S): translation model
    –lexical translation, distortion, and fertility
Some comments (Machine Translation 7(4)):
  –I joined the attack … without realizing that precisely what the research was doing was to question some of the fundamental assumptions underlying MT research since 1966 … With hindsight, I can see that what this research was doing was saying that in the 20 years since ALPAC, the second generation architecture had led to only slightly better results than the architecture it replaced … (Harold Somers)
  –My initial reaction was the same as Somers. … The integration of a CANDIDE-type engine into a traditional MT architecture should probably at the deepest level the architecture allows (John White)
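Because Pr(T) is the same for every candidate S once the observed text T is fixed, maximizing Pr(S|T) amounts to maximizing the product Pr(S) Pr(T|S). The sketch below shows that decision rule on a handful of candidates, following the slide's notation (S is the hypothesis searched over, T the observed text). The probability values are invented for illustration; a real system estimates them from parallel text and searches a far larger candidate space.

```python
import math

# Hypothetical toy probabilities for one observed text T = "le chat".
LANGUAGE_MODEL = {      # Pr(S)
    "the cat": 0.02,
    "cat the": 0.0001,
    "the dog": 0.02,
}
TRANSLATION_MODEL = {   # Pr(T|S)
    "the cat": 0.3,
    "cat the": 0.3,
    "the dog": 0.001,
}


def score(s: str) -> float:
    # log Pr(S) + log Pr(T|S); Pr(T) is constant for a fixed T and is dropped.
    return math.log(LANGUAGE_MODEL[s]) + math.log(TRANSLATION_MODEL[s])


def decode(candidates):
    """Return the candidate S that maximizes Pr(S) * Pr(T|S)."""
    return max(candidates, key=score)


print(decode(["cat the", "the dog", "the cat"]))  # the cat
```

The two models divide the work: the translation model alone cannot tell "the cat" from "cat the", and the language model alone cannot tell "the cat" from "the dog".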

12 Statistics-based MT (2)
Machine Translation 7(4)
  –...not only does it need no linguistics or linguists, but no foreign speakers either.... about 43% of sentences correctly translated. That compares badly with SYSTRAN which is usually assigned figures of around 65% … even if it did equal SYSTRAN’s level of performance, it is not clear what inferences we should draw. … we must always remember that they need millions of words of parallel texts even to start … The problems noted then were of long-distance dependencies: … French and English … were a lucky choice … we have good historical reasons for believing that a purely statistical method cannot do high-quality MT (Yorick Wilks)
Word alignment

13 Evaluation
Traditional Evaluation Metrics (Church & Hovy)
  –System-based Metrics
      –easy to measure, but only for a particular system
      –e.g. 60 sub-grammars, 900 rewriting rules, …
  –Text-based Metrics
      sentence-based metrics
        –e.g. # of semantically or syntactically correct sentences
      compressibility metrics
      amount of post-editing metrics
  –Cost-based Metrics: cost & time (per N words)
  –Demos (must avoid being misleading)
Developer’s view or Customer’s view
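The slide does not define how the amount of post-editing is measured. One common proxy, assumed here purely for illustration, is the word-level edit distance between the raw MT output and its post-edited version, normalized by the length of the post-edited text. The function names are hypothetical.

```python
# One possible "amount of post-editing" metric: normalized word-level edit distance.
# This definition is an assumption; the slides do not specify one.


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def post_edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited text (0.0 means no editing was needed)."""
    mt, pe = mt_output.split(), post_edited.split()
    return edit_distance(mt, pe) / max(len(pe), 1)


rate = post_edit_rate("the cat sit on mat", "the cat sits on the mat")
print(round(rate, 2))  # 0.33
```

A metric of this kind is text-based: it can be compared across systems, unlike the system-based counts of grammars and rules.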

14 Some MT Problems
Morphological ambiguity
Lexical ambiguity and structural ambiguity
Lexical mismatch and structural mismatch
Idioms and collocations
Ill-formed input
World knowledge

15 CAT Tools
Pre-editing and post-editing environments with linguistic analyses
Translation Memory
  –As the translator translates the text, each sentence (translation unit) is also saved automatically to a sophisticated translation unit database memory. As he translates, any similar sentence already in the memory will appear on screen for editing. (Ian Gordon)
Alignment Tools
Terminology Management
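The translation-memory behavior described in the quotation (save each translation unit, then surface similar stored sentences) can be sketched with Python's standard `difflib` module. The class name, the character-level similarity measure, and the 0.7 threshold are assumptions for illustration; commercial tools use their own matching schemes.

```python
import difflib


class TranslationMemory:
    """Minimal in-memory translation memory with fuzzy lookup (a sketch)."""

    def __init__(self):
        self.units = {}  # source sentence -> stored translation

    def save(self, source: str, target: str) -> None:
        # Called each time the translator confirms a translation unit.
        self.units[source] = target

    def lookup(self, source: str, threshold: float = 0.7):
        # Return (stored source, stored translation) pairs, best match first.
        matches = difflib.get_close_matches(
            source, list(self.units), n=3, cutoff=threshold
        )
        return [(m, self.units[m]) for m in matches]


tm = TranslationMemory()
tm.save("Press the red button.", "Appuyez sur le bouton rouge.")
print(tm.lookup("Press the green button."))
```

The lookup returns a near match for the translator to edit instead of translating "Press the green button." from scratch, which is where the savings on repetitive texts come from.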

16 Standards
Exchange Standard
  –(Multilingual) Text Formats
  –Lexicons
  –Knowledge Bases
  –Translation Memories
Evaluation Standard

17 Future Directions
Exploratory Research or Prototype Research?
Modular Design (cf. Somers’ Comments)
Better Linguistic Theories
Lexicon Construction
Hybrid MT (Mainline MT engine + Additional Modules)
Spoken Language (Dialog-based) MT
MT Evaluation
Computer-Assisted Translation / User-Friendly Environment
Sub-language MT Systems
Distributed MT / Networked MT
MT on Demand

18 References
–Journal of Machine Translation (Kluwer)
–Proceedings of TMI, MT Summit, AMTA
–Proceedings of ACL, COLING, ROCLING
–E-Print Archive http://xxx.lanl.gov/cmp-lg/
–AAMT http://www.jeida.or.jp/aamt/index-e.html
–EAMT http://www.lim.nl/eamt/
–The Association for Computational Linguistics http://www.cs.columbia.edu/~acl/
–The LINGUIST List http://www.linguistlist.org/
–Translation Research Group http://www.ttt.org/index.html
–Localization Industry Standards Association (LISA) http://www.lisa.unige.ch/

19 References
–ISI @ USC http://www.isi.edu/natural-language/nlp-at-isi.html
–CMU/LTI http://www.lti.cs.cmu.edu/Research/CMT-home.html
–Verbmobil http://www.dfki.de/verbmobil/
–C-STAR II http://www.is.cs.cmu.edu/cstar/
–GETA http://durian.imag.fr/
–Machine Translation at PAHO (ACG/T) http://www.paho.org/english/machine.htm
–METEO http://padina.info.umoncton.ca/chandioux/meteoe.html
–WordNet Bibliography http://www.cis.upenn.edu/~josephr/wn-biblio.html

20 References
–Globalink, Inc. http://www.globalink.com/
–SYSTRAN http://www.systransoft.com/
–Logos Corporation http://www.logos-ca.com/
–TRADOS http://www.trados.com/
–A.I.SOFT http://www.aisoft.co.jp/
–CSK Home Page http://www.csk.co.jp/home_e.html
–SHARP SOFT http://www.sharp.co.jp/sc/excite/soft_map/soft.htm
–OKI Software http://www.okisoft.co.jp/
–KODENSHA http://www1.mesh.ne.jp/KODENSHA/
–ASTRANSAC http://eiplaza.toshiba.co.jp/products/transac/

