New Directions in Machine Translation: Introduction 陳惠群, Institute of Linguistics / Institute of Information Science, Academia Sinica

10/22/ Why MT Matters
Economics
–Costs ↓ / Quality ↑ / Turnaround ↓
–Many MT developers, customers, and sponsors have invested heavily for years.
Politics
–Multi-lingual Countries / Minority Languages
Intelligence Gathering
–Governments / Companies / Individuals
Research
–AI / CS / Linguistics / Psychology / and so on

Recent Trends
PC-based MT Systems
Online MT Services, MT on Demand
– , Web pages, Uploads
Sub-language MT Systems
Dialog-based (Speech-to-Speech) MT Systems
Computer-Assisted Translation

Classifying MT Systems
Operations
–Fully-Automatic MT
–Semi-automatic MT
–Computer-Assisted Translation (CAT Tools)
Input
–Unrestricted Texts
–Restricted Texts (e.g. Technical Manuals) / texts written with MT in mind
–Sub-languages / Controlled languages
Quality
–High / Low / Acceptable / Applicable / Readable
–How should an MT system be evaluated?
Strategies (see next page)

MT Strategies
Fundamentals
–Direct Translation MT
–Transfer-based MT
–Interlingua MT
Linguists vs. Empiricists
New Strategies
–Knowledge-based MT
–Example-based MT
–Statistics-based MT
–Hybrid MT
–Japanese manufacturers know well that a single linguistic theory cannot lead to a good MT system. They realize that a huge amount of language phenomena must be processed in an ad-hoc manner. (M. Nagao)

Direct MT
Source Text → Target Text
1. Simple syntactic analysis (disambiguation)
2. Bilingual lexicon (word-by-word translation)
3. Re-ordering rules
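
The lexicon and re-ordering steps above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real system: the English-Spanish lexicon, the part-of-speech tags, and the single ADJ NOUN rule are all invented for the example, and the syntactic analysis step is reduced to a tag lookup.

```python
# Toy direct MT: word-by-word lookup followed by one re-ordering rule.
# The lexicon entries and the rule are hypothetical, chosen only for illustration.
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
}

def translate_direct(sentence):
    # Bilingual lexicon step: unknown words pass through with an UNK tag.
    words = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]
    # Re-ordering step: swap ADJ NOUN into NOUN ADJ.
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in words)

print(translate_direct("the red car"))
```

Because nothing deeper than a tag is consulted, any ambiguity the lexicon cannot resolve is simply passed through to the output.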

Transfer-based MT
Source Text (ST) → ST Structure → TT Structure → Target Text (TT)
1. ST analysis
2. Structure transfer
3. TT generation
4. SL-TL lexicon & transfer rules
5. SL grammar & lexicon
6. TL grammar & lexicon
SL - source language; TL - target language
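
A rough sketch of the three stages (analysis, structure transfer, generation) is given below. It assumes a toy English-Spanish setting with a flat noun-phrase structure; the lexicons, the structure format, and the function names are hypothetical and stand in for the SL grammar, the transfer rules, and the TL grammar.

```python
# Toy transfer-based MT for simple noun phrases (all resources are invented).
SL_LEXICON = {"the": "DET", "red": "ADJ", "car": "NOUN"}        # SL lexicon
TRANSFER_LEXICON = {"the": "el", "red": "rojo", "car": "coche"}  # SL-TL lexicon

def analyze(sentence):
    """ST analysis: build an ST structure (determiner, modifiers, head)."""
    structure = {"det": None, "mods": [], "head": None}
    for word in sentence.lower().split():
        tag = SL_LEXICON[word]
        if tag == "DET":
            structure["det"] = word
        elif tag == "ADJ":
            structure["mods"].append(word)
        else:
            structure["head"] = word
    return structure

def transfer(st_structure):
    """Structure transfer: map each SL item to its TL counterpart."""
    return {
        "det": TRANSFER_LEXICON.get(st_structure["det"]),
        "mods": [TRANSFER_LEXICON[m] for m in st_structure["mods"]],
        "head": TRANSFER_LEXICON.get(st_structure["head"]),
    }

def generate(tt_structure):
    """TT generation: the toy TL grammar puts modifiers after the head."""
    parts = [tt_structure["det"], tt_structure["head"], *tt_structure["mods"]]
    return " ".join(p for p in parts if p)

def translate_transfer(sentence):
    return generate(transfer(analyze(sentence)))

print(translate_transfer("the red car"))
```

Unlike the direct approach, word order here is decided by the generation stage from the structure, not by a rule that patches the word string.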

Interlingua-based MT
Source Text (ST) → Interlingua representation (+SL-TL lexicon) → Target Text (TT)
1. ST analysis
2. TT generation
3. SL grammar & lexicon
4. TL grammar & lexicon

Knowledge-based MT
All world knowledge? A long-term research effort
Practical systems: e.g. CMU’s KANT
–narrow domain
–domain model: defines all semantic classes and instances to represent all concepts in the domain
–each concept definition includes:
  concept head (name of the concept)
  slots: allowable semantic roles
  fillers: allowable concept classes that the roles can contain
–disambiguation by filler restriction
–knowledge acquisition: automatic or semi-automatic
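
The concept definition (head, slots, fillers) and disambiguation by filler restriction might be sketched as follows. This is only an illustration under invented assumptions: the concept names, the class table, and the `disambiguate` helper are hypothetical and are not taken from KANT.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    head: str                                   # name of the concept
    slots: dict = field(default_factory=dict)   # role -> allowable filler classes

# Hypothetical domain model: semantic class of each known word.
CLASS_OF = {"battery": "ELECTRICAL-DEVICE", "fee": "MONEY"}

# Hypothetical senses of the ambiguous word "charge".
SENSES = {
    "charge": [
        Concept("CHARGE-ELECTRIC", {"theme": {"ELECTRICAL-DEVICE"}}),
        Concept("CHARGE-BILL", {"theme": {"MONEY"}}),
    ]
}

def disambiguate(word, fillers):
    """Keep the senses whose slots accept every observed role filler."""
    accepted = []
    for concept in SENSES[word]:
        if all(
            role in concept.slots and CLASS_OF.get(filler) in concept.slots[role]
            for role, filler in fillers.items()
        ):
            accepted.append(concept.head)
    return accepted

print(disambiguate("charge", {"theme": "battery"}))
```

A narrow domain keeps such tables small enough to build by hand or semi-automatically, which is what makes the restriction check practical.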

Example-based MT
A companion module to improve MT quality
Typically includes the following (Nirenburg 1995):
–sentence-aligned corpus
–intra-language matching: find the chunks in the source-language part of the corpus that are the best candidates for matching an input chunk
–inter-language matching: find the target-language chunk corresponding to the matched chunk from the source-language part of the corpus
–chunk combination
The PANGLOSS Mark III Machine Translation System. S. Nirenburg, Technical Report CMU-CMT (available online at
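
The matching and combination steps listed above can be illustrated with a small Python sketch. It is not the PANGLOSS algorithm: the three-pair corpus is invented, token-level similarity from the standard `difflib` module stands in for intra-language matching, and inter-language matching is trivial because the toy corpus is already aligned chunk by chunk.

```python
from difflib import SequenceMatcher

# Hypothetical aligned corpus of (source chunk, target chunk) pairs.
CORPUS = [
    ("good morning", "buenos días"),
    ("thank you very much", "muchas gracias"),
    ("see you tomorrow", "hasta mañana"),
]

def best_match(chunk):
    """Intra-language matching: pick the corpus pair whose source side is most similar."""
    def score(pair):
        return SequenceMatcher(None, chunk.split(), pair[0].split()).ratio()
    return max(CORPUS, key=score)

def translate_chunks(chunks):
    """Inter-language matching plus chunk combination (simple concatenation)."""
    return " ".join(best_match(chunk)[1] for chunk in chunks)

print(translate_chunks(["good morning", "thank you so much"]))
```

A real system would also reject matches below some similarity threshold and would have to align sub-sentence chunks itself; both are omitted here.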

Statistics-based MT (1)
Maximize Pr(S|T) = Pr(S) Pr(T|S) / Pr(T)
Pr(S): source language model
Pr(T|S): translation model
–lexical translation, distortion, and fertility
Some comments (Machine Translation 7(4)):
–I joined the attack … without realizing that precisely what the research was doing was to question some of the fundamental assumptions underlying MT research since 1966 … With hindsight, I can see that what this research was doing was saying that in the 20 years since ALPAC, the second generation architecture had led to only slightly better results than the architecture it replaced … (Harold Somers)
–My initial reaction was the same as Somers. … The integration of a CANDIDE-type engine into a traditional MT architecture should probably [be] at the deepest level the architecture allows (John White)
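
The maximization on this slide can be shown with a toy decoder. Following the slide's notation, T is the observed sentence and S the sentence to recover; since Pr(T) is the same for every candidate S, it can be dropped and only Pr(S) Pr(T|S) is compared. All probabilities and sentences below are invented for illustration; a real system estimates them from parallel and monolingual corpora.

```python
# Hypothetical language model Pr(S).
LANGUAGE_MODEL = {
    "the house is small": 0.02,
    "small the house is": 0.0001,
}

# Hypothetical translation model Pr(T|S), keyed by (S, T).
TRANSLATION_MODEL = {
    ("the house is small", "la maison est petite"): 0.3,
    ("small the house is", "la maison est petite"): 0.3,
}

def decode(observed_t):
    """Return the S that maximizes Pr(S) * Pr(T|S) for the observed T."""
    candidates = [s for (s, t) in TRANSLATION_MODEL if t == observed_t]
    return max(
        candidates,
        key=lambda s: LANGUAGE_MODEL[s] * TRANSLATION_MODEL[(s, observed_t)],
    )

print(decode("la maison est petite"))
```

Here both candidates explain the observed words equally well, so the language model alone decides in favor of the fluent word order.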

Statistics-based MT (2)
Machine Translation 7(4)
–...not only does it need no linguistics or linguists, but no foreign speakers either.... about 43% of sentences correctly translated. That compares badly with SYSTRAN which is usually assigned figures of around 65% … even if it did equal SYSTRAN’s level of performance, it is not clear what inferences we should draw. … we must always remember that they need millions of words of parallel texts even to start … The problems noted then were of long-distance dependencies: … French and English … were a lucky choice … we have good historical reasons for believing that a purely statistical method cannot do high-quality MT (Yorick Wilks)
Word alignment

Evaluation
Traditional Evaluation Metrics (Church & Hovy)
–System-based Metrics
  easy to measure, but only for a particular system
  e.g. 60 sub-grammars, 900 rewriting rules, …
–Text-based Metrics
  sentence-based metrics, e.g. # of semantically or syntactically correct sentences
  compressibility metrics
  amount-of-post-editing metrics
–Cost-based Metrics: cost & time (per N words)
–Demos (must avoid being misleading)
Developer’s view or Customer’s view
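
One possible way to quantify the amount of post-editing is the word-level edit distance between the raw MT output and its post-edited version, normalized by the length of the post-edited text. The sketch below is an assumption about how such a metric could be computed, not the definition used by Church & Hovy; the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]

def post_edit_rate(mt_output, post_edited):
    """Edits needed per word of the post-edited text (0.0 means no editing)."""
    mt, pe = mt_output.split(), post_edited.split()
    return edit_distance(mt, pe) / max(len(pe), 1)

print(post_edit_rate("he go home", "he goes home"))
```

A lower rate suggests less post-editing effort, although it says nothing about how long each edit took the translator.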

Some MT Problems
Morphological ambiguity
Lexical ambiguity and structural ambiguity
Lexical mismatch and structural mismatch
Idioms and collocations
Ill-formed input
World knowledge

CAT Tools
Pre-editing and post-editing environments with linguistic analyses
Translation Memory
–As the translator translates the text, each sentence (translation unit) is also saved automatically to a sophisticated translation unit database memory. As he translates, any similar sentence already in the memory will appear on screen for editing. (Ian Gordon)
Alignment Tools
Terminology Management
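
The translation memory behavior described in the quotation (store each translation unit, then offer similar stored sentences for editing) can be sketched as below. This is a minimal illustration: the `TranslationMemory` class, the 0.7 threshold, and the character-level similarity from Python's standard `difflib` module are assumptions, not how any particular CAT product works.

```python
from difflib import SequenceMatcher

class TranslationMemory:
    """Toy translation memory with fuzzy lookup (hypothetical design)."""

    def __init__(self):
        self.units = {}  # source sentence -> stored translation

    def add(self, source, target):
        # Save a translation unit as the translator finishes a sentence.
        self.units[source] = target

    def lookup(self, sentence, threshold=0.7):
        # Return (score, source, target) for stored units similar enough
        # to the new sentence, best match first.
        matches = []
        for source, target in self.units.items():
            score = SequenceMatcher(None, sentence, source).ratio()
            if score >= threshold:
                matches.append((score, source, target))
        return sorted(matches, reverse=True)

tm = TranslationMemory()
tm.add("Press the red button.", "Appuyez sur le bouton rouge.")
for score, source, target in tm.lookup("Press the green button."):
    print(round(score, 2), source, "->", target)
```

The translator would then edit the proposed target sentence rather than translate from scratch; the threshold controls how loose a match is still worth showing.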

Standards
Exchange Standard
–(Multilingual) Text Formats
–Lexicons
–Knowledge Bases
–Translation Memories
Evaluation Standard

Future Directions
Exploratory Research or Prototype Research?
Modular Design (cf. Somers’ Comments)
Better Linguistic Theories
Lexicon Construction
Hybrid MT (Mainline MT engine + Additional Modules)
Spoken Language (Dialog-based) MT
MT Evaluation
Computer-Assisted Translation / User-Friendly Environment
Sub-language MT Systems
Distributed MT / Networked MT
MT on Demand

References
–Journal of Machine Translation (Kluwer)
–Proceedings of TMI, MT Summit, AMTA
–Proceedings of ACL, COLING, ROCLING
–E-Print Archive
–AAMT
–EAMT
–The Association for Computational Linguistics
–The LINGUIST List
–Translation Research Group
–Localization Industry Standards Association (LISA)

References
–USC
–CMU/LTI
–Verbmobil
–C-STAR II
–GETA
–Machine Translation at PAHO (ACG/T)
–METEO
–WordNet Bibliography

References
–Globalink, Inc.
–SYSTRAN
–Logos Corporation
–TRADOS
–A.I.SOFT
–CSK Home Page
–SHARP SOFT
–OKI Software
–KODENSHA
–ASTRANSAC