Machine Translation MÖSG vt 2004 Anna Sågvall Hein.

@Anna Sågvall Hein, MÖSG 2004

Can computers translate?
- Not a simple yes or no
- depends on
  – the text
  – the purpose of the translation
  – the required quality

Classical problems with MT
- unrealistic expectations
- bad translations
- difficulties in integrating MT in the work flow
  – the Ericsson case

What is MT proper?
To be considered as MT, a system should
- provide minimally correct morphology
- provide minimal syntactic processing
- provide minimal semantic processing
- handle and produce full sentences
Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories ( chins/IAMTcert.htm)

Basic translation strategies
- direct translation
- transfer-based translation
- statistical translation
- combined strategies

Direct translation, 1
- no intermediary sentence structure
- the most important language component is a translation dictionary
- translation proceeds mostly word by word, or phrase by phrase
- translation problems are handled more or less case by case by means of specific rules

Direct translation, 2
- quality
  – typically browsing quality
  – depends on the quality of the translation dictionary and the coverage of the translation rules
  – editing quality may be achieved
- problems with
  – ambiguity
  – inflection
  – word order
  – structural differences

Advanced classical approach (Tucker 1987)
- source text dictionary lookups and morphological analysis
- identification of homographs
- identification of compounds
- identification of noun and verb phrases
- processing of idioms

Advanced approach, cont.
- processing of prepositions
- subject-predicate identification
- syntactic ambiguity identification
- synthesis and morphological processing of target text
- rearrangement of words and phrases in target text

Feasibility of the direct translation strategy
Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a sentence grammar and an intermediary structure?

SYSTRAN
- SYStem TRANslation
- developed in the US by Peter Toma
- first version 1969 (Ru-En)
- EC bought the rights to Systran in 1976
- Systran SA, France, is the current owner of the rights to Systran
- currently 18 language pairs, excl. Swedish
- Swedish-->English is being introduced, starting in June 2004

Systran, cont.
- more than 1,600,000 dictionary units
- 20 domain dictionaries
- daily use by EC translators and administrators of the European institutions
- originally a direct translation strategy
  – see H&S
- today more of a transfer-based strategy

Ex. 1: fairly good translation / Systran sv-en
"Enskilda företagare som inte bildat bolag klassificeras hit."
"Individual entrepreneurs that have not formed companies are classified here."
The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.

Ex. 2: word order problem / Systran sv-en
"När byarna kontaktades hade de inte ens utsatts för influensa."
"When the villages were contacted had they not even been exposed to flu."
The system has not identified the subject and the predicate and therefore produces the wrong word order.

Ex. 3: ambiguity problem / Systran sv-en
"Vad kan vi lära av Arrawetestammen?"
"What can we faith of the Arawete?"
The system does not find the connection between kan and lära and therefore does not see that lära is a verb.

Ex. 4: ambiguity problem / Systran sv-en
"Extrapoleringen går till så här."
"The extrapolation goes to so here."
The system does not know the particle verb gå till and therefore translates incorrectly, word for word.

Motivations for transfer-based translation
- lexical ambiguity
- structural differences
See further Ingo 91 (6), Wikholm (89)

Transfer-based translation, 1
- intermediary sentence structure
- provides a basis for the systematic handling of grammatical problems and lexical choices
- basic processes
  – analysis
  – transfer
  – generation (synthesis)

Transfer-based translation, 2
- knowledge-intensive
- language modules
  – dictionary and grammar of source language
  – transfer dictionary and transfer rules
  – dictionary and grammar of target language

Multra
- transfer-based translation engine
- high quality
- focus on restricted domains
- developed at Uppsala University

Multra formalisms
- intermediary structure
  – feature structure: grammatical function & constituency
- analysis grammar
  – procedural
- transfer
  – unification based (Beskow 93)
- synthesis
  – PATR-like style (Beskow 93)

Simplistic approach
- sentence splitting
- tokenisation
- handling capital letters
- dictionary look-up and lexical substitution
- copying unknown words, digits, signs of punctuation etc.
- formal editing
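
The steps of the simplistic approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy dictionary, the function names and the regular expressions are hypothetical and are not taken from any system discussed in these slides.

```python
import re

# Hypothetical toy dictionary (Swedish -> English), for illustration only.
TOY_DICTIONARY = {
    "extrapoleringen": "the extrapolation",
    "går": "goes",
    "till": "to",
    "så": "so",
    "här": "here",
}

def split_sentences(text):
    """Sentence splitting: break after sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence):
    """Tokenisation: word/number tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def translate_sentence(sentence, dictionary):
    out = []
    for token in tokenise(sentence):
        # Capital letters: look up the lower-cased form.
        # Unknown words, digits and punctuation are copied unchanged.
        out.append(dictionary.get(token.lower(), token))
    # Formal editing: attach punctuation and restore the initial capital.
    text = " ".join(out)
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)
    return text[:1].upper() + text[1:]

def translate(text, dictionary=TOY_DICTIONARY):
    return " ".join(translate_sentence(s, dictionary) for s in split_sentences(text))

print(translate("Extrapoleringen går till så här."))
```

With this toy dictionary the sketch reproduces the kind of word-for-word output seen in Ex. 4 of the Systran examples, since nothing in the pipeline recognises gå till as a unit.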

Ex. 1: Multra
Sv. I oljefilterhållaren sitter en överströmningsventil.
→ En. The oil filter retainer has an overflow valve. (from the Scania corpus)
- sitter → has
- adv → subj
- subj → obj

Ex. 2: Multra
Sv. Fyll på olja i växellådan.
→ En. Fill gearbox with oil. (from the Scania corpus)
- fyll på → fill
- obj → adv
- adv → obj
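
The role changes in Ex. 1 and Ex. 2 can be illustrated with a small transfer step over a flat feature structure. This is a sketch only: Multra's transfer component is unification based, whereas the dictionary-driven rule format, the names `TRANSFER_RULES` and `TRANSFER_DICTIONARY`, and the flat structures below are hypothetical simplifications.

```python
# Hypothetical transfer rules, keyed by source predicate. Each rule gives the
# target predicate and a mapping from source to target grammatical functions.
TRANSFER_RULES = {
    "sitter": {"pred": "has", "roles": {"adv": "subj", "subj": "obj"}},
    "fyll på": {"pred": "fill", "roles": {"obj": "adv", "adv": "obj"}},
}

# Hypothetical transfer dictionary for the role fillers.
TRANSFER_DICTIONARY = {
    "oljefilterhållaren": "the oil filter retainer",
    "överströmningsventil": "an overflow valve",
    "olja": "oil",
    "växellådan": "gearbox",
}

def transfer(source):
    """Map a source-language structure to a target-language structure."""
    rule = TRANSFER_RULES[source["pred"]]
    target = {"pred": rule["pred"]}
    for role, filler in source.items():
        if role == "pred":
            continue
        # Roles without an entry in the rule keep their function.
        target_role = rule["roles"].get(role, role)
        # Fillers without a dictionary entry are copied unchanged.
        target[target_role] = TRANSFER_DICTIONARY.get(filler, filler)
    return target

ex1 = {"pred": "sitter", "adv": "oljefilterhållaren", "subj": "överströmningsventil"}
print(transfer(ex1))
```

Analysis (building the source structure) and generation (ordering and inflecting the target words) are left out; the point is only that the change of grammatical function is stated once per predicate rather than handled case by case.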

Ex. 3: Multra
Detta filter ska bytas med jämna mellanrum.
→ This filter must be renewed at regular intervals.
Lexical choices in the context
- ska – must
- byta – renew
- med – at
- jämna – regular
- mellanrum – interval

Ex. 4: Multra
Beskrivningen gäller för automatväxellådor med beteckning ZF 4/HP500, 590 och 600.
→ The description applies to automatic gearboxes with the designations ZF 4/5HP500, 590 and 600.
- gäller – applies to
- beteckning – the designations

Feasibility of machine translation
- Re-use of translations
- Quality in relation to purpose
- Sublanguage
- Spell checked and grammar checked SL
- Controlled language
- Human machine interaction
- Evaluation data and criteria

Re-use of previous translations
- translation memories
- translation dictionaries
- statistical machine translation

Re-use techniques, 1
- sentence alignment
  – linking source and target sentences pairwise
  – success rate close to 100 %
  – translation memories
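
Once sentences have been aligned, the pairs can be stored and looked up as a translation memory. The sketch below is a minimal illustration using Python's standard `difflib`; the fuzzy-match threshold of 0.8 and the function name `lookup` are assumptions, and the two stored pairs are the Scania examples quoted in these slides.

```python
import difflib

# Aligned sentence pairs (Swedish -> English) from the Multra examples.
MEMORY = {
    "Fyll på olja i växellådan.": "Fill gearbox with oil.",
    "Detta filter ska bytas med jämna mellanrum.":
        "This filter must be renewed at regular intervals.",
}

def lookup(sentence, memory, threshold=0.8):
    """Return (translation, score); translation is None below the threshold."""
    if sentence in memory:
        return memory[sentence], 1.0  # exact match
    best, best_score = None, 0.0
    for source, target in memory.items():
        # Character-based similarity between the query and a stored source.
        score = difflib.SequenceMatcher(None, sentence, source).ratio()
        if score > best_score:
            best, best_score = target, score
    if best_score >= threshold:
        return best, best_score  # fuzzy match, to be checked by a translator
    return None, best_score

print(lookup("Fyll på olja i växellådan.", MEMORY))
```

A fuzzy match is only a suggestion: the retrieved target sentence belongs to a similar source sentence, not to the one being translated.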

Re-use techniques, 2
- word alignment
  – linking sub-sentence segments, typically source and target words and phrases, pairwise
  – large-scale processing
  – success rate close to 80 %
  – translation dictionaries
  – statistical machine translation

A word alignment example
Jag tar mittplatsen, som jag inte tycker om.
I take the middle seat, which I dislike.
- jag – I
- tar – take
- mittplatsen – the middle seat
- som – which
- jag – I
- inte tycker om – dislike
(from Tiedemann 2003)
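
Alignment links of this kind can be collected into a raw translation dictionary by counting how often each source segment is linked to each target segment. The sketch below uses only the six links of the example; the function names are hypothetical, and a real dictionary would be built from a large aligned corpus.

```python
from collections import Counter, defaultdict

# Word-alignment links from the example above (Tiedemann 2003).
LINKS = [
    ("jag", "I"),
    ("tar", "take"),
    ("mittplatsen", "the middle seat"),
    ("som", "which"),
    ("jag", "I"),
    ("inte tycker om", "dislike"),
]

def build_dictionary(links):
    """Count target segments per source segment."""
    counts = defaultdict(Counter)
    for source, target in links:
        counts[source][target] += 1
    return counts

def best_translation(counts, source):
    """Most frequently linked target segment, or None if unseen."""
    if source not in counts:
        return None
    return counts[source].most_common(1)[0][0]

raw_dictionary = build_dictionary(LINKS)
print(best_translation(raw_dictionary, "inte tycker om"))
```

Note that the multi-word segment inte tycker om is a single dictionary entry, which is what allows it to be rendered as dislike rather than word for word.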

Statistical machine translation
- large scale word alignment
  – raw translation dictionary
- direct translation using the dictionary
  – no translation rules
- smoothing the translation by means of a language model
  – statistically based decoding algorithm
- crucial
  – Arabic – English
  – Hindi – English
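
One common way of writing down how the translation dictionary and the language model combine is the noisy-channel formulation. It is not stated on the slide and is given here only as a standard reading of it; the symbols $f$ (source sentence) and $e$ (candidate target sentence) are introduced for this illustration.

```latex
\[
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} \, P(f \mid e)\, P(e)
\]
```

Here $P(f \mid e)$ is the translation model estimated from the word alignments, $P(e)$ is the language model that smooths the output, and the search for $\hat{e}$ is the task of the decoding algorithm.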

Quality
- publishing quality
  – high quality translation, good enough for publishing, typically after inspection and minor editing
- browsing quality
  – low quality translation, comprehensible, typically not good enough for editing and publishing; may contain grammatical errors, errors in word order, and wrong words

Translation purposes
- translation
  – publishing quality
- browsing
  – browsing quality
- gisting
  – browsing quality
- drafting
  – publishing/browsing quality?
- cross-language information retrieval
  – browsing quality

MT as a cross-language communication tool
MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001).
Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, September 2001 ( WJHutchins/MTS-2001.htm)

Restrictions on the input language
- sublanguage
  – text type
  – domain
- controlled language
- spell checked
- grammar checked

Typically
- general language – browsing quality
- restricted language – high quality

Spell checking and grammar checking
- If there are spelling errors or typos in the SL, dictionary search will fail
- If there are grammatical errors in the SL, grammatical analysis will fail
- Where and how should spell and grammar checking be accounted for? Before or during the process?

Controlled language
- controlled vocabulary
  – full lexical coverage, e.g. Scania Swedish
- controlled grammar
  – full grammatical coverage
- language checker
  – e.g. Scania Checker

Human intervention
- before
  – language checking
- during
  – e.g. ambiguity resolution
- after
  – post-editing

Evaluation of MT
- coverage (recall)
- quality (precision)
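
One possible reading of the two measures, given purely as an assumption since the slide does not define them: coverage is the share of source segments for which the system produces a translation, and quality is the share of the produced translations that a human judge accepts. The function name and the input format below are hypothetical.

```python
def evaluate(results):
    """results: one (translated, acceptable) pair of booleans per source segment."""
    total = len(results)
    translated = sum(1 for t, _ in results if t)
    acceptable = sum(1 for t, a in results if t and a)
    # Coverage (recall): how much of the input received a translation.
    coverage = translated / total if total else 0.0
    # Quality (precision): how much of the output was judged acceptable.
    quality = acceptable / translated if translated else 0.0
    return coverage, quality

# Four segments: two translated well, one translated badly, one not translated.
judgements = [(True, True), (True, False), (False, False), (True, True)]
print(evaluate(judgements))
```

Reporting the two figures separately keeps a system that translates little but well apart from one that translates everything but poorly.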

Current trends in direct translation
- re-use of translations
  – translation memories of sentences and sub-sentence units such as words, phrases and larger units
  – example-based translation
  – statistical translation
Will re-use of translations overcome the problems with the direct translation approach that were discussed above? If so, how can the problems be handled?

Why machine translation?
- cheaper
- faster
- more consistent
when it succeeds...

Assignment: Hable Con Ella (en-sv)
- Make a general quality assessment of the translation.
- Suggest a possible use of a translation of this kind.
- Identify the steps that were taken in the translation.
- Specify the translation errors that were made and discuss them.
- Suggest improvements in the framework of the direct translation strategy. Justify them. Formalise them in a framework of your own choice. Discuss their general adequacy in the translation of Swedish to English.