TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities Belinda Maia, Diana Santos, Luís Sarmento & Anabela Barreiro.

Slides:



Advertisements
Similar presentations
© 2000 XTRA Translation Services Is MT technology available today ready to replace human translators?
Advertisements

CHART or PICTURE INTEGRATING SEMANTIC WEB TO IMPROVE ONLINE Marta Gatius Meritxell González TALP Research Center (UPC) They are friendly and easy to use.
Machine Learning Approaches to the Analysis of Large Corpora : A Survey Xunlei Rose Hu and Eric Atwell University of Leeds.
Modelling with expert systems. Expert systems Modelling with expert systems Coaching modelling with expert systems Advantages and limitations of modelling.
Machine Translation. Can you imagine working as a translator without the help of computer?
Machine Translation II How MT works Modes of use.
Introduction to Computational Linguistics
1 Developing Statistic-based and Rule-based Grammar Checkers for Chinese ESL Learners Howard Chen Department of English National Taiwan Normal University.
C SC 620 Advanced Topics in Natural Language Processing Lecture 22 4/15.
Languages & The Media, 4 Nov 2004, Berlin 1 Multimodal multilingual information processing for automatic subtitle generation: Resources, Methods and System.
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004.
Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming.
NLP and Speech Course Review. Morphological Analyzer Lexicon Part-of-Speech (POS) Tagging Grammar Rules Parser thethe – determiner Det NP → Det.
Machine Translation Anna Sågvall Hein Mösg F
CSE111: Great Ideas in Computer Science Dr. Carl Alphonce 219 Bell Hall Office hours: M-F 11:00-11:
C SC 620 Advanced Topics in Natural Language Processing Lecture 20 4/8.
USP workshop Using the Corpógrafo Belinda Maia & Luís Sarmento PoloFLUP LINGUATECA.
CS 4705 Lecture 13 Corpus Linguistics I. From Knowledge-Based to Corpus-Based Linguistics A Paradigm Shift begins in the 1980s –Seeds planted in the 1950s.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
CS 330 Programming Languages 09 / 18 / 2007 Instructor: Michael Eckmann.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Creation of a Russian-English Translation Program Karen Shiells.
MACHINE TRANSLATION TRANSLATION(5) LECTURE[1-1] Eman Baghlaf.
March 1, 2009 Dr. Muhammed Al-Mulhem 1 ICS 482 Natural Language Processing INTRODUCTION Muhammed Al-Mulhem March 1, 2009.
Statistical Natural Language Processing. What is NLP?  Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
CSCI 347 – Data Mining Lecture 01 – Course Overview.
Invitation to Computer Science 5th Edition
9/8/20151 Natural Language Processing Lecture Notes 1.
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
Linguistics, Pragmatics & Natural Grammar
Computational Methods to Vocalize Arabic Texts H. Safadi*, O. Al Dakkak** & N. Ghneim**
A. Language : Language, as a matter of common knowledge, is the medium of communication through which we express our emotions ideas, feelings and thoughts.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Chapter 9 Describing Process Specifications and Structured Decisions
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Structure of Study Programmes
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
Profile The METIS Approach Future Work Evaluation METIS II Architecture METIS II, the continuation of the successful assessment project METIS I, is an.
Overview of technologies for translators and language service providers Belinda Maia University of Porto.
Structure of Study Programmes Bachelor of Computer Science Bachelor of Information Technology Master of Computer Science Master of Information Technology.
1 Computational Linguistics Ling 200 Spring 2006.
Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin.
1 Statistical NLP: Lecture 9 Word Sense Disambiguation.
Evolution of Machine Translation: systems and use John Hutchins [ homepages/WJHutchins] [
Postgraduate Diploma in Translation Lecture 1 Computers and Language.
updated CmpE 583 Fall 2008 Ontology Integration- 1 CmpE 583- Web Semantics: Theory and Practice ONTOLOGY INTEGRATION Atilla ELÇİ Computer.
Opinion Holders in Opinion Text from Online Newspapers Youngho Kim, Yuchul Jung and Sung-Hyon Myaeng Reporter: Chia-Ying Lee Advisor: Prof. Hsin-Hsi Chen.
1 CSI 5180: Topics in AI: Natural Language Processing, A Statistical Approach Instructor: Nathalie Japkowicz Objectives of.
Error Correction: For Dummies? Ellen Pratt, PhD. UPR Mayaguez.
October 2005CSA3180 NLP1 CSA3180 Natural Language Processing Introduction and Course Overview.
Introduction to Computational Linguistics
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, January 2003.
1 Branches of Linguistics. 2 Branches of linguistics Linguists are engaged in a multiplicity of studies, some of which bear little direct relationship.
Compiler Construction (CS-636)
LREC 2004, 26 May 2004, Lisbon 1 Multimodal Multilingual Resources in the Subtitling Process S.Piperidis, I.Demiros, P.Prokopidis, P.Vanroose, A. Hoethker,
1 STO A Lexical Database of Danish for Language Technology Applications Anna Braasch Center for Sprogteknologi Copenhagen SPINN Seminar, October 27, 2001.
 explain expected stages and patterns of language development as related to first and second language acquisition (critical period hypothesis– Proficiency.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Jan 2012MT Architectures1 Human Language Technology Machine Translation Architectures Direct MT Transfer MT Interlingual MT.
Chapter 5 The Oral Approach.
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
English-Lithuanian-English Lexicon Database Management System for MT Gintaras Barisevicius and Elvinas Cernys Kaunas University of Technology, Department.
1 January 31, Documenting Software William Cohen NCSU CSC 591W January 31, 2008.
TYPES OF TRANSLATION.
English-Korean Machine Translation System
Introduction to Machine Translation
Introduction to Machine Translation
Natural Language Processing
Presentation transcript:

TrAva – a tool for evaluating Machine Translation – pedagogical and research possibilities Belinda Maia, Diana Santos, Luís Sarmento & Anabela Barreiro - Linguateca

Why MT Matters Social reasons Political reasons Commercial importance Scientific and philosophical interest

Useful bibliography ARNOLD, D, BALKAN, L., LEE HUMPHREYS, R. Lee, MEIJER, S., & SADLER, S. (1994) Machine Translation - An Introductory guide. Manchester & Oxford : NCC Blackwell. ISBN 1-85554-217-X - or at: http://www.essex.ac.uk/linguistics/clmt/MTbook/.   COLE, Ron (ed) 1996 "Survey of the State of the Art in Human Language Technology" Chapter 8 - Multilinguality, at the Center for Spoken Language Understanding, Oregon.  http://cslu.cse.ogi.edu/HLTsurvey/ch8node2.html#Chapter8 MELBY, Alan K. 1995, The Possibility of Language: A discussion of the Nature of Language, with implications for Human and Machine Translation. Amsterdam: John Benjamins Pub. Co.

Machine Translation (MT) – a few dates 1947 Warren Weaver - his ideas led to heavy investment in MT  1959 Bar-Hillel - considers FAHQMT - FULLY AUTOMATIC HIGH QUALITY MACHINE TRANSLATION philosophically impossible 1964 ALPAC Report - on limitations of MT > withdrawal of funds late 1970s the CEC purchase of SYSTRAN and beginning of EUROTRA project. Upward trend in the 1970s and 1980s Today: MT technology - high-end versus low-end systems MT and the Internet

From Arnold et al 1995

From Arnold et al 1995

MT architectures – Arnold et al Direct architecture - simple grammatical rules + a large lexical and phrasal database Transfer architecture - more complex grammar with an underlying approach of transformational-generative theory + considerable research into comparative linguistics in the two languages involved  Interlingua architecture - L1 > a 'neutral language' (real, artificial, logical, mathematical..) > L2

Major Methods, Techniques and Approaches today Statistical vs. Linguistic MT assimilation tasks: lower quality, broad domains – statistical techniques predominate dissemination tasks: higher quality, limited domains – symbolic techniques predominate communication tasks: medium quality, medium domain – mixed techniques predominate Rule-based vs. Example-based MT Transfer vs. Interlingual MT Multi-Engine MT Speech-to-Speech Translation MLIM - Multilingual Information Management: Current levels and Future Abilities - report (1999) Chapter 4 at: http://www-2.cs.cmu.edu/~ref/mlim/chapter4.html

MT – present & future uses ‘Gist’ translation Ephemeral texts with tolerant users Human aided MT Domain specific Linear sentence structure Pre-edited text or ‘controlled language’ Post-editing Improvement of MT – particularly for restricted domains and registers

MT and the Human translator MT is less of a threat to the professional human translator than English MT can encourage people’s curiosity for texts in languages they do not understand > and lead to human translation MT can be a tool for the human translator Professional translators can learn to work with and train MT

PoloCLUP’s experiment Background Master’s seminar in Semantics and Syntax Wish to raise students’ awareness of the strengths and weaknesses of MT Wish to develop their interest in MT as a tool Need to improve their knowledge of linguistics. Availability of free MT online Automation of process provided by computer engineer

Phase 1 - METRA http://poloclup.linguateca.pt/ferramentas/metra/index.html Translation using 7 online MT programmes EN > PT PT > EN At present this tool is getting about 60 hits per day!

BOOMERANG http://poloclup.linguateca.pt/ferramentas/boomerang/index.html This tool submits a text for translation – and back-translation – and back-translation…. Until it reaches a fixed point This shows that the rules programmed for one language direction do not always correspond to the other language direction

EVAL > TrAva Informal class experiment led to a useful research tool Several versions of EVAL Different types of classification of input Different explanations of errors of output Production, correction and re-correction of procedure interesting

TrAva - procedure Online EN > PT MT using 4 MT systems: Free Translation Systran E T Server Amikai Researcher chooses area for analysis – e.g. ambiguity lexical and structural mismatches Homographs and polysemous lexical items   syntactic complexity multiword units: idioms and collocations   anaphora resolution

TrAva - procedure Selection of ‘genuine’ examples from BNC, Reuter’s corpus, newspapers etc. Possible ‘pruning’ of unnecessary text (some systems accept limited text) No deliberate attempt to confuse the system BUT: avoidance of repetitive ‘test suites’

TrAva - procedure Sentence submitted to TrAva MT results Researcher: Classifies part of sentence being examined in terms of the English lexicon or POS (BNC codes) Examines results Explains errors in terms of Portuguese grammar

Access to work done Researcher may access work done and review it Teacher / administrator can access student work and give advice FAQ

Present situation METRA and BOOMERANG are all free to use online at: http://poloclup.linguateca.pt/ferramentas TraVa is free to use online at: http://www.linguateca.pt/trava/ The corpus CorTA – over 1000 sentences + 4 MT versions available for consultation at: http://www.linguateca.pt/

Conclusions It has been a successful experiment It is useful pedagogically As linguistic analysis As appreciation of MT It has interesting theoretical implications > emphasis on ‘real’ sentences and recognition of interconnection of lexicon + syntax + context

Conclusions Further work needs to be done on the classifications E.g. the analysis of ‘error’ as ‘lexical choice’ needs to be able to combine with other possible reasons for error A lone researcher can use it to examine a restricted area BUT – a large team is needed to overhaul a system properly

Homographs and Polysemy Homographs = words with same spelling but different syntactic use Polysemy = words with same spelling, but different meaning according to use or context BUT – the difference is not as clear-cut as all that However – major problem for MT

Complex Noun Phrases DETerminante + ADJectivo + Nome DETerminante + ADJectivo Composto + ADJectivo + Nome DETerminante + ADVérbio (em –ly) + ADJectivo + ADJectivo + Nome

Lexical Bundles

EXAMPLES Now let us look at some examples: Homographs Polysemy Complex noun phrases Lexical bundles