Evaluating an MT French / English System Widad Mustafa El Hadi Ismaïl Timimi Université de Lille III Marianne Dabbadie LexiQuest - Paris.

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Statistical modelling of MT output corpora for Information Extraction.
Rationale for a multilingual corpus for machine translation evaluation Debbie Elliott Anthony Hartley Eric Atwell Corpus Linguistics 2003, Lancaster, England.
CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Compilers and Language Translation
CSC 313 – Advanced Programming Topics. Today’s Goal  Make you forget reading that was assigned  I went back & reviewed others; none like this one 
MT Evaluation: Human Measures and Assessment Methods : Machine Translation Alon Lavie February 23, 2011.
Bilingual Dictionaries
Languages & The Media, 4 Nov 2004, Berlin 1 Multimodal multilingual information processing for automatic subtitle generation: Resources, Methods and System.
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, September 2004.
© Janice Regan, CMPT 102, Sept CMPT 102 Introduction to Scientific Computer Programming The software development method algorithms.
Machine Translation Anna Sågvall Hein Mösg F
Software Quality Metrics
ISBN Chapter 4 Lexical and Syntax Analysis The Parsing Problem Recursive-Descent Parsing.
Quantitative Evaluation of Machine Translation Systems: Sentence Level Palmira Marrafa António Ribeiro.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
© Copyright 2011 John Wiley & Sons, Inc.
Corpus Linguistics What can a corpus tell us ? Levels of information range from simple word lists to catalogues of complex grammatical structures and.
1 Statistical NLP: Lecture 13 Statistical Alignment and Machine Translation.
Research methods in corpus linguistics Xiaofei Lu.
MACHINE TRANSLATION A precious key to communicate beyond linguistic barriers 1.
How to Give Effective Written Feedback
© Janice Regan, CMPT 128, Jan CMPT 128 Introduction to Computing Science for Engineering Students Creating a program.
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
The use of machine translation tools for cross-lingual text-mining Blaz Fortuna Jozef Stefan Institute, Ljubljana John Shawe-Taylor Southampton University.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
Processing of large document collections Part 10 (Information extraction: multilingual IE, IE from web, IE from semi-structured data) Helena Ahonen-Myka.
CIG Conference Norwich September 2006 AUTINDEX 1 AUTINDEX: Automatic Indexing and Classification of Texts Catherine Pease & Paul Schmidt IAI, Saarbrücken.
COP4020 Programming Languages
Evaluation of the Statistical Machine Translation Service for Croatian-English Marija Brkić Department of Informatics, University of Rijeka
Parser-Driven Games Tool programming © Allan C. Milne Abertay University v
Querying Across Languages: A Dictionary-Based Approach to Multilingual Information Retrieval Doctorate Course Web Information Retrieval Speaker Gaia Trecarichi.
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Rotourier Symantec Dublin.
Leveraging Reusability: Cost-effective Lexical Acquisition for Large-scale Ontology Translation G. Craig Murray et al. COLING 2006 Reporter Yong-Xiang.
Unsupervised learning of Natural languages Eitan Volsky Yasmine Meroz.
Tracking Language Development with Learner Corpora Xiaofei Lu CALPER 2010 Summer Workshop July 12, 2010.
Introduction to Algorithms. What is Computer Science? Computer Science is the study of computers (??) This leaves aside the theoretical work in CS, which.
Introduction to Software Testing. Types of Software Testing Unit Testing Strategies – Equivalence Class Testing – Boundary Value Testing – Output Testing.
Terminology and documentation*  Object of the study of terminology:  analysis and description of the units representing specialized knowledge in specialized.
GUIDE : PROF. PUSHPAK BHATTACHARYYA Bilingual Terminology Mining BY: MUNISH MINIA (07D05016) PRIYANK SHARMA (07D05017)
Benchmarking ontology-based annotation tools for the Semantic Web Diana Maynard University of Sheffield, UK.
Machine Translation (Level 2) Anna Sågvall Hein GSLT Course, January 2003.
For Wednesday No reading Homework –Chapter 23, exercise 15 –Process: 1.Create 5 sentences 2.Select a language 3.Translate each sentence into that language.
Semantics In Text: Chapter 3.
1 Compiler Design (40-414)  Main Text Book: Compilers: Principles, Techniques & Tools, 2 nd ed., Aho, Lam, Sethi, and Ullman, 2007  Evaluation:  Midterm.
The Functions and Purposes of Translators Syntax (& Semantic) Analysis.
A Metrics Program. Advantages of Collecting Software Quality Metrics Objective assessments as to whether quality requirements are being met can be made.
LREC 2004, 26 May 2004, Lisbon 1 Multimodal Multilingual Resources in the Subtitling Process S.Piperidis, I.Demiros, P.Prokopidis, P.Vanroose, A. Hoethker,
Multi-level Bootstrapping for Extracting Parallel Sentence from a Quasi-Comparable Corpus Pascale Fung and Percy Cheung Human Language Technology Center,
Intermediate 2 Computing Unit 2 - Software Development.
FIDELITY IN TRANSLATION AND INTERPRETATION PLAN 1.Fidelity as a phenomenon in translation 2.Verbalizing a simple idea 3.Principles of fidelity 3.1. Primary.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Overview of Compilation Prepared by Manuel E. Bermúdez, Ph.D. Associate Professor University of Florida Programming Language Principles Lecture 2.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
Institute of Informatics & Telecommunications NCSR “Demokritos” Spidering Tool, Corpus collection Vangelis Karkaletsis, Kostas Stamatakis, Dimitra Farmakiotou.
Objective of the course Understanding the fundamentals of the compilation technique Assist you in writing you own compiler (or any part of compiler)
Lecture IV. Basic Translation Theories Plan 1. The Transformational Approach 2. The Denotative Approach 3. The Communicational Approach.
Compiler Design (40-414) Main Text Book:
عمادة التعلم الإلكتروني والتعليم عن بعد
Statistical NLP: Lecture 13
CS 430: Information Discovery
Eiji Aramaki* Sadao Kurohashi* * University of Tokyo
Algorithm Discovery and Design
Chapter 10: Compilers and Language Translation
Presentation transcript:

Evaluating an MT French / English System Widad Mustafa El Hadi Ismaïl Timimi Université de Lille III Marianne Dabbadie LexiQuest - Paris

Context : evaluation of a translation made by an MT System source text : 604.rtf : corpus INRA – corpus biotechnologique sur la reproduction chez l’animal Source language : French Target language : English Available tools : French / English MT System French and English index of all specific words Original Indexes are not aligned Object : set a methodology for non interactive machine translation evaluation Goal of translation : simple understanding of original message (veille) Evaluating an MT French / English System

Evaluation procedure: Prequisites: We cannot carry out verification because we do not have the system specifications We can only carry out evaluation from user point of view We have no reference translation, no gold standard. French speaker intuitive correctness is our standard Evaluating an MT French / English System

Numeric data: Number of words in the source text = 562 Number of unknown words for MT system = 35 Number of NPs in source text and target text Number of VPs in source text and target text Question : should source and target number of NPs and VPs necessarily be equal ? Let us assume that this is the case and check Evaluating an MT French / English System Basis of the evaluation : Gather numeric relevant data in order to identify Problems and incorrections that can be analyzed later On big corpora numeric automated evaluations are the only Are the only efficient way to identify bugs and possible theoretical Weaknesses of a system.

Correction rate = grammatical correctness As will be seen in further slides… Evaluation criteria Informativeness (defined as characteristics of the translation process – output characteristics - quality of translation - quality of a text as a whole) Is the text understandable ?

General language word level : corresponds to two categories (simple lexical morphemes or simple grammatical words. Polysemous words resolution : does the system suggest the right equivalent Segmentation problems Fluency problems (non idiomatic expressions – will be explained but no numeric data given because we assume that MT goal in this case is limited to information Basic criteria to work out informativeness rates:

Metrics Informativeness Rates should be calculated for each criterion For each criterion work out: The total number of words corresponding to this criterion in source and target text and work out precision levels Correction rate Detailed in next slide along with results… Calculate an average of precision rates assigned to all criteria to obtain general informativeness rate Adequacy will be an average of correction rate + informativeness

Generate numeric data to calculate correction rate

Input data to calculate MT system correction rate After a previous source and target text tagging with LATL bilingual parser

1- (Number of target NP – source NPs ) ________________________________ Number of source NPs Calculating MT system correction rate

Further work (when back to Paris…) Checking grammatical correctness Wherever there appears a difference between number of NPs and VPs Finer grained analysis that includes adjectives Try to give a clear diagnostic of any problem (semantic or syntactic) Generate adequacy rates along with analysis

Thank you very much… Now Widad will tell you how to generate informativeness data…