Ariadna Font Llitjós March 10, 2004


AVENUE's last component: Interactive and Automatic Refinement of Translation Rules
Ariadna Font Llitjós
March 10, 2004

Outline
Today: the interactive step
- The Translation Correction Tool
- English-Spanish user study (LREC'04 paper)
Next week: the automatic step
- Rule Refinement Module

AVENUE overview

Motivation
In general:
- MT system output still requires post-editing
- Current systems do not recycle post-editing efforts back into the system, and thus do not improve beyond adding that specific corrected translation to the database
Within AVENUE:
- Communities that speak a low-density language tend not to have computational linguists who can write translation grammars
- Need to validate automatically learned transfer rules (Rule Learning module)

Goal
Simplify the correction task as much as possible. Get naïve bilingual speakers to:
- accurately and minimally correct translations, and
- accurately classify MT errors

Ultimate goal
Learn mappings between incorrect structures and correct structures.
Ex: She saw high woman → She saw the tall woman

Spanish source (SLS): Ella vio a la mujer alta
English target (TLS): She saw high woman
Corrected TLS: She saw the tall woman
MT error classification: missing determiner + wrong sense
Blame assignment: the NP rule that generated the direct object + selectional restrictions
Rule refinement: the Noun Phrase (NP) rule that generated the error, NP -> Adj N, needs to be refined into 2 different cases:
NP -> Det Adj N[sg] (the tall woman)
NP -> (Det) Adj N[pl] ((the) tall women)
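The refinement step above can be sketched as data. This is a minimal illustrative sketch, not the AVENUE implementation: rule fields and the `N.num` constraint name are assumptions chosen for readability.

```python
def refine_np_rule():
    """Sketch: split NP -> Adj N into two cases keyed on noun number."""
    original = {"lhs": "NP", "rhs": ["Adj", "N"]}
    refined = [
        # Singular nouns require a determiner: "the tall woman"
        {"lhs": "NP", "rhs": ["Det", "Adj", "N"], "constraints": {"N.num": "sg"}},
        # Plural nouns take an optional determiner: "(the) tall women"
        {"lhs": "NP", "rhs": ["(Det)", "Adj", "N"], "constraints": {"N.num": "pl"}},
    ]
    return original, refined
```

The point is only that one under-constrained rule is replaced by two more specific ones, driven by the user's correction.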

Research questions What does minimal correction mean? How can we convey this to bilingual informants? What is the easiest and most intuitive way to minimally correct a sentence? Namely, how can users indicate errors? What is the right (intuitive and easy) MT error classification that will help the most in the Rule Refinement task? Can naïve users actually tell what the source of an error is? Should we make the MT error classification finer depending on user profiles? Does it make a difference if there is more context than just a sentence?

The Translation Correction Tool (TCTool)
- Online tool: bilingual users can access it from anywhere with a computer and an internet connection (office, home, etc.)
- User friendly and easy to use (not just for linguists or computer experts)
- Provides translation and error classification help (23-page tutorial + error-example page)
- Elicits as much information about translation errors from users as possible
- Initial MT error classification, expected to change after the user studies
A first attempt to answer these questions resulted in the TCTool.

http://avenue.lti.cs.cmu.edu/aria/spanish/

MT error classification First approach, expected to change after the user studies. 9 linguistically motivated error types: word order, sense, agreement (number, person, gender, tense), form (case, POS), incorrect word, and no translation. Users were given the agreement options, but case and POS had to be indistinguishably classified as form.
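The initial taxonomy above can be written out as an enumeration. A sketch only: the identifier names are assumptions, not the TCTool's actual labels.

```python
from enum import Enum

class MTError(Enum):
    """The 9 linguistically motivated error types of the first classification."""
    WORD_ORDER = "word order"
    SENSE = "sense"
    AGREEMENT_NUMBER = "agreement: number"
    AGREEMENT_PERSON = "agreement: person"
    AGREEMENT_GENDER = "agreement: gender"
    AGREEMENT_TENSE = "agreement: tense"
    FORM = "form (case or POS, not distinguished by users)"
    INCORRECT_WORD = "incorrect word"
    NO_TRANSLATION = "no translation"
```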

English-Spanish user study
32 sentences from the elicitation corpus: 4 correct / 28 incorrect
Examples:
sl: mary and anna are falling
tl: maría y ana están cayendo
sl: you saw the woman
tl: viste la mujer
tl: vió la mujer
sl: i used my elbow to push the button
tl: usé mi codo que apretar el botón
sl: we are building new bridges in the city
tl: nosotros estamos construyó nuevo puentes dentro la ciudad

English-Spanish MT system
12 manually written translation rules (2 for S, 7 for NP and 3 for VP)
442 lexical entries (designed to translate the first 400 sentences of the elicitation corpus)

Correction Example

Editing a word

User stats
29 users completed the evaluation (Spain 24, Colombia 4, Mexico 1).
66% of the users did not have any background in Linguistics; 75% had a graduate degree and 25% had a Bachelor's degree.
Fixed 26.6 translations on average (out of 32; 28 needed fixing).
Duration: ~1 hour 30 min [28 min - 5 hours], about 3 minutes per sentence.

Session log stats (time stamp: Mar 9 2004, 22:51:13)
- 83 sessions, but only 55 distinct IP addresses
- queries by natives of US English: 2; by non-natives: 69; unspecified: 12
Stats for all users:
- 29 users finished the user studies (all 29 filled out the questionnaire)
- 15 users did not start the evaluation
- 39 users started (out of 68) but did not finish all sentences

Gold standard To measure user accuracy in detecting and classifying errors, we need to establish exactly what the minimum number of errors and corrections per translation is. We created a gold standard, which determines the least number of errors that must be corrected and what the error types are. It does not include corrections that might make the translation more fluid but do not change it from incorrect to correct (such as removing the subject pronoun in Spanish).

Accuracy measures (w.r.t. gold standard)
Computed for 10 of the 29 users: all from Spain; 2 had a Linguistics background; 2 had a Bachelor's degree, 5 a Masters and 3 a PhD.
To measure accuracy, i.e. how close users are to the gold standard, we looked at precision, recall and the F1 measure:
- Precision: the proportion of detected errors that the user fixed correctly (# errors detected correctly / # errors detected). Since we are also interested in how accurately users identify the type of an error, we also estimated the precision with which users checked the right error type.
- Recall: the proportion of the errors in the translations that the user detected (# errors detected correctly / # errors in the gold standard).
- There is usually a trade-off between precision and recall; the F1 measure is an even combination of the two, defined as 2pr / (p + r).
All three measures fall in the range from 0 to 1, with 1 being the best score.
We are interested in high precision, even at the expense of lower recall:
- ideally no false positives (users correcting something that is not strictly necessary)
- we care less about false negatives (errors that were not corrected)
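The three measures above can be computed directly from the definitions. A minimal sketch; the example counts in the test are made up, not study data.

```python
def precision_recall_f1(detected_correctly, detected, in_gold):
    """Precision, recall and F1 from error counts.

    detected_correctly: errors the user both found and fixed/classified correctly
    detected:           errors the user flagged at all
    in_gold:            errors present in the gold standard
    """
    p = detected_correctly / detected if detected else 0.0
    r = detected_correctly / in_gold if in_gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1
```

Note how a cautious user who flags few errors but fixes them all gets high precision and low recall, which is exactly the trade-off the slide prefers.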

Analyzing results
Users did not always fix a translation in the same way.
Most of the time, when the final translation was not identical to the gold standard, it was still correct or even better.
Users produced only 2.5 translations on average (out of 26.6) that were worse than the gold standard.
There does not seem to be a time-accuracy correlation.

Usability questionnaire
All users said it was easy to determine whether a sentence was correct (reality: 89% accuracy).
Only 88% of the users thought it was easy to determine the source of an error (reality: 73% accuracy).

Users thought the TCTool is user friendly (82%)

But the alignment representation could be improved (67%). Pie charts for all questions are at the end.

Conclusions
The MT error classification needs to depart from linguistically motivated classes and be motivated by Rule Refinement operations instead.
The TCTool is usable, but some improvements are needed:
- Make the tutorial dynamic, with movies (Ken)
- Make the alignment representation less confusing (done)
- Add login capability (so that users can take breaks and not lose their work)
- Improve the edit-word pop-up window interface

TCTool questionnaire stats
Ariadna Font Llitjós, March 2, 2004
24 users

(Pie chart slides: total users = 12 and total users = 14, respectively.)