What a professional translator should know about Machine Translation Harold Somers Professor Emeritus University of Manchester.



2 Background
Machine Translation (MT): 60-year-old technology, firmly established (esp. free online MT) as viable, though flawed.
Professional translators’ reservations
– fears (misplaced, as it turns out) that it would take work away from them
– disgust at the bad image MT gives to the profession
Recent developments suggest the need for a more reconciliatory approach
– MT as a “colleague” rather than a rival
Need a better understanding of what MT can and, more importantly, cannot do.

3 Overview
Focusing on full text MT …
How MT works
Strengths and weaknesses
What should translators say about MT?
Assumption: we’re mostly talking about free online MT here

4 History of MT
Crude early attempts with unsophisticated computers and naïve linguistic approach
– mainly word-for-word
Linguistic rule-based programs
– some successes, especially with “sublanguage”
– requires much effort to build
1991-…: Statistics-based programs, learning translation patterns from large amounts of data
– quick to develop if data is available
– surprisingly good quality (but see later)

5 How does (S)MT work?
Requires huge amounts of parallel (bilingual) data, i.e. texts and their translations
Programs automatically align the texts (sentence-by-sentence where possible), then extract (or “learn”) translation probabilities (“models”)
At run time, probabilities are juggled to get the highest scoring result
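The "learning" step can be illustrated with a minimal sketch. It assumes the texts are already sentence-aligned and uses a simple expectation-maximisation loop in the style of IBM Model 1, which is one well-known way to estimate word translation probabilities; the slides do not say which method any particular system uses. The three-sentence corpus and the function names are invented for illustration.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (hypothetical data): (source sentence, target sentence).
CORPUS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]


def train_word_model(corpus, iterations=10):
    """Estimate P(target word | source word) with IBM Model 1 style EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    # Start with no preference: every target word is equally likely.
    prob = {(t, s): 1.0 / len(tgt_vocab) for t in tgt_vocab for s in src_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for t in tgt:
                # Share each target word among the source words of its sentence,
                # in proportion to the current probabilities.
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        # Re-estimate the probabilities from the fractional counts.
        prob = {(t, s): count[(t, s)] / total[s] for (t, s) in prob}
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


model = train_word_model(CORPUS)
for word in ("das", "haus", "buch", "ein"):
    print(word, "->", best_translation(model, word))
```

Nothing in the sketch knows any German or English: the pairings emerge only from which words keep turning up in each other's company, which is why the amount and kind of training data matter so much.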

6 A little more detail
Actually, two models are learned from the data:
– Translation model: given words and word sequences in SL, what are the most likely corresponding words in the TL?
– Target-language model: given these corresponding TL words, what is the most likely way in which they will be combined?
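The role of the target-language model can be sketched as follows. This is a toy illustration under stated assumptions: a bigram model with add-one smoothing, trained on three invented sentences, and a brute-force search over every ordering of the candidate words. Real systems use far larger models and a smarter search; all names here are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Hypothetical target-language training text.
TL_TEXT = ["the house is small", "the book is small", "the house is old"]


def train_bigram_model(sentences):
    """Count word pairs, with <s> and </s> marking sentence start and end."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_prob(sentence_tokens, model):
    """Log probability of a word sequence under the bigram model."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + list(sentence_tokens) + ["</s>"]
    score = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing gives unseen word pairs a small, non-zero probability.
        score += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return score


def best_order(words, model):
    """Try every ordering of the words and keep the most probable one."""
    return " ".join(max(permutations(words), key=lambda order: log_prob(order, model)))


lm = train_bigram_model(TL_TEXT)
print(best_order(["small", "house", "is", "the"], lm))
```

The model prefers orderings whose adjacent word pairs it has seen before. Because it only looks at neighbouring words, it has nothing to say about relations between words that are far apart.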

7 So how well does that work?
Let’s look first at what makes translation hard for a computer …
… then see how well SMT handles these difficulties …
… and what we can conclude from that

8 Why is translation hard for a computer?
Language is highly ambiguous
– You may not have realised that, but for a computer it is true
Translation largely requires genuine understanding
– Debatable in some cases, but undoubtedly often true
Translation is all about style
– Well, sometimes it is!

9 Language is difficult
Individual words
– ambiguous morphology (numb:number, tow:tower)
– homonymy (round, bank, last, flush)
– polysemy (report, range)
– translation divergences (wall = muro/parete)
Sequences of words
– local ambiguity (car boot sale, He shot the man with a gun)
– global ambiguity (Time flies like an arrow)
– “Dependencies” (He left the passage that had taken him so long to compose out)
– TL grammar (This bed has been slept in)
Humans use their general understanding of context or plausibility, and so often don’t even notice the ambiguity
“Contrastive” knowledge of languages is a big part of what translation is about

10 How does MT cope?
Ambiguity
– “Translation models” handle not only individual words, but word sequences
– If the model has the wrong interpretation, the system is likely to reproduce it
– Also, dependencies between words (which can be arbitrarily distant from each other) are more difficult to capture
– Target-language models may also help here

11 How does MT cope?
Style and nuance
– Both translation and TL models can only reflect the data on which they have been trained
– Probability data is generally not fine-grained enough to capture niceties
– Again, anything that depends on long-distance dependencies is unlikely to shine through

12 What are MT’s strengths?
Impact of training data is paramount:
– MT performs best when translating the kind of text it has been trained on
– This was also true of rule-based systems
– Somewhat true of (specialised) human translators too
Tension between
– need to use as much material as possible for training
– desire (e.g. Google) to provide a generic translation service
– trade-off between coverage and translation quality

13 What are MT’s strengths?
MT in general performs well with
– simple grammatical source text, free of ambiguities, colloquialisms, etc., for which style and nuance are not so important
Happily, these are the kinds of texts that human translators find least engaging
However well MT manages, it is not as reliable as a human

14 What you should say about MT
It’s good (even preferable) for some things
Mainly translation into the client’s language (“assimilation”)
– Reading a document in a foreign language to see what it’s about and whether they need a proper translation, or which bits need translation
– They may feel able (if they know the source language) to tidy it up (“post-editing”, “revision”) themselves, though they should always be aware of the risk involved
Rough and ready translation into a foreign language
– e.g. for informal communication with someone who can tolerate a rough translation
– Again, the risks must be emphasised
– Possible use (even by translators) of MT as a first draft: post-editing

15 What you should say about MT
But for other things MT might be quite unsuitable, and HT (human translation) is still a better bet
– Certainly any document (e.g. for publication) where the quality of the translation will reflect on your client
– Any document where style and presentation are important
– Any document where accuracy is crucial
– Translation into a target language that the client does not know at all carries a major risk

16 A final word of warning
Clients might like to evaluate an MT system for themselves
A common method is back-and-forth (“round trip”) translation
This has some major drawbacks:
– A bad round trip may be caused by a bad outward trip or a bad return trip … hard to know which
– A good round trip may hide a bad translation, e.g. word-for-word nonsense in the TL, which comes back as the same original source text
So round-trip translation of a single sentence won’t tell you much … so test it with a longer text: if it does OK it may be a fair result; if it does badly you can never be sure why
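The second drawback is easy to demonstrate with a deliberately naive sketch. The tiny English-to-Italian glossary and the function names below are invented for illustration; the "system" simply swaps each word for its glossary entry, which is not how any real MT service works.

```python
# Hypothetical word-for-word English-to-Italian glossary, for illustration only.
EN_IT = {"time": "tempo", "flies": "mosche", "like": "come", "an": "una", "arrow": "freccia"}
# The return trip uses the same glossary turned around.
IT_EN = {it: en for en, it in EN_IT.items()}


def word_for_word(sentence, glossary):
    """Replace each word by its glossary entry; unknown words pass through unchanged."""
    return " ".join(glossary.get(word, word) for word in sentence.lower().split())


def round_trip(sentence):
    """Translate out and back, returning both the outward and the return text."""
    outward = word_for_word(sentence, EN_IT)
    back = word_for_word(outward, IT_EN)
    return outward, back


outward, back = round_trip("Time flies like an arrow")
print(outward)
print(back)
```

The return text matches the source (apart from capitalisation), yet the outward text uses "mosche", the insects, for "flies" and has no verb, so it is nonsense in Italian. A client who looks only at the return trip would rate this system highly.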

17 Thank you