Machine Translation Course 7 Diana Trandabăț Academic year 2014-2015.

What goes wrong?
We see many errors in machine translation when we only look at the word level:
– Missing content words
  MT: Condemns US interference in its internal affairs.
  Human: Ukraine condemns US interference in its internal affairs.
– Verb phrase
  MT: Indonesia said that oppose the presence of foreign troops.
  Human: Indonesia reiterated its opposition to foreign military presence.

What goes wrong?
– Wrong dependencies
  MT: …, particularly those who cheat the audience the players.
  Human: …, particularly those players who cheat the audience.
– Missing articles
  MT: …, he is fully able to activate team.
  Human: …, he is fully able to activate the team.

What goes wrong?
– Word salad: the world arena on top of the u. s. sampla competitors, and since mid – july has not appeared in sports field, the wounds heal go back to the situation is very good, less than a half hours in the same score to eliminate 6:2 in light of the south african athletes to the second round.
As opposed to letter salad.

How can we improve?
Relying on the language model to produce more ‘accurate’ sentences is not enough.
Many of the problems can be considered ‘syntactic’.
Perhaps MT systems don’t know enough about what is important to people.
So, include syntax in MT:
– Build a model around syntax, or
– Include syntax-based features in a model

Syntax-based translation
One criticism of phrase-based MT is that it does not model structural or syntactic aspects of the language. Syntax-based MT uses parse trees to capture linguistic differences such as word order and case marking.
Reordering for syntactic reasons
– e.g., move German object to end of sentence
Better explanation of function words
– e.g., prepositions, determiners
Conditioning on syntactically related words
– translation of a verb may depend on its subject or object
Use of syntactic language models

Syntax-based MT
You have a sentence and its parse tree.
The children at each node in the tree are rearranged.
New nodes may be inserted before or after a child node.
These new nodes are assigned a translation.
Each of the leaf lexical nodes is then translated.

A syntax-based model
Assume word order is based on a reordering of the source syntax tree > Reorder
Assume null-generated words happen at syntactic boundaries > Insert
(For now) Assume a word translates into a single word > Translate

Reorder

Insert

Translate
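The three operations (Reorder, Insert, Translate) can be sketched on a toy tree. This is only an illustration under stated assumptions: the `Node` class, the three lookup tables and the English-to-Japanese example are hypothetical, hand-written stand-ins for learned parameters, and each step picks a single fixed choice rather than sampling from a probability distribution.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on leaf nodes


def reorder(node, order_table):
    """Permute each node's children; the table is keyed by the child label sequence."""
    if not node.children:
        return node
    kids = [reorder(c, order_table) for c in node.children]
    key = tuple(c.label for c in kids)
    perm = order_table.get(key, range(len(kids)))  # default: keep source order
    return Node(node.label, [kids[i] for i in perm])


def insert(node, insert_table):
    """Add a new leaf left or right of a child, keyed by (parent label, child label)."""
    if not node.children:
        return node
    kids = []
    for child in node.children:
        child = insert(child, insert_table)
        side, word = insert_table.get((node.label, child.label), ("none", None))
        if side == "left":
            kids.append(Node("INS", word=word))
        kids.append(child)
        if side == "right":
            kids.append(Node("INS", word=word))
    return Node(node.label, kids)


def translate(node, lexicon):
    """Translate source leaves word-for-word; inserted leaves already hold target words."""
    if not node.children:
        if node.label == "INS":
            return node
        return Node(node.label, word=lexicon[node.word])
    return Node(node.label, [translate(c, lexicon) for c in node.children])


def leaves(node):
    """Read the words off the tree from left to right."""
    if not node.children:
        return [node.word]
    return [w for c in node.children for w in leaves(c)]


# Toy English parse: (S (PRP he) (VP (VB reads) (NNS books)))
tree = Node("S", [Node("PRP", word="he"),
                  Node("VP", [Node("VB", word="reads"), Node("NNS", word="books")])])

# Hypothetical hand-written tables standing in for learned parameters.
order_table = {("VB", "NNS"): (1, 0)}            # verb-object becomes object-verb
insert_table = {("S", "PRP"): ("right", "ha"),   # particle after the subject
                ("VP", "NNS"): ("right", "wo")}  # particle after the object
lexicon = {"he": "kare", "reads": "yomu", "books": "hon"}

result = translate(insert(reorder(tree, order_table), insert_table), lexicon)
print(" ".join(leaves(result)))  # kare ha hon wo yomu
```

In a trained model each table lookup would be replaced by a probability distribution over the possible choices at that node.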

Syntactic language models
Good syntax tree => good target language
Allows for long-distance constraints

Parameters
Reorder (R) – child node reordering
– Can take any possible child node reordering
– Defines word order in translation sentence
– Conditioned on original child node order
– Only applies to non-leaf nodes
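To make "any possible child node reordering" concrete, the sketch below enumerates the candidate reorderings for one node. The child labels and the uniform probabilities are placeholders chosen for illustration; a real R-table would be estimated from aligned data.

```python
from itertools import permutations
from math import factorial


def reorderings(child_labels):
    """All orderings of a node's children, as tuples of original positions."""
    return list(permutations(range(len(child_labels))))


def uniform_reorder_table(child_labels):
    """Placeholder R-table entry: P(reordering | original child order), uniform."""
    options = reorderings(child_labels)
    return {perm: 1.0 / len(options) for perm in options}


children = ("PRP", "VB1", "VB2")  # hypothetical child label sequence
table = uniform_reorder_table(children)
for perm, prob in table.items():
    print([children[i] for i in perm], round(prob, 3))

# The number of candidate reorderings grows factorially with the child count.
print([factorial(k) for k in range(1, 7)])  # [1, 2, 6, 24, 120, 720]
```

Positions are permuted rather than labels so that two children with the same label still count as distinct.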

Parameters cont.
Insertion (N) – placement and translation
– Left, right, or none
– Defines word to be inserted
– Place conditioned on current and parent labels
– Word choice is unconditioned

Parameters cont.
Translation (T) – 1 to 1
– Conditioned only on source word
– Can take on null
Translation (T) – N to N
– Consider word fertility (for 1-to-N mapping)
– Consider phrase translation at each node
– Limit size of possible phrases
– Mix phrasal w/ word-to-word translation
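One way the R, N and T parameters could combine into a score for a single derivation is sketched below. It assumes the decisions are independent, so the derivation probability is the product of the individual probabilities; the slides do not state this explicitly, and all numbers are made up.

```python
import math


def derivation_log_prob(reorder_probs, insert_probs, translate_probs):
    """Sum of log-probabilities of every R, N and T decision in one derivation."""
    all_probs = [*reorder_probs, *insert_probs, *translate_probs]
    return sum(math.log(p) for p in all_probs)


# Hypothetical probabilities for a tiny derivation.
r = [0.5, 1.0]       # reorder decisions at non-leaf nodes
n = [0.5, 0.5, 1.0]  # insertion decisions (left, right, or none)
t = [0.5, 1.0, 1.0]  # word translation decisions at the leaves
score = derivation_log_prob(r, n, t)
print(round(math.exp(score), 4))  # 0.0625
```

Working in log space avoids numerical underflow when a sentence involves many decisions.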

Do we need the entire model to be based on syntax?
Good performance increase
Large computational cost
– Many permutations to CFG rules
How about trying something else?
– Add syntax-based features that look for more specific things

Syntax-based Features
Shallow
– POS and Chunk Tag counts
– Projected POS language model
Deep
– Tree-to-string
– Tree-to-tree
– Verb arguments

Shallow Syntax-Based Features
POS and chunk tag count
– Low-level syntactic problems with baseline system: too many articles, commas and singular nouns; too few pronouns, past tense verbs, and plural nouns.
– Reranker can learn balanced distributions of tags from various features
– Examples:
  Number of NPs in English
  Difference in number of NPs between English and Chinese
  Number of Chinese N tags translated to only non-N tags in English
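The first two example features can be computed with simple counting. The sketch below assumes both sentences have already been chunked; the function name, feature names and tag sequences are hypothetical.

```python
from collections import Counter


def tag_count_features(english_chunks, chinese_chunks, tags=("NP",)):
    """Count selected chunk tags in the English hypothesis and compare with the source."""
    en, zh = Counter(english_chunks), Counter(chinese_chunks)
    features = {}
    for tag in tags:
        features[f"en_{tag}_count"] = en[tag]
        features[f"{tag}_count_diff"] = en[tag] - zh[tag]
    return features


# Hypothetical chunk tag sequences for one hypothesis and its source sentence.
english_chunks = ["NP", "VP", "NP", "PP", "NP"]
chinese_chunks = ["NP", "VP", "NP"]
print(tag_count_features(english_chunks, chinese_chunks))
# {'en_NP_count': 3, 'NP_count_diff': 1}
```

A reranker would receive these values alongside the baseline model scores for each candidate translation.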

Shallow Syntax-Based Features
Projected POS language model
– Use word-level alignments to project Chinese POS tags onto the English words
  Possibly keeping relative position within Chinese phrase
  Possibly keeping NULLs in POS sequence
  Possibly using lexicalized NULLs from English word
– Use the POS tags to train a language model based on POS N-grams
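The projection step can be sketched as follows, assuming a one-to-one word alignment given as a dictionary. The example sentence, the alignment and the function names are hypothetical; the N-gram counts at the end are only the raw material for a language model, not a smoothed model.

```python
from collections import Counter


def project_pos(english_words, chinese_pos, alignment, lexicalize_null=False):
    """alignment maps an English word index to the index of its aligned Chinese word."""
    projected = []
    for i, word in enumerate(english_words):
        if i in alignment:
            projected.append(chinese_pos[alignment[i]])
        elif lexicalize_null:
            projected.append("NULL_" + word)  # lexicalized NULL from the English word
        else:
            projected.append("NULL")
    return projected


def ngram_counts(tags, n=2):
    """Count POS n-grams, padding with sentence boundary markers."""
    padded = ["<s>"] * (n - 1) + tags + ["</s>"]
    return Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))


english = ["I", "like", "the", "music"]
chinese_pos = ["PN", "VV", "NN"]  # tags of a hypothetical three-word source sentence
alignment = {0: 0, 1: 1, 3: 2}    # "the" has no aligned Chinese word

tags = project_pos(english, chinese_pos, alignment)
print(tags)  # ['PN', 'VV', 'NULL', 'NN']
print(project_pos(english, chinese_pos, alignment, lexicalize_null=True))
# ['PN', 'VV', 'NULL_the', 'NN']
print(ngram_counts(tags)[("VV", "NULL")])  # 1
```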

Deep Syntax-based MT

Deep Syntax-Based Features
Tree to string
– Models explain how to transduce a structural representation of the source language input into a string in the target language
– During decoding:
  Parse the source string to derive its structure
  Decoding explores various ways of decomposing the parse tree into a sequence of composable models, each generating a translation string on the target side
  The best-scoring string can be selected as the translation

Deep Syntax-Based Features
String-to-Tree
– Models explain how to transduce a string in the source language into a structural representation in the target language
– During decoding:
  No separate parsing on source side
  Decoding results in a set of possible translations, each annotated with syntactic structure
  The best-scoring string + structure can be selected as the translation
ne VB pas → (VP (AUX (does) RB (not) x2
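A string-to-tree rule of the kind shown on this slide can be sketched as a function from source tokens to a target-side tree fragment. This is one reading of the rule, with x2 taken to mean the translation of the second source item (the verb); the tuple tree encoding, the function names and the one-entry lexicon are assumptions for illustration.

```python
def apply_ne_pas_rule(source_tokens, verb_lexicon):
    """Rule: ne VB pas -> VP(AUX(does), RB(not), x2), with x2 bound to the verb."""
    if len(source_tokens) != 3:
        return None
    first, verb, last = source_tokens
    if first != "ne" or last != "pas" or verb not in verb_lexicon:
        return None
    return ("VP", ("AUX", "does"), ("RB", "not"), ("VB", verb_lexicon[verb]))


def yield_of(tree):
    """Read the target string off the leaves of a (label, child, ...) tuple tree."""
    label, *rest = tree
    words = []
    for item in rest:
        words.extend(yield_of(item) if isinstance(item, tuple) else [item])
    return words


verb_lexicon = {"va": "go"}  # hypothetical one-entry lexicon
tree = apply_ne_pas_rule(["ne", "va", "pas"], verb_lexicon)
print(tree)
print(" ".join(yield_of(tree)))  # does not go
```

The output carries both the target string and its syntactic structure, which is what lets the decoder score string + structure together.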

String-to-Tree
– Learn a direct translation model from word-level aligned corpus
– Extract reordering patterns

Deep Syntax-Based Features
Tree to Tree
– Models explain how to transduce a structural representation of the source language input into a structural representation in the target language
– During decoding:
  The decoder synchronously explores alternative ways of parsing the source-language input string and transducing it into corresponding target-language structural output.
  The best-scoring structure + structure pair can be selected as the translation

Tree to Tree cont.
At each level of the tree:
1. At most one of the current node’s children is grouped with the current node into a single elementary tree, with its probability conditioned on the current node and its children.
2. An alignment of the children of the current elementary tree is chosen, with its probability conditioned on the current node and the children of the child in the elementary tree.
This is similar to the reorder operation in the tree-to-string model, but allows for node addition and removal.
Leaf-level parameters are ignored when calculating the probability of tree-to-tree.

Verb Arguments
Idea: A feature that counts the difference in the number of arguments to the main verb between the source and target sentences
Perform a breadth-first search traversal of the dependency trees
– Mark the first verb encountered as the main verb
– The number of arguments is equal to the number of its children
– Account for differences in the number of arguments
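The traversal described on this slide can be sketched as below. The dependency trees are given as plain dictionaries, a POS tag starting with "V" is assumed to mark a verb, and the absolute difference is an assumed way of turning the two counts into one feature; the example trees are hypothetical and modelled on the "missing content words" error from the first slide.

```python
from collections import deque


def main_verb_arg_count(root, children, pos):
    """Breadth-first search for the first verb; its child count is the argument count."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if pos[node].startswith("V"):
            return len(children.get(node, []))
        queue.extend(children.get(node, []))
    return 0  # no verb found


def verb_arg_diff(source, target):
    """Each side is a (root, children, pos) triple describing one dependency tree."""
    return abs(main_verb_arg_count(*source) - main_verb_arg_count(*target))


# Hypothetical source tree (shown with English glosses): the verb has two arguments.
source = ("condemn",
          {"condemn": ["Ukraine", "interference"], "interference": ["US"]},
          {"condemn": "VV", "Ukraine": "NR", "interference": "NN", "US": "NR"})

# Hypothetical MT output "Condemns US interference": the subject is missing.
target = ("condemns",
          {"condemns": ["interference"], "interference": ["US"]},
          {"condemns": "VBZ", "interference": "NN", "US": "NNP"})

print(verb_arg_diff(source, target))  # 1
```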

Syntax-augmented Phrase-based MT
Similar to phrase-based machine translation, but includes syntax in the creation of phrases.

“The world cannot be translated; It can only be dreamed of and touched.” Dejan Stojanovic, The Creator

As opposed to “letter salad”
I cnduo't bvleiee taht I culod aulaclty uesdtannrd waht I was rdnaieg. Unisg the icndeblire pweor of the hmuan mnid, aocdcrnig to rseecrah at Cmabrigde Uinervtisy, it dseno't mttaer in waht oderr the lterets in a wrod are, the olny irpoamtnt tihng is taht the frsit and lsat ltteer be in the rhgit pclae. The rset can be a taotl mses and you can sitll raed it whoutit a pboerlm. Tihs is bucseae the huamn mnid deos not raed ervey ltteer by istlef, but the wrod as a wlohe. Aaznmig, huh? Yaeh and I awlyas tghhuot slelinpg was ipmorantt! See if yuor fdreins can raed tihs too.