Computational Linguistics Seminar LING-696G Week 9

IBM Models 4 and 5

Model 3:
1. translation of words (T-table), t(e|f)
2. reordering (distortion)
3. insertion of words (NULL insertion)
4. dropping of words (words with fertility 0), and one-to-many translation (fertility)
- alignment (distortion) search space too big: uses sampling with hill-climbing
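Because the Model 3 alignment space is too large to enumerate, training samples alignments and improves them greedily. A rough sketch of the usual move/swap hill-climbing neighborhood in plain Python; the function names and the scoring hook are mine, not the course code:

# Sketch: a[j] gives the foreign position aligned to English position j (0 = NULL).
def neighbors(a, num_foreign):
    """Yield alignments that differ from a by one move or one swap."""
    for j in range(len(a)):                    # move: re-align one English word
        for i in range(num_foreign + 1):       # position 0 is NULL
            if i != a[j]:
                b = list(a)
                b[j] = i
                yield b
    for j1 in range(len(a)):                   # swap: exchange two links
        for j2 in range(j1 + 1, len(a)):
            if a[j1] != a[j2]:
                b = list(a)
                b[j1], b[j2] = b[j2], b[j1]
                yield b

def hillclimb(a, num_foreign, score):
    """Greedily improve alignment a under a scorer such as the Model 3 probability."""
    best, best_score = a, score(a)
    improved = True
    while improved:
        improved = False
        for b in neighbors(best, num_foreign):
            s = score(b)
            if s > best_score:
                best, best_score, improved = b, s, True
    return best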

IBM Models 4 and 5

Uses POS tagging (word classes):
- need a tagger for foreign words (or some clustering method)

Uses a relative distortion model:
- cept: foreign word mapping to at least one English word
- center of a cept: ceiling(average output position)
- offset from center of previous cept

[Figure: example alignment showing cepts π_1 to π_5 and their English output words, including "not" and "go"]
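The center of a cept is just the ceiling of the mean of the English positions it generates; a tiny sketch of that computation (1-based positions, as in the figure):

import math

def cept_center(output_positions):
    """Center of a cept: ceiling of the average aligned English position."""
    return math.ceil(sum(output_positions) / len(output_positions))

# e.g. a cept whose words land at English positions 3 and 4 has center 4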

IBM Models 4 and 5

Relative distortion:
- first word of a cept i: d_1(j − ⊙_{i−1})
  Example: the relative distortion for the 1st word of π_3 is −1, since ⊙_2 = 4
- subsequent words of a cept i: π_{i,k} = position of the kth word in the ith cept; d_{>1}(j − π_{i,k−1})
  Example: the position of "to" is π_{4,0} = 5, and the position of "the" is j = 6
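Putting the two cases together, the offsets fed to d_1 and d_>1 for one cept follow directly from its word positions and the previous cept's center. A small illustrative sketch (names are mine):

def relative_offsets(cept_positions, prev_center):
    """Distortion offsets for one cept.

    cept_positions: English positions pi_{i,k} of the cept's words, in order.
    prev_center:    center of the previous cept (the circled i-1 above).
    """
    offsets = [cept_positions[0] - prev_center]                     # d_1 case
    for k in range(1, len(cept_positions)):
        offsets.append(cept_positions[k] - cept_positions[k - 1])   # d_>1 case
    return offsets

# Slide example: with the previous center at 4, an offset of -1 for the first
# word of pi_3 corresponds to output position j = 3: relative_offsets([3], 4) == [-1]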

IBM Models 4 and 5

Relative distortion:
- conditioned on POS tags
- for the initial word in a cept: d_1(j − ⊙_{[i−1]} | A(f_{[i−1]}), B(e_j))
- for additional words: d_{>1}(j − π_{i,k−1} | B(e_j))
- A(f) and B(e) are two functions that map words to their word classes
- the operator [i] maps cept i back to its corresponding foreign input word position
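As a data structure this is just a lookup keyed by the offset and the relevant word classes; a hypothetical sketch of such parameter tables (the nested-dict layout is illustrative, not NLTK's):

def d1_prob(d1_table, j, prev_center, prev_foreign_word, e_word, A, B):
    """d_1(j - center_[i-1] | A(f_[i-1]), B(e_j))"""
    return d1_table[(A[prev_foreign_word], B[e_word])][j - prev_center]

def d_gt1_prob(dgt1_table, j, prev_word_pos, e_word, B):
    """d_>1(j - pi_{i,k-1} | B(e_j))"""
    return dgt1_table[B[e_word]][j - prev_word_pos]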

NLTK IBM Model 4

class nltk.translate.ibm4.IBMModel4(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)

>>> from nltk.translate import AlignedSent
>>> from nltk.translate.ibm4 import IBMModel4
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5}
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6}
>>> ibm4 = IBMModel4(bitext, 5, src_classes, trg_classes)
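After training, the learned parameters and the alignments assigned to the training pairs can be inspected; the attribute names below (translation_table on the model, alignment on each AlignedSent) follow NLTK's IBM-model interface, and the actual numbers will vary from run to run:

>>> round(ibm4.translation_table['buch']['book'], 3)   # t(buch | book), should be high
>>> bitext[2].words
['das', 'buch', 'ist', 'ja', 'klein']
>>> bitext[2].mots
['the', 'book', 'is', 'small']
>>> bitext[2].alignment   # best alignment found during training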

IBM Model 5

Newer distortion model to prevent deficiency:
- With Models 3 and 4, nothing prohibits the placement of an output word into a position that has already been filled.
- Let v_j be the number of vacancies in the English output interval [1; j]
- d_1(v_j | B(e_j), v_{⊙_{i−1}}, v_max)   (A(f_{[i−1]}) is gone!)
- d_{>1}(v_j − v_{π_{i,k−1}} | B(e_j), v_max′)
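The vacancy counts that Model 5 conditions on can be computed directly: v_j is the number of still-empty output positions in [1; j]. A minimal sketch (1-based positions; "filled" is whatever set of positions earlier cepts have already occupied):

def vacancies(filled, j):
    """v_j: number of vacant English positions in the interval [1; j]."""
    return sum(1 for pos in range(1, j + 1) if pos not in filled)

# Conditioning d_1 and d_>1 on vacancy counts (and only allowing placement into
# vacant positions) is what removes the deficiency of Models 3 and 4.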

NLTK IBM Model 5

class nltk.translate.ibm5.IBMModel5(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)

>>> from nltk.translate.ibm5 import IBMModel5
>>> ibm5 = IBMModel5(bitext, 5, src_classes, trg_classes)

Alignment Matrix

English: rows; Foreign language: columns

Alignment Matrix

English: rows; Foreign language: columns
"bit into the grass" = phrasal alignment
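A small helper of my own (not part of NLTK) that prints such a matrix from a list of (english_index, foreign_index) links; "*" marks an alignment point:

def print_alignment_matrix(eng_words, foreign_words, links):
    """English words as rows, foreign words as columns."""
    points = {(e, f) for e, f in links if f is not None}
    col_w = max(len(w) for w in foreign_words) + 1
    row_w = max(len(w) for w in eng_words) + 1
    print(" " * row_w + "".join(w.ljust(col_w) for w in foreign_words))
    for e, ew in enumerate(eng_words):
        row = "".join(("*" if (e, f) in points else ".").ljust(col_w)
                      for f in range(len(foreign_words)))
        print(ew.ljust(row_w) + row)

# e.g. print_alignment_matrix(['the', 'house'], ['das', 'Haus'], [(0, 0), (1, 1)])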

ibm3a.py

ibm1 training data:
1. the a book house
2. das ein Buch Haus
3. the house
4. das Haus
5. the book
6. das Buch
7. a book
8. ein Buch

ibm3a training data:
1. to the house
2. das ja zum Haus
3. the house
4. das ja Haus
...
to the house
12. zum Haus

fertility counts, e.g. line 5: das:1 ja:0 Haus:1

ibm_fertility.py
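The script itself is not reproduced in the transcript; a hypothetical sketch of the fertility counting described on the previous slide (fertility of a foreign word = number of English words it generates) might look like this:

from collections import Counter

def fertility_counts(foreign_words, links):
    """Fertility of each foreign word under one word alignment.

    links: (english_index, foreign_index) pairs, 0-based; foreign_index may be
    None for English words generated by NULL.
    """
    counts = Counter(f for _, f in links if f is not None)
    return {word: counts[i] for i, word in enumerate(foreign_words)}

# "the house" ||| "das ja Haus" with the->das and house->Haus:
print(fertility_counts(['das', 'ja', 'Haus'], [(0, 0), (1, 2)]))   # {'das': 1, 'ja': 0, 'Haus': 1}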

ibm3a.py

ibm3a training data:
1. to the house
2. das ja zum Haus
3. the house
4. das ja Haus
...
to the house
12. zum Haus

the house : das Haus
the house : ja Haus
the house : das ja
the house : das das
the house : ja ja
the house : Haus Haus

IBM Model 3a

IBM Model 3:
- Factor in fertility and t(NULL | fw_j)
- Problems ...
  n(0 | fw_j) = 1 means t(NULL | fw_j)
  How to initialize t?
- Code: given es and fs, e.g. "the house" and "das das":
  k = t[es[j]][fs2[i]] * a[l_f][l_e][i][j] * n[fs2[i]][f_lookup[fs2[i]]] / s_total[es[j]]
  where f_lookup = fertility number
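Expanding that fragment, a hypothetical reconstruction of the E-step for one sentence pair; the table names (t, a, n, f_lookup, s_total) mirror the snippet above, but the surrounding loop is my own guess at the course's ibm3a.py, not the actual code:

def collect_counts_3a(es, fs2, t, a, n, f_lookup, count_t, total_t):
    """Accumulate fractional t-table counts, weighting each English-foreign link
    by translation prob t, Model-2 alignment prob a, and fertility prob n."""
    l_e, l_f = len(es), len(fs2)
    s_total = {}
    for j, e in enumerate(es):                    # normalization per English word
        s_total[e] = sum(t[e][fs2[i]] * a[l_f][l_e][i][j] *
                         n[fs2[i]][f_lookup[fs2[i]]] for i in range(l_f))
    for j, e in enumerate(es):                    # expected (fractional) counts
        for i, f in enumerate(fs2):
            k = t[e][f] * a[l_f][l_e][i][j] * n[f][f_lookup[f]] / s_total[e]
            count_t[e][f] += k                    # renormalized in the M-step
            total_t[f] += k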

English to English