CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18 – Training and Decoding in SMT System) Kushal Ladha, M.Tech Student, CSE Dept., IIT Bombay, 17th Feb, 2011

Training Process

It is a nine-step process:
1. Prepare data
2. Run GIZA++
3. Align words
4. Get lexical translation table
5. Extract phrases
6. Score phrases
7. Build lexicalized reordering model
8. Build generation models
9. Create configuration file
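In Moses, these nine steps are normally driven by a single training script. Below is a minimal sketch of such an invocation, wrapped in Python for consistency with the other sketches in these notes; the corpus paths, the hi/en file extensions, and the language-model file are illustrative assumptions, not values from the lecture.

    # Hedged sketch: train-model.perl runs the nine training steps in order.
    # All paths and extensions below are placeholders.
    import subprocess

    subprocess.run([
        "train-model.perl",
        "--root-dir", "train",            # model/ and corpus/ are created here
        "--corpus", "corpus/clean",       # prefix of the sentence-aligned files
        "--f", "hi",                      # foreign-side file extension
        "--e", "en",                      # English-side file extension
        "--alignment", "grow-diag-final", # symmetrization heuristic (see below)
        "--lm", "0:3:/path/to/lm.arpa",   # factor:order:language-model file
    ], check=True)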

Preparing Data (1/2)
- Sentence-aligned data comes in two files: one containing the foreign sentences, the other containing the English sentences.
- Everything should be lowercased.
- Sentence length should be less than 100 words.
- For a factored model, the training data should contain word0factor0|word0factor1|word0factor2 and so on for each word, instead of the un-factored word0 word1 word2.
- Cleaning the corpus drops empty lines and redundant spaces, and eliminates sentence pairs that violate the 9-1 sentence length ratio limit.
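A minimal re-implementation of this cleaning step (Moses ships a script, clean-corpus-n.perl, for this; the function below is only an illustrative sketch, with the thresholds taken from the slide):

    # Lowercase, drop empty lines and redundant spaces, and enforce the
    # length and 9-1 ratio limits described above.
    def clean_corpus(f_lines, e_lines, max_len=100, max_ratio=9.0):
        for f, e in zip(f_lines, e_lines):
            f_toks, e_toks = f.lower().split(), e.lower().split()
            if not f_toks or not e_toks:          # empty line on either side
                continue
            if len(f_toks) >= max_len or len(e_toks) >= max_len:
                continue
            ratio = len(f_toks) / len(e_toks)
            if ratio > max_ratio or ratio < 1.0 / max_ratio:
                continue
            yield " ".join(f_toks), " ".join(e_toks)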

Preparing Data (2/2)
- The input to GIZA++ includes two vocabulary files.
- Vocabulary files contain integer word identifiers, words, and word-count information, e.g.:
    1 UNK 0
    2 and 7
    3 , 6
    4 irian 4
- In the sentence file, each sentence pair has 3 entries: a count, the source sentence, and the target sentence (the sentences given as sequences of word identifiers).
- GIZA++ requires each word to be placed into a word class; this is done by mkcls.
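A sketch of how such a vocabulary file can be produced (GIZA++'s own plain2snt tool does this; the identifier assignment order below is illustrative):

    # Write a GIZA++-style vocabulary file: one "id word count" entry per
    # line, with id 1 conventionally reserved for the unknown word.
    from collections import Counter

    def write_vocab(sentences, path):
        counts = Counter(w for s in sentences for w in s.split())
        with open(path, "w", encoding="utf-8") as out:
            out.write("1 UNK 0\n")
            for i, (w, c) in enumerate(sorted(counts.items()), start=2):
                out.write(f"{i} {w} {c}\n")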

Run GIZA++
- GIZA++ is used to establish word alignments.
- Word alignments are taken from the intersection of two bidirectional runs of GIZA++ (English-to-foreign and foreign-to-English).
- For each word in each sentence, it marks the possible alignment points in both directions.
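Representing each directional alignment as a set of (English position, foreign position) pairs makes the intersection a one-line operation; a small sketch with made-up alignment points:

    # Alignment points from the two directional GIZA++ runs (illustrative).
    e2f = {(0, 0), (1, 1), (1, 2)}   # English-to-foreign run
    f2e = {(0, 0), (1, 1), (2, 2)}   # foreign-to-English run

    intersection = e2f & f2e   # high-precision starting point: {(0,0), (1,1)}
    union = e2f | f2e          # candidate pool used by GROW-DIAG below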

Align Words: Heuristics

GROW-DIAG-FINAL(e2f, f2e):
    neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
    alignment = intersect(e2f, f2e)
    GROW-DIAG()
    FINAL(e2f)
    FINAL(f2e)

GROW-DIAG():
    iterate until no new points added
        for english word e = 0 ... en
            for foreign word f = 0 ... fn
                if ( e aligned with f )
                    for each neighboring point ( e-new, f-new ):
                        if ( ( e-new, f-new ) in union( e2f, f2e )
                             and ( e-new not aligned or f-new not aligned ) )
                            add alignment point ( e-new, f-new )

FINAL(a):
    for english word e-new = 0 ... en
        for foreign word f-new = 0 ... fn
            if ( ( e-new, f-new ) in alignment a
                 and ( e-new not aligned or f-new not aligned ) )
                add alignment point ( e-new, f-new )

Get Lexical Translation Table
- Given the word alignment, a maximum-likelihood lexical translation table can be estimated.
- Two files, model/lex.0-0.f2e and model/lex.0-0.e2f, contain the lexical translation probabilities in the two directions.

Extract Phrases
- model/extract contains all the phrases, with their translations in the target language and alignment points.
- An inverted file named model/extract.inv contains the inverse mapping.
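A minimal sketch of the maximum-likelihood estimate, w(e|f) = count(f aligned to e) / count(f), computed from the alignment links (function and variable names are mine):

    # Estimate lexical translation probabilities by relative frequency.
    from collections import Counter

    def lex_table(links):
        """links: list of (f_word, e_word) alignment links from the corpus."""
        joint = Counter(links)                      # count(f, e)
        marginal = Counter(f for f, _ in links)     # count(f)
        return {(f, e): n / marginal[f] for (f, e), n in joint.items()}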

Score Phrases
- To calculate phrase translation probabilities, sort the extract file so that the translations of a particular foreign phrase are next to each other in the file.
- Calculate counts for each foreign phrase; do the same with the inverted file to calculate counts for each English phrase.
- Each phrase-table entry consists of five scores:
  - phrase translation probability φ(f|e)
  - lexical weighting lex(f|e)
  - phrase translation probability φ(e|f)
  - lexical weighting lex(e|f)
  - phrase penalty (always exp(1) ≈ 2.718)

A sketch of the relative-frequency computation follows this list.
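The computation is φ(e|f) = count(f, e) / count(f); the inverted file yields φ(f|e) the same way. A minimal sketch (names are illustrative):

    # Score phrase pairs by relative frequency over the extract file.
    from collections import Counter

    def score_phrases(pairs):
        """pairs: list of (foreign_phrase, english_phrase) from model/extract."""
        joint = Counter(pairs)                      # count(f, e)
        f_count = Counter(f for f, _ in pairs)      # count(f)
        return {(f, e): n / f_count[f] for (f, e), n in joint.items()}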

Sample from Phrase-table

b o ||| b aa ||| (0) (1) ||| (0) (1) |||
b ||| b ||| (0) ||| (0) |||
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) |||
c ||| p ||| (0) ||| (0) |||
d w ||| d w ||| (0) (1) ||| (0) (1) |||
d ||| d ||| (0) ||| (0) |||
e b ||| ah b ||| (0) (1) ||| (0) (1) |||
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) |||
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) |||
e l ||| eh ||| (0) (0) ||| (0,1) |||
e ||| ah ||| (0) ||| (0) |||
h e ||| hh ah ||| (0) (1) ||| (0) (1) |||
h ||| hh ||| (0) ||| (0) |||
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) |||
l e ||| l ah ||| (0) (1) ||| (0) (1) |||
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) |||
l l ||| l ||| (0) (0) ||| (0,1) |||
l o ||| l ow ||| (0) (1) ||| (0) (1) |||
l ||| l ||| (0) ||| (0) |||
m ||| m ||| (0) ||| (0) |||
n d ||| n d ||| (0) (1) ||| (0) (1) |||
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) |||
n e ||| n iy ||| (0) (1) ||| (0) (1) |||
n ||| eh n ||| (1) ||| () (0) |||
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) |||
o o ||| uw ||| (0) (0) ||| (0,1) |||
o ||| aa ||| (0) ||| (0) |||
o ||| ow eh ||| (0) ||| (0) () |||
o ||| ow ||| (0) ||| (0) |||
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) |||
w ||| w ||| (0) ||| (0) |||

Build Reordering, Generation Model
- A distance-based reordering model is included by default.
- A generation model is built in the case of a factored model.
- It is estimated on the target side, e.g. root/lemma at the target side + suffix at the target side → surface word at the target side.

Decoding
- Decoding uses a beam-search algorithm.
- Translation options: given an input sentence, all phrase translations that could apply to it are looked up.
- Translation options are collected before any decoding takes place.
- The following information is stored with each translation option (sketched below):
  - first foreign word covered
  - last foreign word covered
  - English phrase translation
  - phrase translation probability
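A hedged sketch of such a record (the field names are mine, not Moses internals):

    # What the decoder keeps for each precomputed translation option.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TranslationOption:
        first_covered: int   # index of the first foreign word covered
        last_covered: int    # index of the last foreign word covered
        english: str         # English phrase translation
        prob: float          # phrase translation probability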

Algorithm
- Start with a null hypothesis.
- For all the foreign words not yet covered, look up the applicable translation options and expand the hypothesis with each of them, along with its probability.
- The hypothesis with the best probability is selected, and its foreign words are marked as translated.
- We maintain back pointers between hypotheses so that partial translations of the sentence can be read off.
- The probability is simply the cost of the new state, i.e. the cost of the original state multiplied by the translation, distortion, and language model costs of the added phrasal translation.
- A final state is one in which all the foreign words are covered.
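A compressed runnable sketch of this stack-decoding loop. Only the translation-model cost is scored here; the distortion and language-model components and the future-cost estimate are omitted, and all names are illustrative:

    # Stack decoding: stacks[k] holds hypotheses covering exactly k
    # foreign words; each hypothesis keeps a back pointer to its parent.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Hypothesis:
        covered: frozenset        # foreign positions translated so far
        logprob: float            # accumulated cost (log probability)
        output: tuple             # English words generated so far
        back: object = None       # back pointer for reading off the output

    def decode(n_foreign, options):
        """options: dict from a foreign span (i, j) to a list of
        (english_word_tuple, logprob) translation options."""
        stacks = [[] for _ in range(n_foreign + 1)]
        stacks[0].append(Hypothesis(frozenset(), 0.0, ()))  # null hypothesis
        for stack in stacks[:-1]:
            for hyp in stack:
                for (i, j), cands in options.items():
                    span = frozenset(range(i, j + 1))
                    if span & hyp.covered:     # span already translated
                        continue
                    for english, lp in cands:
                        new = Hypothesis(hyp.covered | span,
                                         hyp.logprob + lp,
                                         hyp.output + english,
                                         hyp)
                        stacks[len(new.covered)].append(new)
        final = stacks[n_foreign]              # all foreign words covered
        return max(final, key=lambda h: h.logprob, default=None)

The recombination and pruning described on the next two slides would be applied to each stack as it fills.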

Recombining Hypotheses
- Two hypotheses can be recombined if they agree in:
  - the foreign words covered so far
  - the last two English words generated
  - the end of the last foreign phrase covered
- We keep the cheaper hypothesis and discard the other one.
- Pruning takes into account not only the cost of each hypothesis so far but also a future-cost estimate for the remainder of the sentence.
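In terms of the Hypothesis sketch above, the recombination test amounts to comparing a three-part key (the end-position argument is assumed to be tracked by the decoder):

    # Hypotheses sharing this key cannot be distinguished by any future
    # expansion (for a trigram LM), so only the cheaper one is kept.
    def recombination_key(hyp, last_span_end):
        return (hyp.covered,        # foreign words covered so far
                hyp.output[-2:],    # last two English words generated
                last_span_end)      # end of the last foreign phrase covered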

Pruning
- Two types of pruning:
  - Threshold pruning: discard any hypothesis whose probability is less than α times that of the best hypothesis in the stack.
  - Histogram pruning: keep only a fixed number of hypotheses per stack (e.g. n = 100).
- Unlike the recombining of hypotheses, this type of pruning is not risk-free.
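Both schemes in terms of the Hypothesis sketch above (α and n are tunable; the values here are only examples):

    import math

    def threshold_prune(stack, alpha=0.3):
        # keep hypotheses within a factor alpha of the best probability,
        # i.e. within log(alpha) of the best log-probability
        best = max(h.logprob for h in stack)
        return [h for h in stack if h.logprob >= best + math.log(alpha)]

    def histogram_prune(stack, n=100):
        # keep only the n best hypotheses
        return sorted(stack, key=lambda h: h.logprob, reverse=True)[:n]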

Example
- Hindi input sentence: विनोद ने सचिन को छूरे से मारा ("Vinod stabbed Sachin with a knife").
- Required English sentence: Vinod stabbed Sachin.
- At each level, the translation option whose probability is the highest of all is selected.