Enriched translation model using morphology in MT
Luong Minh Thang
WING group meeting – 07 July, 2009

Presentation transcript:


Overview
Brief recap on SMT & morphological analysis
Motivation
Enriched translation model
 – Twin phrase-table construction
 – Merging phrase tables
Experiments
Conclusion

SMT overview – alignment
Parallel data:
 English: These are, first and foremost, messages of concern at the economic and social problems that we are experiencing, in spite of a period of sustained growth stemming from years of efforts by all our fellow citizens.
 Finnish: Ensinnäkin kohtaamiemme taloudellisten ja sosiaalisten vaikeuksien vuoksi on havaittavissa huolestumista, vaikka kasvu on kestävällä pohjalla ja tulosta vuosien ponnisteluista, kaikkien kansalaistemme taholta.
Alignment: one-to-many (1-M)
 Source: Maria no daba una bofetada a la bruja verde
 Target: NULL Mary did not slap the green witch

SMT overview – translation model
Intersect alignment: 1-M + M-1 → M-M
Extracting phrases from the M-M alignment → translation model (phrase table).
Each phrase-table entry has the form "English e ||| Foreign f ||| scores", where the scores are the translation probabilities, the lexical probabilities, and the phrase penalty:
 problems ||| ongelmat |||
 problems ||| ongelmasta |||
 …
 problems ||| vaikeuksista |||
 problems ||| vaikeuksien |||

Recap - Morphological analysis
Morpheme: minimal meaning-bearing unit
English: machine + s, present + ed, etc.
Finnish: oppositio + kansa + n + edusta + ja = opposition member of parliament
Morfessor (Creutz & Lagus, 2007): segments words in an unsupervised manner, e.g. un/PRE + fortunate/STM + ly/SUF
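As a small illustration of the tagged segmentation format shown on this slide (un/PRE + fortunate/STM + ly/SUF), the sketch below splits such a string into (morpheme, tag) pairs and rebuilds the surface word. The function names are hypothetical helpers written for this example; they are not part of Morfessor.

```python
from typing import List, Tuple

def parse_segmentation(tagged: str) -> List[Tuple[str, str]]:
    """Split 'un/PRE + fortunate/STM + ly/SUF' into (morpheme, tag) pairs."""
    pairs = []
    for piece in tagged.split("+"):
        # rpartition splits on the last '/', so the tag is whatever follows it
        morph, _, tag = piece.strip().rpartition("/")
        pairs.append((morph, tag))
    return pairs

def surface_form(pairs: List[Tuple[str, str]]) -> str:
    """Concatenate the morphemes to recover the original word."""
    return "".join(morph for morph, _ in pairs)

segments = parse_segmentation("un/PRE + fortunate/STM + ly/SUF")
print(segments)
print(surface_form(segments))
```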

Motivation
Problem:
 – Morphologically complex languages have multiple word forms, e.g. ongelmat, ongelmasta, etc.
 – Rare words occur often and are hard to align → incorrect entries in the normal (word-aligned) phrase table.
Solution:
 – Construct a morpheme-aligned phrase table (PT) to aggregate better statistics for rare words.
 – Combine the word- and morpheme-aligned PTs in a proper way to produce an even better translation model.


Twin phrase-table (PT) construction
[Figure: pipeline. Word path: Word → GIZA++ → word alignment → phrase extraction → PT_w → morphological segmentation → PT_wm. Morpheme path: Morpheme → GIZA++ → morpheme alignment → phrase extraction → PT_m. PT_wm and PT_m → PT merging → decoding.]
Example entries:
 problem/STM+ s/SUF ||| ongelma/STM+ t/SUF
 problem/STM+ s/SUF ||| vaikeu/STM+ ksi/SUF+ sta/SUF
 problems ||| vaikeuksista
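The morphological segmentation step that turns a word-level entry such as "problems ||| vaikeuksista" into its morpheme-level counterpart could be sketched as follows. This is a minimal illustration under stated assumptions: the SEGMENTATIONS dictionary stands in for the output of a segmenter, and both function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical lookup table standing in for a segmenter's output.
SEGMENTATIONS: Dict[str, List[str]] = {
    "problems": ["problem/STM", "s/SUF"],
    "vaikeuksista": ["vaikeu/STM", "ksi/SUF", "sta/SUF"],
}

def segment_phrase(phrase: str, table: Dict[str, List[str]]) -> str:
    """Replace each word by its morphemes; '+' marks a word-internal boundary."""
    out = []
    for word in phrase.split():
        morphs = table.get(word, [word])  # unknown words are left unsegmented
        out.append("+ ".join(morphs))
    return " ".join(out)

def word_entry_to_morpheme_entry(entry: str, table: Dict[str, List[str]]) -> str:
    """Segment the source and target side of a 'src ||| tgt' entry."""
    src, tgt = [side.strip() for side in entry.split("|||")[:2]]
    return f"{segment_phrase(src, table)} ||| {segment_phrase(tgt, table)}"

print(word_entry_to_morpheme_entry("problems ||| vaikeuksista", SEGMENTATIONS))
```

Scores are ignored here; only the two phrase sides are rewritten.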

Existing PT-merging methods
Add-feature (Nakov, 2008; Chen et al., 2009):
 F1 = 1 if from 1st PT, 0.5 otherwise
 F2 = 1 if from 2nd PT, 0.5 otherwise
 F3 = 1 if from both PTs, 0.5 otherwise
 → heuristic-driven
Interpolation (Wu & Wang, 2007):
 – tran(f|e) = α * tran_1(f|e) + (1 - α) * tran_2(f|e)
 – lex(f|e) = β * lex_1(f|e) + (1 - β) * lex_2(f|e)
 → does not consider what the scores "mean"
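The two existing merging schemes on this slide can be sketched in a few lines. The phrase pairs and probabilities below are made-up toy values, and the function names are hypothetical; the code only mirrors the formulas above and is not taken from the cited papers.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (English phrase e, foreign phrase f)

def interpolate(t1: Dict[Pair, float], t2: Dict[Pair, float], alpha: float) -> Dict[Pair, float]:
    """Linear interpolation: alpha * t1 + (1 - alpha) * t2, missing entries count as 0."""
    return {
        pair: alpha * t1.get(pair, 0.0) + (1 - alpha) * t2.get(pair, 0.0)
        for pair in set(t1) | set(t2)
    }

def origin_features(pair: Pair, t1: Dict[Pair, float], t2: Dict[Pair, float]) -> Tuple[float, float, float]:
    """Add-feature scheme: F1, F2, F3 are 1 or 0.5 depending on where the pair occurs."""
    f1 = 1.0 if pair in t1 else 0.5
    f2 = 1.0 if pair in t2 else 0.5
    f3 = 1.0 if pair in t1 and pair in t2 else 0.5
    return (f1, f2, f3)

# Toy tables with invented probabilities, for illustration only.
table_1 = {("a", "x"): 0.6, ("a", "y"): 0.4}
table_2 = {("a", "x"): 1.0}

merged = interpolate(table_1, table_2, alpha=0.5)
print(merged)
print(origin_features(("a", "y"), table_1, table_2))
```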

Our merging method – normalizing translation probabilities
MLE estimates from each phrase table:
 tran_1(e|f) = count_1(e, f) / ∑_e count_1(e, f)
 tran_2(e|f) = count_2(e, f) / ∑_e count_2(e, f)
Example entries:
 PT_wm: problem + s ||| vaikeu + ksi + sta
        problem + s ||| ongelma + sta
 PT_m:  problem + s ||| ongelma + t
        problem + s ||| vaikeu + ksi + sta

Our merging method – normalizing translation probabilities
MLE from PT_wm:
 tran(vaikeuksista | problems) = 1/2 = 0.5
 tran(ongelmasta | problems) = 1/2 = 0.5
MLE from PT_m:
 tran(ongelmat | problems) = 3/4 = 0.75
 tran(vaikeuksista | problems) = 1/4 = 0.25
Interpolation (ratio = 0.5):
 tran(vaikeuksista | problems) = (0.5 + 0.25)/2 = 0.375
 tran(ongelmat | problems) = (0 + 0.75)/2 = 0.375
 tran(ongelmasta | problems) = (0.5 + 0)/2 = 0.25
Undesired translation! (vaikeuksista ties with ongelmat.)

Our merging method – normalizing translation probabilities
MLE estimates from each phrase table:
 tran_1(e|f) = count_1(e, f) / ∑_e count_1(e, f)
 tran_2(e|f) = count_2(e, f) / ∑_e count_2(e, f)
Normalization:
 tran(e|f) = [count_1(e, f) + count_2(e, f)] / [∑_e count_1(e, f) + ∑_e count_2(e, f)]

Our merging method – normalizing translation probabilities
MLE from PT_wm:
 tran(vaikeuksista | problems) = 1/2 = 0.5
 tran(ongelmasta | problems) = 1/2 = 0.5
MLE from PT_m:
 tran(ongelmat | problems) = 3/4 = 0.75
 tran(vaikeuksista | problems) = 1/4 = 0.25
Normalization:
 tran(vaikeuksista | problems) = (1 + 1)/(2 + 4) = 0.33
 tran(ongelmat | problems) = (0 + 3)/(2 + 4) = 0.5
 tran(ongelmasta | problems) = (1 + 0)/(2 + 4) = 0.17
Desired translation!
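The worked example can be checked with a short script. The counts are the ones implied by the fractions on the slides (1 and 1 in PT_wm, 3 and 1 in PT_m, all for English "problems"); the function names are hypothetical, and the code is a sketch of the arithmetic rather than the authors' implementation.

```python
from collections import Counter
from typing import Dict

# Co-occurrence counts with English "problems", as implied by the slide fractions.
counts_wm = Counter({"vaikeuksista": 1, "ongelmasta": 1})
counts_m = Counter({"ongelmat": 3, "vaikeuksista": 1})

def mle(counts: Counter) -> Dict[str, float]:
    """Relative-frequency estimate within a single phrase table."""
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def interpolate(p1: Dict[str, float], p2: Dict[str, float], ratio: float = 0.5) -> Dict[str, float]:
    """Interpolate two probability tables; a missing entry contributes 0."""
    return {f: ratio * p1.get(f, 0.0) + (1 - ratio) * p2.get(f, 0.0) for f in set(p1) | set(p2)}

def normalize(c1: Counter, c2: Counter) -> Dict[str, float]:
    """Pool the raw counts of both tables, then renormalize once."""
    pooled = c1 + c2
    total = sum(pooled.values())
    return {f: c / total for f, c in pooled.items()}

interp = interpolate(mle(counts_wm), mle(counts_m))
norm = normalize(counts_wm, counts_m)
for f in sorted(norm):
    print(f, round(interp[f], 3), round(norm[f], 2))
```

Pooling counts keeps the larger table's evidence from being diluted: ongelmat, seen three times, ends up clearly ahead, whereas interpolation leaves it tied with vaikeuksista.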

Our merging method – full lexical probability interpolation
Lexical models:
 PT_w lexical model: P(vaikeuksista|problems), P(ongelmasta|problems)
 PT_m lexical model: P(vaikeu|problem), P(ongelma|problem), P(t|s), P(ksi|s), P(sta|s)
Lexical scores in the phrase tables:
 lex(vaikeuksista | problems) = w_1
 lex(ongelmasta | problems) = w_2
 lex(vaikeu + ksi + sta | problem + s) = m_1
 lex(ongelma + t | problem + s) = m_3
Normal interpolation (ratio = 0.5):
 lex(vaikeuksista | problems) = (w_1 + m_1)/2
 lex(ongelmasta | problems) = (w_2 + 0)/2
 lex(ongelmat | problems) = (0 + m_3)/2
 Missing interpolated probabilities!
Full interpolation:
 Estimate lex(ongelma + sta | problem + s) using the PT_m lexical model → m_2
 Estimate lex(ongelmat | problems) using the PT_w lexical model → w_3
 The zeros are then replaced: lex(ongelmasta | problems) = (w_2 + m_2)/2 and lex(ongelmat | problems) = (w_3 + m_3)/2
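One way to picture the "estimate the missing score" step is the sketch below. It assumes a standard phrase-based lexical weight (product over foreign tokens of the average P(f|e) over aligned English tokens) and a one-to-one morpheme alignment; the probabilities are invented toy values, and whether the original system computes the estimate exactly this way is an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

LexModel = Dict[Tuple[str, str], float]  # (foreign token, English token) -> P(f|e)

def lexical_weight(f_tokens: List[str], e_tokens: List[str],
                   alignment: List[Tuple[int, int]], model: LexModel) -> float:
    """Product over foreign tokens of the mean P(f|e) over their aligned English tokens."""
    weight = 1.0
    for i, f in enumerate(f_tokens):
        links = [j for (fi, j) in alignment if fi == i]
        if not links:
            continue  # simplification: unaligned tokens are skipped (no NULL word)
        weight *= sum(model.get((f, e_tokens[j]), 0.0) for j in links) / len(links)
    return weight

def full_interpolation(w_score: Optional[float], m_score: Optional[float],
                       estimate_w: Callable[[], float], estimate_m: Callable[[], float],
                       ratio: float = 0.5) -> float:
    """Interpolate, estimating a missing side from its lexical model instead of using 0."""
    w = w_score if w_score is not None else estimate_w()
    m = m_score if m_score is not None else estimate_m()
    return ratio * w + (1 - ratio) * m

# Toy PT_m lexical model (hypothetical numbers).
morph_model: LexModel = {("ongelma", "problem"): 0.4, ("sta", "s"): 0.1}
w_2 = 0.2  # hypothetical lex(ongelmasta | problems) from PT_wm

# m_2: lex(ongelma + sta | problem + s), absent from PT_m, estimated from its lexical model.
m_2 = lexical_weight(["ongelma", "sta"], ["problem", "s"], [(0, 0), (1, 1)], morph_model)
score = full_interpolation(w_2, None, estimate_w=lambda: 0.0, estimate_m=lambda: m_2)
print(m_2, score)
```

With these toy numbers the normal interpolation would give (w_2 + 0)/2 = 0.1, while the full interpolation also credits the estimated m_2.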


Experiments – dataset
2005 ACL shared task (Koehn & Monz, 2005)

Experiments – baselines
w-system: uses PT_w to translate at the word level
m-system: uses PT_m to translate at the morpheme level
m-BLEU: BLEU in which each token is a morpheme
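To see why scoring at the morpheme level differs from scoring at the word level, the sketch below computes only the clipped n-gram precision that BLEU is built from (no brevity penalty, no geometric mean over n-gram orders). The one-word hypothesis and reference are a toy example built from word forms that appear earlier in the talk; the function names are hypothetical.

```python
from collections import Counter
from typing import List

def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hyp: List[str], ref: List[str], n: int) -> float:
    """Fraction of hypothesis n-grams found in the reference, clipped by reference counts."""
    hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total

# Word-level tokens: the two word forms differ, so nothing matches.
word_precision = clipped_precision(["ongelmat"], ["ongelmasta"], 1)
# Morpheme-level tokens: the shared stem "ongelma" earns partial credit.
morph_precision = clipped_precision(["ongelma", "t"], ["ongelma", "sta"], 1)
print(word_precision, morph_precision)
```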

Experiments – our system
Improvements over the m-system and the w-system are statistically significant according to the sign test of Collins et al. (2005).
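For readers unfamiliar with sign tests, a generic exact two-sided version is sketched below: count the test sentences on which one system beats the other, drop ties, and ask how likely such an imbalance is under a fair coin. The win and loss counts are hypothetical, and whether this matches the exact procedure of Collins et al. (2005) is an assumption.

```python
from math import comb

def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test p-value; ties must be discarded beforehand."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of seeing k or fewer of the rarer outcome under Binomial(n, 0.5)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-sentence comparison outcomes.
print(sign_test_p_value(10, 0))
print(sign_test_p_value(5, 5))
```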

Conclusion
Our contributions:
 – Enrich the translation model without using additional data.
 – Propose a principled way to merge phrase tables generated at different granularities.

Q & A
Thank you!