Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley


Presentation transcript:

Discriminative Learning of Extraction Sets for Machine Translation
John DeNero and Dan Klein
UC Berkeley

Identifying Phrasal Translations
In the past two years, a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment
Underlying assumption: There is a correct phrasal segmentation

Unique Segmentations?
In the past two years, a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
Problem 1: Overlapping phrases can be useful (and complementary)
Problem 2: Phrases and their sub-phrases can both be useful
Hypothesis: This is why models of phrase alignment don't work well

Identifying Phrasal Translations
This talk: Modeling sets of overlapping, multi-scale phrase pairs
In the past two years, a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
(gloss: past two year in , one lots US citizen)
Input: sentence pairs
Output: extracted phrases

… But the Standard Pipeline has Overlap! [MOTIVATION]
Sentence Pair → Word Alignment → Extracted Phrases
Example: In the past two years / 过去 两 年 中 (gloss: past two year in)

Related Work [MOTIVATION]
Sentence Pair → Word Alignment → Extracted Phrases
Translation models: Sinuhe system (Kääriäinen, 2009). Fixed alignments; learned phrase pair weights
Combining aligners: Yonggang Deng & Bowen Zhou (2009). Fixed directional alignments; learned symmetrization
Extraction models: Moore and Quirk, 2007. Fixed alignments; learned phrase pair weights

Our Task: Predict Extraction Sets [MOTIVATION]
Conditional model of extraction sets given sentence pairs
Sentence Pair → Extracted Phrases + "Word Alignments"
Example: In the past two years / 过去 两 年 中

Alignments Imply Extraction Sets [MODEL]
Example: In the past two years / 过去 两 年 中 (gloss: past two year in)
Word-level alignment links → Word-to-span alignments → Extraction set of bispans

Nulls and Possibles
Nulls: it is reported / 据 报道 (gloss: according to, news report)
Possibles: it is reported / 据 报道 (gloss: according to, news report)

Incorporating Possible Alignments [MODEL]
Example: In the past two years / 过去 两 年 中 (gloss: past two year in)
Sure and possible word links → Word-to-span alignments → Extraction set of bispans

Linear Model for Extraction Sets [MODEL]
Example: In the past two years / 过去 两 年 中
Features on sure links
Features on all bispans
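
The slide names two feature groups, one over sure links and one over all bispans, scored by a linear model. A minimal sketch of that scoring might look as follows. The function names, the sparse-dictionary representation, and the feature names are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def extract_features(sure_links, bispans, link_feats, bispan_feats):
    """Sum sparse feature vectors over all sure links and all bispans.

    link_feats and bispan_feats are hypothetical callables that map one
    link or one bispan to a {feature_name: value} dictionary.
    """
    phi = Counter()
    for link in sure_links:
        phi.update(link_feats(link))      # Counter.update adds values
    for bispan in bispans:
        phi.update(bispan_feats(bispan))
    return phi

def score(w, phi):
    """Linear model score: dot product of weights w and features phi."""
    return sum(w.get(name, 0.0) * value for name, value in phi.items())

# Toy usage with made-up feature names and values.
w = {"hmm_posterior": 2.0, "phrase_rel_freq": 0.5}
phi = extract_features(
    sure_links=[(0, 0)],
    bispans=[(0, 0, 0, 0)],
    link_feats=lambda link: {"hmm_posterior": 0.9},
    bispan_feats=lambda bispan: {"phrase_rel_freq": 1.0},
)
total = score(w, phi)
```

Because the score decomposes over links and bispans, each piece can be scored locally during search.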

Features on Bispans and Sure Links [FEATURES]
Example: over the Earth / 过 地球 (gloss: go over, Earth)
Some features on sure links:
- HMM posteriors
- Presence in dictionary
- Numbers & punctuation
Features on bispans:
- HMM phrase table features: e.g., phrase relative frequencies
- Lexical indicator features for phrases with common words
- Monolingual phrase features: e.g., "the _____"
- Shape features: e.g., Chinese character counts

Getting Gold Extraction Sets [TRAINING]
Hand aligned: Sure and possible word links
Deterministic: Find min and max alignment index for each word → Word-to-span alignments
Deterministic: A bispan is included iff every word within the bispan aligns within the bispan → Extraction set of bispans
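
The two deterministic steps on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions: links are (English index, Chinese index) pairs, bispans are inclusive index tuples, unaligned words may sit freely inside a bispan, and `max_len` is a hypothetical phrase-length cap. The fallback to possible links follows the deck's rule that possible links define a word's span when it has no sure links.

```python
def word_spans(n_words, sure, possible, side):
    """For each word on `side` (0 = English, 1 = Chinese), return the
    (min, max) index it aligns to on the other side, or None if unaligned."""
    spans = []
    for i in range(n_words):
        linked = [link[1 - side] for link in sure if link[side] == i]
        if not linked:  # no sure links: possible links define the span
            linked = [link[1 - side] for link in possible if link[side] == i]
        spans.append((min(linked), max(linked)) if linked else None)
    return spans

def extraction_set(n_e, n_f, sure, possible=(), max_len=4):
    """Enumerate bispans (e_lo, e_hi, f_lo, f_hi), all indices inclusive."""
    e_spans = word_spans(n_e, sure, possible, side=0)
    f_spans = word_spans(n_f, sure, possible, side=1)

    def inside(spans, lo, hi, other_lo, other_hi):
        # Every aligned word in [lo, hi] must align within [other_lo, other_hi].
        return all(s is None or (other_lo <= s[0] and s[1] <= other_hi)
                   for s in spans[lo:hi + 1])

    result = set()
    for e_lo in range(n_e):
        for e_hi in range(e_lo, min(n_e, e_lo + max_len)):
            # Skip English spans that contain no aligned word at all.
            if all(s is None for s in e_spans[e_lo:e_hi + 1]):
                continue
            for f_lo in range(n_f):
                for f_hi in range(f_lo, min(n_f, f_lo + max_len)):
                    if (inside(e_spans, e_lo, e_hi, f_lo, f_hi)
                            and inside(f_spans, f_lo, f_hi, e_lo, e_hi)):
                        result.add((e_lo, e_hi, f_lo, f_hi))
    return result

# Toy usage: two words on each side, aligned monotonically.
monotone = extraction_set(2, 2, sure={(0, 0), (1, 1)})
```

The brute-force enumeration is quartic in sentence length and is meant only to make the inclusion rule concrete.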

Discriminative Training with MIRA [TRAINING]
Gold (annotated) vs. Guess (arg max w∙ɸ)
Loss function: F-score of bispan errors (precision & recall)
Training criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin
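
The training criterion above can be illustrated with a standard 1-best MIRA step. The sketch below assumes the loss is 1 minus F-beta over bispan sets and uses a clipped step size with a hypothetical cap `C`. The exact loss weighting and any k-best or averaging details of the original system are not stated on the slide.

```python
def f_loss(gold, guess, beta=1.0):
    """Loss = 1 - F_beta between the gold and guessed bispan sets."""
    true_pos = len(gold & guess)
    if true_pos == 0:
        return 1.0
    precision = true_pos / len(guess)
    recall = true_pos / len(gold)
    b2 = beta ** 2
    return 1.0 - (1 + b2) * precision * recall / (b2 * precision + recall)

def mira_update(w, phi_gold, phi_guess, loss, C=1.0):
    """Smallest change to w that makes the gold outscore the guess by `loss`,
    with the step size clipped to the range [0, C]."""
    keys = set(phi_gold) | set(phi_guess)
    diff = {k: phi_gold.get(k, 0.0) - phi_guess.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(w)  # identical feature vectors: nothing to update
    margin = sum(w.get(k, 0.0) * v for k, v in diff.items())
    tau = max(0.0, min(C, (loss - margin) / norm_sq))
    new_w = dict(w)
    for k, v in diff.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w

# Toy usage with made-up features: the guess shares no bispans with the gold.
loss = f_loss({(0, 0, 0, 0)}, {(0, 0, 1, 1)})
w_new = mira_update({}, {"a": 1.0}, {"b": 1.0}, loss, C=10.0)
```

After the update in this toy case the gold outscores the guess by exactly the loss, which is the loss-scaled margin the slide describes.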

Inference: An ITG Parser [INFERENCE]
ITG captures some bispans

Coarse-to-Fine Approximation [INFERENCE]
Coarse pass: Features that are local to terminal productions
Fine pass: Agenda search using coarse pass as a heuristic
We use an agenda-based parser. It's fast!

Experimental Setup [RESULTS]
Chinese-to-English newswire
Parallel corpus: 11.3 million words; sentence length ≤ 40
MT systems: Tuned and tested on NIST '04 and '05
Supervised data: 150 training & 191 test sentences (NIST '02)
Unsupervised model: Jointly trained HMM (Berkeley Aligner)

Baselines and Limited Systems [RESULTS]
HMM:
- State-of-the-art unsupervised baseline
- Joint training & competitive posterior decoding
- Source of many features for supervised models
ITG:
- Supervised ITG aligner with block terminals
- State-of-the-art supervised baseline
- Re-implementation of Haghighi et al., 2009
Coarse:
- Supervised block ITG + possible alignments
- Coarse pass of full extraction set model

Word Alignment Performance [RESULTS]

Extracted Bispan Performance [RESULTS]

Translation Performance (BLEU) [RESULTS]
Supervised conditions also included HMM alignments

Conclusions
Extraction set model directly learns what phrases to extract
The system performs well as an aligner or a rule extractor
Are segmentations always bad?
Idea: get overlap and multi-scale into the learning!

Thank you! nlp.cs.berkeley.edu

The Role of Possible Alignments [MODEL]
Role-equivalent word/phrase pairs: was discovered / 被 发现 (gloss: passive marker, discover)
Language-specific function words: over the Earth / 过 地球 (gloss: go over, Earth)
31% 65%
If a word has no sure links, possible links define its span

Definition: Phrasal Extraction Set [MODEL]
Sentence pair: In the past two years / 过去 两 年 中 (gloss: past two year in)
Example phrase pair: the past two / 过去 两
Lexical type:
Position: