Discriminative Learning of Extraction Sets for Machine Translation
John DeNero and Dan Klein
UC Berkeley


Identifying Phrasal Translations
In the past two years, a number of US citizens …
过去两年中, 一批美国公民 …
(gloss: past two year in , one lots US citizen)
Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment
Underlying assumption: There is a correct phrasal segmentation

Unique Segmentations?
In the past two years, a number of US citizens …
过去两年中, 一批美国公民 …
(gloss: past two year in , one lots US citizen)
Problem 1: Overlapping phrases can be useful (and complementary)
Problem 2: Phrases and their sub-phrases can both be useful
Hypothesis: This is why models of phrase alignment don’t work well

Identifying Phrasal Translations
This talk: Modeling sets of overlapping, multi-scale phrase pairs
In the past two years, a number of US citizens …
过去两年中, 一批美国公民 …
(gloss: past two year in , one lots US citizen)
Input: sentence pairs
Output: extracted phrases

… But the Standard Pipeline Has Overlap! [MOTIVATION]
In the past two years / 过去两年中 (gloss: past two year in)
Pipeline: Sentence Pair → Word Alignment → Extracted Phrases

Related Work [MOTIVATION]
Pipeline: Sentence Pair → Word Alignment → Extracted Phrases
Translation models: Sinuhe system (Kääriäinen, 2009). Fixed alignments; learned phrase pair weights
Combining aligners: Yonggang Deng & Bowen Zhou (2009). Fixed directional alignments; learned symmetrization
Extraction models: Moore and Quirk, 2007. Fixed alignments; learned phrase pair weights

Our Task: Predict Extraction Sets [MOTIVATION]
Sentence Pair → Extracted Phrases
Conditional model of extraction sets given sentence pairs
Output: Extracted Phrases + "Word Alignments"
[Figure: alignment grids for "In the past two years" (positions 0 to 5) and 过去两年中 (positions 0 to 4)]

Alignments Imply Extraction Sets [MODEL]
In the past two years / 过去两年中 (gloss: past two year in)
Word-level alignment links → Word-to-span alignments → Extraction set of bispans
[Figure: alignment grid, English positions 0 to 5, Chinese positions 0 to 4]

Nulls and Possibles
Example: 据报道 (gloss: according to news report) / it is reported
[Figure: two alignment grids for this example, one labeled "Nulls:" and one labeled "Possibles:"]

Incorporating Possible Alignments [MODEL]
In the past two years / 过去两年中 (gloss: past two year in)
Sure and possible word links → Word-to-span alignments → Extraction set of bispans
[Figure: alignment grid, English positions 0 to 5, Chinese positions 0 to 4]
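The projection from word links to word-to-span alignments can be sketched as below. This is a minimal illustration, not the authors' code: the function name `word_to_span`, the `(english_index, foreign_index)` link encoding, and the half-open span convention are all assumptions. It follows the rule stated in the talk that possible links define a word's span only when the word has no sure links.

```python
from typing import Optional

Link = tuple[int, int]            # assumed encoding: (english_index, foreign_index)
Span = Optional[tuple[int, int]]  # half-open [start, end); None for a null-aligned word


def word_to_span(n_words: int, sure: set[Link], possible: set[Link], side: int) -> list[Span]:
    """Project each word on `side` (0 = English, 1 = foreign) to a span on the other side."""
    spans: list[Span] = []
    for i in range(n_words):
        # Sure links take priority; possible links are only a fallback.
        targets = [link[1 - side] for link in sure if link[side] == i]
        if not targets:
            targets = [link[1 - side] for link in possible if link[side] == i]
        # The span runs from the minimum to the maximum aligned index.
        spans.append((min(targets), max(targets) + 1) if targets else None)
    return spans


# Toy example (hypothetical links): English word 2 has only a possible link.
sure = {(0, 0), (1, 1)}
possible = {(2, 1)}
print(word_to_span(3, sure, possible, side=0))
```

Without the possible link, English word 2 would be null-aligned (`None`) and would place no constraint on which bispans are extracted.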

Linear Model for Extraction Sets [MODEL]
In the past two years / 过去两年中
Features on sure links
Features on all bispans
[Figure: alignment grid, English positions 0 to 5, Chinese positions 0 to 4]
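A linear model of this shape scores a candidate as a sum of weighted features over its sure links plus weighted features over its bispans. The sketch below assumes sparse feature dictionaries; the names `score`, `link_feats`, and `bispan_feats` and the toy indicator features are hypothetical placeholders, not the feature set used in the talk.

```python
from typing import Callable, Iterable

Features = dict[str, float]


def dot(w: Features, feats: Features) -> float:
    """Sparse dot product between a weight vector and a feature vector."""
    return sum(w.get(name, 0.0) * value for name, value in feats.items())


def score(
    w: Features,
    sure_links: Iterable[tuple[int, int]],
    bispans: Iterable[tuple[int, int, int, int]],
    link_feats: Callable[[tuple[int, int]], Features],
    bispan_feats: Callable[[tuple[int, int, int, int]], Features],
) -> float:
    """Score = features on sure links + features on all bispans."""
    link_score = sum(dot(w, link_feats(link)) for link in sure_links)
    bispan_score = sum(dot(w, bispan_feats(b)) for b in bispans)
    return link_score + bispan_score


# Toy indicator features (assumptions for illustration only).
w = {"is_link": 1.0, "is_bispan": 0.5}
links = [(0, 0), (1, 1)]
spans = [(0, 1, 0, 1), (1, 2, 1, 2), (0, 2, 0, 2)]
total = score(w, links, spans, lambda l: {"is_link": 1.0}, lambda b: {"is_bispan": 1.0})
print(total)
```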

Features on Bispans and Sure Links [FEATURES]
Example: 过地球 (gloss: go over Earth) / over the Earth
Some features on sure links:
HMM posteriors
Presence in dictionary
Numbers & punctuation
Features on bispans:
HMM phrase table features: e.g., phrase relative frequencies
Lexical indicator features for phrases with common words
Monolingual phrase features: e.g., “the _____”
Shape features: e.g., Chinese character counts

Getting Gold Extraction Sets [TRAINING]
Hand aligned: Sure and possible word links
Deterministic: Find min and max alignment index for each word → Word-to-span alignments
Deterministic: A bispan is included iff every word within the bispan aligns within the bispan → Extraction set of bispans
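The second deterministic step can be sketched as an exhaustive check over candidate bispans. The code below is an illustrative assumption rather than the authors' implementation: it takes precomputed word-to-span projections (half-open spans, `None` for null-aligned words), caps phrase length with a hypothetical `max_len` parameter, and skips bispans that contain no aligned word at all, which is a common extraction convention assumed here.

```python
from typing import Optional

Span = Optional[tuple[int, int]]   # half-open [start, end), or None if null-aligned
Bispan = tuple[int, int, int, int]  # (e_start, e_end, f_start, f_end), half-open


def extraction_set(e_spans: list[Span], f_spans: list[Span], max_len: int = 3) -> set[Bispan]:
    """Include a bispan iff every aligned word inside it projects inside it."""
    bispans: set[Bispan] = set()
    for g in range(len(e_spans)):
        for h in range(g + 1, min(g + max_len, len(e_spans)) + 1):
            inside_e = [s for s in e_spans[g:h] if s is not None]
            for k in range(len(f_spans)):
                for l in range(k + 1, min(k + max_len, len(f_spans)) + 1):
                    inside_f = [s for s in f_spans[k:l] if s is not None]
                    if not inside_e and not inside_f:
                        continue  # assumption: require at least one aligned word
                    e_ok = all(k <= a and b <= l for a, b in inside_e)
                    f_ok = all(g <= a and b <= h for a, b in inside_f)
                    if e_ok and f_ok:
                        bispans.add((g, h, k, l))
    return bispans


# Toy monotone alignment of two words on each side.
print(sorted(extraction_set([(0, 1), (1, 2)], [(0, 1), (1, 2)])))
```

Null-aligned words impose no constraint, so a bispan can grow to absorb them; this is one source of the overlapping, multi-scale phrase pairs the talk describes.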

Discriminative Training with MIRA [TRAINING]
Loss function: F-score of bispan errors (precision & recall)
Training criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin
Gold (annotated)
Guess (arg max w∙ɸ)
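The training criterion above matches the standard 1-best MIRA (passive-aggressive) update, sketched here under several assumptions: the loss is taken to be 1 minus balanced F1 over bispans (the talk does not give the exact weighting of precision and recall), the step size is clipped at a hypothetical constant `C`, and features are sparse dictionaries.

```python
Features = dict[str, float]


def f1_loss(gold: set, guess: set) -> float:
    """1 - F1 over bispans (balanced F1 is an assumption)."""
    if gold == guess:
        return 0.0
    tp = len(gold & guess)
    if tp == 0:
        return 1.0
    precision = tp / len(guess)
    recall = tp / len(gold)
    return 1.0 - 2 * precision * recall / (precision + recall)


def mira_update(w: Features, phi_gold: Features, phi_guess: Features,
                loss: float, C: float = 1.0) -> Features:
    """Smallest change to w so that the gold outscores the guess by `loss`."""
    keys = set(phi_gold) | set(phi_guess)
    diff = {k: phi_gold.get(k, 0.0) - phi_guess.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0.0:
        return dict(w)  # identical feature vectors: nothing to update
    margin = sum(w.get(k, 0.0) * v for k, v in diff.items())
    # Step size: zero if the margin already covers the loss, clipped at C.
    tau = min(C, max(0.0, loss - margin) / norm_sq)
    new_w = dict(w)
    for k, v in diff.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w


# Toy example with hypothetical bispan labels and features.
loss = f1_loss({"a", "b"}, {"a", "c"})
w = mira_update({}, {"x": 1.0}, {"y": 1.0}, loss)
print(loss, w)
```

After the update in this toy case, the gold's score exceeds the guess's score by exactly the loss, which is the margin condition the slide describes.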

Inference: An ITG Parser [INFERENCE]
ITG captures some bispans

Coarse-to-Fine Approximation [INFERENCE]
Coarse pass: Features that are local to terminal productions
Fine pass: Agenda search using coarse pass as a heuristic
We use an agenda-based parser. It’s fast!

Experimental Setup [RESULTS]
Chinese-to-English newswire
Parallel corpus: 11.3 million words; sentence length ≤ 40
MT systems: Tuned and tested on NIST '04 and '05
Supervised data: 150 training & 191 test sentences (NIST '02)
Unsupervised model: Jointly trained HMM (Berkeley Aligner)

Baselines and Limited Systems [RESULTS]
HMM:
State-of-the-art unsupervised baseline
Joint training & competitive posterior decoding
Source of many features for supervised models
ITG:
Supervised ITG aligner with block terminals
State-of-the-art supervised baseline
Re-implementation of Haghighi et al., 2009
Coarse:
Supervised block ITG + possible alignments
Coarse pass of full extraction set model

Word Alignment Performance [RESULTS]

Extracted Bispan Performance [RESULTS]

Translation Performance (BLEU) [RESULTS]
Supervised conditions also included HMM alignments

Conclusions
Extraction set model directly learns what phrases to extract
The system performs well as an aligner or a rule extractor
Are segmentations always bad?
Idea: get overlap and multi-scale into the learning!

Thank you! nlp.cs.berkeley.edu

The Role of Possible Alignments [MODEL]
Role-equivalent word/phrase pairs: was discovered / 被发现 (gloss: passive marker, discover)
Language-specific function words: 过地球 (gloss: go over Earth) / over the Earth
31% 65%
If a word has no sure links, possible links define its span

Definition: Phrasal Extraction Set [MODEL]
In the past two years / 过去两年中 (gloss: past two year in)
Lexical type: the past two / 过去两
Position: 0-2, 1-4
[Figure: alignment grid, English positions 0 to 5, Chinese positions 0 to 4]
