Statistical NLP Spring 2010

Statistical NLP Spring 2010 Lecture 18: Phrase / Syntactic MT Dan Klein – UC Berkeley

Decoding First, consider word-to-word models Finding best alignments is easy Finding translations is hard (why?)

Bag “Generation” (Decoding)

Bag Generation as a TSP Imagine bag generation with a bigram LM Words are nodes Edge weights are P(w|w’) Valid sentences are Hamiltonian paths Not the best news for word-based MT (and it doesn’t get better with phrases) Example bag: { is, it, ., not, clear }
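The slide's point can be made concrete with a tiny brute-force bag generator (the bigram scores below are made-up toy values): every ordering of the bag is a Hamiltonian path through the word graph, and scoring all of them costs n! — exactly the TSP blow-up.

```python
import itertools

# Toy bigram log-probabilities log P(w | w_prev); "<s>"/"</s>" are
# sentence boundaries, and any unseen bigram gets a flat penalty.
BIGRAM_LOGP = {
    ("<s>", "it"): -0.5, ("it", "is"): -0.4, ("is", "not"): -0.9,
    ("not", "clear"): -0.7, ("clear", "."): -0.3, (".", "</s>"): -0.1,
}
PENALTY = -10.0

def score(order):
    """Bigram-LM score of one path (ordering) through the bag."""
    path = ["<s>"] + list(order) + ["</s>"]
    return sum(BIGRAM_LOGP.get(bigram, PENALTY)
               for bigram in zip(path, path[1:]))

def best_order(bag):
    """Brute force over all n! Hamiltonian paths through the word bag."""
    return max(itertools.permutations(bag), key=score)

print(best_order({"is", "it", ".", "not", "clear"}))
# → ('it', 'is', 'not', 'clear', '.')
```

With five words this is 120 paths; with a 20-word sentence it is ~2.4 × 10^18, which is why real decoders need pruning and approximations rather than exhaustive search.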

IBM Decoding as a TSP

Phrase-Based Systems Pipeline: sentence-aligned corpus → word alignments → phrase table (translation model). Example phrase-table entries:
cat ||| chat ||| 0.9
the cat ||| le chat ||| 0.8
dog ||| chien ||| 0.8
house ||| maison ||| 0.6
my house ||| ma maison ||| 0.9
language ||| langue ||| 0.9
…
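These `|||`-separated entries follow the common Moses-style phrase-table layout; a minimal parser for the single-score case shown above might look like this (a sketch — real tables carry several score fields per line):

```python
def parse_phrase_table(lines):
    """Parse 'src ||| tgt ||| score' lines into a dict: src -> [(tgt, prob)]."""
    table = {}
    for line in lines:
        src, tgt, score = (field.strip() for field in line.split("|||"))
        table.setdefault(src, []).append((tgt, float(score)))
    return table

lines = [
    "cat ||| chat ||| 0.9",
    "the cat ||| le chat ||| 0.8",
    "my house ||| ma maison ||| 0.9",
]
table = parse_phrase_table(lines)
print(table["cat"])  # [('chat', 0.9)]
```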

The Pharaoh “Model” [Koehn et al, 2003] Segmentation Translation Distortion

The Pharaoh “Model” Where do we get these counts?
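A common answer (sketched here, not Pharaoh's actual code) is relative-frequency estimation over the phrase pairs extracted from the word-aligned corpus:

```python
from collections import Counter

def phrase_probs(phrase_pairs):
    """phi(f|e) = count(e, f) / count(e), over extracted (e, f) phrase pairs."""
    pair_counts = Counter(phrase_pairs)
    src_counts = Counter(e for e, _ in phrase_pairs)
    return {(e, f): c / src_counts[e] for (e, f), c in pair_counts.items()}

# Toy extracted pairs (made-up data).
pairs = [("cat", "chat"), ("cat", "chat"), ("cat", "félin"),
         ("house", "maison")]
probs = phrase_probs(pairs)
print(probs[("cat", "chat")])  # 2/3
```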

Phrase-Based Decoding 这 7人 中包括 来自 法国 和 俄罗斯 的 宇航 员 . (gloss: “These seven people include astronauts from France and Russia.”) Decoder design is important: [Koehn et al. 03]

The Pharaoh Decoder Probabilities at each step include LM and TM

Hypothesis Lattices

Pruning Problem: easy partial analyses are cheaper Solution 1: use beams per foreign subset Solution 2: estimate forward costs (A*-like)
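A minimal sketch of stack decoding in this style (the toy `translate` and `lm_score` functions are assumptions, not Pharaoh's actual models), with one beam per number of covered source words so that cheap short analyses never crowd out longer ones:

```python
import heapq

def beam_decode(src_words, translate, lm_score, beam_size=10):
    """Stack decoding sketch: one beam ("stack") per count of covered
    source words; each extension covers one more source word."""
    n = len(src_words)
    # Hypothesis: (score, frozenset of covered source positions, output words)
    stacks = [[] for _ in range(n + 1)]
    stacks[0] = [(0.0, frozenset(), ())]
    for k in range(n):
        for score, covered, out in stacks[k]:
            for i, f in enumerate(src_words):
                if i in covered:
                    continue
                for e, tm_logp in translate(f):
                    stacks[k + 1].append((score + tm_logp + lm_score(out, e),
                                          covered | {i}, out + (e,)))
        # Solution 1 from the slide: prune each stack to the beam size.
        stacks[k + 1] = heapq.nlargest(beam_size, stacks[k + 1],
                                       key=lambda h: h[0])
    return max(stacks[n], key=lambda h: h[0])[2]

# Toy models (made-up numbers, just to exercise the decoder).
def translate(f):
    return {"le": [("the", -0.1)], "chat": [("cat", -0.2)]}[f]

def lm_score(out, e):
    prev = out[-1] if out else "<s>"
    return {("<s>", "the"): -0.1, ("the", "cat"): -0.2}.get((prev, e), -5.0)

print(beam_decode(["le", "chat"], translate, lm_score))  # ('the', 'cat')
```

Real phrase-based decoders also translate multi-word phrases and add a distortion cost; both are omitted here for brevity, as is the A*-style forward-cost estimate of Solution 2.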

Phrase Scoring Learning weights has been tried, several times: [Marcu and Wong, 02], [DeNero et al, 06], and others. It seems not to work well, for a variety of partially understood reasons. Main issue: big chunks get all the weight; obvious priors don’t help. Though see [DeNero et al 08]. Example: les chats aiment le poisson frais . ↔ cats like fresh fish .

Extraction Sets GIZA++ BITG ExSets [DeNero and Klein, in submission]

Phrase Size Phrases do help But they don’t need to be long Why should this be?

Lexical Weighting
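Lexical weighting scores a phrase pair using word-level translation probabilities and the phrase-internal alignment, smoothing the phrase table's sparse counts. A sketch of the standard formula (the toy probabilities are made up):

```python
def lex_weight(e_words, f_words, align, p_word):
    """p_lex(e|f, a) = prod_i (1/|{j: (i,j) in a}|) * sum_{j: (i,j) in a} p(e_i|f_j).
    English words with no alignment link are scored against NULL (None)."""
    weight = 1.0
    for i, e in enumerate(e_words):
        links = [j for (i2, j) in align if i2 == i]
        if links:
            weight *= sum(p_word(e, f_words[j]) for j in links) / len(links)
        else:
            weight *= p_word(e, None)  # NULL alignment
    return weight

# Toy word-translation model (hypothetical numbers).
P = {("cats", "chats"): 0.8, ("like", "aiment"): 0.7,
     ("fish", "poisson"): 0.9}
def p_word(e, f):
    return P.get((e, f), 0.01)

print(lex_weight(["cats", "like"], ["chats", "aiment"],
                 [(0, 0), (1, 1)], p_word))  # ≈ 0.8 * 0.7 = 0.56
```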

WSD? Remember when we discussed WSD? Word-based MT systems rarely have a WSD step Why not?

Syntax-Based MT

Translation by Parsing

Translation by Parsing

Compact Forests

Compact Forests

Compact Forests

Compact Forests

Learning MT Grammars

Extracting syntactic rules Extract rules (Galley et al. ’04, ’06)

Rules can... capture phrasal translation; reorder parts of the tree; traverse the tree without reordering; insert (and delete) words

Bad alignments make bad rules This isn’t very good, but let’s look at a worse example...

Sometimes they’re really bad One bad link makes a totally unusable rule!

Discriminative Block ITG Features φ(b0, s, s’), φ(b1, s, s’), φ(b2, s, s’) over aligned blocks b0, b1, b2 (figure: a Chinese–English block alignment, e.g. 近年 来 “in recent years”, 全 国 “the entire country”, 提醒 “warn”) [Haghighi, Blitzer, DeNero, and Klein, ACL 09]

Syntactic Correspondence Build a model 中文 EN

Synchronous Grammars? Now how do we talk about a model that has two corresponding trees? Synchronicity

Synchronous Grammars?

Synchronous Grammars?

Adding Syntax: Weak Synchronization Block ITG Alignment

Adding Syntax: Weak Synchronization Separate PCFGs

Adding Syntax: Weak Synchronization Get points for synchronization; not required

Weakly Synchronous Features
Parsing features: (IP, s), (NP, s), (VP, s); (S, s’), (NP, s’), (AP, s’), (VP, s’)
Alignment features: (b0, s, s’), (b1, s, s’), (b2, s, s’)
Agreement features: (IP, b0), (b0, S), (b1, NP), (IP, b0, S)

Weakly Synchronous Model [HBDK09]
Feature Type 1: Word Alignment (e.g. 办公室 ↔ “office”)
Feature Type 2: Monolingual Parser (e.g. the English PP “in the office”)
Feature Type 3: Agreement (e.g. an aligned PP pair)
Our model can do more: we can handle non-synchronicity.

Inference: Structured Mean Field Problem: summing over weakly aligned hypotheses is intractable. Solution: a factored approximation of the joint over the two trees and the alignment, set to minimize the divergence from the true distribution. Algorithm: initialize the factors, then iterate the updates until convergence.
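A sketch of the standard structured mean-field setup (the variable and factor names here are assumptions chosen to match the slides' trees and blocks):

```latex
% Intractable joint over source tree t, target tree t', alignment a:
\[
  p(t, t', a \mid s, s') \;\propto\; \exp\{\, \theta^{\top} \phi(t, t', a, s, s') \,\}
\]
% Structured mean-field approximation with independent factors:
\[
  q(t, t', a) \;=\; q_1(t)\, q_2(t')\, q_3(a),
\]
% chosen to minimize the KL divergence to the true posterior:
\[
  \min_{q_1, q_2, q_3} \; \mathrm{KL}\!\left( q_1\, q_2\, q_3 \,\middle\|\, p(t, t', a \mid s, s') \right).
\]
% Coordinate updates: hold two factors fixed and update the third, e.g.
\[
  q_1(t) \;\propto\; \exp \, \mathbb{E}_{q_2, q_3}\!\left[ \log p(t, t', a \mid s, s') \right],
\]
% iterated to convergence; each update reduces to a monolingual parsing
% or alignment computation with modified potentials.
```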

Results [Burkett, Blitzer, and Klein, NAACL 10]

Incorrect English PP Attachment

Corrected English PP Attachment

Improved Translations
Source: 目前 导致 飞机 相撞 的 原因 尚 不 清楚 , 当地 民航 部门 将 对此 展开 调查
Pinyin: mu4qian1 dao3zhi4 fei1ji1 xiang1zhuang4 DE yuan2yin1 shang4 bu4 qing1chu3 , dang1di4 min2hang2 bu4men2 jiang1 dui4ci3 zhan3kai1 diao4cha2
Gloss: currently cause plane crash DE reason yet not clear , local civil aeronautics bureau will toward open investigations
Reference: At this point the cause of the plane collision is still unclear. The local CAA will launch an investigation into this .
Baseline (GIZA++): The cause of planes is still not clear yet, local civil aviation department will investigate this .
Bilingual Adaptation Model: The cause of plane collision remained unclear, local civil aviation departments will launch an investigation .