2003 (c) University of Pennsylvania — Better MT Using Parallel Dependency Trees. Yuan Ding, University of Pennsylvania.



Outline
- Motivation
- The alignment algorithm: algorithm at a glance, the framework, heuristics
- Walking through an example
- Evaluation
- Conclusion

Motivation (1): Statistical MT Approaches
- Statistical MT approaches, pioneered by (Brown et al., 1990, 1993), leverage large training corpora and outperform traditional transfer-based approaches.
- Major criticism: no internal representation of syntax or semantics.

Motivation (2): Hybrid Approaches
- Hybrid approaches (Wu, 1997) (Alshawi et al., 2000) (Yamada and Knight, 2001, 2002) (Gildea, 2003) apply statistical learning to structured data.
- Problems with hybrid MT approaches: structural divergence (Dorr, 1994), and the vagaries of loose translations in real corpora.

Motivation (3)
- Holy grail: syntax-based MT that captures structural divergence.
- Accomplished work: a new approach to the alignment of parallel dependency trees (paper published at MT Summit IX), allowing non-isomorphism between the dependency trees.

We are here…


Define the Alignment Problem
- In natural language: find word mappings between the English and foreign sentences.
- In math: for each foreign word f_j (1 ≤ j ≤ m), find a label a_j ∈ {0, 1, …, l}, where a_j = i means f_j maps to the English word e_i (and a_j = 0 maps f_j to the empty word NULL).

The IBM Models
The IBM way:
- Model 1: word order doesn't matter, i.e. a "bag of words" model.
- Model 2: condition the probabilities on sentence length and word position.
- Models 3, 4, 5: (a) generate the fertility of each English word, (b) generate the identity of each foreign word, (c) generate its position — gradually adding positioning information.
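The Model 1 lexical probabilities that the framework's Step 1 relies on can be trained with a few lines of EM. A minimal sketch (names and data layout are illustrative, not the paper's code; a NULL token stands in for the empty English word):

```python
from collections import defaultdict

def ibm1_em(bitext, iterations=5):
    """Estimate IBM Model 1 lexical probabilities t(f|e) with EM.
    `bitext` is a list of (foreign_words, english_words) pairs."""
    # Uniform initialization over the foreign vocabulary.
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in bitext:
            es = ["<NULL>"] + es     # allow alignment to the empty word
            for f in fs:
                # E-step: distribute one count for f over all English candidates.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize t(f|e) = c(f, e) / c(e).
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t
```

Because Model 1 ignores word order, its likelihood has a single global maximum, which is what makes it a safe starting point for the iterative framework described next.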

Using Dependency Trees
- Positioning information can be acquired from parse trees. Parsers: (Collins, 1999) (Bikel, 2002).
- Problems with using parse trees directly: there are two types of nodes, and unlexicalized non-terminals control the domain.
- Using dependency trees instead: (Fox, 2002) found dependency trees have the best phrasal cohesion properties; (Xia, 2001) constructs dependency trees from parse trees using the Tree Adjoining Grammar.

The Framework (1)
- Step 1: train IBM Model 1 for lexical mapping probabilities.
- Step 2: find and fix high-confidence mappings according to a heuristic function h(f, e).
A pseudo-translation example: "The girl kissed her kitty cat" / "The girl gave a kiss to her cat".

The Framework (2)
- Step 3: partition the dependency trees on both sides with respect to the fixed mappings. Each fixed mapping creates one new "treelet", yielding a new set of parallel dependency structures.

The Framework (3)
- Step 4: go back to Step 1 unless enough nodes are fixed.
Algorithm properties:
- An iterative algorithm with time complexity O(n * T(h)), where T(h) is the time for the heuristic function in Step 2.
- P(f|e) in IBM Model 1 has a unique global maximum, so convergence is guaranteed.
- Results depend only on the heuristic function h(f, e).

Heuristics
Heuristic functions for Step 2. Objective: estimate the confidence of a mapping between a pair of words.
- First heuristic: entropy. Intuition: model the shape of the probability distribution.
- Second heuristic: inside-outside probability, an idea borrowed from PCFG parsing.
- Fertility threshold: rule out unlikely fertility ratios (> 2.0).
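As a sketch of the entropy heuristic and the fertility threshold (the exact formulas are not spelled out on the slide, so the reading here — a peaked, low-entropy lexical distribution t(·|e) signals a confident mapping — is an assumption, and all names are illustrative):

```python
import math

def entropy_h(e, t, f_vocab):
    """Entropy of the lexical distribution t(. | e) over the foreign
    vocabulary. Low entropy = peaked distribution = Model 1 is
    confident about what e translates to."""
    probs = [t.get((f, e), 0.0) for f in f_vocab]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def fertility_ok(f_treelet_size, e_treelet_size, limit=2.0):
    """Fertility threshold: reject a candidate mapping whose treelet
    size ratio exceeds the limit (> 2.0 on the slide)."""
    big = max(f_treelet_size, e_treelet_size)
    small = min(f_treelet_size, e_treelet_size)
    return big / small <= limit
```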


Walking through an Example (1)
[English] I have been here since 1947.
[Chinese] 1947 nian yilai wo yizhi zhu zai zheli.
Iteration 1: one dependency tree pair. Align "I" and "wo".

Walking through an Example (2)
Iteration 2: partition and form two treelet pairs. Align "since" and "yilai".

Walking through an Example (3)
Iteration 3: partition and form three treelet pairs. Align "1947" and "1947", "here" and "zheli".


Evaluation
- Training: LDC Xinhua newswire Chinese-English parallel corpus. Roughly 50% was filtered out; 60K+ sentence pairs were used, parsed on both sides.
- Evaluation: 500 sentence pairs provided by Microsoft Research Asia, word-aligned by hand.
- Metric: F-score, where A is the set of word pairs produced by the automatic alignment and G is the set of word pairs aligned in the gold file.
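The slide defines A and G but the formula itself was lost in the transcript; assuming the standard balanced F1 over those two sets, the metric is:

```python
def alignment_f_score(A, G):
    """F-score over word-pair sets.
    A: pairs produced by the automatic alignment.
    G: pairs aligned in the gold file."""
    if not A or not G:
        return 0.0
    correct = len(A & G)        # pairs the system got right
    if correct == 0:
        return 0.0
    precision = correct / len(A)
    recall = correct / len(G)
    return 2 * precision * recall / (precision + recall)
```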

Results (1)
- Results for IBM Model 1 to Model 4 (GIZA), bootstrapped from Model 1 up to Model 4.
- Signs of overfitting, suspected to be caused by the difference between genres in the training and testing data.
[Table: F-score by iteration for IBM Models 1-4]

Results (2)
- Results for our algorithm with heuristic h1 (entropy) and heuristic h2 (inside-outside probability). The table shows results after one iteration of M1 = IBM Model 1.
- The overfitting problem is mainly caused by violation of the partition assumption in fine-grained dependency structures.
[Table: F-score by M1 iteration for Model h1 and Model h2]


Conclusion
- A model based on partitioning sentences according to their dependency structure, without the unrealistic isomorphism assumption.
- Outperforms the unstructured IBM models on a large data set.
- "Orthogonal" to the IBM models: uses syntactic structure but no linear ordering information.

Thank You!