Presentation transcript:

PFA Node Alignment Algorithm Consider the parse trees of a Chinese-English parallel sentence pair.

PFA Node Alignment Algorithm Each node stores a value. All nodes are initialized with the value 1. Each word-to-word alignment is assigned a unique prime number.
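As a concrete illustration of this setup, here is a minimal Python sketch (the class and function names are illustrative, not from the slides) of a tree node that starts with the value 1, plus a helper that supplies one unique prime per word-to-word alignment link.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One parse-tree node; leaves carry the word itself as their label."""
    label: str                                        # category ("NP", "VP", ...) or the word
    children: List["Node"] = field(default_factory=list)
    value: int = 1                                    # every node is initialized with the value 1


def first_n_primes(n: int) -> List[int]:
    """Return the first n primes (2, 3, 5, ...), one per alignment link."""
    primes: List[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p != 0 for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes
```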

PFA Node Alignment Algorithm For every word-to-word alignment, we do the following: Let p be the unique prime value assigned to the alignment. Let w_s and w_t be the aligned words on the source and target side. Assign the value p to the nodes corresponding to the words w_s and w_t. Example: "Australia" gets value 2, "is" gets value 3.
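Continuing the sketch above (function and argument names are hypothetical), the per-link assignment step could look roughly like this:

```python
def assign_leaf_values(src_leaves, tgt_leaves, links):
    """links: list of (src_index, tgt_index) word-to-word alignment pairs.
    Each link gets its own prime p, and p is assigned to the two aligned leaves,
    e.g. "Australia" gets the value 2 and "is" gets the value 3."""
    for p, (i, j) in zip(first_n_primes(len(links)), links):
        src_leaves[i].value = p
        tgt_leaves[j].value = p
```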

PFA Node Alignment Algorithm In case there are "one-to-many" alignments, they are treated as multiple "one-to-one" alignments, and all of these alignments are given the same prime value. Example: "North Korea" is just one word on the Chinese side. That word is assigned the value 25, which is the product 5*5.
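A small variant of the assignment sketch above handles the one-to-many case: all links of such an alignment share one prime, and a leaf multiplies that prime in once per link it takes part in (again, a sketch with hypothetical names).

```python
def assign_leaf_values_grouped(src_leaves, tgt_leaves, link_groups):
    """link_groups: one group of (src_index, tgt_index) links per alignment unit,
    so a one-to-many alignment contributes a single group sharing one prime.
    Because every node starts at 1, multiplying also covers the one-to-one case;
    the Chinese word aligned to both "North" and "Korea" (prime 5) ends up
    with the value 5 * 5 = 25."""
    for p, group in zip(first_n_primes(len(link_groups)), link_groups):
        for i, j in group:
            src_leaves[i].value *= p
            tgt_leaves[j].value *= p
```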

PFA Node Alignment Algorithm Once all the lexical items have values, we propagate the values up the tree as follows: Work bottom-up. A node updates its value to the product of the values of its children. Values can become large!
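The bottom-up propagation described here can be sketched as a simple recursion over the Node class above (Python integers are arbitrary-precision, so the large values are not a practical problem in this sketch):

```python
def propagate_values(node: Node) -> int:
    """Work bottom-up: an internal node's value becomes the product of the
    values of its children; leaves keep the value assigned from the alignments."""
    if node.children:
        product = 1
        for child in node.children:
            product *= propagate_values(child)
        node.value = product
    return node.value
```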

PFA Node Alignment Algorithm Once all nodes have values, they can be aligned as follows: If a node on the Chinese side has the same value as a node on the English side, align them. If two nodes have equal values, take the node at the lowest level in the tree, but not a lexical-level node.
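One possible rendering of this matching step, under the extra assumption (not stated on the slides) that nodes with value 1, which dominate no aligned words, are skipped:

```python
def align_nodes(src_root: Node, tgt_root: Node):
    """Pair up source- and target-side nodes whose values match.  When several
    non-lexical nodes on one side carry the same value (e.g. a unary chain),
    keep the lowest one in the tree, as the slide prescribes."""

    def lowest_node_per_value(root: Node) -> dict:
        best = {}                       # value -> (node, depth)
        stack = [(root, 0)]
        while stack:
            node, depth = stack.pop()
            # Only internal nodes are candidates ("not the lexical-level node");
            # skipping value-1 nodes is this sketch's own assumption.
            if node.children and node.value > 1:
                if node.value not in best or depth > best[node.value][1]:
                    best[node.value] = (node, depth)
            stack.extend((child, depth + 1) for child in node.children)
        return best

    src_best = lowest_node_per_value(src_root)
    tgt_best = lowest_node_per_value(tgt_root)
    return [(src_best[v][0], tgt_best[v][0]) for v in src_best if v in tgt_best]
```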

PFA Node Alignment Algorithm Features of the algorithm: 1. The order of the constituents does not matter for node alignment. 2. Extra words in constituents are allowed, but only the smallest possible number of them (since the lowest matching node is chosen).

PFA Node Alignment Algorithm Extraction of Phrases: Get the yields of the aligned nodes and build a phrase table tagged with syntactic categories on the source and target sides! Example: NP # NP :: 澳洲 # Australia
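The extraction step then just reads off the yields of the aligned node pairs; a sketch reusing the Node class from above (names are illustrative):

```python
def node_yield(node: Node) -> str:
    """Return the surface string spanned by a node, left to right."""
    if not node.children:
        return node.label               # leaves carry the word as their label
    return " ".join(node_yield(child) for child in node.children)


def extract_phrases(aligned_pairs):
    """Turn aligned node pairs into phrase-table entries of the form
    SRC-CAT # TGT-CAT :: source yield # target yield,
    e.g.  NP # NP :: 澳洲 # Australia"""
    return [f"{src.label} # {tgt.label} :: {node_yield(src)} # {node_yield(tgt)}"
            for src, tgt in aligned_pairs]
```

Applied to the aligned nodes of the example tree pair, this would produce entries like those listed on the next slide.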

PFA Node Alignment Algorithm All phrases from this tree:
1. IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries that have diplomatic relations with North Korea.
2. VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have diplomatic relations with North Korea
3. NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic relations with North Korea
4. VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea
5. NP # NP :: 邦交 # diplomatic relations
6. NP # NP :: 北韩 # North Korea
7. NP # NP :: 澳洲 # Australia

PFA Node Alignment Performance If the data is manually word-aligned, the word-alignment error rate is very small, and so is the PFA node-alignment error rate. What happens when word alignments are produced automatically?

PFA Node Alignment Performance Evaluation data: a parallel Chinese-English Treebank with manual word alignments, 3342 sentence pairs. Node alignments: about 12 per tree pair. NP-to-NP alignments: 5427 (makes a good phrase table!). With the manual alignments as the gold standard, evaluation is done with automatic word alignments.
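The evaluation against the manual gold standard reduces to standard precision and recall over node-alignment pairs; a minimal sketch (the function and argument names are mine, not from the slides):

```python
def precision_recall(predicted, gold):
    """predicted, gold: sets of (source_node_id, target_node_id) alignment pairs.
    Precision = correct / |predicted|, recall = correct / |gold|."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```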

PFA Node Alignment Performance Viterbi word alignments from the Chinese-English and English-Chinese directions were merged using different combination strategies, and node-alignment precision and recall were measured for each strategy: Intersection, Union, Sym-1 (Thot Toolkit), Sym-2 (Thot Toolkit), and Grow-Diag-Final (Pharaoh).