PFA Node Alignment Algorithm Consider the parse trees of a Chinese-English parallel pair of sentences.
PFA Node Alignment Algorithm Each node stores a value. All nodes are initialized with the value 1. Each word-to-word alignment is assigned a unique prime number.
PFA Node Alignment Algorithm For every word-to-word alignment, we do the following: Let p be the unique prime value assigned to the alignment. Let w_s and w_t be the aligned words on the source and target sides. Assign the value p to the nodes corresponding to the words w_s and w_t. Example: “Australia” gets value 2, “is” gets value 3.
PFA Node Alignment Algorithm “One-to-many” alignments are treated as multiple “one-to-one” alignments, and all of these alignments are given the same prime value. Example: “North Korea” is just one word on the Chinese side. With the prime 5 shared by both alignments, that word is assigned the value 25, which is the product 5*5.
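The value assignment described so far can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `primes` and `assign_leaf_values`, and the representation of an alignment as a pair of index lists, are assumptions made for the example.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (enough for sentence-sized inputs)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def assign_leaf_values(src_words, tgt_words, alignments):
    """Give every word a value from its word alignments.

    alignments is a list of (source_indices, target_indices) groups
    (a hypothetical input format). Each group receives one prime. A
    one-to-many group is expanded into one-to-one links that share the
    prime, so a word's value is the product over its links.
    Unaligned words keep the initial value 1.
    """
    src_vals = [1] * len(src_words)
    tgt_vals = [1] * len(tgt_words)
    gen = primes()
    for src_idx, tgt_idx in alignments:
        p = next(gen)
        for s in src_idx:
            for t in tgt_idx:
                src_vals[s] *= p
                tgt_vals[t] *= p
    return src_vals, tgt_vals


src = ["澳洲", "是", "北韩"]
tgt = ["Australia", "is", "North", "Korea"]
links = [([0], [0]), ([1], [1]), ([2], [2, 3])]
print(assign_leaf_values(src, tgt, links))  # ([2, 3, 25], [2, 3, 5, 5])
```

The shortened word lists reproduce the values from the slides: 2 for “Australia”, 3 for “is”, and 25 for the single Chinese word aligned to “North Korea”.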
PFA Node Alignment Algorithm Once all the lexical items have values, we propagate the values up the tree, working bottom-up: each node updates its value to the product of the values of its children. Values could become large!
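A possible sketch of the bottom-up step, assuming a simple `Node` class (a hypothetical structure introduced only for this example). Python integers have arbitrary precision, so large products do not overflow here; an implementation in a language with fixed-width integers would need a big-integer type or another encoding.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    value: int = 1  # leaves carry the value set from the word alignments


def propagate(node):
    """Post-order traversal: an internal node becomes the product of its children."""
    if node.children:
        node.value = prod(propagate(child) for child in node.children)
    return node.value


np_obj = Node("NP", [Node("North", value=5), Node("Korea", value=5)])
vp = Node("VP", [Node("is", value=3), np_obj])
root = Node("S", [Node("NP", [Node("Australia", value=2)]), vp])
propagate(root)
print(np_obj.value, vp.value, root.value)  # 25 75 150
```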
PFA Node Alignment Algorithm Once all nodes have values, they can be aligned as follows: If a node on the Chinese side has the same value as a node on the English side, align them. If several nodes on one side share that value, take the one at the lowest level in the tree, excluding lexical-level nodes.
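The matching rule can be sketched as a lookup on node values. The tuple format `(name, value, depth, is_lexical)` and the function names are assumptions for illustration. Two further assumptions are marked in the comments: "lexical level" is taken to mean the word nodes themselves, and nodes whose value is still 1 (nothing aligned beneath them) are skipped.

```python
def lowest_by_value(nodes):
    """Map each value to the deepest non-lexical node that carries it.

    nodes: iterable of (name, value, depth, is_lexical), depth 0 = root.
    """
    best = {}
    for name, value, depth, is_lexical in nodes:
        # Assumption: word nodes are "lexical"; value 1 means no aligned words below.
        if is_lexical or value == 1:
            continue
        if value not in best or depth > best[value][0]:
            best[value] = (depth, name)
    return {value: name for value, (depth, name) in best.items()}


def align_nodes(src_nodes, tgt_nodes):
    """Pair source and target nodes whose values are equal."""
    src = lowest_by_value(src_nodes)
    tgt = lowest_by_value(tgt_nodes)
    return sorted((src[v], tgt[v]) for v in src.keys() & tgt.keys())


src_nodes = [("IP", 150, 0, False), ("NP-a", 2, 1, False), ("VP", 75, 1, False),
             ("NP-b", 25, 2, False), ("北韩", 25, 3, True)]
tgt_nodes = [("S", 150, 0, False), ("NP-1", 2, 1, False), ("VP", 75, 1, False),
             ("NP-2", 25, 2, False), ("North", 5, 3, True), ("Korea", 5, 3, True)]
print(align_nodes(src_nodes, tgt_nodes))
```

Because multiplication is commutative, a node's value does not depend on the order of its children, which is why reordered constituents can still be matched this way.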
PFA Node Alignment Algorithm Features of the algorithm: 1. The order of the constituents does not matter in node alignment. 2. Extra words in constituents are allowed, but their number is kept to a minimum.
PFA Node Alignment Algorithm Extraction of Phrases: Take the yields of the aligned nodes and build a phrase table tagged with the syntactic categories on the source and target sides. Example: NP # NP :: 澳洲 # Australia
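A small sketch of how one phrase-table entry could be produced from an aligned node pair. The tree encoding (a `(label, children)` tuple for internal nodes, a plain string for a word), the POS labels in the example, and the function names are assumptions for illustration. The output format follows the example on the slide.

```python
def tree_yield(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):  # a plain string is a word
        return [tree]
    _, children = tree
    return [word for child in children for word in tree_yield(child)]


def phrase_entry(src_node, tgt_node):
    """Format 'SRC_LABEL # TGT_LABEL :: source yield # target yield'."""
    return "{} # {} :: {} # {}".format(
        src_node[0], tgt_node[0],
        " ".join(tree_yield(src_node)), " ".join(tree_yield(tgt_node)))


src_np = ("NP", [("NR", ["澳洲"])])
tgt_np = ("NP", [("NNP", ["Australia"])])
print(phrase_entry(src_np, tgt_np))  # NP # NP :: 澳洲 # Australia
```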
PFA Node Alignment Algorithm All phrases from this tree:
1. IP # S :: 澳洲 是 与 北韩 有 邦交 的 少数 国家 之一 。 # Australia is one of the few countries that have diplomatic relations with North Korea.
2. VP # VP :: 是 与 北韩 有 邦交 的 少数 国家 之一 # is one of the few countries that have diplomatic relations with North Korea
3. NP # NP :: 与 北韩 有 邦交 的 少数 国家 之一 # one of the few countries that have diplomatic relations with North Korea
4. VP # VP :: 与 北韩 有 邦交 # have diplomatic relations with North Korea
5. NP # NP :: 邦交 # diplomatic relations
6. NP # NP :: 北韩 # North Korea
7. NP # NP :: 澳洲 # Australia
PFA Node Alignment Performance If the data is manually word-aligned, the word alignment error rate is very small, and so is the PFA node-alignment error rate. What happens when the word alignments are produced automatically?
PFA Node Alignment Performance Evaluation data: Treebank corpus.
– Parallel Chinese-English Treebank with manual word alignments
– 3342 sentence pairs
Node alignments: (about 12 per tree pair)
NP-to-NP alignments: 5427
– (Makes a good phrase table!)
With the manual alignments as the gold standard, evaluation is done with automatic word alignments.
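Scoring against a gold standard can be sketched as set-based precision and recall over node-alignment pairs. The function name and the pair representation are assumptions for illustration. The slides do not spell out how the scores were computed, so this uses the standard definitions.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted node alignments against gold ones.

    Both arguments are collections of hashable (source_node, target_node) pairs.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {("IP", "S"), ("VP", "VP"), ("NP-a", "NP-1"), ("NP-b", "NP-2")}
predicted = {("IP", "S"), ("VP", "VP"), ("NP-a", "NP-2")}
print(precision_recall(predicted, gold))  # precision 2/3, recall 0.5
```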
PFA Node Alignment Performance Viterbi word alignments from the Chinese-English and reverse directions were merged using different algorithms to test the performance of node alignment.
Table columns: Viterbi Combination Strategy, Precision, Recall
Strategies compared: Intersection; Union; Sym-1 (Thot Toolkit); Sym-2 (Thot Toolkit); Grow-Diag-Final (Pharaoh)
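The two simplest combination strategies, intersection and union, can be sketched with set operations. The function name `combine` is hypothetical. The sketch assumes both directional alignments are already expressed as `(source_index, target_index)` links in the same orientation. Sym-1, Sym-2, and Grow-Diag-Final are heuristics from the named toolkits and are not reproduced here.

```python
def combine(src_to_tgt, tgt_to_src, strategy):
    """Merge two directional word alignments given as sets of (src, tgt) links."""
    a, b = set(src_to_tgt), set(tgt_to_src)
    if strategy == "intersection":
        return a & b  # only links both directions agree on
    if strategy == "union":
        return a | b  # every link proposed by either direction
    raise ValueError("unsupported strategy: " + strategy)


forward = {(0, 0), (1, 1), (2, 2)}
backward = {(0, 0), (1, 1), (2, 3)}
print(sorted(combine(forward, backward, "intersection")))  # [(0, 0), (1, 1)]
print(sorted(combine(forward, backward, "union")))  # [(0, 0), (1, 1), (2, 2), (2, 3)]
```

Intersection keeps fewer links and union keeps more, so the two can be expected to lead to different node alignments.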