ACL 2005 WORKSHOP ON BUILDING AND USING PARALLEL TEXTS (WPT-05), Ann Arbor, MI. June 2005
Competitive Grouping in Integrated Segmentation and Alignment Model
Ying Zhang, Stephan Vogel
Language Technologies Institute, School of Computer Science, Carnegie Mellon University

Integrated Segmentation and Alignment Model
Phrase alignment models (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003)
  –Many of these models rely on a pre-calculated word alignment.
  –They use different heuristics to extract phrase pairs from the Viterbi word alignment path.
Integrated Segmentation and Alignment (ISA) model (Zhang 2003)
  –No such word alignment is needed
  –Segments source and target sentences into phrases and aligns them simultaneously
  –Uses chi-square(f, e) instead of the conditional probability P(f|e) for word pair associations
  –Greedy search for phrase pairs
  –Key idea: competitive grouping algorithm
  –Inspired by the competitive linking algorithm (Melamed 1997) for word alignment

Competitive Linking Algorithm
A greedy word alignment algorithm.
The word pair with the highest likelihood L(f, e) “wins” the competition.
One-to-one assumption: once the pair {f, e} is “linked”, neither f nor e can be aligned with any other word.
Example:
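
The greedy linking step described on this slide can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `competitive_linking`, the dictionary layout, and the toy likelihood values are invented here and do not come from the slides or from Melamed's implementation. It makes a single greedy pass and leaves out any thresholding or re-estimation.

```python
def competitive_linking(scores):
    """Greedy one-to-one word alignment.

    scores -- dict mapping (f, e) word pairs to a likelihood L(f, e)
              (hypothetical input format chosen for this sketch)
    Returns the list of linked pairs in the order they won.
    """
    links = []
    used_f, used_e = set(), set()
    # Visit candidate pairs from highest to lowest likelihood.
    for (f, e), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        # One-to-one assumption: skip pairs whose words are already linked.
        if f in used_f or e in used_e:
            continue
        links.append((f, e))
        used_f.add(f)
        used_e.add(e)
    return links


# Toy likelihood values, made up purely for illustration.
toy_scores = {
    ("la", "the"): 9.0,
    ("la", "session"): 1.0,
    ("session", "the"): 2.0,
    ("session", "session"): 8.0,
}
links = competitive_linking(toy_scores)
```

With these toy values the pair (la, the) wins first, which removes (la, session) and (session, the) from the competition, so (session, session) is linked next.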

Competitive Grouping Algorithm
Discard the one-to-one assumption of competitive linking, making it less greedy.
When a pair {e, f} wins the competition, it invites the neighboring pairs to join the “winner’s club”.
Locality assumption: a source phrase of adjacent words can only be aligned to a target phrase of adjacent words.
  –Words inside an aligned phrase pair cannot be aligned to other words

Expanding the Aligned Phrase Pair
Two criteria have to be satisfied to expand the seed word pair into a phrase pair:
1. If a new source word f is to be grouped, the best e that f is associated with should not be “blocked” by this expansion; similarly for grouping a new target word.
2. The highest word pair likelihood value in the expanded area needs to be “similar” to the seed value.
According to the locality assumption, words in aligned phrase pairs cannot be aligned with other words again.
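
One possible reading of the two expansion criteria is sketched below. Everything specific in it is an assumption made for illustration rather than the authors' implementation: the function name `expand_seed`, the `ratio` parameter used to define "similar", and the toy matrix. Criterion 1 is interpreted as "the new word's best partner must already lie inside the phrase on the other side", since otherwise grouping the word would cut it off from that partner. The sketch handles a single seed and ignores words consumed by earlier groupings.

```python
def expand_seed(L, seed, ratio=0.5):
    """Grow a rectangular phrase pair around a seed word pair.

    L     -- likelihood matrix, L[i][j] for source word i and target word j
    seed  -- (i, j) position of the winning word pair
    ratio -- hypothetical threshold: a new row or column counts as
             "similar" if its best value is at least ratio * seed value
    Returns ((src_lo, src_hi), (tgt_lo, tgt_hi)), inclusive spans.
    """
    n_src, n_tgt = len(L), len(L[0])
    s_lo = s_hi = seed[0]
    t_lo = t_hi = seed[1]
    seed_val = L[seed[0]][seed[1]]

    def source_word_ok(i):
        # Criterion 1 (as interpreted here): the best target word for
        # source word i must fall inside the current target span.
        best_j = max(range(n_tgt), key=lambda j: L[i][j])
        if not (t_lo <= best_j <= t_hi):
            return False
        # Criterion 2: best value in the newly added cells is close to the seed.
        return max(L[i][t_lo:t_hi + 1]) >= ratio * seed_val

    def target_word_ok(j):
        best_i = max(range(n_src), key=lambda i: L[i][j])
        if not (s_lo <= best_i <= s_hi):
            return False
        return max(L[i][j] for i in range(s_lo, s_hi + 1)) >= ratio * seed_val

    grown = True
    while grown:
        grown = False
        # Locality assumption: only words adjacent to the span are candidates.
        for i in (s_lo - 1, s_hi + 1):
            if 0 <= i < n_src and source_word_ok(i):
                s_lo, s_hi = min(s_lo, i), max(s_hi, i)
                grown = True
        for j in (t_lo - 1, t_hi + 1):
            if 0 <= j < n_tgt and target_word_ok(j):
                t_lo, t_hi = min(t_lo, j), max(t_hi, j)
                grown = True
    return (s_lo, s_hi), (t_lo, t_hi)


# Toy likelihood matrix, made up purely for illustration.
toy_L = [
    [2.0, 0.5, 0.1],
    [0.3, 9.0, 6.0],
    [0.2, 7.0, 3.0],
]
loose = expand_seed(toy_L, (1, 1), ratio=0.5)
strict = expand_seed(toy_L, (1, 1), ratio=0.9)
```

In this toy case the looser threshold grows the seed into a two-by-two phrase pair, while the stricter one leaves the seed as a single word pair, so the threshold acts as a granularity control in this sketch.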

Exploring All Possible Phrase Pairs
Criterion 2 controls the granularity of the aligned phrase pairs:
  –Two short phrase pairs
  –Or one long phrase pair
Short phrases give better coverage of unseen test data.
Long phrases encapsulate more context, e.g. local reordering and word sense.
It is hard to decide on the optimal granularity without knowing the test data.
Solution: for each grouping, try all possible granularities.

Exploring All Possible Phrase Pairs
French: Je déclare reprise la session
English: I declare resumed the session

The Likelihood of Word Associations
The chi-square statistic is used to measure the likelihood of word association for a pair {e, f}.
For each word pair {e, f}, the null hypothesis is that e and f are independent of each other.
Calculate chi-square(f, e) to measure how well this hypothesis holds.
Construct the contingency table using counts from the corpus given the current alignment, e.g. a uniform alignment:
  –O11: number of times e and f are aligned
  –O12: number of times e is aligned with another f
  –O21: number of times f is aligned with another e
  –O22: number of times another f is aligned with another e

        f      ~f
  e     O11    O12
  ~e    O21    O22
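
The slide does not show the formula, so the sketch below assumes the standard Pearson chi-square statistic for a 2x2 contingency table, N(O11·O22 − O12·O21)² divided by the product of the row and column totals. The function name `chi_square` and the example counts are made up for illustration.

```python
def chi_square(o11, o12, o21, o22):
    """Pearson chi-square statistic for a 2x2 contingency table.

    o11 -- times e and f are aligned
    o12 -- times e is aligned with another f
    o21 -- times f is aligned with another e
    o22 -- times another f is aligned with another e
    """
    n = o11 + o12 + o21 + o22
    # Product of the two row totals and the two column totals.
    denom = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


# Illustrative counts only.
strong = chi_square(10, 0, 0, 10)   # e and f always co-occur
none = chi_square(5, 5, 5, 5)       # counts match independence exactly
```

A larger value is stronger evidence against the independence hypothesis, which is why it can serve as the word pair likelihood in the competition.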

In WPT-05
Submitted results for all four languages
Training data as provided
Language model as provided
Decoder (Pharaoh) as provided

  BLEU        German   Spanish   Finnish   French
  Dev-test    18.63    26.20     12.88     26.20
  Test        18.93    26.14     12.66     26.71

Conclusion
The competitive grouping algorithm is at the core of the ISA model
Simple and efficient model
Results comparable to other phrase alignment models

The Evolution of ISA

Matrix of the Likelihood

Expanding the Phrase Pairs