A Lightweight and High Performance Monolingual Word Aligner
Xuchen Yao, Benjamin Van Durme (Johns Hopkins), Chris Callison-Burch (UPenn) and Peter Clark (Vulcan)
ACL 2013, Sofia
Monolingual word alignment
Aligning one sentence pair from RTE2:
  Premise: Linda Johnson, who lives with her husband, Charles, and two cats in..., said Katrina has...
  Hypothesis: Linda Johnson is married to Charles
Alignment contributed by Brockett (2007)
Monolingual vs. bilingual alignment
- less training data (labeled or unlabeled), but more lexical resources
- semantic relatedness: cued by distributional word similarities
- the same grammar shared by source/target sentences
A discriminative model first proposed by Blunsom and Cohn (2006):
- s, t: source sentence (observation), target sentence
- a: target word indices (0 to target length); state 0 is the NULL state, used for deletion
- f(): feature functions
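For reference, a linear-chain conditional random field over these symbols is usually written in the form below. This is the standard CRF form and only a sketch of what the slide refers to: the weights \lambda_k, the normalizer Z(s, t), and the indexing of a by source position i (so a_i is the state chosen for the i-th source word) are notation assumed here, not taken from the slide.

```latex
p(\mathbf{a} \mid \mathbf{s}, \mathbf{t})
  = \frac{\exp\left( \sum_{i} \sum_{k} \lambda_k \, f_k(a_{i-1}, a_i, \mathbf{s}, \mathbf{t}) \right)}
         {Z(\mathbf{s}, \mathbf{t})}
```

Here Z(s, t) sums the same exponentiated score over all possible state sequences a, so that the probabilities sum to one.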
Figure: desired Viterbi decoding path
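To make the decoding step concrete, here is a minimal Viterbi sketch in Python. It assumes each source word picks one state (a target word index, or 0 for NULL) and that per-word scores and state-to-state scores are already available; the names `viterbi_align`, `EMIT` and `TRANS`, the toy sentences and all score values are hypothetical and hand-set for illustration. It is not the jacana-align implementation.

```python
from typing import List


def viterbi_align(emit: List[List[float]], trans: List[List[float]]) -> List[int]:
    """Return the highest-scoring state sequence, one state per source word.

    emit[i][j]  : score of aligning source word i to state j (state 0 = NULL).
    trans[p][j] : score of moving from state p to state j.
    """
    n_states = len(emit[0])
    # best[i][j]: best score of any path that ends in state j at source word i
    best = [list(emit[0])]
    back = []  # back[i-1][j]: best previous state for state j at source word i
    for i in range(1, len(emit)):
        row, ptr = [], []
        for j in range(n_states):
            p_best = max(range(n_states), key=lambda p: best[i - 1][p] + trans[p][j])
            row.append(best[i - 1][p_best] + trans[p_best][j] + emit[i][j])
            ptr.append(p_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: best[-1][j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (hand-set scores, purely illustrative).
# Source words: Linda, lives, with, Charles
# States: 0 = NULL, 1 = Linda, 2 = married, 3 = Charles
EMIT = [
    [0.0, 2.0, -1.0, -1.0],   # Linda
    [0.5, -1.0, -0.5, -1.0],  # lives
    [0.5, -1.0, -1.0, -1.0],  # with
    [0.0, -1.0, -1.0, 2.0],   # Charles
]
# Small penalty for non-monotone jumps between non-NULL states.
TRANS = [
    [0.0 if p == 0 or j == 0 else -0.1 * abs(j - p - 1) for j in range(4)]
    for p in range(4)
]

print(viterbi_align(EMIT, TRANS))
```

In this toy setup the decoded path sends "Linda" to state 1, "lives" and "with" to NULL, and "Charles" to state 3.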
Features
- string similarity: Jaro-Winkler, Dice-Sorensen, Hamming, Jaccard, Levenshtein, n-gram overlap and common prefix matching
- POS tag matching
- WordNet relations: hypernym, hyponym, synonym, derived form, entailing, causing, members of, have member, substances of, have substances, parts of, have part
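A few of the string similarity measures listed above can be sketched with the Python standard library alone. The function names, the use of character bigrams for Jaccard and Dice, and the length normalization of Levenshtein are assumptions made for this illustration; the slides do not say how jacana-align computes or scales these features.

```python
from typing import Dict, Set


def char_ngrams(word: str, n: int = 2) -> Set[str]:
    """Set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard coefficient over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return len(x & y) / len(x | y) if (x | y) else 0.0


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice-Sorensen coefficient over character n-gram sets."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(x & y) / (len(x) + len(y)) if (x or y) else 0.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two words share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def string_features(src: str, tgt: str) -> Dict[str, float]:
    """Bundle the measures into one feature dictionary for a word pair."""
    longest = max(len(src), len(tgt)) or 1
    return {
        "jaccard": jaccard(src, tgt),
        "dice": dice(src, tgt),
        "levenshtein_sim": 1.0 - levenshtein(src, tgt) / longest,
        "common_prefix": float(common_prefix_len(src, tgt)),
    }


print(string_features("marry", "married"))
```

Each measure gives the model a different view of surface similarity, so a word pair such as "marry"/"married" can score high on prefix matching even when its edit distance is not zero.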
Features (continued)
- positional: offset difference between source and target word
- context: whether neighboring words are similar (helps to align function words)
- distortion (Markov feature): how far apart two aligned target words are
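One possible reading of these three feature families, sketched in Python. The exact definitions are assumptions: relative (length-normalized) positions for the offset, exact lowercase match as the neighbor similarity test, and absolute index difference for distortion. All function names are hypothetical.

```python
from typing import List, Tuple


def positional_offset(i: int, j: int, src_len: int, tgt_len: int) -> float:
    """Difference between the relative positions of source word i and
    target word j (both 0-based word indices)."""
    return abs(i / src_len - j / tgt_len)


def neighbors_match(src: List[str], tgt: List[str], i: int, j: int) -> Tuple[bool, bool]:
    """Whether the left and right neighbors of src[i] and tgt[j] are the same
    word, ignoring case. Assumption: exact match stands in for 'similar'."""
    left = i > 0 and j > 0 and src[i - 1].lower() == tgt[j - 1].lower()
    right = (
        i + 1 < len(src)
        and j + 1 < len(tgt)
        and src[i + 1].lower() == tgt[j + 1].lower()
    )
    return left, right


def distortion(prev_state: int, state: int) -> int:
    """Distance between two consecutively aligned target positions.
    States are 1-based target indices; 0 is NULL and contributes no distortion."""
    if prev_state == 0 or state == 0:
        return 0
    return abs(state - prev_state)


src = ["the", "cat", "sat"]
tgt = ["a", "cat", "sat"]
print(positional_offset(1, 1, len(src), len(tgt)))
print(neighbors_match(src, tgt, 1, 1))
print(distortion(1, 3))
```

The context feature is what lets a function word such as "the" align correctly when it occurs several times: string similarity is identical for every occurrence, but the neighbors differ.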
Implementation: jacana-align
- source code at ...
- lightweight: uses only a POS tagger and WordNet
- written in Scala, optimized with L-BFGS
- platform independent: compiles to a .jar file, fully interoperable with Java
- high performance? -> evaluation
Baselines
- GIZA++
- Tree Edit Distance (with stem/WordNet matching)
- MANLI: MacCartney, B., Galley, M. & Manning, C. D. A Phrase-Based Alignment Model for Natural Language Inference. EMNLP 2008.
- MANLI-constraint (decoding with ILP): Thadani, K. & McKeown, K. Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment. ACL 2011.
Performance in F1 (chart; callout: 10.3%)
Performance in F1 (chart; callouts: 0.8%, 3.3%)
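The slides report F1 without spelling out how it is computed. A common convention for alignment evaluation, assumed here, treats each alignment as a set of (source index, target index) links and scores predicted links against gold links. The function name and the toy link sets are hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_prf(gold: Set[Link], predicted: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of predicted alignment links against gold links."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = {(0, 0), (3, 2)}
predicted = {(0, 0), (1, 1), (3, 2)}
print(alignment_prf(gold, predicted))
```

In the toy case one of the three predicted links is spurious, so precision drops to 2/3 while recall stays at 1.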
Performance in speed (seconds per sentence)

corpus   sentence pair length   MANLI-approx.   MANLI-exact   jacana-align
RTE2     29/11                  1.67s           0.08s         0.025s
FUSION   27/...                 ...s            2.45s         0.096s

- when sentences are more balanced, jacana-align is about 20x faster
- the speed of jacana-align is not as sensitive to sentence length increase (callouts: 30x for MANLI-exact, 4x for jacana-align)
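The ratios behind these callouts can be checked directly from the timings that survive in the table. The sketch below uses only the MANLI-exact and jacana-align columns; the dictionary layout and variable names are illustrative, and reading the 30x and 4x callouts as the RTE2-to-FUSION slowdown of each system is an interpretation that happens to match the arithmetic.

```python
# Seconds per sentence, copied from the table above.
SECONDS = {
    "RTE2": {"MANLI-exact": 0.08, "jacana-align": 0.025},
    "FUSION": {"MANLI-exact": 2.45, "jacana-align": 0.096},
}

# How much slower each system gets when moving from RTE2 to FUSION.
slowdown = {
    system: SECONDS["FUSION"][system] / SECONDS["RTE2"][system]
    for system in ("MANLI-exact", "jacana-align")
}

# How much faster jacana-align is than MANLI-exact on each corpus.
speedup = {
    corpus: times["MANLI-exact"] / times["jacana-align"]
    for corpus, times in SECONDS.items()
}

for system, factor in slowdown.items():
    print(f"{system}: {factor:.1f}x slower on FUSION than on RTE2")
for corpus, factor in speedup.items():
    print(f"{corpus}: jacana-align is {factor:.1f}x faster than MANLI-exact")
```

The slowdown factors come out at roughly 30.6 for MANLI-exact and 3.8 for jacana-align. The per-corpus speedup over MANLI-exact is about 3.2 on RTE2 and about 25.5 on FUSION.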
Conclusion
- state-of-the-art monolingual word aligner
  - in accuracy
  - in speed
- open source, use it and hack it!
Thank you (with a demo)