
1 From Paraphrase Database to Compositional Paraphrase Model and Back. John Wieting, University of Illinois. Joint work with Mohit Bansal, Kevin Gimpel, Karen Livescu, and Dan Roth.

2 Motivation: The PPDB (Ganitkevitch et al., 2013) is a vast collection of paraphrase pairs, for example:
that allow the | which enable the
be given the opportunity to | have the possibility of
i can hardly hear you. | you 're breaking up.
and the establishment | as well as the development
laying the foundations | pave the way
making every effort | to do its utmost
…

3 Motivation: (1) improve coverage, (2) have a parametric model, (3) improve phrase pair scores.

4 Contributions:
- Powerful word embeddings with human-level performance on SimLex-999 and WordSim-353
- Phrase embeddings; our model can re-rank the phrase pairs in PPDB 1.0, improving human correlation from 25 to 52 ρ
- A parameterization of PPDB that can be used downstream
- New datasets

5 Datasets: We wanted a clean way to evaluate paraphrase composition, so we built two new datasets: one for bigram paraphrases and one for short-phrase paraphrases from PPDB.

6-7 Similarity datasets organized along two axes: topical vs. paraphrastic, and words vs. bigrams. Topical word similarity: WordSim-353. Paraphrastic word similarity: SimLex-999. Topical bigram similarity: MLSim (Mitchell and Lapata, 2010). Paraphrastic bigram similarity: MLPara (this talk). Example bigram pairs (MLSim score | MLPara score):
television programme | tv set | 5.8 | 1.0
training programme | education course | 5.7 | 5.0
bedroom window | education officer | 1.3 | 1.0

8 Inter-annotator agreement for MLPara (Spearman's ρ | Cohen's κ):
adjective noun | 0.87 | 0.79
noun noun | 0.64 | 0.58
verb noun | 0.73

9 Extending the grid from words and bigrams to phrases: the paraphrastic phrase dataset is AnnoPPDB (this talk).

10 AnnoPPDB (this talk): phrase pairs annotated with similarity scores, e.g.:
can not be separated from | is inseparable from | 5.0
hoped to be able to | looked forward to | 3.4
come on, think about it | people, please | 2.2
how do you mean that | what worst feelings | 1.6

11 AnnoPPDB (continued): mean deviation of the annotations: 0.60.

12 AnnoPPDB (continued): the dev and test sets were designed to have (1) a variety of lengths, (2) a variety of quality, and (3) low word overlap.

13 AnnoPPDB (continued): see Pavlick et al., 2015 for a similar but larger dataset.

14-15 Learning Embeddings: We now have datasets for testing paraphrase similarity; next we learn to embed words and phrases. All similarities are computed with cosine similarity. Related work on using PPDB to improve word embeddings: Yu and Dredze, 2014; Faruqui et al., 2015.
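For reference, a minimal sketch of the similarity computation in Python (numpy assumed; not from the talk itself):

    import numpy as np

    def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))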

16 Training examples (word pairs from PPDB):
contamination | pollution
converged | convergence
captioned | subtitled
outwit | thwart
bad | villain
broad | general
permanent | permanently
bed | sack
carefree | reckless
absolutely | urgently
…

17-19 Loss Function for Learning: a margin loss that sums over word pairs in PPDB; each pair is a positive example that should score higher than its negative examples.

20-22 Choosing Negative Examples: for efficiency, we only take the argmax over the current mini-batch, and we regularize by penalizing the squared L2 distance to the initial embeddings.
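To make the objective concrete, here is a minimal sketch in Python of a margin loss with mini-batch argmax negatives and a pull-back regularizer, as described on these slides. The margin value, function names, and data layout are all illustrative assumptions, not the authors' code:

    import numpy as np

    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

    def minibatch_loss(W, W_init, batch, delta=0.4, lam=1e-4):
        """Margin loss over a mini-batch of PPDB word pairs.

        W      : dict mapping word -> current embedding (numpy array)
        W_init : dict mapping word -> initial embedding (regularization target)
        batch  : list of (w1, w2) paraphrase pairs (positive examples)
        """
        loss = 0.0
        words = [w for pair in batch for w in pair]
        for w1, w2 in batch:
            pos = cos(W[w1], W[w2])
            # negative example: the most similar non-paraphrase word,
            # searched only within the current mini-batch (for efficiency)
            t1 = max((w for w in words if w not in (w1, w2)),
                     key=lambda w: cos(W[w1], W[w]))
            t2 = max((w for w in words if w not in (w1, w2)),
                     key=lambda w: cos(W[w2], W[w]))
            loss += max(0.0, delta - pos + cos(W[w1], W[t1]))
            loss += max(0.0, delta - pos + cos(W[w2], W[t2]))
        # penalize squared L2 distance to the initial embeddings
        reg = lam * sum(np.sum((W[w] - W_init[w]) ** 2) for w in set(words))
        return loss + reg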

23 Setup (25 dimensions). Training: 113k word pairs from PPDB (XL), e.g. contamination | pollution, converged | convergence, captioned | subtitled, … Tuning: WordSim-353. Test: SimLex-999.
Notes:
1. trained with AdaGrad; tuned the step size, mini-batch size, and regularization
2. initialized with 25-dim skip-gram vectors trained on Wikipedia
3. statistical significance computed using the one-tailed method of Steiger (1980)
4. output of training: "paragram" embeddings
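Since the slide mentions AdaGrad with a tuned step size, a minimal sketch of one AdaGrad step (illustrative only; the hyperparameter values are made up):

    import numpy as np

    def adagrad_step(param, grad, cache, stepsize=0.05, eps=1e-8):
        """One AdaGrad update: per-coordinate learning rates derived from
        the running sum of squared gradients."""
        cache += grad ** 2
        param -= stepsize * grad / (np.sqrt(cache) + eps)
        return param, cache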

24-25 Results: SimLex-999 (Spearman's ρ × 100). Bar chart comparing baseline word vectors with the paragram embeddings.

26-27 Scaling up to 300 dimensions. Training: 170k word pairs from PPDB (XL). Tuning: WordSim-353. Test: SimLex-999.
Notes:
1. replaced the dot product in the objective with cosine similarity
2. trained with AdaGrad; tuned the step size, mini-batch size, margin, and regularization
3. initialized with 300-dim GloVe common-crawl embeddings
4. output of training: "paragram-ws353" embeddings ("paragram-sl999" if tuned on SimLex-999)

28-30 Results: SimLex-999 (Spearman's ρ × 100). Bar chart: paragram-ws353 and paragram-sl999 approach human performance.

31-33 Results: WordSim-353 (Spearman's ρ × 100); tuned on SimLex-999, tested on WordSim-353. Bar chart comparing paragram-ws353, paragram-sl999, and human performance.

34-35 Extrinsic Evaluation: Sentiment Analysis (25-dimension case).
word vectors | dimensionality | accuracy
skip-gram | 25 | 77.0
skip-gram | 50 | 79.6
paragram | 25 | 80.9
Stanford Sentiment Treebank, binary classification; convolutional neural network (Kim, 2014) with 200 unigram filters; static: no fine-tuning of word vectors.

36-37 Extrinsic Evaluation: Sentiment Analysis (300-dimension case).
word vectors | dimensionality | accuracy
GloVe | 300 | 81.4
paragram-ws353 | 300 | 83.9
paragram-sl999 | 300 | 84.0
Same setup: Stanford Sentiment Treebank, binary classification; convolutional neural network (Kim, 2014) with 200 unigram filters; static: no fine-tuning of word vectors.
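A minimal PyTorch sketch of the classifier these slides describe, restricted to unigram filters and frozen ("static") word vectors; everything beyond the 200-unigram-filter, no-fine-tuning setup named on the slides is an illustrative assumption:

    import torch
    import torch.nn as nn

    class UnigramCNN(nn.Module):
        """Kim (2014)-style sentence classifier with unigram filters only."""
        def __init__(self, pretrained: torch.Tensor, n_filters: int = 200,
                     n_classes: int = 2):
            super().__init__()
            # static: the pretrained word vectors are never fine-tuned
            self.emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
            # kernel_size=1 gives unigram filters over the token sequence
            self.conv = nn.Conv1d(pretrained.size(1), n_filters, kernel_size=1)
            self.out = nn.Linear(n_filters, n_classes)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            x = self.emb(token_ids).transpose(1, 2)          # (batch, dim, len)
            h = torch.relu(self.conv(x)).max(dim=2).values   # max-pool over time
            return self.out(h)                               # class logits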

38 Embedding Phrases? We compare standard approaches:
- vector addition
- recursive neural network (RvNN) (Socher et al., 2011), which requires a binarized parse; we use the Stanford parser
- recurrent neural network (RtNN)
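To illustrate the first two composition functions, a minimal numpy sketch (parameter shapes and names are assumptions for exposition, not the talk's implementation):

    import numpy as np

    def add_compose(vectors):
        """Vector addition: the phrase vector is the sum of its word vectors."""
        return np.sum(vectors, axis=0)

    def rvnn_compose(tree, W, b, emb):
        """RvNN over a binarized parse tree (Socher et al., 2011 style).

        tree : a word string (leaf) or a (left, right) pair of subtrees
        W, b : composition parameters, shapes (d, 2d) and (d,)
        emb  : dict mapping word -> d-dimensional vector
        """
        if isinstance(tree, str):                 # leaf: word vector lookup
            return emb[tree]
        left, right = tree
        child = np.concatenate([rvnn_compose(left, W, b, emb),
                                rvnn_compose(right, W, b, emb)])
        return np.tanh(W @ child + b)             # combine the two children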

39 Loss Functions for Phrases: the same margin loss, now summed over phrase pairs in PPDB, with word vectors replaced by phrase vectors (computed by the RvNN, RtNN, etc.). We regularize by penalizing the squared L2 distance to the initial (skip-gram) embeddings and with L2 regularization on the composition parameters.

40 Setup (bigrams). Training: bigram pairs extracted from PPDB: adjective noun (134k, e.g. easy job | simple task), noun noun (36k, e.g. town meeting | town council), verb noun (63k, e.g. achieve goal | achieve aim). Tuning: MLSim (Mitchell & Lapata, 2010). Test: MLPara.
Notes:
- we extract bigram pairs of each type from PPDB using a part-of-speech tagger
- when tuning/testing on one subset, we only train on bigram pairs of that subset
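A sketch of the kind of tag-based extraction the slide mentions, using NLTK's tagger (the tagger choice and tag patterns are illustrative assumptions):

    import nltk  # assumes the tagger model data has been downloaded

    def adj_noun_bigrams(tokens):
        """Return adjective-noun bigrams from a tokenized phrase."""
        tagged = nltk.pos_tag(tokens)
        return [(w1, w2)
                for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
                if t1.startswith('JJ') and t2.startswith('NN')]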

41-43 Results: MLPara (Spearman's ρ × 100), averaged over the three data splits (adj noun, noun noun, verb noun). Bar chart comparing paragram with addition, paragram with an RNN, and human performance.

44-46 Results: MLPara, 300-dimension case (Spearman's ρ × 100), averaged over the three data splits. Bar chart comparing paragram-ws353 with addition, paragram-sl999 with addition, paragram (25) with an RNN, and human performance.

47 Setup (phrases). Training: 60k phrase pairs from PPDB, e.g.:
that allow the | which enable the
be given the opportunity to | have the possibility of
i can hardly hear you. | you 're breaking up.
and the establishment | as well as the development
laying the foundations | pave the way
making every effort | to do its utmost
…
Tuning: 260 annotated phrase pairs. Test: 1000 annotated phrase pairs.

48 Results: AnnoPPDB (Spearman's ρ × 100). We use support vector regression to predict the gold similarities, with 5-fold cross-validation on the 260-example dev set.
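A minimal scikit-learn sketch of this evaluation protocol; the feature matrix here is a random placeholder standing in for whatever phrase-pair features are used, so only the SVR-plus-5-fold-cross-validation structure reflects the slide:

    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.metrics import make_scorer
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    def spearman_rho(y_true, y_pred):
        """Spearman's rho, the evaluation metric used throughout the talk."""
        return spearmanr(y_true, y_pred).correlation

    rng = np.random.default_rng(0)
    X = rng.random((260, 4))             # placeholder phrase-pair features
    y = rng.uniform(1.0, 5.0, size=260)  # placeholder gold similarity scores

    # support vector regression, 5-fold cross-validation on the dev set
    scores = cross_val_score(SVR(), X, y, cv=5,
                             scoring=make_scorer(spearman_rho))
    print(scores.mean())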

49-50 Results: AnnoPPDB (Spearman's ρ × 100). Bar chart comparing paragram with addition, paragram with the RtNN, and paragram with the RvNN.

51-54 Results: AnnoPPDB, 300-dimension case (Spearman's ρ × 100). Bar chart comparing paragram-sl999, paragram-ws353, an RtNN (300), and an LSTM (300).

55-56 Qualitative Analysis. For positive examples, the addition model outperforms the RvNN when the two phrases (1) have similar length and (2) have more "synonyms" in common.
RvNN is better (phrase 1 | phrase 2 | gold | RvNN | +):
does not exceed | is no more than | 5.0 | 4.8 | 3.5
could have an impact on | may influence | 4.6 | 4.2 | 3.2
earliest opportunity | early as possible | 4.4 | 4.3 | 2.9
Addition is better (phrase 1 | phrase 2 | gold | RvNN | +):
scheduled to be held in | that will take place in | 4.6 | 2.9 | 4.4
according to the paper, | the newspaper reported that | 4.6 | 2.8 | 4.1
's surname | family name of | 4.4 | 2.8 | 4.1

57 Conclusion. Our work shows how to use PPDB to (1) create word embeddings with human-level performance on SimLex-999 and WordSim-353, and (2) create compositional paraphrase models that improve the human correlation of PPDB 1.0 from 25 to 52 ρ. We have also released two new datasets for evaluating short-phrase paraphrase models. Ongoing work: phrase model improvements, and off-the-shelf testing on downstream tasks.

58 Thanks!

