From Paraphrase Database to Compositional Paraphrase Model and Back
John Wieting, University of Illinois
Joint work with Mohit Bansal, Kevin Gimpel, Karen Livescu, and Dan Roth

Motivation
The PPDB (Ganitkevitch et al., 2013) is a vast collection of paraphrase pairs:
  that allow the ↔ which enable the
  be given the opportunity to ↔ have the possibility of
  i can hardly hear you. ↔ you 're breaking up.
  and the establishment ↔ as well as the development
  laying the foundations ↔ pave the way
  making every effort ↔ to do its utmost
  ...

Motivation
- Improve coverage
- Have a parametric model
- Improve phrase pair scores

Contributions
- Powerful word embeddings that have human-level performance on SimLex-999 and WordSim-353
- Phrase embeddings
- A model that can re-rank phrases in PPDB 1.0 (improves correlation with human judgments from 25 to 52 ρ)
- A parameterization of PPDB that can be used downstream
- New datasets

Datasets
We wanted a clean way to evaluate paraphrase composition.
Two new datasets: one for bigram paraphrases and one for short-phrase paraphrases from PPDB.

Evaluation datasets

             Topical                               Paraphrastic
  Words      WordSim353                            SimLex-999
  Bigrams    MLSim (Mitchell and Lapata, 2010)     MLPara (this talk)
  Phrases                                          AnnoPPDB (this talk)

Example bigram pairs, scored by MLSim (topical similarity) and MLPara (paraphrase quality):
  television programme ↔ tv set
  training programme ↔ education course
  bedroom window ↔ education officer (MLSim 1.3, MLPara 1.0)

[table of inter-annotator agreement for MLPara: Spearman's ρ and Cohen's κ for the adjective-noun, noun-noun, and verb-noun subsets]

AnnoPPDB (this talk)
  can not be separated from ↔ is inseparable from    5.0
  hoped to be able to ↔ looked forward to            3.4
  come on, think about it ↔ people, please           2.2
  how do you mean that ↔ what worst feelings         1.6

Mean deviation: 0.60

Dev and test sets were designed to have:
1) a variety of lengths
2) a variety of quality
3) low word overlap

See Pavlick et al., 2015 for a similar but larger dataset.

Learning Embeddings
We now have datasets to test paraphrase similarity. Next we learn to embed words and phrases.
All similarities are computed with cosine similarity.
Related work on using PPDB to improve word embeddings: Yu and Dredze, 2014; Faruqui et al., 2015.
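
For concreteness, here is a minimal sketch of the similarity computation used throughout the talk (the function and variable names, and the toy vectors, are illustrative, not from the slides):

```python
import numpy as np

def similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy example: look up word vectors and compare them.
embeddings = {
    "contamination": np.array([0.9, 0.1, 0.3]),
    "pollution":     np.array([0.8, 0.2, 0.4]),
    "villain":       np.array([-0.5, 0.7, 0.1]),
}
print(similarity(embeddings["contamination"], embeddings["pollution"]))  # high
print(similarity(embeddings["contamination"], embeddings["villain"]))    # low
```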

Training examples (word pairs from PPDB):
  contamination ↔ pollution
  converged ↔ convergence
  captioned ↔ subtitled
  outwit ↔ thwart
  bad ↔ villain
  broad ↔ general
  permanent ↔ permanently
  bed ↔ sack
  carefree ↔ reckless
  absolutely ↔ urgently
  ...

Loss Function for Learning
The objective (shown as an equation on the slide) sums over word pairs in PPDB; each term encourages the positive example (the paraphrase pair itself) to be more similar than the negative examples.
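
The equation itself appears only as an image on the original slides; the following is a reconstruction sketch consistent with those annotations and with the later slides (δ is the margin, ⟨x1, x2⟩ a paraphrase pair, t1 and t2 the chosen negative examples; sim(·,·) is the dot product in the 25-dimensional setup and cosine similarity in the 300-dimensional one). The exact formula in the paper may differ in details:

```latex
\min_{W}\; \frac{1}{|X|} \sum_{\langle x_1, x_2 \rangle \in X}
  \Big[ \max\!\big(0,\; \delta - \mathrm{sim}(x_1, x_2) + \mathrm{sim}(x_1, t_1)\big)
      + \max\!\big(0,\; \delta - \mathrm{sim}(x_1, x_2) + \mathrm{sim}(x_2, t_2)\big) \Big]
```

The regularization term described on the next slides (squared L2 distance to the initial embeddings) is added to this sum.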

Choosing Negative Examples?
- only do the argmax over the current mini-batch (for efficiency)
- we regularize by penalizing the squared L2 distance to the initial embeddings
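
A minimal sketch of this in-batch negative selection and of the regularizer (illustrative only; the NumPy implementation and names are assumptions, and a symmetric selection is done for the other side of each pair):

```python
import numpy as np

def choose_negatives(batch_left, batch_right):
    """For each pair (x1_i, x2_i) in the mini-batch, pick as negative for x1_i
    the most similar x2_j with j != i (argmax restricted to the current batch)."""
    # Normalize so dot products are cosine similarities.
    L = batch_left / np.linalg.norm(batch_left, axis=1, keepdims=True)
    R = batch_right / np.linalg.norm(batch_right, axis=1, keepdims=True)
    sims = L @ R.T                    # (batch, batch) similarity matrix
    np.fill_diagonal(sims, -np.inf)   # exclude the true paraphrase
    neg_idx = sims.argmax(axis=1)     # hardest in-batch negative for each x1
    return batch_right[neg_idx]

def init_reg(W, W_initial, lam):
    """Penalize squared L2 distance between current and initial embeddings."""
    return lam * np.sum((W - W_initial) ** 2)
```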

Word experiments (25 dimensions)
Training: 113k word pairs from PPDB (XL), e.g. contamination ↔ pollution, converged ↔ convergence, captioned ↔ subtitled, ...
Tuning: WordSim353
Test: SimLex-999
Notes:
1. trained with AdaGrad; tuned step size, mini-batch size, and regularization
2. initialized with 25-dim skip-gram vectors trained on Wikipedia
3. statistical significance computed using the one-tailed method of Steiger (1980)
4. output of training: "paragram" embeddings
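
Tuning and test performance is measured as Spearman's ρ (× 100 on the result slides) between model cosine similarities and the human ratings. A minimal evaluation sketch (names and the tuple-based dataset format are illustrative):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate(embeddings, dataset):
    """dataset: list of (word1, word2, human_score) tuples.
    Returns Spearman's rho (x 100) between cosine similarities and human scores."""
    model_scores, human_scores = [], []
    for w1, w2, gold in dataset:
        if w1 in embeddings and w2 in embeddings:
            u, v = embeddings[w1], embeddings[w2]
            sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
            model_scores.append(sim)
            human_scores.append(gold)
    rho, _ = spearmanr(model_scores, human_scores)
    return 100 * rho
```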

Results: SimLex-999 (25 dimensions)
[bar chart of Spearman's ρ × 100; labeled bar: paragram]

Scaling up to 300 dimensions
Training: 170k word pairs from PPDB (XL), e.g. contamination ↔ pollution, converged ↔ convergence, captioned ↔ subtitled, ...
Tuning: WordSim353
Test: SimLex-999
Notes:
1. replaced the dot product in the objective with cosine similarity
2. trained with AdaGrad; tuned step size, mini-batch size, margin, and regularization
3. initialized with 300-dim GloVe common crawl embeddings
4. output of training: "paragram-ws353" embeddings ("paragram-sl999" if tuned on SimLex-999)

Results: SimLex-999 (300 dimensions)
[bar chart of Spearman's ρ × 100; labeled bars: paragram-ws353, paragram-sl999, human]

Results: WordSim-353
Tune on SimLex-999, test on WordSim-353.
[bar chart of Spearman's ρ × 100; labeled bars: paragram-ws353, paragram-sl999, human]

Extrinsic Evaluation: Sentiment Analysis
Setup: Stanford Sentiment Treebank, binary classification; convolutional neural network (Kim, 2014) with 200 unigram filters; static setting (no fine-tuning of word vectors).
25-dimension case: compares skip-gram and paragram word vectors.
300-dimension case: compares GloVe, paragram-ws353, and paragram-sl999 word vectors.
[tables of accuracy by word vectors and dimensionality]
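
A minimal sketch of the Kim (2014)-style classifier described above; PyTorch is an assumption (the talk does not name a framework), and the class and argument names are illustrative:

```python
import torch
import torch.nn as nn

class UnigramCNN(nn.Module):
    """CNN sentence classifier in the spirit of Kim (2014): frozen ("static")
    word vectors, 200 unigram filters, max-pooling over time, binary output."""
    def __init__(self, pretrained_vectors, num_filters=200, num_classes=2):
        super().__init__()
        vocab_size, dim = pretrained_vectors.shape
        self.embed = nn.Embedding(vocab_size, dim)
        self.embed.weight.data.copy_(torch.as_tensor(pretrained_vectors))
        self.embed.weight.requires_grad = False                 # static: no fine-tuning
        self.conv = nn.Conv1d(dim, num_filters, kernel_size=1)  # unigram filters
        self.out = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, dim, seq_len)
        h = torch.relu(self.conv(x))               # (batch, filters, seq_len)
        h = h.max(dim=2).values                    # max-pool over time
        return self.out(h)                         # logits for binary classification

# Usage sketch, with random vectors standing in for paragram embeddings:
# model = UnigramCNN(pretrained_vectors=torch.randn(10000, 300).numpy())
```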

Embedding Phrases?
We compare standard approaches:
- vector addition
- recursive neural network (RvNN) (Socher et al., 2011): requires a binarized parse; we use the Stanford parser
- recurrent neural network (RtNN)
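
A rough sketch of the two main composition functions; the RvNN node follows the common g(W[u; v] + b) formulation, which is an assumption here since the talk's exact parameterization is not in the transcript:

```python
import numpy as np

def compose_addition(word_vectors):
    """Phrase vector as the sum of its word vectors."""
    return np.sum(word_vectors, axis=0)

def compose_rvnn_node(u, v, W, b):
    """One recursive-network node: combine two child vectors (e.g. from a
    binarized parse) into a parent vector of the same dimension."""
    return np.tanh(W @ np.concatenate([u, v]) + b)

# Example: "easy job" with toy 4-dimensional vectors.
rng = np.random.default_rng(0)
easy, job = rng.normal(size=4), rng.normal(size=4)
W, b = rng.normal(size=(4, 8)), np.zeros(4)
print(compose_addition([easy, job]))
print(compose_rvnn_node(easy, job, W, b))
```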

Loss Functions for Phrases
- replace word vectors by phrase vectors (computed by the RvNN, RtNN, etc.)
- sum over phrase pairs in PPDB
- we regularize by penalizing the squared L2 distance to the initial (skip-gram) embeddings, plus L2 regularization on the composition parameters
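
Putting these bullet points together, a sketch of the phrase-level objective (g(·) is the composition function, W_w the word embeddings, W_c the composition parameters; a reconstruction consistent with the slide, not a verbatim formula from the talk):

```latex
\min_{W_c, W_w}\; \frac{1}{|X|} \sum_{\langle p_1, p_2 \rangle \in X}
  \Big[ \max\!\big(0,\; \delta - \mathrm{sim}(g(p_1), g(p_2)) + \mathrm{sim}(g(p_1), t_1)\big)
      + \max\!\big(0,\; \delta - \mathrm{sim}(g(p_1), g(p_2)) + \mathrm{sim}(g(p_2), t_2)\big) \Big]
  \;+\; \lambda_w \lVert W_w^{\mathrm{initial}} - W_w \rVert^2 \;+\; \lambda_c \lVert W_c \rVert^2
```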

Bigram experiments
Training: bigram pairs extracted from PPDB
  adjective noun (134k), e.g. easy job ↔ simple task
  noun noun (36k), e.g. town meeting ↔ town council
  verb noun (63k), e.g. achieve goal ↔ achieve aim
Tuning: MLSim (Mitchell & Lapata, 2010)
Test: MLPara
Notes:
- we extract bigram pairs of each type from PPDB using a part-of-speech tagger
- when tuning/testing on one subset, we only train on bigram pairs for that subset
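
A rough illustration of the kind of extraction the first note describes, using NLTK's off-the-shelf tagger (the talk does not say which tagger was actually used):

```python
import nltk  # requires: nltk.download('punkt'), nltk.download('averaged_perceptron_tagger')

def bigram_type(phrase):
    """Classify a two-word phrase as 'adj-noun', 'noun-noun', 'verb-noun', or None."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(phrase))]
    if len(tags) != 2:
        return None
    first, second = tags
    if second.startswith("NN"):
        if first.startswith("JJ"):
            return "adj-noun"
        if first.startswith("NN"):
            return "noun-noun"
        if first.startswith("VB"):
            return "verb-noun"
    return None

# Keep a PPDB pair only if both sides are bigrams of the same type:
pair = ("easy job", "simple task")
t = bigram_type(pair[0])
if t is not None and t == bigram_type(pair[1]):
    print(pair, t)
```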

Results: MLPara (25 dimensions)
Averages over the three data splits: adjective noun, noun noun, verb noun.
[bar chart of Spearman's ρ × 100; labeled bars: paragram (addition), paragram (RNN), human]

Results: MLPara (300 dimensions)
Averages over the three data splits: adjective noun, noun noun, verb noun.
[bar chart of Spearman's ρ × 100; labeled bars: paragram-ws353 (addition), paragram-sl999 (addition), paragram(25) (RNN), human]

Phrase experiments
Training: 60k phrase pairs from PPDB, e.g.
  that allow the ↔ which enable the
  be given the opportunity to ↔ have the possibility of
  i can hardly hear you. ↔ you 're breaking up.
  and the establishment ↔ as well as the development
  laying the foundations ↔ pave the way
  making every effort ↔ to do its utmost
Tuning: 260 annotated phrase pairs
Test: 1000 annotated phrase pairs

Results: AnnoPPDB (25 dimensions)
Support vector regression to predict the gold similarities; 5-fold cross-validation on the 260-example dev set.
[bar chart of Spearman's ρ × 100; labeled bars: paragram (addition), paragram (RtNN), paragram (RvNN)]
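
A minimal sketch of that regression setup with scikit-learn; the feature choice is purely illustrative, since the talk does not specify what is fed to the regressor:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Illustrative features per phrase pair, e.g. cosine similarity of the composed
# phrase vectors plus a word-overlap feature; gold is the annotated 1-5 score.
X = np.array([[0.92, 0.4], [0.75, 0.2], [0.31, 0.0], [0.15, 0.1]])
y = np.array([5.0, 3.4, 2.2, 1.6])

model = SVR()
# Cross-validation to predict the gold similarities; cv=5 on the real
# 260-example dev set (cv=2 here only because the toy data has 4 rows).
scores = cross_val_score(model, X, y, cv=2)
print(scores.mean())
```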

Results: AnnoPPDB (300 dimensions)
[bar chart of Spearman's ρ × 100; labeled bars: paragram-ws353, paragram-sl999, RtNN (300), LSTM (300)]

Qualitative Analysis
For positive examples, the addition model outperforms the RvNN when the phrases 1) have similar length and 2) have more "synonyms" in common.

RvNN is better (gold, RvNN, and addition scores were shown on the slide):
  does not exceed ↔ is no more than
  could have an impact on ↔ may influence
  earliest opportunity ↔ early as possible

Addition is better:
  scheduled to be held in ↔ that will take place in
  according to the paper, ↔ the newspaper reported that
  's surname ↔ family name of

Conclusion
Our work shows how to use PPDB to:
1) create word embeddings that have human-level performance on SimLex-999 and WordSim-353
2) create compositional paraphrase models that improve the correlation of PPDB 1.0 with human judgments from 25 to 52 ρ
We have also released two new datasets for evaluating short-phrase paraphrase models.
Ongoing work: phrase model improvements, off-the-shelf testing on downstream tasks.

Thanks!