Do Supervised Distributional Methods Really Learn Lexical Inference Relations?
Omer Levy, Ido Dagan (Bar-Ilan University, Israel)
Steffen Remus, Chris Biemann (Technische Universität Darmstadt, Germany)
Lexical Inference
Lexical Inference: Task Definition
Distributional Methods of Lexical Inference
Unsupervised Distributional Methods
Supervised Distributional Methods
Main Questions
Experiment Setup
9 Word Representations
  3 Representation Methods: PPMI, SVD (over PPMI), word2vec (SGNS)
  3 Context Types:
    Bag-of-Words (5 words to each side)
    Positional (2 words to each side + position)
    Dependency (all syntactically-connected words + dependency)
  Trained on English Wikipedia
5 Lexical-Inference Datasets
  Kotlerman et al., 2010
  Baroni and Lenci, 2011 (BLESS)
  Baroni et al., 2012
  Turney and Mohammad, 2014
  Levy et al., 2014
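As an illustration of the first two representation methods listed above, here is a minimal sketch of PPMI weighting followed by a truncated SVD. It assumes a small dense word-by-context count matrix; the function names `ppmi` and `svd_embed` and the toy counts are hypothetical and are not taken from the talk.

```python
import numpy as np

def ppmi(counts):
    """Positive PMI for a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    p_wc = counts / counts.sum()            # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)   # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)   # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0            # unseen pairs get zero
    return np.maximum(pmi, 0.0)             # clip negative values

def svd_embed(matrix, k):
    """Rank-k word vectors from a truncated SVD of the given matrix."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Hypothetical toy counts: 2 words x 2 contexts.
toy_counts = np.array([[4, 0], [0, 4]])
toy_ppmi = ppmi(toy_counts)
toy_vectors = svd_embed(toy_ppmi, k=1)
```

A real setup would use sparse matrices over a large vocabulary; the dense version is only meant to make the weighting explicit.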
Supervised Methods
Are current supervised DMs better than unsupervised DMs?
Previously Reported Success
  Prior Art: Supervised DMs better than unsupervised DMs
    Accuracy >95% (in some datasets)
  Our Findings: High accuracy of supervised DMs stems from lexical memorization
Lexical Memorization
Avoid lexical memorization with lexical train/test splits
  If “animal” appears in train, it cannot appear in test
  Lexical splits applied to all our experiments
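One possible way to build such a split is sketched below. It assumes each example is an `(x, y, label)` triple, partitions the vocabulary at random, and discards pairs whose two words fall on different sides; the function name `lexical_split`, the `test_fraction` parameter, and the toy pairs are hypothetical.

```python
import random

def lexical_split(pairs, test_fraction=0.3, seed=0):
    """Split (x, y, label) pairs so that train and test share no words."""
    vocab = sorted({w for x, y, _ in pairs for w in (x, y)})
    rng = random.Random(seed)
    rng.shuffle(vocab)
    test_words = set(vocab[: int(len(vocab) * test_fraction)])

    train, test = [], []
    for x, y, label in pairs:
        x_in_test, y_in_test = x in test_words, y in test_words
        if x_in_test and y_in_test:
            test.append((x, y, label))
        elif not x_in_test and not y_in_test:
            train.append((x, y, label))
        # Mixed pairs are dropped: keeping them would leak a word
        # from one side of the split into the other.
    return train, test

# Hypothetical toy data.
toy_pairs = [
    ("cat", "animal", True),
    ("dog", "animal", True),
    ("car", "vehicle", True),
    ("cat", "vehicle", False),
    ("bus", "vehicle", True),
]
train_pairs, test_pairs = lexical_split(toy_pairs, test_fraction=0.5)
```

The cost of this scheme is that some examples are thrown away, which is the price of guaranteeing that no word seen in training reappears at test time.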
Experiments without Lexical Memorization
  4 supervised vs 1 unsupervised (cosine similarity)
  Cosine similarity outperforms all supervised DMs in 2/5 datasets
  Conclusion: supervised DMs are not necessarily better
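For reference, the unsupervised baseline can be sketched as cosine similarity between the two word vectors, turned into a decision by a threshold. The function names and the default `threshold` value are hypothetical; how the cutoff was actually tuned is not stated on the slide.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict_inference(x, y, threshold=0.5):
    """Label a pair positive when its cosine similarity passes the threshold.

    The threshold value is an assumption made for illustration only.
    """
    return cosine(x, y) >= threshold
```

Because cosine is symmetric, this baseline cannot by itself tell which word of a pair is the more general one.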
In practice:
  Almost as well as Concat & Diff
  Best method in 1/5 datasets
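The Concat and Diff representations named above can be sketched as follows. The slide does not spell out their definitions, so this assumes the common reading: Concat is the concatenation of the two word vectors and Diff is the vector difference y - x. The function names and toy vectors are hypothetical.

```python
import numpy as np

def concat_features(x, y):
    """Concat: the pair is represented by x followed by y (assumed definition)."""
    return np.concatenate([np.asarray(x, dtype=float), np.asarray(y, dtype=float)])

def diff_features(x, y):
    """Diff: the pair is represented by y - x (assumed definition and direction)."""
    return np.asarray(y, dtype=float) - np.asarray(x, dtype=float)

# Hypothetical toy vectors for a word pair (x, y).
x_vec = [1.0, 2.0]
y_vec = [3.0, 5.0]
pair_concat = concat_features(x_vec, y_vec)
pair_diff = diff_features(x_vec, y_vec)
```

Either feature vector would then be passed to an ordinary classifier trained on labeled word pairs.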
Prototypical Hypernyms
Recall: portion of real positive examples (✔) classified true
Match Error: portion of artificial examples (✘) classified true
Bottom-right: prefer ✔ over ✘ (good classifiers)
Top-left: prefer ✘ over ✔ (worse than random)
Diagonal: cannot distinguish ✔ from ✘ (predicted by hypothesis)
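Both quantities are the same statistic computed over different example sets, which the short sketch below makes explicit. The prediction lists are hypothetical toy values, and how the artificial examples are constructed is left outside the sketch.

```python
def portion_classified_true(predictions):
    """Fraction of examples that the classifier labeled true."""
    predictions = list(predictions)
    return sum(bool(p) for p in predictions) / len(predictions)

# Hypothetical classifier outputs.
preds_on_real_positives = [True, True, False, True]   # ✔ examples
preds_on_artificial = [True, False, False, False]     # ✘ examples

recall = portion_classified_true(preds_on_real_positives)
match_error = portion_classified_true(preds_on_artificial)

# A classifier on the diagonal has recall roughly equal to match error,
# meaning it cannot tell real positives from artificial pairs.
on_diagonal = abs(recall - match_error) < 0.05
```

With these toy values the classifier sits in the bottom-right region: high recall and low match error.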
Prototypical Hypernyms
Prototypical Hypernyms: Analysis
Conclusions
What if the necessary relational information does not exist in contextual features?
The Limitations of Contextual Features
Also in the Paper…
  Theoretical Analysis
    Explains our empirical findings
  Sim Kernel: A new supervised method
    Partially addresses the issue of prototypical hypernyms
Theoretical Analysis
Lexical Inference: Motivation
Lexical Inference