
Do Supervised Distributional Methods Really Learn Lexical Inference Relations? Omer Levy, Ido Dagan (Bar-Ilan University, Israel); Steffen Remus, Chris Biemann (Technische Universität Darmstadt, Germany)


1 Do Supervised Distributional Methods Really Learn Lexical Inference Relations? Omer Levy, Ido Dagan (Bar-Ilan University, Israel); Steffen Remus, Chris Biemann (Technische Universität Darmstadt, Germany)

2 Lexical Inference

3 Lexical Inference: Task Definition

4 Distributional Methods of Lexical Inference

5 Unsupervised Distributional Methods

6 Supervised Distributional Methods

7 Main Questions

8 Experiment Setup

9 Word Representations
3 representation methods: PPMI, SVD (over PPMI), word2vec (SGNS)
3 context types:
  Bag-of-words (5 words to each side)
  Positional (2 words to each side + position)
  Dependency (all syntactically-connected words + dependency label)
Trained on English Wikipedia.
5 lexical-inference datasets:
  Kotlerman et al., 2010
  Baroni and Lenci, 2011 (BLESS)
  Baroni et al., 2012
  Turney and Mohammad, 2014
  Levy et al., 2014
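To make the first representation method concrete, here is a minimal sketch of building PPMI vectors over bag-of-words contexts (window of 5 to each side, as on the slide). The toy corpus and function name are ours for illustration; the actual vectors in the paper were trained on English Wikipedia.

```python
import math
from collections import Counter, defaultdict

def ppmi_vectors(corpus, window=5):
    """Build sparse PPMI word vectors from bag-of-words contexts.

    corpus: list of tokenized sentences.
    Returns {word: {context_word: ppmi_score}} with only positive PMI kept.
    """
    pair, wc, cc = Counter(), Counter(), Counter()
    total = 0
    for sent in corpus:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i == j:
                    continue
                c = sent[j]
                pair[(w, c)] += 1   # joint count of (word, context)
                wc[w] += 1          # marginal count of the target word
                cc[c] += 1          # marginal count of the context word
                total += 1
    vecs = defaultdict(dict)
    for (w, c), n in pair.items():
        pmi = math.log((n * total) / (wc[w] * cc[c]))
        if pmi > 0:  # PPMI: keep only positive associations
            vecs[w][c] = pmi
    return vecs

# Toy corpus: "cat" and "dog" share the context "animal".
corpus = [["the", "cat", "is", "an", "animal"],
          ["the", "dog", "is", "an", "animal"]]
vecs = ppmi_vectors(corpus, window=5)
```

SVD over the resulting PPMI matrix, or SGNS training, would then yield the dense variants listed above.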

10 Supervised Methods

11 Are current supervised DMs better than unsupervised DMs?

12 Previously Reported Success
Prior art: supervised DMs outperform unsupervised DMs, with accuracy above 95% on some datasets.
Our findings: the high accuracy of supervised DMs stems from lexical memorization.

13 Lexical Memorization

14 Avoid lexical memorization with lexical train/test splits: if "animal" appears in train, it cannot appear in test. Lexical splits are applied in all our experiments.
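A lexical split can be sketched as follows (a minimal illustration, not the paper's code; the function name and split ratio are assumptions). The vocabulary is partitioned first, and any pair mixing train and test words is discarded.

```python
import random

def lexical_split(pairs, test_ratio=0.3, seed=0):
    """Split (x, y, label) pairs so that no word appears in both
    train and test, preventing lexical memorization."""
    vocab = sorted({w for x, y, _ in pairs for w in (x, y)})
    rng = random.Random(seed)
    rng.shuffle(vocab)
    cut = int(len(vocab) * test_ratio)
    test_words = set(vocab[:cut])
    train_words = set(vocab[cut:])
    train = [p for p in pairs if p[0] in train_words and p[1] in train_words]
    test = [p for p in pairs if p[0] in test_words and p[1] in test_words]
    # Pairs mixing a train word with a test word are dropped entirely.
    return train, test

pairs = [("cat", "animal", True), ("dog", "animal", True),
         ("banana", "animal", False), ("car", "wheel", False)]
train, test = lexical_split(pairs, test_ratio=0.5)
```

The key property is that the train and test vocabularies are disjoint, so a classifier cannot succeed by memorizing that, e.g., "animal" is usually a hypernym.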

15 Experiments without Lexical Memorization
4 supervised methods vs. 1 unsupervised method (cosine similarity).
Cosine similarity outperforms all supervised DMs on 2/5 datasets.
Conclusion: supervised DMs are not necessarily better.
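The contrast between the two families of methods can be sketched as follows: supervised DMs build a pair representation (e.g., concatenation or difference of the two word vectors) and feed it to a classifier, while the unsupervised baseline just thresholds cosine similarity. Function names here are ours, not the paper's.

```python
import numpy as np

def concat_features(vx, vy):
    """Supervised pair representation: concatenation of the two vectors."""
    return np.concatenate([vx, vy])

def diff_features(vx, vy):
    """Supervised pair representation: vector difference."""
    return vy - vx

def cosine(vx, vy):
    """Unsupervised baseline: cosine similarity of the two vectors."""
    return float(vx @ vy / (np.linalg.norm(vx) * np.linalg.norm(vy)))

vx = np.array([1.0, 0.0, 1.0])
vy = np.array([1.0, 0.0, 1.0])
feats = concat_features(vx, vy)   # would be fed to e.g. logistic regression
sim = cosine(vx, vy)              # identical vectors -> similarity 1.0
```

Note that cosine is symmetric, whereas lexical inference is directional; the supervised pair features at least encode which word is x and which is y.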

16

17

18 In practice: performs almost as well as Concat & Diff; best method on 1/5 datasets.

19

20

21 Prototypical Hypernyms

22

23 Recall: portion of real positive examples ( ✔ ) classified as true.
Match Error: portion of artificial examples ( ✘ ) classified as true.
Bottom-right: prefers ✔ over ✘ (good classifiers).
Top-left: prefers ✘ over ✔ (worse than random).
Diagonal: cannot distinguish ✔ from ✘ (predicted by our hypothesis).
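The two axes of this plot can be sketched as follows (a toy illustration, not the paper's evaluation code). A classifier on the diagonal fires equally often on real positives and on artificial pairs.

```python
def recall(predictions, gold):
    """Portion of real positive examples classified as true."""
    pos = [p for p, y in zip(predictions, gold) if y]
    return sum(pos) / len(pos)

def match_error(predictions_on_artificial):
    """Portion of artificial (negative-by-construction) pairs classified as true."""
    return sum(predictions_on_artificial) / len(predictions_on_artificial)

# A diagonal classifier: identical behavior on real and artificial pairs.
preds_real = [True, True, False, True]        # predictions on gold positives
preds_artificial = [True, True, False, True]  # predictions on artificial pairs
r = recall(preds_real, [True] * 4)
m = match_error(preds_artificial)
# r == m, i.e. the classifier cannot distinguish ✔ from ✘.
```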

24 Prototypical Hypernyms

25 Prototypical Hypernyms: Analysis

26 Conclusions

27

28 What if the necessary relational information does not exist in contextual features?

29 The Limitations of Contextual Features

30

31 Also in the Paper…
Theoretical analysis: explains our empirical findings.
Sim Kernel: a new supervised method that partially addresses the issue of prototypical hypernyms.

32 Theoretical Analysis

33 Lexical Inference: Motivation

34 Lexical Inference

