Do Supervised Distributional Methods Really Learn Lexical Inference Relations? Omer Levy Ido Dagan Bar-Ilan University Israel Steffen Remus Chris Biemann.

Presentation transcript:

Do Supervised Distributional Methods Really Learn Lexical Inference Relations?
Omer Levy, Ido Dagan (Bar-Ilan University, Israel)
Steffen Remus, Chris Biemann (Technische Universität Darmstadt, Germany)

Lexical Inference

Lexical Inference: Task Definition

Distributional Methods of Lexical Inference

Unsupervised Distributional Methods

Supervised Distributional Methods

Main Questions

Experiment Setup

9 Word Representations
3 Representation Methods: PPMI, SVD (over PPMI), word2vec (SGNS).
3 Context Types: Bag-of-Words (5 words to each side); Positional (2 words to each side + position); Dependency (all syntactically-connected words + dependency).
Trained on English Wikipedia.
5 Lexical-Inference Datasets: Kotlerman et al., 2010; Baroni and Lenci, 2011 (BLESS); Baroni et al., 2012; Turney and Mohammad, 2014; Levy et al., 2014.
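As a rough illustration of the PPMI representation named above, the sketch below computes positive pointwise mutual information from word-context co-occurrence counts. The toy counts and the function name `ppmi` are assumptions made for this example only; the slides do not give the authors' implementation or any preprocessing details.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Map (word, context) co-occurrence counts to PPMI scores."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), count in pair_counts.items():
        word_totals[word] += count
        context_totals[context] += count

    scores = {}
    for (word, context), count in pair_counts.items():
        # PMI = log( P(w, c) / (P(w) * P(c)) ), with probabilities
        # estimated from relative frequencies.
        ratio = count * total / (word_totals[word] * context_totals[context])
        # PPMI clips negative associations to zero.
        scores[(word, context)] = max(0.0, math.log(ratio))
    return scores


# Hypothetical toy counts, not taken from the paper's Wikipedia corpus.
toy_counts = {
    ("cat", "purr"): 4,
    ("cat", "eat"): 2,
    ("dog", "bark"): 4,
    ("dog", "eat"): 2,
}
toy_ppmi = ppmi(toy_counts)
```

Each row of the resulting sparse matrix (all contexts of one word) serves as that word's vector; the SVD variant would then reduce this matrix to dense, low-dimensional vectors.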

Supervised Methods

Are current supervised DMs better than unsupervised DMs?

Previously Reported Success
Prior Art: Supervised DMs better than unsupervised DMs; accuracy >95% (in some datasets).
Our Findings: High accuracy of supervised DMs stems from lexical memorization.

Lexical Memorization

Avoid lexical memorization with lexical train/test splits.
If “animal” appears in train, it cannot appear in test.
Lexical splits applied to all our experiments.
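One way to realize such a lexical split is sketched below: the vocabulary is partitioned first, and a pair is kept only if both of its words fall on the same side. The function name `lexical_split`, the `test_fraction` parameter, the toy pairs, and the choice to discard mixed pairs are all assumptions for illustration; the slides do not specify the exact procedure.

```python
import random


def lexical_split(pairs, test_fraction=0.5, seed=0):
    """Split (x, y, label) pairs so that train and test share no words."""
    vocab = sorted({word for x, y, _ in pairs for word in (x, y)})
    rng = random.Random(seed)
    rng.shuffle(vocab)
    test_words = set(vocab[: int(len(vocab) * test_fraction)])

    train, test = [], []
    for x, y, label in pairs:
        if x in test_words and y in test_words:
            test.append((x, y, label))
        elif x not in test_words and y not in test_words:
            train.append((x, y, label))
        # Pairs with one word on each side are dropped (an assumption
        # of this sketch), so no word can leak from train into test.
    return train, test


# Hypothetical toy dataset of (x, y, is_positive) pairs.
toy_pairs = [
    ("cat", "animal", True),
    ("dog", "animal", True),
    ("car", "vehicle", True),
    ("truck", "vehicle", True),
    ("cat", "vehicle", False),
    ("car", "animal", False),
]
train_pairs, test_pairs = lexical_split(toy_pairs)
```

The cost of this design is that mixed pairs are lost, so the usable dataset shrinks; in exchange, a classifier can no longer score well on test simply by remembering which words were labeled positive in train.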

Experiments without Lexical Memorization
4 supervised vs 1 unsupervised (cosine similarity).
Cosine similarity outperforms all supervised DMs in 2/5 datasets.
Conclusion: supervised DMs are not necessarily better.
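The unsupervised baseline scores a word pair by the cosine of the angle between the two word vectors. A minimal sketch follows, with hypothetical toy vectors; how the score is turned into a yes/no decision (for example, a tuned threshold) is not stated on the slide and is left out here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for all-zero vectors in this sketch
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional word vectors, for illustration only.
vec_cat = [1.0, 2.0, 0.0]
vec_animal = [2.0, 4.0, 0.0]
vec_car = [0.0, 0.0, 3.0]

sim_related = cosine(vec_cat, vec_animal)
sim_unrelated = cosine(vec_cat, vec_car)
```

Because cosine similarity has no trainable parameters tied to particular words, it cannot memorize the lexicon, which makes it a natural point of comparison under lexical splits.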

In practice: almost as well as Concat & Diff; best method in 1/5 datasets.

Prototypical Hypernyms

Recall: portion of real positive examples (✔) classified true.
Match Error: portion of artificial examples (✘) classified true.
Bottom-right: prefer ✔ over ✘ (good classifiers).
Top-left: prefer ✘ over ✔ (worse than random).
Diagonal: cannot distinguish ✔ from ✘ (predicted by hypothesis).
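The two quantities defined above can be computed as in the sketch below. It assumes, for illustration, that the artificial examples are built by re-pairing words from real positive pairs, and it uses a hypothetical classifier that looks only at the second word; neither detail is spelled out on the slide.

```python
def recall_and_match_error(classifier, real_positives, artificial_examples):
    """Return (recall, match_error) for a pair classifier.

    recall      = fraction of real positive pairs classified true
    match_error = fraction of artificial pairs classified true
    """
    recall = sum(1 for x, y in real_positives if classifier(x, y)) / len(real_positives)
    match_error = sum(1 for x, y in artificial_examples if classifier(x, y)) / len(artificial_examples)
    return recall, match_error


# Hypothetical classifier that ignores x and accepts any pair whose y
# looks like a "prototypical hypernym".
PROTOTYPICAL = {"animal", "vehicle"}


def only_y_classifier(x, y):
    return y in PROTOTYPICAL


real = [("cat", "animal"), ("car", "vehicle")]
# Assumed construction: swap the y words between real positive pairs.
artificial = [("cat", "vehicle"), ("car", "animal")]

result = recall_and_match_error(only_y_classifier, real, artificial)
```

For this toy classifier, recall and match error are equal, which places it on the diagonal: it accepts the mismatched pairs exactly as readily as the real ones.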

Prototypical Hypernyms

Prototypical Hypernyms: Analysis

Conclusions

What if the necessary relational information does not exist in contextual features?

The Limitations of Contextual Features

Also in the Paper…
Theoretical Analysis: explains our empirical findings.
Sim Kernel: a new supervised method; partially addresses the issue of prototypical hypernyms.

Theoretical Analysis

Lexical Inference: Motivation

Lexical Inference