Linguistic Regularities in Sparse and Explicit Word Representations
Omer Levy, Yoav Goldberg
Bar-Ilan University, Israel

Papers in ACL 2014*
(* Sampling error: +/- 100%)

Neural Embeddings

Representing words as vectors is not new!

Explicit Representations (Distributional)
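
The explicit representation used in the paper is a sparse word-context vector weighted with positive PMI (PPMI). Below is a minimal sketch of that idea; the toy corpus, window size, and variable names are illustrative and not the paper's actual setup:

```python
import math
from collections import Counter

# Toy corpus; the paper trains on a large corpus, not four sentences.
corpus = [
    "the king rules the kingdom".split(),
    "the queen rules the kingdom".split(),
    "the man walked to the city".split(),
    "the woman walked to the city".split(),
]

WINDOW = 2  # context window size on each side (illustrative choice)

pair_counts = Counter()  # (word, context) co-occurrence counts
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)):
            if j != i:
                pair_counts[(w, sent[j])] += 1

total = sum(pair_counts.values())
word_marg, ctx_marg = Counter(), Counter()  # marginal pair counts
for (w, c), n in pair_counts.items():
    word_marg[w] += n
    ctx_marg[c] += n

def ppmi_vector(word):
    """Sparse explicit representation: {context: positive PMI weight}."""
    vec = {}
    for (w, c), n in pair_counts.items():
        if w == word:
            pmi = math.log(n * total / (word_marg[w] * ctx_marg[c]))
            if pmi > 0:
                vec[c] = pmi
    return vec

print(ppmi_vector("king"))  # e.g. {'rules': ..., 'kingdom': ...}
```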

Questions
Are analogies unique to neural embeddings?
  Compare neural embeddings with explicit representations
Why does vector arithmetic reveal analogies?
  Unravel the mystery behind neural embeddings and their “magic”

Background

Mikolov et al. (2013a,b,c)
Neural embeddings have interesting geometries
These patterns capture “relational similarities”
Can be used to solve analogies: man is to woman as king is to queen
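
The mechanism behind these analogies can be sketched with numpy: add and subtract word vectors, then return the vocabulary word closest to the result by cosine. The vocabulary and vectors below are random placeholders; with trained embeddings the query would return queen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder embedding matrix: one unit-length row per vocabulary word.
vocab = ["man", "woman", "king", "queen", "apple"]
E = rng.normal(size=(len(vocab), 50))
E /= np.linalg.norm(E, axis=1, keepdims=True)
idx = {w: i for i, w in enumerate(vocab)}

def analogy(a, b, a_star):
    """Solve 'a is to b as a_star is to ?' by vector arithmetic."""
    target = E[idx[a_star]] - E[idx[a]] + E[idx[b]]
    scores = E @ (target / np.linalg.norm(target))  # cosine: rows are unit length
    for w in (a, b, a_star):          # query words are excluded, as in the paper
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]

# With trained vectors this would return "queen"; with random placeholders the
# answer is arbitrary -- the point is only the mechanism.
print(analogy("man", "woman", "king"))
```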

Are analogies unique to neural embeddings?
Experiment: compare embeddings to explicit representations
Learn different representations from the same corpus:

Analogy Datasets

Embedding vs Explicit (Round 1)

Many analogies recovered by explicit, but many more by embedding.

Why does vector arithmetic reveal analogies?
royal?  female?
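
The paper's answer, written out as a worked equation (assuming all word vectors are unit-normalized, so the norm in the middle step is a constant that does not affect the arg max):

```latex
\begin{aligned}
\operatorname*{arg\,max}_{x \in V} \cos\bigl(x,\; \mathit{king} - \mathit{man} + \mathit{woman}\bigr)
  &= \operatorname*{arg\,max}_{x \in V}
     \frac{\cos(x,\mathit{king}) - \cos(x,\mathit{man}) + \cos(x,\mathit{woman})}
          {\lVert \mathit{king} - \mathit{man} + \mathit{woman} \rVert} \\
  &= \operatorname*{arg\,max}_{x \in V}
     \bigl[\, \cos(x,\mathit{king}) - \cos(x,\mathit{man}) + \cos(x,\mathit{woman}) \,\bigr]
\end{aligned}
```

So vector arithmetic is really similarity arithmetic: the best candidate is the word most similar to king (royal?) and to woman (female?), and least similar to man.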

What does each similarity term mean? Observe the joint features with explicit representations!
royal?       female?
uncrowned    Elizabeth
majesty      Katherine
second       impregnate
…            …
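
One way to read off such joint features, assuming sparse PPMI dictionaries like the ppmi_vector sketch above (the helper name and the product weighting are illustrative, not the paper's exact procedure):

```python
def joint_features(vec_a, vec_b, top_k=5):
    """Rank contexts active in BOTH sparse vectors by the product of their
    PPMI weights; these are the features driving each cos(x, y) term."""
    shared = set(vec_a) & set(vec_b)
    ranked = sorted(shared, key=lambda c: vec_a[c] * vec_b[c], reverse=True)
    return ranked[:top_k]

# e.g. joint_features(ppmi_vector("queen"), ppmi_vector("king"))
#      -> contexts expressing the "royal?" aspect
#      joint_features(ppmi_vector("queen"), ppmi_vector("woman"))
#      -> contexts expressing the "female?" aspect
```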

Can we do better?

Let’s look at some mistakes…

The Additive Objective
Problem: one similarity might dominate the rest
Much more prevalent in explicit representations
Might explain why explicit underperformed
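
A toy illustration of the dominance problem, using made-up similarity values for the analogy a : b :: a* : b* (e.g. man : woman :: king : ?): the intended answer q loses to a word m whose single very high similarity to a* dominates the sum.

```latex
\begin{aligned}
\text{score}_{\mathrm{add}}(q) &= \cos(q, a^{*}) + \cos(q, b) - \cos(q, a)
                                = 0.50 + 0.60 - 0.20 = 0.90 \\
\text{score}_{\mathrm{add}}(m) &= \cos(m, a^{*}) + \cos(m, b) - \cos(m, a)
                                = 0.95 + 0.10 - 0.10 = 0.95
\end{aligned}
```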

How can we do better?
Instead of adding similarities, multiply them!
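
The multiplicative objective from the paper (3CosMul), with a small ε to avoid division by zero; cosine similarities are assumed to be shifted into a positive range before multiplying:

```latex
\operatorname*{arg\,max}_{b^{*} \in V}\;
\frac{\cos(b^{*}, a^{*}) \cdot \cos(b^{*}, b)}{\cos(b^{*}, a) + \varepsilon}
```

On the made-up numbers above, the intended answer now wins: 0.50 · 0.60 / 0.20 = 1.50 for q versus 0.95 · 0.10 / 0.10 = 0.95 for m (ignoring ε).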

Embedding vs Explicit (Round 2)

Multiplication > Addition

Explicit is on-par with Embedding

Embeddings are not “magical”
Embedding-based similarities have a more uniform distribution
The additive objective performs better on smoother distributions
The multiplicative objective overcomes this issue

Conclusion
Are analogies unique to neural embeddings? No! They occur in sparse and explicit representations as well.
Why does vector arithmetic reveal analogies? Because vector arithmetic is equivalent to similarity arithmetic.
Can we do better? Yes! The multiplicative objective is significantly better.

More Results and Analyses (in the paper)
Evaluation on closed-vocabulary analogy questions (SemEval 2012)
Experiments with a third objective function (PairDirection)
Do different representations reveal the same analogies?
Error analysis
A feature-level interpretation of how word similarity reveals analogies

Agreement
Objective    Both Correct    Both Wrong    Embedding Correct    Explicit Correct
MSR          43.97%          28.06%        15.12%               12.85%
Google       57.12%          22.17%        9.59%                11.12%

Error Analysis: Default Behavior
A certain word acts as a “prototype” answer for its semantic type
Examples: daughter for feminine answers, Fresno for US cities, Illinois for US states
Their vectors are the centroid of that semantic type

Error Analysis: Verb Inflections
In verb analogies: walked is to walking as danced is to…?
The correct lemma is often found (dance)
But with the wrong inflection (dances)
Probably an artifact of the window context

The Iraqi Example

The Additive Objective

The Iraqi Example (Revisited)