Vector space word representations
Rani Nelken, PhD
Director of Research, Outbrain
@RaniNelken
Words = atoms?
That would be crazy for numbers
The distributional hypothesis
What is a word?
Wittgenstein (1953): “The meaning of a word is its use in the language.”
Firth (1957): “You shall know a word by the company it keeps.”
From atomic symbols to vectors
Map words to dense numerical vectors “representing” their contexts
Map words with similar contexts to vectors with a small angle between them
History
Hard clustering: Brown clustering
Soft clustering: LSA, random projections, LDA
Neural nets
Feedforward Neural Net Language Model
Training
Input: one-hot vectors of the context words (0…0,1,0…0)
We’re trying to learn a vector for each word (the “projection”),
such that the output is close to the one-hot vector of w(t).
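Below is a toy numpy sketch of this training idea, not code from the talk: one-hot context words are looked up in a shared projection matrix (the word vectors being learned), and the softmax output is nudged toward the one-hot vector of w(t). The corpus, dimensions, window size, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
V, D = len(vocab), 8                 # vocabulary size, embedding dimension
idx = {w: i for i, w in enumerate(vocab)}

P = rng.normal(0, 0.1, (V, D))       # projection matrix: the word vectors we learn
W = rng.normal(0, 0.1, (D, V))       # output layer feeding a softmax over the vocabulary

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1
for epoch in range(200):
    for t in range(1, len(corpus) - 1):
        context = [idx[corpus[t - 1]], idx[corpus[t + 1]]]  # one context word on each side
        target = idx[corpus[t]]
        h = P[context].mean(axis=0)  # project (and average) the one-hot context words
        y = softmax(h @ W)           # predicted distribution over w(t)
        dy = y.copy()
        dy[target] -= 1.0            # gradient of cross-entropy vs. the one-hot target
        dh = W @ dy                  # gradient flowing back into the projection
        W -= lr * np.outer(h, dy)
        for c in context:
            P[c] -= lr * dh / len(context)

print(P[idx["cat"]])                 # learned dense vector for "cat"
```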
Simpler model: Word2Vec
What can we do with these representations?
Plug them into your existing classifier
Plug them into further neural nets – better!
Improves accuracy on many NLP tasks: named entity recognition, POS tagging, sentiment analysis, semantic role labeling
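A minimal sketch of the first bullet, assuming a pretrained model: average a text’s word vectors into a single feature vector and hand it to an ordinary classifier. The Google News file name, the toy texts, and the labels are illustrative assumptions (gensim 4.x API).

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

# pre-learned embeddings released with the word2vec C tool (an assumed local file)
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def doc_vector(text):
    # average the vectors of the words the model knows; zeros if none are known
    vecs = [kv[w] for w in text.lower().split() if w in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

texts = ["delicious crumbled cheese", "the stock market crumpled today"]
labels = [1, 0]                                  # e.g. food vs. not-food
X = np.stack([doc_vector(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([doc_vector("fresh goat cheese")]))
```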
Back to cheese… cos(crumbled, cheese) = 0.042
cos(crumpled, cheese) = 0.203
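These cosines can be reproduced with gensim’s similarity call, reusing the `kv` model loaded in the earlier sketch; the exact numbers depend on which embeddings you load.

```python
print(kv.similarity("crumbled", "cheese"))   # cosine between the two word vectors
print(kv.similarity("crumpled", "cheese"))
```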
And now for the magic
“Magical” property
[Paris] - [France] + [Italy] ≈ [Rome]
[king] - [man] + [woman] ≈ [queen]
We can use it to solve word analogy problems:
Boston : Red_Sox = New_York : ?
Demo
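With the same `kv` model, the analogy trick is one gensim call: add the “positive” vectors, subtract the “negative” one, and return the nearest remaining words by cosine. A sketch (the Red_Sox token spelling assumes the Google News vocabulary):

```python
# king - man + woman ≈ ?
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Boston : Red_Sox = New_York : ?
print(kv.most_similar(positive=["Red_Sox", "New_York"], negative=["Boston"], topn=3))
```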
Why does it work? [king] - [man] + [woman] ≈ [queen]
cos(x, [king] - [man] + [woman]) ≈ cos(x, [king]) - cos(x, [man]) + cos(x, [woman]) (up to normalization)
[queen] is a good candidate
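A small numpy sketch of this additive scoring (often called 3CosAdd): rank every candidate x by cos(x, [king]) - cos(x, [man]) + cos(x, [woman]) and take the best one. It reuses the loaded `kv` model; the `index_to_key` attribute assumes the gensim 4.x API, and the vocabulary cutoff is just for speed.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def cos_add(a, b, c, kv, topn_vocab=50000):
    # answer "a : b = c : ?" by maximizing cos(x, b) - cos(x, a) + cos(x, c)
    best, best_score = None, float("-inf")
    for x in kv.index_to_key[:topn_vocab]:
        if x in (a, b, c):
            continue
        s = cos(kv[x], kv[b]) - cos(kv[x], kv[a]) + cos(kv[x], kv[c])
        if s > best_score:
            best, best_score = x, s
    return best

print(cos_add("man", "king", "woman", kv))   # expected: queen
```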
It doesn’t always work: London : England = Baghdad : ?
We expect Iraq, but get Mosul.
We’re looking for a word that is close to Baghdad and to England, but not to London.
Why did it fail? London : England = Baghdad : ?
cos(Mosul, Baghdad) >> cos(Iraq, London)
Instead of adding the cosines, multiply them.
Improves accuracy.
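Gensim exposes this multiplicative variant (3CosMul, from Levy and Goldberg 2014) directly, so the two objectives can be compared side by side; with suitable embeddings the multiplicative form is the one more likely to return Iraq.

```python
# additive objective (3CosAdd): London : England = Baghdad : ?
print(kv.most_similar(positive=["England", "Baghdad"], negative=["London"], topn=3))

# multiplicative objective (3CosMul): same analogy, cosines multiplied instead of added
print(kv.most_similar_cosmul(positive=["England", "Baghdad"], negative=["London"], topn=3))
```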
Word2Vec
Open source C implementation from Google
Comes with pre-learned embeddings
Gensim: fast Python implementation
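Besides loading the pre-learned Google News vectors (shown earlier), gensim can also train embeddings from scratch on your own tokenized sentences. A toy sketch using the gensim 4.x API (parameter names such as vector_size are version-dependent; the corpus is a placeholder):

```python
from gensim.models import Word2Vec

sentences = [["crumbled", "cheese", "on", "a", "salad"],
             ["the", "paper", "crumpled", "in", "his", "hand"],
             ["grated", "cheese", "on", "pasta"]]

# sg=1 selects the skip-gram model; use sg=0 for CBOW
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("cheese", topn=2))
print(model.wv["cheese"][:5])        # first few dimensions of the learned vector
```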
Active field of research
Bilingual embeddings
Joint word and image embeddings
Embeddings for sentiment
Phrase and document embeddings
Bigger picture: how can we make NLP less fragile?
90s: linguistic engineering
00s: feature engineering
10s: unsupervised preprocessing
References
https://code.google.com/p/word2vec/
Thanks! @RaniNelken
We’re hiring for NLP positions.