Dependency-Based Word Embeddings
Omer Levy and Yoav Goldberg
Bar-Ilan University, Israel
Neural Embeddings
Our Main Contribution: Generalizing Skip-Gram with Negative Sampling
Skip-Gram with Negative Sampling v2.0
Original implementation assumes bag-of-words contexts
We generalize to arbitrary contexts
Dependency contexts create qualitatively different word embeddings
Provide a new tool for linguistically analyzing embeddings
Context Types
Example: Australian scientist discovers star with telescope
Target word: discovers
Bag-of-words (BoW) context: the words within a window around the target
Syntactic dependency context: the words syntactically attached to the target (scientist/nsubj, star/dobj, telescope/prep_with)
Generalizing Skip-Gram with Negative Sampling
How does Skip-Gram work?
Text → Bag-of-Words Contexts → Word-Context Pairs → Learning
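For concreteness, here is a minimal sketch (plain Python, illustrative only, not the word2vec code) of how bag-of-words contexts turn a sentence into the (word, context) pairs that are fed to learning; the window size k and the example sentence are assumptions:

```python
def bow_pairs(tokens, k=2):
    """Extract (word, context) pairs using a bag-of-words window of k tokens on each side."""
    pairs = []
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - k), min(len(tokens), i + k + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((word, tokens[j]))
    return pairs

sentence = "australian scientist discovers star with telescope".split()
for word, context in bow_pairs(sentence, k=2):
    print(word, context)
```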
Our Modification: Text → Arbitrary Contexts → Word-Context Pairs → Learning
Modified word2vec publicly available!
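Once pair extraction is decoupled from the window, the negative-sampling learning step itself is unchanged. Below is an illustrative NumPy sketch of skip-gram with negative sampling over arbitrary (word, context) pairs; it is not the released modified word2vec, and the vector dimension, learning rate, number of negatives, and uniform negative sampling are simplifying assumptions (the real tool samples negatives from a smoothed unigram distribution):

```python
import numpy as np

def train_sgns(pairs, words, contexts, dim=50, negatives=5, lr=0.025, epochs=5, seed=0):
    """Skip-gram with negative sampling over arbitrary (word, context) pairs."""
    rng = np.random.default_rng(seed)
    w_idx = {w: i for i, w in enumerate(words)}
    c_idx = {c: i for i, c in enumerate(contexts)}
    W = (rng.random((len(words), dim)) - 0.5) / dim   # word vectors
    C = np.zeros((len(contexts), dim))                # context vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for word, ctx in pairs:
            wi = w_idx[word]
            # one observed (positive) context plus `negatives` uniformly sampled negatives
            targets = [(c_idx[ctx], 1.0)] + [
                (int(rng.integers(len(contexts))), 0.0) for _ in range(negatives)
            ]
            grad_w = np.zeros(dim)
            for ci, label in targets:
                score = sigmoid(W[wi] @ C[ci])
                g = lr * (label - score)
                grad_w += g * C[ci]        # accumulate gradient for the word vector
                C[ci] += g * W[wi]         # update the context vector
            W[wi] += grad_w
    return W, C, w_idx, c_idx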
Our Modification: Example
Text (Wikipedia) → Syntactic Contexts (Stanford Dependencies) → Word-Context Pairs → Learning
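A sketch of how the syntactic (word, context) pairs can be read off a dependency parse in the style described here: each word pairs with its modifiers as modifier/label and with its head as head/label⁻¹ (written label-1 in code). The parse triples below are hand-written for the running example; in practice they would come from a parser producing Stanford Dependencies with prepositions collapsed:

```python
# (head, label, modifier) triples for:
#   "Australian scientist discovers star with telescope"
# with the preposition collapsed into prep_with.
parse = [
    ("scientist", "amod", "australian"),
    ("discovers", "nsubj", "scientist"),
    ("discovers", "dobj", "star"),
    ("discovers", "prep_with", "telescope"),
]

def dependency_pairs(triples):
    """Each word gets its modifiers (word/label) and its head (word/label-1) as contexts."""
    pairs = []
    for head, label, mod in triples:
        pairs.append((head, f"{mod}/{label}"))     # head sees its modifier
        pairs.append((mod, f"{head}/{label}-1"))   # modifier sees its head (inverse relation)
    return pairs

for word, context in dependency_pairs(parse):
    print(word, context)
```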
What is the effect of different context types?
Thoroughly studied in explicit (distributional) representations: Lin (1998), Padó and Lapata (2007), and many others…
General conclusion:
Bag-of-words contexts induce topical similarities
Dependency contexts induce functional similarities (words that share the same semantic type; cohyponyms)
Does this hold for embeddings as well?
Embedding Similarity with Different Contexts
Target word: Hogwarts (Harry Potter's school)
Bag of Words (k=5): Dumbledore, hallows, half-blood, Malfoy, Snape (related to Harry Potter)
Dependencies: Sunnydale, Collinwood, Calarts, Greendale, Millfield (schools)
Embedding Similarity with Different Contexts
Target word: Turing (computer scientist)
Bag of Words (k=5): nondeterministic, non-deterministic, computability, deterministic, finite-state (related to computability)
Dependencies: Pauling, Hotelling, Heting, Lessing, Hamming (scientists)
Embedding Similarity with Different Contexts (Online Demo!)
Target word: dancing (dance gerund)
Bag of Words (k=5): singing, dance, dances, dancers, tap-dancing (related to dance)
Dependencies: singing, rapping, breakdancing, miming, busking (gerunds)
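The neighbor lists above are plain cosine-similarity queries over the trained vectors. A minimal sketch, assuming a dict mapping each word to a NumPy vector (e.g. loaded from the released embeddings):

```python
import numpy as np

def nearest(word, vectors, k=5):
    """Return the k words whose embeddings have the highest cosine similarity to `word`."""
    v = vectors[word]
    v = v / np.linalg.norm(v)
    scored = []
    for other, u in vectors.items():
        if other == word:
            continue
        scored.append((float(v @ (u / np.linalg.norm(u))), other))
    return [w for _, w in sorted(scored, reverse=True)[:k]]
```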
Embedding Similarity with Different Contexts
Dependency-based embeddings have more functional similarities
This phenomenon goes beyond these examples: quantitative analysis (in the paper)
Quantitative Analysis: [precision-recall curves for Dependencies, BoW (k=2), and BoW (k=5)]
Dependency-based embeddings have more functional similarities
Why do dependencies induce functional similarities?
Dependency Contexts & Functional Similarity
Thoroughly studied in explicit (distributional) representations: Lin (1998), Padó and Lapata (2007), and many others…
In explicit representations, we can look at the features and analyze
But embeddings are a black box! Dimensions are latent and don't necessarily have any meaning
Analyzing Embeddings
Peeking into Skip-Gram’s Black Box
Associated Contexts
Target word: Hogwarts
Dependencies: students/prep_at⁻¹, educated/prep_at⁻¹, student/prep_at⁻¹, stay/prep_at⁻¹, learned/prep_at⁻¹
Associated Contexts
Target word: Turing
Dependencies: machine/nn⁻¹, test/nn⁻¹, theorem/poss⁻¹, machines/nn⁻¹, tests/nn⁻¹
Associated Contexts
Target word: dancing
Dependencies: dancing/conj, dancing/conj⁻¹, singing/conj⁻¹, singing/conj, ballroom/nn
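These associated-context lists come from scoring a target word's vector against every context vector (SGNS learns both matrices) and keeping the top-scoring contexts. A sketch, assuming the W, C, w_idx, and context list produced by the training sketch above:

```python
import numpy as np

def top_contexts(word, W, C, w_idx, contexts, k=5):
    """Contexts whose vectors have the highest inner product with the word's vector."""
    scores = C @ W[w_idx[word]]        # one score per context
    best = np.argsort(-scores)[:k]     # indices of the k highest-scoring contexts
    return [contexts[int(i)] for i in best]
```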
Analyzing Embeddings
We found a way to linguistically analyze embeddings
Together with the ability to engineer contexts…
…we now have the tools to create task-tailored embeddings!
Conclusion
Generalized Skip-Gram with Negative Sampling to arbitrary contexts
Different contexts induce different similarities
Suggest a way to peek inside the black box of embeddings
Code, demo, and word vectors available from our websites
Make linguistically-motivated, task-tailored embeddings today!
Thank you for listening :)
How does Skip-Gram work?
Generalize Skip-Gram to Arbitrary Contexts
Quantitative Analysis
WordSim353
Quantitative Analysis
Define an artificial task of ranking functional pairs above topical ones
Use embedding similarity (cosine) to rank
Evaluate using a precision-recall curve
Higher curve means higher affinity to functional similarity
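A sketch of this evaluation: given word pairs labeled functional (positive) or topical (negative), rank them by cosine similarity and trace precision and recall down the ranking. The labeled pairs themselves are assumed to come from an annotated set like the one used in the paper:

```python
import numpy as np

def precision_recall_points(pairs, labels, vectors):
    """Rank word pairs by cosine similarity; return (precision, recall) at each rank."""
    def cos(a, b):
        va, vb = vectors[a], vectors[b]
        return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

    order = sorted(range(len(pairs)), key=lambda i: cos(*pairs[i]), reverse=True)
    total_pos = sum(labels)            # labels: 1 = functional pair, 0 = topical pair
    points, hits = [], 0
    for rank, i in enumerate(order, start=1):
        hits += labels[i]
        points.append((hits / rank, hits / total_pos))   # (precision, recall)
    return points
```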
Quantitative Analysis: [precision-recall curves for Dependencies, BoW (k=2), and BoW (k=5)]
Dependency-based embeddings have more functional similarities