Dependency-Based Word Embeddings
Omer Levy and Yoav Goldberg, Bar-Ilan University, Israel


Neural Embeddings

Our Main Contribution: Generalizing Skip-Gram with Negative Sampling

Skip-Gram with Negative Sampling v2.0
The original implementation assumes bag-of-words contexts.
We generalize to arbitrary contexts.
Dependency contexts create qualitatively different word embeddings.
This provides a new tool for linguistically analyzing embeddings.
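To make the generalization concrete, below is a minimal numpy sketch (not the authors' implementation) of a single SGNS update over a (word, context) pair with negative sampling. It illustrates the point the slide makes: the objective only ever sees pairs, so a context can just as well be a dependency-labeled word like "star/dobj" as a neighbouring word. The dimensionality, learning rate, and toy pairs are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of one SGNS update; nothing here cares whether the context
# string is a linear-window word or a syntactic context like "star/dobj".
rng = np.random.default_rng(0)
DIM, LR = 100, 0.025

W = {}  # target-word vectors
C = {}  # context vectors (SGNS learns these too)

def vec(table, key):
    # Lazily initialise a small random vector for an unseen word/context.
    return table.setdefault(key, (rng.random(DIM) - 0.5) / DIM)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(word, context, negative_contexts):
    """One gradient step on the SGNS objective for a single (word, context) pair."""
    w = vec(W, word)
    w_update = np.zeros(DIM)
    for ctx, label in [(context, 1.0)] + [(n, 0.0) for n in negative_contexts]:
        c = vec(C, ctx)
        g = LR * (label - sigmoid(w @ c))
        w_update += g * c
        c += g * w          # update the context vector in place
    w += w_update           # update the word vector in place

# The same code handles bag-of-words and dependency contexts:
train_pair("discovers", "star", ["telescope", "australian"])
train_pair("discovers", "star/dobj", ["telescope/prep_with", "cat/nsubj"])
```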

Context Types

Example: Australian scientist discovers star with telescope

Target word: discovers

Bag of Words (BoW) context: the words in a linear window around the target (Australian, scientist, star, with, telescope for a window of k=5)

Syntactic dependency context: the words syntactically connected to the target, labeled with their grammatical relations (scientist/nsubj, star/dobj, telescope/prep_with)
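As a rough illustration of the two context types on this sentence, here is a sketch using spaCy (an assumption; the talk itself uses Stanford Dependencies, so relation names such as the collapsed prep_with will differ slightly from spaCy's output):

```python
import spacy

# Sketch: BoW vs. dependency contexts for the example sentence.
# spaCy is used here for convenience; the original work parses with
# Stanford Dependencies, so labels (e.g. collapsed "prep_with") differ.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Australian scientist discovers star with telescope")

def bow_contexts(doc, i, k):
    """Linear bag-of-words contexts: up to k words on each side of token i."""
    lo, hi = max(0, i - k), min(len(doc), i + k + 1)
    return [doc[j].text for j in range(lo, hi) if j != i]

def dep_contexts(doc, i):
    """Dependency contexts: syntactic neighbours tagged with their relation;
    the head is marked with an inverse relation (word/label^-1)."""
    tok = doc[i]
    ctxs = [f"{child.text}/{child.dep_}" for child in tok.children]
    if tok.head is not tok:  # every token except the root has a head
        ctxs.append(f"{tok.head.text}/{tok.dep_}^-1")
    return ctxs

target = [t.i for t in doc if t.text == "discovers"][0]
print(bow_contexts(doc, target, k=2))  # ['Australian', 'scientist', 'star', 'with']
print(dep_contexts(doc, target))       # e.g. ['scientist/nsubj', 'star/dobj', 'with/prep']
```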

Generalizing Skip-Gram with Negative Sampling

How does Skip-Gram work?
Text → Bag-of-Words Contexts → Word-Context Pairs → Learning

Our Modification
Text → Arbitrary Contexts → Word-Context Pairs → Learning
Modified word2vec publicly available!
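The modified word2vec the slide refers to (released by the authors as word2vecf) trains from a plain text file of word-context pairs rather than from raw text. A minimal sketch of producing such a file follows; the exact file format and the command-line flags in the comment are assumptions and should be checked against the released tool.

```python
# Sketch: write word-context pairs (one "word context" pair per line) for a
# generalized SGNS trainer such as the authors' word2vecf. The pairs and the
# command shown in the comment are illustrative, not the exact released format.
pairs = [
    ("discovers", "scientist/nsubj"),
    ("discovers", "star/dobj"),
    ("discovers", "telescope/prep_with"),
    ("scientist", "discovers/nsubj^-1"),
    ("star", "discovers/dobj^-1"),
]

with open("dep.contexts", "w", encoding="utf-8") as f:
    for word, context in pairs:
        f.write(f"{word} {context}\n")

# Hypothetical training invocation (flags are assumptions):
#   ./word2vecf -train dep.contexts -output vecs.txt -size 300 -negative 15
```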

Our Modification: Example
Text (Wikipedia) → Syntactic Contexts (Stanford Dependencies) → Word-Context Pairs → Learning

What is the effect of different context types?

Thoroughly studied in explicit (distributional) representations: Lin (1998), Padó and Lapata (2007), and many others.
General conclusion:
Bag-of-words contexts induce topical similarities.
Dependency contexts induce functional similarities: words that share the same semantic type (cohyponyms).
Does this hold for embeddings as well?

Embedding Similarity with Different Contexts
Target word: Hogwarts (Harry Potter's school)
Bag of Words (k=5): Dumbledore, hallows, half-blood, Malfoy, Snape (related to Harry Potter)
Dependencies: Sunnydale, Collinwood, Calarts, Greendale, Millfield (schools)

Embedding Similarity with Different Contexts
Target word: Turing (computer scientist)
Bag of Words (k=5): nondeterministic, non-deterministic, computability, deterministic, finite-state (related to computability)
Dependencies: Pauling, Hotelling, Heting, Lessing, Hamming (scientists)

Online Demo!
Embedding Similarity with Different Contexts
Target word: dancing (dance gerund)
Bag of Words (k=5): singing, dance, dances, dancers, tap-dancing (related to dance)
Dependencies: rapping, breakdancing, miming, busking (gerunds)
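Tables like the three above can be reproduced by nearest-neighbour queries in each embedding space. Here is a small sketch with gensim; the vector file names are placeholders for the BoW- and dependency-based vectors the authors distribute.

```python
from gensim.models import KeyedVectors

# Sketch: compare nearest neighbours (cosine similarity) in a BoW-based and a
# dependency-based embedding space. File names are placeholders.
bow_vecs = KeyedVectors.load_word2vec_format("bow5.words.txt", binary=False)
dep_vecs = KeyedVectors.load_word2vec_format("deps.words.txt", binary=False)

for target in ["hogwarts", "turing", "dancing"]:
    bow_nn = [w for w, _ in bow_vecs.most_similar(target, topn=5)]
    dep_nn = [w for w, _ in dep_vecs.most_similar(target, topn=5)]
    print(f"{target}\n  BoW (k=5):    {bow_nn}\n  Dependencies: {dep_nn}")
```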

Embedding Similarity with Different Contexts
Dependency-based embeddings have more functional similarities.
This phenomenon goes beyond these examples: quantitative analysis (in the paper).

Quantitative analysis (chart comparing Dependencies, BoW (k=2), and BoW (k=5)): dependency-based embeddings have more functional similarities.

Why do dependencies induce functional similarities?

Dependency Contexts & Functional Similarity
Thoroughly studied in explicit (distributional) representations: Lin (1998), Padó and Lapata (2007), and many others.
In explicit representations, we can look at the features and analyze them.
But embeddings are a black box! Dimensions are latent and don't necessarily have any meaning.

Analyzing Embeddings

Peeking into Skip-Gram’s Black Box

Associated Contexts
Target word: Hogwarts
Dependencies: students/prep_at^-1, educated/prep_at^-1, student/prep_at^-1, stay/prep_at^-1, learned/prep_at^-1

Associated Contexts
Target word: Turing
Dependencies: machine/nn^-1, test/nn^-1, theorem/poss^-1, machines/nn^-1, tests/nn^-1

Associated Contexts
Target word: dancing
Dependencies: dancing/conj, dancing/conj^-1, singing/conj^-1, singing/conj, ballroom/nn
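The tables above rely on the fact that SGNS also learns a vector for every context; ranking all contexts by their inner product with a word's vector reveals the contexts the model most strongly associates with it. A sketch of that idea follows (file names are placeholders, and it assumes the context vectors were saved alongside the word vectors, as the authors' modified word2vec can do).

```python
import numpy as np
from gensim.models import KeyedVectors

# Sketch: rank contexts by their inner product with a word vector.
# Assumes word vectors and context vectors were saved to separate files.
words = KeyedVectors.load_word2vec_format("deps.words.txt", binary=False)
contexts = KeyedVectors.load_word2vec_format("deps.contexts.txt", binary=False)

def associated_contexts(word, topn=5):
    w = words[word]
    scores = contexts.vectors @ w               # score every context at once
    top = np.argsort(-scores)[:topn]
    return [contexts.index_to_key[i] for i in top]

print(associated_contexts("hogwarts"))  # e.g. contexts like 'students/prep_at^-1'
```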

Analyzing Embeddings
We found a way to linguistically analyze embeddings.
Together with the ability to engineer contexts, we now have the tools to create task-tailored embeddings!

Conclusion

Generalized Skip-Gram with Negative Sampling to arbitrary contexts.
Different contexts induce different similarities.
Suggested a way to peek inside the black box of embeddings.
Code, demo, and word vectors available from our websites.
Make linguistically-motivated, task-tailored embeddings today!
Thank you for listening :)

How does Skip-Gram work?

Generalize Skip-Gram to Arbitrary Contexts

Quantitative Analysis

WordSim353

Quantitative Analysis
Define an artificial task of ranking functional pairs above topical ones.
Use embedding similarity (cosine) to rank.
Evaluate using a precision-recall curve.
A higher curve means a higher affinity to functional similarity.
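A sketch of this evaluation with scikit-learn follows; the word pairs listed are tiny illustrative placeholders, not the labelled data used in the paper, and the vector file name is an assumption.

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.metrics import precision_recall_curve

# Sketch: rank functional (label 1) vs. topical (label 0) word pairs by cosine
# similarity and compute a precision-recall curve. Pairs here are placeholders.
vecs = KeyedVectors.load_word2vec_format("deps.words.txt", binary=False)

functional_pairs = [("hogwarts", "sunnydale"), ("turing", "hamming")]
topical_pairs = [("hogwarts", "dumbledore"), ("turing", "computability")]

pairs = functional_pairs + topical_pairs
labels = np.array([1] * len(functional_pairs) + [0] * len(topical_pairs))
scores = np.array([vecs.similarity(a, b) for a, b in pairs])

precision, recall, _ = precision_recall_curve(labels, scores)
# A higher curve means the embedding space ranks functional pairs
# above topical ones, i.e. a stronger affinity to functional similarity.
```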

Quantitative Analysis (figure): precision-recall curves for Dependencies, BoW (k=2), and BoW (k=5); dependency-based embeddings have more functional similarities.