Vector space word representations

Vector space word representations
Rani Nelken, PhD, Director of Research, Outbrain (@RaniNelken)


Words = atoms?

That would be crazy for numbers

The distributional hypothesis
What is a word? Wittgenstein (1953): the meaning of a word is its use in the language. Firth (1957): "You shall know a word by the company it keeps."

From atomic symbols to vectors
Map words to dense numerical vectors "representing" their contexts, so that words with similar contexts map to vectors with a small angle between them.
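That small-angle comparison boils down to cosine similarity; a minimal sketch with made-up toy vectors (real embeddings are learned and typically have 100 to 300 dimensions):

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 4-dimensional vectors, invented for illustration only.
cat = np.array([0.8, 0.1, 0.3, 0.0])
dog = np.array([0.7, 0.2, 0.4, 0.1])
car = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine(cat, dog))  # high: similar contexts, small angle
print(cosine(cat, car))  # lower: different contexts, larger angle
```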

History
Hard clustering: Brown clustering. Soft clustering: LSA, random projections, LDA. Neural nets.

Feedforward Neural Net Language Model

Training
The input is the one-hot vectors of the context words (0…0,1,0…0). We are trying to learn a vector for each word (the "projection") such that the output is close to the one-hot vector of w(t).
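A minimal sketch of the projection step, assuming a toy vocabulary of size V and embedding dimension d (both made up here): multiplying a one-hot input by the projection matrix simply selects that word's row, which is exactly the vector being learned.

```python
import numpy as np

V, d = 10, 5                      # toy vocabulary size and embedding dimension
P = np.random.randn(V, d) * 0.01  # projection matrix: one d-dimensional row per word

word_index = 3
one_hot = np.zeros(V)
one_hot[word_index] = 1.0

projected = one_hot @ P           # matrix product with a one-hot vector...
lookup = P[word_index]            # ...is just a row lookup

assert np.allclose(projected, lookup)
```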

Simpler model: Word2Vec

What can we do with these representations?
Plug them into your existing classifier, or better, plug them into further neural nets. This improves accuracy on many NLP tasks: named entity recognition, POS tagging, sentiment analysis, semantic role labeling.
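For the "plug them into your existing classifier" option, one common recipe is to average the word vectors in a document and hand the result to any off-the-shelf model; a hedged sketch, where word_vectors is an assumed dict-like mapping from word to numpy array:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_vector(tokens, word_vectors, dim=100):
    """Average the embeddings of the tokens we have vectors for."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# word_vectors: hypothetical mapping word -> np.ndarray, e.g. loaded from word2vec
# docs, labels: your existing training data (placeholders)
# X = np.vstack([doc_vector(doc, word_vectors) for doc in docs])
# clf = LogisticRegression().fit(X, labels)
```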

Back to cheese…
cos(crumbled, cheese) = 0.042
cos(crumpled, cheese) = 0.203
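Numbers like these come from a pairwise similarity query; with a trained gensim model (training is shown near the end of this deck), the call would look like the sketch below, though the exact values depend on the training corpus:

```python
# model: a trained gensim Word2Vec model (see the Word2Vec slide below)
print(model.wv.similarity('crumbled', 'cheese'))
print(model.wv.similarity('crumpled', 'cheese'))
```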

And now for the magic

"Magical" property
[Paris] - [France] + [Italy] ≈ [Rome]
[king] - [man] + [woman] ≈ [queen]
We can use it to solve word analogy problems: Boston : Red_Sox = New_York : ? (Demo)
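In gensim this vector arithmetic is exposed through most_similar, which adds the positive vectors and subtracts the negative ones; a sketch, assuming the loaded vectors contain these tokens:

```python
# [king] - [man] + [woman] ≈ ?
print(model.wv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))

# Boston : Red_Sox = New_York : ?
print(model.wv.most_similar(positive=['Red_Sox', 'New_York'], negative=['Boston'], topn=1))
```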

Why does it work?
[king] - [man] + [woman] ≈ [queen]
For normalized word vectors, cos(x, [king] - [man] + [woman]) is proportional to cos(x, [king]) - cos(x, [man]) + cos(x, [woman]), so [queen] is a good candidate.

It doesn't always work
London : England = Baghdad : ?
We expect Iraq, but get Mosul. We are looking for a word that is close to Baghdad and to England, but not to London.

Why did it fail?
London : England = Baghdad : ?
cos(Mosul, Baghdad) >> cos(Iraq, London), so one very large term dominates the additive score. Instead of adding the cosines, multiply them; this improves accuracy.
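The multiplicative objective (3CosMul in the Levy and Goldberg paper linked in the references) is available in gensim alongside the additive one; a sketch:

```python
# Additive objective: one very large cosine (e.g. to Mosul) can dominate the sum
print(model.wv.most_similar(positive=['England', 'Baghdad'], negative=['London'], topn=1))

# Multiplicative objective: balances the three similarities, improving analogy accuracy
print(model.wv.most_similar_cosmul(positive=['England', 'Baghdad'], negative=['London'], topn=1))
```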

Word2Vec
Open-source C implementation from Google, which comes with pre-learned embeddings. Gensim: a fast Python implementation.
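A minimal gensim sketch covering both routes mentioned here: training vectors on your own tokenized sentences, and loading Google's pre-learned embeddings. The sample sentences and file path are placeholders, and parameter names differ slightly across gensim versions:

```python
from gensim.models import Word2Vec, KeyedVectors

# Train on a (tiny, illustrative) tokenized corpus
sentences = [['the', 'cheese', 'crumbled'], ['the', 'paper', 'crumpled']]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
model.save('my_vectors.model')

# Or load the pre-learned Google News vectors released with the C implementation
# vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
```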

Active field of research
Bilingual embeddings, joint word and image embeddings, embeddings for sentiment, phrase and document embeddings.

Bigger picture: how can we make NLP less fragile?
'90s: linguistic engineering. '00s: feature engineering. '10s: unsupervised preprocessing.

References
https://code.google.com/p/word2vec/
http://www.cs.bgu.ac.il/~yoavg/publications/conll2014analogies.pdf
http://radimrehurek.com/2014/02/word2vec-tutorial/

Thanks! @RaniNelken. We're hiring for NLP positions.