Vector Representation of Text

Presentation transcript:

Vector Representation of Text. Vagelis Hristidis. Prepared with the help of Nhat Le. Many slides are from Richard Socher, Stanford CS224d: Deep Learning for NLP.

To compare pieces of text, we need effective representations of words, sentences, and longer text.
Approach 1: use existing thesauri or ontologies like WordNet and SNOMED CT (for the medical domain). Drawbacks: built manually; not context specific.
Approach 2: use co-occurrence counts for word similarity (a minimal sketch follows below). Drawbacks: quadratic space needed; relative position and order of words not considered.
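To make Approach 2 concrete, here is a minimal sketch, assuming a toy corpus, a window size of 2, and illustrative variable names (none of this comes from the original slides): build a word-word co-occurrence matrix and compare words by the cosine similarity of their rows.

```python
import numpy as np

# Toy corpus and window size are illustrative assumptions.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within +/- window positions; note the V x V ("quadratic") size.
X = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Words that appear in similar contexts get similar co-occurrence rows.
print(cosine(X[idx["cat"]], X[idx["dog"]]))
```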

Approach 3: low-dimensional vectors. Store only the "important" information in a fixed, low-dimensional vector. Apply Singular Value Decomposition (SVD) to the co-occurrence matrix X: keeping only the top k singular values gives X̂, the best rank-k approximation to X in terms of least squares. Example word vector: Motel = [0.286, 0.792, -0.177, -0.107, 0.109, -0.542, 0.349, 0.271]. Here m = n = size of the vocabulary.
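A minimal numpy sketch of this step, assuming a random stand-in for the co-occurrence matrix and k = 8 (both are illustrative assumptions, not from the slides): take the SVD of X and keep the top k components as word vectors.

```python
import numpy as np

# X: a V x V word-word co-occurrence matrix (here random, just for illustration).
V, k = 1000, 8
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(V, V)).astype(float)

# Full SVD: X = U @ diag(S) @ Vt, with singular values in decreasing order.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top-k components; each row of W is a k-dimensional word vector.
W = U[:, :k] * S[:k]                             # V x k
X_hat = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]    # best rank-k approximation to X

print(W[0])  # an 8-dimensional vector, analogous to the "Motel" example above
```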

Approach 3: low-dimensional vectors. Source: "An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence", Rohde et al., 2005.

Problems with SVD: the computational cost scales quadratically for an n × m matrix, O(mn²) flops (when n < m); it is hard to incorporate new words or documents; and it does not consider the order of words.

word2vec: an approach to represent the meaning of a word. Represent each word with a low-dimensional vector; word similarity = vector similarity. Key idea: predict the surrounding words of every word. It is faster than SVD-based methods and can easily incorporate a new sentence/document or add a word to the vocabulary.

Represent the meaning of a word – word2vec. Two basic neural network models: Continuous Bag of Words (CBOW) uses a window of context words to predict the middle word; Skip-gram (SG) uses a word to predict the surrounding words in the window. (A minimal training sketch follows below.)
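A minimal sketch of training both models with the gensim library; the toy corpus and all parameter values are assumptions, and the parameter names follow gensim 4.x as far as I recall (vector_size, window, sg), so treat them as such.

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (illustrative only).
sentences = [["the", "cat", "sat", "on", "the", "floor"],
             ["the", "dog", "sat", "on", "the", "mat"]]

# CBOW (sg=0): the context window predicts the middle word.
cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)

# Skip-gram (sg=1): the middle word predicts each context word.
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow.wv["cat"][:5])                    # first 5 dimensions of the vector for "cat"
print(skipgram.wv.similarity("cat", "dog"))  # cosine similarity between two word vectors
```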

word2vec – Continuous Bag of Words. E.g. "the cat sat on floor" with window size = 2: the context words around the center word "sat" (the, cat, on, floor) are used to predict it.
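For concreteness, a small sketch, assuming this tokenized sentence and window size, of how (context, target) pairs are generated for CBOW (my illustration, not from the slides):

```python
sentence = ["the", "cat", "sat", "on", "floor"]
window = 2

# For each position, the context is the words within +/- window of the target.
pairs = []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)
# e.g. ['the', 'cat', 'on', 'floor'] -> sat
```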

[Network diagram: the index of "cat" in the vocabulary determines its one-hot input vector; the one-hot vectors for the context words "cat" and "on" enter the input layer, pass through the hidden layer, and the output layer predicts the center word "sat".]

We must learn two weight matrices: W of size V×N between the input layer and the hidden layer, and W′ of size N×V between the hidden layer and the output layer. The input and output layers are V-dimensional (V = vocabulary size); the hidden layer is N-dimensional, and N will be the size of the word vectors. [Diagram: one-hot inputs for "cat" and "on" (V-dim) → W → hidden layer (N-dim) → W′ → output "sat" (V-dim).]

Wᵀ × x_cat = v_cat and Wᵀ × x_on = v_on: because x_cat and x_on are one-hot vectors, these products simply pick out one row of W each, as the slides illustrate with a small numeric example matrix. The hidden layer is the average of the context word vectors: v̂ = (v_cat + v_on) / 2.

The hidden vector v̂ is then mapped to output scores by the second matrix, z = W′ᵀ × v̂, and ŷ = softmax(z) turns the scores into a probability distribution over the vocabulary. [Diagram: "cat", "on" (V-dim) → W → v̂ (N-dim) → W′ → ŷ, the prediction for "sat" (V-dim); N will be the size of the word vectors.]

We would like the prediction ŷ (e.g. [0.01, 0.02, 0.00, 0.7, …]) to be close to y_sat, the one-hot vector of the actual center word "sat"; training adjusts W and W′ to reduce this error (typically by minimizing a cross-entropy loss). [Diagram: the same network as above, with the predicted distribution ŷ compared against the target y_sat.]

After training, both weight matrices contain word vectors: W contains the input word vectors (one row per word) and W′ contains the output word vectors. We can consider either W or W′ as the word's representation, or even take the average of the two. [Diagram: Wᵀ with its example numeric entries, labeled as containing the word's vectors.]
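Tying the last few slides together, here is a minimal numpy sketch of one CBOW forward pass for the context {cat, on} predicting "sat"; the tiny vocabulary, the dimensions, and the random weights are all assumptions made for illustration.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "floor"]
V, N = len(vocab), 3                      # vocabulary size and embedding size
idx = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))               # input -> hidden weights (V x N)
W_out = rng.normal(size=(N, V))           # hidden -> output weights W' (N x V)

def one_hot(word):
    x = np.zeros(V)
    x[idx[word]] = 1.0
    return x

# W.T @ x_cat just selects the row of W for "cat": this is v_cat.
v_cat = W.T @ one_hot("cat")
v_on = W.T @ one_hot("on")
v_hat = (v_cat + v_on) / 2                # hidden layer: average of context vectors

z = W_out.T @ v_hat                       # output scores, one per vocabulary word
y_hat = np.exp(z) / np.exp(z).sum()       # softmax: probability distribution

print(y_hat)                              # training would push probability mass toward "sat"
word_vectors = W                          # rows of W (or W', or their average) are the embeddings
```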

Some interesting results

Word analogies
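Word analogies can be probed with simple vector arithmetic over the learned vectors, e.g. vector("king") - vector("man") + vector("woman") lands near vector("queen"). A hedged sketch using pretrained vectors via gensim's downloader; the model name is given from memory and should be treated as an assumption.

```python
import gensim.downloader as api

# Pretrained vectors from gensim-data; the model name is an assumption
# (a smaller model such as "glove-wiki-gigaword-100" can be substituted).
wv = api.load("word2vec-google-news-300")

# king - man + woman ~ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```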

Represent the meaning of a sentence/text: the Paragraph Vector model (Quoc Le and Mikolov, 2014) extends word2vec to the text level. It also comes in two models; in one of them the paragraph vector is added as an extra input. (A minimal training sketch follows below.)
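A minimal sketch of training paragraph vectors with gensim's Doc2Vec; the corpus and parameter values are assumptions, and dm=1 roughly corresponds to the "paragraph vector as extra input" model mentioned above (parameter names follow gensim 4.x as I recall them).

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: each document gets a tag so that it receives its own vector.
docs = [TaggedDocument(words=["the", "cat", "sat", "on", "the", "floor"], tags=["doc0"]),
        TaggedDocument(words=["the", "dog", "slept", "on", "the", "mat"], tags=["doc1"])]

# dm=1: "distributed memory" variant, where the paragraph vector is an extra input.
model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, dm=1, epochs=50)

print(model.dv["doc0"][:5])                                     # learned vector for doc0
print(model.infer_vector(["a", "cat", "on", "a", "mat"])[:5])   # vector for unseen text
```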

Applications: search (e.g., query expansion), sentiment analysis, classification, clustering. (A small classification sketch follows below.)
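One concrete way to use word vectors for classification, shown as a sketch under assumed toy data and model choices (none of it from the slides): represent each text by the average of its word vectors and feed that to an off-the-shelf classifier.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

texts = [["great", "movie", "loved", "it"], ["terrible", "film", "hated", "it"],
         ["loved", "this", "great", "film"], ["hated", "this", "terrible", "movie"]]
labels = [1, 0, 1, 0]                     # toy sentiment labels

w2v = Word2Vec(texts, vector_size=25, window=2, min_count=1, sg=1, epochs=100)

def text_vector(tokens):
    # Average the vectors of the tokens that are in the vocabulary.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0)

X = np.stack([text_vector(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([text_vector(["great", "film"])]))
```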

Resources:
Stanford CS224d: Deep Learning for NLP, http://cs224d.stanford.edu/index.html
The best explanation: "word2vec Parameter Learning Explained" by Xin Rong; interactive demo: https://ronxin.github.io/wevi/
Word2Vec Tutorial - The Skip-Gram Model: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

Represent the meaning of a sentence/text. Recursive neural network: needs a parse tree. Convolutional neural network: can go beyond representation with an end-to-end model, text → class. (A minimal sketch of the latter follows below.)
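A minimal PyTorch sketch of the end-to-end "text → class" idea with a convolutional model; every size and layer choice here is an assumption for illustration, not the architecture from the slides.

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    # A tiny end-to-end text -> class model: embed, convolve over word positions,
    # max-pool over time, then classify.
    def __init__(self, vocab_size, embed_dim=50, num_filters=16, kernel_size=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size, padding=1)
        self.fc = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        x = self.embed(token_ids)               # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                   # (batch, embed_dim, seq_len) for Conv1d
        x = torch.relu(self.conv(x))            # (batch, num_filters, seq_len)
        x = x.max(dim=2).values                 # max-pool over positions
        return self.fc(x)                       # (batch, num_classes) class scores

model = TextCNN(vocab_size=5000)
dummy = torch.randint(0, 5000, (4, 12))         # batch of 4 sequences of 12 token ids
print(model(dummy).shape)                       # torch.Size([4, 2])
```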