Vector Representation of Text


1 Vector Representation of Text
Vagelis Hristidis. Prepared with the help of Nhat Le. Many slides are from Richard Socher, Stanford CS224d: Deep Learning for NLP.

2 To compare pieces of text
We need an effective representation of: words, sentences, text.
Approach 1: Use existing thesauri or ontologies like WordNet and SNOMED CT (for medical). Drawbacks: manual; not context specific.
Approach 2: Use co-occurrences for word similarity. Drawbacks: quadratic space needed; relative position and order of words not considered.

3 Approach 3: low dimensional vectors
Store only the "important" information in a fixed, low-dimensional vector.
Apply Singular Value Decomposition (SVD) to the co-occurrence matrix X: the rank-k reconstruction X̂ is the best rank-k approximation to X in terms of least squares.
Example: motel = [0.286, 0.792, …, …, 0.109, …, 0.349, 0.271]
m = n = size of vocabulary.
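To make the rank-k idea concrete, here is a minimal sketch (not from the slides): it builds a small word-word co-occurrence matrix, as in Approach 2, and reduces it with a truncated SVD. The toy corpus, window size, and k = 2 are illustrative choices.

import numpy as np

corpus = [["the", "cat", "sat", "on", "the", "floor"],
          ["the", "dog", "sat", "on", "the", "mat"]]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Symmetric co-occurrence counts within the window.
X = np.zeros((V, V))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1

# Rank-k approximation: keep only the top k singular directions.
k = 2
U, S, Vt = np.linalg.svd(X)
word_vectors = U[:, :k] * S[:k]      # one k-dimensional row per word
for w in vocab:
    print(w, word_vectors[idx[w]].round(2))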

4 Approach 3: low dimensional vectors
An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence, Rohde et al. 2005

5 Problems with SVD
Computational cost scales quadratically for an n × m matrix: O(mn²) flops (when n < m).
Hard to incorporate new words or documents.
Does not consider the order of words.

6 word2vec: an approach to represent the meaning of a word
Represent each word with a low-dimensional vector.
Word similarity = vector similarity.
Key idea: predict the surrounding words of every word.
Faster, and it can easily incorporate a new sentence/document or add a word to the vocabulary.

7 Represent the meaning of a word – word2vec
Two basic neural network models:
Continuous Bag of Words (CBOW): use a window of context words to predict the middle word.
Skip-gram (SG): use a word to predict the surrounding words in the window.
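Both models are available off the shelf. A minimal sketch using the gensim library (an assumption: the slides do not prescribe a toolkit), with gensim >= 4.0 and a toy corpus; sg=0 selects CBOW and sg=1 selects skip-gram.

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "floor"],
             ["the", "dog", "sat", "on", "the", "mat"]]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram

print(cbow.wv["cat"][:5])                    # first entries of the 50-dim vector for "cat"
print(skipgram.wv.similarity("cat", "dog"))  # word similarity = vector (cosine) similarity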

8 word2vec – Continuous Bag of Words
Example: "the cat sat on floor", with window size = 2.
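A minimal sketch (not from the slides) of how (context, target) training pairs are generated from this sentence with window size 2; the following slides walk through the forward pass for the target "sat" using the context words "cat" and "on".

sentence = ["the", "cat", "sat", "on", "floor"]
window = 2

pairs = []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    pairs.append((context, target))

for context, target in pairs:
    print(context, "->", target)   # e.g. ['the', 'cat', 'on', 'floor'] -> sat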

9 [Figure: the context words "cat" and "on" enter the input layer as V-dimensional one-hot vectors (a 1 at the word's index in the vocabulary, 0 elsewhere); they pass through the hidden layer to the output layer, whose target is the one-hot vector for "sat".]

10 We must learn W and W′
[Figure: the same network with the weights labeled. Each V-dimensional one-hot input ("cat", "on") is multiplied by the shared input weight matrix W (V×N) to reach the N-dimensional hidden layer; the hidden layer is multiplied by the output weight matrix W′ (N×V) to produce the V-dimensional output for "sat". N will be the size of the word vectors.]

11 Wᵀ × x_cat = v_cat
Multiplying Wᵀ (N×V) by the one-hot input x_cat simply selects the row of W for "cat", i.e. its N-dimensional vector v_cat; likewise Wᵀ × x_on = v_on. The hidden layer is the average of the context vectors: v̂ = (v_cat + v_on) / 2.
[Figure: a worked numeric example of the product Wᵀ × x_cat = v_cat.]

12 Wᵀ × x_on = v_on
The same lookup is applied to the second context word, and the two context vectors are averaged into v̂ = (v_cat + v_on) / 2.
[Figure: a worked numeric example of the product Wᵀ × x_on = v_on.]

13 z = W′ᵀ × v̂, ŷ = softmax(z)
The N-dimensional hidden vector v̂ is multiplied by the output weight matrix W′ (N×V) to give a V-dimensional score vector z, and ŷ = softmax(z) turns the scores into a probability distribution over the vocabulary. N will be the size of the word vectors.
[Figure: the network with v̂ in the hidden layer and ŷ at the output layer; the target word is "sat".]

14 We would prefer ŷ close to y_sat
Training adjusts W and W′ so that the predicted distribution ŷ is close to the true one-hot vector y_sat of the target word, e.g. an output like [0.01, 0.02, 0.00, 0.7, …] that puts most of its probability mass on "sat".
[Figure: the network with the predicted output ŷ shown next to the target word "sat".]

15 W contains the input word vectors, W′ contains the output word vectors
We can consider either W or W′ as the word's representation, or even take the average of the two.
[Figure: the trained network, highlighting that the rows of W (and the columns of W′) hold the word vectors.]
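The whole CBOW forward pass from slides 9-15 fits in a few lines of NumPy. This is a minimal sketch with random, untrained weights; the tiny vocabulary and N = 3 are illustrative, and a real model would learn W and W′ by gradient descent on the cross-entropy between ŷ and y_sat.

import numpy as np

vocab = ["the", "cat", "sat", "on", "floor"]
idx = {w: i for i, w in enumerate(vocab)}
V, N = len(vocab), 3                 # V = vocabulary size, N = word-vector size

rng = np.random.default_rng(0)
W = rng.random((V, N))               # input weights: rows are input word vectors
W_prime = rng.random((N, V))         # output weights: columns are output word vectors

def one_hot(word):
    x = np.zeros(V)
    x[idx[word]] = 1.0
    return x

v_cat = W.T @ one_hot("cat")         # W^T x_cat selects the row of W for "cat"
v_on = W.T @ one_hot("on")
v_hat = (v_cat + v_on) / 2           # hidden layer: average of the context vectors

z = W_prime.T @ v_hat                # one score per vocabulary word
y_hat = np.exp(z) / np.exp(z).sum()  # softmax
print(dict(zip(vocab, y_hat.round(3))))   # training would push mass toward "sat"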

16 Some interesting results

17 Word analogies
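The classic demonstration is that vector arithmetic captures analogies, e.g. king - man + woman ≈ queen. A hedged sketch using gensim's pretrained-vector downloader; the specific model name is an assumption, and the exact neighbors depend on the training corpus.

import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")   # pretrained word vectors (sizeable download)
# vector("king") - vector("man") + vector("woman") is expected to land near "queen"
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))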

18 Represent the meaning of a sentence/text
Paragraph vector (Le and Mikolov, 2014): extends word2vec to the text level.
There are also two models; a paragraph vector is added as an extra input.
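A minimal sketch of the paragraph-vector idea using gensim's Doc2Vec implementation (an assumption: the slides do not name a toolkit); gensim >= 4.0, the toy documents, and the parameters are illustrative.

from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [TaggedDocument(["the", "cat", "sat", "on", "the", "floor"], ["d0"]),
        TaggedDocument(["the", "dog", "sat", "on", "the", "mat"], ["d1"])]

model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, epochs=40)
print(model.dv["d0"][:5])                                      # learned vector for document d0
print(model.infer_vector(["a", "cat", "on", "a", "mat"])[:5])  # vector inferred for unseen text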

19 Applications
Search, e.g., query expansion
Sentiment analysis
Classification
Clustering
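As an illustration of the first application, a hedged sketch of query expansion with word vectors: each query term is expanded with its nearest neighbors in the vector space. The pretrained model name and the topn value are assumptions; any trained KeyedVectors object would work.

import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-100")

def expand_query(terms, topn=3):
    # add each term's nearest-neighbor words to the query
    expanded = list(terms)
    for t in terms:
        if t in wv:
            expanded += [w for w, _ in wv.most_similar(t, topn=topn)]
    return expanded

print(expand_query(["cheap", "hotel"]))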

20 Resources
Stanford CS224d: Deep Learning for NLP
The best explanation: "word2vec Parameter Learning Explained", Xin Rong
Word2Vec Tutorial - The Skip-Gram Model (tutorial-the-skip-gram-model/)

21 Represent the meaning of a sentence/text
Recursive neural network: needs a parse tree.
Convolutional neural network: can go beyond representation with an end-to-end model: text → class.

