1
Vector Representation of Text
Vagelis Hristidis. Prepared with the help of Nhat Le. Many slides are from Richard Socher, Stanford CS224d: Deep Learning for NLP.
2
To compare pieces of text
We need effective representations of:
- Words
- Sentences
- Text

Approach 1: Use existing thesauri or ontologies like WordNet and SNOMED CT (for medical). Drawbacks:
- Manual
- Not context specific

Approach 2: Use co-occurrences for word similarity. Drawbacks:
- Quadratic space needed
- Relative position and order of words not considered
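Not from the slides: a minimal sketch of Approach 2, assuming a toy two-sentence corpus and a symmetric window of 2, showing how a window-based co-occurrence matrix could be built and why its space cost is quadratic in the vocabulary.

```python
from collections import defaultdict
import numpy as np

corpus = [["the", "cat", "sat", "on", "floor"],
          ["the", "dog", "sat", "on", "floor"]]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count how often each pair of words appears within `window` positions of each other.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[index[w], index[sent[j]]] += 1

print(vocab)
print(counts)  # V x V matrix: quadratic space in the vocabulary size
```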
3
Approach 3: low dimensional vectors
Store only "important" information in a fixed, low-dimensional vector. Singular Value Decomposition (SVD) on the co-occurrence matrix: X̂ is the best rank-k approximation to X, in terms of least squares. Motel = [0.286, 0.792, …, 0.109, …, 0.349, 0.271]. m = n = size of vocabulary.
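A minimal sketch of this idea (not from the slides), assuming a co-occurrence matrix `counts` like the one above: NumPy's SVD gives the best rank-k approximation in the least-squares sense, and the first k left singular vectors (scaled by the singular values) can serve as low-dimensional word vectors.

```python
import numpy as np

def svd_word_vectors(counts, k=2):
    """Return k-dimensional word vectors from a V x V co-occurrence matrix."""
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    # Keep only the top-k singular directions: best rank-k approximation of counts.
    return U[:, :k] * S[:k]

vectors = svd_word_vectors(np.random.rand(7, 7), k=2)  # toy stand-in for counts
print(vectors.shape)  # (V, k)
```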
4
Approach 3: low dimensional vectors
An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence, Rohde et al. 2005
5
Problems with SVD
- Computational cost scales quadratically for an n × m matrix: O(mn²) flops (when n < m)
- Hard to incorporate new words or documents
- Does not consider order of words
6
word2vec: an approach to represent the meaning of words
- Represent each word with a low-dimensional vector
- Word similarity = vector similarity
- Key idea: predict the surrounding words of every word
- Faster than SVD, and can easily incorporate a new sentence/document or add a word to the vocabulary
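As a concrete illustration (not part of the slides), the gensim library provides a word2vec implementation; the toy sentences below are made up, and a real corpus would be far larger.

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "floor"],
             ["the", "dog", "sat", "on", "the", "mat"],
             ["cats", "and", "dogs", "are", "pets"]]

# vector_size = N (dimension of the word vectors), window = context size
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

print(model.wv["cat"][:5])                # low-dimensional vector for "cat"
print(model.wv.similarity("cat", "dog"))  # word similarity = vector (cosine) similarity
```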
7
Represent the meaning of word – word2vec
Two basic neural network models:
- Continuous Bag of Words (CBOW): use a window of context words to predict the middle word
- Skip-gram (SG): use a word to predict the surrounding words in its window
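In gensim the choice between the two models is a single flag; a brief sketch (all parameter values are illustrative):

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sat", "on", "the", "floor"]]

cbow = Word2Vec(sentences, sg=0, vector_size=50, window=2, min_count=1)  # CBOW: context -> middle word
skip = Word2Vec(sentences, sg=1, vector_size=50, window=2, min_count=1)  # Skip-gram: word -> surrounding words
```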
8
Word2vec – Continuous Bag of Words
E.g., "The cat sat on floor", with window size = 2.
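A small sketch (illustrative, not from the slides) of how the CBOW training pairs for this sentence could be generated: the context words inside the window predict the middle word.

```python
sentence = ["the", "cat", "sat", "on", "floor"]
window = 2

# For CBOW: (context words, middle word) pairs
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window), min(len(sentence), i + window + 1))
               if j != i]
    print(context, "->", target)
# e.g. ['the', 'cat', 'on', 'floor'] -> sat
```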
9
[Figure: CBOW network for predicting "sat" from the context words "cat" and "on". The input layer holds the one-hot vector of each context word (a 1 at the word's index in the vocabulary), a hidden layer follows, and the output layer should produce the one-hot vector of "sat".]
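A one-hot vector is simply a V-dimensional vector with a 1 at the word's index; a minimal sketch with the toy vocabulary of this example:

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "floor"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word, V=len(vocab)):
    x = np.zeros(V)
    x[index[word]] = 1.0
    return x

x_cat = one_hot("cat")   # [0, 1, 0, 0, 0]
x_on  = one_hot("on")    # [0, 0, 0, 1, 0]
```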
10
We must learn W and W′.
[Figure: the same CBOW network with its weights. The V-dim one-hot inputs are mapped to the N-dim hidden layer by the input weight matrix W (V×N), and the hidden layer is mapped to the V-dim output ("sat") by the output weight matrix W′ (N×V). N will be the size of the word vectors.]
11
W_{V×N}^T × x_cat = v_cat
W_{V×N}^T × x_on = v_on
v̂ = (v_cat + v_on) / 2
[Figure: multiplying W^T by the one-hot vector x_cat simply selects the column of W^T (i.e., the row of W) for "cat", e.g. v_cat = [2.4, 2.6, …, 1.8]; the hidden layer is the average v̂ of the context word vectors.]
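In NumPy this step amounts to picking rows of W and averaging them; a sketch with a random toy W (V = 5 words, N = 3 dimensions, both made up for illustration):

```python
import numpy as np

V, N = 5, 3
rng = np.random.default_rng(0)
W = rng.random((V, N))                # input weight matrix, one N-dim row per word

x_cat = np.zeros(V); x_cat[1] = 1.0   # one-hot for "cat" (index 1)
x_on  = np.zeros(V); x_on[3]  = 1.0   # one-hot for "on"  (index 3)

v_cat = W.T @ x_cat                   # equals row 1 of W
v_on  = W.T @ x_on                    # equals row 3 of W
v_hat = (v_cat + v_on) / 2            # hidden layer: average of the context vectors
```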
12
W_{V×N}^T × x_on = v_on
[Figure: the same multiplication for the one-hot vector x_on selects the row of W for "on", e.g. v_on = [1.8, 2.9, …, 1.9]; again v̂ = (v_cat + v_on) / 2.]
13
W′_{N×V}^T × v̂ = z
ŷ = softmax(z)
[Figure: the N-dim hidden vector v̂ is mapped through the output weight matrix W′ to the V-dim score vector z, and softmax turns z into the prediction ŷ for the middle word "sat". N will be the size of the word vectors.]
14
W′_{N×V}^T × v̂ = z
ŷ = softmax(z)
We would prefer ŷ to be close to y_sat, the one-hot vector of the true middle word "sat".
[Figure: the predicted distribution ŷ, e.g. [0.01, 0.02, 0.00, 0.7, …], versus the one-hot target.]
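Continuing the toy sketch above: the hidden vector is mapped to V scores by W′, softmax turns the scores into a distribution, and training pushes ŷ toward the one-hot vector of the true middle word (here "sat"). W′ and the hidden vector are random stand-ins.

```python
import numpy as np

V, N = 5, 3
rng = np.random.default_rng(0)
W_prime = rng.random((N, V))          # output weight matrix
v_hat = rng.random(N)                 # hidden vector from the previous step

z = W_prime.T @ v_hat                 # V-dim scores
y_hat = np.exp(z) / np.exp(z).sum()   # softmax(z)

y_sat = np.zeros(V); y_sat[2] = 1.0   # one-hot for the true middle word "sat"
loss = -np.log(y_hat[2])              # cross-entropy: we want y_hat close to y_sat
```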
15
W contains the input word vectors and W′ contains the output word vectors. We can consider either W or W′ as the word's representation, or even take the average of the two.
[Figure: the rows of W (the matrix of values 0.1, 2.4, 1.6, … shown on the slide) are the word vectors.]
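Assuming trained toy matrices W (V×N) and W′ (N×V), extracting a word's vector is just a row or column lookup; a brief sketch:

```python
import numpy as np

V, N = 5, 3
rng = np.random.default_rng(0)
W = rng.random((V, N))        # input word vectors (one per row)
W_prime = rng.random((N, V))  # output word vectors (one per column)

cat = 1                        # index of "cat" in the toy vocabulary
v_input  = W[cat, :]           # representation from W
v_output = W_prime[:, cat]     # representation from W'
v_avg    = (v_input + v_output) / 2   # or take the average of the two
```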
16
Some interesting results
17
Word analogies
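The slide's figures show analogy results; with trained vectors, the classic analogy query can be run as in the sketch below. `word2vec-google-news-300` is gensim's downloader key for the pretrained Google News vectors (a large download); any trained `model.wv` works the same way.

```python
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # pretrained word vectors

# king - man + woman ~= queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```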
18
Represent the meaning of sentence/text
Paragraph vector (Le and Mikolov, 2014)
- Extends word2vec to the text level
- Also two models: a paragraph vector is added as an input
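Gensim implements the paragraph-vector model as Doc2Vec; a minimal sketch with toy documents (all parameters illustrative, not from the slides):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

texts = [["the", "cat", "sat", "on", "the", "floor"],
         ["dogs", "and", "cats", "are", "pets"],
         ["stock", "markets", "fell", "sharply", "today"]]
docs = [TaggedDocument(words, tags=[i]) for i, words in enumerate(texts)]

model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, epochs=100)

print(model.dv[0][:5])                   # paragraph vector for document 0
print(model.dv.most_similar(0, topn=2))  # most similar documents
```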
19
Applications
- Search, e.g., query expansion
- Sentiment analysis
- Classification
- Clustering
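For example, query expansion can be as simple as adding each query term's nearest neighbours in vector space. The `expand_query` helper below is hypothetical and assumes a trained gensim model like the ones sketched earlier.

```python
def expand_query(terms, wv, k=3):
    """Add the k nearest words (by vector similarity) to each query term."""
    expanded = list(terms)
    for t in terms:
        if t in wv:
            expanded += [w for w, _ in wv.most_similar(t, topn=k)]
    return expanded

# expand_query(["cat", "floor"], model.wv)
```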
20
Resources
- Stanford CS224d: Deep Learning for NLP
- The best: "word2vec Parameter Learning Explained", Xin Rong
- Word2Vec Tutorial - The Skip-Gram Model (tutorial-the-skip-gram-model/)
21
Represent the meaning of sentence/text
- Recursive neural network: needs a parse tree
- Convolutional neural network: can go beyond representation with an end-to-end model: text → class (see the sketch below)
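A rough illustration of the end-to-end idea (not from the slides): a tiny Keras convolutional text classifier with an embedding layer, a 1-D convolution over word positions, max pooling, and a class output. Vocabulary size, sequence length, and class count are made up.

```python
import tensorflow as tf

V, seq_len, num_classes = 10000, 100, 2

model = tf.keras.Sequential([
    tf.keras.Input(shape=(seq_len,), dtype="int32"),        # word indices
    tf.keras.layers.Embedding(V, 128),                      # word vectors, learned end to end
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),  # convolution over word windows
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),     # text -> class
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```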