Slide 1
Socialized Word Embeddings
Ziqian Zeng¹, Yichun Yin¹,², Yangqiu Song¹ and Ming Zhang²
¹The Hong Kong University of Science and Technology, ²Peking University
Slide 2
Motivation: Facts
Everyone has his/her own personal characteristics of language use.
Slide 3
Motivation: Facts
Linguistic homophily: socially connected individuals tend to use language in similar ways. [1][2]

Frequently Used Words     | Language Feature
tdd, mvc, linq            | acronyms
anipals, pawsome, furever | animal-based puns
kradam, glambert, glamily | puns around pop star Adam Lambert

[1] Yi Yang et al. Overcoming Language Variation in Sentiment Analysis with Social Attention. 2016.
[2] Table adapted, with partial deletion, from John Bryden et al. Word Usage Mirrors Community Structure in the Online Social Network Twitter. 2013.
Slide 4
Motivation: Facts
1. Everyone has his/her own personal characteristics of language use.
2. Linguistic homophily: socially connected individuals tend to use language in similar ways. [1]

Goal: Develop a word embedding algorithm that takes both facts into account.

[1] Yi Yang et al. Overcoming Language Variation in Sentiment Analysis with Social Attention. 2016.
Slide 5
CBOW
In the paper, we take CBOW (continuous bag-of-words) as an example to introduce our algorithm. CBOW predicts each word from the average of its context-word vectors by maximizing the log-likelihood

$$\mathcal{L} = \sum_{t} \log p\left(w_t \mid \mathrm{context}(w_t)\right) \quad (1)$$

Figure 1: Illustration of CBOW
Tomas Mikolov et al. Efficient Estimation of Word Representations in Vector Space. 2013.
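A minimal sketch of the CBOW prediction step in Eq. (1), using a full softmax for clarity (a real implementation would use a faster approximation such as negative sampling or hierarchical softmax); all names (`W_in`, `W_out`, etc.) are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10_000, 100
W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # context word vectors
W_out = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # output word vectors

def cbow_log_prob(context_ids, target_id):
    """log p(w_t | context(w_t)): average the context vectors, score all words."""
    h = W_in[context_ids].mean(axis=0)   # context average, shape (embed_dim,)
    scores = W_out @ h                   # one score per vocabulary word
    scores -= scores.max()               # for numerical stability
    return float(scores[target_id] - np.log(np.exp(scores).sum()))
```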
Slide 6
Socialized Word Embeddings
Figure 2: Illustration of Socialized Word Embeddings
Slide 7
Socialized Word Embeddings
Fact 1 (personalization): Everyone has his/her own personal characteristics of language use.
Figure 2: Illustration of Socialized Word Embeddings
Slide 8
Socialized Word Embeddings
Fact 2 (socialization): linguistic homophily, i.e., socially connected individuals tend to use language in similar ways.
Figure 2: Illustration of Socialized Word Embeddings
Slide 9
Personalization
Notations:
- Users: $u_1, \dots, u_{|U|}$.
- A word $w_t$'s context: $\mathrm{context}(w_t) = w_{t-c}, \dots, w_{t-1}, w_{t+1}, \dots, w_{t+c}$, where $c$ is the half window size.
- A corpus $D_k$ for user $u_k$.
- Global word vector: $\mathbf{w}$; local user vector: $\mathbf{u}_k$.
- Vector representation of a word $w$ for user $u_k$: $\mathbf{w} + \mathbf{u}_k$.

Personalized CBOW: apply the CBOW objective (1) to each user's corpus, with every word vector replaced by its personalized version:

$$\mathcal{L}_{\mathrm{CBOW}} = \sum_{k=1}^{|U|} \sum_{w_t \in D_k} \log p\left(w_t \mid \mathrm{context}(w_t), u_k\right) \quad (2)$$
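A minimal sketch of the personalized representation in Eq. (2), assuming the same full-softmax CBOW scorer as above; the only change is that every context word vector is shifted by the author's user vector ($\mathbf{w} + \mathbf{u}_k$). Variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_users = 10_000, 100, 500
W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
W_out = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
U = rng.normal(scale=0.01, size=(num_users, embed_dim))  # local user vectors

def personalized_cbow_log_prob(context_ids, target_id, user_id):
    """Same CBOW step, but each context word is represented as w + u_k."""
    h = (W_in[context_ids] + U[user_id]).mean(axis=0)
    scores = W_out @ h
    scores -= scores.max()
    return float(scores[target_id] - np.log(np.exp(scores).sum()))
```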
Slide 10
Socialization
Socialized regularization: pull each user's vector toward the vectors of his/her friends:

$$R(U) = \sum_{k=1}^{|U|} \frac{1}{|F(k)|} \sum_{u_j \in F(k)} \left\|\mathbf{u}_k - \mathbf{u}_j\right\|_2^2 \quad (3)$$

where $F(k)$ is the set of friends of user $u_k$ and $|F(k)|$ is the number of friends of $u_k$.
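A minimal sketch of the regularizer in Eq. (3), assuming the social network is given as an adjacency list (`friends`, a hypothetical name) mapping each user index to the indices of that user's friends:

```python
import numpy as np

def social_regularizer(U, friends):
    """sum_k (1/|F(k)|) * sum_{j in F(k)} ||u_k - u_j||^2 over all users."""
    total = 0.0
    for k, f in friends.items():
        if not f:                         # users without friends contribute 0
            continue
        diffs = U[list(f)] - U[k]         # u_j - u_k for every friend j
        total += float((diffs ** 2).sum()) / len(f)
    return total
```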
Slide 11
Socialization
Socialized Word Embeddings combine the personalized CBOW objective (2) with the socialized regularization (3) via a trade-off parameter $\lambda$:

$$\min\; -\mathcal{L}_{\mathrm{CBOW}} + \lambda\, R(U) \quad (4)$$

Constraint: each user vector is bounded, $\|\mathbf{u}_k\|_2 \le r$.
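One standard way to enforce such a norm constraint is to project user vectors back onto the $\|\mathbf{u}\|_2 \le r$ ball after each gradient step; a minimal sketch (the projection is standard, the surrounding training loop and gradient functions are assumed):

```python
import numpy as np

def project_user_vectors(U, r):
    """Rescale any user vector whose L2 norm exceeds r back onto the ball."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U * np.minimum(1.0, r / np.maximum(norms, 1e-12))

# One (schematic) training step for Eq. (4):
#   U -= lr * (grad_cbow(U) + lam * grad_regularizer(U))  # hypothetical grads
#   U = project_user_vectors(U, r)
```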
Slide 12
Experiments: Dataset – Yelp Challenge
The Yelp Challenge dataset contains millions of reviews and ratings for various businesses.
Slide 13
Experiments: Perplexity
Perplexity evaluates how well a model predicts the current word given several previous words; lower is better.
We report: the perplexity trend with varied $r$ (the $\ell_2$-norm constraint on user vectors) and fixed $\lambda$ (the socialized regularization parameter).
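For reference, a minimal sketch of how perplexity is computed from per-word log-probabilities on held-out text (this is the standard definition, not code from the paper):

```python
import numpy as np

def perplexity(log_probs):
    """log_probs: natural-log probabilities of each held-out word."""
    return float(np.exp(-np.mean(log_probs)))

# Example: three words with probabilities 0.1, 0.2, 0.05
# perplexity(np.log([0.1, 0.2, 0.05])) == 10.0 (up to rounding)
```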
Slide 14
Experiments: Perplexity
Perplexity trend with varied $\lambda$ and fixed $r$; lower is better.
Slide 15
Experiments: SVM Sentiment Classification
Each review is represented as its average word vector plus the author's user vector; an SVM classifier then predicts the rating.
Figure 5: Illustration of the procedure of sentiment classification (average review vector + user vector → classifier → rating).
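A minimal sketch of this pipeline, assuming scikit-learn's LinearSVC as the classifier; the feature construction (average word vector plus user vector) follows the figure, while the data-loading names (`reviews`, `ratings`) are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVC

def review_feature(word_ids, user_id, W, U):
    """Average of the review's global word vectors, plus the author's user vector."""
    return W[word_ids].mean(axis=0) + U[user_id]

# reviews: list of (word_ids, user_id) pairs; ratings: 1-5 stars (hypothetical)
# X = np.stack([review_feature(ids, uid, W, U) for ids, uid in reviews])
# clf = LinearSVC().fit(X, ratings)
# predicted = clf.predict(X)
```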
Slide 16
Experiments: Sentiment Classification – Head & Tail
Train on only head users or only tail users. The most active users who together contributed half of the total reviews are selected as head users; the remaining users are the tail users (see the sketch below).
(Table: statistics of the head/tail split.)
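A minimal sketch of the head/tail split described above (function and variable names are illustrative):

```python
def split_head_tail(review_counts):
    """review_counts: dict user -> number of reviews; returns (head, tail) sets."""
    users = sorted(review_counts, key=review_counts.get, reverse=True)
    half, covered, head = sum(review_counts.values()) / 2.0, 0, set()
    for u in users:                    # most active users first
        if covered >= half:            # head users now cover half the reviews
            break
        head.add(u)
        covered += review_counts[u]
    return head, set(users) - head
```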
Slide 17
Experiments
(Figure: sentiment classification results.)
Slide 18
Experiments
(Figure: sentiment classification results, continued.)
Slide 19
Experiments: User Vector as Attention
User attention vectors can improve sentiment classification. We use our pretrained user vectors as fixed attention vectors, and compare against the baseline without attention and the upper bound with trainable attention.
Figure 8: The architecture of the User Product Attention based Neural Sentiment Classification model. [1]
[1] Figure 8 is from Huimin Chen et al. Neural Sentiment Classification with User and Product Attention. 2016.
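A minimal sketch of fixed-user-vector attention, assuming the user vector scores each hidden state of a sentence encoder directly (a simplification of Chen et al.'s attention mechanism); `H` and `u` are illustrative names:

```python
import numpy as np

def user_attention(H, u):
    """H: (seq_len, d) encoder hidden states; u: (d,) frozen user vector as query."""
    scores = H @ u                           # relevance of each position to the user
    weights = np.exp(scores - scores.max())  # softmax over positions
    weights /= weights.sum()
    return weights @ H                       # user-weighted sentence representation
```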
Slide 20
Experiments
Figure 9: Comparison of our model and other baseline methods on user-attention-based deep learning for sentiment analysis, in both supervised and unsupervised settings.
Slide 21
Conclusion
Representation: global word vector + local user vector.
Socialized regularization: friends' user vectors should be similar.
Thank You
Figure 2: Illustration of Socialized Word Embeddings