
1 CoNMF: Exploiting User Comments for Clustering Web2.0 Items
Presenter: He Xiangnan
28 June 2013
Email: xiangnan@comp.nus.edu.sg
School of Computing, National University of Singapore

2 Introduction
Motivations:
– Users comment on items based on their own interests.
– Most users' interests are limited.
– The categories of items can be inferred from the comments.
Proposed problem:
– Clustering items by exploiting user comments.
Applications:
– Improve search diversity.
– Automatic tag generation from comments.
– Group-based recommendation.
WING, NUS

3 Challenges
Traditional solution:
– Represent items in a feature space.
– Apply any clustering algorithm, e.g. k-means.
Key challenges:
– Items have heterogeneous features:
  1. Own features (e.g. words for articles, pixels for images)
  2. Comments
     – Usernames
     – Textual contents
– Simply concatenating all features does not perform well.
– How can the heterogeneous views be combined meaningfully to produce better clustering (i.e. multi-view clustering)?

4 Proposed solution
Extend NMF (Non-negative Matrix Factorization) to support multi-view clustering.

5 NMF (Non-negative Matrix Factorization)
Factorize the data matrix V (#doc × #words) as:
– V ≈ WH, where W is #doc × k and H is k × #words, and each entry is non-negative.
Goal is minimizing the objective function:
– ||V - WH||², where || || denotes the Frobenius norm.
Alternating optimization:
– With Lagrange multipliers, differentiate on W and H respectively.
– This reaches a local optimum, not a global one!
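The alternating optimization on this slide can be illustrated with the widely used multiplicative update rules for the squared Frobenius objective. This is a minimal sketch, not the presenter's code: the function name `nmf`, the toy matrix, the iteration count, and the small `eps` guard against division by zero are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Sketch of NMF via multiplicative updates (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Random non-negative initialization.
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    history = []
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed (alternating optimization).
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Track the squared Frobenius reconstruction error.
        history.append(np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, history

# Toy #doc x #words matrix with two visible groups of documents.
V = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 3, 4],
              [0, 0, 4, 3]], dtype=float)
W, H, history = nmf(V, k=2)
labels = W.argmax(axis=1)  # cluster of each document = largest entry in its row of W
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. Different seeds can end in different local optima, which is the caveat the slide raises.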

6 Characteristics of NMF
Matrix factorization with a non-negative constraint.
Reduces the dimension of the data; derives the latent space.
Difference from SVD (LSI):
Characteristic | SVD | NMF
Orthogonal basis | Yes | No
Negative entry | Yes | No
Post clustering | Yes | No
Theoretically proven suitable for clustering (Ding et al. 2005).
Empirically shown to outperform SVD and k-means in document clustering (Xu et al. 2003).
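The three rows of the comparison can be checked on a small example. The sketch below assumes a random strictly positive matrix and a hand-written non-negative factor `W_toy`; both are illustrative and not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 5)) + 0.1  # strictly positive toy data

# SVD: the basis is orthogonal, and the embedding contains negative entries,
# so a separate clustering step (e.g. k-means) is needed afterwards.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
svd_embedding = U[:, :2] * s[:2]
has_negative = bool((svd_embedding < 0).any())
orthogonal = bool(np.allclose(U.T @ U, np.eye(U.shape[1])))

# NMF: a non-negative factor can be read as cluster memberships directly.
W_toy = np.array([[0.9, 0.1],
                  [0.8, 0.0],
                  [0.1, 0.7]])
nmf_labels = W_toy.argmax(axis=1)  # no post-clustering step
```

Reading the label off the largest entry of each row is one common convention for NMF-based clustering; the slide does not state which rule was used.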

7 Extensions of NMF
Relationships with other clustering algorithms:
– K-means: Orthogonal NMF = K-means
– PLSI: KL-Divergence NMF = PLSI
– Spectral clustering
Extensions:
– Tri-factor NMF (V = W S H) (Ding et al. 2006)
– NMF with sparsity constraints (Hoyer 2004)
– NMF with graph regularization (Cai et al. 2011)
– However, studies on NMF-based multi-view clustering approaches are quite limited (Liu et al. 2013).
My proposal:
– Extend NMF to support multi-view clustering

8 Proposed solution - CoNMF
Idea:
– Couple the factorization process of NMF
Example:
– Single NMF:
  Factorization equation: V ≈ WH
  Objective function: ||V - WH||²
  Constraints: all entries of W and H are non-negative.
– 2-view CoNMF:
  Factorization equation: [equation not preserved in the transcript]
  Objective function: [equation not preserved in the transcript]
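Since the 2-view equations did not survive in the transcript, the following is only one plausible way to write a coupled objective. The view superscripts, the weights \lambda_1, \lambda_2, \lambda_{12}, and the generic coupling term R are assumptions introduced here, not the presenter's notation.

```latex
% Hypothetical 2-view formulation (notation assumed for illustration)
V^{(1)} \approx W^{(1)} H^{(1)}, \qquad V^{(2)} \approx W^{(2)} H^{(2)}

J = \lambda_1 \,\bigl\| V^{(1)} - W^{(1)} H^{(1)} \bigr\|_F^2
  + \lambda_2 \,\bigl\| V^{(2)} - W^{(2)} H^{(2)} \bigr\|_F^2
  + \lambda_{12}\, R\bigl(W^{(1)}, W^{(2)}\bigr),
\qquad W^{(s)} \ge 0,\; H^{(s)} \ge 0
```

Setting \lambda_{12} = 0 decouples the two views into independent single-view NMF problems, which is why the coupling term is what makes the factorization multi-view.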

9 CoNMF Framework
Coupling the factorization process of multiple matrices (i.e. views) via regularization.
Objective function: [equation not preserved in the transcript]
– Similar alternating optimization with Lagrange multipliers can solve it.
Different options of regularization:
– Centroid-based (Liu et al. 2013): [equation not preserved in the transcript]
– Mutual-based:
  Point-wise: [equation not preserved in the transcript]
  Cluster-wise: [equation not preserved in the transcript]
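To make the idea of coupling via regularization concrete, here is a hedged two-view sketch. The regularizer formulas are assumptions, because the slide's equations are missing: the point-wise term is taken to be the squared Frobenius distance between the two W matrices, and the cluster-wise term the squared distance between W Wᵀ of each view. The update rule is a standard multiplicative update derived for that assumed point-wise objective, and all function names are hypothetical.

```python
import numpy as np

def pointwise_reg(W1, W2):
    # Assumed point-wise coupling: items should have similar coefficients per view.
    return np.linalg.norm(W1 - W2, "fro") ** 2

def clusterwise_reg(W1, W2):
    # Assumed cluster-wise coupling: item-item co-cluster structure should agree.
    return np.linalg.norm(W1 @ W1.T - W2 @ W2.T, "fro") ** 2

def objective(Vs, Ws, Hs, lam):
    recon = sum(np.linalg.norm(V - W @ H, "fro") ** 2
                for V, W, H in zip(Vs, Ws, Hs))
    return recon + lam * pointwise_reg(Ws[0], Ws[1])

def conmf_pointwise(V1, V2, k, lam=1.0, n_iter=200, eps=1e-9, seed=0):
    """Two-view coupled NMF sketch with the assumed point-wise regularizer."""
    rng = np.random.default_rng(seed)
    Vs = [V1, V2]
    Ws = [rng.random((V1.shape[0], k)) + eps for _ in range(2)]
    Hs = [rng.random((k, V.shape[1])) + eps for V in Vs]
    history = [objective(Vs, Ws, Hs, lam)]
    for _ in range(n_iter):
        for s in range(2):
            t = 1 - s  # index of the other view
            V, W, H = Vs[s], Ws[s], Hs[s]
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            # The other view's W pulls this view's W toward it.
            W *= (V @ H.T + lam * Ws[t]) / (W @ H @ H.T + lam * W + eps)
        history.append(objective(Vs, Ws, Hs, lam))
    return Ws, Hs, history

# Two toy views of the same four items (e.g. words and users).
V1 = np.array([[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]], dtype=float)
V2 = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]], dtype=float)
Ws, Hs, history = conmf_pointwise(V1, V2, k=2)
```

With `lam=0` the two factorizations run independently; larger values force the per-view item representations closer together. The centroid-based option attributed to Liu et al. (2013) is not sketched here.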

10 Experiments
Last.fm dataset:
#Items | #Users | #Comments | #Clusters
9,694 | 131,898 | 2,500,271 | 21
3 views:
View | #Items | #Features | Token type
Items-Desc. words | 9,694 | 14,076 | TF-IDF
Items-Comm. words | 9,694 | 31,172 | TF-IDF
Items-Users | 9,694 | 131,898 | Boolean
Ground truth:
– Music type of each artist, provided by Last.fm
Evaluation metrics:
– Accuracy and F1
– Average performance over 20 runs.
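As an illustration of how the comment-derived views in the table could be built, the sketch below constructs a Boolean Items-Users matrix and a TF-IDF Items-Comm. words matrix from a handful of made-up comments. The sample data, whitespace tokenization, and the TF-IDF variant (raw term count times log(N/df)) are assumptions; the slides do not say which weighting formula was used.

```python
import math
from collections import Counter

# Hypothetical (item, user, comment text) triples.
comments = [
    ("artist_a", "u1", "great rock riffs"),
    ("artist_a", "u2", "rock classic"),
    ("artist_b", "u1", "smooth jazz"),
    ("artist_b", "u3", "jazz piano jazz"),
]

items = sorted({item for item, _, _ in comments})
users = sorted({user for _, user, _ in comments})

# Items-Users view: 1 if the user commented on the item, else 0.
commented = {(item, user) for item, user, _ in comments}
item_user = [[1 if (i, u) in commented else 0 for u in users] for i in items]

# Items-Comm. words view: pool each item's comments into one document.
docs = {i: Counter() for i in items}
for item, _, text in comments:
    docs[item].update(text.split())
vocab = sorted({w for counts in docs.values() for w in counts})
df = {w: sum(1 for counts in docs.values() if w in counts) for w in vocab}
n_items = len(items)
item_word = [[docs[i][w] * math.log(n_items / df[w]) for w in vocab] for i in items]
```

Both matrices are non-negative and share the same row order (one row per item), which is what lets them be factorized jointly as views of the same items.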

11 Statistics of datasets
[Figure: two panels, "Statistics of #items/user" and "Statistics of #clusters/user"]
P(T<=3) = 0.6229
P(T<=5) = 0.8474
P(T<=10) = 0.9854
This verifies our assumption: each user usually comments on a limited number of music types.
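A cumulative share such as P(T<=3) can be computed from per-user counts. The sketch assumes T denotes the number of distinct music types (clusters) a user has commented on, which the slide implies but does not define; the user data is invented.

```python
# Hypothetical mapping: user -> set of music types they commented on.
types_by_user = {
    "u1": {"rock"},
    "u2": {"rock", "metal"},
    "u3": {"jazz", "blues", "soul"},
    "u4": {"pop", "rock", "folk", "indie", "punk"},
    "u5": {f"type_{n}" for n in range(12)},
}

def cumulative_share(types_by_user, t):
    """Fraction of users whose number of distinct types T is at most t."""
    counts = [len(types) for types in types_by_user.values()]
    return sum(1 for c in counts if c <= t) / len(counts)

p3 = cumulative_share(types_by_user, 3)
p5 = cumulative_share(types_by_user, 5)
p10 = cumulative_share(types_by_user, 10)
```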

12 Experimental results (Accuracy)
Initialization | Method | Desc. | Comm. | Users | Comb.
Random | kmeans | 0.25 | 0.28 | 0.34 | 0.415
 | SVD | 0.29 | 0.31 | 0.28 | 0.294
Random | NMF | 0.24 | 0.27 | 0.32 | 0.313
K-means | NMF | 0.26 | 0.32 | 0.40 | 0.417
K-means | CoNMF – point | | | | 0.460
K-means | CoNMF – cluster | | | | 0.420
NMF | Multi-NMF (SDM'13) | | | | 0.369
Random | MM-LDA (WSDM'09) | | | | 0.366
Observations:
1. k-means: Users > Comm. > Desc., while combined is best.
2. SVD performs badly on users (non-textual).
3. NMF with random initialization: Users > Comm. > Desc., while combined does worse.
4. Initialization is important for NMF.
5. CoNMF-point performs best.
6. Multi-NMF and MM-LDA are two other state-of-the-art baselines.
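Cluster ids are arbitrary, so a clustering accuracy figure needs a mapping between predicted clusters and ground-truth classes. The slides do not state how accuracy was computed; the sketch below assumes the common definition that takes the best one-to-one relabeling, found here by brute force over permutations. The function name and toy labels are hypothetical.

```python
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one relabeling of predicted clusters."""
    classes = sorted(set(true_labels) | set(pred_labels))
    best_hits = 0
    for perm in permutations(classes):
        mapping = dict(zip(classes, perm))  # predicted id -> true id
        hits = sum(1 for t, p in zip(true_labels, pred_labels) if mapping[p] == t)
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)

true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [1, 1, 0, 0, 2, 0]
acc = clustering_accuracy(true_labels, pred_labels)
```

Brute force is only practical for a handful of clusters. With 21 clusters, as in the Last.fm dataset, an assignment solver such as the Hungarian algorithm would be needed instead.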

13 Experimental results (F1)
Initialization | Method | Desc. | Comm. | Users | Combined
Random | kmeans | 0.15 | 0.16 | 0.15 | 0.254
 | SVD | 0.25 | | 0.24 | 0.249
Random | NMF | 0.13 | 0.18 | 0.21 | 0.216
K-means | NMF | 0.15 | 0.21 | 0.27 | 0.298
K-means | CoNMF – point | | | | 0.320
K-means | CoNMF – cluster | | | | 0.284
NMF | Multi-NMF (SDM'13) | | | | 0.265
Random | MM-LDA (WSDM'09) | | | | 0.286

14 Conclusions
Comments benefit clustering.
Mining different views from the comments is important:
– The two views (commenting words and users) contribute differently to clustering.
– For this Last.fm dataset, the users view is more useful.
– Combining all views works best.
For NMF-based methods, initialization is important.

15 Ongoing
More experiments on other datasets.
Improve the CoNMF framework by adding sparseness constraints.
Study the influence of normalization on CoNMF.

16 Thanks! Q&A?

17 References (I)
Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SIAM Data Mining Conf. 2005.
Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003.
Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006.
Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research 2004.
Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011.
Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SIAM Data Mining Conference (SDM'13).

