CoNMF: Exploiting User Comments for Clustering Web2.0 Items
Presenter: He Xiangnan
28 June
School of Computing, National University of Singapore
Introduction
Motivations:
– Users comment on items based on their own interests.
– Most users' interests are limited.
– The categories of items can be inferred from the comments.
Proposed problem:
– Clustering items by exploiting user comments.
Applications:
– Improve search diversity.
– Automatic tag generation from comments.
– Group-based recommendation.
Challenges
Traditional solution:
– Represent items in a feature space.
– Apply any clustering algorithm, e.g., k-means.
Key challenges:
– Items have heterogeneous features:
  1. Own features (e.g., words for articles, pixels for images)
  2. Comments: usernames and textual contents
– Simply concatenating all features does not perform well.
– How to meaningfully combine the heterogeneous views to produce a better clustering (i.e., multi-view clustering)?
Proposed solution
Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering…
NMF (Non-negative Matrix Factorization)
Factorize the data matrix V (#docs × #words) as V ≈ WH,
– where W is #docs × k, H is k × #words, and each entry is non-negative.
Goal: minimize the objective function ||V − WH||_F^2,
– where ||·||_F denotes the Frobenius norm.
Alternating optimization:
– With Lagrange multipliers, differentiate with respect to W and H respectively.
– Reaches a local optimum, not a global one!
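For concreteness, a minimal NumPy sketch of the alternating optimization above, using the standard multiplicative update rules for the Frobenius-norm objective; the function name, iteration count, and initialization scheme are illustrative choices, not taken from the slides.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Minimal NMF via multiplicative updates.

    Minimizes ||V - W H||_F^2 with W, H >= 0.
    V: (n_docs, n_words) non-negative data matrix.
    Returns W (n_docs, k) and H (k, n_words).
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Update H, then W; each multiplicative step keeps entries non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each row of W can be read as a soft cluster membership for a document; since only a local optimum is reached, the result depends on the initialization.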
Characteristics of NMF
Matrix factorization with a non-negativity constraint.
Reduces the dimension of the data; derives a latent space.
Differences from SVD (LSI):

Characteristic            SVD   NMF
Orthogonal basis          Yes   No
Negative entries          Yes   No
Post-clustering needed    Yes   No

Theoretically shown to be suitable for clustering (Ding et al. 2005).
Practically shown to outperform SVD and k-means in document clustering (Xu et al. 2003).
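To illustrate the "post-clustering" row: with NMF the cluster label of a document can be read directly from its row in W, while an SVD embedding still needs a separate clustering step such as k-means. A small sketch (the function names are illustrative; W is the factor from the NMF sketch above):

```python
import numpy as np
from sklearn.cluster import KMeans

def nmf_labels(W):
    # NMF: the largest entry in each row of W gives the cluster label directly.
    return W.argmax(axis=1)

def svd_labels(V, k, seed=0):
    # SVD/LSI: project documents into the latent space, then cluster on top.
    U, S, Vt = np.linalg.svd(V, full_matrices=False)
    docs = U[:, :k] * S[:k]  # k-dimensional document embedding
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(docs)
```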
Extensions of NMF
Relationships with other clustering algorithms:
– K-means: orthogonal NMF = k-means
– PLSI: KL-divergence NMF = PLSI
– Spectral clustering
Extensions:
– Tri-factorization of NMF (V = W S H) (Ding et al. 2006)
– NMF with sparsity constraints (Hoyer 2004)
– NMF with graph regularization (Cai et al. 2011)
– However, studies on NMF-based multi-view clustering are quite limited (Liu et al. 2013).
My proposal:
– Extend NMF to support multi-view clustering.
Proposed solution - CoNMF
Idea:
– Couple the factorization processes of NMF across views.
Example:
– Single NMF:
  Factorization equation: V ≈ WH
  Objective function: min ||V − WH||_F^2
  Constraints: all entries of W and H are non-negative.
– 2-view CoNMF:
  Factorization equations: V_1 ≈ W_1 H_1, V_2 ≈ W_2 H_2
  Objective function: min ||V_1 − W_1 H_1||_F^2 + ||V_2 − W_2 H_2||_F^2 + a coupling (regularization) term on W_1 and W_2
  Constraints: all entries of W_1, W_2, H_1, H_2 are non-negative.
CoNMF framework
Coupling the factorization processes of multiple matrices (i.e., views) via regularization.
Objective function:
  min Σ_s λ_s ||V_s − W_s H_s||_F^2 + R(W_1, …, W_S)
– A similar alternating optimization with Lagrange multipliers can solve it.
Different options for the regularization R:
– Centroid-based (Liu et al. 2013): push every W_s towards a shared consensus factor.
– Mutual-based:
  Point-wise: couple the item factors of each pair of views directly, e.g., ||W_s − W_t||_F^2.
  Cluster-wise: couple the cluster structures implied by W_s and W_t.
A sketch of the point-wise variant follows.
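The sketch below assumes the point-wise coupling penalty ||W_s − W_t||_F^2 named on this slide; the multiplicative update is a generic derivation folded around that penalty, not necessarily the exact rule used in the paper, and all names and hyper-parameters are illustrative.

```python
import numpy as np

def conmf_pointwise(views, k, lam_view=None, lam_pair=1.0,
                    n_iter=200, eps=1e-9, seed=0):
    """Illustrative point-wise CoNMF sketch (not the paper's exact updates).

    Minimizes  sum_s lam_s ||V_s - W_s H_s||_F^2
             + lam_pair * sum_{s<t} ||W_s - W_t||_F^2,
    with all factors non-negative. `views` is a list of item x feature
    matrices that share the same rows (items).
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    S = len(views)
    lam_view = lam_view or [1.0] * S
    W = [rng.random((n, k)) + eps for _ in range(S)]
    H = [rng.random((k, V.shape[1])) + eps for V in views]

    for _ in range(n_iter):
        for s, V in enumerate(views):
            # Standard NMF update for H_s.
            H[s] *= (W[s].T @ V) / (W[s].T @ W[s] @ H[s] + eps)
            # W_s update with the pair-wise coupling folded in:
            # negative gradient parts go to the numerator,
            # positive parts to the denominator.
            others = sum(W[t] for t in range(S) if t != s)
            num = lam_view[s] * (V @ H[s].T) + lam_pair * others
            den = (lam_view[s] * (W[s] @ H[s] @ H[s].T)
                   + lam_pair * (S - 1) * W[s] + eps)
            W[s] *= num / den
    return W, H
```

Item labels can then be read from any W_s (or from their average) by taking the argmax per row, exactly as in the single-view case.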
Experiments
Last.fm dataset (3 views):

#Items   #Users    #Comments    #Clusters
9,694    131,898   2,500,271    21

View                #Items   #Features   Token type
Items-Desc. words   9,694    14,076      TF-IDF
Items-Comm. words   9,694    31,172      TF-IDF
Items-Users         9,694    131,898     Boolean

Ground truth:
– Music type of each artist, provided by Last.fm.
Evaluation metrics:
– Accuracy and F1.
– Average performance over 20 runs.
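Clustering "Accuracy" here means matching predicted clusters to ground-truth classes before counting correctly placed items; a common way to do this is the Hungarian algorithm. A minimal sketch (the function name is illustrative; F1 can be computed after the same matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of clusters to classes."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    # Contingency matrix: counts of items per (class, cluster) pair.
    counts = np.zeros((len(classes), len(clusters)), dtype=int)
    for i, c in enumerate(classes):
        for j, g in enumerate(clusters):
            counts[i, j] = np.sum((y_true == c) & (y_pred == g))
    # Hungarian algorithm maximizes the total matched count.
    row, col = linear_sum_assignment(-counts)
    return counts[row, col].sum() / len(y_true)
```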
Statistics of the dataset
[Figures: distributions of #items per user and #clusters per user, with cumulative probabilities P(T<=3), P(T<=5), P(T<=10) reported in the plots.]
These verify our assumption: each user usually comments on a limited number of music types.
Experimental results (Accuracy)
Methods compared (each on the Desc., Comm., Users, and combined views):
– k-means (random init.), SVD, NMF (random init.), NMF (k-means init.)
– CoNMF-point (k-means init.), CoNMF-cluster (k-means init.)
– Multi-NMF (SDM'13, NMF init.), MM-LDA (WSDM'09, random init.)
Observations:
1. Users > Comm. > Desc., while the combined view is best.
2. SVD performs badly on the user view (non-textual features).
3. Users > Comm. > Desc., while the combined view does worse.
4. Initialization is important for NMF.
5. CoNMF-point performs best.
6. The last two methods are state-of-the-art baselines.
Experimental results (F1)
Same methods and views as in the accuracy table, evaluated with F1.
Conclusions
Comments benefit clustering.
Mining different views from the comments is important:
– The two views (commenting words and users) contribute differently to clustering.
– For this Last.fm dataset, the user view is more useful.
– Combining all views works best.
For NMF-based methods, initialization is important.
Ongoing work
More experiments on other datasets.
Improve the CoNMF framework by adding sparseness constraints.
Study the influence of normalization on CoNMF.
Thanks! Q&A
References (I)
Chris Ding, Xiaofeng He, and Horst D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SIAM Data Mining Conference (SDM), 2005.
Wei Xu, Xin Liu, and Yihong Gong. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR, 2003.
Chris Ding, Tao Li, and Wei Peng. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD, 2006.
Patrik O. Hoyer. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 2004.
Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell., 2011.
Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SIAM Data Mining Conference (SDM), 2013.