Department of Automation Xiamen University




Similar presentations
Recommender System A Brief Survey.

Multi-label Classification using Adaptive Neighborhoods Tanwistha Saha, Huzefa Rangwala and Carlotta Domeniconi Department of Computer Science George.
Information retrieval – LSI, pLSI and LDA
Collaborative QoS Prediction in Cloud Computing Department of Computer Science & Engineering The Chinese University of Hong Kong Hong Kong, China Rocky.
A Graph-based Recommender System Zan Huang, Wingyan Chung, Thian-Huat Ong, Hsinchun Chen Artificial Intelligence Lab The University of Arizona 07/15/2002.
Personalized Query Classification Bin Cao, Qiang Yang, Derek Hao Hu, et al. Computer Science and Engineering Hong Kong UST.
Title: The Author-Topic Model for Authors and Documents
Supervisor: Associate Prof. Jiuyong Li(John) Student: Kang Sun Date: 28 th May 2010.
Ilias Theodorakopoulos PhD Candidate
Recommender Systems – An Introduction Dietmar Jannach, Markus Zanker, Alexander Felfernig, Gerhard Friedrich Cambridge University Press Which digital.
Lecture 14: Collaborative Filtering Based on Breese, J., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative.
Probability based Recommendation System Course : ECE541 Chetan Tonde Vrajesh Vyas Ashwin Revo Under the guidance of Prof. R. D. Yates.
ACM Multimedia th Annual Conference, October , 2004
Top-N Recommendation Algorithm Based on Item-Graph
Modeling User Rating Profiles For Collaborative Filtering
Malicious parties may employ (a) structure-based or (b) label-based attacks to re-identify users and thus learn sensitive information about their rating.
Sparsity, Scalability and Distribution in Recommender Systems
Who am I and what am I doing here? Allan Tucker A brief introduction to my research
1 Collaborative Filtering: Latent Variable Model LIU Tengfei Computer Science and Engineering Department April 13, 2011.
1 A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search 1 Jie Tang, 2 Ruoming Jin, and 1 Jing Zhang 1 Knowledge.
POTENTIAL RELATIONSHIP DISCOVERY IN TAG-AWARE MUSIC STYLE CLUSTERING AND ARTIST SOCIAL NETWORKS Music style analysis such as music classification and clustering.
Item-based Collaborative Filtering Recommendation Algorithms
Temporal Event Map Construction For Event Search Qing Li Department of Computer Science City University of Hong Kong.
1 Zi Yang, Wei Li, Jie Tang, and Juanzi Li Knowledge Engineering Group Department of Computer Science and Technology Tsinghua University, China {yangzi,
LCARS: A Location-Content-Aware Recommender System
Seoul National University, School of Computer Science and Engineering, Biointelligence Laboratory 2014 Spring Semester Course Instructor: Prof. Byoung-Tak Zhang TAs: Ha-Young Jang & Beom-Jin Lee Classroom: , Time:
Collaborative Filtering Recommendation Reporter : Ximeng Liu Supervisor: Rongxing Lu School of EEE, NTU
Agenda  Summary and outlook –Summary –Outlook –References.
1 A Static Analysis Approach for Automatically Generating Test Cases for Web Applications Presented by: Beverly Leung Fahim Rahman.
A Two Tier Framework for Context-Aware Service Organization & Discovery Wei Zhang 1, Jian Su 2, Bin Chen 2,WentingWang 2, Zhiqiang Toh 2, Yanchuan Sim.
Clustering-based Collaborative filtering for web page recommendation CSCE 561 project Proposal Mohammad Amir Sharif
A Hybrid Recommender System: User Profiling from Keywords and Ratings Ana Stanescu, Swapnil Nagar, Doina Caragea 2013 IEEE/WIC/ACM International Conferences.
2009 IEEE Symposium on Computational Intelligence in Cyber Security 1 LDA-based Dark Web Analysis.
1 Linmei HU 1, Juanzi LI 1, Zhihui LI 2, Chao SHAO 1, and Zhixing LI 1 1 Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua.
Presented By :Ayesha Khan. Content Introduction Everyday Examples of Collaborative Filtering Traditional Collaborative Filtering Socially Collaborative.
Google News Personalization: Scalable Online Collaborative Filtering
Co-clustering Documents and Words Using Bipartite Spectral Graph Partitioning Jinghe Zhang 10/28/2014 CS 6501 Information Retrieval.
1 Social Networks and Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee.
Finding the Hidden Scenes Behind Android Applications Joey Allen Mentor: Xiangyu Niu CURENT REU Program: Final Presentation 7/16/2014.
Online Kinect Handwritten Digit Recognition Based on Dynamic Time Warping and Support Vector Machine Journal of Information & Computational Science, 2015.
The Effect of Dimensionality Reduction in Recommendation Systems
Collaborative Data Analysis and Multi-Agent Systems Robert W. Thomas CSCE APR 2013.
Detecting Communities Via Simultaneous Clustering of Graphs and Folksonomies Akshay Java Anupam Joshi Tim Finin University of Maryland, Baltimore County.
Topic Modeling using Latent Dirichlet Allocation
WSP: A Network Coordinate based Web Service Positioning Framework for Response Time Prediction Jieming Zhu, Yu Kang, Zibin Zheng and Michael R. Lyu The.
Collaborative Filtering Zaffar Ahmed
CoNMF: Exploiting User Comments for Clustering Web2.0 Items Presenter: He Xiangnan 28 June School of Computing National.
Project Seminar on STABLE CLUSTERING ALGORITHM TO IDENTIFY CPU USAGE OF COMPUTERS BEHAVIOR IN GRID ENVIRONMENT Under the guidance of Prof. Lakshmi Rajamani.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Local Linear Matrix Factorization for Document Modeling Institute of Computing Technology, Chinese Academy of Sciences Lu Bai,
About Me Swaroop Butala  MSCS – graduating in Dec 09  Specialization: Systems and Databases  Interests:  Learning new technologies  Application of.
Community-Based Link Prediction/Recommendation in the Bipartite Network of BoardGameGeek.com Brett Boge CS 765 University of Nevada, Reno.
Venue Recommendation: Submitting your Paper with Style Zaihan Yang and Brian D. Davison Department of Computer Science and Engineering, Lehigh University.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
Collaborative Filtering and Recommender Systems Brian Lewis INF 385Q Knowledge Management Systems November 10, 2005.
Personalizing the Web Todd Lanning Project 1 - Presentation CSE 8331 Dr. M. Dunham.
Designing a framework For Recommender system Based on Interactive Evolutionary Computation Date : Mar 20 Sat, 2011 Project Number :
Information Overload on the Internet: The Web Mining Techniques Approach UNIVERSITI UTARA MALAYSIA COLLEGE OF ARTS AND SCIENCES RESEARCH METHODOLOGY (SZRZ6014)
CSE 4705 Artificial Intelligence
Term Project Proposal By J. H. Wang Apr. 7, 2017.
TJTS505: Master's Thesis Seminar
Data-Driven Educational Data Mining ---- the Progress of Project
Patrina Sili ID No. – S E01 Campus
Eick: Introduction Machine Learning
Advisor: Prof. Shou-de Lin (林守德) Student: Eric L. Lee (李揚)
Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie1, Shenglin Zhao2, Zibin Zheng3, Jieming Zhu2 and Michael.
Christoph F. Eick: A Gentle Introduction to Machine Learning
Speech Enhancement Based on Nonparametric Factor Analysis
--WWW 2010, Hongji Bao, Edward Y. Chang
Presentation transcript:

Missing Value Prediction Using Co-clustering and RBF for Collaborative Filtering. Department of Automation, Xiamen University. Youchun Ji, Wenxing Hong*, Jianwei Qi. November 2015. Good morning, everyone. My name is Ji Youchun. It is a privilege to have the opportunity to present on behalf of our group. The title of my presentation is Missing Value Prediction Using Co-clustering and RBF for Collaborative Filtering.

I come from the beautiful coastal city of Xiamen. Our school, Xiamen University, is known as one of the most beautiful universities in China. 1

Cases (research interest, website, period): expert finding, 5501.cn, 2012-2014; job recommendation, i.xmrc.com.cn, 2014-now; news recommendation, 17du.info, 2014-now. My major is automation. Our group focuses on data mining, recommender systems, and related topics. Our cases include, first, expert finding; second, job recommendation; and third, news recommendation. I focus on news recommendation. Our research has addressed how to model users, how to improve the precision of recommendations, and how to alleviate the sparsity problem. Today I want to talk about our work on reducing the sparsity of the user-item rating matrix. 2

Outline: 1 Introduction; 2 The Problem Definition; 3 Algorithms & Experiments; 4 Conclusion. The outline of my talk is as follows. In the first part, I introduce the background of this research. The second part defines the problem. The third part presents the algorithms for the problem and then describes the experiments. Finally, a brief conclusion is given. 3

Introduction. As we know, with the rapid development of the Internet, the quantity of news and information has increased explosively. People can conveniently obtain information on the Internet, but a great deal of redundant information is produced at the same time. The goal is to help users find interesting articles that match their preferences as closely as possible. Jannach, D., M. Zanker, A. Felfernig, & G. Friedrich, Recommender systems: an introduction. 2010: Cambridge University Press. Zheng, L., L. Li, W. Hong, & T. Li, PENETRATE: Personalized news recommendation using ensemble hierarchical clustering. Expert Systems with Applications, 2013. 40(6): p. 2127-2136. Das, A.S., M. Datar, A. Garg, & S. Rajaram. Google news personalization: scalable online collaborative filtering. in Proceedings of the 16th international conference on World Wide Web. 2007. ACM. Breese, J.S., D. Heckerman, & C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. in Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. 1998. Morgan Kaufmann Publishers Inc. 4

Introduction. Therefore, a variety of strategies have been proposed to help readers choose what to read. News recommendation systems automate some of these strategies, with the goal of providing affordable, personal, and high-quality recommendations. Nowadays, approaches based on content filtering and collaborative filtering have been proposed and are widely used by existing news recommender systems. Collaborative filtering is one of the most successful methods for news recommendation systems. Pazzani, M.J., A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 1999. 13(5-6): p. 393-408. Huang, Z., H. Chen, & D. Zeng, Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 116-142. Hofmann, T., Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 89-115. Blei, D.M., A.Y. Ng, & M.I. Jordan, Latent Dirichlet allocation. The Journal of Machine Learning Research, 2003. 3: p. 993-1022. 5

Motivation (figure: Scenarios 1, 2 and 3). The sparsity of the user-item rating matrix has a negative effect on collaborative filtering algorithms. In reality, the number of news items users have read is far smaller than the number published on the website, and newly published news items do not yet have enough ratings. This sparsity reduces the performance of collaborative filtering in news recommendation systems. To overcome the problem, we predict the missing values of the user-item rating matrix by combining two approaches: co-clustering and the Radial Basis Function network (RBF). 6 Zhang, S., W. Wang, J. Ford, & F. Makedon. Learning from Incomplete Ratings Using Non-negative Matrix Factorization. in SDM. 2006. SIAM. Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. ACM.

Outline: 1 Introduction; 2 The Problem Definition; 3 Algorithms & Experiments; 4 Conclusion. This is the motivation of our research. But how do we define the problem? 7

The Problem Definition. News recommendation can be viewed as missing value prediction: we predict the missing values in the user-item rating matrix and recommend to each user the top-N rated news items that the user has not yet read. Using a matrix representation, we transform this problem into a weighted matrix approximation problem, defined as an optimization problem whose goal is to minimize the root mean square error (RMSE) between the true rating matrix $R$ and the predicted matrix $R'$. Here $U = \{u_1, \dots, u_m\}$ is the set of users, $V = \{v_1, \dots, v_n\}$ is the set of news items, $r_{ij}$ is the real rating of user $u_i$ for news $v_j$, and $r'_{ij}$ is the predicted value. We can use this mathematical model to describe the problem:
$$\min_{R'} \ \mathrm{RMSE}(R, R') = \sqrt{\frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \left( r_{ij} - r'_{ij} \right)^2},$$
where $\Omega$ is the set of (user, news) pairs whose true rating is known. 8 George, T., & S. Merugu. A scalable collaborative filtering framework based on co-clustering. in Data Mining, Fifth IEEE International Conference on. 2005. IEEE.
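To make the objective concrete, here is a minimal Python/NumPy sketch of the two quantities in the problem definition: the RMSE between true and predicted ratings over a set of known entries, and the top-N list of unread news for one user. It assumes, as in the data sample, that a rating of 0 marks an unread item; the function names `rmse` and `top_n_unread` are illustrative, not taken from the original work.

```python
import numpy as np


def rmse(R, R_pred, mask):
    """Root mean square error over the cells where `mask` is True (e.g. the test ratings)."""
    diff = (np.asarray(R, dtype=float) - np.asarray(R_pred, dtype=float))[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


def top_n_unread(R, R_pred, user, n):
    """Indices of the n news items with the highest predicted rating that `user` has not read (R == 0)."""
    unread = np.flatnonzero(np.asarray(R)[user] == 0)
    order = np.argsort(-np.asarray(R_pred)[user, unread], kind="stable")
    return unread[order[:n]].tolist()


R = np.array([[3, 0, 1], [0, 2, 4]], dtype=float)
R_pred = np.array([[2.5, 1.8, 1.0], [3.0, 2.0, 3.5]])
print(rmse(R, R_pred, R > 0))         # about 0.354
print(top_n_unread(R, R_pred, 0, 1))  # [1]
```

In an evaluation, `mask` would mark the held-out test ratings rather than every observed cell.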

Outline: 1 Introduction; 2 The Problem Definition; 3 Algorithms & Experiments; 4 Conclusion. And how do we solve this problem? In the paper, we combine co-clustering and RBF to predict the missing values of the user-item rating matrix. 9

Data sample – Reading History (table: rating matrix with users u1-u13 as rows (User ID) and news v1-v8 as columns (News ID); only a few cells hold ratings, such as 3, 2, 4, 1 and 5, and the rest are empty). This is the data sample for our experiment. It comes from the access log of the Xiamen University news reading website. We assume that the number of times a user accesses a page reflects the user's preference, and we use the IP address as the user ID. We therefore calculate item ratings from users' click counts. For example, we choose 13 users and 8 news items. If a reader accesses an article 5 times, the article is assigned a score of 5; similarly, an article accessed once is assigned a score of 1. If a reader has not read an article, it is assigned a score of zero. The data sample looks like this. 10
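The click-to-rating rule can be sketched as follows, assuming the access log has already been reduced to (user index, news index) pairs, one per page access (the mapping from IP addresses to indices is not shown). The rating is simply the click count, so unread cells stay 0; `ratings_from_clicks` is a hypothetical helper name.

```python
import numpy as np


def ratings_from_clicks(log, n_users, n_items):
    """Build a user-item rating matrix whose entries are click counts.

    `log` is an iterable of (user_index, news_index) pairs, one per page access.
    Cells that were never accessed keep the value 0 ("unread").
    """
    R = np.zeros((n_users, n_items), dtype=int)
    for user, item in log:
        R[user, item] += 1  # one more access -> rating one higher
    return R


log = [(0, 0)] * 3 + [(1, 1)] * 2 + [(2, 0)]
print(ratings_from_clicks(log, n_users=3, n_items=2))
# [[3 0]
#  [0 2]
#  [1 0]]
```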

Algorithms – Flow chart. The figure shows the workflow of the proposed algorithm. The input is the user-item rating matrix and the number of clusters. The matrix is first co-clustered into several small submatrices with high internal similarity; then a Radial Basis Function network predicts the missing values. The output is the matrix with the missing values filled in. For example, users u1, u3 and u5 behave similarly on items v1 and v2, so these users and items are grouped into one co-cluster. Within this cluster, user u3 has not read news v1, so we predict u3's score for v1; the predicted value is 2. 11
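As a rough illustration of the flow chart, the sketch below fills each unread cell with the mean observed rating of its co-cluster. This is a deliberately simplified stand-in for the RBF step (the per-cluster predictor here is a plain average, not a neural network), and the toy matrix is constructed to mirror the slide's example, not taken from the XMU data. The row and column cluster labels are assumed to come from a preceding co-clustering step.

```python
import numpy as np


def fill_by_cocluster_mean(R, row_labels, col_labels):
    """Fill unread (zero) cells with the mean observed rating of their co-cluster.

    Simplified stand-in for the per-cluster RBF prediction step.
    Blocks with no observed ratings fall back to the global mean.
    """
    R = np.asarray(R, dtype=float)
    row_labels, col_labels = np.asarray(row_labels), np.asarray(col_labels)
    filled = R.copy()
    observed = R > 0
    global_mean = R[observed].mean() if observed.any() else 0.0
    for k in np.unique(row_labels):
        rows = np.flatnonzero(row_labels == k)
        for l in np.unique(col_labels):
            cols = np.flatnonzero(col_labels == l)
            block = R[np.ix_(rows, cols)]
            obs = block > 0
            mean = block[obs].mean() if obs.any() else global_mean
            filled[np.ix_(rows, cols)] = np.where(obs, block, mean)
    return filled


# Rows are u1..u5, columns are v1..v3; u3 (row 2) has not read v1.
R = np.array([[2, 2, 0], [0, 0, 4], [0, 2, 0], [0, 0, 5], [2, 2, 0]])
row_labels = np.array([0, 1, 0, 1, 0])  # u1, u3, u5 share a row cluster
col_labels = np.array([0, 0, 1])        # v1, v2 share a column cluster
filled = fill_by_cocluster_mean(R, row_labels, col_labels)
print(filled[2, 0])  # 2.0
```

Here u3 receives 2.0 for v1 because every observed rating in its co-cluster is 2; observed ratings are left unchanged.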

Algorithms – Co-clustering (figure: example input and output probability values; for user u1, news v1 and the third cluster the output is 0.8756). Now we introduce the details of the algorithm. The first part is co-clustering. Co-clustering simultaneously clusters the rows and columns of the user-item rating matrix, partitioning it into several smaller submatrices and thereby reducing the dimensions of the matrices to be processed. Users and items in the same submatrix are highly similar. Co-clustering computes the probability in the first formula, which is the probability that a user-item rating belongs to cluster k. It is based on the probability in the second formula, that the user belongs to cluster k, and the probability in the third formula, that the item belongs to cluster k. From these we obtain the fourth formula, the probability of a rating within cluster k. We update the probability in the first formula iteratively until it converges. The input and output values of this procedure are shown on the slide. Take user u1, news v1 and the third cluster as an example: the input value is 0.21, and after applying these three formulas the output is 0.8756. The value of the second formula changes according to the first formula. 12 1. Hu, W., W. Yong-Ji, W. Zhe, W. Xiu-Li, et al., Two-Phase Collaborative Filtering Algorithm Based on Co-Clustering. Journal of Software. 21: p. 1042-1054 (in Chinese). 2. George, T., & S. Merugu. A scalable collaborative filtering framework based on co-clustering. in Data Mining, Fifth IEEE International Conference on. 2005. IEEE. 3. Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. ACM.
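The probabilistic update formulas on the slide are not recoverable from the transcript, so the sketch below swaps in a simpler, hard-assignment variant: block-average co-clustering in the spirit of George & Merugu (2005), which alternately reassigns rows and columns to the cluster whose block means best reconstruct their observed ratings. It is not the authors' algorithm; the names `cocluster`, `row_init` and `col_init` are hypothetical, and the dense (users × clusters × items) cost arrays only suit small matrices.

```python
import numpy as np


def _block_means(R, obs, rl, cl, k, l):
    """Mean observed rating of every (row cluster, column cluster) block."""
    fallback = R[obs].mean() if obs.any() else 0.0
    means = np.full((k, l), fallback)  # blocks with no observed ratings get the global mean
    for a in range(k):
        rows = np.flatnonzero(rl == a)
        for b in range(l):
            cols = np.flatnonzero(cl == b)
            sub, sub_obs = R[np.ix_(rows, cols)], obs[np.ix_(rows, cols)]
            if sub_obs.any():
                means[a, b] = sub[sub_obs].mean()
    return means


def cocluster(R, k, l, n_iter=50, seed=0, row_init=None, col_init=None):
    """Hard co-clustering that minimizes squared error to block means on observed cells."""
    R = np.asarray(R, dtype=float)
    obs = R > 0  # 0 means "not read"
    m, n = R.shape
    rng = np.random.default_rng(seed)
    rl = np.asarray(row_init) if row_init is not None else rng.integers(k, size=m)
    cl = np.asarray(col_init) if col_init is not None else rng.integers(l, size=n)
    for _ in range(n_iter):
        old_rl, old_cl = rl.copy(), cl.copy()
        # Row step: cost of putting each row into each row cluster, shape (m, k).
        pred = _block_means(R, obs, rl, cl, k, l)[:, cl]
        row_cost = (((R[:, None, :] - pred[None, :, :]) ** 2) * obs[:, None, :]).sum(axis=2)
        rl = row_cost.argmin(axis=1)
        # Column step: cost of putting each column into each column cluster, shape (n, l).
        pred = _block_means(R, obs, rl, cl, k, l)[rl, :]
        col_cost = (((R[:, :, None] - pred[:, None, :]) ** 2) * obs[:, :, None]).sum(axis=0)
        cl = col_cost.argmin(axis=1)
        if np.array_equal(rl, old_rl) and np.array_equal(cl, old_cl):
            break  # no label changed: converged
    return rl, cl


R = np.array([[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]])
rows, cols = cocluster(R, k=2, l=2, row_init=[0, 0, 0, 1], col_init=[0, 0, 1, 1])
print(rows, cols)  # [0 0 1 1] [0 0 1 1]
```

Each step cannot increase the squared error on observed entries, so the loop stops once no label changes; like k-means, it can still settle in a local optimum, which is one reason the initialization and the number of clusters matter.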

Algorithms - RBF. The second part is the Radial Basis Function network. The input is one small, highly similar submatrix produced by co-clustering, and the expected output is the mean rating vector, computed over the non-null values only. The output of the network is a scalar function of the input vector, given by
$$\varphi(\mathbf{x}) = \sum_{i=1}^{N} a_i \exp\left(-\beta \lVert \mathbf{x} - \mathbf{c}_i \rVert^2\right),$$
where $N$ is the number of hidden neurons, $\mathbf{c}_i$ is the center of neuron $i$, and $a_i$ is its output weight. We used the Gaussian function as the radial basis function. Because the parameter values depend on the input matrix, I will not describe them in detail. 13 1. https://en.wikipedia.org/wiki/Radial_basis_function_network. 2. Fuliang, X., & Z. Huiying, A Research of Collaborative Filtering Recommender Method Based on SOM and RBFN Filling Missing Values. XIANDAI TUSHU QINGBAO JISHU, 2014. 7/8: p. 56-63 (in Chinese).
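A Gaussian RBF network of this form can be sketched in a few lines of NumPy. This is a generic least-squares RBF regressor, not the authors' trained network: the choice of centers, the shared width `beta`, and all helper names are assumptions. Because the slide does not specify how a co-cluster is turned into training pairs, `nonzero_row_means` only illustrates one plausible target: the mean of each row's observed ratings, ignoring empty cells.

```python
import numpy as np


def gaussian_design(X, centers, beta):
    """Activations exp(-beta * ||x - c||^2) for every input row x and center c."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-beta * d2)


def fit_rbf(X, y, centers, beta=1.0):
    """Output weights a_i of the network, found by linear least squares."""
    Phi = gaussian_design(X, centers, beta)
    weights, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return weights


def predict_rbf(X, centers, weights, beta=1.0):
    """Network output: weighted sum of the Gaussian activations."""
    return gaussian_design(X, centers, beta) @ weights


def nonzero_row_means(block):
    """Mean of the observed (non-zero) ratings in each row; 0 for rows with none."""
    block = np.asarray(block, dtype=float)
    counts = (block > 0).sum(axis=1)
    sums = block.sum(axis=1)
    return np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 3.0, 2.0])
w = fit_rbf(X, y, centers=X, beta=1.0)
print(np.round(predict_rbf(X, X, w, beta=1.0), 6))  # [1. 3. 2.]
```

With the centers placed on the training inputs and no regularization, the network interpolates the targets exactly; in practice one would typically use fewer centers than samples to avoid overfitting.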

Platform - http://yiqidu.xmu.edu.cn/ To carry out the experiments, we built an online website, Yiqidu. The platform looks like this. It includes search and login functions and shows campus news, popular news and a recommendation list. It is a live online system, and you are welcome to visit the website. We are still improving it. 14

Data set – XMU News. The experimental data come from the Xiamen University news reading website, which focuses on campus news. The data set includes 9502 users, 6372 news items and 932640 ratings. The sparsity of the user-item rating matrix is 98.46%. The data set was divided into a training set and a testing set. This is an original fragment of the data; each record has the format Rating:: UserID:: NewsID:: News title. 15 1. Jiang, S., & W. Hong. A vertical news recommendation system: CCNS—An example from Chinese campus news reading system. in Computer Science & Education (ICCSE), 2014 9th International Conference on. 2014. IEEE.
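The reported sparsity can be checked directly from the counts on this slide: with 9502 users and 6372 news items there are 60,546,744 cells, of which 932,640 are rated. The sketch below computes this, along with the same measure for an explicit matrix in which 0 means unrated (an assumption carried over from the data sample).

```python
import numpy as np


def sparsity(R):
    """Fraction of user-item cells with no rating (stored as 0)."""
    R = np.asarray(R)
    return float((R == 0).sum() / R.size)


def sparsity_from_counts(n_ratings, n_users, n_items):
    """Sparsity when only the number of ratings and the matrix shape are known."""
    return 1.0 - n_ratings / (n_users * n_items)


s = sparsity_from_counts(932_640, 9_502, 6_372)
print(f"{s:.2%}")                   # 98.46%
print(sparsity([[1, 0], [0, 0]]))  # 0.75
```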

Experiments (figure: sparsity of the rating matrix versus the co-clustering number, before (>0.95) and after (<0.65) prediction). The number of co-clusters is 36 in the experiment. After predicting the missing values, the sparsity of the user-item rating matrix drops from about 98% to about 60%. 16

Experiments. As the experimental results show, the prediction method that combines co-clustering and RBF works effectively on the XMUNEWS data set, with a root mean square error of 1.553.

Algorithm              RMSE    Time (s)
Co-clustering          2.455   40
RBF                    2.092   150
Co-clustering & RBF    1.553   330

Although the combined algorithm takes more time, the missing values can be predicted offline, so we concentrate on the root mean square error. We observe that the combined method outperforms the individual algorithms on the XMUNEWS data set. 17
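For perspective, the relative RMSE reductions of the combined method can be computed from the table; these percentages are simple arithmetic on the reported numbers, not additional results.

```python
def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


rmse_table = {"Co-clustering": 2.455, "RBF": 2.092, "Co-clustering & RBF": 1.553}
combined = rmse_table["Co-clustering & RBF"]
for name in ("Co-clustering", "RBF"):
    print(f"{name}: {relative_reduction(rmse_table[name], combined):.1%} lower RMSE")
# Co-clustering: 36.7% lower RMSE
# RBF: 25.8% lower RMSE
```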

Outline: 1 Introduction; 2 The Problem Definition; 3 Algorithms & Experiments; 4 Conclusion. Finally, I want to conclude my presentation. 18

Conclusion. We use co-clustering and the Radial Basis Function network (RBF) to predict the missing values of the user-item rating matrix. Before prediction, the sparsity of the user-item rating matrix is about 98% (98.46% on XMUNEWS); after prediction, it drops to about 60%. The root mean square error between the true and predicted ratings is 1.553 on the XMUNEWS data set. As the experimental results show, the combined algorithm outperforms each algorithm used separately. We built an online website to collect data and run experiments (http://yiqidu.xmu.edu.cn/). In future work, we will concentrate on improving computational efficiency and on how to choose the number of clusters. 19

References
Jannach, D., M. Zanker, A. Felfernig, & G. Friedrich, Recommender systems: an introduction. 2010: Cambridge University Press.
Zheng, L., L. Li, W. Hong, & T. Li, PENETRATE: Personalized news recommendation using ensemble hierarchical clustering. Expert Systems with Applications, 2013. 40(6): p. 2127-2136.
Das, A.S., M. Datar, A. Garg, & S. Rajaram. Google news personalization: scalable online collaborative filtering. in Proceedings of the 16th international conference on World Wide Web. 2007. ACM.
Breese, J.S., D. Heckerman, & C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. in Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence. 1998. Morgan Kaufmann Publishers Inc.
Pazzani, M.J., A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 1999. 13(5-6): p. 393-408.
Huang, Z., H. Chen, & D. Zeng, Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 116-142.
Hofmann, T., Latent semantic models for collaborative filtering. ACM Transactions on Information Systems (TOIS), 2004. 22(1): p. 89-115.
Blei, D.M., A.Y. Ng, & M.I. Jordan, Latent Dirichlet allocation. The Journal of Machine Learning Research, 2003. 3: p. 993-1022.
Zhang, S., W. Wang, J. Ford, & F. Makedon. Learning from Incomplete Ratings Using Non-negative Matrix Factorization. in SDM. 2006. SIAM.
Dhillon, I.S. Co-clustering documents and words using bipartite spectral graph partitioning. in Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. 2001. ACM.
George, T., & S. Merugu. A scalable collaborative filtering framework based on co-clustering. in Data Mining, Fifth IEEE International Conference on. 2005. IEEE.
https://en.wikipedia.org/wiki/Radial_basis_function_network.
Fuliang, X., & Z. Huiying, A Research of Collaborative Filtering Recommender Method Based on SOM and RBFN Filling Missing Values. XIANDAI TUSHU QINGBAO JISHU, 2014. 7/8: p. 56-63 (in Chinese).
Jiang, S., & W. Hong. A vertical news recommendation system: CCNS—An example from Chinese campus news reading system. in Computer Science & Education (ICCSE), 2014 9th International Conference on. 2014. IEEE.
20

Thanks! Q&A. Acknowledgment: The research was supported by the National Natural Science Foundation of China under Grant No. 61303081 and by the Fundamental Research Funds for the Xiamen University under Grant No. 20720152008. OK, that concludes my presentation. Once again, I would like to thank you for the opportunity to talk to you on this subject. Do you have any questions? 21