RecTree: An Efficient Collaborative Filtering Method (2004.5.4)




Similar presentations
Recommender Systems & Collaborative Filtering

Google News Personalization: Scalable Online Collaborative Filtering
Item Based Collaborative Filtering Recommendation Algorithms
Random Forest Predrag Radenković 3237/10
Clustering Categorical Data The Case of Quran Verses
Fast Algorithms For Hierarchical Range Histogram Constructions
Decision Tree.
Iterative Optimization and Simplification of Hierarchical Clusterings Doug Fisher Department of Computer Science, Vanderbilt University Journal of Artificial.
1 RegionKNN: A Scalable Hybrid Collaborative Filtering Algorithm for Personalized Web Service Recommendation Xi Chen, Xudong Liu, Zicheng Huang, and Hailong.
Chapter 7 – Classification and Regression Trees
A. Darwiche Learning in Bayesian Networks. A. Darwiche Known Structure Complete Data Known Structure Incomplete Data Unknown Structure Complete Data Unknown.
Clustered alignments of gene-expression time series data Adam A. Smith, Aaron Vollrath, Cristopher A. Bradfield and Mark Craven Department of Biostatistics.
Lecture 14: Collaborative Filtering Based on Breese, J., Heckerman, D., and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
1 Collaborative Filtering and Pagerank in a Network Qiang Yang HKUST Thanks: Sonny Chee.
2 -1 Analysis of algorithms Best case: easiest Worst case Average case: hardest.
Ensemble Learning: An Introduction
Recommendations via Collaborative Filtering. Recommendations Relevant for movies, restaurants, hotels…. Recommendation Systems is a very hot topic in.
Topic-Specific Recommendation An Approach to Greater Prediction Diversity and Accuracy Minho Kim Brian Tran CS 345a.
Lecture 5 (Classification with Decision Trees)
Sparsity, Scalability and Distribution in Recommender Systems
(C) 2001 SNU CSE Biointelligence Lab Incremental Classification Using Tree-Based Sampling for Large Data H. Yoon, K. Alsabti, and S. Ranka Instance Selection.
Ensemble Learning (2), Tree and Forest
Item-based Collaborative Filtering Recommendation Algorithms
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Gene expression & Clustering (Chapter 10)
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation Chao Chen, Dongsheng Li
The BIRCH Algorithm Davitkov Miroslav, 2011/3116
Chapter 9 – Classification and Regression Trees
START OF DAY 8 Reading: Chap. 14. Midterm Go over questions General issues only Specific issues: visit with me Regrading may make your grade go up OR.
Chengjie Sun, Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology
1 Social Networks and Collaborative Filtering Qiang Yang HKUST Thanks: Sonny Chee.
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering IDS Lab. Seminar Spring 2009 강민석 (Minseok Kang) May 21st, 2009 Nathan.
Collaborative Filtering  Introduction  Search or Content based Method  User-Based Collaborative Filtering  Item-to-Item Collaborative Filtering  Using.
Clustering.
A more efficient Collaborative Filtering method Tam Ming Wai Dr. Nikos Mamoulis.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
Cosine Similarity Item Based Predictions 77B Recommender Systems.
Pearson Correlation Coefficient 77B Recommender Systems.
The Summary of My Work In Graduate Grade One Reporter: Yuanshuai Sun
5/29/2008 AI UEC in Japan Chapter 12 Clustering: Large Databases Written by Farial Shahnaz Presented by Zhao Xinyou Data Mining Technology.
Data Mining Practical Machine Learning Tools and Techniques By I. H. Witten, E. Frank and M. A. Hall 6.8: Clustering Rodney Nielsen Many / most of these.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Recommendation Algorithms for E-Commerce. Introduction Millions of products are sold over the web. Choosing among so many options is proving challenging.
Random Forests Ujjwol Subedi. Introduction What is Random Tree? ◦ Is a tree constructed randomly from a set of possible trees having K random features.
Lecture Notes for Chapter 4 Introduction to Data Mining
Combining multiple learners Usman Roshan. Decision tree From Alpaydin, 2010.
Chapter 4: Algorithms CS 795. Inferring Rudimentary Rules 1R – Single rule – one level decision tree –Pick each attribute and form a single level tree.
Experimental Study on Item-based P-Tree Collaborative Filtering for Netflix Prize.
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Learning in Bayesian Networks. Known Structure Complete Data Known Structure Incomplete Data Unknown Structure Complete Data Unknown Structure Incomplete.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Introduction to Data Mining Clustering & Classification Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
The Wisdom of the Few Xavier Amatrian, Neal Lathis, Josep M. Pujol SIGIR’09 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
ItemBased Collaborative Filtering Recommendation Algorithms 1.
Slope One Predictors for Online Rating-Based Collaborative Filtering Daniel Lemire, Anna Maclachlan In SIAM Data Mining (SDM’05), Newport Beach, California,
Item-Based Collaborative Filtering Recommendation Algorithms
Data Science Practical Machine Learning Tools and Techniques 6.8: Clustering Rodney Nielsen Many / most of these slides were adapted from: I. H. Witten,
Dense-Region Based Compact Data Cube
Trees, bagging, boosting, and stacking
Asymmetric Correlation Regularized Matrix Factorization for Web Service Recommendation Qi Xie1, Shenglin Zhao2, Zibin Zheng3, Jieming Zhu2 and Michael.
Q4 : How does Netflix recommend movies?
ITEM BASED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHMS
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
Presentation transcript:

RecTree: An Efficient Collaborative Filtering Method

Outline
1. Introduction
2. Group average & memory-based algorithms
3. RecTree Algorithm
4. Experiments and Comparisons
5. Future work
6. Question & Answer

1. Introduction
– Collaborative filtering (CF)
  • Recommends items to the active user
  • Bases recommendations on other users' ratings
– Applications
  • Two e-commerce sites that integrate CF: Amazon & CDNow
  • Recommend books, music, and movies
– An example

2. Group average algorithm & memory-based algorithm
– Group average algorithm (sketched below)
  • Assumes that all advisors are equally trusted and consequently weighs them all equally
  • Takes a common set of friends of Sam and Baz
  • Predicts a movie's rating by computing the average rating of those friends
  • Shortcoming: it gives the same recommendation to all active users
– Memory-based algorithm
  • Uses the rating history
  • Identifies similar advisors
  • Combines the advisors' ratings into a recommendation
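A minimal sketch of the group-average idea, assuming ratings live in a plain dict of dicts; the names and numbers are illustrative, not the paper's data:

```python
# Group-average prediction: every advisor is trusted equally, so the
# prediction for an item is simply the mean of the friends' ratings.
def group_average_predict(ratings, item):
    """ratings: {advisor: {item: rating}} -> mean rating for item, or None."""
    votes = [prof[item] for prof in ratings.values() if item in prof]
    return sum(votes) / len(votes) if votes else None

friends = {
    "Ann": {"Titanic": 4, "Alien": 2},
    "Bob": {"Titanic": 5},
    "Cal": {"Titanic": 3, "Alien": 4},
}
print(group_average_predict(friends, "Titanic"))  # (4 + 5 + 3) / 3 = 4.0
```

Because the prediction ignores who is asking, every active user gets the same number, which is exactly the shortcoming noted above.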

– Correlation-based CF (corrCF)
  • Pearson correlation gives the pair-wise similarity w_{u,a} between user u and advisor a:

      w_{u,a} = \frac{\sum_{i \in Y_{u,a}} (r_{u,i} - \bar{r}_u)(r_{a,i} - \bar{r}_a)}{\sigma_u \, \sigma_a}

    where \sigma_u and \sigma_a are the standard deviations of u's and a's ratings, and Y_{u,a} is the set of items that both the user and the advisor rated.
  • Prediction: the active user's mean rating plus the weighted deviations of the advisors,

      p_{a,i} = \bar{r}_a + \alpha \sum_u w_{u,a} (r_{u,i} - \bar{r}_u)

    where \alpha is a normalizing constant.
  • The pair-wise similarities form an n x n matrix over the n users, so the cost is O(n^2).
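A runnable sketch of corrCF following the formulas above, again assuming a dict-of-dicts rating store; taking \alpha = 1 / \sum_u |w_{u,a}| is a common normalization choice, not something the slide specifies:

```python
import math

def pearson(ratings, u, a):
    """Pearson weight w_{u,a} over Y_{u,a}, the items both u and a rated."""
    common = set(ratings[u]) & set(ratings[a])
    if not common:
        return 0.0
    mu = sum(ratings[u][i] for i in common) / len(common)
    ma = sum(ratings[a][i] for i in common) / len(common)
    cov = sum((ratings[u][i] - mu) * (ratings[a][i] - ma) for i in common)
    su = math.sqrt(sum((ratings[u][i] - mu) ** 2 for i in common))
    sa = math.sqrt(sum((ratings[a][i] - ma) ** 2 for i in common))
    return cov / (su * sa) if su > 0 and sa > 0 else 0.0

def predict(ratings, a, item):
    """Mean rating of active user a plus the weighted deviation of advisors."""
    mean = lambda u: sum(ratings[u].values()) / len(ratings[u])
    advisors = [u for u in ratings if u != a and item in ratings[u]]
    weights = {u: pearson(ratings, u, a) for u in advisors}
    norm = sum(abs(w) for w in weights.values())  # alpha = 1 / norm
    if norm == 0:
        return mean(a)
    dev = sum(w * (ratings[u][item] - mean(u)) for u, w in weights.items())
    return mean(a) + dev / norm
```

Computing pearson() for every pair of the n users is what produces the O(n^2) cost noted above.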

An open issue: the sparsity problem
– When rating density is low, it is hard to produce accurate recommendations
– Two approaches:
  • Force a default vote for every unrated item in the data set (default voting)
  • Use OLAP as a simple way of filtering

3. RecTree
– Solves the scalability problem with a divide-and-conquer approach
– Dynamically creates a hierarchy of cliques of users; users within a clique are more similar to each other
– Seeks advisors only from the cliques that the user belongs to
– Yields a higher overall accuracy

RecTree algorithm
– Partitions the data into cliques of similar users by recursively splitting the dataset
  • Maximizes the intra-partition similarity
  • Minimizes the inter-partition similarity
– The resulting structure resembles a binary tree
– Within each leaf node, it computes a similarity matrix
– It predicts by taking a weighted deviation from the clique advisors' ratings

Lemma 1. The training time for a partitioned memory-based collaborative filter is O(nb), where n is the dataset size, if the partitions are approximately b in size and there are n/b partitions. (Each partition's similarity matrix costs O(b^2), and there are n/b partitions: (n/b) * b^2 = nb.)

Growing the tree
– Owing to sparsity and the low cost of RAM, a fast in-memory clustering algorithm, KMeans, is used (see the sketch after this list):
  • Select k initial seeds
  • Assign users to the clusters
  • Recompute the centroids of the clusters
  • Reassign users to the new clusters
– KMeans is not an optimal clustering algorithm, but it guarantees fast training
– ConstructRecTree() takes three thresholds:
  • Maximum partition size
  • Maximum tree depth
  • Current branch depth
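A minimal 2-means sketch following the four steps above, assuming users are rows of a dense (default-filled) rating matrix; this is an illustration, not the paper's implementation:

```python
import numpy as np

def kmeans2(X, iters=20, seed=0):
    """Cluster the rows of X (users) into k=2 groups; returns a label per row."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=2, replace=False)
    centroids = X[idx].astype(float)                 # 1. select k initial seeds
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                # 2./4. (re)assign users
        for k in (0, 1):                             # 3. recompute the centroids
            if (labels == k).any():
                centroids[k] = X[labels == k].mean(axis=0)
    return labels
```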

Algorithm 1. ConstructRecTree(parent, dataSet, partitionMaxSize, curDepth, maxDepth)
Input: parent is the parent node from which dataSet originates; partitionMaxSize is the maximum partition size; curDepth is the depth of the current branch; maxDepth is the maximum tree depth.
Output: the RecTree.
Method:
1. Create a node and link it to parent.
2. Assign dataSet to node.
3. If SizeOf(dataSet) ≤ partitionMaxSize OR curDepth > maxDepth then ComputeCorrelationMatrix(dataSet); RETURN.
4. curDepth = curDepth + 1.
5. Call KMeans(dataSet, numberOfClusters = 2).
6. For each child cluster resulting from KMeans: Call ConstructRecTree(node, childClusterDataSet, partitionMaxSize, curDepth, maxDepth).
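A hedged Python rendering of Algorithm 1, assuming users are row indices into a default-filled rating matrix X; the Node class and the use of scikit-learn's KMeans are stand-ins for details the slide leaves open:

```python
import numpy as np
from sklearn.cluster import KMeans

class Node:
    """Illustrative tree node; the slide does not spell out this structure."""
    def __init__(self, parent=None):
        self.parent, self.children = parent, []
        self.users, self.corr = None, None
        if parent is not None:
            parent.children.append(self)

def construct_rectree(parent, users, X, max_size, cur_depth, max_depth):
    node = Node(parent)                        # 1. create a node, link to parent
    node.users = users                         # 2. assign the data set to it
    if len(users) <= max_size or cur_depth > max_depth:
        node.corr = np.corrcoef(X[users])      # 3. leaf: pairwise correlation matrix
        return node
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(X[users])  # 5. 2-means split
    for k in (0, 1):                           # 6. recurse into each child clique
        child = [u for u, lbl in zip(users, labels) if lbl == k]
        if child:
            construct_rectree(node, child, X, max_size, cur_depth + 1, max_depth)
    return node
```

A typical call would be construct_rectree(None, list(range(X.shape[0])), X, max_size=50, cur_depth=0, max_depth=10); the leaf condition in step 3 is what bounds each correlation matrix to roughly b x b.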

Lemma 2. The RecTree data structure is constructed in O(g n log2(n/b)) time, if the maximum depth is g log2(n), where n is the dataset size, b is the maximum partition size, and g is a constant.

Generating recommendations
– Without default voting, Bea and Dan are indistinguishable
– With default voting, Dan is more similar to Sam than Bea is
– ConstructRecTree divides the users into two groups and captures the users' tastes
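A small sketch of default voting: each profile is extended with a default rating on unrated items before similarities are computed, which is what lets sparse profiles like Bea's and Dan's be told apart. The neutral default of 3 on a 1-5 scale is an illustrative choice, not the slide's.

```python
def with_default_votes(ratings, all_items, default=3):
    """Extend every user's profile with a default vote on unrated items."""
    return {u: {i: prof.get(i, default) for i in all_items}
            for u, prof in ratings.items()}
```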

4. Experiments
– Off-line/on-line execution time
  • The off-line and on-line execution times for RecTree are measured as a function of the number of users and of the maximum partition size, b
  • RecTree's on-line performance is independent of the number of users already in the system

Normalized Mean Absolute Error (NMAE)
– The accuracy of RecTree is a function of the number of users and of the maximum partition size, b (see Figure 5)
– RecTree's accuracy is due in part to the success of the partitioning phase in localizing highly correlated users in the same partition
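The slide does not define NMAE; the standard definition, which these results presumably use, normalizes the mean absolute error by the rating range:

```latex
\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| p_{u,i} - r_{u,i} \right|,
\qquad
\mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}
```

Here T is the test set, p_{u,i} the predicted rating, and r_{u,i} the true rating.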

5. Future work
– How the internal nodes of the RecTree data structure can contribute to even more accurate predictions
– Parallelizing the computation