RecTree: An Efficient Collaborative Filtering Method (2004.5.4)
Outline
1. Introduction
2. Group average and memory-based algorithms
3. RecTree algorithm
4. Experiments and comparisons
5. Future work
6. Question & answer
1. Introduction
–Collaborative filtering (CF)
 Can recommend items to the active user
 Based on the ratings of other users
–Applications
 Two e-commerce sites integrate CF: Amazon & CDNow
 They recommend books, music, and movies
–An example
2. Group average algorithm & memory-based algorithm
–Group average algorithm
 Assumes that all advisors are equally trusted
 Takes a common set of friends of Sam and Baz
 Predicts a movie's rating by computing the average rating of those friends
 Shortcoming: gives the same recommendation to every active user
–Memory-based algorithm
 Uses the rating history
 Identifies similar advisors
 Sums the advisors' ratings, weighted by similarity, to make a recommendation
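The group-average step above can be sketched in a few lines. This is a minimal illustration with hypothetical rating data (the names and values are invented, not from the paper): every advisor is trusted equally, so the prediction is just the plain mean of the available ratings.

```python
def group_average_predict(ratings, item):
    """Predict a rating for `item` as the plain average of all advisor ratings."""
    votes = [r[item] for r in ratings.values() if item in r]
    if not votes:
        return None  # no advisor has rated the item
    return sum(votes) / len(votes)

# Hypothetical common friends of Sam and Baz and their ratings for one movie:
friends = {
    "Ann": {"movie": 4},
    "Bob": {"movie": 2},
    "Cid": {"movie": 3},
}
print(group_average_predict(friends, "movie"))  # 3.0
```

Note that the active user's own history never enters the computation, which is exactly why every active user receives the same recommendation.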
–Correlation-based CF (corrCF)
 Pearson correlation gives the pair-wise similarity w_{u,a} between user u and advisor a:
  w_{u,a} = Σ_{y ∈ Y_{u,a}} (r_{u,y} − r̄_u)(r_{a,y} − r̄_a) / (|Y_{u,a}| · σ_u · σ_a)
 where σ_u, σ_a are the standard deviations of u's and a's ratings, and Y_{u,a} is the set of items both the user and the advisor rated.
 Prediction:
  p_{u,y} = r̄_u + α Σ_a w_{u,a} (r_{a,y} − r̄_a)
 where α is a normalizing constant.
 The similarity matrix over all n users is n × n, thus training is O(n²).
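The corrCF scheme can be sketched as follows. This is a simplified illustration, not the paper's implementation: the Pearson weight is computed over the co-rated items only, and α is taken as the sum of absolute weights, a common normalization choice.

```python
import math

def pearson(u, a):
    """Pearson correlation over the items both user u and advisor a rated (Y_{u,a})."""
    common = set(u) & set(a)
    if len(common) < 2:
        return 0.0  # not enough overlap to measure correlation
    mu_u = sum(u[i] for i in common) / len(common)
    mu_a = sum(a[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (a[i] - mu_a) for i in common)
    den = math.sqrt(sum((u[i] - mu_u) ** 2 for i in common)
                    * sum((a[i] - mu_a) ** 2 for i in common))
    return num / den if den else 0.0

def predict(u, advisors, item):
    """p = mean(u) + alpha * sum_a w_{u,a} * (r_{a,item} - mean(a))."""
    mu_u = sum(u.values()) / len(u)
    pairs = [(pearson(u, a), a) for a in advisors if item in a]
    norm = sum(abs(w) for w, _ in pairs)  # alpha: normalizes the weights
    if norm == 0:
        return mu_u  # no usable advisor: fall back to the user's own mean
    return mu_u + sum(w * (a[item] - sum(a.values()) / len(a))
                      for w, a in pairs) / norm
```

Because every user is compared against every other user, building the full weight matrix costs O(n²), which motivates RecTree's partitioning.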
An open issue
–Sparsity problem
 When rating density is low, it is hard to produce accurate recommendations
 Two approaches:
 –Default voting: force a vote for every item in the data set
 –OLAP used as a simple way of filtering
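The first work-around can be sketched as follows; this is a minimal illustration assuming a neutral default vote `d` is substituted for every unrated item (the value 2.5 is an arbitrary choice, not from the paper).

```python
def with_default_votes(ratings, all_items, d=2.5):
    """Extend a sparse rating vector so every item in the data set has a vote."""
    return {i: ratings.get(i, d) for i in all_items}

sparse = {"a": 5}                                # user rated only item "a"
full = with_default_votes(sparse, ["a", "b", "c"])
# full == {"a": 5, "b": 2.5, "c": 2.5}
```

With every vector densified this way, any two users share all items, so pair-wise correlations can always be computed, at the cost of diluting the real signal.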
3. RecTree
–Solves the scalability problem with a divide-and-conquer approach
–Dynamically creates a hierarchy of cliques of users; users within a clique are more similar to each other
–Seeks advisors only from the cliques the active user belongs to
–Yields higher overall accuracy
RecTree algorithm
–Partitions the data into cliques of similar users
 Recursively splits the dataset
 Maximizes intra-partition similarity
 Minimizes inter-partition similarity
–The result resembles a binary tree
–Within each leaf node, computes a similarity matrix
–Predicts by taking a weighted deviation from the clique advisors' ratings
Lemma 1. The training time for a partitioned memory-based collaborative filter is O(nb), where n is the dataset size, if the partitions are approximately b in size and there are n/b partitions.
Growing the tree
–Owing to sparsity and the low cost of RAM, a fast in-memory clustering algorithm, KMeans, is used here
–KMeans
 Select k initial seeds
 Assign users to the nearest cluster
 Recompute the centroids of the clusters
 Reassign users to the new clusters
–Not an optimal clustering algorithm, but it guarantees fast training
–ConstructRecTree() takes three thresholds:
 –Maximum partition size
 –Maximum branch depth
 –Current branch depth
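The KMeans steps above (with the k = 2 setting RecTree uses) can be sketched as follows. This is a plain illustrative version, not the paper's code: users are represented as dense rating vectors and seeds are drawn at random.

```python
import random

def kmeans2(points, iters=10, seed=0):
    """Plain k-means with k = 2: seed, assign, recompute centroids, reassign."""
    rng = random.Random(seed)
    centroids = rng.sample(points, 2)          # select 2 initial seeds
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:                       # assign each user to nearest centroid
            d = [sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids]
            clusters[d.index(min(d))].append(p)
        for k in range(2):                     # recompute centroids
            if clusters[k]:
                n = len(clusters[k])
                centroids[k] = [sum(col) / n for col in zip(*clusters[k])]
    return clusters

parts = kmeans2([[0, 0], [0, 1], [10, 10], [10, 11]])
```

On this toy data the two tight groups separate after a few iterations, which is all RecTree needs: a fast, roughly balanced bisection rather than an optimal clustering.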
Algorithm 1. ConstructRecTree(parent, dataSet, partitionMaxSize, curDepth, maxDepth)
Input: parent is the parent node from which dataSet originates, partitionMaxSize is the maximum partition size, curDepth is the depth of the current branch, and maxDepth is the maximum tree depth.
Output: The RecTree.
Method:
1. Create a node and link it to parent.
2. Assign dataSet to node.
3. If SizeOf(dataSet) ≤ partitionMaxSize OR curDepth > maxDepth then ComputeCorrelationMatrix(dataSet); RETURN.
4. curDepth++.
5. Call KMeans(dataSet, numberOfClusters = 2).
6. For each child cluster resulting from KMeans: Call ConstructRecTree(node, childClusterDataSet, partitionMaxSize, curDepth, maxDepth).
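Algorithm 1 can be rendered in Python as below. This is a sketch under two stated assumptions: the `split` argument stands in for the KMeans (k = 2) call, and the correlation matrix is represented by a placeholder string rather than computed.

```python
class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, []
        self.data, self.matrix = None, None
        if parent is not None:
            parent.children.append(self)   # step 1: link the new node to parent

def construct_rectree(parent, data, max_size, cur_depth, max_depth, split):
    """Recursively bisect the dataset until a partition is small enough
    (or the branch is deep enough), then build that leaf's matrix."""
    node = Node(parent)
    node.data = data                       # step 2: assign dataSet to node
    if len(data) <= max_size or cur_depth > max_depth:
        node.matrix = "correlation-matrix" # step 3: ComputeCorrelationMatrix stand-in
        return node
    for cluster in split(data):            # steps 4-6: deepen, bisect, recurse
        construct_rectree(node, cluster, max_size, cur_depth + 1, max_depth, split)
    return node

# Hypothetical split function: halve the list in order (a stand-in for KMeans).
halve = lambda d: [d[:len(d) // 2], d[len(d) // 2:]]
root = construct_rectree(None, list(range(8)), 2, 0, 10, halve)
```

With 8 users and a maximum partition size of 2, the recursion produces a binary tree whose four leaves each hold a clique of two users.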
Lemma 2. The RecTree data structure is constructed in O(g·n·log₂(n/b)) time if the maximum depth is g·log₂(n), where n is the dataset size, b is the maximum partition size, and g is a constant.
Generating recommendations
–Without default voting, Bea and Dan are indistinguishable
–With default voting, Dan is more similar to Sam than Bea is
–ConstructRecTree divides users into two groups and captures the users' tastes
4. Experiments
–Off-line/on-line execution time
 The off-line and on-line execution times for RecTree are measured as a function of the number of users and of the maximum partition size, b
 RecTree's on-line performance is independent of the number of users already in the system
Normalized mean absolute error (NMAE)
–The accuracy of RecTree as a function of the number of users and the maximum partition size, b, is shown in Figure 5
–RecTree's accuracy is due in part to the success of the partitioning phase in localizing highly correlated users in the same partition
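The NMAE metric itself is simple to state: the mean absolute error divided by the rating range, so 0 is a perfect predictor and 1 is the worst possible average error. A minimal sketch, assuming a 1 to 5 rating scale:

```python
def nmae(predicted, actual, r_max=5, r_min=1):
    """Normalized mean absolute error: MAE divided by the rating range."""
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)
    return mae / (r_max - r_min)

print(nmae([4, 3, 5], [5, 3, 4]))  # two errors of 1 over 3 items: MAE 2/3, NMAE 1/6
```

Normalizing by the range makes accuracy comparable across datasets that use different rating scales.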
5. Future work
–How the internal nodes of the RecTree data structure can contribute to even more accurate predictions
–Parallelized computation