1
Item-Based Recommender System
Supervised by: Dr. Manish Kumar Bajpai
Tarun Bhatia (2012239), Vaibhav Jaiswal (2012249)
2
Problem To predict the opinion each user will have of the different items, and to be able to recommend the “best” items to each user.
3
Recommender System Applies knowledge discovery techniques to the problem of making personalized recommendations for information, products, or services, usually during a live interaction.
4
Pseudo-Distributed Cluster We use Hadoop to split the set of users across n machines, copy the input data to each, and then run one Recommender on each machine to process recommendations for a subset of users.
5
Algorithm used
6
Item-Based (Sequential)
Input: user u's preferences for items
begin
  for every item i that u has no preference for yet
    for every item j that u has a preference for
      compute a similarity s between i and j
      add u's preference for j, weighted by s, to a running average
    end for
  end for
  return the top items, ranked by weighted average
end
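A minimal in-memory sketch of this loop in Java. The names here (SimilarityFunction, score) are illustrative rather than from the slides, and the similarity metric itself is left abstract:

import java.util.*;

public class ItemBasedRecommender {

    // Any item-item similarity metric can be plugged in here (hypothetical interface).
    interface SimilarityFunction { double between(int i, int j); }

    // prefs: itemID -> user u's preference value (only rated items are present).
    // Returns a score for every item u has not rated yet.
    static Map<Integer, Double> score(Map<Integer, Double> prefs,
                                      Set<Integer> allItems,
                                      SimilarityFunction similarity) {
        Map<Integer, Double> scores = new HashMap<>();
        for (int i : allItems) {
            if (prefs.containsKey(i)) continue;            // only items u has no preference for
            double weightedSum = 0.0, weightTotal = 0.0;
            for (Map.Entry<Integer, Double> e : prefs.entrySet()) {
                double s = similarity.between(i, e.getKey());
                weightedSum += s * e.getValue();           // u's preference for j, weighted by s
                weightTotal += Math.abs(s);
            }
            if (weightTotal > 0) {
                scores.put(i, weightedSum / weightTotal);  // the running weighted average
            }
        }
        return scores;  // sort by value, descending, to get the top items
    }
}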
7
Co-occurrence matrix (Parallel) It computes the number of times each pair of items occurs together in some user's list of preferences. The more often two items turn up together, the more related or similar they probably are. Note that the entries in the matrix are not affected by preference values.
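A sketch of how these counts could be accumulated in memory, assuming each user's preferences have already been reduced to a set of item IDs; the MapReduce version on the later slides does the same counting in parallel:

import java.util.*;

public class CoOccurrence {

    // matrix.get(i).get(j) = number of users whose preference lists contain both i and j.
    // Preference values are ignored; only presence in a user's list matters.
    static Map<Long, Map<Long, Integer>> count(List<Set<Long>> userItemLists) {
        Map<Long, Map<Long, Integer>> matrix = new HashMap<>();
        for (Set<Long> items : userItemLists) {
            for (long i : items) {
                for (long j : items) {
                    if (i == j) continue;  // skip the diagonal
                    matrix.computeIfAbsent(i, k -> new HashMap<>())
                          .merge(j, 1, Integer::sum);
                }
            }
        }
        return matrix;
    }
}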
8
Co-occurrence Matrix
9
Computing user vectors In a data model with n items, a user's preferences are like a vector over n dimensions, with one dimension for each item. The user's preference values for items are the values in the vector. Items that the user expresses no preference for map to a 0 value in the vector. Such a vector is typically quite sparse and mostly zeros, because users typically express a preference for only a small subset of all items.
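A plain-Java illustration of such a sparse vector. Mahout ships its own sparse Vector implementations; this map-backed version is only a sketch of the idea:

import java.util.*;

// Only rated items are stored explicitly; every other dimension is an
// implicit 0, so memory grows with the number of preferences, not with n.
public class SparseUserVector {
    private final Map<Integer, Double> values = new HashMap<>();

    void set(int itemIndex, double preference) {
        values.put(itemIndex, preference);
    }

    double get(int itemIndex) {
        return values.getOrDefault(itemIndex, 0.0);  // unset dimensions read as 0
    }
}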
10
Producing Recommendation
11
MapReduce
1. Input is assembled in the form of many key-value (K1,V1) pairs, typically as input files on an HDFS instance.
2. A map function is applied to each (K1,V1) pair, which results in zero or more key-value pairs of a different kind (K2,V2). (Mapping)
3. All V2 for each K2 are combined during the shuffle and sort phase.
4. A reduce function is called for each K2 and all its associated V2, which results in zero or more key-value pairs of yet a different kind (K3,V3), output back to HDFS. (Reducing)
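The flow can be modeled in a few lines of plain Java. This is only an illustration of the four steps, not Hadoop's actual API, which distributes the phases across machines and persists data in HDFS (the output type is simplified here to (K2,V3)):

import java.util.*;
import java.util.function.BiFunction;

public class MiniMapReduce {
    static <K1, V1, K2, V2, V3> Map<K2, V3> run(
            Map<K1, V1> input,                                   // step 1: (K1,V1) pairs
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> mapFn,   // step 2: zero or more (K2,V2)
            BiFunction<K2, List<V2>, V3> reduceFn) {             // step 4: (K2, all V2) -> V3
        Map<K2, List<V2>> grouped = new HashMap<>();             // step 3: shuffle and sort
        input.forEach((k1, v1) -> {
            for (Map.Entry<K2, V2> pair : mapFn.apply(k1, v1)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        });
        Map<K2, V3> output = new HashMap<>();
        grouped.forEach((k2, values) -> output.put(k2, reduceFn.apply(k2, values)));
        return output;
    }
}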
12
Translating to MapReduce: generating user vectors
1. Input files are treated as (Long, String) pairs by the framework, where the Long key is a position in the file and the String value is the line of the text file.
2. Each line is parsed into a user ID and several item IDs by a map function. For each item ID, the function emits a new key-value pair: the user ID mapped to that item ID.
3. The framework collects together all item IDs that were mapped to each user ID.
4. A reduce function constructs a Vector from all of the user's item IDs and outputs the user ID mapped to the user's preference vector.
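A sketch of the two functions against Hadoop's Mapper/Reducer API. It assumes each input line is comma-separated (userID,itemID,itemID,...), and it writes the vector as comma-separated Text for simplicity, where Mahout's real job would emit a VectorWritable:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Step 2: parse one line and emit (userID, itemID) for each item ID on it.
class UserVectorMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String[] tokens = line.toString().split(",");  // assumed input format
        LongWritable userID = new LongWritable(Long.parseLong(tokens[0]));
        for (int i = 1; i < tokens.length; i++) {
            ctx.write(userID, new LongWritable(Long.parseLong(tokens[i])));
        }
    }
}

// Step 4: collect all item IDs for one user into a single (sparse) vector.
class UserVectorReducer extends Reducer<LongWritable, LongWritable, LongWritable, Text> {
    @Override
    protected void reduce(LongWritable userID, Iterable<LongWritable> itemIDs, Context ctx)
            throws IOException, InterruptedException {
        StringBuilder vector = new StringBuilder();
        for (LongWritable itemID : itemIDs) {
            if (vector.length() > 0) vector.append(',');
            vector.append(itemID.get());
        }
        ctx.write(userID, new Text(vector.toString()));
    }
}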
13
Calculating co-occurrence The next phase of the computation is another MapReduce that uses the output of the first MapReduce to compute co-occurrences.
1. Input is user IDs mapped to Vectors of user preferences, the output of the last MapReduce.
2. The map function determines all co-occurrences from one user's preferences and emits one pair of item IDs for each co-occurrence (item ID mapped to item ID). Both mappings, from one item ID to the other and vice versa, are recorded.
14
3. The framework collects, for each item, all co-occurrences mapped from that item.
4. The reducer counts, for each item ID, all the co-occurrences it receives and constructs a new Vector recording, for each co-occurring item, the number of times the two have co-occurred. These Vectors can be used as the rows (or columns) of the co-occurrence matrix, as in the sketch below.
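A sketch of both functions with Hadoop's API. It assumes the previous job's (userID, item vector) output is read back as (LongWritable, Text) pairs, e.g. via a SequenceFile, and it emits each matrix row as plain Text rather than Mahout's VectorWritable:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Step 2: for one user's items, emit every ordered pair of distinct item IDs,
// so each co-occurrence is recorded in both directions.
class CoOccurrenceMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {
    @Override
    protected void map(LongWritable userID, Text itemVector, Context ctx)
            throws IOException, InterruptedException {
        String[] items = itemVector.toString().split(",");  // assumed vector serialization
        for (String a : items) {
            for (String b : items) {
                if (!a.equals(b)) {
                    ctx.write(new LongWritable(Long.parseLong(a)),
                              new LongWritable(Long.parseLong(b)));
                }
            }
        }
    }
}

// Step 4: count how often each other item co-occurred with this item;
// the counts form one row of the co-occurrence matrix.
class CoOccurrenceReducer extends Reducer<LongWritable, LongWritable, LongWritable, Text> {
    @Override
    protected void reduce(LongWritable itemID, Iterable<LongWritable> coItems, Context ctx)
            throws IOException, InterruptedException {
        Map<Long, Integer> counts = new HashMap<>();
        for (LongWritable co : coItems) {
            counts.merge(co.get(), 1, Integer::sum);
        }
        ctx.write(itemID, new Text(counts.toString()));
    }
}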
15
Matrix multiplication algorithm
begin
  assign R to be the zero vector
  for each column i in the co-occurrence matrix
    multiply column vector i by the i-th element of the user vector
    add this vector to R
  end for
end
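A direct Java translation of this column-wise multiplication, holding the matrix and vector as plain arrays. This is a single-machine sketch of the arithmetic; in the parallel pipeline the same multiplication would be carried out as another distributed job:

public class RecommendationVector {
    // R = sum over i of (column i of the co-occurrence matrix) * userVector[i].
    static double[] multiply(int[][] cooccurrence, double[] userVector) {
        int n = userVector.length;
        double[] r = new double[n];                  // R starts as the zero vector
        for (int i = 0; i < n; i++) {
            if (userVector[i] == 0) continue;        // sparse vector: only rated items contribute
            for (int row = 0; row < n; row++) {
                r[row] += cooccurrence[row][i] * userVector[i];  // add scaled column i to R
            }
        }
        return r;  // the highest-valued entries are the best recommendations
    }
}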
19
System configuration (implemented on VMware)
Memory: 1 GB
Hard disk: 8 GB
Processors: 1
OS: Ubuntu 10.10 (32-bit)
20
Results
No. of preferences   Sequential (ms)   Parallel (ms)
20                   44                31063
40                   77                34152
100                  130               40012
200                  300               57421
21
Sequential
22
Conclusion The overhead of initializing the cluster, distributing the data and executable code, and marshalling the results is nontrivial. The parallel approach therefore pays off only when computing over large data with multiple machines in a cluster or on the cloud.