Presentation is loading. Please wait.

Presentation is loading. Please wait.

Recommender System with Hadoop and Spark

Similar presentations


Presentation on theme: "Recommender System with Hadoop and Spark"— Presentation transcript:

1 Recommender System with Hadoop and Spark
Wu, Yongkai Zhang, Fengli

2 Hadoop Basically parallel processing framework running on cluster of commodity machines Stateless functional programming because processing of each row of data does not depend upon any other row or state Data is replicated, partitioned and resides in Hadoop Distributed File System (HDFS) Each partitions get processed by a separated mapper or reducer task

3 Item-based Collaborative Filtering
the similarities between different items in the dataset are calculated according to the items’ properties  these similarity values are used to recommend items to users One problems? If a lot of users like item A and B, but A and B have no similarity such as beer and baby diapers?

4 Item-Based Recommender using Co-occurrence similarity
Look at User 1 and 2 who have watched Movie 101, you will see that there is 2 cooccurrence of Movie 102

5

6 Implement of the recommender system
Built history matrix — contains the interactions between users and items as a user-by-item matrix Built co-occurrence matrix — transforms the history matrix into an item-by-item matrix, recording which items appear together in user histories Do recommendation

7 History matrix Map (key, value) key.set (User ID) Value.set (ItemID+
“:” + Rating) <1, (101: 5.0)> <1, (102:3.0)> Combine(key, list( value)) <1, ((101: 5.0), (102:3.0))> Reduce

8 Co-occurrence matrix transforms the history matrix into an item-by-item matrix, recording which items appear together in user histories

9 Indicator Matrix

10 Validation Select 20% users from the data set, and assume 20% movies have not been watched

11 Mahout Mahout  is a machine-learning library with tools for clustering, classification, and several types of recommenders, including tools to calculate most-similar items or build item recommendations for users. Mahout employs the Hadoop framework to distribute calculations across a cluster Command “mahout recommenditembesed --input /movielens.txt --output recommendations --numRecommendations 10 -- outputPathForSimilarityMatrix similarity-matrix --similarityClassname SIMILARITY_COOCCURRENCE”

12 User-based CF RS with Spark
Task: to implement a recommendation system using Apache Spark framework Approach: user-based collaborative filtering Similarity metric Dataset: MovieLens

13 Spark

14 Spark operators

15 Word Count Example Combine tuples with the same key.

16

17

18

19 Very easy approach, but…

20 Memory-based Approach
Movielens : 1. 10 million ratings applied to 10,000 movies by 72,000 users 2. The size of rating matrix: 10 k * 72 k = 720 M

21 memory-based model in distributed platform
Sparse matrix Node 1 Node 2 Node n

22 Similarity metric loadFile.map().groupByKey()

23 map().groupByKey()

24 map(calcSim()).map().groupByKey()

25 Load history record loadFile.map().groupByKey()

26 Neighbor recommendation
Similarity model Historical data (broadcast)


Download ppt "Recommender System with Hadoop and Spark"

Similar presentations


Ads by Google