1
AWS, HADOOP AND MAHOUT – VIDEO GAME RECOMMENDER
Ben Gooding
University of Arkansas – Department of Computer Science and Engineering
Presented April 30, 2015
2
MAHOUT
Pronounced to rhyme with "trout"
Open-source machine learning platform from Apache
Used Mahout 0.9
3
RECOMMENDER TYPES
Item-based recommenders: based on how similar items are to other items
User-based recommenders: based on some notion of similarity between users
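To make the item-based idea concrete, here is a minimal sketch (not Mahout's code) that estimates a user's preference for an unseen item as a similarity-weighted average of that user's existing ratings. The toy ratings, the function names, and the use of a Tanimoto item-item similarity are illustrative assumptions.

```python
# Minimal item-based recommender sketch (illustrative, not Mahout's implementation).
# Assumption: ratings are a dict {user_id: {item_id: score}}; item-item similarity
# is the Tanimoto coefficient over the sets of users who rated each item.
import math


def item_users(ratings):
    """Invert the ratings: item_id -> set of user_ids who rated it."""
    users_by_item = {}
    for user, prefs in ratings.items():
        for item in prefs:
            users_by_item.setdefault(item, set()).add(user)
    return users_by_item


def tanimoto(a, b):
    """|A & B| / |A | B| for two sets of users."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def estimate(ratings, users_by_item, user, target_item):
    """Similarity-weighted average of the user's ratings on other items."""
    num = den = 0.0
    target_users = users_by_item.get(target_item, set())
    for item, score in ratings[user].items():
        sim = tanimoto(users_by_item[item], target_users)
        num += sim * score
        den += sim
    return num / den if den > 0 else float("nan")


def recommend(ratings, user, how_many=3):
    """Rank the items the user has not rated by their estimated preference."""
    users_by_item = item_users(ratings)
    candidates = set(users_by_item) - set(ratings[user])
    scored = []
    for item in candidates:
        s = estimate(ratings, users_by_item, user, item)
        if not math.isnan(s):  # skip items with no similar rated items
            scored.append((s, item))
    scored.sort(reverse=True)
    return [item for _, item in scored[:how_many]]


ratings = {
    1: {101: 5.0, 102: 3.0},
    2: {101: 4.0, 103: 4.0},
    3: {102: 2.0, 103: 5.0, 104: 4.0},
}
top = recommend(ratings, 1)
print(top)  # [103, 104]
```

A user-based recommender swaps the roles: it first finds similar users (a neighborhood) and then averages their ratings for the candidate item.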
4
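NEIGHBORHOODS
Two types of neighborhoods (used by user-based recommenders):
N-nearest neighbor: the N users most similar to the target user
Nearest-neighbor threshold: every user whose similarity to the target user meets a threshold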
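The two neighborhood strategies can be sketched as follows. This is an illustration, not Mahout's API; `similarity(u, v)` stands for any user-user similarity such as those listed under SIMILARITIES, and the toy similarity table is hypothetical.

```python
# Sketch of the two user-neighborhood strategies (illustrative, not Mahout's API).
# Assumption: similarity(u, v) returns a float where larger means "more similar",
# or NaN when the similarity is undefined.
import heapq
import math


def nearest_n_neighborhood(user, all_users, similarity, n):
    """The n users most similar to `user`, ignoring undefined similarities."""
    scored = []
    for other in all_users:
        if other == user:
            continue
        s = similarity(user, other)
        if not math.isnan(s):
            scored.append((s, other))
    return [other for _, other in heapq.nlargest(n, scored)]


def threshold_neighborhood(user, all_users, similarity, threshold):
    """Every user whose similarity to `user` is at least `threshold` (NaN never qualifies)."""
    return [other for other in all_users
            if other != user and similarity(user, other) >= threshold]


# Hypothetical toy similarities between user "a" and the others.
TOY_SIMILARITY = {("a", "b"): 0.9, ("a", "c"): 0.4, ("a", "d"): 0.7}


def toy_similarity(u, v):
    return TOY_SIMILARITY.get((u, v), float("nan"))


users = ["a", "b", "c", "d"]
print(nearest_n_neighborhood("a", users, toy_similarity, 2))   # ['b', 'd']
print(threshold_neighborhood("a", users, toy_similarity, 0.8))  # ['b']
```

The N-nearest version always returns up to N users however weakly they match, while the threshold version can return many users or none, which is why the two are evaluated separately.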
5
SIMILARITIES
Euclidean distance similarity: 1/(1+d), where d is the distance between two users
Co-occurrence similarity: explained in previous presentations
Tanimoto coefficient: ignores preference values and only considers whether a user has a preference (items in common divided by items preferred by either user)
Log-likelihood similarity: based on the number of items in common, but expresses how unlikely it is for two users to have that much overlap, given how many items each user has preferences for
Pearson correlation similarity: a number between -1 and 1 that measures the tendency of two series of numbers, paired up, to move together. Highly correlated users have a similarity close to 1; opposite tastes give a similarity close to -1
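The definitions above can be written out directly. This is a minimal sketch following the slide's formulas, not Mahout's classes, which differ in details such as how missing data is handled. The log-likelihood version computes Dunning's log-likelihood ratio on the 2x2 table of items rated by both, one, or neither user; mapping it into [0, 1) with 1 - 1/(1 + LLR) is an assumption of this sketch. Each user is assumed to be a dict of item to rating.

```python
# Sketches of the user-user similarities listed above (illustrative only).
# Assumption: each user is a dict {item_id: rating}.
import math


def euclidean_similarity(a, b):
    """1 / (1 + d), where d is the Euclidean distance over co-rated items."""
    common = a.keys() & b.keys()
    if not common:
        return float("nan")
    d = math.sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + d)


def tanimoto_similarity(a, b):
    """Ignores rating values: |items in common| / |items rated by either|."""
    union = len(a.keys() | b.keys())
    return len(a.keys() & b.keys()) / union if union else float("nan")


def pearson_similarity(a, b):
    """Correlation of the two users' ratings on co-rated items, in [-1, 1]."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return float("nan")
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def _x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def _entropy(*counts):
    """Unnormalized Shannon entropy: N log N - sum(c log c)."""
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def loglikelihood_similarity(a, b, num_items):
    """Log-likelihood ratio of the overlap table, mapped into [0, 1)."""
    k11 = len(a.keys() & b.keys())        # rated by both
    k12 = len(a) - k11                    # rated by a only
    k21 = len(b) - k11                    # rated by b only
    k22 = num_items - k11 - k12 - k21     # rated by neither
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    llr = max(0.0, 2.0 * (row + col - mat))
    return 1.0 - 1.0 / (1.0 + llr)


u1 = {1: 5.0, 2: 3.0, 3: 4.0}
u2 = {1: 4.0, 2: 2.0, 3: 3.0}   # same taste as u1, shifted down by one
u3 = {1: 1.0, 4: 2.0}
```

With these toy users, Pearson rates u1 and u2 as perfectly correlated even though every rating differs, while Euclidean penalizes the constant offset; Tanimoto and log-likelihood look only at which items overlap.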
6
THE DATASET
228,570 users, 21,025 games, 463,669 reviews
The dataset contained excess information. Stanford provided a Python script to parse the data, but it did not strip enough.
Modified the Python script to remove everything except the user ID, product ID, and review score
Eliminated unknown user names
Used gedit to remove some other excess information
Wrote a C++ program to convert the user and product IDs into numerical values, since Mahout expects numeric (long) IDs
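The cleanup steps above can be sketched in one place. This is not the author's script or C++ converter; it is a hypothetical stand-in that assumes Stanford-style records of blank-line-separated `key: value` lines (keys such as `product/productId`, `review/userId`, `review/score`), assumes anonymous reviewers carry the user ID `unknown`, and numbers IDs in order of first appearance. It writes `userID,itemID,score` rows, the comma-separated layout Mahout's file-based data model reads.

```python
# Sketch: reduce review records to numeric "userID,itemID,score" rows.
# Assumptions (not from the slides): record keys below, "unknown" marks anonymous
# users, and IDs are numbered in order of first appearance.
import csv
import io
import itertools

REQUIRED = {"review/userId", "product/productId", "review/score"}


def parse_reviews(lines):
    """Yield (user_id, product_id, score) for each complete, known-user record."""
    record = {}
    for line in itertools.chain(lines, [""]):  # trailing "" flushes the last record
        line = line.strip()
        if not line:
            if REQUIRED.issubset(record) and record["review/userId"] != "unknown":
                yield (record["review/userId"], record["product/productId"],
                       float(record["review/score"]))
            record = {}
            continue
        key, _, value = line.partition(": ")
        record[key] = value


def to_numeric(rows):
    """Replace string IDs with consecutive integers starting at 1."""
    user_ids, item_ids = {}, {}
    for user, item, score in rows:
        uid = user_ids.setdefault(user, len(user_ids) + 1)
        iid = item_ids.setdefault(item, len(item_ids) + 1)
        yield uid, iid, score


sample = """product/productId: B000X
review/userId: A1
review/score: 5.0

product/productId: B000Y
review/userId: unknown
review/score: 3.0

product/productId: B000Y
review/userId: A2
review/score: 4.0
""".splitlines()

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(to_numeric(parse_reviews(sample)))
print(out.getvalue(), end="")  # 1,1,5.0 then 2,2,4.0
```

To map recommendations back to real Amazon IDs, the `user_ids` and `item_ids` dictionaries would also need to be saved.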
7
USER-BASED NEAREST-N RECOMMENDER EVALUATION
Similarity       n=1   n=2    n=4    n=8    n=16   n=32   n=64   n=128
Euclidean        NaN   0.205  0.284  0.361  0.498  0.542  0.604  0.646
Pearson          NaN   0.799  0.868  0.886  0.878  0.904  0.960  0.989
Log-likelihood   NaN   0.526  0.771  0.769  0.766  0.808  0.784  0.718
Tanimoto         NaN   0.723  0.955  0.826  0.792  0.807  0.822  0.755
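The slides do not say which accuracy metric produced these scores. Mahout 0.9 ships evaluators such as `AverageAbsoluteDifferenceRecommenderEvaluator` and `RMSRecommenderEvaluator`; as a hedged illustration of the general hold-out idea only, the sketch below trains on a random fraction of the ratings and reports the average absolute difference between held-out ratings and estimates (lower is better). The split fraction, the seed, and the `estimate(train, user, item)` signature are assumptions.

```python
# Hold-out evaluation sketch: average absolute difference (lower is better).
# Assumptions: ratings as {user: {item: score}}; estimate(train, user, item)
# returns a float or NaN; the training fraction and seed are arbitrary.
import math
import random


def split_ratings(ratings, training_fraction, seed=0):
    """Randomly assign each rating to the training dict or the held-out list."""
    rng = random.Random(seed)
    train, test = {}, []
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            if rng.random() < training_fraction:
                train.setdefault(user, {})[item] = score
            else:
                test.append((user, item, score))
    return train, test


def average_absolute_difference(ratings, estimate, training_fraction=0.7, seed=0):
    train, test = split_ratings(ratings, training_fraction, seed)
    errors = []
    for user, item, actual in test:
        if user not in train:
            continue  # no training data for this user
        guess = estimate(train, user, item)
        if not math.isnan(guess):  # unestimable pairs are skipped
            errors.append(abs(guess - actual))
    return sum(errors) / len(errors) if errors else float("nan")


# Toy check: an estimator that always guesses 3.0 against all-4.0 ratings.
toy = {u: {i: 4.0 for i in range(20)} for u in range(5)}
score = average_absolute_difference(toy, lambda train, user, item: 3.0, 0.5)
print(score)  # 1.0
```

When no held-out rating can be estimated at all, this sketch returns NaN rather than a number.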
8
USER-BASED NEIGHBOR THRESHOLD RECOMMENDER EVALUATION
Similarity       t=0.95  t=0.90  t=0.85  t=0.80  t=0.75  t=0.70
Euclidean        0.503, 0.504 (columns not specified)
Pearson          0.689, 0.665, 0.639, 0.629, 0.703 (columns not specified)
Log-likelihood   0.801   0.779   0.791   0.796   0.790   0.796
Tanimoto         NaN
9
ITEM-BASED RECOMMENDER EVALUATION
Similarity       Score
Euclidean        0.786
Pearson          0.944
Log-likelihood   0.789
Tanimoto         0.783
10
HADOOP
Distributed file system (HDFS) plus MapReduce processing
Difficult to set up without an easy-to-understand tutorial
Got it working on my virtual machine
Couldn't get Mahout to work with Hadoop as a single-node cluster (java.lang.ClassNotFoundException)
11
AMAZON WEB SERVICES
Provides Elastic MapReduce (EMR) clusters
Pre-installed with Mahout and Hadoop
Used 1 master node and 3 slave nodes
Used the AWS Command Line Interface
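The slides do not show the commands that were run. As a hedged sketch, once such a cluster is up, Mahout 0.9's distributed item-based recommender (`org.apache.mahout.cf.taste.hadoop.item.RecommenderJob`, exposed through the `mahout recommenditembased` driver) can be launched from the master node on a `userID,itemID,score` text file. The HDFS paths, the similarity choice, and the number of recommendations below are placeholders, and the helper names are hypothetical.

```python
# Sketch: assemble (and optionally run) the Mahout distributed item-based job.
# Assumptions: hypothetical HDFS paths and defaults; run on the EMR master node.
import shlex
import subprocess

# Friendly names -> values accepted by --similarityClassname.
SIMILARITIES = {
    "euclidean": "SIMILARITY_EUCLIDEAN_DISTANCE",
    "pearson": "SIMILARITY_PEARSON_CORRELATION",
    "loglikelihood": "SIMILARITY_LOGLIKELIHOOD",
    "tanimoto": "SIMILARITY_TANIMOTO_COEFFICIENT",
    "cooccurrence": "SIMILARITY_COOCCURRENCE",
}


def item_recommender_command(input_path, output_path, similarity="loglikelihood",
                             num_recommendations=10):
    """Build the argument list for `mahout recommenditembased`."""
    return [
        "mahout", "recommenditembased",
        "--input", input_path,
        "--output", output_path,
        "--similarityClassname", SIMILARITIES[similarity],
        "--numRecommendations", str(num_recommendations),
    ]


def run_on_master(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


cmd = item_recommender_command("hdfs:///user/hadoop/reviews.csv",
                               "hdfs:///user/hadoop/recommendations")
run_on_master(cmd)  # dry run: prints the command line only
```

Keeping `dry_run=True` as the default makes it easy to review the exact command before submitting a job that occupies the cluster for several minutes.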
12
AWS RECOMMENDER
Took roughly 10-20 minutes to produce all of the recommendations
Used the item-based recommender, since Mahout has no distributed GenericUserBasedRecommender
Generated recommendations for the users
Used a Python-based web server to display recommendations: enter a user ID and it returns that user's recommendations
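A minimal sketch of such a lookup service, using only Python's standard library. It assumes the job's output files were copied locally and that each line has the form `userID<TAB>[itemID:score,...]` (the text layout written by Mahout's `RecommenderJob`); the query parameter name `user` and the JSON response are hypothetical choices, not the author's server.

```python
# Sketch of a tiny web lookup over the recommender's output (illustrative).
# Assumptions: lines like "userID\t[itemID:score,...]"; the "user" query
# parameter and the JSON response format are hypothetical.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def parse_recommendations(lines):
    """Map user ID -> list of (item ID, estimated score), in file order."""
    recs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, _, payload = line.partition("\t")
        body = payload.strip("[]")
        pairs = body.split(",") if body else []
        recs[int(user)] = [(int(i), float(s)) for i, s in (p.split(":") for p in pairs)]
    return recs


def make_handler(recs):
    class RecommendationHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            try:
                user = int(query["user"][0])
            except (KeyError, ValueError):
                self.send_error(400, "expected ?user=<numeric id>")
                return
            body = json.dumps(recs.get(user, [])).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
    return RecommendationHandler


def serve(recs, port=8000):
    """Blocks forever; e.g. open http://localhost:8000/?user=1."""
    HTTPServer(("", port), make_handler(recs)).serve_forever()


sample = ["1\t[104:4.5,103:3.9]", "2\t[101:4.2]"]
recs = parse_recommendations(sample)
```

Calling `serve(recs)` starts the server. The IDs it accepts and returns are the numeric IDs produced during preprocessing, so showing real game titles would also require the saved ID mapping.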
13
FUTURE WORK
Attempt to use parallel ALS (alternating least squares) recommendations, which should provide more accurate results than the item-based recommender
Code available upon request, along with the AWS Command Line commands
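ALS factorizes the sparse user-item rating matrix into low-rank user and item factor vectors by alternately solving a small regularized least-squares problem for each user (items held fixed) and then for each item (users held fixed). Mahout distributes this as `ParallelALSFactorizationJob`; the sketch below is only a single-machine illustration of the same core update. Rank, lambda, iteration count, and the toy ratings are arbitrary assumptions, and the regularizer scaled by each row's rating count follows the ALS-WR style.

```python
# Minimal single-machine ALS sketch (illustrative, not Mahout's parallel job).
# Assumptions: ratings as (user, item, score) triples; rank, lam, and the
# iteration count are arbitrary; regularization is lam * (number of ratings).
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def update(fixed, observed, rank, lam):
    """Regularized least-squares factor for one user (or item) given the other side."""
    a = [[lam * len(observed) * (i == j) for j in range(rank)] for i in range(rank)]
    b = [0.0] * rank
    for other, score in observed:
        v = fixed[other]
        for i in range(rank):
            b[i] += score * v[i]
            for j in range(rank):
                a[i][j] += v[i] * v[j]
    return solve(a, b)


def als(triples, rank=2, lam=0.05, iterations=15, seed=0):
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in triples:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    users = {u: [rng.random() for _ in range(rank)] for u in by_user}
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    for _ in range(iterations):
        users = {u: update(items, obs, rank, lam) for u, obs in by_user.items()}
        items = {i: update(users, obs, rank, lam) for i, obs in by_item.items()}
    return users, items


def predict(users, items, u, i):
    return sum(x * y for x, y in zip(users[u], items[i]))


triples = [(0, 0, 5.0), (0, 1, 3.0), (0, 2, 4.0),
           (1, 0, 4.0), (1, 1, 2.0), (1, 3, 5.0),
           (2, 1, 1.0), (2, 2, 2.0), (2, 3, 4.0)]
users, items = als(triples)
rmse = (sum((predict(users, items, u, i) - r) ** 2 for u, i, r in triples)
        / len(triples)) ** 0.5
print(round(predict(users, items, 0, 3), 2))  # estimate for an unrated pair
```

Each per-user and per-item solve is independent of the others in the same half-step, which is what makes the method straightforward to parallelize across a Hadoop cluster.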