Published by Timothy Booth. Modified over 8 years ago.
1
Netflix Prize: Predicting Ratings
2
Data
mv_00(movieID).txt:
–First line: the movieID followed by a colon, e.g. 1:
–Then one line per rating: userID (1-2,649,429), rating (1-5)
–Over 17,000 movie txt files
–Over 400,000 userIDs
–Two gigs zipped
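To make the file layout concrete, here is a minimal Java sketch of a parser for one movie file. It assumes the first non-empty line is the movieID followed by a colon and that every later line is comma-separated with the userID first and the rating second; any further fields on a line are ignored. The class name MovieFileParser and the Rating holder are hypothetical, not from the slides.

```java
import java.util.ArrayList;
import java.util.List;

final class MovieFileParser {
    /** One (movieID, userID, rating) triple; a hypothetical holder type. */
    static final class Rating {
        final int movieId;
        final int userId;
        final int rating;

        Rating(int movieId, int userId, int rating) {
            this.movieId = movieId;
            this.userId = userId;
            this.rating = rating;
        }
    }

    /** Parses the lines of one movie file into rating triples. */
    static List<Rating> parse(List<String> lines) {
        List<Rating> out = new ArrayList<>();
        int movieId = -1; // set by the "movieID:" header line
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.endsWith(":")) {
                // Header line such as "1:"
                movieId = Integer.parseInt(line.substring(0, line.length() - 1));
            } else {
                // Assumed layout: userID,rating[,anything else]
                String[] fields = line.split(",");
                int userId = Integer.parseInt(fields[0].trim());
                int rating = Integer.parseInt(fields[1].trim());
                out.add(new Rating(movieId, userId, rating));
            }
        }
        return out;
    }
}
```

In practice the lines could come from java.nio.file.Files.readAllLines for each of the movie files.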
3
Overall Plan
Compute user similarity using:
–termFrequency: # of movies in common
–documentFrequency: 1/|rating 1 – rating 2|
tfdf = (# of movies in common) * 1/|rating 1 – rating 2|
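The tfdf score can be sketched in Java as follows. The slide does not say how the rating gap is combined when two users share several movies, or what happens when their ratings are identical, so this sketch makes two assumptions: the gap is the mean absolute difference over the shared movies, and it is floored at a hypothetical MIN_DIFF to avoid dividing by zero. Each user is represented as a map from movieID to rating, which is also an assumption.

```java
import java.util.Map;

final class TfdfSimilarity {
    // Assumed floor for the rating gap so identical ratings do not divide by zero.
    static final double MIN_DIFF = 0.5;

    /** tfdf = (# of movies in common) * 1 / (mean |rating 1 - rating 2|). */
    static double tfdf(Map<Integer, Integer> userA, Map<Integer, Integer> userB) {
        int common = 0;
        double totalDiff = 0.0;
        for (Map.Entry<Integer, Integer> entry : userA.entrySet()) {
            Integer other = userB.get(entry.getKey());
            if (other != null) {
                common++;
                totalDiff += Math.abs(entry.getValue() - other);
            }
        }
        if (common == 0) {
            return 0.0; // no shared movies, so no evidence of similarity
        }
        double meanDiff = totalDiff / common;
        return common * (1.0 / Math.max(meanDiff, MIN_DIFF));
    }
}
```

For example, two users who share two movies with rating gaps of 1 and 2 get 2 * 1/1.5, roughly 1.33, under these assumptions.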
4
Plan 1
Store it all in memory (haha) in Java
Store a User class with:
–userID
–Array of Movie objects, each with: movieID, rating
Then have a matrix of users, each with an array of its top similar users (by tfdf)
Problem 1 - Memory issues
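A minimal Java sketch of the in-memory layout described for Plan 1 might look like this. The field names and the ratingFor helper are assumptions; the slide only names the User class, the userID, and an array of movie entries holding a movieID and a rating. Allocating one object per rating for the whole dataset is a plausible source of the memory issues, though the slide does not say so.

```java
final class User {
    /** One rated movie, as listed on the slide: movieID and rating. */
    static final class Movie {
        final int movieId;
        final int rating; // 1-5

        Movie(int movieId, int rating) {
            this.movieId = movieId;
            this.rating = rating;
        }
    }

    final int userId;
    final Movie[] movies;

    User(int userId, Movie[] movies) {
        this.userId = userId;
        this.movies = movies;
    }

    /** Hypothetical helper: returns the user's rating for a movie, or 0 if unrated. */
    int ratingFor(int movieId) {
        for (Movie m : movies) {
            if (m.movieId == movieId) {
                return m.rating;
            }
        }
        return 0;
    }
}
```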
5
Plan 2*
Step 1: Store in text files on the hard drive in Java
–Text file for each user
Step 2: Compute similarity (tfdf)
–Text file of the top ten users for each user
Step 3: Predictions
–Run through the two directories of text files to compute an average movie rating prediction
Problem 2 - Very slow:
–Step 1: 3 days – ~5000 movie text files currently
–Step 2: 1 user every 35 mins | 1 user every 5 mins
–Step 3: ~10 minutes currently
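The prediction step can be sketched in Java as an average over the similar users' ratings. This is only an illustration: it assumes the prediction is the unweighted mean of the ratings that a user's top similar users gave the movie, and that a caller-supplied fallback is used when none of them rated it. The class name and the map-based representation (movieID to rating, already loaded from the text files) are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class NeighbourAveragePredictor {
    /**
     * Predicts a rating for movieId as the mean of the ratings given by the
     * similar users, or returns fallback if none of them rated the movie.
     */
    static double predict(List<Map<Integer, Integer>> similarUsersRatings,
                          int movieId, double fallback) {
        long sum = 0;
        int count = 0;
        for (Map<Integer, Integer> ratings : similarUsersRatings) {
            Integer rating = ratings.get(movieId);
            if (rating != null) {
                sum += rating;
                count++;
            }
        }
        return count == 0 ? fallback : (double) sum / count;
    }
}
```

A fallback of 3.0 would match the "predicting everything 3.0" baseline reported in the results.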
6
Plan 3
Step 1: Store the text files' data in a database using PHP
–Table: userID | movieID | rating
–Primary keys: userID, movieID
Step 2: Compute similarity
–Table: userID | 1st userID | 2nd userID | etc.
–Primary key: userID
Step 3: Predictions
Problem 3 - Very slow:
–Step 1: 4 days – 7000 movie text files currently
–Step 2: n/a
–Step 3: n/a
7
Results
Predicting everything 3.0:
–RMSE = 1.3149
Similarities I have so far:
–RMSE = 1.3149 | 384 users
–RMSE = 1.3149 | 575 users
http://www.netflixprize.com/leaderboard
–Grand Prize RMSE = 0.8563
RMSE:
–sqrt(avg((actual_rating - predicted_rating) * (actual_rating - predicted_rating)))
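The RMSE formula translates directly into a few lines of Java. This is a minimal sketch; the class name and the use of parallel arrays for actual and predicted ratings are assumptions.

```java
final class Rmse {
    /** sqrt(avg((actual - predicted) * (actual - predicted))) over paired ratings. */
    static double rmse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sumSquares += diff * diff;
        }
        return Math.sqrt(sumSquares / actual.length);
    }
}
```

For instance, predicting 3.0 for actual ratings of 1, 2, 3, 4 and 5 gives squared errors of 4, 1, 0, 1 and 4, so the RMSE is sqrt(10 / 5) = sqrt(2).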
8
Future Idea