1
PolyFlix Recommendation System
Trevor Koritza, Gabriel De La Calzada
2
Purpose
- Create a recommendation system for movies
  - Use the existing Netflix dataset available online
- Two goals
  - Learn how recommendation systems work
  - Win 1 million dollars
3
Requirements
- Recommend movies which you would probably rate 4 or 5 stars
- Fast
- Scalable
- Low Space Requirements
- Make better recommendations
4
The Netflix Dataset
- Total movies: 17,770
- Total number of ratings: 100,480,507
- Total number of unique users: 480,189
- Overall: 4.5 Gigabytes of information
5
DBMS Choice
- MySQL
  - Pros: Feature rich, scalable, concurrency
  - Cons: administrative overhead
- SQLite
  - Pros: Simplicity
  - Cons: Simplicity
6
Database Schema

    CREATE TABLE thresholds (
        movie_id integer(2) primary key,
        t1 real, t2 real, t3 real, t4 real, t5 real
    );

    CREATE TABLE user_ratings (
        user_id integer(3),
        movie_id integer(2),
        rating integer(1),
        constraint pk_primary_key primary key (user_id, movie_id)
    );

    CREATE TABLE weights (
        to_id integer(2),
        from_id integer(2),
        weight1 real, weight2 real, weight3 real, weight4 real, weight5 real,
        constraint pk_primary_key primary key (to_id, from_id)
    );
7
Design Issues
- Massive dataset
  - The weights table grows quadratically as we increase the number of movies
  - (17770^2)/2 ~ 160 million connections
  - About 100 million user ratings
- How do we process this information in a timely manner?
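To make the size of the weights table concrete, here is a small Python sketch of the n^2/2 estimate from this slide. The function name `pair_count` is hypothetical, and the formula is the slide's approximation rather than an exact count of stored rows.

```python
def pair_count(num_movies: int) -> int:
    """Approximate number of movie-to-movie connections: n^2 / 2."""
    return num_movies ** 2 // 2


# Subset sizes used in the results slides, plus the full catalogue.
for n in (250, 1000, 2500, 17770):
    print(f"{n:>6} movies -> {pair_count(n):>12,} connections")
```

Doubling the number of movies roughly quadruples the number of connections, so the cost of storing and updating weights grows much faster than the catalogue itself.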
8
Implementation General
- Get all thresholds from the database.
- For each movie i
  - Get all (rating, customer) pairs for movie i from the database
  - Get all weights for movie i from the database
  - For each (rating, customer) pair
    - If this is the first time seeing the customer
      - Retrieve all of the customer's ratings from the database
    - Update weights based on the rating and the customer's previous ratings.
  - Write weights back to the database
- Write thresholds to the database
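The loop on this slide could be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' code: it assumes SQLite through the standard `sqlite3` module (the slides list both MySQL and SQLite), it assumes the customer-history cache is kept across movies, and the names `train_one_iteration` and `update_weights` are hypothetical. The actual update step is passed in as a callback because this slide does not define it.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List

Weights = Dict[int, List[float]]     # from_id -> [weight1..weight5]
Thresholds = Dict[int, List[float]]  # movie_id -> [t1..t5]
History = Dict[int, int]             # movie_id -> rating, for one customer

# Hypothetical callback signature; it mutates weights/thresholds in place.
UpdateFn = Callable[[int, int, History, Weights, Thresholds], None]


def train_one_iteration(conn: sqlite3.Connection,
                        movie_ids: Iterable[int],
                        update_weights: UpdateFn) -> None:
    # Get all thresholds from the database.
    thresholds: Thresholds = {
        movie_id: t
        for movie_id, *t in conn.execute(
            "SELECT movie_id, t1, t2, t3, t4, t5 FROM thresholds")
    }
    # Assumption: a customer's ratings are fetched once and reused.
    history_cache: Dict[int, History] = {}

    for movie_id in movie_ids:
        # All (rating, customer) pairs for this movie.
        pairs = conn.execute(
            "SELECT rating, user_id FROM user_ratings WHERE movie_id = ?",
            (movie_id,)).fetchall()
        # All weights pointing at this movie.
        weights: Weights = {
            from_id: w
            for from_id, *w in conn.execute(
                "SELECT from_id, weight1, weight2, weight3, weight4, weight5 "
                "FROM weights WHERE to_id = ?", (movie_id,))
        }
        for rating, user_id in pairs:
            if user_id not in history_cache:  # first time seeing customer
                history_cache[user_id] = dict(conn.execute(
                    "SELECT movie_id, rating FROM user_ratings "
                    "WHERE user_id = ?", (user_id,)))
            update_weights(movie_id, rating, history_cache[user_id],
                           weights, thresholds)
        # Write this movie's weights back (SQLite-specific upsert syntax).
        conn.executemany(
            "INSERT OR REPLACE INTO weights VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(movie_id, from_id, *w) for from_id, w in weights.items()])

    # Write thresholds back once, at the end.
    conn.executemany(
        "INSERT OR REPLACE INTO thresholds VALUES (?, ?, ?, ?, ?, ?)",
        [(movie_id, *t) for movie_id, t in thresholds.items()])
    conn.commit()
```

Reading and writing weights one movie at a time keeps only a single movie's weight rows in memory, at the cost of one read and one write round trip per movie.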
9
Implementation Weights Updates
10
Implementation Weights Updates Equation
- ∆W_{i1} = rate * (A_1 - E_1) * I_1
- ∆W_{i2} = rate * (A_2 - E_2) * I_2
- ∆W_{i3} = rate * (A_3 - E_3) * I_3
- ∆W_{i4} = rate * (A_4 - E_4) * I_4
- ∆W_{i5} = rate * (A_5 - E_5) * I_5
- Where E is the estimated rating (0 or 1) and A is the actual rating (0 or 1)
- The I values are relational constants, relating all of the ∆Ws to one another.
11
Implementation Weights Updates Equation Example
- Actual Rating = 4
  - A_1 = 0, A_2 = 0, A_3 = 0, A_4 = 1, A_5 = 0
- Estimated Rating = 2
  - E_1 = 1, E_2 = 1, E_3 = 1, E_4 = 0, E_5 = 0
- Relational Constants
  - I_1 = 0.25, I_2 = 0.5, I_3 = 0.75, I_4 = 1.0, I_5 = 0.75
- Rate = 0.06
- ∆W_{i1} = 0.06 * (0 - 1) * 0.25 = -0.015
- ∆W_{i2} = 0.06 * (0 - 1) * 0.5 = -0.03
- ∆W_{i3} = 0.06 * (0 - 1) * 0.75 = -0.045
- ∆W_{i4} = 0.06 * (1 - 0) * 1.0 = +0.06
- ∆W_{i5} = 0.06 * (0 - 0) * 0.75 = +0.0
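The update rule and the worked example can be checked with a few lines of Python. This is a sketch only: the function name `weight_deltas` and its argument names are hypothetical, and the A, E and I vectors are taken as given inputs because the slides do not say how E or the relational constants are derived.

```python
from typing import List, Sequence


def weight_deltas(actual: Sequence[int],
                  estimated: Sequence[int],
                  relational: Sequence[float],
                  rate: float) -> List[float]:
    """Return rate * (A_k - E_k) * I_k for each rating level k."""
    if not (len(actual) == len(estimated) == len(relational)):
        raise ValueError("A, E and I must have the same length")
    return [rate * (a - e) * i
            for a, e, i in zip(actual, estimated, relational)]


# Values from the example slide: actual rating 4, estimated rating 2.
deltas = weight_deltas(
    actual=[0, 0, 0, 1, 0],
    estimated=[1, 1, 1, 0, 0],
    relational=[0.25, 0.5, 0.75, 1.0, 0.75],
    rate=0.06,
)
print([round(d, 3) for d in deltas])
```

Rounded to three decimals, the result matches the five ∆W values on the slide. Levels where A and E agree contribute no change, and the relational constants scale the size of each correction.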
12
Evaluate - RMSE
- RMSE = Root Mean Square Error
- RMSE = sqrt((∑ (actual - expected)^2) / num)
- Sum the squares of the differences between the expected rating and the actual.
- Take the average of those values.
- Then take the square root.
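The three steps above translate directly into Python. This is a minimal sketch using only the standard library; the function name `rmse` and the sample ratings are illustrative and are not taken from the Netflix data.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], expected: Sequence[float]) -> float:
    """Root mean square error between actual and expected ratings."""
    if len(actual) != len(expected) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the squares of the differences, average them, take the root.
    squared = [(a - e) ** 2 for a, e in zip(actual, expected)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative ratings only.
print(rmse([4, 2, 5], [3, 4, 5]))
```

Because the errors are squared before averaging, a single prediction that is off by two stars costs as much as four predictions that are each off by one.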
13
Results
- Cinematch RMSE: 0.9514
- Current Leader RMSE: 0.8613
- Million Dollar RMSE: 0.8563
14
Results
- Our RMSE
  - 250 Movies
    - Probe: NA
    - Full: 1.1980
  - 1000 Movies
    - Probe: 1.1894
    - Full (# ratings >= 14): 0.8774
  - 2500 Movies
    - Probe: 1.0614
    - Full (# ratings >= 14): 0.8378
- Time (mpi = minutes per iteration)
  - 250 Movies: 2 mpi
  - 1000 Movies: 6 mpi
  - 2500 Movies: 45 mpi
15
Future Work
- Faster read/writes to database.
- Convert to lower overhead language.
- Look into different relational constants.
16
Questions?