PolyFlix Recommendation System
Trevor Koritza, Gabriel De La Calzada


Purpose
Create a recommendation system for movies
– Use the existing Netflix dataset available online
Two goals
– Learn how recommendation systems work
– Win 1 million dollars

Requirements
Recommend movies which you would probably rate 4 or 5 stars
Fast
Scalable
Low space requirements
Make better recommendations

The Netflix Dataset
Total movies: 17,770
Total number of ratings: 100,480,507
Total number of unique users: 480,189
Overall: 4.5 gigabytes of information

DBMS Choice
MySQL
– Pros: feature rich, scalable, concurrency
– Cons: administrative overhead
SQLite
– Pros: simplicity
– Cons: simplicity

Database Schema

    CREATE TABLE thresholds (
        movie_id integer(2) primary key,
        t1 real, t2 real, t3 real, t4 real, t5 real
    );

    CREATE TABLE user_ratings (
        user_id integer(3),
        movie_id integer(2),
        rating integer(1),
        constraint pk_primary_key primary key (user_id, movie_id)
    );

    CREATE TABLE weights (
        to_id integer(2),
        from_id integer(2),
        weight1 real, weight2 real, weight3 real, weight4 real, weight5 real,
        constraint pk_primary_key primary key (to_id, from_id)
    );
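As a minimal sketch of how this schema might be exercised, the snippet below creates the three tables in an in-memory SQLite database using Python's standard `sqlite3` module and runs the kind of per-movie lookup the training loop needs. The sample rows, the `open_db` helper, and the choice to drop the constraint names are assumptions made for illustration, not part of the original project.

```python
import sqlite3

# Schema from the slide; the named constraints are left unnamed here
# (an illustrative simplification, the column definitions are unchanged).
SCHEMA = """
CREATE TABLE thresholds (
    movie_id integer(2) primary key,
    t1 real, t2 real, t3 real, t4 real, t5 real
);
CREATE TABLE user_ratings (
    user_id integer(3),
    movie_id integer(2),
    rating integer(1),
    primary key (user_id, movie_id)
);
CREATE TABLE weights (
    to_id integer(2),
    from_id integer(2),
    weight1 real, weight2 real, weight3 real, weight4 real, weight5 real,
    primary key (to_id, from_id)
);
"""

def open_db(path=":memory:"):
    """Hypothetical helper: open a database and create the three tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

conn = open_db()

# Made-up sample ratings: (user_id, movie_id, rating).
conn.executemany(
    "INSERT INTO user_ratings VALUES (?, ?, ?)",
    [(1, 10, 4), (1, 11, 5), (2, 10, 3)],
)

# All (user, rating) pairs for one movie, as needed once per movie in training.
rows = conn.execute(
    "SELECT user_id, rating FROM user_ratings WHERE movie_id = ? ORDER BY user_id",
    (10,),
).fetchall()
print(rows)  # [(1, 4), (2, 3)]
```

Because `movie_id` is the second column of the `user_ratings` primary key, a per-movie lookup like this cannot use the primary-key index efficiently; a separate index on `movie_id` would be one option to consider at full dataset size.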

Design Issues
Massive dataset
– The weights table grows quadratically as we increase the number of movies
– (17770^2)/2 ~ 160 million connections
– ~100 million user ratings
How do we process this information in a timely manner?
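The connection count on this slide can be checked with a few lines of Python. The sketch assumes one weights row per unordered pair of distinct movies, which gives n(n-1)/2 and is very close to the slide's n^2/2 approximation; `pair_count` is a name introduced here for illustration.

```python
NUM_MOVIES = 17_770  # total movies in the Netflix dataset, from the slides

def pair_count(n_movies: int) -> int:
    """Number of unordered pairs of distinct movies."""
    return n_movies * (n_movies - 1) // 2

# Subset sizes used in the results slides, plus the full dataset.
for n in (250, 1000, 2500, NUM_MOVIES):
    print(f"{n:>6} movies -> {pair_count(n):>12,} connections")
```

Doubling the number of movies roughly quadruples the number of connections, which is why the per-iteration time climbs so quickly as the movie subset grows.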

Implementation General
Get all thresholds from the database.
For each movie i
– Get all (rating, customer) pairs for movie i from the database
– Get all weights for movie i from the database
– For each (rating, customer) pair
    – If this is the first time seeing the customer, retrieve all of the customer's ratings from the database
    – Update weights based on the rating and the customer's previous ratings
– Write weights back to the database
Write thresholds to the database.
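The loop structure above can be sketched in Python as follows. This is only an outline under stated assumptions: the database is replaced by an in-memory dictionary, thresholds are omitted, and `count_co_ratings` is a placeholder update rule, not the authors' weight update. All function names are hypothetical. The point it illustrates is the per-customer cache: each customer's full rating list is fetched once, the first time that customer is seen.

```python
from collections import defaultdict

def run_iteration(movie_ids, fetch_movie_ratings, fetch_user_ratings, update):
    """One pass over all movies, following the loop on the slide."""
    weights = defaultdict(dict)  # weights[to_id][from_id] -> value
    user_cache = {}              # user_id -> {movie_id: rating}
    for movie_id in movie_ids:
        for user_id, rating in fetch_movie_ratings(movie_id):
            if user_id not in user_cache:
                # First time seeing this customer: load all their ratings once.
                user_cache[user_id] = fetch_user_ratings(user_id)
            update(weights[movie_id], movie_id, rating, user_cache[user_id])
    return weights, user_cache

# Toy stand-in for the user_ratings table: (user_id, movie_id) -> rating.
RATINGS = {(1, 10): 4, (1, 11): 5, (2, 10): 3, (2, 12): 1}
fetch_calls = []  # records every per-customer fetch, to show the cache works

def fetch_movie_ratings(movie_id):
    return [(u, r) for (u, m), r in RATINGS.items() if m == movie_id]

def fetch_user_ratings(user_id):
    fetch_calls.append(user_id)
    return {m: r for (u, m), r in RATINGS.items() if u == user_id}

def count_co_ratings(movie_weights, movie_id, rating, previous_ratings):
    """Placeholder update: count how often two movies share a rater."""
    for other_id in previous_ratings:
        if other_id != movie_id:
            movie_weights[other_id] = movie_weights.get(other_id, 0) + 1

weights, cache = run_iteration(
    [10, 11, 12], fetch_movie_ratings, fetch_user_ratings, count_co_ratings
)
print(dict(weights))
print(fetch_calls)  # one fetch per distinct customer
```

With the real dataset the cache holds up to 480,189 customers' ratings, so memory use is the trade-off for avoiding repeated database reads.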

Implementation Weights Updates

Implementation Weights Updates Equation
ΔW_i1 = rate * (A_1 - E_1) * I_1
ΔW_i2 = rate * (A_2 - E_2) * I_2
ΔW_i3 = rate * (A_3 - E_3) * I_3
ΔW_i4 = rate * (A_4 - E_4) * I_4
ΔW_i5 = rate * (A_5 - E_5) * I_5
Where E is the estimated rating (0 or 1) and A is the actual rating (0 or 1)
The I values are relational constants, relating all of the ΔWs to one another.

Implementation Weights Updates Equation Example
Actual Rating = 4
– A_1 = 0, A_2 = 0, A_3 = 0, A_4 = 1, A_5 = 0
Estimated Rating = 2
– E_1 = 1, E_2 = 1, E_3 = 1, E_4 = 0, E_5 = 0
Relational Constants
– I_1 = 0.25, I_2 = 0.5, I_3 = 0.75, I_4 = 1.0, I_5 = 0.75
Rate = 0.06
ΔW_i1 = 0.06 * (0 - 1) * 0.25 = -0.015
ΔW_i2 = 0.06 * (0 - 1) * 0.5 = -0.03
ΔW_i3 = 0.06 * (0 - 1) * 0.75 = -0.045
ΔW_i4 = 0.06 * (1 - 0) * 1.0 = +0.06
ΔW_i5 = 0.06 * (0 - 0) * 0.75 = +0.0
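The update rule and the worked example can be reproduced with a short Python sketch. The function name `weight_deltas` and its argument layout are assumptions for illustration; the rate, the relational constants, and the A and E vectors are the values from the example slide.

```python
RATE = 0.06
# Relational constants I_1..I_5 from the example slide.
RELATIONAL_CONSTANTS = [0.25, 0.5, 0.75, 1.0, 0.75]

def weight_deltas(actual, estimated, constants=RELATIONAL_CONSTANTS, rate=RATE):
    """Return [dW_i1, ..., dW_i5] where dW_ik = rate * (A_k - E_k) * I_k."""
    return [rate * (a - e) * i for a, e, i in zip(actual, estimated, constants)]

# Values from the example: actual rating 4, estimated rating 2.
A = [0, 0, 0, 1, 0]
E = [1, 1, 1, 0, 0]

deltas = weight_deltas(A, E)
print([round(d, 3) for d in deltas])  # [-0.015, -0.03, -0.045, 0.06, 0.0]
```

Where the estimate and the actual indicator agree (here level 5), the difference is zero and the weight is left unchanged.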

Evaluate - RMSE
RMSE = Root Mean Square Error
RMSE = sqrt((∑(actual - expected)^2) / num)
Sum the squares of the differences between the expected rating and the actual.
Take the average of those values.
Then take the square root.
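The three steps above translate directly into Python. This is a minimal sketch with made-up ratings; the function name `rmse` and the sample lists are illustrative, not taken from the project.

```python
import math

def rmse(actual, expected):
    """Root mean square error between two equal-length rating lists."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the squared differences, average them, then take the square root.
    squared_errors = [(a - e) ** 2 for a, e in zip(actual, expected)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings: the errors are 1, 0, 1, -2, so the mean squared error is 1.5.
print(rmse([4, 3, 5, 1], [3, 3, 4, 3]))
```

Because the errors are squared before averaging, a single prediction that is off by two stars costs as much as four predictions that are each off by one.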

Results
Cinematch RMSE
Current Leader RMSE
Million Dollar RMSE

Results
Our RMSE
– 250 Movies
    – Probe - NA
    – Full
– 1000 Movies
    – Probe =
    – Full (# ratings >= 14) =
– 2500 Movies
    – Probe =
    – Full (# ratings >= 14) =
Time
– 250 Movies: 2 mpi
– 1000 Movies: 6 mpi
– 2500 Movies: 45 mpi
mpi = minutes per iteration

Future Work
Faster reads/writes to the database.
Convert to a lower-overhead language.
Look into different relational constants.

Questions?