Published by Rachel Dorsey. Modified over 6 years ago.
1
Movie Advisor M.Sc. Project Doron Harlev Supervisor: Dr. Dana Ron
2
Introduction
Predict a user’s rating on a scale of 1-5
Prediction is based on the user’s ratings, peers’ ratings and item information
Uses the MovieLens database
Belongs to the field of recommender systems, also known as collaborative filtering
Previous works use the Pearson R correlation as a distance metric
Novel approach, “hybrid-genre”: more efficient, and performs better than Pearson R on a diluted database
11/18/2018 Movie Advisor
3
MovieLens database
Freely available on the Internet
100,000 entries on a scale of 1-5
1682 movies, 943 users, sparseness 6% (the fraction of the user-movie matrix that holds a rating)
Mean score 3.53
Mean number of scores per movie: 60
Mean number of scores per user: 106 (min 20)
[Figure: Score Histogram; x-axis: Score]
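The summary figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers; the variable names are illustrative and not part of the original project.

```python
# Figures quoted on the slide.
n_entries = 100_000
n_movies = 1682
n_users = 943

# Fraction of user-movie cells that actually hold a rating.
filled_fraction = n_entries / (n_movies * n_users)
# Average number of ratings per movie and per user.
scores_per_movie = n_entries / n_movies
scores_per_user = n_entries / n_users

print(f"filled fraction:  {filled_fraction:.3f}")
print(f"scores per movie: {scores_per_movie:.1f}")
print(f"scores per user:  {scores_per_user:.1f}")
```

The results (about 6.3%, 59.5 and 106.0) agree with the rounded values of 6%, 60 and 106 on the slide.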
4
Data Sets
Test data set is 5 entries from each of 10% of the users, a total of 470 entries
Base data set is all other entries, a total of 99,530 entries
The same random division is used throughout the presentation
Different instances are examined at the end
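A minimal sketch of such a base/test division, assuming ratings are stored as (user, movie, score) tuples. The function name, the seed and the tuple layout are hypothetical; the slides do not say how the random division was implemented.

```python
import random

def split_base_test(ratings, user_fraction=0.10, per_user=5, seed=0):
    """Split ratings into (base, test).

    ratings: list of (user, movie, score) tuples.
    Assumes every selected user has at least `per_user` ratings
    (the slides quote a minimum of 20 per user).
    """
    rng = random.Random(seed)
    # Group rating indices by user.
    by_user = {}
    for idx, (user, _movie, _score) in enumerate(ratings):
        by_user.setdefault(user, []).append(idx)
    users = sorted(by_user)
    n_test_users = int(len(users) * user_fraction)
    # Move `per_user` random entries of each chosen user into the test set.
    test_idx = set()
    for user in rng.sample(users, n_test_users):
        test_idx.update(rng.sample(by_user[user], per_user))
    base = [r for i, r in enumerate(ratings) if i not in test_idx]
    test = [r for i, r in enumerate(ratings) if i in test_idx]
    return base, test
```

With 943 users, 10% rounds down to 94 users, and 94 x 5 = 470 test entries, which matches the total quoted above.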
5
Evaluation Criteria
Mean Absolute Error (MAE)
  MAE is the mean of |Si - Ri| over the predicted test entries, where Si is the actual score and Ri is the predicted score
  Most widely used in the field
Coverage
  Percentage of movies in the test data set that can be predicted
  MAE usually improves as coverage decreases
In related works, other criteria are shown to produce similar results
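Both criteria can be computed together. The sketch below assumes that an entry with no available prediction is marked None and is left out of the MAE, which is one plausible reading of the slide; the function name is illustrative.

```python
def mae_and_coverage(actual, predicted):
    """Return (MAE, coverage) for parallel lists of scores.

    actual:    actual scores S_i
    predicted: predicted scores R_i, or None where no prediction is possible
    """
    # Keep only the entries for which a prediction exists.
    pairs = [(s, r) for s, r in zip(actual, predicted) if r is not None]
    coverage = len(pairs) / len(actual)
    if not pairs:
        return float("nan"), coverage
    mae = sum(abs(s - r) for s, r in pairs) / len(pairs)
    return mae, coverage
```

For example, `mae_and_coverage([4, 3, 5, 2], [3.5, None, 4.0, 2.0])` averages the errors 0.5, 1.0 and 0.0 over three predicted entries out of four.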
6
Base Algorithms
All Average - predict the average of all entries
Movie Average - predict the movie average
User Average - predict the user average
User Movie Average - use both the movie and the user average, as shown in equation

Method                MAE     Coverage [%]
All Average           1.046   100
Movie Average         0.887   99.8
User Average          0.849
User Movie Average    0.830
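The base predictors can be sketched as follows. The equation for User Movie Average is not reproduced in this transcript, so the combination used here (user mean plus movie mean minus global mean) is an assumption, and all function names are hypothetical.

```python
from collections import defaultdict

def build_averages(base):
    """base: list of (user, movie, score). Returns (all_avg, user_avg, movie_avg)."""
    user_scores, movie_scores = defaultdict(list), defaultdict(list)
    for user, movie, score in base:
        user_scores[user].append(score)
        movie_scores[movie].append(score)
    all_avg = sum(s for _, _, s in base) / len(base)
    user_avg = {u: sum(v) / len(v) for u, v in user_scores.items()}
    movie_avg = {m: sum(v) / len(v) for m, v in movie_scores.items()}
    return all_avg, user_avg, movie_avg

def predict_user_movie(user, movie, all_avg, user_avg, movie_avg):
    """ASSUMED combination: user mean + movie mean - global mean.

    Returns None when the user or movie has no rating in the base set,
    which is why coverage can fall below 100%.
    """
    if user not in user_avg or movie not in movie_avg:
        return None
    return user_avg[user] + movie_avg[movie] - all_avg
```

A movie that appears only in the test set has no movie average, which is consistent with the Movie Average coverage being slightly under 100%.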
7
Pearson R Coefficient
Pearson R correlation coefficient is used as a distance metric between users
Coefficient calculated as shown in equation
Values lie in the range ±1, where the extremes designate strong correlation
[Figure: Typical Histogram, Pearson R Coefficient for user 104]
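The standard Pearson correlation between two users can be sketched as below. The slide's own equation is not reproduced in this transcript; taking each user's mean over the mutually rated movies only is an assumption (some formulations use each user's overall mean instead), and the function name is illustrative.

```python
from math import sqrt

def pearson_r(ratings_a, ratings_b):
    """Pearson correlation over the movies both users rated.

    ratings_a, ratings_b: dict mapping movie -> score.
    Returns None when the coefficient is undefined.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    a = [ratings_a[m] for m in common]
    b = [ratings_b[m] for m in common]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a user who gave identical scores has no defined correlation
    return cov / sqrt(var_a * var_b)
```

Two users whose scores rise and fall together get a value near +1, while opposite tastes give a value near -1.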
8
Pearson R Algorithm
Predicted score is a weighted sum, as shown in equation
Only Mutually Rated Movies (MRM) are taken into account
Yields an MAE of 0.79 and coverage of 99.8% for the base and test data sets
Clearly better than the basic methods
[Figure: Average Number of Mutually Rated Movies]
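One common form of such a weighted sum, used in the collaborative filtering literature, adds the weighted deviations of the peers from their own means to the user's mean. Whether the slide's equation has exactly this form is an assumption, as is the clamping to the 1-5 scale; the function name is hypothetical.

```python
def predict_weighted(user_mean, peers):
    """Weighted-sum prediction for one movie.

    user_mean: mean score of the user being predicted
    peers: list of (weight, peer_score_for_movie, peer_mean), where weight
           is the correlation between the user and that peer
    Returns None when no peer contributes.
    """
    norm = sum(abs(w) for w, _, _ in peers)
    if norm == 0:
        return None
    # Each peer pushes the prediction up or down by its own deviation.
    offset = sum(w * (score - mean) for w, score, mean in peers) / norm
    # ASSUMPTION: keep the prediction inside the 1-5 rating scale.
    return min(5.0, max(1.0, user_mean + offset))
```

A negatively correlated peer who scored a movie below their own mean therefore raises the prediction, just as a positively correlated peer who scored it above their mean does.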
9
Pearson R Enhancements
MRM threshold suggested by Herlocker
MRM threshold modification
Correlation threshold suggested by Shardanand, shown in equation
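A rough sketch of the two kinds of threshold, under stated assumptions. The MRM threshold is modeled here as significance weighting, where a correlation is scaled down when fewer than h movies are mutually rated; the `power` parameter is a hypothetical stand-in for the unspecified modification. The correlation threshold is modeled as a plain cut-off on |r|. The exact equations on the slide are not reproduced in this transcript, so none of this should be read as the author's formulas.

```python
def significance_weight(r, n_mrm, h=50, power=1):
    """Scale correlation r down when fewer than h movies are mutually rated.

    ASSUMPTION: factor = min(1, n_mrm / h) ** power.
    """
    factor = min(1.0, n_mrm / h) ** power
    return r * factor

def passes_correlation_threshold(r, threshold=0.1):
    """ASSUMPTION: keep a peer only if |r| reaches the threshold."""
    return abs(r) >= threshold
```

With h=50, a correlation of 0.8 based on 25 mutually rated movies is reduced to 0.4, while one based on 100 movies is left unchanged.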
10
Pearson R Performance
[Figure: MAE and Coverage vs. Herlocker Threshold (Users TH=3, Pearson TH=0.1); curves: Pearson R Algorithm, User Average]
[Figure: MAE vs. Coverage (Users TH=3, Pearson TH=0.10); curves: H/MRM, (H/MRM)^2, User Average]
11
Mean Square Difference (MSD) Algorithm
Mean Square Difference is used as a distance metric
Calculation shown in equation
Threshold applied as shown
Predictions are calculated as they were for Pearson R
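A minimal sketch of an MSD distance and a thresholded weight. The distance is the mean of squared score differences over mutually rated movies; the weight (L - d) / L for distances below a threshold L is an assumed form, since the slide's equations are not reproduced in this transcript. Function names are illustrative.

```python
def msd(ratings_a, ratings_b):
    """Mean square difference over the movies both users rated.

    ratings_a, ratings_b: dict mapping movie -> score.
    Returns None when the users share no movie.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None
    return sum((ratings_a[m] - ratings_b[m]) ** 2 for m in common) / len(common)

def msd_weight(distance, threshold):
    """ASSUMED weighting: (L - d) / L for d < L, otherwise the peer is dropped."""
    if distance is None or distance >= threshold:
        return None
    return (threshold - distance) / threshold
```

Unlike a correlation, this distance is never negative, so the resulting weights all lie between 0 and 1 and identical ratings give the maximum weight of 1.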
12
MSD Performance
[Figure: MAE vs. Coverage (Users TH=3); curves: MSD Method, User Average]
[Figure: MAE vs. Coverage; curves: Discarded, User Average]
13
Genre Information
Database provides genre information
Each entry may have several genres
19 genres exist in the database
14
Genre Statistics
Figures depict the number of ratings and the average genre score for the entire database
[Figure: Average Number of User Ratings vs. Genre Number]
[Figure: Average Score vs. Genre Number]
15
Base Genre Algorithm
User Genre matrix is the average score for each user and genre
Base algorithm shown in equation
MAE 0.836, coverage 98.7%
Better than the user average prediction (MAE 0.849)
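The user-genre matrix can be built in one pass over the ratings, as sketched below. The base prediction rule is not reproduced in this transcript; averaging the user's genre means over the genres of the target movie is an assumption, and the data layout and function names are hypothetical.

```python
def user_genre_matrix(ratings, movie_genres, n_genres=19):
    """Average score per user and genre.

    ratings: list of (user, movie, score)
    movie_genres: dict mapping movie -> list of genre indices
    Returns dict user -> list of n_genres averages (None where the user
    rated no movie of that genre).
    """
    sums, counts = {}, {}
    for user, movie, score in ratings:
        s = sums.setdefault(user, [0.0] * n_genres)
        c = counts.setdefault(user, [0] * n_genres)
        # A movie with several genres contributes to each of them.
        for g in movie_genres.get(movie, ()):
            s[g] += score
            c[g] += 1
    return {
        u: [tot / cnt if cnt else None for tot, cnt in zip(sums[u], counts[u])]
        for u in sums
    }

def predict_from_genres(g_row, genres):
    """ASSUMED rule: mean of the user's genre averages over the movie's genres."""
    vals = [g_row[g] for g in genres if g_row[g] is not None]
    return sum(vals) / len(vals) if vals else None
```

A movie whose genres the user has never rated yields no prediction, which would explain a coverage slightly below 100%.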
16
Genre Algorithm
Uses matrix G to calculate the MSD distance between users
Pearson R performs poorly on G
Threshold and prediction are the same as for MSD
Much more efficient since the matrix size is reduced (19 columns instead of 1682, 1682/19 ≈ 90)
Results comparable to Pearson R
[Figure: MAE vs. Coverage (Users TH=3); curves: Genre Method, User Average]
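The efficiency argument is easy to see in code: the distance between two users compares at most 19 genre averages instead of up to 1682 movie scores. The sketch assumes that each row of G is a list with None for genres the user never rated; that representation and the function name are illustrative.

```python
def genre_msd(row_a, row_b):
    """Mean square difference between two users' rows of the genre matrix G.

    row_a, row_b: lists of per-genre average scores, None where undefined.
    Returns None when the users share no rated genre.
    """
    diffs = [
        (a - b) ** 2
        for a, b in zip(row_a, row_b)
        if a is not None and b is not None
    ]
    return sum(diffs) / len(diffs) if diffs else None
```

Because nearly every pair of users shares at least a few genres, far fewer pairs end up with an undefined distance than when comparing mutually rated movies.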
17
Hybrid Genre Algorithm
Takes into account peers’ ratings as well as the user’s genre preferences, as shown in equation
Performs consistently better than the genre algorithm at rat=0.65
[Figure: MAE vs. Coverage (Users TH=3, rat=0.65); curves: Hybrid Method, Genre Method]
18
Algorithm Comparison
All methods on the same chart
Hybrid genre performs better at low coverage, while Pearson R performs better at high coverage
[Figure: MAE vs. Coverage; curves: Hybrid Genre, MSD, Pearson R]
19
Database Instantiation
Average of 10 base/test divisions shown
A coverage of 0.8 is the turning point between the performance of Pearson R and hybrid-genre
[Figure: MAE vs. Coverage; curves: Hybrid-Genre, Pearson R]
20
Database Dilution and Instantiation
2/3 of each user’s entries omitted
2 entries from each of 10% of the users used as the test data set
10 instantiations averaged
Hybrid-genre clearly performs better
[Figure: MAE vs. Coverage; curves: Hybrid-Genre, Pearson R]
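The dilution step can be sketched as random per-user subsampling. The slides do not say how the omitted entries were chosen or whether the test entries were drawn before or after dilution, so the random choice, the seed and the function name below are assumptions.

```python
import random

def dilute(ratings, keep_fraction=1/3, seed=0):
    """Keep roughly `keep_fraction` of each user's entries, chosen at random.

    ratings: list of (user, movie, score) tuples.
    """
    rng = random.Random(seed)
    by_user = {}
    for entry in ratings:
        by_user.setdefault(entry[0], []).append(entry)
    kept = []
    for user in sorted(by_user):
        entries = by_user[user]
        # Always keep at least one entry so no user disappears entirely.
        n_keep = max(1, round(len(entries) * keep_fraction))
        kept.extend(rng.sample(entries, n_keep))
    return kept
```

Applied to the full database, this leaves about a third of the 100,000 entries, which roughly triples the sparseness the algorithms have to cope with.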
21
Conclusions
Database, base methods and existing methods presented and analyzed
Novel approach, hybrid-genre, explained and compared
Hybrid-genre is more efficient, and performs better as sparseness is increased
May prove more practical in real-world applications