North Dakota State University Fargo, ND USA




Similar presentations
Oct 14, 2014 Lirong Xia Recommender systems acknowledgment: Li Zhang, UCSC.

Item-based Collaborative Filtering Idea: a user is likely to have the same opinion for similar items [if I like Canon cameras, I might also like Canon.
Recommendations via Collaborative Filtering. Recommendations Relevant for movies, restaurants, hotels…. Recommendation Systems is a very hot topic in.
Data Mining on Streams: We should use runlists for stream data mining (unless there is some spatial structure to the data, of course, then we need to.
Performance Improvement for Bayesian Classification on Spatial Data with P-Trees Amal S. Perera Masum H. Serazi William Perrizo Dept. of Computer Science.
Performance of Recommender Algorithms on Top-N Recommendation Tasks RecSys 2010 Intelligent Database Systems Lab. School of Computer Science & Engineering.
Vertical Set Square Distance: A Fast and Scalable Technique to Compute Total Variation in Large Datasets Taufik Abidin, Amal Perera, Masum Serazi, William.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
MULTI-LAYERED SOFTWARE SYSTEM FRAMEWORK FOR DISTRIBUTED DATA MINING
Clustering Analysis of Spatial Data Using Peano Count Trees Qiang Ding William Perrizo Department of Computer Science North Dakota State University, USA.
EMIS 8381 – Spring Netflix and Your Next Movie Night Nonlinear Programming Ron Andrews EMIS 8381.
Ptree * -based Approach to Mining Gene Expression Data Fei Pan 1, Xin Hu 2, William Perrizo 1 1. Dept. Computer Science, 2. Dept. Pharmaceutical Science,
Netflix Netflix is a subscription-based movie and television show rental service that offers media to subscribers: Physically by mail Over the internet.
Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
The Effect of Dimensionality Reduction in Recommendation Systems
Efficient OLAP Operations for Spatial Data Using P-Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding William Perrizo North Dakota State University.
TEMPLATE DESIGN © Predicate-Tree based Pretty Good Protection of Data William Perrizo, Arjun G. Roy Department of Computer.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
Evaluation of Recommender Systems Joonseok Lee Georgia Institute of Technology 2011/04/12 1.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
The Universality of Nearest Neighbor Sets in Classification and Prediction Dr. William Perrizo, Dr. Gregory Wettstein, Dr. Amal Shehan Perera and Tingda.
Cosine Similarity Item Based Predictions 77B Recommender Systems.
Pearson Correlation Coefficient 77B Recommender Systems.
Singular Value Decomposition and Item-Based Collaborative Filtering for Netflix Prize Presentation by Tingda Lu at the Saturday Research meeting 10_23_10.
Our Approach: Vertical, horizontally horizontal data vertically) Vertical, compressed data structures, variously called either Predicate-trees or Peano-trees.
Recommendation Algorithms for E-Commerce. Introduction Millions of products are sold over the web. Choosing among so many options is proving challenging.
Fast and Scalable Nearest Neighbor Based Classification Taufik Abidin and William Perrizo Department of Computer Science North Dakota State University.
Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made.
Knowledge Discovery in Protected Vertical Information Dr. William Perrizo University Distinguished Professor of Computer Science North Dakota State University,
Online Evolutionary Collaborative Filtering RECSYS 2010 Intelligent Database Systems Lab. School of Computer Science & Engineering Seoul National University.
Experimental Study on Item-based P-Tree Collaborative Filtering for Netflix Prize.
Company LOGO MovieMiner A collaborative filtering system for predicting Netflix user’s movie ratings [ECS289G Data Mining] Team Spelunker: Justin Becker,
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Clustering Microarray Data based on Density and Shared Nearest Neighbor Measure CATA’06, March 23-25, 2006 Seattle, WA, USA Ranapratap Syamala, Taufik.
The Wisdom of the Few Xavier Amatrian, Neal Lathis, Josep M. Pujol SIGIR’09 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
ItemBased Collaborative Filtering Recommendation Algorithms 1.
Item-Based Collaborative Filtering Recommendation Algorithms
Chapter 14 – Association Rules and Collaborative Filtering © Galit Shmueli and Peter Bruce 2016 Data Mining for Business Analytics (3rd ed.) Shmueli, Bruce.
Efficient Quantitative Frequent Pattern Mining Using Predicate Trees Baoying Wang, Fei Pan, Yue Cui William Perrizo North Dakota State University.
Vertical Set Square Distance Based Clustering without Prior Knowledge of K Amal Perera,Taufik Abidin, Masum Serazi, Dept. of CS, North Dakota State University.
P Left half of rt half? false → Left half pure1? false → Whole is pure1? false → 0 5. Rt half of right half? true → 1.
Announcements Paper presentation Project meet with me ASAP
Item-Based P-Tree Collaborative Filtering applied to the Netflix Data
Recommender Systems & Collaborative Filtering
Decision Tree Classification of Spatial Data Streams Using Peano Count Trees Qiang Ding Qin Ding * William Perrizo Department of Computer Science.
Decision Tree Induction for High-Dimensional Data Using P-Trees
Efficient Ranking of Keyword Queries Using P-trees
Machine Learning With Python Sreejith.S Jaganadh.G.
Reading: Pedro Domingos: A Few Useful Things to Know about Machine Learning source: /cacm12.pdf reading.
Yue (Jenny) Cui and William Perrizo North Dakota State University
Adopted from Bin UIC Recommender Systems Adopted from Bin UIC.
PTrees (predicate Trees) fast, accurate , DM-ready horizontal processing of compressed, vertical data structures Project onto each attribute (4 files)
Collaborative Filtering Nearest Neighbor Approach
Q4 : How does Netflix recommend movies?
Vertical K Median Clustering
Incremental Interactive Mining of Constrained Association Rules from Biological Annotation Data Imad Rahal, Dongmei Ren, Amal Perera, Hassan Najadat and.
Outline Introduction Background Our Approach Experimental Results
Ensembles.
ITEM BASED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHEMS
The Multi-hop closure theorem for the Rolodex Model using pTrees
Collaborative Filtering Non-negative Matrix Factorization
The P-tree Structure and its Algebra Qin Ding Maleq Khan Amalendu Roy
Presentation transcript:

Extension Study on Item-Based P-Tree Collaborative Filtering for the Netflix Prize
Tingda Lu, Yan Wang, William Perrizo, Gregory Wettstein, Amal S. Perera
Computer Science, North Dakota State University, Fargo, ND 58108 USA

Agenda
Introduction to recommendation systems and collaborative filtering
P-Trees
Item-based P-Tree CF algorithm
Similarity measurements
Experimental results
Conclusion

Recommendation System
A recommendation system analyzes a customer's purchase (or rental) history, identifies the customer's satisfaction ratings, and recommends the next purchases (rentals) most likely to satisfy. This increases customer satisfaction and eventually leads to business success.

Amazon.com Book Recommendations
When amazon.com doesn't know me, it gives generic recommendations. As I make purchases, click items, rate items and make lists, the recommendations get "better". Collaborative filtering: similar users like similar things. More choice necessitates better filters: recommendation engines.

Netflix Movie Recommendation
http://www.netflixprize.com/
"The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on previous ratings." The $1 million prize was awarded last fall to the BellKor's Pragmatic Chaos team for a more than 10% improvement over Netflix's own movie recommender, Cinematch.

Collaborative Filtering
The collaborative filtering (CF) algorithm is widely used in recommender systems. User-based CF is limited by its computational complexity, while item-based CF has fewer scalability concerns.

P-Tree
The P-tree is a lossless, compressed, data-mining-ready vertical data structure. P-trees are used for fast computation of counts and for masking specific phenomena. Data is first converted to P-trees.

Predicate-Tree Technology
Current practice structures data into horizontal records and processes them vertically (with scans). Predicate-tree technology instead vertically projects each attribute, then vertically projects each bit position of each attribute, and then compresses each bit slice into a basic P-tree. For example, a relation R(A1 A2 A3 A4) with 3-bit attributes is projected onto R[A1], R[A2], R[A3] and R[A4], and then into the twelve bit slices R11, R12, R13, R21, R22, R23, R31, R32, R33, R41, R42, R43, where Rjk holds bit k of attribute Aj. Each slice is compressed into a basic P-tree (P11 through P43), and basic P-trees are combined by ANDing them horizontally.
The top-down construction of the one-dimensional P-tree representation of R11, denoted P11, records the truth of the universal predicate pure1 in a tree, recursively on halves (1/2^1 subsets), until purity is achieved:
1. Whole is pure1? false → 0
2. Left half pure1? false → 0
3. Right half pure1? false → 0
4. Left half of right half pure1? false → 0; but it is pure (pure0), so this branch ends
5. Right half of right half pure1? true → 1
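The halving construction of P11 can be sketched in Python as a minimal, uncompressed illustration. The names PNode, build_ptree and count_ones, and the 8-bit example slice, are hypothetical; the slides do not show the actual P-tree implementation, which likely stores and compresses nodes differently. The example slice was chosen so that its tree follows the pure1 steps listed for P11.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PNode:
    """One node of a 1-D P-tree: the truth value of 'pure1' for its segment."""
    pure1: int                       # 1 if every bit in the segment is 1, else 0
    left: Optional["PNode"] = None   # children exist only for mixed segments
    right: Optional["PNode"] = None


def build_ptree(bits: Sequence[int]) -> PNode:
    """Record pure1 top-down, recursively on halves, until a segment is pure."""
    if all(b == 1 for b in bits):
        return PNode(1)              # pure1: leaf
    if all(b == 0 for b in bits):
        return PNode(0)              # pure0: recorded as 0, and the branch ends
    mid = len(bits) // 2             # mixed segment: record 0 and split in half
    return PNode(0, build_ptree(bits[:mid]), build_ptree(bits[mid:]))


def count_ones(node: PNode, size: int) -> int:
    """Count the 1 bits of a segment of `size` bits using only the tree."""
    if node.left is None:            # leaf: the segment is pure
        return size if node.pure1 else 0
    mid = size // 2                  # same split point as build_ptree
    return count_ones(node.left, mid) + count_ones(node.right, size - mid)


# Hypothetical 8-row bit slice R11.
r11 = [0, 1, 0, 0, 0, 0, 1, 1]
p11 = build_ptree(r11)
print(count_ones(p11, len(r11)))     # prints 3
```

The count is read from pure leaves without revisiting the raw bits, which is the property that makes P-trees useful for fast counting.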

P-Tree API
size(): get the size of the PTree
get_count(): get the bit count of the PTree
setbit(): set a single bit of the PTree
reset(): clear the bits of the PTree
&: AND operation of PTrees
|: OR operation of PTrees
~: NOT operation of a PTree
dump(): print the binary representation of the PTree
load_binary(): load the binary representation of the PTree
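To make the API list concrete, here is a minimal Python stand-in that keeps the operation names but stores the bits uncompressed in a Python int. It is a hypothetical sketch, not the authors' implementation: dump() returns the bit string instead of printing it, load_binary() is assumed to take a string of '0'/'1' characters, and both operands of & and | are assumed to have the same size. The usage lines show one plausible use in CF: counting the users who rated both of two items as the count of an AND.

```python
class PTree:
    """Hypothetical uncompressed stand-in for a P-tree: a fixed-size bit vector."""

    def __init__(self, nbits: int, value: int = 0) -> None:
        self._n = nbits
        self._mask = (1 << nbits) - 1
        self._v = value & self._mask      # keep only the low nbits bits

    def size(self) -> int:
        return self._n

    def get_count(self) -> int:
        return bin(self._v).count("1")    # number of 1 bits

    def setbit(self, i: int) -> None:
        if not 0 <= i < self._n:
            raise IndexError(i)
        self._v |= 1 << i

    def reset(self) -> None:
        self._v = 0

    def __and__(self, other: "PTree") -> "PTree":
        return PTree(self._n, self._v & other._v)

    def __or__(self, other: "PTree") -> "PTree":
        return PTree(self._n, self._v | other._v)

    def __invert__(self) -> "PTree":
        return PTree(self._n, ~self._v)   # the mask in __init__ drops extra high bits

    def dump(self) -> str:
        return "".join(str((self._v >> i) & 1) for i in range(self._n))

    @classmethod
    def load_binary(cls, bits: str) -> "PTree":
        tree = cls(len(bits))
        for i, ch in enumerate(bits):
            if ch == "1":
                tree.setbit(i)
        return tree


# Bit u is 1 if user u rated the item; AND gives the users who rated both.
rated_i = PTree.load_binary("110110")
rated_j = PTree.load_binary("011100")
print((rated_i & rated_j).get_count())   # prints 2
```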

Item-based P-Tree CF
PTree.load_binary();
// Calculate the similarity of every pair of items
for each i in I {
    for each j in I {
        sim[i][j] = sim(PTree[i], PTree[j]);
    }
}
// Get the top K nearest neighbors of item i among the items rated by user u
pt = PTree.get_items(u);
sort(pt.begin(), pt.end(), by decreasing sim[i][pt.get_index()]);
// Predict the rating of item i by user u
sum = 0.0; weight = 0.0;
for (j = 0; j < min(K, pt.size()); ++j) {
    sum += r[u][pt[j]] * sim[i][pt[j]];
    weight += sim[i][pt[j]];
}
pred = sum / weight;
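The prediction loop in the pseudocode can be sketched as runnable Python on a small in-memory rating table. This is an illustration under assumptions, not the P-tree implementation: ratings live in nested dicts instead of P-trees, cosine similarity is computed over co-rating users only, and the names cosine_sim and predict and the toy ratings are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]   # user -> {item: rating}


def cosine_sim(ratings: Ratings, i: str, j: str) -> float:
    """Cosine similarity of items i and j over users who rated both."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    num = sum(ratings[u][i] * ratings[u][j] for u in common)
    den = (math.sqrt(sum(ratings[u][i] ** 2 for u in common))
           * math.sqrt(sum(ratings[u][j] ** 2 for u in common)))
    return num / den if den else 0.0


def predict(ratings: Ratings, u: str, i: str, k: int) -> Optional[float]:
    """Weighted average of u's ratings on the k rated items most similar to i."""
    sims = {j: cosine_sim(ratings, i, j) for j in ratings[u] if j != i}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]   # top-K by similarity
    weight = sum(sims[j] for j in neighbors)
    if weight <= 0:
        return None                      # no usable neighbors
    return sum(ratings[u][j] * sims[j] for j in neighbors) / weight


ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"B": 4, "C": 1},
}
print(predict(ratings, "u4", "A", k=2))
```

Slicing the sorted list with [:k] also covers the case where the user has rated fewer than K items.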

Item-Based Similarity (I)
Cosine based
Pearson correlation
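For reference, the standard item-to-item forms of these two measures are given below. They assume r_{u,i} is user u's rating of item i, U_{ij} is the set of users who rated both items i and j, and \bar{r}_i is the mean rating of item i. The slides do not state which user set or means the authors used.

```latex
\[
\operatorname{sim}_{\cos}(i,j)=
\frac{\sum_{u\in U_{ij}} r_{u,i}\, r_{u,j}}
     {\sqrt{\sum_{u\in U_{ij}} r_{u,i}^{2}}\;\sqrt{\sum_{u\in U_{ij}} r_{u,j}^{2}}}
\]
\[
\operatorname{sim}_{\mathrm{Pearson}}(i,j)=
\frac{\sum_{u\in U_{ij}} (r_{u,i}-\bar{r}_i)(r_{u,j}-\bar{r}_j)}
     {\sqrt{\sum_{u\in U_{ij}} (r_{u,i}-\bar{r}_i)^{2}}\;\sqrt{\sum_{u\in U_{ij}} (r_{u,j}-\bar{r}_j)^{2}}}
\]
```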

Item-Based Similarity (II)
Adjusted cosine
SVD item-feature
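Adjusted cosine differs from Pearson correlation in subtracting each user's mean rating rather than each item's mean, which removes differences in how generously individual users rate. A minimal Python sketch follows. It assumes a hypothetical nested-dict rating layout and co-rating users only. SVD item-feature similarity is not sketched because the slide does not say how its item features are obtained.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]   # user -> {item: rating}


def adjusted_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Cosine of items i and j after subtracting each user's mean rating."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items() if r}
    common = [u for u, r in ratings.items() if i in r and j in r]
    di = [ratings[u][i] - means[u] for u in common]   # user-centered ratings of i
    dj = [ratings[u][j] - means[u] for u in common]   # user-centered ratings of j
    num = sum(a * b for a, b in zip(di, dj))
    den = math.sqrt(sum(a * a for a in di)) * math.sqrt(sum(b * b for b in dj))
    return num / den if den else 0.0


ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}
print(adjusted_cosine(ratings, "A", "B"), adjusted_cosine(ratings, "A", "C"))
```

Items that the same users rate above (or below) their own averages come out positively similar, even when the raw rating scales differ between users.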

Similarity Correction
Two items are not really similar if only a few customers purchased or rated both, so we include the co-support in the item similarity.
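The slides do not give the exact correction formula. One common way to include co-support, assumed here and not necessarily the authors' choice, is to shrink the similarity toward zero by the factor n/(n + λ). Here n is the number of users who rated both items, and λ is a damping constant. With P-trees, n would be the count of the AND of the two items' rating P-trees. The function name and the default λ = 100 are hypothetical.

```python
def corrected_similarity(sim: float, co_support: int, shrink: float = 100.0) -> float:
    """Shrink an item similarity toward 0 when few users rated both items.

    Assumed form: sim * n / (n + shrink), where n is the co-support count.
    """
    if co_support < 0 or shrink < 0:
        raise ValueError("co_support and shrink must be non-negative")
    if co_support == 0:
        return 0.0                      # no shared raters: no evidence of similarity
    return sim * co_support / (co_support + shrink)


# The same raw similarity counts for much less with 5 co-raters than with 5000.
print(corrected_similarity(0.9, 5), corrected_similarity(0.9, 5000))
```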

Prediction
Weighted average
Item effects
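The two prediction rules can be sketched as follows, under stated assumptions. The weighted average matches the sum/weight step of the item-based P-Tree CF pseudocode. For "item effects", the slide gives no formula, so this sketch assumes one common form. It starts from the target item's mean rating and adds a similarity-weighted average of the user's deviations from each neighbor's item mean: pred(u, i) = mean_i + sum_j s_ij * (r_uj - mean_j) / sum_j |s_ij|. All names and numbers are hypothetical.

```python
from typing import Dict


def weighted_average(user_ratings: Dict[str, float], sims: Dict[str, float]) -> float:
    """sum_j s_ij * r_uj / sum_j s_ij over the neighbor items j in `sims`."""
    weight = sum(sims.values())
    if weight == 0:
        raise ValueError("similarity weights sum to zero")
    return sum(s * user_ratings[j] for j, s in sims.items()) / weight


def with_item_effects(user_ratings: Dict[str, float], sims: Dict[str, float],
                      item_mean: Dict[str, float], target: str) -> float:
    """Item mean of `target` plus a weighted average of the user's deviations
    from each neighbor's item mean (assumed form of 'item effects')."""
    weight = sum(abs(s) for s in sims.values())
    if weight == 0:
        return item_mean[target]        # fall back to the item effect alone
    dev = sum(s * (user_ratings[j] - item_mean[j]) for j, s in sims.items())
    return item_mean[target] + dev / weight


user_ratings = {"B": 4.0, "C": 1.0}     # the user's known ratings
sims = {"B": 0.9, "C": 0.5}             # similarity of each neighbor to item "A"
item_mean = {"A": 3.0, "B": 4.0, "C": 2.0}
print(weighted_average(user_ratings, sims))
print(with_item_effects(user_ratings, sims, item_mean, "A"))
```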

RMSE on Neighbor Size (SVD IF = SVD item-feature)
K     Cosine    Pearson   Adj. Cos   SVD IF
10    1.0742    1.0092    0.9786     0.9865
20    1.0629    1.0006    0.9685     0.9900
30    1.0602    1.0019    0.9666     0.9972
40    1.0592    1.0043    0.9960     1.0031
50    1.0589    1.0064    0.9658     1.0078
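All results are reported as RMSE, the square root of the mean squared difference between predicted and actual ratings, so lower is better. A minimal helper, with a hypothetical name:

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

Squaring the errors before averaging weights a few large misses more heavily than many small ones.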

Neighbor Size

Similarity Algorithm

Analysis
The adjusted cosine similarity algorithm gives the lowest RMSE at every neighbor size. The likely reason is that the other algorithms do not account for differences in how individual users rate. Adjusted cosine subtracts each user's average rating before comparing items, which removes per-user rating bias and hence gives better prediction accuracy.

Similarity Correction
All algorithms get a better RMSE with similarity correction except Adjusted Cosine.
          Cosine    Pearson   Adj. Cos   SVD IF
Before    1.0589    1.0006    0.9658     0.9865
After     1.0588    0.9726    1.0637     0.9791
Improve   0.009%    2.798%    -10.137%   0.750%

Item Effects
Item effects improve the RMSE of all algorithms.
          Cosine    Pearson   Adj. Cos   SVD IF
Before    1.0589    1.0006    0.9658     0.9865
After     0.9575    0.9450    0.9468     0.9381
Improve   9.576%    5.557%    1.967%     4.906%
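The "Improve" rows in the two tables are consistent with the relative RMSE reduction (before - after) / before, expressed as a percentage. The Before values also match each algorithm's best RMSE in the neighbor-size table. The check below recomputes several table entries from their Before and After values; the function name is hypothetical.

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative RMSE reduction in percent; negative means the RMSE got worse."""
    return (before - after) / before * 100


# (before, after) pairs copied from the similarity-correction and item-effects tables.
similarity_correction = {"Pearson": (1.0006, 0.9726), "Adj. Cos": (0.9658, 1.0637)}
item_effects = {"Cosine": (1.0589, 0.9575), "SVD IF": (0.9865, 0.9381)}
for name, (before, after) in {**similarity_correction, **item_effects}.items():
    print(name, round(improvement_pct(before, after), 3))
```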

Conclusion
Experiments were carried out with the cosine, Pearson, adjusted cosine and SVD item-feature similarity algorithms. Item effects improve prediction accuracy for all four algorithms. Similarity (support) correction improves all of them except adjusted cosine, whose RMSE worsens by about 10%. Pearson and SVD item-feature achieve better results when similarity correction and item effects are included.

Questions and Comments?