Analysis of Recommendation Algorithms for E-Commerce
Badrul M. Sarwar, George Karypis*, Joseph A. Konstan, and John T. Riedl
GroupLens Research / *Army HPCRC
Department of Computer Science and Engineering
University of Minnesota
Talk Outline
  • Recommender Systems for E-Commerce
  • Quality and Performance Challenges
  • Synopsis of Recommendation Process
  • Experimental Setup
  • Result Highlights
  • Conclusion
Recommender Systems
  • Problem
    – Information and commerce overload
  • Solution
    – Knowledge Discovery in Databases (KDD)
    – Recommender Systems (RS)
      • Collaborative Filtering
Collaborative Filtering
  • Adds human judgement to the filtering process
Collaborative Filtering (contd.)
  • Major Tasks
    – Representation of input data
      • Customer-product rating matrix
    – Neighborhood formation
    – Output
      • Prediction
      • Top-N Recommendation
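The three tasks on this slide can be sketched end to end on toy data. This is a minimal illustration and not the benchmark system used in the talk: the customer and product names are hypothetical, and Pearson correlation with a mean-offset weighted average is one common choice assumed here, since the slides do not name a similarity measure.

```python
from math import sqrt

# Input representation: a sparse customer-product rating matrix.
# Hypothetical toy data; a missing key means "not rated".
ratings = {
    "alice": {"p1": 5, "p2": 3, "p3": 4},
    "bob":   {"p1": 4, "p2": 2, "p3": 5, "p4": 4},
    "carol": {"p1": 1, "p2": 5, "p4": 2},
}

def mean(user):
    """Average of all ratings given by one customer."""
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(a, b):
    """Pearson correlation over the products both customers rated."""
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    ma = sum(ratings[a][p] for p in common) / len(common)
    mb = sum(ratings[b][p] for p in common) / len(common)
    num = sum((ratings[a][p] - ma) * (ratings[b][p] - mb) for p in common)
    den = sqrt(sum((ratings[a][p] - ma) ** 2 for p in common)
               * sum((ratings[b][p] - mb) ** 2 for p in common))
    return num / den if den else 0.0

def neighbors(user, k):
    """Neighborhood formation: the k most similar other customers."""
    scored = [(pearson(user, other), other) for other in ratings if other != user]
    scored.sort(reverse=True)
    return scored[:k]

def predict(user, product, k=2):
    """Output 1: predicted rating as the customer's mean plus weighted neighbor offsets."""
    num = den = 0.0
    for sim, other in neighbors(user, k):
        if product in ratings[other]:
            num += sim * (ratings[other][product] - mean(other))
            den += abs(sim)
    return mean(user) + (num / den if den else 0.0)

def top_n(user, n, k=2):
    """Output 2: the n unrated products with the highest predicted rating."""
    unseen = {p for r in ratings.values() for p in r} - set(ratings[user])
    return sorted(unseen, key=lambda p: predict(user, p, k), reverse=True)[:n]
```

Calling `top_n("alice", 1)` ranks the only product alice has not rated; with a realistic matrix the candidate set and the neighborhood size `k` would both be much larger.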
Challenges of RS
  • Sparsity
    – Enormous size of customer-product matrix
    – Affects neighborhood formation
    – Results in poor quality and reduced coverage
  • Scalability
    – Lots of customers and products
    – Affects neighborhood and output
    – Results in high response time
Challenges of RS (contd.)
  • Synonymy
    – Similar products treated differently
    – Increases sparsity, loss of transitivity
    – Results in poor quality
Use of SVD for Collaborative Filtering
  1. Low dimensional representation
     – m x n matrix reduced to m x k and k x n factors
     – O(m+n) storage requirement
  2. Direct Prediction
  3. Neighborhood Formation
     – m x m similarity
     – Top-N Recommendation
     – Prediction (CF algorithm)
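A rough sketch of the three uses of SVD named on this slide, using NumPy. The preprocessing is an assumption, because the slide does not state it: missing entries are filled with the product (column) mean and each customer's (row) mean is subtracted before factoring. The function names and the toy matrix are hypothetical. Storing the m x k and k x n factors takes (m+n)k numbers, which matches the O(m+n) figure when k is treated as a constant.

```python
import numpy as np

def low_rank_model(R, k):
    """Factor an m x n rating matrix (np.nan = missing) into m x k and k x n parts."""
    col_mean = np.nanmean(R, axis=0)
    filled = np.where(np.isnan(R), col_mean, R)        # assumption: fill with product mean
    row_mean = filled.mean(axis=1, keepdims=True)      # assumption: center on customer mean
    U, s, Vt = np.linalg.svd(filled - row_mean, full_matrices=False)
    root = np.sqrt(s[:k])
    customers = U[:, :k] * root                        # m x k
    products = root[:, None] * Vt[:k]                  # k x n
    return row_mean, customers, products

def predict(model, i, j):
    """Direct prediction: customer mean plus the dot product of the two factors."""
    row_mean, customers, products = model
    return float(row_mean[i, 0] + customers[i] @ products[:, j])

def customer_similarity(model):
    """Neighborhood formation: m x m cosine similarity in the k-dimensional space."""
    _, customers, _ = model
    norms = np.linalg.norm(customers, axis=1, keepdims=True)
    unit = customers / np.maximum(norms, 1e-12)        # guard against zero vectors
    return unit @ unit.T

# Hypothetical 4 customers x 3 products toy matrix.
R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 5.0],
    [1.0, 5.0, 2.0],
    [np.nan, 4.0, 1.0],
])
model = low_rank_model(R, k=2)
sim = customer_similarity(model)
```

The similarity matrix `sim` can then feed any neighborhood-based CF step, while `predict(model, i, j)` skips neighborhood formation entirely.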
Experimental Setup
  • Data sets
    – MovieLens Data
      • Size 943 x 1,682
      • 100,000 rating entries
      • Ratings are from 1-5
      • Used for Prediction and Neighborhood experiments
    – E-Commerce Data
      • Size 6,502 x 23,554
      • 97,045 purchase entries
      • Purchase entries are dollar amounts
      • Used for Neighborhood experiment
    – Train and Test Portions
      • Percentage of Training data, x
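The sizes on this slide make the sparsity challenge concrete, and the training fraction x can be illustrated with a simple split. This is a sketch only: the slide does not say how entries were assigned to the train and test portions, so the seeded random shuffle below is an assumption, and `density` and `split` are hypothetical helper names.

```python
import random

def density(n_entries, n_customers, n_products):
    """Fraction of cells in the customer-product matrix that hold a value."""
    return n_entries / (n_customers * n_products)

# Figures taken from the slide.
movielens = density(100_000, 943, 1_682)     # roughly 6.3% of cells filled
ecommerce = density(97_045, 6_502, 23_554)   # roughly 0.06% of cells filled

def split(entries, x, seed=0):
    """Put a fraction x of the entries in the training portion, the rest in test.

    Assumption: entries are assigned by a seeded random shuffle.
    """
    rng = random.Random(seed)
    shuffled = list(entries)
    rng.shuffle(shuffled)
    cut = round(x * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (customer, product, value) entries.
entries = [(c, p, 1.0) for c in range(5) for p in range(2)]
train, test = split(entries, x=0.8)
```

The E-Commerce matrix is about a hundred times sparser than the MovieLens matrix, which is worth keeping in mind when comparing results across the two data sets.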
Experimental Setup (contd.)
  • Benchmark Systems
    – CF-Predict
    – CF-Recommend
  • Metrics
    – Prediction
      • Mean Absolute Error (MAE)
    – Top-N Recommendation
      • Recall and Precision
      • Combined score F1
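For reference, the metrics listed here follow their standard definitions, sketched below. The function names are hypothetical, and treating the held-out test items as the "relevant" set for a customer is an assumption about the evaluation protocol, which the slide does not spell out.

```python
def mean_absolute_error(predicted, actual):
    """MAE: average absolute gap between predicted and actual ratings."""
    pairs = list(zip(predicted, actual))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

def precision_recall_f1(recommended, relevant):
    """Top-N quality: hits are recommended items that are also relevant."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # F1 is the harmonic mean of precision and recall; defined as 0 with no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
precision, recall, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
```

Lower MAE is better, while higher precision, recall, and F1 are better; F1 is useful because growing N tends to raise recall and lower precision at the same time.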
Results: Prediction Experiment
Results: Neighborhood Formation
  • Movie Dataset
Results: Neighborhood Formation (contd.)
  • E-Commerce Dataset
Conclusion
  • SVD results are promising
    – Provides better Recommendation for Movie data
    – Provides better Prediction for x<0.5
    – Not as good for the E-Commerce data
      • We only tried up to 400 dimensions
  • SVD provides better online performance
  • SVD is capable of meeting RS challenges
    – Sparsity
    – Scalability
    – Synonymy
Acknowledgements
  • National Science Foundation under grants IIS , IIS , IIS , CCR , EIA , ACI
  • Army Research Office DAAG , DOE ASCI program. Army High Performance Computing Research Center grant DAAH C-0008
  • Thanks to Netperceptions Inc. for additional support.
  • Thanks to Fingerhut Inc. for the EC dataset.