Recommender Systems
David M. Pennock, NEC Research Institute
Contributions: John Riedl, GroupLens, University of Minnesota
CW: scale vs. service
• Wal-Mart
  +massive inventory
  +massive customer base
  +cheap
  –impersonal
• General store
  –specialized products
  –few customers
  –expensive
  +knowledgeable about products and about YOU
• Wal-Mart.com
  +massive inventory
  +massive customer base
  +cheap
  –impersonal
  +knowledgeable about products and about YOU
The vision of automation: mass personalization
Commerce: Matching buyers and sellers
Traditional:
  –browsing
  –ads
  –critics/editors
  –friends
Technological facilitators:
  –World Wide Web
  –targeted ads
  –search engines/shop bots
  –recommender systems
Research groups
• University of Minnesota
  –Riedl, Konstan et al.
  –MovieLens; NetPerceptions; tutorial
• Microsoft Research
  –Breese, Heckerman, Horvitz, et al.
  –SiteServer; Firefly
• MIT
  –Maes et al.; Firefly
• NEC Research, U. Penn
  –Pennock, Lawrence, Ungar, Popescul
Types of recommender systems
• Content-based: information filter; uses AI techniques
  –“A romantic comedy starring Julia Roberts, in stock at BB”
  –“A movie like Fargo”
• Community-based: collaborative filter; intelligence from people
  –“A movie that people like me enjoyed”
• Hybrid systems
Collaborative filtering: How it works
[Figure labels: ratings, correlations, neighbors, Fargo = 2]
Thanks: John Riedl & GroupLens
Examples and applications
• News
• Movies
• Books
• Websites: Alexa.com
• Music, toys, …
• Netperceptions.com
• CDNow.com, Levis.com, …
• Commerce Edition of Microsoft SiteServer
GroupLens: Usenet news ’94
Thanks: John Riedl & GroupLens
MovieLens Thanks: John Riedl & GroupLens
Amazon.com
800.com accessories (for browsers)
Launch.com
CDNow album advisor
Jester
Ecommerce success stories
• Large international catalog retailer
  –17% hit rate, 23% acceptance rate in call center
• Medium European outbound call center
  –17% hit rate, 6.7% acceptance rate from an outbound telemarketing call
  –$ price of average item sold
  –Items were in an overstocked electronics category and sold out within 3 weeks
• Medium American online toy store ( campaign)
  –19% click-through rate vs. 10% industry average
  –14.3% conversion to sale vs. 2.5% industry average
Thanks: John Riedl & GroupLens
Algorithms: Memory-based
[Figure labels: neighbors, ratings]
$R_a(\text{Fargo}) = \sum_i w_i R_i(\text{Fargo})$ for each movie, where $\sum_i$ is over the neighborhood (k-NN, k-radius)
Similarity metric $w_i$ is correlation, or vector similarity, or mean squared difference, or probability of same “personality” [Pennock et al.], or…
GroupLens [Resnick et al. 94]; Ringo [Shardanand and Maes 95]; comparative study [Breese et al. 98]
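The weighted-neighbor prediction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the GroupLens or Ringo implementation: it assumes Pearson correlation as the similarity metric, keeps only the k most positively correlated neighbors, and divides by the sum of the weights so the result stays on the rating scale (the slide's formula omits that normalization). The function names and the toy ratings are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over movies both users rated; 0.0 if undefined."""
    common = [m for m in a if m in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[m] for m in common) / len(common)
    mean_b = sum(b[m] for m in common) / len(common)
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict(ratings, active, movie, k=2):
    """Weighted k-NN estimate of the active user's rating for `movie`."""
    candidates = [
        (pearson(ratings[active], r), r[movie])
        for user, r in ratings.items()
        if user != active and movie in r
    ]
    # Assumption: keep only the k neighbors with the largest positive weights.
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    total = sum(w for w, _ in neighbors)
    if total == 0:
        return None  # no usable neighbor has rated this movie
    # Normalized form of R_a(movie) = sum_i w_i * R_i(movie).
    return sum(w * r for w, r in neighbors) / total

# Hypothetical toy data: "ann" has not rated Fargo.
ratings = {
    "ann": {"Alien": 5, "Brazil": 1, "Clerks": 4},
    "bob": {"Alien": 5, "Brazil": 1, "Clerks": 4, "Fargo": 2},
    "cat": {"Alien": 1, "Brazil": 5, "Clerks": 2, "Fargo": 5},
    "dan": {"Alien": 4, "Brazil": 2, "Clerks": 5, "Fargo": 3},
}
print(predict(ratings, "ann", "Fargo", k=2))
```

Here "cat" is negatively correlated with "ann" and is dropped, so the estimate is a blend of the Fargo ratings from "bob" and "dan", pulled toward "bob" because his weight is higher. Swapping `pearson` for vector similarity or mean squared difference changes only the weight function.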
Algorithms: Model-based
[Figure label: ratings]
Build underlying model of user preferences; infer predictions from model
• Personality diagnosis [Pennock, Horvitz, Lawrence, & Giles 2000]
• Bayesian network [Breese et al. 98]
  –variables are products; values are ratings; structure and probabilities learned
• Bayesian clustering
  –like-minded users grouped [Breese et al. 98]
  –users and products clustered [Ungar and Foster 1998]
[Figure labels: teenage, male; action]
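As one illustration of the model-based idea, here is a rough sketch in the spirit of personality diagnosis. It rests on assumptions not stated on the slide: each observed rating is treated as a user's underlying rating plus Gaussian noise with spread `sigma`, only co-rated movies are compared, every other user is equally likely a priori, and ratings lie on a 1 to 5 scale. The function name, `sigma=1.0`, and the toy data are hypothetical and not taken from [Pennock, Horvitz, Lawrence, & Giles 2000].

```python
from math import exp

def personality_diagnosis(ratings, active, movie, scale=(1, 2, 3, 4, 5), sigma=1.0):
    """Return (most probable rating, probability of each rating in `scale`)."""
    def gauss(x, y):
        # Unnormalized Gaussian likelihood of observing x when the true value is y.
        return exp(-((x - y) ** 2) / (2 * sigma ** 2))

    scores = {x: 0.0 for x in scale}
    for user, r in ratings.items():
        if user == active or movie not in r:
            continue
        # Likelihood that the active user shares this user's "personality",
        # computed over co-rated movies only (a simplifying assumption).
        like = 1.0
        for m, value in ratings[active].items():
            if m in r:
                like *= gauss(value, r[m])
        # That user then "votes" for each candidate rating of the target movie.
        for x in scale:
            scores[x] += like * gauss(x, r[movie])

    total = sum(scores.values())
    if total == 0:
        return None, {}
    dist = {x: s / total for x, s in scores.items()}
    return max(dist, key=dist.get), dist

# Hypothetical toy data: "ann" has not rated Fargo.
ratings = {
    "ann": {"Alien": 5, "Brazil": 1, "Clerks": 4},
    "bob": {"Alien": 5, "Brazil": 1, "Clerks": 4, "Fargo": 2},
    "cat": {"Alien": 1, "Brazil": 5, "Clerks": 2, "Fargo": 5},
    "dan": {"Alien": 4, "Brazil": 2, "Clerks": 5, "Fargo": 3},
}
best, dist = personality_diagnosis(ratings, "ann", "Fargo")
print(best, dist)
```

Unlike the weighted-sum predictor, this returns a full probability distribution over ratings, which is what makes the approach a model rather than a lookup over neighbors.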
Algorithms: Machine learning
Treat as a black-box machine learning or classification problem:
  –Ripper [Basu et al. 98]
  –Neural network
  –Support vector machine
[Billsus and Pazzani 98; Freund et al. 98; Nakamura and Abe 98]
State of the art
• Weighted k-nearest neighbor!
• Singular value decomposition [GroupLens]
• Probabilistic SVD - Aspect model [Hofmann and Puzicha 99] [Popescul, Ungar, Pennock, and Lawrence]
• Some problems/hurdles
  –data sparsity (one solution: smoothing)
  –implicit ratings (one solution: “boosting”)
    purchase history [Ungar] [Claypool] [Sarwar & Karypis]
    access history/time spent reading [Morita and Shen] [Pennock et al. 2000] [Popescul et al.]
Thanks: John Riedl & GroupLens
Filtering content
• ResearchIndex [Pennock et al. 2000] [Popescul et al.]
• Personalized news [Claypool et al. 99]
• Personalized search engines
  –Beyond keyword search
• Adaptive web sites [Etzioni et al.]
• Justifying subscriptions
Thanks: John Riedl & GroupLens
Extensions
• Incorporating content, links, other data
  –FilterBots [GroupLens]
  –Ripper [Basu et al. 98]
  –three-way aspect model [Popescul et al.]
• Group recommendations
• Temporal aspects
• “Schizophrenic” users
  –moods / changing and “ephemeral” tastes
  –buying for others
Multiuser, from MovieLens
Conclusion
• Mass personalization
  –expensive or impossible without automation
  –large retailers act and “feel” small
• Recommender systems
  –intelligence from leveraging community information, rather than just AI
  –can incorporate content, demographic information, etc.
  –can scale to millions of customers, millions of products, thousands of clicks per second
  –ideally adds value for both retailers & consumers
Thanks: John Riedl & GroupLens