1
Customizable Bayesian Collaborative Filtering
Denver Dash
Big Data Reading Group, 11/19/2007
2
Why Work on Recommender Systems?
Generally:
– Widespread interest from many different industries.
– Marketers save time and money by advertising to the right people.
– People get less spam and more useful suggestions.
Big Data:
– Computationally hard problem.
– Large scale (many users) improves performance.
– Because the models are complex, more data keeps improving performance.
“Everyday Sensing and Reasoning” (aka the Megabet):
– Interesting future applications in collaborative sharing of information on mobile devices.
3
Crude Characterization of Recommender Systems
Content-Based: build a user model from the user's preferences for genre, director, actors, etc., to predict the user's rating.
Collaborative Filtering: brute-force statistical analysis, e.g., clustering plus dimensionality reduction.
4
Motivating Application: Predicting User Movie Ratings
Netflix Prize data:
– >17.7K movies
– >480K users
– Raw training data: 1.4 GB of uncompressed sparse text.
[Figure: sparse ratings matrix indexed by users (1, 2, 3, …) and movies (1, 2, 3, …).]
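A minimal sketch of holding such data sparsely, assuming the Netflix Prize per-movie text layout (a `MovieID:` line followed by `CustomerID,Rating,Date` rows). The function names `parse_netflix_style` and `density` are hypothetical. Only observed cells are stored, so memory grows with the number of ratings rather than with users × movies.

```python
from collections import defaultdict

def parse_netflix_style(lines):
    """Parse 'MovieID:' header lines followed by 'CustomerID,Rating,Date' rows.

    Returns a sparse dict-of-dicts, ratings[user][movie] = rating; cells a
    user never rated are simply absent.
    """
    ratings = defaultdict(dict)
    movie = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie = int(line[:-1])  # start of a new movie block
        else:
            user, rating, _date = line.split(",")
            ratings[int(user)][movie] = int(rating)
    return dict(ratings)

def density(ratings, n_movies):
    """Fraction of the users x movies matrix that is actually observed."""
    n_obs = sum(len(row) for row in ratings.values())
    return n_obs / (len(ratings) * n_movies)
```

With more than 480K users and 17.7K movies the full matrix has over 8 billion cells, so a dense array is impractical; a dict-of-dicts or a sparse-matrix library stores only what was observed.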
5
Typical Approaches to Recommender Systems
1. Use linear models (e.g., take the average over similar users, or the average over similar movies).
2. Use dimensionality reduction techniques: SVD, PCA, regularized regression, etc.
3. Combine multiple approaches with more linear models.
4. A lot of engineering for special cases and efficiency.
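Approach 1 can be sketched as user-based nearest neighbours: score other users by cosine similarity over co-rated movies, then take a similarity-weighted average of the top k neighbours' ratings of the target movie. This is a generic illustration, not any particular team's method; `cosine`, `predict_user_knn` and the default k are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two users' rating dicts over co-rated movies."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    nu = math.sqrt(sum(u[m] ** 2 for m in common))
    nv = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_user_knn(ratings, user, movie, k=2):
    """Similarity-weighted average of the k most similar users' ratings of movie.

    Returns None when no other user with positive similarity rated the movie.
    """
    candidates = [(cosine(ratings[user], row), row[movie])
                  for other, row in ratings.items()
                  if other != user and movie in row]
    top = sorted(candidates, reverse=True)[:k]
    wsum = sum(s for s, _ in top)
    if wsum <= 0:
        return None
    return sum(s * r for s, r in top) / wsum
```

Cosine on raw 1-5 ratings is always positive, so many systems mean-center each user's ratings first (Pearson correlation); the movie-based variant simply swaps the roles of users and movies.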
6
Current Netflix Leaders (at least the few who have published their approaches)
#1 Bell, Koren and Volinsky, AT&T Labs: combine linear CF methods with pseudo-content-based methods, i.e., use an approximate SVD to “learn” hidden content.
#7 Salakhutdinov, Mnih and Hinton, University of Toronto: combine latent-variable graphical models with an approximate SVD.
#8 Paterek, Warsaw University: combine an approximate regularized SVD with ridge regression, K-means and other models.
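The “approximate regularized SVD” mentioned above is commonly fit by stochastic gradient descent on the observed ratings only. A minimal sketch, assuming a global mean plus a k-dimensional user-factor/movie-factor dot product with L2 regularization; the learning rate, regularization weight, epoch count and function names are illustrative assumptions, not the published settings of any team.

```python
import math
import random

def train_rsvd(triples, n_users, n_items, k=2, lr=0.02, reg=0.05, epochs=300, seed=0):
    """Regularized factorization r_ui ~ mu + p_u . q_i, fit by SGD.

    triples: list of (user_index, item_index, rating) for observed cells only.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating
    order = list(triples)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on 0.5*err^2 + 0.5*reg*(|p_u|^2 + |q_i|^2)
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def predict(model, u, i):
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[u], Q[i]))

def rmse(model, triples):
    sq = sum((r - predict(model, u, i)) ** 2 for u, i, r in triples)
    return math.sqrt(sq / len(triples))
```

Because the loss touches only observed cells, missing ratings are never treated as zeros, which is the main difference from a textbook SVD of a filled-in matrix.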
7
Desirable Properties
1. Create models customized to users and/or movies.
2. Avoid overfitting: 10^6 to 10^9 parameters and sparse data (use clustering and regularization).
3. Use latent-variable models to cluster users and/or movies.
4. Weight users/parameters with more data support more heavily.
5. Take into account user bias relative to other users.
6. When data is sparse or totally absent, estimated ratings should reduce to marginal estimates.
7. Take into account dependencies between items.
8. Take into account temporal trends.
8
Customized Bayesian Collaborative Filtering
Incrementally expandable with new content-based features.
Uses a principled framework with explicit assumptions.
Exhibits most of the desirable properties:
1. Avoids overfitting: regularization is built in.
2. Uses a latent-variable model.
3. Weights parameters with more data support more heavily.
4. Takes into account user/movie bias relative to other users/movies.
5. When data is sparse or totally absent, estimated ratings reduce to marginal estimates.
6. Can take into account dependencies between items.
7. Can take into account temporal trends.
9
Customized Bayesian Collaborative Filtering
r_j: rating of a particular user j.
r_M: weighted average of user j's ratings of similar movies.
r_U: weighted average of similar users' ratings of the target movie.
[Figure: Bayesian network over r_j, r_U and r_M, with a plate labeled N.]
The parameters are learned from the users' data using weak Dirichlet priors based on marginal data over all users.
Given a user j and a movie m, calculate r_U(j,m) and r_M(j,m):
– “How often in the past I agreed with my own ratings on similar movies.”
– “How often in the past I agreed with my neighbors’ ratings on the same movie.”
Neighbors of user j are determined beforehand by clustering.
Neighbors of movie m are determined beforehand by clustering.
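A rough sketch of how the two parent variables and the user's conditional table might be computed. It assumes neighbour weights were already produced by clustering (`user_nbrs[j]` maps neighbour users to weights, `movie_nbrs[m]` maps similar movies to weights), that weighted averages are rounded to the 1-5 scale, and that a hypothetical extra state 0 marks “no neighbour data”. The Dirichlet prior's mean is taken to be the all-user distribution for the same parent state, with a hypothetical strength `alpha`; the slide does not specify these details.

```python
from collections import Counter, defaultdict

LEVELS = (1, 2, 3, 4, 5)
MISSING = 0  # hypothetical extra parent state: no neighbour data available

def wavg_state(pairs):
    """Weighted average of (weight, rating) pairs, rounded to a discrete state."""
    wsum = sum(w for w, _ in pairs)
    if wsum <= 0:
        return MISSING
    return round(sum(w * r for w, r in pairs) / wsum)

def parents(ratings, user_nbrs, movie_nbrs, j, m):
    """(r_U, r_M) for user j and movie m, from precomputed neighbour weights."""
    r_u = wavg_state([(w, ratings[u][m]) for u, w in user_nbrs.get(j, {}).items()
                      if m in ratings.get(u, {})])
    r_m = wavg_state([(w, ratings[j][m2]) for m2, w in movie_nbrs.get(m, {}).items()
                      if m2 in ratings.get(j, {})])
    return r_u, r_m

def counts_by_state(ratings, user_nbrs, movie_nbrs, users):
    """Map each parent state (r_U, r_M) to a Counter of observed r_j values."""
    table = defaultdict(Counter)
    for j in users:
        for m, r in ratings[j].items():
            table[parents(ratings, user_nbrs, movie_nbrs, j, m)][r] += 1
    return table

def predict(ratings, user_nbrs, movie_nbrs, j, m, alpha=5.0):
    """Return (P(r_j | r_U, r_M), expected rating) for user j and movie m.

    User j's own counts are smoothed with pseudo-counts alpha * p_k, where p_k
    is the all-user distribution for the same parent state, or the all-user
    rating marginal if that state was never observed.
    Tables are rebuilt on every call for brevity; cache them in practice.
    """
    state = parents(ratings, user_nbrs, movie_nbrs, j, m)
    pop = counts_by_state(ratings, user_nbrs, movie_nbrs, ratings)[state]
    if not pop:
        pop = Counter(r for row in ratings.values() for r in row.values())
    n_pop = sum(pop.values())
    own = counts_by_state(ratings, user_nbrs, movie_nbrs, [j])[state]
    n_own = sum(own.values())
    dist = {k: (own[k] + alpha * pop[k] / n_pop) / (n_own + alpha) for k in LEVELS}
    return dist, sum(k * p for k, p in dist.items())
```

For a user with no ratings, `own` is empty and the prediction equals the all-user distribution, which is the corner case discussed on the next slide.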
10
Nice Properties: Corner Cases
– Takes into account user bias (by calculating P(r_j) from the user's own data).
– The expected rating of a new user is based on the expectation over the entire set of users (due to the Dirichlet priors).
– When a user has only a little data, his/her data is taken into account but is smoothed by the remaining users' data.
– When a user has lots of data, it overwhelms the Dirichlet priors and we can learn an accurate customized model for him/her.
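The corner cases follow from the form of the Dirichlet posterior mean. In our notation (not the slide's), for one parent configuration let n_{jk} be the number of times user j gave rating k, n_j = Σ_k n_{jk}, p_k the all-user marginal probability of rating k, and α the prior strength; the numbers in the last two lines are purely illustrative.

```latex
\hat{\theta}_{jk} = \frac{n_{jk} + \alpha p_k}{n_j + \alpha}
  = \frac{n_j}{n_j + \alpha}\,\frac{n_{jk}}{n_j} + \frac{\alpha}{n_j + \alpha}\,p_k
  \qquad (n_j > 0)

n_j = 0:\quad \hat{\theta}_{jk} = p_k \quad\text{(new user: all-user marginal)}

n_j \gg \alpha:\quad \hat{\theta}_{jk} \approx n_{jk}/n_j \quad\text{(user's own frequency)}

\alpha = 10,\ p_5 = 0.2,\ n_j = 2,\ n_{j5} = 2:\quad \hat{\theta}_{j5} = \frac{2 + 2}{2 + 10} = \frac{1}{3}

\alpha = 10,\ p_5 = 0.2,\ n_j = 200,\ n_{j5} = 150:\quad \hat{\theta}_{j5} = \frac{150 + 2}{200 + 10} \approx 0.724
```

In the last case the estimate is already close to the user's own frequency of 0.75, while in the second-to-last the two ratings move the estimate only part of the way from 0.2 toward 1.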
11
Nice Property: Incremental Expandability
[Figure: network with r_j, a plate labeled N, and content-feature nodes Genre, Director, Actors and Year.]
– Weighted average rating of the target movie over user j's anti-cluster U.
– Weighted average rating over the target movie's anti-cluster M.
Again, Dirichlet priors allow us to smooth these parameters to avoid overfitting.
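Under a naïve Bayes structure each new feature node adds only one conditional table P(feature | r_j), so a feature can be added without refitting the others. A minimal sketch, assuming discrete feature values and, for brevity, a symmetric Dirichlet(alpha) prior on each table (the slides base the priors on all-user marginals instead); the class `NaiveBayesRating` and its methods are hypothetical.

```python
from collections import Counter, defaultdict

class NaiveBayesRating:
    """Naive Bayes over one user's rating r_j; feature nodes are children of r_j."""

    def __init__(self, ratings, alpha=1.0, levels=(1, 2, 3, 4, 5)):
        self.levels, self.alpha = levels, alpha
        counts = Counter(ratings)
        n = len(ratings)
        # Dirichlet-smoothed prior P(r_j)
        self.prior = {r: (counts[r] + alpha) / (n + alpha * len(levels)) for r in levels}
        self.cpts = {}

    def add_feature(self, name, pairs, values):
        """Fit only P(name | r_j) from (feature_value, rating) pairs;
        existing tables are left untouched."""
        counts = defaultdict(Counter)
        for value, r in pairs:
            counts[r][value] += 1
        table = {}
        for r in self.levels:
            n_r = sum(counts[r].values())
            table[r] = {v: (counts[r][v] + self.alpha) / (n_r + self.alpha * len(values))
                        for v in values}
        self.cpts[name] = table

    def posterior(self, evidence):
        """P(r_j | evidence), ignoring evidence for features not yet added."""
        scores = {}
        for r in self.levels:
            s = self.prior[r]
            for name, value in evidence.items():
                if name in self.cpts:
                    s *= self.cpts[name][r][value]
            scores[r] = s
        z = sum(scores.values())
        return {r: s / z for r, s in scores.items()}
```

For example, after `add_feature("genre", ...)` a later `add_feature("director", ...)` leaves the genre table and the prior exactly as they were.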
12
Nice Property: Customization Without Overfitting
Naïve models can be sensitive to many redundant or unimportant features.
Different features may be more informative for different users.
Individualized feature selection may not work well if the individual user has not rated many movies.
Solution: structural Bayesian model averaging (BMA) over 2^N structures (each of the N features is either included or excluded).
13
Efficient BMA
Under certain assumptions, averaging over all 2^N feature sets can be performed in O(N) time for a naïve BN structure (Dash and Cooper, 2004).
Re-parametrize the network with:
– ML parameters
– Structure prior
– Marginal likelihood of the feature set
Once the parameters are calculated and cached, we can do BMA inference in O(N) time.
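The reason the 2^N sum collapses can be checked numerically. This is our reconstruction of the factorization idea, not the authors' code, and it assumes: discrete features and complete data; a structure prior in which each feature is linked to the class independently with probability `rho`; a symmetric Dirichlet(`a`) prior on every parameter vector, so marginal likelihoods decompose per feature; and posterior-mean (smoothed) parameters rather than the ML parameters the slide lists, so unseen values do not zero out. Under these assumptions the sum over feature subsets becomes a product over features of an "included" term plus an "excluded" term. `fit`, `bma_fast`, `bma_brute` and `ratio` are hypothetical names.

```python
import itertools
import math
from collections import Counter

def log_ml(counts, a):
    """Log marginal likelihood of counts under a symmetric Dirichlet(a) prior."""
    n, k = sum(counts), len(counts)
    out = math.lgamma(k * a) - math.lgamma(k * a + n)
    return out + sum(math.lgamma(a + c) - math.lgamma(a) for c in counts)

def fit(data, K, V, a=1.0):
    """Cache class priors, predictive tables and per-feature likelihood ratios.

    data: list of (c, x), class c in range(K), x a tuple of N values in range(V).
    ratio[i] = P(D_i | class is a parent) / P(D_i | no parent).
    """
    N, n = len(data[0][1]), len(data)
    n_c = Counter(c for c, _ in data)
    p_c = [(n_c[c] + a) / (n + K * a) for c in range(K)]
    p_x, p_xc, ratio = [], [], []
    for i in range(N):
        marg = [sum(1 for _, x in data if x[i] == v) for v in range(V)]
        cond = [[sum(1 for c, x in data if c == k and x[i] == v) for v in range(V)]
                for k in range(K)]
        p_x.append([(m + a) / (n + V * a) for m in marg])
        p_xc.append([[(m + a) / (sum(row) + V * a) for m in row] for row in cond])
        ratio.append(math.exp(sum(log_ml(row, a) for row in cond) - log_ml(marg, a)))
    return p_c, p_x, p_xc, ratio

def bma_fast(model, x, rho=0.5):
    """O(N) per class: product over features of (included + excluded) terms."""
    p_c, p_x, p_xc, ratio = model
    joint = []
    for k in range(len(p_c)):
        s = p_c[k]
        for i, v in enumerate(x):
            s *= rho * ratio[i] * p_xc[i][k][v] + (1 - rho) * p_x[i][v]
        joint.append(s)
    z = sum(joint)
    return [j / z for j in joint]

def bma_brute(model, x, rho=0.5):
    """Explicit sum over all 2^N feature subsets (for checking only)."""
    p_c, p_x, p_xc, ratio = model
    joint = [0.0] * len(p_c)
    for subset in itertools.product([False, True], repeat=len(x)):
        for k in range(len(p_c)):
            term = p_c[k]
            for i, (included, v) in enumerate(zip(subset, x)):
                term *= (rho * ratio[i] * p_xc[i][k][v] if included
                         else (1 - rho) * p_x[i][v])
            joint[k] += term
    z = sum(joint)
    return [j / z for j in joint]
```

Both functions return the same class posterior; `bma_brute` does 2^N times as much work and is only there to confirm the factorization.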
14
Overview of the Method (pipeline stages, in order)
– D: original database
– Web/Wiki crawler → D′: augmented database
– EM/MinHash clustering of features → D″: clustered database
– MAP naïve Bayes learning → baseline model (r_j with Genre, Director, Actors, Year)
– D‴: fully specified database
– Calculate r_M and r_U
– BMA learning for all users → all user models (one network over r_j, Genre, Director, Actors, Year per user)
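The MinHash half of the “EM/MinHash clustering of features” step can be sketched as follows: represent each feature value as a set (hypothetically, an actor as the set of movies they appear in), compress each set into a signature of k minimum hash values, and estimate Jaccard similarity from the fraction of agreeing positions. The affine hash family, the CRC32 item ids, k, and the function names are assumptions; the slides do not say which sets were hashed.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit item id

def make_hashes(k, seed=0):
    """k random affine hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

def signature(items, hashes):
    """MinHash signature: for each hash function, the minimum hash over the set.

    items must be non-empty; each item is mapped to a 32-bit id via CRC32.
    """
    ids = [zlib.crc32(str(x).encode("utf-8")) for x in items]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hashes]

def estimate_jaccard(sig1, sig2):
    """Fraction of agreeing positions estimates |A & B| / |A | B|."""
    return sum(s == t for s, t in zip(sig1, sig2)) / len(sig1)
```

Feature values whose estimated similarity exceeds a chosen threshold would become candidates for the same cluster; with k = 200 the standard error of the estimate is at most about 0.035.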