Mining Utility Functions Based on User Ratings
COMP5331
Sepanta Zeighami
Motivation
A hotel booking website: different users provide ratings for different hotels, and not all hotels are rated by all users.
Information is available on each hotel: its size, price, location, etc.
There is a trade-off between attributes, e.g., a hotel in a better location has a higher price.
Goal: understand how users' ratings are affected by the hotels' attributes. For example, what is the probability that a user chooses a hotel with a lower price over one with a bigger room in a better location?
This is useful for the website's management in offering customers more suitable options.
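Hotel example
Hotel attributes:
Hotel name | Price | Location
Holiday Inn | 8 | 5
Hilton | 2 | 10
Shangri La | 4 |

User ratings (not every user rates every hotel):
Alex: Holiday Inn 6.5, Hilton 6
Sam: 8.4, 5.6
Nick: 4.9, 4.4

Weights each user attaches to each attribute:
Name | Price | Location
Alex | 0.5 | 0.5
Sam | 0.2 | 0.8
Nick | 0.7 | 0.3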
Introduction
Given a set of ratings (scores) provided by users on a set of points, find how the users' judgement is affected by the attributes of the points.
First, we need to understand the decision-making process of each user: how much value does each user attach to each attribute of the points?
Then, we build a general probability distribution model based on that.
Related works
Recommender systems [1, 2]: predicting a new user's preferences based on previously collected information; grouping users by their similarities and predicting a user's preferences by assigning the user to a group.
Preference learning [3, 4]: predicting a user's preferences based on the information available about that user.
These approaches do not provide any information on how much a user values the different attributes of the items.
Understanding Each User: Utility Functions
The rating a user provides for a data point is called the utility the user derives from that point. Utility quantifies the “satisfaction” a user derives from the data point; we assume this satisfaction can be quantified.
Consider a set of points $D$ for which the user has provided ratings, i.e., for some points in $D$ we know the user's utility.
We associate with each user a function $f: D \to \mathbb{R}$, where for a point $p \in D$, $f(p)$ is a real number equal to the utility the user derives from $p$.
Understanding Each User: Linear Utility Functions
Let $|D| = n$. The utility function of a user, $f(p)$, can be written as an $n$-dimensional vector whose $i$-th element is the utility the user derives from the $i$-th point of the database.
Consider $D$ as a $d$-dimensional database, where each point $p$ is a vector and $p_i$ is its value in the $i$-th dimension.
We call a utility function linear if there exists a $d$-dimensional vector $w = (w_1, w_2, \ldots, w_d)$ for which $f(p) = \sum_{i=1}^{d} w_i \times p_i$; equivalently, $f(p) = p \cdot w$.
We call $w_i$ the weight the user attaches to the $i$-th dimension, and we can use $w$ to refer to the utility function $f$.
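As a minimal sketch of the definition, the snippet below evaluates $f(p) = p \cdot w$ for two hotels, using the (price, location) values from the hotel table and Sam's weights (0.2, 0.8). The function name `linear_utility` is hypothetical. The two results reproduce the values 5.6 and 8.4 that appear among Sam's ratings.

```python
from typing import Sequence


def linear_utility(p: Sequence[float], w: Sequence[float]) -> float:
    """Linear utility f(p) = sum_i w_i * p_i, i.e. the dot product p . w."""
    if len(p) != len(w):
        raise ValueError("point and weight vector must both be d-dimensional")
    return sum(p_i * w_i for p_i, w_i in zip(p, w))


# (price, location) attributes taken from the hotel table
hotels = {"Holiday Inn": (8, 5), "Hilton": (2, 10)}
sam_weights = (0.2, 0.8)  # Sam's weights for (price, location)

for name, point in hotels.items():
    print(name, round(linear_utility(point, sam_weights), 2))
# prints:
# Holiday Inn 5.6
# Hilton 8.4
```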
Understanding Each User: Modeling a User's Utility
We propose to use a linear model to capture the value a user attaches to each dimension of a data point. Although a linear model might not fit the data perfectly, it provides an understanding of the user's behavior.
The model assumes that a linear utility function can express the relationship between the points and the users' utilities.
Note that some utility functions might be completely independent of the points' attributes, but in general we expect to see a correlation; for example, a customer will usually consider the price of a hotel before booking it.
Using Linear Models for Utility Functions
For a utility function vector $f$, a matrix $X$ whose $i$-th row is the point vector $x_i$, and a weight vector $w$, we set $Xw = f$ and want to find a $w$ for which this holds.
The equations might be inconsistent, as users' utilities may not be perfectly linear. Moreover, we might observe the value of $f$ for only a few points in the database, so only the rows of $X$ with an observed rating can be used.
Least squares linear solution: find the $w$ with the least squared error, $\min_w \lVert Xw - f \rVert^2$.
Linear regression with Gaussian noise: find the $w$ that maximizes the likelihood of $f$.
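A minimal sketch of the least squares step in plain Python, assuming ratings are given as a list aligned with the points, with `None` marking an unrated point (a hypothetical convention). Only the rated rows enter the normal equations $X^\top X w = X^\top f$, which are solved here by Gaussian elimination. Under independent Gaussian noise with a common variance, maximizing the likelihood of $f$ yields this same $w$. The helper names and the unrated third point are hypothetical; for larger or ill-conditioned problems a QR-based solver such as `numpy.linalg.lstsq` would be the safer choice.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: the rated points do not determine w")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_weights(points: Sequence[Vector], ratings: Sequence[Optional[float]]) -> List[float]:
    """Least-squares w minimising the sum over rated points of (p . w - f(p))^2."""
    rated = [(p, f) for p, f in zip(points, ratings) if f is not None]
    d = len(points[0])
    # Normal equations (X^T X) w = X^T f, built only from the observed rows of X.
    xtx = [[sum(p[i] * p[j] for p, _ in rated) for j in range(d)] for i in range(d)]
    xtf = [sum(p[i] * f for p, f in rated) for i in range(d)]
    return solve_linear(xtx, xtf)


points = [(8, 5), (2, 10), (6, 6)]  # third point: a hypothetical hotel Alex did not rate
alex_ratings = [6.5, 6.0, None]
print([round(w, 3) for w in fit_weights(points, alex_ratings)])  # [0.5, 0.5]
```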
Building a Probability Distribution Using a Gaussian Mixture Model
Create a distribution of utility functions based on the inferred utility functions.
A Gaussian Mixture Model (GMM) divides customers into $k$ groups and assumes each group can be modeled by a multivariate Gaussian distribution. A value for $k$ needs to be found through trial and error.
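The slide does not say how the GMM is fitted; expectation-maximization (EM) is the usual choice. Below is a minimal plain-Python EM sketch, assuming diagonal covariance matrices, a deterministic farthest-point initialization of the means, and initial variances equal to the overall data variance (all simplifying assumptions, not necessarily what was used in the experiments). Each data point would be one user's inferred weight vector $w$. Refitting with several values of $k$ and comparing the fits is one way to carry out the trial and error. The user weight vectors in the usage example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def diag_gaussian_pdf(x: Vector, mean: Vector, var: Vector) -> float:
    """Density of a multivariate Gaussian with diagonal covariance diag(var)."""
    log_p = 0.0
    for x_i, m_i, v_i in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2 * math.pi * v_i) + (x_i - m_i) ** 2 / v_i)
    return math.exp(log_p)


def fit_gmm(data: List[Vector], k: int, iters: int = 100, min_var: float = 1e-6
            ) -> Tuple[List[float], List[List[float]], List[List[float]]]:
    """EM for a k-component GMM with diagonal covariances.

    Returns (mixing weights, means, per-dimension variances).
    """
    n, d = len(data), len(data[0])
    # Farthest-point initialisation: start at data[0], then repeatedly add the
    # point farthest from all means chosen so far.
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(
            sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    # Start every component with the overall per-dimension data variance.
    centre = [sum(x[i] for x in data) / n for i in range(d)]
    init_var = [max(min_var, sum((x[i] - centre[i]) ** 2 for x in data) / n)
                for i in range(d)]
    variances = [list(init_var) for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: responsibility of component j for each point
        resp = []
        for x in data:
            dens = [weights[j] * diag_gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in dens])
        # M-step: re-estimate weights, means and variances from responsibilities
        for j in range(k):
            n_j = sum(r[j] for r in resp) or 1e-300
            weights[j] = n_j / n
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / n_j
                        for i in range(d)]
            variances[j] = [max(min_var,
                                sum(r[j] * (x[i] - means[j][i]) ** 2
                                    for r, x in zip(resp, data)) / n_j)
                            for i in range(d)]
    return weights, means, variances


# Hypothetical (price, location) weight vectors for six users
users = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.85), (0.8, 0.2), (0.7, 0.3), (0.75, 0.25)]
mix, centres, spreads = fit_gmm(users, k=2)
print(mix, centres)
```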
Building a Probability Distribution Based on Distance from Samples
We assume the probability of a utility function taking a specific set of values depends on its distance from the samples. That is, given a sampled utility function vector $v$, we assume the probability of utility functions lying in a region follows a multivariate normal distribution $N(v, I)$, where $I$ is the identity matrix.
If we have $n$ samples $v_1, \ldots, v_n$, we use a mixture distribution consisting of $n$ Gaussian distributions $N(v_i, \frac{1}{n} I)$, each with probability $\frac{1}{n}$. The cumulative distribution function (cdf) is then
$F(x_1, x_2, \ldots, x_d) = \frac{1}{n} \sum_{i=1}^{n} \phi_i(x_1, x_2, \ldots, x_d)$,
where $\phi_i$ is the cdf of the $N(v_i, \frac{1}{n} I)$ normal distribution.
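A minimal sketch of the distance-based model using Python's `statistics.NormalDist`. Because each component's covariance $\frac{1}{n} I$ is diagonal, each $\phi_i$ factorizes into a product of one-dimensional normal cdfs with standard deviation $1/\sqrt{n}$; the same factorization gives the probability of an axis-aligned box. The function names are hypothetical, and the usage example simply takes the three users' weights from the hotel example as samples.

```python
import math
from statistics import NormalDist
from typing import Sequence

Vector = Sequence[float]


def mixture_cdf(x: Vector, samples: Sequence[Vector]) -> float:
    """F(x) = (1/n) * sum_i phi_i(x), phi_i = cdf of N(v_i, (1/n) I)."""
    n = len(samples)
    sigma = 1.0 / math.sqrt(n)  # covariance (1/n) I -> std dev 1/sqrt(n) per dimension
    total = 0.0
    for v in samples:
        # Diagonal covariance: the d-dimensional cdf is a product of 1-D cdfs.
        prod = 1.0
        for x_j, v_j in zip(x, v):
            prod *= NormalDist(mu=v_j, sigma=sigma).cdf(x_j)
        total += prod
    return total / n


def box_probability(lo: Vector, hi: Vector, samples: Sequence[Vector]) -> float:
    """Mixture probability that a utility vector lies in the box lo <= x <= hi."""
    n = len(samples)
    sigma = 1.0 / math.sqrt(n)
    total = 0.0
    for v in samples:
        prod = 1.0
        for l_j, h_j, v_j in zip(lo, hi, v):
            dist = NormalDist(mu=v_j, sigma=sigma)
            prod *= dist.cdf(h_j) - dist.cdf(l_j)
        total += prod
    return total / n


# Samples: the (price, location) weights of Alex, Sam and Nick
weights = [(0.5, 0.5), (0.2, 0.8), (0.7, 0.3)]
print(mixture_cdf((0.5, 0.5), weights))
print(box_probability((0.0, 0.5), (0.5, 1.0), weights))
```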
The Hotel Example
[Figure: two panels, "Clustering: Gaussian mixture model" and "Distance based model", with axes Price and Location, shaded from Low Prob. to High Prob.]
Experiments
We ran experiments on 20 samples drawn from a uniform distribution in 2 dimensions.
A GMM with 2 components returns 2 multivariate Gaussian distributions with means (7.99404513, 2.36299952) and (3.1791763, 6.74665699), each component with probability 0.5. In this model, $\Pr(5 \le x \le 6 \text{ and } 0 \le y \le 1)$ is less than $\Pr(7.5 \le x \le 8.5 \text{ and } 2 \le y \le 3)$, even though both 1×1 squares have the same probability under the original uniform distribution.
With the distance-based model, the probability of each of these 1×1 squares is the same, but more samples are needed for smaller units.
[Figure: original uniform distribution; both axes marked at 5 and 10.]
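The distance-based side of this comparison can be sketched as follows. The sketch assumes the uniform distribution is on $[0, 10]^2$ (suggested by the axis ticks in the figure), draws samples with a fixed seed, and compares the mixture probability of each 1×1 square with the uniform probability $1/100$. The sample sizes are illustrative, the helper names are hypothetical, the GMM side of the experiment is not reproduced, and no particular output is claimed since it depends on the random draw. With $n$ samples each component has covariance $\frac{1}{n} I$, so the estimate for a small square only becomes reliable once many samples fall near it.

```python
import math
import random
from statistics import NormalDist


def box_probability(lo, hi, samples):
    """Probability of the box lo <= x <= hi under the mixture of N(v_i, (1/n) I)."""
    n = len(samples)
    sigma = 1.0 / math.sqrt(n)
    total = 0.0
    for v in samples:
        prod = 1.0
        for l_j, h_j, v_j in zip(lo, hi, v):
            dist = NormalDist(mu=v_j, sigma=sigma)
            prod *= dist.cdf(h_j) - dist.cdf(l_j)
        total += prod
    return total / n


def run_experiment(n, side=10.0, seed=0):
    """Hypothetical setup: n points uniform on [0, side]^2, two 1x1 squares from the slide."""
    rng = random.Random(seed)
    samples = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    squares = {"5<=x<=6, 0<=y<=1": ((5, 0), (6, 1)),
               "7.5<=x<=8.5, 2<=y<=3": ((7.5, 2), (8.5, 3))}
    uniform_prob = 1.0 / side ** 2  # a 1x1 square out of a side x side region
    return {name: (box_probability(lo, hi, samples), uniform_prob)
            for name, (lo, hi) in squares.items()}


for n in (20, 2000):
    for name, (est, ref) in run_experiment(n).items():
        print(f"n={n:5d}  {name:22s}  model={est:.4f}  uniform={ref:.4f}")
```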
Summary
We have provided methods to understand how users evaluate the different characteristics of products.
Assuming a linear model for utility functions, we provided two methods (least squares and linear regression with Gaussian noise) to find how much value each user attaches to each attribute of the items.
Using these weights, we proposed two methods (a Gaussian mixture model and a distance-based model) to model the distribution of users' utility functions.
Thank you!
References
[1] A. M. Rashid, G. Karypis, and J. Riedl, "Learning preferences of new users in recommender systems: An information theoretic approach," SIGKDD Explor. Newsl., vol. 10, pp. 90-100, Dec. 2008.
[2] R. Burke, "Hybrid recommender systems: Survey and experiments," User Modeling and User-Adapted Interaction, 2002.
[3] W. Chu and Z. Ghahramani, "Preference learning with Gaussian processes," in Proceedings of the 22nd International Conference on Machine Learning, ICML '05, (New York, NY, USA), pp. 137-144, ACM, 2005.
[4] N. Houlsby, J. M. Hernandez-Lobato, F. Huszar, and Z. Ghahramani, "Collaborative Gaussian processes for preference learning," in Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS '12, (USA), pp. 2096-2104, Curran Associates Inc., 2012.