Mining Utility Functions based on user ratings

COMP5331, Sepanta Zeighami

Motivation
- A hotel booking website: different users provide ratings for different hotels, and not all hotels are rated by all users.
- Information is available on each hotel: its size, price, location, etc.
- There are trade-offs between attributes; e.g., a hotel in a better location has a higher price.
- Goal: understand how users' ratings are affected by the hotels' attributes.
- For example: what is the probability that a user chooses a hotel with a lower price over one with a bigger room in a better location?
- Useful for the website's management to provide more suitable options for customers.

Hotel example

Hotel attributes:

| Hotel name  | Price | Location |
|-------------|-------|----------|
| Holiday Inn | 8     | 5        |
| Hilton      | 2     | 10       |
| Shangri La  | 4     | –        |

User ratings (blank cells: hotel not rated by that user):

| Name | Holiday Inn | Hilton | Shangri La |
|------|-------------|--------|------------|
| Alex | 6.5         | 6      | –          |
| Sam  | 8.4         | 5.6    | –          |
| Nick | 4.9         | 4.4    | –          |

Inferred attribute weights:

| Name | Price | Location |
|------|-------|----------|
| Alex | 0.5   | 0.5      |
| Sam  | 0.2   | 0.8      |
| Nick | 0.7   | 0.3      |

Introduction
- Given a set of ratings or scores provided by users on a set of points, find how each user's judgement is affected by the different attributes of the points.
- First, understand the decision-making process of each user: how much value does the user attach to each attribute of the points?
- Then, build a general probability distribution model over users based on that.

Related works
- Recommender systems [1, 2]: predict a new user's preferences based on previous information, or group users by similarity and predict a user's preferences by assigning them to a group.
- Preference learning [3, 4]: predict a user's preferences based on information available about the user.
- Neither approach provides any information on how much a user values the different attributes of the items.

Understanding Each User: Utility Functions
- The rating a user provides for a data point is called the utility of the user from that point. Utility quantifies the "satisfaction" the user derives from the data point, assuming that satisfaction can be quantified.
- Consider a set of points D for which the user has provided ratings, i.e. for some points in D we know the utility of the user.
- We associate with each user a function f : D → ℝ, where for a point p ∈ D, f(p) is a real number equal to the utility the user derives from the point p.
(See the user ratings table in the hotel example above.)

Understanding Each User: Linear Utility Functions
- Let |D| = n. The utility function of a user, f, can be written as an n-dimensional vector whose i-th element is the utility the user derives from the i-th point of the database.
- Consider D as a d-dimensional database, where each point p is a vector and p_i is its value in the i-th dimension.
- We call a utility function linear if there exists a d-dimensional vector w = (w_1, w_2, …, w_d) for which f(p) = Σ_{i=1}^{d} w_i · p_i, or equivalently f(p) = p · w.
- We call w_i the weight the user attaches to the i-th dimension, and we can use w to refer to the utility function f (a small numeric sketch follows below).
(See the attribute weights table in the hotel example above.)
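As a minimal sketch of this definition, the snippet below evaluates f(p) = p · w on the hotel example, using Alex's weight vector (0.5, 0.5) and the two fully specified hotel attribute vectors; it reproduces Alex's ratings of 6.5 and 6.

```python
import numpy as np

# Hotel attribute vectors [price, location] from the example table.
hotels = np.array([
    [8.0, 5.0],   # Holiday Inn
    [2.0, 10.0],  # Hilton
])

# Alex's weight vector w: one weight per attribute dimension.
w = np.array([0.5, 0.5])

# Linear utility f(p) = p . w, evaluated for all hotels at once.
utilities = hotels @ w
print(utilities)  # [6.5, 6.0], matching Alex's ratings above
```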

Understanding Each User: Modeling a User's Utility
- We propose a linear model to capture the value a user attaches to each dimension of a data point. Although a linear model might not perfectly fit the data, it provides an interpretable picture of the user's behavior.
- The model assumes that a linear utility function can express the relationship between the points and the utility of the user.
- There might exist utility functions that are completely independent of the points' attributes, but in general we expect a correlation: a customer will usually consider the price of a hotel before booking it.

Using Linear Models for Utility Functions
- For a utility function vector f, a matrix X whose rows are the point vectors x_i, and a weight vector w, we set Xw = f and want to find a w for which this holds.
- The equations might be inconsistent, since users' utilities may not be perfectly linear, and we might only observe f for a few points in the database.
- Least-squares solution: find the w with the least squared error.
- Linear regression with Gaussian noise: find the w that maximizes the likelihood of f (see the sketch below).
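A minimal least-squares sketch with NumPy, assuming the hotel example's attributes and ratings plus one hypothetical third rating added for illustration. Note that under i.i.d. Gaussian noise on the ratings, the least-squares w is also the maximum-likelihood w, so one solver covers both bullets above.

```python
import numpy as np

# Rows of X are rated points [price, location]; f holds the user's
# observed ratings. The third row is hypothetical, for illustration.
X = np.array([[8.0, 5.0],
              [2.0, 10.0],
              [4.0, 7.0]])
f = np.array([6.5, 6.0, 5.5])

# Solve min_w ||Xw - f||^2. With i.i.d. Gaussian noise this is also
# the maximum-likelihood estimate of w.
w, residuals, rank, _ = np.linalg.lstsq(X, f, rcond=None)
print(w)  # inferred per-attribute weights, here ~[0.5, 0.5]
```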

Building a Probability Distribution: Gaussian Mixture Model
- Create a distribution over utility functions from the inferred weight vectors.
- A Gaussian mixture model (GMM) divides customers into k groups, assuming each group of customers can be modeled by a multivariate Gaussian distribution.
- A value for k needs to be found through trial and error (one concrete approach is sketched below).
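A sketch of this step using scikit-learn's GaussianMixture, assuming the per-user weight vectors have already been inferred (the values below are hypothetical). BIC is used here as one concrete stand-in for the trial-and-error choice of k.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# One inferred weight vector per user (hypothetical values);
# columns are the attribute weights, e.g. (price, location).
W = np.array([[0.5, 0.5], [0.2, 0.8], [0.7, 0.3],
              [0.6, 0.4], [0.25, 0.75], [0.45, 0.55]])

# Fit a mixture for several candidate k and keep the lowest-BIC model.
best = min(
    (GaussianMixture(n_components=k, random_state=0).fit(W) for k in (1, 2, 3)),
    key=lambda g: g.bic(W),
)
print(best.n_components, best.means_, best.weights_)
```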

Building a Probability Distribution: Based on Distance from Samples
- We assume the probability of a utility function taking a specific set of values depends on its distance from the observed samples.
- That is, given a sample utility function vector v, we assume the probability of utility functions existing in a region around it follows a multivariate normal distribution N(v, I), where I is the identity matrix.
- If we have n samples v_1, …, v_n, we use a mixture of n Gaussian distributions N(v_i, (1/n) · I), each with mixing probability 1/n.
- The cumulative distribution function (cdf) is F(x_1, x_2, …, x_d) = (1/n) Σ_{i=1}^{n} φ_i(x_1, x_2, …, x_d), where φ_i is the cdf of the N(v_i, (1/n) · I) distribution (a sketch follows below).
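A sketch of the distance-based mixture with SciPy, assuming the N(v_i, (1/n) · I) components above; the sample vectors are hypothetical, and the probability of a 2-D box is obtained from the mixture cdf by inclusion-exclusion.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sample utility-function vectors v_1..v_n (hypothetical, d = 2).
V = np.array([[0.5, 0.5], [0.2, 0.8], [0.7, 0.3]])
n, d = V.shape
cov = np.eye(d) / n  # covariance (1/n) * I of each mixture component

def mixture_cdf(x):
    """cdf of the equal-weight mixture of N(v_i, (1/n) I)."""
    return np.mean([multivariate_normal.cdf(x, mean=v, cov=cov) for v in V])

def box_prob(a, b):
    """Probability of the axis-aligned 2-D box [a1,b1] x [a2,b2]."""
    (a1, a2), (b1, b2) = a, b
    return (mixture_cdf([b1, b2]) - mixture_cdf([a1, b2])
            - mixture_cdf([b1, a2]) + mixture_cdf([a1, a2]))

print(box_prob((0.0, 0.0), (0.6, 0.6)))
```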

The Hotel Example
[Figure: price vs. location plots of the inferred weight vectors, comparing the clustering/Gaussian mixture model with the distance-based model; shading marks low- vs. high-probability regions.]

Experiments
- We ran experiments on 20 samples drawn from a uniform distribution over 2 dimensions.
- A GMM with 2 components returns two multivariate Gaussian distributions with means (7.99404513, 2.36299952) and (3.1791763, 6.74665699), each with mixing probability 0.5.
- In this model, Pr(5 ≤ x ≤ 6 and 0 ≤ y ≤ 1) is less than Pr(7.5 ≤ x ≤ 8.5 and 2 ≤ y ≤ 3).
- With the distance-based model, the probability of each of these 1×1 squares is the same, but more samples are needed to resolve smaller units (a sketch of the GMM run follows below).
[Figure: the original uniform distribution, with both axes ranging from 0 to 10]
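A sketch reproducing this experiment, assuming the uniform range is [0, 10]² (consistent with the figure's axes and the fitted means). The fitted means depend on the random draw, so they will not exactly match the means quoted above; the two box probabilities are estimated by sampling from the fitted mixture.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 10.0, size=(20, 2))  # 20 uniform 2-D samples

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.means_, gmm.weights_)  # component means and mixing weights

# Estimate Pr of the two 1x1 squares by sampling from the fitted model.
samples, _ = gmm.sample(200_000)

def box_prob(s, x0, x1, y0, y1):
    return np.mean((s[:, 0] >= x0) & (s[:, 0] <= x1) &
                   (s[:, 1] >= y0) & (s[:, 1] <= y1))

print(box_prob(samples, 5.0, 6.0, 0.0, 1.0),
      box_prob(samples, 7.5, 8.5, 2.0, 3.0))
```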

Summary We’ve provided methods to understand how different users evaluate different characteristics of different products. By assuming a linear model for utility functions, we’ve provided 2 methods to find out how much value different users attach to different attributes of different items. Using these weights, we proposed two methods to model the distribution of utility functions of users.

Thank you!

References
[1] A. M. Rashid, G. Karypis, and J. Riedl, "Learning preferences of new users in recommender systems: An information theoretic approach," SIGKDD Explor. Newsl., vol. 10, pp. 90-100, Dec. 2008.
[2] R. Burke, "Hybrid recommender systems: Survey and experiments," User Modeling and User-Adapted Interaction, 2002.
[3] W. Chu and Z. Ghahramani, "Preference learning with Gaussian processes," in Proceedings of the 22nd International Conference on Machine Learning (ICML '05), New York, NY, USA, pp. 137-144, ACM, 2005.
[4] N. Houlsby, J. M. Hernandez-Lobato, F. Huszar, and Z. Ghahramani, "Collaborative Gaussian processes for preference learning," in Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS '12), pp. 2096-2104, Curran Associates Inc., 2012.