Recommender Systems
Jia-Bin Huang, Virginia Tech
ECE-5424G / CS-5824, Spring 2019
Administrative: HW 4 due April 10
Unsupervised Learning: Clustering (K-means), Expectation maximization, Dimensionality reduction, Anomaly detection, Recommender systems
Motivating example: Monitoring machines in a data center
[Scatter plots of x_1 (CPU load) vs. x_2 (memory use)]
Multivariate Gaussian (normal) distribution
x ∈ R^n. Don't model p(x_1), p(x_2), … separately; model p(x) all in one go.
Parameters: μ ∈ R^n, Σ ∈ R^{n×n} (covariance matrix)
p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2) (x − μ)^⊤ Σ^{−1} (x − μ))
Multivariate Gaussian (normal) examples
[Contour plots of p(x) over (x_1, x_2) for different covariance matrices:]
Σ = [1 0; 0 1], Σ = [0.6 0; 0 0.6], Σ = [2 0; 0 2]  (shrinking / growing the overall spread)
Σ = [1 0; 0 1], Σ = [0.6 0; 0 1], Σ = [2 0; 0 1]  (changing the variance of x_1 only)
Σ = [1 0; 0 1], Σ = [1 0.5; 0.5 1], Σ = [1 0.8; 0.8 1]  (increasing positive correlation between x_1 and x_2)
Anomaly detection using the multivariate Gaussian distribution
1. Fit the model p(x) by setting
   μ = (1/m) ∑_{i=1}^{m} x^{(i)},   Σ = (1/m) ∑_{i=1}^{m} (x^{(i)} − μ)(x^{(i)} − μ)^⊤
2. Given a new example x, compute
   p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2) (x − μ)^⊤ Σ^{−1} (x − μ))
3. Flag an anomaly if p(x) < ε
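A minimal NumPy sketch of this procedure, assuming a training matrix X_train of shape (m, n) and a threshold epsilon chosen elsewhere (e.g., on a labeled cross-validation set); the data, names, and threshold below are illustrative, not part of the lecture.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate mu and Sigma from an (m, n) matrix of training examples."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]        # (1/m) * sum of (x - mu)(x - mu)^T
    return mu, Sigma

def multivariate_gaussian_pdf(X, mu, Sigma):
    """Evaluate p(x; mu, Sigma) for each row of X."""
    n = mu.shape[0]
    diff = X - mu
    # (x - mu)^T Sigma^{-1} (x - mu) for every row, without forming the explicit inverse
    quad = np.sum(diff * np.linalg.solve(Sigma, diff.T).T, axis=1)
    norm_const = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Toy usage: two correlated features, e.g. CPU load and memory use
rng = np.random.default_rng(0)
X_train = rng.multivariate_normal([2.0, 3.0], [[1.0, 0.8], [0.8, 1.0]], size=500)
mu, Sigma = fit_gaussian(X_train)

X_new = np.array([[2.1, 3.1],    # typical combination of the two features
                  [4.0, 1.0]])   # unusual combination of the two features
epsilon = 0.02                   # illustrative threshold
print(multivariate_gaussian_pdf(X_new, mu, Sigma) < epsilon)   # True marks an anomaly
```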
Original model vs. multivariate Gaussian
Original model: p(x_1; μ_1, σ_1^2) p(x_2; μ_2, σ_2^2) ⋯ p(x_n; μ_n, σ_n^2)
- Need to manually create features to capture anomalies where x_1, x_2 take unusual combinations of values
- Computationally cheaper (alternatively, scales better to large n)
- OK even if the training set size m is small
Multivariate Gaussian: p(x; μ, Σ) = (1 / ((2π)^{n/2} |Σ|^{1/2})) exp(−(1/2) (x − μ)^⊤ Σ^{−1} (x − μ))
- Automatically captures correlations between features
- Computationally more expensive
- Must have m > n, or else Σ is non-invertible
Recommender Systems: Motivation, Problem formulation, Content-based recommendations, Collaborative filtering, Mean normalization
You may also like…?
Recommender Systems: Motivation, Problem formulation, Content-based recommendations, Collaborative filtering, Mean normalization
Example: Predicting movie ratings
Users rate movies using zero to five stars.
n_u = no. of users, n_m = no. of movies
r(i, j) = 1 if user j has rated movie i
y^{(i,j)} = rating given by user j to movie i
[Ratings table: rows are the movies Love at last, Romance forever, Cute puppies of love, Nonstop car chases, Swords vs. karate; columns are the users Alice (1), Bob (2), Carol (3), Dave (4); entries are star ratings, or ? where the user has not rated the movie]
Recommender Systems: Motivation, Problem formulation, Content-based recommendations, Collaborative filtering, Mean normalization
Content-based recommender systems
[The same ratings table, extended with two per-movie feature columns: x_1 (romance) and x_2 (action), e.g. Romance forever has x_1 = 1.0, x_2 = 0.01]
For each user j, learn a parameter vector θ^{(j)} ∈ R^3. Predict that user j rates movie i with (θ^{(j)})^⊤ x^{(i)} stars.
Content-based recommender systems
For each user j, learn a parameter vector θ^{(j)} ∈ R^3. Predict that user j rates movie i with (θ^{(j)})^⊤ x^{(i)} stars.
Example: x^{(3)} = [1, 0.99, 0]^⊤ (Cute puppies of love, with the intercept feature x_0 = 1) and θ^{(1)} = [0, 5, 0]^⊤ (Alice), so the predicted rating is (θ^{(1)})^⊤ x^{(3)} = 5 × 0.99 = 4.95.
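A tiny sketch of this prediction step in NumPy, using the numbers from the worked example above (the variable names are mine):

```python
import numpy as np

# Features of movie 3 ("Cute puppies of love"): [intercept, romance, action]
x_3 = np.array([1.0, 0.99, 0.0])
# Learned parameters of user 1 (Alice)
theta_1 = np.array([0.0, 5.0, 0.0])

predicted_rating = theta_1 @ x_3   # (theta^(1))^T x^(3)
print(predicted_rating)            # 4.95
```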
Problem formulation
r(i, j) = 1 if user j has rated movie i
y^{(i,j)} = rating given by user j to movie i
θ^{(j)} = parameter vector for user j
x^{(i)} = feature vector for movie i
Predicted rating of movie i by user j: (θ^{(j)})^⊤ x^{(i)}
m^{(j)} = no. of movies rated by user j
Goal: learn θ^{(j)}:
min_{θ^{(j)}} (1 / (2 m^{(j)})) ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ / (2 m^{(j)})) ∑_{k=1}^{n} (θ_k^{(j)})^2
Optimization objective
To learn θ^{(j)} (the parameters for user j), drop the constant 1/m^{(j)} factor:
min_{θ^{(j)}} (1/2) ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{k=1}^{n} (θ_k^{(j)})^2
To learn θ^{(1)}, θ^{(2)}, …, θ^{(n_u)} for all users:
min_{θ^{(1)}, …, θ^{(n_u)}} (1/2) ∑_{j=1}^{n_u} ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{j=1}^{n_u} ∑_{k=1}^{n} (θ_k^{(j)})^2
Optimization algorithm
min_{θ^{(1)}, …, θ^{(n_u)}} (1/2) ∑_{j=1}^{n_u} ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{j=1}^{n_u} ∑_{k=1}^{n} (θ_k^{(j)})^2
Gradient descent updates:
θ_k^{(j)} := θ_k^{(j)} − α ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)}) x_k^{(i)}   (for k = 0)
θ_k^{(j)} := θ_k^{(j)} − α ( ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)}) x_k^{(i)} + λ θ_k^{(j)} )   (for k ≠ 0)
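A minimal NumPy sketch of these updates for all users at once, with the movie features held fixed. The array names and shapes are my assumptions: Y is an (n_m, n_u) rating matrix, R the matching 0/1 indicator matrix, and X an (n_m, n+1) feature matrix whose first column is the intercept x_0 = 1.

```python
import numpy as np

def learn_user_params(X, Y, R, lam=1.0, alpha=0.01, iters=500):
    """Gradient descent on theta^(j) for every user j, with movie features X fixed."""
    n_u = Y.shape[1]
    Theta = np.zeros((n_u, X.shape[1]))      # one parameter vector per user
    for _ in range(iters):
        E = (X @ Theta.T - Y) * R            # prediction errors, zeroed where r(i, j) = 0
        grad = E.T @ X + lam * Theta         # sum over rated movies plus regularization
        grad[:, 0] -= lam * Theta[:, 0]      # do not regularize the intercept term (k = 0)
        Theta -= alpha * grad
    return Theta
```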
Recommender Systems: Motivation, Problem formulation, Content-based recommendations, Collaborative filtering, Mean normalization
Problem motivation
[The ratings table again, with the per-movie feature columns x_1 (romance) and x_2 (action)]
Problem motivation
Now suppose the movie features are unknown: x^{(1)} = [?, ?, ?]^⊤.
But suppose we are given each user's parameters:
θ^{(1)} = [0, 5, 0]^⊤, θ^{(2)} = [0, 5, 0]^⊤, θ^{(3)} = [0, 0, 5]^⊤, θ^{(4)} = [0, 0, 5]^⊤
What feature vector x^{(1)} would make the predictions (θ^{(j)})^⊤ x^{(1)} match the observed ratings?
Optimization algorithm
Given θ^{(1)}, …, θ^{(n_u)}, to learn x^{(i)}:
min_{x^{(i)}} (1/2) ∑_{j: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{k=1}^{n} (x_k^{(i)})^2
Given θ^{(1)}, …, θ^{(n_u)}, to learn x^{(1)}, x^{(2)}, …, x^{(n_m)}:
min_{x^{(1)}, …, x^{(n_m)}} (1/2) ∑_{i=1}^{n_m} ∑_{j: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{i=1}^{n_m} ∑_{k=1}^{n} (x_k^{(i)})^2
Collaborative filtering
Given x^{(1)}, …, x^{(n_m)} (and the movie ratings), we can estimate θ^{(1)}, …, θ^{(n_u)}.
Given θ^{(1)}, …, θ^{(n_u)}, we can estimate x^{(1)}, …, x^{(n_m)}.
So we could alternate (θ → x → θ → x → ⋯), or, as on the next slides, minimize over both sets of variables at once.
Collaborative filtering optimization objective
Given x^{(1)}, …, x^{(n_m)}, estimate θ^{(1)}, …, θ^{(n_u)}:
min_{θ^{(1)}, …, θ^{(n_u)}} (1/2) ∑_{j=1}^{n_u} ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{j=1}^{n_u} ∑_{k=1}^{n} (θ_k^{(j)})^2
Given θ^{(1)}, …, θ^{(n_u)}, estimate x^{(1)}, …, x^{(n_m)}:
min_{x^{(1)}, …, x^{(n_m)}} (1/2) ∑_{i=1}^{n_m} ∑_{j: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{i=1}^{n_m} ∑_{k=1}^{n} (x_k^{(i)})^2
Collaborative filtering optimization objective
Instead of alternating between the two problems above, minimize over x^{(1)}, …, x^{(n_m)} and θ^{(1)}, …, θ^{(n_u)} simultaneously, using the combined cost J defined next.
Collaborative filtering optimization objective
J(x^{(1)}, …, x^{(n_m)}, θ^{(1)}, …, θ^{(n_u)}) = (1/2) ∑_{(i,j): r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{j=1}^{n_u} ∑_{k=1}^{n} (θ_k^{(j)})^2 + (λ/2) ∑_{i=1}^{n_m} ∑_{k=1}^{n} (x_k^{(i)})^2
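A short sketch of this cost in NumPy, under assumed shapes: X is (n_m, n) movie features, Theta is (n_u, n) user parameters, Y is the (n_m, n_u) rating matrix, and R marks which ratings exist.

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Collaborative filtering cost J(X, Theta) with regularization weight lam."""
    err = (X @ Theta.T - Y) * R        # squared-error terms only where r(i, j) = 1
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam * np.sum(Theta ** 2)
            + 0.5 * lam * np.sum(X ** 2))
```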
Collaborative filtering algorithm
1. Initialize x^{(1)}, …, x^{(n_m)}, θ^{(1)}, …, θ^{(n_u)} to small random values.
2. Minimize J(x^{(1)}, …, x^{(n_m)}, θ^{(1)}, …, θ^{(n_u)}) using gradient descent (or an advanced optimization algorithm). For every i = 1, …, n_m, j = 1, …, n_u, and every k:
x_k^{(i)} := x_k^{(i)} − α ( ∑_{j: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)}) θ_k^{(j)} + λ x_k^{(i)} )
θ_k^{(j)} := θ_k^{(j)} − α ( ∑_{i: r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)}) x_k^{(i)} + λ θ_k^{(j)} )
3. For a user with parameters θ and a movie with (learned) features x, predict a star rating of θ^⊤ x.
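A minimal sketch of step 2 as plain batch gradient descent, reusing the assumed Y and R arrays and the conventions from the cost sketch above; the hyperparameters are illustrative.

```python
import numpy as np

def cofi_gradient_descent(Y, R, n_features=10, lam=1.0, alpha=0.005, iters=1000, seed=0):
    """Learn movie features X and user parameters Theta by minimizing J with gradient descent."""
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    X = 0.1 * rng.standard_normal((n_m, n_features))       # small random initialization
    Theta = 0.1 * rng.standard_normal((n_u, n_features))
    for _ in range(iters):
        err = (X @ Theta.T - Y) * R                         # errors only on observed ratings
        X_grad = err @ Theta + lam * X                      # dJ/dX
        Theta_grad = err.T @ X + lam * Theta                # dJ/dTheta
        X -= alpha * X_grad
        Theta -= alpha * Theta_grad
    return X, Theta

# Predicted star rating of movie i by user j: X[i] @ Theta[j]
```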
Collaborative filtering
[The ratings table again: Love at last, Romance forever, Cute puppies of love, Nonstop car chases, Swords vs. karate, rated by Alice (1), Bob (2), Carol (3), Dave (4)]
Collaborative filtering
Predicted ratings: stack the movie features as the rows of X and the user parameters as the rows of Θ:
X = [ (x^{(1)})^⊤ ; (x^{(2)})^⊤ ; ⋯ ; (x^{(n_m)})^⊤ ],   Θ = [ (θ^{(1)})^⊤ ; (θ^{(2)})^⊤ ; ⋯ ; (θ^{(n_u)})^⊤ ]
Then the full matrix of predicted ratings is Y = X Θ^⊤. This is called low-rank matrix factorization.
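A small sketch of the vectorized prediction. The factor values below are made up for illustration (loosely echoing the romance/action features from the earlier slides), not taken from the lecture.

```python
import numpy as np

# Toy learned factors: 5 movies x 2 latent features, 4 users x 2 latent features
X = np.array([[0.90, 0.00],
              [1.00, 0.01],
              [0.99, 0.00],
              [0.10, 1.00],
              [0.00, 0.90]])
Theta = np.array([[5.0, 0.0],
                  [5.0, 0.0],
                  [0.0, 5.0],
                  [0.0, 5.0]])

predicted = X @ Theta.T      # (n_m, n_u) matrix of predicted ratings, Y = X Theta^T
print(predicted[2, 0])       # predicted rating of movie 3 by user 1: 4.95
```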
Finding related movies/products
For each product i, we learn a feature vector x^{(i)} ∈ R^n (x_1: romance, x_2: action, x_3: comedy, …).
How do we find movies j related to movie i? If the distance ‖x^{(i)} − x^{(j)}‖ is small, movies i and j are "similar".
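A short sketch of that lookup, reusing the toy X from the previous sketch; the movie titles and the choice of k are illustrative.

```python
import numpy as np

def most_similar(X, i, k=2):
    """Return the indices of the k movies whose learned features are closest to movie i."""
    dists = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    dists[i] = np.inf                          # exclude the movie itself
    return np.argsort(dists)[:k]

titles = ["Love at last", "Romance forever", "Cute puppies of love",
          "Nonstop car chases", "Swords vs. karate"]
X = np.array([[0.90, 0.00], [1.00, 0.01], [0.99, 0.00], [0.10, 1.00], [0.00, 0.90]])
for j in most_similar(X, i=0):
    print(titles[j])                           # movies most similar to "Love at last"
```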
Recommender Systems: Motivation, Problem formulation, Content-based recommendations, Collaborative filtering, Mean normalization
Users who have not rated any movies
[The ratings table with a fifth user, Eve (5), who has not rated any movie]
J = (1/2) ∑_{(i,j): r(i,j)=1} ((θ^{(j)})^⊤ x^{(i)} − y^{(i,j)})^2 + (λ/2) ∑_{j=1}^{n_u} ∑_{k=1}^{n} (θ_k^{(j)})^2 + (λ/2) ∑_{i=1}^{n_m} ∑_{k=1}^{n} (x_k^{(i)})^2
Since r(i, 5) = 0 for every movie i, only the regularization term involves θ^{(5)}, so minimizing J gives θ^{(5)} = [0, 0]^⊤ and every prediction (θ^{(5)})^⊤ x^{(i)} = 0.
Mean normalization
Compute each movie's mean rating μ_i over the users who rated it, subtract it from the ratings, and learn θ^{(j)}, x^{(i)} on the mean-normalized ratings.
For user j, on movie i, predict: (θ^{(j)})^⊤ x^{(i)} + μ_i
For user 5 (Eve): θ^{(5)} = [0, 0]^⊤, so the prediction is (θ^{(5)})^⊤ x^{(i)} + μ_i = μ_i, i.e. the movie's average rating.
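A compact sketch of the normalization step, under the same assumed Y and R arrays as before; cofi_gradient_descent refers to the earlier sketch, not to lecture code.

```python
import numpy as np

def normalize_ratings(Y, R):
    """Subtract each movie's mean rating, computed over its rated entries only."""
    counts = R.sum(axis=1)
    mu = (Y * R).sum(axis=1) / np.maximum(counts, 1)   # mean of the rated entries per movie
    Y_norm = (Y - mu[:, None]) * R                     # keep unrated entries at zero
    return Y_norm, mu

# Train on the normalized ratings, then add the mean back when predicting:
#   Y_norm, mu = normalize_ratings(Y, R)
#   X, Theta = cofi_gradient_descent(Y_norm, R)
#   prediction[i, j] = X[i] @ Theta[j] + mu[i]
```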
Recommender Systems: Motivation, Problem formulation, Content-based recommendations, Collaborative filtering, Mean normalization