1 Recommender Systems
Jia-Bin Huang, Virginia Tech · ECE-5424G / CS-5824 · Spring 2019

2 Administrative
HW 4 due April 10

3 Unsupervised Learning
Clustering (K-means) · Expectation maximization · Dimensionality reduction · Anomaly detection · Recommender systems

4 Motivating example: Monitoring machines in a data center
[Figure: two scatter plots of machines in the plane of x_1 (CPU load) vs. x_2 (memory use).]

5 Multivariate Gaussian (normal) distribution
x ∈ R^n. Don't model p(x_1), p(x_2), … separately; model p(x) all in one go.
Parameters: μ ∈ R^n, Σ ∈ R^{n×n} (covariance matrix)
$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$

6 Multivariate Gaussian (normal) examples
[Figure: three contour plots of p(x) over (x_1, x_2) for three covariance matrices Σ; the matrix entries are not recoverable from the transcript.]

7 Multivariate Gaussian (normal) examples
[Figure: three more contour plots for different covariance matrices Σ; entries not recoverable from the transcript.]

8 Multivariate Gaussian (normal) examples
[Figure: three more contour plots for different covariance matrices Σ; entries not recoverable from the transcript.]

9 Anomaly detection using the multivariate Gaussian distribution
1. Fit the model p(x) by setting
$\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} (x^{(i)}-\mu)(x^{(i)}-\mu)^\top$
2. Given a new example x, compute
$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1}(x-\mu)\right)$
3. Flag an anomaly if p(x) < ε.
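The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data; the training set, the two test points, and ε are all made up for the example:

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mu and Sigma (the two formulas above)."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = (diff.T @ diff) / m          # (1/m) * sum of (x - mu)(x - mu)^T
    return mu, Sigma

def multivariate_gaussian(X, mu, Sigma):
    """Evaluate p(x; mu, Sigma) for each row of X."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm * np.exp(-0.5 * np.sum((diff @ inv) * diff, axis=1))

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))      # made-up "healthy machine" data
mu, Sigma = fit_gaussian(X_train)
p = multivariate_gaussian(np.array([[0.0, 0.0], [8.0, -8.0]]), mu, Sigma)
epsilon = 1e-4
flags = p < epsilon                      # only the far-away point gets flagged
```

Note the fitted Σ is the full covariance matrix, so correlated features are handled automatically, which is the point of the next slide.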

10 Automatically captures correlations between features
Original model: $p(x_1;\mu_1,\sigma_1^2)\,p(x_2;\mu_2,\sigma_2^2)\cdots p(x_n;\mu_n,\sigma_n^2)$
- Must manually create features to capture anomalies where x_1, x_2 take unusual combinations of values
- Computationally cheaper (alternatively, scales better)
- OK even if the training set size m is small

Multivariate Gaussian model: $p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right)$
- Automatically captures correlations between features
- Computationally more expensive
- Must have m > n, or else Σ is non-invertible

11 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

12 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

13 You may also like..?

14 [Image slide; content not recoverable from the transcript.]

15 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

16 Example: Predicting movie ratings
A user rates movies using zero to five stars.
Notation: n_u = number of users; n_m = number of movies; r(i,j) = 1 if user j has rated movie i; y^{(i,j)} = rating given by user j to movie i.
[Table: ratings matrix — movies "Love at last", "Romance forever", "Cute puppies of love", "Nonstop car chases", "Swords vs. karate" (rows) by users Alice (1), Bob (2), Carol (3), Dave (4) (columns); most cell values were lost in transcription.]

17 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

18 Content-based recommender systems
[Table: the ratings matrix extended with two movie features, x_1 (romance) and x_2 (action). Surviving feature values: Love at last x_1 = 0.9; Romance forever x_1 = 1.0, x_2 = 0.01; Cute puppies of love x_1 = 0.99; Nonstop car chases x_1 = 0.1; the remaining cells were lost in transcription.]
For each user j, learn a parameter vector θ^{(j)} ∈ R^3. Predict that user j rates movie i with (θ^{(j)})^⊤ x^{(i)} stars.

19 Content-based recommender systems
[Table: the same ratings-plus-features matrix as the previous slide.]
Example: with x^{(3)} = [1, 0.99, 0]^⊤ (the feature vector for "Cute puppies of love", including the intercept term x_0 = 1) and θ^{(1)} = [0, 5, 0]^⊤, the predicted rating is
$(\theta^{(1)})^\top x^{(3)} = 5 \times 0.99 = 4.95$
For each user j, learn a parameter vector θ^{(j)} ∈ R^3. Predict that user j rates movie i with (θ^{(j)})^⊤ x^{(i)} stars.
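The prediction is just a dot product. A minimal check of the slide's arithmetic (the vector entries are reconstructed so the product matches the 5 × 0.99 = 4.95 shown; x_0 = 1 is the intercept feature):

```python
import numpy as np

# "Cute puppies of love": intercept 1, romance 0.99, action 0
x3 = np.array([1.0, 0.99, 0.0])
# Alice's learned parameters: weight 5 on the romance feature
theta1 = np.array([0.0, 5.0, 0.0])

predicted = theta1 @ x3        # (theta^(1))^T x^(3)
```

So a user whose parameters put all their weight on romance rates a near-pure romance movie almost five stars.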

20 Problem formulation
r(i,j) = 1 if user j has rated movie i
y^{(i,j)} = rating given by user j to movie i
θ^{(j)} = parameter vector for user j
x^{(i)} = feature vector for movie i
Predicted rating for user j on movie i: (θ^{(j)})^⊤ x^{(i)}
m^{(j)} = number of movies rated by user j
Goal: learn θ^{(j)}:
$\min_{\theta^{(j)}} \frac{1}{2m^{(j)}} \sum_{i:\, r(i,j)=1} \left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2m^{(j)}} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$

21 Optimization objective
Learn θ^{(j)} (parameters for user j):
$\min_{\theta^{(j)}} \frac{1}{2}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$
Learn θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}:
$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$

22 Optimization algorithm
$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$
Gradient descent update:
$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)x_k^{(i)}$  (for k = 0)
$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left(\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)x_k^{(i)} + \lambda\,\theta_k^{(j)}\right)$  (for k ≠ 0)
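These per-user updates can be sketched in NumPy. Everything here (the movie features, the ratings, α, λ, and the iteration count) is a hypothetical example, not data from the slides:

```python
import numpy as np

def user_gradient_step(theta, X, y, rated, alpha=0.01, lam=0.1):
    """One gradient step on one user's theta (intercept theta_0 unregularized).

    X: (n_m, n+1) movie features with X[:, 0] = 1 (intercept)
    y: (n_m,) this user's ratings; rated: boolean mask of r(i,j) = 1."""
    err = X[rated] @ theta - y[rated]       # (theta^T x^(i) - y^(i,j))
    grad = X[rated].T @ err                 # sum over rated movies of err * x^(i)
    grad[1:] += lam * theta[1:]             # regularize every k except k = 0
    return theta - alpha * grad

# Hypothetical data: 4 movies with one feature; 3 of them rated by this user
X = np.array([[1.0, 0.9], [1.0, 1.0], [1.0, 0.1], [1.0, 0.0]])
y = np.array([5.0, 5.0, 1.0, 0.0])
rated = np.array([True, True, True, False])

theta = np.zeros(2)
for _ in range(500):
    theta = user_gradient_step(theta, X, y, rated)

prediction = X[3] @ theta    # predicted rating for the unrated movie
```

Each user's regression is independent here; the collaborative-filtering slides below couple them by also learning the features.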

23 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

24 Problem motivation
[Table: the same ratings-plus-features matrix as before (surviving values: 5, 0.9, ?, 1.0, 0.01, 4, 0.99, 0.1). The question this sets up: where do the feature values x_1, x_2 come from?]

25 Problem motivation
Suppose instead each user tells us their preferences:
$\theta^{(1)} = \begin{bmatrix}0\\5\\0\end{bmatrix}, \quad \theta^{(2)} = \begin{bmatrix}0\\5\\0\end{bmatrix}, \quad \theta^{(3)} = \begin{bmatrix}0\\0\\5\end{bmatrix}, \quad \theta^{(4)} = \begin{bmatrix}0\\0\\5\end{bmatrix}$
[Table: the ratings matrix with the feature columns x_1 (romance), x_2 (action) now unknown.]
Given these θ's and the observed ratings, what feature vector x^{(1)} = [1, ?, ?]^⊤ is consistent?

26 Optimization algorithm
Given θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}, to learn x^{(i)}:
$\min_{x^{(i)}} \frac{1}{2}\sum_{j:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$
Given θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}, to learn x^{(1)}, x^{(2)}, …, x^{(n_m)}:
$\min_{x^{(1)},\dots,x^{(n_m)}} \frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$

27 Collaborative filtering
Given x^{(1)}, x^{(2)}, …, x^{(n_m)} (and the movie ratings), we can estimate θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}.
Given θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}, we can estimate x^{(1)}, x^{(2)}, …, x^{(n_m)}.
Iterating (θ → x → θ → x → …) refines both estimates.

28 Collaborative filtering optimization objective
Given x^{(1)}, x^{(2)}, …, x^{(n_m)}, estimate θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}:
$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \frac{1}{2}\sum_{j=1}^{n_u}\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2$
Given θ^{(1)}, θ^{(2)}, …, θ^{(n_u)}, estimate x^{(1)}, x^{(2)}, …, x^{(n_m)}:
$\min_{x^{(1)},\dots,x^{(n_m)}} \frac{1}{2}\sum_{i=1}^{n_m}\sum_{j:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$

29 Collaborative filtering optimization objective
Given x^{(1)}, …, x^{(n_m)}, estimate θ^{(1)}, …, θ^{(n_u)}; given θ^{(1)}, …, θ^{(n_u)}, estimate x^{(1)}, …, x^{(n_m)} (the two objectives above).
Instead, minimize over x^{(1)}, …, x^{(n_m)} and θ^{(1)}, …, θ^{(n_u)} simultaneously:
$J = \frac{1}{2}\sum_{(i,j):\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$

30 Collaborative filtering optimization objective
$J(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}) = \frac{1}{2}\sum_{(i,j):\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$

31 Collaborative filtering algorithm
1. Initialize x^{(1)}, …, x^{(n_m)}, θ^{(1)}, …, θ^{(n_u)} to small random values.
2. Minimize J(x^{(1)}, …, x^{(n_m)}, θ^{(1)}, …, θ^{(n_u)}) using gradient descent (or an advanced optimization algorithm). For every i = 1, …, n_m and j = 1, …, n_u:
$x_k^{(i)} := x_k^{(i)} - \alpha\left(\sum_{j:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)\theta_k^{(j)} + \lambda\,x_k^{(i)}\right)$
$\theta_k^{(j)} := \theta_k^{(j)} - \alpha\left(\sum_{i:\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)x_k^{(i)} + \lambda\,\theta_k^{(j)}\right)$
3. For a user with parameters θ and a movie with (learned) features x, predict a star rating of θ^⊤ x.
(Here there is no x_0 = 1 intercept term, so x, θ ∈ R^n and every component is regularized.)
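The whole algorithm fits in a few lines of NumPy by stacking all x^{(i)} into a matrix X and all θ^{(j)} into Θ, so both updates become matrix products. The ratings are synthetic and α, λ, and the iteration count are illustrative choices:

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """J from the previous slide, in matrix form."""
    E = (X @ Theta.T - Y) * R                # errors only where r(i,j) = 1
    return 0.5 * np.sum(E**2) + 0.5 * lam * (np.sum(X**2) + np.sum(Theta**2))

def cofi_step(X, Theta, Y, R, alpha, lam):
    """One simultaneous gradient step on all x^(i) and theta^(j)."""
    E = (X @ Theta.T - Y) * R
    X_new = X - alpha * (E @ Theta + lam * X)            # update every x^(i)
    Theta_new = Theta - alpha * (E.T @ X + lam * Theta)  # update every theta^(j)
    return X_new, Theta_new

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 2
Y = rng.integers(0, 6, size=(n_m, n_u)).astype(float)   # synthetic star ratings
R = (rng.random((n_m, n_u)) < 0.8).astype(float)        # which entries are observed
X = 0.1 * rng.standard_normal((n_m, n))                 # small random init
Theta = 0.1 * rng.standard_normal((n_u, n))

before = cofi_cost(X, Theta, Y, R, lam=0.1)
for _ in range(2000):
    X, Theta = cofi_step(X, Theta, Y, R, alpha=0.005, lam=0.1)
after = cofi_cost(X, Theta, Y, R, lam=0.1)

pred = X @ Theta.T           # predicted star rating theta^T x for every pair
```

The mask R is what makes this collaborative filtering rather than ordinary matrix regression: unobserved entries contribute nothing to the gradient.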

32 Collaborative filtering
[Table: the ratings matrix (five movies × users Alice (1)–Dave (4)); most cell values were lost in transcription.]

33 Collaborative filtering
Predicted ratings:
$X = \begin{bmatrix} -\,(x^{(1)})^\top- \\ -\,(x^{(2)})^\top- \\ \vdots \\ -\,(x^{(n_m)})^\top- \end{bmatrix}, \quad \Theta = \begin{bmatrix} -\,(\theta^{(1)})^\top- \\ -\,(\theta^{(2)})^\top- \\ \vdots \\ -\,(\theta^{(n_u)})^\top- \end{bmatrix}, \quad Y = X\,\Theta^\top$
This is known as low-rank matrix factorization.

34 Finding related movies/products
For each product i, we learn a feature vector x^{(i)} ∈ R^n (e.g., x_1: romance, x_2: action, x_3: comedy, …).
How do we find movies j related to movie i? If $\|x^{(i)} - x^{(j)}\|$ is small, movies i and j are "similar".
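This nearest-neighbor lookup is a one-liner over the learned feature matrix. The feature values below echo the romance/action numbers from the content-based slides, with made-up entries where the transcript lost them:

```python
import numpy as np

def most_related(i, X, k=1):
    """Indices of the k movies whose learned features are closest to movie i."""
    d = np.linalg.norm(X - X[i], axis=1)   # ||x^(i) - x^(j)|| for every j
    d[i] = np.inf                          # exclude the movie itself
    return np.argsort(d)[:k]

# Hypothetical learned features (x_1 romance, x_2 action):
X = np.array([[0.9, 0.0],    # Love at last
              [1.0, 0.01],   # Romance forever
              [0.99, 0.0],   # Cute puppies of love
              [0.1, 1.0],    # Nonstop car chases
              [0.0, 0.9]])   # Swords vs. karate

related = most_related(0, X, k=2)   # the other romance movies come out first
```

Note `d[i] = np.inf` modifies a fresh array returned by `np.linalg.norm`, so the caller's X is untouched.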

35 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

36 Users who have not rated any movies
[Table: the ratings matrix with a fifth user, Eve (5), who has not rated any movies.]
$\frac{1}{2}\sum_{(i,j):\, r(i,j)=1}\left((\theta^{(j)})^\top x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$
Since r(i,5) = 0 for all i, only the regularization term involves θ^{(5)}, so minimizing gives $\theta^{(5)} = \begin{bmatrix}0\\0\end{bmatrix}$.

37 Users who have not rated any movies
[Same table: Eve (5) has no ratings.] With $\theta^{(5)} = \begin{bmatrix}0\\0\end{bmatrix}$, every predicted rating $(\theta^{(5)})^\top x^{(i)} = 0$ — the system predicts zero stars for every movie for Eve, which is not useful.

38 Mean normalization
Compute the mean rating μ_i of each movie i (over the users who rated it) and subtract it, then learn θ^{(j)}, x^{(i)} on the normalized ratings y^{(i,j)} − μ_i.
For user j on movie i, predict: $(\theta^{(j)})^\top x^{(i)} + \mu_i$
For user 5 (Eve): $\theta^{(5)} = \begin{bmatrix}0\\0\end{bmatrix}$, so the prediction $(\theta^{(5)})^\top x^{(i)} + \mu_i = \mu_i$ — each movie's average rating.
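The normalization step can be sketched as follows. The ratings matrix here is hypothetical, and unrated cells are simply zeros masked out by R:

```python
import numpy as np

Y = np.array([[5.0, 5.0, 0.0, 0.0],      # hypothetical star ratings
              [5.0, 0.0, 0.0, 0.0],
              [0.0, 4.0, 0.0, 0.0]])
R = np.array([[1, 1, 1, 1],              # 1 where the rating was actually given
              [1, 0, 1, 1],
              [0, 1, 1, 0]], dtype=float)

mu = (Y * R).sum(axis=1) / R.sum(axis=1)   # per-movie mean over rated entries only
Y_norm = (Y - mu[:, None]) * R             # subtract mu_i where r(i,j) = 1

# Learn theta^(j), x^(i) on Y_norm, then predict (theta^(j))^T x^(i) + mu_i.
# A user with no ratings ends up with theta ~ 0, so their prediction is just mu_i.
```

Dividing by `R.sum(axis=1)` rather than the number of users is the key detail: the mean is taken only over entries with r(i,j) = 1.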

39 Recommender Systems
Motivation · Problem formulation · Content-based recommendations · Collaborative filtering · Mean normalization

