Groove Radio: A Bayesian Hierarchical Model for Personalized Playlist Generation. Shay Ben-Elazar, Gal Lavee, Noam Koenigstein, Oren Barkan, Hilik Berezin, Ulrich Paquet, Tal Zaccai. ACM Conference on Web Search and Data Mining (WSDM'17), Cambridge UK, February 2017. Presented by: Noam Koenigstein
Groove Radio
Confidential Microsoft Corporation
The Task. Goal: given a seed artist, generate a track playlist. Millions of users, tens of millions of tracks. Support different types of similarity. Personalization. Real-world online execution.
How can we choose the next track? Goal: given a seed artist, generate a track playlist. [Diagram: seed artist, Track 1, Track 2, …, Track i-1, Track i, Track i+1, with the context marked.] Label: $r_i \in \{0, 1\}$. Model: $P(r_i \mid \mathbf{x}_i)$, where $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \dots, x_{i,d})$.
Creating Playlists – A Classification Problem. Let $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \dots, x_{i,d})$ denote a feature vector encoding the proposition of appending a particular track $i$ to a playlist. Features are defined relative to a “context”, which includes the seed artist and the previously chosen tracks. The label $r_i \in \{0, 1\}$ indicates the success or failure of the proposition encoded by the feature vector. We build a generative model to predict the success of a proposition.
Types of Similarity - Usage
Types of Similarity - Audio Audio Features: Spectral distribution with GMMs: Defining acoustic similarity:
Types of similarity – Meta-data
Types of similarity – Meta-data Warm Provocative
Types of Similarity - Popularity: $\dfrac{\text{number of users who consumed a track by } a_1}{\text{total users in the dataset}}$
The classification problem. [Diagram: seed artist, Track 1, Track 2, …, Track i-1, Track i, Track i+1, with the context marked.] Label: $r_i \in \{0, 1\}$. Model: $P(r_i \mid \mathbf{x}_i)$, where $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \dots, x_{i,d})$.
The classification problem. Context: previous tracks in playlist, seed artist, candidate track. Similarities shown: candidate artist to seed artist similarity; candidate artist to previous artist similarity; candidate track to previous track similarity.
A Naïve Solution. Simple logistic regression model: $P(r_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{w}^\top \mathbf{x}_i)$, where $\sigma(z) = \dfrac{1}{1 + \exp(-z)}$. We can create a playlist by choosing the candidate track with the largest $P(r_i = 1 \mid \mathbf{x}_i)$. Each weight $w_j$ indicates the relative importance of the feature $x_{i,j}$ in determining the success of the candidate track $i$.
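The naive selection rule can be sketched in a few lines of Python. This is only an illustration: the weights, the track ids, and the three-dimensional feature vectors are made up, and the helper names (`score`, `pick_next_track`) are hypothetical rather than taken from the talk.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    """P(r_i = 1 | x_i) = sigma(w^T x_i) for a single candidate."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def pick_next_track(w, candidates):
    """Return the candidate id with the largest predicted success probability.

    `candidates` maps a (hypothetical) track id to its feature vector x_i.
    """
    return max(candidates, key=lambda track: score(w, candidates[track]))

# Illustrative weights and features (made up, not from the talk).
w = [2.0, 1.0, -0.5]
candidates = {
    "track_a": [0.9, 0.2, 0.1],
    "track_b": [0.1, 0.8, 0.3],
    "track_c": [0.5, 0.5, 0.9],
}
best = pick_next_track(w, candidates)
```

Because a single weight vector is shared by every seed artist and every user, this rule ranks candidates the same way in every context that produces the same features.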
Different models for different artists
Different models for different users
Our Approach. We want to construct a model with the following properties: affords music domain heterogeneity; affords user personalization; deals gracefully with “coldness”. We achieve this by using the following: the well-understood hierarchical taxonomy of the music domain; a generative Bayesian approach with informative priors; variational Bayes inference to model uncertainty.
The Music Domain Taxonomy
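Hierarchical Model
Naïve model: $\Pr(r_i = 1 \mid \underbrace{\mathbf{x}_i}_{\text{context}}) = \sigma(\mathbf{x}_i^\top \mathbf{w})$, with $\Pr(\mathbf{w} \mid \tau_w) = \mathcal{N}(\mathbf{w}; \mathbf{0}, \tau_w^{-1} \mathbf{I})$.
Genre model: $\Pr(r_i = 1 \mid \underbrace{\mathbf{x}_i}_{\text{context}}) = \sigma(\mathbf{x}_i^\top \mathbf{w}^{(g)}_{g_i})$, with $\Pr(\mathbf{w}^{(g)}_{g_i} \mid \mathbf{w}, \tau_g) = \mathcal{N}(\mathbf{w}^{(g)}_{g_i}; \mathbf{w}, \tau_g^{-1} \mathbf{I})$.
Sub-genre model: $\Pr(r_i = 1 \mid \underbrace{\mathbf{x}_i}_{\text{context}}) = \sigma(\mathbf{x}_i^\top \mathbf{w}^{(s)}_{s_i})$, with $\Pr(\mathbf{w}^{(s)}_{s_i} \mid \mathbf{w}^{(g)}_{g_i}, \tau_s) = \mathcal{N}(\mathbf{w}^{(s)}_{s_i}; \mathbf{w}^{(g)}_{g_i}, \tau_s^{-1} \mathbf{I})$.
Artist model: $\Pr(r_i = 1 \mid \underbrace{\mathbf{x}_i}_{\text{context}}) = \sigma(\mathbf{x}_i^\top \mathbf{w}^{(a)}_{a_i})$, with $\Pr(\mathbf{w}^{(a)}_{a_i} \mid \mathbf{w}^{(s)}_{s_i}, \tau_a) = \mathcal{N}(\mathbf{w}^{(a)}_{a_i}; \mathbf{w}^{(s)}_{s_i}, \tau_a^{-1} \mathbf{I})$.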
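Hierarchical Model Cont.
Fully hierarchical model: $\Pr(r_i = 1 \mid \underbrace{\mathbf{x}_i}_{\text{context}}) = \sigma(\mathbf{x}_i^\top \mathbf{w}^{(a)}_{a_i})$
$\Pr(\mathbf{w} \mid \tau_w) = \mathcal{N}(\mathbf{w}; \mathbf{0}, \tau_w^{-1} \mathbf{I})$
$\Pr(\mathbf{w}^{(g)}_{g_i} \mid \mathbf{w}, \tau_g) = \mathcal{N}(\mathbf{w}^{(g)}_{g_i}; \mathbf{w}, \tau_g^{-1} \mathbf{I})$
$\Pr(\mathbf{w}^{(s)}_{s_i} \mid \mathbf{w}^{(g)}_{g_i}, \tau_s) = \mathcal{N}(\mathbf{w}^{(s)}_{s_i}; \mathbf{w}^{(g)}_{g_i}, \tau_s^{-1} \mathbf{I})$
$\Pr(\mathbf{w}^{(a)}_{a_i} \mid \mathbf{w}^{(s)}_{s_i}, \tau_a) = \mathcal{N}(\mathbf{w}^{(a)}_{a_i}; \mathbf{w}^{(s)}_{s_i}, \tau_a^{-1} \mathbf{I})$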
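To make the chain of priors concrete, here is a minimal generative sketch that samples one weight vector per taxonomy level, each centred on its parent. The dimension, the precision values, and the helper name `sample_child` are illustrative assumptions; the talk infers these quantities from data rather than sampling them.

```python
import random

def sample_child(parent, tau, rng):
    """Draw w_child ~ N(parent, (1/tau) I), one coordinate at a time."""
    std = tau ** -0.5  # isotropic covariance (1/tau) I means std = 1/sqrt(tau)
    return [rng.gauss(mu, std) for mu in parent]

rng = random.Random(0)
d = 3  # number of features (illustrative)
tau_w, tau_g, tau_s, tau_a = 1.0, 4.0, 16.0, 64.0  # illustrative precisions

w_global = sample_child([0.0] * d, tau_w, rng)   # w ~ N(0, I / tau_w)
w_genre = sample_child(w_global, tau_g, rng)     # genre weights centred on w
w_subgenre = sample_child(w_genre, tau_s, rng)   # sub-genre centred on its genre
w_artist = sample_child(w_subgenre, tau_a, rng)  # artist centred on its sub-genre
```

Under these priors, the expected weight vector of an artist with no observations is simply its sub-genre vector, which is one way to read the "coldness" property: sparse levels fall back on their parent in the taxonomy.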
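Personalized Model
Per-user parameters: $\mathbf{w}_{ua} = \mathbf{w}_a + \mathbf{w}_u$
$\Pr(r_i = 1 \mid \underbrace{\mathbf{x}_i}_{\text{context}}) = \sigma\left(\mathbf{x}_i^\top (\mathbf{w}^{(a)}_{a_i} + \mathbf{w}^{(u)}_{u_i})\right)$
$\Pr(\mathbf{w}^{(u)}_{u_i} \mid \tau_u) = \mathcal{N}(\mathbf{w}^{(u)}_{u_i}; \mathbf{0}, \tau_u^{-1} \mathbf{I})$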
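The additive user offset can be illustrated with a small prediction function. This is a sketch with made-up numbers; the function name `predict` and the choice to default an unknown user to a zero offset (the mean of the zero-centred user prior) are assumptions for illustration.

```python
import math

def predict(x, w_artist, w_user=None):
    """sigma(x^T (w_a + w_u)); a missing user offset defaults to the prior mean 0."""
    if w_user is None:
        w_user = [0.0] * len(x)
    z = sum(xj * (wa + wu) for xj, wa, wu in zip(x, w_artist, w_user))
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values (not from the talk).
x = [1.0, 0.5]
w_artist = [0.4, -0.2]
w_user = [0.6, 0.2]

p_generic = predict(x, w_artist)           # artist-level prediction only
p_personal = predict(x, w_artist, w_user)  # shifted by the user's offset
```

Since the user prior is centred at zero, a user with little listening history stays close to the artist-level prediction.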
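Graphical Model [Figure: plate diagram. Nodes: features $\mathbf{x}_i$, label $r_i$, global weights $\mathbf{w}$, genre weights $\mathbf{w}^{(g)}_{g_i}$, sub-genre weights $\mathbf{w}^{(s)}_{s_i}$, artist weights $\mathbf{w}^{(a)}_{a_i}$, user weights $\mathbf{w}^{(u)}_{u_i}$, precisions $\tau_u, \tau_a, \tau_s, \tau_g, \tau_w$, hyperparameters $\alpha, \beta$. Plates: #Genres, #Subgenres, #Artists, #Users, #Data.]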
The Joint Probability
Inference Approaches: MAP (maximum a posteriori); Laplace; Mean field / Variational Bayes (VB); Expectation Propagation (EP); Markov chain Monte Carlo (MCMC). [Figure labels: $\boldsymbol{\theta}$, $\boldsymbol{\theta}^*$]
Learning Artists
Learning Users
Learning Sub-Genres
Learning Genres
The Global Prior
The Precision Parameters
Practical Considerations. We wish to ensure different playlists even for similar activations. We pre-compute a candidate list of $M = 1000$ tracks for each seed artist. Discrete multinomial transition probabilities are computed with the softmax function: $p_m = \dfrac{e^{s \cdot r_m}}{\sum_{i=1}^{M} e^{s \cdot r_i}}$. The parameter $s$ tunes the desired degree of diversity.
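The softmax transition step can be sketched as follows, assuming $r_m$ is the model's score for candidate $m$. The function names and the example scores are hypothetical; the max-shift is a standard numerical-stability trick and does not change the probabilities.

```python
import math
import random

def transition_probs(scores, s):
    """p_m = exp(s * r_m) / sum_i exp(s * r_i), computed stably."""
    logits = [s * r for r in scores]
    top = max(logits)  # shifting by the max leaves the ratios unchanged
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next(scores, s, rng):
    """Sample the index of the next track from the softmax distribution."""
    probs = transition_probs(scores, s)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

scores = [0.9, 0.6, 0.3]                  # illustrative candidate scores
flat = transition_probs(scores, s=0.0)    # s = 0: uniform, maximal diversity
sharp = transition_probs(scores, s=20.0)  # large s: nearly always the top score
choice = sample_next(scores, s=5.0, rng=random.Random(0))
```

Small values of `s` spread probability across the candidate list, while large values approach the greedy arg-max choice.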
Datasets. Groove Music: a proprietary dataset from the Groove music service. Positive labels are assigned to ‘true’ transitions in a user’s listening history, where both tracks were played to completion. Negative labels indicate transitions where the second track was skipped mid-play. 30Music: a publicly available dataset of user playlists. Positive labels are assigned to tracks appearing in a playlist. Negatively labeled examples were obtained by uniformly sampling from tracks that did not appear.
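The Groove Music labeling rule can be sketched as below. The record format (a list of `(track_id, completed)` pairs in play order) is hypothetical, and leaving a transition unlabeled when the first track was skipped but the second was completed is an assumption, since the slide does not cover that case.

```python
def label_transitions(history):
    """Turn a listening history into (prev_track, next_track, label) examples.

    `history` is a list of (track_id, completed) pairs in play order; this
    record format is hypothetical.
    """
    examples = []
    for (prev, prev_done), (nxt, nxt_done) in zip(history, history[1:]):
        if not nxt_done:
            examples.append((prev, nxt, 0))  # second track skipped mid-play
        elif prev_done:
            examples.append((prev, nxt, 1))  # both tracks played to completion
        # first skipped, second completed: left unlabeled (assumption)
    return examples

history = [("t1", True), ("t2", True), ("t3", False), ("t4", True)]
examples = label_transitions(history)
```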
Dataset Statistics
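Groove Music Dataset [Figure; model diagram shows: label $r_i$, $\mathbf{x}_i$, $\mathbf{w}$]
Groove Music Dataset [Figure; model diagram shows: label $r_i$, $\mathbf{x}_i$, $\mathbf{w}$, $\mathbf{w}^{(g)}_{g_i}$]
Groove Music Dataset [Figure; model diagram shows: label $r_i$, $\mathbf{x}_i$, $\mathbf{w}$, $\mathbf{w}^{(g)}_{g_i}$, $\mathbf{w}^{(s)}_{s_i}$]
Groove Music Dataset [Figure; model diagram shows: label $r_i$, $\mathbf{x}_i$, $\mathbf{w}$, $\mathbf{w}^{(g)}_{g_i}$, $\mathbf{w}^{(s)}_{s_i}$, $\mathbf{w}^{(a)}_{a_i}$]
Groove Music Dataset [Figure; model diagram shows: label $r_i$, $\mathbf{x}_i$, $\mathbf{w}$, $\mathbf{w}^{(g)}_{g_i}$, $\mathbf{w}^{(s)}_{s_i}$, $\mathbf{w}^{(a)}_{a_i}$, $\mathbf{w}^{(u)}_{u_i}$, $\tau_u, \tau_a, \tau_s, \tau_g, \tau_w$, $\alpha, \beta$]
30Music Dataset [Figure; model diagram shows: label $r_i$, $\mathbf{x}_i$, $\mathbf{w}$, $\mathbf{w}^{(g)}_{g_i}$, $\mathbf{w}^{(s)}_{s_i}$, $\mathbf{w}^{(a)}_{a_i}$, $\mathbf{w}^{(u)}_{u_i}$, $\tau_u, \tau_a, \tau_s, \tau_g, \tau_w$, $\alpha, \beta$]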
Feature Contribution
Conclusions. We described a real-world playlist generation algorithm that: accounts for the heterogeneity across artists and genres; supports personalization; handles “coldness” gracefully. It is a Bayesian model that utilizes the domain’s taxonomy, with efficient variational Bayes inference.
Thank You!