
Groove Radio: A Bayesian Hierarchical Model for Personalized Playlist Generation Shay Ben-Elazar, Gal Lavee, Noam Koenigstein, Oren Barkan, Hilik Berezin, Ulrich Paquet, Tal Zaccai ACM Conference on Web Search and Data Mining (WSDM'17), Cambridge, UK, February 2017. Presented by: Noam Koenigstein

Groove Radio

The Task
Goal: Given a seed artist, generate a track playlist.
- Millions of users, tens of millions of tracks
- Support for different types of similarity
- Personalization
- Real-world online execution

How can we choose the next track? Goal: Given a seed artist, generate a track playlist. [Slide diagram: the context (the seed artist and previous tracks $1, \ldots, i-1$) feeds a model $P(r_i \mid \mathbf{x}_i)$ that scores candidate track $i$, described by a feature vector $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ and a label $r_i \in \{0,1\}$.]

Creating Playlists – A Classification Problem
Let $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ denote a feature vector encoding the proposition of appending a particular track $i$ to a playlist.
Features are defined relative to a "context", which includes the seed artist and previously chosen tracks.
The label $r_i \in \{0,1\}$ indicates the success or failure of the proposition encoded by the feature vector.
We build a generative model to predict the success of a proposition.

Types of Similarity - Usage

Types of Similarity - Audio Audio features capture a track's spectral distribution, modeled with Gaussian mixture models (GMMs); acoustic similarity is then defined between these distributions. [The slide's equations are images and not recoverable from the transcript.]
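Since the slide's exact formulas are lost, here is an illustrative sketch only: fitting a per-track GMM to spectral frames with scikit-learn and comparing two tracks with a Monte Carlo symmetrized KL divergence. The divergence-based measure is an assumption standing in for the paper's actual similarity definition.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_spectral_gmm(frames, n_components=8, seed=0):
    # frames: array of shape (n_frames, n_bins), one spectral feature
    # vector per audio frame of a single track.
    return GaussianMixture(n_components=n_components, random_state=seed).fit(frames)

def acoustic_similarity(gmm_a, gmm_b, n_samples=2000):
    # Monte Carlo estimate of a symmetrized KL divergence between the two
    # spectral distributions, negated so that higher means "more similar".
    xa, _ = gmm_a.sample(n_samples)
    xb, _ = gmm_b.sample(n_samples)
    kl_ab = np.mean(gmm_a.score_samples(xa) - gmm_b.score_samples(xa))
    kl_ba = np.mean(gmm_b.score_samples(xb) - gmm_a.score_samples(xb))
    return -(kl_ab + kl_ba)
```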

Types of Similarity – Meta-data [Slide shows editorial meta-data tags, e.g., "Warm" and "Provocative", used to define similarity.]

Types of Similarity - Popularity The popularity of artist $a_1$ is the ratio $$\mathrm{pop}(a_1) = \frac{\#\text{users who consumed a track by } a_1}{\#\text{users in the dataset}}$$
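A trivial sketch of this feature, assuming a hypothetical map from each artist to the set of users who consumed a track by them:

```python
def artist_popularity(users_by_artist, total_users, artist):
    # Popularity = fraction of all users in the dataset who consumed
    # at least one track by the given artist.
    return len(users_by_artist.get(artist, ())) / total_users
```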

The Classification Problem [Slide diagram: features relate the candidate track to the context (the seed artist and previous tracks in the playlist): candidate-artist-to-seed-artist similarity, candidate-artist-to-previous-artists similarity, and candidate-track-to-previous-tracks similarity; a hypothetical assembly is sketched below.]
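A hedged sketch of how such a context-relative feature vector might be assembled. The similarity functions and the aggregation by mean over previous items are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def build_features(cand_artist, cand_track, seed_artist,
                   prev_artists, prev_tracks,
                   artist_sims, track_sims):
    # artist_sims / track_sims: hypothetical similarity functions, one per
    # similarity type (usage, audio, meta-data, popularity); each maps a
    # pair of ids to a float score. prev_* are assumed non-empty.
    x = []
    for sim in artist_sims:
        x.append(sim(cand_artist, seed_artist))                         # vs. seed artist
        x.append(np.mean([sim(cand_artist, a) for a in prev_artists]))  # vs. previous artists
    for sim in track_sims:
        x.append(np.mean([sim(cand_track, t) for t in prev_tracks]))    # vs. previous tracks
    return np.asarray(x)
```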

A Naïve Solution
A simple logistic regression model: $P(r_i = 1 \mid \mathbf{x}_i) = \sigma(\mathbf{w}^T \mathbf{x}_i)$, where $\sigma(z) = \frac{1}{1 + \exp(-z)}$.
We can create a playlist by repeatedly choosing the candidate track with the largest $P(r_i = 1 \mid \mathbf{x}_i)$.
Each weight $w_j$ indicates the relative importance of feature $x_{i,j}$ in determining the success of candidate track $i$.
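A minimal sketch of this naïve solution, assuming a trained weight vector `w` and a matrix holding one feature vector per candidate track:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pick_next_track(w, X_candidates):
    # Score every candidate feature vector with the logistic model and
    # return the index of the highest predicted success probability.
    p = sigmoid(X_candidates @ w)
    return int(np.argmax(p))
```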

Different models for different artists

Different models for different users

Our Approach
We want to construct a model with the following properties:
- Affords music-domain heterogeneity
- Affords user personalization
- Deals gracefully with "coldness"
We achieve this by:
- Leveraging the well-understood hierarchical taxonomy of the music domain
- A generative Bayesian approach with informative priors
- Variational Bayes inference to model uncertainty

The Music Domain Taxonomy [Slide figure: the Genre → Sub-genre → Artist hierarchy that the model exploits.]

Hierarchical Model
Naïve model: $\Pr(r_i = 1 \mid \mathrm{context}(\mathbf{x}_i)) = \sigma(\mathbf{x}_i^T \mathbf{w})$, with prior $\Pr(\mathbf{w} \mid \tau_w) = \mathcal{N}(\mathbf{w}; \mathbf{0}, \tfrac{1}{\tau_w}\mathbf{I})$
Genre model: $\Pr(r_i = 1 \mid \mathrm{context}(\mathbf{x}_i)) = \sigma(\mathbf{x}_i^T \mathbf{w}^{(g)}_{g_i})$, with $\Pr(\mathbf{w}^{(g)}_{g_i} \mid \mathbf{w}, \tau_g) = \mathcal{N}(\mathbf{w}^{(g)}_{g_i}; \mathbf{w}, \tfrac{1}{\tau_g}\mathbf{I})$
Sub-genre model: $\Pr(r_i = 1 \mid \mathrm{context}(\mathbf{x}_i)) = \sigma(\mathbf{x}_i^T \mathbf{w}^{(s)}_{s_i})$, with $\Pr(\mathbf{w}^{(s)}_{s_i} \mid \mathbf{w}^{(g)}_{g_i}, \tau_s) = \mathcal{N}(\mathbf{w}^{(s)}_{s_i}; \mathbf{w}^{(g)}_{g_i}, \tfrac{1}{\tau_s}\mathbf{I})$
Artist model: $\Pr(r_i = 1 \mid \mathrm{context}(\mathbf{x}_i)) = \sigma(\mathbf{x}_i^T \mathbf{w}^{(a)}_{a_i})$, with $\Pr(\mathbf{w}^{(a)}_{a_i} \mid \mathbf{w}^{(s)}_{s_i}, \tau_a) = \mathcal{N}(\mathbf{w}^{(a)}_{a_i}; \mathbf{w}^{(s)}_{s_i}, \tfrac{1}{\tau_a}\mathbf{I})$

Hierarchical Model Cont.
Fully hierarchical model:
$\Pr(r_i = 1 \mid \mathrm{context}(\mathbf{x}_i)) = \sigma(\mathbf{x}_i^T \mathbf{w}^{(a)}_{a_i})$
$\Pr(\mathbf{w} \mid \tau_w) = \mathcal{N}(\mathbf{w}; \mathbf{0}, \tfrac{1}{\tau_w}\mathbf{I})$
$\Pr(\mathbf{w}^{(g)}_{g_i} \mid \mathbf{w}, \tau_g) = \mathcal{N}(\mathbf{w}^{(g)}_{g_i}; \mathbf{w}, \tfrac{1}{\tau_g}\mathbf{I})$
$\Pr(\mathbf{w}^{(s)}_{s_i} \mid \mathbf{w}^{(g)}_{g_i}, \tau_s) = \mathcal{N}(\mathbf{w}^{(s)}_{s_i}; \mathbf{w}^{(g)}_{g_i}, \tfrac{1}{\tau_s}\mathbf{I})$
$\Pr(\mathbf{w}^{(a)}_{a_i} \mid \mathbf{w}^{(s)}_{s_i}, \tau_a) = \mathcal{N}(\mathbf{w}^{(a)}_{a_i}; \mathbf{w}^{(s)}_{s_i}, \tfrac{1}{\tau_a}\mathbf{I})$
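A minimal sketch of the generative process this hierarchy implies: each level's weights are drawn around its parent's, with the precision controlling how far children may deviate. The taxonomy maps (`subgenre_to_genre`, `artist_to_subgenre`) and the precision values are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hierarchy(d, genres, subgenre_to_genre, artist_to_subgenre,
                     tau_w=1.0, tau_g=10.0, tau_s=10.0, tau_a=10.0):
    # Draw weights top-down; std = tau ** -0.5 since variance = 1 / tau.
    w = rng.normal(0.0, tau_w ** -0.5, d)                    # global prior
    w_g = {g: rng.normal(w, tau_g ** -0.5) for g in genres}  # per genre
    w_s = {s: rng.normal(w_g[g], tau_s ** -0.5)
           for s, g in subgenre_to_genre.items()}            # per sub-genre
    w_a = {a: rng.normal(w_s[s], tau_a ** -0.5)
           for a, s in artist_to_subgenre.items()}           # per artist
    return w, w_g, w_s, w_a
```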

Personalized Model
Per-user parameters: $\mathbf{w}_{ua} = \mathbf{w}_a + \mathbf{w}_u$
$\Pr(r_i = 1 \mid \mathrm{context}(\mathbf{x}_i)) = \sigma\big(\mathbf{x}_i^T (\mathbf{w}^{(a)}_{a_i} + \mathbf{w}^{(u)}_{u_i})\big)$
$\Pr(\mathbf{w}^{(u)}_{u_i} \mid \tau_u) = \mathcal{N}(\mathbf{w}^{(u)}_{u_i}; \mathbf{0}, \tfrac{1}{\tau_u}\mathbf{I})$
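A one-function sketch of the personalized prediction, assuming (estimated) artist and user weight vectors are available:

```python
import numpy as np

def personalized_score(x, w_artist, w_user):
    # sigma(x^T (w_a + w_u)): the artist's weights plus the user's offset.
    return 1.0 / (1.0 + np.exp(-(x @ (w_artist + w_user))))
```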

Graphical Model [Plate diagram: a global prior $\mathbf{w}$ with precision $\tau_w$; per-genre weights $\mathbf{w}^{(g)}_{g_i}$ (plate of size #Genres, precision $\tau_g$); per-sub-genre weights $\mathbf{w}^{(s)}_{s_i}$ (#Subgenres, $\tau_s$); per-artist weights $\mathbf{w}^{(a)}_{a_i}$ (#Artists, $\tau_a$); per-user weights $\mathbf{w}^{(u)}_{u_i}$ (#Users, $\tau_u$); features $\mathbf{x}_i$ and label $r_i$ inside the #Data plate; hyperparameters $\alpha, \beta$ govern the precisions.]

The Joint Probability
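The slide's equation is an image in the transcript. A reconstruction from the model definitions above; the $\mathrm{Gamma}(\alpha, \beta)$ hyperpriors on the precisions are an assumption, suggested by the $\alpha, \beta$ node in the graphical model:

$$
\begin{aligned}
p(\mathbf{r}, \mathbf{W}, \boldsymbol{\tau} \mid X, \alpha, \beta) = {}
& \prod_{i} \sigma\big(\mathbf{x}_i^T (\mathbf{w}^{(a)}_{a_i} + \mathbf{w}^{(u)}_{u_i})\big)^{r_i}
  \Big(1 - \sigma\big(\mathbf{x}_i^T (\mathbf{w}^{(a)}_{a_i} + \mathbf{w}^{(u)}_{u_i})\big)\Big)^{1 - r_i} \\
& \times \mathcal{N}\big(\mathbf{w}; \mathbf{0}, \tfrac{1}{\tau_w}\mathbf{I}\big)
  \prod_{g} \mathcal{N}\big(\mathbf{w}^{(g)}_{g}; \mathbf{w}, \tfrac{1}{\tau_g}\mathbf{I}\big)
  \prod_{s} \mathcal{N}\big(\mathbf{w}^{(s)}_{s}; \mathbf{w}^{(g)}_{g(s)}, \tfrac{1}{\tau_s}\mathbf{I}\big) \\
& \times \prod_{a} \mathcal{N}\big(\mathbf{w}^{(a)}_{a}; \mathbf{w}^{(s)}_{s(a)}, \tfrac{1}{\tau_a}\mathbf{I}\big)
  \prod_{u} \mathcal{N}\big(\mathbf{w}^{(u)}_{u}; \mathbf{0}, \tfrac{1}{\tau_u}\mathbf{I}\big)
  \prod_{\ell \in \{w,g,s,a,u\}} \mathrm{Gamma}(\tau_\ell; \alpha, \beta)
\end{aligned}
$$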

Inference Approaches [Slide figure: approximations to a posterior around its mode $\theta^*$] MAP (maximum a posteriori); Laplace; Mean field / Variational Bayes (VB); Expectation Propagation (EP); Markov chain Monte Carlo (MCMC)

Learning Artists

Learning Users

Learning Sub-Genres

Learning Genres
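The update equations on the four "Learning" slides (artists, users, sub-genres, genres) are images in the transcript. As a generic sketch of the form such updates take, assuming a fully factorized (mean-field) variational approximation:

$$
q(\Theta) = q(\mathbf{w}) \prod_{g} q(\mathbf{w}^{(g)}_g) \prod_{s} q(\mathbf{w}^{(s)}_s) \prod_{a} q(\mathbf{w}^{(a)}_a) \prod_{u} q(\mathbf{w}^{(u)}_u) \prod_{\ell} q(\tau_\ell),
\qquad
\log q_j(\theta_j) = \mathbb{E}_{q_{-j}}\!\big[\log p(\mathbf{r}, \Theta)\big] + \text{const}
$$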

The Global Prior

The Precision Parameters

Practical Considerations
We wish to ensure different playlists even for similar activations.
We pre-compute a candidate list of $M = 1000$ tracks for each seed artist.
Discrete multinomial transition probabilities are computed from the model scores using the softmax function: $p_m = \frac{e^{s \cdot r_m}}{\sum_{i=1}^{M} e^{s \cdot r_i}}$
The parameter $s$ tunes the desired degree of diversity.
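A minimal sketch of this sampling step, assuming `scores` holds the model's predicted $r$-values for the $M$ pre-computed candidates:

```python
import numpy as np

def sample_next(scores, s=3.0, rng=None):
    # Softmax over candidate scores; s tunes diversity
    # (s -> 0 approaches uniform, large s approaches greedy argmax).
    if rng is None:
        rng = np.random.default_rng()
    z = s * np.asarray(scores, dtype=float)
    p = np.exp(z - z.max())   # subtract max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p))
```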

Datasets
Groove Music – a proprietary dataset from the Groove music service. Positive labels are assigned to 'true' transitions in a user's listening history, where both tracks were played to completion. Negative labels indicate transitions where the second track was skipped in mid-play.
30Music – a publicly available dataset of user playlists. Positive labels are assigned to tracks appearing in a playlist. Negatively labeled examples were obtained by uniformly sampling from tracks that did not appear.

Dataset Statistics

Groove Music Dataset [Results slides: figures show performance on the Groove Music dataset as model components are added, from the shared prior $\mathbf{w}$ through the genre ($\mathbf{w}^{(g)}_{g_i}$), sub-genre ($\mathbf{w}^{(s)}_{s_i}$), artist ($\mathbf{w}^{(a)}_{a_i}$), and user ($\mathbf{w}^{(u)}_{u_i}$) levels.]

30Music Dataset [Results figure for the 30Music dataset, comparing the same model variants as above.]

Feature Contribution

Conclusions
We described a real-world playlist generation algorithm that:
- Accounts for the heterogeneity across artists and genres
- Supports personalization
- Handles "coldness" gracefully
It is built on a Bayesian model that utilizes the domain's taxonomy, with efficient variational Bayes inference.

Thank You!