Bayesian Ranking using Expectation Propagation and Factor Graphs


Bayesian Ranking using Expectation Propagation and Factor Graphs
Dumitru Erhan, LISA/DIRO @ Université de Montréal
19/11/2018 Dumitru Erhan - Bayesian Ranking

Preface
- Not my work (at all)
- EP: Tom Minka
- TrueSkill™: Ralf Herbrich & Thore Graepel @ MSR Cambridge (UK)
- “TrueChess”: Pierre Dangauthier @ INRIA Rhône-Alpes (France)
- Slides, plots, and results taken with permission

Outline
- Problem setting
- Xbox Live
- Factor Graphs
- Exact inference in Factor Graphs
- Approximate inference using EP
- Loopy schedules and chess ratings
- Results

The Ranking Problem
Vaguely speaking:
- Input: ordered subsets of data
- Output: a ranking function
For example:
- Chess
- Online games
- Movie ratings
- Internet search

Modelling Ranking
- Ordinal regression: a function f(x) whose value falls into one of the ordered bins Rank 1, …, Rank 5
- Order learning: a function f(x) such that the ordering of f(a), f(b), f(c) reproduces the observed ranking rank(x)
[Diagram: f(x) mapped onto rank thresholds; ordering of f(a), f(b), f(c) on the real line]

Xbox Live

Modelling the Bayesian Way I
- Track belief distributions: P(s_i) = N(s_i; μ_i, σ_i²), P(s_j) = N(s_j; μ_j, σ_j²)
- Allow performance variations: P(x_i | s_i) = N(x_i; s_i, β²), P(x_j | s_j) = N(x_j; s_j, β²)
- Model game outcome: P(player i wins) = I(x_i − x_j ≥ 0)

Modelling the Bayesian Way II
- This leads to a probit-based likelihood: P(player i wins | s_i, s_j) = Φ((s_i − s_j) / (√2 β))
- Posterior is not Gaussian! Implications for inference, tracking, etc.
- What if we could obtain a nice visualization of the model, stay in the Gaussian/exponential family, and perform the approximations efficiently?
- Factor Graphs + Expectation Propagation!
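As a concrete illustration of the probit likelihood above (a minimal sketch, not code from the talk; the function names and numbers are my own), the win probability for fixed skills with Gaussian performance noise is:

```python
# For fixed skills s_i, s_j with performances x ~ N(s, beta^2),
# the difference x_i - x_j is N(s_i - s_j, 2*beta^2), so
# P(player i wins) = Phi((s_i - s_j) / (sqrt(2) * beta)).
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def win_probability(s_i, s_j, beta):
    """Probability that player i outperforms player j."""
    return phi((s_i - s_j) / (sqrt(2.0) * beta))

p = win_probability(27.0, 24.0, beta=4.0)  # > 0.5: stronger player favoured
```

Note that the probability depends only on the skill difference, scaled by the performance noise β: the noisier the performances, the closer the probability is pulled toward 1/2.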

Factor Graphs mini intro
- A bipartite graph that represents the factorization of a mathematical function
- Nodes: factors and variables
- Function = product of all factors
- Edges: dependencies of factors on variables
[Diagram: small factor graph over variables x, y, z]

Factor Graphs continued
- Used for modelling joint PDFs
- Interested in marginals of the type P(hidden | observed)
- Use the sum-product algorithm / belief propagation to compute them

Sum-Product Algorithm I
[Diagram: chain of variables v, w, x, y, z with factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)]
Observation: the sum of products becomes a product of sums of all messages from neighboring factors to the variable!

Sum-Product Algorithm II
[Diagram: factors f2(w,x), f3(x,y), f4(x,z) sending messages toward variable x]
Observation: factors only need to sum out all their local variables!

Sum-Product Algorithm III
[Diagram: variable x forwarding messages between f2(w,x), f3(x,y), f4(x,z)]
Observation: variables pass on the product of all incoming messages!

Belief Propagation
- Concept of a message from node X to node Y: X tells Y what state Y should be in
- First “propagate” observed data
- Then nodes exchange messages (start with leaves)
- Messages + priors + conditional probabilities → updates of beliefs
- Belief(x) = product of incoming messages; basically, unnormalized marginals
- Pass messages until convergence
- If the graph is a tree: convergence guaranteed. If not…
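The three sum-product observations above can be sketched on a small chain v — f1 — w — f2 — x with binary variables (the factor tables here are illustrative assumptions, not from the talk):

```python
# Sum-product on the chain v -- f1 -- w -- f2 -- x.
f1 = [[1.0, 2.0], [3.0, 4.0]]   # f1[v][w]
f2 = [[2.0, 1.0], [1.0, 2.0]]   # f2[w][x]

# Message f1 -> w: the factor sums out its other local variable v.
m_f1_w = [sum(f1[v][w] for v in (0, 1)) for w in (0, 1)]

# Variable w forwards the product of its incoming messages to f2;
# message f2 -> x then sums out w.
m_f2_x = [sum(m_f1_w[w] * f2[w][x] for w in (0, 1)) for x in (0, 1)]

# m_f2_x is the unnormalized marginal of x; a brute-force sum of
# products over all assignments gives the same answer.
brute = [sum(f1[v][w] * f2[w][x] for v in (0, 1) for w in (0, 1))
         for x in (0, 1)]
assert m_f2_x == brute
```

The final assertion is exactly the slide's point: the sum of products distributes into a product of sums, so each factor only ever sums out its own local variables.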

Approximate message passing
- Problem: the exact messages from factors to variables may not be closed under products
- TrueSkill™: Gaussian × step function ≠ Gaussian
- Solution: approximate the marginal as well as possible in the sense of minimal KL divergence
- Expectation Propagation: approximate the marginal by so-called “moment matching”

Expectation Propagation
[Diagram: new marginal = message × old marginal, shown exactly and with the moment-matched (approximate) message]

Tom Minka’s thesis in two lines
Approximate p(x) = f_1(x) f_2(x) ⋯ f_n(x)
by p̂(x) = f̂_1(x) f̂_2(x) ⋯ f̂_n(x)
Iterate:
- Pick a factor f̂_i
- Remove its influence: q(x) = p̂(x) / f̂_i(x)
- Project and refine:
  p̂^(k+1)(x) = argmin_{q' in family} KL( f_i(x) q(x) ‖ q'(x) )
  f̂_i^(k+1)(x) = p̂^(k+1)(x) / q(x)
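For the Gaussian-times-step case that TrueSkill™ needs, the KL projection reduces to matching the mean and variance of a truncated Gaussian. A minimal sketch using the standard truncated-Gaussian moment formulas (function names are my own):

```python
# Moment-match N(mu, sigma^2) * I(d > 0): the product is a Gaussian
# truncated to the positive half-line; EP replaces it with the Gaussian
# that has the same first two moments.
from math import erf, exp, pi, sqrt

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def moment_match_step(mu, sigma):
    """Mean and variance of N(mu, sigma^2) restricted to d > 0."""
    z = mu / sigma
    v = norm_pdf(z) / norm_cdf(z)   # additive mean correction
    w = v * (v + z)                 # multiplicative variance correction
    new_mu = mu + sigma * v
    new_var = sigma ** 2 * (1.0 - w)
    return new_mu, new_var
```

Truncating to the winning region always raises the mean and shrinks the variance (0 < w < 1), which is why observing a win both moves and sharpens the skill belief.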

Formal Problem Setting
- k teams of n_1, …, n_k many players
- The outcome is a ranking among the teams (including draws)
Questions:
- Skill s_i of each player, such that the higher the skill, the more likely the win
- Global ranking among all players
- High quality of match among k teams

TrueSkill™ Factor Graph
[Diagram: factor graph for the outcome “Player 1 wins over Players 2 + 3, who draw with Player 4”: individual skills s1, s2, s3, s4; team performances t1, t2, t3; performance differences d1, d2]

TrueSkill™ Model Details
- Priors: P(s_i) = N(s_i; μ_i, σ_i²)
- Hidden variables:
  - Performance: P(x_i | s_i) = N(x_i; s_i, β²)
  - Team performance: t_j = Σ_{i ∈ team j} x_i
- Likelihood:
  - Win: P(team 1 wins | t_1, t_2) = I(t_1 − t_2 > ε)
  - Draw: P(teams 1 and 2 draw | t_1, t_2) = I(|t_1 − t_2| ≤ ε)
- Skill evolution: P(s_{t,i}) = N(s_{t,i}; μ_{t−1,i}, σ_{t−1,i}² + γ²)

More details and assumptions
- Specifies an order on the real line; OK if we agree that 1-d is good enough
- Draws + transitivity = not good: |t_1 − t_2| ≤ ε and |t_2 − t_3| ≤ ε do not imply |t_1 − t_3| ≤ ε
- A “mini-FG” is generated each time!
- EP updates can be done efficiently: moments of a truncated Gaussian
- Information flows forward only: no updates in the light of future data
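Those truncated-Gaussian moments make the per-game EP update closed-form. A sketch of the simplest case (two players, no draw, player 1 wins), following the standard TrueSkill™ update equations; the starting values in the usage note are illustrative assumptions:

```python
# One-game TrueSkill-style update: player 1 beats player 2.
from math import erf, exp, pi, sqrt

def norm_pdf(z): return exp(-0.5 * z * z) / sqrt(2.0 * pi)
def norm_cdf(z): return 0.5 * (1.0 + erf(z / sqrt(2.0)))
def v_fn(t):     return norm_pdf(t) / norm_cdf(t)   # mean correction
def w_fn(t):     return v_fn(t) * (v_fn(t) + t)     # variance correction

def update_win(mu1, sig1, mu2, sig2, beta):
    """Return updated (mu, sigma) for winner and loser."""
    c = sqrt(2.0 * beta**2 + sig1**2 + sig2**2)  # total uncertainty
    t = (mu1 - mu2) / c
    mu1n = mu1 + sig1**2 / c * v_fn(t)           # winner's mean rises
    mu2n = mu2 - sig2**2 / c * v_fn(t)           # loser's mean falls
    sig1n = sig1 * sqrt(max(1.0 - sig1**2 / c**2 * w_fn(t), 0.0))
    sig2n = sig2 * sqrt(max(1.0 - sig2**2 / c**2 * w_fn(t), 0.0))
    return (mu1n, sig1n), (mu2n, sig2n)
```

For example, `update_win(25.0, 8.0, 25.0, 8.0, 4.0)` raises the winner's mean, lowers the loser's, and shrinks both standard deviations; the less expected the result, the larger the move.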

The Alternative: ELO
Quite similar:
- Performances distributed around fixed skills
- Win probability: P(player 1 wins) = Φ((s_1 − s_2) / (√2 β))
- Skill updates: linear update s_1 ← s_1 + y·Δ, s_2 ← s_2 − y·Δ, with Δ ≈ α · Φ((s_1 − s_2) / (√2 β))
Differences:
- No uncertainty tracking
- Linearized updates
- No notion of teams, multiple players/teams, etc.
- Not a generative model
TrueSkill™ is a generalization of ELO
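A sketch of such a linearized, fixed-variance update. The exact form of the step size varies between ELO variants; here I assume a surprise-proportional step (Δ scaled by the probit probability of the observed upset), with `alpha` playing the role of ELO's K-factor:

```python
# ELO-style update with fixed skill variance: only the point
# estimates s1, s2 move; no uncertainty is tracked.
from math import erf, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def elo_update(s1, s2, beta, alpha, y):
    """y = +1 if player 1 wins, -1 if player 2 wins."""
    # Probability that the observed outcome was an upset: surprising
    # results produce large steps, expected results small ones.
    delta = alpha * norm_cdf(-y * (s1 - s2) / (sqrt(2.0) * beta))
    return s1 + y * delta, s2 - y * delta
```

Because the update is a point move with a fixed step scale, a long-established player and a brand-new one are treated identically, which is exactly the limitation TrueSkill™'s belief tracking removes.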

Data: Halo 2 Multiplayer Beta
- Publicly available; the real dataset is much larger
- Number of games: 60,022
- Number of players: 5,943
Parameters in all experiments:
- Performance variation factor: 60%
- Draw probability: 5%
- Dynamics variation factor: 2%

Convergence properties
[Plot: estimated level (0–40) vs. number of games (up to 400) for Player 1 and Player 2, under TrueSkill and under ELO]

Win probability
[Plot]

Other results
- TrueSkill™ better at predicting tight matches
- The “additive team performance” assumption does not hold in some cases (Capture-the-Flag)
- There are some feedback-loop issues

TrueSkill™ conclusions
- Every Xbox 360 Live game uses TrueSkill™; service launched in November 2005
- Distinguishing properties:
  - is a generalization of ELO
  - tracks a belief distribution
  - can deal with multiple teams/players/draws
- First “real-world” implementation of EP
However:
- Draws are handled somewhat strangely (hack)
- Information “flows” only forward in time

What if…
- …we created a schedule that passes messages “back in time”?
- Effectively, this means that future information is used for updating the current beliefs!
- However, the FG is not a tree now: loopy message-passing schedule
- Too much data in the case of Xbox Live; let’s do chess instead!
- Makes sense: the “game graph” is not very connected in time, and it is hard to have a fair comparison between players

Chess Factor Graph
[Diagram: skills S1, S2 with performance noise giving performances P1, P2; difference D = P1 − P2 compared against D > ε; example outcomes from games in 1857: Morphy > Paulsen, Morphy = Paulsen, Morphy > Paulsen]

Chess dataset
Characteristics:
- 88 players
- 15,664 games
- 300 games per player
- 60,000 hidden variables
- 150,000 edges
Priors set to match ELO:
- Mean = 2704
- Stddev = 100

Results
[Plot]

Chess results
- Inflation over time?
- Who’s the best player of all time? Kasparov? Fischer? Morphy?
- Dataset limitations:
  - No individual game results, only tournaments
  - Runs up to 1991
  - 88 best players only

Final words
- TrueSkill™ for Xbox Live: mature tech
- “TrueChess”: quite experimental
- Inference in loopy graphs is hard
- Other applications:
  - Ranking Go moves (ICML ’06, Snowbird)
  - Social matchmaking (future best NIPS paper)
- Oral presentation @ NIPS this year

That’s it
Thank you!