Extensions to message-passing inference
S. M. Ali Eslami, September 2014

Outline
1. Just-in-time learning for message-passing (with Daniel Tarlow, Pushmeet Kohli, John Winn)
2. Deep RL for ATARI games (with Arthur Guez, Thore Graepel)
3. Contextual initialisation for message-passing (with Varun Jampani, Daniel Tarlow, Pushmeet Kohli, John Winn)
4. Hierarchical RL for automated driving (with Diana Borsa, Yoram Bachrach, Pushmeet Kohli and Thore Graepel)
5. Team modelling for learning of traits (with Matej Balog, James Lucas, Daniel Tarlow, Pushmeet Kohli and Thore Graepel)

Probabilistic programming
- The programmer specifies a generative model.
- The compiler automatically creates code for inference in that model.
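To make the division of labour concrete, here is a minimal sketch of the idea in Python rather than Infer.NET (which the later slides use); the coin-flip model and all function names are illustrative assumptions, not the talk's code. The "program" is just a prior and a likelihood, and a generic inference routine that knows nothing about the particular model produces the posterior.

```python
import math
import random

def sample_bias():
    """One draw from the prior over the coin's bias (uniform on [0, 1])."""
    return random.random()

def log_likelihood(bias, data):
    """Log-probability of observed coin flips (1 = heads) under the given bias."""
    return sum(math.log(bias if flip == 1 else 1.0 - bias) for flip in data)

def posterior_mean(data, num_samples=100_000):
    """Generic inference by self-normalised importance sampling from the prior."""
    total_weight = 0.0
    weighted_sum = 0.0
    for _ in range(num_samples):
        bias = sample_bias()
        weight = math.exp(log_likelihood(bias, data))
        total_weight += weight
        weighted_sum += weight * bias
    return weighted_sum / total_weight

# 6 heads out of 8 flips under a uniform prior: posterior mean is 7/10 = 0.7.
print(posterior_mean([1, 1, 0, 1, 1, 1, 0, 1]))
```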

Probabilistic graphics programming?

Challenges
- Specifying a generative model that is accurate and useful.
- Compiling an inference algorithm for it that is efficient.

Generative probabilistic models for vision, with manually designed inference: FSA (BMVC 2011), SBM (CVPR 2012), MSBM (NIPS 2013).

Why is inference hard?
- Sampling: inference can mix slowly (an active area of research).
- Message-passing: computation of messages can be slow, e.g. if using quadrature or sampling (addressed by just-in-time learning, part 1).
- Message-passing: inference can require many iterations and may converge to bad fixed points (addressed by contextual initialisation, part 2).

Just-In-Time Learning for Inference (with Daniel Tarlow, Pushmeet Kohli, John Winn; NIPS 2014)

Motivating example
Ecologists have strong empirical beliefs about the form of the relationship between temperature and yield, and it is important to them that this relationship is modelled faithfully. However, we do not have a fast implementation of the Yield factor in Infer.NET.

Problem overview
Implementing a fast and robust factor is not always trivial.
Approach:
1. Use general algorithms (e.g. Monte Carlo sampling or quadrature) to compute message integrals.
2. Gradually learn to speed up the computation by regressing from incoming to outgoing messages at run-time.
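As an illustration of step 1, the following sketch computes the outgoing EP message of a logistic factor by Monte Carlo moment matching. It is a stand-in for a general-purpose oracle, not the Infer.NET operator; the function names and the 50,000-sample budget are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def outgoing_message_logistic(m, v, y, num_samples=50_000):
    """Return (mean, variance) of the outgoing Gaussian message to x.

    The incoming message on x is N(m, v) and y is the binary observation."""
    # 1. Sample from the incoming Gaussian and weight each sample by the factor.
    samples = [random.gauss(m, math.sqrt(v)) for _ in range(num_samples)]
    weights = [sigmoid(x) if y == 1 else 1.0 - sigmoid(x) for x in samples]
    z = sum(weights)
    # 2. Moment-match the tilted distribution p(x) proportional to N(x; m, v) * factor(x, y).
    mean = sum(w * x for w, x in zip(weights, samples)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, samples)) / z
    # 3. Divide out the incoming message (subtract natural parameters)
    #    to obtain the outgoing message.
    prec_out = 1.0 / var - 1.0 / v
    mp_out = mean / var - m / v
    return mp_out / prec_out, 1.0 / prec_out

print(outgoing_message_logistic(m=0.0, v=4.0, y=1))
```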

Message-passing (figure: an incoming message group at a factor and the corresponding outgoing message)

Belief and expectation propagation
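For reference, the standard expectation-propagation update that the following slides build on, written in my own notation (belief propagation is the special case in which the projection is exact):

```latex
m_{f \to x}(x) \;=\;
\frac{\operatorname{proj}\!\left[\, m_{x \to f}(x)\, \int f(x,\mathbf{y}) \prod_i m_{y_i \to f}(y_i)\, d\mathbf{y} \right]}
     {m_{x \to f}(x)}
```

Here proj[.] denotes moment matching onto the chosen exponential family, and the incoming message group consists of m_{x->f} and the m_{y_i->f}.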


Learning to pass messages (Heess, Tarlow and Winn, 2013)

Learning to pass messages (Heess, Tarlow and Winn, 2013)
Before inference:
- Create a dataset of plausible incoming message groups.
- Compute the outgoing message for each group using the oracle.
- Employ a regressor to learn the mapping.
During inference, given a group of incoming messages:
- Use the regressor to predict the parameters of the outgoing message.
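A sketch of the offline recipe above, assuming a sampling distribution over incoming message groups and using any slow-but-correct message operator (such as the Monte Carlo routine sketched earlier) as the oracle; all names here are mine.

```python
import random

def sample_incoming_group():
    """One plausible incoming message group: (Gaussian mean, variance, binary observation)."""
    return (random.gauss(0.0, 3.0), random.uniform(0.1, 10.0), random.randint(0, 1))

def build_training_set(oracle, n=1_000):
    """Offline phase: sample message groups and query the slow oracle once per group."""
    groups = [sample_incoming_group() for _ in range(n)]
    targets = [oracle(m, v, y) for (m, v, y) in groups]
    return groups, targets

if __name__ == "__main__":
    # Stand-in oracle; in practice this would be the Monte Carlo routine above.
    dummy_oracle = lambda m, v, y: (0.0, 1.0)
    groups, targets = build_training_set(dummy_oracle, n=10)
    print(groups[0], targets[0])
    # A regressor (e.g. the forest of the next slides) is then trained on the
    # (group -> target) pairs and used in place of the oracle during inference.
```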

Logistic regression

Logistic regression: 4 random UCI datasets

Learning to pass messages, an alternative approach: just-in-time learning
Before inference: do nothing.
During inference, given a group of incoming messages:
- If unsure: consult the oracle for the answer and update the regressor.
- Otherwise: use the regressor to predict the parameters of the outgoing message.

Learning to pass messages: just-in-time learning
This requires an uncertainty-aware regressor, one that returns both a predicted outgoing message and a measure of its own uncertainty; the oracle is then consulted only when that uncertainty exceeds a threshold u_max (see the sketch below).
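A minimal sketch of that decision rule; `forest`, `oracle`, `predict_with_uncertainty` and `update` are hypothetical placeholders for the components described above, not Infer.NET APIs.

```python
def jit_outgoing_message(incoming_group, forest, oracle, u_max):
    """Just-in-time rule: trust the regressor when it is confident, else learn."""
    prediction, uncertainty = forest.predict_with_uncertainty(incoming_group)
    if uncertainty > u_max:
        # Unsure: fall back to the slow oracle and learn from its answer.
        target = oracle(incoming_group)
        forest.update(incoming_group, target)
        return target
    # Confident: use the fast learned prediction instead of the oracle.
    return prediction
```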

Random decision forests for JIT learning (an ensemble of trees: Tree 1, Tree 2, ..., Tree T)

Random decision forests for JIT learning: parameterisation

Random decision forests for JIT learning: prediction model (Tree 1, Tree 2, ..., Tree T)

Random decision forests for JIT learning: ensemble model
One could take the element-wise average of the predicted parameters and map it back to an outgoing message, but that is sensitive to the chosen parameterisation. Instead, compute the moment average of the predicted distributions.

Random decision forests for JIT learning: uncertainty model
Use the degree of agreement among the trees' predictions as a proxy for uncertainty (see the sketch below). If all trees predict the same output, their knowledge about the mapping is similar despite the randomness in their structure; conversely, if there is large disagreement between the predictions, the forest has high uncertainty.
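A sketch of both ideas for Gaussian messages: moment averaging of the trees' predictions, and disagreement as the uncertainty proxy. The exact disagreement measure in the paper may differ; here it is assumed to be the largest symmetrised KL between any single tree's prediction and the ensemble prediction.

```python
import math

def moment_average(predictions):
    """predictions: list of (mean, variance). Average the first and second moments."""
    n = len(predictions)
    mean = sum(m for m, _ in predictions) / n
    second = sum(v + m * m for m, v in predictions) / n
    return mean, second - mean * mean

def sym_kl_gaussian(p, q):
    """Symmetrised KL divergence between two univariate Gaussians."""
    (m1, v1), (m2, v2) = p, q
    kl = lambda ma, va, mb, vb: 0.5 * (va / vb + (mb - ma) ** 2 / vb - 1.0 + math.log(vb / va))
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

def predict_with_uncertainty(tree_predictions):
    """Ensemble prediction plus a disagreement-based uncertainty score."""
    ensemble = moment_average(tree_predictions)
    uncertainty = max(sym_kl_gaussian(p, ensemble) for p in tree_predictions)
    return ensemble, uncertainty

print(predict_with_uncertainty([(0.1, 1.0), (0.0, 1.2), (-0.1, 0.9)]))
```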

Random decision forests for JIT learning: 2 feature samples per node, maximum depth 4, regressor degree 2, 1,000 trees.

Random decision forests for JIT learning: ensemble model (summary)
Compute the moment average of the predicted distributions, and use the degree of agreement in the predictions as a proxy for uncertainty.

Random decision forests for JIT learning: training objective function
How good is a prediction? Consider its effect on the induced belief on the target random variable, i.e. focus on the quantity of interest: the accuracy of the posterior marginals. Trees are trained to partition the training data so that the relationship between incoming and outgoing messages is well captured by regression, as measured by the symmetrised marginal KL.
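In symbols, a plausible rendering of that objective (the notation is mine): a predicted outgoing message is scored by the symmetrised KL between the marginal beliefs that it and the oracle message induce on the target variable,

```latex
\mathrm{D}\big(m_{\text{pred}}, m_{\text{oracle}}\big)
  \;=\; \mathrm{KL}\big(b_{\text{oracle}} \,\|\, b_{\text{pred}}\big)
      + \mathrm{KL}\big(b_{\text{pred}} \,\|\, b_{\text{oracle}}\big),
\qquad
b_{\bullet}(x) \;\propto\; m_{\bullet}(x)\, m_{x \to f}(x)
```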

Results

Logistic regression

Uncertainty-aware regression of a logistic factor: are the forests accurate?

Uncertainty-aware regression of a logistic factor: are the forests uncertain when they should be?

Just-in-time learning of a logistic factor: oracle consultation rate

Just-in-time learning of a logistic factor: inference time

Just-in-time learning of a logistic factor: inference error

Just-in-time learning of a compound gamma factor

A model of corn yield

USDA National Agricultural Statistics Service (2011–2013): inference works

Just-in-time learning of a yield factor

Summary
Speed up message-passing inference using JIT learning:
- Savings in human time (no need to implement factor operators).
- Savings in computer time (reduces the amount of computation).
- JIT can even accelerate hand-coded message operators.
Open questions:
- Is there a better measure of uncertainty?
- Are there better methods for choosing u_max?

Contextual Initialisation Machines (with Varun Jampani, Daniel Tarlow, Pushmeet Kohli, John Winn)

Gauss and Ceres: a deceptively simple problem

A point model of circles

A point model of circles: initialisation makes a big difference

What's going on? A common motif in vision models:
- global variables in each layer,
- multiple layers,
- many variables per layer.

Possible solutions
- Fully-factorised representation: messages are easy to compute, but there are lots of loops.
- Structure within layers only: no loops within layers but lots of loops across layers, and messages are difficult to compute.
- Fully structured inference: no loops, but messages are difficult to compute and complex messages must pass between layers.

Contextual initialisation: structured accuracy without structured cost
Observations:
- Beliefs about global variables are approximately predictable from the layer below.
- Stronger beliefs about global variables lead to increased quality of the messages to the layer above.
Strategy (see the sketch below):
- Learn to send the global-variable messages in the first iteration.
- Keep using the fully factorised model for the layer messages.
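A minimal sketch of that strategy; `context_predictor`, `run_message_passing` and the layer bookkeeping are hypothetical placeholders standing in for the learned initialiser and the ordinary fully-factorised schedule.

```python
def contextual_message_passing(layers, context_predictor, run_message_passing,
                               num_iterations=20):
    """Contextual initialisation: learned global messages first, factorised passing after."""
    # Iteration 1: for each layer above the bottom one, predict an initial message
    # for its global variables from the evidence in the layer below, instead of
    # starting from vague (uniform) messages.
    initial_global_messages = [context_predictor(layer_below)
                               for layer_below in layers[:-1]]
    # Remaining iterations: ordinary fully-factorised message passing, warm-started
    # from the contextual initialisation.
    return run_message_passing(layers, initial_global_messages, num_iterations)
```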

A point model of circles

A point model of circles: accelerated inference using contextual initialisation (centre and radius)

A pixel model of squares

A pixel model of squares: robustified inference using contextual initialisation

A pixel model of squares: robustified inference using contextual initialisation (side length and centre)

A pixel model of squares: robustified inference using contextual initialisation (foreground and background colour)

A generative model of shading (with Varun Jampani)
Variables: image X, reflectance R, shading S, normals N, light L.
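One plausible reading of how these variables fit together, assuming a standard intrinsic-image decomposition with Lambertian shading (the talk's exact factorisation may differ):

```latex
X \;=\; R \odot S,
\qquad
S_i \;=\; \max\!\big(0,\ \mathbf{n}_i^{\top}\,\mathbf{l}\big)
```

where X is the observed image, R the per-pixel reflectance, and the shading S is produced by the per-pixel surface normals n_i and a global light direction l.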

A generative model of shading: inference progress with and without context

A generative model of shading: fast and accurate inference using contextual initialisation

Summary
- Bridging the gap between Infer.NET and generative computer vision.
- Initialisation makes a big difference.
- The inference algorithm can learn to initialise itself.
Open questions:
- What is the best formulation of this approach?
- What are the trade-offs between inference and prediction?

Questions