Contextual Models for Object Detection Using Boosted Random Fields, by Antonio Torralba, Kevin P. Murphy, and William T. Freeman
Quick Introduction
What is this? (an isolated image patch is ambiguous) Now can you tell? (the same patch shown in context is easy to recognize)
Belief Propagation (BP)
Network: a pairwise Markov random field with observed nodes $y_i$ and hidden nodes $x_i$.
The statistical dependency between $x_i$ and $y_i$ is called the local evidence: $\phi_i(x_i, y_i)$, shorthand $\phi_i(x_i)$.
The statistical dependency between neighboring hidden nodes is the compatibility function: $\psi_{ij}(x_i, x_j)$.
Belief Propagation (BP)
Joint probability:
$$P(x, y) = \frac{1}{Z} \prod_{(i,j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, y_i)$$
[Figure: the graph, with hidden nodes $x_1 \dots x_j$ and observations $y_1 \dots y_i$.]
Belief Propagation (BP)
The belief $b$ at a node $i$ is the product of the local evidence at that node and all the messages coming in from its neighbors:
$$b_i(x_i) \propto \phi_i(x_i) \prod_{j \in N_i} m_{ji}(x_i)$$
Belief Propagation (BP)
Messages $m$ between hidden nodes: $m_{ji}(x_i)$ expresses how likely node $j$ thinks it is that node $i$ is in each state. Each message is updated from the other messages arriving at $j$:
$$m_{ji}(x_i) = \sum_{x_j} \phi_j(x_j)\, \psi_{ji}(x_j, x_i) \prod_{k \in N_j \setminus i} m_{kj}(x_j)$$
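As a concrete illustration, here is a minimal sum-product BP sketch on a tiny chain-structured pairwise MRF. The graph, the potentials, and all variable names (phi, psi, neighbors) are illustrative stand-ins, not code from the paper.

import numpy as np

n_states = 2
# Chain of 3 hidden nodes: 0 - 1 - 2
edges = [(0, 1), (1, 2)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# Local evidence phi_i(x_i) (already conditioned on y_i) and a shared
# compatibility psi(x_i, x_j) that favors agreement between neighbors.
phi = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
psi = np.array([[0.8, 0.2], [0.2, 0.8]])

# messages[(j, i)] = m_ji(x_i), initialized uniform
messages = {(j, i): np.ones(n_states) / n_states
            for (a, b) in edges for (j, i) in [(a, b), (b, a)]}

for _ in range(10):  # iterate the message updates to convergence
    new = {}
    for (j, i) in messages:
        # m_ji(x_i) = sum_{x_j} phi_j(x_j) psi(x_j, x_i) prod_{k in N_j \ i} m_kj(x_j)
        prod = phi[j].copy()
        for k in neighbors[j]:
            if k != i:
                prod *= messages[(k, j)]
        m = psi.T @ prod
        new[(j, i)] = m / m.sum()
    messages = new

# Belief: b_i(x_i) proportional to phi_i(x_i) * prod_{j in N_i} m_ji(x_i)
for i in range(3):
    b = phi[i].copy()
    for j in neighbors[i]:
        b *= messages[(j, i)]
    print(f"b_{i} =", b / b.sum())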
Conditional Random Field
Distribution of the form:
$$P(x \mid y) = \frac{1}{Z(y)} \prod_i \phi_i(x_i, y) \prod_{(i,j)} \psi_{ij}(x_i, x_j)$$
Boosted Random Field
Basic idea: use BP to estimate $P(x \mid y)$, and use boosting to maximize the log likelihood of each node with respect to the potential functions (the local evidence and the compatibilities).
Algorithm: BP
Minimize the negative log likelihood of the training labels given the data $y_i$. With labels $x_i \in \{-1, +1\}$ and beliefs $b_i$, the loss function to minimize is
$$L = -\sum_i \log b_i(x_i)$$
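A small sketch of this loss, assuming the paper's parameterization of the belief as $b_i = \sigma(F_i + G_i)$ with labels in {-1, +1}; the vectorized form and the toy numbers below are mine.

import numpy as np

def node_loss(F, G, labels):
    """-sum_i log b_i(x_i): with b = sigmoid(F + G) and labels in {-1,+1},
    -log b_i(x_i) = log(1 + exp(-x_i * (F_i + G_i)))."""
    margin = labels * (F + G)
    return np.sum(np.log1p(np.exp(-margin)))

F = np.array([2.0, -1.0, 0.5])   # local-evidence scores
G = np.array([0.3, -0.2, 0.1])   # message/context scores
labels = np.array([1, -1, -1])
print(node_loss(F, G, labels))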
Algorithm: BP
[Figure: the belief at node $x_i$ combines the local evidence from $y_i$ with the product of messages from its neighbors $x_j$.]
The local evidence term is replaced with $F$: a function of the input data $y_i$, learned directly from the image.
Function F
$F$ is learned by boosting: $F_i = \sum_t f_t(y_i)$, where $f$ is the weak learner: weighted decision stumps.
Minimization of the loss L
At each boosting round, the weak learner $f$ is chosen as the basis function that most decreases $L$; with the logistic form of the loss this reduces to a weighted fit of the stump, where the weights concentrate on the examples the current beliefs get wrong.
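A hedged sketch of one boosting round with weighted decision stumps. The slides do not show the weight formula; the GentleBoost-style weights w = exp(-x * F) and the weighted least-squares stump fit below are assumptions, not taken from the paper.

import numpy as np

def fit_stump(features, labels, w):
    """Pick (feature, threshold, a, b) minimizing the weighted squared error
    of the stump f(v) = a if v > theta else b against the +/-1 labels."""
    best = None
    for d in range(features.shape[1]):
        for theta in np.unique(features[:, d]):
            idx = features[:, d] > theta
            # weighted least-squares solution on each side of the split
            a_plus = np.average(labels[idx], weights=w[idx]) if idx.any() else 0.0
            b_minus = np.average(labels[~idx], weights=w[~idx]) if (~idx).any() else 0.0
            pred = np.where(idx, a_plus, b_minus)
            err = np.sum(w * (labels - pred) ** 2)
            if best is None or err < best[0]:
                best = (err, d, theta, a_plus, b_minus)
    return best[1:]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 2] - 0.1)          # toy labels driven by feature 2
F = np.zeros(200)                   # additive model F = sum_t f_t
for t in range(10):
    w = np.exp(-y * F)              # assumed GentleBoost-style weights
    d, theta, a, b = fit_stump(X, y, w)
    F += np.where(X[:, d] > theta, a, b)
print("train error:", np.mean(np.sign(F) != y))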
Local Evidence: algorithm (a runnable skeleton follows the list)
For t = 1..T:
  Iterate N_boost times:
    find the best basis function h
    update the local evidence with f
    update the beliefs
    update the weights
  Iterate N_BP times:
    update the messages
    update the beliefs
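A runnable skeleton of the loop above. The control flow mirrors the slide, but find_best_basis_function and update_messages are placeholder stand-ins I made up, not the paper's updates.

import numpy as np

def find_best_basis_function(y, labels, w):
    # placeholder weak learner (assumption): a small step toward the labels,
    # scaled by the boosting weights
    return 0.1 * labels * w / w.max()

def update_messages(beliefs):
    # placeholder message pass (assumption): pull each node toward the mean belief
    return 0.5 * (beliefs.mean() - beliefs)

def train_local_evidence(y, labels, T=5, N_boost=3, N_BP=3):
    n = len(labels)
    F = np.zeros(n)                                       # local-evidence scores
    G = np.zeros(n)                                       # message scores
    w = np.ones(n)                                        # boosting weights
    beliefs = np.full(n, 0.5)
    for t in range(T):
        for _ in range(N_boost):                          # iterate N_boost times
            F += find_best_basis_function(y, labels, w)   # best basis h; update F
            beliefs = 1 / (1 + np.exp(-(F + G)))          # update the beliefs
            w = np.exp(-labels * (F + G))                 # update the weights
        for _ in range(N_BP):                             # iterate N_BP times
            G = update_messages(beliefs)                  # update the messages
            beliefs = 1 / (1 + np.exp(-(F + G)))          # update the beliefs
    return beliefs

labels = np.array([1, 1, -1, 1, -1], dtype=float)
print(train_local_evidence(None, labels))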
Function G
By assuming that the graph is densely connected (so each individual connection is weak), we can approximate the product of incoming messages by a function of the neighbors' beliefs. G is then a non-linear additive function of the beliefs.
Function G
Instead of learning the compatibility functions directly, G can be learned with an additive model whose weak learners are weighted regression stumps applied to the beliefs.
Function G
The weak learner is chosen, at each round, by minimizing the same loss $L$.
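A tiny sketch of one such weak learner: a regression stump that reads another node's belief and outputs a vote. The stump form matches "weighted regression stumps"; the indices and parameter values are mine.

import numpy as np

def g_weak_learner(beliefs, j, theta, alpha, beta):
    """One regression stump on the beliefs: alpha * [b_j > theta] + beta."""
    return alpha * (beliefs[j] > theta) + beta

beliefs = np.array([0.9, 0.4, 0.7])
# G for node 0 as a sum of stumps on the beliefs of nodes 1 and 2
G0 = (g_weak_learner(beliefs, 1, 0.5, 1.2, -0.1)
      + g_weak_learner(beliefs, 2, 0.5, 0.8, 0.0))
print(G0)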
The Boosted Random Field Algorithm
For t = 1..T:
  find the best basis function h for f (the local evidence)
  find the best basis function for g (the compatibilities)
  compute the local evidence
  compute the compatibilities
  update the beliefs
  update the weights
Final classifier
For t = 1..T: update the local evidence F, update the compatibilities G, and compute the current beliefs. Output classification: threshold the final beliefs (see the sketch below).
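A sketch of the final classifier loop, assuming sigmoid beliefs b = sigma(F + G); the stand-in learned steps f_t and g_t below are illustrative, not the paper's learned functions.

import numpy as np

def classify(y_features, f_steps, g_steps):
    n = y_features.shape[0]
    F = np.zeros(n)
    G = np.zeros(n)
    for f_t, g_t in zip(f_steps, g_steps):   # t = 1..T
        F += f_t(y_features)                 # update local evidence F
        b = 1 / (1 + np.exp(-(F + G)))       # compute current beliefs
        G = g_t(b)                           # update compatibilities G
    b = 1 / (1 + np.exp(-(F + G)))
    return np.where(b > 0.5, 1, -1)          # output classification

# toy learned steps: one round reading feature 0, plus a smoothing G
f_steps = [lambda y: 0.8 * np.sign(y[:, 0])]
g_steps = [lambda b: 0.3 * (b.mean() - b)]
print(classify(np.random.default_rng(1).normal(size=(4, 2)), f_steps, g_steps))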
Multiclass Detection
U: a dictionary of ~2000 image patches. V: the same number of image masks.
At each round t, for each class c and for each dictionary entry d there is a candidate weak learner (a sketch follows).
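A hedged sketch of one such weak learner: correlate the image with dictionary patch U[d], threshold the response, and convolve with the paired mask V[d] to spread the vote over the object's extent. The threshold and output scaling are illustrative parameters, not values from the paper.

import numpy as np
from scipy.signal import fftconvolve

def weak_learner(image, patch, mask, theta=0.5, alpha=1.0, beta=0.0):
    resp = fftconvolve(image, patch[::-1, ::-1], mode="same")   # correlation
    fired = resp > theta                                        # threshold
    vote = fftconvolve(fired.astype(float), mask, mode="same")  # spread by mask
    return alpha * vote + beta

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
patch = rng.normal(size=(7, 7))    # one entry of the ~2000-patch dictionary U
mask = np.ones((5, 5)) / 25.0      # its paired mask from V
score = weak_learner(image, patch, mask)
print(score.shape)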
Function f
To take different object sizes into account, we first downsample the image, apply the detector at each scale, then upsample the responses and OR them across scales. The result is our function for computing the local evidence.
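A sketch of the multiscale step under the simplest assumptions: nearest-neighbor down/upsampling via numpy indexing and a stand-in single-scale detector.

import numpy as np

def downsample(img, k):
    return img[::k, ::k]

def upsample(img, shape):
    rows = (np.arange(shape[0]) * img.shape[0]) // shape[0]
    cols = (np.arange(shape[1]) * img.shape[1]) // shape[1]
    return img[np.ix_(rows, cols)]

def detect(img):                            # stand-in single-scale detector
    return img > 1.5

def multiscale_or(image, scales=(1, 2, 4)):
    out = np.zeros(image.shape, dtype=bool)
    for k in scales:
        resp = detect(downsample(image, k))
        out |= upsample(resp, image.shape)  # OR across scales
    return out

image = np.random.default_rng(0).normal(size=(32, 32))
print(multiscale_or(image).sum())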
Function g
The compatibility function has a similar form: $W$ represents a kernel that collects all the messages directed to node $(x, y, c)$.
Kernels W
[Figure: examples of learned kernels and the incoming messages they generate.]
Function G
The overall incoming-message function sums, over source classes, each class's belief map convolved with the corresponding kernel:
$$G_{x,y,c} = \sum_{c'} \left(b_{c'} \otimes W_{c',c}\right)_{x,y}$$
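A sketch of this computation: each class's belief map is convolved with the kernel connecting it to the target class, and the results are summed. Shapes and kernel contents are illustrative.

import numpy as np
from scipy.signal import fftconvolve

C, H, Wd = 3, 32, 32
beliefs = np.random.default_rng(0).random((C, H, Wd))               # b_{x,y,c'}
kernels = np.random.default_rng(1).normal(size=(C, C, 9, 9)) * 0.1  # W[c', c]

def message_function(beliefs, kernels):
    C = beliefs.shape[0]
    G = np.zeros_like(beliefs)
    for c in range(C):          # target class
        for cp in range(C):     # source class
            G[c] += fftconvolve(beliefs[cp], kernels[cp, c], mode="same")
    return G

print(message_function(beliefs, kernels).shape)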
Learning…
- Labeled dataset of office and street scenes, with ~100 images of each.
- In the first 5 rounds, only the local evidence is updated.
- After the 5th iteration, the compatibility functions are updated as well.
- At each round, only the F and G of the single object class that most reduces the multiclass cost are updated.
Learning…
The biggest objects are detected first, because they reduce the error of all classes the fastest (a sketch of the class-selection step follows).
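A hedged sketch of the per-round class selection: propose a candidate update for every class and keep only the one that most reduces the multiclass cost. cost() and propose_update() are placeholders for the real boosting step; the class names merely echo the paper's office/street objects.

import numpy as np

def cost(scores, labels):
    # multiclass cost as the sum of per-class logistic losses
    return sum(np.sum(np.log1p(np.exp(-labels[c] * scores[c])))
               for c in scores)

def propose_update(scores, c):
    new = dict(scores)
    new[c] = scores[c] + 0.1 * np.sign(labels[c])   # toy weak-learner step
    return new

rng = np.random.default_rng(0)
classes = ["screen", "keyboard", "car"]
labels = {c: np.sign(rng.normal(size=50)) for c in classes}
scores = {c: np.zeros(50) for c in classes}

for t in range(3):
    candidates = {c: propose_update(scores, c) for c in classes}
    best = min(classes, key=lambda c: cost(candidates[c], labels))
    scores = candidates[best]           # update only the winning class
    print(t, best, round(cost(scores, labels), 2))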
The End
Introduction
Observed: a picture. Dictionary entry: dog. We want $P(\text{Dog} \mid \text{Pic})$.
Introduction
Independent detectors give per-part probabilities: $P(\text{Head} \mid \text{Pic}_i)$, $P(\text{Tail} \mid \text{Pic}_i)$, $P(\text{Front Legs} \mid \text{Pic}_i)$, $P(\text{Back Legs} \mid \text{Pic}_i)$.
Introduction
Compatibility functions tie the parts together: Comp(Head, Legs), Comp(Head, Tail), Comp(Front Legs, Back Legs), Comp(Tail, Legs). Mutually compatible detections reinforce each other: Dog!
Introduction
Context can also veto a detector: a spurious $P(\text{Piranha} \mid \text{Pic}_i)$ is suppressed by the low compatibility Comp(Piranha, Legs).
Graphical Models
Observation nodes $y_i \in Y$; each $y_i$ can be a pixel or an image patch.
Graphical Models
Hidden nodes $x_i \in X$, whose states range over the dictionary. The local evidence $\phi_i(x_i, y_i)$ links each hidden node to its observation; shorthand $\phi_i(x_i)$.
Graphical Models
Compatibility function $\psi_{ij}(x_i, x_j)$ between hidden nodes in $X$.