Latent (S)SVM and Cognitive Multiple People Tracker
EM: the “latent” you know

EM is an optimization algorithm that fits a mixture of Gaussians to a set of data points. When the algorithm starts, there is no clue about which points belong to which Gaussian. But the parameters of a Gaussian can only be learned by having available the subset of points that defines it.

LOOP
– gently guess how much each point contributes to each Gaussian
– use maximum likelihood to re-estimate the optimal parameter set

(slightly more complex than that, but that’s the idea)
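A minimal sketch of that loop, assuming 1-D data and two Gaussians (all names here are illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import norm

def em_gmm(x, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    # Gentle initial guesses for means, stds and mixing weights.
    mu = np.array([x.min(), x.max()])
    sigma = np.array([1.0, 1.0])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: guess how much each point contributes to each Gaussian.
        dens = np.stack([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
        resp = dens / dens.sum(axis=0)        # responsibilities, shape (2, n)
        # M-step: maximum-likelihood re-estimation weighted by the guesses.
        nk = resp.sum(axis=1)
        mu = (resp * x).sum(axis=1) / nk
        sigma = np.sqrt((resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk)
        pi = nk / len(x)
    return mu, sigma, pi
```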
Guess what? The true membership is latent!

Can you give a definition of a latent variable? It is something that cannot be observed during training.

What can we learn from EM? That an iterative scheme is still the best approach to solving latent problems.
Mathematical framework

We still want to learn some prediction function, but together with the solution we now also have to infer the latent variable which best explains the input/output pair. Forgive us for the strong change in notation! The argument of the argmax is the score function, and the multiplier of the parameter vector inside it is the feature map. We formulate the problem as a regularized minimization of the empirical loss (a generic form is sketched below). Of course the structured hinge loss will be different. Why do you think so?
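In the usual latent SSVM notation (a generic sketch consistent with Yu & Joachims’ latent structural SVM; the slides’ exact symbols may differ), prediction becomes

$$ (\hat{y}, \hat{h}) = \operatorname*{argmax}_{y \in \mathcal{Y},\, h \in \mathcal{H}} \; \langle w, \Phi(x, y, h) \rangle $$

where the argument of the argmax is the score function and $\Phi(x, y, h)$, the multiplier of the parameter vector $w$, is the latent-augmented feature map. Learning is then the regularized minimization of the empirical loss

$$ \min_w \; \frac{\lambda}{2} \lVert w \rVert^2 + \frac{1}{n} \sum_{i=1}^{n} \tilde{H}_i(w) $$

with $\tilde{H}_i$ the structured hinge loss for example $i$.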
Mathematical framework

The loss is also going to incorporate the latent variables, as we jointly care about learning how to predict solutions and latent variables, so the structured hinge loss changes to the form sketched below. Can you find the contradiction? We said latent variables are not observed during training!
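A generic form of that hinge loss (again a hedged sketch in standard latent SSVM style, not necessarily the slides’ exact formula):

$$ \tilde{H}_i(w) = \max_{y,\, h} \Big[ \Delta(y_i, h_i^*, y, h) + \langle w, \Phi(x_i, y, h) \rangle \Big] - \langle w, \Phi(x_i, y_i, h_i^*) \rangle $$

Note that the loss term $\Delta$ now compares against a latent explanation $h_i^*$ of the ground-truth pair – which is exactly what is never observed during training.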
Latent completion

Latent completion is the crucial step designed to infer, given an input/output pair, the best latent variable which explains it! Note that it is different from the prediction step, where we only have the input available and want to jointly estimate the output and the latent variable. In the EM example, if we have a set of points and a mixture of Gaussians fitted on those points, latent completion would be a function capable of assigning to each Gaussian a responsibility score for the existence of each point.
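In formulas (standard GMM notation, not from the slides), the responsibility of component $k$ for point $x_i$ is

$$ \gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} $$

which is precisely a latent completion: given the observed data and the fitted model, it returns the best latent explanation.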
Summing up, we need:
– a new feature map, able to consider latent variables too
– a new loss function, able to account for differences in the latent explanation as well
– a new oracle call, able to solve the new version of the structured hinge loss
– a latent completion procedure, able to provide a latent explanation given an input and its associated output

Don’t be worried if this is a bit too much – it may take some time to gain confidence with this stuff.

Revisited version of the required functions of the SSVM
Remember the association problem, where the similarity function was parameterized and an additional parameter governs the reward for perceiving a different number of objects in the scene w.r.t. the previous frame. The problem can be solved in O(n³) with the Hungarian method, which also helps us define the feature map; besides, the Hamming loss employed is linear, and thus the max oracle can be easily solved (again with the Hungarian). Ideally we could extend the similarity matrix to employ more complex features too… can you see the problem?
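A hedged sketch of the distance-only association step using SciPy’s Hungarian solver; the padding with a fixed `no_match_cost` stands in for the reward parameter for appearing/disappearing targets (an illustrative simplification, not the slides’ exact parameterization):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_pos, curr_pos, no_match_cost=50.0):
    """Match previous targets to current detections by spatial distance."""
    n, m = len(prev_pos), len(curr_pos)
    # Pairwise Euclidean distances between old and new positions.
    cost = np.linalg.norm(prev_pos[:, None, :] - curr_pos[None, :, :], axis=2)
    # Pad to a square matrix so unmatched targets/detections pay a fixed cost:
    # a real pair (i, j) is matched iff its distance is below no_match_cost.
    size = n + m
    padded = np.full((size, size), no_match_cost)
    padded[:n, :m] = cost
    rows, cols = linear_sum_assignment(padded)   # O(n^3) Hungarian method
    return [(r, c) for r, c in zip(rows, cols) if r < n and c < m]
```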
Object File Theory

One of the first and most influential approaches to the problem of object correspondence is known as Object-File theory. According to this theory, when an object is first perceived in the scene, a position marker, or spatial index, is assigned to the location occupied by that object. From then on, whenever an object is found near that particular location, both spatial and perceptual properties of the object are activated and become bound to the spatial index. The index thus becomes a pointer to the object’s higher-level features. The central role of spatial information in Object-File theory has long been known as spatiotemporal dominance and can be synthesized in the following two corollaries: object correspondence is computed on the basis of spatiotemporal continuity, and object correspondence computation does not consult non-spatial properties of the object. The direct consequence of these claims is that a currently viewed object is treated as corresponding to a previously viewed object if the object’s position over time is consistent with the interpretation of a continuous, persisting entity. A more subtle intuition is that if spatiotemporal information is consistent with the interpretation of a continuous object, object correspondence will be established even if surface features and identity information are inconsistent with the interpretation of correspondence.

Example: Superman (1941) – “Up in the sky, look: It’s a bird. It’s a plane. It’s Superman!”
And computationally? Cognitive Visual Tracking

Based on three decades of empirical results: our brain finds distance to be the only reliable feature; motion prediction and appearance are a plus, when useful. How can we exploit humans’ way of coping with multiple target tracking? (We are so good at it!)

1. Split the crowd into influence zones (latent knowledge)
2. Decide whether those zones are ambiguous (also latent)
3. Solve unambiguous associations with distance only
4. Employ higher-level features in ambiguous cases

No one will ever say that having color or motion available is bad. The problem is teaching the classifier when it can trust these features! (A rough sketch of steps 1–4 follows below.)

CAN WE LEARN 1–4 IN A UNIFIED FRAMEWORK?
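A greedy per-target simplification of steps 1–4 (an illustrative sketch, not the paper’s exact procedure; `radius` and the feature handling are assumptions):

```python
import numpy as np

def track_step(prev_pos, det_pos, prev_feat, det_feat, radius=30.0):
    """One tracking step following steps 1-4 above (illustrative sketch)."""
    # Pairwise distances between previous targets and current detections.
    d = np.linalg.norm(prev_pos[:, None, :] - det_pos[None, :, :], axis=2)
    matches = {}
    for t in range(len(prev_pos)):
        # 1. Influence zone: detections within this target's reach.
        zone = np.flatnonzero(d[t] < radius)
        if zone.size == 0:
            continue                       # no candidate in reach
        if zone.size == 1:
            # 2-3. Unambiguous zone: distance alone is trusted.
            matches[t] = int(zone[0])
        else:
            # 2, 4. Ambiguous zone: only now consult appearance features.
            app = np.linalg.norm(prev_feat[t] - det_feat[zone], axis=1)
            matches[t] = int(zone[np.argmin(d[t, zone] + app)])
    return matches
```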
Influence zones inference

Background: influence zones model humans’ visual attention beams. They help in reducing the complexity of the task, as targets appearing in different influence zones do not need to be tested for association. We use them to localize where distance alone isn’t enough.

How do we compute these influence zones? Again, it is based on the Hungarian algorithm evaluating spatial information only, followed by an iterative clustering procedure. Start with the Munkres solution:
– if the output is given, then we are doing latent completion
– if it is predicted, we are predicting influence zones
Influence zones inference

The procedure is similar to correlation clustering, but we extended it to work with asymmetric matrices as well (here H plays the role of C).
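A rough, hedged stand-in for that idea – a greedy correlation-clustering-style merge that never assumes the affinity matrix is symmetric (the paper’s exact clustering rule may differ):

```python
import numpy as np

def cluster_zones(affinity):
    """Greedily merge clusters while total inter-cluster affinity is positive.

    Affinities in both directions are summed, so an asymmetric matrix
    (like a Hungarian-based one) is handled without symmetrization.
    """
    clusters = [{i} for i in range(affinity.shape[0])]
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gain = sum(affinity[i, j] + affinity[j, i]
                           for i in clusters[a] for j in clusters[b])
                if gain > 0:
                    clusters[a] |= clusters.pop(b)
                    merged = True
                    break
            if merged:
                break
    return clusters
```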
Theory in practice

[Diagram: the tracking pipeline annotated with where the latent steps live – occlusion handling, where the object files (OF) are updated, and the correspondence, review and impletion stages; the original meanings of these terms are in the supplementary material.]
… and back to latent SSVM

As always we need to define:
– a feature map! (always start from the prediction function if you can)
– a loss function (super easy)
– a max oracle (try to reduce it to a modified prediction step)
– AND a latent completion step (already done!)

Instead of starting with the Munkres solution, initialize the algorithm with …
Feature Map
Loss function and Max Oracle
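Generically (a hedged reminder of the latent SSVM form, not the tracking-specific instantiation), the max oracle solves the loss-augmented prediction

$$ (\bar{y}_i, \bar{h}_i) = \operatorname*{argmax}_{y,\, h} \Big[ \Delta(y_i, h_i^*, y, h) + \langle w, \Phi(x_i, y, h) \rangle \Big] $$

and since the Hamming loss is linear in the associations, this is again an assignment problem that the Hungarian method can solve.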
What about FW?

With latent variables the structured hinge loss is no longer convex, so the Frank-Wolfe machinery cannot be applied as-is; a common strategy is to alternate latent completion (which fixes the latent variables and restores convexity) with standard Block-Coordinate Frank-Wolfe passes on the resulting convex SSVM.