Latent (S)SVM and Cognitive Multiple People Tracker

EM: the “latent” you already know
EM is an optimization algorithm that fits a mixture of Gaussians to a set of data points. When the algorithm starts, there is no clue about which points belong to which Gaussian, yet the parameters of a Gaussian can be learned only if we know which subset of points defines it.
LOOP:
– gently guess how much each point contributes to each Gaussian (the responsibilities)
– use maximum likelihood to re-estimate the optimal parameter set given those responsibilities
(slightly more complex in practice, but that’s the idea). A minimal numerical sketch of this loop follows below.
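To make the loop concrete, here is a minimal sketch of EM for a Gaussian mixture. It is not taken from the slides; the use of NumPy/SciPy, the variable names, and the diagonal regularization are illustrative choices of mine.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """Fit a K-component Gaussian mixture to points X of shape (n, d) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialization: random means, shared covariance, uniform mixing weights.
    mu = X[rng.choice(n, K, replace=False)]
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: "gently guess" how much each point contributes to each Gaussian.
        resp = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], cov[k])
                         for k in range(K)], axis=1)       # shape (n, K)
        resp /= resp.sum(axis=1, keepdims=True)             # responsibilities

        # M-step: maximum-likelihood re-estimation weighted by the responsibilities.
        Nk = resp.sum(axis=0)                                # effective counts, shape (K,)
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
        pi = Nk / n
    return pi, mu, cov, resp

The responsibilities computed in the E-step are exactly the latent membership the next slide talks about.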

Guess what? True membership is latent! Can you give a definition of a latent variable? It is a variable that cannot be observed during training, yet it is needed to explain the data. What can we learn from EM? That an iterative scheme, alternating between guessing the latent variables and re-estimating the parameters, is still the best-known approach to latent problems.

Mathematical framework
We still want to learn a prediction function, but together with the output we now also have to infer the latent variable that best explains the input/output pair. Forgive us for the strong change in notation! In the first line, the argument of the argmax is the score function; in the second line, the term multiplying the parameter vector is the feature map. We formulate the problem as a regularized minimization of the empirical loss. Of course the structured hinge loss will be different. Why do you think so?
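The formulas on this slide did not survive the transcript (they were images). The standard latent SSVM prediction rule and learning objective they most plausibly correspond to are, in a notation of my own choosing:

(\hat{y}, \hat{h}) \;=\; \operatorname*{argmax}_{(y,h)\,\in\,\mathcal{Y}\times\mathcal{H}} \; \langle w, \Phi(x, y, h) \rangle

\min_{w} \;\; \frac{\lambda}{2}\,\lVert w \rVert^{2} \;+\; \frac{1}{n}\sum_{i=1}^{n} \tilde{H}_i(w)

Here \langle w, \Phi(x, y, h) \rangle is the score function, \Phi is the latent-aware feature map, and \tilde{H}_i is the structured hinge loss discussed on the next slide.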

Mathematical framework
The loss is also going to incorporate the latent variables, since we jointly care about learning to predict both outputs and latent variables, and the structured hinge loss changes accordingly. Can you find the contradiction? We said latent variables are not observed during training, yet the loss seems to require a ground-truth latent value!
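The hinge-loss formula itself is also missing from the transcript; a standard latent structured hinge loss of the kind described here reads (a reconstruction rather than the slide's exact formula):

\tilde{H}_i(w) \;=\; \max_{(y,h)\,\in\,\mathcal{Y}\times\mathcal{H}} \Big[ \Delta\big((y_i, h_i^{*}), (y, h)\big) \;+\; \langle w, \Phi(x_i, y, h) \rangle \Big] \;-\; \langle w, \Phi(x_i, y_i, h_i^{*}) \rangle

The ground-truth latent term h_i^{*} is exactly the contradiction: it is never observed, so something has to supply it.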

Latent completion
Latent completion is the crucial step designed to infer, given an input/output pair, the best latent variable that explains it. Note that it differs from the prediction step, where only the input is available and we want to jointly estimate the output and the latent variable. In the EM example, if we have a set of points and a mixture of Gaussians fitted on those points, latent completion would be the function assigning a responsibility score to each Gaussian for the existence of each point.
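In formulas (again a reconstruction, not from the transcript), latent completion fixes the ground-truth output and maximizes the score only over the latent space:

h_i^{*} \;=\; \operatorname*{argmax}_{h\,\in\,\mathcal{H}} \; \langle w, \Phi(x_i, y_i, h) \rangle

Compare this with the prediction rule above, which maximizes jointly over outputs and latent variables.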

Summing up, we need (a revisited version of the functions required by the SSVM):
– a new feature map able to consider latent variables too
– a new loss function able to account for differences in the latent explanation as well
– a new oracle call able to solve the new version of the structured hinge loss
– a latent completion procedure able to provide a latent explanation given an input and its associated output
Don’t be worried if this is a bit too much – it may take some time to gain confidence with this stuff. A sketch of how these pieces fit into a training loop follows below.
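To show how the four ingredients interact, here is a deliberately schematic sketch of the usual alternating (CCCP-style) training scheme for latent SSVMs. The callables `latent_completion` and `solve_ssvm` are placeholders standing in for the components listed above, not an implementation from the presentation; the SSVM solver is assumed to use the latent-aware feature map, loss, and max oracle internally.

import numpy as np

def train_latent_ssvm(data, w_init, latent_completion, solve_ssvm,
                      n_outer=10, lam=0.01):
    """Alternating training: complete the latent variables, then solve a standard SSVM.

    data:               list of (x, y) ground-truth input/output pairs
    latent_completion:  callable (w, x, y) -> h*, the best latent explanation of (x, y)
    solve_ssvm:         standard SSVM solver over completed triplets (x, y, h*)
    """
    w = np.asarray(w_init, dtype=float)
    for _ in range(n_outer):
        # 1) Latent completion: with w fixed, infer the best latent variable
        #    for every ground-truth input/output pair.
        completed = [(x, y, latent_completion(w, x, y)) for (x, y) in data]

        # 2) With the latent variables fixed, the problem is a standard structured
        #    SVM over triplets (x, y, h*): solve it with any SSVM solver
        #    (cutting planes, stochastic subgradient, block-coordinate Frank-Wolfe, ...).
        w = solve_ssvm(completed, lam=lam, w_init=w)
    return w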

Remember the association problem
The similarity function was parameterized, and a dedicated parameter governs the reward for perceiving a different number of objects in the scene w.r.t. the previous frame. The problem can be solved in O(n³) with the Hungarian method, which also helps us define the feature map; moreover, the Hamming loss employed is linear, so the max oracle can also be solved easily (again with the Hungarian method). Ideally we could extend the similarity matrix to employ more complex features too… can you see the problem? A minimal sketch of the Hungarian association step follows below.
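As a reminder of how the association step works, here is a minimal sketch using SciPy's Hungarian solver (`linear_sum_assignment`). The plain negative-distance cost is a stand-in of mine for the parameterized similarity function of the slides.

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(prev_positions, curr_positions):
    """Match targets of the previous frame to detections of the current frame.

    prev_positions: (m, 2) array of target positions at frame t-1
    curr_positions: (n, 2) array of detections at frame t
    Returns a list of (prev_index, curr_index) matches.
    """
    # Pairwise Euclidean distances; the Hungarian method minimizes total cost,
    # which is equivalent to maximizing total (negative-distance) similarity.
    cost = np.linalg.norm(prev_positions[:, None, :] - curr_positions[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal assignment in O(n^3)
    return list(zip(rows.tolist(), cols.tolist()))

In the actual model the cost entries come from the learned, parameterized similarity, and a dedicated reward handles targets appearing or disappearing between frames; both are omitted here for brevity.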

Object File Theory
One of the first and most influential approaches to the problem of object correspondence is known as Object-File theory. According to this theory, when an object is first perceived in the scene, a position marker, or spatial index, is assigned to the location occupied by that object. From then on, whenever an object is found near that particular location, both the spatial and the perceptual properties of the object are activated and become bound to the spatial index. The index thus becomes a pointer to the object's higher-level features. The central role of spatial information in Object-File theory has long been known as spatiotemporal dominance and can be synthesized in the following two corollaries: object correspondence is computed on the basis of spatiotemporal continuity, and object correspondence computation does not consult non-spatial properties of the object. The direct consequence of these claims is that a currently viewed object is treated as corresponding to a previously viewed object if the object's position over time is consistent with the interpretation of a continuous, persisting entity. A more subtle intuition is that if spatiotemporal information is consistent with the interpretation of a continuous object, object correspondence will be established even when surface features and identity information are inconsistent with that interpretation. Example: Superman (1941) – "Up in the sky, look: It's a bird. It's a plane. It's Superman!"

And computationally? Cognitive Visual Tracking
Based on three decades of empirical results: our brain treats distance as the only reliable feature, while motion prediction and appearance are a plus when useful. How can we exploit humans' way of coping with multiple-target tracking? (we are so good at it!)
1. Split the crowd into influence zones (latent knowledge)
2. Decide whether those zones are ambiguous (also latent)
3. Solve unambiguous associations with distance only
4. Employ higher-level features in ambiguous cases
Nobody will ever say that having color or motion available is bad; the problem is teaching the classifier when it can trust these features! CAN WE LEARN 1–4 IN A UNIFIED FRAMEWORK? (a schematic sketch of steps 1–4 follows below)
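The following sketch only illustrates the control flow of steps 1–4; the helper functions (`compute_influence_zones`, `is_ambiguous`, `associate_by_distance`, `associate_with_rich_features`) are hypothetical placeholders for the components described in the next slides.

def track_frame(targets, detections):
    """One tracking step following the cognitive pipeline (steps 1-4 above).

    The four helpers are placeholders: they stand for influence-zone inference,
    ambiguity detection, distance-only association, and feature-rich association.
    """
    matches = []
    # 1) Split the crowd into influence zones (latent knowledge).
    zones = compute_influence_zones(targets, detections)
    for zone in zones:
        # 2) Decide whether this zone is ambiguous (also latent).
        if not is_ambiguous(zone):
            # 3) Unambiguous zone: distance alone is enough.
            matches += associate_by_distance(zone.targets, zone.detections)
        else:
            # 4) Ambiguous zone: bring in higher-level features
            #    (appearance, motion prediction, ...).
            matches += associate_with_rich_features(zone.targets, zone.detections)
    return matches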

Influence zones inference
Background: influence zones model humans' visual attention beams. They help reduce the complexity of the task, since targets appearing in different influence zones do not need to be tested for association, and we use them to localize where distance alone isn't enough. How do we compute these influence zones? Again, it's based on the Hungarian algorithm evaluating spatial information only, followed by an iterative clustering procedure. Start with the Munkres solution:
- if it is given, then we are doing latent completion
- if it is predicted, we are predicting influence zones

Influence zones inference
The procedure is similar to correlation clustering, but we extended it to work with asymmetric matrices as well (in the slide's notation, H plays the role of C).

Theory in practice
[Algorithm figure; callouts: all the latent variables and the occlusion handling live here, and the object files (OF) are updated here. The original meanings of correspondence, reviewing and impletion are given in the supplementary material.]

… and back to latent SSVM
As always we need to define:
– a feature map! (always start from the prediction function if you can)
– a loss function (super easy)
– a max oracle (try to reduce it to a modified prediction step)
– AND a latent completion step (already done!)
Instead of starting with the Munkres solution, initialize the algorithm with …

Feature Map

Loss function and Max Oracle

What about FW?