Machine learning and imprecise probabilities for computer vision Fabio Cuzzolin IDIAP, Martigny, 19/4/2006
Myself: Master’s thesis on gesture recognition at the University of Padova; visiting student, ESSRL, Washington University in St. Louis; Ph.D. thesis on the theory of evidence; young researcher in Milan with the Image and Sound Processing group; post-doc at UCLA in the Vision Lab
My research: Computer vision (object and body tracking, data association, gesture and action recognition); Discrete mathematics (linear independence on lattices); Belief functions and imprecise probabilities (geometric approach, algebraic analysis, combinatorial analysis)
Computer Vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification. Imprecise probabilities: geometric approach; algebraic analysis
Approach. Problem: recognizing an example of a known category of gestures from image sequences. Combination of HMMs (for dynamics) and size functions (for pose representation). Continuous hidden Markov models. EM algorithm for parameter learning (Moore).
Example: transition matrix A -> gesture dynamics; state-output matrix C -> collection of hand poses. The gesture is represented as a sequence of transitions between a small set of canonical poses.
Size functions: hand poses are represented through their contours. [Figure: real image, measuring function, family of lines, size function table]
Gesture classification: the EM algorithm is used to learn HMM parameters from an input feature sequence; the new sequence is fed to the learnt gesture models (HMM 1, HMM 2, …, HMM n); each of them produces a likelihood; the most likely model is chosen (if its likelihood is above a threshold).
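The classification step can be sketched as follows. This is a minimal illustration, not the system described in the talk: it assumes discrete output symbols (the slides use continuous HMMs), models that are already trained, and hypothetical names (`log_likelihood`, `classify`, the `models` dictionary).

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log p(obs | model) for a discrete-output HMM.

    pi: initial state probabilities, A: state transition matrix,
    B[i, k]: probability of emitting symbol k in state i.
    """
    alpha = pi * B[:, obs[0]]
    log_p = 0.0
    for symbol in obs[1:]:
        scale = alpha.sum()
        if scale == 0.0:
            return -np.inf
        log_p += np.log(scale)
        # Propagate the normalized forward variable, then weight by the emission.
        alpha = ((alpha / scale) @ A) * B[:, symbol]
    final = alpha.sum()
    return log_p + np.log(final) if final > 0.0 else -np.inf

def classify(obs, models, threshold=-np.inf):
    """Return the name of the most likely model, or None if it is below the threshold."""
    scores = {name: log_likelihood(obs, *params) for name, params in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Two toy gesture models (illustrative numbers only).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
models = {
    "a": (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]])),  # mostly emits symbol 0
    "b": (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]])),  # mostly emits symbol 1
}
print(classify([0, 0, 0, 0], models))
```

The threshold acts on the raw log-likelihood here; in practice one might normalize by sequence length so that long and short gestures are comparable.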
Composition of HMMs. Compositional behavior of HMMs: the model of the action of interest is embedded in the overall model. Clustering: states of the original model are grouped in clusters, and the transition matrix is recomputed accordingly.
State clustering: effect of clustering on HMM topology. [Figure: “cluttered” model for the two overlapping motions; reduced model for the “fly” gesture extracted through clustering]
Kullback-Leibler comparison: we used the K-L distance to measure the similarity between models extracted from clutter and in the absence of it. [Figures: KL distances between “fly” (solid) and “fly from clutter” (dash); KL distances between “fly” and “cycle”]
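The slide does not preserve the formula used to recompute the transition matrix. One plausible way to do it, shown here purely as an assumption, is to average the outgoing transition probabilities of the states in each cluster (optionally weighted, for instance by state occupancy) and sum them over the states of the destination cluster. The function name `cluster_transitions` and the weighting scheme are hypothetical.

```python
import numpy as np

def cluster_transitions(A, labels, weights=None):
    """Aggregate an n x n transition matrix into a k x k one.

    labels[s] is the cluster index of state s; weights[s] is the relative
    importance of state s inside its cluster (uniform if omitted).
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    n, k = len(A), int(labels.max()) + 1
    weights = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    membership = np.zeros((n, k))
    membership[np.arange(n), labels] = 1.0       # one-hot cluster membership
    weighted = weights[:, None] * membership     # weight of each state in its cluster
    reduced = weighted.T @ A @ membership        # weighted mass flowing cluster -> cluster
    return reduced / weighted.sum(axis=0)[:, None]  # normalize each row

A = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
print(cluster_transitions(A, labels=[0, 0, 1]))
```

Each row of the reduced matrix still sums to one, so the result is again a valid transition matrix.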
Volumetric action recognition. Problem: recognizing the action performed by a person viewed by a number of cameras. 2D approaches: features are extracted from single views -> viewpoint dependence. Volumetric approach: features are extracted from a volumetric reconstruction of the moving body.
3D feature extraction: k-means clustering to separate body parts; linear discriminant analysis (LDA) to estimate the direction of motion as the direction of maximal separation between the legs; locally linear embedding to find a topological representation of the moving body.
Uncertainty descriptions: a number of formalisms have been proposed to extend or replace classical probability, e.g. possibilities, fuzzy sets, random sets, monotone capacities, gambles, upper and lower previsions. Theory of evidence (A. Dempster, G. Shafer): probabilities are replaced by belief functions; Bayes’ rule is replaced by Dempster’s rule; families of domains for multiple representation of evidence.
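Belief functions. Probability on a finite set: a function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function which meets the normalization constraint $\sum_{x \in \Theta} m(x) = 1$. Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$. Belief function: $s: 2^\Theta \to [0,1]$ with $s(A) = \sum_{B \subseteq A} m(B)$, where $m$ is a mass function on $2^\Theta$ s.t. $m(\emptyset) = 0$ and $\sum_{B \subseteq \Theta} m(B) = 1$. Belief functions are not additive. [Figure: a set A with subsets B1, B2]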
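The non-additivity of belief functions can be checked numerically. The sketch below assumes a mass function stored as a dictionary from `frozenset` focal elements to masses; the frame and the numbers are illustrative, not taken from the talk.

```python
def belief(mass, A):
    """s(A): total mass of the focal elements contained in A."""
    A = frozenset(A)
    return sum(m for B, m in mass.items() if B <= A)

# Mass function on the frame {a, b}: part of the mass sits on the whole frame.
mass = {
    frozenset({"a"}): 0.2,
    frozenset({"b"}): 0.3,
    frozenset({"a", "b"}): 0.5,
}

s_a = belief(mass, {"a"})
s_b = belief(mass, {"b"})
s_ab = belief(mass, {"a", "b"})
# {a} and {b} are disjoint, yet s({a, b}) exceeds s({a}) + s({b}).
print(s_a, s_b, s_ab)
```

When all the mass sits on singletons, the same function reduces to an ordinary (additive) probability.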
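Dempster’s rule: in the theory of evidence, new information encoded as a belief function is combined with old beliefs in a revision process. Belief functions are combined through Dempster’s rule, which multiplies the masses of intersecting focal elements and renormalizes: $m(A) = \dfrac{\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j)}{1 - \sum_{A_i \cap B_j = \emptyset} m_1(A_i)\, m_2(B_j)}$. [Figure: focal elements $A_i$, $B_j$ and their intersection]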
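Example of combination on $\Theta = \{a_1, a_2, a_3, a_4\}$. $s_1$: $m(\{a_1\}) = 0.7$, $m(\{a_1, a_2\}) = 0.3$. $s_2$: $m(\Theta) = 0.1$, $m(\{a_2, a_3, a_4\}) = 0.9$. $s_1 \oplus s_2$: $m(\{a_1\}) = 0.7 \cdot 0.1 / 0.37 = 0.19$, $m(\{a_2\}) = 0.3 \cdot 0.9 / 0.37 = 0.73$, $m(\{a_1, a_2\}) = 0.3 \cdot 0.1 / 0.37 = 0.08$.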
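The numbers in this example can be reproduced with a few lines of code. This is a minimal sketch of Dempster's rule for mass functions stored as dictionaries from `frozenset` focal elements to masses; the function name `dempster` and the data layout are choices made here for illustration.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (A, a), (B, b) in product(m1.items(), m2.items()):
        C = A & B
        if C:
            combined[C] = combined.get(C, 0.0) + a * b
        else:
            conflict += a * b          # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the two belief functions cannot be combined")
    return {C: v / (1.0 - conflict) for C, v in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
s1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
s2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}

result = dempster(s1, s2)
for focal, m in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(focal), round(m, 2))
```

The only conflicting pair is $\{a_1\}$ with $\{a_2, a_3, a_4\}$, which carries mass $0.7 \cdot 0.9 = 0.63$; this is where the normalization factor $1 - 0.63 = 0.37$ comes from.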
JPDA with shape info. JPDA model: independent targets; shape model: rigid links; Dempster’s fusion. Robustness: clutter does not meet shape constraints. Occlusions: occluded targets can be estimated.
Body tracking Application: tracking of feature points on a moving human body
Pose estimation: estimating the “pose” (internal configuration) of a moving body from the available images. Salient image measurements: features. [Figure: camera observing the moving body from t=0 to t=T]
Model-based estimation: if you have an a-priori model of the object, you can exploit it to help (or drive) the estimation. Example: kinematic model.
Model-free estimation: if you do not have any information about the body, the only way to do inference is to learn a map between features and poses directly from the data. This can be done in a training stage.
Collecting training data: motion capture system; 3D locations of markers = pose
Training data: when the object performs some “significant” movements in front of the camera, a finite collection of configuration values $q_1, \dots, q_T$ is provided by the motion capture system, while a sequence of features $y_1, \dots, y_T$ is computed from the image(s).
Learning feature-pose maps: hidden Markov models provide a way to build feature-pose maps from the training data. A Gaussian density for each state is set up on the feature space -> approximate feature space. A map is built between each region and the set of training poses $q_k$ with feature value $y_k$ inside it.
Evidential model: approximate feature spaces and an approximate parameter space form a family of compatible frames, the evidential model.
Human body tracking: two experiments, two views. Four markers on the right arm; six markers on both legs.
Feature extraction, three steps: original image, color segmentation, bounding box. [Figure: values shown on the slide: 185 94 161 38]
Performances: comparison of three models: left view only, right view only, both views. [Figure: pose estimation yielded by the overall model, estimates associated with the “right” and “left” models, ground truth]
GaitID. The problem: recognizing the identity of humans from their gait. Typical approaches: PCA on image features, HMMs. People typically use silhouette data. Issue: view-invariance. It can be addressed via 3D representations, but 3D tracking is difficult and sensitive.
Bilinear models: from view-invariance to “style” invariance. In a dataset of sequences, each motion possesses several labels: action, identity, viewpoint, emotional state, etc. Bilinear models (Tenenbaum) can be used to separate the influence of two of those factors, called “style” and “content” (the label to classify). $y^{SC}$ is a training set of k-dimensional observations with labels S and C; $b^C$ is a parameter vector representing content, while $A^S$ is a style-specific linear map mapping the content space onto the observation space.
Content classification of unknown style: consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint). An asymmetric bilinear model can be learned from it through the SVD of a stacked observation matrix. When new motions are acquired in which a known person is seen walking from a different viewpoint (unknown style), an iterative EM procedure can be set up to classify the content (identity). E step -> estimation of p(c|s), the probability of the content given the current estimate s of the style. M step -> estimation of the linear map for the unknown style s.
Three-layer model. Feature representation: projection of the contour of the silhouette on a sheaf of lines passing through the center. Each sequence is encoded as a Markov model, its C matrix is stacked in an observation vector, and a bilinear model is trained over those vectors.
MOBO database: 25 people performing 4 different walking actions, from 6 cameras. Each sequence has three labels: action, id, view. We set up four experiments in which one label was chosen as content, another one as style, and the remaining one was considered a nuisance factor. Content = id, style = view -> view-invariant gaitID. Content = id, style = action -> action-invariant gaitID. Content = action, style = view -> view-invariant action recognition. Content = action, style = id -> style-invariant action recognition.
Results: performance compared with the “baseline” algorithm and with straight k-NN on sequence HMMs.
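The SVD-based fit of the asymmetric model can be sketched as below. This is only an illustration under stated assumptions: each column of the stacked matrix `Y` is taken to hold the (mean) observations of one content class under every style, stacked vertically; the function name, the synthetic data and the chosen dimensions are hypothetical, and the EM procedure for an unknown style is not shown.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, dim):
    """Fit y^{sc} ~ A^s b^c from a stacked (n_styles * k) x n_contents matrix.

    Returns A with shape (n_styles, k, dim) and B with shape (dim, n_contents).
    """
    U, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :dim] * sing[:dim]          # style-specific maps, still stacked
    B = Vt[:dim, :]                      # one content vector per column
    k = Y.shape[0] // n_styles
    return A.reshape(n_styles, k, dim), B

# Synthetic data generated by an exact bilinear model (illustrative sizes).
rng = np.random.default_rng(0)
n_styles, k, dim, n_contents = 3, 4, 2, 5
A_true = rng.normal(size=(n_styles, k, dim))
B_true = rng.normal(size=(dim, n_contents))
Y = A_true.reshape(n_styles * k, dim) @ B_true

A_hat, B_hat = fit_asymmetric_bilinear(Y, n_styles, dim)
# Reconstruct the observation for style 1, content 2.
y_12 = A_hat[1] @ B_hat[:, 2]
print(np.allclose(y_12, Y[k:2 * k, 2]))
```

The factors are recovered only up to an invertible linear transformation of the content space, so `A_hat` and `B_hat` need not equal the generating matrices even though their product does.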
Distances between dynamical models. Problem: motion classification. Approach: representing each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA or AR linear model. Classification is then reduced to finding a suitable distance function in the space of dynamical models. We can use this distance in any of the popular classification schemes: k-NN, SVM, etc.
Riemannian metrics: some distances have been proposed: Martin’s distance, subspace angles, gap metric, Fisher metric. However, it makes no sense to choose a single distance for all possible classification problems. When some a-priori information is available (a training set), we can learn in a supervised fashion the “best” metric for the classification problem. Feasible approach: volume minimization of pullback metrics.
Learning pullback metrics: many unsupervised algorithms take a dataset as input and map it to an embedded space, but they fail to learn a full metric. Consider then a family of diffeomorphisms $F_\lambda$ between the original space M and a metric space N. The diffeomorphism $F_\lambda$ induces on M a pullback metric. [Diagram labels: $F_\lambda$, D, N, M]
Space of AR(2) models: given an input sequence, we can identify the parameters of the linear model which best describes it. We chose the class of autoregressive models of order 2, AR(2). Fisher metric on AR(2). Compute the geodesics of the pullback metric on M.
Results: scalar feature, AR(2) and ARMA models; NN algorithm to classify new sequences. [Panels: identity recognition; action recognition]
Results (2): recognition performance of the second-best distance and the optimal pullback metric. The whole dataset is considered, regardless of the view. [Panels: View 1; View 5]
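For reference, the pullback construction can be written out explicitly. This is standard differential geometry rather than a formula recovered from the slide; the symbols $g$ (metric on N), $G$ (its matrix in local coordinates) and $J_\lambda$ (Jacobian of $F_\lambda$) are notation introduced here.

```latex
% Pullback of a metric g on N through a diffeomorphism F_\lambda : M \to N
(F_\lambda^{*} g)_m(u, v)
  = g_{F_\lambda(m)}\bigl( J_\lambda(m)\, u,\; J_\lambda(m)\, v \bigr),
  \qquad u, v \in T_m M,

% in local coordinates
G^{*}_\lambda(m) = J_\lambda(m)^{\top}\, G\bigl(F_\lambda(m)\bigr)\, J_\lambda(m).
```

Varying the parameter $\lambda$ therefore yields a whole family of metrics on M, among which one can be selected by optimizing a criterion on the training set.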
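Identifying an AR(2) model from a scalar sequence can be sketched with ordinary least squares. This is an illustration only: the talk does not say which identification method was used, and the nearest-neighbour helper below defaults to a plain Euclidean distance on the parameters as a stand-in for the Fisher or pullback metrics discussed in the slides. All function names are hypothetical.

```python
import numpy as np

def fit_ar2(y):
    """Least-squares estimate of (a1, a2) in y[t] = a1*y[t-1] + a2*y[t-2] + noise."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[1:-1], y[:-2]])   # regressors y[t-1] and y[t-2]
    coeffs, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coeffs

def nearest_neighbour(query, train_params, train_labels, dist=None):
    """1-NN classification in parameter space with a pluggable distance."""
    if dist is None:
        # Placeholder distance; a learnt metric would be passed in here.
        dist = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
    distances = [dist(query, p) for p in train_params]
    return train_labels[int(np.argmin(distances))]

# Noise-free AR(2) sequence with known coefficients (illustrative values).
y = [1.0, 0.5]
for _ in range(30):
    y.append(0.5 * y[-1] - 0.3 * y[-2])

params = fit_ar2(y)
print(params)
print(nearest_neighbour(params, [[0.5, -0.3], [-0.5, 0.3]], ["walk", "run"]))
```

Passing a different `dist` function is the hook where a learnt metric would replace the Euclidean default.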
Geometric approach to the ToE: belief functions can be seen as points of a Euclidean space of dimension $2^n - 2$, where n is the size of the frame. Belief space: the space of all the belief functions on a given frame; it has the shape of a simplex. Each subset A corresponds to the A-th coordinate s(A).
Geometry of Dempster’s rule: Dempster’s rule can be studied in the geometric setup too, as a geometric operator mapping pairs of points onto another point of the belief space. [Figure: conditional subspaces]
Probabilistic approximation. Problem: given a belief function s, finding the “best” probabilistic approximation of s. This can be solved in the geometric setup. Compositional criterion: the approximation behaves like s when combined through Dempster’s rule. Comparative study of all the proposed probabilistic approximations.
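The coordinate representation can be made concrete with a short sketch. It assumes a mass function stored as a dictionary from `frozenset` focal elements to masses, and it orders the coordinates by subset size and then lexicographically; that ordering and the function names are arbitrary choices made here.

```python
from itertools import combinations

def belief(mass, A):
    """s(A): total mass of the focal elements contained in A."""
    return sum(m for B, m in mass.items() if B <= A)

def belief_vector(mass, frame):
    """Coordinates s(A) for every non-empty proper subset A of the frame."""
    elems = sorted(frame)
    coords = []
    for size in range(1, len(elems)):            # skips the empty set and the whole frame
        for subset in combinations(elems, size):
            coords.append(belief(mass, frozenset(subset)))
    return coords

frame = {"a", "b", "c"}
mass = {frozenset({"a"}): 0.2, frozenset({"b"}): 0.3, frozenset(frame): 0.5}

v = belief_vector(mass, frame)
print(len(v), v)   # 2**3 - 2 = 6 coordinates
```

The empty set and the whole frame are left out because their belief values are fixed at 0 and 1 for every belief function, which is why the dimension is $2^n - 2$ rather than $2^n$.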
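As one concrete instance of a probabilistic approximation, the sketch below computes the pignistic transform, which splits the mass of each focal element equally among its elements. The slide does not say which approximations were compared, so this is offered only as a well-known example, not as the criterion advocated in the talk; the function name and the data layout are assumptions.

```python
def pignistic(mass):
    """Pignistic probability: share each focal element's mass equally among its elements.

    Assumes a normalized mass function with no mass on the empty set.
    """
    betp = {}
    for focal, m in mass.items():
        share = m / len(focal)
        for x in focal:
            betp[x] = betp.get(x, 0.0) + share
    return betp

# Illustrative mass function: 0.7 on {a1}, 0.3 on {a1, a2}.
mass = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
p = pignistic(mass)
print(p)
```

Here the 0.3 assigned to $\{a_1, a_2\}$ is divided evenly, giving $a_1$ a probability of 0.85 and $a_2$ a probability of 0.15.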
Lattice structure: families of frames have the algebraic structure of a lattice. Order relation: existence of a refining. F is a locally Birkhoff (semimodular with finite length) lattice, bounded below. [Diagram: $1_F$; maximal coarsening $\Theta \oplus \Omega$ of the frames $\Theta$ and $\Omega$; minimal refinement]
Total belief theorem: generalization of the total probability theorem. [Diagram: a-priori constraint, conditional constraint] Whole graph of candidate solutions; connections with combinatorics and linear systems.