Machine learning and imprecise probabilities for computer vision

Machine learning and imprecise probabilities for computer vision Fabio Cuzzolin IDIAP, Martigny, 19/4/2006

Myself: Master's thesis on gesture recognition at the University of Padova; visiting student at ESSRL, Washington University in St. Louis; Ph.D. thesis on the theory of evidence; young researcher in Milan with the Image and Sound Processing group; post-doc at UCLA in the Vision Lab.

My research. Computer vision: object and body tracking; data association; gesture and action recognition. Discrete mathematics: linear independence on lattices. Belief functions and imprecise probabilities: geometric approach; algebraic analysis; combinatorial analysis.

Outline. Computer vision: HMMs and size functions for gesture recognition; compositional behavior of hidden Markov models; volumetric action recognition; data association with shape information; evidential models for pose estimation; bilinear models for view-invariant gaitID; Riemannian metrics for motion classification. Imprecise probabilities: geometric approach; algebraic analysis.

Approach. Problem: recognizing an instance of a known category of gestures from image sequences. The approach combines HMMs (for the dynamics) with size functions (for the pose representation): continuous hidden Markov models, with the EM algorithm for parameter learning (Moore).

Example: the transition matrix A encodes the gesture dynamics, while the state-output matrix C encodes the collection of hand poses; the gesture is represented as a sequence of transitions between a small set of canonical poses.

Size functions: hand poses are represented through their contours. (Slide figure: real image, measuring function, family of lines, size function table.)

Gesture classification. The EM algorithm is used to learn the HMM parameters of each gesture model HMM 1, HMM 2, …, HMM n from input feature sequences; a new sequence is fed to the learnt gesture models, each produces a likelihood, and the most likely model is chosen (if its likelihood is above a threshold).
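A minimal numpy sketch of this decision rule (a scaled forward algorithm with scalar, equal-variance Gaussian emissions; all function names and the toy models in the usage below are ours, not the slides'):

```python
import numpy as np

def log_likelihood(obs, pi, A, means, var=1.0):
    """Scaled forward algorithm for an HMM with scalar Gaussian emissions.

    obs: 1-D array of scalar features; pi: initial state probabilities;
    A: row-stochastic transition matrix; means: per-state emission means.
    Returns the log-likelihood of the sequence under the model."""
    def emit(x):
        return np.exp(-0.5 * (x - means) ** 2 / var) / np.sqrt(2 * np.pi * var)
    alpha = pi * emit(obs[0])
    c = alpha.sum()
    logp = np.log(c)
    alpha /= c
    for x in obs[1:]:
        alpha = (alpha @ A) * emit(x)      # propagate, then weight by emission
        c = alpha.sum()
        logp += np.log(c)                  # accumulate log of scaling factors
        alpha /= c
    return logp

def classify(obs, models, threshold=-np.inf):
    """Pick the most likely gesture model, or None if below the threshold."""
    scores = {name: log_likelihood(obs, *m) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None
```

For instance, two toy two-state models differing only in their emission means can be compared on a new feature sequence with `classify(obs, models)`.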


Composition of HMMs. Compositional behavior of HMMs: the model of the action of interest is embedded in the overall model. Clustering: states of the original model are grouped into clusters, and the transition matrix is recomputed accordingly.
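One simple convention for that recomputation (ours; the slide does not spell out the weighting) is to average the outgoing probability mass of the states in each cluster:

```python
import numpy as np

def cluster_transition_matrix(A, labels):
    """Aggregate an HMM transition matrix over state clusters.

    A: (n, n) row-stochastic matrix; labels: cluster index for each state.
    Each row of the reduced matrix averages, with uniform weights, the
    outgoing probability mass of the states belonging to that cluster."""
    labels = np.asarray(labels)
    k = labels.max() + 1
    B = np.zeros((k, k))
    for I in range(k):
        rows = A[labels == I]                       # outgoing rows of cluster I
        for J in range(k):
            B[I, J] = rows[:, labels == J].sum(axis=1).mean()
    return B
```

The reduced matrix stays row-stochastic, since each original row sums to one and we only regroup and average its entries.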

State clustering: effect of clustering on the HMM topology; the "cluttered" model covers the two overlapping motions, while the reduced model for the "fly" gesture is extracted through clustering.

Kullback-Leibler comparison. We used the KL distance to measure the similarity between models extracted from clutter and in its absence: KL distances between "fly" (solid) and "fly from clutter" (dashed), and between "fly" and "cycle".
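For reference (an illustration, not the slides' exact computation: comparing HMMs typically requires a Monte Carlo approximation of this quantity), the KL divergence between two discrete distributions is D(p‖q) = Σ p(x) log(p(x)/q(x)):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) between discrete distributions.

    Terms with p(x) = 0 contribute nothing, by the usual 0 log 0 = 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```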


Volumetric action recognition. Problem: recognizing the action performed by a person viewed by a number of cameras. 2D approaches extract features from single views and are therefore viewpoint dependent; in the volumetric approach, features are instead extracted from a volumetric reconstruction of the moving body.

3D feature extraction: k-means clustering to separate body parts; linear discriminant analysis (LDA) to estimate the direction of motion as the direction of maximal separation between the legs; locally linear embedding to find a topological representation of the moving body.
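A minimal numpy sketch of the first two steps (a pure k = 2 Lloyd iteration and a Fisher/LDA direction; the deterministic initialization, function names, and regularization term are our simplifications):

```python
import numpy as np

def two_means(X, iters=20):
    """Lloyd's algorithm with k = 2, e.g. to split 3D leg points in two.

    Deterministic init from the first and last point (a real implementation
    would use k-means++ or several restarts)."""
    centers = X[[0, -1]].astype(float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)   # squared distances
        labels = d.argmin(1)
        centers = np.array([X[labels == k].mean(0) for k in (0, 1)])
    return labels, centers

def fisher_direction(X, labels):
    """LDA direction w ∝ Sw^{-1}(m1 - m0): maximal class separation."""
    X0, X1 = X[labels == 0], X[labels == 1]
    Sw = np.cov(X0.T) + np.cov(X1.T)                         # within-class scatter
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), X1.mean(0) - X0.mean(0))
    return w / np.linalg.norm(w)
```

On two well-separated point clouds the returned direction aligns with the axis joining the cluster means, which is how the slide's "direction of maximal separation between the legs" is obtained.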


Uncertainty descriptions. A number of formalisms have been proposed to extend or replace classical probability: possibilities, fuzzy sets, random sets, monotone capacities, gambles, upper and lower previsions. In the theory of evidence (A. Dempster, G. Shafer), probabilities are replaced by belief functions, Bayes' rule is replaced by Dempster's rule, and families of domains allow multiple representations of the evidence.

Belief functions. A probability on a finite frame Θ is a function p: 2^Θ -> [0,1] with p(A) = Σ_{x ∈ A} m(x), where m: Θ -> [0,1] is a mass function meeting the normalization constraint Σ_{x ∈ Θ} m(x) = 1. Probabilities are additive: if A ∩ B = ∅ then p(A ∪ B) = p(A) + p(B). A belief function is a function s: 2^Θ -> [0,1] with s(A) = Σ_{B ⊆ A} m(B), where m is now a mass function on 2^Θ with Σ_{B ⊆ Θ} m(B) = 1. Belief functions are not additive.
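The definition s(A) = Σ_{B ⊆ A} m(B) is directly computable. A minimal pure-Python sketch (representing focal elements as frozensets; the function name is ours):

```python
from itertools import combinations

def belief(mass):
    """Compute bel(A) = sum of m(B) over focal elements B subset of A.

    mass: dict mapping frozensets (focal elements) to masses summing to 1.
    Returns a dict giving the belief value of every subset of the frame."""
    frame = frozenset().union(*mass)
    subsets = [frozenset(c) for r in range(len(frame) + 1)
               for c in combinations(sorted(frame), r)]
    return {A: sum(m for B, m in mass.items() if B <= A) for A in subsets}
```

Non-additivity is visible immediately: with m({a}) = 0.7 and m({a, b}) = 0.3 we get bel({a}) = 0.7 and bel({b}) = 0, yet bel({a, b}) = 1, so bel({a}) + bel({b}) < bel({a, b}).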

Dempster's rule. In the theory of evidence, new information encoded as a belief function is combined with old beliefs in a revision process: belief functions are combined through Dempster's rule, which intersects their focal elements. The product of the masses of each pair A_i, B_j with A_i ∩ B_j = A contributes to the combined mass of A.

Example of combination, on the frame Θ = {a1, a2, a3, a4}: s1: m({a1}) = 0.7, m({a1, a2}) = 0.3; s2: m(Θ) = 0.1, m({a2, a3, a4}) = 0.9. The only empty intersection is {a1} ∩ {a2, a3, a4}, so the conflicting mass is 0.7 * 0.9 = 0.63 and the normalization factor is 1 - 0.63 = 0.37. Hence s1 ⊕ s2: m({a1}) = 0.7*0.1/0.37 = 0.19, m({a2}) = 0.3*0.9/0.37 = 0.73, m({a1, a2}) = 0.3*0.1/0.37 = 0.08.
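The rule is short enough to implement directly. A minimal pure-Python sketch (focal elements as frozensets; the function name is ours), which reproduces the numbers of this example:

```python
def dempster(m1, m2):
    """Dempster's rule: intersect focal elements, renormalize by 1 - conflict.

    m1, m2: dicts from frozensets (focal elements) to masses summing to 1.
    Mass falling on the empty intersection is the conflict, redistributed
    by the 1/(1 - conflict) normalization."""
    combined, conflict = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            C = A & B
            if C:
                combined[C] = combined.get(C, 0.0) + a * b
            else:
                conflict += a * b
    return {C: v / (1.0 - conflict) for C, v in combined.items()}
```

Running it on s1 and s2 above yields masses 0.07/0.37, 0.27/0.37, and 0.03/0.37 for {a1}, {a2}, and {a1, a2}, matching the slide's 0.19, 0.73, 0.08 after rounding.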

JPDA with shape information. The JPDA model assumes independent targets; the shape model adds rigid links between them. Fusing the two through Dempster's rule yields robustness, since clutter does not meet the shape constraints, and handles occlusions, since occluded targets can still be estimated.

Body tracking Application: tracking of feature points on a moving human body


Pose estimation: estimating the "pose" (internal configuration) of a moving body from the images available from the camera between t = 0 and t = T; the salient image measurements are called features.

Model-based estimation: if you have an a priori model of the object, you can exploit it to help (or drive) the estimation; example: a kinematic model.

Model-free estimation: if you do not have any information about the body, the only way to do inference is to learn a map between features and poses directly from the data; this can be done in a training stage.

Collecting training data: a motion capture system provides the 3D locations of markers, which together define the pose.

Training data: while the object performs some "significant" movements in front of the camera, a finite collection of configuration values q_1, …, q_T is provided by the motion capture system, and a corresponding sequence of features y_1, …, y_T is computed from the image(s).

Learning feature-pose maps. Hidden Markov models provide a way to build feature-pose maps from the training data: a Gaussian density for each state is set up on the feature space, yielding an approximate feature space, and each region is mapped to the set of training poses q_k whose feature value y_k falls inside it.
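A minimal numpy sketch of the final lookup step, under simplifying assumptions of ours (scalar features, equal-variance state Gaussians, so the state assignment reduces to nearest-mean; all names are hypothetical):

```python
import numpy as np

def build_pose_map(feats, poses, means):
    """Partition the feature axis by nearest Gaussian state mean and map each
    region to the mean of the training poses whose features fall inside it.

    feats: (T,) training features; poses: (T, d) training poses;
    means: (n,) per-state emission means."""
    states = np.argmin((feats[:, None] - means[None, :]) ** 2, axis=1)
    return {s: poses[states == s].mean(axis=0) for s in np.unique(states)}

def estimate_pose(y, means, pose_map):
    """Estimate the pose of a new feature value via its nearest state region."""
    s = int(np.argmin((y - means) ** 2))
    return pose_map[s]
```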

Evidential model. The approximate feature spaces and the approximate parameter space together form a family of compatible frames: the evidential model.

Human body tracking: two experiments, two views; four markers on the right arm, six markers on both legs.

Feature extraction, in three steps: original image, color segmentation, bounding box.

Performances. Comparison of three models: left view only, right view only, and both views; the plots show the estimates associated with the "left" and "right" models, the ground-truth pose, and the estimate yielded by the overall model.


GaitID. The problem: recognizing the identity of humans from their gait. Typical approaches: PCA on image features, HMMs, usually applied to silhouette data. The main issue is view-invariance, which can be addressed via 3D representations; 3D tracking, however, is difficult and sensitive.

Bilinear models. From view-invariance to "style" invariance: in a dataset of sequences, each motion possesses several labels: action, identity, viewpoint, emotional state, etc. Bilinear models (Tenenbaum) can be used to separate the influence of two of those factors, called "style" and "content" (the label to classify): y^{SC} = A^S b^C, where y^{SC} is a training set of k-dimensional observations with labels S and C, b^C is a parameter vector representing content, and A^S is a style-specific linear map from the content space onto the observation space.

Content classification with unknown style. Consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint): an asymmetric bilinear model can be learned from it through the SVD of a stacked observation matrix. When new motions are acquired in which a known person is seen walking from a different, unknown viewpoint, an iterative EM procedure can be set up to classify the content (identity): the E step estimates p(c|s), the probability of the content given the current estimate s of the style; the M step re-estimates the linear map for the unknown style s.
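A minimal numpy sketch of the SVD-based fit of the asymmetric model (the matrix layout, with each style contributing a block of k rows, follows Tenenbaum and Freeman's construction, but the function name and shapes are our choices):

```python
import numpy as np

def fit_asymmetric(Y, S, k):
    """Fit an asymmetric bilinear model from a stacked observation matrix.

    Y: (S*k, C) matrix whose s-th block of k rows holds the mean observation
    of each content class c under style s. The SVD Y = U diag(sv) V^T gives
    style maps A = U diag(sv) (reshaped per style) and content vectors B = V^T,
    so that Y is reconstructed as stacked(A) @ B."""
    U, sv, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U * sv).reshape(S, k, -1)   # style-specific linear maps A^s
    B = Vt                           # content parameter vectors b^c (columns)
    return A, B
```

Each observation is then modeled as y^{sc} ≈ A[s] @ B[:, c], which is exactly the bilinear form y^{SC} = A^S b^C of the previous slide.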

Three-layer model. Feature representation: projection of the contour of the silhouette onto a sheaf of lines passing through the center. Each sequence is encoded as a Markov model; its C matrix is stacked into an observation vector, and a bilinear model is trained over those vectors.

MOBO database: 25 people performing 4 different walking actions, seen from 6 cameras; each sequence has three labels: action, identity, view. We set up four experiments in which one label was chosen as content, another as style, and the remaining one was treated as a nuisance factor: content = id, style = view (view-invariant gaitID); content = id, style = action (action-invariant gaitID); content = action, style = view (view-invariant action recognition); content = action, style = id (style-invariant action recognition).

Results: performances compared with a "baseline" algorithm and with straight k-NN on sequence HMMs.


Distances between dynamical models. Problem: motion classification. Approach: represent each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA or AR linear model. Classification then reduces to finding a suitable distance function in the space of dynamical models, which can be plugged into any of the popular classification schemes: k-NN, SVM, etc.
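A minimal numpy sketch of this pipeline for scalar sequences (the function names, and the use of a plain Euclidean distance between AR(2) parameter vectors in place of the Fisher or learned pullback metric discussed next, are our simplifications):

```python
import numpy as np

def fit_ar2(x):
    """Least-squares AR(2) identification: x_t ≈ a1*x_{t-1} + a2*x_{t-2}."""
    x = np.asarray(x, float)
    X = np.column_stack([x[1:-1], x[:-2]])     # regressors x_{t-1}, x_{t-2}
    a, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
    return a

def classify_nn(seq, labelled_seqs):
    """1-NN in AR(2) parameter space, with a Euclidean stand-in distance."""
    p = fit_ar2(seq)
    return min(labelled_seqs,
               key=lambda lbl: np.linalg.norm(fit_ar2(labelled_seqs[lbl]) - p))
```

Any of the model-space distances listed on the next slide could be substituted for the Euclidean norm without changing the surrounding k-NN scheme.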

Riemannian metrics. Several distances have been proposed: Martin's distance, subspace angles, the gap metric, the Fisher metric. However, it makes no sense to choose a single distance for all possible classification problems: when some a priori information (a training set) is available, we can learn in a supervised fashion the "best" metric for the classification problem. A feasible approach: volume minimization of pullback metrics.

Learning pullback metrics. Many unsupervised algorithms take a dataset as input and map it to an embedded space, but they fail to learn a full metric. Consider instead a family of diffeomorphisms F_λ between the original space M and a metric space N: each diffeomorphism F_λ induces on M a pullback metric.
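In standard notation (a reconstruction of the formula the slide alludes to, not copied from it), the pullback along F_λ of the metric g on N is:

```latex
g^{F_\lambda}_p(u, v) \;=\; g_{F_\lambda(p)}\!\left( \mathrm{d}F_\lambda|_p\, u,\; \mathrm{d}F_\lambda|_p\, v \right),
\qquad u, v \in T_p M,
```

where dF_λ|_p is the differential of F_λ at p. Minimizing the volume element of this pullback metric over λ is the supervised selection criterion mentioned on the previous slide.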

Space of AR(2) models. Given an input sequence, we identify the parameters of the linear model which best describes it; we chose the class of autoregressive models of order 2, AR(2), equipped with the Fisher metric, and compute the geodesics of the pullback metric on M.

Results: scalar feature, AR(2) and ARMA models; NN algorithm to classify new sequences. Identity recognition and action recognition.

Results (2): recognition performance of the second-best distance and of the optimal pullback metric. The whole dataset is considered, regardless of the view (results shown for views 1 and 5).


Geometric approach to the ToE. Belief functions can be seen as points of a Euclidean space of dimension 2^n - 2. The belief space, i.e. the space of all the belief functions on a given frame, has the shape of a simplex; each subset A of the frame corresponds to a coordinate, the A-th coordinate being s(A).

Geometry of Dempster's rule. Dempster's rule can also be studied in this geometric setup, as a geometric operator mapping pairs of points of the belief space onto another point; this leads to the notion of conditional subspaces.

Probabilistic approximation. Problem: given a belief function s, find the "best" probabilistic approximation of s. This can be solved in the geometric setup, for instance via a compositional criterion: the approximation should behave like s when combined through Dempster's rule. This leads to a comparative study of all the proposed probabilistic approximations.
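As a concrete illustration (our example, not singled out by the slides, which compare several candidates), one standard probabilistic approximation of a belief function is Smets' pignistic transform, BetP(x) = Σ_{A ∋ x} m(A)/|A|. A minimal pure-Python sketch, with focal elements as frozensets:

```python
def pignistic(mass):
    """Smets' pignistic transform of a belief function.

    Each focal element A spreads its mass m(A) uniformly over its elements,
    yielding a genuine probability distribution on the frame."""
    frame = frozenset().union(*mass)
    return {x: sum(m / len(A) for A, m in mass.items() if x in A) for x in frame}
```

With m({a}) = 0.7 and m({a, b}) = 0.3, for instance, BetP(a) = 0.7 + 0.15 = 0.85 and BetP(b) = 0.15.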


Lattice structure. Families of compatible frames have the algebraic structure of a lattice, the order relation being the existence of a refining between two frames; any pair of frames Θ, Ω admits a maximal coarsening Θ ⊕ Ω and a minimal refinement. The family F of frames is a locally Birkhoff (semimodular with finite length) lattice bounded below.

Total belief theorem: a generalization of the total probability theorem, in which an a priori constraint and a conditional constraint determine a whole graph of candidate solutions, with connections to combinatorics and linear systems.