An Object Tracking Paradigm with Active Appearance Models for Augmented Reality. Presented by Pat Chan Pik Wah, 28/04/2005, Qualifying Examination
Outline Research Objective Introduction Augmented Reality Object Tracking Active Appearance Models (AAMs) Proposed Object Tracking Paradigm Paradigm Architecture Experiments Research Issues Conclusion
Research Objective Object tracking is an essential component of Augmented Reality, yet there is a lack of good object tracking paradigms. Active Appearance Models (AAMs) are a powerful tool for modeling images and a promising basis for tracking. We propose a new object tracking paradigm with AAMs in order to provide real-time and accurate registration for Augmented Reality. Nature of the paradigm: effective, accurate, and robust, performing in real time.
Augmented Reality An Augmented Reality system supplements the real world with virtual objects that appear to coexist in the same space as the real world Properties : Combine real and virtual objects in a real environment Runs interactively, and in real time Registers (aligns) real and virtual objects with each other
Augmented Reality Projects related to AR: Augmented Reality for Visualization – the collaborative design system developed at ECRC (European Computer-Industry Research Center) combines interactive graphics and real-time video for the purpose of interior design; the system combines a heterogeneous database system of graphical models, an augmented reality system, and the distribution of 3D graphics events over a computer network. Augmented Reality for Maintenance/Repair Instructions – overlays a graphical representation of portions of a building's structural systems over the user's view of the room in which they are standing; a see-through head-mounted display provides the user with monocular augmented graphics, and the position and orientation of the user's head are tracked with an ultrasonic tracking system. Augmented Reality for Outdoor Applications – acts as a campus information system, assisting a user in finding places and allowing the user to query information about items of interest, such as buildings and statues; the user carries a backpack computer with a wireless network and wears a head-mounted display, with position tracked by differential GPS and orientation data provided by the head-mounted display itself.
Augmented Reality There are different components of an augmented reality system. Display – presenting virtual objects in the real environment. Tracking – following the user's and the virtual objects' movements by means of special devices or techniques. 3D Modeling – forming the virtual objects. Registration – blending real and virtual objects.
Object Tracking Visual content can be modeled as a hierarchy of abstractions. At the first level are the raw pixels with color or brightness information. Further processing yields features such as edges, corners, lines, curves, and color regions. A higher abstraction layer may combine and interpret these features as objects and their attributes. Object tracking in image processing is usually based on a reference image of the object, or on properties of the object.
Object Tracking Accurately tracking the user's position is crucial for AR registration. The objective is to obtain an accurate estimate of the position (x, y) of the tracked object. Tracking = correspondence + constraints + estimation, based on a reference image of the object or on properties of the object. Two main stages for tracking objects in video: isolation of objects from the background in each frame, and association of objects in successive frames in order to trace them. For prepared indoor environments, systems employ hybrid tracking techniques, such as magnetic plus video sensors, to exploit the strengths and compensate for the weaknesses of the individual tracking technologies. In outdoor and mobile AR applications it generally isn't practical to cover the environment with markers, so network-based tracking methods serve indoor/outdoor use.
Object Tracking Object tracking can be briefly divided into the following stages: input (object and camera), detecting the objects, motion estimation, corrective feedback, and occlusion detection.
Object Tracking In order to achieve more robust tracking, we briefly discuss three probabilistic visual tracking algorithms: Expectation Maximization (EM), the Kalman filter, and Condensation. EM finds a local maximum likelihood solution when some variables are hidden or incomplete; it is used to estimate the probability density of a set of given data. To model the probability density, a Gaussian Mixture Model is used: the density is modeled as the weighted sum of a number of Gaussian distributions. The Kalman filter optimally (linearly) predicts the state of a model. Condensation combines factored sampling with learned dynamical models to propagate an entire probability distribution of object position and shape. Expectation Maximization: 1. Set i to 0 and choose theta_i arbitrarily. 2. Compute Q(theta | theta_i). 3. Choose theta_{i+1} to maximize Q(theta | theta_i). 4. If theta_i != theta_{i+1}, set i to i+1 and return to Step 2.
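The EM recipe above can be sketched for a one-dimensional Gaussian mixture. This is a minimal illustration of the E-step/M-step alternation, not the tracker itself; the initialization from the data range and all numeric values are chosen for the example.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50):
    """EM for a 1-D Gaussian Mixture Model.

    Alternates the E-step (compute responsibilities, i.e. Q(theta | theta_i))
    and the M-step (choose theta_{i+1} maximizing Q) for a fixed number
    of iterations."""
    mu = np.linspace(x.min(), x.max(), k)     # step 1: initial theta
    sigma = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        d = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
            / (sigma * np.sqrt(2 * np.pi))
        r = w * d
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
        w = n / len(x)
    return w, mu, sigma

# Two well-separated Gaussians; EM should recover means near 0 and 5.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
w, mu, sigma = em_gmm_1d(x)
```

In the tracking context, each Gaussian component would model one mode of the object's observed feature distribution.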
Object Tracking Previous work: marker-based tracking; feature-based tracking; template-based object tracking; correlation-based tracking; change-based tracking; 2D layer tracking; tracking of articulated objects.
Previous Work Marker-based tracking vs. marker-less tracking. Feature-based tracking: shape-based approaches and color-based approaches. Marker-based – markers are placed in the real environment, making the object easy to detect. Feature-based tracking – standardization of image features and registration (alignment) of reference points are important; the images may need to be transformed to another space to handle changes in illumination, size, and orientation. Shape-based approaches segment objects of interest in the images; different object types such as persons, flowers, and airplanes may require different algorithms. Color-based approaches are easy to apply; although color is not always appropriate as the sole means of detecting and tracking objects, its low computational cost makes it attractive.
Previous Work Template-based object tracking: fixed template matching (image subtraction, correlation) and deformable template matching. Template-based object tracking is used when a template describing a specific object is available. A temporal template is a vector-valued image where each pixel's components are a function of the motion at that pixel; it is made of two components: Motion Energy Images (MEIs), showing where motion occurs, and Motion History Images (MHIs), encoding the time period of motion – pixel intensity corresponds to the pixel's sequential motion history, so the brighter a pixel, the more recently it moved. Fixed templates are useful when object shapes do not change. In image subtraction, the template position is determined by minimizing the distance function between the template and various positions in the image; it has low computational cost. In correlation, the position of the normalized cross-correlation peak between a template and the image locates the best match; this technique is insensitive to noise and illumination effects in the images, but suffers from high computational complexity caused by summations over the entire template. Deformable template matching is more suitable for cases where objects vary due to rigid and non-rigid deformations; because of the deformable nature of objects in most video, deformable models are more appealing for tracking tasks. A probabilistic transformation on the prototype contour deforms the template to fit salient edges in the input image.
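Correlation-based matching as described above can be sketched directly in NumPy. This brute-force version makes the cost of "summations over the entire template" explicit; OpenCV's `cv2.matchTemplate` implements the same idea far more efficiently. The image and template here are synthetic toy data.

```python
import numpy as np

def ncc_match(image, template):
    """Slide the template over the image and return the (row, col) of the
    normalized cross-correlation peak. Mean-normalization makes the score
    insensitive to uniform illumination changes, but every position costs a
    full summation over the template."""
    th, tw = template.shape
    t = template - template.mean()
    tnorm = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            patch = image[r:r + th, c:c + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * tnorm
            score = (p * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

# Plant the template at (5, 7) in a noisy image and recover its position.
rng = np.random.default_rng(0)
img = rng.normal(0, 0.1, (20, 20))
tmpl = rng.normal(0, 1, (4, 4))
img[5:9, 7:11] += tmpl
pos, score = ncc_match(img, tmpl)
```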
Previous Work Object tracking using motion information: motion-based approaches and model-based approaches (boundary-based approaches – snakes, geodesic active contour models – and region-based approaches). Motion-based approaches rely on robust methods for grouping visual motion consistencies over time; these methods are relatively fast but have considerable difficulty dealing with non-rigid movements and objects. Model-based approaches also exploit high-level semantics and knowledge of the objects. Boundary-based approaches rely on the information provided by the object boundaries; they have been widely adopted in object tracking because boundary-based features (edges) provide reliable information that does not depend on the motion type or object shape, and they usually employ active contour models. Region-based approaches rely on information provided by the entire region, such as texture and motion-based properties, using a motion estimation/segmentation technique; in this case, the estimation of the target's velocity is based on the correspondence between the associated target regions at different time instants.
Active Appearance Models The Active Appearance Model (AAM) algorithm is a powerful tool for modeling images of deformable objects. An AAM combines a subspace-based deformable model of an object's shape and appearance with an algorithm for fitting the model to a previously unseen image.
Timeline for development of AAMs and ASMs
Active Appearance Models (AAMs) The 2D linear shape is defined by a 2D triangulated mesh, in particular the vertex locations of the mesh (68 vertices in our face model). The shape s can be expressed as a base shape s0 plus a linear combination of m shape modes: s = s0 + sum_{i=1..m} p_i s_i, where the p_i are the shape parameters, s0 is the mean shape, and the vectors s_i are the eigenvectors corresponding to the m largest eigenvalues.
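The linear shape model s = s0 + sum p_i s_i is one matrix-vector product. A minimal NumPy sketch, using a hypothetical 3-vertex toy mesh rather than the 68-vertex face model:

```python
import numpy as np

def shape_instance(s0, S, p):
    """Linear shape model: s = s0 + sum_i p_i * s_i.

    s0: (2v,) mean shape, laid out as (x1..xv, y1..yv) for v vertices.
    S:  (m, 2v) matrix whose rows are the eigenvector shape modes s_i.
    p:  (m,) shape parameters."""
    return s0 + p @ S

# Hypothetical toy model: a 3-vertex mesh with one mode that narrows
# the triangle horizontally (values invented for illustration).
s0 = np.array([0.0, 1.0, 0.5, 0.0, 0.0, 1.0])
S = np.array([[1.0, -1.0, 0.0, 0.0, 0.0, 0.0]])
s = shape_instance(s0, S, np.array([0.25]))
```

A real model would obtain s0 and S from PCA on the hand-annotated training shapes.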
Active Appearance Models (AAMs) The appearance of an independent AAM is an image A(u) defined over the pixels u in the base mesh s0. A(u) can be expressed as a base appearance A0(u) plus a linear combination of l appearance images A_i(u): A(u) = A0(u) + sum_{i=1..l} lambda_i A_i(u), where the coefficients lambda_i are the appearance parameters. A0(u) A1(u) A2(u) A3(u)
Active Appearance Models (AAMs) The AAM model instance with shape parameters p and appearance parameters lambda is created by warping the appearance A from the base mesh s0 to the model shape s, giving M(W(u; p)). Piecewise affine warp W(u; p): (1) for any pixel u in s0, find which triangle it lies in; (2) warp u with the affine warp for that triangle.
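Step (2) of the piecewise affine warp can be sketched with barycentric coordinates: express u relative to its base triangle, then apply the same weights to the corresponding model triangle. Step (1), locating the triangle, is omitted here and the triangles are toy values.

```python
import numpy as np

def barycentric(u, tri):
    """Barycentric coordinates of point u with respect to triangle tri (3x2)."""
    a, b, c = tri
    T = np.column_stack([b - a, c - a])
    l1, l2 = np.linalg.solve(T, u - a)
    return np.array([1.0 - l1 - l2, l1, l2])

def piecewise_affine_warp(u, tri_base, tri_model):
    """Warp a base-mesh pixel u into the model shape. Because barycentric
    coordinates are preserved by affine maps, re-weighting the model
    triangle's vertices applies exactly the per-triangle affine warp."""
    lam = barycentric(u, tri_base)
    return lam @ tri_model

# Toy triangles: the model triangle is the base triangle scaled by two,
# so the warped point should also scale by two.
tri_base = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
tri_model = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
warped = piecewise_affine_warp(np.array([0.25, 0.25]), tri_base, tri_model)
```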
Fitting AAMs Minimize the error between the input image I and the model appearance A(u) = M(W(u; p)). If u is a pixel in s0, then the corresponding pixel in the input image I is W(u; p). At pixel u the AAM has the appearance A(u); at pixel W(u; p), the input image has the intensity I(W(u; p)). Minimize the sum of squares of the difference between these two quantities: 1. For each pixel u in the base mesh s0, compute the corresponding pixel W(u; p) in the input image by warping u with the piecewise affine warp W. 2. Sample the input image I at the pixel W(u; p); typically it is bilinearly interpolated there. 3. Subtract the resulting value from the appearance at that pixel and store the result in the error image E(u).
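The objective in steps 1-3 can be sketched as follows, assuming the warp and bilinear sampling have already produced the image values at each base-mesh pixel. The four-pixel mesh and all values are hypothetical.

```python
import numpy as np

def fitting_error(I_sampled, A0, A, lam):
    """Sum-of-squares AAM fitting error over the base-mesh pixels:

        E(u) = A0(u) + sum_i lam_i * A_i(u) - I(W(u; p))

    I_sampled holds the input image already sampled (bilinearly
    interpolated) at the warped pixels W(u; p), so only the appearance
    reconstruction and the subtraction remain."""
    model = A0 + lam @ A           # model appearance at each base-mesh pixel
    residual = model - I_sampled   # the error image E(u)
    return (residual ** 2).sum(), residual

# Toy 4-pixel base mesh with one appearance mode; the sampled image is
# chosen to match the model exactly, so the error should be zero.
A0 = np.array([1.0, 2.0, 3.0, 4.0])
A = np.array([[0.5, 0.0, -0.5, 0.0]])
I_sampled = np.array([1.5, 2.0, 2.5, 4.0])
err, E = fitting_error(I_sampled, A0, A, np.array([1.0]))
```

The fitting algorithm then updates p and lambda to reduce this error and repeats until convergence.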
DEMO Video – 2D AAMs
DEMO Video – 2D AAMs
Recent Work for Improving AAMs Combine 2D+3D AAMs
Combined 2D + 3D AAMs At time t, stack the 2D AAM shape vectors from all N images into a measurement matrix W. Represent W with a 3D linear shape model: W = MB, where each projected shape is P[s0 + sum_i p_i s_i] = P s0 + p_1 P s_1 + p_2 P s_2 + … + p_m P s_m. M (2N x 3(m'+1)) is the scaled projection matrix and B (3(m'+1) x n) is the 3D shape matrix, setting the number of 3D vertices n' equal to the number of AAM vertices n. The factorization W = MB is only determined up to a corrective matrix G (3(m'+1) x 3(m'+1)): M = M'' . G and B = G^-1 . B''.
Compute the 3D Model AAM shapes AAM appearance First three 3D shapes modes
Constraining an AAM with 3D Shape Constraints on the 2D AAM shape parameters p = (p1, …, pm) force the AAM to move only in ways consistent with the 3D shape modes, given the 2D shape variation of the 3D shape modes over all imaging conditions. Legitimate values of P and p are those for which the 2D projection of the 3D shape equals the 2D shape of the AAM; the constraint is written as the sum of the squares of the elements of the difference matrix. Constraints on p, q.
An Object Tracking Paradigm with Active Appearance Models
Proposed Object Tracking Paradigm Training Images Paradigm Architecture Occlusion Detection Training Active Appearance Model Video Shape Model Appearance Model Motion Modeling Initialization Kalman Filter
Steps in Object Tracking Paradigm Preprocessing – training the Active Appearance Model: get the shape model and the appearance model for the object to be tracked. Initialization – locating the object position in the video; in our scheme, we make use of AAMs. Motion Modeling – estimating the motion of the object by casting the AAM state as a Kalman filter problem to perform the prediction. Occlusion Detection – preventing loss of the object's position when it is occluded by other objects. What can be done, and what can be improved…
Enhancing Active Appearance Models Combine the shape and the appearance parameters for optimization. In video, shape and appearance may not be enough; there are many other characteristics and features, such as lighting, brightness, etc. L = [L1, L2, ……, Lm]T
Iterative Search for Fitting Active Appearance Model
Iterative Search for Fitting Active Appearance Model Can be improved by: Prediction matrix Searching space
Initialization for AAMs
Motion Modeling The initial estimate in a frame should be predicted, not merely adapted from the previous frame; this can be achieved by motion estimation. AAMs do the modeling part; the Kalman filter does the prediction part.
Kalman Filter An adaptive filter that models the state of a discrete dynamic system; originally developed in 1960 to filter out noise in electronic signals.
Kalman Filter Formally, we have the model x_k = A x_{k-1} + w_{k-1} (process) and z_k = H x_k + v_k (measurement), where w and v are the process and measurement noise. For our tracking system, the state describes the tracked object's position and motion.
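The predict/correct cycle of the model above can be sketched in NumPy with a constant-velocity state for a single coordinate. This is a minimal illustration, not the AAM-parameter state used in the paradigm; all noise covariances are invented toy values. (OpenCV's `cv2.KalmanFilter` provides an equivalent implementation.)

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict + correct cycle of the discrete Kalman filter."""
    # Predict: x_k|k-1 = A x_{k-1},  P_k|k-1 = A P A^T + Q
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Correct with measurement z_k: gain K = P H^T (H P H^T + R)^-1
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Constant-velocity model for one coordinate: state = [position, velocity].
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])          # only the position is measured
Q = 0.001 * np.eye(2)               # toy process-noise covariance
R = np.array([[0.1]])               # toy measurement-noise covariance
x, P = np.zeros(2), np.eye(2)
for t in range(1, 21):              # object moving with velocity 2 units/frame
    x, P = kalman_step(x, P, np.array([2.0 * t]), A, H, Q, R)
```

After a short transient the filter's velocity estimate converges toward the true value, which is what supplies the predicted initial estimate for the AAM search in the next frame.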
Kalman Filter
Occlusion Detection WHY? Positioning of objects, and to perform cropping: when a real object overlays a virtual one, the virtual object should be cropped before the overlay. HOW? High resolution and sharp object boundaries; correct occluding boundaries of objects; the camera matrix used for video capturing.
Proposed Object Tracking Paradigm Training Images Paradigm Architecture Occlusion Detection Training Active Appearance Model Video Shape Model Appearance Model Active Appearance Model Fitting Initialization Kalman Filter
Experimental Setup AAM-api from DTU OpenCV Pentium 4 CPU 2.00GHz and 512MB RAM
Experiment on AAMs (1) Training Image
Experiment on AAMs (1) Shape Texture
Experiment on AAMs (1) Initialization After optimized
Demo Video
Demo Video
Demo Video
Demo Video
Experiment on AAMs (2) Training Images
Experiment on AAMs Shape Texture
Experiment on AAMs Trapped in a local minimum (initialization / after optimization).
Experiment on AAMs
Experiment on AAMs Fit to the face Initialization After optimized
Experiment on AAMs
Object Tracking with AAMs
Experiment on Kalman Filter
Demo Video
Experiment on Kalman Filter
Demo Video
Research Issues AAM tracking is accurate but very slow, and cannot perform real-time tracking on its own; the Kalman filter helps increase prediction speed. Remaining issues: modeling the problem so the AAM state feeds the Kalman filter, and improving the fitting algorithm in the AAMs. Occlusion detection is important to object tracking, preventing loss of the object's position.
Conclusion We have completed a survey of object tracking and Active Appearance Models. We proposed a paradigm for video object tracking with Active Appearance Models; goals: robust, real-time, good performance. We have done some initial experiments: experiments on AAMs, and experiments on the Kalman filter for object tracking.
Q & A