1 Motion Analysis Mike Knowles January 2006
2 Introduction So far you have seen techniques for analysing static images – 2 Dimensional information Now we shall consider time-varying images – video. Variations in a scene through time are caused by motion
3 Motion Motion analysis allows us to extract much useful information from a scene: –Object locations and tracks –Camera Motion –3D Geometry of the scene
4 Contents Perspective and motion geometry Optical flow Estimation of flow Feature point detection, matching and tracking Techniques for tracking moving objects Structure from motion
5 Perspective Geometry An image can be modelled as a projection of the scene at a distance of f from the optical centre O Note convention: capital letters denote scene properties, lowercase for image properties
6 Perspective Geometry The position of our point in the image is related to the position in 3D space by the perspective projection:
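As a minimal sketch of the projection, assuming the usual pinhole form x = fX/Z, y = fY/Z (the slide's own equation is not reproduced in this text, so this form and the function name `project` are assumptions):

```python
def project(X, Y, Z, f):
    """Project scene point (X, Y, Z) onto the image plane at distance f from O."""
    if Z <= 0:
        raise ValueError("point must lie in front of the optical centre (Z > 0)")
    return f * X / Z, f * Y / Z


# A point twice as far away lands half as far from the image centre.
print(project(2.0, 1.0, 4.0, f=1.0))  # (0.5, 0.25)
print(project(2.0, 1.0, 8.0, f=1.0))  # (0.25, 0.125)
```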
7 Motion Geometry If the point is moving in space then it will also move in the image Thus we have a set of vectors v(x,y) describing the motion present in the image at a given position – This is the Optical flow
8 Optical Flow An optical flow field is simply a set of vectors describing the image motion at any point in the image.
9 Estimating Optical Flow In order to estimate optical flow we need to study adjacent frame pairs There are 2 approaches we can take to this: –Greylevel gradient based methods –‘Interesting’ feature matching
10 Greylevel conservation If we have a perfect optical flow field:
11 Greylevel conservation Generally we measure time in frames so dt = 1 This leaves us with
12 Greylevel Conservation Taking a Taylor expansion and eliminating the higher order terms:
13 Greylevel Conservation Tidying up we are left with: This is the standard form of the greylevel constraint equation But…..
14 Limitations of the greylevel constraint equation The greylevel constraint equation only allows us to measure the flow in the direction of the greylevel image gradient
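Written out, the steps on slides 10 to 13 follow the standard derivation. This is a sketch under assumed notation, since the slide equations are not reproduced in this text: $I(x,y,t)$ is the greylevel, $(v_x, v_y)$ the flow, and $I_x$, $I_y$, $I_t$ the partial derivatives of $I$.

```latex
\begin{align}
I(x + v_x\,dt,\ y + v_y\,dt,\ t + dt) &= I(x, y, t)
  && \text{greylevel conservation} \\
I(x + v_x,\ y + v_y,\ t + 1) &= I(x, y, t)
  && dt = 1 \\
I + v_x I_x + v_y I_y + I_t + \dots &= I
  && \text{Taylor expansion} \\
v_x I_x + v_y I_y + I_t &= 0
  && \text{greylevel constraint equation} \\
v_\perp &= -\frac{I_t}{\sqrt{I_x^2 + I_y^2}}
  && \text{flow component along the gradient}
\end{align}
```

Only $v_\perp$ is fixed by the constraint. The component at right angles to the gradient (along the edge) is left free, which is the limitation stated on this slide.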
15 The aperture problem Consider a plot in (v_x, v_y) space, at a single point in space - the greylevel conservation equation gives a line on which the true flow lies
16 The aperture problem
17 The aperture problem Thus we cannot generate a flow vector for a single point – we have to use a window The larger the window is the better the chance of overcoming this problem But the larger the window the greater the chance of the motion being different This is called the aperture problem
18 Overcoming the aperture problem Several solutions have been proposed: –Assume v(x,y) is smooth (Horn and Schunck’s algorithm) –Assume v(x,y) is locally piecewise linear or constant (Lucas and Kanade’s algorithm) –Assume v(x,y) obeys some simple model (Black and Anandan’s algorithm) We shall consider the latter two solutions
19 Assuming a locally constant field This algorithm assumes that the flow field around some point is constant
20 The model Model: This model is valid for points in some N-point neighbourhood where the optical flow is assumed constant.
21 Noise n(x,y,t) is noise corrupting the true greylevel values and is assumed zero-mean and uncorrelated with variance:
22 We can linearise our model: Where:
23 For each point we have an equation: We can write this in matrix form:
24 Matrix A and vector v are:
25 We can solve for v using a least squares technique: The result is:
26 We are also interested in the quality of the estimate as measured by the covariance matrix of the estimated flow: It can be shown that: Thus we can determine the variances of the estimates of the components v_x and v_y
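The least squares solution and its covariance can be sketched as follows. This is an illustration under stated assumptions, not the slides' exact formulation: each pixel contributes one constraint Ix*vx + Iy*vy + It = 0, the noise is taken to act on It with variance `noise_var`, and the covariance is then `noise_var` times the inverse of AᵀA. The function name `constant_flow` is hypothetical.

```python
def constant_flow(gradients, noise_var=1.0):
    """Least-squares flow for a neighbourhood assumed to share one flow vector.

    gradients: list of (Ix, Iy, It) triples, one per pixel in the window.
    Returns ((vx, vy), covariance), with the covariance as a 2x2 nested tuple.
    """
    # Accumulate the normal equations (A^T A) v = A^T b, rows [Ix, Iy], b = -It.
    sxx = sum(ix * ix for ix, _, _ in gradients)
    sxy = sum(ix * iy for ix, iy, _ in gradients)
    syy = sum(iy * iy for _, iy, _ in gradients)
    bx = -sum(ix * it for ix, _, it in gradients)
    by = -sum(iy * it for _, iy, it in gradients)

    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients all point the same way: aperture problem")

    # Closed-form inverse of the 2x2 matrix A^T A.
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    vx = inv[0][0] * bx + inv[0][1] * by
    vy = inv[1][0] * bx + inv[1][1] * by
    cov = tuple(tuple(noise_var * e for e in row) for row in inv)
    return (vx, vy), cov
```

The diagonal of the returned covariance gives the variances of v_x and v_y. A window whose gradients all share one direction makes AᵀA singular, which is the aperture problem showing up numerically.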
27 We can use the covariance matrix to determine a confidence ellipse at a certain probability (e.g. 99%) that the flow lies in that ellipse
28 It can be seen from the expression for the variance estimates that the accuracy of the algorithm depends on: –Noise variance –Size of the neighbourhood –Edge busyness
29 Modelling Flow An alternative to assuming constant flow is to use a model of the flow field One such model is the Affine model:
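A minimal sketch of an affine flow model, assuming the common six-parameter form v_x = a0 + a1*x + a2*y, v_y = a3 + a4*x + a5*y. The parameter ordering and the name `affine_flow` are assumptions, as the slide's own equation is not reproduced in this text.

```python
def affine_flow(x, y, a):
    """Flow vector at image point (x, y) under a six-parameter affine model."""
    a0, a1, a2, a3, a4, a5 = a
    vx = a0 + a1 * x + a2 * y
    vy = a3 + a4 * x + a5 * y
    return vx, vy


# Pure translation: the same vector everywhere in the image.
print(affine_flow(10.0, -5.0, (2.0, 0.0, 0.0, -1.0, 0.0, 0.0)))
# Uniform zoom about the origin: flow grows with distance from the centre.
print(affine_flow(10.0, -5.0, (0.0, 0.1, 0.0, 0.0, 0.0, 0.1)))
```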
30 Estimating motion models Black and Anandan propose an algorithm for estimating the model parameters This uses robust estimation to separate different classes of motion
31 Minimisation of Error Function Once again, if we are to find the optimum parameters we need an error function to minimise: But this is not in a form that is easy to minimise…
32 Gradient-based Formulation Applying Taylor expansion to the error function: This is the greylevel constraint equation again
33 Gradient-descent Minimisation If we know how the error changes with respect to the parameters, we can home in on the minimum error
34 Applying Gradient Descent We need: Using the chain rule:
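To illustrate the chain-rule step, here is a sketch of plain gradient descent on a plain squared error. For brevity it assumes a translation-only model (two parameters rather than the affine six) and a fixed step size. The name `gradient_descent_flow` and all defaults are hypothetical.

```python
def gradient_descent_flow(gradients, step=0.1, iterations=100):
    """Minimise E(v) = sum (Ix*vx + Iy*vy + It)^2 by plain gradient descent."""
    vx, vy = 0.0, 0.0
    for _ in range(iterations):
        gx = gy = 0.0
        for ix, iy, it in gradients:
            r = ix * vx + iy * vy + it   # residual of the greylevel constraint
            gx += 2.0 * r * ix           # chain rule: dE/dvx = 2 r * dr/dvx
            gy += 2.0 * r * iy           # chain rule: dE/dvy = 2 r * dr/dvy
        vx -= step * gx                  # move against the gradient
        vy -= step * gy
    return vx, vy
```

The step size has to be small relative to the gradient magnitudes or the iteration diverges. With the affine model the same pattern applies, with one derivative per parameter.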
35 Robust Estimation What about points that do not belong to the motion we are estimating? These will pull the solution away from the true one
36 Robust Estimators Robust estimators decrease the effect of outliers on estimation
37 Error w.r.t. parameters The complete function is:
38 Aside – Influence Function It can be seen that the first derivative of the robust estimator is used in the minimisation:
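As an illustration of a robust estimator and its influence function, the sketch below uses the Geman-McClure function in one common parameterisation. The choice of this particular function and the scale parameter `sigma` are assumptions, as the slides' own expression is not reproduced in this text.

```python
def rho(r, sigma):
    """Geman-McClure robust error: quadratic near zero, saturating towards 1."""
    return r * r / (sigma * sigma + r * r)


def psi(r, sigma):
    """Influence function: the first derivative of rho with respect to r."""
    s2 = sigma * sigma
    return 2.0 * r * s2 / (s2 + r * r) ** 2


# A large residual (an outlier) has far less influence than a moderate one.
print(psi(1.0, 1.0), psi(100.0, 1.0))
```

Because the influence falls away for large residuals, points that belong to a different motion contribute little to the gradient, unlike the squared error, whose influence grows without bound.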
39 Pyramid Approach Trying to estimate the parameters from scratch at full scale can be wasteful Therefore a ‘pyramid of resolutions’ or ‘Gaussian pyramid’ is used The principle is to estimate the parameters on a smaller scale and refine until full scale is reached
40 Pyramid of Resolutions Each level in the pyramid is half the scale of the one below – i.e. a quarter of the area
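A minimal sketch of building such a pyramid. As a simplification it averages 2x2 blocks instead of applying a Gaussian filter before subsampling, so it is a box-filter pyramid, not a true Gaussian one. The function names are hypothetical and the image is assumed to have even dimensions at every level used.

```python
def downsample(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1] +
              image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def build_pyramid(image, levels):
    """Return [full resolution, half, quarter, ...] with `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid
```

Estimation starts at the last (coarsest) entry. When an estimate is passed up to the next finer level, displacements measured in pixels double, because each coarse pixel spans two fine pixels in each direction.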
41 Out pops the solution…. –When combined with a suitable gradient based minimisation scheme… Black and Anandan suggest the use of Graduated Non-convexity
42 Feature Matching Feature point matching offers an alternative to gradient based techniques for finding optical flow The principle is to extract the locations of particular features from the frame and track their position in subsequent frames
43 Feature point selection Feature points must be: Local (extended line segments are no good, we require local disparity) Distinct (a lot ‘different’ from neighbouring points) Invariant (to rotation, scale, illumination) The matching process must be: Local (thus limiting the search area) Consistent (leading to ‘smooth’ disparity estimates)
44 Approaches to Feature point selection Previous approaches to feature point selection have been –Moravec interest operator, this is based on thresholding local greylevel squared differences –Symmetric features e.g. circular features, spirals –Line segment endpoints –Corner points
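A sketch of a Moravec-style interest measure, to show what "local greylevel squared differences" means in practice. The 3x3 window and the eight unit shifts are assumptions (some descriptions use four shifts), and the name `moravec` is hypothetical. The caller must keep (r, c) at least `half + 1` pixels from the image border.

```python
def moravec(image, r, c, half=1):
    """Interest value at (r, c): the minimum, over eight unit shifts, of the sum
    of squared greylevel differences between a window and its shifted copy."""
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    best = None
    for dr, dc in shifts:
        ssd = 0.0
        for i in range(r - half, r + half + 1):
            for j in range(c - half, c + half + 1):
                d = image[i + dr][j + dc] - image[i][j]
                ssd += d * d
        best = ssd if best is None else min(best, ssd)
    return best
```

Taking the minimum over shifts is what rejects straight edges: sliding the window along an edge changes nothing, so the minimum is zero there, whereas at a corner every shift changes the window. Feature points are then the locations where this value exceeds a threshold.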
45 Example of feature point detection
46 Motion and 3D Structure From Optical Flow This area of computer vision attempts to reconstruct the structure of the 3D environment and the motion of objects within it using optical flow There are many applications; the dominant one is autonomous navigation
47 As we saw previously, the relationship between image plane motion and the 3D motion that it describes is summed up by the perspective projection
48 The perspective projection is described as: We can differentiate this w.r.t. time:
49 Substituting in the original perspective projection equation: We can invert this by solving for the 3D velocity
50 This gives us two components – one parallel to the image plane and one along our line of sight
51 Focus of Expansion From the expression for optical flow we can determine a simple structure for the flow vectors in an image corresponding to a rigid body translation:
52 The point from which the flow vectors radiate is called the Focus of Expansion (FOE) For motion towards the camera (negative velocity along the line of sight) the flow vectors point away from the FOE (expansion) and for motion away from the camera (positive velocity) the flow vectors point towards the FOE (contraction).
53 The FOE provides important 3D information: Thus the direction of translational motion can be determined:
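A sketch of the equations behind slides 48 to 53, under assumed notation since the originals are not reproduced in this text: $(X, Y, Z)$ is the scene point, $(\dot X, \dot Y, \dot Z)$ its velocity relative to the camera, $(x, y)$ its image position and $(x_0, y_0)$ the FOE.

```latex
\begin{align}
x &= \frac{fX}{Z}, \qquad y = \frac{fY}{Z} \\
\dot{x} &= \frac{f\dot{X} - x\dot{Z}}{Z}, \qquad
\dot{y} = \frac{f\dot{Y} - y\dot{Z}}{Z} \\
\dot{x} &= -\frac{\dot{Z}}{Z}\,(x - x_0), \qquad
\dot{y} = -\frac{\dot{Z}}{Z}\,(y - y_0), \qquad
(x_0, y_0) = \left(\frac{f\dot{X}}{\dot{Z}},\ \frac{f\dot{Y}}{\dot{Z}}\right) \\
(\dot{X}, \dot{Y}, \dot{Z}) &\propto (x_0, y_0, f)
\end{align}
```

With $\dot Z < 0$ the factor $-\dot Z / Z$ is positive, so every flow vector points away from $(x_0, y_0)$, which is the expansion case.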
54 We can also estimate the time to impact from flow measurements close to the FOE Both the FOE and time to impact can be estimated using least squares on the optical flow field at a number of image points
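One way to set up these least squares estimates is sketched below. It assumes pure translation, so that every flow vector is radial about the FOE. Each vector (u, v) at (x, y) then gives one linear equation v*x0 - u*y0 = v*x - u*y in the unknown FOE (x0, y0). The function names are hypothetical, and time is measured in frames.

```python
def estimate_foe(points, flows):
    """Least-squares FOE from flow vectors that should all be radial about it."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (x, y), (u, v) in zip(points, flows):
        # Flow parallel to (x - x0, y - y0):  v*x0 - u*y0 = v*x - u*y
        a1, a2, b = v, -u, v * x - u * y
        s11 += a1 * a1
        s12 += a1 * a2
        s22 += a2 * a2
        t1 += a1 * b
        t2 += a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("flow vectors are parallel or zero: FOE not determined")
    x0 = (s22 * t1 - s12 * t2) / det
    y0 = (s11 * t2 - s12 * t1) / det
    return x0, y0


def time_to_impact(point, flow, foe):
    """Distance from the FOE divided by the flow speed at that point."""
    dx, dy = point[0] - foe[0], point[1] - foe[1]
    speed = (flow[0] ** 2 + flow[1] ** 2) ** 0.5
    return (dx * dx + dy * dy) ** 0.5 / speed
```

In practice the per-point times to impact would themselves be averaged or fitted by least squares, since individual flow vectors are noisy.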
55 Structure from Motion Here we shall discuss a simple structure from motion algorithm, which uses optical flow to estimate the 3D structure of the scene. We shall be looking at a simplified situation where the camera is assumed to be fixed – i.e. no pan or tilt
56 The starting point is the optical flow equation: Thus, since the image velocity is a weighted vector sum of the scene velocity and the image position vector, the vector product of these 2 vectors is orthogonal to the image velocity
57
58 But: So:
59 This equation applies to all points. Obviously a trivial solution to this equation would be a zero velocity vector. Also, if some non-zero vector is a solution then so is any scalar multiple of that vector. This confirms that we cannot determine the absolute magnitude of the velocity vector; we can only determine it up to a multiplicative scale constant.
60 The solution to this is to solve the equation using a least squares formulation subject to the condition that:
61 We can re-write the orthogonality constraint for all points in matrix form: Where:
62 Thus the problem is a least squares minimisation subject to the normalisation condition. This is a classic optimisation problem: the optimal value is the eigenvector corresponding to the minimum eigenvalue of the symmetric matrix formed from the constraint matrix (its transpose multiplied by itself).
63 Once we have our estimate we can compute our scene depths using the original optical flow equation:
64 We can estimate each depth using a least squares formulation: The solution of which is: The scene co-ordinates can be found using perspective projection
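The algorithm of slides 56 to 64 can be sketched in equations as follows. The notation is assumed, since the slide equations are not reproduced in this text: $\mathbf{x}_i = (x_i, y_i, f)^\top$ is the image position of point $i$, $\dot{\mathbf{x}}_i = (\dot x_i, \dot y_i, 0)^\top$ its optical flow, $\mathbf{V} = (V_1, V_2, V_3)^\top$ the common 3D velocity, and $Z_i$ the depth. The unit-norm constraint is the usual choice and is also an assumption here.

```latex
\begin{align}
Z_i\,\dot{\mathbf{x}}_i &= f\,\mathbf{V} - V_3\,\mathbf{x}_i \\
\mathbf{V} \cdot (\mathbf{x}_i \times \dot{\mathbf{x}}_i) &= 0 \\
\min_{\mathbf{V}}\ \lVert A\mathbf{V} \rVert^2
  \quad &\text{subject to} \quad \lVert \mathbf{V} \rVert = 1,
  \qquad
  A = \begin{bmatrix}
        (\mathbf{x}_1 \times \dot{\mathbf{x}}_1)^\top \\
        \vdots \\
        (\mathbf{x}_N \times \dot{\mathbf{x}}_N)^\top
      \end{bmatrix} \\
\hat{Z}_i &= \frac{\dot{\mathbf{x}}_i^\top \left( f\,\hat{\mathbf{V}} - \hat{V}_3\,\mathbf{x}_i \right)}
                  {\dot{\mathbf{x}}_i^\top \dot{\mathbf{x}}_i} \\
(X_i, Y_i) &= \frac{\hat{Z}_i}{f}\,(x_i, y_i)
\end{align}
```

The minimiser $\hat{\mathbf{V}}$ is the eigenvector of $A^\top A$ with the smallest eigenvalue. An eigenvector is only defined up to sign, so the sign is chosen to make the recovered depths positive, and all depths remain relative to the unknown speed.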
65 Summary 3D Geometry Optical Flow Flow estimation and the aperture problem Focus of Expansion Structure from Motion
66 Recap Geometry Flow Flow estimation Feature trackers FOE and structure from motion
67 Tracking Goal – to detect and track objects moving independently of the background Two situations to be considered: –Static Background –Moving Background
68 Applications of Motion Tracking Control Applications –Object Avoidance –Automatic Guidance –Head Tracking for Video Conferencing Surveillance/Monitoring Applications –Security Cameras –Traffic Monitoring –People Counting
69 My Work Started by tracking moving objects in a static scene Develop a statistical model of the background Mark all regions that do not conform to the model as moving object
70 My Work Now working on object detection and classification from a moving camera Current focus is motion compensated background filtering Determine motion of background and apply to the model.
71 Detecting moving objects in a static scene Simplest method: –Subtract consecutive frames. –Ideally this will leave only moving objects. –This is not an ideal world….
72 Using a background model Lack of texture in objects means incomplete object masks are produced. In order to obtain complete object masks we must have a model of the background as a whole.
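A small sketch of consecutive-frame subtraction shows why the masks come out incomplete. The function name `difference_mask` and the threshold are hypothetical.

```python
def difference_mask(previous, current, threshold):
    """Mark pixels whose greylevel changed by more than `threshold` between frames."""
    return [[abs(c - p) > threshold for p, c in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(previous, current)]


# A uniform bright object three pixels wide moves one pixel to the right.
previous = [[0, 9, 9, 9, 0, 0]]
current = [[0, 0, 9, 9, 9, 0]]
print(difference_mask(previous, current, threshold=5))
# [[False, True, False, False, True, False]]
```

Only the trailing and leading edges are flagged. The untextured interior of the object looks the same in both frames, so it is missing from the mask.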
73 Adapting to variable backgrounds In order to cope with varying backgrounds it is necessary to make the model dynamic A statistical system is used to update the model over time
74 Background Filtering My algorithm is based on: “Learning Patterns of Activity using Real-Time Tracking”, C. Stauffer and W.E.L. Grimson, IEEE Trans. on Pattern Analysis and Machine Intelligence, August 2000. The history of each pixel is modelled by a sequence of Gaussian distributions
75 Multi-dimensional Gaussian Distributions Described mathematically as: More easily visualised as: (2-Dimensional)
76 Simplifying…. Calculating the full Gaussian for every pixel in frame is very, very slow Therefore I use a linear approximation
77 How do we use this to represent a pixel? Stauffer and Grimson suggest using a static number of Gaussians for each pixel This was found to be inefficient – so the number of Gaussians used to represent each pixel is variable
78 Weights Each Gaussian carries a weight value This weight is a measure of how well the Gaussian represents the history of the pixel If a pixel is found to match a Gaussian then the weight is increased and vice-versa If the weight drops below a threshold then that Gaussian is eliminated
79 Matching Each incoming pixel value must be checked against all the Gaussians at that location If a match is found then the value of that Gaussian is updated If there is no match then a new Gaussian is created with a low weight
80 Updating If a Gaussian matches a pixel, then the value of that Gaussian is updated using the current value The rate of learning is greater in the early stages when the model is being formed
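The weight, matching and updating rules above can be sketched for a single greyscale pixel as follows. This is an illustration under stated assumptions, not the author's implementation: every numeric default is hypothetical, a match is taken to mean "within 2.5 standard deviations of the mean", and the learning rate is held constant, not made larger in the early stages.

```python
class PixelModel:
    """Variable-length list of weighted 1-D Gaussians describing one pixel's history."""

    def __init__(self, match_sigmas=2.5, initial_var=100.0, min_var=4.0,
                 initial_weight=0.05, rate=0.05, prune_below=0.01):
        self.gaussians = []  # each entry: [weight, mean, variance]
        self.match_sigmas = match_sigmas
        self.initial_var = initial_var
        self.min_var = min_var
        self.initial_weight = initial_weight
        self.rate = rate
        self.prune_below = prune_below

    def _matches(self, value, mean, var):
        return (value - mean) ** 2 <= self.match_sigmas ** 2 * var

    def update(self, value):
        """Fold one observation into the model; return True if it matched a Gaussian."""
        matched = False
        for g in self.gaussians:
            weight, mean, var = g
            if not matched and self._matches(value, mean, var):
                matched = True
                g[0] = weight + self.rate * (1.0 - weight)   # reward the match
                g[1] = mean + self.rate * (value - mean)     # pull mean towards value
                g[2] = max(self.min_var,
                           var + self.rate * ((value - mean) ** 2 - var))
            else:
                g[0] = weight * (1.0 - self.rate)            # decay the others
        if not matched:
            # No match: create a new Gaussian with a low weight.
            self.gaussians.append([self.initial_weight, float(value), self.initial_var])
        # Eliminate Gaussians whose weight fell below the threshold, then renormalise.
        self.gaussians = [g for g in self.gaussians if g[0] >= self.prune_below]
        total = sum(g[0] for g in self.gaussians)
        for g in self.gaussians:
            g[0] /= total
        return matched

    def is_background(self, value, min_weight=0.2):
        """True if the value matches a Gaussian with enough accumulated weight."""
        return any(w >= min_weight and self._matches(value, m, v)
                   for w, m, v in self.gaussians)
```

One such model would be kept per pixel. A pixel whose current value is not background by this test is marked as part of a moving object.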
81 Static Scene Object Detection and Tracking Model the background and subtract to obtain object mask Filter to remove noise Group adjacent pixels to obtain objects Track objects between frames to develop trajectories
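The grouping step can be sketched as a connected-component search over the object mask. The choice of 4-connectivity and the name `label_objects` are assumptions.

```python
from collections import deque


def label_objects(mask):
    """Group 4-connected foreground pixels; return one coordinate list per object."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                blob.append((i, j))
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and mask[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            objects.append(blob)
    return objects
```

Each returned blob can be summarised by its centroid or bounding box, and those summaries are what get matched between frames to build trajectories.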
82 Moving Camera Sequences Basic Idea is the same as before –Detect and track objects moving within a scene BUT – this time the camera is not stationary, so everything is moving
83 Motion Segmentation Use a motion estimation algorithm on the whole frame Iteratively apply the same algorithm to areas that do not conform to this motion to find all motions present Problem – this is very, very slow
84 Motion Compensated Background Filtering Basic Principle –Develop and maintain background model as previously –Determine global motion and use this to update the model between frames
85 Advantages Only one motion model has to be found –This is therefore much faster Estimating motion for small regions can be unreliable Not as easy as it sounds though…..
86 Motion Models Trying to determine the exact optical flow at every point in the frame would be ridiculously slow Therefore we try to fit a parametric model to the motion
87 Affine Motion Model The affine model describes the vector at each point in the image Need to find values for the parameters that best fit the motion present
88 Background Motion Estimation Uses the framework described earlier by Black and Anandan
89 Examples
90 Other approaches to Tracking Many approaches using active contours – a.k.a. snakes –Parameterised curves –Fitted to the image by minimising some cost function – often based on fitting the contour to edges
91 Constraining shape To avoid the snake being influenced by points we aren’t interested in, use a model to constrain its shape.
92 CONDENSATION No discussion on tracking can omit the CONDENSATION algorithm developed by Isard and Blake. CONditional DENSity propagATION Non-Gaussian substitute for the Kalman Filter Uses factored sampling to model non-Gaussian probability densities and propagate them through time.
93 CONDENSATION Thus we can take a set of parameters and estimate them from frame to frame, using current information from the frames These parameters may be positions or shape parameters from a snake.
94 CONDENSATION - Algorithm Randomly take samples from the previous distribution. Apply a deterministic drift and a random diffusion to the samples, based on a model of how the parameters behave. Weight each sample on the basis of the current information. The estimate of the actual value can be either a weighted average or a peak value from the distribution
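One iteration of these steps can be sketched for a single scalar parameter. The Gaussian observation model, the constant drift and all numeric defaults are assumptions chosen for illustration. The name `condensation_step` is hypothetical.

```python
import math
import random


def condensation_step(samples, weights, observation, drift=0.0,
                      diffusion_sd=1.0, obs_sd=2.0, rng=random):
    """One CONDENSATION iteration for a scalar state."""
    n = len(samples)
    # 1. Select: resample from the previous weighted sample set.
    chosen = rng.choices(samples, weights=weights, k=n)
    # 2. Predict: apply the motion model (drift) plus random diffusion.
    predicted = [s + drift + rng.gauss(0.0, diffusion_sd) for s in chosen]
    # 3. Measure: weight each sample by the likelihood of the observation.
    raw = [math.exp(-0.5 * ((observation - s) / obs_sd) ** 2) for s in predicted]
    total = sum(raw)
    if total == 0.0:
        new_weights = [1.0 / n] * n  # every likelihood underflowed: fall back to uniform
    else:
        new_weights = [w / total for w in raw]
    # 4. Estimate: here the weighted average of the samples.
    estimate = sum(s * w for s, w in zip(predicted, new_weights))
    return predicted, new_weights, estimate
```

Because the density is carried by the samples themselves, it can have several peaks at once, which is what a Kalman filter cannot represent. For a snake, the state would be a vector of shape or position parameters in place of a single number.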
95 Tracking Summary Static-scene background subtraction methods Extensions to moving camera systems Use of model-constrained active contour systems CONDENSATION