1
Tracking through Optical Snow
Michael Langer (School of Computer Science, McGill University) and Richard Mann (School of Computer Science, University of Waterloo)
2
Optical snow: e.g. falling snow
Today I'm going to tell you about a new category of motion that Richard Mann and I have been trying to understand. We call this motion optical snow. An example is what you see during a snowfall. Take the ideal case in which each snowflake falls with the same 3D velocity. In this case, dense motion parallax occurs: snowflakes that are closer to the eye move with a faster image speed than snowflakes that are farther from the eye.
3
Optical snow Here we show a synthetic image sequence. The scene is a set of randomly placed spheres in a view volume, and the spheres are falling downwards. This is a very complex motion: many depth discontinuities are present, as well as many image speeds due to the motion parallax. Despite all the depth discontinuities, the sequence produces a rich motion percept. Note that the percept has a very rich layered structure; any model based on only two layers is not going to do well here.
4
Optical snow: moving observer in a 3D cluttered scene
You might be skeptical and say that falling snow doesn't come up very often in nature and is not so interesting to study. We argue differently. Optical snow arises whenever an observer moves relative to a cluttered 3D scene such as a tree or bush. Walking through a forest or any 3D cluttered scene gives you dense motion parallax, or optical snow: points that are near to you move with a different speed than points that are farther away. Whether it is a rigid scene moving with respect to a fixed observer, as in falling snow, or a static scene seen by a moving observer, the image motion is the same.
5
Optical snow We are all familiar with this type of motion. We have a rich motion percept, even though the image sequence itself is enormously complex. I am certainly not claiming that we have an accurate percept of the depths of the objects. Rather, we perceive the basic statistical properties of the motion: namely, that this is a dense 3D scene, the direction of motion, and perhaps the range of image speeds. We are trying to understand how this is possible. What sort of computations are necessary or sufficient to achieve these percepts? What sort of computational problem is the visual system solving, and how does it solve it?
6
Related work
- Computation of Image Flow: "The Fox and the Forest" (Steve Zucker, 1980s)
- Psychophysics of Heading: "3D cloud of dots" (Bill Warren, 1980s-90s)
- Ecological Optics (J. J. Gibson): monkeys in a forest, cats in the tall grass. But how?

We are certainly not the first to point out that 3D cluttered scenes are ecologically interesting and important. Helmholtz discussed 3D cluttered scenes, and more recently Bill Warren and others have pointed out that cluttered scenes such as grasslands and forests do not produce smooth motion fields; such scenes have also been used in studies of heading (e.g., Warren's "3D cloud of dots"). We want to emphasize the ecological importance of this optical snow: among the animals that inhabit such scenes are the ones most commonly studied in visual neuroscience, namely rabbit, cat, and monkey. Today I am going to go over some of the basic first steps in answering this question.
7
Goal of this talk: How to model and compute image velocities in a 3D cluttered scene?
8
Overview of Talk
- Fourier analysis of optical snow (Langer & Mann, ICCV '01)
- Generalized optical snow
- Biologically motivated computational model (sketch only)
9
Fourier model of image translation
(Fahle & Poggio '81; Watson & Ahumada '85)

v_x f_x + v_y f_y + f_t = 0

We first consider an observation made by Watson and Ahumada back in 1985. If an image is translating with a uniform velocity (v_x, v_y), then the power spectrum of the motion has a very simple property. Here is the idea: each spatial frequency component of the image translates with this velocity as well, and any spatial frequency component translating with velocity (v_x, v_y) induces a temporal frequency. The relation between the spatial frequencies and the temporal frequency is linear, as given above. It follows that if you take the 3D Fourier transform of a translating image, all the power lies on a plane in the frequency domain. This is not an obvious result, but it is true.
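This plane property is easy to check numerically. Below is a minimal sketch (my own construction, not from the talk) in Python/NumPy: it translates a random image by an integer number of pixels per frame, takes the 3D FFT, and measures how much of the power lies on the plane v_x f_x + v_y f_y + f_t = 0 (modulo the periodicity of the DFT).

```python
import numpy as np

# Minimal check that a translating image's 3D power spectrum lies on the
# motion plane v_x f_x + v_y f_y + f_t = 0.  Integer pixel shifts per
# frame (via np.roll) avoid any interpolation blur.
rng = np.random.default_rng(0)
N, T = 64, 64
vx, vy = 2, 1
frame0 = rng.standard_normal((N, N))
seq = np.stack([np.roll(frame0, (t * vy, t * vx), axis=(0, 1)) for t in range(T)])

power = np.abs(np.fft.fftn(seq)) ** 2
ft = np.fft.fftfreq(T)[:, None, None]   # temporal frequency (cycles/frame)
fy = np.fft.fftfreq(N)[None, :, None]   # vertical spatial frequency
fx = np.fft.fftfreq(N)[None, None, :]   # horizontal spatial frequency

d = vx * fx + vy * fy + ft              # signed distance from the motion plane
d = (d + 0.5) % 1.0 - 0.5               # wrap, since DFT frequencies are periodic
print("power on plane:", power[np.abs(d) < 1e-6].sum() / power.sum())  # ~1.0
```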
10
Optical Snow (v , v ) = (α t , α t ) x y x y
Now lets relate this to optical snow. The key property of optical snow is that rather than having a single velocity vector (vx, vy) you have a family of velocity vectors all of which have the same direction. Mathematically, instead of having a velocity (vx,vy) we now have a set of velocities scaled by speeds alpha. All motion is in the same direction but there are many speeds. In the case of vertically falling snow, the direction of motion tau is (0,1). (v , v ) = (α t , α t ) x y x y
11
Optical Snow

(v_x, v_y) = (α τ_x, α τ_y)

In the case of an observer moving laterally through a 3D cluttered scene, the direction of motion is horizontal. The speeds vary from point to point, but all the velocity directions are horizontal.
12
Optical snow

(v_x, v_y) = (α τ_x, α τ_y)

[Velocity-space plot (v_x, v_y): all the velocities lie along a line through the origin in the direction (τ_x, τ_y).]
13
Fourier model of optical snow
α τ_x f_x + α τ_y f_y + f_t = 0

"Bowtie." When the optical snow is in a general direction (τ_x, τ_y) and there is a range of speeds, we can directly apply Watson and Ahumada's motion plane. But instead of one motion plane, we now have a family of planes: each speed α gives rise to its own plane. Interestingly, all these planes intersect at a common axis, and this axis lies in the (f_x, f_y) plane. So if you look at the 3D power spectrum of optical snow, you don't get a single plane, but you don't get junk either. What you get is a bowtie pattern.
14
Example of bowtie in power spectrum
This is not just a mathematical theory. Here is the bowtie you get from the bush sequence I showed a few slides back. Because the motion is horizontal, we know we can project the 3D power spectrum along the appropriate axis in order to see it. If we project the power spectrum along a different axis, the bowtie pattern disappears; that is, we need to view the power spectrum from the correct direction to see the bowtie. (bush sequence)
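The bowtie is also easy to reproduce synthetically. The sketch below (my construction, not the authors' sphere renderer) sums a few transparent random-dot layers that share a horizontal motion direction but differ in speed; projecting the 3D power spectrum along f_y leaves one line f_t = -s f_x per speed s, fanning out from the origin like a bowtie.

```python
import numpy as np

# Synthetic optical snow: additively transparent layers, one speed each,
# all moving in +x.  Each layer puts its power on the plane
# s * f_x + f_t = 0, so the f_y-projection shows a fan of lines (a bowtie).
rng = np.random.default_rng(1)
N, T = 64, 64
speeds = [1, 2, 3, 4]
layers = [rng.standard_normal((N, N)) for _ in speeds]
seq = sum(np.stack([np.roll(img, t * s, axis=1) for t in range(T)])
          for img, s in zip(layers, speeds))

power = np.abs(np.fft.fftn(seq)) ** 2        # axes: (f_t, f_y, f_x)
bowtie = np.fft.fftshift(power.sum(axis=1))  # project along f_y, center origin
# Visualize with e.g. matplotlib: plt.imshow(np.log1p(bowtie)).
```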
15
Overview of Talk
- Fourier analysis of optical snow (Langer & Mann, ICCV '01)
- Generalized optical snow
- Biologically motivated computational model (sketch only)
16
Moving observer in 3D cluttered scene
[Figure: camera rotation and translation for a moving observer in a 3D cluttered scene.]
17
Moving observer in 3D cluttered scene
(Longuet-Higgins and Prazdny 1980)

The velocity field is the sum of two fields:
- translation of camera - depends on 3D scene geometry (depth)
- rotation of camera - independent of 3D scene geometry

The first problem is the one we saw earlier: how can a moving observer judge its heading in a 3D cluttered scene? Longuet-Higgins and Prazdny and others showed years ago that the instantaneous motion field seen by a moving observer in a static 3D scene is the sum of a translation field and a rotation field, corresponding to the camera's translation and its rotation. The model makes no assumption about smoothness of the scene geometry, so it holds fine for 3D cluttered scenes. I'm not going to drag you through the equations. The important point is that the translation field depends on depth and the rotation field is independent of depth.
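For reference, here is the decomposition the slide alludes to, in the standard form from Longuet-Higgins and Prazdny (1980), written for unit focal length. Sign conventions vary between texts, so treat this as a hedged reconstruction rather than the slide's own notation.

```latex
% Image velocity at image point (x, y), camera translation T, rotation Omega:
v_x = \frac{-T_x + x\,T_z}{Z(x,y)}
      \;+\; x y\,\Omega_x - (1 + x^2)\,\Omega_y + y\,\Omega_z
\qquad
v_y = \frac{-T_y + y\,T_z}{Z(x,y)}
      \;+\; (1 + y^2)\,\Omega_x - x y\,\Omega_y - x\,\Omega_z
```

Only the first term in each component involves the depth Z(x, y); the rotational terms do not, which is exactly the point the slide makes.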
18
Moving observer in 3D cluttered scene
rotation (pan + tilt), translation (lateral)

In particular, if the observer is moving laterally through the scene, the translation direction is constant over the image. What I will do today is generalize the model of optical snow that we presented in earlier papers to the case in which the observer makes a general eye rotation while moving laterally through the scene. This eye rotation is illustrated in the figure. A good approximation of the effect of an eye rotation is that it adds a constant velocity to each point in the image; we call this constant velocity term (ω_x, ω_y). There are really two assumptions being made here. The first is that the rotation is a combination of a pan and a tilt: the camera does not roll. The second is that the field of view is relatively small, so that second-order effects of rotation can be ignored.
19
vertical translation + pan to left
Here is an example of what we mean. Take the vertically falling snow sequence from earlier and add to it a camera rotation: a pan to the left. The result is a rather odd motion field. It is the sum of vertical parallel snow plus a constant horizontal drift due to the camera pan.
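A sketch of how one might synthesize such a sequence (my construction, reusing the layered random-dot scheme from the earlier sketch): vertical layer speeds model the falling snow, and a constant horizontal shift per frame models the pan.

```python
import numpy as np

# Generalized optical snow: per-layer velocity (v_x, v_y) = (pan, s),
# i.e. alpha * (0, 1) + (pan, 0), with tau vertical and omega horizontal.
rng = np.random.default_rng(3)
N, T, pan = 64, 64, -1                  # pan: constant -x drift per frame
layers = {s: rng.standard_normal((N, N)) for s in (1, 2, 3)}
seq = sum(np.stack([np.roll(img, (t * s, t * pan), axis=(0, 1)) for t in range(T)])
          for s, img in layers.items())
```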
20
Tracking through optical snow
21
Generalized optical snow
(v_x, v_y) = (α τ_x, α τ_y) + (ω_x, ω_y)
            [translation]    [rotation]

[Velocity-space plot (v_x, v_y): a line of velocities in direction (τ_x, τ_y), shifted off the origin by (ω_x, ω_y).]

In the model of optical snow I've presented up to now, all the object surfaces move in the same image direction, but with a range of speeds. Let's now generalize this by adding a constant (ω_x, ω_y) to each velocity vector. Mathematically, we are taking the line of velocities in direction (τ_x, τ_y) and shifting it off the origin. But what does this mean? Let me look at two problems in which this model is relevant.
22
Fourier model of generalized optical snow
(α τ_x + ω_x) f_x + (α τ_y + ω_y) f_y + f_t = 0

"Tilted bowtie." With the constant rotation term added, each speed α still gives rise to its own motion plane, and the planes still intersect in a common axis. But the axis is no longer confined to the (f_x, f_y) plane: the bowtie is tilted.
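To see why the planes still share a common axis, and why that axis tilts, group the terms of the equation above by α (a short derivation, implicit in the slide):

```latex
(\alpha\tau_x + \omega_x) f_x + (\alpha\tau_y + \omega_y) f_y + f_t = 0
\;\Longleftrightarrow\;
\alpha\,(\tau_x f_x + \tau_y f_y) + (\omega_x f_x + \omega_y f_y + f_t) = 0 .
% This holds for every speed alpha exactly on the line where both
% groups vanish:
\tau_x f_x + \tau_y f_y = 0
\quad\text{and}\quad
\omega_x f_x + \omega_y f_y + f_t = 0 .
```

When (ω_x, ω_y) = (0, 0), the second condition reduces to f_t = 0 and the axis lies in the (f_x, f_y) plane, recovering the earlier bowtie; a nonzero ω tilts the axis out of that plane.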
23
Tilted bowtie in power spectrum
Since we generated the sequence, we know the axis of the bowtie; here is the 3D power spectrum projected along that axis. You will notice some funny aliasing effects in this plot, namely the wraparound at the boundary of the frequency domain. These aliasing effects are due to the high precision of the rendering: for real images, motion blur occurs at the boundaries of the spheres and the aliasing is not present. It was not present in the power spectrum of the bush sequence shown earlier. (vertical translation + pan to left)
24
Overview of Talk
- Fourier analysis of optical snow (Langer & Mann, ICCV '01)
- Generalized optical snow
- Biologically motivated computational model (sketch only)
25
Oriented, directionally tuned cells in V1.
[Figure: tuning region of one V1 complex cell in the (f_x, f_y, f_t) frequency domain.]

Neuroscientists tell us that each complex cell in V1 is sensitive to a particular region of the visual field, and to a particular combination of spatial and temporal frequencies. That is, complex cells are orientation- and direction-tuned. The blue sphere shown here marks the region of the 3D frequency domain where one V1 cell has its peak sensitivity.
26
Oriented, directionally tuned cells in V1.
[Figure: a family of such cells tiling the (f_x, f_y, f_t) frequency domain.]

Now consider a family of these complex cells in V1, and suppose that they cover the 3D frequency domain. The tiling I've shown here is just a cartoon.
27
Pure image translation (v_x, v_y)

[Figure: the motion plane in the frequency domain (left) and the V1 cells with peak response to it, marked in red (right).]

If we take the case of pure image translation, we get a single motion plane. On the right, I've marked in red those cells that have a peak response to this motion plane. Heeger and Simoncelli and others have used this idea to propose a model for detecting pure image translation: a higher-level cell, such as an MT cell, looks for a particular pattern of responses in V1 that indicates a particular motion plane. Heeger and Simoncelli say that a given motion plane yields a distributed code over the cells that overlap the plane, and from this distributed code you can build template cells. If you are familiar with this already, great. If not, the point to take away is that since pure translation gives rise to a plane in the frequency domain, it also gives rise to a characteristic distribution of complex-cell responses. (See Heeger '87, Yuille and Grzywacz '90, Simoncelli and Heeger '97.)
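To make the distributed-code idea concrete, here is a toy sketch (mine, not the Heeger-Simoncelli model itself): random channel centers stand in for V1 complex cells tiling frequency space, and a velocity template simply weights each channel by its proximity to the corresponding motion plane.

```python
import numpy as np

# Toy distributed code: a template for velocity (vx, vy) weights each
# frequency channel by its distance to the plane vx*fx + vy*fy + ft = 0.
rng = np.random.default_rng(2)
centers = rng.uniform(-0.5, 0.5, size=(200, 3))     # (f_x, f_y, f_t) centers

def template_weights(vx, vy, sigma=0.05):
    normal = np.array([vx, vy, 1.0])                # motion-plane normal
    dist = np.abs(centers @ normal) / np.linalg.norm(normal)
    return np.exp(-dist**2 / (2 * sigma**2))        # near-plane channels dominate

energies = rng.uniform(size=200)                    # stand-in complex-cell outputs
response = template_weights(2.0, 0.0) @ energies    # one MT-like template response
```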
28
Generalized optical snow
[Figure: the distributed code over V1 cells for a tilted bowtie.]

The same holds for non-parallel snow. When you add an arbitrary pan or tilt of the camera, for example when tracking an object, the bowtie may tilt out of the (f_x, f_y) plane. In this case, the distributed code follows the bowtie; a crude sketch of it is shown on the right. Let me be clear about what we are and are not claiming here. We are not claiming that there are template cells in MT that are sensitive to particular distributed codes; I personally have no idea what is going on in MT, as I am not a neuroscientist. What I am claiming is that if you believe the textbook description of complex cells in V1, then non-parallel optical snow will give rise to a distributed code over these cells. How this distributed code is processed at higher levels of the brain is an open problem. (Langer and Mann, in preparation)
29
Summary
- Goal: how to model and compute image velocities in a 3D cluttered scene?
- Generalized optical snow: lateral motion + pan and tilt → tilted bowtie in the frequency domain.
- Many algorithms are possible for fitting the bowtie; one such scheme is sketched below.
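As one illustration of the last bullet, here is a possible bowtie-fitting scheme (my construction; the talk leaves the choice of algorithm open): rotate the fftshifted power spectrum about the f_t axis through candidate motion directions, project along one spatial-frequency axis, and keep the direction whose projection is most concentrated.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_direction(power, n_angles=36):
    """power: fftshifted 3D power spectrum with axes (f_t, f_y, f_x)."""
    best_angle, best_score = 0.0, -np.inf
    for angle in np.linspace(0.0, 180.0, n_angles, endpoint=False):
        rot = rotate(power, angle, axes=(1, 2), reshape=False, order=1)
        proj = rot.sum(axis=1)                      # project along rotated f_y
        score = (proj**2).sum() / proj.sum()**2     # high when energy is concentrated
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle                               # candidate bowtie-axis direction
```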
30
Computational models of heading
- Longuet-Higgins and Prazdny 1980
- Rieger and Lawton 1985
- Heeger and Jepson 1992
- Hildreth 1992
- Lappe and Rauschecker 1993
- Royden 1997, ...

These models assume "the image velocity field" can be pre-computed. But this assumption is problematic in a 3D cluttered scene.