Object recognition using shading — presentation transcript

1 Object recognition using shading
For images such as those shown here, we can easily recognize the objects that are present. The main source of information about the objects in these images is shading; we recognize the objects using shading. How does our visual system do this?

2 Object recognition using shading
Strong claim #1: to recognize an object using shading, we reconstruct the object’s 3D shape and then we recognize this 3D shape. There have been two strong claims made over the years. The first is that to recognize a shaded object, we reconstruct the object’s 3D shape using a shape-from-shading process and then recognize this 3D shape. That is, we take this internal representation of 3D shape (whether it’s a depth map or a solid model) and match it to previous 3D shape representations in memory. This claim (which is what Marr argued in his book) is too strong. The reason it’s too strong is that shape from shading is an ill-posed problem. Since a visual system typically does not know the exact lighting conditions and surface reflectance, it is impossible (in principle) to reconstruct an accurate 3D shape model. We can’t compute 3D shapes accurately enough to explain human recognition.

3 Object recognition using shading
Strong claim #2: shape from shading is not used in object recognition. Strong claim #2 is just the opposite. It says that shape from shading is so unreliable that it’s not used at all in object recognition. Besides (claim #2 goes), we don’t need shape from shading to do recognition: we can recognize objects from images alone. Claim #2 may be correct, and it would be fascinating if it were. But I think it is too strong a claim. Humans do perceive shape from shading to some extent; there have been many psychophysical experiments showing this. So why would the brain not use these shape percepts to help it recognize objects? Neuroscientists often tell us that the brain uses whatever tricks it can to solve its problems. So why not use shape perception when recognizing objects?

4 What is the role of shape from shading in object recognition ?
To what extent do humans perceive 3D shape from shading? To what extent do humans use 3D shape perception to recognize objects? We need to answer Q1 before we can answer Q2. Both of the claims above are too strong, so we need a middle ground. We need to ask: what IS the role of shape from shading in object recognition? This question really breaks down into two very different questions. The first is: to what extent do we perceive 3D shape from shading? We certainly do perceive shape from shading, but what are the rules? What are the mechanisms? The second question is: to what extent do we use these 3D shape percepts to recognize objects? That is, if you accept that we do perceive 3D shape information, you next have to ask how (if at all) these percepts are used to recognize the object. If you think about it, you can’t answer the second question until you’ve answered the first. Before we can design careful psychophysical experiments to test how we use our 3D shape percepts to recognize objects, we first need to know what 3D shape properties we do perceive. The experiments I’m going to present in this talk address the first question.

5 Human perception of local shape from shading under variable lighting
Michael S. Langer*, Heinrich H. Bülthoff — Max Planck Institute for Biological Cybernetics, Tübingen, Germany. *McGill University, Montreal, Canada. I’m going to tell you about some recent shape-from-shading psychophysical experiments that I’ve done together with Heinrich Bülthoff.

6 Illumination models Sunny day (Horn ‘70, …)
Cloudy day (Langer and Zucker ICCV ’93, Stewart and Langer CVPR ’97). We are going to examine shape-from-shading perception under two different lighting conditions. Because this is primarily a computer vision audience, let me first describe the lighting conditions we will consider. First we will look at the sunny day problem, the one with which you are most familiar. It was introduced by Horn and studied throughout the ’70s and ’80s; the Horn model is what you find in the textbooks under SFS. I’ll review the Horn model in a moment. We are also going to look at the cloudy day SFS problem. This is a problem my colleagues and I introduced and studied in the ’90s. The cloudy day problem is really quite different from the sunny day shape-from-shading problem.

7 Overview Experiment 1: SFS on a sunny day
Experiment 2: SFS on a cloudy day. This talk will consist of two parts. First, I’ll tell you about an experiment we did to understand how humans perceive local shape from shading on a sunny day. Second, I’ll tell you about an experiment on how humans perceive local shape from shading on a cloudy day.

8 SFS on a sunny day: I(x) = ρ N(x) · L
Let’s begin with the sunny day problem. The sunny day model says that the image intensity at a point x depends on the angle between the surface normal N and the light source direction L; that is, I(x) = ρ N(x) · L, or “I equals N dot L.” Throughout this talk I will assume that the surface is Lambertian and that reflectance is constant over the object, so from now on I’ll drop the constant ρ from the equations.
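As a concrete illustration, here is a minimal numerical sketch of this Lambertian sunny-day model for a 1D height profile. The function name and the test terrain are illustrative, not from the talk; negative dot products are clamped to zero (attached shadow), a standard convention.

```python
# A minimal sketch of the sunny-day model I(x) = N(x)·L for a 1D profile z(x).
import numpy as np

def lambertian_shading(z, dx, light):
    """Shade a 1D height profile z sampled at spacing dx.
    light is a unit 2D vector (lx, ly) pointing toward the source.
    Reflectance rho is taken as 1, as in the talk."""
    dzdx = np.gradient(z, dx)
    # The normal of the graph y = z(x) is (-dz/dx, 1), normalized.
    normals = np.stack([-dzdx, np.ones_like(dzdx)], axis=-1)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    # Clamp at zero: points facing away from the source are in attached shadow.
    return np.maximum(normals @ np.asarray(light), 0.0)

x = np.linspace(0.0, 2.0 * np.pi, 256)
z = 0.3 * np.sin(3.0 * x)                                        # hills and valleys
L = np.array([np.sin(np.radians(30)), np.cos(np.radians(30))])   # 30 deg off vertical
I = lambertian_shading(z, x[1] - x[0], L)
```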

9 Depth-reversal ambiguity on a sunny day
One very interesting and important aspect of the sunny day SFS problem is the depth-reversal ambiguity: a valley illuminated from one direction produces the same shading pattern as a hill illuminated from the opposite direction. There are two ways to resolve this ambiguity. The first is to just assume some particular lighting direction; for example, it is well known that the human visual system prefers a light source from above rather than from below. The second way is to use other information in the stimulus, for example, shadows or perspective cues. Our main interest in the first experiment is to understand how the visual system resolves the depth-reversal ambiguity for complex shapes such as the shaded figures I showed earlier. (valley / hill)
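The ambiguity can be checked directly in the model above: negate the relief and mirror the light direction about the vertical, and the shading image is unchanged. A quick sketch, reusing lambertian_shading, z, x, and L from the previous snippet:

```python
# Depth reversal: a valley lit from one side shades exactly like a hill lit
# from the other side. Flipping z -> -z flips the sign of dz/dx, and mirroring
# the light flips the sign of lx, so the dot product N·L is unchanged.
I_orig = lambertian_shading(z, x[1] - x[0], L)
L_mirr = np.array([-L[0], L[1]])          # mirror the horizontal component
I_flip = lambertian_shading(-z, x[1] - x[0], L_mirr)
assert np.allclose(I_orig, I_flip)        # identical shading patterns
```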

10 Hollow Mask Illusion (Luckiesh, 1916)
The depth-reversal ambiguity is classical. Many of the early references to it discuss the so-called hollow mask illusion. If we are shown an image of a face, our visual system will assume a face is indeed present, even though there is an equally valid interpretation: a hollow mask. That is, a face illuminated from the right looks the same as a hollow mask illuminated from the left. In fact, the object shown here is actually a hollow mask illuminated from the left, rather than a face illuminated from the right. The hollow mask illusion is quite strong; it can often overwhelm texture cues, stereo cues, and motion cues.

11 “Hollow mask illusion is due to two factors” (Johnston et al ’92, Hill and Bruce ’94)
two factors: familiarity (face vs. hollow mask) and global convexity. The classical explanation of the hollow mask illusion is that the visual system prefers the face interpretation because it prefers to see familiar objects over unfamiliar ones. Faces are certainly more familiar and more commonly seen than the inside of a hollow mask. It was argued in the 1990s by Alan Johnston and colleagues, and later by Hill and Bruce, that the hollow mask illusion is in fact a result of two factors. The first factor is familiarity, which I just mentioned. The second factor is global shape: the visual system makes a prior assumption that an object seen in isolation is globally convex, rather than globally concave.

12 Our experiments use unfamiliar surfaces
convex (“face”) vs. concave (“mask”). So now let’s get to our experiments. The first experiment I will tell you about looked at how well humans perceive local shape from shading when global factors such as global shape and lighting direction are varied. We used surfaces such as those shown here, rendered using computer graphics. Each surface is a slice through a bumpy hollow cylinder. The surface on the left is globally convex, like a face; the one on the right is globally concave, like a mask. You can see for yourself that it is easy to discriminate these surfaces by their global shape: the surface on the left bulges in the middle, and the surface on the right narrows in the middle. The bulging and narrowing are due to the perspective projection. What’s new in our experiment is that we measured local shape perception, rather than global shape perception. As we will see, local shape perception is not at all easy.

Procedure: The observer is first shown a smooth cylindrical surface. Then a single point is marked on the surface and the observer makes an eye movement toward that marked point. One second later, the cylindrical surface is replaced by a randomly corrugated surface having the same global shape and the same lighting condition as the cylindrical surface.


15 Task: hill or valley? The observer’s task is to judge whether the marked point is on a local hill or in a local valley. In the experiment, the black probe point was much smaller than what is shown here, so that it did not interfere much with the surrounding shading pattern. The response time for each trial was restricted to a maximum of 3 seconds; observers typically responded “hill” or “valley” in less than one second, by pressing one of two keys on the keyboard. Notice that to perform the task, observers had to resolve the depth-reversal ambiguity. Because the global shape is easy to perceive, you might think that there is no depth-reversal ambiguity: the global shape percept should resolve it. But just because this information is available to the observer doesn’t mean the observer will use it.

16 Three factors were tested
light source direction (above > below), global shape (convex > concave), viewpoint (above > below). We tested three different factors (or prior assumptions) that observers could use in this hill-vs-valley task; we explain them in more detail in the following slides. The first factor is light source direction. Observers prefer the light source to be from above rather than from below, so their local shape interpretation might depend on this factor. The second factor is global shape. As I mentioned earlier, observers prefer globally convex objects over globally concave objects — faces rather than masks — so the local shape interpretation might depend on this factor as well. The third factor is a very interesting one that you might not be familiar with: viewpoint direction. Observers prefer the viewpoint to be from above rather than from below; that is, they prefer a view of a floor over a view of a ceiling. This factor was discovered by Reichel and Todd, and there have since been other experiments that strongly support it. I’ll explain it more over the next few slides.

17 1. light source direction
from above / from below. The first factor is light source direction. On half of the trials the light was from above, and on the other half it was from below.

18 2. global shape (convex / concave)
The second factor is global shape. In half the trials, the surface was globally convex. In the other half of the trials the surface was globally concave.

19 3. Viewpoint (Reichel & Todd 1990)
from above (floor) / from below (ceiling). Reichel and Todd discovered that when resolving the depth-reversal ambiguity, observers tend to perceive shapes consistent with the surface having a global floor-like slant rather than a global ceiling-like slant. If the surface has an overall slant, observers tend to perceive it as viewed from above rather than from below.

20 Factor 3: viewpoint (view from above / view from below)
The third factor was viewing direction. On half the trials, the probe point was on a floor-like part of the surface, that is, viewed from above. On the other half, the probe point was on a ceiling-like part of the surface, that is, viewed from below. For a globally convex surface, the upper half has a viewed-from-above slant and the lower half has a viewed-from-below slant.

21 Factor 3: viewpoint (view from above / view from below)
For the globally concave surfaces, the upper half of the surface is viewed from below and the lower half is viewed from above. I should mention that the previous studies examining this factor used only globally flat surfaces; ours is the first to use globally curved surfaces.

22 Design three factors (2 x 2 x 2) - light direction (above, below)
- global shape (convex, concave) - viewpoint (floor, ceiling) 512 trials (64 x 8 conditions). The experiment used a three-factor, within-observer design. The three factors were light direction, global shape, and viewpoint. Each factor had two levels: light from above vs. below, globally convex vs. concave, and viewpoint from above vs. below. Each observer ran 512 trials, consisting of 64 trials for each of the eight conditions; the trials were randomly ordered for each observer, as in the sketch below. Let’s have a look at the data.
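For concreteness, here is one way the randomized trial list could be generated. The ±1 condition coding (with +1 as the preferred level) is an illustrative convention, not from the talk.

```python
# 2 x 2 x 2 within-observer design: 8 conditions x 64 repetitions = 512 trials.
import itertools
import random

conditions = list(itertools.product(
    [+1, -1],   # light direction: above / below
    [+1, -1],   # global shape: convex / concave
    [+1, -1],   # viewpoint: floor / ceiling
))
trials = conditions * 64    # 64 trials per condition -> 512 trials
random.shuffle(trials)      # fresh random order for each observer
```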

23 Results (linear regression)
percent correct (hill or valley?) ≈ 50 + 10 × (light direction) + 10 × (global shape) + 10 × (viewpoint), with each factor coded as −1 or +1 (coefficients approximate). An analysis of variance revealed a significant effect for all three factors; the results of a linear regression are shown here. Each factor has a value of either +1 or −1, depending on whether it had the preferred or non-preferred value, and each factor added or subtracted about 10 percent from the score accordingly. Also notice that the overall percent correct was about 50 percent; thus, observers were at chance overall. What does this mean? It means that in judging local shape, observers ignored all global information in the image. There were shadow cues that indicated whether the light was from above or below. There were perspective cues that indicated whether the surface was globally convex or concave. But observers ignored these cues; in judging local shape, it appears they restricted their processing or attention to the local information in the stimulus. This was very interesting, and quite unexpected.
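A sketch of this regression model as code: a chance-level intercept of about 50 plus roughly ±10 points per factor. The exact fitted coefficients are not in the transcript, so the defaults here are placeholders based on the description above.

```python
# Predicted accuracy under the three-factor linear model from the slide.
def predicted_percent_correct(light, shape, view, b0=50.0, b=10.0):
    """Each factor is +1 (preferred level) or -1 (non-preferred)."""
    return b0 + b * light + b * shape + b * view

predicted_percent_correct(+1, +1, +1)   # -> 80.0, best condition
predicted_percent_correct(-1, -1, -1)   # -> 20.0, worst condition
```

These predictions are roughly consistent with the best and worst observed conditions reported later in the talk (87% and 15%).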


25 Conclusions: Experiment 1
many factors play a role in resolving the depth-reversal ambiguity (light direction, global shape, viewpoint, …); observers often ignore available image information (perspective, occluding contours, shadows). The main conclusion from this study is that the factors of light source direction, global convexity, and “viewpoint” all play a role in resolving the depth-reversal ambiguity in local shape from shading. The global shape and viewpoint factors are really quite interesting. What we seem to have found is that when we perceive qualitative local shape at a small scale, we make larger-scale assumptions about the surface: about its second-order structure, namely whether it is globally convex or concave, and about its first-order large-scale structure, namely whether it has a floor-like or ceiling-like slant in a neighborhood. These factors play a significant role even when other information is present in the stimulus, such as perspective information, that could potentially overrule them.

26 Experiment 2: Shape from shading on a cloudy day
The surface shown here was rendered using a diffuse lighting model. The light source was a Ganzfeld, i.e., a sphere of uniform luminance. I hope you agree that, subjectively at least, the rendering yields a vivid impression of surface shape: our visual systems don’t just shut down when we look at shading under diffuse lighting. In fact, this subjective impression is correct. When we asked observers to make these judgments for surfaces rendered under diffuse lighting, in an experiment similar to the one I just presented, their performance was as high as under the best sunny day condition tested. Is this surprising? Certainly the visual system cannot use the same “model” to solve the task as in the sunny day case; as I mentioned earlier, shading is very different on sunny vs. cloudy days. To perform so well under diffuse lighting, observers must have used a model that was consistent with shading under diffuse lighting. Our second experiment tries to get at the question of what shading model the visual system uses to perceive shape from shading under diffuse lighting.

27 Shading on a cloudy day: the visible-source angle θ(x)
The second experiment looks at the problem of shape from shading under diffuse lighting, such as on a cloudy day. Surfaces have a very different appearance under diffuse lighting than under sunny day lighting. The main difference is that, under diffuse lighting, shadowing effects are always present and important. Points in a valley receive less illumination from the diffuse source than points on the surrounding hills, since less of the diffuse source is visible from the valley; this is primarily a shadowing effect. In the sketch shown here, the luminance at a point depends not just on the surface normal, but also on how much of the diffuse source is visible from that point, represented by the angle θ(x). For points on top of the highest hill, θ is 180 degrees. For the point shown in the valley, θ is roughly 90 degrees.

28 Shading on a cloudy day: I(x) = ∫θ(x) N(x) · L dL
We can model shading under diffuse lighting as follows. The model is identical to the sunny day model, “I equals N dot L,” except that now we integrate N · L over the set of directions L in which the diffuse source is visible:

I(x) = ∫θ(x) N(x) · L dL,   where θ(x) is the angle of the visible light source.

The image intensity at x thus depends on the surface normal, as in the sunny day case, but it also depends on the angle θ(x) of the diffuse source that is visible from x.

29 Shading in a valley on a cloudy day
local intensity maxima. As the sketch shows, the shading in a valley is not uniformly dark: there are local intensity maxima within the valley, which arise from local surface normal effects — a point whose normal faces the visible patch of the source is locally brighter than its neighbors.

30 Local intensity maxima in valleys
These local intensity maxima in valleys are actually quite significant. Have a look at this image. If you look in the valleys, you should be able to see these local intensity maxima quite easily.


32 Experiment 2: How well do humans perceive shape from shading in the presence of these local intensity maxima? Hypothesis (the shape-from-shading skeptic’s): humans use a “dark means deep” model. According to this model, image intensity is identified directly with surface height: brighter points are closer and darker points are farther away. A dark-means-deep model is plausible under diffuse lighting, since hills do tend to receive more illumination than valleys, and so hills do tend to be brighter than valleys.
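A minimal sketch of the dark-means-deep hypothesis as code: the height estimate is just a monotonic function of image intensity, so a relative-height judgment reduces to an intensity comparison. The function name and calling convention are illustrative, not from the study.

```python
# "Dark means deep": brighter is judged higher (closer), darker is judged deeper.
def which_is_higher(intensity, p, q):
    """Predict which of two image locations p, q is higher,
    assuming height is a monotonic function of intensity."""
    return p if intensity[p] > intensity[q] else q
```

Such a model is right whenever the intensity and height differences share a sign, and systematically wrong when they do not — which is exactly why it predicts above-chance, below-chance, and overall chance performance in the conditions defined below.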

33 Procedure: Let us now walk through one trial of Experiment 2. The subject is shown a white CRT screen containing a grey silhouette.

34 Then, two small black dots appear, and the subject makes an eye movement to these dots.

35 Task: which is higher? Then the shaded stimulus appears beneath the dots. The task is to judge which of the two dots on the surface is higher, that is, which is closer to the observer. The subject has 1.5 seconds to respond. Let me be clear on what we are testing. The hypothesis is that observers use a dark-means-deep model. If this hypothesis is correct, then performance should be above chance in the correlated condition and below chance in the anti-correlated condition (defined on the next slide), and overall performance should be at chance. In the actual experiment the dots were smaller than what is shown here; we didn’t want the dots to cover up the shading.

36 Two conditions: correlated vs. anti-correlated
One can show that luminance and surface height are statistically correlated under diffuse lighting, since hills tend to receive more illumination than valleys. That is, if you take two nearby pixels, their difference in height will be correlated with their difference in intensity. The correlation is not perfect, however; sometimes, as we’ve just seen, the deeper point will be brighter. Our second experiment was designed to test whether human observers use dark-means-deep as a model for shape from shading. The idea was to have observers compare the heights of two nearby surface points. Each pair of points differs both in intensity and in height, which defines two conditions: either the height and intensity differences have the same sign, or they have opposite signs. We refer to these two conditions as “correlated” and “anti-correlated”, respectively.
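For concreteness, here is a sketch of how a point pair could be assigned to one of the two conditions from the ground-truth height map and the rendered image. The array names are illustrative.

```python
# Classify a pair of nearby points by whether their height and intensity
# differences agree in sign ("correlated") or disagree ("anti-correlated").
import numpy as np

def pair_condition(height, intensity, p, q):
    dh = height[p] - height[q]
    di = intensity[p] - intensity[q]
    return "correlated" if np.sign(dh) == np.sign(di) else "anti-correlated"
```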

37 Results: which is higher? (N = 17; overall score 65%, above chance)
(Bar plot of percent correct, 0 to 1, for the correlated (+) and anti-correlated (−) conditions.) Here are the results for seventeen naive observers. Performance was significantly better in the correlated condition than in the anti-correlated condition, which is consistent with a dark-means-deep model. However, overall, observers were well above chance, which is NOT consistent with a dark-means-deep model: that model predicts observers should be at chance overall. What’s going on here? If you are on your toes, you might ask whether the near-chance performance in the anti-correlated condition was due to a different factor — for example, that observers simply couldn’t resolve the shading in the anti-correlated trials because the local intensity maxima in the valleys were too small.

38 Conclusion: Experiment 2
“Dark means deep” is too simple a model to explain human perception of shape from shading on a cloudy day.


41 Big Picture: What role does 3D shape perception play in 3D object recognition?
To what extent do we perceive 3D shape? We’re on the way to answering this question. To what extent do we use these 3D shape percepts to recognize objects? The answer to Q2 depends on Q1.

42 Computer vision psychophysics
Model 1: I(x) ∝ θ(x) (Langer and Zucker ’93). Model 2: I(x) = ∫θ(x) N(x) · L dL (Stewart and Langer ’96). The first algorithm is based on the simple assumption that image intensity I is proportional to the angle θ of the visible diffuse source — the same angle θ we saw earlier in the talk. The first model does not consider surface normal effects; thus it cannot be expected to correctly interpret the local maxima in valleys, which I argued are due to local surface normal effects. The second model does consider surface normal effects, as well as the angle of the diffuse source. To compare these models to the human observers, we ran each model on the images used in the experiment, and from the computed depth maps we obtained the performance of the models (see the sketch below).
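A sketch of the evaluation loop just described: run a model on each stimulus image to obtain a depth map, then answer exactly the same which-is-higher trials the human observers saw. The trial format and model interface here are assumptions for illustration, not the authors’ code.

```python
# Score a shape-from-shading model on the human observers' trials.
def score_model(model, images, trials):
    """trials: list of (image_index, point_p, point_q, correct_answer)."""
    correct = 0
    for img_idx, p, q, answer in trials:
        depth = model(images[img_idx])            # model returns a depth map
        guess = p if depth[p] < depth[q] else q   # smaller depth = closer/higher
        correct += (guess == answer)
    return correct / len(trials)
```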

43 Psychophysics: Human vs. Computer
(Bar plot of percent correct in the correlated (+) and anti-correlated (−) conditions: human observers vs. the θ-only model and the (θ, N) model.) Here are the results, shown alongside the human data. Both computer models were 100 percent correct in the correlated condition, but in the anti-correlated condition the model based only on the solid angle was 28 percent correct, while the more accurate model was 78 percent correct. The below-chance performance of the first model in the anti-correlated condition is understandable, since that model does not consider surface normal effects, and as we discussed earlier, many of the anti-correlated trials are due precisely to surface normal effects within valleys. Also note that the second computational model (SL ’96) was above chance in the anti-correlated condition. This implies that such performance is possible for a vision system in principle, and that the limitations of the human observer are not necessarily inherent in the task.

44 Conclusions: Experiment 2
“Dark means deep” is too simple to explain human perception of shape from shading on a cloudy day. Computing local shape in valleys is an inherently difficult computational problem on a cloudy day.

45 Which point is brighter? (N = 10)
(Bar plot of percent correct vs. correlation condition (+/−), for the brightness task alongside the earlier height task.) To test for this, we repeated the experiment with several new subjects, but now asked them to judge which of the two points was brighter, rather than which was higher. The results are shown in yellow, along with the previous results for the height task, shown in white. Performance was well above chance for the brightness task. Although there does seem to be some difficulty in judging relative brightness in the anti-correlated condition, observers are still quite good at this task. So the fact that they are at chance in judging relative height in the anti-correlated condition is not just due to an inability to see the shading. Why then do observers perform so poorly in the anti-correlated condition? A second hypothesis is that this condition is inherently difficult: computing the local shape of a valley is inherently difficult under diffuse lighting. To test this hypothesis, we ran our two computer vision algorithms on the same stimuli as were used by the human observers.

46 Examples: best (87%) and worst (15%) conditions
Here are the best and worst cases. The upper left shows the best case: the light is from above, the surface region is viewed from above, and the surface is convex. In this condition, observers were 87 percent correct in discriminating hills from valleys. The upper right shows the worst case: the light is from below, the surface region is viewed from below, and the surface is concave. Here observers were 15 percent correct; that is, they were systematically fooled under this condition.

