Download presentation
1
The Three R’s of Vision Jitendra Malik
2
The Three R’s of Vision Recognition Reorganization Reconstruction
3
Recognition, Reconstruction & Reorganization
4
Review Recognition Reconstruction Reorganization
2D problems such as handwriting recognition, face detection Partial progress on 3d object category recognition Reconstruction Feature matching + multiple view geometry has led to city scale point cloud reconstructions Reorganization Graphcuts for interactive segmentation Bottom up boundaries and regions/superpixels
5
Caltech-101 [Fei-Fei et al. 04]
102 classes, images/class
6
Caltech 101 classification results (even better by combining cues..)
7
PASCAL Visual Object Challenge
(Everingham et al)
9
Builds on Dalal & Triggs HOG detector (2005)
10
Critique of the State of the Art
Performance is quite poor compared to that at 2d recognition tasks and the needs of many applications. Pose Estimation / Localization of parts or keypoints is even worse. We can’t isolate decent stick figures from radiance images, making use of depth data necessary. Progress has slowed down. Variations of HOG/Deformable part models dominate.
11
Recall saturates around 70%
12
Recall saturates around 70%
13
State of the Art in Reconstruction
Multiple photographs Range Sensors Kinect (PrimeSense) Velodyne Lidar Agarwal et al (2010) Critique: Semantic Segmentation is needed to make this more useful…
14
State of the Art in Reorganization
Interactive segmentation using graph cuts Berkeley gPb edges & regions Rother, Kolmogorov & Blake (2004), Boykov & Jolly (2001), Boykov, Veksler & Zabih(2001) Arbelaez et al (2009), Martin, Fowlkes, Malik (2004), Shi & Malik (2000) Critique: What is needed is fully automatic semantic segmentation
15
The Three R’s of Vision Recognition Reconstruction Reorganization
Each of the 6 directed arcs in this diagram is a useful direction of information flow
16
The Three R’s of Vision Recognition Reconstruction Reorganization
Shape as feature Superpixel assemblies as candidates Morphable shape models Reconstruction Reorganization
17
Next steps in recognition
Incorporate the “shape bias” known from child development literature to improve generalization This requires monocular computation of shape, as once posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours Top down templates should predict keypoint locations and image support, not just information about category Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise.
18
Next steps in recognition
Incorporate the “shape bias” known from child development literature This requires monocular computation of shape, as once posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours Top down templates should predict keypoint locations and image support, not just information about category Poselets: Bourdev & Malik, 2009 & later Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise. Barron & Malik, CVPR 2012 Arbelaez et al, CVPR 2012
19
Reconstruction Shape, Albedo & Illumination
20
Shape, Albedo, and Illumination from Shading
Jonathan Barron Jitendra Malik UC Berkeley
21
Forward Optics Far Near shape / depth
And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I.
22
Forward Optics Far Near shape / depth illumination
And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I.
23
log-shading image of Z and L
Forward Optics Far Near shape / depth log-shading image of Z and L illumination And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I.
24
Forward Optics Far Near shape / depth log-shading image of Z and L
illumination And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I. log-albedo / log-reflectance
25
Forward Optics Far Near shape / depth log-shading image of Z and L
illumination And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I. log-albedo / log-reflectance Lambertian reflectance in log-intensity
26
? ? ? ? Shape, Albedo, and Illumination from Shading SAIFS (“safes”)
Far ? ? ? Near shape / depth log-shading image of Z and L illumination ? And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I. log-albedo / log-reflectance Lambertian reflectance in log-intensity
27
Assume illumination and albedo are known, and solve for the shape
Past Work Shape from Shading ? Assume illumination and albedo are known, and solve for the shape Intrinsic Images And this is the image produced from Z and A by Lambertian reflectance. We just take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A, and we end up with this image I. ? ? Ignore shape and illumination, and classify edges as either shading or albedo
28
Given a single image, search for the most likely explanation
Problem Formulation Given a single image, search for the most likely explanation (shape, albedo, and illumination) that together exactly reproduces that image. Input: Output: Shape Albedo Shading Illumination
29
Some Explanations The first half of that model we’ll call f(Z), which is the negative log-likelihood of a statistical model of depth maps. This can be thought of as a “cost function” on shapes, which is large when shapes are unnatural, and 0 when shapes are as natural as possible.
30
Some Explanations The first half of that model we’ll call f(Z), which is the negative log-likelihood of a statistical model of depth maps. This can be thought of as a “cost function” on shapes, which is large when shapes are unnatural, and 0 when shapes are as natural as possible.
31
Some Explanations The first half of that model we’ll call f(Z), which is the negative log-likelihood of a statistical model of depth maps. This can be thought of as a “cost function” on shapes, which is large when shapes are unnatural, and 0 when shapes are as natural as possible.
32
Some Explanations The first half of that model we’ll call f(Z), which is the negative log-likelihood of a statistical model of depth maps. This can be thought of as a “cost function” on shapes, which is large when shapes are unnatural, and 0 when shapes are as natural as possible.
33
Evaluation: MIT intrinsic images data set
34
Evaluation: Real World Images
35
Evaluation: Real World Images
36
Evaluation: Real World Images
37
Evaluation: Graphics!
38
Evaluation: Graphics!
39
Evaluation: Graphics!
40
Evaluation: Graphics!
41
Semantic Segmentation using Regions and Parts
P. Arbeláez, B. Hariharan, S. Gupta, C. Gu, L. Bourdev and J. Malik
42
Top-down Part/Object Detectors
This Work Top-down Part/Object Detectors Bottom-up Region Segmentation 0.32 0.57 0.93 Cat Segmenter
43
Region Generation Hierarchical segmentation tree based on contrast
Hierarchy computed at three image resolutions Nodes of the trees are object candidates, and also pairs and triplets of adjacent nodes Output: For each image, a set of ~1000 object candidates (connected regions)
44
Results on PASCAL VOC
45
How to think about Vision
“Theory” Models Feature Histograms Support Vector machines Randomized decision trees Spectral partitioning L1 minimization Stochastic Grammars Deep Learning Markov Random Fields … Recognition Reorganization Reconstruction
46
Thanks!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.