The Three R’s of Vision, Jitendra Malik

1 The Three R’s of Vision Jitendra Malik

2 The Three R’s of Vision Recognition Reorganization Reconstruction

3 Recognition, Reconstruction & Reorganization

4 Review
Recognition: 2D problems such as handwriting recognition and face detection; partial progress on 3D object category recognition.
Reconstruction: feature matching plus multiple-view geometry has led to city-scale point-cloud reconstructions.
Reorganization: graph cuts for interactive segmentation; bottom-up boundaries and regions/superpixels.

5 Caltech-101 [Fei-Fei et al. 04]
102 classes, with a varying number of images per class

6 Caltech-101 classification results (even better by combining cues)

7 PASCAL Visual Object Challenge (Everingham et al.)

8

9 Builds on Dalal & Triggs HOG detector (2005)
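
For context, a minimal sketch of computing a HOG descriptor with scikit-image; the parameter values are the common defaults and are illustrative assumptions, not the cited detector's exact settings. A sliding-window detector scores such descriptors with a linear SVM.

```python
# Minimal HOG descriptor sketch with scikit-image; cell and block sizes are the
# common defaults, used for illustration rather than any detector's exact settings.
from skimage import color, data
from skimage.feature import hog

image = color.rgb2gray(data.astronaut())     # any grayscale test image
features = hog(
    image,
    orientations=9,            # gradient-orientation bins per cell
    pixels_per_cell=(8, 8),    # histograms over 8x8-pixel cells
    cells_per_block=(2, 2),    # contrast-normalize over 2x2 blocks of cells
    block_norm='L2-Hys',
)
print(features.shape)          # a single long descriptor for this image
```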

10 Critique of the State of the Art
Performance is quite poor compared with that on 2D recognition tasks and with the needs of many applications.
Pose estimation / localization of parts or keypoints is even worse: we cannot isolate decent stick figures from radiance images, which makes the use of depth data necessary.
Progress has slowed; variations of HOG / deformable part models dominate.

11 Recall saturates around 70%

12 Recall saturates around 70%

13 State of the Art in Reconstruction
Multiple photographs: Agarwal et al. (2010)
Range sensors: Kinect (PrimeSense), Velodyne lidar
Critique: semantic segmentation is needed to make this more useful.

14 State of the Art in Reorganization
Interactive segmentation using graph cuts: Rother, Kolmogorov & Blake (2004); Boykov & Jolly (2001); Boykov, Veksler & Zabih (2001)
Berkeley gPb edges and regions: Arbelaez et al. (2009); Martin, Fowlkes & Malik (2004); Shi & Malik (2000)
Critique: what is needed is fully automatic semantic segmentation.
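
As a toy illustration of the spectral machinery behind Shi & Malik (2000), the sketch below bipartitions a tiny image by thresholding the Fiedler vector of a normalized graph Laplacian built from intensity affinities; the affinity function and scale are assumptions chosen only for illustration, not the gPb pipeline.

```python
# Toy normalized-cuts-style bipartition: build a pixel affinity graph from
# intensity similarity and split on the Fiedler vector of the normalized Laplacian.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian

def fiedler_bipartition(img, sigma=0.1):
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = img.ravel()
    rows, cols, vals = [], [], []
    # 4-connected grid graph with Gaussian affinities on intensity differences.
    for di, dj in [(0, 1), (1, 0)]:
        a = idx[: h - di, : w - dj].ravel()
        b = idx[di:, dj:].ravel()
        wgt = np.exp(-((flat[a] - flat[b]) ** 2) / sigma ** 2)
        rows += [a, b]; cols += [b, a]; vals += [wgt, wgt]
    W = csr_matrix((np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
                   shape=(h * w, h * w))
    L = laplacian(W, normed=True)                  # normalized graph Laplacian
    _, evecs = np.linalg.eigh(L.toarray())         # dense is fine for a toy image
    fiedler = evecs[:, 1]                          # second-smallest eigenvector
    return (fiedler > np.median(fiedler)).reshape(h, w)

img = np.zeros((20, 20)); img[:, 10:] = 1.0        # two flat halves
labels = fiedler_bipartition(img)                  # boolean mask of the two groups
```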

15 The Three R’s of Vision Recognition Reconstruction Reorganization
Each of the 6 directed arcs in this diagram is a useful direction of information flow

16 The Three R’s of Vision Recognition Reconstruction Reorganization
Example arcs from the diagram: shape as a feature, superpixel assemblies as candidates, morphable shape models.

17 Next steps in recognition
Incorporate the “shape bias” known from the child-development literature to improve generalization. This requires monocular computation of shape, as once posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours.
Top-down templates should predict keypoint locations and image support, not just information about category.
Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise.

18 Next steps in recognition
Incorporate the “shape bias” known from the child-development literature. This requires monocular computation of shape, as once posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours (Barron & Malik, CVPR 2012).
Top-down templates should predict keypoint locations and image support, not just information about category (Poselets: Bourdev & Malik, 2009 and later).
Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise (Arbelaez et al., CVPR 2012).

19 Reconstruction Shape, Albedo & Illumination

20 Shape, Albedo, and Illumination from Shading
Jonathan Barron and Jitendra Malik, UC Berkeley

21 Forward Optics
Shape / depth map Z (displayed on a far-to-near scale).

22 Forward Optics
Shape / depth map Z, plus the illumination.

23 Forward Optics
Shape / depth map Z, illumination L, and the log-shading image of Z and L rendered from them.

24 Forward Optics
Shape / depth map Z, illumination L, log-shading image of Z and L, and the log-albedo / log-reflectance.

25 Forward Optics
Lambertian reflectance in log-intensity: the log-image is the sum of the log-shading image of Z and L and the log-albedo / log-reflectance. The image is produced from Z and A by Lambertian reflectance: we take the shape Z, render it to produce the shading image S(Z), and multiply that shading image by the albedo map A (equivalently, add their logs), and we end up with the image I.
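
A minimal sketch of this forward model in NumPy, assuming an orthographic camera and a single distant light for simplicity; the function name and conventions are illustrative, not the paper's renderer.

```python
# Minimal forward-rendering sketch: shading S(Z, L) from a depth map and a
# directional light, then log-image = log-albedo + log-shading. Orthographic
# camera and a single distant light are simplifying assumptions.
import numpy as np

def render_lambertian_log_image(Z, log_albedo, light, eps=1e-6):
    """Z: HxW depth (larger = nearer); log_albedo: HxW; light: unit 3-vector."""
    dz_dy, dz_dx = np.gradient(Z)                       # depth gradients
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(Z)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    shading = np.clip(normals @ light, eps, None)       # Lambertian cosine, clamped
    return log_albedo + np.log(shading)                 # sum in the log domain

# Toy usage: a sphere-like bump with uniform albedo, lit from the upper left.
y, x = np.mgrid[-1:1:128j, -1:1:128j]
Z = np.sqrt(np.clip(1.0 - x ** 2 - y ** 2, 0.0, None))
A = np.zeros_like(Z)                                    # log-albedo 0 means albedo 1
L = np.array([-0.5, -0.5, 1.0]); L /= np.linalg.norm(L)
log_I = render_lambertian_log_image(Z, A, L)
```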

26 Shape, Albedo, and Illumination from Shading: SAIFS (“safes”)
The inverse problem: given only the image, the shape / depth, illumination, log-shading, and log-albedo / log-reflectance are all unknown (the question marks in the diagram).

27 Past Work
Shape from shading: assume illumination and albedo are known, and solve for the shape.
Intrinsic images: ignore shape and illumination, and classify edges as either shading or albedo.
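
A toy, Retinex-flavoured version of the intrinsic-images idea just described: attribute large log-image gradients to albedo edges and small ones to shading. The threshold and function name are assumptions for illustration; recovering an albedo image would additionally require reintegrating the kept gradients.

```python
# Retinex-flavoured sketch: attribute large log-intensity gradients to albedo
# (reflectance) edges and small ones to smooth shading. The threshold is an
# illustrative assumption; real systems tune or learn this decision.
import numpy as np

def classify_gradients(log_image, threshold=0.3):
    gy, gx = np.gradient(log_image)
    magnitude = np.hypot(gx, gy)
    albedo_edges = magnitude > threshold     # attributed to reflectance changes
    shading_edges = ~albedo_edges            # attributed to illumination / shape
    return albedo_edges, shading_edges

# Toy usage: a synthetic log-image with a sharp albedo step plus a gentle shading ramp.
y, x = np.mgrid[0:64, 0:64]
log_image = (x > 32).astype(float) - 0.005 * y
albedo_edges, shading_edges = classify_gradients(log_image)
```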

28 Problem Formulation
Given a single image, search for the most likely explanation (shape, albedo, and illumination) that exactly reproduces that image.
Input: the image. Output: shape, albedo, shading, illumination.
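
One way to picture this search, as a sketch under simplifying assumptions rather than the paper's actual optimizer or priors: because the explanation must reproduce the image exactly, the log-albedo is pinned down as A = log I - S(Z, L), so the search can run over shape and illumination only, scored by priors on shape, albedo, and light.

```python
# Sketch of the search over explanations: since log_I = A + S(Z, L) must hold
# exactly, eliminate A = log_I - S(Z, L) and minimize priors over (Z, L) only.
# The priors, renderer, and optimizer below are placeholders, not the actual model.
import numpy as np
from scipy.optimize import minimize

def explain_image(log_I, render_log_shading, shape_cost, albedo_cost, light_cost, Z0, L0):
    h, w = log_I.shape

    def objective(params):
        Z = params[: h * w].reshape(h, w)          # candidate shape
        L = params[h * w :]                        # candidate illumination
        A = log_I - render_log_shading(Z, L)       # albedo implied by the exact constraint
        return shape_cost(Z) + albedo_cost(A) + light_cost(L)

    res = minimize(objective, np.concatenate([Z0.ravel(), L0]), method='L-BFGS-B')
    Z = res.x[: h * w].reshape(h, w)
    L = res.x[h * w :]
    return Z, log_I - render_log_shading(Z, L), L  # shape, log-albedo, illumination

# Toy usage with placeholder priors (smoothness on Z and A, small-norm light) and a
# stand-in renderer; a real run would plug in a Lambertian renderer and learned priors.
log_I = np.zeros((8, 8))
shade = lambda Z, L: np.zeros_like(Z)
smooth = lambda X: float(np.sum(np.diff(X, axis=0) ** 2) + np.sum(np.diff(X, axis=1) ** 2))
Z, A, L = explain_image(log_I, shade, smooth, smooth, lambda L: float(L @ L),
                        np.zeros((8, 8)), np.array([0.0, 0.0, 1.0]))
```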

29 Some Explanations
The first part of the model is f(Z), the negative log-likelihood of a statistical model of depth maps. It can be thought of as a cost function on shapes, which is large when shapes are unnatural and zero when shapes are as natural as possible.
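
A deliberately simple stand-in for such a cost on shapes, assumed here only for illustration (the actual priors on natural shapes are richer): penalize the second differences of the depth map, so planar surfaces cost zero and crumpled ones cost a lot.

```python
# Illustrative shape cost f(Z): penalize second differences of the depth map, so
# smooth surfaces score near zero and crumpled ones score high. A stand-in only.
import numpy as np

def shape_cost(Z):
    d2x = np.diff(Z, n=2, axis=1)
    d2y = np.diff(Z, n=2, axis=0)
    return float(np.sum(d2x ** 2) + np.sum(d2y ** 2))

print(shape_cost(np.zeros((32, 32))), shape_cost(np.random.randn(32, 32)))  # ~0 vs large
```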

30 Some Explanations

31 Some Explanations

32 Some Explanations

33 Evaluation: MIT intrinsic images data set

34 Evaluation: Real World Images

35 Evaluation: Real World Images

36 Evaluation: Real World Images

37 Evaluation: Graphics!

38 Evaluation: Graphics!

39 Evaluation: Graphics!

40 Evaluation: Graphics!

41 Semantic Segmentation using Regions and Parts
P. Arbeláez, B. Hariharan, S. Gupta, C. Gu, L. Bourdev and J. Malik

42 This Work
Top-down part/object detectors combined with bottom-up region segmentation; a cat segmenter scores candidate regions (e.g. 0.32, 0.57, 0.93 in the illustration).

43 Region Generation
Hierarchical segmentation tree based on contrast.
Hierarchy computed at three image resolutions.
Nodes of the trees are object candidates, and so are pairs and triplets of adjacent nodes.
Output: for each image, a set of ~1000 object candidates (connected regions), enumerated roughly as sketched below.
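
The enumeration of candidates can be pictured with the sketch below; the data structures and names are illustrative, not the authors' implementation. Every node is a candidate on its own, and pairs or triplets of nodes are added when their union forms a connected region.

```python
# Sketch of candidate generation from a segmentation hierarchy: every tree node
# is a candidate region, and so is every pair or triplet of nodes whose union is
# connected. Data structures are illustrative, not the authors' implementation.
from itertools import combinations

def generate_candidates(nodes, adjacent):
    """nodes: iterable of region ids; adjacent(a, b) -> True if the regions touch."""
    candidates = [frozenset([n]) for n in nodes]            # single regions
    for a, b in combinations(nodes, 2):                     # adjacent pairs
        if adjacent(a, b):
            candidates.append(frozenset([a, b]))
    for trio in combinations(nodes, 3):                     # connected triplets
        edges_in_trio = sum(adjacent(a, b) for a, b in combinations(trio, 2))
        if edges_in_trio >= 2:   # three regions form a connected union iff >= 2 edges
            candidates.append(frozenset(trio))
    return candidates

# Toy usage: four regions in a chain, so 1-2, 2-3 and 3-4 touch.
edges = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4)]}
adjacent = lambda a, b: frozenset([a, b]) in edges
print(len(generate_candidates([1, 2, 3, 4], adjacent)))     # 4 singles + 3 pairs + 2 triplets
```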

44 Results on PASCAL VOC

45 How to think about Vision
“Theory” / Models: feature histograms, support vector machines, randomized decision trees, spectral partitioning, L1 minimization, stochastic grammars, deep learning, Markov random fields; the diagram relates these to Recognition, Reorganization, and Reconstruction.

46 Thanks!

