The Three R’s of Vision Jitendra Malik

The Three R’s of Vision: Recognition, Reorganization, Reconstruction

Recognition, Reconstruction & Reorganization

Review
- Recognition: 2D problems such as handwriting recognition and face detection; partial progress on 3D object category recognition.
- Reconstruction: Feature matching + multiple-view geometry has led to city-scale point cloud reconstructions.
- Reorganization: Graph cuts for interactive segmentation; bottom-up boundaries and regions/superpixels.

Caltech-101 [Fei-Fei et al. 04] 102 classes, 31-300 images/class

Caltech 101 classification results (even better by combining cues..)

PASCAL Visual Object Classes (VOC) Challenge (Everingham et al.)

Builds on the Dalal & Triggs HOG detector (2005)
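For readers who want the mechanics, below is a minimal sketch of the core of a HOG-style feature: per-cell histograms of gradient orientations, weighted by gradient magnitude. It deliberately omits block normalization and the other details of the actual Dalal & Triggs detector, so treat it as an illustration rather than a reimplementation.

```python
import numpy as np

def hog_like_features(gray, cell=8, n_bins=9):
    """Very reduced HOG-style descriptor: for each cell x cell patch of a
    grayscale image, a histogram of unsigned gradient orientations weighted
    by gradient magnitude. Not the full Dalal & Triggs pipeline."""
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)                       # gradients along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation in [0, 180)

    h, w = gray.shape
    cells_y, cells_x = h // cell, w // cell
    hist = np.zeros((cells_y, cells_x, n_bins))
    bin_width = 180.0 / n_bins

    for i in range(cells_y):
        for j in range(cells_x):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            b = np.minimum((a / bin_width).astype(int), n_bins - 1)
            np.add.at(hist[i, j], b, m)              # magnitude-weighted votes
    return hist.ravel()
```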

Critique of the State of the Art
- Performance is quite poor compared to performance on 2D recognition tasks and to the needs of many applications.
- Pose estimation / localization of parts or keypoints is even worse. We can't isolate decent stick figures from radiance images, making the use of depth data necessary.
- Progress has slowed down. Variations of HOG / deformable part models dominate.

Recall saturates around 70%

State of the Art in Reconstruction
- Multiple photographs: Agarwal et al. (2010)
- Range sensors: Kinect (PrimeSense), Velodyne lidar
Critique: semantic segmentation is needed to make this more useful.

State of the Art in Reorganization
- Interactive segmentation using graph cuts: Rother, Kolmogorov & Blake (2004); Boykov & Jolly (2001); Boykov, Veksler & Zabih (2001)
- Berkeley gPb edges & regions: Arbelaez et al. (2009); Martin, Fowlkes & Malik (2004); Shi & Malik (2000)
Critique: what is needed is fully automatic semantic segmentation.
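As a reminder of what the Shi & Malik (2000) reference computes, here is a toy spectral (normalized-cuts style) bipartition of a grayscale image. The affinity is simplified to intensity differences over 4-connected pixels, which is far cruder than the cues used in the cited work; it is only meant to show the shape of the computation.

```python
import numpy as np
from scipy.sparse import lil_matrix, diags
from scipy.sparse.linalg import eigsh

def ncut_bipartition(gray, sigma=0.1):
    """Toy normalized-cuts style bipartition: sparse affinity over
    4-connected pixels (gray assumed float in [0, 1]), normalized
    Laplacian eigenproblem, threshold the second eigenvector at zero."""
    h, w = gray.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    W = lil_matrix((n, n))

    def connect(a, b, va, vb):
        wgt = np.exp(-((va - vb) ** 2) / (2 * sigma ** 2))
        W[a, b] = wgt
        W[b, a] = wgt

    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                connect(idx[i, j], idx[i + 1, j], gray[i, j], gray[i + 1, j])
            if j + 1 < w:
                connect(idx[i, j], idx[i, j + 1], gray[i, j], gray[i, j + 1])

    W = W.tocsr()
    d = np.asarray(W.sum(axis=1)).ravel()
    D_inv_sqrt = diags(1.0 / np.sqrt(d + 1e-12))
    # normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = diags(np.ones(n)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = eigsh(L, k=2, which='SM')      # two smallest eigenpairs (slow but fine at toy scale)
    order = np.argsort(vals)
    second = D_inv_sqrt @ vecs[:, order[1]]     # map back to the generalized eigenproblem
    return second.reshape(h, w) > 0             # boolean two-way split of the pixels
```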

The Three R’s of Vision: Recognition, Reconstruction, Reorganization. Each of the 6 directed arcs in this diagram is a useful direction of information flow.

The Three R’s of Vision: Recognition, Reconstruction, Reorganization, with arcs labeled “shape as feature”, “superpixel assemblies as candidates”, and “morphable shape models”.

Next steps in recognition
- Incorporate the “shape bias” known from the child-development literature to improve generalization. This requires monocular computation of shape, as once posited in the 2.5D sketch, and distinguishing albedo and illumination changes from geometric contours (Barron & Malik, CVPR 2012).
- Top-down templates should predict keypoint locations and image support, not just information about category (Poselets: Bourdev & Malik, 2009 and later).
- Recognition and figure-ground inference need to co-evolve. Occlusion is signal, not noise (Arbelaez et al., CVPR 2012).

Reconstruction: Shape, Albedo & Illumination

Shape, Albedo, and Illumination from Shading. Jonathan Barron and Jitendra Malik, UC Berkeley.

Forward Optics
The ingredients: a shape / depth map Z (labeled far to near), an illumination L, the log-shading image of Z and L, and the log-albedo (log-reflectance), combined by Lambertian reflectance in log-intensity. This is the image produced from Z and A by Lambertian reflectance: we take the shape Z, render it to produce the shading image S(Z), multiply that shading image by the albedo map A, and we end up with the image I.
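Spelled out as equations (a sketch of the model the slide labels describe, with $I$, $A$, and $S(Z,L)$ in the log domain as indicated):

```latex
\begin{align*}
I_{\text{linear}}(x,y) &= A_{\text{linear}}(x,y)\, S_{\text{linear}}(Z,L)(x,y)
  && \text{(Lambertian: image = albedo} \times \text{shading)}\\
I(x,y) &= A(x,y) + S(Z,L)(x,y)
  && \text{(the same model in log-intensity)}
\end{align*}
```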

Shape, Albedo, and Illumination from Shading: SAIFS (pronounced “safes”). Now the shape / depth (far to near), the illumination, the log-shading image of Z and L, and the log-albedo / log-reflectance are all unknown (“?”); only the image produced by Lambertian reflectance in log-intensity is observed.

Past Work
- Shape from shading: assume illumination and albedo are known, and solve for the shape.
- Intrinsic images: ignore shape and illumination, and classify edges as either shading or albedo.

Problem Formulation
Given a single image, search for the most likely explanation (shape, albedo, and illumination) that together exactly reproduces that image.
Input: a single image. Output: shape, albedo, shading, illumination.
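Schematically, this search can be written as a constrained minimization; note that only $f(Z)$ is named on the following slide, while $g(A)$ and $h(L)$ are analogous albedo and illumination costs introduced here purely as placeholders for illustration.

```latex
\begin{align*}
\min_{Z,\,A,\,L}\quad & f(Z) + g(A) + h(L)
  && \text{(costs on shape, albedo, illumination)}\\
\text{s.t.}\quad & I = A + S(Z, L)
  && \text{(the explanation exactly reproduces the log-image)}
\end{align*}
```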

Some Explanations
The first half of that model we'll call f(Z), the negative log-likelihood of a statistical model of depth maps. This can be thought of as a "cost function" on shapes, which is large when shapes are unnatural and zero when shapes are as natural as possible.

Evaluation: MIT intrinsic images data set

Evaluation: Real World Images

Evaluation: Graphics!

Semantic Segmentation using Regions and Parts P. Arbeláez, B. Hariharan, S. Gupta, C. Gu, L. Bourdev and J. Malik

This Work: top-down part/object detectors + bottom-up region segmentation.
[Figure: a "Cat Segmenter" scores candidate regions, e.g. 0.32, 0.57, 0.93.]
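To make the combination concrete, here is a toy scoring loop, not the actual system from the paper: every bottom-up region candidate is scored by a top-down category classifier, and candidates are ranked by that score. The feature extractor and classifier are placeholder callables standing in for whatever the real pipeline uses.

```python
def score_candidates(candidate_masks, image, extract_features, classifier):
    """Score each bottom-up region candidate with a top-down category
    classifier (e.g. a cat detector) and return candidates ranked by score.

    candidate_masks  : list of boolean HxW arrays, one per region candidate
    extract_features : callable(image, mask) -> feature vector (placeholder)
    classifier       : callable(features) -> scalar score (placeholder)
    """
    scored = []
    for mask in candidate_masks:
        feats = extract_features(image, mask)      # pooled features over the region
        score = classifier(feats)                  # e.g. probability of "cat"
        scored.append((score, mask))
    scored.sort(key=lambda item: item[0], reverse=True)   # best candidate first
    return scored
```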

Region Generation
- Hierarchical segmentation tree based on contrast, with the hierarchy computed at three image resolutions
- Nodes of the trees are object candidates, and so are pairs and triplets of adjacent nodes
- Output: for each image, a set of ~1000 object candidates (connected regions)
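A minimal sketch of the candidate enumeration described above, assuming the tree nodes and their adjacency are already available; this illustrates the combinatorics only, and is brute force rather than the code from the paper.

```python
from itertools import combinations

def generate_candidates(region_ids, adjacent):
    """Enumerate object candidates from the nodes of a segmentation
    hierarchy: single nodes, plus pairs and triplets of adjacent nodes.

    region_ids : iterable of node ids in the hierarchy
    adjacent   : dict mapping node id -> set of ids of adjacent nodes
    """
    region_ids = list(region_ids)
    candidates = [frozenset([r]) for r in region_ids]     # single nodes

    # pairs of adjacent nodes
    for a, b in combinations(region_ids, 2):
        if b in adjacent[a]:
            candidates.append(frozenset([a, b]))

    # triplets that form a connected group
    # (two edges among three nodes are enough for connectivity)
    for a, b, c in combinations(region_ids, 3):
        edges = (b in adjacent[a]) + (c in adjacent[a]) + (c in adjacent[b])
        if edges >= 2:
            candidates.append(frozenset([a, b, c]))

    return candidates
```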

Results on PASCAL VOC

How to think about Vision
“Theory”
Models: feature histograms, support vector machines, randomized decision trees, spectral partitioning, L1 minimization, stochastic grammars, deep learning, Markov random fields, …
Recognition, Reorganization, Reconstruction

Thanks!