HOGgles: Visualizing Object Detection Features

Presentation transcript:

HOGgles: Visualizing Object Detection Features. C. Vondrick, A. Khosla, T. Malisiewicz, A. Torralba. ICCV 2013. Presented by Ezgi Mercan.

Object Detection Failures. Why do our detectors think water looks like a car? Is the problem the data set? The machine learning method? The features? (Shown: a high-scoring car detection from DPM, the discriminatively trained part-based model.) Computers do not actually see the RGB images; they see only the feature vectors we give them. So what do computers see in the features? 11/14/2018 CSE590V

Object Detection Failures. The 20 object categories (PASCAL VOC): aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV/monitor. (Slide shows failure examples for the chair, person, and car detectors.)

Reconstructing an Image. Pipeline: compute SIFT over elliptic regions of interest, affine-normalize each region to a square patch, and extract a local descriptor. For each descriptor, find its nearest-neighbor descriptor in a database and warp the corresponding image patch to fit the destination ellipse. Seamlessly stitch the copied patches, blend them, and fill in the missing parts. Voilà! Reconstructing an image from its local descriptors, Philippe Weinzaepfel, Hervé Jégou and Patrick Pérez, CVPR 2011.
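The nearest-neighbor descriptor search at the heart of this reconstruction can be sketched in a few lines of NumPy. The descriptors below are random stand-ins for 128-D SIFT vectors, and `nearest_descriptors` is a hypothetical helper for illustration, not code from the paper.

```python
import numpy as np

def nearest_descriptors(queries, database):
    """For each query descriptor, return the index of its nearest
    neighbor in the database (squared Euclidean distance).

    queries:  (Q, d) array of local descriptors from the target image
    database: (N, d) array of descriptors from a large patch database
    """
    # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2, expanded with broadcasting
    q2 = (queries ** 2).sum(axis=1, keepdims=True)   # (Q, 1)
    x2 = (database ** 2).sum(axis=1)                 # (N,)
    d2 = q2 - 2.0 * queries @ database.T + x2        # (Q, N) distance table
    return d2.argmin(axis=1)

# Toy check: a database that contains the queries themselves
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 128))        # 100 SIFT-like 128-D descriptors
idx = nearest_descriptors(db[[3, 7]], db)
print(idx)  # -> [3 7]
```

In the actual pipeline, each matched index selects a stored image patch, which is then warped into the query's elliptic region before blending.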

Histogram of Oriented Gradients. Used by Dalal and Triggs for human detection in their CVPR 2005 paper.
- Gradient computation: 1-D derivative masks in the horizontal and vertical directions (more complex masks, such as 3x3 Sobel or diagonal masks, can also be used).
- Orientation binning: divide the window into cells; each pixel in a cell casts a weighted vote based on its gradient. The gradient can be signed or unsigned, with bins spread over 0-180 or 0-360 degrees; the vote can be the gradient magnitude itself or a function of it.
- Descriptor blocks: group cells into overlapping blocks, rectangular or circular.
- Block normalization: the particular normalization scheme makes little difference for human detection. What works best is 3x3-cell blocks of 6x6-pixel cells with 9 histogram channels, or 2 radial bins with 4 angular bins, a central radius of 4 pixels, and an expansion factor of 2.
(Figures: original image; R-HOG descriptor, i.e. the HOG glyph; R-HOG descriptor weighted by positive SVM weights; R-HOG descriptor weighted by negative SVM weights.)
Histograms of oriented gradients for human detection, N. Dalal and B. Triggs, CVPR 2005.
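The core of the pipeline above (gradient computation, orientation binning, per-cell voting) can be sketched in plain NumPy. This is a minimal illustration under simplifying assumptions: it omits block normalization and Gaussian vote weighting, and `hog_cells` is a hypothetical helper, not the Dalal-Triggs implementation.

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Minimal unsigned-gradient HOG: one 9-bin orientation histogram
    per cell x cell pixel cell, no block normalization."""
    img = img.astype(np.float64)
    # 1-D centered derivative masks [-1, 0, 1] in each direction
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    # unsigned orientation, folded into [0, 180) degrees
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    ch, cw = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = (ang // (180.0 / bins)).astype(int) % bins
    for i in range(ch * cell):
        for j in range(cw * cell):
            # each pixel votes for its orientation bin, weighted by magnitude
            hist[i // cell, j // cell, bin_idx[i, j]] += mag[i, j]
    return hist

# Toy check: a vertical step edge has purely horizontal gradients,
# so every vote should land in the 0-degree bin.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
h = hog_cells(img)
```

Rendering each cell's histogram as oriented line segments gives the familiar "HOG glyph" visualization shown on the slide.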

Algorithms. Image: $x \in \mathbb{R}^D$. HOG descriptor: $y = \phi(x) \in \mathbb{R}^d$. HOG inverse: $\phi^{-1}(y)$.

Exemplar LDA: $\phi_A^{-1}(y) = \frac{1}{K}\sum_{i=1}^{K} z_i$. Train an ELDA detector for the query descriptor, use it to find the top $K$ detections $z_i$ in a large database, and average them. Very simple and produces reasonably good results, but very slow, because it requires running an object detector across the whole database.

Ridge Regression: $\phi_B^{-1}(y) = \Sigma_{XY}\Sigma_{YY}^{-1}(y - \mu_Y) + \mu_X$. Assume $P(\text{image}, \text{HOG})$ is jointly Gaussian and compute $P(\text{image} \mid \text{HOG})$; the inverse is the statistically most likely image given the descriptor. Once the covariances and means are computed, inversion is a single matrix multiplication, taking under one second.

Inversions are blurry because HOG is invariant to shifts up to its bin size, so an inverted patch is effectively the average of all shifted patches that produce the same HOG descriptor.
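The ridge-regression inverse reduces to one fixed matrix applied to a centered descriptor. A minimal NumPy sketch, assuming paired training data of flattened images X and descriptors Y; `fit_ridge_inverse` and the small `reg` stabilizer are assumptions of this sketch, not the paper's exact procedure.

```python
import numpy as np

def fit_ridge_inverse(X, Y, reg=1e-6):
    """Fit the Gaussian / ridge-regression HOG inverse from paired data:
    rows of X are flattened images, rows of Y are their descriptors.
    Returns a function y -> Sigma_XY Sigma_YY^{-1} (y - mu_Y) + mu_X."""
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    n = X.shape[0]
    cov_xy = Xc.T @ Yc / n                               # Sigma_XY
    cov_yy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])    # Sigma_YY (regularized)
    A = cov_xy @ np.linalg.inv(cov_yy)   # one matrix, reused for every query
    return lambda y: A @ (y - mu_y) + mu_x

# Toy check: if images are an exact linear function of descriptors,
# the fitted inverse recovers them almost exactly.
rng = np.random.default_rng(1)
W = rng.normal(size=(50, 10))
Y = rng.normal(size=(200, 10))
X = Y @ W.T
inv_hog = fit_ridge_inverse(X, Y)
err = np.abs(inv_hog(Y[0]) - X[0]).max()
```

This is why the slide calls inversion "a single matrix multiplication": all the cost is in estimating the means and covariances once.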

Algorithms: Direct Optimization. Image: $x \in \mathbb{R}^D$. HOG descriptor: $y \in \mathbb{R}^d$. Image basis: $U \in \mathbb{R}^{D \times K}$. Coefficients: $\rho \in \mathbb{R}^K$, with $x = U\rho$.

$\phi_C^{-1}(y) = U\rho^*$, where $\rho^* = \arg\min_{\rho \in \mathbb{R}^K} \|\phi(U\rho) - y\|_2^2$.

Direct optimization searches only over images that lie in a natural-image basis, taking the first $K$ eigenvectors, so any candidate image is encoded by its $K$ coefficients. The results are noisy because the method recovers high-frequency detail.
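The direct-optimization objective above uses the nonlinear HOG map $\phi$. As an illustration only, the sketch below swaps in a hypothetical *linear* feature map F, which turns the problem into ordinary least squares while keeping the structure (optimize coefficients over a fixed image basis) visible; this is a simplification, not the paper's solver.

```python
import numpy as np

rng = np.random.default_rng(2)
D, d, K = 64, 16, 8
U = rng.normal(size=(D, K))   # stand-in for the first K natural-image eigenvectors
F = rng.normal(size=(d, D))   # hypothetical LINEAR feature map replacing HOG

x_true = U @ rng.normal(size=K)   # an image that lies exactly in the basis
y = F @ x_true                    # its feature descriptor

# rho* = argmin_rho || F U rho - y ||_2^2  (least squares in the linear case)
rho, *_ = np.linalg.lstsq(F @ U, y, rcond=None)
x_hat = U @ rho                   # the recovered image
```

With the real nonlinear $\phi$, this least-squares step would be replaced by an iterative optimizer over $\rho$, but the recovered image is still constrained to the span of $U$.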

Algorithms: Paired Dictionary Learning. Image: $x \in \mathbb{R}^D$. HOG descriptor: $y \in \mathbb{R}^d$. Image basis: $U \in \mathbb{R}^{D \times K}$. HOG basis: $V \in \mathbb{R}^{d \times K}$. Shared coefficients: $\alpha \in \mathbb{R}^K$, with $x = U\alpha$ and $y = V\alpha$.

$\phi_D^{-1}(y) = U\alpha^*$, where $\alpha^* = \arg\min_{\alpha \in \mathbb{R}^K} \|V\alpha - y\|_2^2$ s.t. $\|\alpha\|_1 \le \lambda$.

Paired dictionaries require finding appropriate bases $U$ and $V$. Efficient solvers exist for $\alpha$; computing it takes under two seconds on a quad-core CPU.
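The sparse-coding step above can be sketched with a simple ISTA (proximal gradient) solver. Two hedges: the paper solves the constrained form $\|\alpha\|_1 \le \lambda$ with the SPAMS toolbox, whereas this sketch solves the equivalent penalized (Lagrangian) form, and `sparse_code` plus the toy dictionaries are hypothetical stand-ins.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(V, y, lam=0.01, iters=2000):
    """ISTA for min_a ||V a - y||_2^2 + lam * ||a||_1."""
    Lip = 2.0 * np.linalg.norm(V, 2) ** 2   # Lipschitz constant of the gradient
    a = np.zeros(V.shape[1])
    for _ in range(iters):
        grad = 2.0 * V.T @ (V @ a - y)
        a = soft(a - grad / Lip, lam / Lip)  # gradient step, then shrink
    return a

# Toy check: recover a sparse code from a random HOG-like dictionary.
rng = np.random.default_rng(4)
V = rng.normal(size=(20, 5))
V /= np.linalg.norm(V, axis=0)            # unit-norm atoms
a_true = np.array([1.0, 0.0, 0.0, -2.0, 0.0])
y = V @ a_true
a_hat = sparse_code(V, y)

U = rng.normal(size=(40, 5))              # hypothetical paired image basis
x_hat = U @ a_hat                         # the inversion: same coefficients, image basis
```

The key idea is visible in the last line: the coefficients are inferred in HOG space but applied in image space.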

Paired Dictionary Learning. (Figure: a HOG feature $y$ is decomposed over the HOG basis $V$; applying the same coefficients to the image basis $U$ yields the HOG inversion $\phi_D^{-1}(y)$.)

Paired Dictionary Learning. The paired dictionary learning problem:

$\arg\min_{U,V,\alpha} \sum_{i=1}^{N} \left( \|x_i - U\alpha_i\|_2^2 + \|\phi(x_i) - V\alpha_i\|_2^2 \right)$ s.t. $\|\alpha_i\|_1 \le \lambda \;\forall i$, $\|U\|_2^2 \le \gamma_1$, $\|V\|_2^2 \le \gamma_2$.

(Slide shows some learned pairs of dictionary elements from $U$ and $V$.) Learning $U$ and $V$ takes a few hours with dictionary size $K = 10^3$ and $N = 10^6$ training samples. The objective simplifies to a standard sparse coding and dictionary learning problem with concatenated dictionaries, which the authors optimize using SPAMS.
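The concatenation trick mentioned above is a one-line algebraic identity: stacking each image with its descriptor, and $U$ on top of $V$, turns the two reconstruction terms into a single squared norm, i.e. $\|x - U\alpha\|_2^2 + \|\phi(x) - V\alpha\|_2^2 = \|[x; \phi(x)] - [U; V]\alpha\|_2^2$. A quick numerical check, with random stand-ins for $x$ and $\phi(x)$:

```python
import numpy as np

rng = np.random.default_rng(3)
D, d, K = 12, 6, 4
U = rng.normal(size=(D, K))
V = rng.normal(size=(d, K))
x = rng.normal(size=D)    # stand-in image
y = rng.normal(size=d)    # stand-in for phi(x)
a = rng.normal(size=K)    # shared coefficients

# Paired objective: two reconstruction terms...
lhs = np.sum((x - U @ a) ** 2) + np.sum((y - V @ a) ** 2)
# ...equal one term over the concatenated signal and dictionary.
rhs = np.sum((np.concatenate([x, y]) - np.vstack([U, V]) @ a) ** 2)
print(np.isclose(lhs, rhs))  # -> True
```

This is what lets an off-the-shelf dictionary learning solver handle the paired problem unchanged.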

Feature Visualization. (Figure: inversions of the same HOG descriptors by each algorithm; columns: Original, ELDA, Ridge Regression, Direct Optimization, Paired Dictionary Learning.)

Pixel-Level Comparison. Compare each inverted HOG image to the original image pixel by pixel. ELDA does the best.
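One standard way to score such a pixel-level comparison is normalized cross-correlation, which rewards matching structure while ignoring overall brightness and contrast. The paper's exact evaluation protocol may differ, so treat `ncc` below as an illustrative stand-in.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two same-sized images:
    1.0 means identical up to a brightness offset and contrast gain."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

img = np.arange(16.0).reshape(4, 4)
s1 = ncc(img, img)           # identical images
s2 = ncc(img, 2 * img + 5)   # gain and offset applied: score unchanged
```

The invariance to gain and offset matters here because HOG inversions recover structure far more reliably than absolute intensities.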

People look at inverted object windows and try to categorize them. They have a hard time categorizing HOG glyph images. Experts (MIT vision lab students) were asked to classify glyphs; Mechanical Turk workers do better with direct-optimization and paired-dictionary inversions than with glyphs, and even better than the experts do on glyphs.

Deformable Parts Model: Does HOG capture color? (Examples: potted plant, chair.) DPM is the famous object detector. Train paired dictionaries on RGB images and invert. Outdoor scenes produce reasonable colors, probably because those categories' colors are strongly correlated with their HOG descriptors. Indoor scenes full of arbitrarily colored man-made objects do not, probably because of the many-to-one nature of HOG and the high variance of the image patches that produce the same descriptor. Object detection with discriminatively trained part-based models, P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan, PAMI 2010.

Computers can see better than we can, because HOG is invariant to illumination: it normalizes patches locally.

Questions? Visit web.mit.edu/vondrick/ihog/ for more cool stuff.