CS 1674: Intro to Computer Vision Scene Recognition

Slides:



Advertisements
Similar presentations
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Advertisements

Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.
Multi-layer Orthogonal Codebook for Image Classification Presented by Xia Li.
MIT CSAIL Vision interfaces Approximate Correspondences in High Dimensions Kristen Grauman* Trevor Darrell MIT CSAIL (*) UT Austin…
CS395: Visual Recognition Spatial Pyramid Matching Heath Vinicombe The University of Texas at Austin 21 st September 2012.
Computer Vision CS 543 / ECE 549 University of Illinois Derek Hoiem
1 Part 1: Classical Image Classification Methods Kai Yu Dept. of Media Analytics NEC Laboratories America Andrew Ng Computer Science Dept. Stanford University.
Computer Vision – Image Representation (Histograms)
CS4670 / 5670: Computer Vision Bag-of-words models Noah Snavely Object
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Global spatial layout: spatial pyramid matching Spatial weighting the features Beyond bags of features: Adding spatial information.
Discriminative and generative methods for bags of features
Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Effective Image Database Search via Dimensionality Reduction Anders Bjorholm Dahl and Henrik Aanæs IEEE Computer Society Conference on Computer Vision.
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
Lecture 28: Bag-of-words models
Agenda Introduction Bag-of-words model Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
CS294‐43: Visual Object and Activity Recognition Prof. Trevor Darrell Spring 2009 March 17 th, 2009.
Bag-of-features models
Local Features and Kernels for Classification of Object Categories J. Zhang --- QMUL UK (INRIA till July 2005) with M. Marszalek and C. Schmid --- INRIA.
Pyramids of Features For Categorization Greg Griffin and Will Coulter (see Lazebnik et al., CVPR 2006, too)
Discriminative and generative methods for bags of features
Large Scale Recognition and Retrieval. What does the world look like? High level image statistics Object Recognition for large-scale search Focus on scaling.
Review: Intro to recognition Recognition tasks Machine learning approach: training, testing, generalization Example classifiers Nearest neighbor Linear.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Exercise Session 10 – Image Categorization
Real-time Action Recognition by Spatiotemporal Semantic and Structural Forest Tsz-Ho Yu, Tae-Kyun Kim and Roberto Cipolla Machine Intelligence Laboratory,
Internet-scale Imagery for Graphics and Vision James Hays cs195g Computational Photography Brown University, Spring 2010.
Computer Vision CS 776 Spring 2014 Recognition Machine Learning Prof. Alex Berg.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
Yao, B., and Fei-fei, L. IEEE Transactions on PAMI(2012)
SVM-KNN Discriminative Nearest Neighbor Classification for Visual Category Recognition Hao Zhang, Alex Berg, Michael Maire, Jitendra Malik.
Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao.
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Representations for object class recognition David Lowe Department of Computer Science University of British Columbia Vancouver, Canada Sept. 21, 2006.
In Defense of Nearest-Neighbor Based Image Classification Oren Boiman The Weizmann Institute of Science Rehovot, ISRAEL Eli Shechtman Adobe Systems Inc.
Why is computer vision difficult?
A Sparse Texture Representation Using Affine-Invariant Regions Svetlana Lazebnik, Jean Ponce Svetlana Lazebnik, Jean Ponce Beckman Institute University.
Modeling the Shape of a Scene: Seeing the trees as a forest Scene Understanding Seminar
Methods for classification and image representation
Hierarchical Matching with Side Information for Image Classification
CS 1699: Intro to Computer Vision Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh October 29, 2015.
CS 1699: Intro to Computer Vision Bias-Variance Trade-off + Other Models and Problems Prof. Adriana Kovashka University of Pittsburgh November 3, 2015.
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Lecture 15: Eigenfaces CS6670: Computer Vision Noah Snavely.
CS654: Digital Image Analysis
Goggle Gist on the Google Phone A Content-based image retrieval system for the Google phone Manu Viswanathan Chin-Kai Chang Ji Hyun Moon.
A PPLICATIONS OF TOPIC MODELS Daphna Weinshall B Slides credit: Joseph Sivic, Li Fei-Fei, Brian Russel and others.
NICTA SML Seminar, May 26, 2011 Modeling spatial layout for image classification Jakob Verbeek 1 Joint work with Josip Krapac 1 & Frédéric Jurie 2 1: LEAR.
Lecture IX: Object Recognition (2)
Prof. Adriana Kovashka University of Pittsburgh September 14, 2016
Tues April 25 Kristen Grauman UT Austin
Learning Mid-Level Features For Recognition
Face Recognition and Feature Subspaces
Paper Presentation: Shape and Matching
Mixtures of Gaussians and Advanced Feature Encoding
Digit Recognition using SVMS
By Suren Manvelyan, Crocodile (nile crocodile?) By Suren Manvelyan,
Object detection as supervised classification
Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science
CVPR 2014 Orientational Pyramid Matching for Recognizing Indoor Scenes
CS 2770: Computer Vision Intro to Visual Recognition
CS 2750: Machine Learning Support Vector Machines
Brief Review of Recognition + Context
KFC: Keypoints, Features and Correspondences
SIFT keypoint detection
A Graph-Matching Kernel for Object Categorization
Presentation transcript:

CS 1674: Intro to Computer Vision Scene Recognition Prof. Adriana Kovashka University of Pittsburgh October 26, 2016

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories CVPR 2006 Svetlana Lazebnik (slazebni@uiuc.edu) Beckman Institute, University of Illinois at Urbana-Champaign Cordelia Schmid (cordelia.schmid@inrialpes.fr) INRIA Rhône-Alpes, France Jean Ponce (ponce@di.ens.fr) Ecole Normale Supérieure, France http://www-cvr.ai.uiuc.edu/ponce_grp

Scene category dataset Fei-Fei & Perona (2005), Oliva & Torralba (2001) http://www-cvr.ai.uiuc.edu/ponce_grp/data Slide credit: L. Lazebnik

Bags of words Slide credit: L. Lazebnik

Bag-of-words steps Extract local features Learn “visual vocabulary” using clustering Quantize local features using visual vocabulary Represent images by frequencies of “visual words” Slide credit: L. Lazebnik

Feature extraction (on which BOW is based) Weak features Strong features Edge points at 2 scales and 8 orientations (vocabulary size 16) SIFT descriptors of 16x16 patches sampled on a regular grid, quantized to form visual vocabulary (size 200, 400) Slide credit: L. Lazebnik

Local feature extraction … Slide credit: Josef Sivic

Learning the visual vocabulary … Slide credit: Josef Sivic

Learning the visual vocabulary … Clustering Slide credit: Josef Sivic

Learning the visual vocabulary … Clustering Slide credit: Josef Sivic

Image categorization with bag of words Training Compute bag-of-words representation for training images Train classifier on labeled examples using histogram values as features Labels are the scene types (e.g. mountain vs field) Testing Extract keypoints/descriptors for test images Quantize into visual words using the clusters computed at training time Compute visual word histogram for test images Compute labels on test images using classifier obtained at training time Measure accuracy of test predictions by comparing them to ground-truth test labels (obtained from humans) Adapted from D. Hoiem

What about spatial layout? So far, global histogram over the whole image discard spatial information Images contain structures and looking at a scene spatial distribution of feature we want to capture but we throw them away Result in the fact that I can construct a very different image and still get the same histogram This image is obvious a scene on grassland but with the noisy and gradient image on the bottom, color histogram All of these images have the same color histogram Slide credit: D. Hoiem

Spatial pyramid Compute histogram in each spatial bin Overcome this, global, pyramid of histogram Split into spatial bins at several levels and compute the histogram in each of them Compute histogram in each spatial bin Slide credit: D. Hoiem

Spatial pyramid [Lazebnik et al. CVPR 2006] In more details, we still have the global histogram, but then I also have histograms computed in this 2x2 grid and histogram computed in 4x4 grid Each of the histograms are individual normalized. They are weighted because you want each level to have equal weights, and combined together. This spatial pyramid bag of bow is often considered as the goto feature to image categorization, works best for scenes, also works for most object. One critism high dimensionality and you can deal with by doing PCA dimension reduction [Lazebnik et al. CVPR 2006] Slide credit: D. Hoiem

Pyramid matching Indyk & Thaper (2003), Grauman & Darrell (2005) Matching using pyramid and histogram intersection for some particular visual word: xi xj Original images Feature histograms: Level 3 Level 2 Level 1 Level 0 Total weight (value of pyramid match kernel): K( xi , xj ) Adapted from L. Lazebnik

Scene category dataset Fei-Fei & Perona (2005), Oliva & Torralba (2001) http://www-cvr.ai.uiuc.edu/ponce_grp/data Multi-class classification results (100 training images per class) Fei-Fei & Perona: 65.2% Slide credit: L. Lazebnik

Scene category confusions Difficult indoor images kitchen living room bedroom Slide credit: L. Lazebnik

Caltech101 dataset Fei-Fei et al. (2004) http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html Multi-class classification results (30 training images per class) Slide credit: L. Lazebnik