Small Codes and Large Image Databases for Recognition (CVPR 2008)
Antonio Torralba, MIT; Rob Fergus, NYU; Yair Weiss, Hebrew University
Outline Introduction Methods Experiment Conclusion
Summary
Goal – efficient, real-time image search over web-sized collections that is fast, requires little memory, and runs on standard hardware or handheld devices.
Approach – use machine learning to convert the Gist descriptor into a compact binary code of a few hundred bits per image.
Gist descriptor
A global image representation: describe the shapes occurring in an image with one descriptor.
– Subdivide the image into 4×4 sub-images
– Compute Gabor filter responses in each of them
– Build histograms of the Gabor responses in each sub-image
Slide by James Hays and Alexei Efros
Gist descriptor
In this paper:
– 8 orientations × 4 frequencies × 16 sub-images = 512-dimensional vector.
– For smaller images (32×32 pixels), 3 frequencies are used: 8×3×16 = 384 dimensions.
A rough sketch of how this dimensionality arises is given below.
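A minimal sketch (not the authors' code) of how that dimensionality arises: one averaged filter-response value per (filter, grid cell) pair. The filter bank here is a placeholder of random kernels standing in for the real Gabor filters.

```python
import numpy as np

def gist_like(image, filters, grid=4):
    """One value per (filter, block): output length = n_filters * grid * grid."""
    h, w = image.shape
    bh, bw = h // grid, w // grid
    feats = []
    for f in filters:
        # crude "filter response": FFT-based correlation magnitude
        resp = np.abs(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(f, s=image.shape)))
        for by in range(grid):
            for bx in range(grid):
                feats.append(resp[by*bh:(by+1)*bh, bx*bw:(bx+1)*bw].mean())
    return np.array(feats)

rng = np.random.default_rng(0)
filters = rng.standard_normal((8 * 4, 9, 9))          # 8 orientations x 4 frequencies
print(gist_like(rng.standard_normal((128, 128)), filters).shape)   # (512,)
```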
Binary code
Three reasons:
– Compression: images can be represented with a very small number of bits while still retaining the information needed for recognition.
– Memory: scaling up to web-sized databases requires doing the calculations in memory; fitting hundreds of millions of images into a few GB of memory leaves a budget of only a few bytes per image.
– Speed: short binary codes allow very fast querying on standard hardware, either with hash tables or with efficient bit-count operations (see the sketch below).
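A minimal sketch, not the paper's implementation, of why short codes query fast: with each image reduced to an integer code, an exhaustive scan is just XOR plus popcount, and exact-match lookup is a hash table. The toy 6-bit codes are illustrative.

```python
from collections import defaultdict

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

def linear_scan(query: int, codes, k: int = 2):
    """Indices of the k codes closest to `query` in Hamming distance."""
    return sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))[:k]

database = [0b101011, 0b101010, 0b010101, 0b111000]   # toy 6-bit codes
print(linear_scan(0b101010, database))                # -> [1, 0]

# Hash-table variant: bucket images by their exact code; near-duplicates
# collide, and nearby buckets can be probed by flipping a few query bits.
buckets = defaultdict(list)
for idx, code in enumerate(database):
    buckets[code].append(idx)
print(buckets[0b101010])                              # -> [1]
```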
Locality Sensitive Hashing (LSH)
– Finds nearest neighbors in a high-dimensional Euclidean space in constant time.
– Uses a number of random projections of the point onto R^1; each projection contributes a few bits.
– When the number of bits is fixed and small, LSH can perform quite poorly.
– In this paper: N = 30 bits (see the sketch below).
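A minimal sketch of an LSH-style baseline under the description above: each of the N bits thresholds one random projection of the GIST descriptor. The projection directions and thresholds are placeholders, not the paper's settings.

```python
import numpy as np

def lsh_bits(gist: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project a GIST vector onto N random directions and threshold each one."""
    return (gist @ R > t).astype(np.uint8)        # shape (N,), values in {0, 1}

rng = np.random.default_rng(0)
D, N = 512, 30                                    # 512-d GIST, 30-bit code
R = rng.standard_normal((D, N))                   # random projection directions
t = np.zeros(N)                                   # thresholds (here simply 0)

code = lsh_bits(rng.standard_normal(D), R, t)
print(code)
```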
Outline Introduction Methods Experiment Conclusion
Learning binary codes
Given a database of images {x_i}, a distance function D(i, j), and a binary feature vector y_i = f(x_i) compared under Hamming distance:
– N_100(x_i): the 100 nearest neighbors of x_i according to the distance function D(i, j)
– N_100(y_i): the 100 descriptors y_j that are closest to y_i in terms of Hamming distance
– We would like N_100(x_i) = N_100(y_i) for all examples in our training set (see the sketch below).
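A minimal sketch of one way to measure that objective: the average fraction of the k ground-truth neighbors (under D) that survive among the k Hamming neighbors of the code. The toy "encoder" that thresholds the first 30 dimensions is purely a placeholder.

```python
import numpy as np

def top_k(dist_row, k):
    return set(np.argsort(dist_row)[1:k + 1])        # skip the point itself

def neighbour_overlap(X_dist, codes, k=100):
    """Average |N_k(x_i) ∩ N_k(y_i)| / k over the dataset."""
    ham = (codes[:, None, :] != codes[None, :, :]).sum(-1)   # pairwise Hamming distances
    return float(np.mean([len(top_k(X_dist[i], k) & top_k(ham[i], k)) / k
                          for i in range(len(codes))]))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 512))                  # toy GIST vectors
X_dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
codes = (X[:, :30] > 0).astype(np.uint8)             # placeholder "encoder"
print(neighbour_overlap(X_dist, codes, k=10))
```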
BoostSSC: Boosting Similarity Sensitive Coding
Learns a mapping from the original input space into a new space in which distances between images can be computed using a weighted Hamming distance.
– Binary feature vector of M bits: y_i = [h_1(x_i), ..., h_M(x_i)], with h_m(x) ∈ {0, 1}
– Weighted Hamming distance: D(y_i, y_j) = \sum_{m=1}^{M} \alpha_m \, |h_m(x_i) - h_m(x_j)|
BoostSSC
– Positive examples: pairs of images (x_i, x_j) with j ∈ N(x_i)
– Negative examples: pairs of images that are not neighbors
– Regression stump for bit m, thresholding a single dimension d_m of the descriptor: f_m(x) = a_m [x^{(d_m)} > T_m] + b_m; the indicator [x^{(d_m)} > T_m] gives the binary bit h_m(x)
BoostSSC
Minimize the square loss
J = \sum_{k=1}^{K} \big( z_k - F(x_i^k, x_j^k) \big)^2,
where F is the additive pairwise classifier built from the M stumps (its per-bit weights give the \alpha_m of the weighted Hamming distance).
– K is the number of training pairs
– z_k = 1 if the two images are neighbors; z_k = −1 otherwise
– In this paper: M around 30 bits (see the sketch below)
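A minimal sketch of the retrieval side of BoostSSC under the stump parameterization written above: each bit thresholds one descriptor dimension, and distances use the learned per-bit weights. The dimensions, thresholds and weights below are random placeholders; the boosting loop that fits them by minimizing the square loss is omitted.

```python
import numpy as np

def encode(gist, dims, thresholds):
    """M-bit code: bit m is 1 when gist[dims[m]] exceeds thresholds[m]."""
    return (gist[dims] > thresholds).astype(np.uint8)

def weighted_hamming(y_i, y_j, alpha):
    """D(y_i, y_j) = sum_m alpha_m * |y_i[m] - y_j[m]|."""
    return float(np.sum(alpha * (y_i != y_j)))

M, D = 30, 512
rng = np.random.default_rng(0)
dims = rng.integers(0, D, size=M)        # which descriptor dimension each stump uses
thresholds = rng.standard_normal(M)      # stump thresholds (placeholders)
alpha = np.ones(M)                       # boosting weights (placeholders)

a, b = rng.standard_normal(D), rng.standard_normal(D)
print(weighted_hamming(encode(a, dims, thresholds),
                       encode(b, dims, thresholds), alpha))
```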
Restricted Boltzmann Machines
A network of binary stochastic units:
– Visible units v
– Hidden units h
– Symmetric weights W and biases b
Restricted Boltzmann Machines
A probability can be assigned to a binary vector at the visible units:
P(v) = \frac{1}{Z} \sum_h e^{-E(v, h)}, with E(v, h) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}
– Convenient conditional distributions: P(h_j = 1 | v) = \sigma(b_j + \sum_i v_i w_{ij}) and P(v_i = 1 | h) = \sigma(b_i + \sum_j h_j w_{ij}), where \sigma is the logistic function
– Learn the weights and biases using Contrastive Divergence (see the sketch below)
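A minimal sketch of a single CD-1 (Contrastive Divergence) update for a binary RBM, following the conditionals above. Layer sizes, learning rate and the single-example update are illustrative choices, not the paper's training settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1, rng=np.random.default_rng(0)):
    # Positive phase: hidden probabilities and a sample given the data vector v0
    p_h0 = sigmoid(b_hid + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction
    p_v1 = sigmoid(b_vis + h0 @ W.T)
    p_h1 = sigmoid(b_hid + p_v1 @ W)
    # Approximate gradient: data statistics minus reconstruction statistics
    W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_vis += lr * (v0 - p_v1)
    b_hid += lr * (p_h0 - p_h1)
    return W, b_vis, b_hid

n_vis, n_hid = 16, 8                        # toy layer sizes
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
v0 = (rng.random(n_vis) < 0.5).astype(float)
W, b_vis, b_hid = cd1_step(v0, W, b_vis, b_hid)
print(W.shape, b_vis.shape, b_hid.shape)
```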
Multi-layer RBM architecture
Training RBM models
Pre-training:
– Unsupervised
– Uses Contrastive Divergence to learn the weights and biases
– Gets the parameters into the right ballpark
Fine-tuning:
– Supervised
– Units are no longer stochastic
– Backpropagate the error to update the parameters
– Moves the parameters to a local minimum (see the sketch below)
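A minimal sketch of the encoder used once training is done: with the units treated deterministically, a descriptor is mapped to its binary code by a feed-forward pass through the stacked layers and a final threshold. Layer sizes and weights are placeholders, not the paper's network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(gist, layers):
    """Deterministic forward pass; `layers` is a list of (W, b) pairs."""
    h = gist
    for W, b in layers:
        h = sigmoid(h @ W + b)
    return (h > 0.5).astype(np.uint8)      # the last layer gives the binary code

rng = np.random.default_rng(0)
sizes = [512, 256, 30]                     # placeholder layer sizes
layers = [(0.01 * rng.standard_normal((m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
print(encode(rng.standard_normal(512), layers))
```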
Outline Introduction Methods Experiment Conclusion
Two test datasets
LabelMe:
– 22,000 images
– Ground-truth segmentations for all of them
– A distance between images can be defined from these segmentations
Web data [28]:
– 12.9 million 32×32 color images
– A subset of the 80 million images
– No labels, so the L2 distance between Gist vectors is used as ground truth
[28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny Images. Technical Report MIT-CSAIL-TR, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 2007.
LabelMe retrieval
Ground-truth semantic similarity – spatial pyramid matching over the object labels (a simplified sketch follows below).
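A simplified sketch of a ground-truth similarity of this flavour: histograms of object labels accumulated over a spatial pyramid and compared with histogram intersection. The level weighting and normalization are simplified choices, not the paper's exact definition.

```python
import numpy as np

def label_histogram(label_map, n_labels):
    return np.bincount(label_map.ravel(), minlength=n_labels).astype(float)

def pyramid_label_similarity(a, b, n_labels, levels=3):
    """a, b: integer label maps of equal shape (one object label per pixel)."""
    h, w = a.shape
    total = 0.0
    for lvl in range(levels):
        cells = 2 ** lvl
        weight = 2.0 ** (lvl - levels + 1)      # finer grids count more
        ch, cw = h // cells, w // cells
        for y in range(cells):
            for x in range(cells):
                ha = label_histogram(a[y*ch:(y+1)*ch, x*cw:(x+1)*cw], n_labels)
                hb = label_histogram(b[y*ch:(y+1)*ch, x*cw:(x+1)*cw], n_labels)
                total += weight * np.minimum(ha, hb).sum() / (h * w)
    return total

rng = np.random.default_rng(0)
a = rng.integers(0, 5, size=(64, 64))
b = rng.integers(0, 5, size=(64, 64))
print(pyramid_label_similarity(a, b, n_labels=5))
```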
LabelMe retrieval results (on 2,000 test images, N = 50).
Web images retrieval
Retrieval speed evaluation, using multi-threading (M/T) on a quad-core machine.
Pixel labels: on 2,000 test images.
Web images recognition: on 2,000 test images.
Outline Introduction Methods Experiment Conclusion
Conclusion
It is possible to build compact codes for retrieval:
– Fast and small on a standard PC
– Suitable for use on large databases
– Much room for improvement