Fast and Compact Retrieval Methods in Computer Vision Part II A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition.

Slides:

Advertisements

Similar presentations

CSC2535: 2013 Advanced Machine Learning Lecture 8b Image retrieval using multilayer neural networks Geoffrey Hinton.

Advertisements

Context-based object-class recognition and retrieval by generalized correlograms by J. Amores, N. Sebe and P. Radeva Discussion led by Qi An Duke University.

Aggregating local image descriptors into compact codes

Nearest Neighbor Search in High Dimensions Seminar in Algorithms and Geometry Mica Arie-Nachimson and Daniel Glasner April 2009.

VisualRank: Applying PageRank to Large-Scale Image Search Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE.

Searching on Multi-Dimensional Data

MIT CSAIL Vision interfaces Towards efficient matching with random hashing methods… Kristen Grauman Gregory Shakhnarovich Trevor Darrell.

Organizing a spectral image database by using Self-Organizing Maps Research Seminar Oili Kohonen.

ImageNet Classification with Deep Convolutional Neural Networks

Similarity Search in High Dimensions via Hashing

Query Specific Fusion for Image Retrieval

Presented by Arshad Jamal, Rajesh Dhania, Vinkal Vishnoi Active hashing and its application to image and text retrieval Yi Zhen, Dit-Yan Yeung, Published.

Structure learning with deep neuronal networks 6 th Network Modeling Workshop, 6/6/2013 Patrick Michl.

Large dataset for object and scene recognition A. Torralba, R. Fergus, W. T. Freeman 80 million tiny images Ron Yanovich Guy Peled.

Large-scale matching CSE P 576 Larry Zitnick

Effective Image Database Search via Dimensionality Reduction Anders Bjorholm Dahl and Henrik Aanæs IEEE Computer Society Conference on Computer Vision.

Recognition: A machine learning approach

Non-metric affinity propagation for unsupervised image categorization Delbert Dueck and Brendan J. Frey ICCV 2007.

Small Codes and Large Image Databases for Recognition CVPR 2008 Antonio Torralba, MIT Rob Fergus, NYU Yair Weiss, Hebrew University.

Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

MACHINE LEARNING 9. Nonparametric Methods. Introduction Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 

1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.

1 Jun Wang, 2 Sanjiv Kumar, and 1 Shih-Fu Chang 1 Columbia University, New York, USA 2 Google Research, New York, USA Sequential Projection Learning for.

Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.

Large Image Databases and Small Codes for Object Recognition Rob Fergus (NYU) Antonio Torralba (MIT) Yair Weiss (Hebrew U.) William T. Freeman (MIT)

Similarity Search in High Dimensions via Hashing Aristides Gionis, Protr Indyk and Rajeev Motwani Department of Computer Science Stanford University presented.

Supervised Distance Metric Learning Presented at CMU’s Computer Vision Misc-Read Reading Group May 9, 2007 by Tomasz Malisiewicz.

1 Segmentation with Scene and Sub-Scene Categories Joseph Djugash Input Image Scene/Sub-Scene Classification Segmentation.

1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman

Y. Weiss (Hebrew U.) A. Torralba (MIT) Rob Fergus (NYU)

Opportunities of Scale, Part 2 Computer Vision James Hays, Brown Many slides from James Hays, Alyosha Efros, and Derek Hoiem Graphic from Antonio Torralba.

Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.

Radial-Basis Function Networks

Large Scale Recognition and Retrieval. What does the world look like? High level image statistics Object Recognition for large-scale search Focus on scaling.

Face Recognition Using Neural Networks Presented By: Hadis Mohseni Leila Taghavi Atefeh Mirsafian.

Efficient Image Search and Retrieval using Compact Binary Codes

Indexing Techniques Mei-Chen Yeh.

Image Based Positioning System Ankit Gupta Rahul Garg Ryan Kaminsky.

CSE 185 Introduction to Computer Vision Pattern Recognition.

A Genetic Algorithms Approach to Feature Subset Selection Problem by Hasan Doğu TAŞKIRAN CS 550 – Machine Learning Workshop Department of Computer Engineering.

This week: overview on pattern recognition (related to machine learning)

Internet-scale Imagery for Graphics and Vision James Hays cs195g Computational Photography Brown University, Spring 2010.

Human abilities Presented By Mahmoud Awadallah 1.

Machine Learning1 Machine Learning: Summary Greg Grudic CSCI-4830.

Computer Vision CS 776 Spring 2014 Recognition Machine Learning Prof. Alex Berg.

80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.

Minimal Loss Hashing for Compact Binary Codes

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search The best paper prize at CVPR 2008.

Linear Discrimination Reading: Chapter 2 of textbook.

Query Sensitive Embeddings Vassilis Athitsos, Marios Hadjieleftheriou, George Kollios, Stan Sclaroff.

© 2009 IBM Corporation IBM Research Xianglong Liu 1, Yadong Mu 2, Bo Lang 1 and Shih-Fu Chang 2 1 Beihang University, Beijing, China 2 Columbia University,

CVPR2013 Poster Detecting and Naming Actors in Movies using Generative Appearance Models.

CSC2535: Computation in Neural Networks Lecture 12: Non-linear dimensionality reduction Geoffrey Hinton.

Efficient Image Search and Retrieval using Compact Binary Codes Rob Fergus (NYU) Jon Barron (NYU/UC Berkeley) Antonio Torralba (MIT) Yair Weiss (Hebrew.

Convolutional Neural Network

Deep Learning Overview Sources: workshop-tutorial-final.pdf

Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.

Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:

Learning Mid-Level Features For Recognition

Article Review Todd Hricik.

Multimodal Learning with Deep Boltzmann Machines

Machine Learning Basics

Recognition using Nearest Neighbor (or kNN)

Structure learning with deep autoencoders

Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science

K Nearest Neighbor Classification

Rob Fergus Computer Vision

INF 5860 Machine learning for image classification

Instance Based Learning

Minwise Hashing and Efficient Search

Presentation transcript:

Fast and Compact Retrieval Methods in Computer Vision Part II A. Torralba, R. Fergus and Y. Weiss. Small Codes and Large Image Databases for Recognition. CVPR 2008 Small Codes and Large Image Databases for Recognition A. Torralba, R. Fergus, W. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. TR80 million tiny images: a large dataset for non-parametric object and scene recognition. Presented by Ken and Ryan

Outline Large Datasets of Images Searching Large Datasets –Nearest Neighbor –ANN: Locality Sensitive Hashing Dimensionality Reduction –Boosting –Restricted Boltzmann Machines (RBM) Results

Goal Develop efficient image search and scene matching techniques that are fast and require very little memory Particularly on VERY large image sets Query

Motivation Image sets –Vogel & Schiele: 702 natural scenes in 6 cat –Olivia & Torralba: 2688 –Caltech 101: ~50 images/cat ~ 5000 –Caltech 256: images/cat ~ Why do we want larger datasets?

Motivation Classify any image Complex classification methods don’t extend well Can we use a simple classification method?

Thumbnail Collection Project Collect images for ALL objects –List obtained from WordNet –75,378 non-abstract nouns in English

Thumbnail Collection Project Collected 80M images

How Much is 80M Images? One feature-length movie: –105 min = 151K 24 FPS For 80M images, watch 530 movies How do we store this? –1k * 80M = 80 GB –Actual storage: 760GB

First Attempt Store each image as 32x32 color thumbnail Based on human visual perception Information: 32*32*3 channels =3072 entries

First Attempt Used SSD++ to find nearest neighbors of query image –Used first 19 principal components

Motivation Part 2 Is this good enough? SSD is naïve Still too much storage required How can we fix this? –Traditional methods of searching large datasets –Binary reduction

Locality-Sensitive Hash Families

LSH Example

Binary Reduction Lots of pixels 512 values32 bits Gist vector Binary reduction 164 GB 320 MB 80 million images?

Gist “The ‘gist’ is an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)” A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. Journal of Computer Vision, 42(3):145–175, 2001.

Gist

Gist vector

Query Image Dataset Querying

1  ?

6  ?

Boosting Positive and negative image pairs train the discovery of the binary reduction. & & = 1 = -1 80% negatives150K pairs

BoostSSC Similarity Sensitive Coding Weights start uniformly xixi N values Weight 

BoostSSC For each bit m: –Choose the index n that minimizes a weighted error across entire training set Feature vector x from image i Binary reduction h(x) N values M bits m n

BoostSSC Weak classifications are evaluated via regression stumps: xixi N values n xjxj We need to figure out , , and T for each n. If x i and x j are similar, we should get 1 for most n’s.

BoostSSC Try a range of threshold T: –Regress f across entire training set to find each  and . –Keep the T that fits the best. Then, keep the n that causes the least weighted error. xixi xjxj n N values n n

BoostSSC xixi xjxj N values M bits m n

BoostSSC Update weights. –Affects future error calculations xixi xjxj N values n Weight 

BoostSSC In the end, each bit has an n index and a threshold. xixi N values M bits

BoostSSC

Restricted Boltzmann Machine (RBM) Architecture Network of binary stochastic units Hinton & Salakhutdinov, Nature 2006 Parameters:w:Symmetric Weights b:Biases h:Hidden Units v:Visible Units

Multi-Layer RBM Architecture

Training RBM Models Two phases 1.Pre-training Unsupervised Use Contrastive Divergence to learn weights and biases Gets parameters in the right ballpark 2.Fine-tuning Supervised No longer stochastic Backpropogate error to update parameters Moves parameters to local minimum

Greedy Pre-training (Unsupervised)

Neighborhood Components Analysis Goldberger, Roweis,Salakhutdinov & Hinton, NIPS 2004 Output of RBM W are RBM weights

Neighborhood Components Analysis Goldberger, Roweis,Salakhutdinov & Hinton, NIPS 2004 Assume K=2 classes

Neighborhood Components Analysis Goldberger, Roweis,Salakhutdinov & Hinton, NIPS 2004 Pulls nearby points of same class closer

Neighborhood Components Analysis Goldberger, Roweis,Salakhutdinov & Hinton, NIPS 2004 Pulls nearby points of same class closer Goal is to preserve neighborhood structure of original, high-dimensional space

Experiments and Results

Searching Bit limitations: –Hashing scheme: Max. capacity for 13M images: 30 bits –Exhaustive search: 256 bits possible

Searching Results

LabelMe Retrieval

Examples of Web Retrieval 12 neighbors using different distance metrics

Web Images Retrieval

Conclusion Efficient searching for large image datasets Compact image representation Methods for binary reductions –Locality-Sensitive Hashing –Boosting –Restricted Boltzmann Machines Searching techniques

How Much is 80M Images? 1 feature-length movie: –105 min = 151K 24 FPS For 80M images, watch 530 movies

Parameters:Weights wBiases b

Input to RBM: Gist Vectors