Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting by Yann LeCun, Fu Jie Huang, and Léon Bottou in Proceedings of CVPR'04

Presentation transcript:

1 Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting
by Yann LeCun, Fu Jie Huang, and Léon Bottou, in Proceedings of CVPR'04, 2004
Presentation by Hasan Doğu TAŞKIRAN
CS 550 – Machine Learning, Department of Computer Engineering, Bilkent University
April 21, 2005

2 Outline
- About the paper…
- Recognition of Generic Object Categories
- The NORB Dataset
- Experiments and Results
  - Principal Component Analysis
  - K-Nearest Neighbors
  - Pairwise Support Vector Machines
  - Convolutional Networks
- Conclusion and Future Work

3 The paper is about…
- Describing NORB, which the authors present as the largest publicly available dataset for generic object recognition
- Reporting baseline performance of standard methods on this dataset
- Exploring how methods fare when the number of input variables is huge
- The performance of methods based on global template matching
- The performance when the problem size is at the upper limit of a method's applicability
- Learning invariance to 3D pose, lighting conditions, and other image variability
- Taking advantage of binocular inputs

4 Recognition of Generic Object Categories
- Recognizing generic object categories with invariance to pose, lighting, diverse backgrounds, and the presence of clutter is one of the major challenges of computer vision.
- A variety of cues have been used previously:
  - Color and texture
  - Distinctive local features
  - Separately acquired 3D models
  - Silhouettes and edges
  - Pose-invariant feature histograms
- Shape information?

5 Using Shape Information
- Recognizing generic categories such as cars, trucks, airplanes, human figures, or four-legged animals purely from shape information is a difficult problem.
- A further difficulty is the lack of a dataset with sufficient size and diversity to carry out meaningful experiments.

6 The NORB Dataset
- The only useful and reliable cue in the dataset is the shape of the object.
- NORB is considerably larger than past datasets and offers:
  - More variability
  - Stereo pairs
  - The ability to composite the objects and their cast shadows onto diverse backgrounds
- Images of 50 toys were collected using the capture setup detailed in the paper.

7 The NORB Dataset
- The collection consists of 10 instances of each of 5 generic categories:
  - Four-legged animals, human figures, airplanes, trucks, cars
  - All objects are painted uniform green to eliminate irrelevant color and texture
  - Each object instance was placed in a different initial pose
  - 1,944 stereo pairs were collected for each instance: 9 elevations, 36 azimuths, and 6 lighting conditions
  - A total of 194,400 RGB images of resolution 640x480 were collected (5 categories, 10 instances, 9 elevations, 36 azimuths, 6 lightings, and 2 cameras)
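The counts on this slide follow directly from the factors it lists. A quick check in Python (the variable names are illustrative, not from the paper):

```python
# Factors listed on the slide.
categories = 5
instances_per_category = 10
elevations = 9
azimuths = 36
lightings = 6
cameras = 2

# Stereo pairs captured for one object instance.
pairs_per_instance = elevations * azimuths * lightings

# Every stereo pair contributes one image per camera.
total_images = categories * instances_per_category * pairs_per_instance * cameras

print(pairs_per_instance)  # 1944
print(total_images)        # 194400
```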

8 The NORB Dataset
- Experiments were conducted with 4 datasets generated from the normalized object images:
  - Normalized-Uniform Set
  - Jittered-Uniform Set
  - Jittered-Textured Set
  - Jittered-Cluttered Set
- In each dataset, 5 instances of each category are used for training and the other 5 for testing.

9 The NORB Dataset

10 Experiments
- On raw image pairs:
  - Linear Classifier
  - K-Nearest Neighbor
  - Pairwise Support Vector Machines with Gaussian Kernels
  - Convolutional Networks
- On PCA coefficients:
  - K-Nearest Neighbor
  - Pairwise Support Vector Machines with Gaussian Kernels
- The Lush environment and the Torch library were used.

11 Experiments - PCA
- The covariance matrix is 18,432 x 18,432, which is too large to manipulate directly, so another method is needed.
- Find the principal direction of a centered cloud of points by finding two cluster centroids that are symmetric with respect to the origin, i.e., find $u$ that minimizes $\sum_i \min\left(\|x_i - u\|^2, \|x_i + u\|^2\right)$.
- This yields the first 100 principal components in a few CPU hours.
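The two-centroid idea on this slide can be sketched in a few lines of Python. This is a minimal batch version under stated assumptions: the function name, the random initialization, and the fixed iteration count are illustrative choices, not the authors' implementation. It covers only the first direction.

```python
import math
import random


def principal_direction(points, iterations=50, seed=0):
    """Estimate the first principal direction of a centered point cloud.

    Alternates between (1) assigning each point to the closer of the two
    centroids u and -u, and (2) re-estimating u as the mean of the points
    after flipping the sign of those assigned to -u.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # hypothetical random start
    for _ in range(iterations):
        acc = [0.0] * dim
        for x in points:
            # x is closer to u than to -u exactly when the dot product x . u >= 0.
            dot = sum(a * b for a, b in zip(x, u))
            sign = 1.0 if dot >= 0.0 else -1.0
            for j in range(dim):
                acc[j] += sign * x[j]
        u = [a / len(points) for a in acc]
    norm = math.sqrt(sum(a * a for a in u))
    return [a / norm for a in u]  # unit vector, defined up to sign
```

The returned vector is only defined up to sign, since swapping u and -u leaves the objective unchanged.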

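12 Experiments – K-Nearest Neighbors
- Running K-NN against 24,300 reference images of dimension 18,432 is prohibitively expensive.
- Pre-compute the distances from a few representative images $A_k$ to all other reference images $X_i$.
- By the triangle inequality, the distance between a query image $X$ and a reference image $X_i$ is bounded below by $\max_k \left| d(A_k, X) - d(A_k, X_i) \right|$.
- This bound can be used to choose which distances should be computed first.
- Experiments were run with K up to 18, but the best results were obtained for K = 1.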
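The pruning idea on this slide can be illustrated for K = 1 as follows. This is a sketch under assumptions: the function names, the choice of anchors, and the stopping rule are illustrative, and the paper's actual search procedure may differ.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def precompute(anchors, references):
    """table[k][i] = distance from anchor k to reference i (computed once)."""
    return [[euclidean(a, x) for x in references] for a in anchors]


def nearest_neighbor(query, references, anchors, table):
    """1-NN search that visits references in order of their lower bound."""
    q_to_anchor = [euclidean(a, query) for a in anchors]
    # Triangle inequality: d(query, X_i) >= |d(A_k, query) - d(A_k, X_i)| for every k.
    bounds = [
        max(abs(q_to_anchor[k] - table[k][i]) for k in range(len(anchors)))
        for i in range(len(references))
    ]
    order = sorted(range(len(references)), key=bounds.__getitem__)
    best_index, best_dist, computed = -1, math.inf, 0
    for i in order:
        if bounds[i] >= best_dist:
            break  # all remaining references are at least this far away
        dist = euclidean(query, references[i])
        computed += 1
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist, computed
```

The third return value counts how many full distances were actually evaluated, which is where the saving over brute-force search shows up.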

13 Experiments – Pairwise SVM
- SVM training failed to converge on the normalized-uniform dataset in manageable time, and SVMs were not trained on the jittered datasets.
- SVMs were instead applied to sub-sampled and PCA-derived versions of the data.
- 10 SVMs were trained independently, one per pair of categories, and combined with a voting strategy.
- The number of support vectors was between 800 and 2,000 for PCA-derived inputs.
- The number of support vectors was between 2,000 and 3,000 for 32x32 raw images.
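The pairwise voting scheme can be sketched as below. The 5 categories give 10 unordered pairs, hence 10 classifiers. The toy nearest-centroid classifiers are stand-ins for trained two-class SVMs, and the 1-D feature and all names here are hypothetical.

```python
from collections import Counter
from itertools import combinations

CATEGORIES = ["animal", "human", "airplane", "truck", "car"]


def make_toy_classifier(a, b, centroids):
    """Stand-in for a trained two-class SVM: picks the nearer of two centroids."""
    def classify(x):
        return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b
    return classify


def pairwise_vote(x, classifiers):
    """Each pairwise classifier casts one vote; the most-voted class wins."""
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


# Hypothetical 1-D feature with one centroid per category (0.0, 1.0, ..., 4.0).
centroids = {name: float(i) for i, name in enumerate(CATEGORIES)}
classifiers = {
    (a, b): make_toy_classifier(a, b, centroids)
    for a, b in combinations(CATEGORIES, 2)
}

print(len(classifiers))                 # 10
print(pairwise_vote(2.1, classifiers))  # airplane
```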

14 Experiments – Convolutional Network
- A succession of layers of trainable convolutions and spatial sub-sampling
- Extracts features of:
  - Increasingly large receptive fields
  - Increasing complexity
  - Increasing robustness to irrelevant variabilities
- The network has 90,575 trainable parameters (full propagation requires 3,896,920 multiply-adds).
- Trained with the Levenberg-Marquardt algorithm with a diagonal approximation of the Hessian, for 250,000 online updates
- No over-training, no early stopping
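To make the "convolution followed by sub-sampling" building block concrete, here is a toy sketch in plain Python. It is not the paper's architecture: the sizes are arbitrary, the sub-sampling is plain averaging with no trainable coefficient, bias, or non-linearity, and the convolution is written as a cross-correlation, as is common in neural-network code.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def subsample(feature_map, factor=2):
    """Average non-overlapping factor x factor blocks of the feature map."""
    out_h = len(feature_map) // factor
    out_w = len(feature_map[0]) // factor
    return [
        [
            sum(
                feature_map[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ) / factor ** 2
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1.0] * 6 for _ in range(6)]    # toy 6x6 input
kernel = [[1.0] * 3 for _ in range(3)]   # toy 3x3 kernel
features = conv2d_valid(image, kernel)   # 4x4 feature map
pooled = subsample(features)             # 2x2 after sub-sampling
print(len(features), len(features[0]))   # 4 4
print(len(pooled), len(pooled[0]))       # 2 2
```

Each unit in the 2x2 output depends on a 4x4 patch of the input, which illustrates how stacking such layers enlarges the receptive field.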

15 Results

16 Results

17 Results

18 Discussion
- These are the first systematic experiments that apply machine learning to shape-based generic object recognition with invariance to pose and lighting.
- The normalized-uniform dataset is unrealistically favorable to template-based methods because of its perfect conditions.
- The jittered datasets were too large to carry out experiments with the template-based methods.
- The sheer size and complexity of the jittered datasets place them above the practical limits of template-based methods.
- The binocular convolutional network takes advantage of disparity information to locate the outline of the object.

19 Conclusions
- The system can spot and recognize animals, human figures, planes, cars, and trucks in natural scenes with high accuracy at a rate of several frames per second.
- By presenting the input image at multiple scales, the system can detect those objects over a wide range of scales.
- Popular template-based approaches, including SVMs, are limited for classification over very large datasets with complex variabilities.
- Convolutional networks can be scanned over large images very efficiently.
- The NORB dataset opens the door to large-scale experiments with learning-based approaches to invariant object recognition.
- Future work may use trainable classifiers that incorporate explicit models of image formation and geometry.

20 Comments
- The authors address only their own problem, not the specific problems of the algorithms.
- The paper is well organized and clearly understandable.
- The dataset preparation details could be shortened.
- Previous work in the area, and its disadvantages, could be discussed in more depth.

21 Questions?