Object recognition and scene “understanding”

Computer Vision, Part 2: Object recognition and scene “understanding”

What makes object recognition a hard task for computers?

HMAX
Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., and Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”
HMAX is a hierarchical neural-network model of object recognition. It is meant to model human vision at the level of the “immediate recognition” capabilities of the ventral visual pathway, independent of attention or other top-down processes. It is also called the “Standard Model” (because it incorporates the “standard model” of visual cortex), and was inspired by the earlier “Neocognitron” model of Fukushima (1980).

General ideas behind the model
“Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features to be detected.
Layers of the hierarchy alternate between “sensitivity” (to detecting features) and “invariance” (to position, scale, and orientation).
The size of receptive fields increases along the hierarchy.
The degree of invariance increases along the hierarchy.

The HMAX model for object recognition (Riesenhuber, Poggio, Serre, et al.)
The job of HMAX is to produce a higher-level representation of an image that will be useful for classification. Layers alternate between “specificity” and “invariance” over position, scale, and orientation:
Image (gray-scale)
S1 layer: edge detectors
C1 layer: max over local S1 units
S2 layer: prototypes (small image patches)
C2 layer: max activation over each prototype
Classification layer: object or image classification
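
To make the data flow concrete, here is a minimal Python sketch of how these stages could be chained into a single feed-forward pass. It assumes the illustrative helper functions (s1_layer, c1_layer, s2_response, c2_features) sketched in the sections that follow; it is an illustration of the data flow, not the authors' implementation.

```python
# A sketch of the HMAX feed-forward pass, assuming the helper functions
# sketched in the following sections (s1_layer, c1_layer, s2_response,
# c2_features). Prototypes are small patches of C1 activations stored
# during a training phase.
def hmax_c2_vector(image, prototypes):
    s1 = s1_layer(image)                      # S1: edge-detector responses
    c1 = c1_layer(s1)                         # C1: local max pooling
    # One list of S2 maps per prototype: compare the prototype against
    # every C1 map (each orientation / scale band).
    s2_per_prototype = [[s2_response(c1_map, p) for c1_map in c1.values()]
                        for p in prototypes]
    return c2_features(s2_per_prototype)      # C2: one max per prototype
```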

S1 layer: edge detectors at 4 orientations and 16 scales, applied to the gray-scale image.
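
As an illustration, here is a minimal NumPy/SciPy sketch of such an S1 stage. The Gabor-style filter parameters and the four filter sizes are illustrative choices, not the exact values used by Serre et al.; the full model uses 16 scales.

```python
# A minimal sketch of an S1-like layer: a bank of oriented Gabor "edge
# detector" filters convolved with a gray-scale image. Parameters are
# illustrative, not the published HMAX values.
import numpy as np
from scipy.signal import convolve2d

def gabor_filter(size, wavelength, sigma, theta, gamma=0.3):
    """Gabor filter: a Gaussian envelope times a cosine grating at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x0 = x * np.cos(theta) + y * np.sin(theta)
    y0 = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * x0 / wavelength)
    return g - g.mean()   # zero mean, so uniform regions give no response

def s1_layer(image, orientations=4, scales=(7, 9, 11, 13)):
    """Return S1 responses indexed by (filter size, orientation index)."""
    responses = {}
    for size in scales:                    # one filter size per scale (16 in the full model)
        for k in range(orientations):      # 0, 45, 90, 135 degrees
            theta = k * np.pi / orientations
            filt = gabor_filter(size, wavelength=0.8 * size, sigma=0.4 * size, theta=theta)
            responses[(size, k)] = np.abs(convolve2d(image, filt, mode='same'))
    return responses
```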

(Figure: one S1 receptive field, shown at each of the 16 scales.)

C1 layer: max activation over local S1 units (over local position and scale), giving 4 orientations and 8 scales.
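
A minimal sketch of this pooling step is below, assuming the s1_layer output above. Pooling here merges pairs of adjacent scales and takes the max over a small spatial window; the window and band sizes are illustrative (the full model pools 16 S1 scales into 8 C1 scales).

```python
# A minimal sketch of C1-style pooling: for each orientation, take the max of
# S1 responses over two adjacent scales and over a local spatial neighborhood.
# Pool sizes are illustrative.
import numpy as np
from scipy.ndimage import maximum_filter

def c1_layer(s1_responses, scales=(7, 9, 11, 13), orientations=4, pool_size=8):
    """Return C1 maps indexed by (scale band, orientation index)."""
    c1 = {}
    scale_pairs = list(zip(scales[0::2], scales[1::2]))   # (7, 9), (11, 13)
    for band, (s_a, s_b) in enumerate(scale_pairs):
        for k in range(orientations):
            # Max over the two adjacent scales (tolerance to scale)...
            merged = np.maximum(s1_responses[(s_a, k)], s1_responses[(s_b, k)])
            # ...then max over a local window, subsampled (tolerance to position).
            c1[(band, k)] = maximum_filter(merged, size=pool_size)[::pool_size, ::pool_size]
    return c1
```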

S2 layer: each S2 unit calculates the similarity of a “pooled” patch of C1 activations to a stored prototype, for each position in the C1 layer (4 orientations, 8 scales). There are ~1000 prototypes: small image patches chosen from an image collection and translated to C1 features. Similarity is measured with a radial basis function of the distance between the C1 patch X and the prototype P, e.g. exp(-||X - P||² / (2σ²)).
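
A minimal sketch of one S2 unit is below. For simplicity it compares a prototype against a single C1 map; in the model described above, a prototype spans all four orientations at a C1 position, and beta (the sharpness of the radial basis function) is a tuning parameter.

```python
# A minimal sketch of an S2 unit: slide a stored prototype patch over a C1 map
# and, at each position, compute a radial-basis-function similarity
# exp(-beta * ||patch - prototype||^2). In the full model a prototype covers
# all orientations at once; here it is a single 2-D patch for clarity.
import numpy as np

def s2_response(c1_map, prototype, beta=1.0):
    ph, pw = prototype.shape
    H, W = c1_map.shape
    out = np.zeros((H - ph + 1, W - pw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = c1_map[i:i + ph, j:j + pw]
            out[i, j] = np.exp(-beta * np.sum((patch - prototype) ** 2))
    return out
```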

C2 layer: for each prototype, take the maximum S2 activation over all positions, orientations, and scales, producing a single value per prototype.
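
The C2 stage thus reduces each prototype's S2 maps to one number, as in this minimal sketch (which assumes the s2_response output format above):

```python
# A minimal sketch of the C2 stage: for each prototype, keep only the maximum
# S2 response over every position and scale band, giving one number per
# prototype, i.e. a fixed-length feature vector regardless of image size.
import numpy as np

def c2_features(s2_maps_per_prototype):
    """s2_maps_per_prototype: one list of 2-D S2 maps (one per C1 band/orientation) per prototype."""
    return np.array([max(m.max() for m in maps) for maps in s2_maps_per_prototype])
```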

Classification layer: a support vector machine takes the C2 feature vector (e.g., .11, .78, …, .32) as input and outputs a classification (e.g., dog / not dog).
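
A minimal sketch of this last stage, using a scikit-learn linear SVM on placeholder data standing in for real C2 feature vectors and labels:

```python
# A minimal sketch of the classification layer: a support vector machine
# trained on C2 feature vectors (one fixed-length vector per image). The
# random arrays below are placeholders for real C2 features and labels.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.random((200, 1000))          # 200 images x ~1000 C2 prototype responses
y_train = rng.integers(0, 2, size=200)     # 1 = "dog", 0 = "not dog" (placeholder labels)

clf = LinearSVC().fit(X_train, y_train)    # learn a linear boundary in C2 space
x_new = rng.random((1, 1000))              # C2 vector of a new image
print(clf.predict(x_new))                  # array([0]) or array([1])
```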

Streetscenes “scene understanding” system (Bileschi, 2006) Use HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree

How Streetscenes Works (Bileschi, 2006)
1. Densely tile the image with windows of different sizes.
2. Compute C1 and C2 features in each window.
3. Give the features in each window as input to each of five trained support vector machines (one per object class).
4. If any SVM returns a classification score above a learned threshold, that object is said to be “detected” in that window.
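
A minimal sketch of this detection loop is below. The feature extractor compute_c1_c2_features, the per-class classifiers, and the thresholds dictionary are hypothetical names introduced for illustration, not Bileschi's code; the classifiers are assumed to expose a scikit-learn-style decision_function.

```python
# A sketch of a StreetScenes-style sliding-window detector, under the
# assumptions named above: compute_c1_c2_features, classifiers (one trained
# SVM per class), and thresholds are placeholders, not the original code.
def detect_objects(image, window_sizes, stride, classifiers, thresholds):
    detections = []
    for size in window_sizes:                              # 1. windows of several sizes
        for y in range(0, image.shape[0] - size + 1, stride):
            for x in range(0, image.shape[1] - size + 1, stride):
                window = image[y:y + size, x:x + size]
                feats = compute_c1_c2_features(window)     # 2. C1/C2 features per window
                for label, clf in classifiers.items():     # 3. one SVM per object class
                    score = clf.decision_function([feats])[0]
                    if score > thresholds[label]:          # 4. above threshold -> detected
                        detections.append((label, x, y, size, score))
    return detections
```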

Object detection (here, “car”) with HMAX model (Bileschi, 2006)

Sample of results from HMAX model (Serre et al., 2006)