Sharing features for multi-class object detection Antonio Torralba, Kevin Murphy and Bill Freeman MIT Computer Science and Artificial Intelligence Laboratory.

Presentation transcript:

Sharing features for multi-class object detection Antonio Torralba, Kevin Murphy and Bill Freeman MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Sept. 1, 2004

The lead author: Antonio Torralba

Goal We want a machine to be able to identify thousands of different objects as it looks around the world.

Multi-class object detection with local features: a single patch v_p is fed to a bank of binary classifiers, P(car | v_p), P(cow | v_p), P(person | v_p), …, each of which may report "no car", "no cow", "no person". Desired detector outputs: bookshelf, desk, screen.

Need to detect as well as to recognize. Detection: localize screens, keyboards and bottles in the image; here the big problem is to discriminate the sparse objects against the background. Recognition: name each object; here the problem is to discriminate between objects.

There are efficient solutions for detecting a single object category and view (Viola & Jones, 2001; Papageorgiou & Poggio, 2000; …), for detecting particular objects (Lowe, 1999), and for detecting objects in isolation (Leibe & Schiele, 2003). But the problem of multi-class and multi-view object detection is still largely unsolved.

Why multi-object detection is a hard problem: we need to detect Nclasses * Nviews * Nstyles combinations, with lots of variability within object classes and across viewpoints, styles, lighting conditions, etc.

Existing approaches for multiclass object detection (vision community). Using a set of independent binary classifiers, one detector for each class, is the dominant strategy: the Viola-Jones extension for dealing with rotations (two cascades for each view); Schneiderman-Kanade multiclass object detection.

Promising approaches look for a vocabulary of edges that reduces the number of features: Fei-Fei, Fergus, & Perona, 2003; Krempp, Geman, & Amit, 2002.

Characteristics of one-vs-all multiclass approaches: cost Computational cost grows linearly with Nclasses * Nviews * Nstyles … Surely, this will not scale well to 30,000 object classes.

Characteristics of one-vs-all multiclass approaches: representation. What is the best representation to detect a traffic sign? It is a very regular object, so template matching will do the job: parts derived from training a binary classifier give ~100% detection rate with 0 false alarms. But some of these parts can only be used for this object.

Meaningful parts, in one-vs-all approaches. Part-based object representations (looking for meaningful parts): A. Agarwal and D. Roth; M. Weber, M. Welling and P. Perona; Ullman, Vidal-Naquet, and Sali (2004), who show that features of intermediate complexity are most informative for (single-object) classification; …

Multi-class classifiers (machine learning community). Error correcting output codes (Dietterich & Bakiri, 1995; …): but these only use classification decisions (+1/-1), not real values. Reducing multi-class to binary (Allwein et al., 2000): showed that the best code matrix is problem-dependent, but doesn't address how to design the code matrix. Bunching algorithm (Dekel and Singer, 2002): also learns the code matrix and classifies, but is more complicated than our algorithm and not applied to object detection. Multitask learning (Caruana, 1997; …): trains tasks in parallel to improve generalization and share features, but not applied to object detection, nor in a boosting framework.

Our approach Share features across objects, automatically selecting the best sharing pattern. Benefits of shared features: –Efficiency –Accuracy –Generalization ability

Algorithm goals, for object recognition. We want to find the vocabulary of parts that can be shared. We want to share, across different objects, generic knowledge about detecting objects (e.g., discriminating them from the background). We want to share computations across classes so that the computational cost grows more slowly than the number of classes.

Independent features: object classes 1 to 4, each with its own set of six features. Total number of hyperplanes (features): 4 x 6 = 24; scales linearly with the number of classes.

Shared features: total number of shared hyperplanes (features): 8. May scale sub-linearly with the number of classes, and may generalize better.

Note: sharing is a graph, not a tree. Objects: {R, b, 3}. The strokes shared between pairs of characters (R and b, b and 3, etc.) define a vocabulary of parts shared across objects.

At the algorithmic level, our approach is a variation on boosting that allows features to be shared in a natural way. So let's review boosting (AdaBoost demo).

Boosting demo
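In place of the live demo, here is a minimal gentle-boosting sketch on 1-D features. It is an illustration only, not the authors' implementation: `fit_stump`, `gentle_boost` and the toy data are hypothetical names, and real weak learners would scan image-feature responses rather than raw scalars.

```python
import math

def fit_stump(xs, zs, ws):
    """Fit a regression stump h(x) = a if x > theta else b by minimizing
    the weighted squared error sum_i w_i (z_i - h(x_i))^2. For a fixed
    theta the optimal a, b are the weighted means of z on each side of
    the split; we scan every observed value as a candidate threshold."""
    best = None
    for theta in sorted(set(xs)):
        hi = [(w, z) for x, z, w in zip(xs, zs, ws) if x > theta]
        lo = [(w, z) for x, z, w in zip(xs, zs, ws) if x <= theta]
        a = sum(w * z for w, z in hi) / max(sum(w for w, _ in hi), 1e-12)
        b = sum(w * z for w, z in lo) / max(sum(w for w, _ in lo), 1e-12)
        err = sum(w * (z - (a if x > theta else b)) ** 2
                  for x, z, w in zip(xs, zs, ws))
        if best is None or err < best[0]:
            best = (err, theta, a, b)
    return best[1:]

def gentle_boost(xs, zs, rounds=10):
    """zs are +1/-1 labels; returns the additive classifier H(x)."""
    H = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # exponential reweighting: hard examples get large weights
        ws = [math.exp(-z * h) for z, h in zip(zs, H)]
        theta, a, b = fit_stump(xs, zs, ws)
        stumps.append((theta, a, b))
        H = [h + (a if x > theta else b) for h, x in zip(H, xs)]
    return lambda x: sum(a if x > t else b for t, a, b in stumps)

# toy 1-D data: positives live above 0
X = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
Z = [-1, -1, -1, +1, +1, +1]
H = gentle_boost(X, Z, rounds=5)
print([1 if H(x) > 0 else -1 for x in X])  # → [-1, -1, -1, 1, 1, 1]
```

Each round fits a regression stump to the +1/-1 labels under the weights w_i = exp(-z_i H(x_i)), which is exactly the reweighting the Newton's-method slide below derives.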

Joint boosting, outside of the context of images. Additive models for classification: H(v, c) = Σ_m h_m(v, c), where c ranges over classes, v are the feature responses, and the sign of H gives the +1/-1 classification.

Feature sharing in additive models: 1) it is simple to share terms between additive models; 2) each term h_m can be mapped to a single feature. Example: H_1 = G_{1,2} + G_1 and H_2 = G_{1,2} + G_2, where G_{1,2} collects the terms shared by both classifiers.

Flavors of boosting. Different boosting algorithms use different loss functions or minimization procedures (Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998). We base our approach on gentle boosting: it learns faster than the others (Friedman, Hastie, Tibshirani, 1998; Lienhart, Kuranov, & Pisarevsky, 2003).

Joint Boosting. We use the multi-class exponential cost function J = Σ_c Σ_i exp(−z_i^c H(v_i, c)), where H(v, c) is the classifier output for class c and z_i^c = +1/-1 encodes membership of example i in class c. At each boosting round, we add a shared function h_m(v, c).
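As a small, hypothetical rendering of this cost (all names are illustrative): with Z[i][c] the +1/-1 membership labels and H[i][c] the current classifier outputs, J is a double sum of exponentials.

```python
import math

def joint_cost(H, Z):
    """Multi-class exponential cost J = sum_c sum_i exp(-z_i^c * H(v_i, c)),
    where Z[i][c] is +1 if example i belongs to class c, else -1, and
    H[i][c] is the current classifier output for class c on example i."""
    return sum(math.exp(-z * h)
               for Zi, Hi in zip(Z, H)
               for z, h in zip(Zi, Hi))

# two examples, two classes; a confident correct classifier lowers the cost
Z = [[+1, -1], [-1, +1]]
H_good = [[+2.0, -2.0], [-2.0, +2.0]]
H_zero = [[0.0, 0.0], [0.0, 0.0]]
print(joint_cost(H_zero, Z))                       # → 4.0
print(joint_cost(H_good, Z) < joint_cost(H_zero, Z))  # → True
```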

Newton's method: treat h_m as a perturbation of the current classifier H, and expand the loss J to second order in h_m. The exponential term then acts as a reweighting, reducing each round to a squared-error fit.

Joint Boosting: minimize the weighted squared error over the training data, J_wse = Σ_c Σ_i w_i^c (z_i^c − h_m(v_i, c))², with weights w_i^c = exp(−z_i^c H(v_i, c)).

Given a sharing pattern, the decision stump parameters are obtained analytically: each shared stump looks at the feature output v_f and returns a + b when it exceeds a threshold, and b otherwise. For a trial sharing pattern, set the weak learner parameters of h_m(v, c) to optimize overall classification.

Joint Boosting: select the sharing pattern and weak learner to minimize cost. Response histograms for background (blue) and class members (red) show h_m(v, c) for several classes (k_{c=1}, k_{c=2}, k_{c=5}). The constants k_c prevent sharing features just due to an asymmetry between positive and negative examples for each class; they only appear during training. Algorithm details in Torralba, Murphy & Freeman, CVPR 2004.
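The analytic solution can be sketched as follows; this is a hedged illustration of the weighted-least-squares formulas, with all names hypothetical. Classes in the sharing set get the stump outputs (written here directly as the two values a and b, i.e. the slide's a+b and b), while every class outside the set gets its own constant k_c.

```python
def shared_stump_params(v, z, w, share, theta):
    """Weighted-least-squares optimal parameters for one shared stump:
    classes in `share` get h(v, c) = a if v > theta else b; every other
    class c gets a constant k_c. v[i] is the feature response for
    example i; z[c][i], w[c][i] are per-class +1/-1 labels and weights.
    The optima are weighted means of the labels (solution of the WLS)."""
    num_a = den_a = num_b = den_b = 0.0
    for c in share:
        for vi, zi, wi in zip(v, z[c], w[c]):
            if vi > theta:
                num_a += wi * zi; den_a += wi
            else:
                num_b += wi * zi; den_b += wi
    a = num_a / max(den_a, 1e-12)
    b = num_b / max(den_b, 1e-12)
    # constants for the classes that do not share this feature
    ks = {c: sum(wi * zi for wi, zi in zip(w[c], z[c])) /
             max(sum(w[c]), 1e-12)
          for c in range(len(z)) if c not in share}
    return a, b, ks

v = [0.0, 1.0]                 # feature responses for two examples
z = [[-1, +1], [+1, +1]]       # z[c][i]: labels per class
w = [[1.0, 1.0], [1.0, 1.0]]   # uniform weights
a, b, ks = shared_stump_params(v, z, w, share={0}, theta=0.5)
print(a, b, ks)  # → 1.0 -1.0 {1: 1.0}
```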

Approximate best sharing. Exhaustive search would require exploring 2^C − 1 possible sharing patterns. Instead we use a first-best search: S = []; 1) fit stumps for each class independently; 2) take the best class c_i, set S = [S, c_i], and fit stumps for [S, c_i] with c_i not in S; repeat step 2 until length(S) = Nclasses; 3) select the sharing pattern with the smallest WLS error.
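The first-best search can be sketched like this (a hypothetical illustration; `wls_error` stands in for fitting the optimal shared stump on a subset of classes and returning its weighted-least-squares error):

```python
def best_sharing(classes, wls_error):
    """Greedy first-best search over sharing patterns: instead of trying
    all 2^C - 1 subsets, grow the sharing set one class at a time and
    keep the subset with the smallest WLS error seen along the way."""
    S, candidates = [], []
    remaining = set(classes)
    while remaining:
        # try adding each remaining class to the current sharing set
        best_c = min(remaining, key=lambda c: wls_error(S + [c]))
        S = S + [best_c]
        remaining.discard(best_c)
        candidates.append((wls_error(S), tuple(S)))
    return min(candidates)[1]  # sharing set with smallest error

# toy error table over subsets of three classes; {a, b} is the best set
errs = {frozenset('a'): 0.5, frozenset('b'): 0.6, frozenset('c'): 0.9,
        frozenset('ab'): 0.2, frozenset('ac'): 0.8,
        frozenset('abc'): 0.7}
print(best_sharing('abc', lambda s: errs[frozenset(s)]))  # → ('a', 'b')
```

The greedy search evaluates only O(C^2) subsets instead of 2^C − 1, at the cost of possibly missing the true optimum.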

Effect of pattern of feature sharing on number of features required (synthetic example)

2-d synthetic example 3 classes + 1 background class

No feature sharing Three one-vs-all binary classifiers This is with only 8 separation lines

With feature sharing This is with only 8 separation lines Some lines can be shared across classes.

The shared features

Comparison of the classifiers Shared features: note better isolation of individual classes. Non-shared features.

Now, apply this to images. Image features (weak learners): a 12x12 patch is taken from a 32x32 training image of an object, giving a template g_f(x) normalized to mean 0 and energy 1, plus a binary mask w_f(x) recording the location of that patch within the 32x32 object. The feature output combines the template response with the position mask.
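A hedged sketch of one such feature, assuming the output is the masked sum of absolute template responses (the exact pooling in the paper may differ; all names here are illustrative):

```python
import math

def normalize_patch(g):
    """Zero-mean, unit-energy template, as described on the slide."""
    flat = [x for row in g for x in row]
    mu = sum(flat) / len(flat)
    g0 = [[x - mu for x in row] for row in g]
    energy = math.sqrt(sum(x * x for row in g0 for x in row))
    return [[x / max(energy, 1e-12) for x in row] for row in g0]

def feature_response(img, g, mask):
    """One weak-learner feature: correlate the normalized patch g with
    the image, take the absolute response, then pool it through the
    binary position mask w_f (a sketch of v_f = |I * g_f| pooled by w_f)."""
    g = normalize_patch(g)
    ph, pw = len(g), len(g[0])
    H, W = len(img), len(img[0])
    total = 0.0
    for y in range(H - ph + 1):
        for x in range(W - pw + 1):
            r = sum(img[y + dy][x + dx] * g[dy][dx]
                    for dy in range(ph) for dx in range(pw))
            if mask[y][x]:
                total += abs(r)
    return total

# tiny example: a diagonal template on a diagonal image, mask everywhere
img = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
g = [[1, 0], [0, 1]]
mask = [[1, 1], [1, 1]]
print(feature_response(img, g, mask))  # → 3.0
```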

The candidate features: template and position.

The candidate features: a dictionary of 2000 candidate patches and position masks, randomly sampled from the training images.

Database of 2500 images with annotated object instances.

Multiclass object detection 21 objects We use [20, 50] training samples per object, and about 20 times as many background examples as object examples.

Feature sharing at each boosting round during training

Example shared feature (weak classifier) At each round of running joint boosting on training set we get a feature and a sharing pattern. Response histograms for background (blue) and class members (red)

Non-shared feature Shared feature

Non-shared feature Shared feature

How the features were shared across objects (features sorted left-to-right from generic to specific)

Performance evaluation: area under the ROC curve (shown: 0.9), plotting correct detection rate against false alarm rate.
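For reference, the area under the ROC curve can be computed without tracing the curve at all, as the probability that a random positive outscores a random negative (a standard identity, not specific to this paper):

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Wilcoxon/Mann-Whitney statistic:
    the fraction of (positive, negative) pairs where the positive scores
    higher; ties count one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

print(auc([0.9, 0.8, 0.7], [0.6, 0.5, 0.4]))  # → 1.0 (perfect separation)
print(auc([0.5], [0.5]))                      # → 0.5 (chance)
```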

Performance improvement over training Significant benefit to sharing features using joint boosting.

ROC curves for our 21 object database How will this work under training-starved or feature-starved conditions? Presumably, in the real world, we will always be starved for training data and for features.

70 features, 20 training examples (left) Shared features Non-shared features

15 features, 20 training examples (mid) 70 features, 20 training examples (left) Shared features Non-shared features

15 features, 2 training examples (right) 15 features, 20 training examples (middle) 70 features, 20 training examples (left) Shared features Non-shared features

Scaling Joint Boosting shows sub-linear scaling of features with objects (for area under ROC = 0.9). Results averaged over 8 training sets, and different combinations of objects. Error bars show variability.

Red: shared features Blue: independent features

Examples of correct detections

What makes good features? Depends on whether we are doing single-class or multi-class detection…

Generic vs. specific features. Parts derived from training a binary classifier vs. parts derived from training a joint classifier with 20 more objects. In both cases, ~100% detection rate with 0 false alarms.

Qualitative comparison of features, for single-class and multi-class detectors

Multi-view object detection: train for object and orientation. View-invariant features vs. view-specific features. Sharing features is a natural approach to view-invariant object detection.

Multi-view object detection: sharing is not a tree; it also depends on 3D symmetries.

Multi-view object detection

Strong learner H response for car as function of assumed view angle

Visual summary…

Features Object units

Summary: Argued that feature sharing will be an essential part of scaling up object detection to hundreds or thousands of objects (and viewpoints). We introduced joint boosting, a generalization of boosting that incorporates feature sharing in a natural way. Initial results (up to 30 objects) show the desired scaling behavior for # features vs. # objects. The shared features are observed to generalize better, allowing learning from fewer examples, using fewer features.

end