
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and Machine Intelligence Antón Escobedo cse252c

2 Outline  Introduction  Problem Specification  Related Work  Overview of the Approach  Evaluation  Experimental Results and Analysis  Conclusion and Future Scope

3 Introduction  Automatic detection of objects in images  Different objects belonging to the same category can vary widely in appearance  A successful object detection system must cope with this within-class variation  Proposed solution: a sparse, part-based representation  Part-based representations are computationally efficient and have roots in biological vision

4 Problem Specification  Input: An image  Output: A list of locations at which instances of the object class are detected in the image  The experiments are performed on images of side views of cars but can be applied to any object that consists of distinguishable parts arranged in a relatively fixed spatial configuration  The present problem is a “detection” problem rather than a simple “classification” problem

5 Related Work  Raw pixel intensities  Global image features  Local features  Part-based representations using hand-labeled features

6 Algorithm Overview Four stages:  Vocabulary Construction: building a vocabulary of parts that will represent objects  Image Representation: input images are represented as binary feature vectors  Learning a Classifier: two target classes, positive feature vectors (object) and negative feature vectors (non-object)  Detection Hypothesis Using the Learned Classifier: a classifier activation map for the single-scale case, a classifier activation pyramid for the multiscale case

7 Vocabulary Construction  Interest points are extracted using the Förstner interest operator  Experiments were carried out on 50 representative images of size 100 x 40 pixels; a total of 400 patches, each 13 x 13 pixels, were extracted  To facilitate learning, a bottom-up clustering procedure was adopted, with similarity measured by normalized correlation  Similarity between two clusters C1 and C2 is measured by the average similarity between their respective patches: sim(C1, C2) = (1 / (|C1| |C2|)) Σ_{p1 ∈ C1} Σ_{p2 ∈ C2} sim(p1, p2)
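The clustering step above relies on normalized correlation between fixed-size patches; a minimal sketch (the 1e-8 stabilizer and function names are illustrative choices, not from the paper):

```python
import numpy as np

def ncc(p, q):
    """Normalized correlation between two equal-size patches."""
    p = (p - p.mean()) / (p.std() + 1e-8)
    q = (q - q.mean()) / (q.std() + 1e-8)
    return float((p * q).mean())

def cluster_similarity(c1, c2):
    """Average pairwise similarity between the patches of two clusters."""
    return sum(ncc(p, q) for p in c1 for q in c2) / (len(c1) * len(c2))
```

A patch compared with itself scores (close to) 1, and anti-correlated patches score negatively, which is what makes the measure usable for bottom-up merging.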

8 Vocabulary Construction  Förstner operator applied to a sample image; sample patches; clusters formed from the sample patches

9 Image Representation  For each patch q in an image, similarity-based indexing into the part vocabulary P is performed  The most similar vocabulary part for a highlighted patch q is P*(q) = argmax_{p ∈ P} sim(q, p)
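The indexing step can be sketched as an argmax over vocabulary parts; the similarity function is passed in, and the 0.75 activation cutoff is a hypothetical value, not the paper's:

```python
import numpy as np

def best_part(q, vocabulary, similarity, threshold=0.75):
    """Similarity-based indexing: return the index of the most similar
    vocabulary part P*(q), or None if nothing clears the threshold.
    (The 0.75 cutoff is a hypothetical value.)"""
    sims = [similarity(q, p) for p in vocabulary]
    best = int(np.argmax(sims))
    return best if sims[best] > threshold else None
```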

10 Image Representation: Feature Vector  Spatial relations among the parts detected in an image are defined in terms of distance (5 bins) and direction (8 ranges of 45 degrees each), giving 20 possible relations between two parts  Typically 2-6 parts per positive window  Each 100 x 40 training image is represented as a feature vector built from 290 feature types  P_n^(i): the ith occurrence of a part of type n in the image (1 ≤ n ≤ 270; n indexes a part cluster)  R_m^(j)(P_n1, P_n2): the jth occurrence of relation R_m between a part of type n1 and a part of type n2 (1 ≤ m ≤ 20; m indexes a distance-direction combination)
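The distance/direction binning behind these relation features can be sketched as follows; the distance bin edges are hypothetical (the slide only fixes the counts: 5 distance bins and 8 direction ranges of 45 degrees each):

```python
import math

def spatial_relation(p1, p2, bin_edges=(10, 20, 40, 80), n_dirs=8):
    """Encode the spatial relation between two detected parts as a
    (distance_bin, direction_bin) pair. The bin edges are hypothetical;
    only the bin counts come from the slide."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    dist = math.hypot(dx, dy)
    dbin = sum(dist > e for e in bin_edges)       # 0..4: which of 5 distance bins
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    abin = int(angle // (360.0 / n_dirs))         # 0..7: which 45-degree range
    return dbin, abin
```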

11 Learning a Classifier  Train the classifier using 1000 labeled images, each 100 x 40 pixels; no synthetic training images  Positive examples: various cars against varied backgrounds  Negative examples: natural scenes such as buildings and roads  High-dimensional feature vector: 270 part types, 20 relations, repeated occurrences  Uses the Sparse Network of Winnows (SNoW) learning architecture  "Winnow": to reduce in number until only the best are left

12 SNoW: a sparse network of linear units over a Boolean or real-valued feature space  Target nodes sit over an input (feature) layer; an example e is represented as a list of active features, and edges from features to target nodes are allocated dynamically on activation

13 SNoW: Prediction  The predicted target t* for example e is the target node with the highest activation  The activation Ω_t of target node t is the sum of the weights on the edges from e's active features to t  A learning-algorithm-specific sigmoid, whose transition from an output close to 0 to an output close to 1 is centered around the threshold θ, maps the activation Ω_t to a confidence
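A single target node's activation and sigmoid confidence can be sketched as below, with weights stored sparsely to mirror SNoW's dynamically allocated edges (variable and function names are mine):

```python
import math

def snow_activation(weights, active):
    """Activation of one SNoW target node: the sum of the weights of the
    example's active features (features with no allocated edge contribute 0)."""
    return sum(weights.get(i, 0.0) for i in active)

def snow_confidence(activation, theta):
    """Sigmoid centered at the threshold theta, mapping activation to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(activation - theta)))
```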

14 SNoW: Basic Learning Rules  Several weight-update rules can be used; they are variations of Winnow and Perceptron  Winnow update rule: on a mistake, multiplicatively promote (by α > 1) the weights of active features for a missed positive example, and demote (by β < 1) for a false positive  The number of examples required to learn a linear function grows linearly with the number of relevant features and only logarithmically with the total number of features
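A minimal mistake-driven Winnow sketch, using the α = 2, β = 1/2 values that appear on the training-example slide (sparse dict weights; function names are mine):

```python
def winnow_predict(w, active, theta):
    """Predict positive iff the summed weights of active features exceed theta."""
    return 1 if sum(w[i] for i in active) > theta else 0

def winnow_update(w, active, y_true, y_pred, alpha=2.0, beta=0.5):
    """Winnow's multiplicative update on a mistake: promote active-feature
    weights on a missed positive, demote them on a false positive."""
    if y_pred == y_true:
        return w
    factor = alpha if y_true == 1 else beta
    for i in active:
        w[i] *= factor
    return w
```

Because updates are multiplicative and touch only active features, irrelevant features are driven down quickly, which is the source of the logarithmic dependence on the total number of features.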

15 A Training Example  A worked run of the Winnow update rule (α = 2, β = 1/2, threshold θ) on a short sequence of examples, each represented as a list of active features

16 Detection Hypothesis Using the Learned Classifier  Classifier activation map for the single-scale case  Neighborhood Suppression: based on nonmaximum suppression  Repeated Part Elimination: a greedy algorithm that uses windows around the highest activation points
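The Repeated Part Elimination idea can be sketched as a greedy loop over the activation map; the overlap test here (suppressing any window within one window width/height of a detection) is a simplification of the paper's procedure:

```python
def repeated_part_elimination(activations, win_w, win_h, threshold):
    """Greedy detection from a classifier activation map: repeatedly take the
    highest remaining activation above threshold and remove all candidate
    windows overlapping it. `activations` maps (x, y) window positions to
    activation values. A sketch of the greedy idea, not the paper's code."""
    detections = []
    remaining = {p: a for p, a in activations.items() if a > threshold}
    while remaining:
        (x, y), a = max(remaining.items(), key=lambda kv: kv[1])
        detections.append((x, y, a))
        remaining = {(px, py): pa for (px, py), pa in remaining.items()
                     if abs(px - x) >= win_w or abs(py - y) >= win_h}
    return detections
```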

17 Detection: Classifier Activation Pyramid  Scale the input image a number of times to form a multiscale image pyramid  Apply the learned classifier to fixed-size windows at each level of the pyramid  This yields a three-dimensional classifier activation pyramid instead of the earlier two-dimensional classifier activation map
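The pyramid construction can be sketched in terms of the level sizes alone; the 0.9 scale factor and the stopping condition (no level smaller than the 100 x 40 classifier window) are illustrative assumptions:

```python
def image_pyramid(shape, scale=0.9, min_size=(100, 40)):
    """Return the (width, height) of each level of a multiscale pyramid built
    by repeatedly rescaling the input. The fixed-size classifier window is
    slid over every level; levels smaller than the window are dropped."""
    w, h = shape
    levels = []
    while w >= min_size[0] and h >= min_size[1]:
        levels.append((w, h))
        w, h = int(w * scale), int(h * scale)
    return levels
```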

18 Evaluation Criteria  Test Set I contains 170 images with 200 cars of the same size and is used for the single-scale case; for each car, the location of the best 100 x 40 window containing it is determined  Test Set II contains 108 images with 139 cars of different sizes and is used for the multiscale case; for each car, both the location and the scale of the best window containing it are determined

19 Performance Measures  The goal is to maximize the number of correct detections and minimize the number of false detections  One way to express the trade-off between correct and false detections is the receiver operating characteristic (ROC) curve, which plots the true positive rate vs. the false positive rate:  True positive rate = TP / nP, where TP is the number of true positives and nP the total number of positives in the data set  False positive rate = FP / nN, where FP is the number of false positives and nN the total number of negatives in the data set  This measures the accuracy of the system as a "classifier" rather than a "detector"

20 Performance Measures (contd.)  We are really interested in how many of the objects the system detects (given by recall) and how often its detections are false (given by 1 - precision); this trade-off is captured accurately by the recall vs. (1 - precision) curve, where  Recall = TP / nP and Precision = TP / (TP + FP), so 1 - Precision = FP / (TP + FP)  The threshold parameter that achieves the best trade-off between the two quantities is the point of highest F-measure, where F-measure = 2 * Recall * Precision / (Recall + Precision)
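The measures above are straightforward to compute from detection counts; a minimal sketch matching the slide's definitions:

```python
def detection_metrics(tp, fp, n_pos):
    """Recall, precision and F-measure from detection counts:
    recall = TP/nP, precision = TP/(TP+FP), F = 2RP/(R+P)."""
    recall = tp / n_pos
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure
```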

21 Experimental Results  Single-scale detection with the Neighborhood Suppression algorithm and with the Repeated Part Elimination algorithm: tables of activation threshold, recall (TP/200), precision (TP/(TP+FP)), and F-measure (2RP/(R+P))

22 Experimental Results (contd.)  Multiscale detection with the Neighborhood Suppression algorithm and with the Repeated Part Elimination algorithm: tables of activation threshold, recall (TP/139), precision (TP/(TP+FP)), and F-measure (2RP/(R+P))

23 Some Graphical Results

24 Analysis: A. Performance of Interest Operator

25 Analysis: B. Performance of Part Matching Process

26 Analysis: C. Performance of Learned Classifier

27 Conclusion  Automatic vocabulary construction from sample images  Methodologies for object detection: building a detector from a classifier, standardizing the evaluation criteria  Works well for object classes composed of distinguishable parts

28 Questions?