The Essence of the Scene Gist. Trayambaka Karra (KT) and Garold Fuks.


The Essence of the Scene Gist Trayambaka Karra (KT) and Garold Fuks

The “Gist” of a scene If this is a street, this must be a pedestrian

Physiological Evidence People are excellent at identifying pictures (Standing, L., Q. J. Exp. Psychol. 1973) Gist: the abstract meaning of a scene Obtained within 150 ms (Biederman, 1981; Thorpe, S. et al. 1996) Obtained without attention (Oliva & Schyns, 1997; Wolfe, J. M. 1998) Possibly derived via statistics of low-level structures (e.g. Swain & Ballard, 1991) Change Blindness (seconds) (Simons, D. J., & Levin, D. T., Trends Cogn. Sci. 1997)

What is the “gist”? Inventory of the objects (2-3 objects in 150 msec; Luck & Vogel, Nature 390, 1997) Relation between objects (layout) (J. Wolfe, Curr. Biol. 1998, 8) Presence of other objects “Visual stuff” – an impression of low-level features

How does the “Gist” work? Statistical Properties Object Properties (R. A. Rensink, lecture notes)

Outline Context Modeling –Previous Models –Scene based Context Model Context Based Applications –Place Identification –Object Priming –Control of Focus of Attention –Scale Selection –Scene Classification Joint Local and Global Features Applications –Object Detection and Localization Summary

Probabilistic Framework MAP Estimator v – image measurements O – object property: category (o), location (x), scale (σ)
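Written out explicitly (a reconstruction of the slide's estimator from the notation above, not a verbatim copy of its equation): the object properties are inferred by maximizing the posterior given the image measurements,

    \hat{O} = \arg\max_{O} P(O \mid v) = \arg\max_{o, x, \sigma} P(o, x, \sigma \mid v) \propto P(v \mid o, x, \sigma)\, P(o, x, \sigma)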

Object-Centered Object Detection (B. Moghaddam, A. Pentland, IEEE PAMI) The only image features relevant to object detection are those belonging to the object, not to the background

The “Gist” of a scene Local features can be ambiguous; context can provide a prior

Scene Based Context Model Background provides a likelihood of finding an object P(Car | image) = low P(Person | image) = high

Context Modeling Previous Context Models (Fu, Hammond and Swain, 1994; Haralick, 1983; Song et al., 2000) –Rule Based Context Model –Object Based Context Model Scene-centered context representation (Oliva and Torralba, 2001, 2002)

Rule Based Context Model Structural Description (figure: objects O1, O2, O3, O4 linked by relations such as Above, Right-of, Left-of, Touch)

Fu, Hammond and Swain, 1994

Object Based Context Model Context is incorporated only through the prior probability of object combinations in the world (R. Haralick, IEEE PAMI)

Scene Based Context Model What features represent the scene? Statistics of local low-level features: color histograms, oriented band-pass filters

Context Features - v_C A bank of filters g_1(x), g_2(x), …, g_K(x) applied to the image yields local outputs v(x,1), v(x,2), …, v(x,K)

Context Features - v_C (example Gabor-filter responses: a scene with a car and no people vs. a scene with people and no car)

Context Features - v_C PCA

PCA Detour Take a natural-images database; calculate v(x,k) for each image; arrange the v(x,k)'s in a matrix V; compute the correlation of V; perform SVD; use the columns of U as the basis

Context Features - Summary I(x) → bank of filters → dimension reduction (PCA) → v_C
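A minimal sketch of this pipeline in Python/NumPy, assuming a precomputed list of filter kernels (stand-ins for the Gabor/band-pass filters above); the function names, grid size, and number of components are illustrative choices, not the presenters' implementation:

import numpy as np
from scipy.signal import fftconvolve

def context_features(image, kernels, grid=4):
    """Filter-bank energies v(x, k), pooled on a coarse grid x grid spatial layout."""
    feats = []
    for g in kernels:                                  # g_1(x), ..., g_K(x)
        energy = np.abs(fftconvolve(image, g, mode="same"))
        h, w = energy.shape
        hb, wb = h // grid, w // grid
        pooled = energy[: hb * grid, : wb * grid].reshape(grid, hb, grid, wb).mean(axis=(1, 3))
        feats.append(pooled.ravel())
    return np.concatenate(feats)                       # raw high-dimensional features

def pca_basis(V, n_components=60):
    """PCA via SVD of the centered feature matrix (one row per training image)."""
    mean = V.mean(axis=0)
    _, _, Wt = np.linalg.svd(V - mean, full_matrices=False)
    return mean, Wt[:n_components]                     # training mean and principal directions

# v_C for a new image = projection of its centered raw features onto the PCA basis:
# mean, basis = pca_basis(V_train)
# v_c = basis @ (context_features(img, kernels) - mean)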

Probability from Features How do we obtain context-based probability priors P(O | v_c) on object properties? GMM - Gaussian Mixture Model Logistic regression Parzen window

Probability from Features GMM P(object property | context) We need two probabilities: P(v_c | O) – the likelihood of the features given the presence of the object, and P(v_c | ¬O) – the likelihood of the features given its absence. Each is a Gaussian Mixture Model whose unknown parameters are learned by the EM algorithm
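Spelled out (a sketch consistent with the slide, since its own equations are not in the transcript): the posterior follows from Bayes' rule, with each class-conditional density a Gaussian mixture,

    P(O \mid v_c) = \frac{P(v_c \mid O)\,P(O)}{P(v_c \mid O)\,P(O) + P(v_c \mid \neg O)\,P(\neg O)},
    \qquad
    P(v_c \mid O) = \sum_{i=1}^{M} w_i\, \mathcal{N}(v_c;\ \mu_i, \Sigma_i),

and EM estimates the weights w_i, means \mu_i and covariances \Sigma_i.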

Probability from Features How do we obtain context-based probability priors P(O | v_c) on object properties? GMM - Gaussian Mixture Model Logistic regression Parzen window

Probability from Features Logistic Regression

Example: O = having back problems, v_c = age. Training stage: fit the model to labeled data. Working stage: the fitted coefficients give the log odds for a 20-year-old person and the log odds ratio when comparing two persons who differ by 1 year in age
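A standard logistic-regression formulation consistent with this example (the coefficients β_0, β_1 are introduced here for illustration; the slide's own equations were images):

    \log \frac{P(O = 1 \mid v_c)}{P(O = 0 \mid v_c)} = \beta_0 + \beta_1 v_c,

so β_0 + 20 β_1 is the log odds for a 20-year-old person, and β_1 is the log odds ratio when comparing two persons who differ by 1 year in age.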

Probability from Features How do we obtain context-based probability priors P(O | v_c) on object properties? GMM - Gaussian Mixture Model Logistic regression Parzen window

Probability from Features Parzen Window Radial Gaussian Kernel
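The Parzen-window density estimate with a radial Gaussian kernel, in its standard form (the slide's own equation is not in the transcript); the bandwidth σ is the only free parameter:

    \hat{p}(v_c \mid O) = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert v_c - v_n \rVert^2}{2\sigma^2}\right),

where v_1, …, v_N are training feature vectors from images containing the object.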

What have we seen so far… Context Modeling Context Based Applications –Place Identification –Object Priming –Control of Focus of Attention –Scale Selection –Scene Classification

Place Identification Goal: Recognize specific locations

Place Identification A. Torralba, K. Murphy, W. Freeman, M. Rubin, ICCV 2003

Place Identification Decide only when the classifier is sufficiently confident; evaluate the precision vs. recall rate: A. Torralba, P. Sinha, MIT AIM

Object Priming How do we detect objects in an image? –Search the whole image for the object model. –What if I am searching in images where the object doesn't exist at all? Obviously a waste of “my precious” computational resources, Gollum. Can we do better, and if so, how? –Use the “great eye”, the contextual features of the image (v_C), to predict the probability of finding our object of interest o in the image, i.e. P(o | v_C).

Object Priming ….. What to do? –Use our experience: learn from a database of images labeled with the presence or absence of the object. How to do it? –Learn the PDF of the contextual features given the object's presence by a mixture of Gaussians –Also learn the PDF given the object's absence

Object Priming …..

Control of Focus of Attention How do biological visual systems deal with the analysis of complex real-world scenes? –By focusing attention on image regions that require detailed analysis.

Modeling the Control of Focus of Attention How to decide which regions are “more” important than others? Local-type methods 1. Low-level saliency maps – regions whose properties differ from their neighborhood are considered salient. 2. Object-centered methods. Global-type methods 1. Contextual control of focus of attention

Contextual Control of Focus of Attention Contextual control is both –Task driven (looking for a particular object o) and –Context driven (given global context information: v_C) No use of object models (i.e. it ignores object-centered features)

Contextual Control of Focus of Attention …

Focus on spatial regions that have a high probability of containing the target object o given the context information v_C. For each location x, we calculate the probability of presence of the object o given the context v_C, and evaluate this PDF based on the past experience of the system.

Contextual Control of Focus of Attention … Learning Stage: Use the Swiss Army Knife, the EM algorithm, to estimate the parameters

Contextual Control of Focus of Attention … Learning Stage: one factor models the distribution of object locations and the other models the distribution of contextual features. The training data are { v_t }, t = 1..N and { x_t }, t = 1..N, where v_t are the contextual features of picture t and x_t is the location of object o in that scene. Use the Swiss Army Knife, the EM algorithm, to estimate the parameters
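One way to write the model the slide describes, following Torralba's contextual-priming formulation (a reconstruction, since the slide's equations are not in the transcript): each mixture component pairs a Gaussian over object locations x with a Gaussian over contextual features v_C,

    p(x, v_C \mid o) = \sum_{n=1}^{M} b_n\, \mathcal{G}(x;\ \eta_n, X_n)\, \mathcal{G}(v_C;\ \mu_n, V_n),

and the attention map is the conditional p(x \mid o, v_C) \propto p(x, v_C \mid o), with the parameters {b_n, η_n, X_n, μ_n, V_n} estimated by EM from the training pairs {(x_t, v_t)}.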

Contextual Control of Focus of Attention …

Scale Selection Scale selection is a fundamental problem in computer vision and a key bottleneck for object-centered object detection algorithms. Can we estimate scale in a pre-processing stage? Yes, using saliency measures of low-level operators across spatial scales. Other methods? Of course, …..

Context-Driven Scale Selection Preferred scale:
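A plausible form for the quantity the slide leaves implicit, in the same spirit as the location model above (an assumption on my part, not the slide's exact equation): the preferred scale is the most probable one given the object class and the context,

    \hat{\sigma}(o, v_C) = \arg\max_{\sigma}\ p(\sigma \mid o, v_C).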

Context-Driven Scale Selection ….

Scene Classification Strong correlation between the presence of many types of objects. Do not model this correlation directly; rather, use a “common” cause, which we shall call the “scene”. Train a classifier to identify scenes. Then all we need is to calculate the object probability given the scene.
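Written out (the slide's closing equation is not in the transcript; this is the standard total-probability form implied by the text): with a scene classifier P(s | v_C) and per-scene object frequencies P(o | s),

    P(o \mid v_C) = \sum_{s} P(o \mid s)\, P(s \mid v_C).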

What have we seen so far… Context Modeling Context Based Applications Joint Local and Global Features Applications –Object Detection and Localization Need new tools: Learning and Boosting

Weak Learners Given (x_1, y_1), …, (x_m, y_m), where the x_i are examples and the y_i their labels, can we extract “rules of thumb” for classification purposes? A weak learner finds a weak hypothesis (rule of thumb) h : X → {spam, non-spam}

Decision Stumps Consider the following simple family of component classifiers generating ±1 labels: h(x; p) = a·[x_k > t] - b, where p = {a, b, k, t}. These are called decision stumps. Use sign(h) for classification and |h| for a confidence measure. Each decision stump pays attention to only a single component of the input vector.
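A minimal weighted decision-stump learner in Python/NumPy (a sketch for illustration, not the lecture's code; it fits the ±1 stump s·sign(x_k > t), which can be rewritten in the slide's form h(x; p) = a·[x_k > t] - b with a = 2s and b = s):

import numpy as np

def fit_stump(X, y, w):
    """Pick the feature k, threshold t and sign s with the lowest weighted error."""
    n, d = X.shape
    best = None
    for k in range(d):
        for t in np.unique(X[:, k]):
            pred = np.where(X[:, k] > t, 1.0, -1.0)
            for s in (1.0, -1.0):
                err = np.sum(w * (s * pred != y))
                if best is None or err < best[0]:
                    best = (err, k, t, s)
    err, k, t, s = best
    return {"k": k, "t": t, "s": s, "err": err}

def stump_predict(stump, X):
    return stump["s"] * np.where(X[:, stump["k"]] > stump["t"], 1.0, -1.0)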

Ponders his maker, ponders his will Can we combine weak classifiers to produce a single strong classifier in a simple manner: h_m(x) = h(x; p_1) + … + h(x; p_m), where the predicted label for x is the sign of h_m(x). Is it beneficial to allow some of the weak classifiers to have more “votes” than others: h_m(x) = α_1 h(x; p_1) + … + α_m h(x; p_m), where the non-negative votes α_i can be used to emphasize the components that are more reliable than others.

Boosting What is boosting? –A general method for improving the accuracy of any given weak learning algorithm. –Introduced in the framework of the PAC learning model. –But it works with any weak learner (in our case, decision stumps).

Boosting ….. A boosting algorithm sequentially estimates and combines classifiers by re-weighting the training examples (each time concentrating on the harder examples); each component classifier is presented with a slightly different problem depending on the weights. Base ingredients: –a set of “weak” binary (±1) classifiers h(x; p) such as decision stumps –normalized weights D_1(i) on the training examples, initially set to uniform (D_1(i) = 1/m)

AdaBoost 1. At the t-th iteration we find a weak classifier h(x; p_t) whose classification error is better than chance. 2. The new component classifier is assigned “votes” α_t based on its performance. 3. The weights on the training examples are updated, where Z_t is a normalization factor.
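The standard AdaBoost quantities behind these three steps (the slide's own formulas appeared as images; these are the textbook forms, with ε_t the weighted error):

    \epsilon_t = \sum_i D_t(i)\,[h_t(x_i) \neq y_i], \qquad
    \alpha_t = \tfrac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}, \qquad
    D_{t+1}(i) = \frac{D_t(i)\,\exp(-\alpha_t\, y_i\, h_t(x_i))}{Z_t}.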

AdaBoost
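Putting the pieces together, a compact AdaBoost loop over the decision stumps sketched earlier (illustrative only; fit_stump and stump_predict are the hypothetical helpers defined above, not the presenters' code):

import numpy as np

def adaboost(X, y, rounds=100):
    """AdaBoost with decision stumps; y must be a +1/-1 array."""
    n = len(y)
    D = np.full(n, 1.0 / n)              # D_1(i) = 1/m, uniform weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, D)       # weak classifier, hopefully better than chance
        pred = stump_predict(stump, X)
        eps = np.sum(D * (pred != y))
        if eps >= 0.5:                   # no weak learner better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - eps) / max(eps, 1e-12))
        D = D * np.exp(-alpha * y * pred)
        D = D / D.sum()                  # divide by the normalizer Z_t
        ensemble.append((alpha, stump))
    return ensemble

def strong_classify(ensemble, X):
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)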

Gambling Uri Gari KT

Object Detection and Localization 3 Families of Approaches –Parts based: the object is defined as a spatial arrangement of small parts. –Region based: use segmentation to extract a region of the image from the background and deduce shape and texture information from its local features. –Patch based: use local features to classify each rectangular image region as object or background. Object detection is reduced to a binary classification problem, i.e. compute just P(O_i^C = 1 | v_i^C), where O_i^C = 1 if patch i contains (part of) an object of class C, and v_i^C is the feature vector for patch i computed for class C.

Feature Vector for a Patch: Step 1

Feature Vector for a Patch: Step 2

Feature Vector for a Patch: Step 3

Summary: Feature Vector Extraction 12 × 30 × 2 = 720 features

Filters and Spatial Templates

Object Detection ….. Do I need all the features for a given object class? If not, which features should I extract for a given object class? –Use training to learn which features are more important than others.

Classifier: Boosted Features What is available? –Training data: v = the features of the patch containing an object o. Weak learners pay attention to single features: –h_t(v) picks the best feature and threshold. The output is the weighted sum of the weak classifiers, where –h_t(v) = output of the weak classifier at round t –α_t = weight assigned by boosting ~100 rounds of boosting

Examples of Learned Features

Example Detections

Using the Gist for Object Localization Use the gist to predict the possible location of the object. Should I run my detectors only in that region? –No! That misses detections when the object is at any other location. –So, search everywhere but penalize candidates that are far from the predicted locations. But how?

Using the Gist for Object Localization …. Construct a feature vector that combines the output of the boosted classifier with the difference between the candidate location and the gist-predicted location. Train another classifier on this combined feature vector to compute the final detection probability.

Using the Gist for Object Localization ….

Summary Context Modeling –Previous Models –Scene based Context Model

Summary Context Modeling Context Based Applications –Place Identification –Object Priming –Control of Focus of Attention –Scale Selection –Scene Classification

Summary Context Modeling Context Based Applications Joint Local and Global Features Applications –Object Detection and Localization