Taking Computer Vision Into The Wild Neeraj Kumar October 4, 2011 CSE 590V – Fall 2011 University of Washington.



A Joke
Q. What is computer vision?
A. If it doesn't work (in the wild), it's computer vision.
(I'm only half-joking)

Instant Object Recognition Paper*

1. Design new algorithm
2. Pick dataset(s) to evaluate on
3. Repeat until conference deadline:
   a. Train classifiers
   b. Evaluate on test set
   c. Tune parameters and tweak algorithm
4. Brag about results with ROC curves

*Just add grad students

- Fixed set of training examples
- Fixed set of classes/objects
- Training examples only have one object, often in the center of the image
- Fixed test set, usually from the same overall dataset as training
- MTurk filtering, pruning responses, long training times, …
- How does it do on real data? New classes?

Instant Object Recognition Paper

1. User proposes new object class
2. System gathers images from flickr
3. Repeat until convergence:
   a. Choose windows to label
   b. Get labels from MTurk
   c. Improve classifier (detector)
4. Also evaluate on Pascal VOC

- Which windows to pick?
- Which images to label?
- What representation?
- How does it compare to the state of the art?

[S. Vijayanarasimhan & K. Grauman – Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds (CVPR 2011)]
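The loop above can be sketched as a skeleton. Every callable here (crawl, propose_windows, select_queries, get_mturk_labels, retrain, converged) is a hypothetical placeholder, not code from the paper:

```python
def live_active_learning(class_name, crawl, propose_windows, select_queries,
                         get_mturk_labels, retrain, converged):
    """Skeleton of the live learning loop; all callables are placeholders."""
    images = crawl(class_name)                        # e.g., keyword search on flickr
    detector, labeled = None, []
    while not converged(detector):
        windows = propose_windows(images, detector)   # e.g., jumping windows
        queries = select_queries(windows, detector)   # e.g., hyperplane hashing
        labeled += get_mturk_labels(queries)          # crowd annotation
        detector = retrain(labeled)                   # e.g., linear SVM
    return detector
```

The point of the skeleton is that each subsequent slide fills in one of these callables.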

Object Representation
Deformable parts: Root + Parts + Context
- P = 6 parts, from bootstrap set
- C = 3 context windows, excluding the object candidate, defined to the left, right, and above

Features: Sparse Max Pooling

                        Bag of Words                       Sparse Max Pooling
Base features           SIFT                               SIFT
Build vocabulary tree   ✔                                  ✔
Quantize features       Nearest neighbor, hard decision    Weighted nearest neighbors, sparse coded
Aggregate features      Spatial pyramid                    Max pooling

[Y.-L. Boureau, F. Bach, Y. LeCun, J. Ponce – Learning Mid-level Features for Recognition (CVPR 2010)]
[J. Yang, K. Yu, Y. Gong, T. Huang – Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification (CVPR 2009)]
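A toy illustration of the quantization and pooling differences in the table above. The soft weights are a made-up stand-in for the paper's sparse-coding solver; the contrast between hard/soft assignment and average/max pooling is the point:

```python
import numpy as np

def hard_quantize(features, vocab):
    """Bag-of-words: assign each descriptor to its single nearest codeword."""
    d = ((features[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    codes = np.zeros((len(features), len(vocab)))
    codes[np.arange(len(features)), d.argmin(1)] = 1.0
    return codes

def soft_quantize(features, vocab, k=3):
    """Weighted assignment to the k nearest codewords: a toy stand-in for
    the sparse-coded quantization in the paper."""
    d = ((features[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    codes = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k]
    for i, idx in enumerate(nn):
        w = np.exp(-d[i, idx])           # made-up weighting, not the paper's
        codes[i, idx] = w / w.sum()
    return codes

def avg_pool(codes):
    """Average pooling (a histogram), as in classic bag-of-words."""
    return codes.mean(axis=0)

def max_pool(codes):
    """Max pooling: keep only the strongest response per codeword."""
    return codes.max(axis=0)
```

In the actual pipeline the pooling is done per spatial-pyramid cell; this sketch pools over the whole image for brevity.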

How to Generate Root Windows?
100,000s of possible locations, aspect ratios, and sizes × 1000s of images = too many possibilities!

Jumping Windows
- Training image: build a lookup table of how frequently a given feature in a grid cell predicts the bounding box
- Novel query image: use the lookup table to vote for candidate windows, à la the generalized Hough transform
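The voting scheme can be sketched with a toy lookup table. The visual words, grid cells, and relative boxes below are all hypothetical; the real method quantizes features against a learned vocabulary:

```python
from collections import defaultdict

def build_jump_table(training_examples):
    """Map (visual_word, grid_cell) -> relative bounding boxes seen at training.
    Each example: (word, cell, rel_box), where rel_box = (dx, dy, w, h) is the
    object box expressed relative to the feature location."""
    table = defaultdict(list)
    for word, cell, rel_box in training_examples:
        table[(word, cell)].append(rel_box)
    return table

def vote_windows(query_features, table):
    """Each query feature 'jumps' to the windows its (word, cell) pair predicted
    at training time; votes accumulate Hough-style. Returns (window, votes)
    pairs sorted by vote count."""
    votes = defaultdict(int)
    for word, cell, (fx, fy) in query_features:
        for (dx, dy, w, h) in table.get((word, cell), []):
            votes[(fx + dx, fy + dy, w, h)] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

Only the top-voted windows need to be scored by the detector, which is why this is so much cheaper than sliding a window everywhere.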

Pick Examples via Hyperplane Hashing
- Want to label "hard" examples near the hyperplane boundary
- But the hyperplane keeps changing, so distances would have to be recomputed…
- Instead, hash all unlabeled examples into a table
- At run time, hash the current hyperplane to get an index into the table, and pick the examples close to it

[P. Jain, S. Vijayanarasimhan & K. Grauman – Hashing Hyperplane Queries to Near Points with Applications to Large-Scale Active Learning (NIPS 2010)]
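The criterion being approximated is margin-based uncertainty sampling. A brute-force sketch of that criterion (the paper replaces this linear scan with the sub-linear hashing lookup; `select_uncertain` is a hypothetical name):

```python
import numpy as np

def select_uncertain(w, b, X, n=5):
    """Return indices of the n unlabeled examples closest to the decision
    hyperplane w·x + b = 0. This O(N) scan is exactly what hyperplane hashing
    avoids recomputing every time the SVM is retrained."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    return np.argsort(dist)[:n]
```

With hashing, the table over X is built once, and each retrained hyperplane is hashed as a query to retrieve (approximately) these same low-margin examples.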

Comparison on Pascal VOC
Comparable to the state of the art, better on a few classes
Many fewer annotations required!
Training time is 15 minutes, vs. 7 hours (LSVM) vs. 1 week (SP+MKL)

Online Live Learning for Pascal
Comparable to the state of the art, better on a few classes
But using flickr data instead of Pascal training data, and automatically

Sample Results
[Figure: example detections, correct and incorrect]

Lessons Learned
- It is possible to leave the sandbox, and still do well on sandbox evaluations
- Sparse max pooling with a part model works well; linear SVMs can be competitive with these features
- Jumping windows is MUCH faster than sliding windows
- Picking which examples to get labeled is a big win; linear SVMs also allow for fast hyperplane hashing

Limitations
"Hell is other people users"
(with apologies to Jean-Paul Sartre)

Solving Real Problems for Users
- Users want to do stuff
- It doesn't work well enough
- Users express their displeasure gracefully*

*With apologies to John Gabriel

…And Never The Twain Shall Meet?
[Figure: the gap between the Pascal VOC results from the previous paper (current best of vision algorithms) and user expectation]

Unsolved Vision Problems
Segmentation, Detection, Shape Estimation, Optical Flow, Recognition, Stereo, Tracking, Classification, Geometry
Simplify the problem!

Columbia University, University of Maryland, Smithsonian Institution

Easier Segmentation for Leafsnap

Plants vs Birds

Plants                              Birds
2D                                  3D
Doesn't move                        Moves
Okay to pluck from tree             Not okay to pluck from tree
Mostly single color                 Many colors
Very few parts                      Many parts
Adequately described by boundary    Not well described by boundary
Relatively easy to segment          Hard to segment

Human-Computer Cooperation

Q: What color is it?     A: Red!
Q: Where's the beak?     A: Top-right!
Q: Where's the tail?     A: Bottom-left!
Q: Describe its beak.    A: Uh, it's pointy?
Q: Where is it?          A: Okay. Bottom-left!

[S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder, P. Perona, S. Belongie – Visual Recognition with Humans in the Loop (ECCV 2010)]

20 Questions

Information Gain for 20Q
Pick the most informative question to ask next, by maximizing the expected information gain:

  IG(u_i) = H(c | x, U) − Σ_{u_i} p(u_i | x, U) · H(c | x, U ∪ {u_i})

where H(c | x, U) is the entropy of the class c right now (given the image x and previous responses U), p(u_i | x, U) is the probability of getting response u_i, and H(c | x, U ∪ {u_i}) is the entropy of c given the image and a possible new response u_i.
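A small numeric sketch of this criterion, with toy questions rather than the paper's attribute model (`expected_info_gain` and `pick_question` are hypothetical names):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def expected_info_gain(p_class, p_resp_given_class):
    """IG = H(c) - sum_u p(u) H(c | u).
    p_class: current posterior p(c | x, U) over C classes
    p_resp_given_class: p(u | c) for each possible response, shape (R, C)."""
    p_class = np.asarray(p_class, dtype=float)
    h_now = entropy(p_class)
    expected_h = 0.0
    for p_u_given_c in p_resp_given_class:
        p_u = (p_u_given_c * p_class).sum()        # p(u | x, U)
        if p_u == 0:
            continue
        posterior = p_u_given_c * p_class / p_u    # Bayes update for response u
        expected_h += p_u * entropy(posterior)
    return h_now - expected_h

def pick_question(p_class, questions):
    """Ask the question with maximal expected information gain."""
    gains = [expected_info_gain(p_class, q) for q in questions]
    return int(np.argmax(gains))
```

A perfectly discriminative yes/no question over two equally likely classes yields exactly 1 bit of gain, while an uninformative one yields 0, which is what the selection rule exploits.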

Answers make the distribution over classes peakier

Incorporating Computer Vision
Probability of class c, given the image and any set of responses, via Bayes' rule:

  p(c | x, U) ∝ p(U | c) · p(c | x)

Assume variations in user responses are NOT image-dependent, i.e. p(U | c, x) ≈ p(U | c). The vision-based probabilities p(c | x) affect the entropies!
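The posterior update can be sketched with toy numbers; the class-conditional independence of responses from the image is exactly the assumption on the slide (`posterior_with_vision` is a hypothetical name):

```python
import numpy as np

def posterior_with_vision(p_c_given_image, p_responses_given_class):
    """p(c | x, U) ∝ p(U | c) · p(c | x), assuming user responses are
    class-conditionally independent of the image.
    p_c_given_image: vision classifier output p(c | x)
    p_responses_given_class: one likelihood vector p(u_i | c) per response."""
    post = np.array(p_c_given_image, dtype=float)
    for lik in p_responses_given_class:
        post = post * np.asarray(lik, dtype=float)   # fold in each response
    return post / post.sum()                          # renormalize
```

Because the vision term p(c | x) enters every posterior, it reshapes the entropies and therefore changes which question the 20Q criterion picks next.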

Incorporating Computer Vision… …leads to different questions

Ask for User Confidences

Modeling User Responses is Effective!

Birds-200 Dataset

Results

With fewer questions, CV does better; with more questions, humans do better

Lessons Learned
- Computer vision is not (yet) good enough for users, but users can meet vision halfway
- Minimizing user effort is key!
- Users are not to be trusted (fully)
- Adding vision improves recognition
- For fine-scale categorization, attributes do better than 1-vs-all classifiers, if there are enough of them

[Chart: average # of questions needed, comparing a 200-way 1-vs-all classifier against 288/100/50/20/10 attribute classifiers]

Limitations
- Real system still requires much human effort
- Only birds
- Collecting and labeling data: crowdsourcing? Experts?
- Building a usable system
- Minimizing …

Visipedia