NEIL: Extracting Visual Knowledge from Web Data Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta Carnegie Mellon University CS381V Visual Recognition - Paper Presentation
How to Train Object Detectors? …. Collect and label training data Train Detectors …. Time consuming Expensive
At a Large Scale? 800M+ images daily!
NEIL: Never Ending Image Learner Running 24 hours a day, 7 days a week Semantically understand Web images Build visual knowledge base Continue to build better classifiers and detectors
NEIL’s Knowledge Base Visual Instances Labeled by NEIL Object categories Scenes Attributes Relationships Object - Objects Scenes - Objects Objects – Attributes Scenes – Attributes
Objects Camry Slide credit: Xinlei Chen
Scenes Parking Lot Raceway Slide credit: Xinlei Chen
Attributes Round Shape Crowded Slide credit: Xinlei Chen
Object - Object Partonomy Taxonomy or Similarity Wheel is a part of Car Corolla is a kind of/looks similar to of Car Relationships Slide credit: Xinlei Chen
Object - Scene Car is found in Raceway Relationships Slide credit: Xinlei Chen
Object - Attribute Wheel is/has Round shape Relationships Slide credit: Xinlei Chen
Scene – Attribute Bamboo forest is/has Vertical lines Relationships Slide credit: Xinlei Chen
(0) Seed Images Desktop Computer Monitor Keyboard 1.No Bounding-boxes 2.Noise 3.Multiple Meanings (Polysemy) Slide credit: Xinlei Chen
(0) Seed Images Desktop Computer Monitor Keyboard (1) (2) (3) Desktop Computer (1) (2) (3) Monitor (1) (2) (3) Keyboard (1) (2) (3) Television (1) Subcategory Discovery Slide credit: Xinlei Chen
Car Slide credit: Xinlei Chen Exemplar Detectors
Affinity Graph Slide credit: Xinlei Chen
Falcon Slide credit: Xinlei Chen Polysemy
(0) Seed Images Desktop Computer Monitor Keyboard (1) (2) (3) Desktop Computer (1) (2) (3) Monitor (1) (2) (3) Keyboard (1) (2) (3) Television Desktop Computer (1) Desktop Computer (2) Desktop Computer (3) … Monitor (1) … (1) Subcategory Discovery (2) Train Models Slide credit: Xinlei Chen
Train Models Latent SVM Objects, Attributes CHOG Linear SVM Scenes, Attributes Color, Texton, HOG, SIFT, GIST … Your model? Slide credit: Xinlei Chen
(1) (2) (3) Desktop Computer (1) (2) (3) Monitor (1) (2) (3) Keyboard (1) (2) (3) Television Desktop Computer (1) Desktop Computer (2) Desktop Computer (3) … Monitor (1) … (1) Subcategory Discovery (2) Train Models (3) Relationship Discovery (0) Seed Images Desktop Computer Monitor Keyboard Slide credit: Xinlei Chen
Relationship Discovery Keyboard is a part of Desktop Computer Monitor is a part of Desktop Computer Television looks similar to Monitor Learned relationships: N Concepts Keyboard Desktop Computer Keyboard Desktop Computer Macro Vision Slide credit: Xinlei Chen
(1) (2) (3) Desktop Computer (1) (2) (3) Monitor (1) (2) (3) Keyboard (1) (2) (3) Television Keyboard is a part of Desktop Computer Monitor is a part of Desktop Computer Television looks similar to Monitor Learned relationships: Desktop Computer (1) Desktop Computer (2) Desktop Computer (3) … Monitor (1) … (1) Subcategory Discovery (2) Train Models (3) Relationship Discovery (0) Seed Images Desktop Computer Monitor Keyboard Slide credit: Xinlei Chen
(2) Retrain Models (1) (2) (3) Desktop Computer (1) (2) (3) Monitor (1) (2) (3) Keyboard (1) (2) (3) Television Desktop Computer (1) Desktop Computer (2) Desktop Computer (3) … Monitor (1) … (1) Subcategory Discovery (2) Train Models (3) Relationship Discovery Desktop ComputerMonitorTelevision (4) Add New Instances (0) Seed Images Desktop Computer Monitor Keyboard Keyboard is a part of Desktop Computer Monitor is a part of Desktop Computer Television looks similar to Monitor Learned relationships: Slide credit: Xinlei Chen
Experimental Results Scene Classification A dataset of 600 images 12 scene categories Flickr images mAP Seed Classifier (15 Google Images)0.52 Bootstrapping (without relationships)0.54 NEIL Scene Classifiers0.57 NEIL (Classifiers + Relationships)0.62
Experimental Results Object Detection A dataset of 1000 images 15 object categories Flickr images mAP Latent SVM (450 Google Images)0.28 Latent SVM (450, Aspect Ratio Clustering)0.30 Latent SVM (450, HOG-based Clustering)0.33 Seed Detector (NEIL Clustering)0.44 Bootstrapping (without relationships)0.45 NEIL Detector0.49 NEIL Detector + Relationships0.51
Examples of Bounding Box Labeling
Examples of extracted common sense relationships
Discussion Points When do trained detectors converge Incorporate language models How to utilize deep neural networks for NEIL Train models on Internet images and test on existing benchmarks, possibly add domain adaptation to improve performance Utilize google extended image search for subcategory discovery
Related Work Xinlei Chen, Abhinav Shrivastava, Abhinav Gupta. Enriching Visual Knowledge Bases via Object Discovery and Segmentation. CVPR 2014.
Related Work Chen Xinlei, and Abhinav Gupta. Webly supervised learning of convolutional networks. ICCV 2015
Related Work Divvala, Santosh, Ali Farhadi, and Carlos Guestrin. Learning everything about anything: Webly- supervised visual concept learning. CVPR 2014
Related Work Carlson, Andrew, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr, and Tom M. Mitchell. Toward an Architecture for Never-Ending Language Learning. AAAI (NELL)