Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based image retrieval Datasets & Conclusions
Databases Caltech 101 Caltech 256 Pascal Visual Object Classes (VOC) LabelMe Slides from Andrew Zisserman
Caltech 101 Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. The size of each image is roughly 300 x 200 pixels. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. Train on 5, 10, 15, 20 or 30 images per category. Test on the rest and report results per class.
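The train/test protocol on this slide can be sketched in a few lines. This is a minimal illustration, not an official evaluation script: the function names, the random seed, and the choice to summarize per-class results by their unweighted mean are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_train, seed=0):
    """Pick n_train training indices per class; all remaining indices are test."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed (assumption) so the split is repeatable
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return train, test


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for every class present in true_labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        correct[t] += int(t == p)
    return {c: correct[c] / total[c] for c in total}


def mean_per_class_accuracy(true_labels, predicted_labels):
    """Unweighted mean of the per-class accuracies (one possible summary)."""
    acc = per_class_accuracy(true_labels, predicted_labels)
    return sum(acc.values()) / len(acc)
```

Reporting per class matters because category sizes differ widely: with 2 test images of class "a" (one wrong) and 4 of class "b" (all correct), overall accuracy is 5/6, while the mean per-class accuracy is 0.75.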
Caltech 101 images
Caltech-101: Drawbacks Smallest category size is 31 images. Too easy? –Left-right aligned –Rotation artifacts –Performance will soon saturate
Caltech-256 Smallest category size now 80 images About 30K images Harder –Not left-right aligned –No artifacts –Performance is halved –More categories New and larger clutter category
Caltech 256 images baseball-bat basketball-hoop dog kayak traffic light
The PASCAL Visual Object Classes (VOC) Dataset and Challenge Mark Everingham Luc Van Gool Chris Williams John Winn Andrew Zisserman
The PASCAL VOC Challenge Challenge in visual object recognition funded by PASCAL network of excellence Publicly available dataset of annotated images. Development kit available. Main competitions in classification (is there an X in this image) and detection (where are the X’s) “Taster competitions” in segmentation and 2-D human “pose estimation” (2007-present)
Dataset Content 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, TV. Real images downloaded from Flickr, not filtered for “quality”. Complex scenes, scale, pose, lighting, occlusion,...
Annotation Complete annotation of all objects. Annotated in one session with written guidelines. –Truncated: object extends beyond the bounding box (BB) –Occluded: object is significantly occluded within BB –Pose: e.g. facing left –Difficult: not scored in evaluation
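The per-object flags listed on this slide map naturally onto a small record type. The sketch below reads them from an XML snippet; the element names (`object`, `name`, `pose`, `truncated`, `occluded`, `difficult`, `bndbox`) and the example values are assumptions modelled on a typical VOC-style annotation file, so check them against the development kit before relying on them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical annotation for one image; the layout and values are assumptions.
EXAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>horse</name>
    <pose>Left</pose>
    <truncated>1</truncated>
    <occluded>0</occluded>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <occluded>1</occluded>
    <difficult>1</difficult>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>
"""


@dataclass
class ObjectAnnotation:
    name: str
    bbox: tuple      # (xmin, ymin, xmax, ymax) in pixels
    pose: str
    truncated: bool  # object extends beyond the bounding box
    occluded: bool   # object is significantly occluded within the box
    difficult: bool  # not scored in evaluation


def _flag(obj, tag):
    """Read a 0/1 flag; a missing element is treated as 0."""
    return obj.findtext(tag, default="0").strip() == "1"


def parse_objects(xml_text):
    """Return one ObjectAnnotation per <object> element in the annotation."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        bbox = tuple(int(box.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append(ObjectAnnotation(
            name=obj.findtext("name").strip(),
            bbox=bbox,
            pose=obj.findtext("pose", default="Unspecified").strip(),
            truncated=_flag(obj, "truncated"),
            occluded=_flag(obj, "occluded"),
            difficult=_flag(obj, "difficult"),
        ))
    return objects
```

An evaluation script would typically keep only the objects with `difficult` set to false, for example `[o for o in parse_objects(EXAMPLE_XML) if not o.difficult]`.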
Examples Aeroplane Bus Bicycle Bird Boat Bottle Car Cat Chair Cow
History New dataset annotated annually –Annotation of test set is withheld until after challenge
Year | Images | Objects | Classes | Entries | Notes
2005 | 2,232 | 2,… | | | Collection of existing and some new data
2006 | …,304 | 9,… | | | Completely new dataset from flickr (+MSRC)
2007 | 9,963 | 24,… | | | Increased classes to 20. Introduced tasters
2008 | …,776 | 20,739 | 20 | | Added “occlusion” flag. Reuse of taster data. Release detailed results to support “meta-analysis”
Main Challenge Tasks Classification –Is there a dog in this image? –Evaluation by precision/recall Detection –Localize all the people (if any) in this image –Evaluation by precision/recall based on bounding box overlap
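The detection evaluation above, precision/recall based on bounding-box overlap, can be sketched as follows. This is an illustrative simplification, not the challenge's evaluation code: the intersection-over-union measure, the 0.5 threshold, the continuous (non pixel-inclusive) box areas, and all function and parameter names are assumptions for the example.

```python
def box_area(b):
    """Area of a box given as (xmin, ymin, xmax, ymax)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a, b):
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, min_overlap=0.5):
    """Precision/recall points for one class.

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    Returns a list of (precision, recall), one point per detection,
    visiting detections in order of decreasing score.
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp = fp = 0
    curve = []
    for image_id, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt_box in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt_box)
            if overlap > best:
                best, best_j = overlap, j
        if best >= min_overlap and not matched[image_id][best_j]:
            matched[image_id][best_j] = True  # each object can be claimed once
            tp += 1
        else:
            fp += 1  # poor overlap, or a duplicate of an already matched object
        curve.append((tp / (tp + fp), tp / n_gt if n_gt else 0.0))
    return curve
```

Counting a second detection of an already matched object as a false positive is a design choice that penalizes detectors for firing several times on the same object.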
Person detection Example Precision/Recall: 2007
LabelMe Russell, Torralba, Freeman, 2005
Links to datasets
The next tables summarize some of the available datasets for training and testing object detection and recognition algorithms. These lists are far from exhaustive.
Dataset | URL | Annotation | Contents
Databases for object localization
CMU/MIT frontal faces | vasc.ri.cmu.edu/idb/html/face/frontal_images, cbcl.mit.edu/software-datasets/FaceData2.html | Patches | Frontal faces
Graz-02 Database | www.emt.tugraz.at/~pinz/data/GRAZ_02/ | Segmentation masks | Bikes, cars, people
UIUC Image Database | l2r.cs.uiuc.edu/~cogcomp/Data/Car/ | Bounding boxes | Cars
TU Darmstadt Database | www.vision.ethz.ch/leibe/data/ | Segmentation masks | Motorbikes, cars, cows
LabelMe dataset | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundary | >500 Categories
Databases for object recognition
Caltech 101 | www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html | Segmentation masks | 101 categories
Caltech 256 | | Bounding Box | 256 Categories
COIL | www1.cs.columbia.edu/CAVE/research/softlib/coil-100.html | Patches | 100 instances
NORB | www.cs.nyu.edu/~ylclab/data/norb-v1.0/ | Bounding box | 50 toys
On-line annotation tools
ESP game | www.espgame.org | Global image descriptions | Web images
LabelMe | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundary | High resolution images
Collections
PASCAL | http:// | boxes | various
Topics not covered Context –Scene –Inter-object relations Video –Tracking & detection Multiple viewpoints
Summary Methods reviewed here –Bag of words –Bag of words with location –Parts and structure –Discriminative methods –Combined Segmentation and recognition –Recognition for retrieval Resources online: –Slides –Code –Links to datasets