Recognizing and Learning Object Categories
Li Fei-Fei, UIUC; Rob Fergus, MIT; Antonio Torralba, MIT
ICCV 2005, Beijing
For the complete version of the slides and the source code for the demos, visit the course web site: http://people.csail.mit.edu/torralba/iccv2005/
Program
Introduction
Session 1: bag of words models
Session 2: parts-based models
Session 3: discriminative methods
Session 4: concurrent segmentation and recognition
Summary
Bag of words models
An image is represented by a collection of “visual words” and their counts, where the words come from a universal dictionary. Object categories are modeled by the distributions of these visual words. Although “bag of words” models can use both generative and discriminative approaches, here we focus on generative models.
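As a minimal sketch of the representation, assuming local descriptors have already been extracted as rows of a NumPy array: the dictionary is learnt here with plain k-means, each descriptor is assigned to its nearest visual word, and each category is modeled as a smoothed multinomial over words (a Naive Bayes classifier, one simple generative choice among many). The function names, the k-means step, and the smoothing parameter `alpha` are illustrative assumptions, not the course's code.

```python
import numpy as np

def build_dictionary(descriptors, k, iters=20, seed=0):
    """Learn k visual words with plain k-means (an assumption: the slides
    do not say how the universal dictionary is built)."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, dictionary):
    """Assign each descriptor to its nearest visual word and count the words."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(dictionary))

def fit_word_distributions(histograms_by_class, alpha=1.0):
    """Estimate p(word | class) from training histograms, add-alpha smoothed."""
    models = {}
    for label, hists in histograms_by_class.items():
        counts = np.sum(hists, axis=0) + alpha
        models[label] = counts / counts.sum()
    return models

def classify(histogram, models):
    """Pick the class maximizing sum_w n(w) log p(w | class) (uniform prior)."""
    scores = {label: float(histogram @ np.log(p)) for label, p in models.items()}
    return max(scores, key=scores.get)
```

The histogram deliberately discards where each word occurred in the image; that is the "bag" assumption, and it is what makes the representation robust to object deformation but blind to spatial layout.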
Part-based models
An object in an image is represented by a collection of parts, characterized by both their visual appearance and their locations. Object categories are modeled by the appearance and spatial distributions of these characteristic parts. A key issue for such models is finding correspondences between the object's parts and the scene efficiently.
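A rough sketch of the correspondence search, assuming a simplified star-shaped model: each part has a precomputed appearance cost map, plus a quadratic penalty for deviating from its expected offset relative to a reference point (for example the object center). This is one part-based formulation among several (the constellation model, for instance, models all part positions jointly); `fit_star_model`, `offsets`, and `lam` are hypothetical names.

```python
import numpy as np

def fit_star_model(appearance_costs, offsets, lam=1.0):
    """Place P parts around a reference point by brute-force search.

    appearance_costs: (P, H, W) array; lower means part i matches better at (y, x).
    offsets: (P, 2) array of expected (dy, dx) from the reference to each part.
    lam: weight of the quadratic spatial (deformation) penalty.
    Returns (best_total_cost, reference_yx, part_locations).
    """
    P, H, W = appearance_costs.shape
    ys, xs = np.mgrid[0:H, 0:W]  # coordinates of every candidate location
    best = (np.inf, None, None)
    for ry in range(H):
        for rx in range(W):
            total = 0.0
            placements = []
            for i in range(P):
                # Expected location of part i given this reference point.
                ey, ex = ry + offsets[i][0], rx + offsets[i][1]
                cost = appearance_costs[i] + lam * ((ys - ey) ** 2 + (xs - ex) ** 2)
                # Parts are independent given the reference, so each is
                # placed at its own minimum-cost location.
                idx = np.unravel_index(np.argmin(cost), cost.shape)
                total += cost[idx]
                placements.append((int(idx[0]), int(idx[1])))
            if total < best[0]:
                best = (float(total), (ry, rx), placements)
    return best
```

This brute-force search costs O(P (HW)^2), which is exactly the efficiency problem mentioned above; for quadratic penalties, generalized distance transforms reduce each part's minimization to time linear in the number of locations.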
Discriminative methods
Object detection and recognition are formulated as a classification problem. The image is covered by a set of overlapping windows, and for each window a decision is made about whether or not it contains a target object. Each window is represented by a large number of extracted features that encode information such as boundaries, texture, color, and spatial structure. The classification function, which maps an image window to a binary decision, is learnt using methods such as SVMs, boosting, or neural networks.
[Figure: example windows labeled "Zebra" and "Non-zebra"]
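The sliding-window pipeline can be sketched as follows, assuming a grayscale image stored as a 2-D NumPy array and a linear classifier whose weights `w` and bias `b` were learnt beforehand (for example by a linear SVM). The three-number descriptor in `window_features` is a hypothetical stand-in for the large feature sets described above; practical detectors use far richer features and also scan over multiple scales.

```python
import numpy as np

def sliding_windows(image, win, stride):
    """Yield (y, x, patch) for overlapping win x win windows."""
    H, W = image.shape
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            yield y, x, image[y:y + win, x:x + win]

def window_features(patch):
    """Hypothetical toy descriptor: mean intensity, gradient energy, and a
    coarse spatial cue (top-half mean minus bottom-half mean)."""
    gy, gx = np.gradient(patch.astype(float))
    half = patch.shape[0] // 2
    return np.array([
        patch.mean(),
        np.mean(gx ** 2 + gy ** 2),
        patch[:half].mean() - patch[half:].mean(),
    ])

def detect(image, w, b, win=8, stride=4):
    """Binary decision per window: report it as a target if w . f + b > 0."""
    hits = []
    for y, x, patch in sliding_windows(image, win, stride):
        if float(w @ window_features(patch)) + b > 0:
            hits.append((y, x))
    return hits
```

With a stride smaller than the window size, neighboring windows overlap, so one object usually triggers several nearby detections; detectors typically follow this step with non-maximum suppression.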
Segmentation and Recognition
The goal is to segment the image, at the pixel level, into the foreground object and background clutter. To assist the segmentation, probabilistic models of the object category may be learnt. The problem may be formulated as graphical model inference or as graph partitioning.
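For the graphical-model formulation, here is a minimal sketch assuming a per-pixel foreground probability map is already available (a hypothetical input that a learnt category model would supply). The labeling minimizes a unary data term plus a Potts smoothness term over 4-connected neighbors, using Iterated Conditional Modes (ICM). ICM is swapped in only for brevity: it reaches a local minimum, whereas graph cuts minimize this binary energy exactly.

```python
import numpy as np

def segment_icm(fg_prob, beta=1.0, iters=10, eps=1e-6):
    """Binary foreground/background labeling by Iterated Conditional Modes.

    fg_prob: (H, W) per-pixel probability of foreground (hypothetical input).
    beta:    Potts weight, the penalty for each disagreeing 4-neighbor pair.
    Energy:  E(L) = sum_p -log P(L_p) + beta * sum_{p~q} [L_p != L_q],
             with P(L_p = 1) = fg_prob[p] and P(L_p = 0) = 1 - fg_prob[p].
    """
    p = np.clip(fg_prob, eps, 1 - eps)
    unary = np.stack([-np.log(1 - p), -np.log(p)])  # cost of label 0 / label 1
    labels = (p > 0.5).astype(int)                   # start from per-pixel MAP
    H, W = p.shape
    for _ in range(iters):
        changed = False
        for y in range(H):
            for x in range(W):
                costs = unary[:, y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Penalize the label that disagrees with this neighbor.
                        costs[1 - labels[ny, nx]] += beta
                new = int(np.argmin(costs))
                if new != labels[y, x]:
                    labels[y, x] = new
                    changed = True
        if not changed:  # no pixel changed: a local minimum of E
            break
    return labels
```

Setting `beta=0` reduces this to thresholding the probability map at 0.5; larger values of `beta` fill small holes inside the object and remove isolated false positives in the background.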
Some chairs: related by function, not form
Some challenges
Links to datasets
The next tables summarize some of the available datasets for training and testing object detection and recognition algorithms. These lists are far from exhaustive.

Databases for object localization
Dataset | URL | Annotation | Content
CMU/MIT frontal faces | vasc.ri.cmu.edu/idb/html/face/frontal_images, cbcl.mit.edu/software-datasets/FaceData2.html | Patches | Frontal faces
Graz-02 Database | www.emt.tugraz.at/~pinz/data/GRAZ_02/ | Segmentation masks | Bikes, cars, people
UIUC Image Database | l2r.cs.uiuc.edu/~cogcomp/Data/Car/ | Bounding boxes | Cars
TU Darmstadt Database | www.vision.ethz.ch/leibe/data/ | | Motorbikes, cars, cows
LabelMe dataset | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundary | >500 categories

Databases for object recognition
Dataset | URL | Annotation | Content
Caltech 101 | www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html | Segmentation masks | 101 categories
COIL-100 | www1.cs.columbia.edu/CAVE/research/softlib/coil-100.html | Patches | 100 instances
NORB | www.cs.nyu.edu/~ylclab/data/norb-v1.0/ | Bounding box | 50 toys

On-line annotation tools
Tool | URL | Annotation | Content
ESP game | www.espgame.org | Global image descriptions | Web images
LabelMe | people.csail.mit.edu/brussell/research/LabelMe/intro.html | Polygonal boundary | High resolution images