Recognition: A machine learning approach Slides adapted from Fei-Fei Li, Rob Fergus, Antonio Torralba, Kristen Grauman, and Derek Hoiem
The machine learning framework Apply a prediction function to a feature representation of the image to get the desired output: f( ) = “apple” f( ) = “tomato” f( ) = “cow”
The machine learning framework y = f(x) Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set Testing: apply f to a never before seen test example x and output the predicted value y = f(x) output prediction function Image feature
Steps Training Testing Training Labels Training Images Image Features Learned model Learned model Testing Image Features Prediction Test Image Slide credit: D. Hoiem
Features Raw pixels Histograms GIST descriptors …
Classifiers: Nearest neighbor Training examples from class 2 Training examples from class 1 Test example f(x) = label of the training example nearest to x All we need is a distance function for our inputs No training required!
Classifiers: Linear Find a linear function to separate the classes: f(x) = sgn(w x + b)
Recognition task and supervision Images in the training set must be annotated with the “correct answer” that the model is expected to produce Contains a motorbike
Unsupervised “Weakly” supervised Fully supervised Definition depends on task
Test set (labels unknown) Generalization Training set (labels known) Test set (labels unknown) How well does a learned model generalize from the data it was trained on to a new test set?
Generalization Components of generalization error Bias: how much the average model over all training sets differ from the true model? Error due to inaccurate assumptions/simplifications made by the model Variance: how much models estimated from different training sets differ from each other Underfitting: model is too “simple” to represent all the relevant class characteristics High bias and low variance High training error and high test error Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data Low bias and high variance Low training error and high test error
Bias-variance tradeoff Underfitting Overfitting Complexity Low Bias High Variance High Bias Low Variance Error Test error Training error Slide credit: D. Hoiem
Bias-variance tradeoff Complexity Low Bias High Variance High Bias Low Variance Test Error Few training examples Many training examples Note: these figures don’t work in pdf Slide credit: D. Hoiem
Effect of Training Size Fixed prediction model Number of Training Examples Error Testing Generalization Error Training Slide credit: D. Hoiem
Datasets Circa 2001: 5 categories, 100s of images per category Today: up to thousands of categories, millions of images
Caltech 101 & 256 http://www.vision.caltech.edu/Image_Datasets/Caltech101/ http://www.vision.caltech.edu/Image_Datasets/Caltech256/ Griffin, Holub, Perona, 2007 Fei-Fei, Fergus, Perona, 2004
Caltech-101: Intraclass variability
The PASCAL Visual Object Classes Challenge (2005-present) http://pascallin.ecs.soton.ac.uk/challenges/VOC/ Challenge classes: Person: person Animal: bird, cat, cow, dog, horse, sheep Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor
The PASCAL Visual Object Classes Challenge (2005-present) http://pascallin.ecs.soton.ac.uk/challenges/VOC/ Main competitions Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image
The PASCAL Visual Object Classes Challenge (2005-present) http://pascallin.ecs.soton.ac.uk/challenges/VOC/ “Taster” challenges Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise Person layout: Predicting the bounding box and label of each part of a person (head, hands, feet)
The PASCAL Visual Object Classes Challenge (2005-present) http://pascallin.ecs.soton.ac.uk/challenges/VOC/ “Taster” challenges Action classification
LabelMe http://labelme.csail.mit.edu/ Russell, Torralba, Murphy, Freeman, 2008
80 Million Tiny Images http://people.csail.mit.edu/torralba/tinyimages/
ImageNet http://www.image-net.org/