A Generic Approach for Image Classification Based on Decision Tree Ensembles and Local Sub-windows
Raphaël Marée, Pierre Geurts, Justus Piater, Louis Wehenkel
University of Liège, Belgium

Abstract
Problem
  Many application domains require classification of characters, symbols, faces, 3D objects, textures, …
  Specific feature extraction methods must be manually adapted when considering a new application.
Approach
  Recent and generic ML algorithm based on decision tree ensembles and working directly on pixel values
  Extension with local sub-window extraction
Results
  Competitive with the state of the art on four well-known datasets: MNIST, ORL, COIL-100, OUTEX
  Encouraging results for robustness (generalisation, rotation, scaling, occlusion)
Image classification
Many different kinds of problems

Usually tackled using:
  Problem-specific feature extraction, i.e. extracting a reduced set of "interesting" features from the initially huge number of pixels
  + Learning or matching algorithm

Our generic approach:
  Working directly on pixel values, i.e. without any feature extraction: images are described by the integer values (grey or RGB intensities) of all their pixels
  + Ensemble of decision trees
Global generic approach
Ensemble of extremely randomized trees (extra-trees)

Learning
Top-down induction algorithm like a classical decision tree (with tests at the internal nodes of the form [a_{k,l} < a_{th}] that compare the value of the pixel at position (k,l) to a threshold a_{th}), but:
  Test attributes and thresholds in internal nodes are chosen randomly,
  Each tree is fully developed until it perfectly classifies the images in the learning sample,
  Several extra-trees are built from the same learning sample.

Testing
Propagate the entire test image successively into all the trees (which involves comparing pixel values to thresholds in test nodes) and assign to the image the majority class among the classes given by the trees.
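The learning and testing steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the simplest fully random variant (one random pixel and one random integer threshold per node), images are flat lists of integer pixel values, and the names `build_tree`, `predict_tree`, `build_ensemble` and `predict` are hypothetical.

```python
import random
from collections import Counter


def build_tree(samples, rng):
    """Grow one fully developed random tree.

    samples: list of (pixels, label), where pixels is a flat list of ints.
    Assumption: a single random test per node (simplest variant).
    """
    labels = [label for _, label in samples]
    if len(set(labels)) == 1:
        return ("leaf", labels[0])
    n_pixels = len(samples[0][0])
    # Only pixels whose value varies in this node can separate the samples.
    varying = [k for k in range(n_pixels)
               if len({pixels[k] for pixels, _ in samples}) > 1]
    if not varying:
        # Identical images with different labels: fall back to a majority leaf.
        return ("leaf", Counter(labels).most_common(1)[0][0])
    k = rng.choice(varying)
    values = [pixels[k] for pixels, _ in samples]
    # Integer threshold in (min, max], so the test a_k < a_th leaves
    # at least one sample on each side.
    th = rng.randint(min(values) + 1, max(values))
    left = [s for s in samples if s[0][k] < th]
    right = [s for s in samples if s[0][k] >= th]
    return ("node", k, th, build_tree(left, rng), build_tree(right, rng))


def predict_tree(tree, pixels):
    """Propagate one image down a tree by comparing pixels to thresholds."""
    while tree[0] == "node":
        _, k, th, left, right = tree
        tree = left if pixels[k] < th else right
    return tree[1]


def build_ensemble(samples, n_trees, seed=0):
    """Build several random trees from the same learning sample."""
    rng = random.Random(seed)
    return [build_tree(samples, rng) for _ in range(n_trees)]


def predict(ensemble, pixels):
    """Assign the majority class among the classes given by the trees."""
    votes = Counter(predict_tree(tree, pixels) for tree in ensemble)
    return votes.most_common(1)[0][0]
```

Because every tree is grown until its leaves are pure, each tree reproduces the labels of the learning images exactly (as long as no two identical images carry different labels). The randomness only affects how unseen images are classified, which is why several trees are combined by voting.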
Local generic approach
Extra-trees and Sub-windows

Learning
Given a window size w_1 x w_2 and a large number N_w:
  Extract N_w sub-windows at random from the learning set images and assign to each sub-window the classification of its parent image;
  Build a model to classify these N_w sub-windows by using the w_1 x w_2 pixel values that characterize them.

Testing
Given the window size w_1 x w_2:
  Extract all possible sub-windows of size w_1 x w_2 from the test image;
  Apply the model to each sub-window;
  Assign to the image the majority class among the classes assigned to the sub-windows by the model.
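The sub-window extraction and voting steps above can be sketched as follows. This is an illustrative sketch only: images are assumed to be lists of rows of grey-level integers, w_1 is taken to be the window width and w_2 its height (the slides do not say which is which), and `classify_window` is a hypothetical callable standing in for whatever model was trained on the sub-windows.

```python
import random
from collections import Counter


def crop(image, top, left, w1, w2):
    """Flatten the window of width w1 and height w2 whose corner is (top, left)."""
    return [image[top + i][left + j] for i in range(w2) for j in range(w1)]


def random_subwindows(images, labels, w1, w2, n_w, seed=0):
    """Learning: draw n_w random sub-windows, each labelled like its parent image."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_w):
        idx = rng.randrange(len(images))
        image = images[idx]
        top = rng.randint(0, len(image) - w2)
        left = rng.randint(0, len(image[0]) - w1)
        out.append((crop(image, top, left, w1, w2), labels[idx]))
    return out


def all_subwindows(image, w1, w2):
    """Testing: enumerate every possible w1 x w2 sub-window of one image."""
    height, width = len(image), len(image[0])
    return [crop(image, top, left, w1, w2)
            for top in range(height - w2 + 1)
            for left in range(width - w1 + 1)]


def classify_image(image, w1, w2, classify_window):
    """Assign the majority class among the classes given to the sub-windows."""
    votes = Counter(classify_window(window)
                    for window in all_subwindows(image, w1, w2))
    return votes.most_common(1)[0][0]
```

The pairs returned by `random_subwindows` have the same shape as a learning sample of whole images (flat pixel list plus label), so any classifier that works on pixel vectors can be trained on them unchanged.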
Experiments: description

Database specification
Every image in each database is described by all its pixel values and belongs to one class.

Database    # images    # features      # classes
MNIST                   (28x28x1)       10
ORL                     (92x112x1)      40
COIL-100                (32x32x3)       100
OUTEX                   (128x128x3)     54

Database protocols
Each database is separated into two independent sets: the learning set (LS) of pre-classified images used to build a model, and the test set (TS) used to evaluate the model.

MNIST       LS: first images; TS: last remaining images
ORL         100 random runs; LS: 200 images; TS: 200 remaining images
COIL-100    LS: 1800 images (k*20°, k=0..17); TS: 5400 remaining images
OUTEX       LS: 432 images; TS: 432 remaining images
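As a worked check of the COIL-100 protocol, the split by viewing angle can be reproduced with a few lines of Python. The sketch assumes that each of the 100 objects is photographed every 5° (72 poses per object), which is not stated on the slide but matches the 1800 + 5400 image counts; the function name `split_coil100` and its parameters are hypothetical.

```python
def split_coil100(n_objects=100, pose_step=5, ls_step=20):
    """Split (object, angle) views into learning and test sets by pose angle.

    Assumption: one view every `pose_step` degrees per object.
    Views at k * ls_step degrees go to the learning set, the rest to the test set.
    """
    views = [(obj, angle)
             for obj in range(n_objects)
             for angle in range(0, 360, pose_step)]
    learning_set = [view for view in views if view[1] % ls_step == 0]
    test_set = [view for view in views if view[1] % ls_step != 0]
    return learning_set, test_set
```

With the default values this gives 18 learning views per object (k = 0..17) and 54 test views per object, so the test poses are always between 5° and 10° away from the nearest learning pose.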
Experiments: results

Error rates on test sets

Database    Extra-trees    Extra-trees + Sub-windows        State of the art
MNIST       3.26%          2.63% (w_1 = w_2 = 24)           12% … 0.7% [1]
ORL         4.56% ±        % ± 1.18 (w_1 = w_2 = 32)        7.5% … 0% [2]
COIL-100    %              0.39% (w_1 = w_2 = 16)           12.5% … 0.1% [3]
OUTEX       64.35%         2.78% (w_1 = w_2 = 4)            9.5% … 0.2% [4]

Computing times on OUTEX

                       Extra-trees    Extra-trees + Sub-windows
Learning               about 5 s      about 8 min
Testing (one image)    < 1 ms         about 0.6 s

[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, 1998
[2] R. Paredes and A. Perez-Cortes, Local representations and a direct voting scheme for face recognition, 2001
[3] S. Obdržálek and J. Matas, Object Recognition using Local Affine Frames on Distinguished Regions, 2002
[4] T. Mäenpää, M. Pietikäinen, and J. Viertola, Separating color and pattern information for color texture discrimination, 2002
Evaluation of Robustness
Generalisation: considering different learning sample sizes (COIL-100)
Rotation: image-plane rotation of the test images (COIL-100)
Scaling: scaled versions of the test images, with the model built from 32x32 images (COIL-100)
Occlusion: erasing right parts of the test images (COIL-100)
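The occlusion test is the easiest of the four to illustrate. The sketch below erases the rightmost part of a test image before it is classified. It is only a sketch: the slide does not say which value replaces the erased pixels, so the `fill=0` default is an assumption, and the function name `occlude_right` is hypothetical.

```python
def occlude_right(image, fraction, fill=0):
    """Return a copy of `image` with its rightmost columns erased.

    image: list of rows of pixel values.
    fraction: share of the image width to erase, between 0.0 and 1.0.
    fill: value written into erased pixels (assumed, not from the slides).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    width = len(image[0])
    n_erased = int(width * fraction)
    keep = width - n_erased
    return [row[:keep] + [fill] * n_erased for row in image]
```

Sweeping `fraction` from 0 upwards and measuring the error rate at each step gives a robustness curve. Sub-windows taken entirely from the untouched left part keep exactly the same pixel values, which is one way to see why a sub-window based classifier could tolerate partial erasure.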
Conclusion
Novel, generic, and simple method

Competitive accuracy
  Our local generic method (Extra-trees + Sub-windows) is close to state-of-the-art methods without any problem-specific feature extraction, but still slightly inferior to the best results.
  In practice, is it necessary to develop specific methods to gain slightly better accuracy?

Invariance
  Robustness to small transformations in test images
  The local approach is more robust than the global approach (many local feature vectors are left more or less intact by a given image transformation).

Future work directions
Improving robustness
  Augmenting the learning sample with transformed versions of the original images
  Normalization of sub-window sizes and orientations
Speed/accuracy trade-off for prediction
Combining Sub-windows with other Machine Learning algorithms