Data Driven Attributes for Action Detection
Week 5
Presented by Christina Peterson
Bag of Words
- Implemented the Bag of Words in Matlab
- For each video, collect low-level features within the ground-truth bounding box
- At each frame, the box is divided into 3 x 3 cells
- For each cell, create a histogram for each feature (STIP, color, texture)
- For each bounding box, create a histogram for each feature
- Feature Vector is the concatenation of the histograms
- One Feature Vector for every bounding box
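The feature vector construction can be sketched as follows. This is a minimal illustration in plain Python, not the Matlab implementation from the slides. It assumes each low-level feature has already been quantized to a visual-word index; the names `cell_index`, `box_feature_vector`, and `vocab_sizes`, as well as the ordering of the histograms inside the vector, are hypothetical choices made for the example.

```python
def cell_index(x, y, box, grid=3):
    """Map a point inside box = (x0, y0, x1, y1) to a row-major cell index."""
    x0, y0, x1, y1 = box
    # Points on the right/bottom edge are clamped into the last cell.
    col = min(int((x - x0) / (x1 - x0) * grid), grid - 1)
    row = min(int((y - y0) / (y1 - y0) * grid), grid - 1)
    return row * grid + col


def box_feature_vector(points, box, vocab_sizes, grid=3):
    """Build one feature vector for one bounding box.

    points      -- list of (x, y, feature_name, word_id), already quantized
    vocab_sizes -- dict feature_name -> number of visual words (assumed known)
    Layout (an assumption): for each feature, the grid*grid cell histograms
    followed by the whole-box histogram, all concatenated.
    """
    names = list(vocab_sizes)
    cell_hists = {n: [[0] * vocab_sizes[n] for _ in range(grid * grid)] for n in names}
    box_hists = {n: [0] * vocab_sizes[n] for n in names}
    x0, y0, x1, y1 = box
    for x, y, name, word in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore features outside the bounding box
        cell_hists[name][cell_index(x, y, box, grid)][word] += 1
        box_hists[name][word] += 1
    vector = []
    for n in names:
        for hist in cell_hists[n]:
            vector.extend(hist)
        vector.extend(box_hists[n])
    return vector


# Usage with two hypothetical feature types and tiny vocabularies.
box = (0, 0, 30, 30)
points = [(5, 5, "stip", 1), (25, 25, "color", 2), (100, 100, "stip", 0)]
vec = box_feature_vector(points, box, {"stip": 2, "color": 3})
```

With a 3 x 3 grid, each feature contributes ten histograms (nine cells plus the whole box), so the vector length is ten times the sum of the vocabulary sizes.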
Exemplar-SVM
- Obtained source code from Tomasz Malisiewicz's website for "Ensemble of Exemplar-SVMs for Object Detection and Beyond" [1]
- Base the implementation on this code, adapted for the UCF Sports dataset
- For each action class, train one Exemplar-SVM for each bounding box of every video
- Optimize by reducing the bounding boxes to one cycle of the action
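As a rough illustration of what a single Exemplar-SVM is, the sketch below trains a linear SVM with exactly one positive example (the exemplar's feature vector) against a set of negatives, using plain subgradient descent on a weighted hinge loss. It is not the code from Malisiewicz's website and not the Matlab implementation planned here. The function names, the weights `c_pos` and `c_neg`, the learning rate, and the epoch count are all hypothetical values chosen for the example.

```python
def score(w, b, x):
    """Linear SVM score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_exemplar_svm(exemplar, negatives, c_pos=1.0, c_neg=0.01,
                       lr=0.01, epochs=200):
    """Train a linear SVM with one positive (the exemplar) and many negatives.

    Minimizes 0.5*||w||^2 + c_pos*hinge(exemplar) + c_neg*sum(hinge(negatives))
    by full-batch subgradient descent. All hyperparameters are illustrative.
    """
    dim = len(exemplar)
    w = [0.0] * dim
    b = 0.0
    samples = [(exemplar, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5*||w||^2
        grad_b = 0.0
        for x, y, c in samples:
            if y * score(w, b, x) < 1.0:  # hinge loss is active for this sample
                for i in range(dim):
                    grad_w[i] -= c * y * x[i]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Usage on toy 2-D vectors: one exemplar, four negatives.
exemplar = [1.0, 1.0]
negatives = [[-1.0, -1.0], [-1.0, 0.0], [0.0, -1.0], [-2.0, -1.0]]
w, b = train_exemplar_svm(exemplar, negatives)
```

One such model would be trained per bounding box, so the number of SVMs per action class equals the number of exemplar boxes kept for that class.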
Calibration
- Run each Exemplar-SVM on a validation set
- Use non-maximum suppression to remove redundant detections
- Compute the overlap score between the resulting detections and the ground-truth bounding boxes
- Detections that overlap the ground truth by more than 0.5 are positive
- Detections that overlap the ground truth by less than 0.2 are negative
- Fit a logistic function to these scores
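The calibration steps can be sketched end to end as below. This is a minimal sketch under several assumptions: overlap is measured as intersection over union, non-maximum suppression is the greedy variant with a hypothetical threshold of 0.5, and the logistic function is fitted by plain gradient descent on the log loss. All function names and the toy detections are made up for the example; only the 0.5 and 0.2 labelling thresholds come from the slide.

```python
import math


def overlap(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(dets, thresh=0.5):
    """Greedy NMS over (box, score) pairs; thresh is an assumed value."""
    kept = []
    for box, s in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(overlap(box, k[0]) <= thresh for k in kept):
            kept.append((box, s))
    return kept


def label_detections(dets, gt_box, pos=0.5, neg=0.2):
    """Return (score, label) pairs; overlaps between neg and pos are ignored."""
    pairs = []
    for box, s in dets:
        o = overlap(box, gt_box)
        if o > pos:
            pairs.append((s, 1))
        elif o < neg:
            pairs.append((s, 0))
    return pairs


def fit_sigmoid(pairs, lr=0.5, iters=2000):
    """Fit p = 1 / (1 + exp(-(a*s + b))) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    n = len(pairs)
    for _ in range(iters):
        ga = gb = 0.0
        for s, y in pairs:
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s
            gb += p - y
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b


def calibrate(s, a, b):
    """Map a raw SVM score to a calibrated value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a * s + b)))


# Usage on toy detections from one exemplar on one validation frame.
gt = (0, 0, 10, 10)
dets = [((0, 0, 10, 10), 1.5), ((1, 0, 11, 10), 1.2), ((0, 1, 10, 11), 0.9),
        ((5, 5, 15, 15), 0.2), ((20, 20, 30, 30), -0.5), ((40, 40, 50, 50), -1.0)]
kept = non_max_suppression(dets)
pairs = label_detections(kept, gt)
a, b = fit_sigmoid(pairs)
```

Fitting one logistic function per exemplar puts the raw scores of different Exemplar-SVMs on a comparable scale, which is what allows their detections to be ranked together.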
Goals
- Implement the Exemplar-SVMs
- Implement the calibration step as detailed in Malisiewicz et al. [1]
- Research other calibration methods to improve this step
References
[1] T. Malisiewicz, A. Gupta, and A. A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In Proc. ICCV, 2011.
[2] Y. Tian, R. Sukthankar, and M. Shah. Spatiotemporal Deformable Part Models for Action Detection. In Proc. CVPR, 2013.