Action Recognition
Datasets: UCF101, HMDB51, Kinetics
HMDB51: 51 classes, about 7,000 clips
Kinetics: 400 classes, about 300,000 clips
Architectures: 3D Convnet; 2D Convnet → LSTM
3D Convnet
  Uses a 3D kernel to interpret temporal data
  Is slower to train as it
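To make the 3D kernel idea concrete, here is a minimal sketch in plain Python. It is not taken from any of the libraries named in these notes: `conv3d_valid`, the toy clip, and the kernel values are all hypothetical, chosen only to show that the kernel spans the time axis as well as height and width.

```python
def conv3d_valid(clip, kernel):
    """Naive single-channel 3D cross-correlation over (time, height, width).

    No padding and stride 1, so each output axis shrinks by (kernel size - 1).
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel covers kt consecutive frames, not just one image.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (assumption): 4 frames of 3x3 pixels; the centre pixel gets
# brighter by 1.0 each frame, everything else stays at 0.
clip = [
    [[float(t) if (y, x) == (1, 1) else 0.0 for x in range(3)] for y in range(3)]
    for t in range(4)
]

# Hypothetical kernel: 2 frames deep, 1x1 spatially. It subtracts the earlier
# frame from the later one, so it responds to change over time.
kernel = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, kernel)
print(len(response), len(response[0]), len(response[0][0]))
```

A 2D kernel applied frame by frame could not produce this response, because it never sees two frames at once.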
2D Convnet → LSTM
  Can use an image-recognition 2D convnet as a starting point to speed up training
  Can be very deep due to using LSTM
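The data flow of this architecture can be sketched in plain Python: each frame goes through a per-frame feature extractor, and an LSTM cell carries state across frames. Everything here is an illustrative assumption. `frame_features` is only a placeholder for a pretrained 2D convnet, the LSTM is reduced to scalar state, and the weights in `WEIGHTS` are made up rather than learned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def frame_features(frame):
    """Placeholder for a pretrained 2D convnet: one frame -> one feature.

    Here the "feature" is just mean brightness (an assumption for brevity).
    """
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell; w maps gate name -> (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x + w_h * h + b)

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much old cell state to keep
    o = gate("output", sigmoid)       # how much cell state to expose
    g = gate("candidate", math.tanh)  # proposed new cell content
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c


def encode_clip(clip, w):
    """Run the per-frame features through the LSTM; return the final hidden state."""
    h, c = 0.0, 0.0
    for frame in clip:
        h, c = lstm_step(frame_features(frame), h, c, w)
    return h


# Hypothetical fixed weights; in a real model these would be learned.
WEIGHTS = {name: (1.0, 0.5, 0.0) for name in ("input", "forget", "output", "candidate")}

dark_clip = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(5)]
bright_clip = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(5)]
print(encode_clip(dark_clip, WEIGHTS), encode_clip(bright_clip, WEIGHTS))
```

In a full model the final hidden state would feed a classifier over the action classes. The split also shows why a pretrained image network can be reused: only `frame_features` looks at pixels, and it never needs to know about time.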
Python, TensorFlow, Caffe, Examples
  Comfortable with Python
  Have used TensorFlow
  Need to finish installing Caffe
  Coding examples: https://github.com/yjxiong/temporal-segment-networks (HMDB51)