Hierarchical Motion Evolution for Action Recognition Authors: Hongsong Wang, Wei Wang, Liang Wang Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Outline
  Introduction
  Method
  Experiments
  Conclusions
Action Recognition
  Action definition
    – a series of temporal motions
  Local motion
    – appearance evolution
  Global motion
    – motion evolution
Traditional Method
  Traditional method
    – local spatio-temporal features
    – encoding schemes
  Advantages
    – discriminative local motion
    – state-of-the-art performance
  Disadvantages
    – no global motion
Deep Method
  Feature learning
    – replaces hand-crafted features with learned features
    – no global motion
  End-to-end architecture
    – motion features are hard to learn
    – high computational complexity
VideoDarwin
  VideoDarwin method [1]
    – a function capable of ordering the frames temporally captures appearance evolution
    – regard the frame sequence as an ordered list and learn a ranking function
    – use the parameters of the ranking function as the video representation
  [1] B. Fernando et al., Modeling video evolution for action recognition. In CVPR, 2015.
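The idea of using ranking-function parameters as a video representation can be sketched roughly as follows. This is a minimal illustration, not the solver used in [1]: it assumes ridge regression onto the frame index as a stand-in for the ranking machine, and the function name `rank_pool` and the parameter `reg` are hypothetical.

```python
import numpy as np

def rank_pool(frames, reg=1e-3):
    """Return a vector u such that the score u . v_t tends to grow with t.

    frames: array of shape (T, D), one feature vector per frame.
    Assumption: ridge regression onto the frame index stands in for the
    ranking machine; the slides do not specify the solver.
    """
    frames = np.asarray(frames, dtype=float)
    T, D = frames.shape
    # Regression targets: centred frame indices 1..T.
    t = np.arange(1, T + 1, dtype=float)
    t -= t.mean()
    # Centre the features so no bias term is needed.
    X = frames - frames.mean(axis=0)
    # Closed-form ridge solution: (X^T X + reg * I)^-1 X^T t
    u = np.linalg.solve(X.T @ X + reg * np.eye(D), X.T @ t)
    return u  # the parameters serve as the video representation

# Toy video: features drift steadily along one direction, plus small noise.
rng = np.random.default_rng(0)
direction = np.array([1.0, -0.5, 0.25, 0.0, 2.0])
toy_frames = np.outer(np.arange(20), direction) + 0.01 * rng.normal(size=(20, 5))
u = rank_pool(toy_frames)
scores = toy_frames @ u  # later frames should receive higher scores
```

On this toy sequence the learned scores preserve the temporal order of the frames, which is the property the ranking function is meant to capture.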
Hierarchical Motion Evolution (1/3)
  The weaknesses of VideoDarwin
    – a single ranking machine cannot capture the global ordering of a long video sequence
    – sensitive to large appearance changes
  Proposed hierarchical motion evolution structure
    – abstracts semantic information in a hierarchical way
    – captures the global, high-level ordering of motion evolution
    – robust to large appearance changes
Hierarchical Motion Evolution (2/3)
  Hierarchical motion evolution
    – first layer: separate ranking machines model the local order within video clips
    – second layer: another ranking machine models the global order
Hierarchical Motion Evolution (3/3)
  Robust to large appearance changes
    – an action is composed of a series of ordered motions
    – output of the first layer: local motion representation
    – second layer: models motion evolution
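The two-layer structure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: ridge regression again stands in for each ranking machine, and the clip length `clip_len`, the `stride`, and both function names are hypothetical choices not given in the slides.

```python
import numpy as np

def rank_pool(seq, reg=1e-3):
    """Ridge-regression stand-in for a ranking machine (an assumption).

    seq: array of shape (T, D); returns a parameter vector of shape (D,).
    """
    seq = np.asarray(seq, dtype=float)
    T, D = seq.shape
    t = np.arange(1, T + 1, dtype=float)
    t -= t.mean()
    X = seq - seq.mean(axis=0)
    return np.linalg.solve(X.T @ X + reg * np.eye(D), X.T @ t)

def hierarchical_rank_pool(frames, clip_len=10, stride=5, reg=1e-3):
    """Two-layer pooling: local order per clip, then global order over clips.

    clip_len and stride are hypothetical; the slides do not state how the
    video is divided into clips.
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    if T < clip_len:
        raise ValueError("video is shorter than one clip")
    starts = range(0, T - clip_len + 1, stride)
    # First layer: one ranking machine per clip gives a local motion
    # representation for that clip.
    local = np.stack([rank_pool(frames[s:s + clip_len], reg) for s in starts])
    # Second layer: another ranking machine orders the clip-level
    # representations, modelling how motion evolves over the whole video.
    return rank_pool(local, reg)

# Toy usage: 40 frames with 6-dimensional features.
rng = np.random.default_rng(0)
video = rng.normal(size=(40, 6))
representation = hierarchical_rank_pool(video)
```

The second layer sees only clip-level parameter vectors, not raw frame features, which is one way to read the claim that the hierarchy is less sensitive to large appearance changes.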
Experiments
  MPII cooking activities dataset [3]
  ChaLearn 2013 Gesture dataset [6]
  [1] B. Fernando et al., Modeling video evolution for action recognition. In CVPR,
  [2] T. Pfister et al., Domain-adaptive discriminative one-shot learning of gestures. In ECCV,
  [3] M. Rohrbach et al., A database for fine grained activity detection of cooking activities. In CVPR,
  [4] J. Wu et al., Fusing multi-modal features for gesture recognition. In ICMI,
  [5] A. Yao et al., Gesture recognition portfolios for personalization. In CVPR,
  [6] S. Escalera et al., Multi-modal gesture recognition challenge 2013: Dataset and results. In ICMI, 2013.
Parameter Evaluation
Conclusions
  We propose a novel hierarchical method for learning video representations that considers both local and global motion.
  Our video representation achieves state-of-the-art results in fine-grained action recognition and gesture recognition.
THANK YOU Suggestions Questions