Hierarchical Motion Evolution for Action Recognition Authors: Hongsong Wang, Wei Wang, Liang Wang Center for Research on Intelligent Perception and Computing,

Presentation transcript:

Hierarchical Motion Evolution for Action Recognition
Authors: Hongsong Wang, Wei Wang, Liang Wang
Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Outline
- Introduction
- Method
- Experiments
- Conclusions

Action Recognition
- Action definition: a series of temporal motions
- Local motion: appearance evolution
- Global motion: motion evolution

Traditional Method
- Traditional method
  - local spatio-temporal features
  - encoding schemes
- Advantages
  - discriminative local motion
  - state-of-the-art performance
- Disadvantages
  - no global motion

Deep Method
- Feature learning
  - replaces hand-crafted features with learned features
  - no global motion
- End-to-end architecture
  - hard to learn motion features
  - high computational complexity

VideoDarwin
- VideoDarwin method [1]
  - a function capable of ordering the frames temporally captures appearance evolution
  - regard the frame sequence as an ordered list and learn a ranking function over it
  - use the parameters of the ranking function as the video representation

[1] B. Fernando et al., Modeling video evolution for action recognition. In CVPR, 2015.
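The ranking idea on this slide can be illustrated with a minimal sketch. It is not the implementation from [1]: it assumes each frame is already a small feature vector, and it swaps in a plain pairwise hinge loss trained by subgradient descent as the ranking machine. The names `rank_pool`, `epochs`, `lr`, and `reg` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_pool(frames: Sequence[Sequence[float]],
              epochs: int = 200,
              lr: float = 0.1,
              reg: float = 1e-3) -> Vector:
    """Learn w so that dot(w, frames[j]) > dot(w, frames[i]) whenever j > i.

    Assumption: a pairwise hinge loss with L2 regularisation stands in for
    the ranking machine; the learned w is returned as the representation.
    """
    dim = len(frames[0])
    w = [0.0] * dim
    # Every (earlier, later) pair of frames is one ordering constraint.
    pairs = [(i, j) for i in range(len(frames))
             for j in range(i + 1, len(frames))]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [later - earlier
                    for earlier, later in zip(frames[i], frames[j])]
            # The constraint is violated when the margin is below 1.
            violated = dot(w, diff) < 1.0
            for k in range(dim):
                grad = reg * w[k] - (diff[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


# Toy video: the first feature grows over time, the second is noise.
toy_frames = [[0.0, 1.0], [1.0, 0.9], [2.0, 1.1], [3.0, 1.0]]
w = rank_pool(toy_frames)
scores = [dot(w, f) for f in toy_frames]
```

On the toy input the learned `w` scores the frames in temporal order, and `w` itself is what would be passed to a classifier as the video descriptor.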

Hierarchical Motion Evolution (1/3)
- The weakness of VideoDarwin
  - one ranking machine cannot capture the global ordering of a long video sequence
  - sensitive to large appearance changes
- Proposed hierarchical motion evolution structure
  - abstracts semantic information in a hierarchical way
  - captures the global and high-level ordering of motion evolution
  - robust to large appearance changes

Hierarchical Motion Evolution (2/3)
- Hierarchical motion evolution
  - first layer: different ranking machines model the local order within video clips
  - second layer: another ranking machine models the global order

Hierarchical Motion Evolution (3/3)
- Robust to large appearance changes
  - an action is composed of a series of ordered motions
  - output of the first layer: local motion representation
  - second layer: models motion evolution
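The two-layer structure described on these slides can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the ranking machine is again a simple pairwise hinge-loss learner, clips are taken as fixed-length sliding windows, and the names `split_clips`, `hierarchical_rank_pool`, `clip_len`, and `stride` are hypothetical (the slides do not give the clip settings).

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def rank_pool(seq: Sequence[Sequence[float]], epochs: int = 100,
              lr: float = 0.1, reg: float = 1e-3) -> Vector:
    """Stand-in ranking machine: pairwise hinge loss, subgradient descent."""
    dim = len(seq[0])
    w = [0.0] * dim
    pairs = [(i, j) for i in range(len(seq)) for j in range(i + 1, len(seq))]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [b - a for a, b in zip(seq[i], seq[j])]
            violated = dot(w, diff) < 1.0
            for k in range(dim):
                w[k] -= lr * (reg * w[k] - (diff[k] if violated else 0.0))
    return w


def split_clips(frames: Sequence[Sequence[float]], clip_len: int,
                stride: int) -> List[Sequence[Sequence[float]]]:
    """Cut the video into fixed-length, possibly overlapping clips."""
    if len(frames) < clip_len:
        raise ValueError("video is shorter than one clip")
    return [frames[s:s + clip_len]
            for s in range(0, len(frames) - clip_len + 1, stride)]


def hierarchical_rank_pool(frames: Sequence[Sequence[float]],
                           clip_len: int = 4, stride: int = 2) -> Vector:
    # First layer: one ranking machine per clip models the local order.
    local_reps = [rank_pool(clip)
                  for clip in split_clips(frames, clip_len, stride)]
    # Second layer: another ranking machine orders the clip representations,
    # modelling the global evolution of the local motions.
    return rank_pool(local_reps)


# Toy video with 8 frames of 2-dimensional features.
toy_video = [[float(t), (t % 3) * 0.1] for t in range(8)]
toy_clips = split_clips(toy_video, clip_len=4, stride=2)
video_rep = hierarchical_rank_pool(toy_video)
```

The second layer never sees raw frames, only the first-layer parameters, which is how the slides motivate robustness to large appearance changes. With a single clip the second layer has no pairs to order and returns a zero vector, so a real setup would need at least two clips per video.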

Experiments
- MPII cooking activities dataset [3]
- ChaLearn 2013 Gesture dataset [6]

[1] B. Fernando et al., Modeling video evolution for action recognition. In CVPR, 2015.
[2] T. Pfister et al., Domain-adaptive discriminative one-shot learning of gestures. In ECCV.
[3] M. Rohrbach et al., A database for fine grained activity detection of cooking activities. In CVPR.
[4] J. Wu et al., Fusing multi-modal features for gesture recognition. In ICMI.
[5] A. Yao et al., Gesture recognition portfolios for personalization. In CVPR.
[6] S. Escalera et al., Multi-modal gesture recognition challenge 2013: Dataset and results. In ICMI, 2013.

Parameter Evaluation

Conclusions
- We propose a novel hierarchical method for learning video representations that considers both local motion and global motion.
- Our video representation achieves state-of-the-art results in fine-grained action recognition and gesture recognition.

THANK YOU Suggestions Questions