CNN-based Action Recognition Using Adaptive Multiscale Depth Motion Maps and Stable Joint Distance Maps — Junyou He, Hailun Xia, Chunyan Feng, Yunfei Chu.


CNN-based Action Recognition Using Adaptive Multiscale Depth Motion Maps and Stable Joint Distance Maps
Junyou He, Hailun Xia, Chunyan Feng, Yunfei Chu
Beijing University of Posts and Telecommunications
GlobalSIP 2018, Nov. 27, 2018

OUTLINE
Motivations
The Proposed Method: Adaptive Multiscale Depth Motion Maps (AM-DMMs), Stable Joint Distance Maps (SJDMs), Input Preprocessing, Network Training & Class Score Fusion
Experiments Results
Conclusions

Motivations
Advantages of the depth modality: it provides 3D structural information and is insensitive to variations in lighting, but it contains significant flicker noise.
Skeleton data: more robust to noise, but not always reliable.
Each modality captures a certain kind of information that is likely to be complementary to the other. Thus, integrating the information from depth and skeleton is expected to improve recognition performance.

Motivations
The key to action recognition is the spatio-temporal information.
Handcrafted features (SIFT, color histograms, edge directions, ...) require domain knowledge and are shallow and dataset-dependent.
RNN-based methods find it difficult to memorize the entire sequence information and to extract high-level features.
Thus, we propose a compact and effective CNN-based method to capture the spatio-temporal information.

The Proposed Method
Generate AM-DMMs
Generate SJDMs
Input Preprocessing
Network Training & Class Score Fusion

Adaptive Multiscale Depth Motion Maps (AM-DMMs)
DMMs capture shape and motion information, but they suffer from loss of temporal information.
AM-DMMs capture more detailed motion cues and cope with speed variations in actions.

Adaptive Multiscale Depth Motion Maps (AM-DMMs)
AM-DMMs are generated from a sample video of the action Swipe left on three projection views.
DMM of a depth video sequence with N frames:
DMM_v = Σ_{i=1}^{N-1} |map_v^{i+1} − map_v^i|,
where i is the frame index, v denotes the projection view, and map_v^i is the projected map of frame i under view v.
The motion energy E(i) of the i-th frame is the number of non-zero elements in the binary difference map between consecutive projected frames.
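The DMM accumulation and the motion-energy measure above can be sketched in NumPy (a minimal illustration; the function names and the binarization threshold are assumptions, not taken from the paper):

```python
import numpy as np

def motion_energy(frame_a, frame_b, thresh=10):
    """E(i): number of non-zero elements in the thresholded binary
    difference map between two consecutive projected frames."""
    diff = np.abs(frame_b.astype(np.int32) - frame_a.astype(np.int32))
    return int(np.count_nonzero(diff > thresh))

def depth_motion_map(frames):
    """Standard DMM: accumulate absolute differences of consecutive
    projected maps over the whole sequence (one projection view)."""
    frames = [f.astype(np.int32) for f in frames]
    dmm = np.zeros_like(frames[0])
    for prev, curr in zip(frames[:-1], frames[1:]):
        dmm += np.abs(curr - prev)
    return dmm
```

The adaptive multiscale variant would apply the same accumulation over several temporal sub-segments chosen from the motion-energy profile, rather than over the full sequence at once.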


Stable Joint Distance Maps (SJDMs)
To avoid excessive noise, three reference joints that are stable in most actions are used to compute the relative distances of the other joints.
The Euclidean distance between joints j and k at frame t:
d_{j,k}(t) = ||p_j(t) − p_k(t)||_2,
where j and k are joint indices and k is one of the three stable joints.

Stable Joint Distance Maps (SJDMs)
The distances to different stable joints contain different spatial relationships and useful structural information of the skeleton. Each SJDM collects the distances of all joints to one stable joint over all frames, so one skeleton sequence yields three SJDMs.
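The SJDM construction can be sketched as follows (a minimal NumPy illustration; the array layout, with rows indexing joints and columns indexing frames, is an assumption about the map format):

```python
import numpy as np

def stable_joint_distance_maps(seq, stable_ids):
    """seq: (T, J, 3) array of 3D joint positions over T frames.
    For each stable reference joint k, build a (J, T) map whose entry
    (j, t) is the Euclidean distance d_{j,k}(t) = ||p_j(t) - p_k(t)||.
    Returns one map per stable joint (three SJDMs in the paper)."""
    maps = []
    for k in stable_ids:
        # (T, J): distance of every joint to reference joint k, per frame
        d = np.linalg.norm(seq - seq[:, k:k + 1, :], axis=2)
        maps.append(d.T)  # (J, T): rows are joints, columns are frames
    return maps
```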


Input Preprocessing
The maps are resized to make them compatible with the pre-trained CNN model and to solve the variable-length problem.
HSV color coding highlights the differences in texture and edges.
Sample color-coded AM-DMMs and SJDMs generated by the proposed method on the UTD-MHAD dataset.
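The preprocessing step can be sketched as normalize, resize, and pseudo-color (a minimal stdlib/NumPy illustration; the 224×224 input size, the nearest-neighbor resize, and the intensity-to-hue mapping are assumptions, not the paper's exact pipeline):

```python
import numpy as np
from colorsys import hsv_to_rgb

def pseudo_color(gray, out_hw=(224, 224)):
    """Normalize a single-channel map to [0, 1], resize it with simple
    nearest-neighbor sampling to the assumed CNN input size, and map
    intensity to hue so texture/edge differences become visible to an
    RGB-pretrained network."""
    g = gray.astype(np.float64)
    g = (g - g.min()) / (np.ptp(g) + 1e-8)                 # normalize to [0, 1]
    rows = np.linspace(0, g.shape[0] - 1, out_hw[0]).astype(int)
    cols = np.linspace(0, g.shape[1] - 1, out_hw[1]).astype(int)
    g = g[rows][:, cols]                                   # nearest-neighbor resize
    rgb = np.empty(out_hw + (3,))
    for i in range(out_hw[0]):
        for j in range(out_hw[1]):
            rgb[i, j] = hsv_to_rgb(g[i, j], 1.0, 1.0)      # intensity -> hue
    return rgb
```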


Network Training and Class Score Fusion
A multi-channel CNN is adopted to exploit the discriminative features of each map.
Two fusion methods are used: multiply fusion, which takes the element-wise product of the score probability vectors of the individual networks, and weighted fusion, which weights each vector by the accuracy of the corresponding network. The predicted class is the index of the element with the maximum fused score.
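The two fusion rules above can be sketched as follows (a minimal NumPy illustration; the function names and the use of validation accuracy as weights are assumptions):

```python
import numpy as np

def multiply_fusion(probs):
    """Element-wise product of the per-network class-score vectors,
    followed by argmax over classes."""
    fused = np.prod(np.stack(probs), axis=0)
    return int(np.argmax(fused))

def weighted_fusion(probs, accs):
    """Weight each network's score vector by that network's accuracy,
    sum the weighted vectors, then take the argmax."""
    fused = sum(a * p for a, p in zip(accs, probs))
    return int(np.argmax(fused))
```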

Experiments: Dataset
UTD-MHAD: a multimodal action dataset containing RGB videos, depth videos, skeleton joint positions, and inertial sensor signals.
It contains 27 different actions; each action is performed by 8 subjects (4 females and 4 males) with up to 4 repetitions.

Experiments: Results
The effectiveness of the different schemes: results of the individual CNNs and the two fusion methods.
Comparison of the different schemes on the UTD-MHAD dataset.

Experiments: Results
Comparison of the proposed method with previously reported results on the UTD-MHAD dataset.

Experiments: Results
Confusion matrix of the proposed method on the UTD-MHAD dataset.

Conclusions
We present an effective method for action recognition using a nine-channel CNN.
The fusion of depth and skeleton modalities is proposed to improve classification accuracy.
The proposed AM-DMMs capture more shape clues and details of motion.
One skeleton sequence is transformed into three SJDMs, which describe different spatial relationships between joints.

Thanks!
Junyou He @BUPT
12211006@bupt.edu.cn