Published by Denis Giles Dennis. Modified over 6 years ago.
1
Temporal Order-Preserving Dynamic Quantization for Human Action Recognition from Multimodal Sensor Streams
Jun Ye, Kai Li, Guo-Jun Qi, Kien A. Hua
University of Central Florida
2
Outline
Background
  Problem, existing methods, challenges
Our algorithm
  Dynamic Temporal Quantization
  Multimodal Feature Fusion
Performance study
  MSR-Action3D
  UTKinect-Action
  MSR-ActionPairs
Conclusions
3
Background
Depth sensors have become affordable and popular
New human-computer interaction
  Gesture recognition
  Speech recognition
Application domains
  Video games, education, business, healthcare
4
Problem and Challenges
Key problem: modeling the temporal dynamics of 3D human actions and gestures
Existing methods
  Histogram-based methods do not preserve temporal order (bag-of-3D-words [5, 21], HOJ3D [16], HON4D [9])
  Temporal modeling methods suffer from video misalignment (motion template [7, 20], temporal pyramid [9, 14])
Challenge: temporal misalignment due to
  Temporal translation
  Execution rate variation
5
Dynamic Temporal Quantization Algorithm
Objective
Model the temporal patterns of 3D actions as transitions between sub-actions, subject to two constraints:
  Frames with similar postures are clustered together (sub-action constraint)
  The temporal order of the sequence is preserved (order-preserving constraint)
6
Dynamic Temporal Quantization
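Quantization: a video X = (x1, x2, …, xn) of varied length n is mapped to a quantized sequence V = (v1, v2, …, vm) of fixed length m.
Optimal frame assignment a: assigns each frame of X to one of the m quantized vectors.
Objective function: (equation not preserved in this transcript)
The optimal quantization is obtained by jointly optimizing a and V.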
7
Dynamic Temporal Quantization (cont’d)
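Jointly solving for the frame assignment a and the quantized sequence V is nontrivial, so the two are optimized alternately:
  Initialization: uniform partition of the frames
  Aggregation step: with the assignment a fixed, each vj is computed by aggregating the frames assigned to it
  Assignment step: with the quantized sequence V fixed, the assignment a is updated by DTW
  Iterate the two steps until convergence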
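The alternating procedure on this slide can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation. It assumes that aggregation is the per-segment mean, that the frame-to-vector cost is the squared Euclidean distance, and that the DTW step is a simplified monotonic alignment in which each frame goes to exactly one quantized vector and every quantized vector receives at least one frame. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(x: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed frame-to-vector cost)."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def aggregate(frames: Sequence[Sequence[float]], assign: List[int], m: int) -> List[Vector]:
    """Aggregation step: mean of the frames assigned to each segment (assumption)."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for x, j in zip(frames, assign):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return [[s / c for s in row] for row, c in zip(sums, counts)]


def assign_frames(frames: Sequence[Sequence[float]], V: List[Vector]) -> Tuple[List[int], float]:
    """Assignment step: order-preserving alignment by dynamic programming.

    Simplified DTW: the segment index starts at 0, ends at m - 1, and may
    only stay the same or increase by one from one frame to the next.
    """
    n, m = len(frames), len(V)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = sq_dist(frames[0], V[0])
    for i in range(1, n):
        for j in range(min(i + 1, m)):
            stay = cost[i - 1][j]
            move = cost[i - 1][j - 1] if j > 0 else inf
            if move < stay:
                best, back[i][j] = move, j - 1
            else:
                best, back[i][j] = stay, j
            cost[i][j] = sq_dist(frames[i], V[j]) + best
    # Backtrack from the last frame, which must sit in the last segment.
    assign = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        assign[i] = j
        j = back[i][j]
    return assign, cost[n - 1][m - 1]


def dynamic_temporal_quantization(
    frames: Sequence[Sequence[float]], m: int, max_iter: int = 50
) -> Tuple[List[Vector], List[int]]:
    """Alternate aggregation and assignment until the assignment stops changing."""
    n = len(frames)
    if n < m:
        raise ValueError("need at least as many frames as quantized vectors")
    assign = [i * m // n for i in range(n)]  # initialization: uniform partition
    for _ in range(max_iter):
        V = aggregate(frames, assign, m)
        new_assign, _ = assign_frames(frames, V)
        if new_assign == assign:
            break
        assign = new_assign
    return aggregate(frames, assign, m), assign
```

Calling `dynamic_temporal_quantization(frames, m)` on a list of per-frame feature vectors returns the m quantized vectors together with the segment index of every frame. Because the segment index never decreases along the sequence, the temporal order is preserved by construction.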
8
Hierarchical representation
Multiple layers of dynamic quantization
  Top layers: global temporal patterns
  Bottom layers: local temporal patterns
Concatenate all layers
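A rough sketch of how the layers might be concatenated into one descriptor is shown below. The slide does not say how many quantized vectors each layer holds, so the doubling rule here (layer k uses 2^(k-1) vectors) is an assumption, and `uniform_quantize` is only a hypothetical stand-in that averages uniformly partitioned frames. Any quantizer with the same signature could be passed in instead.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Quantizer = Callable[[Sequence[Sequence[float]], int], List[Vector]]


def uniform_quantize(frames: Sequence[Sequence[float]], m: int) -> List[Vector]:
    """Stand-in quantizer: uniform partition into m segments, mean per segment."""
    n, dim = len(frames), len(frames[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for i, x in enumerate(frames):
        j = i * m // n  # segment index under a uniform partition
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return [[s / c for s in row] for row, c in zip(sums, counts)]


def hierarchical_descriptor(
    frames: Sequence[Sequence[float]],
    num_layers: int,
    quantize: Quantizer = uniform_quantize,
) -> Vector:
    """Concatenate the quantized vectors of every layer, top (coarse) to bottom (fine)."""
    if len(frames) < 2 ** (num_layers - 1):
        raise ValueError("too few frames for the finest layer")
    descriptor: Vector = []
    for layer in range(num_layers):
        m = 2 ** layer  # assumed layer size: 1, 2, 4, ... vectors
        for v in quantize(frames, m):
            descriptor.extend(v)
    return descriptor
```

Under this assumed layout the descriptor length is fixed at (2^L - 1) times the frame dimension for L layers, regardless of the video length, which is what allows a standard classifier to be trained on it.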
9
Multimodal Feature Fusion
Multimodal features:
  joint coordinate
  pairwise angle
  joint offset [21]
  histogram of velocity components (HVC)
Supervised learning for all quantized vectors
  Multiclass SVM
  Fusion by regression (softmax)
10
Experiments
Experiments on three public 3D human action datasets:
  MSR-Action3D
  UTKinect-Action
  MSR-ActionPairs
11
Experiment: dynamic quantization vs. deterministic quantization
Dynamic quantization outperforms deterministic quantization.
MSR-Action3D dataset (accuracy)
Feature     Dynamic quantization   Deterministic quantization
position    81.61%                 76.24%
angle       73.95%                 71.65%
offset      68.20% (only one value survives in the transcript)
velocity    80.84%                 72.80%
fused       90.42%                 83.15%
Similar results are observed on the other two datasets.
12
Experiment: hierarchical representation
MSR-Action3D dataset with the joint coordinate feature
Layers      1        2        3        4        5
Accuracy    66.28%   67.82%   71.26%   81.61%   77.39%
More layers generally yield higher accuracy, but too many can overfit: accuracy drops from 81.61% with four layers to 77.39% with five.
13
Experiment: Comparison with state-of-the-art results
MSR-Action3D dataset
Method                         Accuracy
Actionlet Ensemble [14]        88.2%
HON4D [9]                      88.89%
DCSF [15]                      89.3%
Lie Group [13]                 89.48%
Super Normal Vector [18]       93.09%
Proposed method                90.42%

MSR-ActionPairs dataset
Method                         Accuracy
Actionlet Ensemble [14]        82.22%
HON4D [9]                      93.33%
HON4D + Ddisc [9]              96.67%
Super Normal Vector [18]       98.89%
Proposed method                93.71%

UTKinect-Action dataset (100% accuracy)
Method                                        Accuracy
Histogram of 3D joints [17]                   90.92%
Combined features with random forest [21]     91.9%
Lie Group [13]                                97.08%
Proposed method                               100%
14
Conclusions
A novel algorithm for 3D human action sequence recognition from the perspective of dynamic temporal quantization.
Extensive experiments on three public datasets demonstrate the effectiveness of the proposed technique for temporal modeling.
15
Thank you. Questions?