Slide 1: Improving Human Action Recognition using Score Distribution and Ranking
Minh Hoai Nguyen, joint work with Andrew Zisserman
Slide 2: Inherent Ambiguity: When does an action begin and end?
Slide 3: Precise Starting Moment?
- Hands are being extended?
- Hands are in contact?
Slide 4: When Does the Action End?
- The action extends over multiple shots
- The camera shows a third person in the middle
Slide 5: Action Location as Latent Information
Diagram: the location of the action within the video clip is latent. All considered subsequences are scored by the HandShake classifier, and the Max of these HandShake scores is taken. In testing, the max is the recognition score; in training, it is used to update the classifier.
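The max-pooling step in this pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a linear base classifier applied to the mean of per-frame features, and all function and parameter names are my own.

```python
import numpy as np

def subsequence_scores(frames, w, b, min_len=2):
    """Score every contiguous subsequence of at least `min_len` frames
    with a linear base classifier (w, b) applied to the subsequence's
    mean frame feature. (Illustrative feature pooling, assumed.)"""
    n = len(frames)
    scores = []
    for start in range(n):
        for end in range(start + min_len, n + 1):
            feat = frames[start:end].mean(axis=0)
            scores.append(float(feat @ w + b))
    return np.array(scores)

def max_pool_score(frames, w, b):
    """Max over the latent action locations: the recognition score
    in testing; in training, the maximizing subsequence is the one
    used to update the classifier."""
    return subsequence_scores(frames, w, b).max()
```

Enumerating all subsequences is quadratic in clip length; real systems would typically restrict to a sampled or strided subset, as the later slides' "sampled subsequences" suggest.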
Slide 6: Poor Performance of Max

Mean Average Precision (higher is better):

Dataset     Whole  Max
Hollywood2  66.7   64.8
TVHID       66.6   65.0

Possible reasons:
- The learned action classifier is far from perfect
- The output scores are noisy
- The maximum score is not robust
- Action recognition is … a hard problem
Slide 7: Can We Use Mean Instead?
Diagram: the HandShake classifier scores the considered subsequences of the video clip (whose action location is latent), and the Mean of the HandShake scores is taken instead of the Max.
On Hollywood2, Mean is generally better than Max, but not always:

                      Whole  Max   Mean
Hollywood2-HandShake  48.0   57.1  50.3
Slide 8: Another HandShake Example
- The HandShake occupies only a small proportion of the clip
- Consequently, for Whole and Mean, the signal-to-noise ratio is small
Slide 9: Proposed Method: Use the Distribution
Diagram: the base HandShake classifier scores sampled subsequences of the video clip (the action location is latent); the resulting HandShake scores are sorted, and distribution-based classification on the sorted scores yields an improved HandShake score.
Slide 10: Learning Formulation
The classifier operates on the subsequence-score distribution: the sorted scores are combined with a weight vector and a bias, and training minimizes the hinge loss against the video label. The weights for the distribution emphasize the relative importance of the classifier scores.
Special cases:
- Case 1: uniform weights, equivalent to using Mean
- Case 2: all weight on the top-ranked score, equivalent to using Max
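The formulation and its two special cases can be sketched in a few lines. This is my own illustrative code based on the slide, not the authors' implementation; the names are assumptions.

```python
import numpy as np

def ssd_score(sub_scores, weights, bias=0.0):
    """Distribution-based score: weighted sum of the subsequence
    scores sorted in descending order, plus a bias."""
    s = np.sort(np.asarray(sub_scores, dtype=float))[::-1]
    return float(weights @ s + bias)

def hinge_loss(label, score):
    """Hinge loss against the binary video label in {-1, +1}."""
    return max(0.0, 1.0 - label * score)

scores = [0.2, -1.0, 0.9, 0.1]
k = len(scores)
mean_w = np.full(k, 1.0 / k)          # Case 1: uniform weights -> Mean pooling
max_w = np.zeros(k); max_w[0] = 1.0   # Case 2: weight on top score -> Max pooling
```

Because the scores are sorted before weighting, the learned weight vector can interpolate freely between Mean and Max, e.g. putting most mass on the top few ranks.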
Slide 11: Controlled Experiments
Synthetic videos with a random action location. Two controlled parameters:
- the action percentage
- the separation between non-action and action features
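A synthetic setup of this kind can be sketched as below. This is an assumed construction (1-D Gaussian frame scores, with the action segment's mean shifted by the separation parameter); the slide does not specify the actual generator, so names and distributions here are illustrative.

```python
import numpy as np

def synthetic_video(n_frames, action_pct, separation, rng):
    """Generate per-frame scores: non-action frames ~ N(0, 1); an
    action segment covering `action_pct` of the video, placed at a
    random location, has its mean shifted up by `separation`."""
    x = rng.normal(size=n_frames)
    k = max(1, int(round(action_pct * n_frames)))
    start = int(rng.integers(0, n_frames - k + 1))
    x[start:start + k] += separation
    return x, (start, start + k)
```

Sweeping `action_pct` and `separation` then lets one compare Whole, Max, Mean, and the distribution-based score under known ground truth.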
Slide 12: Controlled Experiments (results figure)
Slide 13: Hollywood2 – Progress over Time
Figure: Mean Average Precision (higher is better) against the best published results over time, with gains of 8.6% and 9.3%.
Slide 14: Hollywood2 – State-of-the-art Methods
Methods compared (Mean Average Precision, higher is better):
- Dataset introduction (STIP + scene context)
- Deep Learning features
- Mined compound features
- Dense Trajectory Descriptor (DTD)
- Improved DTD (better motion estimation)
- DTD + saliency (same)
Slide 15: Results on TVHI Dataset
Figure: Mean Average Precision (higher is better), with a gain of 14.8%.
Slide 16: Weights for SSD Classifiers (figure)
Slide 17: AnswerPhone Example 1
Slide 18: AnswerPhone Example 2
Slide 19: The End