1
Human Action Recognition by Learning Bases of Action Attributes and Parts
Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei
Stanford University
2
Action Classification in Still Images
Low-level features (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) → Riding bike
3
Action Classification in Still Images
Low-level features (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) → Riding bike
High-level representation:
- Semantic concepts – Attributes: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
4
Action Classification in Still Images
Low-level features (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) → Riding bike
High-level representation:
- Semantic concepts – Attributes: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
- Objects
5
Action Classification in Still Images
Low-level features (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) → Riding bike
High-level representation:
- Semantic concepts – Attributes: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
- Parts – Objects
- Parts – Human poses
6
Action Classification in Still Images
Low-level features (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) → Riding bike
High-level representation:
- Semantic concepts – Attributes: riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
- Parts – Objects
- Parts – Human poses
- Contexts of attributes & parts
7
Action Classification in Still Images
Low-level features (Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011) → Riding bike
High-level representation:
- Semantic concepts – Attributes: riding a bike, wearing a helmet, pedaling the pedals, sitting on a bike seat
- Parts – Objects
- Parts – Human poses
- Contexts of attributes & parts
Related work: Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011; Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010; Yang et al., 2010; Maji et al., 2011; Liu et al., 2011
Benefits of a high-level representation: it incorporates human knowledge, gives more understanding of image content, and yields a more discriminative classifier.
8
Outline
- Intuition: Action Attributes and Parts
- Algorithm: Learning Bases of Attributes and Parts
- Experiments: PASCAL VOC & Stanford 40 Actions
- Conclusion
9
Outline – Intuition: Action Attributes and Parts
10
Action Attributes and Parts
Attributes: …… semantic descriptions of human actions
11
Action Attributes and Parts
Attributes: …… semantic descriptions of human actions
Each attribute (e.g., riding bike vs. not riding bike) is learned with a discriminative classifier, e.g., an SVM (Lampert et al., 2009; Berg et al., 2010).
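The slide only says that attributes are learned with a discriminative classifier such as an SVM. As a hedged illustration, the sketch below trains one binary attribute classifier (e.g., riding bike vs. not riding bike) as a linear SVM using Pegasos-style stochastic subgradient descent on the hinge loss. The helper names, the toy 2-D features, and the hyperparameters are hypothetical; the actual image features and solver are not specified on the slide. The signed decision value plays the role of the attribute's confidence score.

```python
import numpy as np


def train_attribute_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style linear SVM (hypothetical helper, not the authors' code).

    X: (n, d) image features; y: labels in {-1, +1}
    (e.g. +1 = "riding bike", -1 = "not riding bike").
    Returns a weight vector whose last entry acts as a bias.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a constant bias feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            violates = y[i] * (Xb[i] @ w) < 1.0   # hinge loss active at the current w
            w *= 1.0 - eta * lam                  # step on the L2 regularizer
            if violates:
                w += eta * y[i] * Xb[i]           # subgradient step on the hinge loss
    return w


def attribute_score(w, X):
    """Signed decision value, used as the attribute confidence."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


# Toy usage: two well-separated clusters standing in for image features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])
w = train_attribute_svm(X, y)
accuracy = np.mean(np.sign(attribute_score(w, X)) == y)
```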
12
Action Attributes and Parts
Attributes: ……
Parts – Objects: ……
Parts – Poselets: ……
Each part is a pre-trained detector: objects from Object Bank (Li et al., 2010), poselets from Bourdev & Malik, 2009.
13
Action Attributes and Parts
Attributes: …… → attribute classification
Parts – Objects: …… → object detection
Parts – Poselets: …… → poselet detection
a: image feature vector built from the attribute classification, object detection, and poselet detection outputs
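A minimal sketch of how the image feature vector a could be assembled, assuming one confidence score per attribute classifier, object detector, and poselet detector (for instance, the maximum detector response over the image; this choice is an assumption). The helper name and the random placeholder scores are hypothetical; the sizes are the VOC 2010 settings listed in the experiments (14 attributes, 27 objects, 150 poselets), giving a 191-dimensional a.

```python
import numpy as np


def build_feature_vector(attr_scores, object_scores, poselet_scores):
    """Concatenate per-concept confidence scores into a (hypothetical helper).

    Each argument is a 1-D sequence with one score per attribute classifier,
    object detector, or poselet detector, respectively. How a detector's
    responses are reduced to one score (e.g. max over locations) is assumed.
    """
    return np.concatenate([
        np.asarray(attr_scores, dtype=float),
        np.asarray(object_scores, dtype=float),
        np.asarray(poselet_scores, dtype=float),
    ])


# VOC 2010 setting: 14 attributes, 27 objects, 150 poselets (placeholder scores).
rng = np.random.default_rng(0)
a = build_feature_vector(rng.random(14), rng.random(27), rng.random(150))
print(a.shape)  # (191,)
```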
14
Action Attributes and Parts
Attributes: …… → attribute classification
Parts – Objects: …… → object detection
Parts – Poselets: …… → poselet detection
a: image feature vector
Φ: action bases
15
Action Attributes and Parts
Attributes: ……
Parts – Objects: ……
Parts – Poselets: ……
a: image feature vector
Φ: action bases
17
Action Attributes and Parts
Attributes: ……
Parts – Objects: ……
Parts – Poselets: ……
a: image feature vector
Φ: action bases
w: basis coefficients
18
Action Attributes and Parts
Attributes: ……
Parts – Objects: ……
Parts – Poselets: ……
a: image feature vector
Φ: action bases
w: basis coefficients
Properties of this representation: sparse; encodes context; robust to initially weak detections.
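In symbols, the picture on this slide amounts to approximating each feature vector by a sparse combination of bases. The sketch below introduces M (the number of attributes and parts) and K (the number of bases) for illustration only; with the VOC 2010 settings listed in the experiments, M = 14 + 27 + 150 = 191, and the VOC 2010 analysis slides mention K = 400 action bases.

```latex
\mathbf{a} \in \mathbb{R}^{M}, \quad
\Phi = [\boldsymbol{\phi}_1, \dots, \boldsymbol{\phi}_K] \in \mathbb{R}^{M \times K}, \quad
\mathbf{w} \in \mathbb{R}^{K}, \qquad
\mathbf{a} \approx \Phi \mathbf{w} = \sum_{k=1}^{K} w_k \, \boldsymbol{\phi}_k
```

Most entries w_k are zero, so an image is explained by a few bases, each of which is itself a weighted combination of attributes and parts.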
19
Outline – Algorithm: Learning Bases of Attributes and Parts
20
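Bases of Attributes & Parts: Training
Input: the feature vectors a of the training images
Output: the action bases Φ and the sparse coefficients W
Jointly estimate Φ and W:
- accurate approximation: ΦW should reconstruct the feature vectors;
- L1 regularization: sparsity of W;
- elastic net [Zou & Hastie, 2005]: sparsity of Φ.
Optimization: stochastic gradient descent.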
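The slide lists the ingredients of the training objective but not its exact form. The sketch below assumes a penalized version, 0.5·||A − ΦW||_F² + λ·||W||_1 + γ1·||Φ||_1 + (γ2/2)·||Φ||_F², where the columns of A are the training feature vectors a and the last two terms stand in for the elastic net on Φ. It also swaps the slide's stochastic gradient descent for full-batch alternating proximal gradient steps (soft-thresholding handles the L1 terms), chosen because each step with step size 1/Lipschitz constant cannot increase this objective. The function names and hyperparameter values are hypothetical.

```python
import numpy as np


def soft_threshold(X, t):
    """Proximal operator of t * ||X||_1 (elementwise shrinkage)."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)


def objective(A, Phi, W, lam, g1, g2):
    """Assumed objective: reconstruction + L1 on W + elastic-net penalty on Phi."""
    R = A - Phi @ W
    return (0.5 * np.sum(R ** 2) + lam * np.abs(W).sum()
            + g1 * np.abs(Phi).sum() + 0.5 * g2 * np.sum(Phi ** 2))


def learn_bases(A, K, lam=0.1, g1=0.01, g2=0.1, iters=200, seed=0):
    """Jointly estimate bases Phi (M x K) and sparse codes W (K x N).

    Alternates one proximal gradient step on W (Phi fixed) with one on
    Phi (W fixed), each with step size 1 / (Lipschitz constant).
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    Phi = rng.normal(scale=0.1, size=(M, K))
    W = np.zeros((K, N))
    history = [objective(A, Phi, W, lam, g1, g2)]
    for _ in range(iters):
        # W-step: gradient of 0.5||A - Phi W||^2 is Phi^T (Phi W - A).
        L_w = max(np.linalg.norm(Phi, 2) ** 2, 1e-12)
        grad_W = Phi.T @ (Phi @ W - A)
        W = soft_threshold(W - grad_W / L_w, lam / L_w)
        # Phi-step: the smooth part also includes the (g2/2)||Phi||^2 term.
        L_phi = max(np.linalg.norm(W, 2) ** 2 + g2, 1e-12)
        grad_Phi = (Phi @ W - A) @ W.T + g2 * Phi
        Phi = soft_threshold(Phi - grad_Phi / L_phi, g1 / L_phi)
        history.append(objective(A, Phi, W, lam, g1, g2))
    return Phi, W, history


# Toy usage with random nonnegative "confidence scores": 20 images, 191-D a.
rng = np.random.default_rng(0)
A = rng.random((191, 20))
Phi, W, history = learn_bases(A, K=40)
```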
21
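Bases of Attributes & Parts: Testing
Input: the feature vector a of a test image and the learned bases Φ
Output: the sparse coefficients w
Estimate w:
- accurate approximation: Φw should reconstruct a;
- L1 regularization: sparsity of w.
Optimization: stochastic gradient descent.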
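At test time Φ is fixed and only w is estimated, which (under the same assumed penalty form as in training) is a lasso problem: minimize 0.5·||a − Φw||² + λ·||w||_1. The sketch below solves it with ISTA (iterative soft-thresholding), again substituting a proximal gradient method for the stochastic gradient descent named on the slide; `encode` and its defaults are hypothetical. The resulting sparse w can then be fed to the action classifier in place of a, which is the "use w" setting in the experiments.

```python
import numpy as np


def encode(Phi, a, lam=0.1, iters=500):
    """ISTA for min_w 0.5 * ||a - Phi w||^2 + lam * ||w||_1 (hypothetical helper)."""
    L = max(np.linalg.norm(Phi, 2) ** 2, 1e-12)   # Lipschitz constant of the gradient
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        z = w - Phi.T @ (Phi @ w - a) / L          # gradient step on the squared error
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-thresholding
    return w


# Sanity check: with Phi = I the lasso solution is soft-thresholding of a.
w = encode(np.eye(3), np.array([2.0, -0.5, 0.1]), lam=0.2)
print(w)  # approximately [1.8, -0.3, 0.0]
```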
22
Outline – Experiments: PASCAL VOC & Stanford 40 Actions
23
PASCAL VOC 2010 Action Dataset (figure credit: Ivan Laptev)
- 9 classes, 50–100 trainval/test images per class
- 14 attributes, trained on the trainval images
- 27 objects, taken from Li et al., NIPS 2010
- 150 poselets, taken from Bourdev & Malik, ICCV 2009
24
VOC 2010: Classification Result
[Figure: average precision on Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, and Walking for our method (using "a"), Poselet (Maji et al., 2011), SURREY_MK, and UCLEAR_DOSP]
25
VOC 2010: Classification Result
[Figure: average precision on Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, and Walking for our method (using "a" and using "w"), Poselet (Maji et al., 2011), SURREY_MK, and UCLEAR_DOSP]
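The classification results are reported as per-class average precision (AP). For reference, a minimal sketch of a VOC-style AP computation for one class, assuming the all-point interpolation used by the VOC evaluation from 2010 onward (precision made monotonically non-increasing, then integrated over recall). `average_precision` is a hypothetical helper, not the official development-kit code.

```python
import numpy as np


def average_precision(scores, labels):
    """All-point interpolated AP for one class (VOC-style sketch).

    scores: classifier confidences; labels: 1 for positive, 0 for negative.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    labels = np.asarray(labels)[order]          # rank images by confidence
    tp = np.cumsum(labels == 1)
    fp = np.cumsum(labels != 1)
    recall = tp / max(tp[-1], 1)
    precision = tp / (tp + fp)
    # Pad, then make precision monotonically non-increasing from the right.
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0] + 1
    return float(np.sum((mrec[idx] - mrec[idx - 1]) * mpre[idx]))


print(average_precision([0.9, 0.8, 0.7], [1, 0, 1]))  # 0.8333...
```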
26
VOC 2010: Analysis of Bases
[Figure: the 400 action bases and their attribute, object, and poselet components]
29
VOC 2010: Control Experiment
[Figure: mean average precision when using "a" vs. using "w"; A: attribute, O: object, P: poselet]
30
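PASCAL VOC 2011 Result
Our method ranks first in nine out of ten classes in comp10. (In VOC 2011, comp9 entries are trained only on the VOC data, while comp10 entries may use external training data.)
Average precision (%):
Class | Others' best in comp9 | Others' best in comp10 | Our method
Jumping | 71.6 | 59.5 | 66.7
Phoning | 50.7 | 31.3 | 41.1
Playing instrument | 77.5 | 45.6 | 60.8
Reading | 37.8 | 27.8 | 42.2
Riding bike | 88.8 | 84.4 | 90.5
Riding horse | 90.2 | 88.3 | 92.2
Running | 87.9 | 77.6 | 86.2
Taking photo | 25.7 | 31.0 | 28.8
Using computer | 58.9 | 47.4 | 63.5
Walking | 59.5 | 57.6 | 64.2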
31
PASCAL VOC 2011 Result
Considering both comp9 and comp10, our method achieves the best performance in five out of ten classes.
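Both ranking claims can be checked directly against the VOC 2011 average-precision numbers. The snippet below re-enters those numbers and counts the classes in which our method beats the best other entry, first in comp10 alone and then over both competitions.

```python
# Average precision (%) from the VOC 2011 results:
# class: (others' best in comp9, others' best in comp10, our method)
VOC2011 = {
    "Jumping": (71.6, 59.5, 66.7),
    "Phoning": (50.7, 31.3, 41.1),
    "Playing instrument": (77.5, 45.6, 60.8),
    "Reading": (37.8, 27.8, 42.2),
    "Riding bike": (88.8, 84.4, 90.5),
    "Riding horse": (90.2, 88.3, 92.2),
    "Running": (87.9, 77.6, 86.2),
    "Taking photo": (25.7, 31.0, 28.8),
    "Using computer": (58.9, 47.4, 63.5),
    "Walking": (59.5, 57.6, 64.2),
}

# Classes where our method beats the best other comp10 entry.
wins_comp10 = [c for c, (c9, c10, ours) in VOC2011.items() if ours > c10]
# Classes where our method beats the best entry of either competition.
wins_overall = [c for c, (c9, c10, ours) in VOC2011.items() if ours > max(c9, c10)]

print(len(wins_comp10), len(wins_overall))  # 9 5
```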
32
Stanford 40 Actions
Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper
http://vision.stanford.edu/Datasets/40actions.html
40 action classes; 9532 real-world images from Google, Flickr, etc.
33
Stanford 40 Actions
Example images: Riding bike, Fixing bike
34
Stanford 40 Actions
Example images: Writing on board, Writing on paper
35
Stanford 40 Actions
Example images: Drinking, Gardening, Smoking cigarette
36
Stanford 40 Actions: Result
We use 45 attributes, 81 objects, and 150 poselets, and compare our method with the Locality-constrained Linear Coding (LLC; Wang et al., CVPR 2010) baseline.
[Figure: average precision]
37
Stanford 40 Actions: Result
[Figure: average precision]
38
Outline – Conclusion
39
Conclusion
Attributes: ……
Parts – Objects: ……
Parts – Poselets: ……
a: image feature vector
Φ: action bases
w: basis coefficients
40
Acknowledgement