1 Human Action Recognition by Learning Bases of Action Attributes and Parts. Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and Li Fei-Fei. Stanford University.

2 Action Classification in Still Images. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. Example action: Riding bike.

3 Action Classification in Still Images. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes). Example action: Riding bike, described as riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …

4 Action Classification in Still Images. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes); objects. Example action: Riding bike, described as riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …

5 Action Classification in Still Images. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes); parts (objects and human poses). Example action: Riding bike, described as riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …

6 Action Classification in Still Images. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes); parts (objects and human poses); contexts of attributes & parts. Example action: Riding bike, described as riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …

7 Action Classification in Still Images. Low-level features: Yao & Fei-Fei, 2010; Koniusz et al., 2010; Delaitre et al., 2010; Yao et al., 2011. High-level representation: semantic concepts (attributes); parts (objects and human poses); contexts of attributes & parts. Related work: Farhadi et al., 2009; Lampert et al., 2009; Berg et al., 2010; Parikh & Grauman, 2011; Gupta et al., 2009; Yao & Fei-Fei, 2010; Torresani et al., 2010; Li et al., 2010; Yang et al., 2010; Maji et al., 2011; Liu et al., 2011. Benefits: incorporates human knowledge; more understanding of image content; more discriminative classifier. Example action: Riding bike, described as riding a bike, wearing a helmet, pedaling the pedals, sitting on a bike seat.

8 Outline. Intuition: Action Attributes and Parts. Algorithm: Learning Bases of Attributes and Parts. Experiments: PASCAL VOC & Stanford 40 Actions. Conclusion.

10 Action Attributes and Parts. Attributes: …… semantic descriptions of human actions.

11 Action Attributes and Parts. Attributes: …… semantic descriptions of human actions (Lampert et al., 2009; Berg et al., 2010). Each attribute, e.g. riding bike vs. not riding bike, is learned with a discriminative classifier, e.g. SVM.

12 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… Each part is a pre-trained detector: Object Bank (Li et al., 2010) for objects; Poselet (Bourdev & Malik, 2009) for poselets.

13 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… Attribute classification, object detection, and poselet detection give a: the image feature vector.
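A minimal sketch of how the image feature vector a could be assembled, assuming it is simply the concatenation of attribute classifier scores, object detector scores, and poselet detector scores. The function name `build_feature_vector`, the group sizes, and the random scores are hypothetical and chosen only for illustration; the slides do not specify the exact construction.

```python
import numpy as np

def build_feature_vector(attr_scores, object_scores, poselet_scores):
    """Concatenate the three groups of scores into one feature vector a.

    Assumption: a is a plain concatenation; the slides only name the
    three sources (attributes, objects, poselets).
    """
    return np.concatenate([
        np.asarray(attr_scores, dtype=float),
        np.asarray(object_scores, dtype=float),
        np.asarray(poselet_scores, dtype=float),
    ])

# Hypothetical scores for one image: 3 attributes, 4 objects, 5 poselets.
rng = np.random.default_rng(0)
a = build_feature_vector(
    rng.normal(size=3), rng.normal(size=4), rng.normal(size=5)
)
print(a.shape)
```

Under this assumption the length of a is the total number of attributes, objects, and poselets used.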

14 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… Attribute classification, object detection, and poselet detection give a: the image feature vector. Action bases Φ …

15 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Action bases Φ …

17 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Action bases Φ … Bases coefficients w.

18 Action Attributes and Parts. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Action bases Φ … Bases coefficients w. Properties: sparse; encodes context; robust to initially weak detections.

20 Bases of Attr. & Parts: Training. Input: image feature vectors a. Output: action bases Φ and sparse coefficients W. Jointly estimate Φ and W. Objective terms: accurate approximation of a by ΦW; L1 regularization, sparsity of W; elastic net, sparsity of Φ [Zou & Hastie, 2005]. Optimization: stochastic gradient descent.
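The formula on this slide did not survive extraction. As a hedged sketch only, an objective consistent with the three labels on the slide, assuming the standard sparse dictionary-learning form, could be written as below. The symbols λ and γ, the index ranges, and the exact form of the elastic-net constraint are assumptions, not taken from the slide.

```latex
\min_{\Phi,\; w_1,\dots,w_N}\;
  \sum_{i=1}^{N}
  \underbrace{\tfrac{1}{2}\,\lVert a_i - \Phi w_i \rVert_2^2}_{\text{accurate approximation}}
  \;+\;
  \underbrace{\lambda\,\lVert w_i \rVert_1}_{\text{sparsity of } W}
\qquad \text{s.t.}\qquad
  \underbrace{\lVert \Phi_j \rVert_1 + \tfrac{\gamma}{2}\,\lVert \Phi_j \rVert_2^2 \le 1}_{\text{elastic net, sparsity of } \Phi}
  \quad \forall j
```

Here a_i is the feature vector of training image i, w_i is the i-th column of W, and Φ_j is the j-th basis (column of Φ).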

21 Bases of Attr. & Parts: Testing. Input: image feature vector a and the learned bases Φ. Output: sparse coefficients w. Estimate w. Objective terms: accurate approximation of a by Φw; L1 regularization, sparsity of w. Optimization: stochastic gradient descent.
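A minimal sketch of the test-time step, assuming the objective is the usual L1-regularized least squares, 0.5·‖a − Φw‖² + λ‖w‖₁, with Φ held fixed. The slide names stochastic gradient descent as the optimizer; this sketch swaps in ISTA (proximal gradient with soft-thresholding) because it is short and deterministic. The names `soft_threshold`, `estimate_w`, `objective`, the value of `lam`, and the synthetic Φ and a are all hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (shrinks each entry toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def objective(a, Phi, w, lam):
    """0.5 * ||a - Phi w||_2^2 + lam * ||w||_1."""
    return 0.5 * np.sum((a - Phi @ w) ** 2) + lam * np.sum(np.abs(w))

def estimate_w(a, Phi, lam=0.05, n_iter=500):
    """Estimate sparse coefficients w for fixed bases Phi using ISTA."""
    # Step size 1/L, where L is the squared spectral norm of Phi.
    step = 1.0 / (np.linalg.norm(Phi, 2) ** 2)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ w - a)  # gradient of the least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic example: 50 unit-norm bases in a 20-dimensional feature space.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 50))
Phi /= np.linalg.norm(Phi, axis=0)
w_true = np.zeros(50)
w_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
a = Phi @ w_true

w = estimate_w(a, Phi, lam=0.05)
print(objective(a, Phi, w, 0.05), np.count_nonzero(w))
```

The resulting w would then play the role of the image representation that the later slides refer to as "use w".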

23 PASCAL VOC 2010 Action Dataset. 9 classes, 50-100 trainval / testing images per class. 14 attributes, trained from the trainval images; 27 objects, taken from Li et al., NIPS 2010; 150 poselets, taken from Bourdev & Malik, ICCV 2009. Figure credit: Ivan Laptev.

24 VOC 2010: Classification Result. [Bar chart: average precision per class (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods: Our method, use “a”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP.]

25 VOC 2010: Classification Result. [Bar chart: average precision per class (Phoning, Playing instrument, Reading, Riding bike, Riding horse, Running, Taking photo, Using computer, Walking). Methods: Our method, use “a”; Our method, use “w”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP.]

26 VOC 2010: Analysis of Bases. 400 action bases over attributes, objects, and poselets. [Bar chart as on the previous slide: average precision per class for Our method, use “a”; Our method, use “w”; Poselet, Maji et al., 2011; SURREY_MK; UCLEAR_DOSP.]

29 VOC 2010: Control Experiment. [Chart: mean average precision using “a” vs. using “w”, for combinations of A: attribute, O: object, P: poselet.]

30 PASCAL VOC 2011 Result

Class                Others’ best in comp9   Others’ best in comp10   Our method
Jumping              71.6                    59.5                     66.7
Phoning              50.7                    31.3                     41.1
Playing instrument   77.5                    45.6                     60.8
Reading              37.8                    27.8                     42.2
Riding bike          88.8                    84.4                     90.5
Riding horse         90.2                    88.3                     92.2
Running              87.9                    77.6                     86.2
Taking photo         25.7                    31.0                     28.8
Using computer       58.9                    47.4                     63.5
Walking              59.5                    57.6                     64.2

Our method ranks first in nine out of ten classes in comp10.

31 PASCAL VOC 2011 Result (same table as slide 30). Our method achieves the best performance in five out of ten classes if we consider both comp9 and comp10.
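The two tallies quoted on slides 30 and 31 can be checked directly from the table. The sketch below copies the numbers from the slides; the variable names are illustrative only.

```python
# (others' best in comp9, others' best in comp10, our method), from slides 30-31.
results = {
    "Jumping":            (71.6, 59.5, 66.7),
    "Phoning":            (50.7, 31.3, 41.1),
    "Playing instrument": (77.5, 45.6, 60.8),
    "Reading":            (37.8, 27.8, 42.2),
    "Riding bike":        (88.8, 84.4, 90.5),
    "Riding horse":       (90.2, 88.3, 92.2),
    "Running":            (87.9, 77.6, 86.2),
    "Taking photo":       (25.7, 31.0, 28.8),
    "Using computer":     (58.9, 47.4, 63.5),
    "Walking":            (59.5, 57.6, 64.2),
}

# Classes where our method beats the best other comp10 entry.
wins_comp10 = [c for c, (_, c10, ours) in results.items() if ours > c10]

# Classes where our method beats the best other entry in both competitions.
wins_overall = [c for c, (c9, c10, ours) in results.items() if ours > max(c9, c10)]

print(len(wins_comp10), len(wins_overall))  # prints "9 5"
print(wins_overall)
```

The only class not won within comp10 is Taking photo; the five overall wins are Reading, Riding bike, Riding horse, Using computer, and Walking.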

32 Stanford 40 Actions. 40 action classes, 9,532 real-world images from Google, Flickr, etc. Classes: Applauding, Blowing bubbles, Brushing teeth, Calling, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper. http://vision.stanford.edu/Datasets/40actions.html

33 Stanford 40 Actions (same class list, URL, and image count as slide 32). Highlighted classes: Riding bike, Fixing bike.

34 Stanford 40 Actions (same class list, URL, and image count as slide 32). Highlighted classes: Writing on board, Writing on paper.

35 Stanford 40 Actions (same class list, URL, and image count as slide 32). Highlighted classes: Drinking, Gardening, Smoking cigarette.

36 Stanford 40 Actions: Result. We use 45 attributes, 81 objects, and 150 poselets. We compare our method with the Locality-constrained Linear Coding baseline (LLC; Wang et al., CVPR 2010). [Chart: average precision.]

37 Stanford 40 Actions: Result. [Chart: average precision.]

39 Conclusion. Attributes: …… Parts-Objects: …… Parts-Poselets: …… a: image feature vector. Action bases Φ … Bases coefficients w.

40 Acknowledgement
