Action Recognition ECE6504 Xiao Lin.


ECE6504 Action Recognition, Xiao Lin, 9/17/2018

Outline
Introduction
Static Image: “…From Pose and Appearance”
Video: “…Discriminative Space-Time Neighborhood…”
Experiments

Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification
Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei
Computer Science Dept., Stanford University

Activity landscape
Temporal scale (seconds): 10^-1, 10^0, 10^1, 10^3, 10^7-8
Categories along the scale: Snapshot, Atomic action, Activities, Events, Long term event
Examples: Catch, Run, High Jump, Football, Construction of a building
Thurau & Hlavac, 2008; Gupta et al, 2009; Ikizler & Duygulu, 2009; Ikizler-Cinbis et al, 2009; Yao & Fei-Fei 2010a,b; Yang, Wang and Mori, 2010
Bobick & Davis, 2001; Efros et al, 2003; Schuldt et al, 2004; Alper & Shah, 2005; Dollar et al, 2005; Blank et al, 2005; Niebles et al, 2006; Laptev et al, 2008; Wang & Mori, 2008; Rodriguez et al, 2008; Wang & Mori, 2009; Gupta et al, 2009; Liu et al, 2009; Marszalek et al, 2009
Ramanan & Forsyth, 2003; Laxton et al, 2007; Ikizler & Forsyth, 2008; Gupta et al, 2009; Choi & Savarese, 2009; Sridhar et al, 2010; Kuettel, 2010

Activity landscape
Temporal Scale (seconds): 10^-1, 10^0, 10^1, 10^3, 10^7-8
Categories along the scale: Snapshot, Atomic action, Activities, Long term event
Possible approaches: Pose-based recognition; HMM, CRF; Bag of features
Simple action recognition: Fails when actions are complex; Computationally intensive
Ferrari et al 2008; Ramanan & Forsyth 2003; Nazli & Forsyth 2008; […]; Laptev et al 2008; Niebles et al 2006; Liu et al 2009; Sminchisescu 2006; Blank et al 2005; Efros et al 2003; […]

Spatial Temporal Features
Laptev, Ivan. "On space-time interest points." IJCV, 2005

Activity landscape – related datasets
Temporal Scale (seconds): 10^-1, 10^0, 10^1, 10^3, 10^7-8
Categories along the scale: Snapshot, Atomic action, Activities, Events, Long term event
Actions in still images [Ikizler 2009]
PPMI [Yao & Fei-Fei 2010]
UIUC Sports [Li & Fei-Fei 2007]
KTH [Schuldt et al 2004]
Hollywood [Laptev et al 2008]
UCF Sports [Rodriguez et al 2008]
Ballet [Yang et al 2009]
New Olympic Sports Dataset

Outline
Introduction
Static Image: “…From Pose and Appearance”
Video: “…Discriminative Space-Time Neighborhood…”
Experiments

Action Recognition from a Distributed Representation of Pose and Appearance
Subhransu Maji (1), Lubomir Bourdev (2), and Jitendra Malik (1)
(1) University of California at Berkeley; (2) Adobe Systems, Inc., San Jose, CA
CVPR 2011

Problem Setting
PASCAL VOC 2010 static image action classification challenge
Additional training data used

Motivation
Recovering the stick figures is hard…
Resolution
Clothing
Some parts not visible

Motivation
Poselets: easy to detect
Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations. L. Bourdev and J. Malik. ICCV 2009.

Motivation
Poselets: good at predicting pose
Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations. L. Bourdev and J. Malik. ICCV 2009.

Motivation
Classic: Image → X → Pose → Action
Yang, W., Wang, Y., & Mori, G. “Recognizing human actions from still images with latent poses”. CVPR 2010.
Image → Poselets → Pose → Action
This paper: Image → Poselets → X → Action

Approach: Poselets for action recognition
Based on 2D pose, because of training data
Bourdev, L., Maji, S., Brox, T., & Malik, J. (2010). Detecting people using mutually consistent poselet activations. ECCV 2010.

Approach: Poselets for action recognition
Discriminativeness: bad

Approach: Poselets for action recognition
Discriminativeness: good

Approach: Poselets for action recognition
Category specific, query within the category

Approach: Poselets for action recognition

Approach: Poselets for action recognition
4 patch sizes (aspect ratios): 96x64, 64x64, 64x96, 128x64
300 poselets per patch size, 1200 in all

Approach: Poselet activation vector
Fit a model so that each poselet activation predicts the bounding box of the human
Keep the activations whose predicted bounding box overlaps the given bounding box by more than α
Sum up the scores of all such activations for each poselet
1200 poselets -> 1200-dimensional vector
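
The steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide says "overlap" without defining it, so intersection-over-union is assumed here, and the names `iou`, `poselet_activation_vector`, and the `(poselet_id, score, predicted_box)` tuple layout are hypothetical, not taken from the authors' code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Assumption: overlap is measured as intersection over union.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def poselet_activation_vector(activations, person_box, num_poselets, alpha):
    """Sum activation scores per poselet type for one person.

    activations: iterable of (poselet_id, score, predicted_box), where
    predicted_box is the person box predicted by that activation
    (hypothetical layout). Only activations whose predicted box overlaps
    person_box by more than alpha contribute.
    """
    pav = [0.0] * num_poselets
    for poselet_id, score, predicted_box in activations:
        if iou(predicted_box, person_box) > alpha:
            pav[poselet_id] += score
    return pav


# Toy example with 3 poselet types instead of 1200.
example_activations = [
    (0, 0.9, (0, 0, 10, 20)),    # matches the person box exactly
    (0, 0.4, (1, 0, 10, 20)),    # overlaps strongly, same poselet type
    (2, 0.7, (50, 50, 60, 70)),  # belongs to someone else, is ignored
]
example_pav = poselet_activation_vector(
    example_activations, (0, 0, 10, 20), num_poselets=3, alpha=0.5
)
```

With 1200 poselet types the same function would return a 1200-dimensional vector, one entry per poselet.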

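Approach: Poselet activation vector to score

    function actionscore = pav2action(pav, W, W2)
    % pav: 1-by-N poselet activation vector (row vector)
    % W:   (N+1)-by-numactions linear weights; the last row holds the biases
    % W2:  numactions-by-2 parameters [slope, offset] of a per-action sigmoid
    numactions = size(W, 2);
    actionscore = zeros(numactions, 1);
    for i = 1:numactions
        % Linear score for action i
        score = pav * W(1:end-1, i) + W(end, i);
        % Map the score into (0, 1)
        actionscore(i) = 1 ./ (1 + exp(-(score * W2(i, 1) + W2(i, 2))));
    end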

Approach: Object activation vector
Fits a model to predict bounding box of human
Predicted bounding box overlap with given bounding box > α
Sum up all such scores for each object

Approach: Context-based rescoring
Consider the actions of other people in the image
Feature: highest score among all other people, for each action
Linear SVM
Helps on playing instrument and running
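
A rough sketch of how such a context feature could be assembled is given below. The function name `context_features`, the choice to concatenate each person's own scores with the context scores, and the `default` value used when a person is alone in the image are assumptions made for illustration; the linear SVM that would consume these features is not shown.

```python
def context_features(scores_per_person, default=0.0):
    """Build one rescoring feature vector per person in an image.

    scores_per_person: list with one entry per person, each a list of
    per-action scores. For every person the feature is the person's own
    scores followed by, for each action, the highest score among all
    OTHER people in the image. When there are no other people, the
    context part is filled with `default` (an assumption).
    """
    features = []
    for i, own in enumerate(scores_per_person):
        others = [s for j, s in enumerate(scores_per_person) if j != i]
        if others:
            # zip(*others) iterates over actions; max picks the best other person
            context = [max(column) for column in zip(*others)]
        else:
            context = [default] * len(own)
        features.append(list(own) + context)
    return features


# Three people, two actions each (toy numbers).
example_features = context_features([[0.2, 0.9], [0.7, 0.1], [0.4, 0.3]])
```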

Experiments: Data
H3D
PASCAL VOC 2010
+ Head & torso yaw labeling

Experiments: Yaw prediction
Close on frontal views

Experiments: Action prediction, confusion matrix

Experiments: Confusions

Experiments: Confusions

Experiments: Average Precision

Discussions
Pros: An interesting use of poselets
Cons: Manually selecting objects that are action-specific

Outline
Introduction
Static Image: “…From Pose and Appearance”
Video: “…Discriminative Space-Time Neighborhood…”
Experiments

Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition
Adriana Kovashka and Kristen Grauman
University of Texas at Austin
CVPR 2010

Motivation
Spatio-temporal interest point representations are sometimes too “local”
  Motion trajectories
  Before-after relationships
Solutions to the above problem suffer from other problems
  Sensitive to spatio-temporal shifts
  Unknown spatio-temporal scales

Spatial Temporal Features
Laptev, Ivan. "On space-time interest points." IJCV, 2005

Motivation
Sensitive to spatio-temporal shifts

Motivation
Unknown spatio-temporal scales

Local features may not produce good matches… Lazebnik et al., BMVC 2004, Sivic & Zisserman, CVPR 2004, Agarwal & Triggs, ECCV 2006, Pantofaru et al., Beyond Patches Wkshp 2006, Quack et al., ICCV 2007 Semi-local features: Our proximity distribution descriptor: By Yong Jae Lee and Kristen Grauman, “Foreground Focus: Finding Meaningful Features in Unlabeled Images”, BMVC 2008

Approach

Approach
Hierarchical: applied recursively to generate multiple levels

Approach
Weighted Euclidean distance
Sample different weights as well
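
To make the idea concrete, here is a small sketch of a weighted Euclidean distance over (x, y, t) positions and of how different weights change which points count as neighbors. The exact form of the distance, the weight range, and the names `weighted_distance`, `nearest_neighbors`, and `sample_weight_settings` are assumptions for illustration, not the paper's definitions.

```python
import math
import random


def weighted_distance(p, q, weights):
    """Weighted Euclidean distance between two (x, y, t) points.

    Assumed form: sqrt(sum_d w_d * (p_d - q_d)^2).
    """
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, q)))


def nearest_neighbors(points, index, weights, k):
    """Indices of the k points closest to points[index] under the weights."""
    center = points[index]
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: weighted_distance(center, points[i], weights))
    return others[:k]


def sample_weight_settings(num_settings, low=0.1, high=1.0, seed=0):
    """Draw several (w_x, w_y, w_t) settings; the range is an arbitrary choice."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(low, high) for _ in range(3))
            for _ in range(num_settings)]


# Down-weighting time pulls in a point that is far away in t but close in space.
example_points = [(0, 0, 0), (1, 0, 0), (0, 0, 2), (5, 5, 5)]
isotropic = nearest_neighbors(example_points, 0, (1, 1, 1), k=2)
time_tolerant = nearest_neighbors(example_points, 0, (1, 1, 0.1), k=2)
```

Each sampled weight setting yields a different neighborhood around the same interest point, which is one way to cope with unknown spatial and temporal scales.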

Approach
M weight settings
L+1 levels of bag-of-words histograms
ML+1 bag-of-words histograms per feature type
F different ways to extract features (HoG, HoF, HoG3D, etc.)
FML+F = F(ML+1) histograms for Multiple Kernel Learning (MKL), which assigns a weight to each “channel” (histogram)
SVM for actual classification
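
The channel count can be checked by enumerating the histograms explicitly. The sketch below assumes that level 0 contributes a single histogram per feature type and that each of the L higher levels contributes one histogram per weight setting, which is one reading consistent with the ML+1 figure on the slide; `channel_names` is a hypothetical helper.

```python
def channel_names(feature_types, num_weight_settings, num_higher_levels):
    """List every histogram channel as (feature_type, level, weight_index).

    Assumption: level 0 has one histogram (weight_index None); levels
    1..L each have one histogram per weight setting.
    """
    names = []
    for feature in feature_types:
        names.append((feature, 0, None))
        for level in range(1, num_higher_levels + 1):
            for m in range(num_weight_settings):
                names.append((feature, level, m))
    return names


# F = 3 feature types, M = 4 weight settings, L = 2 higher levels.
example_channels = channel_names(["HoG", "HoF", "HoG3D"], 4, 2)
example_count = len(example_channels)  # F*M*L + F = 3*4*2 + 3 = 27
```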

Experiments: Data
KTH human action recognition: standard partition, average recognition rate per class
UCF Sports: leave-one-out cross validation
Different parameters for different datasets
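
The two evaluation protocols named above can be sketched as follows. This is a generic illustration, not the authors' evaluation script; the function names and the toy labels are made up.

```python
from collections import defaultdict


def leave_one_out_splits(num_samples):
    """Yield (train_indices, test_index): each sample is held out once."""
    for held_out in range(num_samples):
        train = [i for i in range(num_samples) if i != held_out]
        yield train, held_out


def mean_per_class_accuracy(true_labels, predicted_labels):
    """Average of the per-class recognition rates (each class weighs the same)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    rates = [correct[label] / total[label] for label in total]
    return sum(rates) / len(rates)


# Toy labels: walk is right 1 of 2 times, run 2 of 3, jump 1 of 1.
example_true = ["walk", "walk", "run", "run", "run", "jump"]
example_pred = ["walk", "run", "run", "run", "walk", "jump"]
example_accuracy = mean_per_class_accuracy(example_true, example_pred)
example_splits = list(leave_one_out_splits(3))
```

Averaging per class rather than per sample keeps a frequent class from dominating the reported number.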

Experiments: Recognition performance
Left: KTH; Right: UCF
[32] 85.6%; [29] 69.2%*; [33] 79.3%*
*Not directly comparable

Experiments: Sensitivity to parameters
Nearest-neighbor vs. uniformly scaled 3x3x3 grid cube

Experiments: Contribution of higher level vocabularies (>0)

Experiments: Most discriminative level-1 words for hand waving and riding horse

Discussions
Pros:
  A sound extension of Lee et al.’s work to 2+1D
  Estimates spatial and temporal scales of actions
Cons:
  Relies too much on clustering and classification algorithms, lacks an intuitive explanation
  Parameters

Outline
Introduction
Static Image: “…From Pose and Appearance”
Video: “…Discriminative Space-Time Neighborhood…”
Experiments

Action Recognition Based on Poselets
See demo

Action Recognition Based on Poselets
Good frontal face performance
Limited variability and strong confusion
Maybe better with object detectors