Published by Pauline Riley. Modified over 9 years ago.
1
Some Recent Work on Human Activity Recognition Xinxiao Wu (吴心筱) wuxinxiao@bit.edu.cn
2
Action Description Action, Object and Scene Multi-View Action Recognition Action Detection Complex Activity Recognition Multimedia Event Detection
3
Action Description
4
Extension of Interest Points Extension of Bag-of-Words Mid-level Attribute Feature Dense Trajectory Action Bank Action Description
5
Bregonzio et al., CVPR, 2009 Clouds of interest points accumulated over multiple temporal scales Extension of Interest Points Matteo Bregonzio, Shaogang Gong and Tao Xiang. Recognising Action as Clouds of Space-Time Interest Points. CVPR 2009.
6
Holistic features of the clouds capture the spatio-temporal information of the interest points. Extension of Interest Points Matteo Bregonzio, Shaogang Gong and Tao Xiang. Recognising Action as Clouds of Space-Time Interest Points. CVPR, 2009.
8
Wu et al., CVPR, 2011. Multi-scale spatio-temporal (ST) context distribution feature. Characterize the spatial and temporal context distributions of interest points over multiple space-time scales. Extension of Interest Points Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action recognition using context and appearance distribution features. CVPR, 2011.
9
A set of XYT relative coordinates between the center interest point and the other interest points in a local region. Multi-scale local regions across multiple space-time scales. Extension of Interest Points Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action recognition using context and appearance distribution features. CVPR, 2011.
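The relative-coordinate idea above can be illustrated with a minimal sketch. It assumes interest points are given as (x, y, t) tuples and that a local region is a box with a spatial and a temporal half-width; the function names, the box shape and the example scales are illustrative assumptions, not details taken from the paper.

```python
def relative_xyt(center, points, radius_xy, radius_t):
    """Return (dx, dy, dt) offsets of the points that fall inside a
    box-shaped space-time region around `center` (box shape is an assumption)."""
    cx, cy, ct = center
    offsets = []
    for (x, y, t) in points:
        if (x, y, t) == center:
            continue  # skip the center point itself
        dx, dy, dt = x - cx, y - cy, t - ct
        if abs(dx) <= radius_xy and abs(dy) <= radius_xy and abs(dt) <= radius_t:
            offsets.append((dx, dy, dt))
    return offsets


def multiscale_context(center, points, scales):
    """Collect relative coordinates for several (radius_xy, radius_t) scales."""
    return {scale: relative_xyt(center, points, *scale) for scale in scales}


# Hypothetical interest points (x, y, t) and two hypothetical scales.
points = [(10, 10, 5), (12, 9, 6), (30, 10, 5), (11, 14, 20)]
context = multiscale_context(points[0], points, scales=[(5, 2), (25, 20)])
```

In this sketch the small scale keeps only the nearby point, while the larger scale also picks up the two distant ones; the offset sets at each scale would then be the samples whose distribution is modeled.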
11
Wu et al., CVPR, 2011. A global GMM is trained using all local features from all the training videos. The video-specific GMM for a given video is generated from the global GMM via a maximum a posteriori (MAP) adaptation process. Extension of Bag-of-Words Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action recognition using context and appearance distribution features. CVPR, 2011.
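As a rough illustration of MAP adaptation, the sketch below adapts only the component means of a one-dimensional GMM, using the common relevance-factor formulation. The restriction to means, the 1-D features, the `relevance` value and all names are assumptions made for brevity; the paper's exact adaptation procedure may differ.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def map_adapt_means(features, weights, means, variances, relevance=16.0):
    """Shift the global GMM means toward one video's features (means-only MAP)."""
    n_comp = len(means)
    soft_counts = [0.0] * n_comp    # sum of posteriors per component
    weighted_sums = [0.0] * n_comp  # posterior-weighted sum of features
    for x in features:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # feature is numerically unexplained by every component
        for k in range(n_comp):
            post = likes[k] / total
            soft_counts[k] += post
            weighted_sums[k] += post * x
    adapted = []
    for k in range(n_comp):
        # alpha near 1: trust the video's data; alpha near 0: keep the global mean.
        alpha = soft_counts[k] / (soft_counts[k] + relevance)
        data_mean = weighted_sums[k] / soft_counts[k] if soft_counts[k] > 0 else means[k]
        adapted.append(alpha * data_mean + (1.0 - alpha) * means[k])
    return adapted


# Hypothetical global model with two components, and one video's features.
global_weights, global_means, global_vars = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
video_means = map_adapt_means([1.0] * 16, global_weights, global_means, global_vars)
```

Components that explain many of the video's features move toward them, while components with little support stay close to the global model, so every video is described by a GMM with the same number of comparable components.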
12
GMM vs Bag-of-Words
13
Kovashka and Grauman, CVPR, 2010. Exploit multiple “bag-of-words” models to represent the hierarchy of space-time configurations at different scales. Extension of Bag-of-Words A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. CVPR, 2010.
14
Kovashka and Grauman, CVPR, 2010. A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. CVPR, 2010.
15
Kovashka and Grauman, CVPR, 2010. A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. CVPR, 2010.
16
Savarese et al., WMVC, 2008. Use a local histogram to capture co-occurrences of words in a local region. Extension of Bag-of-Words S. Savarese, A. Delpozo, J.C. Niebles and L. Fei-Fei. Spatial-temporal correlatons for unsupervised action classification. WMVC, 2008.
17
M. Ryoo and J. Aggarwal, ICCV, 2009. Propose a “feature type × feature type × relationship” histogram to capture both appearance and relationship information between pairwise visual words. Extension of Bag-of-Words M. Ryoo and J. Aggarwal. Spatio-temporal relationship match: video structure comparison for recognition of complex human activities. ICCV, 2009.
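A pairwise relationship histogram of this kind could be sketched as follows. The sketch assumes each feature is a (visual word id, (start frame, end frame)) pair and uses only three simplified temporal relations; the relation set, the names and the histogram-intersection comparison are illustrative assumptions rather than the predicates and kernel defined in the paper.

```python
from collections import Counter
from itertools import permutations


def temporal_relation(span_a, span_b):
    """Simplified relation between two (start, end) frame intervals."""
    if span_a[1] < span_b[0]:
        return "before"
    if span_b[1] < span_a[0]:
        return "after"
    return "overlaps"


def relationship_histogram(features):
    """Count (type_a, type_b, relation) triples over all ordered feature pairs."""
    hist = Counter()
    for (type_a, span_a), (type_b, span_b) in permutations(features, 2):
        hist[(type_a, type_b, temporal_relation(span_a, span_b))] += 1
    return hist


def histogram_intersection(hist_a, hist_b):
    """Similarity of two videos as the overlap of their histograms."""
    return sum(min(count, hist_b[key]) for key, count in hist_a.items())


# Hypothetical video with three features: (visual word id, (start, end)).
video = [(3, (0, 4)), (7, (6, 9)), (3, (8, 12))]
hist = relationship_histogram(video)
```

Unlike a plain bag-of-words, two videos containing the same words in a different temporal order produce different histograms here, which is the structural information the slide refers to.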
19
Liu et al., CVPR, 2011. Action attributes: a set of intermediate concepts. A unified framework: action attributes are effectively selected in a discriminative fashion. Data-driven attributes. Mid-level Attribute Feature Jingen Liu, Benjamin Kuipers and Silvio Savarese. Recognizing Human Actions by Attributes. CVPR, 2011.
21
Liu et al., CVPR, 2011. Data Driven
22
Wang et al., CVPR, 2011. Sample dense points from each frame and track them based on displacement information from a dense optical flow field. Dense Trajectory Heng Wang, Alexander Kläser, Cordelia Schmid and Cheng-Lin Liu. Action Recognition by Dense Trajectories. CVPR, 2011.
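The tracking step can be sketched in a few lines. This is a simplified illustration that assumes the optical flow is already available as nested lists with `flows[t][row][col] = (u, v)`, and it omits any smoothing of the flow field; the data layout and function names are assumptions. The descriptor normalizes the displacement sequence by its total length, which makes it independent of the overall motion magnitude.

```python
import math


def track_point(start, flows):
    """Follow one point through a sequence of flow fields.

    `start` is (x, y); each flow field maps a pixel to its displacement (u, v).
    """
    x, y = start
    trajectory = [(x, y)]
    for flow in flows:
        row, col = int(round(y)), int(round(x))
        if not (0 <= row < len(flow) and 0 <= col < len(flow[0])):
            break  # the point has left the frame
        u, v = flow[row][col]
        x, y = x + u, y + v
        trajectory.append((x, y))
    return trajectory


def trajectory_descriptor(trajectory):
    """Displacement sequence normalized by the sum of displacement magnitudes."""
    disps = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]
    norm = sum(math.hypot(dx, dy) for dx, dy in disps)
    if norm == 0:
        return [(0.0, 0.0) for _ in disps]
    return [(dx / norm, dy / norm) for dx, dy in disps]


# Hypothetical 4x4 frames in which every pixel moves one pixel to the right.
uniform_flow = [[(1.0, 0.0)] * 4 for _ in range(4)]
trajectory = track_point((0, 1), [uniform_flow] * 3)
descriptor = trajectory_descriptor(trajectory)
```

In a full pipeline many such points would be sampled on a dense grid and the trajectories cut to a fixed length before descriptors are computed along them.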
23
Wang et al., CVPR, 2011. Four descriptors: Trajectory; HOG; HOF; MBH. Heng Wang, Alexander Kläser, Cordelia Schmid and Cheng-Lin Liu. Action Recognition by Dense Trajectories. CVPR, 2011.
24
Sadanand and Corso, CVPR, 2012. From Object Bank to Action Bank. Action Bank: a large set of action detectors. Action Bank Sreemanananth Sadanand and Jason J. Corso. Action Bank: A High-Level Representation of Activity in Video. CVPR, 2012.
28
Actions, Object and Scene
29
Nazli Ikizler-Cinbis and Stan Sclaroff, ECCV, 2010. Combine the information from person, object and scene. Multiple instance learning + multiple kernel learning. A bag contains all the instances extracted from a video for a particular feature channel. Different features have different kernel weights. Nazli Ikizler-Cinbis and Stan Sclaroff. Object, Scene and Actions: Combining Multiple Features for Human Action Recognition. ECCV, 2010.
33
Marcin Marszalek, Ivan Laptev and Cordelia Schmid, CVPR, 2009. Automatically discover the relation between scene classes and human actions using movie scripts. Marcin Marszalek, Ivan Laptev and Cordelia Schmid. Actions in Context. CVPR, 2009.
35
Develop a joint framework for action and scene recognition in natural video
37
Multi-View Action Recognition
38
Multiple Views View-invariant Recognition Cross-view Recognition
39
Weinland et al., ICCV, 2009. A 3D visual hull is proposed to represent an action exemplar using a system of 5 calibrated cameras. Daniel Weinland, Edmond Boyer and Remi Ronfard. Action recognition from arbitrary views using 3D exemplars. ICCV, 2009. View-invariant
40
Weinland et al., ICCV, 2009. 3D exemplar-based HMM for classification Daniel Weinland, Edmond Boyer and Remi Ronfard. Action recognition from arbitrary views using 3D exemplars. ICCV, 2009.
42
View-invariant Yan et al., CVPR, 2008. 4D action feature: 3D shapes over time (4D) Pingkun Yan, Saad M. Khan, Mubarak Shah. Learning 4D Action Feature Models for Arbitrary View Action Recognition. CVPR, 2008.
43
View-invariant Junejo et al., IEEE TPAMI, 2008. A novel view-invariant feature: self-similarity descriptor Frame-to-frame similarity Imran N. Junejo, Emilie Dexter, Ivan Laptev and Patrick Perez. View-independent action recognition from temporal self-similarities. IEEE T-PAMI, 2008.
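The frame-to-frame similarity idea can be illustrated with a small sketch that builds a self-similarity matrix from per-frame feature vectors. Euclidean distance and the toy 2-D features are assumptions for illustration; any per-frame feature and distance could be substituted.

```python
import math


def self_similarity_matrix(frame_features):
    """Entry (i, j) is the distance between the features of frames i and j."""
    n = len(frame_features)
    return [[math.dist(frame_features[i], frame_features[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical per-frame features: the pose in frame 2 repeats the pose in frame 0.
frames = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
ssm = self_similarity_matrix(frames)
```

The matrix is symmetric with a zero diagonal, and repeated poses show up as low off-diagonal values. The matrix depends only on how frames relate to each other within one sequence, not on the absolute feature values seen from a particular camera.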
46
View-invariant Lewandowski et al., ECCV, 2010. View-independent manifold representation. A stylistic invariant embedded manifold is produced to describe an action for each view. All view-dependent manifolds are automatically combined to generate a unified manifold. Michal Lewandowski, Dimitrios Makris, and Jean-Christophe Nebel. View and style-independent action manifolds for human activity recognition. ECCV, 2010.
48
View-invariant Wu and Jia, ECCV, 2012. Propose a latent kernelized structural SVM. The view index is treated as a latent variable and inferred during both training and testing. Xinxiao Wu and Yunde Jia. View-invariant action recognition using latent kernelized structural SVM. ECCV, 2012.
49
Cross-view Liu et al., CVPR, 2011. Learn the bilingual-words from both source view and target view. Transfer action models between two views via the bag-of-bilingual-words model. Jingen Liu, Mubarak Shah, Benjamin Kuipers and Silvio Savarese. Cross-View Action Recognition via View Knowledge Transfer. CVPR 2011.
52
Cross-view Li et al., CVPR, 2012. Propose “virtual views” to connect action descriptors from the source view and the target view. Each virtual view is associated with a linear transformation of the action descriptor, and the sequence of transformations arising from the sequence of virtual views aims at bridging the source and target views. Ruonan Li and Todd Zickler. Discriminative virtual views for cross-view action recognition. CVPR, 2012.
54
Cross-view Wu et al., PCM, 2012. Transfer Discriminant-Analysis of Canonical Correlations (Transfer DCC). Minimize the mismatch between data distributions of source and target views. Xinxiao Wu, Cuiwei Liu, and Yunde Jia. Transfer discriminant-analysis of canonical correlations for view-transfer action recognition. PCM, 2012.
56
Action Detection
57
Yuan et al., IEEE T-PAMI, 2011. A discriminative pattern matching criterion for action classification: naïve-Bayes mutual information maximization (NBMIM). An efficient search algorithm: spatio-temporal branch-and-bound (STBB) search. Junsong Yuan, Zicheng Liu, and Ying Wu. Discriminative video pattern search for efficient action detection. IEEE T-PAMI, 2011.
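To give a feel for the search problem, the sketch below solves a much simpler one-dimensional analogue: given a signed score per frame (positive when the frame supports the action, negative otherwise), find the contiguous temporal window with the largest total score. It uses Kadane's maximum-sum algorithm and is not the STBB algorithm, which searches over spatial extent as well; the per-frame scores and the function name are assumptions for illustration.

```python
def best_temporal_window(frame_scores):
    """Return (start, end, score) of the contiguous window with maximum
    total score; `end` is inclusive and `frame_scores` must be non-empty."""
    best = (0, 0, frame_scores[0])
    cur_start, cur_sum = 0, 0.0
    for t, score in enumerate(frame_scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help: start a new window here.
            cur_start, cur_sum = t, score
        else:
            cur_sum += score
        if cur_sum > best[2]:
            best = (cur_start, t, cur_sum)
    return best


# Hypothetical per-frame votes for one action class.
window = best_temporal_window([-1.0, 2.0, 3.0, -4.0, 1.0])
```

Because the window score is additive over its elements, the best window can be found without enumerating all candidates; branch-and-bound exploits the same additivity when the candidates are three-dimensional subvolumes.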
59
Hu et al., ICCV, 2009. The candidate regions of an action are treated as a bag of instances. A novel multiple-instance learning framework, named SMILE-SVM (Simulated annealing Multiple Instance Learning Support Vector Machines), is proposed for learning a human action detector. Yuxiao Hu, Liangliang Cao, Fengjun Lv, Shuicheng Yan, Yihong Gong and Thomas S. Huang. Action detection in complex scenes with spatial and temporal ambiguities. ICCV, 2009.
62
Complex Activity Recognition
63
Gaidon et al., CVPR, 2011. Actom Sequence Model: represent an activity as a sequence of atomic action-anchored visual features. Automatically detect atomic actions from an input activity video. A. Gaidon, Z. Harchaoui, and C. Schmid. Actom sequence models for efficient action detection. CVPR, 2011.
64
Hoai et al., CVPR, 2011. Jointly perform video segmentation and action recognition. M. Hoai, Z. Lan, and F. De la Torre. Joint segmentation and classification of human actions in video. CVPR, 2011.
66
Tang et al., CVPR, 2012. Each activity is modeled by a set of latent state variables and duration variables. The states are the cluster centers obtained by clustering all the fixed-length video clips from the training data. A max-margin discriminative model is introduced to learn the temporal structure of complex events. K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. CVPR, 2012.
70
Multimedia Event Detection
71
Izadinia and Shah, ECCV, 2012. A latent discriminative model is proposed to detect the low-level events by modeling the co-occurrence relationships between different low-level events in a graph. Each video is divided into short clips, and each clip is manually annotated with one low-level event label; these labels are used for training the low-level detectors. H. Izadinia and M. Shah. Recognizing complex events using large margin joint low-level event model. ECCV, 2012.
75
Thanks for your attention! Q & A?