Some Recent Works on Human Activity Recognition (Xinxiao Wu)


Outline:
Action Description
Action, Object and Scene
Multi-View Action Recognition
Action Detection
Complex Activity Recognition
Multimedia Event Detection

Action Description

Extension of Interest Points
Extension of Bag-of-Words
Mid-level Attribute Feature
Dense Trajectory
Action Bank

Bregonzio et al., CVPR 2009: clouds of interest points accumulated over multiple temporal scales.
Matteo Bregonzio, Shaogang Gong and Tao Xiang. Recognising Action as Clouds of Space-Time Interest Points. CVPR, 2009.

Holistic features of the clouds encode the spatio-temporal information of the interest points.

Wu et al., CVPR 2011: multi-scale spatio-temporal (ST) context distribution feature, characterizing the spatial and temporal context distributions of interest points over multiple space-time scales.
Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action Recognition Using Context and Appearance Distribution Features. CVPR, 2011.

A set of XYT relative coordinates between the center interest point and the other interest points in a local region, computed over multi-scale local regions across multiple space-time scales.

Wu et al., CVPR 2011: a global GMM is trained using all local features from all training videos; the video-specific GMM for a given video is then generated from the global GMM via maximum a posteriori (MAP) adaptation.
Xinxiao Wu, Dong Xu, Lixin Duan and Jiebo Luo. Action Recognition Using Context and Appearance Distribution Features. CVPR, 2011.
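The MAP adaptation step can be sketched as follows. This is a minimal illustration with 1-D features, mean-only updates and a relevance factor r, which are standard simplifications of MAP adaptation rather than the paper's exact settings:

```python
import math

def gaussian(x, mu, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def map_adapt_means(features, mus, variances, weights, r=16.0):
    """MAP-adapt the global GMM means to one video's local features.

    Only the means are adapted; r controls how strongly the
    global means persist when a component sees little data.
    """
    k_comp = len(mus)
    n = [0.0] * k_comp   # soft counts per component
    s = [0.0] * k_comp   # posterior-weighted feature sums
    for x in features:
        post = [w * gaussian(x, m, v)
                for w, m, v in zip(weights, mus, variances)]
        z = sum(post)
        for k in range(k_comp):
            gamma = post[k] / z
            n[k] += gamma
            s[k] += gamma * x
    # interpolate between the video's data mean and the global mean
    return [(s[k] + r * mus[k]) / (n[k] + r) for k in range(k_comp)]
```

With features clustered near one component, only that component's mean shifts toward the video's data; the other means fall back to the global model.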

GMM vs Bag-of-Words
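The contrast between the two representations can be sketched with 1-D features: bag-of-words hard-assigns each local feature to its nearest codeword, while the GMM view spreads each feature over components by posterior probability. Component parameters here are illustrative, not from any trained model:

```python
import math

def bow_histogram(features, codebook):
    """Bag-of-words: hard-assign each feature to its nearest codeword."""
    hist = [0] * len(codebook)
    for f in features:
        nearest = min(range(len(codebook)), key=lambda k: abs(f - codebook[k]))
        hist[nearest] += 1
    return hist

def gmm_soft_histogram(features, mus, variances, weights):
    """GMM view: soft-assign each feature by component posteriors."""
    hist = [0.0] * len(mus)
    for f in features:
        post = [w * math.exp(-(f - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, mus, variances)]
        z = sum(post)
        for k, p in enumerate(post):
            hist[k] += p / z
    return hist
```

The soft histogram degrades gracefully for features falling between codewords, which hard assignment quantizes away.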

Kovashka and Grauman, CVPR 2010: exploit multiple bag-of-words models to represent a hierarchy of space-time configurations at different scales.
A. Kovashka and K. Grauman. Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition. CVPR, 2010.

Savarese et al., WMVC 2008: use a local histogram to capture co-occurrences of words in a local region.
S. Savarese, A. Delpozo, J.C. Niebles and L. Fei-Fei. Spatial-Temporal Correlatons for Unsupervised Action Classification. WMVC, 2008.
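A basic instance of the idea can be sketched as counting, for every pair of interest points within a spatio-temporal radius, which visual words co-occur. This is a simplification: the paper's correlaton feature additionally bins pairs by distance, which is omitted here:

```python
def cooccurrence_histogram(points, vocab_size, radius):
    """Count co-occurring visual-word pairs within a spatio-temporal radius.

    points: list of (x, y, t, word_id) quantized interest points.
    Returns a vocab_size x vocab_size symmetric count matrix.
    """
    hist = [[0] * vocab_size for _ in range(vocab_size)]
    r2 = radius * radius
    for i, (xi, yi, ti, wi) in enumerate(points):
        for xj, yj, tj, wj in points[i + 1:]:
            # squared spatio-temporal distance against the radius
            if (xi - xj) ** 2 + (yi - yj) ** 2 + (ti - tj) ** 2 <= r2:
                hist[wi][wj] += 1
                hist[wj][wi] += 1
    return hist
```

Unlike a plain bag of words, this representation distinguishes videos whose word counts match but whose words occur in different local arrangements.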

Ryoo and Aggarwal, ICCV 2009: propose a "feature type × feature type × relationship" histogram to capture both the appearance of and the relationships between pairwise visual words.
M. S. Ryoo and J. K. Aggarwal. Spatio-Temporal Relationship Match: Video Structure Comparison for Recognition of Complex Human Activities. ICCV, 2009.

Liu et al., CVPR 2011: action attributes as a set of intermediate concepts; a unified framework in which action attributes are selected in a discriminative fashion; data-driven attributes.
Jingen Liu, Benjamin Kuipers and Silvio Savarese. Recognizing Human Actions by Attributes. CVPR, 2011.
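The attribute idea can be sketched as a simplified direct-attribute scheme: predict per-attribute confidences, then pick the class whose binary attribute signature agrees best with them. The class names and signatures below are hypothetical, and the paper's discriminative attribute selection is not modeled:

```python
def attribute_classify(attr_scores, class_signatures):
    """Classify via intermediate attribute concepts.

    attr_scores: per-attribute confidences in [0, 1] from attribute
    classifiers; class_signatures: class name -> list of 0/1 flags
    saying which attributes the class is expected to exhibit.
    """
    def agreement(sig):
        # reward attributes the class expects, penalize ones it does not
        return sum(a if bit else -a for a, bit in zip(attr_scores, sig))
    return max(class_signatures, key=lambda c: agreement(class_signatures[c]))
```

Because classes are described by attributes rather than raw features, a new class can be recognized from its signature alone.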

Wang et al., CVPR 2011: sample dense points from each frame and track them using displacement information from a dense optical flow field.
Heng Wang, Alexander Klaser, Cordelia Schmid and Cheng-Lin Liu. Action Recognition by Dense Trajectories. CVPR, 2011.

Four descriptors: trajectory shape; HOG (histograms of oriented gradients); HOF (histograms of optical flow); MBH (motion boundary histograms).
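The tracking and the trajectory-shape descriptor can be sketched as follows. This is a minimal version: Wang et al. use median-filtered flow, sub-pixel interpolation and fixed 15-frame trajectories, all of which are simplified away here:

```python
import math

def track_point(flows, x0, y0):
    """Propagate one densely sampled point through per-frame flow fields.

    flows[t][y][x] = (dx, dy) displacement at pixel (x, y) of frame t.
    """
    x, y = float(x0), float(y0)
    traj = [(x, y)]
    for flow in flows:
        h, w = len(flow), len(flow[0])
        # nearest-neighbor flow lookup, clamped to the frame
        xi = min(max(int(round(x)), 0), w - 1)
        yi = min(max(int(round(y)), 0), h - 1)
        dx, dy = flow[yi][xi]
        x, y = x + dx, y + dy
        traj.append((x, y))
    return traj

def trajectory_shape(traj):
    """Trajectory descriptor: displacements normalized by total path length."""
    disp = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(traj, traj[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in disp) or 1.0
    return [(dx / total, dy / total) for dx, dy in disp]
```

Normalizing by total length makes the shape descriptor invariant to the overall speed of the motion.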

Sadanand and Corso, CVPR 2012: Object Bank → Action Bank, a large set of action detectors.
Sreemanananth Sadanand and Jason J. Corso. Action Bank: A High-Level Representation of Activity in Video. CVPR, 2012.

Action, Object and Scene

Nazli Ikizler-Cinbis and Stan Sclaroff, ECCV 2010: combine information from person, object and scene via multiple instance learning plus multiple kernel learning. A bag contains all instances extracted from a video for a particular feature channel, and different feature channels receive different kernel weights.
Nazli Ikizler-Cinbis and Stan Sclaroff. Object, Scene and Actions: Combining Multiple Features for Human Action Recognition. ECCV, 2010.
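The multiple-kernel combination step can be sketched as a weighted sum of per-channel kernel matrices. In actual MKL the weights are learned jointly with the SVM; here they are simply given:

```python
def combine_kernels(kernels, betas):
    """Weighted sum of base kernel matrices, one per feature channel.

    kernels: list of n x n Gram matrices (person, object, scene, ...);
    betas: per-channel weights (learned in MKL, fixed here).
    """
    n = len(kernels[0])
    out = [[0.0] * n for _ in range(n)]
    for gram, beta in zip(kernels, betas):
        for i in range(n):
            for j in range(n):
                out[i][j] += beta * gram[i][j]
    return out
```

A nonnegative weighted sum of valid kernels is itself a valid kernel, so the combined matrix can be fed directly to any kernel SVM.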

Marszalek et al., CVPR 2009: automatically discover the relation between scene classes and human actions using movie scripts.
Marcin Marszalek, Ivan Laptev and Cordelia Schmid. Actions in Context. CVPR, 2009.

Develop a joint framework for action and scene recognition in natural video.

Multi-View Action Recognition

Multiple views:
View-invariant Recognition
Cross-view Recognition

View-invariant: Weinland et al., ICCV 2007. A 3D visual hull represents each action exemplar, captured with a system of 5 calibrated cameras.
Daniel Weinland, Edmond Boyer and Remi Ronfard. Action Recognition from Arbitrary Views Using 3D Exemplars. ICCV, 2007.

A 3D exemplar-based HMM is used for classification.

View-invariant: Yan et al., CVPR 2008. 4D action feature: 3D shapes over time.
Pingkun Yan, Saad M. Khan and Mubarak Shah. Learning 4D Action Feature Models for Arbitrary View Action Recognition. CVPR, 2008.

View-invariant: Junejo et al., IEEE T-PAMI 2011. A novel view-invariant feature, the self-similarity descriptor, built from frame-to-frame similarities.
Imran N. Junejo, Emilie Dexter, Ivan Laptev and Patrick Perez. View-Independent Action Recognition from Temporal Self-Similarities. IEEE T-PAMI, 2011.
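The self-similarity matrix at the heart of this descriptor can be sketched as pairwise distances between per-frame descriptors (here plain Euclidean distance on arbitrary feature vectors; the paper builds local descriptors on top of this matrix):

```python
import math

def self_similarity_matrix(frame_descriptors):
    """Frame-to-frame distance matrix of a video.

    The pattern of this matrix is largely stable under viewpoint
    change, which is what makes the descriptor view-independent.
    """
    n = len(frame_descriptors)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(frame_descriptors[i],
                                              frame_descriptors[j])))
            ssm[i][j] = ssm[j][i] = d
    return ssm
```

Periodic actions show up as diagonal stripes in this matrix regardless of the camera angle.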

View-invariant: Lewandowski et al., ECCV 2010. View-independent manifold representation: a style-invariant embedded manifold describes an action for each view, and all view-dependent manifolds are automatically combined into a unified manifold.
Michal Lewandowski, Dimitrios Makris and Jean-Christophe Nebel. View and Style-Independent Action Manifolds for Human Activity Recognition. ECCV, 2010.

View-invariant: Wu and Jia, ECCV 2012. Propose a latent kernelized structural SVM in which the view index is treated as a latent variable and inferred during both training and testing.
Xinxiao Wu and Yunde Jia. View-Invariant Action Recognition Using Latent Kernelized Structural SVM. ECCV, 2012.

Cross-view: Liu et al., CVPR 2011. Learn bilingual words from both the source view and the target view, and transfer action models between the two views via a bag-of-bilingual-words model.
Jingen Liu, Mubarak Shah, Benjamin Kuipers and Silvio Savarese. Cross-View Action Recognition via View Knowledge Transfer. CVPR, 2011.

Cross-view: Li and Zickler, CVPR 2012. Propose "virtual views" to connect action descriptors from the source and target views. Each virtual view is associated with a linear transformation of the action descriptor, and the sequence of transformations arising from the sequence of virtual views bridges the source and target views.
Ruonan Li and Todd Zickler. Discriminative Virtual Views for Cross-View Action Recognition. CVPR, 2012.

Cross-view: Wu et al., PCM 2012. Transfer Discriminant-Analysis of Canonical Correlations (Transfer DCC): minimize the mismatch between the data distributions of the source and target views.
Xinxiao Wu, Cuiwei Liu and Yunde Jia. Transfer Discriminant-Analysis of Canonical Correlations for View-Transfer Action Recognition. PCM, 2012.

Action Detection

Yuan et al., IEEE T-PAMI 2012: a discriminative pattern-matching criterion for action classification, naïve-Bayes mutual information maximization (NBMIM), together with an efficient spatio-temporal branch-and-bound (STBB) search algorithm.
Junsong Yuan, Zicheng Liu and Ying Wu. Discriminative Video Pattern Search for Efficient Action Detection. IEEE T-PAMI, 2012.
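The detection objective can be sketched in a 1-D temporal analogue: each frame contributes a summed log-likelihood-ratio vote, and detection finds the window maximizing the total. Kadane's maximum-subarray algorithm stands in here for the paper's spatio-temporal branch-and-bound search, which solves the 3-D version of this problem:

```python
def best_temporal_window(votes):
    """Return the (start, end) frame window maximizing the summed votes.

    votes[t]: summed log-likelihood-ratio score of the features in
    frame t (positive = evidence for the action class).
    """
    best_sum = float('-inf')
    best = (0, 0)
    cur, start = 0.0, 0
    for t, v in enumerate(votes):
        if cur <= 0:
            # a non-positive prefix can never help; restart the window
            cur, start = v, t
        else:
            cur += v
        if cur > best_sum:
            best_sum, best = cur, (start, t)
    return best, best_sum
```

Because negative votes are allowed, the optimal window can include weakly contradicting frames when the surrounding evidence outweighs them.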

Hu et al., ICCV 2009: the candidate regions of an action are treated as a bag of instances, and a novel multiple-instance learning framework, SMILE-SVM (Simulated annealing Multiple Instance LEarning Support Vector Machines), is proposed for learning a human action detector.
Yuxiao Hu, Liangliang Cao, Fengjun Lv, Shuicheng Yan, Yihong Gong and Thomas S. Huang. Action Detection in Complex Scenes with Spatial and Temporal Ambiguities. ICCV, 2009.

Complex Activity Recognition

Gaidon et al., CVPR 2011: the Actom Sequence Model represents an activity as a sequence of atomic-action-anchored visual features, and automatically detects the atomic actions in an input activity video.
A. Gaidon, Z. Harchaoui and C. Schmid. Actom Sequence Models for Efficient Action Detection. CVPR, 2011.

Hoai et al., CVPR 2011: jointly perform video segmentation and action recognition.
M. Hoai, Z. Lan and F. De la Torre. Joint Segmentation and Classification of Human Actions in Video. CVPR, 2011.

Tang et al., CVPR 2012: each activity is modeled by a set of latent state variables and duration variables; the states are cluster centers obtained by clustering all fixed-length video clips from the training data, and a max-margin discriminative model learns the temporal structure of complex events.
K. Tang, F.-F. Li and D. Koller. Learning Latent Temporal Structure for Complex Event Detection. CVPR, 2012.

Multimedia Event Detection

Izadinia and Shah, ECCV 2012: a latent discriminative model detects low-level events by modeling the co-occurrence relationships between different low-level events in a graph. Each video is divided into short clips, and each clip is manually annotated with one low-level event label; these annotations are used for training the low-level detectors.
H. Izadinia and M. Shah. Recognizing Complex Events Using Large Margin Joint Low-Level Event Model. ECCV, 2012.

Thanks for your attention! Q & A?