Human Action Recognition by Learning Bases of Action Attributes and Parts.

Human Action Recognition by Learning Bases of Action Attributes and Parts Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy Lai Lin, Leonidas Guibas, and.
Presentation transcript:

Human Action Recognition by Learning Bases of Action Attributes and Parts

Outline
– Introduction
– Action Recognition with Attributes & Parts
– Learning
– Experiments and Results

Introduction
– Goal: use attributes and parts to recognize human actions in still images
– Many existing methods use the whole image to represent an action and treat action recognition as a general image classification problem
– Examples from the PASCAL challenge: spatial pyramid and random forest based methods
– These methods do not explore the semantically meaningful components of an action

Introduction
– Some methods rely on labor-intensive annotations of objects and human body parts at training time
– Inspired by recent work that uses objects and body parts for action recognition, we propose an attributes and parts based representation
– The action attributes are holistic image descriptions of human actions, associated with verbs in human language
– E.g. riding, sitting, repairing, lifting

Introduction

There is a large number of possible interactions among these attributes and parts in terms of co-occurrence statistics.
Our challenge is to:
– represent each image using a sparse set of action bases
– learn these bases effectively given far-from-perfect detections of action attributes and parts, without the meticulous human labeling used in previous work

Introduction
– Our method has theoretical foundations in sparse coding and compressed sensing
– Datasets: PASCAL action dataset, Stanford 40 Actions dataset

Attributes and Parts in Human Actions
Attributes:
– Are related to verbs in human language
– E.g. riding a bike can be described by “riding” and “sitting”
– One attribute can correspond to more than one action
Parts are composed of:
– Objects
– Human poses

Attributes and Parts in Human Actions
The parts of an action image consist of:
– the objects that are closely related to the action
– the descriptive local human poses
A vector of the normalized confidence scores obtained from the attribute classifiers and part detectors is used to represent the image.
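A minimal Python sketch of how such a vector might be assembled. The min-max scaling, the per-group normalization, the function names, and the sample scores are all assumptions for illustration; the slides do not say how the scores are normalized.

```python
def normalize_scores(raw_scores):
    """Min-max scale raw classifier/detector outputs to [0, 1] (assumed scheme)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        # All scores identical: no ranking information, map everything to 0.
        return [0.0 for _ in raw_scores]
    return [(s - lo) / (hi - lo) for s in raw_scores]


def build_representation(attribute_scores, object_scores, poselet_scores):
    """Concatenate normalized scores into one vector a.

    Order (hypothetical): attributes, then objects, then poselets.
    """
    return (normalize_scores(attribute_scores)
            + normalize_scores(object_scores)
            + normalize_scores(poselet_scores))


# Made-up scores: 3 attribute classifiers, 2 object detectors, 3 poselet detectors.
a = build_representation([1.2, -0.4, 0.3], [2.0, 0.0], [0.5, 0.1, -0.5])
print(len(a))  # length of the representation: 3 + 2 + 3
```

Each entry of `a` then plays the role of one attribute or part confidence in the slides that follow.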

Action Bases of Attributes and Parts
Our method learns high-order interactions of image attributes and parts, which:
– carry richer information about human actions
– improve recognition performance
Example bases:
– riding, sitting, bike
– using, keyboard, monitor, sitting

Action Bases of Attributes and Parts
Formalize the action bases in a mathematical framework:
– P: the number of attributes and parts
– Action bases: Φ
– Coefficients: w
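The formulas on this slide are not preserved in the transcript. One way to write the decomposition, consistent with the symbols used on the neighboring slides (Φ for the bases, w for the coefficients) and assuming M bases and a representation vector a of length P:

```latex
a \approx \sum_{m=1}^{M} w_m \Phi_m = \Phi w,
\qquad a \in \mathbb{R}^{P},\;
\Phi = [\Phi_1, \dots, \Phi_M] \in \mathbb{R}^{P \times M},\;
w \in \mathbb{R}^{M}
```

Under this reading, each column Φ_m is one action basis (a sparse group of co-occurring attributes and parts), and w selects the few bases that explain a given image.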

Action Classification Using the Action Bases
– The attributes and parts representation a is reconstructed from the sparse factorization coefficients w
– Use the coefficient vector w to represent an image
– Train an SVM classifier on it for action classification

Learning the Dual-Sparse Action Bases and Reconstruction Coefficients
– a_i is the vector of confidence scores of the i-th image
– There exists a latent dictionary of bases, reflecting the frequent co-occurrence of attributes and parts, e.g. “cycling” and “bike”
– Goal: identify a set of sparse bases Φ = [Φ_1, …, Φ_M]
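The objective itself is not preserved in the transcript. A standard sparse dictionary learning objective that would fit this description is sketched below; N (the number of training images), the constraint set C on the bases, and the exact weighting are assumptions rather than the slide's own formula.

```latex
\min_{\Phi \in \mathcal{C},\; w_1, \dots, w_N}
\;\sum_{i=1}^{N} \left( \frac{1}{2} \lVert a_i - \Phi w_i \rVert_2^2
+ \lambda \lVert w_i \rVert_1 \right)
```

In this form the ℓ1 penalty weighted by λ keeps each w_i sparse, while C would carry the sparsity constraint on the bases themselves, which is what makes the factorization "dual-sparse".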

Learning the Dual-Sparse Action Bases and Reconstruction Coefficients
– Learn the bases Φ and find the reconstruction coefficients w_i for each a_i
– (2) is non-convex; (3) is convex
– Eqn. 2 is convex with respect to each of the two variables Φ and W when the other one is fixed
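With Φ fixed, finding each w_i is a convex problem. Assuming it takes the usual lasso form, 0.5 * ||a - Φw||^2 + λ||w||_1, it can be sketched with iterative soft-thresholding (ISTA). The solver choice, function names, step size, and toy bases below are illustrative assumptions, not the method used in the presentation; the companion update of Φ with W fixed is omitted.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def sparse_code(a, bases, lam=0.1, step=0.5, n_iter=200):
    """ISTA for min_w 0.5 * ||a - sum_m w[m] * bases[m]||^2 + lam * ||w||_1.

    `bases` is a list of M basis vectors of length P (the columns of Phi).
    `step` must not exceed 1 / (largest eigenvalue of Phi^T Phi).
    """
    M, P = len(bases), len(a)
    w = [0.0] * M
    for _ in range(n_iter):
        # Current reconstruction Phi w and residual r = Phi w - a.
        recon = [sum(w[m] * bases[m][p] for m in range(M)) for p in range(P)]
        r = [recon[p] - a[p] for p in range(P)]
        # Gradient of the smooth term: Phi^T r.
        grad = [sum(bases[m][p] * r[p] for p in range(P)) for m in range(M)]
        # Gradient step followed by soft-thresholding.
        w = [soft_threshold(w[m] - step * grad[m], step * lam) for m in range(M)]
    return w


# Toy setup (hypothetical): entries are [riding, bike, keyboard, monitor].
bases = [[0.6, 0.8, 0.0, 0.0],   # a "riding + bike" basis
         [0.0, 0.0, 0.6, 0.8]]   # a "keyboard + monitor" basis
a = [0.6, 0.8, 0.0, 0.05]        # strong riding/bike scores, weak monitor noise
w = sparse_code(a, bases, lam=0.1)
print(w)  # close to [0.9, 0.0]: only the first basis is active
```

The weak monitor response is thresholded away, so the image is described by a single active basis, which is the kind of sparse coefficient vector the slides feed to the classifier.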

Learning the Dual-Sparse Action Bases and Reconstruction Coefficients
– This is called the elastic-net constraint set [29]
– λ = 0.1, γ = 0.15
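The constraint formula is not preserved in the transcript. An elastic-net constraint set on the columns of Φ is commonly written as below; that γ weights the squared ℓ2 term here, and that λ is the ℓ1 weight on the coefficients, are assumptions about how the two reported values are used.

```latex
\mathcal{C} = \left\{ \Phi \in \mathbb{R}^{P \times M} :
\lVert \Phi_j \rVert_1 + \frac{\gamma}{2} \lVert \Phi_j \rVert_2^2 \le 1,
\;\; j = 1, \dots, M \right\}
```

The ℓ1 term encourages each basis Φ_j to involve only a few attributes and parts, and the ℓ2 term bounds its overall magnitude.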

Stanford 40 Actions dataset
– Images collected from Google, Bing, and Flickr
– 180 to 300 images for each class

Experiments and Results

Results on PASCAL and Stanford 40, using action attributes (A), objects (O), and poselets (P)

Experiments and Results

Discussion
– Use attributes and parts for action recognition
– The attributes are verbs
– The parts are composed of objects and poselets
– The attributes and parts representation is reconstructed by a set of sparse coefficients
– Our method achieves state-of-the-art performance on two datasets

Future work
– Use the learned action bases for image tagging
– Explore more detailed semantic understanding of human actions in images