Bangpeng Yao and Li Fei-Fei, Computer Science Department, Stanford University, USA

Presentation transcript:


Outline: Introduction; Modeling mutual context of object and pose; Model learning; Model inference, object detection, and human pose estimation; Experiments; Conclusion

Introduction

Human pose estimation and object detection. Figure labels: right arm, left arm, torso, right leg, left leg, tennis racket.

Challenge: human pose estimation and object detection are both difficult tasks.

Mutual context: human pose estimation and object detection can facilitate the recognition of each other.

Mutual context vs. no mutual context

Modeling mutual context of object and pose

Model variables:
A: activity class, e.g., tennis serve, volleyball smash
O: object, e.g., tennis racket, volleyball
H: human pose
P: body parts
f: visual features
Each activity A can have more than one type of human pose H.

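The model is a weighted sum of potential functions over its edges, $\Psi = \sum_{e \in E} w_e \phi_e$, where
$e$: an edge of the model
$\phi_e$: the potential function of edge $e$
$w_e$: the weight of edge $e$
Potentials among A, O, and H: frequencies of co-occurrence between A, O, and H.
Potentials among the object and the body parts: spatial relationships, computed from (position, orientation, scale).

Potentials between the object or a body part and its visual feature f: model the dependence of the object and each body part on their corresponding image evidence.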

The model captures three kinds of context:
Co-occurrence context for the activity class, object, and human pose.
Multiple types of human pose for each activity.
Spatial context between the object and the body parts.
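A minimal sketch of how such a model score could be assembled, assuming the score is a weighted sum of edge potentials and that a co-occurrence potential is estimated as a relative frequency of label pairs. The function names, edge names, toy labels, and numeric values are hypothetical and chosen only for illustration.

```python
from collections import Counter

def cooccurrence_potential(pairs):
    """Estimate a co-occurrence potential as relative frequencies of label pairs."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

def model_score(potentials, weights):
    """Weighted sum of edge potentials; edges without a weight contribute 0."""
    return sum(weights.get(edge, 0.0) * value for edge, value in potentials.items())

# Hypothetical toy data: (activity, object) labels of four training images.
pairs = [
    ("tennis serve", "tennis racket"),
    ("tennis serve", "tennis racket"),
    ("volleyball smash", "volleyball"),
    ("tennis serve", "volleyball"),
]
phi_AO = cooccurrence_potential(pairs)

# Hypothetical potentials for one hypothesis: an activity-object edge and
# one object-body-part edge whose spatial potential is assumed to be 0.8.
potentials = {
    ("A", "O"): phi_AO[("tennis serve", "tennis racket")],
    ("O", "P1"): 0.8,
}
weights = {("A", "O"): 2.0, ("O", "P1"): 0.5}
score = model_score(potentials, weights)
```

Edges that are absent from `weights` drop out of the sum, which mirrors the idea that only connected edges of the model contribute to the score.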

Model learning

The learning step needs to achieve two goals: structure learning and parameter estimation.
Structure learning: discover the hidden human poses and the connectivity among the object, human pose, and body parts.
Parameter estimation: estimate the potential weights to maximize the discrimination between different activities.

Objective of structure learning: find the connectivity pattern between the object, the human pose, and the body parts.
Method: a hill-climbing approach with a tabu list.

The hill-climbing approach adds or removes edges one at a time until a maximum is reached. Figure label: human pose.
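The search can be sketched roughly as follows. This is a generic hill-climbing loop with a tabu list, not the authors' implementation: the scoring function, the fixed step budget used as a stopping rule, the tabu list length, and the toy target structure are all assumptions made for illustration.

```python
from itertools import combinations

def hill_climb_tabu(nodes, score, max_steps=100, tabu_size=5):
    """Greedy structure search: toggle one edge per step, skipping tabu edges."""
    candidates = [frozenset(pair) for pair in combinations(nodes, 2)]
    edges = set()
    best_edges, best_score = set(edges), score(edges)
    tabu = []  # recently toggled edges that may not be toggled again yet
    for _ in range(max_steps):
        moves = [e for e in candidates if e not in tabu]
        if not moves:
            break
        # Toggling an edge (add if absent, remove if present) is a
        # symmetric difference with a one-element set.
        move = max(moves, key=lambda e: score(edges ^ {e}))
        edges = edges ^ {move}
        tabu.append(move)
        if len(tabu) > tabu_size:
            tabu.pop(0)
        current = score(edges)
        if current > best_score:
            best_edges, best_score = set(edges), current
    return best_edges, best_score

# Hypothetical scoring function: reward agreement with a hidden target
# structure. In practice the score would measure how well the structure
# fits the training data.
target = {frozenset(p) for p in [("O", "H"), ("O", "P1"), ("H", "P2")]}

def toy_score(edges):
    return -len(edges ^ target)

best_edges, best_score = hill_climb_tabu(["O", "H", "P1", "P2"], toy_score)
```

The tabu list keeps the search from immediately undoing a recent move, so it can pass through worse structures while the best structure seen so far is remembered separately.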

Objective of parameter estimation: obtain a set of potential weights that maximizes the discrimination between different classes of activities.
Each training sample consists of the potential function values (set to 0 for disconnected edges), the human pose H, and the class label A.
A weight vector is learned for the r-th sub-class.

The objective also involves the L2 norm of the weights and a normalization constant.

Using only one human pose for each human-object interaction (HOI) class is not enough to characterize all the images in that class well.
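To make the role of per-sub-class weight vectors concrete, here is a small sketch that assumes each human pose acts as a sub-class of its activity and uses a generic multi-class hinge objective with an L2 penalty. That objective is a stand-in, not necessarily the formulation on the slides, and all names, weights, and samples are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def predict_activity(x, weights, subclass_to_class):
    """Return (activity, sub-class) for the highest-scoring sub-class of x."""
    best = max(weights, key=lambda r: dot(weights[r], x))
    return subclass_to_class[best], best

def hinge_objective(samples, weights, subclass_to_class, reg=1.0):
    """L2 penalty plus hinge loss against sub-classes of other activities."""
    penalty = 0.5 * reg * sum(dot(w, w) for w in weights.values())
    loss = 0.0
    for x, pose, label in samples:
        rivals = [dot(w, x) for r, w in weights.items()
                  if subclass_to_class[r] != label]
        loss += max(0.0, 1.0 - (dot(weights[pose], x) - max(rivals)))
    return penalty + loss

# Hypothetical setup: two pose sub-classes for one activity, one for another.
subclass_to_class = {
    "serve_pose_1": "tennis serve",
    "serve_pose_2": "tennis serve",
    "smash_pose_1": "volleyball smash",
}
weights = {
    "serve_pose_1": [1.0, 0.0],
    "serve_pose_2": [0.0, 1.0],
    "smash_pose_1": [-1.0, -1.0],
}
samples = [
    ([2.0, 0.0], "serve_pose_1", "tennis serve"),
    ([-1.0, -1.0], "smash_pose_1", "volleyball smash"),
]
prediction = predict_activity([0.0, 3.0], weights, subclass_to_class)
objective = hinge_objective(samples, weights, subclass_to_class)
```

Because a sample only competes against sub-classes of other activities, two poses of the same activity are never pushed apart, which is one way to let several poses represent a single class.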

Model inference, object detection, and human pose estimation

Given a new test image, our objective is to:
estimate the pose of the human;
detect the object that is interacting with the human.
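As a rough illustration of why solving both tasks jointly can help, the sketch below exhaustively scores every pairing of an object candidate with a pose candidate and adds a compatibility term. Exhaustive enumeration, the candidate names, and all scores are assumptions made for this example; the slides do not specify the inference procedure at this level of detail.

```python
from itertools import product

def joint_inference(object_candidates, pose_candidates, compatibility):
    """Pick the (object, pose) pair with the highest combined score."""
    def total(pair):
        (obj, obj_score), (pose, pose_score) = pair
        return obj_score + pose_score + compatibility(obj, pose)
    (obj, _), (pose, _) = max(product(object_candidates, pose_candidates), key=total)
    return obj, pose

# Hypothetical candidates with individual detector scores.
object_candidates = [("racket_near_hand", 0.4), ("racket_on_ground", 0.6)]
pose_candidates = [("serving", 0.7), ("standing", 0.5)]

def compatibility(obj, pose):
    # Hypothetical context term: a racket near the hand fits a serving pose.
    return 1.0 if (obj, pose) == ("racket_near_hand", "serving") else 0.0

joint = joint_inference(object_candidates, pose_candidates, compatibility)
independent = joint_inference(object_candidates, pose_candidates, lambda o, p: 0.0)
```

With the context term switched off, the highest-scoring object candidate is chosen on its own; with it switched on, the weaker but more compatible candidate is selected together with the serving pose.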

Experiments

Dataset: six activity classes.
Cricket - defensive shot (player and cricket bat)
Cricket - bowling (player and cricket ball)
Croquet - shot (player and croquet mallet)
Tennis - forehand (player and tennis racket)
Tennis - serve (player and tennis racket)
Volleyball - smash (player and volleyball)
30 images for training, 20 for testing.

Object detection, methods compared: sliding window detector; pedestrian as context; our method.

Pose estimation is still difficult. Using multiple poses is better than using only one pose.

Upper: our method. Lower left: object detection by a scanning window. Lower right: pose estimation by the state-of-the-art pictorial structure method.

Note: Gupta et al. use predominantly the background scene context.

Conclusion

Treat the object and the human pose as the context of each other in different HOI activity classes.
A structure learning method discovers the important connectivity patterns between objects and human poses.
Further improvements:
incorporate useful background scene context to facilitate the recognition of the foreground object and activity;
deal with more than one object.