Bangpeng Yao, Li Fei-Fei
Computer Science Department, Stanford University, USA
Outline
- Introduction
- Modeling mutual context of object and pose
- Model learning
- Model inference, object detection, and human pose estimation
- Experiments
- Conclusion
Human pose estimation & object detection
[Figure: example image labeled with body parts (right arm, left arm, torso, right leg, left leg) and the object (tennis racket).]
Both tasks are challenging.
Mutual context: human pose estimation and object detection can facilitate the recognition of each other.
Mutual context vs. no mutual context
- A: activity class, e.g., tennis serve, volleyball smash
- O: object, e.g., tennis racket, volleyball
- H: human pose
- P: body parts
- f: visual features
Each A has more than one type of H.
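Each edge of the model has a potential function and a weight.
- Potentials among A, O, and H: frequencies of co-occurrence between A, O, and H.
- Potentials between the object and body parts: spatial relationship among the object and body parts, computed from (position, orientation, scale).
- Potentials between the object or a body part and the visual features f: model the dependence of the object and a body part on their corresponding image evidence.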
- Co-occurrence context for the activity class, object, and human pose
- Multiple types of human pose for each activity
- Spatial context between object and body parts
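The co-occurrence context among A, O, and H can be illustrated with a small counting routine. This is only a minimal sketch, not the authors' implementation: the function name `cooccurrence_frequencies`, the dict keys, and the label strings are assumptions made for illustration, and frequencies are taken as simple relative counts over labeled training samples.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_frequencies(samples):
    """Estimate pairwise co-occurrence frequencies among A, O, and H.

    Each sample is a dict with the (hypothetical) keys "A" (activity class),
    "O" (object), and "H" (human pose type).
    Returns {(var1, var2): {(value1, value2): relative frequency}}.
    """
    pairs = list(combinations(("A", "O", "H"), 2))  # (A,O), (A,H), (O,H)
    counts = {pair: Counter() for pair in pairs}
    for sample in samples:
        for first, second in pairs:
            counts[(first, second)][(sample[first], sample[second])] += 1

    total = len(samples)
    if total == 0:
        # No data: return empty tables instead of dividing by zero.
        return {pair: {} for pair in pairs}
    return {
        pair: {values: n / total for values, n in counter.items()}
        for pair, counter in counts.items()
    }
```

For example, if half of the training samples pair a tennis serve with a tennis racket, the (A, O) table holds 0.5 for that pair.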
The learning step needs to achieve two goals: structure learning and parameter estimation.
- Structure learning: discover the hidden human poses and the connectivity among the object, human pose, and body parts.
- Parameter estimation: estimate the potential weights to maximize the discrimination between different activities.
Objective: connectivity pattern between the object, the human pose, and the body parts.
Method: hill-climbing approach with a tabu list.
The hill-climbing approach adds or removes edges one at a time until a maximum is reached.
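The edge search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: `score` is a hypothetical callable that rates a set of edges (higher is better), and `tabu_size` and `max_steps` are illustrative parameters. Here the tabu list only keeps the search from immediately re-toggling recently changed edges.

```python
from itertools import combinations


def hill_climb_structure(nodes, score, initial_edges=(), tabu_size=5, max_steps=100):
    """Greedy add/remove-one-edge search with a short tabu list.

    `score` (hypothetical) maps a frozenset of edges to a number.
    Each edge is a frozenset holding two node names.
    """
    all_edges = [frozenset(pair) for pair in combinations(nodes, 2)]
    current = frozenset(frozenset(edge) for edge in initial_edges)
    current_score = score(current)
    tabu = []  # most recently toggled edges

    for _ in range(max_steps):
        best_move, best_score = None, current_score
        for edge in all_edges:
            if edge in tabu:
                continue
            candidate = current ^ {edge}  # add the edge if absent, remove it if present
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best_move, best_score = edge, candidate_score

        if best_move is None:
            if tabu:
                # Re-check once with the tabu list cleared before stopping,
                # so the result is a true local maximum.
                tabu.clear()
                continue
            break

        current = current ^ {best_move}
        current_score = best_score
        tabu = (tabu + [best_move])[-tabu_size:]

    return current, current_score
```

The search stops at a local maximum of `score`; a different starting edge set may end at a different structure.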
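Objective: obtain a set of potential weights that maximizes the discrimination between different classes of activities.
Each training sample consists of:
- the potential function values, with the values for disconnected edges set to 0
- the human pose H (the sub-class label)
- the class label A
Samples with the same human pose have the same class label.
One weight vector is learned for each sub-class (the r-th weight vector belongs to the r-th sub-class).
In the learning objective, the norm is the L2 norm and the constant is a normalization constant.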
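To make the role of the per-sub-class weight vectors concrete, here is a minimal sketch of how such weights could be used once learned. The slide does not show the decision rule, so the linear scoring and arg-max below are assumptions, as are the names `mask_disconnected`, `predict_activity`, and the data layout.

```python
def mask_disconnected(values, connected):
    """Set potential values of disconnected edges to 0."""
    return [v if keep else 0.0 for v, keep in zip(values, connected)]


def l2_norm_squared(w):
    """Squared L2 norm of a weight vector."""
    return sum(wi * wi for wi in w)


def predict_activity(x, weights, subclass_to_class):
    """Pick the best-scoring sub-class (pose type) and its activity class.

    x: potential function values (0 for disconnected edges).
    weights: {sub-class: weight vector}, one vector per sub-class.
    subclass_to_class: {sub-class: activity class label}.
    Assumption: each sub-class is scored by a dot product with x.
    """
    def dot(w, v):
        return sum(wi * vi for wi, vi in zip(w, v))

    best = max(weights, key=lambda r: dot(weights[r], x))
    return subclass_to_class[best], best
```

Because every sub-class maps to exactly one activity class, choosing the best sub-class also fixes the predicted activity.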
Using only one human pose for each HOI class is not enough to characterize all the images in that class well.
Given a new test image, our objective is to:
- estimate the pose of the human
- detect the object that is interacting with the human
Dataset: six HOI activity classes
- Cricket - defensive shot (player and cricket bat)
- Cricket - bowling (player and cricket ball)
- Croquet - shot (player and croquet mallet)
- Tennis - forehand (player and tennis racket)
- Tennis - serve (player and tennis racket)
- Volleyball - smash (player and volleyball)
30 images for training and 20 for testing per class.
[Figure: object detection results comparing a sliding window detector, pedestrian as context, and our method.]
Pose estimation is still difficult.
Using multiple poses per activity is better than using only one pose.
Upper: our method. Lower left: object detection by a scanning window. Lower right: pose estimation by the state-of-the-art pictorial structure method.
Note: Gupta et al. predominantly use the background scene context.
Treat the object and the human pose as the context of each other in different HOI activity classes.
Structure learning method: discovers important connectivity patterns between objects and human poses.
Further improvements:
- incorporate useful background scene context to facilitate the recognition of the foreground object and activity
- deal with more than one object