Identifying Human-Object Interaction in Range and Video Data
Ben Packer, Varun Ganapathi, Suchi Saria, and Daphne Koller

Aim: Understand and classify human actions while simultaneously tracking objects of interaction
Tasks: What action is being performed? Where is the manipulated object?

Why is this easy?
- Depth sensor allows us to easily detect foreground/background
- Existing pose tracker accurately finds the human
- Extremely efficient, runs in real time, so a large amount of data can easily be collected

Why is this hard?
- Even with background subtraction and pose estimation, the object may still be in many places
- Generic object tracking can help locate the object, but often fails
- Action recognition involving human-object interaction is largely unsolved

First Attempt: Action Classification
- C, F: candidate object positions and appearance (observed)
- J: human joint positions (observed)
- A: action of the entire sequence; S: state
- O: object position; P: active primitive

Full Model of Action and Interaction
- Knowing the action will help track the object
- Use "spatio-temporal interaction primitives," e.g. "moving away from foot," "in hand"
- Model each action as an HMM over primitives (a sketch of HMM-based action scoring appears below)
- Allows for simple learning and inference

First Stage
- Capture initial depth with no foreground
- Capture video/depth of the action involving the object; the pose tracker runs simultaneously in real time
- Every pixel is either background (same depth as the initial image), pose, or a possible object (see the pixel-labeling sketch below)
- Train a visual object detector from the most "confident" candidate objects
- Use the (smoothed) detector on the full sequence (see the smoothing sketch below)

Results of Full Model
Kinect video data: depth image, video, and tracked pose. Actions shown: Pick Up, Put Down, Drop, Kick, Toss; the base model is compared against the full model. Colored blobs indicate candidate objects, ranging from red (least likely) to yellow (most likely).
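The per-pixel labeling step of the first stage ("every pixel is either background, pose, or possible object") can be illustrated with a minimal NumPy sketch. The millimetre units, the depth_tol threshold, and the pose_mask input are assumptions made for illustration; the poster does not give these details.

```python
import numpy as np

def label_pixels(depth_frame, background_depth, pose_mask, depth_tol=30.0):
    """Label every pixel as background, pose, or candidate object.

    depth_frame, background_depth: HxW depth images (assumed to be in millimetres).
    pose_mask: HxW boolean mask of pixels claimed by the pose tracker.
    depth_tol: pixels within this distance of the initial (empty-scene) depth
               are treated as background; the value is a guess, not from the poster.
    Returns an HxW array with 0 = background, 1 = pose, 2 = candidate object.
    """
    labels = np.zeros(depth_frame.shape, dtype=np.uint8)
    is_background = np.abs(depth_frame - background_depth) < depth_tol
    labels[~is_background] = 2   # foreground not explained by the empty scene
    labels[pose_mask] = 1        # pixels explained by the tracked human pose win over "object"
    return labels
```

The remaining label-2 blobs are the candidate objects from which the most confident examples would be taken to train the visual detector.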
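The "use the (smoothed) detector on the full sequence" step can be sketched as temporal smoothing of per-frame detector scores followed by choosing one candidate per frame. Assuming a fixed set of K tracked candidates per frame and a simple moving average are simplifications for illustration; the poster does not specify how smoothing is performed.

```python
import numpy as np

def smooth_and_pick(det_scores, window=5):
    """Temporally smooth detector scores and pick one object candidate per frame.

    det_scores: (T, K) array; det_scores[t, k] is the trained detector's score for
                candidate k in frame t (a hypothetical layout).
    window:     moving-average length along time (an assumed smoothing choice).
    Returns the index of the chosen candidate in each frame.
    """
    kernel = np.ones(window) / window
    smoothed = np.vstack([
        np.convolve(det_scores[:, k], kernel, mode="same")
        for k in range(det_scores.shape[1])
    ]).T
    return smoothed.argmax(axis=1)
```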
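The full model treats each action as an HMM over spatio-temporal interaction primitives (e.g. "in hand", "moving away from foot"). A minimal forward-algorithm sketch of scoring a sequence under each action's HMM is given below; the parameter layout (log_init, log_trans, per-frame log_obs) is a generic HMM assumption, not the authors' exact parameterization.

```python
import numpy as np

def log_forward(log_init, log_trans, log_obs):
    """Log-likelihood of an observation sequence under one action's primitive HMM.

    log_init:  (P,)   log prior over primitives for this action.
    log_trans: (P, P) log transition matrix between primitives.
    log_obs:   (T, P) per-frame log-likelihood of the observations (e.g. object
               position relative to the joints) under each primitive.
    """
    alpha = log_init + log_obs[0]
    for t in range(1, log_obs.shape[0]):
        # sum (in log space) over the previous primitive, then add this frame's evidence
        alpha = log_obs[t] + np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0)
    return np.logaddexp.reduce(alpha)

def classify_action(action_hmms, log_obs):
    """Pick the action whose primitive HMM best explains the sequence."""
    scores = {name: log_forward(h["log_init"], h["log_trans"], log_obs)
              for name, h in action_hmms.items()}
    return max(scores, key=scores.get)
```

In this sketch the same per-frame evidence can also be decoded (e.g. with Viterbi) to recover the active primitive and hence help localize the object, which is the coupling the full model exploits.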