Robot Anticipation of Human Intentions through Continuous Gesture Recognition
Giovanni Saponaro, Giampiero Salvi, Alexandre Bernardino
San Diego, May 22, 2013
Motivation
Recognize human task actions in real time, enabling robots to provide appropriate collaborative support. The approach is based on computer vision, statistical models and cognitive robotics.
Example human-robot collaboration scenario:
- assume a Push-Tap-Grasp sequence as the correct strategy (intention)
- the human user has to move an object while avoiding collisions
- the robot recognizes the human gesture and provides appropriate support
Outline
- Overview of affordances
- Action (gesture) recognition
- Conclusions
Affordances
Affordances
All action possibilities latent in the environment, in relation to the actor's capabilities (Gibson, 1977).
[Diagram: environment items linked to actions 1 through n]
Object Affordances
Effects are related to the properties of objects.
[Figure: graphical model of the action-object-effect relations]
Learning Object Affordances
- Actions: grasp, tap and touch
- Object features: shape, size, color
- Effects: object velocity, object-hand distance, hand velocity, contact
Learned object affordances
Learn the structure first, then learn the parameters.
Node legend: A = Action, C = Color, Sh = Shape, S = Size, Di = Distance, Ct = Contact, V = Velocity
[Figure: learned Bayesian network with nodes grouped into Actions, Object features and Effects]
Using Affordances
The generative model allows inferring any set of variables, given any others:
- Object, Action -> Effect (prediction: self or others)
- Object, Effect -> Action (recognition, planning)
- Effect, Action -> Object (selection, recognition)
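To make these inference directions concrete, here is a minimal sketch of a discrete affordance network in Python with pgmpy. The hand-written structure, the toy data and the state names are illustrative assumptions (the original work learns the structure from data; color is omitted for brevity):

```python
# Illustrative sketch, not the authors' implementation: a discrete Bayesian
# network over a subset of the affordance variables, with parameters fitted
# from toy data and queried in different directions.
import pandas as pd
from pgmpy.models import BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Hypothetical edges: the action and the object properties cause the effects.
model = BayesianNetwork([
    ("A", "V"), ("A", "Di"), ("A", "Ct"),   # Action -> effects
    ("Sh", "V"), ("S", "Ct"),               # object Shape / Size -> effects
])

# Toy training data: one row per interaction trial, with discretized features.
data = pd.DataFrame({
    "A":  ["grasp", "tap", "touch", "tap", "grasp", "touch"],
    "Sh": ["box", "ball", "box", "ball", "ball", "box"],
    "S":  ["small", "big", "small", "small", "big", "big"],
    "V":  ["low", "high", "low", "high", "low", "low"],
    "Di": ["small", "big", "small", "big", "small", "small"],
    "Ct": ["yes", "no", "yes", "no", "yes", "yes"],
})
model.fit(data, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
# Recognition / planning: which action explains the observed object and effect?
print(infer.query(["A"], evidence={"Sh": "ball", "V": "high", "Ct": "no"}))
# Prediction: what velocity effect follows from a tap on a small box?
print(infer.query(["V"], evidence={"A": "tap", "Sh": "box", "S": "small"}))
```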
Imitation Games
Objective: select the action and object that obtain the same effect as a demonstration.
Demonstration: grasp on a small box.
Which action gives the same effect?
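Continuing the sketch above, the imitation game can be read as effect matching: score each candidate action on the available object and pick the one most likely to reproduce the demonstrated effect. The demonstrated effect states and the new object below are hypothetical:

```python
# Imitation by effect matching, reusing `infer` from the previous sketch.
demo_effect = {"V": "low", "Ct": "yes"}     # hypothetical effect of grasp on a small box
new_object = {"Sh": "ball", "S": "big"}     # hypothetical object available to the robot

best = max(
    ["grasp", "tap", "touch"],
    key=lambda a: infer.query(
        list(demo_effect), evidence={"A": a, **new_object}
    ).get_value(**demo_effect),
)
print("Action chosen to reproduce the demonstrated effect:", best)
```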
Action Recognition
Action Recognition
Goal: a trainable statistical model able to recognize individual gestures in a continuous sequence.
Motivation:
- extend the affordance model with body gestures
- equip robots with action recognition capabilities
- use body gestures as a cue to recognize actions and intention
- anticipate effects while the partner's action is still taking place
- robot learning by demonstration, instructing the robot with gestures
Assumptions: sensor, action repertoire and model.
Action Recognition
Issues:
- finding gesture boundaries (temporal segmentation)
- accurate detection of joints
- trade-off between precision and invasiveness of sensors
- sub-gesture (substring) problem
- other common assumptions: small, known vocabulary; spatial restrictions; availability of the whole data (offline processing)
Examples:
- library of human actions for a human-robot interactive scenario: push, tap, grasp
- the human user can speak and move at the same time; the two modalities are linked
- previously: robot actions described by an external narrator
- future: human actions described by the human herself
Action Recognition
Feature space: 3D position coordinates of the hand joint over time.
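As a minimal sketch of this feature space, a gesture is a T x 3 trajectory of the hand-joint position; appending frame-to-frame velocities is an assumption here, not necessarily what the original system does:

```python
# Sketch of the observation sequence fed to the recognizer.
import numpy as np

def gesture_features(hand_xyz: np.ndarray, fps: float = 30.0) -> np.ndarray:
    """hand_xyz: array of shape (T, 3) with the 3D hand position per frame."""
    velocity = np.gradient(hand_xyz, 1.0 / fps, axis=0)   # (T, 3) finite-difference velocities
    return np.hstack([hand_xyz, velocity])                # (T, 6) observations per frame

# Example: a synthetic 2-second trajectory of a hand moving along x.
t = np.linspace(0.0, 2.0, 60)
traj = np.stack([0.3 * t, np.zeros_like(t), 0.1 * np.ones_like(t)], axis=1)
obs = gesture_features(traj)
print(obs.shape)   # (60, 6)
```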
Action Recognition
Vocabulary of simple manipulation gestures: push, tap, grasp.
Action Recognition
Statistical models: Hidden Markov Models (HMMs).
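A hedged sketch of this modeling choice: one Gaussian HMM per gesture class, trained on the feature sequences above with hmmlearn. The number of states and the covariance type are assumptions for illustration, not the values used in the paper:

```python
# One HMM per gesture, trained with Baum-Welch (EM).
import numpy as np
from hmmlearn import hmm

def train_gesture_hmm(sequences, n_states=5):
    """sequences: list of (T_i, D) feature arrays for one gesture class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50)
    model.fit(X, lengths)
    return model

# models = {g: train_gesture_hmm(train_data[g]) for g in ("push", "tap", "grasp")}
# An isolated segment can then be classified by the highest log-likelihood:
# label = max(models, key=lambda g: models[g].score(segment))
```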
Action Recognition
Inference algorithms: given the trained HMMs, decode which gesture is being performed from the incoming feature stream.
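One standard inference algorithm for this decoding step is Viterbi: given per-frame emission log-likelihoods for every hidden state, recover the most likely state path. Chaining the per-gesture HMMs into one composite state space, so that the decoded path also segments the continuous stream into gestures, is an assumption about how this could be done, not the paper's exact method:

```python
# Generic Viterbi decoding over an HMM state space.
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """log_pi: (N,) initial log-probs, log_A: (N, N) transition log-probs,
    log_B: (T, N) per-frame emission log-likelihoods."""
    T, N = log_B.shape
    delta = np.full((T, N), -np.inf)      # best path score ending in each state
    psi = np.zeros((T, N), dtype=int)     # backpointers
    delta[0] = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (N, N): from-state x to-state
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):               # backtrack the best path
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```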
Action Recognition Example 1: correct local action recognition (lexicon), correct global strategy (syntax)
Action Recognition Example 2: correct local action recognition (lexicon), incorrect global strategy (syntax)
Action Recognition Example 3: incorrect recognition
Action Recognition
Conclusions:
- capture the predictive power of actions
- build a model that understands gestures in order to improve interactions with robots, and extend the affordance model
Future work:
- relax the assumptions on the sensor
- make the statistical models more robust (features, training)
- estimate attributes of actions and objects