1
Theory of Mind for a Humanoid Robot
Brian Scassellati
MIT Artificial Intelligence Lab
2
Learning Environments
– Unstructured learning: unreliable feedback, unconstrained environment, high penalty for failure
– Structured learning: continuous feedback, constrained environment, minimized risk, structuring of the task and solution
3
Grand Challenge: Social Learning
– Exploit the knowledge and assistance of people
– Recognize and respond to appropriate social cues
– Utilize natural social dynamics
4
What Would This Require?
– Machine vision: object recognition, face finding
– Artificial intelligence: behavior selection, planning
– Human-machine interfaces: social scripts, dynamics
– Real-time systems: embedded control, parallelism
– Motor control: response fidelity, flexible control, safety issues
– Machine learning: sequence learning, feedback cues
– Theory of mind: beliefs and desires, joint reference
5
Outline
– Existing models of theory of mind
– Embodied theory of mind
– Robot hardware
– Implementation
– Application to mimicry
6
Development of Theory of Mind
Normal children: eye contact < 3 months; simple gaze detection < 9 months; declarative pointing < 12 months; pretend play ~ 12 months; complex gaze detection < 18 months; false belief tasks < 48 months.
Non-human animals: vertebrates; monkeys - ?; great apes - yes.
Autistic children: subgroup A; limited; very limited.
7
Leslie’s Model
Three spheres of causation:
– Theory of Body (ToBY): mechanical agency of inanimate objects
– Theory of Mind Mechanism, system 1 (ToMM-1): actional agency; applies rules of goals and desires to animate objects
– Theory of Mind Mechanism, system 2 (ToMM-2): attitudinal agency; applies rules of belief and knowledge
8
Baron-Cohen’s Model
Requires two types of input stimuli:
– Eye-like stimuli, feeding the Eye Direction Detector (EDD), which builds dyadic representations (sees)
– Self-propelled (animate) stimuli, feeding the Intentionality Detector (ID), which builds dyadic representations (desires, goals)
EDD and ID feed the Shared Attention Mechanism (SAM), which in turn feeds the Theory of Mind Mechanism (ToMM).
Proposes that autism is an impairment of either SAM (subgroup A) or ToMM (subgroup B).
9
Implications for Robotics
Both models:
– Offer an encouraging task decomposition
– Provide an evaluation metric
– Are approachable with our current technologies
Neither model:
– Is grounded in real perception
– Accounts for behavior selection
10
Embodied Theory of Mind
[Architecture diagram, built up incrementally across slides 10-14: visual input passes through pre-attentive filters into a visual attention system; attended targets are linked over time by trajectory formation; ToBY judges the resulting object trajectories, passing animate stimuli to the intentionality detector (ID); a face finder supplies eye-like stimuli to the eye direction detector (EDD); ID and EDD feed the shared attention mechanism (SAM).]
15
Roadmap
[Full system diagram: visual input → pre-attentive filters → visual attention → trajectory formation → ToBY; face finder → EDD; ID and EDD → SAM → behavior system. Repeated before each section to mark the component under discussion.]
16
Three Robotic Platforms: Kismet, Cog, and Lazlo
17
Hardware – Cog’s Arms (Williamson, 1998; Adams, 2001)
– 6 DOF in each arm
– Series elastic actuators
– Force control via a spring law
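A minimal sketch of the two spring ideas on this slide; the gains, units, and function names are illustrative, not Cog's actual control code.

```python
def sea_force(k_spring, motor_angle, joint_angle):
    # A series elastic actuator places a spring between motor and load;
    # the spring's measured deflection gives the applied force directly.
    return k_spring * (motor_angle - joint_angle)

def spring_law_torque(theta, theta_d, dtheta, k=1.5, b=0.1):
    # Virtual spring law: command a torque proportional to position error,
    # with light damping, so the arm behaves compliantly around theta_d.
    return k * (theta_d - theta) - b * dtheta
```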
18
Hardware – Cog’s Head
– 7 degrees of freedom
– Human speed and range of motion
19
Visual and Inertial Sensors
Each eye pairs a wide peripheral view with a narrow foveal view; the head carries a 3-axis inertial sensor.
20
Computational System
– Designed for real-time response
– Network of 24 PCs ranging from 200 to 800 MHz
– QNX real-time operating system
– The implementation shown today comprises ~26 QNX processes and ~75 QNX threads
21
Roadmap
[Architecture diagram repeated; next: the visual attention system.]
22
The Problem of Saliency
How do you know what to attend to?
– Inherent properties: saturated color, movement, skin color
– Task constraints
– Joint reference
Context-based attention (Breazeal & Scassellati, 1999)
23
A Model of Visual Search and Attention (Wolfe, 1998)
[Diagram: visual input feeds feature detectors (color, skin, motion); each weighted feature map, together with high-level goals, combines into an activation map that drives the motor system.]
24
Motion Detection
– Image differencing produces a raw motion map
– Motion detection is inhibited for 300 msec following an eye movement
– Optic flow methods provide more local detail but are much more computationally expensive
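A sketch of the differencing scheme just described; the array types and the difference threshold are assumptions.

```python
import numpy as np

SACCADE_INHIBIT_SEC = 0.3  # motion map suppressed after an eye movement

def motion_map(prev_frame, frame, t_now, t_last_saccade, threshold=15):
    """Raw motion map by absolute image differencing (grayscale uint8
    arrays). Returns zeros while self-induced motion from a recent
    saccade could corrupt the map."""
    if t_now - t_last_saccade < SACCADE_INHIBIT_SEC:
        return np.zeros_like(frame)
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8) * 255
```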
25
High Color Saturation
Saliency is the maximum of the four opponent-color channels.
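A sketch of that rule: a pixelwise maximum over four rectified opponent-color channels. The channel definitions below are one common formulation, not necessarily the robot's exact ones.

```python
import numpy as np

def color_saliency(rgb):
    """Saturation saliency as the maximum of four rectified opponent-color
    channels (r-g, g-r, b-y, y-b). rgb is an (H, W, 3) uint8 image."""
    r, g, b = [rgb[..., i].astype(np.float32) for i in range(3)]
    y = (r + g) / 2.0  # a simple stand-in for the yellow channel
    channels = [r - g, g - r, b - y, y - b]
    return np.max([np.clip(c, 0.0, None) for c in channels], axis=0)
```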
26
Skin Color Saliency
Skin tones can be located (approximately) within a region of (R, G, B) space.
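The region itself was shown graphically on the slide; the inequalities below are a widely used skin-tone heuristic standing in for the actual bounds, not the ones the robot used.

```python
import numpy as np

def skin_mask(rgb):
    """Approximate skin-tone region in (R, G, B) space. These bounds are
    a common heuristic, not the exact region from the slide."""
    r, g, b = [rgb[..., i].astype(np.int16) for i in range(3)]
    return ((r > 95) & (g > 40) & (b > 20) &
            (r > g) & (r > b) &
            (np.abs(r - g) > 15)).astype(np.uint8)
```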
27
Habituation
Purpose:
– Initially enhance the target of attention (the foveated object)
– Gradually decrease its activation
– Eventually suppress it so that a new target is selected
An eye movement resets the habituation.
[Plot: the habituation contribution over time.]
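A toy habituation gain with the three phases listed above; all constants are illustrative, and the caller is assumed to reset the timer on each eye movement.

```python
def habituation_weight(t_on_target, enhance_until=1.0, decay_rate=0.5):
    """Habituation gain for the currently foveated target: initially
    enhancing, then decaying toward suppression (negative gain). An eye
    movement should reset t_on_target to zero."""
    if t_on_target < enhance_until:
        return 1.5                     # enhance the newly foveated target
    # decay linearly, bottoming out at a suppressive value
    return max(-1.0, 1.5 - decay_rate * (t_on_target - enhance_until))
```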
28
Implemented Model of Visual Search and Attention
[Diagram: visual input feeds color, skin, motion, and habituation feature maps; the motivation system sets the per-map weights; the weighted sum forms the activation map that drives the motor system.]
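A sketch of that weighted combination; the dictionary interface and gain values are assumptions chosen to match the "seek face" example on the next slide.

```python
import numpy as np

def activation_map(color, skin, motion, habituation, gains):
    """Weighted sum of same-shape feature maps into one activation map;
    the motivation system sets the gains."""
    return (gains["color"] * color + gains["skin"] * skin +
            gains["motion"] * motion + gains["habituation"] * habituation)

def attention_target(act):
    """The most active location becomes the next gaze target."""
    return np.unravel_index(np.argmax(act), act.shape)

# 'Seek face' biases salience toward skin; values are illustrative.
seek_face = {"color": 0.2, "skin": 1.0, "motion": 0.6, "habituation": 1.0}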
29
Internal Influences on Attention
– “Seek face” (high skin gain, low color-saturation gain): looking time 72% face, 28% block
– “Seek toy” (low skin gain, high saturated-color gain): looking time 72% block, 28% face
Internal influences bias how salience is measured; the robot is not a slave to its environment.
30
Context-Based Attention
– Identical computational system on both robots
– The attention system drives gaze direction
– Generates social cues
31
Roadmap
[Architecture diagram repeated; next: trajectory formation.]
32
Trajectory Formation
Each frame produces a set of target points. [Illustration: point sets at frames t, t+1, t+2, t+3.]
33
Motion Correspondence
Each frame produces a set of target points; the objective is to identify trajectories through a subset of the frames. [Illustration: candidate links across frames t through t+3.]
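For illustration, a greedy nearest-neighbour matcher for this correspondence problem. This is a deliberate simplification: the actual system uses the multiple hypothesis tracker on the next slide, which also handles initiation, termination, and occlusion.

```python
import numpy as np

def match_points(prev_pts, new_pts, max_dist=40.0):
    """Greedily pair each point from the previous frame with its nearest
    unused point in the new frame. Inputs are lists of (x, y) arrays;
    max_dist is an assumed gating threshold in pixels."""
    if not new_pts:
        return []
    pairs, used = [], set()
    for i, p in enumerate(prev_pts):
        dists = [np.hypot(*(p - q)) if j not in used else np.inf
                 for j, q in enumerate(new_pts)]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            pairs.append((i, j))
            used.add(j)
    return pairs  # unmatched new points can seed new trajectories
```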
34
Multiple Hypothesis Tracking (Reid, 1979; Cox & Hingorani, 1996)
Allows for:
– Trajectory initiation
– Trajectory termination
– Minor occlusion
Modified for continuous, real-time operation.
Matching is based on area, overall saliency, and saliency among the individual feature channels.
[Pipeline: feature extraction → matching → generate k-best hypotheses → management (pruning, merging) → generate predictions → (delay) → back to matching.]
35
Trajectory Example
– Real-time (30 Hz)
– Maximum of 5 target points in each frame
– Search range limited to 60 frames (2 seconds)
36
Roadmap
[Architecture diagram repeated; next: ToBY.]
37
Theory of Body (ToBY)
Must distinguish between animate and inanimate objects.
Criteria: self-propelled motion; the laws of naïve physics.
Informed by the motion studies of Michotte (1963) with adults and Cohen and Amsel (1986) with children.
[Movies: launching, spatial gap, temporal gap — courtesy of Brian Scholl, Yale.]
38
ToBY Architecture
[Diagram: incoming trajectories are checked against a minimum length; those too short are rejected, and the rest are passed to five experts (static object, straight line, energy, elastic collision, acceleration sign change), whose votes an arbiter combines into an animacy judgment.]
39
ToBY Agents
– Straight line expert: minimize the sum of deviations from the mean velocity
– Elastic collision expert: look for a transfer of velocities before and after a collision
– Energy expert: assumes constant mass; the inertial system provides the gravity vector
– Acceleration sign change expert: look for multiple sign changes in the acceleration
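A sketch of the straight-line expert's criterion, plus a simple arbiter in the two styles mentioned on the results slide (weighted sum, winner-take-all); the score scaling and thresholds are assumptions.

```python
import numpy as np

def straight_line_score(traj):
    """Straight-line expert: an object under no external force moves at
    constant velocity, so deviation from the mean velocity suggests
    self-propelled (animate) motion. traj is an (N, 2) position array."""
    v = np.diff(traj, axis=0)                     # per-frame velocities
    deviation = np.linalg.norm(v - v.mean(axis=0), axis=1).sum()
    return deviation / max(len(v), 1)             # higher -> more animate

def arbiter(scores, weights=None):
    """Combine expert votes: weighted sum if weights are given, else
    winner-take-all. Thresholding the result yields the animacy call."""
    if weights is None:
        return max(scores)
    return float(np.dot(scores, weights))
```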
40
Animacy Results
[Example trajectories with their classifications: animate, inanimate, animate.]
Arbitration methods: weighted sum; winner-take-all.
41
Human Baseline Responses
– All context removed
– 45 subjects on a web-based system
– An unnatural stimulus for the human subjects
– Exact stimuli processed by the robot
42
Comparing Human and Machine Results
– Hard task for subjects, but high inter-subject correlation
– Strong results on falling stimuli and straight-line motion
– ToBY matched human judgment on all stimuli except #13
– Mixed results on #10
[Panel of the 15 test stimuli.]
43
Roadmap
[Architecture diagram repeated; next: the face finder and EDD.]
44
Post-Attentive Visual Processing: Finding Faces and Eyes
Can you find the face in this image? Can you tell where I am looking?
45
Post-Attentive Visual Processing: Finding Faces and Eyes
[Pipeline: locate target in wide field → foveate target (~300 msec) → apply face filter → software zoom → feature extraction (~66 msec).]
46
Two Sensory-Motor Mappings
Saccade map: maps image positions to the motor commands necessary to center that location in the visual image. Learned using standard self-supervised techniques (lookup tables, neural nets, etc.).
Peripheral-foveal map: maps pixels in the peripheral image to pixels in the foveal image. Scale is learned using optic flow rates obtained while the cameras are moving; position is learned using correlation.
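A sketch of a self-supervised saccade map as a lookup table; the error-correction update below is one simple instance of the "standard techniques" the slide mentions, with an assumed learning rate and table size.

```python
import numpy as np

class SaccadeMap:
    """Lookup table from image position to the (pan, tilt) command that
    centers that position. After each saccade, the residual command that
    would still have been needed (found by re-locating the target)
    corrects the table entry."""
    def __init__(self, w=128, h=128, lr=0.5):
        self.table = np.zeros((h, w, 2))   # (pan, tilt) per pixel
        self.lr = lr

    def command(self, x, y):
        return self.table[y, x]

    def update(self, x, y, residual_cmd):
        # Self-supervised correction: nudge the entry by the leftover
        # motor command measured after the saccade.
        self.table[y, x] += self.lr * np.asarray(residual_cmd)
```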
47
Face Finding
[Pipeline: foveal image → skin filter → ratio template (Sinha, 1996) and oval detector (Banks, Arsenio, & Fitzpatrick) → detected faces.]
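An illustrative take on Sinha's ratio-template idea: compare average luminance between coarse face regions (eye regions darker than forehead and cheeks). The region boxes and scoring here are guesses for illustration, not Sinha's actual template.

```python
import numpy as np

def ratio_template_score(face_img):
    """Fraction of expected luminance-ratio relations satisfied by a
    grayscale candidate patch; values near 1.0 suggest a face."""
    h, w = face_img.shape
    def region(r0, r1, c0, c1):
        return face_img[int(r0*h):int(r1*h), int(c0*w):int(c1*w)].mean()
    forehead  = region(0.05, 0.25, 0.20, 0.80)
    left_eye  = region(0.30, 0.45, 0.15, 0.45)
    right_eye = region(0.30, 0.45, 0.55, 0.85)
    cheeks    = region(0.55, 0.75, 0.20, 0.80)
    ratios = [forehead > left_eye, forehead > right_eye,
              cheeks > left_eye, cheeks > right_eye]
    return sum(ratios) / len(ratios)
```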
48
Software Zoom
From the full 640x480 image, extract the relevant 128x128 sub-image. This step introduces the majority of the system delay.
49
Feature Finding
Locate the eyes and mouth by looking for the centroid of luminance minima.
– Mouth: an iterative algorithm with adaptive regions provides performance similar to simulated annealing
– Eyes: add a symmetry requirement
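A sketch of the luminance-minima centroid shared by both feature finders; the percentile cut is an assumption, and the iterative region adaptation and eye-symmetry check are omitted.

```python
import numpy as np

def dark_centroid(gray, region, percentile=10):
    """Centroid of the darkest pixels (bottom percentile of luminance)
    inside a search region (r0, r1, c0, c1) of a grayscale image. The
    mouth finder would iterate this with an adaptive region; the eye
    finder would add a left/right symmetry requirement."""
    r0, r1, c0, c1 = region
    patch = gray[r0:r1, c0:c1].astype(np.float32)
    thresh = np.percentile(patch, percentile)
    ys, xs = np.nonzero(patch <= thresh)
    return (r0 + ys.mean(), c0 + xs.mean())
```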
50
Head Pose Derivation
Obtain an estimate of head pose from the positions of the mouth, nostrils, and eyes.
Failure modes: matching to a nostril; vertical lengthening.
Accuracy of +/- 5 degrees at a distance of 6 meters.
51
Roadmap
[Architecture diagram repeated; next: the intentionality detector (ID).]
52
Basic Intentionality
Gigerenzer & Todd: basic representations of intent in a simulation game.
– Approach: non-increasing distance, matched relative heading
– Avoidance: non-decreasing distance, opposed relative heading
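A sketch of those two rules as stated; the cosine thresholds and distance tolerance are assumptions.

```python
import numpy as np

def classify_intent(agent_traj, target_traj, tol=1e-3):
    """Label a trajectory as approach or avoidance from the two cues on
    the slide: change of distance to the target, and relative heading.
    Inputs are (N, 2) position arrays sampled at the same instants."""
    d = np.linalg.norm(agent_traj - target_traj, axis=1)
    heading = np.diff(agent_traj, axis=0)             # agent velocity
    to_target = target_traj[:-1] - agent_traj[:-1]    # direction to target
    cos = np.sum(heading * to_target, axis=1) / (
        np.linalg.norm(heading, axis=1) *
        np.linalg.norm(to_target, axis=1) + 1e-9)
    if np.all(np.diff(d) <= tol) and cos.mean() > 0.5:
        return "approach"    # non-increasing distance, matched heading
    if np.all(np.diff(d) >= -tol) and cos.mean() < -0.5:
        return "avoidance"   # non-decreasing distance, opposed heading
    return "neither"
```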
53
Roadmap
[Architecture diagram repeated; next: shared attention (SAM) and the behavior system.]
54
An Application to Mimicry (Scassellati & Adams)
Mimicry of arm trajectories:
– No body model
– Based on animate motion trajectories
– Movement range based on the perceived scale of the face
55
Mapping Visual Trajectories to Arm Movements
– Postural primitives define a sub-space for positioning; positions within that sub-space can be represented as linear combinations of the basis vectors
– Based on findings of spinal force fields in the frog (Bizzi, Mussa-Ivaldi)
– The mapping is based on the perceived head position and the robot's own symmetry axis
[Diagram: visual coordinates → postural primitives → arm coordinates.]
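A sketch of the linear-combination idea; the primitive postures below are invented for illustration and are not Cog's actual primitives.

```python
import numpy as np

# Hypothetical postural primitives: joint-angle vectors (radians) for a
# 6-DOF arm, spanning the reachable sub-space used for mimicry.
PRIMITIVES = np.array([
    [0.0,  0.3, 0.0, 1.2, 0.0, 0.0],   # arm forward
    [0.9,  0.3, 0.0, 0.4, 0.0, 0.0],   # arm out to the side
    [0.0, -0.4, 0.0, 0.2, 0.0, 0.0],   # arm down
])

def arm_posture(weights):
    """An arm posture as a convex combination of the basis postures, so a
    2-D visual target maps to a point inside the primitives' sub-space."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    if w.sum() == 0.0:
        raise ValueError("at least one weight must be positive")
    w /= w.sum()
    return w @ PRIMITIVES   # 6 joint angles
```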
56
Basic Mimicry
– Autonomous operation
– Visually identified trajectories
– A first step toward social learning
57
Mimicry Based on Animacy
– Only animate trajectories are possible targets
– Match to the scale of a human face or to the perceived object extent
58
Mimicry Based on Joint Reference
– Target selection is based on head orientation and animacy constraints
– Responds to natural social cues
– Uses joint reference as a saliency metric
59
Reaching Based on Intent
Head orientation drives eye position; intent drives pointing.
Instructions to the subject:
– Get the robot's attention
– Look at the block
– Get the robot's attention again
– Reach for the block
60
Evaluating Social Behaviors (Audley, Scassellati & Turkle)
– Do naïve subjects produce and recognize the appropriate social cues?
– Can they successfully instruct the robot to perform simple actions among many distractors?
– Future: degrading performance to match autistic behavior
61
End of the Road?
[The complete architecture diagram: visual attention, trajectory formation, ToBY, face finder, EDD, ID, SAM, and the behavior system.]
62
Conclusions
Proposed an embodied, perceptually grounded model of theory of mind.
Implemented a system that:
– Determines saliency
– Judges animacy
– Engages in joint reference
– Attributes basic intent
Demonstrated an application to simple social mimicry as a first step toward social learning.
63
The Future
– Increases in computational power
– The drive for interactive technology
– Integration of many sub-disciplines
Theory of mind skills will be central to any technology that interacts with people.
64
Acknowledgements
Committee: Rodney Brooks, Leslie Pack Kaelbling, Eric Grimson
Cog team: Bryan Adams, Aaron Edsinger, Matt Marjanovic
Kismet team: Cynthia Breazeal, Paul Fitzpatrick, Lijin Aryananda, Paulina Varchavskaia
Lazlo team: Aaron Edsinger, Una-May O’Reilly