1
Theory of Mind for a Humanoid Robot
Brian Scassellati
MIT Artificial Intelligence Lab
2
Learning Environments
– Unstructured learning: unreliable feedback, unconstrained environment, high penalty for failure
– Structured learning: continuous feedback, constrained environment, minimized risk, structuring of the task and solution
3
Grand Challenge: Social Learning
– Exploit the knowledge and assistance of people
– Recognize and respond to appropriate social cues
– Utilize natural social dynamics
4
What Would This Require?
– Machine vision: object recognition, face finding
– Artificial intelligence: behavior selection, planning
– Human-machine interfaces: social scripts, dynamics
– Real-time systems: embedded control, parallelism
– Motor control: response fidelity, flexible control, safety issues
– Machine learning: sequence learning, feedback cues
– Theory of mind: beliefs and desires, joint reference
5
Outline
– Existing models of theory of mind
– Embodied theory of mind
– Robot hardware
– Implementation
– Application to mimicry
6
Development of Theory of Mind
Normal children: eye contact < 3 months; simple gaze detection < 9 months; declarative pointing < 12 months; pretend play ~ 12 months; complex gaze detection < 18 months; false belief tasks < 48 months.
Non-human animals: vertebrates; monkeys - ?; great apes - yes.
Autistic children: subgroup A; limited; very limited.
7
Leslie’s Model
Three spheres of causation:
– Theory of Body (ToBY): mechanical agency of inanimate objects
– Theory of Mind Mechanism, system 1 (ToMM-1): actional agency; applies rules of goals and desires to animate objects
– Theory of Mind Mechanism, system 2 (ToMM-2): attitudinal agency; applies rules of belief and knowledge
8
Baron-Cohen’s Model
Requires two types of input stimuli:
– Eye-like stimuli, feeding the Eye Direction Detector (EDD), which builds dyadic representations (sees)
– Self-propelled (animate) stimuli, feeding the Intentionality Detector (ID), which builds dyadic representations (desires, goals)
EDD and ID feed the Shared Attention Mechanism (SAM), which in turn feeds the Theory of Mind Mechanism (ToMM).
Proposes that autism is an impairment of either SAM (subgroup A) or ToMM (subgroup B).
9
Implications for Robotics
Both models:
– Offer an encouraging task decomposition
– Provide an evaluation metric
– Are approachable with our current technologies
Neither model:
– Is grounded in real perception
– Accounts for behavior selection
10
Embodied Theory of Mind
[Architecture diagram, built up incrementally across slides 10-14: visual input passes through pre-attentive filters into a visual attention system; attended targets are linked over time by trajectory formation; ToBY judges the resulting object trajectories, passing animate stimuli to the intentionality detector (ID); a face finder supplies eye-like stimuli to the eye direction detector (EDD); ID and EDD feed the shared attention mechanism (SAM).]
15
Roadmap
[Full system diagram: visual input → pre-attentive filters → visual attention → trajectory formation → ToBY; face finder → EDD; ID and EDD → SAM → behavior system. Repeated before each section to mark the component under discussion.]
16
Three Robotic Platforms: Kismet, Cog, and Lazlo
17
Hardware – Cog’s Arms (Williamson, 1998; Adams, 2001)
– 6 DOF in each arm
– Series elastic actuators
– Force control via a spring law
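A minimal sketch of the two spring ideas on this slide; the gains, units, and function names are illustrative, not Cog's actual control code.

```python
def sea_force(k_spring, motor_angle, joint_angle):
    # A series elastic actuator places a spring between motor and load;
    # the spring's measured deflection gives the applied force directly.
    return k_spring * (motor_angle - joint_angle)

def spring_law_torque(theta, theta_d, dtheta, k=1.5, b=0.1):
    # Virtual spring law: command a torque proportional to position error,
    # with light damping, so the arm behaves compliantly around theta_d.
    return k * (theta_d - theta) - b * dtheta
```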
18
Hardware – Cog’s Head
– 7 degrees of freedom
– Human speed and range of motion
19
Visual and Inertial Sensors
Each eye pairs a wide peripheral view with a narrow foveal view; the head carries a 3-axis inertial sensor.
20
Computational System
– Designed for real-time response
– Network of 24 PCs ranging from 200 to 800 MHz
– QNX real-time operating system
– The implementation shown today comprises ~26 QNX processes and ~75 QNX threads
21
Roadmap
[Architecture diagram repeated; next: the visual attention system.]
22
The Problem of Saliency
How do you know what to attend to?
– Inherent properties: saturated color, movement, skin color
– Task constraints
– Joint reference
Context-based attention (Breazeal & Scassellati, 1999)
23
A Model of Visual Search and Attention (Wolfe, 1998)
[Diagram: visual input feeds feature detectors (color, skin, motion); each weighted feature map, together with high-level goals, combines into an activation map that drives the motor system.]
24
Motion Detection
– Image differencing produces a raw motion map
– Motion detection is inhibited for 300 msec following an eye movement
– Optic flow methods provide more local detail but are much more computationally expensive
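A sketch of the differencing scheme just described; the array types and the difference threshold are assumptions.

```python
import numpy as np

SACCADE_INHIBIT_SEC = 0.3  # motion map suppressed after an eye movement

def motion_map(prev_frame, frame, t_now, t_last_saccade, threshold=15):
    """Raw motion map by absolute image differencing (grayscale uint8
    arrays). Returns zeros while self-induced motion from a recent
    saccade could corrupt the map."""
    if t_now - t_last_saccade < SACCADE_INHIBIT_SEC:
        return np.zeros_like(frame)
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8) * 255
```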
25
High Color Saturation
Saliency is the maximum of the four opponent-color channels.
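A sketch of that rule: a pixelwise maximum over four rectified opponent-color channels. The channel definitions below are one common formulation, not necessarily the robot's exact ones.

```python
import numpy as np

def color_saliency(rgb):
    """Saturation saliency as the maximum of four rectified opponent-color
    channels (r-g, g-r, b-y, y-b). rgb is an (H, W, 3) uint8 image."""
    r, g, b = [rgb[..., i].astype(np.float32) for i in range(3)]
    y = (r + g) / 2.0  # a simple stand-in for the yellow channel
    channels = [r - g, g - r, b - y, y - b]
    return np.max([np.clip(c, 0.0, None) for c in channels], axis=0)
```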
26
Skin Color Saliency
Skin tones can be located (approximately) within a region of (R, G, B) space.
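The region itself was shown graphically on the slide; the inequalities below are a widely used skin-tone heuristic standing in for the actual bounds, not the ones the robot used.

```python
import numpy as np

def skin_mask(rgb):
    """Approximate skin-tone region in (R, G, B) space. These bounds are
    a common heuristic, not the exact region from the slide."""
    r, g, b = [rgb[..., i].astype(np.int16) for i in range(3)]
    return ((r > 95) & (g > 40) & (b > 20) &
            (r > g) & (r > b) &
            (np.abs(r - g) > 15)).astype(np.uint8)
```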
27
Habituation
Purpose:
– Initially enhance the target of attention (the foveated object)
– Gradually decrease its activation
– Eventually suppress it so that a new target is selected
An eye movement resets the habituation.
[Plot: the habituation contribution over time.]
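A toy habituation gain with the three phases listed above; all constants are illustrative, and the caller is assumed to reset the timer on each eye movement.

```python
def habituation_weight(t_on_target, enhance_until=1.0, decay_rate=0.5):
    """Habituation gain for the currently foveated target: initially
    enhancing, then decaying toward suppression (negative gain). An eye
    movement should reset t_on_target to zero."""
    if t_on_target < enhance_until:
        return 1.5                     # enhance the newly foveated target
    # decay linearly, bottoming out at a suppressive value
    return max(-1.0, 1.5 - decay_rate * (t_on_target - enhance_until))
```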
28
Implemented Model of Visual Search and Attention
[Diagram: visual input feeds color, skin, motion, and habituation feature maps; the motivation system sets the per-map weights; the weighted sum forms the activation map that drives the motor system.]
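A sketch of that weighted combination; the dictionary interface and gain values are assumptions chosen to match the "seek face" example on the next slide.

```python
import numpy as np

def activation_map(color, skin, motion, habituation, gains):
    """Weighted sum of same-shape feature maps into one activation map;
    the motivation system sets the gains."""
    return (gains["color"] * color + gains["skin"] * skin +
            gains["motion"] * motion + gains["habituation"] * habituation)

def attention_target(act):
    """The most active location becomes the next gaze target."""
    return np.unravel_index(np.argmax(act), act.shape)

# 'Seek face' biases salience toward skin; values are illustrative.
seek_face = {"color": 0.2, "skin": 1.0, "motion": 0.6, "habituation": 1.0}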
29
Internal Influences on Attention
– “Seek face” (high skin gain, low color-saturation gain): looking time 72% face, 28% block
– “Seek toy” (low skin gain, high saturated-color gain): looking time 72% block, 28% face
Internal influences bias how salience is measured; the robot is not a slave to its environment.
30
Context-Based Attention
– Identical computational system on both robots
– The attention system drives gaze direction
– Generates social cues
31
Roadmap
[Architecture diagram repeated; next: trajectory formation.]
32
Trajectory Formation
Each frame produces a set of target points. [Illustration: point sets at frames t, t+1, t+2, t+3.]
33
Motion Correspondence
Each frame produces a set of target points; the objective is to identify trajectories through a subset of the frames. [Illustration: candidate links across frames t through t+3.]
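For illustration, a greedy nearest-neighbour matcher for this correspondence problem. This is a deliberate simplification: the actual system uses the multiple hypothesis tracker on the next slide, which also handles initiation, termination, and occlusion.

```python
import numpy as np

def match_points(prev_pts, new_pts, max_dist=40.0):
    """Greedily pair each point from the previous frame with its nearest
    unused point in the new frame. Inputs are lists of (x, y) arrays;
    max_dist is an assumed gating threshold in pixels."""
    if not new_pts:
        return []
    pairs, used = [], set()
    for i, p in enumerate(prev_pts):
        dists = [np.hypot(*(p - q)) if j not in used else np.inf
                 for j, q in enumerate(new_pts)]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            pairs.append((i, j))
            used.add(j)
    return pairs  # unmatched new points can seed new trajectories
```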
34
Multiple Hypothesis Tracking (Reid, 1979; Cox & Hingorani, 1996)
Allows for:
– Trajectory initiation
– Trajectory termination
– Minor occlusion
Modified for continuous, real-time operation.
Matching is based on area, overall saliency, and saliency among the individual feature channels.
[Pipeline: feature extraction → matching → generate k-best hypotheses → management (pruning, merging) → generate predictions → (delay) → back to matching.]
35
Trajectory Example
– Real-time (30 Hz)
– Maximum of 5 target points in each frame
– Search range limited to 60 frames (2 seconds)
36
Roadmap
[Architecture diagram repeated; next: ToBY.]
37
Theory of Body (ToBY)
Must distinguish between animate and inanimate objects.
Criteria: self-propelled motion; the laws of naïve physics.
Informed by the motion studies of Michotte (1963) with adults and Cohen and Amsel (1986) with children.
[Movies: launching, spatial gap, temporal gap — courtesy of Brian Scholl, Yale.]
38
ToBY Architecture
[Diagram: incoming trajectories are checked against a minimum length; those too short are rejected, and the rest are passed to five experts (static object, straight line, energy, elastic collision, acceleration sign change), whose votes an arbiter combines into an animacy judgment.]
39
ToBY Agents
– Straight line expert: minimize the sum of deviations from the mean velocity
– Elastic collision expert: look for a transfer of velocities before and after a collision
– Energy expert: assumes constant mass; the inertial system provides the gravity vector
– Acceleration sign change expert: look for multiple sign changes in the acceleration
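A sketch of the straight-line expert's criterion, plus a simple arbiter in the two styles mentioned on the results slide (weighted sum, winner-take-all); the score scaling and thresholds are assumptions.

```python
import numpy as np

def straight_line_score(traj):
    """Straight-line expert: an object under no external force moves at
    constant velocity, so deviation from the mean velocity suggests
    self-propelled (animate) motion. traj is an (N, 2) position array."""
    v = np.diff(traj, axis=0)                     # per-frame velocities
    deviation = np.linalg.norm(v - v.mean(axis=0), axis=1).sum()
    return deviation / max(len(v), 1)             # higher -> more animate

def arbiter(scores, weights=None):
    """Combine expert votes: weighted sum if weights are given, else
    winner-take-all. Thresholding the result yields the animacy call."""
    if weights is None:
        return max(scores)
    return float(np.dot(scores, weights))
```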
40
Animacy Results
[Example trajectories with their classifications: animate, inanimate, animate.]
Arbitration methods: weighted sum; winner-take-all.
41
Human Baseline Responses
– All context removed
– 45 subjects on a web-based system
– An unnatural stimulus for the human subjects
– Exact stimuli processed by the robot
42
Comparing Human and Machine Results
– Hard task for subjects, but high inter-subject correlation
– Strong results on falling stimuli and straight-line motion
– ToBY matched human judgment on all stimuli except #13
– Mixed results on #10
[Panel of the 15 test stimuli.]
43
Roadmap
[Architecture diagram repeated; next: the face finder and EDD.]
44
Post-Attentive Visual Processing: Finding Faces and Eyes
Can you find the face in this image? Can you tell where I am looking?
45
Post-Attentive Visual Processing: Finding Faces and Eyes
[Pipeline: locate target in wide field → foveate target (~300 msec) → apply face filter → software zoom → feature extraction (~66 msec).]
46
Two Sensory-Motor Mappings
Saccade map: maps image positions to the motor commands necessary to center that location in the visual image. Learned using standard self-supervised techniques (lookup tables, neural nets, etc.).
Peripheral-foveal map: maps pixels in the peripheral image to pixels in the foveal image. Scale is learned using optic flow rates obtained while the cameras are moving; position is learned using correlation.
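A sketch of a self-supervised saccade map as a lookup table; the error-correction update below is one simple instance of the "standard techniques" the slide mentions, with an assumed learning rate and table size.

```python
import numpy as np

class SaccadeMap:
    """Lookup table from image position to the (pan, tilt) command that
    centers that position. After each saccade, the residual command that
    would still have been needed (found by re-locating the target)
    corrects the table entry."""
    def __init__(self, w=128, h=128, lr=0.5):
        self.table = np.zeros((h, w, 2))   # (pan, tilt) per pixel
        self.lr = lr

    def command(self, x, y):
        return self.table[y, x]

    def update(self, x, y, residual_cmd):
        # Self-supervised correction: nudge the entry by the leftover
        # motor command measured after the saccade.
        self.table[y, x] += self.lr * np.asarray(residual_cmd)
```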
47
Face Finding
[Pipeline: foveal image → skin filter → ratio template (Sinha, 1996) and oval detector (Banks, Arsenio, & Fitzpatrick) → detected faces.]
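An illustrative take on Sinha's ratio-template idea: compare average luminance between coarse face regions (eye regions darker than forehead and cheeks). The region boxes and scoring here are guesses for illustration, not Sinha's actual template.

```python
import numpy as np

def ratio_template_score(face_img):
    """Fraction of expected luminance-ratio relations satisfied by a
    grayscale candidate patch; values near 1.0 suggest a face."""
    h, w = face_img.shape
    def region(r0, r1, c0, c1):
        return face_img[int(r0*h):int(r1*h), int(c0*w):int(c1*w)].mean()
    forehead  = region(0.05, 0.25, 0.20, 0.80)
    left_eye  = region(0.30, 0.45, 0.15, 0.45)
    right_eye = region(0.30, 0.45, 0.55, 0.85)
    cheeks    = region(0.55, 0.75, 0.20, 0.80)
    ratios = [forehead > left_eye, forehead > right_eye,
              cheeks > left_eye, cheeks > right_eye]
    return sum(ratios) / len(ratios)
```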
48
Software Zoom
From the full 640x480 image, extract the relevant 128x128 sub-image. This step introduces the majority of the system delay.
49
Feature Finding
Locate the eyes and mouth by looking for the centroid of luminance minima.
– Mouth: an iterative algorithm with adaptive regions provides performance similar to simulated annealing
– Eyes: add a symmetry requirement
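A sketch of the luminance-minima centroid shared by both feature finders; the percentile cut is an assumption, and the iterative region adaptation and eye-symmetry check are omitted.

```python
import numpy as np

def dark_centroid(gray, region, percentile=10):
    """Centroid of the darkest pixels (bottom percentile of luminance)
    inside a search region (r0, r1, c0, c1) of a grayscale image. The
    mouth finder would iterate this with an adaptive region; the eye
    finder would add a left/right symmetry requirement."""
    r0, r1, c0, c1 = region
    patch = gray[r0:r1, c0:c1].astype(np.float32)
    thresh = np.percentile(patch, percentile)
    ys, xs = np.nonzero(patch <= thresh)
    return (r0 + ys.mean(), c0 + xs.mean())
```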
50
Head Pose Derivation
Obtain an estimate of head pose from the positions of the mouth, nostrils, and eyes.
Failure modes: matching to a nostril; vertical lengthening.
Accuracy of +/- 5 degrees at a distance of 6 meters.
51
Roadmap
[Architecture diagram repeated; next: the intentionality detector (ID).]
52
Basic Intentionality
Gigerenzer & Todd: basic representations of intent in a simulation game.
– Approach: non-increasing distance, matched relative heading
– Avoidance: non-decreasing distance, opposed relative heading
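A sketch of those two rules as stated; the cosine thresholds and distance tolerance are assumptions.

```python
import numpy as np

def classify_intent(agent_traj, target_traj, tol=1e-3):
    """Label a trajectory as approach or avoidance from the two cues on
    the slide: change of distance to the target, and relative heading.
    Inputs are (N, 2) position arrays sampled at the same instants."""
    d = np.linalg.norm(agent_traj - target_traj, axis=1)
    heading = np.diff(agent_traj, axis=0)             # agent velocity
    to_target = target_traj[:-1] - agent_traj[:-1]    # direction to target
    cos = np.sum(heading * to_target, axis=1) / (
        np.linalg.norm(heading, axis=1) *
        np.linalg.norm(to_target, axis=1) + 1e-9)
    if np.all(np.diff(d) <= tol) and cos.mean() > 0.5:
        return "approach"    # non-increasing distance, matched heading
    if np.all(np.diff(d) >= -tol) and cos.mean() < -0.5:
        return "avoidance"   # non-decreasing distance, opposed heading
    return "neither"
```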
53
Roadmap
[Architecture diagram repeated; next: shared attention (SAM) and the behavior system.]
54
An Application to Mimicry (Scassellati & Adams)
Mimicry of arm trajectories:
– No body model
– Based on animate motion trajectories
– Movement range based on the perceived scale of the face
55
Mapping Visual Trajectories to Arm Movements
– Postural primitives define a sub-space for positioning; positions within that sub-space can be represented as linear combinations of the basis vectors
– Based on findings of spinal force fields in the frog (Bizzi, Mussa-Ivaldi)
– The mapping is based on the perceived head position and the robot's own symmetry axis
[Diagram: visual coordinates → postural primitives → arm coordinates.]
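A sketch of the linear-combination idea; the primitive postures below are invented for illustration and are not Cog's actual primitives.

```python
import numpy as np

# Hypothetical postural primitives: joint-angle vectors (radians) for a
# 6-DOF arm, spanning the reachable sub-space used for mimicry.
PRIMITIVES = np.array([
    [0.0,  0.3, 0.0, 1.2, 0.0, 0.0],   # arm forward
    [0.9,  0.3, 0.0, 0.4, 0.0, 0.0],   # arm out to the side
    [0.0, -0.4, 0.0, 0.2, 0.0, 0.0],   # arm down
])

def arm_posture(weights):
    """An arm posture as a convex combination of the basis postures, so a
    2-D visual target maps to a point inside the primitives' sub-space."""
    w = np.clip(np.asarray(weights, dtype=float), 0.0, None)
    if w.sum() == 0.0:
        raise ValueError("at least one weight must be positive")
    w /= w.sum()
    return w @ PRIMITIVES   # 6 joint angles
```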
56
Basic Mimicry
– Autonomous operation
– Visually identified trajectories
– A first step toward social learning
57
Mimicry Based on Animacy
– Only animate trajectories are possible targets
– Match to the scale of a human face or to the perceived object extent
58
Mimicry Based on Joint Reference
– Target selection is based on head orientation and animacy constraints
– Responds to natural social cues
– Uses joint reference as a saliency metric
59
Reaching Based on Intent
Head orientation drives eye position; intent drives pointing.
Instructions to the subject:
– Get the robot's attention
– Look at the block
– Get the robot's attention again
– Reach for the block
60
Evaluating Social Behaviors (Audley, Scassellati & Turkle)
– Do naïve subjects produce and recognize the appropriate social cues?
– Can they successfully instruct the robot to perform simple actions among many distractors?
– Future: degrading performance to match autistic behavior
61
End of the Road?
[The complete architecture diagram: visual attention, trajectory formation, ToBY, face finder, EDD, ID, SAM, and the behavior system.]
62
Conclusions
Proposed an embodied, perceptually grounded model of theory of mind.
Implemented a system that:
– Determines saliency
– Judges animacy
– Engages in joint reference
– Attributes basic intent
Demonstrated an application to simple social mimicry as a first step toward social learning.
63
The Future
– Increases in computational power
– The drive for interactive technology
– Integration of many sub-disciplines
Theory of mind skills will be central to any technology that interacts with people.
64
Acknowledgements
Committee: Rodney Brooks, Leslie Pack Kaelbling, Eric Grimson
Cog team: Bryan Adams, Aaron Edsinger, Matt Marjanovic
Kismet team: Cynthia Breazeal, Paul Fitzpatrick, Lijin Aryananda, Paulina Varchavskaia
Lazlo team: Aaron Edsinger, Una-May O’Reilly