
1 Theory of Mind for a Humanoid Robot Brian Scassellati MIT Artificial Intelligence Lab

2 Learning Environments
Unstructured learning: unreliable feedback, unconstrained environment, high penalty for failure
Structured learning: continuous feedback, constrained environment, minimized risk, structuring of task and solution

3 Grand Challenge: Social Learning
– Exploit the knowledge and assistance of people
– Recognize and respond to appropriate social cues
– Utilize natural social dynamics

4 What Would This Require?
– Machine Vision: object recognition, face finding
– Artificial Intelligence: behavior selection, planning
– Human-Machine Interfaces: social scripts, dynamics
– Real-Time Systems: embedded control, parallelism
– Motor Control: response fidelity, flexible control, safety issues
– Machine Learning: sequence learning, feedback cues
– Theory of Mind: beliefs and desires, joint reference

5 Outline
– Existing Models of Theory of Mind
– Embodied Theory of Mind
– Robot Hardware
– Implementation
– Application to Mimicry

6 Development of Theory of Mind
Milestones in normal children (approximate ages): eye contact (< 3 months), simple gaze detection (< 9 months), complex gaze detection (< 12 months), declarative pointing (~ 12 months), pretend play (< 18 months), false belief tasks (< 48 months)
Non-human animals: eye contact in vertebrates; complex gaze detection uncertain in monkeys (?), present in great apes (yes)
Autistic children (subgroup A): limited to very limited performance on these tasks

7 Leslie’s Model
Three spheres of causation:
– Theory of Body (ToBY): mechanical agency of inanimate objects
– Theory of Mind Mechanism, system 1 (ToMM-1): actional agency of animate objects; applies rules of goals and desires
– Theory of Mind Mechanism, system 2 (ToMM-2): attitudinal agency; applies rules of belief and knowledge

8 Baron-Cohen’s Model
Requires two types of input stimuli:
– Eye-like stimuli, processed by the Eye Direction Detector (EDD) into dyadic representations (sees)
– Self-propelled (animate) stimuli, processed by the Intentionality Detector (ID) into dyadic representations (desires, goals)
Both feed the Shared Attention Mechanism (SAM) and the Theory of Mind Mechanism (ToMM).
Proposes that autism is an impairment of either SAM (subgroup A) or ToMM (subgroup B).

9 Implications for Robotics
Both models:
– Offer an encouraging task decomposition
– Provide an evaluation metric
– Are approachable with our current technologies
Neither model:
– Is grounded in real perception
– Accounts for behavior selection

10 Embodied Theory of Mind (architecture diagram: visual input supplies object trajectories to ToBY and eye-like stimuli to EDD; ToBY passes animate stimuli to ID, and EDD and ID feed SAM)

11 Embodied Theory of Mind (diagram elaborated: visual input passes through pre-attentive filters to a visual attention system, which drives trajectory formation ahead of ToBY)

12 Embodied Theory of Mind (diagram elaborated: ToBY now outputs animate objects)

13 Embodied Theory of Mind (diagram elaborated: a face finder is added after attention, supplying eye-like stimuli to EDD)

14 Embodied Theory of Mind (complete diagram: attention, trajectory formation, ToBY, face finder, EDD, ID, and SAM connected)

15 Roadmap (system architecture diagram: visual input, visual attention, trajectory formation, ToBY, face finder, EDD, ID, SAM, and behavior system; next: the robotic platforms and their hardware)

16 Three Robotic Platforms: Kismet, Cog, and Lazlo

17 Hardware – Cog’s Arms (Williamson 1998; Adams 2001)
– 6 DOF in each arm
– Series elastic actuators
– Force control via a spring law
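
As an aside, here is a minimal Python sketch of the kind of virtual spring law that series elastic actuators make practical; the gains and the damping term are illustrative assumptions, not Cog's actual controller.

```python
def spring_law_torque(theta_desired, theta_actual, velocity=0.0,
                      stiffness=4.0, damping=0.2):
    """Virtual spring law: command torque proportional to position error.

    Series elastic actuators sense torque through spring deflection, so
    commanding tau = k * (theta_d - theta) makes the joint behave like a
    compliant spring rather than a stiff position servo. The gain values
    here are illustrative, not Cog's actual parameters.
    """
    return stiffness * (theta_desired - theta_actual) - damping * velocity
```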

18 Hardware – Cog’s Head
– 7 degrees of freedom
– Human speed and range of motion

19 Visual and Inertial Sensors
– Each camera pair provides a peripheral view and a foveal view
– 3-axis inertial sensor

20 Computational System
– Designed for real-time responses
– Network of 24 PCs ranging from 200 to 800 MHz
– QNX real-time operating system
– The implementation shown today consists of ~26 QNX processes and ~75 QNX threads

21 Roadmap (architecture diagram repeated; next: the visual attention system)

22 The Problem of Saliency
How do you know what to attend to?
– Inherent properties: saturated color, movement, skin color
– Task constraints
– Joint reference
Context-based attention (Breazeal & Scassellati, 1999)

23 A Model of Visual Search and Attention (Wolfe, 1998)
(diagram: visual input passes through feature detectors to produce color, skin, and motion feature maps; each map is weighted and summed into an activation map; high-level goals set the weights, and the activation map drives the motor system)
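
A minimal sketch of the weighted-sum scheme in this model; the map names and gain handling are illustrative assumptions.

```python
import numpy as np

def activation_map(feature_maps, gains):
    """Combine per-feature saliency maps into one activation map.

    feature_maps: dict of name -> 2-D array (same shape), e.g. the
    color, skin, and motion maps. gains: dict of name -> scalar weight
    set by high-level goals (e.g. a raised skin gain for "seek face").
    """
    total = np.zeros_like(next(iter(feature_maps.values())), dtype=float)
    for name, fmap in feature_maps.items():
        total += gains.get(name, 1.0) * fmap
    return total

def attention_target(act):
    """The attention target is simply the peak of the combined map."""
    return np.unravel_index(np.argmax(act), act.shape)
```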

24 Motion Detection
– Image differencing produces a raw motion map
– Motion detection is inhibited for 300 msec following an eye movement
– Optic flow methods provide more local detail, but are much more computationally expensive
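
A sketch of the differencing scheme with the 300 msec post-saccade inhibition; the binarization threshold is an illustrative assumption.

```python
import numpy as np

def motion_map(prev_frame, curr_frame, ms_since_saccade, threshold=15):
    """Raw motion map by absolute image differencing.

    Differencing is suppressed for 300 ms after an eye movement, since
    self-induced image motion would otherwise flood the map. Frames are
    2-D grayscale arrays; the threshold value is illustrative.
    """
    if ms_since_saccade < 300:
        return np.zeros_like(curr_frame, dtype=np.uint8)
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return (diff > threshold).astype(np.uint8) * 255
```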

25 High Color Saturation
Saliency is the maximum of the four opponent-color channels.
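
A sketch of the opponent-color computation; the slide does not give Cog's exact channel equations, so these follow a common formulation (e.g. Itti, Koch & Niebur, 1998) and should be read as an assumption.

```python
import numpy as np

def saturation_saliency(img):
    """Color saliency as the max of four opponent-color channels.

    img is an H x W x 3 float RGB array. The channel definitions below
    are an illustrative formulation, not necessarily Cog's.
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    red    = np.clip(r - (g + b) / 2, 0, None)
    green  = np.clip(g - (r + b) / 2, 0, None)
    blue   = np.clip(b - (r + g) / 2, 0, None)
    yellow = np.clip((r + g) / 2 - np.abs(r - g) / 2 - b, 0, None)
    return np.max(np.stack([red, green, blue, yellow]), axis=0)
```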

26 Skin Color Saliency
Skin tones can be (approximately) located within a bounded region of (R, G, B) space.
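
A sketch of an RGB skin filter in the spirit of this slide; the specific bounds are hypothetical placeholders, since the talk gives only the general idea.

```python
import numpy as np

def skin_map(img):
    """Binary skin-tone map from simple (R, G, B) bounds.

    The slide states only that skin tones occupy a region of RGB space;
    the ratio thresholds below are hypothetical placeholders, not the
    values used on Cog. img is an H x W x 3 uint8 RGB array.
    """
    r = img[..., 0].astype(float)
    g = img[..., 1].astype(float)
    b = img[..., 2].astype(float)
    eps = 1e-6  # avoid division by zero on black pixels
    mask = ((r > 95)
            & (r / (g + eps) > 1.05)
            & (r / (b + eps) > 1.05)
            & (r / (g + eps) < 2.0))
    return mask.astype(np.uint8) * 255
```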

27 Habituation
Purpose:
– Initially enhance the target of attention (the foveated object)
– Gradually decrease activation
– Eventually suppress the target so that a new target is selected
Eye movement resets the habituation.
(graph: habituation contribution to the model over time)
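
A sketch of the habituation signal described above, assuming a simple linear decay; the time constants are illustrative.

```python
class Habituation:
    """Habituation signal for the attended location.

    Starts as an enhancement, decays over time, and eventually goes
    negative to suppress the current target; any eye movement resets
    it. The constants are illustrative assumptions.
    """
    def __init__(self, start=1.0, decay=0.02, floor=-1.0):
        self.start, self.decay, self.floor = start, decay, floor
        self.value = start

    def step(self):
        """Advance one frame; returns the current contribution."""
        self.value = max(self.value - self.decay, self.floor)
        return self.value

    def reset(self):
        """Called on each eye movement."""
        self.value = self.start
```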

28 Implemented Model of Visual Search and Attention
(diagram: visual input produces color, skin, motion, and habituation maps; each is weighted and summed into an activation map; the motivation system sets the weights, and the activation map drives the motor system)

29 Internal Influences on Attention
– “Seek face”: high skin gain, low color-saturation gain; looking time 28% block, 72% face
– “Seek toy”: low skin gain, high saturated-color gain; looking time 28% face, 72% block
– Internal influences bias how salience is measured
– The robot is not a slave to its environment

30 Context-Based Attention
– Identical computational system on both robots
– The attention system drives gaze direction
– Generates social cues

31 Roadmap (architecture diagram repeated; next: trajectory formation)

32 Trajectory Formation
Each frame produces a set of target points.
(figure: target points across frames t, t+1, t+2, t+3)

33 Motion Correspondence
– Each frame produces a set of target points
– The objective is to identify trajectories through a subset of the frames
(figure: candidate correspondences across frames t through t+3)

34 Multiple Hypothesis Tracking (Reid, 1979; Cox & Hingorani, 1996)
Allows for:
– Trajectory initiation
– Trajectory termination
– Minor occlusion
Modified for continuous, real-time operation.
Matching based on:
» Area
» Overall saliency
» Saliency among the individual feature channels
(pipeline: feature extraction → matching → generate k-best hypotheses → management (pruning, merging) → generate predictions → delay)
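
The system uses multiple hypothesis tracking; as a deliberately simplified illustration of the per-frame matching step only, here is a greedy nearest-neighbor sketch (not the k-best hypothesis search, and gating on distance rather than area and saliency).

```python
def greedy_correspondence(prev_points, curr_points, max_dist=40.0):
    """Greedy nearest-neighbor matching between consecutive frames.

    A deliberate simplification of MHT: the real tracker keeps the k
    best global hypotheses and handles initiation, termination, and
    occlusion. Points are (x, y) tuples; max_dist is an illustrative
    gate on allowable inter-frame motion.
    """
    matches, used = [], set()
    for i, p in enumerate(prev_points):
        best, best_d = None, max_dist
        for j, q in enumerate(curr_points):
            if j in used:
                continue
            d = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches.append((i, best))
            used.add(best)
    return matches  # unmatched current points may start new trajectories
```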

35 Trajectory Example
– Real-time (30 Hz)
– Maximum of 5 target points in each frame
– Search range limited to 60 frames (2 seconds)

36 Roadmap (architecture diagram repeated; next: the Theory of Body module)

37 Theory of Body (ToBY)
Must distinguish between animate and inanimate objects.
Criteria: self-propelled motion; laws of naïve physics.
Based on the motion studies of Michotte (1963) with adults and Cohen & Amsel (1986) with children: launching, spatial gap, temporal gap (movies courtesy of Brian Scholl, Yale).

38 ToBY Architecture
(diagram: trajectories that pass a minimum-length test are sent to a pool of experts: static object, straight line, energy, elastic collision, and acceleration sign change; trajectories failing the test are rejected, and an arbiter combines the experts' votes into an animacy judgment)

39 ToBY Agents
– Straight line expert: minimize the sum of the deviations from the mean velocity
– Elastic collision expert: look for a transfer of velocities before and after a collision
– Energy expert: assume constant mass; the inertial system provides the gravity vector
– Acceleration sign change expert: look for multiple sign changes in the acceleration
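
A sketch of the straight line expert, which the slide defines as minimizing the sum of deviations from the mean velocity; the normalization is an illustrative choice.

```python
import numpy as np

def straight_line_score(trajectory):
    """Score how well a trajectory fits constant-velocity motion.

    Per the slide, the straight line expert sums the deviations of the
    per-frame velocities from the mean velocity: a low score suggests
    inanimate, ballistic motion. trajectory is a list of (x, y) points
    at successive frames; length normalization is an illustrative choice.
    """
    pts = np.asarray(trajectory, dtype=float)
    if len(pts) < 3:
        return 0.0
    vel = np.diff(pts, axis=0)          # per-frame velocity vectors
    mean_vel = vel.mean(axis=0)
    deviation = np.linalg.norm(vel - mean_vel, axis=1).sum()
    return deviation / len(vel)         # 0 for perfectly straight lines
```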

40 Animacy Results
(figure: example trajectories judged animate or inanimate)
Arbitration methods: weighted sum; winner-take-all

41 Human Baseline Responses
– 45 subjects on a web-based system
– Exact stimuli processed by the robot
– All context removed
– An unnatural stimulus for the human subjects

42 Comparing Human and Machine Results
– A hard task for subjects, but high inter-subject correlation
– Strong results on falling stimuli and straight-line motion
– ToBY matched human judgment on all stimuli except #13
– Mixed results on #10
(figure: stimuli 1 through 15)

43 Roadmap (architecture diagram repeated; next: the face finder and EDD)

44 Post-Attentive Visual Processing: Finding Faces and Eyes Can you find the face in this image? Can you tell where I am looking?

45 Post-Attentive Visual Processing: Finding Faces and Eyes
(pipeline: locate target in the wide field → foveate target → apply face filter → software zoom → feature extraction; the stage timings shown are 300 msec and 66 msec)

46 Two Sensory-Motor Mappings
Saccade map: maps image positions to the motor commands necessary to center that location in the visual image. Learned using standard self-supervised techniques (lookup tables, neural nets, etc.).
Peripheral-foveal map: maps pixels in the peripheral image to pixels in the foveal image. Scale is learned using optic flow rates obtained while the cameras are moving; position is learned using correlation.
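
A sketch of one self-supervised scheme consistent with the saccade map described above: a lookup table corrected from the residual image error after each saccade. The grid size and learning rate are illustrative assumptions.

```python
import numpy as np

class SaccadeMap:
    """Self-supervised lookup table from image position to motor command.

    One simple scheme in the spirit of the slide: saccade using the
    current table entry, measure how far the target remains from the
    image center afterward, and nudge the entry to reduce that error.
    Grid size and learning rate are illustrative assumptions.
    """
    def __init__(self, grid=(16, 16), lr=0.3):
        self.table = np.zeros(grid + (2,))   # (pan, tilt) per image cell
        self.lr = lr

    def command(self, cell):
        """Motor command for an image cell, e.g. cell = (row, col)."""
        return self.table[cell]

    def update(self, cell, residual_error):
        """residual_error: target offset from image center post-saccade."""
        self.table[cell] += self.lr * np.asarray(residual_error, dtype=float)
```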

47 Face Finding
(pipeline: the foveal image passes through a skin filter, a ratio template (Sinha, 1996), and an oval detector (Banks, Arsenio & Fitzpatrick) to produce detected faces)

48 Software Zoom
– From the full 640x480 image, extract the relevant 128x128 sub-image
– Introduces the majority of the system delay

49 Feature Finding
Locate the eyes and mouth by looking for centroids of luminance minima.
– Mouth: an iterative algorithm with adaptive regions provides performance similar to simulated annealing
– Eyes: add a symmetry requirement
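
A sketch of locating a dark facial feature as the centroid of luminance minima; the percentile cutoff is an illustrative assumption, and the real system adapts its search regions iteratively rather than using one fixed pass.

```python
import numpy as np

def dark_centroid(gray, region, percentile=10):
    """Centroid of luminance minima within a search region.

    Features like eyes and mouth are dark, so take the centroid of the
    darkest pixels in the region. gray is a 2-D grayscale array; region
    is (row0, row1, col0, col1); the percentile cutoff is illustrative.
    """
    r0, r1, c0, c1 = region
    patch = gray[r0:r1, c0:c1]
    cutoff = np.percentile(patch, percentile)
    rows, cols = np.nonzero(patch <= cutoff)
    if len(rows) == 0:
        return None
    return (r0 + rows.mean(), c0 + cols.mean())
```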

50 Head Pose Derivation
Obtain an estimate of head pose from the positions of the mouth, nostrils, and eyes.
Failure modes:
– Match to a nostril
– Vertical lengthening
Accuracy of +/- 5 degrees at a distance of 6 meters.

51 Roadmap (architecture diagram repeated; next: the intentionality detector)

52 Basic Intentionality (Gigerenzer & Todd)
Basic representations of intent in a simulation game (see the sketch below):
– Approach: non-increasing distance, matched relative heading
– Avoidance: non-decreasing distance, opposed relative heading
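
A sketch of the two intent categories exactly as defined on this slide; the tolerance and heading cutoffs are illustrative assumptions.

```python
import math

def classify_intent(distances, heading_offsets, tol=0.1):
    """Classify approach vs. avoidance from two simple cues.

    Following the slide: approach = non-increasing distance with a
    matched relative heading; avoidance = non-decreasing distance with
    an opposed heading. Inputs are per-frame samples; heading_offsets
    are in radians (0 = headed straight at the other agent, pi =
    directly away). The tolerance and cutoffs are illustrative.
    """
    closing = all(b <= a + tol for a, b in zip(distances, distances[1:]))
    opening = all(b >= a - tol for a, b in zip(distances, distances[1:]))
    toward = all(abs(h) < math.pi / 4 for h in heading_offsets)
    away = all(abs(h) > 3 * math.pi / 4 for h in heading_offsets)
    if closing and toward:
        return "approach"
    if opening and away:
        return "avoidance"
    return "neither"
```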

53 Roadmap (architecture diagram repeated; next: the behavior system and an application to mimicry)

54 An Application to Mimicry (Scassellati & Adams)
Mimicry of arm trajectories:
– No body model
– Based on animate motion trajectories
– Movement range based on the perceived scale of the face

55 Mapping Visual Trajectories to Arm Movements
– Postural primitives define a sub-space for positioning
– Positions within that sub-space can be represented as linear combinations of the basis vectors
– Based on findings of spinal force fields in frogs (Bizzi, Mussa-Ivaldi)
– The mapping is based on the perceived head position and the robot’s own symmetry axis
A sketch of the linear-combination idea follows.
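
A sketch of representing an arm posture as a linear (here convex) combination of postural primitives; the example postures, joint count, and normalization are illustrative assumptions.

```python
import numpy as np

def posture_from_weights(primitives, weights):
    """Blend postural primitives into one joint-space posture.

    Each primitive is a full arm posture (a vector of joint angles); a
    target within the primitives' sub-space is written as a convex
    combination of them, per the slide. The normalization step is an
    illustrative choice.
    """
    w = np.clip(np.asarray(weights, dtype=float), 0, None)
    w = w / w.sum()                          # convex combination
    P = np.asarray(primitives, dtype=float)  # (n_primitives, n_joints)
    return w @ P

# Example: blend three hypothetical 6-DOF postures.
rest  = np.zeros(6)
reach = np.array([0.8, 0.2, 0.0, 1.1, 0.0, 0.0])
side  = np.array([0.1, 0.9, 0.3, 0.5, 0.0, 0.0])
print(posture_from_weights([rest, reach, side], [0.2, 0.6, 0.2]))
```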

56 Basic Mimicry
– Autonomous operation
– Visually identified trajectories
– A first step toward social learning

57 Mimicry Based on Animacy
– Only animate trajectories are possible targets
– Movements are matched to the scale of a human face or to the perceived object extent

58 Mimicry Based on Joint Reference
– Target selection is based on head orientation and animacy constraints
– Responds to natural social cues
– Uses joint reference as a saliency metric

59 Reaching Based on Intent
Head orientation drives eye position; intent drives pointing.
Instructions to the subject:
– Get the robot’s attention
– Look at the block
– Get the robot’s attention again
– Reach for the block

60 Evaluating Social Behaviors (Audley, Scassellati & Turkle)
– Do naïve subjects produce and recognize the appropriate social cues?
– Can they successfully instruct the robot to perform simple actions among many distractors?
– Future: degrading performance to match autistic behavior

61 End of the Road? (complete architecture diagram revisited: visual attention, trajectory formation, ToBY, face finder, EDD, ID, SAM, and behavior system)

62 Conclusions
– Proposed an embodied, perceptually grounded model of theory of mind
– Implemented a system that determines saliency, judges animacy, engages in joint reference, and attributes basic intent
– Demonstrated an application to simple social mimicry as a first step toward social learning

63 The Future
– Increases in computational power
– Drive for interactive technology
– Integration of many sub-disciplines
Theory of mind skills will be central to any technology that interacts with people.

64 Acknowledgements
Committee: Rodney Brooks, Leslie Pack Kaelbling, Eric Grimson
Cog Team: Bryan Adams, Aaron Edsinger, Matt Marjanovic
Kismet Team: Cynthia Breazeal, Paul Fitzpatrick, Lijin Aryananda, Paulina Varchavskaia
Lazlo Team: Aaron Edsinger, Una-May O’Reilly

