Theory of Mind for a Humanoid Robot Brian Scassellati MIT Artificial Intelligence Lab

Learning Environments
Unstructured learning: –Unreliable feedback –Unconstrained environment –High penalty for failure
Structuring of task and solution: –Continuous feedback –Constrained environment –Minimized risk

Grand Challenge: Social Learning
Exploit the knowledge and assistance of people
Recognize and respond to appropriate social cues
Utilize natural social dynamics

What Would This Require?
Machine Vision: object recognition, face finding
Artificial Intelligence: behavior selection, planning
Human-Machine Interfaces: social scripts, dynamics
Real-Time Systems: embedded control, parallelism
Motor Control: response fidelity, flexible control, safety issues
Machine Learning: sequence learning, feedback cues
Theory of Mind: beliefs and desires, joint reference

Outline
–Existing models of theory of mind
–Embodied theory of mind
–Robot hardware
–Implementation
–Application to mimicry

Development of Theory of Mind
Normal children: eye contact (< 3 months), simple gaze detection (< 9 months), declarative pointing (< 12 months), pretend play (~ 12 months), complex gaze detection (< 18 months), false belief tasks (< 48 months)
Non-human animals: vertebrates; monkeys – ?; great apes – yes
Autistic children: subgroup A – limited / very limited

Leslie’s Model
Three spheres of causation:
–Theory of Body (ToBY): mechanical agency of inanimate objects
–Theory of Mind Mechanism, system 1 (ToMM-1): actional agency; applies rules of goals and desires
–Theory of Mind Mechanism, system 2 (ToMM-2): attitudinal agency of animate objects; applies rules of belief and knowledge

Baron-Cohen’s Model
Requires two types of input stimuli:
–Eye-like stimuli, processed by the Eye Direction Detector (EDD) into dyadic representations (sees)
–Self-propelled (animate) stimuli, processed by the Intentionality Detector (ID) into dyadic representations (desire, goals)
EDD and ID feed the Shared Attention Mechanism (SAM), which in turn feeds the Theory of Mind Mechanism (ToMM).
Proposes that autism is an impairment of either SAM (subgroup A) or ToMM (subgroup B).

Implications for Robotics
Both models:
–Offer an encouraging task decomposition
–Provide an evaluation metric
–Are approachable with our current technologies
Neither model:
–Is grounded in real perceptions
–Accounts for behavioral selection

Embodied Theory of Mind
(Overview diagram: visual input yields object trajectories for ToBY; ToBY passes animate stimuli onward to ID; eye-like stimuli feed EDD; EDD and ID feed SAM.)

Embodied Theory of Mind
(The visual input stage is expanded: raw visual input passes through pre-attentive filters and visual attention, and trajectory formation supplies the object trajectories used by ToBY.)

Embodied Theory of Mind
(ToBY's output is now a set of animate objects.)

Embodied Theory of Mind
(A face finder supplies the input to EDD in place of generic eye-like stimuli.)

Embodied Theory of Mind
(SAM is added, linking EDD and ID, with an additional filter feeding back into the attention system.)

Roadmap
(System architecture: visual input passes through pre-attentive filters, visual attention, and trajectory formation; trajectories feed ToBY; a face finder feeds EDD; ToBY's animate objects feed ID; EDD and ID feed SAM; the pipeline drives the behavior system.)

Three Robotic Platforms Kismet Cog Lazlo

Hardware – Cog’s Arms (Williamson, 1998; Adams, 2001)
6 DOF in each arm
Series elastic actuators
Force control (spring law)

Hardware – Cog’s Head 7 degrees of freedom Human speed and range of motion

Visual and Inertial Sensors
Each eye provides a wide peripheral view and a narrow foveal view
3-axis inertial sensor

Computational System
Designed for real-time responses
Network of 24 PCs of varying clock speeds
QNX real-time operating system
Implementation shown today consists of:
–~26 QNX processes
–~75 QNX threads

Roadmap (system architecture diagram, repeated from above)

The Problem of Saliency
How do you know what to attend to?
Inherent properties: –Saturated color –Movement –Skin color
Task constraints
Joint reference
Context-based attention (Breazeal & Scassellati, 1999)

A Model of Visual Search and Attention (Wolfe, 1998)
(Diagram: visual input passes through feature detectors for color, skin, and motion; the weighted feature maps are combined with high-level goals into an activation map that drives the motor system.)
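A minimal sketch of how such an attention system might combine weighted feature maps into a single activation map. The map names, gains, and array shapes below are illustrative assumptions, not the robot's actual code.

```python
import numpy as np

def activation_map(feature_maps, weights):
    """Combine per-feature saliency maps into one activation map.

    feature_maps: dict of name -> 2-D array, all the same shape.
    weights: dict of name -> scalar gain (set by high-level goals).
    """
    acc = None
    for name, fmap in feature_maps.items():
        contrib = weights.get(name, 0.0) * fmap
        acc = contrib if acc is None else acc + contrib
    return acc

# Illustrative use: three 120x160 maps with task-dependent gains.
h, w = 120, 160
maps = {"color": np.random.rand(h, w),
        "skin": np.random.rand(h, w),
        "motion": np.random.rand(h, w)}
gains = {"color": 0.2, "skin": 1.0, "motion": 0.5}    # e.g. a "seek face" bias
act = activation_map(maps, gains)
target = np.unravel_index(np.argmax(act), act.shape)  # most salient pixel
```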

Motion Detection
Image differencing produces a raw motion map
Motion detection is inhibited for 300 msec following an eye movement
Optic flow methods provide more local detail, but are much more computationally expensive
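A small sketch of frame differencing with the post-saccade inhibition described above; the difference threshold is an assumed parameter.

```python
import numpy as np

SACCADE_INHIBIT_SEC = 0.3   # motion map suppressed for 300 ms after an eye movement

def motion_map(prev_frame, curr_frame, t_now, t_last_saccade, threshold=15):
    """Raw motion saliency from frame differencing on grayscale images."""
    if t_now - t_last_saccade < SACCADE_INHIBIT_SEC:
        # Ego-motion from the saccade would dominate the difference image.
        return np.zeros_like(curr_frame, dtype=float)
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    return (diff > threshold).astype(float)
```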

High Color Saturation Saliency is the maximum of the four opponent-color channels
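A sketch of the saturation map as the maximum over opponent-color channels. The particular opponent-channel definitions below are one common choice, assumed here for illustration.

```python
import numpy as np

def saturation_map(rgb):
    """Color-saturation saliency as the max of four opponent-color channels.

    rgb: float array of shape (H, W, 3) in [0, 1].
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    red    = r - (g + b) / 2
    green  = g - (r + b) / 2
    blue   = b - (r + g) / 2
    yellow = (r + g) / 2 - np.abs(r - g) / 2 - b
    channels = np.stack([red, green, blue, yellow], axis=0)
    return np.clip(channels, 0, None).max(axis=0)   # per-pixel maximum
```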

Skin Color Saliency
Skin tones can be (approximately) located within a region of (R, G, B) space.
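A sketch of a skin-tone filter of this kind. The exact bounds from the slide are not reproduced here; the thresholds below are a common rule-of-thumb region for skin under indoor lighting, used only as an illustration.

```python
import numpy as np

def skin_map(rgb):
    """Binary skin-tone map from simple (R, G, B) bounds.

    rgb: uint8 array of shape (H, W, 3).
    """
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    spread = rgb.max(axis=-1).astype(int) - rgb.min(axis=-1).astype(int)
    skin = ((r > 95) & (g > 40) & (b > 20) &
            (spread > 15) & (np.abs(r - g) > 15) & (r > g) & (r > b))
    return skin.astype(float)
```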

Habituation
Purpose:
–Initially enhance the target of attention (the foveated object)
–Gradually decrease activation
–Eventually suppress it so that a new target is selected
Eye movement resets the habituation
(Plot: contribution to the human model over time.)
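A sketch of a habituation gain with that enhance-decay-suppress profile and a reset on eye movements; the time constants and limits are assumptions.

```python
class Habituation:
    """Habituation gain for the currently foveated target.

    The gain starts positive (enhancing the current target), decays over time,
    and eventually goes negative (suppressing it) so that a new target can win
    the attention competition. An eye movement resets the cycle.
    """
    def __init__(self, initial=1.0, decay_per_sec=0.3, floor=-1.0):
        self.initial = initial
        self.decay_per_sec = decay_per_sec
        self.floor = floor
        self.elapsed = 0.0

    def reset(self):
        """Called after each saccade."""
        self.elapsed = 0.0

    def gain(self, dt):
        """Advance time by dt seconds and return the current gain."""
        self.elapsed += dt
        return max(self.floor, self.initial - self.decay_per_sec * self.elapsed)
```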

Implemented Model of Visual Search and Attention
(Diagram: visual input feeds weighted color, skin, motion, and habituation maps; these sum into an activation map that drives the motor system, with the weights set by the motivation system.)

Internal Influences on Attention
"Seek face": high skin gain, low color-saturation gain; looking time 28% block, 72% face
"Seek toy": low skin gain, high saturated-color gain; looking time 28% face, 72% block
Internal influences bias how salience is measured
The robot is not a slave to its environment

Context-Based Attention Identical computation system on both robots Attention system drives the gaze direction Generation of social cues

Roadmap (system architecture diagram, repeated from above)

Trajectory Formation
Each frame produces a set of target points
(Illustration: target points in frames t, t+1, t+2, t+3.)

Motion Correspondence
Each frame produces a set of target points
The objective is to identify sequences through a subset of the frames
(Illustration: candidate correspondences across frames t, t+1, t+2, t+3.)

Multiple Hypothesis Tracking (Reid, 1979; Cox and Hingorani, 1996)
Allows for:
–Trajectory initiation
–Trajectory termination
–Minor occlusion
Modified for continuous, real-time operation
Matching based on:
–Area
–Overall saliency
–Saliency among the individual feature channels
(Processing loop: feature extraction, matching against predictions, generation of the k-best hypotheses, management (pruning, merging), and prediction, delayed to the next frame.)
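For intuition, a greedy one-frame data-association sketch, which is a much simplified stand-in for the k-best multiple hypothesis tracker actually used; the distance gate and gap limit are illustrative parameters.

```python
import math

def extend_trajectories(trajectories, detections, frame, max_dist=40.0, max_gap=60):
    """One frame of greedy data association (simplified stand-in for MHT).

    trajectories: list of dicts {'points': [(x, y), ...], 'last_frame': int}.
    detections:   list of (x, y) target points found in frame `frame`.
    Unmatched detections start new trajectories (initiation); trajectories
    unmatched for more than max_gap frames are dropped (termination).
    """
    unmatched = list(detections)
    for traj in trajectories:
        if not unmatched:
            break
        last = traj['points'][-1]
        best = min(unmatched, key=lambda p: math.dist(p, last))
        if math.dist(best, last) <= max_dist:
            traj['points'].append(best)
            traj['last_frame'] = frame
            unmatched.remove(best)
    trajectories.extend({'points': [p], 'last_frame': frame} for p in unmatched)
    return [t for t in trajectories if frame - t['last_frame'] <= max_gap]
```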

Trajectory Example Real-time (30 Hz) Maximum of 5 target points in each frame Search range limited to 60 frames (2 seconds)

Roadmap (system architecture diagram, repeated from above)

Theory of Body (ToBY)
Must distinguish between animate and inanimate objects
Criteria: self-propelled motion, laws of naïve physics
Motion studies of Michotte (1963) with adults and Cohen and Amsel (1986) with children
(Demonstration movies: launching, spatial gap, temporal gap; courtesy of Brian Scholl, Yale)

ToBY Architecture
(Diagram: trajectories that pass a minimum-length check are sent to a set of experts (static object, straight line, energy, elastic collision, acceleration sign change) whose outputs an arbiter combines into an animacy judgment; trajectories that fail the check are rejected.)

ToBY Agents
Straight line expert: minimize the sum of the deviations from the mean velocity
Elastic collision expert: look for a transfer of velocities before and after a collision
Acceleration sign change expert: look for multiple sign changes in the acceleration
Energy expert: constant mass and the inertial system provide the gravity vector
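A sketch of one such expert and of the two arbitration schemes mentioned on the results slide (weighted sum and winner-take-all). The scoring function and normalization constant are illustrative assumptions, not the robot's exact rules.

```python
import numpy as np

def straight_line_expert(points):
    """Animacy vote based on deviation from constant-velocity motion.

    points: list or (N, 2) array of image positions along one trajectory.
    Returns a score in [0, 1]: near 0 when the motion is well explained by a
    single mean velocity (inanimate-looking), near 1 otherwise.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return 0.0
    vel = np.diff(pts, axis=0)                              # per-frame velocities
    deviation = np.linalg.norm(vel - vel.mean(axis=0), axis=1).mean()
    return float(1.0 - np.exp(-deviation / 2.0))

def arbitrate(votes, weights=None, winner_take_all=False):
    """Combine per-expert animacy votes by weighted sum or winner-take-all."""
    if winner_take_all:
        return max(votes.values())
    weights = weights or {}
    total = sum(weights.get(n, 1.0) for n in votes)
    return sum(weights.get(n, 1.0) * v for n, v in votes.items()) / total
```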

Animacy Results
Arbitration methods: weighted sum and winner-take-all
(Figure: example trajectories judged animate and inanimate.)

Human Baseline Responses
Removed all context
45 subjects on a web-based system
An unnatural stimulus for the human subjects
The exact stimuli processed by the robot

Comparing Human and Machine Results
A hard task for subjects, but with high inter-subject correlation
Strong results on:
–Falling stimuli
–Straight-line motion
ToBY matched human judgment on all stimuli except #13, where results were mixed

Roadmap (system architecture diagram, repeated from above)

Post-Attentive Visual Processing: Finding Faces and Eyes Can you find the face in this image? Can you tell where I am looking?

Post-Attentive Visual Processing: Finding Faces and Eyes
(Pipeline: locate the target in the wide field, foveate the target, apply the face filter, software zoom, feature extraction; stage latencies of roughly 300 msec and 66 msec.)

Two Sensory-Motor Mappings
Saccade map: maps image positions to the motor commands necessary to center that location in the visual image. Learned using standard self-supervised techniques (lookup tables, neural nets, etc.).
Peripheral-foveal map: maps pixels in the peripheral image to pixels in the foveal image. Scale is learned using optic flow rates obtained while the cameras are moving; position is learned using correlation.
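A sketch of how the saccade map might be learned self-supervised with a lookup table, one of the options the slide names; the grid resolution, learning rate, and linear initial guess are assumptions.

```python
import numpy as np

class SaccadeMap:
    """Coarse lookup table from image position to (pan, tilt) motor offsets.

    Self-supervised update: command a saccade toward a target, then measure how
    far the target ended up from the image center and correct the table entry.
    """
    def __init__(self, img_w=320, img_h=240, grid=17, gain=0.1, lr=0.3):
        self.img_w, self.img_h, self.grid, self.lr = img_w, img_h, grid, lr
        xs = np.linspace(-1, 1, grid)
        # Initial guess: motor offset proportional to image offset from center.
        self.table = np.stack(np.meshgrid(xs * gain, xs * gain), axis=-1)  # (grid, grid, 2)

    def _cell(self, x, y):
        i = int(np.clip(y / self.img_h * (self.grid - 1), 0, self.grid - 1))
        j = int(np.clip(x / self.img_w * (self.grid - 1), 0, self.grid - 1))
        return i, j

    def command(self, x, y):
        """Motor offset predicted to center image position (x, y)."""
        return self.table[self._cell(x, y)].copy()

    def update(self, x, y, residual_motor_error):
        """After the saccade, nudge the entry by the remaining centering error."""
        i, j = self._cell(x, y)
        self.table[i, j] += self.lr * np.asarray(residual_motor_error)
```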

Face Finding
(Pipeline: a skin filter and the foveal image feed a ratio template (Sinha, 1996) and an oval detector (Banks, Arsenio, & Fitzpatrick), producing detected faces.)
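A toy illustration in the spirit of Sinha's ratio template: comparing average luminance between coarse face regions. The real template uses many more regions and pairwise ratios, so the region boundaries and the three relations here are only assumptions for illustration.

```python
import numpy as np

def ratio_template_score(face_patch):
    """Toy luminance-ratio check over coarse face regions.

    face_patch: grayscale array, roughly face-sized (e.g. the 128x128 foveal
    crop). Returns the fraction of ratio relations satisfied.
    """
    h, w = face_patch.shape

    def region(r0, r1, c0, c1):
        return face_patch[int(r0 * h):int(r1 * h), int(c0 * w):int(c1 * w)].mean()

    forehead    = region(0.05, 0.25, 0.25, 0.75)
    left_eye    = region(0.30, 0.45, 0.15, 0.40)
    right_eye   = region(0.30, 0.45, 0.60, 0.85)
    nose_bridge = region(0.30, 0.45, 0.40, 0.60)
    relations = [forehead > left_eye,        # eye sockets darker than forehead
                 forehead > right_eye,
                 nose_bridge > left_eye]     # bridge brighter than the eye socket
    return sum(relations) / len(relations)
```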

Software Zoom From full 640x480 image, extract the relevant 128x128 sub-image Introduces the majority of the system delay

Feature Finding
Locate eyes and mouth by looking for the centroid of luminance minima
Mouth: –An iterative algorithm with adaptive regions provides performance similar to simulated annealing
Eyes: –Add a symmetry requirement
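A sketch of the iterative dark-centroid idea: re-center a search window on the centroid of the darkest pixels until it settles. The window size, iteration count, and darkness percentile are illustrative parameters.

```python
import numpy as np

def dark_centroid(gray, center, half_size=20, iterations=5, percentile=10):
    """Iteratively find the centroid of luminance minima near `center`.

    gray: 2-D grayscale image.
    center: (row, col) initial guess, e.g. a rough mouth or eye location.
    """
    r, c = center
    for _ in range(iterations):
        r0, r1 = max(0, r - half_size), min(gray.shape[0], r + half_size)
        c0, c1 = max(0, c - half_size), min(gray.shape[1], c + half_size)
        window = gray[r0:r1, c0:c1]
        dark = window <= np.percentile(window, percentile)   # darkest pixels
        rows, cols = np.nonzero(dark)
        if len(rows) == 0:
            break
        r, c = int(r0 + rows.mean()), int(c0 + cols.mean())  # re-center on dark mass
    return r, c
```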

Head Pose Derivation
Obtain an estimate of head pose from the positions of the mouth, nostrils, and eyes
Failure modes: –Match to nostril –Vertical lengthening
Accuracy of +/- 5 degrees at a distance of 6 meters

Roadmap (system architecture diagram, repeated from above)

Basic Intentionality
Gigerenzer & Todd: basic representations of intent in a simulation game
Approach: –Non-increasing distance –Matched relative heading
Avoidance: –Non-decreasing distance –Opposed relative heading
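A small sketch applying those criteria to a pair of trajectories; the heading tolerance is an assumed parameter.

```python
import math

def basic_intent(agent_traj, target_traj, heading_tol=math.radians(30)):
    """Label the agent's last move as 'approach', 'avoid', or 'none'.

    Approach: distance non-increasing and heading roughly toward the target.
    Avoidance: distance non-decreasing and heading roughly away from it.
    Both trajectories are lists of (x, y) with at least two points.
    """
    (ax0, ay0), (ax1, ay1) = agent_traj[-2], agent_traj[-1]
    tx, ty = target_traj[-1]
    d_prev = math.dist((ax0, ay0), target_traj[-2])
    d_now = math.dist((ax1, ay1), (tx, ty))
    heading = math.atan2(ay1 - ay0, ax1 - ax0)
    to_target = math.atan2(ty - ay1, tx - ax1)
    diff = abs((heading - to_target + math.pi) % (2 * math.pi) - math.pi)
    if d_now <= d_prev and diff < heading_tol:
        return "approach"
    if d_now >= d_prev and diff > math.pi - heading_tol:
        return "avoid"
    return "none"
```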

Roadmap (system architecture diagram, repeated from above)

An Application to Mimicry (Scassellati & Adams) Mimicry of arm trajectories –No body model –Based on animate motion trajectories –Movement range based on perceived scale of face

Mapping Visual Trajectories to Arm Movements
Postural primitives define a sub-space for positioning
Positions within that sub-space can be represented as linear combinations of the basis vectors
Based on findings of spinal force fields in the frog (Bizzi, Mussa-Ivaldi)
Mapping is based on the perceived head position and the robot's own symmetry axis
(Diagram: visual coordinates are mapped through postural primitives to arm coordinates.)
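A sketch of interpolating between postural primitives; the primitive postures and the weights shown are hypothetical numbers, and in the actual system the weights would come from where the target falls relative to the person's head and the robot's symmetry axis.

```python
import numpy as np

def arm_pose_from_primitives(primitives, weights):
    """Joint-space arm pose as a convex combination of postural primitives.

    primitives: (K, J) array, one J-joint posture per primitive.
    weights:    length-K sequence of non-negative coefficients (at least one
                positive); they are normalized to sum to 1.
    """
    w = np.clip(np.asarray(weights, dtype=float), 0, None)
    w = w / w.sum()
    return w @ np.asarray(primitives, dtype=float)

# Illustrative use with three hypothetical 6-DOF primitives.
primitives = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # rest
                       [0.4, -0.6, 0.2, 0.8, 0.0, 0.1],   # reach left
                       [0.9, 0.1, -0.3, 0.5, 0.2, 0.0]])  # reach up
pose = arm_pose_from_primitives(primitives, [0.2, 0.5, 0.3])
```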

Basic Mimicry Autonomous operation Visually identified trajectories First step toward social learning

Mimicry based on Animacy Only animate trajectories are possible targets Match to human face scale or to perceived object extent

Mimicry based on Joint Reference Target selection is based on head orientation and animacy constraints Responds to natural social cues Use of joint reference as a saliency metric

Reaching based on Intent Head Orientation drives eye position Intent drives pointing Instructions –Get the robot’s attention –Look at the block –Get the robot’s attention again –Reach for the block

Evaluating Social Behaviors (Audley, Scassellati & Turkle)
Do naïve subjects produce and recognize the appropriate social cues?
Can they successfully instruct the robot to perform simple actions among many distractors?
Future: degrading performance to match autistic behavior

End of the Road? (system architecture diagram, repeated from above)

Conclusions Proposed an embodied, perceptually grounded model of theory of mind Implemented system that –Determines saliency –Judges animacy –Engages in joint reference –Attributes basic intent Demonstrated an application to simple social mimicry as a first step toward social learning

The Future
Increases in computational power
Drive for interactive technology
Integration of many sub-disciplines
Theory of mind skills will be central to any technology that interacts with people

Acknowledgements Committee –Rodney Brooks –Leslie Pack Kaelbling –Eric Grimson Cog Team –Bryan Adams –Aaron Edsinger –Matt Marjanovic Kismet Team –Cynthia Breazeal –Paul Fitzpatrick –Lijin Aryananda –Paulina Varchavskaia Lazlo Team –Aaron Edsinger –Una-May O’Reilly