The Next Generation of Robots?
Rodney Brooks and Una-May O'Reilly
Our Objectives
- How can biology inform robotic competence?
- How can aspects of human development and social behavior inform robotic competence?
Our Approach
- Exploit the advantages of the robot's physical embodiment
- Integrate multiple sensory and motor systems to provide robust and stable behavioral constraints
- Capitalize on social cues from an instructor
- Build adaptive systems with a developmental progression to limit complexity
Our Humanoid Platforms
- Cog and Kismet
Biological Inspiration for Cog
- Cog has simulated musculature in its arms
- Cog implements a human model of visual search and attention
- Cog employs context-based attention; internal situations influence action
- Cog uses a naïve model of physics to distinguish animate from inanimate objects
Social Inspiration for Cog
- A theory of mind
- A theory of body
- Mimicry
Human <—> Robot
[Diagram: Kismet's sensing and actuation — wide- and narrow-field-of-view cameras, microphones, speech synthesizer; degrees of freedom include neck pan, neck tilt, neck lean, eye tilt, left and right eye pan, facial features, and head orientation, conveying gaze direction]
Levels of Control
[Diagram: four-layer control hierarchy — Social Level (robot responds to human, human responds to robot), Behavior Level (perceptual feedback, current goal), Skills Level (coordination between motor modalities, current primitives), Primitives Level]
Kismet's Competencies
- Direct visual attention
- Recognize socially communicated reinforcement
- Communicate internal state to a human
- Regulate social interaction
No One in Charge
- 11 400–500 MHz PCs: QNX (vision), Linux (speech recognition), NT (speech synthesis & vocal affect recognition)
- 4 Motorola 68332 microcontrollers running L (multi-threaded Lisp): higher-level perception, motivation, behavior, motor skill integration, and face control
- Interconnect: CORBA, dual-port RAM
- Sensors and actuators: cameras, microphone, speakers; eye, neck, and jaw motors; ear, eyebrow, eyelid, and lip motors
[Diagram: process modules — tracker, attention system, distance to target, motion filter, eye finder, skin color, motor control, audio/speech comms, face control, emotive response, percept & motor, drives & behavior]
Visual Attention
[Diagram: a frame grabber feeds skin tone, color, motion, and habituation feature maps; each map is weighted (w) and summed into an attention map with inhibit/reset, modulated by top-down, task-driven influences; the result drives eye motor control]
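The attention scheme on this slide can be sketched as a weighted sum of feature maps, with habituation suppressing already-attended regions. This is an illustrative sketch, not Kismet's implementation; the function names and normalization are assumptions.

```python
import numpy as np

def attention_map(skin, color, motion, habituation, weights):
    """Combine feature maps (HxW arrays in [0, 1]) into one saliency map.

    A sketch of the weighted-sum attention scheme; the top-down
    motivation system would set the entries of `weights`.
    """
    salience = (weights["skin"] * skin
                + weights["color"] * color
                + weights["motion"] * motion)
    # Habituation suppresses regions attended for too long,
    # so the eyes eventually move on.
    salience *= (1.0 - habituation)
    return salience

def pick_target(salience):
    # The eye motor controller saccades to the most salient location.
    return np.unravel_index(np.argmax(salience), salience.shape)
```

Raising the motion weight, for example, makes the robot preferentially orient to moving stimuli, which is one way top-down influences bias the bottom-up maps.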
Visual Search
Social Constraints
[Diagram: Kismet's interaction envelope]
- Comfortable interaction distance: too close – withdrawal response (person backs off); too far – calling behavior (person draws closer); beyond sensor range – no response
- Comfortable interaction speed: too fast – irritation response; too fast and too close – threat response
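The envelope above amounts to thresholding the person's distance and approach speed into a small set of regulatory behaviors. A minimal sketch, with made-up threshold values (the robot's actual values are not given on the slide):

```python
def social_response(distance_m, speed_m_s,
                    comfort_dist=(0.5, 2.0), max_speed=0.4, sensor_range=3.0):
    """Map a person's distance and approach speed to a regulatory behavior.

    Thresholds are illustrative placeholders, not Kismet's actual values.
    """
    if distance_m > sensor_range:
        return "no response"          # person is beyond sensor range
    if speed_m_s > max_speed:
        # Fast approaches are aversive; fast *and* close is threatening.
        return "threat" if distance_m < comfort_dist[0] else "irritation"
    if distance_m < comfort_dist[0]:
        return "withdrawal"           # cues the person to back off
    if distance_m > comfort_dist[1]:
        return "calling"              # cues the person to draw closer
    return "engaged"                  # inside the comfortable envelope
```

The point of these responses is that they are legible to the human: withdrawal and calling nudge the person back into the envelope where interaction works best.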
Cross-Cultural Affect
[Figure: evidence for four prosodic contours in Kismet-directed speech — pitch f0 (kHz) vs. time (ms) for approval ("That's a good bo-o-y!"), prohibition ("No no baby."), attention ("Can you get it?"), and comfort ("MMMM Oh, honey.")]
Affect Recognizer
[Diagram: staged classifier over pitch mean and energy variance — stage one separates soothing & low-intensity neutral speech from everything else; later stages separate soothing from low-intensity neutral, and approval & attention from prohibition and high-intensity neutral; outputs: approval, attention, soothing, prohibition, neutral]
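The staged decision structure in the diagram can be sketched as nested threshold tests on the two prosodic features. The thresholds below are invented for illustration; the slide only gives the structure, not the boundaries.

```python
def classify_affect(pitch_mean_hz, energy_variance):
    """Staged affect classifier over prosodic features.

    Mirrors the slide's decision structure; all numeric thresholds
    here are illustrative placeholders.
    """
    # Stage 1: low energy variance separates soothing and
    # low-intensity neutral speech from the high-energy classes.
    if energy_variance < 0.2:
        # Stage 2a: soothing speech tends to have higher, smoother pitch.
        return "soothing" if pitch_mean_hz > 250 else "neutral"
    # Stage 2b: among high-energy utterances, exaggerated pitch marks
    # approval/attention, while low, clipped pitch marks prohibition.
    if pitch_mean_hz > 300:
        return "approval/attention"
    if pitch_mean_hz < 180:
        return "prohibition"
    return "neutral"
```

Staging the classifier this way lets each stage use only the features that actually separate its classes, rather than one flat decision over everything.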
Naïve Subjects
- 5 female subjects: 4 naïve subjects, 1 caregiver
- Four contours and neutral speech: praise, prohibition, attention, soothing
- Multiple languages: French, German, Indonesian, English, Russian
Driven by Human
Facial Expressions
[Figure: arousal–valence circumplex — arousal axis from sleep to excitement, valence axis from displeasure to pleasure; emotion labels include afraid, angry, frustrated, stressed, elated, happy, content, relaxed, calm, bored, sad, depressed, fatigued, sleepy, surprise, neutral]
Facial Postures in Affect Space
[Figure: three-dimensional affect space (arousal, valence, stance) with open vs. closed stance — high arousal/negative valence: fear, anger; positive valence: content, accepting, surprise; low arousal: tired, unhappy; closed stance: disgust, stern]
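Placing basis postures at points in (arousal, valence, stance) space suggests generating intermediate expressions by interpolation: blend the basis postures with weights that fall off with distance from the current affective state. The inverse-distance scheme below is a sketch of that idea, not Kismet's actual interpolation method; the basis postures and joint values are hypothetical.

```python
import math

def blend_expression(arousal, valence, stance, basis):
    """Blend basis facial postures by proximity in affect space.

    `basis` maps a name to ((arousal, valence, stance), joint_targets).
    Inverse-distance weighting is an illustrative choice.
    """
    point = (arousal, valence, stance)
    # Closer basis postures get larger weights.
    weights = {name: 1.0 / (math.dist(point, pos) + 1e-6)
               for name, (pos, _) in basis.items()}
    total = sum(weights.values())
    n_joints = len(next(iter(basis.values()))[1])
    blended = [0.0] * n_joints
    for name, (_, posture) in basis.items():
        w = weights[name] / total
        for i, joint in enumerate(posture):
            blended[i] += w * joint
    return blended
```

At a basis point the blend reproduces that posture; between points the face moves smoothly through intermediate expressions as the affective state drifts.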
Face, Voice, Posture
Turn-Taking / Proto-Dialog
- Naïve subjects told to "talk to the robot"
- Engage in turn-taking
- No understanding (on either side) of content
Implemented Model of Visual Search and Attention
[Diagram: weighted (w) color, motion, skin, and habituation feature maps combine into an activation map; the motivation system sets the weights; the activation map drives the motor system]
Hardware – Cog's Arms
- 6 DOF in each arm
- Series elastic actuators
- Force control via a spring law
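A series elastic actuator places a spring between the motor and the load, so output force follows Hooke's law on the spring's deflection and can be controlled by servoing the motor position. A minimal sketch of that spring law and a proportional force loop (the controller structure and gains are illustrative, not Cog's actual implementation):

```python
def sea_force(k_spring, motor_pos, load_pos):
    """Series elastic actuator output force: Hooke's law on the
    deflection of the spring between motor and load, F = k * (x_m - x_l)."""
    return k_spring * (motor_pos - load_pos)

def sea_motor_command(force_desired, force_measured, gain):
    """Proportional force controller (illustrative): command the motor
    to change the spring deflection until measured force matches desired."""
    return gain * (force_desired - force_measured)
```

Measuring force through spring deflection rather than a stiff load cell makes the arm compliant and tolerant of impacts, which is why force control suits a robot meant to interact physically with people.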
Hardware – Cog's Head
- 7 degrees of freedom
- Human speed and range of motion
Visual and Inertial Sensors
[Figure: 3-axis inertial sensor; paired peripheral (wide) and foveal (narrow) camera views]
Computational System
- Designed for real-time responses
- Network of 24 PCs ranging from 200–800 MHz
- QNX real-time operating system
- Implementation shown today: ~26 QNX processes, ~75 QNX threads