Human – Robot Communication
Paul Fitzpatrick

Human – Robot Communication
• Motivation for communication
• Human-readable actions
• Reading human actions
• Conclusions

Motivation  What is communication for? –Transferring information –Coordinating behavior  What is it built from? –Commonality –Perception of action –Protocols

Communication protocols
• Computer – computer protocols: TCP/IP, HTTP, FTP, SMTP, …

Communication protocols
• Human – human protocols: initiating conversation, turn-taking, interrupting, directing attention, …
• Human – computer protocols: shell interaction, drag-and-drop, dialog boxes, …

Communication protocols
• Human – human protocols: initiating conversation, turn-taking, interrupting, directing attention, …
• Human – computer protocols: shell interaction, drag-and-drop, dialog boxes, …
• Human – robot protocols: ?
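
To make "protocol" concrete: a minimal sketch of a conversational turn-taking protocol as a state machine. This is illustrative only; the states, names, and transition logic are my own, not from the deck.

```python
from enum import Enum, auto

class Turn(Enum):
    OPEN = auto()    # floor is free: either party may claim it
    ROBOT = auto()
    HUMAN = auto()

class TurnTaking:
    """Toy turn-taking protocol: hold the floor while speaking,
    yield on interruption, release the floor on silence."""

    def __init__(self):
        self.state = Turn.OPEN

    def update(self, robot_speaking: bool, human_speaking: bool) -> Turn:
        if self.state is Turn.OPEN:
            if human_speaking:
                self.state = Turn.HUMAN
            elif robot_speaking:
                self.state = Turn.ROBOT
        elif self.state is Turn.ROBOT:
            if human_speaking:
                self.state = Turn.HUMAN   # interruption: yield the floor
            elif not robot_speaking:
                self.state = Turn.OPEN    # utterance finished
        elif self.state is Turn.HUMAN and not human_speaking:
            self.state = Turn.OPEN        # silence: floor becomes free
        return self.state
```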

Requirements on robot
• Human-oriented perception: person detection and tracking, pose estimation, identity recognition, expression classification, speech/prosody recognition, objects of human interest
• Human-readable action: clear locus of attention, expressing engagement, expressing confusion and surprise, speech/prosody generation
[Figure: example robot state, e.g. ENGAGED, ACQUIRED, Pointing (53,92,12), Fixating (47,98,37), Saying “/o’ver[200] \/there[325]”]

Example: attention protocol
• Expressing attention
• Influencing other’s attention
• Reading other’s attention

• Motivation for communication
• Human-readable actions: foveate gaze
• Reading human actions
• Conclusions

Human gaze reflects attention (Taken from C. Graham, “Vision and Visual Perception”)

Types of eye movement
[Figure: left and right eyes and their vergence angle; a ballistic saccade reorients gaze to a new target; smooth pursuit and vergence co-operate to track the object. Based on Kandel & Schwartz, “Principles of Neural Science”.]
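
A rough sketch of how these movement types might combine in a gaze controller. This is illustrative, not Kismet's actual control code; the threshold and gain values are invented.

```python
SACCADE_THRESHOLD = 0.2   # rad: errors larger than this trigger a saccade
PURSUIT_GAIN = 0.3        # fraction of error corrected per control step

def gaze_step(eye_angle: float, target_angle: float) -> float:
    """One control step per eye: ballistic saccade for large errors,
    smooth pursuit for small ones."""
    error = target_angle - eye_angle
    if abs(error) > SACCADE_THRESHOLD:
        return target_angle                    # saccade: jump to the target
    return eye_angle + PURSUIT_GAIN * error    # pursuit: close in gradually

def vergence_angle(left_angle: float, right_angle: float) -> float:
    """Vergence between the eyes; it grows as the target gets nearer."""
    return left_angle - right_angle
```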

Engineering gaze: Kismet

Collaborative effort
• Cynthia Breazeal
• Brian Scassellati
• And others
• I will describe the components I’m responsible for

Engineering gaze

[Figure: a “cyclopean” camera and a stereo pair.]

Tip-toeing around 3D
[Figure: a wide-view camera and a narrow-view camera; when the object of interest lies outside the narrow field of view, rotating the camera brings it into the new field of view.]
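
The geometry behind "rotate camera" can be sketched with a pinhole model. This is a hypothetical helper, not from the deck; the function name and focal-length parameter are assumptions.

```python
import math

def recenter(target_px, image_size_px, focal_length_px):
    """Pan/tilt (radians) that would rotate the narrow camera so an object
    seen at pixel target_px in the wide image lands on the optical axis."""
    cx, cy = image_size_px[0] / 2.0, image_size_px[1] / 2.0
    pan = math.atan2(target_px[0] - cx, focal_length_px)
    tilt = math.atan2(target_px[1] - cy, focal_length_px)
    return pan, tilt

# e.g. recenter((400, 150), (640, 480), 500.0)
```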

Example

Influences on attention

Influences on attention
• Built-in biases

Influences on attention
• Built-in biases
• Behavioral state

Influences on attention
• Built-in biases
• Behavioral state
• Persistence
[Figure: the tracked target slipped … then recovered.]
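
A sketch of how such influences might combine in a Kismet-style attention system: bottom-up feature maps weighted by behavioral state, with habituation providing persistence. The map names and weights are my own assumptions.

```python
import numpy as np

def attention_map(color, motion, skin, weights, habituation):
    """Combine feature maps into one saliency map.
    `weights` encodes behavioral state (e.g. seeking a toy boosts color,
    seeking a person boosts skin); `habituation` suppresses regions
    attended too long, giving persistence without lock-in."""
    saliency = (weights['color'] * color
                + weights['motion'] * motion
                + weights['skin'] * skin)
    saliency -= habituation
    return np.clip(saliency, 0.0, None)

# Locus of attention = peak of the combined map:
# y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
```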

Directing attention

• Motivation for communication
• Human-readable actions
• Reading human actions: head pose estimation
• Conclusions

Head pose estimation (rigid)
• Yaw*, pitch*, roll* (*nomenclature varies)
• Translation in X, Y, Z
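
In other words, rigid head pose is a 6-DOF transform: a rotation built from the three angles plus a translation. A minimal sketch, using one common angle convention (as the slide notes, conventions vary):

```python
import numpy as np

def rotation(yaw, pitch, roll):
    """3x3 rotation from yaw (about Y), pitch (about X), roll (about Z)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Rz = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])
    return Ry @ Rx @ Rz

def pose(point, yaw, pitch, roll, t):
    """Apply the rigid 6-DOF head pose to a head-frame point."""
    return rotation(yaw, pitch, roll) @ np.asarray(point) + np.asarray(t)
```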

Head pose literature
• Horprasert, Yacoob, Davis ’97
• McKenna, Gong ’98
• Wang, Brandstein ’98
• Basu, Essa, Pentland ’96
• Harville, Darrell, et al. ’99

Head pose: anthropometrics (Horprasert, Yacoob, Davis)

Head pose: eigenpose (McKenna, Gong)

Head pose: contours (Wang, Brandstein)

Head pose: mesh model (Basu, Essa, Pentland)

Head pose: integration (Harville, Darrell, et al.)

My approach
• Integrate changes in pose (after Harville et al.)
• Use a mesh model (after Basu et al.)
• Need automatic initialization:
  – Head detection, tracking, segmentation
  – Reference orientation
  – Head shape parameters
• Initialization drives the design

Head tracking, segmentation
• Segment by color histogram and grouped motion
• Match against an ellipse model (M. Pilu et al.)
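
A sketch of this kind of segment-then-fit pipeline in modern OpenCV terms. This is not the original implementation: the grouped-motion cue is omitted, and the threshold value is invented.

```python
import cv2

def track_head(frame_bgr, skin_hist):
    """Segment skin-colored pixels via histogram backprojection, then fit
    an ellipse to the largest blob as the head hypothesis."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Backproject a precomputed hue-saturation skin-color histogram.
    backproj = cv2.calcBackProject([hsv], [0, 1], skin_hist,
                                   [0, 180, 0, 256], 1)
    mask = cv2.threshold(backproj, 50, 255, cv2.THRESH_BINARY)[1]
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    blob = max(contours, key=cv2.contourArea)
    if len(blob) < 5:               # fitEllipse needs at least 5 points
        return None
    return cv2.fitEllipse(blob)     # ((cx, cy), (width, height), angle)
```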

Mutual gaze as reference point
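
The idea: when the person looks straight at the robot, head orientation is at a known reference, so the integrated pose can be re-zeroed there. As a crude modern stand-in (an assumption of mine, not the deck's method), frontal-face detection can serve as the mutual-regard test:

```python
import cv2

frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mutual_gaze(gray_head_region):
    """Frontal-face detection as a proxy for mutual regard: a frontal hit
    means the head is near the reference orientation, so the tracker's
    integrated pose can be reset to identity here."""
    faces = frontal.detectMultiScale(gray_head_region, 1.1, 5)
    return len(faces) > 0
```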

Tracking pose changes
• Choose coordinates to suit tracking
• 4 of 6 degrees of freedom are measurable from a monocular image, independent of shape parameters:
  – X translation
  – Y translation
  – Translation in depth
  – In-plane rotation
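
One way to see why these four are image-measurable: they correspond to a 2D similarity transform between frames, which can be estimated from tracked feature points. A sketch (my illustration, not the talk's tracker), with scale change standing in for translation in depth:

```python
import cv2
import numpy as np

def image_plane_motion(pts_prev, pts_curr):
    """Estimate the 4 image-measurable DOF between two frames from
    corresponding feature points (Nx2 float32 arrays)."""
    M, _ = cv2.estimateAffinePartial2D(pts_prev, pts_curr)  # 2x3 similarity
    if M is None:
        return None
    tx, ty = M[0, 2], M[1, 2]                 # X and Y translation
    scale = np.hypot(M[0, 0], M[1, 0])        # scale change ~ depth motion
    rotation = np.arctan2(M[1, 0], M[0, 0])   # in-plane rotation (radians)
    return tx, ty, scale, rotation
```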

Remaining coordinates
• 2 degrees of freedom remaining
• Choose them as surface coordinates on the head
• They specify where the image plane is tangent to the head
• This isolates the effect of errors in the parameters
[Figure: the tangent region shifts when the head rotates in depth.]
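
To see why the tangent region moves, approximate the head as an ellipsoid: the tangent point is where the outward surface normal points at the camera, and it slides over the surface as the head rotates in depth. A small sketch; the ellipsoid stand-in is my illustration, not the talk's mesh model.

```python
import numpy as np

def tangent_point(R, center, radii):
    """Point on an ellipsoidal head model where the image plane is
    (approximately) tangent: the surface point whose outward normal
    points toward the camera. R is the head rotation, `radii` the
    ellipsoid semi-axes; the camera is taken to lie along -Z."""
    n_head = R.T @ np.array([0.0, 0.0, -1.0])  # camera direction, head frame
    a = np.asarray(radii, dtype=float)
    # On x^2/a^2 + y^2/b^2 + z^2/c^2 = 1, the normal at p is p / a^2
    # (elementwise), so the point with normal along n_head is:
    p_head = (a**2 * n_head) / np.linalg.norm(a * n_head)
    return np.asarray(center) + R @ p_head
```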

Surface coordinates
• Establish a surface coordinate system with the mesh

Initializing a surface mesh

Example

Typical results (ground truth due to Sclaroff et al.)

Merits
• No need for any manual initialization
• Capable of running for long periods
• Tracking accuracy is insensitive to the model
• User independent
• Real-time

Problems
• Greater accuracy is possible with manual initialization
• Deals poorly with certain classes of head movement (e.g. 360° rotation)
• Can’t initialize without occasional mutual regard

• Motivation for communication
• Human-readable actions
• Reading human actions
• Conclusions

Other protocols

Other protocols
• Protocol for negotiating interpersonal distance
[Figure: interaction zones. Comfortable interaction distance; too close → withdrawal response; too far → calling behavior; transitions as the person draws closer or backs off; beyond sensor range.]
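
A sketch of the distance protocol as a zone-based behavior selector. The zone boundaries and behavior names are illustrative assumptions, not values from the talk.

```python
TOO_CLOSE = 0.4      # meters (illustrative zone boundaries)
TOO_FAR = 3.0
SENSOR_RANGE = 6.0

def distance_behavior(person_distance_m):
    """Map the sensed interpersonal distance onto a social response."""
    if person_distance_m is None or person_distance_m > SENSOR_RANGE:
        return "idle"        # beyond sensor range
    if person_distance_m < TOO_CLOSE:
        return "withdraw"    # too close: withdrawal response
    if person_distance_m > TOO_FAR:
        return "call"        # too far: calling behavior
    return "engage"          # comfortable interaction distance
```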

Other protocols  Protocol for negotiating interpersonal distance  Protocol for controlling the presentation of objects Comfortable interaction speed Too fast – irritation response Too fast, Too close – threat response

Other protocols
• Protocol for negotiating interpersonal distance
• Protocol for controlling the presentation of objects
• Protocol for conversational turn-taking
• Protocol for introducing vocabulary
• Protocol for communicating processes
Protocols make good modules

[Architecture diagram: the system spans several machines linked by CORBA, sockets, and dual-port RAM. QNX hosts the attention system, tracker, distance-to-target estimation, skin, color, and motion filters, eye finder, and motor control; Linux runs speech recognition and speech comms; NT runs speech synthesis and affect recognition; face control and the emotion, percept & motor, and drives & behavior systems run on L. Head tracking, pose recognition, and pose tracking feed in. I/O: cameras and microphone in; eye, neck, and jaw motors, ear, eyebrow, eyelid, and lip motors, and speakers out.]

[Architecture diagram of the vision and gaze-control system: a wide camera and left/right foveal cameras feed frame grabbers; skin, color, motion, and face detectors feed the attention system, which selects a salient target; a wide tracker supplies the tracked target and locus of attention, and foveal disparity gives distance to target; an arbitrator for eye-head-neck control chooses among saccades with neck compensation, smooth pursuit & vergence with neck compensation, VOR, fixed action patterns, and affective postural shifts with gaze compensation, driving the eye-neck motors via a motion control daemon under the influence of behaviors and motivations.]

Other protocols  What about robot – robot protocol?  Basically computer – computer  But physical states may be hard to model  Borrow human – robot protocol for these

Current, future work
• Protocols for reference:
  – Know how to point to an object
  – How to point to an attribute? Or an action?
• Until a better answer comes along:
  – Communicate a task/game that depends on the attribute/action
  – Pull out the number of classes and positive/negative examples for supervised learning

FIN