Vision-Based Interactive Systems Martin Jagersand c610
Applications for vision in user interfaces
Interaction with machines and robots:
– Service robotics
– Surgical robots
– Emergency response
Interaction with software:
– A store or museum information kiosk
Service robots: mobile manipulators, semi-autonomous (DIST, TU Berlin, KAIST)
TORSO with 2 WAMs (Barrett Whole Arm Manipulators)
Service tasks: these demos are completely hardwired! Found no real task examples on the WWW.
But: maybe the first applications will be in tasks humans can't do?
Why is humanlike robotics so hard to achieve?
See the human task:
– Tracking motion, seeing gestures
Understand:
– Motion understanding: translate to the correct reference frame
– High-level task understanding?
Do:
– Vision-based control
Types of robotic systems (spanning axes of autonomy and generality):
– Supervisory control
– Tele-assistance
– Programming by demonstration
– Preprogrammed systems
Interaction styles
Conventional (e.g. textual programming: "if A then ... end"):
– Low-bandwidth interaction
– Partial or indirect system state displayed
– User works from an internal mental model
Interaction styles
Direct manipulation:
– High-bandwidth interaction
– Interact directly and intuitively with objects (affordance)
– See system state (visibility)
– (Reversible actions)
Examples of direct manipulation:
– Drawing programs, e.g. MacPaint
– Video games, flight simulators
– Robot/machine teaching by showing
– Tele-assistance
– Spreadsheet programs
– Some window-system desktops
But can you always see the effects (visibility)?
xfig drawing program:
– Icons afford use
– Results are visible
– Direct spatial action-result mapping
Contrast Matlab drawing, where the same figure is built by typed commands:
    line([10, 20],[30, 85]);
    patch([35, 22],[15, 35], C);   % C is a more complex structure
    text(70,30,'Kalle');           % potentially add font, size, etc.
Why direct manipulation?
– Recognition is quicker than recall
– Humans use "the world" as memory/model
– Humans are skilled at interacting spatially
How quick is "direct"? Subsecond! Experiments show human performance degrades at a 0.4 s delay.
Vision- and touch-based UI
Typical UI today: symbolic, 1D (sliders), 2D
But humans are skilled at 3D, 6D, n-D spatial interaction with the world
This supports direct manipulation!
Seeing a task
Tracking movement:
– See directions and movements in tasks
Recognizing gestures:
– Static hand and body postures
Combination: spatio-temporal gestures
Tracking movement
Tracking the human is hard:
– Appearance varies
– Large search space: ~60 parameters
– Unobservable: joint angles have to be inferred from limb positions, clothing, etc.
– Motion is non-linear
– Difficult to track 3D from 2D image-plane information
– Self-occlusion of limbs
Trick 1: Physical model
Reduce the number of DOFs with a coupled model of articulated motion (Hedvig, Mike); see the sketch below.
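A minimal sketch (not from the slides) of one way to build such a coupled model: learn a low-dimensional linear basis for the ~60 joint angles from example motions via PCA, so a few coupled parameters drive the whole articulated pose. The array `poses` and both function names are hypothetical.

    import numpy as np

    def learn_coupled_model(poses, k=5):
        # `poses`: hypothetical (n_frames x ~60) array of joint angles
        # recorded from example motions.
        mean = poses.mean(axis=0)
        _, _, Vt = np.linalg.svd(poses - mean, full_matrices=False)
        return mean, Vt[:k]              # k coupled-motion basis vectors

    def pose_from_params(params, mean, basis):
        # A k-dimensional parameter vector reconstructs a full-DOF pose.
        return mean + params @ basis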
Trick 2: Use the distinctiveness of skin color
Skin-colored regions can be tracked in real time, e.g. as sketched below.
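A minimal sketch of real-time skin-color tracking. The chromaticity thresholds are illustrative assumptions only; a real system would fit a skin-color model to training pixels.

    import numpy as np

    def track_skin(frame_rgb, r_range=(0.38, 0.52), g_range=(0.25, 0.36)):
        # Classify pixels as skin by chromaticity (r, g) = (R, G) / (R+G+B),
        # which is fairly insensitive to brightness; thresholds are assumed.
        rgb = frame_rgb.astype(np.float64)
        s = rgb.sum(axis=2) + 1e-6
        r, g = rgb[..., 0] / s, rgb[..., 1] / s
        mask = (r > r_range[0]) & (r < r_range[1]) & \
               (g > g_range[0]) & (g < g_range[1])
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            return None                  # no skin-colored pixels found
        return xs.mean(), ys.mean()      # blob centroid = hand position estimate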
Gestures
Identifying gestures is hard:
– Hard to segment hand parts
– Self-occlusion
– Variability in viewpoints
Trick 3: Scale space
Define hand gestures in coarse-to-fine terms, e.g. by searching an image pyramid as sketched below.
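A minimal coarse-to-fine sketch: build an image pyramid, search exhaustively for a hand template only at the coarsest level, then refine the estimate in a small window at each finer level. All function names here are hypothetical, and a real implementation would Gaussian-blur before decimating.

    import numpy as np

    def downsample(img):
        # Crude 2x decimation; blur should precede this in practice.
        return img[::2, ::2]

    def best_match(img, tmpl, center=None, radius=None):
        # Template position minimizing sum-of-squared differences, searched
        # over the whole image or in a window around `center`.
        H, W = img.shape
        h, w = tmpl.shape
        if center is None:
            ys, xs = range(H - h + 1), range(W - w + 1)
        else:
            ys = range(max(0, center[0] - radius), min(H - h + 1, center[0] + radius + 1))
            xs = range(max(0, center[1] - radius), min(W - w + 1, center[1] + radius + 1))
        best, pos = np.inf, (0, 0)
        for y in ys:
            for x in xs:
                d = np.sum((img[y:y+h, x:x+w] - tmpl) ** 2)
                if d < best:
                    best, pos = d, (y, x)
        return pos

    def coarse_to_fine(img, tmpl, levels=3):
        img, tmpl = img.astype(np.float64), tmpl.astype(np.float64)
        pyr = [(img, tmpl)]
        for _ in range(levels - 1):
            img, tmpl = downsample(img), downsample(tmpl)
            pyr.append((img, tmpl))
        pos = best_match(*pyr[-1])                 # full search, coarsest level
        for im, tm in reversed(pyr[:-1]):
            pos = (pos[0] * 2, pos[1] * 2)         # project estimate up a level
            pos = best_match(im, tm, center=pos, radius=2)
        return pos                                 # (row, col) at full resolution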
Trick 4: Variability filters
Programming by demonstration
– From assembly relations
– From a temporal assembly sequence: segmenting the manipulation sequence into parts (subtasks) is hard (see the sketch below)
– Using a gesture language
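One plausible sketch of the segmentation step (an assumed approach, not necessarily the slides' method): split a recorded end-effector trajectory into subtasks at sustained pauses, i.e. runs of frames where the motion speed stays below a threshold. The array `traj` and the parameters are hypothetical.

    import numpy as np

    def segment_demo(traj, speed_thresh=0.01, min_pause=5):
        # `traj`: hypothetical (n_frames x 3) array of tracked hand positions.
        speed = np.linalg.norm(np.diff(traj, axis=0), axis=1)
        paused = speed < speed_thresh
        cuts, run = [], 0
        for i, p in enumerate(paused):
            run = run + 1 if p else 0
            if run == min_pause:         # a sustained pause marks a boundary
                cuts.append(i)
        return cuts                      # frame indices separating subtasks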
Tele-assistance: Gestures + context
Robust manipulations
Conclusions
Most aspects of "robot see, robot do" are hard.
Conventional methods are:
– Incapable of seeing the task
– Incapable of understanding what's going on
– Incapable of performing human manipulation tasks
Uncalibrated methods are more promising (see the sketch below).
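A minimal sketch of the uncalibrated approach: image-based visual servoing where the image Jacobian is estimated online with a Broyden secant update, so no camera or hand-eye calibration is needed. The interface functions `observe_features` and `move_joints` are hypothetical placeholders for the robot/vision system.

    import numpy as np

    def servo_step(J, q, f, f_goal, observe_features, move_joints, gain=0.2):
        # Solve J dq = gain * (f_goal - f) in the least-squares sense.
        dq = np.linalg.pinv(J) @ (gain * (f_goal - f))
        move_joints(q + dq)
        f_new = observe_features()
        # Broyden rank-1 secant update: correct J by the prediction error
        # of the observed feature change along the direction dq.
        J = J + np.outer((f_new - f) - J @ dq, dq) / (dq @ dq + 1e-12)
        return J, q + dq, f_new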