Developing systems with advanced perception, cognition, and interaction capabilities for learning a robotic assembly in one day
Dr. Dimitrios Tzovaras

Presentation transcript:

Developing systems with advanced perception, cognition, and interaction capabilities for learning a robotic assembly in one day

Dr. Dimitrios Tzovaras, Director of CERTH/ITI, Researcher Grade A’
Email: dimitrios.tzovaras@iti.gr

Teaching from Demonstration for Robotic Assembly Tasks

Problem definition: enable a non-expert user to teach a new assembly task to an industrial robot in less than a day, with no explicit programming required.

Motivation: even expensive products produced in large volumes are still assembled manually in low-wage countries, under harsh conditions.

Approach: extend the robotic system with advanced perception and cognition abilities, and develop a user-friendly Human Robot Interaction (HRI) interface through which the human operator demonstrates the task.

Our goal is to enable a non-expert user to teach a new assembly task to an industrial robot in less than a day, without conventional robot programming. This is important because it will allow automation for expensive products produced in large volumes that are still assembled manually in low-wage countries under harsh conditions. To achieve this, we extend the robotic system with advanced perception and cognition abilities. Moreover, we have developed a user-friendly Human Robot Interaction (HRI) interface that allows the human instructor to demonstrate the assembly task. An overview of the proposed approach is illustrated in the presented diagram.

Assembly Key-frame Extraction: Automatic Extraction

Automatic Key-frame identification is based on semantic graphs extracted from image sequences [1]. Employing the 3D hand-object tracking results, we can:
- automatically extract kinematics and motion information,
- perform more accurate and robust segmentation using 2D rendered images instead of watershed segmentation,
- extend the method to 3D data, using ellipsoids to fit the object models [2], resulting in additional semantic relationships between the objects.

The method has been implemented and tested on assembly video samples from RGB-D data, with segmented masks based on 2D rendered images constructed from the models of the tracked objects.

To automatically extract the Key-frame sequence, semantic graphs are generated using segmented images of the demonstrated assembly; segmentation is performed on synthetic images using the tracking results. An example sequence (Key-frames 01 to 04) is illustrated in the presented figure.

[1] "Learning the semantics of object–action relations by observation", Int. Journal of Robotics Research, 2011, Aksoy et al.
[2] "Keyframe extraction with semantic graphs in assembly processes", IEEE Robotics and Automation Letters, 2017, Piperagkas et al.
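To make the key-frame criterion concrete, here is a minimal sketch (an assumed mechanism in the spirit of Aksoy et al., not the project's code) of semantic-graph-based key-frame extraction: each frame is reduced to the set of object pairs in contact, and a key-frame is declared whenever that set changes between consecutive frames.

```python
# Minimal sketch (not the project's code): key-frame extraction from a
# sequence of semantic graphs. A "semantic graph" here is just the set of
# object pairs whose models (or fitted ellipsoids) are in contact in a
# frame; a key-frame is declared whenever that relation set changes.

from itertools import combinations

CONTACT_THRESHOLD = 0.005  # metres; assumed value, tune per setup


def semantic_graph(frame_objects, distance_fn):
    """Return the set of touching object pairs for one frame.

    frame_objects: dict mapping object id -> tracked pose (or fitted ellipsoid)
    distance_fn:   callable giving the minimum distance between two objects
    """
    relations = set()
    for a, b in combinations(sorted(frame_objects), 2):
        if distance_fn(frame_objects[a], frame_objects[b]) < CONTACT_THRESHOLD:
            relations.add((a, b))
    return frozenset(relations)


def extract_keyframes(tracked_sequence, distance_fn):
    """Return indices of frames where the semantic graph changes."""
    keyframes = [0]  # the first frame is always kept
    previous = semantic_graph(tracked_sequence[0], distance_fn)
    for i, frame_objects in enumerate(tracked_sequence[1:], start=1):
        current = semantic_graph(frame_objects, distance_fn)
        if current != previous:  # contact topology changed -> new assembly state
            keyframes.append(i)
            previous = current
    return keyframes
```

In the actual system the graph can also encode richer relations (for example the automatically suggested "aligned axes" label mentioned later), but the change-detection principle is the same.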

Perception: Hand-Object Detection and Tracking in 3D

- RGB-D data are acquired; the initial pose estimate comes from detection.
- Object detection (6-DoF pose) is based on sparse auto-encoders for feature extraction and Hough forests for classification.
- 3D CAD models are employed both for training the object detector and for hand-object tracking: 6 DoF for the models of the assembly parts, 42 DoF for the hand models.
- Coarse hand detection of an open configuration is performed (real and synthetic data).
- Hand-object tracking is implemented using Particle Swarm Optimization (PSO).

We have augmented the system with visual sensors acquiring both RGB and depth images. Using sparse auto-encoders, features are automatically extracted from the acquired images and are fed to a Hough forest classifier that estimates the 3D pose of the objects in the scene. Training of the classifiers is based on synthetic data generated from the 3D CAD models of the objects. Assuming an open hand configuration and using the detected hand's contour, the hand's initial pose is also estimated. The detection results for hands and objects are used to initialize a hand-object tracking method based on Particle Swarm Optimization (PSO). Our approach builds on the hand tracking methods presented in the two papers below, extending them to perform joint hand-object tracking. Currently, tracking is performed off-line on recorded assembly sequences, requiring about 0.6 s per frame.

Based on the hand tracking approaches in:
- "Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks", Jonathan Tompson, Murphy Stein, Yann LeCun and Ken Perlin, SIGGRAPH 2014
- "Efficient model-based 3D tracking of hand articulations using Kinect", I. Oikonomidis, N. Kyriazis and A. Argyros, BMVC 2011

Modified optimization for joint hand-object tracking; optimization time: 0.6 s per frame.
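For illustration, the following is a minimal sketch of a PSO loop of the kind used for pose tracking. The particle layout (42 hand DoF plus 6 object DoF), the parameter values, and the objective are assumptions; the project's modified joint hand-object optimization is not reproduced here.

```python
# Minimal PSO sketch (assumed structure, not the project's implementation):
# each particle is a candidate pose vector for the hand (42 DoF) plus the
# manipulated object (6 DoF); the objective would typically compare a
# rendered depth map of that hypothesis against the observed depth image.

import numpy as np

DIM = 42 + 6                 # hand DoF + object DoF
N_PARTICLES = 64
N_GENERATIONS = 30
W, C1, C2 = 0.72, 1.5, 1.5   # standard inertia / cognitive / social weights


def pso_track(objective, init_pose, search_radius=0.05, rng=None):
    """Return the pose minimizing `objective`, seeded near `init_pose`.

    objective: callable(pose_vector) -> scalar discrepancy (e.g. depth error)
    init_pose: pose from the previous frame (or from the detector)
    """
    rng = rng or np.random.default_rng()
    x = init_pose + rng.uniform(-search_radius, search_radius, (N_PARTICLES, DIM))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_cost = np.array([objective(p) for p in x])
    gbest = pbest[pbest_cost.argmin()].copy()

    for _ in range(N_GENERATIONS):
        r1, r2 = rng.random((2, N_PARTICLES, 1))
        v = W * v + C1 * r1 * (pbest - x) + C2 * r2 * (gbest - x)
        x = x + v
        cost = np.array([objective(p) for p in x])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = x[improved], cost[improved]
        gbest = pbest[pbest_cost.argmin()].copy()
    return gbest
```

Per-frame tracking would reseed the swarm from the previous frame's solution, which is consistent with the reported off-line budget of roughly 0.6 s per frame.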

Assembly Key-frame Extraction: Definition of Key-frames

General information:
- Scenario id and current step
- Id(s) of the object(s) involved in the demonstration phase
- Relative timestamp

Kinematics and motion information:
- Object pose coordinates (position and orientation, 6 DoF)
- Hand pose (42 DoF)

Semantic information:
- User-defined labels corresponding to assembly states, e.g. grasping
- Automatic system suggestions, e.g. aligned axes

Dynamics information:
- Forces derived from kinesthetic learning
- Grasping contact points
- Object deformation characteristics

Key-frame information is stored in an XML format.

The generated tracking results are used to extract important information from the demonstrated sequence and to select its significant frames, called Key-frames. An XML format is used to store the information associated with each Key-frame. Apart from the kinematic information on the assembly parts and the instructor's hands, semantic information is extracted automatically, while the human instructor can also add semantic labels manually, selected via a dropdown menu.
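As an illustration of such a record, a hypothetical Key-frame in XML might look like the snippet embedded in the following sketch. All element and attribute names are invented for the example; the presentation only lists the information categories.

```python
# Hypothetical example of a single Key-frame record (element names are
# assumptions; only the information categories come from the presentation).
# Parsing it with the standard library confirms the record is well-formed.
import xml.etree.ElementTree as ET

KEYFRAME_XML = """
<keyframe scenario="phone_folding" step="2" timestamp="12.84">
  <objects>
    <object id="phone_body">
      <pose x="0.412" y="-0.057" z="0.031"
            roll="0.00" pitch="1.571" yaw="0.12"/>
    </object>
  </objects>
  <hand dof="42">0.11 0.03 ...</hand>  <!-- 42 joint values -->
  <semantics>
    <label source="user">grasping</label>
    <label source="system">aligned_axes</label>
  </semantics>
  <dynamics>
    <force frame="tool">0.0 0.0 -2.1</force>
    <contact_points>2</contact_points>
  </dynamics>
</keyframe>
"""

root = ET.fromstring(KEYFRAME_XML)
print(root.get("scenario"), [label.text for label in root.iter("label")])
```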

HRI Interface: Teaching Example

Workflow steps: create new assembly, preview CAD models, detect parts, detect hand and record assembly, extract Key-frames, semantic annotation of the Key-frames.

In the illustrated sequence, screenshots of the developed HRI are presented, acquired during the demonstration of a folding assembly of cell-phone parts. After creating a new assembly entry and loading the corresponding CAD models, the loaded parts are previewed in a virtual environment using Gazebo and GzWeb. Then the actual parts are placed in front of the camera sensor and detected by the system. The user's hand is also detected, and the demonstration is initiated and recorded. The Key-frames are extracted and presented to the user for inspection; the user can add or remove Key-frames and annotate them with semantic information.

Assembly Program Generation

A sequence of Key-frames is used to derive an assembly program based on the associated semantic information, using either Sequential Function Charts or Behavior Trees.

Using the extracted Key-frame sequence and the associated semantic information, an assembly program can be generated. Both Sequential Function Charts and Behavior Trees have been investigated, with promising results. However, a detailed analysis of these approaches is out of the scope of the current presentation.
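Although the generation step itself is out of scope here, a minimal, hypothetical sketch can convey the flavour of the Behavior Tree option: each semantically labelled Key-frame is mapped to an action node, and the root executes the actions in sequence. The label names, skill names, and classes below are assumptions, not the project's actual generator.

```python
# Minimal sketch (hypothetical): mapping a labelled Key-frame sequence onto a
# simple behaviour-tree-like structure. Each semantic label becomes an action
# node; the root runs them in order and fails as soon as any action fails.

LABEL_TO_ACTION = {          # assumed label -> robot skill mapping
    "grasping": "pick",
    "aligned_axes": "align",
    "insertion": "insert",
    "release": "place",
}


class Action:
    def __init__(self, name, target):
        self.name, self.target = name, target

    def tick(self, robot):
        # Delegate to the robot's skill library; returns True on success.
        return robot.execute(self.name, self.target)


class Sequence:
    def __init__(self, children):
        self.children = children

    def tick(self, robot):
        return all(child.tick(robot) for child in self.children)


def program_from_keyframes(keyframes):
    """keyframes: list of dicts with 'label' and 'object' keys."""
    actions = [
        Action(LABEL_TO_ACTION[k["label"]], k["object"])
        for k in keyframes
        if k["label"] in LABEL_TO_ACTION
    ]
    return Sequence(actions)
```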

Future Work

- Test bi-manual assemblies
- Examine different types of assembly: folding assembly, insertion by deformation
- Address assemblies with deformable parts

Our planned efforts include testing bi-manual assembly use cases, as well as different assembly types such as folding assembly or insertion by deformation. The latter case is particularly challenging, since it involves the assembly of deformable parts.

Thank you!