1 Developing systems with advanced perception, cognition, and interaction capabilities for learning a robotic assembly in one day
Dr. Dimitrios Tzovaras, Director of CERTH/ITI, Researcher Grade A’

2 Teaching from Demonstration for Robotic Assembly Tasks
Problem definition: enable a non-expert user to teach a new assembly task to an industrial robot in less than a day, with no explicit programming required.
Motivation: even expensive products produced in large volumes are still assembled manually in low-wage countries under harsh conditions.
Approach: extend the robotic system with advanced perception and cognition abilities, and develop a user-friendly Human Robot Interaction (HRI) interface through which the human operator demonstrates the task.
Our goal is to enable a non-expert user to teach a new assembly task to an industrial robot in less than a day, without conventional robot programming. This is important because it will allow automation for expensive products that are produced in large volumes yet are still assembled manually in low-wage countries under harsh conditions. To achieve this, we extend the robotic system with advanced perception and cognition abilities. Moreover, we have developed a user-friendly Human Robot Interaction (HRI) interface that allows the human instructor to demonstrate the assembly task. An overview of the proposed approach is illustrated in the accompanying diagram.

3 Assembly Key-frame extraction: Automatic extraction
Automatic Key-frame identification based on semantic graphs from image sequences [1]. Employing the 3D hand-object tracking results, we can:
- automatically extract kinematics and motion information
- perform more accurate and robust segmentation using 2D rendered images instead of watershed segmentation
- extend the method to 3D data, using ellipsoids to fit the object models [2], resulting in additional semantic relationships between the objects
Implemented and tested on assembly video samples from RGBD data. The segmented masks are based on 2D rendered images constructed from the models of the tracked objects.
In order to automatically extract the Key-frame sequence, semantic graphs are generated from segmented images of the demonstrated assembly. Segmentation is performed on synthetic images using the tracking results. An example sequence (Key-frames 01–04) is illustrated in the presented figure.
[1] Aksoy et al., "Learning the semantics of object–action relations by observation", Int. Journal of Robotics Research, 2011.
[2] Piperagkas et al., "Keyframe extraction with semantic graphs in assembly processes", IEEE Robotics and Automation Letters, 2017.
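
To make the mechanism concrete, below is a minimal sketch (not the project's implementation) of Key-frame extraction from semantic graphs: each frame is summarized as a graph whose nodes are the tracked entities and whose edges encode spatial relations, and a frame is marked as a Key-frame whenever that graph changes. The relation test, threshold, and function names are illustrative assumptions; the actual method derives relations from segmented/rendered masks and ellipsoid fits of the object models.

```python
from itertools import combinations
import numpy as np

# Hypothetical relation test: two entities are considered "touching" when their
# centroids are closer than a threshold (the actual system uses ellipsoid fits
# of the object models, which is more involved).
def touching(pose_a, pose_b, threshold=0.01):
    return np.linalg.norm(np.asarray(pose_a[:3]) - np.asarray(pose_b[:3])) < threshold

def semantic_graph(frame):
    """Summarize a frame as a set of (entity, relation, entity) edges.

    `frame` maps an entity name (e.g. "hand", "part_A") to its pose
    [x, y, z, rx, ry, rz]; only a crude proximity relation is used here.
    """
    edges = set()
    for (a, pose_a), (b, pose_b) in combinations(sorted(frame.items()), 2):
        if touching(pose_a, pose_b):
            edges.add((a, "touching", b))
    return edges

def extract_keyframes(frames):
    """Mark a Key-frame whenever the semantic graph changes between frames."""
    keyframes, previous = [], None
    for i, frame in enumerate(frames):
        graph = semantic_graph(frame)
        if graph != previous:
            keyframes.append(i)
            previous = graph
    return keyframes
```

In this toy form, the first frame and every frame in which a relation appears or disappears (e.g. the hand starts touching a part) become Key-frames, which mirrors the idea of segmenting the demonstration at semantic changes.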

4 Perception: Hand-Object Detection and Tracking in 3D
RGBD data are acquired; the initial pose estimate comes from detection.
Object detection (6-DoF pose) is performed using sparse auto-encoders for feature extraction and Hough Forests for classification.
3D CAD models are employed both for training the object detector and for performing hand-object tracking: 6 DoF for the models of the assembly parts, 42 DoF for the hand models (real and synthetic data).
Coarse hand detection of an open configuration is performed.
Hand-object tracking is implemented using Particle Swarm Optimization (PSO), with the optimization modified for joint hand-object tracking; optimization time is about 0.6 s per frame.
We have augmented the system with visual sensors acquiring both RGB and depth images. Using sparse auto-encoders, features are automatically extracted from the acquired images and are employed by a Hough Forest classifier for estimating the 3D pose of the objects in the scene. Training of the classifiers is based on synthetic data generated from the 3D CAD models of the objects. Assuming an open hand configuration and using the detected hand's contour, the hand's initial pose is also estimated. The detection results for hands and objects are used to initialize a hand-object tracking method based on Particle Swarm Optimization (PSO). Our approach builds on the hand tracking methods presented in the following two papers, extending them to perform joint hand-object tracking. Currently, tracking is performed off-line on recorded assembly sequences, requiring about 0.6 s per frame.
- Tompson, Stein, LeCun and Perlin, "Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks", SIGGRAPH 2014.
- Oikonomidis, Kyriazis and Argyros, "Efficient model-based 3D tracking of hand articulations using Kinect", BMVC 2011.
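
For readers unfamiliar with PSO-based tracking, the following is a generic sketch of optimizing a pose vector per frame with a particle swarm. The objective, dimensionality, particle count, and parameters are placeholder assumptions; the actual system jointly optimizes the 6-DoF object poses and the 42-DoF hand model against the observed RGBD data by rendering the models and comparing against the sensor frame.

```python
import numpy as np

def pso_track(objective, init_pose, n_particles=64, n_iters=30,
              w=0.72, c1=1.5, c2=1.5, spread=0.02, rng=None):
    """Generic Particle Swarm Optimization over a pose vector.

    `objective(pose)` is assumed to render the hand/object models at `pose` and
    return a discrepancy score against the observed frame (lower is better);
    `init_pose` is the pose from the previous frame (or from the detector).
    """
    rng = rng or np.random.default_rng(0)
    dim = len(init_pose)
    # Initialize particles around the previous pose (temporal continuity).
    x = init_pose + rng.normal(scale=spread, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_score = np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_score)].copy()

    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        scores = np.array([objective(p) for p in x])
        improved = scores < pbest_score
        pbest[improved], pbest_score[improved] = x[improved], scores[improved]
        gbest = pbest[np.argmin(pbest_score)].copy()
    return gbest

# Toy usage with a synthetic objective (distance to a known target pose); in the
# real pipeline the objective compares rendered depth maps with the sensor data.
target = np.zeros(6)
best = pso_track(lambda p: np.linalg.norm(p - target), init_pose=np.ones(6) * 0.05)
```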

5 Assembly Key-frame Extraction: Definition of Key-frames
General information: scenario id and current step; id(s) of the object(s) involved in the demonstration phase; relative timestamp.
Kinematics & motion information: object pose coordinates (position and orientation, 6 DoF); hand pose (42 DoF).
Semantic information: user-defined labels corresponding to assembly states (e.g. grasping); automatic system suggestions (e.g. aligned axes).
Dynamics information: forces derived from kinesthetic learning; grasping contact points; object deformation characteristics.
Key-frame information is stored in an XML format.
The generated tracking results are employed for extracting important information from the demonstrated sequence and for selecting significant frames of the assembly, called Key-frames. An XML format is used for storing the information associated with each Key-frame. Apart from the kinematic information on the assembly parts and the instructor's hands, semantic information is extracted automatically, while the human instructor can also add semantic labels manually, selected via a dropdown menu.
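
As an illustration only, the sketch below serializes the listed fields into XML using Python's standard xml.etree.ElementTree; the element and attribute names are assumed for the example and do not reflect the project's actual schema.

```python
import xml.etree.ElementTree as ET

def keyframe_to_xml(kf):
    """Serialize one Key-frame dictionary into XML (assumed, illustrative schema)."""
    root = ET.Element("keyframe", scenario=str(kf["scenario_id"]),
                      step=str(kf["step"]), timestamp=str(kf["timestamp"]))
    for obj in kf["objects"]:
        node = ET.SubElement(root, "object", id=obj["id"])
        ET.SubElement(node, "pose").text = " ".join(map(str, obj["pose"]))   # 6 DoF
    ET.SubElement(root, "hand").text = " ".join(map(str, kf["hand_pose"]))   # 42 DoF
    semantics = ET.SubElement(root, "semantics")
    for label in kf["labels"]:
        ET.SubElement(semantics, "label", source=label["source"]).text = label["value"]
    return ET.tostring(root, encoding="unicode")

example = {
    "scenario_id": 1, "step": 3, "timestamp": 12.4,
    "objects": [{"id": "part_A", "pose": [0.1, 0.2, 0.0, 0, 0, 90]}],
    "hand_pose": [0.0] * 42,
    "labels": [{"source": "user", "value": "grasping"},
               {"source": "system", "value": "aligned_axes"}],
}
print(keyframe_to_xml(example))
```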

6 HRI Interface: Teaching Example
Workflow: create a new assembly; preview CAD models; detect parts; detect hand and record the assembly; extract Key-frames; semantic annotation of the Key-frames.
In the illustrated sequence, screenshots of the developed HRI are shown, acquired during the demonstration of a folding assembly of cell-phone parts. After creating a new assembly entry and loading the corresponding CAD models, the loaded parts are previewed in a virtual environment using Gazebo and GzWeb. Then the actual parts are placed in front of the camera sensor and are detected by the system. The user's hand is also detected, and the demonstration is initiated and recorded. The Key-frames are extracted and presented to the user for inspection. The user can add/remove Key-frames and annotate them with semantic information.
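
The teaching session can be summarized as a fixed sequence of stages; the sketch below is an illustrative outline of that flow, in which every function is a placeholder stub standing in for the corresponding HRI or back-end step (no real API is implied).

```python
# Placeholder stubs; the real interface drives these steps through Gazebo/GzWeb
# and the perception pipeline rather than plain Python calls.
def create_new_assembly(cad_models):  return {"models": cad_models, "keyframes": []}
def preview_models(assembly):         print("previewing", assembly["models"])
def detect_parts():                   return [{"id": "part_A", "pose": [0.0] * 6}]
def record_demonstration(parts):      return [{"frame": i, "parts": parts} for i in range(100)]
def extract_keyframes(recording):     return recording[::25]            # placeholder selection
def annotate(keyframes, label):       return [dict(kf, label=label) for kf in keyframes]

def teaching_session(cad_models):
    """Outline of one teaching session, mirroring the screenshot sequence above."""
    assembly = create_new_assembly(cad_models)   # 1. create a new assembly entry, load CAD models
    preview_models(assembly)                     # 2. preview the parts (Gazebo/GzWeb)
    parts = detect_parts()                       # 3. detect the real parts in front of the sensor
    recording = record_demonstration(parts)      # 4. detect the hand and record the demonstration
    keyframes = extract_keyframes(recording)     # 5. automatic Key-frame extraction
    assembly["keyframes"] = annotate(keyframes, "user_reviewed")  # 6. inspection and annotation
    return assembly

teaching_session(["top_cover.stl", "base.stl"])
```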

7 Assembly Program Generation
A sequence of Key-frames is used for deriving an assembly program based on the associated semantic information. Both Sequential Function Charts and Behavior Trees have been investigated as program representations, with promising results. However, a detailed analysis of these approaches is out of the scope of the current presentation.
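
As a concrete illustration of the Behavior Tree option, the sketch below maps an annotated Key-frame sequence onto a simple sequence-of-actions tree. The node classes, the label-to-skill mapping, and the robot stub are assumptions for illustration, not the actual program generator; real behavior trees also distinguish a "running" state, which is omitted here.

```python
# Minimal behavior-tree primitives: a Sequence succeeds only if all children succeed.
class Action:
    def __init__(self, name, func):
        self.name, self.func = name, func
    def tick(self):
        return self.func()            # True = success, False = failure

class Sequence:
    def __init__(self, children):
        self.children = children
    def tick(self):
        return all(child.tick() for child in self.children)   # short-circuits on failure

def program_from_keyframes(keyframes, robot):
    """Map each annotated Key-frame to an action node (assumed label -> skill mapping)."""
    skills = {"grasping": robot.grasp, "aligned_axes": robot.align, "insertion": robot.insert}
    actions = [Action(kf["label"], lambda kf=kf: skills[kf["label"]](kf["object"], kf["pose"]))
               for kf in keyframes]
    return Sequence(actions)

class DummyRobot:
    """Stand-in robot skills; each reports the step and returns True to signal success."""
    def grasp(self, obj, pose):  print(f"grasp {obj} at {pose}"); return True
    def align(self, obj, pose):  print(f"align {obj} at {pose}"); return True
    def insert(self, obj, pose): print(f"insert {obj} at {pose}"); return True

tree = program_from_keyframes(
    [{"label": "grasping", "object": "part_A", "pose": [0.1, 0.2, 0.0]},
     {"label": "aligned_axes", "object": "part_A", "pose": [0.1, 0.2, 0.05]},
     {"label": "insertion", "object": "part_A", "pose": [0.1, 0.2, 0.02]}],
    DummyRobot())
tree.tick()
```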

8 Future Work
Test bi-manual assemblies. Examine different types of assembly, such as folding assembly and insertion-by-deformation assembly. Address assemblies with deformable parts.
Our planned efforts include testing bi-manual assembly use cases as well as different assembly types, such as folding assembly or insertion by deformation. The latter case is very challenging, since it involves the assembly of deformable parts.

9 Thank you!

