Combining Deep Learning for Visuomotor Coordination with Object Identification to Realize a High-level Interface for Robot Object-Picking
Manfred Eppe, Matthias Kerzel, Sascha Griffiths, Hwei Geok Ng, and Stefan Wermter
Knowledge Technology, Department of Informatics, University of Hamburg, Germany
Presentation by Ryan Brand
How can robots identify and interact with objects in complex environments?
T-800
How do humans do it? Three steps:
- Identify the object of interest
- Focus on it (plan the motion)
- Execute the motion
People focus on the task more than the details:
- Good-enough parsing hypothesis: "How many animals of each kind did Moses take on the Ark?"
- Inattentional blindness: people performing a counting task often miss other salient features of a video
General Approach
- Cognitively motivated end-to-end integration of object identification with object grasping
- Implemented using deep convolutional neural networks and an intermediate "attention focus mechanism"
- Allows for the identification and manipulation of objects in environments where multiple objects are present
Robotic Platform: NICO
NICO: Neuro-Inspired COmpanion
- Designed to have human-like sensing and motor capabilities
- For use in human-robot interaction and neurocognitive models
- Arms with six degrees of freedom: three motors in the shoulder area (similar to a ball joint), one motor each for the elbow and wrist
- Three-fingered hands with a tendon mechanism
- Head can tilt and yaw to adjust the field of view of the two cameras embedded in the head
- Fully articulated legs
Deep Learning Framework
Three-step procedure (see the pipeline sketch below):
1. Faster R-CNN for object property detection and data annotation -> a single network trained on object class, shape, and color
2. "Attention focus mechanism" for object identification and attentional focus
3. Visuomotor network for grasp execution
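As a reading aid, here is a minimal sketch of how the three stages could be chained. All helper names (detect_objects, select_best_match, focus_attention, visuomotor_network) are hypothetical placeholders, not functions from the paper's implementation.

```python
# Minimal sketch of the three-stage pipeline; all helper names are hypothetical.
def pick_object(image, target_spec):
    """target_spec: the desired {class, shape, color} of the object to grasp."""
    # 1. Faster R-CNN: detect candidate objects and their properties
    detections = detect_objects(image)                  # [(bbox, class, shape, color, score), ...]

    # 2. Attention focus: choose the best-matching detection and blank out everything else
    target = select_best_match(detections, target_spec)
    focused_image = focus_attention(image, target.bbox)

    # 3. Visuomotor network: map the focused image to the joint angles of the grasp
    joint_angles = visuomotor_network(focused_image)
    return joint_angles
```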
Faster R-CNN (1)
- Training data: boxes drawn around objects in the first frame, annotated with {class, shape, color} (can be adjusted during the course of the grasp)
- Fed as input to the shared convolutional layers of the R-CNN (Zeiler and Fergus model)
- A Region Proposal Network (RPN) slides over the feature map from the last shared convolutional layer
- The RPN processes n-by-n spatial windows of the input convolutional feature map
Faster R-CNN (2)
- The RPN maps each window to a lower-dimensional feature (a 256-d vector)
- This is fed into two sibling fully connected layers: a box-regression layer (reg) and a box-classification layer (cls)
- The fully connected layers are shared across all spatial locations
- k proposals are parameterized relative to k reference boxes (anchors) centered at the sliding window in question, e.g. 3 scales * 3 aspect ratios = 9 anchors per position
- A W-by-H feature map will have W*H*k total anchors
- The reg layer has 4k outputs encoding the coordinates of the k boxes; the cls layer outputs 2k scores that estimate the probability of object vs. not-object for each proposal
(A small anchor-generation sketch follows below.)
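To make the anchor bookkeeping concrete, here is a small numpy sketch (my own illustration, not code from the paper or the Faster R-CNN release) that generates the 3 x 3 = 9 anchors for one sliding-window position and counts the total number of anchors on a W-by-H feature map.

```python
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return the k = len(scales) * len(ratios) anchor boxes centered at (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # width/height chosen so that w * h == s**2 and w / h == r
            h = s / np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)       # shape (9, 4): one (x1, y1, x2, y2) box per anchor

print(anchors_at(0, 0).shape)    # (9, 4)

# One sliding-window position per feature-map cell gives W*H*k anchors in total;
# per position, the reg layer predicts 4k coordinates and the cls layer 2k scores.
W, H, k = 40, 60, 9
print(W * H * k)                 # 21600
```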
Object Identification
- Clustering with the affinity propagation algorithm generates a single bounding box per object
- Message passing between data points finds "exemplars"; the number of clusters k does not need to be stipulated in advance
- The "responsibility" matrix R has values r(i, k) that quantify how well-suited x_k is to serve as the exemplar for x_i, relative to other candidate exemplars for x_i
- The "availability" matrix A contains values a(i, k) that represent how "appropriate" it would be for x_i to pick x_k as its exemplar, taking into account other points' preference for x_k as an exemplar
- The final object is selected as the one with the highest total score summed over the desired characteristics and is input to the visuomotor network
(A small clustering sketch follows below.)
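As an illustration of the clustering step (a sketch using scikit-learn, not the authors' implementation), affinity propagation can be run on the candidate box centers; the number of clusters is discovered from the data rather than fixed in advance.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Illustrative sketch: cluster overlapping detection boxes with affinity propagation
# so that each physical object ends up with one exemplar box.
# Boxes are (x1, y1, x2, y2); we cluster on their centers for simplicity.
boxes = np.array([
    [10, 10, 50, 50], [12, 11, 52, 49], [14, 9, 51, 48],   # three detections of object A
    [200, 80, 240, 130], [198, 82, 242, 128],              # two detections of object B
])
centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                    (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)

ap = AffinityPropagation(random_state=0).fit(centers)   # k is not specified in advance
print(ap.labels_)                    # cluster label per detection, e.g. [0 0 0 1 1]
print(ap.cluster_centers_indices_)   # indices of the exemplar detections, one per object
```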
Attention Focus
"After the object is identified, all other objects are removed from the robot's visual input by computing the average background color in the image and by flood-filling everything around the bounding box for the selected object with that color. The result is an image that only shows the identified object."
(A minimal masking sketch follows below.)
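A minimal sketch of the masking idea, assuming a rectangular fill around the bounding box rather than the paper's flood fill; the function name and the exact procedure are my own simplification.

```python
import numpy as np

# Illustrative sketch (assumed implementation, not the authors' code): blank out
# everything outside the selected object's bounding box with the mean background color.
def focus_attention(image, bbox):
    """image: HxWx3 uint8 array; bbox: (x1, y1, x2, y2) of the selected object."""
    x1, y1, x2, y2 = bbox
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[y1:y2, x1:x2] = True                           # keep only the object region

    background_color = image[~mask].mean(axis=0)        # average color outside the box
    focused = np.empty_like(image)
    focused[:] = background_color.astype(image.dtype)   # fill the whole frame with that color
    focused[mask] = image[mask]                         # paste the selected object back in
    return focused
```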
Visuomotor Network
NICO is trained in a semi-autonomous self-learning cycle: the robot repeatedly places an object at random positions with minimal human assistance, generating training samples for grasping.
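A rough sketch of such a self-learning cycle (hypothetical helper names; the actual procedure is described in Kerzel & Wermter, 2017 [2]): because the robot itself places the object, each recorded camera image comes paired with the joint configuration that reaches that position.

```python
# Sketch of the semi-autonomous data-collection loop; all helpers are hypothetical.
def collect_training_samples(n_samples):
    samples = []
    for _ in range(n_samples):
        joint_target = sample_random_reachable_pose()   # random position in the workspace
        move_arm_to(joint_target)                        # NICO places the object there itself
        release_object()
        move_arm_home()
        image = capture_camera_image()                   # view the object at the known position
        samples.append((image, joint_target))            # training pair: image -> joint angles
        grasp_object_at(joint_target)                    # pick the object back up and repeat
    return samples
```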
Results (video)
- The visuomotor network alone generalizes well; grasp performance appears to depend less on the number of training samples than on object shape
- Overall grasp success of 76.4%
- Faster R-CNN is 100% successful at identifying the correct object
- Grasp success rates are significantly lower after applying object identification and attention focus: overall success of 46%
- The visuomotor network appears to overfit on hard-to-grasp objects
Conclusions
- Proof of concept for a high-level object-picking interface
- Object identification and grasping can be integrated into a joint system in which the detection architecture manipulates the input to the visuomotor control architecture
- The image manipulation applied by the attention focus leads to less precise motor behavior: what is the effect of object context on grasp performance?
Future work:
- Integrate the two networks more closely
- Fine-tune the visuomotor network to deal with the modified input
- Include other sensory modalities such as haptic feedback
- Use the framework to provide a high-level abstraction layer and interface for symbolic reasoning and action planning methods
References
[1] Eppe, M., Kerzel, M., Griffiths, S., Ng, H. G., & Wermter, S. (2017). Combining Deep Learning for Visuomotor Coordination with Object Identification to Realize a High-level Interface for Robot Object-Picking. IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids).
[2] Kerzel, M., & Wermter, S. (2017). Neural End-to-End Self-learning of Visuomotor Skills by Environment Interaction. Artificial Neural Networks and Machine Learning – ICANN 2017, Lecture Notes in Computer Science.
[3] Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6).