Presentation by Ryan Brand


Combining Deep Learning for Visuomotor Coordination with Object Identification to Realize a High-level Interface for Robot Object-picking
Manfred Eppe, Matthias Kerzel, Sascha Griffiths, Hwei Geok Ng, and Stefan Wermter
Knowledge Technology, Department of Informatics, University of Hamburg, Germany
{eppe, kerzel, griffiths, 5ng, wermter}@informatik.uni-hamburg.de

How can robots identify and interact with objects in complex environments?

T-800

How do humans do it? Three steps: identify the object of interest, focus on it (plan the motion), and execute the motion. People focus on the task more than the details:
Good-enough parsing hypothesis: "How many animals of each kind did Moses take on the Ark?" (most people answer "two" without noticing it was Noah)
Inattentional blindness: people performing a counting task often miss other salient features of a video

General Approach Cognitively motivated end-to-end integration of object identification with object grasping Implemented using deep convolutional neural networks and an intermediate “attention focus mechanism” Allows for the identification and manipulation of objects in environments where multiple objects are present

Robotic Platform: NICO
NICO (Neuro-Inspired COmpanion) is designed to have human-like sensing and motor capabilities, for use in human-robot interaction and neurocognitive models.
Arms with six degrees of freedom: three motors in the shoulder area (similar to a ball joint), one motor each for elbow and wrist
Three-fingered hands with a tendon mechanism
Head can tilt and yaw to adjust the field of view of the two cameras embedded in the head
Fully articulated legs
https://www.youtube.com/watch?v=KlVRcLjjRV8&t=2s

Deep Learning Framework
Three-step procedure:
1. Faster R-CNN for object property detection and data annotation -> a single network trained on object class, shape, and color
2. "Attention focus mechanism" for object identification and attentional focus
3. Visuomotor network for grasping

Faster R-CNN (1) Training data: boxes drawn around objects in the first frame, annotated with {class, shape, color} (can be adjusted during the course of a grasp) Fed as input to the shared convolutional layers of the R-CNN (Zeiler and Fergus model) A Region Proposal Network (RPN) slides over the feature map from the last shared convolutional layer The RPN processes n-by-n spatial windows of the input convolutional feature map

Faster R-CNN (2) The RPN maps each window to a lower-dimensional feature (256-d vector) Fed into two sibling fully connected layers: a box-regression layer (reg) and a box-classification layer (cls) The fully connected layers are shared across all spatial locations k proposals are parameterized relative to k reference boxes (anchors) centered at the sliding window in question E.g. 3 scales * 3 aspect ratios = 9 anchors per position A W-by-H feature map will have WHk total anchors The reg layer has 4k outputs encoding the coordinates of the k boxes, and the cls layer outputs 2k scores that estimate the probability of object vs. not-object for each proposal
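The anchor arithmetic on this slide can be made concrete with a short sketch. This is not the authors' code; the base size, scales, and aspect ratios are the common defaults from the Faster R-CNN paper, and the feature-map size is a hypothetical example.

```python
import numpy as np

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) reference boxes (anchors)
    centered at the origin, as (x1, y1, x2, y2)."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Keep the anchor area ~ (base_size * scale)^2 while varying
            # the aspect ratio h/w = ratio.
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

anchors = make_anchors()
k = len(anchors)       # 3 scales * 3 ratios = 9 anchors per position
W, H = 50, 38          # hypothetical feature-map width and height
total = W * H * k      # WHk anchors over the whole feature map
reg_outputs = 4 * k    # reg layer: 4 coordinates per anchor
cls_outputs = 2 * k    # cls layer: object / not-object score per anchor
```

At test time, the same k anchors are simply translated to every sliding-window position, which is why the per-position output sizes (4k and 2k) are independent of the feature-map size.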

Object Identification Clustering with the affinity propagation algorithm generates a single bounding box per object Message passing between data points finds an "exemplar" for each cluster; the number of clusters k does not need to be stipulated in advance The "responsibility" matrix R has values r(i, k) that quantify how well-suited xk is to serve as the exemplar for xi, relative to other candidate exemplars for xi The "availability" matrix A contains values a(i, k) that represent how "appropriate" it would be for xi to pick xk as its exemplar, taking into account other points' preference for xk as an exemplar The final object is selected as the one with the highest total score summed over the desired characteristics, and is input to the visuomotor network
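The responsibility/availability message passing described above can be sketched directly. This is a minimal generic implementation of affinity propagation (Frey & Dueck), not the authors' code; the bounding-box centers, damping factor, and preference value are illustrative assumptions.

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iters=200):
    """Minimal affinity propagation on similarity matrix S (diagonal holds
    each point's 'preference' to be an exemplar). Returns, for each point,
    the index of its chosen exemplar."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibility r(i, k)
    A = np.zeros((n, n))  # availability a(i, k)
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second_max = AS.max(axis=1)
        Rnew = S - first_max[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second_max
        R = damping * R + (1 - damping) * Rnew
        # Availability: a(i,k) = min(0, r(k,k) + sum of positive r(i',k)),
        # i.e. other points' accumulated support for k as an exemplar.
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        Anew = Rp.sum(axis=0)[None, :] - Rp
        dA = Anew.diagonal().copy()
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, dA)
        A = damping * A + (1 - damping) * Anew
    return np.argmax(A + R, axis=1)

# Hypothetical bounding-box centers from overlapping region proposals:
# two true objects, three proposals each.
centers = np.array([[10.0, 10.0], [11.0, 9.0], [10.5, 10.5],
                    [50.0, 40.0], [51.0, 41.0], [49.5, 40.5]])
S = -((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(S, -10.0)  # preference: chosen by hand for this example
labels = affinity_propagation(S)
```

Each group of overlapping proposals converges on one shared exemplar, whose box can stand in as the single bounding box per object; note that the number of objects was never specified.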

Attention Focus "After the object is identified, all other objects are removed from the robot's visual input by computing the average background color in the image and by flood-filling everything around the bounding box for the selected object with that color. The result is an image that only shows the identified object."
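The effect of this step can be approximated in a few lines. A caveat: the paper flood-fills around the bounding box, whereas the sketch below simply masks everything outside the box, which gives the same result when the background is roughly uniform; the function name and box format are my own.

```python
import numpy as np

def focus_attention(image, box):
    """Replace everything outside the selected object's bounding box with
    the average background color, so the image shows only that object.
    box = (x1, y1, x2, y2) in pixel coordinates, end-exclusive."""
    x1, y1, x2, y2 = box
    inside = np.zeros(image.shape[:2], dtype=bool)
    inside[y1:y2, x1:x2] = True
    # Average color over everything outside the box = background color.
    background = image[~inside].mean(axis=0)
    return np.where(inside[..., None], image, background)

# Toy image: uniform 0.25 background with a bright object patch.
img = np.full((8, 8, 3), 0.25)
img[2:5, 2:6] = 1.0
focused = focus_attention(img, (2, 2, 6, 5))
```

The visuomotor network then receives `focused` instead of the raw camera image, which is exactly the modified-input condition whose effect the results slide examines.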

Visuomotor Network NICO is trained in a semi-autonomous self-learning cycle: the robot repeatedly places an object at random positions with minimal human assistance to generate training samples

Results
Video: https://www.dropbox.com/s/cqrin9aoy53cdzt/0044.VI.mp4?dl=0
The visuomotor network alone generalizes well; grasp performance appears less dependent on the number of training samples than on object shape
Overall grasp success of 76.4%; Faster R-CNN 100% successful at identifying the correct object
Significantly lower grasp success rates after applying object identification and attention focus: overall success of 46%
The visuomotor network appears to overfit on hard-to-grasp objects

Conclusions Proof of concept for a high-level object-picking interface Object identification and grasping can be integrated into a joint system where the detection architecture manipulates the input to the visuomotor control architecture Image manipulation by attention focus leads to less precise motor behavior, raising the question: what is the effect of object context on grasp performance? Future work will attempt to integrate the two networks more closely: Fine-tune the visuomotor network to deal with the modified input Include other sensory modalities such as haptic feedback Use the framework to provide a high-level abstraction layer and interface for symbolic reasoning and action planning methods

References
[1] Eppe, M., Kerzel, M., Griffiths, S., Ng, H. G., & Wermter, S. (2017). Combining deep learning for visuomotor coordination with object identification to realize a high-level interface for robot object-picking. 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids). doi:10.1109/humanoids.2017.8246935
[2] Kerzel, M., & Wermter, S. (2017). Neural End-to-End Self-learning of Visuomotor Skills by Environment Interaction. Artificial Neural Networks and Machine Learning - ICANN 2017, Lecture Notes in Computer Science, 27-34. doi:10.1007/978-3-319-68600-4_4
[3] Ren, S., He, K., Girshick, R., & Sun, J. (2017). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1137-1149. doi:10.1109/tpami.2016.2577031
[4] https://en.wikipedia.org/wiki/Affinity_propagation