From Vision to Grasping: Adapting Visual Networks


From Vision to Grasping: Adapting Visual Networks
Rebecca Allday
Supervised by: Richard Bowden and Simon Hadfield

Problem
Planar robotic grasping is an essential yet challenging skill for robots.
Convolutional Neural Networks (CNNs) have revolutionised computer vision.
Some CNN structures may not be appropriate for grasping.
[Image: planar grasping with a Baxter robot by Rethink Robotics]

Machine Learning in Grasping
Varying modalities of data and methods:
Learning from hand-engineered features using 2D images (Saxena et al., IJRR 2008).
Using a Support Vector Machine to determine grasp quality from superquadrics approximating objects from 3D data (Pelossof et al., ICRA 2004).
Using deep learning approaches to detect optimal grasps from multimodal data (Lenz et al., IJRR 2015).
[Images: examples of hand-engineered filters; a grasp fitted to a superquadric; examples of learned 3D features]

CNNs in Grasping
Pinto and Gupta (ICRA 2016): CNNs with RGB data.
Redmon and Angelova (ICRA 2015): CNNs with RGB-D data.
Both use AlexNet, a successful vision network; neither adapted the main structure of AlexNet for grasping.
[Image: image patch and learned grasp from Pinto and Gupta (ICRA 2016)]

Motivation
Reasons to adapt visual networks for grasping include:
Much smaller data sets in robotics.
Tighter constraints on model size due to robotics hardware.
Tighter constraints on run time, since robotic systems must operate in real time.
[Image: vision data sets (Russakovsky et al., 2014) vs. grasping data sets (Pinto and Gupta, ICRA 2016)]

Convolutional Neural Networks (CNNs)
[Image: a typical vision CNN]
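For reference, a minimal PyTorch sketch of a typical vision CNN of this kind. The layer sizes are loosely AlexNet-like but purely illustrative, not the exact configuration discussed in the talk:

```python
# Illustrative sketch of a typical vision CNN (loosely AlexNet-like).
# Layer sizes are assumptions for this example, not the talk's exact network.
import torch
import torch.nn as nn

vision_cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),  # pooling adds translation invariance
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(384 * 13 * 13, 1000),  # e.g. 1000 ImageNet classes
)

logits = vision_cnn(torch.randn(1, 3, 224, 224))  # -> shape (1, 1000)
```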

Approach
Predict the grasp angle given the position of the object.
Discretize the output into N likelihoods (Pinto and Gupta, ICRA 2016).
Translation Invariance
Max pooling layers increase translation invariance.
Avoid spatial accumulation by removing the max pooling layers.
[Image: effect of max pooling on spatial information]
Reduced Feature Complexity
Vision CNNs classify hundreds of categories.
Simpler networks with fewer convolution layers can be better constrained by the problem.
A sketch of these two adaptations follows below.
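A hedged sketch of the two adaptations: max pooling removed and the output discretized into N angle likelihoods, with N = 18 as in Pinto and Gupta (ICRA 2016). The strided convolution standing in for pooling and all layer sizes are my assumptions, not the thesis's configuration:

```python
# Sketch of a grasp-adapted CNN: no max pooling, fewer conv layers,
# and N discretized grasp-angle outputs. Layer sizes are hypothetical.
import torch
import torch.nn as nn

N_ANGLES = 18  # grasp angle discretized into 18 bins over 180 degrees

grasp_cnn = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    # Strided convolution in place of max pooling: downsamples without
    # discarding as much spatial information (one possible choice, assumed here).
    nn.Conv2d(64, 192, kernel_size=5, stride=2, padding=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(192 * 28 * 28, N_ANGLES),  # one likelihood per angle bin
)

# One "graspable at this angle?" likelihood per discretized bin:
angle_scores = torch.sigmoid(grasp_cnn(torch.randn(1, 3, 224, 224)))
```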

Networks

Results: Evaluation of Translation Invariance
AlexNet: increased accuracy from 75.72% to 78.60%.
SqueezeNet: improved accuracy, but with extended training time.

Results: Evaluation of Reduced Feature Complexity
SqueezeNet: decreased model size by 69% while maintaining 71.63% accuracy.
AlexNet: improved accuracy with only two convolution layers.

Results: Cross-Learning Between Angles
Training SqueezeNet with all 18 output layers increased accuracy from 74.03% to 86.92%.
Removing the final fire module from SqueezeNet and training on all angles decreased full model size by 24% while achieving 85.01% accuracy.
[Image: training of SqueezeNet with a single angle (left) and all 18 angles (right)]
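For context, a sketch of a SqueezeNet "fire" module, the unit removed in the experiment above. The channel sizes in the usage line are illustrative assumptions, not the exact SqueezeNet configuration:

```python
# A SqueezeNet fire module: a 1x1 "squeeze" convolution followed by parallel
# 1x1 and 3x3 "expand" convolutions whose outputs are concatenated.
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

fire = Fire(in_ch=256, squeeze_ch=32, expand_ch=128)  # output: 128 + 128 channels
out = fire(torch.randn(1, 256, 13, 13))  # -> (1, 256, 13, 13)
```

Dropping the last such module removes its parameters outright, which is consistent with the 24% model-size reduction reported above.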

Conclusion
It is vital that researchers consider adapting vision networks for robotics.
Because exact object position matters in grasping, decreasing translation invariance improves accuracy.
Reducing the number of parameters can simultaneously improve accuracy and reduce model size.

Future Work
Early PhD work accepted to TAROS 2017.
Working with other datasets.
Exploring the use of Reinforcement Learning (RL) with CNNs.

Thank you

References
A. Saxena, J. Driemeyer and A. Y. Ng. Robotic Grasping of Novel Objects using Vision. The International Journal of Robotics Research, vol. 27, pages 157-173, Feb 2008.
R. Pelossof, A. Miller, P. Allen and T. Jebara. An SVM Learning Approach to Robotic Grasping. In 2004 IEEE International Conference on Robotics and Automation (ICRA), April 2004.
I. Lenz, H. Lee and A. Saxena. Deep Learning for Detecting Robotic Grasps. The International Journal of Robotics Research, vol. 35, pages 705-724, April 2015.
L. Pinto and A. Gupta. Supersizing Self-Supervision: Learning to Grasp from 50K Tries and 700 Robot Hours. In 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016.
J. Redmon and A. Angelova. Real-Time Grasp Detection Using Convolutional Neural Networks. In 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015.
O. Russakovsky*, J. Deng*, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei (* = equal contribution). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), April 2015.