From Vision to Grasping: Adapting Visual Networks

1 From Vision to Grasping: Adapting Visual Networks
Rebecca Allday
Supervised by: Richard Bowden and Simon Hadfield

2 Problem
Planar robotic grasping: an essential yet challenging skill for robots
Convolutional Neural Networks (CNNs) revolutionised computer vision
Some CNN structures may not be appropriate for grasping
Figure: planar grasping with a Baxter robot by Rethink Robotics

3 Machine Learning in Grasping
Varying modalities of data and methods:
Learning from hand-engineered features using 2D images (Saxena et al., IJRR 2008)
Using a Support Vector Machine to determine grasp quality from superquadrics approximating objects in 3D data (Pelossof et al., ICRA 2004)
Using deep learning approaches to detect optimal grasps from multimodal data (Lenz et al., IJRR 2015)
Figures: examples of hand-engineered filters; a grasp fitted to a superquadric; examples of learned 3D features

4 CNNs in Grasping
Pinto and Gupta (ICRA 2016): CNNs with RGB data
Redmon and Angelova (ICRA 2015): CNNs with RGB-D data
Both use AlexNet, a successful vision network; neither adapted its main structure for grasping.
Figure: example image patch and learned grasp from Pinto and Gupta (ICRA 2016)
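Reusing a vision network this way typically only swaps the final classification layer for a grasp output. A minimal sketch of that kind of reuse, assuming PyTorch and torchvision's AlexNet plus the 18-bin angle discretisation of Pinto and Gupta (not the authors' actual code):

```python
import torch.nn as nn
from torchvision import models

# Start from the unmodified AlexNet architecture, as both works do.
alexnet = models.alexnet()

# Only the final 1000-way ImageNet classifier is swapped for a grasp
# output: 18 discretised grasp-angle bins. The convolutional trunk,
# including its max pooling layers, is left untouched.
alexnet.classifier[6] = nn.Linear(4096, 18)
```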

5 Motivation
Reasons to adapt visual networks for grasping include:
Much smaller data sets in robotics
More constraints on model size due to robotics hardware
More constraints on run time, since robotics systems need to run in real time
Figure: vision data sets (Russakovsky et al., IJCV 2015) vs grasping data sets (Pinto and Gupta, ICRA 2016)

6 Convolutional Neural Networks (CNNs)
Figure: a typical vision CNN architecture
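For context before the adaptations on the next slide, a generic sketch of the convolution plus max pooling pattern that typical vision CNNs use (illustrative layer sizes, not AlexNet's exact definition; PyTorch is assumed as in the earlier sketch):

```python
import torch.nn as nn

# A typical vision CNN alternates convolutions with max pooling, which
# shrinks the feature map and discards exact spatial position before the
# fully connected classifier.
typical_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),   # halves resolution, adds translation invariance
    nn.Conv2d(64, 128, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2),
)
```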

7 Approach
Predict the grasp angle given the position of the object
Discretize the output to N likelihoods (Pinto and Gupta, ICRA 2016)
Translation Invariance
Max pooling layers increase translation invariance
Avoid spatial accumulation by removing the max pooling layers
Figure: effect of max pooling on spatial information
Reduced Feature Complexity
Vision CNNs classify hundreds of categories
Simpler networks with fewer convolution layers can be better constrained by the problem
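A minimal sketch of this adapted structure, assuming a 224x224 input patch centred on the object: no max pooling layers, only two convolution layers, and N = 18 grasp-angle likelihoods as output (layer sizes are illustrative, not the exact networks evaluated in this work):

```python
import torch.nn as nn

N_ANGLES = 18  # grasp angle discretised into 18 bins (Pinto and Gupta, ICRA 2016)

# Pooling-free, low-complexity grasp network: no max pooling layers, so exact
# spatial position is not accumulated away, and only two convolution layers,
# since grasping needs far fewer features than 1000-class recognition.
# Strided convolutions are an illustrative choice to keep the final layer small.
grasp_net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2),
    nn.ReLU(inplace=True),
    nn.Flatten(),
    nn.Linear(64 * 56 * 56, N_ANGLES),  # per-angle grasp likelihoods for a 224x224 input
)
```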

8 Networks

9 Results: Evaluation of Translation Invariance
AlexNet: increased accuracy from 75.72% to 78.60%
SqueezeNet: improved accuracy, extended training time

10 Results: Evaluation of Reduced Feature Complexity
SqueezeNet: decreased model size by 69% while maintaining 71.63% accuracy
AlexNet: improved accuracy with only two convolution layers

11 Results: Cross-Learning Between Angles
Training SqueezeNet with all 18 output layers increased accuracy from 74.03% to 86.92%
Removing the final fire module from SqueezeNet and training on all angles decreased the full model size by 24% while achieving 85.01% accuracy
Figure: training of SqueezeNet with a single angle (left) and all 18 angles (right)
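One plausible way to realise this cross-learning in training code: all 18 angle outputs share the convolutional features, while each grasp trial supervises only the angle bin that was actually attempted (a sketch in the spirit of Pinto and Gupta's binary per-angle labels, not the exact training code used in this work):

```python
import torch
import torch.nn.functional as F

def masked_angle_loss(logits, angle_idx, success):
    """Binary cross-entropy on the one angle bin that was actually tried.

    logits:    (batch, 18) per-angle grasp scores from the shared network
    angle_idx: (batch,)    long tensor, index of the attempted angle bin
    success:   (batch,)    float tensor, 1.0 if the grasp succeeded, else 0.0
    """
    # Pick out the score of the attempted angle; the other 17 bins receive
    # no gradient from this trial but share the convolutional features.
    tried = logits.gather(1, angle_idx.unsqueeze(1)).squeeze(1)
    return F.binary_cross_entropy_with_logits(tried, success)
```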

12 Conclusion
It is vital for researchers to consider adapting vision networks for robotics
Because the exact position matters in grasping, decreasing translation invariance improves accuracy
Reducing the number of parameters can simultaneously improve accuracy and reduce model size

13 Future Work
Early PhD work accepted into TAROS 2017
Working with other datasets
Exploring the use of Reinforcement Learning (RL) with CNNs

14 Thank you

15 References
A. Saxena, J. Driemeyer and A. Y. Ng. Robotic Grasping of Novel Objects using Vision. In The International Journal of Robotics Research, vol. 27, Feb 2008.
R. Pelossof, A. Miller, P. Allen and T. Jebara. An SVM Learning Approach to Robotic Grasping. In 2004 IEEE International Conference on Robotics and Automation (ICRA), April 2004.
I. Lenz, H. Lee and A. Saxena. Deep Learning for Detecting Robotic Grasps. In The International Journal of Robotics Research, vol. 35, April 2015.
L. Pinto and A. Gupta. Supersizing Self-Supervision: Learning to Grasp from 50K Tries and 700 Robot Hours. In 2016 IEEE International Conference on Robotics and Automation (ICRA), May 2016.
J. Redmon and A. Angelova. Real-Time Grasp Detection Using Convolutional Neural Networks. In 2015 IEEE International Conference on Robotics and Automation (ICRA), May 2015.
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. In International Journal of Computer Vision (IJCV), April 2015.

