Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
(boy, that's a long title)

Abstract
- Autonomous accomplishment of a wide variety of tasks
- Multi-task learning from demonstration
  - Uses raw images as input
  - Runs on a low-cost robotic arm
- Controller
  - A single recurrent neural network
  - Generates robot arm trajectories
  - Performs different manipulation tasks
- Results
  - Weight sharing and image reconstruction improve generalization and robustness
  - Training on multiple tasks simultaneously improves the success rate

Purpose: ADL
- ADL: Activities of Daily Living
  - Everyday tasks performed with ease by most people, but a major challenge for the disabled and the elderly
- Assistive robots need to operate in an uncontrolled environment: the user's home
- For maximum social impact, develop perception and control methods that
  - use low-cost, imprecise hardware
  - use readily-available sensory inputs (camera images)

Central Idea: LfD
- Hand-engineering a controller for a specific task
  - A robust and resilient vision-based strategy is extremely difficult to implement in an open world
- Learning from Demonstration (LfD)
  - Humans demonstrate tasks to the robot, with no knowledge of the robot's controls or programming required
  - Can be performed using supervised learning on a labeled training set
  - Needs orders of magnitude fewer interactions than Reinforcement Learning, which learns from scratch without supervision

Proposal
- An efficient approach to learning a single multi-task recurrent neural network policy
  - Combines data from multiple tasks
  - Requires only a tractable number of demonstrations
- Input: images of the environment and a task selector one-hot vector
- Output: prediction of the robot's joint configuration in the next time-step
- Why recurrent neural networks?
  - Powerful models for learning sequential data (e.g. movement trajectories)
  - But they need a large number of training samples, which are not easy to obtain
  - Instead, train a single network on all the data gathered from multiple (related) tasks
  - With more data containing common patterns, the patterns shared among tasks are learned more easily
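A minimal PyTorch sketch may make the policy's interface concrete. All sizes here (a 6-joint arm, 5 tasks, layer widths) are illustrative assumptions, and the plain regression head stands in for the paper's autoregressive mixture estimator, which is sketched further below.

```python
import torch
import torch.nn as nn

class MultiTaskPolicy(nn.Module):
    """Sketch: camera image + one-hot task selector in, next-step joints out."""
    def __init__(self, num_tasks=5, num_joints=6, feat_dim=256):
        super().__init__()
        # Convolutional encoder: raw image -> low-dimensional feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # Recurrent controller consumes [image features ++ task one-hot].
        # (A plain LSTM stands in for the paper's layer-normalized LSTM.)
        self.lstm = nn.LSTM(feat_dim + num_tasks, 100, num_layers=3, batch_first=True)
        self.head = nn.Linear(100, num_joints)  # simplified, non-autoregressive head

    def forward(self, images, task_onehot, state=None):
        # images: (B, T, 3, H, W); task_onehot: (B, T, num_tasks)
        B, T = images.shape[:2]
        feats = self.encoder(images.flatten(0, 1)).view(B, T, -1)
        out, state = self.lstm(torch.cat([feats, task_onehot], dim=-1), state)
        return self.head(out), state  # predicted joint configuration at t+1
```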

Proposal (cont'd)
- Parameters are shared across tasks
- Use an efficient autoregressive estimator
- Regularize via image reconstruction
- Together, these allow a tractable number of demonstrations, improve generalization, and improve success rates
- The method can perform a variety of manipulation tasks
  - Picking, placing, non-prehensile manipulation
  - Correct its mistakes
  - Attempt tasks multiple times
  - Achieve high success rates
- All this with imprecise demonstrations and a low-cost manipulator

Learning the Multi-Task Controller
1. Collect a set of demonstrations for multiple tasks
2. Train a single deep recurrent neural network to emulate the user's behavior on all of the tasks simultaneously
3. Deploy the system in the real world, using raw camera perception to perform the tasks
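Step 2 amounts to behavioral cloning. Below is a hedged sketch of one training step, assuming demonstrations batched as (image sequence, task one-hot, joint trajectory) and the MultiTaskPolicy sketched above; plain MSE stands in for the paper's mixture-density loss plus reconstruction term.

```python
import torch.nn.functional as F

def training_step(policy, optimizer, images, task_onehot, joints):
    # Predict the demonstrator's joint command at t+1 from observations up
    # to t, on a batch that mixes demonstrations from all tasks.
    optimizer.zero_grad()
    pred, _ = policy(images[:, :-1], task_onehot[:, :-1])  # inputs up to t
    loss = F.mse_loss(pred, joints[:, 1:])                 # targets at t+1
    loss.backward()
    optimizer.step()
    return loss.item()
```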

Task Demonstration and Data Collection
- Human control of the robot through a low-cost and intuitive teleoperation system
  - Low-cost robots do not have zero-gravity modes suitable for demonstration
  - Remote-control techniques based on mouse or keyboard are not intuitive
- Technique used: the robot end-effector follows the user's hand while the user performs natural movements to demonstrate the task
- Two solutions for capturing the position and orientation of the hand:
  (a) a Leap Motion controller and a "naked" hand
  (b) a PlayStation Move controller

Neural Network Architecture
- Convolutional layers
  - Process the input images and map them to a low-dimensional feature representation
- VAE-GAN (Variational Auto-Encoder Generative Adversarial Network)
  - Encodes input images using a variational autoencoder
  - Reconstructs realistic images based on the idea of generative adversarial networks
  - A discriminator is trained to distinguish the reconstructed images from the real ones
  - The features of the real and reconstructed images, extracted after the third convolutional layer of the discriminator, are compared instead of the raw pixels; comparing pixels directly makes uncertainty appear as blurriness
- Controller network
  - The extracted visual features are combined with a task selector one-hot vector
  - Fed into 3 layers of layer-normalized LSTM to generate the joint commands that control the robot
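The feature-matching idea can be read as the following loss. Note that `features_at_layer3` is a hypothetical hook exposing the discriminator's activations after its third convolutional layer, not an API from the paper's code.

```python
import torch.nn.functional as F

def reconstruction_loss(discriminator, real_images, reconstructed_images):
    # Compare images in the discriminator's feature space rather than pixel
    # space: pixel-wise losses turn uncertainty into blurry reconstructions.
    real_feats = discriminator.features_at_layer3(real_images)         # hypothetical hook
    fake_feats = discriminator.features_at_layer3(reconstructed_images)
    return F.mse_loss(fake_feats, real_feats)
```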

Neural Network Architecture (cont'd)
- Manipulation tasks can be solved in various ways
  - Human demonstrators often choose different ways of performing the same task
  - A unimodal predictor (a single Gaussian distribution) averages out dissimilar motions
- Multi-modal autoregressive estimator
  - Similar to the Neural Autoregressive Distribution Estimator (NADE)
  - Captures the different modes present in the demonstrations
  - The number of modes the model can represent grows exponentially with each step
  - Such estimators usually discretize the output; this approach instead uses a mixture of Gaussians to predict the entire probability distribution of the output
- Reconstruction as regularization
  - The encoder tries to fully reconstruct the image, while the controller network tries to focus on a few relevant features
  - The competition/collaboration between the two results in a better feature extractor
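A sketch of what the autoregressive mixture-of-Gaussians sampler might look like, in the spirit of NADE; the per-joint loop and all sizes are illustrative assumptions, not the paper's exact estimator.

```python
import torch
import torch.nn as nn

class AutoregressiveMDNHead(nn.Module):
    """Predict joints one at a time; each joint's mixture of Gaussians is
    conditioned on the LSTM state and the joints already sampled."""
    def __init__(self, hidden_dim=100, num_joints=6, num_components=5):
        super().__init__()
        # One small conditional network per joint; joint i also sees joints < i.
        self.nets = nn.ModuleList([
            nn.Linear(hidden_dim + i, 3 * num_components)
            for i in range(num_joints)
        ])

    def sample(self, h):
        # h: (B, hidden_dim) controller state at one time-step
        joints = []
        for net in self.nets:
            inp = torch.cat([h] + joints, dim=-1)
            logits, mu, log_sigma = net(inp).chunk(3, dim=-1)
            # Pick a mixture component, then sample that Gaussian.
            k = torch.distributions.Categorical(logits=logits).sample()
            mu_k = mu.gather(-1, k.unsqueeze(-1))
            sigma_k = log_sigma.gather(-1, k.unsqueeze(-1)).exp()
            joints.append(mu_k + sigma_k * torch.randn_like(mu_k))
        return torch.cat(joints, dim=-1)  # (B, num_joints)
```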

Why this Architecture
- Why predict the entire probability distribution?
  - Manipulation tasks can usually be solved in multiple ways, and the same human may choose randomly between the possible solutions
  - Alternative approach: predict a deterministic joint command
    - Mean squared error minimizes the difference between the predicted and demonstrated commands
    - Given multiple demonstrated solutions, the network learns to predict their average
    - However, the average of multiple solutions is usually itself incorrect: the robot arm might avoid an object on the left or on the right, but the average results in a collision
  - This approach instead models the entire multimodal probability distribution of solutions and samples its solution from that distribution
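The averaging failure is easy to make precise. Suppose, purely for illustration, that demonstrations pass the object at a base-joint angle of +30° or −30° with equal probability; the MSE-optimal prediction is the conditional mean:

```latex
\hat{y}_{\mathrm{MSE}}
  = \arg\min_{\hat{y}} \mathbb{E}\big[(y-\hat{y})^2\big]
  = \mathbb{E}[y]
  = \tfrac{1}{2}(+30^\circ) + \tfrac{1}{2}(-30^\circ)
  = 0^\circ
```

That is a trajectory straight through the object, one that no demonstration ever produced. Sampling from the full multimodal distribution avoids this.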

Why this Architecture (cont'd)
- Why use recurrent neural networks?
  - They help the robot remember and stick to a particular strategy, much as humans do
  - A single input image is sometimes not enough to predict the next action: there are time-steps where the model cannot extract enough information from the current input to decide what to do next
  - Example: a human sometimes pauses for a couple of time-steps to check the gripper. A single image cannot reveal whether to wait another time-step or continue; the model needs to remember and count how many time-steps it has been waiting before closing the gripper.
  - Example: the manipulated object might be occluded, or encoded incorrectly due to imperfections of the visual encoder. The LSTM can store this information, so the controller can continue to act from the network's memory until it regains sight of the object.

Why this Architecture (cont'd)
- Why an autoregressive density estimator?
  - It conditions the prediction of each joint on the predictions for the previous joints
  - Situation: an object is about to be grasped, and the network must predict whether the gripper should close in the next time-step. If the gripper's prediction is not conditioned on the other joints at the current time-step, the gripper might close before the end-effector is at a good grasping angle, and the grasp will fail.
  - This idea of modeling the joint probability distribution of the outputs by casting it as a product of conditional distributions is also used in PixelRNNs, the Neural Autoregressive Distribution Estimator (NADE), and fully visible neural networks.
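Concretely, this is the chain-rule factorization behind the estimator, where x stands for the visual features and controller state, and n is the number of joints including the gripper:

```latex
p(j_1, j_2, \ldots, j_n \mid x) \;=\; \prod_{i=1}^{n} p\big(j_i \mid j_1, \ldots, j_{i-1},\, x\big)
```

The gripper term is evaluated last, conditioned on the arm joints already predicted for that time-step.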

Robot Experiments
Five tasks:
1. Pick up a small bubble wrap and put it into a small plate
2. Push a round plate to a specified area on the left side of the workspace
3. Push a large box and place it close to the robot's base with a certain position and orientation
4. Close pliers and orient them parallel to the borders of the table (the pliers start on the desk in the open position)
5. Pick up a towel and rub a small screwdriver box to clean it
Video: https://www.youtube.com/playlist?list=PL5i33tEH-MHfrXjj_Nekl0jyqgdJomg_a

Compared Methods
- Single-task: the network is trained on the data of a single task (the one-hot task selector vector is not used); all joints are predicted at the same time (no autoregressive estimator)
- Multi-task: same as the single-task method, but trained on the data of all tasks; the one-hot task selector vector decides which task should be performed
- Multi-task autoregressive (no reconstruction): uses the autoregressive estimator and is trained on the data of all tasks, but excludes the VAE-GAN that adds the reconstruction term to the error signal
- Multi-task autoregressive: the main approach, with both the autoregressive estimator and the reconstruction error

Results

Method                                          Task 1   Task 2   Task 3   Task 4   Task 5
Single-task                                       36%      16%      44%       8%      —
Multi-task                                        20%      52%      64%      —        —
Multi-task autoregressive (no reconstruction)    12%      72%      56%      48%      —
Multi-task autoregressive                         76%      80%      88%      —        —