Generalizing Backpropagation to Include Sparse Coding David M. Bradley and Drew Bagnell Robotics Institute Carnegie Mellon University
Outline
Discuss the value of modular, deep, gradient-based systems, especially in robotics
Introduce a new and useful family of modules
Properties of the new family
–Online training with non-Gaussian priors, e.g. priors that encourage sparsity or multi-task weight sharing
–Modules that internally solve continuous optimization problems, capturing interesting nonlinear effects with coupled outputs, such as inhibition (sparse approximation)
–Modules can be jointly optimized by a generalization of backpropagation

Deep Modular Learning Systems
Efficiently represent complex functions
–Particularly efficient for closely related tasks
Recently shown to be powerful learning machines
–Greedy layer-wise training improves initialization
Greedy module-wise training is useful for designing complex systems
–Design and initialize modules independently
–Jointly optimize the final system with backpropagation
Gradient methods allow the incorporation of diverse data sources and losses
Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 1998
Y. Bengio, P. Lamblin and H. Larochelle, "Greedy Layer-Wise Training of Deep Networks", NIPS 2007
G. Hinton, S. Osindero and Y. Teh, "A Fast Learning Algorithm for Deep Belief Networks", Neural Computation 2006

Mobile Robot Perception
Sensors: RGB camera, NIR camera, ladar
Lots of unlabeled data
Hard to define traditional supervised learning data
The target task is defined by weakly-labeled, structured output data

Perception Problem: Scene Labeling
[Diagram: the scene is labeled to produce a cost for each 2-D cell, which feeds the motion planner]

Goal System
[System diagram: inputs (camera, laser, LabelMe data, webcam data, labeled 3-D points, observed wheel heights, IMU data, human-driven example paths) flow through modules (point classifier, ground plane estimator, max margin planner) scored by several losses (object classification cost, lighting variance cost, proprioception prediction cost, classification cost); data flows forward and gradients flow backward to produce motion plans]

New Modules
Modules that are important in this system require two new abilities:
–Induce new priors on weights
–Allow modules to solve internal optimization problems

Standard Backpropagation Assumes an L2 Prior
Gradient descent with convex loss functions: small steps with early stopping imply L2 regularization
–Minimizes a regret bound by solving the regularized optimization shown below
–Which bounds the true regret
M. Zinkevich, "Online Convex Programming and Generalized Infinitesimal Gradient Ascent", ICML 2003
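The per-step optimization referred to above did not survive the transcript; the following is a standard reconstruction of the online gradient descent step in its proximal form, consistent with the Zinkevich reference (notation is ours, not copied from the slides):

```latex
% Linearize the loss at w_t and penalize movement with a squared-L2 term;
% the minimizer is the familiar gradient descent step.
w_{t+1} \;=\; \arg\min_{w}\; \big\langle \nabla \ell_t(w_t),\, w \big\rangle
          \;+\; \frac{1}{2\eta_t}\,\lVert w - w_t \rVert_2^2
\;\;=\;\; w_t \;-\; \eta_t\, \nabla \ell_t(w_t).
```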

Alternate Priors
KL-divergence
–Useful if many features are irrelevant
–Approximately solved with exponentiated gradient descent (sketched below); Bradley and Bagnell 2008
Multi-task priors (encourage sharing between related tasks)
–Argyriou and Evgeniou, "Multi-task Feature Learning", NIPS 2007
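As a hedged sketch in our own notation, replacing the squared-L2 movement penalty above with an unnormalized KL divergence yields the multiplicative exponentiated gradient update:

```latex
% Mirror descent step with an unnormalized KL penalty in place of squared L2:
w_{t+1} \;=\; \arg\min_{w > 0}\; \big\langle \nabla \ell_t(w_t),\, w \big\rangle
          \;+\; \frac{1}{\eta_t}\, \mathrm{KL}\!\left(w \,\middle\|\, w_t\right),
\qquad
\mathrm{KL}(w \,\|\, v) \;=\; \sum_j \Big( w_j \log \tfrac{w_j}{v_j} - w_j + v_j \Big),
% whose closed-form solution is the exponentiated gradient update:
\qquad
w_{t+1,j} \;=\; w_{t,j}\, \exp\!\big(-\eta_t\, [\nabla \ell_t(w_t)]_j \big).
```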

L2 Backpropagation
[Diagram: the input feeds a chain of modules M1, M2, M3 with intermediate outputs a, b, c; a loss function scores the final output and its gradient is backpropagated through the chain]

With KL-Prior Modules
[Same module chain M1, M2, M3 with outputs a, b, c and a loss function, with each module's weights now updated under a KL prior (exponentiated gradient) rather than an L2 step]

General Mirror Descent
[Same module chain M1, M2, M3 with outputs a, b, c and a loss function; each module updates its weights with the mirror descent step that matches its prior]
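A minimal sketch (ours, not the authors' code) of how a single module's weight update generalizes from plain gradient descent (L2 prior) to exponentiated gradient (KL prior) under the mirror descent view; the data and loss here are toy placeholders:

```python
import numpy as np

def l2_step(w, grad, lr):
    """Gradient descent: mirror descent with the squared-L2 potential."""
    return w - lr * grad

def kl_step(w, grad, lr):
    """Exponentiated gradient: mirror descent with the unnormalized-KL potential.
    Weights stay positive, and mostly-irrelevant features decay multiplicatively."""
    return w * np.exp(-lr * grad)

# Toy usage: one linear "module" trained under either update rule.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 20))
true_w = np.zeros(20); true_w[:3] = 1.0              # sparse ground truth
y = x @ true_w + 0.01 * rng.normal(size=100)

for step_fn, w0 in [(l2_step, np.zeros(20)), (kl_step, np.full(20, 1e-2))]:
    w = w0.copy()
    for t in range(200):
        grad = x.T @ (x @ w - y) / len(y)            # gradient of the squared loss
        w = step_fn(w, grad, lr=0.1)
    print(step_fn.__name__, np.round(w[:5], 2))
```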

New Modules
Modules that are important in this system require two new abilities:
–Induce new priors on weights
–Allow modules to solve internal optimization problems, capturing interesting nonlinear effects with coupled outputs, such as inhibition (sparse approximation)

Inhibition
[Figure: an input signal and a set of basis elements]

Inhibition
[Figure: the input, the basis, and the coefficients obtained by simple projection]

Inhibition
[Figure: the input, the basis, and the coefficients obtained by KL-regularized optimization]

Sparse Approximation
Assumes the input is a sparse combination of elements, plus observation noise
–Many possible elements
–Only a few present in any particular example
True for many real-world signals
Many applications
–Compression (JPEG), sensing (MRI), machine learning
Produces effects observed in biology
–V1 receptive fields, inhibition
Tropp et al., "Algorithms for Simultaneous Sparse Approximation", 2005
Olshausen and Field, "Sparse Coding of Natural Images Produces Localized, Oriented, Bandpass Receptive Fields", Nature 1995
Doi and Lewicki, "Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units", NIPS 2004
Raina et al., "Self-Taught Learning: Transfer Learning from Unlabeled Data", ICML 2007
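In symbols (our notation, not reproduced from the slides), the assumption is that an input x is generated from a basis B with a mostly-zero coefficient vector w, so recovering the code is a regularized reconstruction problem:

```latex
% Generative assumption: sparse combination of basis elements plus noise.
x \;\approx\; B\,w + \varepsilon, \qquad w \text{ mostly zero},
% so the code is recovered by
w^{*} \;=\; \arg\min_{w}\; \mathrm{loss}\big(x,\; B\,w\big) \;+\; \lambda\,\mathrm{reg}(w),
```

where the regularizer (an L1 norm or a KL term in this talk) favors sparse solutions.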

Sparse Approximation
Semantic meaning is sparse; the visual representation is sparse too (JPEG)

MNIST Digits Dataset
60,000 28x28-pixel handwritten digits
–10,000 reserved for a validation set
Separate 10,000-digit test set
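A minimal sketch of the split described above, assuming the raw MNIST arrays are already loaded into numpy (the array names and dummy loading step are placeholders, not part of the original setup):

```python
import numpy as np

# Placeholder arrays standing in for the 60,000 training digits and their labels;
# the separate 10,000-digit test set would be loaded elsewhere.
images = np.zeros((60000, 28, 28), dtype=np.float32)
labels = np.zeros(60000, dtype=np.int64)

rng = np.random.default_rng(0)
perm = rng.permutation(len(images))                  # shuffle before splitting
val_idx, train_idx = perm[:10000], perm[10000:]

x_train, y_train = images[train_idx], labels[train_idx]   # 50,000 for training
x_val, y_val = images[val_idx], labels[val_idx]           # 10,000 held out for validation
```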

Sparse Approximation
[Module diagram: the input is compared against a reconstruction r1 = B w1 built from the basis B and coefficients w1; the reconstruction error (cross entropy) supplies the error gradient]

Sparse Approximation
[Module diagram: input → KL-regularized coefficients on a KL-regularized basis → output]

Sparse Coding
[Module diagram: for each training example, coefficients w(i) and the shared basis B produce a reconstruction r = B w(i), scored by a cross-entropy reconstruction error; training minimizes over both W and B, as written out below]
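Written out in our notation, sparse coding optimizes the basis and all per-example codes jointly, whereas sparse approximation keeps B fixed and solves only for the codes:

```latex
\min_{B,\; \{w^{(i)}\}} \;\; \sum_{i} \Big[ \mathrm{loss}\!\big(x^{(i)},\; B\, w^{(i)}\big)
  \;+\; \lambda\, \mathrm{reg}\!\big(w^{(i)}\big) \Big]
```

With B held fixed, each per-example problem in w(i) is the sparse approximation problem; minimizing jointly over B and the codes is sparse coding, which is the distinction drawn on the next slide.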

Optimization Modules
L1-regularized sparse approximation (convex): a reconstruction loss plus an L1 regularization term, minimized over the coefficients with the basis fixed
L1-regularized sparse coding (not convex): the same reconstruction loss and regularization term, minimized jointly over coefficients and basis
Lee et al., "Efficient Sparse Coding Algorithms", NIPS 2006
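As one concrete, hedged way to solve the convex sparse approximation subproblem with a squared-error reconstruction loss, here is a small ISTA-style iterative soft-thresholding sketch; this is a generic solver of our own, not the specific algorithm of Lee et al.:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_sparse_approx(x, B, lam, n_iters=200):
    """Minimize 0.5 * ||x - B w||^2 + lam * ||w||_1 over w, with B held fixed."""
    L = np.linalg.norm(B, ord=2) ** 2              # Lipschitz constant of the gradient
    w = np.zeros(B.shape[1])
    for _ in range(n_iters):
        grad = B.T @ (B @ w - x)                   # gradient of the reconstruction loss
        w = soft_threshold(w - grad / L, lam / L)  # gradient step + shrinkage
    return w

# Toy usage: recover a sparse code from a random overcomplete basis.
rng = np.random.default_rng(0)
B = rng.normal(size=(50, 200))
w_true = np.zeros(200); w_true[[3, 70, 150]] = [1.0, -2.0, 0.5]
x = B @ w_true + 0.01 * rng.normal(size=50)
w_hat = l1_sparse_approx(x, B, lam=0.1)
print("largest coefficients at indices:", np.argsort(-np.abs(w_hat))[:3])
```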

KL-regularized Sparse Approximation
The objective combines a reconstruction loss with an unnormalized KL regularizer. Since this objective is continuous and differentiable, at the minimum the gradient with respect to the coefficients is zero; differentiating that optimality condition with respect to B and solving for the k-th row yields the gradient through the module, as sketched below.
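The slide's explicit formulas are not available here, so the following is a generic implicit-differentiation reconstruction in our notation, not necessarily the exact expressions from the talk. Writing the module's internal objective as f(w, B) = loss(x, Bw) + λ KL_u(w), the optimal code w* satisfies the first-order condition, and differentiating it gives the Jacobian needed for backpropagation:

```latex
\nabla_{w} f\big(w^{*}(B),\, B\big) \;=\; 0
\quad\Longrightarrow\quad
\frac{\partial w^{*}}{\partial B}
  \;=\; -\,\Big(\nabla^{2}_{ww} f\big(w^{*}, B\big)\Big)^{-1}\,
        \nabla^{2}_{B w} f\big(w^{*}, B\big).
```

Loss gradients arriving from later modules are then pulled back through the internal optimization by multiplying with this Jacobian.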

Preliminary Results
[Plot comparing L1 sparse coding, KL sparse coding, and KL sparse coding with backpropagation]
KL improves classification performance
Backpropagation further improves performance

Main Points
Modular, gradient-based systems are an important design tool for large-scale learning systems
New tools are needed to include a family of modules with important properties
Presented a generalized backpropagation technique that
–Allows priors that encourage, e.g., sparsity (KL prior), using mirror descent to modify weights
–Uses implicit differentiation to compute gradients through modules (e.g. sparse approximation) that internally solve optimization problems
Demonstrated work-in-progress on building deep sparse coders using generalized backpropagation

Acknowledgements
The authors would like to thank the UPI team, especially Cris Dima, David Silver, and Carl Wellington, and DARPA and the Army Research Office for supporting this work through the UPI program and the NDSEG fellowship.