Lecture 2c: Caffe: Training of CIFAR-10

Presentation transcript:

Lecture 2c: Caffe: Training of CIFAR-10
boris.ginsburg@gmail.com

Agenda
- Caffe: selection of the learning algorithm and its parameters
- Caffe: details of the convolutional layer implementation
- CIFAR-10

Caffe: optimization parameters
- Select the solver algorithm
- SGD parameters:
  - Batch size
  - Learning rate:
    - initial value
    - learning rate adaptation:
      - fixed: $\lambda = const$
      - exp: $\lambda_n = \lambda_0 \cdot \gamma^n$
      - step: $\lambda_n = \lambda_0 \cdot \gamma^{\lfloor n/step \rfloor}$
      - inverse: $\lambda_n = \lambda_0 \cdot (1 + \gamma \cdot n)^{-c}$
    - learning rate per layer (weights / bias)
  - Momentum and weight decay
- Weight initialization
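To make the schedules concrete, here is a small illustrative helper (not Caffe code): it evaluates $\lambda_n$ for each policy listed above. The function name is an assumption for this example; in Caffe's solver prototxt the corresponding settings are lr_policy, base_lr, gamma, stepsize and power.

    #include <cmath>
    #include <string>

    // Illustrative sketch of the learning-rate policies above:
    // lambda_n as a function of the iteration number n.
    double learning_rate(const std::string& policy, double lambda0,
                         double gamma, int step, double c, int n) {
      if (policy == "fixed") return lambda0;                             // lambda = const
      if (policy == "exp")   return lambda0 * std::pow(gamma, n);        // lambda0 * gamma^n
      if (policy == "step")  return lambda0 * std::pow(gamma, n / step); // gamma^floor(n/step)
      if (policy == "inv")   return lambda0 * std::pow(1.0 + gamma * n, -c);
      return lambda0;  // unknown policy: fall back to the initial rate
    }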

CIFAR-10 Tutorial
http://www.cs.toronto.edu/~kriz/cifar.html
https://www.kaggle.com/c/cifar-10
60,000 32x32 colour images in 10 classes, 6,000 images per class:
- 50,000 training images
- 10,000 test images

Homework
HW1 - CIFAR-10 competition:
- Look at the definition of Backward() for the basic layers
- CIFAR-10 tutorial: read convert_cifar.cpp
- Train CIFAR-10 with 4 different topologies
- Experiment with SGD parameters
HW2 - re-implement the convolutional layer "by definition" for the CPU: Forward() and Backward(), without groups (groups: bonus 10 points).

Convolutional layer internals

Conv layer:: Forward()

template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  ….

Conv layer:: Forward()

  for (int n = 0; n < num_; ++n) {
    // unroll the n-th image into the column buffer
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // top (M_ x N_) = weight (M_ x K_) * col_data (K_ x N_)
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_,
        1., weight, col_data,
        0., top_data + (*top)[0]->offset(n));
    if (bias_term_) {
      // add the bias to every output position
      caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
          1., this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(),
          1., top_data + (*top)[0]->offset(n));
    }
  }
  return Dtype(0.);

Convolutional Layer : im2col
The implementation of the convolutional layer is based on two tricks:
1. im2col: reduction of convolution to a matrix-matrix multiply
2. Using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply: im2col(). We will talk about BLAS in detail later.
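As a preview of the second trick, here is a minimal sketch of the gemm call that does the multiply once the patches have been unrolled, using the standard CBLAS interface (caffe_cpu_gemm, which appears in the code below, is essentially a thin wrapper around a call of this form; the function name conv_as_gemm is an assumption for the example):

    #include <cblas.h>

    // One gemm call computes the whole convolution for one image once the
    // input has been unrolled by im2col: C (M x N) = A (M x K) * B (K x N).
    // A is the flattened filter bank, B the im2col buffer, C the output maps.
    void conv_as_gemm(const float* A, const float* B, float* C,
                      int M, int N, int K) {
      cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                  M, N, K,
                  1.0f, A, K,   // lda = K (row-major, no transpose)
                        B, N,   // ldb = N
                  0.0f, C, N);  // ldc = N
    }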

Convolutional Layer : im2col
Worked example: a 2x2 kernel W applied to a 3x3 input X produces a 2x2 output Y:

  W(1,1) W(1,2)      X(1,1) X(1,2) X(1,3)      Y(1,1) Y(1,2)
  W(2,1) W(2,2)      X(2,1) X(2,2) X(2,3)      Y(2,1) Y(2,2)
                     X(3,1) X(3,2) X(3,3)

im2col flattens the kernel into a 1x4 row and unrolls each 2x2 patch of X into one column of a 4x4 matrix, so the whole convolution becomes a single matrix-matrix multiply. The first column holds the patch that produces Y(1,1):

  [W(1,1) W(1,2) W(2,1) W(2,2)] x [X(1,1)]  =  [Y(1,1)]
                                  [X(1,2)]
                                  [X(2,1)]
                                  [X(2,2)]

With all four patches unrolled (each column of the right-hand matrix is one 2x2 patch of X, read row by row):

  [W(1,1) W(1,2) W(2,1) W(2,2)] x [X(1,1) X(1,2) X(2,1) X(2,2)]  =  [Y(1,1) Y(1,2) Y(2,1) Y(2,2)]
                                  [X(1,2) X(1,3) X(2,2) X(2,3)]
                                  [X(2,1) X(2,2) X(3,1) X(3,2)]
                                  [X(2,2) X(2,3) X(3,2) X(3,3)]
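The same unrolling in code: a minimal single-channel im2col sketch (illustrative only; Caffe's real im2col_cpu additionally loops over input channels, and the helper name and float type are assumptions for this example):

    // Unrolls every ksize x ksize patch of a height x width image into one
    // column of a (ksize*ksize) x (out_h*out_w) matrix, matching the worked
    // example above. Pixels outside the image are treated as zero padding.
    void im2col_single_channel(const float* im, int height, int width,
                               int ksize, int pad, int stride, float* col) {
      const int out_h = (height + 2 * pad - ksize) / stride + 1;
      const int out_w = (width  + 2 * pad - ksize) / stride + 1;
      for (int kr = 0; kr < ksize; ++kr) {            // row offset inside the kernel
        for (int kc = 0; kc < ksize; ++kc) {          // column offset inside the kernel
          const int row = kr * ksize + kc;            // row of the col matrix
          for (int oh = 0; oh < out_h; ++oh) {
            for (int ow = 0; ow < out_w; ++ow) {
              const int ih = oh * stride - pad + kr;  // source pixel coordinates
              const int iw = ow * stride - pad + kc;
              const bool inside = (ih >= 0 && ih < height && iw >= 0 && iw < width);
              col[row * out_h * out_w + oh * out_w + ow] =
                  inside ? im[ih * width + iw] : 0.f;
            }
          }
        }
      }
    }

For the example above (height = width = 3, ksize = 2, pad = 0, stride = 1) this produces exactly the 4x4 matrix shown, and a single gemm with the flattened 1x4 kernel yields the 1x4 output.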

Convolutional Layer : im2col
See Chellapilla et al., "High Performance Convolutional Neural Networks for Document Processing", for more details.

Convolutional Layer: im2col

Conv layer:: Forward() (with groups)

  for (int n = 0; n < num_; ++n) {
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // each group multiplies its own slice of the weights and of the col buffer
    for (int g = 0; g < group_; ++g) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
          (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
          (Dtype)0., top_data + (*top)[0]->offset(n) + top_offset * g);
    }
    if (bias_term_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
          (Dtype)1., this->blobs_[1]->cpu_data(),
          reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()),
          (Dtype)1., top_data + (*top)[0]->offset(n));
    }
  }

Convolutional Layer :: Backward
$\frac{\partial E}{\partial y_{l-1}}$ is the sum of convolutions of the gradients $\frac{\partial E}{\partial y_l}$ with the weights W, over all output feature maps:

$\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l} \times \frac{\partial y_l(w, y_{l-1})}{\partial y_{l-1}} = \sum_{n=1}^{N} conv\!\left(W, \frac{\partial E}{\partial y_l}\right)$

$\frac{\partial E}{\partial w_l}$ is a "correlation" of $\frac{\partial E}{\partial y_l}$ with the corresponding input maps $y_{l-1}$:

$\frac{\partial E}{\partial w_l} = \frac{\partial E}{\partial y_l} \ast \frac{\partial y_l(w, y_{l-1})}{\partial w_l} = \sum_{0 \le x \le X} \; \sum_{0 \le y \le Y} \frac{\partial E}{\partial y_l}(x,y) \circ y_{l-1}(x,y)$
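As a concrete instance, for the 2x2 kernel and 3x3 input from the im2col example above we have $Y(i,j) = \sum_{u,v} W(u,v)\, X(i+u-1,\, j+v-1)$, so the weight gradient expands to the correlation of the output gradient with the input:

$\frac{\partial E}{\partial W(u,v)} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\partial E}{\partial Y(i,j)}\; X(i+u-1,\, j+v-1), \qquad u, v \in \{1, 2\}$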

Layer:: Backward( )

class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute: $y_l = f(w_l, y_{l-1})$
  Backward(top, bottom);  // compute gradient
}

Backward: we start from the gradient $\frac{\partial E}{\partial y_l}$ coming from the next layer (top) and
1) propagate the gradient back to the previous layer: $\frac{\partial E}{\partial y_l} \rightarrow \frac{\partial E}{\partial y_{l-1}}$
2) compute the gradient of E w.r.t. the weights $w_l$ (and bias): $\frac{\partial E}{\partial w_l}$

Convolutional Layer :: Backward
How this is implemented:

Backward( ) {
  …
  // im2col data to col_data
  im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, col_data);
  // gradient w.r.t. weight:
  caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_,
      1., top_diff, col_data, 1., weight_diff);
  // gradient w.r.t. bottom data:
  caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_,
      1., weight, top_diff, 0., col_diff);
  // col2im back to the data
  col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, bottom_diff);
}
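In terms of matrix shapes (a sketch, using the same M_, K_, N_ as in Forward(): M output maps, K = channels x kernel size squared, N = output height x width), the two gemm calls compute

$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial Y}\, C^{\top} \quad (M \times K), \qquad \frac{\partial E}{\partial C} = W^{\top} \frac{\partial E}{\partial Y} \quad (K \times N),$

where $C$ is the col_data matrix produced by im2col; col2im then scatter-adds $\frac{\partial E}{\partial C}$ back into bottom_diff, summing the contributions of overlapping patches.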

Convolutional Layer: im2col
Work out these two cases, based on the example above:
1. 2 input features, 1 output feature
2. 2 input features, 3 output features