Lecture 2c: Caffe: Training of CIFAR-10

Presentation transcript:

Lecture 2c: Caffe: Training of CIFAR-10
boris.ginsburg@gmail.com

Agenda
- Caffe: selection of the learning algorithm and its parameters
- Caffe: details of the convolutional layer implementation
- CIFAR-10

Caffe: optimization parameters
- Select the solver algorithm
- SGD parameters:
  - Batch size
  - Learning rate:
    - initial value
    - learning rate adaptation:
      - fixed: $\lambda = const$
      - exp: $\lambda_n = \lambda_0 \cdot \gamma^n$
      - step: $\lambda_n = \lambda_0 \cdot \gamma^{\lfloor n/step \rfloor}$
      - inverse: $\lambda_n = \lambda_0 \cdot (1 + \gamma \cdot n)^{-c}$
    - learning rate per layer (weights / bias)
  - Momentum and weight decay
- Weight initialization
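To make the schedules concrete, here is a small illustrative helper (not Caffe code): it evaluates $\lambda_n$ for each policy listed above. The function name is an assumption for this example; in Caffe's solver prototxt the corresponding settings are lr_policy, base_lr, gamma, stepsize and power.

    #include <cmath>
    #include <string>

    // Illustrative sketch of the learning-rate policies above:
    // lambda_n as a function of the iteration number n.
    double learning_rate(const std::string& policy, double lambda0,
                         double gamma, int step, double c, int n) {
      if (policy == "fixed") return lambda0;                             // lambda = const
      if (policy == "exp")   return lambda0 * std::pow(gamma, n);        // lambda0 * gamma^n
      if (policy == "step")  return lambda0 * std::pow(gamma, n / step); // gamma^floor(n/step)
      if (policy == "inv")   return lambda0 * std::pow(1.0 + gamma * n, -c);
      return lambda0;  // unknown policy: fall back to the initial rate
    }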

CIFAR-10 Tutorial
http://www.cs.toronto.edu/~kriz/cifar.html
https://www.kaggle.com/c/cifar-10
60,000 32x32 colour images in 10 classes, 6,000 images per class:
- 50,000 training images
- 10,000 test images

Homework
HW1 - CIFAR-10 competition:
- Look at the definition of Backward() for the basic layers
- CIFAR-10 tutorial: read convert_cifar.cpp
- Train CIFAR-10 with 4 different topologies
- Experiment with SGD parameters
HW2 - re-implement the convolutional layer "by definition" for the CPU: Forward() and Backward(), without groups (groups: bonus 10 points).

Convolutional layer internals

Conv layer:: Forward()

template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  ….

Conv layer:: Forward()

  for (int n = 0; n < num_; ++n) {
    // unroll the n-th image into the column buffer
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // top (M_ x N_) = weight (M_ x K_) * col_data (K_ x N_)
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_,
        1., weight, col_data,
        0., top_data + (*top)[0]->offset(n));
    if (bias_term_) {
      // add the bias to every output position
      caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
          1., this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(),
          1., top_data + (*top)[0]->offset(n));
    }
  }
  return Dtype(0.);

Convolutional Layer : im2col
The implementation of the convolutional layer is based on two tricks:
1. im2col: reduction of convolution to a matrix-matrix multiply
2. Using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply: im2col(). We will talk about BLAS in detail later.
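As a preview of the second trick, here is a minimal sketch of the gemm call that does the multiply once the patches have been unrolled, using the standard CBLAS interface (caffe_cpu_gemm, which appears in the code below, is essentially a thin wrapper around a call of this form; the function name conv_as_gemm is an assumption for the example):

    #include <cblas.h>

    // One gemm call computes the whole convolution for one image once the
    // input has been unrolled by im2col: C (M x N) = A (M x K) * B (K x N).
    // A is the flattened filter bank, B the im2col buffer, C the output maps.
    void conv_as_gemm(const float* A, const float* B, float* C,
                      int M, int N, int K) {
      cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                  M, N, K,
                  1.0f, A, K,   // lda = K (row-major, no transpose)
                        B, N,   // ldb = N
                  0.0f, C, N);  // ldc = N
    }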

Convolutional Layer : im2col
Worked example: a 2x2 kernel W applied to a 3x3 input X produces a 2x2 output Y:

  W(1,1) W(1,2)      X(1,1) X(1,2) X(1,3)      Y(1,1) Y(1,2)
  W(2,1) W(2,2)      X(2,1) X(2,2) X(2,3)      Y(2,1) Y(2,2)
                     X(3,1) X(3,2) X(3,3)

im2col flattens the kernel into a 1x4 row and unrolls each 2x2 patch of X into one column of a 4x4 matrix, so the whole convolution becomes a single matrix-matrix multiply. The first column holds the patch that produces Y(1,1):

  [W(1,1) W(1,2) W(2,1) W(2,2)] x [X(1,1)]  =  [Y(1,1)]
                                  [X(1,2)]
                                  [X(2,1)]
                                  [X(2,2)]

With all four patches unrolled (each column of the right-hand matrix is one 2x2 patch of X, read row by row):

  [W(1,1) W(1,2) W(2,1) W(2,2)] x [X(1,1) X(1,2) X(2,1) X(2,2)]  =  [Y(1,1) Y(1,2) Y(2,1) Y(2,2)]
                                  [X(1,2) X(1,3) X(2,2) X(2,3)]
                                  [X(2,1) X(2,2) X(3,1) X(3,2)]
                                  [X(2,2) X(2,3) X(3,2) X(3,3)]
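The same unrolling in code: a minimal single-channel im2col sketch (illustrative only; Caffe's real im2col_cpu additionally loops over input channels, and the helper name and float type are assumptions for this example):

    // Unrolls every ksize x ksize patch of a height x width image into one
    // column of a (ksize*ksize) x (out_h*out_w) matrix, matching the worked
    // example above. Pixels outside the image are treated as zero padding.
    void im2col_single_channel(const float* im, int height, int width,
                               int ksize, int pad, int stride, float* col) {
      const int out_h = (height + 2 * pad - ksize) / stride + 1;
      const int out_w = (width  + 2 * pad - ksize) / stride + 1;
      for (int kr = 0; kr < ksize; ++kr) {            // row offset inside the kernel
        for (int kc = 0; kc < ksize; ++kc) {          // column offset inside the kernel
          const int row = kr * ksize + kc;            // row of the col matrix
          for (int oh = 0; oh < out_h; ++oh) {
            for (int ow = 0; ow < out_w; ++ow) {
              const int ih = oh * stride - pad + kr;  // source pixel coordinates
              const int iw = ow * stride - pad + kc;
              const bool inside = (ih >= 0 && ih < height && iw >= 0 && iw < width);
              col[row * out_h * out_w + oh * out_w + ow] =
                  inside ? im[ih * width + iw] : 0.f;
            }
          }
        }
      }
    }

For the example above (height = width = 3, ksize = 2, pad = 0, stride = 1) this produces exactly the 4x4 matrix shown, and a single gemm with the flattened 1x4 kernel yields the 1x4 output.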

Convolutional Layer : im2col
See Chellapilla et al., "High Performance Convolutional Neural Networks for Document Processing", for more details.

Convolutional Layer: im2col

Conv layer:: Forward() (with groups)

  for (int n = 0; n < num_; ++n) {
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // each group multiplies its own slice of the weights and of the col buffer
    for (int g = 0; g < group_; ++g) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
          (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
          (Dtype)0., top_data + (*top)[0]->offset(n) + top_offset * g);
    }
    if (bias_term_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
          (Dtype)1., this->blobs_[1]->cpu_data(),
          reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()),
          (Dtype)1., top_data + (*top)[0]->offset(n));
    }
  }

Convolutional Layer :: Backward
$\frac{\partial E}{\partial y_{l-1}}$ is the sum of convolutions of the gradients $\frac{\partial E}{\partial y_l}$ with the weights W, over all output feature maps:

$\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l} \times \frac{\partial y_l(w, y_{l-1})}{\partial y_{l-1}} = \sum_{n=1}^{N} conv\!\left(W, \frac{\partial E}{\partial y_l}\right)$

$\frac{\partial E}{\partial w_l}$ is a "correlation" of $\frac{\partial E}{\partial y_l}$ with the corresponding input maps $y_{l-1}$:

$\frac{\partial E}{\partial w_l} = \frac{\partial E}{\partial y_l} \ast \frac{\partial y_l(w, y_{l-1})}{\partial w_l} = \sum_{0 \le x \le X} \; \sum_{0 \le y \le Y} \frac{\partial E}{\partial y_l}(x,y) \circ y_{l-1}(x,y)$
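As a concrete instance, for the 2x2 kernel and 3x3 input from the im2col example above we have $Y(i,j) = \sum_{u,v} W(u,v)\, X(i+u-1,\, j+v-1)$, so the weight gradient expands to the correlation of the output gradient with the input:

$\frac{\partial E}{\partial W(u,v)} = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{\partial E}{\partial Y(i,j)}\; X(i+u-1,\, j+v-1), \qquad u, v \in \{1, 2\}$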

Layer:: Backward( )

class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute: $y_l = f(w_l, y_{l-1})$
  Backward(top, bottom);  // compute gradient
}

Backward: we start from the gradient $\frac{\partial E}{\partial y_l}$ coming from the next layer (top) and
1) propagate the gradient back to the previous layer: $\frac{\partial E}{\partial y_l} \rightarrow \frac{\partial E}{\partial y_{l-1}}$
2) compute the gradient of E w.r.t. the weights $w_l$ (and bias): $\frac{\partial E}{\partial w_l}$

Convolutional Layer :: Backward
How this is implemented:

Backward( ) {
  …
  // im2col data to col_data
  im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, col_data);
  // gradient w.r.t. weight:
  caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_,
      1., top_diff, col_data, 1., weight_diff);
  // gradient w.r.t. bottom data:
  caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_,
      1., weight, top_diff, 0., col_diff);
  // col2im back to the data
  col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, bottom_diff);
}
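In terms of matrix shapes (a sketch, using the same M_, K_, N_ as in Forward(): M output maps, K = channels x kernel size squared, N = output height x width), the two gemm calls compute

$\frac{\partial E}{\partial W} = \frac{\partial E}{\partial Y}\, C^{\top} \quad (M \times K), \qquad \frac{\partial E}{\partial C} = W^{\top} \frac{\partial E}{\partial Y} \quad (K \times N),$

where $C$ is the col_data matrix produced by im2col; col2im then scatter-adds $\frac{\partial E}{\partial C}$ back into bottom_diff, summing the contributions of overlapping patches.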

Convolutional Layer: im2col
Work out these two cases, based on the example above:
1. 2 input features, 1 output feature
2. 2 input features, 3 output features