
1 Lecture 2c: Caffe: Training of CIFAR-10

2 Agenda
- Caffe: selection of learning algorithm and its parameters
- Caffe: details of convolutional layer implementation
- CIFAR-10

3 Caffe: optimization parameters
Select solver algorithm
SGD parameters:
- Batch size
- Learning rate:
  - initial value $\lambda_0$
  - learning rate adaptation policy:
    - fixed: $\lambda = const$
    - exp: $\lambda_n = \lambda_0 \cdot \gamma^n$
    - step: $\lambda_n = \lambda_0 \cdot \gamma^{\lfloor n/step \rfloor}$
    - inverse: $\lambda_n = \lambda_0 \cdot (1 + \gamma \cdot n)^{-c}$
  - learning rate per layer (weights / bias)
- Momentum and weight decay
- Weight initialization
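A minimal sketch of the four adaptation policies as plain functions of the iteration number n (illustration only, not Caffe's actual SGDSolver code; the function name and signature are made up for this example):

#include <cmath>
#include <string>

// Sketch of the four learning-rate policies listed above.
// lr0 = initial value lambda_0; gamma, step, power follow the formulas above.
double learning_rate(const std::string& policy, int n, double lr0,
                     double gamma, int step, double power) {
  if (policy == "fixed") return lr0;                             // lambda = const
  if (policy == "exp")   return lr0 * std::pow(gamma, n);        // lambda_0 * gamma^n
  if (policy == "step")  return lr0 * std::pow(gamma, n / step); // integer division = floor(n/step)
  if (policy == "inv")   return lr0 * std::pow(1.0 + gamma * n, -power);
  return lr0;  // unknown policy: fall back to the initial rate
}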

4 CIFAR-10 Tutorial http://www.cs.toronto.edu/~kriz/cifar.html
60000 32x32 colour images in 10 classes, 6000 images per class: 50000 training images and 10000 test images.

5 Homework
- Look at the definition of Backward() for basic layers
- CIFAR-10 tutorial:
  - read convert_cifar.cpp (see the sketch after this list)
  - train CIFAR-10 with 4 different topologies
  - experiment with SGD parameters
- HW1: CIFAR-10 competition
- HW2: re-implement the convolutional layer "by definition" for CPU: Forward() and Backward(), without groups (groups: bonus 10 points)
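As background for convert_cifar.cpp: each record in the CIFAR-10 binary files is 1 label byte followed by 3072 pixel bytes (32x32 pixels; all red values row-major, then green, then blue). A minimal reader sketch; the file name is just an example:

#include <cstdint>
#include <fstream>
#include <vector>

const int kCifarImageBytes = 3 * 32 * 32;  // 3072 pixel bytes per record

// Reads one (label, image) record; returns false once the file is exhausted.
bool read_cifar_record(std::ifstream& in, uint8_t* label,
                       std::vector<uint8_t>* pixels) {
  pixels->resize(kCifarImageBytes);
  in.read(reinterpret_cast<char*>(label), 1);
  in.read(reinterpret_cast<char*>(pixels->data()), kCifarImageBytes);
  return static_cast<bool>(in);
}

int main() {
  // "data_batch_1.bin" is the first of the five CIFAR-10 training batches.
  std::ifstream in("data_batch_1.bin", std::ios::binary);
  uint8_t label;
  std::vector<uint8_t> pixels;
  while (read_cifar_record(in, &label, &pixels)) {
    // convert_cifar.cpp would write (label, pixels) into the database here
  }
  return 0;
}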

6 Convolutional layer internals

7 Conv layer:: Forward()
template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  ...

8 Conv layer:: Forward()
for (int n = 0; n < num_; ++n) {
  // lay out all patches of image n as columns of col_data
  im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
      kernel_size_, pad_, stride_, col_data);
  // one matrix-matrix multiply computes all output maps of image n
  caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_,
      1., weight, col_data, 0., top_data + (*top)[0]->offset(n));
  if (bias_term_) {
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
        1., this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(),
        1., top_data + (*top)[0]->offset(n));
  }
}
return Dtype(0.);
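In this GEMM, M_ is the number of output feature maps, K_ the size of one flattened input patch, and N_ the number of output pixels. A small sketch of how these come out for the first CIFAR-10 conv layer (32 filters of size 5x5 over 3 input channels, pad 2, stride 1; parameter values assumed from the CIFAR-10 tutorial definition):

#include <cstdio>

int main() {
  // Assumed conv1 parameters from the CIFAR-10 tutorial.
  int num_output = 32, channels = 3, kernel = 5, pad = 2, stride = 1;
  int height = 32, width = 32;

  int height_out = (height + 2 * pad - kernel) / stride + 1;  // 32
  int width_out  = (width  + 2 * pad - kernel) / stride + 1;  // 32

  int M = num_output;                  // rows of W: one row per filter
  int K = channels * kernel * kernel;  // flattened patch size: 3*5*5 = 75
  int N = height_out * width_out;      // columns of col_data: 32*32 = 1024

  // Forward() above multiplies the (M x K) weight matrix by the
  // (K x N) col_data matrix to get the (M x N) output of one image.
  std::printf("M=%d K=%d N=%d\n", M, K, N);
  return 0;
}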

9 Convolutional Layer : im2col
The implementation of the convolutional layer is based on 2 tricks:
- im2col: reduction of convolution to a matrix-matrix multiply
- using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply, im2col(). We will talk about BLAS in detail later.
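A minimal single-channel im2col sketch, matching the worked example on the next slides (2x2 kernel over a 3x3 input, no padding, stride 1). Illustration only, not Caffe's im2col_cpu():

#include <cstdio>

// Every kernel-sized patch of the input becomes one column of `col`.
void im2col_simple(const float* im, int height, int width, int ksize,
                   float* col) {
  int h_out = height - ksize + 1;
  int w_out = width - ksize + 1;
  for (int kh = 0; kh < ksize; ++kh)      // row inside the patch
    for (int kw = 0; kw < ksize; ++kw)    // column inside the patch
      for (int h = 0; h < h_out; ++h)     // patch position (vertical)
        for (int w = 0; w < w_out; ++w) { // patch position (horizontal)
          int row = kh * ksize + kw;      // which row of col
          int c   = h * w_out + w;        // which column of col
          col[row * (h_out * w_out) + c] = im[(h + kh) * width + (w + kw)];
        }
}

int main() {
  // A 3x3 input X as in the example on the next slides.
  float X[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
  float col[4 * 4];                       // (ksize*ksize) x (h_out*w_out)
  im2col_simple(X, 3, 3, 2, col);
  for (int r = 0; r < 4; ++r) {           // print the 4x4 col matrix
    for (int c = 0; c < 4; ++c) std::printf("%4.0f", col[r * 4 + c]);
    std::printf("\n");
  }
  return 0;
}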

10 Convolutional Layer : im2col
[Diagram: direct convolution of a 3x3 input X with a 2x2 kernel W, producing a 2x2 output Y]

11 Convolutional Layer : im2col
[Diagram: the same W, X and Y; the kernel W is applied at each position of X to produce one element of Y]

12 Convolutional Layer : im2col
W is flattened into a row, and the first 2x2 patch of X into a column; their dot product is Y(1,1):
[W(1,1) W(1,2) W(2,1) W(2,2)] x [X(1,1) X(1,2) X(2,1) X(2,2)]^T = Y(1,1)

13 Convolutional Layer : im2col
All four patches of X become the columns of the im2col matrix, and one row-times-matrix multiply yields all of Y:
[W(1,1) W(1,2) W(2,1) W(2,2)] x
| X(1,1) X(1,2) X(2,1) X(2,2) |
| X(1,2) X(1,3) X(2,2) X(2,3) |
| X(2,1) X(2,2) X(3,1) X(3,2) |
| X(2,2) X(2,3) X(3,2) X(3,3) |
= [Y(1,1) Y(1,2) Y(2,1) Y(2,2)]

14 Convolutional Layer : im2col
See Chellapilla, "High Performance Convolutional Neural Networks for Document Processing", for more details.

15 Convolutional Layer: im2col

16 Conv layer:: Forward()
for (int n = 0; n < num_; ++n) {
  im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
      kernel_size_, pad_, stride_, col_data);
  // one GEMM per group of feature maps
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
        (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
        (Dtype)0., top_data + (*top)[0]->offset(n) + top_offset * g);
  }
  if (bias_term_) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
        (Dtype)1., this->blobs_[1]->cpu_data(),
        reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()),
        (Dtype)1., top_data + (*top)[0]->offset(n));
  }
}

17 Convolutional Layer :: Backward
πœ•πΈ πœ• 𝑦 π‘™βˆ’1 is sum of convolution of gradients πœ•πΈ πœ• 𝑦 𝑙 with weights W, over all output feature maps: πœ•πΈ πœ• 𝑦 π‘™βˆ’1 = πœ•πΈ πœ• 𝑦 𝑙 Γ— πœ• 𝑦 𝑙 (𝑀, 𝑦 π‘™βˆ’1 ) πœ• 𝑦 π‘™βˆ’1 = 𝑛=1 𝑁 π‘π‘œπ‘›π‘£(π‘Š, πœ•πΈ πœ• 𝑦 𝑙 ) πœ•πΈ πœ• 𝑀 𝑙 is a β€œcorrelation” of πœ•πΈ πœ• 𝑦 𝑙 with corresponding input maps Yl-1: πœ•πΈ πœ• 𝑀 𝑙 = πœ•πΈ πœ•π‘™ βˆ— πœ• 𝑦 𝑙 (𝑀, 𝑦 π‘™βˆ’1 ) πœ• 𝑀 𝑙 = 0≀π‘₯≀𝑋 0<𝑦<π‘Œ πœ•πΈ πœ• 𝑦 𝑙 π‘₯,𝑦 Β° 𝑦 π‘™βˆ’1 (π‘₯,𝑦)

18 Layer:: Backward()
class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute: $y^l = f(w^l, y^{l-1})$
  Backward(top, bottom);  // compute gradient
}
Backward: we start from the gradient $\frac{\partial E}{\partial y^l}$ coming from the next layer (top), and
1) propagate the gradient back to the previous layer: $\frac{\partial E}{\partial y^l} \to \frac{\partial E}{\partial y^{l-1}}$
2) compute the gradient of $E$ w.r.t. the weights $w^l$ (and bias): $\frac{\partial E}{\partial w^l}$

19 Convolutional Layer :: Backward
How this is implemented:

Backward() {
  ...
  // im2col data to col_data
  im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, col_data);
  // gradient w.r.t. weight (beta = 1.: accumulate over the batch):
  caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_,
      1., top_diff, col_data, 1., weight_diff);
  // gradient w.r.t. bottom data:
  caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_,
      1., weight, top_diff, 0., col_diff);
  // col2im back to the data
  col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, bottom_diff);
}

20 Convolutional Layer: im2col
Work out these 2 cases, based on the example above:
- 2 input features, 1 output feature
- 2 input features, 3 output features

