
1 Lecture 2c: Caffe: Training of CIFAR-10

2 Agenda
- Caffe: selection of learning algorithm and its parameters
- Caffe: details of convolutional layer implementation
- CIFAR-10

3 Caffe: optimization parameters
Select solver algorithm
SGD parameters:
- Batch size
- Learning rate:
  - initial value $\lambda_0$
  - learning rate adaptation policy:
    - fixed: $\lambda = const$
    - exp: $\lambda_n = \lambda_0 \cdot \gamma^n$
    - step: $\lambda_n = \lambda_0 \cdot \gamma^{\lfloor n/step \rfloor}$
    - inverse: $\lambda_n = \lambda_0 \cdot (1 + \gamma \cdot n)^{-c}$
  - learning rate per layer (weights / bias)
- Momentum and weight decay
- Weight initialization
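A minimal sketch of the four adaptation policies as plain functions of the iteration number n (illustration only, not Caffe's actual SGDSolver code; the function name and signature are made up for this example):

#include <cmath>
#include <string>

// Sketch of the four learning-rate policies listed above.
// lr0 = initial value lambda_0; gamma, step, power follow the formulas above.
double learning_rate(const std::string& policy, int n, double lr0,
                     double gamma, int step, double power) {
  if (policy == "fixed") return lr0;                             // lambda = const
  if (policy == "exp")   return lr0 * std::pow(gamma, n);        // lambda_0 * gamma^n
  if (policy == "step")  return lr0 * std::pow(gamma, n / step); // integer division = floor(n/step)
  if (policy == "inv")   return lr0 * std::pow(1.0 + gamma * n, -power);
  return lr0;  // unknown policy: fall back to the initial rate
}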

4 CIFAR-10 Tutorial http://www.cs.toronto.edu/~kriz/cifar.html
60000 32x32 colour images in 10 classes, 6000 images per class: 50000 training images and 10000 test images.

5 Homework
- Look at the definition of Backward() for basic layers
- CIFAR-10 tutorial:
  - read convert_cifar.cpp (see the sketch after this list)
  - train CIFAR-10 with 4 different topologies
  - experiment with SGD parameters
- HW1: CIFAR-10 competition
- HW2: re-implement the convolutional layer "by definition" for CPU: Forward() and Backward(), without groups (groups: bonus 10 points)
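As background for convert_cifar.cpp: each record in the CIFAR-10 binary files is 1 label byte followed by 3072 pixel bytes (32x32 pixels; all red values row-major, then green, then blue). A minimal reader sketch; the file name is just an example:

#include <cstdint>
#include <fstream>
#include <vector>

const int kCifarImageBytes = 3 * 32 * 32;  // 3072 pixel bytes per record

// Reads one (label, image) record; returns false once the file is exhausted.
bool read_cifar_record(std::ifstream& in, uint8_t* label,
                       std::vector<uint8_t>* pixels) {
  pixels->resize(kCifarImageBytes);
  in.read(reinterpret_cast<char*>(label), 1);
  in.read(reinterpret_cast<char*>(pixels->data()), kCifarImageBytes);
  return static_cast<bool>(in);
}

int main() {
  // "data_batch_1.bin" is the first of the five CIFAR-10 training batches.
  std::ifstream in("data_batch_1.bin", std::ios::binary);
  uint8_t label;
  std::vector<uint8_t> pixels;
  while (read_cifar_record(in, &label, &pixels)) {
    // convert_cifar.cpp would write (label, pixels) into the database here
  }
  return 0;
}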

6 Convolutional layer internals

7 Conv layer:: Forward()
template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  ...

8 Conv layer:: Forward()
for (int n = 0; n < num_; ++n) {
  // lay out all patches of image n as columns of col_data
  im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
      kernel_size_, pad_, stride_, col_data);
  // one matrix-matrix multiply computes all output maps of image n
  caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_,
      1., weight, col_data, 0., top_data + (*top)[0]->offset(n));
  if (bias_term_) {
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
        1., this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(),
        1., top_data + (*top)[0]->offset(n));
  }
}
return Dtype(0.);
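In this GEMM, M_ is the number of output feature maps, K_ the size of one flattened input patch, and N_ the number of output pixels. A small sketch of how these come out for the first CIFAR-10 conv layer (32 filters of size 5x5 over 3 input channels, pad 2, stride 1; parameter values assumed from the CIFAR-10 tutorial definition):

#include <cstdio>

int main() {
  // Assumed conv1 parameters from the CIFAR-10 tutorial.
  int num_output = 32, channels = 3, kernel = 5, pad = 2, stride = 1;
  int height = 32, width = 32;

  int height_out = (height + 2 * pad - kernel) / stride + 1;  // 32
  int width_out  = (width  + 2 * pad - kernel) / stride + 1;  // 32

  int M = num_output;                  // rows of W: one row per filter
  int K = channels * kernel * kernel;  // flattened patch size: 3*5*5 = 75
  int N = height_out * width_out;      // columns of col_data: 32*32 = 1024

  // Forward() above multiplies the (M x K) weight matrix by the
  // (K x N) col_data matrix to get the (M x N) output of one image.
  std::printf("M=%d K=%d N=%d\n", M, K, N);
  return 0;
}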

9 Convolutional Layer : im2col
The implementation of the convolutional layer is based on 2 tricks:
- im2col: reduction of convolution to a matrix-matrix multiply
- using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply, im2col(). We will talk about BLAS in detail later.
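A minimal single-channel im2col sketch, matching the worked example on the next slides (2x2 kernel over a 3x3 input, no padding, stride 1). Illustration only, not Caffe's im2col_cpu():

#include <cstdio>

// Every kernel-sized patch of the input becomes one column of `col`.
void im2col_simple(const float* im, int height, int width, int ksize,
                   float* col) {
  int h_out = height - ksize + 1;
  int w_out = width - ksize + 1;
  for (int kh = 0; kh < ksize; ++kh)      // row inside the patch
    for (int kw = 0; kw < ksize; ++kw)    // column inside the patch
      for (int h = 0; h < h_out; ++h)     // patch position (vertical)
        for (int w = 0; w < w_out; ++w) { // patch position (horizontal)
          int row = kh * ksize + kw;      // which row of col
          int c   = h * w_out + w;        // which column of col
          col[row * (h_out * w_out) + c] = im[(h + kh) * width + (w + kw)];
        }
}

int main() {
  // A 3x3 input X as in the example on the next slides.
  float X[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
  float col[4 * 4];                       // (ksize*ksize) x (h_out*w_out)
  im2col_simple(X, 3, 3, 2, col);
  for (int r = 0; r < 4; ++r) {           // print the 4x4 col matrix
    for (int c = 0; c < 4; ++c) std::printf("%4.0f", col[r * 4 + c]);
    std::printf("\n");
  }
  return 0;
}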

10 Convolutional Layer : im2col
[Diagram: direct convolution of a 3x3 input X with a 2x2 kernel W, producing a 2x2 output Y]

11 Convolutional Layer : im2col
[Diagram: the same W, X and Y; the kernel W is applied at each position of X to produce one element of Y]

12 Convolutional Layer : im2col
W is flattened into a row, and the first 2x2 patch of X into a column; their dot product is Y(1,1):
[W(1,1) W(1,2) W(2,1) W(2,2)] x [X(1,1) X(1,2) X(2,1) X(2,2)]^T = Y(1,1)

13 Convolutional Layer : im2col
All four patches of X become the columns of the im2col matrix, and one row-times-matrix multiply yields all of Y:
[W(1,1) W(1,2) W(2,1) W(2,2)] x
| X(1,1) X(1,2) X(2,1) X(2,2) |
| X(1,2) X(1,3) X(2,2) X(2,3) |
| X(2,1) X(2,2) X(3,1) X(3,2) |
| X(2,2) X(2,3) X(3,2) X(3,3) |
= [Y(1,1) Y(1,2) Y(2,1) Y(2,2)]

14 Convolutional Layer : im2col
See Chellapilla, "High Performance Convolutional Neural Networks for Document Processing", for more details.

15 Convolutional Layer: im2col

16 Conv layer:: Forward()
for (int n = 0; n < num_; ++n) {
  im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
      kernel_size_, pad_, stride_, col_data);
  // one GEMM per group of feature maps
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
        (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
        (Dtype)0., top_data + (*top)[0]->offset(n) + top_offset * g);
  }
  if (bias_term_) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
        (Dtype)1., this->blobs_[1]->cpu_data(),
        reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()),
        (Dtype)1., top_data + (*top)[0]->offset(n));
  }
}

17 Convolutional Layer :: Backward
πœ•πΈ πœ• 𝑦 π‘™βˆ’1 is sum of convolution of gradients πœ•πΈ πœ• 𝑦 𝑙 with weights W, over all output feature maps: πœ•πΈ πœ• 𝑦 π‘™βˆ’1 = πœ•πΈ πœ• 𝑦 𝑙 Γ— πœ• 𝑦 𝑙 (𝑀, 𝑦 π‘™βˆ’1 ) πœ• 𝑦 π‘™βˆ’1 = 𝑛=1 𝑁 π‘π‘œπ‘›π‘£(π‘Š, πœ•πΈ πœ• 𝑦 𝑙 ) πœ•πΈ πœ• 𝑀 𝑙 is a β€œcorrelation” of πœ•πΈ πœ• 𝑦 𝑙 with corresponding input maps Yl-1: πœ•πΈ πœ• 𝑀 𝑙 = πœ•πΈ πœ•π‘™ βˆ— πœ• 𝑦 𝑙 (𝑀, 𝑦 π‘™βˆ’1 ) πœ• 𝑀 𝑙 = 0≀π‘₯≀𝑋 0<𝑦<π‘Œ πœ•πΈ πœ• 𝑦 𝑙 π‘₯,𝑦 Β° 𝑦 π‘™βˆ’1 (π‘₯,𝑦)

18 Layer:: Backward()
class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute: $y^l = f(w^l, y^{l-1})$
  Backward(top, bottom);  // compute gradient
}
Backward: we start from the gradient $\frac{\partial E}{\partial y^l}$ coming from the next layer (top), and
1) propagate the gradient back to the previous layer: $\frac{\partial E}{\partial y^l} \to \frac{\partial E}{\partial y^{l-1}}$
2) compute the gradient of $E$ w.r.t. the weights $w^l$ (and bias): $\frac{\partial E}{\partial w^l}$

19 Convolutional Layer :: Backward
How this is implemented:

Backward() {
  ...
  // im2col data to col_data
  im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, col_data);
  // gradient w.r.t. weight (beta = 1.: accumulate over the batch):
  caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_,
      1., top_diff, col_data, 1., weight_diff);
  // gradient w.r.t. bottom data:
  caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_,
      1., weight, top_diff, 0., col_diff);
  // col2im back to the data
  col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_,
      KSIZE_, PAD_, STRIDE_, bottom_diff);
}

20 Convolutional Layer: im2col
Work out these 2 cases, based on the example above:
- 2 input features, 1 output feature
- 2 input features, 3 output features

