1
Lecture 2c: Caffe: Training of CIFAR-10
2
Agenda
- Caffe: selection of learning algorithm and its parameters
- Caffe: details of convolutional layer implementation
- CIFAR-10
3
Caffe: optimization parameters
- Select solver algorithm (SGD)
- SGD parameters:
  - Batch size
  - Learning rate: initial value
  - Learning rate adaptation:
    - fixed: $lr = \text{const}$
    - exp: $lr_n = lr_0 \cdot \gamma^{n}$
    - step: $lr_n = lr_0 \cdot \gamma^{\lfloor n/\text{step} \rfloor}$
    - inv (inverse): $lr_n = lr_0 \cdot (1 + \gamma \cdot n)^{-p}$
  - Learning rate per layer (weights / bias)
  - Momentum and weight decay
  - Weight initialization
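As a quick illustration of how the four adaptation policies behave over the iteration count, here is a minimal C++ sketch; lr_0, gamma, power and step are assumed example values, not values from the lecture.

// Sketch of the four learning-rate schedules listed above (illustrative constants).
#include <cmath>
#include <cstdio>

int main() {
  const double lr0 = 0.01, gamma = 0.1, power = 0.75;   // assumed example values
  const int step = 100000;                               // iterations per step decay
  for (int n = 0; n <= 300000; n += 50000) {
    double fixed = lr0;                                     // fixed: lr = const
    double expo  = lr0 * std::pow(gamma, n);                // exp:   lr_n = lr0 * gamma^n
    double stp   = lr0 * std::pow(gamma, n / step);         // step:  lr_n = lr0 * gamma^floor(n/step)
    double inv   = lr0 * std::pow(1.0 + gamma * n, -power); // inv:   lr_n = lr0 * (1+gamma*n)^-p
    std::printf("n=%6d  fixed=%g  exp=%g  step=%g  inv=%g\n", n, fixed, expo, stp, inv);
  }
  return 0;
}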
4
CIFAR-10 Tutorial http://www.cs.toronto.edu/~kriz/cifar.html
60000 32x32 colour images in 10 classes, 6000 images per class: 50000 training images and 10000 test images.
5
Homework HW1 - CIFAR-10 competition
- Look at the definition of Backward() for basic layers
- CIFAR-10 tutorial:
  - Read convert_cifar.cpp (a sketch of the binary record format follows below)
  - Train CIFAR-10 with 4 different topologies
  - Experiment with SGD parameters
- HW1 - CIFAR-10 competition
- HW2 - re-implement the convolutional layer "by definition" for CPU: Forward() and Backward() without groups // groups - bonus 10 points
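For reference when reading convert_cifar.cpp: each record in the CIFAR-10 binary batches is one label byte followed by 3x32x32 = 3072 pixel bytes (all red, then green, then blue, row-major). Below is a minimal stand-alone reader sketch, not Caffe's converter; the file name is illustrative.

// Read CIFAR-10 binary records: 1 label byte + 3072 pixel bytes per image.
#include <cstdio>
#include <vector>

int main() {
  const int kChannels = 3, kSide = 32;
  const int kImageBytes = kChannels * kSide * kSide;      // 3072 pixel bytes per image
  std::FILE* f = std::fopen("data_batch_1.bin", "rb");    // illustrative file name
  if (!f) return 1;
  unsigned char label = 0;
  std::vector<unsigned char> pixels(kImageBytes);
  int count = 0;
  while (std::fread(&label, 1, 1, f) == 1 &&
         std::fread(pixels.data(), 1, kImageBytes, f) == (size_t)kImageBytes) {
    ++count;                                              // one image per record
  }
  std::printf("read %d images, last label = %d\n", count, (int)label);
  std::fclose(f);
  return 0;
}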
6
Convolutional layer internals
7
Conv layer:: Forward()
template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();    // input feature maps
  Dtype* top_data = (*top)[0]->mutable_cpu_data();     // output feature maps
  const Dtype* weight = this->blobs_[0]->cpu_data();   // filter weights
  Dtype* col_data = col_buffer_.mutable_cpu_data();    // im2col buffer
  …
8
Conv layer:: Forward()
  for (int n = 0; n < num_; ++n) {
    // Unroll the n-th image into columns: one column per output location.
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // Filters x columns: (M_ x K_) * (K_ x N_) = (M_ x N_); without groups
    // M_ = num_output_, K_ = channels_ * kernel_size_^2, N_ = height_out * width_out.
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_, 1.,
        weight, col_data, 0., top_data + (*top)[0]->offset(n));
    if (bias_term_) {
      // Add the bias: one value per output feature map, broadcast over the N_ locations.
      caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, 1.,
          this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(), 1.,
          top_data + (*top)[0]->offset(n));
    }
  }
  return Dtype(0.);
9
Convolutional Layer : im2col
The implementation of the convolutional layer is based on 2 tricks:
- im2col: reduction of convolution to a matrix-matrix multiply
- using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply, im2col(). We will talk about BLAS in detail later.
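To make the trick concrete, here is a minimal single-channel im2col sketch (stride 1, no padding, no groups). It is a simplification for illustration, not Caffe's im2col_cpu.

// Minimal im2col sketch: single channel, stride 1, no padding.
// Each output location (one column) gets a copy of the k x k input patch under it.
#include <vector>
#include <cstdio>

void im2col_simple(const float* im, int height, int width, int k,
                   std::vector<float>& col) {
  const int out_h = height - k + 1, out_w = width - k + 1;
  col.assign(k * k * out_h * out_w, 0.f);
  for (int ki = 0; ki < k; ++ki)                  // row inside the patch
    for (int kj = 0; kj < k; ++kj)                // column inside the patch
      for (int y = 0; y < out_h; ++y)
        for (int x = 0; x < out_w; ++x)
          // Row (ki*k + kj) of the col matrix, column (y*out_w + x).
          col[(ki * k + kj) * out_h * out_w + y * out_w + x] =
              im[(y + ki) * width + (x + kj)];
}

int main() {
  const float X[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};   // 3x3 input, made-up values
  std::vector<float> col;
  im2col_simple(X, 3, 3, 2, col);                   // 2x2 kernel -> col matrix is 4 x 4
  for (int r = 0; r < 4; ++r) {
    for (int c = 0; c < 4; ++c) std::printf("%4.0f", col[r * 4 + c]);
    std::printf("\n");
  }
  return 0;
}

Multiplying the 1 x (k*k) flattened filter by this (k*k) x (out_h*out_w) matrix gives all output values of one feature map in a single gemm call, which is exactly the product drawn on the next slides.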
10
Convolutional Layer : im2col
Example: a 2x2 kernel W applied to a 3x3 input X (stride 1, no padding) gives a 2x2 output Y:

W(1,1) W(1,2)      X(1,1) X(1,2) X(1,3)      Y(1,1) Y(1,2)
W(2,1) W(2,2)      X(2,1) X(2,2) X(2,3)      Y(2,1) Y(2,2)
                   X(3,1) X(3,2) X(3,3)
12
Convolutional Layer : im2col
The first output value is the dot product of the flattened kernel with the first unrolled patch:

[ W(1,1) W(1,2) W(2,1) W(2,2) ]  x  [ X(1,1) X(1,2) X(2,1) X(2,2) ]^T  =  Y(1,1)
13
Convolutional Layer : im2col
Stacking all four patches as columns gives the full im2col matrix, and one matrix-matrix product produces all outputs:

                                    | X(1,1)  X(1,2)  X(2,1)  X(2,2) |
[ W(1,1) W(1,2) W(2,1) W(2,2) ]  x  | X(1,2)  X(1,3)  X(2,2)  X(2,3) |  =  [ Y(1,1) Y(1,2) Y(2,1) Y(2,2) ]
                                    | X(2,1)  X(2,2)  X(3,1)  X(3,2) |
                                    | X(2,2)  X(2,3)  X(3,2)  X(3,3) |
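A tiny numeric check of the picture above; the values of W and X are made up, any 2x2 / 3x3 pair works. The flattened 1x4 kernel times the 4x4 column matrix reproduces the result of direct sliding-window convolution.

// Verify: flattened-W (1x4) times the im2col matrix (4x4) equals direct convolution.
#include <cstdio>

int main() {
  const float W[2][2] = {{1, 0}, {-1, 2}};                  // made-up 2x2 kernel
  const float X[3][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};  // made-up 3x3 input
  // Direct convolution (correlation, as in Caffe): slide the 2x2 window over X.
  float direct[4];
  for (int y = 0; y < 2; ++y)
    for (int x = 0; x < 2; ++x) {
      float s = 0;
      for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
          s += W[i][j] * X[y + i][x + j];
      direct[y * 2 + x] = s;
    }
  // im2col route: one column per output location, rows ordered like the flattened kernel.
  const float Wrow[4] = {W[0][0], W[0][1], W[1][0], W[1][1]};
  float col[4][4];
  for (int y = 0; y < 2; ++y)
    for (int x = 0; x < 2; ++x) {
      const int c = y * 2 + x;
      col[0][c] = X[y][x];      col[1][c] = X[y][x + 1];
      col[2][c] = X[y + 1][x];  col[3][c] = X[y + 1][x + 1];
    }
  float prod[4] = {0, 0, 0, 0};
  for (int c = 0; c < 4; ++c)
    for (int r = 0; r < 4; ++r)
      prod[c] += Wrow[r] * col[r][c];                       // (1x4) * (4x4) = (1x4)
  for (int c = 0; c < 4; ++c)
    std::printf("Y[%d]: direct=%g  im2col+gemm=%g\n", c, direct[c], prod[c]);
  return 0;
}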
14
Convolutional Layer : im2col
See Chellapilla, "High Performance Convolutional Neural Networks for Document Processing", for more details.
15
Convolutional Layer: im2col
16
Conv layer:: Forward()
  for (int n = 0; n < num_; ++n) {
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // One gemm per group: each group sees only its own slice of filters and columns.
    for (int g = 0; g < group_; ++g) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_, (Dtype)1.,
          weight + weight_offset * g, col_data + col_offset * g, (Dtype)0.,
          top_data + (*top)[0]->offset(n) + top_offset * g);
    }
    if (bias_term_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, (Dtype)1.,
          this->blobs_[1]->cpu_data(),
          reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()), (Dtype)1.,
          top_data + (*top)[0]->offset(n));
    }
  }
17
Convolutional Layer :: Backward
$\frac{\partial E}{\partial y_{l-1}}$ is a sum of convolutions of the gradients $\frac{\partial E}{\partial y_l}$ with the weights W, over all output feature maps:

$$\frac{\partial E}{\partial y_{l-1}} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial y_{l-1}} = \sum_{m=1}^{M} \mathrm{conv}\!\left(W, \frac{\partial E}{\partial y_l}\right)$$

$\frac{\partial E}{\partial w_l}$ is a "correlation" of $\frac{\partial E}{\partial y_l}$ with the corresponding input maps $y_{l-1}$:

$$\frac{\partial E}{\partial w_l} = \frac{\partial E}{\partial y_l} \cdot \frac{\partial y_l(w, y_{l-1})}{\partial w_l} = \sum_{0 \le x \le X,\; 0 \le y \le Y} \frac{\partial E}{\partial y_l}(x, y) \circ y_{l-1}(x, y)$$
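A toy single-channel sketch of both formulas, using the 2x2 kernel / 3x3 input shape from the earlier example. The input, weight, and output-gradient values are made up; this is a by-definition illustration, not Caffe's implementation.

// dE/dW: correlate the output gradient dE/dY with the input X.
// dE/dX: spread each dE/dY value back through the weights that produced it.
#include <cstdio>

int main() {
  const float X[3][3]  = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};  // input y_{l-1} (made up)
  const float W[2][2]  = {{1, 0}, {-1, 2}};                  // weights (made up)
  const float dY[2][2] = {{0.1f, -0.2f}, {0.3f, 0.4f}};      // dE/dy_l (made up)
  float dW[2][2] = {{0}}, dX[3][3] = {{0}};

  for (int i = 0; i < 2; ++i)          // kernel row
    for (int j = 0; j < 2; ++j)        // kernel column
      for (int y = 0; y < 2; ++y)      // output row
        for (int x = 0; x < 2; ++x) {  // output column
          dW[i][j]         += dY[y][x] * X[y + i][x + j];   // correlation with the input
          dX[y + i][x + j] += dY[y][x] * W[i][j];           // gradient pushed back to the input
        }

  std::printf("dE/dW:\n");
  for (int i = 0; i < 2; ++i) std::printf("%6.2f %6.2f\n", dW[i][0], dW[i][1]);
  std::printf("dE/dX:\n");
  for (int i = 0; i < 3; ++i) std::printf("%6.2f %6.2f %6.2f\n", dX[i][0], dX[i][1], dX[i][2]);
  return 0;
}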
18
Layer:: Backward( )
class Layer {
  Setup(bottom, top);       // initialize layer
  Forward(bottom, top);     // compute: $y_l = f(w_l, y_{l-1})$
  Backward(top, bottom);    // compute gradient
}
Backward: we start from the gradient $\frac{\partial E}{\partial y_l}$ coming from the next layer (top) and
1) propagate the gradient back to the previous layer: $\frac{\partial E}{\partial y_l} \rightarrow \frac{\partial E}{\partial y_{l-1}}$
2) compute the gradient of E w.r.t. the weights $w_l$: $\frac{\partial E}{\partial w_l}$ (and bias)
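A toy, self-contained sketch of this contract (not Caffe's Layer or Net classes): each layer's Forward consumes the previous layer's output, and Backward receives dE/dy from above, returns dE/dx for the layer below, and stores dE/dw for the weight update. The ScaleLayer here is an invented one-parameter layer, y = w*x.

// Toy forward/backward chain over three one-parameter layers.
#include <vector>
#include <cstdio>

struct ScaleLayer {              // y = w * x
  double w = 2.0, dw = 0.0;
  double Forward(double x) { x_ = x; return w * x; }
  double Backward(double dy) {   // dy = dE/dy from the layer above
    dw = dy * x_;                // dE/dw = dE/dy * x
    return dy * w;               // dE/dx = dE/dy * w, passed to the layer below
  }
 private:
  double x_ = 0.0;
};

int main() {
  std::vector<ScaleLayer> net(3);
  double y = 1.0;
  for (auto& l : net) y = l.Forward(y);          // forward pass: bottom -> top
  double grad = 1.0;                             // dE/dy at the top (e.g. E = y)
  for (auto it = net.rbegin(); it != net.rend(); ++it)
    grad = it->Backward(grad);                   // backward pass: top -> bottom
  std::printf("output=%g, dE/dinput=%g\n", y, grad);
  return 0;
}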
19
Convolutional Layer :: Backward
How this is implemented:

Backward( ) {
  …
  // im2col data to col_data
  im2col_cpu(bottom_data, CHANNELS_, HEIGHT_, WIDTH_, KSIZE_, PAD_, STRIDE_, col_data);
  // gradient w.r.t. weight: dE/dW = top_diff * col_data^T (accumulated over images)
  caffe_cpu_gemm(CblasNoTrans, CblasTrans, M_, K_, N_,
      1., top_diff, col_data, 1., weight_diff);
  // gradient w.r.t. bottom data: col_diff = W^T * top_diff
  caffe_cpu_gemm(CblasTrans, CblasNoTrans, K_, N_, M_,
      1., weight, top_diff, 0., col_diff);
  // col2im back to the data
  col2im_cpu(col_diff, CHANNELS_, HEIGHT_, WIDTH_, KSIZE_, PAD_, STRIDE_, bottom_diff);
}
20
Convolutional Layer: im2col
Work out these 2 cases based on the example above:
- 2 input features, 1 output feature
- 2 input features, 3 output features