Lecture 1c: Caffe - Getting Started


1 Lecture 1c: Caffe - Getting Started

2 Agenda
Caffe installation
Caffe internals: import of dataset, test description, network topology definition, layers, internal data format
MNIST training
Digits
Implementation details of the Convolutional layer

3 Exercises & Projects
Build caffe:
  Change to CPU
  Change atlas to openblas
Play with MNIST topologies & layers:
  How does the net accuracy depend on topology?
  What will happen if we replace ReLU with tanh?
  Add a normalization layer
  Extra: look at the definition of the following layers: Maxout, normalization layer
Convolutional layer internals

4 Open-source Deep Learning libraries
Caffe: C++/CUDA, Python and Matlab wrappers, easily extendable; NVIDIA Digits
Torch: excellent tutorial, C++/CUDA, Lua
Integrated with Theano, C++/CUDA, Python

5

6 Caffe: installation
HW: Nvidia GPU – optional, can run on CPU only
OS: Ubuntu (12.04), OS X (Windows – unofficial)
CUDA 7.0 (6.5) + cuDNN
Install caffe: works out of the box

7 Caffe: make
Get caffe source: $ git clone
Copy Makefile.config and $ make -j
Tutorial (MNIST, CIFAR-10, Imagenet):
Build caffe for CPU:
  Change to CPU_ONLY and set BLAS := openblas
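A minimal CPU-only build sketch. The repository URL and the exact Makefile.config flags are not on the slide; they follow the public BVLC Caffe repository and its Makefile.config.example, so treat them as assumptions:

  $ git clone https://github.com/BVLC/caffe.git
  $ cd caffe
  $ cp Makefile.config.example Makefile.config
  # edit Makefile.config: uncomment CPU_ONLY := 1 and set BLAS := open  (OpenBLAS instead of ATLAS)
  $ make -j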

8 Homework
Play with MNIST topologies & layers:
  How does accuracy depend on topology?
  How do speed and accuracy depend on batch size?
  What will happen if we replace ReLU with tanh?
  Add a normalization layer
Extra: look at the implementation of the following layers:
  Data layer
  Convolutional layer
  IP layer
  Soft-max and accuracy

9 Caffe: example 1 - MNIST Database:

10 Caffe: Step 1 - import datasets
3 examples of data conversion from images to caffe:
  MNIST: …/examples/mnist/convert_mnist_data.cpp
  CIFAR-10: …/examples/cifar10/convert_cifar_data.cpp
  Imagenet: …/tools/convert_imageset.cpp

11 Caffe: Database format
leveldb:
  <key, value>: arbitrary byte arrays; data is stored sorted by key; callers can provide a custom comparison function to override the sort order
  basic operations: Put(key, value), Get(key), Delete(key)
lmdb (in dev branch):
  <key, value>; data is stored sorted by key
  uses memory-mapped files: the read performance of an in-memory db while still offering the persistence of a standard disk-based db; supports concurrent access
HDF5 (Hierarchical Data Format 5)
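In C++, the basic leveldb operations above look roughly like this (a minimal sketch using the standard leveldb API; the database path and the key are examples):

  #include <leveldb/db.h>
  #include <string>

  int main() {
    leveldb::DB* db;
    leveldb::Options options;
    options.create_if_missing = true;
    leveldb::Status s = leveldb::DB::Open(options, "/tmp/mnist-train-leveldb", &db);

    // keys and values are arbitrary byte arrays, stored sorted by key
    s = db->Put(leveldb::WriteOptions(), "00000000", "serialized Datum bytes");
    std::string value;
    s = db->Get(leveldb::ReadOptions(), "00000000", &value);
    s = db->Delete(leveldb::WriteOptions(), "00000000");

    delete db;
    return 0;
  }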

12 Caffe: configuration files
You should define the train and test configuration:
  Solver parameters file: …/examples/mnist/lenet_solver.prototxt
  Network descriptor for training and testing: …/examples/mnist/lenet_train_test.prototxt
Descriptors for parameters are in …/src/caffe/proto/caffe.proto. The protobuf tool (Google protocol buffers) generates C++ classes from these descriptors.
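For orientation, the solver file is a plain-text protobuf message; a sketch of what lenet_solver.prototxt typically contains is shown below (field names come from SolverParameter in caffe.proto; the values here are illustrative assumptions, check the actual file):

  net: "examples/mnist/lenet_train_test.prototxt"
  test_iter: 100          # how many test batches to run per test pass
  test_interval: 500      # test every 500 training iterations
  base_lr: 0.01
  momentum: 0.9
  weight_decay: 0.0005
  lr_policy: "inv"
  gamma: 0.0001
  power: 0.75
  max_iter: 10000
  snapshot: 5000
  snapshot_prefix: "examples/mnist/lenet"
  solver_mode: GPU        # or CPU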

13 LeNet topology
Data Layer → Convolutional layer [5x5] → Pooling [2x2, stride 2] → Convolutional layer [5x5] → Pooling [2x2, stride 2] → Inner Product → ReLU → Inner Product → SoftMax
FORWARD runs from the Data Layer up to SoftMax; BACKWARD runs in the opposite direction.

14 Layer::Forward( )
class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute next layer
  Backward(top, bottom);  // compute gradient
}
Layer::Forward( ) propagates y_l-1 to the next layer: y_l = f(w_l, y_l-1)
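A simplified sketch of how these calls are chained over a whole network (assumed variable names, not Caffe's actual Net implementation): each layer's top blobs become the next layer's bottom blobs.

  // forward pass over the whole net, in topological order
  for (size_t i = 0; i < layers.size(); ++i) {
    layers[i]->Forward(bottom_vecs[i], top_vecs[i]);  // top of layer i feeds layer i+1
  }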

15 Data Layer
name: "mnist"
type: DATA
data_param {
  source: "mnist-train-leveldb"
  batch_size: 64
  scale:
}
top: "data"
top: "label"

16 Blob All data is stored as BLOBs – Binary (Basic) Large Objects
class Blob {
  Blob(int num, int channels, int height, int width);
  const Dtype* cpu_data() const;
  const Dtype* gpu_data() const;
  …
 protected:
  shared_ptr<SyncedMemory> data_;  // container for cpu_ / gpu_ memory
  shared_ptr<SyncedMemory> diff_;  // gradient
  int num_;
  int channels_;
  int height_;
  int width_;
  int count_;
}
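A usage sketch (relying on the real Blob<Dtype> template and its count() accessor): allocating a blob for one MNIST batch and reading its CPU memory.

  // 64 single-channel 28x28 images, as produced by the MNIST data layer
  Blob<float> batch(64, 1, 28, 28);
  const float* data = batch.cpu_data();  // read-only CPU pointer (synced from GPU if needed)
  int total = batch.count();             // 64 * 1 * 28 * 28 elements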

17 SyncedMemory
class SyncedMemory {
 public:
  SyncedMemory();
  const void* cpu_data();
  const void* gpu_data();
  void* mutable_cpu_data();
  void* mutable_gpu_data();
  enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };
  SyncedHead head() { return head_; }
  size_t size() { return size_; }

18 SyncedMemory
class SyncedMemory {
  …
 private:
  void to_cpu();
  void to_gpu();
  void* cpu_ptr_;
  void* gpu_ptr_;
  size_t size_;
  SyncedHead head_;
};  // class SyncedMemory
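head_ tracks where the freshest copy of the data lives; a rough usage sketch of that behaviour (the size argument is an example, the state transitions follow the enum above):

  SyncedMemory mem(64 * 28 * 28 * sizeof(float));
  void* cpu = mem.mutable_cpu_data();   // allocates host memory, head_ = HEAD_AT_CPU
  // ... fill the CPU buffer ...
  const void* gpu = mem.gpu_data();     // copies CPU -> GPU on demand, head_ = SYNCED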

19 Convolutional Layer
name: "conv1"
type: CONVOLUTION
blobs_lr: 1.
convolution_param {
  num_output: 20
  kernel_size: 5
  stride: 1
  weight_filler { type: "xavier" }
  bias_filler { type: "constant" }
}
bottom: "data"
top: "conv1"

20 Convolutional Layer
for (n = 0; n < N; n++)
  for (m = 0; m < M; m++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        for (p = 0; p < K; p++)
          for (q = 0; q < K; q++)
            yL(n; x, y) += yL-1(m, x+p, y+q) * w(m, n; p, q);
Add bias…
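The "Add bias" step could look like this (a sketch assuming one bias value per output feature map, matching the bias_filler above):

  for (n = 0; n < N; n++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        yL(n; x, y) += bias(n);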

21 Pooling Layer
name: "pool1"
type: POOLING
pooling_param {
  kernel_size: 2
  stride: 2
  pool: MAX
}
bottom: "conv1"
top: "pool1"

for (p = 0; p < k; p++)
  for (q = 0; q < k; q++)
    yL(x, y) = max( yL(x, y), yL-1(x*s + p, y*s + q) );

Pooling helps to extract features that are increasingly invariant to local transformations of the input image.

22 Inner product (Fully Connected) Layer
name: "ip1" type: INNER_PRODUCT blobs_lr: 1. blobs_lr: 2. inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" bottom: "pool2" top: "ip1" YL (n) = ∑ WL(n, m) * YL-1 (m)

23 ReLU Layer
layers {
  name: "relu1"
  type: RELU
  bottom: "ip1"
  top: "ip1"
}

YL(n; x, y) = max( YL-1(n; x, y), 0 );

24 SoftMax + Loss Layer
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
}

Combines softmax: YL[i] = exp( YL-1[i] ) / ∑j exp( YL-1[j] )
with log-loss: E = - log( YL[label(n)] )

25 Digits https://developer.nvidia.com/digits
Nice development environment built on top of caffe:
  Easy installation
  Manage datasets and models
  Visualize training
  Look inside layers
  Open source

26 Backup: Convolutional layer internals

27 Conv layer::Forward()
template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  …

28 Conv layer::Forward()
  for (int n = 0; n < num_; ++n) {
    // lower the n-th input image into columns: each column is one receptive field
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // convolution as one matrix-matrix multiply: top = weight * col_data
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_, 1.,
        weight, col_data, 0., top_data + (*top)[0]->offset(n));
    if (bias_term_) {
      // add one bias value per output feature map
      caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, 1.,
          this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(), 1.,
          top_data + (*top)[0]->offset(n));
    }
  }
  return Dtype(0.);

29 Conv layer::Forward()
  // same as before, but with grouped convolutions: each group multiplies its own
  // slice of the weights with its own slice of the lowered input
  for (int n = 0; n < num_; ++n) {
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    for (int g = 0; g < group_; ++g) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_, (Dtype)1.,
          weight + weight_offset * g, col_data + col_offset * g, (Dtype)0.,
          top_data + (*top)[0]->offset(n) + top_offset * g);
    }
    if (bias_term_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, (Dtype)1.,
          this->blobs_[1]->cpu_data(),
          reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()), (Dtype)1.,
          top_data + (*top)[0]->offset(n));
    }
  }

30 Convolutional Layer : im2col
Implementation of the Convolutional Layer is based on 2 tricks:
  reduction of convolution to a matrix-matrix multiply (im2col)
  using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply, im2col( ), first; see the sketch below. We will talk about BLAS in detail later.
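As an illustration, a minimal single-channel im2col sketch (stride 1, no padding; the function and argument names are made up for this example, while Caffe's real im2col_cpu also handles multiple channels, padding and stride):

  // X: H x W input, K x K kernel; col: (K*K) x (out_h*out_w) matrix.
  // Column j of col holds the input patch that produces output pixel j.
  void im2col_simple(const float* X, int H, int W, int K, float* col) {
    int out_h = H - K + 1, out_w = W - K + 1;
    for (int p = 0; p < K; ++p)
      for (int q = 0; q < K; ++q)
        for (int y = 0; y < out_h; ++y)
          for (int x = 0; x < out_w; ++x)
            col[(p * K + q) * (out_h * out_w) + y * out_w + x] = X[(y + p) * W + (x + q)];
  }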

31 Convolutional Layer : im2col
Example: a 2x2 kernel W is convolved with a 3x3 input X to give a 2x2 output Y:
W = [ W(1,1) W(1,2) ; W(2,1) W(2,2) ]
X = [ X(1,1) X(1,2) X(1,3) ; X(2,1) X(2,2) X(2,3) ; X(3,1) X(3,2) X(3,3) ]
Y = [ Y(1,1) Y(1,2) ; Y(2,1) Y(2,2) ]

32 Convolutional Layer : im2col
Same W, X and Y as above: each 2x2 window of X, multiplied element-wise by W and summed, gives one element of Y.

33 Convolutional Layer : im2col
The first output element as a dot product of the flattened kernel with the first patch:
[ W(1,1) W(1,2) W(2,1) W(2,2) ] · [ X(1,1) X(1,2) X(2,1) X(2,2) ]ᵀ = Y(1,1)

34 Convolutional Layer : im2col
Stacking all four patches as columns turns the whole convolution into a single matrix product:
[ W(1,1) W(1,2) W(2,1) W(2,2) ]  ×
[ X(1,1) X(1,2) X(2,1) X(2,2)
  X(1,2) X(1,3) X(2,2) X(2,3)
  X(2,1) X(2,2) X(3,1) X(3,2)
  X(2,2) X(2,3) X(3,2) X(3,3) ]
= [ Y(1,1) Y(1,2) Y(2,1) Y(2,2) ]
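That product is exactly what one BLAS gemm call computes; a sketch (Caffe wraps this in caffe_cpu_gemm; W_row, col and Y_row here are hypothetical flat arrays laid out as above):

  #include <cblas.h>

  void conv_as_gemm(const float W_row[4], const float col[16], float Y_row[4]) {
    // Y_row (1x4) = W_row (1x4) * col (4x4): one gemm replaces the whole convolution
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                /*M=*/1, /*N=*/4, /*K=*/4,
                1.0f, W_row, 4, col, 4, 0.0f, Y_row, 4);
  }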

35 Convolutional Layer : im2col
See Chellapilla, “High Performance Convolutional Neural Networks for Document Processing” for more details:

36 Convolutional Layer: im2col

37 Im2col: Exercise
Work out these 2 cases based on the example above:
  2 input features, 1 output feature
  2 input features, 3 output features

