Lecture 1c: Caffe - Getting Started


1 Lecture 1c: Caffe - Getting Started

2 Agenda
Caffe installation
Caffe internals: import of dataset, test description, network topology definition, layers, internal data format
MNIST training
Digits
Implementation details of the Convolutional layer

3 Exercises & Projects
Build caffe:
  Change to CPU
  Change atlas to openblas
Play with MNIST topologies & layers:
  How does the net accuracy depend on topology?
  What will happen if we replace ReLU with tanh?
  Add a normalization layer
  Extra: look at the definition of the following layers: Maxout, normalization layer
Convolutional layer internals

4 Open-source Deep Learning libraries
Caffe: C++/CUDA, Python and Matlab wrappers, easily extendable; NVIDIA Digits
Torch: excellent tutorial, C++/CUDA, Lua
Integrated with Theano, C++/CUDA, Python

5

6 Caffe: installation
HW: Nvidia GPU – optional, can run on CPU only
OS: Ubuntu (12.04), OS X (Windows – unofficial)
CUDA 7.0 (6.5) + cuDNN
Install caffe: works out of the box

7 Caffe: make
Get caffe source: $ git clone
Copy Makefile.config and $ make -j
Tutorial (MNIST, CIFAR-10, Imagenet):
Build caffe for CPU:
  Change to CPU_ONLY and set BLAS := openblas
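A minimal CPU-only build sketch. The repository URL and the exact Makefile.config flags are not on the slide; they follow the public BVLC Caffe repository and its Makefile.config.example, so treat them as assumptions:

  $ git clone https://github.com/BVLC/caffe.git
  $ cd caffe
  $ cp Makefile.config.example Makefile.config
  # edit Makefile.config: uncomment CPU_ONLY := 1 and set BLAS := open  (OpenBLAS instead of ATLAS)
  $ make -j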

8 Homework
Play with MNIST topologies & layers:
  How does accuracy depend on topology?
  How do speed and accuracy depend on batch size?
  What will happen if we replace ReLU with tanh?
  Add a normalization layer
Extra: look at the implementation of the following layers:
  Data layer
  Convolutional layer
  IP layer
  Soft-max and accuracy

9 Caffe: example 1 - MNIST Database:

10 Caffe: Step 1 - import datasets
3 examples of data conversion from images to caffe:
  MNIST: …/examples/mnist/convert_mnist_data.cpp
  CIFAR-10: …/examples/cifar10/convert_cifar_data.cpp
  Imagenet: …/tools/convert_imageset.cpp

11 Caffe: Database format
leveldb:
  <key, value>: arbitrary byte arrays; data is stored sorted by key; callers can provide a custom comparison function to override the sort order
  basic operations: Put(key, value), Get(key), Delete(key)
lmdb (in dev branch):
  <key, value>; data is stored sorted by key
  uses memory-mapped files: the read performance of an in-memory db while still offering the persistence of a standard disk-based db; supports concurrent access
HDF5 (Hierarchical Data Format 5)
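In C++, the basic leveldb operations above look roughly like this (a minimal sketch using the standard leveldb API; the database path and the key are examples):

  #include <leveldb/db.h>
  #include <string>

  int main() {
    leveldb::DB* db;
    leveldb::Options options;
    options.create_if_missing = true;
    leveldb::Status s = leveldb::DB::Open(options, "/tmp/mnist-train-leveldb", &db);

    // keys and values are arbitrary byte arrays, stored sorted by key
    s = db->Put(leveldb::WriteOptions(), "00000000", "serialized Datum bytes");
    std::string value;
    s = db->Get(leveldb::ReadOptions(), "00000000", &value);
    s = db->Delete(leveldb::WriteOptions(), "00000000");

    delete db;
    return 0;
  }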

12 Caffe: configuration files
You should define the train and test configuration:
  Solver parameters file: …/examples/mnist/lenet_solver.prototxt
  Network descriptor for training and testing: …/examples/mnist/lenet_train_test.prototxt
Descriptors for parameters are in …/src/caffe/proto/caffe.proto. The protobuf tool (Google protocol buffers) generates C++ classes from these descriptors.
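For orientation, the solver file is a plain-text protobuf message; a sketch of what lenet_solver.prototxt typically contains is shown below (field names come from SolverParameter in caffe.proto; the values here are illustrative assumptions, check the actual file):

  net: "examples/mnist/lenet_train_test.prototxt"
  test_iter: 100          # how many test batches to run per test pass
  test_interval: 500      # test every 500 training iterations
  base_lr: 0.01
  momentum: 0.9
  weight_decay: 0.0005
  lr_policy: "inv"
  gamma: 0.0001
  power: 0.75
  max_iter: 10000
  snapshot: 5000
  snapshot_prefix: "examples/mnist/lenet"
  solver_mode: GPU        # or CPU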

13 LeNet topology
Data Layer → Convolutional layer [5x5] → Pooling [2x2, stride 2] → Convolutional layer [5x5] → Pooling [2x2, stride 2] → Inner Product → ReLU → Inner Product → SoftMax
FORWARD runs from the Data Layer up to SoftMax; BACKWARD runs in the opposite direction.

14 Layer::Forward( )
class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute next layer
  Backward(top, bottom);  // compute gradient
}
Layer::Forward( ) propagates y_l-1 to the next layer: y_l = f(w_l, y_l-1)
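A simplified sketch of how these calls are chained over a whole network (assumed variable names, not Caffe's actual Net implementation): each layer's top blobs become the next layer's bottom blobs.

  // forward pass over the whole net, in topological order
  for (size_t i = 0; i < layers.size(); ++i) {
    layers[i]->Forward(bottom_vecs[i], top_vecs[i]);  // top of layer i feeds layer i+1
  }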

15 Data Layer
name: "mnist"
type: DATA
data_param {
  source: "mnist-train-leveldb"
  batch_size: 64
  scale:
}
top: "data"
top: "label"

16 Blob All data is stored as BLOBs – Binary (Basic) Large Objects
class Blob {
  Blob(int num, int channels, int height, int width);
  const Dtype* cpu_data() const;
  const Dtype* gpu_data() const;
  …
 protected:
  shared_ptr<SyncedMemory> data_;  // container for cpu_ / gpu_ memory
  shared_ptr<SyncedMemory> diff_;  // gradient
  int num_;
  int channels_;
  int height_;
  int width_;
  int count_;
}
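A usage sketch (relying on the real Blob<Dtype> template and its count() accessor): allocating a blob for one MNIST batch and reading its CPU memory.

  // 64 single-channel 28x28 images, as produced by the MNIST data layer
  Blob<float> batch(64, 1, 28, 28);
  const float* data = batch.cpu_data();  // read-only CPU pointer (synced from GPU if needed)
  int total = batch.count();             // 64 * 1 * 28 * 28 elements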

17 SyncedMemory
class SyncedMemory {
 public:
  SyncedMemory();
  const void* cpu_data();
  const void* gpu_data();
  void* mutable_cpu_data();
  void* mutable_gpu_data();
  enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };
  SyncedHead head() { return head_; }
  size_t size() { return size_; }

18 SyncedMemory
class SyncedMemory {
  …
 private:
  void to_cpu();
  void to_gpu();
  void* cpu_ptr_;
  void* gpu_ptr_;
  size_t size_;
  SyncedHead head_;
};  // class SyncedMemory
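head_ tracks where the freshest copy of the data lives; a rough usage sketch of that behaviour (the size argument is an example, the state transitions follow the enum above):

  SyncedMemory mem(64 * 28 * 28 * sizeof(float));
  void* cpu = mem.mutable_cpu_data();   // allocates host memory, head_ = HEAD_AT_CPU
  // ... fill the CPU buffer ...
  const void* gpu = mem.gpu_data();     // copies CPU -> GPU on demand, head_ = SYNCED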

19 Convolutional Layer
name: "conv1"
type: CONVOLUTION
blobs_lr: 1.
convolution_param {
  num_output: 20
  kernel_size: 5
  stride: 1
  weight_filler { type: "xavier" }
  bias_filler { type: "constant" }
}
bottom: "data"
top: "conv1"

20 Convolutional Layer
for (n = 0; n < N; n++)
  for (m = 0; m < M; m++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        for (p = 0; p < K; p++)
          for (q = 0; q < K; q++)
            yL(n; x, y) += yL-1(m, x+p, y+q) * w(m, n; p, q);
Add bias…
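The "Add bias" step could look like this (a sketch assuming one bias value per output feature map, matching the bias_filler above):

  for (n = 0; n < N; n++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        yL(n; x, y) += bias(n);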

21 Pooling Layer
name: "pool1"
type: POOLING
pooling_param {
  kernel_size: 2
  stride: 2
  pool: MAX
}
bottom: "conv1"
top: "pool1"

for (p = 0; p < k; p++)
  for (q = 0; q < k; q++)
    yL(x, y) = max( yL(x, y), yL-1(x*s + p, y*s + q) );

Pooling helps to extract features that are increasingly invariant to local transformations of the input image.

22 Inner product (Fully Connected) Layer
name: "ip1" type: INNER_PRODUCT blobs_lr: 1. blobs_lr: 2. inner_product_param { num_output: 500 weight_filler { type: "xavier" } bias_filler { type: "constant" bottom: "pool2" top: "ip1" YL (n) = ∑ WL(n, m) * YL-1 (m)

23 ReLU Layer
layers {
  name: "relu1"
  type: RELU
  bottom: "ip1"
  top: "ip1"
}

YL(n; x, y) = max( YL-1(n; x, y), 0 );

24 SoftMax + Loss Layer
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
}

Combines softmax: YL[i] = exp( YL-1[i] ) / ∑j exp( YL-1[j] )
with log-loss: E = - log( YL[label(n)] )

25 Digits https://developer.nvidia.com/digits
Nice development environment built on top of caffe:
  Easy installation
  Manage datasets and models
  Visualize training
  Look inside layers
  Open source

26 Backup: Convolutional layer internals

27 Conv layer::Forward()
template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  …

28 Conv layer::Forward()
  for (int n = 0; n < num_; ++n) {
    // lower the n-th input image into columns: each column is one receptive field
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    // convolution as one matrix-matrix multiply: top = weight * col_data
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_, 1.,
        weight, col_data, 0., top_data + (*top)[0]->offset(n));
    if (bias_term_) {
      // add one bias value per output feature map
      caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, 1.,
          this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(), 1.,
          top_data + (*top)[0]->offset(n));
    }
  }
  return Dtype(0.);

29 Conv layer::Forward()
  // same as before, but with grouped convolutions: each group multiplies its own
  // slice of the weights with its own slice of the lowered input
  for (int n = 0; n < num_; ++n) {
    im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
        kernel_size_, pad_, stride_, col_data);
    for (int g = 0; g < group_; ++g) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_, (Dtype)1.,
          weight + weight_offset * g, col_data + col_offset * g, (Dtype)0.,
          top_data + (*top)[0]->offset(n) + top_offset * g);
    }
    if (bias_term_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, (Dtype)1.,
          this->blobs_[1]->cpu_data(),
          reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()), (Dtype)1.,
          top_data + (*top)[0]->offset(n));
    }
  }

30 Convolutional Layer : im2col
Implementation of the Convolutional Layer is based on 2 tricks:
  reduction of convolution to a matrix-matrix multiply (im2col)
  using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to matrix-matrix multiply, im2col( ), first; see the sketch below. We will talk about BLAS in detail later.
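As an illustration, a minimal single-channel im2col sketch (stride 1, no padding; the function and argument names are made up for this example, while Caffe's real im2col_cpu also handles multiple channels, padding and stride):

  // X: H x W input, K x K kernel; col: (K*K) x (out_h*out_w) matrix.
  // Column j of col holds the input patch that produces output pixel j.
  void im2col_simple(const float* X, int H, int W, int K, float* col) {
    int out_h = H - K + 1, out_w = W - K + 1;
    for (int p = 0; p < K; ++p)
      for (int q = 0; q < K; ++q)
        for (int y = 0; y < out_h; ++y)
          for (int x = 0; x < out_w; ++x)
            col[(p * K + q) * (out_h * out_w) + y * out_w + x] = X[(y + p) * W + (x + q)];
  }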

31 Convolutional Layer : im2col
Example: a 2x2 kernel W is convolved with a 3x3 input X to give a 2x2 output Y:
W = [ W(1,1) W(1,2) ; W(2,1) W(2,2) ]
X = [ X(1,1) X(1,2) X(1,3) ; X(2,1) X(2,2) X(2,3) ; X(3,1) X(3,2) X(3,3) ]
Y = [ Y(1,1) Y(1,2) ; Y(2,1) Y(2,2) ]

32 Convolutional Layer : im2col
Same W, X and Y as above: each 2x2 window of X, multiplied element-wise by W and summed, gives one element of Y.

33 Convolutional Layer : im2col
The first output element as a dot product of the flattened kernel with the first patch:
[ W(1,1) W(1,2) W(2,1) W(2,2) ] · [ X(1,1) X(1,2) X(2,1) X(2,2) ]ᵀ = Y(1,1)

34 Convolutional Layer : im2col
Stacking all four patches as columns turns the whole convolution into a single matrix product:
[ W(1,1) W(1,2) W(2,1) W(2,2) ]  ×
[ X(1,1) X(1,2) X(2,1) X(2,2)
  X(1,2) X(1,3) X(2,2) X(2,3)
  X(2,1) X(2,2) X(3,1) X(3,2)
  X(2,2) X(2,3) X(3,2) X(3,3) ]
= [ Y(1,1) Y(1,2) Y(2,1) Y(2,2) ]
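That product is exactly what one BLAS gemm call computes; a sketch (Caffe wraps this in caffe_cpu_gemm; W_row, col and Y_row here are hypothetical flat arrays laid out as above):

  #include <cblas.h>

  void conv_as_gemm(const float W_row[4], const float col[16], float Y_row[4]) {
    // Y_row (1x4) = W_row (1x4) * col (4x4): one gemm replaces the whole convolution
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                /*M=*/1, /*N=*/4, /*K=*/4,
                1.0f, W_row, 4, col, 4, 0.0f, Y_row, 4);
  }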

35 Convolutional Layer : im2col
See Chellapilla, “High Performance Convolutional Neural Networks for Document Processing” for more details:

36 Convolutional Layer: im2col

37 Im2col: Exercise
Work out these 2 cases based on the example above:
  2 input features, 1 output feature
  2 input features, 3 output features

