Lecture 1c: Caffe - Getting Started


Lecture 1c: Caffe - Getting Started boris.ginsburg@gmail.com

Agenda
Caffe installation
Caffe internals: import of dataset, test description, network topology definition, layers, internal data format
MNIST training
Digits
Implementation details of the convolutional layer

Exercises & Projects
Build caffe:
  change to CPU-only
  change atlas to openblas
Play with MNIST topologies & layers:
  How does the net accuracy depend on topology?
  What will happen if we replace ReLU with tanh?
  Add a normalization layer
Extra: look at the definition of the following layers: Maxout, normalization layer
Convolutional layer internals

Open-source Deep Learning libraries
http://caffe.berkeleyvision.org/ : C++/CUDA, Python and Matlab wrappers, easily extendable; NVIDIA Digits
http://torch.ch/ : excellent tutorial, C++/CUDA, Lua
https://code.google.com/p/cuda-convnet2/
http://deeplearning.net/software/pylearn2/ : integrated with Theano, C++/CUDA, Python
http://torontodeeplearning.github.io/convnet/

Caffe: installation
HW: https://timdettmers.wordpress.com/2015/03/09/deep-learning-hardware-guide/ (Nvidia GPU optional; Caffe can run on CPU only)
OS: Ubuntu 14.04 (12.04), OS X (Windows: unofficial)
CUDA 7.0 (6.5) + cuDNN
Install caffe: http://caffe.berkeleyvision.org/ works out of the box

Caffe: make
Get the caffe source:
$ git clone https://github.com/BVLC/caffe
Copy Makefile.config.example to Makefile.config and run:
$ make -j
Tutorials (MNIST, CIFAR-10, Imagenet):
http://caffe.berkeleyvision.org/gathered/examples/mnist.html
http://caffe.berkeleyvision.org/gathered/examples/cifar10.html
To build caffe for CPU only, set CPU_ONLY and switch BLAS from atlas to openblas in Makefile.config (sketch below).
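For reference, a minimal CPU-only build might look like the following (a sketch assuming the stock Makefile.config.example shipped with Caffe; exact option names can differ between Caffe versions):

$ git clone https://github.com/BVLC/caffe
$ cd caffe
$ cp Makefile.config.example Makefile.config
# In Makefile.config:
#   uncomment  CPU_ONLY := 1     # build without CUDA
#   set        BLAS := open      # use OpenBLAS instead of ATLAS
$ make -j8                       # build the library and tools
$ make test && make runtest      # optional: build and run the unit tests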

Homework
Play with MNIST topologies & layers:
  How does accuracy depend on topology?
  How do speed and accuracy depend on batch size?
  What will happen if we replace ReLU with tanh?
  Add a normalization layer
Extra: look at the implementation of the following layers: data layer, convolutional layer, inner product (IP) layer, soft-max and accuracy

Caffe: example 1 - MNIST Database: http://yann.lecun.com/exdb/lenet/index.html

Caffe: Step 1 - import datasets
Three examples of data conversion from images to caffe:
MNIST: dataset http://yann.lecun.com/exdb/mnist/ , converter …/examples/mnist/convert_mnist_data.cpp
CIFAR-10: dataset http://www.cs.toronto.edu/~kriz/cifar.html , converter …/examples/cifar10/convert_cifar_data.cpp
Imagenet: dataset http://www.image-net.org/challenges/LSVRC/2014/ , converter …tools/convert_imageset.cpp
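For MNIST the conversion is wrapped in helper scripts; a typical sequence (assuming the standard scripts shipped with Caffe, run from the caffe root directory) looks roughly like:

$ ./data/mnist/get_mnist.sh         # download the raw MNIST files
$ ./examples/mnist/create_mnist.sh  # convert them into the database the data layer reads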

Caffe: Database format
leveldb: https://code.google.com/p/leveldb/
  <key, value>: arbitrary byte arrays; data is stored sorted by key; callers can provide a custom comparison function to override the sort order
  basic operations: Put(key, value), Get(key), Delete(key)
lmdb: http://symas.com/mdb/ (in the dev branch)
  <key, value>; data is stored sorted by key
  uses memory-mapped files: the read performance of an in-memory db while still offering the persistence of a standard disk-based db; supports concurrent access
HDF5 (Hierarchical Data Format 5): http://www.hdfgroup.org/HDF5/
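As an illustration of the key/value interface, a minimal sketch using the public leveldb C++ API (the database path and key here are made up for the example):

#include <cassert>
#include <string>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  // Open (or create) a database directory (hypothetical path).
  leveldb::Status s = leveldb::DB::Open(options, "/tmp/example-db", &db);
  assert(s.ok());

  // Keys and values are arbitrary byte arrays; Caffe stores serialized
  // Datum protobufs as values, keyed by the image index.
  s = db->Put(leveldb::WriteOptions(), "00000001", "serialized datum bytes");
  std::string value;
  s = db->Get(leveldb::ReadOptions(), "00000001", &value);
  s = db->Delete(leveldb::WriteOptions(), "00000001");

  delete db;
  return 0;
}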

Caffe: configuration files
You should define the train and test configuration:
Solver parameters file: …/examples/mnist/lenet_solver.prototxt
Network descriptor for training and testing: …/examples/mnist/lenet_train_test.prototxt
The descriptors for all parameters are defined in …/src/caffe/proto/caffe.proto. The protobuf tool (Google protocol buffers) generates C++ classes from these descriptors: https://developers.google.com/protocol-buffers/docs/overview
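For orientation, a solver file for LeNet typically contains entries like the following (a sketch modeled on examples/mnist/lenet_solver.prototxt; the exact values in your checkout may differ):

net: "examples/mnist/lenet_train_test.prototxt"   # network descriptor
test_iter: 100        # number of test batches per test phase
test_interval: 500    # test every 500 training iterations
base_lr: 0.01         # starting learning rate
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"      # learning rate decay policy
gamma: 0.0001
power: 0.75
display: 100          # print the loss every 100 iterations
max_iter: 10000
snapshot: 5000        # save intermediate models
snapshot_prefix: "examples/mnist/lenet"
solver_mode: CPU      # or GPU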

LeNet topology
Data Layer -> Convolutional layer [5x5] -> Pooling [2x2, stride 2] -> Convolutional layer [5x5] -> Pooling [2x2, stride 2] -> Inner Product (ip1) -> ReLU -> Inner Product (ip2) -> Soft Max
FORWARD runs from the data layer up to the loss; BACKWARD propagates gradients in the opposite direction.

Layer::Forward( )
class Layer {
  Setup(bottom, top);     // initialize layer
  Forward(bottom, top);   // compute next layer
  Backward(top, bottom);  // compute gradient
}
Layer::Forward( ) propagates y_{l-1} to the next layer: y_l = f(w_l, y_{l-1})
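To make the interface concrete, here is a minimal sketch of a toy "identity" layer written against the simplified interface above (not the exact signatures of the real caffe::Layer class; it assumes the Blob accessors introduced on the Blob slide below, plus a count() accessor for count_):

#include <vector>

template <typename Dtype>
class IdentityLayer {
 public:
  void Setup(const std::vector<Blob<Dtype>*>& bottom,
             std::vector<Blob<Dtype>*>* top) {
    // Reshape the top blob to match the bottom blob, allocate buffers, etc.
  }
  void Forward(const std::vector<Blob<Dtype>*>& bottom,
               std::vector<Blob<Dtype>*>* top) {
    // y_l = f(w_l, y_{l-1}); for the identity layer f is just a copy.
    const Dtype* in = bottom[0]->cpu_data();
    Dtype* out = (*top)[0]->mutable_cpu_data();
    for (int i = 0; i < bottom[0]->count(); ++i) out[i] = in[i];
  }
  void Backward(const std::vector<Blob<Dtype>*>& top,
                std::vector<Blob<Dtype>*>* bottom) {
    // dE/dy_{l-1} = dE/dy_l for the identity layer (copy the diff back).
  }
};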

Data Layer
Reads from the mnist-train-leveldb database and produces two top blobs: "data" and "label".
name: "mnist"
type: DATA
data_param {
  source: "mnist-train-leveldb"
  batch_size: 64
  scale: 0.00390625
}
top: "data"
top: "label"

Blob
All data is stored as BLOBs – Binary (Basic) Large Objects:
class Blob {
  Blob(int num, int channels, int height, int width);
  const Dtype* cpu_data() const;
  const Dtype* gpu_data() const;
  …
 protected:
  shared_ptr<SyncedMemory> data_;  // container for cpu_ / gpu_ memory
  shared_ptr<SyncedMemory> diff_;  // gradient
  int num_;
  int channels_;
  int height_;
  int width_;
  int count_;
}
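A small usage sketch with hypothetical dimensions; it relies only on the constructor and accessors listed above, plus mutable_cpu_data() (which appears in the convolution code later) and a count() accessor assumed for the count_ field:

// A batch of 64 MNIST images: 64 x 1 channel x 28 x 28 pixels.
Blob<float> images(64, 1, 28, 28);

// Fill the data on the CPU...
float* data = images.mutable_cpu_data();
for (int i = 0; i < images.count(); ++i) data[i] = 0.0f;

// ...and read it back; gpu_data() would give the same contents on the GPU,
// with SyncedMemory taking care of the copy.
const float* cpu = images.cpu_data();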

SyncedMemory
class SyncedMemory {
 public:
  SyncedMemory();
  const void* cpu_data();
  const void* gpu_data();
  void* mutable_cpu_data();
  void* mutable_gpu_data();
  enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };
  SyncedHead head() { return head_; }
  size_t size() { return size_; }

SyncedMemory
class SyncedMemory {
  …
 private:
  void to_cpu();
  void to_gpu();
  void* cpu_ptr_;
  void* gpu_ptr_;
  size_t size_;
  SyncedHead head_;
};  // class SyncedMemory
head_ tracks where the most recent copy of the data lives; to_cpu() / to_gpu() copy lazily, only when the data is requested on the other device.
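A usage sketch of the lazy synchronization this class provides (hypothetical calling code built only from the methods shown above; in the real Caffe class the constructor also takes the buffer size):

SyncedMemory mem;                   // head_ == UNINITIALIZED
void* c = mem.mutable_cpu_data();   // allocates CPU memory, head_ -> HEAD_AT_CPU
// ... fill the CPU buffer ...
const void* g = mem.gpu_data();     // to_gpu(): copies CPU -> GPU, head_ -> SYNCED
void* g2 = mem.mutable_gpu_data();  // head_ -> HEAD_AT_GPU (CPU copy now stale)
const void* c2 = mem.cpu_data();    // to_cpu(): copies GPU -> CPU, head_ -> SYNCED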

Convolutional Layer
Takes the "data" blob as input and produces the "conv1" feature maps (20 of them).
name: "conv1"
type: CONVOLUTION
blobs_lr: 1.
convolution_param {
  num_output: 20
  kernelsize: 5
  stride: 1
  weight_filler { type: "xavier" }
  bias_filler { type: "constant" }
}
bottom: "data"
top: "conv1"

Convolutional Layer
// N output feature maps, M input feature maps, K x K kernel;
// output size X x Y (for stride 1 and no padding: X = X_in - K + 1, Y = Y_in - K + 1)
for (n = 0; n < N; n++)
  for (m = 0; m < M; m++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        for (p = 0; p < K; p++)
          for (q = 0; q < K; q++)
            yL(n; x, y) += yL-1(m, x+p, y+q) * w(m, n; p, q);
Add bias…
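The same loop nest, written out as a small self-contained function (a sketch with made-up array layouts: feature maps stored as contiguous row-major planes; this is not Caffe code, and the bias is omitted as on the slide):

// Naive direct convolution: N output maps, M input maps, KxK kernel, stride 1.
// in:  M x H x W planes, w: M x N x K x K,
// out: N x (H-K+1) x (W-K+1), assumed zero-initialized.
void conv_forward(const float* in, const float* w, float* out,
                  int M, int N, int H, int W, int K) {
  const int HO = H - K + 1, WO = W - K + 1;
  for (int n = 0; n < N; ++n)
    for (int m = 0; m < M; ++m)
      for (int y = 0; y < HO; ++y)
        for (int x = 0; x < WO; ++x)
          for (int p = 0; p < K; ++p)
            for (int q = 0; q < K; ++q)
              out[(n * HO + y) * WO + x] +=
                  in[(m * H + (y + p)) * W + (x + q)] *
                  w[((m * N + n) * K + p) * K + q];
}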

Pooling Layer
name: "pool1"
type: POOLING
pooling_param {
  kernel_size: 2
  stride: 2
  pool: MAX
}
bottom: "conv1"
top: "pool1"

Max pooling with kernel k and stride s:
for (p = 0; p < k; p++)
  for (q = 0; q < k; q++)
    yL(x, y) = max( yL(x, y), yL-1(x*s + p, y*s + q) );

Pooling helps to extract features that are increasingly invariant to local transformations of the input image.

Inner product (Fully Connected) Layer
name: "ip1"
type: INNER_PRODUCT
blobs_lr: 1.
blobs_lr: 2.
inner_product_param {
  num_output: 500
  weight_filler { type: "xavier" }
  bias_filler { type: "constant" }
}
bottom: "pool2"
top: "ip1"

YL(n) = ∑m WL(n, m) * YL-1(m)

ReLU Layer
layers {
  name: "relu1"
  type: RELU
  bottom: "ip1"
  top: "ip1"
}

YL(n; x, y) = max( YL-1(n; x, y), 0 );

SoftMax + Loss Layer
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
}

Combines the softmax over the class scores X[0..9]:
YL[i] = exp( YL-1[i] ) / ∑j exp( YL-1[j] )
with the log-loss:
E = - log( YL[ label(n) ] )
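A sketch of how these two formulas are evaluated in practice (plain C++, not Caffe's implementation; the maximum score is subtracted before exponentiating, a standard trick for numerical stability):

#include <algorithm>
#include <cmath>
#include <vector>

// scores = YL-1[0..9]; returns E = -log(softmax(scores)[label]).
float softmax_log_loss(const std::vector<float>& scores, int label) {
  const float m = *std::max_element(scores.begin(), scores.end());
  float sum = 0.f;
  for (float s : scores) sum += std::exp(s - m);
  // log(softmax[label]) = (scores[label] - m) - log(sum)
  return -((scores[label] - m) - std::log(sum));
}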

Digits
https://developer.nvidia.com/digits
A nice development environment built on top of caffe:
Easy installation
Manage datasets and models
Visualize training
Look inside layers
Open source

Backup: Convolutional layer internals

Conv layer::Forward()
template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();
  Dtype* top_data = (*top)[0]->mutable_cpu_data();
  const Dtype* weight = this->blobs_[0]->cpu_data();
  Dtype* col_data = col_buffer_.mutable_cpu_data();
  ….

Conv layer::Forward()
for (int n = 0; n < num_; ++n) {
  // unroll the n-th image into the column buffer
  im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
      kernel_size_, pad_, stride_, col_data);
  // convolution as a matrix-matrix product:
  // (M_ x K_) weight times (K_ x N_) col_data gives the M_ x N_ output
  // (one row per output feature map, one column per output position)
  caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_, 1.,
      weight, col_data, 0., top_data + (*top)[0]->offset(n));
  if (bias_term_) {
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1, 1.,
        this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(), 1.,
        top_data + (*top)[0]->offset(n));
  }
}
return Dtype(0.);

Conv layer::Forward() (with groups)
for (int n = 0; n < num_; ++n) {
  im2col_cpu(bottom_data + bottom[0]->offset(n), channels_, height_, width_,
      kernel_size_, pad_, stride_, col_data);
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
        (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
        (Dtype)0., top_data + (*top)[0]->offset(n) + top_offset * g);
  }
  if (bias_term_) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
        (Dtype)1., this->blobs_[1]->cpu_data(),
        reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()),
        (Dtype)1., top_data + (*top)[0]->offset(n));
  }
}

Convolutional Layer: im2col
The implementation of the convolutional layer is based on 2 tricks:
reduction of the convolution to a matrix-matrix multiply (im2col)
using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's first discuss the reduction to a matrix-matrix multiply: im2col( ). We will talk about BLAS in detail later.

Convolutional Layer: im2col
Worked example: a 2x2 kernel W convolved with a 3x3 input X produces a 2x2 output Y:
W = [ W(1,1) W(1,2) ; W(2,1) W(2,2) ]
X = [ X(1,1) X(1,2) X(1,3) ; X(2,1) X(2,2) X(2,3) ; X(3,1) X(3,2) X(3,3) ]
Y = [ Y(1,1) Y(1,2) ; Y(2,1) Y(2,2) ]

Convolutional Layer: im2col
The kernel is flattened into a row vector and each 2x2 patch of X is unrolled into a column, so for example:
Y(1,1) = [ W(1,1) W(1,2) W(2,1) W(2,2) ] * [ X(1,1) X(1,2) X(2,1) X(2,2) ]^T

Convolutional Layer: im2col
Doing this for all four output positions turns the convolution into one matrix-matrix multiply:
[ Y(1,1) Y(1,2) Y(2,1) Y(2,2) ] = [ W(1,1) W(1,2) W(2,1) W(2,2) ] *
  [ X(1,1) X(1,2) X(2,1) X(2,2)
    X(1,2) X(1,3) X(2,2) X(2,3)
    X(2,1) X(2,2) X(3,1) X(3,2)
    X(2,2) X(2,3) X(3,2) X(3,3) ]
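The same unrolling, written as code: a minimal single-channel sketch (not Caffe's im2col_cpu, which additionally handles multiple channels, padding and stride; row-major layout assumed):

// Unroll every KxK patch of an HxW image into one column of `col`,
// which has K*K rows and (H-K+1)*(W-K+1) columns. After this step,
// convolution with a flattened 1 x K*K kernel row is a single
// matrix-matrix product, as in the example above.
void im2col_simple(const float* im, int H, int W, int K, float* col) {
  const int HO = H - K + 1, WO = W - K + 1;
  for (int p = 0; p < K; ++p)
    for (int q = 0; q < K; ++q)        // row index in col: p*K + q
      for (int y = 0; y < HO; ++y)
        for (int x = 0; x < WO; ++x)   // column index: y*WO + x
          col[(p * K + q) * (HO * WO) + (y * WO + x)] = im[(y + p) * W + (x + q)];
}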

Convolutional Layer: im2col
See Chellapilla et al., "High Performance Convolutional Neural Networks for Document Processing" for more details.

Convolutional Layer: im2col

Im2col: Exercise
Work out these 2 cases based on the example above:
2 input features, 1 output feature
2 input features, 3 output features