Lecture 1c: Caffe - Getting Started
boris.ginsburg@gmail.com
Agenda
- Caffe installation
- Caffe internals
  - Import of a dataset
  - Test description
  - Network topology definition; layers
  - Internal data format
- MNIST training; DIGITS
- Implementation details of the convolutional layer
Exercises & Projects
- Build Caffe
  - Change to CPU-only mode
  - Change ATLAS to OpenBLAS
- Play with MNIST topologies & layers
  - How does the net accuracy depend on the topology?
  - What will happen if we replace ReLU with tanh?
  - Add a normalization layer
- Extra: look at the definition of the following layers: Maxout, normalization layer
- Convolutional layer internals
Open-source Deep Learning libraries
- Caffe: http://caffe.berkeleyvision.org/ - C++/CUDA, Python and MATLAB wrappers, easily extendable; NVIDIA DIGITS
- Torch: http://torch.ch/ - excellent tutorial, C++/CUDA, Lua
- cuda-convnet2: https://code.google.com/p/cuda-convnet2/
- pylearn2: http://deeplearning.net/software/pylearn2/ - integrated with Theano, C++/CUDA, Python
- ConvNet: http://torontodeeplearning.github.io/convnet/
Caffe: installation
- Hardware: https://timdettmers.wordpress.com/2015/03/09/deep-learning-hardware-guide/
  - NVIDIA GPU is optional; Caffe can run on CPU only
- OS: Ubuntu 14.04 (12.04), OS X (Windows: unofficial)
- CUDA 7.0 (6.5) + cuDNN
- Install Caffe: http://caffe.berkeleyvision.org/ (works out of the box)
Caffe: make
- Get the Caffe source:
  $ git clone https://github.com/BVLC/caffe
- Copy Makefile.config.example to Makefile.config and run:
  $ make -j
- Tutorials (MNIST, CIFAR-10, ImageNet):
  http://caffe.berkeleyvision.org/gathered/examples/mnist.html
  http://caffe.berkeleyvision.org/gathered/examples/cifar10.html
- To build Caffe for CPU only: in Makefile.config uncomment CPU_ONLY := 1 and set BLAS := open (OpenBLAS)
Homework
- Play with MNIST topologies & layers:
  - How does accuracy depend on the topology?
  - How do speed and accuracy depend on the batch size?
  - What will happen if we replace ReLU with tanh?
  - Add a normalization layer
- Extra: look at the implementation of the following layers:
  - Data layer
  - Convolutional layer
  - IP (inner product) layer
  - Softmax and accuracy
Caffe: example 1 - MNIST
Database: http://yann.lecun.com/exdb/lenet/index.html
Caffe: Step 1 - import datasets
Three examples of data conversion from images to Caffe format:
- MNIST: dataset http://yann.lecun.com/exdb/mnist/
  …/examples/mnist/convert_mnist_data.cpp
- CIFAR-10: dataset http://www.cs.toronto.edu/~kriz/cifar.html
  …/examples/cifar10/convert_cifar_data.cpp
- ImageNet: dataset http://www.image-net.org/challenges/LSVRC/2014/
  …/tools/convert_imageset.cpp
Caffe: database formats
- leveldb: https://code.google.com/p/leveldb/
  - <key, value>: keys and values are arbitrary byte arrays; data is stored sorted by key; callers can provide a custom comparison function to override the sort order
  - basic operations: Put(key, value), Get(key), Delete(key)
- lmdb: http://symas.com/mdb/ (in the dev branch)
  - <key, value>; data is stored sorted by key
  - uses memory-mapped files: the read performance of an in-memory DB while still offering the persistence of a standard disk-based DB; supports concurrent access
- HDF5 (Hierarchical Data Format 5): http://www.hdfgroup.org/HDF5/
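For illustration, a minimal sketch of the leveldb operations listed above, following the leveldb documentation (the path /tmp/testdb is just an example):

#include <cassert>
#include <string>
#include "leveldb/db.h"

int main() {
  // Open (or create) a database; the path is arbitrary.
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
  assert(status.ok());

  // Put / Get / Delete with keys and values as arbitrary byte arrays.
  status = db->Put(leveldb::WriteOptions(), "key1", "value1");
  std::string value;
  status = db->Get(leveldb::ReadOptions(), "key1", &value);
  status = db->Delete(leveldb::WriteOptions(), "key1");

  delete db;
  return 0;
}

Caffe's convert_mnist_data tool uses the same Put() call, with the serialized image+label protobuf as the value.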
Caffe: configuration files
You should define the train and test configuration:
- Solver parameters file: …/examples/mnist/lenet_solver.prototxt
- Network descriptor for training and testing: …/examples/mnist/lenet_train_test.prototxt
The message descriptors for these parameters are in …/src/caffe/proto/caffe.proto. The protobuf compiler (Google Protocol Buffers) generates C++ classes from these descriptors: https://developers.google.com/protocol-buffers/docs/overview
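As a small illustration of those generated classes, the sketch below parses a text-format network descriptor with the protobuf TextFormat API. It assumes the 2015-era caffe.proto (with the layers field used in these slides) and that the generated header caffe.pb.h is on the include path:

#include <cstdio>
#include <string>
#include <google/protobuf/text_format.h>
#include "caffe.pb.h"   // generated by protoc from caffe.proto

int main() {
  // A tiny text-format (prototxt) snippet, just for illustration.
  const std::string prototxt =
      "name: \"LeNet\"\n"
      "layers { name: \"relu1\" type: RELU bottom: \"ip1\" top: \"ip1\" }\n";

  // Parse it into the generated C++ class caffe::NetParameter.
  caffe::NetParameter net_param;
  bool ok = google::protobuf::TextFormat::ParseFromString(prototxt, &net_param);

  // The generated accessors follow directly from the fields in caffe.proto.
  if (ok) {
    printf("net: %s, layers: %d\n", net_param.name().c_str(), net_param.layers_size());
  }
  return 0;
}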
LeNet topology (FORWARD: bottom to top; BACKWARD: top to bottom)
  SoftMax
  Inner Product
  ReLU
  Pooling [2x2, stride 2]
  Convolutional layer [5x5]
  Pooling [2x2, stride 2]
  Convolutional layer [5x5]
  Data Layer
Layer::Forward()

class Layer {
  Setup(bottom, top);      // initialize layer
  Forward(bottom, top);    // compute next layer
  Backward(top, bottom);   // compute gradient
}

Layer::Forward() propagates y_l-1 to the next layer:
  y_l = f(w_l, y_l-1)
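A minimal, self-contained sketch of this contract, assuming plain float vectors instead of Caffe's Blob pointers (illustration only, not the real Caffe base class):

#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the Setup/Forward contract on plain vectors (not Caffe's real API,
// which passes vectors of Blob<Dtype>* for bottom and top).
class Layer {
 public:
  virtual ~Layer() {}
  virtual void Setup(const std::vector<float>& bottom, std::vector<float>* top) {}      // initialize layer
  virtual void Forward(const std::vector<float>& bottom, std::vector<float>* top) = 0;  // y_l = f(w_l, y_l-1)
};

// Toy example: a ReLU-style layer, y_l = max(y_l-1, 0).
class ToyReLULayer : public Layer {
 public:
  void Forward(const std::vector<float>& bottom, std::vector<float>* top) override {
    top->resize(bottom.size());
    for (std::size_t i = 0; i < bottom.size(); ++i)
      (*top)[i] = std::max(bottom[i], 0.0f);
  }
};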
Data Layer

name: "mnist"
type: DATA
data_param {
  source: "mnist-train-leveldb"
  batch_size: 64
  scale: 0.00390625
}
top: "data"
top: "label"

The data layer reads images and labels from the mnist-train-leveldb database, scales the pixels by 0.00390625 (= 1/256), and outputs two blobs: "data" and "label".
Blob
All data is stored as BLOBs - Binary (Basic) Large Objects:

class Blob {
  Blob(int num, int channels, int height, int width);
  const Dtype* cpu_data() const;
  const Dtype* gpu_data() const;
  …
 protected:
  shared_ptr<SyncedMemory> data_;  // container for cpu_ / gpu_ memory
  shared_ptr<SyncedMemory> diff_;  // gradient
  int num_;
  int channels_;
  int height_;
  int width_;
  int count_;
}
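The four dimensions are stored contiguously in (num, channels, height, width) order; the index arithmetic below mirrors what Caffe's Blob::offset(n, c, h, w) computes (the MNIST sizes are just an example):

#include <cstdio>

// Index arithmetic for a 4-D blob stored contiguously in (num, channels, height, width) order.
inline int blob_offset(int n, int c, int h, int w,
                       int channels, int height, int width) {
  return ((n * channels + c) * height + h) * width + w;
}

int main() {
  // Example: a batch of 64 MNIST images, 1 channel, 28x28 pixels.
  const int num = 64, channels = 1, height = 28, width = 28;
  const int count = num * channels * height * width;            // total number of elements
  int idx = blob_offset(3, 0, 10, 7, channels, height, width);  // pixel (10,7) of image 3
  printf("count = %d, offset = %d\n", count, idx);
  return 0;
}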
SyncedMemory

class SyncedMemory {
 public:
  SyncedMemory();
  const void* cpu_data();
  const void* gpu_data();
  void* mutable_cpu_data();
  void* mutable_gpu_data();
  enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };
  SyncedHead head() { return head_; }
  size_t size() { return size_; }
SyncedMemory (continued)

class SyncedMemory {
  …
 private:
  void to_cpu();   // bring the CPU copy up to date
  void to_gpu();   // bring the GPU copy up to date
  void* cpu_ptr_;
  void* gpu_ptr_;
  size_t size_;
  SyncedHead head_;
};  // class SyncedMemory
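The head_ field implements lazy synchronization: data is copied between host and device only when the other side is actually accessed. A simplified sketch of that idea (my own illustration on plain pointers, not the actual Caffe source):

#include <cstddef>

// Illustration of lazy CPU/GPU synchronization; the real SyncedMemory also
// allocates memory and uses cudaMemcpy with CUDA error checking.
enum SyncedHead { UNINITIALIZED, HEAD_AT_CPU, HEAD_AT_GPU, SYNCED };

struct SyncedMemorySketch {
  void* cpu_ptr_ = nullptr;
  void* gpu_ptr_ = nullptr;
  std::size_t size_ = 0;
  SyncedHead head_ = UNINITIALIZED;

  const void* cpu_data() {       // read access: make sure the CPU copy is fresh
    if (head_ == UNINITIALIZED) {
      // allocate cpu_ptr_ here; head_ = HEAD_AT_CPU
    } else if (head_ == HEAD_AT_GPU) {
      // copy gpu_ptr_ -> cpu_ptr_ (cudaMemcpy in the real code)
      head_ = SYNCED;
    }
    return cpu_ptr_;
  }
  void* mutable_cpu_data() {     // write access: the CPU now owns the latest data
    cpu_data();
    head_ = HEAD_AT_CPU;
    return cpu_ptr_;
  }
  // gpu_data() / mutable_gpu_data() mirror this logic with the roles swapped.
};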
Convolutional Layer

name: "conv1"
type: CONVOLUTION
blobs_lr: 1.
convolution_param {
  num_output: 20
  kernelsize: 5
  stride: 1
  weight_filler { type: "xavier" }
  bias_filler { type: "constant" }
}
bottom: "data"
top: "conv1"
Convolutional Layer

// N output feature maps, M input feature maps, Y x X output size, K x K kernel
for (n = 0; n < N; n++)
  for (m = 0; m < M; m++)
    for (y = 0; y < Y; y++)
      for (x = 0; x < X; x++)
        for (p = 0; p < K; p++)
          for (q = 0; q < K; q++)
            yL(n; x, y) += yL-1(m; x+p, y+q) * w(m, n; p, q);
Add bias…
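The same nested loop, written as a self-contained C++ sketch on flat arrays (the dimension names and the bias handling are illustrative assumptions; the real layer also supports padding, stride and groups):

#include <vector>

// Direct (naive) convolution: M input maps of size H x W, N output maps,
// K x K kernels, stride 1, no padding. Output size is (H-K+1) x (W-K+1).
void conv_forward(const std::vector<float>& in,    // M * H * W
                  const std::vector<float>& w,     // N * M * K * K
                  const std::vector<float>& bias,  // N
                  std::vector<float>& out,         // N * Ho * Wo
                  int M, int H, int W, int N, int K) {
  const int Ho = H - K + 1, Wo = W - K + 1;
  out.assign(N * Ho * Wo, 0.0f);
  for (int n = 0; n < N; ++n)
    for (int m = 0; m < M; ++m)
      for (int y = 0; y < Ho; ++y)
        for (int x = 0; x < Wo; ++x)
          for (int p = 0; p < K; ++p)
            for (int q = 0; q < K; ++q)
              out[(n * Ho + y) * Wo + x] +=
                  in[(m * H + y + p) * W + x + q] *
                  w[((n * M + m) * K + p) * K + q];
  // Add bias: one value per output feature map.
  for (int n = 0; n < N; ++n)
    for (int i = 0; i < Ho * Wo; ++i)
      out[n * Ho * Wo + i] += bias[n];
}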
Pooling Layer

name: "pool1"
type: POOLING
pooling_param {
  kernel_size: 2
  stride: 2
  pool: MAX
}
bottom: "conv1"
top: "pool1"

for (p = 0; p < k; p++)
  for (q = 0; q < k; q++)
    yL(x, y) = max( yL(x, y), yL-1(x*s + p, y*s + q) );

Pooling helps to extract features that are increasingly invariant to local transformations of the input image.
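A minimal sketch of max pooling over a single feature map, following the loop above (assuming no padding and that the window fits the input):

#include <algorithm>
#include <vector>

// Max pooling over one feature map: k x k window, stride s, no padding.
// Output size is ((H - k) / s + 1) x ((W - k) / s + 1).
void max_pool(const std::vector<float>& in, int H, int W,
              int k, int s, std::vector<float>& out) {
  const int Ho = (H - k) / s + 1, Wo = (W - k) / s + 1;
  out.assign(Ho * Wo, -1e30f);   // start from a very small value
  for (int y = 0; y < Ho; ++y)
    for (int x = 0; x < Wo; ++x)
      for (int p = 0; p < k; ++p)
        for (int q = 0; q < k; ++q)
          out[y * Wo + x] = std::max(out[y * Wo + x],
                                     in[(y * s + p) * W + (x * s + q)]);
}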
Inner Product (Fully Connected) Layer

name: "ip1"
type: INNER_PRODUCT
blobs_lr: 1.
blobs_lr: 2.
inner_product_param {
  num_output: 500
  weight_filler { type: "xavier" }
  bias_filler { type: "constant" }
}
bottom: "pool2"
top: "ip1"

YL(n) = ∑m WL(n, m) * YL-1(m)
ReLU Layer

layers {
  name: "relu1"
  type: RELU
  bottom: "ip1"
  top: "ip1"
}

YL(n; x, y) = max( YL-1(n; x, y), 0 );

Note that bottom and top are the same blob, so ReLU is computed in place.
SoftMax + Loss Layer

layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
}

Combines the softmax:
  YL[i] = exp( YL-1[i] ) / ∑j exp( YL-1[j] )
with the log-loss:
  E = - log( YL[label(n)] )
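A small sketch of the combined computation for one example, subtracting the maximum score before exponentiating for numerical stability (a standard trick, also used in Caffe's softmax):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Softmax over the class scores followed by the log-loss for the true label.
float softmax_loss(const std::vector<float>& scores, int label) {
  float max_score = scores[0];
  for (float s : scores) max_score = std::max(max_score, s);  // for numerical stability
  float sum = 0.0f;
  std::vector<float> prob(scores.size());
  for (std::size_t i = 0; i < scores.size(); ++i) {
    prob[i] = std::exp(scores[i] - max_score);
    sum += prob[i];
  }
  for (float& p : prob) p /= sum;   // softmax probabilities
  return -std::log(prob[label]);    // E = -log( Y[label] )
}

int main() {
  // Example: 10 class scores (as produced by ip2 for MNIST) and true label 1.
  std::vector<float> scores = {0.1f, 2.0f, -1.0f, 0.5f, 0.0f, 0.3f, 1.2f, -0.7f, 0.9f, 0.2f};
  printf("loss = %f\n", softmax_loss(scores, 1));
  return 0;
}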
DIGITS
https://developer.nvidia.com/digits
A nice development environment built on top of Caffe:
- Easy installation
- Manage datasets and models
- Visualize training
- Look inside layers
- Open source
Backup: Convolutional layer internals
Conv layer::Forward()

template <typename Dtype>
Dtype ConvolutionLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, vector<Blob<Dtype>*>* top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();   // input feature maps
  Dtype* top_data = (*top)[0]->mutable_cpu_data();    // output feature maps
  const Dtype* weight = this->blobs_[0]->cpu_data();  // convolution kernels
  Dtype* col_data = col_buffer_.mutable_cpu_data();   // im2col scratch buffer
  ….
Conv layer::Forward() (simplified: no groups)

  // M_ = number of output maps, N_ = output height * width,
  // K_ = channels_ * kernel_size_ * kernel_size_ (length of one unrolled patch)
  for (int n = 0; n < num_; ++n) {
    // unroll the n-th image into columns: each column is one patch
    im2col_cpu(bottom_data + bottom[0]->offset(n),
               channels_, height_, width_, kernel_size_, pad_, stride_, col_data);
    // one matrix-matrix multiply computes all output maps for this image
    caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, M_, N_, K_,
                   1., weight, col_data,
                   0., top_data + (*top)[0]->offset(n));
    if (bias_term_) {
      // add the bias: outer product of the bias vector with a vector of ones
      caffe_cpu_gemm(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
                     1., this->blobs_[1]->cpu_data(), bias_multiplier_->cpu_data(),
                     1., top_data + (*top)[0]->offset(n));
    }
  }
  return Dtype(0.);
Conv layer::Forward() (with groups)

  for (int n = 0; n < num_; ++n) {
    im2col_cpu(bottom_data + bottom[0]->offset(n),
               channels_, height_, width_, kernel_size_, pad_, stride_, col_data);
    // each group convolves its own subset of input and output channels
    for (int g = 0; g < group_; ++g) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, M_, N_, K_,
          (Dtype)1., weight + weight_offset * g, col_data + col_offset * g,
          (Dtype)0., top_data + (*top)[0]->offset(n) + top_offset * g);
    }
    if (bias_term_) {
      caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, num_output_, N_, 1,
          (Dtype)1., this->blobs_[1]->cpu_data(),
          reinterpret_cast<const Dtype*>(bias_multiplier_->cpu_data()),
          (Dtype)1., top_data + (*top)[0]->offset(n));
    }
  }
Convolutional Layer: im2col
The implementation of the convolutional layer is based on two tricks:
1. reduction of convolution to a matrix-matrix multiply (im2col)
2. using BLAS gemm() for fast computation of the matrix-matrix multiply
Let's discuss the reduction to a matrix-matrix multiply: im2col(). We will talk about BLAS in detail later.
Convolutional Layer: im2col
Example: a 2x2 kernel W applied to a 3x3 input X gives a 2x2 output Y:

  W = | W(1,1) W(1,2) |    X = | X(1,1) X(1,2) X(1,3) |    Y = | Y(1,1) Y(1,2) |
      | W(2,1) W(2,2) |        | X(2,1) X(2,2) X(2,3) |        | Y(2,1) Y(2,2) |
                               | X(3,1) X(3,2) X(3,3) |

For example, Y(1,1) is the dot product of W with the top-left 2x2 patch X(1,1), X(1,2), X(2,1), X(2,2).

im2col unrolls every 2x2 patch of X into a column of a matrix, so the whole convolution becomes one matrix-matrix product of the flattened kernel with the patch matrix:

  [ W(1,1) W(1,2) W(2,1) W(2,2) ]  x  | X(1,1) X(1,2) X(2,1) X(2,2) |  =  [ Y(1,1) Y(1,2) Y(2,1) Y(2,2) ]
                                      | X(1,2) X(1,3) X(2,2) X(2,3) |
                                      | X(2,1) X(2,2) X(3,1) X(3,2) |
                                      | X(2,2) X(2,3) X(3,2) X(3,3) |
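A sketch of im2col for a single-channel input with stride 1 and no padding, producing exactly the patch matrix above (each output column is one unrolled K x K patch); the multi-channel, stride and padding handling of Caffe's im2col_cpu() is left out:

#include <vector>

// im2col for one channel, stride 1, no padding.
// Input: H x W image. Output: (K*K) x (Ho*Wo) matrix stored row-major,
// where Ho = H-K+1, Wo = W-K+1 and each column is one unrolled K x K patch.
void im2col_single(const std::vector<float>& im, int H, int W, int K,
                   std::vector<float>& col) {
  const int Ho = H - K + 1, Wo = W - K + 1;
  col.assign(K * K * Ho * Wo, 0.0f);
  for (int p = 0; p < K; ++p)
    for (int q = 0; q < K; ++q)        // (p, q): position inside the patch -> row of col
      for (int y = 0; y < Ho; ++y)
        for (int x = 0; x < Wo; ++x)   // (y, x): patch index -> column of col
          col[(p * K + q) * (Ho * Wo) + (y * Wo + x)] = im[(y + p) * W + (x + q)];
}

// The convolution is then a single matrix product:
//   Y[1 x (Ho*Wo)] = W_flat[1 x (K*K)] * col[(K*K) x (Ho*Wo)]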
Convolutional Layer: im2col
See Chellapilla, "High Performance Convolutional Neural Networks for Document Processing" for more details.
im2col: Exercise
Work out these two cases based on the example above:
- 2 input features, 1 output feature
- 2 input features, 3 output features