Mocha.jl: Deep Learning for Julia. Chiyuan Zhang (@pluskid), CSAIL, MIT

Julia Basics: a 10-minute introduction to Julia

A glance at basic syntax:

Numpy | Matlab | Julia | Description
x[0] | x(1) | x[1] | Index the first element of an array
np.random.randn(3,3) | randn(3) | randn(3,3) | 3-by-3 random Gaussian matrix
np.arange(1,11) | 1:10 | 1:10 | The range 1, 2, …, 10
X * Y | X .* Y | X .* Y | Elementwise multiplication
np.dot(X,Y) | X * Y | X * Y | Matrix multiplication
np.linalg.solve(X,Y) | X \ Y | X \ Y | Left matrix division
d = {'one':1, 'two':2}; d['one'] | d = containers.Map({'one','two'},{1,2}); d('one') | d = Dict("one"=>1, "two"=>2); d["one"] | Hash table
r = np.random.rand(*x.shape); y = x * (r > t) | r = rand(size(x)); y = x .* (r > t) | r = rand(size(x)...); y = x .* (r .> t) | Dropout
f = lambda x, mu, sigma: np.exp(-(x-mu)**2/(2*sigma**2)) / np.sqrt(2*np.pi*sigma**2) | f = @(x,mu,sigma) exp(-(x-mu)^2/(2*sigma^2)) / sqrt(2*pi*sigma^2) | f(x, μ, σ) = exp(-(x-μ)^2/(2σ^2)) / sqrt(2π*σ^2) | Gaussian density function

Beyond Syntax: Close-to-C performance in native Julia code; you typically do not need to explicitly vectorize your code (as you have been doing in Matlab). Optional type annotations, an LLVM-based just-in-time (JIT) compiler, easy parallelization with coroutines on a single machine or across the nodes of a cluster, and so on.
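Not from the slides, but a minimal sketch (function and variable names invented for illustration) of a plain, devectorized loop running at compiled speed thanks to the JIT:
# Sum of squared differences with an explicit loop; no vectorization tricks needed.
function sumsq(x::Vector{Float64}, y::Vector{Float64})
    s = 0.0
    for i in 1:length(x)
        s += (x[i] - y[i])^2
    end
    return s
end
x, y = randn(10^6), randn(10^6)
sumsq(x, y)        # the first call triggers JIT compilation
@time sumsq(x, y)  # subsequent calls run at native speed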

Convenient FFI
- Calling C / Fortran functions (see the ccall sketch below)
- Calling Python functions (PyCall.jl, PyPlot.jl, IJulia, …)
- Interacting with C++ functions / objects directly; see Cxx.jl
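As a minimal sketch (not from the slides) of the C FFI, Julia's built-in ccall takes the function symbol, the return type, a tuple of argument types, and the arguments:
# Call strlen from the C standard library directly; no wrapper code or build step is needed.
len = ccall(:strlen, Csize_t, (Cstring,), "Mocha.jl")   # returns 8 for this string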

Powerful Macros: JuMP for optimization models, OpenPPL for probabilistic programming.
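Not from the talk, but a tiny illustration of what macros enable: a user-defined macro rewrites expressions at parse time, before compilation; the modeling macros in JuMP and OpenPPL are far more elaborate versions of the same idea.
# A hypothetical macro that repeats an expression n times.
macro repeat(n, ex)
    return :(for _ in 1:$(esc(n)); $(esc(ex)); end)
end
@repeat 3 println("hello from a macro")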

Disadvantages of Julia: it is still at an early stage, so
- The ecosystem is still young (653 Julia packages vs. 66,687 PyPI packages), e.g. Images.jl still does not have a resize function…
- The core language is still evolving, e.g. the current v0.4-RC introduced a lot of breaking changes (and also exciting new features).

Mocha.jl: Deep Learning in Julia

Image sources: Li Deng and Dong Yu. Deep Learning: Methods and Applications. Zheng, Shuai, et al. "Conditional Random Fields as Recurrent Neural Networks." arXiv cs.CV (2015). Google DeepMind. "Human-level control through deep reinforcement learning." Nature, Feb. 2015. Andrej Karpathy and Li Fei-Fei. "Deep Visual-Semantic Alignments for Generating Image Descriptions." CVPR 2015.

Why is Deep Learning Successful?
Theoretical point of view: we are nowhere near a complete theoretical understanding of deep learning yet.
Practical point of view:
- Big Data: large amounts of data (thank the Internet) that are labeled (thank Amazon M-Turk) and high-dimensional (large images, videos, speech and text corpora, etc.)
- Computational power: GPUs, large clusters
- Human power: the "deep learning conspiracy"
- Software engineering: network architecture and computation components are decoupled

Layers & Back-propagation
Top: the typical way of visualizing a neural network: clear and intuitive, but it does not decompose the computation cleanly into layers.
Bottom: an alternative way of thinking about neural networks: each layer is a black box that can carry out forward and backward computation. The important point is that the computation is completely encapsulated inside the layer; the black box does NOT need to know its external environment (e.g. the overall network architecture) to do the computation.
Example: a linear layer with input X and output Y.
Forward: Y = σ(WᵀX)
Backward: ∂ℓ/∂W = (∂Y/∂W)(∂ℓ/∂Y) and ∂ℓ/∂X = (∂Y/∂X)(∂ℓ/∂Y)
More generally, a deep neural network can be viewed as a directed acyclic graph (optionally with time-delayed recurrent connections).
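To make the black-box view concrete, here is a minimal sketch in plain Julia (not Mocha.jl's actual layer API) of a linear layer that only knows how to run its own forward and backward passes:
# A hypothetical self-contained layer: it never sees the overall network architecture.
struct LinearLayer
    W::Matrix{Float64}
end
forward(l::LinearLayer, X) = l.W' * X          # Y = WᵀX (nonlinearity omitted for brevity)
function backward(l::LinearLayer, X, dY)       # dY is ∂ℓ/∂Y coming from the layer above
    dW = X * dY'                               # ∂ℓ/∂W, used to update the weights
    dX = l.W * dY                              # ∂ℓ/∂X, passed down to the layer below
    return dW, dX
end
l = LinearLayer(randn(4, 3))
Y = forward(l, randn(4, 5))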

Advantages of the decoupled view of NNs
- Highly efficient computation components can be written by programmers and experts in high-performance computing and code optimization, e.g. the cuDNN library from Nvidia.
- Researchers can try out novel architectures easily, without worrying too much about the internal implementation of commonly used layers.
Some examples of complicated networks built from standard components: Network-in-Network, Siamese Networks, Fully Convolutional Networks, etc.
Image source: J. Long, E. Shelhamer, T. Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR 2015.

Deep Learning Libraries
C++: Caffe (widely adopted in academia), dmlc/cxxnet, cuda-convnet, CNTK (by MSR), etc.
Python: Theano (auto-differentiation) and various wrappers; NervanaSystems/neon; etc.
Lua: Torch 7 (supported by Facebook and Google DeepMind)
Matlab: MatConvNet (by VGG)
Julia: pluskid/Mocha.jl
…

Why Mocha.jl?
- Julia: written in Julia, with easy interaction with the rest of the Julia ecosystem.
- Minimal dependencies: the Julia backend runs out of the box; the CUDA backend depends on Nvidia cuDNN.
- Correctness: all the computation components are unit-tested.
- Modular architecture: layers, activation functions, regularizers, network topology, solvers, etc.
- Julia compiles with LLVM, so Julia code could potentially be compiled directly to GPU devices in the future. After that, writing neural networks in Julia will be really enjoyable!

Image Classification: IJulia Demo

Mini-Tutorial: ConvNets on MNIST
MNIST: handwritten digits.
Data preparation: following convention, images are represented as a 4D tensor, width-by-height-by-channels-by-count; for MNIST this is 28-by-28-by-1-by-64 (Mocha.jl supports general ND tensors). Data are stored in the HDF5 file format, which is commonly supported by Matlab, Numpy, etc. See examples/mnist/gen-mnist.sh, and the sketch below for what writing such a file looks like.
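Not part of the original tutorial, but a minimal sketch of producing such a file from Julia with the HDF5.jl package, assuming the data layer's default dataset names "data" and "label":
using HDF5
data  = rand(Float32, 28, 28, 1, 64)   # width-by-height-by-channels-by-count
label = Float32.(rand(0:9, 1, 64))     # one label per image
h5open("data/train.hdf5", "w") do file
    write(file, "data", data)
    write(file, "label", label)
end
# data/train.txt (the data layer's source, see below) simply lists HDF5 file paths, one per line.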

Defining the Network Architecture
A network starts with data layers (inputs) and ends with prediction or loss layers.
data_layer = AsyncHDF5DataLayer(name="train-data", source="data/train.txt", batch_size=64, shuffle=true)
The source file data/train.txt lists the HDF5 files for the training set; 64 images are provided in each mini-batch; the data is shuffled to improve convergence; the async data layer uses Julia's @async to pre-read data while waiting for computation on the CPU / GPU.

Convolution Layer
conv_layer = ConvolutionLayer(name="conv1", n_filter=20, kernel=(5,5), bottoms=[:data], tops=[:conv])
Reference: LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

Pooling Layer
pool_layer = PoolingLayer(name="pool1", kernel=(2,2), stride=(2,2), bottoms=[:conv], tops=[:pool])
The pooling layer operates on the output of the convolution layer. By default, MAX pooling is performed; you can switch to MEAN pooling by specifying pooling=Pooling.Mean().

Constructing a DAG with tops and bottoms: the network architecture is determined by connecting tops (output) blobs to bottoms (input) blobs with matching blob names. Layers are automatically sorted and connected as a directed acyclic graph (DAG). The figure on the right shows the visualization of LeNet for MNIST: conv-pool x2 + dense x2.

Definition of the rest of the layers:
conv2_layer = ConvolutionLayer(name="conv2", n_filter=50, kernel=(5,5), bottoms=[:pool], tops=[:conv2])
pool2_layer = PoolingLayer(name="pool2", kernel=(2,2), stride=(2,2), bottoms=[:conv2], tops=[:pool2])
fc1_layer = InnerProductLayer(name="ip1", output_dim=500, neuron=Neurons.ReLU(), bottoms=[:pool2], tops=[:ip1])
fc2_layer = InnerProductLayer(name="ip2", output_dim=10, bottoms=[:ip1], tops=[:ip2])
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2, :label])
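With all the layers defined, they are assembled into a network by matching tops to bottoms. A sketch in the style of the Mocha.jl MNIST example (the backend object is created as shown on the backend slide below; exact calls may vary between Mocha.jl versions):
common_layers = [conv_layer, pool_layer, conv2_layer, pool2_layer, fc1_layer, fc2_layer]
# The Net constructor sorts the layers into a DAG automatically.
net = Net("MNIST-train", backend, [data_layer, common_layers..., loss_layer])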

The Stochastic Gradient Descent Solver
method = SGD()
params = make_solver_parameters(method, max_iter=10000, regu_coef=0.0005, mom_policy=MomPolicy.Fixed(0.9), lr_policy=LRPolicy.Inv(0.01, 0.0001, 0.75), load_from=exp_dir)
solver = Solver(method, params)
Solvers have many customizable parameters, including the learning-rate policy, momentum policy, etc. Advanced policies, like halving the learning rate when performance on a validation set drops, are also supported. See the Mocha.jl documentation for other available solvers.

Coffee Breaks … for the solver
setup_coffee_lounge(solver, save_into="$exp_dir/statistics.jld", every_n_iter=1000)
# report training progress every 100 iterations
add_coffee_break(solver, TrainingSummary(), every_n_iter=100)
# save snapshots every 5000 iterations
add_coffee_break(solver, Snapshot(exp_dir), every_n_iter=5000)
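Another common coffee break, sketched here after the Mocha.jl MNIST example (layer and type names taken from that example; treat this as illustrative): evaluating accuracy on a held-out test set during training.
# A second network reusing the trained layers, with a test data layer and an accuracy layer on top.
data_layer_test = AsyncHDF5DataLayer(name="test-data", source="data/test.txt", batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:ip2, :label])
test_net = Net("MNIST-test", backend, [data_layer_test, common_layers..., acc_layer])
# evaluate on the test set every 1000 iterations
add_coffee_break(solver, ValidationPerformance(test_net), every_n_iter=1000)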

Solver Statistics
Solver statistics are saved automatically if the coffee lounge is set up. Snapshots save the training progress periodically, so training can be resumed from the last snapshot after an interruption.

Switching Backends: CPU vs GPU
backend = use_gpu ? GPUBackend() : CPUBackend()
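Putting the pieces together, a sketch of the surrounding training workflow in the style of the Mocha.jl examples (exact function names may vary between versions):
init(backend)        # initialize the chosen backend before constructing networks
# ... define layers, net, solver, and coffee breaks as above ...
solve(solver, net)   # run the training loop
destroy(net)
shutdown(backend)    # release backend resources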

Thank you! http://julialang.org https://github.com/pluskid/Mocha.jl