Deep Dive with Microsoft Cognitive Toolkit (CNTK)
Cha Zhang, Principal Applied Science Manager, Microsoft AI & Research
Introduction
Microsoft Cognitive Toolkit (CNTK)
Microsoft's open-source deep-learning toolkit: https://github.com/Microsoft/CNTK
Created by Microsoft speech researchers in 2012 as the "Computational Network Toolkit"
Open-sourced on GitHub since January 2016 (MIT license)
12,000+ GitHub stars; third among all DL toolkits
140+ contributors from MIT, Stanford, NVIDIA, Intel, and many others
Microsoft Cognitive Toolkit (CNTK)
Runs over 80% of Microsoft's internal deep-learning workload:
Cortana, HoloLens, Bing and Bing Ads, Skype Translator, Windows, Office, Microsoft Cognitive Services, Microsoft Research
Internal == external: the toolkit used internally is the same one released on GitHub
Microsoft Cognitive Toolkit (CNTK)
First-class on Linux and Windows; Docker support
Rich API support: mostly implemented in C++ (train and eval); low-level + high-level Python API; R and C# APIs for train and eval; UWP, Java, and Spark support
Built-in readers for distributed learning
Keras backend support (beta)
Model compression (fast binarized evaluation)
Latest version: v2.2 (Sep. 15, 2017)
Toolkit Benchmark
Benchmarking by HKBU, version 8: http://dlbench.comp.hkbu.edu.hk/
Single Tesla K80 GPU, CUDA 8.0, cuDNN v5.1
Versions: Caffe 1.0rc5 (39f28e4), CNTK 2.0 Beta10 (1ae666d), MXNet 0.93 (32dc3a2), TensorFlow 1.0 (4ac9c09), Torch 7 (748f5e3)
Times per minibatch in ms (lower is better); the LSTM row shows the v7 benchmark result in parentheses:

                 Caffe        Cognitive Toolkit       MXNet                     TensorFlow       Torch
FCN5 (1024)      55.329 ms    51.038 ms               60.448 ms                 62.044 ms        52.154 ms
AlexNet (256)    36.815 ms    27.215 ms               28.994 ms                 103.960 ms       37.462 ms
ResNet (32)      143.987 ms   81.470 ms               84.545 ms                 181.404 ms       90.935 ms
LSTM (256)       -            43.581 ms (44.917 ms)   288.142 ms (284.898 ms)   - (223.547 ms)   1130.606 ms (906.958 ms)
Production-Ready Scalability
Toolkit Delivering Near-Linear Multi-GPU Scaling (NVIDIA)
Microsoft Cognitive Toolkit: first deep-learning framework fully optimized for Pascal
AlexNet performance: 170x faster vs. CPU server (images/sec)
[Chart: AlexNet training throughput; batch size 128, Grad Bit = 32, dual-socket E5-2699v4 CPUs (44 cores total); CNTK 2.0b3 (to be released) with cuDNN 5.1.8, NCCL 1.6.1, NVLink enabled]
Scale with NCCL 2.0 (announced at GTC, May 2017); released in CNTK v2.2 (Sep. 2017)
Scale to 1,000 GPUs
CNTK Overview
Installation
$ pip install <url>
https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine
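Once installed, a quick sanity check (a minimal sketch; the version printed depends on which wheel URL you picked from the setup page above):

python -c "import cntk; print(cntk.__version__)"
# e.g. 2.2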
Tutorials
Tutorials on Azure (Free Pre-Hosted) https://notebooks.azure.com/cntk/libraries/tutorials
https://github.com/Microsoft/CNTK
CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting relevant network types and applications.
What is CNTK?
Example: 2-hidden-layer feed-forward NN

h1 = σ(W1 x + b1)                    h1 = sigmoid(x @ W1 + b1)
h2 = σ(W2 h1 + b2)                   h2 = sigmoid(h1 @ W2 + b2)
P  = softmax(Wout h2 + bout)         P  = softmax(h2 @ Wout + bout)

with input x ∈ R^M and one-hot label y ∈ R^J, and the cross-entropy training criterion
ce = yᵀ log P                        ce = cross_entropy(P, y)
maximized over the corpus: Σ_corpus ce → max
What is CNTK?
h1 = sigmoid(x @ W1 + b1)
h2 = sigmoid(h1 @ W2 + b2)
P  = softmax(h2 @ Wout + bout)
ce = cross_entropy(P, y)
[Computation graph: x → (·W1, +b1, σ) → h1 → (·W2, +b2, σ) → h2 → (·Wout, +bout, softmax) → P → cross_entropy with y → ce]
What is CNTK?
Nodes: functions (primitives); can be composed into reusable composites
Edges: values, including tensors and sparse values
Automatic differentiation: ∂F/∂in = ∂F/∂out ∙ ∂out/∂in
Deferred computation execution engine
Graphs are editable and clonable
LEGO-like composability allows CNTK to support a wide range of networks and applications
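To make the formulas above concrete, here is a minimal sketch of the same 2-hidden-layer network with the CNTK Python layers API; the input and output dimensions (784 and 10) are hypothetical placeholders, not taken from the slides:

import cntk as C

x = C.input_variable(784)                       # input x in R^M (M = 784 assumed)
y = C.input_variable(10)                        # one-hot label y in R^J (J = 10 assumed)

model = C.layers.Sequential([
    C.layers.Dense(400, activation=C.sigmoid),  # h1 = sigmoid(x @ W1 + b1)
    C.layers.Dense(200, activation=C.sigmoid),  # h2 = sigmoid(h1 @ W2 + b2)
    C.layers.Dense(10)])                        # z  = h2 @ Wout + bout (no activation)
z = model(x)

P  = C.softmax(z)                               # P = softmax(z)
ce = C.cross_entropy_with_softmax(z, y)         # numerically stable softmax + cross-entropy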
CNTK Workflow
The script configures and executes through the CNTK Python APIs:
corpus → reader → network → trainer
reader: minibatch source, task-specific deserializer, automatic randomization, distributed reading
network: model function, criterion function, CPU/GPU execution engine, packing and padding
trainer: SGD (momentum, Adam, ...), minibatching
As Easy as 1-2-3

from cntk import *

# reader
def create_reader(path, is_training):
    ...

# network
def create_model_function():
    ...
def create_criterion_function(model):
    ...

# trainer (and evaluator)
def train(reader, model):
    ...
def evaluate(reader, model):
    ...

# main function
model = create_model_function()

reader = create_reader(..., is_training=True)
train(reader, model)

reader = create_reader(..., is_training=False)
evaluate(reader, model)
Workflow
1. Prepare data
2. Configure reader, network, learner (Python)
3. Train: python my_cntk_script.py
Prepare Data: Reader

def create_reader(map_file, mean_file, is_training):
    # image preprocessing pipeline
    transforms = [
        ImageDeserializer.crop(crop_type='Random', ratio=0.8, jitter_type='uniRatio'),
        ImageDeserializer.scale(width=image_width, height=image_height, channels=num_channels, interpolations='linear'),
        ImageDeserializer.mean(mean_file)
    ]
    # deserializer
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features = StreamDef(field='image', transforms=transforms),
        labels   = StreamDef(field='label', shape=num_classes)
    )),
    randomize=is_training,
    epoch_size=INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)

Automatic on-the-fly randomization is important for large data sets
Readers compose, e.g. image + text caption
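For reference, the map_file passed to ImageDeserializer above is typically just a plain-text list of tab-separated image-path / numeric-label pairs; the paths and labels below are hypothetical:

train/images/00001.jpg	7
train/images/00002.jpg	2
train/images/00003.jpg	0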
Prepare Network: Multi-Layer Perceptron

Model:
def create_model(x):
    h1 = Dense(400, activation=relu)(x)
    h2 = Dense(200, activation=relu)(h1)
    r  = Dense(10, activation=None)(h2)
    return r
z = create_model(x)

Loss: cross_entropy_with_softmax(z, Y)
Prepare Network: Convolutional Neural Network

28 × 28 pixel input images

Model:
def create_model(x):
    h = Convolution2D((5,5), num_filters=8, …)(x)
    h = MaxPooling(…)(h)
    h = Convolution2D((5,5), num_filters=16, …)(h)
    r = Dense(output_classes, activation=None)(h)
    return r
A Sequence Example (many-to-many, 1:1)
Problem: tagging entities in Air Travel Information Services (ATIS) data

show     → o
burbank  → From_city
to       → o
seattle  → To_city
flights  → o
tomorrow → Date

http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Prepare Network: Recurrent Neural Network
[Diagram: one-hot text token (1 × 943) → Embedding (943 → 150) → LSTM (150 → 300, sigmoid) → Dense (300 → 129) → class label; recurrence over h(t−1) → h(t)]

emb_dim, hidden_dim, num_labels = 150, 300, 129

def create_model():
    return Sequential([
        Embedding(emb_dim),
        Recurrence(LSTM(hidden_dim), go_backwards=False),
        Dense(num_labels)
    ])
Prepare Learner
Many built-in learners: SGD, SGD with momentum, Adagrad, RMSProp, Adam, Adamax, AdaDelta, etc.
Specify a learning-rate schedule and momentum schedule; optionally, a minibatch-size schedule

lr_schedule = C.learning_rate_schedule([0.05]*3 + [0.025]*2 + [0.0125],
                                       C.UnitType.minibatch, epoch_size=100)
sgd_learner = C.sgd(z.parameters, lr_schedule)
Overall Train Workflow

def create_model():
    return Sequential([
        Embedding(emb_dim),
        Recurrence(LSTM(hidden_dim), go_backwards=False),
        Dense(num_labels)
    ])

[Diagram: an ATIS training minibatch of 96 samples; input features are one-hot words (96 × x(t)); labels Y are one-hot (96 × 129 per word in the sequence)]

Loss:  cross_entropy_with_softmax(z, Y)
Error: classification_error(z, Y)
Choose a learner (SGD, Adam, Adagrad, etc.)
Trainer(model, (loss, error), learner)
trainer.train_minibatch({X, Y})
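Putting the pieces on this slide together, a hedged sketch of the full training loop in the Python API; the stream names (query, slot_labels), the reader, and the iteration count are assumptions, and create_model() is the function defined above:

import cntk as C

vocab_size, num_labels = 943, 129                    # from the ATIS example above

x = C.sequence.input_variable(vocab_size)            # one-hot word sequences
y = C.sequence.input_variable(num_labels)            # one label per word

z     = create_model()(x)
loss  = C.cross_entropy_with_softmax(z, y)
error = C.classification_error(z, y)

lr      = C.learning_rate_schedule(0.1, C.UnitType.minibatch)
learner = C.adam(z.parameters, lr, momentum=C.momentum_schedule(0.9))
trainer = C.Trainer(z, (loss, error), [learner])

for _ in range(num_minibatches_to_train):            # iteration count is an assumption
    data = reader.next_minibatch(96, input_map={     # minibatch of 96 samples, as above
        x: reader.streams.query,                     # assumed stream names in the reader
        y: reader.streams.slot_labels})
    trainer.train_minibatch(data)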
Distributed Training
1. Prepare data
2. Configure reader, network, learner (Python)
3. Train — distributed!
mpiexec --np 16 --hosts server1,server2,server3,server4 \
    python my_cntk_script.py
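A hedged sketch of what changes inside the script to make the mpiexec launch above do data-parallel training: only the learner is wrapped and the reader is told which partition to read; the local learner choice and variable names are illustrative assumptions:

import cntk as C
from cntk.train.distributed import data_parallel_distributed_learner, Communicator

local_learner = C.momentum_sgd(z.parameters, lr_schedule, C.momentum_schedule(0.9))

# wrap the local learner; every MPI worker runs this same script
learner = data_parallel_distributed_learner(local_learner,
                                            num_quantization_bits=32,   # 32 = no quantization
                                            distributed_after=0)
trainer = C.Trainer(z, (loss, error), [learner])

# each worker reads its own partition of every minibatch
data = reader.next_minibatch(minibatch_size,
                             input_map=input_map,    # same stream-to-variable map as above
                             num_data_partitions=Communicator.num_workers(),
                             partition_index=Communicator.rank())
trainer.train_minibatch(data)

Communicator.finalize()                              # required before the MPI processes exit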
CNTK Deep Dive
CNTK Unique Features
Symbolic loops over sequences with dynamic scheduling
Turn the graph into a parallel program through minibatching
Unique parallel training algorithms (1-bit SGD, Block Momentum)
Symbolic Loops over Sequential Data
Extend our example to a recurrent network (RNN):

h1(t) = σ(W1 x(t) + R1 h1(t−1) + b1)     h1 = sigmoid(x @ W1 + past_value(h1) @ R1 + b1)
h2(t) = σ(W2 h1(t) + R2 h2(t−1) + b2)    h2 = sigmoid(h1 @ W2 + past_value(h2) @ R2 + b2)
P(t)  = softmax(Wout h2(t) + bout)       P  = softmax(h2 @ Wout + bout)
ce(t) = yᵀ(t) log P(t)                   ce = cross_entropy(P, y)
Σ_corpus ce(t) → max

Note that the CNTK code has no explicit notion of time: past_value() refers to the previous step along the sequence axis.
Symbolic Loops over Sequential Data

h1 = sigmoid(x @ W1 + past_value(h1) @ R1 + b1)
h2 = sigmoid(h1 @ W2 + past_value(h2) @ R2 + b2)
P  = softmax(h2 @ Wout + bout)
ce = cross_entropy(P, y)

[Graph: the same feed-forward structure as before, with z⁻¹ (delay) edges feeding h1 and h2 back through R1 and R2]

CNTK automatically unrolls cycles at execution time; cycles are detected with Tarjan's algorithm
Efficient and composable
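A hedged sketch of how the first recurrence above can be written with the Python layers API: Recurrence() supplies the h(t−1) → h(t) loop, so the step function itself has no explicit notion of time. The dimensions and variable names are assumptions for illustration:

import cntk as C

input_dim, hidden_dim, num_labels = 943, 300, 129       # assumed sizes

x = C.sequence.input_variable(input_dim)                 # variable-length sequence input
y = C.input_variable(num_labels)

W1 = C.parameter((input_dim, hidden_dim), init=C.glorot_uniform())
R1 = C.parameter((hidden_dim, hidden_dim), init=C.glorot_uniform())
b1 = C.parameter(hidden_dim)

@C.Function
def step(h_prev, x_t):
    # one step of the loop: h(t) = sigmoid(x(t) @ W1 + h(t-1) @ R1 + b1)
    return C.sigmoid(C.times(x_t, W1) + C.times(h_prev, R1) + b1)

h1 = C.layers.Recurrence(step)(x)                        # CNTK schedules the loop over the sequence
z  = C.layers.Dense(num_labels)(C.sequence.last(h1))     # classify from the last hidden state
ce = C.cross_entropy_with_softmax(z, y)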
Batch-Scheduling of Variable-Length Sequences
Minibatches containing sequences of different lengths are automatically packed and padded
CNTK handles the special cases:
the past_value operation correctly resets state and gradient at sequence boundaries
non-recurrent operations just pretend there is no padding ("garbage-in/garbage-out")
sequence reductions
[Diagram: parallel sequence slots over time steps, computed in parallel; sequences 1-7 packed into slots with padding; a short sequence scheduled into the same slot as another may come for free]
Batch-Scheduling of Variable-Length Sequences
Minibatches containing sequences of different lengths are automatically packed and padded
Fully transparent batching:
recurrent operations — CNTK unrolls them and handles sequence boundaries
non-recurrent operations — computed in parallel across the packed batch
sequence reductions — mask out the padding
[Diagram: the same packing of sequences 1-7 into parallel slots with padding, time steps computed in parallel]
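To see the packing in action, a small hedged sketch: declare a sequence input and feed sequences of different lengths in one minibatch, and CNTK does the packing and padding described above. The vocabulary size and the tiny one-hot batch are made up for illustration:

import cntk as C
import numpy as np

vocab_dim = 943                                          # assumed, as in the ATIS example
x = C.sequence.input_variable(vocab_dim)                 # declares a variable-length sequence axis
m = C.layers.Recurrence(C.layers.LSTM(300))(x)

eye = np.eye(vocab_dim, dtype=np.float32)
batch = [eye[[1, 5, 7]],                                 # a length-3 sequence of one-hot words
         eye[[2, 4, 9, 11, 3]]]                          # a length-5 sequence in the same minibatch
outputs = m.eval({x: batch})                             # returns one output array per sequence
print([o.shape for o in outputs])                        # e.g. [(3, 300), (5, 300)]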
Data-Parallel Training
Data-parallelism: distribute each minibatch over the workers, then all-reduce the partial gradients
[Diagram: a minibatch of sample frames split across nodes 1-3; each node computes the sub-gradient for its sub-minibatch, and the sub-gradients are summed layer by layer via all-reduce]
Ring all-reduce:
Step 1: each node owns aggregating one 1/K-th stripe of the gradient — (K−1) concurrent transfers of 1/K-th of the data
Step 2: the aggregated stripes are redistributed to all nodes
Communication cost per node: O(2 (K−1)/K · M) for model size M, i.e. O(1) with respect to the number of nodes K
Data-Parallel Training
Is O(1) communication enough? Example: a DNN with 160M model parameters, minibatch size 1024
Compute per minibatch: ~1/7 second
Communication per minibatch: ~1/9 second (640 MB of gradients over a 6 GB/s link)
We can't even parallelize to 2 GPUs: with 2 GPUs the per-worker compute halves to ~1/14 second while the all-reduce still takes ~1/9 second, so communication cost already dominates!
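A back-of-the-envelope check of those numbers; the 4-byte float32 gradient size and the 6 GB/s effective link speed are the slide's implicit assumptions:

params          = 160e6                               # model parameters
gradient_bytes  = params * 4                          # float32 gradients ≈ 640 MB per minibatch
link_bandwidth  = 6e9                                 # ≈ 6 GB/s effective interconnect

comm_per_mb     = gradient_bytes / link_bandwidth     # ≈ 0.107 s  (~1/9 s)
compute_per_mb  = 1.0 / 7                             # ≈ 0.143 s  (from the slide)
compute_2_gpus  = compute_per_mb / 2                  # ≈ 0.071 s  -- already below the comm cost
print(comm_per_mb, compute_per_mb, compute_2_gpus)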
Data-Parallel Training
How to reduce communication cost:

Communicate less each time:
1-bit SGD [F. Seide, H. Fu, J. Droppo, G. Li, D. Yu: "1-Bit Stochastic Gradient Descent ... Distributed Training of Speech DNNs", Interspeech 2014]
Quantize gradients to 1 bit per value
Trick: carry the quantization error over to the next minibatch

Communicate less often:
Automatic minibatch sizing [F. Seide, H. Fu, J. Droppo, G. Li, D. Yu: "On Parallelizability of Stochastic Gradient Descent ...", ICASSP 2014]
Block Momentum [K. Chen, Q. Huo: "Scalable training of deep learning machines by incremental block training ...", ICASSP 2016]
Very recent, very effective parallelization method that combines model averaging with the error-residual idea
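Both techniques are exposed in the Python API as wrappers around an ordinary learner; a hedged sketch, where the block size is an illustrative assumption and 1-bit quantization may require a CNTK build with the 1-bit-SGD component enabled:

import cntk as C
from cntk.train.distributed import (data_parallel_distributed_learner,
                                    block_momentum_distributed_learner)

local_learner = C.momentum_sgd(z.parameters, lr_schedule, C.momentum_schedule(0.9))

# (a) communicate less each time: 1-bit SGD gradient quantization
one_bit_learner = data_parallel_distributed_learner(local_learner,
                                                    num_quantization_bits=1)

# (b) communicate less often: Block Momentum, syncing models every block_size samples
bm_learner = block_momentum_distributed_learner(local_learner,
                                                block_size=32000)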
CNTK C# and R Bindings
C# API Overview
Basic C# training API added in the CNTK 2.2 release
Supports building of DNNs
Supports DNN training
Supports DNN evaluation
End-to-end examples in C#
CNTK C# API Highlights
Over 100 basic operations to build a computation network: Sigmoid, Tanh, ReLU, Plus, Minus, Convolution, Pooling, BatchNormalization, ...

For example, a logistic-regression loss function:
Function z = CNTKLib.Times(weightParam, input) + biasParam;
Function loss = CNTKLib.CrossEntropyWithSoftmax(z, labelVariable);

CNTK Function as a primitive element to build a DNN; build a DNN through basic operation composition. For example, to build a ResNet node:
Function conv = CNTKLib.Pooling(CNTKLib.Convolution(convParam, input), PoolingType.Average, poolingWindowShape);
Function resNetNode = CNTKLib.ReLU(CNTKLib.Plus(conv, input));
CNTK C# API Highlights
Batching support: provides MinibatchSource and MinibatchData to help with data loading and batching
Training support: supports the SGD optimizers commonly seen in the DNN literature: MomentumSGDLearner, AdamLearner, AdaGradLearner, etc.

For example, to train a model with the Adam stochastic optimizer:
var parameterLearners = new List<Learner>() { Learner.AdamLearner(classifierOutput.Parameters(), learningRate, momentum) };
var trainer = Trainer.CreateTrainer(classifierOutput, trainingLoss, prediction, parameterLearners);

Evaluation support
Handwriting Data Classification (MNIST)
Build a convolution model:
Function conv = CNTKLib.ReLU(CNTKLib.Convolution(convParams, features, strides));
Function pooling = CNTKLib.Pooling(conv, PoolingType.Max, poolingWindow, stride, padding);
Function classifier = TestHelper.Dense(pooling, numClasses, device, Activation.None);

Prepare data:
var minibatchSource = MinibatchSource.TextFormatMinibatchSource("Train_cntk_text.txt", streamConfigurations, MinibatchSource.InfinitelyRepeat);
Handwriting Data Classification (MNIST), cont.
Training (one iteration):
var minibatchData = minibatchSource.GetNextMinibatch(minibatchSize, device);
var arguments = new Dictionary<Variable, MinibatchData>
{
    { input, minibatchData[featureStreamInfo] },
    { labels, minibatchData[labelStreamInfo] }
};
trainer.TrainMinibatch(arguments, device);
CNTK C# API Examples
More examples at: https://github.com/Microsoft/CNTK/tree/master/Examples/TrainingCSharp
CifarResNetClassifier: shows how to do image classification using ResNet
Transfer learning: demonstrates transfer learning using a pretrained ResNet model
LSTMSequenceClassifier: shows how to build and train a recurrent neural network model
CNTK R Binding
devtools::install_github("Microsoft/CNTK-R")
Comprehensive coverage of CNTK functionality:
Readers (CTF, Image, HTK)
Layers library (fully-connected, convolution, pooling, dropout, LSTM, GRU, ...)
Loss functions
Normalizers
Optimizers (including 1-bit SGD)
Model saving and loading
Exposed via idiomatic R functional APIs
Interop with R objects and packages like ggplot
Fluent syntax for constructing layers based on magrittr
High performance, including support for multi-node and multi-GPU
Implementation
R bindings based on the reticulate package (https://rstudio.github.io/reticulate/)
Enables importing and calling Python functions from R
Ensures full parity with the functionality in the Python bindings (such as the layers library)
Enables interop between R objects and NumPy
However, requires a full installation of the Python bindings
Converting back and forth between R and Python objects is expensive; this is not an issue if it is only done at the beginning and end of training, or when using the native Python readers and deserializers
CIFAR-10 Training Example Import libraries Define reader function
CIFAR-10 Training Example
Define feature and label variables in the graph
Low-level operations start with an "op" prefix
Use functional notation instead of operator overloading
Normalize the inputs
CIFAR-10 Training Example Define network with four convolution and two max-pool layers using the Layers library Notice use of the “For” loop construct from CNTK
CIFAR-10 Training Example Define loss function Initialize the learning algorithm z$parameters notation used for accessing properties of a Python object
CIFAR-10 Training Example
Run the training algorithm until convergence
Notice the use of fluent syntax via the magrittr pipe notation, "%>%"
CIFAR-10 Training Example Convergence plot using ggplot
Future Work
More examples and vignettes, particularly for text and NLP
Tighter R integration: formula objects; data frames and matrices; integration of trained models with the summary and predict generic functions
Integration with TensorBoard
More Information
Source code: https://github.com/Microsoft/CNTK-R
Vignettes and documentation: https://microsoft.github.io/CNTK-R/
Issues and feedback: https://github.com/Microsoft/CNTK-R/issues and https://docs.microsoft.com/en-us/cognitive-toolkit/Feedback-Channels
CNTK Moving Forward
Open Neural Network Exchange (ONNX)
An open-source intermediate representation (IR) of the computation graph (https://github.com/onnx/onnx), with defined common ops and their semantics
Released on Sep. 7, 2017
Collaboration between Microsoft and Facebook
A shared library with a Caffe2 example as reference
Permissive MIT license and no patents
ONNX Motivation
Allow interoperability between frameworks, starting with CNTK, Caffe2, and PyTorch
Allow hardware vendors to focus on one IR in their backend optimization
Allow training in one toolkit and deploying in another
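For the train-in-one, deploy-in-another scenario, a hedged sketch of the CNTK side of the exchange; the ONNX format option shipped in CNTK releases after v2.2 (per the Current Status slide below), so the exact calls and file names here are assumptions to verify against your version:

import cntk as C

z = C.Function.load("my_model.cntkmodel")                         # any trained CNTK model (path assumed)
z.save("my_model.onnx", format=C.ModelFormat.ONNX)                # export to ONNX for other frameworks

z2 = C.Function.load("my_model.onnx", format=C.ModelFormat.ONNX)  # import an ONNX model into CNTK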
ONNX Vision
Current Status
V1 released on GitHub, focused on the basics
Supports only inference; no loops and no RNNs
Caffe2 and PyTorch support now; CNTK support in early October
In V2, we expect to add: RNN support, loops and control flow, a model zoo
Looking forward to more support from other frameworks
CNTK Upcoming New Features
Model optimization and codegen for fast inference: model compression via SVD, quantization, etc.; fast inference via codegen
C# API for training: low-level C# API released in v2.2 (Sep. 2017); high-level API design in progress
CNTK Upcoming New Features (cont.)
CrossTalk: porting models created in other toolkits (Caffe, Caffe2, and TensorFlow) to CNTK; will use ONNX as the intermediate representation
Dynamic graph support: data-dependent graphs; ease of debugging
More examples, better documentation
Conclusions
Conclusions
Performance
Speed: faster than other toolkits, 5-10x faster on recurrent networks
Scalability: a few lines of change to scale to thousands of GPUs
Built-in readers: efficient distributed readers
Programmability
Powerful C++ library for enterprise users
Intuitive and performant Python APIs
Supports C#/.NET, UWP, R, Java, etc.
Extensible via user custom layers, learners, readers, etc.
Thank You!
https://github.com/Microsoft/CNTK
Please evaluate this session
Your feedback is important to us!
From your PC or tablet, visit MyIgnite at http://myignite.microsoft.com
From your phone, download and use the Ignite Mobile App via https://aka.ms/ignite.mobileapp