PipeDream: Pipeline Parallelism for DNN Training
Aaron Harlap, Deepak Narayanan, Amar Phanishayee*, Vivek Seshadri*, Greg Ganger, Phil Gibbons
Example DNN
[figure: an example deep neural network]
DNNs - How They Work
[figure: an input fed through the network produces output probabilities, e.g. "Good NFL Coach?" with outputs Yes: 12% / No: 88% and Yes: 99% / No: 1%]
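To make the picture concrete, here is a minimal numpy sketch of a forward pass: a toy two-layer network (random placeholder weights, not the slide's actual model) maps an input feature vector to Yes/No probabilities.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical 2-layer network: input features -> hidden layer -> Yes/No scores.
# Weights are random placeholders, not learned values.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

x = rng.normal(size=16)           # input features for one example
h = np.maximum(0, x @ W1 + b1)    # hidden layer with ReLU
p = softmax(h @ W2 + b2)          # output: [P(Yes), P(No)]
print(f"Yes: {p[0]:.0%}  No: {p[1]:.0%}")
```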
DNN Training - How Do They Learn
- Forward pass: make a prediction (input -> output)
- Evaluate the prediction
- Backward pass: update the solution depending on the error
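A minimal sketch of that forward / evaluate / backward / update cycle, using a single linear layer and squared error purely for illustration (not the training setup from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 2)) * 0.1      # model parameters (the "solution")
x = rng.normal(size=16)                 # one training input
y = np.array([1.0, 0.0])                # its label (Yes)

lr = 0.05
for step in range(5):
    pred = x @ W                        # forward pass: make a prediction
    err = pred - y                      # evaluate: how wrong was it?
    grad = np.outer(x, err)             # backward pass: gradient of 0.5*||err||^2 w.r.t. W
    W -= lr * grad                      # update the solution depending on the error
    print(step, float((err ** 2).sum()))
```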
Data-Parallel Training
- Separate copy of the model on each machine
- Communicate updates to model parameters
- Various synchronization models can be used
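A sketch of the data-parallel pattern, simulated inside one process (no real network communication): each "machine" keeps its own copy of the model, and per-machine gradients are averaged each step as a stand-in for the parameter-update communication. The worker count, layer shape, and synchronous averaging are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim = 3, 16
W0 = rng.normal(size=(dim, 2)) * 0.1
replicas = [W0.copy() for _ in range(n_workers)]  # separate model copy per machine

def local_gradient(W, rng):
    # One "machine" computes a gradient on its own (tiny) batch of data.
    x = rng.normal(size=dim)
    y = np.array([1.0, 0.0])
    err = x @ W - y
    return np.outer(x, err)                 # gradient of 0.5*||err||^2 w.r.t. W

for step in range(3):
    grads = [local_gradient(Wr, rng) for Wr in replicas]
    avg_grad = sum(grads) / n_workers       # the communication step: exchange/average updates
    for Wr in replicas:
        Wr -= 0.05 * avg_grad               # every replica applies the same update, staying in sync
```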
Problem w/ Data-Parallel
- Communication overhead can be high
- Communicating less lowers the quality of the work
[chart: communication vs. computation time for VGG, OverFeat, Inception-BN, GoogLeNet]
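A rough back-of-the-envelope for why this overhead bites on large models (the ~138M parameter count for VGG-16 is the commonly cited figure, not a number taken from the slide):

```python
# Data-parallel workers exchange one gradient value per model parameter every step.
params = 138_000_000                       # approximate VGG-16 parameter count
bytes_per_step = params * 4                # float32 gradients
print(f"~{bytes_per_step / 1e6:.0f} MB of gradients per worker per step")  # ~552 MB
```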
New Approach: Pipeline Parallel
- Assign layers to machines
- Communicate inter-layer activations
[figure: layers partitioned across Machine 1, Machine 2, Machine 3]
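A sketch of the placement idea, simulated in one process: the model's layers are split into contiguous stages, one per "machine", and only the activation at the stage boundary crosses machines. The layer sizes and the two-way split are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_dims = [16, 32, 32, 16, 2]                 # a small 4-layer toy model
layers = [rng.normal(size=(layer_dims[i], layer_dims[i + 1])) * 0.1
          for i in range(len(layer_dims) - 1)]

stages = [layers[:2], layers[2:]]                # machine 1 owns layers 0-1,
                                                 # machine 2 owns layers 2-3

def run_stage(stage_layers, activation):
    for W in stage_layers:
        activation = np.maximum(0, activation @ W)   # ReLU layer
    return activation

x = rng.normal(size=16)
act = run_stage(stages[0], x)   # machine 1's forward work
# "send" the 32-float boundary activation to machine 2 (instead of shipping all weights)
out = run_stage(stages[1], act)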
Alternate Forward / Backward Work
- Each machine does forward work, then backward work, then forward work, etc.
[figure: pipeline timeline showing each machine alternating forward compute work (#3-#5) and backward compute work (#1-#3), with Fw. Sends and Bw. Sends between Machines 1-3]
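A toy rendering of that steady-state schedule, matching one reading of the slide's timeline (the first machine is furthest ahead on forward work and furthest behind on backward work); the minibatch offsets are illustrative.

```python
# Toy steady-state schedule: each machine alternates one forward work item
# with one backward work item, staying busy once the pipeline is full.
def steady_state_schedule(machine, n_machines, n_rounds):
    fwd = 2 * n_machines - machine   # e.g. machine 1 of 3 starts at forward #5
    bwd = machine                    # ...while it is still on backward #1
    for _ in range(n_rounds):
        yield ("forward", fwd)
        fwd += 1
        yield ("backward", bwd)
        bwd += 1

for m in (1, 2, 3):
    print(f"Machine {m}:", list(steady_state_schedule(m, 3, 2)))
```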
One Quick Result
- Pipelining: 2.5x better
- VGG network w/ the ImageNet dataset
Many More Interesting Details
- Staleness vs. speed trade-off
- How staleness in pipelining differs from data-parallel staleness
- How to divide work amongst machines
- GPU memory management
- This and more at our poster!
Summary
- New way to parallelize DNN training
- Communicate layer activations instead of model parameters
- Improves over data-parallel for networks w/ large solution state