PipeDream: Pipeline Parallelism for DNN Training

Presentation transcript:

PipeDream: Pipeline Parallelism for DNN Training
Aaron Harlap, Deepak Narayanan, Amar Phanishayee*, Vivek Seshadri*, Greg Ganger, Phil Gibbons

Example DNN
[figure: an example DNN architecture]

DNNs - How They Work
[figure: input flows through the network to an output prediction for "Good NFL Coach?", with example outputs Yes: 12% / No: 88% and Yes: 99% / No: 1%]

DNN Training - How Do They Learn?
Forward pass - make a prediction from the input and evaluate the output
Backward pass - update the solution depending on the error
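
To make the forward/backward terminology concrete, here is a minimal sketch of one training step in PyTorch; the tiny model, dummy data, and hyperparameters are placeholders for illustration, not the network from the talk.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 784)           # a dummy minibatch of inputs
targets = torch.randint(0, 10, (32,))   # dummy labels

outputs = model(inputs)                 # forward pass: make a prediction
loss = loss_fn(outputs, targets)        # evaluate the prediction (the error)
loss.backward()                         # backward pass: compute gradients of the error
optimizer.step()                        # update the solution (the model parameters)
optimizer.zero_grad()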

Data-Parallel Training
Separate copy of the model on each machine
Communicate updates to model parameters; various synchronization models can be used
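
A hedged sketch of what a synchronous data-parallel step can look like, assuming a torch.distributed process group is already initialized; the function name and the per-parameter all-reduce are illustrative (real systems bucket gradients or use a parameter server), not PipeDream's implementation.

import torch.distributed as dist

def data_parallel_step(model, loss_fn, optimizer, inputs, targets):
    # Every machine holds a full model replica and computes gradients on its own data.
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Communicate updates to model parameters: average gradients across all replicas.
    world_size = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world_size
    optimizer.step()
    optimizer.zero_grad()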

Problem w/ Data-Parallel
Communication overhead can be high; less communication lowers the quality of the work
[figure: communication vs. computation time for VGG, OverFeat, Inception-BN, and GoogLeNet]

New Approach: Pipeline Parallel
Assign layers to machines; communicate inter-layer activations
[figure: model layers partitioned across Machine 1, Machine 2, and Machine 3]
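
A minimal sketch of the layer-assignment idea: each stage owns a contiguous slice of layers, and only the inter-layer activations cross stage boundaries. The three "machines" below are plain objects in one process for illustration; in the actual system each stage would run on its own machine and send the activation tensors over the network.

import torch
import torch.nn as nn

layers = [nn.Linear(784, 512), nn.ReLU(),
          nn.Linear(512, 256), nn.ReLU(),
          nn.Linear(256, 10)]

# Assign layers to machines (here, just three stage objects).
machine1 = nn.Sequential(*layers[0:2])
machine2 = nn.Sequential(*layers[2:4])
machine3 = nn.Sequential(*layers[4:])

x = torch.randn(32, 784)
act1 = machine1(x)       # machine 1 computes its layers, then sends act1 downstream
act2 = machine2(act1)    # machine 2 consumes act1, produces act2
out = machine3(act2)     # machine 3 produces the final output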

Alternate Forward / Backward Work
Each machine does forward work, then backward work, then forward work, etc.
[figure: timeline of Machines 1-3 interleaving forward compute/send and backward compute/send for minibatches #1-#5]
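
A rough sketch of this alternation on a single stage, with hypothetical forward_fn/backward_fn callables: once a few minibatches are in flight, the stage does one forward pass for a new minibatch, then one backward pass for the oldest outstanding minibatch, and drains the rest when the input ends.

from collections import deque

def run_stage(minibatches, forward_fn, backward_fn, in_flight_limit=3):
    in_flight = deque()
    for mb in minibatches:
        activation = forward_fn(mb)             # forward work on a new minibatch
        in_flight.append((mb, activation))      # keep it until its gradient comes back
        if len(in_flight) >= in_flight_limit:
            oldest_mb, oldest_act = in_flight.popleft()
            backward_fn(oldest_mb, oldest_act)  # backward work on the oldest minibatch
    while in_flight:                            # drain remaining minibatches at the end
        oldest_mb, oldest_act = in_flight.popleft()
        backward_fn(oldest_mb, oldest_act)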

One Quick Result - Pipelining is 2.5x better
VGG network w/ the ImageNet dataset

Many More Interesting Details
Staleness vs. speed trade-off
How staleness in pipelining differs from data-parallel staleness
How to divide work amongst machines
GPU memory management
This and more at our poster!

Summary
A new way to parallelize DNN training
Communicate layer activations instead of model parameters
Improves over data-parallel for networks w/ large solution state