Deep Neural Network Partitioning and Retraining on Multi-core Accelerated Computing Platforms
Deep Neural Network Partitioning in Distributed Computing System
Jiyuan Shen [ ]
Computer Science and Technology, Shanghai Jiao Tong University
Mentor: Li Jiang
∎ 1 Motivation
∎ 2 Framework GraphDNN
∎ 3 Experiments & Main Contribution
1 Motivation
∎ 1 Motivation
If a Deep Neural Network is combined with Cloud Computing:
1. Inconvenience: large amounts of data must travel from IoT devices to the cloud.
2. Cost-ineffectiveness: you pay for all cloud resources.
3. Inflexibility: the approach cannot be applied to mobile devices.
DNN in cloud ✗, DNN in IoT ✓
∎ 1 Motivation
If a Deep Neural Network (DNN) is combined with IoT:
GAP problem: the DNN's memory demand far exceeds the IoT device's resources.
DNN in cloud ✗, DNN in IoT ✓, but limited by the DNN-IoT GAP?
∎ 1 Motivation
To solve the GAP problem, all related works rely on system-level data parallelism: the Parameter Server and the DistBelief framework.
Data parallelism vs. model parallelism. [Figures: a Parameter Server with worker nodes 1-5; the DistBelief framework.]
10
Distributed DNN Partition !
DNN in cloud × DNN in IOT √ DNN-IOT GAP Limit : Distributed DNN Partition ! ∎ 1 Motivation To solve GAP Problem, all of related works are system-level data parallelism. Data Paralellism! Model Paralellism! node 1 Parameter Server node 2 node 3 node 4 node 5 DistBelief Framework
∎ 1 Motivation
New solution: Distributed DNN Partition
[ Property ] Software-level model parallelism.
[ Concept ]
1. Given a distributed computing system with k computing nodes,
2. partition the whole deep neural network into k individual network components, and
3. run each component on its corresponding computing node.
At the same time, we follow two basic rules: first, the workload must stay balanced across the computing nodes; second, the communication cost between computing nodes must be minimized.
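The two rules can be made concrete with a minimal sketch, assuming a single fully connected layer whose output neurons are computed by the node they are assigned to; evaluate_partition and its arguments are hypothetical names, not the thesis code:

# Sketch (not the thesis implementation): evaluate a partition of one fully
# connected layer. weights[i][j] connects input neuron i to output neuron j;
# assign_in / assign_out map each neuron to one of k computing nodes.
import numpy as np

def evaluate_partition(weights, assign_in, assign_out, k):
    workload = np.zeros(k)          # synapses computed on each node
    cross = 0                       # synapses whose endpoints sit on different nodes
    rows, cols = np.nonzero(weights)
    for i, j in zip(rows, cols):
        workload[assign_out[j]] += 1        # the node owning the output neuron does the work
        if assign_in[i] != assign_out[j]:
            cross += 1                      # this synapse needs inter-node communication
    balance = workload.max() / max(workload.mean(), 1e-9)
    return balance, cross           # want balance close to 1 and cross as small as possible

# Example: 4 inputs, 4 outputs, k = 2 nodes
w = np.random.randn(4, 4) * (np.random.rand(4, 4) > 0.5)
print(evaluate_partition(w, np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1]), 2))

A good partition keeps the returned balance close to 1 while driving the cross count down.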
Distributed DNN Partition: what about the communication cost? GraphDNN!
2 Framework GraphDNN
∎ 2 Framework GraphDNN (refer: [7], [12])
[ Prune Smallest Cross-Weights ] (refer: [7])
[ Prune as many Cross-Weights as possible ]
[ Change the format of zeros ]
∎ 2 Framework GraphDNN
How GraphDNN works
The Original DNN
∎ 2 Framework GraphDNN
How GraphDNN works
Static1: Compression
The original, dense deep neural network is compressed into a much sparser network, i.e. one with far fewer synapses, while maintaining the same accuracy as the original.
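A minimal sketch of the compression step, assuming simple magnitude-based pruning; compress_layer and its sparsity parameter are illustrative names, and the thesis tool may implement this step differently:

# Sketch, assuming magnitude-based pruning (illustrative, not the thesis code):
# drop the synapses with the smallest |weight| until the layer reaches a target
# sparsity, and keep a 0/1 mask so pruned synapses stay removed during retraining.
import numpy as np

def compress_layer(weights, sparsity=0.9):
    flat = np.abs(weights).ravel()
    threshold = np.quantile(flat, sparsity)      # the smallest weights fall below this
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask                  # sparse weights + mask for retraining

w = np.random.randn(256, 128)
w_sparse, mask = compress_layer(w, sparsity=0.9)
print("kept synapses:", int(mask.sum()), "of", mask.size)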
∎ 2 Framework GraphDNN
How GraphDNN works
Static2: Partition
We then partition the sparse deep neural network layer by layer: each layer is regarded as a mathematical graph, and spectral graph partitioning is used to deploy the network across computing nodes. In the figure, the red lines represent the cross-partition synapses that remain after partitioning.
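A minimal sketch of spectral bipartitioning for one layer, assuming an unnormalized Laplacian of the layer's bipartite connection graph and a 2-way median split on the Fiedler vector; this illustrates the general technique, not necessarily the exact formulation used in GraphDNN:

# Sketch of spectral bipartitioning (illustrative, not the thesis code).
# Nodes 0..n_in-1 are the layer's inputs, n_in..n_in+n_out-1 its outputs;
# an edge exists wherever the (sparse) weight is nonzero.
import numpy as np

def spectral_bipartition(weights):
    n_in, n_out = weights.shape
    n = n_in + n_out
    adj = np.zeros((n, n))
    adj[:n_in, n_in:] = (weights != 0)                 # bipartite adjacency
    adj[n_in:, :n_in] = (weights != 0).T
    lap = np.diag(adj.sum(axis=1)) - adj               # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]                               # eigenvector of 2nd smallest eigenvalue
    side = (fiedler > np.median(fiedler)).astype(int)  # median split keeps the parts balanced
    return side[:n_in], side[n_in:]                    # node id (0/1) per input / output neuron

w_sparse = np.random.randn(6, 6) * (np.random.rand(6, 6) > 0.7)
print(spectral_bipartition(w_sparse))

The median split is one simple way to keep the two parts balanced; other balance constraints fit the same framework.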
∎ 2 Framework GraphDNN
How GraphDNN works
[Figure: the partitioned network deployed on Node 1 and Node 2.]
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic1: Dynamic Pruning
During the retraining process we always maintain the smallest-weight threshold: no synapse with a weight below that value is allowed to exist.
Key: intuitive [weight]. Prune the smallest cross-weights.
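A minimal sketch of one retraining update under dynamic pruning, assuming a fixed smallest-weight threshold and a plain gradient step; retrain_step and its parameters are illustrative, not the thesis API:

# Sketch of dynamic pruning during retraining (illustrative, not the thesis code):
# after every gradient update, any weight whose magnitude falls below the
# threshold is pruned, so synapses below that value never exist in the network.
import numpy as np

def retrain_step(weights, grad, threshold, lr=0.01):
    weights = weights - lr * grad                    # ordinary gradient update
    pruned = np.abs(weights) < threshold             # weights that dropped too low
    weights[pruned] = 0.0                            # prune the smallest weights
    return weights

w = np.random.randn(128, 64)
g = np.random.randn(128, 64)
w = retrain_step(w, g, threshold=0.05)
print("zeros after step:", int((w == 0).sum()))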
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic2: Greedy Cross-Weight Fixing
During the retraining process we apply a greedy idea: each time we focus on the cross-partition weights and fix as many of them as possible at zero, letting the remaining weights compensate for them.
Key: greedy [synapse distance]. Prune as many cross-weights as possible.
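A minimal sketch of one round of greedy cross-weight fixing, assuming cross-partition synapses are marked by a boolean mask and that the smallest fraction of the still-free cross-weights is fixed at zero each round; fix_cross_weights and fraction are illustrative names, not the thesis API:

# Sketch of greedy cross-weight fixing (illustrative, not the thesis code):
# cross_mask marks synapses whose endpoints live on different nodes. Each round
# fixes the smallest still-free cross-partition weights at zero; the frozen mask
# lets the retraining loop keep them at zero (e.g. by masking their gradients).
import numpy as np

def fix_cross_weights(weights, cross_mask, frozen, fraction=0.2):
    free = cross_mask & ~frozen & (weights != 0)     # cross weights not yet fixed
    if free.sum() == 0:
        return weights, frozen
    magnitudes = np.abs(weights[free])
    cutoff = np.quantile(magnitudes, fraction)       # greedily pick the smallest ones
    newly_fixed = free & (np.abs(weights) <= cutoff)
    weights = weights.copy()
    weights[newly_fixed] = 0.0                       # fixed at zero from now on
    return weights, frozen | newly_fixed

w = np.random.randn(8, 8)
cross = np.zeros_like(w, dtype=bool)
cross[:4, 4:] = True
cross[4:, :4] = True
frozen = np.zeros_like(w, dtype=bool)
w, frozen = fix_cross_weights(w, cross, frozen, fraction=0.25)
print("cross weights fixed:", int(frozen.sum()))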
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic2: Greedy Cross-Weight Fixing
Unexpected change: Fixing, i.e. fixing the cross-partition synapses at zero, and then retraining.
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic3: Explorations on the ReLU Function
During the retraining process we change the format in which zeros are transferred between distributed computing nodes, because ReLU produces roughly 50% zeros in its activations.
Key: format change. Change the transfer format of zeros.
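A minimal sketch of the format change, assuming the sender transmits only the indices and values of the nonzero ReLU activations instead of the dense vector; encode_activations and decode_activations are illustrative names, not the thesis API:

# Sketch of changing the transfer format of zeros (illustrative, not the thesis code):
# ReLU outputs are ~50% zeros, so instead of sending the dense activation vector
# between computing nodes, send (indices, values) of the nonzero entries only.
import numpy as np

def encode_activations(act):
    idx = np.flatnonzero(act)                 # positions of nonzero activations
    return idx.astype(np.int32), act[idx]     # compact message to ship between nodes

def decode_activations(idx, vals, size):
    act = np.zeros(size, dtype=vals.dtype)
    act[idx] = vals                           # rebuild the dense vector on the receiver
    return act

act = np.maximum(np.random.randn(1024), 0.0)  # ReLU output: about half zeros
idx, vals = encode_activations(act)
print("dense floats:", act.size, "-> sent:", idx.size, "indices +", vals.size, "values")
assert np.allclose(decode_activations(idx, vals, act.size), act)

Sending index-value pairs pays off roughly when the activation vector is more than half zeros, which is the regime ReLU produces.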
3 Experiments & Main Contribution
∎ 3 Experiments & Main Contribution
- static compression effects
∎ 3 Experiments & Main Contribution
- Later partition effects
Experiments
❖ Simulations: implemented a full software tool for DNNs: general-purpose part [2084 lines]; GraphDNN part [387 lines] (line counts do not include referenced code).
❖ Real systems: 1> configured Spark and Caffe on TK1 boards; 2> wrote GraphDNN in Caffe.
Conclusion
❖ The later optimizations can theoretically produce a further reduction; combined with the static effects, GraphDNN can reduce costs to roughly 0.1x of the original.
∎ 3 Experiments & Main Contribution
- Demo
❖ GraphDNN framework: proposed the theoretical algorithms.
❖ C++ source code tool: implemented a complete piece of software for DNN-related analysis. [flexibly utilized]
❖ Real distributed system: implemented GraphDNN on real distributed boards. [TK1 boards, Caffe, Spark]