Deep Neural Network Partitioning and Retraining Techniques on a Multi-Core Accelerated Computing Platform
Deep Neural Network Partitioning in Distributed Computing System
Jiyuan Shen [5130309194]
Computer Science and Technology, Shanghai Jiao Tong University
Mentor: Li Jiang
∎ 1 Motivation ∎ 2 Framework GraphDNN ∎ 3 Experiments & Main Contribution
1 Motivation
∎ 1 Motivation
If a Deep Neural Network is combined with Cloud Computing:
1. Inconvenience: large amounts of data must be sent from IOT devices to the cloud.
2. Cost-ineffectiveness: users pay for all cloud resources.
3. Inflexibility: the approach cannot be applied to mobile devices.
Takeaway: DNN in cloud ×, DNN in IOT √
∎ 1 Motivation
If a Deep Neural Network (DNN) is combined with IOT:
GAP problem: DNN memory demand >>> IOT resources.
Takeaway: DNN in cloud ×, DNN in IOT √, but the DNN-IOT gap is the limit.
∎ 1 Motivation
To solve the GAP problem, all related works rely on system-level data parallelism, e.g. the Parameter Server and the DistBelief framework (node 1 ... node 5): data parallelism, not model parallelism.
Takeaway: DNN in cloud ×, DNN in IOT √; the DNN-IOT gap calls for Distributed DNN Partition.
∎ 1 Motivation
New solution: Distributed DNN Partition
[ Property ] Software-level model parallelism.
[ Concept ]
1 Given a distributed computing system with k computing nodes,
2 partition the whole deep neural network into k individual network components, and
3 run each component on its corresponding distributed computing node.
At the same time, we follow two basic rules: first, the workload of the computing nodes stays balanced; second, the inter-node communication cost is minimized (sketched below).
Takeaway: Distributed DNN Partition raises the question of communication cost; GraphDNN answers it.
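A minimal sketch of the two partition rules, assuming each layer is modeled as a weighted graph whose vertices are neurons and whose edges are synapses; the function names (`cut_cost`, `balance`) and the toy data are illustrative assumptions, not the GraphDNN implementation:

```python
import numpy as np

def cut_cost(W, assign):
    """Total weight of synapses whose two endpoints live on different computing nodes.

    W      : (n, n) symmetric matrix, W[i, j] = |synapse weight| between neurons i and j
    assign : (n,)   array, assign[i] = index of the computing node that holds neuron i
    """
    cross = assign[:, None] != assign[None, :]   # True where a synapse crosses the partition
    return W[cross].sum() / 2.0                  # each undirected synapse is counted twice

def balance(assign, k):
    """Ratio of the largest to the smallest node workload (1.0 = perfectly balanced)."""
    sizes = np.bincount(assign, minlength=k)
    return sizes.max() / max(sizes.min(), 1)

# toy example: 6 neurons assigned to k = 2 nodes
W = np.abs(np.random.randn(6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
assign = np.array([0, 0, 0, 1, 1, 1])
print("communication cost:", cut_cost(W, assign), "balance:", balance(assign, 2))
```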
2 Framework GraphDNN
∎ 2 Framework GraphDNN
Building blocks (refer: [7], [12]):
[ Prune Smallest Cross-Weights ] (refer: [7])
[ Prune as many Cross-Weights as possible ] (refer: [7])
[ Change the transfer format of zeros ]
∎ 2 Framework GraphDNN How does GraphDNN work? The Original DNN
∎ 2 Framework GraphDNN How does GraphDNN work?
Static 1: Compression
The original complex deep neural network is compressed into a much sparser one, i.e. a network with fewer synapses, while maintaining the same accuracy as the original.
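A minimal sketch of the compression step, assuming the standard magnitude-based prune-and-retrain scheme referenced as [7]; the quantile threshold and the mask handling are illustrative assumptions rather than the exact GraphDNN code:

```python
import numpy as np

def compress_layer(W, sparsity=0.9):
    """Zero out the smallest-magnitude synapses of one layer.

    W        : (m, n) weight matrix of a fully connected layer
    sparsity : fraction of synapses to remove (0.9 -> keep only 10%)
    Returns the pruned weights and a 0/1 mask; applying the mask after every
    retraining update keeps the pruned synapses at zero, so the network stays sparse.
    """
    threshold = np.quantile(np.abs(W), sparsity)
    mask = (np.abs(W) > threshold).astype(W.dtype)
    return W * mask, mask

# masked retraining update (schematic):
#   W -= learning_rate * grad
#   W *= mask
```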
∎ 2 Framework GraphDNN How does GraphDNN work?
Static 2: Partition
We then partition the sparse deep neural network layer by layer: each layer is regarded as a mathematical graph, and spectral graph partitioning is used to deploy the network onto the computing nodes. In the figure on the left, the red lines represent the cross-partition synapses that remain after partitioning.
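A minimal sketch of spectral bisection for a single layer, assuming the layer's graph is given by its absolute weight matrix; a real k-way partition would recurse or use several eigenvectors plus an explicit balancing step, which is omitted here:

```python
import numpy as np

def spectral_bisect(W):
    """Split one layer's neuron graph into two balanced parts via the Fiedler vector.

    W : (n, n) symmetric non-negative adjacency matrix (absolute synapse weights).
    Returns assign, where assign[i] in {0, 1} is the computing node of neuron i.
    """
    degree = W.sum(axis=1)
    L = np.diag(degree) - W                 # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of the 2nd smallest eigenvalue
    # split at the median so both parts hold the same number of neurons (workload balance)
    return (fiedler > np.median(fiedler)).astype(int)
```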
∎ 2 Framework GraphDNN How does GraphDNN work?
[Figure: the partitioned sparse network deployed onto Node 1 and Node 2]
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 1: Dynamic Pruning
During retraining we always keep track of the smallest retained weight; no synapse with a weight below that value is allowed to survive, so the smallest cross-partition weights are pruned on the fly.
Key: intuitive [weight]. Prune the smallest cross-weights.
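A minimal sketch of the dynamic-pruning step, assuming a threshold equal to the smallest retained weight and a precomputed boolean matrix marking cross-partition synapses; both are assumptions for illustration, not the exact GraphDNN rule:

```python
import numpy as np

def dynamic_prune(W, cross_mask, threshold):
    """Prune cross-partition synapses that dropped below the threshold during retraining.

    W          : (m, n) layer weights after a retraining step
    cross_mask : (m, n) boolean, True where the synapse connects two different nodes
    threshold  : smallest weight magnitude that is kept alive
    """
    prune = cross_mask & (np.abs(W) < threshold)
    W = W.copy()
    W[prune] = 0.0
    return W, int(prune.sum())   # pruned weights and how many cross-weights were removed
```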
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 2: Greedy Cross-Weight Fixing
During retraining we apply a greedy idea: each time we focus on the cross-partition weights and fix as many of them as possible at zero, letting the remaining weights compensate for them.
Key: greedy [synapse distance]. Prune as many cross-weights as possible.
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 2: Greedy Cross-Weight Fixing
Fixing: fix a batch of cross-partition synapses at zero, then retrain; retraining absorbs the unexpected accuracy change this causes.
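A minimal sketch of the greedy fixing loop, assuming hypothetical callables `fix_at_zero`, `retrain`, and `evaluate` standing in for the real training framework, plus an assumed stopping criterion on accuracy loss:

```python
def greedy_cross_weight_fixing(model, cross_synapses, fix_at_zero, retrain, evaluate,
                               batch=256, max_acc_drop=0.01):
    """Greedily fix cross-partition synapses at zero, retraining after each batch.

    cross_synapses          : synapse ids sorted by priority (e.g. smallest weight first)
    fix_at_zero(model, ids) : pins the given synapses to zero permanently
    retrain(model)          : a few epochs of retraining with the pins applied
    evaluate(model)         : returns validation accuracy in [0, 1]
    """
    baseline = evaluate(model)
    while cross_synapses:
        chosen, cross_synapses = cross_synapses[:batch], cross_synapses[batch:]
        fix_at_zero(model, chosen)     # these cross-weights stay zero from now on
        retrain(model)                 # the remaining weights compensate for them
        if baseline - evaluate(model) > max_acc_drop:
            break                      # stop before accuracy degrades too much
    return model
```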
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 3: Exploration of the ReLU Function
During retraining we change the format in which zeros are transferred between distributed computing nodes, because ReLU produces roughly 50% zero activations.
Key: format change. Change the transfer format of zeros.
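A minimal sketch of one possible zero-format change, assuming a simple (index, value) sparse encoding of ReLU activations before they cross node boundaries; the actual wire format used by GraphDNN is not specified here:

```python
import numpy as np

def encode_relu_activations(a):
    """Pack a ReLU output vector as (indices, values), dropping the zeros."""
    idx = np.flatnonzero(a)                          # positions of non-zero activations
    return idx.astype(np.int32), a[idx].astype(np.float32)

def decode_relu_activations(idx, vals, length):
    """Rebuild the dense activation vector on the receiving node."""
    a = np.zeros(length, dtype=np.float32)
    a[idx] = vals
    return a

# with ~50% zeros the value payload is roughly halved, at the cost of the index overhead
a = np.maximum(np.random.randn(8).astype(np.float32), 0.0)   # a ReLU output
idx, vals = encode_relu_activations(a)
assert np.allclose(decode_relu_activations(idx, vals, a.size), a)
```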
3 Experiments & Main Contribution
∎ 3 Experiments & Main Contribution - static compression effects
∎ 3 Experiments & Main Contribution - later partition effects
Experiments
❖ Simulations: implemented full DNN software: general-purpose part [2084 lines]; GraphDNN part [387 lines] (line counts exclude referenced code).
❖ Real systems: 1> configured Spark and Caffe on TK1 boards; 2> implemented GraphDNN in Caffe.
Conclusion
❖ The later optimizations can theoretically produce a further reduction to 0.40715189 (chart values: 0.61473859, 0.58134519, 0.55087291, 0.40715189). Combined with the static effects, GraphDNN can reduce the cost to 0.1 × 0.40715189 ≈ 0.0407 of the original.
∎ 3 Experiments & Main Contribution - demo
❖ GraphDNN framework: proposed the theoretical algorithms.
❖ C++ source code tool: implemented a complete software tool for DNN-related analysis. [can be flexibly reused]
❖ Real distributed system: implemented GraphDNN on real distributed boards. [TK1 boards, Caffe, Spark]