Deep Neural Network Partitioning and Retraining on Multi-core Accelerated Computing Platforms
Deep Neural Network Partitioning in Distributed Computing System
Jiyuan Shen [ ]
Computer Science and Technology, Shanghai Jiao Tong University
Mentor: Li Jiang
∎ 1 Motivation
∎ 2 Framework GraphDNN
∎ 3 Experiments & Main Contribution
1 Motivation
∎ 1 Motivation
If a Deep Neural Network is combined with Cloud Computing:
1. Inconvenience: large amounts of data must travel from IoT devices to the cloud.
2. Cost-ineffectiveness: you pay for all cloud resources.
3. Inflexibility: the approach cannot be applied to mobile devices.
DNN in cloud ✗, DNN in IoT ✓
∎ 1 Motivation
If a Deep Neural Network (DNN) is combined with IoT:
GAP problem: the DNN's memory demand far exceeds the IoT device's resources.
DNN in cloud ✗, DNN in IoT ✓, but limited by the DNN-IoT GAP?
∎ 1 Motivation
To solve the GAP problem, all related works rely on system-level data parallelism: the Parameter Server and the DistBelief framework.
Data parallelism vs. model parallelism. [Figures: a Parameter Server with worker nodes 1-5; the DistBelief framework.]
10
Distributed DNN Partition !
DNN in cloud × DNN in IOT √ DNN-IOT GAP Limit : Distributed DNN Partition ! ∎ 1 Motivation To solve GAP Problem, all of related works are system-level data parallelism. Data Paralellism! Model Paralellism! node 1 Parameter Server node 2 node 3 node 4 node 5 DistBelief Framework
∎ 1 Motivation
New solution: Distributed DNN Partition
[ Property ] Software-level model parallelism.
[ Concept ]
1. Given a distributed computing system with k computing nodes,
2. partition the whole deep neural network into k individual network components, and
3. run each component on its corresponding computing node.
At the same time, we follow two basic rules: first, the workload must stay balanced across the computing nodes; second, the communication cost between computing nodes must be minimized.
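The two rules can be made concrete with a minimal sketch, assuming a single fully connected layer whose output neurons are computed by the node they are assigned to; evaluate_partition and its arguments are hypothetical names, not the thesis code:

# Sketch (not the thesis implementation): evaluate a partition of one fully
# connected layer. weights[i][j] connects input neuron i to output neuron j;
# assign_in / assign_out map each neuron to one of k computing nodes.
import numpy as np

def evaluate_partition(weights, assign_in, assign_out, k):
    workload = np.zeros(k)          # synapses computed on each node
    cross = 0                       # synapses whose endpoints sit on different nodes
    rows, cols = np.nonzero(weights)
    for i, j in zip(rows, cols):
        workload[assign_out[j]] += 1        # the node owning the output neuron does the work
        if assign_in[i] != assign_out[j]:
            cross += 1                      # this synapse needs inter-node communication
    balance = workload.max() / max(workload.mean(), 1e-9)
    return balance, cross           # want balance close to 1 and cross as small as possible

# Example: 4 inputs, 4 outputs, k = 2 nodes
w = np.random.randn(4, 4) * (np.random.rand(4, 4) > 0.5)
print(evaluate_partition(w, np.array([0, 0, 1, 1]), np.array([0, 0, 1, 1]), 2))

A good partition keeps the returned balance close to 1 while driving the cross count down.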
Distributed DNN Partition: what about the communication cost? GraphDNN!
2 Framework GraphDNN
∎ 2 Framework GraphDNN (refer: [7], [12])
[ Prune Smallest Cross-Weights ] (refer: [7])
[ Prune as many Cross-Weights as possible ]
[ Change the format of zeros ]
∎ 2 Framework GraphDNN
How GraphDNN works
The Original DNN
∎ 2 Framework GraphDNN
How GraphDNN works
Static1: Compression
The original, dense deep neural network is compressed into a much sparser network, i.e. one with far fewer synapses, while maintaining the same accuracy as the original.
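A minimal sketch of the compression step, assuming simple magnitude-based pruning; compress_layer and its sparsity parameter are illustrative names, and the thesis tool may implement this step differently:

# Sketch, assuming magnitude-based pruning (illustrative, not the thesis code):
# drop the synapses with the smallest |weight| until the layer reaches a target
# sparsity, and keep a 0/1 mask so pruned synapses stay removed during retraining.
import numpy as np

def compress_layer(weights, sparsity=0.9):
    flat = np.abs(weights).ravel()
    threshold = np.quantile(flat, sparsity)      # the smallest weights fall below this
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask                  # sparse weights + mask for retraining

w = np.random.randn(256, 128)
w_sparse, mask = compress_layer(w, sparsity=0.9)
print("kept synapses:", int(mask.sum()), "of", mask.size)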
∎ 2 Framework GraphDNN
How GraphDNN works
Static2: Partition
We then partition the sparse deep neural network layer by layer: each layer is regarded as a mathematical graph, and spectral graph partitioning is used to deploy the network across computing nodes. In the figure, the red lines represent the cross-partition synapses that remain after partitioning.
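A minimal sketch of spectral bipartitioning for one layer, assuming an unnormalized Laplacian of the layer's bipartite connection graph and a 2-way median split on the Fiedler vector; this illustrates the general technique, not necessarily the exact formulation used in GraphDNN:

# Sketch of spectral bipartitioning (illustrative, not the thesis code).
# Nodes 0..n_in-1 are the layer's inputs, n_in..n_in+n_out-1 its outputs;
# an edge exists wherever the (sparse) weight is nonzero.
import numpy as np

def spectral_bipartition(weights):
    n_in, n_out = weights.shape
    n = n_in + n_out
    adj = np.zeros((n, n))
    adj[:n_in, n_in:] = (weights != 0)                 # bipartite adjacency
    adj[n_in:, :n_in] = (weights != 0).T
    lap = np.diag(adj.sum(axis=1)) - adj               # unnormalized graph Laplacian
    vals, vecs = np.linalg.eigh(lap)
    fiedler = vecs[:, 1]                               # eigenvector of 2nd smallest eigenvalue
    side = (fiedler > np.median(fiedler)).astype(int)  # median split keeps the parts balanced
    return side[:n_in], side[n_in:]                    # node id (0/1) per input / output neuron

w_sparse = np.random.randn(6, 6) * (np.random.rand(6, 6) > 0.7)
print(spectral_bipartition(w_sparse))

The median split is one simple way to keep the two parts balanced; other balance constraints fit the same framework.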
∎ 2 Framework GraphDNN
How GraphDNN works
[Figure: the partitioned network deployed on Node 1 and Node 2.]
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic1: Dynamic Pruning
During the retraining process we always maintain the smallest-weight threshold: no synapse with a weight below that value is allowed to exist.
Key: intuitive [weight]. Prune the smallest cross-weights.
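A minimal sketch of one retraining update under dynamic pruning, assuming a fixed smallest-weight threshold and a plain gradient step; retrain_step and its parameters are illustrative, not the thesis API:

# Sketch of dynamic pruning during retraining (illustrative, not the thesis code):
# after every gradient update, any weight whose magnitude falls below the
# threshold is pruned, so synapses below that value never exist in the network.
import numpy as np

def retrain_step(weights, grad, threshold, lr=0.01):
    weights = weights - lr * grad                    # ordinary gradient update
    pruned = np.abs(weights) < threshold             # weights that dropped too low
    weights[pruned] = 0.0                            # prune the smallest weights
    return weights

w = np.random.randn(128, 64)
g = np.random.randn(128, 64)
w = retrain_step(w, g, threshold=0.05)
print("zeros after step:", int((w == 0).sum()))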
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic2: Greedy Cross-Weight Fixing
During the retraining process we apply a greedy idea: each time we focus on the cross-partition weights and fix as many of them as possible at zero, letting the remaining weights compensate for them.
Key: greedy [synapse distance]. Prune as many cross-weights as possible.
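A minimal sketch of one round of greedy cross-weight fixing, assuming cross-partition synapses are marked by a boolean mask and that the smallest fraction of the still-free cross-weights is fixed at zero each round; fix_cross_weights and fraction are illustrative names, not the thesis API:

# Sketch of greedy cross-weight fixing (illustrative, not the thesis code):
# cross_mask marks synapses whose endpoints live on different nodes. Each round
# fixes the smallest still-free cross-partition weights at zero; the frozen mask
# lets the retraining loop keep them at zero (e.g. by masking their gradients).
import numpy as np

def fix_cross_weights(weights, cross_mask, frozen, fraction=0.2):
    free = cross_mask & ~frozen & (weights != 0)     # cross weights not yet fixed
    if free.sum() == 0:
        return weights, frozen
    magnitudes = np.abs(weights[free])
    cutoff = np.quantile(magnitudes, fraction)       # greedily pick the smallest ones
    newly_fixed = free & (np.abs(weights) <= cutoff)
    weights = weights.copy()
    weights[newly_fixed] = 0.0                       # fixed at zero from now on
    return weights, frozen | newly_fixed

w = np.random.randn(8, 8)
cross = np.zeros_like(w, dtype=bool)
cross[:4, 4:] = True
cross[4:, :4] = True
frozen = np.zeros_like(w, dtype=bool)
w, frozen = fix_cross_weights(w, cross, frozen, fraction=0.25)
print("cross weights fixed:", int(frozen.sum()))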
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic2: Greedy Cross-Weight Fixing
Unexpected change: Fixing, i.e. fixing the cross-partition synapses at zero, and then retraining.
∎ 2 Framework GraphDNN
How GraphDNN works
Dynamic3: Explorations on the ReLU Function
During the retraining process we change the format in which zeros are transferred between distributed computing nodes, because ReLU produces roughly 50% zeros in its activations.
Key: format change. Change the transfer format of zeros.
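A minimal sketch of the format change, assuming the sender transmits only the indices and values of the nonzero ReLU activations instead of the dense vector; encode_activations and decode_activations are illustrative names, not the thesis API:

# Sketch of changing the transfer format of zeros (illustrative, not the thesis code):
# ReLU outputs are ~50% zeros, so instead of sending the dense activation vector
# between computing nodes, send (indices, values) of the nonzero entries only.
import numpy as np

def encode_activations(act):
    idx = np.flatnonzero(act)                 # positions of nonzero activations
    return idx.astype(np.int32), act[idx]     # compact message to ship between nodes

def decode_activations(idx, vals, size):
    act = np.zeros(size, dtype=vals.dtype)
    act[idx] = vals                           # rebuild the dense vector on the receiver
    return act

act = np.maximum(np.random.randn(1024), 0.0)  # ReLU output: about half zeros
idx, vals = encode_activations(act)
print("dense floats:", act.size, "-> sent:", idx.size, "indices +", vals.size, "values")
assert np.allclose(decode_activations(idx, vals, act.size), act)

Sending index-value pairs pays off roughly when the activation vector is more than half zeros, which is the regime ReLU produces.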
3 Experiments & Main Contribution
∎ 3 Experiments & Main Contribution
- static compression effects
∎ 3 Experiments & Main Contribution
- Later partition effects
Experiments
❖ Simulations: implemented a full software tool for DNNs: general-purpose part [2084 lines]; GraphDNN part [387 lines] (line counts do not include referenced code).
❖ Real systems: 1> configured Spark and Caffe on TK1 boards; 2> wrote GraphDNN in Caffe.
Conclusion
❖ The later optimizations can theoretically produce a further reduction; combined with the static effects, GraphDNN can reduce costs to roughly 0.1x of the original.
∎ 3 Experiments & Main Contribution
- Demo
❖ GraphDNN framework: proposed the theoretical algorithms.
❖ C++ source code tool: implemented a complete piece of software for DNN-related analysis. [flexibly utilized]
❖ Real distributed system: implemented GraphDNN on real distributed boards. [TK1 boards, Caffe, Spark]