Deep Neural Network Partitioning and Retraining on Multi-core Accelerated Computing Platforms (基于多核加速计算平台的深度神经网络分割与重训练技术)

Presentation transcript:

Deep Neural Network Partitioning and Retraining on Multi-core Accelerated Computing Platforms (Deep Neural Network Partitioning in Distributed Computing System). Jiyuan Shen [5130309194], Computer Science and Technology, Shanghai Jiao Tong University. Mentor: Li Jiang

∎ 1 Motivation  ∎ 2 Framework GraphDNN  ∎ 3 Experiments & Main Contribution

1 Motivation

∎ 1 Motivation: If a deep neural network (DNN) is combined with cloud computing: 1. Inconvenience: large amounts of data must be shipped from IoT devices to the cloud. 2. Cost-ineffectiveness: users pay for all of the cloud resources. 3. Inflexibility: the approach cannot be applied to mobile devices. Verdict: DNN in the cloud ×, DNN on IoT devices √.

∎ 1 Motivation: If a DNN is combined with IoT, the GAP problem appears: the memory demand of a DNN is far larger than the resources available on IoT devices (DNN memory demand >> IoT resources). So DNN in the cloud ×, DNN on IoT √, but the DNN-IoT gap limits this. How can the gap be closed?

∎ 1 Motivation: To solve the GAP problem, existing related works all rely on system-level parallelism: the Parameter Server architecture and the DistBelief framework, which combine data parallelism with model parallelism across computing nodes (node 1 through node 5). Our answer to the DNN-IoT gap is instead distributed DNN partitioning.

∎ 1 Motivation: New solution: Distributed DNN Partitioning. [Property] Software-level model parallelism. [Concept] (1) Given a distributed computing system with k computing nodes, (2) partition the whole deep neural network into k individual network components, and (3) run each component on its corresponding computing node. Two basic rules must be followed: first, the workload across the computing nodes stays balanced; second, the communication cost between computing nodes is minimized. The remaining question, how to keep the communication cost low, is what GraphDNN answers.
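The two rules above can be made concrete with a small measurement sketch. The code below is only an illustration, assuming one layer's weight matrix and a hypothetical assignment of its input and output neurons to k nodes; the function name partition_quality and the toy data are not from the thesis. It counts the per-node workload (nonzero synapses computed on each node) and the number of cross-node synapses, which the communication cost is proportional to.

import numpy as np

def partition_quality(W, assign_in, assign_out, k):
    """W: (n_out, n_in) weight matrix of one layer.
    assign_in / assign_out: node id in 0..k-1 for each input / output neuron."""
    workload = np.zeros(k)                    # nonzero synapses computed on each node
    cross = 0                                 # synapses whose endpoints sit on different nodes
    rows, cols = np.nonzero(W)
    for r, c in zip(rows, cols):
        workload[assign_out[r]] += 1
        if assign_out[r] != assign_in[c]:
            cross += 1
    balance = workload.max() / max(workload.mean(), 1e-9)   # 1.0 means perfectly balanced
    return balance, cross

# toy usage: a random sparse 6x8 layer split across 2 nodes
rng = np.random.default_rng(0)
W = rng.random((6, 8)) * (rng.random((6, 8)) < 0.4)
print(partition_quality(W, rng.integers(0, 2, 8), rng.integers(0, 2, 6), k=2))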

2 Framework GraphDNN

∎ 2 Framework GraphDNN: Overview of the techniques (building on [7] and [12]): [Prune the smallest cross-weights (refer: [7])] [Prune as many cross-weights as possible] [Change the format of transferred zeros]

∎ 2 Framework GraphDNN: How does GraphDNN work? Start from the original DNN.

∎ 2 Framework GraphDNN: How does GraphDNN work? Static1: Compression. The original, complex deep neural network is compressed into a much sparser one, i.e. a network with fewer synapses, while maintaining the same accuracy as the original.
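A minimal sketch of such a compression step, assuming magnitude-based pruning in the spirit of the cited compression work [7]; the exact pruning criterion and keep ratio used by GraphDNN are not given on the slide, so compress_layer and keep_ratio below are illustrative.

import numpy as np

def compress_layer(W, keep_ratio=0.1):
    """Zero out the smallest-magnitude weights so roughly keep_ratio of synapses survive."""
    flat = np.abs(W).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]    # magnitude of the k-th largest weight
    mask = np.abs(W) >= threshold             # surviving synapses
    return W * mask, mask                     # sparse weights plus a mask for later retraining

# the surviving weights are then retrained so that accuracy recovers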

∎ 2 Framework GraphDNN: How does GraphDNN work? Static2: Partition. We then partition the sparse deep neural network layer by layer: each layer is regarded as a mathematical graph, and spectral graph partitioning is used to deploy the network across the computing nodes. In the figure on the left, red lines represent the cross-partition synapses that remain after partitioning.
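The spectral cut can be sketched as follows for a two-node split of one layer; the bipartite construction, the median split of the Fiedler vector, and the recursion needed for k > 2 nodes are assumptions of this sketch, not the thesis code.

import numpy as np

def spectral_bipartition(W):
    """Split the n_in + n_out neurons of one layer into two computing nodes.
    W: (n_out, n_in) weight matrix; edge weights are |W|."""
    A_abs = np.abs(W)
    n_out, n_in = A_abs.shape
    n = n_in + n_out
    A = np.zeros((n, n))                       # adjacency of the bipartite neuron graph
    A[:n_in, n_in:] = A_abs.T                  # input neurons first, then output neurons
    A[n_in:, :n_in] = A_abs
    L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                       # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler > np.median(fiedler)).astype(int)   # balanced 0/1 node assignment

assign = spectral_bipartition(np.random.rand(6, 8))
print(assign[:8], assign[8:])                  # placement of input / output neurons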

∎ 2 Framework GraphDNN: How does GraphDNN work? (Figure: the partitioned network deployed across Node 1 and Node 2.)

∎ 2 Framework GraphDNN: How does GraphDNN work? Dynamic1: Dynamic Pruning. During retraining we keep track of the smallest retained weight and never let synapses below that value exist. Key: intuitive, weight-based. Prune the smallest cross-weights.
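A sketch of this dynamic pruning rule during retraining, assuming a fixed magnitude threshold and a stand-in gradient function; grad_fn, the learning rate, and the step count are placeholders rather than the thesis's training loop.

import numpy as np

def retrain_with_dynamic_pruning(W, grad_fn, threshold, lr=0.01, steps=100):
    mask = np.abs(W) >= threshold              # synapses allowed to exist
    W = W * mask
    for _ in range(steps):
        W = W - lr * grad_fn(W)                # one retraining update (toy stand-in)
        mask &= np.abs(W) >= threshold         # weights that fall below the bar are pruned
        W = W * mask                           # pruned synapses stay exactly zero
    return W, mask

# toy usage with a decay-like "gradient"
W1, mask = retrain_with_dynamic_pruning(np.random.randn(6, 8), lambda W: 0.1 * W, threshold=0.5)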

∎ 2 Framework GraphDNN: How does GraphDNN work? Dynamic2: Greedy Cross-Weight Fixing. During retraining we apply a greedy strategy: each round we focus on the cross-partition weights and fix as many of them as possible to zero, letting the remaining weights compensate for them. Key: greedy, based on synapse distance. Prune as many cross-weights as possible.

∎ 2 Framework GraphDNN: How does GraphDNN work? Dynamic2: Greedy Cross-Weight Fixing. Fixing means freezing cross-partition synapses at zero and then retraining; this can cause an unexpected accuracy change that the retraining must recover from.
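A simplified sketch of the fixing step, assuming all cross-partition weights of a layer are zeroed at once and kept frozen while the rest of the layer retrains to compensate; the greedy ordering by synapse distance described on the previous slide is omitted here, and grad_fn is again a stand-in.

import numpy as np

def fix_cross_weights(W, assign_in, assign_out, grad_fn, lr=0.01, steps=100):
    """Freeze cross-partition synapses of one layer at zero and retrain the rest."""
    cross = assign_out[:, None] != assign_in[None, :]   # True where a synapse crosses nodes
    W = W * ~cross                                       # fix cross-partition weights to zero
    for _ in range(steps):
        W = W - lr * grad_fn(W)                          # retraining update (toy stand-in)
        W = W * ~cross                                   # frozen cross weights stay zero
    return W

rng = np.random.default_rng(1)
W = fix_cross_weights(rng.standard_normal((6, 8)), rng.integers(0, 2, 8),
                      rng.integers(0, 2, 6), grad_fn=lambda W: 0.1 * W)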

∎ 2 Framework GraphDNN: How does GraphDNN work? Dynamic3: Exploration of the ReLU Function. During retraining we change the format in which zeros are transferred between the distributed computing nodes, because ReLU produces roughly 50% zeros. Key: format change. Transfer zeros in a compressed format.
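One plausible reading of this format change is to send only the nonzero ReLU activations as (index, value) pairs instead of the dense vector, which roughly halves the transfer when about 50% of the outputs are zero; the encoding below is an assumption of this sketch, not necessarily the format used in the thesis.

import numpy as np

def encode_relu_output(a):
    """Dense ReLU output -> compact (indices, values) message."""
    idx = np.flatnonzero(a)                    # positions of the nonzero activations
    return idx.astype(np.int32), a[idx].astype(np.float32)

def decode_relu_output(idx, vals, size):
    a = np.zeros(size, dtype=np.float32)
    a[idx] = vals
    return a

a = np.maximum(np.random.randn(8).astype(np.float32), 0)   # a ReLU output, about half zeros
idx, vals = encode_relu_output(a)
assert np.allclose(decode_relu_output(idx, vals, a.size), a)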

3 Experiments & Main Contribution

∎ 3 Experiments & Main Contribution - static compression effects (figures: compression results on the test networks)

∎ 3 Experiments & Main Contribution - later (dynamic) partition effects. Experiments ❖ Simulations: implemented a complete DNN software tool: general-purpose code (2084 lines) plus GraphDNN (387 lines); line counts exclude referenced code. ❖ Real systems: (1) configured Spark and Caffe on TK1 boards; (2) implemented GraphDNN in Caffe. Conclusion ❖ The later optimizations can theoretically reduce communication cost to 0.40715189 of the original (measured cost ratios from the figure: 0.61473859, 0.58134519, 0.55087291, 0.40715189). Combined with the static compression effect (a factor of 0.1), GraphDNN reduces cost to 0.1 × 0.40715189 = 0.040715189 of the original.

∎ 3 Experiments & Main Contribution - demo ❖ GraphDNN framework: proposed the theoretical algorithms. ❖ C++ source code tool: implemented a complete piece of software for DNN-related analysis that can be flexibly reused. ❖ Real distributed system: implemented GraphDNN on real distributed boards (TK1 boards, Caffe, Spark).