Deep Neural Network Partitioning and Retraining Techniques on a Multi-Core Accelerated Computing Platform
Deep Neural Network Partitioning in Distributed Computing System
Jiyuan Shen [5130309194]
Computer Science and Technology, Shanghai Jiao Tong University
Mentor: Li Jiang
∎ 1 Motivation ∎ 2 Framework GraphDNN ∎ 3 Experiments & Main Contribution
1 Motivation
∎ 1 Motivation
If a Deep Neural Network is combined with Cloud Computing:
1. Inconvenience: large amounts of data must be sent from IOT devices to the cloud.
2. Cost-ineffectiveness: users pay for all cloud resources.
3. Inflexibility: the approach cannot be applied to mobile devices.
Takeaway: DNN in cloud ×, DNN in IOT √
∎ 1 Motivation
If a Deep Neural Network (DNN) is combined with IOT:
GAP problem: DNN memory demand >>> IOT resources.
Takeaway: DNN in cloud ×, DNN in IOT √, but the DNN-IOT gap is the limit.
∎ 1 Motivation
To solve the GAP problem, all related works rely on system-level data parallelism, e.g. the Parameter Server and the DistBelief framework (node 1 ... node 5): data parallelism, not model parallelism.
Takeaway: DNN in cloud ×, DNN in IOT √; the DNN-IOT gap calls for Distributed DNN Partition.
∎ 1 Motivation
New solution: Distributed DNN Partition
[ Property ] Software-level model parallelism.
[ Concept ]
1 Given a distributed computing system with k computing nodes,
2 partition the whole deep neural network into k individual network components, and
3 run each component on its corresponding distributed computing node.
At the same time, we follow two basic rules: first, the workload of the computing nodes stays balanced; second, the inter-node communication cost is minimized (sketched below).
Takeaway: Distributed DNN Partition raises the question of communication cost; GraphDNN answers it.
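A minimal sketch of the two partition rules, assuming each layer is modeled as a weighted graph whose vertices are neurons and whose edges are synapses; the function names (`cut_cost`, `balance`) and the toy data are illustrative assumptions, not the GraphDNN implementation:

```python
import numpy as np

def cut_cost(W, assign):
    """Total weight of synapses whose two endpoints live on different computing nodes.

    W      : (n, n) symmetric matrix, W[i, j] = |synapse weight| between neurons i and j
    assign : (n,)   array, assign[i] = index of the computing node that holds neuron i
    """
    cross = assign[:, None] != assign[None, :]   # True where a synapse crosses the partition
    return W[cross].sum() / 2.0                  # each undirected synapse is counted twice

def balance(assign, k):
    """Ratio of the largest to the smallest node workload (1.0 = perfectly balanced)."""
    sizes = np.bincount(assign, minlength=k)
    return sizes.max() / max(sizes.min(), 1)

# toy example: 6 neurons assigned to k = 2 nodes
W = np.abs(np.random.randn(6, 6)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
assign = np.array([0, 0, 0, 1, 1, 1])
print("communication cost:", cut_cost(W, assign), "balance:", balance(assign, 2))
```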
2 Framework GraphDNN
∎ 2 Framework GraphDNN
Building blocks (refer: [7], [12]):
[ Prune Smallest Cross-Weights ] (refer: [7])
[ Prune as many Cross-Weights as possible ] (refer: [7])
[ Change the transfer format of zeros ]
∎ 2 Framework GraphDNN How does GraphDNN work? The Original DNN
∎ 2 Framework GraphDNN How does GraphDNN work?
Static 1: Compression
The original complex deep neural network is compressed into a much sparser one, i.e. a network with fewer synapses, while maintaining the same accuracy as the original.
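A minimal sketch of the compression step, assuming the standard magnitude-based prune-and-retrain scheme referenced as [7]; the quantile threshold and the mask handling are illustrative assumptions rather than the exact GraphDNN code:

```python
import numpy as np

def compress_layer(W, sparsity=0.9):
    """Zero out the smallest-magnitude synapses of one layer.

    W        : (m, n) weight matrix of a fully connected layer
    sparsity : fraction of synapses to remove (0.9 -> keep only 10%)
    Returns the pruned weights and a 0/1 mask; applying the mask after every
    retraining update keeps the pruned synapses at zero, so the network stays sparse.
    """
    threshold = np.quantile(np.abs(W), sparsity)
    mask = (np.abs(W) > threshold).astype(W.dtype)
    return W * mask, mask

# masked retraining update (schematic):
#   W -= learning_rate * grad
#   W *= mask
```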
∎ 2 Framework GraphDNN How does GraphDNN work?
Static 2: Partition
We then partition the sparse deep neural network layer by layer: each layer is regarded as a mathematical graph, and spectral graph partitioning is used to deploy the network onto the computing nodes. In the figure on the left, the red lines represent the cross-partition synapses that remain after partitioning.
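A minimal sketch of spectral bisection for a single layer, assuming the layer's graph is given by its absolute weight matrix; a real k-way partition would recurse or use several eigenvectors plus an explicit balancing step, which is omitted here:

```python
import numpy as np

def spectral_bisect(W):
    """Split one layer's neuron graph into two balanced parts via the Fiedler vector.

    W : (n, n) symmetric non-negative adjacency matrix (absolute synapse weights).
    Returns assign, where assign[i] in {0, 1} is the computing node of neuron i.
    """
    degree = W.sum(axis=1)
    L = np.diag(degree) - W                 # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                 # eigenvector of the 2nd smallest eigenvalue
    # split at the median so both parts hold the same number of neurons (workload balance)
    return (fiedler > np.median(fiedler)).astype(int)
```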
∎ 2 Framework GraphDNN How does GraphDNN work?
[Figure: the partitioned sparse network deployed onto Node 1 and Node 2]
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 1: Dynamic Pruning
During retraining we always keep track of the smallest retained weight; no synapse with a weight below that value is allowed to survive, so the smallest cross-partition weights are pruned on the fly.
Key: intuitive [weight]. Prune the smallest cross-weights.
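A minimal sketch of the dynamic-pruning step, assuming a threshold equal to the smallest retained weight and a precomputed boolean matrix marking cross-partition synapses; both are assumptions for illustration, not the exact GraphDNN rule:

```python
import numpy as np

def dynamic_prune(W, cross_mask, threshold):
    """Prune cross-partition synapses that dropped below the threshold during retraining.

    W          : (m, n) layer weights after a retraining step
    cross_mask : (m, n) boolean, True where the synapse connects two different nodes
    threshold  : smallest weight magnitude that is kept alive
    """
    prune = cross_mask & (np.abs(W) < threshold)
    W = W.copy()
    W[prune] = 0.0
    return W, int(prune.sum())   # pruned weights and how many cross-weights were removed
```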
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 2: Greedy Cross-Weight Fixing
During retraining we apply a greedy idea: each time we focus on the cross-partition weights and fix as many of them as possible at zero, letting the remaining weights compensate for them.
Key: greedy [synapse distance]. Prune as many cross-weights as possible.
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 2: Greedy Cross-Weight Fixing
Fixing: fix a batch of cross-partition synapses at zero, then retrain; retraining absorbs the unexpected accuracy change this causes.
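A minimal sketch of the greedy fixing loop, assuming hypothetical callables `fix_at_zero`, `retrain`, and `evaluate` standing in for the real training framework, plus an assumed stopping criterion on accuracy loss:

```python
def greedy_cross_weight_fixing(model, cross_synapses, fix_at_zero, retrain, evaluate,
                               batch=256, max_acc_drop=0.01):
    """Greedily fix cross-partition synapses at zero, retraining after each batch.

    cross_synapses          : synapse ids sorted by priority (e.g. smallest weight first)
    fix_at_zero(model, ids) : pins the given synapses to zero permanently
    retrain(model)          : a few epochs of retraining with the pins applied
    evaluate(model)         : returns validation accuracy in [0, 1]
    """
    baseline = evaluate(model)
    while cross_synapses:
        chosen, cross_synapses = cross_synapses[:batch], cross_synapses[batch:]
        fix_at_zero(model, chosen)     # these cross-weights stay zero from now on
        retrain(model)                 # the remaining weights compensate for them
        if baseline - evaluate(model) > max_acc_drop:
            break                      # stop before accuracy degrades too much
    return model
```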
∎ 2 Framework GraphDNN How does GraphDNN work?
Dynamic 3: Exploration of the ReLU Function
During retraining we change the format in which zeros are transferred between distributed computing nodes, because ReLU produces roughly 50% zero activations.
Key: format change. Change the transfer format of zeros.
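A minimal sketch of one possible zero-format change, assuming a simple (index, value) sparse encoding of ReLU activations before they cross node boundaries; the actual wire format used by GraphDNN is not specified here:

```python
import numpy as np

def encode_relu_activations(a):
    """Pack a ReLU output vector as (indices, values), dropping the zeros."""
    idx = np.flatnonzero(a)                          # positions of non-zero activations
    return idx.astype(np.int32), a[idx].astype(np.float32)

def decode_relu_activations(idx, vals, length):
    """Rebuild the dense activation vector on the receiving node."""
    a = np.zeros(length, dtype=np.float32)
    a[idx] = vals
    return a

# with ~50% zeros the value payload is roughly halved, at the cost of the index overhead
a = np.maximum(np.random.randn(8).astype(np.float32), 0.0)   # a ReLU output
idx, vals = encode_relu_activations(a)
assert np.allclose(decode_relu_activations(idx, vals, a.size), a)
```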
3 Experiments & Main Contribution
∎ 3 Experiments & Main Contribution - static compression effects
∎ 3 Experiments & Main Contribution - later partition effects
Experiments
❖ Simulations: implemented full DNN software: general-purpose part [2084 lines]; GraphDNN part [387 lines] (line counts exclude referenced code).
❖ Real systems: 1> configured Spark and Caffe on TK1 boards; 2> implemented GraphDNN in Caffe.
Conclusion
❖ The later optimizations can theoretically produce a further reduction to 0.40715189 (chart values: 0.61473859, 0.58134519, 0.55087291, 0.40715189). Combined with the static effects, GraphDNN can reduce the cost to 0.1 × 0.40715189 ≈ 0.0407 of the original.
∎ 3 Experiments & Main Contribution - demo
❖ GraphDNN framework: proposed the theoretical algorithms.
❖ C++ source code tool: implemented a complete software tool for DNN-related analysis. [can be flexibly reused]
❖ Real distributed system: implemented GraphDNN on real distributed boards. [TK1 boards, Caffe, Spark]