Effects of Limiting Numerical Precision on Neural Networks: An Empirical Study on Deep Learning Accelerators. vinu@cs.utah.edu, chandru@cs.utah.edu
Deep Learning Applications: compute-intensive embedded projects like drones, autonomous robotic systems, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers. Neural Network: Training, Inference.
Problem Statement: (New age) Neural Network Architects are having a hard time. Training: Time, Compute Resources, Hyperparameter search, … Inference: Power Budget, Accuracy.
Lecture: Tools to Explore Accelerators. We saw Minerva in class … Topics: the Minerva tool to explore the design space, prune/quantize, lower voltages. Project prep.
Overview: Mine-rva, Keras, Aladdin, Madonna
Motivation
Project Proposal. MADONNA: a tool for Measurement & Assistance in Design Of NN to its Architect. Measurement: NVIDIA Jetson TK1. Assistance: FPTuner, Ristretto.
Numeric Precision
Assistance
Measurement
Architecting a good Deep Learning Application: low power, user-defined accuracy, best possible within power budget. MAUD: our project, a framework for the NN architect. Principle: Measure As yoU Design.
Hardware: Jetson Embedded Platform, Measurement Hardware, Topology
Jetson Embedded Platform: NVIDIA Jetson with GPU-accelerated parallel processing, a leading embedded visual computing platform. It features high-performance, low-energy computing for deep learning and computer vision, and is ideal for compute-intensive embedded projects like drones, autonomous robotic systems, mobile medical imaging, and Intelligent Video Analytics (IVA). OEMs, independent developers, makers, and hobbyists can use the NVIDIA Jetson TX1 to explore the future of embedded computing.
Measurement hardware: Yokogawa WT310, a WT300E series digital power analyzer. It provides extremely low current measurement capability, down to 50 µA, which makes it ideal for engineers performing stand-by power measurements.
Topology: The head/gateway node is mir.cs.utah.edu, the large tower computer on the floor. mir is connected to the switch on the table, which is in turn connected to the NVIDIA Jetson TK1 boards (mir01, mir02, ..., mir16).
NVIDIA Tegra, Jetson TK1 Board
GPU-based Accelerator: Tegra K1 Processor Specifications (source: http://www.nvidia.com/object/tegra-k1-processor.html)
GPU: NVIDIA® Kepler™ Architecture, 192 NVIDIA CUDA® Cores
CPU: NVIDIA 4-Plus-1™ Quad-Core ARM Cortex-A15 "r3", max clock speed 2.3 GHz
Memory: DDR3L and LPDDR3, max memory size 8 GB (with 40-bit address extension)
Display: LCD 3840x2160, HDMI 4K (UltraHD, 4096x2160)
Package: 23x23 FCBGA, 16x16 S-FCCSP, 15x15 FC PoP
Process: 28 nm
Workflow Design
Software. Measurement: eServer.exe (backend), eNergy.py (frontend). Assistance: Caffe, FPTuner, Ristretto.
Caffe: make -j 4 all. 10 W, 8690 Joules.
Caffe: make clean. 5 W, 34 Joules.
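The energy figures on the two build slides are power integrated over the run time. As a minimal sketch (not the project's eNergy.py, whose internals are not shown in these slides), the following assumes the power analyzer yields timestamped power samples and integrates them with the trapezoidal rule. The function name `energy_joules` and the constant 10 W trace are hypothetical; the trace is chosen only so that it reproduces 8690 J, since the actual build duration is not stated.

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples into energy using the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        if t1 < t0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of one trapezoid: mean power over the interval times its duration.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical trace: constant 10 W sampled once per second for 869 s.
trace = [(float(t), 10.0) for t in range(870)]
print(energy_joules(trace))  # 8690.0
```

With a constant load this reduces to E = P x t; with a varying load the per-interval averaging is what keeps the estimate honest.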
Performance Metrics (No free lunch)
DOUBLE
I1207 02:12:46.015034 4489 caffe.cpp:275] Batch 49, loss = 0.783467
I1207 02:12:46.015295 4489 caffe.cpp:280] Loss: 0.749668
I1207 02:12:46.015519 4489 caffe.cpp:292] accuracy = 0.7538
I1207 02:12:46.015750 4489 caffe.cpp:292] loss = 0.749668 (* 1 = 0.749668 loss)
SINGLE
I1207 04:02:18.003669 19796 caffe.cpp:275] Batch 49, loss = 0.780145
I1207 04:02:18.003912 19796 caffe.cpp:280] Loss: 0.748495
I1207 04:02:18.004142 19796 caffe.cpp:292] accuracy = 0.7488
I1207 04:02:18.004376 19796 caffe.cpp:292] loss = 0.748495 (* 1 = 0.748495 loss)
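To compare the DOUBLE and SINGLE runs without reading the logs by eye, the final accuracy and loss can be pulled out of the Caffe output. This is a rough sketch that assumes the log lines have the form shown on this slide; the helper name `parse_metrics` and the regular expression are illustrative, not part of Caffe.

```python
import re

# Matches summary lines such as "... caffe.cpp:292] accuracy = 0.7538".
# Per-batch lines ("Batch 49, loss = ...") are intentionally not matched.
_METRIC = re.compile(r"caffe\.cpp:\d+\]\s+(accuracy|loss)\s*=\s*([0-9.]+)")


def parse_metrics(log_text):
    """Return the last reported accuracy and loss found in a Caffe test log."""
    metrics = {}
    for name, value in _METRIC.findall(log_text):
        metrics[name] = float(value)
    return metrics


double_log = """I1207 02:12:46.015034 4489 caffe.cpp:275] Batch 49, loss = 0.783467
I1207 02:12:46.015519 4489 caffe.cpp:292] accuracy = 0.7538
I1207 02:12:46.015750 4489 caffe.cpp:292] loss = 0.749668 (* 1 = 0.749668 loss)"""
single_log = """I1207 04:02:18.003669 19796 caffe.cpp:275] Batch 49, loss = 0.780145
I1207 04:02:18.004142 19796 caffe.cpp:292] accuracy = 0.7488
I1207 04:02:18.004376 19796 caffe.cpp:292] loss = 0.748495 (* 1 = 0.748495 loss)"""

d, s = parse_metrics(double_log), parse_metrics(single_log)
print(f"accuracy drop, double -> single: {d['accuracy'] - s['accuracy']:.4f}")
```

For the two runs on this slide the accuracy difference is 0.005 (0.7538 versus 0.7488), while the final loss is nearly unchanged.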
Power Metrics, CIFAR 10. Figure labels: 6 W, 10 W, 34K Joules, 346 Joules; 10 W, 5.5 W, 12K Joules, 244 Joules.
Power Metrics, LENET - MNIST. Figure labels: 8 W, 4.5 W, 11K Joules, 62 Joules; 8 W, 4 W, 4K Joules, 45 Joules.
Next Steps. Measurement: complete measurement studies (CaffeNet, ImageNet); study and report impact of precision on energy consumption. Assistance: attempt implementing fixed-point support in Ristretto / native Caffe; resume FPTuner addition in workflow, aiming for automation.
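For the fixed-point item, the basic operation is rounding a real value onto a signed fixed-point grid and saturating at the representable range. The following is a generic sketch of that idea, not Ristretto's or Caffe's implementation; the function name `quantize_fixed_point` and the default 8-bit word with 4 fractional bits are assumptions made for illustration.

```python
def quantize_fixed_point(x, total_bits=8, frac_bits=4):
    """Round x to a signed fixed-point value (two's-complement range) with saturation."""
    scale = 1 << frac_bits                    # number of steps per unit
    q_min = -(1 << (total_bits - 1))          # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1       # most positive integer code
    q = int(round(x * scale))                 # Python rounds exact ties to even
    q = max(q_min, min(q_max, q))             # saturate instead of wrapping around
    return q / scale


print(quantize_fixed_point(0.7538))   # 0.75
print(quantize_fixed_point(100.0))    # 7.9375 (saturated)
print(quantize_fixed_point(-100.0))   # -8.0 (saturated)
```

Fewer fractional bits coarsen the grid and more integer bits widen the range, so for a fixed word length the split between the two is the main tuning knob.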