Deep Neural Network with Stochastic Computing


1 Deep Neural Network with Stochastic Computing
Jan. 19, 2016 Kyounghoon Kim and Kiyoung Choi

2 Precision vs. Efficiency
Conventional binary computing
- Accurate, with full-precision binary computing
- High cost in area and energy consumption
--> Compromise precision for efficiency
Human brain
- Consumes ~20 W of power
- Does not perform precise computing
- Recognizes objects very well
Approaches (approximate computing)
- Limited-precision binary computing
- Near-threshold computing
- Neural processing
- Stochastic computing
- ...

3 Stochastic Number
Responses of a cortical neuron, e.g., 110001110001001011

4 Stochastic Number
Coin flipping
- Encoding: head --> 1, tail --> 0
- Toss two coins (X and Y) eight times each to obtain two bit-streams
- x = P(Xi=1) = 4/8 = 0.5 = value of stochastic number X
- y = P(Yi=1) = 4/8 = 0.5 = value of stochastic number Y
- x*y = 0.5*0.5 = 0.25 = P(Xi=1 and Yi=1), which is computed by the bitwise AND of X and Y
--> Multiplication can be done with an AND gate
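The coin-flipping multiplication above can be checked with a short simulation (a sketch; the stream length, seed, and helper names are arbitrary choices, not from the talk):

```python
import random

def to_stream(p, n, rng):
    """Unipolar stochastic number: a bit-stream whose fraction of 1s encodes p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode a unipolar stream: value = (number of 1s) / (stream length)."""
    return sum(stream) / len(stream)

rng = random.Random(0)
n = 100_000
X = to_stream(0.5, n, rng)
Y = to_stream(0.5, n, rng)
# Bitwise AND of two independent streams multiplies their values:
# P(Xi=1 and Yi=1) = P(Xi=1) * P(Yi=1)
Z = [xi & yi for xi, yi in zip(X, Y)]
print(value(X), value(Y), value(Z))  # value(Z) ≈ 0.25
```

Longer streams shrink the statistical error, which is exactly the precision/length trade-off discussed later.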

5 Stochastic Computing Computing with stochastic numbers

6 Stochastic Computing Logic gates are used for SC (stochastic computing)

7 Stochastic Computing
Example: trilinear interpolation
Used in volume rendering:
q = xyz(v1 + v2 + v4 + v7 - v0 - v3 - v5 - v6)
  + xy(v0 + v3 - v1 - v2) + xz(v0 + v5 - v1 - v4) + yz(v0 + v6 - v2 - v4)
  + x(v1 - v0) + y(v2 - v0) + z(v4 - v0) + v0
where x, y, and z are the fractional coordinates of the current sample point and v0~v7 are the voxel values at the cell corners.
<figure: unit cell with corner voxels v0~v7 and sample point (x,y,z)>
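The polynomial above is the standard trilinear interpolation formula multiplied out. A quick numerical check of that identity (assuming the usual corner indexing, v_i with bit 0 along x, bit 1 along y, bit 2 along z, which is what the expansion implies):

```python
import random

def trilinear_factored(x, y, z, v):
    # Standard trilinear interpolation over a unit cell with corners v[0..7]
    return ((1-x)*(1-y)*(1-z)*v[0] + x*(1-y)*(1-z)*v[1]
          + (1-x)*y*(1-z)*v[2]     + x*y*(1-z)*v[3]
          + (1-x)*(1-y)*z*v[4]     + x*(1-y)*z*v[5]
          + (1-x)*y*z*v[6]         + x*y*z*v[7])

def trilinear_expanded(x, y, z, v):
    # The multiplied-out form shown on the slide, with terms collected
    return (x*y*z*(v[1]+v[2]+v[4]+v[7]-v[0]-v[3]-v[5]-v[6])
          + x*y*(v[0]+v[3]-v[1]-v[2]) + x*z*(v[0]+v[5]-v[1]-v[4])
          + y*z*(v[0]+v[6]-v[2]-v[4])
          + x*(v[1]-v[0]) + y*(v[2]-v[0]) + z*(v[4]-v[0]) + v[0])

rng = random.Random(0)
x, y, z = rng.random(), rng.random(), rng.random()
v = [rng.random() for _ in range(8)]
print(abs(trilinear_factored(x, y, z, v) - trilinear_expanded(x, y, z, v)))  # ~0
```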

8 Stochastic Computing Example: trilinear interpolation
Huge gain in area, latency, and power

9 Stochastic Computing
Advantages (at low precision)
- Low cost
- Low latency
- Low power
- Error tolerance: all bits have uniform weight --> a single bit-flip causes only a small change in the value

10 Stochastic Computing
Challenges
Addition with a MUX: y = (1-c)a + cb; with C = 0.5, y = 0.5(a+b)
- Result is scaled --> precision loss
- A random stream for the select input C must be generated --> area overhead
- Stochastic numbers must be independent of each other; correlation can affect the accuracy
Example (bitwise AND of two streams):
- Independent: A = 1,1,0,1,1,1,1,0 (6/8), B = 1,0,1,1,0,0,1,0 (4/8) --> Y = 1,0,0,1,0,0,1,0 (3/8, correct)
- Correlated: A = 1,1,0,1,1,1,1,0 (6/8), B = 1,1,0,1,0,0,1,0 (4/8) --> Y = 1,1,0,1,0,0,1,0 (4/8, incorrect)
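The MUX-based scaled addition can be sketched in a few lines (a simulation sketch; the input values 6/8 and 3/8 echo the slide's bit-stream examples, everything else is an arbitrary choice):

```python
import random

def to_stream(p, n, rng):
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(1)
n = 100_000
a, b = 6/8, 3/8
A = to_stream(a, n, rng)
B = to_stream(b, n, rng)
C = to_stream(0.5, n, rng)  # select stream; must be independent of A and B
# 2-to-1 MUX: pass a bit of A when C=1, a bit of B when C=0
# --> P(Y=1) = c*a + (1-c)*b = 0.5*(a + b), i.e. scaled addition
Y = [ai if ci else bi for ai, bi, ci in zip(A, B, C)]
print(sum(Y) / n)  # ≈ 0.5 * (6/8 + 3/8) = 0.5625
```

Note that the output encodes (a+b)/2, not a+b; this scaling is exactly the precision loss named above.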

11 Stochastic Computing
Challenges
Exponential length of bit-stream
- 3-bit binary --> 8-bit stream
- Can be parallelized (e.g., one 8-bit stream split into two 4-bit lanes) --> performance-cost tradeoff
Difficult to synthesize
- How to generate a logic network that implements a given expression?
- Example: P(Y=1) = y = (1-ab)cd + ab(d+e-de) = abd + abe + cd - abcd - abde
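The synthesis example y = (1-ab)cd + ab(d+e-de) maps directly onto gates: an AND for the select term ab, an AND for cd, an OR for d+e-de, and a MUX to combine the two paths. A stream-level simulation of that network (a sketch; all input values are arbitrarily set to 0.5):

```python
import random

def to_stream(p, n, rng):
    return [1 if rng.random() < p else 0 for _ in range(n)]

rng = random.Random(2)
n = 100_000
a = b = c = d = e = 0.5
A, B, C, D, E = (to_stream(p, n, rng) for p in (a, b, c, d, e))

Y = []
for ai, bi, ci, di, ei in zip(A, B, C, D, E):
    sel = ai & bi                 # AND: select stream with value a*b
    p0 = ci & di                  # AND: value c*d
    p1 = di | ei                  # OR:  value d + e - d*e (independent D, E)
    Y.append(p1 if sel else p0)   # MUX combines the two paths

expected = (1 - a*b)*c*d + a*b*(d + e - d*e)  # = abd + abe + cd - abcd - abde
print(sum(Y) / n, expected)  # both ≈ 0.375
```

The select stream is independent of both data paths, so the MUX output probability splits cleanly into the two weighted terms even though D feeds both paths.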

12 Problems in Applying SC to DNN
Multiplication error
- Trained networks have many near-zero weights
- Bigger error near zero
<figures: 200x100 weights multiplied by zero; XNOR gate>
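For signed weights, SC typically uses the bipolar encoding (a value w in [-1, 1] is carried by a stream with P(bit=1) = (w+1)/2), in which an XNOR gate multiplies. The sketch below illustrates the point above that, relative to the product's magnitude, the error blows up for near-zero weights; the stream length, trial count, and test values are arbitrary illustrations:

```python
import random

def bipolar_stream(v, n, rng):
    # Bipolar encoding: value v in [-1, 1] --> P(bit = 1) = (v + 1) / 2
    return [1 if rng.random() < (v + 1) / 2 else 0 for _ in range(n)]

def decode(stream):
    return 2 * sum(stream) / len(stream) - 1

def rms_error(w, x, n, trials, rng):
    # RMS error of XNOR-based bipolar multiplication w * x
    err2 = 0.0
    for _ in range(trials):
        W = bipolar_stream(w, n, rng)
        X = bipolar_stream(x, n, rng)
        P = [1 - (wi ^ xi) for wi, xi in zip(W, X)]  # XNOR gate
        err2 += (decode(P) - w * x) ** 2
    return (err2 / trials) ** 0.5

rng = random.Random(3)
for w in (0.05, 0.9):
    e = rms_error(w, 0.5, 256, 200, rng)
    print(f"w={w}: rms error {e:.3f}, relative to |w*x| {e / abs(w * 0.5):.2f}")
```

The absolute noise is similar in both cases, but for w = 0.05 it swamps the tiny product, which is why removing or rescaling near-zero weights helps.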

13 Problems in Applying SC to DNN
Accumulation
- Scaled addition --> low precision
- Saturated addition --> sensitive to input correlation
Limited range [-1, 1]

14 Proposed Solutions
Multiplication error
- Remove near-zero weights and re-train
- Weight scaling
Accumulation
- Merge the accumulator and the activation function using a counter-based FSM
Limited range [-1, 1]
- Adjust weights and re-train
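The counter-based FSM activation builds on the classic Brown-and-Card construction: a saturating up/down counter driven by the input stream, emitting 1 while the state sits in the upper half, which approximates tanh in the bipolar domain. A minimal sketch, with an illustrative state count rather than the paper's exact design:

```python
def fsm_tanh(stream, n_states=16):
    """Saturating up/down counter FSM: the output bit is 1 while the state is
    in the upper half, approximating tanh((n_states/2) * x) in the bipolar
    domain (x is the bipolar value of the input stream)."""
    state = n_states // 2
    out = []
    for bit in stream:
        state = min(n_states - 1, state + 1) if bit else max(0, state - 1)
        out.append(1 if state >= n_states // 2 else 0)
    return out
```

Feeding it a bipolar stream for x = 0.8 (P(bit=1) = 0.9) keeps the counter saturated high, so the decoded output sits near tanh(8 * 0.8), i.e., close to 1; the same counter also performs the accumulation, which is the merging idea above.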

15 Early Decision Termination
Progressive precision
- Adjust the bit-stream length according to the required precision
- No hardware modification needed
Early decision termination
- Most data are far from the decision boundary
- Energy efficiency and faster decisions
<figure: a 1024-bit stream terminated after 256 bits>
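Early decision termination can be sketched as follows: each output class accumulates its bit-stream in a counter, and after every step the two leading counts are compared; once the leader is far enough ahead, the remaining bits are skipped. The function name, step size, and margin below are hypothetical illustrations, not the paper's exact scheme:

```python
def classify_with_edt(streams, step=32, margin=16, max_bits=1024):
    """streams: one 0/1 bit-stream per output class.
    Returns (predicted class, number of bits actually consumed)."""
    counts = [0] * len(streams)
    for t in range(0, max_bits, step):
        for i, s in enumerate(streams):
            counts[i] += sum(s[t:t + step])          # accumulate next chunk
        top, second = sorted(counts, reverse=True)[:2]
        if top - second >= margin:                   # confident: stop early
            return counts.index(top), t + step
    return counts.index(max(counts)), max_bits       # ran the full stream
```

For an "easy" input whose winning class has a much stronger stream, the loop stops well before the full 1024 bits, which is where the energy saving comes from.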

16 Experimental Environment
Dataset: MNIST hand-written digits
- 60000 training images, 10000 test images
Network
- Fully connected, identical to the previous work [Sanni, 2015]
- 784 x 100 x 200 x 10
Implementation
- Verilog HDL; Synopsys Design Compiler, TSMC 45nm

17 Accuracy Comparison
Accuracy of the DNN using SC compared to
- 32-bit floating point
- Previous work [Sanni, CISS, 2015]
Accuracy with progressive precision
- 1024-bit stream (32 bits/step)

18 Early Decision Termination
Number of EDT steps (1 step = 32 bits)
- 1024-bit stream: 32 steps
- 512-bit stream: 16 steps
Trade-off between normalized energy and error rate

19 Comparison of Synthesis Results
- Area, power, critical-path delay, and energy
- One neuron with 200 inputs (512-bit streams)
- Overhead: stochastic number generator (SNG)
- State-of-the-art SNG: SNG with MTJ [Rangharajan, DATE, 2015]
<chart values: 80.2%, 53.8%>

20 ISO-Area Comparison
Iso-area setup
- 9-bit fixed point: 3-stage pipeline, 72,104 um2
- Parallelism: SC (120x), SC-SNG (70x), SC-MTJ-SNG (109x)

21 Previous Work Summary
Comparison by deep-neural-network support, classification error, and contribution:

B. D. Brown and H. C. Card, Trans. Comput., 2001
- Deep neural network: No (soft competitive learning network)
- Classification error: N/A
- Contribution: basic idea for neural networks using SC; state-machine-based activation function

N. Nedjah and L. de Macedo Mourelle, Proc. DSD, 2003
- Deep neural network: No (normal neural network)
- Contribution: FPGA implementation

H. Li, D. Zhang, and S. Y. Foo, Trans. Power Electronics, 2006
- Deep neural network: No (application: neural network controller for small wind turbine systems)

D. Zhang and H. Li, Trans. Industrial Electronics, 2008
- Deep neural network: No (application: controller for an induction motor)

Y. Ji, F. Ran, C. Ma, and D. J. Lilja, Proc. DATE, 2015
- Deep neural network: No (radial basis function)
- Classification error: 2.7% (FP), 55% (SC, 1024 bits) on the Iris flower dataset
- Contribution: radial basis function neural network using SC

K. Sanni, et al., Proc. CISS, 2015
- Deep neural network: Yes (deep belief network)
- Classification error: 5.8% (FP), 18.2% (SC, 1024 bits) on MNIST
- Contribution: DBN FPGA implementation using SC

Proposed
- Deep neural network: Yes (fully-connected network)
- Classification error: 2.23% (FP), 2.41% (SC, 1024 bits) on MNIST
- Contribution: accuracy enhancement (removing near-zero weights, weight scaling), early decision termination, merging of accumulation and activation

22 Conclusion
Deep neural network using stochastic computing
- Removing near-zero weights
- Weight scaling
- Improved FSM-based activation function
- Early decision termination
Experimental results
- Accuracy close to that of the floating-point implementation
- Reduction of area, power, delay, and energy, depending on the stochastic number generator

23 Thank you!

