Progress Report 2019/5/5 PHHung
Previously: An efficient ConvNet (deep learning) analytics engine
Deep learning engines with different models:
- HMAX — ASIC: 205 mW at 256x256 (Neocortical from 332, ISSCC 12)
- ConvNet — FPGA: 10 W at 500x375 (NeuFlow from NYU, LeCun, CVPR 11)
- ConvNet — ASIC: 580 mW at 500x375 (NeuFlow from Purdue & NYU, MWCAS 12)
- ConvNet — FPGA: 4 W (TeraDeep from Purdue, CVPR 14)
- Server
Given a ConvNet model
- Input: 46x46 image (RGB)
- Layer 1: spatial conv, 32 hidden nodes, 7x7 kernel (#weights: 32*7*7*3 + 32), followed by 2x2 spatial pooling
- Layer 2: spatial conv, 64 hidden nodes, 7x7 kernel (#weights: 64*7*7*32 + 64)
- Layer 3: spatial conv, 128 hidden nodes, 7x7 kernel (#weights: 128*7*7*64 + 128)
- Layer 4: fully connected, 128 hidden nodes (#weights: 128*128)
- Output: 2 outputs (person or not) (#weights: 2*128)
- Feature-map sizes: 46x46, 40x40, 20x20, 14x14, 7x7, 7x7
- Total #W = 523,328
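As a sanity check, the per-layer weight counts above can be reproduced with a few lines of Python (the layer labels are just illustrative):

```python
# Parameter count for the ConvNet described above; reproduces Total #W = 523,328.
layers = {
    "L1 conv 32 @ 7x7x3":   32 * 7 * 7 * 3 + 32,    # weights + biases
    "L2 conv 64 @ 7x7x32":  64 * 7 * 7 * 32 + 64,
    "L3 conv 128 @ 7x7x64": 128 * 7 * 7 * 64 + 128,
    "L4 fully connected":   128 * 128,
    "Output 2 x 128":       2 * 128,
}
for name, n in layers.items():
    print(f"{name}: {n}")
print("Total #W =", sum(layers.values()))   # 523328
```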
Result: Person or not?
Bottleneck
1. Too much memory: Total #W = 523,328, stored as IEEE 754 floating point (32 bits) => ~2.1 MB
2. Floating-point operations
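The memory figure follows directly from the weight count:

```python
# Weight storage if every weight is an IEEE 754 single-precision float.
total_weights = 523_328
bytes_total = total_weights * 4          # 32 bits = 4 bytes per weight
print(bytes_total / 1e6)                 # ~2.09 MB, i.e. the ~2.1 MB above
```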
Solution? Floating point -> fixed point. Unfortunately…
"Fixed-Point Feedforward Deep Neural Network Design Using Weights +1, 0, and -1" (SiPS 14)
1. Naive approach: quantize W directly
2. Quantize W directly, then run backpropagation to refine W
(illustrative figures: quantize only; quantize + BP refinement)
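A minimal sketch of the naive approach (option 1), quantizing trained weights directly to {-1, 0, +1}. The threshold rule here is an assumption for illustration; the SiPS 14 paper derives its quantization points from the trained weight statistics.

```python
import numpy as np

def quantize_ternary(w, threshold):
    # Map each weight to -1, 0, or +1 by comparing against a threshold.
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

w = np.random.randn(32, 7, 7, 3).astype(np.float32)   # e.g. layer-1 weights
w_q = quantize_ternary(w, threshold=0.7 * np.abs(w).mean())  # heuristic threshold
```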
How about this!
1. Naive approach: quantize W directly (SiPS 14)
2. Quantize W directly, then run backpropagation to refine W (SiPS 14)
3. Add a quantization term to the loss function
   Loss function: $J(x) = \alpha\,|h(x)-y|^2$
   => Modified loss function: $J(x) = \alpha\,|h(x)-y|^2 + (1-\alpha)\,|w-q|^2$, where $q$ = nearest quantization bin
(illustrative figures: $\alpha = 0.9$ vs. $\alpha = 0.1$)
$$J(x) = \alpha\,|h(x)-y|^2 + (1-\alpha)\,|w-q|^2$$
$$\frac{\partial J(x)}{\partial w_j} = 2\alpha\,(h(x)-y)\,\frac{\partial h(x)}{\partial w_j} + (1-\alpha)\,\frac{\partial\,(w_j^2 - 2 w_j q + q^2)}{\partial w_j}$$
$$= 2\alpha\,(h(x)-y)\,\frac{\partial f\!\left(\sum_j w_j\, net_j\right)}{\partial w_j} + (1-\alpha)\,(2 w_j - 2q)$$
$$= 2\alpha\,(h(x)-y)\,h(x)\,(1-h(x))\, net_j + 2(1-\alpha)\,(w_j - q)$$
(assuming a sigmoid activation $f$, so $f' = f(1-f)$)
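A minimal NumPy sketch of this modified loss and its gradient, assuming a single sigmoid output unit and ternary bins {-1, 0, +1}; the names (net, h, q) mirror the slide and are illustrative, not the full model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nearest_bin(w, bins=np.array([-1.0, 0.0, 1.0])):
    # q = nearest quantization bin for each weight
    return bins[np.argmin(np.abs(w[..., None] - bins), axis=-1)]

def loss_and_grad(w, net, y, alpha=0.9):
    h = sigmoid(np.dot(w, net))              # h(x) = f(sum_j w_j * net_j)
    q = nearest_bin(w)
    J = alpha * (h - y) ** 2 + (1 - alpha) * np.sum((w - q) ** 2)
    # dJ/dw_j = 2*alpha*(h - y)*h*(1 - h)*net_j + 2*(1 - alpha)*(w_j - q_j)
    grad = 2 * alpha * (h - y) * h * (1 - h) * net + 2 * (1 - alpha) * (w - q)
    return J, grad

w = np.random.randn(8)
net = np.random.randn(8)
J, g = loss_and_grad(w, net, y=1.0, alpha=0.9)
w -= 0.1 * g                                  # one gradient-descent step
```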
Distributed Deep Networks? "Large Scale Distributed Deep Networks", NIPS 2012 (Google)
Another distributed deep network?
- Node (shallow): input layer → compressed layer for transmission → weak classifier @ node
- Server (deep): strong classifier @ server
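A sketch of this node/server split in PyTorch; the layer sizes and kernel choices are assumptions for illustration, not the actual design. The node runs a shallow net whose small "compressed" feature map is transmitted and also feeds a weak classifier, while the server stacks deeper layers for a strong classifier.

```python
import torch
import torch.nn as nn

class NodeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7), nn.ReLU(),
            nn.Conv2d(8, 4, kernel_size=7), nn.ReLU(),   # compressed layer for transmit
        )
        self.weak_classifier = nn.Sequential(             # weak classifier @ node
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2)
        )

    def forward(self, x):
        z = self.features(x)                               # transmitted to the server
        return z, self.weak_classifier(z)

class ServerNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.deep = nn.Sequential(                         # strong classifier @ server
            nn.Conv2d(4, 64, kernel_size=7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2),
        )

    def forward(self, z):
        return self.deep(z)

x = torch.randn(1, 3, 46, 46)
z, weak_out = NodeNet()(x)
strong_out = ServerNet()(z)
```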
Some problems…
1. Setting classifier accuracy aside for a moment: can it actually compress the transmitted data?
2. How about the classifier accuracy @ server, especially with a compression layer in the path?
3. How about the classifier accuracy @ node? How many layers and nodes are "enough"?
Setting classifier accuracy aside: can it compress the transmitted data?
Node (shallow):
- #Input: $H \times W \times 3$ ($H$: height, $W$: width, 3 channels)
- #L1: $3 \times (H - K_{H1} + 1) \times (W - K_{W1} + 1) \times n_1$ ($K$: kernel size, $n$: number of hidden nodes per layer)
- #L2: $n_1 \times (H - K_{H1} - K_{H2} + 2) \times (W - K_{W1} - K_{W2} + 2) \times n_2$
How to choose $n$, $K$ so that #L2 < #Input? => lower $n$, bigger $K$
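A quick check of the "lower n, bigger K" rule using the counting convention above; the specific (n1, n2, K) combinations are illustrative, not proposed designs:

```python
# Compare the transmitted L2 feature volume against the raw input size.
def input_size(H, W):
    return H * W * 3

def l2_size(H, W, K1, K2, n1, n2):
    return n1 * (H - K1 - K2 + 2) * (W - K1 - K2 + 2) * n2

H = W = 46
for n1, n2, K in [(16, 8, 5), (8, 4, 7), (4, 2, 11)]:
    ratio = l2_size(H, W, K, K, n1, n2) / input_size(H, W)
    print(f"n1={n1}, n2={n2}, K={K}: L2/Input = {ratio:.2f}")
# Only the smallest n with the largest K drives the ratio below 1 (compression).
```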
How about the classifier accuracy @ server?
(Plot: accuracy as #node2 is squeezed from 64 → 16 → 8 → 4 → 2; curves labeled by #hidden node1_#hidden node2_…)
Conclusion
- A distributed model for node / server
- A quantization approach to minimize computation
- A chip (brain) for the sensor?
About Vivotek: What to do? How long? Basic neural network?