Published by Nickolas Cooper. Modified over 6 years ago.
1
FPGA implementation of CNN Convolution layer logic
Di Wu
2
Introduction to CNN Deep learning network
Suitable for image recognition and classification
Feature extraction layer: convolutional computation
Pooling layer: down sampling
Linear classification layer: classic ANN network
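As an illustration of the down-sampling step, here is a minimal C sketch of a pooling layer. The slide does not say which pooling variant is used, so max pooling over non-overlapping 2x2 windows is an assumption, and the function name `max_pool_2x2` is hypothetical.

```c
#include <assert.h>

/* Assumed variant: 2x2 max pooling with stride 2 (not stated on the slide).
 * in  : H x W feature map, row-major
 * out : (H/2) x (W/2) feature map, row-major */
void max_pool_2x2(const float *in, int H, int W, float *out)
{
    assert(H > 0 && W > 0 && H % 2 == 0 && W % 2 == 0);
    for (int r = 0; r < H / 2; r++) {
        for (int c = 0; c < W / 2; c++) {
            /* Start from the top-left element of the window. */
            float m = in[(2 * r) * W + 2 * c];
            for (int i = 0; i < 2; i++) {
                for (int j = 0; j < 2; j++) {
                    float v = in[(2 * r + i) * W + (2 * c + j)];
                    if (v > m) m = v;
                }
            }
            out[r * (W / 2) + c] = m;
        }
    }
}
```

Each output pixel summarizes four input pixels, so the feature map shrinks by a factor of two in each dimension.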
3
CNN convolutional layer
Nested For Loop
// Each output kernel can be calculated separately
for (to = 0; to < M; to++) {
    // Each input layer is first calculated separately and then summed up
    for (ti = 0; ti < N; ti++) {
        // The inner convolutional part of one input image
        for (row = 0; row < R; row++) {
            for (col = 0; col < C; col++) {
                for (i = 0; i < K; i++) {
                    for (j = 0; j < K; j++) {
                        output_fm[to][row][col] += weights[to][ti][i][j] * input_fm[ti][row+i][col+j];
                    }
                }
            }
        }
    }
}
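The loop nest can be wrapped into a self-contained C function for software reference. This is a sketch under stated assumptions: stride 1, no padding (so each input map is (R+K-1) x (C+K-1)), `float` data instead of the fixed-point hardware format, and flat row-major arrays. The function name `conv_layer` is hypothetical.

```c
#include <assert.h>

/* Reference convolution layer, same loop order as the slide.
 * weights   : [M][N][K][K]
 * input_fm  : [N][R+K-1][C+K-1]  (assumed: stride 1, no padding)
 * output_fm : [M][R][C], cleared here before accumulation */
void conv_layer(int M, int N, int R, int C, int K,
                const float *weights, const float *input_fm,
                float *output_fm)
{
    assert(M > 0 && N > 0 && R > 0 && C > 0 && K > 0);
    const int IH = R + K - 1;   /* input height */
    const int IW = C + K - 1;   /* input width  */

    /* The += in the inner loop needs a zeroed output. */
    for (int n = 0; n < M * R * C; n++)
        output_fm[n] = 0.0f;

    for (int to = 0; to < M; to++)
        for (int ti = 0; ti < N; ti++)
            for (int row = 0; row < R; row++)
                for (int col = 0; col < C; col++)
                    for (int i = 0; i < K; i++)
                        for (int j = 0; j < K; j++)
                            output_fm[(to * R + row) * C + col] +=
                                weights[((to * N + ti) * K + i) * K + j] *
                                input_fm[(ti * IH + row + i) * IW + col + j];
}
```

For example, a single 3x3 input holding 1 to 9 convolved with a 2x2 kernel of ones gives the 2x2 output 12, 16, 24, 28.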
4
CNN convolutional layer
Intuition: Systolic Array Implementation
5
Systolic Array Implementation
Derive the systolic array for each line of input source
Data dependency graph
6
Systolic Array Implementation
1D systolic array structure for partial convolution computation
2D systolic array structure for partial convolution computation
2D systolic array structure for convolution computation with final adder stage
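The figures for these structures are not part of this transcript, so the exact dataflow is unknown. As a rough illustration of the 1D case only, the following C sketch simulates one possible design cycle by cycle: weights stay in the cells, while inputs and partial sums both move left to right, the inputs through two registers per cell and the partial sums through one. The design choice, the `MAX_K` limit, and the name `systolic_conv1d` are all assumptions.

```c
#include <assert.h>

#define MAX_K 16   /* hypothetical upper bound on the number of cells */

/* Computes out[n] = sum_k w[k] * x[n + k] for n = 0 .. len - K
 * on a simulated 1D array of K cells with nearest-neighbour links only. */
void systolic_conv1d(const int *w, int K, const int *x, int len, int *out)
{
    assert(K >= 1 && K <= MAX_K && len >= K);
    int xr1[MAX_K] = {0};  /* first input register of each cell  */
    int xr2[MAX_K] = {0};  /* second input register of each cell */
    int yr[MAX_K]  = {0};  /* partial-sum register of each cell  */

    /* The last valid result leaves the array at cycle len + K - 2. */
    for (int t = 0; t < len + K - 1; t++) {
        int x_in = (t < len) ? x[t] : 0;   /* flush with zeros */

        /* Update right to left so every cell still sees the OLD
         * register values of its left neighbour (one clock edge). */
        for (int k = K - 1; k >= 0; k--) {
            int xin_k = (k == 0) ? x_in : xr2[k - 1];
            int yin_k = (k == 0) ? 0    : yr[k - 1];
            /* Cell k holds w[K-1-k]; the reversal compensates for
             * inputs travelling slower than partial sums. */
            yr[k]  = yin_k + w[K - 1 - k] * xin_k;
            xr2[k] = xr1[k];
            xr1[k] = xin_k;
        }

        /* Result n appears at the last cell 2K - 2 cycles after x[n] enters. */
        int n = t - (2 * K - 2);
        if (n >= 0 && n <= len - K)
            out[n] = yr[K - 1];
    }
}
```

With `w = {1, 2, 3}` and `x = {1, 2, 3, 4, 5}` the array produces 14, 20, 26. A 2D structure could be pictured as K such rows, one per kernel row, whose outputs are summed in a final adder stage.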
7
Synthesis and P&R result
32-bit fixed-point representation
Kernel size 4x4, input channel = 3, FIFO size = 1024
Xilinx xq7z100-rf1156, 50 MHz
Resource   Used   Utilization
LUT               0.456%
FF         3389   0.611%
BRAM       12     1.589%
DSP        288    14.257%
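To make the 32-bit fixed-point multiply-accumulate concrete, here is a small C sketch. The slide does not give the split between integer and fractional bits, so the Q16.16 format is an assumption, as are the helper names `fx_from_double`, `fx_mul`, and `fx_mac`.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 16   /* assumed Q16.16 split; not stated in the slides */

/* Convert a real value to fixed point (truncates toward zero). */
static int32_t fx_from_double(double v)
{
    return (int32_t)(v * (double)(1 << FRAC_BITS));
}

/* Multiply in 64 bits, then drop the extra fractional bits.
 * Right-shifting a negative value is implementation-defined in C;
 * common compilers perform an arithmetic shift. */
static int32_t fx_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

/* One multiply-accumulate step, as performed for each kernel tap. */
static int32_t fx_mac(int32_t acc, int32_t w, int32_t x)
{
    assert(FRAC_BITS > 0 && FRAC_BITS < 32);
    return acc + fx_mul(w, x);
}
```

The 64-bit intermediate product is what makes a 32-bit multiplier wider than a single DSP slice multiplier, which is one plausible reason DSP usage dominates the utilization figures.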
8
Discussion
FPGA utilization issue: DSP slice resource
Bandwidth issue: 32-bit fixed-point representation, 50 MHz clock, input channel = 3
Bandwidth requirement is 600 MBps
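The 600 MBps figure can be reproduced with a short calculation, assuming one 32-bit sample per input channel is consumed on every clock cycle (the slide does not state this explicitly). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Input bandwidth in bytes per second, assuming every channel
 * delivers one sample per clock cycle. */
uint64_t input_bandwidth_bytes_per_s(unsigned bits_per_sample,
                                     uint64_t clock_hz,
                                     unsigned channels)
{
    assert(bits_per_sample % 8 == 0);
    return (uint64_t)(bits_per_sample / 8) * clock_hz * channels;
}
```

Calling `input_bandwidth_bytes_per_s(32, 50000000, 3)` gives 4 bytes x 50,000,000 cycles/s x 3 channels = 600,000,000 bytes/s, matching the slide.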
9
Q & A