1
Optimizing stencil code for FPGA
Yang Liu
2
Overall Motivation
Accelerate stencil code at both the software and hardware levels.
Software optimization: algorithm-level optimization
Hardware optimization: data transfer rate, parallelism, and a specially designed memory controller
3
Executive summary
This project aims to optimize stencil code performance on FPGAs using the OpenCL framework.
4
SDAccel
Xilinx's design acceleration tool enables faster development and better performance.
Supports the standard OpenCL API to abstract the hardware and optimize code for it.
Available on the AWS cloud.
5
SDAccel Design Flow
6
Stencil Algorithm Application
Computational fluid simulation
Partial differential equations
Many more...
7
Stencil Algorithm
Each output point depends on its nearest neighbors.
(Diagrams: 2D and 1D stencil patterns)
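As an illustration of the nearest-neighbor dependence, here is a minimal sketch of one pass of a three-point 1D stencil in plain C. The function name `stencil_1d`, the coefficient values, and the choice to copy the two boundary points through unchanged are assumptions made for this example, not details taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coefficients; the slides do not give values for them. */
#define ALPHA 0.25f
#define BETA  0.5f

/* One pass of a three-point 1D stencil.
 * Each interior point reads its left and right neighbors. The two boundary
 * points have no full neighborhood, so they are copied through unchanged
 * (an assumption of this sketch). */
void stencil_1d(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL && in != out);
    if (n == 0) {
        return;
    }
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = ALPHA * (in[i - 1] + in[i + 1]) + BETA * in[i];
    }
}
```

A 2D stencil follows the same pattern with neighbors in both the row and the column direction.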
8
Why we need to improve
9
Current Progress
The 1-D and 2-D implementations of the stencil code are complete.
Optimization of the 1-D and 2-D versions is halfway through. The project is on track to meet the goal of my proposal.
10
System Design: Data
The data set consists of 4096 bits of randomly generated data.
Generated using the C random function.
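A minimal sketch of how such an input could be produced with the C standard library. The function name `fill_random`, the `float` element type, the fixed seed, and the scaling to the range [0, 1] are all assumptions for illustration; the slides only say that the C random function was used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fill data[0..n-1] with pseudo-random values in [0, 1].
 * Seeding explicitly makes a run reproducible, which helps when the FPGA
 * result is compared against a software reference. */
void fill_random(float *data, size_t n, unsigned seed)
{
    assert(data != NULL);
    srand(seed);
    for (size_t i = 0; i < n; ++i) {
        data[i] = (float)rand() / (float)RAND_MAX;
    }
}
```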
11
System Design: Program
The stencil program is handwritten. The OpenCL configuration code is based on the Xilinx SDAccel examples.
12
Loop Unrolling
out[i] = ALPHA * in1[i - 1] + in1[i + 1] + BETA * in1[i];
Vout_buffer[j] = ALPHA^2 * (in1[j - 2] + v1_buffer[j + 2] + 2 * v1_buffer[j]) + BETA * ALPHA^2 * v1_buffer[j + 1] * v1_buffer[j - 1] + v1_buffer[j];
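The idea of fusing two stencil passes into one can be checked with a small sketch. It assumes the one-step update is the symmetric three-point form ALPHA * (in[i - 1] + in[i + 1]) + BETA * in[i]; that parenthesization, the coefficient values, and the function names are assumptions of this example. Under that assumption, substituting the update into itself gives a five-point update with weights ALPHA^2 on in[j - 2] and in[j + 2], 2 * ALPHA * BETA on in[j - 1] and in[j + 1], and 2 * ALPHA^2 + BETA^2 on in[j], which is what `step_twice_fused` computes.

```c
#include <assert.h>
#include <stddef.h>

#define ALPHA 0.25f   /* hypothetical value */
#define BETA  0.5f    /* hypothetical value */

/* Reference: one three-point pass over the interior points [1, n-2]. */
void step_once(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t i = 1; i + 1 < n; ++i) {
        out[i] = ALPHA * (in[i - 1] + in[i + 1]) + BETA * in[i];
    }
}

/* Two passes fused into a single five-point update.
 * Only points [2, n-3] have a complete neighborhood, so the valid
 * region is one point narrower on each side than after a single pass. */
void step_twice_fused(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t j = 2; j + 2 < n; ++j) {
        out[j] = ALPHA * ALPHA * (in[j - 2] + 2.0f * in[j] + in[j + 2])
               + 2.0f * ALPHA * BETA * (in[j - 1] + in[j + 1])
               + BETA * BETA * in[j];
    }
}
```

The fused version reads the input once and writes the output once per two time steps, at the cost of a wider neighborhood.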
13
Loop unrolling problem
The unused data at the boundary grows with the unroll factor.
(Diagram: compute area inside the data area, for the original and for the version unrolled three times; the unrolled boundary is labeled 3.)
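The boundary cost can be made concrete with a small calculation. The sketch below assumes a stencil that reaches `radius` points to each side and is fused over `steps` time steps; both parameter names and the helper `valid_outputs` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of output points with a complete neighborhood when `steps`
 * stencil passes of reach `radius` are fused into one update.
 * Each fused pass widens the unusable border by `radius` on both sides. */
size_t valid_outputs(size_t n, size_t radius, size_t steps)
{
    assert(radius > 0 && steps > 0);
    size_t border = 2 * radius * steps;   /* both sides together */
    return n > border ? n - border : 0;
}
```

For a three-point stencil (radius 1) on 4096 points, a single pass leaves 4094 computable points, while unrolling three times leaves 4090.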
14
Buffering
Data movement between the host and the board has very high latency.
Resolution: a local buffer stores part of the data.
(Diagram: host and board; original: 4096, optimized: 1024)
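A minimal sketch of the buffering idea in plain C, assuming the 1024 in the diagram is the size of the local buffer. The input is processed one tile at a time: each tile, plus one extra point on each side, is copied into a small local array in a single block copy, and the stencil then reads only from that array. The function name, the coefficient values, and the copy-through treatment of the two boundary points are assumptions of this example. In an OpenCL kernel the local array would typically be placed in on-chip memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALPHA 0.25f   /* hypothetical value */
#define BETA  0.5f    /* hypothetical value */
#define TILE  1024    /* local buffer size, assumed from the diagram */

/* Three-point stencil computed tile by tile through a local buffer. */
void stencil_1d_tiled(const float *in, float *out, size_t n)
{
    float local[TILE + 2];   /* one tile plus one halo point per side */

    assert(in != NULL && out != NULL && in != out);
    if (n < 3) {             /* no interior points: copy through */
        memcpy(out, in, n * sizeof(float));
        return;
    }
    out[0] = in[0];
    out[n - 1] = in[n - 1];

    for (size_t start = 1; start + 1 < n; start += TILE) {
        size_t count = n - 1 - start;        /* interior points left */
        if (count > TILE) {
            count = TILE;
        }
        /* One block copy brings in the tile and its two halo points. */
        memcpy(local, in + start - 1, (count + 2) * sizeof(float));
        for (size_t k = 0; k < count; ++k) {
            out[start + k] = ALPHA * (local[k] + local[k + 2])
                           + BETA * local[k + 1];
        }
    }
}
```

Adjacent tiles overlap by their halo points, so a small amount of data is read twice in exchange for far fewer long-latency accesses.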
15
Multiple instances
Why use just one when we can have plenty?
16
System Test: Platform
Based on Xilinx FPGAs
Local test: KCU1500
Future test environment: AWS F1 instance (VU9P)
17
Results: 1-D VS 1-D Optimized (Stencil Only)
18
Results: 2-D VS 2-D Optimized (Stencil only)
19
Results: 1-D VS 1-D Optimized (With Transfer)