Published by Brenda Barker. Modified over 9 years ago.
Evaluating and Improving an OpenMP-based Circuit Design Tool
Tim Beatty, Dr. Ken Kent, Dr. Eric Aubanel
Faculty of Computer Science, University of New Brunswick

FPGAs
A field-programmable gate array (FPGA) is a programmable logic device that can be configured to implement any logical function.
FPGAs are made up of:
- configurable logic blocks (CLBs)
- programmable interconnects
FPGAs are programmed with a schematic or a hardware description language (HDL) design.
[Figures: CLB architecture; CLB pin layout]

Design Flow
FPGAs and application-specific integrated circuits (ASICs) are designed according to the HDL hardware design flow.
Traditional HDLs include VHDL and Verilog.

Shared Memory and OpenMP
A shared memory system has multiple processing cores with access to a common, shared memory.
Shared memory can be accessed by each processor simultaneously.
Communication and synchronization are achieved through shared variables.
OpenMP is an API for shared memory parallel programming in C/C++.
Parallelism is specified explicitly through a set of pragma directives.
Run-time library functions control environment settings such as the number of threads.

The Handel-C Language
Handel-C is a behavioral HDL by Celoxica. It is made up of:
- a subset of ANSI C language elements
- extensions for concurrency
- a set of variable-width primitive types
- a set of architectural types such as interfaces and RAMs
Each assignment statement takes 1 clock cycle.

Example 8-bit multiplier in Handel-C:

    set clock = external;

    void main (void)
    {
        int 8 result;
        interface bus_in (int 8 a, int 8 b) input ();
        interface bus_out () output (int 8 data_out = result);

        /* The input ports are named a and b in the bus_in declaration. */
        result = input.a * input.b;
    }

Images: http://en.wikipedia.org/wiki/fpga
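The two OpenMP mechanisms named under "Shared Memory and OpenMP" (pragma directives and run-time library functions) can be illustrated with a small C sketch. This is not code from the poster: the function names `sum_of_squares` and `thread_count` are hypothetical, and the `#ifdef _OPENMP` guards are an assumption made so that the file also builds as ordinary serial C when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>   /* OpenMP run-time library (only when compiled with OpenMP) */
#endif

/* Pragma directive: the loop iterations are divided among threads, and each
 * thread's private partial sum is combined into the shared variable `sum`. */
int sum_of_squares(int n)
{
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++) {
        sum += i * i;
    }
    return sum;
}

/* Run-time library functions: request a team size, then report how many
 * threads the parallel region actually received. Returns 1 in a serial build. */
int thread_count(int requested)
{
    int count = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        #pragma omp single
        count = omp_get_num_threads();
    }
#else
    (void)requested;   /* unused without OpenMP */
#endif
    return count;
}
```

With GCC or Clang the parallel version is typically built with `-fopenmp`; without that flag the pragmas are ignored and both functions run serially with the same results.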
Variable Bit Width
Better control over resource usage should lead to better performance.
A new compiler directive was implemented to allow variable bit width.
Register widths are automatically adjusted when translating expressions whose operand widths don't match.

Preliminary Results
The Mandelbrot set was generated at a resolution of 640x480 pixels.
Varying bit width settings were used for program variables.
The resulting resource usage and performance data were collected.
The 48-bit version ran out of hardware resources beyond 6 threads.
Resource usage and execution time decreased with narrower bit widths.

OpenMP-Handel-C Translator
Leow, Ng, and Wong created the OpenMP-Handel-C translator [1].
It is based on C-Breeze, a C compiler infrastructure.
Their modifications include:
- addition of new abstract syntax tree nodes for OpenMP pragmas
- addition of the OpenMP grammar to the GNU Flex/Bison-based parser
- modifications to C-Breeze's built-in C-to-C translator, enabling C-to-Handel-C translation based on a set of porting rules
The OpenMP abstract syntax tree nodes generate Handel-C code that implements the supported OpenMP directives.
Data types supported for translation are int, char, and long.

Representing a C program
Source code is parsed and represented as an abstract syntax tree.

Future Work
- Complete the remaining benchmark tests
- Implement OpenMP library functions such as omp_get_thread_num()
- Study the feasibility of a tool that determines the optimal number of threads
- Integrate the improved translator with other tools being developed by the Reconfigurable Computing Research Group

Example C program fragment with bit width annotations:

    #pragma handelc width 8
    int x;

    #pragma handelc function return 8 params (8, 16)
    int my_function (int param1, int param2);

Translated program fragment (Handel-C):

    int 8 x;
    inline int 8 my_function (int 8 param1, int 16 param2);

[1] Leow, Y.Y.; Ng, C.Y.; Wong, W.F. Generating Hardware from OpenMP Programs. IEEE International Conference on Field-Programmable Technology 2006 (FPT 2006), 73-80.
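To make the Mandelbrot benchmark under "Preliminary Results" more concrete, here is a rough sketch of an integer-only escape-time kernel in C with an OpenMP loop over image rows. The poster does not show the benchmark source, so everything here is an assumption: the names `mandel_iter` and `render`, the 12-bit fixed-point format, and the view window are all hypothetical. Integer arithmetic is used only because the translator is described as supporting int, char, and long.

```c
#define FRAC_BITS 12              /* assumed fixed-point format: 12 fractional bits */
#define ONE (1 << FRAC_BITS)      /* fixed-point representation of 1.0 */

/* Escape-time iteration count for the point c = cr + i*ci (fixed point).
 * Assumes |cr| and |ci| are at most 2.5 * ONE so that every product
 * below fits in a 32-bit int. */
int mandel_iter(int cr, int ci, int max_iter)
{
    int zr = 0, zi = 0;
    int n;
    for (n = 0; n < max_iter; n++) {
        int zr2 = (zr * zr) / ONE;         /* zr^2, rescaled to fixed point */
        int zi2 = (zi * zi) / ONE;         /* zi^2, rescaled to fixed point */
        if (zr2 + zi2 > 4 * ONE) {
            break;                         /* |z|^2 > 4: the point has escaped */
        }
        int zri = (zr * zi) / ONE;
        zi = 2 * zri + ci;                 /* imaginary part of z^2 + c */
        zr = zr2 - zi2 + cr;               /* real part of z^2 + c */
    }
    return n;
}

/* Fill out[py * width + px] with iteration counts for the window
 * [-2.5, 1.0) x [-1.25, 1.25). Rows are independent, so the outer
 * loop is shared among threads. */
void render(int width, int height, int max_iter, int *out)
{
    #pragma omp parallel for
    for (int py = 0; py < height; py++) {
        for (int px = 0; px < width; px++) {
            int cr = -(5 * ONE) / 2 + (px * 7 * ONE) / (2 * width);
            int ci = -(5 * ONE) / 4 + (py * 5 * ONE) / (2 * height);
            out[py * width + px] = mandel_iter(cr, ci, max_iter);
        }
    }
}
```

In a sketch like this, the width of `zr`, `zi`, and the intermediate products is what a bit width directive would control: narrower registers cost fewer resources but leave less headroom for the fixed-point products.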
Benchmark Methodology
An initial set of tests has been developed:
- a Mandelbrot set generator
- the Miller-Rabin primality test
- systolic sequence alignment
The translated OpenMP programs are compiled to VHDL in Celoxica's DK 5.0, and the VHDL is then synthesized into hardware using Xilinx's ISE 9.1.
Resource usage and performance data are recorded.

Translator Limitations
- No OpenMP run-time library functions
- The number of threads is fixed at compile time
- Nested parallelism is not supported
- Parallel reduction variables must be 32-bit integers
- All variables of type int map to 32-bit registers, which may use more resources than necessary

Example source program:

    int is_even (int x)
    {
        if (x % 2 == 0)
            return 1;
        else
            return 0;
    }

[Figure: AST representation of the example source program]
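As an illustration of the kind of program the Miller-Rabin benchmark refers to, the following C sketch tests 32-bit integers for primality and counts primes with an OpenMP reduction on an int, in line with the stated limitation that reduction variables must be 32-bit integers. It is not the benchmark used on the poster: the function names are hypothetical, the fixed witness set {2, 7, 61} is a swapped-in deterministic variant that is known to be sufficient for all 32-bit inputs, and the use of 64-bit intermediates from `<stdint.h>` is an assumption that the translator itself may not support.

```c
#include <stdint.h>

/* Modular exponentiation: (base^exp) mod m, using 64-bit intermediates
 * so that products of two 32-bit values cannot overflow. */
static uint32_t pow_mod(uint32_t base, uint32_t exp, uint32_t m)
{
    uint64_t result = 1, b = base % m;
    while (exp > 0) {
        if (exp & 1) result = (result * b) % m;
        b = (b * b) % m;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* One Miller-Rabin round for odd n, where n - 1 = d * 2^r with d odd.
 * Returns 1 if witness a does not prove n composite. */
static int passes_round(uint32_t n, uint32_t a, uint32_t d, int r)
{
    uint64_t x = pow_mod(a, d, n);
    if (x == 1 || x == n - 1) return 1;
    for (int i = 1; i < r; i++) {
        x = (x * x) % n;
        if (x == n - 1) return 1;
    }
    return 0;
}

/* Deterministic for every 32-bit n with the witnesses 2, 7, and 61. */
int is_prime(uint32_t n)
{
    static const uint32_t witnesses[] = {2, 7, 61};
    if (n < 2) return 0;
    if (n % 2 == 0) return n == 2;

    uint32_t d = n - 1;
    int r = 0;
    while (d % 2 == 0) { d /= 2; r++; }    /* factor n - 1 as d * 2^r */

    for (int i = 0; i < 3; i++) {
        uint32_t a = witnesses[i] % n;
        if (a == 0) continue;              /* witness is a multiple of n: skip */
        if (!passes_round(n, a, d, r)) return 0;
    }
    return 1;
}

/* Count primes in [0, limit); each candidate is tested independently,
 * and the per-thread counts are combined by an int reduction. */
int count_primes_below(int limit)
{
    int count = 0;
    #pragma omp parallel for reduction(+:count)
    for (int n = 0; n < limit; n++) {
        count += is_prime((uint32_t)n);
    }
    return count;
}
```

Because each candidate is tested independently, the loop needs no synchronization other than the final reduction, which makes it a natural fit for a translator with a thread count fixed at compile time.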