Published by Esther Tiffany Cole. Modified over 8 years ago.
1
Automated Software Generation and Hardware Coprocessor Synthesis for Data Adaptable Reconfigurable Systems
Andrew Milakovich, Vijay Shankar Gopinath, Roman Lysecky, Jonathan Sprinkle
Department of Electrical and Computer Engineering, University of Arizona
amilakov@ece.arizona.edu, vsg@ece.arizona.edu, rlysecky@ece.arizona.edu, sprinkle@ece.arizona.edu
This work was supported in part by the National Science Foundation under Grant CNS-0915010.
2
Introduction & Motivation: Data Adaptable Approach
- Increasingly complex applications demand complex algorithms
  - Compute intensive
  - Highly configurable
- Example: JPEG2000 image compression
  - Provides significant advantages in quality and compression over the JPEG standard
  - Supports configurability at each processing stage (e.g. color transform, wavelet, block encoding, code stream)
  - Results in high computational demands and a larger design space
3
Introduction & Motivation: Traditional SW & HW Solutions
[Figure: three system architectures]
- Software Only (single/multicore): multiple µPs
- Hardware Accelerated (dedicated HW IP): µP with I$ and D$ plus a JPEG2000 coprocessor
- Reconfigurable (FPGA supporting dynamic reconfiguration): µP with I$ and D$, an FPGA, and coprocessor bitstream memory

Goals                        Software Only  Hardware Accelerated  Reconfigurable
Configurability/Flexibility  Yes            No                    Yes
Performance                  No             Yes
4
Introduction & Motivation: Data-Adaptable Reconfigurable Embedded Systems (DARES)
- Reconfigurable systems for highly configurable, compute-intensive applications
- Can be reconfigured at runtime for immediate application needs
- How and when to reconfigure is specific to the application and the data input
- Goal: reconfigure hardware tasks within the FPGA based upon the current data profile
[Figure: a µP and a reconfigurable FPGA turn an input stream (...10110000) into an output stream (10011000...). FPGA tasks shown: Task A (512x512), Task B (5/3), Task C (Casual). A new input stream (...000010100) carries a new data profile: 14 bits/channel; Task A (1024x1024, 4:4:2); Task B (Wavelet 5/3); Task C (Error Resilient). A store of HW task implementations (Tasks A, B, C, D) supplies Task A (1024x1024).]
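The goal stated on this slide, choosing a hardware task configuration from the current data profile, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `DataProfile` and `HwTaskConfig` structs, their fields, and the first-fit rule in `select_config` are hypothetical and are not taken from the DARES toolchain.

```c
#include <stddef.h>

/* Hypothetical data profile. The fields echo the slide's example
   (bits per channel, image size), but the struct is an assumption. */
typedef struct {
    int bits_per_channel;
    int width;
    int height;
} DataProfile;

/* Hypothetical descriptor of one pre-synthesized HW task configuration. */
typedef struct {
    const char *name;
    int max_bits_per_channel;
    int max_width;
    int max_height;
    int bitstream_id;   /* index into the coprocessor bitstream memory */
} HwTaskConfig;

/* Return the first configuration that can handle the profile,
   or NULL when none fits. List configurations from smallest to largest
   so that the cheapest adequate one is chosen. */
const HwTaskConfig *select_config(const HwTaskConfig *cfgs, size_t n,
                                  const DataProfile *p)
{
    for (size_t i = 0; i < n; ++i) {
        if (p->bits_per_channel <= cfgs[i].max_bits_per_channel &&
            p->width  <= cfgs[i].max_width &&
            p->height <= cfgs[i].max_height) {
            return &cfgs[i];
        }
    }
    return NULL;
}
```

A runtime manager could call `select_config` whenever a new input stream announces its profile and reconfigure the FPGA only when the returned `bitstream_id` differs from the one currently loaded.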
5
Introduction & Motivation: Related Work
[Figure: reconfigurable system (FPGA supporting dynamic reconfiguration): µP with I$ and D$, FPGA, coprocessor bitstream memory]
Data/input-driven reconfigurable systems:
- Adaptive Programming [Santambrogio et al., ICCAD 2007]
  - Uses a programming technique that eliminates the need to recompute results if the inputs have not changed
  - Can be applied both to software execution and to reconfigurable hardware
- Function-level Data Adaptation [Sima & Bertels, ISDPS 2009]
  - Determines whether executing in HW will be beneficial given the overhead of HW communication
  - Performed at the function level by monitoring/profiling input parameters
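The function-level decision described for the second related work (run in hardware only when it still wins after communication overhead) reduces to a simple comparison. The sketch below is an illustration under assumed names: `CallEstimate`, its fields, and `hw_is_beneficial` are hypothetical and do not come from the cited paper.

```c
#include <stdbool.h>

/* Hypothetical per-call cost estimate, all values in microseconds. */
typedef struct {
    double sw_time_us;    /* estimated software execution time */
    double hw_time_us;    /* estimated hardware execution time */
    double comm_time_us;  /* estimated time to move data to and from the HW */
} CallEstimate;

/* Offload only when hardware execution plus communication overhead
   is strictly faster than running the function in software. */
bool hw_is_beneficial(const CallEstimate *e)
{
    return e->hw_time_us + e->comm_time_us < e->sw_time_us;
}
```

In practice the estimates would come from profiling the input parameters of each call, which is the monitoring step the slide mentions.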
6
DARES Approach: Project Goals
DARES approach goals:
- Develop a model-driven framework for specifying application tasks, processing requirements, data configurability, and target data profiles for hardware support
- Develop runtime middleware and a communication framework for runtime communication and system reconfiguration
- Develop an automated tool flow supporting the proposed methodology
Current focus:
- Provide an overview of the design methodology and toolchain for:
  - Application modeling
  - Software and hardware task generation and compilation
  - Integration with the runtime middleware and communication framework
- Consider a small case study using the less complex JPEG image compression
7
DARES Approach: Design Methodology and Toolchain Overview
1. Modeling Framework
2. SW Task Generation/Compilation
3. HW Coprocessor Generation/Synthesis
4. HW/SW Communication Framework
5. Final Software/Hardware Implementation
[Figure: toolchain flow, with stages labeled (1) to (5). The Application and Data Profile Model feeds the HW/SW Codesign (Model Interpreter). Software threads, communication middleware, and init. code pass through the software compiler (gcc) to produce the software binary. The software model for HW tasks passes through ImpulseC CoDeveloper to produce hardware tasks, which are combined with the HW/SW communication framework in Xilinx ISE to produce the hardware task bitstreams.]
8
DARES Approach: Design Methodology and Toolchain
DARES Modeling Framework
- Modeling language to express an application as a composition of Communicating Sequential Dataflow Tasks (CSDT)
- Captures application-level and task-level data profiles
- Allows designers to specify the configuration of tasks for the target data profiles
- Performs design space exploration to determine the Pareto-optimal system implementation
- Generates source code for SW and HW task configurations
[Figure: the Application and Data Profile Model feeds the HW/SW Codesign (Model Interpreter). Panels: application tasks and dataflow model (JPEG); task configurations of the Row DCT task.]
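The Pareto-optimal selection mentioned for design space exploration can be sketched as a non-dominated filter. The slide does not say which metrics or search method the framework uses, so the two metrics here (`exec_time`, `area`) and the brute-force pairwise comparison are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical design point; lower is better for both metrics. */
typedef struct {
    double exec_time;
    double area;
} DesignPoint;

/* a dominates b when a is no worse in both metrics
   and strictly better in at least one. */
static bool dominates(const DesignPoint *a, const DesignPoint *b)
{
    return a->exec_time <= b->exec_time && a->area <= b->area &&
           (a->exec_time < b->exec_time || a->area < b->area);
}

/* Set keep[i] to true for every non-dominated point.
   Returns the number of points on the Pareto front. */
size_t pareto_front(const DesignPoint *pts, size_t n, bool *keep)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        keep[i] = true;
        for (size_t j = 0; j < n; ++j) {
            if (j != i && dominates(&pts[j], &pts[i])) {
                keep[i] = false;
                break;
            }
        }
        if (keep[i]) {
            ++count;
        }
    }
    return count;
}
```

The quadratic comparison is adequate for the small number of task configurations in a case study; a larger design space would call for a sorted sweep or a heuristic search.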
9
DARES Approach: Design Methodology and Toolchain
Software Task Generation and Compilation
- The HW/SW Codesign Interpreter transforms the C code for application task configurations
- Generates a Pthread implementation for all SW task configurations
- Communication middleware APIs provide methods to access the input and output buffers identified by the specific tasks
[Figure: toolchain step: software threads, communication middleware, and init. code are compiled with the software compiler (gcc)]
[Figure: the Codesign Interpreter transforms the original task configuration code into the Pthread implementation]

    // Original task configuration code
    void FuncName() {
        #pragma DARES_DECL_PART
        int data[64];
        ...
        #pragma DARES_COMP_BEGIN
        #pragma DARES_READ_INTO(data)
        // Computation
        #pragma DARES_WRITE_FROM(data,64)
        #pragma DARES_COMP_END
    }

    // Pthread implementation
    void* FuncName() {
        ...
        INTx DARES_SAMPLE_INPUT;
        int DARES_loop_iter;
        do {
            ...
            do {
                // Read DEPTH input tokens from the input FIFO
                for (DARES_loop_iter = 0; DARES_loop_iter < DEPTH; ++DARES_loop_iter) {
                    if (Fifo_Read_Single(ID1, &DARES_SAMPLE_INPUT) == 0)
                        DARES_INPUT[DARES_loop_iter] = DARES_SAMPLE_INPUT;
                }
                ...
                // Write TOKENS output tokens to the output FIFO
                for (DARES_loop_iter = 0; DARES_loop_iter < TOKENS; ++DARES_loop_iter) {
                    Fifo_Write_Single(ID2, &DARES_OUTPUT[DARES_loop_iter]);
                }
            } while (!Fifo_Eos(ID1));
            ...
        } while (1);
    }
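The generated thread relies on three middleware calls: read one token, write one token, and test for end of stream. The slide shows only the call sites, so the sketch below is a guess at the semantics rather than the DARES middleware: the `Fifo` struct, the lowercase function names, the `-1` error code, and the doubling "computation" are all assumptions. The only behaviour taken from the slide is that a successful read returns 0. The sketch is single-threaded; a version shared between Pthreads would need a mutex and condition variables around the buffer.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAPACITY 64

/* Hypothetical ring-buffer FIFO standing in for the middleware buffers. */
typedef struct {
    int    data[FIFO_CAPACITY];
    size_t head;    /* next slot to read */
    size_t count;   /* tokens currently stored */
    bool   closed;  /* producer has signalled end of stream */
} Fifo;

void fifo_init(Fifo *f) { f->head = 0; f->count = 0; f->closed = false; }

/* Returns 0 on success, -1 when the FIFO is full. */
int fifo_write_single(Fifo *f, const int *value)
{
    if (f->count == FIFO_CAPACITY) return -1;
    f->data[(f->head + f->count) % FIFO_CAPACITY] = *value;
    f->count++;
    return 0;
}

/* Returns 0 on success, -1 when the FIFO is empty. */
int fifo_read_single(Fifo *f, int *value)
{
    if (f->count == 0) return -1;
    *value = f->data[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return 0;
}

void fifo_close(Fifo *f) { f->closed = true; }

/* End of stream: the producer has closed and every token was consumed. */
bool fifo_eos(const Fifo *f) { return f->closed && f->count == 0; }

/* One pass of a task body shaped like the generated thread:
   read up to depth tokens, compute, write the results.
   Returns the number of tokens written. */
size_t run_task_once(Fifo *in, Fifo *out, size_t depth)
{
    int buf[FIFO_CAPACITY];
    size_t n = 0;
    int sample;
    while (n < depth && n < FIFO_CAPACITY &&
           fifo_read_single(in, &sample) == 0) {
        buf[n++] = sample;
    }
    size_t written = 0;
    for (size_t i = 0; i < n; ++i) {
        int result = buf[i] * 2;   /* placeholder for the task computation */
        if (fifo_write_single(out, &result) != 0) break;
        ++written;
    }
    return written;
}
```

Calling `run_task_once` in a loop until `fifo_eos` reports true on the input mirrors the inner `do ... while` loop of the generated code.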
10
DARES Approach: Design Methodology and Toolchain
Hardware Coprocessor Generation and Synthesis
- The HW/SW Codesign Interpreter generates an ImpulseC function for all HW task configurations
  - Utilizes the co_stream interface for FIFO input/output
- ImpulseC CoDeveloper is used to synthesize VHDL implementations
  - Provides rich support for optimizing loops and analyzing the pipelined loops
[Figure: toolchain step: the software model for HW tasks passes through ImpulseC CoDeveloper to produce hardware tasks]
[Figure: the Codesign Interpreter transforms the original task configuration code into the ImpulseC implementation]

    // Original task configuration code
    void FuncName() {
        #pragma DARES_DECL_PART
        int data[64];
        ...
        #pragma DARES_COMP_BEGIN
        #pragma DARES_READ_INTO(data)
        // Computation
        #pragma DARES_WRITE_FROM(data,64)
        #pragma DARES_COMP_END
    }

    // ImpulseC implementation
    void FuncName(co_stream fifo1, co_stream fifo2) {
        ...
        INT8 DARES_SAMPLE_INPUT;
        int DARES_loop_iter;
        do {
            ...
            do {
                // Read DEPTH input tokens from the input stream
                for (DARES_loop_iter = 0; DARES_loop_iter < DEPTH; ++DARES_loop_iter) {
                    if (co_stream_read(_INFIFO_, &DARES_SAMPLE_INPUT, sizeof(WIDTH1)) == co_err_none)
                        DARES_INPUT[DARES_loop_iter] = DARES_SAMPLE_INPUT;
                }
                ...
                // Write TOKENS output tokens to the output stream
                for (DARES_loop_iter = 0; DARES_loop_iter < TOKENS; ++DARES_loop_iter) {
                    co_stream_write(_OUTFIFO_, &DARES_OUTPUT[DARES_loop_iter], sizeof(WIDTH2));
                }
            } while (!co_stream_eos(_INFIFO_));
            ...
        } while (1);
    }
11
DARES Approach: Design Methodology and Toolchain
Hardware/Software Communication Framework
- Hardware coprocessors are integrated with the hardware/software communication framework
- Supports seamless communication between software and hardware tasks in conjunction with the communication middleware
- Efficient communication mechanisms are supported between adjacent and non-adjacent hardware tasks
[Figure: toolchain step: hardware tasks and the HW/SW communication framework are combined in Xilinx ISE]
[Figure: the user IP (HW task) connects to the system bus (PLB) through a memory-mapped FIFO bus interface, with FIFO In (Bus2FIFO) and FIFO Out (FIFO2Bus). Signals: Fpout_wren, Fpout_wdata, Fpout_full, Fpin_wren, Fpin_wdata, Fpin_full.]
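From the software side, a memory-mapped FIFO interface like the one in the figure is typically driven by polling a status flag before each data write. The register layout below is purely hypothetical: the slide names hardware signals such as Fpin_full but does not give a register map, so `FifoRegs`, its field order, and `fifo_regs_write` are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block of the memory-mapped FIFO bus interface. */
typedef struct {
    volatile uint32_t in_data;    /* write: push one word toward the HW task */
    volatile uint32_t in_full;    /* read: nonzero while the input FIFO is full */
    volatile uint32_t out_data;   /* read: pop one word from the HW task */
    volatile uint32_t out_empty;  /* read: nonzero while the output FIFO is empty */
} FifoRegs;

/* Push up to n words and stop early as soon as the input FIFO reports full.
   Returns the number of words actually written. */
size_t fifo_regs_write(FifoRegs *regs, const uint32_t *src, size_t n)
{
    size_t i = 0;
    while (i < n && regs->in_full == 0) {
        regs->in_data = src[i++];
    }
    return i;
}
```

On the target, `regs` would point at the bus address assigned to the interface; in a host-side test it can point at an ordinary struct, which is how the function can be exercised without hardware.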
12
DARES Approach: Design Methodology and Toolchain
Final Hardware/Software Implementation
- Software threads and hardware coprocessors are combined for the final system implementation
- Requires manual, although automatable, creation of system initialization code
- System configuration for the current data profile is not yet supported at runtime
[Figure: the software binary and the hardware task bitstreams target a µP, a reconfigurable FPGA holding the HW tasks, and the coprocessor bitstream memory]
13
DARES Approach: Case Study, JPEG (not 2000)
Experimental Setup
- Consider a JPEG image compression application
- Generated software and hardware implementations for JPEG encoding tasks using the DARES toolchain
  - Discrete cosine transform (dct), quantization (qnt), zig-zag ordering (zz), and run-length encoding (rle)
- Independently verified the software and hardware-accelerated implementations
- Evaluated system performance for all combinations of hardware coprocessors available within the system
  - Manually configured communication between SW and HW tasks to measure system performance of the HW-accelerated implementation
- Platform: Virtex-5 FX FPGA (ML507 board)
  - 400 MHz PowerPC processor with 100 MHz PLB bus
[Figure: µP and reconfigurable FPGA]
14
DARES Approach: Case Study, JPEG (not 2000)
Experimental Results
- Achieves a performance improvement of 1.8X for a single hardware task and up to 5X with all tasks executing in hardware
  - Compared to software-only implementations
- Further demonstrates the need to consider the communication method (e.g. DMA) and latency when determining the Pareto-optimal system configuration
15
Conclusions and Future Work
Conclusions
- Presented a proof of concept of the DARES methodology and toolchain
  - Demonstrated an automated toolchain for generating software and hardware tasks from an application model
- Highlighted the potential performance advantages for a simplified JPEG case study
  - Up to 5X performance improvement compared to a software-only implementation
Future Work
- Develop fully automated tools for the DARES approach, including:
  - Synthesis-in-the-loop hardware coprocessor performance/area/power estimation
  - Hardware/software codesign methods for determining the optimal system implementation subject to area (physical and memory) constraints
  - Methods for generating partial bitstreams supporting dynamic reconfiguration
- Develop a runtime manager for reconfiguring the HW/SW implementation at runtime given the input data profile
- Apply the DARES approach to relevant highly configurable, compute-intensive applications, e.g. JPEG2000
16
Questions? Thank you!