Published by Candace Hodges, modified over 9 years ago
1
Modeling and Codesign Methods for Data Adaptable Reconfigurable Embedded Systems
Roman Lysecky
Department of Electrical and Computer Engineering, University of Arizona
rlysecky@ece.arizona.edu
Collaborators: Jonathan Sprinkle, Jerzy Rozenblit, Michael Marcellin
Students: Andrew Milakovich, Vijay Shankar Gopinath, Sachidanand Mahadevan, Sean Whitsitt, Nathan Sandoval, Casey Mackin, Kyle Merry
This work was supported in part by the National Science Foundation under Grant CNS-0915010.
2
Introduction & Motivation – Data Adaptable Approach
Increasingly complex applications demand complex algorithms that are:
- Compute intensive
- Highly configurable
Example: JPEG2000 image compression
- Provides significant advantages in quality and compression over the JPEG standard
- Supports configurability at each processing stage (e.g., color transform, wavelet, block encoding, code stream)
- Results in high computational demands and a larger design space
3
Introduction & Motivation – Traditional SW & HW Solutions
[Figure: Software only (single/multicore µPs); Hardware accelerated (µP with I$/D$ and a dedicated JPEG2000 co-processor IP); Reconfigurable (µP with I$/D$, an FPGA coprocessor, and bitstream memory, on an FPGA supporting dynamic reconfiguration)]
Goals | Software Only | Hardware Accelerated | Reconfigurable
Configurability/Flexibility | Yes | No | Yes
Performance | No | Yes |
4
Introduction & Motivation – Data-Adaptable Reconfigurable Embedded Systems (DARES)
- Reconfigurable systems for highly configurable, compute-intensive applications
- Can be reconfigured at runtime to meet immediate application needs
- How and when to reconfigure is specific to the application and its data input
Goal: reconfigure hardware tasks within the FPGA based upon the current data profile
[Figure: A µP and reconfigurable FPGA process an input stream with Task A (512x512), Task B (5/3), and Task C (Causal). A new input stream arrives with a new data profile (14 bits/channel; Task A 1024x1024, 4:4:2; Task B wavelet 5/3; Task C error resilient), and Task A (1024x1024) is loaded from the library of HW task implementations (Tasks A-D).]
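A minimal sketch of the runtime decision described above, assuming a hypothetical table of pre-synthesized HW task implementations keyed by task name and a single data-profile parameter (block size). The `hw_impl` type, `select_impl`, and the bitstream names are illustrative, not DARES APIs: the function returns a matching hardware implementation, or NULL to signal that the task should fall back to software.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor of one pre-synthesized HW task implementation. */
typedef struct {
    const char *task;      /* application task, e.g. "A" */
    int block_size;        /* data-profile parameter it supports, e.g. 1024 */
    const char *bitstream; /* partial bitstream to load for this implementation */
} hw_impl;

/* Returns the HW implementation of 'task' that supports 'block_size',
   or NULL when no match exists and the task should run in software. */
const hw_impl *select_impl(const hw_impl *impls, size_t n,
                           const char *task, int block_size)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(impls[i].task, task) == 0 && impls[i].block_size == block_size)
            return &impls[i];
    return NULL;
}
```

For the new data profile in the figure, a lookup for task "A" with block size 1024 would return the entry whose bitstream gets loaded; a NULL result means the software version of the task handles that profile.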
5
Introduction & Motivation – Data-Adaptable Reconfigurable Embedded Systems (DARES)
DARES components and methodology:
- Model-driven framework for specifying application tasks, processing requirements, data configurability, and target data profiles for hardware support
- Runtime middleware and communication framework for runtime communication, system reconfiguration, and process scheduling
- Automated tool flow supporting the proposed methodology
[Figure: µP and reconfigurable FPGA loading Task A (1024x1024) from the HW task implementations when a new input stream arrives]
6
DARES Approach – Design Methodology and Toolchain – Overview
1. Modeling Framework
2. SW Task Generation/Compilation
3. HW Coprocessor Generation/Synthesis
4. HW/SW Communication Framework
5. Final Software/Hardware Implementation
[Figure: Toolchain flow. The Application and Data Profile Model (1) feeds the HW/SW Codesign Model Interpreter, which produces software threads, communication middleware, and initialization code compiled with gcc into the software binary (2); software models of HW tasks synthesized into hardware tasks with ImpulseC CoDeveloper (3); and hardware tasks integrated with the HW/SW communication framework and built with Xilinx ISE into hardware task bitstreams (4). These combine into the final implementation (5).]
7
DARES Approach – Design Methodology and Toolchain
DARES Modeling Framework
- Modeling language to express an application as a composition of Communicating Sequential Dataflow Tasks (CSDT)
- Captures application- and task-level data profiles
- Allows designers to specify the configurations of tasks for the target data profiles
- Performs design space exploration to determine the Pareto optimal system implementation
- Generates source code for SW and HW task configurations
[Figure: The modeling language (types, semantics, constraints) is used to build the Application and Data Profile Model, e.g. the JPEG application tasks and dataflow model and the task configurations of the Row DCT task, which feeds the HW/SW Codesign Model Interpreter]
8
DARES Approach – Design Methodology and Toolchain
DARES Modeling Language, developed using the Generic Modeling Environment (GME)
Types:
- Task: models a functional unit of the application
- Config: models the configurability of an application task
- TaskInstance: models an instance of an application task
Constraints:
- Simple constraints, e.g. unique identifiers
- Legal dataflow specifications
- 1-1 correspondence between the IN and OUT ports of a Config and its parent Task
Semantics (i.e., the Model Interpreter):
- Driven by the hardware/software codesign methodology
[Figure: modeling language (types, semantics, constraints) and the HW/SW Codesign Model Interpreter operating on the Application and Data Profile Model]
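As an illustration of two of the constraints listed above (unique identifiers, and the 1-1 correspondence between a Config's IN/OUT ports and those of its parent Task), here is a sketch of equivalent checks in C. The real checks live in the GME-based modeling language; the structs and function names below are hypothetical stand-ins for the model elements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 8

/* Hypothetical mirrors of the Task and Config modeling types. */
typedef struct {
    const char *in[MAX_PORTS];
    size_t n_in;
    const char *out[MAX_PORTS];
    size_t n_out;
} port_set;

typedef struct { const char *id; port_set ports; } task_model;
typedef struct { const char *id; const task_model *parent; port_set ports; } config_model;

/* True if each name in a occurs exactly once in b. */
static bool each_once_in(const char *const *a, size_t na,
                         const char *const *b, size_t nb)
{
    for (size_t i = 0; i < na; ++i) {
        size_t hits = 0;
        for (size_t j = 0; j < nb; ++j)
            if (strcmp(a[i], b[j]) == 0)
                ++hits;
        if (hits != 1)
            return false;
    }
    return true;
}

/* 1-1 correspondence: same size, and each side's names occur once in the other. */
static bool ports_match(const char *const *a, size_t na,
                        const char *const *b, size_t nb)
{
    return na == nb && each_once_in(a, na, b, nb) && each_once_in(b, nb, a, na);
}

/* Constraint: a Config's IN and OUT ports correspond 1-1 with its parent Task's. */
bool config_ports_valid(const config_model *c)
{
    const port_set *p = &c->ports, *t = &c->parent->ports;
    return ports_match(p->in, p->n_in, t->in, t->n_in) &&
           ports_match(p->out, p->n_out, t->out, t->n_out);
}

/* Constraint: no identifier appears twice among n model elements. */
bool ids_unique(const char *const *ids, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            if (strcmp(ids[i], ids[j]) == 0)
                return false;
    return true;
}
```

Checking the correspondence in both directions (plus equal sizes) rules out duplicated port names, which a one-directional check would miss.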
9
DARES Approach – Design Methodology and Toolchain
[Figure: HW/SW Codesign (Model Interpreter) operating on the Application and Data Profile Model]
10
DARES Approach – Design Methodology and Toolchain
DARES HW/SW Codesign Methodology (Model Interpreter stages: Design Space Pruning, Latency Estimation, Optimization, Off-chip Memory Allocation, Source Code Generation)
Design Space Pruning
- Finds all compatible combinations of specific task configurations, subject to the area constraint of the FPGA
Latency Estimation
- Estimates the end-to-end latency of every possible application configuration
- The estimate considers:
  - Task configuration latency
  - Communication overhead
  - Required input/output data within all tasks
  - Mode of operation of specific task configurations
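A minimal sketch of the pruning and estimation steps, under simplifying assumptions that are not from the slides: a fixed three-task linear pipeline, one configuration chosen per task, area that adds up across configurations, and an additive latency model (each configuration's latency plus a per-edge communication overhead). The actual estimator also accounts for input/output data volumes and each configuration's mode of operation, which this sketch omits. All type and function names are hypothetical.

```c
#include <stddef.h>

#define N_TASKS 3 /* hypothetical linear pipeline of three tasks */
#define MAX_CFG 4

/* Hypothetical cost figures for one task configuration. */
typedef struct {
    int area;       /* FPGA area units (0 for a software configuration) */
    double latency; /* estimated latency of the task configuration */
    double comm;    /* estimated overhead to pass its output to the next task */
} task_cfg;

typedef struct {
    task_cfg cfg[MAX_CFG];
    size_t n_cfg;   /* must be >= 1 */
} task_opts;

typedef struct {
    size_t choice[N_TASKS]; /* selected configuration index per task */
    int area;
    double latency;
} app_cfg;

/* Additive end-to-end estimate: task latencies plus inter-task communication. */
static double estimate_latency(const task_opts *t, const size_t *idx)
{
    double total = 0.0;
    for (size_t i = 0; i < N_TASKS; ++i) {
        total += t[i].cfg[idx[i]].latency;
        if (i + 1 < N_TASKS) /* the last task has no downstream transfer */
            total += t[i].cfg[idx[i]].comm;
    }
    return total;
}

/* Enumerates every combination of task configurations and keeps those whose
   total area fits in area_limit. Returns the number written to out. */
size_t prune_design_space(const task_opts *t, int area_limit,
                          app_cfg *out, size_t max_out)
{
    size_t idx[N_TASKS] = {0};
    size_t n = 0;
    for (;;) {
        int area = 0;
        for (size_t i = 0; i < N_TASKS; ++i)
            area += t[i].cfg[idx[i]].area;
        if (area <= area_limit && n < max_out) {
            for (size_t i = 0; i < N_TASKS; ++i)
                out[n].choice[i] = idx[i];
            out[n].area = area;
            out[n].latency = estimate_latency(t, idx);
            ++n;
        }
        /* Advance a mixed-radix counter; stop after the last combination. */
        size_t i = 0;
        while (i < N_TASKS && ++idx[i] == t[i].n_cfg)
            idx[i++] = 0;
        if (i == N_TASKS)
            return n;
    }
}
```

For example, with three tasks that each offer a software configuration (area 0) and a hardware configuration (area 50), an area limit of 100 keeps 7 of the 8 combinations and prunes only the all-hardware one.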
11
DARES Approach – Design Methodology and Toolchain
DARES HW/SW Codesign Methodology
Optimization (Design Space Pruning)
- Finds the Pareto optimal combinations of task configurations
- Defines all application configurations that yield the best area/latency tradeoffs
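Given (area, latency) estimates for the feasible application configurations, the Pareto optimal set consists of the points that no other point dominates. A straightforward O(n²) sketch, assuming both objectives are minimized; `design_point`, `dominates`, and `pareto_front` are hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int area;       /* FPGA area of an application configuration */
    double latency; /* estimated end-to-end latency */
} design_point;

/* a dominates b if it is no worse in both objectives and better in at least one. */
static bool dominates(design_point a, design_point b)
{
    return a.area <= b.area && a.latency <= b.latency &&
           (a.area < b.area || a.latency < b.latency);
}

/* Copies the non-dominated points of in[] into out[]; returns how many. */
size_t pareto_front(const design_point *in, size_t n, design_point *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        bool dominated = false;
        for (size_t j = 0; j < n && !dominated; ++j)
            dominated = dominates(in[j], in[i]);
        if (!dominated)
            out[m++] = in[i];
    }
    return m;
}
```

A configuration with the same latency as another but more area (or the same area but higher latency) is dropped, so every remaining point represents a distinct area/latency tradeoff.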
12
DARES Approach – Design Methodology and Toolchain
DARES HW/SW Codesign Methodology
Off-chip Memory Allocation
- The designer can specify a set of application profiles that must be supported
- The designer can additionally choose from the Pareto optimal configurations
- If off-chip configuration memory is still available, the interpreter selects additional task configurations to support, increasing runtime adaptability
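The slides do not say how the additional task configurations are chosen. One simple policy, shown here purely as an assumption, is greedy: reserve space for the bitstreams required by the designer-specified profiles, then add optional bitstreams in a given priority order while they still fit in the off-chip configuration memory. The `bitstream` record and `allocate_config_memory` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one HW task configuration bitstream. */
typedef struct {
    const char *name;
    size_t bytes;   /* size of the partial bitstream */
    bool required;  /* needed by a designer-specified application profile */
    bool selected;  /* output: placed in off-chip configuration memory */
} bitstream;

/* Greedy policy (an assumption): place every required bitstream, then add
   optional ones in array order (treated as priority order) while they fit.
   Returns false if the required bitstreams alone exceed capacity, in which
   case the 'selected' flags should be ignored. */
bool allocate_config_memory(bitstream *b, size_t n, size_t capacity)
{
    size_t used = 0;
    for (size_t i = 0; i < n; ++i) {
        b[i].selected = b[i].required;
        if (b[i].required)
            used += b[i].bytes;
    }
    if (used > capacity)
        return false;
    for (size_t i = 0; i < n; ++i) {
        if (!b[i].required && used + b[i].bytes <= capacity) {
            b[i].selected = true;
            used += b[i].bytes;
        }
    }
    return true;
}
```

Greedy filling is not optimal in general (it is a knapsack-style problem), but it keeps every designer-required profile and then uses leftover memory to widen runtime adaptability.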
13
DARES Approach – Design Methodology and Toolchain
DARES HW/SW Codesign Methodology
Code Synthesis
- Generates source files for all SW task configurations and the selected HW task configurations
- Transforms the input C code into a Pthread implementation that uses the Communication Middleware APIs to access input and output buffers identified by unique IDs
14
DARES Approach – Design Methodology and Toolchain
Software Task Generation and Compilation
[Figure: software threads, communication middleware, and initialization code compiled with the software compiler (gcc)]
- The HW/SW Codesign Interpreter transforms the C code for application task configurations
- Generates a Pthread implementation for all SW task configurations
- Communication Middleware APIs provide methods to access the input and output buffers of each task

// Original task configuration code
void FuncName() {
  #pragma DARES_DECL_PART
  int data[64];
  ...
  #pragma DARES_COMP_BEGIN
  #pragma DARES_READ_INTO(data)
  // Computation
  #pragma DARES_WRITE_FROM(data,64)
  #pragma DARES_COMP_END
}

// Pthread implementation generated by the Codesign Interpreter
void* FuncName(void* arg) {
  ...
  INTx DARES_SAMPLE_INPUT;
  int DARES_loop_iter;
  do {
    ...
    do {
      // Attempt DEPTH single-token reads from the buffer identified by ID1
      for (DARES_loop_iter = 0; DARES_loop_iter < DEPTH; ++DARES_loop_iter) {
        if (Fifo_Read_Single(ID1, &DARES_SAMPLE_INPUT) == 0)
          DARES_INPUT[DARES_loop_iter] = DARES_SAMPLE_INPUT;
      }
      …
      // Write TOKENS output tokens to the buffer identified by ID2
      for (DARES_loop_iter = 0; DARES_loop_iter < TOKENS; ++DARES_loop_iter) {
        Fifo_Write_Single(ID2, &DARES_OUTPUT[DARES_loop_iter]);
      }
    } while (!Fifo_Eos(ID1));
    ...
  } while (1);
}
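The generated thread above calls DARES middleware functions (Fifo_Read_Single, Fifo_Write_Single, Fifo_Eos) whose implementation the slides do not show. The following is a minimal, self-contained sketch of the same read/compute/write firing pattern, assuming a mutex-and-condition-variable FIFO with an end-of-stream flag. `fifo_t`, `fifo_read`, `fifo_write`, `fifo_close`, `scale_task`, and `run_task` are hypothetical names, and doubling each sample is a placeholder computation.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 16 /* hypothetical FIFO capacity */
#define DEPTH 4     /* tokens consumed per firing (hypothetical) */
#define TOKENS 4    /* tokens produced per firing; TOKENS <= DEPTH here */

/* Bounded, blocking FIFO standing in for a middleware buffer. */
typedef struct {
    int buf[FIFO_CAP];
    size_t head, count;
    bool eos; /* set once the producer has finished */
    pthread_mutex_t lock;
    pthread_cond_t changed;
} fifo_t;

static void fifo_init(fifo_t *f)
{
    f->head = f->count = 0;
    f->eos = false;
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->changed, NULL);
}

static void fifo_write(fifo_t *f, int v)
{
    pthread_mutex_lock(&f->lock);
    while (f->count == FIFO_CAP) /* block while full */
        pthread_cond_wait(&f->changed, &f->lock);
    f->buf[(f->head + f->count++) % FIFO_CAP] = v;
    pthread_cond_broadcast(&f->changed);
    pthread_mutex_unlock(&f->lock);
}

/* Blocks while empty; returns false once the FIFO is drained and at EOS. */
static bool fifo_read(fifo_t *f, int *v)
{
    pthread_mutex_lock(&f->lock);
    while (f->count == 0 && !f->eos)
        pthread_cond_wait(&f->changed, &f->lock);
    bool ok = f->count > 0;
    if (ok) {
        *v = f->buf[f->head];
        f->head = (f->head + 1) % FIFO_CAP;
        f->count--;
        pthread_cond_broadcast(&f->changed);
    }
    pthread_mutex_unlock(&f->lock);
    return ok;
}

static void fifo_close(fifo_t *f) /* mark end of stream, wake readers */
{
    pthread_mutex_lock(&f->lock);
    f->eos = true;
    pthread_cond_broadcast(&f->changed);
    pthread_mutex_unlock(&f->lock);
}

typedef struct { fifo_t *in, *out; } task_args;
/* Generated-thread shape: read DEPTH tokens, compute, write TOKENS, until EOS. */
static void *scale_task(void *arg)
{
    task_args *a = arg;
    int input[DEPTH];
    for (;;) {
        size_t got = 0;
        while (got < DEPTH && fifo_read(a->in, &input[got]))
            ++got;
        if (got < DEPTH)
            break; /* EOS: an incomplete firing is discarded */
        for (size_t i = 0; i < TOKENS; ++i)
            fifo_write(a->out, 2 * input[i]); /* placeholder computation */
    }
    fifo_close(a->out);
    return NULL;
}

/* Runs one task thread over n samples and collects its outputs. All inputs
   are written before outputs are drained, so keep n <= 2 * FIFO_CAP. */
size_t run_task(const int *samples, size_t n, int *results)
{
    fifo_t in, out;
    fifo_init(&in);
    fifo_init(&out);
    task_args args = { &in, &out };
    pthread_t th;
    if (pthread_create(&th, NULL, scale_task, &args) != 0) return 0;
    for (size_t i = 0; i < n; ++i)
        fifo_write(&in, samples[i]);
    fifo_close(&in);
    size_t m = 0;
    int v;
    while (fifo_read(&out, &v))
        results[m++] = v;
    pthread_join(th, NULL);
    return m;
}
```

Feeding the eight samples 1..8 through `run_task` yields eight outputs 2, 4, ..., 16; feeding six yields four, because this sketch drops a trailing partial firing at end of stream. Whether DARES treats partial firings the same way is not stated on the slide.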
15
DARES Approach – Design Methodology and Toolchain
Hardware Coprocessor Generation and Synthesis
[Figure: software model for HW tasks synthesized into hardware tasks with ImpulseC CoDeveloper]
- The HW/SW Codesign Interpreter generates an ImpulseC function for every HW task configuration
- Uses the co_stream interface for FIFO input/output
- ImpulseC CoDeveloper synthesizes the VHDL implementations and provides rich support for optimizing and analyzing pipelined loops

// Original task configuration code
void FuncName() {
  #pragma DARES_DECL_PART
  int data[64];
  ...
  #pragma DARES_COMP_BEGIN
  #pragma DARES_READ_INTO(data)
  // Computation
  #pragma DARES_WRITE_FROM(data,64)
  #pragma DARES_COMP_END
}

// ImpulseC implementation generated by the Codesign Interpreter
// (fifo1: input stream, fifo2: output stream)
void FuncName(co_stream fifo1, co_stream fifo2) {
  ...
  INT8 DARES_SAMPLE_INPUT;
  int DARES_loop_iter;
  do {
    ...
    do {
      for (DARES_loop_iter = 0; DARES_loop_iter < DEPTH; ++DARES_loop_iter) {
        if (co_stream_read(fifo1, &DARES_SAMPLE_INPUT, sizeof(WIDTH1)) == co_err_none)
          DARES_INPUT[DARES_loop_iter] = DARES_SAMPLE_INPUT;
      }
      ...
      for (DARES_loop_iter = 0; DARES_loop_iter < TOKENS; ++DARES_loop_iter) {
        co_stream_write(fifo2, &DARES_OUTPUT[DARES_loop_iter], sizeof(WIDTH2));
      }
    } while (!co_stream_eos(fifo1));
    ...
  } while (1);
}
16
DARES Approach – Design Methodology and Toolchain
Hardware/Software Communication Framework
[Figure: hardware tasks integrated with the HW/SW communication framework and built with Xilinx ISE]
- Hardware coprocessors are integrated with a hardware/software communication framework
- Supports seamless communication between software and hardware tasks in conjunction with the communication middleware
- Supports efficient communication between both adjacent and non-adjacent hardware tasks
[Figure: User IP (HW Task) connected to the System Bus (PLB) through a memory-mapped FIFO bus interface, with Bus2FIFO feeding FIFO In and FIFO2Bus draining FIFO Out; FIFO signals Fpin_wren, Fpin_wdata, Fpin_full and Fpout_wren, Fpout_wdata, Fpout_full]
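To make the FIFO handshake concrete, here is a cycle-level C model of one FIFO in the bus interface, under stated assumptions: a write is accepted on a clock edge only when the write enable (as with Fpin_wren) is asserted and the full flag (as with Fpin_full) is deasserted. The FIFO depth and the read side (rden/rdata) are assumptions not shown on the slide, and `hw_fifo` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 4 /* hypothetical depth */

typedef struct {
    uint32_t mem[FIFO_DEPTH];
    unsigned rd, wr, count;
} hw_fifo;

/* Models the full flag (e.g. Fpin_full): high when no slot is free. */
bool fifo_full(const hw_fifo *f) { return f->count == FIFO_DEPTH; }

/* One rising clock edge on the write port. The word on wdata is latched only
   when wren is high and full is low; returns true if it was accepted. */
bool fifo_clock_write(hw_fifo *f, bool wren, uint32_t wdata)
{
    if (!wren || fifo_full(f))
        return false;
    f->mem[f->wr] = wdata;
    f->wr = (f->wr + 1) % FIFO_DEPTH;
    f->count++;
    return true;
}

/* One rising clock edge on the read port (assumed rden/rdata interface). */
bool fifo_clock_read(hw_fifo *f, bool rden, uint32_t *rdata)
{
    if (!rden || f->count == 0)
        return false;
    *rdata = f->mem[f->rd];
    f->rd = (f->rd + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

A producer such as Bus2FIFO would sample the full flag before asserting write enable; this model simply ignores writes while full and does not model simultaneous reads and writes in the same cycle.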
17
DARES Approach – Hardware/Software Communication Methods
- Software to Software (SW Buffer)
- Software to Software (HW FIFO)
[Figure: µP, memory, and tasks for each method, with the PLB FIFO bus interface (Bus2FIFO/FIFO In, FIFO2Bus/FIFO Out)]
18
DARES Approach – Hardware/Software Communication Methods
- Software to Hardware
- Hardware to Hardware (Adjacent)
[Figure: µP, memory, and tasks for each method, with the PLB FIFO bus interface (Bus2FIFO/FIFO In, FIFO2Bus/FIFO Out)]
19
DARES Approach – Hardware/Software Communication Methods
- Hardware to Hardware (Non-Adjacent)
- Hardware to Software
[Figure: µP, memory, and tasks for each method, with the PLB FIFO bus interface (Bus2FIFO/FIFO In, FIFO2Bus/FIFO Out)]
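One way to read the six communication methods on these slides is as a per-edge decision based on where the producer and consumer tasks run. The sketch below encodes that reading; the enum names, the adjacency flag, and the designer preference used to choose between the two software-to-software options are assumptions, since the slides do not say how DARES picks a method.

```c
#include <stdbool.h>

typedef enum { RUNS_IN_SW, RUNS_IN_HW } placement;

typedef enum {
    SW_TO_SW_BUFFER,       /* Software to Software (SW Buffer) */
    SW_TO_SW_HW_FIFO,      /* Software to Software (HW FIFO) */
    SW_TO_HW,              /* Software to Hardware */
    HW_TO_HW_ADJACENT,     /* Hardware to Hardware (Adjacent) */
    HW_TO_HW_NON_ADJACENT, /* Hardware to Hardware (Non-Adjacent) */
    HW_TO_SW               /* Hardware to Software */
} comm_method;

/* Chooses a method for one producer -> consumer dataflow edge.
   'adjacent' (HW tasks placed next to each other) and 'prefer_hw_fifo'
   are hypothetical inputs; the slides do not define the selection rule. */
comm_method select_comm(placement src, placement dst,
                        bool adjacent, bool prefer_hw_fifo)
{
    if (src == RUNS_IN_SW && dst == RUNS_IN_SW)
        return prefer_hw_fifo ? SW_TO_SW_HW_FIFO : SW_TO_SW_BUFFER;
    if (src == RUNS_IN_SW)
        return SW_TO_HW;
    if (dst == RUNS_IN_SW)
        return HW_TO_SW;
    return adjacent ? HW_TO_HW_ADJACENT : HW_TO_HW_NON_ADJACENT;
}
```

The adjacency flag only matters for hardware-to-hardware edges, matching the slides' distinction between adjacent and non-adjacent hardware tasks.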
20
DARES Approach – Design Methodology and Toolchain
Final Hardware/Software Implementation
- Software threads and hardware coprocessors are combined into the final system implementation
- Requires manual (although automatable) creation of the system initialization code
- Runtime reconfiguration of the system for the current data profile is not yet supported
[Figure: the software binary runs on the µP, and hardware task bitstreams stored in the coprocessor bitstream memory configure HW tasks on the reconfigurable FPGA]
21
DARES Approach – Case Study 1 – JPEG (not 2000)
Experimental Setup
- Considered a JPEG image compression application
- Generated software and hardware implementations of the JPEG encoding tasks using the DARES toolchain: discrete cosine transform (dct), quantization (qnt), zig-zag ordering (zz), and run-length encoding (rle)
- Independently verified the software and hardware-accelerated implementations
- Evaluated system performance for all combinations of hardware coprocessors available within the system
- Manually configured communication between SW and HW tasks to measure the performance of the hardware-accelerated implementations
- Platform: Virtex-5 FX FPGA (ML507 board), 400 MHz PowerPC processor with a 100 MHz PLB bus
[Figure: µP and reconfigurable FPGA]
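With four tasks, evaluating every combination of hardware coprocessors means 2^4 = 16 HW/SW partitions, including the software-only baseline. A minimal sketch of that enumeration as a bitmask; `TASKS`, `task_in_hw`, and `list_partitions` are illustrative names and not part of the DARES toolchain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* The four JPEG encoding tasks from the case study. */
static const char *const TASKS[] = { "dct", "qnt", "zz", "rle" };
#define N_TASKS (sizeof TASKS / sizeof TASKS[0])

/* Bit i of mask set means TASKS[i] runs as a HW coprocessor, otherwise in SW. */
bool task_in_hw(unsigned mask, size_t i)
{
    return (mask >> i) & 1u;
}

/* Prints every HW/SW partition, one per line, and returns how many there are. */
unsigned list_partitions(FILE *out)
{
    unsigned count = 0;
    for (unsigned mask = 0; mask < (1u << N_TASKS); ++mask) {
        for (size_t i = 0; i < N_TASKS; ++i)
            fprintf(out, "%s=%s ", TASKS[i], task_in_hw(mask, i) ? "HW" : "SW");
        fputc('\n', out);
        ++count;
    }
    return count;
}
```

Mask 0 is the software-only baseline and mask 15 places all four tasks in hardware; each printed line corresponds to one system configuration to measure.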
22
DARES Approach – Case Study 1 – JPEG (not 2000)
Experimental Results
- Achieves a performance improvement of 1.8X with a single hardware task and up to 5X with all tasks executing in hardware, compared to the software-only implementation
- Highlights the need to consider the communication method (e.g., DMA, bus hierarchy, NoCs) and its latency when determining the Pareto optimal system configuration
23
DARES Approach – Case Study 2 – JPEG2000 (the real deal…sort of)
Experimental Setup
- Considered JPEG2000 compression using the JasPer software implementation
- All stages can be configured differently for lossy or lossless compression
- The bulk of the execution time is spent in the Tier 1 Encoder (typically 50% or more)
[Figure: JPEG2000 encoder stages: Forward Multi-Component Transform → Forward Wavelet Transform → Quantization → Tier 1 Encoder (Block Encoder: Bit Plane Encoder and MQEncoder) → Tier 2 Encoder, plus Rate Control]
24
DARES Approach – Case Study 2 – JPEG2000 (the real deal…sort of)
Experimental Setup
- Used the DARES approach to create a data-profile-specific implementation of the MQEncoder
  - Data profile supporting a 32x32 block size in a HW task
  - All other block sizes supported in software
- Separated the JPEG2000 software into multiple threads, i.e., extracted the MQEncode process as a separate thread
- Adapted the software to fit the DARES dataflow model
- Used the modeling environment and toolchain to generate the SW and HW source
Results:
Image Format | Image Size (KB) | # of Blocks | MQEncoder % Exec. Time | Estimated Speedup | Actual Speedup
BMP | 37 | 48 | 61.4 | 2.6X | 2.5X
BMP | 257 | 256 | 71.5 | 3.5X | 3.3X
PGM | 147 | 167 | 72.8 | 3.7X | 3.5X
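The Estimated Speedup column is consistent with Amdahl's law in its limiting form: if the MQEncoder's share f of execution time (61.4%, 71.5%, and 72.8% for the three images) is taken off the critical path entirely, for example by overlapping the separate MQEncode thread with the rest of the encoder, the speedup is at most 1/(1 − f), giving about 2.6X, 3.5X, and 3.7X. This connection is inferred from the numbers; the slides do not say how the estimates were computed.

```c
/* Amdahl's law: overall speedup when a fraction f of the original
   execution time is accelerated by a factor s (0 <= f < 1, s > 0). */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Limit of amdahl_speedup as s grows without bound: the fraction f is
   removed from the critical path entirely. */
double amdahl_bound(double f)
{
    return 1.0 / (1.0 - f);
}
```

`amdahl_bound(0.614)`, `amdahl_bound(0.715)`, and `amdahl_bound(0.728)` evaluate to roughly 2.59, 3.51, and 3.68. The measured speedups (2.5X, 3.3X, 3.5X) sit slightly below these bounds, which is what `amdahl_speedup` predicts whenever the hardware path still costs some time (finite s) or adds communication overhead.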
25
Future Work
Future Directions/Open Research:
- Data-Informed/Static Scheduling
  - Develop static scheduling methods that are aware of the impact of data on the execution time of software/hardware tasks
  - Potential to produce a set of static schedules based upon data inputs/profiles
- Distributed Synchronization Methods
  - Need efficient synchronization methods, which can be based upon the data profile and data stream of the application
  - Typical OS synchronization methods are capable but not efficient
  - Need for distributed synchronization methods (hThreads)
- Synthesis-in-the-Loop
  - Utilize synthesis tools during the HW/SW codesign process to better estimate actual performance and area utilization (i.e., for design space exploration)
- Optimal Synthesis of the Communication Framework
  - The communication framework can be optimized for a specific application or a specific set of task configurations
  - Consider alternative bus hierarchies, NoC communication networks, transaction scheduling, DMA controls, etc.
- Efficient Runtime Partial Reconfiguration
  - Proof-of-concept demonstration of the approach with runtime partial reconfiguration
- Many challenges ahead (the future is bright, but the path is treacherous)
26
Questions? Thank you!