Paper Review: XiSystem - A Reconfigurable Processor and System

1 Paper Review: XiSystem - A Reconfigurable Processor and System
David Hermann. ENGG*6090: Reconfigurable Computing Systems, University of Guelph. This presentation reviews a pair of papers covering the design and use of a reconfigurable computing platform called XiSystem. The platform is a system-on-chip built around a RISC-based processor with two major reconfigurable components, both of which are discussed here.

2 Overview High Level Overview Classification Motivation Design Issues
Detailed Design: Reconfigurable Function Unit Reconfigurable I/O Module Hardware/Software Co-Design Summary and Conclusions Overview: 1. Present a high-level overview of the XiSystem platform, reviewing the overall system architecture. 2. Fit this system into the established categories for classifying reconfigurable systems. 3. Why I think it is worthwhile to study this system and others like it. 4. Review the design issues associated with this particular kind of system. 5. Detailed design of the two reconfigurable elements in this system – a reconfigurable functional unit for the RISC processor and a reconfigurable I/O module which is connected to a standard peripheral bus. 6. Overview of the hardware/software co-design methodology which was developed for this system. 7. Wrap up with some discussion of the key points and conclusions.

3 High Level Overview Developed by the ARCES lab at the University of Bologna and STMicroelectronics. XiRisc: a VLIW RISC processor with a reconfigurable functional unit, which extends the RISC processor's capabilities for specialized DSP-like tasks. XiSystem: adds reconfigurable logic to the XiRisc processor specifically for reconfigurable IO tasks; reconfigurable IO allows application-specific interface protocols to be implemented in hardware. The XiSystem and the XiRisc processor were developed by the ARCES lab at the University of Bologna in Italy in collaboration with STMicroelectronics. The main processor in this system is a Very Long Instruction Word (VLIW) RISC-based processor with a reconfigurable logic-based functional unit as part of its datapath. This reconfigurable unit is meant to extend the processor's specialized DSP capabilities. The second reconfigurable part of the system is a reconfigurable logic-based I/O module which extends the XiRisc core; it is designed specifically for implementing input/output interfaces and protocols in reconfigurable logic.

4 System Architecture Reconfigurable logic functional unit closely tied to the core general-purpose processor (GPP). Reconfigurable IO module connected to the GPP via the system bus. System architecture highlights: RISC core; AHB (system bus); APB (peripheral bus). Reconfigurable components (details to be covered later): reconfigurable logic embedded in the RISC processor as a functional unit (PiCoGA, Pipelined Configurable Gate Array); reconfigurable logic connected to the system bus for dedicated reconfigurable I/O (eFPGA, “embedded” FPGA).

5 Preliminary Questions
How do we classify this type of reconfigurable computing system? Why are we interested in this type of system? What are some general design issues to consider for this kind of architecture? These are some general questions we would like to answer in studying this system.

6 Classifications of Reconfigurable Computing Systems
Review of classifications: Coupling; Granularity; Heterogeneous vs. Homogeneous; Routing and Topology; Reconfiguration Methodology. Where does this system fit in these classifications? Coupling: how closely connected are the components, i.e. functional unit, coprocessor or peripheral level coupling? Granularity: how fine- or coarse-grained are the reconfigurable elements in the system? Heterogeneous vs. homogeneous: is the reconfigurable fabric homogeneous (all the same) or heterogeneous (a mix of components)? Routing and topology: how are the reconfigurable elements connected and laid out? Reconfiguration: how do we reconfigure the system? Can it be done only once, or at run-time?

7 System Classification
Coupling Very tight coupling at functional unit or “close” co-processor level Granularity Low-to-medium granularity, some specialized functional blocks Architecture, Routing and Topology Generally homogeneous logic cells Additional components and architecture make overall design heterogeneous Specialized “one-dimensional” routing and logic cell layout for functional unit Reconfiguration Methodology Well developed hardware/software co-design Some emphasis on run-time reconfiguration

8 Comparison to Other Systems
This type of system occupies one “corner” of a multi-dimensional design space for reconfigurable computing systems. Other “corners”: Soft-core in FPGAs Fixed GPPs in FPGAs Coarse-grained FPGAs All variations on architectural combinations between Reconfigurable Logic Fixed-function Logic General-purpose Processors Interconnection (i.e. coupling) Options Another way to classify this system is to compare it to other reconfigurable computing systems which we are familiar with…

9 Motivation Why be interested?
An architecture for applications where “small” amounts of reconfigurable logic can offer vast speed-ups An architecture for extending existing SoCs with reconfigurable logic Excellent candidates for complete hardware/software co-design flows Interesting contrast to recent FPGA technology developments

10 Design Issues Low-level design details
What kind of reconfigurable logic and supporting components What kind of routing/topology How do these issues relate to achieving a maximum performance from the reconfigurable logic? Integration of Functional Unit into Existing Core Instruction set changes Control unit changes Parallelism with existing datapath Memory access contention Communication with IO Co-processor Speed, bandwidth, overhead Hardware/Software Co-design How to integrate GPP programming (C-based) with reconfigurable functional unit? How to integrate interface/protocol design with reconfigurable IO module?

11 XiRisc: Reconfigurable Functional Unit
Extend a VLIW RISC-type processor with a reconfigurable functional unit 32-bit load/store architecture DSP-like functional units (multiply-accumulators) Pipelined Configurable Gate Array – PiCoGA Used to map complex, multi-cycle, pipelined data processing Features: Configurable datapath and pipelining Standard reconfigurable logic cells Asymmetric (directional) routing Run-time reconfiguration Detailed Design : PiCoGA Reconfigurable Functional Unit Key features: Datapath centric with heavy emphasis on pipelining in both logic design and usage Uses fairly standard (i.e. common) structure for RLCs Has some degree of directionality to the routing which supports the pipelining approach Has special emphasis on run-time reconfiguration which helps ensure the unit is re-usable in an application

12 XiRisc Processor Architecture
Existing datapath: 2 parallel ALUs plus other shared functional units Highlights of the RISC processor: Common 32-bit register file for each channel of the datapath Existing datapath is two parallel and symmetric data channels + some shared functional units PiCoGA – essentially a third datapath channel with its own connections to the register file PiCoGA: connected to same register file input/outputs as other function units

13 PiCoGA Structure & Routing
Inputs : 4 x 32-bit values Output: 2 x 32-bit values 192-bit configuration bus (related to run-time reconfiguration which will be discussed) RLC covered in more detail on next slide Routing can be seen – divided into horizontal and vertical connections which don’t need to be the same. Not a lot of detail provided on this feature, but vertical connections are meant to represent moving from one pipeline stage to another as opposed to horizontal connections which represent routing for one stage of the computation.
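The row-as-pipeline-stage idea described above can be sketched in software. This is only an illustrative model, assuming each row is a function followed by a register and that one new operand enters per cycle; the names `run_pipeline` and `Stage` are hypothetical and do not come from the papers.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A "row" is modelled as a pure function whose result is latched in a register.
Stage = Callable[[int], int]


def run_pipeline(stages: Sequence[Stage], inputs: Sequence[int]) -> Tuple[List[int], int]:
    """Feed one input per cycle through registered stages.

    Returns the results in order and the number of cycles simulated.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    regs: List[Optional[int]] = [None] * len(stages)  # output register of each row
    pending = list(inputs)
    results: List[int] = []
    cycles = 0
    # Keep clocking until every input is fed and the pipeline has drained.
    while pending or any(r is not None for r in regs):
        # The value held by the last row leaves the array this cycle.
        if regs[-1] is not None:
            results.append(regs[-1])
        # Update from the last row backwards so each value advances one row only.
        for i in range(len(stages) - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = stages[i](prev) if prev is not None else None
        regs[0] = stages[0](pending.pop(0)) if pending else None
        cycles += 1
    return results, cycles


# Three rows computing (x + 1) * 2 - 3, one row per operation.
rows = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, total_cycles = run_pipeline(rows, [1, 2, 3])
```

With three rows and three operands the model needs six cycles in total: the first result appears after the pipeline fills, and one further result follows on each subsequent cycle.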

14 PiCoGA Reconfigurable Logic Cell
RLC features: Some input control logic (invert/swap/etc) Two standard LUTs – 4 input, 2 output (i.e. 16-entry LUT with 2-bit outputs) Registered outputs Dedicated look-ahead carry logic
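To make the "16-entry LUT with 2-bit outputs" concrete, here is a minimal software sketch of one such table. The class name `Lut4x2` and the full-adder configuration are illustrative assumptions; the real cell has dedicated carry logic, so an adder would not actually be mapped this way.

```python
from typing import Sequence


class Lut4x2:
    """A 4-input, 2-output look-up table: 16 entries, each a 2-bit value."""

    def __init__(self, table: Sequence[int]) -> None:
        if len(table) != 16 or any(not 0 <= v <= 3 for v in table):
            raise ValueError("table must hold 16 values in the range 0..3")
        self.table = list(table)

    def eval(self, a: int, b: int, c: int, d: int) -> int:
        # The four input bits form the table address; a is the least significant bit.
        index = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3)
        return self.table[index]


def full_adder_table() -> list:
    """Configuration bits for a 1-bit full adder on inputs a, b, c (d is ignored).

    Output bit 0 is the sum and bit 1 is the carry, which together equal a + b + c.
    """
    table = []
    for index in range(16):
        a, b, c = index & 1, (index >> 1) & 1, (index >> 2) & 1
        table.append(a + b + c)
    return table


adder = Lut4x2(full_adder_table())
```

Changing the 16 table entries changes the function without touching the evaluation logic, which is the sense in which the cell is reconfigurable.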

15 Run-time Reconfiguration
Configuration cache holds four independent configurations for each logic cell Context switch in one cycle via special instruction Configuration/processing partitioning Different configurations can be loaded in different PiCoGA regions One computation and one reconfiguration can be executed simultaneously Second-level reconfiguration can occur through dedicated 192-bit bus in only 16 cycles Run-time reconfiguration is explicitly built into the system A dedicated configuration cache holds four configurations Loading a new configuration from this cache takes ONLY ONE CYCLE! Partial run-time reconfiguration possible by partitioning the PiCoGA Second-level reconfiguration possible as well – requires a second-level cache and takes 16 clock cycles
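The two reconfiguration levels can be compared with a small cost model. This is a sketch under stated assumptions: it uses the figures from the slide (four cached contexts, 1 cycle per context switch, 16 cycles per second-level load), charges loads sequentially even though the slide says a load can overlap with computation, and treats 16 cycles as a flat cost per load. The class and method names are hypothetical.

```python
from typing import List, Optional


class ConfigCacheModel:
    """Cycle-count model of a four-context configuration cache."""

    NUM_CONTEXTS = 4    # configurations held per logic cell (from the slide)
    SWITCH_CYCLES = 1   # context switch via special instruction (from the slide)
    LOAD_CYCLES = 16    # second-level load over the 192-bit bus (from the slide)

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = [None] * self.NUM_CONTEXTS
        self.active: Optional[int] = None
        self.cycles = 0

    def load(self, slot: int, name: str) -> None:
        """Bring a configuration from second-level storage into a cache slot."""
        if slot == self.active:
            # Assumption: the context currently in use is not overwritten.
            raise ValueError("cannot overwrite the active context")
        self.slots[slot] = name
        self.cycles += self.LOAD_CYCLES

    def switch(self, slot: int) -> None:
        """Make an already cached configuration the active one."""
        if self.slots[slot] is None:
            raise ValueError("slot is empty; load a configuration first")
        self.active = slot
        self.cycles += self.SWITCH_CYCLES


model = ConfigCacheModel()
model.load(0, "fir_filter")   # hypothetical kernel names
model.load(1, "crc")
model.switch(0)
model.switch(1)
model.switch(0)
```

In this model, alternating between two cached kernels costs one cycle per switch, whereas reloading on every change would cost sixteen, which is why keeping the working set of kernels in the cache matters.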

16 XiRisc: Sample Benchmarks
Enhance computational performance for DSP algorithms (encryption, coding, filtering, etc.). Substantially reduced power consumption: 15-20% of that of the architecture without PiCoGA; the best-case power reduction is as high as 92%, partially from reduced memory access. Some benchmarks: highest speedup in encryption and coding/decoding type tasks; some filtering tasks offer good speed-ups as well; oddly enough, CRC is not sped up very much. The variable number of PiCoGA rows required demonstrates the level of pipelining involved. Key point: power consumption reduction! The paper reports power consumption as low as 15-20% of the original architecture's (i.e. an 80-85% reduction), with a best case of a 92% reduction. Their analysis attributes this largely to reduced memory accesses, because intermediate data is stored in the PiCoGA and does not have to be constantly re-fetched from memory.

17 XiRisc: Design Advantages
Architecture provides computational speedup & power consumption improvements Functional unit parallelism maximizes usage of GPP and reconfigurable logic Efficient run-time reconfiguration improves re-usability of the reconfigurable logic

18 XiRisc: Design Drawbacks
Overall performance still heavily limited by memory access Size of reconfigurable logic is 50+% of the silicon area

19 XiSystem: XiRisc with Reconfigurable IO
Add reconfigurable logic to XiRisc processor specifically for reconfigurable IO tasks : XiSystem Reconfigurable IO connected via 32-bit system bus Used to implement application-specific protocols and interfaces Features Dedicated FIFO buffers Dedicated control, state and synchronization registers Reconfigurable logic fabric for implementing custom interfaces Directly connected to configurable IO pads Detailed Design : eFPGA Reconfigurable IO Module Useful for implementing application-specific protocols/interfaces Rarely need every interface in an embedded system With this, we only implement what we need! Key features: Dedicated buffers (not really part of reconfigurable logic) Dedicated registers around the reconfigurable logic to support control and synchronization Reconfigurable fabric is connected directly to the IO pads of the chip

20 Reconfigurable IO Architecture
Architecture highlights: Connected to AHB system bus eFPGA fabric is a “standard” reconfigurable logic fabric with typical RLCs (not covered in detail) Synchronization registers provide holding spot for data on two sides of the clocking domains (external clock and internal clock) Dedicated 1 KB FIFOs are the primary interface to and from the interface – controlled by GPP using dedicated control and state registers
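The FIFO-plus-status-register arrangement can be sketched from the processor's point of view. Only the 1 KB depth comes from the slide; the status bit positions, the class `ByteFifo` and the helper `drain` are hypothetical names chosen for illustration, not the actual XiSystem register map.

```python
from collections import deque
from typing import Optional

# Hypothetical status-register bit assignments (not taken from the papers).
STATUS_EMPTY = 0x1
STATUS_FULL = 0x2


class ByteFifo:
    """Bounded byte FIFO standing in for one dedicated 1 KB buffer."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._data = deque()

    def push(self, byte: int) -> bool:
        """Write one byte; returns False if the FIFO is full and the byte is dropped."""
        if len(self._data) >= self.capacity:
            return False
        self._data.append(byte & 0xFF)
        return True

    def pop(self) -> Optional[int]:
        """Read the oldest byte, or None if the FIFO is empty."""
        return self._data.popleft() if self._data else None

    def status(self) -> int:
        """Value the processor would read from the state register in this model."""
        flags = 0
        if not self._data:
            flags |= STATUS_EMPTY
        if len(self._data) >= self.capacity:
            flags |= STATUS_FULL
        return flags


def drain(fifo: ByteFifo) -> bytes:
    """Processor-side loop: poll the status flags and read until empty."""
    out = bytearray()
    while not fifo.status() & STATUS_EMPTY:
        out.append(fifo.pop())
    return bytes(out)
```

The interface logic fills the FIFO on the external clock side, while the processor polls the flags and drains it on the internal clock side; the synchronization registers mentioned above sit between those two domains and are not modelled here.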

21 XiSystem: Sample Results
Able to implement a variety of protocols and algorithms within the available reconfigurable logic: RS232, 39% logic utilization; I2C, 8% logic utilization; CRC, 32% logic utilization; Reed-Solomon coding, 20% logic utilization. Reconfigurable logic allows the “interface” to offload computation from the general-purpose processor: pre/post data formatting and processing; error detection and correction. Results: some interface and protocol level features can be implemented with varying levels of logic utilization. Key point: reconfigurable logic can offload some protocol and processing tasks such as formatting, error detection and error correction; see the Reed-Solomon coding example above.
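As an illustration of the kind of error-detection work that can move from the processor into the interface logic, here is a bit-serial CRC written the way a hardware shift register computes it. The slides do not say which CRC polynomial was used; the CRC-16/CCITT-FALSE parameters below (polynomial 0x1021, initial value 0xFFFF) are an assumption chosen only because the variant is widely documented.

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bit-serial CRC-16 (CCITT-FALSE parameters), most significant bit first."""
    crc = init
    for byte in data:
        crc ^= byte << 8               # bring the next byte into the top of the register
        for _ in range(8):             # one shift per data bit, as a shift register would
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def append_crc(payload: bytes) -> bytes:
    """Build a frame by appending the CRC, high byte first."""
    return payload + crc16_ccitt(payload).to_bytes(2, "big")


def frame_is_valid(frame: bytes) -> bool:
    """Running the CRC over payload plus checksum leaves a zero remainder."""
    return crc16_ccitt(frame) == 0


frame = append_crc(b"123456789")
```

In software the inner loop costs eight iterations per byte, while a shift register in reconfigurable logic handles one bit per clock alongside the data stream, so the processor never has to touch the checksum.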

22 Hardware/Software Co-Design
Complicated GPP and reconfigurable logic system Requires a good hardware/software co-design workflow Mixed design flow C-based flow for application processing Can be used by software engineers HDL-based flow for protocol/interface design Can be used by hardware engineers Co-design – important for this kind of highly integrated system, can make or break whether it is usable in the “real-world” This has been dealt with by the researchers: C-based co-design flow for software developers which automatically takes advantage of the PiCoGA functional unit HDL-based design flow exclusively for the hardware engineers using the eFPGA, but is still integrated into the co-design toolchain

23 XiSystem Application Design Flow
Application processing including PiCoGA mapping starts with specialized C-code Application flow points: Application starts with C-code Moves through loop of execution, simulation and profiling Profiling produces PiCoGA mapping for computational kernels PiCoGA mapping feeds into optimized versions of the C-code HDL code for eFPGA developed in parallel Synthesized eFPGA mapping is integrated into the design flow along with the PiCoGA mapping Reconfigurable IO mapping starts with customized HDL code

24 Summary & Conclusions Reviewed a complete SoC
Traditional GPP architecture. Additional reconfigurable logic: specialized data processing; specialized interfacing. The system offers a number of key advantages: improved performance and power consumption; flexibility for application-specific changes and variable interfacing needs; complete hardware/software co-design balanced between software and hardware design needs.

25 References
Compton, K. and Hauck, S., “Reconfigurable Computing: A Survey of Systems and Software”, ACM Computing Surveys, Vol. 34, No. 2, Jun. 2002.
Todman et al., “Reconfigurable Computing: Architectures and Design Methods”, IEE Proc. of Computers and Digital Techniques, Vol. 152, No. 2, Mar. 2005.
Lodi et al., “A VLIW Processor with Reconfigurable Instruction Set for Embedded Applications”, IEEE Journal of Solid-State Circuits, Vol. 38, No. 11, Nov. 2003.
Lodi et al., “XiSystem: A XiRisc-Based SoC with Reconfigurable IO Module”, IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, Jan. 2006.

