Published by Audra Ball. Modified over 9 years ago.
1
© FASTER Consortium Proprietary
Novel Design Methods and a Tool Flow for Unleashing Dynamic Reconfiguration
Kyprianos Papadimitriou, Christian Pilato, Dionisios Pnevmatikatos, Marco D. Santambrogio, Catalin Ciobanu, Tim Todman, Tobias Becker, Tom Davidson, Xinyu Niu, Georgi Gaydadjiev, Wayne Luk, Dirk Stroobandt
Foundation for Research and Technology-Hellas (FORTH)
IEEE/IFIP Int’l Conference on Embedded and Ubiquitous Computing (EUC), Cyprus, Dec 5-7, 2012
2
Reconfiguration
“The process of physically altering the location or functionality of network or system elements. Automatic configuration describes the way sophisticated networks can readjust themselves in the event of a link or device failing, enabling the network to continue operation”
Gerald Estrin, 1960
3
Reconfigurable Technology
Technology for adaptable hardware systems
  – Add/remove components at run-time/product lifetime
  – Flexibility at hardware speed (not quite ASIC)
  – Parallelism at hardware level (depending on application)
  – Ideally: alter function & interconnection of blocks, dynamically
Implementation in:
  – Field Programmable Gate Arrays (FPGAs): fine grain, complex gate + memory blocks + DSP blocks, etc.
  – Coarse-grain chips: multiple ALUs, multiple (simple) programmable processing blocks, etc.
4
Technological Status - Opportunity
Programming has become very difficult
  – Impossible to balance all constraints manually & effectively
More than ever before
  – Cores are free, reconfigurable computational horse-power logic available on chip, cores can be heterogeneous
Energy tends to be #1 in priority
  – Software must become energy and space-aware
FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) vision:
  – Optimize and meet changing requirements while taking advantage of the underlying complex architectures
5
Partners
6
Outline
Motivation
Scope
Design Methods
  1. High-level Analysis
  2. Partitioning Methodology
  3. Region-based Reconfiguration
  4. Micro-reconfiguration
  5. Baseline Scheduling and Mapping
  6. Verification
Tool Flow
Run-time System Manager
Experimental Systems
7
FASTER Motivation
Focus on fine-grain reconfiguration (but not limited to it)
Creating reconfigurable systems is not straightforward. The designer has to:
  – Identify portions to be reconfigured
  – Establish a schedule that (a) respects dependencies, (b) achieves good performance, (c) meets constraints
  – Manage system resources (mainly reconfiguration area)
  – Consider reconfiguration cost
  – Verify a changing system
Tool support for these tasks is still quite basic
8
FASTER Scope
Include reconfigurability as an explicit design concept when designing systems with reconfigurable resources
Effectively balance performance, power, and area
Propose new design methods for HW/SW systems; integrate them into a unified tool flow
Provide flexibility, while keeping complexity low
Efficient and transparent runtime support
9
1. High-level Analysis
Automatically identify and exploit run-time reconfiguration opportunities
  – While optimizing resource utilization
Based on
  – Data Flow Graph (DFG)
  – Application parameters, e.g. input data size
  – Physical design constraints, e.g. area, memory bandwidth
Output
  – Estimated values for consumed area, computation time, reconfiguration time, power consumption
  – Identification of partitions (determination of data dependencies, idle functions, etc.)
10
2. Partitioning Methodology
Employ methods for
  – Partitioning tasks between SW and HW
  – Identifying proper level of reconfiguration for HW tasks, i.e. none, region-based, micro-reconfiguration
  – Task graph transformation, e.g. clustering consecutive tasks assigned to the same processing element
Characteristics taken into account
  – Communication costs
  – Logic dedicated to cores
  – Physical design constraints, e.g. area, memory bandwidth
  – Power consumption
  – Computation time
  – Reconfiguration time
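The task graph transformation mentioned above (clustering consecutive tasks assigned to the same processing element) might look like the following minimal sketch. It assumes the task graph is a simple linear chain; the task names and the processing-element labels `SW` and `HW0` are hypothetical and do not come from the FASTER tools.

```python
from itertools import groupby

def cluster_consecutive(tasks):
    """Merge runs of consecutive tasks that share a processing element.

    tasks: list of (task_name, processing_element) in execution order.
    Returns a list of (processing_element, [task_names]) clusters.
    """
    clusters = []
    # groupby only merges *adjacent* items with the same key, which is
    # exactly the "consecutive tasks on the same PE" condition.
    for pe, group in groupby(tasks, key=lambda t: t[1]):
        clusters.append((pe, [name for name, _ in group]))
    return clusters

# Hypothetical linear task chain with an SW/HW assignment already chosen.
chain = [("read", "SW"), ("blur", "HW0"), ("sobel", "HW0"),
         ("threshold", "HW0"), ("write", "SW")]
clusters = cluster_consecutive(chain)
```

Clustering the three `HW0` tasks into one unit removes two SW/HW hand-offs from the chain, which is the kind of communication cost the partitioning step takes into account.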
11
3. Region-based Reconfiguration
Function(s) encapsulated into a specific region of the FPGA
  – Process carried out at design time by creating bitfiles for specific regions
A region can be reconfigured while the rest of the FPGA executes
  – “On the fly reconfiguration”
FASTER research challenge: relocation support
  – Loading function(s) into a different region than the one they were originally created for
12
4. Micro-reconfiguration
In some applications we can identify fast-changing inputs vs. slow-changing “parameters”
  – A parameter change triggers a small-scale reconfiguration to specialize a circuit dynamically
  – Results in a smaller and faster circuit vs. the original one
We want to
  – Identify the parameters (use of profiler vs. manually)
  – Create bitfile with “holes”
  – Parameter values => reconfiguration bits for the “holes”
  – Perform fine-grain changes; allows for fast reconfiguration
  – Extend the idea from logic (TLUT) to wires (TCON)
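The idea of turning parameter values into configuration bits can be illustrated on a single look-up table. The sketch below is a simplified illustration, not the FASTER tool's algorithm: it fixes the slow-changing parameter inputs of a generic truth table and derives the smaller truth table over the remaining fast-changing inputs. The function name, the input ordering, and the example circuit are assumptions.

```python
from itertools import product

def specialize_lut(truth_table, n_inputs, param_values):
    """Fix the leading inputs of an n-input LUT to the given parameter values.

    truth_table: list of 2**n_inputs output bits, indexed with the first
                 input as the most significant bit.
    param_values: tuple of 0/1 values for the leading (parameter) inputs.
    Returns the truth table over the remaining (fast-changing) inputs.
    """
    n_free = n_inputs - len(param_values)
    specialized = []
    for free_bits in product((0, 1), repeat=n_free):
        bits = tuple(param_values) + free_bits
        index = int("".join(map(str, bits)), 2)
        specialized.append(truth_table[index])
    return specialized

# Hypothetical generic circuit: out = (p AND a) XOR b, inputs ordered (p, a, b),
# where p is the slow-changing parameter.
generic = [(p & a) ^ b for p, a, b in product((0, 1), repeat=3)]

lut_p1 = specialize_lut(generic, 3, (1,))  # p = 1: behaves as a XOR b
lut_p0 = specialize_lut(generic, 3, (0,))  # p = 0: behaves as b
```

Each parameter value yields a different 2-input table, so only these few bits need to be rewritten when the parameter changes, while the 3-input generic circuit is never instantiated.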
13
4. Micro-reconfiguration (cont’d)
14
5. Baseline Scheduling and Mapping
Performed after generation of HW cores & interfaces
Cores characterized in terms of resources
  – To evaluate compatibility of the implementation of a reconfigurable region candidate for the mapping
  – To annotate the corresponding implementation associated with each task
Determination of reconfigurable regions
  – Number of regions; size; position; constraints
Provide initial assignment of the tasks tagged for region-based reconfiguration onto specific regions
Supports alternative mappings
Baseline scheduling, used to drive the runtime scheduler
15
6. Verification
Check if behaviour of optimized design (target) = unoptimized design (source)
Traditional approach: extensive simulation
  – Large test inputs; all cases covered?
FASTER approach
  – Combine symbolic simulation with equivalence checking
In some cases static approaches aren’t enough. Dynamic aspects of behaviour to be verified at
  – compile time (virtual multiplexers to model mutually exclusive reconfigurable regions),
  – runtime (light-weight support for small impact on performance)
16
6. Verification (cont’d)
[Figure: verification flow. The target design is obtained from the source through design-optimization transformations. Both designs are compiled to simulation and run by the symbolic simulator on the same symbolic input. The checker compares the output from the source with the output from the target and reports either "equivalent" or "not equivalent" with a counter-example.]
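The source-versus-target check with a counter-example can be sketched as follows. This is only an illustration of the checker's contract: exhaustive enumeration over a small bit width stands in for the symbolic simulation and equivalence checking that FASTER actually combines, and the example designs are hypothetical.

```python
from itertools import product

def check_equivalence(source, target, n_inputs, width=4):
    """Compare two designs on every input combination of the given bit width.

    Returns (True, None) if the outputs always match, otherwise
    (False, counter_example) with the first mismatching input tuple.
    """
    for args in product(range(1 << width), repeat=n_inputs):
        if source(*args) != target(*args):
            return False, args
    return True, None

# Hypothetical unoptimized design (source) with a 4-bit output.
def source(x, y):
    return (x * 2 + y * 2) & 0xF

# Hypothetical optimized design: one shift instead of two multiplications.
def target_ok(x, y):
    return ((x + y) << 1) & 0xF

# Hypothetical faulty optimization: the output was truncated to 3 bits.
def target_bad(x, y):
    return ((x + y) << 1) & 0x7

ok_result = check_equivalence(source, target_ok, n_inputs=2)
bad_result = check_equivalence(source, target_bad, n_inputs=2)
```

Enumeration grows exponentially with input width, which is the weakness of simulation-based checking that the slide raises ("all cases covered?") and that motivates a symbolic approach.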
17
Shaping the Tool Flow - 1
System described in XML format
4 independent XML parts are analyzed, generated and updated through iterations
Starting point is a C description of the application
  – Annotated with OpenMP pragmas
18
Shaping the Tool Flow - 2
High-level analysis
Estimation of metrics (power, speed, area)
App task profiling + identification of reconfigurable cores
Optimization of app for region-based & micro-reconfiguration
Compile-time baseline scheduling and core mapping into reconfigurable regions
[Figure labels: Platform Architecture; App Task Graph; Performance Characteristics]
19
Tool Flow - Putting it All Together
Incorporates all design methods
Design automation methodology to generate HW and SW components
Exploits dynamic reconfiguration for different target platforms
Runtime system support
Outcome quality is evaluated with regard to:
  – Amount of FPGA resources used
  – Clock frequency
  – Reconfiguration time
  – Power and energy consumption
20
Run-Time System Manager (RTSM)
RTSM in traditional systems
  – Low-level operations to relieve the programmer from dealing with delicate operations
  – Actions transparent to the programmer
  – Implemented as a standard library
RTSM for partial and dynamic reconfiguration
  – Extend the OS capabilities
  – Seamless, easy integration into the existing system
  – Handle efficient on-line scheduling and placement of tasks
Advanced mechanisms need to be supported
  – Scheduling
  – Defragmentation = f(relocation, scheduling)
  – Configuration management (caching, prefetching)
  – Thermal management
21
Configuration Content Agnostic ISA I/F
Based on the Molen model
FPGA viewed as co-processor, extends the GPP architecture
Arbiter between the memory and the GPP
Register File XREGs used to pass parameters between GPP and reconfigurable units
ISA needs to be expanded with more instructions
  – Minimal set: SET, EXECUTE, MOVTX and MOVFX
  – Additional instructions to support partial reconfiguration, prefetching, and GPP-FPGA parallel execution
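How the minimal instruction set cooperates with the XREG file can be sketched with a toy model. Everything below is an illustrative assumption rather than the Molen or FASTER implementation: the class name, the register names, and the use of a Python callable in place of configuration and execution microcode are hypothetical. Only the four instruction names and the role of the XREGs come from the slide.

```python
class MolenSketch:
    """Toy model of a GPP extended with a reconfigurable co-processor."""

    def __init__(self, n_xregs=8):
        self.gpr = {}                  # GPP registers, by name
        self.xreg = [0] * n_xregs      # exchange registers shared with the FPGA
        self.configured = None         # operation currently loaded on the FPGA

    def movtx(self, xreg_index, gpr_name):
        """MOVTX: copy a GPP register into an exchange register."""
        self.xreg[xreg_index] = self.gpr[gpr_name]

    def movfx(self, gpr_name, xreg_index):
        """MOVFX: copy an exchange register back into a GPP register."""
        self.gpr[gpr_name] = self.xreg[xreg_index]

    def set(self, operation):
        """SET: configure the reconfigurable unit for an operation."""
        self.configured = operation

    def execute(self, operation):
        """EXECUTE: run the configured operation on the exchange registers."""
        if self.configured is not operation:
            raise RuntimeError("operation is not configured; issue SET first")
        operation(self.xreg)

# Hypothetical hardware task: reads XREG0 and XREG1, writes the sum to XREG2.
def add_task(xreg):
    xreg[2] = xreg[0] + xreg[1]

m = MolenSketch()
m.gpr.update(r1=3, r2=4)
m.movtx(0, "r1")
m.movtx(1, "r2")
m.set(add_task)
m.execute(add_task)
m.movfx("r3", 2)
```

Because the GPP only names the operation and moves values through XREGs, the instruction set stays independent of what the configuration actually contains, which is the "content agnostic" property in the slide title.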
22
Actions at Design Time
Task Configuration Microcode
  – Stored at Bitstream (BS) Address
  – BS length holds the bitstream size
  – Task Parameter Address (TPA) points to the input/output parameters
  – Task width/height
  – Execution Time Per Unit of Data (ETPUD)
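The fields listed above could be carried in a record such as the following sketch. The field widths, their order, the little-endian packing, and the nanosecond unit for ETPUD are all assumptions made for illustration; the slide only names the fields.

```python
import struct
from dataclasses import dataclass

# Assumed layout: three 32-bit words, two 16-bit dimensions, one 32-bit ETPUD.
_LAYOUT = "<IIIHHI"

@dataclass
class TaskConfigMicrocode:
    bs_address: int   # where the bitstream is stored
    bs_length: int    # bitstream size in bytes
    tpa: int          # Task Parameter Address (input/output parameters)
    width: int        # task width (unit is an assumption, e.g. columns)
    height: int       # task height (unit is an assumption, e.g. rows)
    etpud_ns: int     # Execution Time Per Unit of Data, assumed in ns

    def pack(self) -> bytes:
        """Serialize the record using the assumed fixed layout."""
        return struct.pack(_LAYOUT, self.bs_address, self.bs_length,
                           self.tpa, self.width, self.height, self.etpud_ns)

    @classmethod
    def unpack(cls, raw: bytes) -> "TaskConfigMicrocode":
        """Rebuild a record from bytes produced by pack()."""
        return cls(*struct.unpack(_LAYOUT, raw))

# Hypothetical values for one hardware task.
entry = TaskConfigMicrocode(bs_address=0x1000_0000, bs_length=524_288,
                            tpa=0x2000_0000, width=4, height=20, etpud_ns=150)
raw = entry.pack()
restored = TaskConfigMicrocode.unpack(raw)
```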
23
Actions at Design Time (cont’d)
Micro-reconfiguration support
  – RT flag: reconfiguration type
  – N: the number of parameters of the parameterized configuration
  – N parameter width / XREG index pairs
  – A binary representation of the parameterized configuration data
24
RTSM Scheduler
Responsible for deciding
  – The time slot in which reconfiguration of a task module will occur
  – The portion of the FPGA in which a HW task will be placed
  – The time slot in which its execution will start
How
  – Input from a dependency/communication graph
  – Based on a list of criteria, e.g. reconfiguration time, area constraint, precedence between the modules, fragmentation level, power
  – Directions from baseline scheduling
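The three decisions listed above (when to reconfigure, where to place, when to start executing) can be illustrated with a small greedy sketch. This is not the RTSM algorithm: it is a hypothetical earliest-free-region heuristic that assumes a single reconfiguration port, tasks supplied in a valid dependency order, and made-up task and region data.

```python
def schedule(tasks, deps, regions):
    """Greedy placement and scheduling sketch.

    tasks:   {name: (area, reconf_time, exec_time)}, in a valid dependency order
    deps:    {name: [predecessor names]}
    regions: {region name: area}
    Returns {name: (region, reconf_start, exec_start, exec_end)}.
    """
    region_free = {r: 0 for r in regions}   # time at which each region is idle
    port_free = 0                           # one reconfiguration at a time
    plan = {}
    for name, (area, t_reconf, t_exec) in tasks.items():
        # Precedence: a task may start only after all predecessors finish.
        ready = max((plan[p][3] for p in deps.get(name, [])), default=0)
        # Area constraint: only regions large enough are candidates.
        candidates = [r for r, a in regions.items() if a >= area]
        if not candidates:
            raise ValueError(f"no region fits task {name}")
        region = min(candidates, key=lambda r: region_free[r])
        # Reconfiguration may begin before the task is ready (prefetching).
        reconf_start = max(region_free[region], port_free)
        exec_start = max(reconf_start + t_reconf, ready)
        exec_end = exec_start + t_exec
        port_free = reconf_start + t_reconf
        region_free[region] = exec_end
        plan[name] = (region, reconf_start, exec_start, exec_end)
    return plan

# Hypothetical input: C depends on A; B is independent but needs the big region.
regions = {"R0": 100, "R1": 60}
tasks = {"A": (50, 2, 5), "B": (80, 3, 4), "C": (40, 1, 2)}
deps = {"C": ["A"]}
plan = schedule(tasks, deps, regions)
```

A fuller scheduler would also weigh the other criteria named on the slide, such as fragmentation level and power, and would follow the directions produced by baseline scheduling.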
25
RTSM Scheduler Input Requirements
Static parameters, i.e. determined at compile time and not changed at runtime
  – Size of reconfigurable areas
  – Reconfiguration time = f(bitstream size, reconfiguration mechanism + path)
  – ETPUD (Execution Time Per Unit of Data)
  – Tasks assigned to be executed in fixed Processing Elements (PE), i.e. CPU, static HW, or certain reconfigurable areas
Dynamic parameters, i.e. updated at runtime
  – ABD
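Two of the static parameters lend themselves to a simple first-order estimate, sketched below. The linear model (time = size / throughput) and the throughput figures are illustrative assumptions, not measured FASTER values; the mechanism names are placeholders for whatever reconfiguration path a platform offers.

```python
# Assumed sustained throughput per reconfiguration mechanism, in bytes/second.
# These numbers are placeholders for illustration only.
THROUGHPUT_BYTES_PER_S = {
    "internal_port": 400e6,
    "external_cable": 1e6,
}

def reconfiguration_time(bitstream_bytes, mechanism):
    """First-order estimate: bitstream size divided by path throughput."""
    return bitstream_bytes / THROUGHPUT_BYTES_PER_S[mechanism]

def execution_time(etpud_s, data_units):
    """Execution time from ETPUD (seconds per unit of data) and data volume."""
    return etpud_s * data_units

# Hypothetical 2 MB partial bitstream and a task processing 10,000 data units.
t_fast = reconfiguration_time(2_000_000, "internal_port")
t_slow = reconfiguration_time(2_000_000, "external_cable")
t_exec = execution_time(150e-9, 10_000)
```

With estimates like these, a scheduler can compare reconfiguration overhead against the execution time of a task when deciding whether a hardware mapping pays off.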
26
Experimental System - Embedded
Fixed interface for communication of cores with a runtime manager
Scheduling policies implemented as libraries
Software cores taken into account during exploration
Edge Detection app running on a XUPV5 FPGA board
  – An embedded processor used as the execution manager
  – 2nd processor for execution of SW tasks and reconfiguration
GUI for the designer (minimizes errors in the XML file)
  – Add implementations
  – Task mapping
  – Selection of parameters of the architecture (e.g. memory addresses)
27
Experimental System - Desktop
XUPV5 FPGA board plugged into a PCIe x1 slot
  – CentOS; transactions performed through DMA (1.5 Gbps)
SW components include a user application and a kernel driver
  – User application issues an IOCTL call to send/receive data to/from the kernel driver
  – Driver is responsible for low-level data transfer
Reconfiguration performed through JTAG
  – Using vendor's USB programmer
Host SW awaits the user’s selection
  – Different bitstreams stored on the host HDD
  – Select precompiled circuit; configure FPGA; control communication between host and FPGA; deliver results back to the user; operation transparent to the user (runtime system)
28
Summary
Reconfiguration feature inserted at early stages of system design
New design methods combined to form a new tool flow
Abstract view of the system but efficient use
Target application domains: embedded, desktop, high performance computing
http://www.fp7-faster.eu/