Ottawa, January 9, 2014 FETCH FlexTiles: runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore Olivier Sentieys INRIA.


1 Ottawa, January 9, 2014 FETCH FlexTiles: runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore Olivier Sentieys INRIA and University of Rennes CAIRN project-team Jan. 2014

2 The Story
What is a heterogeneous multi-core?
Coupling FPGA fabrics to many-core in 2D
Then what does 3D bring?
Flexibility and better resource usage
FlexTiles architecture
Specific features for task migration
Virtual Bit-Stream
The case for heterogeneous fabrics

3 Domain-Specific System-on-chip (SoC) with Hardware Reconfiguration
Dynamically adapt the hardware to the application: energy efficiency, high performance, flexibility
Self-adapting devices: continuously adapt to changing environments
Other advantages: error, fault and variation tolerance; security against attacks
Complementary to general-purpose architectures, domain-specific SoCs are designed for one application domain (such as a wireless terminal or a set-top box). They classically include several processors (GPP, DSP) and many HW IP blocks so that energy efficiency and performance are maximized. Our aim is to propose new architectures and tools for these SoCs, with a particular emphasis on reconfigurable hardware. By adapting the hardware to the application, this flexible hardware exhibits a good trade-off between performance, energy and flexibility. Moreover, by taking advantage of dynamic reconfiguration, which means that the hardware can be reconfigured at run-time during execution, we can propose self-adapting devices that continuously adapt their structure to the environment and to the application running on them. Finally, other constraints can benefit from these reconfigurable SoCs: hardware reconfiguration can be used to mitigate errors, temporary faults or process variation, and can also increase protection against attacks when security matters.
Figure: SoC from CEA with DART reconfigurable architecture from IRISA/INRIA - CAIRN

4 Heterogeneous Multicores
Many cores on a single chip for both general-purpose and embedded computing
Heterogeneous manycores to cope with energy and performance constraints
Another strong trend in our domain is the possibility of integrating, in the near future, thousands of cores on a single chip. This is true for both general-purpose and embedded computing architectures. Cairn will of course continue to focus only on the second category. We foresee that these systems will be heterogeneous multicores, to cope with both energy and performance constraints. In our case a heterogeneous multicore architecture is a regular multicore in which each basic core is not only a processor plus memory, but also some HW accelerators. We will also continue to study the impact of reconfigurable (fine-grain or coarse-grain) accelerators in these multicores and their suitability for power management and for fault tolerance.
Figure labels: Core, Proc., Reconf. HW, M, HW IPs

5 Multicores Coupled with Reconfig. HW
2D SoC: tightly-coupled HW
Figure labels: Proc., Reconf. HW, M, HW IP

6 Multicores Coupled with Reconfig. HW
2D SoC: loosely-coupled HW
Figure labels: I/O, Configuration RAM, Configuration Controller, DSP, RAM, HW Accelerator #1, Core

7 Can 3D Stacking Help? 3D-Stacked Reconfigurable Accelerators
Improved performance
Improved flexibility
Improved resource usage
As just mentioned, we will particularly focus our research on how to embed efficient hardware accelerators with run-time reconfiguration into a multicore. To this end we will propose and design new configuration structures so that dynamic reconfiguration becomes much more efficient, and so that the reconfiguration layer can be virtualized and managed more efficiently by the software interface. This means a task should be movable inside the fabric very easily, without a new place-and-route. We will also study how such architectures can take advantage of 3D stacking. In the context of the FP7 FLEXTILES project we will propose 3D-stacked reconfigurable accelerators. The main advantage is a unified reconfigurable fabric linked to the multicore by a 3D network on chip. In that case there is no predefined area dedicated to one specific core, which is much more flexible than the 2D case.
Figure labels: Core, reconfigurable layer, multicore layer

8 What’s new with 3D?
Figure labels: 3D NI, Reconfiguration RAM, Reconfiguration Controller, RAM, DSP

9 FP7 FlexTiles Project in a Nutshell
FlexTiles: Self adaptive heterogeneous manycore based on Flexible Tiles
Oct – Sept. 2014
Partners: Thales, Sundance, ACE, UR1, CEA, KIT, TU/e, CSEM, RUB
Provide a heterogeneous many-core architecture offering:
Large flexibility
High performance, energy efficiency
Raised programming efficiency
Self-adaptation through virtualisation

10 FlexTiles Architecture Overview
3D-stacked heterogeneous manycore
General Purpose Processors (GPP), for flexibility and programming homogeneity
Accelerators, for computing efficiency: Digital Signal Processors (DSP); dedicated hardware accelerators on an embedded FPGA (eFPGA)
Network on Chip (NoC): ANoC and Aethereal
Reconfigurable layer with improved relocation and migration capabilities
Virtualization layer to provide an abstraction of the manycore and self-adaptive services
Tool-chain for parallelisation and compilation

11 FlexTiles Architecture Overview
Physical nodes: GPP node, DSP node, DDR node, eFPGA acc.
A “Tile” associates 1 master node and 1+ slave nodes
A tile is a logical view for architecture programming
Figure labels: GPP Node, AI, DSP, eFPGA Fabric, NI, NoC, Config. Ctrl., DDR Node, Tile, I/O, HW acc.
Figure captions: Physical nodes in consideration; Logical nodes from a functional point of view
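The tile rule on this slide (one master node plus at least one slave node) can be sketched as a small data structure. This is a minimal illustration in Python, not FlexTiles code: the names `NodeKind`, `Node` and `Tile` are hypothetical, and because the slide does not say which node kinds may act as master, no such restriction is checked.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class NodeKind(Enum):
    # Physical node kinds listed on the slide.
    GPP = "GPP node"
    DSP = "DSP node"
    DDR = "DDR node"
    EFPGA_ACC = "eFPGA acc."


@dataclass(frozen=True)
class Node:
    name: str  # hypothetical identifier, for illustration only
    kind: NodeKind


@dataclass
class Tile:
    """Logical view: exactly one master node and one or more slave nodes."""
    master: Node
    slaves: List[Node]

    def __post_init__(self) -> None:
        if len(self.slaves) < 1:
            raise ValueError("a tile needs at least one slave node")


# Hypothetical tile: a GPP master with a DSP and an eFPGA accelerator as slaves.
tile = Tile(
    master=Node("gpp0", NodeKind.GPP),
    slaves=[Node("dsp0", NodeKind.DSP), Node("acc0", NodeKind.EFPGA_ACC)],
)
```

Constructing a `Tile` with an empty `slaves` list raises `ValueError`, which mirrors the "1+ slave nodes" rule.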

12 FlexTiles Architecture Overview

13 Outline
What is a heterogeneous multi-core?
Coupling FPGA fabrics to many-core in 2D
Then what does 3D bring?
Flexibility and better resource usage
FlexTiles architecture
Specific features for task migration
Virtual Bit-Stream
The case for heterogeneous fabrics

14 Task Allocation & Migration in FPGA
Predefined reconfigurable regions
Bit-stream depends on task location
Figure labels: HW Accelerator #1 with BS #1, HW Accelerator #1 with BS #2

15 HW Task Migration in eFPGA
Figure labels: 3D NI, RAM, HW Accelerator #1 with BS #1, HW Accelerator #2 with BS #2

16 Concept of Virtual Bit-Stream
A task is synthesized, placed and routed into a Virtual Bit-Stream (VBS)
Independent of the task's physical location in the fabric
No predefined configuration domains
Easier resource sharing and distribution, simplified task migration
Figure label: Quartus II

17 Virtual Bit-Stream: Example
Hiding routing details
Full BS is 129 bits
Could be reduced by giving fewer details
Figure labels: CLBIN[0], CLBIN[1], CLBIN[2], CLBIN[3], CLBOUT; 16, 17, 18

18 Virtual Bit-Stream: Example
Hiding routing details
List of I/O and connections: 20 → 8, 1 → 9, 5 → 18
Figure labels: 16, 17, 18
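To make the size argument concrete, the connection list on this slide can be encoded as pairs of indices. The sketch below is only an illustration: the fixed-width encoding and the helper name `connection_list_bits` are assumptions, and the actual VBS format used in FlexTiles is not described on the slide.

```python
from typing import List, Tuple

# Connection list from the slide, as (source, destination) index pairs.
CONNECTIONS: List[Tuple[int, int]] = [(20, 8), (1, 9), (5, 18)]


def connection_list_bits(connections: List[Tuple[int, int]]) -> int:
    """Bits needed if every endpoint is stored as a fixed-width index.

    Assumption: the width is the smallest one that can hold the largest
    index appearing in the list (a real format would more likely derive it
    from the number of pins in the routing block).
    """
    largest = max(max(pair) for pair in connections)
    width = max(1, largest.bit_length())
    return len(connections) * 2 * width


bits = connection_list_bits(CONNECTIONS)
print(bits)
```

Under this assumed encoding the three connections take 30 bits (3 connections, 2 endpoints each, 5 bits per endpoint), against the 129 bits quoted for the full bit-stream in this example. The real saving depends on the actual VBS encoding.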

19 Virtual Bit-Stream
VBS generation principle can be extended to a set of routing resources
Smaller size in configuration memory

20 Results: BS Sizes on MCNC Benchmarks

21 Results: VBS Sizes on MCNC Benchmarks

22 eFPGA Architecture using VBS
The reconfiguration controller generates the final BS at run-time
Figure labels: Reconfiguration controller, External memory (VBS 1, VBS 2, VBS 3, ..., VBS N), Buffer memory, data, control, 1, 2
VBSs are stored in external memory
Request from a supervision node => the VBS is loaded or the request is refused
Finalization of the VBS (routing); the relative placement of the elements is fixed
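The load-or-refuse behaviour described on this slide can be sketched as follows. This is a simplified model, not the FlexTiles controller: the class `ReconfigController`, the grid model of the fabric and the first-fit search are all assumptions made for illustration. A VBS is reduced to a list of relative logic-element coordinates, and finalization is reduced to translating them to a free origin (the run-time routing step is not modelled).

```python
from typing import List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (column, row)


class ReconfigController:
    """Toy model: places position-independent tasks on a cols x rows fabric."""

    def __init__(self, cols: int, rows: int) -> None:
        self.cols = cols
        self.rows = rows
        self.used: Set[Cell] = set()  # absolute cells already configured

    def request(self, vbs_cells: List[Cell]) -> Optional[Cell]:
        """Return the origin chosen for the task, or None if refused.

        vbs_cells holds non-negative relative coordinates; their relative
        placement is kept fixed and only the origin is chosen at run-time.
        """
        width = max(c for c, _ in vbs_cells) + 1
        height = max(r for _, r in vbs_cells) + 1
        # First-fit scan over every origin that keeps the task on the fabric.
        for row in range(self.rows - height + 1):
            for col in range(self.cols - width + 1):
                placed = {(col + c, row + r) for c, r in vbs_cells}
                if placed.isdisjoint(self.used):
                    self.used |= placed
                    return (col, row)
        return None  # no free location: the request is refused


# Hypothetical 2x2 task requested three times on a 4x2 fabric.
task = [(0, 0), (1, 0), (0, 1), (1, 1)]
ctrl = ReconfigController(cols=4, rows=2)
origins = [ctrl.request(task) for _ in range(3)]
print(origins)
```

The same relative description is placed at two different origins and refused the third time, when the fabric is full. No per-location bit-stream is needed, which is the point of keeping the VBS independent of the physical location.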

23 Outline
What is a heterogeneous multi-core?
Coupling FPGA fabrics to many-core in 2D
Then what does 3D bring?
Flexibility and better resource usage
FlexTiles architecture
Specific features for task migration
Virtual Bit-Stream
The case for heterogeneous fabrics

24 Task Placement & Migration
Homogeneous case: no constraint on task placement; regular routing architecture
Cope with heterogeneity: RAM, DSP, 3D I/Os
Migration is limited: vertically within the same column, or to the next column containing the same complex blocks
Figure legend: Logic Element (LE), Configured LE, Task
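The column constraint for heterogeneous fabrics can be illustrated with a small check. This sketch assumes a fabric described by one block type per column and a task that must land on an identical sequence of column types. The function name `legal_column_origins` and the string labels are hypothetical.

```python
from typing import List


def legal_column_origins(fabric: List[str], task: List[str]) -> List[int]:
    """Column origins where each task column meets the same block type."""
    span = len(task)
    return [
        origin
        for origin in range(len(fabric) - span + 1)
        if fabric[origin:origin + span] == task
    ]


# Homogeneous case: only logic elements (LE), so every origin that fits is legal.
homogeneous = legal_column_origins(["LE"] * 4, ["LE", "LE"])

# Heterogeneous case: the task uses a RAM column, so it can only move to
# origins where a RAM column sits at the same relative position.
fabric = ["LE", "LE", "RAM", "LE", "LE", "RAM", "LE", "DSP"]
heterogeneous = legal_column_origins(fabric, ["LE", "RAM"])
print(homogeneous, heterogeneous)
```

In the homogeneous fabric all three fitting origins are legal, while in the heterogeneous one only the two origins aligned with a RAM column remain.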

25 eFPGA: Complex blocks handling
Heterogeneous block routing is abstracted from logic routing
Long lines allow a trade-off between placement flexibility and routing complexity
A two-level routing is performed at runtime: logic routing (as in the homogeneous case), then heterogeneous block routing through long lines

26 eFPGA: Complex blocks handling
Delay depends on final placement: only the worst-case delay can be estimated offline
Flexibility is still limited in the vertical axis: multiple of block height
Length of long lines, and connections between long lines and routing resources, should be limited: area overhead
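The vertical restriction (moves by a multiple of the block height) can be written as a short enumeration. The sketch assumes complex blocks are stacked from row 0 with a uniform height. `legal_row_origins` and its parameters are hypothetical names.

```python
from typing import List


def legal_row_origins(rows: int, block_height: int, task_height: int) -> List[int]:
    """Row origins aligned to complex-block boundaries that keep the task on the fabric."""
    if block_height < 1:
        raise ValueError("block_height must be at least 1")
    return list(range(0, rows - task_height + 1, block_height))


# Complex blocks four rows high: the task can only start on every fourth row.
aligned = legal_row_origins(rows=12, block_height=4, task_height=4)

# Block height of one row (purely logic columns): any row that fits is legal.
unaligned = legal_row_origins(rows=12, block_height=1, task_height=4)
print(aligned, len(unaligned))
```

With blocks four rows high, only three of the nine otherwise possible origins remain, which illustrates the loss of vertical flexibility mentioned on the slide.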

27 Conclusion
FlexTiles: a self-adaptive heterogeneous multicore
eFPGA layer 3D-stacked on the processor layer
Flexible resource allocation/sharing
Seamless task migration
Virtual Bit-Stream

28 Thanks! Questions?

