Ottawa, January 9, 2014 FETCH FlexTiles: runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore Olivier Sentieys INRIA.

Presentation transcript:

Ottawa, January 9, 2014 FETCH FlexTiles: runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore Olivier Sentieys INRIA and University of Rennes CAIRN project-team http://www.irisa.fr/cairn/ Jan. 2014

The Story
- What is a heterogeneous multi-core?
- Coupling FPGA fabrics to many-core in 2D
- Then what does 3D bring? Flexibility and better resource usage
- FlexTiles architecture
- Specific features for task migration: Virtual Bit-Stream
- The case for heterogeneous fabrics

Domain-Specific System-on-Chip (SoC) with Hardware Reconfiguration
- Dynamically adapt the hardware to the application: energy efficiency, high performance, flexibility
- Self-adapting devices: continuously adapt to changing environments
- Other advantages: error, fault and variation tolerance; security against attacks
[Figure: SoC from CEA with DART reconfigurable architecture from IRISA/INRIA - CAIRN]
Complementary to general-purpose architectures, “domain-specific SoCs” are designed for one application domain (such as a wireless terminal or a set-top box). They classically include several processors (GPP, DSP) and many HW IP blocks so that energy efficiency and performance are maximized. Our aim is to propose new architectures and tools for these SoCs, with a particular emphasis on reconfigurable hardware. By adapting the hardware to the application, this flexible hardware exhibits a good trade-off between performance, energy and flexibility. Moreover, dynamic reconfiguration, meaning that the hardware can be reconfigured at run-time during execution, allows self-adapting devices that continuously adapt their structure to the environment and to the application running on them. Finally, other constraints can benefit from these reconfigurable SoCs: hardware reconfiguration can be used to mitigate errors, temporary faults or process variation, and to increase protection against attacks when security matters.

Heterogeneous Multicores
- Many cores on a single chip, for both general-purpose and embedded computing
- Heterogeneous manycores to cope with energy and performance constraints
[Figure labels: Core; Proc., Reconf. HW, M, HW IPs]
Another strong trend in our domain is the possibility of integrating thousands of cores on a single chip in the near future. This holds for both general-purpose and embedded computing architectures; CAIRN will continue to focus only on the second category. We foresee that these systems will be heterogeneous multicores, to cope with both energy and performance constraints. In our case, a heterogeneous multicore architecture is a regular multicore in which each basic core contains not only a processor and memory but also some HW accelerators. We will continue to study the impact of reconfigurable (fine-grain or coarse-grain) accelerators in these multicores and their potential for power management and for fault tolerance.

Multicores Coupled with Reconfig. HW
- 2D SoC
- Tightly-coupled HW
[Figure labels: Proc., Reconf. HW, M; Proc., HW IP, M]

Multicores Coupled with Reconfig. HW
- 2D SoC
- Loosely-coupled HW
[Figure labels: I/O, Configuration RAM, Configuration Controller, DSP, RAM, HW Accelerator #1, Core]

Can 3D Stacking Help?
- 3D-stacked reconfigurable accelerators
- Improved performance
- Improved flexibility
- Improved resource usage
[Figure labels: Core, reconfigurable layer, multicore layer]
As just mentioned, we will focus our research on how to embed efficient hardware accelerators with run-time reconfiguration into a multicore. To this end, we will propose and design new configuration structures such that dynamic reconfiguration becomes much more efficient and the reconfiguration layer can be virtualized, so that the software interface can manage it more efficiently. This means that a task should be movable inside the fabric very easily, without a new place and route. We will also study how such architectures can take advantage of 3D stacking. In the context of the FP7 FlexTiles project we will propose 3D-stacked reconfigurable accelerators. The main advantage is a unified reconfigurable fabric linked to the multicore by a 3D network on chip. In that case, no predefined area is dedicated to one specific core, which is much more flexible than the 2D case.

What’s new with 3D?
[Figure labels: 3D NI, Reconfiguration RAM, Reconfiguration Controller, RAM, DSP]

FP7 FlexTiles Project in a Nutshell
- FlexTiles: Self-adaptive heterogeneous manycore based on Flexible Tiles
- Oct. 2011 to Sept. 2014
- http://www.flextiles.eu
- Partners: Thales, Sundance, ACE, UR1, CEA, KIT, TU/e, CSEM, RUB
- Provide a heterogeneous many-core architecture offering:
  - Large flexibility
  - High performance, energy efficiency
  - Raised programming efficiency
  - Self-adaptation through virtualisation

FlexTiles Architecture Overview
- 3D-stacked heterogeneous manycore
  - General Purpose Processors (GPP), for flexibility and programming homogeneity
  - Accelerators, for computing efficiency
    - Digital Signal Processors (DSP)
    - Dedicated hardware accelerators on an embedded FPGA (eFPGA)
  - Network on Chip (NoC): ANoC and Aethereal
- Reconfigurable layer with improved relocation and migration capabilities
- Virtualization layer to provide an abstraction of the manycore and self-adaptive services
- Tool-chain for parallelisation and compilation

FlexTiles Architecture Overview
- Physical nodes: GPP node, DSP node, DDR node, eFPGA acc.
- A “Tile” associates 1 master node with 1+ slave nodes
- A tile is a logical view for architecture programming
[Figure labels: GPP Node, AI, DSP, eFPGA Fabric, NI, NoC, Config. Ctrl., DDR Node, Tile, I/O, HW acc.; physical nodes in consideration; logical nodes from a functional point of view]
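
The tile notion on this slide (one master node, one or more slave nodes) can be pictured as a small data structure. This is only an illustrative sketch: the class names, the node identifiers such as `gpp0`, and the choice of a GPP as master in the example are assumptions made here, not part of the FlexTiles definition.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    # The four physical node kinds listed on the slide.
    GPP = "GPP node"
    DSP = "DSP node"
    DDR = "DDR node"
    EFPGA_ACC = "eFPGA accelerator"


@dataclass(frozen=True)
class Node:
    name: str        # hypothetical identifier, for illustration only
    kind: NodeKind


@dataclass
class Tile:
    """Logical view: exactly one master node plus at least one slave node."""
    master: Node
    slaves: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the "1 master, 1+ slaves" rule stated on the slide.
        if not self.slaves:
            raise ValueError("a tile needs one master and at least one slave node")


# Example tile; picking a GPP as the master is an assumption.
tile = Tile(
    master=Node("gpp0", NodeKind.GPP),
    slaves=[Node("dsp0", NodeKind.DSP), Node("acc0", NodeKind.EFPGA_ACC)],
)
```

Because a tile is only a logical grouping, the same physical nodes could in principle be regrouped into different tiles without changing the hardware.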

FlexTiles Architecture Overview

Task Allocation & Migration in FPGA
- Predefined reconfigurable regions
- Bit-stream depends on task location
[Figure labels: HW Accelerator #1, BS #1; HW Accelerator #1, BS #2]

HW Task Migration in eFPGA
[Figure labels: 3D NI, RAM; HW Accelerator #1, BS #1; HW Accelerator #2, BS #2]

Concept of Virtual Bit-Stream
- A task is synthesized, then placed and routed, into a Virtual Bit-Stream (VBS)
- Independent of the task's physical location in the fabric
- No predefined configuration domains
- Easier resource sharing and distribution, simplified task migration
[Figure label: Quartus II]

Virtual Bit-Stream: Example
- Hiding routing details
- Full BS is 129 bits
- Could be reduced by giving fewer details
[Figure labels: CLBIN[0] to CLBIN[3], CLBOUT, points numbered 0 to 20]

Virtual Bit-Stream: Example
- Hiding routing details
- List of I/O and connections: 20 → 8, 1 → 9, 5 → 18
[Figure: points numbered 0 to 20]
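
To make the idea of replacing routing details with a list of connections concrete, here is a minimal sketch. It takes the three connections from this slide and the 129-bit full bit-stream size from the previous one. Everything else is an assumption for illustration: the numbers 0 to 20 are treated as endpoint indices, each index is stored on a fixed 5-bit field, and logic (LUT) contents and any header are left out. The actual VBS encoding is not given in the slides.

```python
from math import ceil, log2

FULL_BS_BITS = 129     # full bit-stream size quoted on the slide
NUM_ENDPOINTS = 21     # assumption: the labels 0..20 are connectable endpoints

# The connection list shown on the slide.
connections = [(20, 8), (1, 9), (5, 18)]


def bits_per_endpoint(num_endpoints):
    """Fixed field width needed to name one endpoint."""
    return ceil(log2(num_endpoints))


def encode(conns, num_endpoints):
    """Pack (source, destination) pairs into a string of '0'/'1' characters."""
    width = bits_per_endpoint(num_endpoints)
    return "".join(
        format(src, f"0{width}b") + format(dst, f"0{width}b")
        for src, dst in conns
    )


def decode(bits, num_endpoints):
    """Recover the (source, destination) pairs from the packed string."""
    width = bits_per_endpoint(num_endpoints)
    step = 2 * width
    return [
        (int(bits[i:i + width], 2), int(bits[i + width:i + step], 2))
        for i in range(0, len(bits), step)
    ]


vbs = encode(connections, NUM_ENDPOINTS)
print(len(vbs), "bits for the connection list vs", FULL_BS_BITS, "for the full BS")
```

Under these assumptions the connection list takes 3 × 2 × 5 = 30 bits. That figure covers routing only, so it is not directly comparable with the sizes reported in the results slides.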

Virtual Bit-Stream
- The VBS generation principle can be extended to a set of routing resources
- Smaller size in configuration memory

Results: BS Sizes on MCNC Benchmarks

Results: VBS Sizes on MCNC Benchmarks

eFPGA Architecture using VBS
- The reconfiguration controller generates the final BS at run-time
- VBSs are stored in external memory
- Request from a supervision node => the VBS is loaded, or the request is refused
- Finalization of the VBS (routing); the relative placement of elements is fixed
[Figure labels: Reconfiguration controller; External memory (VBS 1, VBS 2, VBS 3, … VBS N); Buffer memory; data; control; 1, 2]
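
The request flow on this slide (a supervision node asks for a task, the controller either loads the VBS or refuses, and the relative placement of elements stays fixed) can be sketched as follows. This is a simplified model under stated assumptions, not the FlexTiles controller. The fabric is a plain grid of identical logic elements, the search is a first-fit scan, and "finalization" is reduced to translating relative coordinates to absolute ones, while real finalization also performs routing. All class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualBitStream:
    name: str
    cells: tuple   # relative (row, col) positions of the logic elements used


class ReconfigurationController:
    def __init__(self, rows, cols, external_memory):
        self.rows, self.cols = rows, cols
        self.external_memory = external_memory   # task name -> VirtualBitStream
        self.occupied = {}                       # (row, col) -> task name

    def _fits(self, vbs, r0, c0):
        """True if every cell of the VBS, shifted to (r0, c0), is inside and free."""
        for dr, dc in vbs.cells:
            r, c = r0 + dr, c0 + dc
            if not (0 <= r < self.rows and 0 <= c < self.cols):
                return False
            if (r, c) in self.occupied:
                return False
        return True

    def request(self, name):
        """Handle a load request; return the absolute cells, or None if refused."""
        vbs = self.external_memory.get(name)
        if vbs is None:
            return None                          # unknown task: refuse
        for r0 in range(self.rows):              # first-fit scan (an assumption)
            for c0 in range(self.cols):
                if self._fits(vbs, r0, c0):
                    # Relative placement is kept; only the origin changes.
                    final = [(r0 + dr, c0 + dc) for dr, dc in vbs.cells]
                    for cell in final:
                        self.occupied[cell] = name
                    return final
        return None                              # no free region: refuse


memory = {"fir": VirtualBitStream("fir", ((0, 0), (0, 1), (1, 0), (1, 1)))}
ctrl = ReconfigurationController(rows=2, cols=3, external_memory=memory)
first = ctrl.request("fir")    # placed with its origin at (0, 0)
second = ctrl.request("fir")   # refused: only one free column remains
```

The same stored VBS serves any origin that fits, which is the property that makes task migration cheap compared with storing one bit-stream per location.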

Task Placement & Migration
- Homogeneous case
  - No constraint on task placement
  - Regular routing architecture
- Cope with heterogeneity
  - RAM, DSP, 3D I/Os
  - Migration is limited: vertically within the same column, or to the next column containing the same complex blocks
[Figure legend: Logic Element (LE), Configured LE, Task]
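
The column constraint for heterogeneous fabrics can be illustrated with a short sketch. It assumes the fabric is described by one block type per column and that a task may only move to a column range whose type sequence matches the one it currently occupies. The fabric layout, the type names and the function names are made up for this example, and vertical moves are not modelled.

```python
def column_signature(fabric, start, width):
    """Sequence of column types covered by a task of the given width."""
    return tuple(fabric[start:start + width])


def legal_targets(fabric, start, width):
    """Start columns the task could migrate to, excluding its current one."""
    wanted = column_signature(fabric, start, width)
    return [
        col
        for col in range(len(fabric) - width + 1)
        if col != start and column_signature(fabric, col, width) == wanted
    ]


# Hypothetical fabric: mostly logic elements (LE) with RAM and DSP columns.
fabric = ["LE", "LE", "RAM", "LE", "LE", "DSP", "LE", "LE", "RAM", "LE"]

# A task using an LE column and the RAM column next to it.
ram_task_targets = legal_targets(fabric, start=1, width=2)

# A purely logic task spanning two LE columns.
logic_task_targets = legal_targets(fabric, start=0, width=2)
```

In this layout the task using RAM has a single alternative position (column 7), while the logic-only task has two (columns 3 and 6), which shows how complex blocks reduce placement freedom.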

eFPGA: Complex blocks handling
- Heterogeneous block routing is abstracted from logic routing
- Long lines allow a trade-off between placement flexibility and routing complexity
- A two-level routing is performed at runtime:
  - Logic routing (as in the homogeneous case)
  - Heterogeneous block routing through long lines

eFPGA: Complex blocks handling
- Delay depends on final placement
  - Only the worst-case delay can be estimated offline
- Flexibility is still limited in the vertical axis
  - Multiple of block height
- The length of long lines, and the connections between long lines and routing resources, should be limited
  - Area overhead

Conclusion
- FlexTiles: a self-adaptive heterogeneous multicore
- eFPGA layer 3D-stacked on the processor layer
- Flexible resource allocation/sharing
- Seamless task migration
- Virtual Bit-Stream

Thanks! Questions?