Split Compilation for Accelerator-based Multicores Panagiotis Theocharis Computer Systems Lab (CSL) Ghent University.

Presentation transcript:

Split Compilation for Accelerator-based Multicores Panagiotis Theocharis Computer Systems Lab (CSL) Ghent University

Problem Statement: ever-increasing performance and power-efficiency needs; ASICs/ASIPs are becoming unaffordable; heterogeneous concurrency.

Accelerator-based Multicores: general-purpose cores; specialized accelerators; interfacing.

Virtualizing the HW/SW Interface: reuse legacy code; auto-tune for efficiency; immune to hardware innovation; flexible resource allocation.

Our approach: Split Compilation. Static phase: offline; time-consuming analyses; hardware-independent optimizations. Dynamic phase: install/load/run time; quick decisions; actual code mapping. Diagram labels: code, bytecode, annotations, architecture description, executable.
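The division of labour on this slide can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the presenter's implementation: the `Annotation` and `ArchDescription` types, the two function names, and the cost rule are all hypothetical. The static phase does the comparatively expensive dependence analysis once and stores the result as an annotation; the dynamic phase reads only that annotation and the architecture description, so its decision is a few arithmetic operations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation shipped alongside the bytecode."""
    op_count: int        # operations in the loop body
    critical_path: int   # longest dependence chain, counted in operations


@dataclass(frozen=True)
class ArchDescription:
    """Hypothetical, heavily simplified architecture description."""
    num_fus: int         # functional units available on the accelerator


def static_phase(body):
    """Offline analysis. `body` is a list of (dest, sources) pairs in program order."""
    depth = {}
    for dest, sources in body:
        # An operation sits one level below its deepest producer in the body;
        # values defined outside the body count as depth 0.
        depth[dest] = 1 + max((depth.get(s, 0) for s in sources), default=0)
    return Annotation(op_count=len(body),
                      critical_path=max(depth.values(), default=0))


def dynamic_phase(annotation, arch):
    """Quick decision that uses only the annotation and the architecture description."""
    resource_bound = math.ceil(annotation.op_count / arch.num_fus)
    estimated_cycles = max(resource_bound, annotation.critical_path)
    # Assumed rule: offload only if the estimate beats one operation per cycle.
    target = "accelerator" if estimated_cycles < annotation.op_count else "core"
    return target, estimated_cycles


body = [("a", ["x", "y"]), ("b", ["x"]), ("c", ["a", "b"]), ("d", ["c"])]
note = static_phase(body)
print(dynamic_phase(note, ArchDescription(num_fus=4)))
print(dynamic_phase(note, ArchDescription(num_fus=1)))
```

The same annotation is reused for both architecture descriptions, which is the point of the split: the analysis runs once offline, and the hardware-specific decision is made later without repeating it.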

Target Hardware Platform: IMEC ADRES CGRA (Coarse-Grained Reconfigurable Array). Two functional views/operation modes; features heterogeneous FUs, local RFs, and direct connections between FUs; reconfigurable every cycle; tightly coupled to the control processor.

Phase 1 (current): replace the existing simulated-annealing-based back-end with an LLVM (Low Level Virtual Machine compiler infrastructure) heuristic back-end; quick decisions based on heuristics; depends on code/hardware features; parameterizable for DSE. Diagram labels: C code, OpenIMPACT, DRESC, IR, architecture description, LLVM, heuristic back-end, ADRES executable.
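To make the contrast with simulated annealing concrete, here is a minimal Python sketch of one well-known quick heuristic, greedy list scheduling onto heterogeneous functional units. It is a stand-in chosen for illustration, not the back-end described in the slides; the function name, the unit-latency assumption, and the data layout are all hypothetical, and routing between FUs and register files is ignored.

```python
def list_schedule(ops, deps, fus):
    """Greedy list scheduling with unit latency.

    ops:  dict mapping operation name -> operation kind (dict order is the priority)
    deps: dict mapping operation name -> list of predecessor names
    fus:  list of sets; fus[i] holds the kinds that functional unit i can execute
    Returns a dict mapping operation name -> (cycle, fu_index).
    """
    schedule = {}
    remaining = list(ops)
    cycle = 0
    while remaining:
        free = set(range(len(fus)))
        placed = []
        for name in remaining:
            # Ready only if every predecessor finished in an earlier cycle.
            ready = all(p in schedule and schedule[p][0] < cycle
                        for p in deps.get(name, []))
            if not ready:
                continue
            # Take the lowest-numbered free FU that supports this kind.
            fu = next((i for i in sorted(free) if ops[name] in fus[i]), None)
            if fu is not None:
                schedule[name] = (cycle, fu)
                free.remove(fu)
                placed.append(name)
        if not placed:
            # With every FU free and nothing placeable, no later cycle can help.
            raise ValueError("unschedulable: unsupported kind or cyclic dependence")
        remaining = [n for n in remaining if n not in placed]
        cycle += 1
    return schedule


ops = {"ld1": "mem", "ld2": "mem", "add": "alu", "mul": "alu", "st": "mem"}
deps = {"add": ["ld1", "ld2"], "mul": ["add"], "st": ["mul"]}
print(list_schedule(ops, deps, fus=[{"mem"}, {"mem", "alu"}]))
```

Each operation is visited a bounded number of times and every decision is final, which is what makes this kind of heuristic fast compared with an iterative search such as simulated annealing; the price is that the result depends on the priority order and may be worse.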

Phase 2: Design Space Exploration. Diagram labels: C code, LLVM, DRESC + IR, architecture description, Machine Learning, Design Space Exploration, optimized architecture description, optimized compiler strategy, ADRES executable.
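The slide names machine learning as the driver of the exploration but gives no details. The Python sketch below therefore shows only the simplest possible baseline, an exhaustive search over a tiny design space, to illustrate what "optimized architecture description" and "optimized compiler strategy" could mean as a joint result. Everything in it is an assumption: the cost model, the use of the FU count as the only architecture parameter, the unroll factor as the only compiler parameter, and the premise that unrolled iterations are independent.

```python
import itertools
import math


def cycles_per_iteration(op_count, critical_path, num_fus, unroll):
    """Hypothetical cost model: resource bound vs. dependence bound, per original iteration."""
    resource_bound = math.ceil(op_count * unroll / num_fus)
    return max(resource_bound, critical_path) / unroll


def explore(kernels, fu_options, unroll_options, max_fus):
    """Exhaustively evaluate (num_fus, unroll) pairs and return the cheapest one.

    kernels: list of (op_count, critical_path) pairs describing the workload
    max_fus: assumed area budget, expressed as a maximum number of FUs
    """
    candidates = []
    for num_fus, unroll in itertools.product(fu_options, unroll_options):
        if num_fus > max_fus:
            continue  # design point exceeds the area budget
        total = sum(cycles_per_iteration(ops, cp, num_fus, unroll)
                    for ops, cp in kernels)
        # Tuple order makes min() prefer fewer cycles, then fewer FUs, then less unrolling.
        candidates.append((total, num_fus, unroll))
    if not candidates:
        raise ValueError("no design point fits the budget")
    total, num_fus, unroll = min(candidates)
    return {"num_fus": num_fus, "unroll": unroll, "cycles": total}


print(explore([(6, 3)], fu_options=[2, 4, 8], unroll_options=[1, 2], max_fus=4))
```

Exhaustive search stops being practical once the architecture description has more than a handful of parameters, which is one plausible reason to bring in a learned model to predict promising design points instead of evaluating all of them.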

Phase 3: Virtualization. Diagram labels: C code, LLVM, bytecode, LLVM – DRESC, static, deployment time, optimized architecture description, compiler strategy, ADRES executable.

Phase 4: Mapping Automation. Diagram labels: C code, LLVM, bytecode, LLVM – DRESC, static, deployment time, ML, abstract architecture description, optimized architecture description, compiler strategy, ADRES executable.

Phase 5: True Split Compilation. Diagram labels: C code, LLVM, bytecode, annotations, LLVM – DRESC, static, deployment time, ML, abstract architecture description, optimized architecture description, compiler strategy, ADRES executable.
