Liquid computing – the rVEX approach

ILP-driven dynamic core adaptations
Joost J. Hoozemans, Computer Engineering, TU Delft
Monday, 19 November 2018

Observation
Embedded workloads are becoming increasingly dynamic (past, current, future) in:
Intensity (number of tasks)
Characteristics (amount and type of parallelism)
Requirements (criticality)

Realization
Dynamic workloads call for dynamic computing platforms

Vision – Liquid Architectures
Implementing a system that constantly optimizes its hardware for all its running tasks

Current state of the art: Heterogeneous Multicore processors
[Figure: two heterogeneous cores, Core A and Core B]

Heterogeneous Multicore (big.LITTLE) - Problem
[Figures: Core A and Core B. Sources: ARM – Programmers guide for ARMv8; Anandtech]
Some programs cannot make use of additional processor resources (parallel datapaths).
A better metric should be used for choosing between big or little: Instruction-Level Parallelism (ILP).

Superscalar processors: ILP is implicit
To measure ILP, the program must run on the largest core/configuration
ILP extraction is power hungry
Source: Nvidia Tegra 4 Family CPU architecture whitepaper
Solution: VLIW-based dynamic processor

[Figure: two toolchains side by side]
Super-scalar: Program -> Compiler -> Sequential binary -> Scheduler -> Datapaths
VLIW: Program -> Compiler -> Explicitly parallel binary -> Datapaths

VLIW: explicit parallelism (ILP)
VLIW processors: ILP is explicit
Encoded in binary by compiler
Bundle boundaries (stopbits)
[Animation: successive instruction groups shown entering the VLIW: and, add, nop; shl, add, nop; sub, add, nop; stw, add, nop, goto]
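To make the idea of stop bits concrete, here is a minimal sketch, not the rVEX binary format. It assumes a stream of (mnemonic, stop bit) pairs in which a set stop bit marks the last operation of a bundle. The `STREAM` contents and the helper names `split_bundles` and `average_ilp` are hypothetical and chosen only for illustration. Counting the non-nop operations per bundle gives a simple ILP estimate.

```python
from typing import List, Tuple

# Hypothetical operation stream: (mnemonic, stop_bit).
# A stop bit of True marks the last operation of a bundle.
STREAM: List[Tuple[str, bool]] = [
    ("and", False), ("add", False), ("nop", True),
    ("shl", False), ("add", True),
    ("stw", False), ("add", False), ("nop", False), ("goto", True),
]


def split_bundles(stream: List[Tuple[str, bool]]) -> List[List[str]]:
    """Group operations into bundles, closing a bundle at each stop bit."""
    bundles: List[List[str]] = []
    current: List[str] = []
    for mnemonic, stop in stream:
        current.append(mnemonic)
        if stop:
            bundles.append(current)
            current = []
    if current:  # trailing operations without a final stop bit
        bundles.append(current)
    return bundles


def average_ilp(bundles: List[List[str]]) -> float:
    """Average number of useful (non-nop) operations per bundle."""
    if not bundles:
        return 0.0
    useful = sum(1 for bundle in bundles for op in bundle if op != "nop")
    return useful / len(bundles)


bundles = split_bundles(STREAM)
ilp = average_ilp(bundles)
print(bundles, round(ilp, 2))
```

Because the boundaries are already in the binary, this estimate needs no run on the largest configuration, which is the contrast with the superscalar case drawn above.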

Heterogeneous Multicore (big.LITTLE) – Problem 2: Migration penalty
[Figure: timeline (t) of Task 1 and Task 2 on Core A and Core B. Swapping the tasks between cores requires Save Task 1, Restore Task 2, Save Task 2 and Restore Task 1: the migration penalty. Also marked: underutilization and unused ILP.]

Solution: Liquid Computing
Dynamic processor
Assigning datapaths to threads
[Figure: timeline (t) in which datapath pairs 1 & 2, 3 & 4, 5 & 6 and 7 & 8 are assigned to Tasks 1 to 4; annotated "5 clock cycles"]
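One way to picture "assigning datapaths to threads" is a small greedy allocator. This is a sketch under stated assumptions, not the rVEX runtime: it assumes four datapath pairs as on the slide, a per-task ILP estimate, and it ignores any hardware restrictions on which groupings of pairs are legal. The function name `assign_lane_pairs` and the example ILP values are hypothetical.

```python
from typing import Dict

LANE_PAIRS = 4  # the slide shows four datapath pairs (1 & 2 ... 7 & 8)


def assign_lane_pairs(ilp_by_task: Dict[str, float],
                      lane_pairs: int = LANE_PAIRS) -> Dict[str, int]:
    """Give each task one pair, then hand spare pairs to the task
    whose estimated ILP is least covered by its current datapaths."""
    # Keep at most `lane_pairs` tasks, highest estimated ILP first.
    tasks = sorted(ilp_by_task, key=lambda t: ilp_by_task[t],
                   reverse=True)[:lane_pairs]
    alloc = {task: 1 for task in tasks}
    spare = lane_pairs - len(tasks)
    while spare > 0 and tasks:
        # Unmet demand: ILP minus the two datapaths each pair provides.
        best = max(tasks, key=lambda t: ilp_by_task[t] - 2 * alloc[t])
        if ilp_by_task[best] - 2 * alloc[best] <= 0:
            break  # nobody can use another pair; leave it idle
        alloc[best] += 1
        spare -= 1
    return alloc


# Hypothetical ILP estimates for two runnable tasks.
example = assign_lane_pairs({"task1": 1.2, "task2": 5.5})
print(example)
```

In this example the low-ILP task keeps a single pair while the high-ILP task receives the remaining three, so no datapath sits behind a task that cannot use it.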

Heterogeneous Multicore (big.LITTLE) – Problem 3: reactive
Response time + migration penalty
Source: ARM – Programmers guide for ARMv8

Phases
ILP changes too rapidly for heterogeneous core migrations.
But not for our dynamic processor!
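The claim can be read as a break-even argument: switching to a wider configuration only pays off if the time saved during a phase exceeds the cost of switching. The sketch below illustrates that arithmetic. It takes the "5 clock cycles" annotation from the Liquid Computing slide as the reconfiguration cost, which is an interpretation; the migration cost, phase length and speedup are hypothetical placeholders, not measured values.

```python
def worth_switching(phase_cycles: int, speedup: float,
                    switch_cycles: int) -> bool:
    """True if paying the switch cost and running the phase
    `speedup` times faster beats staying on the current configuration."""
    stay = phase_cycles
    switch = switch_cycles + phase_cycles / speedup
    return switch < stay


RECONFIG_CYCLES = 5        # annotation on the Liquid Computing slide
MIGRATION_CYCLES = 50_000  # hypothetical placeholder for a core migration

phase = 10_000   # hypothetical length of a high-ILP phase, in cycles
speedup = 2.0    # hypothetical gain from the wider configuration

dynamic_ok = worth_switching(phase, speedup, RECONFIG_CYCLES)
migration_ok = worth_switching(phase, speedup, MIGRATION_CYCLES)
print(dynamic_ok, migration_ok)
```

With these placeholder numbers the short phase is worth a reconfiguration but not a migration, which is the situation the slide describes.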

Phases - Solution
The compiler analyses loops and writes ILP info into a control register.
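As an illustration of passing ILP info through a control register, the sketch below packs a small hint into the low bits of a register value and reads it back. The field width, the position of the field and the names `encode_ilp_hint` and `decode_ilp_hint` are assumptions made for this example; the talk does not specify the register layout.

```python
HINT_BITS = 4                     # hypothetical width of the ILP hint field
HINT_MASK = (1 << HINT_BITS) - 1  # low bits of the register hold the hint


def encode_ilp_hint(reg: int, ilp: int) -> int:
    """Return `reg` with its hint field replaced by `ilp` (clamped to fit)."""
    ilp = max(0, min(ilp, HINT_MASK))
    return (reg & ~HINT_MASK) | ilp


def decode_ilp_hint(reg: int) -> int:
    """Read the ILP hint back, as a runtime might before reconfiguring."""
    return reg & HINT_MASK


# Hypothetical per-loop estimates standing in for the compiler's analysis.
loop_ilp = {"loop_a": 2, "loop_b": 7}

reg = 0xABCD0000                                 # other bits must survive
reg = encode_ilp_hint(reg, loop_ilp["loop_b"])   # written before loop_b runs
print(hex(reg), decode_ilp_hint(reg))
```

Writing the hint ahead of each loop makes the scheme proactive: the runtime sees the upcoming ILP before the phase starts, not after observing it.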

Coverage: up to 72% avg.

Overhead: up to 2.35% avg.

Throughput: Dynamic 20% faster than heterogeneous

Demo
Ideally I would like a picture with 4 contexts that write ILP info into their control registers, and the runtime that uses it to compute the best configuration.

Liquid Computing – Advantages
High single-thread performance
High multi-thread throughput
Low configuration overhead (no migration penalties)
Low interrupt latency

Applications - Image processing pipeline (Rolf, Joost), Doom (Koray, Jeroen), Demos (Muneeb, Joost, Jeroen), Benchmarks: SPEC, MiBench, Mälardalen, Powerstone (Anthony, Joost)
Operating System support - Linux (mainly Joost, some low-level code written/fixed/updated by Anthony & Jeroen), FreeRTOS (Jeroen, Muneeb)
Runtime libraries - Newlib (Joost, Anthony), uClibc (Tom, Joost), Floating Point & Division, math (Joost)
Compilers - HP VEX, GCC (IBM, Anthony, Joost), Cosy (Hugo), LLVM (Maurice, Hugo), Open64 (Joost)
Binutils - Assembler, linker, etc. (Anthony), VEXparse (Anthony, Jeroen)
Architectural Simulator (Joost)
Debug hardware, tools and interface (Jeroen)
Hardware design - VHDL (Jeroen)
ASIC manufacturing effort - core (Lennart), interface (Shizao), supported by Jeroen

Liquid Computing – Fault-tolerance
[Figure: timeline (t) showing multiple copies of Task 2, labeled "Protected"]

Image processing
FPGA overlay fabric
Streaming architecture
16x4 cores
194 MHz
