1
Liquid computing – the rVEX approach
ILP-driven dynamic core adaptations
Joost J. Hoozemans – Computer Engineering, TU Delft
Monday, 19 November 2018
2
Observation Past Current Future
Embedded workloads are becoming increasingly dynamic in:
Intensity (number of tasks)
Characteristics (amount and type of parallelism)
Requirements (criticality)
3
Realization
Dynamic workloads call for dynamic computing platforms
4
Vision – Liquid Architectures
Implementing a system that constantly optimizes its hardware for all its running tasks
5
Current state of the art: Heterogeneous Multicore processors
Core A Core B
6
Heterogeneous Multicore (big.LITTLE) - Problem
Core A Core B Core A
7
Heterogeneous Multicore (big.LITTLE) - Problem
Source: ARM – Programmers guide for ARMv8
8
Heterogeneous Multicore (big.LITTLE) - Problem
Core A Core B Source: Anandtech
9
Heterogeneous Multicore (big.LITTLE) - Problem
Some programs cannot make use of additional processor resources (parallel datapaths).
A better metric should be used for choosing between big and little: Instruction-Level Parallelism (ILP).
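To make the ILP metric concrete, here is a minimal sketch that estimates the ILP of a block of instructions as the instruction count divided by the length of the longest dependency chain. The function name `ilp_of_block`, the unit-latency assumption, and the example block are illustrative assumptions, not taken from the slides.

```python
from functools import lru_cache


def ilp_of_block(deps):
    """Estimate ILP as instruction count divided by critical-path length.

    `deps` maps each instruction to the instructions it depends on.
    Assumption: every instruction takes one cycle and resources are unlimited.
    """
    if not deps:
        return 0.0

    @lru_cache(maxsize=None)
    def depth(instr):
        # Longest dependency chain ending at `instr`, counted in instructions.
        return 1 + max((depth(d) for d in deps[instr]), default=0)

    critical_path = max(depth(i) for i in deps)
    return len(deps) / critical_path


# Hypothetical basic block (not from the slides).
block = {
    "load_a": [],
    "load_b": [],
    "add": ["load_a", "load_b"],
    "shl": ["add"],
    "sub": ["load_a"],
    "store": ["shl", "sub"],
}
print(ilp_of_block(block))  # 6 instructions / critical path of 4 = 1.5
```

A block with an ILP close to 1 gains little from extra parallel datapaths, which is the case where a wide core would be wasted on it.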
10
Heterogeneous Multicore (big.LITTLE) - Problem
Superscalar processors: ILP is implicit
Measure ILP: run on largest core/configuration
ILP extraction = power hungry
Source: Nvidia Tegra 4 Family CPU architecture whitepaper
11
Heterogeneous Multicore (big.LITTLE) - Problem
Superscalar processors: ILP is implicit
Measure ILP: run on largest core/configuration
Solution: VLIW-based dynamic processor
12
Superscalar vs. VLIW
Superscalar: Program -> Compiler -> Sequential binary -> Scheduler -> Datapaths
VLIW: Program -> Compiler -> Explicitly parallel binary -> Datapaths
13
VLIW: explicit parallelism (ILP)
VLIW processors: ILP is explicit
Encoded in binary by compiler
Bundle boundaries (stop bits)
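As a rough illustration of how stop bits make the parallelism explicit, the sketch below splits a stream of instructions into bundles at each stop bit. Modelling each instruction as a `(mnemonic, stop)` pair is an assumption made for readability; the slides do not give the position of the stop bit in the rVEX encoding, and the instruction stream is made up.

```python
def split_bundles(syllables):
    """Group a stream of (mnemonic, stop) pairs into bundles.

    A bundle ends at the first instruction whose stop flag is set.
    The boolean flag stands in for the real stop bit (hypothetical model).
    """
    bundles, current = [], []
    for mnemonic, stop in syllables:
        current.append(mnemonic)
        if stop:
            bundles.append(current)
            current = []
    if current:
        # Trailing instructions without a stop flag form a final bundle.
        bundles.append(current)
    return bundles


# Made-up stream; the mnemonics are borrowed from the slides' VLIW figures.
stream = [
    ("and", False), ("add", True),
    ("shl", False), ("sub", False), ("stw", True),
    ("goto", True),
]
bundles = split_bundles(stream)
print(bundles)

# Average bundle size is a cheap ILP indicator readable from the binary.
print(len(stream) / len(bundles))
```

Because the bundle boundaries are already in the binary, the number of instructions per bundle can be read off without running the program on the widest configuration.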
14
VLIW: explicit parallelism (ILP)
VLIW processors: ILP is explicit Encoded in binary by compiler VLIW and add nop
15
VLIW: explicit parallelism (ILP)
VLIW processors: ILP is explicit Encoded in binary by compiler VLIW shl add nop
16
VLIW: explicit parallelism (ILP)
VLIW processors: ILP is explicit Encoded in binary by compiler VLIW sub add nop
17
VLIW: explicit parallelism (ILP)
VLIW processors: ILP is explicit Encoded in binary by compiler VLIW stw add nop goto
18
Heterogeneous Multicore (big.LITTLE) – Problem 2: Migration penalty
[Figure: timeline (t) of Task 1 and Task 2 on Core A and Core B, showing underutilization, unused ILP, and the save/restore steps of each task around a migration]
Migration penalty!
19
Solution: Liquid Computing
Dynamic processor: assigning datapaths to threads
[Figure: timeline (t) of datapath pairs 1 & 2, 3 & 4, 5 & 6, and 7 & 8]
20
Solution: Liquid Computing
Dynamic processor: assigning datapaths to threads
[Figure: timeline (t) of Task 1 to Task 4 on the datapath pairs, annotated "5 clock cycles"]
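One way to picture a configuration of such a dynamic processor is as a list with one entry per datapath pair, naming the task that owns it. The sketch below derives each task's issue width from that list. The list representation and the names `LANES_PER_PAIR` and `issue_widths` are assumptions for illustration and do not reflect the actual rVEX configuration format.

```python
LANES_PER_PAIR = 2  # the slides group the eight datapaths into four pairs


def issue_widths(assignment):
    """Return the number of datapaths each task owns.

    `assignment` has one entry per datapath pair: a task name, or None
    for an idle pair. This representation is hypothetical.
    """
    widths = {}
    for task in assignment:
        if task is not None:
            widths[task] = widths.get(task, 0) + LANES_PER_PAIR
    return widths


# One task owning all four pairs: a single 8-wide core.
print(issue_widths(["Task 2", "Task 2", "Task 2", "Task 2"]))

# Three tasks sharing the datapaths: one 4-wide and two 2-wide cores.
print(issue_widths(["Task 1", "Task 1", "Task 2", "Task 3"]))
```

Switching between the two example configurations changes how wide each task runs without moving any task to a different core.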
22
Heterogeneous Multicore (big.LITTLE) – Problem 3: reactive
Response time + migration penalty
Source: ARM – Programmers guide for ARMv8
23
Phases
ILP changes too rapidly for heterogeneous core migrations.
But not for our dynamic processor!
24
Phases - Solution
The compiler analyses loops…
25
Phases - Solution
The compiler analyses loops…
… and writes ILP info into a control register
26
Coverage Up to 72% avg.
27
Overhead Up to 2.35% avg.
28
Throughput
Dynamic 20% faster than heterogeneous
29
Demo
Ideally, a picture with 4 contexts that write ILP info into their control registers, and the runtime that uses this info to compute the best configuration
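The idea of contexts reporting ILP info and a runtime deriving a configuration from it can be sketched as a small allocation routine. The greedy policy below is an assumption, not the algorithm used by the rVEX runtime: every task first receives one datapath pair, and spare pairs then go to the task whose ILP hint is furthest above its current width. It also ignores any constraints real hardware may place on group sizes or alignment.

```python
def allocate_pairs(ilp_hints, total_pairs=4, lanes_per_pair=2):
    """Distribute datapath pairs over tasks based on their ILP hints.

    `ilp_hints` maps a task name to the ILP value it reported
    (interpreted here as the number of datapaths it can keep busy).
    Hypothetical greedy policy; not taken from the slides.
    """
    tasks = list(ilp_hints)
    if not tasks:
        return {}
    if len(tasks) > total_pairs:
        raise ValueError("more tasks than datapath pairs")

    # Every running task needs at least one pair to make progress.
    pairs = {task: 1 for task in tasks}

    def unmet(task):
        # ILP the task reported beyond the width it currently owns.
        return ilp_hints[task] - pairs[task] * lanes_per_pair

    for _ in range(total_pairs - len(tasks)):
        neediest = max(tasks, key=unmet)
        if unmet(neediest) <= 0:
            break  # nobody can use another pair; leave the rest idle
        pairs[neediest] += 1
    return pairs


# Hypothetical hints from three contexts.
print(allocate_pairs({"A": 8, "B": 2, "C": 1}))
```

With these hints, the single spare pair goes to task A, the only task reporting more ILP than one pair can serve.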
30
Liquid Computing – Advantages
High single-thread performance
High multi-thread throughput
Low configuration overhead (no migration penalties)
Low interrupt latency
31
Applications - Image processing pipeline (Rolf, Joost), Doom (Koray, Jeroen), Demos (Muneeb, Joost, Jeroen), Benchmarks SPEC, MiBench, Malardalen, Powerstone (Anthony, Joost)
Operating System support - Linux (mainly Joost, some low-level code written/fixed/updated by Anthony & Jeroen), FreeRTOS (Jeroen, Muneeb)
Runtime libraries - Newlib (Joost, Anthony), uClibc (Tom, Joost), Floating Point & Division, math (Joost)
Compilers - HP VEX, GCC (IBM, Anthony, Joost), Cosy (Hugo), LLVM (Maurice, Hugo), Open64 (Joost)
Binutils - Assembler, linker, etc. (Anthony), VEXparse (Anthony, Jeroen)
Architectural Simulator (Joost)
Debug hardware, tools and interface (Jeroen)
Hardware design - VHDL (Jeroen)
ASIC manufacturing effort - core (Lennart), interface (Shizao), supported by Jeroen
33
Liquid Computing – Fault-tolerance
[Figure: timeline (t) of datapaths running Task 2, labeled "Protected"]
35
Image processing
FPGA overlay fabric
Streaming architecture
16x4 cores
194 MHz