Improved schedulability on the ρVEX polymorphic VLIW processor

Slides:

Advertisements

Similar presentations

© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.

Advertisements

Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures Pree Thiengburanathum Advanced computer architecture Oct 24,

TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.

Exploiting Unbalanced Thread Scheduling for Energy and Performance on a CMP of SMT Processors Matt DeVuyst Rakesh Kumar Dean Tullsen.

Real-Time Systems Scheduling Tool Developed by Daniel Ghiringhelli Advisor: Professor Jiacun Wang December 19, 2005.

11 1 Hierarchical Coarse-grained Stream Compilation for Software Defined Radio Yuan Lin, Manjunath Kudlur, Scott Mahlke, Trevor Mudge Advanced Computer.

CS 7810 Lecture 20 Initial Observations of the Simultaneous Multithreading Pentium 4 Processor N. Tuck and D.M. Tullsen Proceedings of PACT-12 September.

Instruction Level Parallelism (ILP) Colin Stevens.

How Multi-threading can increase on-chip parallelism

“Early Estimation of Cache Properties for Multicore Embedded Processors” ISERD ICETM 2015 Bangkok, Thailand May 16, 2015.

Pipelines for Future Architectures in Time Critical Embedded Systems By: R.Wilhelm, D. Grund, J. Reineke, M. Schlickling, M. Pister, and C.Ferdinand EEL.

Operating Systems for Reconfigurable Systems John Huisman ID:

Adaptive Cache Partitioning on a Composite Core Jiecao Yu, Andrew Lukefahr, Shruti Padmanabha, Reetuparna Das, Scott Mahlke Computer Engineering Lab University.

Real-Time Systems Mark Stanovich. Introduction System with timing constraints (e.g., deadlines) What makes a real-time system different? – Meeting timing.

Lecture 3 : Performance of Parallel Programs Courtesy : MIT Prof. Amarasinghe and Dr. Rabbah’s course note.

Scheduling Issues on a Heterogeneous Single ISA Multicore IRISA, France Robert Guziolowski, André Seznec. Contact: 1. M. Becchi and P.

Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.

Migration Cost Aware Task Scheduling Milestone Shraddha Joshi, Brian Osbun 10/24/2013.

Spark on Entropy : A Reliable & Efficient Scheduler for Low-latency Parallel Jobs in Heterogeneous Cloud Huankai Chen PhD Student at University of Kent.

Presenter: Yi-Ting Chung Fast and Scalable Hybrid Functional Verification and Debug with Dynamically Reconfigurable Co- simulation.

“Temperature-Aware Task Scheduling for Multicore Processors” Masters Thesis Proposal by Myname 1 This slides presents title of the proposed project State.

Real-Time Operating Systems RTOS For Embedded systems.

Parallel programs Inf-2202 Concurrent and Data-intensive Programming Fall 2016 Lars Ailo Bongo

RT-OPEX: Flexible Scheduling for Cloud-RAN Processing

CS161 – Design and Architecture of Computer

vCAT: Dynamic Cache Management using CAT Virtualization

Nios II Processor: Memory Organization and Access

Computer Architecture: Parallel Task Assignment

Memory COMPUTER ARCHITECTURE

Adaptive Cache Partitioning on a Composite Core

Prabhat Kumar Saraswat Paul Pop Jan Madsen

Xiaodong Wang, Shuang Chen, Jeff Setter,

Design-Space Exploration

Networks and Operating Systems: Exercise Session 2

Wayne Wolf Dept. of EE Princeton University

Simultaneous Multithreading

Unit OS9: Real-Time and Embedded Systems

“Temperature-Aware Task Scheduling for Multicore Processors”

FPGA: Real needs and limits

Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8)

/ Computer Architecture and Design

Bank-aware Dynamic Cache Partitioning for Multicore Architectures

Many-core Software Development Platforms

Chapter 6: CPU Scheduling

Anne Pratoomtong ECE734, Spring2002

Levels of Parallelism within a Single Processor

Computer Architecture Lecture 4 17th May, 2006

Liquid computing – the rVEX approach

Peter Poplavko, Saddek Bensalem, Marius Bozga

Chapter 4: Threads.

Yingmin Li Ting Yan Qi Zhao

CPU SCHEDULING.

Liquid Architectures: Linux on rVEX

CPU scheduling decisions may take place when a process:

Chapter 6: CPU Scheduling

Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.

Hossein Omidian, Guy Lemieux

/ Computer Architecture and Design

Embedded Computer Architecture 5SAI0 Chip Multi-Processors (ch 8)

CSCI1600: Embedded and Real Time Software

Many-Core Graph Workload Analysis

Operating System Introduction.

Chapter 10 Multiprocessor and Real-Time Scheduling

Levels of Parallelism within a Single Processor

Hardik Shah, Kai Huang and Alois Knoll

Realizing Closed-loop, Online Tuning and Control for Configurable-Cache Embedded Systems: Progress and Challenges Islam S. Badreldin*, Ann Gordon-Ross*,

Real-Time Process Scheduling Concepts, Design and Implementations

FAST: Frequency-Aware Static Timing Analysis

Real-Time Process Scheduling Concepts, Design and Implementations

Presentation transcript:

Improved schedulability on the ρVEX polymorphic VLIW processor Joost J. Hoozemans Computer Engineering Laboratory Delft University of Technology, The Netherlands Friday, 16 November 2018 RTCSA 2017, Hsingchu, Taiwan

Outline Intro: ρVEX polymorphic (adaptable) VLIW processor Static realtime scheduling Challenge: When to perform adaptations Approach Results Conclusions RTCSA 2017, Hsingchu, Taiwan

Observation Past Future Embedded workloads becoming increasingly dynamic Intensity (nr. of tasks) Characteristics (amount, type of parallelism) Requirements (criticality) RTCSA 2017, Hsingchu, Taiwan

Dynamic workloads call for dynamic computing platforms Realization Dynamic workloads call for dynamic computing platforms RTCSA 2017, Hsingchu, Taiwan

Dynamic workloads call for dynamic computing platforms Realization & Vision Dynamic workloads call for dynamic computing platforms Implementing a system that constantly adapts to all its running tasks RTCSA 2017, Hsingchu, Taiwan

Homogeneous multicores Old days Homogeneous multicores RTCSA 2017, Hsingchu, Taiwan

Current state-of-the-art Heterogeneous multicores RTCSA 2017, Hsingchu, Taiwan

Current state-of-the-art Heterogeneous multicores RTCSA 2017, Hsingchu, Taiwan

Heterogeneous Multicore processors Task 2 Underutilization Core A Task 1 Save Task 1 Restore Task 2 TODO: draw or explain that vertical ‘axis’ is ILP/core utilization Unused ILP Task 2 Save Task 2 Restore Task 1 Task 1 Core B t Migration penalty! RTCSA 2017, Hsingchu, Taiwan

Solution: ρVEX adaptible VLIW processor Dynamic processor Assigning datapaths to threads Datapath 1 & 2 TODO: not clear what the datapaths are here Datapath 3 & 4 Datapath 5 & 6 Datapath 7 & 8 t RTCSA 2017, Hsingchu, Taiwan

Solution: ρVEX adaptible VLIW processor Dynamic processor Assigning datapaths to threads Task 2 Task 2 Task 2 Task 2 Task 4 Task 3 Task 1 Task 3 t 5 clock cycles adaptation penalty RTCSA 2017, Hsingchu, Taiwan

Datapath Control Datapath Datapath Control Datapath Datapath Control Task 1 State storage Cache block 1 Control Datapath Datapath Task 2 State storage Cache block 2 Control Datapath Similar to SMT, but with performance isolation Datapath Task 3 State storage Cache block 3 Control Datapath Datapath Task 4 State storage Cache block 4 Control Datapath RTCSA 2017, Hsingchu, Taiwan

Datapath Control Datapath Datapath Control Datapath Datapath Control Task 1 State storage Cache block 1 Control Datapath Datapath Task 2 State storage Cache block 2 Control Datapath Similar to SMT, but with performance isolation Datapath Task 3 State storage Cache block 3 Control Datapath Datapath Task 4 State storage Cache block 4 Control Datapath RTCSA 2017, Hsingchu, Taiwan

Datapath Control Datapath Datapath Control Datapath Datapath Control Task 1 State storage Cache block 1 Control Datapath Datapath Task 2 State storage Cache block 2 Control Datapath Similar to SMT, but with performance isolation Datapath Task 3 State storage Cache block 3 Control Datapath Datapath Task 4 State storage Cache block 4 Control Datapath RTCSA 2017, Hsingchu, Taiwan

Datapath Control Datapath Datapath Control Datapath Datapath Control Task 1 State storage Cache block 1 Control Datapath Datapath Task 2 State storage Cache block 2 Control Datapath Similar to SMT, but with performance isolation Datapath Task 3 State storage Cache block 3 Control Datapath Datapath Task 4 State storage Cache block 4 Control Datapath RTCSA 2017, Hsingchu, Taiwan

Datapath Control Datapath Datapath Control Datapath Datapath Control Task 1 State storage Cache block 1 Control Datapath Datapath Task 2 State storage Cache block 2 Control Datapath Similar to SMT, but with performance isolation Datapath Task 3 State storage Cache block 3 Control Datapath Datapath Task 4 State storage Cache block 4 Control Datapath RTCSA 2017, Hsingchu, Taiwan

ρVEX VLIW processor – Advantages High single-thread performance High multi-thread throughput Low configuration overhead (no migration penalties) Low interrupt latency RTCSA 2017, Hsingchu, Taiwan

ρVEX VLIW processor – Advantages High single-thread performance High multi-thread throughput Low configuration overhead (no migration penalties) Low interrupt latency Static Real-time Schedulability RTCSA 2017, Hsingchu, Taiwan

Static real-time schedulability Advantage 1: Each task has n configuration options Each with a distinct WCET This provides more flexibility in creating a valid schedule. If one configuration does not meet the deadline, you can run it in another configuration. RTCSA 2017, Hsingchu, Taiwan

Static real-time schedulability Advantage 1: Each task has n configuration options Each with a distinct WCET In this case, only 2 of the 3 options can create a valid schedule. Other task with lower utilizations can run in 2-issue RTCSA 2017, Hsingchu, Taiwan

Static real-time schedulability Advantage 2: Adaptations can be performed at any time Each task can run in their preferred configuration As we will see shortly, this situation where periods are aligned is not realistic and we will present a solution t RTCSA 2017, Hsingchu, Taiwan

Static real-time schedulability Advantage 3: Compute resources can always be fully utilized (as opposed to heterogeneous systems) t RTCSA 2017, Hsingchu, Taiwan

Static real-time schedulability Advantage 3: Compute resources can always be fully utilized (as opposed to heterogeneous systems) Idle! Migration penalty! t RTCSA 2017, Hsingchu, Taiwan

Challenge: When to perform adaptations RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations Problem 1: execution phases The problem is; we don’t know these phases. You can measure it but then it is too late. (and it would probably be too complex to schedule if we did) RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations Problem 1: execution phases The problem is; we don’t know these phases. You can measure it but then it is too late. (and it would probably be too complex to schedule if we did) 8-issue 4-issue RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations Problem 1: execution phases The problem is; we don’t know these phases. You can measure it but then it is too late. (and it would probably be too complex to schedule if we did) 8-issue 4-issue 4-issue 8-issue RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations Problem 1: execution phases A task must always be executed in the same configuration (otherwise, the WCET may not be valid) RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations Problem 1: execution phases A task must always be executed in the same configuration (otherwise, the WCET may not be valid) RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations Problem 2: different periods If a task must always run in the same config, it quickly becomes impossible to create a static schedule that guarantees this and also meets all the deadlines. Even if this is possible, finding such a schedule for the entire hyperperiod results in a very large search space RTCSA 2017, Hsingchu, Taiwan

When to perform adaptations ? When to perform adaptations Problem 2: different periods RTCSA 2017, Hsingchu, Taiwan

Approach Solution: divide execution into ‘rounds’ of fixed length RTCSA 2017, Hsingchu, Taiwan

Approach 1: Measure WCETs for each configuration RTCSA 2017, Hsingchu, Taiwan

Approach 1: Measure WCETs for each configuration RTCSA 2017, Hsingchu, Taiwan

Approach Issue width: 2 4 8 125% 100% 80% 30% 20% 15% 30% 25% 25% 110% 2: calculate utilizations (for each configuration) Issue width: 2 4 8 125% 100% 80% 30% 20% 15% 30% 25% 25% 110% 70% 60% RTCSA 2017, Hsingchu, Taiwan

Approach 3: Choose a round length, calculate resource utilization (issue slots, time) RTCSA 2017, Hsingchu, Taiwan

Approach 4: Fit shapes into the area representing available resources Time, issue slots (datapaths) RTCSA 2017, Hsingchu, Taiwan

Approach

Approach

Results - Schedulability Comparing platforms Equal computational resources (8 datapaths) 8-issue 4-issue 2-issue 4-issue 2-issue 4-issue 2-issue 2-issue 2-issue 2-issue Homogeneous Multicore (2x 4-issue) Homogeneous Multicore (4x 2-issue) Heterogeneous Multicore (1x 4-issue, 2x 2-issue) Single core (1x 8-issue) RTCSA 2017, Hsingchu, Taiwan

Results - Schedulability nTasks = 4 Total Utilization RTCSA 2017, Hsingchu, Taiwan

Results - Schedulability nTasks = 4 15% Total Utilization RTCSA 2017, Hsingchu, Taiwan

Results - Schedulability Advantage increases with higher utilization RTCSA 2017, Hsingchu, Taiwan

Results - Schedulability Advantage increases with higher utilization RTCSA 2017, Hsingchu, Taiwan

Results - Schedulability Advantage increases with higher utilization RTCSA 2017, Hsingchu, Taiwan

Mixed-criticality workloads Full resource utilization (better performance for non-critical tasks) TODO: split and build up RTCSA 2017, Hsingchu, Taiwan

Conclusions The adaptable processor: Provides improved schedulability compared to heterogeneous and homogeneous multicore systems with equal computational resources Improvements of at least 15% for task graphs with high utilization Prevents migration penalties Allows full resource utilization RTCSA 2017, Hsingchu, Taiwan

http://rvex.ewi.tudelft.nl RTCSA 2017, Hsingchu, Taiwan

Platform http://rvex.ewi.tudelft.nl Open-source (academic) Highly generic VHDL design Debugging hardware, interface, tools Fast architectural simulator Binutils, Compilers, Libraries OS support (uCLinux, FreeRTOS) MiBench, PolyBench, SPECint 2006 Zedboard, ML605, VC707 http://rvex.ewi.tudelft.nl RTCSA 2017, Hsingchu, Taiwan

Platform http://rvex.ewi.tudelft.nl Open-source (academic) Labs, Tutorials available Embedded Computer Architecture (Msc. Course) (VLIW) Computer Architecture Research Msc. Thesis topics VHDL design Tools, compilers, Operating Systems, Libraries, benchmarks, … http://rvex.ewi.tudelft.nl RTCSA 2017, Hsingchu, Taiwan