Real-Time Performance and Middleware for Multiprocessor and Multicore Linux Platforms* Yuanfang Zhang, Christopher Gill, and Chenyang Lu Department of.

Slides:

Advertisements

Similar presentations

Threads, SMP, and Microkernels

Advertisements

Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures Pree Thiengburanathum Advanced computer architecture Oct 24,

CS4315A. Berrached:CMS:UHD1 CPU Scheduling Chapter 7.

An introduction to: The uRT51 Microprocessor and Real-Time Programming Suite.

Exploiting Graphics Processors for High- performance IP Lookup in Software Routers Author: Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu.

Silberschatz, Galvin and Gagne ©2009Operating System Concepts – 8 th Edition Chapter 4: Threads.

Soft Real-Time Semi-Partitioned Scheduling with Restricted Migrations on Uniform Heterogeneous Multiprocessors Kecheng Yang James H. Anderson Dept. of.

Chapter 13 Embedded Systems

Computer Science Deadline Fair Scheduling: Bridging the Theory and Practice of Proportionate-Fair Scheduling in Multiprocessor Servers Abhishek Chandra.

Task scheduling What are the goals of a modern operating system scheduler, and how does Linux achieve them?

Using JetBench to Evaluate the Efficiency of Multiprocessor Support for Parallel Processing HaiTao Mei and Andy Wellings Department of Computer Science.

23 September 2004 Evaluating Adaptive Middleware Load Balancing Strategies for Middleware Systems Department of Electrical Engineering & Computer Science.

The Design and Performance of A Real-Time CORBA Scheduling Service Christopher Gill, David Levine, Douglas Schmidt.

Exploring the Tradeoffs of Configurability and Heterogeneity in Multicore Embedded Systems + Also Affiliated with NSF Center for High- Performance Reconfigurable.

Authors: Tong Li, Dan Baumberger, David A. Koufaty, and Scott Hahn [Systems Technology Lab, Intel Corporation] Source: 2007 ACM/IEEE conference on Supercomputing.

CPU Scheduling - Multicore. Reading Silberschatz et al: Chapter 5.5.

Computer System Architectures Computer System Software

Silberschatz, Galvin and Gagne ©2011Operating System Concepts Essentials – 8 th Edition Chapter 4: Threads.

Silberschatz, Galvin and Gagne ©2009Operating System Concepts – 8 th Edition Chapter 4: Threads.

1 Previous lecture review n Out of basic scheduling techniques none is a clear winner: u FCFS - simple but unfair u RR - more overhead than FCFS may not.

Introduction to Real-Time Systems

Scheduling for Reliable Execution in Autonomic Systems* Terry Tidwell, Robert Glaubius, Christopher Gill, and William D. Smart {ttidwell, rlg1, cdgill,

Fast Multi-Threading on Shared Memory Multi-Processors Joseph Cordina B.Sc. Computer Science and Physics Year IV.

Practical Schedulability Analysis for Generalized Sporadic Tasks in Distributed Real-Time Systems Yuanfang Zhang 1, Donald K. Krecker 2, Christopher Gill.

Reconfigurable Real-Time Middleware for Distributed Cyber-Physical Systems with Aperiodic Events Yuanfang Zhang, Christopher Gill, Chenyang Lu Department.

1 Reducing Queue Lock Pessimism in Multiprocessor Schedulability Analysis Yang Chang, Robert Davis and Andy Wellings Real-time Systems Research Group University.

(1) Scheduling for Multithreaded Chip Multiprocessors (Multithreaded CMPs)

Real-Time CORBA By Christopher Bolduc. What is Real-Time? Real-time computing is the study of hardware and software systems that are subject to a “real-

1 Process Scheduling in Multiprocessor and Multithreaded Systems Matt Davis CS5354/7/2003.

1 Threads, SMP, and Microkernels Chapter Multithreading Operating system supports multiple threads of execution within a single process MS-DOS.

Multithreaded Programing. Outline Overview of threads Threads Multithreaded Models  Many-to-One  One-to-One  Many-to-Many Thread Libraries  Pthread.

1 Real-Time Scheduling. 2Today Operating System task scheduling –Traditional (non-real-time) scheduling –Real-time scheduling.

Full and Para Virtualization

Martin Kruliš by Martin Kruliš (v1.1)1.

Shouqing Hao Institute of Computing Technology, Chinese Academy of Sciences Processes Scheduling on Heterogeneous Multi-core Architecture.

Lecture 27 Multiprocessor Scheduling. Last lecture: VMM Two old problems: CPU virtualization and memory virtualization I/O virtualization Today Issues.

Silberschatz, Galvin and Gagne ©2013 Operating System Concepts – 9 th Edition Chapter 4: Threads.

FLARe: a Fault-tolerant Lightweight Adaptive Real-time Middleware for Distributed Real-time and Embedded Systems Dr. Aniruddha S. Gokhale

Page 1 2P13 Week 1. Page 2 Page 3 Page 4 Page 5.

Interrupts and Interrupt Handling David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis St. Louis, MO

Chapter 4: Threads 羅習五. Chapter 4: Threads Motivation and Overview Multithreading Models Threading Issues Examples – Pthreads – Windows XP Threads – Linux.

Silberschatz, Galvin and Gagne ©2009Operating System Concepts – 8 th Edition Chapter 4: Threads.

Chapter 4: Threads Modified by Dr. Neerja Mhaskar for CS 3SH3.

Chapter 4: Threads.

Processes and threads.

Scheduling of Non-Real-Time Tasks in Linux (SCHED_NORMAL/SCHED_OTHER)

Linux Scheduler.

Midterm Review Chris Gill CSE 422S - Operating Systems Organization

Uniprocessor Scheduling

Unit OS9: Real-Time and Embedded Systems

Task Scheduling for Multicore CPUs and NUMA Systems

Chapter 4: Threads 羅習五.

Overview of the Lab 2 Assignment: Linux Scheduler Profiling

Real-time Software Design

Transparent Adaptive Resource Management for Middleware Systems

Improved schedulability on the ρVEX polymorphic VLIW processor

Chapter 4: Threads.

Department of Computer Science University of California, Santa Barbara

Chapter 4: Threads.

Modified by H. Schulzrinne 02/15/10 Chapter 4: Threads.

CHAPTER 4:THreads Bashair Al-harthi OPERATING SYSTEM

Chapter 6: CPU Scheduling

Multithreaded Programming

Midterm Review Brian Kocoloski

Chapter 4: Threads & Concurrency

Chapter 4: Threads.

Scheduling of Regular Tasks in Linux

Department of Computer Science University of California, Santa Barbara

Scheduling of Regular Tasks in Linux

Presentation transcript:

Real-Time Performance and Middleware for Multiprocessor and Multicore Linux Platforms* Yuanfang Zhang, Christopher Gill, and Chenyang Lu Department of Computer Science and Engineering Washington University, St. Louis, MO, USA {yfzhang, cdgill, 15 th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2009) August , 2009, Beijing, China *This research was supported in part by NSF grants CCF (EHS), CCF (CAREER), and CNS (CAREER)

2 - Zhang et al. – 10/8/2015 Motivation and Contributions Trend towards multi-processor and multi-core platforms affects both OS and middleware »Techniques designed for uni-processors need revisiting This research makes 3 main contributions to real- time systems on multi-processor platforms » A performance evaluation of relevant Linux features »MC-ORB middleware designed for MC/MP platforms »Evaluation of MC-ORB’s multi-core aware RT performance

3 - Zhang et al. – 10/8/2015 Background and Related Work Linux 2.6 introduced SMP and multi-core support »Linux added the Completely Fair Scheduler (CFS) »However, many deployed platforms predate »We studied Linux as a representative compromise Related research: modifying Linux, RT middleware »We assume unmodified COTS Linux as our middleware design point, for highly portable real-time performance »The differing trade-offs for uni-processor vs. multiprocessor platforms motivate new middleware designs

4 - Zhang et al. – 10/8/2015 Linux Performance: Clock Differences I We first evaluated clock differences between cores »How well do platform/Linux maintain synchronization? »We used RDTSC instruction to record clock ticks on each core We bounced a message back and forth between two cores »Used arrival TSCs (x, y, z) to measure round trip delay (RTD) »The results show that the cores’ frequencies were well matched

5 - Zhang et al. – 10/8/2015 Linux Performance: Clock Differences II We then estimated the cores’ temporal offsets as δ 0 = 2y 1 –x 0 – z 0 ; δ 1 = 2y 0 –x 1 –z 1 »Figures on the right show calculated results Upper: as measured at each core Lower: reverse signs for core 0 (shows consistent views of offset) Insight 1 »Though frequencies matched well, avg. offset was ~1.3μs »Motivates measuring offsets in our subsequent analyses

6 - Zhang et al. – 10/8/2015 Linux Performance: Load Balancing Can thread affinity thwart (bad) Linux rebalancing? »We ran sets of 10 vs. 30 tasks (all bound to one core to prevent rebalancing), with total utilizations of 0.6 vs. 1.0 Insight 2 »Though overhead is small and amortized, compiling kernels with rebalancing off appears to be a preferable method TasksUtilizationImbalance s detected in 5 min Overhead per imbalance (ns) Overhead (total μs) MinimumMeanMaximum

7 - Zhang et al. – 10/8/2015 Linux Performance: Migration Strategies Two key migration strategies »Thread migrates itself »Separate manager thread migrates it Thread state  mechanisms/cost »Affinity mask is always updated »For running thread, changes run queues, may invoke scheduler Case 1: a running thread modifies its own affinity Case 3: a separate manager thread modifies a sleeping thread’s affinity Case 2: a separate manager thread modifies a running thread’s affinity

8 - Zhang et al. – 10/8/2015 Linux Performance: Migration Costs Insight 3 »Every strategy risks a non- negligible thread migration cost »Motivates binding task threads into core-specific thread pools »Motivates an ORB architecture with a separate manager thread (next) self migration (~ 16 to 45 μs) manager migrates sleeping thread (~ 4 to 10 μs) manager migrates running thread (~ 18 to 36 μs)

9 - Zhang et al. – 10/8/2015 Conventional Middleware Architecture Traditional single-CPU approach benefits from leader/followers etc. to reduce costly hand-offs »E.g., TAO, nORB However, multiple cores increase risk of migration 1. Leader invokes TA (and AC) for task 2. Picks new leader 3. New leader may need to move old 4. Old leader runs the task (on the appropriate core)

10 - Zhang et al. – 10/8/2015 MC-ORB Middleware Architecture In contrast, MC-ORB’s threading architecture leverages hand-offs to avoid thread migrations »Key trade off: copying/locking costs vs. migration costs 1. Request is queued 2. Manager thread reads requests in priority order 3. Invokes TA w/AC 4. Manager picks thread from pool 5. Thread runs task

11 - Zhang et al. – 10/8/2015 Real-Time ORB Performance Evaluation To gauge performance costs of our middleware architecture we examined four key issues »Allocate on same vs. other core (as manager thread) »Thread available vs. migration needed »Reallocation is vs. is not required to allocate task »New task is admitted vs. rejected We evaluated our middleware architecture both with (MC-ORB) and without (MC-ORB*) rejection »MC-ORB* compared to nORB (designed for uniprocessors) »Varied utilization granularity & magnitude (10 task sets) »We measured how many of the task sets missed a deadline

12 - Zhang et al. – 10/8/2015 Overheads for MC-ORB’s Extensions (μs) Scenarios used for Overhead Evaluation 1. New task on same core as manager 2. New task on different core (similar cost to 1) 3. (Sleeping) thread moved from other core to run new task 4. (All) running tasks reallocated to make room for new task 5. The new task is rejected (low cost, but it’s pure overhead) ScenarioMinimumMeanMaximum

13 - Zhang et al. – 10/8/2015 Fraction of Workloads w/ Deadline Misses With rejection, >94% of tasks were admitted by MC-ORB and all admitted tasks met all deadlines Without rejection (where + shows need for AC) MC-ORB* »Outperformed nORB in 6 cases (green) »Performed the same as nORB in 4 cases (grey) »Underperformed nORB in 2 cases (red) »Less balanced workloads emphasize MC-ORB* improvement over nORB Total UtilizationORB Balance Factor nORB MC-ORB* nORB MC-ORB* nORB MC-ORB*

14 - Zhang et al. – 10/8/2015 Concluding Remarks COTS OS evaluations »Measurement on specific target platforms is crucial »Behaviors of hardware and OS mechanisms are important Middleware architectures »OS evaluations establish design trade-off parameters »Prior design decisions may be reversed on new platforms Performance evaluations bear out our new design »Even w/out admission control, MC-ORB architecture helps »With AC admitted high utilization, and met all deadlines MC-ORB open-source download & build instructions »