QUINN GAUMER ECE 259/CPS 221 Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler

Outline
- Definitions
- Motivation
- Algorithm
- Evaluation

Definitions
- Fair X (e.g., fair IPC, fair miss rate): the value of metric X under a fair (equal) cache allocation
- Slow schedule: the thread runs with high-miss-rate co-runners
- Fast schedule: the thread runs with low-miss-rate co-runners

Motivation
- The performance of a program running on a chip multiprocessor depends on its co-runners
- Shared caches aren’t necessarily shared fairly
- Does it really matter? If one process suffers, another gains…

Cache-Fair Algorithm
- Guarantees that a program runs as fast as it would if cache resources were split equally
- Does not actually change the cache allocation
- Threads that get less cache space will have lower IPC
- A thread whose IPC is higher than its fair IPC should run for less time
- A thread whose IPC is lower than its fair IPC should be kept on the processor longer

Cache-Fair Algorithm What does it actually need to do?
- Maintain an approximate fair IPC for each thread
- Keep track of each thread's current IPC
- Compensate through the scheduler (the per-thread bookkeeping this implies is sketched below)
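
A minimal sketch, in C, of the per-thread state such a scheduler would need to keep. All type and field names here are illustrative assumptions, not the paper's actual data structures:

    /* Illustrative per-thread state for a cache-fair scheduler.
     * Names are hypothetical; they only mirror the bookkeeping
     * described on this slide. */
    enum cf_class { CACHE_FAIR, BEST_EFFORT };

    struct cf_thread_state {
        enum cf_class cls;        /* scheduling class of the thread               */
        double fair_ipc;          /* modelled IPC under equal cache allocation     */
        double measured_ipc;      /* IPC observed over the last scheduling window  */
        long slice_adjust_ticks;  /* CPU-slice correction to apply next window     */
    };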

Cache-Fair Algorithm Two classes of threads:
- Cache-fair: threads regulated so that their IPC matches their fair IPC
- Best-effort: threads that absorb the compensating adjustments made on behalf of cache-fair threads

Fair IPC Model For each cache-fair thread, determine its fair cache miss rate:
- Run the thread with several co-runners
- Measure the cache miss rates
- Use linear regression to estimate the fair miss rate
The fair cache miss rate is then converted into a fair IPC. All of this is done online (a sketch of the regression step follows).
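
A sketch of the regression step, assuming each sample pairs the combined miss rate of the co-runners with the thread's own observed miss rate; fitting a line and evaluating it at a chosen "fair" co-runner miss rate yields the fair-miss-rate estimate. How that fair point is chosen, and how the fair miss rate is then converted into a fair IPC, follows the paper's analytical model and is not reproduced here:

    #include <stddef.h>

    /* Ordinary least-squares fit of y = a + b*x over n samples. */
    static void ols_fit(const double *x, const double *y, size_t n,
                        double *a, double *b)
    {
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        for (size_t i = 0; i < n; i++) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        *b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        *a = (sy - *b * sx) / n;
    }

    /* corunner_mr[i]: combined miss rate of the co-runners in sample i.
     * thread_mr[i]:   this thread's miss rate in the same sample.
     * fair_corunner_mr: the co-runner miss rate corresponding to a fair
     * (equal) cache division -- supplied by the model, assumed here. */
    double estimate_fair_miss_rate(const double *corunner_mr,
                                   const double *thread_mr, size_t n,
                                   double fair_corunner_mr)
    {
        double a, b;
        ols_fit(corunner_mr, thread_mr, n, &a, &b);
        return a + b * fair_corunner_mr;
    }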

Implementation
Sampling
- Run the thread with various co-runners
- Determine the cache miss rates of all threads
- Use linear regression to determine the fair IPC
Scheduling (sketched below)
- Compare each thread's IPC with its fair IPC
- Modify the cache-fair thread's CPU slice accordingly
- Adjust a best-effort thread's slice to compensate
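
A hedged sketch of the scheduling phase, assuming a simple proportional rule: a cache-fair thread running ahead of its fair IPC gives up part of its slice, one running behind receives extra time, and a best-effort thread absorbs the opposite adjustment. The constant and the proportional rule are assumptions, not the paper's exact policy:

    #define CF_BASE_SLICE_TICKS 10   /* illustrative default timeslice length */

    /* Returns the signed number of ticks to add to a cache-fair thread's
     * next CPU slice; the caller applies the opposite adjustment to a
     * best-effort thread so that total CPU time is conserved. */
    long cf_slice_adjust(double measured_ipc, double fair_ipc)
    {
        if (fair_ipc <= 0.0)
            return 0;   /* no fair-IPC estimate yet: leave the slice unchanged */

        /* Relative error: positive when the thread runs faster than fair. */
        double err = (measured_ipc - fair_ipc) / fair_ipc;

        /* Faster than fair -> shorter slice; slower than fair -> longer slice. */
        return (long)(-err * CF_BASE_SLICE_TICKS);
    }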

Evaluation What should the performance metric be? IPC alone can't be used, since only the scheduler is being changed.
- Performance variability: the difference between running alongside high- and low-contention co-runner threads
- Absolute performance: the difference between the default scheduler and the cache-fair scheduler
(One possible formulation of these metrics follows.)
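
One plausible way to write the variability metric down (the slides describe it only informally): for program i, with completion times T measured under the slow and fast schedules,

\[
\mathrm{variability}_i = \frac{T_i^{\mathrm{slow}} - T_i^{\mathrm{fast}}}{T_i^{\mathrm{fast}}}
\]

with absolute performance compared analogously between the default and cache-fair schedulers.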

Program Isolation Measured as the difference between a program's performance on the fast and slow schedules.
- The cache-fair scheduler's variability is always lower than the default scheduler's
- Cache-fair variability is always less than 4%

Absolute Performance Normalized to the fast schedule under the default scheduler.
- High-IPC programs experience a speedup; low-IPC programs experience a slowdown. What causes this?
- Overall absolute performance is competitive

Aggregate IPC All programs are run on the slow schedule.
- When they do not meet their fair IPC, they are compensated
- A slow schedule means that co-runners use more of the cache

Side Effects Best-effort threads are also affected. The side effects can be limited by increasing the number of cache-fair and best-effort threads.

Questions?