CSC 480 Multiprocessor Programming, Spring 2012
Chapter 11 – Performance and Scalability
Dr. Dale E. Parson, week 12


Multithreading Overhead
- Multithreading always introduces overhead:
  - Costly synchronization code, even without contention.
  - Synchronization delays when contention occurs.
  - Hardware synchronization (flushes at barriers).
  - Increased context switching.
  - Thread scheduling.
  - Thread creation and destruction.
- How do we keep the hardware busy? What are the bottlenecks?
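One common answer to "how do we keep the hardware busy" is to size a worker pool to the number of available cores. A minimal sketch using the standard ExecutorService API; the class name and the dummy workload are illustrative, not from the course materials:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PoolSizing {
        public static void main(String[] args) {
            // Size the pool to the hardware: one worker per available core
            // is a reasonable starting point for CPU-bound tasks.
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(cores);
            for (int i = 0; i < 100; i++) {
                final int taskId = i;
                pool.submit(() -> {
                    // CPU-bound work goes here; keeping tasks independent
                    // keeps synchronization overhead low.
                    long sum = 0;
                    for (int j = 0; j < 1_000_000; j++) sum += j * taskId;
                    return sum;
                });
            }
            pool.shutdown();
        }
    }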

Scalability
- Scalability is the ability to improve throughput or capacity when adding computing resources -- CPUs, memory, long-term storage, or I/O bandwidth.
- When one bottleneck is removed, the bottleneck moves somewhere else.
- Throughput measures total data processed; latency measures the delay in responding to an event.
- Sometimes the two trade against each other.
- Use a better algorithm.

Avoid premature tradeoffs
- Make it correct, then fast.
- An initial correct solution, even if slower, helps you think through the problem, gives you reference output to check against when you make code improvements, and gives you some reusable code.
- It is like learning how to drive!
- Know your code libraries.

Amdahl’s Law
- Speedup <= 1.0 / (F + ((1.0 - F) / N))
  - F is the fraction of the computation that is serial.
  - N is the number of processors.
- Sources of serialization:
  - Intrinsic to the algorithm (e.g., partitioning in quicksort).
  - Synchronization (e.g., CyclicBarrier in the penny-dime example).
  - Access to shared data structures such as work queues.
  - Task submission, result processing, join.
  - Hardware serialization (e.g., memory, I/O bandwidth).
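A quick worked example, with hypothetical figures, shows how sharply even a small serial fraction caps speedup:

    public class Amdahl {
        // Upper bound on speedup for serial fraction f on n processors.
        static double speedup(double f, int n) {
            return 1.0 / (f + (1.0 - f) / n);
        }

        public static void main(String[] args) {
            // With 10% of the work serial, 16 processors give at most ~6.4x,
            // 256 processors give at most ~9.7x, and the limit is 10x.
            System.out.printf("F=0.10, N=16  -> %.2f%n", speedup(0.10, 16));
            System.out.printf("F=0.10, N=256 -> %.2f%n", speedup(0.10, 256));
            System.out.printf("F=0.10, N=inf -> %.2f%n", 1.0 / 0.10);
        }
    }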

Context switching
- Operating system calls and complex JVM code.
- Cache and page faults to load and flush a thread's working set (of data) on a switch.
- I/O-bound threads block more often than CPU-bound threads, resulting in more context switches.
- High kernel usage (> 10%) indicates heavy scheduling activity; it may be caused by frequent blocking due to I/O or lock contention.
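One rough, JVM-level way to see whether a thread is blocking rather than computing is to compare its CPU time against wall-clock time via the standard ThreadMXBean. A minimal sketch, where the mixed workload (a loop plus a sleep standing in for I/O) is purely illustrative:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    public class CpuVsWallTime {
        public static void main(String[] args) throws Exception {
            ThreadMXBean bean = ManagementFactory.getThreadMXBean();
            long cpuStart = bean.getCurrentThreadCpuTime();  // nanoseconds of CPU time
            long wallStart = System.nanoTime();

            // Some computation, then a blocking sleep standing in for I/O
            // or lock contention.
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) sum += i;
            Thread.sleep(200);

            long cpuUsed = bean.getCurrentThreadCpuTime() - cpuStart;
            long wallUsed = System.nanoTime() - wallStart;
            // A low CPU/wall ratio suggests the thread spends its time blocked,
            // which usually goes hand in hand with frequent context switches.
            System.out.printf("CPU %.0f ms, wall %.0f ms, ratio %.2f (sum=%d)%n",
                    cpuUsed / 1e6, wallUsed / 1e6, (double) cpuUsed / wallUsed, sum);
        }
    }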

Memory synchronization
- Memory barriers flush and invalidate registers, data caches, and write buffers, and stall execution pipelines.
- Uncontended synchronization overhead is low.
- The JVM run-time compiler can use escape analysis to identify when a "locked" object does not escape its thread, removing the lock.
- Compiler lock coarsening can merge adjacent requests for the same lock.
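To illustrate the escape-analysis point: when a synchronized object is confined to one method and never escapes, the JIT compiler is permitted to elide its locks, and adjacent requests for the same lock may also be coarsened into one. Whether either optimization actually happens depends on the JVM; this sketch only shows the shape of code that qualifies:

    public class LockElision {
        // The StringBuffer is confined to this method: it never escapes,
        // so the JIT may remove its internal synchronization entirely, or
        // at least coarsen the three adjacent locked append calls into one.
        static String label(int n) {
            StringBuffer sb = new StringBuffer();  // every method is synchronized
            sb.append("item-");
            sb.append(n);
            sb.append('!');
            return sb.toString();
        }

        public static void main(String[] args) {
            System.out.println(label(42));
        }
    }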

Memory sync and blocking
- Lock elision on locks to thread-confined objects is in the IBM JVM and slated for HotSpot 7.
- Synchronization creates traffic on memory buses.
- A clear, explicit Java Memory Model creates opportunities for compiler optimizations.
- Blocking overhead:
  - Spin waiting.
  - O.S. system calls and context-switching overhead.

Reducing lock contention
- Reduce the duration for which locks are held.
- Reduce the frequency of lock requests.
- Replace exclusive locks with coordination mechanisms that permit greater concurrency.
- Prefer a dataflow architecture when possible.
- Lock splitting: guard independent sets of shared variables with separate locks instead of a single lock.
- Lock striping: a fixed set of locks guards partitions of one data structure, avoiding the overhead of one lock per element.
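A minimal lock-striping sketch, modeled loosely on the StripedMap idea from Java Concurrency in Practice; the class name, stripe count, and bucket array here are illustrative. A fixed set of locks guards disjoint subsets of the buckets, so threads touching different stripes never contend:

    public class StripedCounterMap {
        private static final int N_LOCKS = 16;
        private final Object[] locks = new Object[N_LOCKS];
        private final long[] counts = new long[1024];  // bucket i guarded by locks[i % N_LOCKS]

        public StripedCounterMap() {
            for (int i = 0; i < N_LOCKS; i++) locks[i] = new Object();
        }

        public void increment(int bucket) {
            synchronized (locks[bucket % N_LOCKS]) {
                counts[bucket]++;
            }
        }

        public long get(int bucket) {
            synchronized (locks[bucket % N_LOCKS]) {
                return counts[bucket];
            }
        }
    }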

Hot fields, exclusive locks
- A hot field is a single field, such as a counter, that is accessed by many threads.
- Parallelize data structures when possible; merge them in a subset of threads as needed.
- Cache merged results if invalidating the cache is fast.
- Use alternatives to exclusive locks:
  - ReadWriteLock
  - Atomic variables and volatiles
  - Queues and library classes with reduced or split locks
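A small sketch of two of these alternatives (the class and field names are illustrative): an atomic variable replaces an exclusive lock around a hot counter, and a ReadWriteLock lets read-mostly state be read in parallel while still serializing the occasional writer:

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class HotFieldAlternatives {
        // Hot counter: an atomic variable avoids an exclusive lock
        // around a single heavily shared field.
        private final AtomicLong hits = new AtomicLong();

        // Read-mostly state: many readers proceed concurrently; only
        // the rare writer takes the exclusive write lock.
        private final ReadWriteLock lock = new ReentrantReadWriteLock();
        private String config = "default";

        public void recordHit() { hits.incrementAndGet(); }
        public long totalHits() { return hits.get(); }

        public String readConfig() {
            lock.readLock().lock();
            try { return config; } finally { lock.readLock().unlock(); }
        }

        public void updateConfig(String c) {
            lock.writeLock().lock();
            try { config = c; } finally { lock.writeLock().unlock(); }
        }
    }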

Monitoring CPU utilization
- top, mpstat, vmstat, iostat, sar, cpustat
- Process and resource profilers and low-level activity-register data analyzers. See the Summer 2010 report, final page.
- Also commercial code profilers for multithreading.
- Causes of CPU underutilization:
  - Insufficient load
  - I/O-bound processing
  - External bottlenecks
  - Lock contention

Miscellaneous
- Thread pools? Yes.
- Object pooling? No. Dynamic allocation and garbage collection are cheaper and easier than application-driven object pools and the application-level synchronization they require.
- Reducing context-switching overhead:
  - Make some portion of the application architecture CPU intensive, and partition it so it can be multithreaded.
  - Let some partitioned set of threads handle the buffered I/O; only they are candidates for blocking on I/O.
  - This may introduce the possibility of a serial bottleneck in the I/O threads, so use appropriate queuing.
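A minimal sketch of this partitioning idea; the names, sizes, and fake input source are illustrative. One dedicated I/O thread buffers records onto a bounded queue and a CPU-sized pool drains it, so only the reader thread can block on I/O:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    public class PartitionedPipeline {
        public static void main(String[] args) throws Exception {
            BlockingQueue<String> lines = new LinkedBlockingQueue<>(1024);
            int cores = Runtime.getRuntime().availableProcessors();
            ExecutorService cpuPool = Executors.newFixedThreadPool(cores);

            // Single I/O thread: the only thread allowed to block on input.
            // It can become a serial bottleneck, hence the bounded queue.
            Thread reader = new Thread(() -> {
                try {
                    for (int i = 0; i < 10_000; i++) {
                        lines.put("record-" + i);  // stands in for buffered I/O
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            reader.start();

            // CPU-bound consumers drain the queue in parallel.
            for (int w = 0; w < cores; w++) {
                cpuPool.submit(() -> {
                    try {
                        while (true) {
                            String rec = lines.poll(1, TimeUnit.SECONDS);
                            if (rec == null) break;  // crude end-of-input signal
                            // ... parse / compute on rec here ...
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            reader.join();
            cpuPool.shutdown();
        }
    }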