E81 CSE 532S: Advanced Multi-Paradigm Software Development
C++11 Concurrency Design
Chris Gill
Department of Computer Science and Engineering
Washington University in St. Louis

Dividing Work Between Threads

Static partitioning of data can be helpful
– Makes threads (mostly) independent, ahead of time
– Threads can read from and write to their own locations

Some partitioning of data is necessarily dynamic
– E.g., Quicksort uses a pivot at run-time to split up data
– May need to launch (or pass data to) a thread at run-time

Can also partition work by task-type
– E.g., hand off specific kinds of work to specialized threads
– E.g., a thread-per-stage pipeline that is efficient once primed

Number of threads to use is a key design challenge
– E.g., std::thread::hardware_concurrency() is only a starting point (blocking, scheduling, etc. also matter)
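As a minimal sketch of static partitioning (a parallel sum; the parallel_sum name and block layout are illustrative, not from the slides), the following uses std::thread::hardware_concurrency() as a starting point for the thread count:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Statically partition v into contiguous blocks, one per thread.
// Each thread writes only to its own slot in 'results', so the
// threads stay independent and need no locking.
long long parallel_sum(const std::vector<int>& v) {
    unsigned num_threads =
        std::max(1u, std::thread::hardware_concurrency());  // may report 0
    std::size_t block = v.size() / num_threads;
    std::vector<long long> results(num_threads, 0);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) {
        std::size_t first = i * block;
        std::size_t last = (i + 1 == num_threads) ? v.size() : first + block;
        threads.emplace_back([&v, &results, i, first, last] {
            results[i] = std::accumulate(v.begin() + first,
                                         v.begin() + last, 0LL);
        });
    }
    for (auto& t : threads)
        t.join();  // wait for every partition to finish
    return std::accumulate(results.begin(), results.end(), 0LL);
}

Because each thread owns a disjoint block and a private slot in results, the partitions proceed with no synchronization beyond the final joins.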

Factors Affecting Performance

Need at least as many threads as hardware cores
– Too few threads makes insufficient use of the resource
– Oversubscription increases overhead due to task switching
– Need to gauge for how long (and when) threads are active

Data contention and cache ping-pong
– Performance degrades rapidly as cache misses increase
– Need to design for low contention for cache lines
– Need to avoid false sharing of elements (in the same cache line)

Packing or spreading out data may be needed
– E.g., localize each thread's accesses
– E.g., separate a shared mutex from the data that it guards
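As a sketch of spreading data out, the struct below pads per-thread counters so they do not share a cache line (PaddedCounter and the 64-byte line size are assumptions for illustration; 64 bytes is typical on x86):

#include <atomic>
#include <thread>

// Unpadded counters packed into one array would share cache lines, so
// every write by one core invalidates the line in the other cores'
// caches ("cache ping-pong"). alignas(64) gives each counter its own
// (assumed 64-byte) cache line.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

PaddedCounter counters[4];  // one counter per thread, no false sharing

int main() {
    std::thread workers[4];
    for (int i = 0; i < 4; ++i)
        workers[i] = std::thread([i] {
            for (int n = 0; n < 1000000; ++n)
                counters[i].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers)
        w.join();
}

The same alignment trick applies to the mutex bullet above: placing a shared mutex on a different cache line from the data it guards keeps lock/unlock traffic from evicting the data itself.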

Additional Considerations

Exception safety
– Affects both lock-based and lock-free synchronization
– Use std::packaged_task and std::future to allow for an exception being thrown in a thread (see listing 8.3)

Scalability
– How much of the code is actually parallelizable?
– Various theoretical formulas (including Amdahl's law) apply

Hiding latency
– If nothing ever blocks, you may not need concurrency
– If something does, concurrency makes parallel progress

Improving responsiveness
– Giving each thread its own task may simplify and speed up tasks
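A minimal sketch of the std::packaged_task/std::future approach to exception safety (risky_work is an illustrative stand-in for the real work done in listing 8.3):

#include <future>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <utility>

// Work that may throw. When run via std::packaged_task, the exception
// is captured in the future's shared state instead of escaping the
// thread function.
int risky_work() {
    throw std::runtime_error("failed inside worker thread");
}

int main() {
    std::packaged_task<int()> task(risky_work);
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));  // invoke the task on another thread
    try {
        std::cout << "value = " << result.get() << '\n';  // get() rethrows here
    } catch (const std::exception& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    worker.join();
}

Had risky_work been passed to std::thread directly, the escaping exception would have called std::terminate; the packaged_task transports it safely to whichever thread calls get() on the future.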