E81 CSE 532S: Advanced Multi-Paradigm Software Development
C++11 Concurrency Design
Chris Gill
Department of Computer Science and Engineering
Washington University in St. Louis

Dividing Work Between Threads

Static partitioning of data can be helpful
– Makes threads (mostly) independent, ahead of time
– Threads can read from and write to their own locations

Some partitioning of data is necessarily dynamic
– E.g., Quicksort uses a pivot at run-time to split up data
– May need to launch (or pass data to) a thread at run-time

Can also partition work by task-type
– E.g., hand off specific kinds of work to specialized threads
– E.g., a thread-per-stage pipeline that is efficient once primed

Number of threads to use is a key design challenge
– E.g., std::thread::hardware_concurrency() is only a starting point (blocking, scheduling, etc. also matter)
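As a minimal sketch of static partitioning (a parallel sum; the parallel_sum name and block layout are illustrative, not from the slides), the following uses std::thread::hardware_concurrency() as a starting point for the thread count:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Statically partition v into contiguous blocks, one per thread.
// Each thread writes only to its own slot in 'results', so the
// threads stay independent and need no locking.
long long parallel_sum(const std::vector<int>& v) {
    unsigned num_threads =
        std::max(1u, std::thread::hardware_concurrency());  // may report 0
    std::size_t block = v.size() / num_threads;
    std::vector<long long> results(num_threads, 0);
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) {
        std::size_t first = i * block;
        std::size_t last = (i + 1 == num_threads) ? v.size() : first + block;
        threads.emplace_back([&v, &results, i, first, last] {
            results[i] = std::accumulate(v.begin() + first,
                                         v.begin() + last, 0LL);
        });
    }
    for (auto& t : threads)
        t.join();  // wait for every partition to finish
    return std::accumulate(results.begin(), results.end(), 0LL);
}

Because each thread owns a disjoint block and a private slot in results, the partitions proceed with no synchronization beyond the final joins.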

Factors Affecting Performance

Need at least as many threads as hardware cores
– Too few threads makes insufficient use of the resource
– Oversubscription increases overhead due to task switching
– Need to gauge for how long (and when) threads are active

Data contention and cache ping-pong
– Performance degrades rapidly as cache misses increase
– Need to design for low contention for cache lines
– Need to avoid false sharing of elements (in the same cache line)

Packing or spreading out data may be needed
– E.g., localize each thread's accesses
– E.g., separate a shared mutex from the data that it guards
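As a sketch of spreading data out, the struct below pads per-thread counters so they do not share a cache line (PaddedCounter and the 64-byte line size are assumptions for illustration; 64 bytes is typical on x86):

#include <atomic>
#include <thread>

// Unpadded counters packed into one array would share cache lines, so
// every write by one core invalidates the line in the other cores'
// caches ("cache ping-pong"). alignas(64) gives each counter its own
// (assumed 64-byte) cache line.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

PaddedCounter counters[4];  // one counter per thread, no false sharing

int main() {
    std::thread workers[4];
    for (int i = 0; i < 4; ++i)
        workers[i] = std::thread([i] {
            for (int n = 0; n < 1000000; ++n)
                counters[i].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers)
        w.join();
}

The same alignment trick applies to the mutex bullet above: placing a shared mutex on a different cache line from the data it guards keeps lock/unlock traffic from evicting the data itself.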

Additional Considerations

Exception safety
– Affects both lock-based and lock-free synchronization
– Use std::packaged_task and std::future to allow for an exception being thrown in a thread (see listing 8.3)

Scalability
– How much of the code is actually parallelizable?
– Various theoretical formulas (including Amdahl's law) apply

Hiding latency
– If nothing ever blocks, you may not need concurrency
– If something does, concurrency makes parallel progress

Improving responsiveness
– Giving each thread its own task may simplify and speed up tasks
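A minimal sketch of the std::packaged_task/std::future approach to exception safety (risky_work is an illustrative stand-in for the real work done in listing 8.3):

#include <future>
#include <iostream>
#include <stdexcept>
#include <thread>
#include <utility>

// Work that may throw. When run via std::packaged_task, the exception
// is captured in the future's shared state instead of escaping the
// thread function.
int risky_work() {
    throw std::runtime_error("failed inside worker thread");
}

int main() {
    std::packaged_task<int()> task(risky_work);
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));  // invoke the task on another thread
    try {
        std::cout << "value = " << result.get() << '\n';  // get() rethrows here
    } catch (const std::exception& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    worker.join();
}

Had risky_work been passed to std::thread directly, the escaping exception would have called std::terminate; the packaged_task transports it safely to whichever thread calls get() on the future.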