Rules for Designing Multithreaded Applications
CET306
Harry R. Erwin, University of Sunderland

Texts
Clay Breshears (2009) The Art of Concurrency: A Thread Monkey’s Guide to Writing Parallel Applications, O’Reilly Media.
Mordechai Ben-Ari (2006) Principles of Concurrent and Distributed Programming, Second edition, Addison-Wesley.

Signup for Individual Feedback
Choose a 15-minute slot with your regular tutor. There are several questions:
– What is your approach (overview)? (4 marks)
– How are you testing it? (4 marks)
– How are you solving the first half (threading)? (4 marks)
– How are you solving the second half (cleaning up the overlapping reservations)? (4 marks)
– Looking back, what surprises have you encountered? (4 marks)
Mark scale: 0 = non-engagement; 1 = serious problems; 2 = average; 3 = good; 4 = professional.

Eight Rules of Concurrent Design
1. Identify Truly Independent Computations
2. Implement Concurrency at the Highest Level Possible
3. Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores
4. Make Use of Thread-Safe Libraries Wherever Possible
5. Use the Right Threading Model
6. Never Assume a Particular Order of Execution
7. Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data
8. Dare to Change the Algorithm for a Better Chance of Concurrency
Concurrent programming remains more art than science!

Identify Truly Independent Computations
You cannot execute something concurrently unless the operations in each thread are independent of each other! Review the list in “What’s Not Parallel” (next slide).

What’s Not Parallel
– Having a baby.
– Algorithms, functions, or procedures with persistent state.
– Recurrence relations that use data from iteration t in iteration t+1. If the dependence is on iteration t+k for k > 1, you can ‘unwind’ the loop to recover some parallelism.
– Induction variables incremented non-linearly with each loop pass.
– Reductions computing a single value from a vector.
– Loop-carried dependence: data generated in some previous loop iteration is used in the current iteration (see the sketch below).
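
To make the last point concrete, here is a minimal C++ sketch (the function names are illustrative, not from the module): the first loop has a loop-carried dependence and cannot be parallelised as written, while the second touches only one element per iteration, so its iterations are truly independent.

#include <vector>

// NOT parallelisable as written: iteration i reads a[i-1], which was
// written by the previous iteration (a loop-carried dependence).
void prefix_dependent(std::vector<double>& a) {
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] += a[i - 1];
}

// Parallelisable: each iteration touches only its own element, so the
// iterations are truly independent computations.
void scale_independent(std::vector<double>& a, double k) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] *= k;
}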

Implement Concurrency at the Highest Level Possible
Suppose you have serial code and wish to thread it. You can work top-down or bottom-up. In your initial analysis, you are looking for the hotspots that, when run in parallel, give you the best performance. In the bottom-up approach you start with the hotspots and move up; in the top-down approach you consider the whole application and break it down. Placing concurrency at the highest possible level breaks the program into naturally independent threads of work that are unlikely to share data, and it provides the structure for your more detailed threading.
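
A minimal sketch of what “highest level” means in practice, assuming a hypothetical process_file() worker: the threading sits at the level of whole files, not inside the small loops the worker contains.

#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Hypothetical serial worker: processing one file is a naturally
// independent unit of work that shares no data with the others.
void process_file(const std::string& path) {
    std::printf("processing %s\n", path.c_str());
}

// Concurrency at the highest level: one thread per file, instead of
// threading the small loops buried inside process_file().
void process_all(const std::vector<std::string>& paths) {
    std::vector<std::thread> workers;
    workers.reserve(paths.size());
    for (const auto& p : paths)
        workers.emplace_back(process_file, p);
    for (auto& t : workers)
        t.join();
}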

Plan Early for Scalability (Taking Advantage of the Added Cores)
The number of cores will only increase, so plan for it. This is not Moore’s Law: the speed-up does not happen by itself; you have to make it happen. Scalability is the ability of your application to handle useful increases in system resources (cores, memory, bus performance). Data decomposition methods give more scalable solutions (hint!). Note that the project exploits data decomposition.
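
One way to plan for scalability is to discover the core count at run time rather than hard-coding it. A sketch with illustrative names, using standard C++ threads and a data decomposition into one chunk per core:

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Scalable data decomposition: the worker count is discovered at run
// time, so the same code uses 4 cores today and 32 cores tomorrow.
double parallel_sum(const std::vector<double>& data) {
    unsigned n = std::thread::hardware_concurrency();
    if (n == 0) n = 2;                          // fallback when unknown
    std::vector<double> partial(n, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / n + 1;
    for (unsigned t = 0; t < n; ++t)
        workers.emplace_back([&, t] {
            const std::size_t lo = t * chunk;
            const std::size_t hi = std::min(data.size(), lo + chunk);
            for (std::size_t i = lo; i < hi; ++i)
                partial[t] += data[i];          // each thread owns slot t
        });
    for (auto& w : workers)
        w.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}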

Make Use of Thread-Safe Libraries Wherever Possible
Don’t reinvent the wheel, especially when it’s complicated. Many libraries already take advantage of multicore processors:
– Intel Math Kernel Library (MKL)
– Intel Integrated Performance Primitives (IPP)
Even more important: all library calls used should be thread-safe. Check the library documentation. In your own libraries, routines should be reentrant (see the sketch below).
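
The difference between a reentrant and a non-reentrant routine in your own library code, as a hedged C++ sketch (both functions are invented for illustration):

#include <cstddef>
#include <cstdio>

// NOT reentrant: the static buffer is shared by every caller, so two
// threads calling this can overwrite each other's result.
const char* format_id_bad(int id) {
    static char buf[32];
    std::snprintf(buf, sizeof buf, "ID-%04d", id);
    return buf;
}

// Reentrant: all state lives in caller-supplied storage, so any
// number of threads may call it simultaneously.
void format_id(int id, char* out, std::size_t len) {
    std::snprintf(out, len, "ID-%04d", id);
}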

Use the Right Threading Model
If threaded libraries are not good enough and you need to write your own threading, don’t use explicit threads when an implicit threading model is good enough:
– OpenMP (data decomposition; threading loops that run over large data sets)
– Intel Threading Building Blocks
Keep it as simple as possible! If you can’t use third-party libraries in the deliverable code, prototype with them first and then convert.
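
For comparison, the implicit model can be this terse. A sketch using an OpenMP parallel for (the compiler flag mentioned is the usual GCC one; your build may differ):

#include <vector>

// Implicit threading with OpenMP: one pragma parallelises the loop;
// the runtime chooses thread count, creation, and scheduling.
// Compile with, e.g., g++ -fopenmp.
void scale(std::vector<double>& a, double k) {
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(a.size()); ++i)
        a[i] *= k;
}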

Never Assume a Particular Order of Execution
The execution order of threads is non-deterministic; there is no reliable way of predicting the ordering. If you assume an ordering, you will have data races, particularly when the hardware changes. Let the threads run as fast as possible and design them to be unencumbered. Synchronise only when necessary.
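
A two-line demonstration (the output strings are illustrative): nothing in this program fixes which thread prints first, and a correct design must not care.

#include <cstdio>
#include <thread>

// The scheduler decides which thread runs first, so the two lines can
// appear in either order, and the order may change from run to run.
int main() {
    std::thread a([] { std::printf("thread A ran\n"); });
    std::thread b([] { std::printf("thread B ran\n"); });
    a.join();
    b.join();
}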

Use Thread-Local Storage or Associate Locks to Specific Data
Synchronisation costs: don’t do it unless it’s needed for correctness. Use thread-local storage, or memory associated with specific threads. Watch out for assumptions about the number of threads; don’t hard-code it into your design. Avoid frequent updates to shared data. If you must synchronise, use carefully designed locks, usually one-to-one with data structures or critical clumps of data. Allow only one lock per data object, and document the association!
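
A sketch of both techniques in C++ (the SharedLog type is invented for illustration): a lock declared next to the one object it guards, so the association is documented in the code itself, and a thread_local counter that needs no lock at all.

#include <mutex>
#include <vector>

// One lock per data object, declared beside the data it guards.
struct SharedLog {
    std::mutex lock;                     // guards entries, and only entries
    std::vector<int> entries;

    void add(int v) {
        std::lock_guard<std::mutex> g(lock);
        entries.push_back(v);
    }
};

// Thread-local storage: each thread gets its own counter, so no
// synchronisation is needed while accumulating.
thread_local int local_count = 0;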

Dare to Change the Algorithm for a Better Chance of Concurrency
The bottom line is execution time. Analysis usually uses asymptotic performance (big-O notation, to be covered later). However, the best serial algorithm may not be parallelisable; if so, consider a suboptimal serial algorithm that you can parallelise. Know where to find a good book on algorithms:
– Knuth (the ‘Bible’ of algorithm theory)
– Sedgewick (3rd or 4th edition; the 3rd edition is more advanced)
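
As one example of changing the algorithm: a strict left-to-right sum is inherently sequential, but re-associating the additions turns it into a parallelisable reduction. A hedged OpenMP sketch; note the trade, since re-association can change floating-point results slightly:

#include <vector>

// The strictly sequential sum becomes a reduction: OpenMP gives each
// thread a private total and combines the totals at the end.
double sum(const std::vector<double>& a) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < static_cast<long>(a.size()); ++i)
        total += a[i];
    return total;
}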

Discussion