Lecture 20 Parallel Programming CSE 1322 8/8/2019.

Moore’s Law
Moore’s Law predicted that the number of transistors per square inch of integrated circuit would double every two years. Gordon Moore made this prediction in the mid-1960s and expected the trend to hold for at least ten years; it has in fact held for nearly 50 years. Moore’s prediction is often interpreted to mean that processor speed doubles every couple of years. Software developers benefited from the continual performance gains of new hardware in single-core computers: if your application was slow, you could simply wait, because it would soon run faster on newer hardware. Your application rode the wave of better performance. However, you can no longer depend on consistent hardware advances to deliver better-performing applications. Two problems have arisen: the circuits cannot be packed much closer together, and packing them as densely as they already are has increased heat generation and therefore cooling requirements.

Parallel Computing
Parallel computing is a form of computation in which many operations are carried out simultaneously. Many personal computers and workstations have two or four cores, which enable them to execute multiple threads simultaneously. Computers in the near future are expected to have significantly more cores. To take advantage of the hardware of today and tomorrow, software developers can parallelize their code to distribute work across multiple processors. In the past, parallelization required low-level manipulation of threads and locks.

Multicore Processing
Multicore architecture is an alternative in which multiple processor cores share a single chip die. The additional cores provide more computing power without the heat problem. In a parallel application, you can leverage the multicore architecture for potential performance gains without a corresponding heat penalty. Recent studies assert that only about 30% of the code written today can be effectively parallelized. The speedup from running in parallel is also not one-to-one with the number of cores: setting code up to run in parallel with the high-level libraries carries some overhead, particularly in the background machinery that handles the threading. A quad-core machine may see a 30%-40% reduction in run time on large data sets.
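
A rough way to see why the gain is not one-to-one is Amdahl's Law (not named on the slide, but the standard model for this): if a fraction P of the work can run in parallel on N cores, the overall speedup is 1 / ((1 - P) + P / N). With an assumed parallel fraction of P = 0.5 on a quad-core machine, that gives about a 1.6x speedup, i.e. roughly a 38% reduction in run time, in line with the 30%-40% figure above. A minimal sketch of the calculation:

using System;

class AmdahlEstimate
{
    // Amdahl's Law: estimated speedup when a fraction p of the work is
    // parallelizable across n cores and the rest stays sequential.
    static double Speedup(double p, int n) => 1.0 / ((1.0 - p) + p / n);

    static void Main()
    {
        double p = 0.5;   // assumed parallel fraction (illustrative value only)
        int cores = 4;    // quad-core machine, as in the example above

        double s = Speedup(p, cores);
        Console.WriteLine($"Speedup: {s:F2}x");                   // ~1.60x
        Console.WriteLine($"Run-time reduction: {1 - 1 / s:P0}"); // ~38%
    }
}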

Task Parallel Library (C#)
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework. It relies on a task scheduler that is integrated with the .NET ThreadPool. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications.
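
As a quick orientation before the loop examples, here is a minimal sketch (not from the slides) of creating and waiting on TPL tasks with the types in System.Threading.Tasks; the work inside the lambdas is made up for illustration.

using System;
using System.Threading.Tasks;

class TplHello
{
    static void Main()
    {
        // Task.Run queues work on the .NET ThreadPool through the default task scheduler.
        Task<int> sum = Task.Run(() =>
        {
            int total = 0;
            for (int i = 1; i <= 1000; i++) total += i;
            return total;
        });

        Task greeting = Task.Run(() => Console.WriteLine("Hello from a worker thread"));

        // Block until both tasks have finished.
        Task.WaitAll(greeting, sum);

        // Result is available without further blocking once the task has completed.
        Console.WriteLine($"Sum = {sum.Result}");
    }
}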

Task Parallel Library
The TPL scales the degree of concurrency dynamically to make the most efficient use of all the processors that are available. Parallel code based on the TPL not only works on dual-core and quad-core computers; it also scales automatically, without recompilation, to many-core computers. When you use the TPL, writing a multithreaded for loop closely resembles writing a sequential for loop. The following code automatically partitions the work into tasks based on the number of processors on the computer.

The Parallel for and foreach loops
Parallel.For(startIndex, endIndex, (currentIndex) => DoSomeWork(currentIndex));

// Sequential version
foreach (var item in sourceCollection)
{
    Process(item);
}

// Parallel equivalent
Parallel.ForEach(sourceCollection, item => Process(item));
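
A self-contained version of the loops above might look like the following; DoSomeWork, Process, and sourceCollection are placeholders invented for this sketch.

using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelLoops
{
    // Stand-in for the per-index work; the body is illustrative only.
    static void DoSomeWork(int index) =>
        Console.WriteLine($"Index {index} on thread {Environment.CurrentManagedThreadId}");

    // Stand-in for the per-item work.
    static void Process(string item) =>
        Console.WriteLine($"Processing {item}");

    static void Main()
    {
        // Parallel.For partitions the index range [0, 10) across the available cores.
        Parallel.For(0, 10, currentIndex => DoSomeWork(currentIndex));

        // Parallel.ForEach does the same for the elements of a collection.
        var sourceCollection = Enumerable.Range(1, 5).Select(i => $"item{i}").ToList();
        Parallel.ForEach(sourceCollection, item => Process(item));
    }
}

Note that the order in which indexes and items are processed is not guaranteed, so the output interleaves differently from run to run.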

Parallel Invoke
You can schedule a task in several ways, the simplest of which is by using the Parallel.Invoke method. The following example executes two parallel tasks: one for MethodA and another for MethodB. This version of Parallel.Invoke accepts an array of Action delegates as its sole parameter. Action delegates take no arguments and return void.

Parallel.Invoke(new Action[] { MethodA, MethodB });

The Parallel.Invoke method is convenient for executing multiple tasks in parallel. However, it has limitations: Parallel.Invoke creates but does not return task objects; the Action delegate is limited, with no parameters and no return value; and Parallel.Invoke does not guarantee the order in which the tasks execute.
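
A runnable sketch of the call above, with MethodA and MethodB as made-up placeholders; the Task.Run lines at the end show one way to get task objects back when the limitations listed above matter.

using System;
using System.Threading.Tasks;

class InvokeDemo
{
    static void MethodA() => Console.WriteLine("MethodA running");
    static void MethodB() => Console.WriteLine("MethodB running");

    static void Main()
    {
        // Runs both delegates in parallel and blocks until both have completed.
        // The order in which they start and finish is not guaranteed.
        Parallel.Invoke(MethodA, MethodB);

        // If you need the Task objects themselves (for results, continuations, etc.),
        // Task.Run is one alternative to Parallel.Invoke.
        Task a = Task.Run(MethodA);
        Task b = Task.Run(MethodB);
        Task.WaitAll(a, b);
    }
}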

Code segments
Use the .NET Parallel Extensions (Task Parallel Library), and see the sample/tutorial on how to get this working in the VS IDE.

Parallel.ForEach(data, c =>
{
    if (c.value == 10)
        results.Add(c);
});

The code sample uses lambda expressions, but you can treat this as "magic syntax" for now. You could also use delegate syntax if you prefer.
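
A complete version of this fragment might look as follows. The Item type, the data values, and the threshold 10 are assumptions made for this sketch, and results is a ConcurrentBag because the lambda runs on several threads at once, so adding to an ordinary List<T> would not be safe here.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class FilterDemo
{
    // Hypothetical element type standing in for the slide's c.value.
    class Item
    {
        public int value;
    }

    static void Main()
    {
        List<Item> data = Enumerable.Range(1, 100)
                                    .Select(i => new Item { value = i % 20 })
                                    .ToList();

        // ConcurrentBag can be added to safely from many threads at once.
        var results = new ConcurrentBag<Item>();

        Parallel.ForEach(data, c =>
        {
            if (c.value == 10)
                results.Add(c);
        });

        Console.WriteLine($"Matched {results.Count} items");  // 5 matches for this data
    }
}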

Results
You should see increased performance (reduced time to complete) relative to the number of processors/cores you have on the machine, as long as your resources and data are independent. If the processes share data or resources, there is likely to be significant blocking and reduced performance. How might we solve this blocking/concurrency problem?
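
One possible direction, sketched below under the assumption that the shared resource is a running total: either protect the shared state with a lock (correct, but every iteration then contends for the same lock), or give each worker thread its own local state and combine the partial results once at the end, which is what the localInit/localFinally overload of Parallel.For supports.

using System;
using System.Threading;
using System.Threading.Tasks;

class SharedStateDemo
{
    static void Main()
    {
        int[] values = new int[1_000_000];
        for (int i = 0; i < values.Length; i++) values[i] = 1;

        // Option 1: a lock makes the shared update safe, but serializes every iteration.
        long lockedTotal = 0;
        object gate = new object();
        Parallel.For(0, values.Length, i =>
        {
            lock (gate) { lockedTotal += values[i]; }
        });

        // Option 2: per-thread subtotals; threads only synchronize in the final Interlocked.Add.
        long total = 0;
        Parallel.For(0, values.Length,
            () => 0L,                                          // localInit: each worker starts at 0
            (i, state, subtotal) => subtotal + values[i],      // body: accumulate into the worker's subtotal
            subtotal => Interlocked.Add(ref total, subtotal)); // localFinally: merge once per worker

        Console.WriteLine($"{lockedTotal} {total}");  // both print 1000000
    }
}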