1 Chapter 1: Why Parallel Computing? An Introduction to Parallel Programming, Peter Pacheco. Copyright © 2010, Elsevier Inc. All rights reserved.

2 Roadmap: Why we need ever-increasing performance. Why we're building parallel systems. Why we need to write parallel programs. How do we write parallel programs? What we'll be doing. Concurrent, parallel, distributed!

3 Changing times. From 1986 to 2002, microprocessor performance increased by an average of 50% per year. Since then, the rate has dropped to about 20% per year.

4 An intelligent solution. Instead of designing and building faster microprocessors, put multiple processors on a single integrated circuit.

5 Now it's up to the programmers. Adding more processors doesn't help much if programmers aren't aware of them… or don't know how to use them. Serial programs don't benefit from this approach (in most cases).

6 Why we need ever-increasing performance. Computational power is increasing, but so are our computational problems and needs. Problems we never dreamed of have been solved because of past increases, such as decoding the human genome. More complex problems are still waiting to be solved.

7 Climate modeling.

8 Protein folding.

9 Drug discovery.

10 Energy research.

11 Data analysis.

12 Why we're building parallel systems. Up to now, performance increases have been attributable to the increasing density of transistors. But there are inherent problems.

13 A little physics lesson. Smaller transistors = faster processors. Faster processors = increased power consumption. Increased power consumption = increased heat. Increased heat = unreliable processors.

14 Solution. Move away from single-core systems to multicore processors. core = central processing unit (CPU). Introducing parallelism!!!

15 Why we need to write parallel programs. Running multiple instances of a serial program often isn't very useful. Think of running multiple instances of your favorite game. What you really want is for it to run faster.

16 Approaches to the serial problem. Rewrite serial programs so that they're parallel. Or write translation programs that automatically convert serial programs into parallel programs. The latter is very difficult to do, and success has been limited.

17 More problems. Some coding constructs can be recognized by an automatic program generator and converted to a parallel construct. However, it's likely that the result will be a very inefficient program. Sometimes the best parallel solution is to step back and devise an entirely new algorithm.

18 Example. Compute n values and add them together. Serial solution:
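The slide shows the serial loop only as an image. Here is a minimal runnable C sketch of it; the stub Compute_next_value stands in for the book's Compute_next_value(. . .), and for concreteness it returns the 24 values used on slide 20.

    #include <stdio.h>

    /* Stand-in for Compute_next_value(. . .): returns values from a
       fixed array so this sketch is runnable. */
    int Compute_next_value(int i) {
        static const int vals[] = {1,4,3, 9,2,8, 5,1,1, 6,2,7,
                                   2,5,0, 4,1,8, 6,5,1, 2,3,9};
        return vals[i];
    }

    int main(void) {
        int n = 24, sum = 0;
        for (int i = 0; i < n; i++)
            sum += Compute_next_value(i);   /* serial accumulation */
        printf("sum = %d\n", sum);          /* prints 95 */
        return 0;
    }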

19 Example (cont.). We have p cores, p much smaller than n. Each core performs a partial sum of approximately n/p values. Each core uses its own private variables and executes this block of code (sketched below) independently of the other cores.
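The per-core block also appears only as an image on the slide. A hedged C sketch of it, assuming my_rank (this core's number, 0 through p-1) and p are supplied by the parallel runtime, n is divisible by p, and Compute_next_value is the stub from the previous sketch:

    int my_n       = n / p;              /* values per core */
    int my_first_i = my_rank * my_n;
    int my_last_i  = my_first_i + my_n;
    int my_sum     = 0;
    for (int my_i = my_first_i; my_i < my_last_i; my_i++)
        my_sum += Compute_next_value(my_i);  /* sum this core's chunk */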

20 Example (cont.). After each core completes execution of the code, its private variable my_sum contains the sum of the values computed by its calls to Compute_next_value. E.g., with 8 cores and n = 24, the calls to Compute_next_value return: 1,4,3, 9,2,8, 5,1,1, 6,2,7, 2,5,0, 4,1,8, 6,5,1, 2,3,9

21 Example (cont.). Once all the cores are done computing their private my_sum, they form a global sum by sending their results to a designated master core, which adds them to produce the final result.
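Slide 22 shows this receive-and-add loop as an image. A hedged sketch of the same pattern using MPI calls (MPI is one of the three APIs listed on slide 38); it assumes <mpi.h> is included, MPI_Init has been called, and my_sum, my_rank, and p are set up as in the earlier sketches:

    if (my_rank == 0) {                  /* core 0 acts as the master */
        int sum = my_sum, value;
        for (int src = 1; src < p; src++) {
            MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            sum += value;                /* with p = 8: 7 receives, 7 additions */
        }
    } else {
        MPI_Send(&my_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }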

22 Example (cont.)

23 Example (cont.)
Core:    0   1   2   3   4   5   6   7
my_sum:  8  19   7  15   7  13  12  14
Global sum = 95

24 But wait! There's a much better way to compute the global sum.

25 Better parallel algorithm. Don't make the master core do all the work. Share it among the other cores. Pair the cores so that core 0 adds its result to core 1's result, core 2 adds its result to core 3's result, etc. Work with odd- and even-numbered pairs of cores.

26 Better parallel algorithm (cont.). Repeat the process, now with only the even-ranked cores: core 0 adds the result from core 2, core 4 adds the result from core 6, etc. Then the cores divisible by 4 repeat the process, and so forth, until core 0 has the final result. (A sketch of this tree-structured sum follows.)
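Slide 27 depicts this as a tree diagram. Below is a minimal runnable C sketch that simulates the pairing pattern serially over the eight partial sums from slide 23; in a real parallel program the additions within each step would run concurrently on different cores.

    #include <stdio.h>

    int main(void) {
        int my_sum[8] = {8, 19, 7, 15, 7, 13, 12, 14};  /* partial sums, slide 23 */
        int p = 8;
        /* After the step with a given stride, the cores whose ranks are
           divisible by 2*stride hold the combined sums. */
        for (int stride = 1; stride < p; stride *= 2)        /* log2(8) = 3 steps */
            for (int rank = 0; rank + stride < p; rank += 2 * stride)
                my_sum[rank] += my_sum[rank + stride];       /* core `rank` receives and adds */
        printf("global sum = %d\n", my_sum[0]);              /* prints 95 */
        return 0;
    }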

27 Multiple cores forming a global sum.

28 Analysis. In the first example, the master core performs 7 receives and 7 additions. In the second example, the master core performs 3 receives and 3 additions. The improvement is more than a factor of 2!

29 Analysis (cont.). The difference is more dramatic with a larger number of cores. If we have 1000 cores, the first example would require the master to perform 999 receives and 999 additions, while the second would require only 10 receives and 10 additions, since the tree-structured sum finishes in ⌈log2(1000)⌉ = 10 steps. That's an improvement of almost a factor of 100!

30 How do we write parallel programs? Task parallelism – partition the various tasks carried out in solving the problem among the cores. Data parallelism – partition the data used in solving the problem among the cores; each core carries out similar operations on its part of the data.

31 Professor P: 15 questions, 300 exams.

32 Professor P's grading assistants: TA#1, TA#2, TA#3.

33 Division of work – data parallelism. TA#1, TA#2, and TA#3 each grade 100 of the 300 exams.

34 Division of work – task parallelism. TA#1, TA#2, and TA#3 each grade a different subset of the 15 questions on every exam.

35 Division of work – data parallelism.

36 Division of work – task parallelism. Tasks: 1) receiving, 2) addition.

37 Coordination. Cores usually need to coordinate their work. Communication – one or more cores send their current partial sums to another core. Load balancing – share the work evenly among the cores so that no core is overloaded. Synchronization – because each core works at its own pace, make sure cores do not get too far ahead of the rest.
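All three ideas show up even in the small global-sum example. Below is a hedged, runnable Pthreads sketch (Pthreads is one of the APIs on the next slide): the block partition load-balances the work, the shared variable sum carries the communication, and a mutex provides the synchronization. The names here are ours, not from the slides. Compile with: gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    #define P 8          /* number of threads ("cores") */
    #define N 24         /* number of values */

    int vals[N] = {1,4,3, 9,2,8, 5,1,1, 6,2,7,
                   2,5,0, 4,1,8, 6,5,1, 2,3,9};
    int sum = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    void* Partial_sum(void* rank) {
        long my_rank = (long) rank;
        int my_n = N / P, my_sum = 0;
        for (int i = my_rank * my_n; i < (my_rank + 1) * my_n; i++)
            my_sum += vals[i];             /* local work: no coordination needed */
        pthread_mutex_lock(&mutex);        /* synchronize access to shared sum */
        sum += my_sum;                     /* communicate the partial result */
        pthread_mutex_unlock(&mutex);
        return NULL;
    }

    int main(void) {
        pthread_t threads[P];
        for (long t = 0; t < P; t++)
            pthread_create(&threads[t], NULL, Partial_sum, (void*) t);
        for (long t = 0; t < P; t++)
            pthread_join(threads[t], NULL);
        printf("global sum = %d\n", sum);  /* prints 95 */
        return 0;
    }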

38 What we'll be doing. Learning to write programs that are explicitly parallel, using the C language and three different extensions to C: the Message-Passing Interface (MPI), POSIX Threads (Pthreads), and OpenMP.
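As a preview of where these APIs lead, the entire global-sum pattern (including the tree-structured version) collapses into a single MPI library call. A hedged, runnable sketch, with each process contributing a stand-in partial sum:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char* argv[]) {
        int my_rank, p, my_sum, sum;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &p);
        my_sum = my_rank + 1;   /* stand-in for a real partial sum */
        /* MPI_Reduce combines every process's my_sum on process 0;
           good implementations use a tree internally. */
        MPI_Reduce(&my_sum, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (my_rank == 0) printf("global sum = %d\n", sum);
        MPI_Finalize();
        return 0;
    }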

39 Types of parallel systems. Shared-memory – the cores can share access to the computer's memory; coordinate the cores by having them examine and update shared memory locations. Distributed-memory – each core has its own, private memory; the cores must communicate explicitly by sending messages across a network.
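For contrast with the message-passing sketches above, here is a hedged, runnable shared-memory version of the sum using OpenMP (the third API from slide 38): the threads coordinate through the shared variable sum, and the reduction clause handles the synchronization. Compile with: gcc -fopenmp (without it, the pragma is ignored and the loop runs serially).

    #include <stdio.h>

    int main(void) {
        int vals[24] = {1,4,3, 9,2,8, 5,1,1, 6,2,7,
                        2,5,0, 4,1,8, 6,5,1, 2,3,9};
        int sum = 0;
    #   pragma omp parallel for reduction(+: sum)
        for (int i = 0; i < 24; i++)
            sum += vals[i];                /* each thread sums a share of the data */
        printf("global sum = %d\n", sum);  /* prints 95 */
        return 0;
    }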

40 Types of parallel systems: shared-memory vs. distributed-memory.

41 Terminology. Concurrent computing – a program in which multiple tasks can be in progress at any instant. Parallel computing – a program in which multiple tasks cooperate closely to solve a problem. Distributed computing – a program that may need to cooperate with other programs to solve a problem.

42 Concluding Remarks (1). The laws of physics have brought us to the doorstep of multicore technology. Serial programs typically don't benefit from multiple cores. Automatic parallel program generation from serial program code isn't the most efficient approach to getting high performance from multicore computers.

43 Concluding Remarks (2). Learning to write parallel programs involves learning how to coordinate the cores. Parallel programs are usually very complex and therefore require sound programming techniques and development.