Published by Eliseo Burchard. Modified over 9 years ago.
1
Parallel Programming: Introduction
Yang Xianchun
Department of Computer Science and Technology, Nanjing University
2
Agenda
– About the Course
– Evolution of Computing Technology
– Grand Challenge Problem Examples
– Motivations for Parallelism
– Parallel Computation
– A Road Map of Topics
– Parallel Computing Terminology
– An Illustrative Example
    Control-Parallel Approach
    Data-Parallel Approach
    Performance Analysis
3
About the course
Description
– An introductory course on parallel programming concepts and techniques
– No prior parallel computation background is necessary
Prerequisites
– Working knowledge of computer systems
– Adequate programming skill in C or C++
Textbook
– Harry Jordan and Gita Alaghband, “Fundamentals of Parallel Processing,” Prentice Hall, 2003.
4
About the course (cont.)
Topics
– Introduction
– Parallel architectures
– Data parallel computing
    Fortran 90 / 95
– Shared memory computing
    Pthreads
    OpenMP
– Message passing programming
    PVM
    MPI
– Applications
    Sorting
    Numerical algorithms
    Graph algorithms
– Interconnection network
– Performance
– Models
5
About the course (cont.)
Organization
– Lectures
– Programming projects using the parallel programming tools PVM and MPI on Linux / Unix machines
– Homework
– Class tests and final exam (open book)
6
Evolution of Computing Technology
Hardware
– Vacuum tubes, relay memory
– Discrete transistors, core memory
– Integrated circuits, pipelined CPU
– VLSI microprocessors, solid state memory
Languages and Software
– Machine / Assembly languages
– Algol / Fortran with compilers, batch processing OS
– C, multiprocessing, timesharing OS
– C++ / Java, parallelizing compilers, distributed OS
7
Evolution of Computing Technology (cont.)
The Driving Force Behind the Technology Advances
– The ever-increasing demands on computing power
    Scientific computing (e.g. large-scale simulations)
    Commercial computing (e.g. databases)
    3D graphics and realistic animation
    Multimedia internet applications
8
Grand Challenge Problem Examples
Simulations of the earth’s climate
  Resolution: 10 kilometers
  Period: 1 year
  Ocean and biosphere models: simple
– Total requirements: $10^{16}$ floating-point operations
– On a supercomputer capable of 10 Giga FLOPS, the run takes $10^{6}$ seconds, on the order of 10 days
Real-time processing of 3D graphics
  Number of data elements: $10^{9}$ (1024 in each dimension)
  Number of operations per element: 200
  Update rate: 30 times per second
– Total requirements: $6.4 \times 10^{12}$ operations per second
– With processors capable of 10 Giga IOPS each, we need about 640 of them
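The arithmetic behind both estimates can be checked with a few lines of Python. This is only a sketch of the calculation: the variable names are illustrative, and the inputs are the figures quoted on the slide (a total of 10^16 operations is assumed for the climate run, since that is what makes the run time consistent with a 10 Giga FLOPS machine).

```python
# Climate simulation: total work divided by the sustained rate gives the run time.
total_flop = 10**16              # floating-point operations for the whole run (assumed total)
rate = 10 * 10**9                # 10 Giga FLOPS
seconds = total_flop / rate
days = seconds / 86_400          # 86,400 seconds per day

# Real-time 3D graphics: elements x operations per element x updates per second.
elements = 1024**3               # 1024 in each of three dimensions, about 10^9
ops_per_second = elements * 200 * 30
processors = ops_per_second / (10 * 10**9)   # each processor delivers 10 Giga IOPS

print(f"climate run: {seconds:.0f} s = {days:.1f} days")
print(f"graphics: {ops_per_second:.2e} ops/s -> {processors:.0f} processors")
```

The run time comes out at 10^6 seconds, which is 11.6 days, so the slide's "10 days" is an order-of-magnitude figure. Likewise the graphics requirement is about 6.44 x 10^12 operations per second, or roughly 644 processors.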
9
Motivations for Parallelism
Conventional computers are sequential
  a single CPU
  a single stream of instructions
  executing one instruction at a time (not completely true)
– A single-CPU processor has a performance limit
    Moore’s Law can’t go on forever
How to increase computing power?
  Better processor design
  – More transistors, larger caches, advanced architectures
  Better system design
  – Faster / larger memory, faster buses, better OS
  Scale up the computer (parallelism)
  – Replicate hardware at the component or whole-computer level
  – A parallel machine’s peak power grows with its processor count, so in principle it is virtually unlimited
      10 processors @ 500 Mega FLOPS each = 5 Giga FLOPS
      100 processors @ 500 Mega FLOPS each = 50 Giga FLOPS
      1,000 processors @ 500 Mega FLOPS each = 500 Giga FLOPS
      ...
10
Motivations for Parallelism (cont.)
Additional Motivations
– Solving bigger problems
– Lowering cost
11
Parallel Computation
Parallel computation means
– multiple CPUs
– single or multiple streams of instructions
– executing multiple instructions at a time
Typical process
– Breaking a problem into pieces and arranging for all pieces to be solved simultaneously on a multi-CPU computer system
Requirements
– Parallel algorithms
    only parallelizable applications can benefit from a parallel implementation
– Parallel languages and/or constructs
    for expressing parallelism
– Parallel architectures
    to provide hardware support
12
A Road Map of Topics
Parallel Architectures
– Vector / SIMD / MIMD
– design issues
Machine Models
– PRAM
– LogP
Programming Models
– data parallel
– shared memory
– message passing
Parallel Algorithms
– master-slave
– divide-conquer
– control vs. data
– complexity analysis
Programming Languages
– Fortran 90 / 95
– Pthreads, OpenMP
– PVM, MPI
Applications
– scientific computations
– data-intensive problems
– performance measurement
13
Parallel Computing Terminology (1)
Hardware
– Multicomputers
    tightly networked, multiple uniform computers
– Multiprocessors
    tightly networked, multiple uniform processors with additional memory units
– Supercomputers
    general purpose and high-performance, nowadays almost always parallel
– Clusters
    loosely networked commodity computers
14
Parallel Computing Terminology (2)
Programming
– Pipelining
    divide computation into stages (segments)
    assign separate functional units to each stage
– Data Parallelism
    multiple (uniform) functional units
    apply the same operation simultaneously to different elements of a data set
– Control Parallelism
    multiple (specialized) functional units
    apply distinct operations to data elements concurrently
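The difference between data parallelism and control parallelism can be sketched with Python's standard thread pool. This is an illustration of the two patterns only, not a performance demonstration: in CPython, threads running pure-Python code do not execute in parallel, and the names `data`, `square`, and `stats` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 9))

def square(x):
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same operation is applied to different elements.
    squares = list(pool.map(square, data))

    # Control parallelism: distinct operations run concurrently on the data.
    tasks = {name: pool.submit(fn, data)
             for name, fn in (("min", min), ("max", max), ("sum", sum))}
    stats = {name: future.result() for name, future in tasks.items()}

print(squares)   # [1, 4, 9, 16, 25, 36, 49, 64]
print(stats)     # {'min': 1, 'max': 8, 'sum': 36}
```

In the first pattern every worker runs `square`; in the second each worker runs a different function, which is why the slide describes the functional units as "specialized".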
15
Parallel Computing Terminology (3)
Performance
– Throughput
    number of results per unit time
– Speedup
    $S = \dfrac{\text{time needed for the most efficient sequential algorithm}}{\text{time needed on a pipelined / parallel machine}}$
– Scalability
    An algorithm is scalable if the available parallelism increases at least linearly with problem size
    An architecture is scalable if it gives the same performance per processor as the number of processors and the size of the problem are both increased
    Data-parallel algorithms tend to be more scalable than control-parallel algorithms
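As a small worked example of the speedup definition, the sketch below plugs in the step counts that the Performance Analysis slides of this presentation report for the sieve example with n = 1,000 (1,411 steps sequentially; 706, 499, and 499 steps for the control-parallel version on 2, 3, and 4 processors). The function and variable names are illustrative.

```python
def speedup(t_sequential, t_parallel):
    """Speedup S = sequential time / parallel time."""
    return t_sequential / t_parallel

T_SEQUENTIAL = 1411                              # steps, n = 1,000
control_parallel_steps = {2: 706, 3: 499, 4: 499}  # processors -> steps

for p, t in control_parallel_steps.items():
    print(f"p = {p}: S = {speedup(T_SEQUENTIAL, t):.2f}")
```

The speedup is about 2.00 for two processors and about 2.83 for both three and four processors: adding the fourth processor does not help, because the running time is already bounded by the 499 steps needed to strike the multiples of 2.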
16
An Illustrative Example
Problem
– Find all primes less than or equal to some positive integer n
Method (the sieve algorithm)
– Write down all the integers from 1 to n
– Cross out from the list all multiples of 2, 3, 5, 7, … (each prime in turn) up to sqrt(n)
– The integers greater than 1 that remain are the primes
17
An Illustrative Example (cont.)
Sequential Implementation
– Boolean array representing the integers from 1 to n
– Buffer for holding the current prime
– Index for the loop iterating through the array
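A minimal Python sketch of this sequential implementation follows. It uses the three pieces of state listed above (a boolean array, a current-prime buffer, and a loop index); the function name `sieve` is illustrative, and starting each strike at the square of the prime is an assumption chosen to match the cost formulas in the Performance Analysis slides.

```python
import math

def sieve(n):
    """Return all primes <= n using the sieve algorithm."""
    if n < 2:
        return []
    # is_candidate[i] stays True while i has not been crossed out.
    is_candidate = [True] * (n + 1)
    is_candidate[0] = is_candidate[1] = False    # 0 and 1 are not prime
    prime = 2                                    # buffer holding the current prime
    while prime <= math.isqrt(n):
        # Smaller multiples of prime were already struck by smaller primes.
        for index in range(prime * prime, n + 1, prime):
            is_candidate[index] = False
        # The next cell that is still unmarked is the next prime.
        prime += 1
        while not is_candidate[prime]:
            prime += 1
    return [i for i, keep in enumerate(is_candidate) if keep]

print(sieve(30))          # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(len(sieve(1000)))   # 168
```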
18
An Illustrative Example (cont.)
Control-Parallel Approach
– Different processors strike out multiples of different primes
– The boolean array and the current prime are shared; each processor has its own private copy of the loop index
19
An Illustrative Example (cont.)
Control-Parallel Approach (cont.)
– Potential Problem: Race Conditions
    Race 1: More than one processor may sieve multiples of the same prime
    – a processor reads the current prime, p, and goes off to sieve multiples of p; later it finds a new prime and updates the current prime buffer
    – before the current prime is updated, another processor comes and reads p and goes off to do the same thing
    Race 2: A processor may sieve multiples of a composite number
    – processor A starts marking multiples of 2
    – before it can mark any cells, processor B finds an unmarked cell, 3, and starts marking multiples of 3
    – then processor C comes in and finds the next unmarked cell, 4, and starts marking multiples of 4
    These two race conditions do not produce incorrect results, but they do waste work
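The control-parallel scheme can be sketched with Python threads standing in for processors. This is an illustration under stated assumptions, not the course's implementation: the names `control_parallel_sieve`, `marked`, and `current` are made up, and a lock around the shared current-prime buffer is one possible way to remove race 1. Race 2 is left in place, since striking multiples of a composite number only repeats work.

```python
import math
import threading

def control_parallel_sieve(n, workers=4):
    """Threads share the array and the current prime; each has a private index."""
    if n < 2:
        return []
    marked = bytearray(n + 1)        # shared: marked[i] == 1 means i is crossed out
    marked[0] = marked[1] = 1
    limit = math.isqrt(n)
    current = [1]                    # shared "current prime" buffer
    lock = threading.Lock()

    def next_unmarked():
        # Reading and advancing the buffer as one locked step avoids race 1.
        with lock:
            k = current[0] + 1
            while k <= limit and marked[k]:
                k += 1
            current[0] = k
            return k

    def worker():
        while True:
            k = next_unmarked()      # may be composite (race 2), which is harmless
            if k > limit:
                return
            for index in range(k * k, n + 1, k):   # private loop index
                marked[index] = 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for i in range(n + 1) if not marked[i]]

print(len(control_parallel_sieve(1000)))   # 168
```

The result is correct whatever the interleaving: only multiples of some k >= 2 are ever marked, so primes stay unmarked, and every prime up to sqrt(n) is eventually picked up because the scan stops at each unmarked cell.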
20
An Illustrative Example (cont.)
Data-Parallel Approach
– Each processor is responsible for a unique range of the integers and does all the striking in that range
– Processor 1 is responsible for broadcasting its findings (the primes it discovers) to the other processors
Potential Problem
– If $\lceil n/p \rceil < \sqrt{n}$, more than one processor needs to broadcast its findings
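The data-parallel scheme can be simulated in a single Python process, with one private array per block standing in for each processor's memory and a loop over the blocks standing in for the broadcast. The sketch assumes blocks of ceil(n/p) consecutive integers starting at 1 and simply refuses the case flagged above, where the first block does not hold every prime up to sqrt(n). All names are illustrative.

```python
import math

def data_parallel_sieve(n, p):
    """Simulate p processors, each striking only inside its own block of 1..n."""
    if n < 2:
        return []
    size = math.ceil(n / p)
    limit = math.isqrt(n)
    if size < limit:
        raise ValueError("first block must contain every prime <= sqrt(n)")
    blocks = [(r * size + 1, min((r + 1) * size, n)) for r in range(p)]
    # One private array per processor; trailing blocks may be empty.
    marked = [bytearray(max(0, hi - lo + 1)) for lo, hi in blocks]
    marked[0][0] = 1                                  # the integer 1 is not prime

    prime = 2
    while prime <= limit:
        # "Broadcast": every processor receives prime and strikes its own range.
        for (lo, hi), cells in zip(blocks, marked):
            first_in_block = -(-lo // prime) * prime  # smallest multiple >= lo
            for value in range(max(prime * prime, first_in_block), hi + 1, prime):
                cells[value - lo] = 1
        # Processor 1 finds the next prime inside its own block.
        prime += 1
        while prime <= limit and marked[0][prime - 1]:
            prime += 1

    return [lo + i
            for (lo, _), cells in zip(blocks, marked)
            for i, struck in enumerate(cells) if not struck]

print(len(data_parallel_sieve(1000, 4)))   # 168
```

On a real message-passing machine the inner loop over blocks would run simultaneously on all processors, and the only communication would be the broadcast of each new prime.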
21
An Illustrative Example (cont.)
Performance Analysis
– Sequential Algorithm
    Cost of sieving multiples of 2: $\lceil (n-3)/2 \rceil$
    Cost of sieving multiples of 3: $\lceil (n-8)/3 \rceil$
    Cost of sieving multiples of 5: $\lceil (n-24)/5 \rceil$
    ...
    For n=1,000, T=1,411
– Control-Parallel Algorithm
    For p=2, n=1,000, T=706
    For p=3, n=1,000, T=499
    For p=4, n=1,000, T=499
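These step counts can be reproduced with a short cost model. The sketch below rests on assumptions that the slide does not state: a prime k costs one step per struck cell, starting at k squared (which matches the three formulas above), and in the control-parallel case each processor that becomes free takes the next prime, with no races and no cost for finding it. The function names are illustrative.

```python
import heapq
import math

def sieving_primes(n):
    """Primes up to sqrt(n), found by trial division (fine for small n)."""
    return [k for k in range(2, math.isqrt(n) + 1)
            if all(k % d for d in range(2, math.isqrt(k) + 1))]

def strike_cost(n, k):
    """Cells struck for prime k: k*k, k*k + k, ... up to n.

    Equal to ceil((n - (k*k - 1)) / k) whenever k*k <= n.
    """
    return (n - k * k) // k + 1

def sequential_time(n):
    return sum(strike_cost(n, k) for k in sieving_primes(n))

def control_parallel_time(n, p):
    """Greedy schedule: the processor that is free earliest takes the next prime."""
    free_at = [0] * p                      # time at which each processor is free
    heapq.heapify(free_at)
    for k in sieving_primes(n):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + strike_cost(n, k))
    return max(free_at)

print(sequential_time(1000))                                # 1411
print([control_parallel_time(1000, p) for p in (2, 3, 4)])  # [706, 499, 499]
```

Under this model the time for three or more processors cannot drop below 499, the cost of striking the multiples of 2 alone.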
22
An Illustrative Example (cont.)
Performance Analysis (cont.)
– Data-Parallel Algorithm
    Cost of broadcasting: $k(p-1)$
    Cost of striking: $\lceil (n/p)/2 \rceil + \lceil (n/p)/3 \rceil + \dots + \lceil (n/p)/k \rceil$
    For p=2, n=1,000, T≈781
    For p=3, n=1,000, T≈471
    For p=4, n=1,000, T≈337