Parallel Computing
Department of Computer Engineering, Ferdowsi University
Hossain Deldari
Lecture organization
- Parallel processing
- Supercomputers and parallel computers
- Amdahl's Law, speedup, efficiency
- Parallel machine architecture
- Computational models
- Concurrency approaches
- Parallel programming
- Cluster computing
What is Parallel Processing?
Parallel processing is the division of work into smaller tasks and the assignment of those tasks to multiple workers that execute them simultaneously: multiple processors run different parts of the same program at the same time.
Difficulties: coordinating, controlling, and monitoring the workers.
The main goals of parallel processing are:
- to solve much bigger problems much faster
- to reduce the wall-clock execution time of computer programs
- to increase the size of computational problems that can be solved
Supercomputer & parallel computer
What is a Supercomputer? A supercomputer is a computer that is much faster than the computers ordinary people use. Note that this is a time-dependent definition.
TOP500 list, June 1993 (top entry):
Manufacturer: TMC; Computer/Procs: CM-5/1024; Rmax: 59.70; Rpeak: 131.00; Installation site: Los Alamos National Laboratory; Country/Year: USA/1993
TOP500 list, June 2003 (top entry):
Manufacturer: NEC; Computer/Procs: Earth-Simulator/5120; Rmax: 35860.00; Rpeak: 40960.00; Installation site: Earth Simulator Center; Country/Year: Japan
Rmax: maximal LINPACK performance achieved
Rpeak: theoretical peak performance
LINPACK is a benchmark.
Amdahl's Law, Speedup, Efficiency
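In its usual statement, Amdahl's Law says that if a fraction f of a program's work can be parallelized across n processors while the rest stays serial, the speedup is:

```latex
S(n) = \frac{T(1)}{T(n)} = \frac{1}{(1-f) + f/n},
\qquad \lim_{n \to \infty} S(n) = \frac{1}{1-f}
```

So even with unlimited processors, the speedup never exceeds 1/(1-f): with f = 0.9, no machine can run the program more than 10 times faster.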
Efficiency
Efficiency is a measure of the fraction of time that a processor spends performing useful work.
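With speedup S(n) = T(1)/T(n) on n processors, efficiency is commonly written:

```latex
E(n) = \frac{S(n)}{n} = \frac{T(1)}{n \, T(n)}
```

E(n) = 1 means every processor does useful work all the time; values below 1 reflect idle time and parallel overhead.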
Shunt Operation
Parallel and Distributed Computers
- SIMD
- MIMD
- MISD
- Clusters
SIMD (Single Instruction, Multiple Data)
MISD (Multiple Instruction, Single Data)
MIMD (Multiple Instruction, Multiple Data)
MIMD (cont.)
Parallel machine architecture
- Shared memory model
  - Bus-based
  - Switch-based
  - NUMA
- Distributed memory model
- Distributed shared memory model
  - Page-based
  - Object-based
  - Hardware
Shared memory model
Shared memory model (cont.)
- Also known as a multiprocessor
- OpenMP is a standard API for shared-memory programming in C/C++/Fortran
- Advantage: easy programming
- Disadvantages: design complexity; not scalable
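OpenMP itself targets C/C++/Fortran; as a minimal language-agnostic sketch of the shared-memory idea (not from the slides), here several Python threads cooperate by reading and writing the same arrays directly, with no explicit communication:

```python
import threading

# Shared address space: every thread sees the same lists.
data = list(range(8))
result = [0] * len(data)

def square_chunk(lo, hi):
    # Each worker reads and writes shared memory directly;
    # no messages are exchanged. Workers touch disjoint slices,
    # so no lock is needed here.
    for i in range(lo, hi):
        result[i] = data[i] * data[i]

threads = [threading.Thread(target=square_chunk, args=(i, i + 2))
           for i in range(0, len(data), 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

The ease of programming (just index a shared array) and the hazard (two writers to the same slot would race) are both visible in this style.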
Bus-based shared memory model
- The bus is a bottleneck
- Not scalable
Switch-based shared memory model
- Maintenance is difficult
- Expensive
- Scalable
NUMA model
NUMA stands for Non-Uniform Memory Access.
- Simulated shared memory
- Better scalability
Distributed memory model
- Also called a multicomputer
- Programmed with MPI (Message Passing Interface)
- Easy design, low cost, high scalability
- Difficult programming
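As a minimal sketch of the message-passing style (MPI itself is a library for C/C++/Fortran; this Python example only imitates the idea with threads and queues), the worker below has no shared variables with the sender: data moves only through explicit send/receive operations, as it would between nodes of a multicomputer:

```python
import queue
import threading

def worker(inbox, outbox):
    # Communicate only through explicit messages, never shared
    # variables -- mimicking a separate address space.
    n = inbox.get()            # "receive" the work item
    outbox.put(sum(range(n)))  # "send" the partial result back

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

inbox.put(10)          # send work to the worker
answer = outbox.get()  # receive the result: 0+1+...+9 = 45
t.join()
print(answer)  # -> 45
```

The "difficult programming" of the slide shows up here: every data movement must be coded explicitly, but nothing about the pattern limits scalability.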
Examples of Network Topology
- Linear array
- Ring
- Mesh
- Fully connected
Examples of Network Topology (cont.)
Hypercubes (d = 4): each of the 2^d nodes carries a d-bit binary label (0000 through 1111), and two nodes are connected exactly when their labels differ in a single bit.
Distributed shared memory model
- Simpler abstraction: sharing data is easier
- Portability
- Easy design with easy programming
- Low performance (when communication is heavy)
Parallel and Distributed Architecture (Leopold, 2001)
Architectures from SIMD through SMP and NUMA to clusters can be ordered along three axes:
- Degree of coupling: from tight (SIMD, SMP) to loose (cluster)
- Supported grain sizes: from fine to coarse
- Communication speed: from fast to slow
Computational Models
- RAM
- PRAM
- BSP
- LogP
- MPI
RAM Model
PRAM Model
Parallel Random Access Machine: processors P1 ... Pp, each with its own private memory, operate under a common control on a shared global memory in a synchronized read-compute-write cycle.
Variants differ in how concurrent access to the same memory cell is resolved: EREW, ERCW, CREW, CRCW (E = exclusive, C = concurrent; R = read, W = write).
Bulk Synchronous Parallel (BSP) Model
- A generalization of the PRAM model
- Processor-memory pairs connected by a communication network
- Computation proceeds in supersteps: processes execute local work, communicate, then barrier-synchronize
BSP Cost Model
Cost of a superstep = w + max(hs, hr) * g + l
- w: maximum number of local operations
- hs: maximum number of packets sent
- hr: maximum number of packets received
- g: communication throughput (cost per packet)
- l: synchronization latency
- p: number of processors
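The formula above can be evaluated directly; the parameter values below are invented for illustration:

```python
def bsp_superstep_cost(w, hs, hr, g, l):
    # One BSP superstep: local work w, then the dominant
    # communication volume max(hs, hr) scaled by the per-packet
    # cost g, then the barrier synchronization cost l.
    return w + max(hs, hr) * g + l

# Hypothetical superstep: 1000 local ops, 20 packets sent,
# 5 received, g = 4 time units per packet, barrier latency l = 50.
print(bsp_superstep_cost(1000, 20, 5, 4, 50))  # -> 1130
```

Note that only the larger of hs and hr matters: sending and receiving overlap, so the busier direction determines the communication term.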
LogP Model
- Closely related to BSP, but it models asynchronous execution.
New parameters:
- L: message latency
- o: overhead, defined as the length of time that a processor is engaged in the transmission or reception of each message; during this time the processor cannot perform other operations
- g: gap, defined as the minimum time interval between consecutive message transmissions or receptions; the reciprocal of g corresponds to the available per-processor bandwidth
- P: the number of processor/memory modules
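A small numeric sketch of how the LogP parameters combine (the parameter values are invented; the k-message formula assumes the sender does nothing else between sends):

```python
def logp_message_time(L, o):
    # End-to-end time for one small message under LogP:
    # send overhead + network latency + receive overhead.
    return o + L + o

def logp_k_messages(k, L, o, g):
    # Time for one processor to deliver k back-to-back messages:
    # successive sends are spaced by at least max(o, g); the last
    # message still needs L + o to arrive and be absorbed.
    return (k - 1) * max(o, g) + o + L + o

print(logp_message_time(L=10, o=2))        # -> 14
print(logp_k_messages(4, L=10, o=2, g=5))  # -> 29
```

With k = 1 the second formula reduces to the first, which is a quick sanity check on the model.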
LogP (cont.)
MPI (Message Passing Interface)
What is MPI?
- A message-passing library specification
  - a message-passing model
  - not a compiler specification
  - not a specific product
- For parallel computers, clusters, and heterogeneous networks
- Full-featured
- Designed to permit (unleash?) the development of parallel software libraries
- Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers
MPI Layer
Task 1 on Node 1 and Task 2 on Node 2 communicate virtually at the application level; the real communication happens one layer down, between the MPI/communication layers of the two nodes.
Matrix Multiplication Example
PRAM Matrix Multiplication
Cost of the PRAM algorithm
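One standard CREW PRAM analysis (a textbook result, stated here as a sketch; the slide's own derivation may differ): with p = n^3 processors, all n^3 scalar products are formed in constant time, and each of the n^2 entry sums is then reduced in O(log n) parallel steps:

```latex
c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad
T(n) = O(\log n) \text{ with } p = n^{3}, \qquad
\text{cost} = p \cdot T(n) = O(n^{3} \log n)
```

The cost exceeds the O(n^3) work of the sequential algorithm by a log n factor, so this algorithm is fast but not cost-optimal.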
BSP Matrix Multiplication
Cost of the algorithm
Concurrency Approach
- Control parallel
- Data parallel
Control Parallel
Data Parallel
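As a minimal sketch (not from the slides): in the data-parallel style the same operation is applied to every element of a data set, here with a Python thread pool standing in for the processors; a control-parallel program would instead run different operations concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # The same operation is applied independently to every
    # element -- the essence of the data-parallel style.
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(square, range(8)))

print(result)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the elements are independent, the runtime is free to split them across workers however it likes, which is why data parallelism scales so naturally with problem size.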
The best granularity for programming
Parallel Programming
- Explicit parallel programming: Occam, MPI, PVM
- Implicit parallel programming:
  - parallel functional programming: ML, ...
  - concurrent object-oriented programming: COOL, ...
  - data-parallel programming: Fortran 90, HPF, ...
Cluster Computing
A cluster system is:
- a parallel multicomputer built from high-end PCs and a conventional high-speed network
- a platform that supports parallel programming
Cluster Computing (cont.)
Applications:
- Scientific computing: simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data
- Super server systems: scalable internet/web servers, database servers, multimedia/video/audio servers
Cluster Computing (cont.)
Cluster system building blocks, from bottom to top: high-speed network, hardware, OS, single system image layer, system tool layer, application layer.
Cluster Computing (cont.)
Why cluster computing?
- Scalability: build a small system first, grow it later
- Low cost: hardware based on the COTS (components off-the-shelf) model; software based on freeware from the research community
- Easier to maintain
- Vendor independent
The End