Parallel Computing
Department of Computer Engineering, Ferdowsi University
Hossain Deldari
Lecture organization
- Parallel processing
- Supercomputers and parallel computers
- Amdahl's Law, speedup, efficiency
- Parallel machine architecture
- Computational models
- Concurrency approaches
- Parallel programming
- Cluster computing
What is Parallel Processing?
Parallel processing is the division of work into smaller tasks and the assignment of those tasks to multiple workers that execute them simultaneously: multiple processors run different parts of the same program at the same time.
Difficulties: coordinating, controlling, and monitoring the workers.
The main goals of parallel processing are:
- to solve much bigger problems much faster
- to reduce the wall-clock execution time of computer programs
- to increase the size of computational problems that can be solved
Supercomputer & parallel computer
What is a Supercomputer? A supercomputer is a computer that is much faster than the computers ordinary people use. Note that this is a time-dependent definition.
TOP500 list, June 1993 (top entry):
Manufacturer: TMC; Computer/Procs: CM-5/1024; Rmax: 59.70; Rpeak: 131.00; Installation site: Los Alamos National Laboratory; Country/Year: USA/1993
TOP500 list, June 2003 (top entry):
Manufacturer: NEC; Computer/Procs: Earth-Simulator/5120; Rmax: 35860.00; Rpeak: 40960.00; Installation site: Earth Simulator Center; Country/Year: Japan
Rmax: maximal LINPACK performance achieved
Rpeak: theoretical peak performance
LINPACK is a benchmark.
Amdahl's Law, Speedup, Efficiency
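In its usual statement, Amdahl's Law says that if a fraction f of a program's work can be parallelized across n processors while the rest stays serial, the speedup is:

```latex
S(n) = \frac{T(1)}{T(n)} = \frac{1}{(1-f) + f/n},
\qquad \lim_{n \to \infty} S(n) = \frac{1}{1-f}
```

So even with unlimited processors, the speedup never exceeds 1/(1-f): with f = 0.9, no machine can run the program more than 10 times faster.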
Efficiency
Efficiency is a measure of the fraction of time that a processor spends performing useful work.
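With speedup S(n) = T(1)/T(n) on n processors, efficiency is commonly written:

```latex
E(n) = \frac{S(n)}{n} = \frac{T(1)}{n \, T(n)}
```

E(n) = 1 means every processor does useful work all the time; values below 1 reflect idle time and parallel overhead.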
Shunt Operation
Parallel and Distributed Computers
- SIMD
- MIMD
- MISD
- Clusters
SIMD (Single Instruction, Multiple Data)
MISD (Multiple Instruction, Single Data)
MIMD (Multiple Instruction, Multiple Data)
MIMD (cont.)
Parallel machine architecture
- Shared memory model
  - Bus-based
  - Switch-based
  - NUMA
- Distributed memory model
- Distributed shared memory model
  - Page-based
  - Object-based
  - Hardware
Shared memory model
Shared memory model (cont.)
- Also known as a multiprocessor
- OpenMP is a standard API for shared-memory programming in C/C++/Fortran
- Advantage: easy programming
- Disadvantages: design complexity; not scalable
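OpenMP itself targets C/C++/Fortran; as a minimal language-agnostic sketch of the shared-memory idea (not from the slides), here several Python threads cooperate by reading and writing the same arrays directly, with no explicit communication:

```python
import threading

# Shared address space: every thread sees the same lists.
data = list(range(8))
result = [0] * len(data)

def square_chunk(lo, hi):
    # Each worker reads and writes shared memory directly;
    # no messages are exchanged. Workers touch disjoint slices,
    # so no lock is needed here.
    for i in range(lo, hi):
        result[i] = data[i] * data[i]

threads = [threading.Thread(target=square_chunk, args=(i, i + 2))
           for i in range(0, len(data), 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

The ease of programming (just index a shared array) and the hazard (two writers to the same slot would race) are both visible in this style.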
Bus-based shared memory model
- The bus is a bottleneck
- Not scalable
Switch-based shared memory model
- Maintenance is difficult
- Expensive
- Scalable
NUMA model
NUMA stands for Non-Uniform Memory Access.
- Simulated shared memory
- Better scalability
Distributed memory model
- Also called a multicomputer
- Programmed with MPI (Message Passing Interface)
- Easy design, low cost, high scalability
- Difficult programming
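As a minimal sketch of the message-passing style (MPI itself is a library for C/C++/Fortran; this Python example only imitates the idea with threads and queues), the worker below has no shared variables with the sender: data moves only through explicit send/receive operations, as it would between nodes of a multicomputer:

```python
import queue
import threading

def worker(inbox, outbox):
    # Communicate only through explicit messages, never shared
    # variables -- mimicking a separate address space.
    n = inbox.get()            # "receive" the work item
    outbox.put(sum(range(n)))  # "send" the partial result back

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

inbox.put(10)          # send work to the worker
answer = outbox.get()  # receive the result: 0+1+...+9 = 45
t.join()
print(answer)  # -> 45
```

The "difficult programming" of the slide shows up here: every data movement must be coded explicitly, but nothing about the pattern limits scalability.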
Examples of Network Topology
- Linear array
- Ring
- Mesh
- Fully connected
Examples of Network Topology (cont.)
Hypercubes (d = 4): each of the 2^d nodes carries a d-bit binary label (0000 through 1111), and two nodes are connected exactly when their labels differ in a single bit.
Distributed shared memory model
- Simpler abstraction: sharing data is easier
- Portability
- Easy design with easy programming
- Low performance (when communication is heavy)
Parallel and Distributed Architecture (Leopold, 2001)
Architectures from SIMD through SMP and NUMA to clusters can be ordered along three axes:
- Degree of coupling: from tight (SIMD, SMP) to loose (cluster)
- Supported grain sizes: from fine to coarse
- Communication speed: from fast to slow
Computational Models
- RAM
- PRAM
- BSP
- LogP
- MPI
RAM Model
PRAM Model
Parallel Random Access Machine: processors P1 ... Pp, each with its own private memory, operate under a common control on a shared global memory in a synchronized read-compute-write cycle.
Variants differ in how concurrent access to the same memory cell is resolved: EREW, ERCW, CREW, CRCW (E = exclusive, C = concurrent; R = read, W = write).
Bulk Synchronous Parallel (BSP) Model
- A generalization of the PRAM model
- Processor-memory pairs connected by a communication network
- Computation proceeds in supersteps: processes execute local work, communicate, then barrier-synchronize
BSP Cost Model
Cost of a superstep = w + max(hs, hr) * g + l
- w: maximum number of local operations
- hs: maximum number of packets sent
- hr: maximum number of packets received
- g: communication throughput (cost per packet)
- l: synchronization latency
- p: number of processors
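The formula above can be evaluated directly; the parameter values below are invented for illustration:

```python
def bsp_superstep_cost(w, hs, hr, g, l):
    # One BSP superstep: local work w, then the dominant
    # communication volume max(hs, hr) scaled by the per-packet
    # cost g, then the barrier synchronization cost l.
    return w + max(hs, hr) * g + l

# Hypothetical superstep: 1000 local ops, 20 packets sent,
# 5 received, g = 4 time units per packet, barrier latency l = 50.
print(bsp_superstep_cost(1000, 20, 5, 4, 50))  # -> 1130
```

Note that only the larger of hs and hr matters: sending and receiving overlap, so the busier direction determines the communication term.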
LogP Model
- Closely related to BSP, but it models asynchronous execution.
New parameters:
- L: message latency
- o: overhead, defined as the length of time that a processor is engaged in the transmission or reception of each message; during this time the processor cannot perform other operations
- g: gap, defined as the minimum time interval between consecutive message transmissions or receptions; the reciprocal of g corresponds to the available per-processor bandwidth
- P: the number of processor/memory modules
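A small numeric sketch of how the LogP parameters combine (the parameter values are invented; the k-message formula assumes the sender does nothing else between sends):

```python
def logp_message_time(L, o):
    # End-to-end time for one small message under LogP:
    # send overhead + network latency + receive overhead.
    return o + L + o

def logp_k_messages(k, L, o, g):
    # Time for one processor to deliver k back-to-back messages:
    # successive sends are spaced by at least max(o, g); the last
    # message still needs L + o to arrive and be absorbed.
    return (k - 1) * max(o, g) + o + L + o

print(logp_message_time(L=10, o=2))        # -> 14
print(logp_k_messages(4, L=10, o=2, g=5))  # -> 29
```

With k = 1 the second formula reduces to the first, which is a quick sanity check on the model.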
LogP (cont.)
MPI (Message Passing Interface)
What is MPI?
- A message-passing library specification
  - a message-passing model
  - not a compiler specification
  - not a specific product
- For parallel computers, clusters, and heterogeneous networks
- Full-featured
- Designed to permit (unleash?) the development of parallel software libraries
- Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers
MPI Layer
Task 1 on Node 1 and Task 2 on Node 2 communicate virtually at the application level; the real communication happens one layer down, between the MPI/communication layers of the two nodes.
Matrix Multiplication Example
PRAM Matrix Multiplication
Cost of the PRAM algorithm
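One standard CREW PRAM analysis (a textbook result, stated here as a sketch; the slide's own derivation may differ): with p = n^3 processors, all n^3 scalar products are formed in constant time, and each of the n^2 entry sums is then reduced in O(log n) parallel steps:

```latex
c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad
T(n) = O(\log n) \text{ with } p = n^{3}, \qquad
\text{cost} = p \cdot T(n) = O(n^{3} \log n)
```

The cost exceeds the O(n^3) work of the sequential algorithm by a log n factor, so this algorithm is fast but not cost-optimal.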
BSP Matrix Multiplication
Cost of the algorithm
Concurrency Approach
- Control parallel
- Data parallel
Control Parallel
Data Parallel
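As a minimal sketch (not from the slides): in the data-parallel style the same operation is applied to every element of a data set, here with a Python thread pool standing in for the processors; a control-parallel program would instead run different operations concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # The same operation is applied independently to every
    # element -- the essence of the data-parallel style.
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(square, range(8)))

print(result)  # -> [0, 1, 4, 9, 16, 25, 36, 49]
```

Because the elements are independent, the runtime is free to split them across workers however it likes, which is why data parallelism scales so naturally with problem size.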
The best granularity for programming
Parallel Programming
- Explicit parallel programming: Occam, MPI, PVM
- Implicit parallel programming:
  - parallel functional programming: ML, ...
  - concurrent object-oriented programming: COOL, ...
  - data-parallel programming: Fortran 90, HPF, ...
Cluster Computing
A cluster system is:
- a parallel multicomputer built from high-end PCs and a conventional high-speed network
- a platform that supports parallel programming
Cluster Computing (cont.)
Applications:
- Scientific computing: simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data
- Super server systems: scalable internet/web servers, database servers, multimedia/video/audio servers
Cluster Computing (cont.)
Cluster system building blocks, from bottom to top: high-speed network, hardware, OS, single system image layer, system tool layer, application layer.
Cluster Computing (cont.)
Why cluster computing?
- Scalability: build a small system first, grow it later
- Low cost: hardware based on the COTS (components off-the-shelf) model; software based on freeware from the research community
- Easier to maintain
- Vendor independent
The End