Parallel Computers Chapter 1.

Prepared 7/28/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.

The Demand for Computational Speed: Introduction. There is a continual demand for greater computational speed from computer systems than is currently possible. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a "reasonable" time period.

The Demand for Computational Speed: Grand Challenge Problems. A Grand Challenge problem is one that cannot be solved in a reasonable amount of time with today's computers; an execution time of ten years, for example, is clearly unreasonable. Examples: modeling large DNA structures, global weather forecasting, and modeling the motion of astronomical bodies.

Grand Challenge Problems: Weather Forecasting. The atmosphere is modeled by dividing it into three-dimensional cells, and the calculation for each cell is repeated many times to model the passage of time.

Example: Global Weather Forecasting. Suppose the whole global atmosphere is divided into cells of size 1 mile x 1 mile x 1 mile to a height of 10 miles (10 cells high), giving about 5 x 10^8 cells. Suppose each calculation requires 200 floating point operations; then 10^11 floating point operations are necessary in one time step. To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds, or over 10 days. To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 x 10^12 floating point operations/s). A short check of these figures follows below.
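
The arithmetic can be verified with a short calculation; every input below is a figure quoted on the slide above, not a measurement:

    /* Back-of-the-envelope check of the weather-forecasting figures above. */
    #include <stdio.h>

    int main(void) {
        double cells      = 5e8;              /* ~1-mile cells, 10 cells high       */
        double flop_cell  = 200.0;            /* floating point operations per cell */
        double steps      = 7.0 * 24 * 60;    /* 7 days at 1-minute intervals       */
        double total_flop = cells * flop_cell * steps;   /* ~1e15 operations        */
        double machine    = 1e9;              /* a 1 Gflops computer                */

        printf("total operations: %.2e\n", total_flop);
        printf("time at 1 Gflops: %.1f days\n", total_flop / machine / 86400.0);
        printf("rate needed for a 5-minute forecast: %.2e flops\n",
               total_flop / (5.0 * 60.0));
        return 0;
    }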

Example: Modeling the Motion of Astronomical Bodies. Each body is attracted to every other body by gravitational forces, and the movement of each body is predicted by calculating the total force acting on it. With N bodies there are N - 1 forces to calculate for each body, or approximately N^2 calculations in all (N log2 N for an efficient approximate algorithm). After the new positions of the bodies are determined, the calculations are repeated; a sketch of the basic force loop follows below.
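
A minimal sketch of the O(N^2) force step just described, assuming plain Newtonian gravity; the Body struct and the sample values in main are illustrative only:

    #include <math.h>
    #include <stdio.h>

    typedef struct { double x, y, z, mass, fx, fy, fz; } Body;

    /* O(N*N) step: every body feels a gravitational pull from every other body. */
    void compute_forces(Body *b, int n) {
        const double G = 6.674e-11;             /* gravitational constant */
        for (int i = 0; i < n; i++) {
            b[i].fx = b[i].fy = b[i].fz = 0.0;
            for (int j = 0; j < n; j++) {       /* N-1 partial forces per body */
                if (j == i) continue;
                double dx = b[j].x - b[i].x, dy = b[j].y - b[i].y, dz = b[j].z - b[i].z;
                double r  = sqrt(dx*dx + dy*dy + dz*dz);
                double f  = G * b[i].mass * b[j].mass / (r * r * r);
                b[i].fx += f * dx;  b[i].fy += f * dy;  b[i].fz += f * dz;
            }
        }
    }

    int main(void) {
        Body b[3] = { {0, 0, 0, 1e24, 0, 0, 0},
                      {1e7, 0, 0, 1e22, 0, 0, 0},
                      {0, 2e7, 0, 1e22, 0, 0, 0} };
        compute_forces(b, 3);
        printf("force on body 0: (%g, %g, %g) N\n", b[0].fx, b[0].fy, b[0].fz);
        return 0;
    }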

Example: Modeling the Motion of Astronomical Bodies. A galaxy might have, say, 10^11 stars. Even if each calculation is done in 1 ms (an extremely optimistic figure), it takes on the order of 10^9 years for one iteration using the N^2 algorithm, and almost a year for one iteration using an efficient N log2 N approximate algorithm.

Example: Modeling the Motion of Astronomical Bodies. Sample output could look like this: (figure not reproduced in the transcript).

Parallel Computing: What is it? Using more than one computer, or a computer with more than one processor, to solve a problem. What is the motive? Usually faster computation: the very simple idea is that n computers operating simultaneously can achieve the result n times faster, although in practice it will not be n times faster, for various reasons. Other motives include fault tolerance, a larger amount of available memory, ...

Parallel Computing History: When did this all begin? Parallel computers (computers with more than one processor) and their programming (parallel programming) have been around for more than 50 years.

Gill writes in 1958: “... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ...” Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.

2. Potential for Increased Computational Speed. Notation: throughout, the number of processors will be denoted p, and we will use the term "multicomputer" to include all parallel computer systems with more than one processor. Now, on to some definitions:

2.1 Speedup Factor. The speedup factor is S(p) = t_s / t_p, where t_s is the execution time on a single processor and t_p is the execution time on a multiprocessor. S(p) gives the increase in speed obtained by using the multiprocessor. Use the best sequential algorithm for the single-processor time; the underlying algorithm for the parallel implementation might be (and usually is) different.

2.1 Speedup Factor. The speedup factor can also be cast in terms of computational steps: S(p) = (number of computational steps using one processor) / (number of parallel computational steps using p processors). Time complexity can also be extended to parallel computations.

2.1 Speedup Factor. The maximum speedup is usually p with p processors (linear speedup). It is possible to get superlinear speedup (greater than p), but there is usually a specific reason for it, such as extra memory in the multiprocessor system or a nondeterministic algorithm. There is also the definition of efficiency, E = S(p)/p = t_s / (p x t_p).

2.2 What is the Maximum Speedup? Several factors appear as overhead in the parallel version and limit the speedup, notably: periods when not all of the processors are performing useful work and are simply idle; extra (or duplicated) computations in the parallel version; and communication time.

2.2 What is the Maximum Speedup?

2.2 What is the Maximum Speedup? If f is the fraction of the computation that must be performed serially, the speedup factor is given by S(p) = p / (1 + (p - 1)f), or equivalently 1 / (f + (1 - f)/p). This equation is known as Amdahl's law.

2.2 What is the Maximum Speedup? Even with an infinite number of processors, the maximum speedup is limited to 1/f. Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors. A quick check follows below.
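
A small sketch that tabulates Amdahl's law, S(p) = 1 / (f + (1 - f)/p), for the 5% serial fraction quoted above:

    /* Amdahl's law: with f = 0.05 the speedup approaches 1/f = 20
       no matter how many processors are used. */
    #include <stdio.h>

    int main(void) {
        double f = 0.05;                       /* serial fraction */
        int procs[] = {1, 4, 16, 64, 256, 1024};
        for (int i = 0; i < 6; i++) {
            int p = procs[i];
            double s = 1.0 / (f + (1.0 - f) / p);
            printf("p = %4d   S(p) = %6.2f\n", p, s);
        }
        return 0;
    }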

Superlinear Speedup example - Searching Searching each sub-space sequentially

Superlinear Speedup example - Searching Searching each sub-space in parallel

Superlinear Speedup example - Searching. The speed-up is then given by S(p) = (x (t_s/p) + Dt) / Dt, where the sequential search scans x complete sub-spaces and a further time Dt into the next one before finding the solution, while the parallel search finds it after only Dt.

Superlinear Speedup example - Searching. The worst case for the sequential search is when the solution is found in the last sub-space searched. Then the parallel version offers the greatest benefit, i.e. S(p) = ((p - 1)(t_s/p) + Dt) / Dt, which tends to infinity as Dt tends to zero.

Superlinear Speedup example - Searching. The least advantage for the parallel version is when the solution is found in the first sub-space of the sequential search, i.e. S(p) = Dt/Dt = 1. The actual speed-up depends upon which sub-space holds the solution, but it could be extremely large; the sketch below tabulates the possibilities.
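
A minimal sketch of the search example, assuming the simple timing model described above; t is the time to search the whole space sequentially and delta the time spent inside the sub-space that holds the solution (both names are illustrative):

    /* The space is split into p sub-spaces; the sequential search scans them
       in order, the parallel search starts all p at once. */
    #include <stdio.h>

    int main(void) {
        int    p     = 8;
        double t     = 1.0;      /* time to search the whole space sequentially */
        double delta = 0.001;    /* time into the sub-space where the solution lies */
        for (int i = 0; i < p; i++) {            /* i = index of the solution's sub-space */
            double t_seq = i * (t / p) + delta;  /* sequential: i full sub-spaces first */
            double t_par = delta;                /* parallel: found almost immediately  */
            printf("solution in sub-space %d: speed-up = %.1f\n", i, t_seq / t_par);
        }
        return 0;
    }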

2.2 What is the Maximum Speedup? One last definition is scalability. This is an overloaded word; the two types we are interested in are architecture (hardware) scalability and algorithmic scalability.

2.3 Message-Passing Computations. Message passing can be a very significant portion of the overhead of a parallel computation: t_p = t_comm + t_comp. The ratio t_comp / t_comm (the computation/communication ratio) can be used as a comparison metric for algorithms.

3. Types of Parallel Computers. Now that we agree there is some potential for speedup, let us explore how a parallel machine might be constructed. There are two principal types: the shared memory multiprocessor and the distributed memory multicomputer.

Conventional Computer. Consists of a processor executing a program stored in a (main) memory. Each main memory location is identified by its address; addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.

3.1 Shared Memory Multiprocessor System. The natural way to extend the single-processor model is to have multiple processors connected to multiple memory modules, such that each processor can access any memory module:

3.1 Shared Memory Multiprocessor System. A simplistic view of a small shared memory multiprocessor. Examples: dual Pentiums, quad Pentiums.

Quad Pentium Shared Memory Multiprocessor

Quad i7 Shared Memory Multiprocessor

Programming Shared Memory Multiprocessors. Threads: the programmer decomposes the program into individual parallel sequences (threads), each able to access variables declared outside the threads; example: Pthreads. Sequential programming language with preprocessor compiler directives to declare shared variables and specify parallelism; example: OpenMP, an industry standard that needs an OpenMP compiler (a minimal sketch follows below).
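
A minimal OpenMP sketch of the directive style just described; the array names and loop body are illustrative:

    /* The pragma asks the compiler to divide the loop iterations among
       threads; a, b and n are shared by all threads by default. */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * b[i];
        printf("ran with up to %d threads\n", omp_get_max_threads());
        return 0;
    }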

Programming Shared Memory Multiprocessors. Sequential programming language with added syntax to declare shared variables and specify parallelism; example: UPC (Unified Parallel C), which needs a UPC compiler. Parallel programming language with syntax to express parallelism, where the compiler creates executable code for each processor (not now common). Sequential programming language with a parallelizing compiler that converts it into parallel executable code (also not now common).

3.2 Message-Passing Multicomputer Complete computers connected through an interconnection network:

3.2 Message-Passing Multicomputer. Several metrics are used to compare networks: bandwidth, latency (network and message), cost, and bisection width.

3.2 Message-Passing Multicomputer. Interconnection networks: limited and exhaustive interconnections; 2- and 3-dimensional meshes; hypercubes (not now common); and, using switches: crossbars, trees, and multistage interconnection networks.

Two-dimensional array (mesh). There are also three-dimensional meshes, used in some large high-performance systems.

Three-dimensional hypercube

Four-dimensional hypercube. Hypercubes were popular in the 1980s; see the node-addressing sketch below.
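
In a d-dimensional hypercube each node is labelled with a d-bit address and is linked to the d nodes whose addresses differ from it in exactly one bit; a small sketch (the node and dimension values are arbitrary):

    /* Neighbours of a node in a d-dimensional hypercube: flip one bit at a time. */
    #include <stdio.h>

    int main(void) {
        int d = 4;            /* 4-dimensional hypercube, 16 nodes */
        int node = 5;         /* binary 0101 */
        for (int bit = 0; bit < d; bit++)
            printf("neighbour across dimension %d: %d\n", bit, node ^ (1 << bit));
        return 0;
    }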

Crossbar switch

Tree

Multistage Interconnection Network Example: Omega network

Communication Methods. There are two ways to transfer messages when routing: circuit switching and packet switching. Issues to watch out for in routing algorithms include deadlock and livelock.

3.3 Distributed Shared Memory. The main memory of a group of interconnected computers is made to look like a single memory with a single address space; shared memory programming techniques can then be used.

3.4 MIMD and SIMD Classifications. Flynn (1966) created a classification for computers based upon instruction streams and data streams. Single instruction stream, single data stream (SISD) computer: a single-processor computer in which a single stream of instructions is generated from the program and operates upon a single stream of data items.

3.4 MIMD and SIMD Classifications. Multiple instruction stream, multiple data stream (MIMD) computer: a general-purpose multiprocessor system in which each processor has a separate program, one instruction stream is generated from each program for each processor, and each instruction operates upon different data. Both the shared memory and the message-passing multiprocessors described so far are in the MIMD classification.

3.4 MIMD and SIMD Classifications. Single instruction stream, multiple data stream (SIMD) computer: a specially designed computer in which a single instruction stream comes from a single program, but multiple data streams exist. Instructions from the program are broadcast to more than one processor, and each processor executes the same instruction in synchronism but on different data. Developed because a number of important applications mostly operate upon arrays of data.

3.4 MIMD and SIMD Classifications. Multiple Program Multiple Data (MPMD) structure: within the MIMD classification, each processor has its own program to execute:

3.4 MIMD and SIMD Classifications. Single Program Multiple Data (SPMD) structure: a single source program is written and each processor executes its personal copy of this program, although independently and not in synchronism. The source program can be constructed so that parts of the program are executed by certain computers and not others, depending upon the identity of the computer; a sketch using MPI ranks follows below.
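
A minimal SPMD sketch using MPI, in which every process runs the same program and branches on its rank (its identity); the master/worker split shown is illustrative:

    /* SPMD with MPI: all processes run this program; the rank decides the role. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (rank == 0)
            printf("master: coordinating %d processes\n", nprocs);
        else
            printf("worker %d: doing my share of the computation\n", rank);

        MPI_Finalize();
        return 0;
    }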

4. Cluster Computing: Networked Computers as a Computing Platform. Networked computers became an attractive alternative to expensive supercomputers in the early 1990s. There were several early projects, including the Berkeley NOW (network of workstations) project and the NASA Beowulf project.

4. Cluster Computing: Key Advantages. Very high performance workstations and PCs are readily available at low cost; the latest processors can easily be incorporated into the system as they become available; and existing software can be used or modified.

4.2 Cluster Configurations: Existing Networked Computers, like the ECC lab. You need to be kind to the interactive users (those at the console). Problem: they can shut off one machine and disrupt your computation.

4.2 Cluster Configurations: Beowulf Clusters. A group of interconnected "commodity" computers achieving high performance at low cost, typically using commodity interconnects (high-speed Ethernet) and the Linux OS. The Beowulf project started at NASA Goddard in 1993.

4.2 Cluster Configurations: Beyond Beowulf. Cluster interconnects were originally Fast Ethernet and are now Gigabit Ethernet. More specialized interconnects include Myrinet (2.4 Gbits/sec), cLAN, SCI, QsNet, and InfiniBand; in my opinion, InfiniBand is the one to watch.

4.3 Setting up a Cluster Hardware Configuration

4.3 Setting up a Cluster: Software Tools (Middleware). PVM, developed in the late 1980s, and MPI, whose standard was defined in the 1990s.

5. Summary. This chapter introduced the following concepts: parallel computers and programming; speedup and related factors; extension of a single-processor system into a shared memory multiprocessor; the message-passing multicomputer; interconnection networks suitable for message passing; networked workstations as a parallel programming platform; and cluster computing.