Classification of parallel computers. Limitations of parallel processing.


Classification of parallel computers
Limitations of parallel processing

Classification of parallel computers
In 1966 M. J. Flynn proposed an informal classification of computer parallelism based on the number of simultaneous instruction streams and data streams that can be distinguished during the operation of a computer system.

Classification of parallel computers
SISD – Single Instruction stream, Single Data stream
Conventional (von Neumann) architectures; vector computers?
[Figure: a single CU sends one instruction stream (IS) to a single PU, which exchanges one data stream (DS) with a memory module (MM). Legend: PU – Processing Unit, CU – Control Unit, MM – Memory Module, IS – Instruction Stream, DS – Data Stream]

Classification of parallel computers
SIMD – Single Instruction stream, Multiple Data streams
Vector computers? Array computers
[Figure: one CU broadcasts a single instruction stream (IS) to processing elements PE 1 … PE n, each working on its own data stream (DS 1 … DS n) with memory modules MM 1 … MM m.]

Classification of parallel computers
MISD – Multiple Instruction streams, Single Data stream
Nonexistent; systolic arrays, pipelining?
[Figure: control units CU 1 … CU n issue separate instruction streams IS 1 … IS n to processing units PU 1 … PU n, all of which operate on a single data stream (DS) passing through memory modules MM 1 … MM m.]

Classification of parallel computers
MIMD – Multiple Instruction streams, Multiple Data streams
Multiprocessor systems; multicomputer systems
[Figure: control units CU 1 … CU n issue instruction streams IS 1 … IS n to processing elements PE 1 … PE n, each with its own data stream (DS 1 … DS n) to memory modules MM 1 … MM m.]

Classification of parallel computers – M. J. Flynn
Advantages of the classification: simplicity
Disadvantages: it does not cover all solutions and classes; MISD is an empty class; MIMD is an overloaded class

Classification of parallel computers according to sources of parallelism
Data-level parallelism – the same operation performed on multiple data units
Instruction-level parallelism (low-level parallelism) – instruction pipelining; multiple processor functional units; data-flow analysis / out-of-order execution / branch prediction
Process/thread-level parallelism (high-level parallelism)

Classification of parallel computers according to sources of parallelism
[Figure: execution timelines illustrating data-level parallelism (DLP), instruction-level parallelism (ILP), and thread-level parallelism (TLP).]
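Of these three sources, data-level and thread-level parallelism are directly visible in source code, while instruction-level parallelism is mostly extracted by the compiler and hardware. A minimal C/OpenMP sketch contrasting DLP and TLP (the array size and contents are illustrative; compile with gcc -fopenmp):

    #include <stdio.h>

    #define N 1000

    int main(void) {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* Data-level parallelism: one operation applied to many elements
           (the compiler may emit SIMD/vector instructions). */
        #pragma omp simd
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        /* Thread-level parallelism: loop iterations split across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[N-1] = %.1f\n", c[N - 1]);
        return 0;
    }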

First mechanisms of parallel processing
Evolution of I/O functions: interrupts, DMA, I/O processors
Development of memory: virtual memory, cache memory
Multiple ALUs (IBM 360, CDC 6600)

First mechanisms of parallel processing
Pipelining: pipelined control unit (instruction pipelining); pipelined arithmetic-logic unit (arithmetic pipelining)

Limitations of parallel processing
How much faster will my program run on a parallel computer than on a machine without any parallel-processing mechanisms (a uniprocessor)?

Time and processor complexity
Given an algorithm and input data of size n:
Time complexity T(n) is the number of time steps needed to execute the algorithm for the given input of size n
Processor complexity P(n) is the number of processors used in the execution of the algorithm for the given input of size n
T(p,n) is the number of time steps needed to execute the algorithm on p processors for the given input of size n

Speedup
Speedup S(p,n) is the factor of acceleration obtained by going from sequential execution of an algorithm on one processor to parallel execution of the parallel algorithm on p processors:
S(p,n) = T*(1,n) / T(p,n)

Speedup
T*(1,n) is the execution time of the best known sequential algorithm
Typically 1 ≤ S(p,n) ≤ p
S(p,n) = p – ideal speedup
S(p,n) > p – superlinear speedup

[Figure: summing the array A[0..7] – sequential accumulation on a single processor P0 (S = A[0]; S += A[1]; …; S += A[7]) versus a tree-structured parallel reduction on processors P0 … P7, shown against a common time axis.]
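The reduction in the figure can be sketched in C with OpenMP, whose reduction clause combines per-thread partial sums (typically in a tree, as above). Array length and values are illustrative; compile with gcc -fopenmp:

    #include <stdio.h>

    #define N 8

    int main(void) {
        int a[N] = {1, 2, 3, 4, 5, 6, 7, 8};  /* illustrative values */
        long s = 0;
        /* Each thread accumulates a private partial sum; OpenMP then
           combines the partial sums into s. */
        #pragma omp parallel for reduction(+:s)
        for (int i = 0; i < N; i++)
            s += a[i];
        printf("sum = %ld\n", s);
        return 0;
    }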

Cost
Cost is the execution time multiplied by the number of processors used:
Cost(p,n) = p · T(p,n)
A cost-optimal algorithm is one for which the cost of solving a problem on a parallel system is proportional to the cost on a single processor.
For example, summing n numbers sequentially takes Θ(n) operations, while a tree reduction on p = n processors takes Θ(log n) steps at a cost of Θ(n log n) – faster, but not cost-optimal.

[Figure repeated from the Speedup slide: sequential versus tree-structured parallel summation of A[0..7].]

Efficiency
Efficiency relates the speedup to the number of processors used:
E(p,n) = S(p,n) / p
Efficiency represents the fraction of time for which a processor does useful work.

[Figure repeated from the Speedup slide: sequential versus tree-structured parallel summation of A[0..7].]
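Together, the three measures follow directly from measured runtimes. A minimal sketch; the timings t1 and tp and the processor count p below are hypothetical values for illustration:

    #include <stdio.h>

    int main(void) {
        double t1 = 10.0;   /* T*(1,n): best sequential time, seconds (assumed) */
        int    p  = 8;      /* number of processors (assumed) */
        double tp = 1.6;    /* T(p,n): parallel time, seconds (assumed) */

        double s = t1 / tp; /* speedup    S(p,n) = T*(1,n) / T(p,n) */
        double c = p * tp;  /* cost       Cost(p,n) = p * T(p,n)    */
        double e = s / p;   /* efficiency E(p,n) = S(p,n) / p       */

        printf("S = %.2f  Cost = %.2f s*proc  E = %.2f\n", s, c, e);
        return 0;
    }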

Amdahl’s Law
Assuming a constant problem size, what is the maximum speedup that can be obtained with the use of parallel processing?

Amdahl’s Law
[Figure: a program with a sequential part of fraction f and a perfectly parallelizable part of fraction 1-f. For p = 1 the total time is f + (1-f) = 1; for p = 2 the parallel part shrinks to (1-f)/2, for p = 4 to (1-f)/4, while the sequential part f remains constant.]

Amdahl’s Law
S(p) = 1 / (f + (1-f)/p)
The sequential fraction f bounds the achievable speedup: as p → ∞, S(p) → 1/f.
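A minimal sketch that tabulates Amdahl’s bound for an assumed sequential fraction f = 0.1:

    #include <stdio.h>

    /* Amdahl's law: S(p) = 1 / (f + (1 - f) / p). */
    int main(void) {
        double f = 0.1;  /* assumed sequential fraction: 10% */
        for (int p = 1; p <= 1024; p *= 2)
            printf("p = %4d   S(p) = %6.2f\n",
                   p, 1.0 / (f + (1.0 - f) / p));
        printf("p -> infinity: S -> 1/f = %.1f\n", 1.0 / f);
        return 0;
    }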

Gustafson’s Law
Is the reduced execution time obtained thanks to parallel processing always the highest priority? What about the increased amount of work that can be performed, thanks to parallel processing, within the same period of time?

Gustafson’s Law
[Figure: a scaled workload with a sequential part of fraction f and a perfectly parallelizable part, executed in constant time. For p = 1 the parallel part is 1-f; for p = 2 and p = 4 the amount of parallel work grows in proportion to p while the elapsed time stays the same.]

Gustafson’s Law
S(p) = f + (1-f)·p
With the problem size scaled to the machine, the speedup grows nearly linearly with the number of processors.
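A matching sketch for Gustafson’s scaled speedup, again with an assumed sequential fraction f = 0.1:

    #include <stdio.h>

    /* Gustafson's law: scaled speedup S(p) = f + (1 - f) * p. */
    int main(void) {
        double f = 0.1;  /* assumed sequential fraction of the scaled workload */
        for (int p = 1; p <= 1024; p *= 2)
            printf("p = %4d   S(p) = %7.1f\n", p, f + (1.0 - f) * p);
        return 0;
    }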

Grain of parallelism
Fine-grained problems: the computation decomposes into many small tasks that communicate and synchronize frequently
Coarse-grained problems: the computation decomposes into fewer, larger tasks that communicate rarely

Loosely & tightly coupled design approach
The loosely versus tightly coupled design approach describes the degree of internal coupling between the components of a computer system. This degree of coupling corresponds to the overhead associated with communication, as well as to the potential to scale up a given design.