Introduction to Parallel Processing
Debbie Hui
CS 147 – Prof. Sin-Min Lee
7/11/2001



Parallel Processing
- Parallelism in Uniprocessor Systems
- Organization of Multiprocessor Systems

Parallelism in Uniprocessor Systems
A computer achieves parallelism when it performs two or more unrelated tasks simultaneously.

Uniprocessor Systems
A uniprocessor system may incorporate parallelism using:
- an instruction pipeline
- a fixed or reconfigurable arithmetic pipeline
- I/O processors
- vector arithmetic units
- multiport memory

Uniprocessor Systems
Instruction pipeline:
- Overlaps the fetching, decoding, and execution of instructions
- Allows the CPU to execute one instruction per clock cycle
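As a quick illustration (a sketch added here, not part of the original slides), the cycle counts below show why overlapping fetch, decode, and execute lets throughput approach one instruction per clock cycle:

```python
def cycles_sequential(n_instructions, n_stages):
    # Without pipelining, each instruction occupies all stages in turn.
    return n_instructions * n_stages

def cycles_pipelined(n_instructions, n_stages):
    # With overlap, the first instruction takes n_stages cycles to fill
    # the pipeline; after that, one instruction completes every cycle.
    return n_stages + (n_instructions - 1)

# 100 instructions through a 3-stage fetch/decode/execute pipeline:
print(cycles_sequential(100, 3))  # 300
print(cycles_pipelined(100, 3))   # 102
```

For long instruction streams the pipelined count approaches n cycles for n instructions, i.e. one instruction per clock.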

Uniprocessor Systems
Reconfigurable Arithmetic Pipeline:
- Better suited for general-purpose computing
- Each stage has a multiplexer at its input
- The control unit of the CPU sets the multiplexers' select signals to configure the pipeline
- Problem: although arithmetic pipelines can perform many iterations of the same operation in parallel, they cannot perform different operations simultaneously.

Uniprocessor Systems
Vectored Arithmetic Unit:
- Provides a solution to the reconfigurable arithmetic pipeline's problem
- Purpose: to perform different arithmetic operations in parallel

Uniprocessor Systems
Vectored Arithmetic Unit (cont.):
- Contains multiple functional units; some perform addition, some subtraction, etc.
- Input and output switches are needed to route the proper data to their proper destinations
- The switches are set by the control unit

Uniprocessor Systems
Vectored Arithmetic Unit (cont.):
How do we get all that data to the vector arithmetic unit? By transferring several data values simultaneously, using:
- multiple buses
- very wide data buses

Uniprocessor Systems
Improving performance by allowing multiple, simultaneous memory accesses:
- Requires multiple address, data, and control buses (one set for each simultaneous memory access)
- The memory chip has to be able to handle multiple transfers simultaneously

Uniprocessor Systems
Multiport Memory:
- Has two sets of address, data, and control pins to allow simultaneous data transfers to occur
- The CPU and DMA controller can transfer data concurrently
- A system with more than one CPU could handle simultaneous requests from two different processors

Uniprocessor Systems
Multiport Memory (cont.):
- Can: handle two simultaneous requests to read data from the same location
- Cannot: process two simultaneous requests to write data to the same memory location, or a simultaneous read and write of the same memory location
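The access rules above can be sketched as a small arbitration check. This is a hypothetical helper added for illustration, not the actual hardware logic:

```python
def can_service(req_a, req_b):
    """Each request is (operation, address), with operation 'read' or 'write'.
    Returns True if a dual-port memory can service both at once."""
    op_a, addr_a = req_a
    op_b, addr_b = req_b
    if addr_a != addr_b:
        return True                           # different locations never conflict
    return op_a == "read" and op_b == "read"  # same location: two reads only

print(can_service(("read", 0x10), ("read", 0x10)))    # True
print(can_service(("write", 0x10), ("write", 0x10)))  # False
print(can_service(("read", 0x10), ("write", 0x10)))   # False
```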

Organization of Multiprocessor Systems
Three different ways to organize/classify systems:
- Flynn's Classification
- System Topologies
- MIMD System Architectures

Multiprocessor Systems: Flynn's Classification
- Based on the flow of instructions and data
- A computer is classified by:
  - whether it processes a single instruction at a time or multiple instructions simultaneously
  - whether it operates on one or multiple data sets

Multiprocessor Systems: Flynn's Classification
Four categories:
- SISD: single instruction, single data
- SIMD: single instruction, multiple data
- MISD: multiple instruction, single data **
- MIMD: multiple instruction, multiple data

** The MISD classification is not practical to implement; in fact, no significant MISD computers have ever been built. It is included only for completeness.

Multiprocessor Systems: Flynn's Classification
Single instruction, single data (SISD):
- Consists of a single CPU executing individual instructions on individual data values

Multiprocessor Systems: Flynn's Classification
Single instruction, multiple data (SIMD):
- Executes a single instruction on multiple data values simultaneously, using many processors
- Since only one instruction is processed at any given time, it is not necessary for each processor to fetch and decode the instruction; this task is handled by a single control unit that sends the control signals to each processor
- Example: array processor
(Figure: a control unit and main memory driving an array of processors with local memories, linked by a communications network)
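The SIMD idea can be sketched in a few lines (an illustrative model added here, not from the slides): one decoded instruction is broadcast, and every processing element applies it to its own datum without fetching or decoding anything itself.

```python
def simd_broadcast(instruction, data):
    # One control unit broadcasts one decoded instruction; each
    # "processor" applies it to its own data value in lockstep.
    return [instruction(x) for x in data]

def add_five(x):
    return x + 5

# Single instruction ("add 5"), multiple data values:
print(simd_broadcast(add_five, [1, 2, 3, 4]))  # [6, 7, 8, 9]
```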

Multiprocessor Systems: Flynn's Classification
Multiple instruction, multiple data (MIMD):
- Executes different instructions simultaneously
- Each processor must include its own control unit
- The processors can be assigned to parts of the same task or to completely separate tasks
- Examples: multiprocessors, multicomputers

Multiprocessor Systems: System Topologies
- The topology of a multiprocessor system refers to the pattern of connections between its processors
- Quantified by standard metrics:
  - Diameter: the maximum distance between two processors in the computer system
  - Bandwidth: the capacity of a communications link multiplied by the number of such links in the system (best case)
  - Bisection bandwidth: the total bandwidth of the links connecting the two halves of the processors, with the split chosen so that the number of links between the two halves is minimized (worst case)

Multiprocessor Systems: System Topologies
Six categories of system topologies:
- Shared bus
- Ring
- Tree
- Mesh
- Hypercube
- Completely connected

Multiprocessor Systems: System Topologies
Shared bus:
- The simplest topology
- Processors communicate with each other exclusively via this bus
- Can handle only one data transmission at a time
- Can be easily expanded by connecting additional processors to the shared bus, along with the necessary bus arbitration circuitry
(Figure: processors with local memories attached to a shared bus and global memory)

Multiprocessor Systems: System Topologies
Ring:
- Uses direct dedicated connections between processors
- Allows all communication links to be active simultaneously
- A piece of data may have to travel through several processors to reach its final destination
- All processors must have two communication links
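The diameter metric defined earlier (the maximum distance between any two processors) can be computed for any topology with a breadth-first search. This is an illustrative sketch added here, shown for a ring:

```python
from collections import deque

def diameter(adjacency):
    """Maximum shortest-path distance over all pairs of processors."""
    def eccentricity(start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        return max(dist.values())
    return max(eccentricity(p) for p in adjacency)

# Ring of 8 processors: each has exactly two links, to its neighbours.
n = 8
ring = {p: [(p - 1) % n, (p + 1) % n] for p in range(n)}
print(diameter(ring))  # 4, i.e. floor(n / 2)
```

The same function works for a mesh, tree, or hypercube once you supply its adjacency lists.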

Multiprocessor Systems: System Topologies
Tree topology:
- Uses direct connections between processors; each processor has three connections
- Its primary advantage is its relatively low diameter
- Example: DADO computer

Multiprocessor Systems: System Topologies
Mesh topology:
- Every processor connects to the processors above, below, left, and right
- Left-to-right and top-to-bottom wraparound connections may or may not be present

Multiprocessor Systems: System Topologies
Hypercube:
- A multidimensional mesh
- Has n processors, each with log n connections
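The hypercube's structure has a neat binary encoding, sketched below (an illustration added here, not from the slides): label the n = 2^d processors with d-bit numbers, and connect two processors exactly when their labels differ in one bit. Each node then has d = log n links, and the distance between two nodes is the number of differing bits, so the diameter is log n.

```python
def hypercube_neighbours(label, d):
    # Flip each of the d bits in turn: one neighbour per dimension.
    return [label ^ (1 << bit) for bit in range(d)]

d = 3  # an 8-processor (3-dimensional) hypercube
print(hypercube_neighbours(0b000, d))  # [1, 2, 4]: log n = 3 links per node

# Distance between opposite corners = number of differing bits = log n:
print(bin(0b000 ^ 0b111).count("1"))   # 3
```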

Multiprocessor Systems: System Topologies
Completely connected:
- Every processor has n − 1 connections, one to each of the other processors
- The complexity of the processors increases as the system grows
- Offers maximum communication capabilities

Multiprocessor Systems: System Topologies

Topology               Diameter          Bandwidth              Bisection bandwidth
Shared bus             1                 l                      1 * l
Ring                   floor(n/2)        n * l                  2 * l
Tree                   2 * floor(lg n)   (n − 1) * l            1 * l
Mesh *                 2(sqrt(n) − 1)    (2n − 2 sqrt(n)) * l   sqrt(n) * l
Mesh **                sqrt(n)           2n * l                 2 sqrt(n) * l
Hypercube              lg n              (n/2) * lg n * l       (n/2) * l
Completely connected   1                 (n/2) * (n − 1) * l    ceil(n/2) * floor(n/2) * l

* Without wraparound    ** With wraparound
l = bandwidth of a single link, n = number of processors

Multiprocessor Systems: MIMD System Architectures
- The architecture of an MIMD system refers to its connections with respect to system memory
- Two classes: multiprocessors and multicomputers

Multiprocessor Systems: MIMD System Architectures
Symmetric multiprocessor (SMP):
- A computer system that has two or more processors with comparable capabilities
- Four different types:
  - uniform memory access (UMA)
  - nonuniform memory access (NUMA)
  - cache-coherent NUMA (CC-NUMA)
  - cache-only memory access (COMA)

Multiprocessor Systems: MIMD System Architectures
Uniform memory access (UMA):
- Gives all CPUs equal (uniform) access to all shared memory locations
- Each processor may have its own cache memory, not directly accessible by the other processors
(Figure: processors 1..n reaching shared memory through a common communications mechanism)

Multiprocessor Systems: MIMD System Architectures
Nonuniform memory access (NUMA):
- Does not give uniform access to all shared memory locations
- All processors can still access all shared memory locations, but each processor can access the memory module closest to it faster than the other modules
(Figure: processors 1..n, each paired with a local memory module, joined by a communications mechanism)
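A toy access-time model captures the NUMA property above. The latency numbers here are invented for illustration, not taken from the slides or any real machine:

```python
LOCAL_NS, REMOTE_NS = 10, 40  # assumed latencies for illustration only

def access_time(processor, memory_module):
    # Every processor can reach every module (shared address space),
    # but its own local module is faster than a remote one.
    return LOCAL_NS if processor == memory_module else REMOTE_NS

print(access_time(0, 0))  # 10: processor 0 reading its local module
print(access_time(0, 2))  # 40: processor 0 reading a remote module
```

Under UMA, by contrast, `access_time` would return the same value for every (processor, module) pair.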

Multiprocessor Systems: MIMD System Architectures
Cache-coherent NUMA (CC-NUMA):
- Similar to NUMA, except each processor includes cache memory
- The cache can buffer data from memory modules that are not local to the processor, which can reduce the access time of memory transfers
- Creates a problem when two or more caches hold the same piece of data
- A solution to this problem is cache-only memory access (COMA)

Multiprocessor Systems: MIMD System Architectures
Cache-only memory access (COMA):
- Each processor's local memory is treated as a cache
- When the processor requests data that is not in its cache (local memory), the system loads that data into local memory as part of the memory operation

Multiprocessor Systems: MIMD System Architectures
Multicomputer:
- An MIMD machine in which the processors are not all under the control of one operating system
- Each processor or group of processors is under the control of a different operating system, or a different instantiation of the same operating system
- Two different types:
  - network or cluster of workstations (NOW or COW)
  - massively parallel processor (MPP)

Multiprocessor Systems: MIMD System Architectures
Network of workstations (NOW) or cluster of workstations (COW):
- More than just a group of workstations on a local area network (LAN)
- Has a master scheduler, which matches tasks and processors together

Multiprocessor Systems: MIMD System Architectures
Massively parallel processor (MPP):
- Consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications
- The processors communicate with each other using shared memory
- Example: IBM's Blue Gene computer

Thank you! Any questions?