CSCI 232 - Parallel Processing (© 2005 JW Ryder)
Presentation transcript:
Slide 1: Parallel Processing
– Large class of techniques used to provide simultaneous data-processing tasks
– Purpose: increase the computational speed of the computer
– A parallel processing system is able to process multiple tasks simultaneously

Slide 2: Parallel Processing
– While one instruction executes in the ALU, the next instruction is read from memory
– 2 or more ALUs, 2 or more processors
– Goals: speedup and throughput (the amount of processing that can be done in a given amount of time)
– As the amount of hardware increases, cost and complexity increase
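The slide names speedup and throughput without defining them; below is a minimal sketch of the usual definitions, with made-up numbers (my wording and values, not the lecture's):

```python
# Illustrative definitions of speedup and throughput (not from the slides).

def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of single-processor time to parallel time."""
    return t_serial / t_parallel

def throughput(tasks_completed: int, elapsed_seconds: float) -> float:
    """Amount of processing done per unit time."""
    return tasks_completed / elapsed_seconds

print(speedup(100.0, 25.0))   # 4.0x with ideal 4-way parallelism
print(throughput(400, 10.0))  # 40.0 tasks per second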

Slide 3: Parallel Processing
– Can be viewed at various levels of complexity
– Lowest level: distinguish between serial and parallel load registers
– Higher level: multiple functional units (FUs)
  – Arithmetic: adder-subtractor, integer multiplier
  – Logic: logic unit, incrementer, shifter
  – Floating point: add-subtract, multiply, divide

Slide 4: Parallel Processing Classification
– Internal organization of the processors
– Interconnection structure between processors
– Flow of information through the system
– Organization of the computer system by the number of instructions and data items that are manipulated simultaneously

Slide 5: Classifications
– Normal operation of a computer: fetch instructions from memory, then execute them in the processor
– The sequence of instructions read from memory is the instruction stream
– The operations performed on the data in the processor are the data stream
– Parallel processing may occur in the instruction stream, the data stream, or both

Slide 6: 4 Major Groups
– SISD - Single Instruction, Single Data
– SIMD - Single Instruction, Multiple Data
– MISD - Multiple Instruction, Single Data
– MIMD - Multiple Instruction, Multiple Data
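To make the SISD/SIMD distinction concrete, here is an illustrative sketch (not from the lecture); NumPy stands in for SIMD-style hardware, where a single instruction is applied to many data items at once:

```python
# Sketch contrasting SISD and SIMD execution styles (illustrative only).
import numpy as np

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# SISD style: one instruction stream operates on one data item at a time.
c_sisd = []
for x, y in zip(a, b):
    c_sisd.append(x + y)

# SIMD style: a single "add" is applied to many data items simultaneously.
c_simd = np.array(a) + np.array(b)

print(c_sisd)           # [11, 22, 33, 44]
print(c_simd.tolist())  # [11, 22, 33, 44]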

Slide 7: SISD
– A single computer containing a
  – Control unit
  – Processing unit
  – Memory unit
– Instructions are executed sequentially
– The system may or may not have internal parallel processing capabilities
  – Multiple FUs or pipelining

Slide 8: SIMD
– An organization with many processing units under the supervision of a common control unit
– All processors receive the same instruction from the control unit
– They operate on different items of data
– The shared memory unit must contain multiple modules so that it can communicate with all processors simultaneously
– Example: array processor

Slide 9: MISD
– Of theoretical interest only

Slide 10: MIMD
– A computer system capable of processing several programs at the same time
– Most multiprocessor and multicomputer systems are in this category
– Flynn's classification depends on the distinction between the performance of the control unit and the data-processing unit
– It emphasizes behavioral characteristics of the computer system rather than its operational structures and interconnections

Slide 11: Pipelining
– Pipelining does not fit into Flynn's parallel processing classification scheme
– The only two categories used here are SIMD and MIMD

Slide 12: Multiprocessors
– A multiprocessor system is an interconnection of 2 or more CPUs with memory and input-output equipment
– 'Processor' in multiprocessor can mean either a central processing unit (CPU) or an input-output processor (IOP)
– A system with a single CPU and multiple IOPs is usually not considered a multiprocessor

Slide 13: Multiprocessors / Multicomputers
– Both support concurrent operations
– Multicomputer: computers interconnected with each other by communication lines to form a computer network
  – Consists of several autonomous computers that may or may not communicate with each other
– Multiprocessor: a system controlled by one operating system that provides interaction between processors; all components in the system cooperate to solve the problem at hand

Slide 14: Multiprocessors
– Major motivation: microprocessors are cheap and small; VLSI helps make it possible too
– Improves reliability
  – If one processor fails, another can take over its functions, at some loss of efficiency
– Benefits
  – Improved system performance
  – Computations can proceed in parallel in 2 ways
    – Multiple independent jobs run in parallel
    – A single job can be partitioned into multiple parallel tasks

Slide 15: Multiprocessors
– The overall function can be partitioned into several tasks
– System tasks can be allocated to specialized processors
  – Designed for optimal performance
  – Example: one processor performs the standard tasks for an industrial process while others sense and control parameters such as temperature and flow rate
  – Example: one processor handles high-speed floating-point operations while another processes standard operations and tasks

Slide 16: Performance Improvement
– Decompose the problem into multiple discrete tasks
– The user can explicitly direct the computer to split tasks (see the sketch below)
– Or provide a compiler that automatically detects when parts of a program can be split
  – A parallelizing compiler
– Multiprocessors are classified by the way memory is organized
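A minimal sketch of explicitly partitioning a single job into parallel tasks, as the slide describes; Python's multiprocessing pool is my stand-in for the multiple processors, not anything named in the lecture:

```python
# Hypothetical sketch: one job (a large sum) split into parallel tasks.
from multiprocessing import Pool

def partial_sum(chunk):
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_tasks = 4
    step = len(data) // n_tasks
    # Explicitly partition the job into discrete tasks.
    chunks = [data[i * step:(i + 1) * step] for i in range(n_tasks)]

    with Pool(processes=n_tasks) as pool:
        total = sum(pool.map(partial_sum, chunks))

    print(total == sum(data))  # True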

Slide 17: Tightly Coupled
– A multiprocessor system with common shared memory
  – Shared-memory or tightly coupled multiprocessor
– Does not preclude each processor from having its own local memory
– Most commercial tightly coupled systems provide cache memory for each CPU
– In addition, a global common memory is provided that all CPUs can access (illustrated below)
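An illustrative sketch of the tightly coupled idea: several workers keep private local state but update one shared global location. The shared Value and explicit Lock are assumptions used as stand-ins for the global common memory and its access arbitration:

```python
# Sketch: private "local memory" plus one shared "global common memory".
from multiprocessing import Process, Value, Lock

def worker(shared_total, lock, local_items):
    local_sum = sum(local_items)         # local memory: private to this CPU
    with lock:                           # arbitrate access to the shared word
        shared_total.value += local_sum  # global memory all CPUs can reach

if __name__ == "__main__":
    total = Value("i", 0)
    lock = Lock()
    procs = [Process(target=worker, args=(total, lock, [p, p + 1]))
             for p in range(4)]
    for p in procs: p.start()
    for p in procs: p.join()
    print(total.value)  # 1 + 3 + 5 + 7 = 16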

Slide 18: Loosely Coupled
– Distributed memory = loosely coupled
– Each processing element (PE) in a loosely coupled system has its own local memory
– Processors are tied together by a switching scheme designed to route information between processors through message passing
– Programs and data are relayed in packets consisting of an address, the data, and error-detection codes (see the sketch below)
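A minimal sketch of such a packet, following the slide's three fields. The simple modular checksum is my assumption for the error-detection code; real interconnects typically use CRCs:

```python
# Illustrative message-passing packet: address, data, error-detection code.
from dataclasses import dataclass

@dataclass
class Packet:
    dest: int        # destination processor address
    data: bytes      # program or data being relayed
    checksum: int    # error-detection code (toy checksum here)

def make_packet(dest: int, data: bytes) -> Packet:
    return Packet(dest, data, sum(data) % 256)

def is_valid(pkt: Packet) -> bool:
    return sum(pkt.data) % 256 == pkt.checksum

pkt = make_packet(dest=5, data=b"result=42")
print(is_valid(pkt))  # True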

Slide 19: Loosely Coupled
– Depending on the design of the communication system, packets are either destined for a specific processor or grabbed by the first processor that finds them
– Loose coupling is most efficient when interaction between tasks is minimal
– Tightly coupled systems can tolerate a higher degree of interaction between tasks

Slide 20: Interconnection Structures
– The components forming a multiprocessor are
  – CPUs
  – IOPs
  – A memory unit (which may be partitioned into separate modules)
– Interconnections can have different physical configurations
  – Depending on the number of transfer paths available between processors and memory in a shared-memory system
  – Depending on the number of transfer paths among PEs in a loosely coupled system

Slide 21: Physical Forms
– Time-shared common bus
– Multiported memory
– Crossbar switch
– Multistage switching network
– Hypercube system

Slide 22: Time-Shared Common Bus
– N processors connected through a common bus to a memory unit
– Only 1 processor can access (communicate with) the memory unit or another processor at a time
– Transfer operations are conducted by the processor currently in control of the bus
– Other processors must wait, checking the bus's availability
– A command is issued to inform the destination that communication is requested
  – What operation, and from where
– The destination responds and the transfer begins

Slide 23: Common Bus
– Bus contention is resolved by including a bus controller
  – Priorities
– Restricted to a single transfer at a time
  – While one processor is transferring to/from memory, the other processors are either busy with internal processing or idle, waiting for the bus (simulated below)
– The system's overall transfer rate is limited by the speed of the bus
– Multiple buses are possible, but you pay a penalty ($$)
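A toy simulation of the single-transfer-at-a-time property: one lock models the shared bus, so a processor either holds it and transfers or waits. This is an illustration of the timing behaviour, not of a real bus controller (which would also apply priorities):

```python
# Sketch: a single lock models the time-shared common bus.
import threading
import time

bus = threading.Lock()   # the one shared bus

def processor(pid: int):
    with bus:            # request the bus; block until it is free
        print(f"P{pid} has the bus, transferring...")
        time.sleep(0.01) # the transfer itself occupies the bus
    # after releasing the bus, continue internal processing

threads = [threading.Thread(target=processor, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()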

Slide 24: Dual Buses
– Not more economical
– Local buses with local memory; the system bus controller is the big coordinator
– Local memory can be cache memory
  – Coherency problems are possible

Slide 25: Multiported Memory
– Separate buses between each memory module (MM) and each processor
– Each processor bus is connected to each MM
– A processor bus consists of
  – Address lines
  – Data lines
  – Control lines
– Each MM has 4 ports, 1 for each bus

Slide 26: Multiported Memory
– Each MM must have internal logic to determine which bus has control
– Fixed priorities are assigned to each memory port (1, 2, 3, 4), as sketched below
– Advantage: high transfer rate
– Disadvantages:
  – Expensive memory control logic
  – Many cables and connectors
– Usually appropriate only for a small number of processors
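A hypothetical sketch of that fixed-priority arbitration: when several ports request the module in the same cycle, the lowest-numbered port wins.

```python
# Fixed-priority port arbitration for one memory module (illustrative).
def grant(requests):
    """requests: set of port numbers (1-4) asserting a request this cycle."""
    for port in (1, 2, 3, 4):   # fixed priority: port 1 highest
        if port in requests:
            return port
    return None                 # no request this cycle

print(grant({3, 2}))  # 2 -- port 2 outranks port 3
print(grant({4}))     # 4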

Slide 27: Crossbar Switch
– Crosspoints are placed at the intersections of processor buses and memory buses
– See Figure 13-4 on page 495
– Each switch determines the path (control logic)
  – Examines the address on the bus
  – Resolves conflicts by a predetermined, hardcoded priority
– See Figure 13-5 on page 495
  – Data flows in both directions
  – Multiplexers select the data (remember select lines?)

Slide 28: Crossbar Switch
– Supports simultaneous transfers involving all MMs
  – A separate path is associated with each MM
– The hardware can be large and complex
– The number of switches needed is (number of processors) × (number of MMs), as the snippet below illustrates
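A quick illustration of why the hardware gets large: the crosspoint count grows as the product of processors and memory modules, so it is quadratic for a square system.

```python
# Crosspoint count = processors x memory modules.
for p, m in [(4, 4), (8, 8), (16, 16)]:
    print(f"{p} processors x {m} modules = {p * m} crosspoint switches")
# 16, 64, 256 -- quadratic growth for a square system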

Slide 29: Multistage Switching Network
– The basic component is a 2-input, 2-output interchange switch
– See Figure 13-6 (explain)
– The switch can arbitrate between conflicting requests
– These switches can be used to build a switching network
– See Figure 13-7 (explain)

Slide 30: Patterns & Omega
– Not all patterns are always available to all processors
  – If P1 is accessing 0xx, then P2 can only access 1xx
– Used in both tightly and loosely coupled systems
– Omega switching network: see Figure 13-8 on page 498
  – Exactly 1 path from each source to each MM
  – Some patterns cannot be connected simultaneously (e.g., destinations 000 and 001; see the routing sketch below)
  – 1 switch carries 1 signal at a time
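The conflict example can be seen with destination-tag routing, the standard omega-network scheme (an assumption here, since the slide does not spell it out): at stage i, the i-th bit of the destination address, MSB first, sets the switch (0 = upper output, 1 = lower output).

```python
# Destination-tag routing through an n-stage omega network (sketch).
def route(dest: int, n_bits: int):
    """Return the switch setting used at each of the n stages."""
    settings = []
    for i in range(n_bits - 1, -1, -1):   # examine destination bits MSB first
        bit = (dest >> i) & 1
        settings.append("upper" if bit == 0 else "lower")
    return settings

print(route(0b000, 3))  # ['upper', 'upper', 'upper']
print(route(0b001, 3))  # ['upper', 'upper', 'lower']
# Destinations 000 and 001 differ only at the final stage, so both messages
# must pass through the same last-stage switch, which carries one signal at
# a time -- the two connections cannot be set up simultaneously.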

Slide 31: Omega Network
– Tightly coupled systems
  – Sources: processors
  – Destinations: MMs
– Loosely coupled systems
  – Source: processor
  – Destination: processor

Slide 32: Hypercube
– Hypercube or binary n-cube
– A loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube
– Each node contains a CPU, local memory, and I/O interfaces
– Each node has direct communication paths to n other nodes (1 hop away)
– There are 2^n distinct n-bit binary addresses to be assigned to the processors
– Neighboring processors' addresses differ in exactly 1 bit position
– See Figure 13-9 on page 499

Slide 33: Routing Messages
– A message takes from 1 to n hops (the maximum from source to destination)
– Routing procedure (sketched below):
  – XOR the source and destination addresses
  – The result shows on which axes the addresses differ
  – Send along any indicated axis
  – Repeat until the message arrives at the destination
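A minimal sketch of this XOR routing procedure; the choice of always flipping the lowest differing bit is my assumption, since the slide allows any indicated axis:

```python
# XOR routing on a binary n-cube: hop along axes where the addresses differ.
def route(src: int, dest: int):
    """Return the sequence of node addresses visited from src to dest."""
    path = [src]
    node = src
    diff = node ^ dest            # set bits mark the axes that differ
    while diff:
        axis = diff & -diff       # pick the lowest differing axis
        node ^= axis              # one hop flips exactly one address bit
        path.append(node)
        diff = node ^ dest
    return path

# 3-cube (n = 3, N = 8 processors): route from 010 to 111
print([format(x, "03b") for x in route(0b010, 0b111)])
# ['010', '011', '111'] -- 2 hops, one per differing bit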