
CSCI 232 © 2005 JW Ryder
Slide 1: Parallel Processing
Large class of techniques used to provide simultaneous data processing tasks
Purpose: Increase computational speed of the computer
A parallel processing system is able to process multiple tasks simultaneously

Slide 2: Parallel Processing
While one instruction executes in the ALU, the next instruction can be read from memory
A system may have 2 or more ALUs, or 2 or more processors
Goal is speed-up and throughput: the amount of processing that can be done in a given amount of time
As the amount of hardware increases, cost and complexity increase

Slide 3: Parallel Processing
Viewed at various levels of complexity
Lowest level: distinguish between serial and parallel-load registers
Higher level: multiple functional units (FUs)
  – Arithmetic: adder-subtractor, integer multiplier
  – Logic: logic unit, incrementer, shifter
  – Floating point: add-subtract, multiply, divide

Slide 4: Parallel Processing Classification
Internal organization of processors
Interconnection structure between processors
Flow of information through the system
Organization of computer system by number of instructions and data items that are manipulated simultaneously

Slide 5: Classifications
Normal operation of a computer is to fetch instructions from memory, then execute them in the processor
The sequence of instructions read from memory is the instruction stream
The operations performed on the data in the processor are the data stream
Parallel processing may occur in the instruction stream, the data stream, or both

Slide 7: SISD (single instruction stream, single data stream)
Single computer containing a
  – Control Unit
  – Processing Unit
  – Memory Unit
Instructions executed sequentially
System may or may not have internal parallel processing capabilities
  – Multiple FUs or pipelining

Slide 8: SIMD (single instruction stream, multiple data stream)
Organization including many processing units under supervision of a common control unit
All processors receive the same instruction from the control unit
Processors operate on different items of data
Shared memory unit must contain multiple modules so that it can communicate with all processors simultaneously
Array processor

Slide 10: MIMD (multiple instruction stream, multiple data stream)
Computer system capable of processing several programs at the same time
Most multiprocessor and multicomputer systems are in this category
Flynn’s classification depends on the distinction between the performance of the control unit and the data processing unit
It emphasizes behavioral characteristics of the computer system rather than its operational structures and interconnections
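Flynn's classification can be sketched as a lookup on the number of instruction streams and data streams. The function name `flynn_class` is hypothetical. The fourth combination it can return, MISD, is part of Flynn's taxonomy but is not covered on these slides.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn category for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("stream counts must be at least 1")
    # 'S' for a single stream, 'M' for multiple streams.
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


print(flynn_class(1, 1))  # one control unit, one processing unit
print(flynn_class(1, 8))  # one control unit driving 8 processing units
print(flynn_class(4, 4))  # 4 processors, each running its own program
```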

Slide 12: Multiprocessors
Multiprocessor system is an interconnection of 2 or more CPUs with memory and input-output equipment
‘Processor’ in multiprocessor can mean either a central processing unit (CPU) or an input-output processor (IOP)
System with single CPU and multiple IOPs is usually not considered a multiprocessor

Slide 13: Multiprocessors / Multicomputers
Both support concurrent operations
Multicomputer: computers are interconnected with each other by means of communication lines to form a computer network
  – Consists of several autonomous computers that may or may not communicate with each other
Multiprocessor: system is controlled by one operating system that provides interaction between processors, and all components in the system cooperate to solve the problem at hand

Slide 14: Multiprocessors
Microprocessors major motivation: cheap, small
VLSI helps make it possible too
Improves reliability
  – mutual funds, some loss of efficiency
Benefits
  – Improved system performance
  – Computations can proceed in parallel in 2 ways
    Multiple independent jobs run in parallel
    Single job can be partitioned into multiple parallel tasks

Slide 15: Multiprocessors
Overall function can be partitioned into several tasks
System tasks can be allocated to specialized processors
  – Designed for optimal performance
  – Example: One processor performs standard tasks for an industrial process while others sense and control various parameters such as temperature and flow rate
  – Example: One processor takes care of high-speed floating-point operations while another handles standard operations and tasks

Slide 16: Performance Improvement
Decompose problem into multiple discrete tasks
User can explicitly direct computer to split tasks
Provide a compiler that automatically detects when parts of a program can be split
  – Parallelizing compiler
Multiprocessors classified by the way memory is organized
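The explicit approach, where the user directs the split, might look like the following minimal sketch. The names `partial_sum` and `parallel_sum_of_squares` are hypothetical, and a thread pool is used only to keep the example short. In standard CPython a process pool would be the usual choice for real speed-up on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """One discrete task: sum of squares over a slice of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Split the job into chunks, run them concurrently, combine results."""
    # Ceiling division so every element lands in exactly one chunk.
    size = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))


print(parallel_sum_of_squares(list(range(10))))
```

The decomposition works here because the partial sums are independent. A parallelizing compiler has to prove that kind of independence on its own before splitting a loop.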

Slide 17: Tightly Coupled
A multiprocessor system with common shared memory
  – Shared memory or tightly coupled multiprocessor
Does not preclude each processor from having its own local memory
Most commercial tightly coupled systems provide cache memory for each CPU
In addition, global common memory is provided that all CPUs can access

Slide 18: Loosely Coupled
Distributed memory = loosely coupled
Each processing element (PE) in a loosely coupled system has its own local memory
Processors tied together by a switching scheme designed to route information between processors through a message-passing scheme
Programs and data relayed in packets consisting of address, data, and error detection codes
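A packet with the three parts listed above can be sketched as follows. The field names, the 4-byte address encoding, and the choice of CRC-32 as the error detection code are all assumptions made for illustration. The slides do not specify a packet layout.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest: int       # address of the destination PE (assumed 32-bit)
    payload: bytes  # program or data being relayed
    checksum: int   # error detection code (CRC-32 here, by assumption)


def _crc(dest: int, payload: bytes) -> int:
    # Cover both the address and the data so either being damaged is caught.
    return zlib.crc32(dest.to_bytes(4, "big") + payload)


def make_packet(dest: int, payload: bytes) -> Packet:
    return Packet(dest, payload, _crc(dest, payload))


def is_intact(packet: Packet) -> bool:
    """Receiver-side check: recompute the code and compare."""
    return packet.checksum == _crc(packet.dest, packet.payload)


p = make_packet(3, b"abc")
print(is_intact(p))
```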

Slide 19: Loosely Coupled
Packets are either destined for a specific processor or grabbed by the first processor that finds them, depending on the communication system design
Most efficient when interaction between tasks is minimal
Tightly coupled systems can tolerate a higher degree of interaction between tasks

Slide 20: Interconnection Structures
Components forming a multiprocessor are
  – CPUs
  – IOPs
  – A memory unit (may be partitioned into separate modules)
Interconnections can have different physical configurations
  – Depending on the number of transfer paths available between processors and memory in a shared memory system
  – Depending on the number of transfer paths among PEs in a loosely coupled system

Slide 22: Time-Shared Common Bus
N processors connected through a common bus to a memory unit
Only 1 processor can have access to (communicate with) the memory unit or another processor at a time
Transfer operations conducted by the processor that is in control of the bus
Other processors must wait, checking bus availability
Command issued to inform destination that communication is requested
  – What operation, from where
Destination responds and transfer begins

Slide 23: Common Bus
Bus contention resolved by including a bus controller
  – Priorities
Restricted to a single transfer at a time
  – When one processor is transferring to/from memory, other processors are either busy with internal processing or idle, waiting
Overall system transfer rate is limited by speed of bus
Multiple buses possible but you pay penalty ($$)
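Priority-based contention resolution on a single bus can be sketched as below. The function names and the representation of priorities as an ordered list (highest first) are assumptions. A real bus controller does this in hardware.

```python
def grant_bus(requests, priority):
    """Return the highest-priority requesting processor, or None."""
    for cpu in priority:  # priority is ordered highest first
        if cpu in requests:
            return cpu
    return None


def run_bus(pending, priority):
    """Serve one transfer per bus cycle; return the order of service."""
    pending = set(pending)
    order = []
    while pending:
        winner = grant_bus(pending, priority)
        if winner is None:
            raise ValueError("a requester is missing from the priority list")
        order.append(winner)     # winner transfers; everyone else waits
        pending.discard(winner)
    return order


# Processors 0, 2 and 3 all request the bus in the same cycle.
print(run_bus({2, 0, 3}, priority=[0, 1, 2, 3]))
```

The length of the returned list equals the number of bus cycles used, which shows why the overall transfer rate is bounded by the bus.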

Slide 25: Multiported Memory
Separate buses between each memory module (MM) and processor
Each processor bus connected to each MM
Processor bus consists of
  – Address lines
  – Data lines
  – Control lines
MM has 4 ports, 1 for each bus

Slide 26: Multiported Memory
MM must have internal logic to determine which bus has control
Fixed priorities assigned to each memory port (1, 2, 3, 4)
Advantage: High transfer rate
Disadvantages:
  – Expensive memory control logic
  – Many cables and connectors
Usually only appropriate for a small number of processors
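The per-module port logic can be sketched as follows. This assumes port 1 has the highest priority and that every module uses the same order. The slides only list the ports as (1, 2, 3, 4), so both are assumptions, as is the function name `arbitrate_ports`.

```python
def arbitrate_ports(requests, port_priority=(1, 2, 3, 4)):
    """Resolve one cycle of requests to multiported memory.

    requests maps a CPU's port number to the memory module it wants.
    Returns (granted, waiting): granted maps module -> winning port,
    waiting lists the ports that lost and must retry.
    """
    granted = {}
    waiting = []
    for port in port_priority:  # assumed: earlier in the tuple wins
        if port not in requests:
            continue
        module = requests[port]
        if module in granted:
            waiting.append(port)    # module already busy this cycle
        else:
            granted[module] = port  # each module serves one port
    return granted, waiting


# Ports 1 and 2 collide on MM1; port 3 proceeds in parallel on MM2.
print(arbitrate_ports({1: "MM1", 2: "MM1", 3: "MM2"}))
```

Unlike the single common bus, requests to different modules are granted in the same cycle, which is where the high transfer rate comes from.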

Slide 27: Crossbar Switch
Crosspoints placed at intersections of processor buses and memory buses
See figure 13-4 on page 495
Each switch determines path (control logic)
  – Examines address on bus
  – Resolves conflicts according to a predetermined, hard-coded priority
See figure 13-5 on page 495
  – Data flows in both directions
  – Multiplexers select data (remember select lines??)

Slide 29: Multistage Switching Network
Basic component is a 2-input, 2-output interchange switch
See figure 13-6 on page 496 - explain
Switch can arbitrate between conflicting requests
Can be used to build a switching network
See figure 13-7 on page 497 - explain

Slide 30: Patterns & Omega
Not all patterns are always available to all processors
If P1 is accessing 0xx, then P2 can only access 1xx
Used in both tightly and loosely coupled systems
Omega Switching Network - see figure 13-8 on page 498
  – Exactly 1 path from each source to each MM
  – Some patterns cannot be connected simultaneously (000 and 001)
    1 switch, 1 signal at a time
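The two omega network properties above can be explored with a small model. This sketch assumes the textbook-standard wiring (a perfect shuffle before each stage, with each switch steered by one destination bit, most significant bit first). That wiring may not match figure 13-8 exactly, so which source pairs collide here is a property of the model. The function names are hypothetical.

```python
def omega_path(src, dst, n=3):
    """Line numbers occupied after each of the n stages (2**n lines)."""
    mask = (1 << n) - 1
    pos = src
    path = []
    for stage in range(n):
        # Perfect shuffle between stages: rotate the n-bit line number left.
        pos = ((pos << 1) | (pos >> (n - 1))) & mask
        # The switch sends the signal to its upper (0) or lower (1) output
        # according to the next destination bit.
        bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def conflict(a, b, n=3):
    """True if two (src, dst) requests need the same line at some stage."""
    return any(x == y for x, y in zip(omega_path(*a, n), omega_path(*b, n)))


print(omega_path(4, 1))            # the single path from source 100 to MM 001
print(conflict((0, 0), (4, 1)))    # both need the same first-stage output
print(conflict((0, 0), (1, 1)))    # these two can be connected together
```

Because the route is fully determined by the source and destination bits, there is exactly one path per pair, and two requests block each other whenever those paths share a switch output.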

Slide 32: Hypercube
Hypercube or binary n-cube
Loosely coupled system composed of N = 2^n processors interconnected in an n-dimensional binary cube
Each node contains CPU, local memory, I/O interfaces
Direct communication paths to n other nodes (1 hop)
There are 2^n distinct n-bit binary addresses to be assigned to the processors
Each neighboring processor address differs by exactly 1 bit position
See figure 13-9 on page 499
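Since neighboring addresses differ in exactly 1 bit, a node's n neighbors can be listed by flipping each bit of its address in turn. A minimal sketch, with the hypothetical helper name `neighbors`:

```python
def neighbors(node, n):
    """Addresses of the n nodes directly connected to `node` in an n-cube."""
    if not 0 <= node < (1 << n):
        raise ValueError("node address must fit in n bits")
    # Flipping bit `axis` moves one hop along that axis of the cube.
    return [node ^ (1 << axis) for axis in range(n)]


for addr in neighbors(0b101, 3):
    print(format(addr, "03b"))
```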

Slide 33: Routing Messages
Will take from 1 to n hops (n is the maximum from source to destination)
Routing procedure
  – XOR source and destination addresses
    Result will show on which axes the addresses differ
  – Send along any indicated axis
  – Repeat until arrival at destination
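The routing procedure can be sketched as below. The slide allows any indicated axis at each step. This version always picks the lowest differing axis, which is an arbitrary choice made for the example, and the function name `route` is hypothetical.

```python
def route(src, dst, n):
    """Return the list of node addresses visited from src to dst."""
    limit = 1 << n
    if not (0 <= src < limit and 0 <= dst < limit):
        raise ValueError("addresses must fit in n bits")
    path = [src]
    node = src
    while node != dst:
        diff = node ^ dst  # 1 bits mark the axes still to be crossed
        axis = (diff & -diff).bit_length() - 1  # lowest differing axis
        node ^= 1 << axis  # one hop along that axis
        path.append(node)
    return path


hops = route(0b010, 0b001, 3)
print([format(a, "03b") for a in hops])
```

The number of hops equals the number of 1 bits in the XOR of the two addresses, so it never exceeds n.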
