CHAPTER 9 PIPELINE AND VECTOR PROCESSING
CONTENTS
Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
Array Processors
Figure 9-1 Processor with multiple functional units. The processor registers, which also connect to memory, feed these units: adder-subtractor, integer multiply, logic unit, shift unit, incrementer, floating-point add-subtract, floating-point multiply, floating-point divide.
Classification by instruction stream and data stream:
Single instruction stream, single data stream (SISD).
Single instruction stream, multiple data stream (SIMD).
Multiple instruction stream, single data stream (MISD).
Multiple instruction stream, multiple data stream (MIMD).
Figure 9-2 Example of pipelining: computing Ai * Bi + Ci.
R1 ← Ai, R2 ← Bi            Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci       Multiply and input Ci
R5 ← R3 + R4                Add Ci to product
Datapath: R1, R2 → Multiplier → R3; R3, R4 → Adder → R5
Table 9-1 Content of registers in pipeline example.
Clock pulse   Segment 1       Segment 2         Segment 3
number        R1      R2      R3       R4       R5
1             A1      B1      ----     ----     ----
2             A2      B2      A1*B1    C1       ----
3             A3      B3      A2*B2    C2       A1*B1+C1
4             A4      B4      A3*B3    C3       A2*B2+C2
5             A5      B5      A4*B4    C4       A3*B3+C3
6             A6      B6      A5*B5    C5       A4*B4+C4
7             A7      B7      A6*B6    C6       A5*B5+C5
8             ----    ----    A7*B7    C7       A6*B6+C6
9             ----    ----    ----     ----     A7*B7+C7
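The register contents in Table 9-1 can be reproduced with a short simulation. This is a minimal sketch, not a hardware model: the function name simulate_pipeline and the use of None for an empty register are assumptions made for illustration.

```python
def simulate_pipeline(a, b, c):
    """Clock the three-segment pipeline that computes a[i] * b[i] + c[i].

    Returns one row (pulse, R1, R2, R3, R4, R5) per clock pulse.
    None stands for an empty register (shown as ---- in Table 9-1).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    rows = []
    for clock in range(n + 2):  # k + n - 1 pulses with k = 3 segments
        # All registers load on the same pulse, so every new value is
        # computed from the old register contents before any update.
        new_r5 = r3 + r4 if r3 is not None else None        # segment 3
        new_r3 = r1 * r2 if r1 is not None else None        # segment 2
        new_r4 = c[clock - 1] if 1 <= clock <= n else None  # segment 2
        new_r1 = a[clock] if clock < n else None            # segment 1
        new_r2 = b[clock] if clock < n else None            # segment 1
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        rows.append((clock + 1, r1, r2, r3, r4, r5))
    return rows


if __name__ == "__main__":
    for row in simulate_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]):
        print(row)
```

With seven operand triples the simulation takes nine clock pulses, and the first result appears in R5 at pulse 3, as in the table.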
Figure 9-3 Four-segment pipeline. A common clock drives registers R1 to R4; data flows Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4.
Figure 9-4 Space-time diagram for pipeline.
Clock cycle:  1   2   3   4   5   6   7   8   9
Segment 1:    T1  T2  T3  T4  T5  T6
Segment 2:        T1  T2  T3  T4  T5  T6
Segment 3:            T1  T2  T3  T4  T5  T6
Segment 4:                T1  T2  T3  T4  T5  T6
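The space-time diagram of Figure 9-4 follows a simple rule: task Tj occupies segment s in clock cycle j + s - 1, so n tasks in a k-segment pipeline finish in k + n - 1 cycles. The sketch below builds the diagram and estimates the speedup. The function names are hypothetical, and the speedup figure assumes that a non-pipelined unit would need k clock periods per task, which the slide does not state.

```python
def space_time(k, n):
    """Return rows[segment][cycle] holding the task number or None.

    k is the number of segments and n the number of tasks; the diagram
    spans k + n - 1 clock cycles.
    """
    cycles = k + n - 1
    rows = []
    for seg in range(k):
        row = []
        for cycle in range(cycles):
            task = cycle - seg  # 0-based task in this segment this cycle
            row.append(task + 1 if 0 <= task < n else None)
        rows.append(row)
    return rows


def speedup(k, n):
    """Ratio of non-pipelined to pipelined time.

    Assumption: without pipelining each task takes k clock periods.
    """
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    for seg, row in enumerate(space_time(4, 6), start=1):
        cells = " ".join("--" if t is None else f"T{t}" for t in row)
        print(f"Segment {seg}: {cells}")
```

Under the stated assumption, the speedup approaches k as the number of tasks grows, because the k - 1 cycles needed to fill the pipe become negligible.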
Figure 9-5 Multiple functional units in parallel. Successive instructions Ii, Ii+1, Ii+2, Ii+3 are distributed among parallel units P1, P2, P3.
Arithmetic Pipeline
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
Figure 9-6 Pipeline for floating-point addition and subtraction. Inputs: exponents a, b and mantissas A, B, latched in registers R between segments.
Segment 1: Compare exponents by subtraction (produces the difference).
Segment 2: Choose exponent and align mantissas.
Segment 3: Add or subtract mantissas.
Segment 4: Adjust exponent and normalize result.
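The four segments of Figure 9-6 can be traced in software. The following is a minimal sketch, assuming decimal numbers stored as (mantissa, exponent) pairs with a normalized mantissa in the range 0.1 to 1; the function name fp_add and the sample operands are illustrative, not taken from the slides. Subtraction is obtained by negating the second mantissa.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    (ma, ea), (mb, eb) = x, y

    # Segment 1: compare the exponents by subtraction.
    diff = ea - eb

    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        exp = ea
        mb = mb.scaleb(-diff)
    else:
        exp = eb
        ma = ma.scaleb(diff)

    # Segment 3: add the mantissas.
    m = ma + mb

    # Segment 4: normalize the result and adjust the exponent.
    if m == 0:
        return Decimal(0), 0
    while abs(m) >= 1:              # overflow: shift right
        m = m.scaleb(-1)
        exp += 1
    while abs(m) < Decimal("0.1"):  # underflow: shift left
        m = m.scaleb(1)
        exp -= 1
    return m, exp


if __name__ == "__main__":
    # 0.9504 x 10^3 + 0.8200 x 10^2
    print(fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2)))
```

In a real pipeline each commented step would run in its own segment on a different operand pair during the same clock cycle; the sketch only shows the work done per segment.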
Instruction Pipeline
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
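Figure 9-7 Four-segment CPU pipeline.
Segment 1: Fetch instruction from memory.
Segment 2: Decode instruction and calculate effective address. Branch? If yes: update PC and empty pipe. If no: continue.
Segment 3: Fetch operand from memory.
Segment 4: Execute instruction. Interrupt? If yes: interrupt handling, then update PC and empty pipe. If no: return to segment 1.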
Segments and their purpose:
FI is the segment that fetches an instruction.
DA is the segment that decodes the instruction and calculates the effective address.
FO is the segment that fetches the operand.
EX is the segment that executes the instruction.
Figure 9-8 Timing of instruction pipeline.
Step:           1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1   FI  DA  FO  EX
            2       FI  DA  FO  EX
   (Branch) 3           FI  DA  FO  EX
            4               FI  --  --  FI  DA  FO  EX
            5                   --  --  --  FI  DA  FO  EX
            6                                   FI  DA  FO  EX
            7                                       FI  DA  FO  EX
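The cost of the branch in Figure 9-8 can be checked with a small timing model. This is a sketch under one stated assumption, read off the figure: after a branch, the next useful fetch starts only once the branch has passed through EX. The discarded fetch of instruction 4 in step 4 is not modeled, and the names SEGMENTS and pipeline_timing are hypothetical.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]


def pipeline_timing(n_instructions, branch_at=None):
    """Return {instruction: {step: segment}} for a four-segment pipeline.

    Assumption: the instruction after a branch is fetched only after
    the branch has completed its EX segment.
    """
    timing = {}
    next_fetch = 1
    for instr in range(1, n_instructions + 1):
        start = next_fetch
        timing[instr] = {start + i: seg for i, seg in enumerate(SEGMENTS)}
        if instr == branch_at:
            next_fetch = start + len(SEGMENTS)  # wait for the branch's EX
        else:
            next_fetch = start + 1              # one new fetch per step
    return timing


def last_step(timing):
    """Step in which the final instruction leaves the pipeline."""
    return max(max(steps) for steps in timing.values())


if __name__ == "__main__":
    print(last_step(pipeline_timing(7)))               # no branch
    print(last_step(pipeline_timing(7, branch_at=3)))  # branch at 3
```

Comparing the two calls shows the penalty: seven instructions need 10 steps without a branch and 13 steps when instruction 3 is a branch, which matches the figure.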
Pipeline Conflicts
1. Resource conflicts
2. Data dependency conflicts
3. Branch difficulties
Three-segment instruction pipeline
I: Instruction fetch
A: ALU operation
E: Execute instruction
Delayed Load
1. LOAD:  R1 ← M[address 1]
2. LOAD:  R2 ← M[address 2]
3. ADD:   R3 ← R1 + R2
4. STORE: M[address 3] ← R3
Figure 9-9 Three-segment pipeline timing.
(a) Pipeline timing with data conflict
Clock cycles:       1  2  3  4  5  6
1. Load R1          I  A  E
2. Load R2             I  A  E
3. Add R1+R2              I  A  E
4. Store R3                  I  A  E
(b) Pipeline timing with delayed load
Clock cycles:       1  2  3  4  5  6  7
1. Load R1          I  A  E
2. Load R2             I  A  E
3. No-operation           I  A  E
4. Add R1+R2                 I  A  E
5. Store R3                     I  A  E
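The delayed load of Figure 9-9 amounts to a compiler pass that inserts a no-operation after a load whenever the very next instruction reads the loaded register. The sketch below assumes a hypothetical instruction encoding of (operation, destination, sources) tuples; only loads are treated as needing a delay slot, as in the figure.

```python
NOP = ("NOP", None, ())


def insert_delayed_loads(program):
    """Insert a no-op after a LOAD whose result is used immediately.

    Each instruction is a tuple (op, destination, sources). In the
    three-segment pipeline the loaded value arrives in segment E, which
    overlaps segment A of the following instruction.
    """
    out = []
    for instr in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            _, _, sources = instr
            if prev_op == "LOAD" and prev_dest in sources:
                out.append(NOP)
        out.append(instr)
    return out


# The four-instruction example from the Delayed Load slide.
PROGRAM = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]

if __name__ == "__main__":
    for op, dest, sources in insert_delayed_loads(PROGRAM):
        print(op, dest, sources)
```

The pass adds one instruction, so the program grows from four to five instructions and from six to seven clock cycles, as in part (b) of the figure.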
Figure 9-10 Examples of delayed branch.
(a) Using no-operation instructions
Clock cycles:       1  2  3  4  5  6  7  8  9  10
1. Load             I  A  E
2. Increment           I  A  E
3. Add                    I  A  E
4. Subtract                  I  A  E
5. Branch to X                  I  A  E
6. No-operation                    I  A  E
7. No-operation                       I  A  E
8. Instruction in X                      I  A  E
(b) Rearranging the instructions
Clock cycles:       1  2  3  4  5  6  7  8
1. Load             I  A  E
2. Increment           I  A  E
3. Branch to X            I  A  E
4. Add                       I  A  E
5. Subtract                     I  A  E
6. Instruction in X                I  A  E
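Both variants of Figure 9-10 can be generated from the same instruction list. The sketch below is illustrative only: the function names are hypothetical, the program is assumed to contain exactly one branch, and the rearranging version assumes that the two instructions preceding the branch do not affect the branch itself, so they can safely fill the delay slots.

```python
def pad_with_noops(program, delay_slots=2):
    """Fill the delay slots after each branch with no-operations."""
    out = []
    for instr in program:
        out.append(instr)
        if instr.startswith("Branch"):
            out.extend(["No-operation"] * delay_slots)
    return out


def rearrange(program, delay_slots=2):
    """Move the instructions just before the branch into its delay slots.

    Assumptions: the program has a single branch and the moved
    instructions are independent of it. Slots that cannot be filled
    are padded with no-operations.
    """
    b = next(i for i, s in enumerate(program) if s.startswith("Branch"))
    moved = min(delay_slots, b)
    head = program[:b - moved]
    slot_fill = program[b - moved:b]
    padding = ["No-operation"] * (delay_slots - moved)
    return head + [program[b]] + slot_fill + padding + program[b + 1:]


def clock_cycles(program, segments=3):
    """Cycles needed to run the list through a k-segment pipeline."""
    return len(program) + segments - 1


PROGRAM = ["Load", "Increment", "Add", "Subtract",
           "Branch to X", "Instruction in X"]

if __name__ == "__main__":
    print(pad_with_noops(PROGRAM), clock_cycles(pad_with_noops(PROGRAM)))
    print(rearrange(PROGRAM), clock_cycles(rearrange(PROGRAM)))
```

Padding yields eight instructions and ten clock cycles, while rearranging keeps six instructions and needs eight clock cycles, matching the two parts of the figure.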
Applications of Vector Processing
Long-range weather forecasting.
Petroleum exploration.
Seismic data analysis.
Medical diagnosis.
Aerodynamics and space flight simulations.
Figure 9-11 Instruction format for vector processor.
Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
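The five fields of Figure 9-11 are enough to describe a whole element-wise vector operation in one instruction. A minimal sketch follows, assuming a flat list as memory; the class name, the field names, and the opcode mnemonics ADD and MUL are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VectorInstruction:
    opcode: str        # operation code
    source1: int       # base address of source 1
    source2: int       # base address of source 2
    destination: int   # base address of destination
    length: int        # vector length


# Hypothetical opcode table; the slides do not list actual mnemonics.
OPERATIONS = {
    "ADD": lambda x, y: x + y,
    "MUL": lambda x, y: x * y,
}


def execute(instr, memory):
    """Apply the operation element by element over `length` words."""
    op = OPERATIONS[instr.opcode]
    for i in range(instr.length):
        memory[instr.destination + i] = op(
            memory[instr.source1 + i], memory[instr.source2 + i]
        )


if __name__ == "__main__":
    memory = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
    execute(VectorInstruction("ADD", 0, 4, 8, 4), memory)
    print(memory[8:12])
```

The loop is written sequentially for clarity; a vector processor would stream the operand pairs through a pipelined functional unit instead.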
Figure 9-12 Pipeline for calculating an inner product. Source A and Source B → Multiplier pipeline → Adder.
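When the adder in Figure 9-12 is itself pipelined, a new product cannot be added to the sum of the immediately preceding one, because that sum is still inside the adder. One common way to handle this is to accumulate several interleaved partial sums and combine them at the end. The sketch below assumes a four-segment adder, which the figure does not specify; the function name and the adder_segments parameter are hypothetical.

```python
def inner_product(a, b, adder_segments=4):
    """Inner product accumulated as interleaved partial sums.

    Assumption: the adder pipeline has `adder_segments` segments, so
    product i is added to the partial sum that held product
    i - adder_segments.
    """
    partial = [0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        partial[i % adder_segments] += x * y  # multiplier feeds adder
    return sum(partial)                       # final reduction


if __name__ == "__main__":
    print(inner_product([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))
```

The result is the same as a plain sum of products; only the order of the additions changes to match what the pipeline can sustain at one product per clock cycle.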
Figure 9-13 Multiple module memory organization. Each module has its own address register (AR), data register (DR), and memory array; all modules share a common address bus and data bus.
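A multiple-module memory like the one in Figure 9-13 is typically used with interleaved addressing, so that consecutive addresses fall in different modules and can be accessed in overlapped fashion. The sketch below assumes four modules and low-order interleaving (module = address mod 4); neither detail is given in the figure, and the class names are hypothetical.

```python
class MemoryModule:
    """One module with its own address register, data register, array."""

    def __init__(self, size):
        self.array = [0] * size
        self.ar = 0  # address register
        self.dr = 0  # data register

    def read(self, word):
        self.ar = word
        self.dr = self.array[self.ar]
        return self.dr

    def write(self, word, value):
        self.ar = word
        self.dr = value
        self.array[self.ar] = self.dr


class InterleavedMemory:
    """Assumption: low-order address bits select the module."""

    def __init__(self, n_modules=4, words_per_module=8):
        self.modules = [MemoryModule(words_per_module)
                        for _ in range(n_modules)]

    def locate(self, address):
        n = len(self.modules)
        return address % n, address // n  # (module, word in module)

    def read(self, address):
        module, word = self.locate(address)
        return self.modules[module].read(word)

    def write(self, address, value):
        module, word = self.locate(address)
        self.modules[module].write(word, value)


if __name__ == "__main__":
    mem = InterleavedMemory()
    print([mem.locate(a)[0] for a in range(8)])
```

With this mapping, the elements of a vector stored at consecutive addresses rotate through the modules, which is what lets a vector pipeline fetch one operand per clock cycle.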
Types of Array Processors
1. Attached array processor
2. SIMD array processor
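Figure 9-14 Attached array processor with host computer. The general-purpose computer connects through an input-output interface to the attached array processor. The main memory of the general-purpose computer and the local memory of the attached array processor are linked by a high-speed memory-to-memory bus.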
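Figure 9-15 SIMD array processor organization. The master control unit, connected to main memory, drives processing elements PE1, PE2, PE3, ..., PEn; each PEi has its own local memory Mi (M1, M2, M3, ..., Mn).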
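The organization in Figure 9-15 can be mimicked in a few lines: a master control unit broadcasts one instruction and every processing element applies it to data in its own local memory. This is a sketch only. The class and method names are hypothetical, local memories are modeled as dictionaries, and the loop over elements stands in for what the hardware does simultaneously.

```python
import operator


class ProcessingElement:
    """A PE with its own local memory Mi, modeled as a dict."""

    def __init__(self, local_memory):
        self.memory = local_memory

    def execute(self, op, dst, src1, src2):
        self.memory[dst] = op(self.memory[src1], self.memory[src2])


class MasterControlUnit:
    """Broadcasts the same instruction to every PE (single instruction
    stream, multiple data stream)."""

    def __init__(self, elements):
        self.elements = elements

    def broadcast(self, op, dst, src1, src2):
        for pe in self.elements:  # conceptually simultaneous
            pe.execute(op, dst, src1, src2)


if __name__ == "__main__":
    # Element i of vectors a and b lives in the local memory of PE i.
    pes = [ProcessingElement({"a": i, "b": 10 * i, "c": 0})
           for i in range(1, 5)]
    MasterControlUnit(pes).broadcast(operator.add, "c", "a", "b")
    print([pe.memory["c"] for pe in pes])
```

Distributing the vector elements across the local memories is what allows a single broadcast instruction to perform the whole vector addition in one step.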