1 ITCS 3181 Logic and Computer Systems B. Wilkinson Slides9.ppt Modification date: March 30, 2015 Processor Design
2 Three basic ways:
Random logic approach – Design a unique logic system to implement the instructions using gates and flip-flops, using techniques described in Part 1 of the course.
Microprogrammed approach – Also touched upon in Part 1. Each step in the state diagram is encoded into a binary pattern called a microinstruction. This creates a microprogram that is held in a control memory. The processor steps through the microinstructions of the microprogram, generating logic signals to effect the register transfers. Only used today for very complicated instructions.
Pipeline design – Each step, or group of steps, is implemented by one unit (stage). Units are linked together in a pipeline. Most common approach, as it leads to concurrent high-speed operation. We will only consider this way.
3 Pipelined Processor Design
The operation of the processor is divided into a number of sequential actions, e.g.:
1. Fetch instruction.
2. Fetch operands.
3. Execute operation.
4. Store results.
There may be more steps. Each action is performed by a separate logic unit (stage), and the units are linked together in a “pipeline.”
4 Processor Pipeline Space-Time Diagram
5 Pipeline Staging Latches Usually, pipelines are designed using latches (registers) between units (stages) to hold the information being transferred from one stage to the next. Transfer occurs in synchronism with a clock signal.
6 Processing time Time to process s instructions using a pipeline with p stages = p + s - 1 cycles
7 Note: This does not take into account the extra time due to the latches in the pipeline version
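The cycle count p + s - 1 can be checked with a short sketch. The unpipelined baseline of p cycles per instruction is an assumption made here for comparison (it is not stated on the slide), and the function names are illustrative only. Latch delay is ignored, as in the note above.

```python
def pipelined_cycles(p, s):
    """Cycles to process s instructions on a p-stage pipeline: p + s - 1."""
    return p + s - 1


def unpipelined_cycles(p, s):
    """Assumed baseline: each instruction performs all p steps before the next starts."""
    return p * s


def speedup(p, s):
    """Ratio of the assumed unpipelined time to the pipelined time."""
    return unpipelined_cycles(p, s) / pipelined_cycles(p, s)


if __name__ == "__main__":
    for s in (1, 10, 100, 1000):
        print(s, pipelined_cycles(4, s), round(speedup(4, s), 2))
```

For p = 4 and s = 100 this gives 103 cycles against 400, a speedup of about 3.88. Under these assumptions the speedup approaches p as s grows but never reaches it.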
8 Dividing Processor Actions The operation of the processor can be divided into: Fetch Cycle Execute Cycle
9 Two Stage Fetch/Execute Pipeline
10 A Two-Stage Pipeline Design
11 Fetch/decode/execute pipeline Relevant for complex instruction formats. The decode stage recognizes the instruction and separates the operation and operand addresses.
12 Try to have each stage require the same time; otherwise the pipeline will have to operate at the speed of the slowest stage. Usually more stages are used to equalize times. Let’s start with four stages: Four-Stage Pipeline (IF, OF, EX, OS) Space-Time Diagram
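A space-time diagram for the four-stage pipeline can be generated with a small sketch. It assumes the stage order IF, OF, EX, OS, one instruction entering per cycle, and no stalls. The function names are illustrative.

```python
def space_time(stages, n_instructions):
    """Return one row per stage; entry c is the instruction number occupying
    that stage in cycle c, or None if the stage is idle."""
    total_cycles = len(stages) + n_instructions - 1
    table = []
    for stage_index in range(len(stages)):
        row = []
        for cycle in range(total_cycles):
            # Instruction i (0-based) occupies stage k during cycle i + k.
            instr = cycle - stage_index
            row.append(instr + 1 if 0 <= instr < n_instructions else None)
        table.append(row)
    return table


def render(stages, table):
    """Format the table as text, one line per stage."""
    lines = []
    for name, row in zip(stages, table):
        cells = " ".join(f"I{v}" if v is not None else "--" for v in row)
        lines.append(f"{name}: {cells}")
    return "\n".join(lines)


STAGES = ["IF", "OF", "EX", "OS"]

if __name__ == "__main__":
    print(render(STAGES, space_time(STAGES, 3)))
```

With three instructions the diagram spans 4 + 3 - 1 = 6 cycles, and each instruction moves one stage to the right per cycle.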
13 Four-stage Pipeline “Instruction-Time Diagram” An alternative diagram. This form of diagram is used later to show pipeline dependencies.
14 Information Transfer in Four-Stage Pipeline [Figure: four-stage pipeline (IF, OF, EX, OS) with clocked latches between stages; labels: PC, Address, Memory, Instruction, Register file, Register #’s, Contents, ALU, Latch, Clock]
15 Register-Register Instructions: ADD R3, R2, R1 [Figure: four-stage pipeline] After instruction fetched: latch holds Add, R3, R2, R1; PC = PC+4. Note: where R3, R2, and R1 are mentioned in the latch, it actually holds just register numbers.
16 Register-Register Instructions: ADD R3, R2, R1 [Figure: four-stage pipeline] After operands fetched: latch holds Add, R3, V2, V1. V1 is contents of R1, V2 is contents of R2.
17 Register-Register Instructions: ADD R3, R2, R1 [Figure: four-stage pipeline] After execution (addition): the ALU performs Add on V2 and V1; latch holds R3, Result.
18 Register-Register Instructions: ADD R3, R2, R1 [Figure: four-stage pipeline] After result stored: R3, result written to the register file.
19 Register-Register Instructions: ADD R3, R2, R1 [Figure: four-stage pipeline] Overall: IF: latch holds Add, R3, R2, R1; PC = PC+4. OF: latch holds Add, R3, V2, V1. EX: ALU performs Add on V2 and V1; latch holds R3, Result. OS: R3, result written to the register file.
20 Register-Constant Instructions: ADD R3, R2, 123 [Figure: four-stage pipeline] After instruction fetched: latch holds Add, R3, R2, 123; PC = PC+4. Note: where R3 and R2 are mentioned in the latch, it actually holds just register numbers.
21 Register-Constant Instructions: ADD R3, R2, 123 [Figure: four-stage pipeline] After operands fetched: latch holds Add, R3, V2, 123. V2 is contents of R2.
22 Register-Constant Instructions: ADD R3, R2, 123 [Figure: four-stage pipeline] After execution (addition): the ALU performs Add on V2 and 123; latch holds R3, Result.
23 Register-Constant Instructions: ADD R3, R2, 123 [Figure: four-stage pipeline] After result stored: R3, result written to the register file.
24 Register-Constant Instructions (Immediate addressing): ADD R3, R2, 123 [Figure: four-stage pipeline] Overall: IF: latch holds Add, R3, R2, 123. OF: latch holds Add, R3, V2, 123 (V2 is contents of R2). EX: ALU performs Add on V2 and 123; latch holds R3, Result. OS: R3, result written to the register file.
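The latch contents traced on the preceding slides for ADD R3, R2, R1 and ADD R3, R2, 123 can be reproduced with a minimal sketch. It assumes a register file modelled as a dictionary and represents each latch as a tuple. The function name run_add and the register values are hypothetical.

```python
def run_add(dest, src, operand, registers):
    """Trace ADD dest, src, operand through IF, OF, EX and OS.

    operand is either a register name such as "R1" or an integer constant
    (immediate addressing). Returns a list of (stage, latch contents).
    """
    trace = []

    # IF: the fetched instruction is latched; register fields are only numbers/names.
    latch = ("Add", dest, src, operand)
    trace.append(("IF", latch))

    # OF: source registers are replaced by their contents; a constant passes through.
    v_src = registers[src]
    v_op = registers[operand] if isinstance(operand, str) else operand
    latch = ("Add", dest, v_src, v_op)
    trace.append(("OF", latch))

    # EX: the ALU adds the two values; only the destination and result move on.
    latch = (dest, v_src + v_op)
    trace.append(("EX", latch))

    # OS: the result is written to the destination register.
    registers[dest] = latch[1]
    trace.append(("OS", latch))
    return trace


if __name__ == "__main__":
    regs = {"R1": 5, "R2": 7, "R3": 0}  # hypothetical register contents
    for stage, latch in run_add("R3", "R2", "R1", regs):
        print(stage, latch)
    for stage, latch in run_add("R3", "R2", 123, regs):
        print(stage, latch)
```

With these register values the first call leaves R3 = 12 and the second leaves R3 = 130. The only difference between the two instruction forms is in the OF stage, where the constant bypasses the register file.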
25 Branch Instructions A couple of issues to deal with here: 1. Number of steps needed. 2. Dealing with program counter incrementing after each instruction fetch.
26 (Complex) Branch Instructions: Bcond R1, R2, L1 [Figure: four-stage pipeline (IF, OF, EX/BR, OS) with a Test unit and an adder (+)] After instruction fetched: latch holds Bcond, R1, R2, Offset. Offset to L1 is held in the instruction.
27 (Complex) Branch Instructions: Bcond R1, R2, L1 [Figure: four-stage pipeline] After operands fetched: latch holds Bcond, V1, V2, Offset. V1 is contents of R1, V2 is contents of R2.
28 (Complex) Branch Instructions: Bcond R1, R2, L1 [Figure: four-stage pipeline] After execution (test): the Test unit applies Bcond to V1 and V2; latch holds Result, Offset.
29 (Complex) Branch Instructions: Bcond R1, R2, L1 [Figure: four-stage pipeline] After result stored: Result (TRUE/FALSE). If TRUE, add offset to PC; else do nothing.
30 (Complex) Branch Instructions: Bcond R1, R2, L1 [Figure: four-stage pipeline] Overall: IF: latch holds Bcond, R1, R2, Offset. OF: latch holds Bcond, V1, V2, Offset. EX/BR: Test applies Bcond to V1 and V2; latch holds Result, Offset. OS: Result (TRUE/FALSE); if TRUE, add offset to PC; else do nothing.
31 Simpler Branch Instructions: Bcond R1, L1 [Figure: four-stage pipeline (IF, OF, EX/BR, OS) with a Test unit and an adder (+)] Tests R1 against zero. Overall: IF: latch holds Bcond, R1, Offset. OF: latch holds Bcond, V1, Offset (V1 is contents of R1). EX/BR: Test compares V1 with zero; latch holds Result, Offset. OS: Result (TRUE/FALSE); if TRUE, add offset to PC; else do nothing.
32 Dealing with program counter incrementing after each instruction fetch
The previous design needs to take into account that by the time the branch instruction is in the execute unit, the program counter will have been incremented three times. Solutions:
1. Modify the offset value in the instruction (subtract 12).
2. Modify the arithmetic operation to be PC + offset, where PC is the branch instruction's program counter value fed through the pipeline. (This is the best way, as it works for any pipeline length. Done in the Patterson-Hennessy architecture book.)
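The effect of the three increments, and the two solutions, can be illustrated with a small sketch. It assumes 4-byte instructions (PC = PC + 4 per fetch), that the branch is resolved in the third stage, and that the offset is measured from the address of the branch instruction itself. The last point is an assumption for illustration, and all names and addresses are hypothetical.

```python
INSTRUCTION_BYTES = 4  # PC = PC + 4 after each fetch
BRANCH_STAGE = 3       # branch resolved in the third stage (EX/BR)


def pc_when_branch_executes(branch_address):
    """PC seen when the branch is in EX/BR: incremented three times since its fetch."""
    return branch_address + BRANCH_STAGE * INSTRUCTION_BYTES


def target_naive(branch_address, offset):
    """Adds the offset to the already advanced PC, so the target overshoots by 12."""
    return pc_when_branch_executes(branch_address) + offset


def target_adjusted_offset(branch_address, offset):
    """Solution 1: the offset stored in the instruction is reduced by 12."""
    stored_offset = offset - BRANCH_STAGE * INSTRUCTION_BYTES
    return pc_when_branch_executes(branch_address) + stored_offset


def target_carried_pc(branch_address, offset):
    """Solution 2: the branch's own PC value travels down the pipeline with it."""
    carried_pc = branch_address
    return carried_pc + offset


if __name__ == "__main__":
    print(target_naive(100, 40))            # 152
    print(target_adjusted_offset(100, 40))  # 140
    print(target_carried_pc(100, 40))       # 140
```

Solution 1 bakes the pipeline depth (the constant 12) into every encoded offset, whereas Solution 2 gives the same target without that constant, which is why it carries over to pipelines of other lengths.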
33 Feeding PC value through pipeline: Bcond R1, L1 [Figure: four-stage pipeline (IF, OF, EX/BR, OS) with a Test unit and an adder (Add)] Tests R1 against zero. After instruction fetched: latch holds Bcond, R1, Offset, PC.
34 Feeding PC value through pipeline: Bcond R1, L1 [Figure: four-stage pipeline] After operand fetched: latch holds Bcond, V1, Offset, PC. V1 is contents of R1.
35 Feeding PC value through pipeline: Bcond R1, L1 [Figure: four-stage pipeline] After branch computed: Test compares V1 with zero and Add computes the new PC value; latch holds Result, New PC value.
36 Feeding PC value through pipeline: Bcond R1, L1 [Figure: four-stage pipeline] After PC updated: Result (TRUE/FALSE). If TRUE, update PC with the new PC value; else do nothing.
37 Feeding PC value through pipeline: Bcond R1, L1 [Figure: four-stage pipeline] Overall: IF: latch holds Bcond, R1, Offset, PC. OF: latch holds Bcond, V1, Offset, PC. EX/BR: Test compares V1 with zero and Add computes the new PC value; latch holds Result, New PC value. OS: Result (TRUE/FALSE); if TRUE, update PC; else do nothing.
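The latch sequence for the design that feeds the PC through the pipeline can be traced with a minimal sketch. The slides say only that R1 is tested against zero, so the condition used here (branch if the register equals zero) is an assumption. The latches are modelled as dictionaries, and run_branch, fetch_pc and the field names are hypothetical.

```python
def run_branch(branch_pc, fetch_pc, reg, offset, registers):
    """Trace 'Bcond reg, L1' with the branch's PC carried in the latches.

    branch_pc is the PC value latched with the branch at fetch time;
    fetch_pc is the PC the fetch unit has advanced to by the time the
    branch resolves. Returns (final PC, trace).
    """
    trace = []

    # IF: instruction fields and the PC value are latched together.
    latch = {"op": "Bcond", "reg": reg, "offset": offset, "pc": branch_pc}
    trace.append(("IF", dict(latch)))

    # OF: the register number is replaced by the register contents (V1).
    latch = {"op": "Bcond", "v1": registers[reg], "offset": offset, "pc": branch_pc}
    trace.append(("OF", dict(latch)))

    # EX/BR: Test produces TRUE/FALSE; Add produces the new PC value.
    result = latch["v1"] == 0  # assumed condition: branch if zero
    new_pc = latch["pc"] + latch["offset"]
    latch = {"result": result, "new_pc": new_pc}
    trace.append(("EX/BR", dict(latch)))

    # Final stage: if TRUE update PC, else do nothing.
    final_pc = latch["new_pc"] if latch["result"] else fetch_pc
    trace.append(("OS", {"pc": final_pc}))
    return final_pc, trace


if __name__ == "__main__":
    print(run_branch(100, 112, "R1", 40, {"R1": 0})[0])  # 140, branch taken
    print(run_branch(100, 112, "R1", 40, {"R1": 3})[0])  # 112, branch not taken
```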
38 Load and Store Instructions Need at least one extra stage to handle memory accesses. The early RISC processor arrangement was to place a memory stage (MEM) between EX and OS, giving a five-stage pipeline. LD R1, 100[R2]
39 ST 100[R2], R1 Note: It is convenient to have separate instruction and data memories connected to the processor pipeline, usually separate cache memories (see later).
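How LD R1, 100[R2] and ST 100[R2], R1 use the extra MEM stage can be sketched as follows. It assumes the effective address is the displacement plus the contents of the base register, computed in EX, and that the data memory is a dictionary keyed by address. That the final stage has nothing to write for a store is also an assumption. Function names and values are hypothetical.

```python
def run_load(dest, displacement, base, registers, memory):
    """LD dest, displacement[base] through OF, EX, MEM and OS."""
    v_base = registers[base]         # OF: read the base register
    address = displacement + v_base  # EX: ALU forms the effective address
    value = memory[address]          # MEM: read data memory
    registers[dest] = value          # OS: write the loaded value to dest
    return address, value


def run_store(displacement, base, src, registers, memory):
    """ST displacement[base], src through OF, EX and MEM."""
    v_base = registers[base]         # OF: read the base register
    v_src = registers[src]           # OF: read the value to be stored
    address = displacement + v_base  # EX: ALU forms the effective address
    memory[address] = v_src          # MEM: write data memory
    return address                   # final stage: nothing to write back (assumed)


if __name__ == "__main__":
    regs = {"R1": 0, "R2": 20}  # hypothetical register contents
    mem = {120: 99}             # hypothetical data memory
    print(run_load("R1", 100, "R2", regs, mem))  # (120, 99)
    regs["R1"] = 7
    print(run_store(100, "R2", "R1", regs, mem), mem[120])  # 120 7
```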
40 Usage of Stages Uses IF twice
41 Number of Pipeline Stages As the number of stages is increased, one would expect the time for each stage to decrease, i.e. the clock period to decrease and the speed to increase. However, one must take into account the pipeline latch delay. A 5-stage pipeline represents an early RISC design and is “underpipelined.” Most recent processors have more stages.
42 Optimum Number of Pipeline Stages* Suppose one homogeneous unit doing everything takes $T_s$ time units. With $p$ pipeline stages with equally distributed work, each stage takes $T_s/p$. Let $t_L$ = time for latch to operate. Then the execution time for $s$ instructions is $T_{ex} = (p + s - 1)(T_s/p + t_L)$. * Adapted from “Computer Architecture and Implementation” by H. G. Cragon, Cambridge University Press. Typical results ($T_s = 128$, $t_L = 2$). In practice, there are a lot more factors involved; see later for some.
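The execution-time formula can be explored numerically with a short sketch. Setting the derivative of (p + s - 1)(T_s/p + t_L) with respect to p to zero gives p = sqrt((s - 1) T_s / t_L), which the code compares with a brute-force search over whole numbers of stages. The instruction count s = 17 used in the example is an assumption chosen for illustration, not a value from the slide, and the function names are hypothetical.

```python
import math


def execution_time(p, s, t_s, t_l):
    """T_ex = (p + s - 1) * (T_s / p + t_L)."""
    return (p + s - 1) * (t_s / p + t_l)


def best_stage_count(s, t_s, t_l, max_p=256):
    """Whole number of stages in 1..max_p that minimises T_ex (brute force)."""
    return min(range(1, max_p + 1), key=lambda p: execution_time(p, s, t_s, t_l))


def analytic_optimum(s, t_s, t_l):
    """Stationary point of T_ex in p: sqrt((s - 1) * T_s / t_L)."""
    return math.sqrt((s - 1) * t_s / t_l)


if __name__ == "__main__":
    s, t_s, t_l = 17, 128, 2  # s is an assumed value; T_s and t_L are from the slide
    p = best_stage_count(s, t_s, t_l)
    print(p, execution_time(p, s, t_s, t_l), analytic_optimum(s, t_s, t_l))
```

For these values both methods give p = 32 with T_ex = 288. In this model the optimum grows with the square root of the number of instructions processed, so longer uninterrupted instruction streams favour deeper pipelines.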
43 Questions