Instruction Sets and Pipelining Cover the basics of instruction set types and the fundamental ideas of pipelining. Later in the course we will go into more depth about implementation and more advanced topics.
Outline Instruction set characteristics; review of the CPU and the actions taken to fetch, decode, and execute a sample instruction; review of the fundamental idea of pipelining; fundamentals of RISC architectures.
Outline - continued Pipelining with RISC; five-stage pipeline for a RISC processor; hazards and pipeline stalls; CPI definition and calculation; structural hazards; data hazards.
Instruction Set Architecture Classes Consider the next diagram (figure 2.1) and the table in figure 2.2 on page 93 to examine the differences between the four classes of instruction sets. Historically, stack and accumulator architectures were popular; after 1980 most designs have used load-store (the right-most in both diagrams).
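To make the four classes concrete, here is a small sketch of how the statement C = A + B is usually written in each style. The mnemonics are generic textbook-style names (an assumption, not a real assembler's syntax), and `SEQUENCES` and `alu_memory_operands` are illustrative names only.

```python
# Instruction sequences for C = A + B in each ISA class.
# Mnemonics are generic textbook-style names, not a real assembler's syntax.
SEQUENCES = {
    "stack": ["Push A", "Push B", "Add", "Pop C"],
    "accumulator": ["Load A", "Add B", "Store C"],
    "register-memory": ["Load R1, A", "Add R3, R1, B", "Store R3, C"],
    "load-store": ["Load R1, A", "Load R2, B", "Add R3, R1, R2", "Store R3, C"],
}


def alu_memory_operands(isa_class):
    """Count the memory operands (A, B, or C) named by the Add instruction."""
    add = next(i for i in SEQUENCES[isa_class] if i.startswith("Add"))
    operands = add[len("Add"):].replace(",", " ").split()
    return sum(op in ("A", "B", "C") for op in operands)


for name, seq in SEQUENCES.items():
    print(f"{name:16s} {len(seq)} instructions, "
          f"{alu_memory_operands(name)} memory operand(s) in Add")
```

Note that only the load-store sequence keeps memory addresses out of the ALU instruction entirely, at the cost of one extra instruction.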
General-purpose Register Computers GPR computers have two advantages. First, registers are faster than memory. Second, registers are more efficient for a compiler to use than other forms of internal storage. Registers can be used to hold variables, which reduces memory traffic compared with the alternative of keeping variables in memory.
DSP Comment DSPs can differ. The remark in the first line of page 94 is worth noting: because hand-optimized code dominates in the DSP community, DSPs may have many special-purpose registers and only a few general-purpose registers.
Two Major Instruction Set Characteristics Whether ALU instructions have two or three operands. How many of the operands in ALU instructions may be memory addresses. Consider carefully the two tables on pages 94 and 95, and especially note the advantages and disadvantages.
The CPU Review the next two slides that depict a simple CPU and an associated slide that shows one instruction broken down into the different machine actions
Remarks The MAR and MDR components can operate independently of the ALU and the IR decoder since they deal with memory. The instruction (IR) decoding section of the diagram is quite independent of the rest of the diagram (it doesn’t require the bus, for example).
Remarks - continued In effect three parts of this diagram operate somewhat independently: the ALU, the IR decoder, and the MAR/MDR, which send data to and from the memory. Pipelining builds on this independence: different instructions can be using the IR decoder, the ALU, and the memory transfer functions somewhat independently.
Remarks The first three steps mainly fetch the next instruction (1 PC out through 3 IR in). Steps 4-6 are mainly involved in executing the actual adding part of the instruction (6 … Add). Step 7 is involved in writing the output of the Add instruction. The decode part of the operation is more implicit.
Pipelining Review This is from the CS 4014 course and represents a four-stage pipeline rather than the five-stage RISC pipeline that we will study later. The difference is that this diagram doesn’t have a Mem stage; memory access is combined into other stages.
Pipelining Review Formally, pipelining is an implementation technique that takes advantage of the parallelism that exists among the actions needed to execute an instruction. Informally, pipelining is like an assembly line.
Remarks Refer to the earlier CPU diagram and the example of a sample instruction and the actual CPU operations. Note that the fetch, decode, execute and write parts of instructions could (in principle) be separated and make use of independent parts of the CPU.
Appendix A A.1 and A.2 are the only sections that we will study for now. The time required to move an instruction one step down the pipeline is a processor cycle. If the stages are perfectly balanced, then the average time per instruction in a pipelined processor is: time per instruction on the unpipelined machine / number of pipeline stages.
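The ideal-case formula above can be checked with a few lines of code. This is a minimal sketch: the function name and the 5 ns figure are made up for illustration, and real pipelines fall short of this because stages are never perfectly balanced and pipelining adds some overhead.

```python
def pipelined_time_per_instruction(unpipelined_time, num_stages):
    """Ideal average time per instruction with perfectly balanced stages."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    return unpipelined_time / num_stages


# Hypothetical numbers: 5 ns per instruction unpipelined, 5 balanced stages.
print(pipelined_time_per_instruction(5.0, 5))  # 1.0
```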
Remark Depending on how you look at it, the reduction in time per instruction due to pipelining can be viewed as decreasing the CPI, as decreasing the clock cycle time, or as a combination of both.
RISC Instruction Set Implementation in a Pipeline 5 stages: IF – instruction fetch; ID – instruction decode; Ex – execution; Mem – memory access; WB – write-back (result to register). Look over figure A.1 on page A-7 and the next diagram.
Pipeline = series of data paths shifted in time. IM – instruction memory, DM – data memory.
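The "shifted in time" picture can be reproduced in text form. The sketch below assumes an ideal pipeline with no stalls, where each instruction enters IF one clock cycle after the previous one; `STAGES` and `pipeline_diagram` are illustrative names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; column c holds its stage in clock cycle c+1."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i starts i cycles later
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(4)):
    print(f"i+{i}: " + " ".join(f"{cell:4s}" for cell in row))
```

Reading down any one column shows which stages are busy in that clock cycle; once the pipeline is full, all five stages are in use at once.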
Remarks Note the use of pipeline registers discussed on page A-9 and the fact that they are edge-triggered, so values change instantaneously on a clock edge.
Performance Issues in Pipelining Go through the example on pages A-10 and A-11. Stalls and performance are handled with calculations like those found on page A-12. There are different ways of defining performance (recall our in-class discussion last week), but note the remark in the second paragraph: the ideal CPI on a pipelined processor is almost always 1.
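A stall calculation of this kind can be sketched as follows. The sketch assumes an ideal CPI of 1, perfectly balanced stages, and no clock overhead from pipelining, so that speedup = pipeline depth / (1 + stall cycles per instruction). The function names and the 0.25 stall figure are illustrative only.

```python
def pipelined_cpi(stall_cycles_per_instruction, ideal_cpi=1.0):
    """CPI of the pipelined processor: ideal CPI plus average stall cycles."""
    return ideal_cpi + stall_cycles_per_instruction


def pipeline_speedup(pipeline_depth, stall_cycles_per_instruction):
    """Speedup over the unpipelined machine, assuming ideal CPI = 1,
    balanced stages, and no pipelining overhead on the clock."""
    return pipeline_depth / (1.0 + stall_cycles_per_instruction)


# Hypothetical: 5-stage pipeline averaging 0.25 stall cycles per instruction.
print(pipelined_cpi(0.25))        # 1.25
print(pipeline_speedup(5, 0.25))  # 4.0
```

With no stalls the speedup equals the pipeline depth, which is the ideal case described earlier.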
Pipeline Hazards Structural – resource conflicts when the hardware cannot support all combinations of instructions. Data – an instruction depends on the result of a previous instruction (time dependent: the result can’t be read until it has been computed and stored by the other instruction). Control – caused by branches.
Hazards cause Stalls The pipeline must be held up temporarily or stalled when a hazard occurs. As a result, no new instructions are fetched during the stall (study why in the last paragraph of page A-11).
A processor with only one memory port will generate a conflict whenever a memory reference occurs. The load instruction needs the memory port for its data access in CC 4, the same cycle in which instruction 3 needs the port for its instruction fetch.
Remark on figure A.4 Look over figure A.5 on page A-15, where the conflict is resolved by introducing a stall for instruction i+3.
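The single-memory-port conflict and its stall can be modelled in a few lines. This is a simplified sketch, not the textbook's hardware: it assumes one shared memory port, a MEM stage three cycles after IF, and that a fetch simply waits one cycle whenever an earlier load or store is using the port. `MEM_OFFSET` and `fetch_cycles` are illustrative names.

```python
MEM_OFFSET = 3  # MEM is the 4th stage, so it occurs 3 cycles after IF


def fetch_cycles(uses_data_memory):
    """Given booleans (True = load/store), return the clock cycle in which
    each instruction is fetched when IF and MEM share one memory port."""
    busy = set()  # cycles in which the port serves a MEM-stage data access
    cycles = []
    cycle = 0
    for is_mem in uses_data_memory:
        cycle += 1
        while cycle in busy:
            cycle += 1  # stall: the port is taken, so the fetch waits
        cycles.append(cycle)
        if is_mem:
            busy.add(cycle + MEM_OFFSET)
    return cycles


# A load followed by four instructions that do not touch data memory.
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```

The fourth instruction (i+3) would have been fetched in CC 4 but is pushed to CC 5, which is the one-cycle stall.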
Data Hazards A data hazard occurs when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequential processing.
Example Two sequential statements: R1 = R2 + R3; R5 = R4 - R1. Now consider the pipelining of these two statements; see page A-17 and figure A.5 (also the next diagram). In the pipeline, R1’s value from the first instruction may not yet have been written when the second instruction needs to read it.
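A dependence like the one on R1 can be found mechanically. The sketch below flags any instruction that reads a register written by one of the previous `window` instructions. The tuple encoding, the name `raw_hazards`, and the default window of 2 are assumptions for illustration; the actual number of instructions affected depends on the pipeline design and on whether forwarding is used.

```python
def raw_hazards(instructions, window=2):
    """instructions: list of (dest, src1, src2) register names.
    Return (producer_index, consumer_index, register) for each case where
    an instruction reads a register written within the previous `window`
    instructions."""
    hazards = []
    for j, (_, *sources) in enumerate(instructions):
        for i in range(max(0, j - window), j):
            dest = instructions[i][0]
            if dest in sources:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("R1", "R2", "R3"),  # R1 = R2 + R3
    ("R5", "R4", "R1"),  # R5 = R4 - R1
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```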
Branch/Control Hazards Look over the examples in figures A.11 through A.14.