Processor Types and Instruction Sets CS 147 Presentation by Koichiro Hongo
Brief Review Of CISC And RISC
➲ Complex Instruction Set Computer (CISC): the processor includes many instructions, and a single instruction can perform a complex computation.
➲ Reduced Instruction Set Computer (RISC): the processor contains a small number of instructions, each of which can execute in one clock cycle.
RISC Design And The Execution Pipeline
➲ RISC processors contain parallel hardware units, each of which performs one step of the fetch-execute cycle: fetch the next instruction, examine the opcode, fetch the operands, perform the operation, and store the result. The hardware is arranged in a multistage pipeline.
➲ Although a RISC processor cannot perform all the steps of the fetch-execute cycle in a single clock cycle, an instruction pipeline with parallel hardware provides approximately the same performance.
Figure: Stage 1 (Fetch Instruction) -> Stage 2 (Examine Opcode) -> Stage 3 (Fetch Operands) -> Stage 4 (Perform Operation) -> Stage 5 (Store Result)
How Instructions Pass Through A Five-Stage Pipeline
Clock   Stage 1   Stage 2   Stage 3   Stage 4   Stage 5
  1     inst. 1   -         -         -         -
  2     inst. 2   inst. 1   -         -         -
  3     inst. 3   inst. 2   inst. 1   -         -
  4     inst. 4   inst. 3   inst. 2   inst. 1   -
  5     inst. 5   inst. 4   inst. 3   inst. 2   inst. 1
  6     inst. 6   inst. 5   inst. 4   inst. 3   inst. 2
  7     inst. 7   inst. 6   inst. 5   inst. 4   inst. 3
  8     inst. 8   inst. 7   inst. 6   inst. 5   inst. 4
Instructions passing through a five-stage pipeline (time runs down the table). Once the pipeline is filled, each stage is busy on each clock cycle.
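The fill pattern in the table can be reproduced with a short Python sketch. It assumes an idealized pipeline with no stalls, in which instruction n enters stage 1 at clock n and advances one stage per clock; the function name `pipeline_table` is hypothetical and used only for illustration.

```python
STAGES = 5  # five-stage pipeline, as in the slide


def pipeline_table(num_clocks, stages=STAGES):
    """Return one row per clock; row[s] is the instruction number
    occupying stage s + 1, or None if that stage is still empty."""
    rows = []
    for clock in range(1, num_clocks + 1):
        row = []
        for stage in range(1, stages + 1):
            # Instruction n is in stage s at clock n + s - 1.
            inst = clock - stage + 1
            row.append(inst if inst >= 1 else None)
        rows.append(row)
    return rows


# Print the table for the first eight clock cycles.
for clock, row in enumerate(pipeline_table(8), start=1):
    cells = ["inst. %d" % i if i is not None else "-" for i in row]
    print(clock, *cells, sep="\t")
```

From clock 5 onward no entry in a row is empty, which corresponds to the pipeline being full.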
Pipelines And Instruction Stalls
● To understand the effect of programming choices on a pipeline, consider a program that contains two successive instructions that perform an addition and a subtraction on operands and results located in registers A, B, C, D, and E.
    Instruction K:      C <- add A B
    Instruction K + 1:  D <- subtract E C
● In this situation, instruction K + 1 needs the value of C that instruction K produces, so the hardware must wait for instruction K to finish before fetching the operands for instruction K + 1. A stage of the pipeline stalls until the operand becomes available.
What Happens During A Pipeline Stall
Clock   Stage 1     Stage 2     Stage 3       Stage 4     Stage 5
  1     inst. K     inst. K-1   inst. K-2     inst. K-3   inst. K-4
  2     inst. K+1   inst. K     inst. K-1     inst. K-2   inst. K-3
  3     inst. K+2   inst. K+1   inst. K       inst. K-1   inst. K-2
  4     inst. K+3   inst. K+2   (inst. K+1)   inst. K     inst. K-1
  5     -           -           (inst. K+1)   -           inst. K
  6     -           -           inst. K+1     -           -
  7     inst. K+4   inst. K+3   inst. K+2     inst. K+1   -
  8     inst. K+5   inst. K+4   inst. K+3     inst. K+2   inst. K+1
Illustration of a pipeline stall (time runs down the table; parentheses mark a stalled instruction). Instruction K + 1 cannot proceed until operand C becomes available.
Other Causes Of Pipeline Stalls
➲ Accessing external storage
➲ Invoking a coprocessor
➲ Branching to a new location
➲ Calling a subroutine
Consequences For Programmers
● To achieve maximum speed, a program must be written to accommodate an instruction pipeline. For example, instead of referencing a result register immediately in the following instruction, the reference can be delayed.
    (a)                        (b)
    C <- add A B               C <- add A B
    D <- subtract E C          F <- add G H
    F <- add G H               M <- add K L
    J <- subtract I F          D <- subtract E C
    M <- add K L               J <- subtract I F
    P <- subtract M N          P <- subtract M N
(a) A list of instructions, and (b) the instructions reordered to run faster. Reducing pipeline stalls increases speed.
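The effect of the reordering can be estimated with a rough Python model. This is a sketch under stated assumptions, not a description of any particular processor: there is no forwarding, an instruction fetches its operands in stage 3 and stores its result in stage 5, and so a dependent instruction can fetch operands no sooner than three clocks after the instruction that produces them. The names `count_stalls` and `RESULT_DELAY` are hypothetical.

```python
# Assumption: operands are fetched in stage 3 and results stored in stage 5,
# so a dependent instruction can fetch operands at the earliest 3 clocks
# after the producing instruction fetched its own operands.
RESULT_DELAY = 3


def count_stalls(program):
    """Count stall cycles for a list of (dest, op, src1, src2) tuples."""
    ready = {}      # register -> earliest clock at which it can be fetched
    previous = 0    # clock at which the previous instruction fetched operands
    stalls = 0
    for dest, _op, src1, src2 in program:
        earliest = previous + 1  # one instruction per clock when nothing stalls
        start = max(earliest, ready.get(src1, 0), ready.get(src2, 0))
        stalls += start - earliest
        ready[dest] = start + RESULT_DELAY
        previous = start
    return stalls


listing_a = [
    ("C", "add", "A", "B"), ("D", "subtract", "E", "C"),
    ("F", "add", "G", "H"), ("J", "subtract", "I", "F"),
    ("M", "add", "K", "L"), ("P", "subtract", "M", "N"),
]
listing_b = [
    ("C", "add", "A", "B"), ("F", "add", "G", "H"),
    ("M", "add", "K", "L"), ("D", "subtract", "E", "C"),
    ("J", "subtract", "I", "F"), ("P", "subtract", "M", "N"),
]

print("stall cycles in (a):", count_stalls(listing_a))
print("stall cycles in (b):", count_stalls(listing_b))
```

Under this model, listing (a) stalls for two cycles at each of its three dependent pairs, while listing (b) places every subtraction three instructions after the addition it depends on and does not stall.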
Programming, Stalls, And No-Op Instructions
Techniques programmers use to document a stall:
➲ Insert a comment that explains the reason for the stall.
➲ Insert extra no-op instructions in the code to show where instructions can be inserted to fill the pipeline. (A no-op instruction does nothing except occupy time.)
Forwarding
● Some hardware units are designed to detect and avoid stalls automatically. In particular, an ALU can use a technique known as forwarding to solve the problem of successive arithmetic instructions passing results.
    Instruction K:      C <- add A B
    Instruction K + 1:  D <- subtract E C
● In this situation, hardware that implements forwarding avoids the stall by detecting the dependency and automatically passing the value for C from instruction K directly to instruction K + 1.
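The idea can be illustrated with a small Python sketch. It is a simplified model, not real hardware: it assumes a result sits in a pending latch for one instruction before it is stored in the register file, and it counts stall events rather than clock cycles. The names `run`, `OPS`, and `pending` are hypothetical.

```python
OPS = {"add": lambda x, y: x + y, "subtract": lambda x, y: x - y}


def run(program, regs, forwarding):
    """Execute (dest, op, src1, src2) tuples; return (registers, stall events)."""
    regs = dict(regs)
    pending = None   # (register, value) from the previous instruction, not yet stored
    stalls = 0
    for dest, op, src1, src2 in program:
        values = []
        for src in (src1, src2):
            if pending is not None and pending[0] == src:
                if forwarding:
                    # Dependency detected: take the value directly from the latch.
                    values.append(pending[1])
                    continue
                # No forwarding: wait until the result has been stored.
                regs[pending[0]] = pending[1]
                pending = None
                stalls += 1
            values.append(regs[src])
        if pending is not None:
            regs[pending[0]] = pending[1]   # normal write-back of the older result
        pending = (dest, OPS[op](*values))
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs, stalls


program = [("C", "add", "A", "B"), ("D", "subtract", "E", "C")]
initial = {"A": 5, "B": 3, "C": 0, "D": 0, "E": 20}
print(run(program, initial, forwarding=False))
print(run(program, initial, forwarding=True))
```

Both runs compute the same register values; only the run without forwarding records a stall.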
Types Of Operations
➲ Arithmetic instructions (integer arithmetic)
➲ Logical instructions (also called Boolean)
➲ Data access and transfer instructions
➲ Conditional and unconditional branch instructions
➲ Floating point instructions
➲ Processor control instructions
An Arithmetic Instruction Set
Instruction                 Meaning
add                         integer addition
subtract                    integer subtraction
add immediate               integer addition (register + constant)
add unsigned                unsigned integer addition
subtract unsigned           unsigned integer subtraction
add immediate unsigned      unsigned addition with a constant
move from coprocessor       access coprocessor register
multiply                    integer multiplication
multiply unsigned           unsigned integer multiplication
divide                      integer division
divide unsigned             unsigned integer division
move from Hi                access high-order register
move from Lo                access low-order register
Logical (Boolean) And Data Transfer Instruction Set
Instruction                 Meaning
Logical (Boolean)
and                         logical and (two registers)
or                          logical or (two registers)
and immediate               and of register and constant
or immediate                or of register and constant
shift left logical          shift register left N bits
shift right logical         shift register right N bits
Data Transfer
load word                   load register from memory
store word                  store register into memory
load upper immediate        place constant in upper sixteen bits of register
move from coproc. register  obtain a value from a coprocessor
Conditional And Unconditional Branch Instruction Set
Instruction                       Meaning
Conditional Branch
branch equal                      branch if two registers equal
branch not equal                  branch if two registers unequal
set on less than                  compare two registers
set less than immediate           compare register and constant
set less than unsigned            compare unsigned registers
set less than immediate unsigned  compare unsigned register and constant
Unconditional Branch
jump                              go to target address
jump register                     go to address in register
jump and link                     procedure call
Program Counter, Fetch-Execute, And Branching
● Program Counter
The program counter is the special-purpose internal register that the processor uses to implement the fetch-execute cycle. It contains the address of the next instruction. After an instruction has been fetched, the program counter is updated to the address of the next instruction.
● Branch Instructions
There are two kinds of branch instructions: absolute and relative. A typical absolute branch instruction is jump. When a jump is executed, the address given by its operand is loaded into an internal register and then copied to the program counter, so the next fetch comes from that address. Unlike an absolute branch instruction, a relative branch instruction does not specify an exact memory address. For example, the instruction
    br +8
means that the next instruction is 8 bytes beyond the current address.
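The program counter update rules can be sketched in a few lines of Python. The sketch makes two assumptions that the slide does not state: every instruction occupies 4 bytes, and `pc` holds the address of the instruction being executed when the update is computed. The function name `next_pc` and the tuple encoding of instructions are hypothetical.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-size, 4-byte instructions


def next_pc(pc, instruction):
    """Return the program counter value after executing one instruction.

    pc is the address of the current instruction; instruction is an
    (opcode, operand) tuple.
    """
    op, operand = instruction
    if op == "jump":   # absolute branch: operand is the target address
        return operand
    if op == "br":     # relative branch: operand is an offset in bytes
        return pc + operand
    return pc + INSTRUCTION_SIZE   # ordinary instruction: fall through


print(next_pc(100, ("add", None)))   # sequential execution
print(next_pc(100, ("jump", 400)))   # absolute branch
print(next_pc(100, ("br", 8)))       # relative branch, as in br +8
```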
Subroutine Calls And Arguments
● Subroutine Calls
A typical subroutine call instruction is jsr. It operates like a branch instruction, except that the processor saves the address of the instruction that follows the jsr. When the subroutine finishes, the fetch-execute cycle resumes at the instruction immediately following the jsr.
● Arguments
If a subroutine call has arguments, there are several ways to pass them to the subroutine: in memory, in general-purpose registers, or in special-purpose registers. If an architecture uses registers to pass arguments, general-purpose registers are typically used.
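The call-and-return behavior can be traced with a toy interpreter in Python. The sketch makes assumptions beyond the slide: return addresses are kept on a stack, addresses are list indexes rather than byte addresses, and the opcodes `ret`, `nop`, and `halt` are hypothetical names added for illustration (only jsr comes from the slide).

```python
def run(program, start=0):
    """Run a list of (opcode, operand) tuples and return the list of
    addresses in the order they were executed."""
    pc = start
    return_stack = []   # assumption: return addresses are saved on a stack
    trace = []
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "halt":
            return trace
        if op == "jsr":
            return_stack.append(pc + 1)   # address immediately following the jsr
            pc = arg                      # branch to the subroutine
        elif op == "ret":
            pc = return_stack.pop()       # resume after the jsr
        else:
            pc += 1                       # ordinary instruction


program = [
    ("nop", None),    # 0
    ("jsr", 4),       # 1: call the subroutine at address 4
    ("nop", None),    # 2: execution resumes here after the return
    ("halt", None),   # 3
    ("nop", None),    # 4: subroutine body
    ("ret", None),    # 5: return to the caller
]
print(run(program))
```

The trace visits the subroutine body after the jsr and then continues at the instruction that follows it.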
Register Windows
An optimization for argument passing that some modern processors use is known as a register window. In this scheme, a program sees only a subset of the processor's registers, known as a window. The window moves automatically each time a subroutine is invoked, and moves back when the subroutine returns. The windows available to a program and a subroutine overlap: some of the registers visible to the caller are also visible to the subroutine, so arguments placed in them need not be copied.
Figure:
    Registers 0-7 when the program runs:      X1 X2 X3 X4 A  B  C  D
    Registers 0-7 when the subroutine runs:               A  B  C  D  I1 I2 I3 I4
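A register window can be modeled in Python as a movable base index into a larger physical register file. The sizes used here are assumptions chosen to match the eight-register window in the figure (32 physical registers, a window of 8, and an overlap of 4), and the class and method names are hypothetical.

```python
class RegisterWindow:
    """Toy model of overlapping register windows."""

    def __init__(self, total=32, window=8, overlap=4):
        self.physical = [0] * total      # the full physical register file
        self.window = window             # registers visible at one time
        self.shift = window - overlap    # how far the window moves per call
        self.base = 0                    # physical index of visible register 0

    def _index(self, reg):
        if not 0 <= reg < self.window:
            raise IndexError("register outside the current window")
        return self.base + reg

    def read(self, reg):
        return self.physical[self._index(reg)]

    def write(self, reg, value):
        self.physical[self._index(reg)] = value

    def call(self):
        """Move the window forward when a subroutine is invoked."""
        if self.base + self.shift + self.window > len(self.physical):
            raise OverflowError("no free register window")
        self.base += self.shift

    def ret(self):
        """Move the window back when the subroutine returns."""
        if self.base == 0:
            raise RuntimeError("return without a matching call")
        self.base -= self.shift


rw = RegisterWindow()
for i, value in enumerate(["A", "B", "C", "D"]):
    rw.write(4 + i, value)               # caller puts arguments in registers 4-7
rw.call()
print([rw.read(i) for i in range(4)])    # subroutine sees them in registers 0-3
```

Because the windows overlap, the caller's registers 4-7 and the subroutine's registers 0-3 are the same physical registers, so passing the arguments requires no copying in this model.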
The Principle of Orthogonality The principle of orthogonality specifies that each instruction should perform a unique task without duplicating or overlapping the functionality of other instructions. Orthogonality is so important that it has become a general principle of processor design.
Condition Codes And Conditional Branching
On many processors, each instruction produces a status, which the processor stores in an internal hardware mechanism. The status contains bits that record whether the result is positive, negative, or zero, or whether an arithmetic overflow occurred; these bits are used as condition codes. A conditional branch instruction that follows an arithmetic operation can test one or more of the condition code bits and use the result to determine whether to branch.
          cmp  r4, r5     # compare regs. 4 & 5, and set condition code
          be   lab1       # branch to lab1 if condition code specifies equal
          mov  r3, 0      # place a zero in register 3
    lab1: ...             # program continues at this point
An example of using a condition code. An ALU operation sets the condition code, and a later conditional branch tests the condition code.
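The same sequence can be mimicked in Python to show how the condition code connects the two instructions. The sketch assumes that the compare sets its condition bits from the difference r4 - r5, which is a common design but is not stated on the slide; the function name `run_example` and the default value of r3 are hypothetical.

```python
def run_example(r4, r5, r3=7):
    """Mimic cmp / be / mov and return the final value of register 3."""
    regs = {"r3": r3, "r4": r4, "r5": r5}
    condition = {"zero": False, "negative": False}

    # cmp r4, r5: set the condition code (assumed to use the difference)
    difference = regs["r4"] - regs["r5"]
    condition["zero"] = difference == 0
    condition["negative"] = difference < 0

    # be lab1: branch if the condition code specifies equal
    if not condition["zero"]:
        # mov r3, 0: executed only when the branch is not taken
        regs["r3"] = 0

    # lab1: program continues at this point
    return regs["r3"]


print(run_example(1, 1))   # registers equal: branch taken, r3 unchanged
print(run_example(1, 2))   # registers differ: r3 is set to zero
```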