Lecture 4: Instruction Set Design / Pipelining
- Instruction set design (Sections 2.9-2.12): control instructions, instruction encoding
- Basic pipelining implementation (Section A.1)
Control Transfer Instructions
- Mix: conditional branches (75% Int, 82% FP), jumps (6% Int, 10% FP), procedure calls/returns (19% Int, 8% FP)
- Design issues: How do you specify the target address? How do you specify the condition? What happens on a procedure call/return?
Specifying the Target Address
- PC-relative: needs fewer bits to encode and is independent of how/where the compiled code is linked; used for branches and jumps – typically, the displacement needs 4-8 bits
- Register-indirect jumps: the address is not known at compile time and has to be computed at run time (note: any other addressing mode could be used too); needed for procedure returns, case statements, virtual functions, function pointers, and dynamically shared libraries (see the C sketch below)
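A minimal C sketch of source constructs that typically force register-indirect jumps (function and variable names here are made up for illustration):

```c
#include <stdio.h>

/* Function pointers: the call target is a run-time value, so the
   compiler emits an indirect call/jump (e.g., MIPS jalr/jr). */
typedef int (*op_fn)(int, int);

static int add_op(int a, int b) { return a + b; }
static int sub_op(int a, int b) { return a - b; }

int apply(op_fn f, int a, int b) {
    return f(a, b);            /* indirect call through a register */
}

/* A dense switch is often lowered to a jump table: load a target
   address from the table, then jump through a register. */
int describe(int code) {
    switch (code) {
    case 0: return 10;
    case 1: return 20;
    case 2: return 30;
    case 3: return 40;
    default: return -1;
    }
}

int main(void) {
    printf("%d %d %d\n", apply(add_op, 5, 3), apply(sub_op, 5, 3), describe(2));
    return 0;
}
```

In each case the target address exists only as a run-time value, so a PC-relative encoding cannot be used.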
Specifying the Condition

| Name | Examples | How condition is tested | Advantages | Disadvantages |
|---|---|---|---|---|
| Condition code (CC) | 80x86, ARM, PowerPC, SPARC | Tests special bits set by ALU ops | Sometimes the condition is set for free | CC is extra state; instructions cannot be re-ordered |
| Condition register | Alpha, MIPS | Comparison sets a register, which the branch tests | Simple | Register pressure |
| Compare and branch | PA-RISC, VAX | Comparison is part of the branch | One instruction instead of two | Complex pipelines |
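As a rough illustration of the three styles, consider one C comparison; the instruction sequences in the comments are typical patterns, not exact compiler output:

```c
/* One C conditional, tested three ways (sequences in comments are
   illustrative, not exact compiler output). */
int max(int a, int b) {
    if (a < b)        /* condition code (x86):        cmp a, b ; jl taken        */
        return b;     /* condition register (MIPS):   slt t0, a, b ; bne t0, $0, taken */
    return a;         /* compare-and-branch (PA-RISC): one combined compare+branch */
}
```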
Procedure Call/Returns
- Need to maintain a stack of return addresses (in memory or in hardware)
- Registers can be saved and restored wholesale or selectively
- Who is responsible for saving registers?
  - Caller saving: needed for correctness in some cases (a global register has to be made available to other procedures); the caller saves only the values it cares about
  - Callee saving: the callee saves only as many registers as it needs (provided it doesn’t call other procedures)
  - A combination of both is typically employed (see the sketch below)
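A small C sketch of where the caller/callee split shows up; the register assignments described in the comments are hypothetical, since the real choice is dictated by the compiler and calling convention:

```c
/* Register saving across a call (register choices in the comments are
   hypothetical; the ABI and compiler make the actual decision). */
int leaf(int x) {              /* a leaf that fits in caller-saved (temp)   */
    return x * x;              /* registers saves nothing at all            */
}

int parent(int a, int b) {
    int s = a + b;             /* s is live across the call below: either the
                                  caller saves it before the call, or it sits
                                  in a callee-saved register that leaf() would
                                  have to preserve and restore if it used it. */
    int t = leaf(a);           /* the call also stores a return address
                                  (e.g., MIPS jal writes it to R31)          */
    return s + t;
}
```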
Instruction Set Encoding
- Operations are easy to encode efficiently – the key issues are the number of operands and their addressing modes
- Few addressing modes: low complexity in decoding and pipelining, but greater code size
- Fixed instruction lengths: low complexity in decoding, but greater code size (see the decode sketch below)
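To make the decode-complexity point concrete, here is a small C sketch that extracts the fields of the fixed 32-bit MIPS R-type format; with fixed field positions, decode is just shifts and masks:

```c
#include <stdint.h>
#include <stdio.h>

/* MIPS R-type: opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0] */
typedef struct {
    uint32_t opcode, rs, rt, rd, shamt, funct;
} RType;

static RType decode_rtype(uint32_t insn) {
    RType d;
    d.opcode = (insn >> 26) & 0x3F;   /* bits 31..26 */
    d.rs     = (insn >> 21) & 0x1F;   /* bits 25..21 */
    d.rt     = (insn >> 16) & 0x1F;   /* bits 20..16 */
    d.rd     = (insn >> 11) & 0x1F;   /* bits 15..11 */
    d.shamt  = (insn >>  6) & 0x1F;   /* bits 10..6  */
    d.funct  =  insn        & 0x3F;   /* bits 5..0   */
    return d;
}

int main(void) {
    /* add $t0, $t1, $t2  encodes as 0x012A4020 */
    RType d = decode_rtype(0x012A4020u);
    printf("op=%u rs=%u rt=%u rd=%u shamt=%u funct=0x%x\n",
           d.opcode, d.rs, d.rt, d.rd, d.shamt, d.funct);
    return 0;
}
```

A variable-length ISA would instead have to examine the opcode (and possibly prefix/mode bytes) before it even knows where the remaining fields and the next instruction begin.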
Instruction Lengths
Dealing with Code Size in RISC
- Some hybrid ISAs allow both 16- and 32-bit instructions (up to a 40% reduction in code size) – useful for embedded apps
- IBM PowerPC stores 32-bit instructions in compressed form in memory – more hardware complexity on an I-cache miss (addresses must be translated from uncompressed to compressed form in addition to virtual to physical)
- Reducing the register file size can also reduce the instruction length
Compiler Optimizations
- The phase-ordering problem: early phases have to assume that register allocation will later find a register; otherwise, optimizations such as common subexpression elimination may increase memory traffic (example below)
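A tiny C illustration of the phase-ordering concern: common subexpression elimination introduces a temporary that is only a win if register allocation later finds it a register.

```c
/* Common subexpression elimination introduces a temporary.  If register
   allocation cannot keep t in a register, t is spilled to the stack and
   memory traffic goes up instead of down. */
int before(int a, int b, int c) {
    return (a * b) + (a * b) * c;      /* a*b computed twice */
}

int after(int a, int b, int c) {
    int t = a * b;                     /* CSE: compute once, hold t in a register */
    return t + t * c;                  /* ...but only if a register is available  */
}
```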
Register Allocation Issues
- Graph coloring: determine when variables are live and avoid allocating the same register to variables that are simultaneously live (toy sketch below)
- Stack variables (typically local to a procedure): easy to allocate registers for
- Global data: can be accessed from multiple places (aliasing), difficult to allocate to registers
- Heap data: dynamically created objects, accessed through pointers; difficult to allocate to registers because of aliasing
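A toy C sketch of graph-coloring allocation on a hypothetical 4-variable interference graph with 3 registers; real allocators use Chaitin/Briggs-style heuristics with coalescing and spill-cost estimates, so this greedy pass is only meant to show the core idea.

```c
#include <stdio.h>

/* Greedy coloring of an interference graph: variables that are live at
   the same time (interfere) must receive different registers. */
#define NVARS 4
#define NREGS 3

int main(void) {
    /* interfere[i][j] = 1 if variables i and j are simultaneously live
       (hypothetical example data). */
    int interfere[NVARS][NVARS] = {
        {0, 1, 1, 0},
        {1, 0, 1, 0},
        {1, 1, 0, 1},
        {0, 0, 1, 0},
    };
    int color[NVARS];

    for (int v = 0; v < NVARS; v++) {
        int used[NREGS] = {0};
        /* mark registers already taken by interfering neighbors */
        for (int u = 0; u < v; u++)
            if (interfere[v][u] && color[u] >= 0)
                used[color[u]] = 1;
        color[v] = -1;                 /* -1 means "spill to memory" */
        for (int r = 0; r < NREGS; r++)
            if (!used[r]) { color[v] = r; break; }
        if (color[v] >= 0)
            printf("v%d -> R%d\n", v, color[v]);
        else
            printf("v%d -> spilled\n", v);
    }
    return 0;
}
```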
Case Study: The MIPS ISA
- Load-store architecture
- Focus on pipelining, decoding, and compiler efficiency
- In other words, RISC
Registers
- 32 GPRs (general-purpose/integer registers) and 32 FPRs
- 64-bit registers; two single-precision FP values can fit in one register
- Register R0 is hardwired to zero – with the displacement addressing mode, this also gives us absolute addressing; other uses for R0?
Instruction Format
Control Instructions
- Comparisons with zero can happen as part of the branch
- Compares between registers place their result in another register, which is then tested by a branch
- Jump-and-link places the return address in register R31
Instruction Frequencies
Summary
- In the 1960s, stack architectures were considered a good match for high-level languages
- In the 1970s, software costs were a concern – ISAs were enriched to make the compiler's job easier – CISC
- In the 1980s, there was a push for simpler architectures – high clock speed and high parallelism – RISC
- ISAs designed in the 1980s are still around!
The Assembly Line
- Unpipelined: start and finish a job before moving to the next
- Pipelined: break the job into smaller stages so that successive jobs overlap in time
[Figure: jobs A, B, C shown over time, unpipelined vs. pipelined]
Performance Improvements?
- Does it take longer to finish each individual job?
- Does it take less time to finish a series of jobs?
- What assumptions were made while answering these questions?
- Is a 10-stage pipeline better than a 5-stage pipeline? (see the model below)
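A first-order model helps answer these questions, assuming k perfectly balanced stages, a per-stage latch overhead t_latch, and no hazards:

```latex
% Time between instruction completions, and the resulting speedup
\[
T_{\text{pipelined}} \approx \frac{T_{\text{unpipelined}}}{k} + t_{\text{latch}},
\qquad
\text{Speedup} = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}}
\approx \frac{k}{1 + k\,t_{\text{latch}}/T_{\text{unpipelined}}}
\]
```

Under this model, each individual job actually takes slightly longer (its latency grows by k times the latch overhead), but throughput improves by nearly a factor of k; a deeper pipeline is better only while the latch overhead and hazard penalties stay small relative to the shrinking stage time.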