Classifying GPR Machines

Type               Number of Operands   Memory Operands   Examples
Register-Register  3                    0                 SPARC, MIPS, etc.
Register-Memory    2                    1                 Intel 80x86, Motorola
Memory-Memory      3                    3                 VAX
2.4. Addressing Modes for DSP
  DSPs work with infinite, continuous streams of data
    – Use circular buffers
    – Have a modulo or circular addressing mode
      Address registers have start and end registers
      Autoincrement/autodecrement automatically reset at start/end
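
The wrap-around behaviour of circular addressing can be sketched in a few lines of Python. This is only a software model of the idea, not any particular DSP's hardware; the function name `circular_increment` and the example addresses 100..103 are made up for illustration.

```python
def circular_increment(addr, start, end, step=1):
    """Advance addr by step, wrapping inside the inclusive range [start, end].

    start and end play the role of the start/end registers attached to an
    address register (hypothetical model, not a real DSP's register set).
    """
    length = end - start + 1
    return start + (addr - start + step) % length


# Walk a 4-entry circular buffer that occupies addresses 100..103.
addr = 100
visited = []
for _ in range(6):
    visited.append(addr)
    addr = circular_increment(addr, 100, 103)

print(visited)  # the address wraps back to 100 after reaching 103
```

A negative `step` models autodecrement: stepping back from the start address lands on the end address.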
DSP
  Fast Fourier Transforms (FFT) common
    – Shuffle data in a distinct pattern
    – In binary: need to reverse address bits
    – Many DSPs have bit reverse addressing mode
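
As a rough illustration of the bit-reversal shuffle, the sketch below reverses the address bits in software. The function name `bit_reverse` and the 8-point size are assumptions chosen for the example; a DSP with a bit reverse addressing mode would produce this ordering in its address generator instead.

```python
def bit_reverse(index, bits):
    """Return index with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the low bit of index into result
        index >>= 1
    return result


# Access order for an 8-point FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```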
Usage
  Circular addressing
    – 2.35%
  Bit reverse
    – 0%
  Four “basic” modes (immediate, displacement, register indirect, direct)
    – 70%
    – Rest are variations on autoincrement/autodecrement, including circular addressing
Summary: Memory Addressing
  New machines should support at least:
    – Displacement
    – Immediate
    – Register indirect
  Size should be:
    – 12–16 bits for displacements
    – 8–16 bits for immediates
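
To make the field sizes concrete, the sketch below computes the range of values a field of a given width can hold. It assumes the fields are signed two's complement values, which the slide does not state; the helper names are made up for the example.

```python
def signed_range(bits):
    """Smallest and largest value of a two's complement field of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_signed(value, bits):
    """True if value can be encoded in a signed field of `bits` bits."""
    low, high = signed_range(bits)
    return low <= value <= high


for bits in (8, 12, 16):
    print(bits, signed_range(bits))
# 8 (-128, 127)
# 12 (-2048, 2047)
# 16 (-32768, 32767)
```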
2.5. Type and Size of Operands
  How is this designated?
    – Encoded in the instruction
    – Tags on the data
  Type normally determines size
Alternative Types
  Character strings
    – Comparisons and moves
  Decimal (BCD) formats
    – 0…9 (4 bits), two digits per byte
    – String conversions
    – Arithmetic operations
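
The "two digits per byte" packing can be sketched as follows. This is a minimal illustration for non-negative integers only; the function names are hypothetical, and sign handling (which real decimal formats define in various ways) is left out.

```python
def to_packed_bcd(number):
    """Pack a non-negative integer into BCD, two decimal digits per byte."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])  # high nibble, low nibble
        for i in range(0, len(digits), 2)
    )


def from_packed_bcd(data):
    """Convert packed BCD bytes back to an integer."""
    value = 0
    for byte in data:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value


packed = to_packed_bcd(1995)
print(packed.hex())             # 1995
print(from_packed_bcd(packed))  # 1995
```

The hexadecimal dump of the packed bytes reads the same as the decimal number, which is what makes conversion to and from character strings cheap.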
Type Usage
  Integer applications
    – Mainly double word (59%), but skewed by 64-bit addresses
  FP applications
    – Mainly double word (70%)
  Important design issue
2.6. Operands for Media and DSPs
  Graphics applications
    – Vertices (four 32-bit values: x, y, z, w)
    – Pixels (32 bits: four 8-bit values: R, G, B, A)
  DSPs
    – Fixed point (fractions between –1 and +1)
      May experience rounding errors: registers wider than data size
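
A fixed-point fraction and its rounding error can be sketched as below. The example assumes a 16-bit format with 15 fraction bits (often called Q15); the slide does not name a specific format, so the width and the helper names are illustrative choices.

```python
Q15_SCALE = 1 << 15  # assumed format: 1 sign bit, 15 fraction bits


def to_q15(x):
    """Convert a real number to a 16-bit fixed-point integer, clamping to the valid range."""
    raw = round(x * Q15_SCALE)
    return max(-32768, min(32767, raw))


def from_q15(raw):
    """Convert a 16-bit fixed-point integer back to a real number."""
    return raw / Q15_SCALE


print(to_q15(0.5))             # 16384, exactly representable
print(to_q15(-1.0))            # -32768, the most negative fraction
print(from_q15(to_q15(0.1)))   # close to 0.1, but not exact: rounding error
```

Values such as 0.1 have no exact representation, so each conversion loses a little precision. Keeping intermediate results in a register wider than the data limits how often that rounding happens.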
2.7. Operations
  Several categories:
    – Arithmetic / logical
    – Data transfer
    – Control
    – System
    – Floating point
    – Decimal
    – String
    – Graphics
  The categories earlier in the list are common; the later ones are application-specific
Operations
  Example:
    – Intel 80x86 (SPECint92)
    – 10 instructions account for 96% of execution!
      Load, branch, compare, store, add, and, sub, move, call, return
    – The first seven account for 90%
2.8. Media and DSP Operations
  Media operations
    – Partitioned add
      E.g. 64-bit add does 4 × 16-bit adds or 8 × 8-bit adds
    – Paired single operations
      64-bit FP ops do 2 × 32-bit operations
    – Fig SPARC VIS and Pentium MMX
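
A partitioned add can be modelled in software by treating a 64-bit word as independent lanes and discarding the carry out of each lane. The sketch below assumes wrap-around within each lane; it is a model of the idea rather than the semantics of any specific VIS or MMX instruction, and the function name is made up.

```python
def partitioned_add(a, b, lane_bits=16, word_bits=64):
    """Add two words lane by lane; carries never cross a lane boundary."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, word_bits, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift  # drop the carry out of this lane
    return result


a = 0x0001_FFFF_0003_0004
b = 0x0001_0001_0001_0001
print(hex(partitioned_add(a, b)))        # 0x2000000040005 (four 16-bit lanes)
print(hex((a + b) & 0xFFFF_FFFF_FFFF_FFFF))  # 0x3000000040005 (ordinary 64-bit add)
```

The two results differ in the top lane: in the ordinary add the overflow of `0xFFFF + 0x0001` carries into the neighbouring field, while the partitioned add keeps the four sums separate.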
DSP Operations
  Use “saturating arithmetic”
    – Overflow results in maximum value, not wrap-around
  Rounding
    – Wide registers into narrow data words
  Multiply-Accumulate (MAC) instructions
    – Key to dot-products for matrices and vectors
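
The three ideas on this slide can be combined in one small sketch: a saturating add next to a wrapping add, and a dot product built from multiply-accumulate steps with a wide accumulator that is rounded and saturated only at the end. The 16-bit data width and 15 fraction bits are assumptions made for the example, as are all function names.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturating_add16(a, b):
    """Add two 16-bit values; on overflow, clamp to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def wrapping_add16(a, b):
    """Add two 16-bit values with two's complement wrap-around, for comparison."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s


def dot_product_q15(xs, ys):
    """Dot product of 16-bit fixed-point vectors (15 fraction bits assumed)."""
    acc = 0  # wide accumulator: Python integers do not overflow
    for x, y in zip(xs, ys):
        acc += x * y  # one multiply-accumulate step; each product has 30 fraction bits
    result = (acc + (1 << 14)) >> 15  # round back to 15 fraction bits
    return max(INT16_MIN, min(INT16_MAX, result))  # saturate to the 16-bit data word


print(saturating_add16(30000, 10000))  # 32767
print(wrapping_add16(30000, 10000))    # -25536
print(dot_product_q15([16384, 16384], [16384, 16384]))  # 16384, i.e. 0.5*0.5 + 0.5*0.5 = 0.5
```

For signal data, clamping at the maximum is usually a far smaller error than the sign flip that wrap-around produces.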
2.9. Control Flow Instructions
  Tend to be independent of other factors
  Terminology
    – Highly varied!
    – H&P:
      Jump (unconditional)
      Branch (conditional)
Control Flow Instructions
  Mainly conditional branches (about 80%)
  Almost always PC-relative addressing
    – Requires fewer bits
    – Position independence
  Register indirect jumps
    – Useful for switch, dynamic libraries, OO programs
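
How a PC-relative branch target is formed can be sketched as below. The convention used here (a signed 16-bit word offset added to the address of the following instruction, with 4-byte instructions) follows the MIPS style and is an assumption for the example; other architectures differ in the details.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(pc, offset_field, offset_bits=16, instr_bytes=4):
    """Target = address of the next instruction + signed offset in instructions."""
    return pc + instr_bytes + sign_extend(offset_field, offset_bits) * instr_bytes


print(hex(branch_target(0x1000, 3)))       # 0x1010: forward branch
print(hex(branch_target(0x1000, 0xFFFF)))  # 0x1000: offset -1 branches back to itself
print(hex(branch_target(0x5000, 3)))       # 0x5010: same encoding, code loaded elsewhere
```

The last line shows position independence: moving the code by 0x4000 moves the target by the same amount, with no change to the encoded offset.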
Control Flow Instructions
  How many bits are needed for address?
    – 10 bits is plenty (Alpha)
  Specifying conditions?
    – Condition codes
    – Register
    – Compare-and-branch
  DSP:
    – repeat (uses a counter register)
Control Flow Instructions
  Procedure call and return
    – How are registers saved?
      Caller save vs. callee save
    – Most current systems use a combination
2.10. Instruction Encoding
  All the previous factors are important
  Affects:
    – Size of programs
    – Ease of decoding
  Requires an opcode
  How is address information handled?
Encoding the Address
  Many and complex addressing modes
    – Separate address specifier
  Simple load/store architectures
    – Address included in the instruction
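
For the load/store case, the sketch below packs a base register and a displacement directly into a fixed 32-bit instruction word. It uses the MIPS I-type layout (6-bit opcode, two 5-bit register fields, 16-bit immediate) as the example format; the function names are made up, and the slide itself does not single out this layout.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode, base register rs, target register rt, and a signed
    16-bit displacement into one 32-bit word (MIPS I-type layout)."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("opcode or register number out of range")
    if not (-32768 <= imm <= 32767):
        raise ValueError("displacement does not fit in 16 bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_i_type(word):
    """Split a 32-bit word back into (opcode, rs, rt, signed displacement)."""
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000  # sign-extend the displacement
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, imm


# lw $t0, 8($sp): opcode 0x23, base register 29 ($sp), target register 8 ($t0)
word = encode_i_type(0x23, 29, 8, 8)
print(hex(word))            # 0x8fa80008
print(decode_i_type(word))  # (35, 29, 8, 8)
```

Because every field sits at a fixed bit position, decoding needs only shifts and masks. An architecture with many addressing modes would instead need a separate specifier per operand that says how to interpret the following bits.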
Trade-Offs
  Desire for many registers and addressing modes
  Instruction (and program) size
  Easy decoding
    – Fixed length instructions
Examples
  Variable
    – VAX
    – Instructions range from 1 to 53 bytes!
    – Zero to three operands in memory (0…6 memory accesses)
  Fixed
    – SPARC, MIPS, Alpha, etc.
  Hybrid
    – IBM 360
Reducing Code Size in RISCs
  Embedded processors
    – Program size is an important cost factor
    – Led to hybrid RISC instruction sets
      16- and 32-bit instructions
      MIPS: MIPS16
      Code size reduction of up to 40%
    – Hitachi
      Developed a new RISC architecture with fixed 16-bit instructions (SuperH)
Reducing Code Size in RISCs
  IBM
    – Compress programs
    – Hardware decompresses instructions when fetched from memory into cache
    – Performance impact: 10%
    – Code size reduction: 35%–40%
    – Compilers do not need modification