Computer Architecture Principles
Dr. Mike Frank
CDA 5155, Summer 2003
Module #9: Basics of Instruction Set Architectures
Moving on to Chapter 2... Topic: Principles of instruction set design
An instruction set architecture (ISA) is a specification of a standardized, programmer-visible interface to hardware, comprising:
  A set of instructions (really, instruction types), with associated argument fields, assembly syntax, and machine encoding.
  A set of named storage locations: registers, memory, … (programmer-accessible caches?)
  A set of addressing modes (ways to name locations).
  Often an I/O interface (usually memory-mapped).
Classifying Architectures
Broad Classes of Processor Architectures
Can be distinguished by radically differing programming models:
  Instruction-based programming model (focus of this course):
    Subdivision of hardware into memory, register file, ALUs, etc.
    Traditional assembly-language operations, like high-level RTL statements.
    Load, add, move, etc., operating on programmer-visible registers.
  Microinstruction-based programming model:
    Similar microarchitecture to that used in instruction-based architectures, but…
    Microinstructions directly specify control signals to be sent to internal machine registers, muxes, ALUs, etc.; equivalent to low-level RTL statements.
  Dataflow / systolic array programming models:
    “Program” specifies a custom pattern of connectivity among pre-existing arrays of hardware functional units operating in parallel.
    Examples: MIT J-machine, RAW project.
  Circuit-based programming models:
    “Program” specifies interconnection and internal logic of functional units.
    Examples: FPGAs (Field-Programmable Gate Arrays) (Xilinx, Altera).
  Others? (E.g., mesh of processing elements, each with all features.)
The Best of All Possible Worlds?
  Chip multiprocessor: homogeneous 2-D (or 3-D) mesh of processing elements.
  Heterogeneous processing element:
    Systolic dataflow array
    Superscalar, highly dynamic, heavily pipelined RISC CPU core
    Special-purpose units (graphics, signals, media)
    Reconfigurable FPGA core
    Local memory hierarchy: 1st-level cache, 2nd-level cache, 3rd-level cache or local DRAM
  Communications/power/cooling grid
Chapter 2 contents
  Introduction
  Classifying ISAs
  Addressing modes
    …for signal processing
  Type & size of operands
    Operands for media & signal processing
  Operations in the IS
    Ops for media & SP
  Control flow instrs.
  Encoding an IS
  Role of compilers
  The MIPS architecture
  Trimedia TM32 CPU
  Fallacies & pitfalls
  Closing material
Also, see Appendices C-F (online at mkp.com)
2.2. Classifying Architectures
One important classification scheme is by where instruction operands are held:
  Stack architecture: operands are implicitly on top of a stack. (Early machines, Intel floating-point.)
  Accumulator architecture: one operand is implicitly an accumulator (a special register). (Early machines.)
  General-purpose register (GPR) architecture: operands may be any of a large number (typically 10s-100s) of registers.
    Register-memory architectures: one operand may be in memory.
    Load-store architectures: all operands are registers, except in special load and store instructions.
Illustrating Architecture Types Assembly for C:=A+B:
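The four classes can be compared by simulating C := A + B on each. The following is a minimal sketch, assuming generic textbook-style mnemonics (Push, Pop, Load, Store, Add) rather than any real ISA; the function names and the dictionary standing in for memory are hypothetical.

```python
# Hypothetical mini-machines for C := A + B. `mem` maps variable names to values.

def run_stack(mem):
    """Stack machine: Push A; Push B; Add; Pop C."""
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add (both operands implicit)
    mem["C"] = stack.pop()                    # Pop C
    return mem["C"]

def run_accumulator(mem):
    """Accumulator machine: Load A; Add B; Store C."""
    acc = mem["A"]          # Load A  (accumulator is the implicit destination)
    acc += mem["B"]         # Add B   (accumulator is an implicit operand)
    mem["C"] = acc          # Store C
    return mem["C"]

def run_register_memory(mem):
    """Register-memory machine: Load R1,A; Add R3,R1,B; Store R3,C."""
    regs = {}
    regs["R1"] = mem["A"]                 # Load R1, A
    regs["R3"] = regs["R1"] + mem["B"]    # Add R3, R1, B (one operand in memory)
    mem["C"] = regs["R3"]                 # Store R3, C
    return mem["C"]

def run_load_store(mem):
    """Load-store machine: Load R1,A; Load R2,B; Add R3,R1,R2; Store R3,C."""
    regs = {}
    regs["R1"] = mem["A"]                     # Load R1, A
    regs["R2"] = mem["B"]                     # Load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]      # Add R3, R1, R2 (registers only)
    mem["C"] = regs["R3"]                     # Store R3, C
    return mem["C"]
```

Note how the instruction count grows (3, 3, 3, 4) as operands become more explicit, while each individual instruction gets simpler.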
Number of Operands
A further classification is by the maximum number of operands, and how many of them can be in memory. E.g.:
  2-operand (e.g., a += b):
    src/dest(reg), src(reg)
    src/dest(reg), src(mem): IBM 360, x86, 68k
    src/dest(mem), src(mem): VAX
  3-operand (e.g., a = b+c):
    dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, etc.
    dest(reg), src1(reg), src2(mem)
    dest(mem), src1(mem), src2(mem): VAX
Memory Addressing Modes & Conventions
2.3. Memory Addressing A memory address n names the location of the (n+1)th “item” in memory. If each item is a byte (octet, 8-bit chunk), then the ISA’s memory system is byte-addressed (the standard choice). Numbering larger chunks (e.g., 32 bits) is also possible; such memories are called word-addressed. Objects consisting of several consecutive items might be accessible as a unit: bytes, half-words (2 bytes), words (4 bytes), double words (8 bytes).
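Endians & Alignment
[Figure: a row of byte addresses 1-7, with 4-byte words placed at different alignments.]
  Word-aligned word at byte address 4.
  Halfword-aligned word at byte address 2.
  Byte-aligned (non-aligned) word at byte address 1.
  Little-endian byte order (least-significant byte “first”): for the word at address 4, byte 0 (LSB) is at address 4 and byte 3 (MSB) is at address 7.
  Big-endian byte order (most-significant byte “first”): for the word at address 4, byte 3 (MSB) is at address 4 and byte 0 (LSB) is at address 7.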
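Alignment and byte order can be checked with a few lines of code. This is a sketch only: the helper names `is_aligned` and `store_word` are hypothetical, memory is modeled as a Python `bytearray`, and the word size is assumed to be 4 bytes as on the memory addressing slide.

```python
def is_aligned(addr, size):
    """True if an object of `size` bytes at byte address `addr` is naturally aligned."""
    return addr % size == 0

def store_word(mem, addr, value, byteorder):
    """Store a 32-bit value into `mem` at byte address `addr`.

    `byteorder` is "little" (LSB at the lowest address) or
    "big" (MSB at the lowest address).
    """
    mem[addr:addr + 4] = value.to_bytes(4, byteorder)

# The same word stored at byte address 4 under each convention.
mem_le = bytearray(8)
mem_be = bytearray(8)
store_word(mem_le, 4, 0x0A0B0C0D, "little")
store_word(mem_be, 4, 0x0A0B0C0D, "big")
```

After these stores, `mem_le[4]` holds the least-significant byte `0x0D`, while `mem_be[4]` holds the most-significant byte `0x0A`.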
Addressing Modes In example assembly syntax in middle column, ( ) indicates memory access. (A typical syntax.) In RTL syntax on right, [ ] denotes accessing a member of an array, Register or Memory.
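Addressing Modes Visualization
[Figure: for each mode, how the instruction field(s) select a register-file entry and/or a memory location.]
  Immediate: the imm field is the operand itself.
  Register: the reg field selects the register holding the operand.
  Direct: the addr field is the memory address of the operand.
  Indirect: the reg field selects a register holding the memory address.
  Displacement: the register's “base” address plus the imm offset gives the memory address. (“All your base are belong to us.”)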
Addr. Mode Vis. Cont.
  Indexed: reg1 holds the “base” address and reg2 the offset; their sum is the memory address.
  Memory indirect: the reg field selects a register holding the address of a memory location, which in turn holds the operand's address.
  Scaled: address = base address (reg1) + index (reg2) × row size, i.e., (r1)[r2]. Example row size = 8 locations.
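The modes in the two visualization slides can be summarized as one operand-fetch function. This is a rough sketch under stated assumptions: the function `fetch_operand`, its parameter names, and the register and memory contents are all made up for illustration, and memory is a dictionary from address to value.

```python
def fetch_operand(mode, regs, mem, reg=None, reg2=None, imm=0, scale=1):
    """Return the operand selected by an addressing mode (RTL given in comments)."""
    if mode == "immediate":
        return imm                                       # imm
    if mode == "register":
        return regs[reg]                                 # Regs[reg]
    if mode == "direct":
        return mem[imm]                                  # Mem[addr]
    if mode == "indirect":
        return mem[regs[reg]]                            # Mem[Regs[reg]]
    if mode == "displacement":
        return mem[imm + regs[reg]]                      # Mem[imm + Regs[reg]]
    if mode == "indexed":
        return mem[regs[reg] + regs[reg2]]               # Mem[Regs[reg1] + Regs[reg2]]
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]                       # Mem[Mem[Regs[reg]]]
    if mode == "scaled":
        return mem[imm + regs[reg] + regs[reg2] * scale] # Mem[imm + Regs[reg1] + Regs[reg2]*scale]
    raise ValueError(f"unknown addressing mode: {mode}")

# Illustrative machine state: Mem[i] = 100 + i, except Mem[8] holds a pointer (20).
regs = {"R1": 8, "R2": 3}
mem = {i: 100 + i for i in range(64)}
mem[8] = 20
```

With this state, the scaled mode with `scale=8` reads `Mem[8 + 3*8] = Mem[32]`, matching the row-size-8 example on the slide.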
Addressing Mode Usage (out of non-register modes) (on a VAX)
Offset Distribution (Alpha, optimized, SPEC CPU2000)
Popularity of Immediates (Alpha, optimized, SPEC CPU2000)
Distribution of Immediates
Instruction Categories
2.7. Types of Instructions
Instruction Distribution
Control-Flow Instructions
2.9. Control Flow Instructions
Four basic types:
  (Conditional) branches
  (Unconditional) jumps
  Procedure calls
  Procedure returns
Control flow addressing modes:
  Often PC-relative (PC + displacement), which makes code relocatable.
  Also useful: register-indirect jumps (a register holds the target address). Uses:
    Case / switch statements
    Virtual functions / methods (abstract class method calls)
    Higher-order functions / function pointers
    Dynamically shared libraries
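Both control-flow addressing modes can be illustrated briefly. The sketch below assumes a simplified model in which the target is just PC plus the displacement (real ISAs typically add an instruction-size offset and scale the displacement); the jump table and handler names are hypothetical.

```python
def branch_target(pc, displacement):
    """PC-relative target: only the displacement is stored in the instruction."""
    return pc + displacement

# The same branch, 0x20 bytes into a routine, jumping back 0x10 bytes.
# Loading the routine at a different base address needs no change to the
# displacement, which is why PC-relative code is relocatable.
target_low = branch_target(0x1000 + 0x20, -0x10)    # routine loaded at 0x1000
target_high = branch_target(0x8000 + 0x20, -0x10)   # routine loaded at 0x8000

# Register-indirect jump, as used for a switch statement: the selector
# indexes a table of targets, and control transfers through the loaded value.
def case_zero():
    return "zero"

def case_one():
    return "one"

JUMP_TABLE = [case_zero, case_one]

def dispatch(selector):
    target = JUMP_TABLE[selector]   # load the target "address" into a register
    return target()                 # jump through the register
```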
Conditional Branch Options
  Condition code (CC) register. E.g.: x86, ARM, PPC, SPARC, …
    ALU ops set condition code flags in the CCR.
    Branch just checks the flag.
  Condition register. E.g.: Alpha, MIPS.
    Comparison instruction puts its result in a GPR.
    Branch instruction checks the register.
  Compare & branch. E.g.: PA-RISC, VAX.
    Compare and branch in one instruction.
Special Control-Flow Instrs.
In DSPs: repeat instruction
  Repeats the subsequent code block n times.
  Avoids some loop overhead.
Procedure Calling Conventions
Two major calling conventions:
  Caller saves: before the call, the calling procedure saves registers that it will need later.
  Callee saves: inside the call, the called procedure saves registers that it will overwrite.
    Can be more efficient if there are many small procedures.
Many architectures use a combination of schemes:
  E.g., MIPS: some registers are caller-saved, some callee-saved.
Control Flow Instr. Distrib.
Branch Distances
Comparison Types
Data Access Sizes
Outline of Today’s Lecture
  Additions for signal & media processing:
    Addressing modes
    Operands
    Instruction types
  Instruction set encodings
  Role of compilers
  Examples: MIPS, Trimedia
  Fallacies & Pitfalls
Last Lecture / This Lecture
Chapter 2 contents
  Introduction
  Classifying ISAs
  Addressing modes
    …for signal processing
  Type & size of operands
    Operands for media & signal processing
  Operations in the IS
    Ops for media & SP
  Control flow instrs.
  Encoding an IS
  Role of compilers
  The MIPS architecture
  Trimedia TM32 CPU
  Fallacies & pitfalls
  Closing material
Also, see Appendices C-F (online at mkp.com)
DSP & Multimedia Instruction-Set Extensions
Special DSP/media Addr. Modes
  Modulo or circular addressing:
    For circular buffers that handle infinite, continuous streams of data.
    Automatically increment the pointer; reset it to the start of the buffer when it reaches the end.
  Bit-reverse addressing:
    Facilitates the Fast Fourier Transform (FFT) operation.
    The n low-order bits of an address are reversed before making the access.
  Special modes are rarely used even in DSP code:
    Mainly just in hand-coded assembly library routines.
  Strided, gather/scatter addressing:
    Used in SIMD vector machines.
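What the hardware does for the first two modes can be written out in software. This is a minimal sketch; the function names `circular_next` and `bit_reverse` and their parameters are hypothetical, and real DSPs differ in how the buffer bounds are specified.

```python
def circular_next(ptr, start, length):
    """Post-increment `ptr`, wrapping back to `start` at the end of the buffer."""
    return start + (ptr - start + 1) % length

def bit_reverse(addr, n):
    """Reverse the n low-order bits of `addr`, leaving the higher bits unchanged."""
    mask = (1 << n) - 1
    low = addr & mask
    reversed_low = 0
    for _ in range(n):
        reversed_low = (reversed_low << 1) | (low & 1)   # move the lowest bit across
        low >>= 1
    return (addr & ~mask) | reversed_low

# Access order for an 8-point FFT: indices 0..7 with their 3 low bits reversed.
fft_order = [bit_reverse(i, 3) for i in range(8)]
```

For an 8-point FFT the resulting access order is 0, 4, 2, 6, 1, 5, 3, 7, which is the input permutation a radix-2 FFT needs.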
Special DSP & Media Operands
  Media processing (e.g., 2-D & 3-D graphics):
    Vertex (x, y, z, w coordinates, each a 32-bit float); w is a visibility or color value.
    Pixel (R, G, B, A channels, each an 8-bit integer): Red, Green, Blue; A is transparency.
  Signal processing:
    Fixed point (fractions between −1 and +1)
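Fixed-point fractions can be modeled as scaled integers. The sketch below assumes the common 16-bit "Q15" layout (1 sign bit, 15 fraction bits), which the slide does not name; the helper names are hypothetical.

```python
Q15_ONE = 1 << 15   # scale factor: stored integer = value * 2**15 (assumed Q15 format)

def to_q15(x):
    """Convert a float to a 16-bit Q15 integer, saturating at the ends of the range."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(x * Q15_ONE)))

def from_q15(q):
    """Convert a Q15 integer back to a float in [-1, 1)."""
    return q / Q15_ONE

def q15_mul(a, b):
    """Multiply two Q15 values.

    The raw product has 30 fraction bits, so it is shifted right by 15 to
    return to Q15. The corner case -1 * -1 overflows and is not handled here.
    """
    return (a * b) >> 15
```

Note that +1 itself is not representable: `to_q15(1.0)` saturates to 32767, the largest value just below +1.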
Special DSP & Media Operations
  Partitioned add, etc.:
    Use the same hardware for multiple small ops as for a single large op.
    E.g., use the hardware that makes up one 64-bit ALU to do four 16-bit adds simultaneously.
    Or, two single-precision FP ops with one instruction.
    Examples: Intel MMX, PowerPC AltiVec.
  SIMD (single-instruction, multiple-data) / vector ops:
    Same idea, more general; used on supercomputers.
  Saturating add, etc.:
    Max out at MAXINT instead of throwing an overflow exception.
  Multiply-accumulate (MAC):
    Used in dot products for vector & matrix multiplications.
  Others: max, min, pack, unpack, merge, permute, shuffle, abs.
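Three of these operations are easy to model in software. This is a sketch under stated assumptions: lanes are 16 bits wide, packed values are plain Python integers treated as 64-bit words, and the function names are hypothetical rather than taken from MMX or AltiVec.

```python
def saturating_add16(a, b):
    """Signed 16-bit add that clamps at the extremes instead of overflowing."""
    return max(-32768, min(32767, a + b))

def partitioned_add16(x, y):
    """Add four independent 16-bit lanes packed into 64-bit words.

    Each lane wraps around on its own; a carry out of one lane never
    propagates into the next, which is the only change needed to the adder.
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((x >> shift) & 0xFFFF) + ((y >> shift) & 0xFFFF)
        result |= (lane_sum & 0xFFFF) << shift   # discard the lane's carry-out
    return result

def dot_product(xs, ys):
    """Dot product written as a chain of multiply-accumulate (MAC) steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y        # one MAC: acc = acc + x * y
    return acc
```

For example, adding `0x0001_FFFF_0003_0004` and `0x0001_0001_0001_0001` lane by lane gives `0x0002_0000_0004_0005`: the lane holding `0xFFFF` wraps to 0 without disturbing its neighbor.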
2.10. Instruction Set Encodings
Competing forces in IS encoding design:
  Want as many registers & modes as possible.
  But large register & mode fields mean larger programs.
  Want simplicity of the pipelined execution path.
Some solutions:
  Variable-length encoding (VAX, x86)
  Fixed-length encoding (most RISC)
  Hybrid (e.g., MIPS16, Thumb)
  Dynamic decompression (IBM CodePack)
Instruction Set Encodings
Compiler Technology and ISA Design
2.11. Compiler Passes
Compiler Optimizations
Compiler Optimizations cont.
Effect of Optimization
Compilers Need Architectures that…
  Provide regularity. Orthogonality (independence) of:
    Registers used
    Addressing modes
    Operations used
  Provide primitives, not solutions:
    Don’t directly support specific kernels or languages.
  Simplify trade-offs among alternatives:
    Make it easy to tell the fastest code sequence at compile time.
  Don’t interpret values known at compile time:
    Allow compile-time constants to be provided in immediates.
ISA Example: MIPS
Design Principles used in MIPS
  2.2. Use GPRs, load-store architecture.
  2.3. Best addressing modes: displacement (12-16 bits), immediate (8-16 bits), register indirect.
  2.5. Data sizes/types: 8-64 bit integers, 64-bit IEEE 754 standard doubles.
  2.7. Support load, store, add, subtract, move, shift.
  2.9. Compares: =, ≠, <; branch (PC-relative, at least 8 bits), jump, call, return.
  2.10. Fixed encoding for performance, variable for code size.
  2.11. GPRs, orthogonality, simplicity.
MIPS64 Registers
  32-bit instructions.
  32 64-bit GPRs, R0-R31.
    Really only 31: R0 is just a constant 0.
  32 64-bit FPRs, F0-F31.
    Can hold 32-bit floats also (with the other half unused).
    “SIMD” extensions operate on 2 floats in 1 FPR.
  A few special registers:
    Floating-point status register.
  Load/store 8-, 16-, 32-, 64-bit integers.
    Narrower values are sign- or zero-extended to fill the 64-bit GPR, depending on the load instruction.
  Also 32-bit floats and 64-bit doubles.
MIPS Addressing Modes
  Register (arith./logical ops only)
  Immediate (arith./logical only) & displacement (loads/stores only):
    16-bit immediate / offset field.
    Register indirect: use 0 as the displacement offset.
    Direct (absolute): use R0 as the displacement base.
  Byte-addressed memory, 64-bit address.
  Software-settable big-endian/little-endian flag.
  Alignment required.
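The way one displacement mode yields the other two can be shown with an effective-address calculation. This is a sketch, assuming the 16-bit offset is sign-extended and addresses wrap at 64 bits; the register file is modeled as a list with entry 0 fixed at zero, and the function names are hypothetical.

```python
ADDRESS_MASK = (1 << 64) - 1   # 64-bit addresses

def sign_extend16(imm):
    """Interpret the low 16 bits of `imm` as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(regs, base, offset16):
    """Displacement addressing: Regs[base] + sign-extended 16-bit offset."""
    return (regs[base] + sign_extend16(offset16)) & ADDRESS_MASK

regs = [0] * 32        # regs[0] plays the role of R0, the constant 0
regs[4] = 0x1000

register_indirect = effective_address(regs, 4, 0)       # offset 0: just Regs[R4]
direct = effective_address(regs, 0, 0x200)              # base R0: just the offset
negative_offset = effective_address(regs, 4, 0xFFFC)    # offset field encodes -4
```

One consequence of this scheme is that direct addressing can only reach addresses expressible in the sign-extended 16-bit field.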
MIPS Instruction Layouts
I-type instructions
R-type instructions
J-type instructions
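The three layouts can be exercised with a small encoder. The sketch below assumes the standard MIPS field widths (6-bit opcode; 5-bit rs, rt, rd, and shamt; 6-bit funct; 16-bit immediate; 26-bit jump target) and the usual opcode and funct values for ADD, ADDI, and J; the function names are hypothetical.

```python
def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(opcode, target):
    """J-type: opcode(6) | target(26)."""
    return (opcode << 26) | (target & 0x3FFFFFF)

def opcode_of(word):
    """Every format keeps the opcode in the top 6 bits, so decoding starts here."""
    return (word >> 26) & 0x3F

add_word = encode_r(0, rs=9, rt=10, rd=8, shamt=0, funct=0x20)   # ADD R8, R9, R10
addi_word = encode_i(0x08, rs=9, rt=8, imm=-1)                   # ADDI R8, R9, -1
jump_word = encode_j(0x02, target=0x100)                         # J 0x100
```

Because all three formats are 32 bits wide with the opcode in the same position, the decoder can read register fields before it knows which format it has, which is the pipelining benefit of fixed-length encoding.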
MIPS Operations Go through figures 2.28-2.31 in the textbook. Also, see Appendix C online at www.mkp.com/CA3 (if it’s up) for more details on MIPS & other RISC-style architectures. Patterson & Hennessy, Computer Organization & Design: The Hardware/Software Interface (the textbook for CDA3101), appendix A-10, has a description of the MIPS ISA.
MIPS dynamic instr. frequencies Integer benchmarks Floating-point benchmarks Also see figs. 2.32 (int) & 2.33 (fp) in textbook
MIPS versus VAX (2.16)
Other ISA Examples
2.13 Trimedia TM32 CPU
A “media processor”. Typical media apps include:
  Data communication (Viterbi decoding)
  Audio coding (AC3, MP3 encode/decode)
  Video coding (MPEG2 encode/decode, DVD decode)
  Video processing (various transforms)
  Graphics (3-D rendering)
Some special features:
  5x VLIW encoding (compare 3x in IA-64)
  Instructions compressed in memory, as with CodePack
Intel architecture lineage
  1971, 4-bit: 4004 (1st general-purpose commercial µP)
  1970s, 8-bit: 8008, 8080 (accumulator machines)
  1978-80, 16-bit: 8086, + 8087 (stack-based FPU)
    Extended accumulator: additional dedicated registers
  1982, 24-bit addressing: 80286 (bigger address space, etc.)
  1985, 32-bit (quasi-GPR): 2×10^8 in service in 1995
    80386, 80486, Pentium, PII, PIII, P4
  Any time now, 64-bit: IA-64 (Merced/Itanium)
Intel 80x86 ISA See Williams, Computer Systems Architecture: A Networking Approach, chapter 7 We’ll put this on reserve (& online)
Intel IA-64 See various technical documents available from intel.com, other websites
Some Motorola arch. lineage
  8-bit: 6800, 650x, 6809, 68HC11
    Early Radio Shack home computers, many embedded systems
  16-bit: 68HC12, 68HC16
  32-bit: 68000, 68010, 68020, 68030, 68040, 68060
    Apple Lisa, Macintosh, original Commodore Amiga
  Motorola adopted 32-bit PowerPC for its later processors: 821, 823, 850, 860
    PDAs, embedded systems
Thursday’s Lecture We’ll survey other current architectures: Other RISC archs.: Alpha, Sparc, PPC, PA, ARM… Transmeta Crusoe?
Other archs. We’ll survey other current architectures: Other RISC archs.: Alpha, Sparc, PPC, PA, ARM… Transmeta Crusoe?
Other RISC architectures
  MIPS (MIPS Technologies, Inc.): SGIs, CPU cores & embedded systems (routers, PlayStation 2, web appliances, Aibo, TV systems).
  PowerPC (IBM, Motorola, and Apple): embedded systems, Macintoshes, new Amiga.
  SPARC (Sun Microsystems): workstations, servers, PCI cards, storage systems…
  PA-RISC (Hewlett-Packard): workstations, servers.
  Alpha (DEC → Compaq → HP?): 64-bit Unix workstations, VMS servers.
Some details from some archs.
Go through sections of various books…
  H&P: App. C, RISC archs.; App. D, 80x86 archs.
  Stallings: PDP-8 & PDP-10 ISAs; VAX ISA; PII & PPC opcodes, addressing modes, inst. formats; IA-64 inst. format
  Williams: chapter on Pentium; section on IA-64; chapter on MC68300 microcontroller
Fallacies & Pitfalls
Fallacies:
  There is a “typical” program.
  A flawed architecture cannot succeed.
  You can design a flawless architecture.
Pitfalls:
  “High-level” ISA features
  Reducing code size without considering the compiler
  Expecting good performance from DSP compilers