1 Appendix A: Instruction Set Principles and Examples
Classifying Instruction Set Architecture
Memory addressing mode
Operations in the instruction set
Control flow instructions
Instruction format
Structure of recent compilers
MMX technology
MIPS instruction set

2 Introduction
An instruction set architecture is a specification of a standardized programmer-visible interface to hardware, comprising:
–A set of instructions (really, instruction types)
  With associated argument fields, assembly syntax, and machine encoding.
–A set of named storage locations
  Registers, memory, …
  Programmer-accessible caches?
–A set of addressing modes (ways to name locations)
–Often an I/O interface (usually memory-mapped)

3 Classifying Architectures
One important classification scheme is by the type of internal operand storage:
–Stack architecture: Operands are implicitly on top of a stack. (Early machines.)
–Accumulator architecture: One operand is implicitly the accumulator (a special register). (Early machines.)
–General-purpose register architecture: Operands may be in any of a large number (typically 10s-100s) of registers.
  Register-memory architectures: One operand may be in memory.
  Load-store architectures: All operands are registers, except in special load and store instructions.

4 Four Architecture Classes: assembly for C := A + B
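As a rough illustration of the four classes, the Python sketch below models each one computing C = A + B. The instruction sequences in the comments follow the usual textbook pattern; the mnemonics, register names, and the toy functions themselves are assumptions made for illustration, not a real ISA.

```python
# Toy models of the four architecture classes computing C = A + B.
# Each function returns the number of instructions it "executed".

def run_stack(mem):
    stack = []
    stack.append(mem["A"])                    # Push A
    stack.append(mem["B"])                    # Push B
    stack.append(stack.pop() + stack.pop())   # Add (operands implicit on the stack)
    mem["C"] = stack.pop()                    # Pop C
    return 4

def run_accumulator(mem):
    acc = mem["A"]                            # Load A (into the accumulator)
    acc += mem["B"]                           # Add B  (accumulator is implicit)
    mem["C"] = acc                            # Store C
    return 3

def run_register_memory(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load  R1, A
    regs["R3"] = regs["R1"] + mem["B"]        # Add   R3, R1, B (one memory operand)
    mem["C"] = regs["R3"]                     # Store R3, C
    return 3

def run_load_store(mem):
    regs = {}
    regs["R1"] = mem["A"]                     # Load  R1, A
    regs["R2"] = mem["B"]                     # Load  R2, B
    regs["R3"] = regs["R1"] + regs["R2"]      # Add   R3, R1, R2 (registers only)
    mem["C"] = regs["R3"]                     # Store R3, C
    return 4

if __name__ == "__main__":
    for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
        memory = {"A": 2, "B": 3}
        count = run(memory)
        print(run.__name__, "C =", memory["C"], "instructions =", count)
```

All four reach the same result; what differs is how many operands each instruction names explicitly and how many of them may be in memory.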

5 Number of Operands
A further classification is by the maximum number of operands, and the number that can be in memory, e.g.:
–2-operand (e.g. a += b)
  src/dest(reg), src(reg)
  src/dest(reg), src(mem): IBM 360, x86, 68k
  src/dest(mem), src(mem): VAX
–3-operand (e.g. a = b+c)
  dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, etc.
  dest(reg), src1(reg), src2(mem): IBM 370
  dest(mem), src1(mem), src2(mem): IBM 370, VAX

6 Further Classification
# of Memory Operands  # of Operands  Type of Architecture  Examples
0                     3              Register-register     Alpha, ARM, MIPS, PowerPC, SPARC, etc.
1                     2              Register-memory       IBM 360/370, Intel 80x86, Motorola 68000, TI C54x
2                     2              Memory-memory         VAX
3                     3              Memory-memory         VAX

7 Comparison of Architecture Types
Type               Instruction Encoding  Code Generation  # of Clock Cycles/Inst.  Code Size
Register-register  Fixed-length          Simple           Similar                  Large
Register-memory    Easy                  Moderate         Different                Medium
Memory-memory      Variable-length       Complex          Large variation          Compact

8 Endians & Alignment
Alignment of a word within byte addresses 0-7 (increasing byte address):
–Word-aligned word at byte address 4.
–Halfword-aligned word at byte address 2.
–Byte-aligned (non-aligned) word at byte address 1.
Byte order within a word:
–Little-endian byte order (least-significant byte “first”): the LSB is at the lowest byte address.
–Big-endian byte order (most-significant byte “first”): the MSB is at the lowest byte address.
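A small Python sketch of both ideas, using the standard `struct` module to lay out one 32-bit word in each byte order and a helper to test alignment. The sample value `0x0A0B0C0D` and the helper name `is_aligned` are assumptions chosen for illustration.

```python
import struct

WORD = 0x0A0B0C0D  # sample 32-bit value; 0x0A is the MSB, 0x0D is the LSB

# "<I" = little-endian unsigned 32-bit, ">I" = big-endian unsigned 32-bit
little = struct.pack("<I", WORD)
big = struct.pack(">I", WORD)

def is_aligned(address, size):
    """An object of `size` bytes is aligned if its address is a multiple of size."""
    return address % size == 0

if __name__ == "__main__":
    print("little-endian bytes, lowest address first:", little.hex())
    print("big-endian bytes, lowest address first:   ", big.hex())
    for address in (4, 2, 1):
        print(address, "word-aligned:", is_aligned(address, 4),
              "halfword-aligned:", is_aligned(address, 2))
```

Address 4 is word-aligned, address 2 is only halfword-aligned, and address 1 is neither, matching the three cases on the slide.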

9 Addressing Modes
In the example assembly syntax (middle column), ( ) indicates a memory access. (A typical syntax.)
In the RTL syntax (right column), [ ] denotes accessing an element of an array, Register or Memory.
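To make the ( ) and [ ] notation concrete, here is a minimal Python sketch in which registers and memory are plain dictionaries and each function returns the operand that a mode would fetch. The register contents, memory contents, and function names are made up for illustration; the assembly forms in the comments use the typical syntax described above.

```python
# Registers and memory modelled as arrays (dicts), as in the RTL notation.
regs = {"R1": 1000, "R2": 8}
mem = {1000: 7, 1001: 13, 1008: 9, 1100: 11}

def register(reg):                 # Add R4, R1       -> Regs[R1]
    return regs[reg]

def immediate(value):              # Add R4, 3        -> 3
    return value

def displacement(disp, base):      # Add R4, 100(R1)  -> Mem[100 + Regs[R1]]
    return mem[disp + regs[base]]

def register_indirect(base):       # Add R4, (R1)     -> Mem[Regs[R1]]
    return mem[regs[base]]

def indexed(base, index):          # Add R4, (R1+R2)  -> Mem[Regs[R1] + Regs[R2]]
    return mem[regs[base] + regs[index]]

def direct(address):               # Add R4, (1001)   -> Mem[1001]
    return mem[address]

if __name__ == "__main__":
    print(register("R1"), immediate(3), displacement(100, "R1"),
          register_indirect("R1"), indexed("R1", "R2"), direct(1001))
```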

10 Addressing Mode Usage (3 SPEC89 programs on VAX)

11 Displacement Distribution (SPEC CPU2000 on Alpha; sign bit is not counted)

12 Use of Immediate Operand

13 Distribution of Immediate (SPEC CPU2000 on Alpha; sign bit is not counted)

14 Instruction Type

15 Instruction Distribution (5 SPECint92 programs)

16 Control Flow Instructions
Four basic types:
–(Conditional) branches
–(Unconditional) jumps
–Procedure calls
–Procedure returns
Control flow addressing modes:
–Often PC-relative (PC + displacement). Relocatable.
–Also useful: register indirect jumps (the register holds the target address). Uses:
  Procedure returns
  Case / switch statements
  Virtual functions / methods (abstract class method calls)
  Higher-order functions / function pointers
  Dynamically shared libraries
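The two control-flow addressing modes can be sketched in a few lines of Python. The function names, the register name `R1`, and the addresses are hypothetical; the point is only that a PC-relative target moves with the code, while a register indirect jump can reach a target chosen at run time (here, a switch-style jump table).

```python
def pc_relative_target(pc, displacement):
    """Target of a PC-relative branch: PC + displacement."""
    return pc + displacement

def register_indirect_target(regs, reg):
    """Target of a register indirect jump: whatever address the register holds."""
    return regs[reg]

def switch_dispatch(jump_table, case_index):
    """Case/switch via a jump table: load the handler address, then jump through it."""
    regs = {"R1": jump_table[case_index]}     # load table entry into a register
    return register_indirect_target(regs, "R1")

if __name__ == "__main__":
    # Relocatable: loading the same code at a different base address keeps the
    # displacement valid, because branch and target move together.
    for base in (0x0000, 0x4000):
        branch_pc = base + 0x100
        print(hex(base), "->", hex(pc_relative_target(branch_pc, 0x20)))
    print(hex(switch_dispatch([0x2000, 0x2040, 0x2080], 1)))
```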

17 Conditional Branch Options
Condition Code (CC) Register
–E.g.: x86, ARM, PPC, SPARC, …
–ALU ops set condition code flags in the CCR
–Branch just checks the flag
Condition register
–E.g.: Alpha, MIPS
–Comparison instruction puts its result in a GPR
–Branch instruction checks the register
Compare & Branch
–E.g.: PA-RISC, VAX
–Compare & branch in 1 instruction

18 Procedure Calling Conventions
Two major calling conventions:
–Caller saves: Before the call, the caller saves the registers that it will need later, even if the callee does not use them.
–Callee saves: Inside the call, the called procedure saves the registers that it will overwrite.
  Can be more efficient if there are many small procedures.
Many architectures use a combination of schemes:
–E.g., MIPS: Some registers are caller-saved, some callee-saved.

19 Three Classes of Control Instructions (SPEC CPU2000 on Alpha)

20 Branch Distance Distribution (SPEC CPU2000 on Alpha)

21 Branch Comparison Types (SPEC CPU2000 on Alpha)

22 Encoding An Instruction Set

23 Compiler Structure

24 Compiler Optimizations

25 Compiler Optimizations (cont.)

26 Effect of Optimization

27 Architectural Support for Compiler
Provide regularity
–Orthogonality (independence) of:
  Registers used
  Addressing modes
  Operations used
Provide primitives, not solutions
–Don’t directly support specific kernels or languages
Simplify trade-offs among alternatives
–Make it easy to tell which code sequence is fastest at compile time
Don’t interpret values known at compile time
–Allow compile-time constants to be provided in immediates

28 MIPS Architecture
RISC, load-store architecture, simple addressing
32-bit instructions, fixed format
32 64-bit GPRs, R0-R31
–Really only 31: R0 is just a constant 0.
32 64-bit FPRs, F0-F31
–Can also hold 32-bit floats (with the other half unused).
–“SIMD” extensions operate on more floats in 1 FPR.
A few special registers
–Floating-point status register
Load/store 8-, 16-, 32-, 64-bit integers
–Narrower values are extended to fill the 64-bit GPR (sign-extended, or zero-extended by the unsigned load variants).
–Also 32-bit floats and 64-bit doubles

29 MIPS Addressing Modes
Register (arith./logical ops only)
Immediate (arith./logical only) & Displacement (load/stores only)
–16-bit immediate / offset field
–Register indirect: use 0 as the displacement offset
–Direct (absolute): use R0 as the displacement base
Byte-addressed memory, 64-bit address
Software-settable big-endian/little-endian flag
Alignment required
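The way displacement addressing covers the other two modes can be sketched as follows. This is a minimal Python model, assuming a 32-entry register list with R0 fixed at 0 and a 16-bit offset that is sign-extended before being added to the base; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1  # addresses are 64 bits wide

def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(regs, base, imm16):
    """Displacement mode: Regs[base] + sign-extended 16-bit offset."""
    return (regs[base] + sign_extend16(imm16)) & MASK64

def is_aligned(address, size):
    """Alignment is required: the address must be a multiple of the access size."""
    return address % size == 0

if __name__ == "__main__":
    regs = [0] * 32          # regs[0] models R0, which always reads as 0
    regs[1] = 0x1000
    print(hex(effective_address(regs, 1, 0x0010)))  # displacement: 16(R1)
    print(hex(effective_address(regs, 1, 0xFFFC)))  # negative offset: -4(R1)
    print(hex(effective_address(regs, 1, 0)))       # register indirect: 0(R1)
    print(hex(effective_address(regs, 0, 0x0200)))  # direct: 0x200(R0)
```

Register indirect and direct addressing fall out as special cases of the single displacement mode, which is why no separate encodings are needed for them.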

30 Inst. Format: I-type Instructions

31 Inst. Format: R-type Instructions

32 Inst. Format: J-type Instructions
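As a hedged sketch of the three formats, the Python below packs fields into a 32-bit word using the standard MIPS field layout: a 6-bit opcode in every format, followed by rs/rt/16-bit immediate (I-type), rs/rt/rd/shamt/funct (R-type), or a 26-bit target (J-type). The function names are hypothetical, and the sample opcodes are the classic 32-bit MIPS values (ADD funct 0x20, LW opcode 0x23, J opcode 0x02), used here only as examples.

```python
def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)

def encode_j(opcode, target):
    """J-type: opcode(6) | target(26)."""
    return (opcode << 26) | (target & 0x3FFFFFF)

if __name__ == "__main__":
    # lw r8, 32(r19): load with base register r19 and offset 32
    print(f"{encode_i(0x23, 19, 8, 32):08x}")
    # add r8, r17, r18: rd = r8, rs = r17, rt = r18
    print(f"{encode_r(0, 17, 18, 8, 0, 0x20):08x}")
    # j to word address 0x100
    print(f"{encode_j(0x02, 0x100):08x}")
```

Because every format is exactly 32 bits and the opcode always sits in the top 6 bits, the decoder can find the opcode and the register fields before it knows which format it is looking at.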

33 MIPS Instruction Set
Go through Figures A.23-A.25 in the textbook:
–Loads and stores in MIPS, Figure A.23
–Arithmetic and logical instructions, Figure A.24
–Control flow instructions, Figure A.25
More in Appendix A: Figures A.26-A.30.

34 MIPS Dynamic Instr. Frequencies (integer benchmarks and FP benchmarks)

35 Multimedia Extensions
Graphics displays work on pixels: 8, 16, or 32 bits per pixel define pixel colors
Audio samples are 16 or 24 bits
Exploit subword parallelism using existing 64/128-bit registers and ALUs
Intel i860 was the first (1989) to operate on 8 8-bit, 4 16-bit, or 2 32-bit operands in a 64-bit ALU
Almost all microprocessors have media extensions
Intel uses the term SIMD to describe the MMX extensions; the parallelism is limited only by the width of the registers, e.g. 64 bits

36 Intel MMX Technology
MMX registers: eight 64-bit registers, MM0 to MM7, shared with the FP registers R0 to R7; using them has side effects on FPU state; used only for operands
Four MMX data types in a 64-bit MMX register (bits 0-63):
–Packed Byte: 8 x 8 bits
–Packed Word: 4 x 16 bits
–Packed Doubleword: 2 x 32 bits
–Quadword: 1 x 64 bits
64-bit / 32-bit access modes from memory to MMX registers
SIMD techniques for arithmetic/logical operations on bytes, words, and doublewords from/to 64-bit registers

37 MMX Instruction Set
The MMX instruction set consists of 57 instructions, grouped into 7 categories:
–Arithmetic instructions
–Data transfer instructions
–Comparison instructions
–Conversion instructions
–Logical instructions
–Shift instructions
–Empty MMX state instruction (EMMS)
(See Intel Architecture Software Developer’s Manual Vol. 1 Basic Architecture (order#: 143190); Vol. 2 Instruction Set Ref. (order#: 243191); Vol. 3 System Programming Guide (order#: 243192) at: http://developer.intel.com/design/archives/processors/mmx/index.htm)

38 SIMD: Parallel Operations
Conventional scalar operations vs. SIMD (PADDW):
–Scalar: four separate adds, A1+B1, A2+B2, A3+B3, A4+B4
–SIMD: one PADDW on packed operands, (A4 A3 A2 A1) + (B4 B3 B2 B1) gives (A4+B4, A3+B3, A2+B2, A1+B1)
Up to 4 times faster, but data must be moved in and out of the MMX registers
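A minimal Python model of PADDW, treating a 64-bit integer as four independent 16-bit lanes with wraparound addition. The helper names `pack_words` and `unpack_words` are hypothetical; the sketch shows only the lane-wise arithmetic, not the actual instruction timing.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1

def pack_words(words):
    """Pack four 16-bit values into one 64-bit integer, lane 0 in the low bits."""
    packed = 0
    for lane, word in enumerate(words):
        packed |= (word & LANE_MASK) << (LANE_BITS * lane)
    return packed

def unpack_words(packed):
    """Split a 64-bit integer back into four 16-bit lanes."""
    return [(packed >> (LANE_BITS * lane)) & LANE_MASK for lane in range(LANES)]

def paddw(a, b):
    """Lane-wise 16-bit add with wraparound; no carry crosses a lane boundary."""
    sums = [(x + y) & LANE_MASK for x, y in zip(unpack_words(a), unpack_words(b))]
    return pack_words(sums)

if __name__ == "__main__":
    a = pack_words([0xFFFF, 2, 3, 4])
    b = pack_words([1, 20, 30, 40])
    print(unpack_words(paddw(a, b)))
```

Lane 0 overflows and wraps to 0 without disturbing lane 1, which is the difference between a packed add and an ordinary 64-bit add of the same two registers.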

39 Packed Multiply Add
4 multiplications and 2 adds in one PMADDWD instruction:
–Source 1: A3 A2 A1 A0
–Source 2: B3 B2 B1 B0
–Intermediate: A3xB3, A2xB2, A1xB1, A0xB0
–Destination (result DW): A3xB3 + A2xB2, A1xB1 + A0xB0
PMADDWD produces 2 DW (32-bit) results
–Useful instruction for many media and signal applications
–Inputs and results must be arranged and packed to/from MMX registers, which adds programming complexity and performance overhead
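The data flow of PMADDWD can be sketched in Python as four signed 16-bit multiplies followed by two pairwise adds. The function names and the list-based operand representation are assumptions for illustration; a real implementation works on packed 64-bit registers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def pmaddwd(a_words, b_words):
    """a_words, b_words: [x0, x1, x2, x3] as 16-bit values.

    Returns [A1*B1 + A0*B0, A3*B3 + A2*B2] as signed 32-bit doublewords.
    """
    products = [to_signed(a, 16) * to_signed(b, 16)
                for a, b in zip(a_words, b_words)]
    low = to_signed(products[0] + products[1], 32)
    high = to_signed(products[2] + products[3], 32)
    return [low, high]

def dot_product4(a_words, b_words):
    """A 4-element dot product needs one PMADDWD plus one final add."""
    low, high = pmaddwd(a_words, b_words)
    return low + high

if __name__ == "__main__":
    print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))
    print(dot_product4([1, 2, 3, 4], [5, 6, 7, 8]))
```

The dot-product helper hints at why the instruction suits media and signal code: filters and transforms are largely sums of products over short integer vectors.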

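40 Data Move Instructions
Move data between MMX registers and memory or a regular register for SIMD instructions:
–MOVD m32, mm: stores the low 32 bits (bits 0-31) of the MMX register to the 32-bit memory location m32
–MOVD mm, r32: copies the 32-bit register into bits 0-31 of the MMX register and fills bits 32-63 with zeros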
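A minimal Python sketch of the two MOVD directions, modelling registers as plain integers. The function names are hypothetical; the behavior modelled is that a move into an MMX register zero-extends the 32-bit source, and a move out of it keeps only the low 32 bits.

```python
MASK32 = 0xFFFFFFFF

def movd_to_mm(source32):
    """MOVD mm, r32: low 32 bits come from the source, upper 32 bits are zero."""
    return source32 & MASK32

def movd_from_mm(mm):
    """MOVD m32, mm: only the low 32 bits of the MMX register are written out."""
    return mm & MASK32

if __name__ == "__main__":
    print(f"{movd_to_mm(0xDEADBEEF):016x}")
    print(f"{movd_from_mm(0x1122334455667788):08x}")
```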
