Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 204521 Digital System Architecture Instruction Set Architecture, the DLX and the 80x86 FALL 2000 Pradondet Nilagupta (orginal note from Dr. Robert F.

Similar presentations


Presentation on theme: "1 204521 Digital System Architecture Instruction Set Architecture, the DLX and the 80x86 FALL 2000 Pradondet Nilagupta (orginal note from Dr. Robert F."— Presentation transcript:

1 1 204521 Digital System Architecture Instruction Set Architecture, the DLX and the 80x86 FALL 2000 Pradondet Nilagupta (orginal note from Dr. Robert F. Hodson) (based on notes by Randy Katz)

2 2 204521 Digital System Architecture Review from last time Design Space of ISA Five Primary Dimensions Number of explicit operands( 0, 1, 2, 3 ) Operand StorageWhere besides memory? Effective AddressHow is memory location specified? Type & Size of Operandsbyte, int, float, vector,... How is it specified? Operationsadd, sub, mul,... How is it specifed? Other Aspects SuccessorHow is it specified? ConditionsHow are they determined? EncodingsFixed or variable? Wide? Parallelism

3 3 204521 Digital System Architecture ISA Metrics Aesthetics: Orthogonality –No special registers, few special cases, all operand modes available with any data type or instruction type Completeness –Support for a wide range of operations and target applications Regularity –No overloading for the meanings of instruction fields Streamlined –Resource needs easily determined Ease of compilation (programming?) Ease of implementation Scalability

4 4 204521 Digital System Architecture Basic ISA Classes Accumulator: 1 addressadd Aacc ฌ  acc + mem[A] 1+x addressaddx Aacc ฌ  acc + mem[A + x] Stack: 0 addressaddtos ฌ  tos + next General Purpose Register: 2 addressadd A BEA(A) ฌ  EA(A) + EA(B) 3 addressadd A B CEA(A) ฌ  EA(B) + EA(C) Load/Store: 3 addressadd Ra Rb RcRa ฌ  Rb + Rc load Ra RbRa ฌ  mem[Rb] store Ra Rbmem[Rb] ฌ  Ra

5 5 204521 Digital System Architecture Stack Machines Instruction set: +, -, *, /,... push A, pop A Example: a*b - (a+c*b) push a push b * push a push c push b * + - AB A A*B - + a ab * b * c A A C A A CB B*C

6 6 204521 Digital System Architecture The Case Against Stacks Performance is derived from the existence of several fast registers, not from the way they are organized Data does not always “surface” when needed –Constants, repeated operands, common subexpressions so TOP and Swap instructions are required Code density is about equal to that of GPR instruction sets –Registers have short addresses –Keep things in registers and reuse them Slightly simpler to write a poor compiler, but not an optimizing compiler

7 7 204521 Digital System Architecture Variable format, 2 and 3 address instruction 32-bit word size, 16 GPR (four reserved) Rich set of addressing modes (apply to any operand) Rich set of operations – bit field, stack, call, case, loop, string, system Rich set of data types (B, W, L, Q, O, F, D, G, H) Condition codes VAX-11

8 8 204521 Digital System Architecture Kinds of Addressing Modes Register directRi Immediate (literal)v Direct (absolute)M[v] Register indirect M[Ri] Base+DisplacementM[Ri + v] Base+IndexM[Ri + Rj] Scaled IndexM[Ri + Rj*d + v] AutoincrementM[Ri++] AutodecrementM[Ri - -] Memory Indirect (deferred)M[ M[Ri] ] [Indirection Chains] Ri Rj v memory reg. file

9 9 204521 Digital System Architecture A "Typical" RISC 32-bit fixed format instruction (3 formats) 32 32-bit GPR (R0 contains zero, Double Precision takes a register pair) 3-address, reg-reg arithmetic instruction Single address mode for load/store: base + displacement –no indirection Simple branch conditions Delayed branch see: SPARC, MIPS, MC88100, AMD2900, i960, i860 PARisc, DEC Alpha, Clipper, CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3

10 10 204521 Digital System Architecture Example: MIPS Op 312601516202125 Rs1Rd immediate Op 3126025 Op 312601516202125 Rs1Rs2 target RdOpx Register-Register 56 1011 Register-Immediate Op 312601516202125 Rs1Rs2/Opx immediate Branch Jump / Call

11 11 204521 Digital System Architecture Example: DLX OpRs1Rd immediate Op Rs1Rs2 Offset Added to PC Rd R-Type I-Type J-Type Function Rd <-- Rs1 Function Rs2 Load, Stores, Conditional Branched Jump, Jump&Link,RTE 11 55 5 6 6 55 16 6 26

12 12 204521 Digital System Architecture DLX Architecture Introduced by Hennessey and Patterson in 1990. DLX illustrates a typical RISC architecture very similar to the MIPS architecture. –32-bit byte addresses (algined) –32-bit fixed length instructions –3 instruction formats –Load/store architecture –Simple branch conditions (no condition codes). DLX registers –32 32-bit GPRs (R0 = 0) –32 32-bit (or 16 64-bit) FPRs –Special purpose registers (e.g., FP Status and PC)

13 13 204521 Digital System Architecture DLX Instruction Set Appendix C.3 Data transfer –Load/store word –Load/store halfword or byte (singed/unsigned loads) –Load/store floating point single/double –Register moves (many varieties) Arithmetic and Logic –Add/subtract (signed or unsigned, reg. or imm.) –Multiply/divide (signed or unsigned, operands in FP reg.) –And, or, xor (reg. or imm.) –Load high word (loads upper half of a reg. with imm.) –Shifts (LL, RL, RA) (reg. or imm.) –Set conditionals (LT, GT, LE, GE, EQ, NE) (reg. or imm.)

14 14 204521 Digital System Architecture DLX Instruction Set Control –Conditional branch on register (compare with zero) –Conditional on FP status bit (bit true or false) –Jump, jump register (26 bit imm. or reg.) –Jump and link, jump and link register (26 bit imm. or reg.) –Trap, return from exception (trap to and return from O.S.) Floating Point –Add, subtract, multiply, divide (single or double) –FP converts (convert between single, double, and integer) –FP compares (single or double, sets bit in FP status)

15 15 204521 Digital System Architecture Examples of DLX Instructions Data Transfer –LW R1, 30(R2) Regs[R1] <= Mem[30 + Regs[R2]] –SD 40(R3), F0Mem[40 + Regs[3]] <= Regs[F0] Mem[41 + Regs[3]] <= Regs[F1] –How would you perform a register move? a no-op? Arithmetic and Logic –LHI R1, #42 Regs[R1] <= 42##0 –SLT R1, R2, R3if (Regs[R2] < Regs[R3]) Regs[1] <= 1 else Regs[1] <= 0 - How would you load a 32 bit immediate into a register? 16

16 204521 Digital System Architecture Examples of DLX Instructions Control –JALR R2Regs[31] <= PC+4, PC <= Regs[R2] –JR R3PC <= Regs[R3] –How would you implement a subroutine call and return? Floating Point –MULF F1, F2, F3Regs[F1] <= Regs[F2] + Regs[F3] –LTD F1, R2If (Regs[R1] < Regs[R2]) then set a bit in the FP status. – Why don’t they have LTD be a 3 operand instruction, compares 2 floating point registers the third to zero or one? –What would be difficult about adding a floating-point multiply and add instruction to DLX?

17 17 204521 Digital System Architecture DLX Instruction Formats Op 312601516202125 rs1rd immediate Op 3126025 Op 312601516202125 rs1rs2 offset added to PC rd Register-Register (R-type) 56 1011 Register-Immediate (I-type) Jump / Call (J-type) func (ALU imm. operations, loads and stores, conditional branch, jump (and link) (jump, jump and link, trap and return from exception) (ALI reg. operations, read/write special registers and moves)

18 18 204521 Digital System Architecture DLX Addressing Modes Displacement –Register Deferred if Displacement is 0 –PC Relative if Jump or Branch –Absolute if R0 is the base (R0 is always 0) Immediate –Constants contained with the instruction Register Direct –For R-Type Instructions Addressing Mode Encoded in the Opcode –LW (Displacement), ADD (Register), ANDI (Immediate)

19 19 204521 Digital System Architecture DLX Data Types Signed/Unsigned Integer –Byte, HalfWord, Word, DoubleWord Floating Point –Single & Double Precision –IEEE Standard 754 (0.f X 2 ) E

20 20 204521 Digital System Architecture DLX Load/Stores Loads –Word, Byte, Unsigned Byte, Halfword, Float, Double Stores –Word, Byte, Double, Halfword, Float Examples –LW R1, 30(R2)R1 <-- MEM[30+R2] –LB R1,40(R3)R1 <-- MEM[40+R3] 0 ## MEM[40+R3] –SW 500(R4), R3MEM[500+R4] <-- R3 24

21 21 204521 Digital System Architecture DLX Arithmetic/Logical Add/Subtract –immediate,unsigned,immediate/unsigned Multiply/Divide –signed, unsigned Logical –And, Or, Xor Shift –left/right, logial/arithmetic

22 22 204521 Digital System Architecture Additional ALU Functions Set Condition Code Instructions –SLT, SGT, SLE, SGE, SEQ, SNE (Signed Test) –SGT R1, R2, R3 R1 R3 DLX does limits the number of instructions that set condition codes –simplifies compiler instruction scheduling –pipelining must insure a transfer instruction access to a previous instructions condition codes –No PSW

23 23 204521 Digital System Architecture DLX Control Branch –EQ/NEQ to Zero, FP comparison Bit T/F Jumps –Offset, Register, Jump&Link Traps –Transfer to OS at vectored address RFE –Return from exception

24 24 204521 Digital System Architecture DLX Floating Point Add/Subtract/Multiply/Divide –Single/Double Convert –F2D, F2I, D2F, D2I, I2F, I2D Compares –Single/Double, LT, GT, LE, GE, EQ, NE

25 25 204521 Digital System Architecture DLX Instruction Usage

26 26 204521 Digital System Architecture DLX Summary Simple load/store architecture –Only accesses memory on loads/stores –All other operations use registers and immediate Designed for pipeline efficiency –Fixed length instruction encoding –Simple instructions Easy to compile to –Simple, frequently used instructions –Orthogonal instruction set –Few addressing modes Reduces execution time by –reducing CPI –reducing clock rate

27 27 204521 Digital System Architecture History of the Intel 80x86 1971: Intel invents microprocessor - 4004 1975: 8080 introduced –8-bit microprocessor –Accumulator machine 1978: 8086 introduced –16 bit microprocessor –Accumulator plus dedicated registers 1980: IBM selects 8088 as basis for IBM PC –8088 is 8-bit external bus version of 8086 1980: 8087 floating point coprocessor –adds 60 floating point instructions –80 bit floating point registers –uses hybrid stack/register scheme

28 28 204521 Digital System Architecture History of the Intel 80x86 1982: 80286 introduced –24-bit address –memory mapping & protection 1985: 80386 introduced –32-bit address –32-bit GP registers 1989: 80486 introduced 1992: Pentium introduced 1995: Pentium Pro introduced 1996: Pentium with MMX extensions –57 new instructions –Primarily for multimedia applications 1997: Pentium II (Pentium Pro with MMX)

29 29 204521 Digital System Architecture Intel 80x86 Integer Registers

30 30 204521 Digital System Architecture Intel 80x86 Floating Point Registers Operations on the top of stack and one register within the stack

31 31 204521 Digital System Architecture Usage of Intel 80x86 Floating Point Registers NASA7Spice Stack (2nd operand ST(1)) 0.3%2.0% Register (2nd operand ST(i), i>1)23.3%8.3% Memory 76.3%89.7% Above are dynamic instruction percentages (i.e., based on counts of executed instructions) Stack unused by Solaris compilers for fastest execution

32 32 204521 Digital System Architecture 80x86 Addressing/Protection 1 MB 16 MB 4 GB

33 33 204521 Digital System Architecture 8086 in black; 80386 extensions in color 80x86 Instruction Format (Base reg + 2 Scale x Index reg)

34 34 204521 Digital System Architecture 80x86 Instructions Data movement (move, push, pop) Arithmetic and logic (logic ops, tests CCs, shifts, integer and decimal arithmetic) Control flow (branches, jumps, calls, returns) String instructions (move and compare) FP data movement (load, load const., store) Arithmetic instructions (add, subtract, multiply, divide, square root, absolute value) Comparisons (can send result to ALU) Transcendental functions (sin, cos, log, etc.)

35 35 204521 Digital System Architecture 80x86 Instruction Encoding: Mod, Reg, R/M Field r w=0 w=1 r/m mod=0 mod=1 mod=2 mod=3 16b32b16b32b16b32b16b32b 0ALAXEAX0addr=BX+SI=EAXsamesamesamesame same 1CLCXECX1addr=BX+DI=ECXaddr addr addr addr as 2DLDXEDX2addr=BP+SI=EDXmod=0mod=0mod=0mod=0 reg 3BLBXEBX3addr=BP+SI=EBX+d8+d8+d16+d32 field 4AHSPESP4addr=SI=(sib)SI+d8(sib)+d8SI+d8(sib)+d32“ 5CHBPEBP5addr=DI=d32DI+d8EBP+d8DI+d16EBP+d32“ 6DHSIESI6addr=d16=ESIBP+d8ESI+d8BP+d16ESI+d32“ 7BHDIEDI7addr=BX=EDIBX+d8EDI+d8BX+d16EDI+d32“ First address specifier: Reg=3 bits, R/M=3 bits, Mod=2 bits w from opcode r/m field depends on mod and machine mode

36 36 204521 Digital System Architecture 80x86 Instruction Encoding Sc/Index/Base field IndexBase 0EAXEAX 1ECXECX 2EDXEDX 3EBXEBX 4no indexESP 5EBPif mod=0, d32 if mod ฐ 0, EBP 6ESIESI 7EDIEDI Base + Scaled Index Mode Used when: mod = 0,1,2 in 32-bit mode AND r/m = 4! 2-bit Scale Field 3-bit Index Field 3-bit Base Field

37 37 204521 Digital System Architecture 80x86 Addressing Mode Usage for 32-bit Mode Addressing ModeGccEspr.NASA7SpiceAvg. Register indirect10%10%6%2%7% Base + 8-bit disp46%43%32%4%31% Base + 32-bit disp2%0%24%10%9% Indexed1%0%1%0%1% Based indexed + 8b disp0%0%4%0%1% Based indexed + 32b disp0%0%0%0%0% Base + Scaled Indexed12%31%9%0%13% Base + Scaled Index + 8b disp2%1%2%0%1% Base + Scaled Index + 32b disp6%2%2%33%11% 32-bit Direct19%12%20%51%26%

38 38 204521 Digital System Architecture 80x86 Length Distribution

39 39 204521 Digital System Architecture Instruction Counts: 80x86 vs. DLX SPEC pgm x86DLX DLX86 gcc 3,771,327,742 3,892,063,4601.03 espresso 2,216,423,413 2,801,294,2861.26 spice15,257,026,309 16,965,928,7881.11 nasa715,603,040,963 6,118,740,3210.39 DLX tends to perform more instructions for integer programs, while the 80x86 performs more instructions for floating point programs 80x86 performs many more data transfers –Two to four times more for floating point programs –About 1.25 times more for integer programs

40 40 204521 Digital System Architecture Intel Compiler vs. Compilers YOU Can Buy 66 MHz Pentium ComparisonSpecInt92SpecFP92 Intel Internal Optimizing Compiler64.659.7 Best 486 Compiler (June 1993)57.639.9 Typical 486 Compiler in 1990, 41.032.5 when Intel started project Integer Intel 1.1X faster, FP 1.5X faster 486 ComparisonSpecInt92SpecFP92 Intel Internal Optimizing Compiler35.517.5 Best 486 Compiler (June 1993)32.216.0 Typical 486 Compiler in 1990, 23.012.8 when Intel started project Integer: Intel 1.1X faster, FP 1.1X faster

41 41 204521 Digital System Architecture Intel Summary Archeology: history of instruction design in a single product –Address size: 16 bit vs. 32-bit –Protection: Segmentation vs. paged –Temp. storage: accumulator vs. stack vs. registers “Golden Handcuffs” of binary compatability affect design 20 years later, as Moore predicted Not too difficult to make faster, as Intel has shown HP/Intel announcement of common future instruction set by 2000 means end of 80x86??? “Beauty is in the eye of the beholder” –At 50M/year sold, it is a beautiful business


Download ppt "1 204521 Digital System Architecture Instruction Set Architecture, the DLX and the 80x86 FALL 2000 Pradondet Nilagupta (orginal note from Dr. Robert F."

Similar presentations


Ads by Google