© Wen-mei Hwu and S. J. Patel, 2005 ECE 511, University of Illinois Lecture 3: Instruction Set Architecture.

The Big Picture
Layers: Requirements, Algorithms, Prog. Lang./OS, ISA, uArch, Circuit, Device
Figure labels: Problem Focus, Performance Focus, SPEC
Source-level example: f2() { f3(s2, &j, &i); *s2->p = 10; i = *s2->q + i; }
Instruction-level example: i1: ld r1, b; i2: ld r2, c; i3: ld r5, z; i4: mul r6, r5, 3; i5: add r3, r1, r2
[Figure residue: nodes f1, f2, f3, f4, f5; labels s, q, p, j, i, fp, f3]

Instruction Set Architecture
Layers: Application, Instruction Set Architecture, Implementation
ISAs: … SPARC, MIPS, ARM, x86, HP-PA, IA-64 …
x86 implementations: Intel Pentium X; AMD K6, Athlon, Opteron; Transmeta Crusoe TM5x00

Instruction Set Architecture
Strong influence on cost/performance
New ISAs are rare, but versions are not
– 16-bit, 32-bit and 64-bit x86 versions
Longevity is a strong function of marketing prowess

Traditional Issues
Strongly constrained by the number of bits available to instruction encoding
Opcodes/operands
Registers/memory
Addressing modes
Orthogonality
0, 1, 2, 3 address machines
Instruction formats
Decoding uniformity

Instruction Formats
Alpha (fixed length): 32 bits
– Fields: 6-bit opcode, RA, RB, RC
– Formats: TRAP, Branch, Mem, Operate
x86 (variable length): prefixes, opcode, addr mode, displ, imm
– prefixes: 0 to 4 bytes
– opcode: 1 or 2 bytes
– addr mode: 0 to 2 bytes (ModR/M and SIB)
– displ, imm: 0 to 8 bytes
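
As a rough illustration of why a fixed-length format is easy to decode, the sketch below pulls the fields out of a 32-bit word at fixed bit positions. The 6-bit opcode and the RA/RB/RC fields come from the slide. The exact bit positions (opcode in bits 31..26, RA in 25..21, RB in 20..16, RC in 4..0, following the Alpha operate format) and the function names are assumptions made for this example.

```python
def encode_operate(opcode, ra, rb, rc):
    """Pack fields into a 32-bit word (assumed Alpha operate-style layout)."""
    return ((opcode & 0x3F) << 26) | ((ra & 0x1F) << 21) | ((rb & 0x1F) << 16) | (rc & 0x1F)


def decode_operate(word):
    """Extract fields from fixed bit positions; no field depends on another."""
    word &= 0xFFFFFFFF
    return {
        "opcode": (word >> 26) & 0x3F,  # 6-bit opcode
        "ra": (word >> 21) & 0x1F,      # 5-bit register number
        "rb": (word >> 16) & 0x1F,
        "rc": word & 0x1F,
    }


if __name__ == "__main__":
    word = encode_operate(0x10, 1, 2, 3)
    print(hex(word), decode_operate(word))
```

Every field can be extracted in parallel because its position is known before the opcode is examined, which is the property the variable-length x86 format gives up.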

The (old) Debate: RISC vs. CISC
At the time, IBM 370 and VAX dominated
CMOS was the up-and-coming technology
– Small number of transistors per chip
RISC was appealing
– lower design complexity
– easier to pipeline
– higher performance when fit on a chip
IBM 801 (Cocke et al, 1982)
RISC I (Patterson et al, 1982)
MIPS (Hennessy et al, 1982)

What is RISC?
Fixed length instructions
Few formats
Load/Store
Few addressing modes
Simple decode/control
Many registers
Few “unpipelinable” insts
[Figure labels: Compiler Complexity, Hardware Complexity]

The MIPS Pipeline
Compiler knows the pipeline organization
Schedules instructions around “hazards”
Branches are handled by delay slots
No need to “interlock” the pipeline
Pipeline stages: Fetch, Decode, ALU, Memory, WriteBack
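
To make "schedules instructions around hazards" concrete, here is a minimal sketch of one such compiler pass, assuming a machine with no interlocks and a one-instruction load delay (as in the original MIPS). Instructions are modeled as hypothetical (op, dest, sources) tuples, and the pass only pads with a nop; a real scheduler would first try to move an independent instruction into the slot.

```python
NOP = ("nop", None, ())


def fill_load_delay_slots(program):
    """Insert a nop after any load whose result is read by the next instruction."""
    scheduled = []
    for i, instr in enumerate(program):
        scheduled.append(instr)
        op, dest, _sources = instr
        following = program[i + 1] if i + 1 < len(program) else None
        # Without hardware interlocks, the loaded value is not ready in time
        # for the very next instruction, so the compiler must separate them.
        if op == "ld" and following is not None and dest in following[2]:
            scheduled.append(NOP)
    return scheduled


if __name__ == "__main__":
    program = [
        ("ld", "r1", ("b",)),
        ("ld", "r2", ("c",)),
        ("ld", "r5", ("z",)),
        ("mul", "r6", ("r5",)),      # reads r5 right after its load
        ("add", "r3", ("r1", "r2")),
    ]
    for instr in fill_load_delay_slots(program):
        print(instr)
```

In this sequence the independent add could be hoisted between the last load and the mul instead of the nop, which is the kind of reordering a pipeline-aware compiler performs.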

RISC Baggage
In hindsight, like CISC, even RISC architectures suffered from legacy effects.
– Delay slots
    Used for dealing with branches in short pipelines
    Help primarily with target generation
    Become a burden for future generations whose pipelines need to be deeper
– Register windows
    Quick save/restore of state for procedure calls
    Reduce procedure call overhead to 1 cycle
    Make register renaming and out-of-order execution more complex

The Dynamic-Static Interface (DSI)
Perhaps the main contribution of the “RISC Revolution”
John Cocke (IBM) is credited with the original idea.
John Hennessy was a major driving force later, followed by the IMPACT team at Illinois.
“…a willingness to make design tradeoffs freely… between the architecture and implementation…” Colwell et al, 1985.
This legacy is still alive and kicking today.

DSI and Static Optimization
[Figure: axes "Granularity of ISA instruction" and "Potential for Static Optimization"; plotted points: ISAs for reconfigurable architectures, VAX ISA, MIPS ISA, Itanium ISA]

Variable Instruction Format
Motivation 1: to accommodate a large number of opcodes with nonuniform frequency of occurrence
– VAX has 304 opcodes. If we insisted on using a uniform opcode encoding, we would need 9 bits. Due to the policy of byte alignment, one needs two bytes to encode each VAX opcode.
– An observation: some opcodes are used more often than others. The top 200 opcodes account for about 98% of the dynamic opcode usage.
– Instead of using 2 bytes to encode all 304 opcodes, use 1 byte to encode the frequently used ones and 2 bytes to encode the infrequently used ones.
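
The arithmetic behind Motivation 1 can be checked with a short sketch. The opcode count (304), the frequent set (200 opcodes) and its share of dynamic usage (about 98%) are taken from the slide; the function name and the assumption that every opcode outside the frequent set costs exactly 2 bytes are illustrative.

```python
def opcode_encoding_costs(num_opcodes=304, frequent_opcodes=200, frequent_share=0.98):
    """Compare uniform and two-tier opcode encodings (sizes in bits and bytes)."""
    # Smallest bit width that can distinguish num_opcodes values.
    uniform_bits = (num_opcodes - 1).bit_length()
    # Byte alignment rounds the opcode field up to whole bytes.
    uniform_bytes = (uniform_bits + 7) // 8
    # Two-tier scheme: 1 byte for frequent opcodes, 2 bytes for the rest.
    average_bytes = frequent_share * 1 + (1 - frequent_share) * 2
    # Byte values left over after the frequent opcodes can serve as escapes
    # that introduce a second opcode byte.
    spare_first_bytes = 256 - frequent_opcodes
    return uniform_bits, uniform_bytes, average_bytes, spare_first_bytes


if __name__ == "__main__":
    bits, uniform, average, spare = opcode_encoding_costs()
    print(f"uniform: {bits} bits -> {uniform} bytes per opcode")
    print(f"two-tier: {average:.2f} bytes per executed opcode on average")
    print(f"first-byte values left for escapes: {spare}")
```

Under these assumptions the average fetched opcode size drops from 2 bytes to roughly 1.02 bytes, and a single spare first-byte value is already enough to reach the remaining 104 opcodes through a second byte.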

Variable Instruction Format (cont.)
Motivation 2: to allow each instruction exactly the number of operands it needs.
– RET (0), INC (1), ADD (2 or 3), …
Motivation 3: to allow each operand specifier exactly the number of bytes it needs.
– Reg (1), Disp (1 to 8), …
All motivations come from reducing the number of bytes needed to
– represent the program
– be fetched during execution

VIF Cost
Sequential Decoding Problem
– The decoder cannot be sure where the 1st operand specifier is until the opcode is decoded.
– The decoder cannot locate the ith operand specifier until the (i-1)th operand specifier is decoded.
– The decoder cannot be sure where the jth instruction starts until the last operand specifier of the (j-1)th instruction is decoded.
– Typical solution: instruction buffering with a multi-stage decode pipeline, plus a post-decode I-cache or trace cache
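
The sequential dependence described above can be seen in a toy decoder. The encoding below is entirely hypothetical (it is not VAX or x86): the opcode byte determines the operand count, and the first byte of each operand specifier determines that specifier's length.

```python
# Hypothetical opcodes mapped to operand counts (RET-like, INC-like, ADD-like).
OPERAND_COUNT = {0x01: 0, 0x02: 1, 0x03: 2}
# Hypothetical specifier tags mapped to total specifier length in bytes:
# register (1), 1-byte displacement (2), 4-byte displacement (5).
SPECIFIER_LENGTH = {0x00: 1, 0x01: 2, 0x02: 5}


def instruction_starts(code):
    """Return the byte offset of every instruction in a toy variable-length stream."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        operand_count = OPERAND_COUNT[code[pc]]  # known only after opcode decode
        pc += 1
        for _ in range(operand_count):
            # The next specifier's position depends on this specifier's length.
            pc += SPECIFIER_LENGTH[code[pc]]
    return starts


if __name__ == "__main__":
    stream = bytes([
        0x03, 0x00, 0x01, 0x10,                # 2 operands: register, 1-byte disp
        0x02, 0x02, 0x01, 0x02, 0x03, 0x04,    # 1 operand: 4-byte disp
        0x01,                                  # 0 operands
    ])
    print(instruction_starts(stream))
```

Each loop iteration needs the result of the previous one, so the start of instruction j cannot be computed without walking instruction j-1 first. A fixed-length format would replace the whole loop with a multiplication.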

VIF Cost (cont.)
Non-aligned Instruction Access: instructions are not aligned to any byte position in each memory word.
– Instruction opcodes and operand specifiers are not aligned to the decoding logic when fetched from memory.
– Instructions may spill over cache block boundaries and page boundaries.
– Typical solution: an instruction buffer that decouples fetch and decode
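
A small check makes the boundary-spill case concrete. The block size (64 bytes) and page size (4096 bytes) used in the usage lines are assumed values for illustration, as is the function name.

```python
def crosses_boundary(address, length, boundary):
    """True if bytes [address, address + length) fall in two boundary-sized blocks."""
    first_block = address // boundary
    last_block = (address + length - 1) // boundary
    return first_block != last_block


if __name__ == "__main__":
    # A 4-byte instruction starting 2 bytes before the end of a 64-byte block.
    print(crosses_boundary(62, 4, 64))
    # A 6-byte instruction straddling a 4096-byte page boundary.
    print(crosses_boundary(4094, 6, 4096))
    # Aligned 4-byte instructions never straddle a 64-byte block.
    print(any(crosses_boundary(a, 4, 64) for a in range(0, 4096, 4)))
```

When the check is true, the fetch unit needs two block accesses (and possibly two address translations) before the instruction can be decoded, which is one reason fetch and decode are decoupled by a buffer.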

Data Dependent Decoding
What an instruction does depends on the values of the explicit and/or implicit input operands.
– Cause: generality of instructions
– Example: string move instructions in x86 generate a different number of loads and stores depending on an input operand value
– Typical solution: use microcode when executing these instructions
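
The string-move example can be sketched as a toy expansion into load/store micro-operations. This is a simplified model under stated assumptions, not the actual x86 microcode: it copies forward only, ignores faults and the direction flag, and the function and tuple names are hypothetical.

```python
def expand_string_move(count, src, dst, element_size=1):
    """Expand one string-move instruction into (kind, address) micro-ops."""
    micro_ops = []
    for i in range(count):
        offset = i * element_size
        micro_ops.append(("load", src + offset))
        micro_ops.append(("store", dst + offset))
    return micro_ops


if __name__ == "__main__":
    # The same instruction yields different amounts of work for different counts.
    for count in (0, 1, 3):
        print(count, len(expand_string_move(count, 0x100, 0x200)))
```

The decoder cannot know how many micro-ops to emit from the instruction bytes alone, since the count arrives in a register at run time; a microcode sequencer that loops until the count is exhausted sidesteps this.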

Number of Registers
A large number of registers allows the compiler to eliminate memory references and redundant computation by keeping more values in the register file
– Cost: more bits to encode register operands
– Benefit: support for the compiler to achieve high performance
– MIPS had 32, IA-64 has 128 (the number of levels of metal lines is a factor here)
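
The encoding cost can be quantified with a short sketch. The register counts (32 and 128) are from the slide; the three-operand instruction shape and the function name are assumptions for illustration.

```python
def register_field_bits(num_registers, operands=3):
    """Bits spent on register specifiers in one instruction."""
    bits_per_operand = (num_registers - 1).bit_length()
    return bits_per_operand * operands


if __name__ == "__main__":
    for name, count in (("32 registers", 32), ("128 registers", 128)):
        print(name, register_field_bits(count), "bits for three operands")
```

Going from 32 to 128 registers raises the register-specifier cost of a three-operand instruction from 15 to 21 bits, which no longer leaves much room for an opcode in a 32-bit instruction word.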

Compatibility – subtle issues
Most arise from incomplete ISA specification
– Needed to have extensibility
User-imposed requirements
– Inappropriate use of ISA
    Undefined bits being used
Implementation-imposed compatibility
– Bug compatibility
    Pentium II had to reproduce the bugs of Pentium

Today’s Issues
Where should the DSI be placed? What control is given to the compiler (static) and what is relegated to the hardware (dynamic)?
– This is becoming a more pressing issue as the power crisis continues to grow
Information flow across the DSI
– Speculation, predication, registers, analysis info
There is an emerging difference between the target architecture and the implementation architecture.
– Java, .NET
