Published by Violet Harper. Modified over 8 years ago.
1
© Wen-mei Hwu and S. J. Patel, 2005 ECE 511, University of Illinois Lecture 3: Instruction Set Architecture
2
Outline
Instruction Set Architecture
– Traditional issues
– The (old) debate: RISC vs. CISC
– New issues
3
The Big Picture
[Figure: stack of abstraction layers: Requirements, Algorithms, Prog. Lang./OS, ISA, uArch, Circuit, Device. Side labels: Problem Focus, Performance Focus. Also shown: a call graph over f1 to f5 and a SPEC label.]
Source-level example: f2() { f3(s2, &j, &i); *s2->p = 10; i = *s2->q + i; }
Instruction-level example: i1: ld r1, b; i2: ld r2, c; i3: ld r5, z; i4: mul r6, r5, 3; i5: add r3, r1, r2
4
Instruction Set Architecture
[Figure: three layers: Application, Instruction Set Architecture, Implementation]
ISAs: … SPARC, MIPS, ARM, x86, HP-PA, IA-64 …
Implementations: Intel Pentium X; AMD K6, Athlon, Opteron; Transmeta Crusoe TM5x00
5
Instruction Set Architecture
Strong influence on cost/performance
New ISAs are rare, but versions are not
– 16-bit, 32-bit and 64-bit x86 versions
Longevity is a strong function of marketing prowess
6
Traditional Issues
Strongly constrained by the number of bits available for instruction encoding
Opcodes/operands
Registers/memory
Addressing modes
Orthogonality
0, 1, 2, 3 address machines
Instruction formats
Decoding uniformity
7
Instruction Formats
Alpha (fixed length): 32 bits per instruction, 6-bit opcode
– Formats: TRAP, Branch, Mem, Operate
– Register fields: RA, RB, RC
x86 (variable length): prefixes, opcode, addr mode, displ, imm
– 0 to 4 bytes of prefix
– 1 or 2 bytes of opcode
– 0 to 2 bytes of addressing mode (ModR/M and SIB)
– 0 to 8 bytes of displacement and immediate
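With a fixed 32-bit format, every field sits at a known bit position, so decoding reduces to constant shifts and masks. A minimal Python sketch, assuming the Alpha register operate layout (opcode in bits 31..26, RA in 25..21, RB in 20..16, function code in 11..5, RC in 4..0). The helper names and the field values are illustrative and not from the lecture:

```python
def encode_operate(opcode, ra, rb, func, rc):
    """Pack an Alpha-style register operate instruction into a 32-bit word."""
    return (opcode << 26) | (ra << 21) | (rb << 16) | (func << 5) | rc


def decode_operate(word):
    """Extract every field with fixed shifts and masks.

    No field's position depends on the value of another field, so all
    of them can be extracted in parallel in hardware.
    """
    return {
        "opcode": (word >> 26) & 0x3F,  # 6-bit opcode
        "ra": (word >> 21) & 0x1F,      # 5-bit register numbers
        "rb": (word >> 16) & 0x1F,
        "func": (word >> 5) & 0x7F,     # 7-bit function code
        "rc": word & 0x1F,
    }


# Arbitrary example values, chosen only to exercise each field.
word = encode_operate(opcode=0x10, ra=1, rb=2, func=0x20, rc=3)
fields = decode_operate(word)
print(hex(word), fields)
```

An x86 decoder has no equivalent of these constants: the offset of the ModR/M byte depends on how many prefix and opcode bytes precede it.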
8
The (old) Debate: RISC vs. CISC
At the time, IBM 370 and VAX dominated
CMOS was an up-and-coming technology
– Small number of transistors per chip
RISC was appealing
– lower design complexity
– easier to pipeline
– higher performance when the design fit on a chip
IBM 801 (Cocke et al, 1982)
RISC I (Patterson et al, 1982)
MIPS (Hennessy et al, 1982)
9
What is RISC?
Fixed length instructions
Few formats
Load/Store
Few addressing modes
Simple decode/control
Many registers
Few “unpipelinable” insts
[Figure labels: Compiler Complexity, Hardware Complexity]
10
The MIPS Pipeline
Compiler knows the pipeline organization
Schedules instructions around “hazards”
Branches are handled by delay slots
No need to “interlock” the pipeline
Pipeline stages: Fetch, Decode, ALU, Memory, WriteBack
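Without interlocks, correctness becomes the compiler's job: it must never place a consumer directly after the load that produces its operand. A minimal Python sketch of the simplest possible policy, assuming a single load delay slot and a hypothetical instruction representation `(op, dest, sources)`; none of these names come from the lecture:

```python
def fill_load_delay_slots(program):
    """Insert a nop after any load whose result the very next instruction reads.

    Each instruction is a hypothetical tuple (op, dest, sources).
    Assumes one load delay slot and no hardware interlock.
    """
    scheduled = []
    for i, (op, dest, sources) in enumerate(program):
        scheduled.append((op, dest, sources))
        nxt = program[i + 1] if i + 1 < len(program) else None
        if op == "ld" and nxt is not None and dest in nxt[2]:
            scheduled.append(("nop", None, ()))
    return scheduled


program = [
    ("ld", "r1", ("b",)),
    ("ld", "r2", ("c",)),
    ("add", "r3", ("r1", "r2")),  # reads r2 right after its load
    ("ld", "r5", ("z",)),
    ("mul", "r6", ("r5",)),       # reads r5 right after its load
]
scheduled = fill_load_delay_slots(program)
for inst in scheduled:
    print(inst)
```

A real scheduler would try to move an independent instruction into the slot rather than pad with a nop. In this example, hoisting the load of r5 above the add would remove both nops.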
11
Pipelining a CISC [Patt et al 85]
Pipeline stages: Fetch Instruction Bytes, Decode, RF Read, Execute, Mem, WB
– Decode uses a µOp store and emits RISC-like micro-operations
12
RISC Baggage
In hindsight, like CISC, even RISC architectures suffered from legacy effects.
– Delay slots
  Used for dealing with branches in short pipelines
  Help primarily with target generation
  Become a burden for future generations whose pipelines need to be deeper
– Register windows
  Quick save/restore of state for procedure calls
  Reduce procedure call overhead to 1 cycle
  Make register renaming and out-of-order execution more complex
13
The Dynamic-Static Interface (DSI)
Perhaps the main contribution of the “RISC Revolution”
John Cocke (IBM) is credited with the original idea.
John Hennessy was a major driving force later, followed by the IMPACT team at Illinois.
“…a willingness to make design tradeoffs freely… between the architecture and implementation…” Colwell et al, 1985.
This legacy is still alive and kicking today.
14
DSI and Static Optimization
[Figure: axis labels: Granularity of ISA instruction, Potential for Static Optimization. Points shown: ISAs for reconfigurable architectures, VAX ISA, MIPS ISA, Itanium ISA.]
15
Variable Instruction Format
Motivation 1: to accommodate a large number of opcodes with nonuniform frequency of occurrence
– VAX has 304 opcodes. A uniform opcode encoding would need 9 bits. Because of byte alignment, that means two bytes to encode each VAX opcode.
– An observation: some opcodes are used more often than others. The top 200 opcodes account for about 98% of the dynamic opcode usage.
– Instead of using 2 bytes to encode all 304 opcodes, use 1 byte to encode the frequently used ones and 2 bytes to encode the infrequently used ones.
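The arithmetic behind this motivation can be checked directly. A minimal Python sketch using only the figures on the slide (304 opcodes, top 200 covering 98% of dynamic usage). The escape-byte accounting is an assumption about how a 2-byte opcode would be distinguished from a 1-byte one:

```python
import math

NUM_OPCODES = 304       # VAX opcode count from the slide
FREQUENT = 200          # opcodes given a 1-byte encoding
FREQUENT_SHARE = 0.98   # their share of dynamic opcode usage

# Uniform encoding: enough bits for every opcode, rounded up to whole bytes.
uniform_bits = math.ceil(math.log2(NUM_OPCODES))
uniform_bytes = math.ceil(uniform_bits / 8)

# Variable encoding: 1 byte for frequent opcodes, 2 bytes for the rest.
# Assumption: each 2-byte opcode starts with a reserved "escape" first byte,
# and each escape value opens up 256 second-byte codes.
rare = NUM_OPCODES - FREQUENT
escapes_needed = math.ceil(rare / 256)
if FREQUENT + escapes_needed > 256:
    raise ValueError("one-byte code space exhausted")

# Expected opcode bytes fetched per executed instruction.
avg_bytes = FREQUENT_SHARE * 1 + (1 - FREQUENT_SHARE) * 2

print(uniform_bits, uniform_bytes, escapes_needed, round(avg_bytes, 2))
# prints: 9 2 1 1.02
```

Under these assumptions the average opcode shrinks from 2 bytes to about 1.02 bytes, close to a 2x saving on opcode fetch bandwidth.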
16
Variable Instruction Format (cont.)
Motivation 2: to allow each instruction exactly the number of operands it needs.
– RET (0), INC (1), ADD (2 or 3), …
Motivation 3: to allow each operand specifier exactly the number of bytes it needs.
– Reg (1), Disp (1 to 8), …
All motivations come from reducing the number of bytes needed to
– represent the program
– be fetched during execution
17
VIF Cost
Sequential Decoding Problem
– The decoder cannot be sure where the 1st operand specifier is until the opcode is decoded.
– The decoder cannot locate the ith operand specifier until the (i-1)th operand specifier is decoded.
– The decoder cannot be sure where the jth instruction starts until the last operand specifier of the (j-1)th instruction is decoded.
– Typical solution: instruction buffering with a multi-stage decode pipeline, plus a post-decode I-cache or trace cache
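The serial dependence is easy to see in software. A minimal Python sketch, assuming a hypothetical toy encoding (loosely VAX-like, not a real ISA) in which the opcode byte fixes the operand count and the high nibble of each specifier's first byte fixes that specifier's length:

```python
# Hypothetical toy encoding; the tables below are invented for illustration.
OPERAND_COUNT = {0x04: 0, 0x96: 1, 0x80: 2, 0x81: 3}  # opcode -> operand count
SPECIFIER_LENGTH = {0x5: 1, 0xA: 2, 0xE: 5}           # mode nibble -> bytes


def instruction_starts(code):
    """Return the byte offset at which each instruction begins."""
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        n_operands = OPERAND_COUNT[code[pc]]  # opcode must be decoded first
        pc += 1
        for _ in range(n_operands):
            mode = code[pc] >> 4              # specifier i-1 locates specifier i
            pc += SPECIFIER_LENGTH[mode]
    return starts


code = bytes([
    0x80, 0x51, 0xA2, 0x10,              # 2 operands: register, byte displacement
    0x96, 0xE3, 0x00, 0x00, 0x01, 0x00,  # 1 operand: long displacement
    0x04,                                # 0 operands
    0x81, 0x51, 0x52, 0x53,              # 3 operands: all registers
])
print(instruction_starts(code))
# prints: [0, 4, 10, 11]
```

Each value of `pc` depends on the one before it, so the loop cannot be parallelized without guessing. Caching decoded instructions, as in a post-decode I-cache or trace cache, avoids repeating this walk.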
18
VIF Cost (cont.)
Non-aligned Instruction Access: instructions are not aligned to any byte position in each memory word.
– Instruction opcodes and operand specifiers are not aligned to the decoding logic when fetched from memory.
– Instructions may spill over cache block boundaries and page boundaries.
– Typical solution: an instruction buffer that decouples fetch and decode
19
Data Dependent Decoding
What an instruction does depends on the values of the explicit and/or implicit input operands.
– Cause: generality of instructions
– Example: string move instructions in x86 generate a different number of loads and stores according to an input operand value
– Typical solution: use microcode when executing these instructions
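The string move example can be modeled as an expansion whose length is unknown until the count operand is read. A simplified Python sketch, assuming a byte-wise move in the style of x86 REP MOVSB with ascending addresses; the micro-op tuples and the function name are hypothetical:

```python
def expand_string_move(count, src, dst):
    """Expand a byte string move into load/store micro-ops.

    One load and one store per byte, repeated `count` times. The number
    of micro-ops is known only once the runtime value of `count` is read.
    """
    uops = []
    for i in range(count):
        uops.append(("ld", "tmp", src + i))
        uops.append(("st", "tmp", dst + i))
    return uops


for count in (0, 1, 4):
    print(count, len(expand_string_move(count, src=0x1000, dst=0x2000)))
```

The same instruction bytes yield 0, 2, or 8 memory operations here, which is why a fixed decode table cannot handle the instruction and a microcode sequencer is the usual fallback.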
20
Number of Registers
A large number of registers allows the compiler to eliminate memory references and redundant computation by keeping more values in the register file
– Cost: more bits to encode register operands
– Benefit: support for the compiler to achieve high performance
– MIPS had 32, IA-64 has 128 (the number of metal layers is a factor here)
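The encoding cost grows with the logarithm of the register count. A minimal Python sketch, assuming three register operands per instruction; the function name is illustrative:

```python
def register_field_bits(num_registers, num_operands=3):
    """Bits spent naming registers in one instruction."""
    bits_per_field = (num_registers - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return bits_per_field * num_operands


for name, regs in (("MIPS", 32), ("IA-64", 128)):
    print(name, register_field_bits(regs))
# prints: MIPS 15
#         IA-64 21
```

Fifteen bits fit within a 32-bit MIPS instruction. Twenty-one bits would leave little room for anything else in 32 bits, which is consistent with IA-64 using 41-bit instructions packed into 128-bit bundles.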
21
Compatibility – subtle issues
Most arise from incomplete ISA specification
– Needed to allow extensibility
User-imposed requirements
– Inappropriate use of the ISA
  Undefined bits being used
Implementation-imposed compatibility
– Bug compatibility
  Pentium II had to reproduce the bugs of Pentium
22
Today’s Issues
Where should the DSI be placed? What control is given to the compiler (static) and what is relegated to the hardware (dynamic)?
– This is becoming a more pressing issue as the power crisis continues to grow
Information flow across the DSI
– Speculation, predication, registers, analysis info
There is an emerging difference between the target architecture and the implementation architecture.
– Java, .NET