Lecture 3: Instruction Sets
Section 1.3, Sections
- Technology trends
- Design issues in defining an instruction set
- Register and memory access
- Instruction and operand types
Processor Technology Trends
- Shrinking transistor sizes: 250nm (1997) → 130nm (2002) → 70nm (2008) → 35nm (2014)
- Transistor density increases by 35% per year and die size increases by 10-20% per year… functionality improvements!
- Transistor speed improves linearly as feature size shrinks (a complex relationship involving voltages, resistances, and capacitances)… clock speed improvements!
- Wire delays do not scale down at the same rate as logic delays… the Pentium 4 dedicates pipeline stages purely to wire delays
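As a worked check on the compounding above, here is a minimal Python sketch. The 15% die-size rate is an assumed midpoint of the slide's 10-20% range, and the helper names are illustrative only. Since transistors per chip equal density times die area, the two yearly factors multiply.

```python
import math


def compound_growth(rate_per_year: float, years: float) -> float:
    """Total growth factor after compounding rate_per_year for the given years."""
    return (1.0 + rate_per_year) ** years


def doubling_time(rate_per_year: float) -> float:
    """Years needed to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


DENSITY_RATE = 0.35  # transistor density: +35% per year (from the slide)
DIE_RATE = 0.15      # die size: +10-20% per year; 15% is an assumed midpoint

# Transistors per chip = density x die area, so the yearly factors multiply.
TRANSISTOR_RATE = (1 + DENSITY_RATE) * (1 + DIE_RATE) - 1

print(f"density doubles every {doubling_time(DENSITY_RATE):.2f} years")
print(f"transistors per chip grow {TRANSISTOR_RATE:.1%} per year")
print(f"growth over 5 years: x{compound_growth(TRANSISTOR_RATE, 5):.1f}")
```

At 35% per year, density alone doubles roughly every 2.3 years; adding die growth pushes the per-chip transistor count up by about 55% per year under this assumption.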
Technology Trends
- DRAM density increases by 40-60% per year, but latency has fallen by only 33% in 10 years (the memory wall!); bandwidth improves twice as fast as latency decreases
- Disk density improves by 100% every year; latency improvement is similar to DRAM
- Networks: primary focus on bandwidth; 10 Mb/s → 100 Mb/s in 10 years; 100 Mb/s → 1 Gb/s in 5 years
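To put these trends on a common per-year scale, the sketch below annualizes each total improvement. It is a minimal illustration; the helper name `annual_rate` is hypothetical, and the DRAM figure treats "33% lower latency" as a speedup factor of 1/0.67.

```python
def annual_rate(total_factor: float, years: float) -> float:
    """Constant yearly improvement rate that compounds to total_factor over years."""
    return total_factor ** (1.0 / years) - 1.0


# Network bandwidth: 10 Mb/s -> 100 Mb/s in 10 years, then 100 Mb/s -> 1 Gb/s in 5 years.
NET_OLD = annual_rate(10.0, 10)  # about 25.9% per year
NET_NEW = annual_rate(10.0, 5)   # about 58.5% per year

# DRAM latency fell by 33% in 10 years, i.e. new latency = 0.67 x old latency.
# Treat the improvement as a speedup factor (old / new) and annualize it.
DRAM_LATENCY = annual_rate(1.0 / 0.67, 10)  # about 4.1% per year

for name, rate in [("network (older decade)", NET_OLD),
                   ("network (recent 5 years)", NET_NEW),
                   ("DRAM latency", DRAM_LATENCY)]:
    print(f"{name:26s} {rate:6.1%} per year")
```

Roughly 4% per year for DRAM latency against 40-60% per year for DRAM density is the gap behind the memory wall.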
Power Consumption Trends
- Dynamic power: $P_{\text{dyn}} \propto \alpha \, C \, V^2 \, f$, where α is the switching activity, C the capacitance, V the supply voltage, and f the clock frequency
- Capacitance per transistor and voltage are decreasing, but the number of transistors and the frequency are increasing at a faster rate
- Leakage power is also rising and will soon match dynamic power
- Power consumption in high-performance processors is already between … W today
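The following minimal sketch plugs one hypothetical technology step into the dynamic-power proportionality to show why total power can rise even while capacitance per transistor and voltage fall. All scaling factors are illustrative assumptions, not measured data; total switched capacitance is modeled as per-transistor capacitance times transistor count.

```python
def dynamic_power(activity: float, capacitance: float, voltage: float, frequency: float) -> float:
    """Dynamic power ~ activity x switched capacitance x voltage^2 x frequency (arbitrary units)."""
    return activity * capacitance * voltage ** 2 * frequency


# Hypothetical technology step (illustrative factors, not measured data):
# capacitance per transistor x0.7, transistor count x2, voltage x0.85, frequency x1.4.
# Total switched capacitance = per-transistor capacitance x transistor count.
OLD = dynamic_power(activity=0.1, capacitance=1.0, voltage=1.0, frequency=1.0)
NEW = dynamic_power(activity=0.1, capacitance=0.7 * 2.0, voltage=0.85, frequency=1.4)

print(f"new / old dynamic power = {NEW / OLD:.2f}")
```

Under these assumptions dynamic power grows by about 42%. The squared voltage term is the strongest lever: halving V alone cuts dynamic power by 4x.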
Notable Points
- Complexity-effective design is important: a complex design takes longer to build and verify, and consumes more power
- Don’t forget about software cost when evaluating a system’s cost-performance
- Similarly, evaluating the power-performance of a single component in isolation is misleading
- CPI and IPC can’t be used to compare processors with different ISAs, because the same program compiles to a different number of instructions
- Don’t rely on peak performance metrics or on results obtained with synthetic benchmarks
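To see why CPI alone misleads across ISAs, this minimal sketch applies CPU time = instruction count × CPI × cycle time to two hypothetical processors running the same program. The instruction counts and CPIs are made-up illustrative values.

```python
def cpu_time(instr_count: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instr_count * cpi / clock_hz


# Two hypothetical processors with different ISAs, same program, both at 2 GHz.
TIME_A = cpu_time(instr_count=1.0e9, cpi=1.0, clock_hz=2.0e9)  # simpler ISA: more instructions
TIME_B = cpu_time(instr_count=0.6e9, cpi=1.5, clock_hz=2.0e9)  # richer ISA: fewer instructions

print(f"A: CPI 1.0, IPC {1 / 1.0:.2f}, time {TIME_A:.3f} s")
print(f"B: CPI 1.5, IPC {1 / 1.5:.2f}, time {TIME_B:.3f} s")
```

Processor B has the worse CPI (and lower IPC) yet finishes first (0.45 s vs 0.5 s), because its ISA needs fewer instructions for the same work.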
The Effect of Clock Speed
- Even with the same instruction set, performance does not closely track clock speed: it depends on the benchmark set and on other processor features
- Even within the same processor family, performance improves more slowly than clock speed
ISAs for Different Segments
- Instruction sets for all three segments are very similar
- Desktops: equal emphasis on integer and floating-point performance, little regard for code size and power
- Servers: little need for high floating-point performance
- Embedded: emphasis on low cost and power; code size is important, and floating point may be optional
- Desktops and embedded systems also care about multimedia applications, hence the special media extension instructions
RISC vs. CISC
- Complex Instruction Set Computer: "if you do it in hardware, it’s fast", hence implement every functionality in hardware
  - rich instruction set
  - complex decoding
  - complex analysis to identify dependences
- Reduced Instruction Set Computer: by using a few simple instruction primitives, the hardware is simpler
  - easy to extract parallelism
  - easy to achieve high clock speeds
- x86 is CISC and remains popular for compatibility reasons; x86 processors convert CISC instructions into RISC-like instructions in hardware
Accessing Internal Storage
- Implicit or explicit operands? (compact or flexible?)
- Representing C = A + B:
  Stack:                  Push A; Push B; Add; Pop C
  Accumulator:            Load A; Add B; Store C
  Register (reg-mem):     Load R1, A; Add R3, R1, B; Store R3, C
  Register (load-store):  Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C
- Registers: fast, exploit locality, reduced memory traffic, easier to reorder
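To make the implicit-vs-explicit trade-off concrete, here is a minimal Python sketch of three toy interpreters running the C = A + B sequences above (the reg-mem variant is left out for brevity). The tuple encoding and function names are hypothetical, chosen only to show where operands come from in each style.

```python
def run_stack(program, memory):
    """Stack machine: ALU operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(memory[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown op: {op}")
    return memory


def run_accumulator(program, memory):
    """Accumulator machine: one operand and the result are the implicit accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc += memory[addr]
        elif op == "store":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown op: {op}")
    return memory


def run_load_store(program, memory):
    """Load-store machine: only loads/stores touch memory; ALU ops name registers explicitly."""
    regs = {}
    for op, *args in program:
        if op == "load":
            dst, addr = args
            regs[dst] = memory[addr]
        elif op == "add":
            dst, src1, src2 = args
            regs[dst] = regs[src1] + regs[src2]
        elif op == "store":
            src, addr = args
            memory[addr] = regs[src]
        else:
            raise ValueError(f"unknown op: {op}")
    return memory


STACK_PROG = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACC_PROG = [("load", "A"), ("add", "B"), ("store", "C")]
LOAD_STORE_PROG = [("load", "R1", "A"), ("load", "R2", "B"),
                   ("add", "R3", "R1", "R2"), ("store", "R3", "C")]

for run, prog in [(run_stack, STACK_PROG), (run_accumulator, ACC_PROG),
                  (run_load_store, LOAD_STORE_PROG)]:
    result = run(prog, {"A": 2, "B": 3})
    print(f"{run.__name__:15s} {len(prog)} instructions, C = {result['C']}")
```

The stack machine's `add` carries no operand fields at all (compact), while the load-store `add` names all three registers (flexible, and its register operands can be reordered freely).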
Register Architectures
- Register-register (0 memory operands, 3 operands total)
  Advantages: simple, fixed-length instructions; simple code generation; easy pipelining and parallelism extraction
  Disadvantages: high instruction count and code size
  Examples: Alpha, MIPS, ARM, PowerPC, SPARC
- Register-memory (1 memory operand, 2 operands total)
  Advantages: can access data without a separate load; small code size
  Disadvantages: one of the source operands is overwritten by the result; instruction latency is variable
  Examples: Intel 80x86, Motorola
- Memory-memory (2 memory operands, 2 operands total; or 3 memory operands, 3 operands total)
  Advantages: most compact code size; doesn’t waste registers
  Disadvantages: variation in instruction size (hard to decode), frequent memory accesses, variable instruction latency
  Examples: VAX
Addressing Modes for Memory
- More addressing modes → lower instruction counts, but more complexity (CISC-like)
- Most common modes: immediate and displacement
- Displacement and immediate values often fit in fewer than 8 bits, but also often require 16 bits
  Addressing mode     Example instruction   Meaning
  Register            Add R4, R3            Regs[R4] ← Regs[R4] + Regs[R3]
  Immediate           Add R4, #3            Regs[R4] ← Regs[R4] + 3
  Displacement        Add R4, 100(R1)       Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
  Register indirect   Add R4, (R1)          Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
  Direct/absolute     Add R1, (1001)        Regs[R1] ← Regs[R1] + Mem[1001]
  Memory indirect     Add …                 Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
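The Meaning column above can be read as executable definitions. This is a minimal sketch, assuming a hypothetical machine state held in Python dicts (register file and sparse memory); `operand` and `add` are illustrative helpers, not part of any real ISA tooling. Note that memory indirect needs two memory reads, one source of CISC complexity.

```python
def operand(mode, regs, mem, reg=None, value=None):
    """Return the source operand selected by an addressing mode (hypothetical helper)."""
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return value
    if mode == "displacement":
        return mem[value + regs[reg]]
    if mode == "register_indirect":
        return mem[regs[reg]]
    if mode == "direct":
        return mem[value]
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]  # two memory accesses
    raise ValueError(f"unknown addressing mode: {mode}")


def add(dst, mode, regs, mem, reg=None, value=None):
    """Regs[dst] <- Regs[dst] + operand, as in the table's Add instructions."""
    regs[dst] = regs[dst] + operand(mode, regs, mem, reg=reg, value=value)


# Hypothetical machine state; memory is a sparse dict from address to value.
REGS = {"R1": 8, "R3": 20, "R4": 5}
MEM = {8: 11, 20: 30, 30: 9, 108: 7, 1001: 4}

print(operand("displacement", REGS, MEM, reg="R1", value=100))  # Mem[100 + 8]
print(operand("memory_indirect", REGS, MEM, reg="R3"))          # Mem[Mem[20]] = Mem[30]
```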
Interpreting Memory Addresses
- Most computers are byte addressed and also allow access to half words (16 bits), words (32 bits), and double words (64 bits)
- Accesses are usually required to be aligned: an access to an s-byte object at address A is aligned if A mod s = 0 (a half word cannot have an odd address, a double word must have A mod 8 = 0, etc.)
- Misalignment increases hardware complexity and worsens performance (e.g., if the data cross a cache line boundary)
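The alignment rule and the cache-line concern can both be checked with integer arithmetic. This is a minimal sketch; the 64-byte line size is an assumption (real processors vary), and the function names are illustrative.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size; real processors vary


def is_aligned(addr: int, size: int) -> bool:
    """An access of `size` bytes at `addr` is aligned if addr mod size == 0."""
    return addr % size == 0


def crosses_cache_line(addr: int, size: int, line: int = CACHE_LINE_BYTES) -> bool:
    """True if bytes addr .. addr+size-1 fall in two different cache lines."""
    return addr // line != (addr + size - 1) // line


for addr, size in [(16, 8), (12, 8), (3, 2), (60, 8)]:
    print(f"addr {addr:3d}, size {size}: aligned={is_aligned(addr, size)}, "
          f"crosses line={crosses_cache_line(addr, size)}")
```

Because the line size is a multiple of every power-of-two access size up to 64 bytes, an aligned access never straddles two lines; only misaligned ones (like 8 bytes at address 60) can.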
Little and Big Endian
- Consider a 64-bit quantity composed of bytes 0-7, where byte 0 is the LSB and byte 7 is the MSB
- In Little-Endian format, memory address A contains byte 0, address A+1 contains byte 1, …, address A+7 contains byte 7
  Advantage: the low-order byte is at the same address regardless of operand size, making it easier to organize bytes, half words, words, double words, etc. into registers (Alpha, x86)
- In Big-Endian format, memory address A contains byte 7, address A+1 contains byte 6, …, address A+7 contains byte 0
  Advantage: values are stored in the order they are printed out, and the sign is available early (Motorola)
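Endianness Example
- Consider the hexadecimal number 0x43fa27c77156ab91 (MSB 0x43 … LSB 0x91)
- Two options for storing it at addresses A through A+7:
  Address:        A    A+1  A+2  A+3  A+4  A+5  A+6  A+7
  Little-endian:  91   ab   56   71   c7   27   fa   43
  Big-endian:     43   fa   27   c7   71   56   ab   91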
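Python's standard library can reproduce both byte layouts for the example value. In this minimal sketch, byte i of each `bytes` object stands for the contents of address A+i; the last two lines illustrate the little-endian advantage that the low-order half word starts at A itself.

```python
import struct

VALUE = 0x43FA27C77156AB91

# Byte i of each bytes object is what memory address A+i would hold.
LITTLE = VALUE.to_bytes(8, byteorder="little")
BIG = VALUE.to_bytes(8, byteorder="big")

# struct's explicit byte-order prefixes ('<' little, '>' big) produce the same layouts.
assert struct.pack("<Q", VALUE) == LITTLE
assert struct.pack(">Q", VALUE) == BIG

print("little-endian A..A+7:", LITTLE.hex(" "))
print("big-endian    A..A+7:", BIG.hex(" "))

# Little-endian: the low-order half word starts at address A itself.
LOW_HALF_LE = int.from_bytes(LITTLE[:2], byteorder="little")
# Big-endian: the same half word lives at A+6 .. A+7.
LOW_HALF_BE = int.from_bytes(BIG[6:], byteorder="big")
```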
Common Operations
  Operator type        Examples
  Arithmetic/logical   Add, sub, and, or, mult, div
  Data transfer        Loads/stores
  Control              Branch, jump, call, return
  System               OS call, virtual memory management
  Floating point       FP add, sub, mult, div
  Decimal              Decimal add, sub, mult, decimal-to-character conversions
  String               Move, compare, search
  Graphics             Compression/decompression, vertex/pixel ops
Common Operations
  80x86 instruction          Integer average (% of total executed)
  Load                       22%
  Conditional branch         20%
  Compare                    16%
  Store                      12%
  Add                        8%
  And                        6%
  Sub                        5%
  Move register-register     4%
  Call/Return                2%
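Summing the instruction mix by operator type shows how heavily simple instructions dominate, which is consistent with the RISC argument made earlier. This is a minimal sketch; the grouping (in particular placing compare under arithmetic/logical) is an assumption for illustration.

```python
# Integer-average 80x86 instruction mix from the table (% of instructions executed).
MIX = {
    "load": 22, "conditional branch": 20, "compare": 16, "store": 12,
    "add": 8, "and": 6, "sub": 5, "move register-register": 4, "call/return": 2,
}

# Grouping by operator type; placing compare under arithmetic/logical is an assumption.
CATEGORIES = {
    "data transfer": ["load", "store", "move register-register"],
    "control": ["conditional branch", "call/return"],
    "arithmetic/logical": ["compare", "add", "and", "sub"],
}

TOTALS = {cat: sum(MIX[op] for op in ops) for cat, ops in CATEGORIES.items()}

for cat, pct in TOTALS.items():
    print(f"{cat:20s} {pct:3d}%")
print(f"{'listed total':20s} {sum(MIX.values()):3d}%")
```

These nine simple instructions cover 95% of executed instructions; the remaining 5% is spread across everything else.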