EEL5708/Bölöni Lec 3.1 Fall 2004
EEL 5708 High Performance Computer Architecture
Lecture 3
Review: Instruction Sets
Lotzi Bölöni
Fall 2004 (Sept 1, 2004)
Acknowledgements

All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), copyright University of California, Berkeley.
Review: Instruction sets
The Instruction Set: a Critical Interface

[Figure: the instruction set drawn as the interface between software and hardware]
Levels of Representation

High Level Language Program
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
  (Compiler)
Assembly Language Program
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)
  (Assembler)
Machine Language Program
    [binary encoding of the four instructions, not preserved]
  (Machine Interpretation)
Control Signal Specification
    ALUOP[0:3] <= InstReg[9:11] & MASK
Instruction Set Architecture

"... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
– Amdahl, Blaauw, and Brooks, 1964

-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
Review: MIPS R3000 (core)

Programmable storage
    2^32 x bytes
    31 x 32-bit GPRs (R0=0)
    32 x 32-bit FP regs (paired DP)
    HI, LO, PC
    [Figure: register file r0 (always 0), r1 ... r31, plus PC, lo, hi]

Data types? Format? Addressing modes?

Arithmetic logical
    Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
    AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
    SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
    LB, LBU, LH, LHU, LW, LWL, LWR
    SB, SH, SW, SWL, SWR
Control
    J, JAL, JR, JALR
    BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL

32-bit instructions on word boundary
Review: Basic ISA Classes

Accumulator:
    1 address      add A           acc <- acc + mem[A]
    1+x address    addx A          acc <- acc + mem[A + x]
Stack:
    0 address      add             tos <- tos + next
General Purpose Register:
    2 address      add A B         EA(A) <- EA(A) + EA(B)
    3 address      add A B C       EA(A) <- EA(B) + EA(C)
Load/Store:
    3 address      add Ra Rb Rc    Ra <- Rb + Rc
                   load Ra Rb      Ra <- mem[Rb]
                   store Ra Rb     mem[Rb] <- Ra
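To make the four classes concrete, here is a minimal Python sketch in which each style computes C = A + B. The function names, the dict-based memory, and the symbolic addresses "A", "B", "C" are illustrative assumptions, not part of the slides. In particular, the load/store version addresses memory by name instead of through a register, as a simplification.

```python
# Toy models of the ISA classes; each computes mem["C"] = mem["A"] + mem["B"]
# and returns the number of instructions it needed.

def run_accumulator(mem):
    acc = mem["A"]                      # load A  : acc <- mem[A]
    acc = acc + mem["B"]                # add B   : acc <- acc + mem[B]
    mem["C"] = acc                      # store C : mem[C] <- acc
    return 3

def run_stack(mem):
    stack = []
    stack.append(mem["A"])              # push A
    stack.append(mem["B"])              # push B
    tos, nxt = stack.pop(), stack.pop()
    stack.append(tos + nxt)             # add     : tos <- tos + next
    mem["C"] = stack.pop()              # pop C
    return 4

def run_three_address_memory(mem):
    mem["C"] = mem["A"] + mem["B"]      # add C A B : EA(C) <- EA(A) + EA(B)
    return 1

def run_load_store(mem):
    reg = {}
    reg["R1"] = mem["A"]                # load R1 A
    reg["R2"] = mem["B"]                # load R2 B
    reg["R3"] = reg["R1"] + reg["R2"]   # add R3 R1 R2 : R3 <- R1 + R2
    mem["C"] = reg["R3"]                # store R3 C
    return 4

for machine in (run_accumulator, run_stack,
                run_three_address_memory, run_load_store):
    memory = {"A": 2, "B": 3, "C": 0}
    count = machine(memory)
    print(machine.__name__, "instructions:", count, "C =", memory["C"])
```

The instruction counts differ, but so does the work done per instruction: the single three-address memory instruction carries three memory address specifiers, while each load/store instruction carries at most one.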
Instruction Formats

Variable, Fixed, Hybrid [format diagrams not preserved]

Addressing modes
 – each operand requires an address specifier => variable format
Code size => variable length instructions
Performance => fixed length instructions
 – simple decoding, predictable operations
With a load/store instruction architecture, only one memory address and few addressing modes => simple format, address mode given by opcode
MIPS Addressing Modes & Formats

Simple addressing modes
All instructions 32 bits wide

Register (direct)    op | rs | rt | rd       operand in register
Immediate            op | rs | rt | immed    operand is immed
Base+index           op | rs | rt | immed    memory address = register + immed
PC-relative          op | rs | rt | immed    memory address = PC + immed

Register Indirect?
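As a rough illustration of these formats, the sketch below splits a 32-bit word into the op, rs, rt and immed fields and computes the two memory-style addresses. The helper names are made up for this example. The bit positions (6-bit op, two 5-bit register fields, 16-bit immediate) and the branch rule (word offset scaled by 4 and added to PC + 4) follow the standard MIPS encoding, not anything spelled out on the slide. The sample word is a hand encoding of `lw $15, 0($2)`.

```python
def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement integer."""
    return value - 0x10000 if value & 0x8000 else value

def decode_i_type(word):
    """Split a 32-bit word into the fields op | rs | rt | immed."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immed": sign_extend16(word & 0xFFFF),
    }

def base_address(regs, fields):
    """Base+index on the slide: memory address = register[rs] + immed."""
    return (regs[fields["rs"]] + fields["immed"]) & 0xFFFFFFFF

def branch_target(pc, fields):
    """PC-relative: immed counts words and is relative to PC + 4."""
    return (pc + 4 + (fields["immed"] << 2)) & 0xFFFFFFFF

word = 0x8C4F0000                 # hand-encoded lw $15, 0($2)
fields = decode_i_type(word)
regs = [0] * 32
regs[2] = 0x1000                  # assumed contents of $2
print(fields)
print(hex(base_address(regs, fields)))
```

With immed = 0 the base address is just the register contents, which is one way to read the slide's "Register Indirect?" question: it is a special case of the base mode and needs no format of its own.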
Execution Cycle

Instruction Fetch     Obtain instruction from program storage
Instruction Decode    Determine required actions and instruction size
Operand Fetch         Locate and obtain operand data
Execute               Compute result value or status
Result Store          Deposit results in storage for later use
Next Instruction      Determine successor instruction
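The six steps can be sketched as a small interpreter loop. This is a toy model under stated assumptions: instructions are Python tuples `(op, rt, base, offset)` rather than encoded words, only `lw` and `sw` are supported, memory is a dict keyed by byte address, and the function name `run` is hypothetical. The program is the four-instruction swap from the Levels of Representation slide.

```python
def run(program, regs, mem):
    """Execute a list of (op, rt, base, offset) tuples; return the count executed."""
    pc = 0
    executed = 0
    while pc < len(program):
        inst = program[pc]                 # Instruction Fetch
        op, rt, base, offset = inst        # Instruction Decode
        address = regs[base] + offset      # Operand Fetch: read the base register
        if op == "lw":
            value = mem[address]           # Execute: read memory
            regs[rt] = value               # Result Store: into a register
        elif op == "sw":
            mem[address] = regs[rt]        # Execute and Result Store: into memory
        else:
            raise ValueError(f"unknown op {op!r}")
        pc += 1                            # Next Instruction (no branches in this toy)
        executed += 1
    return executed

regs = [0] * 32
regs[2] = 8                                # assumed byte address of v[k]
mem = {8: 11, 12: 22}                      # v[k] = 11, v[k+1] = 22
swap = [("lw", 15, 2, 0), ("lw", 16, 2, 4),
        ("sw", 16, 2, 0), ("sw", 15, 2, 4)]
run(swap, regs, mem)
print(mem)                                 # v[k] and v[k+1] are exchanged
```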
Review: Measuring performance
Which is faster?

Plane               Speed       DC to Paris   Passengers   Throughput (pmph)
Boeing 747          610 mph     6.5 hours     [missing]    286,[...]
BAC/Sud Concorde    1350 mph    3 hours       [missing]    [...],200

Time to run the task (ExTime)
 – Execution time, response time, latency
Tasks per day, hour, week, sec, ns … (Performance)
 – Throughput, bandwidth
Definitions

Performance is in units of things per sec
 – bigger is better
If we are primarily concerned with response time
 – performance(x) = 1 / execution_time(x)
"X is n times faster than Y" means

    n = Performance(X) / Performance(Y) = Execution_time(Y) / Execution_time(X)
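A short sketch of these definitions, applied to the DC-to-Paris flight times from the "Which is faster?" slide (6.5 hours for the Boeing 747, 3 hours for the Concorde). The function names are chosen here for illustration.

```python
def performance(execution_time):
    """Performance is the reciprocal of execution time (bigger is better)."""
    return 1.0 / execution_time

def times_faster(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y."""
    return exec_time_y / exec_time_x

concorde_hours = 3.0
boeing_hours = 6.5

n = times_faster(concorde_hours, boeing_hours)
ratio = performance(concorde_hours) / performance(boeing_hours)
print(f"Concorde is {n:.2f} times faster than the 747 in response time")
print(f"Same value from the performance ratio: {ratio:.2f}")
```

This compares response time only. The throughput column on the slide measures something different (passengers times speed), so the ranking of the two planes need not agree.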
Computer Performance

CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

                Inst Count    CPI    Clock Rate
Program             X
Compiler            X         (X)
Inst. Set           X          X
Organization                   X         X
Technology                               X
Cycles Per Instruction (Throughput)

"Average Cycles per Instruction"
    CPI = (CPU Time * Clock Rate) / Instruction Count = Cycles / Instruction Count

"Instruction Frequency"
    CPI = sum over instruction types i of Freq(i) x Cycles(i)
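The CPU time equation and the CPI definition are two views of the same relationship, which the sketch below checks numerically. The workload figures (one billion instructions, CPI of 1.5, 500 MHz clock) are hypothetical numbers picked for the example, and the function names are not from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Seconds/program = instructions x (cycles/instruction) x (seconds/cycle)."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle

def cpi_from_measurement(cpu_time_s, clock_rate_hz, instruction_count):
    """CPI = (CPU time * clock rate) / instruction count."""
    return cpu_time_s * clock_rate_hz / instruction_count

# Hypothetical workload, for illustration only.
instructions = 1_000_000_000
clock_hz = 500e6
seconds = cpu_time(instructions, 1.5, clock_hz)

print(f"CPU time: {seconds:.3f} s")
print(f"Recovered CPI: {cpi_from_measurement(seconds, clock_hz, instructions):.3f}")
```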
Example: Calculating CPI bottom up

Base Machine (Reg / Reg)
Typical mix of instruction types in program

Op        Freq    Cycles    CPI(i)    (% Time)
ALU       50%     1         .5        (33%)
Load      20%     2         .4        (27%)
Store     10%     2         .2        (13%)
Branch    20%     2         .4        (27%)
                            ----
                            1.5
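The table can be reproduced with a few lines of Python. The frequencies and cycle counts are the ones on the slide. The dictionary layout and the name `cpi_breakdown` are assumptions of this sketch.

```python
# (frequency, cycles) per instruction type, from the slide's base machine.
mix = {
    "ALU":    (0.50, 1),
    "Load":   (0.20, 2),
    "Store":  (0.10, 2),
    "Branch": (0.20, 2),
}

def cpi_breakdown(mix):
    """Return (total CPI, per-type CPI contribution, per-type share of time)."""
    contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}
    total = sum(contribution.values())
    share = {op: value / total for op, value in contribution.items()}
    return total, contribution, share

total, contribution, share = cpi_breakdown(mix)
for op in mix:
    print(f"{op:8s} CPI(i) = {contribution[op]:.1f}  time = {share[op]:.0%}")
print(f"Total CPI = {total:.1f}")
```

The share of time is each type's CPI contribution divided by the total, which is why loads and branches (20% of instructions each) account for about 27% of the time each.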
Example: Branch Stall Impact

Assume CPI = 1.0 ignoring branches (ideal)
Assume solution was stalling for 3 cycles
If 30% of instructions are branches, stall 3 cycles on 30%, so a branch takes 1 + 3 = 4 cycles

Op        Freq    Cycles    CPI(i)    (% Time)
Other     70%     1         .7        (37%)
Branch    30%     4         1.2       (63%)

=> new CPI = 1.9
New machine runs at 1/1.9 = 0.53 times the speed of the ideal machine (i.e. slow!)
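The same result follows from adding the stall penalty to the ideal CPI. This is a minimal sketch using the slide's numbers, which assumes clock rate and instruction count are unchanged so that speed is inversely proportional to CPI. The function name is hypothetical.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Every branch pays stall_cycles extra cycles on top of the base CPI."""
    return base_cpi + branch_fraction * stall_cycles

ideal_cpi = 1.0
new_cpi = cpi_with_branch_stalls(ideal_cpi, branch_fraction=0.30, stall_cycles=3)

# With equal clock rate and instruction count, relative speed is the CPI ratio.
relative_speed = ideal_cpi / new_cpi
branch_time_share = (0.30 * (1 + 3)) / new_cpi

print(f"new CPI = {new_cpi:.1f}")
print(f"relative speed = {relative_speed:.2f}")
print(f"time spent on branches = {branch_time_share:.0%}")
```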