Samira Khan University of Virginia Feb 14, 2017

Slides:



Advertisements
Similar presentations
Adding the Jump Instruction
Advertisements

CIS 314 Fall 2005 MIPS Datapath (Single Cycle and Multi-Cycle)
ELEN 468 Advanced Logic Design
1 RISC Pipeline Han Wang CS3410, Spring 2010 Computer Science Cornell University See: P&H Chapter 4.6.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University RISC Pipeline See: P&H Chapter 4.6.
Randal E. Bryant Carnegie Mellon University CS:APP2e CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture.
CS-447– Computer Architecture Lecture 12 Multiple Cycle Datapath
Computer ArchitectureFall 2007 © October 3rd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University.
Randal E. Bryant CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture SequentialImplementation Slides.
Datapath Design II Topics Control flow instructions Hardware for sequential machine (SEQ) Systems I.
1 Seoul National University Pipelined Implementation : Part I.
Randal E. Bryant Carnegie Mellon University CS:APP CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture.
Randal E. Bryant adapted by Jason Fritts CS:APP2e CS:APP Chapter 4 Computer Architecture SequentialImplementation CS:APP Chapter 4 Computer Architecture.
Datapath Design I Topics Sequential instruction execution cycle Instruction mapping to hardware Instruction decoding Systems I.
Computer Architecture I: Outline and Instruction Set Architecture
1 Sequential CPU Implementation. 2 Outline Logic design Organizing Processing into Stages SEQ timing Suggested Reading 4.2,4.3.1 ~
1 SEQ CPU Implementation. 2 Outline SEQ Implementation Suggested Reading 4.3.1,
Sequential Hardware “God created the integers, all else is the work of man” Leopold Kronecker (He believed in the reduction of all mathematics to arguments.
Rabi Mahapatra CS:APP3e Slides are from Authors: Bryant and O Hallaron.
Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.
Sequential CPU Implementation Implementation. – 2 – Processor Suggested Reading - Chap 4.3.
1 Seoul National University Sequential Implementation.
CPSC 121: Models of Computation
Real-World Pipelines Idea Divide process into independent stages
CS161 – Design and Architecture of Computer Systems
Design of Digital Circuits Lecture 13: Multi-Cycle Microarch.
Prof. Onur Mutlu Carnegie Mellon University Spring 2014, 1/24/2014
Samira Khan University of Virginia Feb 9, 2017
Computer Organization
Lecture 13 Y86-64: SEQ – sequential implementation
Lecture 14 Y86-64: PIPE – pipelined implementation
Morgan Kaufmann Publishers
Performance of Single-cycle Design
ELEN 468 Advanced Logic Design
Computer Architecture Lecture 6: Multi-cycle Microarchitectures
Morgan Kaufmann Publishers The Processor
Course Outline Background Sequential Implementation Pipelining
Decode and Operand Read
Sequential Implementation
Processor (I).
Design of the Control Unit for Single-Cycle Instruction Execution
Pipelined Implementation : Part I
Processor Architecture: The Y86-64 Instruction Set Architecture
Seoul National University
Seoul National University
Instruction Decoding Optional icode ifun valC Instruction Format
Pipelined Implementation : Part II
Design of the Control Unit for One-cycle Instruction Execution
Systems I Pipelining II
Pipelined Implementation : Part I
Serial versus Pipelined Execution
Pipelining in more detail
A Multiple Clock Cycle Instruction Implementation
Processor Architecture: The Y86-64 Instruction Set Architecture
Seoul National University
Fields in the FALCON-A Instruction
Pipeline Architecture I Slides from: Bryant & O’ Hallaron
Pipelined Implementation : Part I
Guest Lecturer TA: Shreyas Chand
Computer Architecture
Systems I Pipelining II
Morgan Kaufmann Publishers The Processor
Systems I Pipelining II
Introduction to Computer Organization and Architecture
Sequential CPU Implementation
The Processor: Datapath & Control.
Sequential Design תרגול 10.
CS161 – Design and Architecture of Computer Systems
Presentation transcript:

Samira Khan University of Virginia Feb 14, 2017 Intro to Microarchitecture: Single-Cycle CS 3330 Samira Khan University of Virginia Feb 14, 2017

AGENDA Review from last lecture Single-cycle microarchitecture Evaluation of single-cycle microarchitecture

Recap of Last Week ISA basics and tradeoffs Instruction length Uniform vs. non-uniform decode Number of registers Addressing modes RISC vs. CISC properties

A Note on RISC vs. CISC Usually, … RISC CISC Simple instructions Fixed length Uniform decode Few addressing modes CISC Complex instructions Variable length Non-uniform decode Many addressing modes

Y86-64 Instruction Set #1 Byte 1 2 3 4 5 6 7 8 9 halt nop 1 1 2 3 4 5 6 7 8 9 halt nop 1 cmovXX rA, rB 2 fn rA rB irmovq V, rB 3 F rB V rmmovq rA, D(rB) 4 rA rB D mrmovq D(rB), rA 5 rA rB D OPq rA, rB 6 fn rA rB jXX Dest 7 fn Dest call Dest 8 Dest ret 9 pushq rA A rA F popq rA B rA F

A Very Basic Instruction Processing Engine Single-cycle machine What is the clock cycle time determined by? What is the critical path of the combinational logic determined by? AS’ AS (State) Combinational Logic

Instruction Processing “Stage” Instructions are processed under the direction of a “control unit” step by step. Instruction stage: Sequence of steps to process an instruction Fundamentally, there are five phases: Fetch Decode Execute Memory Store Result Not all instructions require all stages

Single-cycle vs. Multi-cycle: Control & Data Single-cycle machine: Control signals are generated in the same clock cycle as the one during which data signals are operated on Everything related to an instruction happens in one clock cycle (serialized processing) Multi-cycle machine: Control signals needed in the next cycle can be generated in the current cycle Latency of control processing can be overlapped with latency of datapath operation (more parallelism)

A Single-Cycle Microarchitecture A Closer Look

Let’s Start with the State Elements Reg Write Data and control inputs Register file A B W dstW srcA valA srcB valB valW MUX 1 PC MUX Select Mem Write Instruction Mem Instr Addr A L U Operation B Data Mem Address Read Write Mem Read

Instruction Processing 6 (5) generic steps Instruction fetch (IF) Instruction decode and register operand fetch (ID/RF) Execute/Evaluate memory address (EX/AG) Memory operand fetch (MEM) Store/writeback result (WB) PC Update MEM WB IF ID/RF EX/AG Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE new PC

Instruction Processing 6 (5) generic steps Instruction fetch (IF) Instruction decode and register operand fetch (ID/RF) Execute/Evaluate memory address (EX/AG) Memory operand fetch (MEM) Store/writeback result (WB) PC Update MEM WB IF ID/RF EX/AG Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE new PC

Instruction Processing 6 (5) generic steps Instruction fetch (IF) Instruction decode and register operand fetch (ID/RF) Execute/Evaluate memory address (EX/AG) Memory operand fetch (MEM) Store/writeback result (WB) PC Update MEM WB IF ID/RF EX/AG Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE new PC

Instruction Processing 6 (5) generic steps Instruction fetch (IF) Instruction decode and register operand fetch (ID/RF) Execute/Evaluate memory address (EX/AG) Memory operand fetch (MEM) Store/writeback result (WB) PC Update MEM WB IF ID/RF EX/AG Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE new PC

Single-Cycle Datapath for Arithmetic and Logical Instructions

Executing Arith./Logical Operation 6 fn rA rB OPq rA, rB Fetch Read 2 bytes Decode Read operand registers Execute Perform operation Set condition codes Memory Do nothing Write back Update register PC Update Increment PC by 2 if MEM[PC] == OPq rA, rB R[rB]  R[rB] op R[rA] PC  PC + 2

Stage Computation: Arith/Log. Ops OPq rA, rB icode:ifun  M1[PC] rA:rB  M1[PC+1] valP  PC+2 Fetch Read instruction byte Read register byte Compute next PC valA  R[rA] valB  R[rB] Decode Read operand A Read operand B valE  valB OP valA Set CC Execute Perform ALU operation Set condition code register Memory R[rB]  valE Write back Write back result PC  valP PC update Update PC Formulate instruction execution as sequence of simple steps Use same general form for all instructions

ALU Datapath Combinational state update logic Register file ValB rA DestE valA A L U PC Instruction Mem Instr Addr Data Address Read Write rB ValE DD Reg ALU OP 2 IF ID EX MEM WB PC if MEM[PC] == OPq rA, rB R[rB]  R[rB] op R[rA] PC  PC + 2 Combinational state update logic **Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

ALU Datapath Combinational state update logic Register file ValB rA DestE valA A L U PC Instruction Mem Instr Addr Data Address Read Write rB ValE DD Reg ALU OP 2 IF ID EX MEM WB PC if MEM[PC] == OPq rA, rB R[rB]  R[rB] op R[rA] PC  PC + 2 Combinational state update logic **Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Single-Cycle Datapath for Data Movement Instructions

Executing mrmovq (Load from Mem to Reg) mrmovq D(rB) ,rA 6 fn rA rB D Fetch Read 10 bytes Decode Read operand registers Execute Compute effective address Memory Read from memory Write back Write to Register PC Update Increment PC by 10 if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10

Stage Computation: mrmovq mrmovq D(rB), rA Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte valC  M8[PC+2] Read displacement D valP  PC+10 Compute next PC Decode valB  R[rB] Read operand B valE  valB + valC Execute Compute effective address valM  M8[valE] Memory Write value to memory R[rA]  valM Write back PC  valP PC update Update PC Use ALU for address computation if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10

Ld Datapath Combinational state update logic Register file Instruction ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Ld Datapath: Calculate the Effective Address Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Ld Datapath: Loaded memory needs to be written to the Register File ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Ld Datapath: Need the Destination Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Ld Datapath: Update PC Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Executing rmmovq (St from reg to Memory) rmmovq rA, D(rB) 4 rA rB D Fetch Read 10 bytes Decode Read operand registers Execute Compute effective address Memory Write to memory Write back Do nothing PC Update Increment PC by 10 if MEM[PC]== rmmovq rA, Disp (rB) EA = Disp + R[rB] MEM[EA]  R[rA] PC  PC + 10

Stage Computation: rmmovq rmmovq rA, D(rB) Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte valC  M8[PC+2] Read displacement D valP  PC+10 Compute next PC valA  R[rA] valB  R[rB] Decode Read operand A Read operand B valE  valB + valC Execute Compute effective address M8[valE]  valA Memory Write value to memory Write back PC  valP PC update Update PC Use ALU for address computation if MEM[PC]== rmmovq rA, Disp (rB) EA = Disp + R[rB] MEM[EA]  R[rA] PC  PC + 10

St Datapath Combinational state update logic Register file Instruction ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== rmmovq rA, Disp (rB) EA = Disp + R[rB] MEM[EA]  R[rA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

St Datapath: Calculate the effective address Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== rnmovq rA, Disp (rB) EA = Disp + R[rB] MEM[EA]  R[rA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

St Datapath: Store data needs to reach memory Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== rnmovq rA, Disp (rB) EA = Disp + R[rB] MEM[EA]  R[rA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

St Datapath: Update PC Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== rnmovq rA, Disp (rB) EA = Disp + R[rB] MEM[EA]  R[rA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Executing irmovq (Move imm to Reg) irmovq V, rB 3 F rB V Fetch Read 10 bytes Decode Read operand registers Execute Add 0 to V Memory Do nothing Write back Write V to rB PC Update Increment PC by 10 if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10

Stage Computation: irmovq irmovq V, rB Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte valC  M8[PC+2] Read displacement D valP  PC+10 Compute next PC Decode valE  0 + valC Execute Compute effective address R[rB]  valA Memory Write value to memory Write back PC  valP PC update Update PC Use ALU for address computation if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10

irmov Datapath: Option 1 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

irmov Datapath Option 1: read register and add zero file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

irmov Datapath Option 1: Writeback value Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

irmov Datapath Option 1: Writeback value Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

irmov Datapath Option 1: Update PC Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V + 0 PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

irmov Datapath Option 2 Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

irmov Datapath: Option 2 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA D M U X From ALU 10 From Mem MUX Select if MEM[PC]== irmovq V, rB R[rB]  V PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Tradeoffs between option 1 and option 2? Extra 2-to-1 MUX? 3-to-1 MUX? What is the cost for n-to-1 MUX? Wire length? Generality?

Executing rrmovq (Move from Reg to Reg) rrmovq rA, rB 2 rA rB Memory Do nothing Write back Write val rA to rB PC Update Increment PC by 2 Fetch Read 2 bytes Decode Read operand register rA Execute Add 0 to val rA if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] + 0 PC  PC + 2

Stage Computation: rrmovq rrmovq rA, rB Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte Read displacement D valP  PC+2 Compute next PC ValA  R[rA] Decode valE  0 + valA Execute Compute effective address Memory Write value to memory R[rB]  valE Write back PC  valP PC update Update PC if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] + 0 PC  PC + 2

rrmov Datapath: Option 1 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 2 From Mem MUX Select if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] + 0 PC  PC + 2 IF ID EX MEM WB PC Combinational state update logic

rrmov Datapath: Option 1 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 2 From Mem MUX Select if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] + 0 PC  PC + 2 IF ID EX MEM WB PC Combinational state update logic

rrmov Datapath: Option 1 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 2 From Mem MUX Select if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] + 0 PC  PC + 2 IF ID EX MEM WB PC Combinational state update logic

rrmov Datapath: Option 1 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X rB M U X rA D M U X From ALU 2 From Mem MUX Select if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] + 0 PC  PC + 2 IF ID EX MEM WB PC Combinational state update logic

rrmov Datapath: Option 2 Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X rB M U X rA ? D M U X From ALU 10 From Mem MUX Select if MEM[PC]== rrmovq rA, rB R[rB]  R[rA] PC  PC + 2 IF ID EX MEM WB PC Combinational state update logic

Single-Cycle Datapath for Control Flow Instructions

Executing Jumps Fetch Decode Execute Memory Write back PC Update jmp 7 jle 1 jl 2 je 3 jne 4 jge 5 jg 6 jXX Dest 7 fn Dest Not taken fall thru: XX target: XX Taken Fetch Read 9 bytes Increment PC by 9 Decode Do nothing Execute Determine whether to take branch based on jump condition and condition codes Memory Do nothing Write back PC Update Set PC to Dest if branch taken or to incremented PC if not branch if MEM[PC]== jmp Dest if (cond) PC  Dest else PC  PC + 9

Stage Computation: Jumps jXX Dest icode:ifun  M1[PC] valC  M8[PC+1] valP  PC+9 Fetch Read instruction byte Read destination address Fall through address Decode Cnd  Cond(CC,ifun) Execute Take branch? Memory Write back PC  Cnd ? valC : valP PC update Update PC Compute both addresses Choose based on setting of condition codes and branch condition if MEM[PC]== jmp Dest if (cond) PC  Dest else PC  PC + 9

Jump Combinational state update logic Register file Instruction Mem ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X M U X rB M U X rA D 2 M U X From ALU M U X 10 From Mem 9 MUX Select IF ID EX MEM WB PC if MEM[PC]== jmp Dest PC  Dest Combinational state update logic

Unconditional Jump Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X M U X rB M U X rA D 2 M U X From ALU M U X 10 From Mem 9 MUX Select IF ID EX MEM WB PC if MEM[PC]== jmp Dest PC  Dest Combinational state update logic

Conditional Jump Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write M U X M U X M U X rB M U X rA D 2 M U X From ALU M U X 10 From Mem 9 MUX Select IF ID EX MEM WB PC if MEM[PC]== jmp Dest if (cond) PC  Dest else PC  PC + 9 Combinational state update logic

Executing call Fetch Decode Execute Memory Write back PC Update call Dest 8 Dest XX return: target: Fetch Read 9 bytes Increment PC by 9 Decode Read stack pointer Execute Decrement stack pointer by 8 Memory Write incremented PC to new value of stack pointer Write back Update stack pointer PC Update Set PC to Dest if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest

Stage Computation: call call Dest icode:ifun  M1[PC] valC  M8[PC+1] valP  PC+9 Fetch Read instruction byte Read destination address Compute return point valB  R[%rsp] Decode Read stack pointer valE  valB + –8 Execute Decrement stack pointer M8[valE]  valP Memory Write return value on stack R[%rsp]  valE Write back Update stack pointer PC  valC PC update Set PC to destination Use ALU to decrement stack pointer Store incremented PC if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest

Call Combinational state update logic Register file Instruction Mem ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write rB M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest IF ID EX MEM WB PC Combinational state update logic

Call: Decrement stack pointer Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write rB M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest IF ID EX MEM WB PC Combinational state update logic

Call: Write return address to memory Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write rB M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest IF ID EX MEM WB PC Combinational state update logic

Call: Update rsp in the register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write rB M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest IF ID EX MEM WB PC Combinational state update logic

Call: Update PC Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP Mem Write rB M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== call Dest MEM [R[rsp]-8]  PC + 9 R[rsp]  R[rsp] - 8 PC  Dest IF ID EX MEM WB PC Combinational state update logic

Executing ret Fetch Decode Execute Memory Write back PC Update 9 return: XX Fetch Read 1 byte Decode Read stack pointer Execute Increment stack pointer by 8 Memory Read return address from old stack pointer Write back Update stack pointer PC Update Set PC to return address if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Stage Computation: ret icode:ifun  M1[PC] Fetch Read instruction byte valA  R[%rsp] valB  R[%rsp] Decode Read operand stack pointer valE  valB + 8 Execute Increment stack pointer valM  M8[valA] Memory Read return address R[%rsp]  valE Write back Update stack pointer PC  valM PC update Set PC to return address Use ALU to increment stack pointer Read return address from memory if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Ret Combinational state update logic Register file Instruction Mem ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8 IF ID EX MEM WB PC Combinational state update logic

Ret: Increment stack pointer Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8 IF ID EX MEM WB PC Combinational state update logic

Ret: Read return address from Memory Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8 IF ID EX MEM WB PC Combinational state update logic

Ret: Update PC Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8 IF ID EX MEM WB PC Combinational state update logic

Ret: Update rsp Combinational state update logic Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== ret PC  MEM [R[rsp]] R[rsp]  R[rsp] + 8 IF ID EX MEM WB PC Combinational state update logic

Single-Cycle Datapath for Stack Operations

Executing popq Fetch Decode Execute Memory Write back PC Update popq rA b rA 8 Fetch Read 2 bytes Decode Read stack pointer Execute Increment stack pointer by 8 Memory Read from old stack pointer Write back Update stack pointer Write result to register PC Update Increment PC by 2 if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Stage Computation: popq popq rA icode:ifun  M1[PC] rA:rB  M1[PC+1] valP  PC+2 Fetch Read instruction byte Read register byte Compute next PC valA  R[%rsp] valB  R[%rsp] Decode Read stack pointer valE  valB + 8 Execute Increment stack pointer valM  M8[valA] Memory Read from stack R[%rsp]  valE R[rA]  valM Write back Update stack pointer Write back result PC  valP PC update Update PC Use ALU to increment stack pointer Must update two registers Popped value New stack pointer if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Pop Combinational state update logic Register file Instruction Mem ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8 IF ID EX MEM WB PC Combinational state update logic

Pop Data Instruction Register Mem file ValA rB DestE valB PC Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D M U X From ALU 10 From Mem 9 From PC + x MUX Select

Pop: Increment stack pointer Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D M U X From ALU 10 From Mem 9 From PC + x MUX Select if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Pop: Read pushed value from Memory Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D M U X From ALU 10 From Mem 9 From PC + x MUX Select if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Pop: Move the value to register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D M U X From ALU 10 From Mem 9 From PC + x MUX Select if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Pop: Writeback rsp value Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D M U X From ALU 10 From Mem 9 From PC + x MUX Select if MEM[PC]== pop R[rA]  MEM [R[rsp]] R[rsp]  R[rsp] + 8

Pop Data Instruction Register Mem file ValA rB DestE valB PC Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D M U X From ALU 10 From Mem 9 From PC + x MUX Select

Pop Data Instruction Register Mem file ValA rB DestE valB PC Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp ValM DestM 2 M U X D From ALU 10 From Mem 9 From PC + x

Evaluating the Single-Cycle Microarchitecture

A Single-Cycle Microarchitecture Is this a good idea/design? When is this a good design? When is this a bad design? How can we design a better microarchitecture?

A Single-Cycle Microarchitecture: Analysis Every instruction takes 1 cycle to execute CPI (Cycles per instruction) is strictly 1 How long each instruction takes is determined by how long the slowest instruction takes to execute Even though many instructions do not need that long to execute Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction Critical path of the design is determined by the processing time of the slowest instruction

What is the Slowest Instruction to Process? Let’s go back to the basics All six phases of the instruction processing cycle take a single machine clock cycle to complete Fetch Decode Evaluate Address Fetch Operands Execute Store Result Do each of the above phases take the same time (latency) for all instructions? 1. Instruction fetch (IF) 2. Instruction decode and register operand fetch (ID/RF) 3. Execute/Evaluate memory address (EX/AG) 4. Memory operand fetch (MEM) 5. Store/writeback result (WB)

Single-Cycle Datapath Analysis Assume memory units (read or write): 200 ps ALU and adders: 100 ps register file (read or write): 50 ps other combinational logic: 0 ps steps IF ID EX MEM WB Delay resources mem RF ALU Reg-Reg Op 200 50 100 400 ir mov LW 600 SW 550 Call Jump

Ld Datapath 350 ps 250 ps 200 ps 550 ps 600 ps 100 ps Combinational Register file ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP 350 ps 250 ps 200 ps 550 ps M U X rB M U X rA 600 ps D 100 ps M U X From ALU 10 From Mem MUX Select if MEM[PC]== mrmovq Disp (rB), rA EA = Disp + R[rB] R[rA]  MEM[EA] PC  PC + 10 IF ID EX MEM WB PC Combinational state update logic

Single Cycle uArch: Complexity Contrived All instructions run as slow as the slowest instruction Inefficient Must provide worst-case combinational resources in parallel as required by any instruction Need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle Not necessarily the simplest way to implement an ISA Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)? Not easy to optimize/improve performance Optimizing the common case does not work (e.g. common instructions) Need to optimize the worst case all the time

(Micro)architecture Design Principles Critical path design Find and decrease the maximum combinational logic delay Break a path into multiple cycles if it takes too long Bread and butter (common case) design Spend time and resources on where it matters most i.e., improve what the machine is really designed to do Common case vs. uncommon case Balanced design Balance instruction/data flow through hardware components Design to eliminate bottlenecks: balance the hardware for the work

Single-Cycle Design vs. Design Principles Critical path design Bread and butter (common case) design Balanced design How does a single-cycle microarchitecture fare in light of these principles?

Samira Khan University of Virginia Feb 14, 2017 Intro to Microarchitecture: Single-Cycle CS 3330 Samira Khan University of Virginia Feb 14, 2017

Stage Computation: Cond. Move cmovXX rA, rB icode:ifun  M1[PC] rA:rB  M1[PC+1] valP  PC+2 Fetch Read instruction byte Read register byte Compute next PC valA  R[rA] valB  0 Decode Read operand A valE  valB + valA If ! Cond(CC,ifun) rB  0xF Execute Pass valA through ALU (Disable register update) Memory R[rB]  valE Write back Write back result PC  valP PC update Update PC Read register rA and pass through ALU Cancel move by setting destination register to 0xF If condition codes & move condition indicate no move if MEM[PC]== cmov rA, rB if (cond) R[rB]  R[rA] If (!cond) R[rB]  0xF

Combinational state update logic Register file Instruction Mem Data ValA rB DestE valB A L U PC Instruction Mem Instr Addr Data Address Read Write rA ValE DD Reg ALU OP D Mem Write rB M U X M U X M U X M U X rsp M U X rB 8 M U X rA M U X rsp D 2 M U X From ALU M U X 10 From Mem 9 MUX Select From PC + x if MEM[PC]== cmov rA, rB if (cond) R[rB]  R[rA] If (!cond) R[rB]  0xF IF ID EX MEM WB PC Combinational state update logic