Single-Cycle DataPath

Presentation on theme: "Single-Cycle DataPath"— Presentation transcript:

1 Single-Cycle DataPath
Lecture 15 CDA 3103

2 Review of Virtual Memory
Next level in the memory hierarchy
Provides illusion of very large main memory
Working set of “pages” residing in main memory (subset of all pages residing on disk)
Main goal: Avoid reaching all the way back to disk as much as possible
Additional goals:
  Let OS share memory among many programs and protect them from each other
  Each process thinks it has all the memory to itself

3 Review: Paging Terminology
Programs use virtual addresses (VAs)
  Space of all virtual addresses called virtual memory (VM)
  Divided into pages indexed by virtual page number (VPN)
Main memory indexed by physical addresses (PAs)
  Space of all physical addresses called physical memory (PM)
  Divided into pages indexed by physical page number (PPN)
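
The VPN/PPN terminology can be made concrete with a small sketch. It assumes a 4 KiB page size, which the slides do not specify, and uses a plain dictionary as a hypothetical page table; the function names are illustrative only.

```python
# Assumed page size: 4 KiB (the slides do not give one).
PAGE_SIZE = 4096
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits of page offset


def split_virtual_address(va):
    """Split a virtual address into (VPN, page offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)


def translate(va, page_table):
    """Translate a VA to a PA using a hypothetical VPN -> PPN mapping."""
    vpn, offset = split_virtual_address(va)
    ppn = page_table[vpn]  # a missing key stands in for a page fault
    return (ppn << OFFSET_BITS) | offset
```

For example, with the mapping `{0x12: 0x7}`, the virtual address `0x12345` has VPN `0x12` and offset `0x345`, so it translates to physical address `0x7345`. The offset passes through unchanged; only the page number is replaced.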

4 Review: Translation Look-Aside Buffers (TLBs)
TLBs usually small, typically entries
Like any other cache, the TLB can be direct mapped, set associative, or fully associative
On TLB miss, get page table entry from main memory
[Figure: Processor, VA, TLB Lookup (hit/miss), Translation, PA, Cache (hit/miss), Main Memory, data]

5 Review: Memory Hierarchy
[Figure: memory hierarchy from Upper Level (Faster) to Lower Level (Larger), with the unit moved between adjacent levels]
Regs
  Instr. Operands
Cache
  Blocks
L2 Cache
  Blocks
Memory
  Pages (Last Week: Virtual Memory)
Disk
  Files
Tape

6 Review Example 1
A set-associative cache consists of 64 lines, or slots, divided into four-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

7 Solution
The cache is divided into 16 sets of 4 lines each. Therefore, 4 bits are needed to identify the set number. Main memory consists of 4K = 2^12 blocks. Therefore, the set plus tag lengths must be 12 bits, and so the tag length is 8 bits. Each block contains 128 = 2^7 words. Therefore, 7 bits are needed to specify the word. The address format is tag (8 bits), set (4 bits), word (7 bits).

8 Review Example 2
A two-way set-associative cache has lines of 16 bytes and a total size of 8 kbytes. The 64-Mbyte main memory is byte addressable. Show the format of main memory addresses.

9 Solution
There are a total of 8 kbytes/16 bytes = 512 lines in the cache. Thus the cache consists of 256 sets of 2 lines each. Therefore 8 bits are needed to identify the set number. For the 64-Mbyte main memory, a 26-bit address is needed. Main memory consists of 64 Mbytes/16 bytes = 2^22 blocks. Therefore, the set plus tag lengths must be 22 bits, so the tag length is 14 bits and the word field length is 4 bits.
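
The arithmetic in both review solutions follows the same pattern, which can be sketched as below. The helper names `bits_for` and `address_format` are hypothetical, and the sketch assumes every quantity is a power of two, as in the two examples.

```python
def bits_for(count):
    """Number of address bits needed to index `count` items (power of two)."""
    if count <= 0 or count & (count - 1):
        raise ValueError("count must be a power of two")
    return count.bit_length() - 1


def address_format(cache_lines, lines_per_set, memory_blocks, units_per_block):
    """Return (tag_bits, set_bits, word_bits) for a set-associative cache."""
    set_bits = bits_for(cache_lines // lines_per_set)
    block_bits = bits_for(memory_blocks)  # tag bits + set bits
    word_bits = bits_for(units_per_block)
    return block_bits - set_bits, set_bits, word_bits


# Review Example 1: 64 lines, 4-line sets, 4K blocks of 128 words.
example_1 = address_format(64, 4, 4 * 1024, 128)

# Review Example 2: 8 kbytes of 16-byte lines, two-way, 64-Mbyte memory.
example_2 = address_format((8 * 1024) // 16, 2, (64 * 2**20) // 16, 16)
```

Here `example_1` is `(8, 4, 7)` and `example_2` is `(14, 8, 4)`, matching the tag, set, and word field widths derived in the two solutions.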

10 Agenda
Stages of the Datapath
Datapath Instruction Walkthroughs
Datapath Design
Dr Dan Garcia

11 Five Components of a Computer
Computer
  Processor
    Control
    Datapath
  Memory (passive) (where programs, data live when running)
Devices
  Input: Keyboard, Mouse
  Disk (where programs, data live when not running)
  Output: Display, Printer

12 The CPU
Processor (CPU): the active part of the computer that does all the work (data manipulation and decision-making)
Datapath: portion of the processor that contains hardware necessary to perform operations required by the processor (the brawn)
Control: portion of the processor (also in hardware) that tells the datapath what needs to be done (the brain)

13 Stages of the Datapath : Overview
Problem: a single, atomic block that “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient
Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath
  smaller stages are easier to design
  easy to optimize (change) one stage without touching the others

14 Five Stages of the Datapath
Stage 1: Instruction Fetch
Stage 2: Instruction Decode
Stage 3: ALU (Arithmetic-Logic Unit)
Stage 4: Memory Access
Stage 5: Register Write

15 Stages of the Datapath (1/5)
There is a wide variety of MIPS instructions: so what general steps do they have in common?
Stage 1: Instruction Fetch
  no matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy)
  also, this is where we increment the PC (that is, PC = PC + 4, to point to the next instruction: byte addressing, so + 4)
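
A minimal sketch of Stage 1, assuming instruction memory is a Python `bytes` object and that words are stored big-endian (MIPS can be configured either way; big-endian is only an assumption here). The function name `fetch` is illustrative.

```python
def fetch(instruction_memory, pc):
    """Stage 1: read the 32-bit word at `pc` and compute PC + 4."""
    word = int.from_bytes(instruction_memory[pc:pc + 4], "big")  # assumed big-endian
    return word, pc + 4


# Two encoded instructions: add $3,$1,$2 (0x00221820) and addi $3,$1,17 (0x20230011).
program = bytes.fromhex("00221820" "20230011")
```

Calling `fetch(program, 0)` returns the first instruction word together with the incremented PC, 4; feeding that PC back in fetches the second word.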

16 Stages of the Datapath (2/5)
Stage 2: Instruction Decode
  upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data)
  first, read the opcode to determine instruction type and field lengths
  second, read in data from all necessary registers
    for add, read two registers
    for addi, read one register
    for jal, no reads necessary
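
The decode step can be sketched as plain bit slicing on the 32-bit word. The field positions and the opcodes used (0x00 for R-type, 0x08 for addi, 0x03 for jal) are the standard MIPS32 encodings; the function names are hypothetical, and only the three instructions named on the slide are handled.

```python
def decode(word):
    """Extract every MIPS field; which ones are meaningful depends on the opcode."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
        "imm": word & 0xFFFF,
        "target": word & 0x3FFFFFF,
    }


def registers_to_read(word):
    """Register numbers read in Stage 2 for the slide's three examples."""
    fields = decode(word)
    if fields["opcode"] == 0x00:  # R-type such as add: two source registers
        return [fields["rs"], fields["rt"]]
    if fields["opcode"] == 0x08:  # addi: one source register
        return [fields["rs"]]
    if fields["opcode"] == 0x03:  # jal: no register reads
        return []
    raise NotImplementedError("only add, addi and jal are sketched here")
```

For instance, `0x00221820` (add $3,$1,$2) reads registers 1 and 2, while `0x20230011` (addi $3,$1,17) reads only register 1.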

17 Stages of the Datapath (3/5)
Stage 3: ALU (Arithmetic-Logic Unit)
  the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt)
  what about loads and stores?
    lw $t0, 40($t1)
    the address we are accessing in memory = the value in $t1 PLUS the value 40
    so we do this addition in this stage

18 Stages of the Datapath (4/5)
Stage 4: Memory Access
  actually only the load and store instructions do anything during this stage; the others remain idle during this stage or skip it altogether
  since these instructions have a unique step, we need this extra stage to account for them
  as a result of the cache system, this stage is expected to be fast

19 Stages of the Datapath (5/5)
Stage 5: Register Write
  most instructions write the result of some computation into a register
  examples: arithmetic, logical, shifts, loads, slt
  what about stores, branches, jumps?
    don’t write anything into a register at the end
    these remain idle during this fifth stage or skip it altogether

20 Logic Design Basics
Morgan Kaufmann Publishers, 19 September, 2018
§4.2 Logic Design Conventions (Chapter 4: The Processor)
Information encoded in binary
  Low voltage = 0, High voltage = 1
  One wire per bit
  Multi-bit data encoded on multi-wire buses
Combinational element
  Operate on data
  Output is a function of input
State (sequential) elements
  Store information

21 Combinational Elements
AND-gate: Y = A & B
Adder: Y = A + B
Arithmetic/Logic Unit: Y = F(A, B)
Multiplexer: Y = S ? I1 : I0
[Figure: symbols for the AND-gate, adder, ALU, and mux]
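
Because a combinational element's output is a function of its inputs only, each one can be sketched as a pure function. The 32-bit width and the function names are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit buses


def and_gate(a, b):
    """Y = A & B"""
    return a & b


def adder(a, b):
    """Y = A + B, wrapping around at 32 bits like a hardware adder."""
    return (a + b) & MASK32


def mux(s, i0, i1):
    """Y = S ? I1 : I0"""
    return i1 if s else i0
```

Calling any of these twice with the same arguments gives the same result, which is exactly what distinguishes them from state elements.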

22 ALU Control
§4.4 A Simple Implementation Scheme
ALU used for
  Load/Store: F = add
  Branch: F = subtract
  R-type: F depends on funct field

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
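
The ALU control table can be read as a dispatch on the 4-bit control value. The following is a behavioral sketch only, assuming 32-bit operands held as unsigned Python integers; `alu` and `to_signed32` are hypothetical names. It also returns the Zero output that branches check.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the ALU control codes in the table."""
    if control == 0b0000:
        result = a & b                                      # AND
    elif control == 0b0001:
        result = a | b                                      # OR
    elif control == 0b0010:
        result = a + b                                      # add
    elif control == 0b0110:
        result = a - b                                      # subtract
    elif control == 0b0111:
        result = int(to_signed32(a) < to_signed32(b))       # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b)                                   # NOR
    else:
        raise ValueError("unsupported ALU control code")
    result &= MASK32
    return result, result == 0
```

A branch-equal comparison, for example, is `alu(0b0110, x, y)`: the operands are equal exactly when the returned zero flag is true.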

23 Sequential Elements
Register: stores data in a circuit
  Uses a clock signal to determine when to update the stored value
  Edge-triggered: update when Clk changes from 0 to 1
[Figure: register symbol and timing diagram with signals D, Clk, Q]

24 Sequential Elements
Register with write control
  Only updates on clock edge when write control input is 1
  Used when stored value is required later
[Figure: register symbol and timing diagram with signals D, Clk, Write, Q]
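
A behavioral sketch of an edge-triggered register with write control follows. The class and method names are hypothetical; the model simply remembers the previous clock level so it can detect a 0-to-1 transition.

```python
class Register:
    """Edge-triggered register: Q takes D on a rising Clk edge when Write is 1."""

    def __init__(self, value=0):
        self.q = value
        self._previous_clk = 0

    def tick(self, clk, d, write=1):
        """Present the current Clk, D and Write levels; return Q afterwards."""
        rising_edge = self._previous_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d
        self._previous_clk = clk
        return self.q
```

Holding `clk` at 1 while `d` changes leaves Q untouched, and a rising edge with `write=0` is ignored, which is the behavior the two slides describe.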

25 Clocking Methodology
Combinational logic transforms data during clock cycles
  Between clock edges
  Input from state elements, output to state element
  Longest delay determines clock period

26 Building a Datapath
§4.3 Building a Datapath
Datapath
  Elements that process data and addresses in the CPU
    Registers, ALUs, mux’s, memories, …
We will build a MIPS datapath incrementally
  Refining the overview design

27 Instruction Fetch
[Figure: instruction fetch datapath. Annotations: 32-bit register; Increment by 4 for next instruction]

28 R-Format Instructions
Read two register operands
Perform arithmetic/logical operation
Write register result

29 Load/Store Instructions
Read register operands
Calculate address using 16-bit offset
  Use ALU, but sign-extend offset
Load: Read memory and update register
Store: Write register value to memory
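
The address calculation for loads and stores can be sketched as a sign extension followed by a 32-bit add. The function names are illustrative, and registers are assumed to hold unsigned 32-bit values.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of `imm` as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, imm16):
    """Base register value plus the sign-extended 16-bit offset, modulo 2**32."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF
```

With a base of `0x1000`, an offset field of 40 gives `0x1028`, while an offset field of `0xFFFC` (that is, -4) gives `0x0FFC`. Without sign extension the second case would wrongly add 65532.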

30 Branch Instructions
Read register operands
Compare operands
  Use ALU, subtract and check Zero output
Calculate target address
  Sign-extend displacement
  Shift left 2 places (word displacement)
  Add to PC + 4
    Already calculated by instruction fetch
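
The branch steps can be sketched for a beq-style branch as follows. The function names are hypothetical, and the comparison is modeled as a 32-bit subtraction whose Zero output selects between the branch target and PC + 4.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of `imm` as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """(PC + 4) plus the sign-extended word displacement shifted left by 2."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & MASK32


def next_pc_beq(pc, a, b, imm16):
    """Take the branch when the ALU subtraction a - b produces Zero."""
    zero = ((a - b) & MASK32) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & MASK32
```

A displacement of -1 (field value `0xFFFF`) from PC `0x100` targets `0x100` itself, because the displacement is relative to PC + 4 and counts words, not bytes.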

31 Branch Instructions
[Figure: branch datapath. Annotations: Just re-routes wires; Sign-bit wire replicated]

32 Composing the Elements
First-cut data path does an instruction in one clock cycle
  Each datapath element can only do one function at a time
  Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions

33 R-Type/Load/Store Datapath
[Figure: R-Type/Load/Store datapath]

34 Full Datapath
[Figure: full datapath]

35 Generic Steps of Datapath
[Figure: generic datapath with PC and +4, instruction memory, registers (rs, rt, rd), imm, ALU, and Data memory. Stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write]
missing: multiplexors or “data selectors”; where should they be in this picture and why?
also missing: opcode for control of what operations to perform
state elements vs combinational ones: a combinational element given the same input will always produce the same output (the output depends only on the current input)

36 Datapath Walkthroughs (1/3)
add $r3,$r1,$r2 # r3 = r1+r2

37 Datapath Walkthroughs (1/3)
add $r3,$r1,$r2 # r3 = r1+r2
Stage 1: fetch this instruction, increment PC
Stage 2: decode to determine it is an add, then read registers $r1 and $r2
Stage 3: add the two values retrieved in Stage 2
Stage 4: idle (nothing to write to memory)
Stage 5: write result of Stage 3 into register $r3

38 Example: add Instruction
[Figure: add r3, r1, r2 on the generic datapath. Register numbers 1, 2, and 3 come from the instruction; the registers output reg[1] and reg[2]; the ALU computes reg[1] + reg[2]]

39 Datapath Walkthroughs (2/3)
slti $r3,$r1,17 # if (r1 < 17) r3 = 1 else r3 = 0

40 Datapath Walkthroughs (2/3)
slti $r3,$r1,17 # if (r1 < 17) r3 = 1 else r3 = 0
Stage 1: fetch this instruction, increment PC
Stage 2: decode to determine it is an slti, then read register $r1
Stage 3: compare value retrieved in Stage 2 with the integer 17
Stage 4: idle
Stage 5: write the result of Stage 3 (1 if reg source was less than signed immediate, 0 otherwise) into register $r3

41 Example: slti Instruction
[Figure: slti r3, r1, 17 on the generic datapath. The registers output reg[1]; the ALU evaluates reg[1] < 17? using the immediate 17]

42 Datapath Walkthroughs (3/3)
sw $r3,17($r1) # Mem[r1+17]=r3

43 Datapath Walkthroughs (3/3)
sw $r3,17($r1) # Mem[r1+17]=r3
Stage 1: fetch this instruction, increment PC
Stage 2: decode to determine it is a sw, then read registers $r1 and $r3
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2) to compute address
Stage 4: write the value contained in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3
Stage 5: idle (nothing to write into a register)

44 Example: sw Instruction
[Figure: sw r3, 17(r1) on the generic datapath. The registers output reg[1] and reg[3]; the ALU computes reg[1] + 17 using the immediate 17; Data memory performs MEM[r1+17] <= r3]

45 Why Five Stages? (1/2)
Could we have a different number of stages?
  Yes, and other architectures do
So why does MIPS have five if instructions tend to idle for at least one stage?
  Five stages are the union of all the operations needed by all the instructions.
  One instruction uses all five stages: the load

46 Why Five Stages? (2/2)
lw $r3,17($r1) # r3=Mem[r1+17]
Stage 1: fetch this instruction, increment PC
Stage 2: decode to determine it is a lw, then read register $r1
Stage 3: add 17 to value in register $r1 (retrieved in Stage 2)
Stage 4: read value from memory address computed in Stage 3
Stage 5: write value read in Stage 4 into register $r3

47 Example: lw Instruction
[Figure: lw r3, 17(r1) on the generic datapath. The registers output reg[1]; the ALU computes reg[1] + 17 using the immediate 17; Data memory reads MEM[r1+17]]
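
The four walkthroughs (add, slti, sw, lw) can be sketched as one function that pushes a single instruction through all five stages. This is a behavioral sketch under stated assumptions, not the slides' hardware: memories are Python dictionaries keyed by address, alignment is not checked (the slides' address r1+17 would not be word-aligned on real MIPS), and the helper names `r_type`, `i_type`, and `step` are hypothetical. The opcode and funct values are the standard MIPS32 encodings.

```python
MASK32 = 0xFFFFFFFF
ADD_FUNCT, SLTI, LW, SW = 0x20, 0x0A, 0x23, 0x2B


def sign_extend16(imm):
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def to_signed32(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def r_type(rs, rt, rd, funct):
    """Encode an R-format instruction (opcode 0, shamt 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def i_type(opcode, rs, rt, imm):
    """Encode an I-format instruction."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def step(pc, regs, imem, dmem):
    """Run one instruction through the five stages and return the next PC."""
    # Stage 1: Instruction Fetch, increment PC
    word = imem[pc]
    next_pc = pc + 4
    # Stage 2: Decode / Register Read
    opcode = (word >> 26) & 0x3F
    rs, rt, rd = (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F
    funct = word & 0x3F
    imm = sign_extend16(word)
    a, b = regs[rs], regs[rt]
    # Stage 3: ALU
    if opcode == 0 and funct == ADD_FUNCT:
        result, dest = (a + b) & MASK32, rd
    elif opcode == SLTI:
        result, dest = int(to_signed32(a) < imm), rt
    elif opcode in (LW, SW):
        result, dest = (a + imm) & MASK32, rt  # address computation
    else:
        raise NotImplementedError(hex(word))
    # Stage 4: Memory Access (idle for add and slti)
    if opcode == LW:
        result = dmem.get(result, 0)
    elif opcode == SW:
        dmem[result] = b
        dest = None  # stores write no register
    # Stage 5: Register Write (idle for sw; register 0 stays zero)
    if dest:
        regs[dest] = result
    return next_pc


program = {
    0: r_type(1, 2, 3, ADD_FUNCT),  # add  $r3,$r1,$r2
    4: i_type(SLTI, 1, 3, 17),      # slti $r3,$r1,17
    8: i_type(SW, 1, 3, 17),        # sw   $r3,17($r1)
    12: i_type(LW, 1, 3, 17),       # lw   $r3,17($r1)
}
```

Starting with register 1 holding 5 and register 2 holding 7, four calls to `step` leave 12 and then 1 in register 3, store that 1 at address 22, and load it back. Only the lw path touches every stage, which is the point made on the "Why Five Stages?" slides.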

48 Peer Instruction
How many places in this diagram will need a multiplexor to select one from multiple inputs?
a) 0
b) 1
c) 2
d) 3
e) 4 or more

49 Peer Instruction
Can’t just join wires together
  Use multiplexers

50 Datapath and Control
Datapath based on data transfers required to perform instructions
Controller causes the right transfers to happen
[Figure: generic datapath (PC, +4, instruction memory, registers with rs, rt, rd, imm, ALU, Data memory) driven by a Controller that takes opcode, funct]

51 What Hardware Is Needed? (1/2)
PC: a register that keeps track of the address of the next instruction to be fetched
General Purpose Registers
  Used in Stages 2 (Read) and 5 (Write)
  MIPS has 32 of these
Memory
  Used in Stages 1 (Fetch) and 4 (R/W)
  Caches make these stages as fast as the others (on average; otherwise there is a multicycle stall)

52 What Hardware Is Needed? (2/2)
ALU
  Used in Stage 3
  Performs all necessary functions: arithmetic, logicals, etc.
Miscellaneous Registers
  One stage per clock cycle: Registers inserted between stages to hold intermediate data and control signals as they travel from stage to stage
  Note: Register is a general purpose term meaning something that stores bits. Realize that not all registers are in the “register file”

53 CPU Clocking (1/2)
For each instruction, how do we control the flow of information through the datapath?
Single Cycle CPU: All stages of an instruction completed within one long clock cycle
  Clock cycle sufficiently long to allow each instruction to complete all stages without interruption within one cycle
[Figure: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write]

54 CPU Clocking (2/2)
Alternative multiple-cycle CPU: only one stage of instruction per clock cycle
  Clock is made as long as the slowest stage
  Several significant advantages over single cycle execution: Unused stages in a particular instruction can be skipped OR instructions can be pipelined (overlapped)
[Figure: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Register Write]
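
The two clocking schemes can be compared with a little arithmetic. The stage delays below are assumed, illustrative numbers that do not come from the slides, and the function names are hypothetical; the per-instruction stage lists follow the walkthroughs.

```python
# Assumed stage delays in picoseconds; illustrative values, not from the slides.
STAGE_DELAY_PS = {"IF": 200, "ID": 100, "ALU": 200, "MEM": 200, "WR": 100}

# Stages each instruction actually needs, following the walkthroughs.
STAGES_USED = {
    "lw": ["IF", "ID", "ALU", "MEM", "WR"],
    "sw": ["IF", "ID", "ALU", "MEM"],
    "add": ["IF", "ID", "ALU", "WR"],
}


def single_cycle_period():
    """One long cycle must fit the slowest instruction's whole path."""
    return max(
        sum(STAGE_DELAY_PS[stage] for stage in stages)
        for stages in STAGES_USED.values()
    )


def multi_cycle_period():
    """One short cycle per stage must fit the slowest stage."""
    return max(STAGE_DELAY_PS.values())


def multi_cycle_time(instruction):
    """Time for one instruction when unused stages are skipped."""
    return len(STAGES_USED[instruction]) * multi_cycle_period()
```

Under these assumptions the single-cycle clock period is 800 ps, set by lw, and the multi-cycle period is 200 ps. A lone lw then takes 1000 ps in the multi-cycle design, so with these numbers the gain comes mainly from overlapping instructions rather than from any single instruction finishing sooner.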

55 Processor Design
Analyze instruction set architecture (ISA) to determine datapath requirements
  Meaning of each instruction is given by register transfers
  Datapath must include storage element for ISA registers
  Datapath must support each register transfer
Select set of datapath components and establish clocking methodology
Assemble datapath components to meet requirements
Analyze each instruction to determine sequence of control point settings to implement the register transfer
Assemble the control logic to perform this sequencing

56 Instruction Level Parallelism
[Figure: Instr 1 through Instr 8, each passing through the stages IF, ID, ALU, MEM, WR]
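
One way to picture the overlap is to list which instruction occupies which stage in each clock cycle. The sketch below assumes an ideal pipeline in which a new instruction enters IF every cycle and nothing stalls; hazards are ignored, and the function names are hypothetical.

```python
STAGES = ["IF", "ID", "ALU", "MEM", "WR"]


def pipeline_schedule(num_instructions):
    """Map each 1-based clock cycle to the (instruction, stage) pairs active in it."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(STAGES):
            # Instruction i+1 enters IF in cycle i+1 and advances one stage per cycle.
            schedule.setdefault(i + s + 1, []).append((i + 1, stage))
    return schedule


def total_cycles(num_instructions):
    """Cycles needed to finish all instructions in the ideal pipeline."""
    return num_instructions + len(STAGES) - 1 if num_instructions else 0
```

For eight instructions this gives 12 cycles instead of the 40 that running them strictly one after another at one stage per cycle would take. In cycle 5 all five stages are busy at once, with Instr 1 in WR and Instr 5 in IF.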

57 Summary
CPU design involves Datapath, Control
5 Stages for MIPS Instructions
  Instruction Fetch
  Instruction Decode & Register Read
  ALU (Execute)
  Memory
  Register Write
Datapath timing: single long clock cycle or one short clock cycle per stage

