15-447 Computer Architecture, Fall 2007 ©
October 3rd, 2007
Majd F. Sakr
msakr@qatar.cmu.edu
www.qatar.cmu.edu/~msakr/15447-f07/
CS-447 - Computer Architecture, M,W 10-11:20am
Lecture 11: Single Cycle Datapath
Lecture Objectives
• Learn what a datapath is and how it provides the required functions.
• Appreciate why different implementation strategies affect the clock rate and CPI of a machine.
• Understand how the ISA determines many aspects of the hardware implementation.
Implementation vs. Performance
• The performance of a processor is determined by:
  - the instruction count of a program
  - the CPI (clock cycles per instruction)
  - the clock cycle time (clock rate)
• The compiler and the ISA determine the instruction count.
• The implementation of the processor determines the CPI and the clock cycle time.
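The three factors multiply into execution time. A minimal Python sketch; the function name and the sample numbers (100 instructions, CPI of 1, 8 ns clock) are illustrative assumptions:

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical program: 100 instructions on a single-cycle machine (CPI = 1)
# whose clock period is 8 ns.
print(cpu_time_ns(100, 1.0, 8.0))  # 800.0
```

The compiler and ISA move the first argument; the processor implementation moves the other two.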
Possible Execution Steps of Any Instruction
• Instruction fetch
• Instruction decode and register fetch
• Execution of memory-reference instructions
• Execution of arithmetic-logical operations
• Execution of branch instructions
• Execution of jump instructions
Instruction Processing
• Five steps:
  - Instruction fetch (IF)
  - Instruction decode and operand fetch (ID)
  - ALU/execute (EX)
  - Memory access (MEM; not required by every instruction)
  - Write-back (WB)
[Figure: IF → ID → EX → MEM → WB]
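Not every instruction needs all five steps. The sketch below records which steps each instruction class uses, under a MIPS-style assumption that matches the timing breakdown later in the lecture; the names `STAGES`, `USES`, and `stages_for` are hypothetical.

```python
# Stages of the five-step instruction cycle, in order.
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Stages each instruction class needs (MIPS-style assumption).
USES = {
    "add": {"IF", "ID", "EX", "WB"},         # no data-memory access
    "lw":  {"IF", "ID", "EX", "MEM", "WB"},  # the only class using all five
    "sw":  {"IF", "ID", "EX", "MEM"},        # nothing written back to a register
    "beq": {"IF", "ID", "EX"},               # EX compares the two registers
}

def stages_for(instr: str) -> list[str]:
    """Return the stages used by an instruction class, in cycle order."""
    return [s for s in STAGES if s in USES[instr]]

for name in USES:
    print(f"{name:4} {' '.join(stages_for(name))}")
```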
Datapath & Control
[Figure: datapath and control unit]
Datapath Elements
The datapath contains two types of logic elements:
• Combinational (e.g., the ALU): elements that operate on data values. Their outputs depend only on their current inputs.
• State (e.g., registers and memory): elements with internal storage. Their state is defined by the values they contain.
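The distinction can be sketched in Python: a combinational element is a pure function, while a state element keeps a value that changes only when it is clocked. `alu_add` and `Register` are hypothetical names, and the 32-bit width is an assumption.

```python
def alu_add(a: int, b: int) -> int:
    """Combinational: output is a pure function of the current inputs (32-bit wrap)."""
    return (a + b) & 0xFFFFFFFF

class Register:
    """State element: holds a value that changes only on a clock edge."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def clock(self, d: int, write_enable: bool = True) -> None:
        # On the clock edge, latch input d only if writing is enabled.
        if write_enable:
            self.value = d & 0xFFFFFFFF

r = Register()
print(r.value)          # 0: the state is whatever was stored last
r.clock(alu_add(5, 7))
print(r.value)          # 12
```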
State Elements
Pentium Processor Die
[Figure: die regions labeled State (registers, memory), Control (ROM), and Combinational logic (compute)]
Abstract View of the Datapath
Single Cycle Implementation
• This simple processor completes each instruction in a single clock cycle, whether it computes an ALU result, accesses memory, or computes the next instruction's address.
Program Counter
If each instruction occupies 4 memory locations (4 bytes in a byte-addressed memory), then:
Next PC <= PC + 4
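The sequential PC update is a single adder with the constant 4. A minimal sketch, assuming a 32-bit PC; the start address is hypothetical:

```python
WORD_BYTES = 4  # each instruction occupies 4 byte-addressed locations

def next_pc(pc: int) -> int:
    """Sequential next PC: PC + 4, wrapped to 32 bits like a hardware adder."""
    return (pc + WORD_BYTES) & 0xFFFFFFFF

pc = 0x00400000  # hypothetical start address
for _ in range(3):
    print(hex(pc))  # 0x400000, 0x400004, 0x400008
    pc = next_pc(pc)
```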
PC Datapath: Branch Offset
PC <= PC + Branch Offset
Abstract View After PC: Basic Implementation
The Register File
• Arithmetic and logical instructions (R-type) read the contents of two registers, perform an ALU operation, and write the result back to a register.
• Registers are stored in the register file. The register file has inputs that specify the registers to read and write, outputs for the data read, an input for the data to be written, and one control signal that decides whether the data is written. In addition, we need an ALU to perform the operations.
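A register file with two read ports and one write port can be sketched as follows. The class name, the `reg_write` flag name for the write-control signal, the 32 x 32-bit size, and register 0 being hardwired to zero are assumptions taken from MIPS conventions, not from the slide.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port.

    reg_write stands in for the single write-control signal; register 0
    is hardwired to zero as in MIPS (assumption).
    """

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rs: int, rt: int) -> tuple[int, int]:
        # Reads behave combinationally: both outputs show current contents.
        return self.regs[rs], self.regs[rt]

    def write(self, rd: int, data: int, reg_write: bool) -> None:
        # Happens at the clock edge, and only when the control signal is set.
        if reg_write and rd != 0:
            self.regs[rd] = data & 0xFFFFFFFF

rf = RegisterFile()
rf.write(8, 40, reg_write=True)
rf.write(9, 2, reg_write=True)
rf.write(10, 99, reg_write=False)  # ignored: write not enabled
print(rf.read(8, 9), rf.regs[10])  # (40, 2) 0
```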
The Register File
R-Type Instructions
• Assembly (e.g., register-register signed addition): ADD rd, rs, rt (all registers)
• Machine encoding: [figure]
• Semantics:
  if MEM[PC] == ADD rd rs rt
      GPR[rd] ← GPR[rs] + GPR[rt]
      PC ← PC + 4
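The semantics above can be sketched as decode-then-execute. The field layout (op, rs, rt, rd, shamt, funct) and the ADD codes (op = 0, funct = 0x20) follow the standard MIPS encoding, which is an assumption here since the encoding figure did not survive; the function names are hypothetical.

```python
def decode_r(word: int) -> dict[str, int]:
    """Split a 32-bit R-type word into its fields (MIPS layout assumed)."""
    return {
        "op": (word >> 26) & 0x3F,    # bits 31..26
        "rs": (word >> 21) & 0x1F,    # bits 25..21
        "rt": (word >> 16) & 0x1F,    # bits 20..16
        "rd": (word >> 11) & 0x1F,    # bits 15..11
        "shamt": (word >> 6) & 0x1F,  # bits 10..6
        "funct": word & 0x3F,         # bits 5..0
    }

def execute_add(gpr: list[int], pc: int, word: int) -> int:
    """GPR[rd] <- GPR[rs] + GPR[rt]; PC <- PC + 4. Returns the new PC.

    The sum wraps to 32 bits; a real MIPS ADD would instead raise an
    overflow exception on signed overflow.
    """
    f = decode_r(word)
    if f["op"] != 0 or f["funct"] != 0x20:
        raise ValueError("not an ADD instruction")
    if f["rd"] != 0:  # register $0 stays zero
        gpr[f["rd"]] = (gpr[f["rs"]] + gpr[f["rt"]]) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF

gpr = [0] * 32
gpr[8], gpr[9] = 40, 2
pc = execute_add(gpr, 0, 0x01095020)  # add $10, $8, $9
print(gpr[10], pc)  # 42 4
```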
ADD rd rs rt
Datapath for Add
I-Type ALU Instructions
• Assembly (e.g., register-immediate signed addition): ADDI rt, rs, immediate16 (rt and rs are registers)
• Machine encoding: [figure]
• Semantics:
  if MEM[PC] == ADDI rt rs immediate
      GPR[rt] ← GPR[rs] + sign-extend(immediate)
      PC ← PC + 4
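The key new piece is sign extension of the 16-bit immediate. A sketch assuming the MIPS I-type layout (op, rs, rt, imm16) and ADDI opcode 0x08; the function names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def execute_addi(gpr: list[int], pc: int, word: int) -> int:
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); PC <- PC + 4."""
    if ((word >> 26) & 0x3F) != 0x08:  # MIPS ADDI opcode (assumption)
        raise ValueError("not an ADDI instruction")
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = sign_extend16(word)          # uses only the low 16 bits
    if rt != 0:
        gpr[rt] = (gpr[rs] + imm) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF

gpr = [0] * 32
gpr[8] = 40
pc = execute_addi(gpr, 0, 0x2109FFFF)  # addi $9, $8, -1
print(gpr[9], pc)  # 39 4
```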
ADDI rt rs immediate16
Datapath for R- and I-Type ALU Instructions
Data Memory
• The element needed to implement load and store instructions is the data memory. In addition, we reuse the existing ALU to compute the address to access.
• The data memory has two x-bit inputs (the address and the write data) and one x-bit output (the read data). It also has two control lines: MemWrite and MemRead.
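A sketch of the data memory's interface: address and write-data inputs, a read-data output, and the MemRead/MemWrite lines. Storing 32-bit words in a dict, requiring word alignment, returning 0 for unwritten words, and rejecting both lines asserted at once are simplifying assumptions.

```python
from typing import Optional

class DataMemory:
    """Data memory with address/write-data inputs, a read-data output,
    and MemRead / MemWrite control lines (sketch assumptions: word-aligned
    32-bit words in a dict, unwritten words read as 0)."""

    def __init__(self) -> None:
        self.words: dict[int, int] = {}

    def access(self, address: int, write_data: int,
               mem_read: bool, mem_write: bool) -> Optional[int]:
        if address % 4 != 0:
            raise ValueError("unaligned word access")
        if mem_read and mem_write:
            raise ValueError("MemRead and MemWrite should not both be asserted")
        if mem_write:
            self.words[address] = write_data & 0xFFFFFFFF
            return None
        if mem_read:
            return self.words.get(address, 0)
        return None  # neither line asserted: memory is idle

mem = DataMemory()
mem.access(0x100, 1234, mem_read=False, mem_write=True)      # store
print(mem.access(0x100, 0, mem_read=True, mem_write=False))  # 1234 (load)
```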
Data Memory
Load Instruction
• Assembly (e.g., load 4-byte word): LW rt, offset16(base) (rt and base are registers)
• Machine encoding: [figure]
• Semantics:
  if MEM[PC] == LW rt offset16(base)
      EA = sign-extend(offset) + GPR[base]
      GPR[rt] ← MEM[translate(EA)]
      PC ← PC + 4
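The load semantics reuse sign extension and the ALU's adder to form the effective address. In this sketch, the LW opcode 0x23 and field layout follow standard MIPS, memory is a plain dict, and `translate` is a hypothetical identity mapping (no virtual memory); all of these are assumptions.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def translate(ea: int) -> int:
    """Hypothetical address translation: identity mapping (no virtual memory)."""
    return ea

def execute_lw(gpr: list[int], mem: dict[int, int], pc: int, word: int) -> int:
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[translate(EA)]; PC <- PC + 4."""
    if ((word >> 26) & 0x3F) != 0x23:  # MIPS LW opcode (assumption)
        raise ValueError("not an LW instruction")
    base = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    ea = (sign_extend16(word) + gpr[base]) & 0xFFFFFFFF  # address formed by the ALU
    if rt != 0:
        gpr[rt] = mem.get(translate(ea), 0)              # unwritten words read as 0
    return (pc + 4) & 0xFFFFFFFF

gpr = [0] * 32
gpr[8] = 0x104
mem = {0x100: 77}
pc = execute_lw(gpr, mem, 0, 0x8D09FFFC)  # lw $9, -4($8)
print(gpr[9], pc)  # 77 4
```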
LW Datapath
Branch Equal
• The beq (branch if equal) instruction has three operands: two registers that are compared for equality, and an n-bit offset used to compute the branch address relative to the PC.
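The next-PC logic for beq can be sketched as a choice between two adders' outputs. This assumes the MIPS convention that the 16-bit offset counts words and is relative to PC + 4; the slide only says the offset is relative to the PC. Function names are hypothetical.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def beq_next_pc(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """Next PC after beq rs, rt, offset16 (MIPS convention assumed)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if rs_val == rt_val:  # in hardware: ALU subtracts and checks its Zero output
        # The offset counts words, so shift left by 2 to get a byte offset.
        return (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return pc_plus_4

print(hex(beq_next_pc(0x100, 5, 5, 3)))       # 0x110 (taken)
print(hex(beq_next_pc(0x100, 5, 6, 3)))       # 0x104 (not taken)
print(hex(beq_next_pc(0x100, 7, 7, 0xFFFE)))  # 0xfc (taken, backwards)
```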
Branch Equal
Unconditional Jump
• Assembly: J immediate26
• Machine encoding: [figure]
• Semantics:
  if MEM[PC] == J immediate26
      target = { PC[31:28], immediate26, 2'b00 }
      PC ← target
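The concatenation `{ PC[31:28], immediate26, 2'b00 }` maps to masks, a shift, and an OR. This sketch follows the slide's semantics exactly; note that real MIPS hardware takes the upper 4 bits from PC + 4 rather than PC, which only matters when the jump sits in the last word of a 256 MB region. The function name is hypothetical.

```python
def jump_target(pc: int, imm26: int) -> int:
    """target = { PC[31:28], immediate26, 2'b00 }"""
    upper = pc & 0xF0000000           # keep PC[31:28]
    word_index = imm26 & 0x03FFFFFF   # 26-bit immediate
    return upper | (word_index << 2)  # append 2'b00: targets are word-aligned

print(hex(jump_target(0x00400018, 0x0100000)))  # 0x400000
print(hex(jump_target(0x90000000, 0x3FFFFFF)))  # 0x9ffffffc
```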
Unconditional Jump Datapath
Combining ALU and Memory Instructions
• The ALU datapath and the memory datapath are similar. The differences are:
  - The second input to the ALU is a register (R-type) or the sign-extended offset (I-type).
  - The value stored into the destination register comes from the ALU (R-type) or from memory (load).
• Using two multiplexers (muxes), we can combine both datapaths.
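The two muxes can be sketched as follows: one chooses the ALU's second input, the other chooses the write-back value. The select names `alu_src` and `mem_to_reg` borrow the common textbook signal names ALUSrc and MemtoReg and are an assumption; the ALU is fixed to addition and memory is a dict for brevity.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: select 0 passes in0, select 1 passes in1."""
    return in1 if select else in0

def shared_datapath(rs_val: int, rt_val: int, imm_ext: int,
                    memory: dict[int, int], alu_src: int, mem_to_reg: int) -> int:
    """One pass through the combined ALU + memory datapath.

    alu_src:    0 -> second ALU input is rt (R-type), 1 -> sign-extended offset (I-type)
    mem_to_reg: 0 -> write back the ALU result, 1 -> write back memory data (load)
    """
    alu_b = mux2(alu_src, rt_val, imm_ext)
    alu_out = (rs_val + alu_b) & 0xFFFFFFFF  # ALU performing addition
    mem_data = memory.get(alu_out, 0)        # read at the ALU address; discarded if mem_to_reg is 0
    return mux2(mem_to_reg, alu_out, mem_data)

mem = {0x108: 77}
print(shared_datapath(0x100, 8, 0, mem, alu_src=0, mem_to_reg=0))  # 264 (add)
print(shared_datapath(0x100, 0, 8, mem, alu_src=1, mem_to_reg=1))  # 77 (lw)
```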
Combining ALU and Memory Instructions
The Complete Datapath
Complete Datapath
What's Wrong with Single Cycle?
• All instructions run at the speed of the slowest instruction.
• Adding a long instruction can hurt performance. What if you wanted to include multiply?
• You cannot reuse any part of the processor within an instruction: we need three different adders to calculate PC + 4, PC + 4 + offset, and the ALU result.
• There is no profit in making the common case fast, since every instruction runs at the slowest instruction's speed. This is particularly important for loads, as we will see later.
What's Wrong with Single Cycle?
Component delays:
• 1 ns: register read/write time
• 2 ns: ALU/adder
• 2 ns: memory access
• 0 ns: MUX, PC access, sign extend, ROM

Instr | Get instr | Read reg | ALU operation | Memory | Write reg | Total
add   | 2 ns      | 1 ns     | 2 ns          |        | 1 ns      | 6 ns
beq   | 2 ns      | 1 ns     | 2 ns          |        |           | 5 ns
sw    | 2 ns      | 1 ns     | 2 ns          | 2 ns   |           | 7 ns
lw    | 2 ns      | 1 ns     | 2 ns          | 2 ns   | 1 ns      | 8 ns
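The per-instruction totals follow from summing the delays along each instruction's path, and the single-cycle clock must fit the largest. A sketch using the slide's delays; the dictionary and function names are hypothetical.

```python
# Component delays from the slide, in ns (MUX, PC access, sign extend taken as 0).
DELAY = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

# Components on each instruction's path.
PATH = {
    "add": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
    "sw":  ["fetch", "reg_read", "alu", "mem"],
    "lw":  ["fetch", "reg_read", "alu", "mem", "reg_write"],
}

def latency_ns(instr: str) -> int:
    """Sum of component delays along an instruction's path."""
    return sum(DELAY[step] for step in PATH[instr])

latencies = {i: latency_ns(i) for i in PATH}
print(latencies)                # {'add': 6, 'beq': 5, 'sw': 7, 'lw': 8}
print(max(latencies.values()))  # 8: the clock period must fit the slowest (lw)
```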
Computing Execution Time
Assume 100 instructions are executed:
• 25% of instructions are loads,
• 10% are stores,
• 45% are adds, and
• 20% are branches.
Single-cycle execution: 100 × 8 ns = 800 ns
Optimal execution (each instruction takes only its own latency): 25 × 8 ns + 10 × 7 ns + 45 × 6 ns + 20 × 5 ns = 200 + 70 + 270 + 100 = 640 ns
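The same comparison in a few lines of Python, using the slide's mix and latencies. "Optimal" here assumes a hypothetical machine whose cycle could stretch or shrink to fit each instruction, so the ratio is an upper bound on what a variable-length design could gain on this mix.

```python
LATENCY_NS = {"lw": 8, "sw": 7, "add": 6, "beq": 5}
MIX = {"lw": 25, "sw": 10, "add": 45, "beq": 20}  # counts out of 100 instructions

# Single cycle: every instruction pays the slowest latency.
single_cycle = sum(MIX.values()) * max(LATENCY_NS.values())
# Optimal: each instruction pays only its own latency.
optimal = sum(n * LATENCY_NS[i] for i, n in MIX.items())

print(single_cycle, optimal, single_cycle / optimal)  # 800 640 1.25
```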