Shift Instructions (1/4)

Presentation transcript:

Shift Instructions (1/4)
Move (shift) all the bits in a word to the left or right by a number of bits.
Example: shift right by 8 bits
  0001 0010 0011 0100 0101 0110 0111 1000
  0000 0000 0001 0010 0011 0100 0101 0110
Example: shift left by 8 bits
  0001 0010 0011 0100 0101 0110 0111 1000
  0011 0100 0101 0110 0111 1000 0000 0000
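The two examples above can be checked with a minimal Python sketch. The helper names and the `MASK32` constant are illustrative assumptions, not part of the slides; the mask is needed because Python integers are not limited to 32 bits.

```python
MASK32 = 0xFFFFFFFF  # assumed helper: keeps results within one 32-bit word


def shift_left_logical(word, amount):
    """Shift left, fill emptied low bits with 0s, drop bits pushed past bit 31."""
    return (word << amount) & MASK32


def shift_right_logical(word, amount):
    """Shift right, fill emptied high bits with 0s."""
    return (word & MASK32) >> amount


word = 0x12345678  # 0001 0010 0011 0100 0101 0110 0111 1000
print(f"{shift_right_logical(word, 8):08x}")  # 00123456
print(f"{shift_left_logical(word, 8):08x}")   # 34567800
```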

Shift Instructions (2/4)
MIPS shift instruction syntax: 1 2,3,4 where
  1) operation name
  2) register that will receive the value
  3) first operand (register)
  4) shift amount (constant < 32, encoded in 5 bits)
MIPS shift instructions:
  1. sll (shift left logical): shifts left and fills emptied bits with 0s
  2. srl (shift right logical): shifts right and fills emptied bits with 0s
  3. sra (shift right arithmetic): shifts right and fills emptied bits by sign extending

Shift Instructions (3/4)
Example: shift right arithmetic by 8 bits (sign bit is 0)
  0001 0010 0011 0100 0101 0110 0111 1000
  0000 0000 0001 0010 0011 0100 0101 0110
Example: shift right arithmetic by 8 bits (sign bit is 1)
  1001 0010 0011 0100 0101 0110 0111 1000
  1111 1111 1001 0010 0011 0100 0101 0110
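A rough sketch of the arithmetic right shift on a 32-bit word, reproducing both examples above. The function name and the explicit fill mask are assumptions made for illustration; real hardware simply replicates the sign bit.

```python
MASK32 = 0xFFFFFFFF  # assumed helper: one 32-bit word


def shift_right_arithmetic(word, amount):
    """Shift right and fill the emptied high bits with copies of the sign bit."""
    word &= MASK32
    shifted = word >> amount
    if word & 0x80000000:
        # Negative value: set the `amount` vacated high bits to 1.
        fill = (MASK32 << (32 - amount)) & MASK32
        return shifted | fill
    return shifted


print(f"{shift_right_arithmetic(0x12345678, 8):08x}")  # 00123456
print(f"{shift_right_arithmetic(0x92345678, 8):08x}")  # ff923456
```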

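Shift Instructions (4/4)
Since shifting may be faster than multiplication, a good compiler usually notices when C code multiplies by a power of 2 and compiles it to a shift instruction:
  a *= 8;          (in C)
would compile to:
  sll $s0,$s0,3    (in MIPS)
Likewise, shift right to divide by powers of 2; remember to use sra for signed values so that the sign is preserved.
Note: sra rounds toward negative infinity, while C integer division truncates toward zero, so the two differ for negative values that are not exact multiples of the divisor.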
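The multiply/divide relationship can be tried out in Python, whose `>>` on integers behaves like an arithmetic shift. The helper `c_style_divide` is a hypothetical stand-in for C's truncating division, added only to show where a shift and a C division disagree.

```python
def c_style_divide(a, b):
    """Integer division that truncates toward zero, as C99 division does."""
    quotient = abs(a) // abs(b)
    return quotient if (a < 0) == (b < 0) else -quotient


a = 5
assert a << 3 == a * 8  # multiply by 2**3 with a left shift

# Exact multiple: arithmetic shift and C-style division agree.
print(-8 >> 2, c_style_divide(-8, 4))  # -2 -2

# Not an exact multiple: the shift rounds down, C-style division truncates.
print(-7 >> 1, c_style_divide(-7, 2))  # -4 -3
```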

“Shift and Add” Signed Multiplier
Sign-extend the partial product at each stage
Final step is a subtract
Takes n clock cycles
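One way to read the slide is as the usual two's-complement shift-and-add scheme, sketched below under that assumption. Python integers sign-extend implicitly, so the "sign-extend the partial product" step is not written out; the function names and the 8-bit default width are illustrative choices.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shift_add_multiply(multiplicand, multiplier, bits=8):
    """Multiply two signed `bits`-wide values, one multiplier bit per step."""
    m = to_signed(multiplicand, bits)       # signed, so partial products keep their sign
    q = multiplier & ((1 << bits) - 1)      # raw multiplier bit pattern
    product = 0
    for i in range(bits):                   # n steps, one per clock cycle
        if (q >> i) & 1:
            partial = m << i
            if i == bits - 1:
                product -= partial          # sign bit has weight -2**(n-1): subtract
            else:
                product += partial
    return to_signed(product, 2 * bits)


print(shift_add_multiply(-3, 5))   # -15
print(shift_add_multiply(5, -3))   # -15
```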

Fast multiplication hardware

Chap.5 The processor: Datapath and control Jen-Chang Liu, Spring 2006

Hierarchy of Machine Structures
Software
  Application (Netscape)
  Operating System (Windows 98)
  Compiler, Assembler
Instruction Set Architecture
Hardware
  Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  transistors

Five components of computer Input, output, memory, datapath, control

Inside Mother board (for Pentium Pro)

Chapter overview
  Chap5: datapath and control
  Chap6: pipeline
  Chap7: memory hierarchy
  Chap8: I/O
  Chap9: multiprocessor
Inside CPU

Inside Processor: datapath and control
Datapath: brawn of the processor
  Performs the arithmetic operations
Control: brain of the processor
  Tells the datapath, memory, and I/O what to do
Analogy: a production line

Inside Pentium Processor 1/3 cache

Inside Pentium Pro Processor

Clocking methodology
Edge-triggered clocking: the contents of the state elements (flip-flops, registers, memory) change only on the active clock edge
[Figure: clock waveform (high/low) with example state values 100, 101, 001, 111, 110, 001, 100]

Timing constraint
The clock period must be long enough for all signals to settle to stable values before the next active clock edge

Design Target: MIPS
The instruction set architecture (ISA) determines the implementation
We know how to execute MIPS code by hand; how do we design a circuit to execute it?
We design a simple implementation that includes a subset of the MIPS instructions:
  Memory-reference inst.: lw, sw
  Arithmetic-logic inst.: add, sub, and, or, slt
  Branch and jump: beq, j

Outline of chapter 5
Building a datapath
  Instruction fetch
  R-type instructions
  Load/store
  Branch
Single Datapath implementation
Multiple cycle implementation

Preview: How to carry out an instruction
4 steps to execute an instruction
Instruction           Instruction fetch        Data/register read   Instruction execution (ALU)   Memory/register read/write
add $t0, $t1, $t2     Read inst. from memory   $t1, $t2             $t1 + $t2                     Write to $t0
lw $t0, 0($a0)        Read inst. from memory   $a0                  $a0 + 0                       Read from memory
beq $t0, $t1, loop    Read inst. from memory   $t0, $t1             $t0 - $t1                     Write PC

Abstract view of carrying out an instruction
Instruction fetch -> Data/register read -> Instruction execution -> Memory/register read/write

How to build a datapath for the MIPS ISA?
Datapath: the path through the hardware used to perform an instruction
Consider each major component
Build a datapath for each instruction class

Outline
Building a datapath
  1. Instruction fetch
  2. R-type instructions
  3. Load/store
  4. Branch
Build a datapath for each instruction class, then combine them

1. Instruction fetch
Instruction memory: place to store the instructions
PC: address of the instruction
Adder: increments the PC to the next instruction

Instruction fetch (cont.)
The adder always adds, therefore it needs no control lines
[Figure: instruction fetch datapath with steps labeled 1, 2, 3]

2. R-type instruction
R-format instructions
Arithmetic-logic instructions: add, sub, and, or, slt
  Ex. add $t1, $t2, $t3
Fields (width in bits): opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6
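To make the field layout concrete, here is a small sketch that packs and unpacks an R-format word. The function names are made up for this example; the field positions follow the standard MIPS R-format (opcode in bits 31-26 down to funct in bits 5-0).

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode_r_type(word):
    """Split a 32-bit instruction word back into its R-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t1, $t2, $t3  ->  add $9, $10, $11  (funct 32 selects add)
word = encode_r_type(rs=10, rt=11, rd=9, shamt=0, funct=32)
print(f"{word:08x}")  # 014b4820
```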

Datapath elements for R-type inst.
  1. Read register: supply the register number; the data appears on the output
  2. Write register: supply the register number and the input data, with RegWrite=1
[Figure: datapath elements with inputs and outputs labeled]

Datapath for R-type inst.
Fields (width in bits): opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6
[Figure: R-type datapath with steps labeled 1 to 4]

3. Load/store from/to memory
I-format
Load/store examples:
  lw $t1, offset_value($t2)
  sw $t1, offset_value($t2)
Fields (width in bits): opcode 6 | rs 5 | rt 5 | signed offset 16
Memory address = contents of $t2 + offset

Datapath elements for load/store
lw $t1, offset_value($t2)
Register file, ALU, and data memory
ALU computes the address: base + offset
Store -> MemWrite; Load -> MemRead
Sign-extend the 16-bit offset field
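The address calculation for lw/sw can be sketched as follows. The function names and the example base address are assumptions chosen for illustration.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Base register contents plus the sign-extended 16-bit offset (32-bit wraparound)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


base = 0x10010000  # hypothetical contents of $t2
print(f"{effective_address(base, 0x0008):08x}")  # 10010008
print(f"{effective_address(base, 0xFFFC):08x}")  # 1000fffc  (offset -4)
```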

Datapath for load/store
Fields (width in bits): opcode 6 | rs 5 | rt 5 | signed offset 16
[Figure: load/store datapath with steps labeled 1, 2, 4]

4. Branch
I-format
Example: beq $t1, $t2, offset
PC-relative addressing
Fields (width in bits): opcode 6 | rs 5 | rt 5 | signed offset 16

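Details for branch: target address calculation
Base address for offset: PC+4
Instructions are word-aligned: the offset is shifted left 2 bits (two 0 bits are appended)
Branch target = (PC+4) + (sign-extended 16-bit immediate shifted left by 2)
Fields (width in bits): opcode 6 | rs 5 | rt 5 | immediate 16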
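A minimal sketch of the branch target calculation described above. The function names and the example PC values are illustrative assumptions.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended offset << 2), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


# Forward branch: skip 3 instructions past the one after the branch.
print(f"{branch_target(0x00400000, 3):08x}")       # 00400010
# Backward branch: offset -5 (0xFFFB) from PC+4.
print(f"{branch_target(0x00400010, 0xFFFB):08x}")  # 00400000
```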

Datapath for branch
Fields (width in bits): opcode 6 | rs 5 | rt 5 | signed offset 16
[Figure: branch datapath with steps labeled 1, 2, 4]

How to combine these datapaths?
We have shown datapaths for:
  Instruction fetch
  R-type instructions
  Load/store
  Branch
How to assemble the datapaths?
How to handle control lines?

Outline
Building a datapath
  Instruction fetch
  R-type instructions
  Load/store
  Branch
Single Datapath implementation
Multiple cycle implementation

Single datapath implementation
Attempt to execute all instructions in 1 clock cycle
No datapath resource can be used more than once per instruction
  Duplicated units: e.g. one memory for instructions and one memory for data
  Shared units: use a multiplexor to select the input
Analogy: one production line shared by add,…, lw, sw, and beq,…

1. Combine R-type and lw/sw
R-type fields (width in bits): opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6
lw/sw fields (width in bits): opcode 6 | rs 5 | rt 5 | signed offset 16
[Figure: R-type datapath and lw/sw datapath side by side]

R-type + load/store [Figure: combined datapath]

2. Add the instruction fetch [Figure]

3. Add the branch unit [Figure]

Simple datapath and control. See Fig 5.17 (p.307)

Trace the operation of the datapath !!!
Explained in 4 steps, but they actually all happen in a single clock cycle
Quiz later !!!
  Instruction fetch
  Data/register read
  Instruction execution
  Memory/register read/write

add $t1,$t2,$t3 => add $9, $10, $11 => fields: rs = 10, rt = 11, rd = 9, funct = 32
Step 1. Instruction fetch

add $t1,$t2,$t3 (rs = 10, rt = 11, rd = 9, funct = 32)
Step 2. Read source registers

add $t1,$t2,$t3 (rs = 10, rt = 11, rd = 9, funct = 32)
Step 3. Instruction execution

add $t1,$t2,$t3 (rs = 10, rt = 11, rd = 9, funct = 32)
Step 4. Write result
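The four steps traced above can be walked through in software. This is only a sketch: the function name, the dictionary used as instruction memory, and the register contents are assumptions for the example, and in the real single-cycle datapath all four steps complete within one clock cycle.

```python
def run_add(pc, instruction_memory, registers):
    """Carry out one R-type add in the four steps of the trace; returns the next PC."""
    # Step 1. Instruction fetch: read the word at PC and compute PC + 4.
    word = instruction_memory[pc]
    next_pc = pc + 4

    # Step 2. Read source registers named by the rs and rt fields.
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    operand_a, operand_b = registers[rs], registers[rt]

    # Step 3. Instruction execution: the ALU adds the two operands.
    result = (operand_a + operand_b) & 0xFFFFFFFF

    # Step 4. Write result into the destination register rd.
    registers[rd] = result
    return next_pc


registers = [0] * 32
registers[10] = 7   # $t2 (hypothetical value)
registers[11] = 5   # $t3 (hypothetical value)
next_pc = run_add(0, {0: 0x014B4820}, registers)  # add $t1, $t2, $t3
print(registers[9], next_pc)  # 12 4
```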

lw $t1, 0($t2) 36 9 10

Simple datapath and control. See Fig 5.19 (p.360)

How to generate control?
Inputs: 6 bits (opcode) and 6 bits (funct)
Truth table look-up
Output: 10 bits of control signals

Hierarchy of control units
Instructions (binary representation) -> Main control unit
Main control unit -> ALUop (2 bits) and other control signals (6 1-bit)
ALUop (2 bits) -> ALU control unit -> ALU control signals (3 bits)

Why multiple levels of control?
Purpose:
  Reduce the size of the main control unit
  Potentially increase the speed of the control unit
ALUop (2 bits): classifies the instruction; defines 3 classes of instructions
  R-type
  Load/store
  Branch

Design main control unit
Instructions (binary representation), Opcode[31-26] -> Main control unit
Main control unit -> ALUop (2 bits) and other control signals (6 1-bit)
ALUop (2 bits) -> ALU control unit -> ALU control signals (3 bits)

Main control unit Observe instruction set

See Fig 5.19 Control signal for R-format?

1

Create truth table for main control unit
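The slide's own truth table is not in the transcript. As a hedged illustration, the sketch below uses the signal names and values of the commonly taught single-cycle MIPS design (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp); these names, and therefore the exact number of one-bit signals, are an assumption and may be grouped differently on the slide. `None` marks a don't-care entry.

```python
SIGNAL_NAMES = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
                "MemRead", "MemWrite", "Branch", "ALUOp")

# Assumed truth table, keyed by opcode (Opcode[31-26]).
MAIN_CONTROL = {
    0:  (1,    0, 0,    1, 0, 0, 0, 0b10),  # R-format
    35: (0,    1, 1,    1, 1, 0, 0, 0b00),  # lw
    43: (None, 1, None, 0, 0, 1, 0, 0b00),  # sw
    4:  (None, 0, None, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(instruction):
    """Look up the control signals from the 6-bit opcode of a 32-bit instruction."""
    opcode = (instruction >> 26) & 0x3F
    return dict(zip(SIGNAL_NAMES, MAIN_CONTROL[opcode]))


print(main_control(0x014B4820)["ALUOp"])    # 2  (add $t1, $t2, $t3 is R-format)
print(main_control(0x8D490000)["MemRead"])  # 1  (lw $t1, 0($t2))
```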

Design ALU control unit
Instructions (binary representation), Opcode[31-26] -> Main control unit
Main control unit -> ALUop (2 bits) and other control signals (6 1-bit)
ALUop (2 bits) -> ALU control unit -> ALU control signals (3 bits)

ALU control unit
Input 1: ALUop (2 bits)
Input 2: Instruction[5-0] (6 bits)
Output: ALU control (3 bits)
See Figure 4.20

ALU control signal
(1 bit)  (2 bits)  ALU control line function
   0        00     and
   0        01     or
   0        10     add
   1        10     sub
   1        11     slt
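The table above can be turned into a small model of the ALU control unit and the ALU itself. The 3-bit control values come from the table; the ALUop encodings (00 for load/store, 01 for branch, 10 for R-type) and the funct codes are the standard MIPS ones and are stated here as assumptions, since the transcript does not list them. Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

# funct field (Instruction[5-0]) -> 3-bit ALU control, for R-type instructions.
FUNCT_TO_ALU_CONTROL = {32: 0b010, 34: 0b110, 36: 0b000, 37: 0b001, 42: 0b111}


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu_control(alu_op, funct):
    """Second-level control: combine ALUop (2 bits) with funct (6 bits)."""
    if alu_op == 0b00:       # load/store: add base and offset
        return 0b010
    if alu_op == 0b01:       # branch: subtract to compare
        return 0b110
    return FUNCT_TO_ALU_CONTROL[funct]  # R-type: funct decides


def alu(control, a, b):
    """Perform the operation selected by the 3-bit ALU control lines."""
    if control == 0b000:
        return a & b & MASK32
    if control == 0b001:
        return (a | b) & MASK32
    if control == 0b010:
        return (a + b) & MASK32
    if control == 0b110:
        return (a - b) & MASK32
    if control == 0b111:     # slt: signed comparison
        return 1 if to_signed32(a) < to_signed32(b) else 0
    raise ValueError(f"unsupported ALU control value: {control:03b}")


print(alu(alu_control(0b10, 32), 7, 5))           # 12  (add)
print(alu(alu_control(0b10, 42), 0xFFFFFFFF, 1))  # 1   (slt: -1 < 1)
```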

Instruction set formats
Determines the ALU operation
[Figure: instruction set]

creating truth table 28

Why is a single-cycle implementation not used?
It is inefficient. Why?
Single-cycle implementation => the clock cycle time is the same for every instruction
Clock cycle = longest path = load
Other instruction classes could fit in a shorter cycle !!!

Performance evaluation for single-cycle implementation
Assume the operation times:
  Memory units: 2 ns
  ALU: 2 ns
  Register file: 1 ns
Calculate the necessary time for each instruction class

Memory units: 2 ns, ALU: 2 ns, Register file: 1 ns
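With these unit delays, the time per instruction class can be estimated by summing the units on each class's path. The list of units each class passes through is an assumption based on the usual single-cycle datapath; the delays are the ones given above.

```python
# Unit delays in ns, taken from the slide.
UNIT_DELAY_NS = {
    "instruction memory": 2,
    "register read": 1,
    "ALU": 2,
    "data memory": 2,
    "register write": 1,
}

# Assumed path through the datapath for each instruction class.
UNITS_USED = {
    "R-type": ["instruction memory", "register read", "ALU", "register write"],
    "lw": ["instruction memory", "register read", "ALU", "data memory", "register write"],
    "sw": ["instruction memory", "register read", "ALU", "data memory"],
    "beq": ["instruction memory", "register read", "ALU"],
}


def instruction_time_ns(instruction_class):
    """Sum the delays of the units the instruction class uses."""
    return sum(UNIT_DELAY_NS[unit] for unit in UNITS_USED[instruction_class])


for name in UNITS_USED:
    print(name, instruction_time_ns(name))  # R-type 6, lw 8, sw 7, beq 5

# A single-cycle clock must fit the slowest class.
clock_ns = max(instruction_time_ns(name) for name in UNITS_USED)
print(clock_ns)  # 8
```

Under these assumptions the load sets the clock period at 8 ns, so every other class wastes part of each cycle.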

How to improve the single-cycle datapath?
A variable-speed clock for each instruction class
  Difficult to implement
Multi-cycle implementation