1 ITCS 3181 Logic and Computer Systems B. Wilkinson Slides9.ppt Modification date: March 30, 2015 Processor Design

2 Three basic ways:
Random logic approach – Design a unique logic system to implement the instructions using gates and flip-flops, using techniques described in Part 1 of the course.
Microprogrammed approach – Also touched upon in Part 1. Each step in the state diagram is encoded into a binary pattern called a microinstruction. This creates a microprogram that is held in a control memory. The control unit steps through the microinstructions of the microprogram, generating the logic signals that effect the register transfers. Only used today for very complicated instructions.
Pipeline design – Each step, or group of steps, is implemented by one unit (stage), and the units are linked together in a pipeline. This is the most common approach as it leads to concurrent, high-speed operation. We will only consider this way.

3 Pipelined Processor Design The operation of the processor is divided into a number of sequential actions, e.g.:
1. Fetch instruction.
2. Fetch operands.
3. Execute operation.
4. Store results.
(or more steps). Each action is performed by a separate logic unit (stage), and the stages are linked together in a "pipeline."

4 Processor Pipeline Space-Time Diagram

5 Pipeline Staging Latches Usually, pipelines are designed with latches (registers) between units (stages) to hold the information being transferred from one stage to the next. The transfer occurs in synchronism with a clock signal:

6 Processing time Time to process s instructions using a pipeline with p stages = p + s - 1 cycles
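For a rough illustration (mine, not from the slides), a few lines of Python evaluate this cycle count and the resulting speedup, assuming an unpipelined unit would take about p cycles per instruction:

# Illustrative sketch (not from the slides): cycles for s instructions on a p-stage pipeline.
def pipeline_cycles(p, s):
    """Ideal cycle count with no stalls: p cycles to fill, then one instruction per cycle."""
    return p + s - 1

for s in (1, 10, 100, 1000):
    cycles = pipeline_cycles(4, s)
    # An unpipelined unit doing the same work would take roughly 4 * s cycles,
    # so the speedup approaches the number of stages (4) as s grows.
    print(f"s = {s:4d}: {cycles:4d} cycles, speedup ~ {4 * s / cycles:.2f}")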

7 Note: This does not take into account the extra time due to the latches in the pipeline version

8 Dividing Processor Actions The operation of the processor can be divided into: Fetch Cycle Execute Cycle

9 Two Stage Fetch/Execute Pipeline

10 A Two-Stage Pipeline Design

11 Fetch/decode/execute pipeline Relevant for complex instruction formats. The decode stage recognizes the instruction and separates the operation from the operand addresses.

12 Four-Stage Pipeline Try to have each stage require the same time; otherwise the pipeline will have to operate at the speed of the slowest stage. Usually more stages are used to equalize the times. Let's start with four stages: IF (instruction fetch), OF (operand fetch), EX (execute), OS (operand store). [Space-time diagram]

13 Four-stage Pipeline "Instruction-Time Diagram" An alternative diagram. This form of diagram is used later to show pipeline dependencies.
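A minimal Python sketch (my own illustration, not part of the original slides) that prints an instruction-time diagram of this kind for an ideal four-stage pipeline:

# Hypothetical sketch: print an instruction-time diagram for an ideal pipeline.
STAGES = ["IF", "OF", "EX", "OS"]          # four-stage pipeline assumed here

def instruction_time_diagram(num_instructions):
    p = len(STAGES)
    total_cycles = p + num_instructions - 1  # matches the p + s - 1 cycle count
    print("      " + " ".join(f"{c:>3d}" for c in range(1, total_cycles + 1)))
    for i in range(num_instructions):
        # Instruction i enters IF in cycle i + 1 and occupies one stage per cycle.
        row = ["   "] * total_cycles
        for stage_index, name in enumerate(STAGES):
            row[i + stage_index] = f"{name:>3s}"
        print(f"I{i + 1:<4d} " + " ".join(row))

instruction_time_diagram(5)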

14 Information Transfer in Four-Stage Pipeline [Datapath diagram: the PC supplies the instruction address to memory; the IF, OF, EX (ALU) and OS stages are separated by clocked latches that carry the instruction contents and register numbers between the register file, the memory and the ALU.]

15 Register-Register Instructions: ADD R3, R2, R1 After instruction fetched: the latch holds Add R3 R2 R1 and the program counter is updated (PC = PC + 4). Note: where R3, R2, and R1 are mentioned in the latch, it actually holds just the register numbers.

16 Register-Register Instructions: ADD R3, R2, R1 After operands fetched: the latch holds Add R3 V2 V1, where V1 is the contents of R1 and V2 is the contents of R2.

17 Register-Register Instructions: ADD R3, R2, R1 After execution (addition): the ALU forms V2 + V1 and the latch holds R3 and the result.

18 Register-Register Instructions: ADD R3, R2, R1 After result stored: the result is written into R3 in the register file.

19 Register-Register Instructions: ADD R3, R2, R1 Overall: IF – latch holds Add R3 R2 R1, PC = PC + 4; OF – latch holds Add R3 V2 V1; EX – latch holds R3 and the result of V2 + V1; OS – the result is stored in R3. Note: where register names are mentioned in the latches, they actually hold just the register numbers.
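The same flow can also be written out step by step. The following is a hypothetical Python walk-through of ADD R3, R2, R1 (register values invented for illustration), showing what the staging latch carries after each stage:

# Hypothetical walk-through of ADD R3, R2, R1 (register values invented).
registers = {"R1": 5, "R2": 7, "R3": 0}
pc = 1000

# IF: fetch the instruction; the latch holds the operation and register NUMBERS.
latch = {"op": "ADD", "dest": "R3", "src1": "R2", "src2": "R1"}
pc += 4                                   # PC incremented after the fetch

# OF: replace the source register numbers with their VALUES (V2 and V1).
latch["src1"], latch["src2"] = registers[latch["src1"]], registers[latch["src2"]]

# EX: the ALU adds the two values; the latch now carries the result and R3.
latch["result"] = latch["src1"] + latch["src2"]

# OS: the result is written back into the register file.
registers[latch["dest"]] = latch["result"]
print(registers)   # {'R1': 5, 'R2': 7, 'R3': 12}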

20 Register-Constant Instructions: ADD R3, R2, 123 After instruction fetched: the latch holds Add R3 R2 123 and the program counter is updated (PC = PC + 4). Note: where R3 and R2 are mentioned in the latch, it actually holds just the register numbers.

21 Register-Constant Instructions: ADD R3, R2, 123 After operands fetched: the latch holds Add R3 V2 123, where V2 is the contents of R2.

22 Register-Constant Instructions: ADD R3, R2, 123 After execution (addition): the ALU forms V2 + 123 and the latch holds R3 and the result.

23 Register-Constant Instructions: ADD R3, R2, 123 After result stored: the result is written into R3 in the register file.

24 Register-Constant Instructions (Immediate addressing): ADD R3, R2, 123 Overall: IF – latch holds Add R3 R2 123, PC = PC + 4; OF – latch holds Add R3 V2 123, where V2 is the contents of R2; EX – latch holds R3 and the result of V2 + 123; OS – the result is stored in R3.
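For comparison, a hypothetical sketch (invented values again) of the immediate form; the only real difference from the register-register case is that the constant 123 travels through the latches and feeds the ALU directly instead of a second register value:

# Hypothetical walk-through of ADD R3, R2, 123 (immediate addressing, invented values).
registers = {"R2": 7, "R3": 0}

latch = {"op": "ADD", "dest": "R3", "src": "R2", "imm": 123}   # IF: register numbers + constant
latch["src"] = registers[latch["src"]]                         # OF: only R2 is read (V2)
latch["result"] = latch["src"] + latch["imm"]                  # EX: the constant feeds the ALU directly
registers[latch["dest"]] = latch["result"]                     # OS: store the result in R3
print(registers)   # {'R2': 7, 'R3': 130}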

25 Branch Instructions A couple of issues to deal with here:
1. Number of steps needed.
2. Dealing with the program counter incrementing after each instruction fetch.

26 (Complex) Branch Instructions: Bcond R1, R2, L1 After instruction fetched: the latch holds Bcond R1 R2 Offset, where the offset to L1 is held in the instruction. (The execute stage becomes EX/BR, with a test unit and an adder.)

27 (Complex) Branch Instructions: Bcond R1, R2, L1 After operands fetched: the latch holds Bcond V1 V2 Offset, where V1 is the contents of R1, V2 is the contents of R2, and the offset to L1 is held in the instruction.

28 (Complex) Branch Instructions: Bcond R1, R2, L1 After execution: the EX/BR stage tests V1 against V2, and the latch holds the result together with the offset.

29 (Complex) Branch Instructions: Bcond R1, R2, L1 After result stored: the latch holds the result (TRUE/FALSE); if TRUE, the offset is added to the PC, else do nothing.

30 (Complex) Branch Instructions: Bcond R1, R2, L1 Overall: IF – latch holds Bcond R1 R2 Offset (offset to L1 held in the instruction); OF – latch holds Bcond V1 V2 Offset, where V1 and V2 are the contents of R1 and R2; EX/BR – V1 is tested against V2, producing a TRUE/FALSE result; OS – if TRUE, the offset is added to the PC, else do nothing.

31 Simpler Branch Instructions: Bcond R1, L1 Overall: IF – latch holds Bcond R1 Offset; OF – latch holds Bcond V1 Offset, where V1 is the contents of R1; EX/BR – tests R1 (V1) against zero, producing a TRUE/FALSE result; OS – if TRUE, the offset is added to the PC, else do nothing.
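In the same style, a hypothetical walk-through of the simpler branch (invented values; the condition is assumed here to be "branch if R1 equals zero"), ignoring for the moment the PC-increment issue discussed on the next slide:

# Hypothetical walk-through of Bcond R1, L1 (condition assumed: branch if R1 == 0).
registers = {"R1": 0}
pc = 100
offset = 40                                 # offset to L1, held in the instruction

# IF: fetch; the latch holds the condition, the register number and the offset.
latch = {"op": "BEQZ", "reg": "R1", "offset": offset}

# OF: read the register value V1 from the register file.
latch["reg"] = registers[latch["reg"]]

# EX/BR: test V1 against zero and form the branch target.
taken = (latch["reg"] == 0)
target = pc + latch["offset"]               # ignoring the PC-increment issue of the next slide

# OS: if the test was TRUE, update the PC; otherwise do nothing.
if taken:
    pc = target
print(pc)   # 140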

32 Dealing with program counter incrementing after each instruction fetch The previous design needs to take into account that by the time the branch instruction is in the execute unit, the program counter will have been incremented three times. Solutions:
1. Modify the offset value in the instruction (subtract 12).
2. Modify the arithmetic operation to be PC + offset – 12.
3. Feed the program counter value through the pipeline. (This is the best way as it takes into account any pipeline length. Done in the Patterson-Hennessy architecture book.)
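A small numeric sketch (my own, with an invented branch address and offset, and assuming the offset is meant to be relative to the branch instruction's own address, which is what the subtract-12 correction implies) showing why the correction is needed and why feeding the PC through the pipeline avoids it:

# Hypothetical numbers: branch fetched at address 100, offset 40 (relative to the branch itself).
pc_at_fetch = 100
offset = 40
target = pc_at_fetch + offset              # intended branch target

# By the time the branch is in EX, the PC has been incremented three times (4-byte instructions):
pc_at_execute = pc_at_fetch + 3 * 4

# Solutions 1 and 2 both amount to correcting for those increments (subtract 12):
assert pc_at_execute + offset - 12 == target

# Solution 3 (the general one): carry the fetch-time PC value through the pipeline latches,
# so the target is formed from it regardless of how many stages the pipeline has.
assert pc_at_fetch + offset == target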

33 Feeding PC value through pipeline: Bcond R1, L1 After instruction fetched: the latch holds Bcond R1 Offset together with the PC value at the time of the fetch. (The EX/BR stage tests R1 against zero and an adder forms PC + offset.)

34 Feeding PC value through pipeline: Bcond R1, L1 After operand fetched: the latch holds Bcond V1 Offset and the PC value, where V1 is the contents of R1.

35 Feeding PC value through pipeline: Bcond R1, L1 After branch computed: V1 is tested against zero and the adder forms the new PC value (PC + offset); the latch holds the test result and the new PC value.

36 Feeding PC value through pipeline: Bcond R1, L1 After PC updated: if the result is TRUE, the PC is loaded with the new PC value, else do nothing.

37 Feeding PC value through pipeline: Bcond R1, L1 Overall: IF – latch holds Bcond R1 Offset and the PC value; OF – latch holds Bcond V1 Offset and the PC value, where V1 is the contents of R1; EX/BR – V1 is tested against zero and the new PC value (PC + offset) is formed; OS – if TRUE, the PC is updated with the new PC value, else do nothing.

38 Load and Store Instructions Need at least one extra stage to handle memory accesses. The early RISC processor arrangement was to place a memory stage (MEM) between EX and OS, as below, giving a five-stage pipeline. Example: LD R1, 100[R2]
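As with the register-register example, a hypothetical walk-through (invented addresses and values) of LD R1, 100[R2] through the five stages:

# Hypothetical walk-through of LD R1, 100[R2] through IF / OF / EX / MEM / OS.
memory = {2100: 77}                        # invented data memory contents
registers = {"R1": 0, "R2": 2000}

# IF: fetch the load; the latch holds the operation, destination and base register numbers.
latch = {"op": "LD", "dest": "R1", "base": "R2", "disp": 100}

# OF: read the base register value from the register file.
latch["base"] = registers[latch["base"]]

# EX: the ALU forms the effective address, base + displacement.
latch["addr"] = latch["base"] + latch["disp"]

# MEM: access the data memory (the extra stage between EX and OS).
latch["value"] = memory[latch["addr"]]

# OS: write the loaded value into R1.
registers[latch["dest"]] = latch["value"]
print(registers)   # {'R1': 77, 'R2': 2000}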

39 ST 100[R2], R1 Note: It is convenient to have separate instruction and data memories connecting to the processor pipeline - usually separate cache memories; see later.

40 Usage of Stages Uses IF twice

41 Number of Pipeline Stages As the number of stages is increased, one would expect the time for each stage to decrease, i.e. the clock period to decrease and the speed to increase. However, one must take into account the pipeline latch delay. A 5-stage pipeline represents an early RISC design ("underpipelined"); most recent processors have more stages.

42 Optimum Number of Pipeline Stages* Suppose one homogeneous unit doing everything takes T_s time units. With p pipeline stages with equally distributed work, each stage takes T_s/p. Let t_L = time for a latch to operate. Then the execution time for s instructions is:

T_ex = (p + s - 1) x (T_s/p + t_L)

Typical results (T_s = 128, t_L = 2). In practice, there are a lot more factors involved; see later for some.

* Adapted from "Computer Architecture and Implementation" by H. G. Cragon, Cambridge University Press.
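A short Python sketch (mine, using the T_s = 128 and t_L = 2 values quoted on the slide and an assumed instruction count s) that evaluates this expression over a range of stage counts:

# Evaluate T_ex = (p + s - 1) * (T_s/p + t_L) using the slide's figures; s is assumed.
T_s, t_L = 128, 2          # total work per instruction and latch delay (from the slide)
s = 100                    # number of instructions -- an assumed value for illustration

def execution_time(p):
    return (p + s - 1) * (T_s / p + t_L)

times = {p: execution_time(p) for p in range(1, 257)}
best_p = min(times, key=times.get)
print(f"best p = {best_p}, T_ex = {times[best_p]:.1f}")
# Setting dT_ex/dp = 0 gives the optimum near p = sqrt((s - 1) * T_s / t_L),
# about 80 for these numbers -- far deeper than the 5-stage "underpipelined" design,
# though in practice hazards and the other factors mentioned above limit the useful depth.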

43 Questions