More on Pipelining. CSE 2312: Computer Organization and Assembly Language Programming. Vassilis Athitsos, University of Texas at Arlington.



Fetch-Decode-Execute Cycle in Detail
The CPU clock ticks to mark the start of the cycle.
1. Fetch the next instruction from memory.
2. Change the program counter to point to the next instruction.
3. Determine the type of the instruction just fetched.
4. If the instruction uses a word in memory, locate it.
5. Fetch the word, if needed, into a CPU register.
6. Execute the instruction.
7. The clock cycle is completed. Go to step 1 to begin executing the next instruction.

Toy ISA Instructions
add A B C:
– Adds the contents of registers A and B, and stores the result in register C.
addi N A C:
– Adds integer N to the contents of register A, and stores the result in register C.
load address A:
– Loads data from the specified memory address into register A.
store A address:
– Stores data from register A to the specified memory address.
goto line:
– Sets the instruction counter to the specified line. That line should be executed next.
if A line:
– If the contents of register A are NOT 0, sets the instruction counter to the specified line. That line should be executed next.
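The instruction definitions above can be sketched as a small sequential interpreter (no pipeline yet). This is only an illustration under stated assumptions: the function name run, the use of Python dictionaries for registers and memory, and the rule that execution stops when the instruction counter leaves the program are all choices made here, not part of the slides.

```python
def run(program, memory):
    """Execute a toy-ISA program one instruction at a time and return memory.

    program: list of instruction strings; list index 0 holds line 1.
    memory:  dict mapping address names to integers (assumed representation).
    """
    regs = {}                 # register name -> integer value
    mem = dict(memory)        # copy, so the caller's dict is left untouched
    pc = 1                    # instruction counter, 1-based like the slides
    while 1 <= pc <= len(program):
        op, *args = program[pc - 1].split()
        pc += 1               # default: continue with the next line
        if op == "add":       # add A B C: C = A + B
            a, b, c = args
            regs[c] = regs[a] + regs[b]
        elif op == "addi":    # addi N A C: C = N + A
            n, a, c = args
            regs[c] = int(n) + regs[a]
        elif op == "load":    # load address A: A = memory[address]
            addr, a = args
            regs[a] = mem[addr]
        elif op == "store":   # store A address: memory[address] = A
            a, addr = args
            mem[addr] = regs[a]
        elif op == "goto":    # goto line
            pc = int(args[0])
        elif op == "if":      # if A line: jump only when A is NOT 0
            a, line = args
            if regs[a] != 0:
                pc = int(line)
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem


demo = ["load x R1", "addi 5 R1 R2", "store R2 y"]
print(run(demo, {"x": 10}))   # {'x': 10, 'y': 15}
```

The address names x and y in the demo are made up for the example.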

Defining Pipeline Behavior
In the following slides, we will explicitly define how each instruction goes through the pipeline.
This is a toy ISA that we have just made up, so the following conventions are designed to be simple and easy to apply.
You may find that, in some cases, we could have followed other conventions that would make execution even more efficient.


Pipeline Steps for: add A B C
Fetch Step: Fetch the instruction from the memory location specified by the PC. Increment the PC to point to the next instruction.
Decode Step: Determine that this instruction uses the ALU, takes input from registers A and B, and modifies register C.
Operand Fetch Step: Copy the contents of registers A and B to the ALU input registers.
Execution Step: The ALU performs the addition.
Output Save Step: The result of the addition is copied to register C.
NOTES: This instruction must wait at the decode step until all previous instructions have finished modifying the contents of registers A and B.


Pipeline Steps for: addi N A C
Fetch Step: Fetch the instruction from the memory location specified by the PC. Increment the PC to point to the next instruction.
Decode Step: Determine that this instruction uses the ALU, takes input from register A, and modifies register C.
Operand Fetch Step: Copy the contents of register A into one ALU input register, and copy integer N into the other ALU input register.
Execution Step: The ALU performs the addition.
Output Save Step: The result of the addition is copied to register C.
NOTES: This instruction must wait at the decode step until all previous instructions have finished modifying the contents of register A.


Pipeline Steps for: load address A
Fetch Step: Fetch the instruction from the memory location specified by the PC. Increment the PC to point to the next instruction.
Decode Step: Determine that this instruction accesses memory, takes input from address, and modifies register A.
Operand Fetch Step: Not applicable for this instruction.
Execution Step: The bus brings the contents of address to the CPU.
Output Save Step: The data brought by the bus is copied to register A.
NOTES: This instruction must wait at the decode step until all previous instructions have finished modifying the contents of address.


Pipeline Steps for: store A address
Fetch Step: Fetch the instruction from the memory location specified by the PC. Increment the PC to point to the next instruction.
Decode Step: Determine that this instruction accesses memory, takes input from register A, and modifies address.
Operand Fetch Step: Not applicable for this instruction.
Execution Step: The bus receives the contents of register A from the CPU.
Output Save Step: The bus saves the data at address.
NOTES: This instruction must wait at the decode step until all previous instructions have finished modifying the contents of register A.


Pipeline Steps for: goto line
Fetch Step: Fetch the instruction from the memory location specified by the PC. Increment the PC to point to the next instruction.
Decode Step: Determine that this instruction is a goto. Flush (erase) what is stored at the fetch step in the pipeline.
Operand Fetch Step: Not applicable for this instruction.
Execution Step: Not applicable for this instruction.
Output Save Step: The program counter (PC) is set to the specified line.
NOTES: See next slide.

Pipeline Steps for: goto line
NOTES: When a goto instruction completes the decode step:
– The pipeline stops receiving any new instructions. However, instructions that entered the pipeline before the goto instruction continue normal execution.
– The pipeline discards, and does not process any further, the instruction that was fetched while the goto instruction was being decoded.
Fetching of instructions resumes as soon as the goto instruction has finished executing, i.e., when the goto instruction has completed the output save step.


Pipeline Steps for: if A line
Fetch Step: Fetch the instruction from the memory location specified by the PC. Increment the PC to point to the next instruction.
Decode Step: Determine that this instruction is an if and that it accesses register A. Flush (erase) what is stored at the fetch step in the pipeline.
Operand Fetch Step: Copy the contents of register A to the first ALU input register.
Execution Step: The ALU compares the first input register with 0, and outputs 0 if the input register equals 0, or 1 otherwise.
Output Save Step: If the ALU output is 1, the program counter (PC) is set to the specified line. Otherwise, nothing is done.
NOTES: See next slide.

Pipeline Steps for: if A line
NOTE 1: An if instruction must wait at the decode step until all previous instructions have finished modifying register A.
NOTE 2: When an if instruction completes the decode step:
– The pipeline stops receiving any new instructions. However, instructions that entered the pipeline before the if instruction continue normal execution.
– The pipeline erases, and does not process any further, the instruction that was fetched while the if instruction was being decoded.
Fetching of instructions resumes as soon as the if instruction has finished executing, i.e., when the if instruction has completed the output save step.


Pipeline Execution: An Example
line 1: load address2 R2
line 2: load address1 R1
line 3: if R1 6
line 4: addi 20 R1 R3
line 5: goto 7
line 6: addi 10 R1 R3
line 7: addi 5 R2 R4
line 8: store R4 address10
line 9: addi 30 R2 R5
line 10: store R5 address11
line 11: add R3 R2 R8
line 12: store R8 address12
Consider the program above. The previous specifications define how this program is executed step by step through the pipeline. To trace the execution, we need to specify the inputs to the program.
Program inputs:
– address1, let's assume it contains 0.
– address2, let's assume it contains 10.
Program outputs:
– address10
– address11
– address12

Pipeline trace of the original program, times 1 to 9 (X marks an empty step):
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
1     1      X       X              X          X            1
2     2      1       X              X          X            2
3     3      2       1              X          X            3
4     4      3       2              1          X            4
5     4      3       X              2          1            4   line 3 waits for line 2 to finish.
6     4      3       X              X          2            4
7     X      X       3              X          X            4   line 3 moves on. if detected. Stop fetching, flush line 4 from fetch step.
8     X      X       X              3          X            4
9     X      X       X              X          3            4

Pipeline trace of the original program, times 10 to 17:
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
10    4      X       X              X          X            4   if has finished, PC does NOT change.
11    5      4       X              X          X            5
12    6      5       4              X          X            6
13    X      X       5              4          X            X   goto detected. Stop fetching, flush line 6 from fetch step.
14    X      X       X              5          4            X
15    X      X       X              X          5            X
16    7      X       X              X          X            7   goto has finished, PC set to 7.
17    8      7       X              X          X            8

Pipeline trace of the original program, times 18 to 25:
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
18    9      8       7              X          X            9
19    9      8       X              7          X            9   line 8 waits for line 7 to finish.
20    9      8       X              X          7            9
21    10     9       8              X          X            10  line 8 moves on.
22    11     10      9              8          X            11
23    11     10      X              9          8            11  line 10 waits for line 9 to finish.
24    11     10      X              X          9            11
25    12     11      10             X          X            12  line 10 moves on.

Pipeline trace of the original program, times 26 to 32:
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
26    X      12      11             10         X            X   no more instructions to fetch.
27    X      12      X              11         10           X   line 12 waits for line 11 to finish.
28    X      12      X              X          11           X
29    X      X       12             X          X            X   line 12 moves on.
30    X      X       X              12         X            X
31    X      X       X              X          12           X
32    program execution has finished!

Reordering Instructions
Reordering of instructions can be done by a compiler, as long as the compiler knows how instructions are executed.
The goal of reordering is to obtain more efficient execution through the pipeline, by reducing the time that instructions spend waiting on the instructions they depend on.
Obviously, reordering is not allowed to change the meaning of the program.
What is the meaning of a program?


Meaning of a Program
What is the meaning of a program?
A program can be modeled mathematically as a function that takes specific input and produces specific output.
In this program, what is the input? Where is the information stored that the program accesses?
– address1 and address2.
What is the output? What information is left behind by the program?
– address10, address11, address12.

Reordering Instructions
Reordering is not allowed to change the meaning of a program.
Therefore, when given the same input as the original program, the reordered program must produce the same output as the original program.
Therefore, the reordered program must ALWAYS leave the same results as the original program at address10, address11, and address12, as long as it starts with the same contents as the original program at address1 and address2.

Reordering Instructions
Reordering of instructions can be done by a compiler, as long as the compiler knows how instructions are executed.
How can we rearrange the order of instructions?
Heuristic approach: when we find an instruction A that needs to wait on instruction B:
– See if instruction B can be moved earlier.
– See if some later instructions can be moved ahead of instruction A.
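To apply this heuristic, a compiler first has to find the instructions that may wait. A minimal sketch, assuming a purely textual check: it flags an instruction when one of the max_gap lines just before it writes a register or address that the instruction reads. The names reads_writes and wait_candidates, and the default max_gap of 2, are assumptions made for this example. The value 2 comes from reading the five-step pipeline defined earlier: a producer three or more lines back has already completed its output save step by the time the consumer is ready to leave the decode step.

```python
def reads_writes(instr):
    """Return (reads, writes): the names an instruction takes input from and modifies."""
    op, *a = instr.split()
    if op == "add":               # add A B C
        return {a[0], a[1]}, {a[2]}
    if op == "addi":              # addi N A C
        return {a[1]}, {a[2]}
    if op in ("load", "store"):   # load address A / store A address
        return {a[0]}, {a[1]}
    if op == "if":                # if A line
        return {a[0]}, set()
    if op == "goto":              # goto line
        return set(), set()
    raise ValueError(f"unknown instruction: {op}")


def wait_candidates(program, max_gap=2):
    """List (waiting line, line waited on) pairs, using 1-based line numbers.

    Textual check only: it does not follow goto/if, so it can flag a pair
    that never overlaps in the pipeline at run time.
    """
    found = []
    for i, instr in enumerate(program):
        reads, _ = reads_writes(instr)
        # Look back at most max_gap lines, nearest line first.
        for j in range(i - 1, max(i - max_gap, 0) - 1, -1):
            _, writes = reads_writes(program[j])
            if reads & writes:
                found.append((i + 1, j + 1))
                break
    return found


ORIGINAL = [
    "load address2 R2", "load address1 R1", "if R1 6",
    "addi 20 R1 R3", "goto 7", "addi 10 R1 R3",
    "addi 5 R2 R4", "store R4 address10", "addi 30 R2 R5",
    "store R5 address11", "add R3 R2 R8", "store R8 address12",
]
print(wait_candidates(ORIGINAL))
# [(3, 2), (4, 2), (8, 7), (10, 9), (12, 11)]
```

The pair (4, 2) is a false alarm of the textual check: line 4 is flushed and fetched again only after the if on line 3 has finished, so it never actually waits on line 2. The other four pairs match the waits that appear in the trace of the original program.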


Reordering Instructions
What is the first instruction that has to wait?
– line 3 needs to wait on line 2.
What can we do for that case?
– Swap line 2 and line 1, so that line 2 happens earlier.


Reordering Instructions
What is another instruction that has to wait?
– line 8 needs to wait on line 7.
What can we do for that case?
– We can move line 9 and line 11 ahead of line 8.

Result of Reordering
line 1 (old 2): load address1 R1
line 2 (old 1): load address2 R2
line 3 (old 3): if R1 6
line 4 (old 4): addi 20 R1 R3
line 5 (old 5): goto 7
line 6 (old 6): addi 10 R1 R3
line 7 (old 7): addi 5 R2 R4
line 8 (old 9): addi 30 R2 R5
line 9 (old 11): add R3 R2 R8
line 10 (old 8): store R4 address10
line 11 (old 10): store R5 address11
line 12 (old 12): store R8 address12
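One way to gain confidence that this reordering keeps the meaning of the program is to run both versions on the same inputs and compare what they leave in memory. The sketch below does that with a small sequential interpreter; run and same_meaning are hypothetical helper names, and the range of input values tried is an arbitrary choice. Agreement on a finite set of inputs is evidence, not a proof.

```python
def run(program, memory):
    """Sequentially execute a toy-ISA program; return the final memory dict."""
    regs, mem, pc = {}, dict(memory), 1
    while 1 <= pc <= len(program):
        op, *a = program[pc - 1].split()
        pc += 1
        if op == "add":
            regs[a[2]] = regs[a[0]] + regs[a[1]]
        elif op == "addi":
            regs[a[2]] = int(a[0]) + regs[a[1]]
        elif op == "load":
            regs[a[1]] = mem[a[0]]
        elif op == "store":
            mem[a[1]] = regs[a[0]]
        elif op == "goto":
            pc = int(a[0])
        elif op == "if":
            if regs[a[0]] != 0:      # jump only when the register is NOT 0
                pc = int(a[1])
        else:
            raise ValueError(f"unknown instruction: {op}")
    return mem


ORIGINAL = [
    "load address2 R2", "load address1 R1", "if R1 6",
    "addi 20 R1 R3", "goto 7", "addi 10 R1 R3",
    "addi 5 R2 R4", "store R4 address10", "addi 30 R2 R5",
    "store R5 address11", "add R3 R2 R8", "store R8 address12",
]
# New line k holds old line OLD_LINE[k-1], as in the "(old N)" labels above.
OLD_LINE = (2, 1, 3, 4, 5, 6, 7, 9, 11, 8, 10, 12)
REORDERED = [ORIGINAL[i - 1] for i in OLD_LINE]


def same_meaning(p, q, values):
    """True if p and q leave identical memory for every tried input pair."""
    for a1 in values:
        for a2 in values:
            inputs = {"address1": a1, "address2": a2}
            if run(p, inputs) != run(q, inputs):
                return False
    return True


print(same_meaning(ORIGINAL, REORDERED, range(-3, 4)))   # True
print(run(REORDERED, {"address1": 0, "address2": 10}))
```

The branch targets 6 and 7 need no adjustment here because lines 3 through 7 stay in place in the reordered program.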

Pipeline trace of the reordered program, times 1 to 9 (X marks an empty step):
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
1     1      X       X              X          X            1
2     2      1       X              X          X            2
3     3      2       1              X          X            3
4     4      3       2              1          X            4
5     4      3       X              2          1            4   line 3 waits for line 1 to finish.
6     X      X       3              X          2            4   line 3 moves on. if detected. Stop fetching, flush line 4 from fetch step.
7     X      X       X              3          X            4
8     X      X       X              X          3            4
9     4      X       X              X          X            4   if has finished, PC does NOT change.

Pipeline trace of the reordered program, times 10 to 17:
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
10    5      4       X              X          X            5
11    6      5       4              X          X            6
12    X      X       5              4          X            X   goto detected. Stop fetching, flush line 6 from fetch step.
13    X      X       X              5          4            X
14    X      X       X              X          5            X
15    7      X       X              X          X            7   goto has finished, PC set to 7.
16    8      7       X              X          X            8
17    9      8       7              X          X            9

Pipeline trace of the reordered program, times 18 to 25:
Time  Fetch  Decode  Operand Fetch  ALU exec.  Output Save  PC  Notes
18    10     9       8              7          X            10
19    11     10      9              8          7            11
20    12     11      10             9          8            12
21    X      12      11             10         9            X
22    X      X       12             11         10           X
23    X      X       X              12         11           X
24    X      X       X              X          12           X
25    program execution has finished!
Execution took 24 clock ticks. Compare to 31 ticks for the original program.
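The two tick counts can be reproduced with a small simulator of the pipeline rules defined earlier. This is a sketch under stated assumptions, not the course's own tool: count_ticks and its helpers are hypothetical names, pc is modeled as "the next line to fetch", a flushed line is simply fetched again after the branch finishes, and each instruction's whole effect is applied when it reaches the output save step (which gives the same values here, because an instruction leaves decode only after every earlier writer of its inputs has finished).

```python
def reads_writes(instr):
    """Return (reads, writes): names an instruction takes input from and modifies."""
    op, *a = instr.split()
    if op == "add":
        return {a[0], a[1]}, {a[2]}
    if op == "addi":
        return {a[1]}, {a[2]}
    if op in ("load", "store"):
        return {a[0]}, {a[1]}
    if op == "if":
        return {a[0]}, set()
    return set(), set()                      # goto


def execute(instr, regs, mem):
    """Apply one instruction; return a jump target line, or None."""
    op, *a = instr.split()
    if op == "add":
        regs[a[2]] = regs[a[0]] + regs[a[1]]
    elif op == "addi":
        regs[a[2]] = int(a[0]) + regs[a[1]]
    elif op == "load":
        regs[a[1]] = mem[a[0]]
    elif op == "store":
        mem[a[1]] = regs[a[0]]
    elif op == "goto":
        return int(a[0])
    elif op == "if" and regs[a[0]] != 0:
        return int(a[1])
    return None


def is_branch(instr):
    return instr.split()[0] in ("goto", "if")


def count_ticks(program, memory):
    """Return (clock ticks used, final memory) for the five-step pipeline."""
    regs, mem = {}, dict(memory)
    st = [None] * 5      # line at Fetch, Decode, Operand Fetch, ALU, Output Save
    pc, fetching, ticks = 1, True, 0
    while True:
        new = [None] * 5
        if st[4] is not None:                # output save step completes
            instr = program[st[4] - 1]
            target = execute(instr, regs, mem)
            if is_branch(instr):
                fetching = True              # fetching resumes after a branch
                if target is not None:
                    pc = target
        new[4], new[3] = st[3], st[2]
        d, stalled = st[1], False
        if d is not None:
            reads, _ = reads_writes(program[d - 1])
            for ahead in (st[2], st[3]):     # earlier instructions not yet finished
                if ahead is not None:
                    writes = reads_writes(program[ahead - 1])[1]
                    stalled = stalled or bool(reads & writes)
        if stalled:
            new[1], new[0] = d, st[0]        # decode and fetch both hold
        else:
            new[2], f = d, st[0]
            if d is not None and is_branch(program[d - 1]):
                if f is not None:
                    pc = f                   # flushed line may be fetched again
                f, fetching = None, False
            new[1] = f
            if fetching and pc <= len(program):
                new[0], pc = pc, pc + 1
        if all(s is None for s in new):      # pipeline drained, nothing to fetch
            return ticks, mem
        st, ticks = new, ticks + 1


ORIGINAL = [
    "load address2 R2", "load address1 R1", "if R1 6",
    "addi 20 R1 R3", "goto 7", "addi 10 R1 R3",
    "addi 5 R2 R4", "store R4 address10", "addi 30 R2 R5",
    "store R5 address11", "add R3 R2 R8", "store R8 address12",
]
REORDERED = [ORIGINAL[i - 1] for i in (2, 1, 3, 4, 5, 6, 7, 9, 11, 8, 10, 12)]
inputs = {"address1": 0, "address2": 10}
print(count_ticks(ORIGINAL, inputs)[0])      # 31
print(count_ticks(REORDERED, inputs)[0])     # 24
```

With the inputs used in the traces (address1 contains 0, address2 contains 10), the sketch reports 31 ticks for the original order and 24 for the reordered one, matching the tables.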