ECE 456 Computer Architecture Lecture #14 – CPU (III) Instruction Cycle & Pipelining Instructor: Dr. Honggang Wang Fall 2013.


Dr. Wang Administrative Issues (Wednesday, Dec 4) Project –Report due Dec 9 –Presentation due 2:00 pm, Dec 9 –Order: Group 1, Group 2, Group 3, Group 4 Exam 2 review

Dr. Wang Review of Lecture #12 & 13 Machine instruction characteristics –constituent elements, instruction representation, instruction types, and number of addresses Instruction set design –types of operands –operation repertoire –Addressing modes (how is the operand address specified?): immediate, direct, indirect, register, register indirect, displacement (relative, base-register, indexing), stack –Instruction formats Little-, big-, and bi-endian (byte ordering, bit ordering)

Dr. Wang Topics Instruction cycle Instruction pipelining –Principle –Performance –Problems (L15) –Examples (L15)

Dr. Wang Instruction Cycle + Indirect Cycle (for indirect addressing operands)

Dr. Wang Instruction Cycle with Indirect Sub-Cycle

Dr. Wang Instruction Cycle State Diagram

Dr. Wang Data Flow in Each Cycle

Dr. Wang Data Flow (1: Fetch Cycle) –PC contains address of next instruction –Address moved to MAR –Address placed on address bus –Control unit requests a memory read –Result placed on data bus, copied to MBR, then to IR –Meanwhile PC incremented by 1
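The fetch-cycle transfers listed above can be sketched as register moves in Python. This is an illustrative model only: the register dictionary, the word-addressed list used as memory, and the instruction words are hypothetical and not taken from the lecture.

```python
def fetch_cycle(regs, memory):
    """Model one fetch cycle on a register dict and a word-addressed memory list."""
    regs["MAR"] = regs["PC"]           # address of the next instruction moved to MAR
    regs["MBR"] = memory[regs["MAR"]]  # memory read: the word on the data bus lands in MBR
    regs["PC"] += 1                    # PC incremented (in hardware this overlaps the read)
    regs["IR"] = regs["MBR"]           # instruction copied from MBR to IR
    return regs

memory = [0x1940, 0x5941, 0x2941]      # hypothetical instruction words
regs = {"PC": 0, "MAR": 0, "MBR": 0, "IR": 0}
fetch_cycle(regs, memory)
print(regs)
```

The model is sequential, so the PC increment is simply placed between the read and the IR load; real hardware performs it concurrently with the memory access.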

Dr. Wang Data Flow (2: Indirect Cycle) IR is examined If indirect addressing is used, an indirect cycle is performed –Rightmost N bits of MBR (the address reference) transferred to MAR –Control unit requests a memory read –Result (address of operand) moved to MBR
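A minimal sketch of the indirect cycle, assuming a hypothetical word layout in which the rightmost N = 12 bits of the instruction hold the address reference. The value of N, the memory contents, and the register names as dictionary keys are all assumptions made for illustration.

```python
ADDRESS_BITS = 12  # assumed value of N; the lecture does not fix it

def indirect_cycle(regs, memory):
    """Replace the address reference in MBR with the operand address it points to."""
    address_mask = (1 << ADDRESS_BITS) - 1
    regs["MAR"] = regs["MBR"] & address_mask  # rightmost N bits of MBR moved to MAR
    regs["MBR"] = memory[regs["MAR"]]         # memory read: operand address arrives in MBR
    return regs

memory = {0x940: 0x0123}                      # hypothetical: location 0x940 holds operand address 0x123
regs = {"MAR": 0, "MBR": 0x9940}              # MBR still holds the fetched instruction word
indirect_cycle(regs, memory)
print(hex(regs["MAR"]), hex(regs["MBR"]))
```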

Dr. Wang Data Flow (3: Execute Cycle) May take many forms Depends on instruction being executed May include –Register transfers –Memory read/write –Input/Output –ALU operations

Dr. Wang Data Flow (4: Interrupt Cycle) Simple & predictable Current PC saved to allow resumption after interrupt –Contents of PC copied to MBR –Special memory location (e.g. stack pointer) loaded to MAR –MBR written to memory PC loaded with address of ISR Next instruction (first of ISR) can be fetched
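The interrupt-cycle steps above can be modelled the same way. In this sketch the save location and the ISR entry address are hypothetical parameters; how a real processor chooses them (stack pointer, vector table) is outside what the slide specifies.

```python
def interrupt_cycle(regs, memory, save_address, isr_address):
    """Save the current PC to memory and redirect the PC to the ISR."""
    regs["MBR"] = regs["PC"]           # contents of PC copied to MBR
    regs["MAR"] = save_address         # special memory location loaded into MAR
    memory[regs["MAR"]] = regs["MBR"]  # MBR written to memory
    regs["PC"] = isr_address           # PC loaded with the address of the ISR
    return regs

memory = {}
regs = {"PC": 0x105, "MAR": 0, "MBR": 0}
interrupt_cycle(regs, memory, save_address=0xFF0, isr_address=0x200)
print(hex(regs["PC"]), hex(memory[0xFF0]))
```

After the call, the next fetch cycle would read the first instruction of the ISR, and the saved value at the save location allows the interrupted program to resume.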

Dr. Wang Agenda Instruction cycle –Fetch, indirect, execute, interrupt cycle –Data flow Instruction pipelining –Principle –Performance –Problems –Examples

Dr. Wang A Laundry Example Let us assume there are four steps to the weekly (monthly) laundry: 4 loads

Dr. Wang Do the Laundry Sequential: 4 loads take 16 cycles Pipelined: 4 loads take 7 cycles
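The 16-cycle and 7-cycle figures on this slide follow from simple counting, assuming each of the four laundry steps takes exactly one cycle (an assumption; the slide does not state step durations). A short sketch with hypothetical function names:

```python
def sequential_cycles(tasks, stages):
    """Each task runs all of its stages before the next task starts."""
    return tasks * stages

def pipelined_cycles(tasks, stages):
    """The first task needs `stages` cycles; each later task finishes one cycle after the previous."""
    return stages + (tasks - 1)

print(sequential_cycles(4, 4))  # 16
print(pipelined_cycles(4, 4))   # 7
```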

Dr. Wang Principles of Pipelining Tasks are subdivided into successive subtasks A pipeline stage is associated with each subtask The same amount of time is allocated to each subtask All pipeline stages operate like an assembly line; the 1st stage accepts input, the last stage delivers the output Basic pipeline is synchronous

Dr. Wang Instruction Pipelining A key, powerful technique for building fast CPUs An ‘assembly line’ in computing used for instruction processing; 6 stages of (nearly) equal duration –Fetch instruction (FI) –Decode instruction (DI) –Calculate operands, i.e. effective addresses (CO) –Fetch operands (FO) –Execute instruction (EI) –Write operand / result (WO) Multiple instructions are overlapped in execution

Dr. Wang Timing of Instruction Pipeline (1) Sequential: 54 cycles → Pipelined: 14 cycles

Dr. Wang Timing of Instruction Pipeline (2) Time progresses vertically down the figure Each row shows the state of the pipeline at a given point in time Pipeline is full at time 6 through 9 with different instructions in different stages
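The time-by-stage view described above can be regenerated with a short sketch. It assumes nine instructions, which is what the 54-cycle and 14-cycle figures imply for six stages, and it assumes no hazards. The function names are hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def pipeline_state(t, n, k=len(STAGES)):
    """Instruction number held by each stage at time t (1-based), or None if the stage is idle."""
    return [t - s if 1 <= t - s <= n else None for s in range(k)]

def full_times(n, k=len(STAGES)):
    """Times at which every stage holds an instruction."""
    return [t for t in range(1, n + k) if None not in pipeline_state(t, n, k)]

n = 9  # assumed instruction count
print("t   " + " ".join(name.rjust(3) for name in STAGES))
for t in range(1, n + len(STAGES)):  # times 1 .. k + n - 1
    cells = ["I%d" % i if i is not None else "-" for i in pipeline_state(t, n)]
    print(str(t).ljust(3) + " " + " ".join(cell.rjust(3) for cell in cells))
print(full_times(n))  # [6, 7, 8, 9]
```

Each printed row is the state of the pipeline at one point in time, with time increasing down the page, matching the orientation the slide describes.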

Dr. Wang Comments (1) Each instruction is assumed to go through all 6 stages of the pipeline –not always the case, e.g., no WO stage for ‘LOAD’ –the timing is set up this way to simplify the pipeline hardware Assume no potential hazards –data dependency, branch, interrupt

Dr. Wang Comments (2) Assumes no memory conflicts –Most memory systems don’t permit simultaneous accesses –But the desired value may be in cache, FO or WO may be a null stage, or separate instruction and data memories may be used → the pipeline is not slowed down much of the time


Dr. Wang Pipeline Performance (1) Cycle time (c) –the time available for each stage to accomplish the required operations –Determined by the worst-case processing time of the longest stage –Current pipelined processors: 2-20 ns
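A minimal sketch of how the longest stage sets the cycle time. The per-stage delays below are hypothetical numbers chosen for illustration, not measurements from the lecture, and latch or register overhead between stages is ignored.

```python
# Hypothetical worst-case delay of each stage, in nanoseconds (illustrative only).
stage_delays_ns = {"FI": 8, "DI": 5, "CO": 6, "FO": 10, "EI": 9, "WO": 7}

def cycle_time(delays):
    """The cycle time is set by the slowest stage."""
    return max(delays.values())

def idle_time(delays):
    """Time each stage spends waiting per cycle because it finishes before the slowest stage."""
    c = cycle_time(delays)
    return {stage: c - d for stage, d in delays.items()}

print(cycle_time(stage_delays_ns))  # 10
print(idle_time(stage_delays_ns))
```

With these numbers every stage except FO waits for part of each cycle, which is the cost of running unequal stages from one synchronous clock.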

Dr. Wang Pipeline Performance (2) Total time to execute n instructions: T = (k + (n-1)) * c –k: number of stages in the pipeline –c: cycle time –To complete the execution of the 1st instruction: k cycles –The remaining n-1 instructions require n-1 cycles
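The count on this slide (k cycles for the first instruction, then one more cycle for each of the remaining n-1) can be written as a small function. The name `total_time` and the parameter `c` for the cycle time are illustrative choices, and the sketch assumes a hazard-free pipeline.

```python
def total_time(k, n, c=1):
    """Time for n instructions on a k-stage pipeline with cycle time c, assuming no stalls."""
    return (k + (n - 1)) * c

def unpipelined_time(k, n, c=1):
    """Time when every instruction runs all k stages before the next one starts."""
    return n * k * c

# Six stages and nine instructions, counted in cycles (c = 1).
print(unpipelined_time(6, 9))  # 54
print(total_time(6, 9))        # 14
```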

Dr. Wang Pipeline Performance (3) Speedup factor –Compared to execution without pipelining: S = (n * k * c) / ((k + (n-1)) * c) = (n * k) / (k + n - 1) –The larger the number of pipeline stages, the larger the potential for speedup
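A short sketch of the speedup factor as the ratio of unpipelined to pipelined execution time, n*k / (k + n - 1); the cycle time cancels out. The function name is hypothetical, and the sketch assumes no hazards. It also shows why more stages mean more potential speedup: as n grows, the ratio approaches k.

```python
def speedup(k, n):
    """Ratio of unpipelined time (n*k cycles) to pipelined time (k + n - 1 cycles)."""
    return (n * k) / (k + (n - 1))

print(speedup(6, 10))  # 4.0
for n in (10, 100, 1000, 10000):
    print(n, round(speedup(6, n), 3))  # approaches k = 6 from below as n grows
```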

Dr. Wang Pipeline Performance (3) Speedup factor illustration (figure)

Dr. Wang Pipeline Performance (4) Throughput –Also called “repetition rate” –The shortest possible time interval between subsequent independent instructions in the pipeline –When the basic pipeline is full, one instruction completes every cycle, so the repetition rate is 1 cycle

Dr. Wang Hands-On Problem Suppose a simple 6-stage pipeline executes a basic block of 10 instructions. Assume the pipeline clock cycle time is 10 ns and there are no potential hazards (data / branch / interrupt). 1. What is the total time to execute this block of code? 2. What is the repetition rate of this pipeline for this basic block? 3. What is the speedup factor?

Dr. Wang Difficulties with Pipelining The stages are not of equal duration –use the worst-case processing time of the longest stage –faster stages must wait Data hazards due to Read-After-Write dependencies Conditional branch instructions could invalidate the instructions fetched behind them Interrupts could invalidate the fetched instructions

Dr. Wang Summary of Lecture #14 Instruction cycle (elaborated version) –Fetch, indirect, execute, interrupt cycle –Data flow Instruction pipelining –Principle: assembly line –Performance measures –Problems / difficulties introduction

Dr. Wang Things To Do Work on the project Check the class website for lecture notes

Dr. Wang Solution 1. Total time: T = (k + (n-1)) * c, where k = 6 is the number of stages in the pipeline, n = 10 is the number of instructions to be executed, and c = 10 ns is the clock cycle time, so T = (6 + 9) * 10 ns = 150 ns. 2. Repetition rate (also known as throughput): 1 cycle for this pipeline, i.e. 10 ns. 3. Speedup factor: the ratio of total execution time without pipelining to total execution time with pipelining. The total time without pipelining is n * k * c = 600 ns, so the speedup factor is s = 600 / 150 = 4.
