Chapter 13 Reduced Instruction Set Computers (RISC) Pipelining.


Pipelining Review
Pipelining:
- Break the instruction cycle into n phases (one stage per phase)
  - e.g. Fetch, Decode, ReadOPs, Execute1, Execute2, WriteBack
- Fetch a new instruction each phase
- Maximum speed gain is n
- Hazards reduce the ability to achieve a gain of n
  - Types of hazards:
    - Resource: occurs when an instruction needs a resource being used by another instruction
    - Data:
      - RAW (hazard if the read can occur before the write has finished)
      - WAR (hazard if the write can occur before the read is finished)
      - WAW (hazard if the writes occur in an unintended order)
    - Control: occurs when a wrong fetch decision at a branch results in an extra instruction fetch and a pipeline flush
- Stalling can always "fix" a hazard
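The "maximum speed gain is n" claim can be checked with a little arithmetic. The sketch below assumes an ideal pipeline (equal-length phases, no hazards or stalls); the function names are illustrative and not from the slides.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles for an ideal pipeline: fill the pipe once, then finish one instruction per cycle."""
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Speed gain over running every instruction through all stages one at a time."""
    unpipelined = num_instructions * num_stages
    return unpipelined / pipeline_cycles(num_instructions, num_stages)


# With 6 stages the gain approaches, but never reaches, 6.
for count in (1, 10, 1000):
    print(count, round(speedup(count, 6), 3))
```

For a single instruction there is no gain at all; the gain only approaches n as the instruction stream grows, and any stall pushes it lower.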

Data Hazards
Read after Write (RAW): true dependency
- A hazard occurs if the read occurs before the write is complete
  - e.g. Reg 1 ← Reg 1 + Reg 2 {write occurs after execution}
         Reg 3 ← Reg 1 - Reg 3 {read occurs before execution}
Write after Read (WAR): anti-dependency
- A hazard occurs if the write occurs before the read happens
  - e.g. Reg ← M(ptr) {2 memory accesses, long read} {M(ptr) & M(pc) are the same location}
         M(pc) ← Reg {1 memory access, short write}
Write after Write (WAW): output dependency
- A hazard occurs if the two writes occur in the reverse of the intended order
  - e.g. Reg A ← M(PTR) {2 memory accesses, long write}
         Reg A ← Reg B {0 memory accesses, short write}
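The three dependency types can be told apart by comparing the read and write sets of two instructions in program order. This is a minimal sketch: the `Instr` class and `dependencies` function are hypothetical names, and the code only finds dependencies. Whether one becomes an actual hazard depends on the pipeline's timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    writes: frozenset  # locations the instruction writes
    reads: frozenset   # locations the instruction reads


def dependencies(first: Instr, second: Instr) -> set:
    """Classify the dependencies of `second` on `first` (program order: first, then second)."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")  # second reads what first writes
    if first.reads & second.writes:
        found.add("WAR")  # second overwrites what first still has to read
    if first.writes & second.writes:
        found.add("WAW")  # both write the same location
    return found


# The RAW example from the slide: R1 <- R1 + R2, then R3 <- R1 - R3.
add = Instr(writes=frozenset({"R1"}), reads=frozenset({"R1", "R2"}))
sub = Instr(writes=frozenset({"R3"}), reads=frozenset({"R1", "R3"}))
print(dependencies(add, sub))
```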

Control Hazard
Control hazards occur when a wrong fetch decision results in a new instruction fetch and the pipeline being flushed.
Solutions include:
- Multiple pipeline streams
- Prefetching the branch target
- Using a loop buffer
- Branch prediction
- Delayed branch
- Reordering of instructions
- Multiple copies of registers (backups)

Recall Key Features of RISC
- Limited and simple instruction set
- Memory access instructions limited to transfers between memory and registers (load/store)
- Operations are register to register
- Large number of general purpose registers (and use of compiler technology to optimize register use)
- Emphasis on optimising the instruction pipeline (& memory management)
- Hardwired for speed (no microcode)

Supporting Pipelining with Registers
Software contribution
- Require the compiler to allocate registers
  - Allocate based on the most used variables in a given time
    - Requires sophisticated program analysis
Hardware contribution
- Have more registers
  - Thus more variables will be in registers

Register uses
Store local scalar variables in registers
- Reduces memory accesses
Every procedure (function) call changes locality (typically lots of procedure calls are encountered)
- Parameters must be passed
  - Partial context switch
- Results must be returned
- Variables from the calling program must be restored
  - Partial context switch
Store global variables in registers?

Using "Register Windows"
Observations:
- Typically only a few local variables and passed parameters
- Typically a limited range of call depth
Implications:
- If we partition the register set, we can use multiple small sets of registers, one per context
- Let calls switch to a new set of registers
- Let returns switch back to the previously used set of registers

Using "Register Windows"
Partition the register set into:
- Parameter registers (passed parameters)
- Local registers (includes local variables)
- Temporary registers (passing parameters)
Then:
- Temporary registers from one set overlap the parameter registers of the next
And:
- This provides parameter passing without moving data (just move one pointer)
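The overlap can be made concrete by mapping each window's logical registers onto one physical register file. This is a sketch under assumed sizes (6 parameter, 10 local, 6 temporary registers, 8 windows); the sizes and the `physical_index` helper are illustrative and do not come from the slides.

```python
PARAMS = 6                 # assumed number of parameter registers per window
LOCALS = 10                # assumed number of local registers per window
TEMPS = PARAMS             # temporaries must match the next window's parameters
NUM_WINDOWS = 8            # assumed number of windows in the circular file
STRIDE = PARAMS + LOCALS   # physical registers each new window advances by
FILE_SIZE = NUM_WINDOWS * STRIDE


def physical_index(window: int, kind: str, i: int) -> int:
    """Map (window, register kind, index) to a physical register number."""
    offsets = {"param": 0, "local": PARAMS, "temp": PARAMS + LOCALS}
    limits = {"param": PARAMS, "local": LOCALS, "temp": TEMPS}
    if not 0 <= i < limits[kind]:
        raise IndexError(f"{kind} register {i} out of range")
    # The modulo makes the file circular: the last window overlaps the first.
    return (window * STRIDE + offsets[kind] + i) % FILE_SIZE


# Temporary 2 of window 0 is the same physical register as parameter 2 of window 1.
print(physical_index(0, "temp", 2), physical_index(1, "param", 2))
```

Because the caller's temporaries and the callee's parameters share physical registers, a call only has to change the window number. No register contents are copied.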

Overlapping “Register Windows” Picture of Calls & Returns:

Circular Buffer diagram of Overlapping “Register Windows”

Operation of Circular Buffer
- When a call is made, a current window pointer is moved to show the currently active register window
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
- A saved window pointer indicates where the next saved window should be restored
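The call, return, and overflow behaviour can be simulated with a few counters. This is a simplified sketch: it tracks how many windows are resident and how many have been saved to memory, rather than modelling the two pointers themselves, and the `WindowFile` class and its `capacity` parameter are assumptions for illustration.

```python
class WindowFile:
    """Counts window overflows and underflows for a sequence of calls and returns."""

    def __init__(self, capacity: int):
        self.capacity = capacity  # procedure activations that fit in registers (assumed)
        self.resident = 1         # the running program starts with one window
        self.saved = 0            # windows currently saved in memory
        self.overflows = 0
        self.underflows = 0

    def call(self) -> None:
        if self.resident == self.capacity:
            # All windows in use: interrupt, save the oldest window to memory.
            self.saved += 1
            self.resident -= 1
            self.overflows += 1
        self.resident += 1

    def ret(self) -> None:
        if self.resident == 1 and self.saved == 0:
            raise RuntimeError("return without a matching call")
        self.resident -= 1
        if self.resident == 0:
            # The caller's window was saved earlier: restore it from memory.
            self.saved -= 1
            self.resident = 1
            self.underflows += 1


wf = WindowFile(capacity=4)
for _ in range(5):
    wf.call()
for _ in range(5):
    wf.ret()
print(wf.overflows, wf.underflows)
```

Memory traffic occurs only when the call depth swings past the capacity, which is why a limited range of call depth makes the scheme effective.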

Global Variables
How should we accommodate global variables?
- Have the compiler allocate them to memory?
- Have a static set of registers for global variables?
- Put them in cache?

Registers v Cache: which is better?
Large Register File                           | Cache
All local scalars                             | Recently-used local scalars
Individual variables                          | Blocks of memory
Compiler-assigned global variables            | Recently-used global variables
Save/Restore based on procedure nesting depth | Save/Restore based on cache replacement algorithm
Register addressing                           | Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization
Basis:
- Assume a relatively small number of registers (16-32)
- Optimizing their use is left to the compiler
- HLL programs have no explicit references to registers
Then:
- Assign a symbolic, or virtual, register to each candidate variable
- Map the (unlimited) symbolic registers to the (limited) real registers
- Symbolic registers that are not used at the same time can share a real register
- If you run out of real registers, some variables will use memory
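Deciding which symbolic registers are "used at the same time" can be sketched with live ranges. The example below assumes each symbolic register is live over one closed interval of instruction indices; that representation, the `interference_edges` name, and the sample ranges are all hypothetical simplifications.

```python
from itertools import combinations


def interference_edges(live_ranges: dict) -> set:
    """Return pairs of symbolic registers whose live ranges overlap."""
    edges = set()
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(
        sorted(live_ranges.items()), 2
    ):
        # Two closed intervals overlap when each starts before the other ends.
        if a_start <= b_end and b_start <= a_end:
            edges.add((a, b))
    return edges


# Hypothetical live ranges as (first use, last use) instruction indices.
ranges = {"A": (0, 3), "B": (2, 5), "C": (4, 7), "D": (6, 9)}
print(sorted(interference_edges(ranges)))
```

Here A and C never overlap, so they could share one real register, while A and B could not.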

Graph Coloring Algorithm for Register Assignment
Given:
- A graph of nodes and edges
- Nodes represent symbolic registers
- Two symbolic registers that are live during the same program fragment are joined by an edge
Then:
- Assign a color to each node
- Adjacent nodes (those connected by an edge) must have different colors
- Use the minimum number of colors
And then:
- Try to color the graph with n colors, where n is the number of real registers
- Nodes that cannot be colored must be placed in memory
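A small greedy version of this procedure is sketched below. It is a heuristic, not the slide's exact algorithm: it colors the most-connected nodes first and does not guarantee the minimum number of colors. The `color_registers` name and the sample graph are hypothetical.

```python
def color_registers(nodes, edges, num_real):
    """Greedily assign real registers 0..num_real-1; return (assignment, spilled)."""
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    assignment, spilled = {}, []
    # Visit nodes with the most neighbors first (an assumed ordering heuristic).
    for node in sorted(nodes, key=lambda n: (-len(neighbors[n]), n)):
        used = {assignment[m] for m in neighbors[node] if m in assignment}
        free = [r for r in range(num_real) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)  # no color left: this variable lives in memory
    return assignment, spilled


nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(color_registers(nodes, edges, 3))
print(color_registers(nodes, edges, 2))
```

With three real registers every node gets a color; with two, the triangle A-B-C cannot be colored and one of its nodes is placed in memory.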

Graph Coloring Algorithm Example

RISC Features Again
Key features
- Large number of general purpose registers (and use of compiler technology to optimize register use)
- Limited and simple instruction set
- Memory access instructions limited to transfers between memory and registers (load/store)
- Operations are register to register
- Emphasis on optimising the instruction pipeline & memory management
- Hardwired for speed (no microcode)

Memory to Memory vs Register to Memory Operations
(RISC uses only Register to Memory)
Lab Project 1
Note: these numbers are actually bits, not bytes

RISC Pipelining Basics
Define two phases of execution for register-based instructions
- I: Instruction fetch
- E: Execute
  - ALU operation with register input and output
For load and store there will be three
- I: Instruction fetch
- E: Execute
  - Calculate memory address
- D: Memory
  - Register to memory or memory to register operation
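As a baseline for the pipelined timings, the phase definitions above give the cost of running a program with no overlap at all. This sketch assumes every phase takes one time unit and that a branch uses the two-phase I, E pattern; the `PHASES` table and the sample program are illustrative.

```python
# Phases per instruction kind. Treating a branch as I, E is an assumption.
PHASES = {
    "alu": ("I", "E"),
    "branch": ("I", "E"),
    "load": ("I", "E", "D"),
    "store": ("I", "E", "D"),
}


def sequential_time(program) -> int:
    """Time units needed when each instruction finishes before the next one starts."""
    return sum(len(PHASES[kind]) for kind in program)


program = ["load", "load", "alu", "store", "branch"]
print(sequential_time(program))
```

Pipelining aims to bring the total closer to one instruction per time unit by overlapping the I phase of one instruction with the E or D phase of the previous one.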

Effects of RISC Pipelining
- Allows 2 memory accesses per stage
- E1: register read; E2: execute & register write (particularly beneficial if the E phase is long)
- 2-stage, since E and D are effectively one stage

Optimization of RISC Pipelining
Delayed branch
- Uses a branch that does not take effect until after the following instruction has executed
- The position of that following instruction is the delay slot
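The effect of a delay slot can be seen in a toy interpreter that only records which instructions execute. This is a minimal sketch: the `run` function, the tuple format, and the four-instruction program are hypothetical, and it assumes the delay slot never holds another jump.

```python
def run(program, delayed: bool):
    """Return the labels executed; program items are (label, op, target)."""
    executed = []
    pc = 0
    pending_target = None  # jump target waiting for the delay slot to finish
    while pc < len(program):
        label, op, target = program[pc]
        executed.append(label)
        next_pc = pc + 1
        if pending_target is not None:
            # The delay slot has just executed, so the earlier jump takes effect.
            next_pc = pending_target
            pending_target = None
        if op == "jump":
            if delayed:
                pending_target = target  # effective after the next instruction
            else:
                next_pc = target         # normal branch: effective immediately
        pc = next_pc
    return executed


program = [
    ("load", "op", None),
    ("jump", "jump", 3),
    ("add", "op", None),   # sits in the delay slot
    ("store", "op", None),
]
print(run(program, delayed=False))
print(run(program, delayed=True))
```

With a normal branch the `add` is skipped; with a delayed branch it still executes, so a compiler can move a useful instruction into the slot instead of wasting the fetch.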

Normal vs Delayed Branch (Text diagram is wrong)
