Chapter 13 Reduced Instruction Set Computers (RISC) Pipelining.



Pipelining Review
Pipelining:
- Break the instruction cycle into n phases (one stage per phase)
  - e.g. Fetch, Decode, ReadOPs, Execute1, Execute2, WriteBack
- Fetch a new instruction each phase
- Maximum speed gain is n
- Hazards reduce the ability to achieve a gain of n
  - Types of hazards
    - Resource
      - Hazard occurs when an instruction needs a resource being used by another instruction
    - Data
      - RAW (hazard if the read can occur before the write has finished)
      - WAR (hazard if the write can occur before the read has finished)
      - WAW (hazard if the writes occur in an unintended order)
    - Control
      - Hazard occurs when a wrong fetch decision at a branch results in an extra instruction fetch and a pipeline flush
- Stalling can always “fix” a hazard
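The claim that the maximum speed gain is n can be checked with a little arithmetic. The sketch below assumes an ideal pipeline (no hazards, all phases the same length) and at least one instruction; the function names are illustrative and do not come from the slides.

```python
def pipelined_time(n_stages: int, m_instructions: int) -> int:
    """Time units to run m instructions (m >= 1) on an ideal n-stage pipeline."""
    # The first instruction needs all n phases; each later instruction
    # completes one phase after the one before it.
    return n_stages + (m_instructions - 1)


def speedup(n_stages: int, m_instructions: int) -> float:
    """Unpipelined time (n * m) divided by the ideal pipelined time."""
    return (n_stages * m_instructions) / pipelined_time(n_stages, m_instructions)


if __name__ == "__main__":
    # With 6 phases the gain climbs toward 6 as the instruction count grows.
    for m in (1, 10, 1000):
        print(m, round(speedup(6, m), 2))
```

Under these assumptions the gain only approaches n for long instruction streams; any stall pushes it lower.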

Data Hazards
Read after Write (RAW) – true dependency
- A hazard occurs if the read occurs before the write is complete
  - e.g. Reg 1 <- Reg 1 + Reg 2 {write occurs after execution}
         Reg 3 <- Reg 1 - Reg 3 {read occurs before execution}
Write after Read (WAR) – anti-dependency
- A hazard occurs if the write occurs before the read happens
  - e.g. Reg <- M(ptr) {2 memory accesses – long read} {M(ptr) & M(pc) are the same location}
         M(pc) <- Reg {1 memory access – short write}
Write after Write (WAW) – output dependency
- A hazard occurs if the two writes occur in the reverse of the intended order
  - e.g. Reg A <- M(PTR) {2 memory accesses – long write}
         Reg A <- Reg B {0 memory accesses – short write}
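The three dependency types can be detected by comparing what each instruction reads and writes. The following is a minimal sketch, assuming each instruction has one destination and a set of sources; the `Instr` class and `dependencies` function are hypothetical names introduced for illustration. It reports dependencies only: whether one becomes an actual hazard depends on pipeline timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str          # register or memory location written
    srcs: frozenset    # registers or memory locations read


def dependencies(first: Instr, second: Instr) -> set:
    """Dependency types the later instruction `second` has on `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")   # second writes what first reads
    if first.dest == second.dest:
        found.add("WAW")   # both write the same place
    return found


if __name__ == "__main__":
    # Reg 1 <- Reg 1 + Reg 2, then Reg 3 <- Reg 1 - Reg 3
    i1 = Instr("R1", frozenset({"R1", "R2"}))
    i2 = Instr("R3", frozenset({"R1", "R3"}))
    print(dependencies(i1, i2))
```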

Control Hazard
Control hazards occur when a wrong fetch decision results in a new instruction fetch and the pipeline being flushed
Solutions include:
- Multiple pipeline streams
- Prefetching the branch target
- Using a loop buffer
- Branch prediction
- Delayed branch
- Reordering of instructions
- Multiple copies of registers (backups)

Recall Key Features of RISC
- Limited and simple instruction set
- Memory access instructions limited to memory <-> register transfers
- Operations are register to register
- Large number of general purpose registers (and use of compiler technology to optimize register use)
- Emphasis on optimising the instruction pipeline (& memory management)
- Hardwired for speed (no microcode)

Supporting Pipelining with Registers
Software contribution
- Require the compiler to allocate registers
  - Allocate based on the most used variables in a given time
    - Requires sophisticated program analysis
Hardware contribution
- Have more registers
  - Thus more variables will be in registers

Register uses
Store local scalar variables in registers
- Reduces memory accesses
Every procedure (function) call changes locality (typically lots of procedure calls are encountered)
- Parameters must be passed
- Partial context switch
- Results must be returned
- Variables from the calling program must be restored
- Partial context switch
Store global variables in registers?

Using “Register Windows”
Observations:
- Typically only a few local variables and passed parameters
- Typically a limited range of call depth
Implications:
- If we partition the register set, we can use multiple small sets of registers per context
- Let calls switch to a new set of registers
- Let returns switch back to the previously used set of registers

Using “Register Windows”
Partition the register set into:
- Parameter registers (passed parameters)
- Local registers (includes local variables)
- Temporary registers (passing parameters)
Then:
- Temporary registers from one set overlap parameter registers from the next
And:
- This provides parameter passing without moving data (just move one pointer)

Overlapping “Register Windows” Picture of Calls & Returns:

Circular Buffer diagram of Overlapping “Register Windows”

Operation of Circular Buffer
- When a call is made, a current window pointer is moved to show the currently active register window
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
- A saved window pointer indicates where the next saved window should be restored
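The call, spill, and restore behaviour described above can be sketched as a toy model. It is a simplification under stated assumptions: windows are treated as whole slots (the parameter/temporary overlap is ignored), every slot can hold an activation, and the class and attribute names are hypothetical rather than taken from any real architecture.

```python
class WindowFile:
    """Toy circular buffer of register windows (illustrative only)."""

    def __init__(self, n_windows: int):
        self.n = n_windows
        self.cwp = 0        # current window pointer: index of the active slot
        self.resident = 1   # windows currently held in the register file
        self.saved = []     # slots spilled to memory, oldest first
        self.spills = 0
        self.restores = 0

    def call(self) -> None:
        if self.resident == self.n:
            # All windows in use: save the oldest resident window to memory.
            oldest = (self.cwp - self.n + 1) % self.n
            self.saved.append(oldest)
            self.resident -= 1
            self.spills += 1
        self.cwp = (self.cwp + 1) % self.n
        self.resident += 1

    def ret(self) -> None:
        if self.resident == 1:
            # The caller's window was spilled earlier: bring it back.
            if not self.saved:
                raise RuntimeError("return without a matching call")
            self.saved.pop()
            self.restores += 1
        else:
            self.resident -= 1
        self.cwp = (self.cwp - 1) % self.n


if __name__ == "__main__":
    wf = WindowFile(4)
    for _ in range(6):
        wf.call()
    for _ in range(6):
        wf.ret()
    print(wf.spills, wf.restores)
```

With call depth staying within the number of windows, no memory traffic occurs at all, which is the case the slides' observation about limited call depth relies on.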

Global Variables
How should we accommodate global variables?
- Allocate by the compiler to memory?
- Have a static set of registers for global variables?
- Put them in cache?

Registers v Cache – which is better?
Large Register File                           | Cache
All local scalars                             | Recently-used local scalars
Individual variables                          | Blocks of memory
Compiler-assigned global variables            | Recently-used global variables
Save/Restore based on procedure nesting depth | Save/Restore based on cache replacement algorithm
Register addressing                           | Memory addressing

Referencing a Scalar - Window Based Register File

Referencing a Scalar - Cache

Compiler Based Register Optimization
Basis:
- Assuming a relatively small number of registers (16-32)
- Optimizing their use is given to the compiler
- HLL programs have no explicit references to registers
Then:
- Assign a symbolic, or virtual, register to each candidate variable
- Map (unlimited) symbolic registers to (limited) real registers
- Symbolic registers that are not used at the same time can share real registers
- If you run out of real registers, some variables will use memory

Graph Coloring Algorithm for Register Assignment
Given:
- A graph of nodes and edges
- Nodes represent symbolic registers
- Two symbolic registers that are used in the same program fragment are joined by an edge
Then:
- Assign a color to each node
- Adjacent nodes (connected by an edge) must have different colors
- Assign a minimum number of colors
And then:
- Try to color the graph with n colors, where n is the number of real registers
- Nodes that cannot be colored must be placed in memory
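The steps above can be sketched with a simple greedy heuristic. This is not guaranteed to find a minimum coloring (that problem is hard in general); it only illustrates the idea of coloring with n real registers and sending uncolorable nodes to memory. The function name, the highest-degree-first ordering, and the example graph are assumptions made for illustration.

```python
def color_registers(interference: dict, n_regs: int):
    """Greedy coloring of an interference graph.

    interference maps each symbolic register to the set of symbolic
    registers it shares a program fragment with. Returns (assignment,
    spilled): a map from symbolic register to real register number, and
    the list of symbolic registers that must live in memory.
    """
    assignment = {}
    spilled = []
    # Heuristic: handle the most constrained (highest-degree) nodes first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for node in order:
        used = {assignment[nb] for nb in interference[node] if nb in assignment}
        free = [r for r in range(n_regs) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)   # no color left: place in memory
    return assignment, spilled


if __name__ == "__main__":
    # Hypothetical graph: A, B, C interfere pairwise; D interferes with C.
    graph = {
        "A": {"B", "C"},
        "B": {"A", "C"},
        "C": {"A", "B", "D"},
        "D": {"C"},
    }
    print(color_registers(graph, 3))
    print(color_registers(graph, 2))
```

In the example, D can share a real register with A because the two are never joined by an edge, which is the sharing the slides describe.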

Graph Coloring Algorithm Example

RISC Features Again
Key features
- Large number of general purpose registers (and use of compiler technology to optimize register use)
- Limited and simple instruction set
- Memory access instructions limited to memory <-> register transfers
- Operations are register to register
- Emphasis on optimising the instruction pipeline & memory management
- Hardwired for speed (no microcode)

Memory to Memory vs Register to Memory Operations (RISC uses only Register to memory)
Lab Project 1 -> Actually these numbers are bits, not bytes

RISC Pipelining Basics
Define two phases of execution for register based instructions
- I: Instruction fetch
- E: Execute
  - ALU operation with register input and output
For load and store there will be three
- I: Instruction fetch
- E: Execute
  - Calculate memory address
- D: Memory
  - Register to memory or memory to register operation
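As a baseline for judging any pipelined version, the phase counts above give the time for purely sequential execution. The sketch below assumes every phase takes one time unit and that a branch is treated like a register instruction (I and E only); the instruction kinds and the sample program are hypothetical.

```python
# Phases per instruction kind, following the I / E / D definitions above.
# Treating "branch" as a two-phase instruction is an assumption.
PHASES = {
    "reg": ("I", "E"),
    "branch": ("I", "E"),
    "load": ("I", "E", "D"),
    "store": ("I", "E", "D"),
}


def sequential_time(program) -> int:
    """Time units when each instruction finishes before the next is fetched."""
    return sum(len(PHASES[kind]) for kind in program)


if __name__ == "__main__":
    program = ["load", "load", "reg", "store", "branch"]
    print(sequential_time(program))
```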

Effects of RISC Pipelining
- (Allows 2 memory accesses per stage)
- (E1: register read; E2: execute & register write. Particularly beneficial if the E phase is long)
- (2 stage since E and D are effectively one stage)

Optimization of RISC Pipelining
Delayed branch
- Uses a branch that does not take effect until after the following instruction has executed
- The position of that following instruction is the delay slot
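The effect of the delay slot on control flow can be sketched with a toy interpreter. It tracks only which instructions run, not register contents, and it does not handle a jump placed in a delay slot; the opcodes, the `run` function, and the two sample programs are assumptions made for illustration.

```python
def run(program, delayed: bool, max_steps: int = 100):
    """Return the opcodes executed, in order, for a toy program.

    program is a list of (opcode, argument) pairs; for "JUMP" the
    argument is the target index. With delayed=True the instruction
    after a JUMP (the delay slot) executes before control transfers.
    """
    trace = []
    pc = 0
    pending = None  # jump target waiting for the delay slot to finish
    while pc < len(program) and len(trace) < max_steps:
        op, arg = program[pc]
        trace.append(op)
        next_pc = pc + 1
        if pending is not None:
            # The delay-slot instruction has just run: now take the jump.
            next_pc, pending = pending, None
        if op == "JUMP":
            if delayed:
                pending = arg
            else:
                next_pc = arg
        pc = next_pc
    return trace


if __name__ == "__main__":
    normal = [("LOAD", None), ("ADD", None), ("JUMP", 4),
              ("SUB", None), ("STORE", None)]
    # Same work, with ADD moved into the delay slot after the JUMP.
    reordered = [("LOAD", None), ("JUMP", 4), ("ADD", None),
                 ("SUB", None), ("STORE", None)]
    print(run(normal, delayed=False))
    print(run(reordered, delayed=True))
```

Both runs execute the same useful instructions and skip SUB; in the delayed version the slot is filled with real work instead of being wasted, which is the point of reordering.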

Normal vs Delayed Branch (Text diagram is wrong)