Superscalar Pipelines Part 2

Superscalar Pipelines Part 2 12/1/08

An example six-stage superscalar pipeline
The six stages are fetch, decode, dispatch, execute, complete, and retire. The execute stage can include multiple (pipelined) functional units of different types with different execution latencies. The dispatch stage distributes instructions of different types to their corresponding functional units. The complete stage reorders instructions to ensure in-order updating of the machine state.

Figure: example six-stage superscalar pipeline (k = 6 pipeline stages, width s = 7).

Fetch
Multiple instructions are fetched from the I-cache on each machine cycle. The I-cache line needs to be a multiple of the pipeline width s.

Fetch (continued)
To sustain the pipeline bandwidth, s instructions must be fetched on each clock cycle. Two problems get in the way: instruction misalignment, and instructions that change program flow (i.e., branches).
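Both fetch problems can be seen in a small model of one fetch cycle. This is a minimal sketch under assumed parameters: fixed 4-byte instructions, a line of `line_insns` instructions, and a set of taken-branch addresses known in advance. The function name `fetch_group` is hypothetical. A fetch group is cut short at the end of the I-cache line (misalignment) or right after a taken branch.

```python
# Assumption: fixed-length 4-byte instructions (RISC style).
INSN_BYTES = 4

def fetch_group(pc, s, line_insns, taken_branches):
    """Return the addresses fetched in one cycle, starting at pc.

    Fetch stops early at (1) the end of the current I-cache line, because a
    misaligned start leaves fewer than s instructions in the line, or
    (2) a taken branch, because the following sequential instructions are
    not on the executed path.
    """
    line_bytes = line_insns * INSN_BYTES
    line_end = (pc // line_bytes + 1) * line_bytes  # first byte past this line
    group = []
    addr = pc
    while len(group) < s and addr < line_end:
        group.append(addr)
        if addr in taken_branches:
            break  # the next instruction comes from the branch target
        addr += INSN_BYTES
    return group

# Width s = 4, line of 4 instructions (16 bytes).
print([hex(a) for a in fetch_group(0x100, 4, 4, set())])    # ['0x100', '0x104', '0x108', '0x10c']
print([hex(a) for a in fetch_group(0x108, 4, 4, set())])    # ['0x108', '0x10c']  (misaligned)
print([hex(a) for a in fetch_group(0x100, 4, 4, {0x104})])  # ['0x100', '0x104']  (taken branch)
```

With a line twice the fetch width (`line_insns=8`), the misaligned start at `0x108` still yields a full group of 4. This is one reason the line size is made a multiple of s.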

Decode
Identify individual instructions and determine their types. Detect inter-instruction dependences among instructions that have been fetched but not yet dispatched, and determine which instructions can be dispatched in parallel. This is complicated for s > 1. Decoding is much simpler for RISC than for CISC: CISC processors typically require multiple pipeline stages for decoding, and CISC instructions are translated into internal RISC-like instructions. Intel refers to these as µops (pronounced "you-ops"). Decode must also quickly identify control-flow-changing instructions and provide feedback to the fetch stage.
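The dependence check done in decode can be sketched for a single fetch group. This sketch assumes in-order dispatch and checks only true (read-after-write) dependences inside the group. It assumes WAR/WAW hazards are handled elsewhere, for example by register renaming. The `Insn` record and `parallel_prefix` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Insn:
    op: str
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read

def parallel_prefix(group):
    """Return how many leading instructions of the group can be dispatched
    together: stop at the first instruction that reads a register written
    by an earlier instruction in the same group (RAW dependence)."""
    written = set()
    for i, insn in enumerate(group):
        if any(reg in written for reg in insn.srcs):
            return i
        if insn.dest is not None:
            written.add(insn.dest)
    return len(group)

group = [
    Insn("add", "r1", ("r2", "r3")),
    Insn("sub", "r4", ("r5", "r6")),
    Insn("mul", "r7", ("r1", "r4")),  # reads r1 and r4, written just above
    Insn("or",  "r8", ("r9", "r9")),
]
print(parallel_prefix(group))  # 2
```

The `or` is independent, but with in-order dispatch it waits behind the dependent `mul`. The comparator count also grows roughly with s², which is part of why decode gets complicated for s > 1.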

Pentium Pro example

Dispatch
Different types of instructions are executed by different functional units. The decode stage identifies the instruction type, and the dispatch stage routes the instruction to the appropriate functional unit in the execute stage.
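Routing by type amounts to a lookup from the decoded instruction class to a functional-unit queue. This is a sketch only. The opcode classes and unit names (`ALU`, `MUL`, `LSU`, `BRU`, `FPU`) in `UNIT_FOR_TYPE` are hypothetical, not taken from any particular processor.

```python
from collections import deque

# Hypothetical mapping from instruction type (found by decode) to functional unit.
UNIT_FOR_TYPE = {
    "add": "ALU", "sub": "ALU", "and": "ALU",
    "mul": "MUL",
    "lw": "LSU", "sw": "LSU",
    "beq": "BRU",
    "fadd": "FPU", "fmul": "FPU",
}

def dispatch(decoded):
    """Route each decoded (opcode, text) pair to the queue in front of its
    functional unit, preserving program order within each queue."""
    queues = {unit: deque() for unit in set(UNIT_FOR_TYPE.values())}
    for opcode, text in decoded:
        queues[UNIT_FOR_TYPE[opcode]].append(text)
    return queues

q = dispatch([
    ("lw", "lw r1, 0(r2)"),
    ("add", "add r3, r1, r4"),
    ("fmul", "fmul f0, f1, f2"),
    ("sw", "sw r3, 4(r2)"),
])
print(list(q["LSU"]))  # ['lw r1, 0(r2)', 'sw r3, 4(r2)']
```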

Execution
The execution stage is the heart of a superscalar processor. The trend is toward more parallel and more diversified pipelines: more functional units, and more specialized functional units. Early scalar pipelined machines had only one functional unit (e.g., our MIPS pipeline), possibly with a separate functional unit for floating point. Current superscalar processors may have multiple integer units and multiple floating-point units.
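Because the functional units have different latencies, instructions leave the execute stage in a different order than they entered it. The sketch below shows this. It assumes fully pipelined units with made-up latencies in `LATENCY` (hypothetical values). It also ignores data dependences and structural conflicts, and ties are broken by program order.

```python
# Hypothetical latencies (cycles) for fully pipelined functional units.
LATENCY = {"ALU": 1, "LSU": 2, "MUL": 3, "FPU": 4}

def finish_order(issued):
    """issued: list of (name, unit, issue_cycle) in program order.
    Return the names in the order they leave their functional units."""
    finish = [(cycle + LATENCY[unit], idx, name)
              for idx, (name, unit, cycle) in enumerate(issued)]
    return [name for _, _, name in sorted(finish)]

issued = [("i1", "FPU", 0), ("i2", "ALU", 0), ("i3", "MUL", 1), ("i4", "ALU", 1)]
print(finish_order(issued))  # ['i2', 'i4', 'i1', 'i3']
```

This out-of-order finishing is exactly what the complete stage has to undo before the machine state is updated.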

Completion and retiring
An instruction is complete when it finishes execution and updates the machine state. An instruction finishes execution when it exits its functional unit and enters the completion buffer. When the instruction exits the completion unit, all of its register updates have been made and the instruction is architecturally complete. However, memory may still need to be written (for example, by a store). The instruction exits the retire unit once memory has been written. Instructions that do not update memory are retired as soon as they exit the completion unit.
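The completion buffer's job can be reduced to one rule: an instruction completes only after it has finished execution and every older instruction has completed. This is a minimal sketch of that rule. It assumes any number of instructions may complete in the same cycle, which a real machine would cap. The function name is hypothetical.

```python
def in_order_completion(program_order, finish_cycle):
    """program_order: instruction names, oldest first.
    finish_cycle: cycle each instruction entered the completion buffer.
    Return the cycle at which each instruction can complete in order."""
    complete = {}
    latest = 0
    for name in program_order:
        # Wait for this instruction to finish and for all older ones to complete.
        latest = max(latest, finish_cycle[name])
        complete[name] = latest
    return complete

finish = {"i1": 2, "i2": 5, "i3": 3, "i4": 6}
print(in_order_completion(["i1", "i2", "i3", "i4"], finish))
# {'i1': 2, 'i2': 5, 'i3': 5, 'i4': 6}
```

Here `i3` finishes at cycle 3 but sits in the buffer until the older `i2` completes at cycle 5.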

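Interrupts and exceptions
Interrupts are asynchronous external events. To handle one: stop fetching new instructions, allow the instructions already in the pipeline to finish, save the machine state, and transfer control to the interrupt service routine. Exceptions are induced by the execution of an instruction. Precise interrupts require that the machine state be saved as it was just prior to the excepting instruction: all earlier instructions have updated the state, and the excepting instruction and all later ones have not. This is complicated in a superscalar pipeline, where instructions finish execution out of order.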
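A precise exception can be sketched on top of an in-order completion buffer. Results older than the excepting instruction are committed, and the excepting instruction and everything younger are discarded. This holds even if the younger ones have already finished executing. This is a sketch under simplifying assumptions: each buffered instruction writes one register, and memory writes are ignored. The function name is hypothetical.

```python
def precise_state(arch_regs, buffer, faulting_index):
    """arch_regs: architectural registers before the buffered instructions.
    buffer: (dest, value) results in program order, possibly computed out of order.
    faulting_index: position of the instruction that raised the exception.
    Return the saved state and the index where execution restarts."""
    state = dict(arch_regs)
    for dest, value in buffer[:faulting_index]:
        state[dest] = value  # older instructions update the machine state
    # The faulting instruction and all younger results are discarded.
    return state, faulting_index

regs = {"r1": 0, "r2": 0, "r3": 0}
buffer = [("r1", 10), ("r2", 20), ("r3", 30)]  # r3's result may already be computed
print(precise_state(regs, buffer, 1))  # ({'r1': 10, 'r2': 0, 'r3': 0}, 1)
```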