Computer Organization and Architecture Instruction-Level Parallelism and Superscalar Processors.

Outline
What is a superscalar architecture?
Superpipelining
Features of superscalar architectures
Data dependencies
Policies for parallel instruction execution
Register renaming

Performance Improvement Two approaches are commonly used today to improve performance beyond simple pipelining: superpipelining and superscalar execution.

Superpipelining Many pipeline stages perform less than half a clock cycle's worth of work. Doubling the internal clock speed therefore allows two tasks to be performed per external clock cycle. Superpipelining divides the stages of a pipeline into substages, increasing the number of instructions that the pipeline holds at any given moment.

Superpipelining By dividing each stage into two, the clock period t is reduced to half, t/2; hence, at maximum capacity, the pipeline produces a result every t/2 seconds. For a given architecture and its corresponding instruction set there is an optimal number of pipeline stages; increasing the number of stages beyond this limit reduces performance. A solution for further improving speed is the superscalar architecture.
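Under ideal assumptions (no hazards, perfect stage splitting), the timing argument above can be sketched numerically. The formulas below are the standard ideal-pipeline expressions, not taken from the slides:

```python
# Ideal execution time (in base clock cycles) for n instructions, ignoring hazards.
# Assumed formulas: a k-stage pipeline needs k + (n - 1) cycles; splitting every
# stage into `split` substages divides the cycle time by `split`, so the
# superpipelined time measured in *original* cycle units is (split*k + n - 1)/split.

def pipelined_time(n, k):
    """Cycles to run n instructions through a k-stage pipeline."""
    return k + (n - 1)

def superpipelined_time(n, k, split=2):
    """Same workload with each stage divided into `split` substages,
    measured in original clock-cycle units."""
    return (split * k + (n - 1)) / split

n, k = 100, 5
print(pipelined_time(n, k))        # 104 base cycles
print(superpipelined_time(n, k))   # 54.5 base-cycle equivalents
```

The gap narrows as the fill/drain overhead (the k term) grows relative to n, which is one reason over-deep pipelines stop paying off.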

Pipelined, Superpipelined and Superscalar Execution

What is a Superscalar Architecture? A superscalar architecture is one in which several instructions can be initiated simultaneously and executed independently. Pipelining also allows several instructions to be in execution at the same time, but they must be in different pipeline stages at any given moment. Superscalar architectures include all the features of pipelining but, in addition, several instructions can be executing simultaneously in the same pipeline stage.

Superscalar Architectures Superscalar architectures allow several instructions to be issued and completed per clock cycle A superscalar architecture consists of a number of pipelines that are working in parallel Depending on the number and kind of parallel units available, a certain number of instructions can be executed in parallel.

General Superscalar Organization

Superscalar Architecture

Limitations on Parallel Execution The situations that prevent instructions from being executed in parallel by a superscalar architecture are very similar to those that prevent efficient execution on any pipelined architecture. The consequences of these situations are more severe for superscalar architectures than for simple pipelines, because the potential for parallelism is greater and thus a greater opportunity is lost. Parallel execution is limited by:
Resource conflicts
Procedural dependency (control dependency)
Data dependency

Resource Conflicts Resource conflicts occur when two or more instructions compete for the same resource (register, memory, functional unit) at the same time; they are similar to the structural hazards discussed for pipelines. By introducing several parallel pipelined units, superscalar architectures reduce some of the possible resource conflicts.

Control Dependency The presence of branches creates major problems in sustaining optimal parallelism. If instructions are of variable length, they cannot be fetched and issued in parallel, since an instruction must be decoded before the following one can be identified and fetched. Superscalar techniques are therefore most efficiently applicable to RISC architectures, with their fixed instruction length and format.

Data Conflicts Data conflicts are produced by dependencies. Because superscalar architectures provide great freedom in the order in which instructions can be issued and completed, data dependencies have to be considered with much attention. There are three types of data dependencies:
True data dependency
Output dependency
Antidependency
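The three dependency types can be detected mechanically from each instruction's destination and source registers. The sketch below is illustrative (the tuple representation is an assumption, not from the slides); it uses the common hazard names RAW, WAW and WAR:

```python
# Classify the dependencies of a later instruction on an earlier one.
# Each instruction is represented as (destination, set_of_sources).

def classify(earlier, later):
    """Return the set of dependencies of `later` on `earlier`."""
    dst1, src1 = earlier
    dst2, src2 = later
    deps = set()
    if dst1 in src2:
        deps.add("true (RAW)")    # later reads what earlier writes
    if dst1 == dst2:
        deps.add("output (WAW)")  # both write the same location
    if dst2 in src1:
        deps.add("anti (WAR)")    # later overwrites an operand earlier reads
    return deps

# mul r1, r2, r3  followed by  add r4, r1, r5  -> true dependency on r1
print(classify(("r1", {"r2", "r3"}), ("r4", {"r1", "r5"})))
```

Only the true (RAW) dependency reflects actual dataflow; the other two are naming conflicts, which motivates register renaming later in these slides.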

Dependencies

True Data Dependency A true data dependency exists when the output of one instruction is required as an input to a subsequent instruction (for example, an add that consumes r1 must wait for an earlier multiply that produces r1).

True Data Dependency True data dependencies are intrinsic features of the user program; they cannot be eliminated by compiler or hardware techniques. The simplest solution is to stall the adder until the multiplier has finished. To avoid stalling the adder, the compiler or hardware can find other instructions for the adder to execute until the result of the multiplication is available.

Output Dependency An output dependency exists if two instructions write into the same location; if the second instruction writes before the first one, an error occurs: a later reader of that location obtains the stale value of the first write instead of the most recent one.

Antidependency An antidependency exists if an instruction uses a location as an operand while a following instruction writes into that location; if the first one is still using the location when the second one writes into it, an error occurs: the first instruction reads the new value instead of the old one it needed.

Output Dependencies and Antidependencies

Solution: Additional Registers

Policies for Parallel Instruction Execution (1)

Policies for Parallel Instruction Execution (2)

Policies for Parallel Instruction Execution (3)

Execution Policies In-order issue with in-order completion. In-order issue with out-of-order completion. Out-of-order issue with out-of-order completion.

Example We consider the following superscalar architecture:
Two instructions can be fetched and decoded at a time
Three functional units can work in parallel: a floating-point unit, an integer adder, and an integer multiplier
Two instructions can be written back (completed) at a time, and results are written back in program order

Example

In-Order Issue with In-Order Completion

To guarantee in-order completion, issuing temporarily stalls when there is a conflict for a functional unit or when a functional unit requires more than one cycle.

In-Order Issue with In-Order Completion The processor does not have to worry about output dependencies and antidependencies; it only has to detect true data dependencies. Neither of the other two dependency types can be violated if instructions are issued and completed in order.

In-Order Issue with In-Order Completion
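The stalling policy can be sketched in a toy scheduler. The instruction tuples, unit latencies, and two-wide issue below are illustrative assumptions in the spirit of the example machine, not the slides' exact figures:

```python
# Toy in-order issue scheduler: issue width 2; an instruction stalls if its
# functional unit is busy (structural hazard) or a source register is still
# being produced (true dependency), and everything behind it stalls too.
# Each instruction is (name, unit, latency, dst, srcs).

def schedule(program, width=2):
    issue_cycle = {}
    unit_free = {}   # cycle at which each functional unit becomes free
    reg_ready = {}   # cycle at which each register value is available
    cycle, i = 1, 0
    while i < len(program):
        issued = 0
        while issued < width and i < len(program):
            name, unit, lat, dst, srcs = program[i]
            if unit_free.get(unit, 0) > cycle:                   # structural hazard
                break
            if any(reg_ready.get(s, 0) > cycle for s in srcs):   # RAW hazard
                break
            issue_cycle[name] = cycle
            unit_free[unit] = cycle + lat
            if dst:
                reg_ready[dst] = cycle + lat
            issued += 1
            i += 1
        cycle += 1
    return issue_cycle

prog = [
    ("i1", "mult", 3, "r1", ["r2", "r3"]),
    ("i2", "add",  1, "r4", ["r1", "r5"]),   # true dependency on i1 (r1)
    ("i3", "add",  1, "r6", ["r7", "r8"]),   # independent, but stalls behind i2
]
print(schedule(prog))   # {'i1': 1, 'i2': 4, 'i3': 5}
```

Note how i3, though independent, cannot issue until i2 has issued: that lost opportunity is exactly what out-of-order issue recovers.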

In-Order Issue with Out-of-Order Completion

Out-of-Order Issue with Out-of-Order Completion
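An out-of-order issue policy can be sketched as follows; the machine parameters and instruction tuples are illustrative assumptions. For brevity the sketch checks only true dependencies, so it implicitly assumes output and antidependencies have been removed (e.g., by register renaming):

```python
# Toy out-of-order issue scheduler: every cycle, any waiting instruction whose
# operands are ready and whose functional unit is free may issue (up to the
# issue width), regardless of program order.
# Each instruction is (name, unit, latency, dst, srcs).

def schedule_ooo(program, width=2):
    issue_cycle, unit_free, reg_ready = {}, {}, {}
    waiting = list(program)
    cycle = 1
    while waiting:
        issued = 0
        for instr in list(waiting):          # scan all waiting instructions
            if issued == width:
                break
            name, unit, lat, dst, srcs = instr
            if unit_free.get(unit, 0) > cycle:                   # unit busy
                continue
            if any(reg_ready.get(s, 0) > cycle for s in srcs):   # RAW hazard
                continue
            issue_cycle[name] = cycle
            unit_free[unit] = cycle + lat
            if dst:
                reg_ready[dst] = cycle + lat
            waiting.remove(instr)
            issued += 1
        cycle += 1
    return issue_cycle

prog = [
    ("i1", "mult", 3, "r1", ["r2", "r3"]),
    ("i2", "add",  1, "r4", ["r1", "r5"]),   # must wait for i1's r1
    ("i3", "add",  1, "r6", ["r7", "r8"]),   # independent: issues immediately
]
print(schedule_ooo(prog))   # {'i1': 1, 'i3': 1, 'i2': 4}
```

With in-order issue the independent i3 would have stalled behind i2; here it issues in the first cycle, illustrating the extra parallelism this policy exposes.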

Register Renaming
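The renaming idea can be sketched as follows. This is an illustrative sketch with an assumed representation (each instruction as a destination plus a source list), not a processor's actual mechanism: every write to an architectural register is allocated a fresh physical register, and each source is mapped to the latest physical name for that register.

```python
import itertools

def rename(program):
    """Rename architectural destinations to fresh physical registers p1, p2, ...
    Removes WAR and WAW hazards, leaving only true (RAW) dependencies."""
    new_name = (f"p{i}" for i in itertools.count(1))
    mapping = {}                     # architectural -> current physical register
    renamed = []
    for dst, srcs in program:
        srcs = [mapping.get(s, s) for s in srcs]   # read latest physical names
        mapping[dst] = next(new_name)              # fresh register for each write
        renamed.append((mapping[dst], srcs))
    return renamed

# r3 := r3 op r5 ; r4 := r3 + 1 ; r3 := r5 + 1 ; r7 := r3 op r4
prog = [("r3", ["r3", "r5"]), ("r4", ["r3"]), ("r3", ["r5"]), ("r7", ["r3", "r4"])]
for line in rename(prog):
    print(line)
```

After renaming, the two writes to r3 go to different physical registers (p1 and p3), so the output dependency and antidependency on r3 disappear; only the true dependencies remain to constrain issue order.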

Comments

Some Architectures PowerPC 604: six independent execution units (a branch execution unit, a load/store unit, three integer units, and a floating-point unit); in-order issue. The PowerPC 620 adds out-of-order issue to the 604 design.

Some Architectures Pentium: three independent execution units (two integer units and a floating-point unit); in-order issue. The Intel P6 adds out-of-order issue to the Pentium design.