Instruction-Level Parallelism and Superscalar Processors, by Kevin Morfin



What are superscalar processors?
A superscalar processor is a processor that uses multiple independent instruction pipelines. Each pipeline consists of multiple stages, so each pipeline can handle multiple instructions at one time.

- A superscalar processor fetches multiple instructions at a time and attempts to find nearby instructions that are independent of one another and can therefore execute in parallel.
- The superscalar approach can be used in either a RISC or a CISC architecture.
- Superscalar design is now the standard method for implementing high-performance microprocessors.
- There are multiple functional units, each implemented as a pipeline, which support parallel execution of instructions.

What is instruction-level parallelism?
Superscalar processors exploit what is called instruction-level parallelism: the degree to which the instructions of a program can be executed in parallel.

Superpipelining
An alternative to superscalar machines is superpipelining, which runs more than one pipeline stage per clock cycle. The base pipeline used for comparison issues only one instruction per clock cycle and has four stages. In a superpipelined version, the function performed in each stage is split into two or more nonoverlapping parts; with two parts, each part executes in half a clock cycle.
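The comparison between the base pipeline, a superscalar pipeline, and a superpipelined one can be made concrete with a little arithmetic. The sketch below uses the usual idealized timing model (no dependencies, no stalls, every stage takes one base cycle). The four stages come from the slide; the instruction count of 6 and the degree of 2 are assumptions chosen only for illustration.

```python
def base_cycles(n, k):
    """Base cycles for n instructions on a k-stage pipeline issuing one per cycle."""
    return k + (n - 1)


def superscalar_cycles(n, k, m):
    """m instructions issued together each cycle (assumes n is a multiple of m)."""
    return k + (n - m) / m


def superpipelined_cycles(n, k, m):
    """Each stage split into m parts, so one instruction issues every 1/m base cycle."""
    return k + (n - 1) / m


if __name__ == "__main__":
    n, k = 6, 4  # 6 instructions is an assumed workload; 4 stages as on the slide
    print(base_cycles(n, k))               # 9
    print(superscalar_cycles(n, k, 2))     # 6.0
    print(superpipelined_cycles(n, k, 2))  # 6.5
```

Under these assumptions both techniques finish well ahead of the base pipeline, and the superpipelined machine trails the superscalar one slightly because its first instructions cannot all start at time zero.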

Limitations to parallelism
- True data dependency
- Procedural dependency
- Resource conflict
- Output dependency
- Antidependency

True data dependency
A true data dependency exists when an instruction needs data produced by a previous instruction in order to execute. The second instruction is delayed for as many clock cycles as are required to remove the dependency.
Example: consider the following code.
    ADD EAX, ECX   ; load register EAX with the contents of ECX plus the contents of EAX
    MOV EBX, EAX   ; load EBX with the contents of EAX
The MOV cannot execute until the ADD has produced the new value of EAX.
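The delay caused by a true data dependency can be sketched with a toy scheduler. The model below is an illustration only: it assumes unlimited functional units, considers nothing except true dependencies, and takes instruction latencies as given. The `Instr` tuple, the function name, and the one-cycle default latency are all hypothetical choices, not taken from any real processor.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read
    latency: int = 1        # assumed cycles until the result is available


def earliest_start_cycles(program):
    """Earliest cycle each instruction can start if only true dependencies matter."""
    ready = {}   # register -> cycle at which its newest value becomes available
    starts = []
    for ins in program:
        # An instruction must wait until every source register has been produced.
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        starts.append(start)
        ready[ins.dest] = start + ins.latency
    return starts


if __name__ == "__main__":
    dependent = [
        Instr("ADD EAX, ECX", "EAX", ("EAX", "ECX")),
        Instr("MOV EBX, EAX", "EBX", ("EAX",)),
    ]
    independent = [
        Instr("ADD EAX, ECX", "EAX", ("EAX", "ECX")),
        Instr("MOV EBX, EDX", "EBX", ("EDX",)),
    ]
    print(earliest_start_cycles(dependent))    # [0, 1]
    print(earliest_start_cycles(independent))  # [0, 0]
```

In the dependent pair the MOV has to wait one cycle for EAX; when the MOV reads a different register (EDX here, an invented variation), both instructions can start in the same cycle.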

Procedural dependencies
A procedural dependency arises when the instruction stream contains a branch. The instructions following the branch have a procedural dependency on the branch and cannot be executed until the branch is executed.
A second type of procedural dependency arises when variable-length instructions are used. Because the length of an instruction is not known in advance, the instruction must be at least partially decoded before the following instruction can be fetched.
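The variable-length case can be illustrated with a made-up encoding. In the sketch below, the low two bits of an instruction's first byte give its length minus one; this encoding is purely hypothetical (it is not x86 or any real instruction set) and exists only to show why instruction boundaries have to be discovered one after another.

```python
def instruction_starts(code: bytes):
    """Return the byte offset of each instruction in a hypothetical encoding.

    Assumed encoding: length in bytes = (first byte & 0b11) + 1.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        length = (code[pc] & 0b11) + 1  # partial decode of the current instruction
        pc += length                    # only now is the next start address known
    return starts


if __name__ == "__main__":
    stream = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC])
    print(instruction_starts(stream))  # [0, 3, 4]
```

Each loop iteration depends on the length decoded in the previous one, which is the serialization that makes fetching several variable-length instructions at once harder than fetching fixed-length ones.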

Resource conflict
A resource conflict is a competition between two or more instructions for the same resource at the same time (memories, caches, buses, register-file ports, functional units). It behaves somewhat like a data dependency, but unlike a true data dependency it can be overcome by duplicating resources.

Output dependency
Consider the following code.
    I1: R3 ← R3 op R5
    I2: R4 ← R3 + 1
    I3: R3 ← R5 + 1
    I4: R7 ← R3 op R4
There is no true data dependency between I1 and I3, but both write R3. If I3 completes before I1, the wrong value of R3 will be fetched for I4, so I3 must complete after I1.

Antidependency
Consider the same code again.
    I1: R3 ← R3 op R5
    I2: R4 ← R3 + 1
    I3: R3 ← R5 + 1
    I4: R7 ← R3 op R4
The constraint in an antidependency is similar to that of a true data dependency, but reversed: instead of the first instruction producing a value that the second instruction uses, the second instruction destroys a value that the first instruction uses. Here I3 must not complete before I2 has fetched its operands, because I3 overwrites R3, which I2 reads.
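The three register-based dependency types can be found mechanically by comparing the registers each instruction reads and writes. The following is a minimal sketch, assuming each instruction is described as a hypothetical `(name, dest, srcs)` tuple; it reports a dependency between two instructions only when no instruction between them rewrites the register in question.

```python
def find_dependencies(program):
    """Return (earlier, later, kind, register) tuples for (name, dest, srcs) entries."""
    found = []
    for i, (name_i, dest_i, srcs_i) in enumerate(program):
        for j in range(i + 1, len(program)):
            name_j, dest_j, srcs_j = program[j]

            def rewritten_between(reg):
                # True if some instruction strictly between i and j writes reg.
                return any(program[k][1] == reg for k in range(i + 1, j))

            # True (read-after-write): j reads what i wrote.
            if dest_i in srcs_j and not rewritten_between(dest_i):
                found.append((name_i, name_j, "true", dest_i))
            # Output (write-after-write): both write the same register.
            if dest_i == dest_j and not rewritten_between(dest_i):
                found.append((name_i, name_j, "output", dest_i))
            # Anti (write-after-read): j overwrites a register that i reads.
            if dest_j in srcs_i and not rewritten_between(dest_j):
                found.append((name_i, name_j, "anti", dest_j))
    return found


if __name__ == "__main__":
    program = [
        ("I1", "R3", ("R3", "R5")),
        ("I2", "R4", ("R3",)),
        ("I3", "R3", ("R5",)),
        ("I4", "R7", ("R3", "R4")),
    ]
    for dep in find_dependencies(program):
        print(dep)
```

Run on the four-instruction sequence, the sketch reports the output dependency between I1 and I3 and the antidependency between I2 and I3, along with the true dependencies I1 to I2, I2 to I4, and I3 to I4. It also lists an antidependency from I1 to I3, since I1 reads R3 before I3 overwrites it; that constraint is already implied by the output dependency between the same pair.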

Pentium 4
The original Pentium had a modest superscalar component that consisted of two separate integer execution units. The Pentium Pro introduced a full-blown superscalar design.

Pentium 4 operation
1. The processor fetches instructions from memory in the order of the static program.
2. Each instruction is translated into one or more fixed-length RISC instructions known as micro-operations.
3. The processor executes the micro-operations on a superscalar pipeline organization, which allows them to execute out of order.
4. Finally, the processor commits the result of each micro-operation to the processor's register set in the order of the original program flow.
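The combination of out-of-order execution and in-order result commitment can be illustrated with a toy model. This sketch is not the Pentium 4's actual mechanism: it assumes every micro-operation is independent, all of them start at cycle 0, names are unique, and each has a made-up latency. It only shows that results may finish in one order yet be committed in program order.

```python
def simulate(micro_ops):
    """micro_ops: list of (name, latency) in program order.

    Returns (completion_order, commit_log), where commit_log holds
    (cycle, name) pairs in the order results reach the register set.
    """
    # Out-of-order completion: shortest latency finishes first
    # (sorted() is stable, so ties keep program order).
    completion_order = [name for name, _ in sorted(micro_ops, key=lambda op: op[1])]

    # In-order commit: the oldest uncommitted micro-op must finish before
    # anything younger is allowed to update the register set.
    commit_log = []
    head = 0
    cycle = 0
    while head < len(micro_ops):
        cycle += 1
        while head < len(micro_ops) and micro_ops[head][1] <= cycle:
            commit_log.append((cycle, micro_ops[head][0]))
            head += 1
    return completion_order, commit_log


if __name__ == "__main__":
    ops = [("load", 3), ("add", 1), ("mul", 2)]  # hypothetical micro-ops and latencies
    completed, committed = simulate(ops)
    print(completed)  # ['add', 'mul', 'load']
    print(committed)  # [(3, 'load'), (3, 'add'), (3, 'mul')]
```

In this example the add and the multiply finish early but wait behind the slower load, so all three results are committed at cycle 3 in the original program order.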

Pipeline used by the Pentium 4
The Pentium 4 has an outer CISC shell and an inner RISC core. The micro-operations pass through a pipeline with at least 20 stages.

The ARM Cortex-A8
The ARM Cortex-A8 is a RISC-based superscalar design. It implements a 13-stage pipeline.

References
Stallings, W. (2010). Computer Organization and Architecture. Upper Saddle River, NJ: Prentice Hall.
Stallings, W. (2010). Computer Organization and Architecture. Retrieved November 20, 2010, from William Stallings' website: