
Presentation transcript:

Superscalar - summary
- Superscalar machines have multiple functional units (FUs), e.g. 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store
- Requires a complex IFU
  - Able to issue multiple instructions per cycle (typically 4)
  - Able to detect hazards (unavailability of operands)
  - Able to re-order instruction issue
  - Aim to keep all the FUs busy
- Typically, 6-way superscalars can achieve instruction level parallelism of 2-3

SOFTENG 363 Computer Architecture Speculation & Branching

Speculation: High Tech Gambling?
- Data prefetch
  - Cache instruction dcbt: data cache block touch
  - Attempts to bring data into cache so that it will be “close” when needed
  - Allows the SIU to use idle bus bandwidth; if there’s no spare bandwidth, this read can be given low priority
  - Speculative because a branch may occur before the data is used: we speculate that this data may be needed
  - dcbt is a PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …

Speculation - General
- Some functional units are almost always idle
  - Make them do some (possibly useful) work rather than idle
  - If the speculation was incorrect, results are simply abandoned
  - No loss in efficiency; chance of a gain
- Researchers are actively looking at software prefetch schemes
  - Fetch data well before it’s needed
  - Reduce latency when it’s actually needed
- Speculative operations have low priority and use idle resources

Branching
- Expensive
  - 2-3 cycles lost in pipeline
  - All instructions following the branch are ‘flushed’
  - Bandwidth wasted fetching unused instructions
  - Stall while branch target is fetched
- We can speculate about the target of a branch
- Terminology
  - Branch Target: address to which the branch jumps
  - Branch Taken: control transfers to a non-sequential address (the target)
  - Branch Not Taken: the next sequential instruction is executed
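To put a number on "expensive", the lost cycles can be folded into an effective cycles-per-instruction figure. The sketch below is a back-of-envelope model, not something taken from the slides: the function name, the base CPI of 1.0 and the 20% branch fraction are all illustrative assumptions. Only the 2-3 cycle penalty comes from the slide.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Simple additive model: every mispredicted branch costs penalty_cycles.

    base_cpi        -- CPI with no branch stalls (assumed value)
    branch_fraction -- fraction of instructions that are branches (assumed value)
    mispredict_rate -- fraction of branches whose penalty is actually paid
    penalty_cycles  -- cycles lost per penalised branch (2-3 on the slide)
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 20% branches, every branch pays the full 3-cycle penalty.
no_prediction = effective_cpi(1.0, 0.20, 1.0, 3)
# Same workload, but only 1 branch in 10 is mispredicted.
with_prediction = effective_cpi(1.0, 0.20, 0.1, 3)

print(f"no prediction:   CPI = {no_prediction:.2f}")
print(f"with prediction: CPI = {with_prediction:.2f}")
```

Under these assumed figures, most of the branch cost disappears once most branches are predicted correctly, which is the motivation for prediction.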

Branching - Prediction
- Branches can be
  - unconditional: branch is always taken (call subroutine, return from subroutine)
  - conditional: branch depends on the state of the computation, e.g. has the loop terminated yet?
- Unconditional branches are simple
  - New instructions are fetched as soon as the branch is recognized
  - As early in the pipeline as possible
  - Branch units are often placed with the fetch & decode stages

Branching - Branch Unit
[Figure: PowerPC 603 logical layout]

Branching - Speculation
- We have the following code: if ( cond ) s1; else s2;
- Superscalar machine
  - Multiple functional units
  - Start executing both branches ( s1 and s2 )
  - Keep idle functional units busy!
  - One is speculative and will be abandoned
  - Processor will eventually calculate the branch condition and select which result should be retained (written back)
- MIPS R: up to 4 speculative at once

Branching - Speculation
- MIPS R: up to 4 speculative at once
- Instructions are “tagged” with a 4-bit mask
  - The mask indicates which branch instruction each instruction belongs to
- As soon as the condition is determined, mis-predicted instructions are aborted
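The tagging scheme can be sketched in software. This is a toy model under stated assumptions, not a description of the real hardware: the class `SpeculationTracker`, its methods and the `depends_on` bookkeeping are hypothetical names invented for illustration. Only the limit of 4 unresolved branches and the 4-bit mask come from the slide.

```python
class SpeculationTracker:
    """Toy model: each in-flight instruction carries a 4-bit mask with one
    bit per unresolved branch it was issued behind."""

    MAX_BRANCHES = 4  # up to 4 speculative branches at once (from the slide)

    def __init__(self):
        self.active = 0        # bitmask of currently unresolved branches
        self.depends_on = {}   # branch bit -> mask of older unresolved branches
        self.in_flight = []    # (instruction name, mask) pairs

    def predict_branch(self):
        """Allocate a mask bit for a new predicted branch.
        Returns the bit index, or None if all 4 bits are in use (a stall)."""
        for bit in range(self.MAX_BRANCHES):
            flag = 1 << bit
            if not self.active & flag:
                self.depends_on[bit] = self.active
                self.active |= flag
                return bit
        return None

    def issue(self, name):
        """Tag an instruction with every branch that is still unresolved."""
        self.in_flight.append((name, self.active))

    def resolve(self, bit, mispredicted):
        """Branch condition is now known: abort or confirm its instructions."""
        flag = 1 << bit
        if mispredicted:
            # Abort everything issued behind this branch ...
            self.in_flight = [(n, m) for n, m in self.in_flight if not m & flag]
            # ... including younger branches that were themselves speculative.
            for younger, deps in list(self.depends_on.items()):
                if deps & flag:
                    self.active &= ~(1 << younger)
                    del self.depends_on[younger]
        else:
            # Prediction was right: clear this bit everywhere.
            self.in_flight = [(n, m & ~flag) for n, m in self.in_flight]
            for other in self.depends_on:
                self.depends_on[other] &= ~flag
        self.active &= ~flag
        del self.depends_on[bit]


tracker = SpeculationTracker()
tracker.issue("add")               # not speculative: mask 0b0000
b0 = tracker.predict_branch()
tracker.issue("lw")                # behind branch 0: mask 0b0001
b1 = tracker.predict_branch()
tracker.issue("sub")               # behind branches 0 and 1: mask 0b0011
tracker.resolve(b0, mispredicted=True)
surviving = [name for name, _ in tracker.in_flight]
```

In this run only `add` survives: `lw` and `sub` both carry branch 0's bit, so a single mask test aborts them. That is the appeal of the bitmask, since no walk over the instruction stream is needed.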

Branching - Prediction
- We have a sequence of instructions:

  L1: add       ; some mixture of arithmetic, load, store, etc, instructions
      lw
      sub
      brne L1   ; branch on some condition
  L2: or        ; some more arithmetic, load, store, etc, instructions
      st

- If you were asked to guess which branch should be preferred, which would you choose:
  - Next sequential instruction ( L2 )?
  - Branch target ( L1 )?

Branching - Prediction
- Studies show that branches are taken most of the time!
- Because of loops:

  L1: add       ; any mix of arith,
      lw        ; load, store, etc,
      sub       ; instructions
      brne L1   ; branch back to loop start
  L2: or        ; some more arith,
      st        ; memory, etc instructions

Branching - Prediction Rule
- A simple prediction rule: take backward branches
  - Works amazingly well!
  - For a loop with n iterations, this is wrong in only 1/n of cases (the final pass, when the loop exits)
- A system working on this rule alone would detect the backward branch and start fetching from the branch target rather than the next instruction
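The 1/n claim is easy to check with a small simulation. The sketch below assumes a loop whose closing branch is taken on every pass except the last. The function names and the example addresses are made up for illustration.

```python
def predict_taken(branch_pc, target_pc):
    """Static rule from the slide: predict taken if the branch jumps backward."""
    return target_pc < branch_pc


def loop_mispredictions(n_iterations, branch_pc=0x10C, target_pc=0x100):
    """Count wrong guesses for a loop-closing branch executed n_iterations times.

    The addresses are arbitrary; all that matters is target_pc < branch_pc.
    """
    wrong = 0
    for i in range(n_iterations):
        actually_taken = i < n_iterations - 1   # final pass falls through
        if predict_taken(branch_pc, target_pc) != actually_taken:
            wrong += 1
    return wrong


n = 100
wrong = loop_mispredictions(n)
print(f"{wrong} misprediction(s) in {n} executions of the branch")
```

The only wrong guess is the loop exit, so the error rate falls as the trip count grows. A forward branch gets no benefit from this rule, which is one reason the static and dynamic schemes are layered on top of it.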

Branching - Improving the prediction
- Static prediction systems
  - Compiler can mark branches as likely to be taken or not
  - Instruction fetch unit will use the marking as advice on which instruction to fetch
  - Compiler is often able to give the right advice
    - Loops are easily detected
    - Other patterns in conditions can be recognized: checking for EOF when reading a file, error checking

Branching - Improving the prediction
- Dynamic prediction systems
  - Program history determines the most likely branch
  - Branch Target Buffers: another cache!

Branching - Branch Target Buffer
- Instruction address bits [11:3] select the BTB entry
- Tag determines “hit”
- Stats select taken/not taken
- R % prediction accuracy - SPEC’92 integer
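A minimal software model of such a buffer follows. The index field (address bits 11 down to 3, giving 512 entries) comes from the slide. Everything else is an assumption made for the sketch: the tag is taken to be the address bits above bit 11, the "stats" are modelled as a 2-bit saturating counter, and the class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: index = address bits [11:3]."""

    INDEX_BITS = 9  # bits 11..3 -> 512 entries

    def __init__(self):
        self.entries = [None] * (1 << self.INDEX_BITS)

    @staticmethod
    def _split(pc):
        index = (pc >> 3) & 0x1FF   # address bits [11:3]
        tag = pc >> 12              # assumed: remaining upper bits form the tag
        return index, tag

    def predict(self, pc):
        """Return the predicted target, or None to keep fetching sequentially."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            return None                       # miss
        if entry["counter"] >= 2:             # stats say "taken"
            return entry["target"]
        return None                           # hit, but predicted not taken

    def update(self, pc, taken, target):
        """Record the real outcome once the branch has been resolved."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            if taken:                         # allocate only for taken branches
                self.entries[index] = {"tag": tag, "target": target, "counter": 2}
            return
        if taken:
            entry["counter"] = min(3, entry["counter"] + 1)
            entry["target"] = target
        else:
            entry["counter"] = max(0, entry["counter"] - 1)


btb = BranchTargetBuffer()
first_guess = btb.predict(0x1008)             # cold buffer: miss
btb.update(0x1008, taken=True, target=0x1000)
second_guess = btb.predict(0x1008)            # now hits and predicts the target
alias_guess = btb.predict(0x2008)             # same index, different tag: miss
```

Because the lookup needs only the fetch address, it can run in parallel with instruction fetch, before the instruction has even been decoded as a branch. The tag check is what stops 0x2008 from picking up the entry that belongs to 0x1008.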