CS 152 Computer Architecture & Engineering Andrew Waterman University of California, Berkeley Section 8 Spring 2010.



Mystery Die

Mystery Die: DEC Alpha 21264
M transistors
600 MHz in 350 nm
Highly speculative OoO superscalar
Die photo labels: Fetch, I$, D$, Bus

Alpha Pipeline

Branch Prediction
Two kinds of correlating branch predictors:
– Local: PC → Local History Table → Branch History Table
– Global: Global History → Branch History Table
The 21264 uses both! (tournament predictor)
– A Tournament Predictor selects between the local and global predictions

21264 Fetch
Line/way prediction keeps the fetch loop short

Alpha Pipeline

21264 Register Renaming
Registers are renamed, then instructions are inserted into the issue queue
Map table is backed up on every in-flight insn

21264 Register Renaming
What hazards does renaming obviate?
– WAR, WAW
In what situations is renaming useful?
– Code with ILP and name dependencies: loops
If you had to choose between branch prediction and renaming, which would you pick?
– Not much ILP within a basic block, so renaming isn’t too useful without branch prediction
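To see why renaming removes WAR and WAW hazards but keeps true (RAW) dependencies, here is a minimal sketch of a map-table renamer. The function name `rename`, the `r`/`p` register naming, and the unbounded supply of physical registers (no free list or reclamation) are simplifying assumptions, not details of the 21264.

```python
def rename(instructions, num_arch_regs=4):
    """Rename (dest, src1, src2) tuples of architectural registers.

    Hypothetical model: physical registers are never reclaimed, so every
    destination simply receives the next unused physical register.
    """
    # Initially architectural register rN lives in physical register pN.
    map_table = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}
    next_free = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Read sources through the current map first: this preserves RAW
        # dependencies on earlier instructions.
        s1, s2 = map_table[src1], map_table[src2]
        # A fresh destination means no later write can clobber a value that
        # an earlier instruction still needs (no WAR, no WAW).
        new_dest = f"p{next_free}"
        next_free += 1
        map_table[dest] = new_dest
        renamed.append((new_dest, s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # writes r1
    ("r2", "r1", "r1"),  # RAW on r1; WAR on r2 against the first insn
    ("r1", "r3", "r3"),  # WAW on r1 against the first insn
]
print(rename(program))
# [('p4', 'p2', 'p3'), ('p5', 'p4', 'p4'), ('p6', 'p3', 'p3')]
```

After renaming, the third instruction writes `p6` rather than the first instruction's `p4`, so it can execute without waiting for either earlier instruction; only the second instruction's read of `p4` remains as a real dependency.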

Alpha Pipeline

21264 Superscalar Execution
The 21264 can decode, rename, issue, execute, and commit 4 insns/cycle
How does circuit complexity scale with W in the following operations?
– Instruction decode: O(W)
– Register renaming: O(W²)
– Result bypassing: O(W²)
What about issue window complexity?
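One way to see where the quadratic terms come from is to count comparators and forwarding paths in a simple model. The counting rules below are assumptions for illustration (two source operands per instruction, every result forwardable to every operand input), not a description of the 21264's circuits.

```python
def rename_comparators(width, srcs_per_insn=2):
    """Intra-group dependency checks when renaming `width` insns per cycle.

    Each source of instruction i must be compared against the destination
    of every earlier instruction in the same rename group.
    """
    return srcs_per_insn * width * (width - 1) // 2


def bypass_paths(width, srcs_per_insn=2):
    """Forwarding paths if each of `width` results can feed every operand
    input of `width` functional units."""
    return width * width * srcs_per_insn


for w in (1, 2, 4, 8):
    print(w, rename_comparators(w), bypass_paths(w))
# 1 0 2
# 2 2 8
# 4 12 32
# 8 56 128
```

In this model, doubling W from 4 to 8 quadruples the bypass paths and more than quadruples the rename comparators, while decode hardware only doubles.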

21264 Superscalar Execution
The 21264 designers couldn’t fit full bypassing into one clock cycle
Instead, they fully bypass within each of two clusters; inter-cluster bypass takes another cycle

21264 Instruction Reordering
As mentioned earlier, the 21264 uses explicit renaming, as opposed to a data-in-ROB design
What does the ROB hold?

Memory Ordering in the 21264
To execute the critical instruction path quickly, we want to execute loads ASAP
Initially, loads speculatively bypass stores
On a misspeculation, set a “wait” bit for that load’s PC, so it will behave conservatively from then on
Clear wait bits periodically
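The wait-bit policy described on this slide can be sketched as a small PC-indexed table. The class name `LoadWaitTable`, the table size, and the indexing function are assumptions for illustration; the slide does not give them.

```python
class LoadWaitTable:
    """Hypothetical PC-indexed table of "wait" bits for loads."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.wait = [False] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions, so the low two PC bits carry no information.
        return (pc >> 2) % self.entries

    def may_bypass_stores(self, pc):
        """True if this load may issue ahead of older stores."""
        return not self.wait[self._index(pc)]

    def record_misspeculation(self, pc):
        """The load at `pc` bypassed a store it depended on: be conservative."""
        self.wait[self._index(pc)] = True

    def periodic_clear(self):
        """Forget old misspeculations so loads get another chance to speculate."""
        self.wait = [False] * self.entries


table = LoadWaitTable()
table.record_misspeculation(0x1000)
print(table.may_bypass_stores(0x1000))  # False
print(table.may_bypass_stores(0x1004))  # True
table.periodic_clear()
print(table.may_bypass_stores(0x1000))  # True
```

Clearing periodically matters because a single bit per entry cannot tell a load that always conflicts from one that conflicted once; without the clear, the table would drift toward making every load wait.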

Speculation in the 21264
What does the 21264 speculate on?
– Next I$ line/way
– Branches, indirect jumps
– Exceptions
– Load/Store ordering
– Load hit/miss (shortens hit time by a cycle)
– Anything else?