Modern Processor Design: Superscalar and Superpipelining


Modern Processor Design: Superscalar and Superpipelining
UCSD CSE 141, Larry Carter, Winter 2002
Note: Some of the material in this lecture is COPYRIGHT 1998 MORGAN KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED. Figures may be reproduced only for classroom or personal education use in conjunction with our text and only when the above line is included.
3/6/02 CSE 141 - Modern processors

Today’s processors
Not fundamentally different from the techniques we discussed, but:
- Deeper pipelines (superpipelining). Example: 20 stages in the Pentium 4.
- Pipelining is combined with:
  - Superscalar processing: issuing more than 1 instruction per cycle (3 or 4 is common).
  - Out-of-order execution: allowing instructions to jump ahead of others in line.
  - VLIW (very long instruction word): packaging instructions in groups that are always executed together.

Superscalar Execution
[Figure: pipeline diagram; multiple instructions flow through the IM, Reg, ALU, and DM stages together, with several instructions occupying the same stage in each cycle.]

Superscalar vs. superpipelined
- Superscalar: multiple instructions in the same stage; same clock rate as a scalar pipeline.
- Superpipelined: more total stages; faster clock rate.

A modest superscalar MIPS
- What can this machine do in parallel?
- What other logic is required?

Superscalar Execution
To execute, say, four instructions in the same cycle, we must find four independent instructions.
- In a VLIW processor, the compiler produces groups of four that are guaranteed to be independent.
- In an in-order superscalar processor, the hardware executes as many of the next instructions (up to four) as it can, making sure that they are independent.
- In an out-of-order processor, the hardware finds four (not necessarily consecutive) instructions that are independent.
What do you think are the tradeoffs?

Superscalar Scheduling
Assume the “modest superscalar processor” (in-order; can execute one R-type and one I-type instruction together):
lw  $6, 36($2)
add $5, $6, $4
lw  $7, 1000($5)
sub $9, $12, $8
sw  $5, 200($6)
add $3, $9, $9
and $11, $5, $6
When does each instruction begin execution?
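One way to work the question is to simulate the issue rules. Below is a minimal sketch (mine, not the lecture's official answer), assuming one memory-class (lw/sw) slot and one ALU-class slot per cycle, strictly in-order issue, and a forwarded result usable the cycle after its producer issues (load-use delays ignored):

```python
# A minimal in-order dual-issue scheduler sketch; the slot rules and the
# 1-cycle forwarding latency below are my assumptions, not the lecture's.

# (op, destination register or None, source registers)
instrs = [
    ("lw",  "$6",  ["$2"]),          # lw  $6, 36($2)
    ("add", "$5",  ["$6", "$4"]),    # add $5, $6, $4
    ("lw",  "$7",  ["$5"]),          # lw  $7, 1000($5)
    ("sub", "$9",  ["$12", "$8"]),   # sub $9, $12, $8
    ("sw",  None,  ["$5", "$6"]),    # sw  $5, 200($6): reads $5 and $6
    ("add", "$3",  ["$9", "$9"]),    # add $3, $9, $9
    ("and", "$11", ["$5", "$6"]),    # and $11, $5, $6
]

MEM_OPS = {"lw", "sw"}               # instructions using the I-type slot

def schedule(instrs):
    ready = {}                       # register -> first cycle it is readable
    out = []
    cycle, i = 1, 0
    while i < len(instrs):
        used = set()                 # issue slots taken this cycle
        while i < len(instrs):
            op, dst, srcs = instrs[i]
            slot = "mem" if op in MEM_OPS else "alu"
            if slot in used or any(ready.get(r, 0) > cycle for r in srcs):
                break                # in-order: nothing later may jump ahead
            used.add(slot)
            if dst:
                ready[dst] = cycle + 1   # forwarded, usable next cycle
            out.append((op, cycle))
            i += 1
        cycle += 1
    return out

for op, c in schedule(instrs):
    print(f"{op:3s} issues in cycle {c}")
```

Under these assumptions the two dependent lw/add pairs serialize the front of the sequence, while sub and the final add pair up with memory instructions; a realistic load-use delay would push the load-dependent instructions further out.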

Out-of-order Scheduling
Starts execution of an instruction as soon as all of its dependences are satisfied, even if prior instructions are stalled.
lw  $6, 36($2)
add $5, $6, $4
lw  $7, 1000($5)
sub $9, $12, $8
sw  $5, 200($6)
add $3, $9, $9
and $11, $5, $6
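The contrast with in-order issue shows up in a small dataflow sketch (an illustration, not the slide's answer): each instruction issues as soon as its sources are available. Assumptions, all mine: unlimited issue width, 1-cycle result latency, and register renaming so only RAW (true) dependences constrain issue.

```python
# A dataflow scheduling sketch: each instruction issues as soon as its
# source values are available, regardless of program order.

instrs = [
    ("lw",  "$6",  ["$2"]),
    ("add", "$5",  ["$6", "$4"]),
    ("lw",  "$7",  ["$5"]),
    ("sub", "$9",  ["$12", "$8"]),
    ("sw",  None,  ["$5", "$6"]),
    ("add", "$3",  ["$9", "$9"]),
    ("and", "$11", ["$5", "$6"]),
]

def ooo_schedule(instrs):
    ready = {}                        # register -> cycle its value arrives
    out = []
    for op, dst, srcs in instrs:
        # Earliest cycle all sources are available; registers never
        # written in this fragment are ready from cycle 1.
        c = max([ready.get(r, 1) for r in srcs] + [1])
        if dst:
            ready[dst] = c + 1
        out.append((op, c))
    return out

for op, c in ooo_schedule(instrs):
    print(f"{op:3s} can issue in cycle {c}")
```

Note how sub, which waits behind the lw/add chain on an in-order machine, issues immediately in cycle 1 here.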

PowerPC 604, Intel Pentium
Reservation stations hold decoded instructions waiting for needed values.
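A toy model of the idea (an illustration only, not either CPU's actual design): each reservation-station entry holds operand slots that are either real values or tags naming the in-flight producer; a completing instruction broadcasts its tag and value, and waiting entries capture it.

```python
# A toy reservation-station entry: each operand slot is either a real value
# or a tag naming the in-flight instruction that will produce it. When a
# producer completes, it broadcasts (tag, value); an entry may issue once
# both slots hold values.

class Entry:
    def __init__(self, op, src1, src2):
        self.op = op
        self.operands = [src1, src2]  # each: ("val", x) or ("tag", name)

    def capture(self, tag, value):
        # Snoop the result bus: a matching tag becomes the real value.
        self.operands = [("val", value) if o == ("tag", tag) else o
                         for o in self.operands]

    def ready(self):
        return all(kind == "val" for kind, _ in self.operands)

station = [
    Entry("add", ("tag", "lw1"), ("val", 7)),  # waits on load "lw1"
    Entry("sub", ("val", 3), ("val", 4)),      # can issue immediately
]
print([e.ready() for e in station])            # [False, True]
for e in station:
    e.capture("lw1", 42)                       # load completes, broadcast
print([e.ready() for e in station])            # [True, True]
```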

Pipelining -- Key Points
- Execution time = Number of instructions * CPI * cycle time.
- Data hazards and branch hazards prevent CPI from reaching 1.0, but forwarding and branch prediction get it pretty close.
- To improve performance we must reduce cycle time (superpipelining) or reduce CPI below one (superscalar and VLIW).
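The equation can be made concrete with some illustrative (made-up) numbers: the same billion-instruction program on a baseline, a superpipelined design (faster clock, worse CPI from extra hazards), and a superscalar design (CPI below 1 at the baseline clock).

```python
# Execution time = instructions * CPI * cycle time, with made-up numbers.

def exec_time(instructions, cpi, cycle_time_ns):
    return instructions * cpi * cycle_time_ns * 1e-9   # seconds

n = 1e9
base      = exec_time(n, cpi=1.2, cycle_time_ns=2.0)   # 2.4 s
superpipe = exec_time(n, cpi=1.5, cycle_time_ns=1.0)   # 1.5 s
superscal = exec_time(n, cpi=0.7, cycle_time_ns=2.0)   # 1.4 s
print(base, superpipe, superscal)
```

Both designs beat the baseline by attacking a different factor of the product, which is exactly the point of the slide.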

Computer(s) of the Day
A brief history of supercomputers:
- 70’s & 80’s: Vector computers (usually CRAY)
  - Set up a pipeline (e.g. y[i] = a*x[i] + b*y[i-1]); once it gets going, it delivers one result per cycle.
  - Very fast memories (usually SRAM).
- 90’s: Supercomputers based on “commodity processors”
  - SMPs (“Shared Memory” or “Symmetric MultiProcessor”): many processors sharing the same memory (address space), communicating by writing/reading common locations.
  - MPPs (“Massively Parallel Processors”, or “Multicomputers”): processors have separate memories and communicate via I/O (message passing).
Both styles are harder to program than vector machines, but give more FLOPS per dollar.

Computer(s) of the Day
And today we have ... the “SMP cluster”:
- Multiple processors sharing memory within a “node”.
- Multiple nodes communicating via message passing.
- The worst of both SMP and MPP (from the programmer’s perspective).
The ten fastest computers (on the “Top500” list, based on Linpack benchmark performance): 5 are IBM SPs, 2 are DEC Alpha clusters, 1 is a cluster of Intel Pentiums, 1 is an SGI cluster, and 1 is a Hitachi vector machine.

Computer(s) of the Day
IBM “Blue Horizon” at UCSD’s San Diego Supercomputer Center (SDSC):
- Installed 2/2000; 8th fastest in the world at the time (now 18th). Fastest available to academic scientists.
- Power3 processor: 375 MHz, 4 floating-point ops/cycle.
- 8 processors per node; 144 nodes → 1152 processors → 1.7 TeraFLOP peak.
- 4 (or more) GByte DRAM per node → 576 GB memory.
- 5.1 TeraBytes of disk storage.
- Used to simulate colliding galaxies, a beating heart, and chemical activity in the brain; to look for patterns in DNA; to factor large numbers; etc.
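The peak-performance and memory figures follow directly from the numbers on the slide:

```python
# Checking the slide's arithmetic: 144 nodes of 8 Power3 processors,
# 375 MHz, 4 floating-point ops per cycle, 4 GB DRAM per node.

nodes, procs_per_node = 144, 8
clock_hz, flops_per_cycle = 375e6, 4

processors = nodes * procs_per_node
peak_flops = processors * clock_hz * flops_per_cycle
memory_gb  = nodes * 4

print(processors)            # 1152
print(peak_flops / 1e12)     # 1.728, the slide's "1.7 TeraFLOP peak"
print(memory_gb)             # 576
```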