Superscalar SMIPS Processor Andy Wright Leslie Maldonado.


Project Goals
N-way superscalar execution
– Up to N instructions can be issued every cycle
– N execution pipelines will share a single data memory
IPC > 1
– Shows that superscalar execution is working

Background
Data Hazards
Control Hazards
Structural Hazards
– An instruction can't be issued if it needs to use the same hardware as another instruction at the same time
– Relevant for: Data Memory, Redirect FIFO, Coprocessor

System Overview

Instruction Memory
Needs to be able to output N words
Normal Instruction Memory

Instruction Memory Read from Address 0

Instruction Memory Read from Address

Instruction Memory
Read from address 5
Unaligned accesses need permutations
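
The unaligned read can be sketched in Python. This is a minimal model under assumptions the slides do not state: the memory is split into N banks, addresses are word addresses, word a lives in bank a % N at row a // N, and reads past the end return 0. The class and method names are hypothetical. Each bank is read once, and the raw bank outputs are then rotated so the requested word comes out first; that rotation is the permutation.

```python
class BankedInstructionMemory:
    """Hypothetical N-bank memory: word address a lives in bank a % n, row a // n."""

    def __init__(self, words, n):
        self.n = n
        # Pad with zeros so every bank has the same number of rows.
        padded = list(words) + [0] * (-len(words) % n)
        self.banks = [padded[b::n] for b in range(n)]

    def read(self, addr):
        """Return the n consecutive words starting at word address addr."""
        n = self.n
        offset = addr % n      # position of the first word within its row
        base_row = addr // n
        raw = []
        for b in range(n):
            # Banks below the offset hold words that wrapped into the next row.
            row = base_row + (1 if b < offset else 0)
            bank = self.banks[b]
            raw.append(bank[row] if row < len(bank) else 0)
        # Permute (rotate) the bank outputs so the word at addr comes first.
        return raw[offset:] + raw[:offset]
```

With N = 4, a read from address 0 needs no rotation, while a read from address 5 takes one word from each of rows 1 and 2 and rotates the bank outputs by one position.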

Instruction Memory

Instruction Memory

Superscalar Fifo
Needs to be able to enqueue N instructions per cycle
Needs to be able to dequeue 1-N instructions per cycle
Architecture similar to instruction memory
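
A behavioral sketch of the FIFO interface in Python. It models only the enqueue and dequeue widths, not the banked hardware; the class name, the method names, and the capacity parameter are assumptions made for illustration.

```python
from collections import deque


class SuperscalarFifo:
    """Hypothetical FIFO that moves up to n instructions per call (one call = one cycle)."""

    def __init__(self, n, capacity):
        self.n = n
        self.capacity = capacity
        self.items = deque()

    def can_enq(self, count):
        # At most n per cycle, and only if there is room for all of them.
        return count <= self.n and len(self.items) + count <= self.capacity

    def enq(self, instrs):
        if not self.can_enq(len(instrs)):
            raise ValueError("too many instructions or FIFO full")
        self.items.extend(instrs)

    def first(self, count):
        """Peek at up to count of the oldest instructions without removing them."""
        count = min(count, self.n, len(self.items))
        return [self.items[i] for i in range(count)]

    def deq(self, count):
        """Remove between 1 and n of the oldest instructions, in order."""
        if not 1 <= count <= min(self.n, len(self.items)):
            raise ValueError("invalid dequeue count")
        return [self.items.popleft() for _ in range(count)]
```

Separating first from deq lets the consumer inspect N candidates and then dequeue only as many as it was able to issue.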

Superscalar Fifo

Scoreboard
Keeps track of pending register writes to prevent RAW hazards
– Scoreboards are used to prevent conflicts between instructions across clock cycles and within the same clock cycle
Dispatch logic searches and writes to the scoreboard
Writeback removes from the scoreboard
– The order of these two operations depends on the type of register file
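
A minimal Python sketch of the search-then-insert dispatch step. The instruction encoding (a destination register plus a list of source registers) and all names are assumptions, and only RAW hazards are checked, as on the slide. Inserting each destination before examining the next instruction is what catches conflicts within the same cycle.

```python
class Scoreboard:
    """Tracks destination registers of instructions that have not written back yet."""

    def __init__(self):
        self.pending = []

    def search(self, reg):
        return reg in self.pending

    def insert(self, reg):
        self.pending.append(reg)

    def remove(self, reg):
        # Called at writeback; removes the oldest matching entry.
        self.pending.remove(reg)


def dispatch(scoreboard, instrs, n):
    """Issue up to n instructions in program order, stopping at the first RAW hazard.

    Each instruction is a hypothetical (dst, srcs) pair; dst may be None.
    """
    issued = []
    for dst, srcs in instrs[:n]:
        if any(scoreboard.search(r) for r in srcs):
            break  # a source is still pending: this and later instructions wait
        if dst is not None:
            scoreboard.insert(dst)  # visible to later instructions in this cycle
        issued.append((dst, srcs))
    return issued
```

For example, if the second instruction in a group reads the register written by the first, only the first is issued in that cycle.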

Scoreboard

Execution Pipelines
Cores are given priorities between them
– Core 0 has earlier instructions than core 1
– A mispredict in core i should kill instructions in cores > i
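
The kill rule can be sketched as a small Python function. The input format (one (instruction, mispredicted) pair per core, with core 0 holding the oldest instruction) and the function name are assumptions for illustration.

```python
def resolve_mispredicts(results):
    """Return (surviving instructions, index of the redirecting core or None).

    results[i] is a hypothetical (instr, mispredicted) pair for core i.
    Core 0 holds the oldest instruction, so it has the highest priority.
    """
    survivors = []
    for core, (instr, mispredicted) in enumerate(results):
        survivors.append(instr)  # the mispredicting instruction itself completes
        if mispredicted:
            # Everything in cores > core is on the wrong path and is killed.
            return survivors, core
    return survivors, None
```

If two cores mispredict in the same cycle, the lower-numbered core wins, because the later mispredict is itself on the wrong path.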

Execution Pipelines

Results (N=2)

Results (N=3)
Didn't get IPCs greater than 1, meaning this design was slower than the N=2 case.
Why? The branch predictor
– Only 1 out of every N instructions is predicted using the better branch predictor
– The misprediction penalty is high, and the processor pays that penalty more often for larger N
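
A rough back-of-the-envelope model of why this gets worse with N, written in Python. It assumes branches are spread evenly over the N issue slots, that exactly one slot uses the better predictor, and that the other slots use a weaker fallback. The function name is hypothetical and the rates used with it are placeholder inputs, not measurements from this project.

```python
def blended_mispredict_rate(n, rate_better, rate_fallback):
    """Average misprediction rate when only 1 of every n slots uses the better predictor.

    rate_better and rate_fallback are hypothetical per-branch misprediction
    rates in [0, 1]; branches are assumed to fall evenly across the n slots.
    """
    return rate_better / n + rate_fallback * (n - 1) / n
```

Under these assumptions the share of branches handled by the weaker predictor is (N-1)/N, so the blended rate moves toward the fallback rate as N grows.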

Structural Hazards in Bluespec
Dispatch logic prevents two modules from needing to use the same hardware
Bluespec compiler also checks for structural hazards, but is more aggressive about flagging them
– We had to create wrappers that allow multiple modules to attempt to write to the same module, but only one actually gets to use it, based on a fixed priority
– If dispatch logic works, then the priority doesn't matter, since only one module will ever write to it
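
The wrapper idea can be illustrated in Python rather than Bluespec. This is a behavioral sketch only: the class name, the port numbering (lower index means higher priority), and the end-of-cycle call are assumptions, and None is reserved to mean "no request".

```python
class PriorityWrapper:
    """Accepts write attempts from several ports; one fixed-priority winner per cycle."""

    def __init__(self, write_fn, num_ports):
        self.write_fn = write_fn  # the single shared resource, e.g. a FIFO enqueue
        self.requests = [None] * num_ports

    def request(self, port, value):
        # Every port may attempt a write; None means no request.
        self.requests[port] = value

    def end_of_cycle(self):
        """Forward the request from the lowest-numbered port and drop the rest."""
        winner = None
        for port, value in enumerate(self.requests):
            if value is not None:
                self.write_fn(value)
                winner = port
                break
        self.requests = [None] * len(self.requests)
        return winner
```

When the dispatch logic is correct, at most one port makes a request per cycle, so the priority order never changes the outcome.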

Conclusion
We added N-way superscalar execution to the original SMIPS processor
We saw IPC > 1 for every benchmark on at least one processor for N=2
We tried N=3, but it suffered too much from misprediction