Pipelining – Loop unrolling and Multiple Issue

Presentation transcript:

Pipelining – Loop unrolling and Multiple Issue CS/COE 1541 (term 2174) Jarrett Billingsley

Class Announcements
I'll make your study guide tonight! Honest! I swear! ...I say, every single time... It'll explain the format, topics, and have some practice questions. I will have your homework and quizzes graded by Monday. I will probably send the grades for them out earlier. I can send out the quiz solutions too so you can study. Project 1 comes after the exam. Probably have a month to do it.
1/25/2017 CS/COE 1541 term 2174

But first... Finishing branch prediction

Loop unrolling

Reducing branch frequency
The fastest code is the code that never runs. One way to make branches faster is to... not branch as much. Loop unrolling is a compiler technique to reduce the number of branches. It does this by duplicating the loop body, reducing the number of iterations needed.
Original loop:
    for(i = 0; i < 100; i++)
        a[i] = b[i] + c[i];
Unrolled loop (2X):
    for(i = 0; i < 100; i += 2){
        a[i] = b[i] + c[i];
        a[i+1] = b[i+1] + c[i+1];
    }

How far to unroll?
The previous example doubled the code in the loop. Of course we can unroll 3X, 4X, 8X... what are the tradeoffs? Space vs. time is the big one. But memory today is big, network connections are fast... is this so much of a problem? Well... caching is the big bottleneck these days. The bigger the code is, the less of it will fit in the cache. This is bad, as we'll see.

Multiple Issue (Superscalar) CPUs

From CPI to IPC
So far, the optimum CPI has been 1: one cycle to complete each instruction. But what if we could go below 1? (wat? half a cycle per instruction? well, no.) Instead of fetching just ONE instruction each cycle... fetch two! Now we measure performance in IPC, Instructions per Cycle, which is just 1/CPI.

Down the wrong pipe
A common arrangement is to have multiple asymmetric pipelines in the CPU: for example, one to do ALU/Branches and one to do loads and stores.
[Figure: block diagram labeled I-Mem, Ins. Decoder, Register File, ALU Pipe (ALU), Memory Pipe (ALU, D-Mem)]

Keeping the pipelines full
It's now up to the compiler (once again!) to schedule instructions in such a way that the pipelines are well-utilized.
    lw   $t0, 0($s1)
    lw   $t1, -4($s1)
    addi $s1, $s1, -8
    add  $t0, $t0, $s2
    add  $t1, $t1, $s2
    sw   $t0, 8($s1)
    sw   $t1, 4($s1)
CC   ALU Pipe   Mem Pipe
1    -          lw t0
2    addi s1    lw t1
3    add t0     -
4    add t1     sw t0
5    -          sw t1

I told you about the compiler, bro
What's wrong with the compiler doing this instruction scheduling? Well, the code will run the same way every time, unlike branches. But the architecture could change, and updating compilers and recompiling code sucks.

Oh no
What about data dependencies? Oh dear lord. What about pipeline flushes?
[Figure: block diagram labeled I-Mem, Ins. Decoder, Register File, D-Mem, ALU]