Copyright © 2011, Elsevier Inc. All rights reserved.

Presentation transcript:

Appendix H. Authors: John Hennessy & David Patterson.

Figure H.1 A software-pipelined loop chooses instructions from different loop iterations, thus separating the dependent instructions within one iteration of the original loop. The start-up and finish-up code correspond to the portions above and below the software-pipelined iteration.
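The restructuring the caption describes can be sketched in C. This is a minimal illustration with an assumed loop body (load x[i], add a scalar s, store back), not compiler output: the software-pipelined version's steady-state body issues the store, the add, and the load from three different original iterations, so no instruction in it depends on the one just before it.

```c
#include <assert.h>

/* Original loop: within each iteration the add must wait on the load
   and the store must wait on the add, a serial dependence chain. */
void plain_loop(double *x, double s, int n) {
    for (int i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Software-pipelined version (assumes n >= 2): in the steady-state
   body, the store belongs to iteration i, the add to iteration i + 1,
   and the load to iteration i + 2, so they come from different
   original iterations. The start-up code fills the pipeline and the
   finish-up code drains it, as in Figure H.1. */
void pipelined_loop(double *x, double s, int n) {
    double loaded, summed;
    /* start-up: load and add for iteration 0, load for iteration 1 */
    loaded = x[0];
    summed = loaded + s;
    loaded = x[1];
    for (int i = 0; i < n - 2; i++) {
        x[i] = summed;        /* store for iteration i      */
        summed = loaded + s;  /* add for iteration i + 1    */
        loaded = x[i + 2];    /* load for iteration i + 2   */
    }
    /* finish-up: drain the last two iterations */
    x[n - 2] = summed;
    x[n - 1] = loaded + s;
}
```

Both routines compute the same result; the pipelined one merely rearranges which iteration each instruction in the loop body serves.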

Figure H.2 The execution pattern for (a) a software-pipelined loop and (b) an unrolled loop. The shaded areas are the times when the loop is not running with maximum overlap or parallelism among instructions. This occurs once at the beginning and once at the end for the software-pipelined loop. For the unrolled loop it occurs m/n times if the loop has a total of m iterations and is unrolled n times. Each block represents an unroll of n iterations. Increasing the number of unrollings will reduce the start-up and clean-up overhead. The overhead of one iteration overlaps with the overhead of the next, thereby reducing the impact. The total area under the polygonal region in each case will be the same, since the total number of operations is just the execution rate multiplied by the time.
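The unrolled case in (b) can be sketched as follows, again with an assumed body of x[i] += s and an unroll factor of n = 4. The main body executes floor(m/4) blocks of four copies of the body; the clean-up loop handles the remaining m mod 4 iterations, and each block boundary is one of the m/n overhead points the caption's shaded regions represent.

```c
#include <assert.h>

/* Loop unrolled by n = 4 (a sketch with a made-up body: x[i] += s). */
void unrolled_loop(double *x, double s, int m) {
    int i = 0;
    /* main body: four copies of the body per pass, amortizing the
       per-iteration branch and induction-variable overhead */
    for (; i + 4 <= m; i += 4) {
        x[i]     += s;
        x[i + 1] += s;
        x[i + 2] += s;
        x[i + 3] += s;
    }
    /* clean-up loop: the leftover m mod 4 iterations */
    for (; i < m; i++)
        x[i] += s;
}
```

With m not a multiple of 4 (say m = 10), the main body runs twice and the clean-up loop finishes the last two elements.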

Figure H.3 A code fragment and the common path shaded with gray. Moving the assignments to B or C requires a more complex analysis than for straight-line code. In this section we focus on scheduling this code segment efficiently without hardware assistance. Predication or conditional instructions, which we discuss in the next section, provide another way to schedule this code.
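The alternative the caption mentions, predication, can be illustrated on a fragment shaped like this one (the variable names and the a == 0 test below are assumptions for illustration, not the book's exact code): replacing the branch with conditional selects removes the control dependence on both assignments, so the compiler can schedule them freely without the analysis needed to move code across the branch.

```c
#include <assert.h>

/* Branchy form: the assignments to b and c are control dependent on
   the test, so moving them requires trace-style analysis. */
int step_branchy(int a, int b, int c) {
    if (a == 0) b = b + 1;   /* frequent (shaded) path  */
    else        c = c - 1;   /* infrequent path         */
    return b + c;
}

/* Predicated form: both assignments execute unconditionally as
   selects guarded by a predicate, eliminating the branch. */
int step_predicated(int a, int b, int c) {
    int p = (a == 0);        /* predicate register, in effect */
    b = p ? b + 1 : b;       /* commits only under predicate p  */
    c = p ? c : c - 1;       /* commits only under predicate !p */
    return b + c;
}
```

On a machine with conditional-move or fully predicated instructions, the second form compiles to straight-line code with no branch to mispredict.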

Figure H.4 This trace is obtained by assuming that the program fragment in Figure H.3 is the inner loop and unwinding it four times, treating the shaded portion in Figure H.3 as the likely path. The trace exits correspond to jumps off the frequent path, and the trace entrances correspond to returns to the trace.

Figure H.5 The superblock that results from unrolling the code in Figure H.3 four times.

Figure H.11 The performance of four multiple-issue processors for five SPECfp and SPECint benchmarks. The clock rates of the four processors are Itanium 2 at 1.5 GHz, Pentium 4 Extreme Edition at 3.8 GHz, AMD Athlon 64 at 2.8 GHz, and IBM Power5 at 1.9 GHz.