Copyright © 2011, Elsevier Inc. All rights Reserved. Appendix H Authors: John Hennessy & David Patterson
Figure H.1 A software-pipelined loop chooses instructions from different loop iterations, thus separating the dependent instructions within one iteration of the original loop. The start-up and finish-up code will correspond to the portions above and below the software-pipelined iteration.
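As a rough illustration of the idea in Figure H.1, the sketch below software-pipelines a loop that adds a scalar to every array element. The loop itself, the function names, and the variables `loaded` and `summed` (standing in for registers) are assumptions made for this example and are not taken from the figure. Python runs these statements sequentially, so the code only shows how the kernel draws its store, add, and load from three different iterations; it does not demonstrate any speedup.

```python
def add_scalar_plain(x, s):
    """Original loop: load, add, and store all belong to the same iteration."""
    out = list(x)
    for i in range(len(out)):
        loaded = out[i]        # load  x[i]
        summed = loaded + s    # add   (depends on the load)
        out[i] = summed        # store (depends on the add)
    return out


def add_scalar_pipelined(x, s):
    """Software-pipelined form: each kernel pass mixes three iterations."""
    out = list(x)
    n = len(out)
    if n < 2:
        return add_scalar_plain(out, s)   # too short to pipeline

    # Start-up code: fill the pipeline.
    loaded = out[0]            # load for iteration 0
    summed = loaded + s        # add  for iteration 0
    loaded = out[1]            # load for iteration 1

    # Steady-state kernel: store for i, add for i+1, load for i+2.
    for i in range(n - 2):
        out[i] = summed        # store for iteration i
        summed = loaded + s    # add   for iteration i+1
        loaded = out[i + 2]    # load  for iteration i+2

    # Finish-up code: drain the pipeline.
    out[n - 2] = summed        # store for iteration n-2
    summed = loaded + s        # add   for iteration n-1
    out[n - 1] = summed        # store for iteration n-1
    return out
```

Within one kernel pass the three statements no longer depend on one another, which is what lets a scheduler overlap them on real hardware.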
Figure H.2 The execution pattern for (a) a software-pipelined loop and (b) an unrolled loop. The shaded areas are the times when the loop is not running with maximum overlap or parallelism among instructions. This occurs once at the beginning and once at the end for the software-pipelined loop. For the unrolled loop it occurs m/n times if the loop has a total of m iterations and is unrolled n times. Each block represents an unroll of n iterations. Increasing the number of unrollings will reduce the start-up and clean-up overhead. The overhead of one iteration overlaps with the overhead of the next, thereby reducing the impact. The total area under the polygonal region in each case will be the same, since the total number of operations is just the execution rate multiplied by the time.
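The counts in the caption can be tabulated with a small helper. This is a minimal sketch: the function name is hypothetical, and it assumes the unroll factor n divides the iteration count m evenly, which the caption does not state.

```python
def reduced_overlap_phases(m, n):
    """Count how often each schedule ramps up to and down from peak overlap.

    m: total loop iterations; n: unroll factor (assumed to divide m evenly).
    """
    if n <= 0 or m % n != 0:
        raise ValueError("this sketch assumes n > 0 and n divides m")
    return {
        "software_pipelined": 1,   # one start-up and one wind-down in total
        "unrolled": m // n,        # one start-up and wind-down per unrolled block
    }


# Larger unroll factors mean fewer blocks, hence fewer ramp phases.
for n in (2, 4, 8):
    print(n, reduced_overlap_phases(64, n))
```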
Figure H.3 A code fragment and the common path shaded with gray. Moving the assignments to B or C requires a more complex analysis than for straight-line code. In this section we focus on scheduling this code segment efficiently without hardware assistance. Predication or conditional instructions, which we discuss in the next section, provide another way to schedule this code.
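To see why moving an assignment across a branch needs more analysis than in straight-line code, consider the sketch below. The figure's code is not reproduced in this text, so the fragment here is a hypothetical stand-in: the constant 7, the expressions on each path, and all function names are assumptions chosen only so that the uncommon path reads the old value of B.

```python
def fragment(a, b):
    """Hypothetical stand-in for the fragment; returns (a, b, c, x)."""
    a = a + b
    x = None
    if a == 0:
        b = 7            # assumed common path: assignment to B
    else:
        x = b * 2        # assumed uncommon path: reads the old B
    c = a + 1            # assignment to C after the join
    return a, b, c, x


def fragment_naive_hoist(a, b):
    """Unsafe: B is assigned above the branch, so the else path sees it."""
    a = a + b
    x = None
    b = 7                # hoisted assignment overwrites B on both paths
    if a != 0:
        x = b * 2        # now reads 7 instead of the old B
    c = a + 1
    return a, b, c, x


def fragment_safe_hoist(a, b):
    """Safe: the hoisted value goes to a temporary and is committed later."""
    a = a + b
    x = None
    b_spec = 7           # computed early, but B itself is untouched
    if a == 0:
        b = b_spec       # commit only on the common path
    else:
        x = b * 2        # still reads the old B
    c = a + 1
    return a, b, c, x
```

The naive version agrees with the original only when the common path is taken; the temporary in the safe version is the kind of compensation a compiler must add once it moves code above a branch.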
Figure H.4 This trace is obtained by assuming that the program fragment in Figure H.3 is the inner loop and unwinding it four times, treating the shaded portion in Figure H.3 as the likely path. The trace exits correspond to jumps off the frequent path, and the trace entrances correspond to returns to the trace.
Figure H.5 The superblock that results from unrolling the code in Figure H.3 four times.
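The single-entry, multiple-exit shape of a superblock can be sketched as follows. The loop body is an assumed stand-in for the fragment in Figure H.3 (the values 1 and -1 and all function names are hypothetical), and the four unrolled copies are written as a short inner loop for brevity, where a compiler would emit four straight-line copies.

```python
def body(a, b, c, i):
    """One iteration of an assumed loop body shaped like the fragment."""
    a[i] = a[i] + b[i]
    if a[i] == 0:
        b[i] = 1                      # assumed common path
    else:
        b[i] = -1                     # assumed uncommon path
    c[i] = a[i] + b[i]


def loop_original(a, b, c):
    for i in range(len(a)):
        body(a, b, c, i)


def loop_superblock(a, b, c):
    n = len(a)
    i = 0
    while i + 4 <= n:
        # Single entry: control only ever enters here, at the top.
        exit_at = None
        for k in range(4):            # four copies of the common path
            j = i + k
            a[j] = a[j] + b[j]
            if a[j] != 0:             # side exit off the common path
                exit_at = k
                break
            b[j] = 1
            c[j] = a[j] + b[j]
        if exit_at is not None:
            # Duplicated code outside the superblock: finish the exiting
            # iteration on the uncommon path, then run the rest of the group
            # with the general body. Nothing jumps back into the middle.
            j = i + exit_at
            b[j] = -1
            c[j] = a[j] + b[j]
            for r in range(j + 1, i + 4):
                body(a, b, c, r)
        i += 4
    for r in range(i, n):             # leftover iterations
        body(a, b, c, r)
```

Because no path re-enters the unrolled region partway through, the scheduler can move code within it without the entry-point compensation code that a trace with side entrances requires.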
Figure H.11 The performance of four multiple-issue processors for five SPECfp and SPECint benchmarks. The clock rates of the four processors are Itanium 2 at 1.5 GHz, Pentium 4 Extreme Edition at 3.8 GHz, AMD Athlon 64 at 2.8 GHz, and the IBM Power5 at 1.9 GHz.