HARP Control Divergence & Assignment 4
Blaise Tine, Georgia Institute of Technology

Agenda
- HARP control divergence
  - Predication
  - Split-Join
- Assignment 4
  - Codebase
  - Clone
  - Barriers
  - Samples walkthrough
- Questions?

Control Divergence
The ISA supports two techniques:
- Predication: handles branch divergence at instruction granularity
- Split-Join: handles branch divergence at block granularity

Harp Predication
Full predication: all instructions can be predicated.
Implementation
- Separate predicate register file
- All predicated instructions go through Fetch => Decode => Execute
- Conditional commit stage: only instructions whose predicate value is ‘true’ commit their results

Harp Predication (2)
Compiler support
- If-conversion: converts control dependencies into data dependencies
Example: if (r1) { ++r2; } else { --r2; }
          rtop @p0, %r1             # set predicate
          @p0 ? addi %r2, %r2, #1
          notp @p0, @p0             # inverse predicate
          @p0 ? subi %r2, %r2, #1

Harp Predication (3)
- Predicate value test instructions: set a predicate register from a general-purpose source register (%src), e.g. rtop
- Predicate manipulation instructions: compute a predicate register from predicate sources (@src0, ...)

Harp Predication (4)
Advantages
- No branching overhead
- Simple microarchitecture
Limitations
- If-conversion is not always possible, e.g. loops and indirect branches
- Inefficient with unanimous branches: both paths are always executed

Harp Split-Join
ISA support
- @p split: partitions a warp using the predicate mask, with each subset taking a different target
- join: merges the partitioned subsets back into a single execution block
Implementation
- Hardware stack management
- Compiler support

Harp Split-Join (2)
Example: if (r1) { ++r2; } else { --r2; }
          rtop @p0, %r1             # set predicate
          @p0 ? split
          @p0 ? jmp then
          subi %r2, %r2, #1
          jmp next
    then: addi %r2, %r2, #1
    next: join
HW stack (NPC, mask): empty

Harp Split-Join (2): stepping through the example
1. split: push the next PC (NPC) and mask entries onto the HW stack.
   HW stack (NPC, mask), top first: @2 1001; @7 0110
2. Execute the threads whose predicate is ‘true’ (then: addi %r2, %r2, #1).
3. join: pop the HW stack and jump to @2.
   HW stack: @7 0110
4. Execute the threads whose predicate is ‘false’ (subi %r2, %r2, #1; jmp next).
5. join: pop the HW stack and jump to @7.
   HW stack: empty

Harp Split-Join (2)
Advantages
- Efficient with unanimous branches: only a single path is executed
- The active mask turns off inactive threads
Challenges
- Complex microarchitecture: HW stack manager
- Split-jmp-join overhead

Morgan Kaufmann Publishers
April 3, 2019
Assignment 4: Mini Harp
Minimal ISA
- Word encoding
- Integers only
- A single predicate register
- No split-join
- No warp creation
- No interrupts
- No virtual addressing
Instruction set: Nop, Add, Sub, And, Or, Xor, Not, Shr, Shl, Ld, St, Jmp, Jal, Bar
Configuration: register size, warp size, number of warps
Chapter 1 — Computer Abstractions and Technology

Assignment 4: Code base
Shared header
- Common.h          // common includes and definitions
Utility library
- utils.cpp/h       // utility functions
Core classes
- mem.cpp/h         // memory
- lrucache.cpp/h    // cache
- Instr.cpp/h       // instruction
- decode.cpp/h      // decoder
- regfile.h         // register file
- warp.cpp/h        // warp unit
- core.cpp/h        // processor core

Assignment 4: Core Initialization
Program RAM, core construction, console output, load/store unit, ICache & DCache, IDecoder, warps

Assignment 4: Memory Layout
Regions: console, RAM

Assignment 4: Warp Initialization
Warp construction: GP registers, predicate registers, boot enable

Assignment 4: Warp Execute
Step function; pipeline stages: fetch, decode

Assignment 4: Warp Execute (2)
Execution: instructions, predication, jump instruction, set predicate
Add your code!

Assignment 4: Clone Instruction
Format: clone %src0
Operation: copy the current lane's registers into the lane whose index is held in %src0.
Example:
    ldi %r0, #2
    clone %r0      # copy current registers into the 3rd lane (index 2)

Assignment 4: Barrier Instruction
Format: bar %src0, %src1
Operation: synchronize %src1 warps on the barrier with identifier %src0.
- %src0 holds the barrier id (maximum supported value: 3)
- %src1 holds the number of warps to wait on
Example:
    ldi %r0, #1
    ldi %r1, #2
    bar %r0, %r1   # insert a size-2 named barrier with id=1

Assignment 4: Testing
Emulator command line:
    ./miniharp.out -r #regs -t #threads -w #warps -o #output
Sample programs:
    $ ./miniharp.out hello.bin -t 4 -w 1 -r 8 -o output.log
    $ ./miniharp.out sum.bin -t 4 -w 1 -r 8 -o output.log
    $ ./miniharp.out barrier.bin -t 4 -w 4 -r 8 -o output.log
Output format:
    “<Program Output>”
    “Instruction Count: <?>”

Assignment 4: runtime.s
Routines: print hex, print string, print newline

Assignment 4: hello.s
Load string, call prints, exit; string data

Assignment 4: sum.s
Clone registers, parallel call, print result0; array data, output address

Assignment 4: barrier.s
Start new warp, barrier, single warp, print results

Questions?