
1 HARP Control Divergence & Assignment 4
Blaise Tine, Georgia Institute of Technology

2 Agenda
  Harp Control Divergence
    Predication
    Split-Join
  Assignment 4
    Codebase
    Clone
    Barriers
    Samples Walkthrough
  Questions?

3 Control Divergence
  Two techniques supported by the ISA:
    Predication
      Controls branch divergence at instruction granularity
    Split-Join
      Controls branch divergence at block granularity

4 Harp Predication
  Full Predication
    All instructions can be predicated
  Implementation
    Separate predicate register file
    All predicated instructions execute
      Fetch => Decode => Execute
    Conditional Commit stage
      Only instructions with predicate value 'true' commit
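The execute-then-conditionally-commit behaviour described on this slide can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the emulator's code: `Lane`, `exec_addi`, and the register-file sizes are hypothetical names and values chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-thread state: general-purpose registers plus a
// separate predicate register file (sizes are assumptions).
struct Lane {
    std::array<int32_t, 8> r{};  // %r0..%r7
    std::array<bool, 2> p{};     // @p0..@p1
};

// Predicated add-immediate. The result is always computed (execute),
// but it is written back only when the guarding predicate is true
// (conditional commit). Returns true when the result was committed.
inline bool exec_addi(Lane& lane, std::size_t pred, std::size_t dst,
                      std::size_t src, int32_t imm) {
    assert(pred < lane.p.size() && dst < lane.r.size() && src < lane.r.size());
    const int32_t result = lane.r[src] + imm;  // execute stage
    if (!lane.p[pred]) return false;           // squashed at commit
    lane.r[dst] = result;                      // commit stage
    return true;
}
```

Calling `exec_addi(lane, 0, 2, 2, 1)` models `@p0 ? addi %r2, %r2, #1`: the addition is performed either way, and `%r2` changes only if `@p0` is set for that lane.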

5 Harp Predication (2)
  Compiler Support
    If-conversion: converts control dependencies into data dependencies
  Example
    if (r1) { ++r2; } else { --r2; }

    rtop @p0, %r1              <- Set predicate
    @p0 ? addi %r2, %r2, #1
    [...] @p0, @p0             <- Inverse predicate
    @p0 ? subi %r2, %r2, #1
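To make the idea of if-conversion concrete, the sketch below writes the slide's `if (r1)` example twice in C++: once with a branch, and once with the condition turned into a predicate value that guards both operations. The function names are hypothetical, and a C++ compiler is free to lower the second form however it likes; the point is only the shape of the transformation.

```cpp
#include <cassert>
#include <cstdint>

// Branching form of the slide's example.
inline int32_t with_branch(int32_t r1, int32_t r2) {
    if (r1) { ++r2; } else { --r2; }
    return r2;
}

// If-converted form: the condition becomes data (p0) and both
// operations are issued, each taking effect only under its predicate.
inline int32_t if_converted(int32_t r1, int32_t r2) {
    bool p0 = (r1 != 0);     // set predicate from %r1
    r2 = p0 ? r2 + 1 : r2;   // @p0 ? addi %r2, %r2, #1
    p0 = !p0;                // invert the predicate
    r2 = p0 ? r2 - 1 : r2;   // @p0 ? subi %r2, %r2, #1
    return r2;
}

// Usage: both forms agree for zero and non-zero conditions.
inline void check_equivalence() {
    for (int32_t r1 = -2; r1 <= 2; ++r1)
        assert(with_branch(r1, 10) == if_converted(r1, 10));
}
```

The control dependence on `r1` has become a data dependence on `p0`, which is why both the increment and the decrement are always issued.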

6 Harp Predication (3)
  Predicate Value Test Instructions
    [instruction table; each entry takes a %src operand]
  Predicate Manipulation Instructions
    [instruction table; entries take an @src0 operand]

7 Harp Predication (4)
  Advantages
    No branching overhead
    Simple microarchitecture
  Limitations
    If-conversion is not always possible
      e.g. loops, indirect branches
    Inefficient with unanimous branches
      Both paths are always executed

8 Harp Split-Join
  ISA Support
    @p ? split: partition a warp using the predicate mask, each subset taking a different target
    join: merge the partitioned subsets into a single execution block
  Implementation
    Hardware stack management
    Compiler support

9-17 Harp Split-Join (2)
  Example (slides 9 to 17 animate the same program step by step)

    if (r1) { ++r2; } else { --r2; }

    rtop @p0, %r1
    @p0 ? split
    @p0 ? jmp then
    subi %r2, %r2, #1
    jmp next
    then: addi %r2, %r2, #1
    next: join

  Steps and HW stack contents (NPC, mask)
    Slide 9       Set predicate
                  HW stack: empty
    Slide 10      Push PC and mask onto HW stack
                  HW stack: (@2, 1001), (@7, 0110)
    Slides 11-12  Execute threads with 'true' predicate
                  HW stack: (@2, 1001), (@7, 0110)
    Slide 13      Pop HW stack and jmp to @2
                  HW stack: (@7, 0110)
    Slides 14-16  Execute threads with 'false' predicate
                  HW stack: (@7, 0110)
    Slide 17      Pop HW stack and jmp to @7
                  HW stack: empty
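The split/join walkthrough above can be replayed with a small C++ model of a four-thread warp and its hardware stack. This is a sketch under several assumptions that the slides do not state: instructions sit at consecutive addresses `@0` to `@6` (so `@2` is the predicated `jmp` and `@7` is the first address after `join`), the reconvergence entry stores the full pre-split mask so that every thread is active again after the last `join`, and a unanimous `split` pushes only the reconvergence entry. `Warp`, `StackEntry`, and `run` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kThreads = 4;  // warp size drawn on the slides
using Mask = unsigned;       // bit t set means thread t is active

struct StackEntry { int npc; Mask mask; };  // pushed by split, popped by join

struct Warp {
    int pc = 0;
    Mask active = 0xFu;
    std::array<int32_t, kThreads> r1{}, r2{};
    std::array<bool, kThreads> p0{};
    std::vector<StackEntry> stack;  // models the hardware stack
};

inline bool is_active(const Warp& w, int t) {
    return ((w.active >> t) & 1u) != 0;
}

// Runs the slide's program on one warp; returns the instructions issued.
inline int run(Warp& w) {
    int issued = 0;
    while (w.pc < 7) {
        const int pc = w.pc++;
        ++issued;
        switch (pc) {
        case 0:  // rtop @p0, %r1
            for (int t = 0; t < kThreads; ++t)
                if (is_active(w, t)) w.p0[t] = (w.r1[t] != 0);
            break;
        case 1: {  // @p0 ? split
            Mask taken = 0;
            for (int t = 0; t < kThreads; ++t)
                if (is_active(w, t) && w.p0[t]) taken |= (1u << t);
            const Mask not_taken = w.active & ~taken;
            w.stack.push_back({7, w.active});       // reconvergence entry
            if (taken != 0 && not_taken != 0) {     // real divergence only
                w.stack.push_back({2, not_taken});  // deferred 'false' threads
                w.active = taken;                   // run 'true' threads first
            }
            break;
        }
        case 2: {  // @p0 ? jmp then (active threads agree after split)
            bool p = false;
            for (int t = 0; t < kThreads; ++t)
                if (is_active(w, t)) { p = w.p0[t]; break; }
            if (p) w.pc = 5;
            break;
        }
        case 3:  // subi %r2, %r2, #1
            for (int t = 0; t < kThreads; ++t)
                if (is_active(w, t)) --w.r2[t];
            break;
        case 4:  // jmp next
            w.pc = 6;
            break;
        case 5:  // then: addi %r2, %r2, #1
            for (int t = 0; t < kThreads; ++t)
                if (is_active(w, t)) ++w.r2[t];
            break;
        case 6: {  // next: join
            assert(!w.stack.empty());
            const StackEntry e = w.stack.back();
            w.stack.pop_back();
            w.pc = e.npc;
            w.active = e.mask;
            break;
        }
        }
    }
    return issued;
}
```

With `r1 = {0, 1, 1, 0}` the split computes a 'true' mask of 0110 and defers 1001 to `@2`, as on the slides, and the warp issues nine instructions in total. With a unanimous predicate only one path runs, which is the efficiency argument made for Split-Join.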

18 Harp Split-Join (2)
  Advantages
    Efficient with unanimous branches
      Only a single path is executed
    The active mask turns off inactive threads
  Challenges
    Complex microarchitecture
      HW stack manager
    Split-jmp-Join overhead

19 Assignment 4: Mini Harp
  Morgan Kaufmann Publishers
  April 3, 2019
  Minimal ISA
    Word encoding
    Integers only
    A single predicate register
    No Split-Join
    No warp creation
    No interrupts
    No virtual addressing
  Instruction Set
    Nop, Add, Sub, And, Or, Xor, Not, Shr, Shl, Ld, St, Jmp, Jal, Bar
  Configuration
    Register size, warp size, number of warps
  Chapter 1 — Computer Abstractions and Technology

20 Assignment 4: Code base
  Shared header
    Common.h          // common includes and definitions
  Utility Library
    utils.cpp/h       // utility functions
  Core classes
    mem.cpp/h         // memory
    lrucache.cpp/h    // cache
    Instr.cpp/h       // instruction
    decode.cpp/h      // decoder
    regfile.h         // register file
    warp.cpp/h        // warp unit
    core.cpp/h        // processor core

21 Assignment 4: Core Initialization
  Program RAM
  Core Construction
  Console output
  Load/Store Unit
  ICache & DCache
  IDecoder
  Warps

22 Assignment 4: Memory Layout
  console
  RAM

23 Assignment 4: Warp Initialization
  Warp Construction
  GP Registers
  Pred Registers
  Boot enable

24 Assignment 4: Warp Execute
  Step Function
  Pipeline stages
    Fetch
    Decode

25 Assignment 4: Warp Execute (2)
  Execution
  Instructions
  Predication
  Jump instruction
  Set predicate
  Add your code!

26 Assignment 4: Clone Instruction
  Format
    clone %src0
  Operation
    Copy the current lane's registers into the %src0 lane.
    Register %src0 holds the destination lane index.
  e.g.
    ldi %r0, #2
    clone %r0      # copy current registers into 3rd lane.
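The clone semantics described on this slide can be sketched in C++ as follows. This is only an illustration, not the assignment's reference solution: `Warp`, `RegFile`, and `exec_clone` are hypothetical names, only general-purpose registers are copied, and rejecting an out-of-range lane index is an assumption, since the slide does not say what happens in that case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kNumRegs = 8;  // matches "-r 8" in the sample commands
using RegFile = std::array<uint32_t, kNumRegs>;

// Hypothetical warp model: one register file per lane (thread).
struct Warp {
    std::vector<RegFile> lanes;
    explicit Warp(std::size_t num_threads) : lanes(num_threads) {}
};

// clone %src0, executed by lane `cur`: register src0 of the current lane
// holds the destination lane index, and every register of the current
// lane is copied into that lane. Returns false when the index does not
// name an existing lane (this error handling is an assumption).
inline bool exec_clone(Warp& w, std::size_t cur, std::size_t src0) {
    assert(cur < w.lanes.size() && src0 < kNumRegs);
    const std::size_t dst = w.lanes[cur][src0];
    if (dst >= w.lanes.size()) return false;
    w.lanes[dst] = w.lanes[cur];
    return true;
}
```

For the slide's example, lane 0 holds 2 in `%r0`, so `exec_clone(w, 0, 0)` copies lane 0's registers into lane 2, the third lane.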

27 Assignment 4: Barrier Instruction
  Format
    bar %src0, %src1
  Operation
    Synchronize %src1 number of warps with barrier identifier %src0.
    Register %src0 holds the barrier id (supported max value is 3).
    Register %src1 holds the number of warps to wait on.
  e.g.
    ldi %r0, #1
    ldi %r1, #2
    bar %r0, %r1   # insert a size-2 named barrier with id=1
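One possible way to keep track of named barriers inside the core is sketched below in C++. It is a sketch under assumptions, not the assignment's reference solution: `BarrierTable` and its methods are hypothetical, barrier ids are taken to be 0 through 3 (from "supported max value is 3"), and a barrier is assumed to reset once it releases its warps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr std::size_t kNumBarriers = 4;  // ids 0..3 (assumed from max value 3)

// Hypothetical bookkeeping for named barriers.
class BarrierTable {
public:
    // Warp `warp_id` executes `bar id, count`. Returns the warps released
    // by this arrival: empty while fewer than `count` warps have arrived
    // (the caller should stall), otherwise every warp that was waiting,
    // including the caller.
    std::vector<int> arrive(std::size_t id, std::size_t count, int warp_id) {
        assert(id < kNumBarriers && count > 0);
        std::set<int>& waiting = waiting_[id];
        waiting.insert(warp_id);
        if (waiting.size() < count) return {};
        std::vector<int> released(waiting.begin(), waiting.end());
        waiting.clear();  // the barrier can be reused afterwards
        return released;
    }

    // True while `warp_id` is stalled on barrier `id`.
    bool is_waiting(std::size_t id, int warp_id) const {
        return waiting_.at(id).count(warp_id) != 0;
    }

private:
    std::array<std::set<int>, kNumBarriers> waiting_;
};
```

For the slide's example (`bar %r0, %r1` with id 1 and size 2), the first warp to arrive stalls and the second arrival releases both.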

28 Assignment 4: Testing
  Emulator command line
    ./miniharp.out -r #regs -t #threads -w #warps -o #output
  Sample programs
    $ ./miniharp.out hello.bin -t 4 -w 1 -r 8 -o output.log
    $ ./miniharp.out sum.bin -t 4 -w 1 -r 8 -o output.log
    $ ./miniharp.out barrier.bin -t 4 -w 4 -r 8 -o output.log
  Output format
    "<Program Output>"
    "Instruction Count: <?>"

29 Assignment 4: runtime.s
  Print Hex
  Print String
  Print NewLine

30 Assignment 4: hello.s
  Load string
  Call prints
  Exit
  String data

31 Assignment 4: sum.s
  Clone Registers
  Parallel Call
  Print result0
  Array data
  Output address

32 Assignment 4: barrier.s
  Start new Warp
  Barrier
  Single warp
  Print results

33 Questions?

