SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY
EXAMPLE: ADDING NEW INSTRUCTIONS - PREFETCH


1 EXAMPLE: ADDING NEW INSTRUCTIONS - PREFETCH

2 New PTX Instructions: Prefetch, Prefetchu

Support new instructions:

    prefetch.global.L1 [address]
    prefetchu.L1 [address]

- Modify PTX Internal Representation
- Add Parser and Emitter Support
- Implement instruction for devices
  - PTX Emulator
  - NVIDIA GPU
  - Other devices as needed

3 Modify PTX Internal Representation: add opcodes

Add opcodes and modifiers

ocelot/ir/interface/PTXInstruction.h

    class PTXInstruction {
        enum Opcode {
            // ... existing opcodes ...
            Prefetch,
            Prefetchu,
        };
        enum CacheLevel {
            L1,
            L2,
            CacheLevel_invalid
        };
        static std::string toString(CacheLevel cache);
        CacheLevel cacheLevel;
    };

4 Modify PTX Internal Representation: add emitters

Add new opcodes and modifiers to emitter

ocelot/ir/implementation/PTXInstruction.cpp

    std::string ir::PTXInstruction::toString(CacheLevel cache) {
        switch (cache) {
            case L1: return "L1";
            case L2: return "L2";
            default: return "";
        }
    }

    std::string ir::PTXInstruction::toString() {
        switch (opcode) {
            // ... other opcodes ...
            case Prefetch: {
                return guard() + "prefetch." + PTXOperand::toString(addressSpace) + "."
                    + PTXInstruction::toString(cacheLevel) + " " + d.toString();
            }
            case Prefetchu: {
                return guard() + "prefetchu.L1 " + d.toString();
            }
        }
    }
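
To see the emitter's string assembly in isolation, here is a minimal standalone sketch. It does not use the Ocelot classes: `MiniInstruction`, `emit`, and the enum names are hypothetical stand-ins, and the operand is assumed to arrive already formatted (for example `[%rl4]`).

```cpp
#include <string>

// Hypothetical stand-ins for the IR types; only what the emitter needs.
enum class AddressSpace { Global, Local };
enum class CacheLevel { L1, L2, Invalid };
enum class Opcode { Prefetch, Prefetchu };

struct MiniInstruction {
    Opcode opcode;
    AddressSpace space;   // ignored for prefetchu
    CacheLevel level;     // ignored for prefetchu, which is always emitted as L1
    std::string address;  // pre-formatted operand text, e.g. "[%rl4]"
};

std::string toString(AddressSpace s) {
    return s == AddressSpace::Global ? "global" : "local";
}

std::string toString(CacheLevel c) {
    switch (c) {
        case CacheLevel::L1: return "L1";
        case CacheLevel::L2: return "L2";
        default:             return "";
    }
}

// Builds the PTX text for one instruction, modifier by modifier.
std::string emit(const MiniInstruction& i) {
    switch (i.opcode) {
        case Opcode::Prefetch:
            return "prefetch." + toString(i.space) + "." + toString(i.level)
                + " " + i.address;
        case Opcode::Prefetchu:
            return "prefetchu.L1 " + i.address;
    }
    return "";
}
```

Calling `emit` on a `Prefetch` instruction with the global space, level L1, and operand `[%rl4]` yields the text of the instruction used in the sample kernel later in the deck.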

5 Modify PTX Internal Representation: validation

Check for valid address space, cache level, and address mode of address operand

ocelot/ir/implementation/PTXInstruction.cpp

    std::string ir::PTXInstruction::valid() {
        switch (opcode) {
            // ... other opcodes ...
            case Prefetch: {
                if (!(cacheLevel == L1 || cacheLevel == L2)) {
                    return "cache level must be L1 or L2";
                }
                if (!(addressSpace == Local || addressSpace == Global)) {
                    return "address space must be .local or .global, not "
                        + toString(addressSpace);
                }
                if (!(d.addressMode == PTXOperand::Indirect
                    || d.addressMode == PTXOperand::Address
                    || d.addressMode == PTXOperand::Immediate)) {
                    return "address mode of destination operand must be Indirect, "
                        "Address, or Immediate, not "
                        + PTXOperand::toString(d.addressMode);
                }
                break;
            }
        }
        return "";  // an empty string means the instruction is valid
    }
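
The three checks can be tried without the rest of the IR. The following is a sketch under assumptions: `PrefetchFields` and `validatePrefetch` are hypothetical names, the enums are reduced to a few values, and the "empty string means valid" convention is taken from the validation routine above.

```cpp
#include <string>

// Hypothetical, reduced versions of the IR enums.
enum class AddressSpace { Global, Local, Shared };
enum class CacheLevel { L1, L2, Invalid };
enum class AddressMode { Register, Indirect, Address, Immediate };

struct PrefetchFields {
    AddressSpace space;
    CacheLevel level;
    AddressMode mode;
};

// Returns "" when the fields describe a legal prefetch, otherwise the
// message of the first rule that fails.
std::string validatePrefetch(const PrefetchFields& f) {
    if (f.level != CacheLevel::L1 && f.level != CacheLevel::L2) {
        return "cache level must be L1 or L2";
    }
    if (f.space != AddressSpace::Local && f.space != AddressSpace::Global) {
        return "address space must be .local or .global";
    }
    if (f.mode != AddressMode::Indirect && f.mode != AddressMode::Address
        && f.mode != AddressMode::Immediate) {
        return "address mode must be Indirect, Address, or Immediate";
    }
    return "";
}
```

Returning a message rather than a bool lets the caller report why an instruction was rejected, which is the design the validation slide follows.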

6 Add Parser Support

ocelot/parser/ -- parser (to PTX IR)

    ptx.ll
    PTXParser
    ptxgrammar.yy

7 Add Parser Support: lexical analysis

Define lexical analysis rules for opcode and cache level tokens

ocelot/parser/implementation/ptx.ll

    ".L1"       { yylval->value = TOKEN_L1; return TOKEN_L1; }
    ".L2"       { yylval->value = TOKEN_L2; return TOKEN_L2; }

    "prefetch"  { sstrcpy( yylval->text, yytext, 1024 );
                  return OPCODE_PREFETCH; }
    "prefetchu" { sstrcpy( yylval->text, yytext, 1024 );
                  return OPCODE_PREFETCHU; }
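
Because `prefetch` is a prefix of `prefetchu`, the lexer has to prefer the longest match, which is what flex-generated scanners do. The hand-written sketch below illustrates that behavior only; it is not the generated scanner, and `Tok`, `Token`, and `lex` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class Tok { OpcodePrefetch, OpcodePrefetchu, L1, L2, Other };

struct Token {
    Tok kind;
    std::string text;
};

// Scans maximal runs of alphanumerics, optionally led by a '.', so that
// "prefetchu" is never split into "prefetch" followed by "u".
std::vector<Token> lex(const std::string& in) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t start = i;
        if (in[i] == '.') ++i;  // modifiers such as ".L1" carry a leading dot
        while (i < in.size() && std::isalnum(static_cast<unsigned char>(in[i]))) ++i;
        const bool onlyDot = (in[start] == '.' && i == start + 1);
        if (i == start || onlyDot) {  // punctuation or whitespace: skip one char
            i = start + 1;
            continue;
        }
        const std::string word = in.substr(start, i - start);
        Tok kind = Tok::Other;
        if (word == "prefetch")       kind = Tok::OpcodePrefetch;
        else if (word == "prefetchu") kind = Tok::OpcodePrefetchu;
        else if (word == ".L1")       kind = Tok::L1;
        else if (word == ".L2")       kind = Tok::L2;
        out.push_back({kind, word});
    }
    return out;
}
```

For the input `prefetchu.L1 [%rl4];` this produces an `OpcodePrefetchu` token, an `L1` token, and one `Other` token for `rl4`; brackets, `%`, and `;` are simply skipped in this simplified version.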

8 Add Parser Support: modify class PTXParser

Enhance class PTXParser (translates tokens to PTX IR)

ocelot/parser/interface/PTXParser.h

    class PTXParser {
        class State {
            void cacheLevel(int token);
        };
        static ir::PTXInstruction::CacheLevel tokenToCacheLevel(int token);
    };

ocelot/parser/implementation/PTXParser.cpp

    ir::PTXInstruction::CacheLevel PTXParser::tokenToCacheLevel(int token) {
        switch (token) {
            case TOKEN_L1: return ir::PTXInstruction::L1;
            case TOKEN_L2: return ir::PTXInstruction::L2;
            default: break;
        }
        return ir::PTXInstruction::CacheLevel_invalid;
    }

    void PTXParser::State::cacheLevel(int token) {
        statement.instruction.cacheLevel = tokenToCacheLevel(token);
    }

9 Add Parser Support: modify class PTXParser

Translate new opcodes from string to enum ir::PTXInstruction::Opcode

ocelot/parser/implementation/PTXParser.cpp

    ir::PTXInstruction::Opcode PTXParser::stringToOpcode( std::string string ) {
        // ... existing opcodes ...
        if( string == "prefetch" ) return ir::PTXInstruction::Prefetch;
        if( string == "prefetchu" ) return ir::PTXInstruction::Prefetchu;
        // ...
    }
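
The string-to-opcode step is a plain lookup. The slide shows a chain of comparisons; the sketch below swaps in a table-driven variant with an explicit fallback for unknown names. The `Opcode` enum here is a hypothetical four-entry stand-in, and `Invalid` is an assumption of this example, not something the slide shows.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical reduced opcode set; Invalid is the fallback for unknown names.
enum class Opcode { Ld, St, Prefetch, Prefetchu, Invalid };

Opcode stringToOpcode(const std::string& name) {
    // Built once on first call; exact-match lookup, so "prefetch" and
    // "prefetchu" cannot be confused with each other.
    static const std::unordered_map<std::string, Opcode> table = {
        {"ld", Opcode::Ld},
        {"st", Opcode::St},
        {"prefetch", Opcode::Prefetch},
        {"prefetchu", Opcode::Prefetchu},
    };
    const auto it = table.find(name);
    return it == table.end() ? Opcode::Invalid : it->second;
}
```

With a table, adding an instruction is a one-line change, and a misspelled opcode maps to `Invalid` instead of falling off the end of the function.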

10 Add Parser Support: modify PTX grammar

Define parse rules for prefetch and prefetchu instructions

ocelot/parser/implementation/ptxgrammar.yy

    %token OPCODE_PREFETCH OPCODE_PREFETCHU
    %token TOKEN_L1 TOKEN_L2

    instruction : ....
        | prefetch
        | prefetchu
        | ....

    cacheLevel : TOKEN_L1 { state.cacheLevel( $<value>1 ); }
        | TOKEN_L2 { state.cacheLevel( $<value>1 ); };

    prefetch : OPCODE_PREFETCH addressSpace cacheLevel '[' memoryOperand ']' ';'
        { state.instruction( $<text>1 ); };

    prefetchu : OPCODE_PREFETCHU cacheLevel '[' memoryOperand ']' ';'
        { state.instruction( $<text>1 ); };
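
To make the shape of the two rules concrete, here is a small hand-written recognizer for the same syntax. It is a sketch, not the generated parser: `Prefetch` and `parsePrefetch` are hypothetical names, the input is assumed to be a single instruction with exactly one space before the operand, and, like the grammar above, it accepts either cache level after `prefetchu`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Prefetch {
    bool uniform = false;  // true for prefetchu
    std::string space;     // "global" or "local"; empty for prefetchu
    std::string level;     // "L1" or "L2"
    std::string operand;   // text between '[' and ']'
};

// Accepts "prefetch.<space>.<level> [<operand>];" and
// "prefetchu.<level> [<operand>];". Returns false on any mismatch.
bool parsePrefetch(const std::string& line, Prefetch& out) {
    const std::size_t sp = line.find(' ');
    if (sp == std::string::npos) return false;

    // Split the dotted head, e.g. "prefetch.global.L1", into its parts.
    std::vector<std::string> parts;
    std::istringstream head(line.substr(0, sp));
    for (std::string p; std::getline(head, p, '.');) parts.push_back(p);

    Prefetch result;
    if (parts.size() == 3 && parts[0] == "prefetch") {
        result.space = parts[1];
        result.level = parts[2];
        if (result.space != "global" && result.space != "local") return false;
    } else if (parts.size() == 2 && parts[0] == "prefetchu") {
        result.uniform = true;
        result.level = parts[1];
    } else {
        return false;
    }
    if (result.level != "L1" && result.level != "L2") return false;

    // '[' memoryOperand ']' ';'
    const std::string rest = line.substr(sp + 1);
    if (rest.size() < 4 || rest.front() != '['
        || rest.compare(rest.size() - 2, 2, "];") != 0) {
        return false;
    }
    result.operand = rest.substr(1, rest.size() - 3);
    out = result;
    return true;
}
```

The output parameter is only written on success, so a caller can reuse one `Prefetch` object across several attempts.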

11 Supported devices: PTX Emulator

Add methods to evaluate prefetch and prefetchu instructions

ocelot/executive/interface/CooperativeThreadArray.h

    class CooperativeThreadArray {
        void eval_Prefetch(CTAContext &context, const ir::PTXInstruction &instr);
        void eval_Prefetchu(CTAContext &context, const ir::PTXInstruction &instr);
    };

ocelot/executive/implementation/CooperativeThreadArray.cpp

    void executive::CooperativeThreadArray::execute(const ir::Dim3& block) {
        do {
            PTXInstruction & instr = instructions[PC];
            switch (instr.opcode) {
                // ... other opcodes ...
                case PTXInstruction::Prefetch:
                    eval_Prefetch(context, instr); break;
                case PTXInstruction::Prefetchu:
                    eval_Prefetchu(context, instr); break;
            }
        } while (running);
    }
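
The dispatch pattern is a fetch-and-switch loop. The toy interpreter below is a sketch of that pattern only: `MiniCTA`, its four opcodes, and the counters are hypothetical, and the prefetch handlers merely count calls to show that a prefetch leaves architectural state untouched.

```cpp
#include <cstddef>
#include <vector>

enum class Opcode { AddOne, Prefetch, Prefetchu, Exit };

struct MiniCTA {
    std::vector<Opcode> program;
    int accumulator = 0;     // stands in for architectural state
    int prefetchesSeen = 0;  // stands in for emitted trace events

    void evalPrefetch()  { ++prefetchesSeen; }
    void evalPrefetchu() { ++prefetchesSeen; }

    // Fetches one instruction at a time and dispatches on its opcode.
    void execute() {
        std::size_t pc = 0;
        bool running = true;
        while (running && pc < program.size()) {
            switch (program[pc]) {
                case Opcode::AddOne:    ++accumulator;   break;
                case Opcode::Prefetch:  evalPrefetch();  break;
                case Opcode::Prefetchu: evalPrefetchu(); break;
                case Opcode::Exit:      running = false; break;
            }
            ++pc;
        }
    }
};
```

Running a program that interleaves `AddOne` with the two prefetch opcodes changes the accumulator only by the number of `AddOne` instructions executed before `Exit`.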

12 Supported devices: PTX Emulator

Implement methods to evaluate prefetch and prefetchu

ocelot/executive/implementation/CooperativeThreadArray.cpp

    void executive::CooperativeThreadArray::eval_Prefetch(CTAContext &context,
            const ir::PTXInstruction &instr) {
        currentEvent.memory_size = 1;
        for (int threadID = 0; threadID < threadCount; threadID++) {
            // skip threads whose guard predicate is false
            if (!context.predicated(threadID, instr)) {
                continue;
            }
            // compute the effective address for this thread
            const char *source = 0;
            switch (instr.d.addressMode) {
                case PTXOperand::Indirect:
                    source += getRegAsU64(threadID, instr.d.reg);
                    break;
                case PTXOperand::Address:
                case PTXOperand::Immediate:
                    source += instr.d.imm_uint;
                    break;
                default:
                    throw RuntimeException("unsupported", context.PC, instr);
            }
            source += instr.d.offset;
            // record the address in the trace event; no data is moved
            currentEvent.memory_addresses.push_back((ir::PTXU64)source);
        }
        trace();
    }
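
The per-thread address computation can be exercised on its own. This is a minimal sketch with hypothetical types: `Operand`, `AddressMode`, and `prefetchAddresses` are illustrative, registers are modeled as a vector per thread, and the guard predicate is modeled as a `std::vector<bool>` mask.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class AddressMode { Indirect, Immediate };

struct Operand {
    AddressMode mode;
    std::size_t reg;      // register index, used when mode is Indirect
    std::uint64_t imm;    // absolute address, used when mode is Immediate
    std::int64_t offset;  // constant displacement added in both modes
};

// regs[t][r] is register r of thread t; active[t] is that thread's guard.
// Returns one effective address per active thread, in thread order.
std::vector<std::uint64_t> prefetchAddresses(
        const Operand& d,
        const std::vector<std::vector<std::uint64_t>>& regs,
        const std::vector<bool>& active) {
    std::vector<std::uint64_t> out;
    for (std::size_t t = 0; t < active.size(); ++t) {
        if (!active[t]) continue;  // predicated-off threads record nothing
        const std::uint64_t base =
            (d.mode == AddressMode::Indirect) ? regs[t][d.reg] : d.imm;
        out.push_back(base + static_cast<std::uint64_t>(d.offset));
    }
    return out;
}
```

Note that the result is shorter than the thread count whenever some threads are inactive, so a consumer has to walk the mask and the address list together.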

13 Example: PTX Emulator

Sample PTX kernel augmented with prefetch instruction

CUDA source:

    __global__ void sequence(int *A) {
        A[threadIdx.x] *= 2;
    }

PTX:

    .entry sequence(.param .u64 param) {
        .reg .s32 %r<5>;   // count assumed: must cover %r1..%r4
        .reg .s64 %rl<5>;  // count assumed: must cover %rl1..%rl4
        ld.param.u64 %rl1, [param];
        cvta.to.global.u64 %rl2, %rl1;
        mov.u32 %r1, %tid.x;
        mul.wide.u32 %rl3, %r1, 4;
        add.s64 %rl4, %rl2, %rl3;
        prefetch.global.L1 [%rl4];
        ld.global.u32 %r2, [%rl4];
        shl.b32 %r4, %r2, 1;
        st.global.u32 [%rl4], %r4;
        ret;
    }

Trace generator:

    virtual void event(const trace::TraceEvent & event) {
        if (event.instruction->opcode == ir::PTXInstruction::Prefetch) {
            std::cout << event.instruction->toString() << "\n";
            trace::TraceEvent::U64Vector::const_iterator address =
                event.memory_addresses.begin();
            for (size_t tid = 0; tid < event.active.size(); tid++) {
                std::cout << " t" << tid << " - 0x" << std::hex << *address
                    << std::dec << "\n";
                ++address;
            }
        }
    }

Output:

    prefetch.global.L1 [%r4]
     t0 - 0x1e16800
     t1 - 0x1e16804
     t2 - 0x1e16808
     t3 - 0x1e1680c
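
The trace printout pairs thread IDs with recorded addresses. The standalone sketch below reproduces that formatting and, as an assumption beyond what the slide shows, skips inactive threads so that thread IDs stay aligned with addresses when only part of the block executes the prefetch. `formatPrefetchTrace` is a hypothetical helper, not an Ocelot function.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Formats one trace record: the instruction text, then one line per active
// thread with its address in hex. Addresses are assumed to be stored for
// active threads only, in thread order.
std::string formatPrefetchTrace(const std::string& instructionText,
                                const std::vector<bool>& active,
                                const std::vector<std::uint64_t>& addresses) {
    std::ostringstream out;
    out << instructionText << "\n";
    std::vector<std::uint64_t>::const_iterator address = addresses.begin();
    for (std::size_t tid = 0;
         tid < active.size() && address != addresses.end(); ++tid) {
        if (!active[tid]) continue;  // inactive threads recorded no address
        out << " t" << tid << " - 0x" << std::hex << *address << std::dec << "\n";
        ++address;
    }
    return out.str();
}
```

The extra `address != addresses.end()` bound keeps the loop from reading past the recorded addresses if the mask and the list ever disagree.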

14 Supported devices: NVIDIA GPU

No additional support required
- prefetch and prefetchu instructions do not produce values
- PTX emitter is sufficient to execute on native GPU

      ir::PTXInstruction::toString()
      ir::PTXInstruction::valid()

To support other devices
- Multicore CPU: Add translation rules to PTX-to-LLVM translator
  - Target LLVM prefetch intrinsics
- AMD GPU: Depends on support from CAL IL
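
On a multicore CPU, a prefetch ends up as a cache hint with no visible result. As a rough illustration of what such a hint looks like at the C++ level, the sketch below uses the GCC/Clang builtin `__builtin_prefetch`. It is not the PTX-to-LLVM translator; `prefetchHint` and `doubleAll` are hypothetical, and the mapping of L1 and L2 to locality values 3 and 2 is an assumption made for this example.

```cpp
#include <cstddef>

enum class CacheLevel { L1, L2 };

// Issues a read-prefetch hint for the given address. The builtin requires
// its rw and locality arguments to be compile-time constants, hence the
// switch instead of passing a computed value.
inline void prefetchHint(const void* address, CacheLevel level) {
#if defined(__GNUC__) || defined(__clang__)
    switch (level) {
        case CacheLevel::L1: __builtin_prefetch(address, 0, 3); break;
        case CacheLevel::L2: __builtin_prefetch(address, 0, 2); break;
    }
#else
    (void)address;  // no portable equivalent; the hint is optional
    (void)level;
#endif
}

// Mirrors the sample kernel: hint, then load, double, and store.
void doubleAll(int* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        prefetchHint(&data[i], CacheLevel::L1);
        data[i] *= 2;
    }
}
```

Because the hint never changes program results, `doubleAll` computes the same values whether or not the compiler honors it, which matches the observation that prefetch instructions do not produce values.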

15 Example Application

    #include <fstream>
    #include <iostream>
    #include   // header name lost in the transcript

    class TraceGen : public trace::TraceGenerator {
    public:
        virtual void event(const trace::TraceEvent & event) {
            if (event.instruction->opcode == ir::PTXInstruction::Prefetch) {
                std::cout << event.instruction->toString() << "\n";
            }
        }
    };

    int main() {
        // register the trace generator so it receives every instruction event
        TraceGen traceGen;
        ocelot::addTraceGenerator(traceGen, true);

        const int N = 4;
        int *devPtr;
        cudaMalloc((void **)&devPtr, sizeof(int)*N);

        // load the PTX module containing the prefetch instruction
        std::ifstream ptxFile("example-sequence.ptx");
        ocelot::registerPTXModule(ptxFile, "example.ptx");

        // launch one block of N threads
        cudaConfigureCall(dim3(1,1), dim3(N, 1));
        cudaSetupArgument(&devPtr, sizeof(int *), 0);
        ocelot::launch("example.ptx", "sequence");

        cudaFree(devPtr);
        return 0;
    }

16 Questions?

