Implementation of Efficient Check-pointing and Restart on CPU - GPU

Sumanth Suraneni, Sharath Prasad, Harsha Sutaone
9/17/2018

Introduction
GPU
CPU-GPU systems
Checkpoint GPU on CPU-GPU
Restart from checkpoint on CPU

Motivation
CPU-GPU systems run general-purpose workloads
Dependability is an issue in future GPUs
GPU fault tolerance is a nascent field
Existing checkpointing implementations on GPUs are at the application level
We explore micro-architectural changes to GPUs

Background
OpenCL Programming Model
Southern Islands Architecture
Multi2Sim CPU-GPU Simulator

OpenCL Programming Model

Simplified Mapping of OpenCL onto AMD Accelerated Parallel Processing

Work-item Grouping into Work-groups and Wavefronts
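The grouping named on this slide can be illustrated with a little arithmetic. The sketch below is not taken from the presentation. It assumes a one-dimensional ND-Range whose global size is a multiple of the local size, and a wavefront size of 64 work-items, as used by Southern Islands GPUs. The function names `partition` and `locate` are hypothetical.

```python
import math

WAVEFRONT_SIZE = 64  # work-items per wavefront on Southern Islands


def partition(global_size, local_size, wavefront_size=WAVEFRONT_SIZE):
    """Return (number of work-groups, wavefronts per work-group) for a 1-D ND-Range."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the local size")
    work_groups = global_size // local_size
    wavefronts_per_group = math.ceil(local_size / wavefront_size)
    return work_groups, wavefronts_per_group


def locate(global_id, local_size, wavefront_size=WAVEFRONT_SIZE):
    """Map a global work-item ID to (work-group ID, wavefront ID, lane in wavefront)."""
    group_id, local_id = divmod(global_id, local_size)
    wavefront_id, lane = divmod(local_id, wavefront_size)
    return group_id, wavefront_id, lane
```

For example, 1024 work-items with a local size of 256 form 4 work-groups of 4 wavefronts each.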

Southern Islands Architecture

Compute Unit

Kernel State

Multi2Sim CPU-GPU Simulator
Software entities defined in the OpenCL programming model: an ND-Range is formed of work-groups, which are, in turn, sets of work-items executing the same OpenCL C kernel code.

Interaction between user code, OS-code, and hardware, comparing native and simulated environments

Running an OpenCL kernel on a Southern Islands GPU
Block diagram of a compute unit

Implementation
SIEmuCreate(): assign global memory; list running and waiting work-groups
SIEmuRun(): dequeue and enqueue running and waiting work-groups; create work-groups
si_wavefront_execute(): instruction dump; next PC = current PC + instruction size
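The control flow on this slide can be sketched as a toy model. This is not Multi2Sim code. The functions `run_wavefront` and `run_nd_range`, the `max_running` limit, and the instruction sizes in the sample program are illustrative assumptions. The sketch shows only the two ideas named on the slide: work-groups moving from a waiting list to a running list, and the program counter advancing by the size of each instruction.

```python
from collections import deque


def run_wavefront(program):
    """Walk a toy program given as {byte address: (mnemonic, size in bytes)}.

    Returns the instruction dump as a list of (pc, mnemonic) pairs.
    """
    pc = 0
    trace = []
    while pc in program:
        mnemonic, size = program[pc]
        trace.append((pc, mnemonic))  # instruction dump
        if mnemonic == "s_endpgm":    # end of the kernel program
            break
        pc += size                    # next PC = current PC + instruction size
    return trace


def run_nd_range(work_group_ids, program, max_running=2):
    """Move work-groups from the waiting list to the running list and execute them."""
    waiting = deque(work_group_ids)
    running = deque()
    finished = []
    while waiting or running:
        # Fill the running list from the waiting list (hypothetical capacity limit).
        while waiting and len(running) < max_running:
            running.append(waiting.popleft())
        work_group = running.popleft()
        run_wavefront(program)        # one wavefront per work-group in this toy model
        finished.append(work_group)
    return finished


# Sample program; the byte sizes are illustrative, not decoded from a real binary.
SAMPLE_PROGRAM = {
    0: ("s_mov_b32", 4),
    4: ("v_add_f32", 8),
    12: ("s_endpgm", 4),
}
```

Because the PC is a byte address and instructions vary in size, a checkpoint that records the PC is enough to resume decoding at the right instruction.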

Implementation: Checkpoint Implementation
ND-Range: ID, work dimension, number of VGPRs and SGPRs used
Work-group: ID, work-groups finished, wavefronts completed and at barrier, wavefront count
Wavefront: ID, SREGs, execution state of wavefront, instruction count
Work-item: ID, VREGs, global memory access size and address
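The four levels of checkpointed state listed on this slide nest naturally. The sketch below models them as Python dataclasses and round-trips them through JSON. The field names, the JSON format, and the `save_checkpoint` / `load_checkpoint` helpers are assumptions made for illustration. The presentation does not specify the on-disk layout, and "work-groups finished" is modelled here as a per-work-group flag.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class WorkItemState:
    id: int
    vregs: List[int]                 # vector registers
    global_mem_access_size: int = 0
    global_mem_access_addr: int = 0


@dataclass
class WavefrontState:
    id: int
    sregs: List[int]                 # scalar registers
    exec_state: str                  # hypothetical encoding of the execution state
    inst_count: int
    work_items: List[WorkItemState] = field(default_factory=list)


@dataclass
class WorkGroupState:
    id: int
    finished: bool
    wavefronts_completed: int
    wavefronts_at_barrier: int
    wavefront_count: int
    wavefronts: List[WavefrontState] = field(default_factory=list)


@dataclass
class NDRangeState:
    id: int
    work_dim: int
    num_vgpr_used: int
    num_sgpr_used: int
    work_groups: List[WorkGroupState] = field(default_factory=list)


def save_checkpoint(nd_range):
    """Serialize the whole hierarchy to a JSON string."""
    return json.dumps(asdict(nd_range))


def load_checkpoint(text):
    """Rebuild the dataclass hierarchy from a JSON string."""
    d = json.loads(text)
    groups = []
    for g in d["work_groups"]:
        wavefronts = []
        for w in g["wavefronts"]:
            items = [WorkItemState(**i) for i in w["work_items"]]
            wavefronts.append(WavefrontState(**{**w, "work_items": items}))
        groups.append(WorkGroupState(**{**g, "wavefronts": wavefronts}))
    return NDRangeState(**{**d, "work_groups": groups})
```

A restart can then compare the reloaded object with the live emulator state, which is one simple way to check that nothing was lost in serialization.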

Implementation
LDS (Local Data Share): LDS module of the executing work-group; all pages are stored.
Global memory: stored up to the global memory top.
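The two memory snapshots described on this slide can be sketched with a paged memory modelled as a dictionary from page index to page contents. This is only an illustration. The page size, the dictionary representation, and the function names are assumptions, not details from the presentation.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def snapshot_lds(lds_pages):
    """Copy every page of the executing work-group's LDS module."""
    return {index: bytes(data) for index, data in lds_pages.items()}


def snapshot_global(global_pages, global_mem_top):
    """Copy only the global-memory pages that start below the global memory top."""
    return {
        index: bytes(data)
        for index, data in global_pages.items()
        if index * PAGE_SIZE < global_mem_top
    }
```

Cutting the global snapshot off at the memory top avoids writing pages that the kernel never allocated, whereas the LDS snapshot in this version stores all pages regardless of use.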

Implementation
Completed work-groups: store the list of finished work-groups in a file.
Unexecuted wavefronts during checkpoint: store them in a separate file while writing the checkpoint file; read from that file to start execution during restart.
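The bookkeeping on this slide amounts to two side files: one listing finished work-groups, one listing wavefronts that had not yet executed when the checkpoint was taken. A minimal sketch follows. The file names, the JSON format, and the function names are hypothetical, and wavefronts are identified here by `[work_group_id, wavefront_id]` pairs purely for illustration.

```python
import json
import os
import tempfile

FINISHED_FILE = "finished_work_groups.json"      # hypothetical file name
PENDING_FILE = "unexecuted_wavefronts.json"      # hypothetical file name


def write_side_files(directory, finished_work_groups, unexecuted_wavefronts):
    """Write both lists alongside the checkpoint file."""
    with open(os.path.join(directory, FINISHED_FILE), "w") as f:
        json.dump(sorted(finished_work_groups), f)
    with open(os.path.join(directory, PENDING_FILE), "w") as f:
        json.dump(unexecuted_wavefronts, f)


def plan_restart(directory, all_work_groups):
    """Return (work-groups still to run, wavefronts to start first) on restart."""
    with open(os.path.join(directory, FINISHED_FILE)) as f:
        finished = set(json.load(f))
    with open(os.path.join(directory, PENDING_FILE)) as f:
        pending = json.load(f)
    remaining = [wg for wg in all_work_groups if wg not in finished]
    return remaining, pending


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_side_files(d, {0, 1}, [[2, 3]])
        print(plan_restart(d, [0, 1, 2, 3]))
```

On restart, finished work-groups are skipped entirely, and the recorded wavefronts are scheduled before any fresh work-group is created.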

Implementation: Checkpoint
Checkpoint trace

Implementation: Restart
Restart trace

Implementation: Verification Strategy

Evaluation: Work-groups

Evaluation: Instruction Count

Evaluation: Checkpoint Size

Evaluation: LDS Comparison

Bugs Encountered: LDS misalignment

Bugs Encountered: Unexecuted wavefront during checkpoint

Future Scope
Further minimization of the LDS snapshot: keeping track of pages modified and storing only those
Implementing a driver call to checkpoint
Hardware complexity of the implementation
Compression algorithms during multiple checkpoints
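The first future-work item, storing only modified pages, can be sketched as a memory wrapper that marks pages dirty on every write. This is a hypothetical illustration of the idea, not something the presentation implements. The class name `TrackedMemory` and the 256-byte page size are assumptions.

```python
class TrackedMemory:
    """Byte-addressable memory that remembers which pages were written."""

    def __init__(self, size, page_size=256):
        self.page_size = page_size
        self.data = bytearray(size)
        self.dirty = set()           # indices of pages modified since the last snapshot

    def write(self, addr, payload):
        if addr < 0 or addr + len(payload) > len(self.data):
            raise IndexError("write outside memory bounds")
        if not payload:
            return
        self.data[addr:addr + len(payload)] = payload
        first = addr // self.page_size
        last = (addr + len(payload) - 1) // self.page_size
        self.dirty.update(range(first, last + 1))

    def incremental_snapshot(self):
        """Return only the dirty pages, then clear the dirty set."""
        size = self.page_size
        pages = {
            page: bytes(self.data[page * size:(page + 1) * size])
            for page in sorted(self.dirty)
        }
        self.dirty.clear()
        return pages
```

With this scheme, a checkpoint taken shortly after a previous one writes only the pages touched in between, at the cost of tracking a dirty set on every store.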

THANK YOU
