Thread synchronization

Presentation transcript:

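1 Thread synchronization. [Figure: CPU and GPU timelines along a Time axis. The CPU runs serial code, launches kernel1<<<…>>>(), runs more serial code, then launches kernel2<<<…>>>(). On the GPU, kernel1 executes as Grid 1 and kernel2 as Grid 2, each drawn as thread blocks B0 to B5. Labels: Thread, Block.]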
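The two-grid timeline on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the kernel bodies, the function `run_two_grids`, and the launch parameters are assumptions made up for the example. It relies on the fact that kernels launched into the same stream run in order, so every block of Grid 1 has finished before any block of Grid 2 starts.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernels: the names follow the slide, the bodies are illustrative.
__global__ void kernel1(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;                 // each thread writes its own element
}

__global__ void kernel2(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Reads an element that a different block of kernel1 wrote. This is safe
    // only because Grid 2 starts after all blocks of Grid 1 have completed.
    if (i < n) out[i] = in[n - 1 - i] + 1;
}

// Runs Grid 1, then Grid 2, and returns the output of the second kernel.
// Error checking is omitted to keep the sketch short.
std::vector<int> run_two_grids(int blocks, int threadsPerBlock) {
    int n = blocks * threadsPerBlock;
    int *d_a = nullptr, *d_b = nullptr;
    cudaMalloc((void **)&d_a, n * sizeof(int));
    cudaMalloc((void **)&d_b, n * sizeof(int));

    // ... serial CPU code ...
    kernel1<<<blocks, threadsPerBlock>>>(d_a, n);        // Grid 1
    // ... serial CPU code (the launch above returns immediately) ...
    kernel2<<<blocks, threadsPerBlock>>>(d_a, d_b, n);   // Grid 2

    std::vector<int> h(n);
    // The blocking copy waits for both kernels before reading the result.
    cudaMemcpy(h.data(), d_b, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return h;
}
```

Calling `run_two_grids(6, 32)` mirrors the six blocks B0 to B5 drawn in the figure; element `i` of the result should hold `n - i`.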

2 CPU/GPU. [Figure labels: CPU, GPU, Host Memory, Host Pinned Memory, Device Memory, Shared Memory, Memory Transfer.]
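As a rough illustration of the host pinned memory and memory transfer boxes on this slide, the sketch below allocates a page-locked host buffer with `cudaMallocHost` and moves data to the device and back with `cudaMemcpyAsync`. The function `roundtrip_pinned` and the kernel `scale` are hypothetical names introduced for the example; the slide does not say how the buffers were actually allocated. Shared memory is not covered here.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element by a factor.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Sends n floats (all 1.0f) to the device, doubles them, copies them back
// through the same pinned buffer, and returns their sum.
// Error checking is omitted to keep the sketch short.
float roundtrip_pinned(int n) {
    size_t bytes = n * sizeof(float);
    float *h_pinned = nullptr;   // host pinned (page-locked) memory
    float *d_buf = nullptr;      // device memory
    cudaMallocHost((void **)&h_pinned, bytes);
    cudaMalloc((void **)&d_buf, bytes);
    for (int i = 0; i < n; ++i) h_pinned[i] = 1.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Asynchronous copies can overlap with host work only when the host
    // buffer is pinned; with pageable memory they fall back to staged copies.
    cudaMemcpyAsync(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n, 2.0f);
    cudaMemcpyAsync(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);   // wait for copy, kernel, copy

    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += h_pinned[i];

    cudaStreamDestroy(stream);
    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return sum;
}
```

For example, `roundtrip_pinned(1024)` should return `2048.0f`, since each of the 1024 ones is doubled on the device.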

3 [Figure: CPU and GPU timelines along a Time axis. Labels: kernel1<<<…>>>(), Grid 1 with thread blocks B0 to B5, Pinned Memory, Serial code.]

4 [Figure: CPU and GPU timelines along a Time axis. Labels: kernel1<<<…>>>(), Grid 1, CUQU::push(), CUQU::fetch(), Pinned Memory, Serial code. Thread blocks B0 to B5 appear twice.] Marco Esposito Micenin, academic year 2010/2011

5 [Figure: CPU and GPU timelines along a Time axis. Labels: kernel1<<<…>>>(), Grid 1, CUQU::fetch(), barrier_wait(), Pinned Memory, Serial code.]

6 [Figure legend: Offload Time, Computation Time, Synchronization Time.]

7 GPU thread synchronization
1. KSM-implicit: for(…) { kernel<<<…>>>(); }
2. KSM-explicit: for(…) { kernel<<<…>>>(); cudaThreadSynchronize(); }
3. CSM-oneloop & CSM-lockfree: __global__ void csm_kernel() { for(…) { compute(); barrier_wait(); } }
[Figure labels: Time, barrier_wait().]
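The three approaches on this slide can be compared side by side with a small sketch. Everything beyond the slide's own identifiers is an assumption: `compute()` is given an arbitrary body, `run_iterations` and the `Method` enum are hypothetical, and `barrier_wait()` is implemented here as a simple atomic-counter barrier, which may differ from the oneloop and lockfree variants the slide names. `cudaDeviceSynchronize()` stands in for the host-side synchronization call, since the older thread-synchronize call is deprecated in current CUDA toolkits. The in-kernel barrier is only valid when all blocks of the grid are resident on the GPU at the same time; with more blocks than the device can hold, it would deadlock.

```cuda
#include <cuda_runtime.h>
#include <utility>
#include <vector>

enum Method { KSM_IMPLICIT, KSM_EXPLICIT, CSM };

// One step: each element reads a neighbour that may belong to another block,
// so step k must be complete everywhere before step k+1 begins.
// volatile keeps the compiler from caching values across the barrier.
__device__ void compute(const volatile int *in, volatile int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(i + 1) % n] + 1;
}

// KSM: one step per launch; the kernel boundary acts as the barrier.
__global__ void ksm_kernel(const int *in, int *out, int n) { compute(in, out, n); }

// Assumed barrier: a counter that only grows, so it never needs a reset.
__device__ void barrier_wait(unsigned int *counter, unsigned int goal) {
    __syncthreads();                                 // whole block has arrived
    if (threadIdx.x == 0) {
        __threadfence();                             // publish this block's writes
        atomicAdd(counter, 1u);                      // announce arrival
        while (atomicAdd(counter, 0u) < goal) { }    // spin until all blocks arrive
        __threadfence();
    }
    __syncthreads();                                 // release the rest of the block
}

// CSM: the loop lives inside a single kernel launch.
__global__ void csm_kernel(int *a, int *b, int n, int iters, unsigned int *counter) {
    for (int k = 0; k < iters; ++k) {
        compute(a, b, n);
        barrier_wait(counter, (k + 1) * gridDim.x);
        int *t = a; a = b; b = t;                    // ping-pong the buffers
    }
}

// Runs `iters` steps with the chosen method. Error checking is omitted.
std::vector<int> run_iterations(Method m, int blocks, int threads, int iters) {
    int n = blocks * threads;
    std::vector<int> h(n);
    for (int i = 0; i < n; ++i) h[i] = i;
    int *a = nullptr, *b = nullptr;
    unsigned int *counter = nullptr;
    cudaMalloc((void **)&a, n * sizeof(int));
    cudaMalloc((void **)&b, n * sizeof(int));
    cudaMalloc((void **)&counter, sizeof(unsigned int));
    cudaMemcpy(a, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(counter, 0, sizeof(unsigned int));

    if (m == CSM) {
        csm_kernel<<<blocks, threads>>>(a, b, n, iters, counter);
        if (iters % 2) std::swap(a, b);              // result ends up in `a`
    } else {
        for (int k = 0; k < iters; ++k) {
            ksm_kernel<<<blocks, threads>>>(a, b, n);
            if (m == KSM_EXPLICIT) cudaDeviceSynchronize();  // host waits each step
            std::swap(a, b);
        }
    }
    cudaMemcpy(h.data(), a, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(a); cudaFree(b); cudaFree(counter);
    return h;
}
```

All three methods should produce the same array: after `k` steps, element `i` holds `(i + k) % n + k`. What differs is where the waiting happens: in the stream's launch ordering, on the host, or inside the kernel.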

