Published by Sofia Beach. Modified over 10 years ago.
1
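[Figure: CPU/GPU execution timeline. Serial code runs on the CPU; kernel1<<<…>>>() launches Grid 1 (blocks B0 to B5) on the GPU; serial code runs again; kernel2<<<…>>>() launches Grid 2 (blocks B0 to B5). Labels: Thread, Block, Time, Thread synchronization.]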
2
[Figure: CPU/GPU memory. CPU side: Host Memory, Host Pinned Memory. GPU side: Device Memory, Shared Memory. Label: Memory Transfer.]
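The memory kinds named on this slide can be illustrated with a minimal CUDA sketch. It is not taken from the presentation: the kernel `increment` and the helper `round_trip_sum` are hypothetical names chosen for illustration. The sketch allocates host pinned memory with `cudaMallocHost`, device memory with `cudaMalloc`, and moves data in both directions with `cudaMemcpy`. Shared memory is not exercised here.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel (not from the slides): adds 1 to every element.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: fills a pinned host buffer with 0..n-1, copies it to
// the device, increments it there, copies it back, and returns the sum.
// Returns -1 if a CUDA call fails.
long long round_trip_sum(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *h_pinned = nullptr;  // host pinned (page-locked) memory
    int *d_data = nullptr;    // device memory

    if (cudaMallocHost(&h_pinned, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) {
        cudaFreeHost(h_pinned);
        return -1;
    }

    for (int i = 0; i < n; ++i) h_pinned[i] = i;

    // Memory transfer: host -> device.
    cudaMemcpy(d_data, h_pinned, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    increment<<<blocks, threads>>>(d_data, n);

    // Memory transfer: device -> host. This blocking copy also waits for
    // the kernel launched above to finish.
    cudaError_t err = cudaMemcpy(h_pinned, d_data, bytes, cudaMemcpyDeviceToHost);

    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += h_pinned[i];

    cudaFree(d_data);
    cudaFreeHost(h_pinned);
    return err == cudaSuccess ? sum : -1;
}
```

Calling `round_trip_sum(1024)` on a machine with a working CUDA device should return the sum of 1 through 1024. Pinned memory is used for the host buffer because page-locked pages can be transferred by the GPU's DMA engine without an intermediate staging copy.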
3
[Figure: CPU/GPU execution timeline. Serial code on the CPU; kernel1<<<…>>>() launches Grid 1 (blocks B0 to B5) on the GPU. Labels: Pinned Memory, Time.]
4
Marco Esposito Micenin, A/A 2010/2011, slide 4/21

[Figure: CPU/GPU execution timeline. Serial code on the CPU calls CUQU::push(); kernel1<<<…>>>() launches Grid 1 on the GPU, whose blocks B0 to B5 call CUQU::fetch(). Labels: Pinned Memory, Time.]
5
[Figure: CPU/GPU execution timeline. Serial code on the CPU; kernel1<<<…>>>() launches Grid 1 on the GPU, which calls CUQU::fetch() and barrier_wait(). Labels: Pinned Memory, Time.]
6
[Figure: time breakdown. Labels: Offload Time, Computation Time, Synchronization Time.]
7
GPU thread synchronization

1. KSM-implicit
for(…) { kernel<<<…>>>(); }

2. KSM-explicit
for(…) { kernel<<<…>>>(); cudaThreadSynchronize(); }

3. CSM-oneloop & CSM-lockfree
__global__ void csm_kernel() { for(…) { compute(); barrier_wait(); } }

[Figure: timeline with barrier_wait() marked along the Time axis.]
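The two kernel-launch-based variants (KSM-implicit and KSM-explicit) can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: `fill_ones`, `step_kernel`, and `run_ksm` are hypothetical names, and the sketch calls `cudaDeviceSynchronize()`, the current replacement for the deprecated `cudaThreadSynchronize()`. Each step reads a neighbouring element written by another thread in the previous step, so every step must complete across all blocks before the next begins. Kernels launched into the same stream execute in order, which provides that guarantee implicitly; the explicit variant additionally blocks the host after each launch.

```cuda
#include <cuda_runtime.h>
#include <utility>

// Hypothetical kernel: initialises every element to 1.
__global__ void fill_ones(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1;
}

// Hypothetical kernel: one iteration step. Each element becomes the sum of
// itself and its right-hand neighbour (wrapping around), so a step depends
// on values that other blocks produced in the previous step.
__global__ void step_kernel(const int *cur, int *next, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) next[i] = cur[i] + cur[(i + 1) % n];
}

// Runs `steps` iterations using one kernel launch per step.
// explicit_sync == false: KSM-implicit (rely on in-order stream execution).
// explicit_sync == true:  KSM-explicit (host waits after every launch).
// Returns element 0 of the final buffer, or -1 if a CUDA call fails.
int run_ksm(int n, int steps, bool explicit_sync) {
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *cur = nullptr, *next = nullptr;
    if (cudaMalloc(&cur, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&next, bytes) != cudaSuccess) {
        cudaFree(cur);
        return -1;
    }

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    fill_ones<<<blocks, threads>>>(cur, n);

    for (int s = 0; s < steps; ++s) {
        step_kernel<<<blocks, threads>>>(cur, next, n);
        if (explicit_sync) cudaDeviceSynchronize();
        std::swap(cur, next);  // the newest values are now in `cur`
    }

    int first = -1;
    // Blocking copy on the default stream: waits for all earlier kernels.
    cudaError_t err = cudaMemcpy(&first, cur, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(cur);
    cudaFree(next);
    return err == cudaSuccess ? first : -1;
}
```

Starting from all ones, every element doubles at each step, so both `run_ksm(768, 5, false)` and `run_ksm(768, 5, true)` should return 32 on a working CUDA device. The CSM variants, which keep a single kernel resident and synchronise blocks with `barrier_wait()`, are not reproduced here because the slides do not show how that barrier is implemented.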