1
More Charm++/TAU examples
Applications:
- NAMD
- Parallel Framework for Unstructured Meshing (ParFUM)
Features:
- Profile snapshots: capture the runtime behavior of the application by dividing the run into user-specified intervals.
- CUDA profiling: tracks time spent in CUDA kernel routines and shows scaling behavior for an experiment that varies the number of devices used.
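The idea behind profile snapshots can be sketched as follows. This is a minimal illustration, assuming each snapshot stores cumulative time per timer and that per-interval times are recovered by differencing consecutive snapshots; the function name, the data layout, and the sample numbers are hypothetical and are not taken from TAU.

```python
def interval_profiles(snapshots):
    """Turn cumulative timer snapshots into per-interval times.

    snapshots: list of dicts mapping timer name -> cumulative seconds,
    in the order the snapshots were taken (hypothetical layout).
    """
    intervals = []
    previous = {}
    for snap in snapshots:
        # A timer that first appears in this snapshot starts from zero.
        delta = {name: total - previous.get(name, 0.0)
                 for name, total in snap.items()}
        intervals.append(delta)
        previous = snap
    return intervals


# Made-up cumulative times for three snapshots.
snapshots = [
    {"Main": 1.0, "Idle": 0.5},
    {"Main": 3.0, "Idle": 0.75},
    {"Main": 4.0, "Idle": 2.75, "LoadBalance": 1.5},
]
per_interval = interval_profiles(snapshots)
```

Each entry of `per_interval` then describes one interval of the run, which is what makes phases such as load balancing visible over time.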
2
Load Balancing Phases
NAMD snapshot profile covering over 800 seconds on 2048 processors.
[Figure: panels "Mean Exclusive Time" and "Standard Deviation"; legend: enqueueSelfB, enqueueSelfA, Main, enqueueWorkB, enqueueWorkA, Idle]
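The two quantities plotted on this slide can be computed per routine from the exclusive times measured on each processor. The sketch below assumes a population standard deviation; the slide does not say which variant the tool uses, and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def summarize(per_processor_times):
    """Return {routine: (mean, standard deviation)} across processors.

    per_processor_times: dict mapping routine name -> list of exclusive
    seconds, one entry per processor (hypothetical layout).
    """
    return {routine: (mean(times), pstdev(times))
            for routine, times in per_processor_times.items()}


# Made-up exclusive times on four processors for one snapshot interval.
sample = {
    "Idle": [2.0, 4.0, 4.0, 6.0],
    "Main": [1.0, 1.0, 1.0, 1.0],
}
stats = summarize(sample)
```

A large standard deviation relative to the mean, as for `Idle` in this sample, indicates that the processors are unevenly loaded during that interval.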
3
NAMD CUDA events
GPU efficiency gained by doubling the number of GPUs from 16 to 32. The events are broken down by routine and by device number.
[Figure annotations: Device #0; ~100% efficiency; ~50% efficiency]
4
NAMD CUDA scaling
Scaling by event and device number. The Non-Bonded Calculations scale well; the Sum Forces Calculations scale less well, but their overall time is only a few microseconds.
[Figure: panels "Non-Bonded Calculations" and "Sum Forces Calculations"; axes: Number of Devices, Scaling Efficiency]
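For reference, scaling efficiency is commonly defined relative to a baseline run. The sketch below assumes strong scaling (fixed problem size), which the slides do not state explicitly; the timings are hypothetical and only illustrate what the ~100% and ~50% figures would mean when going from 16 to 32 devices.

```python
def scaling_efficiency(base_devices, base_time, devices, time):
    """Strong-scaling efficiency relative to a baseline run.

    1.0 means the time dropped in proportion to the added devices;
    0.5 on a doubling means the time did not improve at all.
    """
    return (base_time * base_devices) / (time * devices)


# Hypothetical kernel times in seconds, not measurements from the slides.
ideal = scaling_efficiency(16, 10.0, 32, 5.0)    # time halves on doubling
flat = scaling_efficiency(16, 10.0, 32, 10.0)    # time unchanged on doubling
```

Here `ideal` evaluates to 1.0 and `flat` to 0.5.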
5
ParFUM CUDA speedup
Single CPU or GPU performance on a 128x8x8 mesh. With GPU acceleration enabled, ParFUM spent 9 seconds in the CUDA kernel routines.