1
Todd Harman, Department of Mechanical Engineering
Jeremy Thornock, Department of Chemical Engineering
Isaac Hunsaker, Graduate Student, Department of Chemical Engineering
155 South 1452 East, Room 380, Salt Lake City, Utah 84112, 1-801-585-1233
This research was sponsored by the National Nuclear Security Administration under the Accelerating Development of Retrofittable CO2 Capture Technologies through Predictivity program, through DOE Cooperative Agreement DE-NA0000740.
2
Year 2: Demonstration of a fully-coupled problem using RMCRT within ARCHES. Scalability demonstration.
3
CFD: always on the finest level.
RMCRT:
- 1 Level: on the CFD (finest) level
- 2 Level: on the coarsest level
- "Data Onion": on the finest level within a Region of Interest (ROI) -- research topic
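A minimal sketch of the "Data Onion" idea described above, under the assumption that a ray samples fine-level data only while inside the ROI and falls back to the coarsest level outside it; the types and names are illustrative, not Uintah's API:

```cpp
// Illustrative sketch only: pick which grid level a ray samples from,
// based on whether its current position lies inside the region of interest.
struct Point { double x, y, z; };

struct Box {
  Point lo, hi;
  bool contains(const Point& p) const {
    return p.x >= lo.x && p.x <= hi.x &&
           p.y >= lo.y && p.y <= hi.y &&
           p.z >= lo.z && p.z <= hi.z;
  }
};

// Finest level inside the ROI (the "onion" core), coarsest level outside.
int levelForSample(const Point& p, const Box& roi, int finestLevel)
{
  return roi.contains(p) ? finestLevel : 0;   // level 0 = coarsest
}
```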
4
2 Levels
5
3 Levels
6
Implemented.
Research topic: where to place the ROI?
- Static: user-defined region?
7
Implemented.
Research topic: where to place the ROI?
- Dynamic: ROI computed every timestep (based on abskg · σT⁴)?
- ROI proportional to the size of the fine-level patches?
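A minimal sketch of a dynamic ROI recomputed each timestep, assuming the flagging quantity is abskg · σT⁴ as suggested above; the field layout, threshold, and names are illustrative, not the ARCHES implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch: flag cells whose emissive source abskg * sigma * T^4
// exceeds a threshold, then take the bounding box of the flagged cells as
// the ROI for the next timestep.
constexpr double kSigma = 5.67e-8;   // Stefan-Boltzmann constant [W/m^2-K^4]

struct IntBox { int lo[3], hi[3]; };

IntBox computeROI(const std::vector<double>& abskg,
                  const std::vector<double>& T,
                  const int n[3], double threshold)
{
  IntBox roi = {{n[0], n[1], n[2]}, {-1, -1, -1}};   // start with an empty box
  for (int k = 0; k < n[2]; ++k)
    for (int j = 0; j < n[1]; ++j)
      for (int i = 0; i < n[0]; ++i) {
        const std::size_t c = i + n[0] * (j + std::size_t(n[1]) * k);
        if (abskg[c] * kSigma * std::pow(T[c], 4) > threshold) {
          const int idx[3] = {i, j, k};
          for (int d = 0; d < 3; ++d) {
            roi.lo[d] = std::min(roi.lo[d], idx[d]);
            roi.hi[d] = std::max(roi.hi[d], idx[d]);
          }
        }
      }
  return roi;   // the caller may round this up to whole fine-level patches
}
```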
8
80% complete: Data Onion with dynamic and static regions of interest; in the testing phase, benchmarks needed.
90% complete: integration of RMCRT tasks within ARCHES (2 level).
9
Single-level verification:
- Order of accuracy: number of rays (old), grid resolution
- Scalability studies with the new mixed scheduler
Two-level verification:
- Errors associated with coarsening
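For the order-of-accuracy studies listed above, the observed order is typically extracted from error norms on two successive resolutions (or ray counts); a minimal, generic sketch, not Uintah-specific:

```cpp
#include <cmath>

// Observed order of accuracy from two error norms, where r is the refinement
// ratio: e.g. r = 2 for a grid-resolution study, or the factor by which the
// number of rays is increased for a ray-count study.
double observedOrder(double errCoarse, double errFine, double r)
{
  return std::log(errCoarse / errFine) / std::log(r);
}
```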
10
S. P. Burns and M. A. Christon, "Spatial domain-based parallelism in large-scale, participating-media, radiative transport applications," Numerical Heat Transfer, Part B, 31(4):401-421, 1997.
Initial conditions:
- Uniform temperature field
- Analytical function for the absorption coefficient
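For reference, one commonly quoted form of the Burns & Christon analytical absorption coefficient on a unit cube centered at the origin is sketched below; this is an assumption taken from later RMCRT studies of the benchmark and should be verified against the paper before use:

```cpp
#include <cmath>

// Assumed form of the Burns & Christon (1997) benchmark absorption
// coefficient [1/m] on a unit cube centered at the origin:
//   abskg(x,y,z) = 0.9 (1 - 2|x|)(1 - 2|y|)(1 - 2|z|) + 0.1
double burnsChristonAbskg(double x, double y, double z)
{
  return 0.9 * (1.0 - 2.0 * std::fabs(x))
             * (1.0 - 2.0 * std::fabs(y))
             * (1.0 - 2.0 * std::fabs(z)) + 0.1;
}
```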
11
S. P. Burns and M. A. Christon, "Spatial domain-based parallelism in large-scale, participating-media, radiative transport applications," Numerical Heat Transfer, Part B, 31(4):401-421, 1997.
13
4× error from coarsening abskg.
14
Coarsening: smoothing filter. [Figure: error in the coarsened abskg.]
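A minimal sketch of the coarsening step that introduces the error discussed above, assuming a refinement ratio of 2 and simple 2×2×2 cell averaging; the actual smoothing filter in ARCHES, and whether abskg or a derived quantity is averaged, may differ:

```cpp
#include <cstddef>
#include <vector>

// Illustrative sketch: coarsen a fine-level cell-centered field by averaging
// each 2x2x2 block of fine cells into one coarse cell.
std::vector<double> coarsen(const std::vector<double>& fine, const int nf[3])
{
  const int nc[3] = {nf[0] / 2, nf[1] / 2, nf[2] / 2};
  std::vector<double> coarse(std::size_t(nc[0]) * nc[1] * nc[2], 0.0);

  auto fidx = [&](int i, int j, int k) { return i + nf[0] * (j + std::size_t(nf[1]) * k); };
  auto cidx = [&](int i, int j, int k) { return i + nc[0] * (j + std::size_t(nc[1]) * k); };

  for (int k = 0; k < nc[2]; ++k)
    for (int j = 0; j < nc[1]; ++j)
      for (int i = 0; i < nc[0]; ++i) {
        double sum = 0.0;
        for (int dk = 0; dk < 2; ++dk)
          for (int dj = 0; dj < 2; ++dj)
            for (int di = 0; di < 2; ++di)
              sum += fine[fidx(2*i + di, 2*j + dj, 2*k + dk)];
        coarse[cidx(i, j, k)] = sum / 8.0;
      }
  return coarse;
}
```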
15
Leverage the work of Dr. Berzins' team:
- Hybrid MPI/threaded task scheduler (Qingyu Meng)
- GPU RMCRT (Alan Humphrey)
16
Hybrid MPI/threaded task scheduler*: memory reduction from 13.5 GB to 1 GB per node (12 cores/node), for a 2-material CFD problem with 2048³ cells on 110,592 cores of Jaguar.
Interconnect drivers and MPI software must be thread-safe.
RMCRT requires an MPI environment-variable expert!
*Q. Meng, M. Berzins, and J. Schmidt, "Using hybrid parallelism to improve memory use in Uintah," in Proceedings of TeraGrid 2011.
17
Kraken: 100 rays per cell.
18
Difficult to run on Kraken; crashes occur in MVAPICH. Further testing needed on larger machines?
19
Motivation: utilize all available hardware.
- Uintah's asynchronous task-based approach is well suited to taking advantage of GPUs.
- RMCRT is ideal for GPUs.
- Keeneland Initial Delivery System: 360 GPUs.
- DOE Titan: 1000s of GPUs.
[Figure: NVIDIA M2070/90 Tesla GPU + multi-core CPU.]
20
Offload ray tracing and RNG to the GPU(s); available CPU cores can perform other computation.
Uintah infrastructure supports GPU task scheduling and execution:
- Can access multiple GPUs on-node
- Uses NVIDIA CUDA C/C++
- Uses the NVIDIA cuRAND library for GPU-accelerated random number generation (RNG)
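A minimal CUDA sketch of per-ray RNG with the cuRAND device API, drawing isotropic ray directions; kernel and variable names are illustrative and this is not the Uintah GPU task code:

```cpp
#include <curand_kernel.h>

// Illustrative sketch: one cuRAND state per thread, used to sample an
// isotropic ray direction (cos(theta) uniform in [-1,1], phi uniform in [0,2*pi)).
__global__ void initRNG(curandState* states, unsigned long long seed, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < n) curand_init(seed, tid, 0, &states[tid]);
}

__global__ void sampleRayDirections(curandState* states, double3* dir, int n)
{
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid >= n) return;

  curandState rng = states[tid];                        // work on a local copy
  double mu  = 2.0 * curand_uniform_double(&rng) - 1.0; // cos(theta)
  double phi = 2.0 * 3.14159265358979323846 * curand_uniform_double(&rng);
  double s   = sqrt(1.0 - mu * mu);                     // sin(theta)

  dir[tid]    = make_double3(s * cos(phi), s * sin(phi), mu);
  states[tid] = rng;                                    // save state for reuse
}
```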
21
Create and schedule CPU and GPU tasks; this enables Uintah to "pre-fetch" GPU data.
Uintah infrastructure manages:
- Queues of CUDA stream and event handles
- Device memory allocation and transfers
Utilize all available CPU cores and GPUs.
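A minimal CUDA sketch of the kind of stream/event-based pre-fetch described above; buffer names are illustrative and this is not the Uintah scheduler code:

```cpp
#include <cuda_runtime.h>

// Illustrative sketch: copy a task's inputs to the device on a dedicated copy
// stream, record an event, and make the compute stream wait on that event.
// The host buffer should be pinned (cudaMallocHost) for a truly async copy.
void prefetchInputs(const double* hostIn, double* devIn, size_t bytes,
                    cudaStream_t copyStream, cudaStream_t computeStream)
{
  cudaEvent_t copied;
  cudaEventCreateWithFlags(&copied, cudaEventDisableTiming);

  cudaMemcpyAsync(devIn, hostIn, bytes, cudaMemcpyHostToDevice, copyStream);
  cudaEventRecord(copied, copyStream);

  // The compute stream waits only for this copy; host threads are free to
  // keep scheduling other CPU and GPU tasks in the meantime.
  cudaStreamWaitEvent(computeStream, copied, 0);
  // ... launch the task's kernel on computeStream here ...

  cudaEventDestroy(copied);   // safe: resources freed once the event completes
}
```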
22
Capability jobs run on:
- Keeneland Initial Delivery System (NICS): 1440 CPU cores and 360 GPUs simultaneously
- Jaguar GPU partition (OLCF): 15,360 CPU cores and 960 GPUs simultaneously
Development of a GPU RMCRT prototype is underway.
23
Head-to-head comparison of RMCRT with the discrete ordinates method (single level): accuracy versus computational cost.
2 levels: coarsening error for variable temperature and radiative properties.
Data Onion:
- Serial performance
- Accuracy versus number of levels, refinement ratio, dynamic/static ROI
Scalability studies.
24
Order of accuracy: 0.5 in the number of rays, 1 in grid resolution.
Accuracy issues related to coarsening data.
Cost = f(#rays, (grid cells)^1.4-1.5, communication, ...); doubling the grid resolution gives roughly a 20× increase in cost.
Good scalability characteristics.
Year 2: demonstration of a fully coupled problem using RMCRT within ARCHES; scalability demonstration.
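The ~20× figure follows directly from the stated cost scaling (a worked check, assuming cost grows with the number of cells to the 1.4-1.5 power):

\[
2^3 = 8\times \text{ cells} \;\Rightarrow\; \text{cost ratio} \approx 8^{1.4}\text{--}8^{1.5} \approx 18\text{--}23 \approx 20\times
\]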
25
Acknowledgements:
- DOE for funding the C-SAFE project from 1997-2012; DOE NETL; DOE NNSA; INCITE
- NSF for funding via SDCI and PetaApps
- Keeneland Computing Facility, supported by NSF under contract OCI-0910735
- Oak Ridge Leadership Computing Facility (DOE): Jaguar XK6 system (GPU partition)
http://www.uintah.utah.edu
26
Isotropic scattering added to the model.
Verification testing performed using an exact solution (Siegel, 1987).
Grid convergence analysis performed; the discrepancy diminishes with increased mesh refinement.
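A minimal sketch of how isotropic scattering enters a Monte Carlo ray tracer, under standard assumptions (exponentially distributed free path, new direction drawn uniformly over the sphere as in the cuRAND sketch earlier); this is not necessarily the exact form used in the code:

```cpp
#include <cmath>
#include <random>

// Illustrative sketch: distance a ray travels before its next isotropic
// scattering event, for scattering coefficient sigma_s [1/m].
double scatteringDistance(double sigma_s, std::mt19937& gen)
{
  std::uniform_real_distribution<double> U(0.0, 1.0);
  return -std::log(1.0 - U(gen)) / sigma_s;   // exponential free path
}
```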
27
Siegel, R., "Transient radiative cooling of a droplet-filled layer," ASME Journal of Heat Transfer, 109:159-164, 1987.
Benchmark case of Siegel (1987):
- Cube (1 m³)
- Uniform temperature of 64.7 K
- Mirror surfaces on all sides; black top and bottom walls
- Computed surface fluxes on the top and bottom walls
- 10 rays per cell (low)
28
Radiative flux vs. optical thickness: RMCRT (dots), exact solution (lines).
Siegel, R., "Transient radiative cooling of a droplet-filled layer," ASME Journal of Heat Transfer, 109:159-164, 1987.
29
Grid convergence of the L1 error norms, where the scattering coefficient is 8 m⁻¹ and the absorption coefficient is 2 m⁻¹.
30
IFRF burner simulation (production-size run):
- 1344 processors/cores
- Initial conditions taken from a previous run with DOM
- Domain: 1 m × 4.11 m × 1 m
- Resolution: 4.4 mm × 8.8 mm × 4.4 mm (24 million cells)
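The cell count follows from the stated domain and resolution:

\[
\frac{1\,\mathrm{m}}{4.4\,\mathrm{mm}} \times \frac{4.11\,\mathrm{m}}{8.8\,\mathrm{mm}} \times \frac{1\,\mathrm{m}}{4.4\,\mathrm{mm}}
\approx 227 \times 467 \times 227 \approx 2.4\times10^{7}\ \text{cells}
\]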