Slide 1: CyberShake Study 16.9 Discussion
Scott Callaghan, Southern California Earthquake Center
Slide 2: CyberShake background
- Physics-based probabilistic seismic hazard analysis
- Considers ~500,000 earthquakes for each site of interest
- Combines the shaking from each earthquake with the earthquake's probability to produce hazard curves
- Hazard curves from multiple sites are interpolated to produce a hazard map
[Figure: hazard curve for downtown LA, marking the level of shaking with a 2% chance of exceedance in 50 years]
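The combination step on this slide follows the standard PSHA recipe: sum each earthquake's annual rate times its chance of exceeding a given shaking level, then convert the rate to a probability over a time window. Below is a minimal Python sketch of that combination under a Poisson assumption; the input format, variable names, and function are illustrative, not CyberShake's actual code or data structures.

```python
import math

def hazard_curve(ruptures, shaking_levels, years=50.0):
    """Illustrative PSHA combination (not CyberShake's implementation).

    ruptures: list of (annual_rate, simulated_shaking_values) per earthquake.
    """
    curve = []
    for x in shaking_levels:
        # Annual rate of exceeding level x, summed over all earthquakes
        exceed_rate = 0.0
        for annual_rate, shaking_values in ruptures:
            # Fraction of this rupture's simulated ground motions exceeding x
            frac = sum(1 for s in shaking_values if s > x) / len(shaking_values)
            exceed_rate += annual_rate * frac
        # Poisson assumption: rate -> probability of exceedance in `years`
        prob = 1.0 - math.exp(-exceed_rate * years)
        curve.append((x, prob))
    return curve
```

The "2% in 50 years" value shown for downtown LA is then the shaking level at which this curve crosses a probability of 0.02.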
Slide 3: Past Study: Study 15.4 (April 16 – May 24, 2015)
- Performed hazard calculations for 336 locations using Titan and Blue Waters
- Used 12.9M SUs
- Generated about 415 TB of data on Titan
- Used pilot jobs for resource provisioning (see the sketch after this list)
  - A daemon running on the head node submitted pilot jobs when jobs were in the workflow queue
  - Extra overhead: idle time
  - Extra complexity: pilot job dependencies
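For context on the overhead noted above, here is a simplified sketch of the pilot-job pattern: a daemon on the head node watches the workflow queue and provisions pilot jobs in response. The function bodies, sizing policy, and polling interval are placeholders, not the actual Study 15.4 daemon or its batch-system commands.

```python
import time

def idle_workflow_jobs():
    """Placeholder: query the workflow (Condor) queue for jobs waiting on resources."""
    return 0

def submit_pilot_job(node_count, walltime_hours):
    """Placeholder: submit a batch pilot job that pulls workflow tasks once it starts."""
    print(f"submit pilot: {node_count} nodes for {walltime_hours} h")

def pilot_daemon(poll_seconds=60):
    while True:
        waiting = idle_workflow_jobs()
        if waiting > 0:
            # Pilots have to be sized and chained to match the workflow's
            # dependencies; mismatches show up as idle node-hours, which is the
            # overhead and complexity called out on this slide.
            submit_pilot_job(node_count=min(waiting, 200), walltime_hours=2)
        time.sleep(poll_seconds)
```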
Slide 4: Study 16.9: Science Goals
- Expand CyberShake to Central California
- 438 new CyberShake sites
- Generate hazard results using two different velocity (earth) models:
  - 1D 'baseline' model
  - 3D model constructed using tomography
Slide 5: Study 16.9: Computational Plan
- 438 sites x 2 velocity models = 876 runs
- Using both Titan and Blue Waters
- End-to-end workflows on both systems
  - No intermediate data transfer to Blue Waters
- Transfer of final data products back to USC for post-processing and archival storage
- Each run is a separate workflow
- Workflows will be assigned to systems dynamically (see the sketch after this list)
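One simple way to realize the dynamic assignment mentioned above is to send each new run to whichever system currently has the smaller outstanding backlog. The sketch below assumes that greedy policy and reuses the per-run estimate from the requirements slide later in the deck; the actual Study 16.9 scheduling logic is not specified on this slide.

```python
def assign_workflow(backlog_node_hours, per_run_node_hours=1940.0):
    """Pick the system with the smaller backlog and charge the new run to it.

    backlog_node_hours: dict like {"titan": 5200.0, "bluewaters": 4100.0}.
    The per-run cost of ~1,940 node-hours comes from the requirements slide.
    """
    target = min(backlog_node_hours, key=backlog_node_hours.get)
    backlog_node_hours[target] += per_run_node_hours
    return target

# 438 sites x 2 velocity models = 876 runs, split greedily across the two systems
queue = {"titan": 0.0, "bluewaters": 0.0}
assignments = [assign_workflow(queue) for _ in range(876)]
```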
Slide 6: Workflows on Titan
- Using a new workflow approach developed by the Pegasus-WMS group: rvgahp (reverse GAHP)
  - A server daemon runs on a Titan login node
  - It makes an SSH connection back to the workflow submission host and starts a proxy process on the submission host
  - When workflow jobs are submitted to the Condor-G queue, each job is given to the proxy, which forwards it to the server for execution
- Push paradigm, so it avoids the overhead and complexity of the pilot job approach (an illustrative sketch follows this list)
- Successfully tested on Titan
- Investigating Rhea for small processing jobs
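To make the push paradigm concrete, here is an illustrative sketch of the reverse-connection idea: the component on the login node dials out, and the proxy on the submission host pushes jobs over that already-open channel. This is not rvgahp's code or protocol; the hostname, port, message format, and the use of a plain socket (standing in for the SSH channel described above) are all made-up placeholders.

```python
import json, socket, subprocess

SUBMIT_HOST, PORT = "workflow-submit.example.org", 52000  # placeholders

def login_node_server():
    """Runs on the Titan login node: connect outward, then run whatever jobs arrive."""
    with socket.create_connection((SUBMIT_HOST, PORT)) as conn:
        for line in conn.makefile():
            job = json.loads(line)        # e.g. {"cmd": ["echo", "run task"]}
            subprocess.run(job["cmd"])    # hand the task to the local launcher

def proxy_push_one(job_cmd):
    """Runs on the submission host: accept the inbound connection once, then push a
    job handed over by the workflow queue instead of waiting for a pilot to pull it."""
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        conn.sendall((json.dumps({"cmd": job_cmd}) + "\n").encode())
```

The key design point, as stated on the slide, is that the connection is initiated from inside the HPC system, so jobs can be pushed to it without pre-provisioned pilot jobs sitting idle.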
Slide 7: Study 16.9: Technical Requirements
- Estimated runtime: 5 weeks, based on Study 15.4
- Compute time (a worked total follows this list)
  - Per site: 1,940 node-hrs
    - Strain Green Tensors: 400 node-hrs (2 GPU jobs x 200 nodes x 1 hour)
    - Seismogram synthesis: 1,440 node-hrs (1 CPU job x 240 nodes x 6 hours)
    - Other jobs: ~100 node-hrs
  - Total time: 15.9M SUs (~70% of allocation)
- Storage
  - Purged: 400 TB SGTs, TB output data products
  - Will clean up as we go
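The per-site and total compute figures above can be sanity-checked with simple arithmetic; note that the node-hour-to-SU conversion depends on each machine's charging policy and is not reproduced here.

```python
# Per-site cost, from the breakdown on this slide
sgt_node_hours       = 2 * 200 * 1   # 2 GPU jobs x 200 nodes x 1 hr  = 400
synthesis_node_hours = 1 * 240 * 6   # 1 CPU job x 240 nodes x 6 hrs  = 1440
other_node_hours     = 100

per_site = sgt_node_hours + synthesis_node_hours + other_node_hours   # 1940
total_node_hours = 876 * per_site    # 876 runs -> roughly 1.7M node-hours
print(per_site, total_node_hours)
```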
Slide 8: Topics for Discussion
- Proxy certificate issues
  - Change in OLCF certificate policy since the last study
  - No longer able to authenticate remotely to OLCF data transfer nodes, which is required to transfer files and make directories
- Quota increase
  - Plan to clean up as we go, but unsure what the high-water mark will be
  - Request a project quota increase to 300 TB
- Priority bump
  - Possible to get a boost in the queue for jobs?