PetaApps: Update on software engineering and performance
J. Dennis, M. Vertenstein, N. Hearn
Code Base Update

"Trunk+" means the ccsm4 release code plus the IE modifications.

- scripts – trunk+ (just in)
  - fixes a build problem inherent in alpha38+
- cice – trunk+
  - weighted space-filling curves; restarts on the tripole grid working
  - OpenMP threading capability
  - PIO for history and restarts (netCDF)
  - multi-frequency history capability (1 file per day)
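The "weighted space-filling curves" item refers to ordering grid blocks along a space-filling curve and cutting the curve into contiguous chunks of roughly equal work, so expensive blocks (e.g., ice-covered ones) are balanced across tasks. The CICE implementation is not shown here; this is a minimal sketch of the idea, assuming a Morton (Z-order) curve and made-up block weights:

```python
def morton(x, y, bits=4):
    # Interleave the bits of x and y to get the Z-order (Morton) index.
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def sfc_partition(weights, nparts):
    """Order blocks along the curve, then greedily cut the curve into
    nparts contiguous chunks of roughly equal total weight."""
    order = sorted(weights, key=lambda xy: morton(*xy))
    target = sum(weights.values()) / nparts
    parts, cur, acc = [], [], 0.0
    for xy in order:
        cur.append(xy)
        acc += weights[xy]
        if acc >= target and len(parts) < nparts - 1:
            parts.append(cur)
            cur, acc = [], 0.0
    parts.append(cur)
    return parts

# Hypothetical 4x4 block grid: "icy" blocks (y >= 2) cost 3x the work
# of open-ocean blocks.
weights = {(x, y): 3 if y >= 2 else 1 for x in range(4) for y in range(4)}
parts = sfc_partition(weights, 4)
```

Because the curve preserves spatial locality, each chunk stays geographically compact, which keeps halo-exchange neighbors nearby.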
Code Base Update (con't)

- pop – alpha38+
  - fix for the tripole grid problem; restarts are working
  - multi-frequency history capability (1 file per day)
  - TO DO: migrate the time-series capability from the trunk onto alpha38+
  - TO DO: migrate the PIO capability from the trunk onto alpha38+
  - TO DO: OpenMP threading capability is not functional (ORNL is working on this)
Code Base Update (con't)

- cam – alpha38+
  - TO DO: migrate cam to the cam trunk, which will then get PIO (almost done by Nathan)
- clm – alpha38+
- drv – alpha38+
  - interactive ensembles for atm are functional
  - TO DO: interactive ensembles for ice are in progress
  - TO DO: migrate the driver to the head of the trunk, where interactive ocean ensembles have been implemented
Interactive Ensemble Runs Update

- TO DO: finish validation of the 2-degree atm / 1-degree ocean interactive ensembles
  - POP convergence problem at year 150 for the low-res IE: reduce the POP time step
  - problem with branch/hybrid start for the IE from HRC03
- Demonstrated functionality with a 10-member atm ensemble at high resolution
- TO DO: execute the high-res interactive ensemble run
Status of TRAC allocation

| Category       | Budget | Consumed | Percentage remaining | Notes                        |
|----------------|--------|----------|----------------------|------------------------------|
| Development    | 1.98M  | 1.54M    | 22%                  |                              |
| HR-control run | 17.85M | 9.63M    | 46%                  | 101 of 200 years             |
| LRIE           | 3.45M  | 0.51M    | 85%                  | 155 years of validation run  |
| HRIE           | 11.72M | 0.0M     | 100%                 |                              |
| Total          | 35.0M  | 11.7M    | 66.6%                | Allocation is 6 months old   |
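The "percentage remaining" column is just (budget − consumed) / budget; a quick check of the table's arithmetic, with the figures transcribed from above:

```python
# TRAC allocation figures (millions of core-hours), from the table above.
budget = {"Development": 1.98, "HR-control run": 17.85, "LRIE": 3.45, "HRIE": 11.72}
consumed = {"Development": 1.54, "HR-control run": 9.63, "LRIE": 0.51, "HRIE": 0.0}

def pct_remaining(b, c):
    """Percentage of the budget not yet consumed, rounded to a whole percent."""
    return round(100 * (b - c) / b)

remaining = {k: pct_remaining(budget[k], consumed[k]) for k in budget}
```

The per-category percentages and the 66.6% total both reproduce, and the category budgets sum to the 35.0M total.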
Figure: Job placement of CCSM within the torus (courtesy of Nick Jones). Red = atmosphere & ice, white = ice only, blue = ocean, green = land.
Experiences on Kraken

- Somewhat behind on cycle usage
- Highly variable disk I/O performance (~18x)
  - using little-endian binary writes avoids performing 4K writes to the file system
- Job performance is dependent on node mapping
  - some jobs are ~20% slower [excludes I/O]
- Friendly-user access
  - invaluable for the development effort
  - can now run at < 1 GB per core
  - multi-frequency support in CICE and POP
  - hex-core improves CCSM performance
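The little-endian note is terse on the slide; the likely mechanism (an assumption, not stated in the original) is that Kraken's Opteron nodes are little-endian, so writing binary output in native byte order avoids byte-swapping every value before it reaches the file system. A small illustration of the swap cost with Python's `array` module:

```python
import array

# A model history record: 4096 doubles (32 KB).
record = array.array("d", (float(i) for i in range(4096)))

# Native (little-endian on x86/Opteron) output is a straight copy
# of the in-memory buffer.
native_bytes = record.tobytes()

# Big-endian output must first byte-swap every element, touching
# the whole 32 KB before anything can be written.
swapped = array.array("d", record)
swapped.byteswap()
big_endian_bytes = swapped.tobytes()
```

For multi-gigabyte restart files, skipping that per-element conversion on every write is a meaningful saving.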
Queue access on Kraken

| Date                   | Running          | Queued          |
|------------------------|------------------|-----------------|
| May                    | 172 hours [80%]  | 44 hours [20%]  |
| June                   | 317 hours [87%]  | 49 hours [13%]  |
| July                   | 471 hours [91%]  | 47 hours [9%]   |
| August (first 2 weeks) | 144 hours [36%]  | 253 hours [64%] |
| Total                  | 1104 hours [74%] | 393 hours [26%] |
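The monthly percentages and the totals can be recomputed directly from the hour counts; a quick consistency check, with the figures transcribed from the table above:

```python
# Job hours on Kraken: (running, queued) per month.
hours = {
    "May": (172, 44),
    "June": (317, 49),
    "July": (471, 47),
    "August (first 2 weeks)": (144, 253),
}

def pct_running(running, queued):
    """Share of total hours spent actually running, as a whole percent."""
    return round(100 * running / (running + queued))

total_running = sum(r for r, _ in hours.values())
total_queued = sum(q for _, q in hours.values())
```

The August flip, from ~90% of hours running to ~36%, is the queue degradation the upgrade section describes.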
Kraken Upgrade

Started August 1st, running through October 5th.

- OS upgrade
  - significant increase in job failures [1/3 of all jobs failed]
- Subset of nodes upgraded to hex-core
  - queue waits became excessive
  - friendly-user access
- Entire system down for the upgrade
  - access to Athena
  - friendly-user access
- What changed?
  - CPU: quad-core to hex-core [12 cores per node], with an improved memory controller
  - Memory: all nodes to 16 GB per node (1.3 GB per core)
Simulation cost [HRC03]

CCSM(1,1,1,1) @ f0.5_tx0.1v2 on 5848 cores

- Monthly output [historical perspective]
  - First time [ATLAS]: 140K per year [0.8 SYPD], early 2008
  - NERSC [XT4]: 100K per year [1.3 SYPD], fall 2008
  - Budgeted [XT4]: 89K per year [1.6 SYPD], early 2009
  - Actual [XT5]: 81K per year [1.8 SYPD], summer 2009
  - Measured [XT5]: 65K per year [2.1 SYPD], fall 2009 (upgraded hex-core system, small user group)
- Monthly + daily output
  - Measured: 91K per year [1.6 SYPD]
- Observations
  - time to complete the additional 100 years: 61 days wall-clock
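The per-year costs and the simulated-years-per-day (SYPD) rates above are tied together by a simple identity: core-hours per simulated year = cores × 24 / SYPD, and wall-clock days = simulated years / SYPD. A sketch that reproduces the HRC03 numbers to within rounding:

```python
def cost_per_sim_year(cores, sypd):
    """Core-hours charged per simulated year at a given throughput."""
    return cores * 24 / sypd

def wallclock_days(sim_years, sypd):
    """Ideal wall-clock days of compute, ignoring queue waits."""
    return sim_years / sypd

# HRC03 runs on 5848 cores:
actual = cost_per_sim_year(5848, 1.8)    # ~78K vs the 81K observed
hexcore = cost_per_sim_year(5848, 2.1)   # ~67K vs the 65K measured
days_left = wallclock_days(100, 1.6)     # ~62 vs the 61 days quoted
```

The small gaps against the slide's figures presumably reflect I/O variability and rounding, not the formula itself.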
Simulation cost (con't)

CCSM(10,1,1,1) @ f0.5_tx0.1v2 on 7434 cores

- Monthly + daily output
  - Budgeted: 234K per year
  - Measured: 120K per year [1.5 SYPD], on the Cray XT4
- Observations
  - significantly cheaper than budgeted!
  - implied start time: mid-January 2010 [41 days wall-clock]
ATM-IE performance on 7434 cores on the Cray XT4

- ATM on 480 cores per ensemble (10 members)
- A problem in CPL7 currently limits parallelism to 2000
- 1.5 SYPD, 120K per year
Simulation cost (con't)

CCSM(10,1,10,1) @ f0.5_tx0.1v2 on 6000 cores

- ICE-IE is still being tested/developed
- Monthly + daily output
  - Budgeted: 234K per year [0.8 SYPD]
- Observations
  - implied start time: December 1st, 2009 [79 days wall-clock]
Resource requirements: TRAC1

| Experiment      | Resolution  | Years, total (daily) | Archive (TB) | CPU hours (M) |
|-----------------|-------------|----------------------|--------------|---------------|
| CCSM(1,1,1,1)   | f0.5_t0.1v2 | 200 (10)             | 61.8         | 15.64         |
| CCSM(10,1,1,1)  | f0.5_t0.1v2 | 50 (10)              | 27.1         | 13.82         |
| CCSM(1,1,1,1)   | f0.5_gx1v5  | 200 (10)             | 5.5          |               |
| CCSM(10,1,1,1)  | f0.5_gx1v5  | 50 (10)              | 11.1         |               |
| CCSM(1,1,10,1)  | f0.5_gx1v5  | 50 (10)              | 2.2          |               |
| CCSM(10,1,10,1) | f0.5_gx1v5  | 50 (10)              | 6.7          |               |
| CCSM(6,1,1,1)   | f1.9_gx1v5  | 200                  | 2.7          |               |
| TOTAL           |             |                      | 117.1        |               |
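The flattened source left the column assignment of the single-number rows ambiguous; reading them as archive sizes is consistent with the quoted total, as a quick check shows:

```python
# TRAC1 archive volumes (TB), one entry per experiment row above.
archive_tb = [61.8, 27.1, 5.5, 11.1, 2.2, 6.7, 2.7]
total_tb = round(sum(archive_tb), 1)
```

The same pattern holds for the TRAC2 and PRAC tables, whose TOTAL lines also equal the sum of their archive columns.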
Resource requirements: TRAC2

| Experiment       | Resolution  | Years, total (daily) | Archive (TB) | CPU hours (M) |
|------------------|-------------|----------------------|--------------|---------------|
| CCSM(1,1,10,1)   | f0.5_t0.1v2 | 50 (10)              | 62.1         | 17.8          |
| CCSM(10,1,10,1)  | f0.5_t0.1v2 | 50 (10)              | 48.5         | 17.8          |
| CCSM(1,10,1,1)   | f0.5_gx1v5  | 50 (10)              | 6.3          |               |
| CCSM(10,10,1,1)  | f0.5_gx1v5  | 50 (10)              | 15.6         |               |
| CCSM(10,10,10,1) | f0.5_gx1v5  | 50 (10)              | 16.1         |               |
| TOTAL            |             |                      | 148.7        |               |

The ice IE experiment has been moved to the second year.
Resource requirements: PRAC

| Experiment       | Resolution   | Years, total (daily) | Archive (TB) | CPU hours (M) |
|------------------|--------------|----------------------|--------------|---------------|
| CCSM(1,10,1,1)   | f0.5_tx0.1v2 | 50 (10)              | 120.2        |               |
| CCSM(10,10,1,1)  | f0.5_tx0.1v2 | 50 (10)              | 129.5        |               |
| CCSM(10,10,10,1) | f0.5_tx0.1v2 | 50 (10)              | 173.9        |               |
| SP-CCSM(1,1,1,1) | f0.9_tx0.1v2 | 50 (10)              | 21.8         |               |
| TOTAL            |              |                      | 445.4        |               |