© Crown copyright Met Office Met Office Unified Model I/O Server Paul Selwood
I/O Server motivation
Some History…
- I/O has always been a problem for NWP, and more recently for climate
- ~2003 – application-level output buffering
- ~2008 – very simple, single-threaded I/O servers added for benchmarking
  - Intercepted low-level “open/write/close”
  - Single threaded
  - Some benefit, but limited
  - Did not address scaling issues (message numbers)
Old UM I/O – Restart Files
Old UM I/O – Diagnostics
Why I/O Server approach?
- Full parallel I/O difficult with our packing
- “Free” CPUs available
- “Spare” memory available
- Chance to re-work old infrastructure
- Our file format is neither GRIB nor netCDF
Diagnostic flexibility
- Variables (primary and derived)
- Output times
- Temporal processing (e.g. accumulations, extrema, means)
- Spatial processing (sub-domains, spatial means)
- Variable to unit mapping
- Basic output resolution is a 2D field
Key design decisions
- Parallelism over output streams
  - Output streams distributed over servers
- Server is threaded
  - “Listener” receives data & puts in queue
  - “Writer” processes queue including packing
  - Ensures asynchronous behaviour
- Shared FIFO queue
  - Preserves instruction order
- Metadata/Data split
  - Data initially stored on compute processes
  - Data of same type combined into large messages
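The listener/writer split with a shared FIFO queue can be illustrated with a small Python sketch. This is not the Unified Model implementation: `run_io_server`, `toy_pack` and the in-memory `sink` list are hypothetical names, a plain Python list stands in for messages arriving from compute processes, and "packing" is reduced to rounding values.

```python
import queue
import threading


def run_io_server(incoming, pack, sink):
    """Run a listener thread and a writer thread joined by one FIFO queue."""
    fifo = queue.Queue()   # shared FIFO: items leave in the order they arrived
    stop = object()        # sentinel marking the end of the input

    def listener():
        # Stand-in for receiving data from compute processes: enqueue only.
        for item in incoming:
            fifo.put(item)
        fifo.put(stop)

    def writer():
        # Drain the queue in order, pack each field, then "write" it.
        while True:
            item = fifo.get()
            if item is stop:
                break
            sink.append(pack(item))

    threads = [threading.Thread(target=listener),
               threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink


def toy_pack(field):
    # Hypothetical stand-in for packing: round values to one decimal place.
    name, values = field
    return name, [round(v, 1) for v in values]


written = run_io_server(
    [("temperature", [280.123, 281.456]), ("pressure", [1000.04, 999.96])],
    toy_pack,
    [],
)
```

Because the listener only enqueues, it can return to receiving straight away while the writer does the slower packing and output, and the single queue keeps the fields in arrival order.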
Parallelism in I/O Servers
- Multiple I/O streams in typical job
- I/O servers spread among nodes
- Can utilise more memory
- Will improve bandwidth to disk
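As a rough illustration of distributing output streams over servers, the sketch below assigns streams round-robin. The slides do not say how streams are actually mapped to servers, so the round-robin policy, the function name `assign_streams` and the stream names are all assumptions made for the example.

```python
from collections import Counter


def assign_streams(stream_ids, n_servers):
    """Map each output stream to one I/O server, round-robin (assumed policy)."""
    if n_servers < 1:
        raise ValueError("need at least one I/O server")
    return {sid: i % n_servers for i, sid in enumerate(stream_ids)}


# Hypothetical stream names for a job with one restart and four diagnostic streams.
streams = ["restart", "diag_a", "diag_b", "diag_c", "diag_d"]
mapping = assign_streams(streams, 2)

# Number of streams each server ends up handling.
load = Counter(mapping.values())
```

Each stream belongs to exactly one server here, which matches parallelism over streams rather than within a stream.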
Automatic post-processing
- Model can trigger automatic post-processing
- Requests dealt with by I/O Server
- FIFO queue ensures integrity of data
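One way to see why FIFO ordering protects data integrity: if a post-processing request travels through the same queue as the writes, it cannot be handled until every earlier write has been done. The sketch below is a toy model of that idea only; the request tuples and the name `process_requests` are hypothetical and not taken from the Unified Model.

```python
from collections import deque


def process_requests(requests):
    """Handle requests strictly in FIFO order.

    Returns (filename, fields_written_so_far) for each post-processing request.
    """
    pending = deque(requests)
    written = {}    # filename -> number of fields written so far
    released = []   # files handed to post-processing, with their field counts
    while pending:
        kind, filename = pending.popleft()
        if kind == "write":
            written[filename] = written.get(filename, 0) + 1
        elif kind == "postprocess":
            released.append((filename, written.get(filename, 0)))
        else:
            raise ValueError(f"unknown request kind: {kind}")
    return released


released = process_requests([
    ("write", "a"),
    ("write", "a"),
    ("postprocess", "a"),   # sees both earlier writes to "a"
    ("write", "b"),
])
```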
How data gets output
[Diagram; labels: Compute, I/O, Listener, Writer, Thread 0, Thread 1]
I/O Server development
- Initial version – synchronous data transmission
- Asynchronous diagnostic data
- Asynchronous restart data
- Amalgamated data
- Asynchronous metadata
- Load balancing
- Priority messages with I/O Server
Lots of diagnostic output
- Which processes are I/O servers
- “Stall” messages
- Memory log
- Timing log
- Full log of metadata / queue
- All really useful for tuning!
Lots of tuneable parameters…
- Number and spacing of I/O servers
- Memory for I/O servers
- Number of local data copies
- Number of fields to amalgamate
- Load balancing options
- Timing tunings
- Plus standard I/O tunings (write block size), etc.
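To make the "number of fields to amalgamate" tunable concrete, the sketch below batches consecutive fields of the same type into messages of bounded size. It is a guess at the general idea rather than the real scheme: `amalgamate`, `max_fields` and the (type, payload) tuples are hypothetical.

```python
from itertools import groupby


def amalgamate(fields, max_fields):
    """Combine consecutive fields of the same type into messages.

    Each message holds at most `max_fields` payloads; order is preserved.
    """
    if max_fields < 1:
        raise ValueError("max_fields must be at least 1")
    messages = []
    for ftype, group in groupby(fields, key=lambda f: f[0]):
        batch = []
        for _, payload in group:
            batch.append(payload)
            if len(batch) == max_fields:
                messages.append((ftype, batch))
                batch = []
        if batch:   # flush a final, partially filled message
            messages.append((ftype, batch))
    return messages


fields = [("real", "u1"), ("real", "u2"), ("real", "u3"), ("int", "mask")]
messages = amalgamate(fields, 2)
```

A larger `max_fields` means fewer, bigger messages, at the cost of holding more data before each send.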
Overloaded servers
I/O Servers keeping up!
MPI considerations
- Differing levels of MPI threading support
  - Best with MPI_THREAD_MULTIPLE
  - OK with MPI_THREAD_FUNNELED
- MPI tuning
  - Want metadata to go as quickly as possible
  - Want data transfer to be truly asynchronous
  - Don’t want to interfere with model comms (e.g. halo exchange)
  - Currently use 19 environment variables!
Deployment
- July 2011 – Operational global forecasts
- January 2012 – Operational LAM forecasts
- February 2012 – High resolution climate work
- Not currently used in:
  - Operational ensembles
  - Low resolution climate work
  - Most research work
Global Forecast Improvement
          QG 00/12    QG 06/18    QU
Time      777s        559s        257s
%age      19%         28%         27%
Total saving: over 21 node-hours per day
Impact on High Resolution Climate
- N512 resolution AMIP
- 59 GB restart dumps
- Modest diagnostics
- Cray XE6 with up to 9K cores
- All “in-run” output hidden
- Waits for final restart dump
- Most data buffered on client side
Current and Future Developments
- MPI Parallel I/O servers
  - Multiple I/O servers per stream
  - Gives more memory per stream on server
  - Reduced messaging rate per node
  - Parallel packing
  - Potential for parallel I/O
- Read ahead
  - Potential for boundary conditions / forcings
  - Some possibilities for initial condition
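The read-ahead idea can be sketched generically: a background thread fetches the next few records while the consumer works on the current one. This is only an illustration of the technique, not the planned Unified Model design; `read_ahead`, `read_record` and `depth` are hypothetical names, and error handling in the reader thread is left out.

```python
import queue
import threading


def read_ahead(read_record, n_records, depth=2):
    """Yield records in order while a background thread reads up to `depth` ahead."""
    buf = queue.Queue(maxsize=depth)   # bounded buffer limits memory use
    done = object()                    # sentinel marking the end of the input

    def reader():
        for i in range(n_records):
            buf.put(read_record(i))    # blocks while `depth` records are waiting
        buf.put(done)

    threading.Thread(target=reader, daemon=True).start()
    while True:
        record = buf.get()
        if record is done:
            return
        yield record


# Hypothetical reader: in practice this would fetch boundary data from disk.
records = list(read_ahead(lambda i: f"boundary_{i}", 4))
```

The bounded queue is the design choice that matters: it lets reads overlap with computation without letting the reader run arbitrarily far ahead of the consumer.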
Parallel I/O server improvement
[Figure; panels: Before, After]
Questions and answers