© Crown copyright Met Office Met Office Unified Model I/O Server Paul Selwood

© Crown copyright Met Office I/O Server motivation

© Crown copyright Met Office Some History…
I/O has always been a problem for NWP, and more recently for climate
~2003 – application-level output buffering
~2008 – very simple, single-threaded I/O servers added for benchmarking
  Intercepted low-level “open/write/close” calls
  Single threaded
  Some benefit, but limited
  Did not address scaling issues – message numbers

© Crown copyright Met Office Old UM I/O – Restart Files

© Crown copyright Met Office Old UM I/O - Diagnostics

© Crown copyright Met Office Why the I/O Server approach?
Fully parallel I/O is difficult with our packing
“Free” CPUs available
“Spare” memory available
Chance to re-work old infrastructure
Our file format is neither GRIB nor netCDF

© Crown copyright Met Office Diagnostic flexibility
Variables (primary and derived)
Output times
Temporal processing (e.g. accumulations, extrema, means; see the sketch below)
Spatial processing (sub-domains, spatial means)
Variable-to-unit mapping
Basic output resolution is a 2D field
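The temporal processing options above are standard reductions over model time steps between output times. A minimal sketch of the idea in C; this is not UM code, and the grid size and update loop are invented for illustration:

```c
#include <stdio.h>

/* Illustrative sketch of "temporal processing": a running accumulation
 * over a 2D field, reduced to a time mean at output time.
 * NX, NY and nsteps are assumptions, not UM parameters. */
#define NX 4
#define NY 3

int main(void) {
    double field[NY][NX], accum[NY][NX] = {{0}};
    int nsteps = 10;

    for (int t = 0; t < nsteps; t++) {
        /* stand-in for the model producing a new 2D field each step */
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++)
                field[j][i] = (double)(t + i + j);

        /* accumulation: one of the temporal processings listed above */
        for (int j = 0; j < NY; j++)
            for (int i = 0; i < NX; i++)
                accum[j][i] += field[j][i];
    }

    /* time mean at output time = accumulation / number of steps */
    printf("time mean at (0,0): %f\n", accum[0][0] / nsteps);
    return 0;
}
```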

© Crown copyright Met Office Key design decisions
Parallelism over output streams
  Output streams distributed over servers
Server is threaded (see the sketch below)
  “Listener” receives data & puts it in a queue
  “Writer” processes the queue, including packing
  Ensures asynchronous behaviour
Shared FIFO queue
  Preserves instruction order
Metadata/data split
  Data initially stored on compute processes
  Data of the same type combined into large messages
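A minimal sketch of the listener/writer split and shared FIFO queue described above, using POSIX threads in C. This is not the UM implementation (which is Fortran): the field payload, the queue length, and the pack-and-write stub are all assumptions for illustration.

```c
/* Listener/writer threads sharing a FIFO queue: the listener never
 * blocks on disk, and FIFO order means fields are packed and written
 * in exactly the order they arrived. */
#include <pthread.h>
#include <stdio.h>

#define QUEUE_LEN 8
#define N_FIELDS 20

typedef struct {          /* stand-in for a 2D diagnostic field */
    int id;
    double data[4];
} field_t;

static field_t queue[QUEUE_LEN];
static int head = 0, tail = 0, count = 0, done = 0;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void enqueue(field_t f) {
    pthread_mutex_lock(&mtx);
    while (count == QUEUE_LEN) pthread_cond_wait(&not_full, &mtx);
    queue[tail] = f; tail = (tail + 1) % QUEUE_LEN; count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&mtx);
}

static int dequeue(field_t *f) {
    pthread_mutex_lock(&mtx);
    while (count == 0 && !done) pthread_cond_wait(&not_empty, &mtx);
    if (count == 0 && done) { pthread_mutex_unlock(&mtx); return 0; }
    *f = queue[head]; head = (head + 1) % QUEUE_LEN; count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&mtx);
    return 1;
}

/* "Listener": in the real server this would receive MPI messages from
 * compute processes; here it just fabricates fields. */
static void *listener(void *arg) {
    (void)arg;
    for (int i = 0; i < N_FIELDS; i++) {
        field_t f = { .id = i, .data = {0} };
        enqueue(f);
    }
    pthread_mutex_lock(&mtx);
    done = 1;                        /* signal end of run */
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* "Writer": drains the queue, doing the expensive work (packing and
 * disk writes) off the listener's critical path. */
static void *writer(void *arg) {
    (void)arg;
    field_t f;
    while (dequeue(&f))
        printf("packed and wrote field %d\n", f.id); /* pack+write stub */
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, listener, NULL);
    pthread_create(&t1, NULL, writer, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
```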

© Crown copyright Met Office Parallelism in I/O Servers
Multiple I/O streams in a typical job
I/O servers spread among nodes
  Can utilise more memory
  Will improve bandwidth to disk
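A sketch of the two mappings this slide implies: output streams shared round-robin over I/O servers, and server ranks spaced by a stride so they land on different nodes. The mapping functions and the values used are illustrative assumptions, not the UM's actual placement scheme.

```c
#include <stdio.h>

/* Hypothetical round-robin stream-to-server mapping */
static int server_for_stream(int stream, int n_servers) {
    return stream % n_servers;   /* streams distributed over servers */
}

/* Hypothetical rank placement: spacing servers across nodes */
static int rank_of_server(int server, int spacing) {
    return server * spacing;     /* e.g. spacing = ranks per node */
}

int main(void) {
    int n_servers = 4, spacing = 32;   /* assumed values */
    for (int s = 0; s < 6; s++) {
        int srv = server_for_stream(s, n_servers);
        printf("stream %d -> server %d (MPI rank %d)\n",
               s, srv, rank_of_server(srv, spacing));
    }
    return 0;
}
```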

© Crown copyright Met Office Automatic post-processing
Model can trigger automatic post-processing
Requests dealt with by the I/O Server
FIFO queue ensures integrity of data

© Crown copyright Met Office How data gets output
(Diagram: compute processes send data to the I/O server, where Thread 0 runs the “Listener” and Thread 1 the “Writer”)

© Crown copyright Met Office I/O Server development
Initial version – synchronous data transmission
Asynchronous diagnostic data
Asynchronous restart data
Amalgamated data
Asynchronous metadata
Load balancing
Priority messages with the I/O Server

© Crown copyright Met Office Lots of diagnostic output
Which processes are I/O servers
“Stall” messages
Memory log
Timing log
Full log of metadata / queue
All really useful for tuning!

© Crown copyright Met Office Lots of tuneable parameters…
Number and spacing of I/O servers
Memory for I/O servers
Number of local data copies
Number of fields to amalgamate (see the sketch below)
Load balancing options
Timing tunings
Plus standard I/O tunings (write block size), etc.
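Why "number of fields to amalgamate" matters: combining many small fields into one message trades message count (the scaling issue noted on the history slide) for message size, so one latency is paid instead of many. A hedged MPI sketch; the field sizes, counts, and ranks involved are invented, not UM parameters.

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Illustrative amalgamation: N_FIELDS small fields are copied into one
 * contiguous buffer and sent as a single message. */
#define N_FIELDS 4
#define FIELD_LEN 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 0; }  /* needs two ranks */

    static double fields[N_FIELDS][FIELD_LEN];   /* zero-initialised */
    static double buffer[N_FIELDS * FIELD_LEN];

    if (rank == 0) {
        /* amalgamate: pack all fields into one contiguous buffer ... */
        for (int f = 0; f < N_FIELDS; f++)
            memcpy(&buffer[f * FIELD_LEN], fields[f],
                   FIELD_LEN * sizeof(double));
        /* ... and pay one message latency instead of N_FIELDS */
        MPI_Send(buffer, N_FIELDS * FIELD_LEN, MPI_DOUBLE,
                 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buffer, N_FIELDS * FIELD_LEN, MPI_DOUBLE,
                 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d fields in a single message\n", N_FIELDS);
    }
    MPI_Finalize();
    return 0;
}
```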

© Crown copyright Met Office Overloaded servers

© Crown copyright Met Office I/O Servers keeping up!

© Crown copyright Met Office MPI considerations
Differing levels of MPI threading support (see the sketch below)
  Best with MPI_THREAD_MULTIPLE
  OK with MPI_THREAD_FUNNELED
MPI tuning
  Want metadata to go as quickly as possible
  Want data transfer to be truly asynchronous
  Don’t want to interfere with model comms (e.g. halo exchange)
Currently use 19 environment variables!
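The threading-level negotiation goes through the standard MPI_Init_thread call: request MPI_THREAD_MULTIPLE, check what the library actually provides, and fall back to funneling all MPI traffic through one thread if necessary. A minimal sketch; the messages and fallback policy are illustrative, not UM start-up code.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* ask for the best level; the library reports what it can do */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided >= MPI_THREAD_MULTIPLE) {
        /* best case: listener and writer threads may both call MPI */
        printf("running with MPI_THREAD_MULTIPLE\n");
    } else if (provided >= MPI_THREAD_FUNNELED) {
        /* workable: route all MPI traffic through the main thread */
        printf("falling back to funneled MPI communication\n");
    } else {
        fprintf(stderr, "insufficient MPI threading support\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();
    return 0;
}
```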

© Crown copyright Met Office Deployment
July 2011 – operational global forecasts
January 2012 – operational LAM forecasts
February 2012 – high-resolution climate work
Not currently used in:
  operational ensembles
  low-resolution climate work
  most research work

© Crown copyright Met Office Global Forecast Improvement

            QG 00/12   QG 06/18   QU
Time        777 s      559 s      257 s
%age        19%        28%        27%

Total saving: over 21 node-hours per day

© Crown copyright Met Office Impact on High Resolution Climate
N512 resolution AMIP
59 GB restart dumps
Modest diagnostics
Cray XE6 with up to 9K cores
All “in-run” output hidden
  Waits for final restart dump
Most data buffered on the client side

© Crown copyright Met Office Current and Future Developments
MPI-parallel I/O servers
  Multiple I/O servers per stream
  Gives more memory per stream on the server
  Reduced messaging rate per node
  Parallel packing
  Potential for parallel I/O (see the sketch below)
Read ahead
  Potential for boundary conditions / forcings
  Some possibilities for initial conditions
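For the "potential for parallel I/O" item, a minimal sketch of what that could look like with standard MPI-IO: each I/O server rank writes a disjoint slice of one shared file via a collective call. The file name and slice size are assumptions; this is not part of the UM today.

```c
#include <mpi.h>

#define SLICE 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static double slice[SLICE];      /* this rank's share of a field */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "dump.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* collective write: each rank lands at its own disjoint offset */
    MPI_Offset offset = (MPI_Offset)rank * SLICE * sizeof(double);
    MPI_File_write_at_all(fh, offset, slice, SLICE, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```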

© Crown copyright Met Office Parallel I/O server improvement
(Before/after comparison plot)

© Crown copyright Met Office Questions and answers