PASC PASCHA Project: The next HPC step for the COSMO model

Presentation transcript:

Carlos Osuna and Oliver Fuhrer, MeteoSwiss
COSMO General Meeting 2017, Jerusalem

Overview (1/2)
- Platform for Advanced Scientific Computing (PASC); see http://www.pasc-ch.org/
- Principal Investigator: Prof. Torsten Höfler, https://spcl.inf.ethz.ch/
- Proposal: handed in 11.2016, written by Carlos Osuna and Oliver Fuhrer
- Partners: Prof. Christoph Schär, Christina Schnadt

Overview (2/2)
- Funding: 494’300 CHF (two positions for two years)
- Project duration: 1.7.2017 – 30.6.2020 (3 years)
- Project manager: Carlos Osuna (MeteoSchweiz)

Background
[Timeline figure: the HP2C project (circa 2012–2015) is followed by PASCHA (2017–2019), against a backdrop of growing architectural diversity: Xeon Phi, NVIDIA Tesla, vector processors.]

WP-A: Strong scalability
Improve the strong scalability of COSMO by introducing functional parallelism, optimizing computation/communication overlap (see the sketch below), and tuning the existing fine-grained parallelism (SIMD and SIMT), in order to improve the usability of the model for the main weather and climate use cases.
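Computation/communication overlap typically follows a well-known pattern. Below is a minimal, illustrative C++/MPI sketch (not PASCHA code; compute_interior and compute_boundary are hypothetical helpers): post non-blocking halo exchanges, update the interior points that need no halo data while messages are in flight, then finish the boundary once communication completes.

    #include <mpi.h>
    #include <vector>

    // Hypothetical helpers standing in for the real stencil kernels.
    void compute_interior(std::vector<double>& f, int nx, int ny);
    void compute_boundary(std::vector<double>& f,
                          const std::vector<double>& halo_l,
                          const std::vector<double>& halo_r, int nx, int ny);

    void timestep(std::vector<double>& f, int nx, int ny,
                  int left, int right, MPI_Comm comm) {
      std::vector<double> send_l(ny), send_r(ny), recv_l(ny), recv_r(ny);
      for (int j = 0; j < ny; ++j) {            // pack the boundary columns
        send_l[j] = f[1 * ny + j];
        send_r[j] = f[(nx - 2) * ny + j];
      }
      MPI_Request req[4];
      MPI_Irecv(recv_l.data(), ny, MPI_DOUBLE, left,  0, comm, &req[0]);
      MPI_Irecv(recv_r.data(), ny, MPI_DOUBLE, right, 1, comm, &req[1]);
      MPI_Isend(send_l.data(), ny, MPI_DOUBLE, left,  1, comm, &req[2]);
      MPI_Isend(send_r.data(), ny, MPI_DOUBLE, right, 0, comm, &req[3]);

      compute_interior(f, nx, ny);              // overlap: no halo data needed

      MPI_Waitall(4, req, MPI_STATUSES_IGNORE); // halos have arrived
      compute_boundary(f, recv_l, recv_r, nx, ny);
    }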

Scalability

Functional parallelism
- In the dynamical core: execute independent stencils in parallel
- Coarse-grained: e.g. execute the radiation scheme in parallel (see the sketch below)
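A hedged sketch of what coarse-grained functional parallelism could look like in C++ (function and type names are illustrative stand-ins, not actual COSMO interfaces): the radiation scheme runs on its own thread while the dynamical core advances the state, and its tendencies are applied once both are done.

    #include <future>

    struct State { double t = 0.0; /* prognostic fields elided */ };

    // Hypothetical stand-ins for the real COSMO components.
    State advance_dynamics(State s)  { s.t += 1.0; return s; }
    State compute_radiation(State s) { /* expensive */ return s; }
    void  apply_tendencies(State& s, const State& rad) { (void)rad; }

    State step(const State& s) {
      // Radiation runs asynchronously while the dycore advances the state.
      auto rad = std::async(std::launch::async, compute_radiation, s);
      State next = advance_dynamics(s);
      apply_tendencies(next, rad.get());  // join and apply radiative tendencies
      return next;
    }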

WP-B: Xeon Phi portability
Develop a Xeon Phi backend for the GridTools library and port the full COSMO model to Xeon Phi, in order to improve the portability and performance portability of the source code and to avoid vendor lock-in.
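For flavor, and explicitly not the GridTools backend API: the kind of code a Xeon Phi backend must ultimately emit for a simple 2D Laplacian combines OpenMP threading across the many cores with explicit SIMD over the innermost dimension for the wide 512-bit vector units.

    // Illustrative target code, not GridTools output.
    void laplacian(const double* in, double* out, int nx, int ny) {
      #pragma omp parallel for
      for (int i = 1; i < nx - 1; ++i) {
        #pragma omp simd
        for (int j = 1; j < ny - 1; ++j) {
          out[i * ny + j] = -4.0 * in[i * ny + j]
                          + in[(i + 1) * ny + j] + in[(i - 1) * ny + j]
                          + in[i * ny + j + 1]   + in[i * ny + j - 1];
        }
      }
    }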

WP-C: DSLs for performance portability
Replace the STELLA library with the GridTools library in order to reduce software maintenance costs. Develop a GridTools intermediate representation to enable high-level DSLs and to allow interoperability with other DSL efforts in the community.
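As a purely illustrative sketch (the actual GridTools intermediate representation is richer than this), such an IR essentially has to capture stages, their field accesses with offsets, and their ordering, so that different frontends can target it and the backend can legally reorder and fuse:

    #include <memory>
    #include <string>
    #include <vector>

    struct Expr;                              // field accesses with (i,j,k)
                                              // offsets, literals, binary ops
    struct Stage {
      std::string name;                       // e.g. "lap", "flx"
      std::vector<std::string> reads, writes; // accessed fields
      std::shared_ptr<Expr> body;             // the stencil expression
    };
    struct Stencil {
      std::vector<Stage> stages;              // DAG order; fusion merges stages
    };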

High-level DSLs
- Acceptance of STELLA / GridTools is low (C++, boilerplate, code safety, manual optimization, ...)
- A high-level DSL can leverage the power of GridTools while providing a more acceptable and safer frontend

Prototype high-level DSL
Example: gtclang, applied to the Coriolis force. No loops, no data structures, no halo-updates:

    function avg {
      offset off;
      storage in;
      avg = 0.5 * ( in(off) + in() );
    };

    function coriolis_force {
      storage fc, in;
      coriolis_force = fc() * in();
    };

    operator coriolis {
      storage u_tend, u, v_tend, v, fc;
      vertical_region ( k_start, k_end ) {
        u_tend += avg(j-1, coriolis_force(fc, avg(i+1, v)));
        v_tend -= avg(i-1, coriolis_force(fc, avg(j+1, u)));
      }
    };

WP-D: Performance model driven optimization
Use performance-model-driven optimization of the dynamical core in order to improve single-socket performance.
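To make "performance model driven" concrete: a roofline-style estimate compares a kernel's arithmetic intensity with the machine balance to predict whether it is memory or compute bound, and by how much. A back-of-the-envelope example with assumed, not measured, machine numbers:

    #include <algorithm>
    #include <cstdio>

    int main() {
      const double peak_flops = 2.6e12;   // assumed peak, flop/s
      const double peak_bw    = 450e9;    // assumed memory bandwidth, byte/s
      const double flops_pt   = 5.0;      // Laplacian: 4 adds + 1 mul per point
      const double bytes_pt   = 6 * 8.0;  // assume 6 doubles moved per point
      const double intensity  = flops_pt / bytes_pt;          // flop/byte
      const double attainable = std::min(peak_flops, intensity * peak_bw);
      std::printf("arithmetic intensity: %.3f flop/byte\n", intensity);
      std::printf("attainable: %.1f Gflop/s (memory bound: %s)\n",
                  attainable / 1e9,
                  intensity * peak_bw < peak_flops ? "yes" : "no");
    }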

Optimization options
[Dataflow figure, three variants of the lap/flx/fly/update stencil pipeline from in to out:]
- Individual operations: lap(in), flx(lap), fly(lap), update(flx,fly) run as separate passes
- Fuse for data locality: stages are merged so that intermediates stay on-chip
- Fuse more aggressively: further merging, trading redundant computation for locality (see the sketch below)
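A minimal sketch of the fusion trade-off on a simplified 1D lap-then-update pipeline (illustrative code, not STELLA/GridTools output): unfused, the intermediate field travels through memory; fused, it is recomputed on the fly and never leaves registers, trading redundant flops for data locality.

    #include <vector>

    // Unfused: two sweeps, one temporary field goes through memory.
    void diffusion_unfused(const std::vector<double>& in,
                           std::vector<double>& out) {
      const int n = (int)in.size();
      std::vector<double> lap(n);
      for (int i = 1; i < n - 1; ++i)
        lap[i] = in[i - 1] - 2.0 * in[i] + in[i + 1];
      for (int i = 2; i < n - 2; ++i)
        out[i] = in[i] - 0.1 * (lap[i - 1] - 2.0 * lap[i] + lap[i + 1]);
    }

    // Fused: one sweep, lap recomputed on the fly, no temporary traffic.
    void diffusion_fused(const std::vector<double>& in,
                         std::vector<double>& out) {
      auto lap = [&](int i) { return in[i - 1] - 2.0 * in[i] + in[i + 1]; };
      for (int i = 2; i < (int)in.size() - 2; ++i)
        out[i] = in[i] - 0.1 * (lap(i - 1) - 2.0 * lap(i) + lap(i + 1));
    }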

Conclusions
- The project ranges from straightforward porting (e.g. Xeon Phi) to more exploratory research (e.g. task parallelism, high-level DSLs)
- Interdisciplinary team of domain scientists, computational scientists, and computer scientists
- Focus on applications of the COSMO model in weather and climate production
- Work has just started. Stay tuned for first results!