Let Them Use Fortran Code generation and optimisation with PSyclone

Slides:



Advertisements
Similar presentations
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Advertisements

Computer Architecture Lecture 7 Compiler Considerations and Optimizations.
© Chinese University, CSE Dept. Software Engineering / Software Engineering Topic 1: Software Engineering: A Preview Your Name: ____________________.
Copyright Arshi Khan1 System Programming Instructor Arshi Khan.
© Crown copyright Met Office LFRic Coupling Requirements 3rd Workshop on Coupling Technologies for Earth System Models Steve Mullerworth April 22 nd 2015.
1 Addressing Critical Skills Shortages at the NWS Environmental Modeling Center S. Lord and EMC Staff OFCM Workshop 23 April 2009.
1 Cactus in a nutshell... n Cactus facilitates parallel code design, it enables platform independent computations and encourages collaborative code development.
Template This is a template to help, not constrain, you. Modify as appropriate. Move bullet points to additional slides as needed. Don’t cram onto a single.
SOFTWARE ENGINEERING. Objectives Have a basic understanding of the origins of Software development, in particular the problems faced in the Software Crisis.
System Programming Basics Cha#2 H.M.Bilal. Operating Systems An operating system is the software on a computer that manages the way different programs.
EU-Russia Call Dr. Panagiotis Tsarchopoulos Computing Systems ICT Programme European Commission.
Early Experience with Applications on POWER8 Mike Ashworth, Rob Allan, Rupert Ford, Xiaohu Guo, Mark Mawson, Jianping Meng, Andrew Porter STFC Hartree.
Supercomputing versus Big Data processing — What's the difference?
Computer System Structures
Sub-fields of computer science. Sub-fields of computer science.
INTRO. To I.T Razan N. AlShihabi
cs612/2002sp/projects/ CS612 Term Projects cs612/2002sp/projects/
Geoffrey Fox Panel Talk: February
COMP Compilers Lecture 1: Introduction
Chapter 1 The Systems Development Environment
Parallel Patterns.
Database Development (8 May 2017).
Chapter 1 The Systems Development Environment
Introduction Edited by Enas Naffar using the following textbooks: - A concise introduction to Software Engineering - Software Engineering for students-
Current Generation Hypervisor Type 1 Type 2.
Hardware and Software Hardware refers to the physical devices of the computer system e.g. monitor, keyboard, printer, RAM etc. Software is a set of programs,
Programming Language Hierarchy, Phases of a Java Program
CSCI-235 Micro-Computer Applications
Chapter 1 The Systems Development Environment
Parallel Programming By J. H. Wang May 2, 2017.
Introduction SOFTWARE ENGINEERING.
Difficulties in Expert System Development
Parallel Computing in the Multicore Era
PT Evaluation of the Dycore Parallel Phase (EDP2)
Software Maintenance.
ENIAC ENabling the Icon model on heterogeneous ArChitecture
LFRic: A new model for the Met Office
Chapter 1 The Systems Development Environment
Contact person: Mats Brorsson
Green Software Engineering Prof
Software What Is Software?
CSCI/CMPE 3334 Systems Programming
Many-core Software Development Platforms
Introduction Edited by Enas Naffar using the following textbooks: - A concise introduction to Software Engineering - Software Engineering for students-
Computer Science I CSC 135.
COMP Compilers Lecture 1: Introduction
Dycore Rewrite Tobias Gysi.
Performance Optimization for Embedded Software
Compiler Back End Panel
Chapter 2: System Structures
Summary Background Introduction in algorithms and applications
Compiler Back End Panel
GENERAL VIEW OF KRATOS MULTIPHYSICS
Starting Design: Logical Architecture and UML Package Diagrams
Analysis models and design models
Chapter 7 –Implementation Issues
COMP60611 Fundamentals of Parallel and Distributed Systems
CS-3013 Operating Systems Hugh C. Lauer
Lecture 2 The Art of Concurrency
Implementation support
Design Yaodong Bi.
Mapping DSP algorithms to a general purpose out-of-order processor
Re- engineeniering.
Chapter 1 The Systems Development Environment
Southern California Earthquake Center
Rohan Yadav and Charles Yuan (rohany) (chenhuiy)
GungHo! A new dynamical core for the Unified Model Nigel Wood, Dynamics Research, UK Met Office © Crown copyright Met Office.
Implementation support
LCPC02 Wei Du Renato Ferreira Gagan Agrawal
L. Glimcher, R. Jin, G. Agrawal Presented by: Leo Glimcher
Presentation transcript:

Let Them Use Fortran Code generation and optimisation with PSyclone Andrew Porter and Rupert Ford STFC Hartree Centre

Contents Background: the LFRic Project Portable Performance Separation of Concerns Domain-Specific Language PSyclone Transformations (fparser) Summary

Background: The LFRic Project Met Office project to develop a replacement for the Unified Model Named in honour of Lewis Fry Richardson (first numerical weather ‘prediction’) Using GungHo recommendations: Cubed-sphere mesh Mixed finite-element scheme Aims to be at least as ‘good’, scientifically as the UM Achieve good performance on current and future supercomputers

Hardware vs. Software Hardware is “soft” Software is “hard” The UM has ~1M lines of code Has a flat performance profile optimising it is very labour intensive Has outlasted many, many generations of supercomputer Porting it to a new architecture that may only be around for a few years is not feasible Hardware is “soft” Software is “hard”

Separation of Concerns Separate the Natural Science (meteorology) from the Computational Science (performance): Algorithm Kernel Parallel System Computational Science Natural Science "PSyKAl"

Domain-specific knowledge Finite element/volume/difference-specific Time-stepping Operations over a mesh Typically same operation at each element/volume/point Data parallel (typically independent operations) Nearest neighbour communications for stencils Global reduction(s) for convergence and/or conservation

Knowledge provided via kernel meta-data type, extends(kernel_type) :: compute_cu type(arg), dimension(3) :: meta_args = & (/ arg(WRITE, CU, POINTWISE), & ! CU arg(READ, CT, POINTWISE), & ! P arg(READ, CU, POINTWISE) & ! U /) integer :: ITERATES_OVER = INTERNAL_PTS integer :: index_offset = OFFSET_SW contains procedure, nopass :: code => compute_cu_code end type compute_cu

Given domain-specific knowledge and information about the Algorithm and Kernels, the Parallel System layer can be generated... ...by

Generating the Parallel System Layer with PSyclone A domain-specific compiler for embedded DSL(s) Finite Difference, Finite Element Fortran -> Fortran Supports distributed- and shared-memory parallelism A tool for use by HPC experts Hard to beat a human People* enjoy using tools (Linux c.f. Windows) Reduces programmer errors (both correctness and performance) in implementing parallel code Optimisations encoded as a ‘recipe’ rather than baked into the scientific source code Different recipes for different architectures *that is, RSEs

Example Algorithm Single time-step for NEMOLite2D (shallow-water, finite-difference):

Example Kernel

Generated PSy Layer PSy layer code is generated by PSyclone, e.g.:

Each invoke has a schedule GOSchedule[invoke='invoke_0', Constant loop bounds=True] Loop[type='outer',field_space='ct', it_space='internal_pts'] Loop[type='inner',field_space='ct', it_space='internal_pts'] KernCall continuity(ssha_t,sshn_t, sshn_u,sshn_v,hu,hv,un,vn, rdt,area_t) [mod_inline=False] Loop[type='outer',field_space='cu', it_space='internal_pts'] Loop[type='inner',field_space='cu', it_space='internal_pts'] KernCall momentum_u(ua,un,vn,hu,hv,ht,ssha_u, sshn_t,sshn_u,sshn_v,tmask, dx_u,dx_v,dx_t,dy_u,dy_t, area_u,gphiu) [mod_inline=False] ...

Schedules are manipulated using transformations...

Transformation Example

Transformed schedule GOSchedule[invoke='invoke_0',Constant loop bounds=True] Directive[OMP parallel] Directive[OMP do] Loop[type='outer',field_space='ct', it_space='internal_pts'] Loop[type='inner',field_space='ct', it_space='internal_pts'] KernCall continuity_code(ssha_t,sshn_t,sshn_u, ...,area_t)[mod_inline=False] Directive[OMP do] Loop[type='outer',field_space='cu', it_space='internal_pts'] Loop[type='inner',field_space='cu', it_space='internal_pts'] KernCall momentum_u_code(ua,un,vn,hu,hv, ...,area_u,gphiu)[mod_inline=False] ...

fparser PSyclone uses Fortran parser from the f2py project Development on that project has stalled Removed the parser from PSyclone and created standalone package “fparser” Like PSyclone, is available from github and pypi:

Summary Single source science code and performance portability are key for maintainable high performance software Separation of concerns + DSL is a potential way forward for current architectures and a way to prepare for future architectures PSyclone supports DSLs embedded in Fortran Finite Element (LFRic) Finite Difference (GOcean) Open development on github with use of Travis and Coveralls fparser under development – contributions welcomed!

Thank you for listening PSyclone: github.com/stfc/PSyclone fparser: github.com/stfc/fparser