Automatic LQCD Code Generation Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011 Saint-Hippolyte (France) Claude Tadonki.

Slides:



Advertisements
Similar presentations
Etter/Ingber Engineering Problem Solving with C Fundamental Concepts Chapter 4 Modular Programming with Functions.
Advertisements

Construction process lasts until coding and testing is completed consists of design and implementation reasons for this phase –analysis model is not sufficiently.
Accelerator-based Implementation of the Harris Algorithm International Conference on Image and Signal Processing 2012 (ICISP 2012) June 28-30, Agadir,
© 2009 IBM Corporation July, 2009 | PADTAD Chicago, Illinois A Proposal of Operation History Management System for Source-to-Source Optimization.
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Networking Problems in Cloud Computing Projects. 2 Kickass: Implementation PROJECT 1.
ISBN Chapter 3 Describing Syntax and Semantics.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 Universität Dortmund Hardware/Software Codesign.
© Janice Regan, CMPT 102, Sept CMPT 102 Introduction to Scientific Computer Programming The software development method algorithms.
Comp 205: Comparative Programming Languages Semantics of Imperative Programming Languages denotational semantics operational semantics logical semantics.
CS 201 Functions Debzani Deb.
Describing Syntax and Semantics
Guide To UNIX Using Linux Third Edition
Chapter 1 Program Design
Chapter 3 Planning Your Solution
PRE-PROGRAMMING PHASE
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Semi-Automatic Composition of Data Layout Transformations for Loop Vectorization Shixiong Xu, David Gregg University of Dublin, Trinity College
Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software.
AlgoTutor Tutorial (3) Program Pad J. Yoo, S. Yoo, C. Pettey, S. Seo, and Z. Dong MTSU Computer Science Department Making the transition from the algorithm.
CH07: Writing the Programs Does not teach you how to program, but point out some software engineering practices that you should should keep in mind as.
DCT 1123 PROBLEM SOLVING & ALGORITHMS INTRODUCTION TO PROGRAMMING.
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Shawlands Academy Higher Computing Software Development Unit.
GENERAL CONCEPTS OF OOPS INTRODUCTION With rapidly changing world and highly competitive and versatile nature of industry, the operations are becoming.
Simple Program Design Third Edition A Step-by-Step Approach
สาขาวิชาเทคโนโลยี สารสนเทศ คณะเทคโนโลยีสารสนเทศ และการสื่อสาร.
Simulating Quarks and Gluons with Quantum Chromodynamics February 10, CS635 Parallel Computer Architecture. Mahantesh Halappanavar.
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.
CS 395 Last Lecture Summary, Anti-summary, and Final Thoughts.
Claude Tadonki Mines ParisTech – CRI – Mathématiques et Systèmes Laboratoire de l’Accélérateur Linéaire/IN2P3/CNRS France 2nd.
AN EXTENDED OPENMP TARGETING ON THE HYBRID ARCHITECTURE OF SMP-CLUSTER Author : Y. Zhao 、 C. Hu 、 S. Wang 、 S. Zhang Source : Proceedings of the 2nd IASTED.
1 The Software Development Process  Systems analysis  Systems design  Implementation  Testing  Documentation  Evaluation  Maintenance.
Programming Concepts Chapter 3.
Introduction to Computer Programming Using C Session 23 - Review.
BE-SECBS FISA 2003 November 13th 2003 page 1 DSR/SAMS/BASP IRSN BE SECBS – IRSN assessment Context application of IRSN methodology to the reference case.
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
1 INTRODUCTION TO PROBLEM SOLVING AND PROGRAMMING.
IXA 1234 : C++ PROGRAMMING CHAPTER 1. PROGRAMMING LANGUAGE Programming language is a computer program that can solve certain problem / task Keyword: Computer.
Simulation is the process of studying the behavior of a real system by using a model that replicates the behavior of the system under different scenarios.
ADTs and C++ Classes Classes and Members Constructors The header file and the implementation file Classes and Parameters Operator Overloading.
C++ Basics C++ is a high-level, general purpose, object-oriented programming language.
1. 2 Preface In the time since the 1986 edition of this book, the world of compiler design has changed significantly 3.
The Software Development Process
Strengthening deflation implementation for large scale LQCD inversions Claude Tadonki Mines ParisTech / LAL-CNRS-IN2P3 Review Meeting / PetaQCD LAL / Paris-Sud.
Software Development Problem Analysis and Specification Design Implementation (Coding) Testing, Execution and Debugging Maintenance.
Chapter 1 Computers, Compilers, & Unix. Overview u Computer hardware u Unix u Computer Languages u Compilers.
 Programming - the process of creating computer programs.
Review of Parnas’ Criteria for Decomposing Systems into Modules Zheng Wang, Yuan Zhang Michigan State University 04/19/2002.
How Are Computers Programmed? CPS120: Introduction to Computer Science Lecture 5.
1 The Software Development Process ► Systems analysis ► Systems design ► Implementation ► Testing ► Documentation ► Evaluation ► Maintenance.
Aurora/PetaQCD/QPACE Metting Regensburg University, April 14-15, 2010.
C OMPUTATIONAL R ESEARCH D IVISION 1 Defining Software Requirements for Scientific Computing Phillip Colella Applied Numerical Algorithms Group Lawrence.
What Makes Device Driver Development Hard Synthesizing Device Drivers Roumen Kaiabachev and Walid Taha Department of Computer Science, Rice University.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
ANR Meeting / PetaQCD LAL / Paris-Sud University, May 10-11, 2010.
Chapter 10 Software quality. This chapter discusses n Some important properties we want our system to have, specifically correctness and maintainability.
Introduction to Problem Solving Programming is a problem solving activity. When you write a program, you are actually writing an instruction for the computer.
Multi-cellular paradigm The molecular level can support self- replication (and self- repair). But we also need cells that can be designed to fit the specific.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
Chapter 9: Value-Returning Functions
Software Testing.
Introduction to Visual Basic 2008 Programming
SOFTWARE DESIGN AND ARCHITECTURE
Architecture Components
User-Defined Functions
Algorithm An algorithm is a finite set of steps required to solve a problem. An algorithm must have following properties: Input: An algorithm must have.
Focus on Parallel Combinatorial Optimization
Presentation transcript:

Automatic LQCD Code Generation Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011 Saint-Hippolyte (France) Claude Tadonki Mines ParisTech – CRI (Centre de Recherche en Informatique) - Mathématiques et Systèmes LAL (Laboratoire Accélérateur Linéaire)/IN2P3/IN2P3 Joint work with D. Barthou, C. Eisenbeis, G. Grosdidier, M. Kruse, L. Morin, O. Pène, and K. Petrov, PetaQCD Project

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project The only systematic and rigorous method to solve this theory is Lattice QCD (LQCD), which can be numerically simulated on massively parallel supercomputers. Quantum Chromodynamics is the theory of strong interactions, whose ambition is to explain nuclei cohesion as well as neutron and proton structure, i.e. most of the visible matter in the Universe. Background LQCD simulations are based on the Monte Carlo paradigm. The main ingredient of the computation is the resolution of a linear system based on the Dirac matrix, which is an abstract representation of the (local) Dirac operator. The Dirac operator applied on a site x of the lattice can be expressed as follows: The Dirac operator involves a stencil computation, which applies on a large number of sites.

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project Large volume of data ( disk / memory / network ) Significant number of solvers iterations due to numerical intractability Redundant memory accesses coming from interleaving data dependencies Use of double precision because of accuracy need (hardware penalty) Misaligned data (inherent to specific data structures) Exacerbates cache misses (depending on cache size) Becomes a serious problem when consider accelarators Leads to « false sharing » with Shared-Memory paradigm (Posix, OpenMP) Padding is one solution but would dramatically increase memory requirement Memory/Computation compromise in data organization (e.g. gauge replication) Key Computation Issues

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project One application of the Dirac operator involves more than thousand floating point instructions, thus it is hard to plan low level optimization by hand. There are different variants of the Dirac operator and several way of expression the calculation depending on the data layout. There are different target architectures which can be considered. Thus, generating the code for each of them manually can be tedious and error-prone. One way to seek optimal implementation is to filter from an exhaustive search over all possible (or reasonable). Why an Automatic Code Generation System As the formulae could be frequently changed or adapted, a push-button system to get the corresponding code is certainly comfortable.

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project How the chain is implemented From an input file containing the formalism (formulae + algorithm), the QIRAL module generates a code prototype (called pseudo-code) QIRAL is composed with a set of rewriting rules and predefined datatypes. The input, even written in latex, has to follow some syntaxic guidelines Experienced user can modify the kernel of QIRAL in order for instance to enhance the rewriting rules basis or introduce new datatypes. The output of QIRAL (the pseudo-code) is clearly a C-like prototype, which therefore needs to be retreated in order to obtain a valid (C or C++) code.

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project The full working chain

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) The pseudo-code from Latex

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project The needs from the pseudo-code to a valid C code the variables need to be explicitly declared (QIRAL declares part of the main ones) some simplification still need to be done (id(C) x id(S) = id(C x S) and id(n)*u = u ) special statements need to be appropriately expanded (sum(d in {dx,dy,dz,dt} ) libraries calls are required for macroscopic operations (Ap[L].Ap[L]) ; (Ap[L].r[L]) specific routines should be called for special operations like (id(S) + gamma(d))*u) 4D indexation should be correctly handled/matched with its 1D correspondence some profiling and monitoring instructions should be inserted for user convenience I/O routines are required for user parameters and data files The last point is particularly important since it will determine the routines to be called (in standard C) or the method execute (C++ / ad-hoc polymorphism). data types need to be correctly captured for further semantic purposes

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project The C code from the pseudo-code The module is based on Lex and Yacc Its generated a valid C code The output is associated to an external library for special routines A C++ output is planned, will be associated with QDP++ or QUDA through a specific interface A web based interface is available

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project Running the code (mCR solver)

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) PetaQCD Project Concluding remarks and peserspecives Try to generate the final code in one step Insert pragmas (OpenMP, HMPP, …) for special targets or automatic parallelization Add a complexity evaluation module (useful for automatic code optimization) Build the global system including the searching mecanism to reach the optimal code Generalize the framework (LQCD should not remains the main target)

Automatic LQCD Code Generation Claude Tadonki Quatrièmes Rencontres de la Communauté Française de Compilation December 5-7, 2011, Saint-Hippolyte (France) THANKS FOR YOUR ATTENTION END