SciDAC Software Infrastructure for Lattice Gauge Theory
Richard C. Brower
Annual Progress Review, JLab, May 14, 2007
Code distribution: see

Software Committee: Rich Brower (chair), Carleton DeTar, Robert Edwards, Don Holmgren, Bob Mawhinney, Chip Watson, Ying Zhang
SciDAC-2 Minutes/Documents & Progress report

Major Participants in the SciDAC Project (* = Software Committee; participants funded in part by the SciDAC grant)
- Arizona: Doug Toussaint, Dru Renner
- BU: Rich Brower *, James Osborn, Mike Clark
- BNL: Chulwoo Jung, Enno Scholz, Efstratios Efstathiadis
- Columbia: Bob Mawhinney *
- DePaul: Massimo DiPierro
- FNAL: Don Holmgren *, Jim Simone, Jim Kowalkowski, Amitoj Singh
- IIT: Xian-He Sun
- Indiana: Steve Gottlieb, Subhasish Basak
- JLab: Chip Watson *, Robert Edwards *, Jie Chen, Balint Joo
- MIT: Andrew Pochinsky, Joy Khoriaty
- North Carolina: Rob Fowler, Ying Zhang *
- Utah: Carleton DeTar *, Ludmila Levkova
- Vanderbilt: Ted Bapty

QCD Software Infrastructure
Goal: Create a unified software environment that enables the US lattice community to achieve very high efficiency on diverse high-performance hardware.
Requirements:
1. Build on the 20-year investment in MILC/CPS/Chroma
2. Optimize critical kernels for peak performance
3. Minimize the software effort needed to port to new platforms and to create new applications

Solution for Lattice QCD
- (Perfect) load balancing: uniform periodic lattices and identical sublattices per processor.
- (Complete) latency hiding: overlap computation and communication.
- Data parallel: operations on small 3x3 complex matrices per link.
- Critical kernels (Dirac solver, HMC forces, etc.) account for up to ~90% of run time.
Lattice Dirac operator:
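The operator itself appeared on the original slide as an image that did not survive extraction. For reference, a standard Wilson form of the lattice Dirac operator (not necessarily the exact expression shown on the slide) is

    D_{x,x'}[U] = (m_0 + 4)\,\delta_{x,x'}
                  - \frac{1}{2}\sum_{\mu=1}^{4}\Big[ (1-\gamma_\mu)\, U_\mu(x)\, \delta_{x+\hat\mu,\,x'}
                                                   + (1+\gamma_\mu)\, U_\mu^\dagger(x-\hat\mu)\, \delta_{x-\hat\mu,\,x'} \Big]

where the U_mu(x) are the SU(3) link matrices (the "small 3x3 complex matrices per link" above) and the gamma_mu are Dirac matrices.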

SciDAC-1 QCD API (layered design)
- Level 3: Optimized Dirac operators and inverters (optimized for P4 and QCDOC)
- Level 2: QDP (QCD Data Parallel): lattice-wide operations, data shifts; exists in C/C++
- Level 1: QMP (QCD Message Passing) and QLA (QCD Linear Algebra), implemented over MPI, native QCDOC, and an M-VIA GigE mesh; QIO for binary/XML metadata files (ILDG collaboration)
A usage sketch of the Level 1 message-passing layer follows below.
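The sketch below shows, in rough outline, how the QMP layer is typically used for a nearest-neighbor halo exchange with communication overlapped against local computation. The function names follow the public QMP interface, but the buffer size, the 2x2x2x2 machine grid, and the overall structure are illustrative assumptions; check qmp.h for the exact signatures.

    /* Hedged sketch: nearest-neighbor halo exchange with QMP (Level 1).           */
    /* The buffer size and the 2x2x2x2 machine grid below are illustrative only.   */
    #include <qmp.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        QMP_thread_level_t provided;
        QMP_init_msg_passing(&argc, &argv, QMP_THREAD_SINGLE, &provided);

        int dims[4] = {2, 2, 2, 2};              /* assumed 4-d machine grid       */
        QMP_declare_logical_topology(dims, 4);   /* needed for "relative" sends    */

        size_t nbytes = 1024;                    /* face buffer size (assumption)  */
        void *sbuf = malloc(nbytes);
        void *rbuf = malloc(nbytes);
        QMP_msgmem_t smem = QMP_declare_msgmem(sbuf, nbytes);
        QMP_msgmem_t rmem = QMP_declare_msgmem(rbuf, nbytes);

        /* send forward and receive from behind along direction mu = 0 */
        QMP_msghandle_t sh = QMP_declare_send_relative(smem, 0, +1, 0);
        QMP_msghandle_t rh = QMP_declare_receive_relative(rmem, 0, -1, 0);

        QMP_start(rh);                           /* post the receive               */
        QMP_start(sh);                           /* start the send                 */
        /* ... do interior (communication-independent) computation here ...        */
        QMP_wait(sh);
        QMP_wait(rh);                            /* face data is now available     */

        QMP_free_msghandle(sh);  QMP_free_msghandle(rh);
        QMP_free_msgmem(smem);   QMP_free_msgmem(rmem);
        free(sbuf);  free(rbuf);

        QMP_finalize_msg_passing();
        return 0;
    }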

Data-Parallel QDP/C, C++ API
- Hides architecture and layout
- Operates on lattice fields across sites
- Linear algebra tailored for QCD
- Shifts and permutation maps across sites
- Reductions
- Subsets
- Entry/exit: attaches to existing codes

Example of a QDP++ Expression
- A typical term of the Dirac operator and the corresponding QDP/C++ code appeared on the slide (see the sketch below)
- Uses the Portable Expression Template Engine (PETE): temporaries are eliminated and expressions optimized
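The equation and the code line on this slide were images that did not survive extraction. The standard example of this kind in the QDP++ documentation is chi(x) = U_mu(x) psi(x + mu_hat); a minimal sketch of how such a term is written in QDP++ is given below (LatticeFermion, LatticeColorMatrix, multi1d, and shift are from the public QDP++ interface; the wrapping function is an illustrative assumption, not the slide's exact code).

    // Hedged sketch of a QDP++ data-parallel expression (not the slide's exact code).
    #include "qdp.h"
    using namespace QDP;

    void apply_hop(LatticeFermion& chi,
                   const multi1d<LatticeColorMatrix>& u,   // gauge links, one field per direction
                   const LatticeFermion& psi,
                   int mu)
    {
        // chi(x) = U_mu(x) * psi(x + mu_hat), evaluated on all lattice sites at once.
        // PETE fuses the shift and the multiply into a single loop with no temporaries.
        chi = u[mu] * shift(psi, FORWARD, mu);
    }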

SciDAC-2 QCD API (layered design; SciDAC-1 and SciDAC-2 components were shown in gold and blue on the original diagram)
- Level 4: Application codes (MILC / CPS / Chroma / "roll your own"); workflow and data-analysis tools; QCD Physics Toolbox (shared algorithms, building blocks, visualization, performance tools); uniform user environment (runtime, accounting, grid)
- Level 3: QOP (optimized, in asm): Dirac operator, inverters, forces, etc.
- Level 2: QDP (QCD Data Parallel): lattice-wide operations, data shifts
- Level 1: QMP (QCD Message Passing), QLA (QCD Linear Algebra), QIO (binary/XML files & ILDG), QMC (QCD Multi-core interface)
External collaborations: PERI, TOPS

Level 3 Domain Wall CG Inverter
[Performance plot: Ls = 16; the "4g" cluster is P4, 2.8 GHz, 800 MHz FSB; results shown for 32-node partitions.]

Asqtad Inverter on Kaon (FNAL)
[Performance plot: Mflop/s per core vs. L, comparing the MILC C code with SciDAC/QDP on L^4 sub-volumes for 16- and 64-core partitions of the Kaon cluster.]

Level 3 on QCDOC
[Performance plots: Asqtad CG, Mflop/s vs. L on L^4 sub-volumes; Asqtad RHMC kernels on 24^3 x 32 with 6^3 x 18 sub-volumes; domain-wall RHMC kernels on 32^3 x 64 x 16 with 4^3 x 8 x 16 sub-volumes.]

Building on SciDAC-1
- Common runtime environment
  1. File transfer, batch scripts, compile targets
  2. Practical three-laboratory "Metafacility"
- Fuller use of the API in application code
  1. Integrate QDP into MILC and QMP into CPS
  2. Universal use of QIO, file formats, QLA, etc.
  3. Level 3 interface standards
- Porting the API to INCITE platforms
  1. BG/L & BG/P: QMP and QLA using XLC and Perl scripts
  2. Cray XT4 & Opteron clusters

New SciDAC-2 Goals (see the SciDAC-2 kickoff workshop, Oct 27-28)
- Workflow and cluster reliability
  1. Automated campaigns to merge lattices and propagators and extract physics (FNAL & Illinois Institute of Technology)
  2. Cluster reliability (FNAL & Vanderbilt)
- Toolbox: shared algorithms and building blocks
  1. RHMC, eigenvector solvers, etc.
  2. Visualization and performance analysis (DePaul & PERC)
  3. Multi-scale algorithms (QCD/TOPS collaboration)
- Exploitation of multi-core
  1. Multi-core, not Hertz, is the new paradigm
  2. Plans for a QMC API (JLab & FNAL & PERC)

QMC: QCD Multi-Threading
- General evaluation: OpenMP vs. an explicit thread library (Chen)
  - An explicit thread library can do better than OpenMP, but OpenMP results are compiler-dependent
- Simple threading API: QMC
  - Based on the older smp_lib (Pochinsky)
  - Uses pthreads; investigate barrier-synchronization algorithms
- Evaluate threads for SSE-Dslash
- Consider a threaded version of QMP (Fowler and Porterfield at RENCI)
[Diagram: fork-join execution; serial code forks parallel work on sites 0..7, with idle threads, then finalize / thread join. A minimal sketch of this pattern follows below.]
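To make the fork-join picture concrete, here is a minimal pthreads sketch of the pattern in the diagram. It is illustrative only and is not the QMC API; NTHREADS, site_work, and the one-thread-per-site split are assumptions for the example.

    /* Hedged sketch: fork-join site parallelism with pthreads (not the QMC API). */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8                        /* "parallel sites 0..7" in the diagram */

    static void site_work(int site)           /* stand-in for a per-site kernel       */
    {
        printf("working on site %d\n", site);
    }

    static void *worker(void *arg)
    {
        int site = *(int *)arg;
        site_work(site);                      /* parallel section                     */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int site[NTHREADS];

        /* serial code ... then fork one worker per site */
        for (int i = 0; i < NTHREADS; i++) {
            site[i] = i;
            pthread_create(&tid[i], NULL, worker, &site[i]);
        }

        /* finalize / thread join: wait for all workers, then continue serially */
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);

        return 0;
    }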

Conclusions
- Progress has been made using a common QCD API and shared libraries for communication, linear algebra, I/O, optimized inverters, etc.
- But full implementation, optimization, documentation, and maintenance of shared code remains a continuing challenge.
- And there is much work to do to keep up with changing hardware and algorithms.
- Still, NEW users (young and old) with no prior lattice experience have initiated new lattice QCD research using the SciDAC software!
- The bottom line: PHYSICS is being well served.