SciDAC Software Infrastructure for Lattice Gauge Theory
Richard C. Brower & Robert Edwards
June 24, 2003


K. Wilson (1989 Capri): "One lesson is that lattice gauge theory could also require a 10^8 increase in computer power AND spectacular algorithmic advances before useful interactions with experiment..."

ab initio Chemistry
1. … = …
2. … flops → 10 Mflops
3. Gaussian Basis functions

vs

ab initio QCD
1. … = 2030?*
2. 10 Mflops → 1000 Tflops (a factor of 10^8, matching Wilson's estimate)
3. Clever Collective Variable?

* Hopefully sooner, but need $1/Mflops → $1/Gflops!

SciDAC = Scientific Discovery through Advanced Computing

QCD Infrastructure Project Funded (2005?)

HARDWARE:
– 10+ Tflops each at BNL, FNAL & JLab: BNL (2004), FNAL/JLab (…)

SOFTWARE:
– Enable US lattice physicists to use the BNL, FNAL & JLab hardware

PHYSICS:
– Provide crucial lattice "data" that now dominate some tests of the Standard Model
– Deeper understanding of field theory (and even string theory!)

Software Infrastructure

Goals: Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse multi-terascale hardware.

TASKS / LIBRARIES:
I. QCD Data Parallel API → QDP
II. Optimize Message Passing → QMP
III. Optimize QCD Linear Algebra → QLA
IV. I/O, Data Files and Data Grid → QIO
V. Optimized Physics Codes → CPS/MILC/Chroma/etc.
VI. Execution Environment → unify BNL/FNAL/JLab

Participants in Software Project (partial list) * Software Coordinating Committee

Lattice QCD – extremely uniform
– Periodic or very simple boundary conditions
– SPMD: identical sublattices per processor
– Lattice operator: the Dirac operator (a reference form is given below)
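For reference, one standard Wilson form of the lattice Dirac operator is written out below; this is the generic textbook expression (with hopping parameter kappa, gauge links U_mu, and Dirac matrices gamma_mu), not necessarily the exact discretization shown on the slide:

  M(x,y) = \delta_{x,y} - \kappa \sum_{\mu=1}^{4} \Big[ (1-\gamma_\mu)\, U_\mu(x)\, \delta_{x+\hat\mu,\,y} + (1+\gamma_\mu)\, U_\mu^\dagger(x-\hat\mu)\, \delta_{x-\hat\mu,\,y} \Big]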

SciDAC Software Structure

Level 3: Optimised Dirac Operators, Inverters (optimised for P4 and QCDOC)
Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts (exists in C/C++)
Level 1: QMP (QCD Message Passing), QLA (QCD Linear Algebra) (exists in C/C++, implemented over MPI, GM, QCDOC, gigE)
QIO: XML I/O, DIME

Overlapping communications and computations

C(x) = A(x) * shift(B, +mu):
– Send face forward non-blocking to neighboring node.
– Receive face into pre-allocated buffer.
– Meanwhile do A*B on interior sites.
– "Wait" on receive to perform A*B on the face (see the sketch below).

Lazy evaluation (C style):
  Shift(tmp, B, +mu);
  Mult(C, A, tmp);

Data layout over processors
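The same overlap pattern can be sketched with plain non-blocking MPI. This is only an illustration of the idea, not the QDP/QMP implementation; pack_face(), mult_interior() and mult_face() are hypothetical placeholders for the real face packing and per-site loops, and the 1-D indexing (interior sites first, face sites last) is purely illustrative.

  #include <mpi.h>

  static void pack_face(double* send_buf, const double* B, int n_face) {
      for (int i = 0; i < n_face; ++i)      // placeholder: copy the boundary slice of B
          send_buf[i] = B[i];
  }

  static void mult_interior(double* C, const double* A, const double* B, int n_interior) {
      for (int i = 0; i < n_interior; ++i)  // per-site product on the interior sites
          C[i] = A[i] * B[i];
  }

  static void mult_face(double* C, const double* A, const double* face,
                        int n_interior, int n_face) {
      for (int i = 0; i < n_face; ++i)      // per-site product on the face sites
          C[n_interior + i] = A[n_interior + i] * face[i];
  }

  void mult_shifted(double* C, const double* A, const double* B,
                    int n_interior, int n_face,
                    double* send_buf, double* recv_buf,
                    int src_rank, int dst_rank)   // which neighbor is which depends on the shift direction
  {
      MPI_Request send_req, recv_req;

      // Receive the incoming face into a pre-allocated buffer.
      MPI_Irecv(recv_buf, n_face, MPI_DOUBLE, src_rank, 0,
                MPI_COMM_WORLD, &recv_req);

      // Send our boundary face, non-blocking.
      pack_face(send_buf, B, n_face);
      MPI_Isend(send_buf, n_face, MPI_DOUBLE, dst_rank, 0,
                MPI_COMM_WORLD, &send_req);

      // Meanwhile do A*B on the interior sites.
      mult_interior(C, A, B, n_interior);

      // "Wait" on the receive, then do A*B on the face sites.
      MPI_Wait(&recv_req, MPI_STATUS_IGNORE);
      mult_face(C, A, recv_buf, n_interior, n_face);

      MPI_Wait(&send_req, MPI_STATUS_IGNORE);
  }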

QCDOC 1.5 Tflops (Fall 2003)

Performance of Dirac inverters (% of peak):
– clover Wilson (assembly): 2^4 → 56%, 4^4 → 59%
– naive staggered (MILC): 2^4 → 14%, 4^4 → 22% (4^4 assembly: 38%)
– Asqtad force (MILC): 2^4 → 3%, 4^4 → 7%
– Asqtad force (1st attempt at optimization): 4^4 → 16%

As determined by the ASIC simulator with native SciDAC message passing (QMP).

Cluster Performance: 2002

Future Software Goals

Critical needs:
– Ongoing optimization, testing and hardening of the SciDAC software infrastructure
– Leverage the SciDAC QCD infrastructure through collaborative efforts with the ILDG and SciParC projects
– Develop a mechanism to maintain distributed software libraries
– Foster international (Linux-style?) development of application code

Message Passing: QMP

Philosophy: a subset of MPI capability appropriate to QCD
– Broadcasts, global reductions, barrier
– Minimal copying / DMA where possible
– Channel-oriented / asynchronous communication
– Multidirection sends/receives for QCDOC
– Grid and switch model for node layout
– Implemented on GM and MPI; gigE nearly completed

QMP Simple Example

  char buf[size];
  QMP_msgmem_t mm;
  QMP_msghandle_t mh;

  mm = QMP_declare_msgmem(buf, size);
  mh = QMP_declare_send_relative(mm, +x);
  QMP_start(mh);
  // Do computations
  QMP_wait(mh);

The receiving node coordinates with the same steps, except:

  mh = QMP_declare_receive_from(mm, -x);

Multiple calls

Data Parallel QDP/C, C++ API
– Hides architecture and layout
– Operates on lattice fields across sites
– Linear algebra tailored for QCD
– Shifts and permutation maps across sites
– Reductions
– Subsets
– Entry/exit – attach to existing codes

Data-parallel Operations
– Unary and binary: -a; a-b; …
– Unary functions: adj(a), cos(a), sin(a), …
– Random numbers (platform independent): random(a), gaussian(a)
– Comparisons (booleans): a <= b, …
– Broadcasts: a = 0, …
– Reductions: sum(a), …
– Fields have various types (indices)

(A short usage sketch of these operations follows below.)
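A minimal usage sketch in QDP++-style C++, assuming the standard QDP++ field types and the function names listed above (not checked against a particular QDP++ release; variable names are illustrative):

  // Sketch of the data-parallel operations listed above, QDP++ style.
  #include "qdp.h"
  using namespace QDP;

  void data_parallel_example()
  {
      LatticeReal a, b, c, d, e;
      LatticeBoolean mask;

      random(a);          // platform-independent uniform random field
      gaussian(b);        // gaussian random field

      c = -a;             // unary
      d = a - b;          // binary
      e = cos(a);         // unary function, applied site by site

      mask = (a <= b);    // comparison: a boolean lattice field

      b = 0;              // broadcast a scalar to every site
      Double s = sum(a);  // global reduction over the whole lattice
  }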

QDP Expressions

Can create expressions. QDP/C++ code:

  multi1d<LatticeColorMatrix> u(Nd);
  LatticeDiracFermion b, c, d;
  int mu;

  c[even] = u[mu] * shift(b, mu) + 2 * d;

PETE: Portable Expression Template Engine. Temporaries eliminated, expressions optimised.

Linear Algebra Implementation
– Naive ops involve lattice temporaries – inefficient
– Eliminate lattice temps – PETE
– Allows further combining of operations (adj(x)*y)
– Overlap communications/computations

  // Lattice operation
  A = adj(B) + 2 * C;

  // Lattice temporaries
  t1 = 2 * C;
  t2 = adj(B);
  t3 = t2 + t1;
  A = t3;

  // Merged lattice loop
  for (i = ...; ...; ...) {
    A[i] = adj(B[i]) + 2 * C[i];
  }

(A generic illustration of the expression-template mechanism follows below.)
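PETE itself is not reproduced here; the following self-contained C++ sketch (all class and function names are illustrative, not PETE's) shows the expression-template mechanism that makes the merged loop possible: the right-hand side is built as a lightweight expression object and only evaluated, site by site, inside the assignment, so no field-sized temporaries are created.

  #include <cstddef>
  #include <type_traits>
  #include <vector>

  struct Expr {};  // tag base: "this type is an expression node"

  template <class L, class R>
  struct Add : Expr {
      const L& l; const R& r;
      Add(const L& l_, const R& r_) : l(l_), r(r_) {}
      double operator[](std::size_t i) const { return l[i] + r[i]; }
  };

  template <class E>
  struct Scale : Expr {
      double s; const E& e;
      Scale(double s_, const E& e_) : s(s_), e(e_) {}
      double operator[](std::size_t i) const { return s * e[i]; }
  };

  struct Field : Expr {
      std::vector<double> data;
      explicit Field(std::size_t n) : data(n, 0.0) {}
      double operator[](std::size_t i) const { return data[i]; }

      // Assignment from any expression: the single merged loop.
      template <class E,
                class = typename std::enable_if<
                    std::is_base_of<Expr, E>::value>::type>
      Field& operator=(const E& expr) {
          for (std::size_t i = 0; i < data.size(); ++i)
              data[i] = expr[i];
          return *this;
      }
  };

  // Operators restricted to expression nodes, so plain double
  // arithmetic inside the loops still uses the built-in operators.
  template <class L, class R,
            class = typename std::enable_if<
                std::is_base_of<Expr, L>::value &&
                std::is_base_of<Expr, R>::value>::type>
  Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>(l, r); }

  template <class E,
            class = typename std::enable_if<
                std::is_base_of<Expr, E>::value>::type>
  Scale<E> operator*(double s, const E& e) { return Scale<E>(s, e); }

  int main() {
      Field A(16), B(16), C(16);
      // Builds an expression object; the loop over sites runs once,
      // inside Field::operator=.  (adj() omitted: for this real-valued
      // sketch it would be the identity.)
      A = B + 2.0 * C;
      return 0;
  }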

Binary File/Interchange Formats
– Metadata – data describing data; e.g., physics parameters
– Use XML for metadata
– File formats:
  – Files are mixed mode – XML ascii + binary
  – Using DIME (similar to MIME) to package
  – Use BinX (Edinburgh) to describe the binary
– Replica-catalog web-archive repositories

Current Status
– Releases and documentation
– QMP, QDP/C,C++ in first release
– Performance improvements/testing underway
– Porting & development of physics codes over QDP ongoing
– QIO/XML support near completion
– Cluster/QCDOC run-time environment in development

SciDAC Prototype Clusters

Myrinet + Pentium 4:
– 48 duals, 2.0 GHz, FNAL (Spring 2002)
– 128 singles, 2.0 GHz, JLab (Summer 2002)
– 128 duals, 2.4 GHz, FNAL (Fall 2002)

Gigabit Ethernet mesh + Pentium 4:
– 256 (8x8x4) singles, 2.8 GHz, JLab (Summer 2003)
– FPGA NIC for GigE (Summer 2003)
– 256 FNAL (Fall 2003?)

Cast of Characters

Software Committee*: R. Brower (chair), C. DeTar, R. Edwards, D. Holmgren, R. Mawhinney, C. Mendes, C. Watson
Additional software: J. Chen, E. Gregory, J. Hetrick, B. Joó, C. Jung, J. Osborn, K. Petrov, A. Pochinsky, J. Simone et al.
(* Minutes and working documents: )
Executive Committee: R. Brower, N. Christ, M. Creutz, P. Mackenzie, J. Negele, C. Rebbi, S. Sharpe, R. Sugar (chair), C. Watson