UKQCD software for lattice QCD P.A. Boyle, R.D. Kenway and C.M. Maynard, UKQCD collaboration Presented by Dr Chris Maynard, Application Consultant, EPCC

Contents Motivation Brief introduction to QCD –What the science is –What we actually calculate –BYOC UKQCD software –Why use more than one code base –UKQCD contributions to the code bases Conclusions

What is stuff? Experiments such as the Large Hadron Collider (LHC) probe the structure of matter The LHC switches on 10 September 2008 We need a theory to interpret and understand phenomena and to predict new ones!

The structure of matter Quarks have –Mass → they feel gravity –Charge → they feel electromagnetism –Flavour → they feel the weak interaction –Colour → they feel the strong interaction The strong interaction binds quarks into hadrons –Protons and neutrons –Glued together by gluons

What are gluons? Gluons carry, or mediate, the strong interaction Quarks feel each other's presence by exchanging momentum via gluons (a virtual spring?) Similar to the photon in electromagnetism Unlike the photon, gluons carry the charge of the strong interaction (colour) – they couple to themselves Gluons are sticky!

Introducing QCD! 1972: D.J. Gross, F. Wilczek and H.D. Politzer –construct a quantum field theory of quarks and gluons based on a symmetry group for colour – Quantum Chromodynamics (QCD) –(and prove QCD is asymptotically free) –the QFT of the strong interaction 2004: they receive the Nobel Prize –Are we done? … um, not quite

Asymptotic freedom At short distances/high momenta the strength of the interaction is small The converse is infrared slavery: low momentum → strong coupling –Quarks are confined in hadrons –The proton mass is ~1 GeV The analytic tool (perturbation theory) only works when the interaction is small [Figure: a Feynman diagram]
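For orientation (a standard one-loop textbook result, not from the slides), the strong coupling runs as

```latex
\alpha_s(Q^2) \;=\; \frac{4\pi}{\beta_0 \,\ln\!\left(Q^2/\Lambda_{\mathrm{QCD}}^2\right)},
\qquad \beta_0 = 11 - \tfrac{2}{3}\,n_f ,
```

so it shrinks logarithmically at high momentum Q (asymptotic freedom) and grows without bound as Q² approaches Λ²_QCD, the scale at which perturbation theory breaks down.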

Quarks and gluons on a lattice Replace 4d space-time with a grid –Lattice spacing a Quark fields ψ(x) live on sites –a 4-component spinor (complex vector) on each site Gluon fields U_μ(x) live on links –a 3x3 complex matrix on each link The equations of motion are partial differential equations –Replace derivatives with finite differences –This gives a large (∝ volume), sparse matrix (the fermion matrix) –which contains the quark-gluon coupling
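Schematically (standard lattice notation, not taken verbatim from the talk), the gauge-covariant derivative becomes a nearest-neighbour difference, with the link matrices supplying the parallel transport:

```latex
D_\mu \psi(x) \;\longrightarrow\;
\frac{U_\mu(x)\,\psi(x+a\hat{\mu}) \;-\; U_\mu^\dagger(x-a\hat{\mu})\,\psi(x-a\hat{\mu})}{2a}.
```

This is why the fermion matrix couples only neighbouring sites, and hence is sparse.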

Numerical computation The infinite-dimensional path integral becomes a high-dimensional sum –Hybrid Monte Carlo (HMC) and variants –Update the quark and gluon fields –Invert the fermion matrix at each update –Krylov subspace methods – conjugate gradient Generate many paths – many gluon field configurations Compute (or measure) quantities of interest on each configuration –Invert the fermion matrix Average over all configurations
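A toy conjugate-gradient solver, to make the inner loop concrete (illustrative only: it uses a tiny dense SPD matrix, whereas real lattice codes never store the fermion matrix and instead apply it as a nearest-neighbour stencil):

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Dense matrix-vector product; in lattice QCD this is the sparse stencil M psi.
Vec matvec(const Mat& A, const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

// Inner product; on a parallel machine this is the global sum.
double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient for A x = b, A symmetric positive definite.
Vec cg(const Mat& A, const Vec& b, double tol = 1e-10) {
    Vec x(b.size(), 0.0), r = b, p = r;        // x0 = 0, so r0 = b
    double rr = dot(r, r);
    for (int k = 0; k < 1000 && std::sqrt(rr) > tol; ++k) {
        Vec Ap = matvec(A, p);
        double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];              // update solution
            r[i] -= alpha * Ap[i];             // update residual
        }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;
        for (std::size_t i = 0; i < p.size(); ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}

int main() {
    Mat A = {{4, 1}, {1, 3}};                  // small SPD test matrix
    Vec b = {1, 2};
    Vec x = cg(A, b);
    std::cout << x[0] << " " << x[1] << "\n";  // ~0.0909, ~0.6364
}
```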

Why lattice QCD is hard The fermion matrix is badly conditioned – the up and down quarks are nearly massless Statistical uncertainty –Require at least O(10^5) MC updates –with N ~ O(10^2) independent configurations, 1-5% statistical errors for basic quantities Systematic uncertainty –Several quark masses (chiral limit) –A bigger box is required for lighter masses –2 or more volumes and lattice spacings The cost scales badly with problem size –like a^-6 or a^-7, and at least 1/m_q

The bottom line Need to invert matrices which are Very large ~ O(10^7) Really badly conditioned – condition number κ ~ O(10^4) or more Many, many times –~O(10^6)
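A rough back-of-envelope using the standard CG bound (illustrative numbers only): the iteration count grows like the square root of the condition number, so

```latex
N_{\mathrm{iter}} \;\sim\; \sqrt{\kappa} \;\sim\; \sqrt{10^4} \;=\; 10^2,
\qquad
N_{\mathrm{solves}} \times N_{\mathrm{iter}} \;\sim\; 10^6 \times 10^2 \;=\; 10^8
```

matrix-vector products, each acting on a vector of dimension O(10^7).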

Quarks and gluons on a computer The interaction is local –Nearest-neighbour interactions Parallel decomposition: a sub-volume of the lattice on each processor Simple, regular communication pattern (halo swap)
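A minimal one-dimensional halo swap in MPI (a sketch under assumed periodic boundaries, not UKQCD code; the 4d case does one such exchange per direction):

```cpp
#include <mpi.h>
#include <vector>

// Each rank owns `local` sites plus one ghost ("halo") site at each end;
// the boundary sites are exchanged with the two neighbouring ranks.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int local = 16;                       // sites owned by this rank
    std::vector<double> f(local + 2, rank);     // f[0] and f[local+1] are halos
    int up   = (rank + 1) % size;               // periodic neighbours,
    int down = (rank - 1 + size) % size;        // like a torus network

    // Send my last real site up; receive my lower halo from below ...
    MPI_Sendrecv(&f[local], 1, MPI_DOUBLE, up,   0,
                 &f[0],     1, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    // ... and send my first real site down, receiving the upper halo from above.
    MPI_Sendrecv(&f[1],         1, MPI_DOUBLE, down, 1,
                 &f[local + 1], 1, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
}
```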

Anatomy of a calculation The gluon and quark fields are distributed –Not the fermion matrix –Exploit the local interactions (sparsity) when evaluating matrix-vector operations –The matrix-vector product is M(x,y;U)ψ(y) –The colour matrix U(x) is small and dense, and is not split across PEs Dominated by matrix-vector products and global sums in the iterative solver –Double-precision floating point Computation versus communication –A smaller local volume → a greater proportion of the data sits "near" the processor, but also → more communication Machine limiting factors –Memory bandwidth –Comms latency and bandwidth QCD is ideally suited to an MPP machine –Build yer own?
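The on-site dense piece looks like this (a sketch, not CPS or qdp++ internals): a 3x3 colour matrix acting on a colour vector, nine complex multiply-adds that stay entirely on one processor:

```cpp
#include <array>
#include <complex>
#include <iostream>

using Cplx     = std::complex<double>;
using ColorVec = std::array<Cplx, 3>;                 // one spin component
using ColorMat = std::array<std::array<Cplx, 3>, 3>;  // a link matrix U(x)

// Dense 3x3 colour-matrix times colour-vector: the innermost kernel of M psi.
ColorVec mult(const ColorMat& U, const ColorVec& v) {
    ColorVec w{};                                     // zero-initialised
    for (int a = 0; a < 3; ++a)
        for (int b = 0; b < 3; ++b)
            w[a] += U[a][b] * v[b];                   // 9 complex FMAs
    return w;
}

int main() {
    ColorMat U{};                                     // use the unit matrix here
    for (int a = 0; a < 3; ++a) U[a][a] = 1.0;
    ColorVec v{Cplx(1, 0), Cplx(0, 1), Cplx(2, 0)};
    ColorVec w = mult(U, v);                          // w == v for the unit matrix
    std::cout << w[2] << "\n";                        // prints (2,0)
}
```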

QCD-on-a-chip (14K ASIC) An ASIC built from the IBM technology library PowerPC 440 embedded CPU core 64-bit FPU – one FMA per cycle 4MB of fast embedded DRAM On-chip memory and Ethernet controllers Custom design –High-speed serial links (DMA) –Prefetching EDRAM controller –Bootable Ethernet –JTAG interface 400 MHz → peak of 0.8 Gflop/s The network is a 6d torus with nearest-neighbour links

QCDOC performance Saturates single-link bandwidth even for small packet sizes → low latency → good for small local volumes [Scaling plot: fixed global volume vs. local volume, up to 1K PEs] Super-linear scaling as the data goes "on-chip"; linear thereafter

UKQCD collaboration 8 UK universities –Plymouth joined in 2007 Before the QCDOC era (up to 2002) –Consensus on the form of the calculation –Collaboration-owned FORTRAN code –Assembler kernels for performance on the Cray T3D/T3E The QCDOC era –Several (3) different calculations –Each sub-group collaborates internationally –Two open-source C++ code bases – CPS and Chroma –Assembler kernels for performance

SciDAC A US DoE programme –Funds all US groups –Hardware and software –USQCD code development –Common code environment UKQCD actively collaborates with USQCD –Sub-project by sub-project The USQCD and UKQCD organisations are funding-based –Lateral collaboration is based on science! –Collaborate on software module development

CPS before QCDOC Developed by Columbia University (CU) for the QCDSP machine –the ancestor of QCDOC –Originally not ANSI C++ code –Many QCDSP-specific features –Not readily portable –Belongs to the CU developers UKQCD chose this code base –Building your own machine is a big risk –The CPS code base was the most likely to run on QCDOC from day 1 –It has the required functionality An EPCC project ANSI-fied the code –The code then ran correctly (if slowly) everywhere else

UKQCD contributions to CPS An assembler version of the key kernel –P.A. Boyle, via the BAGEL assembler generator (see later) UKQCD develops the new Rational Hybrid Monte Carlo (RHMC) algorithm –Implemented and tested in CPS (Clark and Kennedy) –The new algorithm has many parameters –Tuning and testing is a non-trivial task A CU + BNL + RBRC (RBC) + UKQCD new physics project –2+1 flavour DWF –degenerate up and down quarks, plus the strange quark UKQCD contributes to the AsqTad 2+1 flavour project –with other contributors in the USA (MILC)

HMC evolves 2 degenerate flavours (M is the fermion matrix) –Quark fields are anti-commuting Grassmann variables Take a square root to simulate one flavour Approximate the square root by a rational function Roots and poles are calculated with a multi-shift solver → RHMC The terms with the largest contribution to the fermion force –Change the MC update the most –Cost the least to compute Change the CG tolerance –Loosen the CG for the terms which contribute least –Can reduce the CG count 2x Keep the algorithm exact with a Metropolis accept/reject step
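Schematically (the standard RHMC construction; a sketch, not the CPS implementation): the inverse square root is replaced by an optimal rational approximation in partial-fraction form,

```latex
\left(M^\dagger M\right)^{-1/2} \;\approx\; r\!\left(M^\dagger M\right)
\;=\; \alpha_0 \;+\; \sum_{k=1}^{n} \alpha_k \left(M^\dagger M + \beta_k\right)^{-1},
```

and the multi-shift solver produces all n shifted inverses in a single Krylov iteration, at essentially the cost of the smallest shift.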

Implementing multi-timescale RHMC The RHMC n-th root can be used to implement algorithmic tricks Multiple pseudofermions are better Mass preconditioning Multiple timescales –Gluon, triple-strange, light –Allows a larger overall step size with good acceptance Higher-order integration schemes The RHMC algorithm is 5-10 times faster –Binaries frozen since March 2006
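A sketch of the splitting assumed here (Sexton-Weingarten style nesting): write the action as

```latex
S \;=\; S_{\mathrm{gauge}} \;+\; S_{\mathrm{strange}} \;+\; S_{\mathrm{light}},
```

and integrate each term with its own step size, so the cheap, rapidly varying gauge force is evaluated most often and the expensive light-quark force least often.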

CPS: good and bad CPS is written around the target (QCDOC) hardware The code base runs (correctly) on the target hardware –This helps reduce risk when building your own machine –It includes much of the requisite functionality Adopting CPS allowed UKQCD to focus on its strength –Very successful algorithmic development –Based on direct collaboration with RBC Still need to do measurements –Invert the fermion matrix (quark propagators) on the gluon configurations –Do measurements on different architectures

Chroma/qdp++ An open-source C++ code base –Used by many different groups world-wide Multi-platform by design Highly modular, layered design –QMP: a wrapper around a message-passing library, e.g. MPI –QDP++: builds lattice-valued physics data objects and manipulation methods –Hides the message-passing layer –Allows "under-the-hood" optimisations by expert developers –Includes IO –Chroma: the physics library –Rich physics functionality UKQCD has historical links with the main developers

qdp++ :: plaquette example Lattice-valued data objects and manipulation methods:

```cpp
multi1d<LatticeColorMatrix> u(Nd);  // gauge field: one link matrix per direction
Double w_plaq = zero;               // plaquette accumulator

for (int mu = 1; mu < Nd; ++mu) {
  for (int nu = 0; nu < mu; ++nu) {
    // parallel-transport the links around the (mu,nu) plaquette
    LatticeColorMatrix tmp_0 = shift(u[nu], FORWARD, mu) * adj(shift(u[mu], FORWARD, nu));
    LatticeColorMatrix tmp_1 = tmp_0 * adj(u[nu]);
    Double tmp = sum(real(trace(u[mu] * tmp_1)));  // trace, then global sum over sites
    w_plaq += tmp;
  }
}
```
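For reference, the quantity the loop accumulates is the plaquette, summed over sites and over the Nd(Nd-1)/2 planes (mu, nu):

```latex
\sum_{x}\;\mathrm{Re}\,\mathrm{Tr}\!\left[\,U_\mu(x)\,U_\nu(x+\hat{\mu})\,
U_\mu^\dagger(x+\hat{\nu})\,U_\nu^\dagger(x)\,\right],
```

with one code statement per group of factors.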

qdp++ :: Abstraction Data objects are lattice-valued –No site index –No explicit sum over indices Linear algebra is encoded –The code knows how to multiply 3x3 matrices together This has two consequences: An expert HPC developer can modify the implementation –Optimisation, parallelism, architecture features –The interface remains the same The application developer (a physicist) writes code which looks like maths!
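A toy version of the underlying technique, expression templates (purely illustrative; qdp++ builds on a much more elaborate form of this idea): operator+ returns a lightweight expression node instead of a temporary field, so the single loop over sites runs once, inside the assignment.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// CRTP base: anything deriving from Expr<E> can appear in an expression.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// A deferred addition: evaluates element-wise only when indexed.
template <class L, class R>
struct AddExpr : Expr<AddExpr<L, R>> {
    const L& l; const R& r;
    AddExpr(const L& l, const R& r) : l(l), r(r) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Field : Expr<Field> {
    std::vector<double> v;
    Field(std::size_t n, double x = 0.0) : v(n, x) {}
    double operator[](std::size_t i) const { return v[i]; }
    template <class E>
    Field& operator=(const Expr<E>& e) {      // the one fused loop over sites
        for (std::size_t i = 0; i < v.size(); ++i) v[i] = e.self()[i];
        return *this;
    }
};

template <class L, class R>
AddExpr<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return AddExpr<L, R>(l.self(), r.self());
}

int main() {
    Field a(8, 1.0), b(8, 2.0), c(8, 3.0), d(8);
    d = a + b + c;                            // no temporary fields allocated
    std::cout << d[0] << "\n";                // prints 6
}
```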

qdp++ :: Code like maths The plaquette code is a line-for-line transcription of the formula: shift() parallel-transports a field from the neighbouring site, adj() is the Hermitian conjugate, and trace, real and sum map directly onto the mathematics.

Chroma: potential downsides At the time of QCDOC development it didn't have RHMC functionality Heavy use of C++ templates can defeat some compilers –Stuck with the GNU compilers –The code is very advanced C++, not easy for beginners The main program is driven by XML input files –All objects are created on the fly –This requires a lot of functions to be registered –QCDOC has small memory (especially for .text) Chroma fails to compile on QCDOC –It runs out of .text segment –The physics library compiles OK

UKhadron An old-style main program –Calls the qdp++ and Chroma libraries –Harnesses the power of qdp++ –Focused on UKQCD physics requirements –Most of the measurement code for the DWF project –Iterative solvers Pros –Runs on QCDOC and everywhere else –Control over the code – a small group of developers –Can build integrated analysis code on top of qdp++ Cons –Compiling can be a headache! –UKhadron requires specific versions of qdp++/Chroma –which in turn require specific versions of the GNU compilers and libxml2

BAGEL An assembler generator written by Peter Boyle Composed of two parts –a library against which one can program a generic RISC assembler kernel –a set of programs that use the library to produce key QCD and linear algebra operations The generator is retargetable; key targets are ppc440, bgl and powerIII Allows kernels to run at up to 50% of peak on the target architecture

Modular build The stack: QMP, libxml2, BAGEL library, BAGEL apps (bagel qdp, bagel wilson dslash), qdp++, Chroma, UKhadron Both a blessing and a curse –Allows modular, independent code development –Plug in performance code –Highly portable performance –Module-version and compiler-version dependences can be a problem Other kernels can be plugged in, e.g. the SSE Wilson Dslash

Future The fastest machines are now the BlueGene/P –multicore and the Cray XT4/BlackWidow –a multicore/vector machine Multi-threading in qdp++: mixed-mode code –Shared memory intra-node –Message passing inter-node PGAS languages? –UPC/Co-Array Fortran/Chapel/Fortress/X10? Hardware designed for QCD (BlueGene/Q) –Performance kernel libraries
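A minimal mixed-mode sketch (generic MPI + OpenMP, not qdp++ code): threads share the node's sub-lattice, and only node-level results are message-passed:

```cpp
#include <mpi.h>
#include <vector>

// Mixed mode: MPI ranks between nodes, OpenMP threads within a node.
int main(int argc, char** argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    std::vector<double> site(1 << 20, 1.0);   // this rank's sub-lattice
    double local = 0.0;

    #pragma omp parallel for reduction(+:local)   // intra-node: shared memory
    for (long i = 0; i < (long)site.size(); ++i)
        local += site[i] * site[i];

    double global = 0.0;                      // inter-node: message passing
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
}
```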

Physics CPS on QCDOC → gluon configurations UKhadron on QCDOC + BlueGene/L + HECToR → correlation functions Implemented "twisted BCs" in UKhadron → a new calculation The world's best calculation of the charge radius of the pion Can determine CKM matrix elements –Tests of the Standard Model at the LHC –P.A. Boyle et al., JHEP07(2008)112 [Plot: lattice QCD data (twisted BCs) vs. experimental data]

Conclusions QCD is a very complex problem The software is performance-critical –and very complicated UKQCD operates in a complex and changing environment –collaborative, internally and externally –and a changing hardware regime A complex and evolving software strategy –allows maximum flexibility –and gets the science done!