Lattice QCD and the SciDAC-2 LQCD Computing Project
Lattice QCD Workflow Workshop, Fermilab, December 18, 2006
Don Holmgren

Outline
Lattice QCD Computing
– Introduction
– Characteristics
– Machines
– Job Types and Requirements
SciDAC-2 LQCD Computing Project
– What and Who
– Subprojects
– Workflow

What is QCD?
Quantum ChromoDynamics is the theory of the strong force
– the strong force describes the binding of quarks by gluons to make particles such as neutrons and protons
The strong force is one of the four fundamental forces in the Standard Model of Physics; the others are:
– Gravity
– Electromagnetism
– The Weak force

What is Lattice QCD?
Lattice QCD is the numerical simulation of QCD
– The QCD action, which expresses the strong interaction between quarks mediated by gluons, contains the fermion term S_F = \sum_x \bar\psi(x)\,(\slashed{D} + m)\,\psi(x), where the Dirac operator ("dslash") is given by \slashed{D} = \gamma^\mu (\partial_\mu + i g A_\mu)
– Lattice QCD uses discretized space and time
– A very simple discretized form of the Dirac operator is \slashed{D}\psi(x) = \frac{1}{2a} \sum_\mu \gamma_\mu \left[ U_\mu(x)\,\psi(x + a\hat\mu) - U_\mu^\dagger(x - a\hat\mu)\,\psi(x - a\hat\mu) \right], where a is the lattice spacing

A quark field, \psi(x), depends upon \psi(x + a\hat\mu) and the local gluon fields U_\mu
– \psi(x) is a complex 3x1 vector, and the U_\mu are complex 3x3 matrices. Interactions are computed via matrix algebra (sketched below)
– On a supercomputer, the space-time lattice is distributed across all of the nodes
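To make this concrete, here is a minimal numpy sketch of the nearest-neighbor hopping structure described above: one complex 3-vector per site, one complex 3x3 link matrix per direction, with the spin (gamma-matrix) structure omitted for brevity. It illustrates the data-access pattern only; the array layout and function name are choices for this example, not the MILC or Chroma implementation.

```python
import numpy as np

def hop_term(psi, U, a=1.0):
    """Nearest-neighbor hopping term on a periodic 4-D lattice:
        (1/2a) * sum_mu [ U_mu(x) psi(x+mu) - U_mu(x-mu)^dagger psi(x-mu) ]
    Spin (gamma-matrix) structure is omitted for brevity.

    psi : complex array, shape (Lx, Ly, Lz, Lt, 3)       -- one color 3-vector per site
    U   : complex array, shape (4, Lx, Ly, Lz, Lt, 3, 3) -- one 3x3 link matrix per direction
    """
    out = np.zeros_like(psi)
    for mu in range(4):
        psi_fwd = np.roll(psi, -1, axis=mu)        # psi(x + mu)
        psi_bwd = np.roll(psi, +1, axis=mu)        # psi(x - mu)
        U_bwd   = np.roll(U[mu], +1, axis=mu)      # U_mu(x - mu)
        # small complex 3x3 matrix times 3-vector at every lattice site
        out += np.einsum('...ij,...j->...i', U[mu], psi_fwd)
        out -= np.einsum('...ij,...j->...i', U_bwd.conj().swapaxes(-1, -2), psi_bwd)
    return out / (2.0 * a)

# tiny example on a 4^4 lattice with random (non-unitary) stand-in fields
L = 4
rng = np.random.default_rng(0)
psi = rng.standard_normal((L, L, L, L, 3)) + 1j * rng.standard_normal((L, L, L, L, 3))
U   = rng.standard_normal((4, L, L, L, L, 3, 3)) + 1j * rng.standard_normal((4, L, L, L, L, 3, 3))
print(hop_term(psi, U).shape)   # (4, 4, 4, 4, 3)
```

Every term is a small complex 3x3 matrix times 3-vector product, which is exactly the SU(3) algebra referred to in the next slide.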

Computing Constraints
Lattice QCD codes require:
– Excellent single and double precision floating point performance; the majority of Flops are consumed by small complex matrix-vector multiplies [SU(3) algebra]
– High memory bandwidth (the principal bottleneck; a rough estimate follows this list)
– Low latency, high bandwidth communications, typically implemented with MPI or similar message passing APIs; on clusters: Infiniband, Myrinet, GigE mesh
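As a rough back-of-the-envelope illustration of why memory bandwidth, rather than peak flop rate, limits this kernel (these counts are estimates for a single SU(3) matrix-vector multiply, not figures from the talk):

```python
# One complex 3x3 matrix times 3-vector (the inner SU(3) operation):
flops = 9 * 6 + 6 * 2                 # 9 complex multiplies + 6 complex adds = 66 flops

# Single-precision data touched if nothing is reused from cache:
bytes_moved = 9 * 8 + 3 * 8 + 3 * 8   # link matrix + input vector + output vector = 120 bytes

intensity = flops / bytes_moved       # ~0.55 flops per byte
print(f"{intensity:.2f} flops/byte")

# At, say, 6 GB/s of sustained memory bandwidth this kernel is capped near
# 6e9 * 0.55 ~ 3.3 GFlop/s, well below the peak flop rate of the same node.
```

Caching of link matrices and vectors improves on this, but the kernel remains bandwidth-bound on commodity processors.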

Computing Constraints
The dominant computation is the repeated inversion of the Dirac operator
– Equivalent to inverting large 4-D and 5-D sparse matrices
– The conjugate gradient method is used (see the sketch after this list)
– The current generation of calculations requires on the order of Tflop/s-yrs to produce the intermediate results ("vacuum gauge configurations") that are used for further analysis
– ~50% of Flops are spent on configuration generation, and ~50% on analysis using those configurations
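For reference, here is a minimal sketch of the conjugate gradient iteration for a generic Hermitian positive-definite operator (in practice the production solvers work on normal equations built from the Dirac operator, with heavy machine-specific optimization); the function names and the toy usage are illustrative only, not the MILC/Chroma/CPS solver:

```python
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for a Hermitian positive-definite linear operator.

    apply_A : function returning A applied to a field (numpy array);
              in LQCD this application of the Dirac operator dominates the cost
    b       : right-hand side (a source field)
    """
    x = np.zeros_like(b)
    r = b - apply_A(x)            # initial residual
    p = r.copy()                  # initial search direction
    rsq = np.vdot(r, r).real
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rsq / np.vdot(p, Ap).real
        x += alpha * p
        r -= alpha * Ap
        rsq_new = np.vdot(r, r).real
        if rsq_new < tol**2 * np.vdot(b, b).real:
            break
        p = r + (rsq_new / rsq) * p
        rsq = rsq_new
    return x

# Toy usage with a small random Hermitian positive-definite matrix:
rng = np.random.default_rng(1)
M = rng.standard_normal((12, 12)) + 1j * rng.standard_normal((12, 12))
A = M.conj().T @ M + np.eye(12)          # Hermitian positive definite
b = rng.standard_normal(12) + 1j * rng.standard_normal(12)
x = conjugate_gradient(lambda v: A @ v, b)
print(np.linalg.norm(A @ x - b))         # small residual
```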

Near Term Requirements
Lattice QCD codes typically sustain about 30% of the Tflop/s reported by the Top500 Linpack benchmark, or roughly 20% of peak performance.
Planned configuration generation campaigns in the next few years:

Lattice QCD Codes
You may have heard of the following codes:
– MILC: written by the MIMD Lattice Computation collaboration; C-based, runs on essentially every machine (MPI, shmem, …)
– Chroma: C++, runs on any MPI (or QMP) machine
– CPS (Columbia Physics System): C++, written for the QCDSP but ported to MPI (or QMP) machines

LQCD Machines
Dedicated Machines (USA):
– QCDOC ("QCD On a Chip") – Brookhaven
– GigE and Infiniband Clusters – JLab
– Myrinet and Infiniband Clusters – Fermilab
Shared Facilities (USA):
– Cray XT3 – PSC and ORNL
– BG/L – UCSD, MIT, BU
– Clusters – NCSA, UCSD, PSC, …

Fermilab LQCD Clusters
Cluster | Processor                                                                    | Nodes | MILC performance
qcd     | 2.8 GHz P4E, Intel E7210 chipset, 1 GB main memory, Myrinet                 |       | MFlops/node; 0.1 TFlops total
pion    | 3.2 GHz Pentium 640, Intel E7221 chipset, 1 GB main memory, Infiniband SDR  |       | MFlops/node; 0.8 TFlops total
kaon    | 2.0 GHz Dual Opteron, nVidia CK804 chipset, 4 GB main memory, Infiniband DDR|       | MFlops/node; 2.2 TFlops total

Job Types and Requirements
Vacuum Gauge Configuration Generation
– Simulations of the QCD vacuum
– Creates ensembles of gauge configurations; each ensemble is characterized by lattice spacing, quark masses, and other physics parameters
– Ensembles consist of simulation time steps drawn from a sequence (Markov chain) of calculations
– Calculations require a large machine (capability computing) delivering O(Tflop/sec) to a single MPI job
– An ensemble of configurations typically has O(1000) time steps
– Ensembles are used in multiple analysis calculations and are shared with multiple physics groups worldwide

Sample Configuration Generation Stream
Currently running at Fermilab:
– 48^3 x 144 configuration generation (MILC asqtad)
– Job characteristics (see the workflow sketch after this list):
  - MPI job uses 1024 processes (256 dual-core dual-Opteron nodes)
  - Each time step requires ~3.5 hours of computation
  - Configurations are 4.8 Gbytes
  - Output of each job (1 time step) is input to the next job
  - Every 5th time step goes to archival storage, to be used for subsequent physics analysis jobs
  - Very low I/O requirements (9.6 Gbytes every 3.5 hours)
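A minimal, hypothetical sketch of this stream's structure (the executable name, file layout, and archive command are invented for the example; this is not the actual Fermilab submission script): each time-step job consumes the previous configuration, writes the next one, and every fifth output goes to archival storage. As a rough consistency check on the quoted size, a 48^3 x 144 lattice stores 4 SU(3) link matrices per site, i.e. 48^3 * 144 * 4 * 18 single-precision reals ≈ 4.6 GB, in line with the 4.8 Gbytes above.

```python
import subprocess
from pathlib import Path

# Hypothetical paths and commands -- placeholders for this sketch only.
WORKDIR = Path("/scratch/asqtad_48x144")
RUN_CMD = "./su3_rmd"          # stand-in for the MILC asqtad evolution binary
ARCHIVE_CMD = "archive_put"    # stand-in for the site's archival-storage client

def run_stream(first_step: int, last_step: int) -> None:
    """Drive a chain of configuration-generation jobs.

    Each step reads the configuration written by the previous step and
    produces the next one (~3.5 hours on 1024 MPI processes in the real
    stream); every 5th configuration is sent to archival storage.
    """
    for step in range(first_step, last_step + 1):
        cfg_in = WORKDIR / f"lat.{step - 1:05d}"
        cfg_out = WORKDIR / f"lat.{step:05d}"

        # In production this would be a batch submission; shown here as a
        # single blocking command per time step.
        subprocess.run(
            ["mpirun", "-np", "1024", RUN_CMD,
             "--input", str(cfg_in), "--output", str(cfg_out)],
            check=True,
        )

        # Only every 5th time step is kept permanently.
        if step % 5 == 0:
            subprocess.run([ARCHIVE_CMD, str(cfg_out)], check=True)

if __name__ == "__main__":
    run_stream(first_step=1, last_step=20)
```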

Job Types and Requirements
Analysis computing
– Gauge configurations are used to generate valence quark propagators
– Multiple propagators are calculated from each configuration using several different physics codes
– Each propagator generation job is independent of the others (see the sketch after this list)
  - Unlike configuration generation, analysis can use many simultaneous job streams
  - Jobs require nodes, typically 4-12 hours in length
  - Propagators are larger than configurations (a factor of 3 or greater)
  - Moderate I/O requirements (tens of Gbytes per few hours); lots of non-archival storage required (tens of Tbytes)
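Because the propagator jobs are independent, they can be farmed out concurrently rather than chained. A minimal sketch (hypothetical command and file names, not the actual lab analysis scripts) might dispatch one job per archived configuration:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-ins for this sketch only.
CONFIG_DIR = Path("/archive/asqtad_48x144")
PROP_CMD = "make_propagator"   # placeholder for a propagator-generation binary

def generate_propagator(cfg: Path) -> Path:
    """Run one independent propagator job for a single gauge configuration."""
    prop = cfg.with_name(cfg.name + ".prop")
    # In production each of these would be a separate batch job of its own;
    # a thread here simply models the many simultaneous job streams.
    subprocess.run([PROP_CMD, "--config", str(cfg), "--output", str(prop)],
                   check=True)
    return prop

if __name__ == "__main__":
    configs = sorted(CONFIG_DIR.glob("lat.*"))
    # Many streams at once, unlike the strictly sequential configuration chain.
    with ThreadPoolExecutor(max_workers=16) as pool:
        for prop in pool.map(generate_propagator, configs):
            print("finished", prop)
```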

Job Types and Requirements
"Tie Ups"
– Two-point and three-point correlation calculations using propagators generated from configurations
– Typically small jobs (4-16 nodes)
– Heavy I/O requirement: tens of Gbytes for jobs lasting O(hour)

SciDAC-2 Computing Project
Scientific Discovery through Advanced Computing
– Five-year project, sponsored by the DOE Offices of High Energy Physics, Nuclear Physics, and Advanced Scientific Computing Research
– $2.2M/year
– Renewal of the previous five-year project

SciDAC-2 LQCD Participants
Principal Investigator: Robert Sugar, UCSB
Participating Institutions and Co-Investigators:
– Boston University - Richard Brower and Claudio Rebbi
– Brookhaven National Laboratory - Michael Creutz
– DePaul University - Massimo DiPierro
– Fermi National Accelerator Laboratory - Paul Mackenzie
– Illinois Institute of Technology - Xian-He Sun
– Indiana University - Steven Gottlieb
– Massachusetts Institute of Technology - John Negele
– Thomas Jefferson National Accelerator Facility - David Richards and William (Chip) Watson
– University of Arizona - Doug Toussaint
– University of California, Santa Barbara - Robert Sugar (PI)
– University of North Carolina - Daniel Reed
– University of Utah - Carleton DeTar
– Vanderbilt University - Theodore Bapty

SciDAC-2 LQCD Subprojects
Machine-Specific Software
– Optimizations for multi-core processors
– Native implementations of the message passing library (QMP) for Infiniband and BlueGene/L
– Opteron linear algebra optimizations (XT3, clusters)
– Intel SSE3 optimizations
– Optimizations for BG/L, QCDOC, new architectures
– "Level-3" codes (highly optimized physics kernels)

SciDAC-2 LQCD Subprojects
Infrastructure for Application Code
– Integration and optimization of the QCD API
– Documentation and regression testing
– User support (training, workshops)
QCD Physics Toolbox
– Shared algorithms and building blocks
– Graphics and visualization
– Workflow
– Performance analysis
– Multigrid algorithms

SciDAC-2 LQCD Subprojects
Uniform Computing Environment
– Common runtime environment
– Data management
– Support for the Grid and ILDG (International Lattice Data Grid)
– Reliability: monitoring and control of large systems
– Accounting tools